javascript - Get initials and full last name from a string containing names - Stack Overflow

Assume there are some strings containing names in different format (each line is a possible user input):

'Guilcher, G.M., Harvey, M. & Hand, J.P.'
'Ri Liesner, Peter Tom Collins, Michael Richards'
'Manco-Johnson M, Santagostino E, Ljung R.'

I need to transform those names to get the format Lastname ABC. So each first name should be reduced to its initial, and the initials are appended to the last name.

The example should result in

Guilcher GM, Harvey M, Hand JP
Liesner R, Collins PT, Richards M
Manco-Johnson M, Santagostino E, Ljung R

The problem is the different (possible) input format. I think my attempts are not very smart, so I'm asking for

  1. Some hints to optimize the transformation code
  2. How do I put those in a single function at all? I think first of all I have to test which format the string has...??

So let me explain how far I tried to solve that:

First example string

In the first example there are initials followed by a dot. The dots should be removed, and the comma between the name and the initials should be removed.

firstString
  .replace('.', '')
  .replace(' &', ', ')

I think I do need a regex to get the comma after the name and before the initials.
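One detail worth noting about the attempt above: `replace` with a string pattern only replaces the first occurrence, so a regex with the `g` flag is needed to remove every dot. A minimal sketch for this first format follows. The function name `normalizeInitialsFormat` is hypothetical, and the sketch assumes that initials are always written in uppercase and that no last name is written entirely in uppercase.

```javascript
// Hypothetical helper for inputs shaped like 'Guilcher, G.M., Harvey, M. & Hand, J.P.'
function normalizeInitialsFormat(input) {
  return input
    // Remove every dot (the g flag makes the replacement global)
    .replace(/\./g, '')
    // Treat '&' as just another comma separator
    .replace(/\s*&\s*/g, ', ')
    // Drop the comma that sits directly before an all-uppercase token (the initials)
    .replace(/,\s+(?=[A-Z]+\b)/g, ' ');
}

console.log(normalizeInitialsFormat('Guilcher, G.M., Harvey, M. & Hand, J.P.'));
// → Guilcher GM, Harvey M, Hand JP
```

The lookahead `(?=[A-Z]+\b)` only checks what follows the comma without consuming it, which is why "Harvey" (uppercase followed by lowercase) keeps its comma while "GM" loses it.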

Second example string

In the second example each name should be split by spaces, and the last element is handled as the last name:

const elm = secondString.split(/\s+/)
const lastname = elm[elm.length - 1]
const initials = elm.map((n,i) => {
  if (i !== elm.length - 1) return capitalizeFirstLetter(n)
})

return lastname + ' ' + initials.join('')

...not very elegant
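A more compact variant of this second attempt, as a minimal sketch. The name `toLastnameInitials` is hypothetical, and the sketch assumes the last word of each comma-separated entry is always the last name and that every other word contributes exactly one initial.

```javascript
// Hypothetical helper: 'Peter Tom Collins' -> 'Collins PT'
function toLastnameInitials(fullName) {
  const parts = fullName.trim().split(/\s+/);
  // The last word is treated as the last name; pop() removes it from parts
  const lastname = parts.pop();
  // Every remaining word is reduced to its uppercased first letter
  const initials = parts.map(p => p[0].toUpperCase()).join('');
  return initials ? `${lastname} ${initials}` : lastname;
}

const result = 'Ri Liesner, Peter Tom Collins, Michael Richards'
  .split(/\s*,\s*/)
  .map(toLastnameInitials)
  .join(', ');

console.log(result);
// → Liesner R, Collins PT, Richards M
```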

Third example string

The third example already has the correct format; only the dot at the end has to be removed. Nothing else has to be done with that input.

asked Jun 18, 2018 at 20:16 by user3142695, edited Jun 18, 2018 at 20:55
  • Given your strings, you need to properly find a delimiter to split on names. Normally I would say comma, but the comma is also used as a separator between lastname and titles. Do you have control over this list of names, and the delimiters they create? – Fallenreaper Commented Jun 18, 2018 at 20:21
  • is each string one name only? I would assume 'Guilcher, G.M.' is one string, and 'Harvey, M. & Hand, J.P.' is another string. Is every name beside the one with an & connecting them a single name per string? Or am I misunderstanding, and this is actually one string of multiple names? – Devin Fields Commented Jun 18, 2018 at 20:24
  • @DevinFields it is always one string of multiple names. Each line is one string and one possible user input – user3142695 Commented Jun 18, 2018 at 20:34
  • @Fallenreaper no, i dont have control over that – user3142695 Commented Jun 18, 2018 at 20:35
  • Well, you'll first have to devise a way of separating the names. My first thought is to check the contents after each comma. Upon hitting the first comma, check if the following contents contain a period. If they do, you can include them in the contents prior to that comma. If not, it is most likely the beginning of a new name. Account for edge cases, such as the & joining two names and that instance of 'Harvey, M. & Hand, J.P. Ri Liesner' (unless there was supposed to be a comma after the J.P.) – Devin Fields Commented Jun 18, 2018 at 20:40

3 Answers

It wouldn't be possible without chaining several replace() calls. The steps in the provided solution are as follows:

  • Remove all dots in abbreviated names
  • Swap the first names and the last name, so that the last name comes first
  • Replace the full first names with their first letter
  • Remove unwanted characters

Demo:

var s = `Guilcher, G.M., Harvey, M. & Hand, J.P.
Ri Liesner, Peter Tom Collins, Michael Richards
Manco-Johnson M, Santagostino E, Ljung R.`

// Remove all dots in abbreviated names
var b = s.replace(/\b([A-Z])\./g, '$1')
// Swap first names and last name (the last name goes first)
.replace(/([A-Z][\w-]+(?: +[A-Z][\w-]+)*) +([A-Z][\w-]+)\b/g, ($0, $1, $2) => {
    // Replace full first names with their first letter
    return $2 + " " + $1.replace(/\b([A-Z])\w+ */g, '$1');
})
// Remove unwanted preceding / following commas and ampersands
.replace(/(,) +([A-Z]+)\b *[,&]?/g, ' $2$1')
// Drop the comma that the previous step leaves at the end of a line
.replace(/,$/gm, '');

console.log(b);

Given your example data, I would try to make guesses based on a name part count of 2, since it is very hard to rely on any of `,`, `&` or `\n` as a delimiter. That means treating them all as `,`.

Try this against your data and let me know of any use cases where it fails, because I am highly confident that this script will fail at some point with more data :)

let testString = "Guilcher, G.M., Harvey, M. & Hand, J.P.\nRi Liesner, Peter Tom Collins, Michael Richards\nManco-Johnson M, Santagostino E, Ljung R.";

const inputToArray = i => i
    .replace(/\./g, "")
    .replace(/[\n&]/g, ",")
    .replace(/ ?, ?/g, ",")
    .split(',');

const reducer = function(accumulator, value, index, array) {
    let pos = accumulator.length - 1;
    let names = value.split(' ');
    if(names.length > 1) {
        accumulator.push(names);
    } else {
        if(accumulator[pos].length > 1) accumulator[++pos] = [];
        accumulator[pos].push(value);
    }
    return accumulator.filter(n => n.length > 0);
};

console.log(inputToArray(testString).reduce(reducer, [[]]));
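The reducer above yields an array of name-part arrays rather than the requested string. One possible way to format that result is sketched below. The helpers `isInitials` and `formatParts` and the sample `grouped` array are hypothetical and not part of the answer. The sketch assumes a trailing all-uppercase part is the initials, and that otherwise the last part is the last name. Because the reducer treats line breaks like commas, the original line structure is not preserved.

```javascript
// Hypothetical check: a part made only of uppercase letters counts as initials
const isInitials = token => /^[A-Z]+$/.test(token);

// Hypothetical formatter for one inner array, e.g. ['Peter', 'Tom', 'Collins']
function formatParts(parts) {
  const last = parts[parts.length - 1];
  if (parts.length > 1 && isInitials(last)) {
    // Already "Lastname INITIALS": keep the order
    return parts.slice(0, -1).join(' ') + ' ' + last;
  }
  // Otherwise the last part is the last name; build initials from the rest
  const initials = parts.slice(0, -1).map(p => p[0].toUpperCase()).join('');
  return initials ? last + ' ' + initials : last;
}

// Sample shaped like the reducer's output
const grouped = [['Guilcher', 'GM'], ['Peter', 'Tom', 'Collins'], ['Manco-Johnson', 'M']];
const formatted = grouped.map(formatParts).join(', ');

console.log(formatted);
// → Guilcher GM, Collins PT, Manco-Johnson M
```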

Here's my approach. I tried to keep it short, but the complexity needed to cover the edge cases was surprisingly high.

  • First I format the input, replacing & with , and removing every . character.
  • Then I split the input by \n, then by , and finally by spaces.
  • Next I process the chunks. On each new segment (delimited by ,), I process the previous segment. I do this because I need to be sure that the current segment isn't an initial. If it is, I skip that initial-only segment and process the previous one, which now has the correct initial and surname, as I have all the information I need.
  • I extract the initial from the segment if there is one. It is used at the start of the next segment to process the current one.
  • After finishing each line, I process the last segment once more, as it won't be handled otherwise.

I understand the complexity is high without using regexp, and it probably would have been better to use a state machine to parse the input instead.
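For comparison, the state-machine idea could look roughly like the sketch below. This is not the answer's own code: `formatNames`, `isInitials` and the `pending` variable are hypothetical names, and the sketch assumes that initials are always uppercase and that a single-word segment followed by an initials-only segment forms one name. The machine has two states: idle (`pending` is `null`) and waiting for the initials of a surname seen on its own.

```javascript
// Hypothetical check: a token made only of uppercase letters counts as initials
const isInitials = token => /^[A-Z]+$/.test(token);

function formatNames(line) {
  // Normalize: drop dots, then split on commas and ampersands
  const segments = line.replace(/\./g, '').split(/\s*[,&]\s*/)
    .map(s => s.trim()).filter(Boolean);
  const out = [];
  let pending = null; // null = idle, string = surname waiting for its initials

  // Emit a surname that never received initials
  const flush = () => {
    if (pending !== null) out.push(pending);
    pending = null;
  };

  for (const segment of segments) {
    const words = segment.split(/\s+/);
    if (words.length === 1 && isInitials(words[0]) && pending !== null) {
      // Initials-only segment completes the pending surname
      out.push(pending + ' ' + words[0]);
      pending = null;
    } else if (words.length === 1) {
      // Lone word: remember it and wait for the next segment
      flush();
      pending = words[0];
    } else {
      flush();
      const last = words[words.length - 1];
      if (isInitials(last)) {
        // Already in "Lastname INITIALS" order
        out.push(words.slice(0, -1).join(' ') + ' ' + last);
      } else {
        // "First Middle Lastname": build initials from the leading words
        const initials = words.slice(0, -1).map(w => w[0].toUpperCase()).join('');
        out.push(last + ' ' + initials);
      }
    }
  }
  flush();
  return out.join(', ');
}

console.log(formatNames('Guilcher, G.M., Harvey, M. & Hand, J.P.'));
// → Guilcher GM, Harvey M, Hand JP
console.log(formatNames('Ri Liesner, Peter Tom Collins, Michael Richards'));
// → Liesner R, Collins PT, Richards M
```

Calling `formatNames` once per input line keeps the lines separate. The answer's original snippet (JavaScript, CSS and HTML) follows.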

const isInitial = s => [...s].every(c => c === c.toUpperCase());
const generateInitial = arr => arr.reduce((a, c, i) => a + (i < arr.length - 1 ? c[0].toUpperCase() : ''), '');
const formatSegment = (words, initial) => {
  if (!initial) {
    initial = generateInitial(words);
  }
  const surname = words[words.length - 1];
  return {initial, surname};
}

const doDisplay = x => x.map(x => x.surname + ' ' + x.initial).join(', ');

const doProcess = _ => {
  const formatted = input.value.replace(/\./g, '').replace(/&/g, ',');
  const chunks = formatted.split('\n').map(x => x.split(',').map(x => x.trim().split(' ')));
  const peoples = [];
  chunks.forEach(line => {
    let lastSegment = null;
    let lastInitial = null;
    let lastInitialOnly = false;
    line.forEach(segment => {
      if (lastSegment) {
        // if segment only contains an initial, it's the initial corresponding
        // to the previous segment
        const initialOnly = segment.length === 1 && isInitial(segment[0]);
        if (initialOnly) {
          lastInitial = segment[0];
        }
        // avoid processing last segments that were only initials
        // this prevents adding a segment twice
        if (!lastInitialOnly) {
          // if segment isn't an initial, we need to generate an initial
          // for the previous segment, if it doesn't already have one
          const people = formatSegment(lastSegment, lastInitial);
          peoples.push(people);
        }
        lastInitialOnly = initialOnly;
        
        // Skip initial only segments
        if (initialOnly) {
          return;
        }
      }
      lastInitial = null;
      
      // Remove the initial from the words
      // to avoid getting the initial calculated for the initial
      segment = segment.filter(word => {
        if (isInitial(word)) {
          lastInitial = word;
          return false;
        }
        return true;
      });
      lastSegment = segment;
    });
    
    // Process last segment
    if (!lastInitialOnly) {
      const people = formatSegment(lastSegment, lastInitial);
      peoples.push(people);
    }
  });
  return peoples;
}
process.addEventListener('click', _ => {
  const peoples = doProcess();
  const display = doDisplay(peoples);
  output.value = display;
});
.row {
  display: flex;
}

.row > * {
  flex: 1 0;
}
<div class="row">
  <h3>Input</h3>
  <h3>Output</h3>
</div>
<div class="row">
  <textarea id="input" rows="10">Guilcher, G.M., Harvey, M. & Hand, J.P.
Ri Liesner, Peter Tom Collins, Michael Richards
Manco-Johnson M, Santagostino E, Ljung R.
Jordan M, Michael Jackson & Willis B.</textarea>
  <textarea id="output" rows="10"></textarea>
</div>
<button id="process" style="display: block;">Process</button>
