Assume there are some strings containing names in different format (each line is a possible user input):
'Guilcher, G.M., Harvey, M. & Hand, J.P.'
'Ri Liesner, Peter Tom Collins, Michael Richards'
'Manco-Johnson M, Santagostino E, Ljung R.'
I need to transform those names to get the format Lastname ABC
. So each surename should be transformed to its initial which are appended to the lastname.
The example should result in
Guilcher GM, Harvey M, Hand JP
Liesner R, Collins PT, Richards M
Manco-Johnson M, Santagostino E, Ljung R
The problem is the different (possible) input format. I think my attempts are not very smart, so I'm asking for
- Some hints to optimize the transformation code
- How do I put those in a single function at all? I think first of all I have to test which format the string has...??
So let me explain how far I tried to solve that:
First example string
In the first example there are initials followed by a dot. The dots should be removed and the ma between the name and the initals should be removed.
firstString
.replace('.', '')
.replace(' &', ', ')
I think I do need an regex to get the ma after the name and before the initials.
Second example string
In the second example the name should be splitted by space and the last element is handled as lastname:
const elm = secondString.split(/\s+/)
const lastname = elm[elm.length - 1]
const initials = elm.map((n,i) => {
if (i !== elm.length - 1) return capitalizeFirstLetter(n)
})
return lastname + ' ' + initals.join('')
...not very elegant
Third example string
The third example has the already the correct format - only the dot at the end has to be removed. So nothing else has to be done with that input.
Assume there are some strings containing names in different format (each line is a possible user input):
'Guilcher, G.M., Harvey, M. & Hand, J.P.'
'Ri Liesner, Peter Tom Collins, Michael Richards'
'Manco-Johnson M, Santagostino E, Ljung R.'
I need to transform those names to get the format Lastname ABC
. So each surename should be transformed to its initial which are appended to the lastname.
The example should result in
Guilcher GM, Harvey M, Hand JP
Liesner R, Collins PT, Richards M
Manco-Johnson M, Santagostino E, Ljung R
The problem is the different (possible) input format. I think my attempts are not very smart, so I'm asking for
- Some hints to optimize the transformation code
- How do I put those in a single function at all? I think first of all I have to test which format the string has...??
So let me explain how far I tried to solve that:
First example string
In the first example there are initials followed by a dot. The dots should be removed and the ma between the name and the initals should be removed.
firstString
.replace('.', '')
.replace(' &', ', ')
I think I do need an regex to get the ma after the name and before the initials.
Second example string
In the second example the name should be splitted by space and the last element is handled as lastname:
const elm = secondString.split(/\s+/)
const lastname = elm[elm.length - 1]
const initials = elm.map((n,i) => {
if (i !== elm.length - 1) return capitalizeFirstLetter(n)
})
return lastname + ' ' + initals.join('')
...not very elegant
Third example string
The third example has the already the correct format - only the dot at the end has to be removed. So nothing else has to be done with that input.
Share Improve this question edited Jun 18, 2018 at 20:55 user3142695 asked Jun 18, 2018 at 20:16 user3142695user3142695 17.4k55 gold badges200 silver badges375 bronze badges 6- Given your strings, you need to properly find a delimiter to split on names. Normally I would say ma, but the ma is used a separation between lastname and titles. Do you have control over this list of names, and the delimiters they create? – Fallenreaper Commented Jun 18, 2018 at 20:21
- is each string one name only? I would assume 'Guilcher, G.M.' is one string, and 'Harvey, M. & Hand, J.P.' is another string. Is every name beside the one with an & connecting them a single name per string? Or am I misunderstanding, and this is actually one string of multiple names? – Devin Fields Commented Jun 18, 2018 at 20:24
- @DevinFields it is always one string of multiple names. Each line is one string and one possible user input – user3142695 Commented Jun 18, 2018 at 20:34
- @Fallenreaper no, i dont have control over that – user3142695 Commented Jun 18, 2018 at 20:35
- 1 Well, you'll first have to devise a way of separating the names. My first thought is to check the contents after each ma. Upon hitting the first ma, check if the following contents contains a period. If it does, you can include that in the contents prior to than ma. If not it is most likely the beginning of a new name. Account for edge cases, such the & joining two name and that instance of 'Harvey, M. & Hand, J.P. Ri Liesner' (unless there was supposed to be a ma after the J.P.) – Devin Fields Commented Jun 18, 2018 at 20:40
3 Answers
Reset to default 3It wouldn't be possible without calling multiple replace()
methods. The steps in provided solution is as following:
- Remove all dots in abbreviated names
- Substitute lastname with firstname
- Replace lastnames with their beginning letter
- Remove unwanted characters
Demo:
var s = `Guilcher, G.M., Harvey, M. & Hand, J.P.
Ri Liesner, Peter Tom Collins, Michael Richards
Manco-Johnson M, Santagostino E, Ljung R.`
// Remove all dots in abbreviated names
var b = s.replace(/\b([A-Z])\./g, '$1')
// Substitute first names and lastnames
.replace(/([A-Z][\w-]+(?: +[A-Z][\w-]+)*) +([A-Z][\w-]+)\b/g, ($0, $1, $2) => {
// Replace full lastnames with their first letter
return $2 + " " + $1.replace(/\b([A-Z])\w+ */g, '$1');
})
// Remove unwanted preceding / following mas and ampersands
.replace(/(,) +([A-Z]+)\b *[,&]?/g, ' $2$1');
console.log(b);
Given your example data i would try to make guesses based on name part count = 2, since it is very hard to rely on any ,
, &
or \n
- which means treat them all as ,
.
Try this against your data and let me know of any use-cases where this fails because i am highly confident that this script will fail at some point with more data :)
let testString = "Guilcher, G.M., Harvey, M. & Hand, J.P.\nRi Liesner, Peter Tom Collins, Michael Richards\nManco-Johnson M, Santagostino E, Ljung R.";
const inputToArray = i => i
.replace(/\./g, "")
.replace(/[\n&]/g, ",")
.replace(/ ?, ?/g, ",")
.split(',');
const reducer = function(accumulator, value, index, array) {
let pos = accumulator.length - 1;
let names = value.split(' ');
if(names.length > 1) {
accumulator.push(names);
} else {
if(accumulator[pos].length > 1) accumulator[++pos] = [];
accumulator[pos].push(value);
}
return accumulator.filter(n => n.length > 0);
};
console.log(inputToArray(testString).reduce(reducer, [[]]));
Here's my approach. I tried to keep it short but plexity was surprisingly high to get the edge cases.
- First I'm formatting the input, to replace
&
for,
, and removing.
. - Then, I'm splitting the input by
\n
, then,
and finally(spaces).
- Next I'm processing the chunks. On each new segment (delimited by
,
), I process the previous segment. I do this because I need to be sure that the current segment isn't an initial. If that's the case, I do my best to skip that inital-only segment and process the previous one. The previous one will have the correct initial and surname, as I have all the information I neeed. - I get the initial on the segment if there's one. This will be used on the start of the next segment to process the current one.
- After finishing each line, I process again the last segment, as it wont be called otherwise.
I understand the plexity is high without using regexp, and probably would have been better to use a state machine to parse the input instead.
const isInitial = s => [...s].every(c => c === c.toUpperCase());
const generateInitial = arr => arr.reduce((a, c, i) => a + (i < arr.length - 1 ? c[0].toUpperCase() : ''), '');
const formatSegment = (words, initial) => {
if (!initial) {
initial = generateInitial(words);
}
const surname = words[words.length - 1];
return {initial, surname};
}
const doDisplay = x => x.map(x => x.surname + ' ' + x.initial).join(', ');
const doProcess = _ => {
const formatted = input.value.replace(/\./g, '').replace(/&/g, ',');
const chunks = formatted.split('\n').map(x => x.split(',').map(x => x.trim().split(' ')));
const peoples = [];
chunks.forEach(line => {
let lastSegment = null;
let lastInitial = null;
let lastInitialOnly = false;
line.forEach(segment => {
if (lastSegment) {
// if segment only contains an initial, it's the initial corresponding
// to the previous segment
const initialOnly = segment.length === 1 && isInitial(segment[0]);
if (initialOnly) {
lastInitial = segment[0];
}
// avoid processing last segments that were only initials
// this prevents adding a segment twice
if (!lastInitialOnly) {
// if segment isn't an initial, we need to generate an initial
// for the previous segment, if it doesn't already have one
const people = formatSegment(lastSegment, lastInitial);
peoples.push(people);
}
lastInitialOnly = initialOnly;
// Skip initial only segments
if (initialOnly) {
return;
}
}
lastInitial = null;
// Remove the initial from the words
// to avoid getting the initial calculated for the initial
segment = segment.filter(word => {
if (isInitial(word)) {
lastInitial = word;
return false;
}
return true;
});
lastSegment = segment;
});
// Process last segment
if (!lastInitialOnly) {
const people = formatSegment(lastSegment, lastInitial);
peoples.push(people);
}
});
return peoples;
}
process.addEventListener('click', _ => {
const peoples = doProcess();
const display = doDisplay(peoples);
output.value = display;
});
.row {
display: flex;
}
.row > * {
flex: 1 0;
}
<div class="row">
<h3>Input</h3>
<h3>Output</h3>
</div>
<div class="row">
<textarea id="input" rows="10">Guilcher, G.M., Harvey, M. & Hand, J.P.
Ri Liesner, Peter Tom Collins, Michael Richards
Manco-Johnson M, Santagostino E, Ljung R.
Jordan M, Michael Jackson & Willis B.</textarea>
<textarea id="output" rows="10"></textarea>
</div>
<button id="process" style="display: block;">Process</button>
发布者:admin,转转请注明出处:http://www.yc00.com/questions/1745352373a4623911.html
评论列表(0条)