javascript - Regex to remove substrings such as "Official Video", "Audio", "Music V

admin•2025-04-20 14:54:45•questions•0 reads


I'm trying to clean YouTube video titles of unnecessary words such as "Official Video", "Audio", "Music Video", etc. I need help constructing a regex that I can use. What I have tried so far:

const regex = /\s*[-\(\[]?\s*(-|official|video|audio|lyrics|lyric|hd|full|4k|music\s+video|\d{4})\s*[\)\]]?$/gi;

As I understand it, this removes only the last occurrence of a keyword, so I used it in a loop like this:

function clearSearchTerm(title) {
    const regex = /\s*[-\(\[]?\s*(-|official|video|audio|lyrics|lyric|hd|full|4k|music\s+video|\d{4})\s*[\)\]]?$/gi;
    let newTitle;

    do {
        newTitle = title;
        title = title.replace(regex, "");
    } while (newTitle !== title);

    return title;
}

Right now it works for me, since I haven't found an example where it fails. As was mentioned in the comments, my previous regex removed keywords even when they appeared in the middle of a title, which I believe this version fixes. If you have any idea how this can be improved, I'm all ears. Below are examples of what I need to remove.

The words I'm trying to remove are of this kind:

Audio
Video
Lyrics
Official
Remaster
2020 (or years in general)
...

All of those words (and maybe more) can appear between ( and ), between [ and ], or after -. The words can also be combined; for example, Some title - Official Video should be cleaned to Some title.
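
To make the behaviour of the loop concrete, here is a small sketch that runs the function from the question on a few titles. The function body is the one shown above; the sample titles are assumptions picked for illustration and are not an exhaustive test set.

```javascript
// The function from the question, unchanged apart from comments and a renamed local.
function clearSearchTerm(title) {
    const regex = /\s*[-\(\[]?\s*(-|official|video|audio|lyrics|lyric|hd|full|4k|music\s+video|\d{4})\s*[\)\]]?$/gi;
    let previous;

    // Strip one trailing keyword per pass until the title stops changing.
    do {
        previous = title;
        title = title.replace(regex, "");
    } while (previous !== title);

    return title;
}

// Illustrative titles (assumed for this sketch).
const samples = [
    "Some title - Official Video",
    "Some title (Official Video)",
    "The Buggles - Video killed the Radio Star",
    "Pulp - Disco 2000",
];

for (const sample of samples) {
    console.log(JSON.stringify(sample), "->", JSON.stringify(clearSearchTerm(sample)));
}
```

Because the pattern is anchored with $, the word "Video" in the middle of the Buggles title is left alone. The same anchor also means that a trailing year that is part of the title, as in "Pulp - Disco 2000", is stripped, so the title comes back as "Pulp - Disco".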


asked Mar 6 at 14:38, edited Mar 6 at 15:40, by Milos Stojanovic

So what will happen to Video in The Buggles - Video killed the Radio Star then? :-) – C3roe Commented Mar 6 at 14:48
Necessary mistake :). Maybe it would be nice to match the end of the string. That might be too strict a constraint, but I guess those words are almost always at the end of the title. – Milos Stojanovic Commented Mar 6 at 14:50
There is no way to answer with 100% accuracy right now, so, something like this, maybe... Or maybe not. – Wiktor Stribiżew Commented Mar 6 at 14:59
@WiktorStribiżew A bit too repetitive, right? Why can't it be compressed into shorter regex where keywords wouldn't be repeated for every prefix ((, -, [)? – Milos Stojanovic Commented Mar 6 at 15:06
It does not matter. You may use a variable. JS regex patterns are not that much limited in length as PCRE. Or do you have thousands of these words? – Wiktor Stribiżew Commented Mar 6 at 15:08


2 Answers


With PCRE (typically in PHP), you can avoid the repetition of words by declaring a sub-pattern and then reuse it later in the main pattern. It's also possible to add comments and spaces for readability with the x flag:

/
(?(DEFINE)
  (?<words_to_drop>
    (?:
      \s*
      \b(?:Official|Video|Audio|Music|Lyrics?|Remaster(?:ed)?|HD|LP|HQ|4k|Full|Version)\b
      \s*
    )+
  )
)
# Finishing by - and words to remove (but not years).
\s+[-–]\s+\g<words_to_drop>$
| # or
# Words or years to remove between brackets or parenthesis.
\s*[[(](?:\g<words_to_drop>|\s*\d{4}\s*)+[\])]
/ix

See it in action with the explanation: https://regex101.com/r/kPeYzb/1

Notice that the part of the regex for brackets or parentheses isn't 100% correct, as it would also match "(Official video]". I prefer keeping the regex short by avoiding a third re-use of the sub-pattern, and I don't think this matters much in your case.

If you have to stick to JavaScript's engine, you'll have to remove the spaces and comments and copy-paste the word pattern, which gives the same pattern in JavaScript flavour:

const pattern = /\s+[-–]\s+(?:\s*\b(?:Official|Video|Audio|Music|Lyrics?|Remaster(?:ed)?|HD|LP|HQ|4k|Full|Version)\b\s*)+$|\s*[[(](?:(?:\s*\b(?:Official|Video|Audio|Music|Lyrics?|Remaster(?:ed)?|HD|LP|HQ|4k|Full|Version)\b\s*)+|\s*\d{4}\s*)+[\])]/gi;

In action here: https://regex101.com/r/kPeYzb/2
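
As a quick check outside regex101, the pattern above can be applied with String.prototype.replace. This is only a sketch: the pattern is copied from this answer, while the list of titles is an assumption made for illustration (the mismatched "(Official video]" case is the one mentioned earlier).

```javascript
// Pattern copied verbatim from the answer (JavaScript flavour).
const pattern = /\s+[-–]\s+(?:\s*\b(?:Official|Video|Audio|Music|Lyrics?|Remaster(?:ed)?|HD|LP|HQ|4k|Full|Version)\b\s*)+$|\s*[[(](?:(?:\s*\b(?:Official|Video|Audio|Music|Lyrics?|Remaster(?:ed)?|HD|LP|HQ|4k|Full|Version)\b\s*)+|\s*\d{4}\s*)+[\])]/gi;

// Illustrative titles (assumed for this sketch).
const titles = [
  "Some title - Official Video",
  "Some title (Official video]", // mismatched brackets are still matched
  "The Buggles - Video killed the Radio Star",
  "1979 (Remastered 2012)",
];

for (const title of titles) {
  // replace() resets lastIndex for global regexes, so reusing `pattern` is safe here.
  console.log(JSON.stringify(title), "->", JSON.stringify(title.replace(pattern, "")));
}
```

The Buggles title is left untouched because the dash alternative requires the dropped words to run all the way to the end of the string.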

Now, about your question of avoiding having this list of words entered twice in the regex pattern, it is possible to create the regex object from a string, with the RegExp() constructor. This means that you could have an array of words (or word patterns) from a configuration:

// Original commented regular expression : https://regex101.com/r/kPeYzb/1

// We will build this regular expression from a custom list of words,
// for example taken from a configuration page.
const wordPatternsFromConfig = [
  'Official',
  'Video',
  'Audio',
  'Music',
  'Lyrics?',
  'Remaster(?:ed)?',
  'HD',
  'LP',
  'HQ',
  '4k',
  'Full',
  'Version',
  // Uncomment this pattern with an error, for the demo.
  //'Dumm(?y|ies)' // Instead of "Dumm(?:y|ies)"
];

// IMPORTANT: You should validate each word regex before saving the config.
// Example of how you could do this:
const validWordPatterns = [];
const invalidWordPatterns = [];
wordPatternsFromConfig.forEach((wordPattern) => {
  try {
    // The constructor throws a SyntaxError if the pattern is invalid.
    new RegExp(wordPattern);
    validWordPatterns.push(wordPattern);
  } catch (e) {
    invalidWordPatterns.push(e.message);
  }
});
if (invalidWordPatterns.length > 0) {
  console.log('You have invalid word patterns! Check the following errors:', invalidWordPatterns);
}

// IMPORTANT: compared to the regex syntax, if we build a RegExp instance
//            from a string, each backslash should be escaped.
// The regex to match multiple words from this list of words to remove.
const regexWordsToRemove = '(?:\\s*\\b(?:' + validWordPatterns.join('|') + ')\\b\\s*)+';
// The full regex pattern, for the first cleanup step.
const patternCleanup1 = '\\s+[-–]\\s+' + regexWordsToRemove + '$|\\s*[[(](?:' + regexWordsToRemove + '|\\s*\\d{4}\\s*)+[\\])]';
// Create the regex object from the pattern string.
const regexCleanup1 = new RegExp(patternCleanup1, 'gmi');
// Printing it should give the same result as the original regex we
// made here: https://regex101.com/r/kPeYzb/2
//console.log(regexCleanup1);
// A second regex to clean up some other undesired things at the end.
const regexCleanup2 = /\s*[-(\[|]*\s*$/gmi;

// When HTML is parsed and content loaded, add the JS logic.
document.addEventListener('DOMContentLoaded', (loaded) => {
  const input = document.getElementById('input');
  const output = document.getElementById('output');

  // Function to update the output, based on the input.
  function updateOutput() {
    output.value = input.value.replace(regexCleanup1, '').replace(regexCleanup2, '');
  }

  // When the input changes, update the output.
  input.addEventListener('input', updateOutput);
  
  // Update the output for the initial input value.
  updateOutput();
});

body {
  font-family: Arial, sans-serif;
}

.two-cols {
  display: grid;
  grid-template-columns: 1fr 1fr;
  grid-column-gap: .5em;
}

textarea {
  /* Just because the snippet space is small. */
  font-size: 0.8em;
  /* Don't wrap the text, to make comparison easier. */
  white-space: pre;
  overflow-wrap: normal;
  overflow-x: scroll;
  box-sizing: border-box;
  width: 100%;
}

textarea[readonly] {
  color: #666;
  background: #f8f8f8;
}

small {
  font-size: 0.65em;
}

<form id="clean-up" class="two-cols" action="#">

  <div>
    <label for="input">Input:</label>
    <textarea id="input" name="input"
              placeholder="Put your text here"
              rows="10">Some title - Official Video
Some title [Official Video]
Some title (Official Video)
The Buggles - Video killed the Radio Star
The Smashing Pumpkins - 1979 (Official Music Video)
Miki Jevremović - Prijatelji, ja vam pevam | [Official Music Audio]
1979 (Remastered 2012)
New Order – 1963 (Lyrics)
Paul Davis - '65 Love Affair (1981 LP Version HQ)
Pulp - Disco 2000</textarea>
  </div>
  
  <div>
    <label for="output">Out: <small>auto-updated</small></label>
    <textarea id="output" name="output"
              placeholder="Modified text" readonly
              rows="10"></textarea>
  </div>
  
</form>
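
For use outside a browser page, the same two-step cleanup can be wrapped in a plain function. The following is a minimal sketch under assumptions: the helper names buildCleanupRegex and cleanTitle are hypothetical and not part of the snippet above, and the word list is assumed to have been validated already.

```javascript
// Hypothetical helper: build the first cleanup regex from a list of word patterns.
// Backslashes are doubled because the pattern is assembled from strings.
function buildCleanupRegex(wordPatterns) {
  const words = '(?:\\s*\\b(?:' + wordPatterns.join('|') + ')\\b\\s*)+';
  return new RegExp(
    '\\s+[-–]\\s+' + words + '$|\\s*[[(](?:' + words + '|\\s*\\d{4}\\s*)+[\\])]',
    'gi'
  );
}

// Second step from the answer: drop separators left dangling at the end.
const trailingSeparators = /\s*[-(\[|]*\s*$/g;

// Hypothetical wrapper that applies both steps to a single title.
function cleanTitle(title, wordPatterns) {
  return title
    .replace(buildCleanupRegex(wordPatterns), '')
    .replace(trailingSeparators, '');
}

const words = ['Official', 'Video', 'Audio', 'Music', 'Lyrics?', 'Remaster(?:ed)?',
  'HD', 'LP', 'HQ', '4k', 'Full', 'Version'];

console.log(cleanTitle('Miki Jevremović - Prijatelji, ja vam pevam | [Official Music Audio]', words));
console.log(cleanTitle('The Smashing Pumpkins - 1979 (Official Music Video)', words));
console.log(cleanTitle('Pulp - Disco 2000', words));
```

Rebuilding the regex on every call keeps the sketch short; if the word list rarely changes, building it once and reusing it would avoid the repeated work.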

This regex matches -, [ or (, followed by any number of literal spaces, followed by one of the words OFFICIAL VIDEO|REMASTER|LYRICS|AUDIO or a four-digit number, followed by any number of whitespace characters and the matching closing bracket (when applicable).

REGEX PATTERN (ECMAScript (JavaScript) flavor; flags: gmi):

(?:-|\((?:(?<=\()(?= *[^)\n]+ *\)))|\[(?:(?<=\[)(?= *[^\]\n]+ *\]))) *(?:OFFICIAL VIDEO|REMASTER|LYRICS|AUDIO|\d{4})\s*(?:\]|\))?(?= |\n|$)

Regex demo: https://regex101.com/r/Wy2I0w/8 (10 matches)

NOTES:

(|\[(?:(?<=\[)(?= *[^\]\n]* *\])))
(?: Open non-capturing group (?:...) alternation (...|...|...) statement. Match one of the elements in the alternation statement separated by the pipe (|).
- Match literal dash - (1st option)
| Alternation element delimiter. Followed by 2nd option.
\( Match literal (
(?: Begin non-capturing group (?:...) (2nd option)
(?<= Begin lookbehind (?<=...) to check for opening (.
\( Match literal (. This character must precede this index point.
) Close lookbehind.
(?= Begin lookahead (?=...) to make sure there is a matching closing ). Will not consume characters.
" *" Match 0 or more (*) literal spaces.
[^)\n]+ Negated character class [^...] matches any character that is not ) or newline \n, 1 or more times (+).
" *" Match 0 or more (*) literal spaces.
\) Match literal ).
) Close lookahead.
) Close non-capturing group (2nd option)
| Alternation element delimiter. Followed by 3rd option.
\[ Match literal [.
(?: Begin non-capturing group (?:...) (3rd option)
(?<= Begin lookbehind (?<=...) to check for opening.
\[ Match literal [.
) Close lookbehind.
(?= Begin lookahead to locate matching closing bracket ]. Will not consume characters.
" *" Match 0 or more literal spaces.
[^\]\n]+ Negated character class. Match any character that is not ] or newline \n, one or more times (+).
" *" Match 0 or more literal spaces.
\] Match literal ].
) Close lookahead.
) Close non-capturing group.
) Close alternation group.
" *" Match 0 or more literal spaces.
(?: Begin non-capturing group containing an alternation.
OFFICIAL VIDEO|REMASTER|LYRICS|AUDIO|\d{4} Alternation matches one of the words listed or four digits \d{4} (year).
) Close non-capturing group.
\s* Match 0 or more whitespace characters \s.
(?: Open non-capturing group containing alternation.
\]|\) Match either a literal ] or a literal ).
)? Close alternation group. Make it optional (?).
(?= Begin lookahead, will not consume characters.
" |\n|$" Matches a literal space character, a newline \n, or end of line $.
) Close lookahead.

TEST STRING:

FIRST title - Official Video 
SECOND title [Official VIDEO]
THIRD title (Lyrics) 
FOURTH title - Remaster
FIFTH title - [ Audio ]
SIXTH title ( Lyrics ) 
SEVENTH title (2020) 
EIGHT title (1999)
NINTH title (20)
TENTH title [ 2002 ]
ELEVENTH title [ 200 ]
TWELFTH  title ( 1999 )
THIRTEENTH  title ( Official Lyrics )
FOURTEENTH  title ( Official VIDEO]
FOURTEENTH  title ( Official VIDEO
FOURTEENTH  title [Official VIDEO)
FOURTEENTH  title Official VIDEO]

RESULT:

FIRST title 
SECOND title 
THIRD title  
FOURTH title 
FIFTH title - 
SIXTH title  
SEVENTH title  
EIGHT title 
NINTH title (20)
TENTH title 
ELEVENTH title [ 200 ]
TWELFTH  title 
THIRTEENTH  title ( Official Lyrics )
FOURTEENTH  title ( Official VIDEO]
FOURTEENTH  title ( Official VIDEO
FOURTEENTH  title [Official VIDEO)
FOURTEENTH  title Official VIDEO]
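
For completeness, here is a small sketch of applying this pattern from JavaScript. The regex is copied from this answer; the sample lines are taken from the test string above, and running them one at a time (rather than as one multi-line string) is a choice made for this example. Lookbehind needs an engine that supports it (ES2018 or later).

```javascript
// Regex copied from the answer, with the gmi flags it lists.
const regex = /(?:-|\((?:(?<=\()(?= *[^)\n]+ *\)))|\[(?:(?<=\[)(?= *[^\]\n]+ *\]))) *(?:OFFICIAL VIDEO|REMASTER|LYRICS|AUDIO|\d{4})\s*(?:\]|\))?(?= |\n|$)/gmi;

// A few lines from the test string, without trailing spaces.
const lines = [
  "FIRST title - Official Video",
  "SECOND title [Official VIDEO]",
  "NINTH title (20)",
  "TWELFTH  title ( 1999 )",
  "FOURTEENTH  title ( Official VIDEO]",
];

for (const line of lines) {
  // JSON.stringify makes the leftover trailing space visible.
  console.log(JSON.stringify(line.replace(regex, "")));
}
```

The match starts at the dash or bracket, so the space in front of it stays in the result; a trim() afterwards would remove it if needed.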
