Discussion on: React.js - Interview Question - duplicate hashtag remover.

Find duplicates and remove them using RegExp.

Sorry, but I wrote just two functions... 😇

const text = `#t1 #t2
#t3 #t4 #t2 #t5 #t3
#t2`;

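const getDupTags = txt => {
    // Collect every hashtag in the text (empty array when there are none).
    const matches = txt.match(/\B#\S+\b/gm) ?? [];
    // A tag is a duplicate when it already appeared at an earlier index; Set lists each one once.
    return [...new Set(matches.filter((tag, index) => matches.indexOf(tag) !== index))];
}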


console.log(getDupTags(text))

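const removeDupTags = (str, txt) => {
    let isFirstMatch = true;
    let matches;
    // 'd' exposes the match indices; 'g' lets exec() walk through the text.
    const regexp = new RegExp(str, 'dgm');
    while ((matches = regexp.exec(txt))) {
        if (isFirstMatch) {
            // Keep the first occurrence of the tag.
            isFirstMatch = false;
        } else {
            const [startIndex, endIndex] = matches.indices[0];
            // Drop the tag and the character before it (assumed to be whitespace).
            txt = txt.substring(0, startIndex - 1) + txt.substring(endIndex);
            // The text got shorter, so resume the search from the cut point.
            regexp.lastIndex = startIndex - 1;
        }
    }
    return txt;
}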
console.log(removeDupTags('#t2', text))
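
If the goal is to drop every repeated tag in a single pass instead of one tag at a time, something along these lines might work. This is only a sketch: the name removeAllDupTags and the sample string are made up for illustration, and it assumes each tag is separated from the previous one by a single whitespace character, as in the sample text above.

```javascript
// Hypothetical helper (not part of the original answer): keeps the first
// occurrence of every hashtag and removes all later repeats in one pass.
const removeAllDupTags = txt => {
    const seen = new Set();
    // Group 1: an optional whitespace character before the tag.
    // Group 2: the tag itself, using the same \B#\S+\b idea as getDupTags.
    return txt.replace(/(\s?)\B(#\S+\b)/g, (match, separator, tag) => {
        if (seen.has(tag)) {
            // Repeat: drop the tag together with its leading whitespace.
            return '';
        }
        seen.add(tag);
        return match;
    });
};

// Same content as the sample text in the post.
const sampleText = '#t1 #t2\n#t3 #t4 #t2 #t5 #t3\n#t2';
console.log(JSON.stringify(removeAllDupTags(sampleText)));
// prints "#t1 #t2\n#t3 #t4 #t5"
```

Using String.prototype.replace with a callback avoids having to track lastIndex by hand while the text shrinks.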

Rajesh Royal • Aug 22 '22

@rhobert72 nice solution. Could you also write a little bit about the regex you used, so I can understand it better? For example, what is this line doing: txt.match(/\B#\S+\b/gm)

Thanks.

Roberto Garella • Aug 24 '22

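To find a tag within a text we have to play with word boundaries.
\B (upper-case B) matches a position that is not a word boundary: the characters on both sides are either both word characters or both non-word characters. Since # is not a word character, \B# only matches a # that is preceded by a non-word character, such as a white space, or that sits at the very start of the text.
\b (notice the lower-case b) is the opposite: it matches a word boundary, a position with a word character on one side only. At the end of our pattern it marks the spot where the last word character of the tag is followed by a non-word character or by the end of the text.
\S+ matches one or more non-whitespace characters, that is, anything except spaces, tabs, new lines and so on.
Flags: g stands for global and tells the regexp engine not to stop at the first match; m stands for multiline and only changes how ^ and $ behave.
Since the pattern uses neither ^ nor $, the m flag is not necessary. 😁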
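
A quick way to check the explanation is to run the pattern on a small string. The sample text below is made up for this illustration; it puts one tag after a comma-terminated word, one glued to a word, and one at the start of a new line.

```javascript
// The pattern from the post, without the m flag.
const tagPattern = /\B#\S+\b/g;

// Hypothetical sample text, invented for this example.
const sample = 'start #t1, mid#t2 and #t3\n#t4';

const found = sample.match(tagPattern) ?? [];
console.log(found);
// prints [ '#t1', '#t3', '#t4' ]
// '#t1,' loses its comma because \b needs a word character right before it.
// 'mid#t2' is skipped because the # follows a word character, so \B fails.

// With the m flag the result is the same, since the pattern has no ^ or $.
const foundMultiline = sample.match(/\B#\S+\b/gm) ?? [];
console.log(found.join() === foundMultiline.join());
// prints true
```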

Rajesh Royal • Aug 25 '22

Thanks @rhobert72 ✌🏻