When you say compare phonetic similarities, not the audio, because it's sourced from different speakers... are you then making the assumption that the speakers will all be using the same phonology for a given word? Because that's a different issue.
Interesting response. I don't know if I am making that assumption, should I?
Within reasonable bounds, yes -- ie, depending on the dialect and accent, "Mary", "merry", and "marry" may be pronounced differently, or "cot" and "caught" -- or they might be the same. Something like "beg" vs "bag" might sound the same, or not, but it certainly won't be pronounced like "dog", which is too different. Regardless, phonetic similarity will still come into play, and if you wanted to get hella nerdy about it, you could use the actual linguistic generalizations of the phonemes for calculating phonetic distance -- ie, "k" and "g" would be much closer than "k" and "m". You'd end up with vowel-y regions, I suppose, controlling for vowel differences, since that's where most of the variation would be, and even then it would only be an issue where a minimal pair isn't minimal in some dialects. That'd be interesting to check out.
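To make the "k is closer to g than to m" idea concrete, here is a toy sketch (my own illustration, not anything from this thread): encode each phoneme as a bundle of articulatory features and count mismatches. The feature assignments below are deliberately simplified assumptions; real feature systems are much richer.

```python
# Toy articulatory features per phoneme: (place, manner, voicing).
# Deliberately simplified -- real distinctive-feature systems have many more dimensions.
FEATURES = {
    "k": ("velar",    "stop",  "voiceless"),
    "g": ("velar",    "stop",  "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "b": ("bilabial", "stop",  "voiced"),
    "n": ("alveolar", "nasal", "voiced"),
}

def phoneme_distance(a: str, b: str) -> int:
    """Count the articulatory features on which two phonemes differ."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))

print(phoneme_distance("k", "g"))  # 1: differ only in voicing
print(phoneme_distance("k", "m"))  # 3: place, manner, and voicing all differ
```

A word-level phonetic distance could then use these per-phoneme costs as substitution weights in a Levenshtein-style alignment, instead of treating every substitution as equally bad.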
That would indeed be interesting to check out. For this particular purpose, though, going that far is beyond overkill. When the application is waiting for input (after asking a question), in some cases there is an array of suspected answers; in other cases, even a fixed selection of allowed answers. If the transcription does not literally match any of them, the application checks whether it is phonetically similar to anything expected. For example: the question is 'what is your favorite color', the user answers 'yellow', the transcription API returns 'hello', and the application assumes the user said 'yellow'. So in this case extreme accuracy or sophisticated processing is not necessary, and I am satisfied with the results I have.
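The flow described above (exact match first, fuzzy fallback second) can be sketched like this. This is my own minimal illustration, not the actual application code, and plain edit distance stands in for whatever phonetic comparison is really used; the names and the 0.4 threshold are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def closest_expected(transcript, expected, max_ratio=0.4):
    """Return the expected answer the transcript most resembles,
    or None if nothing is close enough."""
    t = transcript.lower()
    for e in expected:                 # literal match first
        if t == e.lower():
            return e
    best, best_d = None, float("inf")  # otherwise, fuzzy fallback
    for e in expected:
        d = levenshtein(t, e.lower())
        if d < best_d:
            best, best_d = e, d
    if best_d / max(len(t), len(best)) <= max_ratio:
        return best
    return None

print(closest_expected("hello", ["yellow", "red", "blue"]))  # yellow
```

With "hello", the edit distance to "yellow" is 2 (substitute one letter, drop one), well under the threshold, so the thread's example resolves as described; a transcript unlike any expected answer returns None instead of a forced guess.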
Sorry for the late response, all this sent me on a Google search spree.
Your answers here on this post are interesting, as well as the only useful ones. I am curious: do you just have a conceptual understanding of these things, or can you also 'live up to' / (re)produce what we're talking about in actual code? (Not that I'm asking you to.)
Anywho, do you want to be my friend?