I am working on a natural language processing engine. It uses the Google Speech API to transcribe short recorded spoken audio and then the interpreter tries to make sense of it.
I now want to calculate the phonetic similarity between two strings, preferably in JavaScript. I don't want to compare two audio files, because the words or phrases come from different speakers.
For the sake of example: 'beef' and 'leaf' sound 87% the same, while 'deaf' and 'cave' are more like 29% similar (made-up percentages).
A value between 0 and 1 is obviously fine too. Same difference.
Any ideas? Is this impossible?
Top comments (8)
When you say compare phonetic similarities, not the audio, because it's sourced from different speakers... are you then making the assumption that the speakers will all be using the same phonology for a given word? Because that's a different issue.
Interesting response. I don't know if I am making that assumption, should I?
Within reasonable bounds, yes -- i.e., depending on the dialect and accent, "Mary", "merry", and "marry" may be pronounced differently, or "cot" and "caught" -- or they might be the same. Something like "beg" vs "bag" might sound the same, or not, but certainly won't be pronounced like "dog", which is too different. Regardless, phonetic similarity will still play in, and if you wanted to get hella nerdy about it, you could use the actual linguistic generalizations of the phonemes for calculating phonetic distance -- i.e., "k" and "g" would be much closer than "k" and "m". You'd end up with vowel-y regions, I suppose, controlling for vowel differences, as that's where most of the variation would be, and even then I suppose it would only be an issue where a minimal pair isn't minimal in some dialects. That'd be interesting to check out.
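To make the "k" vs "g" vs "m" point concrete, here's a toy sketch of feature-based phoneme distance. The feature sets below are heavily simplified (real phonological feature systems are much richer); distance is just the number of articulatory features that differ:

```javascript
// Toy articulatory-feature table for three consonants.
const features = {
  k: { place: 'velar',    manner: 'stop',  voiced: false },
  g: { place: 'velar',    manner: 'stop',  voiced: true  },
  m: { place: 'bilabial', manner: 'nasal', voiced: true  },
};

// Count how many features differ between two phonemes.
function phonemeDistance(a, b) {
  return Object.keys(features[a])
    .filter((f) => features[a][f] !== features[b][f])
    .length;
}

phonemeDistance('k', 'g'); // → 1 (differ in voicing only)
phonemeDistance('k', 'm'); // → 3 (place, manner, and voicing all differ)
```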
That would indeed be interesting to check out. For this particular purpose, though, going that far is beyond overkill. When the application is waiting for input (i.e., has asked a question), in some cases there is an array of suspected answers, and in other cases even a fixed selection of allowed answers. If the transcription is not accurate / does not literally match any of them, the application checks whether the transcription is phonetically similar to anything expected. For example: the question is 'what is your favorite color', the user answers 'yellow', the transcription API returns 'hello', and the application assumes the user said 'yellow'. So in this case extreme accuracy or sophisticated processing is not necessary, and I am satisfied with the results I have.
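That 'hello' → 'yellow' flow might be sketched like this. The similarity function here is just normalized Levenshtein distance on the raw strings, as a stand-in; in practice you'd encode both sides phonetically first (e.g. with Soundex, as suggested elsewhere in this thread). The `bestMatch` name and the 0.5 threshold are made up for the example:

```javascript
// Classic dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
    }
  }
  return d[a.length][b.length];
}

// Edit distance normalized to a 0..1 similarity score.
function similarity(a, b) {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

// Return the expected answer most similar to the transcription,
// or null if nothing clears the threshold.
function bestMatch(transcription, expected, threshold = 0.5) {
  let best = null;
  let bestScore = 0;
  for (const candidate of expected) {
    const score = similarity(transcription, candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? best : null;
}

bestMatch('hello', ['yellow', 'blue', 'red']); // → 'yellow'
```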
Sorry for the late response, all this sent me on a Google search spree.
Your answer(s) here on this post are interesting, as well as the only useful ones. I am curious: do you just have a conceptual understanding of these things, or can you also 'live up to' / (re)produce what we're talking about in actual code? (Not that I'm asking you to.)
Anywho, do you want to be my friend?
There's a whole family of phonetic encoding algorithms beginning with Soundex. If you look for that (or successors like Metaphone) on npmjs.org you'll find implementations. To calculate the similarity you'll want to figure out the Levenshtein distance between two encoded words.
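The npm packages handle more edge cases, but the classic American Soundex rules are simple enough to sketch by hand; this minimal version keeps the first letter, maps the rest to digit classes, skips vowels (which reset the previous code) and h/w (which don't), collapses repeats, and pads to four characters:

```javascript
// Minimal American Soundex encoder (a sketch, not library-grade).
function soundex(word) {
  const codes = { b: '1', f: '1', p: '1', v: '1',
                  c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
                  d: '3', t: '3', l: '4', m: '5', n: '5', r: '6' };
  const s = word.toLowerCase().replace(/[^a-z]/g, '');
  if (!s) return '';
  let result = s[0].toUpperCase();
  let prev = codes[s[0]] || '';
  for (let i = 1; i < s.length && result.length < 4; i++) {
    const c = s[i];
    const code = codes[c] || '';
    if (code && code !== prev) result += code;
    // 'h' and 'w' are skipped without resetting prev; vowels reset it.
    if (c !== 'h' && c !== 'w') prev = code;
  }
  return result.padEnd(4, '0');
}

soundex('beef');   // → 'B100'
soundex('leaf');   // → 'L100'
soundex('hello');  // → 'H400'
soundex('yellow'); // → 'Y400'
```

Note how 'beef'/'leaf' and 'hello'/'yellow' end up one character apart, which is what makes an edit-distance comparison over the codes useful.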
Hmm. I don't think I need the Levenshtein distance. Yes, I am comparing two strings but based on their phonetics and not the string similarity.
That's what Soundex is for. It phoneticizes the words so that if they are spelled similarly in Soundex, then they also sound similar in English. Then the Levenshtein distance algorithm becomes useful for exactly what you're trying to do.
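Putting that together, a minimal sketch of the second half: the textbook dynamic-programming Levenshtein distance, normalized to a 0-to-1 score over two already-encoded strings. Feeding it Soundex codes like B100 and L100 (for 'beef' and 'leaf') gives 0.75:

```javascript
// Classic dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
    }
  }
  return d[a.length][b.length];
}

// Normalize to a 0..1 similarity score over two phonetic codes:
// 1 means identical codes, 0 means nothing in common.
function similarity(codeA, codeB) {
  const maxLen = Math.max(codeA.length, codeB.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(codeA, codeB) / maxLen;
}

similarity('B100', 'L100'); // 'beef' vs 'leaf' → 0.75
```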
You would need a library that knows the International Phonetic Alphabet spelling of the words you are inputting, then do a set comparison of the IPA letters.
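Assuming you already have the IPA transcriptions from such a library (the strings below are hand-written examples), the set comparison could be a Jaccard index over the symbols:

```javascript
// Jaccard similarity over the sets of phonetic symbols in two
// transcriptions: |intersection| / |union|, ranging from 0 to 1.
function jaccard(a, b) {
  const setA = new Set(Array.from(a));
  const setB = new Set(Array.from(b));
  const shared = [...setA].filter((symbol) => setB.has(symbol)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 1 : shared / union;
}

jaccard('biːf', 'liːf'); // beef vs leaf → 0.6
```

One caveat: a plain set comparison ignores symbol order and repetition, so 'pat' and 'tap' would score as identical; running Levenshtein distance over the IPA strings instead would preserve ordering.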