When you say compare phonetic similarities, not the audio, because it's sourced from different speakers... are you then making the assumption that the speakers will all be using the same phonology for a given word? Because that's a different issue.
Interesting response. I don't know if I am making that assumption, should I?
Within reasonable bounds, yes -- ie, depending on the dialect and accent, "Mary", "merry", and "marry" may be pronounced differently, or "cot" and "caught" -- or they might be the same. Something like "beg" vs "bag" might sound the same, or not, but it certainly won't be pronounced like "dog", which is too different. Regardless, phonetic similarity will still come into play, and if you wanted to get hella nerdy about it, you could use the actual linguistic generalizations of the phonemes for calculating phonetic distance -- ie, "k" and "g" would be much closer than "k" and "m". You'd end up with vowel-y regions, I suppose, controlling for vowel differences, since that's where most of the variation would be, and even then it would only be an issue where a minimal pair isn't minimal in some dialects. That'd be interesting to check out.
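To make the "k is closer to g than to m" idea concrete, here is a toy sketch (my own illustration, not anything from this thread): encode each phoneme as a bundle of articulatory features and count mismatches. The feature assignments below are deliberately simplified assumptions; real feature systems are much richer.

```python
# Toy articulatory features per phoneme: (place, manner, voicing).
# Deliberately simplified -- real distinctive-feature systems have many more dimensions.
FEATURES = {
    "k": ("velar",    "stop",  "voiceless"),
    "g": ("velar",    "stop",  "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "b": ("bilabial", "stop",  "voiced"),
    "n": ("alveolar", "nasal", "voiced"),
}

def phoneme_distance(a: str, b: str) -> int:
    """Count the articulatory features on which two phonemes differ."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))

print(phoneme_distance("k", "g"))  # 1: differ only in voicing
print(phoneme_distance("k", "m"))  # 3: place, manner, and voicing all differ
```

A word-level phonetic distance could then use these per-phoneme costs as substitution weights in a Levenshtein-style alignment, instead of treating every substitution as equally bad.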
That would indeed be interesting to check out. For this particular purpose, though, going that far is beyond overkill. When the application is waiting for input (after asking a question), in some cases there is an array of suspected answers; in other cases, even a fixed selection of allowed answers. If the transcription does not literally match any of them, the application checks whether it is phonetically similar to anything expected. For example: the question is 'what is your favorite color', the user answers 'yellow', the transcription API returns 'hello', and the application assumes the user said 'yellow'. So in this case extreme accuracy or sophisticated processing is not necessary, and I am satisfied with the results I have.
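The flow described above (exact match first, fuzzy fallback second) can be sketched like this. This is my own minimal illustration, not the actual application code, and plain edit distance stands in for whatever phonetic comparison is really used; the names and the 0.4 threshold are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def closest_expected(transcript, expected, max_ratio=0.4):
    """Return the expected answer the transcript most resembles,
    or None if nothing is close enough."""
    t = transcript.lower()
    for e in expected:                 # literal match first
        if t == e.lower():
            return e
    best, best_d = None, float("inf")  # otherwise, fuzzy fallback
    for e in expected:
        d = levenshtein(t, e.lower())
        if d < best_d:
            best, best_d = e, d
    if best_d / max(len(t), len(best)) <= max_ratio:
        return best
    return None

print(closest_expected("hello", ["yellow", "red", "blue"]))  # yellow
```

With "hello", the edit distance to "yellow" is 2 (substitute one letter, drop one), well under the threshold, so the thread's example resolves as described; a transcript unlike any expected answer returns None instead of a forced guess.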
Sorry for the late response, all this sent me on a Google search spree.
Your answers here on this post are interesting, as well as the only useful ones. I am curious: do you just have a conceptual understanding of these things, or can you also 'live up to' / (re)produce what we're talking about in actual code? (Not that I'm asking you to.)
Anywho, do you want to be my friend?