Speech recognition

A speech understanding system must answer three questions: which speech sounds the speaker uttered, which words the speaker intended to express with those sounds, and what meaning (that is, semantic meaning) the speaker intended to express with those words. First of all, what is a speech sound? All human languages use a limited repertoire of about 40 or 50 sounds, which we call phones.

Roughly speaking, a phone is the sound that corresponds to a single vowel or consonant, but there are some complications: combinations of letters such as “th” and “ng” produce single phones, and some letters produce different phones in different contexts.

Consider a sequence of three phones, (k), (a) and (t). One can find in the lexicon (that is, the dictionary) that this is the pronunciation of the word “cat”. Two things make this difficult. The first is the existence of homophones, different words that sound the same, like “two” and “too”. The second is segmentation, the problem of deciding where one word ends and the next one begins. Anyone who has tried to learn a foreign language will appreciate this problem: at first, all the words seem to run together. Gradually one learns to pick out words from the sounds. In fact, spectrographic analysis shows that in fluent speech the words really do run together, with no silence between them. We learn to identify word boundaries despite the lack of silence. Some speech understanding systems extract the most likely string of words and pass it directly to an analyzer. Other systems have a control structure that considers multiple possible word interpretations, so that understanding can be achieved even if some individual words are not recognized correctly.
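
To make the lexicon lookup, the homophone problem and the segmentation problem concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the toy lexicon, the simplified phone symbols (for example `("t", "u")` for “two” and “too”) and the function name `segmentations` are all hypothetical, and a real system would work with probabilities over noisy acoustic input rather than exact phone matches.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: maps a phone sequence to every word
# pronounced that way. The phone symbols are simplified assumptions.
LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("k", "a", "t"): ["cat"],
    ("t", "u"): ["two", "too"],  # homophones share one pronunciation
}


def segmentations(phones: Tuple[str, ...]) -> List[List[str]]:
    """Return every way to split an unbroken phone stream into lexicon words."""
    if not phones:
        return [[]]  # one way to segment nothing: the empty word list
    results: List[List[str]] = []
    # Try every possible position for the first word boundary.
    for end in range(1, len(phones) + 1):
        for word in LEXICON.get(phones[:end], []):
            # Segment the remainder recursively and prepend this word.
            for rest in segmentations(phones[end:]):
                results.append([word] + rest)
    return results


if __name__ == "__main__":
    # No silence marker separates the words; boundaries must be inferred.
    stream = ("t", "u", "k", "a", "t")
    for words in segmentations(stream):
        print(" ".join(words))
```

Because the phone stream carries no word boundaries, the function has to try every split point, and because “two” and “too” sound alike, it returns more than one word string. This mirrors the second kind of system described above, which keeps multiple word interpretations and leaves the final choice to later analysis.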
