Article of the Month

October 2002

Voice recognition technology from a user's perspective

By Giannina Lodato

A new way of thinking 'high tech style'

Over the years I have worked as a Correspondence Secretary for a California State Senator, and taught English as a second language. At present I am in the process of writing a book. I enjoy being busy. However, after fighting off a 25-year assault of multiple sclerosis (MS), my hands can no longer type an entire document. In order to continue writing, I now rely on voice recognition technology to do my typing for me. Baby boomer that I am, I must learn new tricks of the trade, with new tools of the trade, taking into account the effects of my multiple sclerosis. Once connected to my new technology, I feel connected to the world via e-mail and word processing.

Adaptability is the key

Voice recognition technology is still in its infancy, and has provided me with some amusing and frustrating moments. Finding a voice recognizable to readers is tricky. Finding a voice recognizable to a computer is even trickier.


I've always found typographical errors (typos) amusing, but my new software's typos take the cake. However, if I'm tired or the computer makes the same mistake several times, I completely lose my sense of humor and become frustrated, determined not to let it happen again.

Among the most comical computer interpretations are:

The microphone into which I dictate sits right in front of my mouth, jutting out from a headset with one earphone. The microphone is so sensitive, it even translates a heavy sigh into "a," "of," "the," or "what." A loud sneeze ("achoo") from my husband in a room nearby inspires the computer to type "aha." Words unique to my writing must be trained. Otherwise, if I control my breathing, monitor the whereabouts of my allergy-prone husband and enunciate clearly, the computer usually understands my words perfectly on only the second attempt. Its first interpretations are nonetheless reminiscent of my childhood.

I am reminded of a grace we so often said before dinner, "Gracious Father, please bless this food for its intended uses." I understood the prayer to say "Gracious Father, please bless this food for its tender juices." This is exactly the way my computer hears my voice, as a child would. Lots of patient word-training is needed to make the machine familiar with my way of speaking - my vocabulary and pronunciation.

Because I have been teaching English to foreigners for the last 20 years, I know written English grammar better than most Americans - maybe, just maybe, even better than the American who typed the program for this voice recognition software. That person is an expert in computer programming, not in English writing. For example: when we talk about particular decades, say the '60s or '70s, the apostrophe should be placed in front of the first number, where it stands in for the omitted 19 of 1960s and 1970s. Because we are talking about 10 years, we need a plural form, made by tacking on an s. Hence, the short written form of a particular decade looks like this: '60s, '70s, NOT the way the software has it written, 60's and 70's.

Another programming idiosyncrasy I spend a lot of time adjusting to my liking has to do with spaces placed between quotation marks and the text being quoted. I suppose it is just a matter of taste, but I spend a lot of time eliminating spaces.
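
For readers who would rather automate these two corrections than make them by hand, here is a minimal sketch in Python. It is not part of the voice recognition software; the function names fix_decades and tighten_quotes are made up for illustration, and the sketch assumes straight double quotes that always come in pairs and that every 60's or 70's in the text is meant as a decade rather than a possessive.

```python
import re

def fix_decades(text: str) -> str:
    """Rewrite two-digit decades such as 60's as '60s (hypothetical helper)."""
    # Match a two-digit number ending in 0 followed by 's, but skip
    # four-digit years (1960's) and forms that already start with an apostrophe.
    return re.sub(r"(?<![\d'])(\d0)'s\b", r"'\1s", text)

def tighten_quotes(text: str) -> str:
    """Remove spaces just inside a pair of straight double quotes (hypothetical helper)."""
    # Assumes quotes are balanced; an unpaired quote would be mispaired.
    return re.sub(r'"\s*([^"]*?)\s*"', r'"\1"', text)

if __name__ == "__main__":
    draft = 'I loved the music of the 60\'s and 70\'s, and the word " marzipan " too.'
    print(tighten_quotes(fix_decades(draft)))
```

Running both helpers over a finished dictation is one way to avoid making the same corrections again and again; whether that suits a given document depends on how consistently the software makes these two choices.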

I never know if the word I want to use is already in the vocabulary or not. When I wanted to use the word "marzipan," the computer first gave me "Mars see pan." After I got over the giggles, under my breath because the microphone picks up every little noise, I tried again a couple of times and the computer finally typed the right word on the screen. I was shocked to learn "marzipan" was actually in the vocabulary. You never know until you try.

My new software recognizes my voice best when I speak in half and full sentences. Problem is, it's hard for me to be so organized in my thinking. If the software mistakes one of my words for something else, I need only call up the correction window, which appears off to the side of my document, and pick the correct word from the alternatives the computer thinks it may have heard me say. Once I choose the correct alternative, the software automatically substitutes it for the mistaken word. It is magical!

If my new software does not provide me with the correct alternative, I break down and just spell the word I want using command mode and the rules of voice recognition spelling. I often resort to spelling just because I lose my patience trying to get the machine to recognize what I have said.

My husband tells me I am like a parent spoiling a child when I fail to teach my new software the right spelling of the words I say. For example, whenever I begin a letter or an e-mail with the salutation "Dear" the computer insists on typing "Der." Apparently, I need to teach it the proper spelling by calling up the correction window and using the choices it gives me, or by typing the proper spelling as I know it.

Switching modes

Voice recognition software has two modes of operation: dictate mode and command mode. Dictate mode is the usual method people use when speaking into the microphone. However, it is often necessary to access command mode to make changes in a document, say, to capitalize a word or spell it.

To access command mode, one must first use a cue word. In the case of my software, I have programmed into the machine the word "computer" to act as the cue putting me into command mode.

All I need to say is "computer select right (or left) one word," "computer capitalize this" or "computer move right (or left) one word," and the software goes into command mode, then right back into dictate mode. If I say "computer begin spell," I have accessed command mode for spelling and am ready to spell. When finished, I say "computer return" and the software automatically switches back into dictate mode.
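
The switching described above can be pictured as a very small state machine. The Python sketch below is only a toy model of that behavior, not how IBM ViaVoice works internally: the function name route, the CUE constant and the three labels are assumptions made for illustration. It also ignores the problem of a dictated sentence that happens to begin with the cue word.

```python
CUE = "computer"  # the cue word chosen by the user (assumed to be configurable)

def route(utterances):
    """Label each utterance as dictation, a command, or spelling (toy model)."""
    spelling = False  # True between "computer begin spell" and "computer return"
    routed = []
    for utterance in utterances:
        words = utterance.lower().split()
        if words and words[0] == CUE:
            # Anything after the cue word is treated as a command.
            command = " ".join(words[1:])
            if command == "begin spell":
                spelling = True
            elif command == "return":
                spelling = False
            routed.append(("command", command))
        elif spelling:
            # Inside spelling mode, utterances are letters, not words.
            routed.append(("spell", utterance))
        else:
            # Default: ordinary dictation goes into the document.
            routed.append(("dictate", utterance))
    return routed

if __name__ == "__main__":
    session = ["Dear Anna", "computer capitalize this",
               "computer begin spell", "m a r z i p a n",
               "computer return", "thank you"]
    for label, text in route(session):
        print(label, "->", text)
```

In this model a one-shot command such as "computer capitalize this" leaves the mode unchanged, so the next utterance is dictation again, while spelling stays active until "computer return" is heard.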

Speaking naturally

Sitting in a room nearby, my husband is easily frustrated when he hears me dictate a few words, then stop to change into command mode to correct the words the computer thought it heard me say. He doesn't like the fact that I dictate only a few words at a time.

One day, he got up and placed a piece of paper on my computer screen so I couldn't see what the computer was typing. He told me to say a few sentences at a time in a natural way of speaking. I did it and after a paragraph or so, I took down the piece of paper and looked at what the computer had typed. Amazingly, the machine understood my words very well and I didn't have too many corrections to make. The lesson is, speak in a normal manner and perhaps prepare several sentences on a piece of paper so you can read them into the microphone at a normal pace.

When I worked for a California State Senator as his Correspondence Secretary, I was responsible for transcribing his dictation into written form on a computer. I must say, his communication to me through his tape recorder was much more efficient in producing written documents than is my communication to a voice recognition machine. Simply put, human-to-human communication cannot be beat.

Lots of word-training, trial, error and patience are required when working with voice recognition software. Once I master it though, it will be a real benefit to me as I write for class. Until I get to know it intimately, I can produce only simple documents.

Magically, the new technology knows there are multiple spellings for some words - to, too, two - and gives me choices for spelling in the correction window off to the side of the document. I need only pick the correct spelling and the technology inserts it into the document for me. It's wonderful!

Communication options

Having MS no doubt motivates me to find a way of expressing myself in writing other than by typing. As long as I am able to talk, voice recognition technology offers me a mode of communicating never before available to people in my position.

The technology I use is called IBM ViaVoice for Mac and it has increased my productivity 10-fold. Imagine what it could do for business people! Thanks to this wonderful new technology, I can now finish a book I've been writing in my word processing software. My e-mail is greatly improved and I know I've only begun to tap into the wonders of voice recognition.

Half the world's population either has a disability or is helping people with disabilities. Because of this, voice recognition technology is a real boon to people with disabilities, if they are at all computer-savvy. Fortunately for me, my husband is responsible for getting computers started in America, so he is quite computer-savvy. For me, as a person with a disability, voice recognition technology opens up the whole world.

It has even broadened the scope of my marital situation. Because my husband thinks so much like a computer, my understanding of the new technology contributes to the relationship between my husband and me.

The more I get to know my new software, the less I rely on the typing skills of my Hungarian husband. He never fails to believe that whatever I say in writing can always be better said in HIS words. When I exercise ample patience, voice recognition technology in combination with my husband's editorial input produces written documents I can be proud of. Without patience, I'm sunk!

I look forward to finding a voice that doesn't give me any lip

Writing is different when you dictate your thoughts instead of typing them out. You must have everything organized in your mind before you tell somebody else how to put it down on paper. It will be difficult for me to be so organized from the start. It's just a new way of thinking, that's all. What a dream to have someone else type for me. Until now, my husband has been my voice recognition machine. But he talks back!

The future

Given all the changes I must make in a document produced with voice recognition software, I can see the technology is still in its infancy. In spite of this, I find it magical, wonderful and definitely worth the effort needed to learn and adjust to it. For me the technology is a dream and I can only encourage those working on it and hope improvements are made quickly and smoothly. I suppose when computers are upgraded to have higher speeds and more main memory (RAM), voice recognition will improve.

At the moment, in order for me to dictate text through voice recognition and manipulate it into word processing, I need to wear two separate headsets: one for the voice recognition microphone, and a second for a gyroscope used to control the mouse via head movements. This is bulky, and I look forward to the day when I can wear just one headset and do all the work I need to do hands-free. Perhaps a camera on top of my monitor would photograph my lip motions to make voice recognition more accurate. Perhaps that same camera would photograph my eye motions and blink rate to determine my alertness and productivity.

Fascinated with my new voice recognition technology, I am compelled to spend as much time as possible with it, learning as much about it as I can. In spite of my MS, I am able to produce documents I can be proud of. I predict that before long, everyone in the computer industry will opt for voice recognition over keyboarding. It is the wave of the future, well worth the US$80 software cost, time and effort required to learn it.
