March 2005

Voice recognition musings

By Janine M Lodato

Voice recognition shorthand and the birth of Weblish

It must be the former high school English teacher in me that hears the omission of the subjunctive in movies, TV ads and TV programs.

No, it is not a tense like the present, past or future tenses. No, it is not a part of speech like an adjective, nor is it a verb form like the infinitive. The subjunctive is a mood, just as the indicative is a mood that states a fact, e.g. "The wind is blowing today", and the interrogative is a mood that asks a question, e.g. "Did you wind the clock today?"

The subjunctive mood can also be used to suggest a situation contrary to fact, e.g. "If I were rich I would buy a new car" or "I wish I had enough money to buy a new car." Two flags in English that indicate the subjunctive mood are the words "if" and "wish".

English is very simple compared to other languages such as French, Spanish, German, Italian, etc. All have several pages in their grammar textbooks devoted to the conjugations needed for the subjunctive mood. No wonder English is becoming the common language, the lingua franca, of the world! The basic reason, of course, is the simplicity of its grammar. Even most scientific or technical papers are in English while, a few centuries ago, they would have been written in Latin.

English, though, does have its idiosyncrasies which are difficult for foreigners, as well as for voice recognition software, to cope with, including spelling and associated pronunciation. Take for example the word spelled "wind." Depending on the context of the sentence we can pronounce it two different ways, as in "Did you wind the clock?" versus "The wind is blowing." The same goes for "wound": "Was the clock wound?" versus "Did the blowing wind wound your ears?"

There are a myriad of other words with the same dual pronunciations, e.g. wound, refuse, compact, and contract. There is no end to them! And then there are the irregular verbs, 300 of them, which have very complex tense structures. My husband and caregiver (I am a disabled woman in a wheelchair), who came to this country after the Hungarian student revolution was crushed by the Soviets, became one of the original geeks and never really learned the irregular verb tenses. Instead he used the auxiliaries, which he learned early and used in phrases such as "I did in fact see it" instead of "I saw it." It's no surprise that voice recognition systems prefer such expressions, which are easier for them to understand.

I have taught "English for Foreigners" for 20 years (my wheelchair existence did not affect that; in fact, it may have even helped) to students on all levels, from beginners to highly advanced technical experts. My husband is my worst student, but he does act as my voice recognition robot sometimes instead of my computer, though he does talk back and the computer does not.

I must say, I am impressed that so many foreigners speak, understand, read and write English even better than many Americans. I am impressed, indeed, although I recognize they have no choice if they want to succeed at some higher level of occupation. The Internet, the now famous Web, has cemented this fact. Almost all websites (addressed by www... and ending in either dotcom (in the form of ".com") or dotgov or dotedu or dotorg, etc.) are published on the Web in English. Even many of the websites whose addresses (called URLs, uniform resource locators) end in dotde (Germany) or dotjp (Japan) or dotit (Italy), or in the symbols for the other 60-plus similar country names of the world, are published in English.

Google, the most favored search engine on the Web, indicates that, at this time, there are 1,900K dotcom, 105K dotgov, 370K dotorg, 158K dotedu and 385K dotnet webpages on the Internet (K, of course, stands for kilo, which is one thousand). But, most interestingly, the German-based dotde accounts for 2,050K webpages, and more than half of those are in English.

The Web has had a profound effect on English. In place of proper Webster-style English, a new language has evolved on the Internet: "Weblish" (if I may be so bold in naming it). This new language is now used all over the world, and it is constantly coming up with new words, most of which have an "e" or an "i" up front. Email is, of course, a well known example, but once email was created there had to be a new name for the regular mail, so why not "snail mail" as a great descriptive term?

Examples are all over the Web, and more are cropping up every minute. Just look at the URLs of the websites themselves and you will see a whole new set of descriptive names: Froogle, the name for the shopping website of Google, is one such imaginative Weblish variation of English.

One of my favorite new words is Lindows, the Linux-based variation of Windows. We all hope it will succeed.

But there is another interesting development on the Web. As part of Weblish, there are the new hieroglyphs (also known as ASCII art) which many people now use in their emails. For example, I am a disabled woman in a wheelchair, so I like to sign my email with a combination of letters and characters that looks like a wheelchair symbol.


There are many other examples of this, like :-) for happy and so on.

My aging geek husband immediately suggested that we should call this new symbology eglyphy or webglyphy or ... and he went on indefinitely with new words until I screamed: "Please spare me, enough of this!"

But he is right: somebody will come up with the best name, and it will be adopted by many and become part of the new language.

Then there is SMS, the new cellular-phone-based short message service. It uses some of the best shorthand out of necessity, since its phones have only 12 keys. A good example: "C L M L8R" for "call me later." And so on, you get the drift.
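To see why a 12-key phone pushes people toward shorthand, here is a minimal sketch that counts key presses under "multi-tap" entry, where each letter costs as many presses as its position on its key. The keypad layout is the standard telephone one; the function name multitap_presses is made up for this illustration, and the costs assumed for a space (one press) and for a digit (one press more than the letters on its key) are assumptions, since handsets differ.

```python
# Standard telephone keypad letter groups (keys 2-9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def multitap_presses(message: str) -> int:
    """Count key presses needed to type `message` with multi-tap entry.

    Assumptions (not from the article): a space costs one press, and a
    digit costs one press more than the number of letters on its key.
    """
    total = 0
    for ch in message.lower():
        if ch == " ":
            total += 1  # assumed: a single press of the space key
            continue
        for digit, letters in KEYPAD.items():
            if ch in letters:
                # 'a' is 1 press, 'b' is 2 presses, 'c' is 3, and so on.
                total += letters.index(ch) + 1
                break
            if ch == digit:
                # assumed: cycle past every letter on the key to reach the digit
                total += len(letters) + 1
                break
        else:
            raise ValueError(f"cannot type {ch!r} on this keypad sketch")
    return total


if __name__ == "__main__":
    for text in ("call me later", "C L M L8R"):
        print(text, "->", multitap_presses(text), "presses")
```

Under these assumptions the shorthand saves a handful of presses on even a three-word message, and the saving grows with longer texts.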

But the most important development of simplified, short-word-based English is due to voice recognition. Hands-busy, eyes-busy people, and those with a functional disability, can benefit greatly from voice recognition because they don't have to use a mouse and keyboard to document their findings.

Voice-activated, easily used telephone systems will benefit people in all walks of life. Anyone driving a car will find voice recognition a much more effective way of operating a vehicle and communicating from the vehicle. We are all hands-busy or eyes-busy at one time or another: in the kitchen, in the garden, or giving care to children or adults in need. Personal computers have the capacity to accommodate voice recognition systems, like IBM ViaVoice. This is especially advantageous to a large segment of the population: people with disabilities, those who are chronically ill, and older people, as well as their caregivers.

Using a keyboard is difficult, or even next to impossible, for this fast growing group of people. Caregivers would benefit from being able to use just their voices to document the treatments or care they provide to their patients. Additionally, voice recognition technology would allow them a hands-free environment in which to analyze, treat, and write about particular cases easily and quickly.

Linux voice recognition project

The caregiving services market alone may justify the Linux-based voice recognition project. Providing care to the needy is one of the largest expenses in the Group of Ten nations, and it is the fastest growing sector as well. Just in the USA, the segment of the population which includes older people and people with disabilities and/or chronic illnesses accounts for 100 million people. Add to that the 5 million formal caregivers and 44 million informal caregivers at work in America, and we are looking at half the population!
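The "half the population" figure can be checked with a few lines of arithmetic. This is only a rough sketch: the three group sizes are the ones quoted above, while the total US population of roughly 296 million in 2005 is an assumed figure that does not come from the article, and the sum treats the three groups as if they did not overlap.

```python
# Figures quoted in the article, in millions of people.
older_disabled_or_chronically_ill = 100
formal_caregivers = 5
informal_caregivers = 44

# Assumption (not from the article): US population of roughly 296 million in 2005.
us_population = 296

# Simple sum; assumes nobody is counted in more than one group.
care_population = (
    older_disabled_or_chronically_ill + formal_caregivers + informal_caregivers
)
share = care_population / us_population

print(f"{care_population} million people, about {share:.0%} of the population")
```

On those assumptions the total comes to 149 million, which is indeed about half.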

In the caregiving field, the simplicity, reliability and low cost of Linux for servers, tablets, embedded devices, and desktops are paramount features. Obviously, the market for these new technologies exists. What remains is for some courageous companies with aggressive people to tap into that market. Once those companies get the technology distributed, the needs of many will be met, and a new mass market will open up that isn't currently being served. In fact, the field of opportunity already exists, but it needs to be expanded to serve people with physical and functional disabilities.

Yes, voice recognition offers great promise for the future. However, it isn't perfect and it still needs to be improved. One improvement would use lip reading to bolster its accuracy. Another would be multi-tonal voice input. Still another would be improvements in the design of directional microphones. Every generation of voice recognition software will be improved as the hardware for Linux gets faster and more powerful.

IBM is, in fact, now working on a lip reading system for installation in an automobile so that the on-board computer can very precisely understand the spoken commands of the driver. So, IBM, we need to get this technology into our desktops, ebook readers, tablet computers and PDAs as well. Please? Why not license your lip reading technology to a consortium of open source developers, perhaps sponsored by a non-profit such as DRAIL, an organization concerned with the needy? Or, even more effective would be a license for a group of such organizations, such as Robert Wood, AARP, Elderweb, etc., who would put lip reading and enhanced, precise voice recognition on an ebook reader (such as the Korean-manufactured one), or an iPod (Apple's great music machine), or a tablet computer, or any other PC that this segment of the population would feel comfortable with.

Virtual PCs running on a community-based server, to which the end-users could connect with simple telephones (POTS) or video telephones, would be of great use. My geek caregiver husband suggests that we should call this project Wordows. "Oh no," I yell at my favorite geek, "That is not so good: it sounds weirdo!" You need to be careful when you create a new word in Weblish. So let us call it, if you will permit me, "Slimdows".

With all these changes, the English teacher in me comes to terms with a changing linguistic environment. Since life itself is constant change, English must also change.

Janine has been a user of voice recognition software for several years and is an advocate for open source solutions that will benefit people with disabilities, such as the Linux-based voice recognition project. See also: Voice Recognition - A user's perspective: A New Way of Thinking High-Tech Style, October 2002

Copyright 2004, Janine M Lodato.

