V-Commerce? A newly coined term (for voice-commerce), it is the application of speech recognition to e-commerce, introduced in Australia by Timemac Solutions. A recently established company (August 1998), Timemac was set up to take advantage of Internet technologies for the development of e-commerce solutions. Based in Melbourne with offices in Hong Kong, Singapore, Indonesia, and America, the company has established DomiNet (a Chinese language Web service that promotes Australian businesses in China), DragonStock (a Web-based online share trading system), and VicRegional (a Bendigo-based portal site).
Timemac has just announced (December 1999) the launch of DragonTalk, which is not to be confused with other products with a Dragon prefix. 'Dragon' was chosen partly because it has become almost synonymous with speech recognition applications, but there is another special reason: occidental year 2000 coincides with the oriental lunar Year of the Dragon. DragonStock is an online, real-time, Web-based share trading system. DragonTalk adds voice-enabled share transactions to that system.
Two problems had to be overcome: transaction security, and the need for a speech recognition system that can recognise many clients without being trained for each one. This is not a system that enables anyone to conduct a voice transaction. Each client has to establish an account, and in that process a biometric voice print is recorded for client recognition. The idea of voice printing goes back some time and was hailed for its potential as evidence of identity in litigation or criminal matters. Acceptance as having the same force as a fingerprint has not been realised, because it has been demonstrated that some impersonators can produce voice prints indistinguishable from those of their targets. The security problem is addressed by requiring a client to provide details that would not be publicly known. There are other check points that, together with the voice print and 'shared secrets', deliver a secure system.
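The reasoning behind combining a voice print with 'shared secrets' can be illustrated with a short Python sketch. It is not Timemac's implementation: the record fields, the similarity score, and the threshold are all hypothetical, and the score is assumed to come from some separate voice-matching component. The point is only that a caller who matches the voice print but does not know the secret is still refused.

```python
import hmac
from dataclasses import dataclass


@dataclass
class ClientRecord:
    # All fields are hypothetical, for illustration only.
    account_id: str
    shared_secret: str           # a detail that would not be publicly known
    voiceprint_threshold: float  # minimum similarity score to accept


def verify_client(record: ClientRecord, voice_score: float, spoken_secret: str) -> bool:
    """Accept the caller only if BOTH checks pass.

    voice_score is assumed to be a similarity value (0.0 to 1.0) produced
    elsewhere by comparing the caller's voice with the stored voice print.
    """
    voice_ok = voice_score >= record.voiceprint_threshold
    # Compare normalised secrets in constant time.
    secret_ok = hmac.compare_digest(
        spoken_secret.strip().lower().encode("utf-8"),
        record.shared_secret.strip().lower().encode("utf-8"),
    )
    return voice_ok and secret_ok
```

For example, with `ClientRecord("A1", "bluegum", 0.8)`, a caller scoring 0.9 on the voice match but giving the wrong secret is rejected, as is a caller who knows the secret but scores only 0.5.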
The system's ability to recognise speech from many sources is the result of technology developed by Nuance Communications, a Silicon Valley company. An important element is containment; by limiting the system's vocabulary it is possible to apply resources to variants of each word in the dictionary. For example, the term 'cardiac arrest' is unlikely to arise in the course of a share trading transaction. However, there are many ways of articulating the word 'yes'. Apart from 'yep', 'yeah', and other variants, there are numerous other words that can be used to have the same meaning as 'yes' (correct, affirmative, etc.). The problem of handling such a range of responses is compounded by what are commonly called national and regional accents.
I was given the opportunity to query a U.S. stock information service that uses the Nuance voice recognition system. It had some trouble with the Australian accent, and asked me to repeat my requests for Microsoft and IBM prices, but got each one on the second attempt and responded with the information. Timemac has developed an Australian dictionary to cope with the local vocabulary and modes of speech, a complex task very heavy on resources.
Clients will not be confined to the voice service, just as one doesn't have to use phone keypad input for paying accounts by credit card or bank transfer; a human operator is always available. In the same way that people have become comfortable with pressing keys at a voice prompt, the convenience and speed of voice trading, particularly for straightforward transactions, should be attractive to users.
Nuance technology is widely deployed throughout North America, mainly in the financial and travel industries, and is rapidly acquiring a large user base for share trading. It is built on artificial intelligence technology and has been extended to include dictionaries for UK, American, and Australian English, German, Japanese, and Latin American Spanish. A Chinese dictionary is being designed.
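The containment idea described above, a small vocabulary in which many spoken variants map to one meaning, can be sketched in a few lines of Python. This is an illustration only and says nothing about how the Nuance engine works internally; the word lists and the `interpret` function are hypothetical, and the input is assumed to be text already produced by a recogniser.

```python
# Hypothetical word lists: a deliberately small, domain-limited vocabulary.
AFFIRMATIVE = {"yes", "yep", "yeah", "correct", "affirmative"}
NEGATIVE = {"no", "nope", "negative"}


def interpret(utterance: str):
    """Map a recognised word to a canonical meaning, or None if out of vocabulary."""
    word = utterance.strip().lower().rstrip(".!?")
    if word in AFFIRMATIVE:
        return "YES"
    if word in NEGATIVE:
        return "NO"
    # Anything outside the contained vocabulary is not interpreted.
    return None
```

Here `interpret("Yep")` and `interpret("Affirmative.")` both yield `"YES"`, while a phrase from outside the domain, such as `"cardiac arrest"`, yields `None`, at which point a real service would presumably ask the caller to repeat the request or hand over to an operator.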
Unlike speech recognition software designed for dictation, in which the program has to be trained for each user, the Nuance system is able to cope with many linguistic variables. For example, the spoken form of some languages is syncopated; parts of words are dropped, either regularly, in particular contexts, or by some speakers. Gloucester, for example, is usually pronounced Gloster. English is a language in which emphasis can convey meaning, and cadence tells the listener if a sentence is a question. Tonal languages work differently; each syllable can have up to seven distinct meanings according to tone.
Developments in the field are rapidly enabling new services within telecommunications networks, and Timemac is taking advantage of that by bringing together technologies from Nuance, Interactive Intelligence, and Call Time. Interactive Intelligence, Inc. is an American company that specialises in automation of business communications, especially for call centres and service providers. Call Time was established in Australia to use Interactive Intelligence technology to provide integrated communications solutions that merge natural language speech recognition with computer telephony.
Reprinted from the February 2000 issue of PC Update, the magazine of Melbourne PC User Group, Australia