Teaching machines to obey spoken commands

www.mangaloretoday.com

April 3 2012 (The New York Times): Vlad Sejnoha is talking to the TV again. OK, maybe you’ve done that, too. But here’s the weird thing: His TV is listening. “Dragon TV,” Sejnoha says to the screen, “find movies with Meryl Streep.” Up pops a list of films like “Out of Africa” and “It’s Complicated.”

Sejnoha is sitting in what looks like a living room but is, in fact, a sort of laboratory inside Nuance Communications, the leading force in voice technology and the speech-recognition engine behind Siri, the virtual personal assistant on the Apple iPhone 4S.


Here, Sejnoha, the company’s chief technology officer, and other executives are plotting a voice-enabled future where human speech brings responses from not only smartphones and televisions, cars and computers, but also coffee makers, refrigerators, thermostats, alarm systems and other smart devices and appliances.

It is a wildly disruptive idea. But such systems are already beginning to change the way we interact with the world and, for better and worse, how we think about technology. Until now, after all, we’ve talked only to one another. What if we begin talking to all sorts of machines, too – and, like Siri, those machines respond as if they were human?

The race is now on to make the voice the sought-after new interface between us and our technology. The results could rival innovations like the computer mouse and the graphic icon and, some experts say, eventually pose challenges for giants like Google by bypassing their traditional search engines.

No player is bigger in voice technology than Nuance, of Burlington, Massachusetts, an industry pioneer that has acquired more than 40 companies in the field and today employs 7,300 people. It is one of the companies that helped make a big technological leap from programs that take dictation to systems that actually extract meaning from words and respond to them. Now it wants to push far beyond that.

“They are the equivalent of Microsoft, Google or Amazon in a very niche technological space,” says Andrew Rosenberg, an assistant professor of computer science at Queens College.

Like many new technologies, sophisticated voice systems have potential drawbacks. Some experts worry about privacy invasions, others about our ever-deepening attachment to devices like smartphones.

Humans are wired for speech and tend to respond to talking devices as if they were kindred spirits, says Sherry Turkle, a professor of the social studies of science and technology at the Massachusetts Institute of Technology.

“I’m not saying voice recognition is bad,” Turkle says. “I’m saying it’s part of a package of attachments to objects where we should tread carefully because we are pushing a lot of Darwinian buttons in our psychology.”

Only a decade ago, voice-enabled virtual assistants seemed more science fiction than business fact. But in 2000, Paul Ricci, a former executive at Xerox, concluded that voice software could one day disrupt the marketplace the way the mouse and the icon had in the 1980s. “We had to decide early on where there were markets where we could successfully deploy the technology,” says Ricci, Nuance’s chief executive.

But not everyone is as enamoured with voice technology. Some privacy advocates worry that it adds an audio track to the digital trail that people leave behind when they use the Web or apps, potentially exposing them to more data mining.

Voice recognition software works by sending speech to processors that break the recorded audio down into its component sounds and use algorithms to identify the most likely words formed by those sounds. The system typically records and stores speech so it can teach itself to become more accurate over time. Nuance, for example, believes that, aside from the federal government, it has amassed the largest archive of recorded speech in the United States.
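The "most likely words" step can be illustrated with a toy sketch. It assumes a simplified noisy-channel model: each candidate word gets an acoustic score (how well it matches the sounds) and a language score (how plausible it is in context), and the recognizer picks the candidate with the highest combined probability. The candidate words, the probabilities and the function name `most_likely_word` are all hypothetical; this is not a description of Nuance's actual engine.

```python
import math

# Hypothetical scores for one stretch of audio. A real recognizer would get these
# from trained acoustic and language models; these numbers are invented.
ACOUSTIC = {   # P(sounds | word): how well each candidate matches the audio
    "streep": 0.60,
    "street": 0.55,
    "strip": 0.20,
}
LANGUAGE = {   # P(word | "find movies with Meryl ..."): plausibility in context
    "streep": 0.70,
    "street": 0.05,
    "strip": 0.02,
}

def most_likely_word(acoustic, language):
    """Return the candidate maximizing log P(sounds|word) + log P(word)."""
    # Adding logs instead of multiplying probabilities avoids numeric underflow
    # when many small probabilities are combined over a whole sentence.
    return max(acoustic, key=lambda w: math.log(acoustic[w]) + math.log(language[w]))

print(most_likely_word(ACOUSTIC, LANGUAGE))  # streep
```

"Street" sounds almost as close as "streep" here; in this sketch it is the context supplied by the language scores that tips the balance.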

Nuance says it is impossible to identify consumers from the recordings, because the company’s system recognises people’s voices only by unique codes on their devices, rather than by their names. The company’s privacy policy says it uses the voice data of consumers only to improve its own internal systems.

“Dragon Go,” Sejnoha says into his iPhone, “I want to make reservations for three tomorrow night at Craigie on Main.” Dragon Go, Nuance’s answer to Siri, is a virtual assistant app that has been downloaded several million times since its introduction last summer.

Unlike Siri, however, Dragon Go doesn’t talk back. Sejnoha was asking for a reservation at a restaurant in Cambridge, Mass., and the app went directly to OpenTable and displayed his reservation options.

Dragon Go, Nuance’s first direct-to-consumer app, is part of a push to build the brand’s visibility and demonstrate Nuance’s technological advances to business customers. Its real goal is even bigger: to disrupt the role of search engines as gatekeepers to the Web.

For the most common queries, Dragon Go usually bypasses search engines by taking users directly to Web sites of companies like Amazon, Expedia and OpenTable, which are Nuance partners on the app. If people don’t find what they’re looking for there, Dragon Go offers traditional Web search.
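The routing behavior described above (common requests go straight to a partner site, everything else falls back to ordinary Web search) can be sketched as a simple keyword lookup. The trigger words, the function name `route` and the fallback search URL are hypothetical; the article does not say how Dragon Go matches requests to partners, and its real logic is surely more sophisticated than this.

```python
from urllib.parse import quote_plus

# Hypothetical trigger words mapped to partner sites named in the article.
PARTNERS = {
    "reservation": "https://www.opentable.com",
    "reservations": "https://www.opentable.com",
    "flight": "https://www.expedia.com",
    "buy": "https://www.amazon.com",
}

# Placeholder for a traditional search engine; not a real service.
FALLBACK_SEARCH = "https://search.example.com/?q="

def route(query: str) -> str:
    """Return a partner site for a recognized request, else a Web search URL."""
    for word in query.lower().split():
        # Strip trailing punctuation so "flight!" still matches "flight".
        site = PARTNERS.get(word.strip(".,!?"))
        if site:
            return site
    # No partner matched: hand the whole query to ordinary Web search.
    return FALLBACK_SEARCH + quote_plus(query)

print(route("I want to make reservations for three tomorrow night at Craigie on Main"))
print(route("who directed Out of Africa"))
```

The first call returns the OpenTable address; the second finds no trigger word and produces a search URL with the query encoded, spaces becoming `+`.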

The benefit for consumers, Nuance executives say, is faster answers in fewer steps. In many cases, Nuance collects a small fee from partner sites when people make restaurant reservations or complete purchases.

The app could be construed as a challenge to the likes of Google and Microsoft, which have their own voice products – such as Google Voice Actions and Microsoft Tellme – as well as search engines.

“If you are Google,” says Richard Davis, an analyst at Canaccord Genuity, “you are saying, ‘Holy smokes, we are about to get cut out of the equation.’”

Christopher Katsaros, a Google spokesman, declined to comment. The company has recently updated Google Voice Actions, its voice-command system for Android phones, with a feature that continuously converts people’s speech to text, making it faster and smoother to dictate and send text messages, search Google aloud, or ask for directions.

Lezli Goheen, a spokeswoman for Microsoft, said that the company had addressed consumers’ expectations for easier access to information through several means. In addition to Tellme, a program included in all new Windows products that lets people dictate text messages and commands to applications like calendars, she said, the company has introduced Bing Voice Search, a program that lets people speak their Bing searches.

Members of US Airways’ frequent-flier program who have registered their mobile phone numbers are greeted by name by “Wally,” an interactive voice system that Nuance created for the airline.

One day last month, Wally was talking to Kerry Hester, a senior vice president at US Airways, who had called to check on her own flight. “Hello, Kerry, I’ve matched your mobile number to your Dividend mileage account,” Wally said. Her flight from Phoenix to Los Angeles, Wally reported, unprompted, was “still scheduled to depart on time at 11:20 from Gate A23.”

US Airways introduced Wally last summer, as part of a relocation of its offshore customer service call-in operations back to the United States. Nuance designed the system to anticipate callers’ requests. Wally, for example, can automatically tell frequent-flier members their seat assignments or report whether they have received upgrades. It also converts people’s speech to text, so that, should customers ask to speak to a live operator, they don’t have to repeat their original requests.

Last year, Nuance agreed to buy Vlingo, a fierce rival in the voice technology market, for an undisclosed sum. “From our standpoint, the ability to compete with Google, that owns half the smartphone market, and Microsoft, that bundles voice with their products, that’s the real business logic behind merging with Nuance,” says Dave Grannan, the chief executive of Vlingo, which is based in Cambridge, Mass.

Nuance and Vlingo share a vision of a world populated by cloud-based, voice-enabled virtual assistants that move seamlessly from one device to another.

One afternoon earlier this year, a team of Vlingo executives demonstrated their own TV voice-command system to a New York Times reporter. The executives also showed a short animated video in which a fictional couple merrily conversed with their smartphones, tablet computer, TV and car – and the devices replied in kind, alerting the male character that his car needed gas and the woman that her flight that day had been canceled because of bad weather.

“More proactively alerting you with voice, telling you something about your car or an accident ahead, a personal assistant thinking about your needs and keeping you connected to other people is where we think this technology is really going,” Grannan says.

Courtesy: Deccan Herald
