KERA Arts Story Search

SXSW: For Computers, Our Evolving Language Can Get ‘Lost In Translation’

by Alan Melson 16 Mar 2015 10:05 AM
Lau and Ehsani presented examples of news headlines that computers struggled to interpret.

If you ask a large group of people if they’ve used natural language processing before, you’re likely to get a collective question mark in response.

But ask the same crowd if they’ve asked Siri a question on their iPhones, as Howard Lau and Farzad Ehsani did at this past weekend’s SXSW session “Lost In Translation: Slang, Search and Social,” and you’ll probably see a decent show of hands.

Apple’s Siri technology is just one example of how natural language processing (NLP), the field of study and development concerned with the interaction between computers and human language, has been put to work in real-world applications. Lau and Ehsani, who lead companies working on different aspects of the technology, explained that the complexity of meaning and tone in the language we use makes NLP both an exciting and a maddening area of software design.

“The more a word is used, the more ambiguous it becomes,” said Lau, CEO of Attensity, which provides data analytics – including social media monitoring – for large companies.  “You need other words around it to disambiguate it.”
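Lau's point about needing surrounding words can be illustrated with a toy example. The sketch below is only an assumption about how such disambiguation might look in its simplest form, not a description of Attensity's software: the `SENSES` table, its cue words and the `disambiguate` function are all hypothetical, and the method is a bare word-overlap count that ignores punctuation and grammar.

```python
# Hypothetical sense inventory: each sense of an ambiguous word is paired
# with a handful of cue words that tend to appear near it.
SENSES = {
    "sick": {
        "ill": {"flu", "doctor", "fever", "home", "feeling"},
        "excellent": {"trick", "beat", "concert", "dude", "totally"},
    }
}


def disambiguate(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    # Naive tokenization: lowercase and split on whitespace only.
    context = set(sentence.lower().split()) - {word}
    scores = {
        sense: len(cues & context)
        for sense, cues in SENSES[word].items()
    }
    # On a tie, the sense listed first in SENSES wins.
    return max(scores, key=scores.get)


print(disambiguate("sick", "stayed home with a fever feeling sick"))
print(disambiguate("sick", "that concert was totally sick"))
```

The first sentence shares three cue words with the "ill" sense and the second shares two with the slang sense, so the same word is resolved differently depending on its neighbors.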

Computers have gotten much better at speech recognition over the past decade, and that technology now appears in a wide range of products. However, Lau said uniquely human characteristics of language present a tremendous challenge for programmers. Emotion is hard to convey, for example, and slang is ever-evolving; words take on very different meanings based on geography and a number of other factors.

“Language is fundamentally tribal, meaning that we exist in special interest groups – training groups, professional groups, social groups, generational groups,” he said.  “ … Language marks who is in our tribe, and who isn’t.”

Ehsani, whose company Fluential builds a health and wellness smartphone app with speech recognition functionality, said there have typically been two different methods for NLP: following a set of clearly defined rules for how language works, or throwing a ton of data at a machine so it “learns” how language works. He said the current approach favors using both.

“There are 40 different rules on how to pluralize in English,” he said.  “That’s a lot for a human to learn, but easier for a machine if you give it enough examples.”
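The combination of rules and examples that Ehsani describes can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Fluential's method: the three patterns in `RULES` are a hypothetical subset of English pluralization rules, and "learning" here just means remembering the example pairs that the rules get wrong.

```python
import re

# Hypothetical hand-written rules, tried in order: (pattern, replacement).
RULES = [
    (r"(s|x|z|ch|sh)$", r"\1es"),   # box -> boxes, church -> churches
    (r"([^aeiou])y$", r"\1ies"),    # city -> cities
    (r"$", "s"),                    # default: cat -> cats
]


def pluralize_by_rule(word):
    """Apply the first rule whose pattern matches the word."""
    for pattern, replacement in RULES:
        if re.search(pattern, word):
            return re.sub(pattern, replacement, word, count=1)
    return word


def learn_exceptions(examples):
    """From (singular, plural) pairs, keep only those the rules get wrong."""
    return {s: p for s, p in examples if pluralize_by_rule(s) != p}


def pluralize(word, exceptions):
    """Prefer a learned exception; otherwise fall back to the rules."""
    return exceptions.get(word, pluralize_by_rule(word))


examples = [("cat", "cats"), ("child", "children"),
            ("mouse", "mice"), ("box", "boxes")]
exceptions = learn_exceptions(examples)
print(pluralize("city", exceptions), pluralize("child", exceptions))
```

The rules cover the regular cases cheaply, while the examples supply the irregular forms that no short rule list would capture.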

The pair cited IBM’s Watson machine, which famously won on “Jeopardy!”, as a great example of how NLP was paired with artificial intelligence to achieve an astonishing ability to interpret human language. Comparable applications are now found in Siri and other consumer speech-recognition interfaces.

However, the amount of new linguistic data to parse through is staggering.  Nearly 90 percent of the world’s data was created in the last two years, Lau said, and the majority was unsorted text.  His firm continues to try new methods for interpreting slang while tracking how its clients are talked about online.

“Social media is only one component – there are also e-mail logs, customer service notes, press releases, financial statements and more,” he said.  “All these need to come together in a common model for analysis and interpretation.”

Lau and Ehsani said the future will bring even more ways in which NLP is applied in daily use.  New languages are the next frontier; although the majority of work to date has been done in English, they said the research is going global, from companies building speech recognition platforms in China to work by the U.S. Department of Defense on Arabic recognition technologies.

Still, the constant evolution of how we communicate will always present a challenge for companies as they work to develop machines that can do a better job of understanding it.

“There are two parts to NLP: language and technology,” Lau said. “The language part doesn’t get enough attention.”