Man vs. Machine: Word-Sense Disambiguation

Before finding my calling as a translator, I studied computational linguistics at Georgetown University. (Hoya Saxa!)

Computational linguistics is a branch of computer science focusing on natural language data, as opposed to numeric or structured data. Its applications can be found in technologies like speech recognition, information retrieval, and machine translation. I was most interested in information retrieval and extraction, the kind of software that could sift through megabytes of unstructured text and pull out important information (such as intelligence data) or respond to natural language questions (like a chatbot capable of competing in Turing test challenges). I wrote my Master’s thesis on a machine learning system for the temporal annotation of news text.

Yeah, pretty nerdy fun stuff. 🙂

Although I ultimately chose a different career path, I still have a lot of respect for the technology behind natural language processing. Language technology has a clear place, even in translation.

That said, there are some tasks that qualified humans will always do better. One such task is word-sense disambiguation.

What is word-sense disambiguation?

Word-sense disambiguation is the process by which the meaning of a word or phrase is clarified (or disambiguated) when multiple meanings are possible. For example, "bank" can refer to a financial institution or to the land alongside a river.

This is something that we, as humans, do all the time with great proficiency. We consider the context in which a word is used and then select the most suitable meaning. In translation, word-sense disambiguation is made easier with the help of reference documents and glossaries, when they are available. We also have our real-world, domain-specific expertise to help us identify which meanings are true possibilities and which can simply be ruled out.

Simply put, within the neural networks of the human brain, we’ve really got word-sense disambiguation figured out.

For computers, however, word-sense disambiguation is a colossal challenge.

Software performs word-sense disambiguation by following an algorithm. Such algorithms must explicitly consider many factors about the word or phrase to be disambiguated, including its part of speech, the domain of the text as a whole, its immediate context (which may or may not match that overall domain), the language and dialect, the statistical likelihood of collocations, and so on.
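To give a rough feel for what such an algorithm looks like, here is a minimal Python sketch of the simplified Lesk algorithm, a classic dictionary-based approach that picks the sense whose definition shares the most words with the surrounding sentence. It is not taken from any particular product, and the two-sense inventory for "bank", the stopword list, and the function names are hypothetical toy data. Notice how much it leaves out: part of speech, domain, dialect, and collocation statistics all go unused.

```python
import re
from typing import Optional

# Hypothetical toy sense inventory: each sense of "bank" has a short gloss.
SENSES = {
    "bank/finance": "an institution that accepts deposits of money and makes loans",
    "bank/river": "the sloping land beside a body of water such as a river",
}

# Hypothetical minimal stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as",
             "on", "to", "in", "we", "at", "i", "by"}


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its words, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context: str, senses: dict[str, str]) -> Optional[str]:
    """Return the sense whose gloss shares the most words with the context.

    Ties (including zero overlap) go to the sense listed first.
    """
    context_words = tokens(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context_words & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("I deposited money at the bank before it closed", SENSES))
print(simplified_lesk("We had a picnic on the bank of the river", SENSES))
print(simplified_lesk("We sat on the bank and watched the boats", SENSES))
```

With these toy glosses, the first sentence resolves to bank/finance (only "money" overlaps) and the second to bank/river (only "river" overlaps). The third sentence shares no words with either gloss, so the function falls back to the first sense listed and wrongly picks bank/finance, even though any human reader would picture a riverbank.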

Even in a best-case scenario, this is a lot of processing for a computer. What if the word is spelled incorrectly? What if the sentence contains a grammatical error that throws off the part-of-speech tagger? What if the word or phrase is used within a cultural reference, easily recognizable by a human but just another string of words to a computer?

In translation (or machine translation), subtleties of the source text, such as wordplay or running metaphors, may be lost when word-sense disambiguation is not performed properly.

So, what does this mean?

Computational linguists have made impressive advancements in tackling the challenges of natural language applications, all of which involve word-sense disambiguation on some level. Yes, even machine translation is getting better. It can be quite useful in informal situations or for providing the gist of a text when no proper translation is available.

But if you are doing business in foreign markets, trust your translations to expert translators who can preserve the richness of your company voice in your written documentation and marketing materials. Make a good impression on your potential customers, because how you communicate is a direct reflection of your company’s professionalism and brand image.