In the middle of the highway, my father turned to our car navigation system for directions. Before chalking out a route from Calcutta to Santiniketan, the system asked our language preference: would we want to be guided through the alleys, bridges, and villages in English, Bengali, or Hindi? Marathi, Malayalam, the list went on and on. We have come a long way since the days when the only way to communicate with devices was via the keyboard or keypad.
But how does a computer, or any system built on technology, think, understand, and, in the case of car navigation systems, even speak in regional languages? Through artificial intelligence, of course, and one of its essential parts is natural language processing, or NLP: the field concerned with how a machine can understand and interpret human language.
To learn one or more languages, a computer is fed specific rules called “grammar,” and it builds its understanding of each language from that grammar. It sounds much like what we humans do, right? No wonder Alan Turing, father of theoretical computer science and artificial intelligence, claimed, “A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.”
And now they can. In 2017, The New Yorker published a Donald Trump speech composed entirely by a computer. They fed 270,000 words previously spoken by the current President of the United States of America into an algorithm that analyzed language patterns. The algorithm then generated words, phrases, and sentences based on the interpreted data. The result sounded like a Trump speech.
A large amount of training data is fed to an algorithm, much as humans read books and listen to and participate in conversations. The algorithm identifies patterns in the language input the same way we recognize the behavior of words and the relationships between words and phrases. Based on that acquired knowledge, we, whether a computer or a human, present our own permutations and combinations of the language; that is, we speak in it. This highlights the two broad phases of any text mining algorithm: analysis and generation.
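To make the two phases concrete, here is a minimal sketch in Python. It assumes the simplest possible notion of a "pattern," namely which word tends to follow which, and it is not the method used for the speech described above. The function names `analyze` and `generate` and the tiny corpus are made up for illustration.

```python
import random
from collections import defaultdict


def analyze(text):
    """Analysis phase: record which words follow each word in the text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)


def generate(followers, start, length=10, seed=None):
    """Generation phase: walk the learned word pairs to compose new text."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        options = followers.get(word)
        if not options:  # no known follower, so stop early
            break
        word = rng.choice(options)
        output.append(word)
    return " ".join(output)


# Hypothetical training text; a real system would use a far larger corpus.
corpus = "we read books and we listen and we speak"
model = analyze(corpus)
print(generate(model, "we", length=8, seed=1))
```

With so little text the output is mostly a reshuffling of the input. Fed hundreds of thousands of words, the same idea starts to echo the style of whoever wrote or spoke them.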
I recently came across a meme saying, “Once you read a dictionary, everything else is simply a remix.” From the perspective of computational linguistics, it is quite right. Numerous fields in today’s world depend on computational linguistics, NLP, and text mining, such as translation applications, sentiment analysis, literature and library databases, and even psychology and neuroscience. Shail Shah, who is pursuing his master’s degree in Data Science at the Illinois Institute of Technology in Chicago, US, recently published a work on a Query-Based Text Summarizer. “Given a document, we analyze how accurately the system can pick up a relevant answer,” he said. The Google search engine has already standardized this concept.
Language systems are based on statistical models, where each model is built on a humongous volume of data (hence the name “big data”). Computational linguistics is used in social media analytics, where, using machine-learning tools, a computer can determine the sentiment attached to a tweet or Facebook post. The data collected for this is dynamic: it is continually updated by each of us, as we add to the data sets, or text corpus, with every update, text, and hashtag we use.
Bodhisattwa Majumder, a Ph.D. student at the University of California, San Diego, US, aims to enable “sample-efficient methods to improve machine reading.” One of the areas he is working on is structured prediction. NLP forms the basis of various software with an embedded prediction module, for example, predictive text in mobile phone keypads. When we key in words, the sequences we use, and their frequencies, get added to the database. Analyzing this input allows the mobile software to predict what we are about to type.
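The keypad idea can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption, that the suggestion depends on nothing but the previous word and how often each follower has been typed; real keyboards use far richer models. The class name `NextWordPredictor` and its methods are hypothetical.

```python
from collections import Counter, defaultdict


class NextWordPredictor:
    """Toy predictive-text model based on word-pair frequencies."""

    def __init__(self):
        # Maps each word to a counter of the words typed right after it.
        self.counts = defaultdict(Counter)

    def learn(self, typed_text):
        """Add the word sequences in a typed message to the database."""
        words = typed_text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def suggest(self, last_word, k=3):
        """Return up to k words most frequently typed after last_word."""
        followers = self.counts.get(last_word.lower())
        if not followers:
            return []
        return [word for word, _ in followers.most_common(k)]


predictor = NextWordPredictor()
for message in ["see you soon", "see you tomorrow", "See you soon"]:
    predictor.learn(message)

print(predictor.suggest("you"))
```

Every new message updates the counts, so the suggestions drift toward the phrases a particular user types most often.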
We can also see this feature while typing an email on our personal computers, where data are extracted from various fields, such as email subject, contacts, and past usage, and the program predicts the sentence we are typing. “In the US, there is a surge of work involving NLP from the industry to academia,” says Majumder, who has worked with Walmart Labs and Google AI. “Working with languages will play a key role in building artificial general intelligence and common sense AI. This will allow us to improve machine understanding,” he adds.
Any student interested in the field should follow Google Careers. The company offers internship opportunities, summer programs, and full-time positions. When hiring for computational linguistics positions in India, it wanted applicants fluent in specific regional languages. The NLP wing of the company seems to be developing its different modules, such as the virtual assistant and the navigation system, to be accessible to everyone in whichever part of the world they might be located. This allows everyone to make the technology their own. That, I believe, is the key to progress.