What is Natural Language Processing?

By Jeff Toffoli

Learn how computers interact with human languages.

In a certain sense, we've been talking to computers using “programming languages” for a long time. But for the majority of us, programming languages are difficult to understand. Language is fundamental to our everyday lives, and it makes sense that we would want our computers to be able to use and understand language the way we do.

The science and technology involved in communicating with computers using natural human languages have made enormous strides in recent years, and the field of Natural Language Processing, or “NLP”, is driving the creation of entirely new products and services.

Natural Language Processing is a subfield of artificial intelligence and computer science that deals with the interaction between computers and humans using human languages. The objective of NLP is to extract meaning and understanding from the content of human communication in text and speech. Often, NLP is used to automate processes based on what a person has said. NLP technologies are being used to answer questions, generate text and summarize text automatically. NLP has a significant number of business applications, in part because huge amounts of human language data can be processed in a small amount of time.

The ideas behind Natural Language Processing date back to at least 1950, when Alan Turing published the article “Computing Machinery and Intelligence.” The article proposes a test of machine intelligence, now known as the Turing test, that depends on a machine's ability to interpret and generate human language.

Modern approaches to NLP are generally powered by statistical models that analyze the content, structure and meaning of spoken and written communications. Deep neural networks are currently the leading method for building these models.

How does NLP work?

There are two components present in most NLP technologies: data pre-processing and algorithm development.

Data pre-processing:

In this phase, text data is cleaned and structured so that a computer system can analyze it. This preprocessing phase helps to reduce the complexity and ambiguity that are inherent in natural human communication.

Tokenization

This step involves the breakdown and division of text passages into smaller functional units called tokens. Tokens can be individual words or phrases.
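To make this step concrete, here is a minimal sketch of a word tokenizer in Python. It assumes English text and a simple regular expression; the function name `tokenize` is our own, chosen for illustration, and real systems generally use more sophisticated tokenizers.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, keeping internal apostrophes."""
    # A token is a run of letters or digits, optionally joined by apostrophes.
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())

tokens = tokenize("Computers don't understand language the way we do.")
print(tokens)
# -> ['computers', "don't", 'understand', 'language', 'the', 'way', 'we', 'do']
```

Punctuation is simply discarded here. Whether that is appropriate depends on the application.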

Stop-word removal 

In this step, very common words that carry little information on their own (such as “the”, “is” and “of”, or fillers like “um”) are removed so that the more meaningful words are left.
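A rough sketch of stop-word removal might look like the following. The stop-word list here is a tiny, hand-picked assumption for illustration; practical lists are much longer and are often tuned to the task.

```python
# A small, illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "um", "uh"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop-word set, preserving order."""
    return [t for t in tokens if t.lower() not in stop_words]

print(remove_stop_words(["um", "the", "cat", "is", "in", "the", "garden"]))
# -> ['cat', 'garden']
```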

Stemming and lemmatization 

Natural languages typically include words that come in a variety of versions with very similar meanings. At this step two techniques, stemming and lemmatization, are used to reduce words to their simplest version (for example, "playing", "played" and "plays" might all be reduced to simply "play"). Stemming strips suffixes using simple rules, while lemmatization uses vocabulary and grammar to find a word's dictionary form.
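The difference between the two techniques can be sketched in a few lines of Python. This is a deliberately crude illustration: the suffix list and the table of irregular forms are our own assumptions, and established stemmers and lemmatizers use far richer rules and dictionaries.

```python
# Suffixes to strip, checked in this order (a tiny, hypothetical rule set).
SUFFIXES = ("ing", "ed", "s")

# Irregular forms that suffix rules cannot handle (illustrative entries only).
IRREGULAR = {"went": "go", "mice": "mouse", "better": "good"}

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in a dictionary first, then fall back to stemming."""
    word = word.lower()
    return IRREGULAR.get(word, stem(word))

print([stem(w) for w in ["playing", "played", "plays"]])  # -> ['play', 'play', 'play']
print(stem("went"), lemmatize("went"))                    # -> went go
```

The last line shows why a dictionary helps: no suffix rule can turn "went" into "go".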

Part-of-speech tagging 

This step labels each word with the part of speech it belongs to, such as noun, verb, adjective or adverb.
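As a simplified illustration, part-of-speech tagging can be sketched as a dictionary lookup. The lexicon and tag names below are hypothetical, and this toy version ignores context entirely, whereas real taggers use the surrounding words to choose between possible tags.

```python
# A toy lexicon mapping words to part-of-speech tags (hypothetical entries).
LEXICON = {
    "a": "DET", "the": "DET",
    "cat": "NOUN", "dog": "NOUN",
    "meowed": "VERB", "barked": "VERB",
    "loud": "ADJ",
}

def tag(tokens, lexicon=LEXICON):
    """Pair each token with its tag, or 'UNK' if the word is not in the lexicon."""
    return [(t, lexicon.get(t.lower(), "UNK")) for t in tokens]

print(tag(["A", "cat", "meowed"]))
# -> [('A', 'DET'), ('cat', 'NOUN'), ('meowed', 'VERB')]
```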

Algorithm development 

Both cutting-edge statistical methods and more conventional "rules-based" approaches are used for NLP. Through pre-processing and algorithm development, NLP-based applications classify, analyze and structure natural language in text and speech.

Machine learning-based algorithms 

Machine learning algorithms can automatically identify patterns in your data and construct a model that helps you understand and interact with natural languages. You only need to provide a properly configured machine learning algorithm with examples to learn from, plus processing power, plus time, and you'll be able to create a model that can help you make predictions, generate text and automate processes. 
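To illustrate learning from examples, here is a minimal sketch of one classic technique, a Naive Bayes text classifier with add-one smoothing. The choice of technique, the function names and the four training examples are all our own assumptions for illustration; modern systems typically use much larger datasets and neural models.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count words per label and documents per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the label with the highest smoothed log-probability for the text."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training examples, for illustration only.
EXAMPLES = [
    ("great product love it", "positive"),
    ("works great very happy", "positive"),
    ("terrible product broke quickly", "negative"),
    ("very disappointed want refund", "negative"),
]
model = train(EXAMPLES)
print(predict("love this great product", *model))  # -> positive
```

Nothing in the code says which words are positive or negative. The model infers that from the labeled examples alone.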

Rule-based algorithms

Rule-based algorithms apply hand-crafted linguistic rules. This approach was very common in the past, and it is still used on its own or in combination with machine learning methods.
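By contrast with a learned model, a rule-based system spells out its logic by hand. The sketch below routes customer messages to a department using keyword patterns; the rules and department names are hypothetical and exist only to show the idea.

```python
import re

# Hand-written rules, checked in order: (pattern, department). Hypothetical values.
RULES = [
    (re.compile(r"\b(refund|money back)\b", re.I), "billing"),
    (re.compile(r"\b(password|log ?in)\b", re.I), "account"),
    (re.compile(r"\b(broken|crash(es|ed)?|error)\b", re.I), "support"),
]

def route(message, default="general"):
    """Return the department of the first rule that matches the message."""
    for pattern, department in RULES:
        if pattern.search(message):
            return department
    return default

print(route("I forgot my password"))   # -> account
print(route("The app crashed twice"))  # -> support
print(route("Hello there"))            # -> general
```

Rules like these are easy to read and audit, but every new phrasing has to be anticipated by whoever writes them.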

How do computers make sense of textual data through NLP? 

NLP systems make observations about patterns in textual data. They categorize words, letters and phrases, and then use these building blocks to extract meaning from a text passage. Several syntactic techniques are used to put words within their proper context so that meaning can be determined.

Word segmentation

Word segmentation involves the extraction of individual words from a "string" of text. Word segmentation can be fairly straightforward in languages like English where there are clear spaces between words, but it’s more complex when working with languages where such markers are not present, or are ambiguous [Figure 1]. 

[Figure 1: The Thai language does not typically have spaces between words, but uses spaces to separate phrases]
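For text without spaces, one simple approach is greedy "maximum matching" against a known vocabulary: at each position, take the longest word that matches. The sketch below uses run-together English as a stand-in, and the vocabulary is a made-up assumption; it is one possible technique, not necessarily what any particular product uses.

```python
def max_match(text, vocabulary):
    """Greedily split text into the longest vocabulary words, left to right."""
    words, i = [], 0
    max_len = max(map(len, vocabulary))
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words

VOCAB = {"the", "cat", "sat", "on", "mat"}  # illustrative vocabulary
print(max_match("thecatsatonthemat", VOCAB))
# -> ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

The greedy strategy also shows why segmentation can be ambiguous: if "them" were added to the vocabulary, the same input would end in 'them', 'a', 't', because the longest match is not always the right one.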

Parsing

Parsing involves grammatical analysis of text data, in which words are classified by part of speech and related to one another within the sentence. For example, in the sentence “A cat meowed.”, the word ‘cat’ is categorized as a noun, the word ‘meowed’ as a verb, and so forth.

Sentence breaking

As the name suggests, sentence breaking divides large segments of text into individual sentences, which are shorter and easier to analyze.
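A minimal sketch of sentence breaking can be written with a single regular expression. It assumes English punctuation and that each new sentence starts with a capital letter; the function name is our own.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace and a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]

print(split_sentences("NLP is useful. Is it hard? It can be!"))
# -> ['NLP is useful.', 'Is it hard?', 'It can be!']
```

This simple rule would wrongly split after abbreviations such as "Dr." when a capitalized name follows, which is one reason practical sentence breakers need more than punctuation.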

Word sense disambiguation

In natural human languages, the same word can often carry multiple possible meanings or "senses." Word sense disambiguation uses contextual clues to identify the appropriate meaning of a given word. 
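One classic way to use contextual clues is to compare the words around an ambiguous word with a short definition of each sense and pick the sense with the most overlap (a simplified form of the Lesk algorithm). The sense inventory below is a hand-written assumption for the single word "bank".

```python
# Hypothetical sense inventory: sense name -> short definition (gloss).
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money",
        "river edge": "the sloping land beside a river or lake",
    }
}

def disambiguate(word, sentence, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = set(sentence.lower().split())

    def overlap(sense):
        return len(context & set(senses[word][sense].split()))

    return max(senses[word], key=overlap)

print(disambiguate("bank", "she sat on the bank of the river"))  # -> river edge
print(disambiguate("bank", "he deposits money at the bank"))     # -> financial institution
```

Common words such as "the" count toward the overlap in this version, so removing stop words first would make the comparison more reliable.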

Named entity recognition (NER)

NER, as it’s often called, puts words into useful categories for analysis and understanding. For example, we might be interested in categories related to a person's contact information. In this case, we would want to know what their "name", "address", and "phone number" are. These named entities are concepts that can be specifically identified in text passages. 
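For entities with a predictable shape, such as phone numbers, a pattern-based sketch is often enough to show the idea. The patterns below are simplified assumptions (a single US-style phone format, plus an email pattern that we added for illustration); names and addresses are much harder and are usually handled by trained models.

```python
import re

# Simplified, illustrative patterns; real-world formats vary widely.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def find_entities(text):
    """Return (matched text, entity label) pairs for every pattern match."""
    found = []
    for label, pattern in PATTERNS.items():
        found.extend((m.group(), label) for m in pattern.finditer(text))
    return found

print(find_entities("Call Ana at 555-123-4567 or write to ana@example.com."))
# -> [('555-123-4567', 'PHONE'), ('ana@example.com', 'EMAIL')]
```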

What are applications of NLP?

Natural Language Processing has been adopted across nearly all industries, and new applications are being developed all the time. Here are some common uses that you may have already encountered in your everyday life.

Smart assistants 

Smart assistants have become a common part of our lives, performing a variety of automated tasks and answering questions. In many cases smart assistants are controlled through voice commands. Smart assistants typically must first convert spoken instructions to text and then extract the key words and phrases needed to interpret the request.

Predictive text

Auto-complete, auto-correct and next-word suggestions are popular features powered by NLP. Predictive text has significantly improved the user experience in mobile applications, since it helps to compensate for the challenge of using small keyboards on mobile device screens. Predictive text is also widely used in search engines and word processing applications.
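A very small sketch of next-word suggestion is a bigram model: count which word follows which in some example text, then suggest the most frequent followers. The corpus and function names below are made up for illustration, and production keyboards rely on far larger models.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, how often each other word follows it."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def suggest(following, word, k=1):
    """Return up to k of the most frequent next words seen after `word`."""
    counts = following.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]

# A made-up corpus, for illustration only.
CORPUS = ["see you soon", "see you later", "see you soon", "thank you very much"]
model = train_bigrams(CORPUS)
print(suggest(model, "you"))  # -> ['soon']
print(suggest(model, "see"))  # -> ['you']
```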

Chatbots

Chatbots are being used today by organizations to handle a growing variety of customer service inquiries. The ability to automate customer interactions and route inquiries to the appropriate resource has proven to be an enormous source of cost savings for customer service organizations. The most advanced chatbots use machine learning to classify the contents of conversations and continually improve their performance over time.

Text analytics 

Text analytics can identify patterns and trends from unstructured text. For example, text analytics can be used to understand customer sentiment related to a new product release, or to automatically identify customer pain points. Text analytics can summarize bulk text data and provide analytics at a scale that humans could not handle manually. Text analytics systems divide data into various categories so that quantities and correlations can be discerned from the data. Meaningful keywords can be extracted, and the bulk text data can then be analyzed as a whole.
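Keyword extraction, in its simplest form, can be sketched as counting the most frequent non-stop-words across a set of documents. The reviews and stop-word list below are invented for illustration; real text analytics systems add sentiment scoring, categorization and much more.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "is", "too", "a", "and", "it"}

def top_keywords(documents, n=3):
    """Return the n most frequent non-stop-words across all documents."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z']+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up customer reviews.
REVIEWS = [
    "The battery drains fast.",
    "Battery life is too short.",
    "Love the screen, hate the battery.",
]
print(top_keywords(REVIEWS, n=1))  # -> [('battery', 3)]
```

Even this crude count surfaces "battery" as the recurring theme, which is the kind of pain point that text analytics aims to find at scale.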

Conclusion 

NLP uses a variety of techniques to make human languages work with computers. By preprocessing language passages, the complexity of natural languages can be reduced and computational algorithms can have an easier time making sense of the data. Machine learning, as well as conventional “rules-based” approaches, can be used to derive meaning and understanding from languages. NLP technologies are in widespread use, and continue to be adopted for new use cases.
