Transferring Human Knowledge to AI

On January 28, 2021 in All, EIT 2020: The Intelligent Revolution by Michael Matuschek

Sentiment Analysis: The Case for Context- and Culture-Sensitive AI

In 2020, more than 4.4 billion internet users were producing a staggering amount of data through social-media posts, reviews, recommendations, and similar interactions. The insight gathered from this data is invaluable in guiding businesses and innovators through product development, marketing, and customer support. However, extracting that insight is challenging: opinion-oriented, customer-provided data is difficult for machines to interpret because of the complexities of human language and cultural context. Tools such as natural language processing (NLP) and machine learning (ML) enable computers to derive meaning from human language. Within artificial intelligence (AI), an advancing research area called sentiment analysis helps machines process unstructured, customer-provided data and classify opinions as positive, negative, or neutral.

Language Complexities in Sentiment Analysis

To understand sentiment analysis in NLP, let’s look at this simple statement from a restaurant review: “The soup was good.” An analysis of the sentiment requires three actions:

Identify whether a statement, a sentence, or the whole text contains an opinion.
Understand whether the opinion is positive, negative, or neutral (called polarity).
Identify the target of the opinion.

In this instance, the sentiment is unambiguously positive and targets a particular food served at the restaurant. However, other examples are less straightforward, such as the seemingly similar clause “The beer is cold.” Many would consider this opinion positive because they like their beer cold, but cold can have a negative polarity in other contexts. For example, “The coffee is cold” uses an identical sentence structure and adjective, but many people would consider cold coffee to be negative.
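
The three steps can be made concrete with a toy lexicon lookup. The sketch below is illustrative only: the word lists, the scores, and the function name polarity are assumptions made for this example and do not come from any library. It shows why a context-free word list cannot distinguish cold beer from cold coffee, and how keying entries by opinion target can.

```python
# Toy lexicons; every entry and score here is an illustrative assumption.
GENERIC = {"good": 1, "great": 1, "bad": -1}

# The polarity of an adjective can depend on the target it describes.
BY_TARGET = {
    ("beer", "cold"): 1,
    ("coffee", "cold"): -1,
}
TARGETS = {"soup", "beer", "coffee"}


def polarity(sentence):
    """Return (target, label) for a simple 'The X is/was Y.' statement."""
    words = sentence.lower().strip(".!").split()
    # Step 3 of the list above: find the target of the opinion.
    target = next((w for w in words if w in TARGETS), None)
    score = 0
    for w in words:
        # Prefer a target-specific entry, then fall back to the generic lexicon.
        score += BY_TARGET.get((target, w), GENERIC.get(w, 0))
    # Steps 1 and 2: a zero score means no opinion was detected (neutral).
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return target, label


for text in ["The soup was good.", "The beer is cold.", "The coffee is cold."]:
    print(text, "->", polarity(text))
```

With these assumed entries, “The soup is cold” comes out as neutral because the lexicon has no opinion about cold soup, which mirrors the ambiguity a human annotator would face.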

Other language complexities create additional challenges, such as sentences that contain multiple sentiments. For example: “The food was good, but the soup was cold.” Here, we have a positive sentiment about the food and a second sentiment about the soup that is negative or ambiguous, depending on the customer’s preference for soup temperature. Similarly, “The soup was hot, but the beer was cold” would express two positive sentiments for most people but remains ambiguous given potential customer contexts.
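
A sentence with several sentiments can be handled by scoring each clause on its own. The following is a minimal sketch, assuming that clauses are joined by “but” and that temperature words are flagged as ambiguous rather than scored. The lexicon, the CONTEXT_DEPENDENT set, and the function name clause_sentiments are hypothetical names chosen for this illustration.

```python
import re

# Illustrative word scores. "cold" and "hot" are deliberately left unscored
# because their polarity depends on the target and on the customer.
LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}
CONTEXT_DEPENDENT = {"cold", "hot"}


def clause_sentiments(sentence):
    """Split a sentence on the conjunction 'but' and label each clause."""
    clauses = re.split(r",?\s+but\s+", sentence.strip().rstrip(".!"))
    results = []
    for clause in clauses:
        words = clause.lower().split()
        if any(w in CONTEXT_DEPENDENT for w in words):
            # Flag the clause instead of guessing a polarity.
            label = "ambiguous"
        else:
            score = sum(LEXICON.get(w, 0) for w in words)
            label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        results.append((clause, label))
    return results


for clause, label in clause_sentiments("The food was good, but the soup was cold."):
    print(f"{clause!r}: {label}")
```

Real text needs a proper clause or dependency parser; splitting on a single conjunction is only enough to show the idea.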

Modifiers further blur the line between polarities. For instance, consider the opinion: “The staff were almost too friendly.” Irony, sarcasm, and figures of speech make it even harder to identify sentiment correctly. Examples such as “We waited for more than an hour, really great service!” tend to be rare in training data and are extremely difficult to encode manually in a systematic manner.

Cultural Variables in Sentiment Analysis

Assigning polarity to opinions becomes even more challenging when considering personal, cultural, or circumstantial preferences. For example, consider analyzing customer reviews for a ryokan, a traditional Japanese guest house that is typically fancy and expensive but features a common bathing area rather than private bathrooms. Categorizing the absence or presence of something as positive or negative seems straightforward, as in “There was dirt in the shower” or “There was a pool for the kids.” However, the ryokan example demonstrates that accounting for cultural variables and personal preferences is essential to obtaining usable insights from data. In Japan, guests consider shared bathing areas a positive attribute. By contrast, most European travelers would view them negatively, particularly at an expensive hotel. This example highlights just one feature and two cultures.

Addressing Language and Cultural Variables in NLP

In NLP, sentiments can be analyzed at the whole-document level and at the paragraph and sentence levels, with results often then aggregated. Although whole-document analysis is useful, paragraph- and sentence-level analysis can yield more granular and correspondingly accurate results (such as identifying sentiment about a particular product feature in addition to the complete product). The challenge comes in developing a lexicon, the set of rules that machines use to classify sentiments as positive, negative, or neutral. Many free tools and resources that are trained on public data can serve as a starting point. For instance, software libraries such as Natural Language Toolkit, spaCy, and TextBlob include sentiment models and allow retraining with user data. If you prefer not to code, cloud offerings such as Google Cloud Platform or Microsoft Azure enable you to get started with sentiment analysis immediately: Simply paste the text to be analyzed into a browser and build your application from there.
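
To illustrate the difference between sentence-level and whole-document analysis, here is a minimal sketch that scores each sentence with a small lexicon and then aggregates the scores by taking their mean. The lexicon entries, the sample review, and the function names are assumptions made for this example, and the mean is only one of several possible aggregation choices.

```python
import re

# Illustrative lexicon; the entries and scores are assumptions for this sketch.
LEXICON = {"good": 1, "friendly": 1, "clean": 1, "slow": -1, "dirty": -1, "rude": -1}


def sentence_scores(document):
    """Score each sentence separately by summing the scores of its words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scored = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        scored.append((sentence, sum(LEXICON.get(w, 0) for w in words)))
    return scored


def document_score(document):
    """Aggregate the sentence scores into one document-level score (mean)."""
    scores = [score for _, score in sentence_scores(document)]
    return sum(scores) / len(scores) if scores else 0.0


review = (
    "The room was clean. The staff were friendly. "
    "The service was slow and the bar was dirty."
)

for sentence, score in sentence_scores(review):
    print(score, sentence)
print("document:", document_score(review))
```

In this sample, the document-level score averages out to neutral, while the sentence-level scores still reveal what the guest liked and disliked.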

Beyond prototyping, data sets and ML models should address language and culture complexities. This means:

For planning: Find structured approaches to discovering variables and useful insights. For instance, analyze your data for underlying languages and cultures, tone, sources, and author demographics, and then consult linguists to interpret those elements. Further improve your approach by interviewing people who belong to the author group to get a precise understanding of nuance and context.
For training data: Identify the examples needed to address variables and include human-provided annotations. This might also mean revisiting knowledge bases such as dictionaries, adding more training data for the particular problem, or in some cases, removing problematic or misleading examples from your data if they do more harm than good.
For modeling: Find a method of representing sentences in a mathematically processable way. For example, word embeddings, which represent arbitrary text as numerical vectors, are useful for mapping words used in context to corresponding positive, negative, or neutral sentiments. Ideally, data analysis would be based explicitly or implicitly on individual customers’ preferences; however, this analysis is cumbersome and, in many cases, not possible if a user is not identifiable. A more accessible approach is to analyze data according to region and language. Then, model cultural differences with separate training examples.
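
The modeling step can be sketched with a toy classifier that keeps separate training examples per region. Note the substitutions: bag-of-words counts stand in for learned word embeddings, and a nearest-centroid rule with cosine similarity stands in for a trained ML model. The region codes, the annotated examples, and all function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical annotated examples: (region, text, label). Invented for this sketch.
TRAINING = [
    ("JP", "shared bath was relaxing", "positive"),
    ("JP", "shared bath and quiet garden", "positive"),
    ("JP", "dirt in the shower", "negative"),
    ("EU", "shared bath and no private bathroom", "negative"),
    ("EU", "pool for the kids", "positive"),
    ("EU", "dirt in the shower", "negative"),
]


def vectorize(text):
    """Bag-of-words counts: a crude stand-in for learned word embeddings."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Build one summed vector (centroid) per region and label."""
    models = defaultdict(lambda: defaultdict(Counter))
    for region, text, label in examples:
        models[region][label].update(vectorize(text))
    return models


def predict(models, region, text):
    """Pick the label whose regional centroid is most similar to the text."""
    vec = vectorize(text)
    centroids = models[region]
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


models = train(TRAINING)
for region in ("JP", "EU"):
    print(region, predict(models, region, "shared bath"))
```

With these invented examples, the same phrase is classified differently per region, while “dirt in the shower” is negative in both. A production system would need far more annotated data and a real embedding model.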

Conclusion

Customer-provided data from social-media posts, reviews, recommendations, and the like provides invaluable insights for businesses and innovators. Complexities in natural language and culture make it difficult for AI-driven machines to understand customer opinions. However, sentiment analysis can help ensure that these aspects are captured and reflected in insights. You can get started by using freely available tools and resources, but addressing complexities in language and culture is challenging and requires significant planning, data preparation, and modeling. Raising awareness about these complexities is an excellent first step toward gaining useful insights and better understanding your customers and their needs.

Michael Matuschek is a Senior Data Scientist from Düsseldorf, Germany. He holds a Master’s Degree in Computer Science and a PhD in Computational Linguistics. He has worked on diverse Natural Language Processing projects across different industries as well as academia. Covered topics include Sentiment Analysis for reviews, client email classification, and ontology enrichment.
