Measuring Shopper Mood with the OpenVINO™ Toolkit

On August 19, 2019 in All, Open Source by M. Tim Jones

Measuring consumer sentiment is an important element of feedback for products, services, etc. In recent years, sentiment analysis—also known as opinion mining—has proven to be a useful tool in providing consumer feedback. Sentiment analysis uses text analysis and natural language processing in the context of social media. The basic idea of sentiment analysis is to capture a consumer’s opinion of a subject based on some form of communication—such as a tweet or a review from a website.

An evolution of sentiment analysis is to passively detect the mood of a consumer who passes by a shelf and looks at a product. This type of sentiment analysis allows not only the capture of statistics about a consumer’s opinion of a product, but also the possibility of direct interaction, for example, notifying a salesperson when a shopper expresses interest. In this blog post, we will explore how you can use the Shopper Mood application of the Intel® OpenVINO™ toolkit to automatically infer the mood of shoppers looking at a retail display based on video input of their facial expressions.

Shopper Mood Data Pipeline

Figure 1 shows the pipeline for the Shopper Mood application. Let’s take a closer look at what occurs in this deep-learning application.

Figure 1: The Shopper Mood Inference Pipeline diagram illustrates how this application of the OpenVINO™ toolkit processes a captured image to identify the mood detected on a shopper’s face. (Source: Author)

The process begins by capturing an image from a video camera mounted on a retail shelf. Next, the captured image is passed into the first of two deep neural networks, both of which are Convolutional Neural Networks (CNNs). CNNs are one of the most popular deep-learning network architectures for processing images. They are made up of many layers: the front-end layers process small windows of the image, and the back-end layers produce one or more classification scores. The first CNN determines whether faces can be detected in the captured image. Each face that the first network detects with a probability above a configurable threshold is classified as a “Shopper” and passed to the second network. The second network classifies the emotion shown on the face into one of five categories:

Happy
Sad
Surprised
Angry
Neutral

If the second CNN cannot classify the emotion of a detected face with a confidence above a configurable threshold, the face is simply labeled “Unknown.” You can see the result of the process overlaid on the original image in Figure 2.
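The two thresholds described above can be sketched in a few lines of Go. This is a minimal illustration of the decision logic only, not the sample application’s code: the `Face` type, the function names, and the order of the labels in `moodLabels` are all assumptions made for this example, and the scores are hard-coded instead of coming from a real model.

```go
package main

import "fmt"

// moodLabels lists the five categories from the article. The index order is an
// assumption for this sketch; check the model documentation for the real order.
var moodLabels = []string{"Neutral", "Happy", "Sad", "Surprised", "Angry"}

// Face is a hypothetical container for one detection from the first network.
type Face struct {
	Confidence float32   // face-detection confidence, 0..1
	MoodScores []float32 // per-category scores from the second network
}

// classifyMood returns the label with the highest score, or "Unknown" when
// that score does not exceed the threshold.
func classifyMood(scores []float32, threshold float32) string {
	best := -1
	for i, s := range scores {
		if i < len(moodLabels) && (best < 0 || s > scores[best]) {
			best = i
		}
	}
	if best < 0 || scores[best] <= threshold {
		return "Unknown"
	}
	return moodLabels[best]
}

// shopperMoods keeps only faces above faceThreshold and labels each one.
func shopperMoods(faces []Face, faceThreshold, moodThreshold float32) []string {
	var moods []string
	for _, f := range faces {
		if f.Confidence <= faceThreshold {
			continue // not confident enough to count as a shopper
		}
		moods = append(moods, classifyMood(f.MoodScores, moodThreshold))
	}
	return moods
}

func main() {
	faces := []Face{
		{Confidence: 0.92, MoodScores: []float32{0.10, 0.80, 0.05, 0.03, 0.02}},
		{Confidence: 0.30, MoodScores: []float32{0.90, 0.02, 0.03, 0.03, 0.02}},
		{Confidence: 0.85, MoodScores: []float32{0.30, 0.25, 0.20, 0.15, 0.10}},
	}
	fmt.Println(shopperMoods(faces, 0.5, 0.5)) // prints [Happy Unknown]
}
```

The second face is dropped because its detection confidence is below the face threshold, and the third is labeled “Unknown” because no single mood score clears the mood threshold.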

Figure 2: The Shopper Mood Monitor output screen shows an example of the results of the Shopper Mood Inference Pipeline overlaid on the original captured image. (Source: Intel)

From Figure 2, you can see that detecting faces in the image took 136ms and the sentiment analysis took 13ms. This fast processing time makes real-time analysis possible when an immediate response is required, such as notifying a salesperson to assist the shopper.

The sample application can also be used for non-real-time statistics, optionally sending the resulting sentiment via the Message Queue Telemetry Transport (MQTT) protocol to a data analytics system for accumulation and offline analysis.

Why this is Cool

With the Intel® Distribution of OpenVINO™ toolkit and approximately 600 lines of Go, you can implement facial expression detection that would have required very specialized hardware and software a decade ago. The complex work is buried within the deep-learning models that have been pre-trained for facial and mood detection. The glue code then loads the models and presents the captured frames to them for processing and classification. When paired with capable hardware, such as a system based on the 6th generation Intel® Core™ processor or Intel’s Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X VPU, impressive inference speeds can be attained that enable real-time analytics.

Adapting this Example

Real-time detection of facial expressions has a wide range of applicable use cases. Many are commercial, such as understanding shopper sentiment, but you can also apply this solution to help people with certain types of facial recognition disorders. It is estimated that two percent of the general population suffers from developmental prosopagnosia. Developmental prosopagnosia refers to an impairment that affects recognition of people’s faces or recognition of facial expressions (expressive agnosia). This application could identify faces and facial expressions for individuals with developmental prosopagnosia.

In addition, consider applying this technology to augmented and virtual reality. As more embedded devices begin to support deep learning, the possible augmented and virtual reality use cases increase. For example, glasses could integrate a video camera and real-time facial detection in order to present a virtual overlay on a captured image that describes the inferred facial expression of someone who passes by the person wearing the glasses.

It’s easy to think of other applications. Using the sample code provided, you’ll just need to make use of the output classification for your application.
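As one hypothetical way to consume the output classification, the Go sketch below raises a notification only after the same mood has been seen on several consecutive frames, which avoids reacting to a single fleeting expression. The `MoodTrigger` type, the three-frame rule, and the choice of “Happy” as the mood of interest are all assumptions for illustration; which mood matters is up to your application.

```go
package main

import "fmt"

// MoodTrigger fires once a target mood has been seen on a set number of
// consecutive frames. Both the type and the rule are hypothetical.
type MoodTrigger struct {
	Target   string
	Required int
	streak   int
}

// Observe records one frame's classification and reports whether the streak
// has just reached the required length.
func (t *MoodTrigger) Observe(mood string) bool {
	if mood != t.Target {
		t.streak = 0
		return false
	}
	t.streak++
	return t.streak == t.Required
}

func main() {
	trigger := &MoodTrigger{Target: "Happy", Required: 3}
	frames := []string{"Neutral", "Happy", "Happy", "Unknown", "Happy", "Happy", "Happy", "Happy"}
	for i, mood := range frames {
		if trigger.Observe(mood) {
			fmt.Printf("frame %d: notify a salesperson\n", i) // fires once, at frame 6
		}
	}
}
```

Because `Observe` returns true only at the moment the streak reaches the required length, a shopper who stays happy for many frames produces one notification instead of one per frame.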

Where to Learn More

You can learn more about this demonstration at the Intel® IoT development kit GitHub.
The glue application was developed in C++ and Go. The distribution includes the Intel® optimized face detection and sentiment detection models for OpenVINO™. You can easily experiment with this application using the Ubuntu 16.04 LTS Linux operating system, the Intel® Distribution of OpenVINO™ toolkit, and the OpenCL™ runtime package.
You can also jumpstart your development using the AIoT development kit, which includes Ubuntu, OpenVINO™, Intel® Media SDK, and Intel® System Studio 2018 pre-installed with an Intel® Core™ processor. The development kit includes tutorials to help you get up and running quickly.
You can also use the AAEON UP board based on the Intel® Apollo Lake™ platform.

M. Tim Jones is a veteran embedded firmware architect with over 30 years of architecture and development experience. Tim is the author of several books and many articles across the spectrum of software and firmware development. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and protocol development.

Tagged With: CNN, DNN, Expression Detection, Face Detection, Intel, OpenVino, Sentiment Analysis

Bench Talk for Design Engineers | The Official Blog of Mouser Electronics
