According to a National Safety Council (NSC) study, a workplace injury occurs every seven seconds. This staggering statistic equates to around 4.5 million injured workers a year. And while on-site supervisors can gauge the mood of their workers, they cannot watch every worker at all times to prevent incidents. In this blog, we will explore how you can use the Machine Operator Monitor application, built with the Intel® OpenVINO™ toolkit, to automatically infer a machine operator’s level of focus and mood from video of their facial expression. Information regarding a machine operator’s mood and level of focus can help protect the operator from serious injury.
Figure 1 shows the pipeline for the Machine Operator Monitor deep-learning application. Let’s explore this pipeline and the activities that occur.
Figure 1: The Operator Pose and Mood Inference Pipeline diagram illustrates how a captured image moves through the deep neural networks and the OpenVINO™ toolkit to identify a machine operator’s level of focus and mood. (Source: Author)
The application uses images captured by a video camera mounted at a manufacturing station. Each captured image flows through a series of three deep neural networks, each a convolutional neural network (CNN), a type of deep neural network commonly used to process images. The first CNN determines whether a face exists in the captured frame. If no face is detected, there is no need to process the image further for pose or mood. If a face is detected with confidence above a user-configurable detection threshold, the face is passed on to the next two stages. The second CNN determines whether the operator is watching the machine by detecting whether the operator is facing the camera. The final CNN classifies the operator’s facial expression; an expression must persist for a configurable amount of time before it is counted.
Figure 2 shows an example of the completed process of these three stages of deep neural networks.
Figure 2: The Machine Operator Monitor screen shows an example of the output produced after this application of the OpenVINO™ toolkit processes the captured image. (Source: Intel)
As shown in Figure 2, the time required to detect the face and infer the mood and pose is around 140 ms. This speed permits a fast response, allowing the application to warn the operator promptly and minimize the chance of an accident and injury. The sample application also illustrates how the Message Queue Telemetry Transport (MQTT) protocol can communicate the results to an industrial data analytics system.
The Machine Operator Monitor application was developed with the Intel® Distribution of OpenVINO™ toolkit in roughly 700 lines of Go (or about 500 lines of C++). This code is primarily glue code; the complex work happens in the deep neural networks pre-trained for the task. The first network detects a face and checks that the face rectangle lies completely inside the captured frame, i.e., it is not a partial face. The face image is then passed through the pose network, which checks whether the head is tilted within 45 degrees relative to the machine. Finally, the face image is passed to the sentiment network to identify the operator’s mood. When paired with capable hardware, such as a system based on a 6th generation Intel® Core™ processor or the Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X VPU, the application delivers inference speeds fast enough for real-time analytics.
Gaze tracking is an important new technology with many applications, and one of the most pressing today is vehicle-driver monitoring. Alan Adler's 2018 article "Trucking Fatalities Reach Highest Level in 29 Years" reported that while overall motor vehicle crash deaths declined about 2 percent last year, deaths from large-truck crashes rose 9 percent to a 29-year high. The increase in distracted driving is one factor contributing to the rising number of trucking fatalities.
Using a deep neural network to track a driver’s head pose in real time is one way to verify that a driver is paying attention to the road. Monitoring a driver’s gaze can help identify risks, ensure driver compliance, and as a result reduce the dangers that a distracted driver brings to our crowded roads.
Furthermore, you can combine head pose detection with other technologies, such as heart-rate detection, body-temperature measurement, and breathing monitors, to identify drowsiness. Focusing on the eyes, monitoring blinking and eye movement can help detect micro-sleep, a very brief state of unconsciousness in which our eyes remain open and we appear attentive.
It’s easy to think of other applications for head pose and expression detection. Using the sample code provided, you need only adapt the output classifications to your application, for example by adjusting the head-tilt threshold.
M. Tim Jones is a veteran embedded firmware architect with over 30 years of architecture and development experience. Tim is the author of several books and many articles across the spectrum of software and firmware development. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and protocol development.