Voice Control Technology Enables Every Appliance in the Home to Talk and Listen

By Wang Jing

(Source: ZinetroN/Shutterstock.com)

Here is a scenario: You come home from work or school, you tell the TV what show you want to watch, and it automatically turns on and switches to your preferred channel. Or perhaps you tell the stove to prepare for low and slow cooking so that dinner is cooked at the appropriate temperature at the right time. Today, home appliances are capable of performing these functions. Through voice control, you can just relax on the sofa after a tiring day at work or school and give instructions to these appliances that obediently follow your command.

Complex architecture and wide-ranging connections are the hallmarks of the Internet of Things. More companies are choosing cloud-hosted IoT systems because cloud architecture is secure, fast, and convenient. A system becomes more secure through several layers of encryption and authentication, and AI-based model training and deployment (such as natural language processing) can be completed with just one click. A typical IoT cloud solution includes a sensor embedded inside a home appliance that connects to the internet via Wi-Fi; the sensor collects data and transfers it to the cloud database, where it is analyzed and processed in the cloud environment. In this article, cloud architecture is used as the framework to explain how voice control technology enables home appliances to obey verbal commands and respond.

Voice Control Technology in Home Appliances

With constant AI and IoT developments, human-machine interaction (HMI) has become increasingly sophisticated. Voice control technology is one of the most widely applied and popular research topics today. The application of voice control in home appliances, which eliminates the need for familiar remote controls and enables appliances to function using verbal commands alone, is new to most people. Voice-controlled home appliances are made possible by AI, machine learning, speech recognition, IoT, and cloud computing.

Azure Cloud Voice Control and Speech Recognition Technology

A voice control system includes the following stages (a schematic pipeline sketch follows the list):

  • Speech recognition
  • Natural language understanding
  • Dialog management
  • Natural language generation
  • Speech synthesis
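
As a rough illustration, the five stages can be viewed as one processing pipeline. The sketch below is purely schematic (the function names are placeholders, not any particular SDK); each stage would be backed by the cloud services described in the following sections.

```python
# Schematic voice-control pipeline; every function is a placeholder stub.

def speech_to_text(audio: bytes) -> str:
    """Speech recognition: convert captured audio into text."""
    ...

def understand(text: str) -> dict:
    """Natural language understanding: extract the intent and entities."""
    ...

def manage_dialog(intent: dict) -> str:
    """Dialog management: decide what to do next and draft a reply."""
    ...

def generate_reply(decision: str) -> str:
    """Natural language generation: turn the decision into user-facing text."""
    ...

def text_to_speech(text: str) -> bytes:
    """Speech synthesis: render the reply text as audio."""
    ...

def handle_utterance(audio: bytes) -> bytes:
    """Run one user command through all five stages in order."""
    return text_to_speech(generate_reply(manage_dialog(understand(speech_to_text(audio)))))
```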

Speech recognition refers to the transformation of information from speech to text. The Azure platform's speech-to-text (STT) service uses a universal language model trained on Microsoft's existing data and deployed in the cloud. This universal model can also serve as the starting point for creating and training custom models: a domain-specific lexicon can be selected and added to the training data as needed.
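
As a concrete illustration, the snippet below is a minimal speech-to-text sketch using the Azure Speech SDK for Python (azure-cognitiveservices-speech). The subscription key, region, and optional custom-endpoint ID are placeholders you would supply from your own Speech resource.

```python
# Minimal speech-to-text sketch with the Azure Speech SDK for Python.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY",
                                       region="westeurope")
# Optional: point recognition at a Custom Speech model trained with a
# domain-specific lexicon instead of the universal language model.
# speech_config.endpoint_id = "YOUR_CUSTOM_SPEECH_ENDPOINT_ID"

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)  # default microphone
result = recognizer.recognize_once()  # listen for a single utterance

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)
else:
    print("No speech recognized:", result.reason)
```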

Natural language analysis/natural language processing is the machine-learning component of the system; it is where the language-understanding models are designed and trained.

The tasks of dialog management comprise three main points (a minimal dialog-manager sketch follows the list):

  • Predicting the user's intention
    Analysis is performed on the contents of the dialog, and the machine-learning model predicts and confirms what to do next.
  • Providing an interface for exchanges with the back-end/task model
    It serves as the application interface for exchanging requests with the server or model, obtaining feedback and generating text results.
  • Providing an expectation value for the results of semantic analysis
    It responds to the user's question, based on semantic parsing, in a way that meets the user's expectations.
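
The dialog-manager sketch below (referenced above the list) is purely illustrative and not an Azure API: the intent name, confidence threshold, and read_humidity() helper are assumptions chosen to match this article's humidity example.

```python
# Illustrative dialog manager: act on a predicted intent or ask again.

CONFIDENCE_THRESHOLD = 0.7  # below this, ask the user to repeat the request

def read_humidity() -> float:
    """Placeholder for a back-end call that queries the humidity sensor."""
    return 46.0

def manage_dialog(intent: str, confidence: float) -> str:
    if confidence < CONFIDENCE_THRESHOLD:
        return "Sorry, I did not understand. Could you repeat that?"
    if intent == "CheckHumidity":
        return f"The indoor humidity is {read_humidity():.0f} percent."
    return "I cannot help with that yet."

print(manage_dialog("CheckHumidity", 0.92))
```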

The response text is generated based on the model's analysis of the user's command. The main purpose of speech synthesis technology is to transform text into a natural, humanized voice. The basic Azure cloud speech synthesis uses the Speech SDK or REST Application Programming Interface (API) protocols (see details below) to achieve text-to-speech with a neural or custom voice.
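
A minimal text-to-speech sketch with the Azure Speech SDK for Python is shown below; the key, region, and neural voice name are placeholders.

```python
# Minimal text-to-speech sketch with the Azure Speech SDK for Python.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY",
                                       region="westeurope")
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"  # a neural voice

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)  # default speaker
result = synthesizer.speak_text_async("The indoor humidity is 46 percent.").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Reply spoken through the default speaker.")
```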

In home appliances, the dialog models’ emotional requirements are somewhat lower because most user commands are only functional requests, such as turning on the device and requesting the temperature or humidity.

Steps of a Basic Solution for Cloud Voice Control Technology

A basic solution for cloud voice control technology includes:

  • Dialog mode: The dialog mode is the central hub of human-machine language interaction; all other modes are derived from it. The system switches to dialog mode whenever the user gives a command. An interface built on the UWP application platform monitors whether a voice trigger (such as saying to the platform: "Hi, cloud!") has been successfully received (see the sketch after this list).

  • Dictation mode: The user speaks a longer phrase or sentence and waits for the speech recognition results. After saying the initial trigger, "Hi, cloud!", the user can then give the machine the actual command. The speech's content is transmitted to the semantic analysis system (Azure LUIS), and the real-time speech-to-text service initializes the Universal Language Model. The operation is completed through the REST API/speech software development kit (SDK).

  • Interactive mode: The interactive mode is used when the user makes a brief request and wants the application to respond, a process that works thanks to the speech recognition and text-to-speech functions embedded in the application. In this article's example, the interactive mode of the voice control system deployed in the Azure cloud is brought into play through the user-interactive Universal Windows Platform (UWP) application. A simple interface is provided on the UWP for user operation, or used for testing by developers.
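
The sketch below (referenced in the dialog-mode item) shows one hedged way to combine these modes with the Azure Speech SDK for Python: wait for the wake phrase, then capture the command that follows. The keyword-model file hi_cloud.table is an assumed artifact that would be created in Azure Speech Studio; the key and region are placeholders.

```python
# Hedged trigger-then-dictate flow: wake phrase first, then one command.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY",
                                       region="westeurope")

# 1. Dialog mode: block until the wake phrase ("Hi, cloud!") is detected.
keyword_model = speechsdk.KeywordRecognitionModel("hi_cloud.table")  # assumed custom keyword model
keyword_recognizer = speechsdk.KeywordRecognizer()
keyword_result = keyword_recognizer.recognize_once_async(keyword_model).get()

# 2. Dictation/interactive mode: capture the command that follows the trigger.
if keyword_result.reason == speechsdk.ResultReason.RecognizedKeyword:
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    command = recognizer.recognize_once()
    print("Command to send to LUIS:", command.text)
```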

Universal Windows Platform (UWP)

With the Universal Windows Platform, the same API can be universally applied to computers, smartphones, or other Windows 10 devices. In other words, the same code can be run on different terminals without writing different versions of the code for different platforms.

Cognitive Services Speech Recognition SDK & REST APIs

The voice SDK allows manufacturers to boost voice quality in hands-free applications by using voice-band audio processing, such as speech recognition in automotive cockpit devices.

The official documentation states that: "As an alternative method for voice SDK, the voice service allows the use of REST APIs to transform speech to text. Every accessible endpoint is connected to a certain region. The application requires a subscription key for the endpoint used. REST APIs are very limited since they can only be used in situations where voice SDKs are not available."

Using speech recognition as an example: A key for the REST API must be acquired before sending the HTTP request to the server. After authentication, the server returns the recognized text for the submitted audio. Figure 1 shows an example of creating a REST client in an application and then invoking it. When the REST client is invoked, the input is transformed into an HTTP request and sent to the REST API. The response from the communication endpoint is an HTTP response, which the REST client transforms into a type the application can recognize and returns to the application.
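
The snippet below sketches the REST path just described for short audio: a hedged Python example that posts a WAV file to the regional speech-to-text endpoint and reads back the transcript. The region, key, and file name are placeholders.

```python
# Hedged sketch of the short-audio speech-to-text REST call.
import requests

REGION = "westeurope"
KEY = "YOUR_SPEECH_KEY"
URL = (f"https://{REGION}.stt.speech.microsoft.com/speech/recognition/"
       "conversation/cognitiveservices/v1?language=en-US")

headers = {
    "Ocp-Apim-Subscription-Key": KEY,
    "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
}

with open("question.wav", "rb") as audio:
    response = requests.post(URL, headers=headers, data=audio)

response.raise_for_status()
print(response.json().get("DisplayText"))  # e.g. "What is the humidity in the room now?"
```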

Figure 1: Creating and using a REST client in an application. (Source: gunnarpeipman.com)

The application's REST client is not exposed directly to the rest of the application; instead, an adapter handles the communication with external servers. The adapter receives parameters of known types from the application, forwards the request to the external server, and returns the server's response to the application as a known type.
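
An illustrative adapter sketch follows; the class and type names are assumptions. The application works only with known types (raw audio in, a Transcript out), while the HTTP request and response stay hidden inside the adapter.

```python
# Illustrative adapter: hides the REST client behind application-level types.
from dataclasses import dataclass
import requests

@dataclass
class Transcript:
    status: str
    text: str

class SpeechRestAdapter:
    def __init__(self, endpoint: str, key: str):
        self._endpoint = endpoint
        self._key = key

    def transcribe(self, wav_bytes: bytes) -> Transcript:
        """Accept known types from the application and return a known type."""
        response = requests.post(
            self._endpoint,
            headers={
                "Ocp-Apim-Subscription-Key": self._key,
                "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
            },
            data=wav_bytes,
        )
        response.raise_for_status()
        body = response.json()
        return Transcript(status=body.get("RecognitionStatus", ""),
                          text=body.get("DisplayText", ""))
```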

Language Understanding Intelligent Service (LUIS)

Azure's LUIS is a cloud-based conversational AI service that allows machines to understand human language. Its mode of operation can be summed up as follows: The client sends a voice request to LUIS directly through the application. The natural language processing function in LUIS transforms the command into JSON format and, after analysis, the answer is also returned in JSON format. The LUIS platform provides the user with a model-training service. This model supports "continuous learning": while responding to clients' requests, it makes corrections continuously and automatically to improve accuracy.
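
As a hedged sketch of this exchange, the snippet below sends a recognized command to the LUIS v3 prediction REST endpoint and prints the JSON prediction; the endpoint, app ID, and key are placeholders.

```python
# Hedged sketch of a LUIS v3 prediction request.
import requests

ENDPOINT = "https://YOUR_RESOURCE.cognitiveservices.azure.com"
APP_ID = "YOUR_LUIS_APP_ID"
KEY = "YOUR_LUIS_KEY"

url = f"{ENDPOINT}/luis/prediction/v3.0/apps/{APP_ID}/slots/production/predict"
params = {"query": "What is the humidity in the room now?",
          "subscription-key": KEY}

prediction = requests.get(url, params=params).json()
print(prediction)  # JSON containing the query, the top intent, and any entities
```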

Now, let's take a look at how LUIS works using a residential humidity monitoring system as an example. Suppose the user gives the "check the humidity" command. LUIS breaks it down into the essential components of natural language processing (a parsing example follows the list):

  • Objective (the verb): Here, "to check" is the verb. The LUIS model accepts up to 80 objective words.
  • Complete language content: This is the complete command given by the user. The LUIS model accepts a maximum of 500 words for voice requests.
  • Entity (the noun): Here, "humidity" is the noun. The LUIS model can accept up to 30 entity nouns.
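
The sketch below (referenced above the list) shows how the objective and entity might be read out of a LUIS-style prediction for the humidity example; the intent name CheckHumidity, the entity name, and the scores are assumed values, not output from a real model.

```python
# Assumed LUIS v3-style prediction for "check the humidity".
prediction = {
    "query": "check the humidity",
    "prediction": {
        "topIntent": "CheckHumidity",
        "intents": {"CheckHumidity": {"score": 0.97}},
        "entities": {"measurement": ["humidity"]},
    },
}

top_intent = prediction["prediction"]["topIntent"]   # the objective (verb)
entities = prediction["prediction"]["entities"]      # the entities (nouns)
score = prediction["prediction"]["intents"][top_intent]["score"]
print(top_intent, entities, score)
```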

Users can customize LUIS features to their own needs, which means that when the model cannot easily recognize one or a few words, new example data can be added automatically for retraining.

Running Windows 10 IoT Core on Raspberry Pi 3

Raspberry Pi is a development board that can connect to sensors of many different types. It can also run a web server that receives the interpreted commands and sends electrical signals to control the home appliances installed in the smart home.
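
As an illustration of the "web server that drives appliances" idea, the hedged sketch below uses Flask and RPi.GPIO on Raspberry Pi OS rather than the UWP/Windows 10 IoT Core stack described in this article; the GPIO pin number and the route name are assumptions.

```python
# Hedged sketch: a tiny web server that drives an appliance relay on a Pi.
from flask import Flask, jsonify
import RPi.GPIO as GPIO

RELAY_PIN = 17  # GPIO pin assumed to be wired to the appliance's relay

GPIO.setmode(GPIO.BCM)
GPIO.setup(RELAY_PIN, GPIO.OUT)

app = Flask(__name__)

@app.route("/appliance/<state>", methods=["POST"])
def set_appliance(state):
    """Receive an interpreted command and drive the relay accordingly."""
    GPIO.output(RELAY_PIN, GPIO.HIGH if state == "on" else GPIO.LOW)
    return jsonify({"appliance": "relay", "state": state})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```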

How Voice Control Technology Is Used in Home Appliances

Voice control makes the home environment smarter and brings about home appliance automation (Figure 2). We can define it this way: Improving the homeowner's quality of life by using technologies that provide different services related to the areas of health, multimedia, entertainment, and energy.

Figure 2: Voice control technology recognizes audio commands to operate connected home appliances. (Source: Andrey Suslov/Shutterstock.com)

Example Application: A Smart Humidity Monitor with Cloud Services

Let’s take a look at how voice control technology for home appliances works with a smart voice-controlled humidity monitor using cloud architecture as an example.

Core technology

With the Universal Windows Platform (UWP) running on Raspberry Pi 3, the speech recognition API and the sensors interact with the user. Semantic analysis is performed in LUIS, with Raspberry Pi 3 passing in the user's question, and the spoken answer finally comes from the Cognitive Services speech APIs.

Architecture

Cloud computing has become the first choice in data architecture to ensure that data transmission is secure, data processing is fast, and model predictions are accurate. Cloud deployment can also significantly reduce the workload on the device and enhance device performance while improving the user experience, achieving a win-win outcome. The cloud architecture selected here is the Microsoft Azure cloud platform, which has recently seen major developments and innovations in the fields of AI and IoT.

Functions

  • Data storage: Data collected through sensors is stored in the cloud.
  • Speech-to-text and text-to-speech APIs are used to recognize users’ questions and answer using speech.
  • LUIS speech recognition and semantic analysis can predict the correct response to the user's command using previously trained models.
  • The home appliance can answer the user's question through speech captured by Raspberry Pi 3 and recognized by Cognitive Services.

Solutions

Refer to the following GitHub link for an example of creating this type of solution.

https://microsoft.github.io/techcasestudies/iot/2017/06/02/Iomote.html

Data sent to the cloud

Data transfer from the sensor to the cloud database can already be accomplished using today's data architecture. Clients can directly use different types of databases to meet their various needs.
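
One hedged way to implement this hop is the Azure IoT Device SDK for Python (azure-iot-device), sketched below; the device connection string and the read_humidity() helper are placeholders.

```python
# Hedged sketch: push a humidity reading from the device to Azure IoT Hub.
import json
from azure.iot.device import IoTHubDeviceClient, Message

CONNECTION_STRING = "HostName=...;DeviceId=...;SharedAccessKey=..."  # placeholder

def read_humidity() -> float:
    """Placeholder for the actual sensor read."""
    return 46.0

client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
client.connect()

telemetry = Message(json.dumps({"humidity": read_humidity()}))
telemetry.content_type = "application/json"
client.send_message(telemetry)

client.shutdown()
```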

Conducting speech dialog: UWP application

Example: The user wishes to know the humidity level in their home, so they say, "Hey, cloud! What is the humidity in the room now?" The question is captured by the UWP application running in Raspberry Pi 3 on the device. The application communicates with all sensors and actuators and then triggers the system to send the question to LUIS for semantic analysis.

Analyzing the question by connecting with LUIS

LUIS is used to understand the command received from Raspberry Pi 3. Through model training, the application can recognize that the intention of the command is to detect the indoor humidity. To do this, the LUIS API is added into the UWP application. When the user says the trigger command "Hey, cloud!", all subsequent content is sent to LUIS through the API and analyzed. LUIS is called from the UWP, receives the input, and analyzes the intention. Based on the predicted intention's confidence level, the correct answer is provided to the user, and a command is then sent to the IoT center to get the humidity reading from the sensor.
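
The last hop, from the IoT center to the sensor, could look like the hedged sketch below, which uses the Azure IoT Hub service SDK (azure-iot-hub) to invoke a direct method on the device; the method name readHumidity, the device ID, and the connection string are assumptions.

```python
# Hedged sketch: the IoT center asks the device for a reading via a direct method.
from azure.iot.hub import IoTHubRegistryManager
from azure.iot.hub.models import CloudToDeviceMethod

IOTHUB_CONNECTION_STRING = "HostName=...;SharedAccessKeyName=...;SharedAccessKey=..."  # placeholder
DEVICE_ID = "raspberry-pi-3-humidity"  # assumed device ID

registry_manager = IoTHubRegistryManager(IOTHUB_CONNECTION_STRING)
method = CloudToDeviceMethod(method_name="readHumidity", payload={})
response = registry_manager.invoke_device_method(DEVICE_ID, method)
print(response.payload)  # e.g. {"humidity": 46}
```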

Developing web applications

A web application can be developed for device management. This application can display all the sensor data received by the IoT center, making device management easier and providing restart and firmware-update functions.

Human-machine interaction

The UWP application and web application interact with each other to give the client a response, with the web application being responsible for sending the command to the designated sensor, detecting the specific sensor’s current indoor humidity, and answering the user's question. Finally, the user is provided with the current indoor humidity through the text-to-speech API.

Conclusion

In the era of the Internet of Things, man's dream of attaining a high-quality and convenient life is made possible by home appliances with voice control and response capability. The voice control function of home appliances is designed using a combination of technologies that include artificial intelligence, machine learning, natural language processing, the Internet of Things, cloud computing, data transmission, and sensors.

The use of voice-control technology in home appliances is a very forward-looking application. The future home will certainly be a place filled with smart devices that can talk to their users. It is hoped the technology will draw more scientists to this field of study and work toward constant innovation and development.





Wang Jing is a machine-learning algorithm engineer currently working in the field of automotive inspection. Passionate about creating technical articles, she hopes her writings will arouse readers' interest in artificial intelligence and inspire more professionals to combine AI with cloud technology and big data to make life safe and convenient.

