(Source: ZinetroN/Shutterstock.com)
Here is a scenario: You come home from work or school, you tell the TV what show you want to watch, and it automatically turns on and switches to your preferred channel. Or perhaps you tell the stove to prepare for low and slow cooking so that dinner is cooked at the appropriate temperature at the right time. Today, home appliances are capable of performing these functions. Through voice control, you can just relax on the sofa after a tiring day at work or school and give instructions to these appliances that obediently follow your command.
Complex architecture and wide-ranging connections are the hallmarks of the Internet of Things. More companies are choosing cloud-hosted IoT systems because cloud architecture is secure, fast, and convenient. A system becomes more secure by using several layers of encryption and authentication. AI-based model training and deployment—such as natural language processing—can be completed with just one click. An IoT cloud generally includes a sensor embedded inside a home appliance that connects to the internet via Wi-Fi. It is used to receive data and transfer it to the cloud database to be analyzed and processed in the cloud environment. In this article, cloud architecture is used as the framework to explain how voice control technology enables home appliances to obey verbal commands and respond.
With constant developments in AI and IoT, human-machine interaction (HMI) has become increasingly sophisticated. Voice control technology is one of the most widely applied and popular research topics today. The application of voice control in home appliances, which eliminates the need for familiar remote controls and enables appliances to function using verbal commands alone, is new to most people. Voice-controlled home appliances are made possible using AI, machine learning, speech recognition, IoT, and cloud computing.
A voice control system includes:
Speech recognition refers to the transformation of information from speech to text. The Azure platform's speech-to-text (STT) service uses a universal language model that is trained on Microsoft's existing data and deployed in the cloud. This model can be used as the basis for creating and training custom language models, and a specific lexicon can be selected and added to the training data as needed.
Natural language analysis/natural language processing is a part of machine learning that designs models and conducts training.
The tasks of dialog management comprise three main points:
The response text is generated based on the model's analysis of the user's command. The main purpose of speech synthesis technology is to transform text into a natural-sounding, human-like voice. The basic Azure cloud voice synthesis uses the voice SDK or the REST Application Programming Interface (API) (see details below) to achieve text-to-speech with a neural or custom voice.
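As a rough illustration of the REST route to speech synthesis, the sketch below only builds a text-to-speech request and does not send it. The region, the key, the voice name `en-US-JennyNeural`, the output format, and the function name `build_tts_request` are assumptions chosen for illustration; the endpoint and headers should be checked against the current service documentation.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_tts_request(text, region, key, voice="en-US-JennyNeural"):
    """Build (but do not send) an HTTP request asking the service to speak `text`."""
    # The request body is SSML; escape the text so '&' or '<' cannot break the XML.
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )
    return urllib.request.Request(
        url=f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        data=ssml.encode("utf-8"),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,        # assumed key-based authentication
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm",
            "User-Agent": "humidity-monitor-sketch",  # hypothetical application name
        },
    )


request = build_tts_request("The humidity is 45 percent.", "westeurope", "YOUR_KEY")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would return the synthesized audio bytes, provided the key and region are valid.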
In home appliances, the dialog models’ emotional requirements are somewhat lower because most user commands are only functional requests, such as turning on the device and requesting the temperature or humidity.
A basic solution for cloud voice control technology includes:
With the Universal Windows Platform, the same API can be universally applied to computers, smartphones, or other Windows 10 devices. In other words, the same code can be run on different terminals without writing different versions of the code for different platforms.
Voice SDK software allows manufacturers to improve voice quality in hands-free applications by using voice-band audio processing, for example for speech recognition in automotive cockpit devices.
The official documentation states that: "As an alternative method for voice SDK, the voice service allows the use of REST APIs to transform speech to text. Every accessible endpoint is connected to a certain region. The application requires a subscription key for the endpoint used. REST APIs are very limited since they can only be used in situations where voice SDKs are not available."
Using speech recognition as an example: A subscription key for the REST API must be acquired before the HTTP request is sent to the server. After authentication, the server processes the audio and returns the recognition result to the client. Figure 1 shows an example of creating and using a REST client in an application and then invoking it. When the REST client is invoked, the input is transformed into an HTTP request and sent to the REST API. The communication endpoint replies with an HTTP response, which the REST client transforms into a type that the application can recognize and returns to the application.
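The request and response cycle described here can be sketched as a small REST client. This is a minimal sketch, not the service's official client: the class name `SpeechRestClient` and the injected `send` callable are hypothetical, and the endpoint path, headers, and response fields (`RecognitionStatus`, `DisplayText`) are assumed to follow the short-audio speech-to-text REST API.

```python
import json
from urllib.parse import urlencode


class SpeechRestClient:
    """Turns application input into an HTTP request and the HTTP response into text."""

    def __init__(self, region, key, send):
        # `send(url, headers, body)` performs the HTTP POST and returns
        # (status_code, response_text). It is injected so that the sketch
        # can be run without network access.
        self.region = region
        self.key = key
        self.send = send

    def recognize(self, wav_bytes, language="en-US"):
        url = (
            f"https://{self.region}.stt.speech.microsoft.com"
            "/speech/recognition/conversation/cognitiveservices/v1?"
            + urlencode({"language": language})
        )
        headers = {
            "Ocp-Apim-Subscription-Key": self.key,
            "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
            "Accept": "application/json",
        }
        status, body = self.send(url, headers, wav_bytes)
        if status != 200:
            raise RuntimeError(f"speech service returned HTTP {status}")
        result = json.loads(body)
        # Hand back a plain string, a type the application already understands.
        if result.get("RecognitionStatus") != "Success":
            return None
        return result.get("DisplayText")


def fake_send(url, headers, body):
    """Stand-in transport that returns a canned response instead of calling the cloud."""
    return 200, '{"RecognitionStatus": "Success", "DisplayText": "Check the humidity."}'


client = SpeechRestClient("westeurope", "YOUR_KEY", fake_send)
print(client.recognize(b"RIFF"))
```

In a real deployment, `fake_send` would be replaced by a function that posts the recorded audio with an HTTP library.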
Figure 1: Creating and using a REST client in an application. (Source: gunnarpeipman.com)
We prefer not to expose the details of the REST client to the rest of our application, so an adapter for communication with external servers can be added. The adapter receives parameters of known types from the application and passes the same data on to the external server.
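One way to picture such an adapter is sketched below. All names here (`Transcript`, `RestSpeechAdapter`, `rest_call`) are hypothetical, and the raw response is assumed to be a dictionary shaped like the speech service's JSON result; the point is only that the application sees known types and never the REST details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """The only speech-related type the application needs to know about."""
    text: str
    ok: bool


class RestSpeechAdapter:
    """Hides the REST client behind a method that takes and returns known types."""

    def __init__(self, rest_call):
        # `rest_call(wav_bytes)` is assumed to return the service response as a dict.
        self._rest_call = rest_call

    def transcribe(self, wav_bytes: bytes) -> Transcript:
        raw = self._rest_call(wav_bytes)
        ok = raw.get("RecognitionStatus") == "Success"
        return Transcript(text=raw.get("DisplayText", "") if ok else "", ok=ok)


def canned_rest_call(wav_bytes):
    """Stand-in for the real REST client, used only to make the sketch runnable."""
    return {"RecognitionStatus": "Success", "DisplayText": "Check the humidity."}


adapter = RestSpeechAdapter(canned_rest_call)
print(adapter.transcribe(b"RIFF"))
```

Because the application depends only on `transcribe` and `Transcript`, the REST client could later be swapped for the SDK without touching application code.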
Azure's LUIS is a cloud-based dialog AI service that allows machines to understand human language. The mode of operation can be summed up as follows: The client directly sends a voice request to LUIS through the application. The natural language processing function in LUIS transforms the command into JSON format. After it is analyzed, the answer is also returned in JSON format. The LUIS platform provides the user with a model training service. The model has a "continuous learning" function: it responds to the client's requests while continuously and automatically making corrections to improve accuracy.
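To make the JSON exchange concrete, the sketch below parses a response shaped like a LUIS prediction result. The sample payload is illustrative only: the intent name `CheckHumidity`, the scores, and the helper `top_intent` are assumptions, and the field layout is assumed to follow the V3 prediction format.

```python
import json

# Illustrative response; a real one would come from the prediction endpoint.
SAMPLE_RESPONSE = """
{
  "query": "check the humidity",
  "prediction": {
    "topIntent": "CheckHumidity",
    "intents": {
      "CheckHumidity": {"score": 0.97},
      "None": {"score": 0.02}
    },
    "entities": {}
  }
}
"""


def top_intent(response_text):
    """Return (intent name, confidence score) for the highest-ranked intent."""
    prediction = json.loads(response_text)["prediction"]
    name = prediction["topIntent"]
    return name, prediction["intents"][name]["score"]


print(top_intent(SAMPLE_RESPONSE))
```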
Now, let’s take a look at how LUIS works using a residential humidity monitoring system as an example. What if you wanted a user to give the "check the humidity" command? LUIS incorporates the essential components of natural language processing:
Users can customize LUIS features based on their own needs. For example, when the model cannot easily recognize one or more words, new data can be added automatically for retraining.
Raspberry Pi is a development board that can connect sensors of different types. Raspberry Pi can be used with a Web server. Such a server receives different interpretation commands and sends electrical signals to control home appliances installed in the smart home.
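A minimal sketch of such a web server, using only the Python standard library, might look as follows. The command paths, the pin number, and the `set_pin` stub are hypothetical; on real hardware `set_pin` would call a GPIO library instead of printing.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from command path to (GPIO pin, output level).
COMMANDS = {
    "/humidifier/on": (17, True),
    "/humidifier/off": (17, False),
}


def set_pin(pin, level):
    """Stand-in for a real GPIO write; prints instead of driving hardware."""
    print(f"pin {pin} -> {'HIGH' if level else 'LOW'}")


def handle_command(path, write=set_pin):
    """Apply the command if it is known and return an HTTP status code."""
    if path not in COMMANDS:
        return 404
    pin, level = COMMANDS[path]
    write(pin, level)
    return 200


class CommandHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Translate the interpreted command (the URL path) into a pin change.
        self.send_response(handle_command(self.path))
        self.end_headers()


def serve(port=8080):
    """Start the server; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), CommandHandler).serve_forever()
```

Calling `serve()` on the board would let another component switch the appliance with a POST request to `/humidifier/on`.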
Voice control makes the home environment smarter and brings about home appliance automation (Figure 2). We can define it this way: Improving the homeowner's quality of life by using technologies that provide different services related to the areas of health, multimedia, entertainment, and energy.
Figure 2: Voice control technology recognizes audio commands to operate connected home appliances. (Source: Andrey Suslov/Shutterstock.com)
Let’s take a look at how voice control technology for home appliances works with a smart voice-controlled humidity monitor using cloud architecture as an example.
When a Universal Windows Platform (UWP) application runs on Raspberry Pi 3, the speech recognition API and the sensor interact with the user. Raspberry Pi 3 takes in the user’s question, and semantic analysis is performed in LUIS. The answer is finally delivered through the text-to-speech API of Cognitive Services.
Cloud computing has become the first choice in data architecture to ensure that data transmission is secure, data processing is fast, and model predictions are accurate. Cloud deployment can also significantly reduce the processing load on the device and enhance device performance while improving user experience, thus achieving a win-win outcome. The cloud architecture selected here is the Microsoft Azure cloud platform, which has recently seen major developments and innovations in the fields of AI and IoT.
Refer to the following GitHub link for an example of creating this type of solution.
https://microsoft.github.io/techcasestudies/iot/2017/06/02/Iomote.html
Data transfer from the sensor to the cloud database can already be accomplished using today's data architecture. Clients can directly use different types of databases to meet their various needs.
Example: The user wishes to know what the humidity level in their home is, so they say, "Hey, cloud! What is the humidity in the room now?" The text of the question is captured by the UWP application running on the Raspberry Pi 3 device. The application communicates with all sensors and actuators and then triggers the system to send the question to LUIS for semantic analysis.
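The trigger step can be sketched in a few lines. This is only an illustration of separating the wake phrase from the question in already-transcribed text; the regular expression and the function name `extract_question` are assumptions, not part of the original solution.

```python
import re

# Matches "Hey, cloud!" at the start of the transcript, ignoring case and punctuation.
WAKE_PHRASE = re.compile(r"^\s*hey[,\s]+cloud\b[\s!,.]*", re.IGNORECASE)


def extract_question(transcript):
    """Return the text after the wake phrase, or None if the phrase is missing."""
    match = WAKE_PHRASE.match(transcript)
    if not match:
        return None
    question = transcript[match.end():].strip()
    return question or None


print(extract_question("Hey, cloud! What is the humidity in the room now?"))
```

Only transcripts that begin with the wake phrase produce a question, so ordinary conversation in the room is not forwarded for analysis.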
LUIS is used to understand the command received from Raspberry Pi 3. Through model training, the application can recognize that the intention of the command is to detect the indoor humidity. After that, the LUIS API is added into the UWP application. When the user says the trigger command "Hey, cloud!", all contents are sent to LUIS through the API and analyzed. LUIS is called in the UWP, and it receives the input and analyzes the intention. Based on the predicted intention’s confidence level, the correct answer is provided to the user. A command is then sent to the IoT center to get the humidity from the sensor.
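The confidence check can be illustrated with a small decision function. The threshold of 0.6, the intent name `CheckHumidity`, and the action labels are hypothetical values chosen for this sketch; a real application would tune them against its own trained model.

```python
def choose_action(intent, score, threshold=0.6):
    """Decide what to do with a predicted intent and its confidence score."""
    if intent == "None" or score < threshold:
        # Low confidence: ask the user again instead of guessing.
        return ("ask_again", "Sorry, I did not understand. Could you repeat that?")
    if intent == "CheckHumidity":
        # Confident match: request a reading from the humidity sensor.
        return ("read_sensor", "humidity")
    return ("unsupported", f"I cannot handle '{intent}' yet.")


print(choose_action("CheckHumidity", 0.97))
print(choose_action("CheckHumidity", 0.31))
```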
A web application can be developed for device management. This application can display all sensor data received by the IoT center, making the management of devices easier and realizing the functions of restart and firmware update.
The UWP application and web application interact with each other to give the client a response, with the web application being responsible for sending the command to the designated sensor, detecting the specific sensor’s current indoor humidity, and answering the user's question. Finally, the user is provided with the current indoor humidity through the text-to-speech API.
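Put together, the flow from question to spoken answer can be sketched with the cloud services replaced by injected functions. Everything here is a stand-in: `understand` plays the role of LUIS, `read_humidity` the role of the web application and sensor, and `speak` the role of the text-to-speech API; the intent name and the 0.6 threshold are assumptions.

```python
def answer_humidity_question(question, understand, read_humidity, speak):
    """Run one question through analysis, sensor lookup, and spoken reply."""
    intent, score = understand(question)
    if intent != "CheckHumidity" or score < 0.6:
        reply = "Sorry, I did not understand that."
    else:
        reply = f"The humidity in the room is {read_humidity():.0f} percent."
    speak(reply)
    return reply


# Stub services so the flow can be run without any hardware or cloud account.
spoken = []
answer_humidity_question(
    "What is the humidity in the room now?",
    understand=lambda text: ("CheckHumidity", 0.95),
    read_humidity=lambda: 45.2,
    speak=spoken.append,
)
print(spoken)
```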
In the era of the Internet of Things, man's dream of attaining a high-quality and convenient life is made possible by home appliances with voice control and response capability. The voice control function of home appliances is designed using a combination of technologies that include artificial intelligence, machine learning, natural language processing, the Internet of Things, cloud computing, data transmission, and sensors.
The use of voice-control technology in home appliances is a very forward-looking application. The future home will certainly be a place filled with smart devices that can talk to their users. It is hoped the technology will draw more scientists to this field of study and work toward constant innovation and development.
Wang Jing is a machine-learning algorithm engineer currently working in the field of automotive inspection. Passionate about creating technical articles, she hopes her writings will arouse readers' interest in artificial intelligence and inspire more professionals to combine AI with cloud technology and big data to make life safe and convenient.