9.1 Development

9.1.1 Definition of Intelligence

Intelligence refers to the aggregate or global capacity of an individual to act purposefully, think rationally, and deal effectively with their environment. This definition was proposed by the renowned psychologist David Wechsler in 1944 in his book The Measurement of Adult Intelligence [1]. Intelligence can be classified into human intelligence, nonhuman animal intelligence, and artificial intelligence (AI).

AI refers to the ability of a system to perceive the environment and perform actions to achieve specified objectives to the greatest extent possible. The evaluation criteria for AI or machine intelligence are similar to those for human intelligence, which involve taking dynamic environmental changes as the system input, implementing calculations and processing, and generating more efficient outputs to achieve specified objectives. In this definition, manually operated windshield wipers are not considered an intelligent system even if the switch is on the central information display. By contrast, windshield wipers that can automatically sense rainfall and activate themselves are considered an intelligent system even if they also have a traditional physical lever.
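As a minimal illustration of this distinction, the rain-sensing wiper can be sketched as a sense-decide-act loop: an environmental input (the rain-sensor reading) is mapped to an output (the wiper setting) without human intervention. The thresholds and setting names below are illustrative assumptions, not values from any production system.

```python
# Hypothetical sketch of an intelligent wiper controller: the rain-sensor
# reading is the environmental input, the wiper setting is the output.
def wiper_speed(rain_intensity: float) -> str:
    """Map a normalized rain-sensor reading (0.0-1.0) to a wiper setting.

    The thresholds are illustrative assumptions only.
    """
    if rain_intensity < 0.05:
        return "off"
    elif rain_intensity < 0.4:
        return "intermittent"
    elif rain_intensity < 0.8:
        return "low"
    else:
        return "high"
```

A manually operated wiper, by contrast, has no such mapping from environment to action; the human closes the loop instead.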

It should be noted that the term AI used here is broader than AI in its narrow sense. In the narrow sense, AI is a specific research field that has risen to prominence in recent years and generally involves the use of machine learning algorithms for tasks that cannot be handled by simple causal rules. However, under the AI definition provided in this chapter, a system need not use complex algorithms to qualify as intelligent.

The term “smart” sometimes carries a meaning similar to that of intelligence. Smartphones and smartwatches are common terms, whereas intelligent phones or intelligent watches are not. “Smart” is not a rigorously defined term in academic research but rather a characteristic attributed to consumer products. Although a smart device has no strict definition, it typically encompasses three features: powerful computing capabilities, real-time access to the Internet, and an open operating system.

For machines, “intelligent” and “smart” have meanings that often overlap but are not equivalent. Intelligence places more emphasis on the ability to handle each specific task. For example, brands such as Mercedes-Benz and Nissan offer “intelligent headlights” to emphasize their ability to provide different lighting solutions under various environmental conditions; however, headlights rarely have functions other than illumination. Conversely, “smart” emphasizes the diversity of tasks that can be handled, as in “Smart TV,” where the most significant difference between smart and traditional televisions is the availability of richer content resources.

The term “intelligent vehicles” is more popular than “smart vehicles” or “smart cars.” The reason may be that, in addition to the intelligent cockpit HMI system, intelligent vehicles also encompass autonomous driving, which is clearly beyond the definition of a smart device. In terms of the definition scope, if we only discuss intelligent cockpit HMI systems without involving autonomous driving, the use of the term “smart” would not be inappropriate; however, most people are still accustomed to using “intelligent.”

Automotive HMI combines both meanings of intelligence and smart. On one hand, the automotive cockpit is becoming increasingly similar to a smart device. Features such as videos, games, and lifestyle services have gradually been incorporated into the vehicle. In fact, many automotive HMI systems even utilize the Android operating system, which is similar to that widely used in smartphones and televisions. On the other hand, an automotive HMI system is not simply a collection of functions. It needs to collect real-time positioning, driver status, and other traffic information as well as several other environmental factors to provide more proactive and efficient services to users. We refer to these two directions of automotive HMI development as functional and contextual intelligence, respectively.

9.1.2 Automotive Cockpit as the Best Carrier of Intelligence

Automotive HMI is expected to become the most intelligent device available to consumers in the future. The automotive cockpit will also become the most intelligent space that consumers can access. These statements may sound radical, especially given that the current level of intelligence in automotive HMI systems is not yet on par with that in smartphones. However, when we strip down both automotive HMI systems and smartphones to machines for computation and analyze their architectures, we will discover the advantages of automotive HMI.

All computational machines require input and output devices. In 1945, John von Neumann, “the Father of the Modern Computer,” proposed a computer architecture that consists of five components: input devices, memory unit, arithmetic/logic unit, control unit, and output devices, as shown in Fig. 9.1. Input and output devices are not unique to electronic computers. As early as 1833, the British inventor Charles Babbage incorporated punched cards as input devices in his Analytical Engine (a purely mechanical calculator) and used a printer, plotter, and bell as output devices.

Fig. 9.1

Computer architecture proposed by John von Neumann

The intelligence level of a machine relies heavily on its input and output devices. According to David Wechsler's definition of intelligence, the machine should be able to adapt to specified environments, that is, it should receive sufficiently rich environmental information as an input. Simultaneously, the machine should be capable of efficient response and execution, which requires it to provide sufficiently rich forms of output. These inputs and outputs should be highly automated, minimizing the need for human intervention. Therefore, sensors play a crucial role as input devices, whereas actuators are essential output devices.

Although smartphones have powerful computational capabilities, they have a limited number of built-in sensors, including cameras, microphones, and inertial measurement units (IMUs). Moreover, some sensors may not work effectively when the smartphone is not in use. For example, when a smartphone is placed face up on a flat surface, the rear camera is unable to capture anything and the front camera can only capture the ceiling, thus providing no valid information. By contrast, the cockpit of an intelligent vehicle can be equipped with a wider variety of sensors that can work effectively as long as the vehicle is in use. For instance, in the cabin, seat cushion sensors can detect the presence of passengers, in-vehicle cameras can monitor user actions and expressions, steering angle sensors can determine driver fatigue, and microphone arrays can not only capture user voices but also locate which specific user is speaking. On the vehicle body and chassis, wheel speed sensors combined with IMUs can precisely determine the vehicle's state during motion, and the power system can monitor real-time energy consumption. For the external environment, cameras and radar can monitor surrounding vehicles, pedestrians, and obstacles, while rain and light sensors can assess current weather conditions.

The output devices of smartphones are even more limited, typically consisting of only the screen and speaker. Additionally, the screen and speaker can neither directly impact the motion of any physical device nor do they fall under the narrow definition of actuators. By contrast, vehicles have a wide range of actuators in addition to screens and speakers. Each electric machine in the vehicle can be considered an actuator, allowing adjustments in the seat position, mirror angle, trunk lid opening/closing, climate control airflow, and windshield wiper speed. Moreover, the vehicle itself is a large-scale actuator as it can maintain safe driving by controlling the outputs of the powertrain and chassis systems, avoiding loss of control and collisions with other vehicles, pedestrians, and obstacles. With advanced autonomous driving, the entire vehicle can also serve as a mode of transportation, acting as an actuator that transports users to their destinations.

The abundant sensors and actuators in vehicles are designed, calibrated, and managed in a unified manner, enabling a high level of synergy among them. Although smartphones can connect to some smart home devices to access a wide variety of sensors and actuators, such user-configured systems cannot achieve the same level of coordination as that observed in vehicles.

The rich array of sensors and actuators in vehicles can create vast space for potential intelligent scenarios. Once the user enters the vehicle, the vehicle can automatically adjust the climate temperature and ambient lighting color according to the driver's physical and mental state. The vehicle can also automatically adjust the seat position and rearview mirror angle based on the current sitting posture. After the navigation destination is set, the vehicle can automatically detect gas stations or restaurants along the route and recommend 2–3 meal options according to the user's preferences for quick selection. During the driving process, the vehicle can automatically adjust the brightness, range, and beam shape of the headlights based on environmental factors such as lighting intensity, position of nearby vehicles, and weather conditions. Upon arrival at the destination, the vehicle can drive autonomously from the building entrance to the designated parking space without the need for driver control. To realize these intelligent scenarios, the automotive HMI system must be made the core of computation and decision-making, while all input and output devices in the vehicle should be fully integrated. However, if the relatively isolated smartphone is considered as the core instead, it will be challenging to integrate these vast amounts of data.

Vehicles have tremendous potential for intelligent scenarios, but realizing this vision is not an easy task. First, the traditional electrical/electronic architecture of vehicles isolates the various systems from each other. For instance, although vehicles can collect real-time wheel speed data, these data may not be fed to the navigation system to correct real-time positioning. Connecting these data requires not only scenario definitions based on user experience but also an upgrade of the electrical/electronic architecture of the entire vehicle. Second, software and algorithms are not the strengths of traditional automotive companies. Once all the data are interconnected, powerful software and algorithms are needed to analyze them. The automotive industry generally lags behind the Internet industry in this regard, which has limited the output of intelligent experiences. For example, the concept of recommending 2–3 meal options based on user preferences may sound simple; however, in reality, there is currently no smartphone software that can perform this task satisfactorily, and most users still spend a considerable amount of time browsing through lengthy menus. Additionally, data security and privacy protection are crucial issues. Even if various data can be fully integrated at the engineering level, complying with the laws and regulations related to data security is still essential. Owing to the rapid development of the intelligent automotive industry, there is often a lag in the formulation of relevant laws and regulations.

9.2 Evaluation Indexes

The evaluation of automotive HMI intelligence can be divided into three second-level evaluation indexes: comprehension, functional intelligence, and contextual intelligence.

9.2.1 Comprehension

Comprehension refers to the system's ability to understand the users' natural commands and engage in effective interactions. For automotive HMI systems, comprehension primarily focuses on the voice control modality. The voice commands from users are often not pre-defined words but rather colloquial sentences with contextual logic. Therefore, in addition to recognizing each word, the voice interaction system needs to analyze and understand these words to fully grasp the user's true intention. By contrast, interaction tasks using modalities such as touchscreens or buttons have clear operational purposes and a limited range of choices, eliminating the need for the system to comprehend the user's input. For example, in a list of navigation destinations on the central information display, each page may present six options, and the user can select and tap on one of these options. Subsequently, the system can accurately determine which option the user has selected based on the coordinate value of the touch point on the screen. In the future, the proliferation of more natural HMI modalities, such as gesture interaction, facial expression interaction, and brain-machine interfaces, may further expand the application scope of comprehension. Intelligent functions that do not require active user input do not fall under the scope of comprehension and will be discussed under contextual intelligence. Although excellent comprehension capabilities can make interactions more natural and convenient, they do not increase the number of functions or directly improve task success rates. Therefore, comprehension should not be confused with indexes under utility.

The in-vehicle voice interaction system should communicate freely with the users within a specified scope of objectives, providing them with a sense of efficiency, convenience, authenticity, reliability, and respectfulness. As voice control offers great flexibility in both input and output, designers can utilize distinctive responses to imbue vehicles with specific personalities, thereby enhancing the human–vehicle relationship, and enabling the vehicle to become more than just a tool.

The foundation of voice control comprehension lies in achieving natural language conversations [2]. Early voice control systems could only recognize specified mechanical commands such as “raise the temperature”. These commands could not be modified by users, as any changes might impede system recognition. Natural language conversations significantly relax the constraints on the range of commands, allowing users to express themselves similarly to how they would in normal interpersonal communication. For example, phrases such as “It is too cold” or “I feel a bit chilly” can be understood as a request to raise the temperature of the climate control.
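The mapping from colloquial phrases to a climate-control intent can be sketched with a simple keyword heuristic. Production systems use statistical natural-language-understanding models; the cue lists and intent names below are illustrative assumptions only.

```python
# Hypothetical keyword-based intent mapping: "It is too cold" and
# "raise the temperature" should resolve to the same climate intent.
COLD_CUES = {"cold", "chilly", "freezing", "raise the temperature"}
HOT_CUES = {"hot", "stuffy", "lower the temperature"}

def climate_intent(utterance: str):
    """Return a climate intent label, or None for non-climate commands."""
    text = utterance.lower()
    if any(cue in text for cue in COLD_CUES):
        return "raise_temperature"
    if any(cue in text for cue in HOT_CUES):
        return "lower_temperature"
    return None
```

A real system would also handle negation, degree ("a bit" vs. "very"), and zone information, which this sketch omits.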

In addition to natural language conversations, voice control should also possess comprehension capabilities such as contextual understanding, interruptibility, error correction, and sound-source localization. Contextual understanding refers to the system's ability to infer the topic of discussion based on the context when users engage in continuous conversation for more than one round with the HMI system. For example, if a user asks about the weather in Munich and then follows up with “what about Stuttgart?”, this indicates their interest in knowing the weather condition in Stuttgart rather than seeking other information about the city. Interruptibility means that users can interrupt the system's voice announcement and directly state the next command if they have already understood the system's intention before it completes the full announcement. This significantly improves the efficiency of the voice interaction process. Error correction allows users to rectify partial information when they make a mistake in their expression, without the need to restart the conversation. For instance, if a user incorrectly states the 10th digit of an 11-digit phone number, they can simply repeat the last four digits, avoiding the need to repeat the entire number. Sound-source localization refers to the ability of the voice interaction system to recognize which occupant in the vehicle is speaking via directional microphones. For example, upon identifying that a rear passenger has said “raise the temperature,” the system can adjust the temperature specifically for the rear seats.

9.2.2 Functional Intelligence

Functional intelligence in automotive HMI systems refers to the quantity and richness of open-ended applications that are not directly related to driving and vehicle control. These applications typically require Internet connectivity and may include entertainment applications, such as music, videos, and games, as well as service applications, such as dining, car wash, and parking.

The openness of these applications is reflected in two aspects: first, users can access newer and more applications through online downloads and upgrades. Second, the content of these applications is updated in real-time from online servers rather than being fixed within the local system. These applications can assume various forms within the automotive HMI system. They can exist as standalone software applications with their icons serving as gateways, similar to applications on smartphones. Alternatively, they can be integrated into existing modules. For example, users can choose to access an online music library from the music playback interface or they can click on the restaurants that appear on the map to make reservations.

Function Richness

Function richness refers to the number of functions covered by open-ended applications in automotive HMI systems. In the actual evaluation process, a function library can be established, and the proportion of functions provided by a specific HMI system can be examined.
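As a sketch of this evaluation step, function richness can be computed as the proportion of a reference function library that a given system covers. The library entries below are illustrative placeholders, not a standardized list.

```python
# Hypothetical reference function library for open-ended applications.
FUNCTION_LIBRARY = {
    "online_music", "online_video", "games", "dining",
    "car_wash", "parking", "weather", "news",
}

def function_richness(supported_functions: set) -> float:
    """Return the fraction of library functions the HMI system provides."""
    covered = FUNCTION_LIBRARY & supported_functions
    return len(covered) / len(FUNCTION_LIBRARY)
```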

The purpose of automotive functional intelligence evaluation is to integrate more valuable applications into the automotive HMI system and provide a better user experience. Therefore, in the evaluation process, we can exclude implementations that do not genuinely rely on the capabilities of the automotive HMI system itself. Three types of methods can be excluded. First, methods using remote human customer service to implement specific functions or services, such as General Motors' OnStar: although human services can provide many functions, the automotive HMI system primarily serves as a communication device throughout the service process, with virtually no involvement of its own intelligence. Second, methods using third-party non-native in-vehicle devices, such as Apple CarPlay projection: CarPlay is supported only by Apple devices and not Android devices, and hence lacks user universality; additionally, the automotive HMI system only serves as an output device and does not independently provide any intelligent services. Third, methods using non-automotive applications, such as logging into the web version of WeChat through a browser on the central information display: browser-based services are not optimized for driving scenarios and usually provide a poor user experience. Moreover, if an automotive HMI system could achieve the majority of functions and services simply by having a browser, further evaluation of such systems would be of limited significance. Common open-ended functions and services in automotive HMI systems are presented in Table 9.1.

Table 9.1 Common open-ended functions and services in automotive HMI systems

Occasionally, an automotive HMI system may provide multiple applications with similar functions. For example, in the Chinese market, some vehicles offer various online music applications, including QQ Music, Kugou Music, and Tingban, or various map navigation applications, including Amap, Baidu Map, and Tencent Map. From the perspective of function richness alone, having more applications of the same type is considered better as it provides users with more choices. However, if we consider the overall user experience, having too many applications of the same type can overwhelm users and make it difficult for them to choose. A better approach is for the automotive HMI system to integrate the resources of all similar applications into a unified platform. For instance, when a user searches for a specific song on a single online music platform, the system can automatically search for the highest quality version from multiple online music applications, eliminating the need for users to make inefficient choices, judgments, and corrections.
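The unified-platform idea described above can be sketched as a search across several provider catalogs that returns the highest-quality hit, so the user never has to choose an application first. The providers, catalog format, and bitrate-based quality criterion are hypothetical stand-ins for services such as QQ Music or Kugou Music.

```python
# Hypothetical unified search over multiple music providers.
def unified_search(song: str, catalogs: dict):
    """catalogs maps provider name -> {song title: bitrate in kbps}.

    Returns the (provider, bitrate) pair with the highest quality,
    or None if no provider carries the song.
    """
    hits = [
        (provider, tracks[song])
        for provider, tracks in catalogs.items()
        if song in tracks
    ]
    if not hits:
        return None
    # Pick the provider offering the highest-bitrate version.
    return max(hits, key=lambda hit: hit[1])
```

Real aggregation would also weigh licensing, latency, and user subscriptions, which this sketch ignores.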

The applications provided by some car models are not directly sourced from Internet companies but instead involve the participation of automotive manufacturers. These applications can generally be categorized into three types. The first type is a platform-type application that integrates content resources from multiple applications. The second type uses the brand's DNA to filter content; for example, a luxury automotive brand can package recommendations of high-end restaurants that align with its brand tonality in a restaurant recommendation application. The third type is content created by the brand itself. For instance, Nio Radio, an online radio station by Nio Inc., is the world's first user-created audio community for Internet-connected vehicles.

Whether having more open-ended functions in automotive HMI systems is better remains a topic with no absolute consensus in the industry. Why would users operate the vehicle's central information display to book a train ticket instead of booking it directly on their smartphones? Similar questions arise because the convenience of using these applications on the central information display does not necessarily surpass that of using a smartphone directly. However, if, in addition to purchasing a train ticket, a parking application can automatically reserve a parking space at the train station for the user, or if the ticketing application can provide rescheduling recommendations when encountering severe traffic congestion on the way to the train station, the advantages of the automotive HMI system over a smartphone become more apparent. Thus, several seemingly redundant in-vehicle functions are not actually useless; rather, their current designs have not yet fully optimized the user experience flow.

In some standardized evaluation processes, where it is not feasible to fully and quantitatively assess the functions provided by each application, we can make the following general assumptions: it is better to have more types of functions and services; it is better to have a greater number of applications within each type; and brand-customized applications are superior to generic applications.

Content Resource Richness

Content resource richness refers to whether the online resources provided by open-ended applications in automotive HMI systems can meet the users' common needs. These online resources include purely digital content resources (e.g., music and movies) and points of interest with real locations (e.g., restaurants and gas stations).

When evaluating content resource richness, the content contained in third-party non-native in-vehicle devices, such as the Apple CarPlay projection, is typically not included. Under current technological conditions, smartphones serve as the gateway to almost all online content resources. For example, nearly any song can be found in mainstream online music applications, and almost all points of interest, including restaurants and gas/charging stations, can be found in mainstream map or service applications. If all these content resources from smartphones are considered, all vehicles would have access to highly comprehensive content resources, rendering the evaluation meaningless.

Content resource richness is subject to dynamic changes. For example, in the Chinese market in 2017, SAIC Roewe RX5 was launched, which was equipped with the Banma intelligent HMI system jointly developed by the SAIC Group and Alibaba, and integrated with the then-rich resource platform Xiami Music. This system was later adopted in other car models under the SAIC Group, including MG HS, as shown in Fig. 9.2. However, owing to Alibaba's gradual defeat in the competition for music copyrights against Tencent and NetEase, music resources in the Banma system decreased significantly. Subsequently, the Banma system introduced music resources from the Tencent-owned platform to expand its online music resources.

Fig. 9.2

(Source SAIC MG)

Banma Intelligent HMI System offering Xiami Music in MG HS (2018)

9.2.3 Contextual Intelligence

Even if an automotive HMI system has a wide range of functions and abundant content resources, this does not necessarily ensure good usability. The functions and content in HMI systems should also be matched and optimized for in-vehicle scenarios to achieve better contextual intelligence.

Contextual intelligence is more important for vehicles than for smartphones. This is because drivers often need to operate the automotive HMI system while driving, and significant driver distraction can cause potential hazards in driving safety. Therefore, intelligent functions should neither consume excessive time nor require substantial effort. For example, on a smartphone, users often spend several minutes browsing a restaurant menu, making selections, and placing an order; however, in a driving scenario, even a few seconds spent browsing the menu may pose serious safety hazards. Therefore, food service applications need to recommend a very limited number of choices to the users to minimize driver distraction. Such accurate dish recommendations rely heavily on rich data collection and powerful intelligent algorithms.
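Recommending a very limited number of dishes so that the driver only glances at two or three options can be sketched with a frequency heuristic over the user's order history. This heuristic is an illustrative assumption; real systems would combine many more signals (time of day, location, ratings).

```python
from collections import Counter

# Hypothetical sketch: surface only the k most frequently ordered dishes
# to minimize the time the driver spends looking at the screen.
def recommend_dishes(order_history: list, k: int = 3) -> list:
    """Return the k most frequently ordered dishes, most frequent first."""
    counts = Counter(order_history)
    return [dish for dish, _ in counts.most_common(k)]
```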

Furthermore, in non-driving scenarios, smartphone users have moderately low cross-application demands, whereas such demands occur very frequently during in-vehicle driving scenarios. For example, when using food service applications on a smartphone at home or in a shopping mall, users typically focus on comparing dishes from each restaurant and reading reviews from other customers without the need for navigation, which implies they do not need to switch to other applications. Conversely, when driving in a vehicle, users are likely to start navigation after selecting a desired restaurant. Many food service applications on smartphones do not have built-in navigation functions; therefore, users must switch to other dedicated navigation software, which disrupts the user experience flow. By contrast, in some automotive HMI systems, map navigation and restaurant searching are integrated into the same application, thereby eliminating the need for users to switch between applications and providing a more seamless user experience.

Although services in automotive scenarios are faced with challenges such as driver distraction or cross-application demands, designing automotive intelligent scenarios also has an advantage that the user's intention can be determined more efficiently. When a user picks up their smartphone and unlocks the screen at home, they may want to use a video application for entertainment, contact friends through WeChat, check the weather, or perform one of many other functions. Accurately determining the user's intention is difficult for a smartphone. However, in automotive scenarios, inferring the user's intention is considerably easier. When a user enters the car, they are likely to set a navigation destination. When encountering traffic congestion, they may be interested in some soothing music. When approaching a shopping mall, they may want to learn about special offers by the stores in the mall. By integrating various data and performing computations and predictions, automotive HMI systems can potentially provide users with more proactive, seamless, and intuitive interaction experiences.
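The context-to-intention inference described above can be sketched as a small set of prioritized rules. The context keys, intent labels, and priorities are assumptions for demonstration only; a production system would use probabilistic models over far richer data.

```python
# Hypothetical rule-based inference of the most likely user intention
# from the current driving context, checked in priority order.
def infer_intention(context: dict) -> str:
    if context.get("just_entered_vehicle"):
        return "set_navigation_destination"
    if context.get("traffic_jam"):
        return "play_soothing_music"
    if context.get("near_shopping_mall"):
        return "show_store_offers"
    return "no_proactive_action"
```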

1. Definition of Scenario Storylines

The series of situations and corresponding behaviors that users encounter while planning a trip and driving or riding in a vehicle until they reach their destination constitute the travel scenario storyline. The most common travel purposes for Chinese automotive users are daily commuting, shopping at malls, urban recreation, suburban outings, and long-distance road trips. In specific travel scenarios, users will have varying needs that arise from factors such as changes in the stage of vehicle usage, driving routes, road conditions, and weather conditions, among others. The scenario storyline generally includes the following stages of vehicle usage: preparation, departure, on route, arrival, refueling or charging, and third space, as shown in Fig. 9.3. However, some storylines may contain only three to five of the aforementioned stages.

Fig. 9.3

Six stages of a complete travel scenario storyline

Next, we will introduce a typical scenario storyline for a daily commute. The protagonist is Nick, who lives in a city and drives an electric car to work on weekdays.

At 7 o'clock on a winter Monday morning, Nick wakes up at home. In the preparation stage, he first turns on his cell phone to check the weather and road conditions. The Monday morning rush hour traffic is as expected. Nick estimates his departure time while quickly freshening up and getting dressed. At this point, if the car can proactively send traffic information to Nick's phone for his commute to work, he will only need to tap the screen to learn about this information, without having to locate and activate the navigation application on his phone and enter the address.

In the departure stage, Nick heads to the underground garage, finds his car in the designated parking spot, unlocks it with the key, and enters the car. After entering the car, he places his laptop bag and cell phone in their designated places. After an entire night in the cold garage, the steering wheel and leather seats have become extremely cold. Nick first turns on the climate control system to warm up the vehicle, which also makes his hands more flexible for gripping the steering wheel. In this case, if the vehicle supported remote climate control, Nick could have turned it on 10 min before leaving his house to ensure a warm and comfortable environment upon entering the car. After adapting to the temperature inside the car, he leans forward to activate the in-vehicle navigation software, enters his work address, and selects the most time-efficient route for navigation. Subsequently, Nick opens the in-vehicle music app, finds his favorite playlist, and begins to play it. Once everything is ready, he shifts the gear to Drive, checks his surroundings, and drives out of the parking space.

During the on route stage, Nick needs to pass through residential roads, urban roads, and expressways. When he exits the community, he notices that the visibility is poor. Therefore, he pulls over by the road near the community gate, opens the in-vehicle weather application to check the air quality index, and finds a yellow warning for haze. Thus, he turns on the air purification function of the car's climate control system before proceeding onto the main urban road. If the vehicle could proactively remind the driver to activate the purification function when it detects poor external air quality, Nick could have avoided the need to stop the car to perform complicated operations.

After Nick drives on urban roads for a while, the road becomes congested because of the morning rush-hour traffic, with an endless line of red lights ahead. He wants to take advantage of this time to pre-order breakfast from his favorite coffee shop on the bottom floor of his office building so that he can pick it up directly after arriving. Ordering food on a cell phone is cumbersome, and he must occasionally check the distance to the vehicle in front, which makes Nick feel somewhat anxious. At this point, if the vehicle were equipped with a food service application that displays several recommended options based on the user's order history, Nick could simply glance at them, select a few desired items, and complete payment using the car's voice interaction system, thus avoiding excessive visual distraction and maintaining situational awareness. Once the breakfast has been successfully ordered, the road also becomes clear. With the music playing, Nick drives onto the urban expressway. He is very familiar with this road; therefore, he does not need to constantly check the navigation information. After driving for more than 10 min, he reaches the expressway exit and finds that the usually smooth exit is now crowded because a vehicle ahead was scratched while changing lanes. Nick regrets not taking the previous exit, as waiting for two additional traffic lights would have been faster than sitting in the current queue. Nick then increases his speed to avoid being late for work. In fact, the car could have proactively alerted Nick before the previous exit that the road ahead was congested and advised him to leave the expressway earlier to reduce the waiting time and arrive at work faster.

After a journey of more than 30 min, Nick finally arrives at the company parking area and enters the parking stage. At this point, the vehicle has only 30% of the battery remaining, and Nick hopes to fully charge it at his workplace before driving back home later. He drives to the charging piles in the parking area, only to find that all of them are occupied. Thus, he has no choice but to park temporarily in another parking lot and plans to look for an opportunity to charge his car at noon. If the vehicle had proactively inquired about the need for charging when the battery level fell below a certain threshold and offered options to reserve a charging pile, Nick would not have to worry about lacking time to move his car to the charging pile because of an unscheduled meeting at noon.

After parking the car, Nick unbuckles his seatbelt, takes his laptop bag and phone, and opens the door to step out. Nick's phone suddenly rings and it turns out to be a colleague asking about the location of the meeting room he had reserved. He tries to recall the room number and briefly chats with his colleague about the meeting content while thinking about the route to the coffee shop. When Nick arrives at the coffee shop and ends the call, he suddenly realizes that he may have forgotten to lock the car, so he has to go back and check. If he could view and control the car's door lock through a mobile application, he would not have to spend time returning to the parking area.

Similar scenario storylines can be designed for various situations, such as picking up kids from school, meeting with friends on weekends, or going on a family outing to the suburbs. In these storylines, our primary focus is not to evaluate individual car technologies and functions but to assess whether these functions can seamlessly integrate with the previous and subsequent tasks in the scenario storyline and provide a more efficient, smooth, and seamless experience.

2. Number of Operation Steps and Intelligence Level

Various criteria can be applied to assess the intelligence level of automotive HMI systems, such as the total time required by users to complete a set of tasks or the degree of satisfaction they experience after completing a task. Under current technological conditions, the number of operation steps is an index that is highly correlated with intelligence level: the fewer the operation steps needed to complete a task, the higher the intelligence level. In human-to-human communication, conveying less content to achieve the same communicative purpose cannot necessarily be equated with higher intelligence, because effective interpersonal communication involves not only efficiency but also proper etiquette, emotional expression, and mutual empathy. However, as the intelligence level of vehicles is still far below that of humans, such high requirements need not be imposed. If vehicles can achieve optimal efficiency, they will be able to meet the majority of user requirements across different scenarios. The most intuitive performance index of operational efficiency is the number of operation steps. Therefore, if we can use only a single objective and quantitative index to describe the intelligence level of an automotive HMI system in achieving specified goals, the choice should be the number of operation steps.

Optimizing the number of operation steps through intelligent means can be achieved in two ways. The first approach is proactive recommendations. For example, during mealtime, if the map can automatically display a gateway for restaurant selection, users can directly tap on it to search for nearby restaurants, as shown in Fig. 9.4. Otherwise, users would need to enter the directory to search for points of interest and select a restaurant, which requires 2–3 additional steps. If the system can recommend a specific restaurant based on the user's preferences, it can also save them the trouble of searching through a list. However, such a design requires a higher intelligence level to achieve precise recommendations, as inaccurate recommendations may confuse the users. The second approach is using scenario-based modular designs. For example, when users want to take a nap inside their vehicle, they typically need to close the sunshade, control the windows, set the alarm, and adjust the music and lighting, among other tasks. In a traditional interaction logic tree, these functions are distributed in different locations, and some functions are located at deep logic levels, requiring users to perform several tedious operations. By contrast, if these functions are integrated into a “nap mode” widget or shortcut directory, users can simply activate this mode to conveniently operate all functions, significantly reducing the total number of operation steps, as shown in Fig. 9.5. Furthermore, if the system can automatically detect when the user wants to rest inside the vehicle after parking, it can automatically enter the nap mode, further reducing the operation steps and enhancing the intelligence level.

Fig. 9.4 Restaurant selection gateway in the Mercedes-Benz S-Class (2020), which automatically pops up at mealtime. The screenshot shows the navigation map with a restaurant icon at the top of a list of four options

Fig. 9.5 Nap mode widget in the 2021 Geely Xingyue L (translated from Chinese). The screenshot shows a timer on the left and a route map with six function icons on the right

In the two cases mentioned above, we can observe that, under current technological conditions, good contextual intelligence can be simply achieved by using excellent design and simple logic. However, for contextual intelligence to achieve its fullest potential, we must rely on richer data inputs and more powerful intelligent algorithms.
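The scenario-based modular design described above can be sketched in a few lines of code. The function names and settings below are hypothetical illustrations; the text only states that a "nap mode" groups scattered functions (sunshade, windows, alarm, music, lighting) behind a single activation, collapsing many deep-menu operations into one step.

```python
# Illustrative sketch of a scenario-based modular design ("nap mode").
# The grouped functions and their settings are assumptions for demonstration,
# not a real vehicle API.

nap_mode = [
    ("sunshade", "close"),
    ("windows", "close"),
    ("alarm", "set_20_min"),
    ("music", "soft_playlist"),
    ("ambient_light", "dim"),
]

def activate_mode(actions):
    """Apply every grouped action; the user performs only one operation step."""
    return [f"{function} -> {setting}" for function, setting in actions]

# One tap on the nap-mode widget replaces five separate operations that
# would otherwise be distributed across deep menu levels.
for line in activate_mode(nap_mode):
    print(line)
```

In a traditional interaction logic tree, each of these five functions would cost at least one operation step plus the navigation steps to reach it; the mode widget reduces the total to a single activation.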

When counting the number of operation steps, careful attention must be paid to how "one step" is defined. A single tap on the central information display, a press of a physical button, or a brief voice command can each be considered one operation step. However, performing a precise sliding gesture on the central information display, such as increasing the climate temperature by 5° in some car models, is more demanding than a single tap. Similarly, stating a specific navigation address is more complicated than issuing quick commands such as "confirm" or "cancel." Therefore, more complex interaction steps can be assigned a coefficient greater than 1 to calibrate their actual operational load.
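The weighted step count can be made concrete as follows. The specific coefficient values below are assumptions chosen for illustration; the text only states that complex steps should receive a coefficient greater than 1.

```python
# Illustrative weighted operation-step count for automotive HMI tasks.
# Weights are assumed values: simple actions count as 1, while a precise
# sliding gesture or a long voice utterance is weighted more heavily.

STEP_WEIGHTS = {
    "tap": 1.0,          # single tap on the central information display
    "button": 1.0,       # press of a physical button
    "quick_voice": 1.0,  # brief command such as "confirm" or "cancel"
    "slide": 1.5,        # precise sliding gesture (e.g., temperature +5 degrees)
    "long_voice": 2.0,   # stating a full navigation address
}

def weighted_steps(steps):
    """Sum the weighted operational load of a sequence of steps."""
    return sum(STEP_WEIGHTS[s] for s in steps)

# Comparing two hypothetical ways to set a navigation destination:
voice_task = ["quick_voice", "long_voice"]           # wake word + address
manual_task = ["tap", "tap", "slide", "tap", "tap"]  # menu navigation + input

print(weighted_steps(voice_task))   # 3.0
print(weighted_steps(manual_task))  # 5.5
```

Under these assumed weights, the voice path carries a lower operational load, which matches the intuition that a well-designed voice interaction raises the perceived intelligence level.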

In automotive HMI systems, certain purely entertainment-oriented functions, such as video games and casual voice chats, do not necessarily need to pursue efficiency. Therefore, the intelligence level of these functions cannot be assessed through step counting and usually requires a more subjective and non-standardized evaluation method. However, these entertainment-oriented functions are not essential in automotive HMI systems, and, even if they exist, do not constitute a significant proportion.

3. Other Indexes of Contextual Intelligence

In addition to the two indexes directly related to the number of operation steps, namely, proactive recommendations and scenario-based modular design, contextual intelligence also needs to consider the sense of immersion, personalization, and privacy protection. A sense of immersion refers to the comprehensive atmosphere created by the system's functions or services in a specified scenario, enabling users to immerse themselves in an enjoyable experience; examples include extensive ambient lighting, captivating on-screen visuals, and surround sound effects. Personalization means that, in a specified scenario, the system can provide targeted and differentiated functions or services for different users at different times and in different environments, such as automatically adjusting the seat position according to the user's body size or recommending familiar restaurants and dishes when entering a commercial district. Privacy protection refers to the security functions or services provided by the system to protect the user's privacy. On the one hand, this includes compliance with the corresponding data security regulations. For example, cars sold in China should ensure that images from the car's external cameras cannot be directly transmitted outside the vehicle (e.g., to the cloud or the user's cell phone) unless human faces and license plates in the images are blurred. On the other hand, it also involves providing users with a subjective sense of privacy, such as blocking in-vehicle cameras with physical covers.