


Technical Field
The invention belongs to the technical field of automotive communication, and in particular relates to a machine-learning-based method and system for automatically answering incoming calls on a vehicle-mounted Bluetooth phone.
Background Art
With the growing adoption of in-vehicle infotainment systems, the in-vehicle APPs available on vehicles have become increasingly diverse. However, the head-unit screen sits at some distance from the driver, and operating it while the vehicle is moving pulls the driver's eyes off the road, which compromises driving safety. An incoming call on a Bluetooth phone, for example, requires the driver to actively tap to answer. Some Bluetooth phone APPs add an automatic answering function, so that incoming calls are answered without the driver tapping. But drivers do not want every incoming call answered, i.e. they do not want the Bluetooth phone APP to execute the automatic answering instruction for every call. How to decide whether a given call should be answered automatically is therefore the key problem.
Summary of the Invention
The purpose of the present invention is to provide a machine-learning-based method and system for automatically answering incoming calls on a vehicle-mounted Bluetooth phone. Building on the automatic call-answering function of the vehicle-mounted Bluetooth phone while the vehicle is in motion, the method and system use machine learning to answer calls selectively and automatically.
The technical solution adopted by the present invention is as follows:

A machine-learning-based method for automatically answering incoming calls on a vehicle-mounted Bluetooth phone, comprising the following steps:
1) When there is an incoming call, determine whether the vehicle's Bluetooth is connected to the Bluetooth phone;

2) When the Bluetooth phone is connected, determine whether the vehicle is in motion;

3) When the vehicle is determined to be in motion, use reinforcement learning to decide whether to answer the incoming number, and count the calls from that number:

(1) read the stored call count for the incoming number from the storage medium, update it, and determine whether the number's call count exceeds a threshold;

(2) when the incoming number has called more than the threshold number of times, show an incoming-call reminder on the head-unit screen;

(3) after waiting for a set time, the processor outputs an action based on the driver's previous handling of the current number: automatic answer, automatic answer in privacy mode, or automatic hang-up (a sketch of this flow follows the list).
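The three steps above map naturally onto a single call-handling routine. The following is a minimal Python sketch of that flow, assuming hypothetical `store`, `agent`, and `hmi` objects; none of these names or interfaces are defined by the invention itself.

```python
# Minimal sketch of steps 1)-3); all object and method names are hypothetical.
CALL_THRESHOLD = 10   # "a certain number of times" (8-12 in a further aspect)
WAIT_SECONDS = 5      # "a certain time" (4-7 s in a further aspect)

def on_incoming_call(number, store, agent, hmi):
    if not hmi.bluetooth_connected():           # step 1): car <-> phone Bluetooth link
        return                                  # phone page stays grayed out
    if not hmi.vehicle_moving():                # step 2): vehicle in motion?
        return                                  # driver handles the call on the phone
    count = store.get_call_count(number) + 1    # step 3)(1): update the call count
    store.set_call_count(number, count)
    if count <= CALL_THRESHOLD:                 # learning phase: count only
        return
    hmi.show_incoming_call(number)              # step 3)(2): reminder on the screen
    action = hmi.wait_for_driver(WAIT_SECONDS)  # window reserved for the driver
    if action is not None:
        agent.record_sample(number, action)     # driver's choice becomes a sample
    else:
        agent.auto_respond(number)              # step 3)(3): answer / privacy / hang up
```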
In a further aspect, in step (1), when the incoming number is not present in the storage medium, the number is judged to be calling for the first time and no learning target can be provided for it; while a number's call count is below the threshold, the method is in a learning phase in which it only counts that number's calls and makes no response.

In a further aspect, in step (3), the waiting period is an operation window reserved for the driver. If the driver acts within this window, the processor records the action as a learning sample and does not output an automatic answer, privacy-mode answer, or hang-up based on the driver's previous handling of the current number.

In a further aspect, the threshold is 8-12 calls.

In a further aspect, the waiting period is 4-7 seconds.

In a further aspect, in step 3), after the processor answers the incoming number, it asks the driver whether they are satisfied; the response serves as the environment reward R and is recorded as a basis for learning.
In a further aspect, the reinforcement-learning method comprises building, simplifying, and solving a model.

Model building:

The model comprises the environment state S, the agent's action A, the environment reward R, the agent's policy π, the state-value function vπ(s), the action-value function qπ(s,a), and the environment's state-transition model;

the environment is the incoming-call scenario; every incoming call changes the environment, so there are infinitely many environment states S;

the agent's actions A are automatic answer, automatic answer in privacy mode, and automatic hang-up;

the environment reward R takes the values satisfied, dissatisfied, and neutral, encoded as 1, 0, and 0.5 respectively;

the policy π is the basis for the agent's action A. Here a conditional probability distribution π(a|s) is used, giving the probability of taking action a in environment state s, i.e. $\pi(a|s) = P(A_t = a \mid S_t = s)$; the larger an action's probability, the more likely it is to be executed;

for an agent following policy π in environment state s, the state-value function vπ(s) differs from the environment reward R: it accounts not only for the reward $R_{t+1}$ received after the action but also for the subsequent $R_{t+2}, R_{t+3}, \ldots$, giving

$v_\pi(s) = E_\pi(R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s)$,

where γ is the discount factor, taking a value in [0, 1];

the action-value function qπ(s,a) is

$q_\pi(s,a) = E_\pi(R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s, A_t = a)$;

the environment's state-transition model gives the probability of moving to the next environment state S' when action a is taken in state S.
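To make the model elements concrete, here is one possible encoding in Python; the type names and fields are illustrative choices, not part of the claim.

```python
# Sketch of the model elements as data types; names are illustrative.
from dataclasses import dataclass
from enum import Enum

class Action(Enum):                 # the agent's action A
    ANSWER = "auto_answer"
    PRIVACY_ANSWER = "privacy_mode_answer"
    HANG_UP = "auto_hang_up"

REWARD = {"satisfied": 1.0, "neutral": 0.5, "dissatisfied": 0.0}  # reward R

@dataclass(frozen=True)
class State:                        # one environment state S: a call scenario
    number: str                     # caller id
    call_count: int                 # how often this number has called

def discounted_return(rewards, gamma=0.9):
    """Discounted sum R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + ... from vπ's definition."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))
```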
Simplification:

A Markov decision process is used to simplify the reinforcement-learning model. In the true environment dynamics, the incoming call that leads to the next environment state S' depends on the previous call, and also on the call before that; the simplification is to assume the Markov property for state transitions, i.e. the probability of the next state depends only on the current state, expressed as:

$P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_1, S_2, \ldots, S_t)$

The agent's policy π, the state-value function vπ(s), and the action-value function qπ(s,a) are given the same Markov assumption.

Solving:
Rewriting the state-value function vπ(s) and the action-value function qπ(s,a) as Bellman equations gives:

$v_\pi(s) = E_\pi(R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s)$

$q_\pi(s,a) = E_\pi(R_{t+1} + \gamma q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a)$

Using the definitions of vπ(s) and qπ(s,a), the relation between the two follows:

$v_\pi(s) = \sum_a \pi(a|s)\, q_\pi(s,a)$

$q_\pi(s,a) = R_s^a + \gamma \sum_{s'} P_{ss'}^a\, v_\pi(s')$

Combining the two expressions gives:

$v_\pi(s) = \sum_a \pi(a|s) \big( R_s^a + \gamma \sum_{s'} P_{ss'}^a\, v_\pi(s') \big)$

$q_\pi(s,a) = R_s^a + \gamma \sum_{s'} P_{ss'}^a \sum_{a'} \pi(a'|s')\, q_\pi(s',a')$

Finding the optimal policy π*:

$\pi^* = \arg\max_\pi v_\pi(s)$ for every state s,

the state-value function vπ(s) and the action-value function qπ(s,a) can then be written as:

$v_*(s) = \max_a q_*(s,a)$

$q_*(s,a) = R_s^a + \gamma \sum_{s'} P_{ss'}^a\, v_*(s')$

This yields the final expressions; substituting the preceding incoming-call history into the equations gives the optimal action for the next incoming call.
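The text stops at the Bellman optimality equations and does not name a solver. One standard way to solve them when the transition model and rewards are tabulated is value iteration; the sketch below assumes dict-based tables P and R and is only one possible instantiation, not the patent's prescribed method.

```python
# Value-iteration sketch for the Bellman optimality equations above.
# P[s][a] -> list of (probability, next_state); R[s][a] -> expected reward.
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # q*(s,a) = R(s,a) + γ Σ_s' P(s'|s,a) v*(s')
            q = {a: R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])
                 for a in actions}
            best = max(q.values())          # v*(s) = max_a q*(s,a)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    # greedy policy π* extracted from q*(s,a)
    return {s: max(actions,
                   key=lambda a: R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a]))
            for s in states}
```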
In a further aspect, when the Bluetooth phone's Bluetooth is not connected, the interface is grayed out and cannot be operated; when the vehicle is not in motion, the driver operates the mobile phone directly, and the answering action and call count for that call are still recorded.
The present invention also provides a machine-learning-based system for automatically answering incoming calls on a vehicle-mounted Bluetooth phone. The system uses the above machine-learning-based method and comprises a Bluetooth module, a computing processor, and a storage medium;

the Bluetooth module is configured to receive data from the mobile phone and pass it to the computing processor;

the computing processor is configured to receive the data from the Bluetooth module, record the driver's operations, and perform reinforcement learning;

the storage medium is configured to store the data from the computing processor.
In a further aspect, the computing processor comprises a judgment module, a reinforcement-learning module, a calculation module, and an execution module; the reinforcement-learning module is connected to the judgment module, the calculation module, and the execution module respectively;

the judgment module determines, when there is an incoming call, whether the vehicle's Bluetooth is connected to the Bluetooth phone and, when it is, whether the vehicle is in motion;

the reinforcement-learning module decides via reinforcement learning whether to answer the incoming number;

the calculation module counts the incoming number's calls and records the driver's handling of each call;

the execution module executes the commands from the reinforcement-learning module: automatic answer, automatic answer in privacy mode, or automatic hang-up.
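As a sketch of how these four modules could cooperate inside the computing processor, the class below wires them together; the class and method names mirror the modules in the claim but are illustrative assumptions.

```python
# Illustrative wiring of the four processor modules; names are assumptions.
class ComputingProcessor:
    def __init__(self, judge, learner, counter, executor, store):
        self.judge, self.learner = judge, learner       # judgment / RL modules
        self.counter, self.executor = counter, executor # calculation / execution
        self.store = store                              # storage medium

    def handle_call(self, number):
        # judgment module: Bluetooth connected and vehicle in motion?
        if not (self.judge.bluetooth_connected() and self.judge.vehicle_moving()):
            return
        self.counter.record_call(number, self.store)    # calculation module
        action = self.learner.decide(number, self.store)  # RL module
        if action is not None:
            self.executor.run(action)                   # answer / privacy / hang up
```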
The present invention decides whether the function is enabled by checking the vehicle's driving state; the function also relies on the Bluetooth phone and cannot operate when the Bluetooth phone is not connected.

The automatic answering function of the present invention can handle an incoming call in one of four ways: automatic answer; automatic hang-up; automatic answer in privacy mode (the caller's number is hidden, the audio is routed to the mobile phone, and the head unit only shows that a call is in progress); or waiting for the driver to act.

Machine learning is used to analyze incoming calls. An incoming call has only three final outcomes: answered, answered in privacy mode, or not answered (while the vehicle is moving the driver is never unaware of an incoming call, so an unanswered call is treated as a refusal to answer). Since the decision of whether to answer depends on the driver's past behavior, a reinforcement-learning algorithm can be used to make it.

Beneficial technical effects of the present invention:

The invention realizes automatic answering of incoming calls while the vehicle is in motion, improving driving safety. Through reinforcement learning, calls are answered in line with the driver's habits, making the head unit more convenient to use. The multiple answering modes also enrich the head unit's functions.
Brief Description of the Drawings

The present invention is further described below with reference to the accompanying drawings and embodiments, in which:

Fig. 1 is a schematic diagram of the machine-learning-based system for automatically answering incoming calls on a vehicle-mounted Bluetooth phone;

Fig. 2 is a flowchart of the reinforcement learning;

Fig. 3 is a flowchart of the machine-learning-based method for automatically answering incoming calls on a vehicle-mounted Bluetooth phone.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.

Embodiment 1

Referring to Fig. 2 and Fig. 3, a machine-learning-based method for automatically answering incoming calls on a vehicle-mounted Bluetooth phone comprises the following steps.
First, the function is usable only when the vehicle speed is non-zero and Bluetooth is connected. Since the function is built on the Bluetooth phone, the Bluetooth phone's connection state (hfp_connected) can be read directly; when hfp_connected is not true, the Bluetooth phone page is grayed out. The vehicle speed state (car_speed) is handled by the middleware layer, which processes the speed information and exposes an interface to the APP layer (a speed sensor samples the vehicle speed and passes the data to the computing processor). The function can be used when hfp_connected is true and car_speed is greater than zero. The head unit also provides an on/off switch for the function; the function is available only when the switch is turned on.
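A minimal sketch of this enabling check follows. Only hfp_connected and car_speed are named in the text; the stub readers and the switch accessor are assumptions standing in for the middleware interfaces.

```python
# Sketch of the enabling check; the three readers are hypothetical stubs.
def read_hfp_connected() -> bool:
    return True          # stub: would query the Bluetooth HFP connection state

def read_car_speed() -> float:
    return 30.0          # stub: middleware-provided vehicle speed, km/h

def feature_switch_on() -> bool:
    return True          # stub: the head unit's on/off switch for the feature

def auto_answer_enabled() -> bool:
    # usable only when the switch is on, hfp_connected is true, and car_speed > 0
    return feature_switch_on() and read_hfp_connected() and read_car_speed() > 0
```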
Reinforcement learning must judge from past experience: if a number is calling this head unit for the first time, there is no learning target for it. The function is therefore enabled for a number only after it has called more than ten times; the first ten calls are the learning phase, during which no response is made. After the tenth call, each incoming call gives the driver a five-second reaction window. If the driver acts within five seconds, the action is recorded as a learning sample; if the driver does nothing within five seconds, the system responds automatically. The reinforcement-learning data is kept in the head unit's storage medium throughout.
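The ten-call learning gate and the five-second driver window can be sketched as below; the dict stands in for the head unit's persistent storage and a threading.Event for the driver's input path, both assumptions for illustration.

```python
# Per-number learning gate plus the five-second reaction window.
import threading

LEARNING_CALLS = 10                 # learning phase: first ten calls
DRIVER_WINDOW_S = 5.0               # seconds reserved for the driver
call_counts: dict[str, int] = {}    # stand-in for the storage medium

def on_call(number: str, driver_acted: threading.Event, auto_respond) -> None:
    call_counts[number] = call_counts.get(number, 0) + 1
    if call_counts[number] <= LEARNING_CALLS:
        return                       # learning phase: count only, no response
    # wait up to five seconds for the driver; auto-respond only if they do nothing
    if not driver_acted.wait(timeout=DRIVER_WINDOW_S):
        auto_respond(number)
```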
Reinforcement learning proceeds in three stages: building the model, simplifying the model, and solving it.

Stage one, building the model:

The model comprises the environment state S, the agent's action A, the environment reward R, the agent's policy π, the state-value function vπ(s), the action-value function qπ(s,a), and the environment's state-transition model.

The environment is the incoming-call scenario; every incoming call changes the environment, so there are infinitely many environment states S.

For each call, the agent's actions A are automatic answer, automatic answer in privacy mode, and automatic hang-up.

For the environment reward R: since the present invention answers the call in the driver's place, the driver's satisfaction with the action cannot be judged directly, so after the call ends the driver is actively asked whether they are satisfied with the action; satisfied is 1, dissatisfied is 0, and neutral is 0.5.

The policy π is the basis for the agent's action A. Here a conditional probability distribution π(a|s) is used, giving the probability of taking action a in environment state s, i.e. $\pi(a|s) = P(A_t = a \mid S_t = s)$; the larger an action's probability, the more likely it is to be executed. Here t is the time step, t = 0, 1, 2, 3, ...

For an agent following policy π in environment state s, the value of acting (the state-value function) vπ(s) differs from the environment reward R: it accounts for the reward $R_{t+1}$ received after the action as well as the subsequent $R_{t+2}, R_{t+3}, \ldots$:

$v_\pi(s) = E_\pi(R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s)$,

where γ is the discount factor, taking a value in [0, 1], and E_π denotes the expectation under policy π.

The action-value function qπ(s,a) is:

$q_\pi(s,a) = E_\pi(R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s, A_t = a)$

The environment's state-transition model gives the probability of moving to the next environment state s' when action a is taken in state s.
Simplification:

A Markov decision process is used to simplify the reinforcement-learning model. In the true environment dynamics, the incoming call that leads to the next environment state s' depends on the previous call, and also on the call before that; the simplification is to assume the Markov property for state transitions, i.e. the probability of the next state depends only on the current state:

$P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_1, S_2, \ldots, S_t)$

The agent's policy π, the state-value function vπ(s), and the action-value function qπ(s,a) are given the same Markov assumption.

Solving:

Rewriting the state-value function vπ(s) and the action-value function qπ(s,a) as Bellman equations gives:

$v_\pi(s) = E_\pi(R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s)$

$q_\pi(s,a) = E_\pi(R_{t+1} + \gamma q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a)$

Using the definitions of vπ(s) and qπ(s,a), the relation between the two follows:

$v_\pi(s) = \sum_a \pi(a|s)\, q_\pi(s,a)$

$q_\pi(s,a) = R_s^a + \gamma \sum_{s'} P_{ss'}^a\, v_\pi(s')$

Combining the two expressions gives:

$v_\pi(s) = \sum_a \pi(a|s) \big( R_s^a + \gamma \sum_{s'} P_{ss'}^a\, v_\pi(s') \big)$

$q_\pi(s,a) = R_s^a + \gamma \sum_{s'} P_{ss'}^a \sum_{a'} \pi(a'|s')\, q_\pi(s',a')$

Finding the optimal policy π*:

$\pi^* = \arg\max_\pi v_\pi(s)$ for every state s,

the state-value function vπ(s) and the action-value function qπ(s,a) can then be written as:

$v_*(s) = \max_a q_*(s,a)$

$q_*(s,a) = R_s^a + \gamma \sum_{s'} P_{ss'}^a\, v_*(s')$

This yields the final expressions; substituting the preceding incoming-call history into the equations gives the optimal action for the next incoming call.
Suppose a call arrives at time t+n. The head unit automatically picks whichever of the three operations (automatic answer, automatic hang-up, automatic answer in privacy mode) best fits the optimal policy obtained by substituting the values at time t+n-1 into the equations. After the call completes, i.e. once the hang-up is received, a pop-up asks the driver whether they are satisfied with the action; the answer serves as the reward at time t+n, and the state-value and action-value functions at time t+n are recorded. Substituting these into the equations updates the current optimal policy for time t+n+1, which in turn determines the operation taken at time t+n+1.
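This online loop can be sketched as a tabular Q-style update: act greedily from the table learned through t+n-1, then fold the driver's satisfaction back in. The patent only specifies the Bellman equations and the satisfaction pop-up, so the Q-learning rule and the learning rate alpha below are assumptions chosen for concreteness.

```python
# Online loop at time t+n; q maps (state, action) -> value. Names are illustrative.
REWARD = {"satisfied": 1.0, "neutral": 0.5, "dissatisfied": 0.0}

def handle_call_online(q, state, next_state, actions, perform, ask_driver,
                       alpha=0.1, gamma=0.9):
    # act greedily from the policy learned through time t+n-1
    action = max(actions, key=lambda a: q.get((state, a), 0.0))
    perform(action)                   # automatic answer / privacy answer / hang-up
    r = REWARD[ask_driver()]          # pop-up after the hang-up -> reward at t+n
    # fold the reward back in so the policy at t+n+1 reflects it
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (r + gamma * best_next - old)
    return action
```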
Embodiment 2

Referring to Fig. 1, a machine-learning-based system for automatically answering incoming calls on a vehicle-mounted Bluetooth phone uses the above machine-learning-based method and comprises a Bluetooth module, a computing processor, and a storage medium.

The Bluetooth module receives data from the mobile phone and passes it to the computing processor.

The computing processor receives the data from the Bluetooth module, records the driver's operations, and performs the reinforcement learning.

The storage medium stores the data from the computing processor.

In this embodiment, the computing processor comprises a judgment module, a reinforcement-learning module, a calculation module, and an execution module; the reinforcement-learning module is connected to the judgment module, the calculation module, and the execution module respectively.

The judgment module determines, when there is an incoming call, whether the vehicle's Bluetooth is connected to the Bluetooth phone and, when it is, whether the vehicle is in motion.

The reinforcement-learning module decides via reinforcement learning whether to answer the incoming number.

The calculation module counts the incoming number's calls and records the driver's handling of each call.

The execution module executes the commands from the reinforcement-learning module: automatic answer, automatic answer in privacy mode, or automatic hang-up.
It should be understood that those of ordinary skill in the art can make improvements or modifications based on the above description, and all such improvements and modifications shall fall within the protection scope of the appended claims of the present invention.