
Technical Field
The invention relates to the field of speaker cloud services, and in particular to a speaker cloud service platform system.
Background
A smart speaker is an upgraded speaker that lets household consumers go online by voice, for example to request songs, shop online, or check the weather forecast. It can also control smart home devices, for example opening curtains, setting the refrigerator temperature, or preheating the water heater. On June 11, 2018, Baidu released its first own-brand smart speaker, the "Xiaodu Smart Speaker", in Beijing. On June 1, 2019, Baidu's artificial intelligence assistant "Xiaodu Smart Speaker Dajingang" went on sale in the Xiaodu Mall; on November 25, the Huawei SoundX smart speaker, jointly developed by Huawei and Devialet, was officially released.
Existing cloud service platform systems support only a single application and cannot effectively coordinate the relationships between speakers. The present invention solves this problem by providing a speaker cloud service platform system.
Summary of the Invention
The technical problem to be solved by the present invention is that the prior art supports only a single application and cannot effectively coordinate the relationships between speakers. A new speaker cloud service platform system is provided that supports multiple applications and effectively handles conflicts and relationships between speakers.
To solve the above technical problem, the following technical solution is adopted:
A speaker cloud service platform system, wherein the speaker comprises a voice input module, a network connection unit, and a playback device; the speaker cloud service platform system comprises a cloud server and a network connection unit for connecting the cloud server to the speaker; a positioning detection unit is provided at the speaker, and the cloud server receives the data of the positioning detection unit in real time; the cloud server performs the following speaker conflict detection steps:
Step 1: the cloud server receives the positioning data of the speakers that are in the working state;
Step 2: the cloud server determines the state of the operating speakers from the positioning data; if the distance between speaker positions is less than a predefined threshold, the corresponding speakers are marked as "suspected same group", and a "suspected same group confirmation message" is sent to those speakers;
Step 3: the cloud server receives the feedback on the "suspected same group confirmation message"; if the result is "yes", the data for the speakers of the same group is transmitted in unified form to any one speaker of the group, and that speaker is controlled to relay the data within the group; if the result is "NO", the playback data is transmitted in order of weight priority and the volume is controlled.
Working principle of the present invention: the invention uses speaker positioning information as the basis for judging whether speakers are "suspected same group" and, according to the feedback, coordinates the relationships between speakers that may be associated or in conflict. In addition, the cloud server can control a variety of Internet applications.
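The distance test in Step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each speaker reports planar coordinates in metres and that "positioning distance" means Euclidean distance. The function name `suspected_same_group` and the sample IDs are hypothetical.

```python
import math
from itertools import combinations


def suspected_same_group(positions, threshold):
    """Return pairs of speaker IDs whose distance is below `threshold`.

    `positions` maps a speaker ID to an (x, y) tuple; the coordinate
    system and units are assumptions made for this sketch.
    """
    pairs = []
    for (id_a, pos_a), (id_b, pos_b) in combinations(sorted(positions.items()), 2):
        if math.dist(pos_a, pos_b) < threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical sample data: two speakers 5 m apart, one far away.
positions = {"spk-1": (0.0, 0.0), "spk-2": (3.0, 4.0), "spk-3": (50.0, 50.0)}
pairs = suspected_same_group(positions, threshold=10.0)
```

Each returned pair would then receive the "suspected same group confirmation message" described in Step 2.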
In the above solution, as an optimization, the weight priority is further determined by the cloud server as follows:
Step 1.1: determine the network startup time of each speaker; an earlier startup time has higher priority;
Step 1.2: determine the self-check status of each speaker; a better self-check status has higher priority.
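One way to read Steps 1.1 and 1.2 is as a two-key sort. The text does not say how the two criteria are combined, so the sketch below assumes the startup time is compared first and the self-check status only breaks ties. The `SpeakerStatus` type, the timestamp representation, and the numeric self-check scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpeakerStatus:
    speaker_id: str
    network_start: float   # network startup time as a Unix timestamp (assumed)
    self_check_score: int  # higher means a better self-check result (assumed scale)


def priority_order(speakers):
    """Order speakers from highest to lowest priority.

    Earlier network startup wins; among equal startup times the
    better self-check score wins (an assumed tie-breaking rule).
    """
    return sorted(speakers, key=lambda s: (s.network_start, -s.self_check_score))


speakers = [
    SpeakerStatus("a", network_start=100.0, self_check_score=2),
    SpeakerStatus("b", network_start=50.0, self_check_score=1),
    SpeakerStatus("c", network_start=50.0, self_check_score=3),
]
ordered_ids = [s.speaker_id for s in priority_order(speakers)]
```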
Further, the cloud server can also invoke other applications on the Internet according to instructions from the speaker.
Further, the cloud server also receives the voice control signal from the speaker and performs speech recognition, comprising:
Step One: build a historical speech feature map library; a historical speech feature map is obtained by extracting features from pre-entered or historically recorded sentence speech and drawing a sentence speech feature map, which comprises character, word, and sentence feature maps;
Step Two: extract features from the sentence speech collected by the speaker in real time and draw the target sentence speech feature map; any sentence speech feature map in the historical speech feature map library is defined as the reference image, and the target sentence speech feature map is defined as the target image;
Step Three: binarize the target image, where a value of 1 denotes a speech feature and 0 denotes no speech feature; divide the binarized feature map into a grid map using unit cells, define the first point $(x_1, y_1)$ of the grid map as the origin, and define the search-matching step size as $L$; starting from the origin, search along the x direction; whenever a point with value 1 is found, record its position and value and number it in sequence; otherwise continue the search;
Step Four: update the origin to the point $(x_1, y_1 + N \cdot L)$ and return to Step Three, until the search is complete in both the x and y directions, which completes the preliminary positioning search, where $N$ is an integer and $L$ is a constant;
Step Five: take out the points with value 1 one at a time, update the origin to the 1-valued point currently taken out, update the search-matching step size to $L/2$, and search along the x direction; points that have already been searched are not searched again; when the search goes out of range, the step size is automatically halved and the search continues until the step size reaches its minimum; a new 1-valued point found during the search is defined as a new point that requires a search in the y direction, in which case go to Step Six; otherwise go to Step Seven;
Step Six: keep the search-matching step size at $L/2$ and search along the y direction; points that have already been searched are not searched again; when the search goes out of range, the step size is automatically halved and the search continues until the step size reaches its minimum; a new 1-valued point found during the search is defined as a new point that requires a search in the x direction, in which case go to Step Five; otherwise go to Step Seven;
Step Seven: when no new points remain to be searched, end the search and take the set of regions of the 1-valued points found as the valid target image;
Step Eight: search and match the valid target image against the historical speech feature map library;
Step Nine: invoke the corresponding strategy according to the recognition result.
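The preliminary positioning search of Steps Three and Four amounts to sampling the binarized grid with stride L in both directions. The sketch below is a simplified illustration under assumptions not stated in the text: the image is a list of rows of gray values, binarization uses a fixed threshold, and the origin is the cell at index (0, 0). The names `binarize` and `coarse_scan` are hypothetical.

```python
def binarize(image, threshold):
    """Map gray values to 1 (speech feature) or 0 (no speech feature)."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


def coarse_scan(grid, step):
    """Visit every `step`-th cell and return the (x, y) positions of 1-valued cells.

    The inner loop is the x-direction search of Step Three; the outer
    loop moves the origin down by `step` rows, as in Step Four.
    Positions are returned in the order in which they are found.
    """
    hits = []
    height = len(grid)
    width = len(grid[0]) if grid else 0
    for y in range(0, height, step):
        for x in range(0, width, step):
            if grid[y][x] == 1:
                hits.append((x, y))
    return hits


# Hypothetical 4x4 gray image.
image = [
    [0, 0, 9, 0],
    [0, 0, 0, 0],
    [7, 0, 8, 0],
    [0, 0, 0, 6],
]
grid = binarize(image, threshold=5)
seeds = coarse_scan(grid, step=2)
```

With a stride of 2 the feature at (3, 3) is not visited, which is why the later steps refine the search around each recorded point with a smaller step size.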
Further, Step Eight also comprises image calibration processing, comprising:
Step a: the valid target image is taken as the target image, and a reference image arbitrarily selected from the historical speech feature map library is denoted $I_C$;
Step b: the reference image $I_C$ and the polar-transformed target image are defined to be related through a scale offset parameter $\alpha_z$ and a rotation offset parameter $\varphi_z$ [formula omitted];
Step c: compute the radial projection $K_C(i)$ of the reference image $I_C$ in the polar coordinate system and the radial projection of the target image [formulas omitted]; take the logarithm of both projections to obtain $LK_C(i)$ and its target-image counterpart; the translation difference between these two is taken as the scale offset parameter $\alpha_z$. Here $i = 1, 2, \ldots, n_r$, and the number of samples in the angular direction is taken at $K_i = K_{max}$; $ce()$ denotes the smallest integer greater than or equal to the bracketed value, and $fl()$ denotes the largest integer less than or equal to the bracketed value; the size of the target image is $2K_{max} \times 2K_{max}$, $n_r = K_{max}$ is the number of samples in the radial direction, and $n_\varphi = 8K_i$ is the number of samples in the angular direction;
Step d: using the scale offset parameter from step c, compute the projections of the reference image $I_C$ and the target image in the radial and angular directions [formulas omitted]; normalize the two projections, compute the translation of the peak, and from this translation compute the rotation offset parameter $\varphi_z$;
Step e: substitute the rotation offset parameter $\varphi_z$ and the scale offset parameter $\alpha_z$ into step A to correct the target image; at the same time compute $\epsilon_z$ [formula omitted]; the position corresponding to the minimum of $\epsilon_z$ is the center point of the target image, which completes the image calibration processing.
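The peak-shift idea in step d can be illustrated with a short sketch. It assumes the two angular projections have already been sampled into the same number of equally spaced angular bins, and it normalizes each projection by its maximum. The projection formulas themselves are not given in the text, so the inputs here are hypothetical sample values and the sign convention of the returned angle is an assumption.

```python
def rotation_offset(ref_proj, tgt_proj):
    """Estimate a rotation offset in degrees from two angular projections.

    Both projections are normalized by their maximum, the index of the
    peak is located in each, and the cyclic difference between the two
    peak indices is converted to an angle.
    """
    if len(ref_proj) != len(tgt_proj):
        raise ValueError("projections must have the same number of bins")
    n = len(ref_proj)

    def normalize(proj):
        peak = max(proj)
        return [v / peak for v in proj] if peak else list(proj)

    ref_n = normalize(ref_proj)
    tgt_n = normalize(tgt_proj)
    shift = (tgt_n.index(max(tgt_n)) - ref_n.index(max(ref_n))) % n
    return shift * 360.0 / n


# Hypothetical projections with 8 angular bins (45 degrees per bin).
# The target is the reference shifted by two bins and scaled by 2.
reference = [1, 5, 2, 1, 1, 1, 1, 1]
target = [2, 2, 2, 10, 4, 2, 2, 2]
angle = rotation_offset(reference, target)
```

Normalization does not change the peak position; it is kept here only to mirror the order of operations in step d and would matter if the projections were compared by correlation instead of by peak index.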
Further, the search-matching analysis in Step Eight also comprises:
Step A: draw concentric circles on the target image centered on the center point, dividing the speech feature image into $B$ annular regions, and finally divide each annular region into $K$ sector regions, where $K$ and $B$ are predefined constants;
Step B: compute the sector speech feature value $V_{sq\theta}$ of each sector $S_{sq}$ as Code1 [formula omitted];
where $F_{sq\theta}(x, y)$ is the gray value of each pixel in the sector region $S_{sq}$, $P_{sq\theta}$ is the mean gray value of the pixels in the sector region $S_{sq}$, $n_{sq}$ is the number of pixels in the region $S_{sq}$, $0 < sq \le B \times K - 1$, and $\theta = \{0°, (360°/K), 2 \cdot (360°/K), 3 \cdot (360°/K), \ldots \le 180°\}$;
Step C: rotate the speech feature image by $180°/K$, repeat step B, and extract the sector speech feature value $V_{sq\theta}$ of each sector $S_{sq}$ as Code2;
Step E: rotate Code1 and Code2 by $R \times (360°/K)$ ($R = 0, 1, 2, \ldots, K-1$) to obtain Code1' and Code2';
Step F: input Code1 and Code2, together with Code1' and Code2' from step E, into the historical speech feature map library for matching.
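Steps A to E can be sketched as follows. The feature formula is not reproduced in the text, so this sketch assumes the average absolute deviation of the pixel gray values from the sector mean, which is consistent with the definitions of $F_{sq\theta}$, $P_{sq\theta}$, and $n_{sq}$ but is not confirmed by the source. Rotating a code is modelled as a cyclic shift of the sector values within each ring, and the image rotation of step C is modelled by offsetting the sector boundaries. All function names are hypothetical.

```python
import math


def sector_code(image, center, num_rings, num_sectors, max_radius, offset_deg=0.0):
    """Compute one feature value per sector (ring-major order).

    Feature = mean absolute deviation from the sector mean (assumed).
    `offset_deg` shifts the sector boundaries, standing in for a
    rotation of the image before the features are extracted.
    """
    cx, cy = center
    buckets = [[] for _ in range(num_rings * num_sectors)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            r = math.hypot(x - cx, y - cy)
            if r >= max_radius:
                continue  # pixel lies outside the outermost circle
            ring = int(r / max_radius * num_rings)
            angle = (math.degrees(math.atan2(y - cy, x - cx)) - offset_deg) % 360.0
            sector = int(angle / (360.0 / num_sectors)) % num_sectors
            buckets[ring * num_sectors + sector].append(value)
    code = []
    for pixels in buckets:
        if not pixels:
            code.append(0.0)  # empty sector: no deviation to report
            continue
        mean = sum(pixels) / len(pixels)
        code.append(sum(abs(v - mean) for v in pixels) / len(pixels))
    return code


def rotate_code(code, num_sectors, r):
    """Cyclically shift the sector values of every ring by `r` positions."""
    r %= num_sectors
    rotated = []
    for start in range(0, len(code), num_sectors):
        ring = code[start:start + num_sectors]
        rotated.extend(ring[-r:] + ring[:-r])
    return rotated


# Hypothetical uniform 5x5 image: every sector has zero deviation.
flat = [[7] * 5 for _ in range(5)]
code1 = sector_code(flat, center=(2, 2), num_rings=1, num_sectors=4, max_radius=3)
code2 = sector_code(flat, center=(2, 2), num_rings=1, num_sectors=4,
                    max_radius=3, offset_deg=180.0 / 4)
shifted = rotate_code([1, 2, 3, 4, 5, 6, 7, 8], num_sectors=4, r=1)
```

Matching in step F would then compare these codes and their rotated variants against the stored codes in the library; the distance measure is not specified in the text and is left out here.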
Beneficial effects of the present invention: the invention uses speaker positioning information as the basis for judging whether speakers are "suspected same group" and, according to the feedback, coordinates the relationships between speakers that may be associated or in conflict. In addition, the cloud server can control a variety of Internet applications. Converting speech feature recognition into recognition of a feature map as a whole yields higher recognition efficiency. Pre-calibration and positioning of the feature images improve the precision and efficiency of control.
Brief Description of the Drawings
The present invention is further described below with reference to the accompanying drawing and the embodiment.
Figure 1 is a schematic diagram of the speaker conflict detection steps.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to an embodiment. It should be understood that the specific embodiment described here serves only to explain the invention and does not limit it.
Embodiment 1
This embodiment provides a speaker cloud service platform system, wherein the speaker comprises a voice input module, a network connection unit, and a playback device; the speaker cloud service platform system comprises a cloud server and a network connection unit for connecting the cloud server to the speaker; a positioning detection unit is provided at the speaker, and the cloud server receives the data of the positioning detection unit in real time; the cloud server performs the following speaker conflict detection steps:
Step 1: the cloud server receives the positioning data of the speakers that are in the working state;
Step 2: the cloud server determines the state of the operating speakers from the positioning data; if the distance between speaker positions is less than a predefined threshold, the corresponding speakers are marked as "suspected same group", and a "suspected same group confirmation message" is sent to those speakers;
Step 3: the cloud server receives the feedback on the "suspected same group confirmation message"; if the result is "yes", the data for the speakers of the same group is transmitted in unified form to any one speaker of the group, and that speaker is controlled to relay the data within the group; if the result is "NO", the playback data is transmitted in order of weight priority and the volume is controlled.
This embodiment uses speaker positioning information as the basis for judging whether speakers are "suspected same group" and, according to the feedback, coordinates the relationships between speakers that may be associated or in conflict. In addition, the cloud server can control a variety of Internet applications.
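The branching on the feedback in Step 3 can be sketched as a small planning function. This is an illustration under assumptions: the feedback arrives as the string "yes" or "NO", the relay speaker is simply the first member of the group (the text allows any member), and volume control is left out because the text does not specify it. The function name and the returned dictionary layout are hypothetical.

```python
def plan_transmission(group, feedback, priority):
    """Decide how playback data reaches a suspected group of speakers.

    `group` is a list of speaker IDs, `feedback` is "yes" or "NO",
    and `priority` lists speaker IDs from highest to lowest priority.
    """
    if feedback.strip().lower() == "yes":
        relay = group[0]  # any member may act as relay; first one chosen here
        members = [s for s in group if s != relay]
        return {"mode": "relay", "relay": relay, "members": members}
    # Not the same group: send to each speaker in priority order.
    ordered = [s for s in priority if s in group]
    return {"mode": "sequential", "order": ordered}


confirmed = plan_transmission(["a", "b"], "yes", priority=["b", "a"])
rejected = plan_transmission(["a", "b"], "NO", priority=["b", "a"])
```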
Specifically, the weight priority is determined by the cloud server as follows:
Step 1.1: determine the network startup time of each speaker; an earlier startup time has higher priority;
Step 1.2: determine the self-check status of each speaker; a better self-check status has higher priority.
Specifically, the cloud server can also invoke other applications on the Internet according to instructions from the speaker.
Preferably, the cloud server also receives the voice control signal from the speaker and performs speech recognition, comprising:
Step One: build a historical speech feature map library; a historical speech feature map is obtained by extracting features from pre-entered or historically recorded sentence speech and drawing a sentence speech feature map, which comprises character, word, and sentence feature maps;
Step Two: extract features from the sentence speech collected by the speaker in real time and draw the target sentence speech feature map; any sentence speech feature map in the historical speech feature map library is defined as the reference image, and the target sentence speech feature map is defined as the target image;
Step Three: binarize the target image, where a value of 1 denotes a speech feature and 0 denotes no speech feature; divide the binarized feature map into a grid map using unit cells, define the first point $(x_1, y_1)$ of the grid map as the origin, and define the search-matching step size as $L$; starting from the origin, search along the x direction; whenever a point with value 1 is found, record its position and value and number it in sequence; otherwise continue the search;
Step Four: update the origin to the point $(x_1, y_1 + N \cdot L)$ and return to Step Three, until the search is complete in both the x and y directions, which completes the preliminary positioning search, where $N$ is an integer and $L$ is a constant;
Step Five: take out the points with value 1 one at a time, update the origin to the 1-valued point currently taken out, update the search-matching step size to $L/2$, and search along the x direction; points that have already been searched are not searched again; when the search goes out of range, the step size is automatically halved and the search continues until the step size reaches its minimum; a new 1-valued point found during the search is defined as a new point that requires a search in the y direction, in which case go to Step Six; otherwise go to Step Seven;
Step Six: keep the search-matching step size at $L/2$ and search along the y direction; points that have already been searched are not searched again; when the search goes out of range, the step size is automatically halved and the search continues until the step size reaches its minimum; a new 1-valued point found during the search is defined as a new point that requires a search in the x direction, in which case go to Step Five; otherwise go to Step Seven;
Step Seven: when no new points remain to be searched, end the search and take the set of regions of the 1-valued points found as the valid target image;
Step Eight: search and match the valid target image against the historical speech feature map library;
Step Nine: invoke the corresponding strategy according to the recognition result.
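The refinement of Steps Five to Seven alternates x-direction and y-direction searches around each recorded point. The sketch below is one simplified reading, with several assumptions that the text does not settle: the search runs in both the positive and the negative direction from each point, the minimum step size is one cell, and the step size is halved only when the next probe would leave the grid. The function name `refine` and the sample grid are hypothetical.

```python
from collections import deque


def refine(grid, seeds, step):
    """Grow the set of 1-valued points starting from coarse-scan seeds.

    Each queued point is searched along one axis with stride step // 2.
    A newly found 1-valued point is queued for a search along the other
    axis (Steps Five and Six). The loop ends when the queue is empty
    (Step Seven).
    """
    height, width = len(grid), len(grid[0])
    found = set(seeds)
    visited = set(seeds)
    queue = deque((point, "x") for point in seeds)
    while queue:
        (x0, y0), axis = queue.popleft()
        for sign in (1, -1):
            stride = max(step // 2, 1)
            x, y = x0, y0
            while True:
                nx, ny = (x + sign * stride, y) if axis == "x" else (x, y + sign * stride)
                if not (0 <= nx < width and 0 <= ny < height):
                    if stride == 1:
                        break          # minimum step reached at the border
                    stride //= 2       # out of range: halve the step and retry
                    continue
                x, y = nx, ny
                if (x, y) in visited:
                    continue           # already searched, skip it
                visited.add((x, y))
                if grid[y][x] == 1:
                    found.add((x, y))
                    queue.append(((x, y), "y" if axis == "x" else "x"))
    return found


# Hypothetical binarized grid; a coarse scan with L = 2 yields the two seeds.
sample_grid = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
region = refine(sample_grid, seeds=[(0, 0), (2, 2)], step=2)
```

In this example the refinement recovers the three feature cells that the coarse scan skipped, so the returned set covers the whole connected feature region.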
Preferably, Step Eight also comprises image calibration processing, comprising:
Step a: the valid target image is taken as the target image, and a reference image arbitrarily selected from the historical speech feature map library is denoted $I_C$;
Step b: the reference image $I_C$ and the polar-transformed target image are defined to be related through a scale offset parameter $\alpha_z$ and a rotation offset parameter $\varphi_z$ [formula omitted];
Step c: compute the radial projection $K_C(i)$ of the reference image $I_C$ in the polar coordinate system and the radial projection of the target image [formulas omitted]; take the logarithm of both projections to obtain $LK_C(i)$ and its target-image counterpart; the translation difference between these two is taken as the scale offset parameter $\alpha_z$. Here $i = 1, 2, \ldots, n_r$, and the number of samples in the angular direction is taken at $K_i = K_{max}$; $ce()$ denotes the smallest integer greater than or equal to the bracketed value, and $fl()$ denotes the largest integer less than or equal to the bracketed value; the size of the target image is $2K_{max} \times 2K_{max}$, $n_r = K_{max}$ is the number of samples in the radial direction, and $n_\varphi = 8K_i$ is the number of samples in the angular direction;
Step d: using the scale offset parameter from step c, compute the projections of the reference image $I_C$ and the target image in the radial and angular directions [formulas omitted]; normalize the two projections, compute the translation of the peak, and from this translation compute the rotation offset parameter $\varphi_z$;
Step e: substitute the rotation offset parameter $\varphi_z$ and the scale offset parameter $\alpha_z$ into step A to correct the target image; at the same time compute $\epsilon_z$ [formula omitted]; the position corresponding to the minimum of $\epsilon_z$ is the center point of the target image, which completes the image calibration processing.
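The reason a logarithm turns a scale change into a translation can be shown with a short derivation. It rests on assumptions, since the projection formulas are not reproduced in the text: the logarithm is taken of the radial coordinate, as in standard log-polar registration, and the target projection, written here as $K_T$ (a hypothetical symbol), is a radially rescaled copy of the reference projection.

```latex
K_T(r) = K_C(\alpha_z\, r), \qquad \rho = \ln r
\;\Longrightarrow\;
LK_T(\rho) = K_T(e^{\rho}) = K_C\!\left(e^{\rho + \ln \alpha_z}\right) = LK_C(\rho + \ln \alpha_z)
```

Under these assumptions a scale factor appears as a constant shift of $\ln \alpha_z$ along the $\rho$ axis, so the translation difference between the two log-sampled projections can serve as a scale estimate. Whether the $\alpha_z$ of step c denotes the factor or the shift itself is not stated in the text.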
Preferably, the search-matching analysis in Step Eight also comprises:
Step A: draw concentric circles on the target image centered on the center point, dividing the speech feature image into $B$ annular regions, and finally divide each annular region into $K$ sector regions, where $K$ and $B$ are predefined constants;
Step B: compute the sector speech image feature value $V_{sq\theta}$ of each sector $S_{sq}$ as Code1 [formula omitted];
where $F_{sq\theta}(x, y)$ is the gray value of each pixel in the sector region $S_{sq}$, $P_{sq\theta}$ is the mean gray value of the pixels in the sector region $S_{sq}$, $n_{sq}$ is the number of pixels in the region $S_{sq}$, $0 < sq \le B \times K - 1$, and $\theta = \{0°, (360°/K), 2 \cdot (360°/K), 3 \cdot (360°/K), \ldots \le 180°\}$;
Step C: rotate the speech feature image by $180°/K$, repeat step B, and extract the sector speech feature value $V_{sq\theta}$ of each sector $S_{sq}$ as Code2;
Step E: rotate Code1 and Code2 by $R \times (360°/K)$ ($R = 0, 1, 2, \ldots, K-1$) to obtain Code1' and Code2';
Step F: input Code1 and Code2, together with Code1' and Code2' from step E, into the historical speech feature map library for matching.
This embodiment uses speaker positioning information as the basis for judging whether speakers are "suspected same group" and, according to the feedback, coordinates the relationships between speakers that may be associated or in conflict. In addition, the cloud server can control a variety of Internet applications. Converting speech feature recognition into recognition of a feature map as a whole yields higher recognition efficiency. Pre-calibration and positioning of the feature images improve the precision and efficiency of control.
Although illustrative specific embodiments of the invention are described above so that those skilled in the art can understand the invention, the invention is not limited to the scope of these embodiments. To a person of ordinary skill in the art, as long as variations fall within the spirit and scope of the invention as defined and determined by the appended claims, all inventions and creations that make use of the inventive concept are protected.
| Application Number | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202210095507.4A | CN114550715A (en) | 2022-01-26 | 2022-01-26 | A speaker cloud service platform system |
| GB2300697.6A | GB2616512B (en) | 2022-01-26 | 2023-01-17 | Cloud service platform system for speakers |
| Publication Number | Publication Date |
|---|---|
| CN114550715A (en) | 2022-05-27 |
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN202210095507.4A | Pending | CN114550715A (en) | 2022-01-26 | 2022-01-26 | A speaker cloud service platform system |
| Country | Link |
|---|---|
| CN (1) | CN114550715A (en) |
| GB (1) | GB2616512B (en) |
| Publication number | Publication date |
|---|---|
| GB202300697D0 (en) | 2023-03-01 |
| GB2616512A (en) | 2023-09-13 |
| GB2616512B (en) | 2024-11-06 |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |