CN118331426A

Movatterモバイル変換

Info

Publication number: CN118331426A
Application number: CN202410520993.9A
Authority: CN
Inventors: 吴勇民
Original assignee: Hangzhou Pinjie Network Technology Co Ltd
Current assignee: Hangzhou Pinjie Network Technology Co Ltd
Priority date: 2024-04-28
Filing date: 2024-04-28
Publication date: 2024-07-12

Abstract

The embodiment of the disclosure discloses a gesture control method, a gesture control device, electronic equipment and a gesture control medium for an article management system. One embodiment of the method comprises the following steps: acquiring a gesture dynamic image sequence and a gesture perception signal sequence; inputting the gesture dynamic image sequence into a pre-trained first gesture instruction recognition model to obtain first control instruction information; extracting features of each gesture sensing signal in the gesture sensing signal sequence to obtain a gesture sensing feature information sequence; each gesture perception characteristic information in the gesture perception characteristic information sequence is spliced to obtain a gesture target spectrogram; inputting the gesture target spectrogram to a pre-trained second gesture instruction recognition model to obtain second control instruction information; fusing the first control instruction information and the second control instruction information to obtain system control instruction information; and controlling the article management system based on the system control instruction information. This embodiment improves the accuracy of controlling the item management system.

Description

Gesture control method and device for article management system, electronic equipment and medium

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a gesture control method, device, electronic apparatus, and medium for an article management system.

Background

In the logistics transportation process, information such as the warehouse-in and warehouse-out of the articles in the article warehouse, the temperature and humidity in the warehouse and the like needs to be recorded in real time. Currently, in controlling an article management system, the following methods are generally adopted: and receiving a control instruction through a capacitive touch screen included in the article management system, or shooting gesture actions through a camera to identify gesture instructions, so as to obtain the control instruction and control the article management system.

However, the inventors have found that when the article management system is controlled in the above manner, there are often the following technical problems:

firstly, a worker in an article warehouse often wears heavy working gloves to operate, and when the capacitive touch screen is directly used by the gloves, the sensitivity of the capacitive touch screen is reduced, so that the accuracy of a control instruction received by an article management system is reduced, and further, the accuracy of the control of the article management system is reduced;

Second, when a gesture is shot by a camera, the accuracy of the recognized gesture command decreases as the sharpness of the image shot by the camera decreases, and further, the accuracy of the control of the article management system decreases.

The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, may contain information that does not form the prior art that is already known to those of ordinary skill in the art in this country.

Disclosure of Invention

The disclosure is in part intended to introduce concepts in a simplified form that are further described below in the detailed description. The disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose article management system gesture control methods, apparatus, electronic devices, and computer readable media to address one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide an item management system gesture control method, the method comprising: acquiring a gesture dynamic image sequence and a gesture perception signal sequence; inputting the gesture dynamic image sequence into a pre-trained first gesture instruction recognition model to obtain first control instruction information; performing feature extraction processing on each gesture sensing signal in the gesture sensing signal sequence to generate gesture sensing feature information, so as to obtain a gesture sensing feature information sequence; performing stitching processing on each gesture perception characteristic information in the gesture perception characteristic information sequence to obtain a gesture target spectrogram; inputting the gesture target spectrogram to a pre-trained second gesture instruction recognition model to obtain second control instruction information; the first control instruction information and the second control instruction information are fused to obtain system control instruction information; and controlling the article management system based on the system control instruction information.

In a second aspect, some embodiments of the present disclosure provide an article management system gesture control apparatus, the apparatus comprising: an acquisition unit configured to acquire a gesture dynamic image sequence and a gesture perception signal sequence; the first input unit is configured to input the gesture dynamic image sequence to a pre-trained first gesture instruction recognition model to obtain first control instruction information; the feature extraction unit is configured to perform feature extraction processing on each gesture sensing signal in the gesture sensing signal sequence to generate gesture sensing feature information, so as to obtain a gesture sensing feature information sequence; the splicing unit is configured to splice each piece of gesture perception characteristic information in the gesture perception characteristic information sequence to obtain a gesture target spectrogram; the second input unit is configured to input the gesture target spectrogram to a pre-trained second gesture instruction recognition model to obtain second control instruction information; the fusion unit is configured to fuse the first control instruction information and the second control instruction information to obtain system control instruction information; and a control unit configured to control the article management system based on the system control instruction information.

In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors causes the one or more processors to implement the method described in any of the implementations of the first aspect above.

In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is a flow chart of some embodiments of an item management system gesture control method according to the present disclosure;

FIG. 2 is a schematic structural view of some embodiments of an item management system gesture control device according to the present disclosure;

fig. 3 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that references to "one", "a plurality" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that "one or more" is intended to be understood as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

FIG. 1 illustrates a flow 100 of some embodiments of an item management system gesture control method according to the present disclosure. The gesture control method of the article management system comprises the following steps:

and step 101, acquiring a gesture dynamic image sequence and a gesture perception signal sequence.

In some embodiments, the execution subject of the gesture control method of the article management system may acquire the gesture dynamic image sequence from the camera and the gesture sensing signal sequence from the radar sensor through a wired connection or a wireless connection. Here, the camera may be used to capture the gesture motion image sequence. The radar sensor may be configured to sense the gesture sensing signal sequence. The radar sensor may include, but is not limited to, at least one of: a transmit antenna assembly and a receive antenna assembly. The gesture dynamic images in the gesture dynamic image sequence may be ordered according to the chronological order. The gesture dynamic image sequence can represent a gesture command action. The gesture dynamic image sequence and the gesture sensing signal sequence may be acquired simultaneously. The gesture sensing signal in the gesture sensing signal sequence may be a radar signal. The gesture sensing signal sequence may represent a gesture command action.

As an example, the radar sensor may be a millimeter wave radar.

It should be noted that the wireless connection may include, but is not limited to, 3G/4G connection, wiFi connection, bluetooth connection, wiMAX connection, zigbee connection, UWB (ultra wideband) connection, and other now known or later developed wireless connection.

Step 102, inputting the gesture dynamic image sequence into a pre-trained first gesture instruction recognition model to obtain first control instruction information.

In some embodiments, the executing body may input the gesture dynamic image sequence to a first gesture command recognition model trained in advance, to obtain the first control command information. The pre-trained first gesture command recognition model may be a neural network model that takes a gesture dynamic image sequence as an input and takes first control command information as an output.

Alternatively, the pre-trained first gesture instruction recognition model may be obtained by training the following steps:

First, a first sample gesture information set is acquired. Wherein each first sample gesture information in the first sample gesture information set includes: the first sample gesture image sequence and the first sample control instruction information. The first sample gesture image sequence may represent a gesture command action. The first sample control instruction information may be a control instruction corresponding to the first sample gesture image sequence. The first set of sample gesture information may be obtained from a storage terminal. The storage terminal may be configured to store the first sample gesture information set.

Second, selecting first sample gesture information from the first sample gesture information set, and executing the following first training substep:

And a first sub-step of inputting a first sample gesture image sequence included in the first sample gesture information into a convolution sub-model included in the initial first gesture instruction recognition model to obtain an initial gesture space-time characteristic diagram. Wherein the initial first gesture instruction recognition model further comprises: an attention sub-model, a circulation sub-model, and an output sub-model. The first sample gesture information may be randomly selected from the set of first sample gesture information to perform a first training sub-step. The initial first gesture command recognition model may be an untrained neural network model having a first sample gesture image sequence as input and first initial gesture control command information as output. Here, the above convolutional submodel may be divided into nine layers: the first, third, fifth, seventh and eighth layers may be three-dimensional convolution layers that may be used to convolve the output of the previous layer. The second, fourth, sixth and ninth layers may be maximum pooling layers that may be used to pool the output of the previous layer. The output submodel can be divided into two layers: the first layer may be a random inactivation layer that may be used to randomly inactivate the output of the cyclic submodel described above. The second layer may be a normalization layer, which may be used to normalize the output of the random inactivation layer described above.

As an example, the attention sub-model described above may be a CBAM (Convolutional Block Attention Module, attention mechanism module) model. The cyclic submodel may be an LSTM (Long Short Term Memory, long-term memory) model.

And a second sub-step of inputting the initial gesture space-time feature map to an attention sub-model included in the initial first gesture instruction recognition model to obtain initial gesture attention feature information.

And a third sub-step of inputting the initial gesture attention characteristic information into a circulation sub-model included in the initial first gesture instruction recognition model to obtain initial gesture circulation characteristic information.

And a fourth sub-step of inputting the initial gesture circulation characteristic information into an output sub-model included in the initial first gesture command recognition model to obtain first initial gesture control command information.

And a fifth substep of determining a first control difference value between the first initial gesture control instruction information and the first sample control instruction information included in the first sample gesture information based on a preset first loss function.

As an example, the above-mentioned preset first loss function may be, but is not limited to, at least one of: MSE (Mean Square Error ) function, MAE (Mean Absolute Error, mean absolute error) function, or Categorical Cross Entropy Loss (class cross entropy loss) function.

A sixth substep, in response to determining that the first control difference value is less than the first target value, determines the initial first gesture command recognition model as the first gesture command recognition model.

As an example, the first target value may be 0.1.

Optionally, the executing body may further adjust related parameters in the initial first gesture command recognition model in response to determining that the first control difference value is equal to or greater than the first target value, determine the adjusted initial first gesture command recognition model as the initial first gesture command recognition model, and select first sample gesture information from each of the first sample gesture information set that is not selected, so as to execute the first training step again. The related parameters in the initial first gesture instruction recognition model can be adjusted through a preset adjustment algorithm.

As an example, the preset adjustment algorithm may be, but is not limited to, at least one of the following: adam (Adam) optimizer algorithm or gradient descent algorithm.

Thus, the first gesture command recognition model can extract feature information in the gesture image through the convolution submodel. Then, the channel feature information and the spatial feature information in the feature map can be extracted by the attention sub-model, respectively. Then, the association relation information between the characteristic information in the sequence can be extracted through the cyclic submodel. Finally, gesture control instruction information can be obtained through the output submodel. Therefore, the accuracy of the obtained gesture control instruction can be improved after the characteristic extraction is performed for a plurality of times, and further, the accuracy of gesture control of the article management system can be improved.

And step 103, carrying out feature extraction processing on each gesture sensing signal in the gesture sensing signal sequence to generate gesture sensing feature information, and obtaining a gesture sensing feature information sequence.

In some embodiments, the executing body may perform feature extraction processing on each gesture sensing signal in the gesture sensing signal sequence to generate gesture sensing feature information, so as to obtain a gesture sensing feature information sequence.

In some optional implementations of some embodiments, the performing body performs feature extraction processing on each gesture sensing signal in the gesture sensing signal sequence to generate gesture sensing feature information, and may include the following steps:

Firstly, converting the gesture sensing signals to obtain a gesture sensing spectrogram. The gesture sensing signal can be subjected to noise reduction processing through a preset noise reduction algorithm, so that a gesture noise reduction signal is obtained. Then, the gesture noise reduction signal can be converted through a preset conversion algorithm, so that a gesture sensing spectrogram is obtained. The gesture sensing spectrogram can be obtained by combining antenna sensing sub-spectrogram sets. Each of the antenna perception sub-spectrum in the antenna perception sub-spectrum set may correspond to a transmitting antenna or a receiving antenna of the radar sensor. The position of each antenna perception sub-spectrum in the antenna perception sub-spectrum set in the gesture perception spectrum may correspond to a spatial position of each transmitting antenna and each receiving antenna included in the radar sensor.

As an example, the above-mentioned preset noise reduction algorithm may be an MTI (Moving Target Indication, moving object display) algorithm. The predetermined conversion algorithm may be a 2D-FFT (Two-Dimensional Fourier Transform ) algorithm. The gesture-aware spectrogram may be a RD (range-doppler ) chart. Each of the antenna-aware spectrograms in the set of antenna-aware spectrograms may be a RD (range-doppler ) chart.

And secondly, performing target detection processing on the gesture sensing spectrogram to obtain a gesture sensing target point set. The gesture sensing spectrogram can be subjected to target detection processing through a preset target detection algorithm, so that a gesture sensing target point set is obtained.

As an example, the above-mentioned preset target detection algorithm may be a CFAR (Constant FALSE ALARM RATE Detector, detector under Constant false alarm probability) algorithm.

And thirdly, screening the gesture sensing target point set based on the gesture sensing spectrogram to obtain a screening sensing target point set.

And fourthly, performing first projection processing on the gesture sensing spectrogram based on the screening sensing target point set to obtain a gesture sensing distance spectrogram. The projection of each screening perception target point in the screening perception target point set on the distance axis in the gesture perception spectrogram can be determined as the gesture perception distance spectrogram.

And fifthly, carrying out positioning processing on the gesture sensing distance spectrogram to obtain a gesture sensing azimuth spectrogram and a gesture sensing pitch spectrogram. The gesture sensing distance spectrogram can be subjected to positioning processing through a preset positioning algorithm, so that a gesture sensing azimuth spectrogram and a gesture sensing pitch spectrogram are obtained.

As an example, the above-mentioned preset positioning algorithm may be, but is not limited to, at least one of: CBF (Conventional Beamforming, spatial spectrum estimation) algorithm or MUSIC (Mutiple Signal Classification, multiple signal classification) algorithm.

And sixthly, performing second projection processing on the gesture sensing spectrogram based on the screening sensing target point set to obtain a gesture sensing speed spectrogram. The projection of each screening perception target point in the screening perception target point set on the speed axis in the gesture perception spectrogram can be determined as the gesture perception speed spectrogram.

And seventhly, carrying out fusion processing on the gesture sensing distance spectrogram, the gesture sensing azimuth spectrogram, the gesture sensing pitch spectrogram and the gesture sensing speed spectrogram to obtain the gesture sensing characteristic information. The gesture sensing distance spectrogram, the gesture sensing azimuth spectrogram, the gesture sensing pitch spectrogram and the gesture sensing speed spectrogram can be determined to be the gesture sensing distance spectrogram, the gesture sensing azimuth spectrogram, the gesture sensing pitch spectrogram and the gesture sensing speed spectrogram included in the gesture sensing characteristic information.

In some optional implementations of some embodiments, the executing body performs a screening process on the gesture sensing target point set based on the gesture sensing spectrogram to obtain a screening sensing target point set, and may include the following steps:

first, for each gesture-aware target point in the set of gesture-aware target points, the following generation sub-steps are performed:

And a first sub-step, performing first mapping processing on the gesture sensing spectrogram to obtain an azimuth angle value corresponding to the gesture sensing target point. The mapping processing can be performed on each antenna perception sub-spectrogram of the gesture perception spectrogram in the horizontal direction through a preset mapping algorithm, so that the azimuth angle value is obtained.

As an example, the above-mentioned preset mapping algorithm may be an FFT (Fast Fourier Transform ) algorithm.

And a second sub-step, performing second mapping processing on the gesture sensing spectrogram to obtain a pitching angle value corresponding to the gesture sensing target point. The mapping processing can be performed on each antenna perception sub-spectrogram of the gesture perception spectrogram in the vertical direction through the preset mapping algorithm, so as to obtain the pitching angle value.

And a third sub-step of generating gesture target point coordinates corresponding to the gesture perception target point based on the azimuth angle value and the pitching angle value. The first coordinate value may be a tangent value of the pitch angle value and a distance value corresponding to the gesture sensing target point in the gesture sensing spectrogram. Then, a product of the first coordinate value and the cotangent value of the azimuth angle value may be determined as an abscissa of the gesture target point coordinate. Then, a product of the first coordinate value and a tangent value of the azimuth angle value may be determined as an ordinate of the gesture target point coordinate. Finally, a cotangent value of the distance value and the pitch angle value corresponding to the gesture sensing target point in the gesture sensing spectrogram can be determined as a vertical coordinate of the gesture target point coordinate.

And secondly, clustering the generated gesture target point coordinates to obtain an outlier target point coordinate set. The generated gesture target point coordinates can be clustered through a preset clustering algorithm, and a target point cluster and an outlier target point coordinate set are obtained.

As an example, the above-mentioned preset clustering algorithm may be, but is not limited to, at least one of the following: K-Means (K-Means) clustering algorithm or DBSCAN (Density-Based Spatial Clustering of Applications with Noise, density-based spatial clustering algorithm) clustering algorithm.

And thirdly, deleting gesture sensing target points corresponding to the outlier target point coordinates in the outlier target point coordinate set from the gesture sensing target point set to obtain the screening sensing target point set.

Therefore, the accuracy and the accuracy of the obtained gesture feature spectrogram can be improved through the screening method, so that gesture recognition can be performed subsequently.

And 104, performing stitching processing on each gesture perception characteristic information in the gesture perception characteristic information sequence to obtain a gesture target spectrogram.

In some embodiments, the executing body may perform stitching processing on each gesture sensing feature information in the gesture sensing feature information sequence to obtain a gesture target spectrogram.

In some optional implementations of some embodiments, the performing body performs stitching processing on each gesture sensing feature information in the gesture sensing feature information sequence to obtain a gesture target spectrogram, and may include the following steps:

The first step, the gesture sensing distance spectrograms included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence are spliced, and a distance time spectrogram is obtained. The gesture sensing distance spectrograms included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence can be spliced according to the time sequence, and the distance time spectrogram is obtained.

And secondly, performing stitching processing on gesture sensing azimuth spectrograms included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence to obtain azimuth time spectrograms. The gesture sensing azimuth spectrogram included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence can be spliced according to the time sequence, so that the azimuth time spectrogram is obtained.

And thirdly, performing splicing processing on gesture sensing pitch angle spectrograms included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence to obtain a pitch angle time spectrogram. The gesture sensing pitch angle spectrograms included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence can be spliced according to time sequence, and the pitch angle time spectrogram is obtained.

And fourthly, performing splicing processing on gesture perception velocity spectrograms included in each gesture perception characteristic information in the gesture perception characteristic information sequence to obtain a velocity time spectrogram. The gesture sensing speed spectrogram included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence can be spliced according to the time sequence, so that the speed time spectrogram is obtained.

And fifthly, performing splicing processing on the distance time spectrogram, the azimuth angle time spectrogram, the pitch angle time spectrogram and the speed time spectrogram to obtain the gesture target spectrogram. The distance time spectrogram, the azimuth angle time spectrogram, the pitch angle time spectrogram and the speed time spectrogram can be spliced into the gesture target spectrogram according to the spatial sequence from top to bottom.

And 105, inputting the gesture target spectrogram to a pre-trained second gesture instruction recognition model to obtain second control instruction information.

In some embodiments, the executing body may input the gesture target spectrogram to a pre-trained second gesture instruction recognition model to obtain second control instruction information. The pre-trained second gesture command recognition model may be a pre-trained neural network model with a gesture target spectrogram as input and second control command information as output.

Alternatively, the pre-trained second gesture instruction recognition model may be obtained by training the following steps:

A second step of selecting second sample gesture information from the second sample gesture information set, and executing the following second training substep:

And a second sub-step, performing stitching processing on each second sample gesture characteristic information in the second sample gesture characteristic information sequence to obtain a second sample gesture target spectrogram. The specific implementation manner and the technical effects of generating the second sample gesture target spectrogram may refer to step 104 in the foregoing embodiment, which is not described herein again.

And a third sub-step of inputting a second sample gesture target spectrogram to a first connection sub-model included in the initial second gesture instruction recognition model to obtain initial connection characteristic information. Wherein, the initial second gesture instruction recognition model further includes: a channel attention sub-model, a spatial attention sub-model, and a second connection sub-model. The initial second gesture command recognition model may be an untrained neural network model with a second sample gesture target spectrogram as input and second initial gesture control command information as output.

Here, the first connection sub-model may be divided into five layers: the first layer is a convolution layer and can be used for carrying out convolution processing on the second sample gesture target spectrogram. The second layer is a pooling layer, and can be used for pooling the output of the first layer. The third layer is a connection layer and can be used for performing connection operation on the output of the second layer. The fourth layer is a transition layer that can be used to reduce the number of channels of the output of the third layer. The fifth layer is a connection layer and can be used for performing connection operation on the output of the fourth layer to obtain the initial connection characteristic information.

The channel attention sub-model may be a channel attention module. The spatial attention sub-model may be a spatial attention module.

The second connection sub-model can be divided into five layers: the first layer is a transition layer that can be used to reduce the number of channels of the initial spatial signature. The second layer is a connection layer, and can be used for performing connection operation on the output of the first layer. The third layer is a transition layer, and the number of channels of the output of the second layer can be reduced forever. The fourth layer is a connection layer and can be used for performing connection operation on the output of the third layer. The fifth layer is a classification layer and can be used for carrying out normalization operation on the output of the fourth layer to obtain second initial gesture control instruction information.

As an example, the third layer and the fifth layer included in the first connection sub-model, and the second layer and the fourth layer included in the second connection sub-model may be one Dense Block, respectively. The fifth layer included in the second connection sub-model may be a softmax (normalization) function.

And a fourth sub-step of inputting the initial connection characteristic information into a channel attention sub-model included in the initial second gesture instruction recognition model to obtain the initial channel characteristic information.

And a fifth sub-step of inputting the initial channel characteristic information into a space attention sub-model included in the initial second gesture instruction recognition model to obtain initial space characteristic information.

And a sixth sub-step of inputting the initial spatial feature information into a second connection sub-model included in the initial second gesture command recognition model to obtain second initial gesture control command information.

And a seventh substep of determining a second control difference value between the second initial gesture control instruction information and the second sample control instruction information included in the second sample gesture information based on a preset second loss function.

As an example, the above-mentioned preset second loss function may be, but is not limited to, at least one of: MSE (Mean Square Error ) function, MAE (Mean Absolute Error, mean absolute error) function, or Categorical Cross Entropy Loss (class cross entropy loss) function.

An eighth substep, in response to determining that the second control difference value is less than the second target value, determines the initial second gesture command recognition model as the second gesture command recognition model.

As an example, the above-mentioned second target value may be 0.1.

Optionally, the executing body may further adjust related parameters in the initial second gesture command recognition model in response to determining that the second control difference value is equal to or greater than the second target value, determine the adjusted initial second gesture command recognition model as the initial second gesture command recognition model, and select second sample gesture information from each second sample gesture information that is not selected in the second sample gesture information set, so as to execute the second training step again. The related parameters in the initial second gesture instruction recognition model can be adjusted through the preset adjustment algorithm.

Therefore, the second gesture instruction recognition model can initially extract the characteristic information in the characteristic spectrogram of the radar signal through the first connection sub-model. The channel attention information may then be further extracted by a channel attention sub-model. The spatial attention information may then be further extracted by a spatial attention sub-model. Finally, gesture control instruction information in the channel attention information and the spatial attention information can be extracted through the second connection sub-model. Therefore, the accuracy of the obtained gesture control instruction can be improved, and further, the accuracy of gesture control of the article management system can be improved.

And 106, carrying out fusion processing on the first control instruction information and the second control instruction information to obtain system control instruction information.

As an example, the above-mentioned preset normalization algorithm may be a Sigmoid (normalization) function. The pre-trained classification network model may be an FC (full Connected) network model.

The relevant content of steps 102-106 is taken as an invention point of the embodiments of the present disclosure, solving the second technical problem mentioned in the background art, namely "the accuracy of the control of the item management system is reduced". Among other factors, factors that lead to reduced accuracy in control of the item management system tend to be as follows: when a gesture is shot by a camera, the accuracy of the recognized gesture command may be reduced as the sharpness of the image shot by the camera is reduced. If the above factors are solved, the effect of improving the accuracy of the control of the article management system can be achieved. To achieve this effect, the present disclosure may recognize gesture instructions sensed by the radar sensor through the neural network model while recognizing gesture instructions photographed by the camera through the neural network model. Then, the gesture command shot by the camera and the gesture command perceived by the radar sensor can be fused through the neural network model, so that the accuracy of gesture recognition is further improved. Therefore, the camera can be used, and meanwhile, the radar is also used for identifying the gesture command, so that the accuracy of gesture command identification is improved, and further, the accuracy of control of the article management system can be improved.

Step 107, controlling the article management system based on the system control instruction information.

In some embodiments, the executing entity may control the article management system based on the system control instruction information. The article management system may be a system that operates on a computing device that includes a capacitive touch screen. The article management system described above may be used to record and manage articles in a logistics warehouse in real time.

Optionally, the executing body controls the article management system based on the system control instruction information, and may include the following steps:

And the first step, in response to determining that the system control instruction information is first instruction information, displaying a startup and shutdown interface. The first instruction information may be information representing "display on/off interface".

And a second step of displaying a keyboard chart in response to determining that the system control instruction information is second instruction information. The second instruction information may be information representing "display keyboard interface".

And thirdly, displaying the article chart information in response to determining that the system control instruction information is third instruction information. The third instruction information may be information representing "displaying an item chart". The item map information may characterize the storage information of items within the item warehouse. The item chart information may include, but is not limited to, at least one of the following: article warehouse-in time, article quantity, article volume and article storage position information.

And a fourth step of restarting the article management system in response to determining that the system control instruction information is fourth instruction information. The fourth instruction information may be information indicating "restarting the article management system".

With further reference to FIG. 2, as an implementation of the method illustrated in the above figures, the present disclosure provides some embodiments of an article management system gesture control apparatus, corresponding to those method embodiments illustrated in FIG. 1, which may find particular application in a variety of electronic devices.

As shown in fig. 2, the article management system gesture control apparatus 200 of some embodiments includes: an acquisition unit 201, a first input unit 202, a feature extraction unit 203, a stitching unit 204, a second input unit 205, a fusion unit 206, and a control unit 207. Wherein, the acquisition unit 201 is configured to acquire a gesture dynamic image sequence and a gesture perception signal sequence; a first input unit 202 configured to input the gesture dynamic image sequence to a first gesture instruction recognition model trained in advance, to obtain first control instruction information; a feature extraction unit 203 configured to perform feature extraction processing on each gesture sensing signal in the gesture sensing signal sequence to generate gesture sensing feature information, so as to obtain a gesture sensing feature information sequence; a stitching unit 204, configured to stitch each piece of gesture sensing feature information in the gesture sensing feature information sequence to obtain a gesture target spectrogram; a second input unit 205 configured to input the gesture target spectrogram to a pre-trained second gesture instruction recognition model, to obtain second control instruction information; a fusion unit 206 configured to perform fusion processing on the first control instruction information and the second control instruction information to obtain system control instruction information; the control unit 207 is configured to control the article management system based on the above-described system control instruction information.

It will be appreciated that the elements described in the item management system gesture control apparatus 200 correspond to the various steps in the item management system gesture control method described with reference to fig. 1. Thus, the operations, features, and advantages described above for the gesture control method of the article management system are equally applicable to the gesture control device 200 of the article management system and the units contained therein, and are not described herein.

Referring now to fig. 3, a schematic diagram of an electronic device 300 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic devices in some embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), and the like, as well as stationary terminals such as digital TVs, desktop computers, and the like. The terminal device shown in fig. 3 is only one example and should not impose any limitation on the functionality and scope of use of the embodiments of the present disclosure.

As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.

In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 3 may represent one device or a plurality of devices as needed.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 309, or from storage device 308, or from ROM 302. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.

It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some embodiments, the clients, servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol ), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the internet (e.g., the internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a gesture dynamic image sequence and a gesture perception signal sequence; inputting the gesture dynamic image sequence into a pre-trained first gesture instruction recognition model to obtain first control instruction information; performing feature extraction processing on each gesture sensing signal in the gesture sensing signal sequence to generate gesture sensing feature information, so as to obtain a gesture sensing feature information sequence; performing stitching processing on each gesture perception characteristic information in the gesture perception characteristic information sequence to obtain a gesture target spectrogram; inputting the gesture target spectrogram to a pre-trained second gesture instruction recognition model to obtain second control instruction information; the first control instruction information and the second control instruction information are fused to obtain system control instruction information; and controlling the article management system based on the system control instruction information.

Computer program code for carrying out operations for some embodiments of the present disclosure may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: the processor comprises an acquisition unit, a first input unit, a feature extraction unit, a splicing unit, a second input unit, a fusion unit and a control unit. The names of these units do not limit the unit itself in some cases, and the acquisition unit may also be described as "a unit that acquires a gesture dynamic image sequence and a gesture sensing signal sequence", for example.

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

The foregoing description is only of the preferred embodiments of the present disclosure and description of the principles of the technology being employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but encompasses other technical features formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention. Such as the above-described features, are mutually substituted with (but not limited to) the features having similar functions disclosed in the embodiments of the present disclosure.

Claims

1. A method of gesture control for an item management system, comprising:

acquiring a gesture dynamic image sequence and a gesture perception signal sequence;

Inputting the gesture dynamic image sequence into a pre-trained first gesture instruction recognition model to obtain first control instruction information;

Performing feature extraction processing on each gesture sensing signal in the gesture sensing signal sequence to generate gesture sensing feature information, and obtaining a gesture sensing feature information sequence;

Performing stitching processing on each gesture perception characteristic information in the gesture perception characteristic information sequence to obtain a gesture target spectrogram;

Inputting the gesture target spectrogram to a pre-trained second gesture instruction recognition model to obtain second control instruction information;

performing fusion processing on the first control instruction information and the second control instruction information to obtain system control instruction information;

and controlling the article management system based on the system control instruction information.

2. The method of claim 1, wherein the feature extraction processing of each gesture-aware signal in the sequence of gesture-aware signals to generate gesture-aware feature information comprises:

converting the gesture sensing signals to obtain gesture sensing spectrograms;

Performing target detection processing on the gesture sensing spectrogram to obtain a gesture sensing target point set;

Screening the gesture sensing target point set based on the gesture sensing spectrogram to obtain a screening sensing target point set;

Based on the screening perception target point set, performing first projection processing on the gesture perception spectrogram to obtain a gesture perception distance spectrogram;

positioning the gesture sensing distance spectrogram to obtain a gesture sensing azimuth spectrogram and a gesture sensing pitch spectrogram;

based on the screening perception target point set, performing second projection processing on the gesture perception spectrogram to obtain a gesture perception speed spectrogram;

And carrying out fusion processing on the gesture sensing distance spectrogram, the gesture sensing azimuth spectrogram, the gesture sensing pitch spectrogram and the gesture sensing speed spectrogram to obtain the gesture sensing characteristic information.

3. The method of claim 2, wherein the screening the gesture sensing target point set based on the gesture sensing spectrogram to obtain a screening sensing target point set includes:

for each gesture-aware target point in the set of gesture-aware target points, performing the generating step of:

performing first mapping processing on the gesture sensing spectrogram to obtain an azimuth angle value corresponding to the gesture sensing target point;

Performing second mapping processing on the gesture sensing spectrogram to obtain a pitching angle value corresponding to the gesture sensing target point;

generating gesture target point coordinates corresponding to the gesture perception target point based on the azimuth angle value and the pitching angle value;

clustering the generated gesture target point coordinates to obtain an outlier target point coordinate set;

and deleting gesture perception target points corresponding to the outlier target point coordinates in the outlier target point coordinate set from the gesture perception target point set to obtain the screening perception target point set.

4. The method according to claim 2, wherein the performing a stitching process on each gesture sensing feature information in the gesture sensing feature information sequence to obtain a gesture target spectrogram includes:

performing stitching processing on gesture sensing distance spectrograms included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence to obtain a distance time spectrogram;

Performing stitching processing on gesture sensing azimuth spectrograms included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence to obtain azimuth time spectrograms;

Performing splicing processing on gesture sensing pitch angle spectrograms included in each gesture sensing characteristic information in the gesture sensing characteristic information sequence to obtain a pitch angle time spectrogram;

Performing stitching processing on gesture perception velocity spectrograms included in each gesture perception characteristic information in the gesture perception characteristic information sequence to obtain velocity time spectrograms;

And performing splicing processing on the distance time spectrogram, the azimuth angle time spectrogram, the pitch angle time spectrogram and the speed time spectrogram to obtain the gesture target spectrogram.

5. The method of claim 1, wherein the pre-trained first gesture instruction recognition model is trained by:

Acquiring a first sample gesture information set, wherein each first sample gesture information in the first sample gesture information set comprises: the first sample gesture image sequence and the first sample control instruction information;

Selecting first sample gesture information from the first sample gesture information set, and executing the following first training steps:

Inputting a first sample gesture image sequence included in first sample gesture information into a convolution submodel included in an initial first gesture instruction recognition model to obtain an initial gesture space-time feature map, wherein the initial first gesture instruction recognition model further comprises: an attention sub-model, a circulation sub-model, and an output sub-model;

inputting the initial gesture space-time feature map to a attention sub-model included in an initial first gesture instruction recognition model to obtain initial gesture attention feature information;

inputting initial gesture attention characteristic information into a circulation sub-model included in an initial first gesture instruction recognition model to obtain initial gesture circulation characteristic information;

Inputting the initial gesture circulation characteristic information into an output sub-model included in an initial first gesture instruction recognition model to obtain first initial gesture control instruction information;

Determining a first control difference value of first initial gesture control instruction information and first sample control instruction information included in first sample gesture information based on a preset first loss function;

in response to determining that the first control difference value is less than the first target value, an initial first gesture command recognition model is determined as the first gesture command recognition model.

6. The method of claim 5, wherein the method further comprises:

And in response to determining that the first control difference value is greater than or equal to a first target value, adjusting relevant parameters in an initial first gesture instruction recognition model, determining the adjusted initial first gesture instruction recognition model as an initial first gesture instruction recognition model, and selecting first sample gesture information from all first sample gesture information which is not selected in the first sample gesture information set so as to execute the first training step again.

7. An article management system gesture control device, comprising:

an acquisition unit configured to acquire a gesture dynamic image sequence and a gesture perception signal sequence;

the first input unit is configured to input the gesture dynamic image sequence to a pre-trained first gesture instruction recognition model to obtain first control instruction information;

the feature extraction unit is configured to perform feature extraction processing on each gesture sensing signal in the gesture sensing signal sequence to generate gesture sensing feature information, so as to obtain a gesture sensing feature information sequence;

the splicing unit is configured to splice each piece of gesture perception characteristic information in the gesture perception characteristic information sequence to obtain a gesture target spectrogram;

the second input unit is configured to input the gesture target spectrogram to a pre-trained second gesture instruction recognition model to obtain second control instruction information;

The fusion unit is configured to fuse the first control instruction information and the second control instruction information to obtain system control instruction information;

and a control unit configured to control the article management system based on the system control instruction information.

8. An electronic device, comprising:

One or more processors;

a storage device having one or more programs stored thereon;

When executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-6.

9. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-6.