CN113762045B - Point reading position identification method, device, point reading device and storage medium - Google Patents

Point reading position identification method, device, point reading device and storage medium

Info

Publication number
CN113762045B
CN113762045B
Authority
CN
China
Prior art keywords
preset target
image
preset
prediction
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110488678.9A
Other languages
Chinese (zh)
Other versions
CN113762045A (en)
Inventor
项小明
王禹
刘睿哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110488678.9A
Publication of CN113762045A
Application granted
Publication of CN113762045B
Legal status: Active (current)
Anticipated expiration

Abstract

Translated from Chinese


The present application relates to a point-reading position recognition method, apparatus, point-reading device, and storage medium. The method comprises: obtaining an image to be recognized within a point-reading area; performing target detection on the image to be recognized to obtain a target detection result; if it is determined based on the target detection result that the image to be recognized contains a first preset target or a second preset target, outputting specific position information in the first preset target or the second preset target, where the specific position represents the target position designated in the preset target; if it is determined based on the target detection result that the image to be recognized contains both the first preset target and the second preset target, outputting specific position information in the preset target with the higher priority based on the preset priorities of the first preset target and the second preset target. The above method determines the position pointed to by the user by performing target detection on the image, so point reading can be achieved without a matching point-reading pen, and the point-reading position can be designated by either of two targets, enriching the supported point-reading methods.

Description

Click-to-read position identification method and device, click-to-read equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for identifying a click-to-read position, a click-to-read device, and a storage medium.
Background
Reading books is one of the important ways teenagers study, and more and more parents choose to have teenagers read paper books to better protect their eyes. When teenagers encounter unrecognized words or unintelligible content while reading a paper book, they need answers. With the development of technology, products such as point-and-read machines have appeared. A point-and-read machine allows a teenager to indicate confusing book content; the machine identifies the indicated position, recognizes the corresponding book content, and gives a corresponding response. For example, when the teenager points at an unrecognized word, the point-and-read machine provides its pronunciation, explanation, and so on; when the teenager points at an expression to be solved, the point-and-read machine gives a solution method.
In the related art, some point-and-read machines must use matched hardware to perform point reading. For example, a magnetic device is arranged at pre-embedded point-reading positions in a customized book, and the corresponding position is read with a matched point-reading pen to trigger the corresponding content. This approach is heavily restricted.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a point-reading position identification method, apparatus, point-reading device, and storage medium that can support rich point-reading methods.
A method of click-to-read location identification, the method comprising:
acquiring an image to be identified in a click-to-read area;
Performing target detection on the image to be identified to obtain a target detection result;
If it is determined according to the target detection result that the image to be identified contains a first preset target or a second preset target, outputting specific position information in the first preset target or the second preset target, where the specific position represents the target position designated in the preset target;
If it is determined according to the target detection result that the image to be identified contains both the first preset target and the second preset target, outputting specific position information in the preset target with higher priority according to the preset priorities of the first preset target and the second preset target.
A point-to-read location identification device, the device comprising:
The image acquisition module is used for acquiring an image to be identified in the click-to-read area;
The target detection module is used for carrying out target detection on the image to be identified to obtain a target detection result;
The position information output module is used for outputting specific position information in the first preset target or the second preset target if it is determined according to the target detection result that the image to be identified contains the first preset target or the second preset target, where the specific position represents the designated target position in the preset target; and for outputting specific position information in the preset target with higher priority, according to the preset priorities of the first preset target and the second preset target, if it is determined according to the target detection result that the image to be identified contains both the first preset target and the second preset target.
A point-and-read device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring an image to be identified in a click-to-read area;
Performing target detection on the image to be identified to obtain a target detection result;
If it is determined according to the target detection result that the image to be identified contains a first preset target or a second preset target, outputting specific position information in the first preset target or the second preset target, where the specific position represents the target position designated in the preset target;
If it is determined according to the target detection result that the image to be identified contains both the first preset target and the second preset target, outputting specific position information in the preset target with higher priority according to the preset priorities of the first preset target and the second preset target.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring an image to be identified in a click-to-read area;
Performing target detection on the image to be identified to obtain a target detection result;
If it is determined according to the target detection result that the image to be identified contains a first preset target or a second preset target, outputting specific position information in the first preset target or the second preset target, where the specific position represents the target position designated in the preset target;
If it is determined according to the target detection result that the image to be identified contains both the first preset target and the second preset target, outputting specific position information in the preset target with higher priority according to the preset priorities of the first preset target and the second preset target.
According to the above point-reading position identification method, apparatus, point-reading device, and storage medium, the image to be identified in the point-reading area is obtained and target detection is carried out on it, where the target detection covers a first preset target and a second preset target. Based on the target detection result, if only one preset target is contained, the position of the specific position in that preset target is output; if both preset targets are contained, the specific position information in the preset target with higher priority is output, and that position is determined as the point-reading position. The method determines the position pointed at by the user by carrying out target detection on the image, so point reading can be realized without a matched point-reading pen, and the point-reading position can be designated by either of two targets, for example by a fingertip or by a pen point, which enriches the supported point-reading modes.
Drawings
FIG. 1 is a flow chart of a point-reading position identification method in one embodiment;
FIG. 2 is a flow chart of a point-reading position identification method in another embodiment;
FIG. 3 is a flow chart of inputting image features into two or more attribute prediction branches and obtaining the attribute prediction result output by each attribute prediction branch in one embodiment;
FIG. 4 is a schematic diagram of the network architecture of a point-reading position recognition model in one embodiment;
FIG. 5 (1) is a schematic diagram of the center position of a preset target (hand) in one embodiment;
FIG. 5 (2) is a thermodynamic diagram corresponding to the center position of a preset target in one embodiment;
FIG. 6 (1) is a schematic diagram of a specific position (fingertip) of a preset target in one embodiment;
FIG. 6 (2) is a thermodynamic diagram corresponding to the fingertip or nib position in one embodiment;
FIG. 7 (1) is a schematic diagram of the corresponding text content identified and output by a point-reading device according to the point-read target coordinate position in one embodiment;
FIG. 7 (2) is a schematic diagram of the corresponding text content identified and output by the point-reading device according to the point-read target coordinate position in another embodiment;
FIG. 8 is a block diagram of a point-reading position identification apparatus in one embodiment;
FIG. 9 is an internal block diagram of a point-reading device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In some embodiments, the method for recognizing a click-to-read position provided by the application can be applied to a click-to-read device, the click-to-read device obtains an image to be recognized in a click-to-read area, target detection is performed on the image to be recognized, wherein the target detection comprises detection of a first preset target and a second preset target, on the basis of the result of target detection, if only one preset target is included, the position of a specific position in the preset target is output, if two preset targets are included, specific position information in the preset target with higher priority is output, the position is determined as the click-to-read position, and text content is recognized and output according to the position of the specific position, so that the purpose of clicking-to-read is realized.
In other embodiments, the method for identifying a click-to-read position provided by the application can be applied to a system comprising a click-to-read device and a server, where the click-to-read device communicates with the server through a network. The server obtains an image to be identified in the click-to-read area from the click-to-read device and performs target detection on it, where the target detection covers a first preset target and a second preset target. Based on the target detection result, if only one preset target is contained, the position of the specific position in that preset target is output; if both preset targets are contained, the specific position information in the preset target with higher priority is output and determined as the click-to-read position. Finally, the determined click-to-read position is fed back to the click-to-read device, so that the click-to-read device can identify and output the text content at that position, thereby achieving the purpose of click reading. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms. The click-to-read device may be, but is not limited to, a smart phone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, or other device having an image capturing function. The click-to-read device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
Cloud computing refers, in a narrow sense, to a delivery and usage mode of IT infrastructure in which required resources are obtained on demand and in an easily extensible manner through a network, and, in a broad sense, to a delivery and usage mode of services in which required services are obtained on demand and in an easily extensible manner through the network. Such services may be IT, software, internet related, or other services. Cloud computing is a product of the fusion of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing.
With the development of the internet, real-time data flow and diversification of connected devices, and the promotion of demands of search services, social networks, mobile commerce, open collaboration and the like, cloud computing is rapidly developed. Unlike the previous parallel distributed computing, the generation of cloud computing will promote the revolutionary transformation of the whole internet mode and enterprise management mode in concept.
In some embodiments of the application, it is contemplated that computer vision is used to identify whether a particular target is present in the acquired image. Computer Vision (CV) is the science of studying how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to perform machine vision tasks such as recognition and measurement on a target, and further performs graphic processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
In some embodiments of the application, image feature extraction, attribute prediction, and the like are implemented using a neural network, which belongs to machine learning. Machine learning (ML) is a multi-domain interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
In one embodiment, as shown in fig. 1, a method for identifying a click-to-read position is provided, which includes steps S110 to S140.
Step S110, an image to be identified in the click-to-read area is obtained.
The click-to-read area represents an image acquisition area of the click-to-read device, and is usually a range covered by an image acquisition module of the click-to-read device. For example, in one embodiment, the movable point-and-read device is placed on a desktop, and the coverage area aligned with the image acquisition module of the point-and-read device is the point-and-read area. In this embodiment, the image acquired in the read-to-point area is recorded as an image to be recognized.
In one embodiment, when the method is applied to a system of a server and a point-reading device, the image to be identified in the point-reading area is collected by the image acquisition module of the point-reading device and obtained by the server from the point-reading device; when the method is applied to the point-reading device itself, the image to be identified in the point-reading area is collected and obtained by the image acquisition module of the point-reading device.
Further, in one embodiment, during operation of the pointing device, images in a pointing region are acquired at intervals of a preset time, where the preset time period may be set to an arbitrary duration according to an actual situation.
Step S120, performing target detection on the image to be identified to obtain a target detection result.
Target detection, also called target extraction, is an image segmentation based on target geometry and statistical features. In this embodiment, after the image to be identified is obtained, the target detection is performed in the image to be identified, and further, the target detection in this embodiment includes detecting whether the image to be identified includes a first preset target and/or a second preset target.
In one embodiment, the object detection of the first preset object and the second preset object on the image to be identified can be achieved in any mode. For example, in one embodiment, two object detection models are trained separately, one for detecting a first preset object and the other for detecting a second preset object. In another embodiment, a model is trained comprising a feature extraction portion and an attribute prediction portion, wherein a branch in the attribute prediction portion for detecting whether a preset target is included comprises two channels for detecting a first preset target and a second preset target, respectively. In other embodiments, this may be accomplished in other ways.
Step S130, if it is determined that the image to be identified contains the first preset target or the second preset target according to the target detection result, specific position information in the first preset target or the second preset target is output, and the specific position represents a target position specified in the preset target.
Step S140, if it is determined that the image to be identified contains both the first preset target and the second preset target according to the target detection result, outputting the specific position information in the preset target with higher priority according to the preset priorities of the first preset target and the second preset target.
And if the image to be identified is determined to only contain the first preset target, outputting the specific position information in the first preset target, and if the image to be identified is determined to only contain the second preset target, outputting the specific position information in the second preset target.
If it is determined according to the target detection result that the image to be identified simultaneously contains a first preset target and a second preset target, the specific position information in the preset target with higher priority is output according to the preset priorities of the first preset target and the second preset target. The priorities of the first preset target and the second preset target can be preset according to the actual situation.
The specific position in the preset target represents a position set in advance in the preset target. In practical applications, when a user faces text content that is not understood, the user may point a fingertip, a pen tip, or the like at the position to be read, that is, the position in contact with the text to be identified. When the point-reading device identifies the point-reading position, it needs to identify the position of the fingertip, pen tip, or the like. In this embodiment, the preset target, such as a hand (for example an arm or palm) or a pen, is identified first, and then the position of the fingertip, pen tip, or the like is determined from the preset target and output. In one embodiment, the specific position of the first preset target and the specific position of the second preset target may be set according to the actual situation; for example, the first preset target is a palm and its specific position is the fingertip, and the second preset target is a pen and its specific position is the pen tip.
In another embodiment, if it is determined that the first preset target or the second preset target is not detected in the image to be identified according to the target detection result, it is possible that the current user does not perform the click-to-read operation, and no location information is output at this time.
In a specific embodiment, the first preset target represents a user's hand (such as an arm or palm portion) and the second preset target represents a pen (such as a sign pen, a capacitive pen, or any other pen). If only the user's hand is detected in the image to be recognized, the specific position information in the hand is output; if only the pen is detected, the position information of the pen is output; and if the hand and the pen are detected at the same time, the specific position information in the second preset target, the pen, is output according to the priority. In other embodiments, the preset targets may be other targets. Taking the first preset target as a hand and the second preset target as a pen as an example, a user may point at text content to be understood with either a fingertip or a pen point in a scene of writing with a pen. The method supports both hand and pen point reading, so in this scene the user does not need to select a fixed point-reading mode and can switch at will between pointing with the fingertip and pointing with the pen point. This removes the cumbersome operation of point reading in a fixed mode, reduces the operating difficulty for the user, and improves the user experience. A minimal sketch of this priority rule is given below.
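The following is a minimal Python sketch of how the priority rule described above could be applied once detection has run; the function name and the (x, y) tuple representation are illustrative assumptions, not part of the patent.

```python
# Minimal sketch of the priority rule described above (illustrative names only):
# the hand is the first preset target, the pen the second, and the pen is assumed
# to have the higher priority, so the nib wins when both targets are detected.

def select_read_position(hand_tip, pen_tip):
    """Return the point-read position given optional fingertip / nib detections.

    hand_tip, pen_tip: (x, y) tuples, or None when the target was not detected.
    """
    if pen_tip is not None:          # pen detected: the nib has the higher priority
        return pen_tip
    if hand_tip is not None:         # only the hand detected: use the fingertip
        return hand_tip
    return None                      # neither target detected: no point-read event


# Example: both targets present -> the nib position is returned.
print(select_read_position(hand_tip=(150, 200), pen_tip=(148, 195)))  # (148, 195)
```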
Further, in one embodiment, when the position information of the preset target is output, the position information of the contact position of the preset target and the text content to be recognized is output. For example, in one embodiment, the user points to the text content to be recognized through the finger tip, that is, the position of the finger tip is the position corresponding to the text content to be recognized which needs to be output, and the position information of the finger tip is output, or the user points to the text content to be recognized through the pen point, and the position information of the pen point is output.
According to the above click-to-read position identification method, the image to be identified in the click-to-read area is obtained and target detection is carried out on it, where the target detection covers a first preset target and a second preset target. Based on the target detection result, if only one preset target is contained, the position of the specific position in that preset target is output; if both preset targets are contained, the specific position information in the preset target with higher priority is output, and that position is determined as the click-to-read position. The method determines the position pointed at by the user by carrying out target detection on the image, so click reading can be realized without a matched click-to-read pen, and the click-to-read position can be designated by either of two targets, for example by a fingertip or by a pen point, which enriches the supported click-to-read modes.
In one embodiment, performing target detection on an image to be identified to obtain a target detection result comprises extracting image features of the image to be identified, respectively inputting the image features into more than two attribute prediction branches to obtain attribute prediction results output by each attribute prediction branch, wherein the attribute prediction branches comprise a central position prediction branch of a preset target and a specific position prediction branch in the preset target.
Features of an image can be divided into two levels: low-level visual features and high-level semantic features. Low-level visual features include texture, color, and shape. Semantic features describe relationships between things. Texture feature extraction algorithms include the gray-level co-occurrence matrix method; color feature extraction algorithms include the histogram method, the cumulative histogram method, the color clustering method, and the like; shape feature extraction algorithms include spatial moment features and the like. High-level semantic extraction methods include semantic networks, mathematical logic, frames, and the like. In other embodiments, extracting features of the image may also be accomplished through a neural network.
In one embodiment, extracting image features of an image to be identified comprises extracting the image features of the image to be identified through a feature extraction network, wherein the feature extraction network comprises a continuous downsampling layer and a continuous upsampling layer, the number of downsampling layers is a first preset number, the number of upsampling layers is a second preset number, and the second preset number is smaller than or equal to the first preset number.
In machine learning, pattern recognition and image processing, feature extraction starts from an initial set of measured data and builds derived values (features) that are intended to provide information and non-redundancy, thereby facilitating subsequent learning and generalization steps and in some cases leading to better interpretability. In this embodiment, the extraction of the image features of the image to be identified may be achieved by any one of the ways.
Downsampling (subsampling) has two main purposes: 1) making the image conform to the size of the display area, and 2) generating a thumbnail of the corresponding image. The principle of downsampling is that for an image I of size M × N, downsampling by a factor of s yields an image of resolution (M/s) × (N/s), where s should be a common divisor of M and N. Viewed as a matrix, each s × s window of the original image becomes a single pixel whose value is the average of all pixels in the window. In a specific embodiment, the image to be identified is downsampled using consecutive downsampling layers, where each downsampling layer uses a factor of 2.
Upsampling, also known as image interpolation, mainly aims to enlarge the original image so that it can be displayed on a higher-resolution display device. Image enlargement almost always uses interpolation: on the basis of the original image pixels, new elements are inserted between pixels using a suitable interpolation algorithm. Common image interpolation methods include traditional interpolation, edge-based interpolation, and region-based interpolation. In a specific embodiment, the downsampled image feature of the minimum size is upsampled using consecutive upsampling layers, each with a factor of 2; that is, the image features output by an upsampling layer are twice the size of the input image features.
Further, in one embodiment, the feature extraction network downsamples the image to be identified layer by layer to obtain downsampled image features of different scales, with a first preset number of downsampling layers; the downsampled image features are then upsampled layer by layer through a feature pyramid network to obtain the image features of the image to be identified, with a second preset number of upsampling layers, where the second preset number is less than or equal to the first preset number. In a specific embodiment, there are 5 downsampling layers and 3 upsampling layers; that is, when the image to be identified is input into the feature extraction network, the size of the output image features is 1/4 of the size of the image to be identified.
In one embodiment, before inputting the image to be identified into the feature extraction network, further comprising adjusting the size of the image to be identified to a preset size. The preset size may be set to any value according to practical situations, for example, the preset size is set to 320×320. Wherein the resizing of the image may be accomplished according to any of a number of ways.
Further, in one embodiment, each downsampled image feature is upsampled layer by layer through the feature pyramid network as follows: the downsampled image feature of the minimum size is taken as the initial upsampled image feature; the minimum-size downsampled image feature is concatenated with the initial upsampled image feature and input into the first upsampling layer to obtain a first upsampled image feature; the first upsampled image feature is concatenated with the downsampled image feature of the same size and input into the second upsampling layer to obtain a second upsampled image feature; the second upsampled image feature is concatenated with the downsampled image feature of the same size and input into the third upsampling layer to obtain a third upsampled image feature, which serves as the image feature of the image to be identified. A sketch of this kind of fusion follows.
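Below is a minimal PyTorch sketch of a feature extractor with this shape: five stride-2 downsampling stages followed by three upsampling stages, each fused with the same-size downsampled feature by concatenation, so a 320×320 input yields 80×80 features (1/4 resolution). The plain convolution blocks, the channel count, and the exact order of upsampling and concatenation are assumptions for illustration; the patent's embodiment uses MobileNetV2 with an FPN.

```python
# Sketch only: 5 downsampling layers (factor 2) + 3 upsampling layers with skip
# concatenation, giving output features at 1/4 of the input resolution.
import torch
import torch.nn as nn


def conv_bn_relu(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )


class FeatureExtractor(nn.Module):
    def __init__(self, n_channels=64):                      # channel count N is an assumption
        super().__init__()
        n = n_channels
        # five consecutive downsampling layers, each with a factor of 2
        self.down = nn.ModuleList(
            [conv_bn_relu(3 if i == 0 else n, n, stride=2) for i in range(5)]
        )
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        # three upsampling stages; each fuses the upsampled feature with the
        # same-size downsampled feature by concatenation, then a 3x3 convolution
        self.fuse = nn.ModuleList([conv_bn_relu(2 * n, n) for _ in range(3)])

    def forward(self, x):
        skips = []
        for layer in self.down:
            x = layer(x)
            skips.append(x)          # features at 1/2, 1/4, 1/8, 1/16, 1/32 resolution
        feat = skips[-1]             # smallest feature map (1/32)
        for i, fuse in enumerate(self.fuse):
            feat = self.up(feat)                             # 1/16, 1/8, then 1/4
            skip = skips[-(i + 2)]                           # same-size downsampled feature
            feat = fuse(torch.cat([feat, skip], dim=1))      # concatenate, then convolve
        return feat                  # 1/4 of the input size, N channels


features = FeatureExtractor(n_channels=64)(torch.randn(1, 3, 320, 320))
print(features.shape)  # torch.Size([1, 64, 80, 80])
```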
In another embodiment, the feature extraction network has N feature channels; N can be adjusted according to the device's requirements on model runtime performance, so as to balance accuracy and running speed.
According to the embodiment, the image features are extracted through the pre-trained neural network, and the neural network is trained, so that more accurate image features can be extracted for target detection.
In this embodiment, the attribute prediction branch is used for carrying out attribute prediction on the extracted image feature, wherein the attribute prediction branch at least comprises a central position prediction branch of the preset target and a specific position prediction branch in the preset target. The output result of the central position prediction branch of the preset target is the position information of the central position of the preset target, and the output result of the specific position prediction branch in the preset target is the specific position information in the preset target; it can be understood that, in the present embodiment, the target detection result includes position information of a preset target center position, specific position information of a preset target.
In one embodiment, the specific position prediction branch comprises a specific position prediction sub-branch and a specific position drift prediction sub-branch, as shown in fig. 2, the image features are respectively input into more than two attribute prediction branches, and the attribute prediction result output by each attribute prediction branch is obtained, including steps S210 to S230.
Step S210, inputting the image features into a central position prediction branch of a preset target to obtain a thermodynamic diagram corresponding to the central position of the first preset target and a thermodynamic diagram corresponding to the central position of a second preset target.
A thermodynamic diagram (heatmap) reflects the data in a two-dimensional matrix or table through color changes, intuitively indicating the size of the data values by the depth of a defined color. In this embodiment, after the center position of the preset target is obtained, it is converted into the form of a thermodynamic diagram for representation. In one embodiment, the thermodynamic diagram is specifically a Gaussian thermodynamic diagram: the two-dimensional coordinates are subjected to a Gaussian transformation to obtain the corresponding Gaussian thermodynamic diagram. The transformation of the coordinates into the Gaussian thermodynamic diagram can be achieved in any way; one simple sketch is given below.
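For example, the following NumPy sketch converts a single (x, y) coordinate into a Gaussian heatmap; the heatmap size and the Gaussian radius sigma are illustrative assumptions.

```python
# Convert a 2D coordinate into a Gaussian heatmap ("thermodynamic diagram" above).
import numpy as np


def gaussian_heatmap(center_xy, size=(80, 80), sigma=2.0):
    """Return a heatmap peaking at 1.0 on the given (x, y) center."""
    h, w = size
    xs = np.arange(w)[None, :]            # shape (1, w)
    ys = np.arange(h)[:, None]            # shape (h, 1)
    cx, cy = center_xy
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2  # squared distance to the center
    return np.exp(-d2 / (2.0 * sigma ** 2))


heat = gaussian_heatmap((30, 30))
print(heat.shape, heat[30, 30], round(heat[30, 34], 3))  # (80, 80) 1.0 0.135
```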
The center position of the preset target represents the center position of the bounding box generated for the preset target when it is detected. The bounding box of the preset target is a box, usually rectangular, that contains the preset target: it contains the arm when the arm is detected, and contains the pen when the pen is detected.
In one embodiment, the central position prediction branch of the preset target includes two channels, where one channel is used for outputting the central position thermodynamic diagram of the first preset target and the other channel is used for outputting the central position thermodynamic diagram of the second preset target. In a specific embodiment, the central position prediction branch of the preset target comprises a 3×3 convolution layer and a 1×1 convolution layer: attribute learning is performed through the 3×3 convolution layer, and the number of channels is adjusted to the dimension of the target attribute through the 1×1 convolution. A sketch of such a branch is shown below.
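As an illustration, the following PyTorch sketch builds one such prediction branch; the input channel count and the ReLU between the two convolutions are assumptions not stated in the text.

```python
# One attribute prediction branch: a 3x3 convolution for attribute learning followed
# by a 1x1 convolution that maps the channel count to the attribute dimension
# (2 channels here: one heatmap per preset target).
import torch
import torch.nn as nn


def prediction_branch(in_channels=64, out_channels=2):
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),                                # assumed activation
        nn.Conv2d(in_channels, out_channels, kernel_size=1),  # channels -> attribute dim
    )


center_branch = prediction_branch(out_channels=2)   # center heatmaps: hand + pen
heatmaps = center_branch(torch.randn(1, 64, 80, 80))
print(heatmaps.shape)  # torch.Size([1, 2, 80, 80])
```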
Step S220, inputting the image features into the specific position prediction sub-branch to obtain a thermodynamic diagram corresponding to the specific position in the first preset target and a thermodynamic diagram corresponding to the specific position in the second preset target.
Similar to the central location prediction branch, in this step, the thermodynamic diagram is also converted to be represented for a specific location in the preset target.
In one embodiment, the specific position prediction sub-branch includes two channels, where one channel is used for outputting the specific position thermodynamic diagram of the first preset target and the other channel is used for outputting the specific position thermodynamic diagram of the second preset target. In a specific embodiment, the specific position prediction sub-branch comprises a 3×3 convolution layer and a 1×1 convolution layer: attribute learning is performed through the 3×3 convolution layer, and the number of channels is adjusted to the dimension of the target attribute through the 1×1 convolution.
Step S230, the image feature is input into the specific position drift prediction sub-branch to obtain the specific position drift when the image to be identified is converted into a thermodynamic diagram.
Since a thermodynamic diagram can only represent integer pixel coordinates, a conversion error may occur when the coordinate position in the original image is converted into the thermodynamic diagram. Therefore, in this embodiment, the float-to-int conversion error arising when the image to be recognized is converted into the thermodynamic diagram is predicted by the specific position drift prediction sub-branch. In a specific embodiment, the image feature input into the specific position drift prediction sub-branch is 1/4 of the size of the image to be identified: assuming the pixel coordinates in the image to be identified are (120, 121), the coordinates in the image feature are (30, 30.25), which are represented in the thermodynamic diagram as the integer coordinates (30, 30), so the position drift in this example is (0, 0.25).
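The same arithmetic, written out in a few lines of Python (the division by 4 follows the 1/4 feature-map size stated above):

```python
# The drift example from the paragraph above: the feature map is 1/4 of the input
# image, so a pixel at (120, 121) maps to (30.0, 30.25); the heatmap can only hold
# the integer cell (30, 30), and the drift branch learns the remainder.
x, y = 120, 121                       # pixel coordinates in the image to be identified
fx, fy = x / 4, y / 4                 # coordinates in the 1/4-size feature map
ix, iy = int(fx), int(fy)             # integer heatmap cell
drift = (fx - ix, fy - iy)            # position drift learned by the Offset branch
print((fx, fy), (ix, iy), drift)      # (30.0, 30.25) (30, 30) (0.0, 0.25)
```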
In one embodiment, the specific position drift prediction sub-branch outputs drift for both preset targets: if it is detected that the image to be recognized contains both the first preset target and the second preset target, the abscissa drift and the ordinate drift of the specific position must be output for each of the two preset targets, so the sub-branch comprises four channels in this embodiment, used respectively for outputting the drift of the abscissa and ordinate of the specific positions of the two preset targets. In a specific embodiment, the specific position drift prediction sub-branch comprises a 3×3 convolution layer and a 1×1 convolution layer: attribute learning is performed through the 3×3 convolution layer, and the number of channels is adjusted to the dimension of the target attribute through the 1×1 convolution.
In this embodiment, the central positions of the first and second preset targets, the specific positions of the first and second preset targets, and the position drift of those specific positions are predicted respectively by three attribute prediction branches. Whether the first preset target and/or the second preset target is detected can be determined from the output of the central position prediction branch. If a preset target is detected, the preset target to be output is determined: if only the first preset target is detected, the first preset target is output; if only the second preset target is detected, the second preset target is output; and if both are detected, the preset target with higher priority is output. The specific position of the preset target to be output is then determined from the outputs of the specific position prediction sub-branch and the specific position drift prediction sub-branch.
In one embodiment, as shown in fig. 3, after performing object detection on the image to be identified, the method further includes steps S310 to S330.
Step S310, traversing the thermodynamic diagrams corresponding to the central positions output by the central position prediction branch of the preset targets, and determining the predicted position confidence of the center position of the first preset target and the predicted position confidence of the center position of the second preset target.
In one embodiment, the thermodynamic diagrams output for the center positions of the two preset targets can be obtained by traversing each channel of the central position prediction branch. Any channel of the central position prediction branch may output multiple predicted positions, so the confidences corresponding to those predicted positions are read respectively, that is, in this embodiment, the predicted position confidence of the center position of the first preset target and the predicted position confidence of the center position of the second preset target. Confidence, also called reliability, confidence level, or confidence coefficient, is the probability that the estimated value lies within a certain allowable error range of the true value; the larger the confidence, the more likely the predicted value is close to the correct value.
Step S320, if the maximum value of the confidence coefficient of the predicted position of the first preset target center position is greater than or equal to the preset threshold value, it is determined that the image to be identified contains the first preset target.
Step S330, if the maximum value of the confidence coefficient of the predicted position of the second preset target center position is greater than or equal to the preset threshold value, it is determined that the image to be identified contains the second preset target.
In this embodiment, a preset threshold is set and compared with the confidence: if the confidence of a center position is greater than or equal to the preset threshold, the corresponding preset target is detected in the image to be identified. It can be appreciated that, in another embodiment, if the maximum predicted position confidence of the center position of the first preset target is smaller than the preset threshold, it is determined that the image to be identified does not contain the first preset target, and if the maximum predicted position confidence of the center position of the second preset target is smaller than the preset threshold, it is determined that the image to be identified does not contain the second preset target. The preset threshold may be set according to the actual situation, for example, to 80% or 90%.
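A small NumPy sketch of this presence check (the heatmap shape and the 0.8 threshold are illustrative assumptions):

```python
# Take the maximum confidence in one center heatmap and compare it with the threshold.
import numpy as np


def target_present(center_heatmap, threshold=0.8):
    """Return (present, peak_index) for one preset target's center heatmap."""
    peak = float(center_heatmap.max())
    idx = np.unravel_index(center_heatmap.argmax(), center_heatmap.shape)
    return peak >= threshold, idx


hand_heat = np.zeros((80, 80))
hand_heat[40, 25] = 0.93
present, peak_idx = target_present(hand_heat)
print(present, peak_idx)  # True (40, 25)
```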
In this embodiment, by comparing the confidence in the output result of the predicted branch of the preset target center position with the preset threshold, it is determined whether the image to be identified includes the first preset target and/or the second preset target, so that the accuracy of target detection can be improved.
In one embodiment, the method further comprises: if it is determined according to the target detection result that the image to be identified contains the first preset target and/or the second preset target, reading the specific position from the thermodynamic diagram corresponding to the specific position in the preset target, and restoring the specific position in the preset target to the original coordinate system of the image to be identified according to the specific position drift, so as to obtain the specific position information corresponding to the preset target.
In this embodiment, after it is determined that the image to be identified contains the first preset target and/or the second preset target, the coordinate information of the specific position corresponding to the preset target needs to be output. The coordinates of the specific position in the thermodynamic diagram can be predicted by the specific position prediction sub-branch, and the position drift introduced when converting into the thermodynamic diagram can be predicted by the specific position drift prediction sub-branch; the coordinates of the specific position in the thermodynamic diagram are then restored to coordinates at the same size as the image to be identified, so as to determine the position in the image to be identified. The text content at that position is then identified, thereby achieving the purpose of point reading.
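A sketch of this restoration step, assuming the 1/4 downsampling (stride 4) used elsewhere in this document:

```python
# Restore a specific position from heatmap coordinates back to original image
# coordinates: add the predicted drift to the integer heatmap cell, then multiply
# by the downsampling stride.
def restore_to_image(heat_x, heat_y, drift_x, drift_y, stride=4):
    return ((heat_x + drift_x) * stride, (heat_y + drift_y) * stride)


# Example continuing the earlier drift illustration: cell (30, 30) with drift (0.0, 0.25).
print(restore_to_image(30, 30, 0.0, 0.25))  # (120.0, 121.0)
```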
In one embodiment, the attribute prediction branches further comprise a size prediction branch of a preset target and a distance prediction branch of a preset target center position and a specific position, and parameters of the feature extraction network and each attribute prediction branch are adjusted based on sample prediction results output by each attribute prediction branch when the feature extraction network and each attribute prediction branch are trained.
In one embodiment, the size prediction branch of the preset target is used for outputting the size of the bounding box of the preset target; more specifically, it outputs the width and height of the bounding box of the preset target. The distance prediction branch for the center position and the specific position of the preset target is used for outputting the distance between the center position and the specific position of the preset target, for example the distance between the fingertip position and the center of the arm's bounding box, or the distance between the pen tip position and the center of the pen's bounding box.
The training of the neural network can be carried out in any manner. When the feature extraction network and each attribute prediction branch are trained, training is based on sample data carrying annotation data. The sample data is input into a preset neural network framework (comprising the feature extraction network and each attribute prediction branch), which outputs the sample prediction results of the five attribute prediction branches: the thermodynamic diagram corresponding to the center position of the preset target in the sample image, the thermodynamic diagram corresponding to the specific position in the preset target, the specific position drift in the preset target, the size of the bounding box of the preset target, and the distance between the specific position and the center position in the preset target. The parameters of the feature extraction network and each attribute prediction branch are then adjusted based on the sample prediction results output by each attribute prediction branch, and training stops when the termination condition is reached, yielding the trained neural network, which comprises the feature extraction network and each attribute prediction branch.
In one embodiment, the sample data for training the neural network includes at least sample images containing only the first preset target, sample images containing only the second preset target, and sample images containing both the first preset target and the second preset target.
In a specific embodiment, the size prediction branch of the preset target and the distance prediction branch for the center position and the specific position of the preset target have similar structures: each comprises a 3×3 convolution layer and a 1×1 convolution layer, where attribute learning is performed through the 3×3 convolution layer and the number of channels is adjusted to the dimension of the target attribute through the 1×1 convolution.
In this embodiment, when the network is trained, the output results of the size prediction branch of the preset target and the distance prediction branch of the center position and the specific position of the preset target are combined to perform training, and these are effective supervision information in the training stage, so that the training of the overall task is facilitated.
In one embodiment, feature extraction is performed on the image to be identified based on a feature extraction network determined in advance through training, and attribute prediction is performed on the image features based on attribute prediction branch networks determined in advance through training. In a specific embodiment, the feature extraction network uses MobileNetV2, and the attribute prediction branches use convolutional networks for the different attribute predictions. In other embodiments, the feature extraction network and the attribute prediction branch networks may employ other networks as well.
The application also provides an application scene, which applies the click-to-read position identification method.
Specifically, the application of the click-to-read position identification method in the application scene is as follows:
In the present embodiment, the feature extraction network and the output attribute prediction branches are collectively referred to as the click-to-read position recognition model. FIG. 4 is a schematic diagram of the network architecture of the click-to-read position recognition model in one embodiment.
1. And acquiring an image to be identified in the click-reading area, and adjusting the size of the image to be identified to 320 x 320 as the input of the model.
2. The click-to-read location recognition model is composed of two parts, a) a Backbone network Feature extraction part < Backbone Feature >, and b) an attribute prediction part < Attribute Predict >.
a) The backbone network feature extraction part adopts MobileNetV2 as the backbone, then upsamples the last three layers in an FPN (Feature Pyramid Network) manner and performs feature fusion to obtain the image features. The size of the image features is 1/4 of the size of the input image to be identified (i.e., 80 × 80), with N feature channels; N can be adjusted according to the device's requirements on model runtime performance to balance accuracy and running speed.
b) The attribute prediction part uses convolutional networks to predict the different attributes, taking the detection of the two preset targets, hand and pen, as the main task while simultaneously regressing the coordinates of the fingertip and the pen point. From the feature map extracted in a), each attribute branch performs attribute learning through a 3×3 convolution and then adjusts the channel number to the dimension of the target attribute through a 1×1 convolution. The attributes are as follows (their channel layout is also summarized in the sketch after this list):
Center (n=2, 2 channels): predicts the thermodynamic diagram (heatmap) of the center point position of the preset target. The center point coordinates of the bounding box of the annotated hand and pen are calculated and converted into Gaussian thermodynamic diagram form centered on those coordinates. FIG. 5 (1) shows a schematic diagram of the center position of a preset target (hand) in one embodiment, and FIG. 5 (2) shows the thermodynamic diagram corresponding to the center position of a preset target in one embodiment.
W/H (n=2, 2 channels): the width and height of the hand or pen bounding box; for each point, the width and height of the preset target if one is present at that point.
Relation (n=4, 4 channels): the coordinate difference between the fingertip or nib coordinates and the center point coordinates, i.e. the coordinates of the fingertip or nib relative to the center of the hand or pen, used to constrain the relative relationship between the target point and the center point.
Keypoints (n=2, 2 channels): the coordinate thermodynamic diagrams of the fingertip and nib (handled the same way as Center). FIG. 6 (1) shows a schematic diagram of a specific position (fingertip) of a preset target in one embodiment, and FIG. 6 (2) shows the thermodynamic diagram corresponding to the fingertip or nib position in one embodiment.
Offset (n=4, 4 channels): after the 1/4 downsampling, the coordinates derived from the original image are floating-point numbers, but only integer coordinates can be represented on the thermodynamic diagram; the float-to-int conversion error is learned by Offset.
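The following dictionary summarizes the channel layout of the five heads listed above (channel counts taken from the text; the key names are illustrative):

```python
# Channel layout of the attribute prediction heads described above.
HEADS = {
    "center":    2,  # center-point heatmaps: channel 0 = hand, channel 1 = pen
    "wh":        2,  # width and height of the hand / pen bounding box
    "relation":  4,  # (dx, dy) from the center point to the fingertip / nib, per target
    "keypoints": 2,  # fingertip / nib heatmaps
    "offset":    4,  # float-to-int drift (dx, dy) of the fingertip / nib, per target
}
print(sum(HEADS.values()))  # 14 output channels in total
```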
When the click-to-read position recognition model is trained, two types of data are collected as training data: a) single fingertip gesture data, which contains no pen, and b) pen-holding data, which includes hands holding a pen and hands not holding a pen. When the data is organized in the training stage, data from both a) and b) are used for target detection at the same time, and empty thermodynamic diagrams are padded into the training target thermodynamic diagram format, so as to strengthen the distinction among the three different gesture semantics: fingertip gestures, ordinary gestures, and pen-holding gestures.
3. After training, when the model is integrated into an SDK (software development kit), the following point-reading coordinate solving procedure is adopted; only the Center, Keypoints, and Offset branches are used at the model inference stage. The steps are as follows (a decoding sketch is given after these steps):
a) Traverse the two channels of Center, where channel 0 represents the hand target and channel 1 represents the pen target. Find the position with the maximum value in each channel; if the value is larger than a set threshold A, a valid hand or pen exists at that position, otherwise there is no preset target in the current frame.
b) Traverse the two channels of Keypoints, where channel 0 represents the fingertip position and channel 1 represents the nib position. Find the position with the maximum value in each channel; if the value is larger than a set threshold B, a valid fingertip or nib exists at that position, otherwise there is no specific position of a preset target in the current frame.
c) At the coordinate position found in b), take the Offset value from the Offset branch and restore the coordinates to the original image coordinates.
d) According to the results of a) and b), return the position coordinates of the nib if only the nib is present, return the position coordinates of the fingertip if only the fingertip is present, and, if both the fingertip and the nib are present, return the position coordinates of the nib, which has the higher priority, according to the priorities of the fingertip and the nib.
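A minimal Python sketch of steps a)–d), assuming NumPy arrays of shape (2, H, W) for the Center and Keypoints heatmaps and (4, H, W) for Offset; the channel layout, threshold values, and stride are assumptions consistent with the description above, not the actual SDK code.

```python
# Decode the point-read coordinate from the Center, Keypoints, and Offset outputs.
import numpy as np


def decode_point(center, keypoints, offset, thr_a=0.8, thr_b=0.8, stride=4):
    result = {}
    for ch, name in enumerate(("fingertip", "nib")):        # 0 = hand/fingertip, 1 = pen/nib
        # a) a valid hand or pen exists only if the Center peak exceeds threshold A
        if center[ch].max() <= thr_a:
            continue
        # b) a valid fingertip or nib exists only if the Keypoints peak exceeds threshold B
        ky, kx = np.unravel_index(keypoints[ch].argmax(), keypoints[ch].shape)
        if keypoints[ch, ky, kx] <= thr_b:
            continue
        # c) take the Offset at that cell and restore to original image coordinates
        dx, dy = offset[2 * ch, ky, kx], offset[2 * ch + 1, ky, kx]
        result[name] = ((kx + dx) * stride, (ky + dy) * stride)
    # d) the nib has the higher priority when both targets are present
    return result.get("nib") or result.get("fingertip")
```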
Further, after the point-read target coordinate position is output in the above embodiment, the corresponding text content is identified according to that coordinate position, so as to achieve the purpose of point reading. FIG. 7 (1) and FIG. 7 (2) show schematic diagrams of the corresponding text content identified and output by the point-reading device according to the point-read target coordinate position.
According to the above click-to-read position identification method, target detection is taken as the main task in a multi-task deep learning model: the hand and the pen are detected simultaneously, and the coordinates of the fingertip and the nib are regressed at the same time, so that fused recognition of the fingertip and the nib is realized within a single model. This effectively resolves the semantic confusion between ordinary gestures and click-to-read gestures. Fingertip recognition and nib recognition are supported simultaneously, and by giving the nib the higher priority, the nib is returned preferentially when both the fingertip and the nib are present. In this way, a seamless interaction flow is achieved in the click-to-read scene: the user can switch between writing and click reading without any interrupting operation, which greatly improves learning efficiency and the immersive experience. Without changing camera-based click-to-read devices, a click-to-read interaction mode integrating the pen point and the fingertip is provided, realizing the product functions of both fingertip click reading and pen-point click reading and markedly improving the user experience.
In one embodiment, the application further provides a click-to-read method. The method comprises: obtaining an image to be identified in the click-to-read area; performing target detection on the image to be identified to obtain a target detection result; if it is determined from the target detection result that the image to be identified contains the first preset target or the second preset target, outputting the specific position information in the first preset target or the second preset target, where the specific position represents the designated target position in the preset target; if it is determined from the target detection result that the image to be identified contains both the first preset target and the second preset target, outputting the specific position information in the preset target with the higher priority according to the preset priorities of the first preset target and the second preset target; recognizing the corresponding text content according to the specific position information of the preset target in the image to be identified; and displaying the text content on a display screen.
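The following is a minimal orchestration sketch of this method under stated assumptions; detect_click_position, recognize_text_at and display are placeholder callables standing in for the target detection, text recognition and display steps, and are not APIs defined by this application.

def click_to_read(frame, detect_click_position, recognize_text_at, display):
    # frame: image to be identified, captured from the click-to-read area.
    position = detect_click_position(frame)     # target detection + priority rule
    if position is None:
        return None                             # no fingertip or nib in this frame
    text = recognize_text_at(frame, position)   # recognize the text content pointed at
    display(text)                               # show the text on the display screen
    return text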
For the specific embodiments of the above-mentioned click-to-read method, reference may be made to the above-mentioned embodiments of the click-to-read position identification method, which are not described herein.
It should be understood that, although the steps in the flowcharts referred to in the above embodiments are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least a part of the steps in the flowcharts referred to in the above embodiments may include multiple sub-steps or stages; these sub-steps or stages are not necessarily performed at the same moment, but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with at least a part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, a click-to-read position identifying apparatus is provided, which may employ a software module or a hardware module, or a combination of both, as a part of a click-to-read device, and specifically includes an image acquisition module 810, an object detection module 820, and a position information output module 830, where:
the image acquisition module 810 is configured to acquire an image to be identified in the click-to-read area;
the target detection module 820 is configured to perform target detection on the image to be identified, so as to obtain a target detection result;
The position information output module 830 is configured to: output the specific position information in the first preset target or the second preset target if it is determined from the target detection result that the image to be identified contains the first preset target or the second preset target, where the specific position represents the designated target position in the preset target; and output the specific position information in the preset target with the higher priority, according to the preset priorities of the first preset target and the second preset target, if it is determined from the target detection result that the image to be identified contains both the first preset target and the second preset target.
With the click-to-read position identification apparatus described above, the image to be identified in the click-to-read area is obtained and target detection is performed on it, where the target detection covers both the first preset target and the second preset target. Based on the target detection result, if only one preset target is present, the specific position information in that preset target is output; if both preset targets are present, the specific position information in the preset target with the higher priority is output, and that position is taken as the click-to-read position. The apparatus determines the position pointed at by the user by performing target detection on the image, so click-to-read can be realized without a matching click-to-read pen, and the click-to-read position can be indicated by either of two targets, for example a fingertip or a nib, which enriches the supported click-to-read modes.
In one embodiment, the object detection module 820 of the apparatus comprises: a feature extraction unit, configured to extract image features of the image to be identified; and an attribute prediction unit, configured to input the image features into two or more attribute prediction branches respectively, to obtain the attribute prediction result output by each attribute prediction branch, where the attribute prediction branches comprise a center position prediction branch of the preset target and a specific position prediction branch in the preset target.
In one embodiment, the feature extraction unit of the apparatus is further configured to extract the image features of the image to be identified through a feature extraction network, where the feature extraction network includes successive downsampling layers and successive upsampling layers, the number of downsampling layers is a first preset number, the number of upsampling layers is a second preset number, and the second preset number is less than or equal to the first preset number.
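A minimal PyTorch sketch of such a feature extraction network is given below: a stack of downsampling convolutions followed by a smaller number of upsampling layers, so the output feature map resolution is a fixed fraction of the input resolution. The channel widths and the default 4-down / 2-up configuration are illustrative assumptions rather than parameters stated in this application.

import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, num_down=4, num_up=2, base_ch=16):
        super().__init__()
        assert num_up <= num_down  # second preset number <= first preset number
        downs, ch = [], 3
        for i in range(num_down):
            out_ch = base_ch * (2 ** i)
            downs += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1),
                      nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
            ch = out_ch
        self.down = nn.Sequential(*downs)
        ups = []
        for _ in range(num_up):
            out_ch = max(ch // 2, base_ch)
            ups += [nn.ConvTranspose2d(ch, out_ch, 4, stride=2, padding=1),
                    nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
            ch = out_ch
        self.up = nn.Sequential(*ups)
        self.out_channels = ch

    def forward(self, x):              # x: (N, 3, H, W)
        return self.up(self.down(x))   # (N, C, H/4, W/4) with 4 down / 2 up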
In one embodiment, the specific position prediction branch comprises a specific position prediction sub-branch and a specific position drift prediction sub-branch, and the attribute prediction unit of the apparatus comprises: a center position heatmap prediction unit, configured to input the image features into the center position prediction branch of the preset target to obtain a heatmap corresponding to the center position of the first preset target and a heatmap corresponding to the center position of the second preset target; a specific position heatmap prediction unit, configured to input the image features into the specific position prediction sub-branch to obtain a heatmap corresponding to the specific position in the first preset target and a heatmap corresponding to the specific position in the second preset target; and a specific position drift prediction unit, configured to input the image features into the specific position drift prediction sub-branch to obtain the specific position drift incurred when the image to be identified is converted into the heatmap.
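A minimal sketch of these prediction branches, again in PyTorch, follows; each branch is implemented as a small convolutional head on the shared image features, with two channels per heatmap covering the two preset targets. The layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class PredictionHeads(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        def head(out_ch):
            return nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(in_ch, out_ch, 1))
        self.center = head(2)     # ch 0: hand center, ch 1: pen center
        self.keypoints = head(2)  # ch 0: fingertip, ch 1: nib (specific positions)
        self.offset = head(2)     # x / y drift of the specific position

    def forward(self, feat):
        return {"center": torch.sigmoid(self.center(feat)),
                "keypoints": torch.sigmoid(self.keypoints(feat)),
                "offset": self.offset(feat)}

In use, the heads would be attached to the output channels of the feature extraction network sketched above, for example PredictionHeads(FeatureExtractor().out_channels).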
In one embodiment, the apparatus further comprises a confidence reading module, configured to traverse the heatmaps corresponding to the center positions output by the center position prediction branch of the preset target, and determine the predicted position confidence of the center position of the first preset target and the predicted position confidence of the center position of the second preset target. In this embodiment, the target detection module 820 is further configured to determine that the image to be identified contains the first preset target if the maximum predicted position confidence of the center position of the first preset target is greater than or equal to a preset threshold, and to determine that the image to be identified contains the second preset target if the maximum predicted position confidence of the center position of the second preset target is greater than or equal to a preset threshold.
In one embodiment, the apparatus further comprises a position restoration module, configured to, if it is determined from the target detection result that the image to be identified contains the first preset target and/or the second preset target, restore the specific position in the corresponding preset target to the original coordinates of the image to be identified according to the specific position drift, so as to obtain the specific position information of the corresponding preset target.
In one embodiment, the attribute prediction branches further comprise a size prediction branch of the preset target and a prediction branch for the distance between the center position and the specific position of the preset target, and the apparatus further comprises a model training module, configured to, when the feature extraction network and the attribute prediction branches are trained, adjust the parameters of the feature extraction network and of each attribute prediction branch based on the sample prediction results output by each attribute prediction branch.
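A minimal sketch of how a training step might combine the per-branch losses, including the size and center-to-specific-position distance branches mentioned here, is given below; the choice of loss functions and the weights are illustrative assumptions, not values given in this application.

import torch.nn.functional as F

def multi_branch_loss(pred, target, weights=(1.0, 1.0, 0.1, 0.1, 0.1)):
    # pred / target: dicts of same-shaped tensors for each attribute prediction branch.
    w_center, w_kp, w_off, w_size, w_dist = weights
    return (w_center * F.binary_cross_entropy(pred["center"], target["center"]) +
            w_kp * F.binary_cross_entropy(pred["keypoints"], target["keypoints"]) +
            w_off * F.l1_loss(pred["offset"], target["offset"]) +
            w_size * F.l1_loss(pred["size"], target["size"]) +
            w_dist * F.l1_loss(pred["distance"], target["distance"]))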
For specific embodiments of the click-to-read position recognition apparatus, reference may be made to the above embodiments of the click-to-read position recognition method, which are not repeated here. Each of the modules in the click-to-read position recognition apparatus may be implemented wholly or partly by software, by hardware, or by a combination of the two. The modules may be embedded in hardware form in, or independent of, a processor of the click-to-read device, or may be stored in software form in a memory of the click-to-read device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a click-to-read device is provided, the internal structure of which may be as shown in FIG. 9. The click-to-read device comprises a processor, a memory, a communication interface, a display screen and an input device connected through a system bus. The processor of the click-to-read device is used to provide computing and control capabilities. The memory of the click-to-read device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the click-to-read device is used for wired or wireless communication with an external terminal, and the wireless communication may be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements a click-to-read position identification method. The display screen of the click-to-read device may be a liquid crystal display screen or an electronic ink display screen. The input device of the click-to-read device may be a touch layer covering the display screen, may be keys, a trackball or a touch pad together with an image acquisition device (camera) arranged on the housing of the click-to-read device, or may be an external keyboard, touch pad, mouse or the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 9 is merely a block diagram of part of the structure associated with the solution of the present application and does not limit the click-to-read device to which the solution of the present application is applied; a particular click-to-read device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, there is also provided a click-to-read device including a memory and a processor, the memory storing a computer program, where the processor, when executing the computer program, implements the steps of the method embodiments described above.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The processor of the click-to-read device reads the computer instructions from the computer-readable storage medium and executes them, so that the click-to-read device performs the steps in the above method embodiments.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (12)

extracting image features of the image to be identified, and inputting the image features into two or more attribute prediction branches respectively to obtain the attribute prediction result output by each attribute prediction branch, wherein the attribute prediction branches comprise a specific position prediction sub-branch and a specific position drift prediction sub-branch, the attribute prediction result of the specific position prediction sub-branch comprises heatmaps corresponding to the specific positions, the attribute prediction result of the specific position drift prediction sub-branch comprises the specific position drift incurred when the image to be identified is converted into the heatmaps, and the specific position drift represents the conversion error of a coordinate position in the image when the image to be identified is converted into the heatmaps;
The target detection module comprises: a feature extraction unit, configured to extract image features of the image to be identified; and an attribute prediction unit, configured to input the image features into two or more attribute prediction branches respectively to obtain the attribute prediction result output by each attribute prediction branch, wherein the attribute prediction branches comprise a specific position prediction sub-branch and a specific position drift prediction sub-branch, the attribute prediction result of the specific position prediction sub-branch comprises heatmaps corresponding to the specific positions, the attribute prediction result of the specific position drift prediction sub-branch comprises the specific position drift incurred when the image to be identified is converted into the heatmaps, and the specific position drift represents the conversion error of a coordinate position in the image when the image to be identified is converted into the heatmaps. The position information output module is configured to: determine, according to the attribute prediction results output by the attribute prediction branches, whether the image to be identified contains the first preset target or the second preset target; restore the specific position in the preset target to the original coordinates of the image to be identified according to the specific position drift to obtain the specific position information; output the specific position information in the first preset target or the second preset target if the image to be identified contains only one of them; and output the specific position information in the preset target with the higher priority if the image to be identified contains both the first preset target and the second preset target.