CN104063705A - Handwriting feature extracting method and device - Google Patents

Handwriting feature extracting method and device

Publication number
CN104063705A
Authority
CN
China
Prior art keywords
handwriting data
line diagnostic
gravity
center
grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410247878.5A
Other languages
Chinese (zh)
Other versions
CN104063705B (en)
Inventor
曹骥
李健
张连毅
武卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing InfoQuick SinoVoice Speech Technology Corp.
Original Assignee
JIETONG HUASHENG SPEECH TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The invention provides a handwriting feature extracting method and device. The method comprises the following steps: handwriting data are collected according to the time sequence in which they are written and are preprocessed to obtain preprocessed handwriting data; the preprocessed handwriting data are segmented at equal intervals according to the time sequence to obtain a plurality of stroke segment vectors; online features of the stroke segment vectors are obtained, the online features comprising the angles and center coordinates of the stroke segment vectors; the center of gravity of the handwriting data is obtained from the preprocessed handwriting data, and off-line features of the preprocessed handwriting data are extracted according to the center of gravity; numerical normalization is performed on the online features and the off-line features, and the normalized result is taken as the feature of the collected handwriting data. The method and device thereby address the low recognition accuracy of online handwriting data features.

Description

A method and device for handwriting feature extraction
Technical field
The present application relates to the technical field of online handwritten Chinese character recognition, and in particular to a method and device for handwriting feature extraction.
Background
Handwriting data from online handwriting input varies with the writer's writing style and with the precision of the handwriting acquisition device, so the same character can show large deformations and differences in shape. This places high demands on feature extraction: the extracted features must effectively express the essence of the handwriting data, reflecting what handwriting data of the same character have in common while distinguishing handwriting data of different characters.
Conventional feature extraction for handwriting data currently relies on a sequential scanning method and a grid statistics method. The sequential scanning method scans the positions or angles of points in the handwriting data in writing order; it considers neither the angle-change information of the trajectory nor the similarity between adjacent angles. The grid statistics method extracts features from cells of equal width and equal height; it does not consider symmetric-projection information, and its extraction is too mechanical and regular and adapts poorly to deformation.
With the above methods, feature extraction is incomplete and the extracted features adapt poorly. These problems seriously degrade the performance of the subsequent classifier and in turn lead to low recognition accuracy for online handwriting data features.
Summary of the invention
The present application provides a method and device for handwriting feature extraction, to solve the problem of low recognition accuracy for online handwriting data features.
To solve the above problem, the present application discloses a method for handwriting feature extraction, comprising:
collecting handwriting data according to the time sequence in which the handwriting data is written and preprocessing it, to obtain preprocessed handwriting data;
segmenting the preprocessed handwriting data at equal intervals according to the time sequence, to obtain a plurality of stroke segment vectors;
obtaining online features of the plurality of stroke segment vectors, the online features comprising the angles and center coordinates of the plurality of stroke segment vectors;
obtaining the center of gravity of the handwriting data from the preprocessed handwriting data, and extracting off-line features of the preprocessed handwriting data according to the center of gravity;
performing numerical normalization on the online features and the off-line features, and taking the result of the numerical normalization as the feature of the collected handwriting data.
Preferably, the step of collecting handwriting data according to the time sequence in which it is written and preprocessing it, to obtain preprocessed handwriting data, comprises:
performing linear size normalization on the collected handwriting data according to the writing time sequence, and then obtaining the length of each natural stroke segment;
obtaining, from the lengths of the natural stroke segments, the length of the handwriting data composed of the natural stroke segments.
Preferably, the angles of the plurality of stroke segment vectors comprise: the angle between each stroke segment vector and the positive X axis, the angle between each stroke segment vector and the positive Y axis, and the angle between each stroke segment vector and its adjacent stroke segment vector.
Preferably, the off-line features comprise a projection off-line feature, a grid off-line feature, a sector off-line feature or a contour off-line feature.
Preferably, when the off-line feature is the projection off-line feature, the step of extracting the off-line features of the preprocessed handwriting data according to the center of gravity comprises:
dividing the preprocessed handwriting data horizontally and vertically with the center of gravity of the handwriting data as the division point, so that the preprocessed handwriting data is divided horizontally into an upper half region and a lower half region and vertically into a left half region and a right half region; and scanning, respectively, the number of occurrences of the center coordinates of the stroke segment vectors in the upper half region, the lower half region, the left half region and the right half region;
when the off-line feature is the grid off-line feature, the step of extracting the off-line features of the preprocessed handwriting data according to the center of gravity comprises:
defining eight directions of the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
dividing the preprocessed handwriting data horizontally and vertically with the center of gravity of the handwriting data as the division point, so that the preprocessed handwriting data is divided horizontally into an upper grid and a lower grid and vertically into a left grid and a right grid; and scanning, respectively, the number of occurrences, in the eight directions, of the center coordinates of the stroke segment vectors in the upper grid, the lower grid, the left grid and the right grid;
when the off-line feature is the sector off-line feature, the step of extracting the sector off-line feature of the preprocessed handwriting data according to the center of gravity comprises:
defining eight directions of the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
dividing the preprocessed handwriting data into a plurality of sector regions with the center of gravity of the handwriting data as the center of the circle, and scanning, respectively, the number of occurrences of the center coordinates of the stroke segment vectors in the eight directions;
when the off-line feature is the contour off-line feature, the step of extracting the contour off-line feature of the preprocessed handwriting data according to the center of gravity comprises:
defining eight directions of the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
taking the center of gravity of the handwriting data as the end point, and scanning, respectively, the number of occurrences of the center coordinates of the stroke segment vectors in the eight directions.
To solve the above problem, the present application also discloses a device for handwriting feature extraction, comprising:
an acquisition module, configured to collect handwriting data according to the time sequence in which the handwriting data is written and preprocess it, to obtain preprocessed handwriting data;
a segmentation module, configured to segment the preprocessed handwriting data at equal intervals according to the time sequence, to obtain a plurality of stroke segment vectors;
a computing module, configured to obtain online features of the plurality of stroke segment vectors, the online features comprising the angles and center coordinates of the plurality of stroke segment vectors;
an extraction module, configured to obtain the center of gravity of the handwriting data from the preprocessed handwriting data and to extract off-line features of the preprocessed handwriting data according to the center of gravity;
a processing module, configured to perform numerical normalization on the online features and the off-line features, and to take the result of the numerical normalization as the feature of the collected handwriting data.
Preferably, the acquisition module comprises: a linear normalization module, configured to perform linear size normalization on the collected handwriting data according to the writing time sequence and then obtain the length of each natural stroke segment;
a length acquisition module, configured to obtain, from the lengths of the natural stroke segments, the length of the handwriting data composed of the natural stroke segments.
Preferably, the angles of the plurality of stroke segment vectors comprise: the angle between each stroke segment vector and the positive X axis, the angle between each stroke segment vector and the positive Y axis, and the angle between each stroke segment vector and its adjacent stroke segment vector.
Preferably, the off-line features comprise a projection off-line feature, a grid off-line feature, a sector off-line feature or a contour off-line feature.
Preferably, when the off-line feature is the projection off-line feature, the extraction module, in extracting the off-line features of the preprocessed handwriting data according to the center of gravity:
divides the preprocessed handwriting data horizontally and vertically with the center of gravity of the handwriting data as the division point, so that the preprocessed handwriting data is divided horizontally into an upper half region and a lower half region and vertically into a left half region and a right half region; and scans, respectively, the number of occurrences of the center coordinates of the stroke segment vectors in the upper half region, the lower half region, the left half region and the right half region;
when the off-line feature is the grid off-line feature, the extraction module, in extracting the off-line features of the preprocessed handwriting data according to the center of gravity:
defines eight directions of the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
divides the preprocessed handwriting data horizontally and vertically with the center of gravity of the handwriting data as the division point, so that the preprocessed handwriting data is divided horizontally into an upper grid and a lower grid and vertically into a left grid and a right grid; and scans, respectively, the number of occurrences, in the eight directions, of the center coordinates of the stroke segment vectors in the upper grid, the lower grid, the left grid and the right grid;
when the off-line feature is the sector off-line feature, the extraction module, in extracting the sector off-line feature of the preprocessed handwriting data according to the center of gravity:
defines eight directions of the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
divides the preprocessed handwriting data into a plurality of sector regions with the center of gravity of the handwriting data as the center of the circle, and scans, respectively, the number of occurrences of the center coordinates of the stroke segment vectors in the eight directions;
when the off-line feature is the contour off-line feature, the extraction module, in extracting the contour off-line feature of the preprocessed handwriting data according to the center of gravity:
defines eight directions of the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
takes the center of gravity of the handwriting data as the end point, and scans, respectively, the number of occurrences of the center coordinates of the stroke segment vectors in the eight directions.
Compared with the prior art, the present application has the following advantages:
First, the present application segments the preprocessed handwriting data at equal intervals according to the time sequence and obtains the online features of a plurality of stroke segment vectors, the online features comprising the angles and center coordinates of the stroke segment vectors. By calculating these angles and center coordinates, feature extraction covers both the local and the global characteristics of the handwriting data, avoiding the incomplete feature extraction of existing methods, which consider only the positions of feature points of the handwriting data.
Second, the present application obtains the center of gravity of the preprocessed handwriting data, performs symmetric projection about the center of gravity, and then extracts the local and global characteristics of the handwriting data in adjacent regions. This avoids the overly mechanical extraction and poor deformation adaptability of methods that divide the data into cells of equal width and equal height.
Third, by combining the extracted online features and off-line features, the present application obtains effective handwriting data features, which ensures the reliability of subsequent classifier training, significantly improves the classification accuracy of the classifier, and ultimately improves the recognition accuracy of online handwriting.
Brief description of the drawings
Fig. 1 is a flowchart of a method for handwriting feature extraction in Embodiment 1 of the present application;
Fig. 2 is a schematic diagram of handwriting data in the present application after acquisition by an acquisition device;
Fig. 3 is a flowchart of a method for handwriting feature extraction in Embodiment 2 of the present application;
Fig. 4 is a schematic diagram of the angle between a stroke segment vector and its adjacent stroke segment vector in the present application;
Fig. 5 is a schematic diagram of the projection off-line feature in the present application, with the center of gravity of the handwriting data as the division point;
Fig. 6 is a schematic diagram of the eight directions of the two-dimensional plane in the present application;
Fig. 7 is a schematic diagram of the sector off-line feature in the present application, with the center of gravity of the handwriting data as the division point;
Fig. 8 is a structural block diagram of a handwriting feature extraction device in Embodiment 3 of the present application.
Detailed description of the embodiments
To make the above objects, features and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a method for handwriting feature extraction in Embodiment 1 of the present application is shown, comprising:
Step 101: collect handwriting data according to the time sequence in which the handwriting data is written and preprocess it, to obtain preprocessed handwriting data.
The time sequence in which the handwriting data is written is obtained by the acquisition device.
Fig. 2 shows handwriting data after acquisition by an acquisition device. After passing through the acquisition device, the handwriting data is collected as a series of data coordinate points; the data include the abscissa and ordinate of each point, as well as an end mark for each stroke and an end mark for the whole character. For example, the collected data coordinate points comprise (X0, Y0), (X1, Y1), (X2, Y2) ... (Xn, Yn). The series of collected data coordinate points contains the basic features of the handwriting data; the handwriting data can be processed according to these features, and handwriting features can then be extracted.
Step 102: segment the preprocessed handwriting data at equal intervals according to the time sequence, to obtain a plurality of stroke segment vectors.
The strokes written by the user, in the time sequence recorded by the acquisition device, are accurately segmented at equal intervals; each piece of handwriting data after segmentation is a stroke segment vector.
Step 103: obtain the online features of the plurality of stroke segment vectors, the online features comprising the angles and center coordinates of the plurality of stroke segment vectors.
The center coordinates of a stroke segment vector can be obtained by the following formula:
$$(X_i', Y_i') = \left(\frac{X_i + X_{i+1}}{2},\ \frac{Y_i + Y_{i+1}}{2}\right)$$
where $(X_i, Y_i)$ are the start coordinates of the stroke segment vector and $(X_{i+1}, Y_{i+1})$ are its end coordinates.
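The midpoint formula above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the stroke segment vectors are given as a list of consecutive end points, so that segment i runs from point i to point i+1; the function name `segment_centers` is hypothetical and not taken from the patent.

```python
def segment_centers(points):
    """Return the midpoint of each segment between consecutive points."""
    return [((x0 + x1) / 2.0, (y0 + y1) / 2.0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


# Three end points define two stroke segment vectors.
points = [(0, 0), (4, 0), (4, 6)]
centers = segment_centers(points)
print(centers)
```

For n end points the sketch yields n-1 center coordinates, one per stroke segment vector.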
Step 104: obtain the center of gravity of the handwriting data from the preprocessed handwriting data, and extract the off-line features of the preprocessed handwriting data according to the center of gravity.
It should be noted that steps 103 and 104 are not limited to the above order in actual execution: step 104 may be executed before step 103, or the two may be executed in parallel.
Step 105: perform numerical normalization on the online features and the off-line features, and take the result of the numerical normalization as the feature of the collected handwriting data.
The range of the result of the numerical normalization can be set appropriately by those skilled in the art according to actual conditions, and is preferably 0-8.
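The patent states only the preferred output range (0-8), not the normalization rule. The following Python sketch assumes a simple linear rescaling in which each group of features is divided by a known maximum (180 for angles in degrees, 95 for coordinates on a 96*96 canvas); the function `scale_features` and these maxima are illustrative assumptions rather than the patented procedure.

```python
def scale_features(values, max_in, max_out=8.0):
    """Linearly map values from [0, max_in] to [0, max_out], clamping outliers."""
    return [min(max(v, 0.0), max_in) * max_out / max_in for v in values]


# Hypothetical online features: angles in degrees and center x-coordinates.
angles = [0.0, 90.0, 180.0]
center_xs = [0.0, 47.5, 95.0]

# Concatenate the rescaled groups into one feature vector.
feature = scale_features(angles, 180.0) + scale_features(center_xs, 95.0)
print(feature)
```

Scaling each group by its own maximum keeps angles, coordinates and counts on a comparable scale before they are concatenated.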
A feature is a distinctive property of an object: the basic mark that distinguishes it from other objects. For online handwriting, handwriting features are the characteristics of the handwriting pattern and shape.
With the result of the numerical normalization taken as the feature of the collected handwriting data, characters can be recognized in the following manner.
First, the extracted features of the handwriting data are compared with the templates of a character library, and the characters whose match rate with the extracted features is high are listed for the user to choose from; once the user selects the correct character, recognition of the handwriting is complete.
The templates of the character library are established as follows: for known characters in a dictionary, trainers enter the characters by handwriting, establishing a correspondence between the characters in the dictionary and handwriting; the characters handwritten by the trainers serve as templates of the known characters. The same character may be handwritten by several trainers, several times, so one character may correspond to multiple handwritten templates. During matching, the handwritten input character can be matched against multiple templates of multiple characters.
It should be noted that the present application gives only one way of recognizing characters from the extracted handwriting features; any prior-art method may be used to recognize characters from the extracted features, and the present application places no limitation on this.
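As a rough illustration of the template comparison described above, the sketch below ranks candidate characters by the Euclidean distance between the extracted feature vector and each stored template, keeping the best template per character. The distance measure, the function name `rank_candidates` and the toy templates are all assumptions; the patent does not specify how the match rate is computed.

```python
import math


def rank_candidates(feature, templates, top_k=3):
    """Rank characters by their closest template (smaller distance = better match)."""
    best = {}
    for char, vec in templates:
        d = math.dist(feature, vec)  # Euclidean distance (Python 3.8+)
        if char not in best or d < best[char]:
            best[char] = d
    return sorted(best, key=best.get)[:top_k]


# Toy character library: one character may have several handwritten templates.
templates = [
    ("一", [4.0, 0.0, 0.0]),
    ("一", [4.0, 1.0, 0.0]),
    ("十", [4.0, 4.0, 2.0]),
    ("二", [4.0, 0.0, 4.0]),
]
candidates = rank_candidates([4.0, 1.0, 0.0], templates, top_k=2)
print(candidates)
```

The returned list plays the role of the candidate characters presented to the user for selection.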
In this embodiment, first, the present application segments the preprocessed handwriting data at equal intervals according to the time sequence and obtains the online features of a plurality of stroke segment vectors, the online features comprising the angles and center coordinates of the stroke segment vectors. By calculating these angles and center coordinates, feature extraction covers both the local and the global characteristics of the handwriting data, avoiding the incomplete feature extraction of existing methods, which consider only the positions of feature points of the handwriting data.
Second, the present application obtains the center of gravity of the preprocessed handwriting data, performs symmetric projection about the center of gravity, and then extracts the local and global characteristics of the handwriting data in adjacent regions. This avoids the overly mechanical extraction and poor deformation adaptability of methods that divide the data into cells of equal width and equal height.
Third, by combining the extracted online features and off-line features, the present application obtains effective handwriting data features, which ensures the reliability of subsequent classifier training, significantly improves the classification accuracy of the classifier, and ultimately improves the recognition accuracy of online handwriting.
Referring to Fig. 3, a method for handwriting feature extraction in Embodiment 2 of the present application is shown.
In this embodiment, a method for handwriting feature extraction comprises:
Step 301: collect handwriting data according to the time sequence in which the handwriting data is written and preprocess it, to obtain preprocessed handwriting data.
In this embodiment, a series of coordinate points of the handwriting data is collected by the acquisition device. The data include the abscissa and ordinate of each coordinate point, as well as the start coordinates and end coordinates of each stroke and the end coordinates of the whole character.
After the handwriting data has been collected, it is preprocessed according to the time sequence in which it was written, to obtain the preprocessed handwriting data.
Step 302: perform linear size normalization on the collected handwriting data according to the writing time sequence, normalizing it to a size of 96*96, and then obtain the length of each natural stroke segment.
Linear size normalization unifies the size of the collected handwriting data by stretching, and changes the position of the collected handwriting data by transformations such as rotation and translation.
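A minimal Python sketch of linear size normalization to 96*96 follows. It assumes that the character is given as a list of strokes, each a list of (x, y) points, and that the bounding box is translated to the origin and stretched independently along each axis; rotation is omitted. The function name `normalize_size` and these choices are illustrative assumptions, not details from the patent.

```python
def normalize_size(strokes, size=96):
    """Stretch strokes so their bounding box spans [0, size - 1] on both axes."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    # Guard against a zero extent (e.g. a perfectly horizontal stroke).
    width = (max(xs) - min_x) or 1
    height = (max(ys) - min_y) or 1
    sx = (size - 1) / width
    sy = (size - 1) / height
    return [[((x - min_x) * sx, (y - min_y) * sy) for x, y in stroke]
            for stroke in strokes]


# A cross-shaped character made of a horizontal and a vertical stroke.
strokes = [[(10, 20), (30, 20)], [(20, 10), (20, 50)]]
normalized = normalize_size(strokes)
print(normalized)
```

Stretching each axis independently changes the aspect ratio; a uniform scale factor would be an equally plausible reading of "stretching".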
Natural stroke segments are the horizontal, vertical, left-falling and right-falling strokes that the user produces while writing.
The length of each natural stroke segment is obtained by the following formula:
$$l_i = \sqrt{(X_i - X_{i+1})^2 + (Y_i - Y_{i+1})^2} \qquad (1)$$
where $X_i$ is the abscissa of the start point of the natural stroke segment, $X_{i+1}$ is the abscissa of its end point, $Y_i$ is the ordinate of its start point, and $Y_{i+1}$ is the ordinate of its end point.
The length of each natural stroke segment is obtained from formula (1); from these lengths, the length of the handwriting data composed of the natural stroke segments is obtained.
The length of the handwriting data can be obtained by the following formula:
$$l = \sum_{i=1}^{n-1} l_i \qquad (2)$$
where $n$ is the number of points and $l_i$ is the length of the $i$-th natural stroke segment.
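Formulas (1) and (2) can be checked with a short Python sketch. It assumes that the n points are stored in writing order, so that segment i connects point i to point i+1; the function names are hypothetical.

```python
import math


def segment_lengths(points):
    """Formula (1): Euclidean length of each segment between consecutive points."""
    return [math.hypot(x0 - x1, y0 - y1)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def handwriting_length(points):
    """Formula (2): total length as the sum of the n - 1 segment lengths."""
    return sum(segment_lengths(points))


points = [(0, 0), (3, 4), (3, 10)]
print(segment_lengths(points), handwriting_length(points))
```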
It should be noted that the target size of the linear size normalization can be set appropriately by those skilled in the art according to actual conditions; normalization to a size of 96*96 is preferred.
Step 303: segment the preprocessed handwriting data at equal intervals according to the time sequence, to obtain a plurality of stroke segment vectors.
The stroke segment vectors can be obtained by the following formula:
$$l_v = \frac{l}{n_d} \qquad (3)$$
where $l_v$ is the length of each stroke segment vector, $l$ is the length of the handwriting data from formula (2), and $n_d$ is the dimension of the feature vector. $n_d$ may take any value, as long as it divides the length of the handwriting data equally.
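One way to realize equal-interval segmentation with formula (3) is to walk along the written trajectory and place a new point every $l_v = l / n_d$ units of arc length. The Python sketch below does this under the simplifying assumption that the handwriting is a single polyline with at least two points (pen lifts between strokes are ignored); the function name `resample_equal_length` is hypothetical.

```python
import math


def resample_equal_length(points, n_d):
    """Return n_d + 1 points spaced l / n_d apart along the polyline."""
    seg = [math.hypot(x1 - x0, y1 - y0)
           for (x0, y0), (x1, y1) in zip(points, points[1:])]
    l_v = sum(seg) / n_d           # formula (3)
    out = [points[0]]
    idx, walked = 0, 0.0           # walked = arc length at the start of segment idx
    for k in range(1, n_d):
        target = k * l_v
        # Advance to the original segment that contains the target arc length.
        while idx < len(seg) - 1 and walked + seg[idx] < target:
            walked += seg[idx]
            idx += 1
        t = (target - walked) / seg[idx] if seg[idx] > 0 else 0.0
        (x0, y0), (x1, y1) = points[idx], points[idx + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    out.append(points[-1])
    return out


# An L-shaped trajectory of length 8 cut into 4 stroke segment vectors of length 2.
resampled = resample_equal_length([(0, 0), (4, 0), (4, 4)], 4)
print(resampled)
```

Each pair of consecutive output points then defines one stroke segment vector.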
Step 304: obtain the online features of the plurality of stroke segment vectors, the online features comprising the angles and center coordinates of the plurality of stroke segment vectors. The angles of the plurality of stroke segment vectors comprise: the angle between each stroke segment vector and the positive X axis, the angle between each stroke segment vector and the positive Y axis, and the angle between each stroke segment vector and its adjacent stroke segment vector.
The angle between each stroke segment vector and the positive X axis, and the angle between each stroke segment vector and the positive Y axis, both range from 0 to 180 degrees.
Fig. 4 shows the angle between a stroke segment vector and its adjacent stroke segment vector. Stroke segment vector 0 and stroke segment vector 1 are adjacent vectors, and the angle between them is calculated from stroke segment vector 0 and stroke segment vector 1. The angle between a stroke segment vector and its adjacent stroke segment vector ranges from 0 to 180 degrees.
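The three angles can be computed as unsigned angles between 2D vectors, which naturally fall in the 0-180 degree range. In the Python sketch below, the helper names are hypothetical, and the value 0.0 used for the last segment (which has no following neighbour) is an assumption made only so that every segment has three values.

```python
import math


def angle_between(u, v):
    """Unsigned angle in degrees, in [0, 180], between 2D vectors u and v."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(abs(cross), dot))


def segment_angles(points):
    """For each segment: (angle to +X, angle to +Y, angle to the next segment)."""
    vecs = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    feats = []
    for i, v in enumerate(vecs):
        to_next = angle_between(v, vecs[i + 1]) if i + 1 < len(vecs) else 0.0
        feats.append((angle_between(v, (1, 0)), angle_between(v, (0, 1)), to_next))
    return feats


# A right-angle turn: the first segment points along +X, the second along +Y.
angles = segment_angles([(0, 0), (1, 0), (1, 1)])
print(angles)
```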
Step 305: obtain the center of gravity of the handwriting data from the preprocessed handwriting data, and extract the off-line features of the preprocessed handwriting data according to the center of gravity.
The center coordinates of each natural stroke segment after linear size normalization are calculated by the following formula:
$$(X_i', Y_i') = \left(\frac{X_i + X_{i+1}}{2},\ \frac{Y_i + Y_{i+1}}{2}\right) \qquad (4)$$
where $X_i$ is the abscissa of the start point of the natural stroke segment, $X_{i+1}$ is the abscissa of its end point, $Y_i$ is the ordinate of its start point, and $Y_{i+1}$ is the ordinate of its end point.
The center of gravity of the handwriting data is obtained from formula (1) and formula (4), using the following center-of-gravity formula:
$$(X_g, Y_g) = \left(\frac{\sum_{i=1}^{n-1} l_i X_i'}{\sum_{i=1}^{n-1} l_i},\ \frac{\sum_{i=1}^{n-1} l_i Y_i'}{\sum_{i=1}^{n-1} l_i}\right)$$
where each parameter in the center-of-gravity formula has the same meaning as in formulas (1) and (4).
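The center-of-gravity formula is a length-weighted average of the segment midpoints. A minimal Python sketch, assuming the points are stored in writing order and using the hypothetical function name `center_of_gravity`:

```python
import math


def center_of_gravity(points):
    """Length-weighted mean of segment midpoints, per formulas (1) and (4)."""
    total = sum_x = sum_y = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)   # l_i, formula (1)
        total += length
        sum_x += length * (x0 + x1) / 2.0       # l_i * X_i'
        sum_y += length * (y0 + y1) / 2.0       # l_i * Y_i'
    if total == 0.0:
        # Degenerate input (all points coincide): fall back to the first point.
        return float(points[0][0]), float(points[0][1])
    return sum_x / total, sum_y / total


# Segment 1: length 4, midpoint (2, 0). Segment 2: length 2, midpoint (4, 1).
gx, gy = center_of_gravity([(0, 0), (4, 0), (4, 2)])
print(gx, gy)
```

Longer segments pull the center of gravity toward their midpoints, which is why the result here lies closer to (2, 0) than to (4, 1).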
Preferably, the off-line features comprise a projection off-line feature, a grid off-line feature, a sector off-line feature or a contour off-line feature.
Preferably, when the off-line feature is the projection off-line feature, the step of extracting the off-line features of the preprocessed handwriting data according to the center of gravity comprises:
dividing the preprocessed handwriting data horizontally and vertically with the center of gravity of the handwriting data as the division point, so that the preprocessed handwriting data is divided horizontally into an upper half region and a lower half region and vertically into a left half region and a right half region.
Fig. 5 shows the projection off-line feature with the center of gravity of the handwriting data as the division point. The center of gravity is shown as a solid dot, and the divided regions are shown as grids. The number of occurrences of the center coordinates of the stroke segment vectors is then scanned in each of the upper half region, the lower half region, the left half region and the right half region. The handwriting data in the upper and lower half regions is scanned from left to right or from right to left, and the handwriting data in the left and right half regions is scanned from top to bottom or from bottom to top.
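The projection counts can be sketched in Python as follows. Two assumptions are made for illustration: the y coordinate grows downward (as on a screen), so "upper" means y smaller than the center of gravity, and a center lying exactly on a dividing line is assigned to the lower or right half. The function name `projection_counts` is hypothetical.

```python
def projection_counts(centers, cog):
    """Count segment centers in the upper/lower and left/right half regions."""
    gx, gy = cog
    counts = {"upper": 0, "lower": 0, "left": 0, "right": 0}
    for x, y in centers:
        counts["upper" if y < gy else "lower"] += 1   # horizontal division
        counts["left" if x < gx else "right"] += 1    # vertical division
    return counts


centers = [(1, 1), (3, 1), (1, 3), (3, 3), (2, 0)]
counts = projection_counts(centers, cog=(2, 2))
print(counts)
```

Each center is counted twice, once for the horizontal division and once for the vertical one, matching the two independent divisions described above.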
When the off-line feature is the grid off-line feature, the step of extracting the off-line features of the preprocessed handwriting data according to the center of gravity comprises:
defining eight directions of the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest; the eight directions are shown in Fig. 6.
The preprocessed handwriting data is divided horizontally and vertically with the center of gravity of the handwriting data as the division point, so that it is divided horizontally into an upper grid and a lower grid and vertically into a left grid and a right grid. The upper grid and the lower grid have the same number of cells, as do the left grid and the right grid; the upper grid height equals the lower grid height, and the left grid width equals the right grid width. The number of occurrences, in the eight directions, of the center coordinates of the stroke segment vectors is then scanned in each of the upper grid, the lower grid, the left grid and the right grid.
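The following Python sketch illustrates one possible reading of the grid off-line feature: each stroke segment vector is quantized into one of eight compass directions, and an 8-bin direction histogram is kept for each of the four grids. It assumes screen coordinates (y grows downward, so north is negative y), treats each grid as a single region without further cell subdivision, and uses hypothetical names throughout.

```python
import math

DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def direction_index(dx, dy):
    """Quantize a vector into one of 8 compass directions (screen coordinates)."""
    angle = math.degrees(math.atan2(-dy, dx)) % 360.0
    return int((angle + 22.5) // 45) % 8


def grid_direction_counts(points, cog):
    """Per-grid histogram of segment directions, located by segment center."""
    gx, gy = cog
    counts = {name: [0] * 8 for name in ("upper", "lower", "left", "right")}
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        d = direction_index(x1 - x0, y1 - y0)
        counts["upper" if cy < gy else "lower"][d] += 1
        counts["left" if cx < gx else "right"][d] += 1
    return counts


# One segment heading east along the top, then one heading south on the right.
hist = grid_direction_counts([(0, 0), (4, 0), (4, 4)], cog=(2, 2))
print(hist)
```

Concatenating the four histograms gives a 32-dimensional feature in this simplified form.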
When the off-line feature is the sector off-line feature, the step of extracting the sector off-line feature of the preprocessed handwriting data according to the center of gravity comprises:
Defining eight directions in the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
Taking the center of gravity of the handwriting data as the circle center, the preprocessed handwriting data is divided into a plurality of sector regions. For example, with the center of gravity as the circle center (drawn as the solid dot at the center of the circle in Fig. 7), the handwriting data is divided into 16 sector regions, as shown in Fig. 7. The 16 sector regions are then scanned separately, counting how many times the center coordinates of each stroke segment vector occur in each of the eight directions.
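The sector feature can be sketched in the same style. This is a hedged illustration only: the names `sector_index` and `sector_direction_counts` are hypothetical, sectors are assumed to be equal-angle wedges numbered counter-clockwise starting from east, and screen coordinates (y grows downward) are assumed.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def direction_index(seg: Segment) -> int:
    """Nearest of eight compass directions: 0=E, 1=NE, 2=N, ..., 6=S, 7=SE."""
    (x0, y0), (x1, y1) = seg
    return round(math.atan2(-(y1 - y0), x1 - x0) / (math.pi / 4)) % 8


def sector_index(p: Point, cog: Point, n_sectors: int = 16) -> int:
    """Index of the equal-angle sector around cog that contains point p."""
    angle = math.atan2(-(p[1] - cog[1]), p[0] - cog[0]) % (2 * math.pi)
    # The final modulo guards against rounding up to n_sectors.
    return int(angle / (2 * math.pi / n_sectors)) % n_sectors


def sector_direction_counts(
    segments: List[Segment], cog: Point, n_sectors: int = 16
) -> List[List[int]]:
    """Per sector, count segment centers for each of the eight directions."""
    counts = [[0] * 8 for _ in range(n_sectors)]
    for seg in segments:
        (x0, y0), (x1, y1) = seg
        center = ((x0 + x1) / 2, (y0 + y1) / 2)
        counts[sector_index(center, cog, n_sectors)][direction_index(seg)] += 1
    return counts


segments = [((1.5, -1), (2.5, -1)), ((2, 0.5), (2, 1.5))]
sectors = sector_direction_counts(segments, cog=(0, 0))
```

With 16 sectors and eight directions this yields a 128-value feature block; the sector count is a parameter because the text leaves it to the implementer.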
When the off-line feature is the contour off-line feature, the step of extracting the contour off-line feature of the preprocessed handwriting data according to the center of gravity comprises:
Defining eight directions in the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
Taking the center of gravity of the handwriting data as the end point, the handwriting data is scanned and the number of times the center coordinates of each stroke segment vector occur in each of the eight directions is counted. The scan may run from east to west or from west to east; the present application does not limit the specific scanning manner.
It should be noted that, when the center of gravity of the handwriting data is taken as the circle center and the data is divided into sector regions, those skilled in the art may choose the number of sector regions according to the actual situation; 16 sector regions is preferred.
Step 306: perform numerical normalization on the online features and the off-line features, and use the result of the numerical normalization as the features of the collected handwriting data.
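The text does not say which normalization is applied in Step 306. As one possible illustration, the sketch below uses min-max scaling to the range [0, 1], applied separately to the online and off-line groups before they are concatenated. Both the choice of min-max scaling and the names `min_max` and `normalize_features` are assumptions made for this example.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values linearly to [0, 1]; a constant group maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_features(online: Sequence[float], offline: Sequence[float]) -> List[float]:
    """Normalize each feature group on its own scale, then concatenate them."""
    return min_max(online) + min_max(offline)


# Angles in degrees (online) and region counts (off-line) live on different scales.
features = normalize_features([0.0, 90.0, 45.0], [2, 4])
```

Scaling the two groups separately keeps large angle values from swamping small counts; a real system might instead use per-dimension statistics gathered from training data.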
Based on the description of the above method embodiments, the present application also provides a corresponding embodiment of a handwriting feature extraction device, which implements the content described in the method embodiments.
Referring to Fig. 8, a structural block diagram of a handwriting feature extraction device according to Embodiment 4 of the present application is shown. The device may specifically comprise: an acquisition module, configured to collect handwriting data according to the writing time sequence of the handwriting data and preprocess it, obtaining preprocessed handwriting data.
A segmentation module, configured to segment the preprocessed handwriting data at equal intervals according to the time sequence, obtaining a plurality of stroke segment vectors.
A computing module, configured to obtain the online features of the plurality of stroke segment vectors, the online features comprising the angles and center coordinates of the plurality of stroke segment vectors.
An extraction module, configured to obtain the center of gravity of the handwriting data from the preprocessed handwriting data, and to extract the off-line features of the preprocessed handwriting data according to the center of gravity.
A processing module, configured to perform numerical normalization on the online features and the off-line features, and to use the result of the numerical normalization as the features of the collected handwriting data.
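The segmentation and computing modules listed above could be sketched as below. This is a minimal illustration under stated assumptions: "equal intervals" is interpreted as equal arc length along the time-ordered pen trajectory, the input has at least two points, and the function names `equal_interval_segments` and `segment_centers` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def equal_interval_segments(points: List[Point], n_segments: int) -> List[Segment]:
    """Cut a time-ordered polyline into n_segments pieces of equal arc length."""
    # Cumulative arc length at every input point.
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]

    cuts: List[Point] = []
    j = 0  # index of the input edge that contains the current cut
    for i in range(n_segments + 1):
        target = total * i / n_segments
        while j < len(points) - 2 and dists[j + 1] < target:
            j += 1
        edge = dists[j + 1] - dists[j]
        t = 0.0 if edge == 0 else (target - dists[j]) / edge
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        cuts.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return list(zip(cuts, cuts[1:]))


def segment_centers(segments: List[Segment]) -> List[Point]:
    """Center coordinate (midpoint) of each stroke segment vector."""
    return [((x0 + x1) / 2, (y0 + y1) / 2) for (x0, y0), (x1, y1) in segments]


stroke = [(0, 0), (4, 0), (4, 4)]  # an L-shaped stroke of length 8
segments = equal_interval_segments(stroke, n_segments=4)
centers = segment_centers(segments)
```

Because the cut points are interpolated along the trajectory, each chord has the same length on straight runs; on curved input the chords are equal in path length rather than in straight-line length.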
Preferably, the acquisition module comprises: a linear normalization module, configured to perform linear size normalization on the collected handwriting data according to the writing time sequence, and then obtain the length of each natural stroke segment.
A length acquisition module, configured to obtain, from the lengths of the natural stroke segments, the length of the handwriting data composed of those natural stroke segments.
Preferably, the angles of the plurality of stroke segment vectors comprise: the angle between each stroke segment vector and the positive X axis, the angle between each stroke segment vector and the positive Y axis, and the angle between each stroke segment vector and its adjacent stroke segment vector.
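The three angles named above can be computed from the dot product. The sketch below is illustrative only: the names `angle_between` and `segment_angles` are hypothetical, angles are reported in degrees in the range [0, 180], "adjacent" is taken to mean the next segment in time order, and the last segment (which has no successor) is assigned 0.0 by assumption.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, float]


def angle_between(u: Vector, v: Vector) -> float:
    """Unsigned angle between two vectors in degrees; 0.0 if either has zero length."""
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        return 0.0
    cos = (u[0] * v[0] + u[1] * v[1]) / norm
    # Clamp to [-1, 1] to absorb floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def segment_angles(vectors: List[Vector]) -> List[Tuple[float, float, float]]:
    """For each vector: (angle to +X, angle to +Y, angle to the next vector)."""
    feats = []
    for i, v in enumerate(vectors):
        to_x = angle_between(v, (1.0, 0.0))
        to_y = angle_between(v, (0.0, 1.0))
        # Assumption: the final segment has no successor, so use 0.0.
        to_next = angle_between(v, vectors[i + 1]) if i + 1 < len(vectors) else 0.0
        feats.append((to_x, to_y, to_next))
    return feats


angles = segment_angles([(1.0, 0.0), (0.0, 1.0)])
```

Keeping both axis angles resolves the ambiguity of an unsigned angle: a vector at 45 degrees above the X axis and one at 45 degrees below it share the same X angle but differ in their Y angle.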
Preferably, the off-line feature comprises a projection off-line feature, a grid off-line feature, a sector off-line feature or a contour off-line feature.
Preferably, when the off-line feature is the projection off-line feature, the extraction module, in extracting the off-line feature of the preprocessed handwriting data according to the center of gravity, is configured to:
Take the center of gravity of the handwriting data as the split point and split the preprocessed handwriting data horizontally and vertically, so that the horizontal split divides it into an upper sub-region and a lower sub-region and the vertical split divides it into a left sub-region and a right sub-region, and then scan the sub-regions separately, counting how many times the center coordinates of each stroke segment vector occur in the upper, lower, left and right sub-regions.
When the off-line feature is the grid off-line feature, the extraction module, in extracting the off-line feature of the preprocessed handwriting data according to the center of gravity, is configured to:
Define eight directions in the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
Take the center of gravity of the handwriting data as the split point and split the preprocessed handwriting data horizontally and vertically, so that the horizontal split divides it into an upper grid and a lower grid and the vertical split divides it into a left grid and a right grid, and then scan the upper, lower, left and right grids separately, counting how many times the center coordinates of each stroke segment vector occur in each of the eight directions.
When the off-line feature is the sector off-line feature, the extraction module, in extracting the sector off-line feature of the preprocessed handwriting data according to the center of gravity, is configured to:
Define eight directions in the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
Take the center of gravity of the handwriting data as the circle center, divide the preprocessed handwriting data into a plurality of sector regions, and scan the sector regions separately, counting how many times the center coordinates of each stroke segment vector occur in each of the eight directions.
When the off-line feature is the contour off-line feature, the extraction module, in extracting the contour off-line feature of the preprocessed handwriting data according to the center of gravity, is configured to:
Define eight directions in the two-dimensional plane: east, west, south, north, southeast, northeast, southwest and northwest;
Take the center of gravity of the handwriting data as the end point, and scan the handwriting data, counting how many times the center coordinates of each stroke segment vector occur in each of the eight directions.
In summary, the handwriting feature extraction device of the embodiments of the present application mainly has the following advantages:
First, the present application segments the preprocessed handwriting data at equal intervals according to the time sequence and obtains the online features of a plurality of stroke segment vectors, the online features comprising the angles and center coordinates of those vectors. Computing the angles and center coordinates of the stroke segment vectors lets the extracted features cover both the local and the global characteristics of the handwriting data. This avoids the incomplete feature extraction of existing methods, which consider only the positions of the feature points of the handwriting data.
Second, the present application obtains the center of gravity of the preprocessed handwriting data, performs a symmetric projection about that center of gravity, and then extracts the local and global characteristics of the handwriting data in adjacent regions. This avoids the problems of equal-width and equal-height partitioning, which is too rigid and adapts poorly to deformation when extracting handwriting features.
Third, by combining the extracted online features with the extracted off-line features, the present application obtains effective handwriting features. This ensures the reliability of subsequent classifier training, significantly improves the classification accuracy of the classifier, and ultimately improves the recognition accuracy of handwriting input.
Because the device embodiment is substantially similar to the method embodiments, it is described only briefly; for the relevant details, refer to the corresponding parts of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
The method and device for handwriting feature extraction provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

CN201410247878.5A | 2014-06-05 | 2014-06-05 | Method and apparatus for handwriting feature extraction | Active | CN104063705B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201410247878.5A (CN104063705B, en) | 2014-06-05 | 2014-06-05 | Method and apparatus for handwriting feature extraction

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201410247878.5A (CN104063705B, en) | 2014-06-05 | 2014-06-05 | Method and apparatus for handwriting feature extraction

Publications (2)

Publication Number | Publication Date
CN104063705A (en) | 2014-09-24
CN104063705B (en) | 2017-08-11

Family

ID=51551409

Family Applications (1)

Application Number | Status | Publication | Title | Priority Date | Filing Date
CN201410247878.5A | Active | CN104063705B (en) | Method and apparatus for handwriting feature extraction | 2014-06-05 | 2014-06-05

Country Status (1)

Country | Link
CN (1) | CN104063705B (en)

Also Published As

Publication number | Publication date
CN104063705B (en) | 2017-08-11

Legal Events

Code | Title | Description
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |
CP03 | Change of name, title or address | see below

Address after: 100193 Haidian District, Beijing, Northeast China, Beijing Zhongguancun Software Park incubator 2 floor 1.
Patentee after: Beijing InfoQuick SinoVoice Speech Technology Corp.
Address before: 100193 two, 206-1, Zhongguancun Software Park, 8 Northeast Northeast Road, Haidian District, Beijing, 206-1
Patentee before: Jietong Huasheng Speech Technology Co., Ltd.

CP02 | Change in the address of a patent holder | see below

Address after: Building 2102, building 1, Haidian District, Beijing
Patentee after: BEIJING SINOVOICE TECHNOLOGY Co.,Ltd.
Address before: 100193 Haidian District, Beijing, Northeast China, Beijing Zhongguancun Software Park incubator 2 floor 1.
Patentee before: BEIJING SINOVOICE TECHNOLOGY Co.,Ltd.
