CN1367446A - Chinese personal biographical notes information treatment system and method - Google Patents

Chinese personal biographical notes information treatment system and method

Info

Publication number
CN1367446A
Authority
CN
China
Prior art keywords
text
resume
text block
mark
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 01105285
Other languages
Chinese (zh)
Other versions
CN1167026C (en)
Inventor
吕楠
郑飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qiancheng Wuyou Network Information Technology (Beijing) Co., Ltd.
Original Assignee
Shanghai Branch Co Qiancheng Wuyou Network Information Technology (beijing) Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Branch Co Qiancheng Wuyou Network Information Technology (beijing) Co
Priority to CNB011052856A
Publication of CN1367446A
Application granted
Publication of CN1167026C
Anticipated expiration
Expired - Lifetime

Abstract

The present invention relates to a method and a system for processing Chinese personal resume (biographical notes) information. The method includes the following steps: preprocessing the input Chinese resume text to form an annotated first resume text; performing word segmentation on the first resume text to form an annotated second resume text; identifying and annotating the proper-noun phrases commonly used in resumes in the second resume text to form an annotated third resume text; and performing text structure analysis on the annotated third resume text to form annotated text blocks of specific types. The method and system can process personal resume text and extract its main information into a unified format.

Description

Chinese personal biographical notes information treatment system and method
The present invention relates to natural language understanding and computational linguistics in Chinese information processing, and in particular to a Chinese resume information processing system and method.
Resume information is an information resource in personnel recruitment and the basic data that enterprises and institutions use to seek talent. With the development of Internet technology in particular, more and more enterprises and institutions look for and recruit talent over the Internet. Online recruiting not only frees personnel departments from heavy and complicated manual work; the abundant information resources of the network also give enterprises and institutions a wide choice of candidates. At the same time, the candidates who provide resume information gain more chances of being hired. On the other hand, precisely because submitting a resume online is so convenient, a job seeker can apply for many positions in a short time, so many enterprises that recruit on the Internet receive hundreds of electronic resume texts every day. The personnel department of a recruiting enterprise must therefore devote a large amount of manpower to handling the electronic resumes it receives, which adds a new burden. Moreover, because resumes differ in design style and each person writes differently, the concrete form of resume information varies from person to person, which makes database construction and talent information retrieval inconvenient. The traditional approach can only rely on manual classification to handle these differently formatted electronic resumes, with job seekers' information entered into a database by hand. To relieve this new burden, a method is needed that automatically processes resume text in any format and automatically extracts from it the key information that enterprises care about most.
The object of the present invention is to provide a Chinese resume information processing system and method that automatically extracts relevant information from Chinese resume text written in any format and converts it into a standard format.
According to one aspect of the present invention, a Chinese resume information processing method is provided, comprising the following steps:
preprocessing the input Chinese resume text to form an annotated first resume text;
performing word segmentation on the first resume text to form an annotated second resume text;
identifying and annotating the proper-noun phrases commonly used in resumes in the second resume text to form an annotated third resume text;
performing text structure analysis on the annotated third resume text to form annotated text blocks of specific types.
According to another aspect of the present invention, a Chinese resume information processing system is provided, comprising:
a resume text information identification and annotation device for annotating the characters, words, phrases, and proper nouns in the input resume text; and
a resume text structure analysis and annotation device for dividing the identified and annotated resume text into blocks and for annotating, splitting, and merging the resulting text blocks.
With the Chinese resume information processing system and method of the present invention, resume text written in any style can be processed and its main information extracted into a unified format, which facilitates building a talent database and retrieving talent information.
The present invention is described in further detail below with reference to the accompanying drawings and preferred embodiments. Other objects, features, and effects of the invention will become clearer from the following description.
Fig. 1 is a block diagram of the Chinese resume information processing system according to the present invention;
Fig. 2 is an operational flowchart of the Chinese resume information processing system according to the present invention;
Fig. 3 is a more detailed flowchart of the preprocessing procedure in the operational flowchart of Fig. 2;
Fig. 4 is a more detailed flowchart of the resume text structure analysis procedure in the operational flowchart of Fig. 2.
Referring to Fig. 1, the Chinese resume information processing system of the present invention comprises: a resume text information identification and annotation device 1 for annotating the characters, words, phrases, and proper nouns in the input resume text; a resume text structure analysis and annotation device 2 for dividing the identified and annotated resume text into blocks and for annotating, splitting, and merging the resulting text blocks; and an information collection and summarization device 3 that summarizes the various items of information in a specific order and outputs them as the information extraction result.
The resume text information identification and annotation device 1 comprises: a preprocessing device 11 for identifying and annotating specific characters in the text; a word segmentation device 12 for performing word segmentation on the text; and a proper noun identification and annotation device 13 for identifying and annotating the proper-noun phrases commonly used in resumes in the text.
The resume text structure analysis and annotation device 2 comprises: a resume text blocking device 21 for dividing the text into initial blocks by natural paragraph; a text block annotation device 22 for matching and annotating the initial text blocks; a text block splitting device 23 for splitting annotated text blocks into text blocks of a single type; and a text block merging device 24 for merging the split text blocks of the same type into larger text blocks of a single type.
Next, refer to Figs. 2 to 4, which show the operational flow of the Chinese resume information processing system according to the present invention. In step S1, the system receives the Chinese resume text as input. In step S2, the system preprocesses the input resume text. This comprises step S21, in which the system identifies and annotates the numerals, foreign-language words, punctuation marks, and the like in the original resume text, and step S22, in which the system further identifies and annotates dates and times, URLs (web addresses), e-mail addresses, and the like in the text. At this point the system has formed the annotated first resume text.
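The preprocessing of steps S21 and S22 can be sketched with regular expressions. This is only an illustrative sketch: the tag names, the patterns, and the function name `preprocess` are assumptions, since the patent does not say how the annotation is implemented.

```python
import re

# Hypothetical tags and patterns (not taken from the patent).
# Specific patterns (S22: e-mail, URL, date) are listed before the
# generic ones (S21: number, foreign word, punctuation) so they win.
TOKEN_PATTERNS = [
    ("EMAIL", r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    ("URL", r"(?:https?://|www\.)[A-Za-z0-9./?=&_%#-]+"),
    ("DATE", r"\d{4}年\d{1,2}月(?:\d{1,2}日)?|\d{4}[./-]\d{1,2}(?:[./-]\d{1,2})?"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("FOREIGN", r"[A-Za-z]+"),
    ("PUNCT", r"[，。、；：！？,.;:!?()（）]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def preprocess(text):
    """Return (tag, substring) pairs; unmatched runs are tagged TEXT."""
    spans, pos = [], 0
    for match in MASTER.finditer(text):
        if match.start() > pos:
            spans.append(("TEXT", text[pos:match.start()]))
        spans.append((match.lastgroup, match.group()))
        pos = match.end()
    if pos < len(text):
        spans.append(("TEXT", text[pos:]))
    return spans


first_resume_text = preprocess("1998年9月毕业，邮箱zhang@example.com")
```

The untagged `TEXT` runs are what a later word segmentation step would operate on, while the tagged spans are already fixed.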
In step S3, the system uses a general dictionary and a resume dictionary to perform word segmentation on the first resume text. The resume dictionary is a special dictionary built specifically for Chinese resume text; it contains a large number of coarse-grained compound terms extracted from real resume texts. After the word segmentation step, the system has formed the annotated second resume text, in which recognizable Chinese words, common phrases, and resume-specific nouns and phrases now appear, for example "Beijing", "Tsinghua University", "undergraduate", "graduation", "carefree working net", "development department", "engineer", "technical director", "education background", "work experience", "hobby", and so on.
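The effect of adding a resume dictionary to a general dictionary can be illustrated with a small segmenter. The patent does not name a segmentation algorithm; forward maximum matching is used here purely as an assumption, and the dictionary contents and the name `segment` are hypothetical.

```python
# Toy dictionaries (hypothetical entries, for illustration only).
GENERAL_DICT = {"北京", "大学", "毕业", "本科"}
RESUME_DICT = {"清华大学", "教育背景", "工作经历", "技术总监"}


def segment(text, lexicon, max_len=8):
    """Forward maximum matching: at each position take the longest
    dictionary entry, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


with_resume_dict = segment("清华大学本科毕业", GENERAL_DICT | RESUME_DICT)
general_only = segment("清华大学本科毕业", GENERAL_DICT)
```

With both dictionaries the institution name stays together as one coarse-grained word; with the general dictionary alone it is broken into smaller pieces, which is the situation the resume dictionary is meant to avoid.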
In step S4, the system uses a proper-noun phrase recognition knowledge base (hereinafter the first knowledge base) and a first rule interpreter to identify and annotate the proper-noun phrases commonly used in resumes (for example personal names, educational institution names, major names, employer names, department names, job titles, project names, and roles held) in the second resume text. The first knowledge base is built around the characteristics of these proper-noun phrases and contains many rules describing their structural features. For example, under such a rule, a phrase with the structure "place noun (such as Beijing, Shanghai, Jiangsu Province) + one or more other nouns (such as aviation, traffic) + educational institution suffix (such as university, institute)" is identified and annotated as an "educational institution name". The first rule interpreter interprets and analyzes the phrase structure rules in the first knowledge base and thereby identifies these proper-noun phrases. After the proper noun identification and annotation step, the system has formed the annotated third resume text.
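The structural rule quoted above (place noun, then one or more other nouns, then an institution suffix) can be sketched as follows. The word lists, the tag `EDU_INSTITUTION`, and the function name are assumptions made for illustration; the patent's knowledge base and rule interpreter are not specified in this detail.

```python
# Hypothetical word classes standing in for the first knowledge base.
PLACES = {"北京", "上海", "江苏省"}
OTHER_NOUNS = {"航空", "航天", "交通"}
INSTITUTION_SUFFIXES = {"大学", "学院"}


def tag_institutions(words):
    """Merge 'place + noun(s) + suffix' runs into one tagged phrase.

    Returns (text, tag) pairs; tag is None for words left unchanged.
    """
    out, i = [], 0
    while i < len(words):
        if words[i] in PLACES:
            j = i + 1
            while j < len(words) and words[j] in OTHER_NOUNS:
                j += 1
            # Require at least one noun between the place and the suffix.
            if j > i + 1 and j < len(words) and words[j] in INSTITUTION_SUFFIXES:
                out.append(("".join(words[i:j + 1]), "EDU_INSTITUTION"))
                i = j + 1
                continue
        out.append((words[i], None))
        i += 1
    return out


tagged = tag_institutions(["毕业", "于", "北京", "航空", "航天", "大学"])
```

A fuller rule interpreter would presumably read such rules from data instead of hard-coding one pattern; this sketch only shows how a single structural rule turns several segmented words into one annotated phrase.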
In step S5, the system performs text structure analysis on the annotated third resume text. This comprises the following steps. In step S51, the third resume text is divided into initial blocks by natural paragraph. In step S52, the system uses a text pattern knowledge base (hereinafter the second knowledge base) and a second rule interpreter to match and annotate the initial text blocks. A text block that has been matched and annotated may be a block containing only a single type of information, or a mixed text block containing several types of information. The second knowledge base contains many pattern rules constructed from the features of the different types of text block found in resume text, and the second rule interpreter interprets and analyzes these pattern rules. For example, under such a rule, a text block matching "a start-and-end time range exists AND an educational institution name exists AND a major name exists AND a degree name exists" is annotated as an "education background block". In step S53, the system uses a first database and specific decision criteria to determine the stem type of a mixed text block. The stem is the run of consecutive sentences at the beginning of the text block that contain only information of one type; the information (if any) that immediately follows the stem is of a different type from that of the stem. The first database, also called the "information frequency weight database", contains statistics, gathered from a large number of real resume texts, on how often different kinds of information occur in different types of text block. In step S54, the system uses a resume text blocking clue dictionary and a probability database to split the mixed text block into finer text blocks, each of a single type. The blocking clue dictionary and probability database contain blocking clue words extracted by training on a large number of real resume texts, together with statistics on the probability that each of these words acts as a block boundary marker in resume text. In step S55, the system merges the split text blocks of the same type into larger text blocks of a single type, for example a basic information block, an education background block, a work experience block, a project experience block, a job requirements block, and an other information block.
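Steps S52, S54, and S55 can be sketched in simplified form. Everything named here is an assumption: the rule table, the tag names, the clue words, and especially the probability values are invented for illustration and are not the patent's data. Step S53 (stem type determination) is omitted because it depends on frequency statistics the patent does not give.

```python
# S52: hypothetical pattern rules; a block gets a type when every
# required annotation tag is present (the "AND" rule in the text).
PATTERN_RULES = [
    ("EDUCATION", {"DATE_RANGE", "EDU_INSTITUTION", "MAJOR", "DEGREE"}),
    ("WORK", {"DATE_RANGE", "EMPLOYER", "JOB_TITLE"}),
    ("BASIC", {"NAME", "PHONE"}),
]

# S54: made-up probabilities that a clue word marks a block boundary.
CLUE_PROBABILITY = {"教育背景": 0.95, "工作经历": 0.95, "项目": 0.30}


def classify_block(tags):
    """Return the first block type whose required tags all occur."""
    present = set(tags)
    for block_type, required in PATTERN_RULES:
        if required <= present:
            return block_type
    return "OTHER"


def split_at_clues(sentences, threshold=0.5):
    """Start a new block at each sentence that opens with a clue word
    whose boundary probability reaches the threshold."""
    blocks, current = [], []
    for sentence in sentences:
        is_boundary = any(
            sentence.startswith(word) and prob >= threshold
            for word, prob in CLUE_PROBABILITY.items()
        )
        if is_boundary and current:
            blocks.append(current)
            current = []
        current.append(sentence)
    if current:
        blocks.append(current)
    return blocks


def merge_blocks(typed_blocks):
    """S55: group (block_type, text) pairs into one list per type."""
    merged = {}
    for block_type, text in typed_blocks:
        merged.setdefault(block_type, []).append(text)
    return merged
```

In this sketch a clue word below the threshold (such as the invented entry for "项目") does not open a new block, which mirrors the idea that only words likely to mark a boundary should trigger a split.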
In step S6, the system collects the corresponding information from each type of text block; the information to be collected has already been identified and annotated step by step in the preceding steps. For example, from the personal basic information block it collects name, sex, date of birth, marital status, postal code, telephone number, e-mail address, city of residence, mailing or residential address, and ID card number; from the education background block it collects the start and end dates of education, educational institution name, department or major name, educational level or degree name, highest educational level, and foreign language and proficiency level; from the work experience block it collects the start and end dates of employment, employer name, department name, job title held, and years of work; from the project experience block it collects the project start and end dates, project name, development tool names, hardware environment, software environment, and role or responsibilities; from the job requirements block it collects the desired industry, job function, work location, monthly salary requirement, and expected type of employer; and from the other information block it collects information not covered by the blocks above, such as professional skills, training experience, certificates obtained, awards, personal interests, and hobbies.
In step S7, the system summarizes the various items of information in a specific order and outputs them as the information extraction result.
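Steps S6 and S7 amount to reading already-annotated fields out of each block type and emitting them in a fixed order. The sketch below assumes a hypothetical field order and tag names, and for brevity keeps only the first value found for each field, whereas a real resume can list several schools or jobs.

```python
# Hypothetical output order: block types and, within each, field tags.
FIELD_ORDER = {
    "BASIC": ["NAME", "SEX", "BIRTH_DATE", "PHONE", "EMAIL"],
    "EDUCATION": ["DATE_RANGE", "EDU_INSTITUTION", "MAJOR", "DEGREE"],
    "WORK": ["DATE_RANGE", "EMPLOYER", "DEPARTMENT", "JOB_TITLE"],
}


def summarize(blocks):
    """blocks maps a block type to its list of (tag, text) annotations.

    Returns a record whose keys follow FIELD_ORDER; missing fields
    are reported as None.
    """
    record = {}
    for block_type, fields in FIELD_ORDER.items():
        found = {}
        for tag, text in blocks.get(block_type, []):
            found.setdefault(tag, text)  # keep the first occurrence only
        record[block_type] = {field: found.get(field) for field in fields}
    return record


example = summarize({
    "BASIC": [("NAME", "张三"), ("EMAIL", "zhang@example.com")],
    "EDUCATION": [("EDU_INSTITUTION", "清华大学"), ("DATE_RANGE", "1994-1998")],
})
```

Because the output always has the same keys in the same order, records extracted from differently written resumes can be loaded into one database schema, which is the unified format the description aims for.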
The above is only a preferred embodiment of the Chinese resume information processing system and method of the present invention. Those skilled in the art may make various modifications and variations in accordance with the concept of the invention, and all such modifications and variations fall within the scope of the invention.

Claims (10)

CNB011052856A | 2001-01-22 | 2001-01-22 | Chinese personal biographical notes information treatment system and method | Expired - Lifetime | CN1167026C (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CNB011052856A, CN1167026C (en) | 2001-01-22 | 2001-01-22 | Chinese personal biographical notes information treatment system and method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CNB011052856A, CN1167026C (en) | 2001-01-22 | 2001-01-22 | Chinese personal biographical notes information treatment system and method

Publications (2)

Publication Number | Publication Date
CN1367446A (en) | 2002-09-04
CN1167026C (en) | 2004-09-15

Family

ID=4654368

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CNB011052856A (Expired - Lifetime, CN1167026C (en)) | Chinese personal biographical notes information treatment system and method | 2001-01-22 | 2001-01-22

Country Status (1)

Country | Link
CN (1) | CN1167026C (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103514200A (en)* | 2012-06-27 | 2014-01-15 | 镇江睿泰信息科技有限公司 | Resume combined making and issuing system and method
CN104318340B (en)* | 2014-09-25 | 2017-07-07 | 中国科学院软件研究所 | Information visualization methods and intelligent visible analysis system based on text resume information
CN104318340A (en)* | 2014-09-25 | 2015-01-28 | 中国科学院软件研究所 | Information visualization method and intelligent visual analysis system based on text curriculum vitae information
CN106815207B (en)* | 2015-12-01 | 2020-08-11 | 北京国双科技有限公司 | Information processing method and device for legal referee document
CN106815208A (en)* | 2015-12-01 | 2017-06-09 | 北京国双科技有限公司 | The analysis method and device of law judgement document
CN106815206A (en)* | 2015-12-01 | 2017-06-09 | 北京国双科技有限公司 | The analysis method and device of law judgement document
CN106815207A (en)* | 2015-12-01 | 2017-06-09 | 北京国双科技有限公司 | For the information processing method and device of law judgement document
CN107145584A (en)* | 2017-05-10 | 2017-09-08 | 西南科技大学 | A kind of resume analytic method based on n gram models
CN107145584B (en)* | 2017-05-10 | 2020-06-19 | 西南科技大学 | A resume parsing method based on n-gram model
CN108845980A (en)* | 2018-05-30 | 2018-11-20 | 深圳市元征科技股份有限公司 | A kind of resume generation method, system, device and computer readable storage medium
CN109271479A (en)* | 2018-09-29 | 2019-01-25 | 广东润弘科技有限公司 | A kind of resume structuring processing method
CN109471924A (en)* | 2018-10-18 | 2019-03-15 | 国云科技股份有限公司 | A recognition and matching analysis method for the resumes of talents with the same name with the same pronunciation
CN109740147A (en)* | 2018-12-14 | 2019-05-10 | 国云科技股份有限公司 | A kind of big quantity personnel resume duplicate removal Match Analysis
TWI736831B (en)* | 2019-01-28 | 2021-08-21 | 洽吧智能股份有限公司 | Textual relationship analysis method and system
CN111737969A (en)* | 2020-07-27 | 2020-10-02 | 北森云计算有限公司 | Resume parsing method and system based on deep learning
CN112052646A (en)* | 2020-08-27 | 2020-12-08 | 安徽聚戎科技信息咨询有限公司 | Text data labeling method
CN112052646B (en)* | 2020-08-27 | 2024-03-29 | 安徽聚戎科技信息咨询有限公司 | Text data labeling method
CN112149389A (en)* | 2020-09-27 | 2020-12-29 | 南方电网数字电网研究院有限公司 | Resume information structured processing method and device, computer equipment and storage medium
CN112651236A (en)* | 2020-12-28 | 2021-04-13 | 中电金信软件有限公司 | Method and device for extracting text information, computer equipment and storage medium

Also Published As

Publication numberPublication date
CN1167026C (en)2004-09-15

Similar Documents

Publication | Publication Date | Title
CN110825882B (en) | | An information system management method based on knowledge graph
CN112307153B (en) | | Automatic construction method and device of industrial knowledge base and storage medium
CN1167026C (en) | | Chinese personal biographical notes information treatment system and method
CN110765257A (en) | | Intelligent consulting system of law of knowledge map driving type
CN107766371A (en) | | A kind of text message sorting technique and its device
CN106408249A (en) | | Resume and position matching method and device
WO2000043915A1 (en) | | Generating personalized user profiles for utilizing the generated user profiles to perform adaptive internet searches
CN112163424A (en) | | Data labeling method, device, equipment and medium
CN110516057B (en) | | Petition question answering method and device
CN110334343B (en) | | Method and system for extracting personal privacy information in contract
CN110851576A (en) | | Question and answer processing method, device, equipment and readable medium
CN113076735A (en) | | Target information acquisition method and device and server
CN111191413B (en) | | Method, device and system for automatically marking event core content based on graph sequencing model
CN110019703A (en) | | Data markers method and device, intelligent answer method and system
CN111177401A (en) | | A method for extracting knowledge from free text in power grid
CN110968571A (en) | | Big data analysis and processing platform for financial information service
CN114880588A (en) | | News popularity prediction method based on knowledge graph
CN116542800A (en) | | Intelligent financial statement analysis system based on cloud AI technology
CN118296107A (en) | | Large model driven intelligent knowledge platform
CN119515162A (en) | | A digital enterprise diagnosis system and method based on large model technology
CN119862253A (en) | | Public opinion monitoring method, device, computer equipment and storage medium
CN111949781B (en) | | Intelligent interaction method and device based on natural sentence syntactic analysis
CN114548072A (en) | | Automatic content analysis and information evaluation method and system for contract files
CN119005133A (en) | | Text generation method and system based on large language model
CN118536957A (en) | | Talent post matching method and device based on model screening, medium and equipment

Legal Events

Date | Code | Title | Description
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant
ASS | Succession or assignment of patent right
    Owner name: QIANCHENGWUYOU NETWORKS INFORMATION TECHNOLOGY(BE
    Free format text: FORMER OWNER: SHANGHAI BRANCH CO., QIANCHENG WUYOU NETWORK INFORMATION TECHNOLOGY (BEIJING) CO
    Effective date: 20050311
C41 | Transfer of patent application or patent right or utility model
TR01 | Transfer of patent right
    Effective date of registration: 20050311
    Address after: 100022, China Merchants Building, No. 118, Jianguo Road, 32, Beijing, Chaoyang District
    Patentee after: Qiancheng Wuyou Network Information Technology (Beijing) Co., Ltd.
    Address before: 200001, building 17, new one hundred building, 800 East Nanjing Road, Shanghai
    Patentee before: Shanghai Branch Co., Qiancheng Wuyou Network Information Technology (Beijing) Co
CX01 | Expiry of patent term
    Granted publication date: 20040915
CX01 | Expiry of patent term
