CN104077385A - Classification and retrieval method of files - Google Patents

Classification and retrieval method of files

Info

Publication number
CN104077385A
Authority
CN
China
Prior art keywords
file
shortcut
label
files
described file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410301532.9A
Other languages
Chinese (zh)
Inventor
管延军
蒋红宇
郑永莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Haitai Fangyuan High Technology Co Ltd
Original Assignee
Beijing Haitai Fangyuan High Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-06-27
Filing date
2014-06-27
Publication date
2014-10-01
Application filed by Beijing Haitai Fangyuan High Technology Co Ltd
Priority to CN201410301532.9A
Publication of CN104077385A
Legal status: Pending

Abstract

The invention provides a method for classifying and retrieving files. The method comprises the following steps: A, adding an index for an obtained file and performing word segmentation on the content of the file; B, taking the words that reach a set degree of activity in the segmentation result as the labels of the file; C, creating as many folders as the file has labels, and assigning each label to one folder as that folder's attribute; D, creating as many shortcuts to the file as the file has labels, each shortcut being identified by one label and associated with the physical storage address of the file through the file index; and E, placing each shortcut into the folder corresponding to its label. The method addresses problems in existing file classification techniques.

Description

A classification and retrieval method for files
Technical field
The present invention relates to a method for classifying and retrieving files.
Background art
In the file classification and retrieval methods of existing operating systems, folders are generally created according to file type and the files' own attributes, and each file is stored in the folder that matches its attributes. File attributes and file types are fixed. Because one file may exist under several folders at the same time, a search has to be carried out according to the file's attributes.
Creating folders according to the files' own attributes restricts the folder types and fixes the classification hierarchy and classification features, so files cannot be classified according to different application requirements, and extensibility is poor. Moreover, a file with a given attribute may be stored under several folders, which causes redundant file data. In addition, when a large number of files with the same attribute are stored in the same folder, file search speed is greatly reduced.
Summary of the invention
In view of this, the main object of the present invention is to provide a classification and retrieval method for files that solves the above problems in the prior art.
The invention provides a file classification method comprising the following steps:
A, adding an index for an obtained file, and performing word segmentation on the content of the file;
B, taking the words that reach a set liveness in the segmentation result as the labels of the file;
C, creating as many folders as the file has labels, and assigning each label to one of the folders as that folder's attribute;
D, creating as many shortcuts to the file as the file has labels, each shortcut being associated with the physical storage address of the file through the file index, and identifying each shortcut with one of the labels;
E, placing each shortcut of the file into the folder corresponding to its label.
As can be seen from the above, using the words in the file content whose "liveness" meets a certain standard as folder labels allows files to be classified according to different application requirements. Because each file is stored at only one physical address, no redundant file data is created. In addition, each folder holds only file shortcuts bearing a single label, so file search speed can be greatly improved.
The above method further comprises:
F, when the content of the file changes, performing word segmentation on the content again and returning to step B, so as to redetermine the labels of the file and update the folders and shortcuts corresponding to the file.
As can be seen from the above, the labels of a file are updated as its content changes, so the number of corresponding folders and file shortcuts can be increased or decreased in real time.
In the above method, the word segmentation method is mechanical word segmentation.
In the above method, the liveness is the frequency of use of a word.
In the above method, the liveness is a measure of whether a word's frequency of occurrence is greater than a certain value.
The present invention also provides a file retrieval method based on any of the file classification methods described above, characterized by comprising the following steps:
retrieving the folder attributes according to the input keyword: if there is a hit, returning the file shortcuts in that folder;
otherwise, retrieving the file shortcuts according to the input keyword: if there is a hit, returning the corresponding file shortcuts; otherwise, exiting the retrieval.
Brief description of the drawings
Fig. 1 is a flowchart of the file classification method of the present invention.
Embodiment
The classification and retrieval method for files provided by the invention is described in detail below with reference to the accompanying drawing.
As shown in Fig. 1, the file classification method comprises the following steps:
Step 100: add an index for the obtained file, and perform word segmentation on the content of the file.
In this embodiment, a common word segmentation method can be used to segment the file content, for example a dictionary-based method (mechanical word segmentation, i.e. dictionary matching).
This method matches the Chinese character string to be analyzed against the entries of a "sufficiently large" machine dictionary according to a certain strategy; if a string is found in the dictionary, the match succeeds and a word is identified. Depending on the scanning direction, matching is divided into forward matching and reverse matching. Depending on which length is matched first, it is divided into maximum (longest) matching and minimum (shortest) matching. Depending on whether segmentation is combined with part-of-speech tagging, it is divided into pure segmentation methods and integrated methods that combine segmentation with tagging. Commonly used methods include forward maximum matching, reverse maximum matching, minimum segmentation (which minimizes the number of words cut from each sentence), and bidirectional matching. Because these segmentation methods are well known, they are not described further here.
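As an illustration of the dictionary-based approach described above, the following is a minimal sketch of forward maximum matching. It is not taken from the patent: the function name, the `max_len` parameter, the toy dictionary, and the rule of emitting an unmatched character as a single-character word are all assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` left to right, preferring the longest dictionary entry.

    `dictionary` is a set of known words (hypothetical toy data here).
    A character that starts no dictionary entry is emitted on its own.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until there is a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real machine dictionary would be far larger.
toy_dictionary = {"世界杯", "世界", "巴西", "夺冠"}
print(forward_max_match("巴西世界杯夺冠", toy_dictionary))
```

Reverse maximum matching follows the same pattern but scans from the end of the string; bidirectional matching runs both and picks one result by some tie-breaking rule.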
Step 200: take the words with higher "liveness" in the segmentation result as the labels of the file.
In this embodiment, the "liveness" of a word can mean its frequency of use (the number of times it occurs in the file). Several words can be selected in descending order of "liveness" and used as the file's labels. For example, if the three words "world cup", "Brazil", and "winning the championship" rank in the top three by frequency among all words in the file content, these three words are used as the file's labels. Alternatively, every word whose frequency of occurrence is greater than a certain value is used as a label with high "liveness".
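Both selection rules described above (top-ranked words, or words above a frequency threshold) can be sketched with a frequency count. This is only an illustrative sketch: the function name and the `top_n` and `min_count` parameters are assumptions, and the patent does not say how ties or stop words are handled.

```python
from collections import Counter


def select_labels(words, top_n=None, min_count=None):
    """Pick labels from segmented words by "liveness" (occurrence count).

    With `min_count`, keep every word occurring more than `min_count` times;
    otherwise keep the `top_n` most frequent words (all words if None).
    """
    counts = Counter(words)
    if min_count is not None:
        # Threshold variant: frequency must be greater than the set value.
        return [word for word, count in counts.most_common() if count > min_count]
    # Ranking variant: most frequent words first.
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical segmentation result for one file.
words = ["world cup"] * 5 + ["Brazil"] * 4 + ["winning the championship"] * 3 + ["referee"]
print(select_labels(words, top_n=3))
print(select_labels(words, min_count=3))
```

In practice very common function words would dominate a raw count, so a real implementation would probably filter them first; the patent text does not address this.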
Step 300: create as many folders as the file has labels, and assign each label to one of the new folders as its folder attribute.
Taking the above labels as an example, three folders are created whose attributes are "world cup", "Brazil", and "winning the championship" respectively.
Step 400: according to the number of labels, create shortcuts to the file, each associated with the file's physical storage address through the file index, and identify each shortcut with one of the labels.
For example, if the file has three labels, three shortcuts are created. A logical association mark is added to each file shortcut to establish a logical association between the shortcut and the file index. In this example, the three shortcuts of the file are identified with the labels "world cup", "Brazil", and "winning the championship" respectively.
Step 500: place each shortcut of the file into the folder corresponding to its label.
For example, the folder whose attribute is "world cup" holds the file shortcut labeled "world cup", the folder whose attribute is "Brazil" holds the file shortcut labeled "Brazil", and so on.
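Steps 300 to 500 can be pictured with a small in-memory model. The sketch below is an assumption-laden illustration, not the patented implementation: the `Catalog` and `Shortcut` names, the integer index ids, and the choice to reuse a folder when another file already has the same label are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Shortcut:
    label: str     # label that identifies this shortcut
    index_id: int  # logical association with the file index entry


@dataclass
class Catalog:
    index: dict = field(default_factory=dict)    # index_id -> physical storage path
    folders: dict = field(default_factory=dict)  # folder attribute -> list of Shortcut
    _next_id: int = 0

    def add_file(self, path, labels):
        """Index one file, then create one folder entry and one shortcut per label."""
        index_id = self._next_id
        self._next_id += 1
        self.index[index_id] = path  # the file itself is stored once
        for label in labels:
            # Assumption: a folder with this attribute is reused if it exists.
            self.folders.setdefault(label, []).append(Shortcut(label, index_id))
        return index_id

    def resolve(self, shortcut):
        """Follow a shortcut through the index to the physical path."""
        return self.index[shortcut.index_id]


catalog = Catalog()
catalog.add_file("/data/final_report.txt", ["world cup", "Brazil", "winning the championship"])
for attribute, shortcuts in catalog.folders.items():
    print(attribute, [catalog.resolve(s) for s in shortcuts])
```

Because every shortcut stores only an index id, the three folders all lead to the same single stored file, which is the property the description relies on to avoid redundant data.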
Step 600: when the file content changes, perform word segmentation on the content again and return to step 200 to redetermine the file's labels, so that the folders and file shortcuts corresponding to the file are updated in real time.
In this embodiment, if the word "Italy" also meets the condition for being a label (it has high "liveness") after the file content changes, "Italy" becomes a new label of the file: a new folder whose attribute is "Italy" and a file shortcut identified by the label "Italy" are created, and the shortcut is placed in the new folder. Labels that have not changed are left untouched. The specific steps are the same as above and are not repeated. In this way, the file classification is updated in real time as the file content changes.
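The update in step 600 amounts to comparing the old and new label sets and touching only the differences. The following sketch assumes a simplified representation in which each folder maps its attribute to the set of index ids that have a shortcut in it; the function name and that representation are hypothetical.

```python
def update_labels(folders, index_id, new_labels):
    """Synchronise folders and shortcuts for one file after re-segmentation.

    `folders` maps a folder attribute (label) to the set of index ids that
    have a shortcut in that folder. Returns the (added, removed) label sets.
    """
    new_labels = set(new_labels)
    old_labels = {label for label, ids in folders.items() if index_id in ids}

    added = new_labels - old_labels
    removed = old_labels - new_labels

    for label in added:
        # New label: create the folder if needed and drop a shortcut into it.
        folders.setdefault(label, set()).add(index_id)
    for label in removed:
        # Label no longer applies: remove the shortcut, and the folder if empty.
        folders[label].discard(index_id)
        if not folders[label]:
            del folders[label]
    return added, removed


folders = {"world cup": {1}, "Brazil": {1}, "winning the championship": {1}}
print(update_labels(folders, 1, ["world cup", "Brazil", "winning the championship", "Italy"]))
print(sorted(folders))
```

Unchanged labels fall in neither difference set, so their folders and shortcuts are not touched, matching the behaviour described for the "Italy" example.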
Based on the above file classification method, the present invention also provides a file retrieval method comprising the following steps:
First, retrieve the folder attributes according to the input keyword. If there is a hit (a folder has an attribute corresponding to the keyword), return the file shortcuts in that folder; otherwise, retrieve the file shortcuts according to the input keyword: if there is a hit, return the corresponding file shortcuts; otherwise, exit the retrieval.
In the above retrieval process, different search conditions may correspond to different folders (with different folder attributes), and although different folders hold different file shortcuts, these shortcuts may all point to the same file (for the reason given above). Thus, during retrieval, only the folders or file shortcuts need to be searched to find the physical address of the corresponding file; the physical addresses of files do not have to be searched directly. This greatly reduces retrieval time and improves retrieval efficiency.
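The two-stage lookup described above can be sketched as follows. The patent does not specify how a keyword is compared with a shortcut in the second stage, so the substring match on a shortcut's display name or label is an assumption, as are the `Shortcut` fields and the function name.

```python
from typing import NamedTuple


class Shortcut(NamedTuple):
    name: str   # hypothetical display name of the shortcut
    label: str  # label that identifies the shortcut
    path: str   # physical storage path reached through the file index


def retrieve(keyword, folders):
    """Return matching shortcuts; an empty list means the retrieval exits.

    `folders` maps a folder attribute to the list of shortcuts it holds.
    """
    # Stage 1: exact hit on a folder attribute returns that folder's shortcuts.
    if keyword in folders:
        return list(folders[keyword])
    # Stage 2 (assumed matching rule): scan the shortcuts themselves.
    return [
        shortcut
        for shortcuts in folders.values()
        for shortcut in shortcuts
        if keyword in shortcut.name or keyword in shortcut.label
    ]


folders = {
    "Brazil": [Shortcut("final_report.txt", "Brazil", "/data/final_report.txt")],
    "world cup": [Shortcut("final_report.txt", "world cup", "/data/final_report.txt")],
}
print(retrieve("Brazil", folders))
print(retrieve("final", folders))
print(retrieve("Italy", folders))
```

In the second call both shortcuts are returned even though they point to the same stored file; a caller that only needs the file could deduplicate on `path`.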
The foregoing are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

Application CN201410301532.9A | Priority date 2014-06-27 | Filing date 2014-06-27 | Classification and retrieval method of files | Pending | CN104077385A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201410301532.9A, CN104077385A (en) | 2014-06-27 | 2014-06-27 | Classification and retrieval method of files

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201410301532.9A, CN104077385A (en) | 2014-06-27 | 2014-06-27 | Classification and retrieval method of files

Publications (1)

Publication Number | Publication Date
CN104077385A | 2014-10-01

Family

ID=51598639

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201410301532.9A (Pending), CN104077385A (en) | Classification and retrieval method of files | 2014-06-27 | 2014-06-27

Country Status (1)

Country | Link
CN (1) | CN104077385A (en)



Legal Events

Date | Code | Title | Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after:100094, Beijing, Haidian District, West Road, No. 8, Zhongguancun Software Park, building 9, international software building E, one floor, two layers

Applicant after:BEIJING HAITAI FANGYUAN HIGH TECHNOLOGY CO., LTD.

Address before:100094, Beijing, Haidian District, West Road, No. 8, Zhongguancun Software Park, building 9, international software building E, one floor, two layers

Applicant before:Beijing Haitai Fangyuan High Technology Co., Ltd.

COR Change of bibliographic data
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 2014-10-01

