CN105512339A - File searcher and searching method - Google Patents

File searcher and searching method

Info

Publication number
CN105512339A
Authority
CN
China
Prior art keywords
file
index
word segmentation
information
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511028086.XA
Other languages
Chinese (zh)
Inventor
张学连
谭求强
滕行哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netac Technology Co Ltd
Original Assignee
Netac Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netac Technology Co Ltd
Priority to CN201511028086.XA
Publication of CN105512339A
Status: Pending

Abstract

The invention discloses a file retriever and a retrieval method. The file retriever comprises a file monitoring module, a word segmentation index module, an instruction receiving module and a retrieval response module. The file monitoring module monitors a file system stored on a disk to find newly added files appearing in the current file system. The word segmentation index module performs word segmentation index processing on each newly added file according to a word segmentation database recorded in a preset electronic dictionary, and adds the resulting word segmentation index information to a preset index information list; the index information list is stored on the disk and includes an index file list and an index word segmentation list. The instruction receiving module receives a file search instruction input by a user terminal; the file search instruction includes the corresponding search terms. The retrieval response module screens out, from the index file list, the index file information corresponding to the search terms and returns it to the user terminal. The disclosed file retriever and retrieval method improve the degree to which files are indexed and thereby improve file retrieval accuracy.

Description

File retriever and retrieval method
Technical Field
The invention relates to the technical field of file retrieval, in particular to a file retriever and a retrieval method.
Background
File retrievers that help a user find files stored on a storage device such as a disk are gradually appearing on the market. However, these retrievers index files only to a low degree, which reduces retrieval accuracy and results in a poor user experience.
In summary, it can be seen that how to improve the indexing degree of the file to improve the file retrieval accuracy is an urgent problem to be solved at present.
Disclosure of Invention
In view of the above, the present invention provides a file retriever and a file retrieval method, which improve the indexing degree of a file, thereby improving the file retrieval accuracy. The specific scheme is as follows:
A file retriever, comprising:
the file monitoring module is used for monitoring a file system stored in a disk so as to find a newly added file appearing in the current file system;
the word segmentation index module is used for carrying out corresponding word segmentation index processing on the newly added file according to a word segmentation database recorded in a preset electronic dictionary and adding corresponding word segmentation index information to a preset index information list; the index information list is stored in the disk and comprises an index file list and an index word segmentation list;
the instruction receiving module is used for receiving a file searching instruction input by a user terminal; wherein the file searching instruction comprises corresponding searching words;
and the retrieval response module is used for screening out index file information corresponding to the search terms from the index file list and returning the index file information to the user terminal.
Preferably, the file monitoring module includes:
a directory adding unit configured to add a file name corresponding to a new file to a file directory of the file system when the new file is written to the file system;
the index file caching unit is used for caching the index file list to obtain a corresponding index file caching list;
the first monitoring unit is used for monitoring whether a first type file exists in the file system in real time, and when the first type file exists in the file system, the first type file is determined as a newly added file; the first type file is a file with a file name existing in the file directory but not existing in the index file cache list.
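The "first type file" test above amounts to a set difference between the file directory and the index file cache list. A minimal sketch, assuming both lists hold plain file names; the function name `find_new_files` and the sample names are illustrative and do not come from the source:

```python
def find_new_files(file_directory, index_file_cache):
    """Return names that are in the file directory but not in the index file cache list."""
    indexed = set(index_file_cache)
    # Keep directory order so that new files are handed to the indexer in a stable sequence.
    return [name for name in file_directory if name not in indexed]


file_directory = ["report.txt", "notes.txt", "photo.jpg"]   # hypothetical directory contents
index_file_cache = ["report.txt"]                            # hypothetical cached index entries
print(find_new_files(file_directory, index_file_cache))      # ['notes.txt', 'photo.jpg']
```

Comparing against a cached copy of the index file list, rather than the on-disk list, keeps the check cheap enough to repeat frequently.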
Preferably, the word segmentation index module includes:
the word segmentation index unit is used for carrying out corresponding word segmentation index processing on the newly added file according to the word segmentation database to obtain corresponding index file information and index word segmentation information;
the first information adding unit is used for adding the index file information obtained by the word segmentation index unit to the index file list;
and the second information adding unit is used for adding the index word segmentation information obtained by the word segmentation index unit to the index word segmentation list.
Preferably, the retrieval response module includes:
the index word segmentation caching unit is used for caching the index word segmentation list to obtain a corresponding index word segmentation cache list;
the first screening unit is used for screening out index word segmentation information corresponding to the search word from the index word segmentation cache list;
and the second screening unit is used for screening out index file information corresponding to the index word segmentation information screened out by the first screening unit from the index file cache list and returning the index file information to the user terminal.
Preferably, the file monitoring module further comprises:
the second monitoring unit is used for monitoring whether the index file cache list contains a file name corresponding to a second type file in real time; the second type file is a file of which the file name exists in the index file cache list but does not exist in the file directory;
and the information deleting unit is used for deleting the index file information corresponding to the file name in the index file cache list when the file name corresponding to the second type file is contained in the index file cache list.
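The "second type file" rule is the mirror image of new-file detection: an entry is stale when its name is cached but no longer in the file directory. A minimal sketch, assuming the index file cache list is a mapping from file name to index file information; `purge_deleted_files` and the record layout are illustrative assumptions:

```python
def purge_deleted_files(file_directory, index_file_cache):
    """Delete cache entries whose file name is no longer in the file directory.

    Returns the list of removed names so the caller can log or propagate them.
    """
    present = set(file_directory)
    stale = [name for name in index_file_cache if name not in present]
    for name in stale:
        del index_file_cache[name]
    return stale


cache = {"old.txt": {"id": 0}, "kept.txt": {"id": 1}}   # hypothetical cached index file info
removed = purge_deleted_files(["kept.txt"], cache)
print(removed, cache)   # ['old.txt'] {'kept.txt': {'id': 1}}
```

The stale names are collected before deletion because a dictionary must not change size while it is being iterated.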
Preferably, the file retriever further includes:
and the word segmentation index starting control module is used for monitoring the workload of the current file system in real time, generating a word segmentation index starting control instruction when the workload of the file system is less than a preset load, and sending the word segmentation index starting control instruction to the word segmentation index module so as to control the word segmentation index module to start word segmentation index processing on newly added files appearing in the file system.
Preferably, the file retriever further includes:
the dictionary updating module is used for providing a dictionary updating interface for a user, receiving dictionary updating information input by the user through the dictionary updating interface and updating the electronic dictionary by utilizing the dictionary updating information; the dictionary updating information comprises newly added word segmentation information, word segmentation modification information or word segmentation deletion information.
The invention also discloses a file retrieval method, which comprises the following steps:
monitoring a file system stored in a disk to find a newly added file appearing in the current file system;
performing corresponding word segmentation index processing on the newly added file according to a word segmentation database recorded in a preset electronic dictionary, and adding corresponding word segmentation index information to a preset index information list; the index information list is stored in the disk and comprises an index file list and an index word segmentation list;
receiving a file searching instruction input by a user terminal; wherein the file searching instruction comprises corresponding searching words;
and screening out index file information corresponding to the search terms from the index file list, and returning the index file information to the user terminal.
Preferably, the file retrieval method further includes:
monitoring the current working load of the file system in real time, generating a word segmentation index starting control instruction when the working load of the file system is smaller than a preset load, and using the word segmentation index starting control instruction to trigger word segmentation index processing on newly added files appearing in the file system.
Preferably, the file retrieval method further includes:
providing a dictionary updating interface for a user, receiving dictionary updating information input by the user through the dictionary updating interface, and updating the electronic dictionary by using the dictionary updating information; the dictionary updating information comprises newly added word segmentation information, word segmentation modification information or word segmentation deletion information.
In the present invention, a file retriever includes: a file monitoring module for monitoring the file system stored on the disk so as to find newly added files appearing in the current file system; a word segmentation index module for performing corresponding word segmentation index processing on each newly added file according to a word segmentation database recorded in a preset electronic dictionary and adding the corresponding word segmentation index information to a preset index information list, where the index information list is stored on the disk and comprises an index file list and an index word segmentation list; an instruction receiving module for receiving a file search instruction input by a user terminal, where the file search instruction comprises the corresponding search terms; and a retrieval response module for screening out index file information corresponding to the search terms from the index file list and returning the index file information to the user terminal. The invention therefore performs word segmentation index processing on newly added files appearing on the disk according to the word segmentation database recorded in the electronic dictionary. Because the electronic dictionary records a large number of commonly used everyday words, its word segmentation database greatly improves how finely file content is segmented into words, which correspondingly improves the indexing degree of the files and, in turn, the file retrieval precision.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application structure of a file retriever according to an embodiment of the present invention;
FIG. 2 is a diagram of a specific application structure of a file retriever according to an embodiment of the present invention;
FIG. 3 is a flowchart of a file retrieval method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the invention discloses a file retriever which, as shown in FIG. 1, comprises:
the file monitoring module 11 is configured to monitor a file system stored in a disk to find a new file appearing in a current file system;
the word segmentation index module 12 is configured to perform corresponding word segmentation index processing on the newly added file according to a word segmentation database recorded in a preset electronic dictionary, and add corresponding word segmentation index information to a preset index information list; the index information list is stored in a disk and comprises an index file list and an index word segmentation list. In this embodiment, word segmentation index processing extracts words from the content of the newly added file in reverse order, from back to front: characters that can form a phrase according to Chinese usage are processed as that phrase, and characters that cannot form a phrase are processed as single-character words.
An instruction receiving module 13, configured to receive a file search instruction input by a user terminal; wherein the file searching instruction comprises corresponding searching words;
and the retrieval response module 14 is configured to screen out index file information corresponding to the search term from the index file list, and return the index file information to the user terminal.
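The back-to-front extraction described for the word segmentation index module resembles what is commonly called backward maximum matching. The source does not name the algorithm, so the sketch below is one plausible reading rather than the patented procedure; the function name, the `max_len` limit and the sample dictionary are assumptions made for illustration:

```python
def segment_backward(text, dictionary, max_len=4):
    """Split text into words by scanning from the end toward the start.

    At each position the longest dictionary phrase ending there is taken;
    a character that ends no known phrase becomes a single-character word.
    """
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, then shrink toward one character.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected back to front
    return words


dictionary = {"文件", "检索", "检索器", "方法"}   # hypothetical dictionary entries
print(segment_backward("文件检索器和方法", dictionary))   # ['文件', '检索器', '和', '方法']
```

The `size == 1` fallback guarantees progress on every pass, so the loop terminates even when no dictionary phrase matches.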
Each module is preferably developed in C. C is closer to machine language than Java, so modules developed in C run faster.
Therefore, the embodiment of the invention performs word segmentation index processing on newly added files appearing on the disk according to the word segmentation database recorded in the electronic dictionary. Because the electronic dictionary records a large number of commonly used everyday words, using its word segmentation database greatly improves how finely file content is segmented into words, which correspondingly improves the indexing degree of the files and, in turn, the file retrieval precision.
Another embodiment of the invention discloses a more specific file retriever, which further explains and optimizes the technical solution of the previous embodiment. Specifically:
Referring to FIG. 2, in this embodiment, the file monitoring module 11 may specifically include a directory adding unit 111, an index file caching unit 112, and a first monitoring unit 113; wherein,
a directory adding unit 111 for adding a file name corresponding to a new file to a file directory of the file system when the new file is written to the file system;
an index file caching unit 112, configured to cache the index file list to obtain a corresponding index file caching list;
the first monitoring unit 113 is configured to monitor whether a first type file exists in the file system in real time, and when the first type file exists in the file system, determine the first type file as a newly added file; the first type file is a file with a file name existing in a file directory but not existing in an index file cache list.
In addition, the word segmentation index module 12 in this embodiment may specifically include a word segmentation index unit 121, a first information adding unit 122, and a second information adding unit 123; wherein,
a word segmentation index unit 121, configured to perform corresponding word segmentation index processing on the newly added file according to the word segmentation database, so as to obtain corresponding index file information and index word segmentation information;
a first information adding unit 122, configured to add the index file information obtained by the word segmentation index unit 121 to the index file list;
the second information adding unit 123 is configured to add the index word segmentation information obtained by the word segmentation index unit 121 to the index word segmentation list.
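The two adding units can be pictured as updating a pair of structures: a list of file records and a word-to-files mapping (an inverted index). The source does not specify the record layout, so the fields below (`id`, `name`) and the function name `add_to_index` are assumptions for illustration only:

```python
def add_to_index(index_file_list, index_word_list, file_name, words):
    """Record one newly added file in both lists and return its file id.

    index_file_list: list of {"id", "name"} records (index file information).
    index_word_list: dict mapping each word to the set of file ids containing it.
    """
    file_id = len(index_file_list)
    index_file_list.append({"id": file_id, "name": file_name})
    for word in words:
        index_word_list.setdefault(word, set()).add(file_id)
    return file_id


index_file_list, index_word_list = [], {}
add_to_index(index_file_list, index_word_list, "a.txt", ["文件", "检索"])
add_to_index(index_file_list, index_word_list, "b.txt", ["检索", "方法"])
print(sorted(index_word_list["检索"]))   # [0, 1]
```

Storing file ids rather than file names in the word list keeps each word entry small and lets the file record change without touching the word entries.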
In this embodiment, the retrieval response module 14 specifically includes an index word segmentation cache unit 141, a first filtering unit 142, and a second filtering unit 143; wherein,
the index word segmentation caching unit 141 is configured to cache the index word segmentation list to obtain a corresponding index word segmentation cache list;
a first filtering unit 142, configured to filter out index word segmentation information corresponding to the search term from the index word segmentation cache list;
the second filtering unit 143 is configured to filter out, from the index file cache list, index file information corresponding to the index word segmentation information filtered by the first filtering unit 142, and return the index file information to the user terminal.
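The two filtering units suggest a two-stage lookup: first find the word entry that matches the search term, then resolve that entry to file records. A minimal sketch, assuming the cached word list maps each word to a set of file ids and the cached file list holds `{"id", "name"}` records; these layouts and the name `search` are illustrative assumptions:

```python
def search(term, index_word_cache, index_file_cache):
    """Return index file information for every file whose index contains the term."""
    # Stage 1: screen the index word segmentation cache list for the search term.
    file_ids = index_word_cache.get(term, set())
    # Stage 2: screen the index file cache list for the files named by that entry.
    return [info for info in index_file_cache if info["id"] in file_ids]


index_file_cache = [{"id": 0, "name": "a.txt"}, {"id": 1, "name": "b.txt"}]
index_word_cache = {"检索": {0, 1}, "方法": {1}}
print(search("方法", index_word_cache, index_file_cache))   # [{'id': 1, 'name': 'b.txt'}]
```

Because both stages read only cached lists, a query in this sketch does not touch the disk.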
In order to avoid that the deleted file is retrieved by the user, the file monitoring module in this embodiment may further include a second monitoring unit and an information deleting unit; wherein,
the second monitoring unit is used for monitoring whether the index file cache list contains a file name corresponding to the second type file or not in real time; the second type file is a file of which the file name exists in the index file cache list but does not exist in the file directory;
and the information deleting unit is used for deleting the index file information corresponding to the file name in the index file cache list when the file name corresponding to the second type file is contained in the index file cache list.
Referring to fig. 2, in order to avoid performing the word segmentation indexing operation in the high-load operating state of the file system, the file retriever in this embodiment may further include:
and the word segmentation index starting control module 15 is configured to monitor the workload of the current file system in real time, generate a word segmentation index starting control instruction when the workload of the file system is smaller than a preset load, and send the word segmentation index starting control instruction to the word segmentation index module 12, so as to control the word segmentation index module 12 to start word segmentation index processing on a newly added file appearing in the file system.
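The start control module can be read as a gate that polls a load figure and triggers indexing only below a threshold. The source does not say how workload is measured, so the sketch below injects the measurement as a callable; `IndexStartController`, `read_load` and `start_indexing` are hypothetical names:

```python
class IndexStartController:
    """Trigger indexing only while the measured load is below a preset load."""

    def __init__(self, preset_load, read_load, start_indexing):
        self.preset_load = preset_load        # threshold below which indexing may start
        self.read_load = read_load            # callable returning the current workload
        self.start_indexing = start_indexing  # callable standing in for the control instruction

    def poll(self):
        """Check the load once; return True if indexing was started."""
        if self.read_load() < self.preset_load:
            self.start_indexing()
            return True
        return False


started = []
controller = IndexStartController(0.5, lambda: 0.2, lambda: started.append("index"))
print(controller.poll(), started)   # True ['index']
```

A caller would invoke `poll` periodically; keeping the measurement behind a callable means the same gate works whether load is taken from I/O queue depth, CPU load or some other figure.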
In order to facilitate the user to dynamically update the word library in the electronic dictionary, the file retriever in this embodiment may further include:
the dictionary updating module 16 is used for providing a dictionary updating interface for the user, receiving dictionary updating information input by the user through the dictionary updating interface, and updating the electronic dictionary by utilizing the dictionary updating information; the dictionary updating information comprises newly added word segmentation information, word segmentation modification information or word segmentation deletion information.
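The three kinds of dictionary updating information map naturally onto add, replace and remove operations on a word set. A minimal sketch, assuming the dictionary is a set of words and each update is a small record; the record fields (`kind`, `word`, `old`, `new`) are assumptions, since the source does not define the update format:

```python
def apply_dictionary_update(dictionary, update):
    """Apply one add, modify or delete update to the word set in place."""
    kind = update["kind"]
    if kind == "add":
        dictionary.add(update["word"])
    elif kind == "modify":
        # Modification is modeled as replacing the old word with the new one.
        dictionary.discard(update["old"])
        dictionary.add(update["new"])
    elif kind == "delete":
        dictionary.discard(update["word"])
    else:
        raise ValueError(f"unknown update kind: {kind!r}")


dictionary = {"文件", "检索"}
apply_dictionary_update(dictionary, {"kind": "add", "word": "索引"})
apply_dictionary_update(dictionary, {"kind": "modify", "old": "检索", "new": "检索器"})
apply_dictionary_update(dictionary, {"kind": "delete", "word": "文件"})
print(sorted(dictionary))   # ['检索器', '索引']
```

Files indexed before an update keep their old segmentation in this sketch; whether existing entries are re-indexed is not addressed by the source.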
An embodiment of the invention also discloses a file retrieval method which, as shown in FIG. 3, comprises the following steps:
step S31: monitoring a file system stored in a magnetic disk to find a newly added file appearing in the current file system;
step S32: performing corresponding word segmentation index processing on the newly added file according to a word segmentation database recorded in a preset electronic dictionary, and adding corresponding word segmentation index information to a preset index information list; the index information list is stored in a disk and comprises an index file list and an index word segmentation list. In this embodiment, word segmentation index processing extracts words from the content of the newly added file in reverse order, from back to front: characters that can form a phrase according to Chinese usage are processed as that phrase, and characters that cannot form a phrase are processed as single-character words;
step S33: receiving a file searching instruction input by a user terminal; wherein the file searching instruction comprises corresponding searching words;
step S34: and screening index file information corresponding to the search terms from the index file list, and returning the index file information to the user terminal.
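Steps S31 to S34 can be strung together in a small end-to-end sketch that also keeps the index on disk, as the method requires. Several simplifications are assumed here and are not taken from the source: the index is stored as one JSON file, segmentation is replaced by a plain substring test against the dictionary, and all function names are illustrative:

```python
import json
import os
import tempfile


def load_index(index_path):
    """Read the index information list from disk, or start with an empty one."""
    if not os.path.exists(index_path):
        return {"files": [], "words": {}}
    with open(index_path, encoding="utf-8") as fh:
        return json.load(fh)


def index_new_files(root, index_path, dictionary):
    """S31 and S32: find files not yet indexed, index them, and save the lists."""
    index = load_index(index_path)
    indexed = set(index["files"])
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if name in indexed or not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as fh:
            content = fh.read()
        index["files"].append(name)
        # Simplification: a dictionary word is indexed if it occurs anywhere in the content.
        for word in sorted(dictionary):
            if word in content:
                index["words"].setdefault(word, []).append(name)
    with open(index_path, "w", encoding="utf-8") as fh:
        json.dump(index, fh, ensure_ascii=False)
    return index


def search(index_path, term):
    """S33 and S34: look the search term up and return the matching file names."""
    return load_index(index_path)["words"].get(term, [])


def demo():
    with tempfile.TemporaryDirectory() as base:
        root = os.path.join(base, "files")
        os.mkdir(root)
        index_path = os.path.join(base, "index.json")  # kept outside the monitored directory
        with open(os.path.join(root, "a.txt"), "w", encoding="utf-8") as fh:
            fh.write("file retrieval method")
        with open(os.path.join(root, "b.txt"), "w", encoding="utf-8") as fh:
            fh.write("unrelated text")
        index_new_files(root, index_path, {"retrieval", "method"})
        return search(index_path, "retrieval")


print(demo())   # ['a.txt']
```

Running `index_new_files` a second time over the same directory adds nothing, because every name is already in the stored file list.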
To avoid performing word segmentation indexing while the file system is under high load, the method in this embodiment may further include: monitoring the workload of the current file system in real time, generating a word segmentation index start control instruction when the workload of the file system is less than a preset load, and using the word segmentation index start control instruction to trigger word segmentation index processing on newly added files appearing in the file system.
In order to facilitate the user to dynamically update the word stock in the electronic dictionary, the method in this embodiment may further include: providing a dictionary updating interface for a user, receiving dictionary updating information input by the user through the dictionary updating interface, and updating the electronic dictionary by utilizing the dictionary updating information; the dictionary updating information comprises newly added word segmentation information, word segmentation modification information or word segmentation deletion information.
For more specific contents of the above steps, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Therefore, the embodiment of the invention performs word segmentation index processing on newly added files appearing on the disk according to the word segmentation database recorded in the electronic dictionary. Because the electronic dictionary records a large number of commonly used everyday words, using its word segmentation database greatly improves how finely file content is segmented into words, which correspondingly improves the indexing degree of the files and, in turn, the file retrieval precision.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The file retriever and retrieval method provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to aid understanding of the method and its core idea. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

Application number: CN201511028086.XA | Priority date: 2015-12-31 | Filing date: 2015-12-31 | Title: File searcher and searching method | Status: Pending | Publication: CN105512339A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201511028086.XA, CN105512339A (en) | 2015-12-31 | 2015-12-31 | File searcher and searching method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201511028086.XA, CN105512339A (en) | 2015-12-31 | 2015-12-31 | File searcher and searching method

Publications (1)

Publication Number | Publication Date
CN105512339A (en) | 2016-04-20

Family

ID=55720319

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201511028086.XA (Pending, CN105512339A (en)) | File searcher and searching method | 2015-12-31 | 2015-12-31

Country Status (1)

Country | Link
CN (1) | CN105512339A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101136016A (en) * | 2006-09-01 | 2008-03-05 | 北大方正集团有限公司 | An index online update method for a full-text retrieval system
CN102819592A (en) * | 2012-08-08 | 2012-12-12 | 河海大学 | Lucene-based desktop searching system and method
US20120317105A1 (en) * | 2009-09-21 | 2012-12-13 | Zte Corporation | Method and Apparatus for Updating Index and Sequencing Search Results Based on Updated Index in Terminal
CN103177127A (en) * | 2013-04-18 | 2013-06-26 | 陶光毅 | Jukebox-based database storage system and method using same
CN104077385A (en) * | 2014-06-27 | 2014-10-01 | 北京海泰方圆科技有限公司 | Classification and retrieval method of files
CN104834664A (en) * | 2015-02-02 | 2015-08-12 | 北京理工大学 | Optical disc juke-box oriented full text retrieval system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
熊回香 等: "基于词索引的中文全文检索关键技术及其发展方向", 《中国图书馆学报(双月刊)》*
高雪霞 等: "基于词典知识库的快速检索算法研究", 《德州学院学报》*

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105930470A (en) * | 2016-04-25 | 2016-09-07 | 安徽富驰信息技术有限公司 | File retrieval method based on feature weight analysis technology
CN105930470B (en) * | 2016-04-25 | 2019-03-26 | 安徽富驰信息技术有限公司 | A kind of document retrieval method based on feature weight analytical technology
CN108446336A (en) * | 2018-02-27 | 2018-08-24 | 平安科技(深圳)有限公司 | Intelligent search method, device, equipment and the storage medium of organization names
CN108446336B (en) * | 2018-02-27 | 2019-11-05 | 平安科技(深圳)有限公司 | Intelligent search method, device, equipment and the storage medium of organization names
CN115705353A (en) * | 2021-08-10 | 2023-02-17 | 腾讯科技(深圳)有限公司 | An index processing method and related device based on full-text retrieval

Legal Events

Code | Title
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication
RJ01 | Rejection of invention patent application after publication

Application publication date: 2016-04-20

