Disclosure of Invention
In view of the above, the present invention provides a file retriever and a file retrieval method, which increase how thoroughly a file is indexed and thereby improve file retrieval accuracy. The specific scheme is as follows:
a file retriever, comprising:
the file monitoring module is used for monitoring a file system stored in a disk so as to find a newly added file appearing in the current file system;
the word segmentation index module is used for carrying out corresponding word segmentation index processing on the newly added file according to a word segmentation database recorded in a preset electronic dictionary and adding corresponding word segmentation index information to a preset index information list; the index information list is stored in the disk and comprises an index file list and an index word segmentation list;
the instruction receiving module is used for receiving a file search instruction input by a user terminal; wherein the file search instruction comprises corresponding search terms;
and the retrieval response module is used for screening out index file information corresponding to the search terms from the index file list and returning the index file information to the user terminal.
Preferably, the file monitoring module includes:
a directory adding unit configured to add a file name corresponding to a new file to a file directory of the file system when the new file is written to the file system;
the index file caching unit is used for caching the index file list to obtain a corresponding index file caching list;
the first monitoring unit is used for monitoring whether a first type file exists in the file system in real time, and when the first type file exists in the file system, the first type file is determined as a newly added file; the first type file is a file with a file name existing in the file directory but not existing in the index file cache list.
Preferably, the word segmentation index module includes:
the word segmentation index unit is used for carrying out corresponding word segmentation index processing on the newly added file according to the word segmentation database to obtain corresponding index file information and index word segmentation information;
the first information adding unit is used for adding the index file information obtained by the word segmentation index unit to the index file list;
and the second information adding unit is used for adding the index word segmentation information obtained by the word segmentation index unit to the index word segmentation list.
Preferably, the retrieval response module includes:
the index word segmentation caching unit is used for caching the index word segmentation list to obtain a corresponding index word segmentation cache list;
the first screening unit is used for screening out index word segmentation information corresponding to the search terms from the index word segmentation cache list;
and the second screening unit is used for screening out, from the index file cache list, index file information corresponding to the index word segmentation information screened out by the first screening unit, and returning the index file information to the user terminal.
Preferably, the file monitoring module further comprises:
the second monitoring unit is used for monitoring whether the index file cache list contains a file name corresponding to a second type file in real time; the second type file is a file of which the file name exists in the index file cache list but does not exist in the file directory;
and the information deleting unit is used for deleting the index file information corresponding to the file name in the index file cache list when the file name corresponding to the second type file is contained in the index file cache list.
Preferably, the file retriever further includes:
and the word segmentation index starting control module is used for monitoring the workload of the current file system in real time, generating a word segmentation index starting control instruction when the workload of the file system is less than a preset load, and sending the word segmentation index starting control instruction to the word segmentation index module so as to control the word segmentation index module to start word segmentation index processing on newly added files appearing in the file system.
Preferably, the file retriever further includes:
the dictionary updating module is used for providing a dictionary updating interface for a user, receiving dictionary updating information input by the user through the dictionary updating interface and updating the electronic dictionary by utilizing the dictionary updating information; the dictionary updating information comprises newly added word segmentation information, word segmentation modification information or word segmentation deletion information.
The invention also discloses a file retrieval method, which comprises the following steps:
monitoring a file system stored in a disk to find a newly added file appearing in the current file system;
performing corresponding word segmentation index processing on the newly added file according to a word segmentation database recorded in a preset electronic dictionary, and adding corresponding word segmentation index information to a preset index information list; the index information list is stored in the disk and comprises an index file list and an index word segmentation list;
receiving a file search instruction input by a user terminal; wherein the file search instruction comprises corresponding search terms;
and screening out index file information corresponding to the search terms from the index file list, and returning the index file information to the user terminal.
Preferably, the file retrieval method further includes:
monitoring the current working load of the file system in real time, generating a word segmentation index starting control instruction when the working load of the file system is smaller than a preset load, and using the word segmentation index starting control instruction to trigger word segmentation index processing on newly added files appearing in the file system.
Preferably, the file retrieval method further includes:
providing a dictionary updating interface for a user, receiving dictionary updating information input by the user through the dictionary updating interface, and updating the electronic dictionary by using the dictionary updating information; the dictionary updating information comprises newly added word segmentation information, word segmentation modification information or word segmentation deletion information.
In the present invention, a file retriever includes: a file monitoring module, used for monitoring a file system stored in a disk so as to find newly added files appearing in the current file system; a word segmentation index module, used for carrying out corresponding word segmentation index processing on the newly added file according to a word segmentation database recorded in a preset electronic dictionary and adding corresponding word segmentation index information to a preset index information list, the index information list being stored in the disk and comprising an index file list and an index word segmentation list; an instruction receiving module, used for receiving a file search instruction input by a user terminal, the file search instruction comprising corresponding search terms; and a retrieval response module, used for screening out index file information corresponding to the search terms from the index file list and returning the index file information to the user terminal. The invention therefore performs word segmentation index processing on newly added files appearing in the disk according to the word segmentation database recorded in the electronic dictionary. Because the electronic dictionary records a large number of commonly used everyday words, using its word segmentation database allows text to be segmented at a much finer granularity, which correspondingly increases how thoroughly each file is indexed and in turn improves file retrieval precision.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some, and not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a file retriever, which is shown in figure 1 and comprises:
the file monitoring module 11 is configured to monitor a file system stored in a disk to find a new file appearing in a current file system;
the word segmentation index module 12 is configured to perform corresponding word segmentation index processing on the newly added file according to a word segmentation database recorded in a preset electronic dictionary, and add corresponding word segmentation index information to a preset index information list; the index information list is stored in a disk and comprises an index file list and an index word segmentation list; in the present embodiment, the word segmentation index processing extracts words from the content of the newly added file in reverse order, from back to front: characters that can form a phrase according to Chinese usage are processed as that phrase, and characters that cannot form a phrase are processed as single characters;
an instruction receiving module 13, configured to receive a file search instruction input by a user terminal; wherein the file search instruction comprises corresponding search terms;
and the retrieval response module 14 is configured to screen out index file information corresponding to the search term from the index file list, and return the index file information to the user terminal.
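The reverse-order word extraction described for the word segmentation index module 12 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: Python is used for brevity, and the greedy longest-match rule, the function name `segment_backward`, and the `max_len` parameter are assumptions that the source does not specify.

```python
def segment_backward(text, dictionary, max_len=4):
    """Extract words from the end of `text` toward its start.

    At each position the longest dictionary phrase ending there is taken
    (an assumed rule); if no phrase matches, the single character is
    emitted on its own.
    """
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, shrinking down to two characters.
        for size in range(min(max_len, end), 1, -1):
            candidate = text[end - size:end]
            if candidate in dictionary:
                words.append(candidate)
                end -= size
                break
        else:
            # No phrase ends here: fall back to a single character.
            words.append(text[end - 1])
            end -= 1
    words.reverse()  # restore reading order
    return words


# Hypothetical three-entry dictionary, for illustration only.
dictionary = {"文件", "检索", "检索器"}
print(segment_backward("文件检索器", dictionary))  # ['文件', '检索器']
```

Scanning from the back lets the longer phrase "检索器" win over its prefix "检索", which is one reason a back-to-front pass may be chosen for Chinese text.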
C is preferably used to develop each module. Compared with Java, C is compiled directly to native machine code and sits closer to the machine, so modules developed in C can run faster.
The embodiment of the invention therefore performs word segmentation index processing on newly added files appearing in the disk according to the word segmentation database recorded in the electronic dictionary. Because the electronic dictionary records a large number of commonly used everyday words, using its word segmentation database allows text to be segmented at a much finer granularity, which correspondingly increases how thoroughly each file is indexed and in turn improves file retrieval precision.
The embodiment of the invention discloses a specific file retriever; compared with the previous embodiment, this embodiment further explains and optimizes the technical scheme.
Specifically, referring to figure 2, in this embodiment the file monitoring module 11 may include a directory adding unit 111, an index file caching unit 112, and a first monitoring unit 113; wherein,
a directory adding unit 111 for adding a file name corresponding to a new file to a file directory of the file system when the new file is written to the file system;
an index file caching unit 112, configured to cache the index file list to obtain a corresponding index file caching list;
the first monitoring unit 113 is configured to monitor whether a first type file exists in the file system in real time, and when the first type file exists in the file system, determine the first type file as a newly added file; the first type file is a file with a file name existing in a file directory but not existing in an index file cache list.
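The comparison performed by the first monitoring unit 113 amounts to a set difference between the file directory and the index file cache list. A minimal sketch in Python follows, assuming both are held in memory as sets of file names; the class and method names are hypothetical and do not come from the source.

```python
class FileMonitor:
    """Tracks the file directory and a cached copy of the index file list."""

    def __init__(self, indexed_names):
        self.file_directory = set()                  # names written to the file system
        self.index_file_cache = set(indexed_names)   # names already indexed

    def on_file_written(self, name):
        # Role of the directory adding unit: record the new file's name.
        self.file_directory.add(name)

    def new_files(self):
        # First type files: in the file directory but not in the index cache.
        return sorted(self.file_directory - self.index_file_cache)


monitor = FileMonitor(indexed_names=["a.txt"])
monitor.on_file_written("a.txt")
monitor.on_file_written("b.txt")
print(monitor.new_files())  # ['b.txt']
```

Keeping the index file list cached in memory means each check avoids reading the on-disk list, which is presumably why the index file caching unit 112 exists.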
In addition, the word segmentation index module 12 in this embodiment may specifically include a word segmentation index unit 121, a first information adding unit 122, and a second information adding unit 123; wherein,
a word segmentation index unit 121, configured to perform corresponding word segmentation index processing on the newly added file according to the word segmentation database, so as to obtain corresponding index file information and index word segmentation information;
a first information adding unit 122, configured to add the index file information obtained by the word segmentation index unit 121 to the index file list;
the second information adding unit 123 is configured to add the index word segmentation information obtained by the word segmentation index unit 121 to the index word segmentation list.
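One way to picture the two lists maintained by units 122 and 123 is as a per-file record plus an inverted index from each word to the files containing it. The sketch below assumes that layout; the source does not define the fields of either list, so the `word_count` field and the function name are purely illustrative.

```python
def add_index_entries(name, words, index_file_list, index_word_list):
    """Record one newly added file in both index lists.

    index_file_list: dict mapping file name -> index file information.
    index_word_list: dict mapping word -> list of file names containing it.
    Both layouts are assumptions made for this sketch.
    """
    # First information adding unit: store the index file information.
    index_file_list[name] = {"word_count": len(words)}
    # Second information adding unit: store the index word segmentation information.
    for word in set(words):
        names = index_word_list.setdefault(word, [])
        if name not in names:
            names.append(name)


index_file_list, index_word_list = {}, {}
add_index_entries("a.txt", ["文件", "检索"], index_file_list, index_word_list)
add_index_entries("b.txt", ["检索", "方法"], index_file_list, index_word_list)
print(index_word_list["检索"])  # ['a.txt', 'b.txt']
```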
In this embodiment, the retrieval response module 14 specifically includes an index word segmentation caching unit 141, a first screening unit 142, and a second screening unit 143; wherein,
the index word segmentation caching unit 141 is configured to cache the index word segmentation list to obtain a corresponding index word segmentation cache list;
a first screening unit 142, configured to screen out index word segmentation information corresponding to the search terms from the index word segmentation cache list;
the second screening unit 143 is configured to screen out, from the index file cache list, index file information corresponding to the index word segmentation information screened out by the first screening unit 142, and return the index file information to the user terminal.
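The two-stage lookup performed by the retrieval response module 14 can be sketched as below. The cache layouts (word to file names, file name to file information) and the function name `search` are assumptions for illustration, not structures defined by the source.

```python
def search(term, index_word_cache, index_file_cache):
    """Return index file information for every file matching `term`."""
    # Stage 1: find the index word segmentation information for the search term.
    names = index_word_cache.get(term, [])
    # Stage 2: map that information to index file information, skipping any
    # name that is no longer present in the index file cache list.
    return [{"name": n, **index_file_cache[n]} for n in names if n in index_file_cache]


# Hypothetical cached lists, for illustration only.
index_word_cache = {"检索": ["a.txt", "b.txt"], "方法": ["b.txt"]}
index_file_cache = {"a.txt": {"word_count": 2}, "b.txt": {"word_count": 2}}
print(search("方法", index_word_cache, index_file_cache))
# [{'name': 'b.txt', 'word_count': 2}]
```

Because both stages read cached lists, a query in this sketch never touches the disk.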
To prevent a deleted file from being retrieved by the user, the file monitoring module in this embodiment may further include a second monitoring unit and an information deleting unit; wherein,
the second monitoring unit is used for monitoring whether the index file cache list contains a file name corresponding to the second type file or not in real time; the second type file is a file of which the file name exists in the index file cache list but does not exist in the file directory;
and the information deleting unit is used for deleting the index file information corresponding to the file name in the index file cache list when the file name corresponding to the second type file is contained in the index file cache list.
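The clean-up performed by the second monitoring unit and the information deleting unit is the mirror image of new-file detection: names found in the index file cache list but missing from the file directory are removed. A minimal sketch, assuming the cache is a dict keyed by file name (a hypothetical layout):

```python
def purge_deleted(file_directory, index_file_cache):
    """Delete cached index file information for files no longer in the directory.

    Returns the names that were removed (the second type files).
    """
    # Collect first, then delete, so the dict is not modified while iterating.
    stale = [name for name in index_file_cache if name not in file_directory]
    for name in stale:
        del index_file_cache[name]
    return stale


file_directory = {"a.txt"}
index_file_cache = {"a.txt": {"word_count": 2}, "old.txt": {"word_count": 5}}
print(purge_deleted(file_directory, index_file_cache))  # ['old.txt']
print(sorted(index_file_cache))                         # ['a.txt']
```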
Referring to figure 2, in order to avoid performing the word segmentation indexing operation while the file system is in a high-load operating state, the file retriever in this embodiment may further include:
and the word segmentation index starting control module 15 is configured to monitor the workload of the current file system in real time, generate a word segmentation index starting control instruction when the workload of the file system is smaller than a preset load, and send the word segmentation index starting control instruction to the word segmentation index module 12, so as to control the word segmentation index module 12 to start word segmentation index processing on a newly added file appearing in the file system.
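The gating behaviour of the word segmentation index starting control module 15 can be sketched as a check that runs on each monitoring pass. How the workload is measured is not specified in the source, so the sketch simply takes `current_load` as a number supplied by the caller; the function name, the queue, and the callback are likewise hypothetical.

```python
from collections import deque


def start_indexing_if_idle(current_load, preset_load, pending_files, index_fn):
    """Index pending files only when the measured load is below the preset load."""
    if current_load >= preset_load:
        return []  # too busy: leave the queue untouched
    indexed = []
    while pending_files:
        name = pending_files.popleft()
        index_fn(name)  # stands in for the word segmentation index module
        indexed.append(name)
    return indexed


pending = deque(["a.txt", "b.txt"])
processed = []
start_indexing_if_idle(0.9, 0.5, pending, processed.append)  # busy: nothing indexed
start_indexing_if_idle(0.2, 0.5, pending, processed.append)  # idle: queue drained
print(processed)  # ['a.txt', 'b.txt']
```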
To make it easy for the user to dynamically update the word library in the electronic dictionary, the file retriever in this embodiment may further include:
the dictionary updating module 16 is used for providing a dictionary updating interface for the user, receiving dictionary updating information input by the user through the dictionary updating interface, and updating the electronic dictionary by utilizing the dictionary updating information; the dictionary updating information comprises newly added word segmentation information, word segmentation modification information or word segmentation deletion information.
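The three kinds of dictionary updating information handled by the dictionary updating module 16 map naturally onto add, replace, and remove operations on the word set. The sketch below assumes the dictionary is a Python set and that each update arrives as a small dict; the keys `kind`, `word`, `old`, and `new` are invented for illustration.

```python
def apply_dictionary_update(dictionary, update):
    """Apply one piece of dictionary updating information to the word set."""
    kind = update["kind"]
    if kind == "add":        # newly added word segmentation information
        dictionary.add(update["word"])
    elif kind == "modify":   # word segmentation modification information
        dictionary.discard(update["old"])
        dictionary.add(update["new"])
    elif kind == "delete":   # word segmentation deletion information
        dictionary.discard(update["word"])
    else:
        raise ValueError(f"unknown update kind: {kind!r}")
    return dictionary


words = {"文件", "检索"}
apply_dictionary_update(words, {"kind": "add", "word": "索引"})
apply_dictionary_update(words, {"kind": "modify", "old": "检索", "new": "检索器"})
apply_dictionary_update(words, {"kind": "delete", "word": "文件"})
print(sorted(words))
```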
The embodiment of the invention also discloses a file retrieval method, which is shown in figure 3 and comprises the following steps:
step S31: monitoring a file system stored in a disk to find a newly added file appearing in the current file system;
step S32: performing corresponding word segmentation index processing on the newly added file according to a word segmentation database recorded in a preset electronic dictionary, and adding corresponding word segmentation index information to a preset index information list; the index information list is stored in a disk and comprises an index file list and an index word segmentation list; in the present embodiment, the word segmentation index processing extracts words from the content of the newly added file in reverse order, from back to front: characters that can form a phrase according to Chinese usage are processed as that phrase, and characters that cannot form a phrase are processed as single characters;
step S33: receiving a file search instruction input by a user terminal; wherein the file search instruction comprises corresponding search terms;
step S34: and screening index file information corresponding to the search terms from the index file list, and returning the index file information to the user terminal.
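Step S32 states that the index information list is stored on disk. One possible way to persist and reload the two lists is sketched below; the JSON format, the file layout, and the function names are assumptions, since the source does not describe the on-disk representation.

```python
import json
import os


def save_index(path, index_file_list, index_word_list):
    """Write both lists to `path`, replacing any previous copy atomically."""
    payload = {"index_file_list": index_file_list,
               "index_word_list": index_word_list}
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False)
    os.replace(tmp, path)  # swap in the new file only after it is fully written


def load_index(path):
    """Read both lists back; return two empty dicts if nothing is stored yet."""
    if not os.path.exists(path):
        return {}, {}
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    return payload["index_file_list"], payload["index_word_list"]
```

Loading the stored lists at start-up would give the cached copies that the search step reads from.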
In order to avoid performing the word segmentation indexing operation while the file system is in a high-load working state, the method in this embodiment may further include: monitoring the workload of the current file system in real time, generating a word segmentation index starting control instruction when the workload of the file system is smaller than a preset load, and using the word segmentation index starting control instruction to trigger word segmentation index processing on newly added files appearing in the file system.
To make it easy for the user to dynamically update the word library in the electronic dictionary, the method in this embodiment may further include: providing a dictionary updating interface for a user, receiving dictionary updating information input by the user through the dictionary updating interface, and updating the electronic dictionary by utilizing the dictionary updating information; the dictionary updating information comprises newly added word segmentation information, word segmentation modification information or word segmentation deletion information.
For more specific contents of the above steps, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
The embodiment of the invention therefore performs word segmentation index processing on newly added files appearing in the disk according to the word segmentation database recorded in the electronic dictionary. Because the electronic dictionary records a large number of commonly used everyday words, using its word segmentation database allows text to be segmented at a much finer granularity, which correspondingly increases how thoroughly each file is indexed and in turn improves file retrieval precision.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The file retriever and the file retrieval method provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and the implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method and the core idea of the invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.