Detailed Description
In order to make the objects, features and advantages of the present disclosure more apparent and understandable, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Referring to fig. 1, fig. 1 schematically shows a flow chart of a retrieval method according to an embodiment of the present disclosure. The method mainly includes operations S101 to S104.
S101, a retrieval instruction is obtained.
The retrieval instruction may be a voice instruction issued by the user to search for content. Specific contents of the retrieval instruction are, for example, "to send a red packet", "play Jay Chou's 'Sunny Day'", "give a gift from lai dupu", and the like.
S102, analyzing the search command to obtain the search category corresponding to the search command.
Specifically, the retrieval instruction is analyzed according to a preset neural network model, and the retrieval category corresponding to the retrieval instruction is obtained. The search category is, for example, "music category", "literature category", "science category", "gourmet category", "news category", "history category", "movie & tv series category", or the like. Other retrieval categories can be obtained by analogy according to the description of the embodiment by a person skilled in the art, and then each retrieval category can be configured in advance.
Taking the search instruction "play Jay Chou's 'Sunny Day'" as an example, the instruction may be analyzed by the preset neural network model, and the corresponding search category is determined to be "music category". It is understood that a search instruction may also correspond to more than one search category. For example, the search category corresponding to the search instruction "lisu" may be "literature category" or "music category". In this case, the search category of greater interest to the user may be determined according to the user's usage data, for example the play record, and the search instruction is assigned to that category.
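The disambiguation step described above can be sketched as follows. This is a minimal illustration only: the function name pick_category and the representation of the play record as a list of category labels are assumptions, not part of the disclosure.

```python
from collections import Counter

def pick_category(candidates, play_history):
    """Pick the candidate category the user has used most often.

    candidates: categories the classifier considers plausible, in its own order.
    play_history: category labels of the user's past plays (hypothetical format).
    """
    counts = Counter(play_history)
    # max() returns the first maximal item, so ties keep the classifier's order.
    return max(candidates, key=lambda category: counts[category])

chosen = pick_category(
    ["literature category", "music category"],
    ["music category", "news category", "music category"],
)
```

With no usage data at all, this sketch simply falls back to the first candidate.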
The neural network is an algorithmic mathematical model for distributed parallel information processing formed by widely interconnecting a large number of simple neurons, and has wide application prospects in the fields of system identification, pattern recognition, intelligent control and the like.
In the embodiment of the disclosure, a large number of retrieval instructions of known retrieval categories can be used for training the neural network model, and parameters of the neural network model are adjusted in the training process until the analysis accuracy of the neural network model meets the requirement, and the obtained neural network model is the preset neural network model. For example, when the analysis accuracy of the adjusted neural network model on the retrieval instruction reaches 99.99%, the adjusted neural network model is used as a preset neural network model.
S103, calling the search mode of the search category.
In the embodiment of the disclosure, each retrieval mode comprises at least one retrieval element, and each retrieval element corresponds to one matching mode.
A retrieval element is the type of retrieval content required when retrieval is performed using the retrieval mode to which the element belongs, and each retrieval element corresponds to a large number of specific retrieval contents. For example, the retrieval contents "Jay Chou", "Li Yuchun", "TFBOYS", etc. are all singer names, and the corresponding retrieval element may be defined as "singer name".
Taking the search category as "music category" as an example, the search elements of the search method include "song title", "singer name", "genre", and the like, for example. Those skilled in the art can obtain other search elements of the "music class" search category and obtain search elements of other search categories according to the description of the embodiment.
Further, a corresponding matching mode can be set for each search element according to how relevant that element is within the search category. The matching mode is, for example, "must", "should", "no_must", etc. A search element whose matching mode is "must" has to be matched completely; a search element whose matching mode is "should" has to reach a certain matching degree, for example 80%; and a search element whose matching mode is "no_must" may or may not be matched, that is, matching it is optional. Other matching modes can be obtained by those skilled in the art according to the description of the embodiment.
The search mode of the "music category" is illustrated by taking the search elements "song title", "singer name", and "genre" as an example. For example, the matching mode of the search element "song title" is "should", with a minimum matching degree of 80% and a corresponding score (i.e., weight) of 400; the matching mode of the search element "singer name" is "no_must", with a corresponding score of 100; and the matching mode of the search element "genre" is "must", with a corresponding score of 200. These three search elements and their matching modes together form the search mode of the "music category": a resource obtains 400 points when its matching degree with the song title reaches 80%, 200 points when its genre matches completely, and 100 points when its singer name matches, and the points obtained are added to give the final score. It is understood that the above example merely illustrates the search mode in the embodiments of the present disclosure, and those skilled in the art may obtain other customized search modes according to the description of the embodiments.
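The scoring in this example can be sketched as below. Several points are assumptions made for illustration: the dictionary layout of MUSIC_MODE, the use of difflib's similarity ratio as the "matching degree" (the disclosure does not define the metric), the treatment of a failed "must" element as excluding the resource, and the sample values such as "pop".

```python
from difflib import SequenceMatcher

# Hypothetical search mode for the "music category", mirroring the example above.
MUSIC_MODE = {
    "song title": {"match": "should", "min_ratio": 0.8, "score": 400},
    "singer name": {"match": "no_must", "score": 100},
    "genre": {"match": "must", "score": 200},
}

def similarity(a, b):
    # Stand-in for the "matching degree"; the real metric is not specified.
    return SequenceMatcher(None, a, b).ratio()

def score_resource(mode, query, resource):
    """Return the summed score, or None if a "must" element does not match."""
    total = 0
    for element, rule in mode.items():
        wanted = query.get(element, "*")
        actual = resource.get(element, "")
        if wanted == "*":
            continue  # element is not constrained by this query
        if rule["match"] == "must":
            if wanted != actual:
                return None  # assumption: a failed "must" excludes the resource
            total += rule["score"]
        elif rule["match"] == "should":
            if similarity(wanted, actual) >= rule["min_ratio"]:
                total += rule["score"]
        elif rule["match"] == "no_must":
            if wanted == actual:  # optional element: scores only when it matches
                total += rule["score"]
    return total

query = {"song title": "Sunny Day", "singer name": "Jay Chou", "genre": "pop"}
full_match = score_resource(MUSIC_MODE, query, dict(query))  # 400 + 100 + 200
```

A resource that differs only in singer name would score 600 under these rules, since the optional element contributes nothing.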
In the embodiment of the present disclosure, a corresponding relationship between each retrieval type and a retrieval method is preset, where each retrieval type corresponds to at least one retrieval method, and each retrieval method corresponds to one retrieval type. It is understood that one search category may correspond to a plurality of search methods, and different search elements are emphasized among the plurality of search methods. The search elements included in each search method may be different, and the matching methods corresponding to the same search element may also be different, so that the search is performed with emphasis from different sides.
In addition, in the embodiment of the disclosure, when a retrieval mode is called, it is checked whether the retrieval mode is in JSON format. When the retrieval mode is not in JSON format, the call is rejected and a corresponding feedback result is output, so that maintenance personnel can correct and re-upload the retrieval mode according to the feedback result.
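A format check of this kind might look like the following sketch, which uses Python's standard json module. The function name load_retrieval_mode, the wording of the feedback messages, and the extra requirement that the top-level value be a JSON object are assumptions for illustration.

```python
import json

def load_retrieval_mode(text):
    """Return (mode, None) if text is a JSON object, else (None, feedback)."""
    try:
        mode = json.loads(text)
    except json.JSONDecodeError as err:
        feedback = (
            f"rejected: invalid JSON at line {err.lineno}, "
            f"column {err.colno}: {err.msg}"
        )
        return None, feedback
    if not isinstance(mode, dict):
        # Assumption: a retrieval mode is expected to be a JSON object.
        return None, "rejected: retrieval mode must be a JSON object"
    return mode, None

good_mode, good_feedback = load_retrieval_mode('{"genre": {"match": "must"}}')
bad_mode, bad_feedback = load_retrieval_mode('{"genre": ')
```

The feedback string carries the position of the error so that the maintainer can locate and correct it before uploading again.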
S104, retrieval is performed according to the retrieval mode.
In the disclosed embodiment, operation S104 may include operations S104A through S104B.
S104A, analyzing the searching command to obtain one or more searching fields, each corresponding to one searching element.
A search field is specific content obtained from the search instruction, for example "Jay Chou", "Sunny Day", "Li Bai", "rock", and the like. The search field "Jay Chou" corresponds, for example, to the search element "singer name", the search field "Sunny Day" corresponds to the search element "song title", and so on. The contents of other search fields can be obtained by those skilled in the art according to the description of the present embodiment.
In the embodiment of the present disclosure, a resource list corresponding to each retrieval category is pre-established, where the resource list includes the at least one retrieval element and includes at least one retrieval field corresponding to each retrieval element, and the resource list is stored in a partitioned manner.
Fig. 2 schematically shows a resource list stored in a retrieval method according to an embodiment of the present disclosure. Referring to fig. 2, each search category is stored with its own resource list. Taking the "music category" as an example, its resource list includes the search elements "song title", "singer name", "genre", "lyrics", etc., and each search element includes a large number of search fields; for example, the search element "singer name" includes the search fields "Jay Chou", "TFBOYS", "Janis Joplin", etc. Search fields can be added to the resource list in real time. Because the resource list is established per search category, later maintenance is easier, and no huge index needs to be built when search instructions are indexed, which improves retrieval efficiency.
In the embodiment of the present disclosure, the resource list may be stored in shards on an Elasticsearch (ES for short) server. ES is a Lucene-based search server that provides a distributed, multi-user full-text search engine. The resource list can be stored and scheduled in shards across the distributed ES servers. Since an ES cluster is a distributed search engine, the portions of the resource list distributed over different nodes of the cluster are shards, and the size and number of the shards affect search performance overhead. In the embodiment of the present disclosure, the sharding principle is, for example: keeping the number of shards on each node below 20 to 25 per GB of configured heap memory; or, where the data in the resource list is expected to grow quickly, limiting the shard capacity first and then the number of shards. For example, if the resource list is estimated to reach 100 GB, then for an ES cluster with a maximum heap memory of 32 GB the maximum capacity of a shard can be limited to 30 GB, so it is reasonable to configure 4 to 5 shards for the 100 GB resource list.
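The arithmetic behind this sizing example can be checked with a short sketch. The helper names are hypothetical, and the default of 20 shards per GB of heap is simply the lower end of the range quoted above.

```python
import math

def shard_count(estimated_size_gb, max_shard_gb):
    """Minimum number of shards so that no shard exceeds max_shard_gb."""
    return math.ceil(estimated_size_gb / max_shard_gb)

def max_shards_per_node(heap_gb, shards_per_gb_heap=20):
    """Upper bound on shards per node under the per-GB-of-heap rule of thumb."""
    return heap_gb * shards_per_gb_heap

needed = shard_count(100, 30)         # 100 GB list, 30 GB cap per shard
node_limit = max_shards_per_node(32)  # 32 GB heap
```

Here needed evaluates to 4, the lower end of the 4 to 5 shards mentioned above; a fifth shard leaves headroom for growth.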
In the embodiment of the disclosure, a synonym list is also pre-established, and a large number of fields with the same entity meaning are stored in the synonym list. Synonyms here are words that have the same entity meaning, i.e., words that point to the same entity, such as "Li Bai", "Li Taibai", "the Poet Immortal", "Qinglian Jushi", and the like.
Further, operation S104A includes: extracting one or more keywords from the retrieval instruction, querying the synonyms of each keyword, and indexing the resource list corresponding to the retrieval category according to each keyword and its synonyms, to obtain the search field corresponding to each keyword.
Specifically, first, one or more keywords are extracted from the retrieval instruction, for example using the index mode of jieba word segmentation. Keywords are, for example, fields in the retrieval instruction that have an entity meaning. Taking the retrieval instruction "want to listen to the national anthem of the People's Republic of China" as an example, the extracted keywords are, for example, {"China", "the People's Republic of China", "Chinese", "people", "republic", "national anthem"}.
Secondly, the synonym list is queried according to the extracted keywords to obtain the synonyms corresponding to each keyword; for example, the synonym corresponding to "the People's Republic of China" is "China", the synonym corresponding to "Chinese" is "China", and the synonym corresponding to "national anthem" is "March of the Volunteers".
Then, for example using the search mode of jieba word segmentation, the resource list corresponding to the retrieval category is queried according to each keyword and its synonyms, yielding the search field "March of the Volunteers", whose corresponding search element is "song title". It is understood that, since only the search elements that influence the retrieval result are stored in the resource list, not every field in the retrieval instruction corresponds to a search element.
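The synonym lookup and resource list query of operation S104A can be sketched as follows, using the national anthem example (the anthem's title is rendered here as "March of the Volunteers"). The keyword extraction step is assumed to have already been done by a word segmenter; the dictionaries and the function name resolve_search_fields are hypothetical stand-ins for the synonym list and the per-category resource list.

```python
# Hypothetical synonym list: keyword -> words that denote the same entity.
SYNONYMS = {
    "national anthem": ["March of the Volunteers"],
    "People's Republic of China": ["China"],
}

# Hypothetical resource list for the "music category":
# search element -> known search fields.
MUSIC_RESOURCES = {
    "song title": {"March of the Volunteers", "Sunny Day"},
    "singer name": {"Jay Chou", "TFBOYS"},
}

def resolve_search_fields(keywords, synonyms, resource_list):
    """Map extracted keywords to {search element: search field}."""
    fields = {}
    for keyword in keywords:
        # Try the keyword itself first, then each of its synonyms.
        for candidate in [keyword, *synonyms.get(keyword, [])]:
            for element, known_fields in resource_list.items():
                if candidate in known_fields:
                    # Keep the first field found for each element.
                    fields.setdefault(element, candidate)
    return fields

keywords = ["People's Republic of China", "people", "republic", "national anthem"]
fields = resolve_search_fields(keywords, SYNONYMS, MUSIC_RESOURCES)
```

Keywords such as "people" and "republic" match nothing in the resource list and are dropped, which reflects the point that not every field in the instruction corresponds to a search element.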
In the embodiment of the present disclosure, operation S104A is not performed on every retrieval instruction; that is, some retrieval instructions are not split into fields. For example, an instruction with a definite resource type, such as "play the movie 'Tomb Stealing Notes'", is retrieved directly without splitting, which saves storage space and improves retrieval performance.
S104B, searching the one or more search fields according to the matching method of the search elements corresponding to the one or more search fields.
Taking the search fields obtained in operation S104A as "March of the Volunteers" and "various artists" as an example, the attribute corresponding to the search element "song title" in the search mode is replaced by "March of the Volunteers", the attribute corresponding to the search element "singer name" is replaced by "various artists", and the attributes corresponding to the other search elements are replaced by "*", where "*" represents any match. "March of the Volunteers" and "various artists" are then searched in combination according to their corresponding matching modes, that is, a composite search is performed over the resources corresponding to the search category. Further, a score can be obtained for each resource; the higher the score, the higher the overall matching degree between the resource and the search fields, and the several resources with the highest scores can be output in order.
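Filling the search mode with the search fields and ranking the scored resources can be sketched as below, assuming that "*" is the placeholder meaning "match anything". The function names, the sample field values, and the sample scores are hypothetical.

```python
def build_query(mode_elements, search_fields):
    """Fill each search element with its search field, or "*" if none was found."""
    return {element: search_fields.get(element, "*") for element in mode_elements}

def top_resources(scored, n):
    """Return the n highest-scoring (resource, score) pairs, best first."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

query = build_query(
    ["song title", "singer name", "genre"],
    {"song title": "March of the Volunteers", "singer name": "various artists"},
)
ranked = top_resources([("res-1", 400), ("res-2", 700), ("res-3", 500)], 2)
```

In this sketch the "genre" element receives "*" because no search field was found for it, so it places no constraint on the composite search.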
In this embodiment of the disclosure, when the retrieval category corresponding to the retrieval instruction has more than one retrieval mode, operation S104B specifically includes: performing retrieval with the retrieval modes one after another, for example in a preset retrieval order; and, when retrieval with one of the retrieval modes yields a target file whose matching score against the retrieval instruction is greater than a threshold, executing the target file. The threshold, for example 500, is used to ensure the accuracy of the target file.
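The sequential fallback across retrieval modes can be sketched as follows. The callable run_mode, which is assumed to return a (target, score) pair or None for a given mode, and the sample mode names are hypothetical; the threshold of 500 is the example value given above.

```python
def retrieve_in_order(modes, run_mode, threshold=500):
    """Try retrieval modes in their preset order; stop at the first good hit.

    run_mode(mode) is a hypothetical callable returning (target, score) or None.
    """
    for mode in modes:
        result = run_mode(mode)
        if result is not None and result[1] > threshold:
            return result[0]  # this target would then be executed
    return None  # no mode produced a sufficiently confident target

# Canned results standing in for real retrieval, keyed by mode name.
fake_results = {"strict": ("song-A", 300), "loose": ("song-B", 650), "empty": None}
target = retrieve_in_order(["strict", "loose"], fake_results.get)
```

Stopping at the first mode that clears the threshold means later, looser modes are only consulted when the earlier ones fail to produce a confident match.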
In addition, the retrieval method in the embodiment can use Log4j2 for logging, which offers advantages such as high speed, asynchronous writing, and good compression. With asynchronous writing, performance increases linearly, so the logging system remains smooth under frequent retrieval and highly concurrent access.
In the embodiment of the disclosure, a retrieval instruction is acquired, the retrieval instruction is analyzed to obtain the retrieval category corresponding to the retrieval instruction, the retrieval mode of the retrieval category is called, and retrieval is performed according to the retrieval mode. By customizing a retrieval mode for each retrieval category and calling the retrieval mode of the corresponding category for each retrieval instruction, retrieval accuracy and extensibility are improved, as is the user experience. In addition, a corresponding resource list is set for each retrieval category and a synonym list is maintained, which improves the performance of instruction queries and further improves retrieval accuracy.
Referring to fig. 3, fig. 3 schematically shows a structural diagram of a retrieval system according to an embodiment of the present disclosure. The system mainly comprises an acquisition module 301, an analysis module 302, a calling module 303 and a retrieval module 304.
The acquisition module 301 is configured to obtain a retrieval instruction.
The retrieval instruction may be a voice instruction issued by the user to search for content. Specific contents of the retrieval instruction are, for example, "to send a red packet", "play Jay Chou's 'Sunny Day'", "give a gift from lai dupu", and the like.
The analysis module 302 is configured to analyze the search instruction to obtain a search category corresponding to the search instruction.
Specifically, the retrieval instruction is analyzed according to a preset neural network model, and the retrieval category corresponding to the retrieval instruction is obtained. The search category is, for example, "music category", "literature category", "science category", "gourmet category", "news category", "history category", "movie & tv series category", or the like. Other retrieval categories can be obtained by analogy according to the description of the embodiment by a person skilled in the art, and then each retrieval category can be configured in advance.
Taking the search instruction "play Jay Chou's 'Sunny Day'" as an example, the instruction may be analyzed by the preset neural network model, and the corresponding search category is determined to be "music category". It is understood that a search instruction may also correspond to more than one search category. For example, the search category corresponding to the search instruction "lisu" may be "literature category" or "music category". In this case, the search category of greater interest to the user may be determined according to the user's usage data, for example the play record, and the search instruction is assigned to that category.
In the embodiment of the disclosure, a large number of retrieval instructions of known retrieval categories can be used for training the neural network model, and parameters of the neural network model are adjusted in the training process until the analysis accuracy of the neural network model meets the requirement, and the obtained neural network model is the preset neural network model. For example, when the analysis accuracy of the adjusted neural network model on the retrieval instruction reaches 99.99%, the adjusted neural network model is used as a preset neural network model.
The calling module 303 is configured to call a search mode of the search category.
In the embodiment of the disclosure, each retrieval mode comprises at least one retrieval element, and each retrieval element corresponds to one matching mode.
A retrieval element is the type of retrieval content required when retrieval is performed using the retrieval mode to which the element belongs, and each retrieval element corresponds to a large number of specific retrieval contents. For example, the retrieval contents "Jay Chou", "Li Yuchun", "TFBOYS", etc. are all singer names, and the corresponding retrieval element may be defined as "singer name".
Taking the search category as "music category" as an example, the search elements of the search method include "song title", "singer name", "genre", and the like, for example. Those skilled in the art can obtain other search elements of the "music class" search category and obtain search elements of other search categories according to the description of the embodiment.
Further, a corresponding matching mode can be set for each search element according to how relevant that element is within the search category. The matching mode is, for example, "must", "should", "no_must", etc. A search element whose matching mode is "must" has to be matched completely; a search element whose matching mode is "should" has to reach a certain matching degree, for example 80%; and a search element whose matching mode is "no_must" may or may not be matched, that is, matching it is optional. Other matching modes can be obtained by those skilled in the art according to the description of the embodiment.
The search mode of the "music category" is illustrated by taking the search elements "song title", "singer name", and "genre" as an example. For example, the matching mode of the search element "song title" is "should", with a minimum matching degree of 80% and a corresponding score (i.e., weight) of 400; the matching mode of the search element "singer name" is "no_must", with a corresponding score of 100; and the matching mode of the search element "genre" is "must", with a corresponding score of 200. These three search elements and their matching modes together form the search mode of the "music category": a resource obtains 400 points when its matching degree with the song title reaches 80%, 200 points when its genre matches completely, and 100 points when its singer name matches, and the points obtained are added to give the final score. It is understood that the above example merely illustrates the search mode in the embodiments of the present disclosure, and those skilled in the art may obtain other customized search modes according to the description of the embodiments.
In the embodiment of the present disclosure, the retrieval system further includes a setting module, configured to preset a corresponding relationship between each retrieval category and a retrieval manner, where each retrieval category corresponds to at least one retrieval manner, and each retrieval manner corresponds to one retrieval category. It is understood that one search category may correspond to a plurality of search methods, and different search elements are emphasized among the plurality of search methods. The search elements included in each search method may be different, and the matching methods corresponding to the same search element may also be different, so that the search is performed with emphasis from different sides.
The retrieval module 304 is configured to perform retrieval according to the retrieval mode.
In the embodiment of the disclosure, the retrieval system further comprises a parsing module for parsing the retrieval instruction to obtain one or more search fields, each of which corresponds to one search element.
A search field is specific content obtained from the search instruction, for example "Jay Chou", "Sunny Day", "Li Bai", "rock", and the like. The search field "Jay Chou" corresponds, for example, to the search element "singer name", the search field "Sunny Day" corresponds to the search element "song title", and so on. The contents of other search fields can be obtained by those skilled in the art according to the description of the present embodiment.
In the embodiment of the present disclosure, the retrieval system further includes an establishing module and a storage module, the establishing module is configured to establish in advance a resource list corresponding to each retrieval category, the resource list includes the at least one retrieval element and includes at least one retrieval field corresponding to each retrieval element, and the storage module is configured to perform fragmentation storage on the resource list. The storage module stores a resource storage list as shown in fig. 2, for example.
In the embodiment of the present disclosure, the storage module may store the resource list in shards on an Elasticsearch (ES for short) server. ES is a Lucene-based search server that provides a distributed, multi-user full-text search engine. The resource list can be stored and scheduled in shards across the distributed ES servers. Since an ES cluster is a distributed search engine, the portions of the resource list distributed over different nodes of the cluster are shards, and the size and number of the shards affect search performance overhead.
In the embodiment of the present disclosure, the establishing module is further configured to establish a synonym list in advance, where the synonym list stores a large number of fields with the same entity meaning. Synonyms here are words that have the same entity meaning, i.e., words that point to the same entity, such as "Li Bai", "Li Taibai", "the Poet Immortal", "Qinglian Jushi", and the like.
Further, the parsing module comprises an extraction sub-module, a query sub-module and an indexing sub-module.
The extraction sub-module is used for extracting one or more keywords from the retrieval instruction. Specifically, the extraction sub-module extracts the keywords by using, for example, the index mode of jieba word segmentation. Keywords are, for example, fields in the retrieval instruction that have an entity meaning. Taking the retrieval instruction "want to listen to the national anthem of the People's Republic of China" as an example, the extracted keywords are, for example, {"China", "the People's Republic of China", "Chinese", "people", "republic", "national anthem"}.
The query sub-module is used for querying the synonyms of each keyword. Specifically, the query sub-module queries the synonym list according to the extracted keywords to obtain the synonyms corresponding to each keyword; for example, the synonym corresponding to "the People's Republic of China" is "China", the synonym corresponding to "Chinese" is "China", and the synonym corresponding to "national anthem" is "March of the Volunteers".
The indexing sub-module is used for indexing the resource list corresponding to the retrieval category according to each keyword and its synonyms, to obtain the search field corresponding to each keyword. Specifically, the indexing sub-module queries the resource list corresponding to the retrieval category according to each keyword and its synonyms, for example using the search mode of jieba word segmentation, to obtain the search field "March of the Volunteers", whose corresponding search element is "song title". It is understood that, since only the search elements that influence the retrieval result are stored in the resource list, not every field in the retrieval instruction corresponds to a search element.
In the embodiment of the present disclosure, the parsing module is not executed for every retrieval instruction; that is, some retrieval instructions are not split into fields. For example, an instruction with a definite resource type, such as "play the movie 'Tomb Stealing Notes'", is retrieved directly without splitting, which saves storage space and improves retrieval performance.
In the embodiment of the present disclosure, the retrieval module 304 is further configured to search the one or more search fields according to the matching modes of the search elements corresponding to the one or more search fields.
Taking the search fields obtained by the parsing module as "March of the Volunteers" and "various artists" as an example, the attribute corresponding to the search element "song title" in the search mode is replaced by "March of the Volunteers", the attribute corresponding to the search element "singer name" is replaced by "various artists", and the attributes corresponding to the other search elements are replaced by "*", where "*" represents any match. "March of the Volunteers" and "various artists" are then searched in combination according to their corresponding matching modes, that is, a composite search is performed over the resources corresponding to the search category. Further, a score can be obtained for each resource; the higher the score, the higher the overall matching degree between the resource and the search fields, and the several resources with the highest scores can be output in order.
In the embodiment of the present disclosure, when the retrieval category corresponding to the retrieval instruction has more than one retrieval mode, the retrieval module 304 is further configured to perform retrieval with the retrieval modes one after another, for example in a preset retrieval order; and, when retrieval with one of the retrieval modes yields a target file whose matching score against the retrieval instruction is greater than a threshold, to execute the target file. The threshold, for example 500, is used to ensure the accuracy of the target file.
In the embodiment of the present disclosure, the acquisition module 301 obtains a retrieval instruction, the analysis module 302 analyzes the retrieval instruction to obtain the retrieval category corresponding to the retrieval instruction, the calling module 303 calls the retrieval mode of the retrieval category, and the retrieval module 304 performs retrieval according to the retrieval mode. By customizing a retrieval mode for each retrieval category and calling the retrieval mode of the corresponding category for each retrieval instruction, retrieval accuracy and extensibility are improved, as is the user experience. In addition, a corresponding resource list is set for each retrieval category and a synonym list is maintained, which improves the performance of instruction queries and further improves retrieval accuracy.
Referring to fig. 4, fig. 4 shows a hardware configuration diagram of an electronic device.
The electronic device described in this embodiment includes:
a memory 41, a processor 42 and a computer program stored on the memory 41 and executable on the processor 42, the processor 42 implementing the retrieval method described in the foregoing embodiment shown in fig. 1 when executing the program.
Further, the electronic device further includes:
at least one input device 43; at least one output device 44.
The memory 41, the processor 42, the input device 43 and the output device 44 are connected by a bus 45.
The input device 43 may be a camera, a touch panel, a physical button, or a mouse. The output device 44 may specifically be a display screen.
The memory 41 may be a high-speed Random Access Memory (RAM) or a non-volatile memory, such as a magnetic disk memory. The memory 41 is used for storing a set of executable program code, and the processor 42 is coupled to the memory 41.
Further, an embodiment of the present disclosure also provides a computer-readable storage medium, where the computer-readable storage medium may be provided in the terminal in the foregoing embodiments, and the computer-readable storage medium may be the memory in the foregoing embodiment shown in fig. 4. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the retrieval method described in the foregoing embodiment shown in fig. 1. Further, the computer-readable storage medium may be various media that can store program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and an actual implementation may have another division, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication link may be through some interfaces, and the indirect coupling or communication link of the modules may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
It is noted that, for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combinations of acts, but those skilled in the art will appreciate that the present disclosure is not limited by the described order of acts, as some steps may, in accordance with the present disclosure, be performed in other orders or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules involved are not necessarily required by the present disclosure.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In summary, those skilled in the art may, following the concepts of the embodiments of the present disclosure, make changes to the specific implementations and the scope of application; accordingly, the content of this specification should not be construed as limiting the present disclosure.