Summary of the invention
Embodiments of the present invention provide an original content statement method and device based on similarity detection. By performing similarity detection on original content before the original statement is made, the publication of non-original content can be prevented in advance and infringement can be avoided.
A first aspect of the embodiments of the present invention provides an original content statement method based on similarity detection, which may include:
When an original statement request is obtained, splitting the original content into different original short sentences according to the separators in the original content corresponding to the original statement request;
Performing similarity matching between the original short sentences and the short sentences corresponding to published original content in the platform by using a similarity algorithm, to obtain a similarity value;
When the similarity value is less than or equal to a minimum similarity threshold, responding to the original statement request and publishing the original content.
Further, performing similarity matching between the original short sentences and the short sentences corresponding to published original content in the platform by using a similarity algorithm, to obtain a similarity value, includes:
Extracting, based on grammatical expression, a first keyword from the original short sentence and a second keyword from the short sentence corresponding to the published original content;
Calculating the similarity value between the first keyword and the second keyword by using the similarity algorithm.
Further, the above method also includes:
When the similarity value is greater than or equal to a maximum similarity threshold, stopping the response to the original content statement request.
Further, the above method also includes:
When the similarity value is greater than the minimum similarity threshold and less than the maximum similarity threshold, matching an asynchronous audit mode to the original content corresponding to the original content statement request.
Further, the above method also includes:
After the response to the original content statement request is stopped, outputting infringement warning information.
A second aspect of the embodiments of the present invention provides an original content statement device based on similarity detection, which may include:
An original content separating module, configured to, when an original statement request is obtained, split the original content into different original short sentences according to the separators in the original content corresponding to the original statement request;
A similarity value computing module, configured to perform similarity matching between the original short sentences and the short sentences corresponding to published original content in the platform by using a similarity algorithm, to obtain a similarity value;
A statement request response module, configured to, when the similarity value is less than or equal to a minimum similarity threshold, respond to the original statement request and publish the original content.
Further, the above similarity value computing module includes:
A keyword extracting unit, configured to extract, based on grammatical expression, a first keyword from the original short sentence and a second keyword from the short sentence corresponding to the published original content;
A similarity value computing unit, configured to calculate the similarity value between the first keyword and the second keyword by using the similarity algorithm.
Further, the above device also includes:
A request response stopping module, configured to stop the response to the original content statement request when the similarity value is greater than or equal to a maximum similarity threshold.
Further, the above device also includes:
An audit mode matching module, configured to match an asynchronous audit mode to the original content corresponding to the original content statement request when the similarity value is greater than the minimum similarity threshold and less than the maximum similarity threshold.
Further, the above device also includes:
A warning information output module, configured to output infringement warning information after the response to the original content statement request is stopped.
In the embodiments of the present invention, similarity detection is performed on original content before the original statement is made, and when the similarity value is less than or equal to the minimum similarity threshold, the original statement request is responded to and the original content is published. The publication of non-original content is thereby prevented in advance, and infringement is avoided.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention.
The original content statement method based on similarity detection provided by the embodiments of the present invention can be applied to scenarios that protect original content in Chinese-language community platforms.
In the embodiments of the present invention, the original content statement device based on similarity detection may be a terminal device such as a smartphone or a tablet computer.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of this application described herein can be implemented. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
It should be noted that, in the absence of conflict, the embodiments of this application and the features in the embodiments may be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
The original content statement method based on similarity detection provided by the embodiments of the present invention is described in detail below with reference to Fig. 1.
Refer to Fig. 1, which is a schematic flowchart of an original content statement method based on similarity detection provided by an embodiment of the present invention. As shown in Fig. 1, the method of this embodiment may include the following steps S101 to S103.
S101: When an original statement request is obtained, split the original content into different original short sentences according to the separators in the original content corresponding to the original statement request.
Specifically, when the device obtains an original statement request, it can split the original content corresponding to the request into different original short sentences according to the separators in that content. It can be understood that a separator may be any symbol that divides the text of the original content, such as a paragraph mark, comma, semicolon, or full stop. After splitting, an original short sentence of the original content may be a word, a sentence, a paragraph, or the like.
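The splitting in step S101 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `split_into_short_sentences` and the exact separator set are assumptions; the text only names paragraph marks, commas, semicolons, and full stops as examples.

```python
import re
from typing import List

# Assumed separator set: newline (paragraph break), comma, semicolon, and full stop,
# in both ASCII and full-width Chinese forms. The description lists these symbol
# types only as examples, so a real system may use a different set.
SEPARATORS = r"[\n,;.,;。]+"


def split_into_short_sentences(content: str) -> List[str]:
    """Split content at separators and drop empty or whitespace-only fragments."""
    parts = re.split(SEPARATORS, content)
    return [part.strip() for part in parts if part.strip()]
```

Note that treating every full stop as a separator also splits decimals and abbreviations; the description does not say how such cases are handled.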
S102: Perform similarity matching between the original short sentences and the short sentences corresponding to published original content in the platform by using a similarity algorithm, to obtain a similarity value.
It can be understood that the device may also split the original content already published in the platform using the same separator-based splitting method, to obtain the short sentences corresponding to the published original content.
Further, the device can use a similarity algorithm to perform similarity matching between the original short sentences and the short sentences corresponding to the published original content, to obtain a similarity value. It can be understood that the similarity value can be used to assess the degree of similarity between two pieces of original content. The similarity algorithm may be Euclidean distance or cosine similarity.
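As a rough sketch of the cosine similarity option, the example below compares two short sentences. The description does not say how a short sentence is turned into a vector or how per-sentence scores are combined, so two choices here are assumptions made only for illustration: character-bigram counts as the vector, and the maximum pairwise score as the overall similarity value. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def char_bigrams(text: str) -> Counter:
    """Count overlapping two-character substrings (an assumed feature choice)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the bigram count vectors of two short sentences."""
    va, vb = char_bigrams(a), char_bigrams(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        # Strings shorter than two characters have no bigrams.
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity(short_sentences: Iterable[str], published: Iterable[str]) -> float:
    """Highest pairwise score between new and published short sentences (assumed aggregation)."""
    published = list(published)
    return max(
        (cosine_similarity(s, p) for s in short_sentences for p in published),
        default=0.0,
    )
```

With this convention a higher value means more similar text, which matches the way the thresholds in the method are used.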
In an optional embodiment, the device can extract, based on grammatical expression, a first keyword from the original short sentence and a second keyword from the short sentence corresponding to the published original content. It can be understood that extraction based on grammatical expression may be the extraction of subject, predicate, and object keywords from a short sentence. Further, the similarity value between the first keyword and the second keyword can be calculated using the above similarity algorithm.
In an optional embodiment, the device can perform similarity detection using a machine learning detection algorithm, training on samples of features such as keywords and short sentences to improve the recognition rate of the detection algorithm.
S103: When the similarity value is less than or equal to the minimum similarity threshold, respond to the original statement request and publish the original content.
Specifically, when the similarity value is less than or equal to the minimum similarity threshold, the original content corresponding to the original statement request can be considered free of any suspicion of plagiarism or infringement and to be the author's own original work, so the original statement request can be responded to and the original content published.
In an optional embodiment, when the similarity value is greater than or equal to the maximum similarity threshold, the original content corresponding to the original statement request can be considered plagiarized and the author's work an act of infringement, so the response to the original content statement request can be stopped. This forbids the publication of non-original content and protects genuinely original content from being plagiarized. Optionally, after the response to the request is stopped, infringement warning information can be output.
In an optional embodiment, the device can set up a credit value account for each user. Each time the user publishes a piece of original content, the credit value in the account can be increased; if published content is found to involve plagiarism, the credit value in the account is reduced. It can be understood that, after stopping the response to the above original statement request, the device can reduce the credit value in the credit account corresponding to that request.
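A per-user credit value account could look like the sketch below. It assumes the reading that credit increases with each published original work and decreases when plagiarism is found. The class name, method names, and the step sizes (1 and 5) are hypothetical; the description gives no concrete amounts.

```python
from dataclasses import dataclass


@dataclass
class CreditAccount:
    """Hypothetical credit value account kept for one user."""

    user_id: str
    credit: int = 0

    def reward_publication(self, amount: int = 1) -> None:
        """Increase credit after an original work is published (amount is assumed)."""
        self.credit += amount

    def penalize_plagiarism(self, amount: int = 5) -> None:
        """Decrease credit when a statement request is refused as plagiarism (amount is assumed)."""
        self.credit -= amount
```

For example, an account that publishes two original works and then has one request refused would end at 1 + 1 - 5 = -3 under these assumed step sizes.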
In an optional embodiment, when the similarity value is greater than the minimum similarity threshold and less than the maximum similarity threshold, the original content corresponding to the request can be considered possibly plagiarized. To further determine whether plagiarism has really occurred, the device can match an asynchronous audit mode to the original content corresponding to the original content statement request, so that the content is audited manually, which increases the accuracy of similarity detection.
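The three outcomes described for step S103 and its optional embodiments can be summarized in one decision function. This is only a sketch: the enum, the function name `decide`, and the threshold values 0.3 and 0.8 are hypothetical, since the description does not give numeric thresholds.

```python
from enum import Enum


class Decision(Enum):
    PUBLISH = "publish"          # respond to the request and publish the content
    ASYNC_AUDIT = "async_audit"  # hand the content over to asynchronous manual audit
    REJECT = "reject"            # stop responding; a warning may then be output


# Hypothetical threshold values chosen only for illustration.
MIN_THRESHOLD = 0.3
MAX_THRESHOLD = 0.8


def decide(similarity_value: float,
           min_threshold: float = MIN_THRESHOLD,
           max_threshold: float = MAX_THRESHOLD) -> Decision:
    """Map a similarity value to one of the three outcomes in the description."""
    if similarity_value <= min_threshold:
        return Decision.PUBLISH
    if similarity_value >= max_threshold:
        return Decision.REJECT
    return Decision.ASYNC_AUDIT
```

The boundary handling follows the text: a value equal to the minimum threshold is published, a value equal to the maximum threshold is refused, and only values strictly between the two go to manual audit.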
In the embodiments of the present invention, similarity detection is performed on original content before the original statement is made, and when the similarity value is less than or equal to the minimum similarity threshold, the original statement request is responded to and the original content is published. The publication of non-original content is thereby prevented in advance, and infringement is avoided.
It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer device, for example as a set of computer-executable instructions. Although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one given here.
The original content statement device based on similarity detection provided by the embodiments of the present invention is described in detail below with reference to Fig. 2 and Fig. 3. It should be noted that the device shown in Fig. 2 and Fig. 3 is used to execute the method of the embodiment shown in Fig. 1. For ease of description, only the parts relevant to the embodiments of the present invention are shown; for specific technical details that are not disclosed here, refer to the embodiment shown in Fig. 1.
Refer to Fig. 2, which is a schematic structural diagram of an original content statement device based on similarity detection provided by an embodiment of the present invention. As shown in Fig. 2, the original content statement device 10 of this embodiment may include: an original content separating module 101, a similarity value computing module 102, a statement request response module 103, a request response stopping module 104, an audit mode matching module 105, and a warning information output module 106. As shown in Fig. 3, the similarity value computing module 102 may include a keyword extracting unit 1021 and a similarity value computing unit 1022.
The original content separating module 101 is configured to, when an original statement request is obtained, split the original content into different original short sentences according to the separators in the original content corresponding to the original statement request.
In a specific implementation, when an original statement request is obtained, the original content separating module 101 can split the original content corresponding to the request into different original short sentences according to the separators in that content. It can be understood that a separator may be any symbol that divides the text of the original content, such as a paragraph mark, comma, semicolon, or full stop. After splitting, an original short sentence of the original content may be a word, a sentence, a paragraph, or the like.
The similarity value computing module 102 is configured to perform similarity matching between the original short sentences and the short sentences corresponding to published original content in the platform by using a similarity algorithm, to obtain a similarity value.
It can be understood that the original content separating module 101 may also split the original content already published in the platform using the same separator-based splitting method, to obtain the short sentences corresponding to the published original content.
Further, the similarity value computing module 102 can use a similarity algorithm to perform similarity matching between the original short sentences and the short sentences corresponding to the published original content, to obtain a similarity value. It can be understood that the similarity value can be used to assess the degree of similarity between two pieces of original content. The similarity algorithm may be Euclidean distance or cosine similarity.
In an optional embodiment, the keyword extracting unit 1021 can extract, based on grammatical expression, a first keyword from the original short sentence and a second keyword from the short sentence corresponding to the published original content. It can be understood that extraction based on grammatical expression may be the extraction of subject, predicate, and object keywords from a short sentence. Further, the similarity value computing unit 1022 can calculate the similarity value between the first keyword and the second keyword using the above similarity algorithm.
In an optional embodiment, the device 10 can perform similarity detection using a machine learning detection algorithm, training on samples of features such as keywords and short sentences to improve the recognition rate of the detection algorithm.
The statement request response module 103 is configured to, when the similarity value is less than or equal to the minimum similarity threshold, respond to the original statement request and publish the original content.
In a specific implementation, when the similarity value is less than or equal to the minimum similarity threshold, the original content corresponding to the original statement request can be considered free of any suspicion of plagiarism or infringement and to be the author's own original work, so the statement request response module 103 can respond to the original statement request and publish the original content.
In an optional embodiment, when the similarity value is greater than or equal to the maximum similarity threshold, the original content corresponding to the original statement request can be considered plagiarized and the author's work an act of infringement, so the request response stopping module 104 can stop the response to the original content statement request. This forbids the publication of non-original content and protects genuinely original content from being plagiarized. Optionally, after the response to the request is stopped, the warning information output module 106 can output infringement warning information.
In an optional embodiment, the device 10 can set up a credit value account for each user. Each time the user publishes a piece of original content, the credit value in the account can be increased; if published content is found to involve plagiarism, the credit value in the account is reduced. It can be understood that, after stopping the response to the above original statement request, the device 10 can reduce the credit value in the credit account corresponding to that request.
In an optional embodiment, when the similarity value is greater than the minimum similarity threshold and less than the maximum similarity threshold, the original content corresponding to the request can be considered possibly plagiarized. To further determine whether plagiarism has really occurred, the audit mode matching module 105 can match an asynchronous audit mode to the original content corresponding to the original content statement request, so that the content is audited manually, which increases the accuracy of similarity detection.
In the embodiments of the present invention, similarity detection is performed on original content before the original statement is made, and when the similarity value is less than or equal to the minimum similarity threshold, the original statement request is responded to and the original content is published. The publication of non-original content is thereby prevented in advance, and infringement is avoided.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above disclosure is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the rights of the present invention. Equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope of the present invention.