Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic configuration diagram of an implementation environment according to an embodiment of the present invention. As shown in Fig. 1, the text presentation system 100 includes a server 110 and a client 120. The server 110 includes a code database 111, a corpus database 112, and a text generation processing unit 113. The code database 111 stores code data for each theme and updates the code data in real time; the corpus database 112 stores corpora, such as words and phrases, used in generating texts.
In an embodiment of the present invention, the text generation processing unit 113 is configured to read real-time code data from the code database 111, identify behavioral performance data, determine descriptive phrases, and generate a text in combination with the corpus database 112.
The server 110 then sends the generated text to the client 120. The client 120, an application program of the media publisher, recommends the generated text to the user and provides a social platform on which users can interact. The server 110 and the client 120 may be connected in a wired or wireless manner.
Fig. 2 is an exemplary flowchart of a text generation method according to an embodiment of the present invention. The method is applied to the server. As shown in fig. 2, the method may include the steps of:
Step 201, acquiring real-time code data for a theme, wherein the real-time code data is written in a machine language and carries content data under the theme.
In this step, a code database in the server obtains and stores the real-time code data for the theme. The code data is written in a machine language and carries content data under the theme. Such languages include, for example, Java, PHP (Hypertext Preprocessor), ASP, and Ruby, each of which has its own set of programming rules.
The theme can be a sports event, in which case the code data includes content data such as the events within the sports event and each athlete's match details and scores. As another example, the theme may be a singing contest, in which case the code data includes content data such as each round of the contest and each singer's performance details and scores. As a further example, the theme may be a remote-controlled toy car race, in which case the code data includes content data such as each toy car's track, running speed, running time, and ranking.
Step 202, identifying behavioral performance data of at least one object from the real-time code data.
This step converts machine-readable "code data" into "performance data" that client users can read. Specifically, an object may be a person, an animal, or a thing, i.e., a participant under a given theme. For each object, its behavioral performance data includes one or more behaviors and performance evaluation data corresponding to each behavior.
For example, when the theme is a sports event, the object is an athlete. The performance data of each athlete corresponds to the athlete's match details: the behaviors are the athlete's individual actions, and the performance evaluation data includes the score of each action, the judges' evaluation, the total score, the medal result, and the like. Taking diving as an example, diving is a scored and ranked sport, and the performance data of each athlete comprises each technical element (i.e., multiple behaviors), such as the approach on the springboard, the run-up on the platform, take-off, height, degree of difficulty, body position in the air, synchronization, and entry into the water, together with the corresponding scores (i.e., the performance evaluation data corresponding to each behavior).
To perform identification, the server first sets a mapping relationship between the real-time code data and the object, behavior, and performance evaluation data according to the writing rules of the code, and then identifies the behaviors and performance evaluation data of each object from the real-time code data according to the mapping relationship.
Taking Java as the machine language, for example, the fields in the real-time code data are mapped to object, behavior, and performance evaluation data according to the syntax rules of Java: the field "object" is mapped to the object, the field "action" is mapped to the behavior, and the field "score" is mapped to the performance evaluation data.
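The mapping step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described here: it assumes the real-time code data has already been parsed into Python dictionaries (the text itself uses Java as its example), and the names `FIELD_MAPPING`, `identify`, `BehaviorRecord`, and `ObjectPerformance` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from raw field names to roles, mirroring the
# example above ("object", "action", "score").
FIELD_MAPPING = {
    "object": "object",
    "action": "behavior",
    "score": "performance_evaluation",
}


@dataclass
class BehaviorRecord:
    behavior: str
    evaluation: float


@dataclass
class ObjectPerformance:
    name: str
    records: list[BehaviorRecord] = field(default_factory=list)


def identify(rows: list[dict], mapping: dict[str, str] = FIELD_MAPPING) -> dict[str, ObjectPerformance]:
    """Group raw code-data rows into per-object behaviors and evaluation data."""
    result: dict[str, ObjectPerformance] = {}
    for row in rows:
        # Rename fields by their mapped role; fields without a mapping are ignored.
        roles = {mapping[key]: value for key, value in row.items() if key in mapping}
        perf = result.setdefault(roles["object"], ObjectPerformance(roles["object"]))
        perf.records.append(
            BehaviorRecord(roles["behavior"], float(roles["performance_evaluation"]))
        )
    return result
```

Unmapped fields such as a venue are simply dropped, so a change in the code's writing rules only requires a new mapping rather than new identification logic.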
Table 1 lists the identified performance data according to an embodiment of the invention. The theme is the women's 3 m synchronized springboard final at the Rio Olympic Games. The server identifies two objects, the athletes Wu Minxia and Shi Tingmao, aged 31 and 24 respectively; the behaviors are five rounds of dives; and the performance evaluation data comprises the score and ranking of each round, the final total score, and the medal result.
Table 1 Identified performance data results
Step 203, determining at least one descriptive phrase according to the behavioral performance data of the at least one object.
This step implements the association from "performance data" to "descriptive phrases". Specifically, there are three methods for determining a descriptive phrase:
Method 1: comparing the behavioral performance data of multiple objects under the theme, and selecting, from preset descriptive phrases, at least one descriptive phrase matching the comparison result.
This method compares multiple objects horizontally within the same event under the same theme. For example, the theme of Olympic diving includes eight events: women's 3 m springboard, men's 3 m springboard, women's synchronized 3 m springboard, men's synchronized 3 m springboard, women's 10 m platform, men's 10 m platform, women's synchronized 10 m platform, and men's synchronized 10 m platform. The data of multiple athletes in the same event are compared, and descriptive phrases are derived from the comparison result.
In an embodiment, the result of a pairwise comparison of the behavioral performance data of multiple objects is greater than, equal to, or less than, and the server presets multiple descriptive phrases corresponding to these results. Table 2 shows the preset descriptive phrases based on comparison with historical performance data according to an embodiment of the present invention.
TABLE 2 Preset descriptive phrases based on comparison with historical performance data
For example, in the men's 400 m freestyle final at the Rio Olympic Games, Sun Yang's final time was 3 minutes 41.68 seconds and Horton's was 3 minutes 41.55 seconds. Comparing the two, Sun Yang's result was slightly worse than Horton's, so the server can determine the corresponding descriptive phrases "slightly behind", "no match for his opponent", and "regret".
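Method 1 reduces to a three-way comparison followed by a table lookup. The sketch below assumes times in `M:SS.hh` form where lower is better; the `PRESET_PHRASES` entries beyond the three phrases in the example, the `margin` threshold of 0.5 s that separates "slightly" from clearly behind, and all function names are hypothetical.

```python
# Hypothetical preset phrase table keyed by the comparison outcome; only the
# "slightly_behind" phrases come from the example above.
PRESET_PHRASES = {
    "ahead": ["comfortably ahead"],
    "slightly_ahead": ["narrowly ahead"],
    "equal": ["dead heat"],
    "slightly_behind": ["slightly behind", "no match for his opponent", "regret"],
    "behind": ["well behind"],
}


def parse_time(text: str) -> float:
    """Convert a swim time written as 'M:SS.hh' into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def compare(own: float, other: float, lower_is_better: bool = True, margin: float = 0.5) -> str:
    """Classify own vs. other; the (hypothetical) margin separates 'slightly' outcomes."""
    # A positive gap means 'own' performed worse than 'other'.
    gap = own - other if lower_is_better else other - own
    if gap == 0:
        return "equal"
    if gap > 0:
        return "slightly_behind" if gap <= margin else "behind"
    return "slightly_ahead" if -gap <= margin else "ahead"


outcome = compare(parse_time("3:41.68"), parse_time("3:41.55"))
print(outcome, PRESET_PHRASES[outcome])
```

Applied to the two final times, the 0.13 s gap falls inside the assumed margin, so the lookup returns the "slightly behind" phrase group.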
Method 2: for each object, comparing the current real-time performance data longitudinally with its historical performance data.
Specifically, Fig. 3 is an exemplary flowchart for determining descriptive phrases according to an embodiment of the present invention. As shown in Fig. 3, the method comprises the following steps:
Step 2031, for each object, obtaining historical performance data of the object.
For a sports event, the historical performance data may include the athlete's past results, world ranking, changes to the event, and the like.
Step 2032, comparing the behavioral performance data of the object with its historical performance data for each of a plurality of data types.
This step uses classified comparison: the data are divided into a plurality of data types according to their attributes, such as the venue, the age of the object, the actions in each phase, the score, and the ranking.
Step 2033, selecting, from the comparison results of the data types, the comparison results that have presentation value.
Because the final text should be worth presenting to the user, the comparison results with presentation value are screened out from the comparison results of the multiple data types. For example, each comparison result may be scored according to its presentation value, the scores sorted, and the top-scoring comparison results selected.
For example, for the athlete Sun Yang, comparing the venues of his freestyle races, such as the "Beijing Olympic Games" versus the "Rio Olympic Games", has no news value; this comparison is deemed to have no presentation value and is scored 0. As another example, comparing Sun Yang's speed over each 50 m segment of his freestyle race reveals gaps in performance; the larger the gap, the higher the score and the greater the news value, so such a comparison result is selected as having presentation value.
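Steps 2032 and 2033 can be illustrated with a scoring-and-ranking sketch. It assumes, hypothetically, that venue comparisons always score 0 and that a numeric comparison scores the absolute size of its gap; `Comparison`, `ZERO_VALUE_TYPES`, `display_value`, and `select_valuable` are illustrative names, and any split times fed to it here are made-up values, not recorded results.

```python
from dataclasses import dataclass
from typing import Union

Value = Union[str, float]


@dataclass
class Comparison:
    data_type: str     # e.g. "venue" or "split_50m_2"
    current: Value     # value from the current real-time performance data
    historical: Value  # value from the historical performance data


# Hypothetical: data types judged to carry no news value.
ZERO_VALUE_TYPES = {"venue"}


def display_value(c: Comparison) -> float:
    """Score a comparison: 0 for no-value types, otherwise the size of the numeric gap."""
    if c.data_type in ZERO_VALUE_TYPES:
        return 0.0
    if isinstance(c.current, (int, float)) and isinstance(c.historical, (int, float)):
        return abs(c.current - c.historical)
    return 0.0


def select_valuable(comparisons: list[Comparison], top_k: int = 2) -> list[Comparison]:
    """Sort by display value (descending) and keep the top_k results scoring above 0."""
    ranked = sorted(comparisons, key=display_value, reverse=True)
    return [c for c in ranked if display_value(c) > 0][:top_k]
```

Scoring first and filtering afterwards keeps the value rule in one place, so a different notion of news value only requires replacing `display_value`.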
Step 2034, selecting, from the preset descriptive phrases, at least one descriptive phrase matching the comparison results that have presentation value.
As in Method 1, the result of comparing the performance data with the historical performance data can be classified as greater than, equal to, or less than, so at least one matching descriptive phrase can be selected from the preset descriptive phrases.
Method 3: for each object, comparing the current real-time performance data with performance expectation data.
Specifically, for each object, performance expectation data of the object is obtained; the behavioral performance data of the object is compared with the performance expectation data, and at least one descriptive phrase matching the comparison result is selected from a preset descriptive phrase group.
Table 3 shows the preset descriptive phrases based on comparison with expected performance data according to an embodiment of the present invention. For example, for Sun Yang's performance in the men's 400 m freestyle final, because he finished behind Horton, his result was judged against the pre-final expectation as a "regret" rather than as exceeding expectations.
TABLE 3 Preset descriptive phrases based on comparison with expected performance data
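Method 3 has the same shape as Method 1 but compares against an expectation rather than another object. This sketch assumes the expectation is expressed as an expected finishing rank; `EXPECTATION_PHRASES` (apart from "regret") and `compare_with_expectation` are hypothetical.

```python
# Hypothetical preset phrase group for comparisons with expectations;
# only "regret" comes from the example above.
EXPECTATION_PHRASES = {
    "exceeded": ["beyond expectations"],
    "met": ["as expected"],
    "fell_short": ["regret"],
}


def compare_with_expectation(actual_rank: int, expected_rank: int) -> list[str]:
    """Pick phrases by comparing the actual rank with the expected rank (1 = first place)."""
    if actual_rank < expected_rank:
        outcome = "exceeded"
    elif actual_rank == expected_rank:
        outcome = "met"
    else:
        outcome = "fell_short"
    return EXPECTATION_PHRASES[outcome]


# Assumed expectation: a defending champion is expected to win (rank 1) but finishes second.
print(compare_with_expectation(actual_rank=2, expected_rank=1))
```

With an assumed pre-final expectation of first place and an actual second place, the function returns `["regret"]`, matching the judgement in the example.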
Step 204, generating a text for the theme according to the behavioral performance data of the at least one object and the at least one descriptive phrase.
This step extends the scattered "behavioral performance data" and "descriptive phrases" into a complete "text". Specifically, a connective is selected for each descriptive phrase from a preset corpus database; the behavioral performance data of the at least one object, the connectives, and the descriptive phrases are joined into at least one short sentence; the short sentences are combined into at least one paragraph; and the paragraphs are joined to obtain the text.
The connectives serve to introduce, carry forward, and conclude, and specifically include contextual transition words, tone connectives, logical connectives, background introductions drawn from historical performance data, and the like. For example, in the men's 400 m freestyle final, several descriptive phrases are determined for the object "Sun Yang": "unfortunately failed to defend his title", "has always carried the high hopes of the Chinese public", and "was no match for his opponent Horton". For "unfortunately failed to defend his title", the connective is the appositive "runner-up"; for "has always carried the high hopes of the Chinese public", the connective is the causal background "as the first Chinese male swimmer to win an Olympic gold medal and the defending Olympic champion in this event"; and for "was no match for his opponent Horton", the connective is the adversative "but in the end".
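A minimal sketch of the sentence-assembly part of step 204 follows. It handles only connectives placed directly before a phrase, such as the adversative "but in the end"; appositives and causal background clauses would need other placement rules. `short_sentence`, `assemble`, and the one-entry `CONNECTIVES` corpus are hypothetical stand-ins for the corpus database.

```python
def short_sentence(performance: str, phrase: str, connectives: dict[str, str]) -> str:
    """Join performance data, the phrase's connective (if any), and the phrase."""
    connective = connectives.get(phrase)
    tail = f"{connective} {phrase}" if connective else phrase
    return f"{performance}, {tail}."


def assemble(paragraphs: list[list[str]]) -> str:
    """Join short sentences into paragraphs, then paragraphs into the final text."""
    return "\n\n".join(" ".join(sentences) for sentences in paragraphs)


# Hypothetical corpus entry: descriptive phrase -> connective selected for it.
CONNECTIVES = {"was no match for his opponent Horton": "but in the end"}
sentence = short_sentence("Sun Yang swam 3:41.68", "was no match for his opponent Horton", CONNECTIVES)
print(sentence)
```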
Fig. 4 is a diagram of a generated text according to an embodiment of the invention. The generated text describes multiple objects under the theme of the men's 400 m freestyle final, including the athletes Horton, Sun Yang, Qiubao, Gai, Debyler, and Daiti. The text comprises three paragraphs, each including an athlete's performance, ranking, and descriptive phrases, which are marked by underlining.
In this embodiment, real-time code data for a theme is acquired, behavioral performance data of at least one object is identified from the real-time code data, at least one descriptive phrase is determined according to the behavioral performance data, and a text for the theme is generated according to the behavioral performance data and the descriptive phrases. Because the method connects directly to the code database rather than relying on a live-broadcast system, it eliminates the prior art's dependence on manually written live text reports. All technical details of a match and the related performance evaluation data can be turned into a vivid, human-like match report that is fast to produce, rich in information, and readable, so that machine-written reports genuinely read as if written by a person.
Based on the text generation method provided in the above embodiment, a robot can independently produce human-like reports through its own learning and algorithms. Report texts produced with this human-like expression technique have passed the Turing test (i.e., a computer passes the test if, when answering a series of questions posed by a human tester within 5 minutes, more than 30% of its answers are mistaken for a human's), and the quality of the articles is comparable to that of manually written reports.
In addition, compared with assembling match reports by converting live television commentary into text through speech recognition, the method avoids that approach's main weakness: owing to the technical limitations of speech-to-text conversion, its conversion error rate is quite high and it cannot be applied at scale.
Fig. 5 is an exemplary flowchart of a text generation method according to another embodiment of the present invention. The method is applied to the server. As shown in fig. 5, the method may include the steps of:
Step 501, acquiring real-time code data for a theme, wherein the real-time code data is written in a machine language and carries content data under the theme.
Step 502, identifying the behavioral performance data of at least one object from the real-time code data according to the mapping relationship, wherein the behavioral performance data comprises one or more behaviors and performance evaluation data corresponding to each behavior.
As described for step 202, the server first sets a mapping relationship between the real-time code data and the object, behavior, and performance evaluation data according to the writing rules of the code, and then performs identification according to the mapping relationship.
Fig. 6 is a diagram of a generated text according to another embodiment of the present invention. The interface 600 shown in Fig. 6 presents a generated text: block 610 gives the title "Olympic diving's 1st gold! Wu Minxia/Shi Tingmao", block 620 gives the publisher of the text as "Tencent Sports", and block 630 gives the body of the text, which includes all of the performance data listed in Table 1 above, marked by underlining.
Step 503, determining at least one descriptive phrase according to the behavioral performance data of the at least one object, the historical performance data of each object, and the performance expectation data.
This step combines the three methods for determining descriptive phrases given in step 203, which are not described again here.
Step 504, generating, based on the corpus database, a text for the theme according to the behavioral performance data of the at least one object and the at least one descriptive phrase.
As described in step 204, when generating the text, a connective is selected for each descriptive phrase from a preset corpus database; the behavioral performance data of the at least one object, the connectives, and the descriptive phrases are joined into at least one short sentence; the short sentences are combined into at least one paragraph; and the paragraphs are joined to obtain the text.
Because texts can have various styles, multiple types of paragraph templates, each with a word-count limit, are preset for combining the short sentences into paragraphs. Different types of paragraph templates make up texts of different styles. For example, the types of paragraph templates include abstract, background introduction, detailed text, overview, appendix, and the like.
For each paragraph template, at least one short sentence matching the template is determined, and the determined short sentences are combined into a paragraph whose word count does not exceed the template's word-count limit.
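Filling a paragraph template under a word limit can be sketched greedily as below. The keyword-based matching rule is an assumption (the text does not say how short sentences are matched to templates); words are counted by whitespace splitting, which suits English text (a Chinese implementation would more plausibly count characters); and `ParagraphTemplate` and `fill_template` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ParagraphTemplate:
    kind: str                  # e.g. "abstract", "background", "detail"
    word_limit: int            # maximum number of words in the paragraph
    keywords: tuple[str, ...]  # hypothetical rule for matching short sentences


def fill_template(template: ParagraphTemplate, sentences: list[str]) -> str:
    """Combine matching short sentences without exceeding the template's word limit."""
    chosen: list[str] = []
    count = 0
    for sentence in sentences:
        if not any(k in sentence for k in template.keywords):
            continue  # sentence does not match this template
        words = len(sentence.split())
        if count + words > template.word_limit:
            continue  # adding it would exceed the limit; later, shorter ones may still fit
        chosen.append(sentence)
        count += words
    return " ".join(chosen)
```

Running each template of a chosen style over the same pool of short sentences yields that style's paragraphs, all within their limits.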
Step 505, performing keyword review on the generated text.
The review includes keyword review; articles with a higher risk weighting level can be submitted to a manual review window.
Step 506, sending the reviewed text to the client for display.
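The keyword review with risk weighting might look like the following sketch. The keyword list, the weights, the threshold of 3, and the function names are all hypothetical; the text specifies only that higher-risk articles go to manual review.

```python
# Hypothetical keyword list with risk weights; a real list would be editorial policy.
RISK_KEYWORDS = {"scandal": 3, "rumor": 2, "injury": 1}


def risk_weight(text: str) -> int:
    """Sum the weights of the risk keywords that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(weight for keyword, weight in RISK_KEYWORDS.items() if keyword in lowered)


def review(text: str, threshold: int = 3) -> str:
    """Route high-risk articles to manual review; publish the rest automatically."""
    return "manual_review" if risk_weight(text) >= threshold else "publish"
```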
Fig. 7 is a diagram of displayed text according to an embodiment of the invention. In the client's display interface 700, a report on a sporting event is recommended to the user. The report is titled "Zhang Mengxue wins the first gold of the Rio Olympics for the Chinese delegation"; the publisher is "Tencent Sports", the date is "2016-08-07", and the publication time is "22:23". The interface also provides a "Comment" option (721) and a "Share" option (722) for the user to interact on the social platform. Block 730 gives the abstract of the report, block 740 gives its highlights ("Match focus"), block 750 gives its detailed text ("Highlight replay"), and block 760 gives its appendix ("Athlete profile").
In this embodiment, at least one descriptive phrase is determined according to the behavioral performance data of the at least one object, the historical performance data of each object, and the performance expectation data, so that the behavioral performance data can be associated with multiple descriptive phrases; this enriches the content of the text and further improves its information content and readability. In addition, by presetting paragraph templates of different types for combining paragraphs, texts of different styles can be selected intelligently according to the match result, so that a variety of vivid, human-like texts can be output for the user to browse and read.
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present invention. As shown in Fig. 8, the server 800 includes:
an acquiring module 810, configured to acquire real-time code data for a theme, where the real-time code data is written in a machine language and carries content data under the theme;
an identifying module 820, configured to identify behavioral performance data of at least one object from the real-time code data acquired by the acquiring module 810;
a determining module 830, configured to determine at least one descriptive phrase according to the behavioral performance data of the at least one object identified by the identifying module 820; and
a generating module 840, configured to generate a text for the theme according to the behavioral performance data of the at least one object identified by the identifying module 820 and the at least one descriptive phrase determined by the determining module 830.
In an embodiment, the server 800 further comprises:
a setting module 850, configured to set a mapping relationship between the real-time code data and the object, behavior, and performance evaluation data according to the writing rules of the code;
the behavioral performance data includes one or more behaviors and performance evaluation data corresponding to each behavior, and the identifying module 820 is configured to identify the behaviors and performance evaluation data of each object according to the mapping relationship set by the setting module 850.
In an embodiment, the determining module 830 is configured to: for each object, obtain historical performance data of the object; compare the behavioral performance data of the object with the historical performance data for each of a plurality of data types; screen out, from the comparison results of the data types, the comparison results that have presentation value; and select, from a preset descriptive phrase group, at least one descriptive phrase matching those comparison results.
In an embodiment, the determining module 830 is configured to: for each object, obtain performance expectation data of the object; compare the behavioral performance data of the object with the performance expectation data; and select, from a preset descriptive phrase group, at least one descriptive phrase matching the comparison result.
In an embodiment, the generating module 840 is configured to: select a connective for each descriptive phrase from a preset corpus database; join the behavioral performance data of the at least one object, the connectives, and the descriptive phrases into at least one short sentence; combine the short sentences into at least one paragraph; and join the paragraphs to obtain the text.
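The module structure of the server 800 amounts to a four-stage pipeline. The sketch below wires the stages together as injected callables; `TextGenerationServer` and its parameter names are hypothetical, and each stage could be backed by logic like that of the method embodiments.

```python
from typing import Any, Callable


class TextGenerationServer:
    """Hypothetical wiring of the acquiring, identifying, determining and generating modules."""

    def __init__(
        self,
        acquire: Callable[[str], Any],
        identify: Callable[[Any], Any],
        determine: Callable[[Any], list[str]],
        generate: Callable[[Any, list[str]], str],
    ) -> None:
        self.acquire = acquire      # acquiring module 810
        self.identify = identify    # identifying module 820
        self.determine = determine  # determining module 830
        self.generate = generate    # generating module 840

    def run(self, theme: str) -> str:
        """Acquire code data, identify performance, determine phrases, generate the text."""
        code_data = self.acquire(theme)
        performance = self.identify(code_data)
        phrases = self.determine(performance)
        return self.generate(performance, phrases)
```

Injecting the modules as callables keeps each one replaceable, in line with the modules being either integrated into one unit or existing separately.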
Fig. 9 is a schematic structural diagram of a server according to another embodiment of the present invention. The server 900 may include a processor 910, a memory 920, a port 930, and a bus 940. The processor 910 and the memory 920 are interconnected by the bus 940, and the processor 910 may receive and transmit data through the port 930.
The processor 910 is configured to execute modules of machine-readable instructions stored in the memory 920.
The memory 920 stores modules of machine-readable instructions executable by the processor 910. The instruction modules executable by the processor 910 include an acquisition module 921, a recognition module 922, a determination module 923, and a generation module 924, where:
the acquisition module 921, when executed by the processor 910, acquires real-time code data for a theme, where the real-time code data is written in a machine language and carries content data under the theme;
the recognition module 922, when executed by the processor 910, identifies behavioral performance data of at least one object from the real-time code data acquired by the acquisition module 921;
the determination module 923, when executed by the processor 910, determines at least one descriptive phrase according to the behavioral performance data of the at least one object identified by the recognition module 922; and
the generation module 924, when executed by the processor 910, generates a text for the theme according to the behavioral performance data of the at least one object identified by the recognition module 922 and the at least one descriptive phrase determined by the determination module 923.
In one embodiment, the instruction modules executable by the processor 910 further include a setting module 925, where:
the setting module 925, when executed by the processor 910, sets a mapping relationship between the real-time code data and the object, behavior, and performance evaluation data according to the writing rules of the code;
the behavioral performance data includes one or more behaviors and performance evaluation data corresponding to each behavior, and the recognition module 922, when executed by the processor 910, identifies the behaviors and performance evaluation data of each object according to the mapping relationship set by the setting module 925.
It can be seen that the instruction modules stored in the memory 920, when executed by the processor 910, can implement the functions of the acquisition module, identification module, determination module, generation module, and setting module in the foregoing embodiments.
In the above device and system embodiments, the specific method for each module and unit to implement its own function is described in the method embodiment, and is not described here again.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
In addition, each of the embodiments of the present invention can be realized by a data processing program executed by a data processing apparatus such as a computer. It is clear that the data processing program constitutes the invention. Further, the data processing program, which is generally stored in one storage medium, is executed by directly reading the program out of the storage medium or by installing or copying the program into a storage device (such as a hard disk and/or a memory) of the data processing device. Such a storage medium therefore also constitutes the present invention. The storage medium may use any type of recording means, such as a paper storage medium (e.g., paper tape, etc.), a magnetic storage medium (e.g., a flexible disk, a hard disk, a flash memory, etc.), an optical storage medium (e.g., a CD-ROM, etc.), a magneto-optical storage medium (e.g., an MO, etc.), and the like.
The invention therefore also discloses a storage medium in which a data processing program is stored which is designed to carry out any one of the embodiments of the method according to the invention described above.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.