CN111507082A


Info

Publication number: CN111507082A
Application number: CN202010328492.2A
Authority: CN
Inventors: 张红军; 李小鹏; 李顺龙
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2020-04-23
Filing date: 2020-04-23
Publication date: 2020-08-07

Abstract

The application provides a text processing method and device, a storage medium and an electronic device, wherein the method comprises the following steps: acquiring a target text, wherein the target text is a text to be split into a plurality of sub-texts; determining a target splitting mode matched with the target text, wherein the target splitting mode is used for splitting the target text into a plurality of sub-texts according to the position of the target element in the target text; and splitting the target text based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text. Through the method and the device, the problems of low splitting efficiency and high labor cost in a manual text splitting mode in the related technology are solved, the labor cost of text splitting is reduced, and the text splitting efficiency is improved.

Description

Text processing method and device, storage medium and electronic device

Technical Field

The present application relates to the field of computers, and in particular, to a text processing method and apparatus, a storage medium, and an electronic apparatus.

Background

In producing TV dramas, films and the like, from planning through to the finished product, script breakdown is the most basic step of the whole production cycle. Script breakdown extracts the core elements of each scene from the content of the original script, and its output serves as the raw data input for later steps such as planning and shooting.

In the related art, however, text (e.g., a script) is split manually, which has the problems of low splitting efficiency and high labor cost.

Disclosure of Invention

The embodiment of the application provides a text processing method and device, a storage medium and an electronic device, and aims to at least solve the problems of low splitting efficiency and high labor cost in a manual text splitting mode in the related art.

According to an aspect of an embodiment of the present application, there is provided a text processing method, including: acquiring a target text, wherein the target text is a text to be split into a plurality of sub-texts; determining a target splitting mode matched with the target text, wherein the target splitting mode is used for splitting the target text into a plurality of sub-texts according to the position of the target element in the target text; and splitting the target text based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text.

According to another aspect of an embodiment of the present application, there is provided a text processing apparatus, including: a first obtaining unit, configured to obtain a target text, wherein the target text is a text to be split into a plurality of sub-texts; a determining unit, configured to determine a target splitting mode matched with the target text, wherein the target splitting mode is used for splitting the target text into a plurality of sub-texts according to the position of the target element in the target text; and a splitting unit, configured to split the target text based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text.

According to a further aspect of an embodiment of the present application, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is configured to perform the steps of any of the above method embodiments when executed.

According to a further aspect of an embodiment of the present application, there is also provided an electronic apparatus, including a memory and a processor, the memory storing a computer program therein, the processor being configured to execute the computer program to perform the steps in any of the above method embodiments.

In the embodiments of the present application, text splitting positions are located according to the position of the target element in the target text. A target text is acquired, wherein the target text is a text to be split into a plurality of sub-texts; a target splitting mode matched with the target text is determined, wherein the target splitting mode is used for splitting the target text into a plurality of sub-texts according to the position of the target element in the target text; and the target text is split based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text. Because a splitting mode corresponding to the text (such as a script) is preset, and splitting positions are located according to the position of the target element (such as a scene sequence number, position, atmosphere, time and the like) in the target text, the accuracy of the determined splitting positions can be ensured and text splitting can be completed automatically. This reduces the labor cost of text splitting and improves text splitting efficiency, thereby solving the problems of low splitting efficiency and high labor cost in the manual script splitting of the related art.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Those skilled in the art can obviously obtain other drawings from these drawings without inventive effort.

FIG. 1 is a block diagram of an alternative server hardware configuration according to an embodiment of the present application;

FIG. 2 is a flow diagram of an alternative text processing method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an alternative text processing method according to an embodiment of the application;

FIG. 4 is a schematic diagram of another alternative text processing method according to an embodiment of the application;

FIG. 5 is a flow diagram of another alternative text processing method according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an alternative text processing apparatus according to an embodiment of the present application;

FIG. 7 is a schematic diagram of an alternative text processing apparatus according to an embodiment of the present application;

FIG. 8 is a schematic diagram of yet another alternative text processing apparatus according to an embodiment of the present application;

FIG. 9 is a schematic diagram of yet another alternative text processing apparatus according to an embodiment of the present application;

FIG. 10 is a schematic diagram of yet another alternative text processing apparatus according to an embodiment of the present application;

FIG. 11 is a schematic diagram of yet another alternative text processing apparatus according to an embodiment of the present application;

FIG. 12 is a schematic diagram of yet another alternative text processing apparatus according to an embodiment of the present application;

FIG. 13 is a schematic diagram of yet another alternative text processing apparatus according to an embodiment of the present application.

Detailed Description

The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

According to one aspect of an embodiment of the present application, there is provided a text processing method. Optionally, the method may be performed in a server or a similar computing device. Taking execution on a server as an example, FIG. 1 is a block diagram of the hardware structure of an optional server according to an embodiment of the present application. As shown in FIG. 1, the server 10 may include one or more processors 102 (only one is shown in FIG. 1), wherein a processor 102 may include, but is not limited to, a processing device such as an MCU (Microcontroller Unit) or an FPGA (Field Programmable Gate Array), and a memory 104 for storing data. Optionally, the server may further include a transmission device 106 for communication functions and an input/output device 108. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the server. For example, the server 10 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.

The memory 104 can be used for storing computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the text processing method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to server 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the server 10. In one example, the transmission device 106 includes a NIC (Network Interface Controller) that can be connected to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be an RF (Radio Frequency) module, which is used for communicating with the internet in a wireless manner.

In this embodiment, a text processing method operating on the server is provided, and fig. 2 is a flowchart of an alternative text processing method according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:

step S202, a target text is obtained, wherein the target text is a text to be split into a plurality of sub-texts;

step S204, determining a target splitting mode matched with the target text, wherein the target splitting mode is used for splitting the target text into a plurality of sub-texts according to the position of the target element in the target text;

and step S206, splitting the target text based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text.
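The three steps above can be sketched as a small pipeline. This is only an illustrative sketch, not the implementation described in the application: it assumes a splitting mode can be represented as a compiled regular expression that recognises the first line of a sub-text, and the function names `acquire_text`, `determine_mode` and `split_text` are hypothetical.

```python
import re
from typing import List

# Assumption: a splitting mode is modelled as a compiled regex that
# recognises the line on which a new sub-text begins.
SplitMode = re.Pattern


def acquire_text(raw: str) -> List[str]:
    """S202: obtain the target text (here simply as a list of lines)."""
    return raw.splitlines()


def determine_mode(lines: List[str], modes: List[SplitMode]) -> SplitMode:
    """S204: pick the mode whose pattern recognises the most lines."""
    return max(modes, key=lambda m: sum(1 for ln in lines if m.match(ln)))


def split_text(lines: List[str], mode: SplitMode) -> List[List[str]]:
    """S206: open a new sub-text at every line the mode recognises."""
    subtexts: List[List[str]] = []
    for ln in lines:
        if mode.match(ln) or not subtexts:
            subtexts.append([])
        subtexts[-1].append(ln)
    return subtexts
```

Counting matching lines is a deliberately crude stand-in for the matching-degree computation discussed later in the description.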

Optionally, the executing subject of the above steps may be a server, but is not limited thereto, and other devices (for example, terminal devices) capable of performing text processing may be used to execute the method in the embodiment of the present application.

In this embodiment, text splitting positions are located according to the position of the target element in the target text. Because a splitting mode corresponding to the text is preset and splitting positions are located according to the position of the target element in the target text, the accuracy of the determined splitting positions can be ensured and text splitting can be completed automatically. This solves the problems of low splitting efficiency and high labor cost in the manual script splitting of the related art, reduces the labor cost of text splitting, and improves the efficiency of text splitting.

The following explains a text processing method in the embodiment of the present application with reference to fig. 2.

In step S202, a target text is obtained, where the target text is a text to be split into a plurality of sub-texts.

Depending on the text processing requirements, one text may need to be split into multiple sub-texts. For example, for a script (one kind of target text), a fast and general way is needed to help practitioners segment (i.e., split) the entire script, so that the staging work of the pre-shooting phase can be completed successfully. Currently, however, script splitting and element marking are performed manually by the user scene by scene, so working efficiency is low and scene-numbering errors occur easily.

To overcome at least one of the above problems, a processing method for automatically splitting a text is provided in the present embodiment. The method may be applied in a network architecture as shown in FIG. 3. The terminal device is connected with the server through a network, and is used for uploading texts, displaying information, interacting with the user, receiving the split sub-texts and the like. This processing may be executed by a client of a target application running on the terminal device.

The terminal equipment can be operated with a client of a text splitting application, and through the client, a user can select a target text to be split and upload the selected target text to a background, namely a server for splitting the text. The background may receive the uploaded target text.

As an alternative embodiment, obtaining the target text includes: receiving a target script uploaded by the target object through the client, wherein the target script is the target text.

The text to be split may be a script, the sub-texts obtained by splitting may be sub-scripts, and the sub-scripts may be the scenes contained in the script. The user can upload the script through the movie making system, and after the script is uploaded, the server can acquire the target script. The target script includes a plurality of scenes, and the server needs to split the target script into a plurality of sub-scripts, that is, a plurality of scenes.

Through this embodiment, the target script is uploaded through the client and the script is split by the server, which reduces the user's operations and realizes one-click script splitting, thereby improving working efficiency.

In step S204, a target splitting pattern matching the target text is determined, where the target splitting pattern is used to split the target text into a plurality of sub-texts according to the position of the target element in the target text.

Depending on the content to be recorded and the author's writing habits, a text may be written in a particular writing mode (writing format), and different texts may be written in different writing modes. After the text to be split is received, the splitting mode that matches the text can be determined through mode matching, and the text is then split using the matched mode.

As an alternative embodiment, determining a target splitting pattern matching the target text may include: determining a candidate splitting mode matched with the target text from the multiple splitting modes according to the position of a target element contained in the target text; acquiring update information of the candidate splitting mode, wherein the update information is information obtained by updating element values of target elements determined from a target text based on the candidate splitting mode; and adjusting the candidate splitting mode by using the updating information to obtain a target splitting mode.

The text may contain a plurality of words or phrases, each of which may be referred to as a basic element. Among all basic elements, one basic element or a combination of basic elements that can be used for understanding the text content may be referred to as an element, for example, a person, a scene, a time, a place, a serial number, and the like. Different types of text have different elements. For example, for a script, "in" may represent an interior scene and "out" may represent an exterior scene, so each is an element carrying a particular meaning, while for other types of text, "in" and "out" may not be elements of the text. Among all the elements, a specific element that can be used for text splitting may be referred to as a key element (i.e., a target element), for example, a scene or a specific serial number (e.g., a scene number).

Because texts are written in different formats, the expression modes (writing modes) of the target elements and the ways the target elements are combined differ. For example, the opening content of each scene in a script may be written in the format "scene number, scene, time, position" (for example, "01-03 hospital corridor day in"). The splitting mode of the text is used to determine the splitting positions of the sub-texts according to the positions of the target elements (i.e., the key elements) contained in the text, for example, by determining the position of the first sentence of each sub-text based on the positions of the target elements contained in the text. Different splitting modes may be used to split texts written in different formats.

It should be noted that the server may split the text based on the splitting rule of the splitting mode. The splitting rule is the basis on which the splitting mode splits the text: it defines which basic elements or elements are determined as target elements, and the positions of the target elements in the text, that is, the format on which the determination of splitting positions is based. The format may be: "target element type 1 target element type 2 … target element type n", which defines the types of the target elements included, the positional relationship between the target elements, and the connecting symbol between adjacent target elements. The target element type may be the part of speech or the category of the target element, etc.

For example, for a script that writes the beginning of each scene in a particular format (e.g., "01-03 hospital corridor day in"), the splitting rule of the splitting mode that matches this script format may be configured as "scene number, scene, time, position", where the target elements, in order, are: scene number, scene, time, and position, and adjacent target elements may be separated by spaces. When this mode is used for recognition, the content of the target text can be scanned in sequence, content that conforms to the writing mode "scene number, scene, time, position" is determined as the beginning of a scene, and the positions between such pieces of content are determined as the script splitting positions.

It should be noted that the splitting rule of a splitting mode can be made as fuzzy as possible so as to be compatible with more text instances. For example, where the splitting rule of a certain splitting mode treats content conforming to the writing mode "scene number, scene, time, position" as the beginning of a scene, the splitting rule may support: fuzzification of irrelevant character regions, for example, when a person element appears between the scene number and the scene, the person element is fuzzified and the scene number and the scene are still recognized as adjacent; and recognition of key element regions that span lines, for example, a "scene number" and a "scene" located on different lines may still be recognized as two adjacent elements.
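As a hedged illustration of such a splitting rule, the sketch below encodes one possible reading of the format as a regular expression with one named group per target element. The element order (scene number, scene, time, position), the English keyword vocabularies, and the names `HEADER_RULE` and `parse_header` are assumptions made for this example and are not taken from the application.

```python
import re
from typing import Dict, Optional

# Assumed rule: "<scene number> <scene> <time> <position>", with adjacent
# target elements separated by whitespace. Vocabularies are illustrative.
HEADER_RULE = re.compile(
    r"^(?P<scene_number>\d+(?:-\d+)?)\s+"   # e.g. "01-03"
    r"(?P<scene>.+?)\s+"                    # free text, e.g. "hospital corridor"
    r"(?P<time>day|night|dusk|dawn)\s+"     # time keyword
    r"(?P<position>in|out)$"                # interior / exterior keyword
)


def parse_header(line: str) -> Optional[Dict[str, str]]:
    """Return the target element values if the line opens a scene, else None."""
    m = HEADER_RULE.match(line.strip())
    return m.groupdict() if m else None
```

Scanning a script line by line with `parse_header` marks every line that returns a dictionary as the beginning of a scene; ordinary dialogue or action lines return `None`.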
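The two kinds of tolerance described above can be sketched with a bounded "gap" between target elements. This is a minimal sketch under stated assumptions: the 20-character gap limit, the closed scene vocabulary, and the names `GAP`, `FUZZY_RULE` and `find_headers` are all hypothetical choices for the example.

```python
import re
from typing import Dict, List

# Assumption: up to 20 arbitrary characters, line breaks included, may sit
# between adjacent target elements and are ignored (fuzzified).
GAP = r"[\s\S]{0,20}?"

FUZZY_RULE = re.compile(
    r"(?P<scene_number>\d+(?:-\d+)?)" + GAP
    + r"(?P<scene>hospital corridor|office|park)" + GAP
    + r"(?P<time>day|night)"
)


def find_headers(text: str) -> List[Dict[str, str]]:
    """Find scene openings even with unrelated text or line breaks in between."""
    return [m.groupdict() for m in FUZZY_RULE.finditer(text)]
```

Here an unrelated person name between the scene number and the scene, and a line break before the time keyword, do not prevent recognition. A real rule set would need a much richer vocabulary than the three scene names used here.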

For the target text, a candidate splitting mode matched with the target text can be determined from the multiple splitting modes according to the position of the target element contained in the target text. The matching degree between each splitting mode and the target text can be evaluated by going through the splitting modes in order, by random selection, or in another manner. For a given splitting mode, the target text is split based on that mode to obtain its splitting result, where the splitting result may include at least one of: the number of sub-texts obtained by splitting, and the number of target elements contained in each sub-text. The matching degree between the target text and the splitting mode may be determined as a weighted sum of the number of sub-texts and the number of target elements contained in each sub-text. The splitting mode with the highest matching degree among the multiple splitting modes can be determined as the candidate splitting mode.
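The matching-degree computation can be sketched as follows. The text only states that a weighted sum is used; the weights, the choice to total the element counts over all sub-texts, and the names `match_degree` and `pick_candidate` are assumptions for illustration.

```python
import re
from typing import Dict, List


def match_degree(lines: List[str], rule: re.Pattern,
                 w_subtexts: float = 1.0, w_elements: float = 0.5) -> float:
    """Weighted sum of the sub-text count and the recognised element count."""
    headers = [m for m in (rule.match(ln) for ln in lines) if m]
    n_subtexts = len(headers)
    # Count only named groups that actually captured a value.
    n_elements = sum(1 for m in headers for v in m.groupdict().values() if v)
    return w_subtexts * n_subtexts + w_elements * n_elements


def pick_candidate(lines: List[str], modes: Dict[str, re.Pattern]) -> str:
    """Return the name of the splitting mode with the highest matching degree."""
    return max(modes, key=lambda name: match_degree(lines, modes[name]))
```

With two recognised scene headers of four elements each, the default weights give 1.0 × 2 + 0.5 × 8 = 6.0; a mode that recognises nothing scores 0.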

For example, for the scenario, common mainstream scenario formats can be collected at an early stage to cover most mainstream scenario modes, so that the best matching mode identification can be performed on the scenario input by the user.

The candidate splitting mode may be directly determined as the target splitting mode. Alternatively, in order to ensure the accuracy of text splitting, the candidate splitting mode may also be adjusted: update information of the candidate splitting mode is acquired, wherein the update information is used for updating the element values of the target elements determined from the target text by the candidate splitting mode, that is, which elements in the target text are determined as target elements.

By the embodiment, the best matching mode is determined and is adjusted by using the updating information, so that the text splitting capability of the text splitting mode can be improved, and the splitting accuracy is ensured.

As an alternative embodiment, obtaining the update information of the candidate splitting mode includes: splitting the target text by using the candidate splitting mode to obtain one or more reference sub-texts; sending the reference sub-texts to the client of the target object, so as to acquire, through the client, the update information input by the target object; and receiving the update information returned by the client.

The acquired update information may be input information acquired by way of interaction with the user. In order to facilitate the user to preview and confirm the splitting result of the splitting mode so as to adjust the splitting mode, the target text can be split by using the candidate splitting mode to obtain one or more reference sub-texts, and the reference sub-texts are sent to the client of the target object.

For example, after obtaining a script uploaded by a user, the server performs mode recognition, quickly finds the mode with the highest matching degree, uses the matched mode to segment and recognize the script, and returns some of the scenes obtained by segmentation (for example, 5 scenes), together with the target elements identified in each scene, to the user for preview.

The client may display the one or more reference sub-texts, interact with the user (the target object) to obtain the user's input information, that is, the update information, and return the obtained update information to the server. The server can receive the update information returned by the client. The update information is information obtained by updating the element values of the target elements determined from the target text based on the candidate splitting mode, for example, information obtained by updating the value of "scene number" from "scene number A" to "scene number B" and the value of "position" from null to "inner".

Through this embodiment, acquiring the update information through interaction with the user ensures that mode adjustment is supervised, thereby improving the text splitting capability of the splitting mode.

For the terminal equipment side, the sub-texts obtained after splitting can be prompted to the user in a mode that the client displays the reference sub-texts, or the sub-texts obtained after splitting can be prompted to the user in a mode that the client displays the reference sub-texts and the element values of the target elements.

As an alternative embodiment, after the reference sub-text is sent to the client of the target object, the client may display the reference sub-text and an element information interface; detect a target operation performed by the target object on the element information interface, wherein the target operation updates a reference element value of the target element; and, in response to the target operation, generate update information corresponding to the target operation according to the updated reference element value.

After receiving the reference sub-text and the related information sent by the server, the client may display a reference sub-text and element information interface, and the element information interface may display the target element and the reference element value of the target element. The reference element value is an element value of the target element extracted from the reference sub-text by the candidate split mode.

The user may perform an input operation (one kind of target operation) on the element information interface to update the reference element value of the target element. The input operation may include, but is not limited to, at least one of: modifying the element value of the target element, for example, correcting the "scene number" from "scene number A" to "scene number B"; and adding a new element value for the target element, for example, where the reference element value of the "position" element in the reference sub-text is null, setting the reference element value of the "position" element to "inner".

After the client acquires the input operation of the user, the client can generate the update information corresponding to the target operation according to the input operation of the user and send the update information to the server.
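One simple way to picture the update information is as the difference between the reference element values shown to the user and the values after the user's edits. The sketch below assumes element values are held in a dictionary keyed by element name; the type alias `ElementValues` and the function `build_update_info` are hypothetical.

```python
from typing import Dict, Optional

# Assumption: element values are kept as a mapping from element name to
# value, with None standing for a null (not yet recognised) value.
ElementValues = Dict[str, Optional[str]]


def build_update_info(reference: ElementValues,
                      edited: ElementValues) -> ElementValues:
    """Keep only the elements whose value the user changed or filled in."""
    return {name: value for name, value in edited.items()
            if reference.get(name) != value}
```

For the example in the text, correcting the scene number from "A" to "B" and filling the empty position with "inner" yields update information containing exactly those two elements, while unchanged elements are omitted.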

Through this embodiment, the client displays the reference sub-text and the element values of the target elements, and generates the update information according to the user's input operation, which makes it convenient for the user to correct the splitting mode and improves the user experience.

The candidate splitting mode may be adjusted using the update information to obtain the target splitting mode, where adjusting the splitting mode means adjusting its splitting rule. Elements other than the target elements (e.g., a person) may be fuzzified, that is, replaced with a target symbol (the target symbol may be "+" or another type of symbol), to form a splitting rule such as: keyword type (target element type) 1 keyword type 2 x keyword type 3 keyword type 4, wherein x denotes a fuzzified element.

As an alternative embodiment, adjusting the candidate splitting mode using the update information to obtain the target splitting mode includes: performing patterning processing on the update information to obtain mode update information, wherein the patterning processing includes: retaining the target elements in the update information, fuzzifying the elements other than the target elements (namely, replacing them with the target symbol), and retaining the separators; and updating the candidate splitting mode by using the mode update information to obtain the target splitting mode.

The user can partially correct the preview information (e.g., the reference sub-text and the element value of the target element included in the reference sub-text) by inputting the information. The client can acquire the content, i.e., the update information, input by the user. For the updated information, the specific updated information input by the user may be converted into the splitting rule of the split mode (i.e., the mode update information) in the foregoing manner, so as to rectify the candidate split mode.

The target text may be split again based on the corrected candidate splitting mode, so as to obtain new preview information (e.g., new reference sub-texts and the element values of the target elements included in them), and the new preview information is sent to the user for preview. If the user confirms that the new preview information is correct (i.e., the new reference sub-texts are split correctly and need no correction), the splitting mode used at this time is the target splitting mode. The target text can then be globally split by using the target splitting mode, so that all the sub-texts are obtained.

Optionally, the server may perform a patterned processing on the input information of the user: firstly, relevant elements, keywords and the like (namely target elements) in input information are reserved, then other non-relevant elements are fuzzified, separators among the elements are reserved, and finally, keyword positions and element-related context positions are recovered to generate a final splitting mode (namely the target splitting mode).

For example, if the server identifies a script whose format is not yet covered by any mode, that is, none of the splitting modes preset on the server matches the format of the script, the entire script or part of its content may be sent to the user's client for display. The display interface of the client may be as shown in FIG. 4: the right window of the interface displays the entire script or part of its content, and the parameters displayed in the left window are the key elements (i.e., target elements) in the script, including: script ID (script Id), line content (lineContent), time (time), location (location), scene (scene), scene number (scene order), etc. Since there is no matching splitting mode, the values of the individual parameters are null.

The user may simply enter the content in the left window, including: script ID "18069", line content "01-03 hospital corridor in day", time "day", location "in", scene "hospital corridor", and scene number "03". The background can retain the corresponding keywords for distinctive words such as the scene and time keywords, weaken (i.e., fuzzify) the ordinary characters of the specific scene, atmosphere and environment content, determine the relative position at which each element appears in the context, weaken the Arabic and Chinese numerals of the scene number, weaken specific separating elements such as spaces, and finally form a splitting mode.
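The generalisation from one user-labelled line to a splitting rule might look like the sketch below. It is not the procedure of the application, only one plausible reading of it: keyword elements keep a closed vocabulary, the scene number is weakened to a digit pattern, free-text elements become wildcards, and separators are kept in generalised form. The vocabulary `KEYWORD_VOCAB`, the function `build_rule`, and the simplification that the whole "01-03" is labelled as the scene number are assumptions.

```python
import re
from typing import Dict

# Assumed keyword vocabularies for elements that are retained as keywords.
KEYWORD_VOCAB = {"time": ["day", "night"], "position": ["in", "out"]}


def build_rule(line: str, elements: Dict[str, str]) -> re.Pattern:
    """Turn one user-labelled header line into a generalised regex."""
    # Locate each labelled value (first occurrence) to recover element order.
    spans = sorted((line.index(v), name, v) for name, v in elements.items())
    parts, cursor = [], 0
    for start, name, value in spans:
        sep = line[cursor:start]
        if sep:  # keep the separator, generalising whitespace
            parts.append(r"\s+" if sep.isspace() else re.escape(sep))
        if name == "scene_number":
            body = r"\d+(?:-\d+)*"   # weaken concrete digits
        elif name in KEYWORD_VOCAB:
            body = "|".join(map(re.escape, KEYWORD_VOCAB[name]))
        else:
            body = r".+?"            # fuzzify free-text content
        parts.append(f"(?P<{name}>{body})")
        cursor = start + len(value)
    return re.compile("^" + "".join(parts) + "$")
```

A rule built from "01-03 hospital corridor day in" then also recognises "02-11 police station night out". The sketch ignores overlapping or repeated values and any text after the last labelled element, which a fuller implementation would have to handle.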

In step S206, the target text is split based on the target splitting mode, and a plurality of target sub-texts corresponding to the target text are obtained.

After the target splitting mode is determined, the target text can be automatically split based on the target splitting mode, and splitting of the target text is completed.

As an optional embodiment, splitting the target text based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text includes: determining a target position of the target element in the target text based on the target split mode; determining a target splitting position corresponding to the target position according to the position relation between the position of the target element and the text splitting position; and splitting the target text according to the target splitting position to obtain a plurality of target sub-texts.

Based on the target splitting mode, the server can scan the target text line by line to determine the target position of the target element in the target text, that is, the position of the content meeting the splitting rule in the target text.

After the target position is determined, the target splitting position corresponding to the target position can be determined according to the positional relationship between the position of the target element and the text splitting position. This positional relationship defines whether the sub-texts are split at a position before, after, or in the middle of the content satisfying the splitting rule. The target text is then split at the target splitting position to obtain a plurality of target sub-texts.

Since the target element may appear at multiple positions in the target text, the number of target positions may be multiple, the number of target splitting positions may also be multiple, and the number of target sub-texts into which the target text is split may also be multiple.
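The line-by-line scan and the mapping from target positions to splitting positions can be sketched as follows. This is only an illustration under stated assumptions: the splitting mode is taken to be a compiled regular expression, the function name `split_text` and its `where` parameter are hypothetical, and only the "before" and "after" relationships are covered (splitting in the middle of a matching line is omitted).

```python
import re


def split_text(text, pattern, where="before"):
    """Split `text` into sub-texts at lines matching `pattern` (hypothetical helper).

    where="before": a new sub-text starts at each matching line.
    where="after":  a sub-text ends right after each matching line.
    """
    lines = text.splitlines()
    # Target positions: indices of the lines that satisfy the splitting rule.
    hits = [i for i, line in enumerate(lines) if pattern.match(line.strip())]
    if where == "before":
        cuts = hits
    elif where == "after":
        cuts = [i + 1 for i in hits]
    else:
        raise ValueError("where must be 'before' or 'after'")
    # Target splitting positions, including the start and end of the text.
    bounds = sorted(set([0] + cuts + [len(lines)]))
    chunks = ["\n".join(lines[a:b]) for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c.strip()]
```

Because every matching line yields one target position, a text with several header lines is split into several sub-texts in a single pass.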

It should be noted that the splitting of the target text based on the candidate splitting mode may be a partial splitting of the target text. After the target splitting mode is determined, splitting the target text based on the target splitting mode may be global splitting of the target text, that is, performing line-by-line scanning recognition on the target text (e.g., script text) based on the target splitting mode to complete splitting of sub-texts (e.g., scenes).

By the embodiment, the text splitting position is determined according to the corresponding relation between the position of the target element and the splitting position, so that the text splitting accuracy can be ensured, and the text splitting efficiency can be improved.

In addition to splitting the target text into a plurality of sub-texts, the server may parse each sub-text to extract the specific elements it contains, such as scenes, field numbers, atmosphere environments, and the like.

For the text of a specific industry, some industry dictionaries can be used for assisting in carrying out secondary correction on the extracted elements, the analysis of all the sub-texts is finally completed, the split multiple target sub-texts and specific elements contained in each target sub-text are obtained, and the analysis results of the multiple target sub-texts and each target sub-text are output.

For example, for a scenario (target text) containing 5 scenes, the segmentation of the scenes can be completed by performing line-by-line scanning recognition of the text of the scenario based on the determined splitting mode. After the segmentation is completed, 5 fields (multiple target sub-texts) are obtained. And extracting elements such as scenes, field numbers, atmosphere environments and the like for each segmented field, and performing secondary correction on the extracted elements by using some industry dictionaries to finish the analysis of all fields. The final output result may be 5 files, each file including the content of one field and the elements of the scene, the field number, the atmosphere environment, and the like in the field, or may also be 1 file, in which the splitting position of the adjacent field, the elements of the scene, the field number, the atmosphere environment, and the like in each field are identified by the identification information.
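The secondary correction with an industry dictionary could, for instance, be approximated by fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library; the application does not specify how the dictionary is applied, so the matching strategy, the `cutoff` value, and the helper names `correct_element` and `correct_elements` are assumptions made for illustration.

```python
import difflib


def correct_element(value, dictionary, cutoff=0.8):
    """Return the closest dictionary entry for `value`, or `value` unchanged
    if nothing in the dictionary is similar enough (hypothetical helper)."""
    if value in dictionary:
        return value
    close = difflib.get_close_matches(value, dictionary, n=1, cutoff=cutoff)
    return close[0] if close else value


def correct_elements(elements, dictionaries):
    """Apply per-element dictionaries to the elements extracted from one field.

    `dictionaries` maps an element name to a list of known values; elements
    without a dictionary are passed through unchanged.
    """
    return {
        name: correct_element(value, dictionaries.get(name, []))
        for name, value in elements.items()
    }
```

For example, an extracted scene value with a small typo would be replaced by the nearest dictionary entry, while a scene name unknown to the dictionary would be kept as extracted.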

As an optional embodiment, after the target text is split based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text, a text order table of the plurality of target sub-texts may be obtained, a target element information table corresponding to the target text may be generated, and the text order table and the target element information table may be saved.

Besides the plurality of target sub-texts, a plurality of summary tables can be generated and stored in a target database (different summary tables can be stored in the same database or in different databases) so as to form an overview of the text and to support subsequent processing based on the split target sub-texts. The summary tables may include, but are not limited to, at least one of: a text order table and a target element information table. The text order table indicates the order of the plurality of target sub-texts in the target text, for example, a scene order table. The target element information table is a list of the target elements included in each target sub-text, for example, a summary table based on the scene dimension or a summary table based on the character dimension.

In addition, a joint summary of the multiple elements contained in each target sub-text may also be generated, for example, a summary based on the scene dimension and the character dimension.
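A minimal sketch of how such summary tables might be assembled from parsed sub-texts is given below. The record layout (keys `scene_order`, `scene`, `characters`) and the function name `build_summary_tables` are hypothetical, and persistence to a database is left out.

```python
from collections import defaultdict


def build_summary_tables(sub_texts):
    """Build order and element tables from parsed sub-texts (hypothetical helper).

    Each item of `sub_texts` is assumed to be a dict with the keys
    "scene_order", "scene" and "characters" (a list of names).
    """
    # Text order table: position of each sub-text within the target text.
    text_order = [(idx, s["scene_order"]) for idx, s in enumerate(sub_texts, start=1)]

    by_scene = defaultdict(list)       # scene dimension
    by_character = defaultdict(list)   # character dimension
    joint = defaultdict(list)          # scene and character dimensions combined
    for s in sub_texts:
        by_scene[s["scene"]].append(s["scene_order"])
        for name in s["characters"]:
            by_character[name].append(s["scene_order"])
            joint[(s["scene"], name)].append(s["scene_order"])

    return {
        "text_order": text_order,
        "by_scene": dict(by_scene),
        "by_character": dict(by_character),
        "by_scene_and_character": dict(joint),
    }
```

Each table maps a dimension value to the field numbers in which it occurs, which is one simple way to give staff an overview of where a scene or a character appears.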

Through the embodiment, when the sub-texts obtained after splitting are output, the summary tables with different dimensions are output, so that the summary understanding of the texts can be conveniently formed, and the subsequent processing flow based on the sub-texts is convenient to perform.

In this example, the target text is a scenario, the sub-text is a field in the scenario, and the target elements are elements such as a field number, a scene, a time, and a position included in the scenario.

In this example, the staff member may collect common script formats and configure splitting modes corresponding to different script formats on the server.

The server can identify a splitting mode which is most matched with the script from a plurality of configured splitting modes according to the script format used by the script input by the user, automatically split the script in a field based on the most matched splitting mode, and also can finish the extraction of elements in each field.
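Selecting the best-matching splitting mode can be sketched as scoring each configured mode against the script. The application does not define how the matching degree is computed; the sketch below assumes it is the number of sampled lines that a mode's regular expression matches, and the names `pick_split_mode` and `sample_lines` are hypothetical.

```python
import re


def pick_split_mode(text, modes, sample_lines=200):
    """Return the name of the configured mode that matches the most lines,
    or None when no mode matches any line (hypothetical helper).

    `modes` maps a mode name to a compiled regular expression.
    """
    lines = [l.strip() for l in text.splitlines()[:sample_lines] if l.strip()]
    best_name, best_hits = None, 0
    for name, pattern in modes.items():
        hits = sum(1 for line in lines if pattern.match(line))
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name
```

A return value of `None` corresponds to the case in which the script is not covered by any configured mode, so that the mode has to be derived from the user's input instead.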

For the small number of scripts or scenes that cannot be covered by the configured splitting modes, the splitting mode may be identified automatically from the user's input information; the script is then split into fields according to the identified splitting mode, an industry dictionary is used to assist in identifying elements such as the atmosphere environment in each field, and the extraction of the key elements in each field is finally completed.

As shown in fig. 5, the flow of the text processing method in this example may include the following steps:

step S502, obtaining the script uploaded by the user, performing pattern recognition on the script, searching the splitting pattern with the highest matching degree with the script, splitting the script by using the splitting pattern, and returning a plurality of split scenes to the user for the user to preview and confirm.

The user can upload the script through the movie production system. After the script is uploaded, the server can perform mode recognition on the script, find the splitting mode with the highest matching degree with the script, and split the script using that splitting mode. The splitting described here may be partial splitting, i.e., only a few fields (for example, 5 fields) of the script are split.

After the plurality of fields are obtained, the split fields can be returned to the user for the user to preview, and whether the split fields are correct or not can be confirmed.

Step S504, the input content of the user is obtained, and the matched splitting mode is adjusted based on the input information to obtain the target splitting mode.

The user can correct the preview information and return the input corrected content, i.e., the input content, to the server.

After receiving the input content of the user, the server may convert the specific input content into a scenario splitting rule (mode update information), and adjust the matched splitting mode using the converted scenario splitting rule to obtain a target splitting mode.

When the input information is processed, the keywords (target elements) in the input information are retained, the other, non-relevant elements are fuzzified while the separators are retained, and the keyword positions and the element-related context positions are then restored to generate the script splitting rule.

Step S506, the script text is scanned and identified line by line based on the target splitting mode, and the script is split into a plurality of scenes.

After the splitting mode is confirmed to be correct, the script can be scanned line by line based on the target splitting mode, and the script is split into a plurality of fields. And then extracting elements such as scenes, scene numbers, atmosphere environments and the like from the split scenes, secondarily correcting the extracted elements by using some industry dictionaries in an auxiliary way, identifying character props and the like in each scene, and finally completing the analysis of all scenes.

And step S508, acquiring a summary table of the script and outputting the standard script.

Based on the identified elements, a scene order list can be further output, and report output in the early stage of shooting is completed based on a summary list of scene dimensions and character dimensions, so that the staff can form overview cognition on the script.

After obtaining the summary table of each session, the specific elements (e.g., scene, session number, atmosphere environment, etc.) of each session, and the whole scenario, the above information may be used to generate a scenario meeting the standard according to the requirements of the standard scenario, and the generated standard scenario may be returned to the user.

A standard script may contain a plurality of files, each of which may correspond either to one session, with the specific elements of that session identified in the file, or to a summary table of the script. Alternatively, the standard script may be a single file in which the individual sessions of the script, the specific elements contained in each session, and the summary tables of the script are identified by index, link, or other means.

Through this example, the various script rules can be hidden from the user and a transparent script splitting service can be provided. Tedious work such as script splitting and summarization is carried out by the script splitting tool, so that the user can trigger report generation with one click. This greatly improves working efficiency and avoids the manual effort of script splitting and of data collection and arrangement in the early stage of film and television shooting.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.

According to another aspect of the embodiments of the present application, there is provided a text processing apparatus for implementing the text processing method in the above embodiments. Optionally, the apparatus is used to implement the above embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.

Fig. 6 is a schematic diagram of an alternative text processing apparatus according to an embodiment of the present application, as shown in fig. 6, the apparatus including:

a first obtaining unit 602, configured to obtain a target text, where the target text is a text to be split into multiple sub-texts;

a determining unit 604, connected to the first obtaining unit 602, configured to determine a target splitting pattern matched with the target text, where the target splitting pattern is used to split the target text into multiple sub-texts according to a position of the target element in the target text;

the splitting unit 606 is connected to the determining unit 604, and configured to split the target text based on the target splitting mode to obtain multiple target sub-texts corresponding to the target text.

Optionally, the first obtaining unit 602 may be used to execute step S202 in the foregoing embodiment, the determining unit 604 may be used to execute step S204 in the foregoing embodiment, and the splitting unit 606 may be used to execute step S206 in the foregoing embodiment.
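As a rough software illustration of how the three connected units could be composed, the sketch below wires three callables into one pipeline. The class name `TextProcessingApparatus`, its field names, and the use of a compiled regular expression as the splitting mode are assumptions for illustration; the application leaves the implementation of the units open.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TextProcessingApparatus:
    """Hypothetical composition of the three units described above."""

    obtain: Callable[[], str]                      # role of obtaining unit 602
    determine: Callable[[str], re.Pattern]         # role of determining unit 604
    split: Callable[[str, re.Pattern], List[str]]  # role of splitting unit 606

    def run(self) -> List[str]:
        text = self.obtain()            # step S202: acquire the target text
        mode = self.determine(text)     # step S204: determine the splitting mode
        return self.split(text, mode)   # step S206: split into target sub-texts
```

Passing the units in as callables keeps each of them replaceable, which mirrors the statement that the units may be implemented in software, hardware, or a combination of both.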

As an alternative embodiment, as shown in fig. 7, the first obtaining unit 602 includes:

the receiving module 702 is configured to receive a target scenario uploaded by a target object through a client, where the target scenario is a scenario to be split into multiple sessions, and the target scenario is a target text.

As an alternative embodiment, as shown in fig. 8, the determining unit 604 includes:

a first determining module 802, configured to determine, according to a position of a target element included in a target text in the target text, a candidate split mode matching the target text from among multiple split modes;

an obtaining module 804, configured to obtain update information of a candidate split mode, where the update information is information obtained by updating an element value of a target element determined from a target text based on the candidate split mode;

an adjusting module 806, configured to adjust the candidate splitting mode using the update information to obtain the target splitting mode.

As an alternative embodiment, as shown in fig. 9, the obtaining module 804 includes:

the splitting sub-module 902 is configured to split the target text using the candidate splitting mode to obtain one or more reference sub-texts;

a sending sub-module 904, configured to send the reference sub-text to a client of the target object, so as to obtain, through the client, update information input by the target object;

and the receiving submodule 906 is used for receiving the update information returned by the client.

As an alternative embodiment, as shown in fig. 10, the apparatus further includes:

a display unit 1002, configured to display, by a client, a reference sub-text and an element information interface after sending the reference sub-text to the client of the target object, where the element information interface displays a target element and a reference element value of the target element, and the reference element value is an element value of the target element extracted from the reference sub-text by the candidate split mode;

a detection unit 1004 for detecting a target operation performed by a target object on the element information interface, wherein the target operation is used for updating a reference element value of the target element;

a generating unit 1006, configured to generate, in response to the target operation, update information corresponding to the target operation according to the updated reference element value.
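The generation of update information from the user's edits on the element information interface can be sketched as a comparison between the reference element values and the edited values. The structure of the update information is not specified in the application; the dictionary layout with `old` and `new` keys and the function name `make_update_info` are hypothetical.

```python
def make_update_info(reference, edited):
    """Collect the elements whose values the user changed (hypothetical helper).

    `reference` holds the element values extracted by the candidate splitting
    mode; `edited` holds the values after the user's target operation.
    """
    return {
        name: {"old": reference.get(name), "new": value}
        for name, value in edited.items()
        if reference.get(name) != value
    }
```

Only changed elements are reported, so an empty result would indicate that the user accepted the values produced by the candidate splitting mode.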

As an alternative embodiment, as shown in fig. 11, the adjusting module 806 includes:

the processing sub-module 1102 is configured to perform a patterning process on the update information to obtain the mode update information, where the patterning process includes: retaining the target element in the update information, performing fuzzification processing on the elements other than the target element, and retaining the separator;

and the updating submodule 1104 is configured to update the candidate split mode by using the mode update information, so as to obtain a target split mode.

As an alternative embodiment, as shown in fig. 12, the splitting unit 606 includes:

a second determining module 1202, configured to determine a target position of the target element in the target text based on the target splitting mode;

a third determining module 1204, configured to determine, according to a position relationship between the position of the target element and the text splitting position, a target splitting position corresponding to the target position;

the splitting module 1206 is configured to split the target text according to the target splitting position to obtain a plurality of target sub-texts.

As an alternative embodiment, as shown in fig. 13, the apparatus further includes:

a second obtaining unit 1302, configured to obtain a text order table of a plurality of target sub-texts after splitting the target text based on the target splitting mode to obtain the plurality of target sub-texts corresponding to the target text, where the text order table is used to indicate an order of the plurality of target sub-texts in the target text;

a generating unit 1304 for generating a target element information table corresponding to the target text, wherein the target element information table is a list of target elements included in each of the plurality of target sub-texts;

a holding unit 1306 for holding the text order table and the target element information table.

It should be noted that the above modules may be implemented by software or hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.

According to yet another aspect of embodiments herein, there is provided a computer-readable storage medium. Optionally, the storage medium has a computer program stored therein, where the computer program is configured to execute the steps in any one of the methods provided in the embodiments of the present application when the computer program is executed.

Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:

acquiring a target text, wherein the target text is a text to be split into a plurality of sub-texts;

determining a target splitting mode matched with the target text, wherein the target splitting mode is used for splitting the target text into a plurality of sub-texts according to the position of the target element in the target text;

and splitting the target text based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text.

Optionally, in this embodiment, the storage medium may include, but is not limited to: a variety of media that can store computer programs, such as a usb disk, a ROM (Read-only Memory), a RAM (Random Access Memory), a removable hard disk, a magnetic disk, or an optical disk.

According to still another aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor (which may be the processor 102 in fig. 1) and a memory (which may be the memory 104 in fig. 1) having a computer program stored therein, the processor being configured to execute the computer program to perform the steps of any of the above methods provided in embodiments of the present application.

Optionally, the electronic apparatus may further include a transmission device (the transmission device may be the transmission device 106 in fig. 1) and an input/output device (the input/output device may be the input/output device 108 in fig. 1), wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

Optionally, for an optional example in this embodiment, reference may be made to the examples described in the above embodiment and optional implementation, and this embodiment is not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method of text processing, comprising:

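acquiring a target text, wherein the target text is a text to be split into a plurality of sub-texts;

determining a target splitting mode matched with the target text, wherein the target splitting mode is used for splitting the target text into a plurality of sub-texts according to the position of a target element in the target text;

and splitting the target text based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text.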

2. The method of claim 1, wherein obtaining the target text comprises:

receiving a target scenario uploaded by a target object through a client, wherein the target scenario is a scenario to be split into a plurality of scenes, and the target scenario is the target text.

3. The method of claim 1, wherein determining the target splitting pattern that matches the target text comprises:

determining a candidate splitting mode matched with the target text from a plurality of splitting modes according to the position of a target element contained in the target text;

acquiring update information of the candidate splitting mode, wherein the update information is information obtained by updating element values of the target elements determined from the target text based on the candidate splitting mode;

and adjusting the candidate splitting mode by using the updating information to obtain the target splitting mode.

4. The method of claim 3, wherein obtaining the update information of the candidate splitting mode comprises:

splitting the target text based on the candidate splitting mode to obtain one or more reference sub-texts;

sending the reference sub-text to a client of a target object, so as to acquire, through the client, the update information input by the target object;

and receiving the updating information returned by the client.

5. The method of claim 4, wherein after sending the reference sub-text to the client of the target object, the method further comprises:

displaying, by the client, the reference sub-text and an element information interface, wherein a target element and a reference element value of the target element are displayed on the element information interface, and the reference element value is an element value of the target element extracted from the reference sub-text by the candidate split mode;

detecting a target operation executed by the target object on the element information interface, wherein the target operation is used for updating a reference element value of the target element;

and responding to the target operation, and generating the updating information corresponding to the target operation according to the updated reference element value.

6. The method of claim 3, wherein using the update information to adjust the candidate splitting mode to obtain the target splitting mode comprises:

performing a patterning process on the update information to obtain mode update information, wherein the patterning process includes: retaining the target element in the update information, performing fuzzification processing on elements other than the target element, and retaining a separator;

and updating the candidate splitting mode by using the mode update information to obtain the target splitting mode.

7. The method of claim 1, wherein splitting the target text based on the target splitting mode to obtain the plurality of target sub-texts corresponding to the target text comprises:

determining a target position of the target element in the target text based on the target split mode;

determining a target splitting position corresponding to the target position according to the position relation between the position of the target element and the text splitting position;

and splitting the target text according to the target splitting position to obtain a plurality of target sub-texts.

8. The method according to any one of claims 1 to 7, wherein after splitting the target text based on the target splitting pattern, obtaining the plurality of target sub-texts corresponding to the target text, the method further comprises:

acquiring a text sequence table of the target sub-texts, wherein the text sequence table is used for representing the sequence of the target sub-texts in the target text;

generating a target element information table corresponding to the target text, wherein the target element information table is a list of target elements contained in each of the plurality of target sub-texts;

and saving the text sequence table and the target element information table.

9. A text processing apparatus, comprising:

a first obtaining unit, configured to obtain a target text, where the target text is a text to be split into a plurality of sub-texts;

a determining unit, configured to determine a target splitting pattern matched with the target text, where the target splitting pattern is used to split the target text into multiple sub-texts according to a position of a target element in the target text;

and the splitting unit is used for splitting the target text based on the target splitting mode to obtain a plurality of target sub-texts corresponding to the target text.

10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 8 when executed.

11. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 8 by means of the computer program.