CN117743809B - A cell detection data preprocessing method, device and storage medium - Google Patents

A cell detection data preprocessing method, device and storage medium

Publication number
CN117743809B
Authority
CN
China
Prior art keywords
data
rule
target
sub
rules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410189827.5A
Other languages
Chinese (zh)
Other versions
CN117743809A (en
Inventor
张秋云
许吾琴
郑进芳
陈广勇
刘扶芮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202410189827.5A
Publication of CN117743809A
Application granted
Publication of CN117743809B
Status: Active
Anticipated expiration

Abstract

This specification discloses a cell detection data preprocessing method, device and storage medium, which obtains each cell detection data to be processed and the configuration information corresponding to each cell detection data, wherein the configuration information at least includes several extraction rules, and for each sub-data contained in each cell detection data, the data identifier of the sub-data is determined, and according to the data identifier, each extraction rule corresponding to the data identifier is determined, and for each extraction rule in turn, the corresponding target value is extracted from the sub-data through the extraction rule, and the target data corresponding to the sub-data is determined in the order of extracting each target value, and the result data is determined according to the target data corresponding to each sub-data. Through the extraction rules, the extraction of each data value in the cell detection data can be realized, and the extracted data values can be rearranged and integrated into the result data output, and the result data can be used for data analysis of machine learning algorithms, which improves the preprocessing efficiency compared with manual preprocessing.

Description

Cell detection data preprocessing method, device and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for preprocessing cell detection data, and a storage medium.
Background
Analyzing cell detection data from different growth stages with machine learning algorithms can reveal the underlying patterns in the physiological changes of biological cells and provide theoretical support for drug research and development.
Cell detection data is obtained with a dedicated detection instrument. The same batch of cell samples is usually detected multiple times, and the data from each detection is analyzed by a machine learning algorithm. Because current machine learning algorithms accept only certain data formats, the detection data must be preprocessed before it can be input into the algorithm. Existing preprocessing is usually done manually and is inefficient. To improve the efficiency of preprocessing cell detection data, this specification provides a cell detection data preprocessing method.
Disclosure of Invention
The present disclosure provides a method, apparatus, storage medium and electronic device for preprocessing cell detection data, so as to at least partially solve the above-mentioned problems in the prior art.
The technical solution adopted in this specification is as follows:
This specification provides a cell detection data preprocessing method, comprising:
acquiring each piece of cell detection data to be processed and configuration information corresponding to each piece of cell detection data, wherein the configuration information includes at least a plurality of extraction rules;
for each piece of sub-data contained in each piece of cell detection data, determining a data identifier of the sub-data, and determining, according to the data identifier, each extraction rule corresponding to the sub-data;
for each extraction rule in turn, extracting a corresponding target value from the sub-data by means of the extraction rule;
determining, according to the order in which the target values are extracted, target data corresponding to the sub-data, the target data being composed of the target values;
determining result data according to the target data corresponding to each piece of sub-data.
Optionally, the configuration information further includes a type identifier, and the data types of the cell detection data include at least a database type, a data stream type and a data file type;
before determining the data identifier of each piece of sub-data contained in each piece of cell detection data, the method further comprises:
determining the data type of each piece of cell detection data according to the type identifier;
when the data type is the database type, taking the data value of one key, together with each data value corresponding to that key, as one piece of sub-data;
when the data type is the data stream type, taking the payload carried in one transmission of the data stream as one piece of sub-data;
when the data type is the data file type, taking one line of data as one piece of sub-data.
Optionally, the configuration information further includes data processing rules, each data processing rule including at least a rule tag and a plurality of extraction rules;
determining, according to the data identifier, each extraction rule corresponding to the sub-data specifically comprises:
matching the data identifier against the rule tags, and determining the rule tag corresponding to the data identifier as a target tag;
taking the data processing rule to which the target tag belongs as a target rule, and taking each extraction rule contained in the target rule as the extraction rules corresponding to the sub-data.
Optionally, for each piece of sub-data contained in the cell detection data, determining the data identifier of the sub-data and determining, according to the data identifier, each extraction rule corresponding to the sub-data comprises:
acquiring update rule information, and determining the data processing rules updated by a user as update rules, wherein the update rule information is the information by which the user adds, deletes or modifies data processing rules in the configuration information;
determining the data processing rules contained in the update rules, and taking the data processing rules contained in the update rules together with the data processing rules contained in the configuration information as matching rules;
taking the rule tag of each data processing rule contained in the matching rules as a matching tag;
matching the data identifier against each matching tag, and determining the matching tag corresponding to the data identifier as a target tag;
taking the data processing rule to which the target tag belongs as a target rule, and taking each extraction rule contained in the target rule as the extraction rules corresponding to the sub-data.
Optionally, there are a plurality of target tags;
taking the data processing rule to which the target tag belongs as the target rule, and taking each extraction rule contained in the target rule as the extraction rules corresponding to the sub-data, specifically comprises:
determining the data processing rules to which the plurality of target tags respectively belong as candidate rules;
determining the priority and the setting time of each candidate rule, taking the candidate rule with the highest priority and the latest setting time as the target rule, and taking the extraction rules contained in the target rule as the extraction rules corresponding to the sub-data.
Optionally, the configuration information further includes a plurality of execution rules;
determining, according to the order in which the target values are extracted, the target data corresponding to the sub-data specifically comprises:
storing each target value in an output buffer in the order in which the target values are extracted;
for each execution rule, determining the operation specified by the execution rule and the target values required by the operation, and applying the operation to those target values to determine an operation result;
storing the operation result corresponding to each execution rule in the output buffer in turn;
outputting the data in the output buffer in the order in which it was stored to obtain the target data corresponding to the sub-data.
Optionally, determining the result data according to the target data corresponding to each piece of sub-data specifically comprises:
determining the arrangement order of the pieces of sub-data, and obtaining the target data corresponding to each piece of sub-data in turn according to the arrangement order;
determining the result data according to each piece of target data.
This specification provides a cell detection data preprocessing apparatus, comprising:
an acquisition module, configured to acquire each piece of cell detection data to be processed and configuration information corresponding to the cell detection data, wherein the configuration information includes at least a plurality of extraction rules;
an extraction rule determining module, configured to determine the data identifier of each piece of sub-data contained in the cell detection data, and to determine, according to the data identifier, each extraction rule corresponding to the sub-data;
a target value determining module, configured to extract, for each extraction rule in turn, the corresponding target value from the sub-data by means of the extraction rule;
a target data determining module, configured to determine, according to the order in which the target values are extracted, the target data corresponding to the sub-data, the target data being composed of the target values;
a result data determining module, configured to determine result data according to the target data corresponding to each piece of sub-data.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the above cell detection data preprocessing method.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above cell detection data preprocessing method when executing the program.
At least one of the technical solutions adopted in this specification can achieve the following beneficial effects:
In the cell detection data preprocessing method provided by this specification, each piece of cell detection data to be processed and its corresponding configuration information are acquired, the configuration information including at least a plurality of extraction rules. For each piece of sub-data contained in each piece of cell detection data, the data identifier of the sub-data is determined, and each extraction rule corresponding to the data identifier is determined according to the data identifier. For each extraction rule in turn, the corresponding target value is extracted from the sub-data by that extraction rule. The target data corresponding to the sub-data is determined according to the order in which the target values are extracted, and the result data is determined according to the target data corresponding to each piece of sub-data. The extraction rules in the configuration information thus make it possible to extract each data value from the cell detection data and to rearrange and integrate the extracted values into result data for output.
Drawings
The accompanying drawings are included to provide a further understanding of this specification. The exemplary embodiments of this specification and their descriptions serve to explain the specification and are not intended to limit it unduly. In the drawings:
FIG. 1 is a schematic flow chart of a method for preprocessing cell detection data in the present specification;
FIG. 2 is an example of partial result data provided herein;
FIG. 3 is a schematic diagram of the content of one configuration information provided in the present specification;
FIG. 4 is a schematic diagram of a device for preprocessing cell detection data provided in the present specification;
FIG. 5 is a schematic diagram of the electronic device corresponding to FIG. 1 provided in the present specification.
Detailed Description
To make the objects, technical solutions and advantages of this specification clearer, the technical solutions of this specification are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments that one of ordinary skill in the art can obtain from the embodiments herein without undue burden fall within the scope of the present application.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of the cell detection data preprocessing method in this specification, which specifically includes the following steps:
S100: Acquire each piece of cell detection data to be processed and the configuration information corresponding to each piece of cell detection data, where the configuration information includes at least a plurality of extraction rules.
All steps in the cell detection data preprocessing method provided in the present specification can be implemented by any electronic device having a computing function, such as a terminal, a server, and the like. For convenience of description, the method for preprocessing cell detection data provided in the present specification will be described below with only a server as an execution subject.
When a user applies the preprocessing method provided in this specification, the cell detection data to be processed and the configuration information corresponding to each piece of cell detection data must be input into the server together. The server acquires the cell detection data and the configuration information, and preprocesses each piece of cell detection data according to the configuration information. The configuration information includes at least a plurality of extraction rules.
The extraction rules are used to extract the data values contained in the cell detection data. The server then rearranges the extracted data values, or performs specific operations on them, to obtain result data suitable for analysis by a machine learning algorithm.
Different application scenarios, such as different cell samples or different research purposes, call for different data processing operations during preprocessing. Each preprocessing run therefore requires configuration information adapted to the cell detection data being preprocessed in that run.
S102: For each piece of sub-data contained in each piece of cell detection data, determine the data identifier of the sub-data, and determine, according to the data identifier, each extraction rule corresponding to the data identifier.
First, for each piece of sub-data contained in each piece of cell detection data, the server determines the data identifier of the sub-data.
The data identifier is the identity of the sub-data: it indicates which piece of sub-data within the cell detection data this is. The data identifier may be determined from the order in which the server reads the pieces of sub-data.
Then, the server determines each extraction rule corresponding to the sub-data according to the data identifier.
Specifically, the configuration information also includes data processing rules, each of which includes at least a rule tag and a plurality of extraction rules. The rule tag indicates which sub-data the data processing rule applies to, and corresponds to a data identifier. The server matches the data identifier of the sub-data against the rule tags of the data processing rules, and determines the rule tag corresponding to the data identifier as the target tag. The data processing rule of a piece of sub-data represents the data processing operations to be performed on that sub-data.
The server takes the data processing rule to which the target tag belongs as the target rule, and takes each extraction rule contained in the target rule as the extraction rules corresponding to the sub-data.
In the preprocessing method of the present specification, different data processing operations may be performed on each sub-data according to actual requirements, that is, the extraction rules corresponding to different sub-data may be different.
In one or more embodiments of this specification, the server may preset a naming rule for rule tags and match rule tag names against data identifiers. In the configuration information, the rule tag of each data processing rule must be named according to the preset naming rule. For each piece of sub-data, the server matches the data identifier of the sub-data against the rule tags and determines the corresponding rule tag from the correspondence between the names.
For example, if the preset naming rule is that a rule tag takes the data identifier of its corresponding sub-data as a suffix, the server can determine the rule tag corresponding to a data identifier by fuzzy matching the data identifier against the rule tags.
Through the data identifier, the extraction rules corresponding to each piece of sub-data can be found among all the extraction rules contained in the configuration information, and data is then extracted from each piece of sub-data according to its extraction rules.
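The suffix-based naming convention can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this specification: the tag format `rule_<data identifier>`, the function name `find_rule_tag`, and the use of an exact suffix comparison as a stand-in for fuzzy matching are all hypothetical.

```python
from typing import Optional


def find_rule_tag(data_id: str, rule_tags: list[str]) -> Optional[str]:
    """Return the first rule tag whose suffix equals the data identifier."""
    for tag in rule_tags:
        # Assumed naming rule: "<prefix>_<data identifier>".
        # Comparing the whole suffix keeps "q1" from matching "rule_q11".
        if tag.rsplit("_", 1)[-1] == data_id:
            return tag
    return None


tags = ["rule_q1", "rule_q2", "rule_q11"]
matched = find_rule_tag("q1", tags)
```

Splitting on the delimiter instead of calling `str.endswith` is a deliberate choice here: with purely numeric identifiers, a plain suffix test would let identifier `1` match a tag ending in `11`.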
S104: For each extraction rule in turn, extract the corresponding target value from the sub-data by means of the extraction rule.
The extraction rules are used to extract the data values contained in the cell detection data. After determining the extraction rules corresponding to the sub-data, the server executes them in turn to extract data from the sub-data.
For each extraction rule, the server determines the position of the target value to be extracted by that rule, that is, which of the data values contained in the sub-data is the target value, and extracts data from the sub-data at the determined position to obtain the target value corresponding to the extraction rule.
Extracting the data values of each piece of sub-data in this way extracts every data value contained in the cell detection data, so that the extracted values can be rearranged and integrated into the result data.
S106: Determine, according to the order in which the target values are extracted, the target data corresponding to the sub-data, the target data being composed of the target values.
Each piece of sub-data corresponds to a plurality of extraction rules. The server executes the extraction rules in turn to obtain the target value corresponding to each rule, and the target values, arranged in extraction order, form the target data corresponding to the sub-data.
Specifically, the server stores each target value in the output buffer in the order of extraction, and then outputs the target values from the output buffer in the order in which they were stored, obtaining the target data corresponding to the sub-data.
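Steps S104 and S106 can be sketched together. The sketch assumes that an extraction rule is nothing more than a 0-based position in the sub-data and that the output buffer is an in-memory list; the class `ExtractionRule`, the function `build_target_data` and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExtractionRule:
    position: int  # assumed: 0-based index of the data value to extract


def build_target_data(sub_data: list[str], rules: list[ExtractionRule]) -> list[str]:
    """Run the rules in order and collect target values in an output buffer."""
    output_buffer: list[str] = []
    for rule in rules:  # rules are executed in their configured order
        output_buffer.append(sub_data[rule.position])
    return output_buffer  # values are output in the order they were stored


# Made-up sub-data: one record with four data values.
sub_data = ["sample_07", "0.82", "1530", "G1"]
rules = [ExtractionRule(2), ExtractionRule(0), ExtractionRule(1)]
target_data = build_target_data(sub_data, rules)
```

Because the buffer preserves insertion order, reordering the rules in the configuration is enough to reorder the columns of the target data.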
S108: Determine the result data according to the target data corresponding to each piece of sub-data.
Because the purpose of preprocessing is to obtain a normalized table, the server needs to determine the arrangement order of the pieces of sub-data and output the target data corresponding to each piece of sub-data in that order, finally obtaining result data composed of the pieces of target data.
FIG. 2 is an example of partial result data provided in this specification. As shown in FIG. 2, each line of the result data is the target data corresponding to one piece of sub-data. Outputting the target data corresponding to each piece of sub-data in sequence yields the result data composed of the pieces of target data.
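As a rough sketch of step S108, the target data of each piece of sub-data can be written out one row at a time in the arrangement order. The specification does not state the output format of the result data; CSV is assumed here purely for illustration, and `build_result_data`, the identifiers and the values are hypothetical.

```python
import csv
import io


def build_result_data(target_by_id: dict[str, list[str]], order: list[str]) -> str:
    """Emit one row of target data per piece of sub-data, in the given order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for data_id in order:
        writer.writerow(target_by_id[data_id])
    return buffer.getvalue()


# Made-up target data, deliberately stored out of order.
target_by_id = {"q2": ["0.91", "1498"], "q1": ["0.82", "1530"]}
result = build_result_data(target_by_id, ["q1", "q2"])
```

Passing the arrangement order explicitly, rather than relying on the order in which target data happened to be produced, keeps the rows of the table aligned with the original sub-data.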
In the cell detection data preprocessing method provided by this specification, each piece of cell detection data to be processed and its corresponding configuration information are acquired, the configuration information including at least a plurality of extraction rules. For each piece of sub-data contained in each piece of cell detection data, the data identifier of the sub-data is determined, and each extraction rule corresponding to the data identifier is determined according to the data identifier. For each extraction rule in turn, the corresponding target value is extracted from the sub-data by that extraction rule. The target data corresponding to the sub-data is determined according to the order in which the target values are extracted, and the result data is determined according to the target data corresponding to each piece of sub-data. The extraction rules in the configuration information thus make it possible to extract each data value from the cell detection data and to rearrange and integrate the extracted values into result data for output.
In step S102, the preprocessing method in this specification is compatible with different types of cell detection data, but different data types require different implementations. Therefore, before determining the pieces of sub-data that make up each piece of cell detection data, the server also needs to determine the data type of the cell detection data, which it does according to the type identifier.
The cell detection data is input by the user, so the configuration information input by the user also includes a type identifier that characterizes the data type of the cell detection data.
The server acquires the type identifier and determines the data type of the cell detection data. The data types may include the database type, the data stream type, the data file type and so on; different data types represent different storage forms of the cell detection data. When the data type is the database type, the cell detection data is stored in a database, and the server takes the data value of one key stored in the database, together with each data value corresponding to that key, as one piece of sub-data. When the data type is the data stream type, the cell detection data is stored on a network server and is encapsulated in payloads transmitted over the network using a protocol such as TCP or UDP; the server takes the payload carried in one transmission of the data stream as one piece of sub-data. When the data type is the data file type, the cell detection data is stored in a file, and the server takes one line of data in the file as one piece of sub-data.
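The three ways of splitting cell detection data into sub-data can be sketched as one dispatch function. Everything concrete here is an assumption made for illustration: the type identifier strings `"database"`, `"stream"` and `"file"`, the in-memory stand-ins for a database, a payload sequence and a file, and the comma-separated, UTF-8 encoding of the values.

```python
from typing import Any, Iterator


def split_sub_data(type_id: str, source: Any) -> Iterator[list[str]]:
    """Yield one piece of sub-data at a time according to the type identifier."""
    if type_id == "database":
        # source: mapping of key -> list of data values for that key
        for key, values in source.items():
            yield [key, *values]
    elif type_id == "stream":
        # source: iterable of bytes payloads, one payload per transmission
        for payload in source:
            yield payload.decode("utf-8").split(",")
    elif type_id == "file":
        # source: iterable of text lines
        for line in source:
            yield line.rstrip("\n").split(",")
    else:
        raise ValueError(f"unknown type identifier: {type_id}")


db_rows = list(split_sub_data("database", {"s1": ["0.82", "1530"]}))
stream_rows = list(split_sub_data("stream", [b"s1,0.82,1530"]))
file_rows = list(split_sub_data("file", ["s1,0.82,1530\n"]))
```

In this sketch the same record produces the same list of data values whichever storage form it arrives in, so the later extraction steps do not need to know the data type.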
It should be noted that, although cell detection data of different data types is stored in different forms, when the same cell detection data is stored as different data types, the sub-data at the same position in the order contains the same data values.
Because cell detection data is obtained from multiple detections, usually performed at preset time nodes spanning the growth cycles of the cells, the user can choose how to store the detection data as required. For example, if part of the cell detection data is stored on the local server and another part on another server, then when preprocessing is performed locally, the cell detection data stored on the local server may be input as a server type, and the cell detection data stored on the other server may be input as the data stream type.
Therefore, to make the preprocessing method more convenient, the server can preprocess multiple different types of cell detection data together. The server may use the data identifier to characterize the data type; in this case, the data identifier is used both to determine which piece of sub-data within the cell detection data this is and to determine the data type of the cell detection data to which the sub-data belongs.
For example, for cell detection data of the database type, the server specifies data identifier values of "k1, k2, k3, …", which indicate the order of the key corresponding to each piece of sub-data in the database. For cell detection data of the data stream type, the server specifies data identifier values of "p1, p2, p3, …", which indicate the order of each piece of sub-data in network transmission. For cell detection data of the data file type, the server specifies data identifier values of "q1, q2, q3, …", which indicate the order of each piece of sub-data in the data file.
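A data identifier that encodes both the data type and the reading order can be sketched as a prefix plus a 1-based counter. The prefixes `k`, `p` and `q` come from the example above; the type names used as dictionary keys and the helper names `make_data_id` and `parse_data_id` are hypothetical.

```python
PREFIX = {"database": "k", "stream": "p", "file": "q"}


def make_data_id(type_id: str, read_order: int) -> str:
    """Build an identifier such as "p3" from a data type and a 1-based order."""
    return f"{PREFIX[type_id]}{read_order}"


def parse_data_id(data_id: str) -> tuple[str, int]:
    """Recover the data type and the order from an identifier."""
    types = {prefix: name for name, prefix in PREFIX.items()}
    return types[data_id[0]], int(data_id[1:])


third_payload = make_data_id("stream", 3)
parsed = parse_data_id("q12")
```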
When extracting data from the sub-data, the positions and order of the target values to be extracted may be identical for multiple pieces of sub-data; that is, the same data processing operations are performed on multiple pieces of sub-data. As described in S102, the configuration information includes data processing rules, each including at least a rule tag and a plurality of extraction rules. In this case the data processing rules corresponding to these pieces of sub-data contain the same extraction rules, so to avoid code redundancy and improve preprocessing efficiency, a general rule may be set in the configuration information whose rule tag matches the data identifiers of multiple pieces of sub-data.
For example, if the same data processing operations are performed on the 1st to 100th pieces of sub-data, these 100 pieces of sub-data correspond to one general rule, whose rule tag matches the data identifiers of the 1st to 100th pieces of sub-data.
In addition, cell detection data of the database type can be located and looked up quickly by key and sub-key. The key is the identity of each piece of sub-data and determines the row in which the sub-data is stored in the database; the sub-key determines the column in which each data value contained in the sub-data is stored.
Therefore, for cell detection data of the database type, the server need not distinguish the order of the keys and can directly specify a single, unified data identifier for every piece of sub-data. For example, the server specifies the data identifier of database-type cell detection data as "db". That is, for database-type cell detection data, all pieces of sub-data share the same data identifier; the extraction rules corresponding to each piece of sub-data are determined from this data identifier, and each extraction rule locates the target value to be extracted by "key + sub-key".
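The "key + sub-key" lookup can be sketched with a nested dictionary standing in for the database. This is only an illustration: the representation of an extraction rule as a `(key, sub_key)` pair, the function `extract_by_key`, and the keys, sub-keys and values are all made up.

```python
def extract_by_key(
    database: dict[str, dict[str, str]],
    rules: list[tuple[str, str]],
) -> list[str]:
    """Extract target values in rule order; the key selects the row, the sub-key the column."""
    return [database[key][sub_key] for key, sub_key in rules]


# Made-up database contents: two rows with two columns each.
database = {
    "cell_001": {"area": "152.4", "intensity": "0.82"},
    "cell_002": {"area": "149.9", "intensity": "0.91"},
}
# Rules attached to the single shared data identifier "db".
rules = [("cell_001", "intensity"), ("cell_002", "area")]
targets = extract_by_key(database, rules)
```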
In the preprocessing method provided in this specification, the user may change previously set data processing rules, and the server determines the data processing rule corresponding to each piece of sub-data according to both the updated data processing rules and the data processing rules in the configuration information.
Specifically, the server first acquires update rule information and determines the data processing rules updated by the user as update rules, where the update rule information is the information by which the user adds, deletes or modifies data processing rules in the configuration information. Next, the server determines the data processing rules contained in the update rules, and takes these together with the data processing rules contained in the configuration information as matching rules. Then, the server takes the rule tag of each data processing rule contained in the matching rules as a matching tag, matches the data identifier of the sub-data against the matching tags, determines the matching tag corresponding to the data identifier as the target tag, and takes the data processing rule to which the target tag belongs as the target rule. Finally, the server takes each extraction rule contained in the target rule as the extraction rules corresponding to the sub-data.
The data processing rules contained in the configuration information are those first set by the user. After the user inputs update rule information, the same piece of sub-data may have multiple corresponding target tags, meaning that the sub-data has multiple corresponding data processing rules. However, a data processing rule represents the data processing operations to be performed on the sub-data, and a piece of sub-data can be processed in only one way, so a piece of sub-data can have only one corresponding data processing rule.
The server respectively determines data processing rules to which a plurality of target labels belong as candidate rules, screens each candidate rule according to preset screening rules, and determines the target rules. The screening rules may be determined in a variety of ways, and this is not a limitation in this specification.
In one or more embodiments of the present description, the screening rules are determined based on the setup time of each candidate rule. Specifically, the server determines the setting time of each candidate rule, and takes the data processing rule with the latest setting time as the target rule of the sub-data. Thus, the server can perform data processing on each sub-data according to the data processing rule which is input by the user recently.
For example, among the data processing rules included in the configuration information, there is a general rule having a rule tag of "rule_1:100", and among the data processing rules included in the update rule information, the user updates the data processing rule of the first sub-data, re-inputs a data processing rule, and the re-input data processing rule has a rule tag of "rule_1". The server takes two rule labels of 'rule_1:100' and 'rule_1' as target labels corresponding to the data identification of the first sub-data, takes the data processing rules corresponding to the two target labels as candidate rules, screens according to the setting time of each candidate rule, and finally takes the candidate rule corresponding to the rule label 'rule_1' as the target rule. When the server processes the first sub-data, the data processing rule corresponding to "rule_1" is set as the reference, and the data processing rule corresponding to "rule_1:100" is not followed. Therefore, the user can flexibly change the data processing rules in the configuration information according to the actual requirements.
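The "rule_1:100" versus "rule_1" example can be sketched in Python as follows. This is only an illustration under stated assumptions: the tag syntax is read as "rule_<a>:<b>" covering data identifiers a through b and "rule_<a>" covering identifier a alone, and the names `Rule`, `tag_covers` and `target_rule_by_time`, as well as the integer setting times, are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    tag: str       # rule tag, e.g. "rule_1:100" or "rule_1"
    set_time: int  # hypothetical setting time; a larger value means set later


def tag_covers(tag: str, data_id: int) -> bool:
    """Assumed tag syntax: 'rule_<a>:<b>' covers ids a..b, 'rule_<a>' covers id a."""
    span = tag[len("rule_"):]
    first, _, last = span.partition(":")
    low = int(first)
    high = int(last) if last else low
    return low <= data_id <= high


def target_rule_by_time(rules: List[Rule], data_id: int) -> Optional[Rule]:
    """Collect the candidate rules for a data identifier and keep the latest one."""
    candidates = [rule for rule in rules if tag_covers(rule.tag, data_id)]
    return max(candidates, key=lambda rule: rule.set_time, default=None)


# The rule from the configuration information is set first, the update later.
rules = [Rule("rule_1:100", set_time=1), Rule("rule_1", set_time=2)]
first_sub_data_rule = target_rule_by_time(rules, 1)
second_sub_data_rule = target_rule_by_time(rules, 2)
```

Under these assumptions the first sub-data resolves to "rule_1", while the second sub-data still falls back to the general rule "rule_1:100".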
In one or more embodiments of the present specification, the screening rule is determined based on the priority of each candidate rule. Specifically, the server initially assigns the highest priority to the data processing rules contained in the configuration information. If the user later updates a data processing rule, the server sets the priority of the update rule to the highest priority and lowers the priority of the data processing rules contained in the configuration information. This ensures that the rule most recently set by the user has the highest priority, so that the server performs the data processing operation on each piece of sub-data according to the most recently set rule.
However, after the data processing rules have been updated many times, the same sub-data may have a large number of candidate rules, and screening the target rule from them by priority alone or by setting time alone takes considerable time, which reduces preprocessing efficiency.
Therefore, the server may set a modification threshold and reset the priorities once when the number of user modifications equals the modification threshold. That is, whenever update rule information is acquired, the server determines whether the number of times update rule information has been acquired equals the modification threshold. If not, the priority of the update rule is set to the highest priority and the priorities of the previously set data processing rules are left unchanged. If so, the priority of the update rule is set to the highest priority, the update rules from the most recent acquisitions, as many as the modification threshold, are taken as an update rule group, and the priority of every data processing rule outside the update rule group is lowered by one level. In this case, several matching rules may share the same priority.
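The threshold-based priority reset can be sketched as below. The sketch rests on several assumptions that the specification does not state: priorities are encoded as integers with 0 as the highest level and each demotion subtracting 1, the modification count restarts after each reset, every update adds a new rule (deletion and modification are not modeled), and all rules from the configuration information are registered before any update. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

HIGHEST = 0  # assumed encoding: 0 is the highest level, each demotion subtracts 1


@dataclass
class ManagedRule:
    tag: str
    priority: int
    set_time: int  # position in arrival order, used as a stand-in for setting time


class RuleStore:
    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.rules: List[ManagedRule] = []
        self.updates_seen = 0

    def _append(self, tag: str) -> None:
        self.rules.append(ManagedRule(tag, HIGHEST, set_time=len(self.rules)))

    def add_config_rule(self, tag: str) -> None:
        """Register a rule from the initial configuration information."""
        self._append(tag)

    def apply_update(self, tag: str) -> None:
        """Register an update rule and demote older rules when the threshold is hit."""
        self._append(tag)
        self.updates_seen += 1
        if self.updates_seen == self.threshold:
            # The last `threshold` entries form the update rule group;
            # every rule outside that group drops by one level.
            for rule in self.rules[:-self.threshold]:
                rule.priority -= 1
            self.updates_seen = 0  # assumption: counting restarts after a reset


store = RuleStore(threshold=2)
store.add_config_rule("rule_1:100")
for tag in ["rule_1", "rule_2", "rule_3", "rule_4"]:
    store.apply_update(tag)
priorities = [rule.priority for rule in store.rules]
```

With a threshold of 2, the four updates trigger two resets, so the rules end up in three priority groups instead of five distinct levels.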
The server can then screen the candidate rules by priority and setting time together to determine the target rule. The server determines the priority and setting time of each candidate rule, using priority as the first screening condition and setting time as the second. It first determines the candidate rules with the highest priority. If there is only one such rule, it is taken as the target rule. If there are several, the second screening condition is applied, and the one with the latest setting time among them is taken as the target rule.
Therefore, when the server determines the target rule from the plurality of candidate rules, the priority or the setting time of each data processing rule does not need to be judged one by one, each data processing rule with the same priority can be used as a group, and the server only needs to judge the priority of each group of data processing rules once, so that the judgment times of the priority are reduced, the speed of determining the target rule is increased, and the preprocessing efficiency is improved.
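A minimal sketch of the two-stage screening, assuming that a larger integer means a higher priority and a larger setting time means a later one; `Candidate` and `select_target` are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    tag: str
    priority: int  # assumed encoding: larger value means higher priority
    set_time: int  # assumed encoding: larger value means set later


def select_target(candidates: List[Candidate]) -> Candidate:
    """First screen by priority, then break ties by the latest setting time."""
    top = max(c.priority for c in candidates)
    top_group = [c for c in candidates if c.priority == top]
    if len(top_group) == 1:
        return top_group[0]
    return max(top_group, key=lambda c: c.set_time)


candidates = [
    Candidate("rule_1:100", priority=-1, set_time=0),
    Candidate("rule_1", priority=0, set_time=1),
    Candidate("rule_1:5", priority=0, set_time=2),
]
chosen = select_target(candidates)
```

Here the lower-priority rule is discarded as a group in the first pass, and setting time is compared only among the two rules that share the highest priority.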
In step S106 above, to support more complex data analysis scenarios, the server may also perform calculations on the cell detection data and obtain the result data from both the calculation results and the extracted target data. The server operates on each piece of sub-data contained in the cell detection data according to execution rules, so the configuration information also contains a number of execution rules.
Specifically, the server stores the target values in the output buffer in the order in which they are extracted. For each execution rule, the server determines the operation of the execution rule and the target values the operation requires, and obtains the operation result corresponding to the execution rule by applying the operation to those target values. It then stores the operation results corresponding to the execution rules in the output buffer in sequence, and outputs the data in the output buffer in storage order to obtain the target data corresponding to the sub-data.
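The output buffer flow can be illustrated with the sketch below. The representation is an assumption: each extraction rule is modeled as a callable applied to the sub-data, and each execution rule as a pair of an operation and the positions of the target values it needs. The function name `build_target_data` and the sample line are hypothetical.

```python
import operator
from typing import Any, Callable, List, Sequence, Tuple

ExtractionRule = Callable[[Sequence[str]], Any]
ExecutionRule = Tuple[Callable[..., Any], Tuple[int, ...]]


def build_target_data(
    sub_data: Sequence[str],
    extraction_rules: List[ExtractionRule],
    execution_rules: List[ExecutionRule],
) -> List[Any]:
    """Fill the output buffer with target values, then with operation results."""
    output_buffer: List[Any] = []
    for extract in extraction_rules:
        output_buffer.append(extract(sub_data))
    target_values = list(output_buffer)  # operations read only extracted values
    for operation, positions in execution_rules:
        operands = [target_values[i] for i in positions]
        output_buffer.append(operation(*operands))
    return output_buffer


# Hypothetical sub-data: one line of a data file, already split into fields.
fields = "cell_7,12.0,3.0".split(",")
extraction_rules = [
    lambda f: f[0],
    lambda f: float(f[1]),
    lambda f: float(f[2]),
]
execution_rules = [
    (operator.truediv, (1, 2)),  # ratio of the two extracted measurements
    (operator.add, (1, 2)),      # sum of the two extracted measurements
]
target_data = build_target_data(fields, extraction_rules, execution_rules)
```

The resulting list keeps the three extracted values first, followed by the two operation results, which mirrors the storage order described above.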
Fig. 3 is a schematic content diagram of configuration information provided in the present specification, where, as shown in fig. 3, the configuration information includes a type identifier and a data processing rule, and the data processing rule includes a rule tag, m extraction rules, and n execution rules.
The configuration information shown in fig. 3 contains only one data processing rule; in practice it may contain several.
It should also be noted that fig. 3 shows only one way of composing the configuration information. The server may preset other composition modes, and the user needs to write the configuration information according to the preset composition mode. For example, the rule tag may instead serve directly as the identifier of an extraction rule or execution rule, in which case the configuration information contains a number of extraction rules and execution rules, each of which includes a rule tag.
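One possible in-memory shape for the composition of fig. 3 is sketched below. The field names, the type identifier value "data_file" and the rule strings are placeholders chosen for illustration; the specification does not define a concrete syntax for extraction or execution rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProcessingRule:
    rule_tag: str
    extraction_rules: List[str] = field(default_factory=list)  # m extraction rules
    execution_rules: List[str] = field(default_factory=list)   # n execution rules


@dataclass
class ConfigInfo:
    type_id: str  # type identifier of the cell detection data
    rules: List[DataProcessingRule] = field(default_factory=list)


# Placeholder content: two extraction rules and one execution rule.
config = ConfigInfo(
    type_id="data_file",
    rules=[
        DataProcessingRule(
            rule_tag="rule_1:100",
            extraction_rules=["column 2", "column 5"],
            execution_rules=["column 2 / column 5"],
        )
    ],
)
```

Keeping the rules in a list reflects the note that the configuration information may hold several data processing rules in practice.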
Data may be lost while the cell detection data is being acquired, and result data obtained after such loss would distort the data analysis. To guard against this, the server acquires the total number of sub-data contained in the cell detection data before acquiring the cell detection data itself.
The server then determines the target data for each piece of sub-data in sequence, following the order of the sub-data. After all target data have been obtained, the server determines from the sequence number of the last sub-data how many pieces of sub-data underwent the data processing operation, which gives the execution number. The server compares the execution number with the acquired total number: if they are consistent, no data loss occurred and the result data can be used normally; if they are inconsistent, the result data cannot be used for data analysis.
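The data loss check can be sketched as follows, assuming that each processed sub-data carries a 1-based sequence number and that the execution number is read from the last one. The function names are hypothetical.

```python
from typing import Any, List, Tuple

Processed = List[Tuple[int, Any]]  # (sequence number, target data), in processing order


def execution_count(processed: Processed) -> int:
    """Read the execution number from the sequence number of the last sub-data."""
    return processed[-1][0] if processed else 0


def result_is_usable(expected_total: int, processed: Processed) -> bool:
    """The result data is usable only when the counts are consistent."""
    return execution_count(processed) == expected_total


processed = [(1, ["cell_1", 0.4]), (2, ["cell_2", 0.7]), (3, ["cell_3", 0.1])]
complete = result_is_usable(3, processed)
incomplete = result_is_usable(4, processed)
```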
The above is a method for preprocessing cell detection data provided in the present specification, and based on the same concept, the present specification further provides a corresponding device for preprocessing cell detection data, as shown in fig. 4.
Fig. 4 is a schematic diagram of a cell detection data preprocessing device provided in the present specification, which specifically includes:
the acquisition module 200 is used for acquiring each piece of cell detection data to be processed and configuration information corresponding to each piece of cell detection data, wherein the configuration information at least comprises a plurality of extraction rules;
the extraction rule determining module 202 determines, for each piece of sub-data included in the cell detection data, a data identifier of the piece of sub-data, and determines, according to the data identifier, each extraction rule corresponding to the piece of sub-data;
the target value determining module 204 sequentially extracts, for each extraction rule, a corresponding target value from the sub data by the extraction rule;
the target data determining module 206 determines, according to the order in which the target values are extracted, the target data corresponding to the sub-data, the target data being composed of the target values;
the result data determining module 208 determines result data according to the target data corresponding to each sub data.
Optionally, the configuration information further includes a type identifier, where the data type of each cell detection data includes at least a database type, a data stream type, and a data file type, and the extraction rule determining module 202 is specifically configured to determine, according to the type identifier, the data type of each cell detection data, when the data type is the database type, a data value of a key and each data value corresponding to the key are used as one piece of sub-data, when the data type is the data stream type, a payload carried in one transmission of the data stream is used as one piece of sub-data, and when the data type is the data file type, one line of data is used as one piece of sub-data.
Optionally, the configuration information further includes a data processing rule, where the data processing rule includes at least a rule tag and a plurality of extraction rules, and the extraction rule determining module 202 is specifically configured to match the data identifier with the rule tags, determine the rule tag corresponding to the data identifier as a target tag, use the data processing rule to which the target tag belongs as a target rule, and use each extraction rule included in the target rule as an extraction rule corresponding to the sub-data.
Optionally, the extraction rule determining module 202 is specifically configured to obtain update rule information and determine the data processing rule updated by the user as an update rule, where the update rule information is information with which the user adds, deletes or modifies a data processing rule in the configuration information; determine the data processing rules included in the update rule, and use them together with the data processing rules included in the configuration information as matching rules; use the rule tag of each data processing rule included in the matching rules as a matching tag; match the data identifier with each matching tag and determine the matching tag corresponding to the data identifier as a target tag; use the data processing rule to which the target tag belongs as a target rule; and use each extraction rule included in the target rule as an extraction rule corresponding to the sub-data.
Optionally, there are multiple target tags, and the extraction rule determining module 202 is specifically configured to determine the data processing rules to which the multiple target tags belong as candidate rules, determine the priority and setting time corresponding to each candidate rule, use the candidate rule with the highest priority and the latest setting time as the target rule, and use each extraction rule included in the target rule as an extraction rule corresponding to the sub-data.
Optionally, the target data determining module 206 is specifically configured to sequentially store each target value in the output buffer according to the order of extracting each target value, sequentially determine, for each execution rule, an operation of the execution rule and a target value required by the operation, perform the operation on the target value required by the operation, determine an operation result, sequentially store the operation result corresponding to each execution rule in the output buffer, and output data in the output buffer according to the storage order, so as to obtain the target data corresponding to the sub data.
Optionally, the result data determining module 208 is specifically configured to determine an arrangement sequence of each piece of sub data, sequentially obtain the target data corresponding to each piece of sub data according to the arrangement sequence, and determine the result data according to each piece of target data.
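The way the extraction rule determining module 202 splits cell detection data into sub-data for the three data types can be sketched as below. The type identifier strings and the in-memory shapes (a dictionary for a database, a list of payloads for a data stream, a text string for a data file) are assumptions made for illustration only.

```python
from typing import Any, List


def split_sub_data(type_id: str, raw: Any) -> List[Any]:
    """Split cell detection data into sub-data according to its type identifier."""
    if type_id == "database":
        # One key together with its data values forms one piece of sub-data.
        return [(key, values) for key, values in raw.items()]
    if type_id == "data_stream":
        # The payload carried by one transmission forms one piece of sub-data.
        return list(raw)
    if type_id == "data_file":
        # One line of the file forms one piece of sub-data.
        return [line for line in raw.splitlines() if line]
    raise ValueError(f"unknown type identifier: {type_id!r}")


from_database = split_sub_data("database", {"cell_1": [0.4, 7]})
from_stream = split_sub_data("data_stream", [b"payload-1", b"payload-2"])
from_file = split_sub_data("data_file", "cell_1,0.4\ncell_2,0.7\n")
```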
The present specification also provides a computer readable storage medium storing a computer program operable to perform the cell detection data preprocessing method provided in fig. 1 above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 5. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, as illustrated in fig. 5, although other hardware required by other services may be included. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to implement the cell detection data preprocessing method described in fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
Improvements to a technology could once be clearly distinguished as improvements in hardware (for example, improvements to circuit structures such as diodes, transistors, and switches) or improvements in software (improvements to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs to "integrate" a digital system onto a single PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logical method flow can readily be obtained simply by describing the method flow in one of the hardware description languages above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described with its functions divided into separate units. Of course, when implementing the present specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts of the method embodiment description may be consulted.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present application.

Claims (7)

The extraction rule determining module is used for acquiring update rule information, determining the number of modifications corresponding to the update rule information, and determining the data processing rule updated by the user as the update rule, wherein the update rule information is information with which the user adds, deletes or modifies a data processing rule in the configuration information; setting the priority of the update rule to the highest priority; when the number of modifications is equal to a preset modification threshold, taking the update rule information of the most recent acquisitions, equal in number to the modification threshold, as an update rule group, and lowering by one level the priority of each previously set data processing rule that does not belong to the update rule group; when the number of modifications is not equal to the modification threshold, leaving the priorities of the previously set data processing rules unchanged; for each piece of sub-data contained in the cell detection data, determining the data identifier of the sub-data, matching the data identifier with each matching tag, and determining the matching tag corresponding to the data identifier as a target tag; determining the priority and setting time of the data processing rules to which the target tag belongs, wherein data processing rules with the same priority are treated as one group, and the data processing rule with the highest priority and the latest setting time is taken as the target rule; and taking each extraction rule contained in the target rule as an extraction rule corresponding to the sub-data;
CN202410189827.5A | 2024-02-20 | 2024-02-20 | A cell detection data preprocessing method, device and storage medium | Active | CN117743809B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410189827.5A | 2024-02-20 | 2024-02-20 | A cell detection data preprocessing method, device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202410189827.5A | 2024-02-20 | 2024-02-20 | A cell detection data preprocessing method, device and storage medium

Publications (2)

Publication Number | Publication Date
CN117743809A (en) | 2024-03-22
CN117743809B (en) | 2024-05-24

Family

ID=90251195

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202410189827.5A | Active | CN117743809B (en) | 2024-02-20 | 2024-02-20 | A cell detection data preprocessing method, device and storage medium

Country Status (1)

Country | Link
CN (1) | CN117743809B (en)

Also Published As

Publication number | Publication date
CN117743809A (en) | 2024-03-22

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
