


Technical Field
The present application relates to the technical field of data verification for open-source components, and in particular to a data verification method and device for open-source components.
Background
In modern software development, verifying the security data of open-source components means analyzing whether the open-source component data is correct. Some existing methods verify data during the data collection stage. When such a method finds that the collection of any indicator of a record has gone wrong (for example, the indicator was not collected, or the collected value does not meet requirements), it handles every case the same way and raises an alert. In some cases, however, an indicator is missing simply because the component does not have that indicator. The component data is then mistaken for problem data and does not proceed to the next stage (the data synchronization stage), which reduces the accuracy of data verification and, in turn, the soundness of data updates. Moreover, for some indicators, a collected value that does not meet requirements has essentially no effect on subsequent processing, yet current verification methods still keep such component data from entering the next stage, which again undermines the soundness of data updates.
In addition, none of the current data verification methods verifies data across the whole pipeline, that is, in the data collection, data synchronization, data cleaning, and data upgrade stages. Because problems can arise at every one of these stages, existing methods cannot guarantee that all of the finally obtained data is free of problems.
Summary of the Invention
An object of the present application is to provide a data verification method and device for open-source components, an electronic device, and a computer-readable storage medium that solve at least one of the technical problems above.
To achieve the above object, the present application provides a data verification method for open-source components, comprising:
dividing each piece of component data collected in the current data collection stage into mandatory data and non-mandatory data, wherein the mandatory data comprises the component name and the component release time in the component data, and the non-mandatory data comprises the component license information and the component version information in the component data;
determining whether the mandatory data in each piece of component data was collected in the current collection;
if not, recording that piece of component data to an abnormal data storage device and performing a first-level alert operation;
if so, recording that piece of component data to a normal data storage device, and determining whether the currently collected component license information exists in a preset license library, whether the currently collected component version information exists in a preset version library, and whether the currently collected component release time conforms to a set format;
if the currently collected component license information does not exist in the preset license library, marking it in the component table of the normal data storage device, recording it to the abnormal data storage device, and performing a second-level alert operation, the first level being higher than the second level;
if the currently collected component version information does not exist in the preset version library, marking it in the component table of the normal data storage device, recording it to the abnormal data storage device, and performing a second-level alert operation;
if the currently collected component release time does not conform to the set format, marking it in the component table of the normal data storage device, recording it to the abnormal data storage device, and performing the second-level alert operation.
Optionally, the method further comprises:
each piece of vulnerability data collected in the data collection stage has an associated attribute, the associated attribute being information on the components affected by the vulnerability and the affected component version ranges;
determining whether the associated attribute of each piece of vulnerability data is empty; if not, recording that piece of vulnerability data to the normal data storage device; if so, recording it to the abnormal data storage device and performing a first-level alert operation.
Optionally, in the data synchronization stage, after the data stored in the normal data storage device has been synchronized, the method further comprises a change-volume comparison step:
comparing the change volume of the component data with a first upper threshold and a first lower threshold, and issuing a warning if the change volume of the component data is greater than the first upper threshold or less than the first lower threshold;
comparing the change volume of the vulnerability data with a second upper threshold and a second lower threshold, and issuing a warning if the change volume of the vulnerability data is greater than the second upper threshold or less than the second lower threshold.
Optionally, in the data cleaning stage, when the synchronized data is cleaned, a component-vulnerability relationship table and a component-license relationship table are built, and the number of vulnerabilities corresponding to each version of each component is recorded in the component table used to record the component data;
after data cleaning, the method further comprises:
determining whether the number of vulnerabilities recorded in the component table for each version of each component equals the number of rows for that component version in the component-vulnerability relationship table, and issuing a warning if not; and
determining whether the license information of each component in the component table exists in the component-license relationship table, and issuing a warning if not.
Optionally, after data cleaning, the method further comprises:
determining whether a key field of each piece of component data is empty, and issuing a warning if so.
Optionally, after data cleaning, the method further comprises:
verifying the cleaned data against a pre-established, regularly maintained baseline database, and issuing a warning if any cleaned data is inconsistent with the data in the baseline database.
Optionally, the method further comprises:
after data cleaning is completed in the data cleaning stage, counting a first data change volume relative to the data before the current cleaning, and spot-checking the content of the cleaned data;
after the cleaned data has been upgraded, counting a second data change volume relative to the data before the current upgrade, and spot-checking the content of the upgraded data, the records spot-checked after the upgrade being the same records spot-checked after cleaning;
determining whether the second data change volume is consistent with the first data change volume, and whether the content of the spot-checked upgraded data is identical to the content of the spot-checked cleaned data;
if not, issuing a warning.
To achieve the above object, the present application further provides a data verification device for open-source components, comprising:
a division module, configured to divide each piece of component data collected in the current data collection stage into mandatory data and non-mandatory data, wherein the mandatory data comprises the component name and the component release time in the component data, and the non-mandatory data comprises the component license information and the component version information in the component data;
a determination module, configured to determine whether the mandatory data in each piece of component data was collected in the current collection;
a recording and alert module, configured to, if not, record that piece of component data to the abnormal data storage device and perform a first-level alert operation;
a recording and determination module, configured to, if so, record that piece of component data to the normal data storage device, and determine whether the currently collected component license information exists in the preset license library, whether the currently collected component version information exists in the preset version library, and whether the currently collected component release time conforms to the set format;
a first marking and recording module, configured to, if the currently collected component license information does not exist in the preset license library, mark it in the component table of the normal data storage device, record it to the abnormal data storage device, and perform a second-level alert operation, the first level being higher than the second level, wherein the preset license library is maintained based on commonly used license sites;
a second marking and recording module, configured to, if the currently collected component version information does not exist in the preset version library, mark it in the component table of the normal data storage device, record it to the abnormal data storage device, and perform a second-level alert operation, wherein the preset version library is maintained based on commonly used open-source sites;
a third marking and recording module, configured to, if the currently collected component release time does not conform to the set format, mark it in the component table of the normal data storage device, record it to the abnormal data storage device, and perform the second-level alert operation.
To achieve the above object, the present application further provides an electronic device, comprising:
a processor; and
a memory storing instructions executable by the processor;
wherein the processor is configured to perform the data verification method for open-source components described above by executing the executable instructions.
To achieve the above object, the present application further provides a computer-readable storage medium storing a program which, when executed by a processor, implements the data verification method for open-source components described above.
The present application further provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform the data verification method for open-source components described above.
The present application divides each piece of component data collected in the data collection stage into mandatory data and non-mandatory data. As long as the current collection obtains the mandatory data, the component data is recorded in the normal data storage device and can proceed to the next stage of processing, regardless of how the non-mandatory data was collected. This prevents a record from being kept out of the normal data storage device, and therefore out of the next stage, merely because non-mandatory data was not collected or did not meet requirements, which effectively improves the soundness of local component data updates. In addition, once the mandatory data has been collected, if the currently collected component license information is not in the preset license library, the currently collected component version information is not in the preset version library, or the currently collected component release time does not conform to the set format, the record is marked in the component table of the normal data storage device, recorded to the abnormal data storage device, and a second-level alert is raised. In other words, when collected data fails these checks in ways that do not affect subsequent processing of the component data, the present application both marks it in the component table of the normal data storage device and records it to the abnormal data storage device for alerting, which makes it easy to notify developers.
Brief Description of the Drawings
FIG. 1 is a flowchart of a data verification method for open-source components according to an embodiment of the present application.
FIG. 2 is a schematic block diagram of a data verification device for open-source components according to an embodiment of the present application.
FIG. 3 is a schematic block diagram of an electronic device according to an embodiment of the present application.
Detailed Description of the Embodiments
To explain in detail the technical content, structural features, objectives, and effects of the present application, it is described below with reference to the embodiments and the accompanying drawings.
Embodiment 1
Referring to FIG. 1, the present application discloses a data verification method for open-source components, comprising:
101. Divide each piece of component data collected in the current data collection stage into mandatory data and non-mandatory data. The mandatory data comprises the component name and the component release time in the component data; the non-mandatory data comprises the component license information and the component version information.
In the data collection stage, data is collected from open-source websites using the local unique component identifier (the component name). A small number of components may have no license information or version information, so the present application classifies these as non-mandatory data.
102. Determine whether the mandatory data in each piece of component data was collected in the current collection. If not, go to step 103; if so, go to steps 104, 105, 106, and 107.
103. Record that piece of component data to the abnormal data storage device and perform a first-level alert operation. The first level is the highest abnormality level and indicates an error (ERROR).
104. Record that piece of component data to the normal data storage device.
105. Determine whether the currently collected component license information exists in the preset license library; if not, go to step 108.
106. Determine whether the currently collected component version information exists in the preset version library; if not, go to step 108.
107. Determine whether the currently collected component release time conforms to the set format; if not, go to step 108.
108. Mark the record in the component table of the normal data storage device, record it to the abnormal data storage device, and perform a second-level (WARN) alert operation. The first level is higher than the second level, that is, the second level is less severe than the first.
The preset license library is maintained based on commonly used license sites, that is, it is updated regularly, and common component licenses are all collected in it. If the currently collected component license information is not in the preset license library, the license is most likely not a conventional one, so it is marked in the component table of the normal data storage device, recorded to the abnormal data storage device, and a second-level alert is raised to notify developers.
The preset version library is maintained based on commonly used open-source sites, that is, it is updated regularly, and common component versions are all collected in the preset version library. If the currently collected component version information is not in the preset version library, the version information is most likely unusual, so it is marked in the component table of the normal data storage device, recorded to the abnormal data storage device, and a second-level alert is raised to notify developers.
Whether the component release time conforms to the set format is largely a matter of form and does not affect subsequent data processing. Therefore, when the release time does not conform to the set format, it is marked in the component table of the normal data storage device, recorded to the abnormal data storage device, and a second-level alert is raised, prompting developers to fix it.
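The collection-stage logic of steps 102 to 108 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the in-memory lists standing in for the normal and abnormal data storage devices, the `ComponentRecord` and `Stores` types, the reason strings, and the `%Y-%m-%d` release-time format are all hypothetical, because the description specifies neither the storage layout nor the set format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set, Tuple

ERROR, WARN = "ERROR", "WARN"     # first-level and second-level alerts
RELEASE_TIME_FORMAT = "%Y-%m-%d"  # assumed "set format"; not specified in the source


@dataclass
class ComponentRecord:
    name: Optional[str]            # mandatory
    release_time: Optional[str]    # mandatory
    license: Optional[str] = None  # non-mandatory
    version: Optional[str] = None  # non-mandatory
    flags: List[str] = field(default_factory=list)  # marks in the component table


@dataclass
class Stores:
    # Hypothetical stand-ins for the normal and abnormal data storage devices.
    normal: List[ComponentRecord] = field(default_factory=list)
    abnormal: List[Tuple[Optional[str], str]] = field(default_factory=list)       # (name, reason)
    alerts: List[Tuple[str, Optional[str], str]] = field(default_factory=list)    # (level, name, reason)


def matches_format(value: str, fmt: str) -> bool:
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def validate_component(rec: ComponentRecord, license_lib: Set[str],
                       version_lib: Set[str], stores: Stores) -> None:
    # Steps 102-103: missing mandatory data -> abnormal store and first-level alert.
    if not rec.name or not rec.release_time:
        stores.abnormal.append((rec.name, "missing_mandatory_data"))
        stores.alerts.append((ERROR, rec.name, "missing_mandatory_data"))
        return
    # Step 104: mandatory data present -> normal store, whatever the optional fields hold.
    stores.normal.append(rec)
    # Steps 105-107; every failed check leads to step 108.
    checks = [
        ("license_not_in_library", rec.license in license_lib),
        ("version_not_in_library", rec.version in version_lib),
        ("release_time_bad_format", matches_format(rec.release_time, RELEASE_TIME_FORMAT)),
    ]
    for reason, passed in checks:
        if not passed:
            rec.flags.append(reason)                        # mark in the component table
            stores.abnormal.append((rec.name, reason))      # record to the abnormal store
            stores.alerts.append((WARN, rec.name, reason))  # second-level alert
```

A record whose license was never collected (`license=None`) still lands in the normal store and only raises a WARN; only a missing name or release time keeps a record out of the normal store.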
The present application divides each piece of component data collected in the data collection stage into mandatory data and non-mandatory data. As long as the current collection obtains the mandatory data, the component data is recorded in the normal data storage device and can proceed to the next stage of processing, regardless of how the non-mandatory data was collected. This prevents a record from being kept out of the normal data storage device, and therefore out of the next stage, merely because non-mandatory data was not collected or did not meet requirements, which effectively improves the soundness of local component data updates. In addition, once the mandatory data has been collected, if the currently collected component license information is not in the preset license library, the currently collected component version information is not in the preset version library, or the currently collected component release time does not conform to the set format, the record is marked in the component table of the normal data storage device, recorded to the abnormal data storage device, and a second-level alert is raised. In other words, when collected data fails these checks in ways that do not affect subsequent processing of the component data, the present application both marks it in the component table of the normal data storage device and records it to the abnormal data storage device for alerting, which makes it easy to notify developers.
Specifically, the method further comprises:
Each piece of vulnerability data collected in the data collection stage has an associated attribute, namely information on the components affected by the vulnerability and the affected component version ranges. That is, the present application collects vulnerability data from specific sites so that every collected piece of vulnerability data has an associated attribute.
Determine whether the associated attribute of each piece of vulnerability data is empty. If not, record that piece of vulnerability data to the normal data storage device (the vulnerability table); if so, record it to the abnormal data storage device and perform a first-level alert operation.
Each piece of vulnerability data includes a vulnerability ID, a vulnerability release time, and a vulnerability description.
Because every collected piece of vulnerability data has an associated attribute, and a piece is recorded in the vulnerability table of the normal data storage device only after its associated attribute is determined to be non-empty, all vulnerability data in the normal data storage device carries information on the affected components and component version ranges.
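A minimal sketch of the vulnerability routing rule follows, assuming the associated attribute is held as a list of (component, version range) pairs. The `VulnRecord` type, the version-range string format, the vulnerability IDs, and the plain lists used as the vulnerability table and abnormal store are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VulnRecord:
    vuln_id: str     # vulnerability ID
    published: str   # vulnerability release time
    summary: str     # vulnerability description
    # Associated attribute: (affected component, affected version range) pairs.
    # The version-range string format is an assumption.
    affected: List[Tuple[str, str]] = field(default_factory=list)


def route_vulnerability(v: VulnRecord, vuln_table: list, abnormal: list, alerts: list) -> bool:
    """Return True if the record was accepted into the vulnerability table."""
    if v.affected:  # associated attribute is not empty
        vuln_table.append(v)
        return True
    abnormal.append(v)  # empty association -> abnormal store
    alerts.append(("ERROR", v.vuln_id, "empty_associated_attribute"))  # first-level alert
    return False
```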
Specifically, in the data synchronization stage (synchronizing data to another storage device, a routine step in the art that is not described in detail here), after the data stored in the normal data storage device has been synchronized, the method further comprises a change-volume comparison step:
Compare the change volume of the component data (the number of component records that changed) with a first upper threshold and a first lower threshold; if it is greater than the first upper threshold or less than the first lower threshold, issue a warning.
Compare the change volume of the vulnerability data (the number of vulnerability records that changed) with a second upper threshold and a second lower threshold; if it is greater than the second upper threshold or less than the second lower threshold, issue a warning.
Because the number of records changed by each update normally fluctuates within a certain range, a change volume outside that range suggests the update may have gone wrong. The present application therefore compares the change volume with the set thresholds to determine whether an anomaly may exist, and issues a warning so that a person can intervene.
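The change-volume comparison can be sketched as below. It assumes each snapshot is a dict mapping a record key to its content, and counts a record as changed if it was added, removed, or modified; the description only says "the number of changed records", so counting deletions is an assumption. The same check is called once with the first thresholds for component data and once with the second thresholds for vulnerability data.

```python
from typing import Dict, Hashable, Optional


def count_changed(before: Dict[Hashable, object], after: Dict[Hashable, object]) -> int:
    """Number of records added, removed, or modified between two snapshots."""
    return sum(1 for key in before.keys() | after.keys() if before.get(key) != after.get(key))


def check_change_volume(kind: str, changed: int, lower: int, upper: int) -> Optional[str]:
    """Return a warning if the change volume falls outside [lower, upper], else None."""
    if changed > upper or changed < lower:
        return f"WARN {kind}: {changed} changed records, expected {lower}..{upper}"
    return None
```

For example, with hypothetical bounds of 100 and 50,000 changed component records per synchronization, a run that changes only 3 records would produce a warning.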
Specifically, in the data cleaning stage, when the synchronized data is cleaned, a component-vulnerability relationship table and a component-license relationship table are built, and the number of vulnerabilities corresponding to each version of each component is recorded in the component table used to record the component data. The component-vulnerability relationship table records which vulnerabilities in the vulnerability table affect which components in the component table, and the component-license relationship table associates components in the component table with licenses in the preset license library. Building these tables is a routine data cleaning operation known to those skilled in the art and is not described in detail here.
After data cleaning, the method further comprises:
Determine whether the number of vulnerabilities recorded in the component table for each version of each component equals the number of rows for that component version in the component-vulnerability relationship table; if not, issue a warning; and
Determine whether the license information of each component in the component table exists in the component-license relationship table; if not, issue a warning.
The present application records the number of vulnerabilities for each version of each component in the component table and compares it with the number of rows for the corresponding component version in the component-vulnerability relationship table. If the two differ, there is a problem and a warning is issued, which helps reduce the probability that the collected data contains problems.
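The two post-cleaning consistency checks can be sketched as follows. The sketch assumes the component table is keyed by (component, version) with hypothetical `vuln_count` and `license` columns, the component-vulnerability relationship table is a list of (component, version, vulnerability ID) rows, and the component-license relationship table is a list of (component, license) rows. Skipping components with no license reflects the fact that license information is non-mandatory; that choice is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Component table keyed by (component, version); row layout assumed for illustration.
ComponentTable = Dict[Tuple[str, str], dict]


def check_relationship_tables(component_table: ComponentTable,
                              comp_vuln_rows: Iterable[Tuple[str, str, str]],
                              comp_license_rows: Iterable[Tuple[str, str]]) -> List[tuple]:
    warnings = []
    # Rows per (component, version) in the component-vulnerability relationship table.
    rows_per_version = Counter((comp, ver) for comp, ver, _vuln in comp_vuln_rows)
    for (comp, ver), row in sorted(component_table.items()):
        if row.get("vuln_count", 0) != rows_per_version.get((comp, ver), 0):
            warnings.append(("vuln_count_mismatch", comp, ver))
    # Every (component, license) in the component table must appear in the
    # component-license relationship table; components without a license are skipped.
    license_pairs = set(comp_license_rows)
    used = {(comp, row["license"]) for (comp, _ver), row in component_table.items()
            if row.get("license")}
    for comp, lic in sorted(used):
        if (comp, lic) not in license_pairs:
            warnings.append(("license_missing", comp, lic))
    return warnings
```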
Specifically, after data cleaning, the method further comprises:
Determine whether the key fields of each piece of component data are empty; if so, issue a warning. Two cases can be distinguished: fields whose emptiness means the data is definitely wrong (such as the version release time and the latest version), and fields whose emptiness means the data may be wrong (such as the license short name and the recommended version).
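The two-tier key-field check could look like the sketch below. The field names are hypothetical stand-ins for the examples given above (version release time and latest version in the first group; license short name and recommended version in the second), and treating whitespace-only strings as empty is an assumption.

```python
from typing import List, Mapping, Tuple

# Hypothetical field names for the two groups described above.
DEFINITE_FIELDS = ("release_time", "latest_version")             # empty => data is wrong
POSSIBLE_FIELDS = ("license_short_name", "recommended_version")  # empty => data may be wrong


def is_empty(value) -> bool:
    return value is None or (isinstance(value, str) and not value.strip())


def check_key_fields(row: Mapping[str, object]) -> List[Tuple[str, str]]:
    """Return (certainty, field) pairs for every empty key field."""
    warnings = [("definite", f) for f in DEFINITE_FIELDS if is_empty(row.get(f))]
    warnings += [("possible", f) for f in POSSIBLE_FIELDS if is_empty(row.get(f))]
    return warnings
```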
Specifically, after data cleaning, the method further comprises:
Verify the cleaned data against a pre-established, regularly maintained baseline database; if any cleaned data is inconsistent with the data in the baseline database, issue a warning so that a person can intervene.
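A minimal sketch of the baseline comparison follows, assuming both the cleaned data and the baseline database are available as dicts keyed by a record identifier such as (component, version). Only keys present in the baseline are checked, on the assumption that the baseline (for example, a library of commonly used components) covers a subset of the data; a baseline key missing from the cleaned data is reported as a completeness problem and a differing field as a correctness problem.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def compare_with_baseline(cleaned: Dict[Hashable, Row], baseline: Dict[Hashable, Row],
                          fields: Sequence[str]) -> List[Tuple[Hashable, Optional[str], object, object]]:
    """List every disagreement as (key, field, cleaned value, baseline value)."""
    diffs = []
    for key, base_row in baseline.items():
        row = cleaned.get(key)
        if row is None:  # completeness: record missing from the cleaned data
            diffs.append((key, None, None, base_row))
            continue
        for f in fields:  # correctness: field-by-field comparison
            if row.get(f) != base_row.get(f):
                diffs.append((key, f, row.get(f), base_row.get(f)))
    return diffs
```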
具体地,方法还包括:Specifically, the method also includes:
在数据清洗阶段完成数据清洗后,统计相较当前次清洗前的第一数据变化量以及抽样检查清洗后数据的内容。After the data cleaning is completed in the data cleaning stage, the first data change amount compared with the current cleaning is counted and the content of the cleaned data is checked by sampling.
After the cleaned data has been upgraded, counting a second data change amount relative to the data before the current upgrade, and sampling the upgraded data to check its content, where the sampled upgraded data corresponds to the same records as the sampled cleaned data;
Determining whether the second data change amount is consistent with the first data change amount, and whether the content of the sampled upgraded data is identical to the content of the sampled cleaned data;
If not, issuing a warning message.
Data upgrading means writing the cleaned data into the ES cluster using different syntaxes. This is known to those skilled in the art and is not described further here.
The upgrade result can be verified by determining whether the second data change amount is consistent with the first, and whether the content of the sampled upgraded data is identical to that of the sampled cleaned data. This establishes, to a basic degree, whether the upgrade was completed correctly.
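The cleaning-versus-upgrade check in the steps above can be sketched as follows. The sketch assumes each snapshot (before and after cleaning, before and after the upgrade into the ES cluster) is available as a `{key: record}` dict. A real deployment would read the post-upgrade side from the cluster instead. Here, "change amount" is interpreted as the number of keys added, removed, or modified, and the same randomly sampled keys are checked on both sides. Both choices, and the helper names, are assumptions.

```python
import random
from typing import Any, Dict, List, Tuple

_MISSING = object()  # sentinel, so a stored None still counts as a real value


def count_changes(before: Dict[str, Any], after: Dict[str, Any]) -> int:
    """Count keys that were added, removed, or changed between two snapshots."""
    keys = before.keys() | after.keys()
    return sum(1 for k in keys if before.get(k, _MISSING) != after.get(k, _MISSING))


def verify_upgrade(pre_clean: Dict[str, Any], post_clean: Dict[str, Any],
                   pre_upgrade: Dict[str, Any], post_upgrade: Dict[str, Any],
                   sample_size: int = 5, seed: int = 0) -> Tuple[bool, int, int, List[str]]:
    """Return (ok, first_change, second_change, mismatched_sample_keys)."""
    first_change = count_changes(pre_clean, post_clean)       # first data change amount
    second_change = count_changes(pre_upgrade, post_upgrade)  # second data change amount
    # Sample the same keys from the cleaned and the upgraded data.
    rng = random.Random(seed)
    keys = sorted(post_clean)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    bad = sorted(k for k in sample if post_upgrade.get(k, _MISSING) != post_clean[k])
    ok = first_change == second_change and not bad
    return ok, first_change, second_change, bad
```

If `ok` is false, the warning message from the last step would be raised.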
Specifically, in the present application the data in the knowledge base is also verified manually, either periodically or when a customer questions the data. This verification covers two aspects. The first is data relationship verification, such as the component-license relationship and the component-vulnerability relationship. The second is verification of data integrity and correctness against benchmark library data. The benchmark library may be a benchmark library of commonly used components, or a full database derived from production-line analysis.
It can be seen from the above that, in the specific embodiments of the present application, the data of open source components can be verified throughout the entire process. Data verification is performed in the data collection stage, the data synchronization stage, the data cleaning stage, and the data upgrade stage, which helps obtain more accurate data.
Embodiment 2
Referring to FIG. 2, the present application discloses a data verification device for open source components, including:
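a division module 201, configured to divide each piece of component data collected in the current data collection stage into mandatory data and non-mandatory data, where the mandatory data includes the component name and the component release time, and the non-mandatory data includes the component license information and the component version information;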
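a determination module 202, configured to determine whether the mandatory data of each piece of component data was collected in the current collection;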
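a recording and warning module 203, configured to, if not, record that piece of component data to an abnormal data storage device and perform a first-level warning operation;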
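a recording and determination module 204, configured to, if so, record that piece of component data to a normal data storage device, and determine whether the component license information collected this time exists in a preset license library, whether the component version information collected this time exists in a preset version library, and whether the component release time collected this time conforms to a set format;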
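a marking and recording module 205, configured to perform the following:
(a) if the component license information collected this time does not exist in the preset license library, mark it in the component table of the normal data storage device, record it to the abnormal data storage device, and perform a second-level warning operation, where the first level is higher than the second level and the preset license library is maintained according to commonly used license sites;
(b) if the component version information collected this time does not exist in the preset version library, mark it in the component table of the normal data storage device, record it to the abnormal data storage device, and perform a second-level warning operation, where the preset version library is maintained according to commonly used open source sites;
(c) if the component release time collected this time does not conform to the set format, mark it in the component table of the normal data storage device, record it to the abnormal data storage device, and perform a second-level warning operation.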
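The interplay of modules 201 to 205 can be sketched as a single routing function. Everything concrete here is an assumption for illustration: the storage devices are in-memory lists, the preset version library is a set of `(name, version)` pairs, and the "set format" for release times is taken to be `YYYY-MM-DD`. Level 1 is treated as more severe than level 2, as the text requires. `Stores` and `process_record` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Assumed release-time format; the text only says "a set format".
RELEASE_TIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
LEVEL_1, LEVEL_2 = 1, 2  # the first level is higher (more severe) than the second


@dataclass
class Stores:
    normal: List[dict] = field(default_factory=list)           # normal data storage device
    abnormal: List[Tuple[str, dict]] = field(default_factory=list)  # (reason, record)
    alerts: List[tuple] = field(default_factory=list)          # (level, component, reason)


def process_record(rec: dict, licenses: Set[str],
                   versions: Set[Tuple[str, str]], stores: Stores) -> None:
    # Modules 201/202: mandatory data = component name and release time.
    if not rec.get("name") or not rec.get("release_time"):
        # Module 203: abnormal store plus first-level warning.
        stores.abnormal.append(("mandatory data missing", rec))
        stores.alerts.append((LEVEL_1, rec.get("name"), "mandatory data missing"))
        return
    # Module 204: mandatory data present, so record as normal, then run secondary checks.
    row = dict(rec, flags=[])  # "flags" stands in for marking the component table
    stores.normal.append(row)
    release_time = rec["release_time"]
    checks = [
        ("license", rec.get("license") in licenses),
        ("version", (rec["name"], rec.get("version")) in versions),
        ("release_time", isinstance(release_time, str)
         and RELEASE_TIME_RE.fullmatch(release_time) is not None),
    ]
    # Module 205: mark, copy to the abnormal store, and raise a second-level warning.
    for check, passed in checks:
        if not passed:
            row["flags"].append(check)
            stores.abnormal.append((f"{check} check failed", rec))
            stores.alerts.append((LEVEL_2, rec["name"], f"{check} check failed"))
```

A record missing its name or release time never reaches the normal store. A record with an unknown license still does, but it carries a flag and a level-2 alert, so it can proceed to the next stage.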
The present application divides each piece of component data collected in the data collection stage into mandatory data and non-mandatory data. As long as the current collection captures the mandatory data, the piece of component data is recorded in the normal data storage device, regardless of how the non-mandatory data was collected, and can therefore proceed to the next stage of processing. Without this rule, a piece of component data could be kept out of the normal data storage device, and thus out of the next stage, merely because its non-mandatory data was not collected or does not meet the requirements. The division therefore effectively improves the soundness of local component data updates.
In addition, the mandatory data having been collected this time, three checks follow. If the component license information collected this time does not exist in the preset license library, if the component version information collected this time does not exist in the preset version library, or if the component release time collected this time does not conform to the set format, the corresponding item is handled the same way. It is marked in the component table of the normal data storage device, recorded to the abnormal data storage device, and given a second-level warning operation. In other words, when the collected non-mandatory data fails these requirements in a way that does not affect the subsequent processing of the component data, the present application both marks it in the component table of the normal data storage device and records it to the abnormal data storage device for a warning operation. This makes it convenient to alert developers.
Embodiment 3
Referring to FIG. 3, the present application discloses an electronic device, including:
a processor 30;
a memory 40 storing instructions executable by the processor 30;
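wherein the processor 30 is configured to perform the data verification method for open source components described in Embodiment 1 by executing the executable instructions.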
Embodiment 4
The present application discloses a computer-readable storage medium storing a program which, when executed by a processor, implements the data verification method for open source components described in Embodiment 1.
Embodiment 5
An embodiment of the present application discloses a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform the above data verification method for open source components.
It should be understood that, in the embodiments of the present application, the processor may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
What is disclosed above is merely a preferred example of the present application and shall not be construed as limiting its scope of rights. Equivalent changes made according to the claims of the present application therefore still fall within the scope covered by the present application.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310122533.6A (CN116048528A) | 2023-02-03 | 2023-02-03 | Data verification method and device for open source components |
| Publication Number | Publication Date |
|---|---|
| CN116048528A | 2023-05-02 |
| Application Number | Status | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202310122533.6A | Pending | 2023-02-03 | 2023-02-03 | Data verification method and device for open source components |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |