技术领域Technical field
本发明的实施例一般涉及计算机数据处理领域,并且更具体地,涉及一种面向电力行业业务系统的数据库完整性评价方法。Embodiments of the present invention generally relate to the field of computer data processing, and more specifically, to a database integrity evaluation method for power industry business systems.
背景技术Background technique
对于数据完整性评估方法的研究,关键在于解决关系数据完整性度量问题。现有的数据完整性评估方法大多通过提出抽象的数据完整性度量框架,然后在框架基础上实现函数依赖对数据完整性的影响,并提出具体的关系数据完整性度量指标及体系方案。在当前环境下,对数据完整性研究的重点集中在完整性的定义、保证技术、保证模型及其应用,而对数据完整性评估方法的研究多集中在对数据的校验上,然而这对于数据量巨大的电力行业业务系统来说工作效率极慢。For the research on data integrity assessment methods, the key is to solve the problem of relational data integrity measurement. Most of the existing data integrity assessment methods propose an abstract data integrity measurement framework, then realize the impact of functional dependencies on data integrity based on the framework, and propose specific relational data integrity measurement indicators and system solutions. In the current environment, the focus of data integrity research is on the definition of integrity, assurance technology, assurance models and their applications, while the research on data integrity assessment methods is mostly focused on data verification. However, this is not suitable for For power industry business systems with huge amounts of data, the work efficiency is extremely slow.
对于数据库完整性评估存在以下问题:The following issues exist for database integrity assessment:
1)没有量化形式地去评估数据库完整性;1) There is no quantitative form to assess database integrity;
2)随着数据量的快速增长,根据数据库进行完整性评估效率低,开销大;2) With the rapid growth of data volume, integrity assessment based on the database is inefficient and expensive;
3)未给出具体的数据库完整性评价方法。3) No specific database integrity evaluation method is given.
发明内容Contents of the invention
根据本发明的实施例,提供了一种面向电力行业业务系统的数据库完整性评价方案。According to embodiments of the present invention, a database integrity evaluation scheme for power industry business systems is provided.
在本发明的第一方面,提供了一种面向电力行业业务系统的数据库完整性评价方法。该方法包括:In a first aspect of the present invention, a database integrity evaluation method for power industry business systems is provided. The method includes:
从第一功能表中读取第一字段,所述第一字段为数据表中包含的总记录数;从第二功能表中读取第二字段和第三字段,并根据第二字段建立数据表与字段的所属关系;所述第二字段表示用户可见的字段所属的数据表的名称;所述第三字段为数据表中每个字段包含的空值数;Read the first field from the first function table, which is the total number of records contained in the data table; read the second field and the third field from the second function table, and create data based on the second field The ownership relationship between tables and fields; the second field represents the name of the data table to which the user-visible fields belong; the third field is the number of null values contained in each field in the data table;
从数据库中读取一数据表,累加所述数据表的第三字段的值,得到所述数据表的none值数量;计算去除全空字段的none值数量和满字段记录数;Read a data table from the database, accumulate the value of the third field of the data table, and obtain the number of none values in the data table; calculate the number of none values and the number of full field records excluding all empty fields;
遍历所述数据库中的全部数据表,将得到的数据指标值进行累加,计算数据库全空字段比例、数据库满字段记录比例、数据库none值比例和数据库去除全空字段的none值比例,对所述数据库的数据完整性进行评价。Traverse all the data tables in the database, accumulate the obtained data index values, and calculate the proportion of all empty fields in the database, the proportion of full field records in the database, the proportion of none values in the database, and the proportion of none values in the database after removing all empty fields. The data integrity of the database is evaluated.
进一步地,所述计算去除全空字段的none值数量,包括:Further, the calculation of the number of none values removing all empty fields includes:
Q=P-M*NQ=P-M*N
其中,Q为数据表中去除全空字段的none值数量;P为数据表中none值数量;M为数据表中全空字段数量;N为数据表中总记录数。Among them, Q is the number of none values in the data table excluding all empty fields; P is the number of none values in the data table; M is the number of all empty fields in the data table; N is the total number of records in the data table.
进一步地,所述满字段记录数为:Further, the number of full field records is:
K=N-FK=N-F
其中,K为数据表中满字段记录数;N为数据表中总记录数;F为数据表中非满字段记录数;所述非满字段记录数为数据表中包含空值的字段中的总记录数。Among them, K is the number of full field records in the data table; N is the total number of records in the data table; F is the number of non-full field records in the data table; the number of non-full field records is the number of fields in the data table that contain null values. total.
进一步地,所述将得到的数据值进行累加,包括:Further, accumulating the obtained data values includes:
将各数据表字段数进行累加,得到数据库总字段数;Accumulate the number of fields in each data table to obtain the total number of fields in the database;
将各数据表字段数与记录数的乘积进行累加,得到数据库数值总数;Accumulate the product of the number of fields and the number of records in each data table to obtain the total number of database values;
将各数据表总记录数进行累加,得到数据库总记录数;Accumulate the total number of records in each data table to obtain the total number of records in the database;
将各数据表none值数量进行累加,得到数据库none值数量;Accumulate the number of none values in each data table to obtain the number of none values in the database;
将各数据表去除全空字段的none值数量进行累加,得到数据库去除全空字段none值数量;Accumulate the number of none values in each data table after removing all empty fields, and obtain the number of none values in the database after removing all empty fields;
将各数据表满字段记录数进行累加,得到数据库满字段记录数;Accumulate the number of full field records in each data table to obtain the number of full field records in the database;
将各数据表去除全空字段后的数值进行累加,得到数据库去除全空字段后的数值总数。Accumulate the values in each data table after removing all empty fields to obtain the total number of values in the database after removing all empty fields.
进一步地,所述去除全空字段后的数值为:Further, the value after removing all empty fields is:
H=(R-M)*NH=(R-M)*N
其中,H为数据表中去除全空字段后的数值;R为数据表中的字段数;M为数据表中全空字段数量;N为数据表中总记录数。Among them, H is the value after removing all empty fields in the data table; R is the number of fields in the data table; M is the number of all empty fields in the data table; N is the total number of records in the data table.
进一步地,所述计算数据库全空字段比例、数据库满字段记录比例、数据库none值比例和数据库去除全空字段的none值比例,包括:Further, the calculation of the proportion of all empty fields in the database, the proportion of full field records in the database, the proportion of none values in the database and the proportion of none values in the database excluding all empty fields includes:
所述数据库全空字段比例为数据库全空字段数与数据库总字段数的比值;The ratio of all empty fields in the database is the ratio of the number of all empty fields in the database to the total number of fields in the database;
所述数据库满字段记录比例为数据库满字段记录数与数据库总记录数的比值;The ratio of full field records in the database is the ratio of the number of full field records in the database to the total number of records in the database;
所述数据库none值比例为数据库none值数量与数据库数值总数的比值;The database none value ratio is the ratio of the number of database none values to the total number of database values;
所述数据库去除全空字段的none值比例为数据库去除全空字段none值数量与数据库去除全空字段后的数值总数的比值。The proportion of none values in the database after all empty fields are removed is the ratio of the number of none values in the database after all empty fields are removed and the total number of values in the database after all empty fields are removed.
进一步地,所述对所述数据库的数据完整性进行评价,包括:Further, the evaluation of the data integrity of the database includes:
所述数据库的数据完整性与所述数据库全空字段比例、所述数据库none值比例、所述数据库去除全空字段的none值比例呈负相关;The data integrity of the database is negatively correlated with the proportion of all empty fields in the database, the proportion of none values in the database, and the proportion of none values in the database except all empty fields;
所述数据库的数据完整性与所述数据库满字段记录比例呈正相关。The data integrity of the database is positively correlated with the proportion of full field records in the database.
在本发明的第二方面,提供了一种面向电力行业业务系统的数据库完整性评价装置。该装置包括:In a second aspect of the present invention, a database integrity evaluation device for power industry business systems is provided. The device includes:
读取模块,用于从第一功能表中读取第一字段,从第二功能表中读取第二字段和第三字段,并根据第二字段建立数据表与字段的所属关系;所述第一字段为数据表中包含的总记录数;所述第二字段表示用户可见的字段所属的数据表的名称;所述第三字段为数据表中每个字段包含的空值数;The reading module is used to read the first field from the first function table, read the second field and the third field from the second function table, and establish the ownership relationship between the data table and the field according to the second field; The first field is the total number of records contained in the data table; the second field represents the name of the data table to which the user-visible fields belong; the third field is the number of null values contained in each field in the data table;
第一累加计算模块,用于从数据库中读取一数据表,累加所述数据表的第三字段的值,得到所述数据表的none值数量;计算去除全空字段的none值数量和满字段记录数;The first accumulation calculation module is used to read a data table from the database, accumulate the value of the third field of the data table, and obtain the number of none values in the data table; calculate the number of none values and the full value after excluding all empty fields. Number of field records;
第二累加计算模块,用于遍历所述数据库中的全部数据表,将得到的数据指标值进行累加,计算数据库全空字段比例、数据库满字段记录比例、数据库none值比例和数据库去除全空字段的none值比例;The second accumulation calculation module is used to traverse all data tables in the database, accumulate the obtained data index values, and calculate the proportion of all empty fields in the database, the proportion of records in the database full fields, the proportion of none values in the database, and the proportion of all empty fields in the database. The proportion of none values;
评价模块,用于对所述数据库的数据完整性进行评价。An evaluation module is used to evaluate the data integrity of the database.
在本发明的第三方面,提供了一种电子设备。该电子设备包括:存储器和处理器,所述存储器上存储有计算机程序,所述处理器执行所述程序时实现如以上所述的方法。In a third aspect of the invention, an electronic device is provided. The electronic device includes: a memory and a processor. A computer program is stored on the memory. When the processor executes the program, the method as described above is implemented.
在本发明的第四方面,提供了一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现如根据本发明的第一方面的方法。In a fourth aspect of the present invention, there is provided a computer-readable storage medium having a computer program stored thereon, which when executed by a processor implements the method according to the first aspect of the present invention.
应当理解,发明内容部分中所描述的内容并非旨在限定本发明的实施例的关键或重要特征,亦非用于限制本发明的范围。本发明的其它特征将通过以下的描述变得容易理解。It should be understood that the content described in this summary is not intended to identify key or important features of the embodiments of the invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.
本发明通过分析各个数据表中存在空属性的记录数占总记录数的比例和表中存在空值的属性数占总属性数量的比例,对数据库的合理性和完整性进行评价,搜索数据表的空格、空列、空行等属性时,具有较高的发现率和准确性。This invention evaluates the rationality and completeness of the database by analyzing the ratio of the number of records with null attributes to the total number of records in each data table and the ratio of the number of attributes with null values to the total number of attributes in the table, and searches the data table. It has a higher discovery rate and accuracy when using attributes such as spaces, empty columns, and empty rows.
附图说明Description of the drawings
结合附图并参考以下详细说明,本发明各实施例的上述和其他特征、优点及方面将变得更加明显。在附图中,相同或相似的附图标记表示相同或相似的元素,其中:The above and other features, advantages and aspects of various embodiments of the invention will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numbers represent the same or similar elements, where:
图1示出了根据本发明的实施例的面向电力行业业务系统的数据库完整性评价方法的流程图;Figure 1 shows a flow chart of a database integrity evaluation method for power industry business systems according to an embodiment of the present invention;
图2示出了根据本发明的实施例的面向电力行业业务系统的数据库完整性评价装置的方框图;Figure 2 shows a block diagram of a database integrity evaluation device for power industry business systems according to an embodiment of the present invention;
图3示出了能够实施本发明的实施例的示例性电子设备的方框图。Figure 3 shows a block diagram of an exemplary electronic device capable of implementing embodiments of the invention.
具体实施方式Detailed ways
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的全部其他实施例,都属于本发明保护的范围。In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments These are some embodiments of the present invention, rather than all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of the present invention.
另外,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。In addition, the term "and/or" in this article is only an association relationship that describes related objects, indicating that there can be three relationships. For example, A and/or B can mean: A alone exists, and A and B exist simultaneously. There are three cases of B alone. In addition, the character "/" in this article generally indicates that the related objects are an "or" relationship.
本发明中,通过Oracle数据库自带的dbms_stats模块,进行数据库基础信息统计,对基础信息进行处理和加工,并对数据表进行检测,搜索数据表的空格、空列、空行等属性时,具有较高的发现率和准确性,分析表中存在空属性的记录数占总记录数的比例和表中存在空值的属性数占总属性数量的比例,计算数据表的数据完整性,在将数据表的评价值进行累加,从而对数据库的合理性和完整性进行评价。In the present invention, through the dbms_stats module that comes with the Oracle database, basic information statistics of the database are carried out, the basic information is processed and processed, and the data table is detected. When searching for attributes such as spaces, empty columns, and empty rows of the data table, it has Higher discovery rate and accuracy. Analyze the ratio of the number of records with null attributes in the table to the total number of records and the ratio of the number of attributes with null values in the table to the total number of attributes. Calculate the data integrity of the data table. The evaluation values of the data table are accumulated to evaluate the rationality and completeness of the database.
图1示出了本发明实施例的面向电力行业业务系统的数据库完整性评价方法的流程图。Figure 1 shows a flow chart of a database integrity evaluation method for power industry business systems according to an embodiment of the present invention.
该方法包括:The method includes:
S110、从第一功能表中读取第一字段,所述第一字段为数据表中包含的总记录数;从第二功能表中读取第二字段和第三字段,并根据第二字段建立数据表与字段的所属关系;所述第二字段表示用户可见的字段所属的数据表的名称;所述第三字段为数据表中每个字段包含的空值数。S110. Read the first field from the first function table, where the first field is the total number of records included in the data table; read the second field and the third field from the second function table, and calculate the Establish the ownership relationship between the data table and the fields; the second field represents the name of the data table to which the user-visible fields belong; and the third field is the number of null values contained in each field in the data table.
作为本发明的一种实施例,通过在Oracle数据库中调用dbms_stats功能模块,在调用dbms_stats功能模块后,从第一功能表中读取第一字段,所述第一功能表例如user_tables表,表示当前用户可见表;所述第一字段例如num_row字段,所述num_row字段表示数据表中包含的总记录数。所述第二功能表例如user_tab_columns表,表示当前用户可见字段,即当前用户能看到的字段;第二字段例如table_name字段,表示用户可见的字段所属的数据表的名称;第三字段例如num_null字段,表示数据表中每个字段包含的空值数。根据table_name字段建立数据表与数据字段的所属关系。As an embodiment of the present invention, by calling the dbms_stats function module in the Oracle database, after calling the dbms_stats function module, the first field is read from the first function table. The first function table, such as the user_tables table, represents the current A user-visible table; the first field is, for example, a num_row field, and the num_row field represents the total number of records contained in the data table. The second function table, such as the user_tab_columns table, represents the fields visible to the current user, that is, the fields that the current user can see; the second field, such as the table_name field, represents the name of the data table to which the user-visible fields belong; and the third field, such as the num_null field , indicating the number of null values contained in each field in the data table. Establish the ownership relationship between data tables and data fields based on the table_name field.
本实施例中,利用了Oracle自带的dbms_stats功能模块实现数据表基础数据收集,降低了完整性方法开发的难度和工作量,比提高了准确性。In this embodiment, the dbms_stats function module that comes with Oracle is used to collect basic data of the data table, which reduces the difficulty and workload of developing the integrity method and improves the accuracy.
S120、从数据库中读取一数据表,累加所述数据表的第三字段的值,得到所述数据表的none值数量。S120. Read a data table from the database, accumulate the value of the third field of the data table, and obtain the number of none values in the data table.
所述从数据库中读取到的数据表表示为A表,累加A表中的num_null字段的值,得到A表的none值数量;其中num_null字段为A表各字段包含的空值数,例如,如果num_null等于num_row,则说明该字段为全空字段,其中num_row为A表包含总记录数;如果num_null为0,则说明该字段中无空值,该字段为满数据字段。none值数量为A表中的各字段num_null值的累加之和。The data table read from the database is represented as table A, and the values of the num_null field in table A are accumulated to obtain the number of none values in table A; where the num_null field is the number of null values contained in each field of table A, for example, If num_null is equal to num_row, it means that the field is a completely empty field, where num_row is the total number of records contained in table A; if num_null is 0, it means that there are no null values in the field, and the field is a full data field. The number of none values is the cumulative sum of the num_null values of each field in table A.
S130、计算去除全空字段的none值数量。S130. Calculate the number of none values excluding all empty fields.
所述去除全空字段的none值数量为A表none值数量-A表全空字段数量*A表总记录数,即所述去除全空字段的none值数量Q为:The number of none values to remove all empty fields is the number of none values in table A - the number of all empty fields in table A * the total number of records in table A, that is, the number Q of none values to remove all empty fields is:
Q=P-M*NQ=P-M*N
其中,Q为数据表中去除全空字段的none值数量;P为数据表中none值数量;M为数据表中全空字段数量;N为数据表中总记录数。Among them, Q is the number of none values in the data table excluding all empty fields; P is the number of none values in the data table; M is the number of all empty fields in the data table; N is the total number of records in the data table.
S140、计算满字段记录数。S140. Calculate the number of full field records.
所述计算满字段记录数,通过sql语句实现下述逻辑:先令集合{a1,a2,...,an}为A表的各字段,分别判断每个集合元素是否为none,如果是,则将记录数+1,遍历整个集合后,得到非满字段记录数。计算方式为:To calculate the number of full field records, the following logic is implemented through SQL statements: the shilling set {a1 , a2 ,..., an } is each field of table A, and it is judged whether each set element is none, If so, add 1 to the number of records, and after traversing the entire collection, get the number of non-full field records. The calculation method is:
K=N-FK=N-F
其中,K为数据表中满字段记录数;N为数据表中总记录数;F为数据表中非满字段记录数;所述非满字段记录数为数据表中包含空值的字段中的总记录数。Among them, K is the number of full field records in the data table; N is the total number of records in the data table; F is the number of non-full field records in the data table; the number of non-full field records is the number of fields in the data table that contain null values. total.
上述S110~S140,通过Oracle数据库自带的dbms_stats模块,进行单个数据表基础信息统计,统计得到了数据表的字段数、记录数、全空字段数、none值数量、去除全空字段的none值数量、满字段记录数等数据指标值。The above S110~S140 uses the dbms_stats module that comes with the Oracle database to perform basic information statistics on a single data table. The statistics include the number of fields, the number of records, the number of all-empty fields, the number of none values, and the none values excluding all-empty fields. Data indicator values such as quantity and number of full field records.
S150、遍历所述数据库中的全部数据表,将得到的数据指标值进行累加。S150. Traverse all data tables in the database and accumulate the obtained data index values.
作为本发明的一种实施例,通过遍历数据库中的全部数据表,将每个数据表的数据指标值进行累加,得到整个数据库的数据指标总值。As an embodiment of the present invention, by traversing all data tables in the database and accumulating the data index values of each data table, the total data index value of the entire database is obtained.
具体包括:Specifically include:
累加各表字段数乘以记录数,得到数据库数值总数;Accumulate the number of fields in each table multiplied by the number of records to obtain the total number of database values;
累加各表字段数,得到数据库总字段数;Accumulate the number of fields in each table to obtain the total number of fields in the database;
累加各表总记录数,得到数据库总记录数;Accumulate the total number of records in each table to obtain the total number of records in the database;
累加各表全空字段数,得到该数据库总全空字段数;Accumulate the number of all-empty fields in each table to obtain the total number of all-empty fields in the database;
累加各表none值数量,得到数据库none值数量;Accumulate the number of none values in each table to obtain the number of none values in the database;
累加各表去除全空字段的none值数量,得到数据库去除全空字段none值数量;Accumulate the number of none values in each table to remove all empty fields, and obtain the number of none values in the database after removing all empty fields;
累加各表满字段记录数量,得到数据库满字段记录数量;Accumulate the number of full field records in each table to obtain the number of full field records in the database;
累加各表(字段数-全空字段数)*记录数,得到数据库去除全空字段后数值总数,即所述去除全空字段后的数值为:Accumulate each table (number of fields - number of empty fields) * number of records to get the total number of values in the database after removing all empty fields, that is, the value after removing all empty fields is:
H=(R-M)*NH=(R-M)*N
其中,H为数据表中去除全空字段后的数值;R为数据表中的字段数;M为数据表中全空字段数量;N为数据表中总记录数。Among them, H is the value after removing all empty fields in the data table; R is the number of fields in the data table; M is the number of all empty fields in the data table; N is the total number of records in the data table.
S160、计算数据库全空字段比例、数据库满字段记录比例、数据库none值比例和数据库去除全空字段的none值比例,对所述数据库的数据完整性进行评价。S160. Calculate the proportion of all empty fields in the database, the proportion of full field records in the database, the proportion of none values in the database, and the proportion of none values in the database excluding all empty fields, and evaluate the data integrity of the database.
所述数据库全空字段比例为全空字段数/数据库总字段数;The ratio of all empty fields in the database is the number of all empty fields/the total number of fields in the database;
所述数据库满字段记录比例为满字段记录总数/数据库记录总数;The ratio of full field records in the database is the total number of full field records/the total number of database records;
所述数据库none值比例为none值数量/数据库数值总数;The ratio of none values in the database is the number of none values/the total number of database values;
所述数据库去除全空字段的none值比例为去除全空字段后none值数量/数据库去除全空字段后数值总数。The proportion of none values in the database after removing all empty fields is the number of none values after removing all empty fields/the total number of values in the database after removing all empty fields.
进一步地,170、通过上述四项评价指标可以对数据库的完整性进行评价,包括:Furthermore, 170. The integrity of the database can be evaluated through the above four evaluation indicators, including:
所述数据库的数据完整性与所述数据库全空字段比例呈负相关,即所述数据库全空字段比例越高,则所述数据库的完整性越差;反之,所述数据库全空字段比例越低,则所述数据库的完整性越好。The data integrity of the database is negatively correlated with the proportion of all empty fields in the database, that is, the higher the proportion of all empty fields in the database, the worse the integrity of the database; conversely, the higher the proportion of all empty fields in the database. The lower, the better the integrity of the database.
所述数据库的数据完整性与所述数据库none值比例呈负相关,即所述数据库none值比例越高,则所述数据库的完整性越差;反之,所述数据库none值比例越低,则所述数据库的完整性越好。The data integrity of the database is negatively correlated with the none value ratio of the database, that is, the higher the none value ratio of the database, the worse the integrity of the database; conversely, the lower the none value ratio of the database, the The integrity of the database is better.
所述数据库的数据完整性与所述数据库去除全空字段的none值比例呈负相关,即所述数据库去除全空字段的none值比例越高,则所述数据库的完整性越差;反之,所述数据库去除全空字段的none值比例越低,则所述数据库的完整性越好。The data integrity of the database is negatively correlated with the proportion of none values in the database when all empty fields are removed, that is, the higher the proportion of none values in the database when all empty fields are removed, the worse the integrity of the database is; conversely, The lower the proportion of none values in the database excluding all-empty fields, the better the integrity of the database.
所述数据库的数据完整性与所述数据库满字段记录比例呈正相关;即所述数据库满字段记录比例越高,则所述数据库的完整性越好;反之,所述数据库满字段记录比例越低,则所述数据库的完整性越差。The data integrity of the database is positively correlated with the proportion of full field records in the database; that is, the higher the proportion of full field records in the database, the better the integrity of the database; conversely, the lower the proportion of full field records in the database. , the worse the integrity of the database.
通过上述指标评价可以从数据质量的量化评估角度对数据库的完整性进行评价。Through the above index evaluation, the integrity of the database can be evaluated from the perspective of quantitative assessment of data quality.
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本发明所必须的。It should be noted that for the sake of simple description, the foregoing method embodiments are expressed as a series of action combinations. However, those skilled in the art should know that the present invention is not limited by the described action sequence. Because in accordance with the present invention, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily necessary for the present invention.
以上是关于方法实施例的介绍,以下通过装置实施例,对本发明所述方案进行进一步说明。The above is an introduction to the method embodiments. The solution of the present invention will be further described below through device embodiments.
如图2所示,装置200包括:As shown in Figure 2, device 200 includes:
读取模块201,用于从第一功能表中读取第一字段,从第二功能表中读取第二字段和第三字段,并根据第二字段建立数据表与字段的所属关系;所述第一字段为数据表中包含的总记录数;所述第二字段表示用户可见的字段所属的数据表的名称;所述第三字段为数据表中每个字段包含的空值数;The reading module 201 is used to read the first field from the first function table, read the second field and the third field from the second function table, and establish the ownership relationship between the data table and the field according to the second field; The first field is the total number of records contained in the data table; the second field represents the name of the data table to which the user-visible fields belong; the third field is the number of null values contained in each field in the data table;
第一累加计算模块202,用于从数据库中读取一数据表,累加所述数据表的第三字段的值,得到所述数据表的none值数量;计算去除全空字段的none值数量和满字段记录数;The first accumulation calculation module 202 is used to read a data table from the database, accumulate the value of the third field of the data table, and obtain the number of none values in the data table; calculate the number of none values excluding all empty fields and Number of full field records;
进一步地,还包括第一计算模块,用于计算所述去除全空字段的none值数量,包括:Further, it also includes a first calculation module, used to calculate the number of none values after removing all empty fields, including:
Q=P-M*NQ=P-M*N
其中,Q为数据表中去除全空字段的none值数量;P为数据表中none值数量;M为数据表中全空字段数量;N为数据表中总记录数。Among them, Q is the number of none values in the data table excluding all empty fields; P is the number of none values in the data table; M is the number of all empty fields in the data table; N is the total number of records in the data table.
进一步地,还包括第二计算模块,用于计算所述满字段记录数,包括:Further, it also includes a second calculation module for calculating the number of full field records, including:
K=N-FK=N-F
其中,K为数据表中满字段记录数;N为数据表中总记录数;F为数据表中非满字段记录数;所述非满字段记录数为数据表中包含空值的字段中的总记录数。Among them, K is the number of full field records in the data table; N is the total number of records in the data table; F is the number of non-full field records in the data table; the number of non-full field records is the number of fields in the data table that contain null values. total.
第二累加计算模块203,用于遍历所述数据库中的全部数据表,将得到的数据指标值进行累加,计算数据库全空字段比例、数据库满字段记录比例、数据库none值比例和数据库去除全空字段的none值比例。The second accumulation calculation module 203 is used to traverse all data tables in the database, accumulate the obtained data index values, and calculate the proportion of all empty fields in the database, the proportion of full field records in the database, the proportion of none values in the database, and the proportion of all empty fields in the database. The proportion of none values in the field.
进一步地,第二累加计算模块203还包括第一累加模块,用于将每个数据表的数据指标值进行累加,得到整个数据库的数据指标总值,具体包括:Further, the second accumulation calculation module 203 also includes a first accumulation module, which is used to accumulate the data index values of each data table to obtain the total data index value of the entire database, specifically including:
将各数据表字段数进行累加,得到数据库总字段数;Accumulate the number of fields in each data table to obtain the total number of fields in the database;
将各数据表字段数与记录数的乘积进行累加,得到数据库数值总数;Accumulate the product of the number of fields and the number of records in each data table to obtain the total number of database values;
将各数据表总记录数进行累加,得到数据库总记录数;Accumulate the total number of records in each data table to obtain the total number of records in the database;
将各数据表none值数量进行累加,得到数据库none值数量;Accumulate the number of none values in each data table to obtain the number of none values in the database;
将各数据表去除全空字段的none值数量进行累加,得到数据库去除全空字段none值数量;Accumulate the number of none values in each data table after removing all empty fields, and obtain the number of none values in the database after removing all empty fields;
将各数据表满字段记录数进行累加,得到数据库满字段记录数;Accumulate the number of full field records in each data table to obtain the number of full field records in the database;
将各数据表去除全空字段后的数值进行累加,得到数据库去除全空字段后的数值总数;即所述去除全空字段后的数值为:Accumulate the values after removing all empty fields in each data table to obtain the total number of values in the database after removing all empty fields; that is, the value after removing all empty fields is:
H=(R-M)*NH=(R-M)*N
其中,H为数据表中去除全空字段后的数值;R为数据表中的字段数;M为数据表中全空字段数量;N为数据表中总记录数。Among them, H is the value after removing all empty fields in the data table; R is the number of fields in the data table; M is the number of all empty fields in the data table; N is the total number of records in the data table.
所述第二累加计算模块203还包括第三计算模块,用于计算数据库全空字段比例、数据库满字段记录比例、数据库none值比例和数据库去除全空字段的none值比例,包括:The second accumulation calculation module 203 also includes a third calculation module, which is used to calculate the proportion of all empty fields in the database, the proportion of full field records in the database, the proportion of none values in the database, and the proportion of none values in the database after removing all empty fields, including:
所述数据库全空字段比例为数据库全空字段数与数据库总字段数的比值;The ratio of all empty fields in the database is the ratio of the number of all empty fields in the database to the total number of fields in the database;
所述数据库满字段记录比例为数据库满字段记录数与数据库总记录数的比值;The ratio of full field records in the database is the ratio of the number of full field records in the database to the total number of records in the database;
所述数据库none值比例为数据库none值数量与数据库数值总数的比值;The database none value ratio is the ratio of the number of database none values to the total number of database values;
所述数据库去除全空字段的none值比例为数据库去除全空字段none值数量与数据库去除全空字段后的数值总数的比值。The proportion of none values in the database after all empty fields are removed is the ratio of the number of none values in the database after all empty fields are removed and the total number of values in the database after all empty fields are removed.
评价模块204,用于对所述数据库的数据完整性进行评价;具体包括:The evaluation module 204 is used to evaluate the data integrity of the database; specifically includes:
所述数据库的数据完整性与所述数据库全空字段比例、所述数据库none值比例、所述数据库去除全空字段的none值比例呈负相关;The data integrity of the database is negatively correlated with the proportion of all empty fields in the database, the proportion of none values in the database, and the proportion of none values in the database except all empty fields;
所述数据库的数据完整性与所述数据库满字段记录比例呈正相关。The data integrity of the database is positively correlated with the proportion of full field records in the database.
具体包括:Specifically include:
所述数据库的数据完整性与所述数据库全空字段比例呈负相关,即所述数据库全空字段比例越高,则所述数据库的完整性越差;反之,所述数据库全空字段比例越低,则所述数据库的完整性越好。The data integrity of the database is negatively correlated with the proportion of all empty fields in the database, that is, the higher the proportion of all empty fields in the database, the worse the integrity of the database; conversely, the higher the proportion of all empty fields in the database. The lower, the better the integrity of the database.
所述数据库的数据完整性与所述数据库none值比例呈负相关,即所述数据库none值比例越高,则所述数据库的完整性越差;反之,所述数据库none值比例越低,则所述数据库的完整性越好。The data integrity of the database is negatively correlated with the none value ratio of the database, that is, the higher the none value ratio of the database, the worse the integrity of the database; conversely, the lower the none value ratio of the database, the The integrity of the database is better.
所述数据库的数据完整性与所述数据库去除全空字段的none值比例呈负相关,即所述数据库去除全空字段的none值比例越高,则所述数据库的完整性越差;反之,所述数据库去除全空字段的none值比例越低,则所述数据库的完整性越好。The data integrity of the database is negatively correlated with the proportion of none values in the database when all empty fields are removed, that is, the higher the proportion of none values in the database when all empty fields are removed, the worse the integrity of the database is; conversely, The lower the proportion of none values in the database excluding all-empty fields, the better the integrity of the database.
所述数据库的数据完整性与所述数据库满字段记录比例呈正相关;即所述数据库满字段记录比例越高,则所述数据库的完整性越好;反之,所述数据库满字段记录比例越低,则所述数据库的完整性越差。The data integrity of the database is positively correlated with the proportion of full field records in the database; that is, the higher the proportion of full field records in the database, the better the integrity of the database; conversely, the lower the proportion of full field records in the database. , the worse the integrity of the database.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,所述描述的模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that for the convenience and simplicity of description, the specific working process of the described module can be referred to the corresponding process in the foregoing method embodiment, and will not be described again here.
如图3所示,电子设备包括中央处理单元(CPU),其可以根据存储在只读存储器(ROM)中的计算机程序指令或者从存储单元加载到随机访问存储器(RAM)中的计算机程序指令,来执行各种适当的动作和处理。在RAM中,还可以存储设备操作所需的各种程序和数据。CPU、ROM以及RAM通过总线彼此相连。输入/输出(I/O)接口也连接至总线。As shown in Figure 3, the electronic device includes a central processing unit (CPU) that can perform, according to computer program instructions stored in a read-only memory (ROM) or loaded from a storage unit into a random access memory (RAM), to perform various appropriate actions and processing. In RAM, various programs and data required for device operation can also be stored. CPU, ROM and RAM are connected to each other through buses. Input/output (I/O) interfaces are also connected to the bus.
电子设备中的多个部件连接至I/O接口,包括:输入单元,例如键盘、鼠标等;输出单元,例如各种类型的显示器、扬声器等;存储单元,例如磁盘、光盘等;以及通信单元,例如网卡、调制解调器、无线通信收发机等。通信单元允许电子设备通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。Multiple components in electronic equipment are connected to the I/O interface, including: input units, such as keyboards, mice, etc.; output units, such as various types of displays, speakers, etc.; storage units, such as magnetic disks, optical disks, etc.; and communication units , such as network cards, modems, wireless communication transceivers, etc. The communication unit allows the electronic device to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
处理单元执行上文所描述的各个方法和处理,例如方法S110~S160。例如,在一些实施例中,方法S110~S160可被实现为计算机软件程序,其被有形地包含于机器可读介质,例如存储单元。在一些实施例中,计算机程序的部分或者全部可以经由ROM和/或通信单元而被载入和/或安装到设备上。当计算机程序加载到RAM并由CPU执行时,可以执行上文描述的方法S110~S160的一个或多个步骤。备选地,在其他实施例中,CPU可以通过其他任何适当的方式(例如,借助于固件)而被配置为执行方法S110~S160。The processing unit executes each method and processing described above, such as methods S110 to S160. For example, in some embodiments, methods S110 to S160 may be implemented as a computer software program, which is tangibly included in a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or communication unit. When the computer program is loaded into RAM and executed by the CPU, one or more steps of the methods S110 to S160 described above may be performed. Alternatively, in other embodiments, the CPU may be configured to perform methods S110 to S160 in any other suitable manner (eg, by means of firmware).
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、芯片上系统的系统(SOC)、负载可编程逻辑设备(CPLD)等等。The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Load programmable logic device (CPLD) and so on.
用于实施本发明的方法的程序代码可以采用一个或多个编程语言的任何组合来编写。这些程序代码可以提供给通用计算机、专用计算机或其他可编程数据处理装置的处理器或控制器,使得程序代码当由处理器或控制器执行时使流程图和/或框图中所规定的功能/操作被实施。程序代码可以完全在机器上执行、部分地在机器上执行,作为独立软件包部分地在机器上执行且部分地在远程机器上执行或完全在远程机器或服务器上执行。Program code for implementing the methods of the invention may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program codes, when executed by the processor or controller, cause the functions specified in the flowcharts and/or block diagrams/ The operation is implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
在本发明的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。In the context of this disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, laptop disks, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
此外,虽然采用特定次序描绘了各操作,但是这应当理解为要求这样操作以所示出的特定次序或以顺序次序执行,或者要求所有图示的操作应被执行以取得期望的结果。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本发明的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实现中。相反地,在单个实现的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实现中。Furthermore, although operations are depicted in a specific order, this should be understood to require that such operations be performed in the specific order shown or in sequential order, or that all illustrated operations should be performed to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the invention. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
尽管已经采用特定于结构特征和/或方法逻辑动作的语言描述了本主题,但是应当理解所附权利要求书中所限定的主题未必局限于上面描述的特定特征或动作。相反,上面所描述的特定特征和动作仅仅是实现权利要求书的示例形式。Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010826563.1ACN111897803B (en) | 2020-08-17 | 2020-08-17 | Database integrity evaluation method for power industry service system |
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010826563.1ACN111897803B (en) | 2020-08-17 | 2020-08-17 | Database integrity evaluation method for power industry service system |
| Publication Number | Publication Date |
|---|---|
| CN111897803A CN111897803A (en) | 2020-11-06 |
| CN111897803Btrue CN111897803B (en) | 2023-10-20 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010826563.1AActiveCN111897803B (en) | 2020-08-17 | 2020-08-17 | Database integrity evaluation method for power industry service system |
| Country | Link |
|---|---|
| CN (1) | CN111897803B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115203300B (en)* | 2022-06-22 | 2023-10-31 | 中电金信软件有限公司 | Data verification method and device |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2002351875A (en)* | 2001-05-21 | 2002-12-06 | Requisite Technology Inc | Method and apparatus for analyzing the quality of database content |
| KR20040023376A (en)* | 2002-09-11 | 2004-03-18 | 한국과학기술정보연구원 | Real-time quality measurement method of bibliographic database |
| CN102156922A (en)* | 2011-04-20 | 2011-08-17 | 重庆市电力公司綦南供电局 | Data integrity determination method for application assessment expansion platform of IT (Information Technology) system |
| CN109299062A (en)* | 2018-07-02 | 2019-02-01 | 北京市天元网络技术股份有限公司 | A kind of quality evaluating method and system towards document category digital resource metadata |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2002351875A (en)* | 2001-05-21 | 2002-12-06 | Requisite Technology Inc | Method and apparatus for analyzing the quality of database content |
| KR20040023376A (en)* | 2002-09-11 | 2004-03-18 | 한국과학기술정보연구원 | Real-time quality measurement method of bibliographic database |
| CN102156922A (en)* | 2011-04-20 | 2011-08-17 | 重庆市电力公司綦南供电局 | Data integrity determination method for application assessment expansion platform of IT (Information Technology) system |
| CN109299062A (en)* | 2018-07-02 | 2019-02-01 | 北京市天元网络技术股份有限公司 | A kind of quality evaluating method and system towards document category digital resource metadata |
| Title |
|---|
| 关于数据表中字段属性"NULL"的理解与使用;李登道, 苏娜;计算机应用与软件(12);全文* |
| Publication number | Publication date |
|---|---|
| CN111897803A (en) | 2020-11-06 |
| Publication | Publication Date | Title |
|---|---|---|
| CN109684352B (en) | Data analysis system, data analysis method, storage medium, and electronic device | |
| CN112182071B (en) | Data association relation mining method and device, electronic equipment and storage medium | |
| CN112445875B (en) | Data association and verification method and device, electronic equipment and storage medium | |
| CN113778994B (en) | Database detection method, device, electronic device and computer readable medium | |
| CN111125266A (en) | Data processing method, device, equipment and storage medium | |
| CN111625561B (en) | Data query method and device | |
| CN109325062B (en) | Data dependency mining method and system based on distributed computation | |
| WO2018228049A1 (en) | Database performance index monitoring method, apparatus and device, and storage medium | |
| CN106484915B (en) | A method and system for cleaning massive data | |
| CN113763502B (en) | Chart generation method, device, equipment and storage medium | |
| CN113515560A (en) | Analysis method, device, electronic device and storage medium for vehicle failure | |
| CN108052542B (en) | Multidimensional data analysis method based on presto data | |
| CN111897803B (en) | Database integrity evaluation method for power industry service system | |
| US8548980B2 (en) | Accelerating queries based on exact knowledge of specific rows satisfying local conditions | |
| CN113918561B (en) | Hybrid query method and system based on analysis scene on cloud and storage medium | |
| CN111949644B (en) | Data quality evaluation method and equipment for power industry service system | |
| WO2025194836A1 (en) | Error positioning method and apparatus for information system, and electronic device and storage medium | |
| CN111897889B (en) | Data table integrity evaluation method for power industry service system | |
| CN111984625B (en) | Database load characteristic processing method and device, medium and electronic equipment | |
| CN116881219A (en) | Database optimization processing method and device, electronic equipment and storage medium | |
| CN104657388A (en) | Data processing method and device | |
| CN109766333A (en) | Data processing empty value method, apparatus and terminal device | |
| US8359329B2 (en) | Method, computer apparatus and computer program for identifying unusual combinations of values in data | |
| CN114547041A (en) | Method and device for determining database table | |
| CN115062687A (en) | Enterprise credit monitoring method, device, equipment and storage medium |
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |