Power grid mass data quality verification method based on PMS model
Technical Field
The invention belongs to the technical field of power grid operation monitoring, and particularly relates to a power grid mass data quality verification method based on a PMS model.
Background
Under the guidance of the State Grid Corporation of China's 12th Five-Year Plan, construction of a mass quasi-real-time data service platform has been completed. In some provinces the platform imports a large amount of real-time data into the management information zone to take part in fusion analysis of information data. This enables the acquisition, access and analysis of real-time data from the production management system, SCADA (supervisory control and data acquisition), the dispatching automation system, metering, online monitoring and other systems, as well as analysis and advanced applications of business data (such as line loss analysis and load analysis). However, as the acquisition volume of power grid automation systems keeps growing and their coverage widens, the stored data multiplies, and stricter requirements are placed on the integrity, uniqueness, timeliness, validity, consistency and accuracy of the data, so that big data mining and analysis can support the development of power grid big data services.
A power Production Management System (PMS) is built on the unified application platform PI3000 and adopts a multilayer architecture (divided into a data layer, a service layer and a presentation layer) with a mixed B/S and C/S mode. The PI3000 platform is a business-infrastructure software platform for the power industry, developed on model-driven and component-based concepts; it incorporates current concepts of such platforms in accordance with the guiding principles of the SG186 project's unified application platform. Its design goal is that business models can be established and maintained consistently in the face of the complex and changing requirements of power enterprises, that a complete infrastructure is provided for customized application development, that application systems are generated automatically or with assistance, and that the efficiency of developing and deploying application systems is improved as far as possible. A PMS built on PI3000 is highly uniform, has good capacity for continued expansion and conforms to the standard specifications.
The PMS is an integrated production management information platform with asset management at its core. Its services cover three levels (company headquarters, regional and provincial grid companies, and municipal companies) and run through the whole process of transmission, transformation and distribution production. It is of great significance for intensive, refined and standardized management of power grid production and for raising the company's asset management level. As a working platform for production management and front-line teams, the PMS ultimately achieves risk pre-control and decision support for production management through standard specifications, process monitoring, safety monitoring and similar means.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the defects of the prior art, the invention provides a power grid mass data quality verification method based on a PMS model, which can quickly complete real-time data access, improves the accuracy of data quality detection in each business system through a data quality checking module, and provides a visual and complete data quality audit report through a data quality auditing module.
The technical scheme is as follows: in order to achieve the purpose, the invention provides a power grid mass data quality verification method based on a PMS model, and provides a data quality verification system, which comprises a storage layer, an engine layer, a processing layer and a display layer which are sequentially connected with one another, wherein the storage layer comprises a relational database and a real-time database, the engine layer comprises a data transmission detection engine, a data quality detection engine and a data quality auditing engine, and the processing layer comprises a data transmission detection service, a data quality detection service and a data quality auditing service;
the method comprises the following steps:
S1, data acquisition: data are collected by transmitting E files over FTP; specifically, the data first enter an FTP server to form E file data, and the E file data then enter an interface server;
S2, entering the storage layer: the data in the interface server enter the storage layer and are stored in the real-time database in the storage layer;
S3, entering the engine layer: the data transmission detection engine is selected through the engine layer;
S4, entering the processing layer: the data transmission detection service is entered according to the data transmission detection engine; specifically, the data transmission state between the FTP server and the interface server is detected and stored in the relational database in the storage layer;
S5, entering the engine layer: the data quality detection engine is selected through the engine layer;
S6, entering the processing layer: the data quality detection service is entered according to the data quality detection engine; specifically, the data quality conditions in the real-time database in the storage layer are detected and stored in the relational database in the storage layer;
S7, entering the engine layer: the data quality auditing engine is selected through the engine layer;
S8, entering the processing layer: the data quality auditing service is entered according to the data quality auditing engine, a data quality audit report in the relational database is retrieved according to the selected data source and time parameters, and the related data quality conditions are viewed;
S9, entering the display layer: the display layer displays the related data quality conditions.
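As an illustration only, the alternation between the engine layer and the processing layer in steps S3 to S8 can be sketched as a small dispatch table. All function and key names below are hypothetical and are not taken from the described system; the services are placeholders that only log what the corresponding step would do.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the three processing-layer services.
def transmission_detection_service(ctx: Dict) -> None:
    ctx["log"].append("S4: transmission state stored in relational database")

def quality_detection_service(ctx: Dict) -> None:
    ctx["log"].append("S6: data quality conditions stored in relational database")

def quality_audit_service(ctx: Dict) -> None:
    ctx["log"].append("S8: audit report retrieved for selected source and time")

# Engine layer: each engine name routes to exactly one processing-layer service.
ENGINES: Dict[str, Callable[[Dict], None]] = {
    "data_transmission_detection": transmission_detection_service,
    "data_quality_detection": quality_detection_service,
    "data_quality_audit": quality_audit_service,
}

# Order in which steps S3/S4, S5/S6 and S7/S8 select an engine.
ENGINE_ORDER = [
    "data_transmission_detection",
    "data_quality_detection",
    "data_quality_audit",
]

def run_verification(ctx: Dict) -> List[str]:
    """Select each engine in turn and run the service it routes to."""
    for name in ENGINE_ORDER:
        ENGINES[name](ctx)
    return ctx["log"]
```

Calling `run_verification({"log": []})` returns the three log entries in step order; in a real deployment each placeholder would be replaced by the actual service.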
Further, the step S4 includes the following steps:
when the data transmission state between the FTP server and the interface server is FTP interruption, an alarm is required, and field personnel process the alarm information to keep FTP transmission smooth.
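A minimal sketch of the transmission-state check described in this step, under stated assumptions: the probe simply tries to open an FTP control connection with Python's standard `ftplib`, the state names `NORMAL` and `FTP_INTERRUPTED` are invented for illustration, and an in-memory list stands in for the relational database.

```python
import ftplib
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional

def probe_ftp(host: str, port: int = 21, timeout: float = 5.0) -> bool:
    """Return True if an FTP control connection to the server can be opened."""
    try:
        with ftplib.FTP() as ftp:
            ftp.connect(host, port, timeout=timeout)
        return True
    except ftplib.all_errors:
        return False

def check_transmission(
    probe: Callable[[], bool],
    alarm: Callable[[Dict[str, str]], None],
    records: List[Dict[str, str]],
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Probe the link once, store the state record, and alarm on interruption."""
    ok = probe()
    record = {
        "checked_at": (now or datetime.now(timezone.utc)).isoformat(),
        "state": "NORMAL" if ok else "FTP_INTERRUPTED",  # hypothetical state names
    }
    records.append(record)  # stand-in for the insert into the relational database
    if not ok:
        alarm(record)       # e.g. notify field personnel
    return record
```

The probe is passed in as a callable so that the check can be exercised without a live server, for example `check_transmission(lambda: probe_ftp("ftp.example.invalid"), print, records)`, where the host name is a placeholder.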
Further, the step S6 includes the following steps:
S61, constructing a power grid measuring point model tree;
S62, selecting, in batches, the measuring points to be detected in the measuring point model tree;
S63, adding a group of data quality check methods, which comprise a data missing point detection method, a data null detection method, a data 0 detection method, a data jump detection method, a data glitch detection method and a data duplicate screening method;
S64, loading different types of data quality check methods onto the selected measuring points to generate a data quality check rule base file, and inserting the data quality check rule base into the relational database in the storage layer;
S65, calling the data quality detection service, which reads the data quality check rule base file in the relational database, performs data quality detection on the daily data of the relevant measuring points in the real-time database, and finally stores the data quality records of the relevant measuring points in the relational database.
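Steps S62 to S64 can be sketched as follows. This is an illustration under assumptions, not the actual rule base format: the source does not describe the contents of the rule base file, so a rule is modelled as a (measuring point, check method) pair; the table name `dq_rule`, the method identifiers and the use of SQLite as the relational database are all hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical identifiers for the six check methods named in step S63.
CHECK_METHODS = ("missing_point", "null", "zero", "jump", "glitch", "duplicate")

def build_rules(point_ids: Iterable[str], methods: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair every selected measuring point with every selected check method."""
    methods = list(methods)
    unknown = [m for m in methods if m not in CHECK_METHODS]
    if unknown:
        raise ValueError(f"unknown check methods: {unknown}")
    return [(p, m) for p in point_ids for m in methods]

def store_rules(conn: sqlite3.Connection, rules: List[Tuple[str, str]]) -> None:
    """Insert the rule base into the relational database (idempotent)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dq_rule ("
        "point_id TEXT NOT NULL, method TEXT NOT NULL, "
        "PRIMARY KEY (point_id, method))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO dq_rule (point_id, method) VALUES (?, ?)", rules
    )
    conn.commit()

def load_rules(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Read the rule base back, as the detection service would in step S65."""
    return conn.execute(
        "SELECT point_id, method FROM dq_rule ORDER BY point_id, method"
    ).fetchall()
```

Keying the table on (point, method) means that regenerating the rule base for the same selection does not create duplicate rules.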
Further, the step S8 includes the following steps:
S81, calling the data quality auditing service, which generates an Excel file from the related data quality conditions and data transmission states stored in the relational database by using JXL, or generates a PDF file from the same data by using iText, and takes the generated Excel or PDF file as the data quality audit report;
S82, selecting a data source and time parameters, retrieving the data quality audit report in the relational database, and viewing the data quality conditions of the selected measuring points of the power grid measuring point model tree in the data source.
Further, in step S9, the display layer displays the related data quality conditions, which include FTP interruption, data null, data 0, data missing points, data repetition, data jump and data glitch.
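The retrieval in step S82 amounts to a query filtered by data source and time range. The sketch below is illustrative only: the table `dq_record` and its columns are assumptions, SQLite stands in for the relational database, and CSV output stands in for the Excel or PDF reports that the described system produces with JXL or iText.

```python
import csv
import io
import sqlite3
from typing import List, Tuple

def init_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical table that holds daily data quality records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dq_record ("
        "source TEXT, day TEXT, point_id TEXT, issue TEXT, issue_count INTEGER)"
    )

def fetch_audit_rows(
    conn: sqlite3.Connection, source: str, start_day: str, end_day: str
) -> List[Tuple[str, str, str, int]]:
    """Select the records for one data source within an inclusive date range.

    Days are ISO strings (YYYY-MM-DD), so string comparison orders them correctly.
    """
    return conn.execute(
        "SELECT day, point_id, issue, issue_count FROM dq_record "
        "WHERE source = ? AND day BETWEEN ? AND ? "
        "ORDER BY day, point_id, issue",
        (source, start_day, end_day),
    ).fetchall()

def render_report_csv(rows: List[Tuple[str, str, str, int]]) -> str:
    """Render the selected rows as CSV text (stand-in for Excel/PDF output)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["day", "point_id", "issue", "count"])
    writer.writerows(rows)
    return buf.getvalue()
```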
Beneficial effects: compared with the prior art, the invention has the following advantages:
the invention can quickly complete real-time data access through the real-time data interaction tool; based on the power grid measuring point model tree and in combination with multiple check methods, the data transmission state, data health state and the like are comprehensively evaluated; the data quality checking module improves the accuracy of data quality detection in each business system; and the data quality auditing module provides a visual and complete data quality audit report;
the invention provides a power grid mass data quality verification method based on the PMS equipment model structure. By integrating the dispatching main-network and distribution-network equipment models and linking them with the mass real-time data, a complete power grid real-time data equipment model is formed on the mass data platform. Different data quality check rules are configured in combination with this model to verify the quality of the mass data, and data quality audits are carried out for different business systems. Results such as data quality defects and alarms are fed back to the controlling department, which gives the relevant business departments a basis for decision-making and provides strong support for mass data mining and advanced application analysis. Data quality verification and application customization are realized on the mass data platform, so that real-time-data-based application development by each business department is reduced to simple configuration and generation with the platform's tools; a large amount of unnecessary data cleaning work is avoided, the pressure on operation and maintenance personnel is reduced, and operation and maintenance efficiency is improved.
Drawings
Fig. 1 is a flow chart of PMS-based power grid real-time data quality verification.
Fig. 2 is a diagram of a data quality verification system architecture based on a PMS model.
Fig. 3 is a block diagram of a data transmission detection module.
Fig. 4 is a flow chart of generating a data quality check rule base file.
Fig. 5 is a functional block diagram of a PMS model based real-time data inspection system.
Fig. 6 is a component instantiation flow diagram.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The invention provides a power grid mass data quality verification method based on a PMS model. Based on the PMS equipment model structure, the dispatching main-network and distribution-network equipment models are integrated and then linked with the mass real-time data, so that a complete power grid real-time data equipment model is formed on the mass data platform; in combination with this model, different data quality check rules are configured to verify the quality of the mass data and to audit data quality for different business systems. Referring to Fig. 1, which is the flow chart of PMS-based power grid real-time data quality verification, a power grid real-time data model is first established on the basis of the PMS: the EMS data and DMS data of each city and the PMS model of the regional and provincial grid companies are accessed separately. For the accessed PMS model, the real-time data model must be defined and generated. The EMS and DMS data accessed from the cities are then matched against the defined and generated PMS-based real-time data model according to the PMS and EMS matching rules surveyed for each region. Measuring points that fail to match are collected into a web list and sent to the municipal company's matching project team, which resolves them by manually locating the problem and either correcting the standard naming or improving the matching rules. For measuring points that match successfully, the city's E file data are parsed and accessed through the interface, and real-time data storage corresponding to the relevant models is generated. Finally, data quality verification based on the model tree is carried out by loading the data quality check rule base.
The invention provides a power grid mass data quality verification method based on a PMS model, and provides a data quality verification system. Referring to Fig. 2, which is the architecture diagram of the PMS-model-based data quality verification system, the system comprises a storage layer, an engine layer, a processing layer and a display layer connected in sequence. The storage layer stores data and contains a relational database, a real-time database and a data quality rule base (ctl file); the engine layer comprises a data transmission detection engine, a data quality detection engine and a data quality auditing engine; the processing layer comprises a data transmission detection service, a data quality detection service and a data quality auditing service; and the display layer displays the data quality conditions.
Data quality verification according to this architecture proceeds as follows. First, the mass real-time data are collected. Because the access interface transmits E files over FTP, an FTP server is connected in front of the interface server. Referring to Fig. 3, the FTP server is connected to the interface server, the interface server is connected to the mass platform data quality server, and the mass platform data quality server is connected to the mass platform data quality client. The mass real-time data enter the FTP server to form E file data; the interface server acquires the E file data from the FTP server and then stores the data in the real-time database in the storage layer, which corresponds to the real-time database installed on the mass platform data quality server. A relational database is also installed on the mass platform data quality server and corresponds to the relational database in the storage layer. The mass platform data quality server acquires the data transmission state between the FTP server and the interface server, stores it in the relational database, and sends data related to data quality to the mass platform data quality client.
When mass data are acquired by parsing E files, the data transmission state must be checked before data quality verification. The data transmission detection service in the processing layer is entered through the data transmission detection engine in the engine layer. While the service is being called, the data transmission state between the FTP server and the interface server is monitored on the interface server by means of FTP service monitoring, and the transmission state records are stored in the relational database in the storage layer via JDBC for later generation of the data quality audit report. If an FTP interruption occurs, an alarm is raised promptly, and field personnel immediately handle the alarm information to keep FTP transmission running smoothly.
The data quality detection service in the processing layer is entered through the data quality detection engine in the engine layer; by calling this service, the data quality conditions in the real-time database in the storage layer are detected and stored in the relational database in the storage layer. The data quality auditing service is entered through the data quality auditing engine in the engine layer; by calling this service, a data quality audit report in the relational database is retrieved according to the selected data source and time parameters, and the data quality conditions of the relevant measuring points are viewed. The related data quality conditions are displayed through the display layer; they are determined by the check rules adopted when the data quality check rule base file is generated, and include FTP interruption, data null, data 0, data missing points, data repetition, data jump and data glitch.
Entering the data quality detection service in the processing layer through the data quality detection engine in the engine layer specifically comprises the following. First, a data quality check rule base file is generated. Referring to Fig. 4, a measuring point model tree of the measuring points acquired by the interface server is constructed on the basis of the PMS power grid model; the power grid measuring point model tree is loaded; a group of data quality check methods is added, comprising a data missing point detection method, a data null detection method, a data 0 detection method, a data jump detection method, a data glitch detection method and a data duplicate screening method; different types of check methods are loaded onto the measuring points of the measuring point model tree in the data source; and the data quality check rule base file is generated and inserted into the relational database in the storage layer. The calculations used by the different data quality check methods are explained below:
data null: query whether the measuring point has no data;
data missing points: query the acquisition period of the measuring point and apply the formula: number of missing points for the day = 24 (hours) × (60 / data acquisition period (minutes)) - total number of data records for the day;
data 0: a zero-drift threshold is set according to field conditions. If the data is 0, whether the 0 value is normal is judged from the state value of the data, which records the running state of the equipment corresponding to the data. When the equipment is running, whether a measuring point value of 0 is normal is judged according to the measurement type; when the equipment is not running, a value of 0 is normal;
data jump: if the change in the measuring point data exceeds 20% of the previous value, the data is regarded as jump data; whether jump data is correct must be judged from the state value in the data, which records the running state of the equipment corresponding to the data;
data glitch: if the change in the measuring point data exceeds 30% of the previous value, the data is regarded as glitch data; whether glitch data is correct must be judged from the state value in the data, which records the running state of the equipment corresponding to the data;
data repetition: query whether duplicate measuring points exist in the real-time database;
The data quality detection service in the processing layer is then entered through the data quality detection engine in the engine layer. The service is called, reads the data quality check rule base file in the relational database, performs data quality detection on the daily data of the relevant measuring points in the real-time database, and finally stores the data quality records of the relevant measuring points in the relational database so that a data quality audit report can be generated later.
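The check calculations listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: function and parameter names are hypothetical; the acquisition period is assumed to divide a day evenly; values within the zero-drift threshold are treated as 0; a change that exceeds both thresholds is reported as a glitch rather than a jump; and a previous value of 0 is left unclassified because the percentage change is undefined. The state-value judgement of whether a jump or glitch is legitimate is left to the caller.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence

JUMP_RATIO = 0.20    # change of more than 20% of the previous value
GLITCH_RATIO = 0.30  # change of more than 30% of the previous value

def missing_points(period_minutes: int, records_today: int) -> int:
    """24 h x (60 / period in minutes) minus the records received that day."""
    expected = 24 * 60 // period_minutes  # assumes the period divides a day evenly
    return expected - records_today

def is_null(values: Sequence[Optional[float]]) -> bool:
    """True when the measuring point has no data at all."""
    return all(v is None for v in values)

def zero_is_abnormal(value: float, running: bool, zero_ok_when_running: bool,
                     zero_drift: float = 0.0) -> bool:
    """Judge a 0 value from the equipment running state.

    zero_ok_when_running encodes the measurement-type judgement;
    treating |value| <= zero_drift as 0 is an assumed interpretation.
    """
    if abs(value) > zero_drift:
        return False  # not a 0 value, nothing to judge
    return running and not zero_ok_when_running

def classify_change(previous: float, current: float) -> Optional[str]:
    """Return 'glitch', 'jump' or None for one consecutive pair of values."""
    if previous == 0:
        return None  # percentage change undefined (assumption: leave unclassified)
    ratio = abs(current - previous) / abs(previous)
    if ratio > GLITCH_RATIO:
        return "glitch"
    if ratio > JUMP_RATIO:
        return "jump"
    return None

def duplicate_keys(keys: Sequence[Hashable]) -> List[Hashable]:
    """Return the keys (e.g. measuring point ids or timestamps) that occur more than once."""
    counts = Counter(keys)
    return [k for k, n in counts.items() if n > 1]
```

For example, with a 15-minute acquisition period 96 records are expected per day, so a day with 90 records has 6 missing points.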
The data quality auditing service is entered through the data quality auditing engine in the engine layer. The service is called, a data quality audit report in the relational database is retrieved according to the selected data source and time parameters, and the related data quality conditions are viewed. Specifically, an Excel file is generated from the related data quality conditions and data transmission states stored in the relational database by using JXL, or a PDF file is generated from the same data by using iText, and the generated Excel or PDF file serves as the data quality audit report; the data quality audit results include FTP interruption, data null, data 0, data missing points, data repetition, data jump and data glitch. A data source and time parameters are then selected, the data quality audit report in the relational database is retrieved, and the data quality conditions of the selected measuring points of the power grid measuring point model tree in the data source are viewed;
the display layer is then entered. The display layer displays the related data quality conditions and the audit results concerning data consistency, integrity, correctness and the like; the data quality audit results include FTP interruption, data null, data 0, data missing points, data repetition, data jump and data glitch.
The PMS-model-based real-time data checking system is modularized to obtain the functional module diagram shown in Fig. 5, which comprises a menu management component, a component management module, a page template management module, a data quality auditing module, a data quality checking module, a real-time data interaction module, a real-time data retrieval module, a data model association module and a data transmission detection module. Components are instantiated as follows, referring to Fig. 6: a component is selected from the component library, its parameters are configured, the data model tree structure is displayed, measuring points are selected from the data model tree structure, the parameter configuration is completed, a component instance is generated, and the quality check rules for the data related to the measuring points are configured.
Example:
Referring to Fig. 3, an FTP server is connected to an interface server, the interface server is connected to a mass platform data quality server, and the mass platform data quality server is connected to a mass platform data quality client. A real-time database and an Oracle relational database are installed on the mass platform data quality server; device ids, the accessed provincial PMS model and other configuration information are stored in the relational database. Because the mass data are acquired by parsing E files, a data transmission detection program must be started for data quality verification. The program monitors the communication between the interface server and the FTP server; if an FTP interruption occurs, an alarm is raised promptly and a record is written to the relational database for use in generating the data quality audit report. Field personnel should handle the alarm information immediately to keep FTP transmission running smoothly;
the mass platform data quality verification system is then logged into through the mass platform data quality client, and the check rule base configuration page is opened. The measuring points to be checked are selected in the measuring point model tree and check methods are configured, such as the data missing point detection method, data null detection method, data 0 detection method, data jump detection method, data glitch detection method and data duplicate screening method; a check rule base file is generated and inserted into the relational database. A data quality checking program then reads the data quality check rule base file (ctl file) in the relational database, extracts the data that match the check algorithms from the real-time database and starts the quality check; after the calculation is finished, indexes of the data quality check such as consistency, integrity and correctness are generated and inserted into the relational database for use by the data quality auditing service;
finally, the mass platform data quality verification system is logged into and the data quality audit page is opened. A data source and time parameters are selected, and the data quality auditing service retrieves the data quality audit report from the relational database so that the quality report of the selected measuring points of the model tree in the data source can be viewed and, optionally, printed.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.