CN104102652B

Info

Publication number: CN104102652B
Application number: CN201310118763.1A
Authority: CN
Inventors: 徐小天; 王刚; 陈威; 石磊; 陈乐然
Original assignee: North China Electric Power Research Institute Co Ltd; State Grid Corp of China SGCC
Current assignee: North China Electric Power Research Institute Co Ltd; State Grid Corp of China SGCC
Priority date: 2013-04-08
Filing date: 2013-04-08
Publication date: 2017-05-24
Anticipated expiration: 2033-04-08
Also published as: CN104102652A

Abstract

The present invention provides an unstructured data storage system and method. A source system data server stores characteristic data of an enterprise business system. An XML generator server generates XML files according to the record characteristics of the source system data server and extracts the unstructured data file bodies held in the source system data server, pairing each with its XML file to produce a correspondence between the XML file and the unstructured data file body. An XML parser server parses the XML file according to field matching rules to obtain the attribute and classification information corresponding to the XML file and, according to that information, stores the paired correspondence between the XML file and the unstructured data file body under the corresponding classification and assigns the corresponding attributes. An unstructured data server stores the paired correspondence between XML files and unstructured data file bodies. The invention makes it possible to import data from various types of source systems into an unstructured data storage system according to defined business rules.

Description

An unstructured data storage system and method

Technical field

The invention relates to enterprise informatization technology, and in particular to an unstructured data storage system and method.

Background

BPM (Business Process Management) is a comprehensive management approach for integrating the various business links of an enterprise; it typically relies on networked systems for information transfer, data synchronization, business monitoring, and the continuous upgrading and optimization of business processes. It is an important technology for raising the informatization level of modern enterprises. Formally defining the business with a unified process description specification makes it easier to integrate and reengineer an enterprise's information systems and to draw clear business boundaries between them. At the system implementation level, BPM often involves data interaction among multiple business subsystems: systems whose business data may depend on one another adopt different data storage and transmission specifications, which creates considerable obstacles to data interaction between systems. This is most common between legacy systems, and between legacy systems and newly developed systems. To solve this kind of problem, it is usually necessary to develop a dedicated data read/write system for the data interfaces between systems so that data can be exchanged normally.

Enterprises in the power industry have widely deployed systems such as ERP and power MIS. ERP (Enterprise Resource Planning) is an enterprise management software suite that integrates material resource management, human resource management, financial resource management, and information resource management, and is an important part of mainstream informatization solutions for modern enterprises. MIS (Management Information System) is an integrated human-machine system, led by people, that uses computer hardware and software, network communication equipment, and other office equipment to collect, transmit, process, store, update, and maintain information; its purpose is to strengthen the enterprise's strategic competitiveness and to improve benefit and efficiency, and it supports high-level decision-making, middle-level control, and grass-roots operation. ERP is typically used to manage enterprise finance, assets, and operations, while power MIS is used to manage production tasks such as work and operation tickets (the "two tickets"), equipment, and maintenance. These systems have formed relatively mature product lines in the domestic market, and most solutions store business data in structured form, that is, in multiple two-dimensional tables of a database.

Structured data is row data stored in a database that can be expressed logically with a two-dimensional table structure. Data that cannot be represented by two-dimensional logical database tables is called unstructured data; it mainly consists of computer files in various formats, including large text, pictures, audio, and video. For the unstructured data within business data there are two main storage methods. One is to treat the unstructured data itself as a binary string and store it directly as a field in a record of a database table. The other is to store in the database table a URL (Uniform Resource Locator) pointing to the storage path of the unstructured data, while the unstructured data itself is kept in a separate file system.

In power enterprises, the unstructured files in the above systems mainly include equipment design documents, contracts and explanatory documents, technical reports and test reports, on-site audio and video recordings, and so on. They are usually organized into the system's workflows as attachments. In general, these attachments cannot be searched directly or indexed by category or attribute; the relevant information can only be obtained indirectly by looking up the associated business process. To gain control of this production-related unstructured data, a power enterprise needs to build a data storage system dedicated to storing and managing unstructured data, which classifies and indexes the data along different attribute dimensions (for example year, equipment type, manufacturer, or importance) so that it can be searched and managed from different angles.

Against this background, the problem to be solved by those skilled in the art is how to automatically extract the unstructured data, together with its associated structured attributes, from the existing business process and production information management systems, and how to establish the correspondence between the processes and data records in the original systems and the unstructured documents in the unstructured data storage system.

In the prior art, no general technical specification has yet emerged for extracting structured data from business process systems. The mainstream approach at present is to develop a standalone data read/write module that builds a data channel between a single source system and the target system. This approach generally requires the following steps. First, determine the classification and attribute information that the target system needs for storing unstructured data, and compile the list of fields that the corresponding source system should provide. Next, inspect the database to determine where the unstructured data bodies are stored: if a body is stored directly as a large field, deserialize that field; otherwise read the body from its storage path. Then develop an adaptation tool for the specific source system, configure the source system's database parameters in the tool, and read from the source system database both the unstructured data and the corresponding characteristic data fields to be extracted. Finally, the adaptation tool calls the target system's interface, writes the characteristic data extracted from the source system into the target system database as the attribute/category information of the corresponding unstructured document according to the matching rules, and writes the unstructured data into the target system according to that attribute/category information.

The main disadvantages of the above solution are as follows. High development cost: a separate system adaptation tool must be developed for each source system so that the source system's characteristic data matches the attribute/category fields of the target system (the unstructured data storage system). High coupling: data extraction from the source system and data writing to the target system are both performed by the same adapter, with no reasonable separation of functions. Whether the data storage structure of the source system changes or the attributes and categories used by the target system are adjusted, the adaptation tool must be redeveloped. In particular, when there are multiple source systems, an adjustment to the target system forces the adaptation tools of all source systems to be redeveloped to fit the adjusted attributes associated with the unstructured data. Difficult error correction: because each adapter reads the source system's tables directly and produces no intermediate files during extraction, any error still has to be traced by reading the source system database, and the operation must be repeated from the data extraction step, so the cost of correction is high.

In summary, how to design a method that automatically extracts the production business characteristic data of power enterprises, so that data from various types of source systems can be imported into an unstructured data storage system according to defined business rules, is a technical problem that those skilled in the art urgently need to solve.

Summary of the invention

Embodiments of the present invention provide an unstructured data storage system and method for importing data from various types of source systems into an unstructured data storage system according to defined business rules.

In one aspect, an embodiment of the present invention provides an unstructured data storage system. The enterprise business characteristic data storage system comprises a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein:

the source system data server is configured to store characteristic data of the enterprise business system;

the XML generator server is coupled to the source system data server and is configured to generate XML files according to the record characteristics of the source system data server, and to extract the unstructured data file bodies from the source system data server so as to pair them with the XML files, generating a correspondence between each XML file and its unstructured data file body;

the XML parser server is coupled to the XML generator server and is configured to parse the XML file according to field matching rules to obtain the attribute and classification information corresponding to the XML file, and, according to that attribute and classification information, to store the paired correspondence between the XML file and the unstructured data file body under the corresponding classification and assign the corresponding attributes;

the unstructured data server is coupled to the XML parser server and is configured to store the paired correspondence between the XML file and the unstructured data file body.

Optionally, in an embodiment of the present invention, the XML generator server extracting the unstructured data file body from the source system data server includes: searching the source system data server to determine the storage location of the unstructured data file body; and extracting the body according to that storage location.

Optionally, in an embodiment of the present invention, the XML generator server extracting according to the storage location of the unstructured data file body further includes: if the unstructured data file body in the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to its storage path.
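The two extraction cases described above (body stored inline as a large field, or only a storage path stored in the table) can be sketched as follows. This is a minimal illustration, not the patented implementation: the column names `content_blob` and `content_path` are hypothetical, and "deserialization" is reduced to converting the database value back to bytes, whereas a real source system may need its own decoding step.

```python
from pathlib import Path
from typing import Any, Mapping


def extract_file_body(record: Mapping[str, Any],
                      blob_field: str = "content_blob",
                      path_field: str = "content_path") -> bytes:
    """Return the raw bytes of the unstructured file attached to one record.

    blob_field and path_field are hypothetical column names chosen for
    this sketch; they do not come from the patent.
    """
    blob = record.get(blob_field)
    if blob is not None:
        # Case 1: the body is stored inline as a large field.
        # Here "deserializing" only turns the stored value back into bytes.
        return bytes(blob)

    path = record.get(path_field)
    if not path:
        raise ValueError("record has neither an inline body nor a storage path")

    # Case 2: the table only stores where the file lives; read it from there.
    return Path(str(path)).read_bytes()
```

Keeping both cases behind one function means the rest of the generator does not need to know how a particular source system stores its attachments.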

Optionally, in an embodiment of the present invention, the unstructured data server stores the paired correspondence between the XML file and the unstructured data file body in the form of a file pair.

Optionally, in an embodiment of the present invention, in the XML file that the XML generator server generates according to the record characteristics of the source system data server, each data field of a single record is a node of the XML file; if a field of the record references a record in another table, the referenced record becomes a child node of the current field node.
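The node layout described above can be sketched with Python's standard `xml.etree.ElementTree`. This is only an illustration under stated assumptions: a record is modelled as a dictionary, a reference to a record in another table is modelled as a nested dictionary, the element name `record` is hypothetical, and field names are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET
from typing import Any, Mapping


def record_to_xml(record: Mapping[str, Any], root_tag: str = "record") -> ET.Element:
    """Turn one record into an XML element with one node per data field."""
    root = ET.Element(root_tag)
    for field, value in record.items():
        node = ET.SubElement(root, field)
        if isinstance(value, Mapping):
            # The field references a record in another table: the referenced
            # record becomes a child node of the current field node.
            node.append(record_to_xml(value, root_tag))
        elif value is not None:
            node.text = str(value)
    return root


# Hypothetical sample record; "equipment" references a row in another table.
sample = {
    "id": 42,
    "title": "Transformer test report",
    "equipment": {"id": 7, "type": "transformer", "manufacturer": "ACME"},
}
xml_text = ET.tostring(record_to_xml(sample), encoding="unicode")
```

With this layout the referenced equipment type is reachable at the path `equipment/record/type`, which is the kind of path a field matching rule could point at.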

In another aspect, an embodiment of the present invention provides an unstructured data storage method. The method is applied to an enterprise business characteristic data storage system comprising a source system data server, an XML generator server, an XML parser server, and an unstructured data server, the source system data server being configured to store characteristic data of the enterprise business system. The method includes:

generating, by the XML generator server, XML files according to the record characteristics of the source system data server, and extracting the unstructured data file bodies from the source system data server so as to pair them with the XML files, generating a correspondence between each XML file and its unstructured data file body;

parsing, by the XML parser server, the XML file according to field matching rules to obtain the attribute and classification information corresponding to the XML file;

according to the attribute and classification information corresponding to the XML file, storing the paired correspondence between the XML file and the unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes.

Optionally, in an embodiment of the present invention, extracting the unstructured data file body from the source system data server includes: searching the source system data server to determine the storage location of the unstructured data file body; and extracting the body according to that storage location.

Optionally, in an embodiment of the present invention, extracting according to the storage location of the unstructured data file body includes: if the unstructured data file body in the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to its storage path.

Optionally, in an embodiment of the present invention, storing the paired correspondence between the XML file and the unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes includes: storing that correspondence in the form of a file pair under the corresponding classification in the unstructured data server and assigning the corresponding attributes.

Optionally, in an embodiment of the present invention, generating XML files by the XML generator server according to the record characteristics of the source system data server includes: generating an XML file in which each data field of a single record is a node of the XML file and, if a field of the record references a record in another table, the referenced record is a child node of the current field node.

The above technical solution has the following beneficial effects. The enterprise business characteristic data storage system comprises a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein: the source system data server is configured to store characteristic data of the enterprise business system; the XML generator server, coupled to the source system data server, is configured to generate XML files according to the record characteristics of the source system data server and to extract the unstructured data file bodies from the source system data server so as to pair them with the XML files, generating a correspondence between each XML file and its unstructured data file body; the XML parser server, coupled to the XML generator server, is configured to parse the XML file according to field matching rules to obtain the corresponding attribute and classification information and, according to that information, to store the paired correspondence between the XML file and the unstructured data file body under the corresponding classification and assign the corresponding attributes; and the unstructured data server, coupled to the XML parser server, is configured to store the paired correspondence between the XML file and the unstructured data file body. Because these technical means are adopted, the following technical effects are achieved. Only one XML generator and one XML parser need to be developed to import data from all types of source systems into the target system. When the data table structure of either the source system or the target system changes, only the field matching rule configuration file used by the XML parser needs to be modified, which greatly reduces development effort. Data extraction from the source system and data import into the target system are separated into two independent steps that exchange data through standardized XML files, achieving a high degree of system decoupling. The extraction results are stored as XML files paired with the unstructured data, so if a data import error occurs it can easily be investigated and traced back using the retained intermediate results.

Description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the structure of an unstructured data storage system according to an embodiment of the present invention;

FIG. 2 is a flowchart of an unstructured data storage method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the system structure of an application example of the present invention;

FIG. 4 is a schematic flowchart of the operating mechanism of the system in FIG. 3 of the application example of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

FIG. 1 is a schematic diagram of the structure of an unstructured data storage system according to an embodiment of the present invention. The enterprise business characteristic data storage system comprises a source system data server 11, an XML generator server 12, an XML parser server 13, and an unstructured data server 14, wherein:

the source system data server 11 is configured to store characteristic data of the enterprise business system;

the XML generator server 12 is coupled to the source system data server 11 and is configured to generate XML files according to the record characteristics of the source system data server, and to extract the unstructured data file bodies from the source system data server so as to pair them with the XML files, generating a correspondence between each XML file and its unstructured data file body;

the XML parser server 13 is coupled to the XML generator server 12 and is configured to parse the XML file according to field matching rules to obtain the attribute and classification information corresponding to the XML file, and, according to that attribute and classification information, to store the paired correspondence between the XML file and the unstructured data file body under the corresponding classification and assign the corresponding attributes;

the unstructured data server 14 is coupled to the XML parser server 13 and is configured to store the paired correspondence between the XML file and the unstructured data file body.
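As an illustration of what parsing "according to field matching rules" might look like, the sketch below maps XML node paths to target classification and attribute names. The rule format, the rule names, and the XML layout are all assumptions made for this example; the patent does not specify the structure of the matching rule configuration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Mapping, Tuple

# Hypothetical field matching rules: target name -> path of the XML node
# that supplies its value. In practice these could be loaded from a
# configuration file so that schema changes need no code changes.
MATCHING_RULES = {
    "category": {"equipment_type": "equipment/record/type"},
    "attributes": {
        "year": "year",
        "manufacturer": "equipment/record/manufacturer",
    },
}


def apply_matching_rules(
    xml_text: str,
    rules: Mapping[str, Mapping[str, str]] = MATCHING_RULES,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (classification, attributes) extracted from one XML file."""
    root = ET.fromstring(xml_text)

    def pick(mapping: Mapping[str, str]) -> Dict[str, str]:
        picked = {}
        for target, path in mapping.items():
            value = root.findtext(path)  # None when the node is absent
            if value is not None:
                picked[target] = value.strip()
        return picked

    return pick(rules["category"]), pick(rules["attributes"])
```

Because the mapping lives in data rather than code, a renamed source column or a new target attribute is handled by editing the rules, which matches the decoupling benefit the description claims.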

Optionally, the XML generator server 12 extracting the unstructured data file body from the source system data server 11 includes: searching the source system data server to determine the storage location of the unstructured data file body; and extracting the body according to that storage location.

Optionally, the XML generator server 12 extracting according to the storage location of the unstructured data file body further includes: if the unstructured data file body in the source system data server is stored directly in a data table as a large field, deserializing the large field; otherwise, reading the corresponding unstructured data file body according to its storage path.

Optionally, the unstructured data server 14 stores the paired correspondence between the XML file and the unstructured data file body in the form of a file pair.
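One possible reading of "file pair" storage is sketched below: the XML file and the file body are written side by side under a directory named after the classification, sharing the same file stem. The directory layout, the function name, and its parameters are assumptions for illustration only, and assigning attributes in the target system is left out.

```python
from pathlib import Path
from typing import Tuple


def store_file_pair(store_root: Path, category: str, pair_id: str,
                    xml_text: str, body: bytes,
                    body_suffix: str = ".bin") -> Tuple[Path, Path]:
    """Write an XML file and its unstructured file body as a pair.

    Both files share the stem pair_id, so either one can be located from
    the other. This layout is a hypothetical choice, not the patent's.
    """
    target_dir = store_root / category
    target_dir.mkdir(parents=True, exist_ok=True)

    xml_path = target_dir / f"{pair_id}.xml"
    body_path = target_dir / f"{pair_id}{body_suffix}"

    xml_path.write_text(xml_text, encoding="utf-8")
    body_path.write_bytes(body)
    return xml_path, body_path
```

Keeping the XML next to the body also preserves the intermediate result of extraction, which is what makes later troubleshooting possible without going back to the source database.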

Optionally, in the XML file that the XML generator server 12 generates according to the record characteristics of the source system data server 11, each data field of a single record is a node of the XML file; if a field of the record references a record in another table, the referenced record becomes a child node of the current field node.

Corresponding to the above method embodiment, FIG. 2 is a flowchart of an unstructured data storage method according to an embodiment of the present invention. The method is applied to an enterprise business characteristic data storage system comprising a source system data server, an XML generator server, an XML parser server, and an unstructured data server, the source system data server being configured to store characteristic data of the enterprise business system. The method includes:

201. Generating, by the XML generator server, XML files according to the record characteristics of the source system data server, and extracting the unstructured data file bodies from the source system data server so as to pair them with the XML files, generating a correspondence between each XML file and its unstructured data file body;

202. Parsing, by the XML parser server, the XML file according to field matching rules to obtain the attribute and classification information corresponding to the XML file;

203. According to the attribute and classification information corresponding to the XML file, storing the paired correspondence between the XML file and the unstructured data file body under the corresponding classification in the unstructured data server and assigning the corresponding attributes.
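Steps 201 to 203 can be strung together as in the following in-memory sketch. It is a simplified illustration under assumptions: records are flat dictionaries, the matching rules use a hypothetical format, and the "unstructured data server" is stood in for by a plain dictionary keyed by classification.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Mapping, Tuple

# Hypothetical field matching rules: which XML node gives the classification
# and which nodes become attributes in the target store.
RULES = {
    "category": "equipment_type",
    "attributes": {"year": "year", "maker": "manufacturer"},
}

Store = Dict[str, List[Dict[str, Any]]]


def step_201_generate(record: Mapping[str, Any], body: bytes) -> Tuple[str, bytes]:
    """Generate an XML file from one record and pair it with the file body."""
    root = ET.Element("record")
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode"), body


def step_202_parse(xml_text: str,
                   rules: Mapping[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Parse the XML by the matching rules into (classification, attributes)."""
    root = ET.fromstring(xml_text)
    category = root.findtext(rules["category"], default="uncategorized")
    attributes = {name: root.findtext(path, default="")
                  for name, path in rules["attributes"].items()}
    return category, attributes


def step_203_store(store: Store, category: str, attributes: Dict[str, str],
                   pair: Tuple[str, bytes]) -> None:
    """File the pair under its classification together with its attributes."""
    store.setdefault(category, []).append({"attributes": attributes, "pair": pair})


def import_record(store: Store, record: Mapping[str, Any], body: bytes,
                  rules: Mapping[str, Any] = RULES) -> None:
    pair = step_201_generate(record, body)
    category, attributes = step_202_parse(pair[0], rules)
    step_203_store(store, category, attributes, pair)
```

Note that step 202 only sees the XML text produced by step 201, never the source record itself, so the two halves can be developed and rerun independently.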

可选的，所述将所述源系统数据服务器中的非结构化数据文件本体进行提取，包括：检索所述源系统数据服务器，确定非结构化数据文件本体的存放位置；根据所述非结构化数据文件本体的存放位置进行提取。Optionally, the extracting the unstructured data file body in the source system data server includes: retrieving the source system data server to determine the storage location of the unstructured data file body; Extract from the storage location of the chemical data file ontology.

可选的，所述根据所述非结构化数据文件本体的存放位置进行提取，包括：如果所述源系统数据服务器的非结构化数据文件本体直接以大字段方式在数据表中存储，则对所述大字段进行反序列化，否则根据非结构化数据文件本体的存储路径读取对应的非结构化数据文件本体。Optionally, the extracting according to the storage location of the unstructured data file body includes: if the unstructured data file body of the source system data server is directly stored in the data table in the form of a large field, then The large field is deserialized, otherwise, the corresponding unstructured data file body is read according to the storage path of the unstructured data file body.

可选的，所述将配对后的所述XML文件与非结构化数据文件本体的对应关系存储到所述非结构化数据服务器中的相应分类并赋予对应属性，包括：以文件偶的形式，将配对后的所述XML文件与非结构化数据文件本体的对应关系存储到所述非结构化数据服务器中的相应分类并赋予对应属性。Optionally, storing the corresponding relationship between the paired XML file and the ontology of the unstructured data file in the corresponding classification in the unstructured data server and assigning corresponding attributes includes: in the form of a file pair, The corresponding relationship between the paired XML file and the ontology of the unstructured data file is stored in a corresponding classification in the unstructured data server and assigned a corresponding attribute.

Optionally, generating, by the XML generator server, an XML file according to the record characteristics of the source system data server includes: generating the XML file such that each data field of a single record serves as one node of the XML file, and, if a field of the record references a record in another table, the referenced record serves as a child node of the current field node.
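The node layout described above can be sketched with Python's standard `xml.etree.ElementTree` module. The sketch assumes the record has already been read into a dict and that a referenced record from another table is represented as a nested dict; the field names and the `record` root tag are illustrative.

```python
import xml.etree.ElementTree as ET


def _fill(parent, record):
    """Add one child node per field; nested dicts become child nodes."""
    for field, value in record.items():
        node = ET.SubElement(parent, field)
        if isinstance(value, dict):
            # The field references a record in another table: its fields
            # are attached as children of the current field node.
            _fill(node, value)
        else:
            node.text = str(value)


def record_to_xml(record, root_tag="record"):
    """Build the XML element for a single source record."""
    root = ET.Element(root_tag)
    _fill(root, record)
    return root


# 'author' stands for a field that references a row in another table.
record = {"id": 7, "title": "Relay test", "author": {"id": 3, "name": "Wang"}}
xml_text = ET.tostring(record_to_xml(record), encoding="unicode")
```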

The above technical solution of the embodiments of the present invention has the following beneficial effects. The enterprise business feature data storage system includes a source system data server, an XML generator server, an XML parser server, and an unstructured data server, wherein: the source system data server is configured to store the feature data of the enterprise business system; the XML generator server is coupled to the source system data server and is configured to generate an XML file according to the record characteristics of the source system data server, and to extract the unstructured data file body from the source system data server and pair it with the XML file, thereby generating the correspondence between the XML file and the unstructured data file body; the XML parser server is coupled to the XML generator server and is configured to parse the XML file according to field matching rules to obtain the attribute and classification information corresponding to the XML file, and, according to that information, to store the correspondence between the paired XML file and the unstructured data file body under the corresponding classification and assign the corresponding attributes; and the unstructured data server is coupled to the XML parser server and is configured to store the correspondence between the paired XML file and the unstructured data file body. Because these technical means are adopted, the following technical effects are achieved: only one XML generator and one XML parser need to be developed to support data import from all types of source systems into the target system; when the data table structure of either the source system or the target system changes, only the field matching rule configuration file used by the XML parser needs to be modified, which greatly reduces the development workload; data extraction from the source system and data import into the target system are separated into two independent steps that exchange data through standardized XML files, which achieves a high degree of system decoupling; and the results of data extraction are stored as XML paired with the corresponding unstructured data, so that if a data import error occurs, it can easily be investigated and traced back using the retained intermediate results.

An application example is described in detail below:

To address the deficiencies of the existing technical solutions, the application example of the present invention performs data extraction from each source system (source system data server) and data writing to the target system (unstructured data server) as two independent steps. In this application example, a single data extraction module is provided for all source systems (it is located in the XML generator server and is hereinafter referred to as the XML generator). This module reads all feature data of a single record in the source database in one pass and, according to established rules, generates an XML document (a unique one for each record). XML (Extensible Markup Language) is a markup language used to give electronic documents structure; it can be used to mark up data and define data types, and it is a meta-language that allows users to define their own markup languages. A single XML parser is also provided (it is located in the XML parser server and is hereinafter referred to as the XML parser); it parses the XML documents generated for each source system and writes the parsing results into the target system database. Figure 3 is a schematic diagram of the system structure of this application example.

Figure 4 is a schematic flow diagram of the operating mechanism of the system of Figure 3 in this application example, including:

401. Start;

402. Read a record from the source database;

403. Identify all structured field information in the target system database that relates to the single record, and generate an XML file of the feature fields related to the source record, wherein each data field of the single record serves as one node of the XML file, and, if a field of the record references a record in another table, the referenced record serves as a child node of the current field node;

404. Determine whether the unstructured data file body is stored in the table. If yes, go to 405; otherwise, go to 406;

405. If the unstructured data file body of the source system is stored directly in the data table as a large field, deserialize the file body field;

406. If the unstructured data file body is not stored in the table, read the file storage path;

407. Read the unstructured data file body according to the path;

408. Pair the XML file with the extracted unstructured data file body, and use the pair as the input of the target system data import module (that is, the XML parser in Figure 3);

409. The data import module of the target system analyzes the input XML file, extracts the required feature data fields according to the field matching rule configuration file, uses them as the attribute and classification information of the unstructured document, and accordingly stores the corresponding unstructured data under the corresponding classification and writes the specific attributes;

410. Write the unstructured data to the target system; the XML file and the unstructured data file body are stored in the form of a file pair, so that if a data import error occurs, it can easily be investigated and traced back using the retained intermediate results;

411. End.
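The flow of steps 402 to 410 can be sketched end to end in Python. Everything concrete here is an assumption made for illustration: the source is an in-memory SQLite table named `docs`, the rule format in `RULES` is hypothetical, and the target system is replaced by a plain dict keyed by classification.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source table: the body is stored either in-table (BLOB)
# or on disk (path). Only the in-table case is populated in this sketch.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE docs (id INTEGER, title TEXT, doc_type TEXT, "
    "body BLOB, body_path TEXT)"
)
conn.execute(
    "INSERT INTO docs VALUES (1, 'Relay test', 'test_report', ?, NULL)",
    (b"raw bytes",),
)

RULES = {"doc_type": "category", "title": "attribute"}  # assumed rule format


def generate(row):
    """Steps 402-408: build the XML and pair it with the file body."""
    root = ET.Element("record")
    for key in row.keys():
        if key not in ("body", "body_path"):
            ET.SubElement(root, key).text = str(row[key])
    if row["body"] is not None:  # steps 404-405: body stored in the table
        body = bytes(row["body"])
    else:  # steps 406-407: body stored on disk
        with open(row["body_path"], "rb") as fh:
            body = fh.read()
    return ET.tostring(root, encoding="unicode"), body


def import_pair(xml_text, body, store):
    """Steps 409-410: apply the rules, then store the pair by category."""
    root = ET.fromstring(xml_text)
    category, attrs = None, {}
    for tag, role in RULES.items():
        value = root.findtext(tag)
        if value is None:
            continue
        if role == "category":
            category = value
        else:
            attrs[tag] = value
    store.setdefault(category, []).append(
        {"xml": xml_text, "body": body, "attributes": attrs}
    )


store = {}
for row in conn.execute("SELECT * FROM docs"):
    import_pair(*generate(row), store)
```

The only coupling between `generate` and `import_pair` is the XML text and the file bytes, which mirrors the separation of extraction and import into two independent steps.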

Compared with current mainstream technical solutions, the application example of the present invention improves on the following aspects. Only one XML generator and one XML parser need to be developed to support data import from all types of source systems into the target system. (It should be noted that the XML generator and the XML parser may be physically deployed on two separate servers or on the same server. In addition, an independent XML generator may be designed and developed for each source system to perform data extraction separately, which likewise achieves the file extraction purpose of this application example.) When the data table structure of either the source system or the target system changes, only the field matching rule configuration file used by the XML parser needs to be modified, which greatly reduces the development workload. Data extraction from the source system and data import into the target system are separated into two independent steps that exchange data through standardized XML files, which achieves a high degree of system decoupling. The results of data extraction are stored as file pairs, each consisting of an XML file and an unstructured data file body, so that if a data import error occurs, it can easily be investigated and traced back using the retained intermediate results.

Those skilled in the art will also appreciate that the various illustrative logical blocks, units, and steps listed in the embodiments of the present invention may be implemented by electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the various illustrative components, units, and steps above have been described generally in terms of their functions. Whether such functions are implemented in hardware or software depends on the specific application and the design requirements of the overall system. Those skilled in the art may implement the described functions in various ways for each specific application, but such implementations should not be understood as going beyond the protection scope of the embodiments of the present invention.

The various illustrative logical blocks, units, or servers described in the embodiments of the present invention may be implemented or operated by a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the described functions. The general-purpose processor may be a microprocessor; alternatively, it may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, such as a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.

The steps of the method or algorithm described in the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, the storage medium may be coupled to the processor so that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integrated into the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside in different components of the user terminal.

In one or more exemplary designs, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include computer storage media and communication media that facilitate the transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. For example, such computer-readable media may include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures and that can be read by a general-purpose or special-purpose computer or processor. In addition, any connection may properly be termed a computer-readable medium; for example, if the software is transmitted from a website, server, or other remote source over a coaxial cable, fiber optic cable, twisted pair, or digital subscriber line (DSL), or by wireless technologies such as infrared, radio, and microwave, then those transmission media are also included in the definition of computer-readable medium. Disk and disc, as used herein, include compact discs, laser discs, optical discs, DVDs, floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above may also be included within computer-readable media.

The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.