CN110851136A - Data acquisition method, device, electronic device and storage medium - Google Patents

Data acquisition method, device, electronic device and storage medium

Info

Publication number
CN110851136A
Authority
CN
China
Prior art keywords
data
read
file
key
markup language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910881318.8A
Other languages
Chinese (zh)
Inventor
唐志辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910881318.8A
Priority to PCT/CN2019/118979 (WO2021051624A1)
Publication of CN110851136A
Legal status: Pending

Abstract

A data acquisition method comprising: receiving a data acquisition request carrying a keyword; acquiring, according to the keyword, an extensible markup language (XML) file corresponding to the keyword from a cache, and acquiring a file to be read; parsing the XML file with a hypertext markup language (HTML) parser to obtain configuration information for each type of data in the XML file; obtaining a preset key from the configuration information, and obtaining a key-value pair according to the configuration information and the preset key; reading the key-value pair into the cache; and reading the key-value pair from the cache, determining a data read format according to the key-value pair, and reading target data from the file to be read according to the data read format. The invention also provides a data acquisition device, an electronic device and a storage medium. The invention can improve file reading efficiency while achieving high utilization of system resources.

Description

Data acquisition method, device, electronic device and storage medium

Technical Field

The present invention relates to the technical field of intelligent terminals, and in particular to a data acquisition method, device, electronic device and storage medium.

Background

At present, office work on a computer often involves processing various documents (such as contracts and resumes) to extract required data from them. The usual approach is to use hard-coded Java as the text parsing tool: the document is parsed, and the data obtained, by hard-coded Java logic. Specifically, the code file produced by hard-coding in Java is stored under a local path; when a document needs to be parsed, the code file is loaded from that path and used to parse the document, thereby obtaining the data.

Although this approach can parse a document and obtain data, the code file usually occupies a large amount of memory and has to be stored locally, and loading it from local storage to parse a document takes a long time. As a result, document reading is inefficient and system resource utilization is low.

Summary of the Invention

In view of the above, it is necessary to provide a data acquisition method, device, electronic device and storage medium that improve overall file reading efficiency while achieving high system resource utilization.

A first aspect of the present invention provides a data acquisition method, the method comprising:

receiving a data acquisition request carrying a keyword;

acquiring, according to the keyword, an extensible markup language (XML) file corresponding to the keyword from a cache, and acquiring a file to be read;

parsing the XML file with a hypertext markup language (HTML) parser to obtain configuration information for each type of data in the XML file;

obtaining a preset key from the configuration information, and obtaining a key-value pair according to the configuration information and the preset key;

reading the key-value pair into the cache;

reading the key-value pair from the cache, determining a data read format according to the key-value pair, and reading target data from the file to be read according to the data read format.

In a possible implementation, parsing the extensible markup language (XML) file with the hypertext markup language (HTML) parser to obtain the configuration information for each type of data in the XML file comprises:

for each type of data, determining, according to the type of the data, a target tag related to the data in the XML file;

reading the target tag through a selector of the HTML parser to obtain the configuration information of the data; or reading the target tag through a document object model access method of the HTML parser to obtain the configuration information of the data.

In a possible implementation, obtaining the key-value pair according to the configuration information and the preset key comprises:

saving the configuration information to a target object;

forming the key-value pair from the preset key and the target object.

In a possible implementation, determining the data read format according to the key-value pair and reading the target data from the file to be read according to the data read format comprises:

determining a regular expression from the key-value pair;

using the regular expression to obtain, from the data stored in the file to be read, target data matching the regular expression.

In a possible implementation, there are multiple regular expressions, and using the regular expression to obtain, from the data stored in the file to be read, target data matching the regular expression comprises:

determining in turn, following a preset order of the multiple regular expressions, whether any of the data stored in the file to be read matches each regular expression;

if the data stored in the file to be read contains target data matching the regular expression, obtaining the target data matching the regular expression.
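The ordered matching described above can be sketched as follows. This is a minimal illustration in Python rather than the Java setting the document describes; the function name `match_in_order`, the sample text and the three patterns are assumptions made for the example, and list order stands in for the "preset order".

```python
import re

def match_in_order(patterns, text):
    """Try each pattern in its preset order and collect the matches found."""
    results = []
    for pattern in patterns:  # list order stands in for the preset order
        found = re.findall(pattern, text)
        if found:  # keep only patterns that matched something
            results.append((pattern, found))
    return results

# Hypothetical file content and rule set, for illustration only.
text = "Name: Zhang San\nBorn: 1990-05-17\nPhone: 13800000000"
patterns = [r"\d{4}-\d{2}-\d{2}", r"\b1\d{10}\b", r"\b\d{18}\b"]
matches = match_in_order(patterns, text)
```

Patterns with no match (the 18-digit pattern here) are simply skipped, which mirrors the "if target data exists, obtain it" condition.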

In a possible implementation, after acquiring, according to the keyword, the extensible markup language (XML) file corresponding to the keyword from the cache and acquiring the file to be read, the method further comprises:

parsing the file to be read with a text parsing tool to obtain an input stream;

saving the input stream to the cache;

and reading the target data from the file to be read according to the data read format comprises:

reading, according to the data read format, the target data in the input stream from the cache.
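One way to picture this variant is to cache the parsed content once and run the read format against the cached copy. The sketch below is an assumption-laden Python illustration, not the document's implementation: `stream_cache`, `cache_input_stream`, `read_target` and the file identifier are hypothetical names, an in-memory `BytesIO` stands in for the output of the text parsing tool, and a regular expression stands in for the data read format.

```python
import io
import re

stream_cache = {}  # hypothetical in-process cache: file id -> raw bytes

def cache_input_stream(file_id, stream):
    """Drain the input stream once and keep its content in the cache."""
    stream_cache[file_id] = stream.read()

def read_target(file_id, pattern, encoding="utf-8"):
    """Read target data from the cached content instead of the original file."""
    text = stream_cache[file_id].decode(encoding)
    return re.findall(pattern, text)

# BytesIO stands in for the stream produced by a text parsing tool.
parsed = io.BytesIO("Name: Li Si\nBorn: 1988-11-02\n".encode("utf-8"))
cache_input_stream("resume-001", parsed)
born = read_target("resume-001", r"\d{4}-\d{2}-\d{2}")
```

Storing the bytes rather than the stream object lets the cached content be read any number of times.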

In a possible implementation, the method further comprises:

obtaining a parameter type and a parameter name from the key-value pair;

saving the target data in the data storage format given by the parameter type and the parameter name.
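As a rough illustration of saving by parameter type and parameter name, the Python sketch below converts the matched text according to a type label and stores it under the configured name. The type labels (`string`, `int`, `date`), the field names `param_type` and `param_name`, and the `save_target` helper are all assumptions for the example; the document does not specify them.

```python
from datetime import date

# Hypothetical mapping from a configured parameter type to a converter.
CONVERTERS = {"string": str, "int": int, "date": date.fromisoformat}

def save_target(record, config, raw_value):
    """Convert raw_value per the parameter type and store it under the parameter name."""
    convert = CONVERTERS[config["param_type"]]
    record[config["param_name"]] = convert(raw_value)
    return record

record = {}
save_target(record, {"param_type": "date", "param_name": "birthDate"}, "1990-05-17")
save_target(record, {"param_type": "int", "param_name": "income"}, "12000")
```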

A second aspect of the present invention provides a data acquisition device, the device comprising:

a receiving module, configured to receive a data acquisition request carrying a keyword;

a first acquisition module, configured to acquire, according to the keyword, an extensible markup language (XML) file corresponding to the keyword from a cache, and to acquire a file to be read;

a parsing module, configured to parse the XML file with a hypertext markup language (HTML) parser to obtain configuration information for each type of data in the XML file;

a second acquisition module, configured to obtain a preset key from the configuration information, and to obtain a key-value pair according to the configuration information and the preset key;

a first reading module, configured to read the key-value pair into the cache;

a second reading module, configured to read the key-value pair from the cache, determine a data read format according to the key-value pair, and read target data from the file to be read according to the data read format.

A third aspect of the present invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the data acquisition method when executing a computer program stored in the memory.

A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data acquisition method.

With the above technical solutions, the present invention can receive a data acquisition request carrying a keyword and a file to be read; acquire, according to the keyword, the extensible markup language (XML) file corresponding to the keyword from the cache; parse the XML file with a hypertext markup language (HTML) parser to obtain configuration information for each type of data in the XML file; obtain a preset key from the configuration information, and obtain a key-value pair according to the configuration information and the preset key; read the key-value pair into the cache; and determine a data read format according to the key-value pair and read target data from the file to be read according to that format. In other words, the XML file can be stored in the cache in advance. When target data in a file to be read is needed, the XML file is loaded from the cache and parsed by the HTML parser to obtain the configuration information and the key-value pairs; the key-value pairs are read into the cache, and the target data in the file to be read is read according to them. Throughout the process, both the XML file and the key-value pairs are read from the cache, which makes full use of the cache as a system resource. In addition, because the XML file occupies little memory, it can be read from the cache quickly. Overall file reading efficiency is therefore improved, and system resource utilization is high.

Brief Description of the Drawings

To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a preferred embodiment of a data acquisition method disclosed in the present invention.

FIG. 2 is a functional block diagram of a preferred embodiment of a data acquisition device disclosed in the present invention.

FIG. 3 is a schematic structural diagram of an electronic device of a preferred embodiment implementing the data acquisition method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field of the present invention. The terms used in this specification are for the purpose of describing specific embodiments only and are not intended to limit the present invention.

The data acquisition method of the embodiments of the present invention is applied in an electronic device. It can also be applied in a hardware environment formed by an electronic device and a server connected to the electronic device through a network, in which case it is executed jointly by the server and the electronic device. The network includes, but is not limited to, a wide area network, a metropolitan area network or a local area network.

The electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The electronic device may also include a network device and/or user equipment. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing. The user equipment includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device or the like, for example a personal computer, a tablet computer, a smartphone or a personal digital assistant (PDA).

FIG. 1 is a flowchart of a preferred embodiment of a data acquisition method disclosed in the present invention. Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.

S11. The electronic device receives a data acquisition request carrying a keyword.

In this embodiment of the present invention, different types of data stored in the file to be read need to be acquired, and the rules for acquiring the data are stored in extensible markup language (XML) files. Because there are multiple XML files, the data acquisition request must carry a keyword so that the applicable XML file can be determined.

S12. According to the keyword, the electronic device acquires the XML file corresponding to the keyword from the cache, and acquires the file to be read.

In this embodiment, keywords and XML files are in one-to-one correspondence, so the corresponding XML file can be determined from the keyword and fetched from the cache.

The file to be read may be a file stored on the server in advance or a temporarily uploaded file; it can be determined and acquired through the keyword.

The file to be read may be a contract, a customer information file, a resume or another file. Different files to be read store different data. Each type of file to be read has a corresponding XML file, and each XML file contains the data acquisition rules, that is, the data configuration information, for its type of file. The file to be read may be in Word or PDF format.

Extensible Markup Language (XML) is a markup language used to transmit and store data. It has a unified standard syntax, and XML documents are supported by almost all systems and products; this uniform format and syntax allow XML to be used across platforms. Here, markup refers to information symbols that a computer can understand; through such markup, computers can exchange and process documents containing all kinds of information.

The various XML files can be stored in the cache in advance, which makes full use of system resources and improves system resource utilization.

In addition, fetching a file from the cache is faster than fetching it from local storage, and an XML file occupies little memory. The XML file corresponding to the keyword can therefore be fetched from the cache quickly, improving file reading efficiency.
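Steps S11 and S12 amount to a keyed lookup, which can be sketched as below. This is a hedged Python illustration only: the dictionary `xml_cache`, the keywords `customer_info` and `contract`, the XML snippets and the request shape are all invented for the example, and a plain dictionary stands in for whatever cache the system actually uses.

```python
# Hypothetical pre-loaded cache: one XML rule file per keyword (one-to-one).
xml_cache = {
    "customer_info": "<rules><rule type='birthDate'/></rules>",
    "contract": "<rules><rule type='amount'/></rules>",
}

def get_rules_xml(keyword):
    """Return the cached XML rule file for the keyword carried by the request."""
    try:
        return xml_cache[keyword]
    except KeyError:
        raise LookupError(f"no XML rule file cached for keyword {keyword!r}")

request = {"keyword": "contract"}  # assumed shape of a data acquisition request
rules_xml = get_rules_xml(request["keyword"])
```

Raising on an unknown keyword is a design choice of the sketch; the document does not say how a missing rule file is handled.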

S13. The electronic device parses the extensible markup language (XML) file with a hypertext markup language (HTML) parser to obtain configuration information for each type of data in the XML file.

The XML file stores configuration information for multiple types of data.

In this embodiment of the present invention, the file to be read stores multiple types of data. For example, the data stored in a customer information file includes, but is not limited to, date of birth, ID number, name, bank account, education information, occupation, income and real estate information. Because different functional modules of the system may need different types of data, the XML file is parsed to obtain the acquisition rules, that is, the configuration information, for each type of data. The HTML parser here is jsoup, a Java HTML (HyperText Markup Language) parser that can directly parse a URL (Uniform Resource Locator) address or HTML text content. It provides a convenient API (Application Programming Interface) for extracting and manipulating data through DOM (Document Object Model) traversal, CSS (Cascading Style Sheets) selectors and jQuery-like methods.

Because jsoup is used to parse the XML file, the XML file does not need to retain unused tags. This reduces the amount of code to be written, makes the code more flexible and improves development efficiency.

Each type of data in the XML file has its own configuration information, which may include, but is not limited to, the data type, a regular expression, a preset key, a parameter type and a parameter name. Through the configuration information, the different types of data stored in the file to be read can be acquired.

Specifically, parsing the XML file with the HTML parser to obtain the configuration information for each type of data in the XML file comprises:

for each type of data, determining, according to the type of the data, a target tag related to the data in the XML file;

reading the target tag through a selector of the HTML parser to obtain the configuration information of the data; or reading the target tag through a document object model access method of the HTML parser to obtain the configuration information of the data.

In this optional implementation, because there are multiple types of data and each type has its own configuration information, the tag related to the configuration information of each type of data is located in the XML file according to the data type and taken as the target tag. The target tag can then be read either through the selector (Selector) of the jsoup parser or through jsoup's document object model access methods, to obtain the configuration information of the data.

Specifically, reading the target tag through the jsoup selector means that jsoup supports a CSS-like or jQuery-like selector syntax for finding matching tags: the target tag is found with a selector, and the returned list of elements is the configuration information of the data.

Specifically, reading the target tag through the document object model access method means that jsoup can access the DOM: matching tag objects can be obtained by tag name, id, class name and so on, which yields the configuration information of the data.

The target tag is the tag that holds the configuration information of the data. The tag is an XML tag; XML tags are user-defined, and each XML tag has a corresponding closing tag, between which content can be stored. An XML tag, its closing tag and the enclosed content together form an element.
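To make the target-tag lookup concrete, here is a minimal sketch. It deliberately uses Python's standard `xml.etree.ElementTree` as a stand-in for jsoup, so the path expression below plays the role of the selector; it is not jsoup's API. The XML layout (`rules`, `rule`, `regex`, and the attributes `type`, `key`, `paramType`, `paramName`) is an assumed schema, since the document does not show one.

```python
import xml.etree.ElementTree as ET

# Assumed rule-file layout; the patent does not specify tag or attribute names.
RULES_XML = r"""
<rules>
  <rule type="birthDate" key="birth_date" paramType="date" paramName="birthDate">
    <regex>\d{4}-\d{2}-\d{2}</regex>
  </rule>
  <rule type="phone" key="phone" paramType="string" paramName="phone">
    <regex>\b1\d{10}\b</regex>
  </rule>
</rules>
"""

def read_config(xml_text, data_type):
    """Locate the target tag for one data type and return its configuration."""
    root = ET.fromstring(xml_text.strip())
    tag = root.find(f"./rule[@type='{data_type}']")  # selector-style lookup
    if tag is None:
        return None
    config = dict(tag.attrib)                # attributes: type, key, paramType, paramName
    config["regex"] = tag.findtext("regex")  # element content between open and close tags
    return config

config = read_config(RULES_XML, "birthDate")
```

Only the tag for the requested data type is read, so tags for other data types in the same file are ignored.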

S14. The electronic device obtains a preset key from the configuration information, and obtains a key-value pair according to the configuration information and the preset key.

In this embodiment, the configuration information of each type of data needs further processing so that it can later be used to acquire the data.

There are multiple types of data, and each type has its own configuration information; that is, each type of data has a corresponding key-value pair.

Specifically, obtaining the key-value pair according to the configuration information and the preset key comprises:

saving the configuration information to a target object;

forming the key-value pair from the preset key and the target object.

A key is a key contained in a Map object. A Map is an object that maps keys to values: each key is mapped to one value, forming a key-value pair. Given a key and a value, the value is stored in the Map object and can later be retrieved through the key.

The preset key is a key set in advance; keys corresponding to the different types of data can be set in advance.

The cache is a buffer for data exchange. When a piece of hardware needs to read data, it first looks for the data in the cache; if the data is found, it is used directly, and if not, it is fetched from memory. Because the cache is much faster than memory, it serves to provide fast access to frequently used data.

The configuration information of the data includes the preset key.

In this optional implementation, the preset key of each type of data can be obtained from that type's configuration information. Before the configuration information of each type of data is read into the cache, it is first saved to an array object or another object, so that it can form a key-value pair with the preset key.
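The pairing of preset key and target object can be sketched as follows. This is an illustrative Python version under stated assumptions: a dictionary plays the role of the Java Map, another dictionary plays the role of the target object, and the field names (`key`, `type`, `regex`, `paramType`, `paramName`) are hypothetical.

```python
def to_key_value(config):
    """Split parsed configuration into (preset key, target object)."""
    # The target object keeps every configuration field except the key itself.
    target_object = {name: value for name, value in config.items() if name != "key"}
    return config["key"], target_object

# Hypothetical configuration for one data type, as it might come out of the XML file.
parsed = {
    "key": "birth_date",
    "type": "birthDate",
    "regex": r"\d{4}-\d{2}-\d{2}",
    "paramType": "date",
    "paramName": "birthDate",
}
preset_key, target_object = to_key_value(parsed)
pairs = {preset_key: target_object}  # one key-value pair per data type
```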

S15. The electronic device reads the key-value pair into the cache.

Once the key-value pair has been formed and read into the cache, the configuration information for the corresponding type of data can be found quickly through the preset key. This also makes full use of system resources, so system resource utilization is high.
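A small sketch of the cache step, assuming an in-process store: the class `ConfigCache` and its `put`/`get` methods are hypothetical names written in Python for illustration, and the document does not say which cache technology is used.

```python
class ConfigCache:
    """Hypothetical in-process cache holding one target object per preset key."""

    def __init__(self):
        self._entries = {}

    def put(self, preset_key, target_object):
        self._entries[preset_key] = target_object

    def get(self, preset_key):
        # Returns None on a miss so the caller can decide how to recover.
        return self._entries.get(preset_key)

cache = ConfigCache()
cache.put("birth_date", {"regex": r"\d{4}-\d{2}-\d{2}", "paramName": "birthDate"})
cache.put("phone", {"regex": r"\b1\d{10}\b", "paramName": "phone"})

hit = cache.get("phone")    # configuration found directly through the preset key
miss = cache.get("salary")  # no entry was cached under this key
```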

S16、电子设备从所述缓存中读取所述键值对,并根据所述键值对确定数据读取格式,以及依据所述数据读取格式,从所述待读取文件中读取目标数据。S16. The electronic device reads the key-value pair from the cache, determines a data reading format according to the key-value pair, and reads the target from the to-be-read file according to the data reading format data.

本发明实施例中,因为所述待读取文件存储着多种类型的数据,所以需要根据每种类型的数据的配置信息来确定如何获取所述待读取文件中的每种类型的数据,所述配置信息被处理并保存在所述键值对中,因此可以根据每种类型的数据的所述键值对获取所述待读取文件存储的所述每种类型的数据,可以避免对数据库的频繁访问,减轻数据库的负担。In this embodiment of the present invention, because the to-be-read file stores multiple types of data, it is necessary to determine how to acquire each type of data in the to-be-read file according to the configuration information of each type of data, The configuration information is processed and stored in the key-value pair, so each type of data stored in the to-be-read file can be obtained according to the key-value pair of each type of data, which can avoid Frequent access to the database reduces the burden on the database.

Specifically, determining the data reading format according to the key-value pair, and reading target data from the to-be-read file according to the data reading format, includes:

determining a regular expression from the key-value pair;

using the regular expression to obtain, from the data stored in the to-be-read file, target data that matches the regular expression.

The configuration information of each type of data includes a regular expression (often abbreviated in code as regex, regexp, or RE). A regular expression is a logical formula for operating on strings: predefined characters and combinations of those characters form a "rule string" that expresses a filtering logic for strings.

In this optional embodiment, a regular expression can be determined and obtained from the configuration information of each type of data. All data stored in the to-be-read file can then be matched against the regular expression, and the data that matches is acquired.

Specifically, there are multiple regular expressions, and using the regular expression to obtain, from the data stored in the to-be-read file, target data that matches the regular expression includes:

determining in turn, following the preset order of the multiple regular expressions, whether the data stored in the to-be-read file contains target data that matches each regular expression;

if the data stored in the to-be-read file contains target data that matches the regular expression, obtaining that target data.

In this optional embodiment, the same type of data may be represented differently in different files. For example, a customer's date of birth of January 1, 1990 may appear as 1990.1.1, 1990-1-1, and so on, so multiple regular expressions are needed to match the different representations. The regular expressions that match the most common representations of a type of data can be placed first, which saves overall matching time; the order of the regular expressions can therefore be set in advance. When the data stored in the to-be-read file is found to contain data that matches a regular expression, that data can be acquired.
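The ordered matching described above can be sketched as follows. The two patterns and the function name `extract_first_match` are illustrative assumptions, and the sketch assumes that matching stops at the first pattern that finds anything, which is one reading of how the preset order saves time.

```python
import re

# Hypothetical patterns for a date of birth, most common representation first.
DATE_PATTERNS = [
    r"\b\d{4}-\d{1,2}-\d{1,2}\b",    # e.g. 1990-1-1
    r"\b\d{4}\.\d{1,2}\.\d{1,2}\b",  # e.g. 1990.1.1
]


def extract_first_match(text, patterns):
    """Try each pattern in its preset order and return all matches of the
    first pattern that finds anything; return an empty list if none match."""
    for pattern in patterns:
        matches = re.findall(pattern, text)
        if matches:
            return matches
    return []
```

With this ordering, a file that uses the hyphenated form is resolved by the first pattern alone, and the dotted pattern is only tried when the first one finds nothing.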

As an optional embodiment, after the extensible markup language file corresponding to the keyword is acquired from the cache according to the keyword and the to-be-read file is acquired, the method further includes:

parsing the to-be-read file with a text parsing tool to obtain an input stream;

saving the input stream to the cache.

Reading target data from the to-be-read file according to the data reading format includes:

reading the target data in the input stream from the cache according to the data reading format.

The input stream is an object from which a sequence of bytes can be read. A stream can be understood as a sequence of data, and an input stream represents reading data from a source.

In this optional embodiment, the text parsing tool Apache Tika can be used to read the to-be-read file and convert the data stored in it into an input stream. The input stream can be saved in the cache, that is, the data stored in the to-be-read file is kept in the cache, so that it can be read faster and system performance improves.

Apache Tika is a Java-based toolkit for content detection and analysis that can detect and extract content from different file types (such as PPT, XLS, and PDF).
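The caching of the input stream can be illustrated with a small sketch. It does not call Apache Tika; an `io.BytesIO` object stands in for the stream that a text parsing tool would produce, and the names `stream_cache`, `cache_stream`, `open_cached_stream`, and the identifier `customer-001` are hypothetical.

```python
import io

# Hypothetical cache: file identifier -> bytes of the extracted content.
stream_cache = {}


def cache_stream(file_id, stream):
    """Read the input stream once and keep its bytes in memory."""
    stream_cache[file_id] = stream.read()


def open_cached_stream(file_id):
    """Return a fresh in-memory stream over the cached bytes."""
    return io.BytesIO(stream_cache[file_id])


# Stand-in for the stream a text parsing tool would return.
cache_stream("customer-001", io.BytesIO("Date of birth: 1990-1-1".encode("utf-8")))
```

Storing the bytes rather than the stream object itself lets every later read start from the beginning, since a stream can normally be consumed only once.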

As an optional embodiment, the method further includes:

obtaining a parameter type and a parameter name from the key-value pair;

saving the target data in the data saving format given by the parameter type and the parameter name.

The configuration information of the data in the key-value pair includes the parameter type and the parameter name.

The parameter types include arrays, lists, and other types that can hold data.

In this optional embodiment, each type of data corresponds to a different parameter type and/or parameter name. The parameter type and parameter name are determined from the configuration information of each type of data, and each type of data is saved in its own parameter. Saving different types of data in different parameters makes it convenient for the system's functional modules or methods to call the type of data they need.
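One way to picture saving target data by parameter type and parameter name is the sketch below. The mapping `CONTAINER_FACTORIES`, the function `save_target_data`, and the use of a tuple to stand in for an array are assumptions made for illustration only.

```python
# Hypothetical mapping from a configured parameter type to a container factory.
# A tuple stands in for a fixed-size array here.
CONTAINER_FACTORIES = {"list": list, "array": tuple}


def save_target_data(store, param_type, param_name, values):
    """Save matched values under param_name, in the container chosen by param_type."""
    factory = CONTAINER_FACTORIES[param_type]
    store[param_name] = factory(values)
    return store


result = save_target_data({}, "list", "birthDates", ["1990-1-1"])
```

A functional module that needs dates of birth can then read `result["birthDates"]` without knowing how the values were extracted.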

In the method flow described in FIG. 1, a data acquisition request carrying a keyword and a to-be-read file can be received. The extensible markup language file corresponding to the keyword is acquired from the cache according to the keyword. A hypertext markup language parser parses the extensible markup language file to obtain the configuration information of each type of data in the file. A preset key is obtained from the configuration information, and a key-value pair is obtained according to the configuration information and the preset key. The key-value pair is read into the cache. A data reading format is determined according to the key-value pair, and target data is read from the to-be-read file according to the data reading format. Thus, in the present invention, the extensible markup language file can be stored in the cache in advance. When target data in the to-be-read file is needed, the extensible markup language file is loaded from the cache and parsed by the hypertext markup language parser to obtain the configuration information and the key-value pair; the key-value pair is read into the cache, and the target data in the to-be-read file is read according to it. Throughout the process, the extensible markup language file and the key-value pair are read from the cache, which makes full use of the cache's system resources. In addition, because the extensible markup language file occupies little memory, it can be read from the cache quickly. File reading efficiency therefore improves overall, and system resource utilization is high.

The above are only specific embodiments of the present invention, and the protection scope of the present invention is not limited to them. A person of ordinary skill in the art can make improvements without departing from the inventive concept of the present invention, and all such improvements fall within the protection scope of the present invention.

FIG. 2 is a functional block diagram of a preferred embodiment of a data acquisition device disclosed in the present invention.

In some embodiments, the data acquisition device runs in an electronic device. The data acquisition device may include multiple functional modules composed of program code segments. The program code of each program segment in the data acquisition device may be stored in a memory and executed by at least one processor to perform some or all of the steps of the data acquisition method described in FIG. 1.

In this embodiment, the data acquisition device may be divided into multiple functional modules according to the functions it performs. The functional modules may include: a receiving module 201, a first obtaining module 202, a parsing module 203, a second obtaining module 204, a first reading module 205, and a second reading module 206. A module in the present invention is a series of computer program segments that is stored in a memory, can be executed by at least one processor, and performs a fixed function. The functions of each module are detailed in the subsequent embodiments.

The receiving module 201 is configured to receive a data acquisition request carrying a keyword.

In this embodiment of the present invention, different types of data stored in the to-be-read file are to be acquired, the rules for acquiring the data are stored in extensible markup language files, and there are multiple such files. A data acquisition request carrying a keyword is therefore received in order to determine which extensible markup language file to use.

The first obtaining module 202 is configured to acquire, according to the keyword, the extensible markup language file corresponding to the keyword from the cache, and to acquire the to-be-read file.

In this embodiment of the present invention, keywords and extensible markup language files are in one-to-one correspondence, so the corresponding extensible markup language file can be determined according to the keyword and acquired from the cache.

The to-be-read file may be a file stored on the server in advance or a temporarily uploaded file. The to-be-read file can be determined and acquired through the keyword.

The to-be-read file may be a contract file, a customer information file, a resume, or another file. Different to-be-read files store different data, and each type of to-be-read file has a corresponding extensible markup language file. Each extensible markup language file contains the data acquisition rules, that is, the configuration information of the data, for its type of to-be-read file. The format of the to-be-read file may be Word or PDF.

Extensible Markup Language (XML) is a markup language used to transmit and store data. XML has a unified standard syntax, and almost all systems and products support XML documents. Because of its unified format and syntax, XML can be used across platforms. A markup is an information symbol that a computer can understand; through such markup, computers can process documents that contain various kinds of information.

Various extensible markup language files can be stored in the cache in advance, which makes full use of system resources and improves system resource utilization.

In addition, acquiring a file from the cache is faster than acquiring it from local storage, and the extensible markup language file occupies little memory. The extensible markup language file corresponding to the keyword can therefore be acquired from the cache quickly, which improves file reading efficiency.
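A rough sketch of preloading the XML files and resolving one by keyword, assuming a one-to-one mapping held in a dictionary. The names `xml_cache`, `preload`, and `get_config_xml`, the keyword `customer_info`, and the XML fragment are all hypothetical.

```python
# Hypothetical cache filled in advance: keyword -> XML configuration text.
xml_cache = {}


def preload(keyword, xml_text):
    """Store one XML configuration under its keyword before any request arrives."""
    xml_cache[keyword] = xml_text


def get_config_xml(keyword):
    """Return the XML configuration for a keyword; raise if none was preloaded."""
    try:
        return xml_cache[keyword]
    except KeyError:
        raise LookupError(f"no XML configuration cached for keyword {keyword!r}") from None


preload("customer_info", "<config><field key='birth_date'/></config>")
```

Raising an explicit error for an unknown keyword keeps a missing configuration from being mistaken for an empty one.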

The parsing module 203 is configured to parse the extensible markup language file with a hypertext markup language parser to obtain the configuration information of each type of data in the extensible markup language file.

The extensible markup language file stores configuration information for multiple types of data.

In this embodiment of the present invention, the to-be-read file stores multiple types of data. For example, the data stored in a customer information file includes, but is not limited to, date of birth, ID number, name, bank account, education information, occupation, income, and real estate information. Because there are many types of data and different functional modules of the system may need different types, the extensible markup language file is parsed to obtain the acquisition rules, that is, the configuration information, for the different types of data. The hypertext markup language parser is jsoup, a Java HTML (HyperText Markup Language) parser that can directly parse a URL (Uniform Resource Locator) address or HTML text content. It provides a convenient API (Application Programming Interface) for extracting and manipulating data through DOM (Document Object Model) methods, CSS (Cascading Style Sheets) selectors, and jQuery-like operations.

Because jsoup is used to parse the extensible markup language file, the file does not need to retain unused tags. This reduces the coding workload, makes the code more flexible to write, and improves development efficiency.

Each type of data in the extensible markup language file has different configuration information, which may include, but is not limited to, the data type, regular expressions, the preset key, the parameter type, and the parameter name. Through the configuration information, the different types of data stored in the to-be-read file can be acquired.

The second obtaining module 204 is configured to obtain a preset key from the configuration information, and to obtain a key-value pair according to the configuration information and the preset key.

In this embodiment of the present invention, the configuration information of each type of data needs further processing so that it can later be used to acquire the data.

There are multiple types of data, and each type has corresponding configuration information, that is, each type of data has a corresponding key-value pair.

The first reading module 205 is configured to read the key-value pair into the cache.

Once the key-value pair has been formed and read into the cache, the configuration information of the corresponding type of data can be found quickly through the preset key. This also makes full use of system resources, so system resource utilization is high.

The second reading module 206 is configured to read the key-value pair from the cache, determine a data reading format according to the key-value pair, and read target data from the to-be-read file according to the data reading format.

In this embodiment of the present invention, because the to-be-read file stores multiple types of data, the configuration information of each type of data determines how that type of data is acquired from the to-be-read file. The configuration information is processed and stored in the key-value pair, so each type of data stored in the to-be-read file can be acquired according to the key-value pair for that type. This avoids frequent access to the database and reduces its load.

The parsing module 203 parses the extensible markup language file with the hypertext markup language parser and obtains the configuration information of each type of data in the file as follows:

for each type of data, determining, according to the type of the data, the target tag related to the data in the extensible markup language file;

reading the target tag through a selector of the hypertext markup language parser to obtain the configuration information of the data; or reading the target tag through a document object model access method of the hypertext markup language parser to obtain the configuration information of the data.
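The two ways of reading a target tag can be sketched with Python's standard `xml.etree.ElementTree` as a stand-in for jsoup, which is a Java library and is not used here. The XML layout, the tag and attribute names, and the function `read_config` are hypothetical, since the source does not specify them.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout; tag and attribute names are assumptions.
CONFIG_XML = r"""
<config>
  <field type="birth_date" key="birthDate" paramType="list" paramName="birthDates">
    <regex>\d{4}-\d{1,2}-\d{1,2}</regex>
    <regex>\d{4}\.\d{1,2}\.\d{1,2}</regex>
  </field>
  <field type="id_number" key="idNumber" paramType="list" paramName="idNumbers">
    <regex>\d{17}[0-9X]</regex>
  </field>
</config>
"""


def read_config(root, data_type):
    """Locate the target tag for one data type and return its configuration."""
    # Selector-style access: a path expression picks the tag by attribute value.
    node = root.find(f"./field[@type='{data_type}']")
    if node is None:
        return None
    return {
        "key": node.get("key"),
        "param_type": node.get("paramType"),
        "param_name": node.get("paramName"),
        # DOM-style access: walk the child elements directly.
        "patterns": [child.text for child in node if child.tag == "regex"],
    }


root = ET.fromstring(CONFIG_XML)
birth = read_config(root, "birth_date")
```

The returned dictionary holds the preset key, the ordered regular expressions, and the parameter type and name, which is the configuration information the later steps consume.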

The second obtaining module 204 obtains the key-value pair according to the configuration information and the preset key as follows:

saving the configuration information to a target object;

forming a key-value pair from the preset key and the target object.

The second reading module 206 determines the data reading format according to the key-value pair and reads target data from the to-be-read file according to the data reading format as follows:

determining a regular expression from the key-value pair;

using the regular expression to obtain, from the data stored in the to-be-read file, target data that matches the regular expression.

There are multiple regular expressions, and using the regular expression to obtain, from the data stored in the to-be-read file, target data that matches the regular expression includes:

determining in turn, following the preset order of the multiple regular expressions, whether the data stored in the to-be-read file contains target data that matches each regular expression;

if the data stored in the to-be-read file contains target data that matches the regular expression, obtaining that target data.

Optionally, the parsing module 203 is further configured to parse the to-be-read file with a text parsing tool to obtain an input stream, after the first obtaining module 202 acquires, according to the keyword, the extensible markup language file corresponding to the keyword from the cache and acquires the to-be-read file.

The data acquisition device further includes:

a saving module, configured to save the input stream to the cache.

The second reading module 206 reading target data from the to-be-read file according to the data reading format includes:

reading the target data in the input stream from the cache according to the data reading format.

Optionally, the first obtaining module 202 is further configured to obtain a parameter type and a parameter name from the key-value pair.

The saving module is further configured to save the target data in the data saving format given by the parameter type and the parameter name.

In the data acquisition device described in FIG. 2, a data acquisition request carrying a keyword and a to-be-read file can be received. The extensible markup language file corresponding to the keyword is acquired from the cache according to the keyword. A hypertext markup language parser parses the extensible markup language file to obtain the configuration information of each type of data in the file. A preset key is obtained from the configuration information, and a key-value pair is obtained according to the configuration information and the preset key. The key-value pair is read into the cache. A data reading format is determined according to the key-value pair, and target data is read from the to-be-read file according to the data reading format. Thus, in the present invention, the extensible markup language file can be stored in the cache in advance. When target data in the to-be-read file is needed, the extensible markup language file is loaded from the cache and parsed by the hypertext markup language parser to obtain the configuration information and the key-value pair; the key-value pair is read into the cache, and the target data in the to-be-read file is read according to it. Throughout the process, the extensible markup language file and the key-value pair are read from the cache, which makes full use of the cache's system resources. In addition, because the extensible markup language file occupies little memory, it can be read from the cache quickly. File reading efficiency therefore improves overall, and system resource utilization is high.

FIG. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention that implements the data acquisition method. The electronic device 3 includes a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.

Those skilled in the art will understand that the schematic diagram in FIG. 3 is only an example of the electronic device 3 and does not limit it. The electronic device 3 may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input and output devices, network access devices, and the like.

The electronic device 3 includes, but is not limited to, any electronic product that can interact with a user through a keyboard, mouse, remote control, touchpad, voice-control device, or similar means, for example a personal computer, tablet computer, smartphone, personal digital assistant (PDA), game console, Internet Protocol television (IPTV), or smart wearable device. The network in which the electronic device 3 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, and a virtual private network (VPN).

The at least one processor 32 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The processor 32 may be a microprocessor or any conventional processor. The processor 32 is the control center of the electronic device 3 and connects the parts of the entire electronic device 3 through various interfaces and lines.

The memory 31 may be used to store the computer program 33 and/or modules/units. The processor 32 implements the various functions of the electronic device 3 by running or executing the computer program and/or modules/units stored in the memory 31 and by calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or an image playback function). The data storage area may store data created through use of the electronic device 3 (such as audio data). In addition, the memory 31 may include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.

With reference to FIG. 1, the memory 31 in the electronic device 3 stores multiple instructions to implement a data acquisition method, and the processor 32 can execute the multiple instructions to implement:

receiving a data acquisition request carrying a keyword;

acquiring, according to the keyword, the extensible markup language file corresponding to the keyword from the cache, and acquiring the to-be-read file;

parsing the extensible markup language file with a hypertext markup language parser to obtain the configuration information of each type of data in the extensible markup language file;

obtaining a preset key from the configuration information, and obtaining a key-value pair according to the configuration information and the preset key;

reading the key-value pair into the cache;

reading the key-value pair from the cache, determining a data reading format according to the key-value pair, and reading target data from the to-be-read file according to the data reading format.

Specifically, for how the processor 32 implements the above instructions, refer to the description of the relevant steps in the embodiment corresponding to FIG. 1; details are not repeated here.
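The instruction sequence above can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: `xml.etree.ElementTree` stands in for the HTML parser, plain dictionaries stand in for the cache, and the XML layout, the names `XML_CACHE`, `FILES`, `KV_CACHE`, and `acquire`, and the sample text are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical preloaded cache: keyword -> XML configuration text.
XML_CACHE = {
    "customer_info": r"""
<config>
  <field key="birthDate">
    <regex>\d{4}-\d{1,2}-\d{1,2}</regex>
    <regex>\d{4}\.\d{1,2}\.\d{1,2}</regex>
  </field>
</config>
""",
}
# Hypothetical to-be-read files, already converted to text.
FILES = {"customer_info": "Customer record\nDate of birth: 1990.1.1\n"}
# Hypothetical key-value cache: preset key -> ordered regular expressions.
KV_CACHE = {}


def acquire(keyword):
    """Handle one request and return {preset key: matched values}."""
    xml_text = XML_CACHE[keyword]           # XML file from the cache
    file_text = FILES[keyword]              # to-be-read file
    root = ET.fromstring(xml_text)          # parse the configuration
    for node in root.iter("field"):         # build and cache key-value pairs
        KV_CACHE[node.get("key")] = [r.text for r in node.findall("regex")]
    results = {}
    for key, patterns in KV_CACHE.items():  # read target data from the file
        for pattern in patterns:            # preset order, stop at first hit
            found = re.findall(pattern, file_text)
            if found:
                results[key] = found
                break
    return results
```

Calling `acquire("customer_info")` on the sample data skips the hyphenated pattern, which finds nothing, and returns the date matched by the dotted pattern.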

The electronic device 3 described in FIG. 3 can receive a data acquisition request that carries a keyword and a to-be-read file. According to the keyword, it obtains the Extensible Markup Language (XML) file corresponding to the keyword from the cache. It then uses a Hypertext Markup Language (HTML) parser to parse the XML file and obtain the configuration information for each type of data in the XML file. Next, it obtains a preset key from the configuration information and obtains a key-value pair according to the configuration information and the preset key. It reads the key-value pair into the cache, determines a data reading format according to the key-value pair, and reads target data from the to-be-read file according to that format.

It can be seen that, in the present invention, the XML file can be stored in the cache in advance. When target data in the to-be-read file needs to be obtained, the XML file in the cache is loaded and parsed by the HTML parser to obtain the configuration information and the key-value pairs; the key-value pairs are read into the cache, and the target data in the to-be-read file is read according to them. Throughout the process, both the XML file and the key-value pairs are read from the cache, which makes full use of the cache's system resources. In addition, because the XML file occupies little memory, it can be read from the cache quickly, so the overall file-reading efficiency is improved and system resource utilization is high.
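The flow summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `cache` dictionary, the function names, and the `<item key=... type=... name=...>` XML layout are all hypothetical, and the standard-library `xml.etree.ElementTree` parser is swapped in for the HTML parser that the description names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical in-memory cache shared by XML files and key-value pairs.
cache = {}


def preload_config(keyword, xml_text):
    """Store the XML configuration for a keyword in the cache ahead of time."""
    cache[("xml", keyword)] = xml_text


def load_key_value_pairs(keyword):
    """Parse the cached XML and cache one key-value pair per <item> element."""
    root = ET.fromstring(cache[("xml", keyword)])
    pairs = {}
    for item in root.iter("item"):
        # The "target object" holds the configuration information of one data type.
        target_object = {
            "type": item.get("type"),
            "name": item.get("name"),
            "pattern": item.findtext("pattern"),
        }
        pairs[item.get("key")] = target_object  # preset key -> target object
    cache[("kv", keyword)] = pairs
    return pairs


def read_target_data(keyword, file_text):
    """Read target data from the file text using the cached key-value pairs."""
    pairs = cache.get(("kv", keyword)) or load_key_value_pairs(keyword)
    result = {}
    for target_object in pairs.values():
        match = re.search(target_object["pattern"], file_text)
        if match:
            result[target_object["name"]] = match.group(1)
    return result


preload_config(
    "invoice",
    '<config><item key="amount_rule" type="float" name="amount">'
    "<pattern>amount=([0-9.]+)</pattern></item></config>",
)
print(read_target_data("invoice", "id=7;amount=12.50;"))  # {'amount': '12.50'}
```

In this sketch the XML is parsed only on the first request for a keyword; later requests reuse the key-value pairs already held in `cache`, which mirrors the cache reuse the description relies on.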

If the modules/units integrated in the electronic device 3 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of each of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, and a read-only memory (ROM).

In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a combination of hardware and software functional modules.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments and that it can be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the invention. No reference sign in the claims shall be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Several units or devices recited in the system claims may also be implemented by a single unit or device through software or hardware. Terms such as "second" are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from their spirit and scope.

Claims (10)

1. A data acquisition method, characterized in that the method comprises:
receiving a data acquisition request carrying a keyword;
obtaining, according to the keyword, an Extensible Markup Language (XML) file corresponding to the keyword from a cache, and obtaining a to-be-read file;
parsing the XML file using a Hypertext Markup Language (HTML) parser to obtain configuration information for each type of data in the XML file;
obtaining a preset key from the configuration information, and obtaining a key-value pair according to the configuration information and the preset key;
reading the key-value pair into the cache;
reading the key-value pair from the cache, determining a data reading format according to the key-value pair, and reading target data from the to-be-read file according to the data reading format.

2. The method according to claim 1, characterized in that parsing the XML file using the HTML parser to obtain the configuration information for each type of data in the XML file comprises:
for each type of data, determining, according to the type of the data, a target tag related to the data from the XML file;
reading the target tag through a selector of the HTML parser to obtain the configuration information of the data; or reading the target tag through a document object model (DOM) access method of the HTML parser to obtain the configuration information of the data.

3. The method according to claim 1, characterized in that obtaining the key-value pair according to the configuration information and the preset key comprises:
saving the configuration information to a target object;
forming the key-value pair from the preset key and the target object.

4. The method according to claim 1, characterized in that determining the data reading format according to the key-value pair and reading the target data from the to-be-read file according to the data reading format comprises:
determining a regular expression from the key-value pair;
using the regular expression to obtain, from the data stored in the to-be-read file, target data that matches the regular expression.

5. The method according to claim 4, characterized in that there are multiple regular expressions, and using the regular expression to obtain, from the data stored in the to-be-read file, target data that matches the regular expression comprises:
determining, for each regular expression in turn according to a preset order of the multiple regular expressions, whether target data matching the regular expression exists among all the data stored in the to-be-read file;
if target data matching the regular expression exists among all the data stored in the to-be-read file, obtaining the target data that matches the regular expression.

6. The method according to claim 1, characterized in that, after obtaining, according to the keyword, the XML file corresponding to the keyword from the cache and obtaining the to-be-read file, the method further comprises:
parsing the to-be-read file using a text parsing tool to obtain an input stream;
saving the input stream to the cache;
and reading the target data from the to-be-read file according to the data reading format comprises:
reading, according to the data reading format, the target data in the input stream from the cache.

7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
obtaining a parameter type and a parameter name from the key-value pair;
saving the target data in the data saving format of the parameter type and the parameter name.

8. A data acquisition apparatus, characterized in that the data acquisition apparatus comprises:
a receiving module, configured to receive a data acquisition request carrying a keyword;
a first obtaining module, configured to obtain, according to the keyword, an XML file corresponding to the keyword from a cache, and to obtain a to-be-read file;
a parsing module, configured to parse the XML file using an HTML parser to obtain configuration information for each type of data in the XML file;
a second obtaining module, configured to obtain a preset key from the configuration information, and to obtain a key-value pair according to the configuration information and the preset key;
a first reading module, configured to read the key-value pair into the cache;
a second reading module, configured to read the key-value pair from the cache, determine a data reading format according to the key-value pair, and read target data from the to-be-read file according to the data reading format.

9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the data acquisition method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction which, when executed by a processor, implements the data acquisition method according to any one of claims 1 to 7.
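As an illustration of claims 4, 5 and 7 only, the following Python sketch applies several regular expressions in a preset order and saves the matches under a parameter name after converting them to a parameter type. The `CONVERTERS` table, the `read_in_pattern_order` function, and the dictionary layout of the target object are hypothetical. The claims do not say whether matching stops at the first hit, so this sketch assumes every expression is tried and every match is kept.

```python
import re

# Hypothetical converters for the "parameter type" stored in a key-value pair.
CONVERTERS = {"int": int, "float": float, "str": str}


def read_in_pattern_order(target_object, text):
    """Apply each regular expression in its preset order and save typed matches."""
    convert = CONVERTERS[target_object["type"]]
    saved = {}
    for pattern in target_object["patterns"]:  # list order is the preset order
        match = re.search(pattern, text)
        if match is None:
            continue  # nothing in the file matches this expression
        saved.setdefault(target_object["name"], []).append(convert(match.group(1)))
    return saved


port_rule = {
    "type": "int",
    "name": "port",
    "patterns": [r"port=(\d+)", r"listen\s+(\d+)"],
}
print(read_in_pattern_order(port_rule, "port=22\nlisten 8080\n"))  # {'port': [22, 8080]}
```

Keeping the expressions in a list makes the preset order explicit; an implementation that should stop at the first match could return from inside the loop instead.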
CN201910881318.8A | 2019-09-18 | 2019-09-18 | Data acquisition method, device, electronic device and storage medium | Pending | CN110851136A (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
CN201910881318.8A / CN110851136A (en) | 2019-09-18 | 2019-09-18 | Data acquisition method, device, electronic device and storage medium
PCT/CN2019/118979 / WO2021051624A1 (en) | 2019-09-18 | 2019-11-15 | Data acquisition method and apparatus, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910881318.8A / CN110851136A (en) | 2019-09-18 | 2019-09-18 | Data acquisition method, device, electronic device and storage medium

Publications (1)

Publication Number | Publication Date
CN110851136A (en) | 2020-02-28

Family

ID=69594835

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910881318.8A (Pending) / CN110851136A (en) | Data acquisition method, device, electronic device and storage medium | 2019-09-18 | 2019-09-18

Country Status (2)

Country | Link
CN (1) | CN110851136A (en)
WO (1) | WO2021051624A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112035408A (en) * | 2020-09-01 | 2020-12-04 | 文思海辉智科科技有限公司 | Text processing method and device, electronic equipment and storage medium
CN113449502A (en) * | 2021-06-29 | 2021-09-28 | 平安资产管理有限责任公司 | Document generation method and system based on dynamic data
CN113553297A (en) * | 2021-06-08 | 2021-10-26 | 优刻得科技股份有限公司 | Management method and system for switch configuration information
CN115544304A (en) * | 2022-10-12 | 2022-12-30 | 东软睿驰汽车技术(大连)有限公司 | File analysis method and device, readable storage medium and file analysis equipment
CN117687626A (en) * | 2024-02-04 | 2024-03-12 | 双一力(宁波)电池有限公司 | Host computer and main program matching system and method
CN118394429A (en) * | 2024-06-28 | 2024-07-26 | 浪潮电子信息产业股份有限公司 | A project configuration management method, device, product and medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101556536A (en) * | 2008-04-11 | 2009-10-14 | 北京闻言科技有限公司 | Method for configuring application program by using self-defining configuration files
CN105354311A (en) * | 2015-11-10 | 2016-02-24 | 科大智能电气技术有限公司 | Data key value pair storage method based on embedded equipment file system
CN106649451A (en) * | 2016-09-22 | 2017-05-10 | 北京奇虎科技有限公司 | Data update method and device
CN107145538A (en) * | 2017-04-21 | 2017-09-08 | 网易(杭州)网络有限公司 | List data querying method, device and system
CN107169047A (en) * | 2017-04-25 | 2017-09-15 | 腾讯科技(深圳)有限公司 | A kind of method and device for realizing data buffer storage
CN107562936A (en) * | 2017-09-12 | 2018-01-09 | 中山大学 | A kind of crawl of web page news list based on Jsoup and store method
CN109450969A (en) * | 2018-09-27 | 2019-03-08 | 北京奇艺世纪科技有限公司 | The method, apparatus and server of data are obtained from third party's data source server
CN109725932A (en) * | 2017-10-31 | 2019-05-07 | 北京京东尚科信息技术有限公司 | A kind of application component illustrates document generation method and device
CN109947720A (en) * | 2019-04-12 | 2019-06-28 | 苏州浪潮智能科技有限公司 | A file pre-reading method, apparatus, device and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102594833B (en) * | 2012-03-09 | 2016-01-06 | 北京思特奇信息技术股份有限公司 | A kind of communication protocol adapting method and system
CN108885627B (en) * | 2016-01-11 | 2022-04-05 | 甲骨文美国公司 | Query-as-a-service system providing query result data to remote client
CN108228597A (en) * | 2016-12-14 | 2018-06-29 | 深圳市优朋普乐传媒发展有限公司 | Data bank access method and device
CN107908485B (en) * | 2017-10-26 | 2020-08-04 | 中国平安人寿保险股份有限公司 | Interface parameter transmission method, device, equipment and computer readable storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101556536A (en) * | 2008-04-11 | 2009-10-14 | 北京闻言科技有限公司 | Method for configuring application program by using self-defining configuration files
CN105354311A (en) * | 2015-11-10 | 2016-02-24 | 科大智能电气技术有限公司 | Data key value pair storage method based on embedded equipment file system
CN106649451A (en) * | 2016-09-22 | 2017-05-10 | 北京奇虎科技有限公司 | Data update method and device
CN107145538A (en) * | 2017-04-21 | 2017-09-08 | 网易(杭州)网络有限公司 | List data querying method, device and system
CN107169047A (en) * | 2017-04-25 | 2017-09-15 | 腾讯科技(深圳)有限公司 | A kind of method and device for realizing data buffer storage
CN107562936A (en) * | 2017-09-12 | 2018-01-09 | 中山大学 | A kind of crawl of web page news list based on Jsoup and store method
CN109725932A (en) * | 2017-10-31 | 2019-05-07 | 北京京东尚科信息技术有限公司 | A kind of application component illustrates document generation method and device
CN109450969A (en) * | 2018-09-27 | 2019-03-08 | 北京奇艺世纪科技有限公司 | The method, apparatus and server of data are obtained from third party's data source server
CN109947720A (en) * | 2019-04-12 | 2019-06-28 | 苏州浪潮智能科技有限公司 | A file pre-reading method, apparatus, device and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
丁旭 等: "《基于B/S架构的软件项目实训 JSP》", 31 August 2011, 北京:北京交通大学出版社, pages: 107 - 112*

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112035408A (en) * | 2020-09-01 | 2020-12-04 | 文思海辉智科科技有限公司 | Text processing method and device, electronic equipment and storage medium
CN112035408B (en) * | 2020-09-01 | 2023-10-31 | 文思海辉智科科技有限公司 | Text processing method, device, electronic equipment and storage medium
CN113553297A (en) * | 2021-06-08 | 2021-10-26 | 优刻得科技股份有限公司 | Management method and system for switch configuration information
CN113553297B (en) * | 2021-06-08 | 2023-01-06 | 优刻得科技股份有限公司 | Management method and system for switch configuration information
CN113449502A (en) * | 2021-06-29 | 2021-09-28 | 平安资产管理有限责任公司 | Document generation method and system based on dynamic data
CN115544304A (en) * | 2022-10-12 | 2022-12-30 | 东软睿驰汽车技术(大连)有限公司 | File analysis method and device, readable storage medium and file analysis equipment
CN117687626A (en) * | 2024-02-04 | 2024-03-12 | 双一力(宁波)电池有限公司 | Host computer and main program matching system and method
CN117687626B (en) * | 2024-02-04 | 2024-05-03 | 双一力(宁波)电池有限公司 | Host computer and main program matching system and method
CN118394429A (en) * | 2024-06-28 | 2024-07-26 | 浪潮电子信息产业股份有限公司 | A project configuration management method, device, product and medium

Also Published As

Publication number | Publication date
WO2021051624A1 (en) | 2021-03-25

Similar Documents

Publication | Publication Date | Title
CN110851136A (en) | | Data acquisition method, device, electronic device and storage medium
CN108304498B (en) | | Webpage data acquisition method and device, computer equipment and storage medium
CN109842629B (en) | | Method for realizing self-defined protocol based on protocol analysis framework
US11182451B2 (en) | | Automated generation of web API descriptions from usage data
US9426200B2 (en) | | Updating dynamic content in cached resources
WO2020253389A1 (en) | | Page translation method and apparatus, medium, and electronic device
CN111177113B (en) | | Data migration method, device, computer equipment and storage medium
US20130173655A1 (en) | | Selective fetching of search results
US20170199850A1 (en) | | Method and system to decrease page load time by leveraging network latency
WO2013178094A1 (en) | | Page display method and device
RU2665920C2 (en) | | Optimized visualization process in browser
US8930807B2 (en) | | Web content management based on timeliness metadata
WO2022048210A1 (en) | | Named entity recognition method and apparatus, and electronic device and readable storage medium
CN106648569B (en) | | Target serialization realization method and device
US20130246520A1 (en) | | Recognizing Social Media Posts, Comments, or other Texts as Business Recommendations or Referrals
WO2022179128A1 (en) | | Crawler-based data crawling method and apparatus, computer device, and storage medium
WO2019071907A1 (en) | | Method for identifying help information based on operation page, and application server
CN115080154A (en) | | Page display method and device, storage medium and electronic equipment
CN113127776A (en) | | Breadcrumb path generation method and device and terminal equipment
CN113760894B (en) | | Data retrieval method, device, electronic device and storage medium
TWI769632B (en) | | Data segmentation method processor electronic equipment and computer readable storage medium
CN113139145B (en) | | Page generation method and device, electronic equipment and readable storage medium
CN118193567B (en) | | Method, device, equipment and medium for generating query statements and querying business data
CN106055677B (en) | | Content-aggregated page display method and device in information flow
CN111782244A (en) | | Configuration file update method, device, computer equipment and storage medium

Legal Events

Date | Code | Title | Description
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | RJ01 | Rejection of invention patent application after publication |

Application publication date: 20200228

