Summary of the invention
Embodiments of the present invention provide a large data file reading method, an apparatus, a computer device, and a storage medium, to solve the problem that existing ways of reading data from dbf files are inefficient.
A large data file reading method, comprising:
reading the header information of a target dbf file and obtaining the per-row field data size from the header information, where the per-row field data size refers to the size of each row of field data in the target dbf file;
mapping the first block of field data of the target dbf file into a specified memory as the current field block, where the first block of field data refers to the field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file;
obtaining the field data of each row in the current field block row by row according to the per-row field data size;
parsing the obtained field data to obtain parsed data values;
until all field data of the target dbf file has been obtained, mapping the next block of field data of the target dbf file into the specified memory as the new current field block, and returning to the step of obtaining the field data of each row in the current field block row by row according to the per-row field data size, where the next block of field data refers to the field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file.
A large data file reading apparatus, comprising:
a header information reading module, configured to read the header information of a target dbf file and obtain the per-row field data size from the header information, where the per-row field data size refers to the size of each row of field data in the target dbf file;
a field data mapping module, configured to map the first block of field data of the target dbf file into a specified memory as the current field block, where the first block of field data refers to the field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file;
a row-by-row obtaining module, configured to obtain the field data of each row in the current field block row by row according to the per-row field data size;
a field data parsing module, configured to parse the obtained field data to obtain parsed data values;
a loop processing module, configured to, until all field data of the target dbf file has been obtained, map the next block of field data of the target dbf file into the specified memory as the new current field block and then trigger the row-by-row obtaining module and the field data parsing module again in turn, where the next block of field data refers to the field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above large data file reading method when executing the computer program.
A computer-readable storage medium storing a computer program, where the computer program implements the steps of the above large data file reading method when executed by a processor.
In the above large data file reading method, apparatus, computer device, and storage medium, the header information of the target dbf file is first read and the per-row field data size is obtained from it, where the per-row field data size refers to the size of each row of field data in the target dbf file. Next, the first block of field data of the target dbf file is mapped into a specified memory as the current field block, where the first block of field data refers to the field data of a preset data size that is located after, and immediately adjacent to, the header information. The field data of each row in the current field block is then obtained row by row according to the per-row field data size, and the obtained field data is parsed to obtain parsed data values. Until all field data of the target dbf file has been obtained, the next block of field data is mapped into the specified memory as the new current field block, and the row-by-row obtaining step is executed again, where the next block of field data refers to the field data of the preset data size that is located after, and immediately adjacent to, the current field block. Because processing data in memory is far faster than processing it on disk, and because file mapping is also more efficient than reading data from disk, the overall processing efficiency is higher than that of existing approaches. This improves the efficiency of reading data from dbf files, shortens the read time, and can meet the demand for fast data reading in the big data era.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
The large data file reading method provided by the present application can be applied in the application environment shown in Fig. 1, in which a client communicates with a server over a network. The client may be, but is not limited to, a personal computer, laptop, smartphone, tablet computer, or portable wearable device. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a large data file reading method is provided. The method is described by taking its application to the server in Fig. 1 as an example, and includes the following steps:
101. Read the header information of the target dbf file and obtain the per-row field data size from the header information, where the per-row field data size refers to the size of each row of field data in the target dbf file;
In this embodiment, a dbf file is a kind of database format file with a fixed format. For example, the beginning of a dbf file is the header information, and the header information records the size of one row of field data in the dbf file, that is, the per-row field data size. The server can therefore obtain the per-row field data size by reading the header information of the target dbf file.
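As a minimal sketch of step 101, the following Python function reads the fixed 12-byte prefix of a dBASE-style header, in which bytes 8-9 hold the header length and bytes 10-11 hold the length of one row (record), both little-endian. The function name `read_row_size` and its return shape are illustrative assumptions; the source does not prescribe an implementation.

```python
import struct

def read_row_size(path):
    """Return (header_length, row_size, row_count) from a dbf header prefix."""
    with open(path, "rb") as f:
        prefix = f.read(12)
    if len(prefix) < 12:
        raise ValueError("file too short to contain a dbf header")
    # Byte 0: version; bytes 1-3: last-update date; bytes 4-7: row count;
    # bytes 8-9: header length; bytes 10-11: row (record) length.
    _ver, _yy, _mm, _dd, row_count, header_length, row_size = struct.unpack(
        "<4BIHH", prefix
    )
    return header_length, row_size, row_count
```

The header length returned here also tells later steps where the field data begins.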
102. Map the first block of field data of the target dbf file into a specified memory as the current field block, where the first block of field data refers to the field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file;
It can be understood that, because mapping data into memory is much faster than reading the data, this solution uses file mapping to map the target dbf file into the specified memory for processing. The target dbf file is usually large and the memory resources of the server are limited, so it is generally difficult to map the entire target dbf file into memory at once. In this embodiment, the server therefore first maps the first block of field data of the target dbf file into the specified memory as the current field block, where the first block of field data refers to the field data of the preset data size that is located after, and immediately adjacent to, the header information. For example, assume that bytes 0-100 of the target dbf file are header information and bytes 101-10000 are field data. When the preset data size is 100 bytes, the first block of field data is bytes 101-200, and the server first maps the field data in bytes 101-200 of the target dbf file into the specified memory as the current field block.
In practice, when the target dbf file is mapped into the specified memory, it is difficult to map the first block of field data on its own. For this reason, as shown in Fig. 3, step 102 may further include:
201. Map the data of a preset mapping size, starting from the header information of the target dbf file, into the specified memory, where the preset mapping size equals the sum of the preset data size and the size of the header information;
202. Determine the data mapped into the specified memory, excluding the header information, as the current field block.
For step 201, it can be understood that, in order to map the first block of field data into the specified memory successfully, the first block of field data can be mapped together with the header information. That is, data of the preset mapping size is mapped starting from the beginning of the target dbf file, where the preset mapping size equals the sum of the preset data size and the size of the header information. Continuing the example above, bytes 0-200 of the target dbf file are mapped into the specified memory.
For step 202, the data mapped in step 201 contains the header information, and the header information does not contain the field data values required by this solution. The server can therefore directly determine the mapped data other than the header information as the current field block. The header information mapped into the specified memory may be erased or ignored; this embodiment places no limitation on this.
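Steps 201 and 202 can be sketched in Python as follows. The source does not say why mapping the first block alone is difficult; one plausible reason (an assumption here) is that memory-mapping interfaces generally require the mapping offset to be aligned to an allocation granularity, which a header length rarely satisfies. Mapping from offset 0 avoids the issue. The function name `map_first_block` is hypothetical, and returning a `bytes` copy is a simplification for clarity.

```python
import mmap
import os

def map_first_block(path, header_length, preset_size):
    """Map header + first block from offset 0, then drop the header part."""
    with open(path, "rb") as f:
        file_size = os.fstat(f.fileno()).st_size
        # Preset mapping size = header size + preset data size (step 201),
        # capped at the file size so small files can still be mapped.
        length = min(header_length + preset_size, file_size)
        with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ) as m:
            # Everything after the header is the current field block (step 202).
            return m[header_length:length]
```

Slicing the mapping corresponds to "ignoring" the header rather than erasing it.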
This embodiment also provides another way to handle the difficulty, mentioned above, of mapping the first block of field data on its own. As shown in Fig. 4, the method may further include, before step 102:
301. Delete the header information of the target dbf file to obtain a new target dbf file;
Step 102 is then specifically: map the field data of the preset data size located at the beginning of the new target dbf file into the specified memory as the current field block.
The idea of this approach is to delete the header information from the target dbf file before mapping it. When step 102 maps the target dbf file, only field data remains in the new target dbf file and the header information no longer interferes, so the mapping can be performed directly.
For step 301, the server can delete the header information of the target dbf file to obtain a new target dbf file without the header information.
On the basis of step 301, it can be understood that in step 102 the server can map the field data of the preset data size located at the beginning of the new target dbf file into the specified memory as the current field block. After the processing of step 301 the file contains no header information, so field data of the preset data size can be taken directly from the beginning of the file and mapped into the specified memory. The field data mapped into the specified memory is the current field block.
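A minimal sketch of step 301 is given below, assuming the header length is already known from step 101. Writing the headerless data to a separate file (rather than modifying the original in place) and the name `strip_header` are assumptions made for illustration; the source only states that the header information is deleted to obtain a new target dbf file.

```python
import shutil

def strip_header(src_path, dst_path, header_length):
    """Copy everything after the header into a new file containing only field data."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(header_length)          # skip the header information
        shutil.copyfileobj(src, dst)     # stream the remaining field data
```

This variant costs one full pass over the file before any mapping takes place, which is worth weighing against the approach of steps 201 and 202.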
In a dbf file, field data is stored row by row, and the size of each row is the per-row field data size. However, the size of the current field block obtained by mapping is the preset data size. If no relationship is imposed between the preset data size and the per-row field data size, the last row in the current field block may contain less than one full row of field data. For example, assume the per-row field data size is 100 bytes, the preset data size is 250 bytes, and bytes 0-100 are header information. The first block of field data of the target dbf file is then bytes 101-350, and its last row is bytes 301-350, which is less than one full row of field data and makes the subsequent parsing difficult. To avoid this, this embodiment can constrain the relationship between the preset data size and the per-row field data size so that the last row of every current field block is a complete row of field data.
As shown in Fig. 5, the preset data size may further be set in advance through the following steps:
401. Obtain the currently available space of the specified memory;
402. Determine, according to a preset memory usage ratio and the currently available space, the mapping space of the specified memory that is used for mapping field data;
403. Divide the mapping space by the per-row field data size to obtain a first value;
404. Round the first value to an integer, and calculate the product of the rounded first value and the per-row field data size to obtain a second value as the preset data size.
For step 401, the currently available space refers to the remaining space of the specified memory that can currently be used. For example, if the specified memory totals 4 GB and 2 GB is already in use, the currently available space is 2 GB.
For step 402, it can be understood that the server can preset a memory usage ratio, which indicates what proportion of the currently available space should be used for mapping data. The memory usage ratio can be set according to actual usage, for example to 50%. After obtaining the currently available space of the specified memory, the server can calculate the product of the preset memory usage ratio and the currently available space to obtain the mapping space of the specified memory used for mapping field data.
For steps 403 and 404, the aim in this embodiment is for the preset data size to be an integer multiple of the per-row field data size while being as close as possible to the size of the mapping space. The mapping space is therefore first divided by the per-row field data size to obtain the first value, and the first value is then rounded to an integer. The rounded first value can be regarded as a multiple. The second value, obtained as the product of the rounded first value and the per-row field data size, is an integer multiple of the per-row field data size, so the second value can be determined as the preset data size.
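The calculation in steps 401 to 404 can be sketched as follows. The source says only that the first value is rounded; rounding down is assumed here so that the result never exceeds the mapping space. The function and parameter names are illustrative.

```python
def preset_data_size(available_bytes, use_ratio, row_size):
    """Largest multiple of row_size that fits in the allotted mapping space."""
    if row_size <= 0:
        raise ValueError("row_size must be positive")
    mapping_space = int(available_bytes * use_ratio)   # step 402
    rows_per_block = mapping_space // row_size         # steps 403-404: divide, round down
    return rows_per_block * row_size                   # second value

# Example with 2 GB available, a 50% usage ratio, and 100-byte rows.
size = preset_data_size(2 * 1024**3, 0.5, 100)
print(size, size % 100)
```

Because the result is a whole number of rows, every block ends on a row boundary and no row is split across two blocks.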
103. Obtain the field data of each row in the current field block row by row according to the per-row field data size;
It can be understood that field data in a dbf file is arranged and stored row by row, and that within each row the order of the fields and the number of bytes occupied by each field are agreed in advance. Therefore, when obtaining the field data in the current field block, the field data of each row must be obtained row by row according to the per-row field data size, and step 104 is then executed to parse each row of field data.
Further, as described above, the last row of the current field block may contain less than one full row of field data. To handle this case, after step 103 the method may further include: if the size of the obtained last row of the current field block is smaller than the per-row field data size, temporarily storing the field data of the last row as carry-over data. Then, in step 105, after the next block of field data of the target dbf file has been mapped into the specified memory as the new current field block, and before returning to the step of obtaining the field data of each row in the current field block row by row according to the per-row field data size, the carry-over data is merged into the beginning of the current field block. Continuing the example above, assume the last row of the current field block is bytes 301-350. The server temporarily stores this last row in memory. After step 105 yields the new current field block, which is bytes 351-600, the server merges the stored last row into the beginning of the current field block so that the current field block becomes bytes 301-600. In this way the last row of the current field block is handled properly.
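The carry-over handling described above can be sketched as a small helper that splits a block into complete rows and returns any incomplete tail for merging into the next block. The name `split_rows` and the use of `bytes` objects are assumptions made for illustration.

```python
def split_rows(block, row_size, carry=b""):
    """Return (complete_rows, leftover) after prepending the previous leftover."""
    data = carry + bytes(block)               # merge carry-over at the block's head
    full = len(data) - len(data) % row_size   # bytes covered by complete rows
    rows = [data[i:i + row_size] for i in range(0, full, row_size)]
    return rows, data[full:]                  # leftover is shorter than one row

# Mirrors the example: 100-byte rows read in 250-byte blocks.
field_data = bytes(i % 256 for i in range(500))
rows1, carry = split_rows(field_data[:250], 100)
rows2, carry2 = split_rows(field_data[250:500], 100, carry)
print(len(rows1), len(carry), len(rows2), len(carry2))
```

The leftover from one call is passed as `carry` to the next, so no row is lost at a block boundary.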
104. Parse the obtained field data to obtain parsed data values;
In this embodiment, each time the field data of a row in the current field block is obtained, the server can parse it to obtain the parsed data values. It can be understood that the fields in each row of field data in a dbf file are arranged in an order agreed in advance, and that the number of bytes occupied by each field is also agreed in advance. Therefore, after the field data is obtained, the data value of each field can be parsed directly from it according to the agreed layout.
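A minimal sketch of step 104 is shown below, assuming a fixed-width layout given as a list of (name, width) pairs. The field names and widths are hypothetical. The sketch also assumes that each row begins with the one-byte deletion flag used by the dBASE format, and it returns every value as a stripped string without type conversion.

```python
def parse_row(row, fields, encoding="ascii"):
    """Slice one row into named values according to an agreed field layout."""
    values = {}
    offset = 1  # skip the leading one-byte deletion flag of a dbf row
    for name, width in fields:
        raw = row[offset:offset + width]
        values[name] = raw.decode(encoding).strip()
        offset += width
    return values

# Hypothetical layout: a 10-byte NAME field followed by a 5-byte QTY field.
layout = [("NAME", 10), ("QTY", 5)]
print(parse_row(b" " + b"widget    " + b"   42", layout))
```

In a real dbf file the layout would come from the field descriptors in the header rather than being hard-coded.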
105. Until all field data of the target dbf file has been obtained, map the next block of field data of the target dbf file into the specified memory as the new current field block, and return to the step of obtaining the field data of each row in the current field block row by row according to the per-row field data size, where the next block of field data refers to the field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file.
In this embodiment, the field data of the entire target dbf file must be obtained and parsed. Therefore, after step 104 is executed, that is, after the current field block has been obtained and parsed, the server maps the next block of field data of the target dbf file into the specified memory as the new current field block and returns to steps 103 and 104 to process it. Once steps 103 and 104 have finished processing that block, the block after it is determined as the new current field block and is obtained and parsed in turn, and so on until all field data of the target dbf file has been obtained and parsed. At that point the data values of all field data in the target dbf file are available, and the reading of the target dbf file is complete.
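The loop formed by steps 102 to 105 can be sketched as a Python generator that maps one block at a time and yields complete rows. This is an illustration under stated assumptions rather than the patented implementation: the name `iter_rows` is hypothetical, each mapping starts at an offset rounded down to `mmap.ALLOCATIONGRANULARITY` because Python requires aligned mapping offsets, and incomplete trailing bytes are carried into the next block.

```python
import mmap
import os

def iter_rows(path, header_length, row_size, block_size):
    """Yield every complete row, mapping one block of the file at a time."""
    gran = mmap.ALLOCATIONGRANULARITY
    with open(path, "rb") as f:
        file_size = os.fstat(f.fileno()).st_size
        pos, carry = header_length, b""
        while pos < file_size:
            end = min(pos + block_size, file_size)
            aligned = pos - pos % gran  # mapping offsets must be aligned
            with mmap.mmap(f.fileno(), end - aligned,
                           access=mmap.ACCESS_READ, offset=aligned) as m:
                data = carry + m[pos - aligned:end - aligned]
            full = len(data) - len(data) % row_size
            for i in range(0, full, row_size):
                yield data[i:i + row_size]   # one row for the parsing step
            carry = data[full:]              # incomplete tail, if any
            pos = end                        # advance to the next block
```

Each yielded row can be passed to whatever parsing routine implements step 104; any bytes left in `carry` at the end (for example an end-of-file marker) are not yielded.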
In the embodiments of the present invention, the header information of the target dbf file is first read and the per-row field data size is obtained from it, where the per-row field data size refers to the size of each row of field data in the target dbf file. Next, the first block of field data of the target dbf file is mapped into a specified memory as the current field block, where the first block of field data refers to the field data of a preset data size that is located after, and immediately adjacent to, the header information. The field data of each row in the current field block is then obtained row by row according to the per-row field data size, and the obtained field data is parsed to obtain parsed data values. Until all field data of the target dbf file has been obtained, the next block of field data is mapped into the specified memory as the new current field block, and the row-by-row obtaining step is executed again, where the next block of field data refers to the field data of the preset data size that is located after, and immediately adjacent to, the current field block. Because processing data in memory is far faster than processing it on disk, and because file mapping is also more efficient than reading data from disk, the overall processing efficiency is higher than that of existing approaches. This improves the efficiency of reading data from dbf files, shortens the read time, and can meet the demand for fast data reading in the big data era.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers shall not constitute any limitation on the implementation of the embodiments of the present invention.
In one embodiment, a large data file reading apparatus is provided, which corresponds one-to-one to the large data file reading method in the above embodiments. As shown in Fig. 6, the large data file reading apparatus includes a header information reading module 501, a field data mapping module 502, a row-by-row obtaining module 503, a field data parsing module 504, and a loop processing module 505. The functional modules are described in detail as follows:
The header information reading module 501 is configured to read the header information of a target dbf file and obtain the per-row field data size from the header information, where the per-row field data size refers to the size of each row of field data in the target dbf file;
The field data mapping module 502 is configured to map the first block of field data of the target dbf file into a specified memory as the current field block, where the first block of field data refers to the field data of a preset data size that is located after, and immediately adjacent to, the header information in the target dbf file;
The row-by-row obtaining module 503 is configured to obtain the field data of each row in the current field block row by row according to the per-row field data size;
The field data parsing module 504 is configured to parse the obtained field data to obtain parsed data values;
The loop processing module 505 is configured to, until all field data of the target dbf file has been obtained, map the next block of field data of the target dbf file into the specified memory as the new current field block and then trigger the row-by-row obtaining module and the field data parsing module again in turn, where the next block of field data refers to the field data of the preset data size that is located after, and immediately adjacent to, the current field block in the target dbf file.
Further, the field data mapping module may include:
a first mapping unit, configured to map the data of a preset mapping size, starting from the header information of the target dbf file, into the specified memory, where the preset mapping size equals the sum of the preset data size and the size of the header information;
a field block determining unit, configured to determine the data mapped into the specified memory, excluding the header information, as the current field block.
Further, the large data file reading apparatus may also include:
a header information deleting module, configured to delete the header information of the target dbf file to obtain a new target dbf file;
The field data mapping module is then specifically configured to map the field data of the preset data size located at the beginning of the new target dbf file into the specified memory as the current field block.
Further, the preset data size may be set in advance through the following modules:
an available space obtaining module, configured to obtain the currently available space of the specified memory;
a mapping space determining module, configured to determine, according to a preset memory usage ratio and the currently available space, the mapping space of the specified memory that is used for mapping field data;
a first value calculating module, configured to divide the mapping space by the per-row field data size to obtain a first value;
a rounding module, configured to round the first value to an integer and calculate the product of the rounded first value and the per-row field data size to obtain a second value as the preset data size.
Further, the large data file reading apparatus may also include:
a temporary storage module, configured to temporarily store the field data of the last row as carry-over data if the size of the obtained last row of the current field block is smaller than the per-row field data size;
a merging module, configured to merge the carry-over data into the beginning of the current field block after the next block of field data of the target dbf file has been mapped into the specified memory as the new current field block and before the row-by-row obtaining module and the field data parsing module are triggered again in turn.
For specific limitations on the large data file reading apparatus, refer to the limitations on the large data file reading method above; they are not repeated here. Each module in the above large data file reading apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them to perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data involved in the large data file reading method. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements a large data file reading method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the large data file reading method in the above embodiments, for example steps 101 to 105 shown in Fig. 2. Alternatively, when executing the computer program, the processor implements the functions of the modules/units of the large data file reading apparatus in the above embodiments, for example the functions of modules 501 to 505 shown in Fig. 6. To avoid repetition, details are not described here again.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the large data file reading method in the above embodiments, for example steps 101 to 105 shown in Fig. 2. Alternatively, when executed by a processor, the computer program implements the functions of the modules/units of the large data file reading apparatus in the above embodiments, for example the functions of modules 501 to 505 shown in Fig. 6. To avoid repetition, details are not described here again.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be clear to a person skilled in the art that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example. In practical applications, the above functions can be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.