CN110515920A - Massive small file access method and system based on Hadoop - Google Patents

Massive small file access method and system based on Hadoop

Info

Publication number
CN110515920A
Authority
CN
China
Prior art keywords
small files
file
index
small
Hadoop
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910816503.9A
Other languages
Chinese (zh)
Inventor
Sun Weiyuan (孙伟源)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Inspur Data Technology Co Ltd
Original Assignee
Beijing Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Inspur Data Technology Co Ltd
Priority to CN201910816503.9A
Publication of CN110515920A
Legal status: Pending (current)

Abstract

The invention discloses a Hadoop-based method and system for storing and accessing massive small files. The method includes: step 1, judging whether a small file needs to be saved; if so, step 2, classifying the small file according to a predetermined characteristic and putting the small file and its small-file index into a small-file queue; step 3, judging whether the length of the small-file queue reaches a threshold; if so, step 4, merging multiple small files in the small-file queue into a large file, establishing a global index, storing the correspondence in the file index, and then initiating a storage request to the NameNode; step 5, after the NameNode splits the large file into data blocks of a preset size, storing the large file on at least one DataNode and writing the DataNode where each data block resides, together with the state of that DataNode, into the namespace. By classifying small files according to a predetermined characteristic before they are stored, merging them into large files from the small-file queue, and establishing both small-file indexes and a global index, the invention reduces memory consumption and system load and improves access efficiency.

Description

Massive small file access method and system based on Hadoop
Technical field
The present invention relates to the field of big data processing, and more particularly to a Hadoop-based method and system for storing and accessing massive small files.
Background art
At present, Internet applications are ubiquitous, and the resulting mass of data puts enormous pressure on storage and processing. Big data technology is a set of unconventional tools used to process large volumes of structured, unstructured, and semi-structured data and to obtain analysis and prediction results.
Applying the Hadoop framework to massive data by means of big data processing technology not only provides a storage carrier for massive data, but also offers a new approach to processing data efficiently. Hadoop provides a distributed file storage system, HDFS. HDFS can be used to store massive data that is mostly accessed sequentially, and it provides a mechanism for quickly accessing specific data.
However, HDFS was designed to handle large files, and problems arise when it processes small files such as pictures and documents. A small file generally refers to a file smaller than 10 MB. If a large number of such small files exist in the system, they greatly consume the memory of the NameNode and thus degrade the performance of the entire HDFS cluster.
At present there is no good solution to the problem of storing and accessing small files on HDFS. HDFS itself provides the SequenceFile solution, which reduces the memory consumption of the NameNode as far as possible by merging small files. A SequenceFile is a flat file consisting of binary-serialized key/value pairs, and each key/value pair is treated as one record. In general, a key-value pair can be constructed from a small file's name and its content, and the set of key-value pairs formed from multiple small files can be packed into a SequenceFile. SequenceFile supports compression and can compress several records together. This approach reduces the memory consumption of the NameNode, but merging files takes a long time, and because the keys are not sorted, finding a single small file requires traversing the entire SequenceFile, which reduces access efficiency.
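For reference, the sketch below shows the SequenceFile packing approach described above, assuming the Hadoop 2.x client API (the class name SequenceFilePacker and the single-directory layout are illustrative assumptions, not part of the patent). It packs every small file in a source directory into one SequenceFile keyed by file name.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Packs every small file under a source directory into one SequenceFile,
 *  using the file name as the key and the raw bytes as the value. */
public class SequenceFilePacker {

    public static void pack(Configuration conf, Path srcDir, Path seqFile) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(seqFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(srcDir)) {
                if (!status.isFile()) {
                    continue;
                }
                // Read the small file fully into memory (small files are < 10 MB by definition).
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                IOUtils.copyBytes(fs.open(status.getPath()), buf, conf, true);
                // key = original file name, value = file content.
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(buf.toByteArray()));
            }
        }
    }
}
```

As noted above, the keys in a SequenceFile are not sorted, so reading one small file back still requires scanning the file; this is the limitation the MapFile-based embodiment described later addresses.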
Summary of the invention
The object of the present invention is to provide a Hadoop-based method and system for storing and accessing massive small files that reduce the memory consumption of the NameNode, improve access efficiency, and reduce system load.
In order to solve the above technical problems, an embodiment of the present invention provides a Hadoop-based massive small file access method, comprising:
Step 1: judging whether a small file needs to be saved;
If so, step 2: after classifying the small file according to a predetermined characteristic, putting the small file and its small-file index into a small-file queue;
Step 3: judging whether the length of the small-file queue reaches a threshold;
If so, step 4: merging multiple small files in the small-file queue into a large file, establishing a global index, storing the correspondence in the file index, and then initiating a storage request to the NameNode;
Step 5: after the NameNode splits the large file into data blocks of a preset size, storing the large file on at least one DataNode, and writing the DataNode where each data block resides, together with the state of that DataNode, into the namespace.
Wherein step 2 includes:
classifying the small files according to their file type or creation time.
Wherein step 4 includes:
merging the multiple small files in the small-file queue into a large file using MapFile, wherein the MapFile consists of an index part and a data part; the data part stores the file data, and the index part serves as the data directory of the file, recording the key of each record and the offset of that record within the file.
Wherein, after step 5, the method further includes:
Step 6: judging whether a small-file read request has been received;
Step 7: pre-reading, from the large file containing the small file corresponding to the small-file read request, the small files related to that small file.
Wherein, after step 7, the method further includes:
Step 8: judging whether the frequency with which the small file is accessed within a predetermined time reaches a threshold;
If so, step 9: storing the small file in a cache.
Wherein, after step 9, the method further includes:
Step 10: judging whether the time interval since the small file in the cache was last accessed reaches a predetermined length T;
If so, deleting the small file from the cache.
In addition, an embodiment of the present invention also provides a Hadoop-based massive small file access system, comprising:
a small-file storage request module, configured to output a preprocessing command after detecting that a small file is to be stored;
a small-file preprocessing module, connected to the small-file storage request module, configured to receive the preprocessing command, classify the small file according to a predetermined characteristic, and then put the small file and its small-file index into a small-file queue; after the length of the small-file queue reaches a threshold, merge the multiple small files in the queue into a large file, establish a global index, store the correspondence in the file index, and then initiate a storage request to the NameNode; and control the NameNode so that, after it splits the large file into data blocks of a preset size, the large file is stored on at least one DataNode and the DataNode where each data block resides, together with the state of that DataNode, is written into the namespace.
Wherein, the system further includes a small-file read request module connected to the small-file preprocessing module and an index prefetch module; the small-file read request module is configured to issue a pre-read request to the index prefetch module after detecting a small-file read request, and the index prefetch module pre-reads, from the large file containing the small file corresponding to the read request, the small files related to that small file.
Wherein, the system further includes a cache module connected to the index prefetch module, and the cache module is configured to store small files whose access frequency within a predetermined time reaches a threshold.
Wherein, the system further includes a cache cleaning module connected to the cache module; after detecting that the time interval since a small file in the cache was last accessed reaches a predetermined length T, the cache cleaning module deletes that small file from the cache.
Compared with the prior art, the Hadoop-based massive small file access method and system provided by the embodiments of the present invention have the following advantages:
Before small files are stored, they are first classified according to a predetermined characteristic, merged into large files from the small-file queue, and given both small-file indexes and a global index, so that a double index is formed. During reading, a lookup goes first through the global index and then to the small-file index, so the query is faster and small files can be located quickly. Because fewer index files are needed, memory consumption and system load are reduced and access efficiency is improved. Since storage is organized by the predetermined characteristic, storage efficiency is higher and reading efficiency improves accordingly.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow diagram of the steps of one specific embodiment of the Hadoop-based massive small file access method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of the steps of another specific embodiment of the Hadoop-based massive small file access method provided by an embodiment of the present invention;
Fig. 3 is a schematic connection diagram of one specific embodiment of the Hadoop-based massive small file access system provided by an embodiment of the present invention;
Fig. 4 is a schematic connection diagram of another specific embodiment of the Hadoop-based massive small file access system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Please refer to FIG. 1 to FIG. 4. Fig. 1 is a flow diagram of the steps of one specific embodiment of the Hadoop-based massive small file access method provided by an embodiment of the present invention; Fig. 2 is a flow diagram of the steps of another specific embodiment of that method; Fig. 3 is a schematic connection diagram of one specific embodiment of the Hadoop-based massive small file access system provided by an embodiment of the present invention; Fig. 4 is a schematic connection diagram of another specific embodiment of that system.
In a specific embodiment, the Hadoop-based massive small file access method comprises:
Step 1: judging whether a small file needs to be saved. Here it is necessary to judge whether a small-file storage request exists before starting the subsequent steps, which saves memory. The small-file storage request can be detected periodically or at random intervals assigned by the system, for example 1-3 s, or the interval can be adjusted according to the recent frequency of small files: if the frequency has been high, large-scale small-file storage is currently in progress, so the detection frequency should be increased and the detection interval reduced; otherwise, the detection interval can be increased.
If so, step 2: after classifying the small file according to a predetermined characteristic, putting the small file and its small-file index into the small-file queue. The purpose of classification here is to facilitate storage and subsequent reading: files with the same characteristic can generally be stored and read as a group, which makes later reads convenient. Otherwise, merely querying the position of a small file would take a long time, increasing both memory consumption and the time needed for reading, and greatly reducing access efficiency.
Step 3: judging whether the length of the small-file queue reaches a threshold. Checking that the queue length reaches a threshold ensures that the large files synthesized later all have a roughly uniform length, so the storage space they occupy is roughly equal, much like packing cargo into boxes of the same size, which greatly improves space utilization. The present invention does not limit the threshold of the small-file queue length; it can be adapted automatically to the size of the large-file storage space. For example, if the large-file storage space allows 100 GB of storage and 100 large files, the length of each large file is at most 1 GB; if the storage space grows to 200 GB while the allowed number remains 100, the length of each large file becomes at most 2 GB. Alternatively, each large file may be capped at 1 GB regardless of the storage space, with only the number of large files adjusted according to the size of the corresponding storage space; the present invention does not limit this.
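To make the queue-and-threshold check of steps 2 and 3 concrete, the following is a minimal sketch (assuming Java 17; the class SmallFileQueue, the byte-based threshold, and the merge callback are illustrative assumptions rather than part of the claimed method). The merge callback stands in for step 4, which is described next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers classified small files and triggers a merge into one large file
 *  once the accumulated size reaches the configured threshold (e.g. 1 GB). */
public class SmallFileQueue {

    /** A small file together with the fields kept in its small-file index entry. */
    public record SmallFile(String name, byte[] content) {}

    private final long thresholdBytes;                // target large-file length
    private final Consumer<List<SmallFile>> merger;   // hand-off to step 4: builds the large file
    private final List<SmallFile> queue = new ArrayList<>();
    private long queuedBytes = 0;

    public SmallFileQueue(long thresholdBytes, Consumer<List<SmallFile>> merger) {
        this.thresholdBytes = thresholdBytes;
        this.merger = merger;
    }

    /** Steps 2-3: enqueue a classified small file; once the queue length reaches
     *  the threshold, hand the whole batch to the merger and reset the queue. */
    public synchronized void add(SmallFile file) {
        queue.add(file);
        queuedBytes += file.content().length;
        if (queuedBytes >= thresholdBytes) {
            merger.accept(new ArrayList<>(queue));
            queue.clear();
            queuedBytes = 0;
        }
    }
}
```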
If so, step 4: merging multiple small files in the small-file queue into a large file, establishing a global index, storing the correspondence in the file index, and then initiating a storage request to the NameNode. The global index established here, together with the small-file indexes formed earlier in the small-file queue, forms a double index, so that subsequent reads can use the double-index structure to locate a small file more quickly, speeding up both the reading and the storing of small files.
Step 5: after the NameNode splits the large file into data blocks of a preset size, storing the large file on at least one DataNode, and writing the DataNode where each data block resides, together with the state of that DataNode, into the namespace.
By classifying small files according to a predetermined characteristic before they are stored, merging them into large files from the small-file queue, and establishing both small-file indexes and a global index, a double index is formed. During reading, a lookup goes first through the global index and then to the small-file index, so the query is faster and a small file can be located quickly. Because fewer index files are needed, memory consumption and system load are reduced and access efficiency is improved. Since storage is organized by the predetermined characteristic, storage efficiency is higher and reading efficiency improves accordingly.
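A minimal sketch of the first level of this double index is given below (the GlobalIndex class is an illustrative assumption; the second level is the per-large-file index, for example the index part of the MapFile described in the following embodiment).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** First level of the double index: small-file name -> the large file that
 *  contains it. The second level (the per-large-file index, e.g. the index
 *  part of a MapFile) then maps the name to an offset inside that large file. */
public class GlobalIndex {

    private final Map<String, String> smallFileToLargeFile = new ConcurrentHashMap<>();

    /** Called in step 4 when a small file is merged into a large file. */
    public void register(String smallFileName, String largeFilePath) {
        smallFileToLargeFile.put(smallFileName, largeFilePath);
    }

    /** Returns the large-file path, or null if the small file is unknown;
     *  the caller then opens that large file and queries its local index. */
    public String locate(String smallFileName) {
        return smallFileToLargeFile.get(smallFileName);
    }
}
```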
In the present invention, certain preprocessing is needed before small files are stored: they are first classified and then merged into large files. The present invention does not limit the classification manner or the classification requirements. In one embodiment of the invention, step 2 includes:
classifying the small files according to their file type or creation time.
It should be pointed out that a single classification criterion is generally used in the present invention, for example only the file type or only the creation time; a mixed classification might make a small file belong to both categories at once, which is hard to partition. Of course, other manners may also be used, and the present invention does not limit this.
In the present invention, after classification the small files need to be merged into large files, which is similar to standardizing small-file storage. Once the small files are merged into a large file, the reading process first reads by large file and then reads the small file inside the large file. The present invention does not limit the manner in which small files are merged. In one embodiment, step 4 includes:
merging the multiple small files in the small-file queue into a large file using MapFile, wherein the MapFile consists of an index part and a data part; the data part stores the file data, and the index part serves as the data directory of the file, recording the key of each record and the offset of that record within the file.
When a MapFile is accessed, its index file is loaded into memory, and the location of a specified record can be found quickly through the index mapping, which greatly improves retrieval efficiency and thus access efficiency.
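The sketch below shows how such a merge and lookup might be written against the Hadoop MapFile API (assuming Hadoop 2.x org.apache.hadoop.io.MapFile; the class MapFileMerger and the in-memory map of small files are illustrative assumptions).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

/** Merges a batch of small files into one MapFile (data part + index part)
 *  and reads a single small file back through the MapFile index. */
public class MapFileMerger {

    /** Step 4: write the batch as one MapFile. MapFile requires keys to be
     *  appended in sorted order, so the small files are sorted by name first. */
    public static void merge(Configuration conf, Path mapFileDir,
                             Map<String, byte[]> smallFiles) throws IOException {
        TreeMap<String, byte[]> sorted = new TreeMap<>(smallFiles);
        try (MapFile.Writer writer = new MapFile.Writer(conf, mapFileDir,
                MapFile.Writer.keyClass(Text.class),
                MapFile.Writer.valueClass(BytesWritable.class))) {
            for (Map.Entry<String, byte[]> e : sorted.entrySet()) {
                writer.append(new Text(e.getKey()), new BytesWritable(e.getValue()));
            }
        }
    }

    /** Read one small file: the MapFile's in-memory index locates the record's
     *  offset, then the data part is read at that position. */
    public static byte[] read(Configuration conf, Path mapFileDir, String name) throws IOException {
        try (MapFile.Reader reader = new MapFile.Reader(mapFileDir, conf)) {
            BytesWritable value = new BytesWritable();
            if (reader.get(new Text(name), value) != null) {
                return value.copyBytes();
            }
            return null;  // small file not present in this MapFile
        }
    }
}
```

Because the MapFile index is sorted and kept in memory, the read path seeks directly to the region holding the requested record instead of scanning the whole merged file, which is the retrieval advantage described above.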
The present invention preprocesses small files so that, by changing the storage manner during storage, they become convenient to read. During reading, a better read mechanism can also improve access efficiency. In one embodiment of the invention, after step 5 the Hadoop-based massive small file access method further includes:
Step 6: judging whether a small-file read request has been received;
Step 7: pre-reading, from the large file containing the small file corresponding to the small-file read request, the small files related to that small file.
After a small-file read request is received, the requested small file and the files related to it are pre-read, which saves the steps and the time of interacting with the NameNode. Under this prefetch mechanism, the number of accesses to the NameNode node is greatly reduced, and the operational efficiency of the NameNode is clearly improved.
In one embodiment, when HDFS attempts to read a small file in a MapFile, the metadata of the other related small files in the same MapFile is prefetched from the NameNode node. Because the small files in one MapFile are correlated, a user who reads one file often accesses the related files. When the metadata of the related small files is stored in the HDFS client cache, the client saves the steps and the time of interacting with the NameNode, so the number of accesses to the NameNode node is greatly reduced, the operational efficiency of the NameNode is clearly improved, the memory consumed by the NameNode is reduced, and the system load is reduced.
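A minimal client-side sketch of this metadata prefetch is given below (assuming Java 17; the MetadataPrefetcher class, the FileMetadata record, and the fetch callback are illustrative assumptions standing in for the client's round trips to the NameNode).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Client-side cache that, on the first read of a small file, prefetches the
 *  metadata of all related small files stored in the same MapFile. */
public class MetadataPrefetcher {

    /** Minimal stand-in for the per-file metadata returned by the NameNode. */
    public record FileMetadata(String mapFilePath, long offset, long length) {}

    private final Map<String, FileMetadata> clientCache = new ConcurrentHashMap<>();
    // Given a MapFile path, fetches the metadata of every small file it contains
    // (one round trip to the NameNode instead of one per small file).
    private final Function<String, Map<String, FileMetadata>> fetchFromNameNode;

    public MetadataPrefetcher(Function<String, Map<String, FileMetadata>> fetchFromNameNode) {
        this.fetchFromNameNode = fetchFromNameNode;
    }

    public FileMetadata metadataFor(String smallFileName, String mapFilePath) {
        FileMetadata cached = clientCache.get(smallFileName);
        if (cached != null) {
            return cached;                     // hit: no NameNode interaction needed
        }
        // Miss: prefetch the metadata for the whole MapFile in one request.
        clientCache.putAll(fetchFromNameNode.apply(mapFilePath));
        return clientCache.get(smallFileName);
    }
}
```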
In order to further improve reading efficiency, in one embodiment of the present invention, after step 7 the method further includes:
Step 8: judging whether the frequency with which the small file is accessed within a predetermined time reaches a threshold;
If so, step 9: storing the small file in a cache.
Some files are queried repeatedly, and the access frequency of each file differs. To improve reading speed, an access record is written down after a user reads a file, so that access counts can be kept. Files with a high access frequency are placed on a cache server that acts as a cache; when a user reads the same file again it only needs to be read from the cache server, so the time consumed in reading these files is greatly reduced and access efficiency is improved.
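The following is a minimal sketch of the access counting and promotion in steps 8 and 9 (the HotFileCache class and its in-memory maps are illustrative assumptions; in the embodiment the content would live on a cache server, and the counting window corresponds to the predetermined time).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Counts accesses per small file and promotes a file to the cache once its
 *  access count within the counting window reaches the threshold (steps 8-9). */
public class HotFileCache {

    private final int promotionThreshold;
    private final Map<String, AtomicInteger> accessCounts = new ConcurrentHashMap<>();
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    public HotFileCache(int promotionThreshold) {
        this.promotionThreshold = promotionThreshold;
    }

    /** Record one access; if the file just became "hot", store its content in the cache. */
    public void recordAccess(String name, byte[] content) {
        int count = accessCounts.computeIfAbsent(name, k -> new AtomicInteger()).incrementAndGet();
        if (count >= promotionThreshold) {
            cache.putIfAbsent(name, content);
        }
    }

    /** Returns the cached content, or null if the file is not hot enough to be cached. */
    public byte[] get(String name) {
        return cache.get(name);
    }

    /** Called at the end of each counting window (the "predetermined time"). */
    public void resetWindow() {
        accessCounts.clear();
    }
}
```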
However, user access behavior often changes. If too many files are stored in the cache, including files that are no longer used frequently, the cache becomes bloated; because cache space is limited, the number of small files that can be stored is limited, and this high-quality storage resource cannot be used to full effect. To solve this technical problem, in one embodiment of the present invention, after step 9 the method further includes:
Step 10: judging whether the time interval since the small file in the cache was last accessed reaches a predetermined length T;
If so, step 11: deleting the small file from the cache.
By judging the interval since a small file was last used: if the interval exceeds the threshold T, the probability that the file will be used again has declined, its value has fallen below the lower limit worth caching, and keeping it would lower the utilization of the cache. Deleting it allows the cache to hold files with a higher probability of being accessed, which helps greatly in improving file reading speed. By using the double-index mechanism together with the cache mechanism, the present invention improves the hit probability of file accesses on both the client side and the server side, enhancing the robustness of the system.
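A minimal sketch of the time-based eviction in steps 10 and 11 follows (the CacheCleaner class and its sweep scheduling are illustrative assumptions; maxIdleMillis plays the role of the predetermined length T, and sweep() would be driven by the timer described in the system embodiment below).

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Evicts cached small files whose time since last access exceeds T (steps 10-11). */
public class CacheCleaner {

    private final long maxIdleMillis;                        // the predetermined length T
    private final Map<String, Long> lastAccess = new ConcurrentHashMap<>();
    private final Map<String, byte[]> cache;

    public CacheCleaner(Map<String, byte[]> cache, long maxIdleMillis) {
        this.cache = cache;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Called on every cache hit to refresh the file's last-access timestamp. */
    public void touch(String name) {
        lastAccess.put(name, System.currentTimeMillis());
    }

    /** Periodically invoked (e.g. by a timer thread on the cache server):
     *  removes every file that has been idle for longer than T. */
    public void sweep() {
        long now = System.currentTimeMillis();
        for (Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() > maxIdleMillis) {
                cache.remove(e.getKey());
                it.remove();
            }
        }
    }
}
```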
In addition, an embodiment of the present invention also provides a Hadoop-based massive small file access system, comprising:
a small-file storage request module 10, configured to output a preprocessing command after detecting that a small file is to be stored;
a small-file preprocessing module 20, connected to the small-file storage request module 10, configured to receive the preprocessing command, classify the small file according to a predetermined characteristic, and then put the small file and its small-file index into a small-file queue; after the length of the small-file queue reaches a threshold, merge the multiple small files in the queue into a large file, establish a global index, store the correspondence in the file index, and then initiate a storage request to the NameNode; and control the NameNode so that, after it splits the large file into data blocks of a preset size, the large file is stored on at least one DataNode and the DataNode where each data block resides, together with the state of that DataNode, is written into the namespace.
Since the Hadoop-based massive small file access system is a system based on the above Hadoop-based massive small file access method, it has the same beneficial effects, and the present invention does not limit this.
In order to further improve file reading efficiency, in one embodiment of the present invention the Hadoop-based massive small file access system further includes a small-file read request module 30 connected to the small-file preprocessing module 20 and an index prefetch module 40. The small-file read request module 30 is configured to issue a pre-read request to the index prefetch module 40 after detecting a small-file read request, and the index prefetch module 40 pre-reads, from the large file containing the small file corresponding to the read request, the small files related to that small file.
Through this pre-read manner, the index prefetch module is placed between the HDFS client and the NameNode. When HDFS attempts to read a small file in a large file (for example a MapFile produced by merging with the MapFile technique), the metadata of the other related small files in the same large file is prefetched from the NameNode node. Because the small files in one MapFile are correlated, a user who reads one file often accesses the related files; when the metadata of the related small files is stored in the HDFS client cache, the client saves the steps and the time of interacting with the NameNode. Under this prefetch mechanism, the number of accesses to the NameNode node is greatly reduced, clearly improving the operational efficiency of the NameNode.
In order to further improve file reading efficiency, in one embodiment of the present invention the Hadoop-based massive small file access system further includes a cache module 50 connected to the index prefetch module 40, and the cache module 50 is configured to store small files whose access frequency within a predetermined time reaches a threshold.
Some files are queried repeatedly, and the access frequency of each file differs. To improve reading speed, an access record is written down after a user reads a file, so that access counts can be kept. Files with a high access frequency are placed on a cache server that acts as a cache; when a user reads the same file again it only needs to be read from the cache server, so the time consumed in reading these files is greatly reduced and access efficiency is improved.
Thus, in the present invention, by adding the cache module, files that are read and used more frequently are placed in the cache, and the naturally efficient read characteristics of the cache are used to improve the reading efficiency of files.
However, user access behavior often changes, and a high access frequency holds only over a certain period of time. If the cache is clogged with or filled by files that are no longer used frequently, then, since its space is very limited, fewer useful files can be stored and its utilization decreases. To solve this technical problem, in one embodiment of the present invention the Hadoop-based massive small file access system further includes a cache cleaning module connected to the cache module 50; after detecting that the time interval since a small file in the cache was last accessed reaches a predetermined length T, the cache cleaning module deletes that small file from the cache.
By setting up a timer on the cache server to record the interval between the last access of a file and the present, the system can automatically delete files whose interval exceeds the predetermined length T. This keeps the files in the cache regularly updated, so that the cache's usage frequency and usage efficiency are always maintained at a high level, improving usage efficiency.
In conclusion the mass small documents access method and system provided in an embodiment of the present invention based on Hadoop, passes throughBefore small documents storage, is first sorted out according to predetermined characteristic, big file is synthesized in small documents queue, establishes small documents ropeAnd vertical global index, so that double indexes are formed by small documents rope and vertical global index, so that small documents in reading processReading process in, first indexed again to small documents from global index, inquiry velocity has more block, realizes the quick positioning of small documents,Simultaneously because the index file needed is less, memory consumption, system load are reduced, improves access efficiency, while storing according to pre-Determine characteristic storage, storage efficiency is higher, can also improve reading efficiency accordingly.
The Hadoop-based massive small file access method and system provided by the present invention are described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the above description of the embodiments is only intended to help understand the method of the invention and its core idea. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can be made to the present invention without departing from the principle of the invention, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (10)

A small-file preprocessing module, connected to the small-file storage request module, configured to receive the preprocessing command, classify the small files according to a predetermined characteristic, and then put the small files and their small-file indexes into a small-file queue; after the length of the small-file queue reaches a threshold, merge the multiple small files in the queue into a large file, establish a global index, store the correspondence in the file index, and then initiate a storage request to the NameNode; and control the NameNode so that, after it splits the large file into data blocks of a preset size, the large file is stored on at least one DataNode and the DataNode where each data block resides, together with the state of that DataNode, is written into the namespace.
CN201910816503.9A | 2019-08-30 | 2019-08-30 | Massive small file access method and system based on Hadoop | Pending | CN110515920A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910816503.9A | 2019-08-30 | 2019-08-30 | Massive small file access method and system based on Hadoop

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910816503.9A | 2019-08-30 | 2019-08-30 | Massive small file access method and system based on Hadoop

Publications (1)

Publication Number | Publication Date
CN110515920A (en) | 2019-11-29

Family

Family ID: 68629643

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910816503.9A (pending, published as CN110515920A (en)) | Massive small file access method and system based on Hadoop | 2019-08-30 | 2019-08-30

Country Status (1)

Country | Link
CN (1) | CN110515920A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102332029A (en) * | 2011-10-15 | 2012-01-25 | 西安交通大学 | A method for associative storage of massive classifiable small files based on Hadoop
CN102332027A (en) * | 2011-10-15 | 2012-01-25 | 西安交通大学 | A method for associative storage of massive non-independent small files based on Hadoop
CN102902716A (en) * | 2012-08-27 | 2013-01-30 | 苏州两江科技有限公司 | Storage system based on Hadoop distributed computing platform
CN103559229A (en) * | 2013-10-22 | 2014-02-05 | 西安电子科技大学 | Small file management service (SFMS) system based on MapFile and use method thereof
CN103856567A (en) * | 2014-03-26 | 2014-06-11 | 西安电子科技大学 | Small file storage method based on Hadoop distributed file system
CN105183839A (en) * | 2015-09-02 | 2015-12-23 | 华中科技大学 | Hadoop-based storage optimizing method for small file hierachical indexing
CN105956183A (en) * | 2016-05-30 | 2016-09-21 | 广东电网有限责任公司电力调度控制中心 | Method and system for multi-stage optimization storage of a lot of small files in distributed database
CN107045531A (en) * | 2017-01-20 | 2017-08-15 | 郑州云海信息技术有限公司 | A kind of system and method for optimization HDFS small documents access
CN106909651A (en) * | 2017-02-23 | 2017-06-30 | 郑州云海信息技术有限公司 | A kind of method for being write based on HDFS small documents and being read
CN109800208 (en) * | 2019-01-18 | 2019-05-24 | 湖南友道信息技术有限公司 | Network traceability system and its data processing method, computer storage medium

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110968272B (en) * | 2019-12-16 | 2021-01-01 | 华中科技大学 | Time sequence prediction-based method and system for optimizing storage performance of mass small files
CN110968272A (en) * | 2019-12-16 | 2020-04-07 | 华中科技大学 | Time sequence prediction-based method and system for optimizing storage performance of mass small files
CN113127548A (en) * | 2019-12-31 | 2021-07-16 | 奇安信科技集团股份有限公司 | File merging method, device, equipment and storage medium
CN113127548B (en) * | 2019-12-31 | 2023-10-31 | 奇安信科技集团股份有限公司 | File merging method, device, equipment and storage medium
CN113407620B (en) * | 2020-03-17 | 2023-04-21 | 北京信息科技大学 | Data block placement method and system based on heterogeneous Hadoop cluster environment
CN113407620A (en) * | 2020-03-17 | 2021-09-17 | 北京信息科技大学 | Data block placement method and system based on heterogeneous Hadoop cluster environment
CN111475469A (en) * | 2020-03-19 | 2020-07-31 | 中山大学 | Virtual file system-based small file storage optimization system in KUBERNETES user mode application
CN111475469B (en) * | 2020-03-19 | 2021-12-14 | 中山大学 | Small file storage optimization system based on virtual file system in KUBERNETES user mode application
CN113590566A (en) * | 2021-06-23 | 2021-11-02 | 河海大学 | Stack structure-based SequenceFile storage optimization method, device, equipment and storage medium
CN113590566B (en) * | 2021-06-23 | 2023-10-27 | 河海大学 | SequenceFile storage optimization method, device, equipment and storage medium based on heap structure
CN113901007A (en) * | 2021-10-20 | 2022-01-07 | 杭州电子科技大学 | Distributed caching method for massive small files for AI training
CN114116612A (en) * | 2021-11-15 | 2022-03-01 | 长沙理工大学 | B+ tree index-based access method for archived files
CN114116612B (en) * | 2021-11-15 | 2024-06-07 | 长沙理工大学 | Access method for index archive file based on B+ tree
CN114546962A (en) * | 2022-02-17 | 2022-05-27 | 桂林理工大学 | Hadoop-based distributed storage system for marine bureau ship inspection big data
CN115269524B (en) * | 2022-09-26 | 2023-03-24 | 创云融达信息技术(天津)股份有限公司 | Integrated system and method for end-to-end small file collection transmission and storage
CN115269524A (en) * | 2022-09-26 | 2022-11-01 | 创云融达信息技术(天津)股份有限公司 | Integrated system and method for end-to-end small file collection transmission and storage
CN115858249A (en) * | 2022-12-30 | 2023-03-28 | 北京迪艾尔软件技术有限公司 | Backup method for massive unstructured data files
CN115858249B (en) * | 2022-12-30 | 2024-07-09 | 北京迪艾尔软件技术有限公司 | Backup method for massive unstructured data files

Similar Documents

Publication | Title
CN110515920A (en) | Massive small file access method and system based on Hadoop
US8352517B2 (en) | Infrastructure for spilling pages to a persistent store
US8145859B2 (en) | Method and system for spilling from a queue to a persistent store
US10296462B2 (en) | Method to accelerate queries using dynamically generated alternate data formats in flash cache
CN105956183B (en) | The multilevel optimization's storage method and system of mass small documents in a kind of distributed data base
CN100452041C (en) | Method and system for reading information at network resource site, and searching engine
US9712646B2 (en) | Automated client/server operation partitioning
CN108804566B (en) | A kind of mass small documents read method based on Hadoop
CN104252536B (en) | A kind of internet log data query method and device based on hbase
EP2608070A1 (en) | Hybrid database table stored as both row and column store
CN108932287B (en) | A kind of mass small documents wiring method based on Hadoop
EP2608071A1 (en) | Hybrid database table stored as both row and column store
CN104679898A (en) | Big data access method
US9852180B2 (en) | Systems and methods of accessing distributed data
CN103810237A (en) | Data management method and system
CN109492148A (en) | ElasticSearch paging query method and apparatus based on Redis
CN109815234A (en) | A kind of multiple cuckoo filter under streaming computing model
CN106033428B (en) | Uniform resource locator selection method and uniform resource locator selection device
CN108241725B (en) | A kind of data hot statistics system and method
CN109842621A (en) | A kind of method and terminal reducing token storage quantity
CN116126546B (en) | Performance optimization method and device, electronic equipment and medium
CN111752941B (en) | Data storage and access method and device, server and storage medium
CN114168084B (en) | File merging method, file merging device, electronic equipment and storage medium
US10303687B2 (en) | Concurrent processing of data sources
CN119166048A (en) | Cross-file data access method, system and embedded device based on data aggregation

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication (application publication date: 2019-11-29)
