Disclosure of Invention
The invention aims to provide an SQLite data recovery method and system that improve the applicability and quality of SQLite data recovery.
In a first aspect, the present invention provides an SQLite data recovery method, including the following steps:
step S10, identifying data free blocks from an SQLite database;
step S20, obtaining the offset of each data free block, and constructing a free block list based on the offsets;
step S30, performing data recovery analysis on each data free block based on the free block list, and recovering the data of the data area corresponding to the data free block based on the analysis result.
Further, step S10 specifically includes:
obtaining the number of columns of a data table stored in the SQLite database and the data type of each column, traversing the data table page by page, and identifying the data free blocks.
Further, identifying the data free blocks specifically includes:
identifying, based on a free block identifier, the data free blocks in the data table whose data are not all zeros.
Further, step S20 specifically includes:
dividing the data table stored in the SQLite database into pages based on a page identifier to obtain the offset of each data free block;
wherein the offset of the first data free block is stored in the second and third bytes of each page of the data table, the offset of the next data free block is stored in the first two bytes of each data free block, and the first two bytes of the last data free block are all zeros, whereby the free block list is constructed.
Further, step S30 specifically includes:
step S31, locating each data free block based on the offsets carried in the free block list, obtaining the free block length from the third and fourth bytes of the data free block, and calculating the data area length based on the data types carried in the data free block and the length of the data area header;
step S32, judging whether the free block length is equal to the data area length; if so, proceeding to step S34; if not, proceeding to step S33;
step S33, judging whether the free block length is greater than the data area length; if so, the data free block contains a plurality of data areas, and the data free block is split before returning to step S32; if not, the data area has been overwritten, the damaged data cannot be recovered, and the process ends;
step S34, assigning values to and reconstructing the data area based on the data types to complete the data recovery.
In a second aspect, the present invention provides an SQLite data recovery system, including the following modules:
a data free block identification module, configured to identify data free blocks from the SQLite database;
a free block list building module, configured to obtain the offset of each data free block, and build a free block list based on the offsets;
a data recovery module, configured to perform data recovery analysis on each data free block based on the free block list, and recover the data of the data area corresponding to the data free block based on the analysis result.
Further, the data free block identification module is specifically configured to:
obtain the number of columns of a data table stored in the SQLite database and the data type of each column, traverse the data table page by page, and identify the data free blocks.
Further, identifying the data free blocks specifically includes:
identifying, based on a free block identifier, the data free blocks in the data table whose data are not all zeros.
Further, the free block list building module is specifically configured to:
divide the data table stored in the SQLite database into pages based on a page identifier to obtain the offset of each data free block;
wherein the offset of the first data free block is stored in the second and third bytes of each page of the data table, the offset of the next data free block is stored in the first two bytes of each data free block, and the first two bytes of the last data free block are all zeros, whereby the free block list is constructed.
Further, the data recovery module specifically includes:
a free block and data area length calculation unit, configured to locate each data free block based on the offsets carried in the free block list, obtain the free block length from the third and fourth bytes of the data free block, and calculate the data area length based on the data types carried in the data free block and the length of the data area header;
a first length checking unit, configured to judge whether the free block length is equal to the data area length; if so, control passes to the assignment reconstruction unit; if not, control passes to the second length checking unit;
a second length checking unit, configured to judge whether the free block length is greater than the data area length; if so, the data free block contains a plurality of data areas, and the data free block is split before control returns to the first length checking unit; if not, the data area has been overwritten, the damaged data cannot be recovered, and the process ends;
an assignment reconstruction unit, configured to assign values to and reconstruct the data area based on the data types to complete the data recovery.
The invention has the following advantages:
the method is simple to implement and does not rely on transaction files or log files, so it is applicable to data that are not involved in transaction operations and to older versions of SQLite databases, which greatly improves the applicability and quality of SQLite data recovery.
Detailed Description
The general idea of the technical scheme in the embodiments of the application is as follows. The data of a data free block in its initial state are all zeros, so a deleted data area can be located by identifying the data free blocks that are not all zeros. A deleted data area still retains its data as long as the data have not been overwritten. Because overwriting may have occurred, the data area length is obtained by accumulating the data lengths corresponding to the data types, and this data area length is compared with the free block length to judge whether the data can be recovered and whether the data free block needs to be split. The data area is then assigned and reconstructed based on the data types to recover the data. Recovery does not rely on transaction files or log files, which improves the applicability and quality of SQLite data recovery.
Referring to FIG. 1 and FIG. 2, a preferred embodiment of the SQLite data recovery method according to the present invention includes the following steps:
step S10, identifying data free blocks from an SQLite database;
step S20, obtaining the offset of each data free block, and constructing a free block list based on the offsets; the data free blocks of each page of the data table are located through the free block list;
step S30, performing data recovery analysis on each data free block based on the free block list, and recovering the data of the data area corresponding to the data free block based on the analysis result.
Step S10 specifically includes:
obtaining the number of columns of a data table stored in the SQLite database and the data type of each column, traversing the data table page by page in sequence, and identifying the data free blocks.
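The page-by-page traversal can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `iter_table_leaf_pages` is hypothetical, and the sketch assumes the standard SQLite file layout, in which the page size is a big-endian 16-bit value at byte offset 16 of the file and the first page carries a 100-byte file header before its page header.

```python
import struct

def iter_table_leaf_pages(db_bytes: bytes):
    """Yield (page_number, page_bytes, header_offset) for each table leaf page."""
    # Page size: big-endian 16-bit value at offset 16; the value 1 means 65536.
    (page_size,) = struct.unpack_from(">H", db_bytes, 16)
    if page_size == 1:
        page_size = 65536
    for start in range(0, len(db_bytes) - page_size + 1, page_size):
        page_no = start // page_size + 1
        # Assumption: page 1 begins with the 100-byte file header.
        header_offset = 100 if page_no == 1 else 0
        page = db_bytes[start:start + page_size]
        if page[header_offset] == 0x0D:  # page identifier of a table leaf page
            yield page_no, page, header_offset
```

Reading the whole file into memory keeps the sketch short; a real tool would more likely read one page at a time.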
Identifying the data free blocks specifically includes:
identifying, based on a free block identifier, the data free blocks in the data table whose data are not all zeros; the data of a data free block in its initial state are all zeros, so a data free block that is not all zeros is a deleted data area.
A deleted data area may contain one complete data row, multiple complete data rows, or only partial data remaining after overwriting. When it contains one complete data row, the data can be recovered directly; when it contains multiple complete data rows, it must be split first; when only residual data remain, the data cannot be recovered.
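The non-all-zero test can be illustrated with a short sketch. The helper name `has_residual_data` is hypothetical, and skipping the first four bytes is an assumption of this sketch: those bytes are rewritten on deletion with the next-block offset and the free block length, so they would be non-zero regardless of whether any row data remain.

```python
FREE_BLOCK_HEADER_SIZE = 4  # 2-byte next offset + 2-byte free block length

def has_residual_data(free_block: bytes) -> bool:
    """Return True if any byte after the 4-byte free block header is non-zero."""
    # Assumption: only bytes past the rewritten 4-byte header indicate leftover row data.
    return any(free_block[FREE_BLOCK_HEADER_SIZE:])
```

A block for which this returns False would be treated as still in its initial state and skipped.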
Step S20 specifically includes:
dividing the data table stored in the SQLite database into pages based on a page identifier to obtain the offset of each data free block; the page identifier is located in the first byte of each page of the data table, for example, a first byte of 0x0D serves as the page identifier;
the offset of the first data free block is stored in the second and third bytes of each page of the data table, the offset of the next data free block is stored in the first two bytes of each data free block, and the first two bytes of the last data free block are all zeros, whereby the free block list is constructed. The first byte of each page of the data table is the page identifier, and the second and third bytes hold the offset of the first data free block; this is the page header format of data tables stored in an SQLite database.
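Following the chain of offsets described above might look like the sketch below. The function name `walk_free_blocks` and the `header_offset` parameter are hypothetical; the parameter allows for a page whose page header does not start at byte 0. The loop guard against repeated or out-of-range offsets is an addition of this sketch for damaged pages, not something stated in the embodiment.

```python
import struct

def walk_free_blocks(page: bytes, header_offset: int = 0):
    """Return [(offset, length), ...] for the free blocks chained on one page."""
    if page[header_offset] != 0x0D:
        raise ValueError("not a table leaf page")
    # Second and third bytes of the page header: offset of the first free block.
    (offset,) = struct.unpack_from(">H", page, header_offset + 1)
    blocks = []
    seen = set()
    while offset != 0:
        # Guard (assumption): stop on a cycle or an offset that runs past the page.
        if offset in seen or offset + 4 > len(page):
            break
        seen.add(offset)
        # First two bytes: offset of the next free block; next two bytes: block length.
        next_offset, length = struct.unpack_from(">HH", page, offset)
        blocks.append((offset, length))
        offset = next_offset
    return blocks
```

The returned list plays the role of the free block list: each entry gives the position and length needed by the later length checks.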
Step S30 specifically includes:
step S31, locating each data free block based on the offsets carried in the free block list, obtaining the free block length from the third and fourth bytes of the data free block, and calculating the data area length based on the data types carried in the data free block and the length of the data area header;
step S32, judging whether the free block length is equal to the data area length; if so, proceeding to step S34; if not, proceeding to step S33;
step S33, judging whether the free block length is greater than the data area length; if so, the data free block contains a plurality of data areas, and the data free block is split before returning to step S32; if not, the data area has been overwritten, the damaged data cannot be recovered, and the process ends;
step S34, assigning values to and reconstructing the data area based on the data types to complete the data recovery.
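The control flow of steps S32 and S33 can be sketched as a loop. This is an illustrative reading, not the embodiment's code: the function `split_free_block` and the callback `area_len_at`, which stands in for the data area length calculation of step S31, are hypothetical names. The sketch also assumes that splitting means consuming one data area from the front of the block and re-checking the remainder.

```python
from typing import Callable, List, Tuple

def split_free_block(free_len: int,
                     area_len_at: Callable[[int], int]) -> List[Tuple[int, int]]:
    """Return (start, length) of each data area that can be reconstructed.

    area_len_at(start) is a hypothetical callback returning the computed
    data area length for the area beginning at `start` within the block.
    """
    areas = []
    start, remaining = 0, free_len
    while remaining > 0:
        area_len = area_len_at(start)
        # S33 "no" branch: the area is longer than what is left, so it was overwritten.
        if area_len <= 0 or area_len > remaining:
            break
        # S32 equal: last area, ready for S34. S33 greater: split and check again.
        areas.append((start, area_len))
        start += area_len
        remaining -= area_len
    return areas
```

For example, a 30-byte free block whose areas are each computed as 10 bytes yields three areas, while a 25-byte block yields two areas and leaves 5 unrecoverable bytes.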
The format of the data area is as follows:
| Data area length | Row number | Data type 1 | … | Data type N | Data 1 | … | Data N |
When a row of data is deleted, the SQLite database updates the first four bytes of the data area: the first two bytes are used to track the next data free block, and the last two bytes hold the free block length of the current data free block. Besides the actual data, the data area also contains information such as the data area length and the row number, but this information is overwritten when deletion occurs, so the actual length of the data area is difficult to obtain directly.
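Because the stored length is lost, the data area length has to be recomputed from the data types. The sketch below assumes that each "data type" field is an SQLite serial type code as defined in the SQLite file format, and that the length of the data area header is supplied by the caller; the function names and the `header_len` parameter are hypothetical.

```python
def serial_type_size(serial_type: int) -> int:
    """Return the content size in bytes for an SQLite serial type code."""
    fixed = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 8, 7: 8, 8: 0, 9: 0}
    if serial_type in fixed:
        return fixed[serial_type]
    if serial_type < 0 or serial_type in (10, 11):
        raise ValueError("invalid or reserved serial type: %d" % serial_type)
    if serial_type % 2 == 0:
        return (serial_type - 12) // 2  # BLOB of (N - 12) / 2 bytes
    return (serial_type - 13) // 2      # TEXT of (N - 13) / 2 bytes

def data_area_length(header_len: int, serial_types) -> int:
    """Accumulate the header length and the per-column data lengths."""
    # Assumption: header_len already covers the length, row number, and type fields.
    return header_len + sum(serial_type_size(t) for t in serial_types)
```

For a row with a 1-byte integer, a 5-character text value, and a NULL, `data_area_length(4, [1, 23, 0])` gives 10, which is the value compared against the free block length.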
A preferred embodiment of the SQLite data recovery system according to the present invention includes the following modules:
a data free block identification module, configured to identify data free blocks from the SQLite database;
a free block list building module, configured to obtain the offset of each data free block, and build a free block list based on the offsets; the data free blocks of each page of the data table are located through the free block list;
a data recovery module, configured to perform data recovery analysis on each data free block based on the free block list, and recover the data of the data area corresponding to the data free block based on the analysis result.
The data free block identification module is specifically configured to:
obtain the number of columns of a data table stored in the SQLite database and the data type of each column, traverse the data table page by page in sequence, and identify the data free blocks.
Identifying the data free blocks specifically includes:
identifying, based on a free block identifier, the data free blocks in the data table whose data are not all zeros; the data of a data free block in its initial state are all zeros, so a data free block that is not all zeros is a deleted data area.
A deleted data area may contain one complete data row, multiple complete data rows, or only partial data remaining after overwriting. When it contains one complete data row, the data can be recovered directly; when it contains multiple complete data rows, it must be split first; when only residual data remain, the data cannot be recovered.
The free block list building module is specifically configured to:
divide the data table stored in the SQLite database into pages based on a page identifier to obtain the offset of each data free block; the page identifier is located in the first byte of each page of the data table, for example, a first byte of 0x0D serves as the page identifier;
the offset of the first data free block is stored in the second and third bytes of each page of the data table, the offset of the next data free block is stored in the first two bytes of each data free block, and the first two bytes of the last data free block are all zeros, whereby the free block list is constructed. The first byte of each page of the data table is the page identifier, and the second and third bytes hold the offset of the first data free block; this is the page header format of data tables stored in an SQLite database.
The data recovery module specifically includes:
a free block and data area length calculation unit, configured to locate each data free block based on the offsets carried in the free block list, obtain the free block length from the third and fourth bytes of the data free block, and calculate the data area length based on the data types carried in the data free block and the length of the data area header;
a first length checking unit, configured to judge whether the free block length is equal to the data area length; if so, control passes to the assignment reconstruction unit; if not, control passes to the second length checking unit;
a second length checking unit, configured to judge whether the free block length is greater than the data area length; if so, the data free block contains a plurality of data areas, and the data free block is split before control returns to the first length checking unit; if not, the data area has been overwritten, the damaged data cannot be recovered, and the process ends;
an assignment reconstruction unit, configured to assign values to and reconstruct the data area based on the data types to complete the data recovery.
The format of the data area is as follows:
| Data area length | Row number | Data type 1 | … | Data type N | Data 1 | … | Data N |
When a row of data is deleted, the SQLite database updates the first four bytes of the data area: the first two bytes are used to track the next data free block, and the last two bytes hold the free block length of the current data free block. Besides the actual data, the data area also contains information such as the data area length and the row number, but this information is overwritten when deletion occurs, so the actual length of the data area is difficult to obtain directly.
In conclusion, the invention has the following advantages:
the method is simple to implement and does not rely on transaction files or log files, so it is applicable to data that are not involved in transaction operations and to older versions of SQLite databases, which greatly improves the applicability and quality of SQLite data recovery.
Although specific embodiments of the invention have been described above, those skilled in the art will understand that these embodiments are illustrative only and do not limit the scope of the invention. Those skilled in the art can make equivalent modifications and variations without departing from the spirit of the invention, whose scope is limited only by the appended claims.