A similar algorithm has been already published in Russian onmy site.
Introduction
Sooner or later, any programmer runs into the necessity of the obvious use of one or the other method of sorting. Not looking at that, this theme is well investigated and there are many wonderful sorting algorithms, there are some nuances at their practical use. At first, it is a problem of choice. Indeed, how to choose the best, especially as clear recommendations are not present. As the saying goes, every algorithm is good in its own fashion. Secondly, there is always a problem of realization as it applies to a practical task. Possibly, the experience of the author in this area can appear useful to someone; therefore, we will describe our researches and practical results.
Although usually choosing a concrete algorithm of sorting is an arduous duty, it is fully possible to be determined with a class of algorithms, depending on a given task. In my case, this was the necessity of the independent sorting ofstring
data from the set column of tabular database (concretelydbf file) in an external file buffer, i.e., sorting on determination must be external because a source file can be large enough (gigabyte and more) and in addition sorting can be carried out for a few columns simultaneously. A sorting purpose is a creation of index file for one or a few the fields of data. Here, we will not discuss whether it is needed or not needed to do the similar sorting, when there exists a great number of software products, especially oriented to work with databases, and are already doing this well enough. There are cases, when it is not desirable without an absolute necessity to attract external programs for your own programs which are not very difficult. Therefore, we will make an effort to decide on this task in the indicated formulation.
So, we were determined, that we were interested by an algorithm of external sort. As we were not going to modify the source of data, for sorting of one data column, we may have needed two file buffers, their sizes being equal to the amount of data multiplied into the length of field of data. Really, we will yet interest to the row number of everystring
data field for the purpose of indexing. As forWin32, the size of long integer equals32 bits or4 bytes, we need be increase the field length in bytes yet on4 bytes.
It is known that theMerge Sort method is comfortable for the general sorting of a few long ordered subsequences. This method chooses a maximal or minimum value consistently from data queues and writes it down in the sought-for sequence. If basic data are not presented as a fewOrdered SubSequences which we will nameOSS too, suchOSS should be got, that is be split by any way. Let us say, the relatively small sequence of casual data are sorting out in memory by anything comfortable for these aims algorithm, for example, by the quick sort algorithm. It is clear that instead of merger a fewOSS we can limit by any two of them, when one will not be a uniqueOSS.
It is obvious that there are a lot of variants of Split functions and on this theme are published enough information. Thus, an external sort is not limited to only the Merge Sort method; however, we will not use other methods, because we are fully arranged by the base class of algorithms of external sort examined here.
So, we need to be determined with the choice of aSplit
function and amount ofOSS simultaneously in-use for theMerge
function. The second question decides easily, we will simply choose a pair wise merge technique, because it is very comfortable for realization. Remainder will be determined with aSplit
function.
TheSplit
function can assort in memory relatively small data subsequences or in some way to extractOSS from the given sequence. We at once renounce the additional sorting in memory, at first, it complicates a general algorithm, and secondly, this is more substantial, already existent ordered subsequences (OSS) are lying idle. It is logical to name methods of extraction in natural way already ordered, subsequences from the given data sequence asNatural Split, and the method of sorting asNatural Merge Sort. Similar terms indeed exist, so in literature there are the followings types of theNatural Split andMerge:
- The initial data sequence is examined as a great number ofOSS of length 1 byte, which are ordered on definition. TwoOSS of length1 are joined in one two, in the second buffer, double two in a four into the first buffer (or into the source file, if this change is admitted) andso alternately.
- The initial data sequence fissions on successive series, i.e. non-decreasingOSS. At confluence at the first series’ distributing of initialA file is executed alternately into temporary files ofB andC, and then confluence ofB andC files inA file etcetera while there will not beone series.
- The initial data sequence fissions on series simultaneously with beginning and ending with these series at once unite into the first buffer alternately in its beginning (in direct order) and ending (in reverse order). Then this buffer becomes by the data source and result is directed into the second buffer (or into the source file, if its change is admitted). Then the first and second buffers switch places et cetera. The name of this algorithm isNatural Two-Way Sort Merge.
Obvious lacks of these algorithms are that the first case is not taken into account by the already being order of data, and the second and third does not take into account decreasingOSS. In addition, three buffers are actually used in second case, if we set a condition do not change source of data, while it is easily possible to treat two, as in the first and third cases.
I have not found in literature the fourth variant of the «natural splitting and merging» of initial data described below, although it does not mean, that it is not yet known. In this case, it is easy to «reopen» this algorithm, than to find it. - The initial sequence of file data is split inOSS of random order (both non-decreasing and decreasing). Then all ofOSS in pairs (for various pair, including accrued) is merged into the proper file buffer, into proper place.
To do the similar splitting only one time and not take under the data structures ofSplit function, the array of data proportional the size of initial file; we will at once use derivable information. In this case, we may need the additional array of structures of the fixed size. For Win32, the maximal dimension of array will be equal to32. Non-decreasingOSS merge from beginning, and decreasing from the end into any free file buffer. If oneOSS is into the first buffer, and second into the second, before their merging theOSS from the first buffer simply are copied into the second, and then twoOSS of the second buffer meet into the first buffer in the proper positions. Obviously, only two buffers are enough, and if we do not want to do the operation of copying, three. However, we prefer the transfer of data block to the third buffer, for the sake of economy of disk space.
Realization of this algorithm is enough difficult and depends on substantial appearance from binary decomposition of number of ordered subsequences. An algorithm disintegrates in two parts.The first part does even (various pair wise) associations andthe second one merges odd («hangings up»)OSS. Information about oddOSS are kept in the array of structures mentioned above by a dimension32. We are interested by those elements of the array whose indexes coincide with the degrees of binary decomposition of this number ofOSS. Naturally, if amount of all initialOSS is the degree of two, we need not do the second part of algorithm. For the detailed illustration of the algorithm, we demonstrate many pictures.
Advantage of the indicated algorithm is that it uses the present partial order of data on a maximum. Moreover, even in the case of the absolute evenly distributed casual data sequence, middle lengthL0 of theirOSS are more than2, because any two comparable values always form anOSS. In the special case of «zigzag» data sequence of arbitrary even length, middle lengthL0 of theOSS is equal in exactness to two. For example, for any even sequence of zeros and ones:
{0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1, . . .} : L0 = Lmin = 2.
Actually, this sequence possesses the minimum order for the aims of our algorithm. However, our advantage is one passageway forSplit
function, fixed size of data for its parameters and presence only of two file buffers (taking into account invariability of initial data).
However, let us go back to the case of the evenly distributed casual data sequence. What middle lengthL0 (determined as relation of total length of sequence per the amount of itsOSS) is? Clearly, that in this case:
L0 > Lmin = 2
and therefore we can use this additional order for more quick sort. It maybe is already algorithmic advantage.
A task of the calculation of middle lengthL0 ofOSS for the evenly distributed casual data sequence does not seem very difficult, however although we tried hard, we did not find the answer in the Internet. It was therefore necessary to perplex a Russian forum of mathematicians and after the joint discussion of the problem (in the theme:Distributing of order into all transpositions where I am under nicknameScholium) we were able to calculate this value:
L0 = 2e – 3 = 2.436563656918 .
It is interesting, is not it? Any casual sequence possesses partial order more minimum! The order of real data is always higher than absolutely evenly distributing of casual values. That is why we are especially attracted by this algorithm.
Now, we go directly to description of our algorithm.
Description of the Algorithm
It consists of two basic operations.
- Splitting given sorted sequenceS on set of locally arranged (by increase (not to decrease) or by decrease) subsequences{S1, S2, S3, . . . , SN} where all subsequences, except, perhaps, last contain more than one record (key). Arranged subsequenceSn for brevity we will nameOSSSn.
- Merging adjacent pairs ofOSS of one level in one (from left to right or from top to down), taking into account an order of streamlining of these subsequences, i.e., the decreasing sequence merges from «end», and increasing with «beginning».
These two operations repeat themselves until all sequences ofS do not consist of one only subsequence, i.e., while it is not arranged.
The example of usage of the given sort algorithm is shown in fig. 1. The first line is the given (invariable) sequence, last is required (sorted) one.

Fig. 1. A sorting example by «Natural Merge» for an invariable source. The order of operations merging / splitting is specified by red digits.
Simple Algorithm of External Sort by «Natural Merge»
Let it be given external (file) source ofOSSS0 and enoughM of external (file) buffers{S1, . . . , SM} into the necessary size. SourceS0 should not vary, and buffers{S1, . . . , SM} can change of their contents. It is required to receive sorted sourceS0 in some buffer, using paired merging of the arranged subsequences.
- If source ofOSSS0 is empty, the algorithm is ended (there is no data).
- If sourceS0 contains only oneOSS then it is required, taking into account a sort order of thisOSS. The algorithm is ended.
- If quantity ofOSS of a source is more than one, thenS0 mutually (it is defined by functionSplit) merges (functionMerge) intoS1,S1 intoS2 etc., while intoSM does not remain uniqueOSS. If in flowingSk (1 <= k < M) is available not mergedOSS (at odd quantity of allOSS of the given levelk) it is simply copied in the nearest bufferSM where also contains not mergedOSS, on its place (K > k) provided thatk andK have different parity. Then again derivated pairOSS merges. Ifk andK have identical parity, i.e. actually are in one buffer then they merge at once, without copying. And so before reception unique finiteOSS. The algorithm is ended.
Optimization of Simple Algorithm
It is easy to see that theM number can be reduced to two. Besides, last oddOSS there is no necessity to copy some times before its merging, it is enough to do no more than once. The truth in this case maybe merging from two different sources is required (fig. 1) in the free buffer that will a little complicated functionMerge
. FunctionSplit
needs to be applied no more an once and to use its results at once, not to store the superfluous data.
- We select current adjacent pair ofOSS at pass on all required records of the source. The data oddOSS is saved in a zero array cell of structures of this data.
- The current data evenOSS is merged with the saved data previous oddOSS by means of function
Merge
in the appropriate buffer. - If in the merging buffer given mergedOSS is odd, then it saved its parameters in a following array cell ofOSS data structures, else (for evenOSS) it is merged into other buffer.
The Description of Function Split
FunctionSplit
, which is fulfilled only once (in one pass), should define the beginning and the end of a current subsequence, its length, a sort order (1 for a non-decreasing subsequence,–1 for decreasing one and0 for theOSS consisting only of one element, closing a current data series), and also addresses of sources ofOSS and the target buffer, the parameters characterizing data proper, etc.
To do that, it is enough to have quantity of the elements more than two, i.e. three and more. If elements are less than three they already derivate oneOSS, its parameters are defined trivially. For more quantities of elements we define an initial order, comparing the first two ones. Then consistently we compare the second and third, third and fourth elements and so on until their order coincides with the initial. As soon as the order has varied, at once we receive boundary nextOSS, the data about which can be used into functionMerge
. For this purpose, the information about oddOSS we save in the set array of structures with an element0 <= k < M, whereM is dimension of a long-integer in operating system. ForWin32 M is equal to32.
Further, as functionSplit
is used for us only once that we are not selecting it in separate function, and here functionMerge
we do.
The Description of Function Merge
Here, we use classical implementation of this function, i.e. as already it was mentioned above, from queues of the data we will consistently select (with removal from the queue) maximum or minimum value and to write it in the required sequence.
Function which controlSplit andMerge for us we nameNaturalMergeSort.
The Description of Function NaturalMergeSort
This function consists of two main bodies. The first handles all evenOSS, in process of their appearance, and the second all remained odd.
Let's show and tell in the following figures.The first part of algorithm deals with numberOSS being a maximum degree of two not exceeding initial quantity of theseOSS (fig. 2).

Fig. 2. A sorting example of «Natural Merge» for an invariable source at quantityOSS equal to a degree of two. The order of operations of splitting / merging is specified by red digits.
Let's consider further that multi-colored rectangles contain various arranged subsequences of the data.The second part of algorithm deals remainedOSS after performance of the first part of algorithm (fig. 1). It is natural, if quantityOSS is equal to a degree of two the second part is not required. On fig. 3-7 the second part of algorithm is shown for various values of quantityOSS.

Fig. 3. A sorting example of «Natural Merge» for an invariable source at quantityOSS equal3.

Fig. 4. A sorting example of «Natural Merge» for an invariable source at quantityOSS equal5.

Fig. 5. A sorting example of «Natural Merge» for an invariable source at quantityOSS equal6.

Fig. 6. A sorting example of «Natural Merge» for an invariable source at quantityOSS equal7.

Fig. 7. A sorting example of «Natural Merge» for an invariable source at quantityOSS equal99.
Now, when the idea of our algorithm is clear, it is possible to pass to its program implementation. As we write a working variant of the program, we will research the real data source. It is enough for our future purposes if it is a binary file in which there is a block of thestring
data (fig. 8).

Fig. 8. Structure of the block of the string data in the binary data source.
Similar structure has, for example,dbf files. Therefore, we give test fileSort.dbf. It contains some fields and360 records. View of this file is shown in fig. 9.

Fig. 9. Contents of test fileSort.dbf.
Parameters of all data fields contain in structureg_DataBlock
and are presented in the test program.Name
field is current, the others are commented out. Certainly, nothing prevents us from calculating these parameters automatically, but it already does not concern the sorting purposes. However, this possibility is accessible in a program code of article:Creation and memory mapping existing DBF files as alternative of data serialization during work with modified CListCtrl classes in virtual mode on dialogs of MDI application.
As a result of execution of the program from some field of a testdbf file, appropriate records will be extracted and sorted in a temporary file. Besides data proper, the numbers of positions (strings or rows) of the given data field in a current data block will be still presented. These numbers of positions are counted from zero. As numbers of positions are stored as binary data besides a test procedure representing this data inASCII format will be still started. Both spool files are formed with casual names in the current directory, which will be shown in the appropriate message (fig. 10). Generally, these time files can be to further removal, but we leave them for the demonstration purposes.

Fig. 10. Result of execution of the sort program.
For our convenience, we use technologyMemory Mapped Files (MMF)into functionSortFileDataBlock
, i.e. files map into memory. Though in this case it is not essential and also can be easily removed, by insignificant complicating of a code. However owing to efficiency and convenience ofMMF, we leave it for demonstration purposes.
Now it is possible to pass directly to code of the main function of sorting by natural merging with the non-decreasing and decreasing arranged subsequences.
Code of the Main Function NaturalMergeSort
////////////////////////////////////////////////////////////////////////// NaturalMergeSort////////////////////////////////////////////////////////////////////////BOOL NaturalMergeSort(NMS *pNms) {//*** Natural Merge Sort structure TCHAR *szFileBuf1 = pNms->szFileBuf1;// Name of first file// buffer TCHAR *szFileBuf2 = pNms->szFileBuf2;// Name of second file// buffer HANDLE hSrcFile = pNms->hSrcFile;// Handle of data source file HANDLE hFileBuf1 = pNms->hFileBuf1;// Handle of first temporary// file buffer HANDLE hFileBuf2 = pNms->hFileBuf2;// Handle of second temporary// file buffer BYTE *pSrcFile = pNms->pSrcFile;// Maps view of data source file BYTE *pFileBufView1 = pNms->pFileBufView1;// Maps view of first file// buffer BYTE *pFileBufView2 = pNms->pFileBufView2;// Maps view of second// file buffer//*** String data block structure ULONG nHdrSize = g_DataBlock.nHdrSize;// Header size of data source ULONG nRecSize = g_DataBlock.nRecSize;// Record size of data source UINT nFldOff = g_DataBlock.nFldOff;// Offset of data field in record// of source UINT nFldLen = g_DataBlock.nFldLen;// Length of data field ULONG nRows = g_DataBlock.nRows;// Rows count of data source HANDLE hOut = NULL;// Handle of destination file buffer of OSS BYTE *pOut = NULL;// Pointer of destination file buffer of OSS OSS Oss = {0};// OSS`s structure DWORD nRecNo =0;// Reading «buffer» of record number DWORD dwBytes =0;// Number of written bytes ULONG nLineInd =0; ULONG nRow =0;//*** Sorts ordered subsequences (OSS)if(nRows>2) {int j =0; ULONG nNextRow =0; ULONG nOldNextRow =0; ULONG nInRow =0; ULONG nOldRow =0; ULONG nOldRows =0;int nOldSortSign =0;int nLastSortSign =0; ULONG nOutRow =0; ULONG nOldOutRow =0; ULONG nOSS =0;// Number of OSS ULONG nOSSBegin =0;// Begin of OSS ULONG nOSSEnd =0;// End of OSS ULONG nOSSRows =0;// Rows count of OSSint nSortSign1 =0;// Sign of sorting of first OSS of sourceint nSortSign2 =0;// Sign of sorting of second OSS of source CString *psFldVal1 = NULL; CString *psFldVal2 = NULL; BOOL bFirstRec = TRUE; INROWS aInRows[32] = {0};// Array of INROWS`s structures for OSS Oss.nFldLen = nFldLen;//*** Merges even couples of ordered subsequences (OSS)for(nRow =0; nRow<= nRows; nRow++) { nLineInd = nRow*nRecSize + nFldOff;try {if(nRow< nRows) {//*** Copies (nRow, nCol) field value of length nFldLen// We do this as it hasn`t NULL terminatorif(bFirstRec) { psFldVal1 =new CString((LPCSTR) &pSrcFile[nLineInd], nFldLen); bFirstRec = FALSE;continue; }else { psFldVal2 =new CString((LPCSTR) &pSrcFile[nLineInd], nFldLen); }//*** Formats date string into German style/* if(m_pDoc->m_acFldType[nCol] == 68) { // = 0x44 ("D") - Date CString sDate = sFldVal.Right(2) + _T(". ") + sFldVal.Mid(4, 2) + _T(".") + sFldVal.Left(4); sFldVal = sDate; } *///*** Compares two stringsif(!Compare(psFldVal1, psFldVal2, &nSortSign2)) {//_M("CreateIndexFile : Error Compare!");return FALSE; } }else nOSS = nOSS;//*** Defines the end of ordered subsequence (OSS)if((nSortSign1 !=0 && nSortSign1 != nSortSign2) || nRow == nRows) { nOSS++;// The number of ordered subsequences (OSS) nOSSEnd = nRow -1;// The end of OSS nOSSRows = nOSSEnd - nOSSBegin +1;// The len of OSS nLastSortSign = nSortSign1; nInRow = (nSortSign1 ==1) ? nOSSBegin : nOSSEnd;if(nOSS%2) {// Odd OSS//*** Saves odd OSS parameters aInRows[0].nInRow = nInRow; aInRows[0].nInRows = nOSSRows; nOldSortSign = nSortSign1; }else {// Even OSS// log(nOSS)/log(2) < 32for(j =0; j<= (int)(log(nOSS)/log(2)); j++) {// nOSS % 2^(j+1) = 0if(nOSS % (ULONG) pow(2, j+1) ==0) { nOutRow = (nOSS == pow(2, j+1)) ?0 : nOutRow;if(j ==0) { Oss.hIn1 = NULL; Oss.pIn1 = pSrcFile; Oss.nRecSize1 = nRecSize; Oss.nRecSize2 = nRecSize; Oss.nFldOff1 = nFldOff; Oss.nFldOff2 = nFldOff;//Oss.nFldLen = nFldLen; }else { Oss.hIn1 = (j%2 ==0) ? hFileBuf2 : hFileBuf1; Oss.pIn1 = (j%2 ==0) ? pFileBufView2 : pFileBufView1; Oss.nRecSize1 = nFldLen +sizeof(nRecNo); Oss.nRecSize2 = Oss.nRecSize1; Oss.nFldOff1 =0; Oss.nFldOff2 =0;//Oss.nFldLen = nFldLen + sizeof(nRecNo); }//Oss.nFldLen = nFldLen; Oss.hIn2 = Oss.hIn1; Oss.pIn2 = Oss.pIn1; Oss.hOut = (j%2 ==0) ? hFileBuf1 : hFileBuf2; Oss.pOut = (j%2 ==0) ? pFileBufView1 : pFileBufView2; nInRow = (j>0) ? nOldRow : nInRow; nSortSign1 = (j>0) ?1 : nSortSign1; nOldSortSign = (j>0) ?1 : nOldSortSign; nOutRow = (j>0) ? nInRow : nOutRow; nOldOutRow = nOutRow; Oss.nInRow1 = aInRows[j].nInRow; Oss.nInRow2 = nInRow; Oss.nInRows1 = aInRows[j].nInRows; Oss.nInRows2 = (j>0) ? nOldRows : nOSSRows; Oss.nInSS1 = nOldSortSign; Oss.nInSS2 = nSortSign1; Oss.nOutRow = nOutRow;//*** Merges two OSS into oneif(!Merge(&Oss)) {// Pointer to OSS structure//_M("CreateIndexFile : Error of merger of two OSS!");return FALSE; } nOutRow = Oss.nOutRow;//*** Saves even OSS parameters nOldRow = aInRows[j+1].nInRow; nOldRows = aInRows[j+1].nInRows; nOldSortSign = nSortSign1; aInRows[j+1].nInRow = nOldOutRow;//aInRows[j+1].nInRows = aInRows[j].nInRows + nOSSRows; aInRows[j+1].nInRows = Oss.nInRows1 + Oss.nInRows2; nSortSign1 =1; } } } nOSSBegin = nRow;// The begin of ordered subsequences (OSS) nSortSign1 =0; }else nSortSign1 = nSortSign2; psFldVal1 = psFldVal2; }catch(...) { _M("CreateIndexFile : Error exception!");//*** Forces to exit from the application as else will be a lot// messages exit(-1); } }//*** Handle of destination file buffer of OSS hOut = (nOSS ==1) ? NULL : (nOSS%2 ==0) ? hFileBuf1 : hFileBuf2;//*** Pointer of destination file buffer of OSS pOut = (nOSS ==1) ? pSrcFile : (nOSS%2 ==0) ? pFileBufView1 : pFileBufView2;//*** Merges the rest (odd couples) of ordered subsequences (OSS)if(nOSS> pow(2, --j)) {// nOSS != 2^j && nOSS > 2//*** As minimum two natural numbers exist so that// nOSS = 2^n1 + 2^n2 (n1 < n2)int n1 =0;int n2 =0; Oss.nInRow1 =0; Oss.nInRows1 =0;//j = 0;//*** Finds n1int i =0; nOldRow =0; nOldRows =0; ULONG nOSS1 = nOSS;while(nOSS1 %2 ==0) { nOSS1 /=2; i++; } n1 = i;// Must be n1 >= 0! ULONG nOSS0 = nOSS - (ULONG) pow(2, n1); ULONG nOSS2 =0;//while(nOSS > (ULONG) pow(2, n1) && j++ < 32) { // For Win32while(nOSS> (ULONG) pow(2, n1)) {//*** Finds n2 i =0; nOSS2 = nOSS0;while(nOSS2 %2 ==0) { nOSS2 /=2; i++; } n2 = i;// Must be n2 > 0 as n2 > n1 >= 0! Oss.hIn1 = (n1 ==0) ? NULL : (n1%2 ==0) ? hFileBuf2 : hFileBuf1; Oss.pIn1 = (n1 ==0) ? pSrcFile : (n1%2 ==0) ? pFileBufView2 : pFileBufView1; Oss.hIn2 = (n2%2 ==0) ? hFileBuf2 : hFileBuf1; Oss.pIn2 = (n2%2 ==0) ? pFileBufView2 : pFileBufView1; Oss.hOut = (n2%2 ==0) ? hFileBuf1 : hFileBuf2; Oss.pOut = (n2%2 ==0) ? pFileBufView1 : pFileBufView2; ULONG nRecSize = nFldLen +sizeof(nRecNo);//*** g_DataBlock.nRecSize - Record size of data source Oss.nRecSize1 = (n1 ==0) ? g_DataBlock.nRecSize : nRecSize; Oss.nRecSize2 = nRecSize; Oss.nFldOff1 = (n1 ==0) ? nFldOff :0; Oss.nFldOff2 =0;//*** Checks evenness n1 and n2if(Oss.pOut == Oss.pIn1) { ULONG nOff = (n1< n2) ? aInRows[n1].nInRow : nOldRow; nOff *= nRecSize; ULONG nSize = (n1< n2) ? aInRows[n1].nInRows : nOldRows; nSize *= nRecSize;//*** Simply copies OSS from pIn1 into pIn2 CopyMemory( Oss.pIn2 + nOff,// Pointer to address of copy destination Oss.pIn1 + nOff,// Pointer to address of block to copy nSize// Size, in bytes, of block to copy ); Oss.hIn1 = Oss.hIn2; Oss.pIn1 = Oss.pIn2; } Oss.nInRow1 = aInRows[n1].nInRow; Oss.nInRow2 = (n1< n2) ? aInRows[n2].nInRow : nOldRow; Oss.nInRows1 = aInRows[n1].nInRows; Oss.nInRows2 = (n1< n2) ? aInRows[n2].nInRows : nOldRows; Oss.nInSS1 = (n1>0) ?1 : nLastSortSign; Oss.nInSS2 =1; Oss.nOutRow = Oss.nInRow2;//*** Merges two OSS into oneif(!Merge(&Oss)) {// Pointer to OSS structure//_M("CreateIndexFile : Error of merger of two OSS!");return FALSE; } n1 = n2 +1;//*** Saves old aInRows[n1] structure nOldRow = aInRows[n1].nInRow; nOldRows = aInRows[n1].nInRows;//*** Changes aInRows[n1] structure aInRows[n1].nInRow = Oss.nInRow2; aInRows[n1].nInRows = Oss.nInRows1 + Oss.nInRows2; nOSS0 = nOSS0 - (ULONG) pow(2, n2); } } }//*** TEST//* BYTE acCRLF[2] = {13,10}; HANDLE hIn = Oss.hOut;// Handle of source file buffer of index dataif(hIn) {// nOSS > 1//*** Handle of destination file buffer of index data HANDLE hOut = (hIn == hFileBuf1) ? hFileBuf2 : hFileBuf1;//*** Moves file pointer of current file bufferif(SetFilePointer(hIn,0, NULL, FILE_BEGIN) == (DWORD) -1) { _M("CMainDoc:CreateIndexFile : Failed to call SetFilePointer function!");return FALSE; }//*** Moves file pointer of current file bufferif(SetFilePointer(hOut,0, NULL, FILE_BEGIN) == (DWORD) -1) { _M("CMainDoc:CreateIndexFile : Failed to call SetFilePointer function!");return FALSE; }//*** Merges even couples of ordered subsequences (OSS)for(ULONG i =0; i< nRows; i++) { BYTE *acStr =new BYTE[nFldLen];// Reading buffer of field datachar acNo[2*sizeof(ULONG)];// Writing buffer of record number//*** Reads pcStr into first temporary file bufferif(!ReadFile(hIn, acStr, nFldLen, &dwBytes, NULL)) { _M("NaturalMergeSort : Failed to call ReadFile function for acStr!");return FALSE; }//*** Reads nRecNo into current temporary file bufferif(!ReadFile(hIn, &nRecNo,sizeof(nRecNo), &dwBytes, NULL)) { _M("NaturalMergeSort : Failed to call ReadFile function for &nRecNo!");return FALSE; }//*** Writes pcStr into first temporary file bufferif(!WriteFile(hOut, acStr, nFldLen, &dwBytes, NULL)) { _M("NaturalMergeSort : Failed to call WriteFile function for acStr!");return FALSE; }//*** Converts DWORD to char array sprintf(acNo,"%8d", nRecNo);// 8 = 2*sizeof(ULONG)//*** Writes pcStr into first temporary file bufferif(!WriteFile(hOut, acNo,sizeof(acNo), &dwBytes, NULL)) { _M("NaturalMergeSort : Failed to call WriteFile function for acNo! ");return FALSE; }//*** Writes pcStr into first temporary file bufferif(!WriteFile(hOut, acCRLF,sizeof(acCRLF), &dwBytes, NULL)) { _M("NaturalMergeSort : Failed to call WriteFile function for acCRLF! ");return FALSE; } } }else {// nOSS == 1 hOut = hFileBuf1;// Handle of destination file buffer of index// data;char acNo[2*sizeof(ULONG)];// Writing buffer of record numberfor(nRow =0; nRow< nRows; nRow++) { nLineInd = nRow*nRecSize + nFldOff;//*** Writes pcStr into first temporary file bufferif(!WriteFile(hOut, &pSrcFile[nLineInd], nFldLen, &dwBytes, NULL)) { _M("NaturalMergeSort : Failed to call WriteFile function for &pSrcFile[nLineInd]!");return FALSE; }//*** Converts DWORD to char array sprintf(acNo,"%8d", nRow);// 8 = 2*sizeof(ULONG)//*** Writes pcStr into first temporary file bufferif(!WriteFile(hOut, acNo,sizeof(acNo), &dwBytes, NULL)) { _M("NaturalMergeSort : Failed to call WriteFile function for nRow!");return FALSE; }//*** Writes pcStr into first temporary file bufferif(!WriteFile(hOut, acCRLF,sizeof(acCRLF), &dwBytes, NULL)) { _M("NaturalMergeSort : Failed to call WriteFile function for acCRLF!");return FALSE; } } }//*//*** TEST sprintf( g_szStr, _T("NaturalMergeSort : See out files: `%s` and `%s`!"), szFileBuf1, szFileBuf2 ); _M(g_szStr);return TRUE;}// NaturalMergeSort
Let's notice that at attempt of sorting of the first data block (IndNo
field) which is already sorted, our functionNaturalMergeSort
will define that in this case we have an uniqueOSS and it is not necessary to sort anything. For output of results, the source will be used directly, instead of a spool file.