Basic stand-alone disk-based N-way merge sort component for Java
cowtowncoder/java-merge-sort
This project implements a basic disk-backed multi-way merge sort, with configurable input and output formats (i.e. not just textual sort). It should be useful for systems that process large amounts of data, as a simple building block for sort phases.
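A multi-way merge works by repeatedly taking the smallest current item among several already-sorted runs. The sketch below shows that step for in-memory runs and a caller-supplied `Comparator`. The class name `KWayMerge` is hypothetical, and this is not the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch of an N-way merge over pre-sorted runs; not the library's implementation.
final class KWayMerge {

    // Heap entry: the current head item of one run plus the iterator it came from.
    private static final class Head<T> {
        final T item;
        final Iterator<T> source;

        Head(T item, Iterator<T> source) {
            this.item = item;
            this.source = source;
        }
    }

    // Merges runs that are each already sorted by cmp; the heap holds at most one item per run.
    static <T> List<T> merge(List<? extends Iterable<T>> runs, Comparator<? super T> cmp) {
        PriorityQueue<Head<T>> heap = new PriorityQueue<>((a, b) -> cmp.compare(a.item, b.item));
        for (Iterable<T> run : runs) {
            Iterator<T> it = run.iterator();
            if (it.hasNext()) heap.add(new Head<>(it.next(), it));
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> smallest = heap.poll();
            out.add(smallest.item);
            // Refill the heap from the run that just supplied the smallest item
            if (smallest.source.hasNext()) {
                heap.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
                Arrays.asList(1, 4, 9), Arrays.asList(2, 3, 10), Arrays.asList(5));
        System.out.println(merge(runs, Comparator.naturalOrder())); // prints [1, 2, 3, 4, 5, 9, 10]
    }
}
```

The min-heap is keyed on each run's current head item. This keeps the merge at O(n log k) comparisons for n items across k runs while holding only k items in memory at a time. That property is what lets a disk-based sort merge runs far larger than the heap.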
Check out the project wiki for more documentation, including Javadocs.
The library is licensed under the Apache License 2.0.
Version 1.1.0 (released on 2022-11-19) requires Java 8.
Earlier versions (1.0.2 and before) require Java 6.
The main class to interact with is `com.fasterxml.sort.Sorter`, which needs to be constructed with four things:
- Configuration settings (the default `SortConfig` works fine)
- `DataReaderFactory`, used for creating readers for intermediate sort files (and for input, if a stream is passed)
- `DataWriterFactory`, used for creating writers for intermediate sort files (and for results, if a stream is passed)
- `Comparator` for data items
An example of how this can be done can be found in `com.fasterxml.sort.std.TextFileSorter`. Basic implementations exist for line-based text input (in package `com.fasterxml.sort.std`), and additional implementations may be added: for example, a JSON data sorter could be implemented as an extension module of Jackson. Fortunately, implementing your own readers and writers is trivial.
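The example below shows how little a reader/writer pair needs to do. It sketches one for a binary format of big-endian 8-byte `long` records. The `ItemReader`/`ItemWriter` interfaces are hypothetical stand-ins invented for this example. They only mirror the roles of `DataReader` and `DataWriter` described above, not their actual signatures.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader/writer contracts for this sketch only; NOT the library's DataReader/DataWriter.
interface ItemReader<T> extends Closeable {
    T readNext() throws IOException; // returns null once input is exhausted
}

interface ItemWriter<T> extends Closeable {
    void write(T item) throws IOException;
}

// Binary record format: each record is one big-endian 8-byte long.
final class LongRecords {
    static ItemReader<Long> reader(InputStream in) {
        DataInputStream data = new DataInputStream(new BufferedInputStream(in));
        return new ItemReader<Long>() {
            @Override public Long readNext() throws IOException {
                try {
                    return data.readLong();
                } catch (EOFException e) {
                    return null; // end of input (a truncated trailing record is also treated as end)
                }
            }
            @Override public void close() throws IOException { data.close(); }
        };
    }

    static ItemWriter<Long> writer(OutputStream out) {
        DataOutputStream data = new DataOutputStream(new BufferedOutputStream(out));
        return new ItemWriter<Long>() {
            @Override public void write(Long item) throws IOException { data.writeLong(item); }
            @Override public void close() throws IOException { data.close(); } // flushes buffered bytes
        };
    }

    // Writes the values to an in-memory buffer, then reads them back with the matching reader.
    static List<Long> roundTrip(long... values) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ItemWriter<Long> w = writer(buf)) {
                for (long v : values) w.write(v);
            }
            List<Long> result = new ArrayList<>();
            try (ItemReader<Long> r = reader(new ByteArrayInputStream(buf.toByteArray()))) {
                for (Long v = r.readNext(); v != null; v = r.readNext()) result.add(v);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L, -7L, 0L)); // prints [42, -7, 0]
    }
}
```

Returning `null` at end of input keeps the read loop simple. A production implementation would probably want to reject a truncated trailing record rather than silently ignore it.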
With a Sorter instance, you can call one of two main sort methods:
```java
public void sort(InputStream source, OutputStream destination)
public boolean sort(DataReader<T> inputReader, DataWriter<T> resultWriter)
```
The former takes input as streams and uses the configured reader/writer factories to construct a `DataReader` for input and a `DataWriter` for output; the latter just uses pre-constructed instances.
In addition to core sorting functionality, a `Sorter` instance also gives access to progress information (it implements the `SortingState` interface, which has accessor methods).
A very simple example of sorting a text file using line-by-line comparison is:
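```java
// Use roughly 20 MB of heap for in-memory sorting before spilling to temporary files
TextFileSorter sorter = new TextFileSorter(new SortConfig().withMaxMemoryUsage(20 * 1000 * 1000));
sorter.sort(new FileInputStream("input.txt"), new FileOutputStream("output.txt"));
```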
This reads text from file "input.txt" and sorts it using about 20 MB of heap (note: memory usage estimates are rough). It uses temporary files if necessary: small inputs are sorted entirely in memory, and bigger ones with a real merge sort. The output is written to file "output.txt".
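The "in memory if it fits, temporary files otherwise" behavior can be sketched as follows. This is a simplified illustration, not `TextFileSorter`'s implementation. Several details are assumptions made for the sketch: the class name `ExternalLineSort`, the per-line memory estimate (40 bytes plus 2 bytes per character), the temp-file naming, and the use of `String.compareTo` instead of byte-wise ordering.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative memory-bounded external sort for text lines; NOT TextFileSorter's actual code.
final class ExternalLineSort {

    static void sort(BufferedReader in, BufferedWriter out, long maxMemoryBytes) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        long used = 0;
        try {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                chunk.add(line);
                used += 40L + 2L * line.length(); // rough heap estimate per line (assumption)
                if (used >= maxMemoryBytes) {     // budget reached: spill a sorted run to a temp file
                    writeRun(chunk, runs);
                    chunk.clear();
                    used = 0;
                }
            }
            if (runs.isEmpty()) {                 // everything fit: plain in-memory sort, no temp files
                Collections.sort(chunk);
                for (String s : chunk) {
                    out.write(s);
                    out.newLine();
                }
            } else {
                if (!chunk.isEmpty()) writeRun(chunk, runs);
                mergeRuns(runs, out);
            }
            out.flush();
        } finally {
            for (Path p : runs) Files.deleteIfExists(p);
        }
    }

    private static void writeRun(List<String> chunk, List<Path> runs) throws IOException {
        Collections.sort(chunk);
        Path tmp = Files.createTempFile("sort-run-", ".txt");
        runs.add(tmp);                            // register first so the finally block can delete it
        Files.write(tmp, chunk, StandardCharsets.UTF_8);
    }

    private static final class Head {
        final String line;
        final BufferedReader src;
        Head(String line, BufferedReader src) { this.line = line; this.src = src; }
    }

    // N-way merge of the run files: a min-heap holds one pending line per run.
    private static void mergeRuns(List<Path> runs, BufferedWriter out) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        try {
            PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
            for (Path p : runs) {
                BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new Head(first, r));
            }
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                out.write(h.line);
                out.newLine();
                String next = h.src.readLine();
                if (next != null) heap.add(new Head(next, h.src));
            }
        } finally {
            for (BufferedReader r : readers) r.close(); // close before the caller deletes the files
        }
    }

    // Convenience wrapper that sorts the lines of an in-memory string.
    static String sortText(String text, long maxMemoryBytes) {
        StringWriter sw = new StringWriter();
        try (BufferedReader in = new BufferedReader(new StringReader(text));
             BufferedWriter out = new BufferedWriter(sw)) {
            sort(in, out, maxMemoryBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }
}
```

As an example, `ExternalLineSort.sortText("d\nb\ne\na\nc\n", 50)` spills three small runs to disk and merges them. A budget of `1_000_000` sorts the same input entirely in memory instead. Both calls return the lines in the order a, b, c, d, e, each followed by the platform line separator.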
The project jar is packaged so that it can be used as a primitive 'sort' tool like so:
```
java -jar java-merge-sort-1.1.0.jar [input-file]
```
The sorted output is printed to stdout. The argument is optional: if it is missing, input is read from stdin. (Implementation note: this uses the standard `TextFileSorter` mentioned above.)
The format is assumed to be basic text lines, similar to Unix `sort`, and the sort order is basic byte-wise ordering (which works for most common encodings).
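Byte-wise ordering has to treat bytes as unsigned, since Java's `byte` type is signed. The minimal comparator sketch below shows the idea and why it suits UTF-8, whose unsigned byte order matches Unicode code point order. The `ByteWiseOrder` class is hypothetical and not part of the library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative byte-wise ordering for text lines: unsigned lexicographic comparison of raw bytes.
final class ByteWiseOrder {

    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Java bytes are signed; mask to 0..255 so that e.g. 0xC3 sorts after 'z' (0x7A)
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length; // equal up to the shorter length: the shorter one sorts first
    };

    // Encodes lines as UTF-8, sorts the raw bytes, and decodes them again.
    static List<String> sortUtf8(String... lines) {
        byte[][] encoded = new byte[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            encoded[i] = lines[i].getBytes(StandardCharsets.UTF_8);
        }
        Arrays.sort(encoded, UNSIGNED_LEXICOGRAPHIC);
        List<String> result = new ArrayList<>();
        for (byte[] b : encoded) result.add(new String(b, StandardCharsets.UTF_8));
        return result;
    }

    public static void main(String[] args) {
        // "\u00e9clair" is "éclair"; its first UTF-8 byte is 0xC3
        System.out.println(sortUtf8("banana", "\u00e9clair", "apple", "Zebra"));
    }
}
```

Sorting `banana`, `éclair`, `apple` and `Zebra` this way yields `Zebra`, `apple`, `banana`, `éclair`. Uppercase ASCII precedes lowercase, and the multi-byte `é` (starting with byte 0xC3) sorts after all ASCII. On Java 9 and later, `java.util.Arrays.compareUnsigned(byte[], byte[])` gives the same ordering.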
Here are some external links:
- Sorting large data sets (includes example for sorting JSON files)
To access the source, just clone the project.