This is the repository of the TinSpin benchmark framework. The TinSpin spatial index collection lives in the separate tinspin-indexes repository.
TinSpin is a framework for benchmarking in-memory spatial indexes.
TinSpin provides several dataset generators for point data and rectangle data. The datasets can be scaled with size and dimensionality. Each index can be tested with various loads, such as insertion, window queries, exact match queries, nearest neighbor queries, updates (moving objects) and removal.
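The following is a minimal sketch of what these loads look like as index operations. It uses a plain brute-force list instead of a real spatial index and is not TinSpin code; the actual benchmarks go through the index wrappers described below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/**
 * Illustration only: a brute-force stand-in for a spatial index, showing the
 * operation types that TinSpin benchmarks (insert, window query, exact match,
 * kNN, update, remove). Real runs use the index wrappers in the 'wrappers' folder.
 */
public class LoadTypesSketch {

    public static void main(String[] args) {
        int dims = 3;
        int n = 1_000;
        Random rnd = new Random(0);
        List<double[]> points = new ArrayList<>();   // "index": just a list, for illustration

        // Insertion ("load"): add n random points (TinSpin uses its own generators instead).
        for (int i = 0; i < n; i++) {
            double[] p = new double[dims];
            for (int d = 0; d < dims; d++) {
                p[d] = rnd.nextDouble();
            }
            points.add(p);
        }

        // Window query: count points inside an axis-aligned box.
        double[] min = {0.1, 0.1, 0.1};
        double[] max = {0.3, 0.3, 0.3};
        long inWindow = points.stream().filter(p -> contains(min, max, p)).count();

        // Exact match query: does a given coordinate exist in the index?
        double[] probe = points.get(42).clone();
        boolean found = points.stream().anyMatch(p -> Arrays.equals(p, probe));

        // k-nearest-neighbor query (k = 10): brute force by sorting on distance.
        double[] q = {0.5, 0.5, 0.5};
        points.sort(Comparator.comparingDouble((double[] p) -> dist(p, q)));
        List<double[]> knn = new ArrayList<>(points.subList(0, 10));

        // Update (moving object): shift one entry's coordinate.
        points.get(0)[0] += 0.01;

        // Removal ("unload"): delete one entry.
        points.remove(points.size() - 1);

        System.out.println("window=" + inWindow + " exactMatch=" + found + " kNN=" + knn.size());
    }

    private static boolean contains(double[] min, double[] max, double[] p) {
        for (int d = 0; d < p.length; d++) {
            if (p[d] < min[d] || p[d] > max[d]) {
                return false;
            }
        }
        return true;
    }

    private static double dist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            s += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(s);
    }
}
```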
The framework was originally developed at ETH Zurich in the GlobIS group. It is now maintained by Tilmann Zäschke.
If you want to reference this project, please consider referencing the PH-Tree instead, as TinSpin was originally developed to benchmark the PH-Tree and because there is currently no TinSpin publication: The PH-tree: A Space-Efficient Storage Structure and Multi-Dimensional Index, T. Zäschke, C. Zimmerli and M.C. Norrie, Proc. of the 2014 ACM SIGMOD Intl. Conf. on Management of Data.
2023-07:
- Updated to new tinspin-indexes 2.0.0 with new API
- Indexes now all hold a unique identifier (int) as value instead of a singleton Object (see the sketch below).
- Added support for multimap tests. Note: this allows testing of multimaps; however, the generated test data does currently not contain duplicates.
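The per-entry identifier matters mainly for multimaps, where several entries may share a coordinate and must still be distinguishable, which a shared singleton value cannot do. Below is a minimal sketch of the idea using plain JDK collections, not the tinspin-indexes API; as noted above, the bundled generators themselves do not emit duplicate coordinates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the "unique identifier as value" idea (illustration only, not the
 * tinspin-indexes API): with a singleton value, two entries at the same
 * coordinate are indistinguishable; with an int id per entry they are not.
 */
public class MultimapValueSketch {
    public static void main(String[] args) {
        // Multimap-style storage: coordinate -> list of entry ids.
        Map<List<Double>, List<Integer>> multimap = new HashMap<>();

        double[][] points = {{1.0, 2.0}, {1.0, 2.0}, {3.0, 4.0}};   // duplicate coordinate
        for (int id = 0; id < points.length; id++) {
            List<Double> key = Arrays.asList(points[id][0], points[id][1]);
            multimap.computeIfAbsent(key, k -> new ArrayList<>()).add(id);   // id is the value
        }

        // The two entries at (1.0, 2.0) remain distinguishable by their ids 0 and 1.
        System.out.println(multimap.get(Arrays.asList(1.0, 2.0)));   // prints [0, 1]
    }
}
```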
2018-12:
- Major refactoring with modularization, new Maven module 'tinspin-common'
- Added new Example for custom testing
- Added (very basic!) import for HDF5 files.
- Testing: Updated window query generator again; use result count only from first run, for comparability; updated timings to use nanoseconds internally
- CHANGE: WQ timings now return time/query instead of time/result
2018-07:
- New logging output with ops/sec instead of time/op
- Operations without side effects (window query, point query, kNN search) are repeatedly executed until a minimum time has passed, in order to avoid problems with warm-up (see the sketch below)
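The repetition can be pictured roughly as follows. This is a simplified sketch of the idea, running a side-effect-free batch repeatedly until a minimum wall-clock time has elapsed and reporting ops/sec; it is not the actual TestRunner code, and the 2000 ms minimum merely mirrors the documented default.

```java
import java.util.concurrent.TimeUnit;

/** Simplified sketch of "repeat until a minimum time has passed" timing (not TinSpin's exact code). */
public class RepeatTimingSketch {

    /** Runs 'queryBatch' repeatedly until at least 'minMillis' have passed and returns ops/sec. */
    static double measureOpsPerSec(Runnable queryBatch, int opsPerBatch, long minMillis) {
        long minNanos = TimeUnit.MILLISECONDS.toNanos(minMillis);
        long start = System.nanoTime();
        long ops = 0;
        do {
            queryBatch.run();            // e.g. one run of window queries; must have no side effects
            ops += opsPerBatch;
        } while (System.nanoTime() - start < minNanos);
        long elapsed = System.nanoTime() - start;    // timings kept in nanoseconds internally
        return ops / (elapsed / 1e9);                // reported as operations per second
    }

    public static void main(String[] args) {
        // Trivial dummy workload standing in for a query batch; default minimum time is 2 seconds.
        double opsPerSec = measureOpsPerSec(() -> Math.sqrt(12345.678), 1, 2000);
        System.out.printf("~%.0f ops/s%n", opsPerSec);
    }
}
```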
The TestManager allows running many tests in one session. Output files are written to subfolders, see below.
The TestRunner can run individual tests; all output is written to the console. This is mostly useful for debugging. The meaning of the output is described below; the second row uses the new TinSpin 1.0 format (the first row uses the old TinSpin 0.0.1 format).
The Example.java class demonstrates how to test your own index.
The data folder contains plugins for importing data from various formats.
The db folder contains a "cache" that can be used for file formats that are slow to import. It allows importing to a DB file (binary format) which is faster to read than text/XML files such as OSM.
The wrappers folder contains wrappers for various index implementations and different data types: point data and rectangle data. *MM* wrappers support multimap indexes. Note that generated data does NOT contain duplicates, i.e. the generators generate at most one entry for each coordinate (point) or pair of coordinates (rectangle).
Some results can be found in the doc folder.
Visualized results from a complete run with 2D and 3D data can be found here (PDF).
The file benchmark-high-dim-2018-12.ods contains benchmarks with high-dimensional datasets, such as GloVe-25, GloVe-50, SIFT-128, NYTimes-256 and MNIST-784 (see ann-benchmark).
The test data is written to tab-separated value files in target/logs.
Log files from tests are written to several possible folders. The default folders are:
- dimsP: Point data scaled with dimensionality
- dimsR: Rectangle data scaled with dimensionality
- sizeP: Point data scaled with dataset size
- sizeR: Rectangle data scaled with dataset size
- sizePWQS: Point data scaled with size of query window
- sizeRWQS: Rectangle data scaled with size of query window
The output rows have the following structure:
- column names: Index data ...
- comment: % Averages
- comment: % ========
- averaged results: RSZ-R-AVG-3/3 ... (Index=RSZ, datatype=Rectangle, average of 3 successful runs of 3 initiated runs)
- comment: % Measurements
- comment: % ============
- results: RSZ-R-0CLUSTER(5.0,0.0,null) ... (Index=RSZ, datatype=Rectangle, random seed=0 (equals test run ID))
- results (new format): RSZ-R-M-0CLUSTER(5.0,0.0,null) ... (Index=RSZ, datatype=Rectangle, map/multimap=Map, random seed=0 (equals test run ID))
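Since the result files are plain tab-separated text, they can be post-processed with any tool. Below is a minimal Java sketch assuming the layout described above (a header line with column names, '%' comment lines, tab-separated result rows); the file path used here is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Minimal reader for a TinSpin result file. The file name is hypothetical; the
 * layout assumed here is the one sketched above (column-name header line,
 * '%' comment lines, tab-separated result rows).
 */
public class ReadResultLog {
    public static void main(String[] args) throws IOException {
        Path log = Path.of("target/logs/sizeP/results.txt");    // hypothetical file name
        List<String> lines = Files.readAllLines(log);

        String[] columns = lines.get(0).split("\t");             // first line: column names
        System.out.println("columns: " + columns.length);

        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty() || line.startsWith("%")) {
                continue;                                         // skip comments such as "% Averages"
            }
            String[] cols = line.split("\t");
            System.out.println(cols[0]);                          // index/test descriptor, e.g. RSZ-R-AVG-3/3
        }
    }
}
```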
By default, TinSpin averages three consecutive test runs into one average.
The following column descriptions refer to the new TinSpin 1.0 format. Columns with general information. An example spreadsheet file for interpreting and visualizing results can be found here, e.g. the 2nd sheet.
- Index: Index and test descriptor, such as RSZ-R-M for a rectangle index or PHT-P-MM for a point multimap index.
- data: Test data descriptor, such as CUBE(4, 1.0, 0.0, null), see above
- dim: number of data dimensions
- bits: number of bits (deprecated), always 64
- N: dataset size (number of points or rectangles)
- memory: Total measured JVM memory [bytes]
- memory/n: Total measured JVM memory [bytes per entry] (see the measurement sketch below)
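For the memory and memory/n columns, a common way to approximate total JVM memory use after loading an index is to request a GC and compare totalMemory() with freeMemory(). The sketch below illustrates that general approach; it is not necessarily TinSpin's exact measurement code.

```java
/**
 * Common approximation of "memory per entry" after loading an index
 * (illustration only; TinSpin's exact measurement may differ).
 */
public class MemoryPerEntry {
    public static void main(String[] args) {
        int n = 1_000_000;
        long[] payload = new long[n];                // stand-in for a loaded index

        System.gc();                                 // request a GC so free memory is meaningful
        Runtime rt = Runtime.getRuntime();
        long usedBytes = rt.totalMemory() - rt.freeMemory();

        System.out.println("memory   [bytes]       : " + usedBytes);
        System.out.println("memory/n [bytes/entry] : " + (double) usedBytes / n);
        System.out.println(payload.length);          // keep the 'index' reachable until here
    }
}
```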
Columns with timing. Most parts of the test are executed in two runs or more, each run consisting of a predefined number of executions. For example, each exact match run consists of 100,000 exact match queries as defined in the TestStats class. Except for load/unload, runs are repeated until at least 2 seconds (default) have passed; this is in order to give more precise timings for very short runs.
- gen: time [ms] for dataset generation
- load/s: adding entries throughput [entries/s]
- wq1/s: window query throughput run #1 [queries/s]
- wq2/s: window query throughput run #2 [queries/s]
- pq1/s: exact match query throughput run #1 [queries/s] (was called point query)
- pq2/s: exact match query throughput run #2 [queries/s]
- 1-NN1/s: 1 nearest neighbor query throughput run #1 [queries/s]
- 1-NN2/s: 1 nearest neighbor query throughput run #2 [queries/s]
- 10-NN1/s: 10 nearest neighbor query throughput run #1 [queries/s]
- 10-NN2/s: 10 nearest neighbor query throughput run #2 [queries/s]
- up1/s: position update throughput run #1 [updates/s]
- up2/s: position update throughput run #2 [updates/s]
- unload/s: removing entries throughput [entries/s]
Columns with tree statistics. The following columns contain tree statistics, such as number of nodes or depth. The meaning may differ between trees.
- nodes: Number of nodes
- postLen: PH-tree: Average length of postfixes; R-Tree & Quadtrees: depth
- AHC: PH-tree: Number of AHC nodes
- NT: PH-tree: Number of NT-Nodes
- NTinternal: PH-tree: Number of NT-subnodes in all NT-Nodes
Columns with result statistics. The following columns give an indicator of the results returned by the first test run, even if runs are repeated when they are faster than 2 seconds (default), see above. The counts should not vary much between runs. This is a basic form of correctness testing: for a given test scenario, results should be the same for all indexes.
- wq1-n: Number of returned window query objects
- wq2-n: Number of returned window query objects
- q1p-n: Number of found objects in point query
- q2p-n: Number of found objects in point query
- 1-NN1-d: Average distance over all 1-nearest neighbors
- 1-NN2-d: Average distance over all 1-nearest neighbors
- 10-NN1-d: Average distance over all 10-nearest neighbors
- 10-NN2-d: Average distance over all 10-nearest neighbors
- up1-n: Number of updated objects
- up2-n: Number of updated objects
Columns with JVM statistics. For each test part, the following columns contain garbage collection statistics based on Java instrumentation. They are a good indicator, but not precise! -s is the estimated memory [MB] freed up by GC; -t is the estimated time [ms] used by the GC.
- load-s: Estimated size of garbage collected memory [MB]
- load-t: Estimated CPU time of garbage collector [ms]
- w-query-s
- w-query-t
- p-query-s
- p-query-t
- update-s
- update-t
- 1-NN-s
- 1-NN-t
- 10-NN-s
- 10-NN-t
- unload-s
- unload-t
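One common way to obtain such GC figures in plain Java is to sample the GarbageCollectorMXBean counters before and after a test phase, as sketched below for the time part (the *-t columns). This is an illustration of the general technique, not necessarily the instrumentation TinSpin actually uses; the freed-memory estimate (*-s) would need additional bookkeeping.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sampling GC time around a test phase via GarbageCollectorMXBeans (illustration; TinSpin's mechanism may differ). */
public class GcStatsSketch {

    private static long totalGcMillis() {
        long ms = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            ms += Math.max(0, gc.getCollectionTime());   // -1 means "not available"
        }
        return ms;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();

        // Test phase: allocate some garbage so the collector has work to do.
        for (int i = 0; i < 1_000_000; i++) {
            byte[] garbage = new byte[128];
        }

        long after = totalGcMillis();
        System.out.println("GC time during phase [ms]: " + (after - before));   // analogous to the *-t columns
    }
}
```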
General messages column:
msg: General messages by index wrapper and test runner
The columns below belong to the old TinSpin 0.0.1 output format (the first output row, see above). In general:
- Index: Index and test descriptor, such as RSZ-R for rectangle index, see above
- data: Test data descriptor, such as CUBE(1.0,0.0,null), see above
- dim: number of data dimensions
- bits: number of bits (deprecated), always 64
- N: dataset size (number of points or rectangles)
- calcMem: Estimated memory requirement [bytes] per entry, only PH-Tree variants
- memory: Total measured JVM memory [bytes]
- memory/n: Total measured JVM memory [bytes per entry]
Timing. Most parts of the test are executed in two runs or more, each run consisting of a predefined number of executions. For example, each exact match run consists of 100,000 exact match queries as defined in the TestStats class. Except for load/unload, runs are repeated until at least 2 seconds (default) have passed; this is in order to give more precise timings for very short runs.
- gen: time [ms] for dataset generation
- load: total index loading time [ms]
- load/n: average loading time [micro-s/entry]
- q1/n: window query time run #1 [micro-s/returned entry]
- q2/n: window query time run #2 [micro-s/returned entry]
- pq1/n: exact match query time run #1 [micro-s/query] (was called point query)
- pq2/n: exact match query time run #2 [micro-s/query]
- up1/n: update time run #1 [micro-s/update]
- up2/n: update time run #2 [micro-s/update]
- 1-NN1: query time run #1 [micro-s/query]
- 1-NN2: query time run #2 [micro-s/query]
- 10-NN1: query time run #1 [micro-s/query]
- 10-NN2: query time run #2 [micro-s/query]
- unload: total index unloading time [ms]
- unload/n: average removal time [micro-s/entry]
Tree statistics. The following columns contain tree statistics, such as number of nodes or depth. The meaning may differ between trees.
- nodes: Number of nodes
- postLen: PH-tree: Average length of postfixes; R-Tree & Quadtrees: depth
- AHC: PH-tree: Number of AHC nodes
- NT: PH-tree: Number of NT-Nodes
- NTinternal: PH-tree: Number of NT-subnodes in all NT-Nodes
Result statistics. The following columns give an indicator of the result returned by the first test run, even if runs are repeated when they are faster than 2 seconds (default), see above. The counts should not vary much between runs, but using the first run allows comparing the result counts of different trees as a basic form of correctness testing.
- q1-n: Number of returned window query objects
- q2-n: Number of returned window query objects
- q1p-n: Number of found objects in point query
- q2p-n: Number of found objects in point query
- d1-1NN: Average distance of nearest neighbors
- d2-1NN: Average distance of nearest neighbors
- d1-kNN: Average of sum of distance of 10 nearest neighbors
- d2-kNN: Average of sum of distance of 10 nearest neighbors
- up1-n: Number of updated objects
- up2-n: Number of updated objects
- distCalc-n: Number of distance calculations for insert, query and deletion (see the counting sketch after this list)
- distCalc1NN-n: Number of distance calculations of 1NN queries
- distCalcKNN-n: Number of distance calculations of kNN queries
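Counters like these are typically collected by routing every distance computation through one helper that increments a counter. The sketch below shows the general idea; it is not the actual PH-tree or TinSpin code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of counting distance calculations by funnelling them through one helper (illustration only). */
public class DistanceCounter {
    private static final AtomicLong DIST_CALCS = new AtomicLong();

    /** Euclidean distance; every call is counted, analogous to the distCalc*-n statistics. */
    static double dist(double[] a, double[] b) {
        DIST_CALCS.incrementAndGet();
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        dist(new double[]{0, 0}, new double[]{3, 4});
        dist(new double[]{1, 1}, new double[]{2, 2});
        System.out.println("distance calculations: " + DIST_CALCS.get());   // prints 2
    }
}
```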
For each test part, the following columns contain garbage collection statistics based on Java instrumentation. They are a good indicator, but not precise! -s is the estimated memory [MB] freed up by GC; -t is the estimated time [ms] used by the GC.
- load-s: Estimated size of garbage collected memory [MB]
- load-t: Estimated runtime of garbage collector [ms]
- w-query-s
- w-query-t
- p-query-s
- p-query-t
- update-s
- update-t
- 1-NN-s
- 1-NN-t
- 10-NN-s
- 10-NN-t
- unload-s
- unload-t
General messages column:
- msg: General messages by index wrapper and test runner