athenarc/MinMaxCache
Supplemental material detailing the algorithms for error-bound calculation and query evaluation over MinMaxCache, as well as a detailed presentation of the user study conducted, can be found here.
Data used for the experiments can be found here.
This folder contains 2 sub-folders: one for the real datasets used in the experiments and one for the synthetic datasets. Each contains a notebook, named expand_data.ipynb and create_synth.ipynb respectively.
Required libraries to run both notebooks are: numpy, pandas and datetime.
Running expand_data.ipynb expands the original datasets 50 times and creates 3 datasets with the same names and the suffix "exp". Running create_synth.ipynb creates 11 synthetic time series datasets generated from random walks, named synthetic{1m-1b}.csv.
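As an illustration of what a random-walk time series looks like, here is a minimal sketch using only the Python standard library. It is not the code from create_synth.ipynb: the function names, the column names `timestamp` and `value`, the Gaussian step, and the one-second sampling interval are all assumptions made for the example.

```python
import csv
import io
import random
from datetime import datetime, timedelta


def random_walk_series(n_points, start, step, seed=0):
    """Yield (timestamp, value) pairs; value follows a Gaussian random walk.

    Hypothetical helper for illustration, not taken from the notebooks.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same series
    value = 0.0
    for i in range(n_points):
        value += rng.gauss(0.0, 1.0)  # each point is the previous one plus noise
        yield start + i * step, value


def write_series(rows, out):
    """Write the series as CSV; the column names are assumed, not prescribed."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "value"])
    for ts, v in rows:
        writer.writerow([ts.strftime("%Y-%m-%d %H:%M:%S"), f"{v:.4f}"])


if __name__ == "__main__":
    buf = io.StringIO()
    series = random_walk_series(5, datetime(2020, 1, 1), timedelta(seconds=1))
    write_series(series, buf)
    print(buf.getvalue())
```

Because the generator yields one point at a time, the same approach could in principle be streamed to a file for large sizes instead of being held in memory.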
First, build the JAR file:
mvn clean package
To execute a sequence of queries, e.g. using a table, run the following:
java -jar target/experiments.jar -seqCount 50 -measureChange 0 -type <influx, postgres> -measures <measure_ids> -timeCol <timeCol (if postgres)> -valueCol <valueCol (if postgres)> -idCol <idCol (if postgres)> -zoomFactor 2 -viewport <width,height> -runs 1 -out <output_folder_path> -minShift 0.1 -maxShift 0.5 -schema -table -timeFormat "yyyy-MM-dd[ HH:mm:ss.SSS]" -a 0.95 -q 0.1 -prefetchingFactor 1 -aggFactor 4
### Parameters:
- -seqCount *No. of queries*
- -type *Database, <influx, postgres>*
- -mode *Algorithm to run, <minMax, m4, raw>*
- -measures *Measure ids, e.g. 1,2,3*
- -timeCol *Name of time column (for postgres)*
- -valueCol *Name of value column (for postgres)*
- -idCol *Name of id column (for postgres)*
- -out *Output folder*
- -schema *Schema on the DB (on Influx it defines the bucket)*
- -table *Table name*
- -zoomFactor *Factor by which to zoom in and out*
- -viewPort *Width, height of the visualization viewport*
- -runs *No. of times to run the experiment*
- -minShift *Minimum pan shift*
- -maxShift *Maximum pan shift*
- -timeFormat *Time format of the time column*
- -a *Accuracy threshold*
- -q *Query selectivity*
- -prefetchingFactor *Prefetching factor*
- -aggFactor *Initial aggregation factor*
- (-queries) *Path to a csv file with predefined epoch-based queries. The first column is the start epoch and the second is the end epoch (e.g. the queries.txt file in the repository).*
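To make the two-column layout of a predefined queries file concrete, the sketch below generates a sequence of pan and zoom queries and formats each as `start,end`. Everything beyond the two-column layout is an assumption for illustration: the helper names, the use of integer epochs, the absence of a header row, and the way pans and zooms are derived from the shift and zoom factor. It does not reproduce how the experiments JAR generates its own query sequences.

```python
import random


def make_queries(start, end, count, min_shift=0.1, max_shift=0.5,
                 zoom_factor=2.0, seed=0):
    """Return a list of (start_epoch, end_epoch) tuples.

    Hypothetical generator: each query pans or zooms the previous window.
    """
    rng = random.Random(seed)
    queries = [(start, end)]
    for _ in range(count - 1):
        s, e = queries[-1]
        width = e - s
        op = rng.choice(["pan_left", "pan_right", "zoom_in", "zoom_out"])
        if op.startswith("pan"):
            # Assumed semantics: shift is a fraction of the current window width.
            shift = int(width * rng.uniform(min_shift, max_shift))
            if op == "pan_left":
                shift = -shift
            s, e = s + shift, e + shift
        else:
            # Assumed semantics: zoom rescales the window around its center.
            center = (s + e) // 2
            new_width = width / zoom_factor if op == "zoom_in" else width * zoom_factor
            half = max(1, int(new_width / 2))  # keep the window non-empty
            s, e = center - half, center + half
        queries.append((s, e))
    return queries


def to_csv_lines(queries):
    """Format queries as 'start,end' lines, one query per line."""
    return [f"{s},{e}" for s, e in queries]


if __name__ == "__main__":
    # Example epochs are arbitrary values chosen for the illustration.
    for line in to_csv_lines(make_queries(1_600_000_000_000, 1_600_003_600_000, 5)):
        print(line)
```

Before passing a generated file to -queries, compare it against the queries.txt file in the repository to confirm the expected epoch unit and whether a header row is required.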