# binance-LOB

Build your own historical Limit Order Book dataset.
This data recorder obtains the orderbook data stream from the Binance API (both the websocket diff depth stream and REST snapshots) and writes it into a ClickHouse instance, for fast analysis and for reconstruction of the local orderbook from the streamed data.

All price and quantity information is stored as `Float64`, since this project was written primarily for data exploration/machine learning purposes. This means price/quantity values are not exact and should not be used in a production environment.
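As a quick illustration of why `Float64` is unsuitable for exact accounting, binary floats cannot represent most decimal prices and quantities exactly:

```python
# Float64 cannot represent most decimal fractions exactly, so
# accumulated quantities drift away from their decimal value.
total = sum([0.1] * 10)
print(total)         # 0.9999999999999999, not 1.0
print(total == 1.0)  # False
```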
To start data collection, make sure `docker` and `docker-compose` are installed on the system. It should be possible to run without Docker, but this is untested.

Modify `config.json` to control the behavior of the data collection process:
```jsonc
{
    // symbols to track
    "symbols": [
        // Spot symbols
        "ethusdt",
        "btcusdt",
        "dogeusdt",
        // USD-M Futures need the 'USD_' prefix
        "USD_btcusdt",
        // COIN-M Futures need the 'COIN_' prefix
        "COIN_btcusd_perp"
    ],
    // interval between refetches of the full orderbook snapshot (seconds)
    "full_fetch_interval": 3600,
    // depth levels of the orderbook to fetch for each snapshot;
    // for details see the Binance API GET /api/v3/depth
    "full_fetch_limit": 1000,
    // update speed of the diff depth stream (milliseconds)
    "stream_interval": 100,
    // whether logging information is also written to the console
    "log_to_console": true,
    // default database name
    "db_name": "archive",
    // host name of the ClickHouse instance when running in Docker
    "host_name_docker": "clickhouse",
    // host name of the ClickHouse instance when not running in Docker
    "host_name_default": "localhost"
}
```
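For reference, here is a minimal sketch of reading this config from Python. The recorder has its own loading logic; `load_config` is a hypothetical helper, shown only to make the expected shape of the file concrete (it also tolerates `//` comment lines like the ones annotating the example above):

```python
import json
import re

def load_config(path: str = "config.json") -> dict:
    """Hypothetical helper: load the recorder config,
    stripping //-style full-line comments if present."""
    with open(path) as f:
        text = re.sub(r"^\s*//.*$", "", f.read(), flags=re.MULTILINE)
    return json.loads(text)

cfg = load_config()
print(cfg["symbols"], cfg["db_name"])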
This repo also includes a few utility functions for reconstructing the full orderbook from the diff depth stream; they live in `replay.py`. If you want to use them, also install the required libraries with `pip install -r requirements.txt`.
To start collecting data, first build the image for the main Python script with `docker-compose build`. Then start both the Python script and the ClickHouse instance with `docker-compose up`. ClickHouse ports `8123` (HTTP) and `9000` (native protocol) are exposed on `localhost`.
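Once the stack is up, you can sanity-check that rows are arriving. This sketch assumes the `clickhouse-driver` package (`pip install clickhouse-driver`); any ClickHouse client would work. It lists the tables in the `archive` database (the default `db_name` from the config) without assuming their exact names:

```python
from clickhouse_driver import Client

# Connect over the native protocol port exposed by docker-compose.
client = Client(host="localhost", port=9000)

# List the tables the recorder created, then count rows in each
# to confirm data is flowing.
tables = client.execute("SELECT name FROM system.tables WHERE database = 'archive'")
for (name,) in tables:
    count = client.execute(f"SELECT count() FROM archive.{name}")[0][0]
    print(name, count)
```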
To see how data are stored in the database, see `DepthSnapshot` and `DiffDepthStream` in `model.py`.
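As a rough idea of the shape of a diff-depth row, here is a hypothetical sketch in the style of `infi.clickhouse_orm` models; this is not the repo's actual definition, so consult `model.py` for the real schema. Binance diff depth events carry an update-id range (`U`/`u` in the payload) plus arrays of touched price/quantity levels:

```python
# Hypothetical sketch only -- the real schema lives in model.py.
from infi.clickhouse_orm import engines, fields, models

class DiffDepthStream(models.Model):
    timestamp = fields.DateTimeField()                     # event time
    symbol = fields.StringField()                          # e.g. "ETHUSDT"
    first_update_id = fields.UInt64Field()                 # U in the Binance payload
    final_update_id = fields.UInt64Field()                 # u in the Binance payload
    bids_price = fields.ArrayField(fields.Float64Field())  # bid levels touched
    bids_quantity = fields.ArrayField(fields.Float64Field())
    asks_price = fields.ArrayField(fields.Float64Field())
    asks_quantity = fields.ArrayField(fields.Float64Field())

    engine = engines.MergeTree(
        partition_key=("toYYYYMM(timestamp)",),
        order_by=("symbol", "timestamp"),
    )
```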
The orderbook is reconstructed by following the instructions on the Binance API page. To use the included replay functionality, first use `get_snapshots_update_ids(symbol)` to get the list of snapshot ids for the given symbol. Then use `orderbook_generator` (or `partial_orderbook_generator` if you are only interested in a partial orderbook) with `last_update_id` set to the id of the snapshot you want to reconstruct from, minus 1. Here is an example of how to use these generators:
```python
for r in orderbook_generator(0, "ETHUSDT", block_size=5000):
    # here last_update_id=0 is supplied to construct the orderbook
    # from the first available snapshot.
    # block_size should also be used if the database is really large
    ...
    # process your orderbook
```
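Putting the two functions together, here is a sketch of reconstructing from a specific snapshot rather than the first one (the `from replay import ...` path is an assumption based on the module name):

```python
from replay import get_snapshots_update_ids, orderbook_generator  # assumed import path

ids = get_snapshots_update_ids("ETHUSDT")  # update ids of all stored snapshots
start_id = ids[-1] - 1                     # snapshot id minus 1, per the rule above

# Replay from the most recent snapshot onward.
for r in orderbook_generator(start_id, "ETHUSDT", block_size=5000):
    ...  # process your orderbook
```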
The generator is exhausted when there is a gap in the diff depth stream (probably due to a lost connection while logging data), i.e. the previous `final_update_id + 1 != first_update_id`, or when there are no more diff stream events in the database. To skip the gap and start a new generator, simply reuse the `last_update_id` from the previous iteration.
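Here is a sketch of driving that resume logic in a loop. How `last_update_id` is read off the yielded value is an assumption; adapt it to the actual return type in `replay.py`:

```python
# Sketch: replay the whole recorded history, restarting after each gap.
last_update_id = 0
while True:
    progressed = False
    for r in orderbook_generator(last_update_id, "ETHUSDT", block_size=5000):
        last_update_id = r.last_update_id  # assumed attribute on the yielded orderbook
        progressed = True
        ...  # process your orderbook
    if not progressed:
        break  # generator yielded nothing new: end of the recorded stream
    # otherwise we hit a gap; restart from the same last_update_id
```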
See the documentation for details on how to use the replay modules.