A composable and fully extensible C++ execution engine library for data management systems.
Rijin-N/velox
Velox is a composable execution engine distributed as an open source C++ library. It provides reusable, extensible, and high-performance data processing components that can be (re-)used to build data management systems focused on different analytical workloads, including batch, interactive, stream processing, and AI/ML. Velox was created by Meta and is currently developed in partnership with IBM/Ahana, Intel, Voltron Data, Microsoft, ByteDance, and many other companies.
In common usage scenarios, Velox takes a fully optimized query plan as input and performs the described computation. Because Velox does not provide a SQL parser, a dataframe layer, or a query optimizer, it is usually not meant to be used directly by end users; rather, it is mostly used by developers integrating and optimizing their compute engines.
Velox provides the following high-level components:
- Type: a generic typing system that supports scalar, complex, and nested types, such as structs, maps, arrays, etc.
- Vector: an Arrow-compatible columnar memory layout module, providing encodings such as Flat, Dictionary, Constant, and Sequence/RLE, in addition to a lazy materialization pattern and support for out-of-order writes.
- Expression Eval: a fully vectorized expression evaluation engine that allows expressions to be efficiently executed on top of Vector/Arrow encoded data.
- Functions: sets of vectorized scalar, aggregate, and window function implementations following Presto and Spark semantics.
- Operators: implementation of relational operators such as scans, writes, projections, filtering, grouping, ordering, shuffle/exchange, hash, merge, and nested loop joins, unnest, and more.
- I/O: a connector interface for extensible data sources and sinks, supporting different file formats (ORC/DWRF, Parquet, Nimble) and storage adapters (S3, HDFS, GCS, ABFS, local files).
- Network Serializers: an interface where different wire protocols can be implemented, used for network communication, supporting PrestoPage and Spark's UnsafeRow.
- Resource Management: a collection of primitives for handling computational resources, such as memory arenas and buffer management, tasks, drivers, and thread pools for CPU and thread execution, spilling, and caching.
Velox is extensible and allows developers to define their own engine-specific specializations, including:
- Custom types
- Simple and vectorized functions
- Aggregate functions
- Window functions
- Operators
- File formats
- Storage adapters
- Network serializers
Examples of extensibility and integration with different component APIs can be found here.
Developer guides detailing many aspects of the library, in addition to the list of available functions, can be found here.
Blog posts are available here.
Velox is an open source project supported by a community of individual contributors and organizations. The project's technical governance mechanics are described in this document.
Project maintainers are listed here.
The main communication channels with the Velox OSS community are the Velox-OSS Slack workspace, GitHub Issues, and Discussions.
For access to the Velox Slack workspace, please add a comment to this Discussion.
Check our contributing guide to learn how to contribute to the project.
Velox is licensed under the Apache 2.0 License. A copy of the license can be found here.
git clone https://github.com/facebookincubator/velox.git
cd velox
Once Velox is checked out, the first step is to install the dependencies. Details on the dependencies and how Velox manages some of them for you can be found here.
Velox also provides the following scripts to help developers set up and install Velox dependencies for a given platform.
The following setup scripts use the DEPENDENCY_DIR environment variable to set the location to download and build packages. This defaults to deps-download in the current working directory.
Use INSTALL_PREFIX to set the install directory of the packages. This defaults to deps-install in the current working directory on macOS and to the default install location (e.g., /usr/local) on Linux. Using the default install location /usr/local on macOS is discouraged because this location is used by certain Homebrew versions.
Manually add the INSTALL_PREFIX value to the IDE or shell environment, for example by adding export INSTALL_PREFIX=/Users/$USERNAME/velox/deps-install to ~/.zshrc, so that subsequent Velox builds can use the installed packages.
You can reuse DEPENDENCY_DIR and INSTALL_PREFIX for Velox clients such as Prestissimo by specifying a common shared directory.
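A minimal sketch of the shared-directory setup described above, assuming a POSIX shell. The ~/velox-shared location and the SHARED_DIR variable are hypothetical examples; only DEPENDENCY_DIR and INSTALL_PREFIX come from the text. The script only exports and prints the values; you would then run the setup script for your platform from the same shell.

```shell
#!/bin/sh
# Use one shared directory for downloaded/built sources (DEPENDENCY_DIR) and
# installed packages (INSTALL_PREFIX) so Velox and a client such as
# Prestissimo can reuse the same dependencies.
# SHARED_DIR and its location are hypothetical; pick any path you like.
SHARED_DIR="${HOME}/velox-shared"
DEPENDENCY_DIR="$SHARED_DIR/deps-download"
INSTALL_PREFIX="$SHARED_DIR/deps-install"
export DEPENDENCY_DIR INSTALL_PREFIX

echo "DEPENDENCY_DIR=$DEPENDENCY_DIR"
echo "INSTALL_PREFIX=$INSTALL_PREFIX"

# Line to append to ~/.zshrc so later builds pick up the installed packages:
echo "export INSTALL_PREFIX=$INSTALL_PREFIX"

# Next, from the Velox checkout in this same shell, run the platform script,
# e.g. ./scripts/setup-macos.sh (not executed by this sketch).
```

Running the platform script from a shell where both variables are exported lets the setup script download into, and install to, the shared directory.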
On a macOS machine (either Intel or Apple silicon), you can set up and then build like so:
$ ./scripts/setup-macos.sh
$ make
With macOS 14.4 and Xcode 15.3, where m4 is missing, you can either:
- install m4 via brew:
  $ brew install m4
  $ export PATH=/opt/homebrew/opt/m4/bin:$PATH
- or use gm4 instead:
  $ M4=/usr/bin/gm4 make
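The two workarounds above can be combined into a small check that reports which m4 to use before building. This is a sketch assuming a POSIX shell on macOS; the helper name pick_m4 and the order of the checks are illustrative choices, not part of Velox's scripts.

```shell
#!/bin/sh
# Hypothetical helper: print the path of a usable m4, preferring one already
# on PATH, then Homebrew's m4, then /usr/bin/gm4. Returns 1 if none is found.
pick_m4() {
  if command -v m4 >/dev/null 2>&1; then
    command -v m4
  elif [ -x /opt/homebrew/opt/m4/bin/m4 ]; then
    echo /opt/homebrew/opt/m4/bin/m4
  elif [ -x /usr/bin/gm4 ]; then
    echo /usr/bin/gm4
  else
    return 1
  fi
}

if M4_BIN=$(pick_m4); then
  echo "Using m4 at $M4_BIN"
  # Then build with: M4="$M4_BIN" make
else
  echo "m4 not found; install it with: brew install m4" >&2
fi
```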
The supported architectures are x86_64 (avx, sse) and AArch64 (apple-m1+crc, neoverse-n1). On Ubuntu, you can set up and build like so:
$ ./scripts/setup-ubuntu.sh
$ make
Velox adapters include file systems such as AWS S3, Google Cloud Storage, and Azure Blob File System. These adapters require installation of additional libraries. Once you have checked out Velox, you can set up and build like so:
$ ./scripts/setup-centos9.sh
$ ./scripts/setup-adapters.sh
$ make
Note that setup-adapters.sh supports macOS and Ubuntu 20.04 or later.
Clang 15 can additionally be installed during the setup step for Ubuntu 22.04/24.04 and CentOS 9 by setting the USE_CLANG environment variable before running the platform-specific setup script.
$ export USE_CLANG=true
This will install and use Clang 15 to build the dependencies instead of using the default GCC compiler.
Once completed, and before running any make command, set the compiler to be used:
$ export CC=/usr/bin/clang-15
$ export CXX=/usr/bin/clang++-15
$ make
Run make in the root directory to compile the sources. For development, use make debug to build a non-optimized debug version, or make release to build an optimized version. Use make unittest to build and run tests.
Note that:
- Velox requires at least GCC 11.0 or Clang 15.0.
- Velox requires the CPU to support the following instruction sets:
  - bmi
  - bmi2
  - f16c
- Velox tries to use the following (or equivalent) instruction sets where available:
  - On Intel CPUs
    - avx
    - avx2
    - sse
  - On ARM
    - Neon
    - Neon64
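To check a machine against the instruction-set list above before building, you can inspect the CPU flags the kernel reports. This is a minimal sketch assuming Linux on x86_64 with /proc/cpuinfo; note that the kernel names BMI1 "bmi1", so that name stands in for "bmi". ARM machines expose a "Features" line instead of "flags" and are not covered here. The helper names has_flag and report are illustrative.

```shell
#!/bin/sh
# Read the first "flags" line of /proc/cpuinfo (empty if the file is absent).
CPU_FLAGS=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)

# Succeed if the exact flag name appears as a space-separated token.
has_flag() {
  case " $CPU_FLAGS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print "<flag>: yes" or "<flag>: no" for each argument.
report() {
  for f in "$@"; do
    if has_flag "$f"; then echo "$f: yes"; else echo "$f: no"; fi
  done
}

echo "Required:"
report bmi1 bmi2 f16c
echo "Optional (used when available):"
report avx avx2 sse
```

Matching whole tokens (rather than substrings) keeps "sse" from being satisfied by "sse2" alone.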
Build metrics for Velox are published at https://facebookincubator.github.io/velox/bm-report/
If you don't want to install the system dependencies required to build Velox, you can also build and run tests for Velox in a docker container using docker-compose. Use the following commands:
$ docker-compose build ubuntu-cpp
$ docker-compose run --rm ubuntu-cpp
If you want to increase or decrease the number of threads used when building Velox, you can override the NUM_THREADS environment variable:
$ docker-compose run -e NUM_THREADS=<NUM_THREADS_TO_USE> --rm ubuntu-cpp
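Rather than hard-coding a thread count, you could derive NUM_THREADS from the host's CPU count. This is a sketch assuming a POSIX shell; getconf _NPROCESSORS_ONLN works on both Linux and macOS, while the "leave one core free" policy and the BUILD_THREADS variable are illustrative assumptions. The container is launched only from a checkout that contains a compose file.

```shell
#!/bin/sh
# Number of online CPUs, falling back to 1 if getconf is unavailable.
CPUS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Leave one core free for the rest of the system, but use at least one thread.
BUILD_THREADS=$((CPUS > 1 ? CPUS - 1 : 1))
echo "Building with NUM_THREADS=$BUILD_THREADS"

# Pass the value into the container as in the command above.
if [ -f docker-compose.yml ] && command -v docker-compose >/dev/null 2>&1; then
  docker-compose run -e NUM_THREADS="$BUILD_THREADS" --rm ubuntu-cpp
fi
```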