Dingo
Dingo products, designed and developed by DataCanvas, comprise a range of innovative solutions, including DingoDB, DingoFS and DingoSpeed. Each product delivers unique features and serves distinct application scenarios. Below is a detailed overview.
📝DingoDB | 📂DingoFS | ⚡DingoSpeed
DingoDB is an open-source, distributed, multi-modal vector database that integrates real-time strong consistency, relational semantics, and vector semantics into a unified platform, positioning itself as a distinctive multi-modal database solution. With strong horizontal scalability and elastic scaling, it meets enterprise-grade high-availability requirements.
Key Features
✔️Comprehensive access interfaces
DingoDB provides comprehensive access interfaces, supporting flexible access modes such as SQL, SDK, and API to meet the needs of different developers. It also treats Table and Vector as first-class data models, giving users efficient and powerful data processing capabilities.
✔️Built-in high availability
DingoDB ships fully functional, highly available built-in configurations that require no external components. This significantly reduces deployment and operation and maintenance (O&M) costs and improves O&M efficiency.
✔️Fully automatic elastic data sharding
DingoDB supports dynamic configuration of the data shard size and automatic shard splitting and merging, enabling efficient resource allocation and making it easy to respond to business growth.
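To make the split-and-merge idea concrete, here is a minimal Python sketch of a size-based rebalancing policy. It is not DingoDB's implementation: the `Shard` type, the `rebalance` function, the two thresholds, and the midpoint split are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    # Hypothetical shard model: half-open key range [start, end) plus its data size.
    start: int
    end: int
    size_mb: float


def rebalance(shards: List[Shard], split_threshold: float,
              merge_threshold: float) -> List[Shard]:
    """Split oversized shards, then merge small neighbours.

    Assumes `shards` are sorted, contiguous, and that data is spread evenly
    over each key range, so a split halves the size. Keep merge_threshold
    below split_threshold so freshly split halves are not merged back.
    """
    # Pass 1: split at the key midpoint until every shard fits the threshold.
    pending = list(shards)
    split: List[Shard] = []
    while pending:
        s = pending.pop(0)
        if s.size_mb > split_threshold and s.end - s.start > 1:
            mid = (s.start + s.end) // 2
            half = s.size_mb / 2
            # Put both halves back at the front so key order is preserved.
            pending[0:0] = [Shard(s.start, mid, half), Shard(mid, s.end, half)]
        else:
            split.append(s)

    # Pass 2: merge adjacent shards while their combined size stays small.
    merged: List[Shard] = []
    for s in split:
        if merged and merged[-1].size_mb + s.size_mb <= merge_threshold:
            prev = merged[-1]
            merged[-1] = Shard(prev.start, s.end, prev.size_mb + s.size_mb)
        else:
            merged.append(s)
    return merged
```

For example, `rebalance([Shard(0, 100, 300), Shard(100, 200, 10), Shard(200, 300, 20)], split_threshold=128, merge_threshold=64)` splits the 300 MB shard into four 75 MB shards and merges the two small ones into `Shard(100, 300, 30)`.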
✔️Scalar-vector hybrid retrieval
DingoDB supports both traditional database index types and various vector index types, providing seamless hybrid retrieval over scalar and vector data. It also supports fused retrieval across scalars, vectors, and text, as well as distributed transaction processing.
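Conceptually, a scalar-vector hybrid query combines an attribute filter with similarity ranking. The brute-force Python sketch below shows that combination on in-memory data. It does not use DingoDB's SDK or SQL syntax; the `Item` record and the `hybrid_search` function are hypothetical, and a real engine would use scalar and vector indexes instead of a linear scan.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Item:
    id: int
    attrs: Dict[str, object]  # scalar columns, e.g. {"category": "book"}
    vector: List[float]       # embedding


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(items: List[Item], query: List[float],
                  predicate: Callable[[Dict[str, object]], bool],
                  k: int) -> List[Tuple[int, float]]:
    # Step 1 (scalar): keep only rows whose attributes satisfy the filter.
    candidates = [it for it in items if predicate(it.attrs)]
    # Step 2 (vector): rank the survivors by cosine similarity to the query.
    scored = [(it.id, cosine(it.vector, query)) for it in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With a filter such as `lambda a: a["category"] == "book"`, an item tagged `"toy"` is excluded even if its vector matches the query exactly, which is the behaviour a pure vector search would not give.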
✔️Built-in real-time index optimization
DingoDB can build scalar and vector indexes in real time and optimizes them automatically in the background, transparently to users, without delaying data retrieval.
✔️Cold-Hot Tiered Retrieval for Massive Datasets
DingoDB provides disk-based vector search capabilities to minimize memory consumption, and supports dynamic switching between different indexes based on data scale requirements.
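The memory side of this trade-off can be estimated with simple arithmetic: raw float32 vectors alone need roughly n × d × 4 bytes, before any index overhead. The sketch below uses that estimate to choose between an in-memory and a disk-based index. The index labels, the memory budget, the `overhead_factor`, and the `choose_index` helper are illustrative assumptions, not DingoDB's actual switching rule.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (float32 by default), excluding index overhead."""
    return num_vectors * dim * bytes_per_value


def choose_index(num_vectors: int, dim: int, memory_budget_bytes: int,
                 overhead_factor: float = 1.5) -> str:
    """Pick an index family from a rough memory estimate (illustrative rule only)."""
    # overhead_factor is a hypothetical allowance for graph or list structures.
    estimated = raw_vector_bytes(num_vectors, dim) * overhead_factor
    return "in-memory" if estimated <= memory_budget_bytes else "disk-based"


GIB = 2 ** 30
print(raw_vector_bytes(10_000_000, 768))       # 30720000000
print(choose_index(10_000_000, 768, 16 * GIB))  # disk-based
print(choose_index(1_000_000, 768, 16 * GIB))   # in-memory
```

For example, 10 million 768-dimensional float32 vectors take 10,000,000 × 768 × 4 = 30,720,000,000 bytes (about 30.7 GB) before overhead, which already exceeds a 16 GiB budget, so the sketch falls back to the disk-based option.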
Get Started
- All Documentation: DingoDB Docs
- Usage: How to use DingoDB (Usage)
Developing DingoDB
- VS Code: We recommend VS Code for developing the DingoDB codebase.
DingoFS is a cloud-native, distributed, high-speed file storage system that combines elasticity, multi-cloud compatibility, multi-protocol convergence, and exceptional performance. Its high-performance distributed caching architecture, which spans multiple tiers and cache types, accelerates data I/O for AI workflows and effectively absorbs the burst I/O typical of AI scenarios.
Key Features
✔️POSIX Compliance
DingoFS delivers a native file system-like operational experience, enabling seamless system integration.
✔️AI-Native Architecture
Deeply optimized for large language model (LLM) workflows, efficiently managing massive training datasets and checkpoint workloads.
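Because DingoFS is POSIX-compliant, standard file APIs should work on a mounted DingoFS volume without changes. As an illustration for checkpoint workloads, the sketch below writes a checkpoint atomically: it writes to a temporary file, calls `fsync`, and renames the file into place with `os.replace`. The mount point `/mnt/dingofs` is a hypothetical path, and the pattern relies on standard POSIX rename semantics rather than any DingoFS-specific API.

```python
import os
import tempfile


def save_checkpoint(data: bytes, path: str) -> None:
    """Write `data` to `path` so readers never see a half-written checkpoint.

    `path` is assumed to be on a POSIX-compliant mount, for example a
    hypothetical DingoFS mount at /mnt/dingofs/ckpt/step_1000.bin.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # persist the bytes before publishing the file
        os.replace(tmp_path, path)  # atomic rename under POSIX semantics
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A training loop could call `save_checkpoint(serialized_weights, "/mnt/dingofs/ckpt/step_1000.bin")`; a concurrent reader then sees either the previous checkpoint or the complete new one, never a partial file.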
✔️S3 Protocol Compatibility
DingoFS supports standard S3 interface protocols for streamlined access to filesystem namespace resources.
✔️Fully Distributed Architecture
DingoFS's Metadata Service (MDS), data storage layer, caching system, and client components all scale linearly.
✔️Exceptional Performance
Combines SSD-level low-latency responsiveness with object storage-grade elastic throughput capacity.
✔️Intelligent Caching Acceleration System
DingoFS implements a three-tier caching topology (memory / local SSD / distributed cluster) to deliver high-throughput, low-latency intelligent I/O acceleration for AI workloads.
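The lookup path through such a hierarchy can be sketched as: check the fastest tier first, fall through to slower tiers, and promote data upward on a hit. The Python below models the three tiers as small LRU maps in front of a backing-store callback. The `TieredCache` class, the per-tier entry capacities, and the promote-on-hit policy are assumptions for illustration, not DingoFS's cache implementation.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

TIER_NAMES = ("memory", "local-ssd", "distributed")


class TieredCache:
    """Simulated memory -> local SSD -> distributed cache in front of object storage."""

    def __init__(self, capacities: List[int], fetch_from_storage: Callable[[str], bytes]):
        if len(capacities) != len(TIER_NAMES):
            raise ValueError("expected one capacity per tier")
        # One LRU map per tier, fastest first; capacities are entry counts.
        self.tiers = [OrderedDict() for _ in capacities]
        self.capacities = capacities
        self.fetch_from_storage = fetch_from_storage

    def _put(self, level: int, key: str, value: bytes) -> None:
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > self.capacities[level]:
            tier.popitem(last=False)  # evict this tier's least recently used entry

    def get(self, key: str) -> Tuple[bytes, str]:
        """Return (value, where_it_was_found)."""
        for level, tier in enumerate(self.tiers):
            if key in tier:
                tier.move_to_end(key)
                value = tier[key]
                # Promote into every faster tier so the next read is cheaper.
                for upper in range(level):
                    self._put(upper, key, value)
                return value, TIER_NAMES[level]
        # Miss in all tiers: read from object storage and fill every tier.
        value = self.fetch_from_storage(key)
        for level in range(len(self.tiers)):
            self._put(level, key, value)
        return value, "storage"
```

With capacities `[1, 2, 4]`, reading `"a"`, `"a"`, `"b"`, `"a"` is served from storage, memory, storage, and then the local SSD tier (because `"b"` evicted `"a"` from the one-entry memory tier), so object storage is contacted only twice.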
Get Started
- All Documentation: DingoFS Docs
1. Set up Dingo-eureka and Dingo-sdk: If you installed the software using a Docker container, the container already includes pre-integrated Dingo-eureka and Dingo-sdk, so no additional installation is required.
2. Install jemalloc
wget https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2
tar -xjvf jemalloc-5.3.0.tar.bz2
cd jemalloc-5.3.0 && ./configure && make && make install
3. Download dependencies
git submodule sync
git submodule update --init --recursive
4. Build Etcd Client
bash build_thirdparties.sh
5. Build
mkdir build
cd build
cmake ..
make -j 32
Developing DingoFS
- Install Dependencies: We recommend Rocky Linux or Ubuntu for developing the DingoFS codebase.
- GCC 13: We recommend GCC 13 as the primary compiler.
DingoSpeed is a self-hosted Hugging Face mirror service that provides a convenient and efficient way to access and manage model resources. With a local mirror, users can reduce their reliance on remote Hugging Face servers, speed up resource downloads, and store and manage data locally.
Key Features
✔️ Mirror Acceleration
Resources are cached the first time they are downloaded. Subsequent client requests are served from the cache, greatly improving download speed.
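The first-download-then-serve-from-cache flow can be sketched as a read-through cache on local disk. In the Python below, `fetch_upstream` stands in for a request to the Hugging Face Hub, and the `<cache_root>/<repo_id>/<revision>/<filename>` layout and the `read_through` helper are hypothetical; this is not DingoSpeed's actual code or on-disk format, and a real service would also validate paths and handle concurrent requests.

```python
import os
from typing import Callable


def read_through(cache_root: str, repo_id: str, revision: str, filename: str,
                 fetch_upstream: Callable[[str, str, str], bytes]) -> bytes:
    """Return file bytes, contacting upstream only on a cache miss."""
    # Hypothetical layout: <cache_root>/<repo_id>/<revision>/<filename>
    local_path = os.path.join(cache_root, repo_id, revision, filename)
    if os.path.exists(local_path):
        with open(local_path, "rb") as f:
            return f.read()  # cache hit: served locally, no upstream traffic
    data = fetch_upstream(repo_id, revision, filename)  # cache miss: download once
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    tmp_path = local_path + ".part"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, local_path)  # publish the file only once it is complete
    return data
```

Requesting the same `config.json` twice calls `fetch_upstream` only once; the second request is answered from local disk.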
✔️ Convenient Access
There is no need for a VPN or complex network configuration. Simply deploy the DingoSpeed service and use it as the endpoint address to complete downloads.
✔️ Traffic Reduction and Load Alleviation
Download once, use many times: avoiding repeated downloads of the same files saves bandwidth and reduces load on upstream servers.
✔️ Localized Management
Covers the full lifecycle of the mirror service, from local compilation and deployment to monitoring and usage, for a flexible experience. By avoiding reliance on external networks and public mirror repositories, it significantly improves system response speed and data security.
Installation
The project uses the wire command to generate the required dependency code. Install the wire command as follows:
# Import into the project
go get -u github.com/google/wire
# Install the command
go install github.com/google/wire/cmd/wire
Wire is a compile-time dependency injection tool: it automatically generates the code that initializes components and connects them to their dependencies. Dependencies between components are usually set up through explicit initialization rather than global variables, so generating that initialization code with Wire reduces coupling between components and improves maintainability.
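Wire itself is Go-only and generates code at compile time. Purely as a language-neutral illustration of what that generated code does, the Python sketch below uses explicit constructor injection instead of global state: each provider receives its dependencies as arguments, and a single injector function calls the providers in dependency order, which is roughly the role of the function Wire writes for you. The `Config`, `Storage`, `Server`, and `initialize_server` names are hypothetical and do not correspond to DingoSpeed's code.

```python
from dataclasses import dataclass


@dataclass
class Config:
    cache_dir: str
    port: int


class Storage:
    def __init__(self, config: Config):
        # The dependency arrives through the constructor, not a global variable.
        self.root = config.cache_dir


class Server:
    def __init__(self, config: Config, storage: Storage):
        self.port = config.port
        self.storage = storage


# Providers: small functions that each build one component from its inputs.
def provide_config() -> Config:
    return Config(cache_dir="./repos", port=8090)


def provide_storage(config: Config) -> Storage:
    return Storage(config)


def provide_server(config: Config, storage: Storage) -> Server:
    return Server(config, storage)


# Injector: the kind of wiring function Wire would generate in Go.
def initialize_server() -> Server:
    config = provide_config()
    storage = provide_storage(config)
    return provide_server(config, storage)
```

Because every component receives its dependencies explicitly, a test can build a `Server` with a different `Config` without touching any shared global state.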
This project uses Go modules to manage dependencies and requires Go 1.23 or later. It is built with a Makefile, so the make command must be available.
# 1. Install dependencies
make init
# 2. Generate code
make wire
# 3. Compile the executable file for the current system version
make build
# 4. Compile the Linux executable file on macOS
make macbuild
# 5. Add a license to each file
make license
Quick Start
💡 Deploy the compiled binary and run ./dingo-hfmirror to start the service. Then set the environment variable HF_ENDPOINT to the mirror site (here, http://localhost:8090/).
Linux:
export HF_ENDPOINT=http://localhost:8090
Windows PowerShell:
$env:HF_ENDPOINT = "http://localhost:8090"
From now on, all downloads made through the Hugging Face libraries will be proxied through this mirror site. You can install the Python library to try it out:
pip install -U huggingface_hub
from huggingface_hub import snapshot_download
snapshot_download(repo_id='Qwen/Qwen-7B', repo_type='model', local_dir='./model_dir', resume_download=True, max_workers=8)
Alternatively, you can use the Hugging Face CLI to download models and datasets directly. Download GPT2:
huggingface-cli download --resume-download openai-community/gpt2 --local-dir gpt2
Download a single file:
huggingface-cli download --resume-download --force-download HuggingFaceTB/SmolVLM-256M-Instruct config.json
Download WikiText:
huggingface-cli download --repo-type dataset --resume-download Salesforce/wikitext --local-dir wikitext
The caches of all downloaded datasets and models are stored under ./repos.
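If exporting a shell variable is inconvenient (for example in a notebook), the same redirection can be done from Python by setting HF_ENDPOINT in the process environment. This sketch assumes the mirror runs at http://localhost:8090 as in the Quick Start; the helper names `use_mirror` and `download_gpt2_config` are hypothetical. Since huggingface_hub reads HF_ENDPOINT when it is first imported, the variable should be set before that import happens in the same process.

```python
import os

# Assumption: the DingoSpeed mirror from the Quick Start is listening here.
MIRROR_URL = "http://localhost:8090"


def use_mirror(url: str = MIRROR_URL) -> str:
    """Route this process's Hugging Face downloads through the mirror.

    Call this before any `import huggingface_hub` in the same process,
    because the library reads HF_ENDPOINT at import time.
    """
    os.environ["HF_ENDPOINT"] = url.rstrip("/")
    return os.environ["HF_ENDPOINT"]


def download_gpt2_config() -> str:
    use_mirror()
    # Imported here on purpose, after HF_ENDPOINT has been set.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id="openai-community/gpt2", filename="config.json")
```

Calling `download_gpt2_config()` returns the local path of the downloaded `config.json`, fetched through the mirror rather than directly from huggingface.co.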
DingoDB, DingoFS, and DingoSpeed are sponsored by DataCanvas, a new platform for real-time data science and data processing. We welcome any feedback from the community.
Repositories
- dingo
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
- dingo-sdk
- dingo-store-proto
- dingofs
DingoFS is a project forked from Curve. Curve is a sandbox project hosted by the CNCF. It is cloud-native, high-performance, and easy to operate. Curve is an open-source distributed storage system for block and shared file storage.
- dingofs-tools
- dingo-expr
- kolla-ansible
- dingo-store
- brpc (forked from apache/brpc)
brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertising, and recommendation. "brpc" means "better RPC".
- dingo-libexpr
