CLIP-as-service is a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.
⚡Fast: Serve CLIP models with TensorRT, ONNX runtime and PyTorch w/o JIT with 800QPS[*]. Non-blocking duplex streaming on requests and responses, designed for large data and long-running tasks.
🫐Elastic: Horizontally scale multiple CLIP models up and down on a single GPU, with automatic load balancing.
🐥Easy-to-use: No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding.
👒Modern: Async client support. Easily switch between gRPC, HTTP, WebSocket protocols with TLS and compression.
🍱Integration: Smooth integration with the neural search ecosystem, including Jina and DocArray. Build cross-modal and multi-modal solutions in no time.
[*] with default config (single replica, PyTorch no JIT) on GeForce RTX 3090.
Make sure you are using Python 3.7+. You can install the client and server independently. It is not required to install both: e.g. you can install clip_server on a GPU machine and clip_client on a local laptop.
pip install clip-client
pip install clip-server
pip install "clip_server[onnx]"
pip install nvidia-pyindex
pip install "clip_server[tensorrt]"
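Because the client and server are separate packages, it can help to confirm which of them is present on a given machine. The snippet below is a minimal sketch and not part of CLIP-as-service: `installed_version` is a hypothetical helper, and it relies on the standard-library `importlib.metadata` module, which needs Python 3.8 or newer.

```python
import sys
from importlib import metadata  # standard library, Python 3.8+


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # This guide asks for Python 3.7+.
    print("Python OK:", sys.version_info >= (3, 7))
    # Distribution names as used in the pip commands above.
    for name in ("clip-client", "clip-server"):
        print(name, "->", installed_version(name) or "not installed")
```

On a laptop that only has the client installed, the second line would be expected to report "not installed", which is fine since the two packages are independent.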
After installing, you can run the following commands for a quick connectivity check.
python -m clip_server
python -m clip_server onnx-flow.yml
python -m clip_server tensorrt-flow.yml
The first time you start the server, it downloads the default pretrained model, which may take a while depending on your network speed. You will then see address information similar to the following:
╭────────────── 🔗 Endpoint ───────────────╮
│  🔗  Protocol                      GRPC  │
│  🏠  Local                0.0.0.0:51000  │
│  🔒  Private        192.168.31.62:51000  │
│  🌍  Public        87.105.159.191:51000  │
╰──────────────────────────────────────────╯
This means the server is ready to serve. Note down the three addresses shown above; you will need them later.
Tip
Depending on where the client and server are located, you may need a different IP address:
Client and server are on the same machine: use the local address, e.g. 0.0.0.0
Client and server are connected to the same router: use the private network address, e.g. 192.168.31.62
Server is on a public network: use the public network address, e.g. 87.105.159.191
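The address choice described in this tip can be sketched as a small lookup, together with a plain TCP probe that is handy when a connection fails. This is an illustrative sketch using only the standard library: `ADDRESSES`, `server_url` and `port_is_open` are hypothetical names, and the IP addresses are the examples from the endpoint box, so substitute your own. The probe only shows that something accepts connections on the port; it does not verify that a CLIP server is behind it.

```python
import socket

# Example addresses copied from the endpoint box; replace with your own.
ADDRESSES = {
    "same_machine": "0.0.0.0",
    "same_router": "192.168.31.62",
    "public": "87.105.159.191",
}


def server_url(scenario, port=51000, scheme="grpc"):
    """Build the URL handed to the client for a given deployment scenario."""
    return f"{scheme}://{ADDRESSES[scenario]}:{port}"


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print(server_url("same_router"))
    print(port_is_open(ADDRESSES["same_router"], 51000))
```

If the probe returns False for the private address but True for the local one, the cause is more likely a firewall or routing problem than the server itself.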
Run the following Python script:
from clip_client import Client
c = Client('grpc://0.0.0.0:51000')
c.profile()
This will give you:
 Roundtrip  16ms  100%
├──  Client-server network  8ms  49%
└──  Server  8ms  51%
    ├──  Gateway-CLIP network  2ms  25%
    └──  CLIP model  6ms  75%
{'Roundtrip': 15.684750003856607, 'Client-server network': 7.684750003856607, 'Server': 8, 'Gateway-CLIP network': 2, 'CLIP model': 6}
It means the client and the server are now connected. Well done!
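To read the profile output, it helps to see where the percentages come from. In the sample numbers, each percentage appears to be a share of the parent row: the top-level rows are relative to the roundtrip, and the nested rows are relative to the server time. That reading is inferred from the sample output, not from the library's source. The sketch below recomputes the percentages from the returned dictionary; `share` is a hypothetical helper.

```python
# Dictionary copied from the sample profile output (times in milliseconds).
profile = {
    "Roundtrip": 15.684750003856607,
    "Client-server network": 7.684750003856607,
    "Server": 8,
    "Gateway-CLIP network": 2,
    "CLIP model": 6,
}


def share(part, whole):
    """Percentage of `whole` taken by `part`, rounded to the nearest integer."""
    return round(100 * part / whole)


# Top level: the roundtrip splits into network time and server time.
network_pct = share(profile["Client-server network"], profile["Roundtrip"])
server_pct = share(profile["Server"], profile["Roundtrip"])

# Second level: server time splits into the gateway hop and model inference.
gateway_pct = share(profile["Gateway-CLIP network"], profile["Server"])
model_pct = share(profile["CLIP model"], profile["Server"])

print(network_pct, server_pct, gateway_pct, model_pct)  # 49 51 25 75
```

In this sample, roughly half of the latency is spent on the network between client and server, so moving the client closer to the server would help about as much as speeding up the model.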
Join our Discord community and chat with other community members about ideas.
Watch our Engineering All Hands to learn about Jina's new features and stay up-to-date with the latest AI techniques.
Subscribe to the latest video tutorials on our YouTube channel.
CLIP-as-service is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.