This repository was archived by the owner on Jul 4, 2025. It is now read-only.

feat: vLLM backend #2010

Draft

gau-nernst wants to merge 93 commits into dev from thien/python_engine

Conversation

@gau-nernst (Contributor) commented on Feb 21, 2025 (edited)

Describe Your Changes

High-level design

  • vLLM is an inference engine designed for large-scale serving (many GPUs)
  • cortex will spawn a vLLM subprocess and route requests to it (see the sketch below)
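
A minimal sketch of the routing idea, assuming the spawned `vllm serve` process exposes its OpenAI-compatible API on localhost (vLLM defaults to port 8000 and serves `/v1/chat/completions`); the helper name and the model name are illustrative only, not code from this PR:

```python
import json
import urllib.request

# Assumed address of the spawned `vllm serve` process (vLLM's default port is 8000).
VLLM_URL = "http://127.0.0.1:8000/v1/chat/completions"

def route_chat_completion(body: dict) -> dict:
    """Forward an OpenAI-style chat completion body to vLLM and return its JSON reply."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    reply = route_chat_completion({
        "model": "Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello"}],
    })
    print(reply["choices"][0]["message"]["content"])
```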

cortex engines install vllm

  • Download uv to cortexcpp/python_engines/bin/uv if uv is not installed
  • (via uv) Set up a venv at cortexcpp/python_engines/envs/vllm/<version>/.venv
  • (via uv) Download vllm and its deps (see the sketch after the note below)
  • Known issues:
    • Progress streaming is not supported (since the download is done via uv instead of DownloadService).
    • It's not async, since we need to wait for the subprocess to finish (perhaps we will need a new SubprocessService in the future which handles async WaitProcess()).
    • Hence, stopping and resuming the download also does not work.

Note:

  • All cached Python packages are stored in cortexcpp/python_engines/cache/uv. The purpose is that when we remove the python_engines folder, we are sure we don't leave anything behind.
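
A sketch of this install flow under the paths above, using uv's `venv` and `pip install` subcommands and its `UV_CACHE_DIR` environment variable. The vLLM version is a placeholder, the download of the uv binary itself is elided, and the blocking `subprocess.run` calls mirror the "not async" limitation noted above:

```python
import os
import subprocess
from pathlib import Path

DATA_DIR = Path("cortexcpp/python_engines")
UV_BIN = DATA_DIR / "bin" / "uv"        # uv binary downloaded here if not already installed
CACHE_DIR = DATA_DIR / "cache" / "uv"   # all cached wheels stay inside python_engines
VLLM_VERSION = "0.7.3"                  # placeholder version
VENV_DIR = DATA_DIR / "envs" / "vllm" / VLLM_VERSION / ".venv"

def install_vllm() -> None:
    # Keep uv's package cache inside python_engines so removing that folder leaves nothing behind.
    env = {**os.environ, "UV_CACHE_DIR": str(CACHE_DIR)}
    # Create the per-version virtual environment.
    subprocess.run([str(UV_BIN), "venv", str(VENV_DIR)], env=env, check=True)
    # Install vllm and its dependencies into that venv (blocking call, no progress streaming).
    subprocess.run(
        [str(UV_BIN), "pip", "install", f"vllm=={VLLM_VERSION}",
         "--python", str(VENV_DIR / "bin" / "python")],
        env=env,
        check=True,
    )
```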

cortex models start <model>

  • Spawn vllm serve (see the sketch below)
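
A sketch of the spawn step, reusing the per-version venv layout from the install section. `vllm serve <model> --port <port>` is vLLM's own CLI; the version, port, and process bookkeeping here are assumptions:

```python
import subprocess
from pathlib import Path

# Placeholder version; the real path comes from whichever engine version was installed.
VENV_DIR = Path("cortexcpp/python_engines/envs/vllm/0.7.3/.venv")

def start_model(model: str, port: int = 8000) -> subprocess.Popen:
    """Spawn `vllm serve` from the engine's venv and return the process handle."""
    vllm_bin = VENV_DIR / "bin" / "vllm"
    # The caller keeps the handle so `cortex models stop <model>` can terminate the process later.
    return subprocess.Popen([str(vllm_bin), "serve", model, "--port", str(port)])
```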

TODO:

  • cortex engines install vllm (TODO: async install in a separate thread)
  • Set default engine variant
  • cortex engines load vllm
  • cortex engines list
  • cortex engines uninstall vllm: delete cortexcpp/python_engines/envs/vllm/<version>
  • cortex pull <model>
  • cortex models list
  • cortex models start <model>: spawn vllm serve
  • cortex models stop <model>
  • cortex ps
  • Chat completion
    • Non-streaming
    • Streaming (see the sketch after this list)
  • Embeddings
  • cortex run
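
For the streaming item above, vLLM's OpenAI-compatible server emits Server-Sent Events (`data: {...}` lines, terminated by `data: [DONE]`) when `"stream": true` is set, so the route can simply relay those lines to the client. A minimal sketch, with the URL and helper name assumed as in the earlier routing example:

```python
import json
import urllib.request
from typing import Iterator

VLLM_URL = "http://127.0.0.1:8000/v1/chat/completions"  # assumed local vLLM address

def stream_chat_completion(body: dict) -> Iterator[str]:
    """Yield each SSE line from vLLM so the caller can forward them unchanged."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps({**body, "stream": True}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:               # HTTPResponse iterates line by line
            line = raw.decode().strip()
            if line:                   # relay "data: {...}" lines; the last one is "data: [DONE]"
                yield line
```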

Fixes Issues

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

@gau-nernst moved this from Icebox to In Progress in Menlo on Mar 20, 2025
@gau-nernst mentioned this pull request on Mar 22, 2025
Reviewers: No reviews
Assignees: @gau-nernst
Labels: None yet
Projects: Status: In Progress
Development: Successfully merging this pull request may close these issues: vLLM backend for Cortex
3 participants: @gau-nernst, @ramonpzg, @vansangpfiev
