vllm support #1063
Open
kczimm wants to merge 9 commits into master from kczimm-vllm-support
Commits (9, all by kczimm)
- 635476c add vllm binding
- 9360ef7 add vllm SamplingParams
- 8be0710 add test showing vllm model support check
- b212ee0 refactor into llm module, use PyResult
- 746953e add vLLM to the transform API
- ca7e4ad make bindings vllm::outputs
- d017cd6 swap out vLLM model if new
- 74ce6ae add vllm docs
- aca505c add vllm inference docs; fix logic
swap out vLLM model if new
commit d017cd6bd9544d14e27eb8ef00b900e5b64a6c89
27 changes: 1 addition & 26 deletions pgml-extension/src/api.rs
75 changes: 75 additions & 0 deletions pgml-extension/src/bindings/vllm/inference.rs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| use parking_lot::Mutex; | ||
| use pyo3::prelude::*; | ||
| use serde_json::{json, Value}; | ||
| use super::LLM; | ||
| static MODEL: Mutex<Option<LLM>> = Mutex::new(None); | ||
| pub fn vllm_inference(task: &Value, inputs: &[&str]) -> PyResult<Value> { | ||
| crate::bindings::python::activate().expect("python venv activate"); | ||
| let mut model = MODEL.lock(); | ||
| let llm = match get_model_name(&model, task) { | ||
| ModelName::Same => model.as_mut().expect("ModelName::Same as_mut"), | ||
| ModelName::Different(name) => { | ||
| if let Some(llm) = model.take() { | ||
| // delete old model, exists | ||
| destroy_model_parallel(llm)?; | ||
| } | ||
| // make new model | ||
| let llm = LLM::new(&name)?; | ||
| model.insert(llm) | ||
| } | ||
| }; | ||
| let outputs = llm | ||
| .generate(&inputs, None)? | ||
| .iter() | ||
| .map(|o| { | ||
| o.outputs() | ||
| .expect("RequestOutput::outputs()") | ||
| .iter() | ||
| .map(|o| o.text().expect("CompletionOutput::text()")) | ||
| .collect::<Vec<_>>() | ||
| }) | ||
| .collect::<Vec<Vec<_>>>(); | ||
| Ok(json!(outputs)) | ||
| } | ||
| fn get_model_name<M>(model: &M, task: &Value) -> ModelName | ||
| where | ||
| M: std::ops::Deref<Target = Option<LLM>>, | ||
| { | ||
| match task | ||
| .as_object() | ||
| .and_then(|obj| obj.get("model").and_then(|m| m.as_str())) | ||
| { | ||
| Some(name) => match model.as_ref() { | ||
| Some(llm) if llm.model() == name => ModelName::Same, | ||
| _ => ModelName::Different(name.to_string()), | ||
| }, | ||
| None => ModelName::Same, | ||
| } | ||
| } | ||
| enum ModelName { | ||
| Same, | ||
| Different(String), | ||
| } | ||
| // See https://github.com/vllm-project/vllm/issues/565#issuecomment-1725174811 | ||
| fn destroy_model_parallel(llm: LLM) -> PyResult<()> { | ||
| Python::with_gil(|py| { | ||
| PyModule::import(py, "vllm")? | ||
| .getattr("model_executor")? | ||
| .getattr("parallel_utils")? | ||
| .getattr("parallel_state")? | ||
| .getattr("destroy_model_parallel")? | ||
| .call0()?; | ||
| drop(llm); | ||
| PyModule::import(py, "gc")?.getattr("collect")?.call0()?; | ||
| Ok(()) | ||
| }) | ||
| } |
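
The swap logic in `vllm_inference` keeps one model in a global slot and replaces it only when the task names a different model. As shown, a task with no `model` key maps to `ModelName::Same`, so the `expect` on `model.as_mut()` would panic if nothing has been loaded yet. The following is a minimal sketch of the same caching pattern with that case returned as an error instead. It is not the PR's code: `FakeLlm`, `CacheError`, and `select_model` are hypothetical names, `FakeLlm` stands in for the `LLM` binding so the example runs without Python, and the slot is passed in as `&mut Option<_>` rather than read from the `MODEL` mutex.

```rust
/// Hypothetical stand-in for the PR's `LLM` binding; it only records its name.
#[derive(Debug)]
pub struct FakeLlm {
    name: String,
}

impl FakeLlm {
    pub fn new(name: &str) -> Self {
        FakeLlm { name: name.to_string() }
    }

    pub fn model(&self) -> &str {
        &self.name
    }
}

/// Hypothetical error type for the one case the slot cannot serve.
#[derive(Debug, PartialEq)]
pub enum CacheError {
    /// Nothing is cached and the task did not name a model.
    NoModelRequested,
}

/// Returns the cached model, replacing it first when `requested` names a
/// different one. The `bool` reports whether a new model was loaded.
pub fn select_model<'a>(
    slot: &'a mut Option<FakeLlm>,
    requested: Option<&str>,
) -> Result<(&'a mut FakeLlm, bool), CacheError> {
    let cached_matches = match (slot.as_ref(), requested) {
        (Some(llm), Some(name)) => llm.model() == name,
        // No name given: keep using whatever is loaded.
        (Some(_), None) => true,
        (None, Some(_)) => false,
        (None, None) => return Err(CacheError::NoModelRequested),
    };

    if !cached_matches {
        let name = requested.expect("a mismatch implies a requested name");
        // Release the old model before building the new one. In the PR this
        // is where `destroy_model_parallel` runs.
        drop(slot.take());
        *slot = Some(FakeLlm::new(name));
    }

    Ok((slot.as_mut().expect("slot is filled at this point"), !cached_matches))
}
```

Under these assumptions, a caller would lock the mutex, pass `&mut *guard` as `slot`, and convert `CacheError` into whatever error type the surrounding function returns.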
12 changes: 10 additions & 2 deletions pgml-extension/src/bindings/vllm/llm.rs
2 changes: 2 additions & 0 deletions pgml-extension/src/bindings/vllm/mod.rs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,11 @@ | ||
| //! Rust bindings to the Python package `vllm`. | ||
| mod inference; | ||
| mod llm; | ||
| mod outputs; | ||
| mod params; | ||
| pub use inference::*; | ||
| pub use llm::*; | ||
| pub use outputs::*; | ||
| pub use params::*; |