# ggml-org/llama.vscode

VS Code extension for LLM-assisted code/text completion
Local LLM-assisted text completion, AI chat, and agentic coding extension for VS Code.

## Features
- Auto-suggest on input
- Accept a suggestion with `Tab`
- Accept the first line of a suggestion with `Shift + Tab`
- Accept the next word with `Ctrl/Cmd + Right`
- Toggle the suggestion manually by pressing `Ctrl + L`
- Control max text generation time
- Configure scope of context around the cursor
- Ring context with chunks from open and edited files and yanked text
- Supports very large contexts even on low-end hardware via smart context reuse
- Display performance stats
- Llama Agent for agentic coding
- Add/remove/export/import for models - completion, chat, embeddings and tools
- Model selection - for completion, chat, embeddings and tools
- Envs (groups of models): selecting/deselecting an env selects/deselects all the models in it
- Add/remove/export/import for envs
- Predefined models (including OpenAI gpt-oss 20B added as a local one)
- Predefined envs for different use cases - completion only, chat + completion, chat + agent, local full package (with gpt-oss 20B), etc.
- MCP tools selection for the agent (from VS Code installed MCP Servers)
- Search and download models from Huggingface directly from llama-vscode
## Installation

Install the llama-vscode extension from the VS Code extension marketplace.

Note: also available at Open VSX.

Show the llama-vscode menu by clicking on llama-vscode in the status bar (or press Ctrl+Shift+M) and select "Install/Upgrade llama.cpp". This installs llama.cpp automatically on Mac and Windows. For Linux, get the latest binaries and add the bin folder to your PATH.

Once llama.cpp is installed, you can select an env for your needs from the llama-vscode menu "Select/start env...".

Below are some details on how to install llama.cpp manually, if you prefer that.
Mac:

```
brew install llama.cpp
```

Windows:

```
winget install llama.cpp
```
Any other OS: either use the latest binaries or build llama.cpp from source. For more information on how to run the llama.cpp server, please refer to the Wiki.
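As a rough sketch (not from this README), a from-source build with CMake typically looks like the following; the backend flag shown in the comment is an assumption you should check against the llama.cpp build documentation for your platform:

```shell
# Build llama.cpp from source (plain CPU build shown; this is a sketch -
# check the llama.cpp build docs for backend flags such as
# -DGGML_CUDA=ON for NVIDIA GPUs)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# the server binary is then available under build/bin/llama-server
```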
Here are recommended settings, depending on the amount of VRAM that you have:
More than 64GB VRAM:

```
llama-server --fim-qwen-30b-default
```

More than 16GB VRAM:

```
llama-server --fim-qwen-7b-default
```

Less than 16GB VRAM:

```
llama-server --fim-qwen-3b-default
```

Less than 8GB VRAM:

```
llama-server --fim-qwen-1.5b-default
```
CPU-only configs
These are llama-server settings for CPU-only hardware. Note that the quality will be significantly lower:
```
llama-server \
    -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
    --port 8012 -ub 512 -b 512 --ctx-size 0 --cache-reuse 256
```

```
llama-server \
    -hf ggml-org/Qwen2.5-Coder-0.5B-Q8_0-GGUF \
    --port 8012 -ub 1024 -b 1024 --ctx-size 0 --cache-reuse 256
```
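With any of the server configurations above running, you can sanity-check the endpoint from the command line. This is a hedged example assuming the default port 8012 used in these configs; `/health` and `/infill` are the llama-server health-check and fill-in-the-middle endpoints:

```shell
# Check that the server is up (returns a small JSON status)
curl -s http://127.0.0.1:8012/health

# Request a raw FIM completion to verify the model responds
# (field names per the llama-server /infill API)
curl -s http://127.0.0.1:8012/infill \
  -d '{"input_prefix": "def add(a, b):\n    ", "input_suffix": "\n"}'
```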
You can use any other FIM-compatible model that your system can handle. By default, the models downloaded with the `-hf` flag are stored in:

- Mac OS: `~/Library/Caches/llama.cpp/`
- Linux: `~/.cache/llama.cpp`
- Windows: `LOCALAPPDATA`
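For example, on Linux you can inspect the cache directory from the list above to see what has been downloaded; the file name in the second command is a placeholder, not an actual model name:

```shell
# List downloaded model files and their sizes (Linux cache path from above)
ls -lh ~/.cache/llama.cpp

# Remove a model you no longer need to reclaim disk space
# (replace <model>.gguf with an actual file name from the listing)
rm ~/.cache/llama.cpp/<model>.gguf
```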
The plugin requires FIM-compatible models: HF collection
## Llama Agent

The extension includes a Llama Agent:

- Llama Agent UI in the Explorer view
- Works with local models - gpt-oss 20B is the best choice for now
- Can also work with external models (for example from OpenRouter)
- MCP support - can use the tools from MCP servers that are installed and started in VS Code
- 9 internal tools available for use
- custom_tool - returns the content of a file or a web page
- custom_eval_tool - write your own tool in JavaScript (a function with a string input and a string return value)
- Attach the selection to the context
- Configure maximum loops for Llama Agent
- Open Llama Agent with Ctrl+Shift+A or from the llama-vscode menu "Show Llama Agent"
- Select an env with an agent if you haven't done so already
- Write a query and attach files with the @ button if needed
More details: https://github.com/ggml-org/llama.vscode/wiki
Speculative FIMs running locally on an M2 Studio:
llama-vscode-1.mp4
The extension aims to be simple and lightweight while providing high-quality, performant local FIM completions, even on consumer-grade hardware.
- The initial implementation was done by Ivaylo Gardev (@igardev) using the llama.vim plugin as a reference
- Technical description: ggml-org/llama.cpp#9787
- Vim/Neovim: https://github.com/ggml-org/llama.vim