New llama-run #17554
Conversation
94a98cf to 500a90c

ericcurtin commented Nov 27, 2025
ngxson commented Nov 27, 2025
Considering these points:
So, I think it's better to repurpose one of the 3 binaries mentioned in my first point instead.
ericcurtin commented Nov 27, 2025 • edited
f839cfb to 5b0c817

ericcurtin commented Nov 27, 2025
ngxson commented Nov 27, 2025
I think what's better to do at this point is to improve the usability of
5b0c817 to fddbeed

ericcurtin commented Nov 27, 2025
I’m on board with the idea, but can we push ahead with this PR and plan the refactor for a later stage? To be honest, I think you’re the best person to handle the refactoring. You know exactly how you want the architecture to look, and it’s hard for others to guess those specifics. If someone else tries, we risk getting stuck in a loop of corrections, so it’s probably more efficient if you drive that part.
ngxson commented Nov 27, 2025 • edited
For the most part, the server code is already refactored so it can be reused in another tool/example. I'm not sure what kind of refactoring you're talking about.
ericcurtin commented Nov 27, 2025
Moving the relevant code to llama-cli, llama-run, etc. or some other place.
ngxson commented Nov 27, 2025 • edited
I think your

This way, you simply make

In the future, we can make server a static (or maybe shared) lib target and reuse it in both server and run.
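The lib-target idea could be sketched in CMake roughly as follows; the target and source file names here are illustrative, not taken from the PR or the actual llama.cpp build files:

```cmake
# Hypothetical sketch: compile the server core once as a static library,
# then link it into both the server and the run executables.
add_library(llama-server-core STATIC
    tools/server/server-core.cpp   # illustrative file name
)
target_link_libraries(llama-server-core PUBLIC common llama)

add_executable(llama-server tools/server/server.cpp)
target_link_libraries(llama-server PRIVATE llama-server-core)

add_executable(llama-run tools/run/run.cpp)
target_link_libraries(llama-run PRIVATE llama-server-core)
```

Switching `STATIC` to `SHARED` would give the "maybe shared" variant mentioned above without changing the consumers.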
0e3f326 to f7a7775
f347cb1 to 2619c11

ericcurtin commented Nov 27, 2025
People should give this a shot; the UX is quite neat:
ngxson commented Nov 28, 2025
Seems to be a better direction now. I'll see if it's worth splitting some changes specific to

One question though: why do we need to split
ngxson commented Nov 28, 2025
Btw, maybe as a TODO: we can now add multimodal support to the CLI too.
2619c11 to f9ae221

ericcurtin commented Nov 28, 2025
I've received mixed feedback in the past regarding granular file separation versus consolidation, so I'm unsure of the preferred direction here. run-chat.cpp and run.cpp seem like a reasonable split between chat-focused activities and the other code required to get the tool running.
- Added readline.cpp include
- Created run_chat_mode():
  - Initializes readline with command history
  - Maintains conversation history
  - Applies chat templates to format messages
  - Submits completion tasks to the server queue
  - Displays assistant responses interactively

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
f9ae221 to a449065

ericcurtin commented Nov 28, 2025
If somebody could restart the failing builds I'd appreciate it; I don't have any sort of maintainer access anymore, so I'm as limited as a random contributor.
ngxson commented Nov 28, 2025 • edited
The failed builds are not related to the server, so we can ignore them anyway. Besides, I usually open a mirror PR on my fork to skip the long CI queue on the main repo; you can give that a try.