from browsing the web:
- when the model doesn't fit in GPU RAM, you lose roughly 10x the GPU performance, making an APU with a lot of RAM competitive with a big RTX 3090
- hardware:
  - the model must fit inside RAM or you lose roughly 10x the performance
  - RAM is even more crucial for reasoning models, which need to keep a long context in memory
  - a big, useful model seems to require between 16 and 32 GB of RAM (see the sizing sketch after this list), so:
    - the safe bet for a new PC is an APU + NPU with 64 GB of RAM
    - with a discrete GPU, a ratio of CPU RAM = 4x GPU RAM still seems nice, so 64 GB of CPU RAM plus optionally a 16 GB GPU
  - unified RAM makes a big APU with high bandwidth the "cheap" competitive option in 2025
  - RAM bandwidth is important: a 256-bit bus looks like the minimum, 384-bit is lovely
  - a 1 TB disk may be too small
- evolution pressures:
  - Microsoft Copilot ==> pushes NPU and RAM
  - electricity ==> an APU seems roughly 10x more efficient than a GPU
  - a Mac mini-style PC is the low-cost choice, especially with a hardware renewal strategy every 2 years
  - token cost is trending down roughly 20x per year:
    - so there is no reason to sink too much money into hardware; renew more often, but check the minimums
  - Apple M4 is the reference: we can only hope for competition from Qualcomm, AMD, and Nvidia
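As a rough sanity check of the 16-32 GB figure and the bandwidth point, here is a back-of-the-envelope Python sketch; the parameter counts, the ~4.5 bits/weight (Q4-style) quantization, and the bandwidth numbers are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope sizing: assume the weights dominate the footprint and
# decoding is memory-bandwidth-bound (every generated token streams the weights once).

def model_ram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate RAM to hold the weights, plus ~20% for KV cache and buffers."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

def decode_tokens_per_s(model_gb: float, bandwidth_gb_s: float) -> float:
    """Memory-bound estimate: tokens/s ~= RAM bandwidth / bytes read per token."""
    return bandwidth_gb_s / model_gb

if __name__ == "__main__":
    for name, params in [("7B Q4", 7), ("32B Q4", 32), ("70B Q4", 70)]:
        gb = model_ram_gb(params, bits_per_weight=4.5)
        # ~100 GB/s for a 128-bit DDR5 bus, ~250 GB/s for a 256-bit unified-memory APU
        print(f"{name}: ~{gb:.0f} GB RAM, "
              f"~{decode_tokens_per_s(gb, 100):.1f} tok/s at 100 GB/s, "
              f"~{decode_tokens_per_s(gb, 250):.1f} tok/s at 250 GB/s")
```

With these assumptions a 32B model at ~4-bit lands a bit over 20 GB, which is why 64 GB of system RAM and a 256-bit (or wider) memory bus look like the comfortable minimum.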
asking an AI (Gemini Flash 2):
- software:
  - llama.cpp: ggml-org/llama.cpp, the greatest, but initially oriented toward Mac / ARM
  - Ollama: https://ollama.ai
  - LM Studio: https://lmstudio.ai/
  - Torch / TensorFlow / JAX? not really aimed at CPU
  - Intel IPEX-LLM for optimizing on CPU (works below Torch?)
  - ... Docker models are looking like a trend
*** gemini flash 2: 2025-03-16
It's great you're looking into running LLMs locally! Here's a breakdown of how to obtain llama.cpp, Ollama, and LM Studio:
1. llama.cpp:
   - GitHub:
     - The primary source is the llama.cpp GitHub repository: https://github.com/ggml-org/llama.cpp
     - From there, you have a few options:
       - Building from source: This offers the most flexibility but requires technical knowledge. You'll need to compile the code for your operating system.
       - Downloading pre-built binaries: On the GitHub Releases page, you can find pre-built binaries. Be sure to download the correct version for your CPU architecture.
   - Key points:
     - llama.cpp is a C++ library, so it's primarily command-line driven.
     - It's highly optimized for CPU inference, especially with quantized models.
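As a quick illustration, llama.cpp can also be driven from Python via the llama-cpp-python bindings (a third-party wrapper, not part of llama.cpp itself); a minimal sketch, where the model path, context size, and thread count are placeholders to adjust:

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# Assumes a quantized GGUF model has already been downloaded; the path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-q4_k_m.gguf",  # placeholder path to a quantized model
    n_ctx=4096,      # context window; larger contexts grow the KV cache and need more RAM
    n_threads=8,     # CPU threads; tune to your core count
)

out = llm("Q: Why does the model need to fit in RAM? A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```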
2. Ollama:
   - Official Website:
     - The easiest way to get Ollama is from its official website: https://ollama.ai
     - The website provides straightforward installation instructions for macOS, Linux, and Windows.
     - Installation often involves running a simple installation script.
   - Key points:
     - Ollama simplifies the process of running LLMs locally.
     - It handles model management, making it user-friendly.
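Once installed, Ollama also exposes a local REST API on port 11434; a minimal sketch, assuming a model has already been pulled (the `llama3` tag is just an example):

```python
# Minimal sketch against Ollama's local REST API (default port 11434).
# Assumes `ollama pull llama3` has already been run; the model tag is an example.
import json
import urllib.request

payload = {
    "model": "llama3",                       # example model tag
    "prompt": "Why is RAM bandwidth important for local LLMs?",
    "stream": False,                         # return one JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```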
3. LM Studio:
   - Official Website:
     - You can download LM Studio from its official website: https://lmstudio.ai/
     - The website provides installers for various operating systems.
   - Key points:
     - LM Studio offers a graphical user interface (GUI), making it very accessible.
     - It allows you to easily download and run LLMs from Hugging Face.
     - It uses llama.cpp in the background.
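LM Studio can also start a local server that mimics the OpenAI chat API (by default on port 1234); a minimal sketch, assuming the server is running with a model loaded:

```python
# Minimal sketch against LM Studio's local OpenAI-compatible server (default port 1234).
# Assumes the "Local Server" has been started in LM Studio and a model is loaded.
import json
import urllib.request

payload = {
    "model": "local-model",                  # LM Studio serves whichever model is loaded
    "messages": [{"role": "user", "content": "Summarize why unified RAM helps APUs."}],
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```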
General Tips:
- When downloading or installing, always prioritize official sources to ensure safety.
- Pay attention to system requirements, especially RAM and storage space.
- If you're new to LLMs, Ollama or LM Studio might be easier starting points.
I hope this helps!
simple comparisons: