the local llm thing #1505

Open

@stonebig

Description

browsing the web:

  • when the model doesn't fit in GPU RAM, you lose about 10x of the GPU performance, which makes APUs with big RAM competitive with a big RTX 3090
  • hardware:
    • the model must fit inside RAM or you lose ~10x the performance
    • RAM is even more crucial for reasoning models, which need to remember long contexts
    • a big useful model seems to require between 16 and 32 GB of RAM (see the sketch after this list), so:
      • a new PC with APU + NPU and 64 GB RAM is the cautious choice
      • with a discrete GPU, a ratio of CPU RAM = 4x GPU RAM still seems right, so 64 GB of CPU RAM plus optionally a 16 GB GPU
      • unified RAM makes a big APU with high bandwidth the "cheap" competitive idea in 2025
    • bandwidth to RAM is important: a 256-bit bus looks like the minimum, 384-bit is lovely
    • a 1 TB disk may be too small
  • evolution pressure:
    • Microsoft "Copilot" ==> NPU and RAM
    • electricity ==> an APU seems 10x more efficient than a GPU
    • a mac-mini-style PC is the low-cost choice, especially with a hardware renewal strategy every 2 years
    • token cost is trending down ~20x per year:
      • so there is no reason to sink too much money into hardware; renew more often, but check the minimums
      • the Apple M4 is the reference: we can only hope for competition from Qualcomm, AMD, and Nvidia
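A quick back-of-the-envelope check on those RAM figures, as a sketch in Python (the 4-bit quantization and the ~20% runtime/KV-cache overhead are assumptions, not measurements):

```python
# Rough estimate of the RAM needed to keep a quantized model resident in memory.
# bits_per_weight=4 matches common Q4 GGUF quantization; "overhead" is an assumed
# ~20% margin for the KV cache and runtime buffers.
def model_ram_gb(n_params_billion: float, bits_per_weight: float = 4, overhead: float = 1.2) -> float:
    return n_params_billion * 1e9 * (bits_per_weight / 8) * overhead / 1e9

for params in (7, 14, 32, 70):
    print(f"{params}B @ 4-bit ≈ {model_ram_gb(params):.0f} GB")
# -> roughly 4, 8, 19 and 42 GB, which is why 16-32 GB covers "useful" models
#    and 64 GB leaves headroom for the bigger ones.
```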

asking AI (Gemini Flash 2):

  • software:
    • llama.cpp: ggml-org/llama.cpp, the greatest, but oriented first toward Mac / Arm
    • Ollama: ollama.ai
    • LM Studio: https://lmstudio.ai/
    • Torch / TensorFlow / JAX? not really aimed at CPU
    • Intel IPEX-LLM for optimizing on CPU (sits below Torch?)
    • ... Docker-packaged models are looking like a trend

*** gemini flash 2: 2025-03-16

It's great you're looking into running LLMs locally! Here's a breakdown of how to obtain llama.cpp, Ollama, and LM Studio:

1. llama.cpp:

  • GitHub:
    • The primary source is the llama.cpp GitHub repository:
    • From there, you have a few options:
      • Building from source: This offers the most flexibility but requires technical knowledge. You'll need to compile the code according to your operating system.
      • Downloading pre-built binaries: On the GitHub releases page, you can find pre-built binaries. Be sure to download the correct version for your CPU architecture.
  • Key points:
    • llama.cpp is a C++ library, so it's primarily command-line driven.
    • It's highly optimized for CPU inference, especially with quantized models.
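To make the "command-line driven / CPU inference" points concrete, here is a minimal sketch using the llama-cpp-python binding instead of the raw C++ CLI (the binding choice, the model path and the prompt are assumptions; any locally downloaded GGUF file works):

```python
# Minimal sketch with the llama-cpp-python binding (pip install llama-cpp-python),
# which wraps llama.cpp. The GGUF path is a placeholder for any quantized model.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,  # context window; larger values need more RAM
)
out = llm("Q: What is a GGUF file? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```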

2. Ollama:

  • Official Website:
    • The easiest way to get Ollama is from its official website:
    • The website provides straightforward installation instructions for macOS, Linux, and Windows.
    • Installation often involves running a simple installation script.
  • Key points:
    • Ollama simplifies the process of running LLMs locally.
    • It handles model management, making it user-friendly.
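As an illustration of that user-friendliness, a minimal sketch against Ollama's local REST API (port 11434 is the documented default; the model name is an assumption, pulled beforehand with `ollama pull llama3`):

```python
# Minimal sketch calling Ollama's local REST API with only the standard library.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3",  # assumes this model was pulled locally beforehand
    "prompt": "Explain in one sentence why quantized models need less RAM.",
    "stream": False,    # return one complete JSON response instead of a stream
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```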

3. LM Studio:

  • Official Website:
  • Key points:
    • LM Studio offers a graphical user interface (GUI), making it very accessible.
    • It allows you to easily download and run LLMs from Hugging Face.
    • It uses llama.cpp in the background.
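LM Studio can also expose an OpenAI-compatible local server; a minimal sketch against it (the default port 1234 and the placeholder model name are assumptions, and the server has to be enabled in the app first):

```python
# Minimal sketch against LM Studio's OpenAI-compatible local endpoint.
import json
import urllib.request

payload = json.dumps({
    "model": "local-model",  # LM Studio serves whichever model is currently loaded
    "messages": [{"role": "user", "content": "Say hello from a local LLM."}],
}).encode()

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",  # assumed default port
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```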

General Tips:

  • When downloading or installing, always prioritize official sources to ensure safety.
  • Pay attention to system requirements, especially RAM and storage space.
  • If you're new to LLMs, Ollama or LM Studio might be easier starting points.

I hope this helps!

simple comparisons:

(image: comparison chart, not captured in this export)
