Your RAM Struggle – A Local AI Survival Story

🇬🇧 Full English Cabaret Report: "Your RAM Struggle – A Local AI Survival Story" So, you found a model that actually fits on your M4 Mini with 16GB RAM. It doesn't explode the memory, and your computer remains usable. That's the model: mlx-qwen3.5-9b-claude-4.6-opus-reasoning-distilled-v2 by Jackrong (4.1GB). It's available for both Ollama and LM Studio via the MLX variant.

You admitted it's not high-speed—only 15-30 tokens per second—but the work is okay. The real kicker? Preprocessing takes longer than token generation, especially with short shell commands. It feels like a slow startup for something that should be instant. Imagine waiting minutes just to say "Hello" and getting a response five seconds later. It's like trying to start a car with a broken engine, yet it still moves forward.

In the end, you have something running locally: LM Studio on your Mac, served via LAN. Meanwhile, Hermes lives as an agent on a separate Beelink Ser 6 running Fedora, calling it its new home.

Numbers: Half a day of work with Hermes, many configuration tweaks later, 5.5 million tokens consumed. What a relief that it was local! Imagine burning through half your day just to generate 5.5 million tokens—like running a marathon in slippers, yet you made it to the finish line.

🇩🇪 Full German Cabaret Report: "Dein RAM-Abenteuer auf dem M4-Mini – Ein Überlebens-Triumph" Okay, das ist mein einziger Gedanke seit deinem Update: Du hast endlich ein Modell gefunden, das auf deinem M4-Mini mit 16 GB RAM läuft. Ohne dass der Rechner explodiert oder dein Leben zu einem einzigen, langen Fehlercode wird.

Das Modell heißt mlx-qwen3.5-9b-claude-4.6-opus-reasoning-distilled-v2 von Jackrong (4,1 GB). Du gibst zu, dass es nicht High-Speed ist – 15-30 Tokens pro Sekunde. Aber hey, das klingt ja fast wie ein kleines Wunder!

Und dann kommt der Hammer: Die Preprocessing-Phase dauert immer länger als die Tokenproduktion. Wenn du kurze Befehle nur in der Shell eingibst, dauert es sogar noch länger. Aber hey, das ist ja fast wie ein kleines Abenteuer! Stell dir vor: Du gibst "Hallo" ein und wartest fünf Minuten, während das Modell im Hintergrund überlegt, ob es überhaupt verstehen kann. Es ist wie ein Auto mit defektem Motor, das trotzdem vorwärtsrollt.

Letztendlich hast du endlich etwas gefunden, das lokal funktioniert. LM Studio läuft auf deinem Mac und wird im LAN angeboten. Hermes ist als Agent auf einem eigenen Beelink Ser 6 – Fedora-System installiert und kann dies als sein neues Zuhause schimpfen.

Kurze Daten: Halber Tag Arbeit mit Hermes und viele Einstellungen später hast du 5,5 Millionen Tokens verbrannt. Was für ein Glück, dass dies lokal war! Stell dir vor: Du hast halben Tag damit verbracht, 5,5 Millionen Tokens zu verbrauchen – wie ein Marathonlauf in Hausschuhen, aber du hast es geschafft.

Syndicate