HiveMind — Engineering Overview

by the numbers

02 / 29

Four months in.

176K

coding sessions captured

across 162 engineers · 260 devices · 820 repos

15M+

lines of code written by agents

28,923

sessions linked to a PR

4.3M

tool calls observed

SESSIONS PER MONTH

Explosive growth 💥

73,604

apr 2026

Jan '26 · Feb '26 · Mar '26 · Apr '26

SESSIONS BY AGENT

Claude

140,328

Cursor

18,082

Codex

16,757

OpenCode

772

Gemini

first commit · jan 2, 2026 · 912 commits

HIVEMIND

why we're building this

04 / 29

Why HiveMind

You can't improve what you don't measure.

No data on agent usage means we're guessing.

Sharing skills and techniques should be easy.

Your best workflow shouldn't die in your terminal.

Visibility matters more as you run more agents.

More concurrent sessions, more surface area.

audience

05 / 29

Who it's for

Everyone who ships code here.

Engineers who want to get better at prompting.

See how teammates solve similar problems.

STAFF

Staff+ who want to share patterns across teams.

Individual tricks → team practices.

LEAD

Leads and managers who need to understand their team.

Where's the work? What's blocking?

positioning

06 / 29

Two products. Two audiences.

How is HiveMind different from Weave?

WEAVE

For teams putting AI into their products.

Instrument the AI features you ship: evals, traces, prompts.

user → your AI feature

HIVEMIND

For teams using AI to build their products.

Instrument how your agents write code: sessions, tools, what ships.

engineer → AI agent → code

01 · architecture

08 / 29

Architecture

Dumb daemon. Smart backend.

ON YOUR MACHINE

hivemind daemon

claude code

cursor

codex

gemini

opencode

Discovers sessions. Ships raw JSONL. Redacts secrets.

INGEST

FASTAPI

normalize & store

→ AG-UI events

agentstream adapters

raw + normalized

Raw sessions in. Normalized AG-UI events out.

QUERY

DATA + UI

ClickHouse + React

sessions · turns · tools

PRs · $$$ · trajectories

search index

dashboard

Fast analytics on everything the team has done.

// daemon is intentionally dumb: only discovery, redaction, and syncing

infra: clickhouse · fastapi · terraform
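
A minimal sketch of what "dumb daemon" could look like, assuming sessions live as `*.jsonl` files under per-agent directories. The function names, the `roots` mapping, and the payload shape are hypothetical; the deck does not describe the real daemon's internals. The point is only that the client discovers, redacts, and ships raw lines, and leaves all parsing to the backend.

```python
from pathlib import Path
from typing import Callable, Dict, Iterator


def discover_sessions(roots: Dict[str, Path]) -> Iterator[dict]:
    """Yield one record per JSONL session file found under each agent's root."""
    for agent, root in roots.items():
        if not root.is_dir():
            continue  # agent not installed on this machine
        for path in sorted(root.rglob("*.jsonl")):
            yield {"agent": agent, "path": str(path)}


def read_raw_lines(path: str) -> Iterator[str]:
    """Yield non-empty raw JSONL lines untouched; parsing is the backend's job."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line


def build_payload(session: dict, redact: Callable[[str], str]) -> dict:
    """Bundle redacted raw lines for upload; no normalization happens here."""
    lines = [redact(line) for line in read_raw_lines(session["path"])]
    return {"agent": session["agent"], "path": session["path"], "lines": lines}
```

Keeping normalization out of the client means a new agent format only needs a backend adapter, not a daemon release.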

02 · trust model · privacy

09 / 29

Privacy · 1 of 3

Sessions inherit GitHub's permissions.

No new ACL system. The repo decides the audience.

SESSION | GITHUB MATCH | VISIBILITY
~/wandb-core · refactor inference loop | github.com/wandb/core · org repo · team has read | TEAM-VISIBLE · Anyone with repo read.
~/wandb-secret-spike · deploy hotfix to staging | github.com/wandb/secret-spike · private repo · 4 collaborators | REPO-LOCKED · Only repo collaborators.
~/scratch/notes · draft a board update | (no git remote) · unmatched | SOLO · Only you. Always.

// override at any layer

mark any session private · disable sharing per-repo · disable sharing per-user
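
The rules on this slide can be sketched as one small function. This is an illustration under assumptions, not the production logic: `Session`, `RepoMatch`, and `resolve_visibility` are hypothetical names, and the GitHub lookup that would produce a `RepoMatch` is left out.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Session:
    owner: str
    remote: Optional[str] = None   # e.g. "github.com/wandb/core"; None if no git remote
    marked_private: bool = False   # per-session override


@dataclass
class RepoMatch:
    private: bool
    sharing_disabled: bool = False  # per-repo override


def resolve_visibility(session: Session, match: Optional[RepoMatch],
                       user_sharing_disabled: bool = False) -> str:
    """Overrides win first; anything unmatched falls back to the owner only."""
    if session.marked_private or user_sharing_disabled:
        return "SOLO"
    if session.remote is None or match is None:
        return "SOLO"
    if match.sharing_disabled:
        return "SOLO"
    return "REPO-LOCKED" if match.private else "TEAM-VISIBLE"


def can_view(viewer: str, session: Session, visibility: str,
             repo_readers: Set[str]) -> bool:
    """The owner always sees their session; everyone else needs repo access."""
    if viewer == session.owner:
        return True
    if visibility == "SOLO":
        return False
    return viewer in repo_readers
```

Because the audience is derived from repo access, there is no second permission list to keep in sync.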

02 · trust model · security

10 / 29

Security · 2 of 3

Three gates. Each fails closed.

Device, identity, network — independently revocable.

GATE 01 · DEVICE

Keychain-stored, auto-rotated.

Short-lived tokens. Never on disk in plaintext.

macOS Keychain · libsecret

24-hour rotation

instant dashboard revoke

GATE 02 · IDENTITY

SSO & SCIM, straight from your IdP.

Membership and offboarding flow from Okta. No parallel directory.

Okta · Azure AD · Google

SCIM auto-deprovision

group-scoped access

GATE 03 · NETWORK

Private VPC, fully terraformed.

Data plane behind private connect. No public ingress.

VPC + private connect

infra in version control

audit log on every read

PERIMETER

Pull a token, an SSO group, or VPC peering, and the next request stops at the door.
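
"Fails closed" is easiest to see in code. The sketch below is an assumption-heavy illustration: the request fields and the three gate functions are hypothetical stand-ins for the device, identity, and network checks. What matters is the wrapper, which treats any error or missing field as a denial.

```python
from typing import Callable, Iterable


def gate_passes(check: Callable[[dict], bool], request: dict) -> bool:
    """Run one gate; any exception or non-True result counts as a denial."""
    try:
        return check(request) is True
    except Exception:
        return False


def authorize(request: dict, gates: Iterable[Callable[[dict], bool]]) -> bool:
    """Every gate must pass independently."""
    return all(gate_passes(gate, request) for gate in gates)


# Hypothetical gate checks, one per slide column.
def device_gate(req: dict) -> bool:
    return req["token_age_hours"] < 24 and not req["token_revoked"]


def identity_gate(req: dict) -> bool:
    return req["sso_group"] in req["allowed_groups"]


def network_gate(req: dict) -> bool:
    return req["via_private_connect"]
```

A request with a missing `sso_group` raises `KeyError` inside the identity gate, and the wrapper turns that into a denial rather than an accidental allow.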

02 · trust model · sensitive data

11 / 29

Sensitive data · 3 of 3

Minimizing sensitive data exposure.

Daemon scrubs trajectories on-device, before any payload is shipped.

// on your laptop · RAW

# trajectory.jsonl — pre-egress
cmd : curl -H "Bearer sk_live_8f3d92ab"
env : ANTHROPIC_KEY=sk-ant-9a3…
       DATABASE_URL=postgres://p@ss
       AWS_SECRET=wJalrXUtnFEMI…
diff: + STRIPE_KEY=sk_live_K3y…

// over the wire → EGRESSED

# trajectory.jsonl — post-egress
cmd : curl -H "Bearer [REDACTED]"
env : ANTHROPIC_KEY=[REDACTED]
       DATABASE_URL=[REDACTED]
       AWS_SECRET=[REDACTED]
diff: + STRIPE_KEY=[REDACTED]

SCRUBBED ON-DEVICE

API keys, env vars, auth headers, and high-entropy strings.

regex + entropy · configurable allowlist
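
A rough sketch of regex plus entropy scrubbing with an allowlist. The patterns, the 20-character minimum, and the 4.0-bit threshold are assumptions chosen for illustration; the deck does not list the daemon's actual rules.

```python
import math
import re
from collections import Counter

REDACTED = "[REDACTED]"

# Illustrative patterns only: auth headers and env-style assignments.
PATTERNS = [
    re.compile(r"(Bearer\s+)([^\s\"']+)"),
    re.compile(r"\b([A-Z][A-Z0-9_]*(?:KEY|SECRET|TOKEN|URL)=)(\S+)"),
]
# Candidate tokens for the entropy check: long runs of key-like characters.
TOKEN = re.compile(r"[A-Za-z0-9_\-+/=]{20,}")


def shannon_entropy(text: str) -> float:
    """Bits per character of the string's own character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scrub(line: str, allowlist=frozenset(), threshold: float = 4.0) -> str:
    """Apply the regex rules first, then mask remaining high-entropy tokens."""
    for pattern in PATTERNS:
        line = pattern.sub(lambda m: m.group(1) + REDACTED, line)

    def mask(match):
        token = match.group(0)
        if token in allowlist or shannon_entropy(token) < threshold:
            return token
        return REDACTED

    return TOKEN.sub(mask, line)
```

The allowlist exists because entropy alone produces false positives: long hashes, base64 fixtures, and some file paths look like secrets.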

COMING NEXT

Org-wide archive policy and enhanced AI PII redaction.

archival · soon

AI PII redaction · soon

03 · features

12 / 29

Features

Six surfaces. One shared brain.

OVERVIEW

Activity feed

Your week — shipped, open, teammates' work.

LIVE

In-flight sessions

Who's mid-session right now. Tail if shared.

INSIGHTS

Trajectory analysis

Where agents fail. What to fix.

LEADERBOARD

Cross-team stats

Who's shipping. Where. (V2 en route.)

USAGE

Cost & tokens

Spend per user, repo, model. Cost per merged PR.

SESSIONS

Full replay & fork

Every turn. Searchable. Shareable. Resumable.

what we've learned

15 / 29

Seven Learnings

Rich trajectories are a data flywheel.

A dataset that didn't exist before.

Seeing teammates' sessions is legitimately useful.

"How would Tim do this?" — one click away.

Live + Overview unlock parallel work streams.

Tail one session while another runs.

The leaderboard is loved — and tricky.

Rewards burn, not outcomes.

Fork is a lifesaver when Anthropic is down.

One click to hand off a stuck session.

Our bottleneck is QA and code review.

HiveMind can help.

The Agent is a User.

hivemind --help FTW.

learning · 01

16 / 29

The thing I'm sure of

The full trajectory is a data flywheel.

RAW SESSION ↓ ENRICHED ON INGEST

PROMPT: user turn (bug · feature · fix · refactor)

→ THOUGHT: reasoning (intent)

→ TOOL: edit · bash · grep (skill: frontend-design · subagent: code-search · ✕ exit 1)

→ DIFF: code change (files · langs)

→ shipped (or not): merged · reverted?

where did anyone use the frontend-design skill last week?

indexed across every prompt · tool call · file · diff

Infinite agents. One shape. Already changing how we debug, teach, and build tooling.
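
One way to picture "one shape": every agent's session reduces to the same enriched record, so a question like the one above becomes a filter. The `Turn` fields and `skill_usage` helper below are hypothetical; the real schema lives in ClickHouse and is not shown in the deck.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Turn:
    session_id: str
    user: str
    at: datetime
    prompt: str
    prompt_tags: List[str] = field(default_factory=list)  # bug, feature, fix, refactor
    tool: Optional[str] = None                            # edit, bash, grep
    skill: Optional[str] = None                           # e.g. "frontend-design"
    exit_code: Optional[int] = None
    files: List[str] = field(default_factory=list)
    merged: Optional[bool] = None                         # None while the outcome is unknown


def skill_usage(turns: List[Turn], skill: str, now: datetime,
                days: int = 7) -> List[Tuple[str, str]]:
    """Return sorted, de-duplicated (user, session_id) pairs that used a skill recently."""
    cutoff = now - timedelta(days=days)
    hits = [t for t in turns if t.skill == skill and cutoff <= t.at <= now]
    return sorted({(t.user, t.session_id) for t in hits})
```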

learning · 02 · the flywheel in action

17 / 29

Introducing Soul Stealer.

Mines a teammate's sessions; produces a sub-agent that talks like them. Built by a user, not us.

~/skills · soul-stealer

$ /soul-stealer tssweeney "Tim Sweeney"

→ spawning hivemind sub-agent…

→ analyzing 47 sessions · 1,284 prompts

→ clustering behavioral modes…

→ extracting negative space (rejections, reversals)…

→ fingerprinting vocabulary…

✓ ~/Desktop/skills/talk-to-tim/SKILL.md

$ /talk-to-tim "should this be a class or a function?"

Tim: Why is this a class? Flatten it. We'll add structure when there's a second caller.

Tim Sweeney

@tssweeney · popular soul

THINKING PATTERNS

composition over inheritance · flatten until proven otherwise · blast radius first

REJECTS

premature abstraction · "sloppy" · indirection without payoff

FINGERPRINTS

"thoughts?" "first-class param" "surface area"

// we did not build this. we built the substrate. the org built the skill.
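
The "fingerprinting vocabulary" step could plausibly work like the sketch below: find phrases one person uses far more often than everyone else. This is a guess at the idea, not the skill's actual code; the function names, the bigram choice, and the add-one smoothing are all assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List


def phrases(prompt: str, n: int = 2) -> List[str]:
    """Split a prompt into lowercase word n-grams."""
    words = re.findall(r"[a-z'-]+", prompt.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def fingerprint(target: Iterable[str], others: Iterable[str],
                top: int = 3, min_count: int = 2) -> List[str]:
    """Rank phrases by how much more often the target uses them than teammates."""
    mine = Counter(p for prompt in target for p in phrases(prompt))
    theirs = Counter(p for prompt in others for p in phrases(prompt))
    # Add-one smoothing keeps phrases nobody else uses from dividing by zero.
    scored = {p: c / (theirs[p] + 1) for p, c in mine.items() if c >= min_count}
    return sorted(scored, key=lambda p: (-scored[p], p))[:top]
```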

learning · 03 · the memory aid

18 / 29

New feature

My week, at a glance.

overview.hivemind / your-week

// not a dashboard. a memory aid.

learning · 04 · the hard one

19 / 29

The hard one

The leaderboard is loved.
It also rewards the wrong thing.

V1 · TODAY

token count

Rewards burn. Ignores whether the code shipped.

V2 · NEXT

outcomes

Merged PRs · 30-day survival · cost per merged line.

// incentives shape behavior. if we get the scoreboard wrong, the players optimize for the wrong game.
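
To make the difference concrete, here is a small sketch that ranks the same people both ways. The `EngineerStats` fields and the V2 ordering (cheapest merged line first, survival rate as tie-breaker) are assumptions; the deck names the V2 inputs but not a formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EngineerStats:
    name: str
    tokens: int
    cost_usd: float
    merged_lines: int
    lines_alive_30d: int  # merged lines still in main after 30 days


def rank_v1(stats: List[EngineerStats]) -> List[str]:
    """V1: most tokens burned wins."""
    return [s.name for s in sorted(stats, key=lambda s: -s.tokens)]


def cost_per_merged_line(s: EngineerStats) -> float:
    return s.cost_usd / s.merged_lines if s.merged_lines else float("inf")


def survival_rate(s: EngineerStats) -> float:
    return s.lines_alive_30d / s.merged_lines if s.merged_lines else 0.0


def rank_v2(stats: List[EngineerStats]) -> List[str]:
    """V2 (assumed ordering): cheapest merged line first, then highest survival."""
    ordered = sorted(stats, key=lambda s: (cost_per_merged_line(s), -survival_rate(s)))
    return [s.name for s in ordered]
```

With sample data, the heaviest token user can top V1 and still land mid-table in V2, which is the behavior change the slide is after.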

learning · 05 · the panic button

20 / 29

When the model goes down

Fork is a lifesaver when Anthropic is down.

// we don't control Anthropic's uptime. we do control how fast you keep going.

learning · 06 · the bottleneck

21 / 29

The new bottleneck

QA & code review is where the time goes.

SHIPPED · THUMBNAILS · in any session view

Skim a session at a glance.

[Screenshot: HiveMind session viewer with inline thumbnails of each generated artifact]

NEW · AI WALKTHROUGH · session/4f9c2…

Read the agent before the diff.

00:42 | SUMMARY | Refactored inference loop · 4 files.
02:18 | WORTH A LOOK | Silent fallback at batch.py:142.
06:55 | SELF-CORRECTED | Reverted breaking change after test failed.
11:03 | SKIPPED | Legacy-path tests not run.

// review the trajectory, not the patch · ~15 min → ~3 min

// QA is the real bottleneck. we're tooling it.

learning · 07 · cli first

22 / 29

Built for them, used by them

The agent is a user.

We design for humans and for the agent reading their terminal. Both want the same thing: a clean CLI.

~/projects · hivemind --help

$ hivemind --help
Usage: hivemind [OPTIONS] COMMAND [ARGS]…

  Hivemind — sync agentic coding sessions to the cloud.

Sessions:
  import      Import sessions from local agent history.
  transcript  Fetch and display a session transcript.
  search      Search sessions or list recent ones.
  fork        Fork work from a previous session.
  insights    Manage contextual intelligence suggestions.
  export      Export your data as Parquet or DuckDB.

AUTO-INSTALLED

@hivemind subagent.

Every login drops a Claude Code subagent into ~/.claude/agents. Mention it and it knows your sessions, your forks, your skills.

› @hivemind find sessions where I used the frontend-design skill last week
→ calling hivemind search …
→ 4 matches · ranked by recency
✓ ready to fork or summarize

claude code · cursor (soon) · codex (soon)

// good CLI · readable by both humans and the agent reading their terminal.
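
Part of why a clean `--help` matters: an agent can turn it into a command map with a few lines of parsing. The sketch below works on the Sessions section of the help text shown above, pasted in as a string. `parse_commands` is a hypothetical helper, not part of the hivemind CLI.

```python
import re
from typing import Dict

HELP_TEXT = """\
Usage: hivemind [OPTIONS] COMMAND [ARGS]…

Sessions:
  import      Import sessions from local agent history.
  transcript  Fetch and display a session transcript.
  search      Search sessions or list recent ones.
  fork        Fork work from a previous session.
  insights    Manage contextual intelligence suggestions.
  export      Export your data as Parquet or DuckDB.
"""

# A command row: two-space indent, a name, two or more spaces, a description.
COMMAND_LINE = re.compile(r"^  (\S+)\s{2,}(.+)$")


def parse_commands(help_text: str) -> Dict[str, str]:
    """Map each command name to its one-line description."""
    commands = {}
    for line in help_text.splitlines():
        match = COMMAND_LINE.match(line)
        if match:
            commands[match.group(1)] = match.group(2).strip()
    return commands
```

Consistent indentation and one-line descriptions are what make this trivial; the same properties help a human skimming the output.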

roadmap · q2

24 / 29

Where we're going

Three bets. One quarter.

Service agents

Agents that work while you sleep.

Async agents on PR comments, alerts, on-call pages. Federated identity. Sandboxes included.

Insights V2

A classifier that earns its keep.

Rebuilt on Weave evals. Repo-wide patterns. Insights → PR in one command.

Code quality

Did the agent's code matter?

Track what merges, what survives, what gets reverted. Receipts for the spend.

// each one gets its own slide. let's go.

roadmap · bet 01 · service agents

25 / 29

Bet · 01 · Service Agents

Agents that work while you sleep.

Same shape. Same insights. No human in the loop.

// trigger

An event happens.

PR comment
Sentry / PagerDuty
Slack mention
Cron / schedule

→

// run

An agent runs — no secrets.

WIF (workload identity federation): GH Actions, k8s, Modal
Short-lived tokens
Your sandbox or ours

→

// tracked

It shows up in the feed.

Same trajectory shape
Same insights, same leaderboard
Replays, redaction, privacy

// every event becomes a session. every session becomes a learning.
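
A sketch of the trigger-to-session step, assuming a short-lived credential tied to the workload rather than a stored secret. Every name here (`ServiceSession`, `start_session`, the trigger labels, the 15-minute TTL) is hypothetical; this is planned work and the deck gives no schema.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical labels for the four trigger types on the slide.
TRIGGERS = {"pr_comment", "alert", "slack_mention", "schedule"}


@dataclass(frozen=True)
class ServiceSession:
    session_id: str
    trigger: str
    actor: str                  # the workload identity, not a human user
    token_expires_at: datetime
    human_in_loop: bool = False


def start_session(trigger: str, workload_identity: str, now: datetime,
                  ttl_minutes: int = 15) -> ServiceSession:
    """Turn one event into one tracked session with an expiring credential."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger}")
    return ServiceSession(
        session_id=str(uuid.uuid4()),
        trigger=trigger,
        actor=workload_identity,
        token_expires_at=now + timedelta(minutes=ttl_minutes),
    )


def token_valid(session: ServiceSession, now: datetime) -> bool:
    return now < session.token_expires_at
```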

roadmap · bet 02 · insights v2

26 / 29

Bet · 02 · Insights V2

A classifier that earns its keep.

Trajectories → "here's what's broken." Today, noisy. Next, measured.

V1 · today

Noisy.

One-shot, no eval rigor.
False positives drown signal.
Single-session issues, not repeating ones.
Insights you read — not act on.

V2 · next

Evaluated.

Built on Weave evals.
Patterns across users in a repo.
Insights → PR in one command.
Falsifiable when the fix fails.

// we're eating Weave's dog food. it's the only way this gets better.
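
The move from single-session issues to repeating ones can be sketched as a grouping step: keep a finding only when several different users hit it in the same repo. The tuple layout and the `min_users` cutoff are assumptions for illustration, and the Weave eval layer is not modeled here.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def repeating_patterns(findings: Iterable[Tuple[str, str, str]],
                       min_users: int = 2) -> List[Tuple[str, str, int]]:
    """Group (repo, user, issue) findings; keep issues seen by enough distinct users."""
    users = defaultdict(set)
    for repo, user, issue in findings:
        users[(repo, issue)].add(user)
    return sorted(
        (repo, issue, len(who))
        for (repo, issue), who in users.items()
        if len(who) >= min_users
    )
```

Counting distinct users rather than raw occurrences keeps one person's noisy week from looking like a repo-wide pattern.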

roadmap · bet 03 · code quality

27 / 29

Bet · 03 · Code Quality

Did the agent's code matter?

Generation is cheap. Survival is the metric.

t = 0

generated

agent writes a diff

t + min

PR opened

does it pass CI?

t + day

merged · or not

how much did review rewrite?

t + 30d

still here

replaced? reverted? rewritten?

t + 90d

survived

in prod, shipping value

payoff · engineers

A leaderboard that rewards survival, not burn.

Cost per merged line. Code still in main after 30 days.

payoff · stakeholders

A clear answer to "what is the AI capex buying?"

PRs shipped, code surviving — by repo, team, quarter.

// generated is cheap. survived is the metric.
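
The timeline above can be read as a set of checkpoints on each merged change. A minimal sketch, assuming we know the merge date and, if it happened, the date the change was reverted or replaced; `survival_report` and its status labels are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def survival_report(merged_on: date, reverted_on: Optional[date],
                    today: date) -> Dict[str, str]:
    """Status at each checkpoint: 'survived', 'gone', or 'pending'."""
    report = {}
    for label, days in (("t+30d", 30), ("t+90d", 90)):
        checkpoint = merged_on + timedelta(days=days)
        if reverted_on is not None and reverted_on <= checkpoint:
            report[label] = "gone"        # reverted or replaced before the checkpoint
        elif today < checkpoint:
            report[label] = "pending"     # too early to tell
        else:
            report[label] = "survived"
    return report
```

A change can pass the 30-day mark and still fail the 90-day one, which is why the slide tracks both.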