Deploying an AI Agent Platform to Fly.io — What We Learned the Hard Way

Multi-agent AI is no longer a research topic. Tools like Paperclip let you run a structured org of AI agents — think org charts, task assignment, cost controls, and governance — using real AI adapters like Claude Code and Codex. It’s the kind of infrastructure that was science fiction two years ago.

We recently deployed Paperclip to Fly.io for a client and came away with a working setup and a long list of hard-won lessons. This post is the guide we wished existed when we started.


What We Built

A single-container Fly.io deployment running:

  • Paperclip (via npm install -g paperclipai)
  • Claude Code CLI (so agents can actually write and run code)
  • PostgreSQL 17 (system package, persisted on a Fly volume)
  • Tailscale (HTTPS with no public IP needed)

The result: a private, secure AI agent platform accessible only over your tailnet, with agent state and memory persisting across restarts.

┌──────────────────────────────────────────┐
│  Fly.io Machine (shared-cpu-1x, 1GB)     │
│                                          │
│  ┌──────────────┐  ┌──────────────────┐  │
│  │  Tailscale   │  │   Paperclip      │  │
│  │  TUN + HTTPS │──│   :3100          │  │
│  └──────────────┘  └──────┬───────────┘  │
│                    ┌──────┴───────────┐  │
│  ┌──────────────┐  │  PostgreSQL 17   │  │
│  │  Claude Code │  │  :5432           │  │
│  └──────────────┘  └──────────────────┘  │
│                                          │
│  /paperclip (1GB encrypted volume)       │
└──────────────────────────────────────────┘
         │
    Tailscale HTTPS
         │
  https://$APP_NAME.$TAILNET_DNS

The Setup

1. Create the app and volume

flyctl apps create $APP_NAME --org your-org
flyctl volumes create paperclip_data --region sjc --size 1 --yes
flyctl ips allocate-v6 --private   # Flycast — no public access

2. Set secrets

flyctl secrets set BETTER_AUTH_SECRET="$(openssl rand -hex 32)"
flyctl secrets set TAILSCALE_AUTH_KEY=tskey-auth-...
flyctl secrets set TAILNET_DNS=tail1234ab.ts.net
flyctl secrets set CLAUDE_CODE_OAUTH_TOKEN=...

Generate CLAUDE_CODE_OAUTH_TOKEN by running claude setup-token locally. This lets agents use your Claude subscription instead of paying for API usage on every call.
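The entrypoint depends on all four of these secrets. It helps to fail fast with a clear message when one is missing, instead of letting Tailscale or Paperclip fail later with a vaguer error. This is a minimal POSIX sh sketch. The helper name require_env is hypothetical and is not part of Paperclip or flyctl:

```sh
#!/bin/sh
# require_env (hypothetical helper): return non-zero and name every variable
# that is unset or empty. Names come from the script itself (trusted input).
require_env() {
  missing=""
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "start.sh: missing required secrets:$missing" >&2
    return 1
  fi
  return 0
}

# Near the top of start.sh:
# require_env BETTER_AUTH_SECRET TAILSCALE_AUTH_KEY TAILNET_DNS CLAUDE_CODE_OAUTH_TOKEN || exit 1
```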

3. The Dockerfile

No building Paperclip from source — just install it globally alongside system PostgreSQL:

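FROM node:lts-trixie-slim

# System PostgreSQL instead of Paperclip's embedded Postgres (Debian trixie ships PostgreSQL 17)
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
     ca-certificates curl git postgresql postgresql-client \
  && rm -rf /var/lib/apt/lists/*

# Tailscale (tailscaled daemon + tailscale CLI) via the official install script
RUN curl -fsSL https://tailscale.com/install.sh | sh

# Paperclip and the Claude Code CLI, installed globally from npm
RUN npm install -g paperclipai @anthropic-ai/claude-code

# Runtime directories (at runtime the Fly volume is mounted over /paperclip)
RUN mkdir -p /paperclip/instances/default /var/run/tailscale /var/run/postgresql \
  && chown -R node:node /paperclip \
  && chown postgres:postgres /var/run/postgresql

# Config is stashed outside /paperclip so the volume mount can't hide it;
# start.sh copies it onto the volume on first boot
COPY docker/config.json /etc/paperclip/config.json
COPY docker/start.sh /start.sh
RUN chmod +x /start.sh

# DATABASE_URL targets the local system Postgres that start.sh initializes
ENV NODE_ENV=production \
    HOME=/paperclip \
    HOST=0.0.0.0 \
    PORT=3100 \
    SERVE_UI=true \
    PAPERCLIP_HOME=/paperclip \
    PAPERCLIP_INSTANCE_ID=default \
    PAPERCLIP_CONFIG=/paperclip/instances/default/config.json \
    DATABASE_URL=postgres://paperclip:paperclip@localhost:5432/paperclip

VOLUME ["/paperclip"]
EXPOSE 3100

CMD ["/start.sh"]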

4. The entrypoint

The entrypoint does the heavy lifting: Tailscale up, PostgreSQL init, config bootstrapping, hostname derivation, and launching Paperclip as the node user.

The key insight: you set your app name once in fly.toml and your tailnet DNS once as a secret. Everything else (the public URL, the HTTPS hostname, and the allowed hostnames) derives automatically from FLY_APP_NAME and TAILNET_DNS. No hostnames are hardcoded anywhere.

On first boot it also auto-generates an admin invite URL and prints it in flyctl logs.
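The hostname derivation is just string assembly. Here is a minimal sketch. It assumes start.sh brings Tailscale up with --hostname="$FLY_APP_NAME", so the node's MagicDNS name is <app>.<tailnet>. It also assumes tailscale serve listens on HTTPS port 443. The function names are hypothetical. The Paperclip environment variables that receive these values are not named in this post, so they are left as a comment rather than guessed:

```sh
#!/bin/sh
# Hypothetical helpers for start.sh. FLY_APP_NAME is set by Fly on every
# Machine; TAILNET_DNS comes from `flyctl secrets set`.

derive_hostname() {
  # Assumes `tailscale up --hostname="$FLY_APP_NAME"`, so the MagicDNS
  # name is <app name>.<tailnet DNS suffix>.
  printf '%s.%s\n' "$FLY_APP_NAME" "$TAILNET_DNS"
}

derive_public_url() {
  # Assumes tailscale serve terminates HTTPS on 443, so no port suffix.
  printf 'https://%s\n' "$(derive_hostname)"
}

# In start.sh, export these under whatever variable names Paperclip reads
# for its public URL and allowed hostnames.
```

With FLY_APP_NAME=paperclip-demo and TAILNET_DNS=tail1234ab.ts.net, derive_public_url yields https://paperclip-demo.tail1234ab.ts.net, which matches the URL in the diagram above.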

5. Deploy

flyctl deploy --no-public-ips

Check flyctl logs for the admin invite URL. Open it in your browser, create your account, then start adding agents.


The Gotchas (The Actual Value)

We hit every one of these. You shouldn’t have to.

embedded-postgres doesn’t work in Docker

Paperclip ships with an embedded Postgres for local dev. It doesn’t init correctly in Docker containers — the error is a cryptic “init script exited with code 1.” Install system PostgreSQL instead and point Paperclip at it via DATABASE_URL.
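Without embedded-postgres, the entrypoint has to create and start a cluster itself. Below is a minimal sketch, run as root from start.sh, that makes several assumptions. The data directory /paperclip/pgdata and the helper names are ours. PG_BIN is assumed to hold the PostgreSQL server binary directory (for example from pg_config --bindir). The role, password, and database match the DATABASE_URL in the Dockerfile:

```sh
#!/bin/sh
# Hypothetical first-boot Postgres setup for start.sh (run as root).
# Assumes PG_BIN holds the server binary directory, e.g. "$(pg_config --bindir)".
PGDATA="${PGDATA:-/paperclip/pgdata}"   # assumed location on the Fly volume

needs_initdb() {
  # initdb writes a non-empty PG_VERSION file into every cluster it creates.
  [ ! -s "$1/PG_VERSION" ]
}

pg_as_postgres() {
  # Run a binary from PG_BIN as the postgres OS user.
  prog="$1"; shift
  runuser -u postgres -- "$PG_BIN/$prog" "$@"
}

start_postgres() {
  if needs_initdb "$PGDATA"; then
    mkdir -p "$PGDATA"
    chown postgres:postgres "$PGDATA"
    chmod 700 "$PGDATA"
    # Peer auth on the Unix socket for admin work; passwords over TCP (DATABASE_URL).
    pg_as_postgres initdb -D "$PGDATA" --auth-local=peer --auth-host=scram-sha-256
  fi
  # -w: wait until the server accepts connections before returning.
  pg_as_postgres pg_ctl -D "$PGDATA" -l "$PGDATA/server.log" -w start

  # Create the role and database named in DATABASE_URL, only once.
  if ! pg_as_postgres psql -tAc "SELECT 1 FROM pg_roles WHERE rolname = 'paperclip'" | grep -q 1; then
    pg_as_postgres psql -c "CREATE ROLE paperclip LOGIN PASSWORD 'paperclip'"
  fi
  if ! pg_as_postgres psql -tAc "SELECT 1 FROM pg_database WHERE datname = 'paperclip'" | grep -q 1; then
    pg_as_postgres createdb -O paperclip paperclip
  fi
}
```

Keying the first-boot check on PG_VERSION means a restart with an existing cluster on the volume skips initdb and only starts the server.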

Config field name is connectionString, not url

The config schema uses database.connectionString. Using url silently fails — the doctor command just says “no connection string configured.”

$meta is required

The config file needs a $meta block with at least version: 1. Without it, validation fails with a confusing error about an unrelated field.
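Both schema gotchas show up in even the smallest config. This hypothetical helper writes only the two fields these gotchas name. A working Paperclip config likely needs more, so treat it as illustrative rather than complete. Note the backslash: in an unquoted heredoc, the shell would otherwise expand $meta as an empty variable and silently write the wrong key.

```sh
#!/bin/sh
# Hypothetical helper: write a config containing only the two fields these
# gotchas name. A real Paperclip config likely needs more than this.
write_minimal_config() {
  path="$1"   # e.g. /etc/paperclip/config.json or $PAPERCLIP_CONFIG
  conn="$2"   # e.g. postgres://paperclip:paperclip@localhost:5432/paperclip
  mkdir -p "$(dirname "$path")"
  # Unquoted heredoc so $conn expands; \$meta stays a literal "$meta" key.
  cat > "$path" <<EOF
{
  "\$meta": { "version": 1 },
  "database": { "connectionString": "$conn" }
}
EOF
}
```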

su drops environment variables

su -s /bin/sh node -c "..." drops all env vars, including the ones Paperclip needs for auth and hostname config. Use runuser -u node -- instead — it preserves the environment.

Authenticated mode blocks health checks

Paperclip’s hostname guard rejects requests from unknown Host headers with a 403. Fly’s health checks use internal hostnames that aren’t on the allow list. Fix: add [http_service.checks.headers] with Host = "localhost" in fly.toml.

512MB isn’t enough memory

Running Paperclip + PostgreSQL + Tailscale + Claude Code in one container needs at least 1GB. At 512MB, Claude Code gets OOM-killed mid-response (exit 137). Set memory = "1024mb".

Tailscale: use TUN mode on Fly, not userspace

tailscale serve --https requires full TUN mode — userspace networking doesn’t support TLS cert provisioning. Fly VMs have /dev/net/tun. Use it.

Use --statedir, not --state

tailscaled --state=/path/to/file breaks cert provisioning. Use --statedir=/paperclip/tailscale (a directory) instead. This also persists Tailscale state on the volume so it survives restarts.
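Combining the two Tailscale gotchas, the Tailscale part of start.sh might look roughly like this. It is a sketch under assumptions: the socket path, the 30-second wait, and the helper names are ours. Also, tailscale serve needs MagicDNS and HTTPS certificates enabled for the tailnet in the admin console. No --tun flag is passed because tailscaled uses a kernel TUN device by default; --tun=userspace-networking is what selects userspace mode.

```sh
#!/bin/sh
# Hypothetical Tailscale startup for start.sh (run as root).
TS_SOCKET=/var/run/tailscale/tailscaled.sock

wait_for_socket() {
  # $1 = socket path, $2 = max seconds to wait; returns 1 on timeout.
  n=0
  while [ ! -S "$1" ]; do
    [ "$n" -lt "$2" ] || return 1
    n=$((n + 1))
    sleep 1
  done
  return 0
}

start_tailscale() {
  mkdir -p /paperclip/tailscale
  # Default kernel TUN mode, with a state *directory* on the volume.
  tailscaled --statedir=/paperclip/tailscale --socket="$TS_SOCKET" &
  wait_for_socket "$TS_SOCKET" 30 || { echo "tailscaled did not start" >&2; return 1; }
  # Join the tailnet under the Fly app name (use a reusable auth key).
  tailscale --socket="$TS_SOCKET" up --authkey="$TAILSCALE_AUTH_KEY" --hostname="$FLY_APP_NAME"
  # Serve HTTPS on the tailnet, proxying to Paperclip on port 3100.
  tailscale --socket="$TS_SOCKET" serve --bg --https=443 http://127.0.0.1:3100
}
```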

Use reusable auth keys

Single-use Tailscale auth keys are consumed on first boot. After any redeploy, the key is invalid and Tailscale won’t reconnect. Generate a reusable key at the Tailscale admin console.

Fly volume mount overwrites COPY’d files

Fly mounts the volume at /paperclip, which hides everything the Dockerfile COPY’d there. The fix: stash config at /etc/paperclip/config.json in the image layer and copy it into the volume on first boot in the entrypoint. Check for the file before copying so you don’t overwrite an existing config on restarts.
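The copy-if-missing step is small enough to show in full. This is a sketch with a hypothetical helper name. It copies only when the target does not exist yet, so edits made on the volume survive restarts and redeploys:

```sh
#!/bin/sh
# Hypothetical helper: seed a file onto the volume only if it isn't there yet.
bootstrap_config() {
  src="$1"   # copy baked into the image, e.g. /etc/paperclip/config.json
  dest="$2"  # target on the volume, e.g. $PAPERCLIP_CONFIG
  if [ ! -f "$dest" ]; then
    mkdir -p "$(dirname "$dest")"
    cp "$src" "$dest"
  fi
}

# In start.sh:
# bootstrap_config /etc/paperclip/config.json "$PAPERCLIP_CONFIG"
```

The same reasoning applies to the Dockerfile's chown -R node:node /paperclip. That chown only affects the image's copy of the directory, which the mount hides. The entrypoint therefore likely needs to re-apply ownership for the node user on the volume, while leaving the Postgres data directory (which also lives on the volume) owned by postgres.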

PostgreSQL binaries aren’t on PATH

System Postgres puts its server binaries (initdb, pg_ctl, postgres) in /usr/lib/postgresql/17/bin/, which isn’t on the default PATH. Use $(pg_config --bindir) to find them reliably.
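This sketch follows that advice. As an extra assumption not in the original setup, it falls back to the newest versioned directory under /usr/lib/postgresql if pg_config isn't installed. sort -V (version sort) is a GNU coreutils option, which Debian-based images have:

```sh
#!/bin/sh
# Hypothetical helpers for locating the PostgreSQL server binaries.

pg_newest_bindir() {
  # $1 = directory with one subdirectory per major version (e.g. 17/bin).
  # Version sort keeps 9 < 16 < 17; prints nothing if nothing matches.
  ls -d "$1"/*/bin 2>/dev/null | sort -V | tail -n 1
}

pg_bindir() {
  if command -v pg_config >/dev/null 2>&1; then
    pg_config --bindir
  else
    pg_newest_bindir /usr/lib/postgresql
  fi
}

# In start.sh:
# PG_BIN="$(pg_bindir)"
```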

Health check grace period

First boot takes 30–45 seconds: Tailscale negotiation, Postgres init, 28 schema migrations. Set grace_period = "120s" or Fly kills the machine before it’s ready.

min_machines_running = 1 is required

auto_stop_machines = "suspend" suspends the VM when idle. Tailscale connections don’t count as activity for Fly’s proxy, so the machine suspends and becomes unreachable over Tailscale. With a single machine, min_machines_running = 1 keeps the proxy from ever suspending it.


The Full fly.toml

app = "your-app-name"
primary_region = "sjc"

[build]

[env]
  NODE_ENV = "production"
  PORT = "3100"
  HOST = "0.0.0.0"
  SERVE_UI = "true"
  PAPERCLIP_HOME = "/paperclip"
  PAPERCLIP_INSTANCE_ID = "default"
  PAPERCLIP_CONFIG = "/paperclip/instances/default/config.json"
  PAPERCLIP_DEPLOYMENT_MODE = "authenticated"
  PAPERCLIP_DEPLOYMENT_EXPOSURE = "private"

[http_service]
  internal_port = 3100
  force_https = true
  auto_stop_machines = "suspend"
  auto_start_machines = true
  min_machines_running = 1

[[http_service.checks]]
  grace_period = "120s"
  interval = "30s"
  method = "GET"
  path = "/api/health"
  timeout = "10s"

  [http_service.checks.headers]
    Host = "localhost"

[mounts]
  source = "paperclip_data"
  destination = "/paperclip"

[[vm]]
  size = "shared-cpu-1x"
  memory = "1024mb"

Wrapping Up

Paperclip is genuinely interesting infrastructure — structured AI agent orchestration with real governance controls. Deploying it well requires navigating some rough edges in both Paperclip itself and the Fly + Tailscale stack, but the result is a private, production-grade AI agent platform running for a few dollars a month.

If you’re setting this up and hit a wall, drop us a line.