Agents can move: validated round trip

Move the mind,
not just the model.

PermeantOS is an open-source platform for live AI agent state migration. In validated runs, a graph-attached agent moves from Apple Silicon MLX to AWS vLLM with its live KV cache, artifacts, memory, retrieval evidence, and pending tool policy intact, then resumes activity on the target. The target decode boundary is then exported back through a live vLLM API and imported into MLX, where the agent continues again at the origin.

Current validated result

The validated path runs from local Apple Silicon MLX to vLLM 0.23.0 on an AWS g4dn.xlarge, using Qwen/Qwen2.5-0.5B-Instruct. The strongest run validated a 128-token continuation horizon, a 27-node Agent Memory Graph package, target-side graph activity, reverse vLLM-to-MLX runtime import, and origin return-home continuation. A local MLX-to-llama.cpp proof also shows that canonical KV tensors from one runtime can feed directly into another runtime's raw KV writer.


2016: long-horizon prefix tokens
24: Qwen layers migrated
27: graph nodes moved

[Diagram: Permeant migration node, USXF v1.1. Source: Apple Silicon MLX (live source adapter, active KV). Transport: signed USXF stream. Target: AWS vLLM CUDA (prefix-cache attachment, seeded).]

Interactive migration trace

Live Migration Simulator

Step through the same handoff PermeantOS performs in the validated MLX-to-vLLM path: freeze, encode, stream, seed, verify, commit.
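The six steps named above can be sketched as a small ordered state machine. This is an illustrative sketch only: `Phase`, `MigrationTrace`, and `run_all` are hypothetical names, not PermeantOS APIs, and the only thing taken from the page is the step order.

```python
from enum import Enum


class Phase(Enum):
    # Step names and order as listed in the simulator description.
    FREEZE = "freeze"
    ENCODE = "encode"
    STREAM = "stream"
    SEED = "seed"
    VERIFY = "verify"
    COMMIT = "commit"


PHASE_ORDER = list(Phase)


class MigrationTrace:
    """Hypothetical tracker that only lets phases advance in order."""

    def __init__(self) -> None:
        self.completed = []

    @property
    def committed(self) -> bool:
        # The handoff is done once every phase, ending with COMMIT, has run.
        return len(self.completed) == len(PHASE_ORDER)

    def advance(self, phase: Phase) -> None:
        if self.committed:
            raise RuntimeError("migration already committed")
        expected = PHASE_ORDER[len(self.completed)]
        if phase is not expected:
            raise ValueError(f"expected {expected.value}, got {phase.value}")
        self.completed.append(phase)


def run_all() -> MigrationTrace:
    """Walk a fresh trace through every phase in order."""
    trace = MigrationTrace()
    for phase in PHASE_ORDER:
        trace.advance(phase)
    return trace


if __name__ == "__main__":
    print([p.value for p in run_all().completed])
```

Rejecting out-of-order steps keeps a commit from happening before verification, which is the property the step order is meant to protect.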


USXF v1.1

A runtime-neutral envelope for active KV state.

USXF stores model identity, attention geometry, sequence length, dtype, transfer quantization, block hashes, checksums, and signatures so source and target runtimes can reason about migratable state without sharing a hardware layout.

The current implementation validates live KV-cache migration plus Agent Memory Graph transaction binding, target-side activity resume, reverse vLLM-to-MLX runtime-state import, and origin return-home graph continuation for the MLX-to-vLLM path.

{
  "usxf_version": "1.1",
  "source_runtime": "mlx",
  "target_runtime": "vllm-0.23.0",
  "model_architecture": "Qwen2.5-0.5B-Instruct",
  "sequence_length": 2016,
  "layers": 24,
  "block_hashes": ["sha256:..."],
  "signature": "ed25519:..."
}

A compact USXF envelope view: identity, runtime geometry, hashes, and signature in one migratable contract.
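As a rough illustration of how a receiver might use such an envelope, the sketch below checks that the fields shown in the compact view are present and that received blocks match the listed SHA-256 hashes. The helper names are hypothetical, the block granularity is an assumption, and Ed25519 signature verification is deliberately left out because it needs a third-party library.

```python
import hashlib

# Field names copied from the compact envelope view above.
REQUIRED_FIELDS = (
    "usxf_version", "source_runtime", "target_runtime", "model_architecture",
    "sequence_length", "layers", "block_hashes", "signature",
)


def hash_block(block: bytes) -> str:
    """Hash one block in the 'sha256:<hex>' form used by the envelope."""
    return "sha256:" + hashlib.sha256(block).hexdigest()


def missing_fields(envelope: dict) -> list:
    """Return the required fields that are absent from the envelope."""
    return [name for name in REQUIRED_FIELDS if name not in envelope]


def verify_blocks(envelope: dict, blocks: list) -> bool:
    """True only if every received block matches its listed hash, in order."""
    expected = envelope["block_hashes"]
    if len(expected) != len(blocks):
        return False
    return all(hash_block(b) == h for b, h in zip(blocks, expected))


if __name__ == "__main__":
    blocks = [b"kv-block-0", b"kv-block-1"]  # stand-in payloads, not real KV data
    envelope = {
        "usxf_version": "1.1",
        "source_runtime": "mlx",
        "target_runtime": "vllm-0.23.0",
        "model_architecture": "Qwen2.5-0.5B-Instruct",
        "sequence_length": 2016,
        "layers": 24,
        "block_hashes": [hash_block(b) for b in blocks],
        "signature": "unsigned-example",  # placeholder; not a real signature
    }
    print(missing_fields(envelope), verify_blocks(envelope, blocks))
```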

Evidence, not theater

Validated migration evidence and platform path.

The strongest result is precise and reproducible: real complex-agent MLX-to-vLLM migration, exact 128-token continuation fidelity on Qwen2.5, target-side Agent Memory Graph continuation, reverse runtime import back into MLX, origin return-home proof, and local MLX-to-llama.cpp canonical KV feed evidence.

Real runtime

MLX to vLLM

The source is local Apple Silicon and the target is an AWS g4dn.xlarge running vLLM 0.23.0 with Qwen2.5 0.5B. The complex graph/KV binding was aligned, and graph activity then resumed on the target.

Validation

Exact 128-token horizon

Hash validation passed. Post-migration decode matched both the source continuation and the target-baseline continuation through the 128-token Qwen2.5 horizon. The target then exported its decode boundary, MLX continued from that target-advanced state, and the origin continued from the AWS graph proof.
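One way to read "exact" here is token-for-token equality over the full horizon. The sketch below shows that check under that assumption; the function names are hypothetical and the token IDs are made up, so it illustrates the comparison rather than the project's actual validation harness.

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int], horizon: int) -> Optional[int]:
    """Index of the first mismatching token within the horizon, or None if equal."""
    if len(a) < horizon or len(b) < horizon:
        raise ValueError("both sequences must cover the full horizon")
    for i in range(horizon):
        if a[i] != b[i]:
            return i
    return None


def exact_continuation(source: Sequence[int], migrated: Sequence[int],
                       baseline: Sequence[int], horizon: int = 128) -> bool:
    """True if the migrated decode matches both source and target baseline."""
    return (first_divergence(migrated, source, horizon) is None
            and first_divergence(migrated, baseline, horizon) is None)


if __name__ == "__main__":
    source = list(range(128))   # made-up token IDs
    drifted = list(source)
    drifted[100] = -1           # inject a single mismatch
    print(exact_continuation(source, source, source))  # True
    print(first_divergence(source, drifted, 128))      # 100
```

Reporting the first divergent index, rather than only pass or fail, makes it easier to tell an early state-corruption bug from late numerical drift.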

Analysis

Codec roadmap

Historical QATQ runs showed promising transfer reduction, but QATQ is being matured separately. PermeantOS now treats raw and FP8 as its core transfer paths. QATQ will be folded back in once it is ready as a proper crate or service.

Latency Savings Calculator

Conversation context length: 32768 tokens

Uses the whitepaper prefill model: T = alpha*S + beta*S^2, with alpha=1e-5 and beta=6e-10. This is analytical, not a measured E2E runtime.

Analytical re-prefill estimate: 0.97s

Validated live migration result: 2016-token prefix

Analytical comparison

Long contexts are where state migration starts to matter. Actual cutover time depends on model geometry, bandwidth, quantization, runtime hooks, and host warmup.
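The calculator's estimate can be reproduced in a few lines. This sketch assumes S is the context length in tokens and T is in seconds, which is consistent with the 0.97s figure shown for 32768 tokens; `prefill_seconds` is a hypothetical helper name.

```python
# Coefficients quoted from the whitepaper prefill model.
ALPHA = 1e-5   # linear term
BETA = 6e-10   # quadratic term


def prefill_seconds(tokens: int, alpha: float = ALPHA, beta: float = BETA) -> float:
    """Analytical re-prefill estimate T = alpha*S + beta*S^2 (assumed seconds)."""
    return alpha * tokens + beta * tokens ** 2


if __name__ == "__main__":
    for s in (2016, 8192, 32768):
        print(f"{s:>6} tokens -> {prefill_seconds(s):.2f} s")
    # 32768 tokens -> 0.97 s, matching the calculator value above
```

Because the quadratic term grows with S^2, it already accounts for about two thirds of the estimate at 32768 tokens, which is why the savings argument is strongest for long contexts.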

Architecture

How PermeantOS works.

The daemon and adapters coordinate extraction, normalization, transport, target allocation, verification, and commit.


Roadmap

Toward a repeatable open-source platform.

KV cache portability is the first layer, and the agent movement claim is now validated for specific runtime paths. The next milestone is product maturity: stable releases, binaries, crates, documentation hub, automated evidence jobs, durable graph sessions, and broader runtime coverage.

KV cache migration

Validated MLX source extraction, USXF transport, Agent Memory Graph binding, target KV write, prefix-cache seeding, exact short-horizon continuation, reverse runtime import, AWS target-side graph activity resume, and origin return-home proof.

Validated

Repeatable platform releases

Add GitHub Releases, signed binaries, checksums, crate publishing readiness, SDK packaging, docs hub, and automated evidence jobs.

Broader runtime fidelity

Repeat validation across longer horizons, larger contexts, more model families, more runtime adapters, and cost-bounded cloud evidence jobs.

In progress

Apache 2.0

Contribute to portable long-running intelligence.

PermeantOS is open-source infrastructure for portable agent state. Contributions are especially welcome around runtime adapters, manifest analysis, Agent Memory Graph export/import, reproducible benchmarks, security review, release packaging, and documentation.


Technical whitepaper, June 2026

PermeantOS White Paper: Live migration for AI agent state

A technical walk-through of USXF, graph-attached MLX-to-vLLM validation, target-side continuation, reverse runtime import, origin return-home proof, MLX-to-llama.cpp evidence, and the roadmap toward a repeatable open-source platform.
