Real runtime
MLX to vLLM
Local Apple Silicon source, AWS g4dn.xlarge target, vLLM 0.23.0, Qwen2.5 0.5B.
PermeantOS is an open-source, research-preview hypervisor for live migration of AI agent state. It extracts active KV cache state from a source runtime, transports it in the USXF format, and attaches it to a target runtime so the agent can resume without replaying (re-prefilling) its full context.
Current validated result
Source: MLX on local Apple Silicon. Target: vLLM 0.23.0 on an AWS g4dn.xlarge instance. Model: Qwen/Qwen2.5-0.5B-Instruct. Hash and slot probes passed, 16 vLLM prefix-cache blocks were seeded, and the post-migration continuation matched the source continuation exactly over a 16-token validation horizon.
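An exact-match check over a fixed horizon is simple to state in code. The sketch below assumes both hosts decode greedily, so each produces a deterministic sequence of token IDs; the function name `continuation_matches` and the sample token IDs are hypothetical, not part of PermeantOS.

```python
from typing import Optional, Sequence, Tuple


def continuation_matches(
    source_tokens: Sequence[int],
    target_tokens: Sequence[int],
    horizon: int = 16,
) -> Tuple[bool, Optional[int]]:
    """Compare the first `horizon` token IDs decoded on the source and target.

    Returns (matched, first_divergence_index); the index is None on a match.
    Hypothetical helper, assuming greedy (deterministic) decoding on both hosts.
    """
    if len(source_tokens) < horizon or len(target_tokens) < horizon:
        raise ValueError("both continuations need at least `horizon` tokens")
    for i in range(horizon):
        if source_tokens[i] != target_tokens[i]:
            return False, i
    return True, None


# Hypothetical token IDs, for illustration only.
source = list(range(100, 116))
target = list(source)
print(continuation_matches(source, target))  # (True, None)
target[5] = 0
print(continuation_matches(source, target))  # (False, 5)
```

Reporting the first divergence index, rather than a bare boolean, makes a failed run easier to debug: an early divergence points at attached state, a late one more often at numerics.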
SOURCE: Apple Silicon MLX (live source adapter)
TARGET: AWS vLLM CUDA (prefix-cache attachment)
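Prefix-cache attachment relies on blocks being identified by the tokens that precede them, so a block is reused only if its entire prefix matches. A minimal sketch of chained block hashing follows. The 16-token block size and SHA-256 over comma-joined token IDs are assumptions for illustration; this is not vLLM's actual hashing code.

```python
import hashlib
from typing import List, Sequence

BLOCK_SIZE = 16  # tokens per block; an assumed value for this sketch


def chained_block_hashes(token_ids: Sequence[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each *full* block of tokens, chaining in the previous block's digest.

    Chaining makes a block's hash depend on every token before it, so two
    sequences share a block hash only if they share the whole prefix.
    Trailing tokens that do not fill a block are not hashed.
    """
    hashes: List[str] = []
    parent = b""  # empty for the first block; a 32-byte digest afterwards
    for b in range(len(token_ids) // block_size):
        block = token_ids[b * block_size:(b + 1) * block_size]
        h = hashlib.sha256()
        h.update(parent)
        h.update(",".join(str(t) for t in block).encode())
        digest = h.digest()
        hashes.append("sha256:" + digest.hex())
        parent = digest
    return hashes


print(len(chained_block_hashes(list(range(40)))))  # 2 full blocks; last 8 tokens unhashed
```

Because of the chaining, changing the very first token changes every downstream hash, which is what lets a target safely treat a hash match as a prefix match.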
Migration trace
The validated MLX-to-vLLM path performs the handoff in six steps: freeze, encode, stream, seed, verify, commit.
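The encode and stream steps can be pictured as splitting a sealed frame into fixed-size, individually checksummed chunks that the target reassembles and checks against a whole-frame digest. The `Chunk` layout, chunk size, and function names below are hypothetical; this is not the USXF wire format.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Chunk:
    seq: int        # position of this chunk in the stream
    payload: bytes
    sha256: str     # checksum of this chunk's payload only


def stream_frame(frame: bytes, chunk_size: int = 1 << 20) -> Iterator[Chunk]:
    """Sender side: split a sealed frame into checksummed chunks."""
    for seq, start in enumerate(range(0, len(frame), chunk_size)):
        payload = frame[start:start + chunk_size]
        yield Chunk(seq, payload, hashlib.sha256(payload).hexdigest())


def reassemble(chunks: Iterable[Chunk], expected_sha256: str) -> bytes:
    """Receiver side: restore order, verify each chunk, then the whole frame."""
    ordered: List[Chunk] = sorted(chunks, key=lambda c: c.seq)
    for expected_seq, chunk in enumerate(ordered):
        if chunk.seq != expected_seq:
            raise ValueError(f"missing or duplicate chunk near {expected_seq}")
        if hashlib.sha256(chunk.payload).hexdigest() != chunk.sha256:
            raise ValueError(f"corrupt chunk {chunk.seq}")
    frame = b"".join(c.payload for c in ordered)
    if hashlib.sha256(frame).hexdigest() != expected_sha256:
        raise ValueError("frame digest mismatch")
    return frame


frame = bytes(range(256)) * 10
digest = hashlib.sha256(frame).hexdigest()
chunks = list(stream_frame(frame, chunk_size=1000))
print(len(chunks), reassemble(reversed(chunks), digest) == frame)  # 3 True
```

Per-chunk checksums localize corruption to one chunk, so a transport could retransmit just that piece; the whole-frame digest still guards against dropped or extra chunks.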
USXF v1.1
USXF stores model identity, attention geometry, sequence length, dtype, transfer quantization, block hashes, checksums, and signatures, so that source and target runtimes can check whether state is migratable without sharing a hardware-specific memory layout.
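Because the manifest carries model identity and geometry, a target can reject incompatible state before allocating any cache. A minimal sketch, assuming the field names from the example manifest below plus a hypothetical `target` description (`usxf_major`, `model`, `layers`, `max_model_len`); signature verification is deliberately left out.

```python
import json
from typing import Any, Dict, List

# Field names follow the example manifest; everything about `target` is hypothetical.
REQUIRED_FIELDS = ("usxf_version", "model_architecture", "sequence_length", "layers", "block_hashes")


def compatibility_errors(manifest: Dict[str, Any], target: Dict[str, Any]) -> List[str]:
    """Return reasons the target cannot accept this state (empty list = compatible)."""
    missing = [f for f in REQUIRED_FIELDS if f not in manifest]
    if missing:
        return [f"missing field: {f}" for f in missing]
    errors: List[str] = []
    if manifest["usxf_version"].split(".")[0] != target["usxf_major"]:
        errors.append("unsupported USXF major version")
    if manifest["model_architecture"] != target["model"]:
        errors.append("model mismatch")
    if manifest["layers"] != target["layers"]:
        errors.append("layer count mismatch")
    if manifest["sequence_length"] > target["max_model_len"]:
        errors.append("sequence longer than target context window")
    if not manifest["block_hashes"]:
        errors.append("no block hashes to verify")
    return errors


manifest = json.loads("""{
  "usxf_version": "1.1",
  "model_architecture": "Qwen2.5-0.5B-Instruct",
  "sequence_length": 2016,
  "layers": 24,
  "block_hashes": ["sha256:..."]
}""")
target = {"usxf_major": "1", "model": "Qwen2.5-0.5B-Instruct", "layers": 24, "max_model_len": 4096}
print(compatibility_errors(manifest, target))  # []
```

Returning a list of reasons instead of raising on the first problem lets an operator see every incompatibility at once.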
```json
{
  "usxf_version": "1.1",
  "source_runtime": "mlx",
  "target_runtime": "vllm-0.23.0",
  "model_architecture": "Qwen2.5-0.5B-Instruct",
  "sequence_length": 2016,
  "layers": 24,
  "block_hashes": ["sha256:..."],
  "signature": "ed25519:..."
}
```
Evidence, not theater
The strongest result is deliberately narrow: one real MLX-to-vLLM migration, exact short-horizon continuation fidelity, and measured tensor integrity at the target.
Validation
Hash validation passed, sampled key/value tensors matched the source with a maximum element-wise difference of 0.0, and post-migration decoding matched the source continuation.
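The two integrity checks above (every block against its manifest hash, plus an element-wise diff on a random sample) can be sketched in plain Python. Representing a KV block as a flat list of floats and hashing its float64 encoding are simplifying assumptions; real blocks are device tensors in runtime-specific dtypes, and all names here are hypothetical.

```python
import hashlib
import random
import struct
from typing import Dict, List, Sequence

Block = List[float]  # one KV block flattened to floats (simplifying assumption)


def block_digest(values: Sequence[float]) -> str:
    """SHA-256 over the little-endian float64 encoding of a block."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("blocks differ in length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def validate_blocks(source: List[Block], target: List[Block], manifest_hashes: List[str],
                    samples: int = 4, seed: int = 0) -> Dict[str, object]:
    """Check every target block against its manifest hash, then diff a random sample."""
    if len(source) != len(target):
        raise ValueError("block counts differ")
    hashes_ok = len(target) == len(manifest_hashes) and all(
        block_digest(blk) == h for blk, h in zip(target, manifest_hashes)
    )
    rng = random.Random(seed)  # seeded so a validation run is reproducible
    picked = rng.sample(range(len(source)), k=min(samples, len(source)))
    worst = max((max_abs_diff(source[i], target[i]) for i in picked), default=0.0)
    return {"hashes_ok": hashes_ok, "sampled": sorted(picked), "max_diff": worst}


src = [[0.25 * (i + j) for j in range(8)] for i in range(6)]
hashes = [block_digest(b) for b in src]
print(validate_blocks(src, [list(b) for b in src], hashes)["max_diff"])  # 0.0
```

Hashing covers every block cheaply; the sampled diff is a second, independent check that catches a target which hashes the right bytes but exposes different values.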
Analysis
The paper estimates that migration becomes favorable over replaying context at long context lengths, especially over a 25 Gbps link with FP8 transfer quantization.
$T = \alpha S + \beta S^2$, with $\alpha = 10^{-5}$ and $\beta = 6 \times 10^{-10}$. This is an analytical model, not a measured end-to-end runtime.
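A small worked example makes the shape of the model concrete. It assumes $S$ is the sequence length in tokens and reports $T$ in whatever units the paper uses (not stated here); the function names are hypothetical.

```python
ALPHA = 1e-5   # linear coefficient, from the text
BETA = 6e-10   # quadratic coefficient, from the text


def analytical_time(seq_len: float, alpha: float = ALPHA, beta: float = BETA) -> float:
    """T = alpha*S + beta*S^2; an analytical estimate, not a measured runtime."""
    return alpha * seq_len + beta * seq_len ** 2


def quadratic_crossover(alpha: float = ALPHA, beta: float = BETA) -> float:
    """Length S where beta*S^2 equals alpha*S, i.e. S = alpha / beta."""
    return alpha / beta


for s in (2_016, 16_384, 131_072):
    t = analytical_time(s)
    print(f"S={s:>7}: T={t:.4f} (quadratic share {BETA * s * s / t:.0%})")
print(f"quadratic term dominates beyond S ~ {quadratic_crossover():,.0f}")
```

For the manifest's 2016-token example the linear term dominates (about 0.020 versus 0.0024). Past $S = \alpha/\beta \approx 16{,}667$ the quadratic term takes over, which is why the model favors migration at long contexts.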
Analytical comparison
Long contexts are where state migration starts to matter. Actual cutover time depends on model geometry, bandwidth, quantization, runtime hooks, and host warmup.
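Model geometry, bandwidth, and quantization combine into a rough lower bound on wire time. The sketch uses the standard KV size formula (keys plus values, per layer, per KV head, per token). The Qwen2.5-0.5B geometry (24 layers, 2 KV heads, head dimension 64) is an assumption taken from the published model config and should be checked against `config.json`. The estimate ignores protocol overhead, host copies, and warmup.

```python
# Geometry assumed from the published Qwen2.5-0.5B config (verify against config.json).
QWEN25_05B = {"layers": 24, "kv_heads": 2, "head_dim": 64}


def kv_cache_bytes(seq_len: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_elem: int) -> int:
    """Keys + values for every layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


def transfer_seconds(num_bytes: int, gbps: float) -> float:
    """Ideal wire time: ignores protocol overhead, framing, and host-side copies."""
    return num_bytes * 8 / (gbps * 1e9)


for label, width in (("fp16", 2), ("fp8", 1)):
    size = kv_cache_bytes(2_016, bytes_per_elem=width, **QWEN25_05B)
    print(f"{label}: {size:,} bytes, {transfer_seconds(size, 25) * 1e3:.1f} ms at 25 Gbps")
```

For a 2016-token context this gives roughly 24.8 MB and 7.9 ms in FP16, and half of both in FP8. Real cutover adds the runtime hooks and warmup the text lists.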
Architecture
The daemon and adapters coordinate extraction, normalization, transport, target allocation, verification, and commit.
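One way to picture the division of labor is a daemon that drives two adapter interfaces through the six stages and rolls the target back if verification fails. Every class, method, and toy adapter below is hypothetical and only illustrates the coordination pattern. It is not the PermeantOS daemon API.

```python
import hashlib
from typing import Callable, List, Protocol


class SourceAdapter(Protocol):
    def extract(self) -> List[bytes]: ...


class TargetAdapter(Protocol):
    def allocate(self, num_blocks: int) -> None: ...
    def write(self, blocks: List[bytes]) -> None: ...
    def read_back(self) -> List[bytes]: ...
    def commit(self) -> None: ...
    def rollback(self) -> None: ...


def normalize(raw: List[bytes]) -> List[bytes]:
    # Placeholder: a real adapter would convert layout and dtype here.
    return [bytes(b) for b in raw]


def migrate(source: SourceAdapter, target: TargetAdapter,
            transport: Callable[[List[bytes]], List[bytes]] = list) -> str:
    """Extraction -> normalization -> transport -> allocation -> verification -> commit."""
    blocks = normalize(source.extract())
    expected = [hashlib.sha256(b).hexdigest() for b in blocks]
    received = transport(blocks)
    target.allocate(len(received))
    try:
        target.write(received)
        actual = [hashlib.sha256(b).hexdigest() for b in target.read_back()]
        if actual != expected:
            raise RuntimeError("verification failed; target state discarded")
    except Exception:
        target.rollback()  # never leave half-written state attached
        raise
    target.commit()
    return "committed"


class ListSource:  # toy in-memory source
    def __init__(self, blocks: List[bytes]) -> None:
        self.blocks = blocks

    def extract(self) -> List[bytes]:
        return list(self.blocks)


class MemoryTarget:  # toy in-memory target that records its state
    def __init__(self) -> None:
        self.slots: List[bytes] = []
        self.state = "empty"

    def allocate(self, num_blocks: int) -> None:
        self.slots, self.state = [b""] * num_blocks, "allocated"

    def write(self, blocks: List[bytes]) -> None:
        self.slots = list(blocks)

    def read_back(self) -> List[bytes]:
        return list(self.slots)

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.slots, self.state = [], "rolled_back"


print(migrate(ListSource([b"layer0", b"layer1"]), MemoryTarget()))  # committed
```

Computing the expected digests before transport means verification compares the target against what the source actually sent, not against what arrived.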
Roadmap
KV cache portability is the first layer. The next major milestone is transactional migration of conversation state, tool calls, artifacts, retrieval memory, provenance, and pending work.
Validated: MLX source extraction, USXF transport, target KV write, prefix-cache seeding, and exact short-horizon continuation.
In progress: repeat validation across longer horizons, larger contexts, more model families, quantized transfers, and prewarmed cloud images.
Planned: serialize graph state containing turns, tools, artifacts, retrieval bindings, system checkpoints, and token-span mappings.
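To make the planned graph state concrete, here is one possible shape for it, serialized to JSON. The schema is entirely hypothetical (class and field names are illustrative, not a proposed Agent Memory Graph design). It shows how token-span mappings could tie conversation turns back to positions in the migrated KV cache.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class TokenSpan:
    start: int  # first token index in the migrated KV sequence
    end: int    # one past the last token index


@dataclass
class Turn:
    role: str
    text: str
    span: Optional[TokenSpan] = None  # where this turn lives in the KV cache


@dataclass
class ToolCall:
    name: str
    arguments: dict
    status: str = "pending"  # pending work must survive migration


@dataclass
class AgentStateGraph:  # hypothetical container, not a proposed standard
    turns: List[Turn] = field(default_factory=list)
    tool_calls: List[ToolCall] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)           # e.g. content hashes
    retrieval_bindings: List[str] = field(default_factory=list)  # e.g. index/document IDs
    checkpoints: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, including those inside lists.
        return json.dumps(asdict(self), sort_keys=True)


graph = AgentStateGraph(
    turns=[Turn("user", "Summarize the report.", TokenSpan(0, 6))],
    tool_calls=[ToolCall("search", {"query": "quarterly report"})],
)
print(graph.to_json())
```

Keeping spans as half-open `[start, end)` ranges makes adjacent turns tile the sequence without overlap checks.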
Apache 2.0
PermeantOS is open-source infrastructure research. Contributions are especially welcome around runtime adapters, manifest analysis, Agent Memory Graph schema design, reproducible benchmarks, security review, and documentation.
A technical walk-through of USXF, the live MLX-to-vLLM validation, and the roadmap toward full agent-state migration.