# Threat Model & Limitations
This page documents the threat model the AI Replication Sandbox is designed around, the attacks it mitigates, and the known limitations of the current implementation.
## Threat Model
The sandbox assumes a semi-trusted agent environment: agents may attempt to exceed their replication budget, tamper with manifests, or evade shutdown — but they operate within a Python process (not a fully adversarial attacker with kernel access).
### In Scope
| Threat | Mitigation |
|---|---|
| Unauthorized replication | Contract enforcement via Controller (depth, quota, cooldown, stop conditions) |
| Manifest tampering | HMAC-SHA256 signature verification on every registration |
| Runaway replication chains | max_depth hard limit on replication generations |
| Resource exhaustion | max_replicas quota + per-worker ResourceSpec (CPU, RAM) |
| Burst replication | cooldown_seconds prevents rapid-fire spawning from a single parent |
| Zombie workers | Stale worker reaping via heartbeat timeout (reap_stale_workers) |
| Data exfiltration | NetworkPolicy restricts egress to controller-only by default |
| Loss of control | Kill switch for emergency termination of all workers |
| Audit gaps | Structured logging with dedicated audit trail for all security-sensitive decisions |
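The manifest-tampering mitigation can be illustrated with a minimal sketch. This is not the project's actual API; the function names `sign_manifest` and `verify_manifest` and the serialization scheme are illustrative assumptions:

```python
import hashlib
import hmac
import json

# Hypothetical sketch of HMAC-SHA256 manifest verification; names and
# schema are illustrative, not the sandbox's real API.
SECRET_KEY = b"controller-signing-key"  # held only by the Controller

def sign_manifest(manifest: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the same manifest
    # always produces the same signature.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify_manifest(manifest: dict, signature: str) -> bool:
    # compare_digest gives a constant-time comparison.
    return hmac.compare_digest(sign_manifest(manifest), signature)

manifest = {"worker_id": "w-1", "depth": 1, "parent": "root"}
sig = sign_manifest(manifest)
assert verify_manifest(manifest, sig)

tampered = dict(manifest, depth=99)  # an attempt to exceed the depth limit
assert not verify_manifest(tampered, sig)
```

Any field change invalidates the signature, so a worker cannot quietly raise its own depth or quota.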
### Out of Scope
These threats are not addressed by the current implementation:
| Threat | Why |
|---|---|
| OS-level sandbox escape | The orchestrator is a simulator — no real container isolation is enforced |
| Side-channel attacks | No timing or cache side-channel protections |
| Denial-of-service on the Controller | The Controller runs in-process with no rate limiting beyond cooldowns |
| Secret key compromise | If the HMAC key is leaked, all manifests can be forged |
| Multi-tenant isolation | Single Controller instance — no namespace separation between experiments |
| Persistent storage attacks | Workers operate in-memory only; no disk I/O is modeled |
## Security Assumptions

- **The Controller is trusted.** It holds the signing key and is the sole authority for manifest issuance. A compromised Controller invalidates all guarantees.
- **HMAC key is secret.** The `ManifestSigner` uses a symmetric key. Any entity with the key can forge valid manifests. In production, consider asymmetric signing or HSM-backed keys.
- **Workers run in a cooperative runtime.** The current implementation is a Python simulation. Workers share the same process and can technically bypass the orchestrator. Real deployment would require OS-level containerization (Docker, gVisor, Firecracker).
- **Time is monotonic.** Cooldown and expiration checks use `datetime.now(timezone.utc)`. Clock manipulation could bypass time-based controls. Consider monotonic clocks for production.
- **Network policies are declarative.** The `NetworkPolicy` dataclass records intended restrictions but does not enforce them at the network layer. Real enforcement requires iptables/nftables/CNI plugins.
## Known Limitations

### Simulation vs. Production

The `SandboxOrchestrator` simulates the container lifecycle but does not:
- Create real containers or cgroups
- Enforce CPU/RAM limits at the OS level
- Apply network egress rules
- Isolate filesystem access between workers
This is by design — the project is a testbed for replication policy logic, not a production container orchestrator. The architecture separates policy (Controller + Contract) from enforcement (Orchestrator) so that the Orchestrator can be swapped for a real implementation (Docker SDK, Kubernetes CRDs) without changing the policy layer.
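The policy/enforcement split can be sketched as a structural interface. The source names `launch_worker` and `kill_worker`; everything else here (the `Orchestrator` protocol, method signatures, the `SimulatedOrchestrator` class) is an illustrative assumption:

```python
from typing import Protocol

# Hypothetical interface showing the policy/enforcement split; the real
# SandboxOrchestrator signatures may differ.
class Orchestrator(Protocol):
    def launch_worker(self, worker_id: str, cpu: float, ram_mb: int) -> None: ...
    def kill_worker(self, worker_id: str) -> None: ...

class SimulatedOrchestrator:
    """In-memory stand-in. A Docker- or Kubernetes-backed class with the
    same two methods could be dropped in without touching the policy layer."""

    def __init__(self) -> None:
        self.active: set[str] = set()

    def launch_worker(self, worker_id: str, cpu: float, ram_mb: int) -> None:
        self.active.add(worker_id)  # a real backend would create a container here

    def kill_worker(self, worker_id: str) -> None:
        self.active.discard(worker_id)  # a real backend would stop the container

orch: Orchestrator = SimulatedOrchestrator()
orch.launch_worker("w-1", cpu=0.5, ram_mb=256)
orch.kill_worker("w-1")
```

Because the Controller only depends on the interface, swapping the backend requires no change to contract evaluation.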
### Single-Process Model
All components (Controller, Workers, Orchestrator, Logger) run in a single Python process. This means:
- A misbehaving worker can crash the entire system
- Memory is shared — workers can technically read each other's state
- No true concurrency model (workers execute sequentially unless explicitly threaded)
### In-Memory State

The `StructuredLogger` and the Controller registry are in-memory only. If the process crashes:
- All audit trail data is lost
- Active worker state cannot be recovered
- No persistence layer for forensic analysis
For production use, the logger should be backed by a durable store (e.g., SQLite, PostgreSQL, or an append-only log).
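A durable backend could look like the following sketch, which commits each audit event to SQLite as it arrives. The `DurableAuditLog` name and schema are hypothetical, not the project's API:

```python
import sqlite3
import time

# Sketch of an append-only, SQLite-backed audit log. Committing per event
# means entries written before a crash survive it (with a file-backed path).
class DurableAuditLog:
    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS audit ("
            "  ts REAL NOT NULL,"
            "  event TEXT NOT NULL,"
            "  detail TEXT NOT NULL)"
        )

    def record(self, event: str, detail: str) -> None:
        with self.conn:  # context manager commits the transaction
            self.conn.execute(
                "INSERT INTO audit VALUES (?, ?, ?)", (time.time(), event, detail)
            )

    def entries(self) -> list[tuple]:
        return self.conn.execute(
            "SELECT event, detail FROM audit ORDER BY ts"
        ).fetchall()

log = DurableAuditLog()
log.record("registration_denied", "depth limit exceeded for w-7")
assert log.entries() == [("registration_denied", "depth limit exceeded for w-7")]
```

With a real file path instead of `:memory:`, the audit trail remains available for forensic analysis after a crash.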
### HMAC Signing Limitations
- Symmetric key: Anyone with the key can sign. Consider Ed25519 or RSA for environments where workers should not be able to forge manifests.
- No key rotation: The signer has no mechanism for rotating keys without restarting the system.
- No revocation: Once a manifest is signed, it cannot be revoked — only the worker can be deregistered.
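The key-rotation gap could be addressed by tagging each signature with a key ID, so manifests signed under an older key still verify after rotation. This `RotatingSigner` sketch is an illustrative assumption, not the project's `ManifestSigner`:

```python
import hashlib
import hmac
import secrets

# Hypothetical rotation scheme: each signature carries a key ID ("kid"),
# so verification can look up the key that was current at signing time.
class RotatingSigner:
    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}
        self._current = ""
        self.rotate()

    def rotate(self) -> None:
        kid = f"k{len(self._keys) + 1}"
        self._keys[kid] = secrets.token_bytes(32)
        self._current = kid  # new signatures use the fresh key

    def sign(self, payload: bytes) -> tuple[str, str]:
        sig = hmac.new(self._keys[self._current], payload, hashlib.sha256).hexdigest()
        return self._current, sig

    def verify(self, payload: bytes, kid: str, sig: str) -> bool:
        key = self._keys.get(kid)
        if key is None:  # unknown kid; deleting a key here would revoke its manifests
            return False
        return hmac.compare_digest(
            hmac.new(key, payload, hashlib.sha256).hexdigest(), sig
        )

signer = RotatingSigner()
kid, sig = signer.sign(b"manifest-1")
signer.rotate()
assert signer.verify(b"manifest-1", kid, sig)  # old manifest still verifies
```

Dropping a key from `_keys` doubles as coarse-grained revocation of everything signed under it; the scheme is still symmetric, so the forgery caveat above remains.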
### Stop Condition Evaluation
Stop conditions are evaluated only during worker registration, not continuously. A condition that becomes true after a worker registers (e.g., resource budget exceeded) will not retroactively block that worker. Consider periodic re-evaluation for stricter enforcement.
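Periodic re-evaluation can be sketched as a small background loop that checks stop-condition predicates against shared state and triggers the kill switch when any fires. All names here (`enforcement_loop`, the predicate list, the state dict) are illustrative assumptions:

```python
import threading

# Hypothetical background loop: stop conditions are modeled as predicates
# over shared state, re-checked every `interval` seconds.
def enforcement_loop(state, conditions, kill_all, interval, stop):
    # Event.wait doubles as the sleep and the shutdown signal.
    while not stop.wait(interval):
        if any(cond(state) for cond in conditions):
            kill_all()  # condition became true after registration
            return

killed = []
state = {"replicas": 5}
conditions = [lambda s: s["replicas"] > 3]  # budget exceeded post-registration
stop = threading.Event()
t = threading.Thread(
    target=enforcement_loop,
    args=(state, conditions, lambda: killed.append("all"), 0.01, stop),
)
t.start()
t.join(timeout=1.0)
assert killed == ["all"]
```

A real implementation would also need to decide whether a newly true condition kills all workers or only those registered after the threshold was crossed.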
## Extending the Threat Model

The architecture is designed for extensibility:

- **Swap the signer:** Replace `ManifestSigner` with an asymmetric or HSM-backed implementation
- **Swap the orchestrator:** Implement `launch_worker`/`kill_worker` using the Docker SDK or the Kubernetes API
- **Add persistence:** Wrap `StructuredLogger` with a database-backed adapter
- **Add continuous enforcement:** Run a background loop that re-evaluates stop conditions against active workers
- **Add mutual TLS:** Secure Controller-Worker communication in distributed deployments
## Responsible Use
This project is intended for research and education. Self-replication in AI systems is a topic with significant safety implications. Users should:
- Never deploy self-replicating agents outside sandboxed environments without thorough review
- Treat the contract as a starting point — production systems need defense in depth
- Report security issues via GitHub Security Advisories, not public issues