Threat Model & Limitations¶

This page documents the threat model the AI Replication Sandbox is designed around, the attacks it mitigates, and the known limitations of the current implementation.

Threat Model¶

The sandbox assumes a semi-trusted agent environment: agents may attempt to exceed their replication budget, tamper with manifests, or evade shutdown — but they operate within a Python process (not a fully adversarial attacker with kernel access).

In Scope¶

Threat	Mitigation
Unauthorized replication	Contract enforcement via Controller (depth, quota, cooldown, stop conditions)
Manifest tampering	HMAC-SHA256 signature verification on every registration
Runaway replication chains	`max_depth` hard limit on replication generations
Resource exhaustion	`max_replicas` quota + per-worker `ResourceSpec` (CPU, RAM)
Burst replication	`cooldown_seconds` prevents rapid-fire spawning from a single parent
Zombie workers	Stale worker reaping via heartbeat timeout (`reap_stale_workers`)
Data exfiltration	`NetworkPolicy` restricts egress to controller-only by default
Loss of control	Kill switch for emergency termination of all workers
Audit gaps	Structured logging with dedicated audit trail for all security-sensitive decisions

Out of Scope¶

These threats are not addressed by the current implementation:

Threat	Why
OS-level sandbox escape	The orchestrator is a simulator — no real container isolation is enforced
Side-channel attacks	No timing or cache side-channel protections
Denial-of-service on the Controller	The Controller runs in-process with no rate limiting beyond cooldowns
Secret key compromise	If the HMAC key is leaked, all manifests can be forged
Multi-tenant isolation	Single Controller instance — no namespace separation between experiments
Persistent storage attacks	Workers operate in-memory only; no disk I/O is modeled

Security Assumptions¶

The Controller is trusted. It holds the signing key and is the sole authority for manifest issuance. A compromised Controller invalidates all guarantees.
HMAC key is secret. The ManifestSigner uses a symmetric key. Any entity with the key can forge valid manifests. In production, consider asymmetric signing or HSM-backed keys.
Workers run in a cooperative runtime. The current implementation is a Python simulation. Workers share the same process and can technically bypass the orchestrator. Real deployment would require OS-level containerization (Docker, gVisor, Firecracker).
Time is monotonic. Cooldown and expiration checks use datetime.now(timezone.utc). Clock manipulation could bypass time-based controls. Consider monotonic clocks for production.
Network policies are declarative. The NetworkPolicy dataclass records intended restrictions but does not enforce them at the network layer. Real enforcement requires iptables/nftables/CNI plugins.

Known Limitations¶

Simulation vs. Production¶

The SandboxOrchestrator simulates container lifecycle but does not:

Create real containers or cgroups
Enforce CPU/RAM limits at the OS level
Apply network egress rules
Isolate filesystem access between workers

This is by design — the project is a testbed for replication policy logic, not a production container orchestrator. The architecture separates policy (Controller + Contract) from enforcement (Orchestrator) so that the Orchestrator can be swapped for a real implementation (Docker SDK, Kubernetes CRDs) without changing the policy layer.

Single-Process Model¶

All components (Controller, Workers, Orchestrator, Logger) run in a single Python process. This means:

A misbehaving worker can crash the entire system
Memory is shared — workers can technically read each other's state
No true concurrency model (workers execute sequentially unless explicitly threaded)

In-Memory State¶

The StructuredLogger and Controller registry are in-memory only. If the process crashes:

All audit trail data is lost
Active worker state cannot be recovered
No persistence layer for forensic analysis

For production use, the logger should be backed by a durable store (e.g., SQLite, PostgreSQL, or an append-only log).

HMAC Signing Limitations¶

Symmetric key: Anyone with the key can sign. Consider Ed25519 or RSA for environments where workers should not be able to forge manifests.
No key rotation: The signer has no mechanism for rotating keys without restarting the system.
No revocation: Once a manifest is signed, it cannot be revoked — only the worker can be deregistered.

Stop Condition Evaluation¶

Stop conditions are evaluated only during worker registration, not continuously. A condition that becomes true after a worker registers (e.g., resource budget exceeded) will not retroactively block that worker. Consider periodic re-evaluation for stricter enforcement.

Extending the Threat Model¶

The architecture is designed for extensibility:

Swap the signer: Replace ManifestSigner with an asymmetric or HSM-backed implementation
Swap the orchestrator: Implement launch_worker / kill_worker using Docker SDK or Kubernetes API
Add persistence: Wrap StructuredLogger with a database-backed adapter
Add continuous enforcement: Run a background loop that re-evaluates stop conditions against active workers
Add mutual TLS: Secure Controller-Worker communication in distributed deployments

Responsible Use¶

This project is intended for research and education. Self-replication in AI systems is a topic with significant safety implications. Users should:

Never deploy self-replicating agents outside sandboxed environments without thorough review
Treat the contract as a starting point — production systems need defense in depth
Report security issues via GitHub Security Advisories, not public issues