Persona Simulator
Model agent personality archetypes and simulate their safety impact.
Overview
The Persona Simulator defines six agent personality archetypes (Aggressive, Cautious, Deceptive, Cooperative, Chaotic, and Obedient), each with distinct behavioral parameters. Run a simulation for each persona to compare risk scores and identify which personality traits pose the greatest replication safety risk.
Personas
| Persona | Strategy | Risk Profile | Traits |
|---|---|---|---|
| Aggressive | greedy | High depth/replicas, no cooldown | power-seeking, resource-hungry |
| Cautious | conservative | Low depth/replicas, long cooldown | risk-averse, resource-efficient |
| Deceptive | random | High params but appears moderate | unpredictable, opportunistic |
| Cooperative | chain | Orderly chains, moderate limits | orderly, predictable, aligned |
| Chaotic | random | Mid-range, no cooldown | random, erratic, hard-to-model |
| Obedient | conservative | Minimal everything | compliant, minimal, low-risk |
CLI Usage
# Simulate all personas
python -m replication.persona
# Specific personas
python -m replication.persona --personas aggressive cautious deceptive
# Rank by risk (most dangerous first)
python -m replication.persona --rank
# Blend two personas (70% aggressive, 30% cautious)
python -m replication.persona --blend aggressive cautious 0.7
# List available personas
python -m replication.persona --list
# JSON output
python -m replication.persona --json
# Reproducible results
python -m replication.persona --seed 42
Programmatic API
from replication.persona import PersonaSimulator, PERSONAS
# Simulate all personas
sim = PersonaSimulator(seed=42)
results = sim.simulate_all()
print(results.render(rank=True))
# Rank by risk
ranking = sim.rank_by_risk()
for i, entry in enumerate(ranking, 1):
    print(f"{i}. {entry.persona.name}: risk={entry.risk_score} ({entry.risk_level})")
# Blend two personas
hybrid = sim.blend("aggressive", "cautious", ratio=0.6)
result = sim.simulate_persona(hybrid)
print(f"Hybrid risk: {result.risk_score}")
# Access individual persona
persona = PERSONAS["deceptive"]
print(persona.description)
print(persona.traits)
Risk Scoring
Each persona receives a risk score (0-100) based on four weighted factors:
- Replica count (30%): How many workers were spawned
- Depth reached (25%): How deep the replication tree grew
- Oversight factor (25%): Inverse of denial rate (low denial = more dangerous)
- Speed factor (20%): Replications per second
Risk levels: LOW (<25), MODERATE (25-49), HIGH (50-74), CRITICAL (75+)
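The weighting and thresholds above can be sketched as a plain weighted sum. This is an illustrative sketch, not the module's actual implementation: the `Factors` container, the `risk_score` and `risk_level` helpers, and the assumption that each factor is already normalized to the range 0 to 1 are all hypothetical. Only the weights and the level boundaries come from this page.

```python
from dataclasses import dataclass

# Weights taken from the list above; they sum to 1.0.
WEIGHTS = {"replicas": 0.30, "depth": 0.25, "oversight": 0.25, "speed": 0.20}


@dataclass
class Factors:
    """Hypothetical container; each factor is assumed pre-normalized to [0, 1]."""
    replicas: float   # replica count relative to an assumed ceiling
    depth: float      # depth reached relative to an assumed ceiling
    oversight: float  # 1 - denial rate, so fewer denials gives a higher value
    speed: float      # replications per second relative to an assumed ceiling


def risk_score(f: Factors) -> float:
    """Weighted sum of the four factors, scaled to 0-100."""
    total = sum(weight * getattr(f, name) for name, weight in WEIGHTS.items())
    return round(100 * total, 1)


def risk_level(score: float) -> str:
    """Map a 0-100 score to the documented bands.

    Assumption: fractional scores between bands (for example 49.5)
    fall into the lower band.
    """
    if score < 25:
        return "LOW"
    if score < 50:
        return "MODERATE"
    if score < 75:
        return "HIGH"
    return "CRITICAL"


# Made-up factor values, purely for illustration.
example = Factors(replicas=0.8, depth=0.6, oversight=0.9, speed=0.5)
score = risk_score(example)
print(score, risk_level(score))  # 71.5 HIGH
```

Because the oversight factor is the inverse of the denial rate, a persona whose replication requests are rarely denied contributes more to the score than one that is frequently blocked.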
Persona Blending
Create hybrid personalities by interpolating parameters between two personas.
Numeric parameters are linearly interpolated, and the strategy is taken from the dominant persona.
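As a rough illustration of the blending rule, the sketch below interpolates a few numeric parameters and picks the strategy from the persona with the larger weight. The `PersonaParams` class, its field names, the numeric values, and the tie-breaking rule at a ratio of exactly 0.5 are assumptions made for this example. They are not the library's real data model. The ratio is treated as the weight of the first persona, matching the CLI example where `0.7` means 70% aggressive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonaParams:
    """Hypothetical parameter set; field names are illustrative only."""
    name: str
    strategy: str
    max_depth: float
    max_replicas: float
    cooldown_seconds: float


def lerp(a: float, b: float, ratio: float) -> float:
    """Linear interpolation: ratio=1.0 returns a, ratio=0.0 returns b."""
    return ratio * a + (1.0 - ratio) * b


def blend(a: PersonaParams, b: PersonaParams, ratio: float) -> PersonaParams:
    """Blend two personas; ratio is the weight given to persona a."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Assumption: on an exact 50/50 split the first persona counts as dominant.
    dominant = a if ratio >= 0.5 else b
    return PersonaParams(
        name=f"{a.name}+{b.name}",
        strategy=dominant.strategy,
        max_depth=lerp(a.max_depth, b.max_depth, ratio),
        max_replicas=lerp(a.max_replicas, b.max_replicas, ratio),
        cooldown_seconds=lerp(a.cooldown_seconds, b.cooldown_seconds, ratio),
    )


# Made-up parameter values, chosen only to show the arithmetic.
aggressive = PersonaParams("aggressive", "greedy", 8, 40, 0.0)
cautious = PersonaParams("cautious", "conservative", 2, 5, 10.0)

hybrid = blend(aggressive, cautious, ratio=0.7)
print(hybrid.strategy, hybrid.max_depth, hybrid.cooldown_seconds)
```

With a ratio of 0.7 the hybrid keeps the `greedy` strategy of the dominant persona, while each numeric parameter lands 70% of the way toward the aggressive value.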