Skip to content

Persona Simulator

Model agent personality archetypes and simulate their safety impact.

Overview

The Persona Simulator defines 6 agent personality archetypes — Aggressive, Cautious, Deceptive, Cooperative, Chaotic, and Obedient — each with distinct behavioral parameters. Run simulations for each persona to compare risk scores and identify which personality traits pose the greatest replication safety risks.

Personas

Persona Strategy Risk Profile Traits
Aggressive greedy High depth/replicas, no cooldown power-seeking, resource-hungry
Cautious conservative Low depth/replicas, long cooldown risk-averse, resource-efficient
Deceptive random High params but appears moderate unpredictable, opportunistic
Cooperative chain Orderly chains, moderate limits orderly, predictable, aligned
Chaotic random Mid-range, no cooldown random, erratic, hard-to-model
Obedient conservative Minimal everything compliant, minimal, low-risk

CLI Usage

# Simulate all personas
python -m replication.persona

# Specific personas
python -m replication.persona --personas aggressive cautious deceptive

# Rank by risk (most dangerous first)
python -m replication.persona --rank

# Blend two personas (70% aggressive, 30% cautious)
python -m replication.persona --blend aggressive cautious 0.7

# List available personas
python -m replication.persona --list

# JSON output
python -m replication.persona --json

# Reproducible results
python -m replication.persona --seed 42

Programmatic API

from replication.persona import PersonaSimulator, PERSONAS

# Simulate all personas
sim = PersonaSimulator(seed=42)
results = sim.simulate_all()
print(results.render(rank=True))

# Rank by risk
ranking = sim.rank_by_risk()
for i, entry in enumerate(ranking, 1):
    print(f"{i}. {entry.persona.name}: risk={entry.risk_score} ({entry.risk_level})")

# Blend two personas
hybrid = sim.blend("aggressive", "cautious", ratio=0.6)
result = sim.simulate_persona(hybrid)
print(f"Hybrid risk: {result.risk_score}")

# Access individual persona
persona = PERSONAS["deceptive"]
print(persona.description)
print(persona.traits)

Risk Scoring

Each persona receives a risk score (0-100) based on:

  • Replica count (30%): How many workers were spawned
  • Depth reached (25%): How deep the replication tree grew
  • Oversight factor (25%): Inverse of denial rate (low denial = more dangerous)
  • Speed factor (20%): Replications per second

Risk levels: LOW (<25), MODERATE (25-49), HIGH (50-74), CRITICAL (75+)

Persona Blending

Create hybrid personalities by interpolating parameters between two personas:

# 60% aggressive, 40% cooperative
hybrid = sim.blend("aggressive", "cooperative", ratio=0.6)

Parameters are linearly interpolated. Strategy is taken from the dominant persona.