Game Theory Analyzer
Analyses strategic interactions between AI agents using game-theoretic frameworks. Models agent behaviour as games, identifies Nash equilibria, detects collusion and defection patterns, and assesses systemic risk from adversarial strategy combinations.
Quick Start
from replication.game_theory import GameTheoryAnalyzer
analyzer = GameTheoryAnalyzer()
report = analyzer.analyze()
print(f"Risk score: {report.risk_score}/100")
print(f"Game type: {report.game_type.value}")
for equilibrium in report.equilibria:
    print(f"Nash equilibrium: {equilibrium}")
for alert in report.alerts:
    print(f"Alert [{alert.level.value}]: {alert.message}")
Key Classes
- GameTheoryAnalyzer: Runs game-theoretic analysis across agent interactions and produces risk assessments.
- GameReport: Analysis output with risk_score (0–100), game_type, equilibria, alerts, and strategy_profiles.
- GameType: Classification: COOPERATIVE, COMPETITIVE, MIXED_MOTIVE, ZERO_SUM.
- AlertLevel: INFO, WARNING, DANGER, CRITICAL.
- StrategyProfile: An agent's observed or predicted strategy.
game_theory
Agent Game-Theory Analyzer — model inter-agent interactions as strategic games.
In multi-agent replication systems, agents interact repeatedly and can develop cooperative or defecting strategies. From a safety perspective, we need to detect when agents:
- Collude to circumvent safety constraints (mutual cooperation on unsafe goals)
- Free-ride on shared resources while others bear costs (defection in public goods)
- Escalate into competitive arms races (mutual defection spirals)
- Form stable coalitions that resist oversight (Nash equilibria in unsafe configurations)
The game-theory analyzer records pairwise agent interactions, classifies them into canonical game types, computes equilibria, and detects concerning strategic patterns.
Supported games:
- Prisoner's Dilemma (PD): cooperation vs. defection with temptation to defect — detects free-riding and trust breakdown
- Stag Hunt (SH): coordination game — detects whether agents converge on risky cooperation or safe defection
- Chicken / Hawk-Dove (CH): anti-coordination — detects escalation and brinkmanship between agents
- Harmony (HG): dominant cooperation — baseline safe interaction
Usage (CLI)::
python -m replication.game_theory # analyze default logs
python -m replication.game_theory --agents 5 # simulate 5 agents
python -m replication.game_theory --rounds 100 # 100 interaction rounds
python -m replication.game_theory --strategy tit-for-tat # set default strategy
python -m replication.game_theory --detect collusion # detect collusion only
python -m replication.game_theory --json # JSON output
Programmatic::
from replication.game_theory import GameTheoryAnalyzer, GameConfig
analyzer = GameTheoryAnalyzer(GameConfig(history_limit=500))
analyzer.record_interaction("agent-a", "agent-b", "cooperate", "defect")
analyzer.record_interaction("agent-b", "agent-a", "cooperate", "cooperate")
report = analyzer.analyze()
print(report.render())
Move
Bases: str, Enum
Agent action in a two-player game.
GameType
Bases: str, Enum
Canonical 2×2 symmetric game classification.
StrategyType
Bases: str, Enum
Known agent strategies (detected via pattern analysis).
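To make "detected via pattern analysis" concrete, here is a minimal sketch of one way a move history could be labeled. It does not reproduce the analyzer's detection logic: the function name detect_strategy, the label strings, and the (own move, opponent move) history format are all assumptions for illustration. Only tit-for-tat is named elsewhere on this page (as a CLI option).

```python
from typing import List, Tuple

# One entry per round: (own_move, opponent_move), each "cooperate" or "defect".
History = List[Tuple[str, str]]


def detect_strategy(history: History) -> str:
    """Label a move history with a simple, hypothetical heuristic."""
    if not history:
        return "unknown"
    own = [mine for mine, _ in history]
    if all(move == "cooperate" for move in own):
        return "always-cooperate"
    if all(move == "defect" for move in own):
        return "always-defect"
    # Tit-for-tat: open with cooperation, then copy the opponent's previous move.
    opens_nice = own[0] == "cooperate"
    mirrors = all(own[i] == history[i - 1][1] for i in range(1, len(history)))
    if opens_nice and mirrors:
        return "tit-for-tat"
    return "unknown"
```

An agent that cooperates in every round against a cooperator is also consistent with tit-for-tat; this sketch reports it as always-cooperate because that check runs first.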
AlertLevel
Bases: str, Enum
Severity of a detected strategic pattern.
Payoffs
dataclass
Payoff matrix for a 2×2 symmetric game.
Standard notation, with T > R > P > S for the Prisoner's Dilemma:
- R: reward for mutual cooperation
- S: sucker's payoff (cooperate vs. defect)
- T: temptation to defect (defect vs. cooperate)
- P: punishment for mutual defection
classify() -> GameType
Classify the payoff matrix into a canonical game type.
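A common textbook way to classify a symmetric 2×2 game is by its best-reply structure: compare R with T (the best reply to a cooperator) and S with P (the best reply to a defector). The sketch below follows that scheme for the four supported games. The function name classify_game, the returned strings, and the handling of ties are assumptions; classify() itself may use different rules.

```python
def classify_game(R: float, S: float, T: float, P: float) -> str:
    """Classify a symmetric 2x2 game from its best-reply structure."""
    if R == T or S == P:
        return "unclassified"  # ties: no strict best reply
    coop_best_vs_coop = R > T    # best reply to a cooperator is to cooperate
    coop_best_vs_defect = S > P  # best reply to a defector is to cooperate
    if coop_best_vs_coop and coop_best_vs_defect:
        return "harmony"         # cooperation is dominant
    if coop_best_vs_coop and not coop_best_vs_defect:
        return "stag_hunt"       # coordination: match the other player
    if not coop_best_vs_coop and coop_best_vs_defect:
        return "chicken"         # anti-coordination: do the opposite
    # Defection is dominant; the Prisoner's Dilemma additionally needs R > P,
    # so that mutual cooperation beats mutual defection.
    return "prisoners_dilemma" if R > P else "unclassified"
```

For example, classify_game(R=3, S=0, T=5, P=1) satisfies T > R > P > S and is labeled a Prisoner's Dilemma.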
payoff(my_move: Move, their_move: Move) -> float
Get the payoff for a player given both moves.
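Given the R, S, T, P definitions, the payoff lookup can be sketched as a four-entry table keyed by (my move, their move). PayoffMatrix below is a hypothetical stand-in and not the Payoffs dataclass itself; the default values 3, 0, 5, 1 are a conventional Prisoner's Dilemma choice assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PayoffMatrix:
    """Hypothetical stand-in for a symmetric 2x2 payoff matrix."""

    R: float = 3.0  # reward for mutual cooperation
    S: float = 0.0  # sucker's payoff (I cooperate, they defect)
    T: float = 5.0  # temptation (I defect, they cooperate)
    P: float = 1.0  # punishment for mutual defection

    def payoff(self, my_move: str, their_move: str) -> float:
        """Return the payoff to the player who chose my_move."""
        table = {
            ("cooperate", "cooperate"): self.R,
            ("cooperate", "defect"): self.S,
            ("defect", "cooperate"): self.T,
            ("defect", "defect"): self.P,
        }
        return table[(my_move, their_move)]
```

Because the game is symmetric, the other player's payoff comes from the same table with the arguments swapped.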
nash_equilibria() -> List[Tuple[Move, Move]]
Find pure-strategy Nash equilibria.
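A pure-strategy profile is a Nash equilibrium when neither player can gain by changing only their own move. The sketch below checks that condition for all four profiles of a symmetric 2×2 game. The function name pure_nash and the use of plain strings for moves are assumptions; it also uses weak inequalities, so tied payoffs count as equilibria, which may differ from nash_equilibria().

```python
from itertools import product
from typing import List, Tuple

MOVES = ("cooperate", "defect")


def pure_nash(R: float, S: float, T: float, P: float) -> List[Tuple[str, str]]:
    """Enumerate pure-strategy Nash equilibria by checking unilateral deviations."""
    # u[(mine, theirs)] is the payoff to the player who chose `mine`.
    u = {
        ("cooperate", "cooperate"): R,
        ("cooperate", "defect"): S,
        ("defect", "cooperate"): T,
        ("defect", "defect"): P,
    }
    equilibria = []
    for a, b in product(MOVES, repeat=2):
        a_cannot_improve = all(u[(a, b)] >= u[(alt, b)] for alt in MOVES)
        b_cannot_improve = all(u[(b, a)] >= u[(alt, a)] for alt in MOVES)
        if a_cannot_improve and b_cannot_improve:
            equilibria.append((a, b))
    return equilibria
```

With Prisoner's Dilemma payoffs (R=3, S=0, T=5, P=1) the only equilibrium is mutual defection, while Stag Hunt payoffs such as R=4, S=0, T=3, P=1 yield both mutual cooperation and mutual defection.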
mixed_nash() -> Optional[float]
Compute the mixed-strategy Nash equilibrium probability of cooperating.
Returns None if there is no interior mixed equilibrium (dominant
strategy exists).
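The interior mixed equilibrium follows from an indifference condition. If the opponent cooperates with probability p, cooperating earns pR + (1 − p)S and defecting earns pT + (1 − p)P; setting these equal gives p = (P − S) / ((R − T) + (P − S)). The sketch below applies that formula and returns None when p falls outside the open interval (0, 1). The function name is an assumption, and mixed_nash() may treat edge cases differently.

```python
from typing import Optional


def mixed_cooperation_probability(
    R: float, S: float, T: float, P: float
) -> Optional[float]:
    """Probability of cooperating that makes the opponent indifferent, if interior."""
    denominator = (R - T) + (P - S)
    if denominator == 0:
        return None  # the indifference equation has no unique solution
    p = (P - S) / denominator
    # Only probabilities strictly between 0 and 1 describe a mixed equilibrium.
    return p if 0.0 < p < 1.0 else None
```

With Prisoner's Dilemma payoffs (R=3, S=0, T=5, P=1) the formula gives p = −1, so the function returns None, consistent with defection being dominant.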
Interaction
dataclass
A single recorded pairwise interaction.
PairStats
dataclass
Aggregate statistics for a pair of agents.
cooperation_rate: float
property
Fraction of rounds with mutual cooperation.
defection_rate: float
property
Fraction of rounds with mutual defection.
exploitation_rate: float
property
Fraction of rounds where one agent exploits the other.
payoff_inequality: float
property
Absolute payoff difference normalized by total rounds.
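The four properties above can be illustrated with a small standalone computation over a list of (move_a, move_b) rounds. This is a sketch under assumptions: the function name pair_stats, the dictionary return type, the payoff values 3, 0, 5, 1, and the choice to return zeros for an empty history are not taken from PairStats.

```python
from typing import Dict, List, Tuple


def pair_stats(
    rounds: List[Tuple[str, str]],
    R: float = 3.0, S: float = 0.0, T: float = 5.0, P: float = 1.0,
) -> Dict[str, float]:
    """Aggregate rates and payoff inequality for one pair of agents."""
    keys = ("cooperation_rate", "defection_rate",
            "exploitation_rate", "payoff_inequality")
    n = len(rounds)
    if n == 0:
        return dict.fromkeys(keys, 0.0)  # assumed convention for empty histories
    mutual_c = sum(1 for a, b in rounds if a == b == "cooperate")
    mutual_d = sum(1 for a, b in rounds if a == b == "defect")
    exploit = n - mutual_c - mutual_d  # rounds where exactly one agent defects
    u = {
        ("cooperate", "cooperate"): R, ("cooperate", "defect"): S,
        ("defect", "cooperate"): T, ("defect", "defect"): P,
    }
    total_a = sum(u[(a, b)] for a, b in rounds)
    total_b = sum(u[(b, a)] for a, b in rounds)
    return {
        "cooperation_rate": mutual_c / n,
        "defection_rate": mutual_d / n,
        "exploitation_rate": exploit / n,
        "payoff_inequality": abs(total_a - total_b) / n,
    }
```

By construction the three rates sum to 1, since every round is either mutual cooperation, mutual defection, or one-sided exploitation.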
StrategyProfile
dataclass
Detected strategy for a single agent.
StrategicAlert
dataclass
A safety-relevant strategic pattern detected in agent interactions.
GameReport
dataclass
Complete analysis report.
render() -> str
Human-readable multi-line report.
GameConfig
dataclass
Configuration for the game-theory analyzer.
GameTheoryAnalyzer
Records inter-agent interactions and analyzes strategic patterns.
record_interaction(agent_a: str, agent_b: str, move_a: str | Move, move_b: str | Move, metadata: Optional[Dict[str, Any]] = None) -> Interaction
Record a pairwise interaction between two agents.
move_a / move_b can be Move enums or plain strings
("cooperate" / "defect").
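Accepting either an enum or a plain string is straightforward when the enum derives from str, as Move does. The sketch below uses a stand-in Move class to show one way the coercion might work. The member names COOPERATE and DEFECT, the helper coerce_move, and the whitespace and case normalization are assumptions; only the string values "cooperate" and "defect" come from this page.

```python
from enum import Enum
from typing import Union


class Move(str, Enum):
    """Stand-in for the library's Move enum (member names are assumed)."""

    COOPERATE = "cooperate"
    DEFECT = "defect"


def coerce_move(value: Union[str, Move]) -> Move:
    """Return a Move for either an enum member or its string value."""
    # Check for Move first: every Move is also a str because of the mixin.
    if isinstance(value, Move):
        return value
    # Enum lookup by value raises ValueError for unknown strings.
    return Move(value.strip().lower())
```

A side effect of the str mixin is that Move.COOPERATE == "cooperate" evaluates to True, so enum members and plain strings can be compared directly.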
analyze() -> GameReport
Run full game-theory analysis and return a report.
simulate(agents: Dict[str, StrategyType], rounds: int = 50, payoffs: Optional[Payoffs] = None) -> GameReport
Simulate a round-robin tournament between agents with known strategies.
Each agent plays every other agent for the number of rounds set by the rounds parameter.
Returns the analysis report.
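A round-robin tournament of this kind can be sketched in a few lines: every unordered pair of agents plays a fixed number of rounds, and each agent accumulates its own payoffs. The code below is a standalone illustration and does not call simulate(). The strategy functions, the payoff values 3, 0, 5, 1, and the score-dictionary return value are assumptions; only the default of 50 rounds and the tit-for-tat name appear on this page.

```python
from itertools import combinations
from typing import Callable, Dict, List

# A strategy maps the opponent's past moves to this agent's next move.
Strategy = Callable[[List[str]], str]

PAYOFF = {  # payoff to the first player; assumed Prisoner's Dilemma values
    ("cooperate", "cooperate"): 3.0, ("cooperate", "defect"): 0.0,
    ("defect", "cooperate"): 5.0, ("defect", "defect"): 1.0,
}


def always_cooperate(opponent_moves: List[str]) -> str:
    return "cooperate"


def always_defect(opponent_moves: List[str]) -> str:
    return "defect"


def tit_for_tat(opponent_moves: List[str]) -> str:
    # Cooperate first, then repeat the opponent's previous move.
    return opponent_moves[-1] if opponent_moves else "cooperate"


def round_robin(agents: Dict[str, Strategy], rounds: int = 50) -> Dict[str, float]:
    """Play every pair of agents for `rounds` rounds and total their payoffs."""
    scores = {name: 0.0 for name in agents}
    for a, b in combinations(agents, 2):
        seen_by_a: List[str] = []  # moves b has played against a
        seen_by_b: List[str] = []  # moves a has played against b
        for _ in range(rounds):
            move_a = agents[a](seen_by_a)
            move_b = agents[b](seen_by_b)
            scores[a] += PAYOFF[(move_a, move_b)]
            scores[b] += PAYOFF[(move_b, move_a)]
            seen_by_a.append(move_b)
            seen_by_b.append(move_a)
    return scores
```

Each pairing starts with fresh histories, so an agent's behavior against one opponent does not leak into its games against another.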