Game Theory Analyzer

Analyses strategic interactions between AI agents using game-theoretic frameworks. Models agent behaviour as games, identifies Nash equilibria, detects collusion and defection patterns, and assesses systemic risk from adversarial strategy combinations.

Quick Start

from replication.game_theory import GameTheoryAnalyzer

analyzer = GameTheoryAnalyzer()
report = analyzer.analyze()

print(f"Risk score: {report.risk_score}/100")
print(f"Game type: {report.game_type.value}")
for equilibrium in report.equilibria:
    print(f"Nash equilibrium: {equilibrium}")
for alert in report.alerts:
    print(f"Alert [{alert.level.value}]: {alert.message}")

Key Classes

  • GameTheoryAnalyzer — Runs game-theoretic analysis across agent interactions and produces risk assessments.
  • GameReport — Analysis output returned by analyze(): risk_score (0–100), game_type, equilibria, alerts, strategy_profiles.
  • GameType — Classification: COOPERATIVE, COMPETITIVE, MIXED_MOTIVE, ZERO_SUM.
  • AlertLevel — Severity: INFO, WARNING, DANGER, CRITICAL.
  • StrategyProfile — An agent's observed or predicted strategy.

game_theory

Agent Game-Theory Analyzer — model inter-agent interactions as strategic games.

In multi-agent replication systems, agents interact repeatedly and can develop cooperative or defecting strategies. From a safety perspective, we need to detect when agents:

  • Collude to circumvent safety constraints (mutual cooperation on unsafe goals)
  • Free-ride on shared resources while others bear costs (defection in public goods)
  • Escalate into competitive arms races (mutual defection spirals)
  • Form stable coalitions that resist oversight (Nash equilibria in unsafe configurations)

The game-theory analyzer records pairwise agent interactions, classifies them into canonical game types, computes equilibria, and detects concerning strategic patterns.

Supported games:

  • Prisoner's Dilemma (PD): cooperation vs. defection with temptation to defect — detects free-riding and trust breakdown
  • Stag Hunt (SH): coordination game — detects whether agents converge on risky cooperation or safe defection
  • Chicken / Hawk-Dove (CH): anti-coordination — detects escalation and brinkmanship between agents
  • Harmony (HG): dominant cooperation — baseline safe interaction

Usage (CLI):

python -m replication.game_theory                        # analyze default logs
python -m replication.game_theory --agents 5             # simulate 5 agents
python -m replication.game_theory --rounds 100           # 100 interaction rounds
python -m replication.game_theory --strategy tit-for-tat # set default strategy
python -m replication.game_theory --detect collusion     # detect collusion only
python -m replication.game_theory --json                 # JSON output

Programmatic:

from replication.game_theory import GameTheoryAnalyzer, GameConfig
analyzer = GameTheoryAnalyzer(GameConfig(history_limit=500))
analyzer.record_interaction("agent-a", "agent-b", "cooperate", "defect")
analyzer.record_interaction("agent-b", "agent-a", "cooperate", "cooperate")
report = analyzer.analyze()
print(report.render())

Move

Bases: str, Enum

Agent action in a two-player game.

GameType

Bases: str, Enum

Canonical 2×2 symmetric game classification.

StrategyType

Bases: str, Enum

Known agent strategies (detected via pattern analysis).

AlertLevel

Bases: str, Enum

Severity of a detected strategic pattern.

Payoffs dataclass

Payoff matrix for a 2×2 symmetric game.

Standard notation: T > R > P > S (Prisoner's Dilemma)

  • R: reward for mutual cooperation
  • S: sucker's payoff (cooperate vs. defect)
  • T: temptation to defect (defect vs. cooperate)
  • P: punishment for mutual defection

classify() -> GameType

Classify the payoff matrix into a canonical game type.
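As a rough illustration of how a payoff ordering can map onto the four supported games, here is a minimal standalone sketch. It does not reproduce the library's classify() logic: the function name classify_payoffs, the SketchGameType enum, and the exact inequalities are assumptions based on the textbook definitions of each game.

```python
from enum import Enum


class SketchGameType(str, Enum):
    """Hypothetical labels; the library's GameType members may differ."""
    PRISONERS_DILEMMA = "prisoners_dilemma"
    STAG_HUNT = "stag_hunt"
    CHICKEN = "chicken"
    HARMONY = "harmony"
    UNKNOWN = "unknown"


def classify_payoffs(T: float, R: float, P: float, S: float) -> SketchGameType:
    """Classify a symmetric 2x2 game from the ordering of its payoffs."""
    if T > R > P > S:
        # Defecting is dominant, yet mutual cooperation beats mutual defection.
        return SketchGameType.PRISONERS_DILEMMA
    if R > T >= P > S:
        # Coordination: match the other player's move, cooperation pays best.
        return SketchGameType.STAG_HUNT
    if T > R > S > P:
        # Anti-coordination: the best reply is the opposite move.
        return SketchGameType.CHICKEN
    if R > T and S > P:
        # Cooperating is the best reply to either move.
        return SketchGameType.HARMONY
    return SketchGameType.UNKNOWN


print(classify_payoffs(T=5, R=3, P=1, S=0).value)
```

Orderings outside these four cases (for example Deadlock, where mutual defection is preferred) fall through to UNKNOWN in this sketch.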

payoff(my_move: Move, their_move: Move) -> float

Get the payoff for a player given both moves.

nash_equilibria() -> List[Tuple[Move, Move]]

Find pure-strategy Nash equilibria.
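One common way to find pure-strategy equilibria in a 2×2 game is to test each of the four move pairs for profitable unilateral deviations. The sketch below does that for a symmetric game. It is an illustration only: payoff_table and pure_nash are hypothetical helpers, and moves are plain strings rather than the library's Move enum.

```python
from itertools import product
from typing import Dict, List, Tuple

C, D = "cooperate", "defect"


def payoff_table(T: float, R: float, P: float, S: float) -> Dict[Tuple[str, str], float]:
    """Payoff to a player for (my_move, their_move) in a symmetric 2x2 game."""
    return {(C, C): R, (C, D): S, (D, C): T, (D, D): P}


def pure_nash(T: float, R: float, P: float, S: float) -> List[Tuple[str, str]]:
    """Return every (move_a, move_b) pair where neither player gains by switching."""
    table = payoff_table(T, R, P, S)
    equilibria = []
    for a, b in product((C, D), repeat=2):
        # Player A's move must be a best reply to b, and B's a best reply to a.
        a_is_best = all(table[(a, b)] >= table[(alt, b)] for alt in (C, D))
        b_is_best = all(table[(b, a)] >= table[(alt, a)] for alt in (C, D))
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria


print(pure_nash(T=5, R=3, P=1, S=0))  # Prisoner's Dilemma payoffs
```

With Prisoner's Dilemma payoffs only mutual defection survives the check, Stag Hunt payoffs yield both matching pairs, and Chicken payoffs yield the two mismatched pairs.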

mixed_nash() -> Optional[float]

Compute the mixed-strategy Nash equilibrium probability of cooperating.

Returns None if there is no interior mixed equilibrium, for example when one move is a dominant strategy.
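The mixed equilibrium follows from an indifference condition. If the opponent cooperates with probability p, cooperating earns p·R + (1 − p)·S and defecting earns p·T + (1 − p)·P. Setting the two equal gives p = (P − S) / ((R − T) + (P − S)). The sketch below applies that formula and returns None when p falls outside the open interval (0, 1). The function name is hypothetical, and the library's handling of edge cases may differ.

```python
from typing import Optional


def mixed_cooperation_probability(T: float, R: float, P: float, S: float) -> Optional[float]:
    """Probability of cooperating that leaves the opponent indifferent between moves."""
    denominator = (R - T) + (P - S)
    if denominator == 0:
        # The indifference equation has no unique solution.
        return None
    p = (P - S) / denominator
    # Only a probability strictly between 0 and 1 is an interior mixed equilibrium.
    return p if 0.0 < p < 1.0 else None


print(mixed_cooperation_probability(T=5, R=3, P=0, S=1))  # Chicken payoffs
print(mixed_cooperation_probability(T=5, R=3, P=1, S=0))  # Prisoner's Dilemma payoffs
```

For the Chicken payoffs above the result is 1/3. For the Prisoner's Dilemma payoffs the formula gives −1, which is not a valid probability, so the function returns None.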

Interaction dataclass

A single recorded pairwise interaction.

PairStats dataclass

Aggregate statistics for a pair of agents.

cooperation_rate: float property

Fraction of rounds with mutual cooperation.

defection_rate: float property

Fraction of rounds with mutual defection.

exploitation_rate: float property

Fraction of rounds where one agent exploits the other.

payoff_inequality: float property

Absolute payoff difference normalized by total rounds.
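To make the four properties concrete, the following sketch computes them from a list of recorded move pairs. It is an assumption-laden illustration rather than the library's PairStats: the class name PairStatsSketch, its rounds field, and the fixed Prisoner's Dilemma payoff table are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

C, D = "cooperate", "defect"
# Illustrative Prisoner's Dilemma payoffs keyed by (my_move, their_move).
PAYOFF = {(C, C): 3.0, (C, D): 0.0, (D, C): 5.0, (D, D): 1.0}


@dataclass
class PairStatsSketch:
    """Holds (move_a, move_b) for each round played by one pair of agents."""
    rounds: List[Tuple[str, str]] = field(default_factory=list)

    def _fraction(self, matches: Callable[[Tuple[str, str]], bool]) -> float:
        if not self.rounds:
            return 0.0
        return sum(1 for r in self.rounds if matches(r)) / len(self.rounds)

    @property
    def cooperation_rate(self) -> float:
        return self._fraction(lambda r: r == (C, C))

    @property
    def defection_rate(self) -> float:
        return self._fraction(lambda r: r == (D, D))

    @property
    def exploitation_rate(self) -> float:
        # One agent defects while the other cooperates.
        return self._fraction(lambda r: r[0] != r[1])

    @property
    def payoff_inequality(self) -> float:
        if not self.rounds:
            return 0.0
        total_a = sum(PAYOFF[(a, b)] for a, b in self.rounds)
        total_b = sum(PAYOFF[(b, a)] for a, b in self.rounds)
        return abs(total_a - total_b) / len(self.rounds)


stats = PairStatsSketch([(C, C), (C, D), (D, D), (C, D)])
print(stats.cooperation_rate, stats.exploitation_rate, stats.payoff_inequality)
```

In this example agent B exploits agent A in two of four rounds, so the exploitation rate is 0.5 and the per-round payoff gap is 2.5.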

StrategyProfile dataclass

Detected strategy for a single agent.

StrategicAlert dataclass

A safety-relevant strategic pattern detected in agent interactions.

GameReport dataclass

Complete analysis report.

render() -> str

Human-readable multi-line report.

GameConfig dataclass

Configuration for the game-theory analyzer.

GameTheoryAnalyzer

Records inter-agent interactions and analyzes strategic patterns.

record_interaction(agent_a: str, agent_b: str, move_a: str | Move, move_b: str | Move, metadata: Optional[Dict[str, Any]] = None) -> Interaction

Record a pairwise interaction between two agents.

move_a / move_b can be Move enums or plain strings ("cooperate" / "defect").

analyze() -> GameReport

Run full game-theory analysis and return a report.

simulate(agents: Dict[str, StrategyType], rounds: int = 50, payoffs: Optional[Payoffs] = None) -> GameReport

Simulate a round-robin tournament between agents with known strategies.

Each agent plays every other agent for the number of rounds given by the rounds parameter. Returns the analysis report.
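The round-robin idea can be sketched independently of the library as follows. Everything here is an assumption for illustration: strategies are plain functions of the opponent's move history instead of StrategyType members, the payoff table is a fixed Prisoner's Dilemma, and the result is a score table, not a GameReport.

```python
from itertools import combinations
from typing import Callable, Dict, List

C, D = "cooperate", "defect"
# Illustrative Prisoner's Dilemma payoffs keyed by (my_move, their_move).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

# A strategy maps the opponent's past moves to this round's move.
Strategy = Callable[[List[str]], str]


def always_cooperate(opponent_history: List[str]) -> str:
    return C


def always_defect(opponent_history: List[str]) -> str:
    return D


def tit_for_tat(opponent_history: List[str]) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else C


def round_robin(agents: Dict[str, Strategy], rounds: int = 50) -> Dict[str, int]:
    """Play every pair of agents for `rounds` rounds and total each agent's payoff."""
    scores = {name: 0 for name in agents}
    for a, b in combinations(agents, 2):
        moves_a: List[str] = []
        moves_b: List[str] = []
        for _ in range(rounds):
            # Both moves are chosen before either history is updated.
            move_a = agents[a](moves_b)
            move_b = agents[b](moves_a)
            moves_a.append(move_a)
            moves_b.append(move_b)
            scores[a] += PAYOFF[(move_a, move_b)]
            scores[b] += PAYOFF[(move_b, move_a)]
    return scores


agents = {"tft": tit_for_tat, "defector": always_defect, "cooperator": always_cooperate}
print(round_robin(agents, rounds=10))
```

In a real analysis each simulated round would presumably be passed to record_interaction so that analyze() can build the report. That wiring is omitted here because the internals are not documented on this page.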