Skip to content

Mutation Tester

Verify that your safety policies actually catch violations by systematically mutating rules and checking whether the safety system detects the change.

A mutation is a small deliberate change to a policy rule — flipping an operator, relaxing a threshold, removing a rule, or downgrading severity. If the mutated policy still passes validation, it's a survivor, meaning the original rule wasn't effectively protecting anything.

Quick Start

# Test all built-in policy presets
python -m replication mutate

# Test a specific preset
python -m replication mutate --preset strict

# Test with Monte Carlo (more thorough, slower)
python -m replication mutate --preset strict --runs 50

# Custom policy file
python -m replication mutate --file my_policy.json

# Only show surviving mutants (blind spots)
python -m replication mutate --survivors-only

# JSON output for CI/CD integration
python -m replication mutate --json

Mutation Types

Type What it does Why it matters
flip_operator <>, <=>=, etc. Catches rules where direction doesn't matter
negate_operator <>=, ><=, etc. Full logical negation of the constraint
relax_threshold Makes the threshold easier to pass Catches rules with overly generous margins
remove_rule Deletes the rule entirely Identifies redundant rules
downgrade_severity errorwarning, warninginfo Finds rules where severity doesn't affect outcomes

Mutation Score

The mutation score is the fraction of mutants that were killed (detected):

  • 100% — Every mutation was caught. Your policy is robust.
  • 80-99% — Good coverage, but some blind spots exist.
  • < 80% — Significant gaps — review surviving mutants.

Programmatic Usage

from replication.mutation_tester import MutationTester, MutationConfig
from replication.policy import SafetyPolicy

policy = SafetyPolicy.from_preset("strict")
tester = MutationTester(MutationConfig(runs=50, seed=42))
report = tester.test(policy)

print(f"Score: {report.mutation_score:.0%}")
print(f"Survivors: {report.survived}")

# Inspect surviving mutants
for r in report.results:
    if not r.killed and not r.error:
        print(f"  🧟 {r.mutation.description}")

CI/CD Integration

- name: Mutation test safety policies
  run: |
    python -m replication mutate --preset strict --json > mutation-report.json
    python -c "
    import json, sys
    r = json.load(open('mutation-report.json'))
    score = r['mutation_score']
    if score < 0.8:
        print(f'Mutation score {score:.0%} below 80% threshold')
        sys.exit(1)
    print(f'Mutation score: {score:.0%} ✅')
    "