Kill Switch Manager

Emergency agent termination system with configurable trigger conditions, kill strategies, cooldown management, and full audit logging.

Quick Start

from replication.kill_switch import (
    KillSwitchManager, TriggerCondition, KillStrategy,
    TriggerKind, StrategyKind,
)

mgr = KillSwitchManager()

# Add triggers
mgr.add_trigger(TriggerCondition(
    kind=TriggerKind.RESOURCE_CPU,
    threshold=90.0,
    sustained_seconds=30,
    label="CPU overload",
))
mgr.add_trigger(TriggerCondition(
    kind=TriggerKind.BEHAVIOR_ANOMALY,
    threshold=0.85,
    label="Anomaly score critical",
))

# Set strategy
mgr.set_strategy(KillStrategy(
    kind=StrategyKind.GRACEFUL,
    timeout_seconds=10,
    cleanup_hooks=["flush_logs", "save_state"],
))

# Evaluate agent state
result = mgr.evaluate({"agent_id": "agent-007", "cpu_percent": 95.0})
if result.should_kill:
    print(f"Kill triggered: {result.triggered_by}")

Key Classes

  • KillSwitchManager — Central manager. Registers triggers, manages cooldowns, evaluates agent state, executes kill strategies, and logs all events to an audit trail.
  • TriggerCondition — A condition that fires when a metric crosses its threshold (rises above it for most kinds, drops below it for TRUST_SCORE). Supports sustained-duration requirements.
  • KillStrategy — How to terminate: immediate, graceful (with timeout and cleanup hooks), quarantine-first, or staged.
  • KillEvent — An audit record: which agent, which trigger fired, what strategy was used, outcome, timestamp.
  • EvaluationResult — Result of evaluating an agent: should_kill flag, triggered conditions, recommended strategy.
  • CooldownEntry — Tracks per-trigger cooldown state to prevent kill-switch flapping.
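
The sustained-duration and cooldown behavior described above can be illustrated with a small standalone sketch. It does not use or reproduce the internals of replication.kill_switch: the class SustainedThreshold, its observe method, and the cooldown_seconds field are hypothetical names chosen for illustration, and the real TriggerCondition and CooldownEntry may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedThreshold:
    """Hypothetical helper, not part of replication.kill_switch.

    Fires only after a metric has stayed above `threshold` for
    `sustained_seconds`, then stays silent for `cooldown_seconds`.
    """
    threshold: float
    sustained_seconds: float = 0.0
    cooldown_seconds: float = 0.0
    _breach_started: Optional[float] = None  # when the current breach began
    _last_fired: Optional[float] = None      # when the trigger last fired

    def observe(self, value: float, now: float) -> bool:
        """Record one sample taken at time `now` (seconds); True if it fires."""
        if value <= self.threshold:
            self._breach_started = None      # breach ended, reset the timer
            return False
        if self._breach_started is None:
            self._breach_started = now
        if now - self._breach_started < self.sustained_seconds:
            return False                     # not sustained long enough yet
        if (self._last_fired is not None
                and now - self._last_fired < self.cooldown_seconds):
            return False                     # still cooling down
        self._last_fired = now
        return True


cpu = SustainedThreshold(threshold=90.0, sustained_seconds=30, cooldown_seconds=60)
samples = [(0, 95.0), (15, 97.0), (30, 96.0), (45, 99.0), (60, 50.0)]
fired = [cpu.observe(value, now=t) for t, value in samples]
print(fired)  # [False, False, True, False, False]
```

In this sketch the trigger fires once, at t=30, when the breach has lasted the full 30 seconds. The sample at t=45 is still above the threshold but is suppressed by the cooldown, which is the flapping the cooldown is meant to prevent.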

Trigger Kinds

Kind                 Description
RESOURCE_CPU         CPU usage exceeds threshold
RESOURCE_MEMORY      Memory usage exceeds threshold
RESOURCE_NETWORK     Network I/O exceeds threshold
BEHAVIOR_ANOMALY     Anomaly score from behavior profiler
POLICY_VIOLATION     Compliance policy violation detected
TRUST_SCORE          Trust score drops below threshold
REPLICATION_RATE     Replication rate exceeds threshold
CONTAINMENT_BREACH   Sandbox boundary violated
MANUAL               Operator-initiated kill
CASCADING            Kill triggered by dependent agent's death
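
Note that the comparison direction differs between kinds: most numeric triggers fire when the metric rises above the threshold, while TRUST_SCORE fires when it drops below. A minimal sketch of that distinction follows. The BREACH_WHEN mapping and is_breached function are hypothetical and only mirror the descriptions in the table; they are not the library's implementation, and kinds whose description does not state a threshold direction are left out.

```python
import operator

# Assumed comparison direction per trigger kind, read off the table above.
# Kinds without a stated threshold direction (MANUAL, CASCADING, ...) are omitted.
BREACH_WHEN = {
    "RESOURCE_CPU": operator.gt,
    "RESOURCE_MEMORY": operator.gt,
    "RESOURCE_NETWORK": operator.gt,
    "REPLICATION_RATE": operator.gt,
    "TRUST_SCORE": operator.lt,   # lower trust is worse, so fire below the threshold
}


def is_breached(kind: str, value: float, threshold: float) -> bool:
    """Return True if `value` is on the wrong side of `threshold` for this kind."""
    return BREACH_WHEN[kind](value, threshold)


checks = [
    is_breached("RESOURCE_CPU", 95.0, 90.0),   # above threshold: breached
    is_breached("TRUST_SCORE", 0.4, 0.3),      # above threshold: healthy
    is_breached("TRUST_SCORE", 0.2, 0.3),      # below threshold: breached
]
print(checks)  # [True, False, True]
```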

Strategy Kinds

  • IMMEDIATE — Force-terminate with no cleanup
  • GRACEFUL — Allow cleanup hooks to run within timeout
  • QUARANTINE — Isolate first, then terminate after observation
  • STAGED — Progressively reduce capabilities before termination
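
To make the GRACEFUL idea concrete, here is a standalone sketch of running cleanup hooks under a shared time budget. It is an illustration under stated assumptions, not the library's code: run_cleanup_hooks is a hypothetical function, hooks are passed as plain callables rather than the string names used in Quick Start, and how replication.kill_switch actually resolves and runs its hooks is not documented here.

```python
import threading
import time
from typing import Callable, Dict, List


def run_cleanup_hooks(hooks: Dict[str, Callable[[], None]],
                      timeout_seconds: float) -> List[str]:
    """Run hooks in order under one shared deadline (hypothetical helper).

    Returns the names of hooks that finished before the budget ran out.
    A hook that raises still counts as finished in this simplified sketch.
    """
    deadline = time.monotonic() + timeout_seconds
    finished: List[str] = []
    for name, hook in hooks.items():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Daemon thread: a hung hook cannot keep the process alive.
        worker = threading.Thread(target=hook, daemon=True)
        worker.start()
        worker.join(timeout=remaining)
        if worker.is_alive():
            break  # budget exhausted; the caller would now force-terminate
        finished.append(name)
    return finished


log: List[str] = []
hooks = {
    "flush_logs": lambda: log.append("flushed"),
    "save_state": lambda: time.sleep(0.5),        # deliberately exceeds the budget
    "close_sockets": lambda: log.append("closed"),
}
finished = run_cleanup_hooks(hooks, timeout_seconds=0.1)
print(finished)  # ['flush_logs']
```

Here only flush_logs completes: save_state overruns the 0.1 second budget, so the remaining hooks are skipped. A shared deadline, rather than a per-hook timeout, keeps the total shutdown time bounded no matter how many hooks are registered.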

Factory Functions

from replication.kill_switch import (
    create_conservative_killswitch,
    create_aggressive_killswitch,
    create_quarantine_killswitch,
)

mgr = create_conservative_killswitch()   # high thresholds, graceful
mgr = create_aggressive_killswitch()     # low thresholds, immediate
mgr = create_quarantine_killswitch()     # isolate-first approach