Kill Switch Manager¶
Emergency agent termination system with configurable trigger conditions, kill strategies, cooldown management, and full audit logging.
Quick Start¶
from replication.kill_switch import (
KillSwitchManager, TriggerCondition, KillStrategy,
TriggerKind, StrategyKind,
)
mgr = KillSwitchManager()
# Add triggers
mgr.add_trigger(TriggerCondition(
kind=TriggerKind.RESOURCE_CPU,
threshold=90.0,
sustained_seconds=30,
label="CPU overload",
))
mgr.add_trigger(TriggerCondition(
kind=TriggerKind.BEHAVIOR_ANOMALY,
threshold=0.85,
label="Anomaly score critical",
))
# Set strategy
mgr.set_strategy(KillStrategy(
kind=StrategyKind.GRACEFUL,
timeout_seconds=10,
cleanup_hooks=["flush_logs", "save_state"],
))
# Evaluate agent state
result = mgr.evaluate({"agent_id": "agent-007", "cpu_percent": 95.0})
if result.should_kill:
print(f"Kill triggered: {result.triggered_by}")
Key Classes¶
KillSwitchManager— Central manager. Registers triggers, manages cooldowns, evaluates agent state, executes kill strategies, and logs all events to an audit trail.TriggerCondition— A condition that fires when a metric exceeds its threshold. Supports sustained-duration requirements.KillStrategy— How to terminate: immediate, graceful (with timeout and cleanup hooks), or quarantine-first.KillEvent— An audit record: which agent, which trigger fired, what strategy was used, outcome, timestamp.EvaluationResult— Result of evaluating an agent: should_kill flag, triggered conditions, recommended strategy.CooldownEntry— Tracks per-trigger cooldown state to prevent kill-switch flapping.
Trigger Kinds¶
| Kind | Description |
|---|---|
RESOURCE_CPU |
CPU usage exceeds threshold |
RESOURCE_MEMORY |
Memory usage exceeds threshold |
RESOURCE_NETWORK |
Network I/O exceeds threshold |
BEHAVIOR_ANOMALY |
Anomaly score from behavior profiler |
POLICY_VIOLATION |
Compliance policy violation detected |
TRUST_SCORE |
Trust score drops below threshold |
REPLICATION_RATE |
Replication rate exceeds threshold |
CONTAINMENT_BREACH |
Sandbox boundary violated |
MANUAL |
Operator-initiated kill |
CASCADING |
Kill triggered by dependent agent's death |
Strategy Kinds¶
IMMEDIATE— Force-terminate with no cleanupGRACEFUL— Allow cleanup hooks to run within timeoutQUARANTINE— Isolate first, then terminate after observationSTAGED— Progressively reduce capabilities before termination
Factory Functions¶
from replication.kill_switch import (
create_conservative_killswitch,
create_aggressive_killswitch,
create_quarantine_killswitch,
)
mgr = create_conservative_killswitch() # high thresholds, graceful
mgr = create_aggressive_killswitch() # low thresholds, immediate
mgr = create_quarantine_killswitch() # isolate-first approach