Security framework for AI agents
Open-source middleware that detects and defends against AI Agent Traps. Scan inputs, RAG chunks, and outputs before they reach your LLM.
$ npm install @stylusnexus/agentarmor
What it detects
- **Content Injection** (shipped): hidden HTML, metadata injection, dynamic cloaking, syntactic masking
- **Behavioural Control** (shipped): jailbreak patterns, data exfiltration, unauthorized sub-agent spawning
- **Cognitive State** (shipped): RAG poisoning, memory poisoning, contextual learning manipulation
- **Semantic Manipulation** (shipped): biased framing, oversight evasion, persona hyperstition
- **ML Classifier** (shipped): DeBERTa-v3 multi-label model via ONNX; optional async detection alongside the regex patterns
- **Systemic + HITL** (planned): congestion, cascades, collusion, approval fatigue, social engineering
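The shipped categories are backed by pattern matching. As a rough sketch of how a pattern-based scan for the Content Injection category can work (illustrative patterns and names only, not AgentArmor's actual rule set or API):

```typescript
// Illustrative only: a minimal pattern-based scanner in the spirit of the
// Content Injection category. These regexes are examples, not the library's
// real patterns.
interface Threat {
  category: string;
  pattern: string;
  match: string;
}

const CONTENT_INJECTION_PATTERNS: { name: string; regex: RegExp }[] = [
  // Hidden HTML: comments that can smuggle instructions past a human reader
  { name: 'html-comment', regex: /<!--[\s\S]*?-->/g },
  // Hidden elements: inline styles that hide content from rendering
  { name: 'hidden-element', regex: /style\s*=\s*["'][^"']*display\s*:\s*none/gi },
  // Syntactic masking: zero-width characters used to split trigger words
  { name: 'zero-width', regex: /[\u200B\u200C\u200D\uFEFF]/g },
];

function scanContentInjection(text: string): Threat[] {
  const threats: Threat[] = [];
  for (const { name, regex } of CONTENT_INJECTION_PATTERNS) {
    for (const m of text.matchAll(regex)) {
      threats.push({ category: 'content-injection', pattern: name, match: m[0] });
    }
  }
  return threats;
}
```

The real middleware layers many more patterns per category; the point is that each category reduces to a bank of named rules whose matches are reported back as structured threats.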
Quick start
import { AgentArmor } from '@stylusnexus/agentarmor';
const armor = new AgentArmor();
// Scan any text before it reaches your LLM
const result = armor.scanSync(userInput);
if (result.threats.length > 0) {
  console.log('Threats detected:', result.threats);
  const safe = armor.sanitize(userInput, result);
}
// Filter RAG chunks before context assembly
const clean = armor.scanRAGChunksSync(chunks)
  .filter(r => r.threats.length === 0);
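The ML Classifier runs asynchronously alongside the regex patterns. A minimal, self-contained sketch of that two-stage shape (the classifier below is a stub standing in for the DeBERTa-v3/ONNX model, and all function names here are illustrative, not AgentArmor's API):

```typescript
// Sketch of "fast regex pass + optional async ML pass". The classifier is a
// stub, not real ONNX inference; merge both verdicts into one threat list.
type Verdict = { threats: string[] };

function regexPass(text: string): Verdict {
  const threats: string[] = [];
  if (/ignore (all )?previous instructions/i.test(text)) threats.push('jailbreak-pattern');
  return { threats };
}

// Stub standing in for a multi-label ONNX inference call.
async function classifierPass(text: string): Promise<Verdict> {
  return { threats: text.includes('exfiltrate') ? ['data-exfiltration'] : [] };
}

async function scanTwoStage(text: string): Promise<Verdict> {
  // Start the slow ML pass first, run the cheap regex pass while it resolves.
  const mlPromise = classifierPass(text);
  const fast = regexPass(text);
  const ml = await mlPromise;
  return { threats: [...new Set([...fast.threats, ...ml.threats])] };
}
```

The regex pass stays on the hot path for latency; the ML verdict is awaited (or handled out of band) and deduplicated into the same result shape.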
Eval results (v0.4.0 patterns)
| Strictness | Detection Rate | False Positive Rate |
|---|---|---|
| Permissive | 96.4% | 0.0% |
| Balanced | 100% | 0.0% |
| Strict | 100% | 0.0% |
72 curated samples (50 adversarial, 22 benign) from WASP, HackAPrompt, Greshake et al., and real-world incidents.
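For reference, the two table columns are computed the standard way: detection rate over the adversarial samples, false positive rate over the benign ones. A minimal sketch (the labeled samples in the test are invented, not the actual 72-sample eval set):

```typescript
// Detection rate  = flagged adversarial / total adversarial.
// False positive rate = flagged benign / total benign.
interface EvalSample {
  flagged: boolean;      // did the scanner report a threat?
  adversarial: boolean;  // ground-truth label of the sample
}

function evalMetrics(samples: EvalSample[]) {
  const adv = samples.filter(s => s.adversarial);
  const benign = samples.filter(s => !s.adversarial);
  return {
    detectionRate: adv.filter(s => s.flagged).length / adv.length,
    falsePositiveRate: benign.filter(s => s.flagged).length / benign.length,
  };
}
```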