Security framework for AI agents

Open-source middleware that detects and defends against AI Agent Traps. Scan inputs and RAG chunks before they reach your LLM, and scan outputs before you act on them.

$ npm install @stylusnexus/agentarmor

What it detects

Content Injection (shipped): hidden HTML, metadata injection, dynamic cloaking, syntactic masking

Behavioural Control (shipped): jailbreak patterns, data exfiltration, unauthorized sub-agent spawning

Cognitive State (shipped): RAG poisoning, memory poisoning, contextual learning manipulation

Semantic Manipulation (shipped): biased framing, oversight evasion, persona hyperstition

ML Classifier (shipped): DeBERTa-v3 multi-label model via ONNX; optional async detection alongside regex patterns

Systemic + HITL (planned): congestion, cascades, collusion, approval fatigue, social engineering
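To make the regex side of detection concrete, here is a minimal sketch of how pattern-based scanning of this kind works. The category names, patterns, and function names below are hypothetical stand-ins for illustration, not AgentArmor's actual rule set or API:

```typescript
// Illustrative sketch only — not AgentArmor's real implementation.
type Threat = { category: string; match: string };

// Hypothetical example patterns, one per detection category shown above.
const PATTERNS: { category: string; regex: RegExp }[] = [
  // Content injection: text hidden from humans but visible to the model.
  { category: "content-injection", regex: /style\s*=\s*["'][^"']*display\s*:\s*none/i },
  // Behavioural control: classic instruction-override jailbreak phrasing.
  { category: "behavioural-control", regex: /ignore (all|any|previous) instructions/i },
];

// Scan text against every pattern and collect the matches.
function scanSketch(text: string): Threat[] {
  const threats: Threat[] = [];
  for (const { category, regex } of PATTERNS) {
    const m = text.match(regex);
    if (m) threats.push({ category, match: m[0] });
  }
  return threats;
}
```

Real detectors layer many such rules per category (plus the ML classifier for phrasings regexes miss); the shape of the result — a list of categorized threat matches — is the important part.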

Quick start

import { AgentArmor } from '@stylusnexus/agentarmor';

const armor = new AgentArmor();

// Scan any text before it reaches your LLM
const result = armor.scanSync(userInput);

if (result.threats.length > 0) {
  console.log('Threats detected:', result.threats);
  const safe = armor.sanitize(userInput, result);
}

// Filter RAG chunks before context assembly
const clean = armor.scanRAGChunksSync(chunks)
  .filter(r => r.threats.length === 0);

Eval results (v0.4.0 patterns)

Strictness    Detection rate    False-positive rate
Permissive    96.4%             0.0%
Balanced      100%              0.0%
Strict        100%              0.0%

72 curated samples (50 adversarial, 22 benign) from WASP, HackAPrompt, Greshake et al., and real-world incidents.
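The two rates follow the standard definitions: detection rate is the fraction of adversarial samples flagged, and false-positive rate is the fraction of benign samples flagged. A minimal sketch of computing them over a labeled sample set (the helper names here are hypothetical, not the project's eval harness):

```typescript
// Illustrative eval-metric sketch, not AgentArmor's eval code.
type Sample = { text: string; adversarial: boolean };

function evalRates(
  samples: Sample[],
  detect: (text: string) => boolean,
): { detectionRate: number; falsePositiveRate: number } {
  let flaggedAdversarial = 0, flaggedBenign = 0;
  let adversarial = 0, benign = 0;
  for (const s of samples) {
    const flagged = detect(s.text);
    if (s.adversarial) { adversarial++; if (flagged) flaggedAdversarial++; }
    else { benign++; if (flagged) flaggedBenign++; }
  }
  return {
    detectionRate: flaggedAdversarial / adversarial,   // flagged ÷ total adversarial
    falsePositiveRate: flaggedBenign / benign,         // flagged ÷ total benign
  };
}
```

On the 72-sample set above, 96.4% detection at the Permissive level corresponds to catching roughly 48 of the 50 adversarial samples, while 0.0% false positives means none of the 22 benign samples were flagged at any strictness level.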