AI Security Glossary
The Definitive AI Security Glossary
30+ AI security terms defined for security leaders, vCISOs, and CISOs evaluating LLM and AI deployments. Updated quarterly.
Last reviewed: Q2 2026 · Maintained by the Armorstack VERITY advisory team
Jump to a term
Adversarial Machine Learning
A discipline studying attacks that manipulate machine-learning systems through carefully crafted inputs, poisoned training data, or model extraction. Adversarial examples exploit the gap between how models classify inputs and how humans perceive them, causing high-confidence misclassifications from imperceptible perturbations.
How Armorstack addresses this: SENTRY monitors model inputs and outputs for adversarial drift signatures, while VERITY advisory engagements map adversarial ML risk against MITRE ATLAS to prioritize defensive controls before deployment.
AI Bill of Materials (AIBOM)
A structured inventory of every component used to build and deploy an AI system: base models, fine-tuning datasets, embeddings, plugins, frameworks, and licensing terms. AIBOMs extend SBOM concepts (CycloneDX, SPDX) to address provenance, training-data lineage, and model dependencies for audit and incident response.
How Armorstack addresses this: VERITY produces and maintains client AIBOMs as part of every AI Security Readiness assessment, and SENTRY ingests AIBOM data to correlate model-level CVEs and license drift in real time.
AI Red Teaming
A structured adversarial assessment of an AI system to surface safety, security, and bias failures before adversaries find them. Unlike traditional penetration testing, AI red teaming combines prompt-level attacks, multi-turn manipulation, and content-policy evasion to test both the model and the application wrapper around it.
How Armorstack addresses this: Armorstack’s AI Security service delivers structured red-team engagements aligned to OWASP LLM Top 10 and MITRE ATLAS, with findings mapped directly to remediation owners and SENTRY detection rules.
AI Risk Management Framework (NIST AI RMF)
A voluntary NIST framework (AI RMF 1.0) for managing risks across the AI lifecycle, organized around four functions: Govern, Map, Measure, Manage. The companion Generative AI Profile (NIST AI 600-1) extends the framework with controls specific to generative and foundation models.
How Armorstack addresses this: VERITY delivers full NIST AI RMF gap assessments and roadmaps, then operationalizes Manage and Measure functions through SENTRY continuous monitoring with quarterly board-level reporting.
Agentic AI
AI systems that autonomously plan, invoke tools, call APIs, and execute multi-step actions to complete goals with minimal human intervention. Agentic systems dramatically expand the blast radius of prompt injection and policy violations because the model is now an actor with credentials, not just a text generator.
How Armorstack addresses this: SENTRY enforces least-privilege tool scopes and logs every agent action for forensic replay, while VERITY defines agent guardrails and human-in-the-loop checkpoints for high-impact workflows.
Backdoor Attack (Model)
A targeted training-time attack where an adversary embeds a hidden trigger in a model so it behaves normally on benign inputs but produces an attacker-chosen output when the trigger is present. Backdoors can be planted during pre-training, fine-tuning, or via poisoned third-party adapters and LoRAs.
How Armorstack addresses this: VERITY’s model-procurement reviews require provenance attestation for all third-party weights and adapters, and SENTRY runs trigger-pattern detection on production model outputs to catch backdoor activations.
Data Poisoning
An attack that corrupts training, fine-tuning, or RAG data to degrade model accuracy, bias outputs, or implant exploitable behaviors. Common vectors include open web scraping, public dataset tampering, and adversarial commits to data pipelines used in continuous training.
How Armorstack addresses this: VERITY enforces data-provenance controls and dataset signing in client MLOps pipelines, while SENTRY monitors RAG ingestion sources for tampering and integrity drift.
OWASP LLM04: Data and Model Poisoning · MITRE ATLAS: Poison Training Data
Deepfake Detection
Techniques and tooling to identify synthetic audio, video, or imagery generated or manipulated by AI. Detection combines forensic artifact analysis, content credentials (C2PA), liveness checks, and behavioral signals to defend against vishing, executive impersonation, and KYC fraud.
How Armorstack addresses this: SENTRY integrates deepfake-detection signals into voice and email security workflows, and VERITY builds executive impersonation playbooks covering wire-transfer and approval-chain controls.
Embedding Inversion
An attack that reconstructs the original text, image, or PII from a vector embedding, defeating the assumption that embeddings are a safe, non-sensitive representation. Research has shown high-fidelity inversion of sentence embeddings, meaning leaked vector stores can disclose the underlying source data.
How Armorstack addresses this: VERITY classifies vector stores at the same sensitivity tier as the source data and enforces encryption-at-rest plus access controls, while SENTRY monitors vector-DB query patterns for bulk-extraction behavior.
Federated Learning Security
The security domain covering distributed model training across multiple parties without centralizing raw data. Threats include malicious participants submitting poisoned gradients, gradient-leakage attacks reconstructing training data, and Sybil attacks against aggregation nodes.
How Armorstack addresses this: VERITY designs federated architectures with secure aggregation, differential privacy, and participant attestation, and SENTRY monitors aggregation rounds for gradient anomalies that indicate poisoning.
Foundation Model
A large, general-purpose model trained on broad data at scale and adaptable to many downstream tasks via fine-tuning, prompting, or retrieval. Examples include GPT-class, Claude, Gemini, and Llama families. Foundation models concentrate risk: a single vulnerability in the base model propagates to every downstream application.
How Armorstack addresses this: VERITY’s vendor-risk reviews evaluate foundation-model providers against NIST AI RMF criteria, and SENTRY tracks upstream model version changes to flag downstream regression and security risk.
Guardrails (LLM)
Programmatic controls placed around an LLM’s inputs and outputs to enforce safety, compliance, and topical scope. Guardrails include input filters (prompt-injection detection, PII redaction), output filters (toxicity, hallucination, data-loss), and structural controls (schema validation, tool-call allowlists).
How Armorstack addresses this: SENTRY deploys and tunes layered guardrail stacks for client LLM applications, with policy violations routed to the SOC for triage rather than silently dropped.
Hallucination (LLM)
An LLM output that is fluent and confident but factually incorrect, fabricated, or unsupported by the provided context. In security-sensitive applications, hallucinations include invented citations, fabricated CVE numbers, non-existent APIs, and false compliance claims that downstream systems treat as authoritative.
How Armorstack addresses this: SENTRY enforces grounding and citation-verification on every RAG response in monitored client applications, and VERITY defines acceptable-use policies that gate LLM output behind human review for regulated use cases.
Indirect Prompt Injection
An attack where adversarial instructions are embedded in third-party content (web pages, emails, PDFs, calendar invites, RAG documents) that an LLM later ingests. The model treats the hostile content as trusted instructions, enabling data exfiltration, tool abuse, and policy bypass without the user ever typing a malicious prompt.
How Armorstack addresses this: SENTRY tags untrusted retrieved content, enforces output-side data-loss controls, and alerts on tool calls that originate from untrusted context windows rather than user intent.
OWASP LLM01: Prompt Injection · MITRE ATLAS: LLM Prompt Injection
Jailbreaking (LLM)
Crafted prompts that bypass an LLM’s safety alignment to produce restricted content (malware, CBRN guidance, harassment, copyrighted material). Techniques include role-play framing, encoding evasion, multi-turn coercion, and adversarial suffixes generated by automated optimization.
How Armorstack addresses this: SENTRY runs continuous jailbreak-pattern detection and rate-limiting on production LLM endpoints, with confirmed bypasses fed back to VERITY for policy and prompt-template updates.
LLM Observability
The continuous capture, correlation, and analysis of LLM prompts, completions, embeddings, tool calls, latency, cost, and policy events to detect security, quality, and compliance issues in production. Without observability, LLM applications operate as opaque non-deterministic black boxes inside the enterprise security perimeter.
How Armorstack addresses this: SENTRY delivers managed LLM observability as a SOC-tier service, closing the Observability Gap between AI deployment velocity and the enterprise’s monitoring capability.
Membership Inference Attack
An attack that determines whether a specific record was part of a model’s training set by analyzing model confidence and output patterns. In healthcare, finance, and HR contexts this can constitute a privacy breach even without raw data exfiltration, because membership itself is sensitive (e.g., presence in a clinical-trial dataset).
How Armorstack addresses this: VERITY assesses training-data privacy controls (differential privacy, dataset minimization) during model design, and SENTRY rate-limits high-confidence query patterns characteristic of inference attacks.
Model Inversion
An attack that reconstructs sensitive training data, features, or representative samples by repeatedly querying a model and analyzing its outputs. Model inversion has been demonstrated against facial recognition, medical imaging, and language models, raising direct HIPAA and GDPR exposure.
How Armorstack addresses this: SENTRY enforces query-rate limits, output perturbation, and anomaly detection on inference APIs, while VERITY validates that high-sensitivity training data is segmented or differentially private.
Model Lineage / Provenance
A verifiable record of a model’s origin, training data sources, fine-tuning steps, evaluations, and intermediate checkpoints. Lineage is foundational to incident response (which deployments are affected when an upstream weight is found compromised) and to regulatory attestation under the EU AI Act and NIST AI RMF.
How Armorstack addresses this: VERITY implements signed model registries and cryptographic attestation in client MLOps pipelines, and SENTRY validates lineage signatures at every model load to detect substitution.
Model Supply Chain
The full chain of components, data, frameworks, and third parties involved in producing and delivering an AI model: pre-trained weights, datasets, fine-tunes, adapters, plugins, inference servers, and hosting providers. Each link is an attack surface, and pickle-based weight files have been a documented vector for arbitrary code execution.
How Armorstack addresses this: VERITY mandates safe-format weights (safetensors), provenance attestation, and AIBOM tracking, while SENTRY scans model artifacts and registries for known-malicious hashes and tampering.
MLSecOps
The discipline of integrating security controls, threat modeling, and continuous testing into the machine-learning lifecycle, analogous to DevSecOps for traditional software. MLSecOps covers training-time, deployment-time, and runtime controls and treats model artifacts as first-class assets requiring signing, scanning, and SBOM/AIBOM tracking.
How Armorstack addresses this: VERITY designs MLSecOps reference architectures and gates for client engineering teams, and SENTRY operates the runtime detection and incident-response layer on top.
OWASP LLM Top 10
A community-maintained list of the most critical security risks in LLM applications, currently spanning prompt injection, sensitive information disclosure, supply chain risks, data and model poisoning, improper output handling, excessive agency, system-prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption.
How Armorstack addresses this: Every Armorstack AI Security engagement maps detected risks to the OWASP LLM Top 10, ensuring client deliverables align to the framework most security teams and auditors already recognize.
Prompt Injection
An attack where crafted user input overrides developer instructions, causing the LLM to take unintended actions, leak its system prompt, or invoke tools maliciously. OWASP ranks it the #1 LLM application risk because the model has no inherent way to distinguish trusted instructions from untrusted user content.
How Armorstack addresses this: SENTRY layers prompt-injection classifiers, output-side DLP, and tool-call allowlists in front of every monitored LLM application, with detections escalated to the SOC for triage.
OWASP LLM01: Prompt Injection · MITRE ATLAS: LLM Prompt Injection
RAG (Retrieval-Augmented Generation)
An architecture that augments LLM prompts with documents retrieved from a vector database, knowledge graph, or enterprise search index to ground responses in proprietary or current data. RAG inherits all the security properties of its retrieval corpus: poisoned, over-permissioned, or stale sources directly compromise output integrity.
How Armorstack addresses this: VERITY designs RAG architectures with row-level access control, source attestation, and indirect-prompt-injection defenses, and SENTRY monitors retrieval logs for anomalous corpus access patterns.
Responsible Disclosure (AI Vulnerabilities)
A coordinated process for reporting AI security and safety flaws (jailbreaks, data leakage, harmful capability gaps) to model providers and downstream operators before public release. Emerging norms extend traditional CVD practice to model behaviors that are not classical software vulnerabilities and may not have a CVE.
How Armorstack addresses this: VERITY operates client-facing AI vulnerability disclosure programs aligned to CISA’s CVD guidance, and SENTRY assists with triage, severity scoring, and vendor notification timelines.
Shadow AI
The unauthorized or unmanaged use of AI tools, models, and APIs by employees, contractors, or business units outside formal IT and security governance. Shadow AI is the AI-era successor to shadow IT and routinely results in proprietary data, source code, and PII being uploaded to consumer LLM endpoints with no DLP coverage.
How Armorstack addresses this: SENTRY discovers and inventories shadow AI usage through network telemetry and CASB integration, and VERITY operationalizes a sanctioned-AI program so employees have a fast, compliant alternative.
System Prompt Extraction
An attack that coaxes an LLM into revealing its hidden system prompt, including business logic, prompt-engineering IP, embedded credentials, or guardrail rules. Treating the system prompt as confidential is fragile; OWASP guidance is that any data placed in the system prompt should be assumed extractable.
How Armorstack addresses this: VERITY reviews client system prompts and removes secrets and authorization logic, and SENTRY detects extraction-pattern queries and redacts known-prompt fragments at the output layer.
Training Data Exfiltration
The extraction of memorized training data from a model through targeted prompting, divergence attacks, or repeated inference. Demonstrated extractions have included PII, source code, and secrets from production models, creating direct breach-notification exposure under HIPAA, GLBA, and state privacy laws.
How Armorstack addresses this: SENTRY enforces output-side DLP and detection for known sensitive-data fingerprints, and VERITY validates training-data hygiene and applies differential-privacy controls during fine-tuning.
Vector Database Security
The controls protecting vector stores (Pinecone, Weaviate, Qdrant, pgvector, OpenSearch) used in RAG and semantic-search architectures: authentication, authorization, encryption, multi-tenant isolation, and protection against embedding inversion. Vector DBs commonly hold the same sensitive data as the source documents but are deployed without equivalent controls.
How Armorstack addresses this: VERITY classifies vector stores at source-data sensitivity tier and enforces row/namespace-level access control, while SENTRY monitors query patterns for bulk extraction and tenant-boundary violations.
Watermarking (AI Output)
Techniques for embedding detectable signals in AI-generated text, image, audio, or video outputs to support provenance, attribution, and downstream detection. Approaches range from cryptographic content credentials (C2PA) to statistical token-distribution watermarks; robustness against paraphrasing, cropping, and re-encoding remains an active research area.
How Armorstack addresses this: VERITY integrates C2PA content credentials into client publishing and communications workflows, and SENTRY surfaces watermark-detection signals during deepfake and synthetic-media incidents.
Need help operationalizing this?
Definitions are the easy part. Closing the Observability Gap between AI deployment and enterprise monitoring is what Armorstack does.
Schedule a 30-minute AI Security Readiness call