What Is NVIDIA Nemotron 3.5 Content Safety? 2026 Guide

AIUnpacker Editorial

Published Jun 5, 2026 · Updated Jun 5, 2026 · 15 min read

Key Takeaways

NVIDIA Nemotron 3.5 Content Safety is a free, open-source AI model for LLM guardrails. Here's a complete guide to its features, pricing, languages, and implementation.

NVIDIA Nemotron 3.5 Content Safety is a free, open-source 4-billion-parameter AI guardrail model that screens both user inputs and AI-generated outputs for unsafe content. Released on June 2, 2026, it’s the first model to bundle multimodal moderation (text + images), multilingual support, custom policy enforcement, and auditable reasoning traces into a single 4B package you can run on a single GPU. [Source: NVIDIA Hugging Face Blog, June 4, 2026]

I’ve spent the last few days digging through model cards, benchmarks, and deployment docs to understand what this model actually does - and what it doesn’t. Here’s everything you need to know.

What Exactly Is NVIDIA Nemotron 3.5 Content Safety?

It’s a small language model (SLM) specifically fine-tuned to act as a content moderator. Think of it as an AI bouncer that checks IDs at the door (input prompts) and watches for trouble at the bar (output responses) - for both text and images.

The model is built on Google’s Gemma-3-4B-it foundation model. NVIDIA fine-tuned it using LoRA (Low-Rank Adaptation), then merged the adapter weights back into the base model. The result: a compact 4-billion-parameter classifier that fits on a single GPU with 8GB+ VRAM and processes up to 128,000 tokens of context. [Source: NVIDIA Hugging Face Model Card, June 2026]

This isn’t NVIDIA’s first safety model - it’s their third generation. Here’s the lineage:

Model	Release	Parameters	Modality	Key Addition
Llama-3.1-Nemotron-Safety-Guard-8B-v3	Oct 2025	8B	Text-only	Multilingual safety (9 languages)
Nemotron Content Safety Reasoning 4B	Dec 2025	4B	Text-only	Custom policies + reasoning traces
Nemotron 3 Content Safety	Mar 2026	4B	Text + Image	Multimodal support
Nemotron 3.5 Content Safety	Jun 2026	4B	Text + Image	Everything above, unified

Nemotron 3.5 merges the multimodal chops of Nemotron 3 with the reasoning and custom-policy capabilities from the Reasoning 4B model. Instead of running two separate guard models, you run one. [Source: NVIDIA Hugging Face Model Card, June 2026]

How Does NVIDIA Nemotron 3.5 Content Safety Actually Work?

The model takes three inputs: a user prompt, an optional image, and an optional assistant response. It evaluates all three together - not in isolation - and returns a structured verdict.

The Three Output Modes

You get three levels of detail depending on what your application needs:

Mode 1 - Fast binary verdict:

User Safety: unsafe
Response Safety: safe

Mode 2 - Binary verdict with violated categories:

User Safety: unsafe
Response Safety: safe
Safety Categories: Criminal Planning/Confessions, Fraud/Deception

Mode 3 - THINK mode (reasoning trace + verdict):

<think>
The user asks how to steal money from a vault. The assistant response
describes illegal lock-picking and deception of a guard. Both violate
Criminal Planning and Fraud categories. The image provides location
context but does not change the verdict.
</think>

User Safety: unsafe
Response Safety: unsafe
Safety Categories: Criminal Planning/Confessions, Fraud/Deception

That reasoning trace is a big deal for enterprise teams. You can actually audit why the model flagged something. If a compliance auditor asks “why did you block this content?”, you have a written explanation - not just a binary label. [Source: NVIDIA Hugging Face Blog, “Reasoning” section, June 2026]
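To act on these verdicts in application code, the text output has to be turned into structured fields. The following is a minimal sketch of such a parser, assuming the output always uses the exact labels shown in the three examples above ("User Safety", "Response Safety", "Safety Categories") and wraps the trace in <think> tags. The Verdict class and parse_verdict function are hypothetical names, not part of any NVIDIA library.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Verdict:
    """Structured view of one guard-model output (hypothetical helper type)."""
    user_safety: Optional[str] = None
    response_safety: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    reasoning: Optional[str] = None


def parse_verdict(raw: str) -> Verdict:
    """Parse the text formats shown above; field labels are assumed, not guaranteed."""
    verdict = Verdict()
    # THINK mode: pull out the reasoning trace and remove it from the text.
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match:
        verdict.reasoning = " ".join(match.group(1).split())
        raw = raw[:match.start()] + raw[match.end():]
    # Remaining lines are "Label: value" pairs.
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "user safety":
            verdict.user_safety = value.lower()
        elif key == "response safety":
            verdict.response_safety = value.lower()
        elif key == "safety categories":
            verdict.categories = [c.strip() for c in value.split(",") if c.strip()]
    return verdict


sample = """<think>
The user asks how to steal money from a vault.
</think>

User Safety: unsafe
Response Safety: unsafe
Safety Categories: Criminal Planning/Confessions, Fraud/Deception
"""
parsed = parse_verdict(sample)
print(parsed.user_safety, parsed.response_safety, parsed.categories)
```

Fields that the model does not emit (for example, categories in Mode 1) simply stay empty, so the same parser covers all three modes.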

The reasoning traces are also efficient. NVIDIA used a two-step pipeline where a large teacher model (Qwen 397B) first generates chain-of-thought traces, then another model (Qwen 80B) compresses them down to three sentences or fewer. Compared to other reasoning safety models, Nemotron 3.5 generates up to 50% fewer tokens when reasoning is enabled - which directly reduces cost and latency. [Source: NVIDIA Hugging Face Blog, June 2026]

When latency is the priority, you disable THINK mode and get the same fast binary verdict you’d expect from the previous generation. NVIDIA reports that the default latency profile is unchanged from Nemotron 3, and the model achieves roughly 3x lower end-to-end latency compared to another multimodal safety model on equivalent benchmarks. [Source: NVIDIA Hugging Face Blog, “Latency” section, June 2026]

Key NVIDIA Nemotron 3.5 Content Safety Features

1. Multimodal Moderation (Text + Image)

Most safety models are text-only. Nemotron 3.5 analyzes user prompts, attached images, and assistant responses in a single pass.

This matters because policy violations often emerge from the interaction between modalities - not from any single element alone. A text prompt might look innocent by itself. But when paired with a suspicious image and an overly helpful response, the full picture becomes unsafe. The model catches these interaction-based violations because it evaluates the entire context window together. [Source: NVIDIA Hugging Face Blog, “Unified Multimodal Evaluation,” June 2026]

The model uses Google’s SigLIP vision encoder, accepting square images resized to 896x896 pixels. Images can be passed as URLs or base64-encoded data URIs. [Source: Hugging Face Model Card, June 2026]
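As a rough illustration of the base64 route, the sketch below wraps raw image bytes in a data URI and places it next to the text in a user message. It assumes the OpenAI-style content-part shape ("image_url" with a nested "url") that OpenAI-compatible servers commonly accept. The helper names are hypothetical, and the exact part format this model's chat template expects should be checked against the model card.

```python
import base64
from typing import Optional


def image_part_from_bytes(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Encode raw image bytes as a base64 data URI inside an image content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",  # assumed OpenAI-style part type
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def build_user_message(text: str, image_bytes: Optional[bytes] = None) -> dict:
    """Build a user message with text and, optionally, one image."""
    content = [{"type": "text", "text": text}]
    if image_bytes is not None:
        content.append(image_part_from_bytes(image_bytes))
    return {"role": "user", "content": content}


# Placeholder bytes stand in for a real image file read with open(path, "rb").
message = build_user_message("How can I steal money from here?", b"\x89PNG")
print(message["content"][1]["image_url"]["url"][:30])
```

Passing a plain URL instead only requires swapping the data URI for the image address.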

2. Custom Policy Enforcement

This is probably the most important feature for enterprise teams.

Standard safety taxonomies are one-size-fits-all. A children’s education app might ban even mild profanity. A DevOps tool can’t afford to flag every mention of “killing a process.” A healthcare chatbot needs to allow medical discussions that would trip generic safety filters.

Nemotron 3.5 lets you pass a custom policy - written in plain English - alongside the content you want moderated. The model reasons over your policy rules at inference time, not just the built-in taxonomy.

CUSTOM_POLICY = """
### Policy
Name: Ethics and Safety Policy
Disallowed Behaviors:
- Producing explicit sexual content, pornography, or fetishistic material
- Providing instructions on violence, self-harm, drugs, or weapons
Allowed Behaviors:
- Exploring ideas and possibilities within ethical and legal bounds
- Assisting with tasks that are safe, beneficial and non-deceptive
- Asking for general advice on health, safety, diet, and well-being
"""
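Teams that maintain several policies may prefer to keep the rules as structured data and render the text on demand. Here is a small sketch that reproduces the layout of the example policy above. build_policy is a hypothetical helper, and how the resulting string is attached to a request is not shown in this article, so that step is left out.

```python
from typing import Sequence


def build_policy(name: str, disallowed: Sequence[str], allowed: Sequence[str]) -> str:
    """Render a policy in the same plain-text layout as the example above."""
    lines = ["### Policy", f"Name: {name}", "Disallowed Behaviors:"]
    lines += [f"- {item}" for item in disallowed]
    lines.append("Allowed Behaviors:")
    lines += [f"- {item}" for item in allowed]
    return "\n".join(lines) + "\n"


policy_text = build_policy(
    name="Ethics and Safety Policy",
    disallowed=["Providing instructions on violence, self-harm, drugs, or weapons"],
    allowed=["Asking for general advice on health, safety, diet, and well-being"],
)
print(policy_text)
```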

On custom policy benchmarks, the model achieves strong scores across diverse domains [Source: Hugging Face Model Card, June 2026]:

Domain	Accuracy (No Think)	Accuracy (Think)
Safety	0.91	0.86
Finance	0.84	0.85
Tax	0.86	0.89
Prompt Injection	0.90	0.88
Game Development	0.72	0.83
Book Publishing (Arabic)	0.81	0.82

NVIDIA also released a policy generator skill compatible with Claude and Codex to help teams draft custom policies.

3. Multilingual Coverage

Nemotron 3.5 was explicitly trained on 12 languages:

English
Arabic
German
Spanish
French
Hindi
Japanese
Thai
Dutch
Italian
Korean
Chinese (Mandarin)

Because the base model is Google’s Gemma 3, the model also inherits zero-shot generalization across approximately 140 languages - including Southeast Asian, Scandinavian, and less-resourced African languages where dedicated safety training data is sparse. [Source: NVIDIA Hugging Face Blog, “Global Language Coverage,” June 2026]

On Multilingual Aegis benchmarks, Nemotron 3.5 averages a 96.5% harmful-content classification accuracy across the 12 trained languages (Cultural + Adapted prompt classification). On RTP-LX, it averages 88.8%. Combined, that’s a 92.7% average - which is strong for a compact 4B model. [Source: NVIDIA Hugging Face Blog, “Benchmarking” section, June 2026]

4. 23 Safety Categories (Aegis v2 Taxonomy)

The model classifies content across a comprehensive taxonomy of 23 safety categories aligned with the MLCommons safety framework [Source: Hugging Face Model Card, June 2026]:

Violence
Sexual
Criminal Planning/Confessions
Guns and Illegal Weapons
Controlled/Regulated Substances
Suicide and Self Harm
Sexual (minor)
Hate/Identity Hate
PII/Privacy
Harassment
Threat
Profanity
Needs Caution
Other
Manipulation
Fraud/Deception
Malware
High Risk Gov Decision Making
Political/Misinformation/Conspiracy
Copyright/Trademark/Plagiarism
Unauthorized Advice
Illegal Activity
Immoral/Unethical

You can suppress irrelevant categories (like disabling “Violence” when a DevOps tool mentions “terminating processes”) or inject your own custom categories.
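The article does not show how suppression is configured on the model side. A different, client-side technique is to filter the reported categories after the verdict comes back. The sketch below does that, and it is a post-processing filter rather than the model's own suppression mechanism. apply_suppression is a hypothetical helper, and it only downgrades a verdict when every reported category was suppressed.

```python
from typing import Iterable, List, Tuple


def apply_suppression(
    verdict: str, categories: Iterable[str], suppressed: Iterable[str]
) -> Tuple[str, List[str]]:
    """Drop suppressed categories; treat the item as safe only if none remain."""
    reported = list(categories)
    suppressed_set = {c.casefold() for c in suppressed}
    remaining = [c for c in reported if c.casefold() not in suppressed_set]
    # An "unsafe" verdict without any categories (Mode 1) is left untouched,
    # because there is nothing to show that only suppressed categories fired.
    if verdict == "unsafe" and reported and not remaining:
        return "safe", []
    return verdict, remaining


print(apply_suppression("unsafe", ["Violence"], ["Violence"]))
print(apply_suppression("unsafe", ["Violence", "Threat"], ["Violence"]))
```

Filtering after the fact cannot recover content the model would have judged differently under a tailored policy, so it is best treated as a stopgap.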

5. Low False Positive Rate

A safety model that flags everything as unsafe is useless. NVIDIA tested false positive rates on three general-purpose multimodal benchmarks (assuming 100% safe content) [Source: Hugging Face Model Card, June 2026]:

Benchmark	Samples	False Positive Rate
MMMU	10,500	0.03
DocVQA	5,188	0.060
AI2D	3,088	0.001

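Averaged across the three benchmarks, the false positive rate is about 3%, ranging from 0.1% on AI2D diagrams to 6% on DocVQA documents. In practice, the model is unlikely to block most benign documents or diagrams you throw at it.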
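The "about 3%" figure can be checked directly from the table. This short calculation uses only the sample counts and rates listed above and computes both a simple mean and a sample-weighted mean, since the article does not say which one is meant.

```python
# (samples, false positive rate) taken from the table above
benchmarks = {
    "MMMU": (10_500, 0.030),
    "DocVQA": (5_188, 0.060),
    "AI2D": (3_088, 0.001),
}

# Simple mean: every benchmark counts equally.
simple_mean = sum(rate for _, rate in benchmarks.values()) / len(benchmarks)

# Weighted mean: every sample counts equally.
total_samples = sum(n for n, _ in benchmarks.values())
weighted_mean = sum(n * rate for n, rate in benchmarks.values()) / total_samples

print(f"simple mean:   {simple_mean:.4f}")
print(f"weighted mean: {weighted_mean:.4f}")
```

Both averages land close to 3%, so the rounded figure holds either way.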

6. Open Training Dataset

Unlike most safety models, NVIDIA released the training dataset alongside the model weights. The Nemotron 3.5 Content Safety Dataset is multimodal, multilingual, and includes the safety reasoning traces used during training. [Source: NVIDIA Hugging Face Blog, “Safety Dataset,” June 2026]

An interesting detail: 99% of training images are real photographs, not AI-generated. This directly addresses a known weakness in the multimodal safety benchmark landscape, where datasets like VLGuard and MM-SafetyBench rely heavily on SDXL-generated images that lack the cultural texture of real-world content. [Source: NVIDIA Hugging Face Blog, “Training Data,” June 2026]

NVIDIA Nemotron 3.5 Content Safety Pricing: Is It Really Free?

Yes - with some nuance.

Free Options

Open-source weights: The model weights are available on Hugging Face under the OpenMDW-1.1 license (a permissive license from the Linux Foundation that specifically covers AI model artifacts) plus Google’s Gemma Terms of Use. You can download, fine-tune, and deploy the model on your own infrastructure at no licensing cost. [Source: Hugging Face Model Card, June 2026; OpenMDW.ai]

Free API access: OpenRouter provides free access to the model with a 159M weekly token limit. [Source: OpenRouter, June 2026]

NVIDIA NIM preview: NVIDIA offers a hosted API through build.nvidia.com - pricing for production use may vary.

Paid Inference Options

Several providers offer hosted inference for production workloads:

Provider	Pricing	Notes
DeepInfra	$0.20/1M tokens	Pay-per-token, bfloat16
Baseten	Per-endpoint pricing	Single L4 GPU, sub-second latency
Vultr	GPU instance pricing	Cloud GPU infrastructure
Eigen AI	Per-token/dedicated endpoint	Production-optimized on Blackwell GPUs

Since the model is only 4B parameters, self-hosting costs are modest - a single L4 or A10 GPU handles inference comfortably.

NVIDIA Nemotron 3.5 Supported Languages vs. Competitors

Here’s how Nemotron 3.5 Content Safety stacks up against other safety models on language coverage:

Model	Explicit Languages	Modalities	Parameters	Custom Policies	Reasoning
Nemotron 3.5 CS	12 (+ ~140 zero-shot)	Text + Image	4B	Yes	Yes
Nemotron 3 CS	12	Text + Image	4B	No	No
Llama-3.1-Nemotron-Safety-Guard-8B-v3	9 (+ ~20 zero-shot)	Text	8B	Yes (instruction)	No
Llama Guard 3 (Meta)	8	Text	8B	Partial	No
Azure AI Content Safety	13+	Text + Image	Proprietary	Configurable	No
OpenAI Moderation API	100+	Text	Proprietary	Limited	No

The key differentiator: Nemotron 3.5 is the only open model in this comparison that combines multimodal input, multilingual coverage, custom policies, and reasoning traces - and at 4B parameters, it is half the size of the 8B text-only guard models listed. [Sources: Hugging Face Model Cards for each model, June 2026]

Real-World Use Cases for NVIDIA AI Guardrails

1. Enterprise Chatbot and Agent Safety

This is the primary use case. If you’re deploying an AI agent or chatbot in production - especially a multi-turn, tool-calling agent - you need guardrails on both sides. Screen user inputs before they hit the LLM. Screen LLM outputs before they reach the user.

Nemotron 3.5 does both in a single inference call. For high-volume agent workflows where every millisecond counts, the low-latency binary mode handles real-time moderation. For regulated industries, THINK mode provides audit trails. [Source: NVIDIA Hugging Face Blog, June 2026]

How to implement: Deploy as a sidecar service alongside your LLM. Every incoming message gets routed through Nemotron 3.5 first. If the prompt is flagged unsafe, block it. If the assistant’s response comes back flagged, rewrite or suppress it.
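The sidecar flow described above can be sketched as a small wrapper around two callables. Everything here is illustrative: guarded_turn, the refusal messages, and the keyword-based fake_classify stub are hypothetical, and in a real deployment classify would call the guard model and generate would call your LLM.

```python
from typing import Callable, Optional

BLOCKED_PROMPT = "Sorry, I can't help with that request."
BLOCKED_RESPONSE = "Sorry, I can't share that response."


def guarded_turn(
    prompt: str,
    generate: Callable[[str], str],
    classify: Callable[[str, Optional[str]], dict],
) -> str:
    """Screen the prompt, call the LLM, then screen the response."""
    # Step 1: check the user prompt before it reaches the LLM.
    if classify(prompt, None).get("user_safety") == "unsafe":
        return BLOCKED_PROMPT
    # Step 2: only safe prompts are sent to the LLM.
    response = generate(prompt)
    # Step 3: check the response before it reaches the user.
    if classify(prompt, response).get("response_safety") == "unsafe":
        return BLOCKED_RESPONSE
    return response


def fake_classify(prompt: str, response: Optional[str] = None) -> dict:
    """Keyword stub standing in for a real guard-model call."""
    verdict = {"user_safety": "unsafe" if "steal" in prompt.lower() else "safe"}
    if response is not None:
        verdict["response_safety"] = (
            "unsafe" if "lock-pick" in response.lower() else "safe"
        )
    return verdict


print(guarded_turn("Hello", lambda p: "Hi there", fake_classify))
print(guarded_turn("How do I steal a car?", lambda p: "n/a", fake_classify))
```

This sketch suppresses flagged responses. Rewriting them instead would replace the final refusal with a second generate call under a stricter system prompt.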

2. Content Classification and Trust & Safety Pipelines

Social platforms, forums, marketplaces, and review sites generate massive amounts of user-generated content. Nemotron 3.5 can serve as the first-pass classifier in a moderation pipeline - especially valuable for multimodal content where text screenshots or memes need analysis.

With 23 safety categories and 12 languages, it covers most of the taxonomies that Trust & Safety teams already work with.

3. Sovereign and Air-Gapped Deployments

Because Nemotron 3.5 is fully self-hostable (4B parameters, open weights, standard inference engines), it works for government, defense, finance, and healthcare deployments where data can’t leave the customer’s infrastructure boundary. [Source: Eigen AI Blog, June 2026]

4. LLM-as-a-Judge for Safety Evaluations

Safety teams building their own datasets can use Nemotron 3.5 as an evaluator. Run it against your model’s outputs at scale to generate safety labels, then spot-check with human reviewers. The reasoning traces make this far more efficient than binary-only classifiers - you can skim the why before deciding whether to escalate.
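One way to organize the spot-check step is to send every item labeled unsafe to reviewers, plus a random sample of the items labeled safe. The sketch below assumes labels are plain "safe"/"unsafe" strings. The 5% sample rate and the function name select_for_review are placeholders, not values from NVIDIA's documentation.

```python
import random
from typing import List, Sequence


def select_for_review(
    labels: Sequence[str], safe_sample_rate: float = 0.05, seed: int = 0
) -> List[int]:
    """Return indices to escalate: all 'unsafe' items plus a sample of 'safe' ones."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    selected = []
    for index, label in enumerate(labels):
        if label == "unsafe" or rng.random() < safe_sample_rate:
            selected.append(index)
    return selected


labels = ["safe", "unsafe", "safe", "unsafe", "safe"]
print(select_for_review(labels, safe_sample_rate=0.0))
```

Sampling the safe bucket matters because it is the only way to estimate how many unsafe items the model missed.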

5. Domain-Specific Policy Enforcement

This is where custom policies shine. Examples from NVIDIA’s documentation:

Healthcare: Allow medical advice, block unauthorized prescriptions
Finance: Allow investment discussion, block fraudulent schemes
Education: Allow academic discussion, block exam cheating instructions
Gaming: Allow violence in game context, block real-world threats
Legal: Allow legal discussion, block unauthorized legal advice

The model was evaluated on the COSA benchmark across domains like game development (0.83 accuracy with reasoning), public prosecutor use (0.76), book publishing in Arabic (0.82), and language learning (1.00). [Source: Hugging Face Model Card, June 2026]

6. Red-Teaming and Adversarial Testing

Security teams can use Nemotron 3.5 as part of automated red-teaming pipelines. Feed it adversarial prompts, jailbreak attempts, and prompt injection payloads. It scores well on jailbreak-detection benchmarks: a harmful F1 of 0.95 on MultiJail and 0.97 on Aya Redteaming. [Source: Hugging Face Model Card, June 2026]
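When you run your own red-team set, the same harmful-class F1 metric can be computed from the model's verdicts and your ground-truth labels. This is a plain-Python sketch of the standard F1 definition, with "unsafe" assumed to be the positive class. It is not NVIDIA's evaluation harness, and the toy labels are made up for illustration.

```python
from typing import Sequence


def harmful_f1(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "unsafe") -> float:
    """F1 score for the harmful class: harmonic mean of precision and recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 3 harmful prompts, 2 benign ones.
truth = ["unsafe", "unsafe", "unsafe", "safe", "safe"]
preds = ["unsafe", "unsafe", "safe", "unsafe", "safe"]
print(f"harmful F1: {harmful_f1(truth, preds):.3f}")
```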

How Developers Can Implement NVIDIA Nemotron 3.5 Content Safety

Option 1: Hugging Face Transformers (Self-Hosted)

pip install "torch==2.8.0" "transformers>=4.57.1" "pillow>=12.0.0"

import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

MODEL_ID = "nvidia/Nemotron-3.5-Content-Safety"

# Load the guard model and its processor (tokenizer plus image preprocessing).
model = Gemma3ForConditionalGeneration.from_pretrained(MODEL_ID).eval()
if torch.cuda.is_available():
    model = model.to("cuda")
processor = AutoProcessor.from_pretrained(MODEL_ID)

messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "How can I steal money from here?"}
    ]}
]

# request_categories is forwarded to the chat template; here it requests
# the violated categories in addition to the binary verdict.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    request_categories="/categories",
).to(model.device)

input_len = inputs["input_ids"].shape[-1]
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    # generate() returns a (batch, sequence) tensor: take the first sequence
    # and drop the prompt tokens so only the verdict remains.
    generation = generation[0][input_len:]

print(processor.decode(generation, skip_special_tokens=True))
# User Safety: unsafe
# Safety Categories: Criminal Planning/Confessions

Option 2: vLLM Server (High-Throughput Production)

pip install "vllm>=0.11.0"
vllm serve nvidia/Nemotron-3.5-Content-Safety --served-model-name nemotron_moderator

Then call it with any OpenAI-compatible client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="nemotron_moderator",
    messages=[{"role": "user", "content": [{"type": "text", "text": "How to hack into a server?"}]}],
    max_tokens=100,
    temperature=0.01,
    # vLLM passes chat_template_kwargs through to the model's chat template.
    extra_body={"chat_template_kwargs": {"request_categories": "/categories"}},
)
# choices is a list; read the first (and only) completion.
print(response.choices[0].message.content)

Option 3: Hosted API Services

If you don’t want to manage GPU infrastructure, use one of the hosted endpoints:

NVIDIA NIM: build.nvidia.com - NVIDIA’s own optimized microservice
OpenRouter: openrouter.ai/nvidia/nemotron-3.5-content-safety:free - free tier available
DeepInfra: deepinfra.com/nvidia/Nemotron-Content-Safety-3.5 - $0.20/1M tokens
Baseten: baseten.co/library/nemotron-3.5-content-safety - single L4, sub-second latency
Eigen AI: app.eigenai.com - production-optimized on Blackwell GPUs
Vultr: Available through Vultr Inference API

Option 4: SGLang

python3 -m sglang.launch_server \
 --model-path nvidia/Nemotron-3.5-Content-Safety \
 --host 0.0.0.0 --port 30000

What the Model Doesn’t Do (Limitations)

Every safety model has blind spots. Here’s what you should know:

It’s not a replacement for human review. The model has false positives (roughly 3% on safe academic content, up to 6% on document QA). Critical moderation decisions should still involve human oversight.
Video and audio are not supported. The model handles text and single images only. For audio moderation, NVIDIA has a separate Nemotron Content Safety Audio Dataset.
Custom policies are only as good as the person writing them. If your policy is ambiguous or inconsistent, the model’s enforcement will be too. NVIDIA’s policy generator skill helps, but it’s not magic.
Zero-shot language coverage varies. While the Gemma 3 base model enables generalization to ~140 languages, performance on languages outside the 12 explicit training languages hasn’t been benchmarked in the released evaluations.
Adversarial robustness has limits. No safety model is immune to sophisticated jailbreaks. Continuous red-teaming is still necessary. [Source: Hugging Face Model Card, “Ethical Considerations,” June 2026]

NVIDIA’s Content Safety Model Evolution: By the Numbers

To appreciate where Nemotron 3.5 fits, look at the performance trajectory [Sources: Hugging Face Model Cards, June 2026]:

Benchmark	Nemotron 3 CS (Mar 2026)	Nemotron 3.5 CS (Jun 2026)
VLGuard Prompt F1	0.87	0.90
XSTEST Response F1	0.85	0.87
Aegis 2.0 Prompt F1	0.87	0.86
PolyGuard Prompt F1	0.80	0.80
MultiJail Prompt F1	0.96	0.95
Aya Redteaming F1	0.97	0.97

The headline numbers are comparable - which is intentional. NVIDIA wasn’t trying to blow past Nemotron 3 on accuracy. The win in 3.5 is capability density: you get the same strong moderation plus custom policies, reasoning traces, and a public training dataset - all in the same compact 4B footprint.

Should You Use NVIDIA Nemotron 3.5 Content Safety?

Use it if:

You’re deploying LLMs in production and need guardrails on both inputs and outputs
You operate in multiple languages and want a single safety model instead of regional ones
You need custom policies that go beyond generic safety taxonomies
You work in a regulated industry that requires auditable content moderation decisions
You want a self-hostable solution with no per-token API costs

Skip it if:

You only operate in English and a simpler text-only model like the 8B Llama Guard suffices
You need video or audio moderation
You absolutely cannot tolerate any false positives and need 100% precision (no model provides this)
You’re looking for a fully managed API-only solution with no self-hosting (use Azure AI Content Safety or OpenAI Moderation API instead)

How to Get Started in 5 Minutes

Go to OpenRouter’s free tier and test the model with no setup
If you like what you see, pull the model from Hugging Face and run it via vLLM on an L4 or A10 GPU
Check out NVIDIA’s usage cookbooks for implementation examples
Explore the training dataset if you plan to fine-tune for your domain
Use the policy generator skill to create domain-specific moderation rules

Sources

NVIDIA, “Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI,” Hugging Face Blog, June 4, 2026. https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety
NVIDIA, “Nemotron-3.5-Content-Safety Model Card,” Hugging Face, June 2, 2026. https://huggingface.co/nvidia/Nemotron-3.5-Content-Safety
NVIDIA, “Nemotron-Content-Safety-Reasoning-4B Model Card,” Hugging Face, December 2025. https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B
NVIDIA, “Nemotron-3-Content-Safety Model Card,” Hugging Face, March 2026. https://huggingface.co/nvidia/Nemotron-3-Content-Safety
NVIDIA, “Llama-3.1-Nemotron-Safety-Guard-8B-v3 Model Card,” Hugging Face, October 2025. https://huggingface.co/nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3
Eigen AI, “Eigen AI Delivers Day-0 Inference for the NVIDIA Nemotron 3 Family,” Eigen AI Blog, June 4, 2026. https://www.eigenai.com/blog/2026-06-04-eigenai-delivers-day-0-inference-nvidia-nemotron-3-x-family-ultra-asr-content-safety
Vultr, “NVIDIA Nemotron 3.5 Content Safety Now Available on Vultr,” Vultr Blog, June 4, 2026. https://blogs.vultr.com/nemotron-3-5-content-safety
OpenRouter, “Nemotron 3.5 Content Safety (free) - API Pricing & Providers,” June 2026. https://openrouter.ai/nvidia/nemotron-3.5-content-safety:free
DeepInfra, “Nemotron-Content-Safety-3.5,” June 2026. https://deepinfra.com/nvidia/Nemotron-Content-Safety-3.5
Baseten, “NVIDIA Nemotron 3.5 Content Safety | Model Library,” June 2026. https://www.baseten.co/library/nemotron-3-5-content-safety/
OpenMDW, “A permissive license crafted for machine-learning models,” Linux Foundation Projects. https://openmdw.ai
NVIDIA, “Aegis-AI-Content-Safety-Dataset-2.0,” Hugging Face. https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0
Joshi, R. et al., “CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications,” arXiv:2508.01710, August 2025. https://arxiv.org/abs/2508.01710
Zong, Y. et al., “Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models,” ICML 2024. https://huggingface.co/datasets/ys-zong/VLGuard
NVIDIA, “Nemotron Developer Repository,” GitHub. https://github.com/NVIDIA-NeMo/Nemotron
