What Is NVIDIA Nemotron 3.5 Content Safety? Features, Pricing, Languages, and Use Cases
NVIDIA Nemotron 3.5 Content Safety is a free, open-source 4-billion-parameter AI guardrail model that screens both user inputs and AI-generated outputs for unsafe content. Released on June 2, 2026, it’s the first model to bundle multimodal moderation (text + images), multilingual support, custom policy enforcement, and auditable reasoning traces into a single 4B package you can run on a single GPU. [Source: NVIDIA Hugging Face Blog, June 4, 2026]
I’ve spent the last few days digging through model cards, benchmarks, and deployment docs to understand what this model actually does - and what it doesn’t. Here’s everything you need to know.
What Exactly Is NVIDIA Nemotron 3.5 Content Safety?
It’s a small language model (SLM) specifically fine-tuned to act as a content moderator. Think of it as an AI bouncer that checks IDs at the door (input prompts) and watches for trouble at the bar (output responses) - for both text and images.
The model is built on Google’s Gemma-3-4B-it foundation model. NVIDIA fine-tuned it using LoRA (Low-Rank Adaptation), then merged the adapter weights back into the base model. The result: a compact 4-billion-parameter classifier that fits on a single GPU with 8GB+ VRAM and processes up to 128,000 tokens of context. [Source: NVIDIA Hugging Face Model Card, June 2026]
This isn’t NVIDIA’s first safety model - it’s their third generation. Here’s the lineage:
| Model | Release | Parameters | Modality | Key Addition |
|---|---|---|---|---|
| Llama-3.1-Nemotron-Safety-Guard-8B-v3 | Oct 2025 | 8B | Text-only | Multilingual safety (9 languages) |
| Nemotron Content Safety Reasoning 4B | Dec 2025 | 4B | Text-only | Custom policies + reasoning traces |
| Nemotron 3 Content Safety | Mar 2026 | 4B | Text + Image | Multimodal support |
| Nemotron 3.5 Content Safety | Jun 2026 | 4B | Text + Image | Everything above, unified |
Nemotron 3.5 merges the multimodal chops of Nemotron 3 with the reasoning and custom-policy capabilities from the Reasoning 4B model. Instead of running two separate guard models, you run one. [Source: NVIDIA Hugging Face Model Card, June 2026]
How Does NVIDIA Nemotron 3.5 Content Safety Actually Work?
The model takes three inputs: a user prompt, an optional image, and an optional assistant response. It evaluates all three together - not in isolation - and returns a structured verdict.
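As a rough illustration of the three-input shape, the sketch below assembles one moderation request as a chat-style message list. The helper name `build_messages` is hypothetical, and the OpenAI-style `image_url` part is an assumption about what the serving endpoint accepts; check the model card for the exact schema before relying on it.

```python
from typing import Optional


def build_messages(prompt: str,
                   image_url: Optional[str] = None,
                   response: Optional[str] = None) -> list:
    """Assemble a chat-style message list for one moderation request."""
    user_content = []
    if image_url is not None:
        # Assumed OpenAI-style image part; the real schema may differ.
        user_content.append({"type": "image_url", "image_url": {"url": image_url}})
    user_content.append({"type": "text", "text": prompt})

    messages = [{"role": "user", "content": user_content}]
    if response is not None:
        # The assistant turn is optional: omit it to screen the prompt alone.
        messages.append({
            "role": "assistant",
            "content": [{"type": "text", "text": response}],
        })
    return messages
```

Calling `build_messages("How can I steal money from here?")` yields a prompt-only request; passing `response=` as well lets the model judge both sides in one call.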
The Three Output Modes
You get three levels of detail depending on what your application needs:
Mode 1 - Fast binary verdict:
User Safety: unsafe
Response Safety: safe
Mode 2 - Binary verdict with violated categories:
User Safety: unsafe
Response Safety: safe
Safety Categories: Criminal Planning/Confessions, Fraud/Deception
Mode 3 - THINK mode (reasoning trace + verdict):
<think>
The user asks how to steal money from a vault. The assistant response
describes illegal lock-picking and deception of a guard. Both violate
Criminal Planning and Fraud categories. The image provides location
context but does not change the verdict.
</think>
User Safety: unsafe
Response Safety: unsafe
Safety Categories: Criminal Planning/Confessions, Fraud/Deception
That reasoning trace is a big deal for enterprise teams. You can actually audit why the model flagged something. If a compliance auditor asks “why did you block this content?”, you have a written explanation - not just a binary label. [Source: NVIDIA Hugging Face Blog, “Reasoning” section, June 2026]
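Because all three modes share the same line-oriented layout, a small parser can turn the raw text into fields you can log or route on. This is a minimal sketch based only on the sample outputs shown above; the `Verdict` class and `parse_verdict` function are hypothetical names, and real output may contain variations this does not handle.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Verdict:
    user_safety: Optional[str] = None
    response_safety: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    reasoning: Optional[str] = None


def parse_verdict(text: str) -> Verdict:
    """Parse Mode 1, 2, or 3 output into a Verdict."""
    verdict = Verdict()

    # Mode 3 only: pull out the <think>...</think> trace, if present.
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if think:
        verdict.reasoning = " ".join(think.group(1).split())
        text = text[think.end():]

    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip().lower(), value.strip()
        if key == "user safety":
            verdict.user_safety = value.lower()
        elif key == "response safety":
            verdict.response_safety = value.lower()
        elif key == "safety categories":
            # Category names contain slashes but no commas, so split on commas.
            verdict.categories = [c.strip() for c in value.split(",") if c.strip()]
    return verdict
```

Storing `reasoning` next to the verdict is what gives you the audit trail: the label and its written justification travel together.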
The reasoning traces are also efficient. NVIDIA used a two-step pipeline where a large teacher model (Qwen 397B) first generates chain-of-thought traces, then another model (Qwen 80B) compresses them down to three sentences or fewer. Compared to other reasoning safety models, Nemotron 3.5 generates up to 50% fewer tokens when reasoning is enabled - which directly reduces cost and latency. [Source: NVIDIA Hugging Face Blog, June 2026]
When latency is the priority, you disable THINK mode and get the same fast binary verdict you’d expect from the previous generation. NVIDIA reports that the default latency profile is unchanged from Nemotron 3, and the model achieves roughly 3x lower end-to-end latency compared to another multimodal safety model on equivalent benchmarks. [Source: NVIDIA Hugging Face Blog, “Latency” section, June 2026]
Key NVIDIA Nemotron 3.5 Content Safety Features
1. Multimodal Moderation (Text + Image)
Most safety models are text-only. Nemotron 3.5 analyzes user prompts, attached images, and assistant responses in a single pass.
This matters because policy violations often emerge from the interaction between modalities - not from any single element alone. A text prompt might look innocent by itself. But when paired with a suspicious image and an overly helpful response, the full picture becomes unsafe. The model catches these interaction-based violations because it evaluates the entire context window together. [Source: NVIDIA Hugging Face Blog, “Unified Multimodal Evaluation,” June 2026]
The model uses Google’s SigLIP vision encoder, accepting square images resized to 896x896 pixels. Images can be passed as URLs or base64-encoded data URIs. [Source: Hugging Face Model Card, June 2026]
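For local files, the data-URI route means base64-encoding the image bytes yourself. A minimal sketch follows; `to_data_uri` is a hypothetical helper, and the default MIME type is an assumption you should match to the actual file.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

In practice you would read the file with `open(path, "rb").read()` and pass the resulting string wherever an image URL is accepted.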
2. Custom Policy Enforcement
This is probably the most important feature for enterprise teams.
Standard safety taxonomies are one-size-fits-all. A children’s education app might ban even mild profanity. A DevOps tool can’t afford to flag every mention of “killing a process.” A healthcare chatbot needs to allow medical discussions that would trip generic safety filters.
Nemotron 3.5 lets you pass a custom policy - written in plain English - alongside the content you want moderated. The model reasons over your policy rules at inference time, not just the built-in taxonomy.
CUSTOM_POLICY = """
### Policy
Name: Ethics and Safety Policy
Disallowed Behaviors:
- Producing explicit sexual content, pornography, or fetishistic material
- Providing instructions on violence, self-harm, drugs, or weapons
Allowed Behaviors:
- Exploring ideas and possibilities within ethical and legal bounds
- Assisting with tasks that are safe, beneficial and non-deceptive
- Asking for general advice on health, safety, diet, and well-being
"""
On custom policy benchmarks, accuracy ranges from 0.72 to 0.91 depending on the domain and on whether reasoning is enabled [Source: Hugging Face Model Card, June 2026]:
| Domain | Accuracy (No Think) | Accuracy (Think) |
|---|---|---|
| Safety | 0.91 | 0.86 |
| Finance | 0.84 | 0.85 |
| Tax | 0.86 | 0.89 |
| Prompt Injection | 0.90 | 0.88 |
| Game Development | 0.72 | 0.83 |
| Book Publishing (Arabic) | 0.81 | 0.82 |
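To see where reasoning pays off, it can help to compute the per-domain difference between the two columns. The snippet below is a small arithmetic sketch over the numbers in the table above; the variable and function names are illustrative only.

```python
# Accuracy per domain, copied from the table above: (no_think, think).
SCORES = {
    "Safety": (0.91, 0.86),
    "Finance": (0.84, 0.85),
    "Tax": (0.86, 0.89),
    "Prompt Injection": (0.90, 0.88),
    "Game Development": (0.72, 0.83),
    "Book Publishing (Arabic)": (0.81, 0.82),
}


def think_gain(scores):
    """Return accuracy(think) - accuracy(no_think) for each domain."""
    return {domain: round(think - no_think, 2)
            for domain, (no_think, think) in scores.items()}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


gains = think_gain(SCORES)
mean_no_think = round(mean(n for n, _ in SCORES.values()), 3)
mean_think = round(mean(t for _, t in SCORES.values()), 3)

for domain, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{domain:26s} {gain:+.2f}")
print(f"mean no-think={mean_no_think}, mean think={mean_think}")
```

On these six rows, THINK mode helps most in Game Development (+0.11) and costs 0.05 on the generic Safety domain, so the choice of mode is worth testing per policy rather than assuming reasoning always wins.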
NVIDIA also released a policy generator skill compatible with Claude and Codex to help teams draft custom policies.
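If your rules already live in structured form (a config file, a database), the policy text shown earlier in this section is regular enough to generate. The sketch below mirrors that layout; `render_policy` is a hypothetical helper, not part of NVIDIA's tooling, and how the resulting string is attached to a request is left to the model card.

```python
from typing import Iterable


def render_policy(name: str,
                  disallowed: Iterable[str],
                  allowed: Iterable[str]) -> str:
    """Render rule lists into the plain-English policy layout."""
    lines = ["### Policy", f"Name: {name}", "Disallowed Behaviors:"]
    lines += [f"- {rule}" for rule in disallowed]
    lines.append("Allowed Behaviors:")
    lines += [f"- {rule}" for rule in allowed]
    return "\n".join(lines) + "\n"
```

Generating the text from one source of truth keeps the policy the model sees in sync with the policy your compliance team signed off on.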
3. Multilingual Coverage
Nemotron 3.5 was explicitly trained on 12 languages:
- English
- Arabic
- German
- Spanish
- French
- Hindi
- Japanese
- Thai
- Dutch
- Italian
- Korean
- Chinese (Mandarin)
Because the base model is Google’s Gemma 3, the model also inherits zero-shot generalization across approximately 140 languages - including Southeast Asian, Scandinavian, and less-resourced African languages where dedicated safety training data is sparse. [Source: NVIDIA Hugging Face Blog, “Global Language Coverage,” June 2026]
On Multilingual Aegis benchmarks, Nemotron 3.5 averages 96.5% harmful-content classification accuracy across the 12 trained languages (Cultural + Adapted prompt classification). On RTP-LX, it averages 88.8%. The simple mean of the two is about 92.7%, which is strong for a compact 4B model. [Source: NVIDIA Hugging Face Blog, “Benchmarking” section, June 2026]
4. 23 Safety Categories (Aegis v2 Taxonomy)
The model classifies content across a comprehensive taxonomy of 23 safety categories aligned with the MLCommons safety framework [Source: Hugging Face Model Card, June 2026]:
- Violence
- Sexual
- Criminal Planning/Confessions
- Guns and Illegal Weapons
- Controlled/Regulated Substances
- Suicide and Self Harm
- Sexual (minor)
- Hate/Identity Hate
- PII/Privacy
- Harassment
- Threat
- Profanity
- Needs Caution
- Other
- Manipulation
- Fraud/Deception
- Malware
- High Risk Gov Decision Making
- Political/Misinformation/Conspiracy
- Copyright/Trademark/Plagiarism
- Unauthorized Advice
- Illegal Activity
- Immoral/Unethical
You can suppress irrelevant categories (like disabling “Violence” when a DevOps tool mentions “terminating processes”) or inject your own custom categories.
5. Low False Positive Rate
A safety model that flags everything as unsafe is useless. NVIDIA tested false positive rates on three general-purpose multimodal benchmarks (assuming 100% safe content) [Source: Hugging Face Model Card, June 2026]:
| Benchmark | Samples | False Positive Rate |
|---|---|---|
| MMMU | 10,500 | 0.03 |
| DocVQA | 5,188 | 0.060 |
| AI2D | 3,088 | 0.001 |
An average false positive rate of about 3% means the model isn’t going to block every document or diagram you throw at it.
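The "about 3%" figure is the simple mean of the three rates. Weighting by sample count gives a slightly higher number, because the two larger benchmarks have the higher rates. A quick check, using only the values from the table above:

```python
# (samples, false positive rate) per benchmark, copied from the table above.
BENCHMARKS = {
    "MMMU": (10_500, 0.03),
    "DocVQA": (5_188, 0.060),
    "AI2D": (3_088, 0.001),
}


def simple_mean_fpr(benchmarks):
    """Unweighted mean of the per-benchmark rates."""
    return sum(rate for _, rate in benchmarks.values()) / len(benchmarks)


def weighted_fpr(benchmarks):
    """Mean rate weighted by the number of samples in each benchmark."""
    total = sum(n for n, _ in benchmarks.values())
    return sum(n * rate for n, rate in benchmarks.values()) / total


print(f"simple mean:   {simple_mean_fpr(BENCHMARKS):.3f}")
print(f"weighted mean: {weighted_fpr(BENCHMARKS):.3f}")
```

The weighted figure works out to roughly 3.4%, or about 629 false flags across the 18,776 samples.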
6. Open Training Dataset
Unlike most safety models, NVIDIA released the training dataset alongside the model weights. The Nemotron 3.5 Content Safety Dataset is multimodal, multilingual, and includes the safety reasoning traces used during training. [Source: NVIDIA Hugging Face Blog, “Safety Dataset,” June 2026]
An interesting detail: 99% of training images are real photographs, not AI-generated. This directly addresses a known weakness in the multimodal safety benchmark landscape, where datasets like VLGuard and MM-SafetyBench rely heavily on SDXL-generated images that lack the cultural texture of real-world content. [Source: NVIDIA Hugging Face Blog, “Training Data,” June 2026]
NVIDIA Nemotron 3.5 Content Safety Pricing: Is It Really Free?
Yes - with some nuance.
Free Options
Open-source weights: The model weights are available on Hugging Face under the OpenMDW-1.1 license (a permissive license from the Linux Foundation that specifically covers AI model artifacts) plus Google’s Gemma Terms of Use. You can download, fine-tune, and deploy the model on your own infrastructure at no licensing cost. [Source: Hugging Face Model Card, June 2026; OpenMDW.ai]
Free API access: OpenRouter provides free access to the model with a 159M weekly token limit. [Source: OpenRouter, June 2026]
NVIDIA NIM preview: NVIDIA offers a hosted API through build.nvidia.com - pricing for production use may vary.
Paid Inference Options
Several providers offer hosted inference for production workloads:
| Provider | Pricing | Notes |
|---|---|---|
| DeepInfra | $0.20/1M tokens | Pay-per-token, bfloat16 |
| Baseten | Per-endpoint pricing | Single L4 GPU, sub-second latency |
| Vultr | GPU instance pricing | Cloud GPU infrastructure |
| Eigen AI | Per-token/dedicated endpoint | Production-optimized on Blackwell GPUs |
Since the model is only 4B parameters, self-hosting costs are modest - a single L4 or A10 GPU handles inference comfortably.
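To compare hosted and self-hosted options for your own volume, the per-token arithmetic is straightforward. The sketch below uses the $0.20/1M DeepInfra price from the table; the GPU cost is deliberately a parameter you supply, since no GPU price is given here, and the function names are illustrative.

```python
DEEPINFRA_USD_PER_MILLION_TOKENS = 0.20  # from the provider table above


def hosted_cost_usd(tokens: int,
                    usd_per_million: float = DEEPINFRA_USD_PER_MILLION_TOKENS) -> float:
    """Pay-per-token cost for a given token volume."""
    return tokens / 1_000_000 * usd_per_million


def breakeven_tokens(gpu_cost_usd: float,
                     usd_per_million: float = DEEPINFRA_USD_PER_MILLION_TOKENS) -> int:
    """Token volume at which pay-per-token spend equals a fixed GPU bill.

    gpu_cost_usd is whatever you actually pay for the period (an input,
    not a quoted price).
    """
    return round(gpu_cost_usd / usd_per_million * 1_000_000)
```

For example, a fixed GPU bill of 100 USD for some period breaks even at 500 million moderated tokens in that period; below that volume, pay-per-token is cheaper on price alone.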
NVIDIA Nemotron 3.5 Supported Languages vs. Competitors
Here’s how Nemotron 3.5 Content Safety stacks up against other safety models on language coverage:
| Model | Explicit Languages | Modalities | Parameters | Custom Policies | Reasoning |
|---|---|---|---|---|---|
| Nemotron 3.5 CS | 12 (+ ~140 zero-shot) | Text + Image | 4B | Yes | Yes |
| Nemotron 3 CS | 12 | Text + Image | 4B | No | No |
| Llama-3.1-Nemotron-Safety-Guard-8B-v3 | 9 (+ ~20 zero-shot) | Text | 8B | Yes (instruction) | No |
| Llama Guard 3 (Meta) | 8 | Text | 8B | Partial | No |
| Azure AI Content Safety | 13+ | Text + Image | Proprietary | Configurable | No |
| OpenAI Moderation API | 100+ | Text | Proprietary | Limited | No |
The key differentiator: among the models compared here, Nemotron 3.5 is the only open one that combines multimodal input, multilingual coverage, custom policies, and reasoning traces, and at 4B parameters it is half the size of the 8B open alternatives in the table. [Sources: Hugging Face Model Cards for each model, June 2026]
Real-World Use Cases for NVIDIA AI Guardrails
1. Enterprise Chatbot and Agent Safety
This is the primary use case. If you’re deploying an AI agent or chatbot in production - especially a multi-turn, tool-calling agent - you need guardrails on both sides. Screen user inputs before they hit the LLM. Screen LLM outputs before they reach the user.
Nemotron 3.5 does both in a single inference call. For high-volume agent workflows where every millisecond counts, the low-latency binary mode handles real-time moderation. For regulated industries, THINK mode provides audit trails. [Source: NVIDIA Hugging Face Blog, June 2026]
How to implement: Deploy as a sidecar service alongside your LLM. Every incoming message gets routed through Nemotron 3.5 first. If the prompt is flagged unsafe, block it. If the assistant’s response comes back flagged, rewrite or suppress it.
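The routing described above can be sketched as a thin wrapper around your LLM call. Everything here is illustrative: `guarded_reply`, the `classify` callable, and its return shape are assumptions, and in a real deployment `classify` would call the safety model and parse its verdict. This version suppresses flagged responses rather than rewriting them.

```python
from typing import Callable, Dict, Optional

BLOCKED_MESSAGE = "Sorry, I can't help with that."

# classify(prompt, response=None) -> {"user": "safe"|"unsafe", "response": "safe"|"unsafe"}
Classifier = Callable[[str, Optional[str]], Dict[str, str]]


def guarded_reply(prompt: str,
                  generate: Callable[[str], str],
                  classify: Classifier) -> str:
    """Screen the prompt, call the LLM, then screen the response."""
    # Fail closed: anything other than an explicit "safe" is blocked.
    if classify(prompt, None).get("user") != "safe":
        return BLOCKED_MESSAGE

    reply = generate(prompt)

    if classify(prompt, reply).get("response") != "safe":
        return BLOCKED_MESSAGE
    return reply
```

Failing closed means a malformed or missing verdict blocks the message. That is the conservative choice for a guardrail, though it does turn classifier outages into user-visible refusals.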
2. Content Classification and Trust & Safety Pipelines
Social platforms, forums, marketplaces, and review sites generate massive amounts of user-generated content. Nemotron 3.5 can serve as the first-pass classifier in a moderation pipeline - especially valuable for multimodal content where text screenshots or memes need analysis.
With 23 safety categories and 12 languages, it covers most of the taxonomies that Trust & Safety teams already work with.
3. Sovereign and Air-Gapped Deployments
Because Nemotron 3.5 is fully self-hostable (4B parameters, open weights, standard inference engines), it works for government, defense, finance, and healthcare deployments where data can’t leave the customer’s infrastructure boundary. [Source: Eigen AI Blog, June 2026]
4. LLM-as-a-Judge for Safety Evaluations
Safety teams building their own datasets can use Nemotron 3.5 as an evaluator. Run it against your model’s outputs at scale to generate safety labels, then spot-check with human reviewers. The reasoning traces make this far more efficient than binary-only classifiers - you can skim the why before deciding whether to escalate.
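One way to organize the spot-check step is to send every item labeled unsafe to reviewers, plus a random sample of the items labeled safe. That sampling policy is an assumption for illustration, not something NVIDIA prescribes, and `select_for_review` is a hypothetical helper.

```python
import random
from typing import Dict, List


def select_for_review(labels: Dict[str, str],
                      safe_sample_rate: float = 0.05,
                      seed: int = 0) -> List[str]:
    """Pick item IDs for human review.

    labels maps item ID -> "safe" or "unsafe" as produced by the model.
    All "unsafe" items are selected; "safe" items are sampled at
    safe_sample_rate to estimate how many unsafe items slipped through.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    picked = []
    for item_id, label in labels.items():
        if label == "unsafe" or rng.random() < safe_sample_rate:
            picked.append(item_id)
    return picked
```

Reviewing a slice of the safe pile matters because checking only flagged items can never reveal false negatives.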
5. Domain-Specific Policy Enforcement
This is where custom policies shine. Examples from NVIDIA’s documentation:
- Healthcare: Allow medical advice, block unauthorized prescriptions
- Finance: Allow investment discussion, block fraudulent schemes
- Education: Allow academic discussion, block exam cheating instructions
- Gaming: Allow violence in game context, block real-world threats
- Legal: Allow legal discussion, block unauthorized legal advice
The model was evaluated on the COSA benchmark across domains like game development (0.83 accuracy with reasoning), public prosecutor use (0.76), book publishing in Arabic (0.82), and language learning (1.00). [Source: Hugging Face Model Card, June 2026]
6. Red-Teaming and Adversarial Testing
Security teams can use Nemotron 3.5 as part of automated red-teaming pipelines. Feed it adversarial prompts, jailbreak attempts, and prompt injection payloads. The model’s evaluations on benchmark datasets for jailbreak detection are strong - MultiJail harmful F1 of 0.95 and Aya Redteaming harmful F1 of 0.97. [Source: Hugging Face Model Card, June 2026]
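If you run your own adversarial set through the model, you can score it the same way the benchmarks do. The function below computes F1 with "unsafe" as the positive class, which is the usual meaning of "harmful F1"; the function name and label strings are assumptions for this sketch.

```python
from typing import Sequence


def harmful_f1(y_true: Sequence[str],
               y_pred: Sequence[str],
               positive: str = "unsafe") -> float:
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Tracking this number per attack family (jailbreaks, prompt injection, multilingual evasion) shows where your deployment is weakest, rather than hiding it in one aggregate.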
How Developers Can Implement NVIDIA Nemotron 3.5 Content Safety
Option 1: Hugging Face Transformers (Self-Hosted)
pip install "torch==2.8.0" "transformers>=4.57.1" "pillow>=12.0.0"

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch

# Load the merged checkpoint and move it to the GPU when one is available.
model = Gemma3ForConditionalGeneration.from_pretrained(
    "nvidia/Nemotron-3.5-Content-Safety"
).eval()
if torch.cuda.is_available():
    model = model.to("cuda")
processor = AutoProcessor.from_pretrained("nvidia/Nemotron-3.5-Content-Safety")

messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "How can I steal money from here?"}
    ]}
]

# request_categories is an extra keyword handed to the chat template.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    request_categories="/categories",
).to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
# generate() returns a (batch, sequence) tensor: take the first row and drop the prompt tokens.
generation = generation[0][input_len:]
print(processor.decode(generation, skip_special_tokens=True))
# User Safety: unsafe
# Safety Categories: Criminal Planning/Confessions
Option 2: vLLM Server (High-Throughput Production)
pip install "vllm>=0.11.0"
vllm serve nvidia/Nemotron-3.5-Content-Safety --served-model-name nemotron_moderator
Then call it with any OpenAI-compatible client:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="nemotron_moderator",
    messages=[{"role": "user", "content": [{"type": "text", "text": "How to hack into a server?"}]}],
    max_tokens=100,
    temperature=0.01,
    extra_body={"chat_template_kwargs": {"request_categories": "/categories"}},
)
# choices is a list; read the first completion.
print(response.choices[0].message.content)
Option 3: Hosted API Services
If you don’t want to manage GPU infrastructure, use one of the hosted endpoints:
- NVIDIA NIM: build.nvidia.com - NVIDIA’s own optimized microservice
- OpenRouter: openrouter.ai/nvidia/nemotron-3.5-content-safety:free - free tier available
- DeepInfra: deepinfra.com/nvidia/Nemotron-Content-Safety-3.5 - $0.20/1M tokens
- Baseten: baseten.co/library/nemotron-3.5-content-safety - single L4, sub-second latency
- Eigen AI: app.eigenai.com - production-optimized on Blackwell GPUs
- Vultr: Available through Vultr Inference API
Option 4: SGLang
python3 -m sglang.launch_server \
--model-path nvidia/Nemotron-3.5-Content-Safety \
--host 0.0.0.0 --port 30000
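Once the server is up, it can be queried over HTTP. The sketch below assumes SGLang's OpenAI-compatible `/v1/chat/completions` route on the port chosen above, and assumes the served model name defaults to the model path; both are worth confirming against your SGLang version. It uses only the standard library, and the function names are illustrative.

```python
import json
import urllib.request

SGLANG_URL = "http://localhost:30000/v1/chat/completions"  # port from the launch command
MODEL_NAME = "nvidia/Nemotron-3.5-Content-Safety"  # assumed default served name


def build_payload(prompt: str) -> dict:
    """Build a chat-completions request body for one prompt."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
        "temperature": 0,
    }


def moderate(prompt: str, url: str = SGLANG_URL) -> str:
    """POST the prompt to the server and return the raw verdict text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(moderate("How to hack into a server?"))` should print the model's verdict text.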
What the Model Doesn’t Do (Limitations)
Every safety model has blind spots. Here’s what you should know:
- It’s not a replacement for human review. The model has false positives (roughly 3% on safe academic content, up to 6% on document QA). Critical moderation decisions should still involve human oversight.
- Video and audio are not supported. The model handles text and single images only. For audio moderation, NVIDIA has a separate Nemotron Content Safety Audio Dataset.
- Custom policies are only as good as the person writing them. If your policy is ambiguous or inconsistent, the model’s enforcement will be too. NVIDIA’s policy generator skill helps, but it’s not magic.
- Zero-shot language coverage varies. While the Gemma 3 base model enables generalization to ~140 languages, performance on languages outside the 12 explicit training languages hasn’t been benchmarked in the released evaluations.
- Adversarial robustness has limits. No safety model is immune to sophisticated jailbreaks. Continuous red-teaming is still necessary. [Source: Hugging Face Model Card, “Ethical Considerations,” June 2026]
NVIDIA’s Content Safety Model Evolution: By the Numbers
To appreciate where Nemotron 3.5 fits, look at the performance trajectory [Sources: Hugging Face Model Cards, June 2026]:
| Benchmark | Nemotron 3 CS (Mar 2026) | Nemotron 3.5 CS (Jun 2026) |
|---|---|---|
| VLGuard Prompt F1 | 0.87 | 0.90 |
| XSTEST Response F1 | 0.85 | 0.87 |
| Aegis 2.0 Prompt F1 | 0.87 | 0.86 |
| PolyGuard Prompt F1 | 0.80 | 0.80 |
| MultiJail Prompt F1 | 0.96 | 0.95 |
| Aya Redteaming F1 | 0.97 | 0.97 |
The headline numbers are comparable - which is intentional. NVIDIA wasn’t trying to blow past Nemotron 3 on accuracy. The win in 3.5 is capability density: you get the same strong moderation plus custom policies, reasoning traces, and a public training dataset - all in the same compact 4B footprint.
Should You Use NVIDIA Nemotron 3.5 Content Safety?
Use it if:
- You’re deploying LLMs in production and need guardrails on both inputs and outputs
- You operate in multiple languages and want a single safety model instead of regional ones
- You need custom policies that go beyond generic safety taxonomies
- You work in a regulated industry that requires auditable content moderation decisions
- You want a self-hostable solution with no per-token API costs
Skip it if:
- You only operate in English and a simpler text-only model like the 8B Llama Guard suffices
- You need video or audio moderation
- You absolutely cannot tolerate any false positives and need 100% precision (no model provides this)
- You’re looking for a fully managed API-only solution with no self-hosting (use Azure AI Content Safety or OpenAI Moderation API instead)
How to Get Started in 5 Minutes
- Go to OpenRouter’s free tier and test the model with no setup
- If you like what you see, pull the model from Hugging Face and run it via vLLM on an L4 or A10 GPU
- Check out NVIDIA’s usage cookbooks for implementation examples
- Explore the training dataset if you plan to fine-tune for your domain
- Use the policy generator skill to create domain-specific moderation rules
Sources
- NVIDIA, “Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI,” Hugging Face Blog, June 4, 2026. https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety
- NVIDIA, “Nemotron-3.5-Content-Safety Model Card,” Hugging Face, June 2, 2026. https://huggingface.co/nvidia/Nemotron-3.5-Content-Safety
- NVIDIA, “Nemotron-Content-Safety-Reasoning-4B Model Card,” Hugging Face, December 2025. https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B
- NVIDIA, “Nemotron-3-Content-Safety Model Card,” Hugging Face, March 2026. https://huggingface.co/nvidia/Nemotron-3-Content-Safety
- NVIDIA, “Llama-3.1-Nemotron-Safety-Guard-8B-v3 Model Card,” Hugging Face, October 2025. https://huggingface.co/nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3
- Eigen AI, “Eigen AI Delivers Day-0 Inference for the NVIDIA Nemotron 3 Family,” Eigen AI Blog, June 4, 2026. https://www.eigenai.com/blog/2026-06-04-eigenai-delivers-day-0-inference-nvidia-nemotron-3-x-family-ultra-asr-content-safety
- Vultr, “NVIDIA Nemotron 3.5 Content Safety Now Available on Vultr,” Vultr Blog, June 4, 2026. https://blogs.vultr.com/nemotron-3-5-content-safety
- OpenRouter, “Nemotron 3.5 Content Safety (free) - API Pricing & Providers,” June 2026. https://openrouter.ai/nvidia/nemotron-3.5-content-safety:free
- DeepInfra, “Nemotron-Content-Safety-3.5,” June 2026. https://deepinfra.com/nvidia/Nemotron-Content-Safety-3.5
- Baseten, “NVIDIA Nemotron 3.5 Content Safety | Model Library,” June 2026. https://www.baseten.co/library/nemotron-3-5-content-safety/
- OpenMDW, “A permissive license crafted for machine-learning models,” Linux Foundation Projects. https://openmdw.ai
- NVIDIA, “Aegis-AI-Content-Safety-Dataset-2.0,” Hugging Face. https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0
- Joshi, R. et al., “CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications,” arXiv:2508.01710, August 2025. https://arxiv.org/abs/2508.01710
- Zong, Y. et al., “Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models,” ICML 2024. https://huggingface.co/datasets/ys-zong/VLGuard
- NVIDIA, “Nemotron Developer Repository,” GitHub. https://github.com/NVIDIA-NeMo/Nemotron