Why Small Language Models?
General-purpose LLMs are expensive, slow, and not optimized for evaluation tasks. Qualifire’s SLM judges solve this by providing purpose-built models that are fine-tuned for specific evaluation tasks — delivering higher accuracy at a fraction of the cost and latency.

- 99.6% Faster: ~100ms latency vs seconds for general-purpose LLMs
- 97% Cheaper: $0.01 per 1M tokens vs $1.25–$3.00 for frontier LLMs
- Higher Accuracy: fine-tuned models outperform general-purpose LLMs on targeted evaluation tasks
Omni — Multi-Task Evaluation Model
Omni is Qualifire’s flagship 14B-parameter model, capable of handling multiple evaluation tasks in a single inference call. It delivers frontier-model accuracy at SLM speed and cost.

| Property | Value |
|---|---|
| Parameters | 14B |
| Latency | ~100ms |
| Cost | $0.01 / 1M tokens |
| Tasks | Prompt Injection Detection, Safety, Grounding, Hallucination Detection, Policy Enforcement, Tool Use Quality, Topic Scoping |
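To make "multiple evaluation tasks in a single inference call" concrete, here is a minimal sketch of what such a request payload might look like. The field names (`model`, `input`, `tasks`) are illustrative assumptions for this example, not Qualifire's published API schema — consult the actual API reference for the real shape.

```python
import json

# Hypothetical multi-task Omni request: one payload asks for several
# evaluations (injection, grounding, hallucination, policy) at once.
payload = {
    "model": "omni",
    "input": {
        "prompt": "What is our refund policy?",
        "response": "Refunds are available within 30 days of purchase.",
        "context": "Refund policy: customers may request a refund within 30 days.",
    },
    # Several evaluation tasks requested in a single inference call.
    "tasks": [
        "prompt_injection",
        "grounding",
        "hallucination",
        "policy",
    ],
}

# Serialize once; a single HTTP POST would carry all four checks.
body = json.dumps(payload)
```

The point of the single-call design is that the per-request latency (~100ms) is paid once, regardless of how many of the supported tasks you enable.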
Benchmarks
Omni matches or exceeds the performance of frontier models like GPT-5, Claude Sonnet 4.5, and Gemini 3 Pro across evaluation tasks — at 60x lower latency and 125–300x lower cost.

- Prompt Injection
- Hallucination Detection
- Grounding
- Policy Enforcement
- Tool Use Quality
- Topic Scoping
- Safety
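The headline multiples above follow directly from the pricing and latency figures quoted in this document. The ~6s frontier latency is an assumption inferred from the stated 60x claim against Omni's ~100ms; the prices come from the tables here.

```python
# Prices in cents per 1M tokens, taken from the figures in this document.
omni_cents = 1             # $0.01 / 1M tokens (Omni)
frontier_low_cents = 125   # $1.25 / 1M tokens (frontier, low end)
frontier_high_cents = 300  # $3.00 / 1M tokens (frontier, high end)

cost_multiple_low = frontier_low_cents // omni_cents    # 125x cheaper
cost_multiple_high = frontier_high_cents // omni_cents  # 300x cheaper

# Latency in milliseconds; ~6s frontier latency is an assumed typical value
# consistent with the stated 60x claim.
omni_ms = 100
frontier_ms = 6000
latency_multiple = frontier_ms // omni_ms  # 60x faster
```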
Detects prompt injection and jailbreak attempts targeting your AI system.
| Model | Creator | Avg F1 | Latency | Cost/1M tokens |
|---|---|---|---|---|
| Sentinel v2 | Qualifire | 0.957 | ~0.038s | $0.005 |
| Omni | Qualifire | 0.936 | ~0.1s | $0.01 |
| Qwen3Guard 8B | Qwen | 0.882 | ~0.76s | — |
| Qwen3Guard 4B | Qwen | 0.877 | ~0.48s | — |
| Qwen3Guard 0.6B | Qwen | 0.858 | ~0.27s | — |
| GPT OSS Safeguard 20B | OpenAI | 0.803 | ~10s | — |
| Llama Guard 3 8B | Meta | 0.628 | ~0.21s | — |
| Llama Guard 3 1B | Meta | 0.475 | ~0.09s | — |
Specialist Models
In addition to Omni, Qualifire provides fine-tuned specialist models optimized for single tasks where maximum accuracy or minimal latency is required.

Sentinel — Prompt Injection Detection
Detects prompt injection and jailbreak attempts that try to manipulate your AI into ignoring its instructions.
| Property | Value |
|---|---|
| Avg F1 | 0.957 |
| Latency | ~38ms |
| Parameters | 596M |
| Cost | $0.005 / 1M tokens |
Benchmark comparison (Prompt Injection):
| Model | Creator | Avg F1 | Latency | Cost/1M tokens |
|---|---|---|---|---|
| Sentinel v2 | Qualifire | 0.957 | ~0.038s | $0.005 |
| Qwen3Guard 8B | Qwen | 0.882 | ~0.76s | — |
| Qwen3Guard 4B | Qwen | 0.877 | ~0.48s | — |
| Qwen3Guard 0.6B | Qwen | 0.858 | ~0.27s | — |
| GPT OSS Safeguard 20B | OpenAI | 0.803 | ~10s | — |
| Llama Guard 3 8B | Meta | 0.628 | ~0.21s | — |
Cleric — Content Safety Moderation
Evaluates content for harmful or inappropriate material across multiple safety categories (dangerous content, harassment, hate speech, sexually explicit).
| Property | Value |
|---|---|
| Avg F1 | 0.886 |
| Latency | ~38ms |
| Parameters | 0.6B |
| Cost | $0.01 / 1M tokens |
Paladin — Context Grounding
Verifies that responses are accurately grounded in provided reference material.
Paladin Mini is optimized for speed-critical applications. For higher accuracy, use Omni.
| Property | Value |
|---|---|
| Avg Score | 79.31 |
| Latency | ~64ms |
| Parameters | 3.8B |
| Cost | $0.016 / 1M tokens |
Ranger — Tool Use Quality
Evaluates MCP tool selection quality for AI agents — correct tool selection, parameters, and values.
| Property | Value |
|---|---|
| F1 | 0.945 |
| Latency | ~90ms |
| Cost | $0.01 / 1M tokens |
Sage — Hallucination Detection
Uses reasoning to identify inaccurate outputs and logic faults.
| Property | Value |
|---|---|
| F1 | 0.834 |
| Latency | ~250ms |
| Cost | $0.01 / 1M tokens |
Hunter — PII Detection
Identifies and flags personally identifiable information to prevent data leaks.
| Property | Value |
|---|---|
| F1 | 0.834 |
| Latency | ~40ms |
| Cost | $0.01 / 1M tokens |
Magistrate — Policy Enforcement
Enforces custom rules, standards, and policies using natural language assertions.
| Property | Value |
|---|---|
| F1 | 0.835 |
| Latency | ~100ms |
| Cost | $0.01 / 1M tokens |
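Since Magistrate enforces policies expressed as natural-language assertions, a request might look like the sketch below. The field names (`model`, `assertions`, `output`) are illustrative assumptions for this example, not Qualifire's actual API schema.

```python
# Hypothetical Magistrate policy check: each assertion is a plain-English
# rule the model evaluates against the candidate output.
policy_check = {
    "model": "magistrate",
    "assertions": [
        "The response must not mention competitor pricing.",
        "The response must not promise refunds outside the 30-day window.",
    ],
    "output": "You can request a refund within 30 days of purchase.",
}
```

Because the rules are free-form text rather than a fixed taxonomy, teams can encode domain-specific standards without retraining the model.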
Deployment Options
Qualifire SLMs can be deployed in the way that fits your infrastructure and compliance requirements.

SaaS
Fully managed by Qualifire. No infrastructure to maintain — just send API requests.
Your Cloud
Deploy in your own cloud environment (AWS, GCP, Azure) for data residency and compliance needs.
On-Premise
Run entirely on your infrastructure for maximum control and air-gapped environments.
Qualifire models can be fine-tuned for your specific domain and policies. Contact our team to discuss custom model training for your use case.