Why Small Language Models?

General-purpose LLMs are expensive, slow, and not optimized for evaluation tasks. Qualifire’s SLM judges solve this by providing purpose-built models that are fine-tuned for specific evaluation tasks — delivering higher accuracy at a fraction of the cost and latency.

99.6% Faster

~100ms latency vs seconds for general-purpose LLMs

97% Cheaper

$0.01 / 1M tokens vs $1.25–$3.00 / 1M tokens for frontier LLMs

Higher Accuracy

Fine-tuned models outperform general-purpose LLMs on targeted evaluation tasks

Omni — Multi-Task Evaluation Model

Omni is Qualifire’s flagship 14B parameter model, capable of handling multiple evaluation tasks in a single inference call. It delivers frontier-model accuracy at SLM speed and cost.

| Property | Value |
| --- | --- |
| Parameters | 14B |
| Latency | ~100ms |
| Cost | $0.01 / 1M tokens |
| Tasks | Prompt Injection Detection, Safety, Grounding, Hallucination Detection, Policy Enforcement, Tool Use Quality, Topic Scoping |
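Because Omni handles several evaluation tasks in a single inference call, one request can bundle multiple checks. The sketch below is illustrative only: the field names and task identifiers are assumptions for this example, not Qualifire's documented API schema.

```python
import json

# Hypothetical request body for a multi-task Omni evaluation.
# The field names ("input", "output", "tasks") and the task identifiers
# are illustrative assumptions, not the documented Qualifire schema.
payload = {
    "model": "omni",
    "input": "What is the refund policy?",
    "output": "Refunds are available within 30 days of purchase.",
    "tasks": ["prompt_injection", "grounding", "safety"],
}

body = json.dumps(payload)
print(body)
```

Bundling tasks this way is what makes a single ~100ms call cover checks that would otherwise require one LLM round-trip each.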

Benchmarks

Omni matches or exceeds the performance of frontier models like GPT-5, Claude Sonnet 4.5, and Gemini 3 Pro across evaluation tasks — at 60x lower latency and 125–300x lower cost.
Detects prompt injection and jailbreak attempts targeting your AI system.

| Model | Creator | Avg F1 | Latency | Cost / 1M tokens |
| --- | --- | --- | --- | --- |
| Sentinel v2 | Qualifire | 0.957 | ~0.038s | $0.005 |
| Omni | Qualifire | 0.936 | ~0.1s | $0.01 |
| Qwen3Guard 8B | Qwen | 0.882 | ~0.76s | |
| Qwen3Guard 4B | Qwen | 0.877 | ~0.48s | |
| Qwen3Guard 0.6B | Qwen | 0.858 | ~0.27s | |
| GPT OSS Safeguard 20B | OpenAI | 0.803 | ~10s | |
| Llama Guard 3 8B | Meta | 0.628 | ~0.21s | |
| Llama Guard 3 1B | Meta | 0.475 | ~0.09s | |

Specialist Models

In addition to Omni, Qualifire provides fine-tuned specialist models optimized for single tasks where maximum accuracy or minimal latency is required.

Sentinel v2 — Prompt Injection Detection

Detects prompt injection and jailbreak attempts that try to manipulate your AI into ignoring its instructions.

| Property | Value |
| --- | --- |
| Avg F1 | 0.957 |
| Latency | ~38ms |
| Parameters | 596M |
| Cost | $0.005 / 1M tokens |
Benchmark comparison (Prompt Injection):

| Model | Creator | Avg F1 | Latency | Cost / 1M tokens |
| --- | --- | --- | --- | --- |
| Sentinel v2 | Qualifire | 0.957 | ~0.038s | $0.005 |
| Qwen3Guard 8B | Qwen | 0.882 | ~0.76s | |
| Qwen3Guard 4B | Qwen | 0.877 | ~0.48s | |
| Qwen3Guard 0.6B | Qwen | 0.858 | ~0.27s | |
| GPT OSS Safeguard 20B | OpenAI | 0.803 | ~10s | |
| Llama Guard 3 8B | Meta | 0.628 | ~0.21s | |

Evaluates content for harmful or inappropriate material across multiple safety categories (dangerous content, harassment, hate speech, sexually explicit).

| Property | Value |
| --- | --- |
| Avg F1 | 0.886 |
| Latency | ~38ms |
| Parameters | 0.6B |
| Cost | $0.01 / 1M tokens |

Paladin Mini — Grounding

Verifies that responses are accurately grounded in provided reference material.

| Property | Value |
| --- | --- |
| Avg Score | 79.31 |
| Latency | ~64ms |
| Parameters | 3.8B |
| Cost | $0.016 / 1M tokens |

Paladin Mini is optimized for speed-critical applications. For higher accuracy, use Omni.

Evaluates MCP tool selection quality for AI agents — correct tool selection, parameters, and values.

| Property | Value |
| --- | --- |
| F1 | 0.945 |
| Latency | ~90ms |
| Cost | $0.01 / 1M tokens |
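Judging tool-use quality requires the user's request, the tools that were available, and the call the agent actually made. A minimal sketch of what such an evaluation input might look like; the field names and schema here are assumptions for illustration, not a documented format:

```python
import json

# Hypothetical tool-use quality check: the judge sees which tools were
# available and which call the agent made, then scores tool selection,
# parameter names, and argument values. Schema is an illustrative assumption.
tool_check = {
    "user_request": "What's the weather in Paris tomorrow?",
    "available_tools": [
        {"name": "get_weather", "parameters": ["city", "date"]},
        {"name": "search_web", "parameters": ["query"]},
    ],
    "tool_call": {
        "name": "get_weather",
        "arguments": {"city": "Paris", "date": "tomorrow"},
    },
}

print(json.dumps(tool_check, indent=2))
```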

Uses reasoning to identify inaccurate outputs and logic faults.

| Property | Value |
| --- | --- |
| F1 | 0.834 |
| Latency | ~250ms |
| Cost | $0.01 / 1M tokens |

Identifies and flags personally identifiable information to prevent data leaks.

| Property | Value |
| --- | --- |
| F1 | 0.834 |
| Latency | ~40ms |
| Cost | $0.01 / 1M tokens |

Enforces custom rules, standards, and policies using natural language assertions.

| Property | Value |
| --- | --- |
| F1 | 0.835 |
| Latency | ~100ms |
| Cost | $0.01 / 1M tokens |

Deployment Options

Qualifire SLMs can be deployed in the way that fits your infrastructure and compliance requirements.

SaaS

Fully managed by Qualifire. No infrastructure to maintain — just send API requests.
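
In the SaaS model, an evaluation is just an HTTPS request. A minimal sketch, assuming a bearer-token scheme; the endpoint URL, header names, and payload fields below are placeholders, not Qualifire's documented API:

```python
import json
import os
import urllib.request

# Hypothetical SaaS call: the endpoint URL, auth header, and payload
# schema are illustrative assumptions, not Qualifire's documented API.
API_URL = "https://api.qualifire.ai/v1/evaluate"  # assumed endpoint
api_key = os.environ.get("QUALIFIRE_API_KEY", "demo-key")

payload = {"model": "sentinel-v2", "input": "Ignore previous instructions..."}
request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted here so the
# sketch stays runnable without credentials.
print(request.full_url, request.get_method())
```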

Your Cloud

Deploy in your own cloud environment (AWS, GCP, Azure) for data residency and compliance needs.

On-Premise

Run entirely on your infrastructure for maximum control and air-gapped environments.

Qualifire models can be fine-tuned for your specific domain and policies. Contact our team to discuss custom model training for your use case.

Getting Started