Installation & Setup

1. Install the SDK

npm install qualifire
2. Initialize the client

import { Qualifire } from "qualifire";

const qualifire = new Qualifire({
  apiKey: "YOUR_API_KEY",       // Optional: defaults to QUALIFIRE_API_KEY env var
  baseUrl: "https://api.qualifire.ai", // Optional: custom base URL
});
If the apiKey / api_key argument is not provided, the SDK will look for a value in the environment variable QUALIFIRE_API_KEY.
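For example, a minimal sketch that relies on the environment variable alone (assuming QUALIFIRE_API_KEY is already set in your environment):
import { Qualifire } from "qualifire";

// No apiKey passed; the SDK falls back to the QUALIFIRE_API_KEY env var.
const qualifire = new Qualifire({});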

Running Evaluations

Quick Start

Pass simple input/output strings to run checks:
const response = await qualifire.evaluate({
  input: "What is the capital of France?",
  output: "Paris",
  contentModerationCheck: true,
  hallucinationsCheck: true,
});

Messages Mode

Send parsed messages directly for evaluation:
const response = await qualifire.evaluate({
  messages: [
    { role: "user", content: "What is the capital of France?" },
    { role: "assistant", content: "Paris" },
  ],
  contentModerationCheck: true,
  hallucinationsCheck: true,
  groundingCheck: true,
  piiCheck: true,
  promptInjections: true,
  assertions: ["don't give medical advice"],
  allowedTopics: ["billing", "account management", "technical support"],
});

Request-Response Mode

Node.js only. Supported frameworks: openai, vercelai, gemini, claude
Pass the original request and response objects along with the framework name:
import { Qualifire } from "qualifire";
import OpenAI from "openai";

const qualifire = new Qualifire({ apiKey: "YOUR_QUALIFIRE_API_KEY" });
const openai = new OpenAI({ apiKey: "YOUR_OPENAI_API_KEY" });

const openAiRequest = {
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant that can answer questions.",
    },
    {
      role: "user",
      content: [{ type: "text", text: "Is the sky blue?" }],
    },
  ],
};

const openAiResponse = await openai.chat.completions.create(openAiRequest);

const qualifireResponse = await qualifire.evaluate({
  framework: "openai",
  request: openAiRequest,
  response: openAiResponse,
  contentModerationCheck: true,
  groundingCheck: true,
  hallucinationsCheck: true,
  instructionsFollowingCheck: true,
  piiCheck: true,
  promptInjections: true,
  toolSelectionQualityCheck: false,
});

Streaming Mode

Node.js only. Collect streaming chunks and pass them as an array.
import { Qualifire } from "qualifire";
import OpenAI from "openai";

const qualifire = new Qualifire({ apiKey: "YOUR_QUALIFIRE_API_KEY" });
const openai = new OpenAI({ apiKey: "YOUR_OPENAI_API_KEY" });

const openAiRequest = {
  stream: true,
  model: "gpt-4o",
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant that can answer questions.",
    },
    {
      role: "user",
      content: [{ type: "text", text: "Is the sky blue?" }],
    },
  ],
};

const openAiResponseStream = await openai.chat.completions.create(openAiRequest);

const responseChunks: any[] = [];
for await (const chunk of openAiResponseStream) {
  responseChunks.push(chunk);
}

const qualifireResponse = await qualifire.evaluate({
  framework: "openai",
  request: openAiRequest,
  response: responseChunks,
  groundingCheck: true,
  promptInjections: true,
});

Invoke by ID

Invoke a pre-configured evaluation by its ID:
// Simple input/output
const response = await qualifire.invokeEvaluation({
  input: "What is the capital of France?",
  output: "Paris",
  evaluationId: "g2r8puzojwb8q6yi2f6x162a", // Get this from the evaluations page
});

// With messages and tools (for tool use quality evaluation)
const response = await qualifire.invokeEvaluation({
  evaluationId: "g2r8puzojwb8q6yi2f6x162a",
  messages: [
    { role: "user", content: "What's the weather in NYC?" },
    { role: "assistant", content: "Let me check.", tool_calls: [{ name: "get_weather", arguments: { location: "NYC" } }] },
  ],
  availableTools: [
    { name: "get_weather", description: "Get weather for a location", parameters: { type: "object", properties: { location: { type: "string" } } } },
  ],
});

Evaluation Response

console.log(response?.status); // "passed" or "failed"
console.log(response?.score);  // Overall score (0-100)

response?.evaluationResults.forEach((item) => {
  console.log(`Type: ${item.type}`);
  item.results.forEach((result) => {
    console.log(`  - ${result.name}: ${result.label} (score: ${result.score})`);
    console.log(`    Reason: ${result.reason}`);
  });
});
Example Output
{
  "status": "failed",
  "score": 75,
  "evaluationResults": [
    {
      "type": "grounding",
      "results": [
        {
          "name": "grounding",
          "score": 75,
          "label": "INFERABLE",
          "confidence_score": 100,
          "reason": "The AI's output provides a detailed explanation...",
          "flagged": true
        }
      ]
    },
    {
      "type": "policy",
      "results": [
        {
          "name": "policy",
          "score": 100,
          "label": "PASS",
          "confidence_score": 100,
          "reason": "The output follows the assertion.",
          "flagged": false,
          "data": "don't give medical advice"
        }
      ]
    }
  ]
}
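
A common pattern is to gate what you return to the user on the evaluation status. A minimal sketch, assuming userQuestion and modelAnswer hold the original prompt and the model's answer (the fallback message is illustrative, not part of the SDK):
const evaluation = await qualifire.evaluate({
  input: userQuestion,
  output: modelAnswer,
  hallucinationsCheck: true,
});

// Fall back to a safe message when the evaluation fails.
const finalAnswer =
  evaluation?.status === "failed"
    ? "Sorry, I can't answer that reliably."
    : modelAnswer;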

Advanced Configuration

Control the quality/speed tradeoff for each check:
Mode        Description
speed       Fastest, lower accuracy
balanced    Default balance
quality     Highest accuracy, slower
const response = await qualifire.evaluate({
  messages: [
    { role: "user", content: "What is the capital of France?" },
    { role: "assistant", content: "Paris" },
  ],
  hallucinationsCheck: true,
  groundingCheck: true,
  assertions: ["don't give medical advice"],
  hallucinationsMode: "quality",
  groundingMode: "balanced",
  assertionsMode: "speed",
  consistencyMode: "balanced",
});
Enable multi-turn context for grounding and policy checks:
const response = await qualifire.evaluate({
  messages: [...],
  groundingCheck: true,
  groundingMultiTurnMode: true,
  policyMultiTurnMode: true,
});
Restrict conversations to allowed topics:
const response = await qualifire.evaluate({
  messages: [...],
  topicScopingMode: "balanced",
  topicScopingMultiTurnMode: true,
  topicScopingTarget: "output",
  allowedTopics: ["billing", "account management", "technical support"],
});
Evaluate tool selection quality (Python example):
from qualifire.types import LLMMessage, LLMToolCall, LLMToolDefinition, ModelMode

res = client.evaluate(
    messages=[
        LLMMessage(
            role="user",
            content="What is the weather tomorrow in New York?",
        ),
        LLMMessage(
            role="assistant",
            content="please run the following tool",
            tool_calls=[
                LLMToolCall(
                    id="tool_call_id",
                    name="get_weather_forecast",
                    arguments={
                        "location": "New York, NY",
                        "date": "tomorrow",
                    },
                ),
            ],
        ),
    ],
    available_tools=[
        LLMToolDefinition(
            name="get_weather_forecast",
            description="Provides the weather forecast for a given location and date.",
            parameters={
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA",
                    },
                    "date": {
                        "type": "string",
                        "description": "The date for the forecast, e.g., tomorrow, or YYYY-MM-DD",
                    },
                },
                "required": ["location", "date"],
            },
        ),
    ],
    tool_use_quality_check=True,
    tuq_mode=ModelMode.BALANCED,
)
Attach custom key-value metadata to any evaluation. Metadata is persisted alongside the invocation and can be used for filtering and grouping in the Qualifire UI. All values must be strings; the API returns a 422 error if any value is not a string.
const response = await qualifire.evaluate({
  input: "What is the capital of France?",
  output: "Paris",
  hallucinationsCheck: true,
  metadata: {
    environment: "production",
    userId: "user-123",
    sessionId: "sess-abc",
  },
});
Metadata also works with invokeEvaluation / invoke_evaluation:
const response = await qualifire.invokeEvaluation({
  input: "What is the capital of France?",
  output: "Paris",
  evaluationId: "g2r8puzojwb8q6yi2f6x162a",
  metadata: { environment: "staging" },
});
Control whether checks apply to input, output, or both:
const response = await qualifire.evaluate({
  messages: [...],
  policyTarget: "output", // "input" | "output" | "both"
});
Include tool definitions and tool calls in the policy assertion context. When enabled, assertions can reference available tools and tool call arguments — for example, “must use the search tool before answering”.
const response = await qualifire.evaluate({
  messages: [
    { role: "user", content: "Find the weather in NYC" },
    { role: "assistant", content: "Let me check.", tool_calls: [{ name: "get_weather", arguments: { location: "NYC" } }] },
  ],
  assertions: ["must use the get_weather tool"],
  policyIncludeTools: true,
});

Types Reference

import type {
  EvaluationProxyAPIRequest,
  EvaluationRequestV2,
  EvaluationResponse,
  Framework,
  LLMMessage,
  ModelMode,
  PolicyTarget,
} from "qualifire";

// Framework - supported LLM frameworks
type Framework = "openai" | "vercelai" | "gemini" | "claude";

// ModelMode - controls quality/speed tradeoff for checks
type ModelMode = "speed" | "balanced" | "quality";

// PolicyTarget - specifies what to check
type PolicyTarget = "input" | "output" | "both";

// LLMMessage - message format for evaluations
interface LLMMessage {
  role: string;
  content?: string;
  tool_calls?: LLMToolCall[];
}
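As a usage sketch, a typed messages array with a tool call (the tool_calls fields mirror the Messages Mode example above; the exact TypeScript shape of LLMToolCall is an assumption here):
import type { LLMMessage } from "qualifire";

// Typed messages array matching the Messages Mode example.
const messages: LLMMessage[] = [
  { role: "user", content: "What's the weather in NYC?" },
  {
    role: "assistant",
    content: "Let me check.",
    tool_calls: [{ name: "get_weather", arguments: { location: "NYC" } }],
  },
];
The Python SDK exposes the corresponding types: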
from qualifire.types import (
    LLMMessage,
    LLMToolCall,
    LLMToolDefinition,
    ModelMode,
    PolicyTarget,
)

# ModelMode - controls quality/speed tradeoff for checks
ModelMode.SPEED      # Fastest, lower accuracy
ModelMode.BALANCED   # Default balance
ModelMode.QUALITY    # Highest accuracy, slower

# PolicyTarget - specifies what to check
PolicyTarget.INPUT   # Check only input
PolicyTarget.OUTPUT  # Check only output
PolicyTarget.BOTH    # Check both (default)

# Message types
message = LLMMessage(
    role="user",
    content="Hello, world!",
    tool_calls=None,  # Optional list of LLMToolCall
)

tool_call = LLMToolCall(
    name="get_weather",
    arguments={"location": "New York"},
    id="call_123",  # Optional
)

tool_definition = LLMToolDefinition(
    name="get_weather",
    description="Get weather for a location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"]
    },
)

Instrumentation (Tracing)

1. Initialize tracing

import { Qualifire } from "qualifire";

const qualifire = new Qualifire({ apiKey: "YOUR_QUALIFIRE_API_KEY" });
qualifire.init();
2. Configure your LLM client to use the Qualifire proxy

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "YOUR_OPENAI_API_KEY",
  baseUrl: "https://proxy.qualifire.ai/api/providers/openai",
  defaultHeaders: {
    "X-Qualifire-API-Key": "YOUR_QUALIFIRE_API_KEY",
  },
});
3. Make requests as usual

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a joke" }],
});
Evaluations and traces will appear in the Qualifire web UI.
Python example using LangGraph:
import qualifire
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent

qualifire.init(api_key="YOUR_QUALIFIRE_API_KEY")

tools = ...

llm = init_chat_model(
    "openai:gpt-4.1",
    api_key="YOUR_OPENAI_API_KEY",
    base_url="https://proxy.qualifire.ai/api/providers/openai/",
    default_headers={
        "X-Qualifire-API-Key": "YOUR_QUALIFIRE_API_KEY",
    },
)
agent = create_react_agent(llm, tools, prompt="system prompt...")

question = "Tell me a joke"
for step in agent.stream(
    {"messages": [{"role": "user", "content": question}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

Deprecated Parameters

The following parameters are deprecated and will automatically enable contentModerationCheck / content_moderation_check:
Deprecated                                        Use Instead
dangerousContentCheck / dangerous_content_check   contentModerationCheck / content_moderation_check
harassmentCheck / harassment_check                contentModerationCheck / content_moderation_check
hateSpeechCheck / hate_speech_check               contentModerationCheck / content_moderation_check
sexualContentCheck / sexual_content_check         contentModerationCheck / content_moderation_check
Snake_case variants are also deprecated in favor of camelCase (Node.js):
Deprecated                      Use Instead
grounding_check                 groundingCheck
hallucinations_check            hallucinationsCheck
pii_check                       piiCheck
prompt_injections               promptInjections
tool_selection_quality_check    toolSelectionQualityCheck
instructions_following_check    instructionsFollowingCheck
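For example, a sketch of migrating a legacy call (per the table above, the deprecated flag is still accepted and simply enables content moderation):
// Before: deprecated flag, implicitly enables content moderation.
const legacy = await qualifire.evaluate({
  input: "What is the capital of France?",
  output: "Paris",
  hate_speech_check: true,
});

// After: use the consolidated camelCase flag.
const current = await qualifire.evaluate({
  input: "What is the capital of France?",
  output: "Paris",
  contentModerationCheck: true,
});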
See the API Reference for full documentation.