# Grounding Verification
Dual-model critic flow to detect hallucinated test cases.
| Related: CLI Reference | Configuration | Test Format |
## Overview
When AI generates test cases, it can hallucinate — invent steps, expected results, or behaviors that don’t exist in your documentation. SPECTRA’s grounding verification uses a second AI model (the “critic”) to verify each test case against the source documentation.
## How It Works
1. The generator creates draft test cases from your documentation
2. The critic (a different model) verifies each test case against the same docs
3. Each test case receives a verdict: `grounded`, `partial`, or `hallucinated`
4. Only `grounded` and `partial` test cases are written to disk
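The filtering step above can be sketched in a few lines of Python. This is an illustrative model, not SPECTRA's actual implementation; the `TestCase` type and `filter_verified` helper are hypothetical, while the verdict names come from this page.

```python
from dataclasses import dataclass, field

# Verdicts that survive the critic phase and reach disk (per the docs).
WRITABLE_VERDICTS = {"grounded", "partial"}

@dataclass
class TestCase:
    case_id: str
    verdict: str = "unverified"          # set by the critic
    unverified_claims: list = field(default_factory=list)

def filter_verified(cases):
    """Keep only test cases the critic marked grounded or partial."""
    return [c for c in cases if c.verdict in WRITABLE_VERDICTS]
```

In this model a `hallucinated` verdict simply drops the case from the list, matching the "rejected" behavior described below.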
## Verdicts
| Verdict | Meaning | Action |
|---|---|---|
| `grounded` | All steps trace to documentation | Written as-is |
| `partial` | Some steps have assumptions | Written with warnings |
| `hallucinated` | Contains invented behaviors | Rejected |
## Grounding Metadata
Verified test cases include grounding metadata in their frontmatter:
```yaml
grounding:
  verdict: grounded
  score: 0.95
  generator: claude-sonnet-4
  critic: gpt-5-mini
  verified_at: 2026-03-19T10:30:00Z
  unverified_claims: []
```
For `partial` verdicts, `unverified_claims` lists what couldn't be verified:
```yaml
grounding:
  verdict: partial
  score: 0.72
  unverified_claims:
    - "Step 3: assumes refund email sent within 5 minutes"
    - "Expected Result: specific error code not in docs"
```
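A review workflow can act on this frontmatter directly. Below is a minimal sketch; the field names (`verdict`, `score`, `unverified_claims`) mirror the examples above, but the `needs_review` helper and the `0.8` score threshold are assumptions, not documented SPECTRA behavior.

```python
def needs_review(grounding: dict, min_score: float = 0.8) -> bool:
    """Flag a test case for human review if its verdict is not fully
    grounded, its score is low, or any claim went unverified."""
    return (
        grounding.get("verdict") != "grounded"
        or grounding.get("score", 0.0) < min_score
        or bool(grounding.get("unverified_claims"))
    )
```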
## Verification Output
After generation, SPECTRA displays verification results:
```text
Generating test cases...
✓ Generated 10 test cases
  ✓ 7 grounded
  ⚠ 2 partial — written with grounding warnings
  ✗ 1 hallucinated — rejected
✓ 9 test cases written to test-cases/checkout/
✓ Index updated

ℹ Partial test cases (review recommended):
  TC-209  Assumes refund email is sent within 5 minutes — not confirmed in docs
  TC-212  Navigation path to currency settings not documented

ℹ Rejected test cases:
  TC-220  References "fraud detection API" — not mentioned in any documentation
```
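The counts in that summary follow directly from the verdicts: grounded plus partial cases are written, hallucinated ones are not. A small sketch of that tally (the `summarize` helper is hypothetical):

```python
from collections import Counter

def summarize(verdicts):
    """Tally verdicts and compute how many test cases reach disk."""
    counts = Counter(verdicts)
    written = counts["grounded"] + counts["partial"]
    return counts, written
```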
## Configuration
Configure the critic in `spectra.config.json`:
```json
{
  "ai": {
    "critic": {
      "enabled": true,
      "provider": "github-models",
      "model": "gpt-5-mini",
      "timeout_seconds": 120,
      "max_concurrent": 5
    }
  }
}
```
Spec 043 — parallel verification:
`max_concurrent` (default `1`) controls how many critic verification calls run concurrently. Setting it to `5` typically cuts the critic phase to roughly one fifth of sequential time on a large suite without changing any output (results are written in original input order). The value is clamped to `[1, 20]`, and values above `10` emit a rate-limit-risk warning at run start. If you start hitting rate limits, the Run Summary panel surfaces a **Rate limits** count with a hint pointing back at this knob.
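The order-preserving concurrency described above can be modeled with a thread pool. This is a hedged sketch, not SPECTRA's implementation: `verify_all` and the injected `verify` callable are hypothetical stand-ins for a batch of critic API calls, while the `[1, 20]` clamp matches the documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor

def verify_all(cases, verify, max_concurrent=1):
    """Run critic calls concurrently while preserving input order."""
    workers = max(1, min(20, max_concurrent))  # documented clamp: [1, 20]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order regardless of
        # which call finishes first, so output order never changes.
        return list(pool.map(verify, cases))
```

`Executor.map` is what makes "no change to output order" cheap here: results are reassembled in input order even when later calls finish earlier.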
Spec 041:
`gpt-5-mini` is the new default critic model (previously `gpt-4o-mini`). It is included free on any paid Copilot plan and, when paired with a `gpt-4.1` generator, provides cross-architecture verification without burning premium requests. For Claude generators, the default critic rotates to `claude-haiku-4-5`. Per-provider defaults are resolved by `CriticConfig.GetEffectiveModel()`: `github-models`/`openai`/`azure-openai` → `gpt-5-mini`; `anthropic`/`azure-anthropic` → `claude-haiku-4-5`.
Supported critic providers (spec 039 — same set as the generator):
`github-models`, `azure-openai`, `azure-anthropic`, `openai`, `anthropic`.
Default API key environment variables:
- `github-models`: `GITHUB_TOKEN`
- `azure-openai`: `AZURE_OPENAI_API_KEY`
- `azure-anthropic`: `AZURE_ANTHROPIC_API_KEY`
- `openai`: `OPENAI_API_KEY`
- `anthropic`: `ANTHROPIC_API_KEY`
Legacy values:
`provider: "github"` is accepted as a soft alias for `github-models` (with a one-line deprecation warning on stderr). The legacy value `provider: "google"` is no longer supported; the Copilot SDK runtime cannot route to Google. Update your config to one of the canonical five providers above.
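That normalization policy can be sketched as a small validator. The `normalize_provider` name is hypothetical; the alias, rejection, and stderr-warning behavior mirror what the paragraph above describes.

```python
import sys

CANONICAL = {"github-models", "azure-openai", "azure-anthropic", "openai", "anthropic"}

def normalize_provider(provider: str) -> str:
    """Map legacy provider values to canonical ones, warning or failing as documented."""
    if provider == "github":
        # Soft alias: accepted, but nudge users toward the canonical name.
        print('warning: provider "github" is deprecated; use "github-models"',
              file=sys.stderr)
        return "github-models"
    if provider == "google":
        raise ValueError('provider "google" is no longer supported')
    if provider not in CANONICAL:
        raise ValueError(f"unknown provider: {provider}")
    return provider
```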
## Skip Verification
```sh
spectra ai generate checkout --skip-critic
```
Or disable globally in config:
```json
{
  "ai": {
    "critic": {
      "enabled": false
    }
  }
}
```