Testing & Quality Assurance
This guide covers the Prompt library's testing, debugging, and quality assurance tools. These features help you systematically evaluate prompt effectiveness, catch regressions, and validate response quality. The tools themselves make no LLM calls: you pass in prompts and responses, and they analyze the text you supply.
Prompt Debugger
PromptDebugger performs deep structural analysis of your prompts. It
detects anti-patterns, identifies components (instructions, constraints,
examples), measures clarity, and suggests specific improvements.
var report = PromptDebugger.Analyze(
    "Tell me about dogs. Make it good. Cover everything."
);
Console.WriteLine($"Clarity: {report.ClarityScore}/100");
// Clarity: 35/100
foreach (var issue in report.Issues)
    Console.WriteLine($"[{issue.Severity}] {issue.Id}: {issue.Message}");
// [Warning] AP001: Overly broad scope — asking the model to do 'everything'...
// [Warning] AP002: Vague quality instruction — 'make it good' gives no criteria...
foreach (var fix in report.SuggestedFixes)
    Console.WriteLine($"→ {fix}");
// → Break the task into specific sub-tasks or focus on one aspect
// → Specify what 'good' means: e.g., 'use active voice, keep under 20 words'
Conversation Analysis
Analyze multi-turn conversations to catch issues across the full message history:
var messages = new[]
{
    new DebugChatMessage("system", "You are a helpful assistant."),
    new DebugChatMessage("user", "Summarize this article."),
    new DebugChatMessage("assistant", "Sure! Here's a summary..."),
    new DebugChatMessage("user", "Make it better.")
};
var report = PromptDebugger.AnalyzeConversation(messages);
// Detects vague follow-ups, missing context, contradictions across turns
What It Detects
| Anti-Pattern | Example | Severity |
|---|---|---|
| Overly broad scope | "handle all edge cases" | Warning |
| Vague quality | "make it nice" | Warning |
| Contradictory instructions | "don't use jargon, but also be technical" | Error |
| Instruction stacking | "also... additionally... furthermore..." | Warning |
| Missing context | Single-word prompts | Info |
| Repetition | Same instruction repeated | Warning |
Prompt Test Suite
PromptTestSuite provides a structured testing framework for prompts.
Define test cases with assertions, run them against responses, and get
detailed pass/fail results.
Defining Tests
var suite = new PromptTestSuite("Customer Service Bot Tests");
// Add a test case with assertions
suite.AddTest(new PromptTestCase
{
    Name = "Greeting Response",
    Prompt = "Hello, I need help with my order",
    ExpectedAssertions = new List<TestAssertion>
    {
        new TestAssertion(AssertionType.Contains, "help"),
        new TestAssertion(AssertionType.HasMinLength, "20"),
        new TestAssertion(AssertionType.NotContains, "error"),
        // (?i) keeps the match case-insensitive so a capitalized "Hello" also matches
        new TestAssertion(AssertionType.MatchesRegex, @"(?i)\b(hi|hello|welcome)\b")
    }
});
suite.AddTest(new PromptTestCase
{
    Name = "JSON Format Check",
    Prompt = "Return order status as JSON",
    ExpectedAssertions = new List<TestAssertion>
    {
        new TestAssertion(AssertionType.ContainsJson, ""),
        new TestAssertion(AssertionType.HasMaxLength, "500")
    }
});
Running Tests
// Evaluate a response against a test case
string response = "Hello! I'd be happy to help you with your order.";
var result = suite.RunTest("Greeting Response", response);
Console.WriteLine($"Passed: {result.Passed}");
Console.WriteLine($"Assertions: {result.PassedCount}/{result.TotalCount}");
foreach (var assertion in result.AssertionResults)
    Console.WriteLine($"  {assertion.Type}: {(assertion.Passed ? "✓" : "✗")}");
Assertion Types
| Type | Value Parameter | What It Checks |
|---|---|---|
| Contains | text | Response contains the text (case-insensitive) |
| NotContains | text | Response does not contain the text |
| MatchesRegex | pattern | Response matches the regex |
| StartsWith | prefix | Response starts with the prefix |
| EndsWith | suffix | Response ends with the suffix |
| HasMinLength | number | Response has at least N characters |
| HasMaxLength | number | Response has at most N characters |
| ContainsJson | (unused) | Response contains valid JSON |
| ContainsCodeBlock | (unused) | Response has a fenced code block |
| ContainsAllOf | "a,b,c" | Response contains all comma-separated values |
Any assertion can be negated:
// Without negation, Contains passes when the response DOES contain "error".
// With negate: true, the result is inverted:
new TestAssertion(AssertionType.Contains, "error", negate: true)
// → passes only when response does NOT contain "error"
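To make the negation rule concrete, here is a minimal, self-contained sketch of how an assertion outcome could be computed. It does not use the library: `Check` is a hypothetical stand-in, and the string kinds merely mirror two `AssertionType` names from the table above. The comma splitting and whitespace trimming for `ContainsAllOf` are assumptions, not documented behavior.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the library's assertion logic (not its real code).
static bool Check(string kind, string value, string text, bool negate = false)
{
    bool raw = kind switch
    {
        // Case-insensitive substring test, as the table describes for Contains.
        "Contains" => text.Contains(value, StringComparison.OrdinalIgnoreCase),
        // Assumed behavior: split on commas, trim, and require every part.
        "ContainsAllOf" => value.Split(',')
            .Select(part => part.Trim())
            .All(part => text.Contains(part, StringComparison.OrdinalIgnoreCase)),
        _ => throw new ArgumentException($"Unsupported kind: {kind}")
    };
    // Negation flips the raw outcome: a pass becomes a fail and vice versa.
    return negate ? !raw : raw;
}

string reply = "Your order shipped today.";
Console.WriteLine(Check("Contains", "error", reply));               // False
Console.WriteLine(Check("Contains", "error", reply, negate: true)); // True
Console.WriteLine(Check("ContainsAllOf", "order, shipped", reply)); // True
```

Under this reading, a negated `Contains` behaves like `NotContains`, so the `negate` flag is mainly useful for types that have no built-in opposite, such as `MatchesRegex` or `ContainsJson`.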
Serialization
Test suites can be serialized to JSON for storage, version control, or sharing across teams:
string json = suite.ToJson();
var loaded = PromptTestSuite.FromJson(json);
Response Evaluator
PromptResponseEvaluator scores prompt-response pairs across multiple
quality dimensions. It's fully heuristic-based (no LLM calls needed)
and deterministic — the same input always produces the same score.
Basic Evaluation
var evaluator = new PromptResponseEvaluator();
var result = evaluator.Evaluate(
    prompt: "List 3 benefits of exercise",
    response: "1. Better cardiovascular health\n2. Increased energy levels\n3. Improved mood and mental clarity"
);
Console.WriteLine($"Score: {result.CompositeScore:F2}"); // ~0.92
Console.WriteLine($"Grade: {result.Grade}"); // A
Quality Dimensions
Each dimension produces a 0.0–1.0 score:
| Dimension | What It Measures |
|---|---|
| Relevance | How well the response addresses the prompt keywords |
| Completeness | Whether the response covers all aspects of the request |
| Conciseness | Length efficiency — not too short, not padded |
| Structure | Use of formatting (lists, paragraphs, headings) |
| Specificity | Concrete details vs vague generalities |
Custom Weights
Adjust dimension weights for your use case:
var config = new EvaluatorConfig
{
    Weights = new Dictionary<string, double>
    {
        ["relevance"] = 2.0,     // Relevance matters most
        ["completeness"] = 1.5,
        ["conciseness"] = 0.5,   // We don't mind verbose answers
        ["structure"] = 1.0,
        ["specificity"] = 1.0
    }
};
var evaluator = new PromptResponseEvaluator(config);
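The guide does not state how the weights combine into the composite score. A common choice is a weighted mean, and the sketch below assumes exactly that: composite = sum(weight × score) / sum(weight), with unlisted dimensions defaulting to a weight of 1.0. `Composite` is a hypothetical helper and the per-dimension scores are made-up inputs, so treat this as an illustration of the idea and not as the library's formula.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Assumed formula: weighted mean of the per-dimension scores.
static double Composite(
    IReadOnlyDictionary<string, double> scores,
    IReadOnlyDictionary<string, double> weights)
{
    double weightedSum = 0, weightTotal = 0;
    foreach (var (name, score) in scores)
    {
        // Dimensions without an explicit weight count with weight 1.0 (assumption).
        double weight = weights.TryGetValue(name, out var w) ? w : 1.0;
        weightedSum += weight * score;
        weightTotal += weight;
    }
    return weightTotal == 0 ? 0 : weightedSum / weightTotal;
}

// Made-up dimension scores for one response.
var dimensionScores = new Dictionary<string, double>
{
    ["relevance"] = 0.9, ["completeness"] = 0.8, ["conciseness"] = 0.4,
    ["structure"] = 0.7, ["specificity"] = 0.6
};
// The weights from the configuration example above.
var customWeights = new Dictionary<string, double>
{
    ["relevance"] = 2.0, ["completeness"] = 1.5, ["conciseness"] = 0.5,
    ["structure"] = 1.0, ["specificity"] = 1.0
};
var equalWeights = new Dictionary<string, double>();

Console.WriteLine(Composite(dimensionScores, customWeights).ToString("F2", CultureInfo.InvariantCulture)); // 0.75
Console.WriteLine(Composite(dimensionScores, equalWeights).ToString("F2", CultureInfo.InvariantCulture));  // 0.68
```

With these inputs, down-weighting conciseness lifts the composite from 0.68 to 0.75, because the low conciseness score contributes less to the mean.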
Batch Evaluation
Evaluate multiple prompt-response pairs for regression testing:
// response1..response3 are model responses you have already collected
var pairs = new[]
{
    ("Explain REST APIs", response1),
    ("Write a haiku about coding", response2),
    ("List 5 programming languages", response3)
};
foreach (var (prompt, response) in pairs)
{
    var result = evaluator.Evaluate(prompt, response);
    Console.WriteLine($"{result.Grade} ({result.CompositeScore:F2}): {prompt}");
}
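Because the evaluator is deterministic, scores from an earlier run can serve as a baseline for later runs. The sketch below shows one way to turn batch scores into a regression check. `FindRegressions`, the 0.05 tolerance, and all scores are hypothetical; in practice the `current` values would come from `CompositeScore`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns prompts whose score fell by more than 'tolerance'.
// Prompts that are missing from 'current' are ignored.
static List<string> FindRegressions(
    IReadOnlyDictionary<string, double> baseline,
    IReadOnlyDictionary<string, double> current,
    double tolerance = 0.05)
{
    return baseline
        .Where(entry => current.TryGetValue(entry.Key, out var now)
                        && entry.Value - now > tolerance)
        .Select(entry => entry.Key)
        .ToList();
}

// Made-up scores: a stored baseline and the scores from the latest run.
var baselineScores = new Dictionary<string, double>
{
    ["Explain REST APIs"] = 0.88,
    ["Write a haiku about coding"] = 0.74
};
var latestScores = new Dictionary<string, double>
{
    ["Explain REST APIs"] = 0.70,
    ["Write a haiku about coding"] = 0.73
};

foreach (var prompt in FindRegressions(baselineScores, latestScores))
    Console.WriteLine($"Regressed: {prompt}");
// Regressed: Explain REST APIs
```

A small tolerance keeps the check from failing on harmless wording changes in a response while still flagging real drops in quality.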
Grammar Validator
PromptGrammarValidator validates responses against structural rules.
Define expected formats, lengths, patterns, and content requirements —
then validate responses automatically.
Defining Rules
var validator = new PromptGrammarValidator();
// Response must be valid JSON
validator.AddRule(new GrammarRule
{
    Id = "json-format",
    Type = GrammarRuleType.JsonSchema,
    Severity = ViolationSeverity.Error
});
// Response must be 50–500 characters
validator.AddRule(new GrammarRule
{
    Id = "length-check",
    Type = GrammarRuleType.Length,
    Min = 50,
    Max = 500,
    Severity = ViolationSeverity.Warning
});
// Response must contain a specific section
validator.AddRule(new GrammarRule
{
    Id = "has-summary",
    Type = GrammarRuleType.Contains,
    Value = "Summary:",
    Severity = ViolationSeverity.Error
});
// Response must NOT contain filler phrases
validator.AddRule(new GrammarRule
{
    Id = "no-filler",
    Type = GrammarRuleType.NotContains,
    Value = "As an AI language model",
    Severity = ViolationSeverity.Warning
});
Validating Responses
var result = validator.Validate(response);
if (!result.IsValid)
{
    foreach (var violation in result.Violations)
        Console.WriteLine($"[{violation.Severity}] Rule '{violation.RuleId}': {violation.Message}");
}
Rule Types
| Type | Purpose |
|---|---|
| Regex | Match a regex pattern |
| JsonSchema | Valid JSON structure |
| Enum | One of allowed values |
| StartsWith / EndsWith | Prefix/suffix matching |
| Contains / NotContains | Substring presence/absence |
| Length | Character count bounds |
| LineCount | Line count bounds |
| Structure | Section/bullet structure validation |
| Custom | Delegate-based custom logic |
Prompt Fuzzer
PromptFuzzer generates variations of your prompts to test robustness.
It systematically applies mutations (typos, synonym swaps, case changes,
word drops) to discover how sensitive your prompt is to phrasing.
Basic Fuzzing
var fuzzer = new PromptFuzzer();
var result = fuzzer.Fuzz(
    "Summarize the following article in 3 bullet points",
    count: 5
);
Console.WriteLine($"Original: {result.Original}");
foreach (var variant in result.Variants)
{
    Console.WriteLine($"  [{variant.Strategy}] {variant.Text}");
    Console.WriteLine($"  Similarity: {variant.Similarity:P0}");
}
Fuzzing Strategies
Combine strategies using flags:
var result = fuzzer.Fuzz(prompt,
    strategies: FuzzStrategy.TypoInjection | FuzzStrategy.WordDrop,
    count: 10
);
| Strategy | What It Does |
|---|---|
| SynonymSwap | Replaces words with synonyms |
| TypoInjection | Introduces realistic typos |
| CaseChange | Randomizes letter casing |
| WordDrop | Removes random words |
| WordShuffle | Swaps adjacent words |
| NoiseInjection | Adds filler words or whitespace |
| Truncation | Truncates at various points |
| All | Applies all strategies |
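To illustrate what a single mutation and its similarity score could look like, the sketch below implements a word drop and a word-level Jaccard similarity without using the library. `DropWord` and `WordJaccard` are hypothetical helpers. The library's own mutation logic and its `Similarity` metric are not described in this guide and may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical word-drop mutation: removes the word at a given position.
static string DropWord(string prompt, int index)
{
    var words = prompt.Split(' ', StringSplitOptions.RemoveEmptyEntries).ToList();
    if (index < 0 || index >= words.Count)
        throw new ArgumentOutOfRangeException(nameof(index));
    words.RemoveAt(index);
    return string.Join(' ', words);
}

// Word-level Jaccard similarity: shared distinct words / all distinct words.
static double WordJaccard(string a, string b)
{
    var setA = new HashSet<string>(
        a.Split(' ', StringSplitOptions.RemoveEmptyEntries), StringComparer.OrdinalIgnoreCase);
    var setB = new HashSet<string>(
        b.Split(' ', StringSplitOptions.RemoveEmptyEntries), StringComparer.OrdinalIgnoreCase);
    if (setA.Count == 0 && setB.Count == 0) return 1.0; // two empty texts are identical
    int shared = setA.Count(word => setB.Contains(word));
    int union = setA.Count + setB.Count - shared;
    return (double)shared / union;
}

string original = "Summarize the following article in 3 bullet points";
string mutated = DropWord(original, 1); // drops "the"
Console.WriteLine(mutated);
// Summarize following article in 3 bullet points
Console.WriteLine(WordJaccard(original, mutated).ToString("F3", CultureInfo.InvariantCulture));
// 0.875
```

A real fuzzer would pick the position at random. Taking it as a parameter here keeps the example reproducible.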
Robustness Testing Workflow
Combine fuzzing with the test suite for automated robustness checks:
var fuzzer = new PromptFuzzer();
var suite = new PromptTestSuite("Robustness Tests");
// Original prompt
string prompt = "Extract the email addresses from this text";
// Register the test case that RunTest looks up by name below
suite.AddTest(new PromptTestCase
{
    Name = "Email Extraction",
    Prompt = prompt,
    ExpectedAssertions = new List<TestAssertion>
    {
        new TestAssertion(AssertionType.MatchesRegex, @"[\w.+-]+@[\w-]+\.\w+")
    }
});
// Generate 20 variations
var fuzzed = fuzzer.Fuzz(prompt, count: 20, strategies: FuzzStrategy.All);
// Test each variation against your assertions
foreach (var variant in fuzzed.Variants)
{
    // GetModelResponse is a placeholder for your own model call
    string response = await GetModelResponse(variant.Text);
    var testResult = suite.RunTest("Email Extraction", response);
    if (!testResult.Passed)
    {
        Console.WriteLine($"FAILED with variant: {variant.Text}");
        Console.WriteLine($"Strategy: {variant.Strategy}");
        Console.WriteLine($"Similarity: {variant.Similarity:P0}");
    }
}
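Individual failure lines become hard to read once there are many variants. One option is to aggregate the outcomes per strategy, as in the sketch below. `FailureRateByStrategy` is a hypothetical helper and the outcome list is made-up data. In a real run you would record `variant.Strategy` and `testResult.Passed` for each variant.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: fraction of failed variants for each strategy name.
static Dictionary<string, double> FailureRateByStrategy(
    IEnumerable<(string Strategy, bool Passed)> results) =>
    results
        .GroupBy(r => r.Strategy)
        .ToDictionary(g => g.Key, g => g.Count(r => !r.Passed) / (double)g.Count());

// Made-up outcomes standing in for (variant.Strategy, testResult.Passed) pairs.
var outcomes = new List<(string Strategy, bool Passed)>
{
    ("TypoInjection", true),
    ("TypoInjection", false),
    ("WordDrop", false),
    ("WordDrop", false),
    ("CaseChange", true)
};

var rates = FailureRateByStrategy(outcomes);
foreach (var (strategy, rate) in rates.OrderByDescending(kv => kv.Value))
    Console.WriteLine($"{strategy}: {rate.ToString("F2", CultureInfo.InvariantCulture)}");
// WordDrop: 1.00
// TypoInjection: 0.50
// CaseChange: 0.00
```

Sorting by failure rate shows which kinds of phrasing change the prompt is most sensitive to, which tells you where to tighten the prompt first.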
Putting It All Together
These tools compose into a complete prompt QA pipeline:
// 1. Debug the prompt structure first
var debugReport = PromptDebugger.Analyze(myPrompt);
if (debugReport.ClarityScore < 50)
    Console.WriteLine("⚠️ Prompt clarity is low: review suggested fixes");
// 2. Define quality expectations
var suite = new PromptTestSuite("Production Tests");
suite.AddTest(new PromptTestCase
{
    Name = "Format Check",
    Prompt = myPrompt,
    ExpectedAssertions = new List<TestAssertion>
    {
        new TestAssertion(AssertionType.ContainsJson, ""),
        new TestAssertion(AssertionType.HasMinLength, "100"),
        new TestAssertion(AssertionType.NotContains, "I'm sorry")
    }
});
// 3. Validate response grammar
var validator = new PromptGrammarValidator();
validator.AddRule(new GrammarRule
{
    Id = "valid-json",
    Type = GrammarRuleType.JsonSchema,
    Severity = ViolationSeverity.Error
});
// 4. Evaluate response quality
var evaluator = new PromptResponseEvaluator();
// 5. Fuzz for robustness
var fuzzer = new PromptFuzzer();
var variants = fuzzer.Fuzz(myPrompt, count: 10);
// Run the full pipeline
foreach (var variant in variants.Variants)
{
    // GetModelResponse is a placeholder for your own model call
    string response = await GetModelResponse(variant.Text);
    var testResult = suite.RunTest("Format Check", response);
    var grammarResult = validator.Validate(response);
    var evalResult = evaluator.Evaluate(variant.Text, response);
    Console.WriteLine($"Variant ({variant.Strategy}): " +
        $"Test={testResult.Passed}, Grammar={grammarResult.IsValid}, " +
        $"Quality={evalResult.Grade}");
}
This pipeline gives you confidence that your prompts are:
- Well-structured (Debugger catches anti-patterns)
- Producing correct output (TestSuite validates assertions)
- Following format rules (GrammarValidator enforces structure)
- High quality (ResponseEvaluator scores dimensions)
- Robust to variation (Fuzzer tests edge cases)