Testing & Quality Assurance

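This guide covers the Prompt library's testing, debugging, and quality assurance tools. These features help you systematically evaluate prompt effectiveness, catch regressions, and validate response quality. The tools themselves make no LLM calls: they analyze the prompts and responses you supply, so a model is needed only where you generate the responses to check.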

Prompt Debugger

PromptDebugger performs deep structural analysis of your prompts. It detects anti-patterns, identifies components (instructions, constraints, examples), measures clarity, and suggests specific improvements.

var report = PromptDebugger.Analyze(
    "Tell me about dogs. Make it good. Cover everything."
);

Console.WriteLine($"Clarity: {report.ClarityScore}/100");
// Clarity: 35/100

foreach (var issue in report.Issues)
    Console.WriteLine($"[{issue.Severity}] {issue.Id}: {issue.Message}");
// [Warning] AP001: Overly broad scope — asking the model to do 'everything'...
// [Warning] AP002: Vague quality instruction — 'make it good' gives no criteria...

foreach (var fix in report.SuggestedFixes)
    Console.WriteLine($"→ {fix}");
// → Break the task into specific sub-tasks or focus on one aspect
// → Specify what 'good' means: e.g., 'use active voice, keep under 20 words'

Conversation Analysis

Analyze multi-turn conversations to catch issues across the full message history:

var messages = new[]
{
    new DebugChatMessage("system", "You are a helpful assistant."),
    new DebugChatMessage("user", "Summarize this article."),
    new DebugChatMessage("assistant", "Sure! Here's a summary..."),
    new DebugChatMessage("user", "Make it better.")
};

var report = PromptDebugger.AnalyzeConversation(messages);
// Detects vague follow-ups, missing context, contradictions across turns

What It Detects

| Anti-Pattern | Example | Severity |
|---|---|---|
| Overly broad scope | "handle all edge cases" | Warning |
| Vague quality | "make it nice" | Warning |
| Contradictory instructions | "don't use jargon, but also be technical" | Error |
| Instruction stacking | "also... additionally... furthermore..." | Warning |
| Missing context | Single-word prompts | Info |
| Repetition | Same instruction repeated | Warning |
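How PromptDebugger detects these patterns internally is not described in this guide. As a rough mental model only, a pattern-based check could look like the sketch below. The `AntiPatternRule` record, the `X001`/`X002` IDs, and the regexes are all hypothetical and are not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical rule shape for illustration; not a type from the Prompt library.
public record AntiPatternRule(string Id, string Severity, Regex Pattern, string Message);

public static class AntiPatternSketch
{
    // Two illustrative rules; the IDs and patterns are made up for this sketch.
    private static readonly AntiPatternRule[] Rules =
    {
        new AntiPatternRule("X001", "Warning",
            new Regex(@"\b(everything|all edge cases)\b", RegexOptions.IgnoreCase),
            "Overly broad scope"),
        new AntiPatternRule("X002", "Warning",
            new Regex(@"\bmake it (good|nice|better)\b", RegexOptions.IgnoreCase),
            "Vague quality instruction"),
    };

    // Returns every rule whose pattern occurs somewhere in the prompt.
    public static List<AntiPatternRule> Detect(string prompt)
    {
        var hits = new List<AntiPatternRule>();
        foreach (var rule in Rules)
        {
            if (rule.Pattern.IsMatch(prompt))
                hits.Add(rule);
        }
        return hits;
    }

    public static void Main()
    {
        foreach (var hit in Detect("Tell me about dogs. Make it good. Cover everything."))
            Console.WriteLine($"[{hit.Severity}] {hit.Id}: {hit.Message}");
    }
}
```

A real detector likely needs more than regexes (contradictions and repetition depend on comparing instructions with each other), so treat this as a picture of the simplest category only.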

Prompt Test Suite

PromptTestSuite provides a structured testing framework for prompts. Define test cases with assertions, run them against responses, and get detailed pass/fail results.

Defining Tests

var suite = new PromptTestSuite("Customer Service Bot Tests");

// Add a test case with assertions
suite.AddTest(new PromptTestCase
{
    Name = "Greeting Response",
    Prompt = "Hello, I need help with my order",
    ExpectedAssertions = new List<TestAssertion>
    {
        new TestAssertion(AssertionType.Contains, "help"),
        new TestAssertion(AssertionType.HasMinLength, "20"),
        new TestAssertion(AssertionType.NotContains, "error"),
        // (?i) makes the pattern case-insensitive, so a response starting with "Hello" matches
        new TestAssertion(AssertionType.MatchesRegex, @"(?i)\b(hi|hello|welcome)\b")
    }
});

suite.AddTest(new PromptTestCase
{
    Name = "JSON Format Check",
    Prompt = "Return order status as JSON",
    ExpectedAssertions = new List<TestAssertion>
    {
        new TestAssertion(AssertionType.ContainsJson, ""),
        new TestAssertion(AssertionType.HasMaxLength, "500")
    }
});

Running Tests

// Evaluate a response against a test case
string response = "Hello! I'd be happy to help you with your order.";
var result = suite.RunTest("Greeting Response", response);

Console.WriteLine($"Passed: {result.Passed}");
Console.WriteLine($"Assertions: {result.PassedCount}/{result.TotalCount}");

foreach (var assertion in result.AssertionResults)
    Console.WriteLine($"  {assertion.Type}: {(assertion.Passed ? "✓" : "✗")}");

Assertion Types

| Type | Value Parameter | What It Checks |
|---|---|---|
| Contains | text | Response contains the text (case-insensitive) |
| NotContains | text | Response does not contain the text |
| MatchesRegex | pattern | Response matches the regex |
| StartsWith | prefix | Response starts with the prefix |
| EndsWith | suffix | Response ends with the suffix |
| HasMinLength | number | Response has at least N characters |
| HasMaxLength | number | Response has at most N characters |
| ContainsJson | (unused) | Response contains valid JSON |
| ContainsCodeBlock | (unused) | Response has a fenced code block |
| ContainsAllOf | "a,b,c" | Response contains all comma-separated values |

Any assertion can be negated:

// Without negate, this Contains assertion would pass when the response contains "error".
new TestAssertion(AssertionType.Contains, "error", negate: true)
// → with negate: true, it passes only when the response does NOT contain "error"
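One way to picture negation is as an exclusive-or between the raw check and the `negate` flag. The sketch below is a stand-alone illustration under that assumption; `EvaluateContains` is a hypothetical helper, not a library method, and it assumes a runtime where `string.Contains(string, StringComparison)` is available (.NET Core 2.1 or later).

```csharp
using System;

public static class NegationSketch
{
    // Hypothetical evaluator for a Contains assertion. Case-insensitive,
    // mirroring the behaviour the Assertion Types table describes.
    public static bool EvaluateContains(string response, string text, bool negate = false)
    {
        bool found = response.Contains(text, StringComparison.OrdinalIgnoreCase);
        // XOR: a negated assertion passes exactly when the raw check fails.
        return found ^ negate;
    }

    public static void Main()
    {
        string response = "Your order shipped today.";
        Console.WriteLine(EvaluateContains(response, "error"));               // False
        Console.WriteLine(EvaluateContains(response, "error", negate: true)); // True
    }
}
```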

Serialization

Test suites can be serialized to JSON for storage, version control, or sharing across teams:

string json = suite.ToJson();
var loaded = PromptTestSuite.FromJson(json);

Response Evaluator

PromptResponseEvaluator scores prompt-response pairs across multiple quality dimensions. It's fully heuristic-based (no LLM calls needed) and deterministic — the same input always produces the same score.

Basic Evaluation

var evaluator = new PromptResponseEvaluator();

var result = evaluator.Evaluate(
    prompt: "List 3 benefits of exercise",
    response: "1. Better cardiovascular health\n2. Increased energy levels\n3. Improved mood and mental clarity"
);

Console.WriteLine($"Score: {result.CompositeScore:F2}");  // ~0.92
Console.WriteLine($"Grade: {result.Grade}");               // A

Quality Dimensions

Each dimension produces a 0.0–1.0 score:

| Dimension | What It Measures |
|---|---|
| Relevance | How well the response addresses the prompt keywords |
| Completeness | Whether the response covers all aspects of the request |
| Conciseness | Length efficiency: not too short, not padded |
| Structure | Use of formatting (lists, paragraphs, headings) |
| Specificity | Concrete details vs vague generalities |

Custom Weights

Adjust dimension weights for your use case:

var config = new EvaluatorConfig
{
    Weights = new Dictionary<string, double>
    {
        ["relevance"] = 2.0,      // Relevance matters most
        ["completeness"] = 1.5,
        ["conciseness"] = 0.5,    // We don't mind verbose answers
        ["structure"] = 1.0,
        ["specificity"] = 1.0
    }
};

var evaluator = new PromptResponseEvaluator(config);
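This guide does not state how the weights are combined into the composite score. A common choice is a weighted mean, and the sketch below shows the arithmetic under that assumption. The `Composite` helper, the sample dimension scores, and the default weight of 1.0 for unlisted dimensions are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class WeightedScoreSketch
{
    // Weighted mean: sum(weight * score) / sum(weight).
    public static double Composite(
        IReadOnlyDictionary<string, double> scores,
        IReadOnlyDictionary<string, double> weights)
    {
        double weightedSum = 0, totalWeight = 0;
        foreach (var (dimension, score) in scores)
        {
            // Assumption: a dimension without an explicit weight counts as 1.0.
            double w = weights.TryGetValue(dimension, out double custom) ? custom : 1.0;
            weightedSum += w * score;
            totalWeight += w;
        }
        return totalWeight == 0 ? 0 : weightedSum / totalWeight;
    }

    public static void Main()
    {
        // Made-up dimension scores for one response.
        var scores = new Dictionary<string, double>
        {
            ["relevance"] = 0.9, ["completeness"] = 0.8, ["conciseness"] = 0.4,
            ["structure"] = 0.7, ["specificity"] = 0.6
        };
        // Same weights as the configuration example above.
        var weights = new Dictionary<string, double>
        {
            ["relevance"] = 2.0, ["completeness"] = 1.5, ["conciseness"] = 0.5,
            ["structure"] = 1.0, ["specificity"] = 1.0
        };

        double equal = Composite(scores, new Dictionary<string, double>());
        double custom = Composite(scores, weights);
        Console.WriteLine(equal.ToString("F2", CultureInfo.InvariantCulture));  // 0.68
        Console.WriteLine(custom.ToString("F2", CultureInfo.InvariantCulture)); // 0.75
    }
}
```

With these numbers the weak conciseness score drags the result down less once its weight drops to 0.5, which is the effect the custom weights are meant to achieve.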

Batch Evaluation

Evaluate multiple prompt-response pairs for regression testing:

// response1, response2, and response3 are model outputs (strings) collected earlier
var pairs = new[]
{
    ("Explain REST APIs", response1),
    ("Write a haiku about coding", response2),
    ("List 5 programming languages", response3)
};

foreach (var (prompt, response) in pairs)
{
    var result = evaluator.Evaluate(prompt, response);
    Console.WriteLine($"{result.Grade} ({result.CompositeScore:F2}): {prompt}");
}
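Because the scores are deterministic, a batch run can be compared against a stored baseline to flag regressions. The sketch below shows one possible gate; `FindRegressions`, the 0.05 tolerance, and the sample scores are hypothetical and are not part of the library.

```csharp
using System;
using System.Collections.Generic;

public static class RegressionGateSketch
{
    // Returns the prompts whose score dropped by more than `tolerance`
    // compared with the baseline. Prompts missing from `current` are skipped.
    public static List<string> FindRegressions(
        IReadOnlyDictionary<string, double> baseline,
        IReadOnlyDictionary<string, double> current,
        double tolerance = 0.05)
    {
        var regressions = new List<string>();
        foreach (var (prompt, oldScore) in baseline)
        {
            if (current.TryGetValue(prompt, out double newScore)
                && oldScore - newScore > tolerance)
            {
                regressions.Add(prompt);
            }
        }
        return regressions;
    }

    public static void Main()
    {
        // Made-up composite scores from a previous run and the current run.
        var baseline = new Dictionary<string, double>
        {
            ["Explain REST APIs"] = 0.90,
            ["Write a haiku about coding"] = 0.80
        };
        var current = new Dictionary<string, double>
        {
            ["Explain REST APIs"] = 0.88,
            ["Write a haiku about coding"] = 0.60
        };

        foreach (var prompt in FindRegressions(baseline, current))
            Console.WriteLine($"REGRESSION: {prompt}");
    }
}
```

In practice the `current` dictionary would be filled from `result.CompositeScore` in a loop like the one above, and the baseline would be stored alongside the test suite.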

Grammar Validator

PromptGrammarValidator validates responses against structural rules. Define expected formats, lengths, patterns, and content requirements — then validate responses automatically.

Defining Rules

var validator = new PromptGrammarValidator();

// Response must be valid JSON
validator.AddRule(new GrammarRule
{
    Id = "json-format",
    Type = GrammarRuleType.JsonSchema,
    Severity = ViolationSeverity.Error
});

// Response must be 50–500 characters
validator.AddRule(new GrammarRule
{
    Id = "length-check",
    Type = GrammarRuleType.Length,
    Min = 50,
    Max = 500,
    Severity = ViolationSeverity.Warning
});

// Response must contain a specific section
validator.AddRule(new GrammarRule
{
    Id = "has-summary",
    Type = GrammarRuleType.Contains,
    Value = "Summary:",
    Severity = ViolationSeverity.Error
});

// Response must NOT contain filler phrases
validator.AddRule(new GrammarRule
{
    Id = "no-filler",
    Type = GrammarRuleType.NotContains,
    Value = "As an AI language model",
    Severity = ViolationSeverity.Warning
});

Validating Responses

var result = validator.Validate(response);

if (!result.IsValid)
{
    foreach (var violation in result.Violations)
        Console.WriteLine($"[{violation.Severity}] Rule '{violation.RuleId}': {violation.Message}");
}

Rule Types

| Type | Purpose |
|---|---|
| Regex | Match a regex pattern |
| JsonSchema | Valid JSON structure |
| Enum | One of allowed values |
| StartsWith / EndsWith | Prefix/suffix matching |
| Contains / NotContains | Substring presence/absence |
| Length | Character count bounds |
| LineCount | Line count bounds |
| Structure | Section/bullet structure validation |
| Custom | Delegate-based custom logic |

Prompt Fuzzer

PromptFuzzer generates variations of your prompts to test robustness. It systematically applies mutations (typos, synonym swaps, case changes, word drops) to discover how sensitive your prompt is to phrasing.

Basic Fuzzing

var fuzzer = new PromptFuzzer();

var result = fuzzer.Fuzz(
    "Summarize the following article in 3 bullet points",
    count: 5
);

Console.WriteLine($"Original: {result.Original}");
foreach (var variant in result.Variants)
{
    Console.WriteLine($"  [{variant.Strategy}] {variant.Text}");
    Console.WriteLine($"   Similarity: {variant.Similarity:P0}");
}
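The guide does not say how `Similarity` is computed. One common way to score how close a variant is to the original is normalized Levenshtein (edit) distance, sketched below purely as an illustration. The `Levenshtein` and `Similarity` helpers are hypothetical and may not match the library's metric.

```csharp
using System;
using System.Globalization;

public static class SimilaritySketch
{
    // Classic two-row dynamic-programming edit distance
    // (insertions, deletions, and substitutions each cost 1).
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // reuse the two rows
        }
        return prev[b.Length];
    }

    // 1.0 means identical; 0.0 means every character differs.
    public static double Similarity(string a, string b)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        return maxLen == 0 ? 1.0 : 1.0 - (double)Levenshtein(a, b) / maxLen;
    }

    public static void Main()
    {
        double s = Similarity("Summarize the following article", "Sumarize the following article");
        Console.WriteLine(s.ToString("F2", CultureInfo.InvariantCulture)); // 0.97
    }
}
```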

Fuzzing Strategies

Combine strategies using flags:

var result = fuzzer.Fuzz(prompt,
    strategies: FuzzStrategy.TypoInjection | FuzzStrategy.WordDrop,
    count: 10
);

| Strategy | What It Does |
|---|---|
| SynonymSwap | Replaces words with synonyms |
| TypoInjection | Introduces realistic typos |
| CaseChange | Randomizes letter casing |
| WordDrop | Removes random words |
| WordShuffle | Swaps adjacent words |
| NoiseInjection | Adds filler words or whitespace |
| Truncation | Truncates at various points |
| All | Applies all strategies |
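Combining strategies with `|` relies on standard C# flag enums. The sketch below shows the mechanism with a stand-in enum; `DemoFuzzStrategy` and its numeric values are hypothetical, since the actual values behind `FuzzStrategy` are not documented here.

```csharp
using System;

// Stand-in enum for illustration. Each member occupies its own bit,
// so members can be combined with | and tested with HasFlag.
[Flags]
public enum DemoFuzzStrategy
{
    None = 0,
    SynonymSwap = 1 << 0,
    TypoInjection = 1 << 1,
    CaseChange = 1 << 2,
    WordDrop = 1 << 3,
    All = SynonymSwap | TypoInjection | CaseChange | WordDrop
}

public static class FlagsSketch
{
    public static void Main()
    {
        var selected = DemoFuzzStrategy.TypoInjection | DemoFuzzStrategy.WordDrop;

        Console.WriteLine(selected);                                      // TypoInjection, WordDrop
        Console.WriteLine(selected.HasFlag(DemoFuzzStrategy.WordDrop));   // True
        Console.WriteLine(selected.HasFlag(DemoFuzzStrategy.CaseChange)); // False
    }
}
```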

Robustness Testing Workflow

Combine fuzzing with the test suite for automated robustness checks:

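var fuzzer = new PromptFuzzer();
var suite = new PromptTestSuite("Robustness Tests");

// Original prompt
string prompt = "Extract the email addresses from this text";

// Register the test that RunTest looks up by name below
suite.AddTest(new PromptTestCase
{
    Name = "Email Extraction",
    Prompt = prompt,
    ExpectedAssertions = new List<TestAssertion>
    {
        new TestAssertion(AssertionType.MatchesRegex, @"[\w.+-]+@[\w-]+\.[\w.-]+")
    }
});

// Generate 20 variations
var fuzzed = fuzzer.Fuzz(prompt, count: 20, strategies: FuzzStrategy.All);

// Test each variation against your assertions
foreach (var variant in fuzzed.Variants)
{
    // GetModelResponse is your own model-calling helper; the library does not provide it
    string response = await GetModelResponse(variant.Text);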
    var testResult = suite.RunTest("Email Extraction", response);

    if (!testResult.Passed)
    {
        Console.WriteLine($"FAILED with variant: {variant.Text}");
        Console.WriteLine($"Strategy: {variant.Strategy}");
        Console.WriteLine($"Similarity: {variant.Similarity:P0}");
    }
}

Putting It All Together

These tools compose into a complete prompt QA pipeline:

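// myPrompt is the prompt under test, and GetModelResponse (used below) is your own
// model-calling helper; neither is provided by the library.

// 1. Debug the prompt structure first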
var debugReport = PromptDebugger.Analyze(myPrompt);
if (debugReport.ClarityScore < 50)
    Console.WriteLine("⚠️ Prompt clarity is low — review suggested fixes");

// 2. Define quality expectations
var suite = new PromptTestSuite("Production Tests");
suite.AddTest(new PromptTestCase
{
    Name = "Format Check",
    Prompt = myPrompt,
    ExpectedAssertions = new List<TestAssertion>
    {
        new TestAssertion(AssertionType.ContainsJson, ""),
        new TestAssertion(AssertionType.HasMinLength, "100"),
        new TestAssertion(AssertionType.NotContains, "I'm sorry")
    }
});

// 3. Validate response grammar
var validator = new PromptGrammarValidator();
validator.AddRule(new GrammarRule
{
    Id = "valid-json", Type = GrammarRuleType.JsonSchema,
    Severity = ViolationSeverity.Error
});

// 4. Evaluate response quality
var evaluator = new PromptResponseEvaluator();

// 5. Fuzz for robustness
var fuzzer = new PromptFuzzer();
var variants = fuzzer.Fuzz(myPrompt, count: 10);

// Run the full pipeline
foreach (var variant in variants.Variants)
{
    string response = await GetModelResponse(variant.Text);

    var testResult = suite.RunTest("Format Check", response);
    var grammarResult = validator.Validate(response);
    var evalResult = evaluator.Evaluate(variant.Text, response);

    Console.WriteLine($"Variant ({variant.Strategy}): " +
        $"Test={testResult.Passed}, Grammar={grammarResult.IsValid}, " +
        $"Quality={evalResult.Grade}");
}

This pipeline gives you confidence that your prompts are:

  • Well-structured (Debugger catches anti-patterns)
  • Producing correct output (TestSuite validates assertions)
  • Following format rules (GrammarValidator enforces structure)
  • High quality (ResponseEvaluator scores dimensions)
  • Robust to variation (Fuzzer tests edge cases)