# Eval Files
Evaluation files define the test cases, targets, and evaluators for an evaluation run. AgentV supports two formats: YAML and JSONL.
## YAML Format

The primary format. A single YAML file contains metadata, execution config, and tests:
```yaml
description: Math problem solving evaluation

execution:
  target: default

assert:
  - name: correctness
    type: llm_judge
    prompt: ./judges/correctness.md

tests:
  - id: addition
    criteria: Correctly calculates 15 + 27 = 42
    input: What is 15 + 27?
    expected_output: "42"
```

## Top-level Fields
| Field | Description |
|---|---|
| `description` | Human-readable description of the evaluation |
| `dataset` | Optional dataset identifier |
| `execution` | Default execution config (for example, `target`) |
| `workspace` | Suite-level workspace config (lifecycle hooks, template) |
| `tests` | Array of individual tests, or a string path to an external file |
| `assert` | Suite-level evaluators appended to each test unless the test sets `execution.skip_defaults: true` |
## Metadata Fields

You can add structured metadata to your eval file using these optional top-level fields. Metadata is parsed when the `name` field is present:
| Field | Description |
|---|---|
| `name` | Machine-readable identifier (lowercase, hyphens, max 64 chars). Triggers metadata parsing. |
| `description` | Human-readable description (max 1024 chars) |
| `version` | Eval version string (e.g., `"1.0"`) |
| `author` | Author or team identifier |
| `tags` | Array of string tags for categorization |
| `license` | License identifier (e.g., `"MIT"`, `"Apache-2.0"`) |
| `requires` | Dependency constraints (e.g., `agentv: ">=0.30.0"`) |
```yaml
name: export-screening
description: Evaluates export control screening accuracy
version: "1.0"
author: acme-compliance
tags: [compliance, agents]
license: Apache-2.0
requires:
  agentv: ">=0.30.0"

tests:
  - id: denied-party
    criteria: Identifies denied parties correctly
    input: Screen "Acme Corp" against denied parties list
```

## Suite-level Assert
The `assert` field is the canonical way to define suite-level evaluators. Suite-level assertions are appended to every test’s evaluators unless a test sets `execution.skip_defaults: true`.
```yaml
description: API response validation

assert:
  - type: is_json
    required: true
  - type: contains
    value: "status"

tests:
  - id: health-check
    criteria: Returns health status
    input: Check API health
```

`assert` supports all evaluator types, including deterministic assertion types (`contains`, `regex`, `is_json`, `equals`) and rubrics. See Tests for per-test `assert` usage.
## Tests as String Path

Instead of inlining tests in the same file, you can point `tests` to an external YAML or JSONL file. This is the inverse of the sidecar pattern — the metadata file references the test data:

```yaml
name: my-eval
description: My evaluation suite

execution:
  target: default

tests: ./cases.yaml
```

The path is resolved relative to the eval file’s directory. The external file should contain a YAML array of test objects or a JSONL file with one test per line.
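As a sketch of what the referenced file might contain, a hypothetical `cases.yaml` holding a plain YAML array of test objects (the tests themselves are illustrative):

```yaml
- id: addition
  criteria: Correctly calculates 15 + 27 = 42
  input: What is 15 + 27?
- id: subtraction
  criteria: Correctly calculates 50 - 8 = 42
  input: What is 50 - 8?
```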
## JSONL Format

For large-scale evaluations, AgentV supports JSONL (JSON Lines) format. Each line is a single test:

```jsonl
{"id": "test-1", "criteria": "Calculates correctly", "input": "What is 2+2?"}
{"id": "test-2", "criteria": "Provides explanation", "input": "Explain variables"}
```

## Sidecar Metadata
An optional YAML sidecar file provides metadata and execution config. Place it alongside the JSONL file with the same base name: `dataset.jsonl` + `dataset.eval.yaml`:

```yaml
description: Math evaluation dataset
dataset: math-tests

execution:
  target: azure_base

assert:
  - name: correctness
    type: llm_judge
    prompt: ./judges/correctness.md
```

## Benefits of JSONL
- Streaming-friendly — process line by line
- Git-friendly — diffs show individual case changes
- Programmatic generation — easy to create from scripts
- Industry standard — compatible with DeepEval, LangWatch, Hugging Face datasets
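The programmatic-generation point above can be sketched in a few lines of Python. The field names match the JSONL test format shown earlier; the output filename and test contents are just examples:

```python
# Sketch: generating an AgentV-style JSONL eval file from a script.
# Field names (id, criteria, input) follow the JSONL format above;
# "generated.jsonl" is an illustrative filename.
import json

cases = [
    {
        "id": f"add-{a}-{b}",
        "criteria": f"Calculates {a} + {b} = {a + b}",
        "input": f"What is {a} + {b}?",
    }
    for a, b in [(2, 2), (15, 27)]
]

with open("generated.jsonl", "w", encoding="utf-8") as f:
    for case in cases:
        f.write(json.dumps(case) + "\n")  # one test per line
```

Because each line is an independent JSON object, a script can append new cases without rewriting the whole file.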
## Converting Between Formats

Use the `convert` command to switch between YAML and JSONL:

```sh
agentv convert evals/dataset.eval.yaml --format jsonl
agentv convert evals/dataset.jsonl --format yaml
```