Scope: this page is about authoring *.test.jh workflow tests (jaiph test) and how those pieces relate to the same Node workflow runtime as jaiph run. It also summarizes repository test layers (compiler txtar, golden AST, shell E2E) that contributors run in CI.
In production, a workflow’s behavior depends on live models, host timing, and local files. A harness fixes inputs (mock prompts, stubbed workflows/scripts), runs the same interpreter the CLI uses for real runs, and checks outputs with small assertions so CI and refactors can catch regressions without external services.
Jaiph includes a built-in test harness: test files (*.test.jh) mock prompt responses, stub workflows, rules, and scripts, execute workflows through NodeWorkflowRuntime in-process, and assert on captured output — without calling real LLMs or depending on external state. Unlike jaiph run, the harness does not spawn node-workflow-runner: after buildScripts, the CLI calls runTestFile() in src/runtime/kernel/node-test-runner.ts. There is no Docker mode for jaiph test; workflows under test always run on the host. How that fits buildRuntimeGraph, suppressLiveEvents, and artifact writes is in Architecture — Test runner integration.
Test files use the .test.jh suffix (for example workflow_greeting.test.jh).
A test file supports the same top-level forms as any .jh file (import, config, workflow, etc.), but the CLI only executes test "..." { ... } blocks. Other declarations are parsed into the runtime graph — for example, a local workflow is visible to single-segment references.
Recommended style: keep test files to import statements and test blocks. Define the workflows under test in separate modules so files stay small and focused.
Import paths in import "..." as alias resolve relative to the test file’s directory, with the same extension handling as ordinary modules (.jh is appended when omitted). See Grammar — Lexical notes.
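The resolution rule above can be sketched as a small pure function. This is an illustration only, not the Jaiph resolver: the name resolveTestImport is hypothetical, and the sketch assumes POSIX-style paths and that "omitted" means the specifier does not already end in .jh.

```typescript
// Hypothetical helper, not the actual Jaiph resolver.
// Assumes POSIX-style paths that use "/" as the separator.
function resolveTestImport(testFilePath: string, specifier: string): string {
  const slash = testFilePath.lastIndexOf("/");
  // Directory that contains the test file ("." when the path has no slash).
  const dir = slash >= 0 ? testFilePath.slice(0, slash) : ".";
  // Append ".jh" only when the specifier does not already end with it.
  const withExtension = /\.jh$/.test(specifier) ? specifier : specifier + ".jh";
  return dir + "/" + withExtension;
}
```

Under these assumptions, importing "workflow_greeting" from e2e/workflow_greeting.test.jh resolves to e2e/workflow_greeting.jh, the same file as importing "workflow_greeting.jh".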
# All *.test.jh files under the detected workspace root (recursive)
jaiph test
# All tests under a directory (recursive)
jaiph test ./e2e
# One file
jaiph test ./e2e/workflow_greeting.test.jh
# Equivalent shorthand (a *.test.jh path is treated as jaiph test)
jaiph ./e2e/workflow_greeting.test.jh
jaiph path.test.jh without the test subcommand is only accepted when the first CLI argument ends with .test.jh and path resolves to an existing file (src/cli/index.ts); otherwise the token is treated as an unknown command.
Discovery: With no path argument, Jaiph scans the detected workspace root recursively; with a directory, it scans that tree. Only *.test.jh files are collected: the name must end in .jh and the stem must end with .test (see walkTestFiles in src/transpile/build.ts). Unlike walkjhFiles (used when compiling ordinary *.jh trees), test discovery does not skip .jaiph/ subtrees, so stray *.test.jh files under .jaiph/... would be picked up — keep test modules in normal source locations. The workspace root—for locating imports and setting JAIPH_WORKSPACE—is from detectWorkspaceRoot in src/cli/shared/paths.ts: walk upward from a starting directory (the current working directory, the directory you passed, or the parent of a single test file) until .jaiph or .git is found, subject to a few guards for shared temp directories and nested .jaiph/tmp layouts; if nothing matches, the resolved starting directory is used as the root.
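A rough sketch of the two rules in this paragraph: the file-name filter and the upward walk for the workspace root. The function names are hypothetical, the exists predicate is injected so the sketch needs no filesystem access, and the guards for shared temp directories and nested .jaiph/tmp layouts are deliberately left out.

```typescript
// Name filter: the file must end in ".jh" and its stem must end with ".test".
function isTestFileName(name: string): boolean {
  return /\.test\.jh$/.test(name);
}

// Walk upward from startDir until a directory contains ".jaiph" or ".git".
// Hypothetical simplification: POSIX-style paths, no temp-directory guards,
// and the filesystem root itself ("/") is not probed.
function findWorkspaceRoot(startDir: string, exists: (path: string) => boolean): string {
  let dir = startDir;
  while (true) {
    if (exists(dir + "/.jaiph") || exists(dir + "/.git")) {
      return dir;
    }
    const slash = dir.lastIndexOf("/");
    if (slash <= 0) {
      // Nothing matched on the way up: use the starting directory as the root.
      return startDir;
    }
    dir = dir.slice(0, slash);
  }
}
```

Passing the predicate in keeps the walk testable against an in-memory set of paths; a real implementation would check the filesystem instead.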
If no *.test.jh files are found, the command prints an error and exits with status 1. A file must contain at least one test block; otherwise the CLI reports a parse error. Passing a plain *.jh file that is not named *.test.jh is rejected — use jaiph run for those.
Each test block is a named test case containing ordered steps:
import "workflow_greeting.jh" as w
test "runs happy path and prints PASS" {
mock prompt "e2e-greeting-mock"
const response = run w.default()
expect_contain response "e2e-greeting-mock"
expect_contain response "done"
}
Inside a test block, steps execute in order. # line comments and blank lines are allowed between steps (they are ignored by the runner).
Queues a fixed response for the next prompt call in the workflow under test. Multiple mock prompt lines queue in order — one is consumed per prompt call.
mock prompt "hello from mock"
mock prompt "second response"
mock prompt myConstName
Use a double-quoted string (escapes: \", \n, \\) or a bare identifier for a test const defined earlier in the block. Single-quoted mock text is rejected at parse time — use double quotes.
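The queue behavior can be pictured as a first-in, first-out list with one entry consumed per prompt call. The class below is a hypothetical model, not the harness code; what the real runner does when the queue is exhausted is not described here, so the sketch simply returns undefined in that case.

```typescript
// Hypothetical model of the inline mock prompt queue (FIFO).
class MockPromptQueue {
  private readonly responses: string[] = [];

  // Called once per `mock prompt "..."` or `mock prompt <const>` line.
  enqueue(response: string): void {
    this.responses.push(response);
  }

  // Called once per prompt step; returns undefined when nothing is queued.
  next(): string | undefined {
    return this.responses.shift();
  }
}
```

With "hello from mock" and "second response" enqueued in that order, the first prompt call receives the first string and the second call receives the second.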
Dispatches different responses based on the prompt text using pattern matching. Arms are tested top-to-bottom; the first match wins.
mock prompt {
/greeting/ => "hello"
/farewell/ => "goodbye"
_ => "default response"
}
Each arm is pattern => "response". Patterns can be:

- String literal ("greeting"): exact match against the prompt text
- Regex (/greeting/): tested against the prompt text
- Wildcard (_): matches anything (like a default/else branch)

Without a _ wildcard arm, an unmatched prompt fails the test.
Do not combine mock prompt { ... } with inline mock prompt "..." in the same test block — when a block mock is present, inline queue entries are ignored.
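First-match dispatch over the arms can be sketched as below. The Arm and ArmPattern types and the function name are hypothetical; the sketch assumes a regex arm is an unanchored search over the prompt text and models "fails the test" as a thrown error.

```typescript
// Hypothetical representation of the arms in a `mock prompt { ... }` block.
type ArmPattern =
  | { kind: "exact"; text: string }
  | { kind: "regex"; source: string }
  | { kind: "wildcard" };

interface Arm {
  pattern: ArmPattern;
  response: string;
}

// Arms are tried top to bottom; the first match wins.
function dispatchMockPrompt(arms: Arm[], promptText: string): string {
  for (const arm of arms) {
    const pattern = arm.pattern;
    let matched: boolean;
    if (pattern.kind === "exact") {
      matched = promptText === pattern.text;
    } else if (pattern.kind === "regex") {
      matched = new RegExp(pattern.source).test(promptText);
    } else {
      matched = true; // wildcard
    }
    if (matched) {
      return arm.response;
    }
  }
  // No wildcard arm and nothing matched: the test case fails.
  throw new Error("no mock prompt arm matched: " + promptText);
}
```

Because matching stops at the first hit, a prompt that mentions both "greeting" and "farewell" gets the response of whichever arm is listed first.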
Replaces a workflow body for this test case with Jaiph steps:
mock workflow w.greet() {
return "stubbed greeting"
}
Syntax: mock workflow <ref>(<params>) { ... } — parentheses are required, even when there are no parameters (()). The legacy form mock workflow ref { without () is rejected with a fix hint.
The reference format is <alias>.<workflow> (preferred) or <name> for a workflow defined in the test file itself.
Same as mock workflow, but for rules (body uses Jaiph steps, not shell):
mock rule w.validate() {
return "stubbed validation"
}
Stubs a module script block. The body is shell, like a real script step (the runner executes it as a managed shell mock — see runtime-mock.ts):
mock script w.helper() {
echo "stubbed script"
}
Test stubs use mock script, not mock function; the latter is a parse error with a fix hint.
mock script uses the same ref() { ... } header shape as mock workflow / mock rule.
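The shared header shape for the three mock kinds can be illustrated with a small line parser. This is a sketch, not the Jaiph parser: the function name, the regular expression, the error wording, and the treatment of parameters as a plain comma-separated list are all assumptions.

```typescript
type MockKind = "workflow" | "rule" | "script";

interface MockHeader {
  kind: MockKind;
  ref: string;       // "<alias>.<name>" or "<name>"
  params: string[];  // raw parameter text, split on commas (assumption)
}

// Hypothetical parser for a header line such as `mock workflow w.greet() {`.
function parseMockHeader(line: string): MockHeader {
  const trimmed = line.trim();
  if (/^mock\s+function\b/.test(trimmed)) {
    throw new Error('use "mock script" instead of "mock function"');
  }
  const headerPattern =
    /^mock\s+(workflow|rule|script)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?)\s*(\(([^)]*)\))?\s*\{$/;
  const match = headerPattern.exec(trimmed);
  if (!match) {
    throw new Error("not a mock header: " + trimmed);
  }
  if (match[3] === undefined) {
    // Legacy form without parentheses.
    throw new Error("parentheses are required: mock " + match[1] + " " + match[2] + "() {");
  }
  const inner = (match[4] || "").trim();
  const params = inner === "" ? [] : inner.split(",").map((part) => part.trim());
  return { kind: match[1] as MockKind, ref: match[2], params };
}
```

The two rejected forms mirror the rules above: mock function is refused outright, and a header without () fails with a hint that shows the corrected spelling.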
Runs a workflow and captures its output into a variable:
const response = run w.default()
Capture semantics (see runTestBlock in node-test-runner.ts) pick the first branch that applies:
1. A non-empty return string from the workflow: that return value is captured.
2. A failed run with a runtime error string: that error string is captured (use allow_failure when you assert on failure output).
3. Otherwise: the *.out step captures in the run directory, read in sorted filename order; if listing or reading those files fails, it falls back to the runtime’s aggregated output string.

The test still fails on non-zero exit unless allow_failure is set; capture content is independent of that check.
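The "first branch that applies" ordering can be sketched as a single function. Everything here is a hypothetical model rather than the code in runTestBlock: the RunOutcome shape is invented, the error-string branch is inferred from the limitations summary on this page, and joining the *.out contents without a separator is an assumption.

```typescript
// Hypothetical summary of one workflow run inside a test block.
interface RunOutcome {
  returnValue?: string;
  errorString?: string; // assumption: set only when a failed run reports an error
  // null models a failure to list or read the *.out files.
  outFiles: { name: string; content: string }[] | null;
  aggregatedOutput: string;
}

function pickCapture(outcome: RunOutcome): string {
  // 1. A non-empty return value wins.
  if (outcome.returnValue !== undefined && outcome.returnValue !== "") {
    return outcome.returnValue;
  }
  // 2. Next, a runtime error string (inferred branch, see lead-in).
  if (outcome.errorString) {
    return outcome.errorString;
  }
  // 3. Otherwise, *.out captures in sorted filename order.
  if (outcome.outFiles !== null) {
    return outcome.outFiles
      .slice()
      .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
      .map((file) => file.content)
      .join(""); // separator is an assumption
  }
  // Fallback when the files cannot be listed or read.
  return outcome.aggregatedOutput;
}
```

The pass/fail decision is deliberately absent from this function, matching the note that capture content is independent of the exit-status check.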
Variants:
# With an argument
const response = run w.default("my input")
# Allow failure
const response = run w.default() allow_failure
# With argument and allow failure
const response = run w.default("my input") allow_failure
Runs a workflow without storing output. Still fails on non-zero exit unless allow_failure is appended:
run w.setup()
run w.setup("arg")
run w.setup() allow_failure
Inside a test block, const NAME = "value" binds a test-local string (double-quoted literal only; no interpolation). Names can be used as:
- mock prompt NAME: the next prompt consumes the bound value
- the second argument of expect_contain, expect_not_contain, or expect_equal when written as a bare identifier (not quoted)

const bindings used for mock prompt or expected values must appear before the steps that read them. Capture variables (const x = run w.default()) are separate: only const … = run … introduces a capture name for expect_*.
After capturing workflow output, use these to check the result:
expect_contain response "expected substring"
expect_not_contain response "unwanted text"
expect_equal response "exact expected value"
The second argument is either a double-quoted string (with \", \n, and \\ escapes) or a const name bound earlier in the same test block (see Test block constants):
const want = "expected substring"
expect_contain response want
expect_equal failures print a short diff-style - / + preview; substring assertions report lengths and the expected fragment.
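How the second argument is resolved and then compared can be sketched as follows. The function names are hypothetical, the escape handling covers only the three documented escapes, and failure reporting (the diff preview and length messages) is not modeled.

```typescript
// Decode the documented escapes in a double-quoted literal body: \" \n \\
function unescapeLiteral(body: string): string {
  return body.replace(/\\(["n\\])/g, (_match: string, ch: string) => (ch === "n" ? "\n" : ch));
}

// The argument is either a double-quoted literal or the name of a test const.
function resolveExpected(arg: string, consts: { [name: string]: string }): string {
  if (arg.length >= 2 && arg.charAt(0) === '"' && arg.charAt(arg.length - 1) === '"') {
    return unescapeLiteral(arg.slice(1, -1));
  }
  if (Object.prototype.hasOwnProperty.call(consts, arg)) {
    return consts[arg];
  }
  throw new Error("unknown const: " + arg);
}

type ExpectKind = "expect_contain" | "expect_not_contain" | "expect_equal";

// Returns true when the assertion holds.
function checkExpect(kind: ExpectKind, actual: string, expected: string): boolean {
  if (kind === "expect_contain") {
    return actual.indexOf(expected) !== -1;
  }
  if (kind === "expect_not_contain") {
    return actual.indexOf(expected) === -1;
  }
  return actual === expected;
}
```

Resolving the argument before comparing keeps the three assertions identical for literals and const names, which is why a const must already be bound when the assertion runs.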
When a workflow uses typed prompts (returns "{ ... }"), mock text must be a single line of valid JSON matching the schema so that parsing and field variables work correctly. Fields are accessed with dot notation — ${result.field} — in log, return, and other interpolation contexts. See e2e/prompt_returns_run_capture.test.jh and e2e/dot_notation.test.jh for examples.
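A quick pre-check for typed-prompt mock text might look like the sketch below. It is not part of Jaiph: the function name is hypothetical, and because the schema format is not described here the check covers only "single line", "valid JSON object", and "named fields are present".

```typescript
// Hypothetical sanity check for mock text used with a typed prompt.
// Returns a list of problems; an empty list means the text looks usable.
function checkTypedMock(mockText: string, requiredFields: string[]): string[] {
  const problems: string[] = [];
  if (/[\r\n]/.test(mockText)) {
    problems.push("mock text must be a single line");
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(mockText);
  } catch (_err) {
    problems.push("mock text is not valid JSON");
    return problems;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    problems.push("mock text must be a JSON object");
    return problems;
  }
  const fields = parsed as { [key: string]: unknown };
  for (const name of requiredFields) {
    if (!(name in fields)) {
      problems.push("missing field: " + name);
    }
  }
  return problems;
}
```

A field that is missing from the mock would otherwise surface later as an empty or failing ${result.field} interpolation, so checking the mock text up front gives a clearer error.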
Each test block runs in isolation. Failed assertions, harness/runtime errors while executing the block, or a workflow exiting non-zero (without allow_failure) mark that case as failed.
The runner output looks like:
testing workflow_greeting.test.jh
▸ runs happy path and prints PASS
✓ 0s
▸ handles error case
✗ expect_contain failed: "response" (42 chars) does not contain "expected" 1s
✗ 1 / 2 test(s) failed
- handles error case
When all tests pass: ✓ N test(s) passed. Exit status is 0 on full success, non-zero if any test failed.
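The summary line and exit status follow directly from the per-case results, as in this sketch. The types and function name are hypothetical, the indentation of the failed-case list is a guess, and the concrete non-zero exit code (1 here) is an assumption since only "non-zero" is stated.

```typescript
interface CaseResult {
  name: string;
  passed: boolean;
}

// Build the closing summary lines and the process exit code for one test file.
function summarizeResults(results: CaseResult[]): { lines: string[]; exitCode: number } {
  const failed = results.filter((result) => !result.passed);
  if (failed.length === 0) {
    return { lines: ["✓ " + results.length + " test(s) passed"], exitCode: 0 };
  }
  const lines = ["✗ " + failed.length + " / " + results.length + " test(s) failed"];
  for (const result of failed) {
    lines.push("  - " + result.name);
  }
  // Assumption: any failure maps to exit code 1.
  return { lines, exitCode: 1 };
}
```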
The CLI parses each test file and passes test "…" { … } blocks to runTestFile() (src/runtime/kernel/node-test-runner.ts). That path aligns with Architecture — Test runner integration:
1. buildScripts(testFileAbs, tmpDir, workspaceRoot): same helper as jaiph run, with the test file as the entrypoint (test.ts calls it with the absolute path to the *.test.jh file). For a file entrypoint, the transpiler walks the test module and every file reachable by transitive import (see collectTransitiveJhModules in src/transpile/build.ts); it runs validateReferences / emitScriptsForModule per file and writes atomic script files into a temp scripts/ tree. (If buildScripts were ever given a directory entrypoint, directory walks skip *.test.jh files; that is not how jaiph test invokes it.)
2. buildRuntimeGraph(testFileAbs, workspaceRoot): called once per test file; the same graph is reused for every test block in that file and for every run step inside them.
3. Each run step executes in-process in NodeWorkflowRuntime, not in a detached child.

There is no Bash transpilation of full workflows on this path; only extracted script bodies are shell, same as production. The import graph is fixed for a single jaiph test process; mutating imported *.jh on disk between blocks is not a supported use case.
For each workflow run inside a test block, the harness builds the runtime environment from process.env plus:
| Variable | Value |
|---|---|
| JAIPH_TEST_MODE | 1 (selects mock prompt dispatch in prompt.ts) |
| JAIPH_WORKSPACE | Project root (from detectWorkspaceRoot) |
| JAIPH_RUNS_DIR | Per test block, …/tmp/jaiph-test-block-*/.jaiph/runs (ephemeral) |
| JAIPH_SCRIPTS | Directory containing extracted script files from buildScripts (temp) |
| JAIPH_MOCK_RESPONSES_JSON | JSON array of strings: sequential inline mock prompt "…" / mock prompt <const> responses (only when no mock prompt { … } block exists in that case) |
| JAIPH_MOCK_PROMPT_ARMS_JSON | JSON payload for pattern-based mock prompt { … } arms (in-process dispatch in mock.ts / prompt.ts; mutually exclusive with the responses queue for that run) |
You do not set mock variables or JAIPH_TEST_MODE yourself; the harness sets them for each run … step that starts an in-process NodeWorkflowRuntime. JAIPH_TEST_MODE routes prompt steps to the mock path in prompt.ts. Suppression of live __JAIPH_EVENT__ lines on stderr is controlled by suppressLiveEvents: true on that runtime (see Architecture — Test runner integration), not by JAIPH_TEST_MODE; durable run_summary.jsonl writes still append. Production jaiph run uses a spawned node-workflow-runner child without suppressLiveEvents, so live events keep streaming to stderr there.
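The way the per-run environment is layered on top of the process environment can be sketched like this. The function and its input types are hypothetical, the shape of the arms payload is left opaque, and emitting an empty "[]" queue when no inline mocks exist is an assumption.

```typescript
type Env = { [key: string]: string | undefined };

interface RunDirs {
  workspaceRoot: string;
  runsDir: string;
  scriptsDir: string;
}

interface MockSetup {
  queuedResponses: string[];
  armsPayload?: object; // present only when the test block has `mock prompt { ... }`
}

// Hypothetical sketch: copy the base environment, then add the harness variables.
function buildTestEnv(base: Env, dirs: RunDirs, mocks: MockSetup): Env {
  const env: Env = {};
  for (const key of Object.keys(base)) {
    env[key] = base[key];
  }
  env["JAIPH_TEST_MODE"] = "1";
  env["JAIPH_WORKSPACE"] = dirs.workspaceRoot;
  env["JAIPH_RUNS_DIR"] = dirs.runsDir;
  env["JAIPH_SCRIPTS"] = dirs.scriptsDir;
  // The two mock variables are mutually exclusive for a given run.
  if (mocks.armsPayload !== undefined) {
    env["JAIPH_MOCK_PROMPT_ARMS_JSON"] = JSON.stringify(mocks.armsPayload);
  } else {
    env["JAIPH_MOCK_RESPONSES_JSON"] = JSON.stringify(mocks.queuedResponses);
  }
  return env;
}
```

Building a fresh object per run, rather than mutating the base environment, keeps one test block's mock state from leaking into the next.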
A Given / When / Then structure works well but is not required — comments and blank lines are fine:
import "app.jh" as app
test "default workflow prints greeting" {
# Given
mock prompt "hello"
# When
const out = run app.default()
# Then
expect_contain out "hello"
}
Compiler tests verify parse and validate outcomes using a language-agnostic txtar format. Unlike the TypeScript-embedded tests in src/, these fixtures are plain text files that can be reused by alternative implementations (e.g. a Rust compiler).
Test fixture files live in test-fixtures/compiler-txtar/ as .txt files. Each file contains multiple test cases separated by === delimiters:
=== test name here
# @expect ok
--- input.jh
workflow default() {
log "hello"
}
=== another test
# @expect error E_PARSE "unterminated workflow block"
--- input.jh
workflow default() {
log "hello"
- A === <name> line starts a new test case. Everything until the next === (or EOF) belongs to that case.
- A --- <filename> line starts a virtual file within the test case. Filenames must end in .jh.
- A # @expect <directive> line declares the expected outcome and must appear before the first --- marker.

| Directive | Meaning |
|---|---|
| # @expect ok | Parse + validate succeed with no errors |
| # @expect error E_CODE "substring" | An error is thrown whose message contains both E_CODE and substring (substring must be double-quoted in the fixture; the runner parses that form only) |
| # @expect error E_CODE "substring" @L | Same, and the error must be reported at line L (any column) |
| # @expect error E_CODE "substring" @L:C | Same, and the error must be reported at line L, column C |
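The four directive forms in the table can be parsed with one regular expression, as in this sketch. It is not the code in compiler-test-runner.ts: the types, the function name, and the assumption that the quoted substring contains no escaped quotes are all hypothetical.

```typescript
interface OkExpectation {
  kind: "ok";
}

interface ErrorExpectation {
  kind: "error";
  code: string;
  substring: string;
  line?: number;
  column?: number;
}

type Expectation = OkExpectation | ErrorExpectation;

// Hypothetical parser for a "# @expect ..." line.
function parseExpectDirective(line: string): Expectation {
  const directive = /^#\s*@expect\s+(.*)$/.exec(line.trim());
  if (!directive) {
    throw new Error("not an @expect directive: " + line);
  }
  const rest = directive[1].trim();
  if (rest === "ok") {
    return { kind: "ok" };
  }
  // error E_CODE "substring" with an optional @L or @L:C suffix
  const parts = /^error\s+(E_[A-Z_]+)\s+"([^"]*)"(?:\s+@(\d+)(?::(\d+))?)?$/.exec(rest);
  if (!parts) {
    throw new Error("unsupported @expect form: " + rest);
  }
  const expectation: ErrorExpectation = { kind: "error", code: parts[1], substring: parts[2] };
  if (parts[3] !== undefined) {
    expectation.line = Number(parts[3]);
  }
  if (parts[4] !== undefined) {
    expectation.column = Number(parts[4]);
  }
  return expectation;
}
```

Leaving line and column undefined when the suffix is absent lets a checker distinguish "any position" from "line only" and "line and column".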
- Use --- input.jh for a single-file case. The runner parses and validates input.jh.
- Use --- input.test.jh for test-specific fixtures.
- Use --- main.jh as the entry file plus additional --- lib.jh etc. for multi-file cases. The runner parses and validates main.jh as the entry.

The entry file is determined by priority: main.jh if present, otherwise input.jh, otherwise input.test.jh, otherwise the first file.
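Splitting a fixture into cases and choosing the entry file can be sketched as below. This is an illustration rather than the repository's runner: the type and function names are hypothetical, the directive text is kept as a raw string, and the sketch does not handle virtual-file content that itself begins with === or ---.

```typescript
interface FixtureFile {
  name: string;
  content: string;
}

interface FixtureCase {
  name: string;
  expect: string | null; // raw text after "# @expect "
  files: FixtureFile[];
}

// Hypothetical line-based splitter for the fixture format described above.
function parseFixture(text: string): FixtureCase[] {
  const cases: FixtureCase[] = [];
  for (const line of text.split("\n")) {
    const header = /^=== (.+)$/.exec(line);
    if (header) {
      cases.push({ name: header[1].trim(), expect: null, files: [] });
      continue;
    }
    if (cases.length === 0) {
      continue; // ignore anything before the first case
    }
    const current = cases[cases.length - 1];
    const marker = /^--- (\S+)$/.exec(line);
    if (marker) {
      current.files.push({ name: marker[1], content: "" });
      continue;
    }
    if (current.files.length === 0) {
      // Directives are only recognized before the first file marker.
      const directive = /^# @expect (.+)$/.exec(line);
      if (directive) {
        current.expect = directive[1].trim();
      }
      continue;
    }
    current.files[current.files.length - 1].content += line + "\n";
  }
  return cases;
}

// Entry priority: main.jh, then input.jh, then input.test.jh, then the first file.
function pickEntryFile(fileNames: string[]): string | undefined {
  const priority = ["main.jh", "input.jh", "input.test.jh"];
  for (const candidate of priority) {
    if (fileNames.indexOf(candidate) !== -1) {
      return candidate;
    }
  }
  return fileNames[0];
}
```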
npm run test:compiler
The runner (test-infra/compiler-test-runner.ts) discovers all .txt files in test-fixtures/compiler-txtar/, parses them, writes virtual files to a temp directory per case, runs parsejaiph + validateReferences, and asserts the expected outcome. Results are reported per test case via node:test. Compiler tests are also included in npm test.
Test cases are organized by error type and single-vs-multi-module:
| File | Cases | What it covers |
|---|---|---|
| test-fixtures/compiler-txtar/valid.txt | 119 | Success cases: source compiles without error (single-module) |
| test-fixtures/compiler-txtar/parse-errors.txt | 282 | E_PARSE error cases: syntax and grammar violations |
| test-fixtures/compiler-txtar/validate-errors.txt | 92 | E_VALIDATE, E_IMPORT_NOT_FOUND, E_SCHEMA error cases (single-module) |
| test-fixtures/compiler-txtar/validate-errors-multi-module.txt | 20 | Validation errors requiring imports (multi-file) |
(Counts are lines matching # @expect in each .txt file; the runner also registers separate node:test meta-tests in compiler-test-runner.ts. Re-count after large fixture changes.)
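Re-counting after a fixture change amounts to counting directive lines. The helper below is a hypothetical sketch that assumes directives start at the beginning of a line; it takes the file text as a string so it stays independent of how the file is read.

```typescript
// Count lines that start with "# @expect" in one fixture file's text.
function countExpectLines(fixtureText: string): number {
  return fixtureText.split("\n").filter((line) => /^# @expect\b/.test(line)).length;
}
```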
The initial cases were extracted from TypeScript test files across src/parse/*.test.ts and src/transpile/*.test.ts. Additional cases were written directly as txtar fixtures to cover compiler error paths that had no prior test coverage. Only tests that verify “source in, pass/fail out” qualify — tests that check AST structure or internal APIs remain in TypeScript.
There is one .txt file per category. The format is documented in detail in test-fixtures/compiler-txtar/README.md.
Golden AST tests verify that the parser produces the expected tree shape for successful parses. While compiler tests (txtar) cover pass/fail outcomes and E2E tests cover runtime behavior, golden AST tests lock in what the parser actually produced — so refactors cannot silently change tree structure.
Each .jh fixture in test-fixtures/golden-ast/fixtures/ is parsed and serialized to deterministic JSON (locations and file paths stripped, keys sorted). The result is compared against a checked-in .json golden file in test-fixtures/golden-ast/expected/.
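Deterministic serialization of this kind usually comes down to dropping volatile keys and sorting the rest before printing. The sketch below shows the idea only: the function names are hypothetical, and the key names "loc" and "filePath" are placeholders, since the real AST field names are not given here.

```typescript
// Recursively drop the listed keys and rebuild objects with sorted keys.
function normalizeAst(node: unknown, dropKeys: string[]): unknown {
  if (Array.isArray(node)) {
    return node.map((child) => normalizeAst(child, dropKeys));
  }
  if (typeof node === "object" && node !== null) {
    const source = node as { [key: string]: unknown };
    const result: { [key: string]: unknown } = {};
    for (const key of Object.keys(source).sort()) {
      if (dropKeys.indexOf(key) !== -1) {
        continue;
      }
      result[key] = normalizeAst(source[key], dropKeys);
    }
    return result;
  }
  return node; // primitives pass through unchanged
}

// "loc" and "filePath" are hypothetical names for location and path fields.
function toGoldenJson(ast: unknown): string {
  return JSON.stringify(normalizeAst(ast, ["loc", "filePath"]), null, 2) + "\n";
}
```

Because JSON.stringify emits string keys in insertion order, inserting them in sorted order makes two structurally equal trees serialize to identical text, so a plain string comparison against the checked-in file is enough.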
npm run test:golden-ast
Golden AST tests are also included in npm test.
When an intentional parser change alters AST shape, regenerate the golden files:
UPDATE_GOLDEN=1 npm run test:golden-ast
Review the diff to confirm the changes are expected, then commit the updated .json files.
1. Add a .jh file in test-fixtures/golden-ast/fixtures/ (one concern per file).
2. Run UPDATE_GOLDEN=1 npm run test:golden-ast to generate test-fixtures/golden-ast/expected/<name>.json.

For concurrency-sensitive behavior (for example inbox stress with many sends and route targets, or run async with interleaved managed steps), the repository includes shell-based E2E scenarios that go beyond single native tests.
See e2e/tests/91_inbox_dispatch.sh, e2e/tests/93_inbox_stress.sh, and e2e/tests/94_parallel_shell_steps.sh for examples.
Some CLI behavior only activates when stdout is a real TTY — the live progress tree with ANSI redraws, for example. These tests use Python’s pty.openpty() to spawn jaiph run under a pseudo-terminal, capture the raw byte stream, and assert on the rendered output.
Two PTY tests exist today:
| Test file | What it covers |
|---|---|
| e2e/tests/81_tty_progress_tree.sh | Synchronous workflow progress rendering: verifies the tree structure, step timing, and PASS/FAIL markers under a real TTY. |
| e2e/tests/131_tty_async_progress.sh | Async workflow progress rendering: verifies that run async branches (with Handle<T> deferred resolution) render per-branch progress events under subscript-numbered nodes (₁, ₂), that both branches show resolved return values in the final frame, and that no orphaned ANSI escape sequences appear. |
Both tests require Python 3 and use only deterministic, non-LLM steps (sleep loops, log, scripts) so results are reproducible. Assertions use assert_contains with order-insensitive matching because async interleaving and PTY redraws make exact full-output comparison infeasible.
Shell harnesses and CI expectations for the full repo are described in Contributing — E2E testing.
E2E tests compare full CLI output and full artifact file contents by default. Use e2e::expect_stdout, e2e::expect_out, e2e::expect_file, e2e::expect_run_file, or e2e::assert_equals. Substring checks (e2e::assert_contains) require an inline comment justifying the exception. For the full policy (two surfaces, full equality, assert_contains exceptions, normalization), see Contributing — E2E testing. For the on-disk tree under .jaiph/runs/, see Architecture — Durable artifact layout.
Every .jh sample under e2e/ must be wired into at least one test. Run bash e2e/check_orphan_samples.sh to detect unreferenced fixtures. See Contributing — Orphan sample guard for details.
Similarly, every .jh and .test.jh file under examples/ must be accounted for in e2e/tests/110_examples.sh — either exercised with strict assertions or explicitly excluded with a rationale. An orphan guard in that script enforces this. See Contributing — Example matrix guard for details.
The project includes a Playwright-based test (e2e/playwright/landing-page.spec.ts) that verifies landing-page code samples stay in sync with real CLI behavior. Run it with npm run test:samples. See Contributing — Landing-page sample verification for details.
- Prompt mocks are declared inline (mock prompt "…", mock prompt <const>, or mock prompt { … }); there are no external mock-config file paths. Inline responses must use double quotes (not single quotes).
- Do not combine mock prompt { … } with queue-style mock prompt "…" / mock prompt <const> in the same test block; when a block is present, queued entries are ignored.
- mock workflow / mock rule / mock script require ref() with parentheses, with empty () when there are no parameters.
- run captures prefer a workflow return value over *.out files; exit 0 with an empty return, failures without a runtime error string, and other edge cases use the *.out / aggregated-output path described above.
- The expect_* right-hand side is either a double-quoted literal or a test const name, not an arbitrary expression.
- expectContain / expectEqual / expectNotContain (camelCase) are rejected; use expect_contain, expect_equal, expect_not_contain.
- Extra arguments after the path (jaiph test <path> [extra...]) are accepted but ignored (reserved for future use).