Testing Jaiph Workflows

Jaiph includes a built-in harness for testing workflows. Test files (*.test.jh) let you mock prompt responses, stub workflows, rules, and scripts, run workflows through the same in-process Node workflow runtime that jaiph run uses (NodeWorkflowRuntime), and assert on captured output, all without calling real LLMs or depending on external state. Unlike jaiph run, the test harness does not spawn a separate node-workflow-runner process: after buildScripts, the CLI runs runTestFile from node-test-runner.ts in the same process. There is no Docker mode for jaiph test; workflows under test always run on the host. The system layout (including Test runner integration and the Node test runner) is described in Architecture.

In production, a workflow’s behavior depends on live models, host timing, and local files. A harness fixes inputs (mock prompts, stubbed workflows/scripts), runs the same interpreter the CLI uses for real runs, and checks outputs with small assertions so CI and refactors can catch regressions without external services.

File naming and layout

Test files use the .test.jh suffix (for example workflow_greeting.test.jh).

A test file supports the same top-level forms as any .jh file (import, config, workflow, etc.), but the CLI only executes test "..." { ... } blocks. Other declarations are parsed into the runtime graph — for example, a local workflow is visible to single-segment references.

Recommended style: keep test files to import statements and test blocks. Define the workflows under test in separate modules so files stay small and focused.

Import paths in import "..." as alias resolve relative to the test file’s directory, with the same extension handling as ordinary modules (.jh is appended when omitted). See Grammar — Lexical notes.

Running tests

# All *.test.jh files under the detected workspace root (recursive)
jaiph test

# All tests under a directory (recursive)
jaiph test ./e2e

# One file
jaiph test ./e2e/workflow_greeting.test.jh

# Equivalent shorthand (a *.test.jh path is treated as jaiph test)
jaiph ./e2e/workflow_greeting.test.jh

Discovery: jaiph test walks the given directory recursively, or the workspace root when no path is passed. The workspace root is found by walking up from the current directory until a directory containing .jaiph or .git is found; if neither exists, the current directory itself is used.

If no *.test.jh files are found, the command prints an error and exits with status 1. A file must contain at least one test block; otherwise the CLI reports a parse error. Passing a *.jh file that does not use the *.test.jh suffix is rejected; use jaiph run for those.

Test block syntax

Each test block is a named test case containing ordered steps:

import "workflow_greeting.jh" as w

test "runs happy path and prints PASS" {
  mock prompt "e2e-greeting-mock"
  const response = run w.default()
  expect_contain response "e2e-greeting-mock"
  expect_contain response "done"
}

Inside a test block, steps execute in order. The following step types are available.

Mock prompt (inline)

Queues a fixed response for the next prompt call in the workflow under test. Multiple mock prompt lines queue in order — one is consumed per prompt call.

mock prompt "hello from mock"
mock prompt "second response"
mock prompt myConstName

Use a double-quoted string (escapes: \", \n, \\) or a bare identifier for a test const defined earlier in the block.
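
Queue consumption can be seen end to end in a sketch like the following, which assumes a hypothetical imported workflow w.chat() that issues two prompt calls and includes both responses in its output:

test "consumes queued mocks in order" {
  mock prompt "first reply"
  mock prompt "second reply"
  const out = run w.chat()
  expect_contain out "first reply"
  expect_contain out "second reply"
}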

Mock prompt (content-based dispatch)

Dispatches different responses based on the prompt text using pattern matching. Arms are tested top-to-bottom; the first match wins.

mock prompt {
  /greeting/ => "hello"
  /farewell/ => "goodbye"
  _ => "default response"
}

Each arm is pattern => "response". A pattern is either a regex literal (/…/, tested against the prompt text) or the wildcard _, which matches any prompt.

Without a _ wildcard arm, an unmatched prompt fails the test.

Do not combine mock prompt { ... } with inline mock prompt "..." in the same test block — when a block mock is present, inline queue entries are ignored.
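
A complete dispatch-based test might look like this sketch; w.summarize is a hypothetical imported workflow whose prompt text is assumed to contain "summary":

test "dispatches on prompt content" {
  mock prompt {
    /summary/ => "a short summary"
    _ => "fallback"
  }
  const out = run w.summarize()
  expect_contain out "a short summary"
}

The _ arm keeps any additional prompts from failing the test.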

Mock workflow

Replaces a workflow body for this test case with Jaiph steps:

mock workflow w.greet() {
  return "stubbed greeting"
}

The reference format is <alias>.<workflow> (preferred) or <name> for a workflow defined in the test file itself.
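
A stub is usually paired with a run of the workflow that depends on it. The sketch below assumes a hypothetical w.default() that calls w.greet() and includes its result in the output:

test "uses the stubbed greeting" {
  mock workflow w.greet() {
    return "stubbed greeting"
  }
  const out = run w.default()
  expect_contain out "stubbed greeting"
}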

Mock rule

Same as mock workflow, but for rules (body uses Jaiph steps, not shell):

mock rule w.validate() {
  return "stubbed validation"
}

Mock script

Stubs a module script block:

mock script w.helper() {
  echo "stubbed script"
}

Test stubs use mock script, not mock function; the latter is a parse error with a fix hint.

Workflow run (with capture)

Runs a workflow and captures its output into a variable:

const response = run w.default()

Capture semantics match production behavior:

  1. If the workflow exits 0 with a non-empty explicit return value, that string is captured.
  2. If the workflow fails (non-zero exit), the runtime error string is captured (when present).
  3. Otherwise, the harness reads all *.out files in the run directory sorted by filename, or falls back to the runtime’s aggregated output.

The test fails on non-zero exit unless allow_failure is specified.

Variants:

# With an argument
const response = run w.default("my input")

# Allow failure
const response = run w.default() allow_failure

# With argument and allow failure
const response = run w.default("my input") allow_failure
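
Together with capture rule 2 above, allow_failure makes it possible to assert on the error text of a workflow that is expected to fail. The sketch assumes a hypothetical w.flaky() whose runtime error mentions "timeout":

test "surfaces the failure message" {
  const out = run w.flaky() allow_failure
  expect_contain out "timeout"
}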

Workflow run (no capture)

Runs a workflow without storing output. Still fails on non-zero exit unless allow_failure is appended:

run w.setup()
run w.setup("arg")
run w.setup() allow_failure

Test block constants

Inside a test block, const NAME = "value" binds a test-local string (double-quoted literal only; no interpolation). A bound name can then stand in for a string literal, either as the value of a mock prompt step or as the expected value of an assertion.

const bindings used for mock prompt or expected values must appear before the steps that read them. Capture variables (const x = run w.default()) are separate: only const … = run … introduces a capture name for expect_*.
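
A sketch showing one const in both roles, assuming a hypothetical w.default() that echoes its prompt response:

test "reuses a test constant" {
  const want = "hello from const"
  mock prompt want
  const out = run w.default()
  expect_contain out want
}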

Assertions

After capturing workflow output, use these to check the result:

expect_contain response "expected substring"
expect_not_contain response "unwanted text"
expect_equal response "exact expected value"

The second argument is either a double-quoted string (with \", \n, and \\ escapes) or a const name bound earlier in the same test block (see Test block constants):

const want = "expected substring"
expect_contain response want

Failures print expected vs. actual previews.

Typed prompts

When a workflow uses typed prompts (returns "{ ... }"), mock text must be a single line of valid JSON matching the schema so that parsing and field variables work correctly. Fields are accessed with dot notation — ${result.field} — in log, return, and other interpolation contexts. See e2e/prompt_returns_run_capture.test.jh and e2e/dot_notation.test.jh for examples.
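
As an illustration, assume a hypothetical workflow whose prompt declares a schema with name and score fields and interpolates ${result.name} into its output; the mock must then be one line of matching JSON, with quotes escaped:

test "parses the typed mock" {
  mock prompt "{\"name\": \"Ada\", \"score\": \"9\"}"
  const out = run w.default()
  expect_contain out "Ada"
}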

Pass/fail reporting

Each test block runs in isolation. Assertions, shell errors, or a workflow exiting non-zero (without allow_failure) mark that case as failed.

The runner output looks like:

testing workflow_greeting.test.jh
  ▸ runs happy path
  ✓ 0s
  ▸ handles error case
  ✗ expect_contain failed: "response" (42 chars) does not contain "expected" 1s

✗ 1 / 2 test(s) failed
  - handles error case

When all tests pass: ✓ N test(s) passed. Exit status is 0 on full success, non-zero if any test failed.

How it works

The CLI parses each test file and passes test "…" { … } blocks to runTestFile() (src/runtime/kernel/node-test-runner.ts). That path aligns with the Test runner integration description in Architecture:

  1. buildScripts(testFileAbs, tmpDir, workspaceRoot) — same helper as jaiph run, with the test file as the entrypoint (test.ts calls it with the absolute path to the *.test.jh file). For a file entrypoint, the transpiler walks the test module and every file reachable by transitive import (see collectTransitiveJhModules in src/transpile/build.ts); it runs validateReferences / emitScriptsForModule per file and writes atomic script files into a temp scripts/ tree. (If buildScripts were ever given a directory entrypoint, directory walks skip *.test.jh files — that is not how jaiph test invokes it.)
  2. buildRuntimeGraph(testFileAbs, workspaceRoot) — called once per test file; the same graph is reused for every test block in that file and for every run step inside them.
  3. For each block, a fresh temp layout sets env vars (below); workflows run in NodeWorkflowRuntime, not in a detached child.

There is no Bash transpilation of full workflows on this path — only extracted script bodies are shell, same as production. The import graph is fixed for a single jaiph test process; mutating imported *.jh on disk between blocks is not a supported use case.

Environment variables

For each workflow run inside a test block, the harness builds the runtime environment from process.env plus:

Variable Value
JAIPH_TEST_MODE 1 (selects mock prompt dispatch in prompt.ts)
JAIPH_WORKSPACE Project root (from detectWorkspaceRoot)
JAIPH_RUNS_DIR Per test block, …/tmp/jaiph-test-block-*/.jaiph/runs (ephemeral)
JAIPH_SCRIPTS Directory containing extracted script files from buildScripts (temp)
JAIPH_MOCK_RESPONSES_FILE or JAIPH_MOCK_DISPATCH_SCRIPT Set by the runner when using inline or block mock prompt (do not set manually)

You do not set JAIPH_TEST_MODE yourself; the harness manages it. Its only purpose is to route prompt steps to the mock dispatcher in prompt.ts. It no longer controls __JAIPH_EVENT__ stderr suppression — the test runner now passes suppressLiveEvents: true directly to the in-process NodeWorkflowRuntime constructor so test reporter output stays clean. Durable run_summary.jsonl writes are unaffected; production runs (jaiph run via the spawned node-workflow-runner child) do not set the flag and stream events to stderr as before.

Organizing tests

A Given / When / Then structure works well but is not required — comments and blank lines are fine:

import "app.jh" as app

test "default workflow prints greeting" {
  # Given
  mock prompt "hello"

  # When
  const out = run app.default()

  # Then
  expect_contain out "hello"
}

Compiler tests (txtar format)

Compiler tests verify parse and validate outcomes using a language-agnostic txtar format. Unlike the TypeScript-embedded tests in src/, these fixtures are plain text files that can be reused by alternative implementations (e.g. a Rust compiler).

Test fixture files live in test-fixtures/compiler-txtar/ as .txt files. Each file contains multiple test cases separated by === delimiters:

=== test name here
# @expect ok
--- input.jh
workflow default() {
  log "hello"
}

=== another test
# @expect error E_PARSE "unterminated workflow block"
--- input.jh
workflow default() {
  log "hello"

Format rules

Expect directives

Directive Meaning
# @expect ok Parse + validate succeed with no errors
# @expect error E_CODE "substring" An error is thrown whose message contains both E_CODE and substring
# @expect error E_CODE "substring" @L Same, and the error must be reported at line L (any column)
# @expect error E_CODE "substring" @L:C Same, and the error must be reported at line L, column C

Single-file vs multi-file tests

The entry file is determined by priority: main.jh if present, otherwise input.jh, otherwise input.test.jh, otherwise the first file.
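
A multi-file case might look like the following sketch (the workflow bodies are illustrative); main.jh takes priority as the entry file:

=== import resolves across files
# @expect ok
--- main.jh
import "lib.jh" as lib

workflow default() {
  log "entry"
}
--- lib.jh
workflow helper() {
  log "helper"
}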

Running compiler tests

npm run test:compiler

The runner (test-infra/compiler-test-runner.ts) discovers all .txt files in test-fixtures/compiler-txtar/, parses them, writes virtual files to a temp directory per case, runs parsejaiph + validateReferences, and asserts the expected outcome. Results are reported per test case via node:test. Compiler tests are also included in npm test.

Fixture files

Test cases are organized by error type and single-vs-multi-module:

File Cases What it covers
test-fixtures/compiler-txtar/valid.txt 119 Success cases — source compiles without error (single-module)
test-fixtures/compiler-txtar/parse-errors.txt 274 E_PARSE error cases — syntax and grammar violations
test-fixtures/compiler-txtar/validate-errors.txt 88 E_VALIDATE, E_IMPORT_NOT_FOUND, E_SCHEMA error cases (single-module)
test-fixtures/compiler-txtar/validate-errors-multi-module.txt 20 Validation errors requiring imports (multi-file)

(Counts assume one # @expect directive per test case; re-count them after large fixture changes.)

The initial cases were extracted from TypeScript test files across src/parse/*.test.ts and src/transpile/*.test.ts. Additional cases were written directly as txtar fixtures to cover compiler error paths that had no prior test coverage. Only tests that verify “source in, pass/fail out” qualify — tests that check AST structure or internal APIs remain in TypeScript.

Conventions

The format is documented in detail in test-fixtures/compiler-txtar/README.md.

Golden AST tests

Golden AST tests verify that the parser produces the expected tree shape for successful parses. While compiler tests (txtar) cover pass/fail outcomes and E2E tests cover runtime behavior, golden AST tests lock in what the parser actually produced — so refactors cannot silently change tree structure.

How it works

Each .jh fixture in test-fixtures/golden-ast/fixtures/ is parsed and serialized to deterministic JSON (locations and file paths stripped, keys sorted). The result is compared against a checked-in .json golden file in test-fixtures/golden-ast/expected/.

Running golden AST tests

npm run test:golden-ast

Golden AST tests are also included in npm test.

Updating goldens

When an intentional parser change alters AST shape, regenerate the golden files:

UPDATE_GOLDEN=1 npm run test:golden-ast

Review the diff to confirm the changes are expected, then commit the updated .json files.

Adding a new fixture

  1. Create a small, focused .jh file in test-fixtures/golden-ast/fixtures/ (one concern per file).
  2. Run UPDATE_GOLDEN=1 npm run test:golden-ast to generate test-fixtures/golden-ast/expected/<name>.json.
  3. Review the generated JSON and commit both files.

Stress and soak testing

For concurrency-sensitive behavior (for example inbox stress with many sends and route targets, or run async with interleaved managed steps), the repository includes shell-based E2E scenarios that go beyond what a single native test covers:

See e2e/tests/91_inbox_dispatch.sh, e2e/tests/93_inbox_stress.sh, and e2e/tests/94_parallel_shell_steps.sh for examples.

PTY-based TTY tests

Some CLI behavior only activates when stdout is a real TTY — the live progress tree with ANSI redraws, for example. These tests use Python’s pty.openpty() to spawn jaiph run under a pseudo-terminal, capture the raw byte stream, and assert on the rendered output.

Two PTY tests exist today:

Test file What it covers
e2e/tests/81_tty_progress_tree.sh Synchronous workflow progress rendering — verifies the tree structure, step timing, and PASS/FAIL markers under a real TTY.
e2e/tests/131_tty_async_progress.sh Async workflow progress rendering — verifies that run async branches (with Handle<T> deferred resolution) render per-branch progress events under subscript-numbered nodes (₁, ₂), that both branches show resolved return values in the final frame, and that no orphaned ANSI escape sequences appear.

Both tests require Python 3 and use only deterministic, non-LLM steps (sleep loops, log, scripts) so results are reproducible. Assertions use assert_contains with order-insensitive matching because async interleaving and PTY redraws make exact full-output comparison infeasible.

E2E testing

Shell harnesses and CI expectations for the full repo are described in Contributing — E2E testing.

E2E tests compare full CLI output and full artifact file contents by default. Use e2e::expect_stdout, e2e::expect_out, e2e::expect_file, e2e::expect_run_file, or e2e::assert_equals. Substring checks (e2e::assert_contains) require an inline comment justifying the exception. For the full policy (two surfaces, full equality, assert_contains exceptions, normalization), see Contributing — E2E testing. For the on-disk tree under .jaiph/runs/, see Architecture — Durable artifact layout.

Every .jh sample under e2e/ must be wired into at least one test. Run bash e2e/check_orphan_samples.sh to detect unreferenced fixtures. See Contributing — Orphan sample guard for details.

Similarly, every .jh and .test.jh file under examples/ must be accounted for in e2e/tests/110_examples.sh — either exercised with strict assertions or explicitly excluded with a rationale. An orphan guard in that script enforces this. See Contributing — Example matrix guard for details.

Landing-page sample verification

The project includes a Playwright-based test (e2e/playwright/landing-page.spec.ts) that verifies landing-page code samples stay in sync with real CLI behavior. Run it with npm run test:samples. See Contributing — Landing-page sample verification for details.

Limitations (v1)