`trillium bench`

A load generator that reports latency as an HDR-histogram summary. It runs in two modes: closed-loop (a fixed pool of connections, each firing the next request as soon as the previous completes) and open-loop (requests scheduled at a target arrival rate, independent of how fast they complete).

trillium bench <URL> [OPTIONS]

Closed-loop (default)

By default, bench opens 50 concurrent connections and runs for 10 seconds:

trillium bench https://localhost:8080

Tune the concurrency and stopping condition:

trillium bench https://localhost:8080 -c 200 -d 30s     # 200 connections for 30s
trillium bench https://localhost:8080 -c 100 -n 1000000 # 100 connections, 1M requests total

Flag	Default	Notes
`-c`, `--connections`	`50`	concurrent connections
`-d`, `--duration`	`10s`	run for this long (e.g. `10s`, `1m`, `30s500ms`)
`-n`, `--requests`		stop after this many requests (closed-loop only)

--duration and --requests are mutually exclusive; with neither, bench runs for 10 seconds.
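In closed-loop mode the request rate is an outcome of latency, not an input. By Little's Law, a pool of always-busy connections completes roughly connections / mean latency requests per second. The sketch below is back-of-the-envelope arithmetic for choosing -c or estimating how long an -n run will take. The function name is hypothetical and not part of trillium.

```python
from typing import Optional, Tuple

def closed_loop_estimate(
    connections: int,
    mean_latency_s: float,
    requests: Optional[int] = None,
) -> Tuple[float, Optional[float]]:
    """Little's Law estimate: (throughput in req/s, run time in s or None)."""
    throughput = connections / mean_latency_s
    run_time = None if requests is None else requests / throughput
    return throughput, run_time

# 100 connections against a server that answers in about 2 ms on average
throughput, run_time = closed_loop_estimate(100, 0.002, requests=1_000_000)
print(f"{throughput:.0f} req/s, about {run_time:.0f} s for 1M requests")
```

The estimate assumes every connection is busy the whole time and that latency does not rise with load, so treat it as an upper bound on throughput.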

Open-loop

Passing -r / --rate switches to open-loop scheduling: requests are launched at a fixed offered rate (requests per second) regardless of how quickly the server responds. Use this mode to measure latency under a known load. Because a slow server cannot throttle the offered rate, the results avoid "coordinated omission", where a closed-loop client sends fewer requests during a stall and so under-samples the slowest period.
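To make the coordinated-omission point concrete, here is a toy simulation (not trillium code) of a server that normally answers in 1 ms but freezes for one second. All names and numbers are illustrative assumptions: the server handles one request at a time, the closed-loop client uses a single connection, and the open-loop client offers 500 req/s.

```python
import math

SERVICE = 0.001                  # normal service time: 1 ms
STALL_AT, STALL_FOR = 1.0, 1.0   # toy server freezes for 1 s starting at t = 1 s
RUN = 4.0                        # simulated run length in seconds

def finish_time(start):
    """Completion time for a request that begins service at `start`."""
    if STALL_AT <= start < STALL_AT + STALL_FOR:
        start = STALL_AT + STALL_FOR   # nothing is served until the stall ends
    return start + SERVICE

def closed_loop():
    """One connection: send the next request when the previous one completes."""
    t, latencies = 0.0, []
    while t < RUN:
        done = finish_time(t)
        latencies.append(done - t)     # clock starts at the actual send time
        t = done
    return latencies

def open_loop(rate):
    """Fixed arrival schedule; the server handles one request at a time, FIFO."""
    latencies, server_free = [], 0.0
    for i in range(int(RUN * rate)):
        scheduled = i / rate
        done = finish_time(max(scheduled, server_free))
        server_free = done
        latencies.append(done - scheduled)  # clock starts at the intended send time
    return latencies

def percentile(samples, p):
    """Nearest-rank percentile over raw samples."""
    ordered = sorted(samples)
    return ordered[max(1, math.ceil(p / 100 * len(ordered))) - 1]

print(f"closed-loop p99: {percentile(closed_loop(), 99) * 1000:.1f} ms")
print(f"open-loop   p99: {percentile(open_loop(500), 99) * 1000:.1f} ms")
```

In this model the closed-loop run records exactly one slow request, so its p99 stays near 1 ms. The open-loop run charges the stall to every request scheduled during it and during the backlog that follows.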

trillium bench https://localhost:8080 --rate 5000 --duration 30s
trillium bench https://localhost:8080 --rate 5000 --pacing poisson

Flag	Default	Notes
`-r`, `--rate`		target arrival rate (req/s); enables open-loop
`--pacing`	`uniform`	`uniform` (fixed interval) or `poisson` (exponential gaps)
`--max-concurrency`		hard cap on in-flight requests; excess are dropped as saturation
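The two pacing modes differ only in how the gaps between scheduled arrivals are chosen. The sketch below generates both kinds of schedule to show the difference. It is an illustration, not trillium's scheduler: the function names are hypothetical, and whether the first request fires at offset zero is an assumption.

```python
import random

def uniform_offsets(rate, duration_s):
    """Fixed interval: one arrival every 1/rate seconds."""
    return [i / rate for i in range(int(rate * duration_s))]

def poisson_offsets(rate, duration_s, seed=0):
    """Exponentially distributed gaps with mean 1/rate (a Poisson process)."""
    rng = random.Random(seed)
    offsets, t = [], rng.expovariate(rate)
    while t < duration_s:
        offsets.append(t)
        t += rng.expovariate(rate)
    return offsets

uniform = uniform_offsets(5000, 2)
poisson = poisson_offsets(5000, 2)
print("uniform:", len(uniform), "arrivals, first gaps:",
      [round(b - a, 6) for a, b in zip(uniform, uniform[1:4])])
print("poisson:", len(poisson), "arrivals, first gaps:",
      [round(b - a, 6) for a, b in zip(poisson, poisson[1:4])])
```

Both schedules average the same rate, but the Poisson schedule produces bursts and lulls, which tends to expose queueing behavior that perfectly even spacing hides.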

When the server can't keep up with the offered rate, scheduled requests that would exceed --max-concurrency are counted as saturation drops in the report — a direct signal that you've found the server's ceiling.
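A rough way to reason about when drops start is to track how many requests are in flight at each scheduled arrival. The following toy model assumes uniform pacing, a constant response latency, and that an arrival finding the cap already reached is dropped immediately. The exact policy inside trillium is not specified here, so this is a sketch of the idea and not a description of the implementation.

```python
import heapq

def count_saturation_drops(rate, duration_s, latency_s, max_concurrency):
    """Return (sent, dropped) for a fixed-latency server under a concurrency cap."""
    in_flight = []          # min-heap of completion times for outstanding requests
    sent = dropped = 0
    for i in range(int(rate * duration_s)):
        now = i / rate
        # Retire requests that have completed by this arrival time.
        while in_flight and in_flight[0] <= now:
            heapq.heappop(in_flight)
        if len(in_flight) >= max_concurrency:
            dropped += 1    # assumed policy: drop, do not queue
        else:
            heapq.heappush(in_flight, now + latency_s)
            sent += 1
    return sent, dropped

# 5000 req/s needs about rate * latency requests in flight (Little's Law).
print(count_saturation_drops(5000, 2, latency_s=0.010, max_concurrency=100))
print(count_saturation_drops(5000, 2, latency_s=0.050, max_concurrency=100))
```

At 10 ms latency the run needs roughly 50 requests in flight and fits under the cap. At 50 ms it needs roughly 250, so a cap of 100 forces drops.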

Request shape

bench shares the client flags for method, headers, body, TLS, and HTTP version:

trillium bench https://api.example.com/items -m POST \
  -H Content-Type=application/json -b '{"q":"test"}'

trillium bench https://api.example.com/upload --body-size 4kb   # synthetic body

Flag	Default	Notes
`-m`, `--method`	`GET`	HTTP method
`-H`, `--headers`		`KEY=VALUE`, repeatable
`-f`, `--file`		request body from a file
`-b`, `--body`		inline request body
`--body-size`		synthesize a zero-filled body of this size (`4kb`, `1mb`)
`--http-version`	`1.1`	`0.9`–`3`
`-t`, `--tls`	`rustls`	TLS backend
`--no-keepalive`		disable HTTP/1.1 connection reuse

Warmup and timeout

trillium bench https://localhost:8080 -d 1m --warmup 5s --timeout 2s

Flag	Notes
`-w`, `--warmup`	discard statistics collected during this initial period
`--timeout`	per-request timeout

--warmup lets connection pools and JITs settle before measurement begins, so the histogram reflects steady state rather than cold-start latency.
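The effect of a warmup window can be pictured as a filter over timestamped samples. This minimal sketch assumes each sample is attributed to the warmup period by its start offset. Whether bench uses start or completion time is not stated here, and the function name is hypothetical.

```python
from statistics import mean

def steady_state(samples, warmup_s):
    """Keep latencies of samples that started at or after the warmup window.

    samples: iterable of (start_offset_s, latency_s) pairs.
    """
    return [latency for offset, latency in samples if offset >= warmup_s]

samples = [(0.5, 0.250), (2.0, 0.120), (6.0, 0.010), (7.0, 0.012)]
print("mean with cold start:", mean(latency for _, latency in samples))
print("mean after 5s warmup:", mean(steady_state(samples, 5.0)))
```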

Reading the report

When stdout is a terminal, bench shows a live progress bar during the run and then prints a report with these sections:

Summary — elapsed time, completed/succeeded counts, request throughput (req/s), and bytes sent/received with receive throughput.
Status codes — a count per HTTP status, colored by class.
Errors — counts bucketed into io, timeout, protocol, other, plus saturation drops (open-loop).
Latency (full response) and Latency (TTFB) — HDR-histogram percentiles (min, mean, p50, p75, p90, p95, p99, p99.9, max, stdev). TTFB is time-to-first-byte.
Open-loop queue wait — in open-loop mode, how long scheduled requests waited for a free slot (a second saturation signal).

Machine-readable output

trillium bench https://localhost:8080 --json > report.json
trillium bench https://localhost:8080 --csv timings.csv

Flag	Notes
`--json`	emit the full report as JSON to stdout (suppresses the progress bar)
`--csv <PATH>`	write per-request timing samples (scheduled/started offsets, queue, TTFB, total, status, bytes) to a CSV file
`--no-progress`	suppress the live progress display even on a tty

The CSV captures one row per request, suitable for plotting latency over time or post-hoc percentile analysis.
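As an example of post-hoc analysis, the sketch below buckets rows by the second in which they started and reports a nearest-rank p99 per bucket, which is enough to plot tail latency over time. The column names `started_s` and `total_ms` and their units are placeholders, not the actual header. Check the first row of your file and pass the real names.

```python
import csv
import io
import math
from collections import defaultdict

def p99_per_second(csv_file, start_col="started_s", latency_col="total_ms"):
    """Map each whole second of the run to the p99 latency of requests started in it."""
    buckets = defaultdict(list)
    for row in csv.DictReader(csv_file):
        buckets[int(float(row[start_col]))].append(float(row[latency_col]))
    result = {}
    for second, values in sorted(buckets.items()):
        values.sort()
        rank = max(1, math.ceil(0.99 * len(values)))  # nearest-rank percentile
        result[second] = values[rank - 1]
    return result

# Tiny in-memory stand-in for a real timings file (hypothetical columns).
sample = io.StringIO("started_s,total_ms\n0.10,1.2\n0.50,1.4\n1.20,9.8\n1.70,1.1\n")
print(p99_per_second(sample))
```

For a real file, open it with `open("timings.csv", newline="")` and pass the file object along with the actual column names.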

Tuning the client's HTTP layer

bench exposes a few trillium_http::HttpConfig knobs for tuning the client side. They are rarely needed; reach for them only when the client itself is the bottleneck.

--response-buffer-len <BYTES>
--response-buffer-max-len <BYTES>
--head-max-len <BYTES>
--copy-loops-per-yield <N>
--received-body-max-len <BYTES>

Full flag reference

trillium bench [OPTIONS] <URL>

Options:
  -m, --method <METHOD>            [default: GET]
  -c, --connections <CONNECTIONS>  [default: 50]
  -d, --duration <DURATION>        (conflicts with --requests)
  -n, --requests <REQUESTS>        (conflicts with --duration)
  -r, --rate <RATE>                target req/s; switches to open-loop
      --pacing <PACING>            [default: uniform]  (uniform | poisson)
      --max-concurrency <N>
  -w, --warmup <WARMUP>
      --timeout <TIMEOUT>
  -H, --headers <HEADERS>          KEY=VALUE, repeatable
  -f, --file <FILE>
  -b, --body <BODY>
      --body-size <BODY_SIZE>
      --http-version <HTTP_VERSION>  [default: 1.1]
  -t, --tls <TLS>                  [default: rustls]
      --no-keepalive
      --json
      --csv <CSV>
      --no-progress
  -v, --verbose...
  -q, --quiet...
  -h, --help
