Unsterwerx

benchmark

Benchmarks the Unsterwerx TCA pipeline stages with timing, throughput, and storage metrics. Supports two modes: in-place (benchmarks existing data) and fresh (ingests from a source directory).

Usage

```bash
unsterwerx benchmark [OPTIONS] [SOURCE]
```

Arguments

| Argument | Required | Description |
|----------|----------|-------------|
| `SOURCE` | No | Source directory for fresh mode. Omit for in-place mode. |
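Because the two modes differ only in whether `SOURCE` is passed, a wrapper script can select the mode by including or omitting that one positional argument. The following is a minimal Python sketch of assembling the command line. The helper `benchmark_argv` and the path `./corpus` are hypothetical and not part of Unsterwerx. It assumes that options may precede the positional argument, as the usage line suggests, and that option values are passed space-separated, as in the examples on this page.

```python
import shlex
from typing import Optional, Sequence


def benchmark_argv(source: Optional[str] = None,
                   runs: Optional[int] = None,
                   stages: Sequence[str] = (),
                   json_output: bool = False,
                   baseline: Optional[str] = None) -> list:
    """Build an argv list for `unsterwerx benchmark` (hypothetical helper).

    Only flags documented on this page are used. Nothing is executed here.
    """
    argv = ["unsterwerx", "benchmark"]
    if runs is not None:
        argv += ["--runs", str(runs)]
    if stages:
        argv += ["--stages", ",".join(stages)]
    if json_output:
        argv.append("--json")
    if baseline:
        argv += ["--baseline", baseline]
    if source:
        # Presence of SOURCE selects fresh mode; absence means in-place mode.
        argv.append(source)
    return argv


# In-place mode: no SOURCE.
print(shlex.join(benchmark_argv(stages=["canonical", "similarity"])))
# unsterwerx benchmark --stages canonical,similarity

# Fresh mode: SOURCE given ("./corpus" is a made-up path).
print(shlex.join(benchmark_argv(source="./corpus", runs=5, json_output=True)))
# unsterwerx benchmark --runs 5 --json ./corpus
```

The resulting list can be handed to `subprocess.run` unchanged, which avoids shell quoting issues for source paths containing spaces.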

Options

| Option | Short | Type | Default | Description |
|--------|-------|------|---------|-------------|
| `--runs` | `-n` | integer | `3` | Number of benchmark runs to average |
| `--stages` | `-s` | string | `all` | Comma-separated stages: `ingest`, `canonical`, `similarity`, `diff`, `classify`, `archive`, `search`, `reconstruct`. Aliases: `normalize`, `parse`, and `extract` map to `canonical`; `denormalize` maps to `reconstruct` |
| `--format` | | string | `table` | Output format: `table` or `json` |
| `--json` | | flag | | Shortcut for `--format json` |
| `--baseline` | | path | | Path to a previous JSON report for delta comparison |
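To make the alias rules for `--stages` concrete, here is a minimal Python sketch of how a comma-separated stage list could be normalized. It is not Unsterwerx's actual parser: the function `resolve_stages`, the case-insensitive matching, the de-duplication, and the error on unknown names are all assumptions for illustration. Only the stage names, the alias mappings, and the `all` default come from the option description.

```python
# Stage names and aliases as listed in the --stages description.
STAGES = ["ingest", "canonical", "similarity", "diff",
          "classify", "archive", "search", "reconstruct"]
ALIASES = {
    "normalize": "canonical",
    "parse": "canonical",
    "extract": "canonical",
    "denormalize": "reconstruct",
}


def resolve_stages(spec: str = "all") -> list:
    """Expand a --stages value into stage names (hypothetical helper)."""
    if spec.strip().lower() == "all":
        return list(STAGES)
    resolved = []
    for raw in spec.split(","):
        name = raw.strip().lower()
        if not name:
            continue  # tolerate stray commas (assumed behaviour)
        name = ALIASES.get(name, name)
        if name not in STAGES:
            raise ValueError(f"unknown stage: {raw.strip()!r}")
        if name not in resolved:
            resolved.append(name)  # keep first occurrence only (assumed)
    return resolved


print(resolve_stages("normalize,similarity"))
# ['canonical', 'similarity']
print(resolve_stages("parse,extract,denormalize"))
# ['canonical', 'reconstruct']
```

Under these assumptions, `--stages normalize,similarity` selects the same stages as `--stages canonical,similarity`.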

Examples

In-place benchmark (existing data)

```bash
unsterwerx benchmark --stages canonical,similarity
Run 1/3 (in-place copy)...
Run 2/3 (in-place copy)...
Run 3/3 (in-place copy)...

Unsterwerx Benchmark (3 runs averaged)
==============================================================
  Dataset:           2,074 docs / 2.7 GB

  Stage                Time       Throughput       Notes
  --------------------------------------------------------
  Normalize (NACs)     1m14s      36.9 MB/s        CSV: 574ms, DOCX: 8.1s, PDF: 56.6s
  Similarity           3.9s       462.5 docs/s     1,806 docs -> 373 pairs

  Storage:
    Original size:   2.7 GB
    Universal Data:     94 MB   (96.6% compaction)
    DB + indexes:      234 MB
    Diff artifacts:      4 MB
    Total footprint:   332 MB   (87.9% reduction)

  Trust Chain:       148 events, integrity OK
  Peak RSS:          5909 MB
  Wall clock:        4m3s
==============================================================
```

JSON output

```bash
unsterwerx benchmark --format json --stages canonical
{
  "dataset_docs": 2074,
  "dataset_bytes": 2877788871,
  "runs": 3,
  "stages": [ ... ],
  "storage": {
    "original_bytes": 2877788871,
    "canonical_bytes": 98261419,
    "db_bytes": 245743616,
    "diff_bytes": 4208599
  },
  "trust_chain_events": 148,
  "trust_chain_ok": true,
  "wall_clock_secs": 256.27,
  "peak_rss_kb": 4609244
}
```
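The JSON report carries raw byte counts rather than the percentages shown in table mode. The sketch below derives them in Python. The formulas are inferred, not documented: compaction is taken as `1 - canonical_bytes / original_bytes`, and total footprint as the sum of the canonical, DB, and diff sizes. They are used here because they reproduce the 96.6% and 87.9% figures from the table example. The helper `storage_summary` is hypothetical, and the `stages` array is left out because its contents are elided in the example.

```python
import json

# Report abridged from the JSON example above; "stages" is omitted.
report = json.loads("""
{
  "dataset_docs": 2074,
  "dataset_bytes": 2877788871,
  "runs": 3,
  "storage": {
    "original_bytes": 2877788871,
    "canonical_bytes": 98261419,
    "db_bytes": 245743616,
    "diff_bytes": 4208599
  },
  "wall_clock_secs": 256.27,
  "peak_rss_kb": 4609244
}
""")


def storage_summary(report: dict) -> dict:
    """Derive table-style storage figures from a JSON report (assumed formulas)."""
    s = report["storage"]
    original = s["original_bytes"]
    # Assumption: total footprint = canonical data + DB/indexes + diff artifacts.
    footprint = s["canonical_bytes"] + s["db_bytes"] + s["diff_bytes"]
    return {
        "footprint_bytes": footprint,
        "compaction_pct": round(100 * (1 - s["canonical_bytes"] / original), 1),
        "reduction_pct": round(100 * (1 - footprint / original), 1),
        # Assumption: 1 MiB = 1024 KiB.
        "peak_rss_mib": round(report["peak_rss_kb"] / 1024),
    }


print(storage_summary(report))
# {'footprint_bytes': 348213634, 'compaction_pct': 96.6, 'reduction_pct': 87.9, 'peak_rss_mib': 4501}
```

Dividing the byte counts by 1024² gives about 94, 234, and 4, which suggests that the "MB" values in table mode are binary megabytes. That is also an inference from the numbers rather than a documented unit.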

Compare against baseline

```bash
unsterwerx benchmark --format json --stages canonical > baseline.json
unsterwerx benchmark --baseline baseline.json --stages canonical
```
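This page does not show what `--baseline` prints, so the following is only an independent Python sketch of the kind of delta such a comparison involves: the percent change of selected numeric fields between two JSON reports. The function `percent_deltas`, the choice of fields, and the values of the second run are all hypothetical. It is not a reconstruction of Unsterwerx's own comparison logic.

```python
def percent_deltas(baseline: dict, current: dict,
                   keys=("wall_clock_secs", "peak_rss_kb")) -> dict:
    """Percent change per field, relative to the baseline (hypothetical helper)."""
    deltas = {}
    for key in keys:
        old, new = baseline.get(key), current.get(key)
        if not old or new is None:
            continue  # skip missing fields and zero baselines
        deltas[key] = round(100 * (new - old) / old, 1)
    return deltas


# Baseline values taken from the JSON example on this page.
baseline = {"wall_clock_secs": 256.27, "peak_rss_kb": 4609244}
# Hypothetical later run, invented purely to exercise the function.
current = {"wall_clock_secs": 192.2, "peak_rss_kb": 4609244}

print(percent_deltas(baseline, current))
# {'wall_clock_secs': -25.0, 'peak_rss_kb': 0.0}
```

Negative values mean the current run used less time or memory than the baseline. In practice both dictionaries would be loaded with `json.load` from the saved report files.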

Notes