diff

Computes structural diffs between pairs of similar documents identified by the similarity analysis. With no flags, it lists the diffs that have already been computed. Use `--all` to compute diffs for every candidate pair, or `--doc-a` and `--doc-b` together to diff a specific pair.

Usage

unsterwerx diff [OPTIONS]

Options

Option	Short	Type	Default	Description
`--doc-a`		UUID		First document ID for pairwise diff
`--doc-b`		UUID		Second document ID for pairwise diff
`--all`		flag		Compute diffs for all similarity candidate pairs
`--context`	`-C`	integer	3	Context lines for unified diff display
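
Because `--doc-a` and `--doc-b` take UUIDs, a wrapper script can reject malformed IDs before invoking the command. The following is a minimal sketch, assuming IDs use the canonical 8-4-4-4-12 hexadecimal form. `is_uuid` and `diff_pair` are hypothetical helper names and are not part of unsterwerx.

```bash
#!/bin/sh
# Hypothetical helpers; not part of unsterwerx itself.

# Succeeds if $1 looks like a canonical UUID (8-4-4-4-12 hex digits).
# Assumption: the tool accepts IDs in this textual form.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Validates both IDs, then runs the pairwise diff.
# $3 is the optional number of context lines (defaults to 3, as in the table above).
diff_pair() {
  if ! is_uuid "$1" || ! is_uuid "$2"; then
    echo "error: both document IDs must be UUIDs" >&2
    return 2
  fi
  unsterwerx diff --doc-a "$1" --doc-b "$2" -C "${3:-3}"
}
```

Calling `diff_pair` with two full document IDs and an optional context count then behaves like the pairwise invocation, except that a malformed ID fails fast with exit status 2.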

Examples

List existing diffs

unsterwerx diff

Recent Diffs
══════════════════════════════════════════════════════════════
  2026-02-25 ICCC-2017-paper-v1.pdf <-> ICCC-2017-paper-v2.pdf | +490 -445 (166% changed)
  2026-02-25 Request-For-Order.pdf <-> RFO-draft.pdf | +28 -34 (84% changed)
  2026-02-25 Final-Project.pdf <-> Final-Project.docx | +121 -139 (172% changed)
  2026-02-25 dissertation_v11.docx <-> dissertation_v09.docx | +6 -168 (21% changed)
  2026-02-25 Data-Tagging-v1.2a.docx <-> Data-Tagging-v1.2b.docx | +30 -30 (23% changed)
  ...
══════════════════════════════════════════════════════════════
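
Each entry in the listing ends with `(N% changed)`, so heavily changed pairs can be picked out with a small filter. This is a sketch that assumes the line format shown above stays as it is; the format is not documented as a stable interface, and `filter_by_change` is a hypothetical helper name.

```bash
#!/bin/sh
# Hypothetical helper: prints listing lines whose change percentage is at least $1.
# Assumption: each entry ends with "(N% changed)" as in the sample output above.
filter_by_change() {
  awk -v min="$1" '
    match($0, /\([0-9]+% changed\)/) {
      # Drop the opening parenthesis; "166% changed)" + 0 evaluates to 166 in awk.
      pct = substr($0, RSTART + 1, RLENGTH - 1) + 0
      if (pct >= min) print
    }'
}

# Sample lines copied from the listing above.
filter_by_change 100 <<'EOF'
  2026-02-25 ICCC-2017-paper-v1.pdf <-> ICCC-2017-paper-v2.pdf | +490 -445 (166% changed)
  2026-02-25 Request-For-Order.pdf <-> RFO-draft.pdf | +28 -34 (84% changed)
  2026-02-25 dissertation_v11.docx <-> dissertation_v09.docx | +6 -168 (21% changed)
EOF
```

With a threshold of 100, only the ICCC-2017 entry is printed. In practice the function would read the output of `unsterwerx diff` from a pipe, assuming the listing is written to standard output.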

Diff a specific pair

unsterwerx diff --doc-a a1b2c3d4-... --doc-b e5f6a7b8-...

@@ -130,133 +130,115 @@

 Chapter One

-Of the currently accepted attributes used to describe a big dataset,
-volume and velocity can be measured while variety remains an abstraction
+Of the currently accepted attributes used to describe a big dataset,
+volume and velocity can be measured while variety remains an abstraction
 (Jagadish, 2015). Yet, variety has the potential to significantly
-impact how difficult it is to analyze big datasets. This lack of
-measurability is a barrier to quantitatively assess a big data set
+impact how difficult it is to analyze big datasets. This lack of
+measurability is a barrier to quantitatively assessing a big data,
 or even predicting the possible difficulty of analysis.
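
The pairwise output follows the unified diff layout, so the `+N -N` counts reported in the listing can be tallied from it directly. The sketch below assumes the diff is written to standard output in the form shown above; `diff_stats` is a hypothetical helper name. Only lines after the first `@@` hunk header are counted, so any file header lines that a standard unified diff might carry are ignored.

```bash
#!/bin/sh
# Hypothetical helper: counts added and removed lines in unified diff text on stdin.
diff_stats() {
  awk '
    /^@@/           { in_hunk = 1; next }   # hunk header: start counting
    in_hunk && /^[+]/ { added++ }
    in_hunk && /^-/   { removed++ }
    END { printf "+%d -%d\n", added, removed }'
}

# Excerpt of the sample hunk above.
diff_stats <<'EOF'
@@ -130,133 +130,115 @@
 (Jagadish, 2015). Yet, variety has the potential to significantly
-impact how difficult it is to analyze big datasets. This lack of
-measurability is a barrier to quantitatively assess a big data set
+impact how difficult it is to analyze big datasets. This lack of
+measurability is a barrier to quantitatively assessing a big data,
 or even predicting the possible difficulty of analysis.
EOF
```

For this excerpt the function prints `+2 -2`.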

Compute all diffs

unsterwerx diff --all

Computing diffs for all candidate pairs...

Diff Summary
══════════════════════════════════
  Pairs processed:       371
  Diffs computed:        274
  Errors:                  0
══════════════════════════════════
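
When `--all` runs unattended, for example in a scheduled job, the summary can be checked for failures. The following is a sketch under the assumption that the summary keeps the `Errors: N` line shown above and is written to standard output; `check_diff_summary` is a hypothetical helper name.

```bash
#!/bin/sh
# Hypothetical helper: reads a "Diff Summary" block on stdin.
# Exit status: 0 if Errors is 0, 1 if Errors is greater than 0,
# 2 if no Errors line was found.
check_diff_summary() {
  awk -F: '
    /^[[:space:]]*Errors:/ { found = 1; errors = $2 + 0 }
    END {
      if (!found) { print "no Errors line found" > "/dev/stderr"; exit 2 }
      if (errors > 0) { printf "%d diff error(s)\n", errors > "/dev/stderr"; exit 1 }
    }'
}

# Summary lines copied from the sample output above.
check_diff_summary <<'EOF' && echo "no errors reported"
  Pairs processed:       371
  Diffs computed:        274
  Errors:                  0
EOF
```

The sample summary reports zero errors, so the check succeeds and the confirmation message is printed.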

Diff with more context

unsterwerx diff --doc-a a1b2c3d4-... --doc-b e5f6a7b8-... -C 5

Notes

Diffs are stored as zstd-compressed payloads in the CAS `diffs/` directory.
Identical documents (Jaccard score = 1.0) produce no diff; the command reports `Documents are identical.` instead.
The change percentage is the ratio of added and removed lines to total document lines. The value can exceed 100%, as the 166% and 172% entries in the listing above show.
Run `unsterwerx similarity` before `unsterwerx diff --all` so that candidate pairs exist.
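
The change percentage note can be made concrete with a small calculation. The exact formula and rounding that unsterwerx uses are not stated here, so this sketch assumes `(added + removed) / total_lines * 100`, rounded to the nearest integer. `change_pct` is a hypothetical helper name, and the 260-line total is an illustrative value, not a figure from the tool.

```bash
#!/bin/sh
# Hypothetical helper: change percentage under the ASSUMED formula
#   (added + removed) / total_lines * 100, rounded to the nearest integer.
# Arguments: added, removed, total document lines.
change_pct() {
  awk -v a="$1" -v r="$2" -v t="$3" 'BEGIN {
    if (t <= 0) { print "total lines must be positive" > "/dev/stderr"; exit 1 }
    printf "%.0f\n", (a + r) / t * 100
  }'
}

# +30 -30 against a hypothetical 260-line document
change_pct 30 30 260   # prints 23
```

Under this assumption, a pair whose added and removed lines together outnumber the document's total lines yields a value above 100, which is consistent with entries such as 166% in the listing.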