diff
Computes structural diffs between similar document pairs identified by the similarity analysis. With no flags, it lists previously computed diffs. Use `--all` to compute diffs for all candidate pairs, or `--doc-a`/`--doc-b` to diff a specific pair.
Usage
```bash
unsterwerx diff [OPTIONS]
```
Options
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--doc-a` | | UUID | | First document ID for pairwise diff |
| `--doc-b` | | UUID | | Second document ID for pairwise diff |
| `--all` | | flag | | Compute diffs for all similarity candidate pairs |
| `--context` | `-C` | integer | 3 | Context lines for unified diff display |
Examples
List existing diffs
```bash
unsterwerx diff
```
```
Recent Diffs
══════════════════════════════════════════════════════════════
2026-02-25 ICCC-2017-paper-v1.pdf <-> ICCC-2017-paper-v2.pdf | +490 -445 (166% changed)
2026-02-25 Request-For-Order.pdf <-> RFO-draft.pdf | +28 -34 (84% changed)
2026-02-25 Final-Project.pdf <-> Final-Project.docx | +121 -139 (172% changed)
2026-02-25 dissertation_v11.docx <-> dissertation_v09.docx | +6 -168 (21% changed)
2026-02-25 Data-Tagging-v1.2a.docx <-> Data-Tagging-v1.2b.docx | +30 -30 (23% changed)
...
══════════════════════════════════════════════════════════════
```
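When the listing is consumed by a script, each row can be split into its fields with a regular expression. The following is a minimal sketch that assumes the row layout in the sample output above is stable, which this page does not guarantee; `parse_listing_line` and the field names are hypothetical and not part of unsterwerx.

```python
import re

# One row of the "Recent Diffs" listing, assuming the layout shown above:
# DATE NAME_A <-> NAME_B | +ADDED -REMOVED (PCT% changed)
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<doc_a>.+?) <-> (?P<doc_b>.+?) \| "
    r"\+(?P<added>\d+) -(?P<removed>\d+) \((?P<pct>\d+)% changed\)$"
)


def parse_listing_line(line):
    """Return a dict for one listing row, or None if the line is not a row."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # header, separator, or "..." line
    row = match.groupdict()
    for key in ("added", "removed", "pct"):
        row[key] = int(row[key])
    return row


row = parse_listing_line(
    "2026-02-25 Request-For-Order.pdf <-> RFO-draft.pdf | +28 -34 (84% changed)"
)
print(row)
```

Lines that are not rows (the title, the separators, the `...` marker) return `None`, so the function can be mapped over the whole output and the results filtered.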
Diff a specific pair
```bash
unsterwerx diff --doc-a a1b2c3d4-... --doc-b e5f6a7b8-...
```
```
@@ -130,133 +130,115 @@
 Chapter One
-Of the currently accepted attributes used to describe a big dataset,
-volume and velocity can be measured while variety remains an abstraction
+Of the currently accepted attributes used to describe a big dataset,
+volume and velocity can be measured while variety remains an abstraction
 (Jagadish, 2015). Yet, variety has the potential to significantly
-impact how difficult it is to analyze big datasets. This lack of
-measurability is a barrier to quantitatively assess a big data set
+impact how difficult it is to analyze big datasets. This lack of
+measurability is a barrier to quantitatively assessing a big data,
 or even predicting the possible difficulty of analysis.
```
Compute all diffs
```bash
unsterwerx diff --all
```
```
Computing diffs for all candidate pairs...
Diff Summary
══════════════════════════════════
Pairs processed: 371
Diffs computed: 274
Errors: 0
══════════════════════════════════
```
Diff with more context
```bash
unsterwerx diff --doc-a a1b2c3d4-... --doc-b e5f6a7b8-... -C 5
```
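To illustrate what the context setting changes, the sketch below uses Python's standard `difflib.unified_diff`, whose `n` parameter sets the number of unchanged lines shown around each change. It is an analogy only: unsterwerx's own diff engine is not described here, and the helper `unified` and the sample lines are made up for the example.

```python
import difflib


def unified(a_lines, b_lines, context=3):
    """Return unified-diff lines with `context` unchanged lines around each change."""
    return list(
        difflib.unified_diff(
            a_lines, b_lines,
            fromfile="doc-a", tofile="doc-b",
            n=context, lineterm="",
        )
    )


# Twenty-line sample document with a single edited line (line 10).
old = [f"line {i}" for i in range(1, 21)]
new = list(old)
new[9] = "line 10 (edited)"

narrow = unified(old, new, context=3)  # comparable to the default of 3
wide = unified(old, new, context=5)    # comparable to -C 5

print("\n".join(wide))
```

With one changed line, moving from 3 to 5 context lines widens the hunk by two lines on each side; the change itself is reported identically in both cases.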
Notes
- Diffs are stored as compressed (zstd) payloads in the CAS `diffs/` directory.
- Identical documents (Jaccard score = 1.0) produce no diff output; the command prints `Documents are identical.` instead.
- The change percentage is the ratio of added and removed lines to total document lines, so heavily rewritten pairs can exceed 100%.
- Run `unsterwerx similarity` before `diff --all` to ensure candidate pairs exist.
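The two quantities mentioned in the notes can be sketched as follows. This is a rough illustration under stated assumptions, not unsterwerx's implementation: the page does not say what the Jaccard score is computed over (the sketch uses sets of tokens), nor which line count serves as the denominator of the change percentage (the sketch takes it as a parameter). The function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumption: two empty inputs count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def change_percent(added, removed, total_lines):
    """Added plus removed lines as a whole-number percentage of total_lines.

    Which total the tool uses is an assumption left to the caller.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return round(100 * (added + removed) / total_lines)


# Identical token sets score 1.0; partially overlapping sets score lower.
print(jaccard(["a", "b", "c"], ["a", "b", "c"]))
print(jaccard(["a", "b", "c"], ["b", "c", "d"]))

# Made-up counts: the second case exceeds 100% because additions and
# removals are both counted against a single total.
print(change_percent(30, 30, 200))
print(change_percent(120, 100, 150))
```

Counting additions and removals together is what allows values above 100%, which matches the figures in the sample listing.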