Quick Start
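This guide walks you through ingesting documents, checking status, finding duplicates, scoring pairs, searching, classifying, and verifying the audit trail, all in a few minutes.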
1. Ingest Documents
Point Unsterwerx at a directory containing your documents:
unsterwerx ingest /path/to/documents
Use --dry-run to preview what would be ingested without writing to the database:
unsterwerx ingest --dry-run /path/to/documents
Dry Run: would ingest 2873 files
Filter by extension or file size. For example, -e pdf restricts the preview to PDF files:
unsterwerx ingest --dry-run -e pdf /path/to/documents
Dry Run: would ingest 1184 files
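To make the dry-run idea concrete, here is a minimal Python sketch that counts the files a filter would select without writing anything. It illustrates the concept only and is not Unsterwerx's implementation: `preview_ingest`, its `extension` parameter, and the `max_bytes` size cap are hypothetical names chosen for this example.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def preview_ingest(root: str, extension: Optional[str] = None,
                   max_bytes: Optional[int] = None) -> int:
    """Count files under root that pass the filters; nothing is written."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Extension filter: compare case-insensitively, without the dot.
            if extension is not None:
                if path.suffix.lower().lstrip(".") != extension.lower():
                    continue
            # Size filter (hypothetical): skip files larger than max_bytes.
            if max_bytes is not None and path.stat().st_size > max_bytes:
                continue
            count += 1
    return count


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "report.pdf").write_bytes(b"x" * 10)
    Path(demo_dir, "notes.txt").write_bytes(b"x" * 10)
    print("Dry run: would ingest", preview_ingest(demo_dir, extension="pdf"), "files")
```

Keeping the preview read-only means the same filter logic can be reused for the real ingest once the counts look right.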
2. Check Status
See how many documents have been ingested and indexed:
unsterwerx status
Unsterwerx Status
══════════════════════════════════════════
Data directory: /home/user/.unsterwerx
Total documents: 2074
Total size: 2.7 GB
Indexed (FTS5): 1807
Audit events: 148
══════════════════════════════════════════
3. Find Duplicates
Run similarity analysis to detect exact and near-duplicate documents:
unsterwerx similarity
Similarity Analysis
══════════════════════════════════
Documents processed: 1806
Candidate pairs: 371
Exact duplicates: 97
Threshold: 0.30
══════════════════════════════════
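For readers unfamiliar with the terms in this report, the following Python sketch shows one common way to separate exact duplicates (identical content hashes) from near-duplicate candidates (Jaccard similarity at or above a threshold). It is an illustration under assumptions, not Unsterwerx's algorithm: the word-shingle size, the helper names, and the use of `>=` against the 0.30 threshold are all choices made for this example.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles in text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets score 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_pair(text_a: str, text_b: str, threshold: float = 0.30) -> str:
    """Label a pair as 'exact', 'candidate', or 'unrelated'."""
    # Exact duplicates: identical content gives identical SHA-256 digests.
    digest_a = hashlib.sha256(text_a.encode()).digest()
    digest_b = hashlib.sha256(text_b.encode()).digest()
    if digest_a == digest_b:
        return "exact"
    # Near duplicates: shingle overlap at or above the threshold.
    score = jaccard(shingles(text_a), shingles(text_b))
    return "candidate" if score >= threshold else "unrelated"
```

A low threshold such as 0.30 deliberately casts a wide net. The candidate pairs it produces are then refined by the scoring step that follows.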
4. Score Document Pairs and Build Vectors
Build Bayesian Business Intelligence scores that go beyond plain Jaccard similarity, then build knowledge vectors:
unsterwerx knowledge build --evaluate
unsterwerx knowledge vectors build
Building semantic features...
Corpus: 1807 docs, 2939590 unique terms (IDF snapshot #1)
Training Bayesian model...
Bootstrap labels: 318 positive, 636 negative
Model trained: run #1, P(dup)=0.301, P(unrel)=0.699
Scoring candidates...
Candidates scored: 371
Evaluation:
Post-train consistency: 100.0%
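The P(dup) and P(unrel) values in this output are class priors. As a rough illustration of how a two-class Bayesian scorer can combine such priors with evidence about a pair, here is a minimal Python sketch of Bayes' rule. The likelihood values are invented for the example, and the real model's features and training procedure are not described in this guide.

```python
def posterior_dup(prior_dup: float, lik_dup: float, lik_unrel: float) -> float:
    """P(dup | evidence) by Bayes' rule for a two-class model.

    lik_dup   = P(evidence | dup)
    lik_unrel = P(evidence | unrel)
    """
    prior_unrel = 1.0 - prior_dup
    evidence = prior_dup * lik_dup + prior_unrel * lik_unrel
    return prior_dup * lik_dup / evidence if evidence > 0 else 0.0


# Prior taken from the sample output; likelihoods are made up for illustration.
score = posterior_dup(prior_dup=0.301, lik_dup=0.8, lik_unrel=0.1)
print(f"P(dup | evidence) = {score:.3f}")
```

When the evidence is equally likely under both classes, the posterior falls back to the prior, which is why informative features and good labels matter.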
Improve results by labeling document pairs as feedback:
unsterwerx knowledge labels add --label duplicate_or_same_concept <DOC_A> <DOC_B>
This Business Intelligence pass scores candidate pairs already staged in the Universal Data Module, then clusters them into knowledge vectors for higher-level review.
Preview Business Intelligence dedup candidates inside vectors:
unsterwerx knowledge dedup scan --threshold 0.8
Apply dedup only after reviewing the plan, then rebuild vectors in the Universal Data Module:
unsterwerx knowledge dedup apply --confirm
unsterwerx knowledge vectors build
5. Search Content
Run a full-text search across all canonical document content:
unsterwerx search "policy"
Search Results (5 matches)
══════════════════════════════════════════════════════════════
1. Homeowners Policy Packet [d3d2da43]
HOMEOWNERS POLICY PACKET IMPORTANT MESSAGES...
2. DODI Standards [455d5bb1]
Establishing Policy in DoDIs...
══════════════════════════════════════════════════════════════
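The status output lists documents as "Indexed (FTS5)", which suggests that search is backed by SQLite's FTS5 extension. The sketch below shows how an FTS5 query behaves, using Python's built-in sqlite3 module. The table and column names are hypothetical and the rows are adapted from the sample results above. It assumes your SQLite build includes FTS5, which most current builds do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a two-column FTS5 virtual table.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Homeowners Policy Packet", "HOMEOWNERS POLICY PACKET IMPORTANT MESSAGES"),
        ("DODI Standards", "Establishing Policy in DoDIs"),
        ("Meeting Notes", "Agenda and action items"),
    ],
)


def search(term: str) -> list:
    """Return titles of matching rows, ordered by FTS5's built-in rank."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", (term,)
    ).fetchall()
    return [title for (title,) in rows]


print(search("policy"))
```

The default FTS5 tokenizer is case-insensitive, so a query for "policy" matches both "POLICY" and "Policy".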
6. Classify Documents
Apply classification rules, then view the results for a single document:
unsterwerx classify
unsterwerx classify --show a1b2c3d4
Classifications for a1b2c3d4...
══════════════════════════════════════════
cv (62%) via rule 'seed-cv' at 2026-02-25
══════════════════════════════════════════
7. View Audit Trail
Every operation is recorded in an append-only, hash-chained audit log. Use --verify to check the chain's integrity:
unsterwerx audit --verify
Verifying audit hash chain...
Chain verified: 142 events, integrity OK
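To show why a hash chain makes tampering detectable, here is a minimal Python sketch of the general technique: each event's hash covers its payload plus the previous event's hash. This is not Unsterwerx's actual log format. The field names, the all-zero genesis value, and the use of SHA-256 over JSON are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical prev_hash for the first event


def append_event(log: list, payload: dict) -> None:
    """Append an event whose hash covers the payload and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"prev_hash": prev_hash, "payload": payload, "hash": digest})


def verify_chain(log: list) -> bool:
    """Recompute every hash in order; return False at the first mismatch."""
    prev_hash = GENESIS
    for event in log:
        body = json.dumps(event["payload"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        if event["prev_hash"] != prev_hash or event["hash"] != expected:
            return False
        prev_hash = event["hash"]
    return True


log = []
append_event(log, {"op": "ingest", "files": 3})
append_event(log, {"op": "search", "query": "policy"})
print("integrity OK" if verify_chain(log) else "chain broken")
```

Because each hash depends on all earlier events, editing or reordering an event, or removing one from the middle, invalidates every hash after it.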
Next Steps
- Architecture: understand the module pipeline
- Workflow Guide: end-to-end walkthrough
- Commands: full command reference