Unsterwerx

Unsterwerx is a document-domain implementation of the Trusted Client-Centric Application Architecture (US Patent US9069626B2). It ingests common document formats into a local Shared Sandbox, normalizes them into a Universal Data Set, finds duplicates and near-duplicates, computes structural diffs, and supports temporal reconstruction under Business Intelligence and User Intelligence policy control.

Features

Ingest thousands of documents from any directory tree
Detect exact and near-duplicate documents via MinHash + LSH
Extract searchable canonical markdown from every supported format
Diff structural changes between similar document versions
Search the entire corpus with full-text search (SQLite FTS5)
Classify documents with regex-based rules and retention policies
Import from external sources: ChatGPT, Notion, Obsidian, Telegram
Reconstruct documents from canonical store as markdown or PDF
Cluster and compact related content in the Universal Data Module using Bayesian Business Intelligence scoring, knowledge vectors, and BI deduplication
Audit every operation with an append-only hash-chained log
Benchmark the full pipeline with detailed performance metrics
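
The near-duplicate detection named in the feature list (MinHash + LSH) can be illustrated with a minimal Python sketch. This is not Unsterwerx's implementation: the shingle size, the signature length, the band count, and every function name below are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 64  # signature length (assumed value)
BANDS = 16       # LSH bands; 64 / 16 = 4 rows per band (assumed value)


def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(shingle, seed):
    """64-bit hash of a shingle; the seed selects one of NUM_HASHES functions."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "big")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash(shingle_set):
    """MinHash signature: per seed, the minimum hash over all shingles."""
    return [min(_hash(s, seed) for s in shingle_set) for seed in range(NUM_HASHES)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures):
    """LSH banding: documents that share any band bucket become candidate pairs."""
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for band in range(BANDS):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

In a sketch like this, only the candidate pairs returned by the banding step need a full comparison, which is what keeps similarity analysis tractable on a corpus of thousands of documents. More bands with fewer rows each catch lower-similarity pairs at the cost of more false candidates.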

Quick Start

curl -fsSL https://unsterwerx.run/install.sh | sh
unsterwerx ingest /path/to/documents
unsterwerx similarity
unsterwerx search "data architecture"
unsterwerx status --detailed
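
For readers unfamiliar with SQLite FTS5, the following Python sketch shows the kind of query that a search such as "data architecture" implies. The table name `canonical_docs`, its columns, and both helper functions are hypothetical; the actual Unsterwerx schema is not documented here. The sketch also assumes a Python build whose bundled SQLite has FTS5 enabled.

```python
import sqlite3


def build_index(docs):
    """Create an in-memory FTS5 table and load (path, body) rows into it."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per canonical document.
    conn.execute("CREATE VIRTUAL TABLE canonical_docs USING fts5(path, body)")
    conn.executemany(
        "INSERT INTO canonical_docs (path, body) VALUES (?, ?)", docs
    )
    return conn


def search(conn, query, limit=10):
    """Return paths of matching documents, best match first."""
    rows = conn.execute(
        "SELECT path FROM canonical_docs "
        "WHERE canonical_docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [path for (path,) in rows]
```

In FTS5 query syntax, space-separated terms are combined with an implicit AND, so a query like `data architecture` matches only documents that contain both words. Wrapping the terms in double quotes inside the query string turns them into a phrase match instead.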

Commands

Command	Description
ingest	Ingest files from a source directory
status	Show system and document status
reindex	Rebuild full-text search index (FTS5)
similarity	Run similarity analysis on ingested documents
diff	Compute diffs between similar document pairs
search	Search canonical document content
reconstruct	Reconstruct a document from canonical store
classify	Classify documents using rules
archive	Archive documents per retention policies
audit	View and verify audit log
rules	Manage classification rules
knowledge	Bayesian scoring, vector graphs, BI dedup
import	Import data from external sources
jobs	Manage background ingest and import jobs
config	Manage configuration
benchmark	Benchmark the TCA pipeline
upgrade	Check for and install the latest release
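
The append-only hash-chained log behind the audit command can be pictured with a short Python sketch. The entry layout, the SHA-256 choice, the all-zero genesis value, and the function names are assumptions for illustration, not the format Unsterwerx actually writes.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry (assumed)


def entry_hash(prev_hash, record):
    """Hash the previous entry's hash together with this record's canonical JSON."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log, record):
    """Append a record, linking it to the hash of the current last entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "prev": prev_hash,
        "record": record,
        "hash": entry_hash(prev_hash, record),
    })


def verify(log):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev_hash = GENESIS
    for index, entry in enumerate(log):
        expected = entry_hash(prev_hash, entry["record"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return index
        prev_hash = entry["hash"]
    return None
```

Because each entry's hash covers the hash before it, editing or deleting any earlier record invalidates every entry from that point on, so a verification pass can report where tampering begins.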