TCA Overview
Unsterwerx is a document-domain implementation of concepts from US Patent US9069626B2, Trusted Client-Centric Application Architecture (TCA), by Dr. Robert Whetsel. The current scope covers document normalization and governance; broader cross-application capabilities described in the patent are roadmap items.
Core Principles
Normalization, Abstraction, Compaction (NAC)
Every document passes through three transformation stages:
- Normalization: raw format-specific parsing produces uniform text output regardless of input format
- Abstraction: structural analysis identifies headings, body text, lists, and tables, then produces canonical markdown
- Compaction: content hashing and similarity analysis reduce storage to unique knowledge
In Unsterwerx, these stages correspond to the parse → canonical → similarity pipeline.
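The three stages can be sketched as three small functions. This is an illustrative sketch, not the Unsterwerx implementation: the function names `normalize`, `abstract`, and `compact`, the ALL-CAPS heading heuristic, and the in-memory `store` dictionary are all assumptions made for the example. It covers only exact-duplicate detection by content hash; near-duplicate similarity analysis is left out.

```python
import hashlib
import re


def normalize(raw: bytes) -> str:
    """Normalization: decode the input and produce uniform plain text."""
    text = raw.decode("utf-8", errors="replace")
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    return "\n".join(lines).strip()


def abstract(text: str) -> str:
    """Abstraction: toy structural pass (hypothetical heuristic).

    Short ALL-CAPS lines are treated as headings and emitted as
    markdown headings; everything else is kept as body text.
    """
    out = []
    for line in text.split("\n"):
        if line and line.isupper() and len(line) <= 60:
            out.append("# " + line.title())
        else:
            out.append(line)
    return "\n".join(out)


def compact(canonical: str, store: dict[str, str]) -> tuple[str, bool]:
    """Compaction: content-address the canonical form, store unseen content only.

    Returns (sha256 hex digest, True if the content was new).
    """
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    is_new = digest not in store
    if is_new:
        store[digest] = canonical
    return digest, is_new
```

Two inputs that differ only in line endings and spacing normalize to the same canonical text, so the second call to `compact` reports the content as already stored.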
Business Intelligence
The TCA defines Business Intelligence as the rules of hierarchy: classification and governance policies. In Unsterwerx, that means:
- Classification rules: pattern-based document categorization, scoped to organizational boundaries
- Knowledge scoring: Bayesian posterior probabilities for document pair relatedness, trained on bootstrap signals and refined by user feedback
- Model invalidation: deterministic tracking of config changes and label events so the model reflects current state
Knowledge scores provide a richer signal than Jaccard similarity alone by incorporating TF-IDF cosine similarity, structural overlap, temporal proximity, and provenance weights.
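To make the idea of a posterior relatedness score concrete, here is a minimal sketch that combines several signals with Bayes' rule in odds form. The model form is an assumption: this page does not specify how Unsterwerx combines its signals, and the sketch treats them as conditionally independent (a naive Bayes simplification). The names `jaccard` and `posterior_related` and the likelihood-ratio inputs are hypothetical; in practice the ratios would be estimated from labeled document pairs.

```python
import math


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def posterior_related(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior probability that a document pair is related.

    Each likelihood ratio is P(observed signal | related) divided by
    P(observed signal | unrelated). Assuming the signals are
    conditionally independent, posterior odds equal the prior odds
    multiplied by every ratio. Working in log space avoids overflow.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With a prior of 0.1 and two signals whose likelihood ratios are 4 and 2, the prior odds of 1:9 become 8:9, giving a posterior of 8/17. A ratio below 1 (for example, a user label marking the pair as unrelated) pulls the score down in the same way.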
User Intelligence
User Intelligence covers the rules of engagement: retention periods, access controls, and mutability constraints.
- Retention policies: per-class retention periods with a hierarchical cascade
- Legal holds: signed documents and policy-driven freezes
- User feedback: human labels that override automated signals in knowledge scoring
Trust Chain
Every operation that modifies data is recorded in an append-only, hash-chained audit log. Each event contains:
- Timestamp
- Action type (ingest, canonical_extract, classify, reconstruct, etc.)
- Target document ID
- Result (success/error)
- Hash linking to the previous event
Run `unsterwerx audit --verify` at any time to confirm that no events have been tampered with or removed.
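The following sketch shows how an append-only, hash-chained log of this shape can be built and checked. It is an illustration under stated assumptions, not the Unsterwerx log format: the field names, the use of SHA-256 over canonical JSON, and the all-zero genesis hash are choices made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first event


def _digest(body: dict) -> str:
    """Hash an event body using a stable JSON serialization."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: list[dict], timestamp: str, action: str,
                 target: str, result: str) -> dict:
    """Append an event whose hash covers its fields and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {
        "timestamp": timestamp,
        "action": action,
        "target": target,
        "result": result,
        "prev_hash": prev_hash,
    }
    event = dict(body, hash=_digest(body))
    log.append(event)
    return event


def verify(log: list[dict]) -> bool:
    """Recompute every hash and check each link to the previous event."""
    prev_hash = GENESIS
    for event in log:
        body = {k: v for k, v in event.items() if k != "hash"}
        if event["prev_hash"] != prev_hash or event["hash"] != _digest(body):
            return False
        prev_hash = event["hash"]
    return True
```

Editing any field of a stored event, or deleting an event from the middle of the chain, makes `verify` return `False`. In this simplified form, removing events from the end of the log is only detectable if the latest hash is also recorded somewhere outside the log.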
Source Hierarchy
Not all knowledge sources are equally trustworthy. Unsterwerx assigns trust weights to sources based on their class:
| Trust Class | Default Weight | Examples |
|---|---|---|
| academic | 5 (highest) | Peer-reviewed papers, dissertations |
| government | 3 | Government publications, regulations |
| curated | 2 | Curated datasets, local filesystems |
| ai-generated | 1 (lowest) | ChatGPT exports, AI-generated content |
Trust weights influence how conflicting information is resolved and how documents are prioritized in search results.
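As a rough illustration of how trust weights could feed into both uses, the sketch below resolves a conflict by preferring the claim from the most trusted source and ranks search hits by relevance multiplied by trust weight. Both rules are assumptions chosen for the example; this page does not specify the exact resolution or ranking formula. The names `resolve_conflict` and `rank_results` are hypothetical.

```python
# Default weights taken from the table above.
DEFAULT_TRUST_WEIGHTS = {
    "academic": 5,
    "government": 3,
    "curated": 2,
    "ai-generated": 1,
}


def resolve_conflict(claims: list[tuple[str, str]]) -> tuple[str, str]:
    """Pick the (trust_class, value) claim from the most trusted source.

    max() returns the first maximal element, so ties go to the
    earliest claim in the list.
    """
    return max(claims, key=lambda claim: DEFAULT_TRUST_WEIGHTS[claim[0]])


def rank_results(results: list[tuple[str, float, str]]) -> list[str]:
    """Order (doc_id, relevance, trust_class) hits by relevance * weight.

    The multiplicative score is an assumed formula for illustration.
    """
    ranked = sorted(
        results,
        key=lambda r: r[1] * DEFAULT_TRUST_WEIGHTS[r[2]],
        reverse=True,
    )
    return [doc_id for doc_id, _, _ in ranked]
```

Under this scheme a moderately relevant academic source can outrank a highly relevant AI-generated one, which is the kind of prioritization the weights are meant to express.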
Policy Cascade
Retention policies follow a hierarchical cascade: global > organization > division > user. Each level can only tighten constraints set by the level above; a division policy cannot allow shorter retention than the organization policy. Policies are resolved per-document based on the document's scope assignment, so two divisions with conflicting policies resolve independently for their respective documents.
Policies control:
- Retention period: minimum years or days before archival
- Mutability: whether the document can be modified
- Legal hold: whether the document is frozen for legal purposes
- Archive action: what happens at the end of retention (`move`, `delete`, or `keep`)
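The tighten-only cascade can be sketched as a fold over the four levels. This is a simplified model under stated assumptions: "tighter" is taken to mean a longer retention period, and mutability and legal-hold restrictions that can be switched on by a lower level but never switched off. Attempts to loosen a constraint are silently ignored here, whereas a real system might reject them. The `Policy` class and `resolve` function are hypothetical, and archive actions are omitted because this page does not define which action counts as stricter.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("global", "organization", "division", "user")


@dataclass(frozen=True)
class Policy:
    # None means "this level does not set the field".
    retention_days: Optional[int] = None
    immutable: Optional[bool] = None
    legal_hold: Optional[bool] = None


def resolve(policies: dict[str, Policy]) -> Policy:
    """Compute the effective policy for one document's scope chain."""
    retention = 0
    immutable = False
    legal_hold = False
    for level in LEVELS:
        policy = policies.get(level)
        if policy is None:
            continue
        if policy.retention_days is not None:
            # A lower level may lengthen retention, never shorten it.
            retention = max(retention, policy.retention_days)
        if policy.immutable:
            immutable = True  # once immutable, stays immutable
        if policy.legal_hold:
            legal_hold = True  # a hold cannot be lifted further down
    return Policy(retention, immutable, legal_hold)
```

For example, if the organization sets 730 days and a division asks for 180, the effective retention stays at 730. Because resolution runs per document, two divisions with different policies each get their own result for their own documents.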
Temporal Reconstruction
Each document's canonical state is stored alongside structural diffs between similar versions. Point-in-time reconstruction starts from the canonical content and applies the relevant diffs.
Reconstructed documents can be output as markdown or as read-only PDF with encryption.
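Point-in-time reconstruction from a base state plus diffs can be sketched with Python's standard `difflib`. The storage layout here is an assumption for illustration: the canonical content is treated as the oldest version, each diff is a list of line-range replacements relative to the previous state, and timestamps are ISO 8601 strings that sort lexicographically. The function names are hypothetical, and the actual Unsterwerx diff format may differ.

```python
import difflib

# A diff is a list of (start, end, replacement_lines) edits against the old lines.
Diff = list[tuple[int, int, list[str]]]


def make_diff(old: list[str], new: list[str]) -> Diff:
    """Record only the changed line ranges between two versions."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_diff(lines: list[str], diff: Diff) -> list[str]:
    """Apply edits back to front so earlier indices stay valid."""
    result = list(lines)
    for start, end, replacement in reversed(diff):
        result[start:end] = replacement
    return result


def reconstruct(canonical: list[str], history: list[tuple[str, Diff]],
                as_of: str) -> list[str]:
    """Rebuild the document as it stood at `as_of`.

    `history` must be sorted by timestamp, oldest first.
    """
    state = list(canonical)
    for timestamp, diff in history:
        if timestamp > as_of:
            break
        state = apply_diff(state, diff)
    return state
```

Given three versions dated January, February, and March, asking for a date in mid-February replays only the first diff and returns the February text.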
For a full mapping of patent concepts to their implementation status, see Patent Mapping.