TCA Overview

Unsterwerx is a document-domain implementation of concepts from US Patent US9069626B2, Trusted Client-Centric Application Architecture (TCA), by Dr. Robert Whetsel. The current scope covers document normalization and governance; broader cross-application capabilities described in the patent are roadmap items.

Core Principles

Normalization, Abstraction, Compaction (NAC)

Every document passes through three transformation stages:

Normalization: raw format-specific parsing produces uniform text output regardless of input format
Abstraction: structural analysis identifies headings, body text, lists, and tables, then produces canonical markdown
Compaction: content hashing and similarity analysis reduce storage to unique knowledge

In Unsterwerx, these stages correspond to the parse → canonical → similarity pipeline.
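The three stages can be sketched as a chain of small functions. This is a minimal illustration, not the Unsterwerx implementation: the function names, the ALL-CAPS heading heuristic, and the in-memory `store` are assumptions made for the example, and the similarity-analysis half of compaction is left out so that only exact content hashing is shown.

```python
import hashlib
import re


def normalize(raw: bytes) -> str:
    """Normalization: turn raw bytes into uniform text (stand-in for a format-specific parser)."""
    text = raw.decode("utf-8", errors="replace")
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Trim trailing whitespace per line so equivalent inputs converge.
    return "\n".join(line.rstrip() for line in text.split("\n")).strip()


def abstract(text: str) -> str:
    """Abstraction: detect simple structure and emit canonical markdown.

    Assumed heuristic: a short line in ALL CAPS is treated as a heading.
    """
    out = []
    for line in text.split("\n"):
        if line and line.isupper() and len(line) < 60:
            out.append("# " + line.title())
        else:
            out.append(line)
    # Collapse runs of blank lines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(out))


def compact(canonical: str, store: dict[str, str]) -> str:
    """Compaction: keep one copy per content hash and return that hash."""
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    store.setdefault(digest, canonical)
    return digest


store: dict[str, str] = {}
# Two inputs that differ only in line endings and trailing spaces.
a = compact(abstract(normalize(b"OVERVIEW\r\nSame body text.  \r\n")), store)
b = compact(abstract(normalize(b"OVERVIEW\nSame body text.\n")), store)
```

Because both inputs normalize to the same canonical markdown, they hash to the same key and the store holds a single entry.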

Business Intelligence

The TCA defines Business Intelligence as the rules of hierarchy: classification and governance policies. In Unsterwerx, that means:

Classification rules: pattern-based document categorization, scoped to organizational boundaries
Knowledge scoring: Bayesian posterior probabilities for document pair relatedness, trained on bootstrap signals and refined by user feedback
Model invalidation: deterministic tracking of config changes and label events so the model reflects current state

Knowledge scores provide a richer signal than Jaccard similarity alone by also incorporating TF-IDF cosine similarity, structural overlap, temporal proximity, and provenance weights.
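One way to picture how several signals combine into a posterior probability is a naive Bayes update in odds form. The sketch below is an assumption-laden illustration, not the scoring model Unsterwerx ships: the `Signal` type, the 0.5 threshold, the conditional probabilities, and the independence assumption between signals are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    p_given_related: float    # P(signal fires | pair is related), hypothetical value
    p_given_unrelated: float  # P(signal fires | pair is unrelated), hypothetical value


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def posterior_related(prior: float, signals: list[Signal], fired: dict[str, bool]) -> float:
    """Update prior odds with one likelihood ratio per signal (assumes independent signals)."""
    odds = prior / (1.0 - prior)
    for s in signals:
        if fired[s.name]:
            odds *= s.p_given_related / s.p_given_unrelated
        else:
            odds *= (1.0 - s.p_given_related) / (1.0 - s.p_given_unrelated)
    return odds / (1.0 + odds)


signals = [
    Signal("jaccard_high", p_given_related=0.8, p_given_unrelated=0.1),
    Signal("cosine_high", p_given_related=0.7, p_given_unrelated=0.2),
]
overlap = jaccard({"retention", "policy", "hold"}, {"retention", "policy", "audit"})
p_both = posterior_related(
    0.1, signals, {"jaccard_high": overlap >= 0.5, "cosine_high": True}
)
p_none = posterior_related(
    0.1, signals, {"jaccard_high": False, "cosine_high": False}
)
```

With both signals firing, the posterior rises well above the 0.1 prior; with neither firing, it drops below it. In a setup like this, a user label would act as a much stronger piece of evidence than any automated signal.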

User Intelligence

User Intelligence covers the rules of engagement: retention periods, access controls, mutability constraints.

Retention policies: per-class retention periods with a hierarchical cascade
Legal holds: signed documents and policy-driven freezes
User feedback: human labels that override automated signals in knowledge scoring

Trust Chain

Every operation that modifies data is recorded in an append-only, hash-chained audit log. Each event contains:

Timestamp
Action type (ingest, canonical_extract, classify, reconstruct, etc.)
Target document ID
Result (success/error)
Hash linking to the previous event

Run `unsterwerx audit --verify` at any time to confirm that no events have been tampered with or removed.
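The idea behind a hash-chained log can be shown in a few lines. This is a generic sketch of the technique, not the Unsterwerx storage format: the field names, the JSON serialization, the SHA-256 choice, and the all-zero genesis value are assumptions for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first event


def event_hash(body: dict) -> str:
    """Hash a canonical JSON encoding of the event body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: list[dict], timestamp: str, action: str, doc_id: str, result: str) -> None:
    """Append an event whose hash covers its fields and the previous event's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {
        "timestamp": timestamp,
        "action": action,
        "doc_id": doc_id,
        "result": result,
        "prev_hash": prev_hash,
    }
    log.append({**body, "hash": event_hash(body)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash and check that each event links to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or event_hash(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_event(log, "2024-01-01T00:00:00Z", "ingest", "doc-1", "success")
append_event(log, "2024-01-01T00:00:05Z", "canonical_extract", "doc-1", "success")
append_event(log, "2024-01-01T00:00:09Z", "classify", "doc-1", "success")
intact = verify(log)
```

Editing any field of a stored event, or removing an event from the middle of the chain, makes `verify` return `False`. In this simplified form, dropping events from the tail is only detectable if the latest hash is also recorded somewhere outside the log.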

Source Hierarchy

Not all knowledge sources are equally trustworthy. Unsterwerx assigns trust weights to sources based on their class:

| Trust Class | Default Weight | Examples |
| --- | --- | --- |
| `academic` | 5 (highest) | Peer-reviewed papers, dissertations |
| `government` | 3 | Government publications, regulations |
| `curated` | 2 | Curated datasets, local filesystems |
| `ai-generated` | 1 (lowest) | ChatGPT exports, AI-generated content |

Trust weights influence how conflicting information is resolved and how documents are prioritized in search results.
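As a rough illustration of how such weights could be applied, the sketch below picks the claim from the most trusted source and orders search hits by relevance multiplied by trust weight. Only the default weights come from the table; the function names, the tuple layouts, and the multiplicative ranking formula are assumptions, and the actual resolution logic may differ.

```python
# Default weights copied from the trust class table.
TRUST_WEIGHTS = {"academic": 5, "government": 3, "curated": 2, "ai-generated": 1}


def resolve_conflict(claims: list[tuple[str, str]]) -> str:
    """Return the value asserted by the most trusted source.

    Each claim is (trust_class, value). Ties go to the earliest claim,
    because max() returns the first maximal element.
    """
    return max(claims, key=lambda claim: TRUST_WEIGHTS[claim[0]])[1]


def rank_results(results: list[tuple[str, str, float]]) -> list[str]:
    """Order (doc_id, trust_class, relevance) hits by relevance times trust weight."""
    ranked = sorted(results, key=lambda r: r[2] * TRUST_WEIGHTS[r[1]], reverse=True)
    return [doc_id for doc_id, _, _ in ranked]


winner = resolve_conflict(
    [("ai-generated", "retain 3 years"), ("government", "retain 7 years")]
)
order = rank_results(
    [
        ("chat-export", "ai-generated", 0.9),
        ("thesis", "academic", 0.5),
        ("regulation", "government", 0.6),
    ]
)
```

Here the government claim wins the conflict, and the academic thesis outranks a more textually relevant AI-generated export.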

Policy Cascade

Retention policies follow a hierarchical cascade: global > organization > division > user. Each level can only tighten the constraints set by the level above it; for example, a division policy cannot allow a shorter retention period than the organization policy. Policies are resolved per document, based on the document's scope assignment, so two divisions with conflicting policies are resolved independently, each for its own documents.

Policies control:

Retention period: minimum years or days before archival
Mutability: whether the document can be modified
Legal hold: whether the document is frozen for legal purposes
Archive action: what happens at the end of retention (move, delete, or keep)
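The tighten-only rule can be sketched as a fold over the policy chain for a document's scope. The `Policy` and `Effective` types and the `resolve` function are hypothetical names for this example; it assumes "tighter" means a longer retention period, immutability winning over mutability, and a legal hold at any level applying to the document. Archive action is omitted because the source does not say how it cascades.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """One level's settings; None means the level does not set that field."""
    retention_days: Optional[int] = None
    mutable: Optional[bool] = None
    legal_hold: Optional[bool] = None


@dataclass(frozen=True)
class Effective:
    retention_days: int
    mutable: bool
    legal_hold: bool


def resolve(chain: list[Policy]) -> Effective:
    """Fold policies ordered global -> organization -> division -> user.

    A lower level can only tighten: retention can grow, mutability can be
    revoked, and a legal hold can be added, never the reverse.
    """
    retention, mutable, hold = 0, True, False
    for policy in chain:
        if policy.retention_days is not None:
            retention = max(retention, policy.retention_days)
        if policy.mutable is not None:
            mutable = mutable and policy.mutable
        if policy.legal_hold is not None:
            hold = hold or policy.legal_hold
    return Effective(retention, mutable, hold)


effective = resolve(
    [
        Policy(retention_days=365),                  # global
        Policy(retention_days=2555, mutable=False),  # organization
        Policy(retention_days=30, mutable=True),     # division tries to loosen
        Policy(),                                    # user sets nothing
    ]
)
```

The division's attempt to shorten retention and restore mutability has no effect: the effective policy keeps the organization's 2555 days and stays immutable.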

Temporal Reconstruction

Each document's canonical state is stored alongside structural diffs between similar versions. Point-in-time reconstruction starts from the canonical content and applies the relevant diffs.
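A minimal sketch of this idea, using line-level diffs from Python's standard `difflib`, is shown here. It assumes the canonical content is the earliest version and that diffs are stored oldest first with ISO-8601 date strings; the `Patch` layout and the function names are illustrative, and the diffs Unsterwerx stores are structural rather than purely line-based.

```python
import difflib

# A patch is a list of (start, end, replacement_lines) edits against the old version.
Patch = list[tuple[int, int, list[str]]]


def make_patch(old: list[str], new: list[str]) -> Patch:
    """Record only the non-matching regions between two versions."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(lines: list[str], patch: Patch) -> list[str]:
    """Apply edits from the end backwards so earlier indices stay valid."""
    out = list(lines)
    for start, end, replacement in reversed(patch):
        out[start:end] = replacement
    return out


def reconstruct(canonical: list[str], history: list[tuple[str, Patch]], as_of: str) -> list[str]:
    """Apply every patch dated at or before as_of; history is sorted oldest first."""
    lines = list(canonical)
    for timestamp, patch in history:
        if timestamp > as_of:  # same-format ISO dates compare correctly as strings
            break
        lines = apply_patch(lines, patch)
    return lines


v1 = ["# Policy", "Retain 5 years."]
v2 = ["# Policy", "Retain 7 years.", "Legal hold applies."]
v3 = ["# Policy v2", "Retain 7 years.", "Legal hold applies."]
history = [("2024-01-01", make_patch(v1, v2)), ("2024-06-01", make_patch(v2, v3))]
march_state = reconstruct(v1, history, "2024-03-01")
```

Asking for the state as of March 2024 applies only the first diff and yields the second version; a date before the first diff returns the canonical content unchanged.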

Reconstructed documents can be output as markdown or as an encrypted, read-only PDF.

For a full mapping of patent concepts to their implementation status, see Patent Mapping.