Unsterwerx

TCA Overview

Unsterwerx is a document-domain implementation of concepts from US Patent US9069626B2, Trusted Client-Centric Application Architecture (TCA), by Dr. Robert Whetsel. The current scope covers document normalization and governance; broader cross-application capabilities described in the patent are roadmap items.

Core Principles

Normalization, Abstraction, Compaction (NAC)

Every document passes through three transformation stages:

  1. Normalization: raw format-specific parsing produces uniform text output regardless of input format
  2. Abstraction: structural analysis identifies headings, body text, lists, and tables, then produces canonical markdown
  3. Compaction: content hashing and similarity analysis reduce storage to unique knowledge

In Unsterwerx, these stages correspond to the parse → canonical → similarity pipeline.
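The three stages can be sketched as a chain of plain functions. This is a minimal illustration under stated assumptions, not Unsterwerx's parser: normalize, abstract, compact, and jaccard are hypothetical names; format-specific parsing is reduced to decoding text bytes; structure detection uses toy rules (an all-caps line is a heading, lines starting with "-" or "*" form a list); and compaction is assumed to key canonical content by its SHA-256 hash, with word-shingle Jaccard similarity for near-duplicates.

```python
import hashlib
import re


def normalize(raw: bytes, encoding: str = "utf-8") -> str:
    """Normalization (toy stand-in): decode raw bytes into uniform plain text."""
    text = raw.decode(encoding, errors="replace")
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    # Collapse runs of spaces/tabs and trim each line.
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n"))


def abstract(text: str) -> str:
    """Abstraction (toy rules): classify blank-line-separated blocks, emit markdown."""
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.split("\n")
        if len(lines) == 1 and lines[0].isupper():
            out.append("# " + lines[0].title())  # all-caps single line -> heading
        elif all(ln.startswith(("-", "*")) for ln in lines):
            out.append("\n".join("- " + ln.lstrip("-* ") for ln in lines))  # list
        else:
            out.append(" ".join(lines))  # body paragraph, re-flowed onto one line
    return "\n\n".join(out) + "\n"


def compact(canonical: str, store: dict[str, str]) -> str:
    """Compaction: store content under its SHA-256 hash; duplicates add nothing."""
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    store.setdefault(digest, canonical)
    return digest


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity over word k-shingles, for near-duplicate detection."""
    def shingles(s: str) -> set[tuple[str, ...]]:
        words = s.split()
        return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0
```

Because CRLF and LF copies of the same text normalize to identical canonical markdown, they hash to the same key and are stored once; jaccard then catches copies that differ slightly and therefore hash differently.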

Business Intelligence

The TCA defines Business Intelligence as the rules of hierarchy: classification and governance policies. In Unsterwerx, that means:

Knowledge scores provide a richer signal than Jaccard similarity alone by incorporating TF-IDF cosine similarity, structural overlap, temporal proximity, and provenance weights.
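A knowledge score that blends the four signals might look like the sketch below. The blend weights, the 30-day decay constant, the smoothed-IDF formula, the use of heading overlap as "structural overlap", and the rule that the less trusted of the two sources caps provenance are all illustrative assumptions; Doc, tfidf_cosine, and knowledge_score are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Hypothetical blend weights; they sum to 1 so scores stay in [0, 1].
WEIGHTS = {"tfidf": 0.5, "structure": 0.2, "temporal": 0.15, "provenance": 0.15}


@dataclass
class Doc:
    tokens: list[str]    # normalized word tokens
    headings: set[str]   # headings taken from canonical markdown
    timestamp: float     # hypothetical unit: days
    trust_weight: float  # source-class weight, 1 (lowest) to 5 (highest)


def tfidf_cosine(a: list[str], b: list[str], corpus: list[list[str]]) -> float:
    """Cosine similarity of TF-IDF vectors, using a smoothed IDF."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))

    def vec(tokens: list[str]) -> dict[str, float]:
        tf = Counter(tokens)
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}

    va, vb = vec(a), vec(b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def knowledge_score(a: Doc, b: Doc, corpus: list[list[str]], tau_days: float = 30.0) -> float:
    union = a.headings | b.headings
    structure = len(a.headings & b.headings) / len(union) if union else 0.0
    temporal = math.exp(-abs(a.timestamp - b.timestamp) / tau_days)  # decays with time gap
    provenance = min(a.trust_weight, b.trust_weight) / 5.0  # weaker source caps trust
    return (WEIGHTS["tfidf"] * tfidf_cosine(a.tokens, b.tokens, corpus)
            + WEIGHTS["structure"] * structure
            + WEIGHTS["temporal"] * temporal
            + WEIGHTS["provenance"] * provenance)
```

Two documents with identical wording can still score well below 1.0 if they come from low-trust sources or were written far apart in time, which is the extra signal Jaccard alone cannot express.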

User Intelligence

User Intelligence covers the rules of engagement: retention periods, access controls, mutability constraints.

Trust Chain

Every operation that modifies data is recorded in an append-only, hash-chained audit log. Each event contains:

Run unsterwerx audit --verify at any time to confirm that no events have been tampered with or removed.
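The verification idea can be sketched with a minimal hash chain: each event stores the hash of its predecessor, so editing or deleting any earlier event breaks every link after it. The field names (seq, timestamp, action, details, prev_hash, hash) and the AuditLog class are hypothetical, since the actual event schema is not listed here, and SHA-256 over sorted-key JSON is an assumed hashing scheme.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def _digest(event: dict) -> str:
    """Hash every field except the event's own hash, in a canonical JSON form."""
    body = {k: event[k] for k in ("seq", "timestamp", "action", "details", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self._events: list[dict] = []  # append-only by convention

    def append(self, action: str, details: dict) -> dict:
        prev = self._events[-1]["hash"] if self._events else GENESIS
        event = {"seq": len(self._events), "timestamp": time.time(),
                 "action": action, "details": details, "prev_hash": prev}
        event["hash"] = _digest(event)
        self._events.append(event)
        return event

    def verify(self) -> bool:
        prev = GENESIS
        for i, event in enumerate(self._events):
            # A gap in seq means an event was removed; a bad link or hash means tampering.
            if event["seq"] != i or event["prev_hash"] != prev or event["hash"] != _digest(event):
                return False
            prev = event["hash"]
        return True
```

One limit of a bare chain: silently dropping the newest events leaves a shorter chain that still verifies, so a complete design also needs the latest hash anchored somewhere outside the log.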

Source Hierarchy

Not all knowledge sources are equally trustworthy. Unsterwerx assigns trust weights to sources based on their class:

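| Trust Class  | Default Weight | Examples                               |
|--------------|----------------|----------------------------------------|
| academic     | 5 (highest)    | Peer-reviewed papers, dissertations    |
| government   | 3              | Government publications, regulations   |
| curated      | 2              | Curated datasets, local filesystems    |
| ai-generated | 1 (lowest)     | ChatGPT exports, AI-generated content  |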

Trust weights influence how conflicting information is resolved and how documents are prioritized in search results.
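A minimal sketch of how the default weights could drive conflict resolution and ranking. The summing rule in resolve and the relevance-times-weight rule in rank are illustrative assumptions, not the documented algorithm; Claim, resolve, and rank are hypothetical names.

```python
from dataclasses import dataclass

# Default weights from the Source Hierarchy table.
TRUST_WEIGHTS = {"academic": 5, "government": 3, "curated": 2, "ai-generated": 1}


@dataclass
class Claim:
    value: str         # the asserted fact, e.g. a date or a figure
    source_class: str  # one of the TRUST_WEIGHTS keys


def resolve(claims: list[Claim]) -> str:
    """Return the value backed by the highest total trust weight (assumed rule)."""
    totals: dict[str, int] = {}
    for claim in claims:
        totals[claim.value] = totals.get(claim.value, 0) + TRUST_WEIGHTS[claim.source_class]
    return max(totals, key=lambda v: totals[v])


def rank(hits: list[tuple[str, float, str]]) -> list[str]:
    """Order (doc_id, relevance, source_class) hits by relevance x trust weight."""
    ordered = sorted(hits, key=lambda h: h[1] * TRUST_WEIGHTS[h[2]], reverse=True)
    return [doc_id for doc_id, _, _ in ordered]
```

Under the summing rule, two curated sources (2 + 2) outvote one government source (3); a "highest single weight wins" rule would decide the other way, which is the kind of policy choice this sketch does not settle.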

Policy Cascade

Retention policies follow a hierarchical cascade: global > organization > division > user. Each level can only tighten constraints set by the level above; a division policy cannot allow shorter retention than the organization policy. Policies are resolved per-document based on the document's scope assignment, so two divisions with conflicting policies resolve independently for their respective documents.
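The cascade can be sketched as a walk from global down to user that rejects any level trying to shorten the retention it inherits. The POLICIES table, its (level, scope id) keys, retention measured in days, and the scope dictionary are hypothetical; a level with no policy simply inherits from the level above.

```python
LEVELS = ("global", "organization", "division", "user")  # outermost to innermost

# Hypothetical policy store: (level, scope id) -> minimum retention in days.
POLICIES = {
    ("global", "*"): 365,
    ("organization", "acme"): 730,
    ("division", "legal"): 2555,
    ("division", "marketing"): 730,
}


def effective_retention(scope: dict[str, str]) -> int:
    """Resolve retention for a document whose scope maps level -> scope id."""
    effective = None
    for level in LEVELS:
        scope_id = "*" if level == "global" else scope.get(level)
        days = POLICIES.get((level, scope_id))
        if days is None:
            continue  # no policy at this level: inherit from the level above
        if effective is not None and days < effective:
            # A lower level may only tighten (lengthen) retention, never shorten it.
            raise ValueError(f"{level} policy {scope_id!r} ({days}d) is shorter "
                             f"than inherited {effective}d")
        effective = days
    if effective is None:
        raise ValueError("no applicable retention policy")
    return effective
```

Because resolution is driven by each document's own scope, a legal document resolves to 2555 days and a marketing document to 730 days without either division's policy affecting the other.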

Policies control:

Temporal Reconstruction

Each document's canonical state is stored alongside structural diffs between similar versions. This enables point-in-time reconstruction: the diffs relevant to the requested time are applied to the canonical content.
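Point-in-time reconstruction can be sketched with the standard library's difflib. This assumes forward diffs (each one turns the previous version into the next) stored with timestamps after a base canonical version; whether Unsterwerx stores forward or reverse diffs, and its actual structural diff format, are not specified here. VersionedDoc, make_diff, and apply_diff are hypothetical names, and the diffs are line-based rather than structural.

```python
import difflib
from dataclasses import dataclass, field

# (start, end, replacement lines) in the coordinates of the previous version.
Diff = list[tuple[int, int, list[str]]]


def make_diff(old: list[str], new: list[str]) -> Diff:
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_diff(old: list[str], diff: Diff) -> list[str]:
    out = list(old)
    for i1, i2, repl in reversed(diff):  # back-to-front keeps earlier indices valid
        out[i1:i2] = repl
    return out


@dataclass
class VersionedDoc:
    base: list[str]  # canonical content of the first version
    diffs: list[tuple[float, Diff]] = field(default_factory=list)  # ascending timestamps

    def record(self, timestamp: float, new_lines: list[str]) -> None:
        current = self.at(float("inf"))
        self.diffs.append((timestamp, make_diff(current, new_lines)))

    def at(self, t: float) -> list[str]:
        """Rebuild the document as it stood at time t."""
        lines = list(self.base)
        for ts, diff in self.diffs:
            if ts > t:
                break
            lines = apply_diff(lines, diff)
        return lines
```

Storing only the changed ranges keeps versions cheap when documents are similar, at the cost of replaying every diff up to the requested time.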

Reconstructed documents can be output as markdown or as an encrypted, read-only PDF.

For a full mapping of patent concepts to their implementation status, see Patent Mapping.