Unsterwerx

Configuration Reference

Unsterwerx configuration is stored in TOML format in the data directory. View current config with unsterwerx config show.

[general]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| general.data_dir | string | ~/.unsterwerx | Data directory path |

[ingest]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| ingest.extensions | string[] | ["pdf", "docx", "xlsx", "pptx", "doc", "xls", "ppt", "txt", "csv", "rtf"] | File extensions to process during ingestion |
| ingest.max_file_size | integer | 524288000 (500 MB) | Maximum scan/discovery file size in bytes |
| ingest.max_size_file | integer | 104857600 (100 MB) | Maximum parse-stage file size in bytes for in-memory parsers |
| ingest.skip_hidden | boolean | true | Skip hidden files (starting with .) |
| ingest.follow_symlinks | boolean | false | Follow symbolic links during directory traversal |
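
To make the interaction of these settings concrete, here is a minimal sketch of a scan-stage filter built from the defaults above. It is not Unsterwerx's actual code: the function name `should_scan`, the case-insensitive extension match, and the inclusive size comparison are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Defaults copied from the [ingest] table.
EXTENSIONS = {"pdf", "docx", "xlsx", "pptx", "doc", "xls", "ppt", "txt", "csv", "rtf"}
MAX_FILE_SIZE = 524_288_000  # ingest.max_file_size (500 MB)
SKIP_HIDDEN = True           # ingest.skip_hidden


def should_scan(path: str, size_bytes: int) -> bool:
    """Hypothetical scan-stage filter (illustrative only)."""
    p = PurePosixPath(path)
    # Hidden files are those whose name starts with a dot.
    if SKIP_HIDDEN and p.name.startswith("."):
        return False
    # Assumption: extensions are compared without the dot, ignoring case.
    if p.suffix.lstrip(".").lower() not in EXTENSIONS:
        return False
    # Assumption: a file exactly at the limit is still accepted.
    return size_bytes <= MAX_FILE_SIZE


print(should_scan("docs/report.PDF", 1024))
print(should_scan("docs/.draft.pdf", 1024))
```

Under these assumptions the first call accepts the file and the second rejects it as hidden.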

[similarity]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| similarity.shingle_k | integer | 3 | Shingle size (number of tokens per shingle) |
| similarity.num_hashes | integer | 128 | Number of MinHash hash functions |
| similarity.lsh_bands | integer | 32 | Number of LSH bands |
| similarity.lsh_rows | integer | 4 | Number of rows per LSH band |
| similarity.threshold | float | 0.3 | Jaccard similarity threshold |
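
The following sketch illustrates how these parameters relate in the standard MinHash/LSH banding scheme, where a signature of `num_hashes` values is split into `lsh_bands` bands of `lsh_rows` rows each (the defaults satisfy 32 × 4 = 128). Whitespace tokenization and the helper names are assumptions for illustration; Unsterwerx's own tokenizer and internals may differ.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-token shingles (assumes whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two shingle sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_probability(s: float, bands: int = 32, rows: int = 4) -> float:
    """Chance that two documents with Jaccard similarity s share at least
    one identical band under the standard banding scheme: 1 - (1 - s^r)^b."""
    return 1.0 - (1.0 - s ** rows) ** bands


a = shingles("the quick brown fox jumps")
b = shingles("the quick brown fox sleeps")
print(jaccard(a, b))               # 2 shared shingles out of 4 distinct -> 0.5
print(candidate_probability(0.3))  # probability at the default threshold
```

If you change `similarity.num_hashes`, it is worth checking that `lsh_bands * lsh_rows` still equals it; more bands with fewer rows makes lower-similarity pairs more likely to become candidates.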

[storage]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| storage.journal_mode | string | "wal" | SQLite journal mode (wal recommended) |
| storage.busy_timeout_ms | integer | 5000 | SQLite busy timeout in milliseconds |
| storage.zstd_level | integer | 3 | Zstandard compression level for diff payloads |
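
The two SQLite settings correspond to the standard `journal_mode` and `busy_timeout` pragmas. The sketch below applies them with Python's built-in `sqlite3` module to a throwaway database; how Unsterwerx itself opens its database is not documented here, so `open_db` and the file name `example.db` are hypothetical.

```python
import os
import sqlite3
import tempfile

JOURNAL_MODE = "wal"     # storage.journal_mode
BUSY_TIMEOUT_MS = 5000   # storage.busy_timeout_ms


def open_db(path: str) -> sqlite3.Connection:
    """Open a database and apply the two pragmas (illustrative only)."""
    conn = sqlite3.connect(path)
    # PRAGMA arguments cannot be bound as SQL parameters, so they are
    # formatted into the statement from trusted constants.
    conn.execute(f"PRAGMA journal_mode = {JOURNAL_MODE}")
    conn.execute(f"PRAGMA busy_timeout = {int(BUSY_TIMEOUT_MS)}")
    return conn


# WAL mode needs a file-backed database, so use a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    conn = open_db(os.path.join(tmp, "example.db"))
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
    conn.close()

print(mode, timeout)
```

WAL mode lets readers proceed while a writer is active, and the busy timeout makes a blocked connection wait up to the given number of milliseconds before reporting that the database is locked.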

[knowledge]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| knowledge.feature_version | integer | 1 | Feature version. Bump it to force full recomputation of semantic features |
| knowledge.temporal_scale_secs | float | 86400.0 | Scale for temporal proximity in seconds (86400 = 24 hours) |
| knowledge.feedback_weight | float | 3.0 | Weight multiplier for user feedback labels in Bayesian training |
| knowledge.negative_ratio | float | 2.0 | Maximum negative samples as ratio of positive count |
| knowledge.min_bootstrap_confidence | float | 0.5 | Minimum confidence threshold for bootstrap labels |
| knowledge.bootstrap_threshold | float | 0.7 | Jaccard threshold for bootstrap positive seed labels |
| knowledge.dedup_threshold | float | 0.8 | Default posterior threshold for knowledge dedup scan/apply |
| knowledge.vectors.threshold | float | 0.5 | Posterior threshold for clustering documents into knowledge vectors |
| knowledge.vectors.min_vector_size | integer | 2 | Minimum cluster size required to persist a vector |
| knowledge.vectors.edge_threshold | float | 0.3 | Posterior threshold for inter-vector edges |
| knowledge.vectors.max_vector_size | integer | 50 | Warning threshold for unusually large vectors |
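
As one illustration of how a ratio-style setting behaves, the sketch below caps the number of negative training samples at `negative_ratio` times the number of positives. The function `cap_negatives`, the rounding down, and the seeded random subsampling are assumptions; the reference does not describe how Unsterwerx actually selects negatives.

```python
import random

NEGATIVE_RATIO = 2.0  # knowledge.negative_ratio


def cap_negatives(positives: list, negatives: list,
                  negative_ratio: float = NEGATIVE_RATIO,
                  seed: int = 0) -> list:
    """Keep at most negative_ratio * len(positives) negatives (hypothetical)."""
    # Assumption: the limit is rounded down to a whole number of samples.
    limit = int(len(positives) * negative_ratio)
    if len(negatives) <= limit:
        return list(negatives)
    # Assumption: excess negatives are dropped by uniform random sampling.
    return random.Random(seed).sample(list(negatives), limit)


kept = cap_negatives(positives=["p1", "p2", "p3"], negatives=list(range(100)))
print(len(kept))  # 3 positives * 2.0 -> at most 6 negatives
```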

Setting Values

```bash
# Get a value
unsterwerx config get similarity.threshold
# Output: 0.3

# Set a value
unsterwerx config set similarity.threshold 0.5

# View all settings
unsterwerx config show
```

Notes