01 · INGEST
Textify & Normalize
PDF extraction via pdfplumber. Raw text normalization, page mapping, character count, reference block isolation. Extraction method stamped to GoldPaper.
02 · TOPOLOGY
Six-Layer Fingerprint
Pre-linguistic topology across token, character, punctuation, sentence, paragraph, and document layers. Density gradient, hapax ratio, type-token ratio, character entropy. Fully deterministic — same input always produces same output.
03 · TOKENS
Semantic Extraction
Stopword removal, top tokens, bigrams, trigrams, token-to-page provenance map, hapax legomena sample, lexical density, average word length.
04 · SCHEMA
Schema Graph Traversal
Schema.org type neighborhood traversal. Ancestor chain — ScholarlyArticle ← Article ← CreativeWork ← Thing. Negative space declaration per Constitutional Law VI. Schema hits scored across both corpora.
05 · WIKIPEDIA
Edge Discovery
Token-to-Wikipedia edge verification with confidence scoring. HIGH and MEDIUM hits retained with extract, page provenance, and source paper stamp. Unfetched candidates preserved for provenance display.
06 · BRIDGE
Cross-Paper Semantic Bridge
Common tokens, common phrases, uncommon tokens per paper, semantic edges with fuzzy match scores, shared references, topology delta field by field. No inference — pure deterministic cross-corpus measurement.
07 · INFERENCE
Synthesis Generation
Three LLM passes — synopsis per paper, connection over shared intellectual territory, inflection over structural negative space. Labeled SEMANTIC-LLM. Control condition run in parallel over raw unstructured text only — same model, same temperature.
08 · RENDER
HTML Page Generation
generate_publication.py assembles 17 sections into a single complete HTML page — provenance ledger, synopsis, training bundle, semantic bridge, topology, word metrics, schema graph, Wikipedia cards, full corpus windows, structured inference, unstructured control, root-LD display. Written to output/publications/{slug}/index.html.
09 · PUBLISH
Graph Append & Deploy
Summary node prepended to publications/graph.jsonld — newest first, permanently growing. Card rendered to publications.html index. Lineage edge to previous post written. R2 upload. Cloudflare deploy.