The Boise Standard Record

A minted entity is a permanent, machine-readable provenance record — built to the standard we would want AI reading about us.

Every entity in the Boise Standard directory is minted. Minting is the act of crawling a live source, measuring every measurable dimension of it, assembling those measurements into a structured provenance record with a permanent identifier, and publishing that record at a stable URL. There, AI systems, researchers, and other readers can all read the same record, with no ambiguity about what was measured, when, and by what method.

The record does not describe the entity. It measures it. The distinction is not semantic. A description carries the author's interpretation. A measurement carries the author's methodology, and the methodology is published alongside the output. Every number on a Boise Standard entity page traces to a specific pipeline stage, a specific crawl timestamp, a specific source URL. The chain of custody is unbroken from the raw HTTP response to the rendered page.

The living example of what a minted entity record looks like — in full — is boisestandard.org/web/hamstrahvac-com. That page is the reference implementation. Everything described on this page is present on that page, in the code, right now.

What Gets Minted

The Boise Standard refinery pipeline crawls the source domain, extracts every schema.org block from every interior page it can reach, measures the full text corpus for structural topology, scores the schema implementation against the declared type's property neighborhood, generates a machine-readable atomic answer grounded in the measured fields, and assembles all of it into a Root-LD traveling context pod embedded directly in the page head.

An AI crawler hitting a Boise Standard entity profile gets complete provenance on the first HTTP request. No body parse required. The structured data in the head carries the entity's full measurement record — identity, schema graph, topology fingerprint, semantic signal, gap analysis, atomic answer, and a recursive layer initialized at mint and ready to receive corpus edges as the graph grows.

This is not a directory listing. It is a provenance record built to the standard the machine-readable web requires and the community deserves.

What a Minted Record Contains

Every field measured. Every field sourced.

Identity

Domain · slug · TLD · canonical URL · Federation ID · status code · SSL · server · response time · redirect chain · tech stack · security label · freshness label

Schema

All declared types · property count · block count · coverage score · gap list · parent types · sibling types · child types · negative type space · graph edge URLs

Signal

SEO record · topology fingerprint · top 40 semantic words by frequency · 8 ratio signals · navigation map · full extracted text corpus · atomic answer

Provenance

UUID · content hash · mint timestamp · pipeline version · generation method · vocab version · analysis timestamp · model stamp · input hash

The Refinery Pipeline

Six stages. Every measurement deterministic.
Same input produces the same output. Always.

The pipeline is not a black box. Every stage is named, timed, and stamped in the record it produces. The hamstrahvac.com entity was minted in 26.33 seconds across six stages. Every entity in the directory was minted by the same pipeline, in the same sequence, against the same vocabulary.

Stage 1 — Extract 1

Live HTTP crawl of the source domain.

The pipeline makes a live HTTP request to the canonical source URL, follows the redirect chain if present, records the status code, SSL validity, server header, and response time in milliseconds, and extracts the full HTML of the homepage. All schema.org JSON-LD blocks in the homepage head are extracted verbatim — preserved as Law I requires, with full provenance attribution. Stage 1 timing for hamstrahvac.com: 4.44 seconds.
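The verbatim extraction step can be illustrated with a minimal sketch using only the Python standard library. This is not the refinery's actual code: the class name `JsonLdExtractor`, the function `extract_jsonld`, and the record keys (`raw`, `parsed`, `source_url`, `fetched_at`) are hypothetical names chosen for illustration. The sketch works on HTML that has already been fetched, so the network request itself is left out.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []  # verbatim block text, in document order

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html, source_url, fetched_at):
    """Return one record per JSON-LD block: raw text kept verbatim, plus attribution."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    records = []
    for raw in parser.blocks:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None  # keep the raw block even when it does not parse
        records.append(
            {"raw": raw, "parsed": parsed, "source_url": source_url, "fetched_at": fetched_at}
        )
    return records


SAMPLE_HTML = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "HVACBusiness", "name": "Example HVAC"}</script>
<script>console.log("not schema");</script>
</head><body></body></html>"""

records = extract_jsonld(SAMPLE_HTML, "https://example.com/", "2026-06-13T23:41:05Z")
print(len(records), records[0]["parsed"]["@type"])
```

Keeping the unparsed `raw` string next to the parsed form is one way to honor the "preserved verbatim" requirement: a malformed block is still recorded rather than silently dropped.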

Stage 2 — Extract 2

Interior page crawl and full corpus assembly.

The pipeline crawls interior pages reachable from the homepage, extracts schema.org blocks from each, and assembles the full text corpus from all crawled pages. Each page's schema blocks are preserved verbatim with source URL and fetch timestamp. The interior crawl produces the navigation map, URL depth distribution, dead link count, and interior word count. Stage 2 timing for hamstrahvac.com: 19.6 seconds across 58 interior pages, 44,891 words.
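As a rough illustration of one Stage 2 output, the sketch below computes a URL depth distribution from a list of crawled URLs. The definition of depth used here (the number of non-empty path segments) is an assumption, as are the function names `url_depth` and `depth_distribution`; the source does not say how the pipeline defines depth.

```python
from collections import Counter
from urllib.parse import urlparse


def url_depth(url):
    """Number of non-empty path segments: '/' is 0, '/services/heating' is 2."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def depth_distribution(urls):
    """Map each depth to the number of crawled URLs found at that depth."""
    return dict(sorted(Counter(url_depth(u) for u in urls).items()))


crawled = [
    "https://example.com/",
    "https://example.com/services",
    "https://example.com/services/heating",
    "https://example.com/services/cooling",
    "https://example.com/about",
]
print(depth_distribution(crawled))  # {0: 1, 1: 2, 2: 2}
```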

Stage 3 — Topology

Six-layer pre-linguistic shape measurement.

The topology fingerprint measures the extracted text corpus across six layers: character, token, punctuation, sentence, paragraph, and document. Every measurement is deterministic — the same input always produces the same output. The fingerprint includes type-token ratio, hapax ratio, repetition score, sentence skewness, paragraph structure, and character entropy. A SHA-256 hash of the extracted text seals the measurement. Stage 3 timing: 0.058 seconds.
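A few of the named measurements can be sketched as follows. This is a simplified illustration, not the published fingerprint: the tokenizer (a `\w+` regular expression over lowercased text), the choice to divide hapax count by total tokens, and the function name `topology_sample` are all assumptions. Because every step is a pure function of the input text, the same text always yields the same values.

```python
import hashlib
import math
import re
from collections import Counter


def topology_sample(text):
    """Compute a handful of deterministic shape measurements for a text corpus."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)  # words used exactly once

    char_counts = Counter(text)
    n_chars = len(text)
    # Shannon entropy of the character distribution, in bits per character.
    entropy = 0.0
    if n_chars:
        entropy = -sum((c / n_chars) * math.log2(c / n_chars) for c in char_counts.values())

    return {
        "token_count": total,
        "type_token_ratio": len(counts) / total if total else 0.0,
        "hapax_ratio": hapaxes / total if total else 0.0,
        "char_entropy_bits": entropy,
        # The hash seals the measurement to the exact text that was measured.
        "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }


sample = topology_sample("the cat saw the dog")
print(sample["type_token_ratio"], sample["hapax_ratio"])  # 0.8 0.6
```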

Stage 4 — Schema Analysis

GDR Weighted Coverage Score v2.0.

The schema analyzer scores the entity's declared schema.org types against the full property neighborhood of the scored type. Four weighted components: own-property implementation (55%), inherited ancestor properties (25%), multi-type breadth (12%), and negative space precision (8%). The scored type is selected by a three-pass priority ladder — business identity types first, content types second, structural types as fallback only. The gap list names every recommended property not yet implemented. Each gap is a question AI cannot accurately answer about this entity. Stage 4 timing: 0.01 seconds.

Stage 5 — Atomic Answer

Machine-generated summary. Grounded. Stamped.

The atomic answer is a machine-generated summary of the entity, grounded in the measured fields of the record — not in the model's training data. The generation method, model identifier, and SHA-256 hash of the input are stamped in the record. The atomic answer is what AI reads first when it hits a Boise Standard entity page — served by the SpeakableSpecification targeting .atomic-answer-text in the page head. Stage 5 timing for hamstrahvac.com: 2.214 seconds.
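One way to stamp a generated summary with a hash of its input is sketched below. The function name `stamp_atomic_answer`, the key names, and the canonical JSON serialization (sorted keys, compact separators) are assumptions made for illustration; the source states only that the generation method, model identifier, and a SHA-256 hash of the input are recorded. The model call itself is out of scope, so the answer text is passed in.

```python
import hashlib
import json


def stamp_atomic_answer(measured_fields, answer_text, model_id, generation_method):
    """Attach provenance to a summary: which model, which method, which exact input."""
    # Serialize with sorted keys so the same fields always hash the same way,
    # regardless of the order they were assembled in.
    canonical = json.dumps(
        measured_fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return {
        "text": answer_text,
        "model": model_id,
        "generation_method": generation_method,
        "input_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


fields_a = {"domain": "example.com", "declared_types": ["HVACBusiness"]}
fields_b = {"declared_types": ["HVACBusiness"], "domain": "example.com"}  # same data, other order

stamp_a = stamp_atomic_answer(fields_a, "Example summary.", "example-model", "grounded-summary")
stamp_b = stamp_atomic_answer(fields_b, "Example summary.", "example-model", "grounded-summary")
print(stamp_a["input_sha256"] == stamp_b["input_sha256"])  # True
```

Anyone holding the same measured fields can recompute the hash and confirm the summary was generated from that input and no other.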

Stage 6 — Mint

Root-LD assembly and permanent record publication.

All measurements are assembled into the Root-LD traveling context pod and embedded in the page head alongside the verbatim source schema blocks, the Boise Standard entity record, the FAQPage, Dataset, BreadcrumbList, ProfilePage, and schema graph neighborhood. A UUID is assigned. A Federation ID is stamped. A content hash seals the anchor layer. The record is published at its permanent URL. The recursive layer is initialized — empty by design, ready to receive corpus edges. Total pipeline time for hamstrahvac.com: 26.33 seconds.

26.33s

Total pipeline time — hamstrahvac.com mint

Schema.org blocks extracted and preserved verbatim

Graph edges minted — 1 TLD, 18 schema type, 45 schema property

31,257

Tokens measured across the full crawled corpus

The Schema Coverage Score

A schema gap is not a technical deficiency.
It is a question AI cannot answer about you.

Schema.org is the shared vocabulary of the machine-readable web. Founded in 2011 by Google, Microsoft, Yahoo, and Yandex, it is the standard by which web publishers declare structured facts about themselves — their identity, their services, their location, their ratings, their credentials — in a format any machine can read without inference.

The Web Almanac 2024 found that only 44 percent of domains carry any structured data at all, and of those, coverage of the available property space is typically shallow. The majority of local business pages implement a narrow slice of what schema.org makes available (address, name, phone number) and leave the rest undeclared. Source: Web Almanac 2024, Structured Data chapter.

The Boise Standard GDR Weighted Coverage Score measures how completely a given entity has implemented the schema.org property space available to its declared type. A score of 37% — like hamstrahvac.com at the time of minting — means 69 recommended properties are not yet implemented. Each of those 69 gaps is a specific question AI cannot answer accurately about that entity from structured data alone. It must infer. Inference introduces error. Error at scale becomes hallucination.

The score is not a grade. It is a measurement of the gap between what an entity has declared about itself and what the schema.org vocabulary makes available for that type of entity. Verification closes that gap — permanently, in the record, in the language AI reads.

GDR Weighted Coverage Score v2.0

Four components. One score. Full methodology.

Component 1 — 55%

Own-property implementation

Properties directly belonging to the scored type — the primary business identity type selected by the priority ladder, not structural or navigational types — measured against the full property space declared for that type in the schema.org vocabulary.

Component 2 — 25%

Inherited ancestor properties

Properties available through the scored type's parent types, measured at depth 1 and depth 2 only. Capped to prevent Thing-level noise from inflating the score. An entity cannot score well on ancestors alone.

Component 3 — 12%

Multi-type breadth

The number of distinct meaningful schema types declared, saturating at five types. CMS boilerplate — Yoast and RankMath auto-generate WebPage, WebSite, Organization, BreadcrumbList, and Article on every WordPress install — is explicitly accounted for. Five auto-generated types from a CMS do not represent genuine breadth.

Component 4 — 8%

Negative space precision

The absence of structurally unrelated type branches is a signal of declaration coherence: the entity knows what it is and has not inflated its type list with irrelevant branches. Ten THING_BRANCHES are measured for absence. More absent branches means more precise self-declaration.
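The four weights combine into a single score roughly as sketched below. Only the weights (55/25/12/8), the five-type saturation, and the count of ten branches come from the text. How each component is normalized to a 0-to-1 value is an assumption: the sketch uses simple implemented-over-available ratios and treats more absent branches as a higher negative space value. The names `coverage_score`, `ratio`, and the parameter names are hypothetical.

```python
WEIGHTS = {"own": 0.55, "ancestor": 0.25, "breadth": 0.12, "negative_space": 0.08}
BREADTH_SATURATION = 5   # multi-type breadth saturates at five meaningful types
THING_BRANCH_COUNT = 10  # branches checked for absence


def ratio(part, whole):
    """Fraction of `whole` covered by `part`, clamped to 1.0; 0.0 when `whole` is 0."""
    return min(part / whole, 1.0) if whole else 0.0


def coverage_score(own_impl, own_total, anc_impl, anc_total, meaningful_types, absent_branches):
    """Return (score as a percentage, per-component values in the range 0..1)."""
    components = {
        "own": ratio(own_impl, own_total),
        "ancestor": ratio(anc_impl, anc_total),
        "breadth": ratio(min(meaningful_types, BREADTH_SATURATION), BREADTH_SATURATION),
        "negative_space": ratio(absent_branches, THING_BRANCH_COUNT),
    }
    weighted = sum(WEIGHTS[name] * value for name, value in components.items())
    return round(100 * weighted, 1), components


score, parts = coverage_score(
    own_impl=12, own_total=40, anc_impl=10, anc_total=50, meaningful_types=3, absent_branches=9
)
print(score, parts)
```

Because own-property implementation carries 55% of the weight, an entity with strong ancestor coverage, full breadth, and perfect negative space but no own-type properties tops out at 45 in this sketch.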

The scored type is selected by a three-pass priority ladder. Business identity types are evaluated first — HVACBusiness, LocalBusiness, Organization, Restaurant, and so on. Content types are evaluated second — Article, BlogPosting, FAQPage. Structural and navigational types — WebPage, WebSite, BreadcrumbList — are the fallback only. A WordPress site that auto-generates five structural types does not get scored as a WebPage. It gets scored as the business it actually is.
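The three-pass ladder can be sketched as a loop over tiers in priority order. The tier memberships below contain only the example types named above, so they are deliberately incomplete, and the tie-break within a tier (first declared type wins) is an assumption, as is the function name `select_scored_type`.

```python
# Tiers in priority order. Memberships list only the examples named in the text.
LADDER = (
    ("business_identity", {"HVACBusiness", "LocalBusiness", "Organization", "Restaurant"}),
    ("content", {"Article", "BlogPosting", "FAQPage"}),
    ("structural", {"WebPage", "WebSite", "BreadcrumbList"}),
)


def select_scored_type(declared_types):
    """Return (type, tier) for the highest-priority declared type, or (None, None)."""
    for tier, members in LADDER:
        # Assumption: within a tier, the first type in declaration order wins.
        for declared in declared_types:
            if declared in members:
                return declared, tier
    return None, None


wordpress_site = ["WebPage", "WebSite", "BreadcrumbList", "HVACBusiness"]
print(select_scored_type(wordpress_site))  # ('HVACBusiness', 'business_identity')
```

A structural type is returned only when neither of the first two passes finds a match, which mirrors the "fallback only" rule.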

Research on schema.org implementation in AI retrieval contexts indicates that a core set of types (LocalBusiness, Organization, FAQPage) produces the strongest signal for AI citation. A 2024 study by Stackmatix found that entities implementing Tier 1 schema types saw a 3-to-1 improvement in AI citation rate over unstructured pages (Stackmatix, Structured Data and AI Search). The gap list on every Boise Standard entity page is a prioritized, specific, actionable list of what would close that gap for that entity.

Schema.org v30.0 was published March 2026. The Boise Standard refinery runs against the latest published vocabulary. Every entity record carries the vocabulary version and analysis timestamp. schema.org/version/latest.

Root-LD Architecture

Three layers. Every entity. Every page.
An AI crawler gets complete provenance on the first request.

Root-LD is a linked data specification developed for the Boise Standard project from independent research into frontier AI systems, knowledge graph architecture, provenance standards, and information theory. The W3C JSON-LD 1.1 specification — the underlying format — is published at w3.org/TR/json-ld11. The Root-LD specification is published at root-ld.org. Boise Standard is the first deployment at regional scale.

Layer 1 — rld:anchor

Immutable. Frozen at mint. Never changes.

The anchor layer is sealed at the moment of minting and never modified. It contains the UUID, Federation ID, content hash of the extracted text, primary source URL, source verification flag, generation method, pipeline version, queued timestamp, mint timestamp, sequence number, domain signature, and the full manifest — a complete inventory of what the record contains and where each section lives.

The manifest includes a table of contents with permanent URLs pointing to every sub-record: body, schema graph, topology, semantic keywords, atomic answer, manifest JSON, Root-LD JSON, and the recursive edge collections. The link pod contains direct URLs to the canonical source, the TLD graph edge, the official schema.org vocabulary, and the Boise Standard vocabulary.

An AI system reading the anchor layer knows exactly what it is looking at, when it was created, by what method, from what source, and where every piece of the record lives. Full chain of custody from the first request.

Layer 2 — rld:body

Complete measurement snapshot. Frozen at mint.

The body layer is a complete measurement snapshot of the entity at the moment of minting. It is frozen: a new mint produces a new body, but this body does not change. It contains eleven named subsections: identity, SEO, schema, semantic, topology fingerprint, ratio signals, navigation, provenance, graph edges, pipeline timing, and atomic answer.

The topology fingerprint is a six-layer pre-linguistic shape measurement of the full extracted text corpus — 12 deterministic values including type-token ratio, hapax ratio, repetition score, sentence skewness, kurtosis, punctuation entropy, and capital token ratio — sealed with a SHA-256 hash of the extracted text. The semantic section contains the top 40 words by frequency after stop-word removal, with no language classification, no dictionary matching, no editorial layer. Pure signal from what the entity chose to say about itself.

The ratio signals are eight deterministic measurements: schema density, nav ratio, content-to-structure ratio, external TLD diversity, self-declaration coherence, schema-to-navigation alignment, JavaScript surface ratio, and URL depth distribution. Each traces to a specific pipeline stage.

The atomic answer — a machine-generated summary grounded in the measured fields — carries the model identifier and a SHA-256 hash of the input. It is the most important field in the body for AI retrieval: it is what gets read first, and it is grounded in measurement, not inference.

Layer 3 — rld:recursive

Empty at mint. Grows as the graph builds itself.

The recursive layer is initialized at mint with zero edges, an empty edge list, and an empty append timestamp list. This is correct. It is not a deficiency. The recursive layer is the future tense of every entity record — the layer that accumulates connections between entities as the corpus grows deep enough to make those connections meaningful.
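The three-layer shape can be sketched as a small data structure. Only the layer names `rld:anchor`, `rld:body`, and `rld:recursive` come from the text; the inner key names, the functions `mint_record` and `append_edge`, and the use of `MappingProxyType` to model "frozen at mint" are assumptions for illustration, and the real anchor carries more fields than shown.

```python
import hashlib
import uuid
from datetime import datetime, timezone
from types import MappingProxyType


def utc_now():
    """Current time as an ISO 8601 UTC string."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def mint_record(source_url, extracted_text, body):
    """Assemble a record with a read-only anchor and body and an empty recursive layer."""
    anchor = MappingProxyType({
        "uuid": str(uuid.uuid4()),
        "primary_source_url": source_url,
        "content_hash": hashlib.sha256(extracted_text.encode("utf-8")).hexdigest(),
        "mint_timestamp": utc_now(),
    })
    return {
        "rld:anchor": anchor,
        # Read-only at the top level only; nested dicts would need deeper freezing.
        "rld:body": MappingProxyType(dict(body)),
        # Empty by design at mint: zero edges, no edge list entries, no append timestamps.
        "rld:recursive": {"edge_count": 0, "edges": [], "append_timestamps": []},
    }


def append_edge(record, edge):
    """Grow the recursive layer; the anchor and body are never touched."""
    recursive = record["rld:recursive"]
    recursive["edges"].append(edge)
    recursive["edge_count"] = len(recursive["edges"])
    recursive["append_timestamps"].append(utc_now())


record = mint_record("https://example.com/", "extracted text", {"identity": {"domain": "example.com"}})
append_edge(record, {"kind": "common", "target": "another-entity"})
print(record["rld:recursive"]["edge_count"])  # 1
```

Attempting `record["rld:anchor"]["uuid"] = "x"` raises `TypeError`, which is the behavior the sketch uses to stand in for an immutable anchor.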

Common edges connect entities that share schema type neighborhoods — what two HVACBusiness entities share that no Restaurant shares. Uncommon edges connect entities across structural boundaries — the signal that appears in one topology cluster and not another. Jurisdictional edges connect entities to the geographic and regulatory structures they operate within. Supply chain edges connect entities through the product and service relationships they declare.

The graph builds itself. No editor decides which entities are related. The corpus passes over the accumulated records and the edges emerge from the measurements. This is Constitutional Law VII — Torus. The record reads the records.

The research foundation for the Root-LD architecture draws from multiple independent disciplines. Knowledge graph construction for AI retrieval: Neo4j — Unstructured Text to Knowledge Graph. Knowledge graph accuracy improvement in AI systems: WordLift — 29.8% accuracy improvement with knowledge graph enrichment. LLM-driven knowledge graph construction at scale: NVIDIA — LLM-driven Knowledge Graph techniques.

Data provenance and its role in AI accuracy: Zyte — What Is AI Data Provenance. Data quality and hallucination prevention: DataScienceCentral — Data Quality for Unbiased AI Results. IBM on AI data quality: IBM — AI and Data Quality.

Schema.org structured data and AI search: Google — Introduction to Structured Data. Entity linking and disambiguation at scale: SchemaApp — Entity Linking for Disambiguation. The @id and @graph pattern for knowledge graph construction: Momentic — @id Schema for SEO, LLMs, and Knowledge Graphs.

The shift from search to answer engine retrieval: SimilarWeb — Answer Engine Optimization. AI session growth: Frase — AI sessions up 527% year over year in 2025. AI Overviews reducing click-through rate: Jasper — AI Overviews reduced CTR 58%.

The Web Almanac 2024 found that 44% of domains carry any structured data, and of those, JSON-LD is dominant at 70%. Only 12.4% of all registered domains carry any schema markup at all. The average Boise Standard verified entity operates well above that baseline — with schema coverage scored, gaps named, and a permanent provenance record in place. Web Almanac 2024 — Structured Data chapter.

The Constitutional Laws of Information

Seven laws govern every decision the pipeline makes.
They are in the code. They are in the output. They are non-negotiable.

The Constitutional Laws of Information are not a philosophy statement. They are implemented constraints — rules that govern every pipeline decision, every output field, every display choice on every entity page. When you see a label on a Boise Standard entity page that says "Law I — Provenance" or "Law III — meaning is yours," that label is not decorative. It is the specific law that governs what is shown and why it is shown that way.

Law I — Provenance

Everything has a source. The source is in the record.

Every field in a Boise Standard entity record traces to a specific origin: a crawl, a measurement, a model call, or a human-submitted verification. The source URL, fetch timestamp, pipeline stage, and method are recorded alongside the output. Source schema blocks are preserved verbatim in the page head — not summarized, not interpreted, not cleaned up. The raw blocks are the provenance. This is why the Schema Intelligence panel on every entity page shows the actual JSON-LD blocks extracted from the source, each stamped with the URL and fetch time they came from.

Law II — Temporal Attestation

Every measurement is timestamped in ISO 8601 UTC.

The record does not claim to describe the entity as it is right now. It describes the entity as it was at the moment of the crawl, precisely identified. The mint timestamp, the analysis timestamp, the atomic answer generation timestamp — all are ISO 8601 UTC, all are in the record. An AI system reading a Boise Standard entity knows exactly when the measurement was taken. The freshness label — CURRENT or STALE — is a deterministic output of the gap between the mint date and the present date. Time is not hidden. It is declared.
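A deterministic freshness label can be sketched as a pure function of two timestamps. The 90-day threshold is a hypothetical value, since the source does not state where CURRENT ends and STALE begins, and the names `parse_utc` and `freshness_label` are likewise illustrative. The present time is passed in explicitly so the same inputs always give the same label.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)  # hypothetical threshold, not taken from the source


def parse_utc(stamp):
    """Parse an ISO 8601 timestamp such as 2026-06-13T23:41:05Z into an aware UTC datetime."""
    # A trailing "Z" is only accepted by fromisoformat from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00")).astimezone(timezone.utc)


def freshness_label(mint_timestamp, now):
    """Return CURRENT or STALE based only on the gap between mint time and `now`."""
    age = parse_utc(now) - parse_utc(mint_timestamp)
    return "STALE" if age > STALE_AFTER else "CURRENT"


print(freshness_label("2026-06-13T23:41:05Z", "2026-06-20T00:00:00Z"))  # CURRENT
print(freshness_label("2026-06-13T23:41:05Z", "2026-12-01T00:00:00Z"))  # STALE
```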

Law III — Reversible Ontology

The graph surrounds. Meaning is the reader's.

The semantic words panel on every entity page — the top 40 words by frequency from the full crawled corpus — carries no classification, no editorial label, no dictionary matching. The words are measured. What they mean is not declared by the pipeline. This is intentional. The pipeline records what the entity chose to say about itself — ranked by how often it said it. The interpretation belongs to whoever reads the record. This applies equally to the topology fingerprint: the numbers are measurements, not judgments. The pipeline does not decide what a type-token ratio of 0.145 means. It measures it and publishes it.
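A frequency ranking with no classification layer can be sketched in a few lines. The stop-word list below is a tiny placeholder, the tokenizer is a simple regular expression, and the alphabetical tie-break is an assumption added so that equal counts always sort the same way; the function name `top_words` is hypothetical.

```python
import re
from collections import Counter

# Placeholder stop-word list for illustration; a real list would be much longer.
STOP_WORDS = frozenset({"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "we", "our", "your"})


def top_words(text, limit=40):
    """Return up to `limit` (word, count) pairs ranked by frequency, with no labels attached."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(token for token in tokens if token not in STOP_WORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]


corpus = "Heating and cooling. Heating repair for your home. Cooling and heating."
print(top_words(corpus))
# [('heating', 3), ('cooling', 2), ('home', 1), ('repair', 1)]
```

The output is a list of words and counts and nothing else: no category, sentiment, or topic is assigned, which leaves interpretation to the reader.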

Law V — Common Edges

What two things share is a thing.

Two entities that share a schema type neighborhood share a common edge. Two entities that share a topology cluster share a common edge. Two entities that share a jurisdictional boundary share a common edge. Common edges are the most fundamental unit of the recursive layer — they accumulate through corpus passes and populate the rld:recursive layer as the graph matures. The graph builds itself not by editorial decision but by measurement. What is shared between entities is as significant as what is unique to them. Common edges are Law V made structural.
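The simplest kind of common edge, a shared declared schema type, can be sketched as a pairwise set intersection. This is an illustration only: the function name `common_edges`, the edge shape, and the entity slugs are hypothetical, and the sketch compares declared types directly without walking the schema.org type hierarchy or considering topology and jurisdiction.

```python
from itertools import combinations


def common_edges(entity_types):
    """Emit one edge for every pair of entities that declare at least one type in common."""
    edges = []
    # Sort by slug so the edge list comes out in the same order on every run.
    ordered = sorted(entity_types.items(), key=lambda item: item[0])
    for (slug_a, types_a), (slug_b, types_b) in combinations(ordered, 2):
        shared = set(types_a) & set(types_b)
        if shared:
            edges.append({"entities": (slug_a, slug_b), "shared_types": sorted(shared)})
    return edges


corpus_types = {
    "a-hvac-example": {"HVACBusiness", "LocalBusiness"},
    "b-hvac-example": {"HVACBusiness"},
    "c-diner-example": {"Restaurant"},
}
print(common_edges(corpus_types))
# [{'entities': ('a-hvac-example', 'b-hvac-example'), 'shared_types': ['HVACBusiness']}]
```

No pair is listed by hand; the edge between the two HVAC entities appears only because their measured type sets overlap.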

Law VI — Uncommon Edges

The absence of connection is a pattern waiting to be read.

The structural negative type space on every entity page — the schema.org branches that have no connection to the entity's declared type — is first-class data, not a residual. An HVACBusiness with no connection to the BioChemEntity branch is more precisely declared because of that absence. The negative space blocks on the Schema Intelligence panel carry the note: "Graph position measurement — not inference. Constitutional Law VI: the absence of connection is a pattern waiting to be read." Uncommon edges — the signals that appear in one entity and not another, the topology positions that diverge from the cluster — are how the recursive layer builds the graph's most precise distinctions.

Law VII — Torus

The graph reads itself. The record builds the record.

The recursive layer is empty at mint because it cannot be meaningfully populated until there are enough entities in the corpus to make comparison meaningful. When corpus depth is sufficient, the pipeline runs passes over all records, identifies edges — common and uncommon — and appends them to the recursive layer of each entity they connect. The graph is self-referential in the precise sense: the records are the substrate from which the edges emerge. No human decides which entities are related. The measurements decide. This is the torus: the output feeds the next input. The graph builds itself.

Verification

The pipeline measures what is findable.
Verification declares what is true.

Every unverified entity record on Boise Standard is built from what the pipeline could find on the open web. Verification is the act of the entity itself claiming the record and filling in what the pipeline cannot find — the founding story, the legal name, the credentials, the corrections to what AI currently gets wrong, and the declaration of who this entity actually is in its own words.

The unverified fields panel on every entity page lists every slot the pipeline could not fill from the open web. The hamstrahvac.com entity record, minted June 13, 2026, had 30 named empty slots across eight categories — entity identity, location and service area, what the entity does, credentials and trust, voice and authority, ratings and digital presence, media and documents, and final notes.

The most powerful slot on the panel is this one: What AI Currently Gets Wrong About You. Corrections go directly into the entity record as boundary declarations. An entity that has been mischaracterized — by a hallucinating model, by an outdated training set, by a competitor's SEO strategy — can declare the correction in its own words, in the permanent record, in the exact format AI reads. No intermediary. No platform dependency. The record belongs to the entity.

Verification also produces the complete JSON-LD schema file — built from the verification questionnaire, delivered via email with head placement instructions, ready to implement on the entity's own website. The schema file closes the gap list. Every property named in the gap list becomes a declared property. The schema coverage score rises. The questions AI could not answer become questions AI can answer — accurately, from the verified record.

Every $25 verification includes a verified entity profile, a complete JSON-LD schema file, a full site analysis report, AI optimization recommendations, an AI readiness guide, a sitemap submission walkthrough, and a Certificate of Verified Provenance (Certificate ID format BS-2026-000001), issued by Boise Standard LLC.

What Verification Permanently Adds

The claim layer. The entity's own words. In the record. Forever.

Identity

Legal business name · DBA / trade name · year founded · business structure · industry classification · team size

Location

Verified address · all cities and neighborhoods served · service area ZIP codes · accurate business hours — each location becomes a traversable graph edge

What You Do

30-second pitch · all services and products · what you do not do · target customer · price range · service guarantee — the boundary declarations AI uses to route queries correctly

Credentials

State license number · certifications and accreditations · insurance and bonding · years in business · years licensed — every credential becomes a verified authority signal

Voice

What AI currently gets wrong about you · what makes you different · your founding story · what your best customers always say · notable projects, awards, press coverage · geographic landmarks and anchors

Graph

Google rating and review count · Google Business Profile URL · social profiles — Facebook, Instagram, LinkedIn, Yelp · other platform presence — every profile becomes a sameAs edge in the schema record

Verify Your Business — $25

10% of net verification proceeds — sale price minus payment processing fee — are tithed to Treasure Valley faith communities and civic organizations, announced publicly with each disbursement. This is not a marketing commitment. It is a structural one, built into the pricing from the beginning. The community that owns the data benefits from the infrastructure built to serve it.

The Standard

The machine-readable web for the Treasure Valley
is being built one minted entity at a time.

The Web Almanac 2024 documented that 44% of domains carry any structured data, and only 12.4% of all registered domains carry schema markup of any kind. The majority of businesses operating in the Treasure Valley today are invisible to the machine-readable web — not because they have nothing to say, but because no one has built the infrastructure to say it for them and with them, in a format machines can read and communities can own.

Boise Standard is that infrastructure. Every minted entity in the directory is a permanent, machine-readable, provenance-sealed record of a real entity in this region — measured by a reproducible pipeline, grounded in open standards, governed by seven Constitutional Laws, and available to any AI system, researcher, or person who needs it.

The verified record of the Treasure Valley belongs to the Treasure Valley. Not to a platform. Not to a model. Not to an algorithm that changes without notice. The record is here. The standard is set. The graph builds itself.

Not for AI. For Boise.


Research Provenance — Every Source Behind This Page

Schema.org Vocabulary: schema.org. Founded 2011 by Google, Microsoft, Yahoo, Yandex. 800+ types. v30.0 published March 2026.

W3C JSON-LD 1.1: w3.org/TR/json-ld11. The linked data serialization standard underlying every Boise Standard record.

Web Almanac 2024, Structured Data: almanac.httparchive.org/en/2024/structured-data. 44% of domains with structured data. JSON-LD dominant at 70%. 12.4% total domain coverage.

WebDataCommons 2024: uni-mannheim.de, WDC JSON-LD Corpus 2024. 75M+ domain corpus. 12.4% carry schema markup.

Google Structured Data Guidelines: developers.google.com, Structured Data Introduction.

Knowledge Graphs and LLMs: DataCamp, Knowledge Graphs and LLMs.

WordLift, Knowledge Graph Accuracy: wordlift.io. 29.8% accuracy improvement with knowledge graph enrichment.

Neo4j, Text to Knowledge Graph: neo4j.com, Unstructured Text to Knowledge Graph.

NVIDIA, LLM Knowledge Graphs: developer.nvidia.com, LLM-driven Knowledge Graph techniques.

Zyte, AI Data Provenance: zyte.com, What Is AI Data Provenance.

IBM, AI Data Quality: ibm.com, AI and Data Quality.

DataScienceCentral, Hallucination Prevention: datasciencecentral.com, Data Quality for Unbiased AI Results.

Stackmatix, Structured Data and AI Citation: stackmatix.com, Structured Data and AI Search. 3:1 citation improvement for Tier 1 schema types.

SchemaApp, Entity Linking: schemaapp.com, Entity Linking for Disambiguation.

Momentic, @id and Knowledge Graphs: momenticmarketing.com, @id Schema for SEO, LLMs, and Knowledge Graphs.

SimilarWeb, Answer Engine Optimization: similarweb.com, Answer Engine Optimization.

Frase, GEO and AI Session Growth: frase.io. AI sessions up 527% YoY 2025.

Jasper, AI Overviews and CTR: jasper.ai. AI Overviews reduced CTR 58%.

Amsive, AI and Search Behavior: amsive.com. 1 in 10 US users goes to AI first. AI Overviews in 16% of Google searches.

Root-LD Specification: root-ld.org. The traveling context pod specification. Boise Standard is the first regional deployment.

Live Entity Example: boisestandard.org/web/hamstrahvac-com. The reference implementation. Federation ID bs-b9b5d29f. Minted 2026-06-13T23:41:05Z.

Boise Standard Official Vocabulary: boisestandard.org/schema/official_vocab.json

This Page: boisestandard.org/standard/ · Published 2026 · Boise Standard LLC · Treasure Valley, Idaho · Governing law: Idaho / Ada County

