CLI reference
The rete binary is provided by the rete-cli crate. Run rete <command> --help for the
authoritative flags. Terms are written as canonical N-Triples tokens:
<http://ex/Alice> for an IRI, "30" or "30"^^<…#integer> for a literal.
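When scripting the CLI, it helps to build these tokens programmatically rather than by hand. The sketch below is a minimal, hypothetical pair of helpers (nt_iri, nt_literal are illustrative names, not part of rete). It applies the N-Triples escaping rules for string literals. That rete accepts every escaped form shown here is an assumption.

```python
from typing import Optional

# Standard W3C XSD datatype IRI, used here only as an example datatype.
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Characters N-Triples forbids inside an IRIREF (controls and space are checked below).
_IRI_FORBIDDEN = set('<>"{}|^`\\')


def nt_iri(iri: str) -> str:
    """Wrap an absolute IRI in angle brackets, e.g. <http://ex/Alice>."""
    if any(ch in _IRI_FORBIDDEN or ord(ch) <= 0x20 for ch in iri):
        raise ValueError(f"character not allowed in an N-Triples IRI: {iri!r}")
    return f"<{iri}>"


def nt_literal(lexical: str, datatype: Optional[str] = None) -> str:
    """Quote a literal (escaping backslash, double quote, LF, CR); optionally add ^^<datatype>."""
    escaped = (
        lexical.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    token = f'"{escaped}"'
    if datatype is not None:
        token += "^^" + nt_iri(datatype)
    return token
```

Passing the result in an argument list, e.g. ["rete", "query", "data.rete", "--predicate", nt_iri("http://ex/knows")] with subprocess, avoids the shell quoting the examples below need.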
Building
rete build <inputs…> -o <out.rete> [--format nt|nq|ttl]
Build a file from one or more RDF inputs, merged under one shared dictionary.
Each input's format is detected from its extension (.nt / .nq / .ttl); an input
of - reads stdin and defaults to N-Triples; --format forces one format for all
inputs. N-Quads inputs produce a dataset with named graphs.
rete build a.nt b.nt -o merged.rete
curl -s https://host/data.nt | rete build - -o data.rete
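The input-format rule above (--format wins, then - means N-Triples on stdin, then the file extension) can be modelled in a few lines. This is a hypothetical sketch of the documented behaviour, not rete's code; in particular, raising an error for an unrecognised extension is an assumption.

```python
from pathlib import Path
from typing import Optional

# Extension -> format, per the documented rule (.nt / .nq / .ttl).
_BY_EXTENSION = {".nt": "nt", ".nq": "nq", ".ttl": "ttl"}


def detect_format(path: str, forced: Optional[str] = None) -> str:
    """Model of how rete build picks a parser for one input."""
    if forced is not None:  # --format applies to every input
        return forced
    if path == "-":  # stdin defaults to N-Triples
        return "nt"
    try:
        return _BY_EXTENSION[Path(path).suffix]
    except KeyError:  # assumed behaviour for unknown extensions
        raise ValueError(f"cannot detect RDF format of {path!r}; pass --format") from None
```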
--materialize bakes RDFS/OWL-RL entailments into the file at build time (see
Reasoning). --reason runs the same reasoner, but instead of adding triples it
stamps the coherence verdict into the Dataset Card (it implies --card). A remote
reader then learns whether the graph is logically coherent from the index-free
card, without running the reasoner itself. Unlike --materialize, --reason does
not abort on an incoherent graph: it records coherent: false. Verify the stamp
later with rete reason --verify-card, and combine --reason with --materialize to
also bake the inferred triples.
--card (and --card-file / --title / --license / --source /
--description / --created) embeds a Dataset Card: data-catalog metadata plus an
auto-derived profile. --text-index adds a full-text (word/CONTAINS) index over
the literals for rete search --contains (see below). --type-predicate <IRI>
overrides the predicate that assigns subjects to classes in the schema pyramid
(default rdf:type, else auto-detected), e.g. Wikidata's
--type-predicate http://www.wikidata.org/prop/direct/P31.
--no-pyramid skips the community pyramid entirely (no pyramid section). SPARQL,
SHACL, triple-pattern, and reachability queries don't use it, so the file stays
fully queryable and is markedly smaller; only community, summary, and
progressive queries need the pyramid. All of these flags are opt-in; without
them the output is byte-identical to a plain build.
rete repyramid <file> -o <out.rete> [--type-predicate <IRI>] [--text-index] [--card …]
Rebuild a file's pyramid, reading the triples straight from the existing .rete
and writing a new file to -o, with no export | build round-trip through
N-Quads. Use it to add a schema pyramid (or a --text-index or a Dataset Card) to
a file built before those features existed, or to re-derive the schema pyramid
under a different --type-predicate. The card flags match rete build
(--card-file / --title / --license / …).
rete repyramid old.rete -o new.rete --type-predicate http://www.wikidata.org/prop/direct/P31
rete repyramid old.rete -o new.rete --text-index # add full-text search to an existing file
Validating
rete validate <inputs…> [--format nt|nq|ttl]
Parse RDF input(s) without building, to check they are well-formed N-Triples/N-Quads/Turtle. Reports statement and named-graph counts, or exits non-zero with a precise parse error (file, line, column).
rete validate data.ttl
curl -s https://host/data.nt | rete validate - --format nt
Inspecting
rete info <file>
Print the decoded 1 KB header (the section directory, codecs, counts, content hash) — plus the Dataset Card catalog when the file carries one.
rete stats <file>
Human-friendly overview: file size, default-graph triple count, distinct terms, named-graph count, pyramid levels, and top predicates. Notes the label-index and text-index sizes when the file carries them.
rete verify <file>
Recompute the blake3 content hash and compare to the header. Exits non-zero with
FAILED — content hash mismatch on corruption/truncation.
rete search <file> [<prefix>] [--contains <word>…] [--contains-prefix P] [--limit N] [--json]
Two modes.
Label prefix (default): the subjects whose label starts with prefix
(case-insensitive), printed as label<TAB><iri> (or [{"label":…,"subject":…}]
with --json). An empty prefix returns the first --limit labels (default 20).
Answered by binary search over a bounded, label-sorted block in the pyramid-meta,
with no literal scan, so it is the fast path for autocomplete: about 22× faster
than a FILTER(STRSTARTS(LCASE(?l), …)) scan at 6k labels, and the gap widens
with size. Labels come from rdfs:label, skos:prefLabel/altLabel, foaf:name,
dc:title/dcterms:title, and schema:name; the block keeps the 8,192 most-connected
labeled subjects. Files built before this feature carry no label index (the block
is additive; rebuild to add it).
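The label-prefix lookup described above works because all labels sharing a prefix sit in one contiguous run of a sorted list, so a binary search finds the start and a short scan collects the matches. The sketch below is a hypothetical in-memory model (LabelBlock is an illustrative name); using casefold() as the case-insensitive key is an assumption about how rete normalises labels.

```python
import bisect
from typing import List, Tuple


class LabelBlock:
    """Hypothetical model of a label-sorted block: (label, iri) pairs sorted by
    case-folded label, with the folded keys precomputed once."""

    def __init__(self, entries: List[Tuple[str, str]]):
        self.entries = sorted(entries, key=lambda e: e[0].casefold())
        self.keys = [label.casefold() for label, _ in self.entries]

    def search(self, prefix: str, limit: int = 20) -> List[Tuple[str, str]]:
        """Up to `limit` entries whose label starts with `prefix`, ignoring case."""
        p = prefix.casefold()
        start = bisect.bisect_left(self.keys, p)  # first key >= p, O(log n)
        out: List[Tuple[str, str]] = []
        for i in range(start, len(self.keys)):
            # Sorted order: once a key stops matching, no later key can match.
            if len(out) == limit or not self.keys[i].startswith(p):
                break
            out.append(self.entries[i])
        return out
```

An empty prefix makes bisect_left return 0 and every key match, which reproduces the "first --limit labels" behaviour.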
Full-text (--contains <word>…): the subjects whose literals contain every given
word (whole-word, case-insensitive, AND), printed one IRI per line (or
[{"subject":…}] with --json). --contains-prefix P requires a literal word
starting with P (e.g. einst); it can be combined with --contains or used on its
own. Answered from the opt-in TEXT_INDEX section (rete build --text-index); on a
remote file only the queried words' posting lists are fetched, not the whole
index. A file built without --text-index reports that it has no text index.
rete search data.rete gluc # label prefix (autocomplete)
rete search data.rete --contains glucose # literals containing "glucose"
rete search data.rete --contains glucose phosphate # both words (AND)
rete search data.rete --contains-prefix einst # a word starting with "einst…"
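The full-text mode is an inverted index: each word maps to a posting list of subjects, and an AND query intersects the lists of the requested words. The sketch below is a minimal, hypothetical model (build_index and search are illustrative names); the \w+ tokenizer and lower-casing are assumptions, not rete's documented tokenization.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Sequence, Set, Tuple

WORD = re.compile(r"\w+")  # assumed tokenizer: runs of word characters


def build_index(literals: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Inverted index: lower-cased word -> subjects with a literal containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for subject, text in literals:
        for word in WORD.findall(text.lower()):
            index[word].add(subject)
    return dict(index)


def search(index: Dict[str, Set[str]], contains: Sequence[str] = (),
           contains_prefix: Optional[str] = None) -> Set[str]:
    """Subjects matching every word in `contains` (AND) and, if given,
    having some word that starts with `contains_prefix`."""
    postings = [index.get(w.lower(), set()) for w in contains]
    if contains_prefix is not None:
        p = contains_prefix.lower()
        # Linear vocabulary scan for brevity; a sorted vocabulary allows a range lookup.
        postings.append(set().union(*(s for w, s in index.items() if w.startswith(p))))
    if not postings:
        return set()
    postings.sort(key=len)  # intersect the smallest posting list first
    result = set(postings[0])
    for subjects in postings[1:]:
        result &= subjects
    return result
```

Only the posting lists for the queried words are touched, which is why a remote file needs to fetch just those lists.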
rete card <file> [--json]
Print the embedded Dataset Card — curated metadata
(title/license/source/…) plus the derived profile (counts, top predicates and
classes, vocabularies) and the content-hash checksum. --json emits the raw
card. Prints (no dataset card) for a file built without one. rete info shows
the same catalog beneath the header when a card is present.
rete graphs <file>
List the named-graph IRIs in a dataset (the default graph is unnamed).
rete export <file> [--format nq|ttl|jsonld]
Serialize the dataset. nq (the default) dumps every triple/quad as N-Quads
(default graph + named graphs) — a lossless round-trip. ttl emits Turtle and
jsonld emits expanded JSON-LD; both serialize the default graph only
(Turtle/JSON-LD here carry no default-vs-named distinction, so named graphs are
skipped — use nq to export those).
rete export data.rete # N-Quads (default)
rete export data.rete --format ttl # Turtle
rete export data.rete --format jsonld # expanded JSON-LD
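Because the nq export keeps the default-vs-named distinction, a statement with a fourth term belongs to a named graph and one with three terms belongs to the default graph. The sketch below counts statements per graph from the lines of rete export output. It is a hypothetical helper; the regular expression covers IRIs, blank nodes, and literals with a language tag or datatype, and is a simplification rather than a full N-Quads parser.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# One N-Quads term: IRI, blank node, or literal with optional @lang / ^^<datatype>.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:[A-Za-z0-9_\-.]*[A-Za-z0-9_\-]'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
)


def graph_of(line: str) -> Optional[str]:
    """Graph IRI of one N-Quads statement, or None when it is in the default graph."""
    terms = TERM.findall(line)
    if len(terms) == 4:
        return terms[3]
    if len(terms) == 3:
        return None
    raise ValueError(f"not an N-Quads statement: {line!r}")


def count_by_graph(lines: Iterable[str]) -> Counter:
    """Statements per graph; the default graph is counted under None."""
    counts: Counter = Counter()
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            counts[graph_of(stripped)] += 1
    return counts
```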
Querying
rete query <file> [--subject S] [--predicate P] [--object O]
Match a single triple pattern; omitted positions are wildcards.
rete query data.rete --predicate '<http://ex/knows>'
rete why <file> [--subject S] [--predicate P] [--object O] [--json]
Explain the provenance of each triple-pattern result. The command reports the
matched terms, dictionary IDs, graph scope, the chosen permutation (one of the
six), and the file byte ranges for the dictionary, full index container, selected
permutation payload, and pyramid metadata. With --json, the same data is
emitted as stable machine-readable JSON.
rete why data.rete --predicate '<http://ex/knows>'
rete why data.rete --subject '<http://ex/Alice>' --json
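rete why reports which of the six permutations served a pattern. One common way a six-permutation index chooses is to pick an ordering whose leading positions are exactly the bound ones, so all matches form one contiguous key range. The sketch below shows that rule; the tie-breaking order is an assumption, and rete's actual choice may use other criteria.

```python
from typing import Iterable

# The six orderings of subject (S), predicate (P), object (O).
PERMUTATIONS = ("SPO", "SOP", "PSO", "POS", "OSP", "OPS")


def choose_permutation(bound: Iterable[str]) -> str:
    """Pick a permutation whose prefix is exactly the set of bound positions.
    Ties are broken by the order of PERMUTATIONS (an assumption)."""
    b = {x.upper() for x in bound}
    for perm in PERMUTATIONS:
        if set(perm[: len(b)]) == b:
            return perm
    raise ValueError(f"unknown positions: {sorted(b)}")
```

With only --predicate bound this picks PSO, and with predicate and object bound it picks POS; with nothing bound every ordering qualifies and the first one is returned.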
rete why-url <url> [--subject S] [--predicate P] [--object O] [--json]
The remote counterpart of rete why: it explains a triple-pattern result over a
.rete served on HTTP, range-fetching only the routed tiles. It reports the same
provenance (permutation, section, byte ranges, the physical tile) plus the bytes
fetched and the range-request count. This is the CLI version of the browser's why_url.
rete why-url https://host/data.rete --predicate '<http://ex/knows>'
# … provenance …
# (fetched 4096 bytes in 1 range request(s); file is 1048576 bytes)
Provenance is honest about the physical layout: it identifies the index
container, the selected permutation payload, and — for tiled files —
the physical tile holding each match (PERM/index) with its compressed byte
range. Pre-tiling (v0.1) files report tile provenance as not_materialized.
rete bgp <file> "<pattern> . <pattern> …"
Evaluate a Basic Graph Pattern. Patterns are separated by ., terms by spaces;
?name is a variable (terms may not contain spaces).
rete bgp data.rete "?x <http://ex/knows> ?y . ?y <http://ex/knows> ?z"
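The BGP syntax and its meaning can be illustrated with a deliberately naive evaluator: split the string into three-term patterns, then join them by extending each partial binding one pattern at a time. This is a hypothetical sketch over an in-memory list of triples (parse_bgp and evaluate are illustrative names); rete evaluates patterns against its index instead.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Pattern = Tuple[str, ...]
Triple = Tuple[str, str, str]


def parse_bgp(text: str) -> List[Pattern]:
    """Split "?x <p> ?y . ?y <p> ?z" into patterns: '.' separates patterns, spaces separate terms."""
    patterns: List[Pattern] = []
    current: List[str] = []
    for token in text.split():
        if token == ".":
            patterns.append(tuple(current))
            current = []
        else:
            current.append(token)
    if current:
        patterns.append(tuple(current))
    for pattern in patterns:
        if len(pattern) != 3:
            raise ValueError(f"expected 3 terms, got {pattern!r}")
    return patterns


def _unify(term: str, value: str, binding: Dict[str, str]) -> bool:
    """A ?variable binds (or must equal its existing binding); a constant must match exactly."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def evaluate(patterns: Sequence[Pattern], triples: Sequence[Triple]) -> Iterator[Dict[str, str]]:
    """Naive nested-loop join over all triples for each pattern."""

    def solve(i: int, binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(patterns):
            yield binding
            return
        for triple in triples:
            candidate = dict(binding)  # copy, so a failed match leaves `binding` untouched
            if all(_unify(t, v, candidate) for t, v in zip(patterns[i], triple)):
                yield from solve(i + 1, candidate)

    yield from solve(0, {})
```

The shared variable ?y is what turns two independent patterns into a two-hop join.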
rete sparql <file> "<query>" [--json]
Run SPARQL: SELECT / ASK / CONSTRUCT / DESCRIBE. With --json, emit
standard SPARQL Results JSON (for SELECT/ASK). See SPARQL support.
rete sparql data.rete "PREFIX e: <http://ex/> SELECT ?p (COUNT(?f) AS ?n) WHERE { ?p e:knows ?f } GROUP BY ?p"
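Since --json emits the standard SPARQL Results JSON format, its output can be consumed with nothing more than a JSON parser. The sketch below (parse_results is an illustrative name) reduces a document to a boolean for ASK or to rows of variable-to-value for SELECT; unbound variables are simply absent from a row, as the format specifies.

```python
import json
from typing import Dict, List, Union


def parse_results(doc: str) -> Union[bool, List[Dict[str, str]]]:
    """SPARQL 1.1 Query Results JSON -> bool (ASK) or rows of var -> lexical value (SELECT)."""
    data = json.loads(doc)
    if "boolean" in data:  # ASK results carry a top-level boolean
        return bool(data["boolean"])
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]
```

In a script, feed it the stdout of subprocess.run(["rete", "sparql", "data.rete", query, "--json"], capture_output=True, text=True, check=True).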
rete cost <file-or-url> "<query>" [--json] [--explain]
Preview the byte/range-request cost of a SPARQL query without evaluating it. The report parses the query, lists the concrete predicates that can drive summary-based routing, and compares three access paths:
- summary overview — header + dictionary + pyramid summary, skipping the triple index.
- routed pattern open — for a single default-graph triple pattern, header + dictionary + the one selected permutation payload (the best of the six).
- full query open — the current SPARQL engine path, which opens dictionary + index (+ pyramid/named-graph metadata when present) before evaluation.
rete cost data.rete "PREFIX e: <http://ex/> SELECT ?y WHERE { e:Alice e:knows ?y }"
rete cost https://host/data.rete "ASK { ?s <http://ex/knows> ?o }" --json
For the exact summary-only shapes SELECT (COUNT(*) AS ?n) WHERE { ?s <p> ?o },
SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o },
SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p,
SELECT DISTINCT ?p WHERE { ?s ?p ?o },
SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }, ASK { ?s <p> ?o },
and ASK { ?s ?p ?o }, the JSON output includes summary_answer with the
exact count/boolean value read from the pyramid summary. Predicate-specific
shapes also include the predicate; predicate totals return all predicate/count
pairs, predicate lists return all predicates present in the summary, and
predicate distinct counts return the number of predicates. More complex shapes
are marked requires_index.
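The shapes listed above are summary-answerable because each reduces to arithmetic over per-predicate totals. The sketch below assumes the pyramid summary has already been reduced to a hypothetical map from predicate IRI to its exact default-graph triple count, with zero-count predicates absent; it is an illustration of why these shapes need no index, not rete's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical summary: predicate IRI -> exact triple count (zero-count predicates absent).
Summary = Dict[str, int]


def count_for_predicate(summary: Summary, p: str) -> int:
    return summary.get(p, 0)  # SELECT (COUNT(*) AS ?n) WHERE { ?s <p> ?o }


def count_all(summary: Summary) -> int:
    return sum(summary.values())  # SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }


def totals_by_predicate(summary: Summary) -> List[Tuple[str, int]]:
    return sorted(summary.items())  # SELECT ?p (COUNT(*) AS ?n) ... GROUP BY ?p


def distinct_predicates(summary: Summary) -> List[str]:
    return sorted(summary)  # SELECT DISTINCT ?p WHERE { ?s ?p ?o }


def count_distinct_predicates(summary: Summary) -> int:
    return len(summary)  # SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }


def ask_predicate(summary: Summary, p: str) -> bool:
    return summary.get(p, 0) > 0  # ASK { ?s <p> ?o }


def ask_any(summary: Summary) -> bool:
    return count_all(summary) > 0  # ASK { ?s ?p ?o }
```

Any shape that needs to know which subjects or objects participate (a bound subject, a join, a FILTER) cannot be derived from these totals, which is why such queries are marked requires_index.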
Add --explain to include a planner explanation. In JSON, this adds an
explain object with the classified query_shape, whether the answer is
summary_exact, the planned access path (summary-only, routed-pattern, or
full-index), and whether the current engine path still reads the index.
For HTTP(S), the host must honor Range requests, just like query-url and
sparql-url. Treat this as a deployment/debugging preview: it reports the
current SPARQL engine's range budget, the summary budget, and the exact routed
single-pattern budget when the query shape allows it.
rete progressive <file-or-url> "<query>" [--json]
Run the first summary-only progressive query path. This command answers only the exact shapes that can be proven from the pyramid summary without opening the triple index:
- SELECT (COUNT(*) AS ?n) WHERE { ?s <p> ?o }
- SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }
- SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p
- SELECT DISTINCT ?p WHERE { ?s ?p ?o }
- SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }
- ASK { ?s ?p ?o }
- ASK { ?s <p> ?o }
rete progressive data.rete "PREFIX e: <http://ex/> SELECT (COUNT(*) AS ?n) WHERE { ?s e:knows ?o }" --json
The JSON output is SPARQL Results JSON with an added progressive object
describing the summary stage, exactness, predicate when one is fixed, bytes
fetched, request count, and reads_index: false. Other query shapes fail
clearly; use rete sparql for full-index evaluation or rete cost --explain to
inspect why a query is not summary-answerable yet.
rete cypher <file> "<query>" [--base <iri>] [--json]
Run a read-only Cypher subset (a prototype). The query is translated to an
equivalent SPARQL SELECT and evaluated by the same engine — no second query
engine. Supports MATCH … [WHERE …] RETURN … [LIMIT n]: node patterns
((a), (a:Label)), forward/reverse relationships ((a)-[:REL]->(b),
(a)<-[:REL]-(b)), variable-length relationships (-[:REL*]-> → the SPARQL
property path REL+, one-or-more), simple WHERE comparisons on a property
(a.age > 30) or identity (a = <iri>) joined by AND/OR, and RETURN of
variables and/or properties. Writes, OPTIONAL MATCH, WITH, aggregations,
relationship variables/properties, and multiple labels are rejected with a clear
error. See compatibility for the full subset and the
name→IRI convention. With --json, emit standard SPARQL Results JSON.
A bare label/relationship/property name X maps to <BASE + X>, where BASE
defaults to http://ex/ and is overridable with --base.
rete cypher deps.rete "MATCH (a:Application) RETURN a"
rete cypher deps.rete "MATCH (a)-[:dependsOn*]->(b) WHERE b = <http://ex/log4x> RETURN a"
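The relationship part of the Cypher-to-SPARQL translation follows directly from two rules stated above: a bare name X becomes <BASE + X>, and -[:REL*]-> becomes the property path REL+. The hypothetical translate_rel below shows only those two rules for a single-hop pattern; it is not rete's translator, and node labels, WHERE, and RETURN are left out.

```python
import re

# (a)-[:REL]->(b), (a)<-[:REL]-(b), optionally with * for variable length.
REL = re.compile(r"\((\w+)\)(<?)-\[:(\w+)(\*?)\]-(>?)\((\w+)\)")


def translate_rel(pattern: str, base: str = "http://ex/") -> str:
    """Translate one directed Cypher relationship into a SPARQL triple pattern."""
    m = REL.fullmatch(pattern.strip())
    if m is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    a, left, rel, star, right, b = m.groups()
    if bool(left) == bool(right):
        raise ValueError("exactly one direction arrow is required")
    path = f"<{base}{rel}>" + ("+" if star else "")  # * -> one-or-more path
    subj, obj = (a, b) if right else (b, a)  # a reverse arrow swaps the ends
    return f"?{subj} {path} ?{obj}"
```

For the second example above, translate_rel("(a)-[:dependsOn*]->(b)") yields ?a <http://ex/dependsOn>+ ?b.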
Reasoning
rete reason [<file>] [--url <url>] [--materialize] [--check] [--verify-card] [--format nq|ttl]
Run the prototype OWL RL / RDFS reasoner: materialize RDFS/OWL entailments to
a fixpoint and report any logical inconsistencies ("incoherent points", e.g. a
disjoint-class violation). Prints the count of newly entailed triples and each
inconsistency (kind + detail). Exits non-zero if any inconsistency is found
(zero if coherent), so it works as a coherence gate in CI. With --materialize,
also serialize the base + inferred graph (nq default, or ttl). This is a
documented subset, not full OWL DL — see Reasoning & coherence.
- --url <url> reads a remote .rete over HTTP range requests instead of a local file (omit the positional <file>).
- --check is coherence-gate mode: print one verdict line and exit non-zero on any incoherent point (suppresses --materialize output). This is the minimal CI gate.
- --verify-card checks the file's baked coherence card (from rete build --reason) against a fresh reasoning run, guarding against drift or a stale ruleset; it exits non-zero if the stored verdict disagrees with recomputation.
rete reason data.rete
rete reason data.rete --materialize --format ttl
rete reason data.rete --check # one-line CI verdict
rete reason --url https://host/data.rete # reason over a remote file
rete reason data.rete --verify-card # baked card vs fresh run
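To make "materialize to a fixpoint and report incoherent points" concrete, the toy sketch below applies two RDFS rules (rdfs11, subClassOf transitivity; rdfs9, type inheritance) until nothing new appears, then flags individuals typed by two owl:disjointWith classes. It is a hypothetical illustration with prefixed names as plain strings, not rete's reasoner or its ruleset.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS, DISJOINT = "rdf:type", "rdfs:subClassOf", "owl:disjointWith"


def materialize(graph: Set[Triple]) -> Set[Triple]:
    """Apply rdfs11 and rdfs9 until nothing new is derived; return only the entailed triples."""
    closed = set(graph)
    while True:
        sub = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new: Set[Triple] = set()
        for a, b in sub:  # rdfs11: A sub B, B sub D  =>  A sub D
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        for x, p, c in closed:  # rdfs9: x a C, C sub D  =>  x a D
            if p == TYPE:
                for a, b in sub:
                    if a == c:
                        new.add((x, TYPE, b))
        new -= closed
        if not new:  # fixpoint reached
            return closed - graph
        closed |= new


def incoherent_points(closed: Set[Triple]) -> List[Tuple[str, str, str]]:
    """(individual, class, class) wherever an individual is typed by two disjoint classes.
    owl:disjointWith is checked only in its declared direction, for brevity."""
    disjoint = {(a, b) for a, p, b in closed if p == DISJOINT}
    typed = {(x, c) for x, p, c in closed if p == TYPE}
    return sorted((x, a, b) for x, a in typed for d, b in disjoint if d == a and (x, b) in typed)
```

Here the disjointness clash only becomes visible after materialization derives that ex:Rex is an ex:Animal, which is why the check runs on the closed graph.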
Shape validation
rete shacl <file> --shapes <shapes.ttl> [--graph <iri>] [--format text|json|ttl]
Validate a .rete graph against SHACL Core shapes read from Turtle. The default
graph is validated unless --graph names one dataset graph. The command exits
zero when the report conforms and non-zero when it finds validation results, so
it can be used as a CI data-quality gate. See SHACL validation for
the supported components and current limits.
rete shacl data.rete --shapes shapes.ttl
rete shacl data.rete --shapes shapes.ttl --format json
rete shacl data.rete --shapes shapes.ttl --graph '<http://ex/snapshot>'
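Several commands in this reference signal failure through a non-zero exit code (validate, verify, reason --check, shacl), so they compose into a single CI gate. The sketch below is a hypothetical runner: it runs every gate instead of stopping at the first failure and reports which ones failed. The file names in GATES are placeholders, and treating a missing binary as a failure is a design choice of the sketch.

```python
import subprocess
import sys
from typing import List, Sequence

# Example gates built from commands documented above; paths are placeholders.
GATES = [
    ["rete", "validate", "data.ttl"],
    ["rete", "verify", "data.rete"],
    ["rete", "reason", "data.rete", "--check"],
    ["rete", "shacl", "data.rete", "--shapes", "shapes.ttl"],
]


def run_gates(commands: Sequence[Sequence[str]]) -> List[str]:
    """Run every gate command; return the ones that exited non-zero (or could not start)."""
    failed = []
    for cmd in commands:
        try:
            code = subprocess.run(list(cmd), capture_output=True, text=True).returncode
        except FileNotFoundError:  # binary not on PATH
            code = 127
        if code != 0:
            failed.append(" ".join(cmd))
    return failed


def main() -> int:
    """Print each failed gate to stderr; return a process exit code."""
    bad = run_gates(GATES)
    for cmd in bad:
        print(f"gate failed: {cmd}", file=sys.stderr)
    return 1 if bad else 0
```

A CI script would call sys.exit(main()).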
rete shacl-url <url> --shapes <shapes.ttl> [--format text|json|ttl]
Validate a remote .rete over HTTP, range-reading only what the shapes
target. The file is opened lazily and each focus node's values are fetched as
routed range reads, so a targeted shape (sh:targetClass / targetNode /
targetSubjectsOf / targetObjectsOf) never downloads the whole graph — it
faults only the tiles holding the target nodes and their property values. Reports
the bytes fetched and the range-request count. Validates the default graph.
rete shacl-url https://host/data.rete --shapes shapes.ttl
# (fetched 38912 bytes in 7 range request(s); file is 1048576 bytes)
Coarse graphs (no index read)
rete summary <file> [--level k]
Print the structural coarse graph: the Louvain community quotient graph
(community → community relations with counts), plus — for v2 files — the schema
pyramid: a leveled rdf:type histogram where abstract classes describe coarse
levels and leaf classes resolve as you zoom in (e.g. Agent → Person → Scientist → Astronomer). --level k prints just level k's type histogram (0 =
coarsest / most abstract). Everything here reads index-free from the
pyramid-meta — summary-url shows the same over HTTP without fetching the index.
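The leveled type histogram can be pictured as rolling each instance's class up to its ancestor at depth k, so level 0 counts everything under the most abstract classes and deeper levels resolve leaf classes. The sketch below assumes an acyclic, single-parent class hierarchy and one class per instance; both are simplifications, and rete's pyramid construction may differ.

```python
from collections import Counter
from typing import Dict, List


def chain_from_root(cls: str, parent: Dict[str, str]) -> List[str]:
    """Classes from the most abstract ancestor down to `cls` (acyclic, single-parent hierarchy assumed)."""
    chain = [cls]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain[::-1]


def level_histogram(instance_types: Dict[str, str], parent: Dict[str, str], k: int) -> Counter:
    """Count instances by their class's ancestor at depth k (0 = most abstract).
    Classes shallower than k are counted as themselves."""
    hist: Counter = Counter()
    for cls in instance_types.values():
        chain = chain_from_root(cls, parent)
        hist[chain[min(k, len(chain) - 1)]] += 1
    return hist
```

With Agent above Person above Scientist above Astronomer, an astronomer is counted as Agent at level 0, Person at level 1, and Scientist at level 2.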
rete schema <file>
Print the semantic coarse graph: classes (by rdf:type) with instance
counts, and the class→predicate→class relations between them — the dataset's
effective schema.
rete communities <file> [--json] [--profile] [--predicate IRI] [--round N] [--min-size N]
Recompute the Louvain communities and expose, per community, its member
subjects and the literal text of its triples — the per-community text
corpus for downstream topic modeling. --profile adds a no-ML "topic" profile
per community (top literal words, rdf:type classes, and predicates).
--predicate <iri> detects communities using only that relation's edges — a
criterion-specific partition (see multi-criteria splitting).
--json emits [{community, size, members:[<iri>…], text:[lexical…]}] (plus a
profile object when --profile is set); --round N cuts the dendrogram at a
specific round (default: the round chosen for the tile budget); --min-size N
drops communities with fewer than N members. See the
topic modeling tutorial.
rete communities papers.rete --profile # no-ML topic labels
rete communities researchers.rete --predicate '<http://ex/cites>' # one criterion
rete communities papers.rete --json | python3 scripts/lda_topics.py --topics 3
rete predicates <file>
Exact per-predicate triple totals, summed from the pyramid summary alone — the triple index is never read.
Graph traversal
rete reach <file> --predicate <iri> --seed <iri>… [--seeds-file F] [--reverse] [--parallel] [--count]
Multi-source transitive reachability over one relation: for each seed, the set of
nodes it transitively reaches. --reverse answers "who reaches the seed?"
(impact analysis); --seeds-file reads one seed IRI per line; --count prints
only sizes; --parallel runs one rayon task per seed (the batch-reachability
workload, which benchmarks at roughly a 14–15× speedup on many cores).
# Who (transitively) depends on the vulnerable package? (impact analysis)
rete reach deps.rete --predicate '<http://ex/dependsOn>' --seed '<http://ex/log4x>' --reverse
# Many seeds at once, in parallel:
rete reach g.rete --predicate '<http://ex/knows>' --seeds-file seeds.txt --parallel --count
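Multi-source reachability is one breadth-first search per seed over the adjacency of a single predicate, with --reverse flipping every edge. Each seed's search is independent, which is what makes the batch parallel. The sketch below is a hypothetical model over (subject, object) pairs; Python threads only demonstrate the per-seed independence (the GIL prevents a real CPU speedup, unlike rete's rayon tasks), and excluding the seed itself unless it lies on a cycle follows one-or-more-hop semantics, which is an assumption about rete's output.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]], reverse: bool = False) -> Dict[str, List[str]]:
    """subject -> objects, or object -> subjects when reverse is set (impact analysis)."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for s, o in edges:
        if reverse:
            adj[o].append(s)
        else:
            adj[s].append(o)
    return adj


def reach(adj: Dict[str, List[str]], seed: str) -> Set[str]:
    """Nodes reachable from `seed` in one or more hops (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reach_many(edges: Iterable[Tuple[str, str]], seeds: Sequence[str],
               reverse: bool = False, workers: int = 4) -> Dict[str, Set[str]]:
    """One independent search per seed, fanned out over a thread pool."""
    adj = build_adjacency(edges, reverse)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(seeds, pool.map(lambda s: reach(adj, s), seeds)))
```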
Over HTTP
All of the URL commands below work against http:// and https:// hosts that
honor Range requests (S3, GitHub, any CDN).
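Every URL command depends on HTTP Range requests: a host that honours Range answers 206 Partial Content with just the requested bytes, while one that ignores it answers 200 with the whole file. The sketch below uses only the Python standard library to fetch a byte range and reject a host that ignored the header. fetch_range is a hypothetical helper, and the header occupying the first 1 KB at offset 0 is an assumption.

```python
import urllib.request


def range_header(start: int, length: int) -> str:
    """HTTP Range value for `length` bytes starting at `start` (the end offset is inclusive)."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length > 0")
    return f"bytes={start}-{start + length - 1}"


def fetch_range(url: str, start: int, length: int, timeout: float = 10.0) -> bytes:
    """Fetch one byte range; fail loudly if the host ignored Range."""
    req = urllib.request.Request(url, headers={"Range": range_header(start, length)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if resp.status != 206:  # 200 means the full body is coming back
            raise RuntimeError(f"{url} ignored the Range header (HTTP {resp.status})")
        return resp.read()
```

For example, fetch_range("https://host/data.rete", 0, 1024) is a quick way to check a host before pointing the CLI at it.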
rete card-url <url> [--json]
Fetch only the embedded Dataset Card — the header and the metadata range, in two small range requests. The dictionary, index, and pyramid are never fetched: this is the index-free CARD tier, the cold-start self-description (counts, vocabulary, class graph, signals, starter queries) a client reads before it knows what to query. Reports bytes fetched + range count.
rete card-url https://host/data.rete --json
rete summary-url <url>
Fetch only the header + dictionary + pyramid summary and print the coarse graph. The (large) index is never fetched — the "overview first" path.
rete query-url <url> [--subject S] [--predicate P] [--object O]
Match a triple pattern over HTTP, fetching only the byte ranges the query needs and reporting how many ranges/bytes were pulled. Bound terms are resolved from the dictionary first; if they exist, the reader fetches only the selected permutation payload rather than the whole index container. If a bound term is unknown, the index is skipped entirely and the result is empty.
rete query-url https://host/data.rete --object '<http://ex/Dave>'
rete sparql-url <url> "<query>" [--json]
Run a full SPARQL query over HTTP with lazy tile faulting (tiled files): the open fetches the header, dictionary, pyramid, and the index's small tile directories; index tiles are then range-fetched only when the query's scans and probes touch them, so a selective query reads O(touched tiles) rather than the whole index. A range failure mid-query is reported as an error, never as silently fewer rows. Pre-tiling (v0.1) files fall back to fetching the index whole.
rete sparql-url https://host/data.rete \
"PREFIX e: <http://ex/> SELECT ?d WHERE { ?d e:dependsOn+ e:log4x }"
Federation
rete federate <sources…> --query "<SPARQL>" [--json] [--no-route]
Run one SPARQL query across several .rete sources — local file paths and/or
http(s):// URLs, mixed — and merge the results at the term (string) level:
SELECT rows are unioned + deduped, ASK is OR'd across sources, CONSTRUCT
triples are unioned + deduped. Routing (default on) reads each source's
predicate set from its summary (index never touched) and prunes sources whose
predicates are disjoint from the query's; --no-route queries every source.
Per-source diagnostics and the queried/pruned tally go to stderr.
This is union federation — correct for sharded data where each file
independently yields complete rows. It does not do cross-file joins, and
aggregates (COUNT/GROUP BY) and LIMIT are evaluated per source then
unioned (a federated COUNT(*) returns per-source counts, not a global sum). See
Federated queries for the full model, limitations, and a real
OpenCitations multi-shard example.
rete federate data/opencitations/cites-2021.rete data/opencitations/cites-2024.rete \
--query 'SELECT ?citing WHERE { ?citing <http://purl.org/spar/cito/cites>
<https://doi.org/10.1038/s41586-021-03819-2> } LIMIT 5'
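The union-federation model above comes down to two small operations: prune sources whose summary predicate set does not overlap the query's predicates, then merge per-source results at the term level (SELECT rows unioned and deduplicated, ASK OR'd). The sketch below models both with hypothetical names (route, merge_select, merge_ask). Keeping every source when the query names no concrete predicate is an assumption of the sketch.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, ...]  # one SELECT row as term strings, in variable order


def route(sources: Dict[str, Set[str]], query_predicates: Set[str]) -> List[str]:
    """Keep the sources whose summary predicate set overlaps the query's predicates."""
    if not query_predicates:  # nothing concrete to prune on (assumption: query all)
        return list(sources)
    return [name for name, preds in sources.items() if preds & query_predicates]


def merge_select(per_source: Iterable[List[Row]]) -> List[Row]:
    """Union rows from every source, dropping duplicates and keeping first-seen order."""
    seen: Set[Row] = set()
    out: List[Row] = []
    for rows in per_source:
        for row in rows:
            if row not in seen:
                seen.add(row)
                out.append(row)
    return out


def merge_ask(per_source: Iterable[bool]) -> bool:
    """ASK is true if any source answers true."""
    return any(per_source)
```

Because rows are merged only after each source has produced them, a join whose halves live in different files finds no match, which is the cross-file limitation noted above.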
Exit codes
0 on success; non-zero with a message on error (bad input, missing file,
corrupt file, a host that ignores Range, or an unsupported query construct).