CLI reference

The rete binary (crate rete-cli). Run rete <command> --help for the authoritative flags. Terms are written as canonical N-Triples tokens: <http://ex/Alice> for an IRI, "30" or "30"^^<…#integer> for a literal. Quote them in the shell (e.g. '<http://ex/knows>'), since < and > are otherwise treated as redirections.
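When scripting rete, it can help to build these tokens programmatically. The Python sketch below is not part of rete, and the helper names iri, literal, and shell_arg are hypothetical. It follows the N-Triples rules for IRIs and the common string escapes (backslash, double quote, newline, carriage return); treating that as sufficient is an assumption, not a full N-Triples serializer.

```python
import shlex
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
_IRI_FORBIDDEN = set('<>"{}|^`\\')


def iri(value: str) -> str:
    """Hypothetical helper: wrap an absolute IRI as an N-Triples token."""
    if any(ord(c) <= 0x20 or c in _IRI_FORBIDDEN for c in value):
        raise ValueError(f"character not allowed in an N-Triples IRI: {value!r}")
    return f"<{value}>"


def literal(lexical: str, datatype: Optional[str] = None) -> str:
    """Hypothetical helper: quote a literal, optionally typed with ^^<datatype>."""
    escaped = (
        lexical.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    token = f'"{escaped}"'
    return f"{token}^^{iri(datatype)}" if datatype else token


def shell_arg(token: str) -> str:
    """Quote a token so the shell does not treat < and > as redirections."""
    return shlex.quote(token)
```

For example, shell_arg(iri("http://ex/knows")) yields '<http://ex/knows>', the quoted form used in the examples throughout this reference.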

Building

`rete build <inputs…> -o <out.rete> [--format nt|nq|ttl]`

Build a .rete file from one or more RDF inputs, merged under one shared dictionary. The format is detected from each input's extension (.nt / .nq / .ttl); an input of - reads stdin and defaults to N-Triples; --format forces one format for all inputs. N-Quads inputs produce a dataset with named graphs.

rete build a.nt b.nt -o merged.rete
curl -s https://host/data.nt | rete build - -o data.rete
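The input-format rule can be restated as a small function. This is a hypothetical Python mirror of the documented behavior, not rete's implementation; in particular, raising an error for an unrecognized extension is an assumption.

```python
from pathlib import Path
from typing import Optional

# Documented extension -> format mapping.
EXTENSIONS = {".nt": "nt", ".nq": "nq", ".ttl": "ttl"}


def detect_format(path: str, forced: Optional[str] = None) -> str:
    """Hypothetical: pick the parser for one input path."""
    if forced is not None:
        return forced  # --format applies to every input, stdin included
    if path == "-":
        return "nt"  # stdin defaults to N-Triples
    suffix = Path(path).suffix
    if suffix not in EXTENSIONS:
        # Assumption: an unknown extension is an error that asks for --format.
        raise ValueError(f"cannot detect the format of {path!r}; pass --format")
    return EXTENSIONS[suffix]
```

Under this rule, rete build a.nt b.ttl -o merged.rete would parse each input with its own parser before merging.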

--materialize bakes RDFS/OWL RL entailments into the file at build time (see Reasoning).
--reason runs the same reasoner but, instead of adding triples, stamps its coherence verdict into the Dataset Card (implies --card). A remote reader then learns whether the graph is logically coherent from the index-free card, with no compute. Unlike --materialize, which aborts on an incoherent graph, --reason records coherent: false and finishes the build. Verify the stamp later with rete reason --verify-card, or combine --reason with --materialize to also bake in the inferred triples.
--card (with --card-file / --title / --license / --source / --description / --created) embeds a Dataset Card: data-catalog metadata plus an auto-derived profile.
--text-index adds a full-text (word/CONTAINS) index over the literals for rete search --contains (see below).
--type-predicate <IRI> overrides the predicate that assigns classes to subjects for the schema pyramid (default rdf:type, otherwise auto-detected); for Wikidata, use --type-predicate http://www.wikidata.org/prop/direct/P31.
--no-pyramid omits the community pyramid entirely (no pyramid section). SPARQL, SHACL, triple-pattern, and reachability queries don't use it, so the file stays fully queryable and is markedly smaller; only community, summary, and progressive queries need the pyramid.

All of these flags are opt-in; without them the output is byte-identical to a plain build.

`rete repyramid <file> -o <out.rete> [--type-predicate <IRI>] [--text-index] [--card …]`

Rebuild a file's pyramid into a new file (-o), reading the triples directly from the existing .rete, so no export | build round-trip through N-Quads is needed. Use it to add a schema pyramid (or a --text-index or a Dataset Card) to a file built before those features existed, or to re-derive the schema pyramid under a different --type-predicate. The card flags match rete build (--card-file / --title / --license / …).

rete repyramid old.rete -o new.rete --type-predicate http://www.wikidata.org/prop/direct/P31
rete repyramid old.rete -o new.rete --text-index    # add full-text search to an existing file

Validating

`rete validate <inputs…> [--format nt|nq|ttl]`

Parse RDF input(s) without building, to check they are well-formed N-Triples/N-Quads/Turtle. Reports statement and named-graph counts, or exits non-zero with a precise parse error (file, line, column).

rete validate data.ttl
curl -s https://host/data.nt | rete validate - --format nt

Inspecting

`rete info <file>`

Print the decoded 1 KB header (the section directory, codecs, counts, content hash) — plus the Dataset Card catalog when the file carries one.

`rete stats <file>`

Human-friendly overview: file size, default-graph triple count, distinct terms, named-graph count, pyramid levels, and top predicates. Notes the label-index and text-index sizes when the file carries them.

`rete verify <file>`

Recompute the blake3 content hash and compare to the header. Exits non-zero with FAILED — content hash mismatch on corruption/truncation.

`rete search <file> [<prefix>] [--contains <word>…] [--contains-prefix P] [--limit N] [--json]`

Two modes.

Label prefix (default): the subjects whose label starts with prefix (case-insensitive), printed as label<TAB><iri> (or [{"label":…,"subject":…}] with --json). An empty prefix returns the first --limit labels (default 20). Lookups binary-search a bounded, label-sorted block in the pyramid-meta instead of scanning literals, which makes this the fast path for autocomplete: about 22× faster than a FILTER(STRSTARTS(LCASE(?l), …)) scan at 6k labels, with the gap widening as the data grows. Labels come from rdfs:label, skos:prefLabel/altLabel, foaf:name, dc(terms):title, and schema:name; the block keeps only the 8,192 most-connected labeled subjects, so less-connected subjects are not found in this mode. Files built before this feature carry no label index; the block is additive, so rebuild the file to add it.
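The lookup strategy can be sketched in Python with the standard bisect module. The in-memory list stands in for the on-disk label-sorted block; using casefold() as the case-insensitive comparison and the (folded label, label, IRI) layout are assumptions, not rete's format. Binary search finds the first candidate in O(log n), and the matches then form one contiguous run, which is why no literal scan is needed.

```python
import bisect


def build_block(pairs):
    """pairs: (label, iri). Sort once by case-folded label (hypothetical layout)."""
    return sorted((label.casefold(), label, iri) for label, iri in pairs)


def prefix_search(block, prefix, limit=20):
    """Return up to `limit` (label, iri) pairs whose label starts with prefix."""
    key = prefix.casefold()
    # (key,) compares less than any (key, label, iri) tuple, so bisect_left
    # lands on the first entry whose folded label is >= key.
    i = bisect.bisect_left(block, (key,))
    out = []
    while i < len(block) and len(out) < limit and block[i][0].startswith(key):
        out.append(block[i][1:])
        i += 1
    return out
```

An empty prefix starts at index 0, so it returns the first limit labels, matching the documented behavior.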

Full-text (--contains <word>…): the subjects whose literals contain every given word (whole-word, case-insensitive, AND semantics), printed one IRI per line (or [{"subject":…}] with --json). --contains-prefix P requires a literal word starting with P; it can be used alone or together with --contains. Answered from the opt-in TEXT_INDEX section (rete build --text-index or rete repyramid --text-index); on a remote file only the queried words' posting lists are fetched, not the whole index. A file built without a text index reports that it has none.

rete search data.rete gluc                       # label prefix (autocomplete)
rete search data.rete --contains glucose         # literals containing "glucose"
rete search data.rete --contains glucose phosphate  # both words (AND)
rete search data.rete --contains-prefix einst    # a word starting with "einst…"
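Whole-word AND search over posting lists reduces to set intersection. The Python sketch below builds word-to-subject postings in memory; the \w+ tokenizer, casefold() matching, and pooling all of a subject's literals together are assumptions rather than the TEXT_INDEX format. The prefix case here scans the vocabulary linearly; a sorted on-disk vocabulary would allow a range lookup instead.

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: case-folded runs of word characters."""
    return re.findall(r"\w+", text.casefold())


def build_postings(literals):
    """literals: (subject, lexical form). Returns word -> set of subjects."""
    postings = {}
    for subject, lexical in literals:
        for word in tokenize(lexical):
            postings.setdefault(word, set()).add(subject)
    return postings


def contains(postings, words=(), prefix=None):
    """Subjects matching every word (AND) and, if given, some word starting with prefix."""
    sets = [postings.get(w.casefold(), set()) for w in words]
    if prefix is not None:
        p = prefix.casefold()
        sets.append(set().union(*(subs for w, subs in postings.items() if w.startswith(p))))
    return set.intersection(*sets) if sets else set()
```

For exact words only the queried posting lists are touched, which is the property that lets a remote read fetch just those lists.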

`rete card <file> [--json]`

Print the embedded Dataset Card — curated metadata (title/license/source/…) plus the derived profile (counts, top predicates and classes, vocabularies) and the content-hash checksum. --json emits the raw card. Prints (no dataset card) for a file built without one. rete info shows the same catalog beneath the header when a card is present.

`rete graphs <file>`

List the named-graph IRIs in a dataset (the default graph is unnamed).

`rete export <file> [--format nq|ttl|jsonld]`

Serialize the dataset. nq (the default) dumps every triple/quad as N-Quads (default graph + named graphs) — a lossless round-trip. ttl emits Turtle and jsonld emits expanded JSON-LD; both serialize the default graph only (Turtle/JSON-LD here carry no default-vs-named distinction, so named graphs are skipped — use nq to export those).

rete export data.rete                 # N-Quads (default)
rete export data.rete --format ttl    # Turtle
rete export data.rete --format jsonld # expanded JSON-LD

Querying

`rete query <file> [--subject S] [--predicate P] [--object O]`

Match a single triple pattern; omitted positions are wildcards.

rete query data.rete --predicate '<http://ex/knows>'

`rete why <file> [--subject S] [--predicate P] [--object O] [--json]`

Explain the provenance of each triple-pattern result. The command reports the matched terms, dictionary IDs, graph scope, the chosen permutation (one of the six), and the file byte ranges for the dictionary, full index container, selected permutation payload, and pyramid metadata. With --json, the same data is emitted as stable machine-readable JSON.

rete why data.rete --predicate '<http://ex/knows>'
rete why data.rete --subject '<http://ex/Alice>' --json
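The six permutations are the orderings of subject, predicate, and object. One plausible selection rule, sketched below in Python, picks a permutation whose leading positions are exactly the bound ones, so all matches form one contiguous range of the sorted payload. This is an illustration under that assumption: rete's actual choice of the best permutation may weigh other factors, and sorting on every call stands in for a payload that is already sorted on disk.

```python
import bisect
from itertools import permutations

PERMS = ["".join(p) for p in permutations("SPO")]  # SPO, SOP, PSO, POS, OSP, OPS


def choose_permutation(s=None, p=None, o=None):
    """Hypothetical rule: first permutation whose prefix is exactly the bound positions."""
    bound = {pos for pos, term in zip("SPO", (s, p, o)) if term is not None}
    return next(perm for perm in PERMS if set(perm[: len(bound)]) == bound)


def scan(triples, s=None, p=None, o=None):
    """Match one pattern by binary search over the chosen permutation's sort order."""
    perm = choose_permutation(s, p, o)
    order = ["SPO".index(c) for c in perm]  # permuted slot j holds S/P/O index order[j]
    rows = sorted(tuple(t[i] for i in order) for t in triples)
    terms = dict(zip("SPO", (s, p, o)))
    prefix = tuple(terms[c] for c in perm if terms[c] is not None)
    i = bisect.bisect_left(rows, prefix)
    out = []
    while i < len(rows) and rows[i][: len(prefix)] == prefix:
        spo = [None, None, None]
        for j, pos in enumerate(order):
            spo[pos] = rows[i][j]  # map back to subject, predicate, object
        out.append(tuple(spo))
        i += 1
    return out
```

With only the predicate bound, for instance, this rule selects PSO and reads one contiguous run of that payload.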

`rete why-url <url> [--subject S] [--predicate P] [--object O] [--json]`

The remote counterpart of rete why: explain a triple-pattern result over a .rete served on HTTP, range-fetching only the routed tiles — the same provenance (permutation, section, byte ranges, the physical tile) plus the bytes fetched and range-request count. The CLI version of the browser's why_url.

rete why-url https://host/data.rete --predicate '<http://ex/knows>'
# … provenance …
# (fetched 4096 bytes in 1 range request(s); file is 1048576 bytes)

Provenance is honest about the physical layout: it identifies the index container, the selected permutation payload, and — for tiled files — the physical tile holding each match (PERM/index) with its compressed byte range. Pre-tiling (v0.1) files report tile provenance as not_materialized.

`rete bgp <file> "<pattern> . <pattern> …"`

Evaluate a Basic Graph Pattern. Separate patterns with a standalone . and terms with spaces; ?name is a variable. Because spaces delimit terms, a term cannot contain a space (for literals with spaces, use rete sparql).

rete bgp data.rete "?x <http://ex/knows> ?y . ?y <http://ex/knows> ?z"
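The pattern syntax and the meaning of a BGP (every pattern must match under one consistent variable binding) can be illustrated with a naive nested-loop join in Python. This sketch says nothing about how rete's engine plans or executes joins, and it assumes the period is written as its own space-separated token, as in the example.

```python
def parse_bgp(text):
    """Split on whitespace; a standalone '.' ends a triple pattern."""
    patterns, current = [], []
    for tok in text.split() + ["."]:
        if tok != ".":
            current.append(tok)
            continue
        if current:
            if len(current) != 3:
                raise ValueError(f"a pattern needs 3 terms, got {current!r}")
            patterns.append(tuple(current))
            current = []
    return patterns


def unify(term, value, binding):
    """Constants must match exactly; a variable binds once and must stay consistent."""
    if not term.startswith("?"):
        return term == value
    if term in binding:
        return binding[term] == value
    binding[term] = value
    return True


def evaluate(patterns, triples):
    """Nested-loop join: extend each partial solution with every matching triple."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in triples:
                binding = dict(solution)  # copy so a failed match leaves no trace
                if all(unify(t, v, binding) for t, v in zip(pattern, triple)):
                    extended.append(binding)
        solutions = extended
    return solutions
```

In the example query, ?y links the two patterns: a solution survives only if the object of the first match is the subject of the second.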

`rete sparql <file> "<query>" [--json]`

Run SPARQL: SELECT / ASK / CONSTRUCT / DESCRIBE. With --json, emit standard SPARQL Results JSON (for SELECT/ASK). See SPARQL support.

rete sparql data.rete "PREFIX e: <http://ex/> SELECT ?p (COUNT(?f) AS ?n) WHERE { ?p e:knows ?f } GROUP BY ?p"
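Because --json emits the standard SPARQL 1.1 Query Results JSON format, any consumer of that format works on it. A minimal Python reader is sketched below; the function name rows is hypothetical, and it returns plain values, dropping the type and datatype annotations.

```python
import json


def rows(document: str):
    """Flatten SPARQL Results JSON: ASK -> bool, SELECT -> list of {var: value}."""
    data = json.loads(document)
    if "boolean" in data:
        return data["boolean"]
    names = data["head"]["vars"]
    # Unbound variables are simply absent from a binding, so skip them.
    return [{v: b[v]["value"] for v in names if v in b} for b in data["results"]["bindings"]]
```

Piped from rete sparql … --json, calling rows(sys.stdin.read()) gives one dictionary per result row, or a single boolean for ASK.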

`rete cost <file-or-url> "<query>" [--json] [--explain]`

Preview the byte/range-request cost of a SPARQL query without evaluating it. The report parses the query, lists the concrete predicates that can drive summary-based routing, and compares three access paths:

summary overview — header + dictionary + pyramid summary, skipping the triple index.
routed pattern open — for a single default-graph triple pattern, header + dictionary + the one selected permutation payload (the best of the six).
full query open — the current SPARQL engine path, which opens dictionary + index (+ pyramid/named-graph metadata when present) before evaluation.

rete cost data.rete "PREFIX e: <http://ex/> SELECT ?y WHERE { e:Alice e:knows ?y }"
rete cost https://host/data.rete "ASK { ?s <http://ex/knows> ?o }" --json

For the exact summary-only shapes SELECT (COUNT(*) AS ?n) WHERE { ?s <p> ?o }, SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }, SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p, SELECT DISTINCT ?p WHERE { ?s ?p ?o }, SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }, ASK { ?s <p> ?o }, and ASK { ?s ?p ?o }, the JSON output includes summary_answer with the exact count/boolean value read from the pyramid summary. Predicate-specific shapes also include the predicate; predicate totals return all predicate/count pairs, predicate lists return all predicates present in the summary, and predicate distinct counts return the number of predicates. More complex shapes are marked requires_index.

Add --explain to include a planner explanation. In JSON, this adds an explain object with the classified query_shape, whether the answer is summary_exact, the planned access path (summary-only, routed-pattern, or full-index), and whether the current engine path still reads the index.

For HTTP(S), the host must honor Range requests, just like query-url and sparql-url. Treat this as a deployment/debugging preview: it reports the current SPARQL engine's range budget, the summary budget, and the exact routed single-pattern budget when the query shape allows it.

`rete progressive <file-or-url> "<query>" [--json]`

Run the summary-only stage of progressive querying. This command answers only the exact shapes that can be proven from the pyramid summary without opening the triple index:

SELECT (COUNT(*) AS ?n) WHERE { ?s <p> ?o }
SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }
SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p
SELECT DISTINCT ?p WHERE { ?s ?p ?o }
SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }
ASK { ?s ?p ?o }
ASK { ?s <p> ?o }

rete progressive data.rete "PREFIX e: <http://ex/> SELECT (COUNT(*) AS ?n) WHERE { ?s e:knows ?o }" --json

The JSON output is SPARQL Results JSON with an added progressive object describing the summary stage, exactness, predicate when one is fixed, bytes fetched, request count, and reads_index: false. Other query shapes fail clearly; use rete sparql for full-index evaluation or rete cost --explain to inspect why a query is not summary-answerable yet.
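Every shape in this list is a function of per-predicate triple counts alone, which is why the summary suffices. The Python sketch below assumes, hypothetically, that the summary exposes per-community predicate counts that sum to exact totals (the kind of totals rete predicates prints); the shape names are invented labels for the seven documented queries.

```python
from collections import Counter


def predicate_totals(per_community_counts):
    """Sum hypothetical per-community {predicate: count} maps into exact totals."""
    totals = Counter()
    for counts in per_community_counts:
        totals.update(counts)
    return dict(totals)


def answer_from_summary(shape, totals, predicate=None):
    """Answer one of the seven summary-exact shapes; anything else needs the index."""
    if shape == "count_p":           # SELECT (COUNT(*) AS ?n) WHERE { ?s <p> ?o }
        return totals.get(predicate, 0)
    if shape == "count_all":         # SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }
        return sum(totals.values())
    if shape == "count_by_p":        # SELECT ?p (COUNT(*) AS ?n) ... GROUP BY ?p
        return dict(totals)
    if shape == "distinct_p":        # SELECT DISTINCT ?p WHERE { ?s ?p ?o }
        return sorted(totals)
    if shape == "count_distinct_p":  # SELECT (COUNT(DISTINCT ?p) AS ?n) ...
        return len(totals)
    if shape == "ask_p":             # ASK { ?s <p> ?o }
        return totals.get(predicate, 0) > 0
    if shape == "ask_any":           # ASK { ?s ?p ?o }
        return sum(totals.values()) > 0
    raise ValueError("requires_index")
```

A query outside these shapes, such as one with a join or a bound subject, cannot be answered from counts and falls through to requires_index.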

`rete cypher <file> "<query>" [--base <iri>] [--json]`

Run a read-only Cypher subset (a prototype). The query is translated to an equivalent SPARQL SELECT and evaluated by the same engine — no second query engine. Supports MATCH … [WHERE …] RETURN … [LIMIT n]: node patterns ((a), (a:Label)), forward/reverse relationships ((a)-[:REL]->(b), (a)<-[:REL]-(b)), variable-length relationships (-[:REL*]-> → the SPARQL property path REL+, one-or-more), simple WHERE comparisons on a property (a.age > 30) or identity (a = <iri>) joined by AND/OR, and RETURN of variables and/or properties. Writes, OPTIONAL MATCH, WITH, aggregations, relationship variables/properties, and multiple labels are rejected with a clear error. See compatibility for the full subset and the name→IRI convention. With --json, emit standard SPARQL Results JSON.

A bare label/relationship/property name X maps to <BASE + X>, where BASE defaults to http://ex/ and is overridable with --base.

rete cypher deps.rete "MATCH (a:Application) RETURN a"
rete cypher deps.rete "MATCH (a)-[:dependsOn*]->(b) WHERE b = <http://ex/log4x> RETURN a"
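The name→IRI rule and the variable-length mapping can be shown on a single relationship pattern. The Python sketch below handles only (a)-[:REL]->(b), (a)<-[:REL]-(b), and the * form; it is a hypothetical illustration of the documented convention, the exact SPARQL text rete generates may differ, and labels, WHERE, and LIMIT are not handled.

```python
import re

BASE = "http://ex/"  # documented default; --base overrides it

# (a)-[:REL]->(b) or (a)<-[:REL]-(b), optionally REL* for one-or-more hops.
_REL = re.compile(r"\((\w+)\)(<?)-\[:(\w+)(\*?)\]-(>?)\((\w+)\)")


def to_iri(name, base=BASE):
    """A bare name X maps to <BASE + X>."""
    return f"<{base}{name}>"


def translate(pattern, returns, base=BASE):
    """Hypothetical translation of one relationship pattern to a SPARQL SELECT."""
    m = _REL.fullmatch(pattern.replace(" ", ""))
    if m is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    a, left, rel, star, right, b = m.groups()
    if bool(left) == bool(right):
        raise ValueError("the relationship must point exactly one way")
    subj, obj = (a, b) if right else (b, a)
    path = to_iri(rel, base) + ("+" if star else "")  # REL* -> property path REL+
    projection = " ".join(f"?{v}" for v in returns)
    return f"SELECT {projection} WHERE {{ ?{subj} {path} ?{obj} }}"
```

A reverse arrow only swaps subject and object; the predicate IRI is the same either way.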

Reasoning

`rete reason [<file>] [--url <url>] [--materialize] [--check] [--verify-card] [--format nq|ttl]`

Run the prototype OWL RL / RDFS reasoner: materialize RDFS/OWL entailments to a fixpoint and report any logical inconsistencies ("incoherent points", e.g. a disjoint-class violation). Prints the count of newly entailed triples and each inconsistency (kind + detail). Exits non-zero if any inconsistency is found (zero if coherent), so it works as a coherence gate in CI. With --materialize, also serialize the base + inferred graph (nq default, or ttl). This is a documented subset, not full OWL DL — see Reasoning & coherence.

--url <url> reads a remote .rete over HTTP range requests instead of a local file (omit the positional <file>).
--check is coherence-gate mode: print one verdict line and exit non-zero on any incoherent point (suppresses --materialize output) — the minimal CI gate.
--verify-card checks the file's baked coherence card (from rete build --reason) against a fresh reasoning run, guarding against drift or a stale ruleset; it exits non-zero if the stored verdict disagrees with recomputation.

rete reason data.rete
rete reason data.rete --materialize --format ttl
rete reason data.rete --check                  # one-line CI verdict
rete reason --url https://host/data.rete       # reason over a remote file
rete reason data.rete --verify-card            # baked card vs fresh run
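Materializing to a fixpoint means applying rules until a pass adds nothing new. The Python sketch below uses a deliberately tiny rule set (the RDFS rules rdfs9 and rdfs11 plus the OWL 2 RL disjointness check cax-dw) over prefixed-name strings. It is not rete's reasoner or its ruleset, but it shows the same contract: count the entailed triples, list the incoherent points, and map their presence to a non-zero exit code (1 here).

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DISJOINT = "owl:disjointWith"


def materialize(triples):
    """Apply rdfs9 and rdfs11 until a pass derives nothing new (naive fixpoint)."""
    graph = set(triples)
    while True:
        subclass = [(c, d) for c, p, d in graph if p == SUBCLASS]
        new = set()
        for c, d in subclass:
            for d2, e in subclass:
                if d == d2:
                    new.add((c, SUBCLASS, e))  # rdfs11: subClassOf is transitive
            for s, p, o in graph:
                if p == RDF_TYPE and o == c:
                    new.add((s, RDF_TYPE, d))  # rdfs9: instances inherit superclasses
        new -= graph
        if not new:
            return graph
        graph |= new


def inconsistencies(graph):
    """cax-dw: an individual typed with two disjoint classes is an incoherent point."""
    types = {}
    for s, p, o in graph:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    found = []
    for c, p, d in graph:
        if p == DISJOINT:
            found += [("disjoint-class violation", s, c, d)
                      for s, classes in types.items() if c in classes and d in classes]
    return sorted(found)


def exit_code(triples):
    """Mirror of the CI contract: 0 when coherent, non-zero otherwise."""
    return 1 if inconsistencies(materialize(triples)) else 0
```

The violation only appears after materialization, which is why the check runs on the closed graph rather than the asserted triples.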

Shape validation

`rete shacl <file> --shapes <shapes.ttl> [--graph <iri>] [--format text|json|ttl]`

Validate a .rete graph against SHACL Core shapes read from Turtle. The default graph is validated unless --graph names one dataset graph. The command exits zero when the report conforms and non-zero when it finds validation results, so it can be used as a CI data-quality gate. See SHACL validation for the supported components and current limits.

rete shacl data.rete --shapes shapes.ttl
rete shacl data.rete --shapes shapes.ttl --format json
rete shacl data.rete --shapes shapes.ttl --graph '<http://ex/snapshot>'

`rete shacl-url <url> --shapes <shapes.ttl> [--format text|json|ttl]`

Validate a remote .rete over HTTP, range-reading only what the shapes target. The file is opened lazily and each focus node's values are fetched as routed range reads, so a targeted shape (sh:targetClass / targetNode / targetSubjectsOf / targetObjectsOf) never downloads the whole graph — it faults only the tiles holding the target nodes and their property values. Reports the bytes fetched and the range-request count. Validates the default graph.

rete shacl-url https://host/data.rete --shapes shapes.ttl
# (fetched 38912 bytes in 7 range request(s); file is 1048576 bytes)

Coarse graphs (no index read)

`rete summary <file> [--level k]`

Print the structural coarse graph: the Louvain community quotient graph (community → community relations with counts), plus — for v2 files — the schema pyramid: a leveled rdf:type histogram where abstract classes describe coarse levels and leaf classes resolve as you zoom in (e.g. Agent → Person → Scientist → Astronomer). --level k prints just level k's type histogram (0 = coarsest / most abstract). Everything here reads index-free from the pyramid-meta — summary-url shows the same over HTTP without fetching the index.
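Given a community assignment, the quotient graph counts edges between the communities of their endpoints. The Python sketch below assumes the partition is already computed (it does not implement Louvain), keys relations by predicate, keeps intra-community edges, and skips objects without a community such as literals; these are assumptions about presentation, not rete's exact output.

```python
from collections import Counter


def quotient_graph(triples, community):
    """Count (source community, predicate, target community) relations."""
    counts = Counter()
    for s, p, o in triples:
        if s in community and o in community:  # literals have no community
            counts[(community[s], p, community[o])] += 1
    return counts
```

The result has one entry per community pair and predicate, so its size depends on the number of communities rather than the number of triples.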

`rete schema <file>`

Print the semantic coarse graph: classes (by rdf:type) with instance counts, and the class→predicate→class relations between them — the dataset's effective schema.
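The effective schema can be derived by lifting each non-type triple to the classes of its endpoints. In the Python sketch below, which is illustrative rather than rete's algorithm, an instance with several classes contributes one relation per class pair and untyped endpoints are dropped; both choices are assumptions.

```python
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def effective_schema(triples):
    """Return (instances per class, counts of class -> predicate -> class relations)."""
    classes = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes.setdefault(s, set()).add(o)
    instances = Counter(c for cs in classes.values() for c in cs)
    relations = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # Lift the edge to every (subject class, object class) pair.
        for cs in classes.get(s, ()):
            for co in classes.get(o, ()):
                relations[(cs, p, co)] += 1
    return instances, relations
```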

`rete communities <file> [--json] [--profile] [--predicate IRI] [--round N] [--min-size N]`

Recompute the Louvain communities and expose, per community, its member subjects and the literal text of its triples — the per-community text corpus for downstream topic modeling. --profile adds a no-ML "topic" profile per community (top literal words, rdf:type classes, and predicates). --predicate <iri> detects communities using only that relation's edges — a criterion-specific partition (see multi-criteria splitting). --json emits [{community, size, members:[<iri>…], text:[lexical…]}] (plus a profile object when --profile is set); --round N cuts the dendrogram at a specific round (default: the round chosen for the tile budget); --min-size N drops communities with fewer than N members. See the topic modeling tutorial.

rete communities papers.rete --profile               # no-ML topic labels
rete communities researchers.rete --predicate '<http://ex/cites>'  # one criterion
rete communities papers.rete --json | python3 scripts/lda_topics.py --topics 3
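The --json shape can also be turned into one document per community for tools other than the bundled script. A Python sketch, assuming the field names listed for --json (community, size, text); the function name corpus is hypothetical.

```python
import json


def corpus(document, min_size=1):
    """Map community id -> its literal text joined into one document, largest first."""
    communities = [c for c in json.loads(document) if c["size"] >= min_size]
    communities.sort(key=lambda c: c["size"], reverse=True)
    return {c["community"]: " ".join(c["text"]) for c in communities}
```

Piped from rete communities papers.rete --json, corpus(sys.stdin.read()) yields the per-community texts ready for a topic model.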

`rete predicates <file>`

Exact per-predicate triple totals, summed from the pyramid summary alone — the triple index is never read.

Graph traversal

`rete reach <file> --predicate <iri> --seed <iri>… [--seeds-file F] [--reverse] [--parallel] [--count]`

Multi-source transitive reachability over one relation: for each seed, the set of nodes it transitively reaches. --reverse answers "who reaches the seed?" (impact analysis); --seeds-file reads one seed IRI per line; --count prints only the size of each set; --parallel runs one rayon task per seed (this batch-reachability workload benchmarks at a ~14–15× speedup on many cores).

# Who (transitively) depends on the vulnerable package?  (impact analysis)
rete reach deps.rete --predicate '<http://ex/dependsOn>' --seed '<http://ex/log4x>' --reverse

# Many seeds at once, in parallel:
rete reach g.rete --predicate '<http://ex/knows>' --seeds-file seeds.txt --parallel --count
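Per seed, reachability is a breadth-first search over one relation's edges, and --reverse flips every edge before searching. The Python sketch below treats the relation as (subject, object) pairs and excludes the seed from its own result unless a cycle leads back to it; both are assumptions. Each seed's search is independent, which is what makes one parallel task per seed possible; the sketch runs them sequentially.

```python
from collections import defaultdict, deque


def adjacency(edges, reverse=False):
    """edges: (subject, object) pairs of one relation; reverse flips each edge."""
    adj = defaultdict(list)
    for s, o in edges:
        if reverse:
            s, o = o, s
        adj[s].append(o)
    return adj


def reach(adj, seed):
    """Breadth-first search: every node reachable from seed in one or more hops."""
    seen, queue = set(), deque([seed])
    while queue:
        for nxt in adj.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reach_many(edges, seeds, reverse=False):
    """One independent search per seed (rete can run these as parallel tasks)."""
    adj = adjacency(edges, reverse)
    return {seed: reach(adj, seed) for seed in seeds}
```

The --count output corresponds to taking len() of each set.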

Over HTTP

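The URL commands below work against http:// and https:// hosts that honor Range requests (e.g. S3, GitHub, most CDNs). The same requirement applies to the other commands that accept a URL: why-url, shacl-url, cost, progressive, reason --url, and federate.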
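Before pointing rete at a new host, you can check that it honors Range requests. A Python sketch using only the standard library follows: a compliant host answers a ranged GET with 206 Partial Content, while one that ignores Range returns 200 and the whole body. The function names are hypothetical, and this is not how rete itself performs its reads.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Range value for `length` bytes from `offset`; HTTP byte ranges are inclusive."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def fetch_range(url: str, offset: int, length: int) -> bytes:
    """GET one byte range and fail loudly if the host ignored the Range header."""
    request = urllib.request.Request(url, headers={"Range": range_header(offset, length)})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            raise RuntimeError(f"host ignored Range (HTTP {response.status})")
        return response.read()
```

For example, fetch_range("https://host/data.rete", 0, 1024) sends Range: bytes=0-1023.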

`rete card-url <url> [--json]`

Fetch only the embedded Dataset Card — the header and the metadata range, in two small range requests. The dictionary, index, and pyramid are never fetched: this is the index-free CARD tier, the cold-start self-description (counts, vocabulary, class graph, signals, starter queries) a client reads before it knows what to query. Reports bytes fetched + range count.

rete card-url https://host/data.rete --json

`rete summary-url <url>`

Fetch only the header + dictionary + pyramid summary and print the coarse graph. The (large) index is never fetched — the "overview first" path.

`rete query-url <url> [--subject S] [--predicate P] [--object O]`

Match a triple pattern over HTTP, fetching only the byte ranges the query needs and reporting how many ranges/bytes were pulled. Bound terms are resolved from the dictionary first; if they exist, the reader fetches only the selected permutation payload rather than the whole index container. If a bound term is unknown, the index is skipped entirely and the result is empty.

rete query-url https://host/data.rete --object '<http://ex/Dave>'

`rete sparql-url <url> "<query>" [--json]`

Run a full SPARQL query over HTTP with lazy tile faulting (tiled files): the open fetches the header, dictionary, pyramid, and the index's small tile directories; index tiles are then range-fetched only when the query's scans and probes touch them, so a selective query reads O(touched tiles) rather than the whole index. A range failure mid-query is reported as an error, never as silently fewer rows. Pre-tiling (v0.1) files fall back to fetching the index whole.
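Lazy tile faulting can be modeled as a fetch-on-first-touch cache keyed by tile. In the Python sketch below, which is hypothetical and not rete's reader, the tile directory read at open time maps each tile id to a byte range, fetch is any range-read callable, and a short read raises instead of silently returning fewer bytes, mirroring the documented error behavior.

```python
class LazyTiles:
    """Fetch-on-first-touch tile cache (hypothetical model of lazy faulting)."""

    def __init__(self, directory, fetch):
        self.directory = directory  # tile id -> (offset, length), read at open time
        self.fetch = fetch          # callable(offset, length) -> bytes, e.g. a range read
        self.cache = {}
        self.requests = 0
        self.bytes_fetched = 0

    def tile(self, tile_id):
        """Return a tile's bytes, issuing a range read only the first time it is touched."""
        if tile_id not in self.cache:
            offset, length = self.directory[tile_id]
            data = self.fetch(offset, length)
            if len(data) != length:  # a failed read is an error, never fewer rows
                raise OSError(f"tile {tile_id}: got {len(data)} of {length} bytes")
            self.cache[tile_id] = data
            self.requests += 1
            self.bytes_fetched += length
        return self.cache[tile_id]
```

A selective query touches few tiles, so requests and bytes_fetched grow with the tiles touched rather than with the index size.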

rete sparql-url https://host/data.rete \
  "PREFIX e: <http://ex/> SELECT ?d WHERE { ?d e:dependsOn+ e:log4x }"

Federation

`rete federate <sources…> --query "<SPARQL>" [--json] [--no-route]`

Run one SPARQL query across several .rete sources — local file paths and/or http(s):// URLs, mixed — and merge the results at the term (string) level: SELECT rows are unioned + deduped, ASK is OR'd across sources, CONSTRUCT triples are unioned + deduped. Routing (default on) reads each source's predicate set from its summary (index never touched) and prunes sources whose predicates are disjoint from the query's; --no-route queries every source. Per-source diagnostics and the queried/pruned tally go to stderr.

This is union federation — correct for sharded data where each file independently yields complete rows. It does not do cross-file joins, and aggregates (COUNT/GROUP BY) and LIMIT are evaluated per source then unioned (a federated COUNT(*) returns per-source counts, not a global sum). See Federated queries for the full model, limitations, and a real OpenCitations multi-shard example.
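The merge and routing rules above can be sketched at the term-string level in Python. The helper names are hypothetical, and treating a query with no concrete predicate as unprunable is an assumption; cross-file joins are deliberately absent, as in rete federate.

```python
def merge_select(per_source_rows):
    """Union + dedupe SELECT rows (dicts of var -> term string), first-seen order."""
    seen, merged = set(), []
    for rows in per_source_rows:
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


def merge_ask(per_source_answers):
    """ASK is true if any source says true."""
    return any(per_source_answers)


def route(source_predicates, query_predicates):
    """Prune sources whose predicate set is disjoint from the query's concrete predicates."""
    if not query_predicates:  # assumption: no concrete predicate means nothing is pruned
        return list(source_predicates)
    return [name for name, preds in source_predicates.items() if preds & query_predicates]
```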

rete federate data/opencitations/cites-2021.rete data/opencitations/cites-2024.rete \
  --query 'SELECT ?citing WHERE { ?citing <http://purl.org/spar/cito/cites>
                                          <https://doi.org/10.1038/s41586-021-03819-2> } LIMIT 5'

Exit codes

0 on success; non-zero with a message on error (bad input, missing file, corrupt file, a host that ignores Range, or an unsupported query construct).