Optimize Cypher planning and execution caches #2429

Closed
emotionbug wants to merge 15 commits into apache:master from emotionbug:fix/optimize

Conversation

@emotionbug
Contributor

No description provided.

emotionbug added 15 commits May 30, 2026 06:47
Document the PostgreSQL-backed execution direction for Cypher planning and
record the local WSL regression workflow used while iterating on the branch.

Remove the obsolete session-info side channel from Cypher analysis and keep
the planning notes aligned with that simpler data flow. This gives the later
performance work a clearer baseline and records how to run focused tests on
this checkout without hitting Windows mount permission issues.

Reduce repeated executor setup in SET, DELETE, MERGE, and entity-existence
paths by carrying reusable lookup state across rows.

The combined change keeps label relations and endpoint indexes open where the
executor can safely reuse them, defers result-relation and RLS setup until the
write path actually needs it, and avoids scans when a DELETE cannot touch
connected edges. It also switches common id lookups to the primary-key fast
path and uses read locks for read-only existence and endpoint checks.
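
The deferred-setup idea described above (pay for write-path setup only when a row is actually written, then reuse it across rows) can be sketched in a few lines. This is a minimal illustration in Python, not AGE's C executor code; `LazyWriteState`, `apply_set_clause`, and the `setup` callback are hypothetical names standing in for the result-relation and RLS setup the commit mentions.

```python
from typing import Callable, Iterable, Optional


class LazyWriteState:
    """Holds expensive write-path setup and builds it at most once."""

    def __init__(self, setup: Callable[[], dict]):
        # Hypothetical stand-in for result-relation and RLS setup.
        self._setup = setup
        self._state: Optional[dict] = None
        self.setup_calls = 0

    def get(self) -> dict:
        # Build the state on first use only; later rows reuse it.
        if self._state is None:
            self._state = self._setup()
            self.setup_calls += 1
        return self._state


def apply_set_clause(rows: Iterable[int],
                     needs_write: Callable[[int], bool],
                     state: LazyWriteState) -> int:
    """Process rows, touching the write state only for rows that are updated."""
    written = 0
    for row in rows:
        if not needs_write(row):
            continue      # read-only row: no setup is triggered
        state.get()       # the first written row pays the setup cost
        written += 1
    return written
```

With this shape, a clause that never writes a row never runs `setup`, and a clause that writes many rows runs it once.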
Reuse label metadata and Cypher function OIDs across parser, planner, load,
VLE, and write-clause paths.

This groups the first wave of label-cache plumbing with the function lookup
caches that consume the same graph metadata. Parser relation lookups, MATCH
label RTE construction, batch insert, graph generation, SET, DELETE, and VLE
edge label handling all consult the shared caches instead of repeating catalog
scans or name-based lookups.

Add generation-aware graph and label lookup caches for repeated catalog
queries that occur during graph loading, VLE setup, endpoint lookup, graph
stats, and label creation.

The cache slots remember graph OIDs, namespace lookups, label metadata, and
edge constraint state while avoiding invalidation churn for unrelated labels.
The accompanying documentation records the focused regression sets used for
these graph-cache changes.
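
As an illustration of what "generation-aware" can mean here, the sketch below keeps one generation counter per label, so invalidating one label leaves cached entries for unrelated labels valid. It is a Python model under stated assumptions, not the branch's C code: `GenerationCache`, its `loader` callback, and the per-label granularity are hypothetical choices made for the example.

```python
from typing import Callable, Dict, Tuple


class GenerationCache:
    """Cache slots tagged with the generation they were loaded under."""

    def __init__(self, loader: Callable[[str], int]):
        self._loader = loader                          # stand-in for a catalog scan
        self._generation: Dict[str, int] = {}          # label -> current generation
        self._slots: Dict[str, Tuple[int, int]] = {}   # label -> (generation, value)
        self.loads = 0

    def invalidate(self, label: str) -> None:
        # Bump only this label's generation; other labels stay valid.
        self._generation[label] = self._generation.get(label, 0) + 1

    def lookup(self, label: str) -> int:
        gen = self._generation.get(label, 0)
        slot = self._slots.get(label)
        if slot is not None and slot[0] == gen:
            return slot[1]                             # slot is still current
        value = self._loader(label)                    # stale or missing: reload
        self.loads += 1
        self._slots[label] = (gen, value)
        return value
```

A stale slot is detected lazily on the next lookup, so invalidation itself is just a counter increment.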
Consolidate AGE catalog, namespace, type, function, label-sequence, and btree
index lookup caches with narrower invalidation rules.

The change keeps common OID lookups close to the call sites that need them but
flushes them on catalog, function, type, index, and sequence changes that can
make the cached values stale. It also preserves Cypher custom-plan cost,
width, disabled-node, and parameterization metadata while sharing common
custom path initialization.

Group the next set of executor and expression hot-path caches that reduce
repeat work inside common Cypher queries.

MERGE update caches, shared btree index lookup, graph-name cache scans, direct
agtype field access, endpoint graph-name reuse, SET tuple slots, DETACH DELETE
scan slots, and scalar argument type caching all avoid rebuilding state that is
stable for the current expression or executor invocation.

Add fast paths for scalar, variadic, string, math, range, collect, access, and
small agtype builder arguments.

The grouped change caches argument types, avoids fallback conversion when the
input shape is already known, reduces startup allocations in lookup caches,
uses direct search-path membership checks, and trims zero-fill work in agtype,
VLE, MERGE, parser, aggregate, and graph traversal setup paths.

Reduce transient allocation and repeated string work in parser function
lookups, CSV load helpers, graph OID microcaches, VLE range handling, and text
scalar conversion paths.

The change frees short-lived lookup data after syscache probes, scans trimmed
CSV fields once, avoids detoast and cstring copies for length-only checks, and
uses direct text varlena data where possible. It also keeps related DML hash,
slot, and planner cheapest-path improvements with the same performance pass.

Replace several agtype scalar conversion round trips with direct length-aware
integer, float, numeric, boolean, and text formatting paths.

This avoids reparsing list values, repeated output length scans, stale label
and proc-name comparisons, and unnecessary string serialization in toString,
toStringList, typecast, concat, and agtype output helpers. The result keeps the
same values while reducing allocation and formatting overhead in tight scalar
loops.

Avoid graph invalidation and metadata serialization when a DML clause does not
actually mutate graph data.

The combined DML changes lazily create DELETE tracking, record actual SET,
CREATE, and MERGE mutations, reuse graph-load snapshots, load graph labels with
one catalog scan, cache SET path update attributes and indexes, hash MERGE
created paths, cache DELETE item positions, and trim path rebuild overhead.
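
The "record actual mutations, invalidate only when something changed" pattern from this commit can be modelled as follows. This is a hedged Python sketch rather than the executor code itself; `DmlTracker`, `run_set`, and the dictionary rows are hypothetical, and a real invalidation would do far more than increment a counter.

```python
from typing import Callable, Dict, List


class DmlTracker:
    """Counts real mutations so invalidation can be skipped for no-op clauses."""

    def __init__(self) -> None:
        self.mutations = 0
        self.invalidations = 0

    def record_mutation(self) -> None:
        self.mutations += 1

    def finish(self) -> bool:
        # Invalidate only if the clause changed something.
        if self.mutations == 0:
            return False
        self.invalidations += 1   # stand-in for graph invalidation work
        self.mutations = 0
        return True


def run_set(rows: List[Dict], matches: Callable[[Dict], bool],
            key: str, value: object, tracker: DmlTracker) -> bool:
    """Apply a SET-like update to matching rows and report whether we invalidated."""
    for row in rows:
        if matches(row):
            row[key] = value
            tracker.record_mutation()
    return tracker.finish()
```

A clause whose predicate matches no rows returns `False` and leaves the invalidation counter untouched.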
Reduce unnecessary agtype value, entity, path, key, and wrapper copies across
scalar operators, casts, string helpers, list functions, VLE quals, endpoint
lookups, and DML propagation.

The change releases temporary graph values as soon as field, label, id, or
path extraction is complete and keeps no-copy helper usage together so related
memory-lifetime assumptions can be reviewed in one place.

Combine the custom-plan, custom-path, path-replacement, clause-function, and
external-cast cleanup work for Cypher DML.

The grouped change removes duplicated setup between DML nodes, batches clause
function OID loading, and skips redundant external function casts. Keeping
these together makes the planner and executor contract for write clauses easier
to audit.

Widen and refresh AGE metadata caches used by global graph load, label
validity checks, catalog writes, and graph load setup.

The change keeps compact global-graph label arrays, shared graph label names,
metadata miss caching, generation refresh on catalog writes, created-label
relid reuse, DML and parser label lookup caches, and graph-cache setup during
load in one reviewable cache-generation pass.
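
"Metadata miss caching" combined with "generation refresh on catalog writes" can be pictured as a negative cache: a failed lookup is remembered so it is not repeated, and the remembered misses are dropped when the catalog changes. The Python sketch below is an assumption-laden model, not the branch's implementation; `LabelLookupCache`, `refresh`, and the sentinel approach are illustrative choices.

```python
from typing import Callable, Dict, Optional

_MISS = object()  # sentinel marking "looked up and not found"


class LabelLookupCache:
    """Caches both hits and misses for label-name lookups."""

    def __init__(self, loader: Callable[[str], Optional[int]]):
        self._loader = loader                 # stand-in for a catalog probe
        self._entries: Dict[str, object] = {}
        self.loads = 0

    def get(self, name: str) -> Optional[int]:
        entry = self._entries.get(name)
        if entry is _MISS:
            return None                       # cached miss: skip the probe
        if entry is not None:
            return entry                      # cached hit
        self.loads += 1
        value = self._loader(name)
        self._entries[name] = _MISS if value is None else value
        return value

    def refresh(self) -> None:
        # Called after a catalog write; cached misses may now be wrong.
        self._entries.clear()
```

The trade-off is that a cached miss stays in effect until `refresh` runs, which is why the refresh has to be tied to every write that could create the missing entry.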
Carry cached graph and label names through graph generation, label DDL, object
access errors, and graph drop paths.

This avoids SQL wrapper calls and default label lookups when graph OIDs and
label relids are already known, creates labels with the known graph OID, uses
cached relation names when dropping labels, and reuses graph-cache names for
namespace and DDL work.

Collapse the recent VLE micro-optimization commits into one coherent change so
the history describes the traversal work as a single reviewable unit instead of
a long sequence of mechanical fast paths.

The combined change keeps the VLE path materialization, slice, boundary,
endpoint, and uniqueness optimizations together because they all reduce
repeated work in the same expansion pipeline.

This preserves the final code state while making the branch easier to review
and bisect. The commit covers direct stack scans, cached frontier state,
empty-range short circuits, direct relationship and node projection paths,
reverse index rewrites, endpoint mode validation, and regression coverage for
empty VLE ranges.
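
To make the empty-range short circuit concrete, here is a small Python model of variable-length expansion with an explicit stack. It is only a sketch under assumptions: the adjacency-list graph, the `expand_vle` name, and the rule that an edge may appear at most once per path are choices made for the example, not a description of AGE's C traversal code.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, str]  # (edge id, target node)


def expand_vle(graph: Dict[str, List[Edge]], start: str,
               lower: int, upper: Optional[int]) -> List[List[int]]:
    """Return edge-id paths from start whose length lies in [lower, upper]."""
    # Empty range: nothing can match, so skip the traversal entirely.
    if upper is not None and lower > upper:
        return []
    paths: List[List[int]] = []
    stack: List[Tuple[str, List[int]]] = [(start, [])]
    while stack:
        node, used = stack.pop()
        if len(used) >= lower:
            paths.append(used)
        if upper is not None and len(used) >= upper:
            continue                      # reached the upper bound
        for edge_id, target in graph.get(node, []):
            if edge_id in used:
                continue                  # keep edges unique within a path
            stack.append((target, used + [edge_id]))
    return paths
```

For a range such as `*3..2`, the function returns before allocating the stack, which is the kind of case the regression coverage for empty VLE ranges is meant to exercise.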
@emotionbug emotionbug closed this May 29, 2026