Optimize Cypher planning and execution caches #2429
Closed
emotionbug wants to merge 15 commits into
Document the PostgreSQL-backed execution direction for Cypher planning and record the local WSL regression workflow used while iterating on the branch. Remove the obsolete session-info side channel from Cypher analysis and keep the planning notes aligned with that simpler data flow. This gives the later performance work a clearer baseline and records how to run focused tests on this checkout without hitting Windows mount permission issues.
Reduce repeated executor setup in SET, DELETE, MERGE, and entity-existence paths by carrying reusable lookup state across rows. The combined change keeps label relations and endpoint indexes open where the executor can safely reuse them, defers result-relation and RLS setup until the write path actually needs it, and avoids scans when a DELETE cannot touch connected edges. It also switches common id lookups to the primary-key fast path and uses read locks for read-only existence and endpoint checks.
Reuse label metadata and Cypher function OIDs across parser, planner, load, VLE, and write-clause paths. This groups the first wave of label-cache plumbing with the function lookup caches that consume the same graph metadata. Parser relation lookups, MATCH label RTE construction, batch insert, graph generation, SET, DELETE, and VLE edge label handling all consult the shared caches instead of repeating catalog scans or name-based lookups.
Add generation-aware graph and label lookup caches for repeated catalog queries that occur during graph loading, VLE setup, endpoint lookup, graph stats, and label creation. The cache slots remember graph OIDs, namespace lookups, label metadata, and edge constraint state while avoiding invalidation churn for unrelated labels. The accompanying documentation records the focused regression sets used for these graph-cache changes.
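As a rough illustration of the generation-aware idea described above, the following is a minimal, self-contained sketch and not the code from this branch. Every name in it (`label_cache`, `label_cache_slot`, `oid_t`, the slot count, the round-robin replacement) is hypothetical, and it uses plain C types rather than PostgreSQL's. It shows two invalidation paths: a broad generation bump that makes every slot stale in O(1), and a narrow per-label drop that leaves unrelated labels cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a PostgreSQL Oid; not the real typedef. */
typedef uint32_t oid_t;

#define LABEL_CACHE_SLOTS 8
#define LABEL_NAME_MAX 64

/* One slot: the cached result plus the generation it was filled in. */
typedef struct label_cache_slot
{
    bool     valid;
    uint64_t generation;
    oid_t    graph_oid;
    char     label_name[LABEL_NAME_MAX];
    oid_t    label_relid;
} label_cache_slot;

typedef struct label_cache
{
    uint64_t         generation;   /* bumped on broad catalog changes */
    unsigned         next_victim;  /* round-robin replacement cursor */
    label_cache_slot slots[LABEL_CACHE_SLOTS];
} label_cache;

/* Broad invalidation: stale slots are rejected lazily during lookup. */
static void label_cache_bump_generation(label_cache *cache)
{
    cache->generation++;
}

static bool slot_matches(const label_cache *cache, const label_cache_slot *slot,
                         oid_t graph_oid, const char *name)
{
    return slot->valid &&
           slot->generation == cache->generation &&
           slot->graph_oid == graph_oid &&
           strcmp(slot->label_name, name) == 0;
}

/* Narrow invalidation: drop only the slots for one label of one graph. */
static void label_cache_invalidate_label(label_cache *cache, oid_t graph_oid,
                                         const char *name)
{
    for (int i = 0; i < LABEL_CACHE_SLOTS; i++)
    {
        if (slot_matches(cache, &cache->slots[i], graph_oid, name))
            cache->slots[i].valid = false;
    }
}

/* Returns true on a hit; on a miss the caller falls back to the catalog. */
static bool label_cache_lookup(const label_cache *cache, oid_t graph_oid,
                               const char *name, oid_t *relid_out)
{
    for (int i = 0; i < LABEL_CACHE_SLOTS; i++)
    {
        if (slot_matches(cache, &cache->slots[i], graph_oid, name))
        {
            *relid_out = cache->slots[i].label_relid;
            return true;
        }
    }
    return false;
}

/* Remember a catalog result, overwriting the next slot in rotation. */
static void label_cache_store(label_cache *cache, oid_t graph_oid,
                              const char *name, oid_t relid)
{
    label_cache_slot *slot = &cache->slots[cache->next_victim];

    cache->next_victim = (cache->next_victim + 1) % LABEL_CACHE_SLOTS;
    slot->valid = true;
    slot->generation = cache->generation;
    slot->graph_oid = graph_oid;
    strncpy(slot->label_name, name, LABEL_NAME_MAX - 1);
    slot->label_name[LABEL_NAME_MAX - 1] = '\0';
    slot->label_relid = relid;
}
```

In this sketch a zero-initialized `label_cache` is an empty cache. A name longer than the slot buffer is stored truncated and therefore never matches on lookup, so it degrades to a miss rather than a wrong hit.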
Consolidate AGE catalog, namespace, type, function, label-sequence, and btree index lookup caches with narrower invalidation rules. The change keeps common OID lookups close to the call sites that need them but flushes them on catalog, function, type, index, and sequence changes that can make the cached values stale. It also preserves Cypher custom-plan cost, width, disabled-node, and parameterization metadata while sharing common custom path initialization.
Group the next set of executor and expression hot-path caches that reduce repeat work inside common Cypher queries. MERGE update caches, shared btree index lookup, graph-name cache scans, direct agtype field access, endpoint graph-name reuse, SET tuple slots, DETACH DELETE scan slots, and scalar argument type caching all avoid rebuilding state that is stable for the current expression or executor invocation.
Add fast paths for scalar, variadic, string, math, range, collect, access, and small agtype builder arguments. The grouped change caches argument types, avoids fallback conversion when the input shape is already known, reduces startup allocations in lookup caches, uses direct search-path membership checks, and trims zero-fill work in agtype, VLE, MERGE, parser, aggregate, and graph traversal setup paths.
Reduce transient allocation and repeated string work in parser function lookups, CSV load helpers, graph OID microcaches, VLE range handling, and text scalar conversion paths. The change frees short-lived lookup data after syscache probes, scans trimmed CSV fields once, avoids detoast and cstring copies for length-only checks, and uses direct text varlena data where possible. It also keeps the related DML hash, slot, and planner cheapest-path improvements in the same performance pass.
Replace several agtype scalar conversion round trips with direct length-aware integer, float, numeric, boolean, and text formatting paths. This avoids reparsing list values, repeated output length scans, stale label and proc-name comparisons, and unnecessary string serialization in toString, toStringList, typecast, concat, and agtype output helpers. The result keeps the same values while reducing allocation and formatting overhead in tight scalar loops.
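To make "length-aware formatting" concrete, here is a small standalone sketch of the general technique for the integer case. It is not taken from the branch: `format_int64` is a hypothetical helper, and the real agtype paths may be structured differently. The point it illustrates is that the formatter returns the number of bytes written, so the caller never needs a follow-up `strlen` or a reparse of the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write the decimal text of value into buf, which must
 * hold at least 21 bytes (sign + 19 digits + NUL). Returns the text length,
 * not counting the terminating NUL.
 */
static size_t format_int64(int64_t value, char *buf)
{
    char     tmp[20];      /* digits in reverse order */
    size_t   ndigits = 0;
    size_t   len = 0;
    /* Negate in unsigned space so INT64_MIN does not overflow. */
    uint64_t mag = value < 0 ? (uint64_t) 0 - (uint64_t) value
                             : (uint64_t) value;

    do
    {
        tmp[ndigits++] = (char) ('0' + (mag % 10));
        mag /= 10;
    } while (mag != 0);

    if (value < 0)
        buf[len++] = '-';
    while (ndigits > 0)
        buf[len++] = tmp[--ndigits];
    buf[len] = '\0';

    return len;
}
```

A caller that appends the result to a larger output buffer can advance its write position by the returned length directly.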
Avoid graph invalidation and metadata serialization when a DML clause does not actually mutate graph data. The combined DML changes lazily create DELETE tracking, record actual SET, CREATE, and MERGE mutations, reuse graph-load snapshots, load graph labels with one catalog scan, cache SET path update attributes and indexes, hash MERGE created paths, cache DELETE item positions, and trim path rebuild overhead.
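The lazy-tracking and "only invalidate on real mutation" behaviour can be sketched in isolation as follows. This is an assumption-laden model, not the executor code from the branch: `dml_state`, `delete_tracking`, and the `invalidations` counter are all hypothetical, and the counter merely stands in for whatever graph invalidation the real code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-clause delete bookkeeping. */
typedef struct delete_tracking
{
    size_t deleted_count;
} delete_tracking;

/* Hypothetical per-clause DML state. */
typedef struct dml_state
{
    bool             mutated;        /* set only when a row really changes */
    delete_tracking *deletes;        /* NULL until the first delete */
    int              invalidations;  /* stand-in for graph invalidation */
} dml_state;

/* Record one deleted entity, allocating the tracking struct on first use. */
static bool dml_record_delete(dml_state *state)
{
    if (state->deletes == NULL)
    {
        state->deletes = calloc(1, sizeof(*state->deletes));
        if (state->deletes == NULL)
            return false;
    }
    state->deletes->deleted_count++;
    state->mutated = true;
    return true;
}

/* Record that a SET, CREATE, or MERGE actually changed graph data. */
static void dml_record_mutation(dml_state *state)
{
    state->mutated = true;
}

/* End of clause: invalidate only if something changed, then reset. */
static void dml_finish(dml_state *state)
{
    if (state->mutated)
        state->invalidations++;
    free(state->deletes);
    state->deletes = NULL;
    state->mutated = false;
}
```

With this shape, a DELETE that matches no rows neither allocates tracking state nor triggers invalidation.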
Reduce unnecessary agtype value, entity, path, key, and wrapper copies across scalar operators, casts, string helpers, list functions, VLE quals, endpoint lookups, and DML propagation. The change releases temporary graph values as soon as field, label, id, or path extraction is complete and keeps no-copy helper usage together so related memory-lifetime assumptions can be reviewed in one place.
Combine the custom-plan, custom-path, path-replacement, clause-function, and external-cast cleanup work for Cypher DML. The grouped change removes duplicated setup between DML nodes, batches clause function OID loading, and skips redundant external function casts. Keeping these together makes the planner and executor contract for write clauses easier to audit.
Widen and refresh AGE metadata caches used by global graph load, label validity checks, catalog writes, and graph load setup. The change keeps compact global-graph label arrays, shared graph label names, metadata miss caching, generation refresh on catalog writes, created-label relid reuse, DML and parser label lookup caches, and graph-cache setup during load in one reviewable cache-generation pass.
Carry cached graph and label names through graph generation, label DDL, object access errors, and graph drop paths. This avoids SQL wrapper calls and default label lookups when graph OIDs and label relids are already known, creates labels with the known graph OID, uses cached relation names when dropping labels, and reuses graph-cache names for namespace and DDL work.
Collapse the recent VLE micro-optimization commits into one coherent change so the history describes the traversal work as a single reviewable unit instead of a long sequence of mechanical fast paths. The combined change keeps the VLE path materialization, slice, boundary, endpoint, and uniqueness optimizations together because they all reduce repeated work in the same expansion pipeline. This preserves the final code state while making the branch easier to review and bisect. The commit covers direct stack scans, cached frontier state, empty-range short circuits, direct relationship and node projection paths, reverse index rewrites, endpoint mode validation, and regression coverage for empty VLE ranges.
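For reviewers unfamiliar with the empty-range case, the short circuit can be pictured with the toy model below. It is a sketch under stated assumptions rather than the VLE code itself: `vle_range`, `vle_range_is_empty`, and `vle_count_chain_paths` are hypothetical names, the graph is a simple chain of edges leaving one start node, and `setup_calls` stands in for the frontier and stack setup that the real traversal would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hop range for a pattern such as [*lower..upper]. */
typedef struct vle_range
{
    int64_t lower;
    int64_t upper;
    bool    has_upper;   /* false models an open-ended range like [*2..] */
} vle_range;

/* A bounded range whose upper bound is below its lower bound matches nothing. */
static bool vle_range_is_empty(const vle_range *range)
{
    return range->has_upper && range->upper < range->lower;
}

/*
 * Count the paths leaving the start of a chain with chain_edges edges whose
 * length falls inside the range. On a chain there is exactly one path per
 * length from 0 to chain_edges.
 */
static size_t vle_count_chain_paths(const vle_range *range, int64_t chain_edges,
                                    int *setup_calls)
{
    int64_t lo;
    int64_t hi;

    if (vle_range_is_empty(range))
        return 0;        /* short circuit: no traversal state is built */

    (*setup_calls)++;    /* stand-in for frontier and stack setup */

    lo = range->lower < 0 ? 0 : range->lower;
    hi = (range->has_upper && range->upper < chain_edges) ? range->upper
                                                          : chain_edges;
    return hi >= lo ? (size_t) (hi - lo + 1) : 0;
}
```

An inverted range such as lower 3 and upper 1 returns zero paths here without touching `setup_calls`.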
No description provided.