Skip to content

ci: add kernel-e2e workflow + KERNEL_REV pin for use_kernel=True coverage#808

Merged
vikrantpuppala merged 7 commits into
mainfrom
vikrant/kernel-e2e-ci
May 27, 2026
Merged

ci: add kernel-e2e workflow + KERNEL_REV pin for use_kernel=True coverage#808
vikrantpuppala merged 7 commits into
mainfrom
vikrant/kernel-e2e-ci

Conversation

@vikrantpuppala
Copy link
Copy Markdown
Contributor

Summary

  • Adds .github/workflows/kernel-e2e.yml that builds the (private) databricks/databricks-sql-kernel wheel from a pinned SHA and runs tests/e2e/test_kernel_backend.py against the dogfood warehouse.
  • Adds KERNEL_REV at the repo root pinning the kernel commit the connector is tested against (currently aed2efbed8087171d61848f5ad98c7e171827698). Bumping this file is the only way to pick up a new kernel version — keeps the connector ↔ kernel pair bisectable.
  • Reuses the existing INTEGRATION_TEST_APP (the same App that dispatches to databricks/databricks-driver-test) to clone the kernel repo at the pinned SHA. The App's repo allowlist needs to be extended to include databricks/databricks-sql-kernel.

Gate semantics (mirrors trigger-integration-tests.yml)

  • Plain PR events post a synthetic-success Kernel E2E check so the required check doesn't block PRs that don't touch kernel code.
  • kernel-e2e label on a PR triggers a preview run. Auto-removed on synchronize for the same security reason integration-test is.
  • merge_group is the real gate — dispatches the suite when kernel-relevant files changed (src/databricks/sql/backend/kernel/, tests/e2e/test_kernel_backend.py, tests/unit/test_kernel_*, .github/workflows/kernel-e2e.yml, KERNEL_REV, pyproject.toml, poetry.lock); auto-passes otherwise.

What's NOT in this PR

  • Unit tests are unchanged. tests/unit/test_kernel_*.py already runs in every code-quality-checks.yml matrix combo against a fake databricks_sql_kernel ModuleType injected into sys.modules at import time. The "also run unit tests against the real wheel" question is a real one — tracking as a follow-up because it requires refactoring the fake-injection bootstrap in the unit test files. Drift between the fake and the real kernel API is currently caught by the e2e suite (which this PR newly gates on).
  • No reverse dispatch (kernel → connector). A future change in the kernel repo will dispatch back into this repo to run connector e2e against the kernel's main. Will land as a follow-up workflow on the kernel side.

Required external setup before this can become a required check

  1. Extend the INTEGRATION_TEST_APP (INTEGRATION_TEST_APP_ID / INTEGRATION_TEST_PRIVATE_KEY) repo allowlist to include databricks/databricks-sql-kernel (one-line change in the App config).
  2. Create the kernel-e2e label in this repo.
  3. Add Kernel E2E to the main branch protection's required status checks once a green run lands.

Until step 3 is done, the synthetic-success check posted on every PR is harmless — it just looks like a passing optional check.

Test plan

  • Push this branch and confirm the synthetic-success Kernel E2E check posts on the PR.
  • After step 1 above lands (App allowlist update), add the kernel-e2e label here and confirm the real run dispatches, builds the kernel wheel, and runs test_kernel_backend.py green.
  • Verify the synchronize handler removes the kernel-e2e label when a follow-up commit is pushed.
  • Sanity-check that a PR that touches an unrelated file (e.g. README.md) does NOT trigger the real run in merge queue — should auto-pass.

This pull request and its description were written by Isaac.

Wires up CI coverage for use_kernel=True. The kernel is a private
repo with no published wheel, so we pin a kernel SHA in KERNEL_REV
and build the wheel inline via maturin develop using the existing
INTEGRATION_TEST_APP GitHub App (extended to include
databricks/databricks-sql-kernel in its repo allowlist).

Gate semantics mirror trigger-integration-tests.yml:
- Plain PR events post a synthetic-success Kernel E2E check so the
  required check doesn't block PRs that don't touch kernel code.
- The kernel-e2e label triggers a preview run on the PR and is
  auto-removed on synchronize for the same security reason as the
  integration-test label.
- merge_group is the real gate — runs when kernel-relevant files
  change (src/databricks/sql/backend/kernel/, test_kernel_backend.py,
  KERNEL_REV, etc.), auto-passes otherwise.

Unit tests are unchanged: tests/unit/test_kernel_*.py already runs
in every code-quality-checks.yml matrix combo against a fake
databricks_sql_kernel module injected at sys.modules import time.

Required follow-up before this merges:
1. Extend the INTEGRATION_TEST_APP allowlist to include
   databricks/databricks-sql-kernel.
2. Create the kernel-e2e label in this repo.
3. Add Kernel E2E as a required check on main once a green run lands.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <[email protected]>
@vikrantpuppala vikrantpuppala added the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
setup-poetry runs setup-jfrog, which exchanges a GitHub OIDC token
for a JFrog access token to reach the internal PyPI mirror. That
needs id-token: write on the job, which was missing — the labelled
preview run failed at setup-poetry with
"ACTIONS_ID_TOKEN_REQUEST_TOKEN: unbound variable".

Declared at both workflow scope and on run-kernel-e2e directly:
a job-level permissions block fully overrides workflow scope, so
the redundancy is intentional.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <[email protected]>
@github-actions github-actions Bot removed the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
@vikrantpuppala vikrantpuppala added the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
`poetry run maturin develop` from inside databricks-sql-kernel/pyo3/
makes poetry create a fresh, empty .venv next to the kernel source
(it discovers pyo3/pyproject.toml first and treats it as the project
root). That venv has no maturin → "Command not found: maturin".

Resolve the connector venv's python path explicitly before changing
working directory, then call maturin from that python via
`-m maturin`. `--interpreter <path>` pins the produced wheel to the
connector venv so the resulting extension is installed where pytest
will look for it.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <[email protected]>
@github-actions github-actions Bot removed the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
@vikrantpuppala vikrantpuppala added the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
…flag)

maturin develop installs into whichever python invoked it; the flag
exists on `maturin build` only. The previous commit's extra
`--interpreter $CONNECTOR_VENV_PY` was redundant — we're already
calling maturin via `$CONNECTOR_VENV_PY -m maturin`, so the venv
python is the one doing the build and install.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <[email protected]>
@github-actions github-actions Bot removed the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
@vikrantpuppala vikrantpuppala added the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
databricks-protected-runner-group blocks direct egress to
index.crates.io, so the maturin build was failing with SSL EOF on
the cargo metadata step. Extend setup-jfrog with an opt-in
`configure-cargo` input that writes ~/.cargo/config.toml +
credentials.toml against the JFrog db-cargo-remote proxy (recipe
borrowed verbatim from databricks-odbc's setup-jfrog action) and
forward it through setup-poetry so the kernel-e2e workflow can
enable it without bypassing the wrapper.

Bundled cleanups from a workflow audit:

- Drop the redundant `Set up Python 3.10` step — setup-poetry runs
  actions/setup-python internally at the matching version.
- Smoke-check now uses `$CONNECTOR_VENV_PY` (same interpreter we
  built the wheel with), so a wheel installed into the wrong venv
  would surface here rather than be masked by `poetry run python`
  re-resolving.
- Post `Kernel E2E` check on the labelled-PR path as well as the
  merge-queue path; previously the PR would still show the
  synthetic-success check forever even after a real labelled run
  failed.
- Add a comment to fetch-depth: 0 explaining why we keep it.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <[email protected]>
@github-actions github-actions Bot removed the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
@vikrantpuppala vikrantpuppala added the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
The original pin (aed2efb) predates kernel PR #36 which added
`complex_types_as_json` to Session.__new__. Connector main already
passes that kwarg (added in PR #795), so every e2e test was failing
with:

  TypeError: Session.__new__() got an unexpected keyword argument
  'complex_types_as_json'

Bump to current kernel main (3aa25b21) which has the kwarg plus the
rest of the comparator-parity changes the connector code already
expects.

This is a good demonstration of why the bisectable KERNEL_REV pin
matters: the connector and kernel evolved in lockstep on `main`
before this CI existed, so the very first thing the workflow does
once it can actually build the wheel is catch that we'd been
shipping a stale pin.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <[email protected]>
@github-actions github-actions Bot removed the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
@vikrantpuppala vikrantpuppala added the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
actions-rust-lang/setup-rust-toolchain invokes Swatinem/rust-cache
internally, which runs `cargo metadata` from the workflow's working
directory. Our job's CWD is the connector repo root (no Cargo.toml
there — the kernel checkout is in a subdir), so the bundled cache
attempt fails with exit 101 and dumps a Node stack trace into the
log. It's cosmetic — the action handles its own errors — but reads
as a failure on first glance, and the bundled cache races with the
explicit rust-cache step we already configure with the correct
`workspaces: databricks-sql-kernel` path.

Disabling the bundled cache leaves a single, correctly-keyed
rust-cache invocation and cleans up the log.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <[email protected]>
@github-actions github-actions Bot removed the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
@vikrantpuppala vikrantpuppala added the kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR label May 27, 2026
@vikrantpuppala vikrantpuppala merged commit 34d4233 into main May 27, 2026
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kernel-e2e Trigger preview run of the Kernel E2E workflow on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants