feat(webapp,run-engine): mollifier drainer replay + stale sweep + cancelled-run engine API by d-cs · Pull Request #3754 · triggerdotdev/trigger.dev

d-cs · 2026-05-26T10:45:05Z

Summary

The replay side of the mollifier:

DrainerHandler: reads buffered snapshots and replays them through engine.trigger to materialise PG rows.
RunEngine.createCancelledRun: new public method the handler uses to write CANCELED rows directly from snapshots (bypass queue + waitpoint, emit runCancelled). Tolerates the cjson empty-table tags edge case found during validation.
Drainer fairness: org → env rotation so a heavy env doesn't starve light ones in the same org.
Stale-entry sweep + telemetry + alertable gauge so a stuck/offline drainer surfaces in alerts.
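
The fairness item above can be pictured as a two-level round-robin. The following is a minimal sketch only, not the PR's drainer: `OrgEnvRotation`, `enqueue`, and `next` are hypothetical names, and the in-memory arrays stand in for whatever buffer the real drainer reads from.

```typescript
// Hypothetical sketch: rotate across orgs first, then across envs inside the chosen org.
type EnvQueue = { envId: string; pending: string[] };
type OrgQueues = { orgId: string; envs: EnvQueue[]; cursor: number };

class OrgEnvRotation {
  private orgs: OrgQueues[] = [];
  private orgCursor = 0;

  // Buffer a run id under its org and env, creating the queues on first use.
  enqueue(orgId: string, envId: string, runId: string): void {
    let org = this.orgs.find((o) => o.orgId === orgId);
    if (!org) {
      org = { orgId, envs: [], cursor: 0 };
      this.orgs.push(org);
    }
    let env = org.envs.find((e) => e.envId === envId);
    if (!env) {
      env = { envId, pending: [] };
      org.envs.push(env);
    }
    env.pending.push(runId);
  }

  // Pop the next run id, advancing both cursors so the same env is not picked twice in a row
  // while a sibling env (or another org) still has work.
  next(): string | undefined {
    for (let i = 0; i < this.orgs.length; i++) {
      const org = this.orgs[(this.orgCursor + i) % this.orgs.length];
      for (let j = 0; j < org.envs.length; j++) {
        const envIndex = (org.cursor + j) % org.envs.length;
        const runId = org.envs[envIndex].pending.shift();
        if (runId !== undefined) {
          org.cursor = (envIndex + 1) % org.envs.length;
          this.orgCursor = (this.orgCursor + i + 1) % this.orgs.length;
          return runId;
        }
      }
    }
    return undefined;
  }
}
```

With one org holding a heavy env (three entries) and a light env (one entry), the light env's entry is served before the heavy env's second entry, which is the starvation case the fairness item describes.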

Both the drainer and the sweep are off by default; nothing fires unless flagged on (TRIGGER_MOLLIFIER_DRAINER_ENABLED, TRIGGER_MOLLIFIER_STALE_SWEEP_ENABLED).
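
As an illustration of the default-off gating, here is a small sketch. The two variable names come from the description above; the helper names and the accepted values (`"1"` or `"true"`) are assumptions, since the PR text does not say which values turn a flag on.

```typescript
// Hypothetical helper: the accepted truthy values are an assumption, not taken from the PR.
type MollifierFlags = { drainerEnabled: boolean; staleSweepEnabled: boolean };

function isEnabled(value: string | undefined): boolean {
  if (value === undefined) return false; // unset means off
  const normalised = value.trim().toLowerCase();
  return normalised === "1" || normalised === "true";
}

// Takes a plain record so the sketch does not depend on process.env typings.
function readMollifierFlags(env: Record<string, string | undefined>): MollifierFlags {
  return {
    drainerEnabled: isEnabled(env["TRIGGER_MOLLIFIER_DRAINER_ENABLED"]),
    staleSweepEnabled: isEnabled(env["TRIGGER_MOLLIFIER_STALE_SWEEP_ENABLED"]),
  };
}
```

Passing an empty record yields both flags off, so neither loop would start unless an operator sets the variables explicitly.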

Stacked on the trigger-time decisions PR.

Test plan

`pnpm run typecheck --filter webapp` passes
`pnpm run test --filter webapp test/mollifierDrainerHandler.test.ts` passes
`pnpm run test --filter webapp test/mollifierStaleSweep.test.ts` passes
`pnpm run test --filter @internal/run-engine src/engine/tests/createCancelledRun.test.ts` passes
`pnpm run test --filter @trigger.dev/redis-worker packages/redis-worker/src/mollifier/drainer.test.ts` passes

changeset-bot · 2026-05-26T10:45:10Z

⚠️ No Changeset found

Latest commit: 014313e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets


coderabbitai · 2026-05-26T10:45:13Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


devin-ai-integration

Devin Review found 2 potential issues.

View 3 additional findings in Devin Review.

Two bugs flagged by Devin on PR #3754:

1. entry.server.tsx reverted to `void sessionsReplicationInstance;`, which esbuild tree-shakes under `"sideEffects": false`. Restored the globalThis assignment + warning comment from #3738 (incident TRI-9864). Without this the sessions→ClickHouse logical replication slot stops being consumed at boot.
2. createFailedTaskRun unconditionally emitted `runFailed`, which the `completeFailedRunEvent` listener uses to write a span completion into ClickHouse. But TriggerFailedTaskService.call() already wraps createFailedTaskRun inside `repository.traceEvent({ incomplete: false, isError: true })` which writes its own completion row for the same (traceId, spanId). Two completions racing on the same span row is a real observability bug.

Added an `emitRunFailedEvent: boolean = true` opt-out. The TriggerFailedTaskService.call() path now passes `false` and enqueues `PerformTaskRunAlertsService` directly after the trace event closes so the alerts side of `runFailed` is preserved. `callWithoutTraceEvents` and the mollifier drainer's terminal-failure path keep the default emit (they have no outer trace event managing the span).

Regression test pins the opt-out: `emitRunFailedEvent: false` writes the PG row but does NOT fire the bus event.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
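
The opt-out described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's code: `MiniBus`, the `Map` standing in for the PG table, and the `"FAILED"` status string are hypothetical; only the `emitRunFailedEvent` flag and its default of `true` come from the commit message.

```typescript
// Hypothetical in-memory stand-ins for the event bus and the PG table.
type RunFailedListener = (runId: string) => void;

class MiniBus {
  private listeners: RunFailedListener[] = [];
  on(listener: RunFailedListener): void {
    this.listeners.push(listener);
  }
  emit(runId: string): void {
    for (const listener of this.listeners) listener(runId);
  }
}

function createFailedTaskRun(
  rows: Map<string, string>,
  bus: MiniBus,
  runId: string,
  options: { emitRunFailedEvent?: boolean } = {}
): void {
  // Default stays true so callers without an outer trace event keep the old behaviour.
  const { emitRunFailedEvent = true } = options;
  rows.set(runId, "FAILED"); // the row is always written (status value is illustrative)
  if (emitRunFailedEvent) {
    bus.emit(runId); // skipped when the caller already closes the span itself
  }
}
```

A caller that wraps the write in its own trace event would pass `{ emitRunFailedEvent: false }` and handle alerting separately, as the commit message describes for TriggerFailedTaskService.call().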

…celled-run engine API

The replay side of the mollifier:

- DrainerHandler that reads buffered snapshots and replays them through engine.trigger to materialise PG rows.
- RunEngine.createCancelledRun: new public method the handler uses to write CANCELED rows directly from snapshots (bypass queue + waitpoint, emit runCancelled). Tolerates cjson empty-table tags.
- Drainer fairness: org → env rotation so a heavy env doesn't starve light ones in the same org.
- Stale-entry sweep + telemetry + alertable gauge for stuck drainers.

Both drainer and sweep default-off; nothing fires unless flagged on.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
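
To make the stale-entry sweep concrete, here is a minimal sketch. It assumes staleness is decided by entry age against a threshold, which the commit message does not state; `BufferedEntry`, `findStaleEntries`, `sweepOnce`, and `maxAgeMs` are all hypothetical names.

```typescript
// Hypothetical shape of a buffered entry; the real buffer layout is not shown in the PR text.
type BufferedEntry = { runId: string; envId: string; bufferedAtMs: number };

// Assumption: an entry is stale once it has sat in the buffer longer than maxAgeMs.
function findStaleEntries(entries: BufferedEntry[], nowMs: number, maxAgeMs: number): BufferedEntry[] {
  return entries.filter((entry) => nowMs - entry.bufferedAtMs > maxAgeMs);
}

// One sweep pass: log each stale entry and return the count a gauge could report.
function sweepOnce(
  entries: BufferedEntry[],
  nowMs: number,
  maxAgeMs: number,
  warn: (message: string, fields: Record<string, unknown>) => void
): number {
  const stale = findStaleEntries(entries, nowMs, maxAgeMs);
  for (const entry of stale) {
    warn("mollifier stale entry", {
      runId: entry.runId,
      envId: entry.envId,
      ageMs: nowMs - entry.bufferedAtMs,
    });
  }
  return stale.length;
}
```

A non-zero return value on consecutive sweeps is the kind of signal an alert on the gauge would key on when the drainer is stuck or offline.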

- `isRetryablePgError`: also accept `errorCode === "P1001"` so `PrismaClientInitializationError` (which surfaces P1001 on a different field than `PrismaClientKnownRequestError`) retries.
- Drop `envId` from OTel metric labels on `mollifier.realtime_subscriptions.buffered`, `mollifier.stale_entries`, and the `mollifier.stale_entries.current` gauge. `envId` is a banned high-cardinality attribute; the structured warn log alongside each counter tick still carries envId for forensic drill-down.
- Stale-sweep test name + comments now match the assertion shape (all three entries stale, not "two stale + one fresh").
- `RunEngine.createCancelledRun` P2002 path now requires the existing row's status to be CANCELED; a non-canceled conflict throws rather than silently reporting success, so the caller can route to `engine.cancelRun()` or skip.
- Regression test pins the new conflict guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
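
The first bullet hinges on Prisma exposing the error code on different fields: `code` on `PrismaClientKnownRequestError` and `errorCode` on `PrismaClientInitializationError`. A minimal sketch of a check that reads both fields follows; it is not the PR's implementation, and the set of retryable codes is an assumption beyond P1001, which is the only code the commit message names.

```typescript
// Assumption: only P1001 is listed because it is the only code the commit message confirms.
const RETRYABLE_PRISMA_CODES = new Set<string>(["P1001"]);

function isRetryablePgError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  // Duck-typed so the sketch does not need the Prisma client classes.
  const candidate = error as { code?: unknown; errorCode?: unknown };
  return [candidate.code, candidate.errorCode].some(
    (value) => typeof value === "string" && RETRYABLE_PRISMA_CODES.has(value)
  );
}
```

Checking both fields means an initialization failure while PG is unreachable is classified the same way as a known-request error carrying the same code.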

…leton

Importing the production drainer wiring transitively loads `~/v3/runEngine.server`, whose top-level `singleton(...)` eagerly constructs a RunEngine. The constructor spins up Prisma + Redis workers that try to connect to localhost. In CI (no PG, no Redis) that produces an unhandled `PrismaClientInitializationError` which fails the run even though every assertion passes. Mock the runEngine and prisma modules so the unit test exercises only the bootstrap's error classification, not a live engine.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Container startup + the sweep loop can exceed Vitest's 5s default on CI runners (passes in ~1.7-2s locally). Matches the explicit `{ timeout: 20_000 }` other mollifier redisTests carry across the project.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Rename the catch-all mollifier.md and trim it to the drainer replay handler, stale sweep, telemetry gauge, and run-engine cancelled/failed APIs; later read/mutation/dashboard work is documented in its own PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…tage in mollifier drainer

The mollifier drainer's cancel bifurcation called engine.createCancelledRun without handling its documented conflict contract: when the normal trigger replay path races ahead and materialises a live (non-CANCELED) row, the engine throws a conflict so the caller can "decide between engine.cancelRun() and skipping". The handler did neither: the conflict propagated, isRetryablePgError returned false, and the drainer buffer.fail()'d the entry, silently losing the cancellation while the run kept executing. Now route conflicts to engine.cancelRun() so the cancel actually wins.

Separately, when engine.trigger fails non-retryably and the SYSTEM_FAILURE fallback write then fails because PG is transiently unreachable, rethrowing the original non-retryable error made the drainer buffer.fail() the entry, losing the run with no PG row ever landing and dropping the write error entirely. Rethrow the retryable write error instead so the drainer requeues; the failure row lands once PG recovers. Non-retryable write failures still rethrow the original error as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
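
Both fixes in this commit are routing decisions, which the sketch below tries to capture. It is illustrative only: the `CancelEngine` interface, the `RUN_CONFLICT` code used to recognise the conflict, and the function names are hypothetical; the real engine's conflict error shape is not shown in the commit message.

```typescript
// Hypothetical minimal view of the engine; the real method signatures take more arguments.
interface CancelEngine {
  createCancelledRun(runId: string): Promise<void>;
  cancelRun(runId: string): Promise<void>;
}

// Assumption: the conflict is recognisable by a code property on the thrown error.
function isConflict(error: unknown): boolean {
  return typeof error === "object" && error !== null && (error as { code?: unknown }).code === "RUN_CONFLICT";
}

// Fix 1: if a live row already exists, cancel it instead of letting the conflict fail the entry.
async function replayBufferedCancel(
  engine: CancelEngine,
  runId: string
): Promise<"created_cancelled_row" | "cancelled_live_run"> {
  try {
    await engine.createCancelledRun(runId);
    return "created_cancelled_row";
  } catch (error) {
    if (isConflict(error)) {
      await engine.cancelRun(runId);
      return "cancelled_live_run";
    }
    throw error;
  }
}

// Fix 2: when the fallback failure write itself fails, surface the write error only if it is
// retryable (so the drainer requeues); otherwise keep surfacing the original trigger error.
function selectErrorToRethrow(
  originalError: unknown,
  fallbackWriteError: unknown,
  isRetryable: (error: unknown) => boolean
): unknown {
  return isRetryable(fallbackWriteError) ? fallbackWriteError : originalError;
}
```

Keeping the second decision as a pure function makes the requeue-versus-fail behaviour easy to pin in a unit test without a database.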

Remove plan-tracking shorthand (Q# bifurcation, Phase C1/Q4) from replay-layer mollifier comments and test names; reword to plain English. Comment/test-name only; no behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

d-cs commented May 26, 2026

Comment thread apps/webapp/app/v3/mollifier/mollifierDrainerHandler.server.ts

d-cs commented May 26, 2026

Comment thread apps/webapp/app/v3/mollifier/mollifierTelemetry.server.ts

d-cs commented May 26, 2026

Comment thread apps/webapp/test/mollifierStaleSweep.test.ts Outdated

d-cs commented May 26, 2026

Comment thread internal-packages/run-engine/src/engine/index.ts

d-cs self-assigned this May 26, 2026

d-cs force-pushed the mollifier-phase-3-trigger branch from 626a8dc to af7368e Compare May 26, 2026 11:12

d-cs force-pushed the mollifier-phase-3-replay branch from 31f4726 to b05929b Compare May 26, 2026 11:12

d-cs force-pushed the mollifier-phase-3-trigger branch from 5a7bc19 to baa6f17 Compare May 26, 2026 13:24

d-cs force-pushed the mollifier-phase-3-replay branch from b05929b to b89da52 Compare May 26, 2026 13:24

devin-ai-integration Bot reviewed May 26, 2026

Comment thread apps/webapp/app/entry.server.tsx Outdated

Comment thread internal-packages/run-engine/src/engine/index.ts

d-cs force-pushed the mollifier-phase-3-replay branch from 74fdf6d to c6fa61f Compare May 26, 2026 16:20

d-cs force-pushed the mollifier-phase-3-trigger branch from 01f3958 to 449a0bc Compare May 27, 2026 12:04

d-cs force-pushed the mollifier-phase-3-replay branch from 242ba73 to 6a8404d Compare May 27, 2026 12:04

d-cs force-pushed the mollifier-phase-3-trigger branch from 449a0bc to ffe51b8 Compare May 27, 2026 12:15

d-cs force-pushed the mollifier-phase-3-replay branch from 6a8404d to bc9f4e2 Compare May 27, 2026 12:15

d-cs force-pushed the mollifier-phase-3-trigger branch from ffe51b8 to 7ddb17d Compare May 27, 2026 12:21

d-cs force-pushed the mollifier-phase-3-replay branch 2 times, most recently from 637e8c0 to 65219db Compare May 27, 2026 12:58

d-cs force-pushed the mollifier-phase-3-trigger branch 2 times, most recently from 4229f9a to 4f31074 Compare May 27, 2026 14:06

d-cs force-pushed the mollifier-phase-3-replay branch from 65219db to ccdcd9c Compare May 27, 2026 14:06

d-cs force-pushed the mollifier-phase-3-trigger branch from 4f31074 to e56b937 Compare May 27, 2026 15:07

d-cs force-pushed the mollifier-phase-3-replay branch from ccdcd9c to 5f50940 Compare May 27, 2026 15:07

d-cs force-pushed the mollifier-phase-3-trigger branch from e56b937 to cae33fa Compare May 27, 2026 15:33

d-cs force-pushed the mollifier-phase-3-replay branch 3 times, most recently from df65a3b to 1e5b555 Compare May 27, 2026 16:50

d-cs force-pushed the mollifier-phase-3-trigger branch from cae33fa to 16bfff0 Compare May 27, 2026 16:50

d-cs and others added 8 commits May 27, 2026 17:57

d-cs force-pushed the mollifier-phase-3-replay branch from 1e5b555 to 014313e Compare May 27, 2026 16:58

d-cs wants to merge 8 commits into mollifier-phase-3-trigger from mollifier-phase-3-replay
