feat(transform-dedup): Add file-based persistence for dedup state #136

@warren-t-c

Problem

@orgloop/transform-dedup (v0.1.10) uses an in-memory Map for dedup state. On service restart, the seen map is cleared, so previously deduplicated events pass through again and trigger duplicate route firings.

Current behavior

From the README:

"State is in-memory only and lost on restart."

Config schema: store: string — Storage backend. Only "memory" currently.

Any service restart (config change, update, crash recovery) loses all dedup state, causing events within the dedup window to be re-delivered.
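
The failure mode can be illustrated with a minimal sketch. The `MemoryDedup` class below is hypothetical and deliberately simplified; it is not the package's actual code, only an assumption about how a Map-based dedup keyed on an event hash would behave across a restart.

```typescript
// Hypothetical, simplified stand-in for an in-memory dedup transform.
// Not the actual @orgloop/transform-dedup implementation.
class MemoryDedup {
  // hash -> timestamp (ms) at which the event last passed
  private seen = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true if the event should pass, false if it is a duplicate.
  check(hash: string, now: number = Date.now()): boolean {
    const ts = this.seen.get(hash);
    if (ts !== undefined && now - ts < this.windowMs) return false;
    this.seen.set(hash, now);
    return true;
  }
}

const windowMs = 15 * 60_000; // 15m window, as in the suggested config

const beforeRestart = new MemoryDedup(windowMs);
beforeRestart.check("evt-1", 0);      // passes: first sighting
beforeRestart.check("evt-1", 60_000); // dropped: seen 1 minute ago

// A restart builds a fresh instance, so the Map starts empty.
const afterRestart = new MemoryDedup(windowMs);
afterRestart.check("evt-1", 120_000); // passes again, although still inside the window
```

The last call is the duplicate route firing described above: the event is only 2 minutes old, but nothing survives the restart to say so.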

Proposal

Add a store: "file" option, using the same FileCheckpointStore infrastructure from #107.

Suggested config

    transforms:
      - name: dedup
        type: package
        package: "@orgloop/transform-dedup"
        config:
          window: "15m"
          store: "file"
          fields:
            - "type"
            - "source"
            - "provenance.platform_event"
            - "provenance.issue_number"

Requirements

  1. Persist hash + timestamp to file on each pass/drop decision
  2. On startup, load persisted hashes, filter out entries older than window
  3. Atomic file writes (write to temp, rename) — same as connector checkpoints
  4. Files in .orgloop/checkpoints/dedup-.json
  5. store: "memory" remains default for backward compat
  6. Periodic expired-entry cleanup (existing timer pattern)
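
The requirements above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: `FileDedupStore`, its constructor parameters, and the JSON shape (hash mapped to timestamp) are all hypothetical, and the sketch does not use `FileCheckpointStore` from #107, whose API is not described in this issue. It also assumes a dropped duplicate does not refresh the stored timestamp.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical on-disk shape: hash -> timestamp (ms since epoch).
type SeenEntries = Record<string, number>;

class FileDedupStore {
  private seen = new Map<string, number>();
  private readonly filePath: string;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(filePath: string, windowMs: number, now: () => number = Date.now) {
    this.filePath = filePath;
    this.windowMs = windowMs;
    this.now = now;
    this.load();
  }

  // Requirement 2: load persisted hashes, skipping entries older than the window.
  private load(): void {
    if (!fs.existsSync(this.filePath)) return;
    const raw = JSON.parse(fs.readFileSync(this.filePath, "utf8")) as SeenEntries;
    const cutoff = this.now() - this.windowMs;
    for (const [hash, ts] of Object.entries(raw)) {
      if (ts > cutoff) this.seen.set(hash, ts);
    }
  }

  // Requirement 3: write to a temp file, then rename it over the target.
  private persist(): void {
    fs.mkdirSync(path.dirname(this.filePath), { recursive: true });
    const tmp = `${this.filePath}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(Object.fromEntries(this.seen)));
    fs.renameSync(tmp, this.filePath);
  }

  // Requirement 6: drop expired entries; intended to be called from a timer.
  cleanup(): void {
    const cutoff = this.now() - this.windowMs;
    for (const [hash, ts] of this.seen) {
      if (ts <= cutoff) this.seen.delete(hash);
    }
    this.persist();
  }

  // Requirement 1: persist after each pass/drop decision.
  // Returns true if the event should pass, false if it is a duplicate.
  check(hash: string): boolean {
    const t = this.now();
    const ts = this.seen.get(hash);
    const isDuplicate = ts !== undefined && t - ts < this.windowMs;
    if (!isDuplicate) this.seen.set(hash, t);
    this.persist();
    return !isDuplicate;
  }
}
```

Rewriting the whole file on every decision keeps the sketch short, but it scales with the number of live hashes; whether that cost is acceptable, or whether writes should be batched, is a design question this sketch leaves open. The rename step relies on the temp file living in the same directory as the target, so both are on the same filesystem.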

Context

Related: #107 — extends connector checkpoint persistence to the dedup transform.
