Reduce duplicate frontmatter scanning in ParseWorkflow#34819
Conversation
Co-authored-by: gh-aw-bot <[email protected]>
There was a problem hiding this comment.
Pull request overview
This PR optimizes workflow frontmatter parsing by consolidating frontmatter line extraction and top-level field line attribution into a single metadata pass, reducing redundant scanning work during ParseWorkflow/ParseWorkflowFile parsing.
Changes:
- Replaced separate frontmatter line splitting and top-level key scanning with a combined
extractFrontmatterMetadatapass. - Updated
ExtractFrontmatterFromContentto use the combined metadata output directly. - Added a regression test to ensure blank-line preservation and correct absolute line-number mapping for top-level fields.
Show a summary per file
| File | Description |
|---|---|
| pkg/parser/frontmatter_content.go | Consolidates frontmatter line capture + top-level key line attribution into extractFrontmatterMetadata and wires it into ExtractFrontmatterFromContent. |
| pkg/parser/frontmatter_field_lines_test.go | Adds coverage for preserving blank lines and mapping top-level field lines correctly in mixed frontmatter content. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
pkg/parser/frontmatter_content.go:150
strings.Count(frontmatterYAML, "\n")adds an extra full pass over the frontmatter text. Since this code is on the hot path, consider avoiding the pre-count (or using a small fixed initial capacity) so metadata extraction can stay truly single-pass overfrontmatterYAML.
lines := make([]string, 0, strings.Count(frontmatterYAML, "\n")+1)
fieldLines := make(map[string]int)
- Files reviewed: 2/2 changed files
- Comments generated: 2
| } | ||
| } | ||
| } | ||
| _, fieldLines := extractFrontmatterMetadata(yamlContent, frontmatterStart) |
| func extractFrontmatterMetadata(frontmatterYAML string, frontmatterStart int) ([]string, map[string]int) { | ||
| if frontmatterYAML == "" { | ||
| return []string{}, map[string]int{} | ||
| } |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #34819 does not have the 'implementation' label and has only 52 new lines in business logic directories (below the 100-line threshold). |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
🧪 Test Quality Sentinel completed test quality analysis. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
🧪 Test Quality Sentinel Report✅ Test Quality Score: 95/100 — Excellent
📊 Metrics & Test Classification (1 test analyzed)
Test Classification Details
Language SupportTests analyzed:
|
There was a problem hiding this comment.
Review: Reduce duplicate frontmatter scanning
The consolidation is correct — single-pass extractFrontmatterMetadata produces identical results to the two separate passes it replaces, and the new test confirms blank-line/comment/field-line parity. No correctness or concurrency issues found in the changed lines.
Findings summary
Both unresolved existing comments identify the two real issues with this PR:
-
extractTopLevelFieldLinesnow allocates a full[]stringit immediately discards — the wrapper exists only for the test infrontmatter_field_lines_test.go, so this wasted allocation is on the test path only. Still, a PR whose stated goal is reducing parsing overhead should not leave a caller that is strictly more expensive than before. Consider inlining or eliminating the wrapper. -
splitFrontmatterLinesis now dead code — nothing in production or tests calls it after this change. It should be deleted to avoid drift.
All other candidate issues (relLine arithmetic, trailing-newline trim vs. fieldLines correlation, scope of frontmatterStartLine) are either pre-existing or not actionable in the changed lines.
🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.8M
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /zoom-out and /tdd — no blocking issues, but two cleanup opportunities and a test coverage gap worth addressing.
📋 Key Themes & Highlights
Key Themes
- Dead code left behind:
splitFrontmatterLines(lines 131–142) is no longer called anywhere — it should be deleted. - Thin wrapper:
extractTopLevelFieldLinesis now a one-liner that only exists to satisfy its own test; consider collapsing it. - Test coverage: The added regression test is well-targeted, but the trailing-no-newline branch and the duplicate-key dedup path in
extractFrontmatterMetadataare uncovered.
Positive Highlights
- ✅ Clean single-pass consolidation —
extractFrontmatterMetadatais easy to reason about and neatly replaces two separate scans. - ✅
strings.IndexBytemicro-optimisation is a sensible improvement overstrings.Indexfor a single-byte delimiter. - ✅ Capacity hint (
strings.Count(..., "\n")+1) avoids slice reallocations on the hot path — good attention to detail. - ✅
frontmatterStartLineconst promotion is clean and the value is now easier to find.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 2.6M
Comments that could not be inline-anchored
pkg/parser/frontmatter_content.go:131
[/zoom-out] splitFrontmatterLines is now unreachable dead code — ExtractFrontmatterFromContent was its only call site, and that now delegates to extractFrontmatterMetadata. The function can be deleted.
<details>
<summary>💡 Suggested cleanup</summary>
Remove lines 131–142 (splitFrontmatterLines). Keeping it silently diverges from the new single-pass contract and will confuse future readers about which path is canonical.
</details>
pkg/parser/frontmatter_content.go:34
[/zoom-out] extractTopLevelFieldLines is now a one-liner wrapper around extractFrontmatterMetadata that discards the []string return value. Its only remaining caller is the direct unit test (frontmatter_field_lines_test.go:76). Consider either removing the wrapper and updating the test to call extractFrontmatterMetadata directly, or documenting why the indirection is intentional.
<details>
<summary>💡 Options</summary>
Option A – remove the wrapper (preferred): update `front…
pkg/parser/frontmatter_field_lines_test.go:111
[/tdd] The regression test covers the happy path well, but two boundary conditions for the new extractFrontmatterMetadata logic are unexercised: (1) input with no trailing newline and (2) a duplicate top-level key (second occurrence should be ignored).
<details>
<summary>💡 Suggested additions</summary>
t.Run("no trailing newline preserves all lines", func(t *testing.T) {
// frontmatterYAML without a trailing \n — the trim branch at line 175
// must NOT remove the …
</details>
The failing job is I checked the scope of the last two commits: they only touch I also reproduced locally:
So this looks like an unrelated/flaky |
ParseWorkflowshowed a benchmark regression in the workflow parsing phase. This change trims repeated work in frontmatter extraction so the parser does less per-file scanning on the hot path.What changed
FrontmatterLinesand top-levelFieldLinestogether instead of scanning the same YAML twice.Why this matters
ParseWorkflowFilealways goes through frontmatter extraction.Behavior preserved
FrontmatterLines.FieldLines.Example
Targeted coverage