Add sub_agent_strategy A/B experiment to smoke-gemini workflow #33540
Merged
pelikhan merged 2 commits on May 20, 2026
Co-authored-by: pelikhan <[email protected]>
Copilot AI changed the title from "[WIP] Add A/B test for sub agent strategy in smoke-gemini workflow" to "Add sub_agent_strategy A/B experiment to smoke-gemini workflow" on May 20, 2026
Pull request overview
Adds an explicit A/B experiment (sub_agent_strategy) to the smoke-gemini agentic workflow to compare the baseline single-agent smoke flow against a sub-agent decomposition strategy, with metrics/guardrails wired through the compiled lock workflow.
Changes:
- Added `experiments.sub_agent_strategy` frontmatter (variants/metrics/guardrails/weights/analysis metadata).
- Introduced variant-conditional prompt branches for `single_agent` vs `sub_agents`.
- Regenerated `smoke-gemini.lock.yml` to include experiment selection, state restore/upload/push, and updated runtime wiring.
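The "experiment selection" step driven by variant weights can be pictured with a minimal Python sketch. This is not the compiled workflow's actual logic: the weight values, the `pick_variant` helper, and the use of a seeded random generator are all assumptions made for illustration. Only the two variant names come from the PR.

```python
import random

# Variant names come from the PR; the weights are hypothetical,
# since the actual values are not shown on this page.
VARIANT_WEIGHTS = {"single_agent": 50, "sub_agents": 50}


def pick_variant(weights, rng):
    """Pick one variant name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Fixed seed so the sketch is reproducible; a real run would not do this.
rng = random.Random(0)
counts = {name: 0 for name in VARIANT_WEIGHTS}
for _ in range(1000):
    counts[pick_variant(VARIANT_WEIGHTS, rng)] += 1
print(counts)
```

With equal weights, the two counts should land near an even split over many runs. Persisting which variant each run received is what the state restore/upload/push steps would presumably be for.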
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/smoke-gemini.md | Defines the experiment and branches the smoke instructions by variant. |
| .github/workflows/smoke-gemini.lock.yml | Compiled workflow updates to pick variants, persist experiment state, and render the variant-specific prompt. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 3
1. **Agent: github-mcp-test** — Use GitHub MCP tools to fetch details of exactly 2 merged pull requests from ${{ github.repository }} (title and number only). Return ✅ if successful.
2. **Agent: web-fetch-test** — Use the web-fetch MCP tool to fetch https://github.com and verify the response contains "GitHub". Return ✅ if successful.
3. **Agent: file-write-test** — Create a test file `/tmp/gh-aw/agent/smoke-test-gemini-${{ github.run_id }}.txt` with content "Smoke test passed for Gemini at $(date)". Return ✅ if successful.
4. **Agent: bash-test** — Execute bash commands to verify file creation was successful (use `cat` to read the file back). Return ✅ if successful.
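The fan-out and aggregate pattern these sub-agent instructions describe can be sketched in Python. This is an illustration only: the workflow delegates to agent tools rather than threads, and `run_check`, `run_all`, and `summarize` are hypothetical helpers. The check names are the four visible in the excerpt above.

```python
from concurrent.futures import ThreadPoolExecutor

# Names taken from the prompt excerpt above; the check body is a stand-in.
CHECKS = ["github-mcp-test", "web-fetch-test", "file-write-test", "bash-test"]


def run_check(name):
    """Stand-in for one sub-agent; a real one would call tools and may fail."""
    return name, True


def run_all(checks, runner=run_check):
    """Run every check concurrently and collect {name: passed}."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        # pool.map yields results in input order, so the dict keeps that order.
        return dict(pool.map(runner, checks))


def summarize(results):
    """Render one status line per check, mirroring the prompt's ✅ convention."""
    return "\n".join(f"{'✅' if ok else '❌'} {name}" for name, ok in results.items())


print(summarize(run_all(CHECKS)))
```

If one check depends on another's output, as `bash-test` reading a file back might, it would need to run after that check rather than alongside it.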
@@ -1473,25 +1500,16 @@ jobs:
(github.event_name != 'pull_request' || github.event.pull_request.head.repo.id == github.repository_id) &&
(github.event_name != 'pull_request' || github.event.action != 'labeled' || github.event.label.name == 'smoke')
Comment on lines 548 to 552
node-version: '24'
package-manager-cache: false
- name: Install AWF binary
run: bash "${RUNNER_TEMP}/gh-aw/actions/install_awf_binary.sh" v0.25.49
run: bash "${RUNNER_TEMP}/gh-aw/actions/install_awf_binary.sh" v0.25.46
- name: Install Gemini CLI
This updates `smoke-gemini` to run as an explicit A/B experiment comparing the current single-agent smoke flow vs a sub-agent decomposition strategy. The goal is to measure token-efficiency impact while preserving reliability guardrails.

Experiment frontmatter
- Added `experiments.sub_agent_strategy` in rich-object form with:
  - variants: `single_agent`, `sub_agents`
  - metrics: `effective_tokens`; `run_duration_seconds`, `success_rate`
  - guardrail: `success_rate >= 0.95`
  - metadata (`min_samples`, `weight`, `start_date`, `analysis_type`, `tags`)

Prompt branching by variant
- `single_agent`: keep baseline sequential execution in the main agent
- `sub_agents`: instruct launching 5 parallel background `task` sub-agents (one per smoke check), then aggregate via `read_agent`

Compiled workflow artifact
- Regenerated `smoke-gemini.lock.yml` from updated source to include experiment selection/runtime wiring and branch-specific prompt content.
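To show how the `success_rate >= 0.95` guardrail and the `effective_tokens` metric might be read together, here is a small Python sketch. It is not the analysis the experiment framework performs: the `evaluate` helper, the run record shape, and the `MIN_SAMPLES` value are assumptions. Only the metric names and the 0.95 threshold come from the description above.

```python
from statistics import mean

GUARDRAIL_MIN_SUCCESS_RATE = 0.95  # threshold stated in the PR description
MIN_SAMPLES = 20  # hypothetical; the actual min_samples value is not shown


def evaluate(runs, min_samples=MIN_SAMPLES):
    """Summarize one variant; each run is a (succeeded, effective_tokens) pair."""
    if len(runs) < min_samples:
        return {"status": "insufficient_samples", "samples": len(runs)}
    success_rate = sum(1 for ok, _ in runs if ok) / len(runs)
    passed = success_rate >= GUARDRAIL_MIN_SUCCESS_RATE
    return {
        "status": "ok" if passed else "guardrail_violated",
        "samples": len(runs),
        "success_rate": success_rate,
        "mean_effective_tokens": mean(tokens for _, tokens in runs),
    }


# Made-up numbers for illustration, not measurements from the experiment.
baseline = evaluate([(True, 1200)] * 20)
candidate = evaluate([(True, 900)] * 18 + [(False, 900)] * 2)
print(baseline["status"], candidate["status"])
```

Under this reading, a variant that uses fewer tokens but falls below the success-rate threshold would not count as a win, which is the point of pairing the efficiency metric with a reliability guardrail.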