Agent Playback is a Codex plugin that turns a long-running coding-agent trace into a personalized local audio briefing. It is for the moment when an agent has been working for 20 minutes, the chat is huge, and you want to press play instead of excavating the conversation.
Think NotebookLM for agent traces, shaped like an audio report: one voice anchors the update and the other explains the goal, what changed, what was tested, what failed, and what still needs attention. The plugin uses the OpenAI Speech API with gpt-4o-mini-tts by default and writes a playable MP3 locally.
Use this plugin when the user asks for any of these:
- agent playback
- audio summary of an agent trace
- spoken recap of a coding session
- NotebookLM-style agent report
- explain what the agent did
- summarize a long Codex conversation
- turn this trace into an MP3
- conversation audio output
- long-running agent debrief
- Realtime TTS briefing
Skill to invoke: agent-trace-audio
Privacy-first command: python3 scripts/trace_audio.py speak
For a current Codex run, the safest workflow is for the Codex agent to read its own working context, write an inspectable trace_audio_script.json, and then call speak. That sends only the narrated audio script to the audio API, not the raw Codex session trace.
Use python3 scripts/trace_audio.py run --trace-session latest only after explicit approval to export trace-derived content. That path reads the saved JSONL transcript from ~/.codex/sessions, compacts it, and sends the compacted trace to the script model before synthesizing audio.
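For the speak workflow, the agent-authored script can be checked locally before any audio is requested. The sketch below is a minimal validator, assuming the fields shown in this README's sample script (briefing, speaker, text). The function names validate_script and write_script, and the rule that speakers are limited to host and analyst, are illustrative assumptions and not the plugin's actual checks.

```python
import json
from pathlib import Path

# Assumption: the two speaker labels used in this README's sample script.
ALLOWED_SPEAKERS = {"host", "analyst"}


def validate_script(script: dict) -> list[str]:
    """Return a list of problems; an empty list means the script looks usable."""
    briefing = script.get("briefing")
    if not isinstance(briefing, list) or not briefing:
        return ["briefing must be a non-empty list of segments"]
    problems = []
    for index, segment in enumerate(briefing):
        if not isinstance(segment, dict):
            problems.append(f"segment {index}: not an object")
            continue
        if segment.get("speaker") not in ALLOWED_SPEAKERS:
            problems.append(f"segment {index}: unexpected speaker {segment.get('speaker')!r}")
        text = segment.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"segment {index}: text is empty")
    return problems


def write_script(script: dict, out_dir: str) -> Path:
    """Validate the script, then write it where the speak command can read it."""
    problems = validate_script(script)
    if problems:
        raise ValueError("; ".join(problems))
    path = Path(out_dir) / "trace_audio_script.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(script, indent=2), encoding="utf-8")
    return path
```

Because the file is plain JSON on disk, it can be opened and reviewed before the speak command sends its narration text anywhere.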
Default output files:
- trace_audio.mp3: final local audio briefing
- trace_audio_script.json: generated script used for the audio
Install from GitHub with Codex Marketplace tooling:
npx codex-marketplace add detextit/agent-playback --plugin
Or clone the repo and run it locally:
git clone https://github.com/detextit/agent-playback.git
cd agent-playback
Prerequisites:
- OPENAI_API_KEY
- Python 3.10+
- ffmpeg is recommended for cleanly combining multi-segment MP3s
- Node.js is needed only for the optional Realtime engine
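The prerequisites above can be checked before a run. This is a minimal preflight sketch, not part of the plugin: check_prerequisites is a hypothetical helper, and treating a missing ffmpeg as a warning instead of an error is an assumption based on the word "recommended".

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=None, realtime=False):
    """Return (errors, warnings) for the prerequisites listed in this README."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    errors, warnings = [], []
    if not env.get("OPENAI_API_KEY"):
        errors.append("OPENAI_API_KEY is not set")
    if tuple(version) < (3, 10):
        errors.append("Python 3.10+ is required")
    if which("ffmpeg") is None:
        # Assumption: ffmpeg is only recommended, so its absence is a warning.
        warnings.append("ffmpeg not found on PATH")
    if realtime and which("node") is None:
        errors.append("Node.js is required for --audio-engine realtime")
    return errors, warnings


if __name__ == "__main__":
    found_errors, found_warnings = check_prerequisites()
    for message in found_errors + found_warnings:
        print(message)
```

The env, which, and version parameters exist only so the check can be exercised without touching the real environment.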
Install the WebSocket helper dependency only if you plan to use --audio-engine realtime:
npm install
Generate a local briefing from an agent-authored script:
{
"title": "Agent Playback Report",
"listener": "you",
"project": "agent-playback",
"duration_target_seconds": 45,
"briefing": [
{"speaker": "host", "text": "Here is the report on the agent playback work."},
{"speaker": "analyst", "text": "The agent updated the plugin flow, verified the relevant commands, and left the remaining risks clear."}
],
"coverage_notes": ["goal", "changes", "verification", "remaining risk"]
}
Save that JSON to out/sample-audio/trace_audio_script.json, then run:
python3 scripts/trace_audio.py speak \
--script out/sample-audio/trace_audio_script.json \
--out-dir out/sample-audio \
--listener "you" \
--project "agent-playback" \
--duration-seconds 45 \
--segment-concurrency 4
Generate a local briefing from an approved trace export:
python3 scripts/trace_audio.py run \
--trace-session latest \
--out-dir out/sample-audio \
--listener "you" \
--project "agent-playback" \
--duration-seconds 45 \
--trace-mode compact \
--trace-budget-chars 8000 \
--max-segments 4 \
--segment-concurrency 4
To generate from a specific saved trace, pass either a markdown file with --trace path/to/trace.md or a Codex session JSONL with --trace path/to/session.jsonl. Codex session JSONL files are converted to a readable transcript automatically.
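The two --trace input types can be handled by dispatching on the file extension. The sketch below illustrates the idea only. The real Codex session JSONL schema is not described in this README, so the record shape used here, one JSON object per line with role and text keys, is hypothetical, as is the load_trace name.

```python
import json
from pathlib import Path


def load_trace(path: str) -> str:
    """Return a readable transcript from a markdown trace or a JSONL session file."""
    source = Path(path)
    raw = source.read_text(encoding="utf-8")
    if source.suffix != ".jsonl":
        return raw  # Markdown traces are already readable.
    lines = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # Skip blank lines between records.
        record = json.loads(line)
        # Hypothetical record shape: {"role": ..., "text": ...}.
        role = record.get("role", "unknown")
        text = str(record.get("text", "")).strip()
        if text:
            lines.append(f"{role}: {text}")
    return "\n".join(lines)
```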
Play the generated file:
open out/sample-audio/trace_audio.mp3
Use --keep-intermediates only when you want per-segment audio files for debugging.
- Personalized: pass --listener, --listener-pronunciation, and --project.
- Privacy-first audio: speak works from a reviewed script, so the audio API receives only the narration text.
- Full local traces: with explicit approval, --trace-session latest reads saved Codex sessions from disk, so the briefing can cover the complete transcript rather than the agent's compacted summary.
- Fast on long traces: compact mode preserves the opening goal, important events, and final state instead of sending a giant raw transcript.
- Local artifacts: the user gets an MP3 they can replay, share, or archive.
- Faithful by design: prompts emphasize changed files, commands, tests, blockers, risks, and next steps.
- Report-style by default: the host frames the update, the engineer gives the substantive detail, and the script avoids forced questions, name-drop openings, and podcast banter.
For local file output, Agent Playback defaults to the OpenAI Speech API with gpt-4o-mini-tts. This keeps the audio step narrow: after a script is drafted or written locally, the API receives only the spoken segment text.
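To make the narrow audio step concrete, the sketch below builds the request body for a single segment against the Speech API endpoint. It is not the plugin's code: the speaker-to-voice mapping and the helper names are assumptions, and the voice choices are illustrative. The point is that the body carries only the model, a voice, and the segment's narration text.

```python
import json
import os
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"
# Assumption: one voice per speaker; these voice names are illustrative choices.
VOICES = {"host": "alloy", "analyst": "onyx"}


def build_speech_request(segment: dict, model: str = "gpt-4o-mini-tts") -> dict:
    """Build the JSON body for one segment; only the narration text is included."""
    return {
        "model": model,
        "voice": VOICES.get(segment["speaker"], "alloy"),
        "input": segment["text"],
        "response_format": "mp3",
    }


def synthesize_segment(segment: dict, out_path: str) -> None:
    """POST one segment to the Speech API and save the returned MP3 bytes."""
    request = urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(build_speech_request(segment)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as handle:
        handle.write(response.read())
```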
Realtime remains available with --audio-engine realtime for live-session behavior. That path uses the Realtime API with gpt-realtime-2, connects over WebSocket, buffers response.output_audio.delta chunks, writes 24 kHz PCM, and converts the audio to MP3 locally with ffmpeg.
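The buffering and conversion steps on the Realtime path can be sketched in a few lines. The plugin's actual helper is the Node script scripts/realtime_tts.mjs, so this Python version is only an illustration. It assumes each response.output_audio.delta event carries base64-encoded audio in a delta field and that the PCM is 16-bit little-endian mono; the function names are hypothetical.

```python
import base64


def collect_pcm(events: list[dict]) -> bytes:
    """Concatenate base64 audio from response.output_audio.delta events, in arrival order."""
    chunks = []
    for event in events:
        if event.get("type") == "response.output_audio.delta":
            chunks.append(base64.b64decode(event["delta"]))
    return b"".join(chunks)


def ffmpeg_command(pcm_path: str, mp3_path: str, sample_rate: int = 24000) -> list[str]:
    """Build an ffmpeg invocation that reads raw PCM and writes an MP3."""
    return [
        "ffmpeg", "-y",                      # overwrite the output if it exists
        "-f", "s16le",                       # assumption: 16-bit little-endian samples
        "-ar", str(sample_rate), "-ac", "1", # 24 kHz, mono
        "-i", pcm_path,
        mp3_path,
    ]
```

The command list can be passed to subprocess.run once the buffered bytes have been written to pcm_path.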
Long traces are slow for two reasons: the script model has to read the trace context, and audio generation has to synthesize each spoken segment. The CLI optimizes both:
- --trace-mode compact preserves the beginning, salient events, and ending instead of sending a raw tail slice.
- --trace-budget-chars 8000 is a good fast default for very long traces; increase it when fidelity needs more context.
- --max-segments 4 caps synthesized speech segments.
- --segment-concurrency 4 synthesizes segments in parallel.
- --timings-out out/timing.json records draft, audio, and total timings.
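The compact idea (keep the beginning, the salient middle, and the ending within a character budget) can be sketched as follows. This is not the plugin's algorithm: the keyword heuristic for salient lines, the quarter-budget split for head and tail, and the compact_trace name are all assumptions made for illustration.

```python
import re

# Assumption: a simple keyword heuristic stands in for "salient events".
SALIENT = re.compile(r"error|fail|test|passed|commit|changed", re.IGNORECASE)


def compact_trace(trace: str, budget_chars: int = 8000) -> str:
    """Keep the opening, salient lines, and the ending within a character budget."""
    if budget_chars < 16:
        raise ValueError("budget_chars is too small to keep a head and a tail")
    if len(trace) <= budget_chars:
        return trace
    edge = budget_chars // 4                      # share reserved for the head and for the tail
    head, tail = trace[:edge], trace[-edge:]
    middle_budget = budget_chars - 2 * edge - 1   # leave room for the joining newlines
    kept, used = [], 0
    for line in trace[edge:-edge].splitlines():
        if SALIENT.search(line) and used + len(line) + 1 <= middle_budget:
            kept.append(line)
            used += len(line) + 1
    return "\n".join([head, *kept, tail])
```

Unlike a raw tail slice, this keeps the opening goal in view even when the trace is many times larger than the budget.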
Measured on a synthetic 120k-character trace:
- compact 8k, 4 parallel segments: 12.594s total
- tail 45k, 4 parallel segments: 15.186s total
- same 4-segment script audio only: 14.758s sequential vs 4.522s parallel
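The ratios implied by these measurements can be worked out directly. The snippet below only restates the numbers listed above; the variable names are illustrative.

```python
sequential_s, parallel_s = 14.758, 4.522   # audio-only timings from the list above
compact_s, tail_s = 12.594, 15.186         # end-to-end timings from the list above

audio_speedup = sequential_s / parallel_s
compact_saving = (tail_s - compact_s) / tail_s

print(f"parallel audio speedup: {audio_speedup:.2f}x")   # prints 3.26x
print(f"compact vs tail saving: {compact_saving:.1%}")   # prints 17.1%
```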
More benchmark detail is in benchmarks/long_trace_timing.md.
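Parallel segment synthesis of the kind --segment-concurrency enables can be sketched with a thread pool. This is an assumption about the general technique, not the plugin's implementation: synthesize_all is a hypothetical name, and the synthesize callable stands in for whatever produces audio bytes for one segment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def synthesize_all(
    segments: list[dict],
    synthesize: Callable[[dict], bytes],
    concurrency: int = 4,
) -> list[bytes]:
    """Synthesize segments in parallel and return the audio in script order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map yields results in input order, whichever call finishes first.
        return list(pool.map(synthesize, segments))
```

Threads suit this workload because each call spends most of its time waiting on the network, and preserving input order means the segments can be concatenated without re-sorting.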
Draft only:
python3 scripts/trace_audio.py draft \
--trace-session latest \
--out-dir out/trace-audio
This sends trace-derived content to the script model and should be used only when that export is approved.
Synthesize an existing script:
python3 scripts/trace_audio.py speak \
--script out/trace-audio/trace_audio_script.json \
--out-dir out/trace-audio
Optional Realtime synthesis:
python3 scripts/trace_audio.py run \
--trace-session latest \
--out-dir out/trace-audio \
--audio-engine realtime
The user-facing plugin workflow does not run evals. During development, use the eval loop to transcribe generated audio and judge it against the source trace:
python3 scripts/trace_audio.py dev-run \
--trace evals/fixtures/sample_trace.md \
--out-dir out/sample-eval \
--listener "you" \
--project "agent-playback" \
--duration-seconds 45 \
--min-iterations 2 \
--iterations 2
Development evals use:
- Speech-to-text: gpt-4o-transcribe
- LLM judge: gpt-4.1
- A strict rubric for faithfulness, coverage, report structure, and boundary awareness.
dev-run defaults to at least two iterations when --iterations allows it, even if the first judge result passes, so development runs can catch unstable prompt behavior instead of stopping after one easy pass.
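The interaction between --min-iterations and --iterations can be sketched as a small loop. This is one reading of the rule described above, not the plugin's code: run_eval_loop is a hypothetical name, the judge callable stands in for a full transcribe-and-judge pass, and stopping at the first pass once the minimum is met is an assumption.

```python
from typing import Callable


def run_eval_loop(
    judge: Callable[[int], bool],
    iterations: int = 2,
    min_iterations: int = 2,
) -> list[bool]:
    """Run judge passes until one passes after the required minimum, or the cap is reached."""
    required = min(min_iterations, iterations)   # --iterations caps the minimum
    results = []
    for attempt in range(1, iterations + 1):
        passed = judge(attempt)
        results.append(passed)
        if passed and attempt >= required:
            break
    return results
```

With the defaults, a first-pass success still triggers a second run, which is what surfaces prompts that pass only some of the time.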
All models can be overridden with environment variables or CLI flags.
- .codex-plugin/plugin.json: Codex plugin manifest
- skills/agent-trace-audio/SKILL.md: agent-facing skill instructions
- scripts/trace_audio.py: CLI for drafting, speaking, and development evals
- scripts/realtime_tts.mjs: Realtime WebSocket audio helper
- evals/: development-only fixtures and rubric
- benchmarks/: long-trace timing notes
The plugin intentionally repeats the terms users and agents actually search for: agent trace, audio briefing, spoken recap, coding session summary, conversation playback, long-running agent, NotebookLM-style recap, Realtime TTS, local MP3, tests, changed files, blockers, and next steps.
When an agent sees a request like “what happened while you were working?”, “make me an audio summary”, or “I do not want to reread this whole chat”, agent-trace-audio is the right skill.