[BUG] pipeline: kcontrol-setup IPC timeout leaks pipeline_ida in sof_widget_setup_unlocked error path; failed to assign pipeline id … -28 survives PCI rebind (MTL, IPC4) #10826

@anka-213

Describe the bug

All audio routed through the SOF DSP — internal speakers, headphone jack, internal microphones, HDMI audio — stopped working mid-session on a Meteor Lake-P laptop (ThinkPad X1 Carbon Gen 12, SOF IPC4 firmware 2.13.0.1, kernel 6.12.90). USB audio on the dock and Bluetooth audio still work because they don't traverse the DSP.

The trigger was a single DSP IPC timeout during a DSP resume. From that point on, every hw_params against the affected PCMs fails with failed to assign pipeline id for pipeline.N: -28. That error string is emitted from sound/soc/sof/ipc4-topology.c immediately after ida_alloc_max(&pipeline_ida, …) returns -ENOSPC, i.e. the failure is kernel-side IDA pool exhaustion and the IPC is never sent to the DSP. See Root cause analysis below.

The state has persisted across a userspace audio stack restart, a PCI driver unbind/bind that re-uploaded firmware, and an S3 suspend/resume cycle. It is reproducible at will from this point (every hw_params against the affected PCMs fails the same way), but I do not have an isolated reproducer for the triggering IPC timeout itself.

What I tried to diagnose: inspected dmesg for the IPC timeout sequence and the cascading -ENOSPC errors, captured PipeWire user journal at the moments of failure, restarted the audio stack, did a PCI driver rebind (firmware re-uploaded but failure persisted), did an S3 cycle (failure persisted), inspected /sys/kernel/debug/sof/* text entries (firmware is in state 7 = SOF_FW_BOOT_COMPLETE), captured amixer -c 0, and extracted strings from sof-hda-generic-2ch.tplg to map the failing pipeline numbers back to widgets.

Environment

  • Kernel: Linux v6.12.90 (linux-6.12.y stable tag, packaged in NixOS 25.11 from the upstream tarball mirror://kernel/linux/kernel/v6.x/linux-6.12.90.tar.xz; no separate downstream SHA).
  • SOF firmware: sof-bin v2025.05.1 (prebuilt release tarball from thesofproject/sof-bin). Booted firmware reports ADSPFW 2.13.0.1. No separate firmware-source SHA pinned in nixpkgs.
  • SOF (tools / topology): Whatever ships inside sof-bin v2025.05.1. Loaded topology: intel/sof-ace-tplg/sof-hda-generic-2ch.tplg, Topology ABI 3:29:1, Kernel ABI 3:23:1.
  • Topology file: intel/sof-ace-tplg/sof-hda-generic-2ch.tplg
  • Platform: Lenovo ThinkPad X1 Carbon Gen 12 (DMI: LENOVO-21KC00EEMX-ThinkPadX1CarbonGen12). Intel Meteor Lake-P HD Audio Controller, PCI 0000:00:1f.3 (8086:7e28). Realtek ALC287 HDA codec (HDA:10ec0287,17aa231e,00100002). 2 digital microphones; HDMI declared as iec61937-pcm:5,4,3. ALSA card sof-hda-dsp (driver snd_soc_skl_hda_dsp). Userspace: PipeWire 1.4.9, WirePlumber 0.5.x.

Reproducibility Rate

The triggering event (one DSP IPC timeout on a DSP resume) occurred 1 time over ~30 hours of uptime with many lid-driven S3 suspend/resume cycles and several Thunderbolt 3 dock connect/disconnect events. I cannot put a meaningful rate on the trigger.

After it occurred, the downstream symptom is 100% deterministic: every hw_params attempt against the affected PCMs fails identically, and the failure has persisted through one userspace-services restart, one PCI driver unbind/bind, and one S3 suspend/resume. I have not yet rebooted.

Steps to reproduce

I do not have an isolated reproducer for the triggering IPC timeout. The session that hit it:

  1. Boot normally (2026-05-28 ~08:18 CEST). Audio works.
  2. Use the laptop, suspending/resuming via the lid many times across the day. A Thunderbolt 3 dock with an attached HDMI display was connected and disconnected several times.
  3. At 2026-05-28 14:59:18 the dock is disconnected (pciehp: Card not present, undocked from hotplug port replicator). The HDMI display goes away with it.
  4. ~19 minutes of dmesg silence (laptop presumably idle; runtime PM is not logged in this kernel build).
  5. At 2026-05-28 15:18:57 a DSP IPC times out (ipc timed out for 0x44000007|0x30000018). The kernel error path emits error: set pcm hw_params after resume, which indicates sof_pcm_prepare() was running with the set_hw_params_upon_resume flag set — i.e. the DSP was being resumed (almost certainly runtime PM, since no PM: suspend exit appears near this event). DSP reports fw_state: SOF_FW_BOOT_COMPLETE (7) and ROM_EXT, state: FW_ENTERED, running — firmware did not crash, just failed to respond inside the IPC timeout window. Kernel emits IPC/DSP dumps and abandons the post-resume pcm 0 dir 0 setup with -ETIMEDOUT (-110).
  6. Continue using the laptop. Many more suspend/resume cycles.
  7. On the next system S3 resume (2026-05-29 08:49:34, 4 s after PM: suspend exit), the kernel tries to set up pcm 0 dir 0 again and fails with failed to assign pipeline id for pipeline.1: -28. Audio is broken from this point on.
  8. The downstream -ENOSPC symptom then occurs on every subsequent hw_params attempt and persists across the one userspace-services restart, the one PCI driver unbind/bind, and the one S3 cycle I attempted (see Actual Result).

Expected Result

  • An IPC timeout during DSP resume should either not occur, or — if it does — should not leave the kernel unable to allocate pipeline IDs on subsequent attempts.
  • A PCI driver unbind/bind should be a sufficient recovery without requiring a reboot. Today it isn't — see the Root cause analysis section for why (pipeline_ida is a file-static global, not per-snd_sof_dev).

Actual Result

  • After the IPC timeout on 2026-05-28 15:18:57, every later hw_params attempt has failed with failed to assign pipeline id for pipeline.N: -28.
  • The failing pipeline number tracks what the host is trying to instantiate: pipeline.1 for pcm 0 dir 0 (HDA Analog / Speaker path) on every post-suspend retry, and pipeline.15 for pcm 31 dir 0 after I restarted pipewire (wireplumber then probed a different PCM).
  • Userspace symptom: spa.alsa: set_hw_params: No space left on device; sinks go to error; canberra-gtk-play -i bell hangs ~30 s and exits with Failed to play sound: IO error. (canberra-gtk-play -i bogusname returns File or data not found immediately, confirming the audio server is reachable.)
  • After systemctl --user restart pipewire pipewire-pulse wireplumber, the HiFi UCM profile no longer registers — wpctl inspect shows the card's EnumProfile containing only off and pro-audio. All Speaker and HDMI sinks disappear. The dock USB audio (separate ALSA card) and Bluetooth audio (off the SOF path entirely) continue to work — only paths routed through the SOF DSP are affected.

Recovery attempts that did not clear the state:

  1. Restart of pipewire pipewire-pulse wireplumber — services restarted cleanly; HiFi profile gone afterwards.
  2. PCI driver unbind + bind:
    echo 0000:00:1f.3 | sudo tee /sys/bus/pci/drivers/sof-audio-pci-intel-mtl/unbind
    echo 0000:00:1f.3 | sudo tee /sys/bus/pci/drivers/sof-audio-pci-intel-mtl/bind
    
    Firmware re-uploaded (dmesg shows Loaded firmware library: ADSPFW, version: 2.13.0.1 and Booted firmware version: 2.13.0.1). Card 0 came back with five Pro N sinks under the pro-audio profile, but pipeline.15: -28 fired immediately during the topology probe and continued every time any client tried to open a PCM.
  3. systemctl suspend (S3) and resume. No Booted firmware version / Loaded firmware library line appeared in dmesg on this transition, i.e. the firmware was not re-uploaded by S3 resume on this platform. The -ENOSPC errors continued unchanged.

Recovery attempt that did clear the state, without a reboot:

  1. Full unload + reload of the SOF kernel modules. After stopping PipeWire/WirePlumber (services and their sockets) and confirming /dev/snd/* had no holders, I ran rmmod sequentially on snd_soc_skl_hda_dsp, snd_sof_probes, snd_hda_intel, snd_sof_pci_intel_mtl (cascading several deps), then snd_sof_intel_hda_common, snd_sof_intel_hda, snd_sof_pci, and finally snd_sof itself, then ran modprobe snd_sof_pci_intel_mtl and restarted the audio stack. Card 0 came back fully functional (HiFi UCM profile present, Speaker/HDMI sinks restored, playback works). This is consistent with the leaked state living inside the snd_sof module; see the root cause analysis below for why module unload resets it but PCI rebind doesn't.

Impact

Showstopper for all SOF-routed audio (internal speakers, headphone jack, built-in microphones, HDMI audio) until reboot. USB audio on the dock and Bluetooth audio continue to work because they don't traverse the SOF DSP. In practice it forced me to fall back to Bluetooth or the dock for any audio.

Proof

Pre-trigger context — most recent dmesg activity before the IPC timeout (2026-05-28 14:59:18, dock disconnect, then ~19 min of silence):

maj 28 14:59:18 anka-nixos kernel: pcieport 0000:00:07.0: pciehp: Slot(12): Link Down
maj 28 14:59:18 anka-nixos kernel: pcieport 0000:00:07.0: pciehp: Slot(12): Card not present
maj 28 14:59:18 anka-nixos kernel: xhci_hcd 0000:22:00.0: remove, state 1
maj 28 14:59:18 anka-nixos kernel: usb usb6: USB disconnect, device number 1
maj 28 14:59:18 anka-nixos kernel: thinkpad_acpi: undocked from hotplug port replicator
maj 28 14:59:18 anka-nixos kernel: xhci_hcd 0000:22:00.0: USB bus 6 deregistered
maj 28 14:59:18 anka-nixos kernel: xhci_hcd 0000:22:00.0: remove, state 1
maj 28 14:59:18 anka-nixos kernel: usb usb5: USB disconnect, device number 1
maj 28 14:59:18 anka-nixos kernel: xhci_hcd 0000:22:00.0: USB bus 5 deregistered
maj 28 14:59:18 anka-nixos kernel: pci_bus 0000:22: busn_res: [bus 22] is released
maj 28 14:59:18 anka-nixos kernel: pci_bus 0000:23: busn_res: [bus 23-49] is released
maj 28 14:59:18 anka-nixos kernel: pci_bus 0000:21: busn_res: [bus 21-49] is released
[...no further kernel log entries between 14:59:18 and the IPC timeout at 15:18:57...]

Triggering event — IPC timeout, dmesg, 2026-05-28 15:18:57:

maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0x44000007|0x30000018
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0x44000007|0x30000018|0x0, target: 0xe4000000|0x30000018|0x0, ctl: 0x3
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump end ]------------
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: 0x50000005: module: ROM_EXT, state: FW_ENTERED, running
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware state: 0x5, status/error code: 0x0
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Core dump is not available due to invalid separator 0xc0de
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump end ]------------
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: sof_ipc4_set_get_data: large config set failed at offset 0: -110
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to set volume update for Pre Mixer Analog Playback Volume
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: kcontrol 4 set up failed for widget gain.1.1
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to set up connected widgets
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: error: failed widget list set up for pcm 0 dir 0
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: error: set pcm hw_params after resume
maj 28 15:18:57 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_prepare on 0000:00:1f.3: -110

Next-day S3 resume, dmesg, 2026-05-29 08:49:30–08:49:34 (first -ENOSPC, with surrounding suspend-exit context):

maj 29 08:49:30 anka-nixos kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/mtl_guc_70.bin version 70.53.0
maj 29 08:49:30 anka-nixos kernel: i915 0000:00:02.0: [drm] GT1: GuC firmware i915/mtl_guc_70.bin version 70.53.0
maj 29 08:49:30 anka-nixos kernel: i915 0000:00:02.0: [drm] GT1: HuC firmware i915/mtl_huc_gsc.bin version 8.5.4
maj 29 08:49:30 anka-nixos kernel: i915 0000:00:02.0: [drm] GT1: GUC: SLPC enabled
maj 29 08:49:30 anka-nixos kernel: i915 0000:00:02.0: [drm] GT1: GUC: RC enabled
maj 29 08:49:30 anka-nixos kernel: OOM killer enabled.
maj 29 08:49:30 anka-nixos kernel: Restarting tasks ... done.
maj 29 08:49:30 anka-nixos kernel: random: crng reseeded on system resumption
maj 29 08:49:30 anka-nixos kernel: PM: suspend exit
maj 29 08:49:34 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: failed to assign pipeline id for pipeline.1: -28
maj 29 08:49:34 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to set up connected widgets
maj 29 08:49:34 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: error: failed widget list set up for pcm 0 dir 0
maj 29 08:49:34 anka-nixos kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_params on 0000:00:1f.3: -28
maj 29 08:49:34 anka-nixos kernel:  HDA Analog: ASoC: error at __soc_pcm_hw_params on HDA Analog: -28
maj 29 08:49:34 anka-nixos kernel:  HDA Analog: ASoC: error at dpcm_fe_dai_hw_params on HDA Analog: -28

Corresponding PipeWire user-journal (journalctl --user -u pipewire) at the same moment:

maj 29 08:49:34 anka-nixos pipewire[1614]: spa.alsa: set_hw_params: No space left on device
maj 29 08:49:34 anka-nixos pipewire[1614]: pw.node: (alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__Speaker__sink-57) suspended -> error (Start error: No space left on device)
maj 29 08:49:34 anka-nixos pipewire[1614]: pw.link: 0x...: one of the nodes is in error out:suspended in:error
maj 29 08:49:39 anka-nixos pipewire[1614]: pw.node: (alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__Speaker__sink-57) suspended -> error ((null))
maj 29 08:50:05 anka-nixos pipewire[1614]: spa.alsa: set_hw_params: No space left on device
[same pattern repeats]

Dmesg from the PCI rebind (2026-05-29 14:26:40–14:26:41) showing firmware was re-uploaded but -ENOSPC returned immediately:

sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware paths/files for ipc type 1:
sof-audio-pci-intel-mtl 0000:00:1f.3:  Firmware file:     intel/sof-ipc4/mtl/sof-mtl.ri
sof-audio-pci-intel-mtl 0000:00:1f.3:  Firmware lib path: intel/sof-ipc4-lib/mtl
sof-audio-pci-intel-mtl 0000:00:1f.3:  Topology file:     intel/sof-ace-tplg/sof-hda-generic-2ch.tplg
sof-audio-pci-intel-mtl 0000:00:1f.3: Loaded firmware library: ADSPFW, version: 2.13.0.1
sof-audio-pci-intel-mtl 0000:00:1f.3: Booted firmware version: 2.13.0.1
sof-audio-pci-intel-mtl 0000:00:1f.3: Topology: ABI 3:29:1 Kernel ABI 3:23:1
sof-audio-pci-intel-mtl 0000:00:1f.3: failed to assign pipeline id for pipeline.15: -28
sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to set up connected widgets
sof-audio-pci-intel-mtl 0000:00:1f.3: error: failed widget list set up for pcm 31 dir 0
sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_params on 0000:00:1f.3: -28

amixer -c 0 in the broken state:

Simple mixer control 'Master',0
  Mono: Playback 0 [0%] [-65.25dB] [off]
Simple mixer control 'Headphone',0
  Front Left: Playback 87 [100%] [0.00dB] [off]
  Front Right: Playback 87 [100%] [0.00dB] [off]
Simple mixer control 'Speaker',0
  Front Left: Playback 87 [100%] [0.00dB] [off]
  Front Right: Playback 87 [100%] [0.00dB] [off]
Simple mixer control 'IEC958',0
  Mono: Playback [off]
Simple mixer control 'Capture',0
  Front Left: Capture 0 [0%] [-17.25dB] [off]
  Front Right: Capture 0 [0%] [-17.25dB] [off]
Simple mixer control 'Auto-Mute Mode',0
  Item0: 'Enabled'
Simple mixer control 'Dmic0',0
  Front Left: Capture 45 [100%] [0.00dB] [off]
  Front Right: Capture 45 [100%] [0.00dB] [off]
[…similar pattern; everything routed through SOF is [off]; full output attached]

Master is muted and every playback/capture switch is [off]. Included for reference per the bug-tracking guide.

Attachments

I will attach (or can attach on request):

  • Full journalctl -k -b from the affected boot.
  • Full journalctl --user -u pipewire from the affected boot.
  • Full amixer -c 0 output.
  • Decompressed topology source (sof-hda-generic-2ch.tplg, strings-extracted) showing pipeline → widget mapping used for the analysis below.
  • Binary /sys/kernel/debug/sof/{pp,exception,fw_regs,dsp,hda} captured while the bug was active (I do not have a matching .ldc to decode them locally; sof-tools 2.10 on this system is older than the booted firmware 2.13.0.1, and this kernel build does not expose etrace / trace either, so sof-logger cannot be run live).

Missing data

No live sof-logger trace is available for either the original IPC timeout or the current broken state — this kernel build does not expose etrace / trace under /sys/kernel/debug/sof/, and the sof-tools dictionary on the system (sof-tools 2.10) is older than the booted firmware (2.13.0.1) anyway. No core dump was generated: dmesg shows Core dump is not available due to invalid separator 0xc0de.


Root cause analysis (from Claude — read with scepticism)

This entire report — log capture, analysis, the source walk below — was drafted by Claude with me reviewing. Below, "verified" means Claude read the actual source listed against the file paths; "inferred" means it followed from the source plus the observed symptoms but has not been reproduced with a kernel-side patch yet. I have not independently audited the source claims — please verify.

Verified against linux-stable v6.12.90 (the booted kernel) and cross-checked against thesofproject/linux topic/sof-dev HEAD:

  • The kernel log line failed to assign pipeline id for pipeline.N: -28 is emitted from sound/soc/sof/ipc4-topology.c:2637, immediately after ida_alloc_max(&pipeline_ida, ipc4_data->max_num_pipelines, GFP_KERNEL) returns -ENOSPC. The IPC for pipeline-create is never sent to the DSP in this path — it returns before the sof_ipc_tx_message_no_reply call at line 2758. So the failure is host-side ID-pool exhaustion, not a DSP-side resource allocator failure.
  • pipeline_ida is static DEFINE_IDA(pipeline_ida); at sound/soc/sof/ipc4-topology.c:37 — file-scope, module-global. It is not part of struct snd_sof_dev and is not reinitialised on PCI unbind/bind. Only ida_destroy at module unload would reset it. This explains why both the PCI rebind and the S3 cycle failed to recover.
  • No IPC4 firmware reply status maps to -ENOSPC in the kernel's IPC4 status-to-errno table (sound/soc/sof/ipc4.c:106–127); the default mapping is -EINVAL. So the -28 cannot have come from a DSP reply — it has to be the IDA, which matches the call-site reading.
  • The same sof_widget_setup_unlocked code (the suspect leak path described below) is present in current topic/sof-dev HEAD with no behavioural change since v6.12.90 — i.e. if the analysis is correct, the upstream SOF tree is also vulnerable.

Inferred (consistent with source and symptoms, not yet reproduced under a patch):

  • The timed-out IPC 0x44000007|0x30000018 was a LARGE_CONFIG_SET for the Pre Mixer Analog Playback Volume kcontrol (gain.1.1) — this is what the immediately-following kernel lines say (sof_ipc4_set_get_data: large config set failed at offset 0: -110 / Failed to set volume update for Pre Mixer Analog Playback Volume / kcontrol 4 set up failed for widget gain.1.1). So the IPC that timed out was a kcontrol restore during post-resume widget-list setup, not a pipeline-create.
  • The leak appears to be in sof_widget_setup_unlocked (sound/soc/sof/sof-audio.c:134). For a dynamic pipeline widget (in this topology, gain.1.1 and its peers), the setup path is:
    1. Recursive call at line 167 sets up the scheduler pipe_widget first → pipe_widget.use_count: 0→1, IDA allocated at ipc4-topology.c:2634, Create Pipeline IPC sent and replied OK.
    2. swidget.use_count: 0→1, tplg_ops->widget_setup(gain) succeeds.
    3. widget_kcontrol_setup(gain) is called at line 208. This is where the LARGE_CONFIG_SET was sent and timed out, returning -ETIMEDOUT.
    4. Control jumps to the widget_free: label at line 217.
    5. widget_free: calls sof_widget_free_unlocked(swidget=gain):
      • gain.use_count: 1→0, proceeds to free.
      • Sends Delete Module Instance for the gain.
      • Reaches sof-audio.c:109. Because gain.dynamic_pipeline_widget == true and gain.id != snd_soc_dapm_scheduler, recursively calls sof_widget_free_unlocked(pipe_widget=scheduler):
        • scheduler.use_count: 1→0, proceeds.
        • Sends Delete Pipeline, frees the IDA at ipc4-topology.c:2809. ← correct, first free.
    6. Falls through (no goto) into the pipe_widget_free: label at line 221.
    7. swidget=gain is not a scheduler, so it calls sof_widget_free_unlocked(pipe_widget=scheduler) a second time.
      • At line 58, if (--swidget->use_count) return 0; runs: scheduler.use_count goes 0 → -1, the early-return fires, no IPC is sent, no IDA is freed. But use_count is now stuck at -1 for the lifetime of this snd_sof_widget.
  • After this corruption, every subsequent setup/teardown cycle of that scheduler leaks one ID from the global pipeline_ida:
    1. Setup: line 150 increments to 0. if (0 > 1) is false, so it does not treat the widget as already-set-up and proceeds with a fresh ida_alloc_max + Create Pipeline. Use-count is now 0 instead of the expected 1.
    2. Teardown: line 58 decrements to -1. if (-1) is truthy → early-returns. Delete Pipeline is not sent and ida_free is not called.
  • Per cycle, this leaks exactly one ID from pipeline_ida. Claude estimates max_num_pipelines is roughly mid-double-digits on MTL, and the user did many lid suspend/resume cycles in the ~17 h between the trigger and the next-day failure. After enough setup/teardown cycles the global pool fills and every ida_alloc_max returns -ENOSPC.

Why this matches every observed symptom:

  • The trigger is specifically a failure inside the widget_free: label fall-through (i.e. dai_config or widget_kcontrol_setup failing after the recursive pipe_widget setup succeeded). A failure of tplg_ops->widget_setup (line 186) jumps directly to pipe_widget_free: and is not affected. The user's log shows the failure at widget_kcontrol_setup — the exact path.
  • "Survives PCI rebind, survives S3" — pipeline_ida is module-global; neither operation resets it.
  • "Different pipeline IDs eventually all fail" — pool is global across all schedulers, so once exhausted, every scheduler's ida_alloc_max fails.
  • "Failing pipeline number tracks the host's request": the number in the error message comes from swidget->widget->name (the topology name), not from the IDA; the -28 is the negative return value of ida_alloc_max.

Suggested patch (drafted by Claude, untested — neither compiled nor run):

Made against linux-stable tag v6.12.90 (commit 2538fbeff8a94ee2b54eb09d92209e24a1e650d4, the running kernel). Same patch also applies to thesofproject/linux topic/sof-dev at commit 3a0f2aeac2e3a8020488c21afef5b483027514fc (HEAD as of 2026-05-29) — Claude diffed the surrounding region in both trees and the context is identical.

Skip the pipe_widget_free: label when the inner sof_widget_free_unlocked already propagated the free to pipe_widget via the dynamic_pipeline_widget branch at sof-audio.c:109. The non-dynamic and scheduler-itself paths still need pipe_widget_free: for the core-refcount decrement, so the label can't simply be removed.

diff --git a/sound/soc/sof/sof-audio.c b/sound/soc/sof/sof-audio.c
--- a/sound/soc/sof/sof-audio.c
+++ b/sound/soc/sof/sof-audio.c
@@ -215,9 +215,20 @@ static int sof_widget_setup_unlocked(struct snd_sof_dev *sdev,
 	return 0;
 
 widget_free:
-	/* widget use_count will be decremented by sof_widget_free() */
+	/*
+	 * widget use_count will be decremented by sof_widget_free_unlocked().
+	 * For a dynamic non-scheduler widget, that call also recursively
+	 * frees swidget->spipe->pipe_widget (see the dynamic_pipeline_widget
+	 * branch in sof_widget_free_unlocked()), so we must skip the
+	 * pipe_widget_free label below — otherwise pipe_widget is freed
+	 * twice, its use_count underflows to -1, and subsequent
+	 * setup/teardown cycles leak pipeline IDs from pipeline_ida.
+	 */
 	sof_widget_free_unlocked(sdev, swidget);
 	use_count_decremented = true;
+	if (swidget->dynamic_pipeline_widget &&
+	    swidget->id != snd_soc_dapm_scheduler)
+		goto use_count_dec;
 pipe_widget_free:
 	if (swidget->id != snd_soc_dapm_scheduler) {
 		sof_widget_free_unlocked(sdev, swidget->spipe->pipe_widget);

Verified with git apply --check against both trees (clean apply, exit 0 each).

Still to confirm (would benefit from someone with SOF familiarity):

  • Whether widget_kcontrol_setup is in fact called the way I described in the resume path for HDA-generic-2ch — i.e. whether the LARGE_CONFIG_SET for gain.1.1 really runs from sof_widget_setup_unlocked line 208 in this scenario.
  • Whether max_num_pipelines reported by firmware 2.13.0.1 on MTL is small enough that ~tens of leaks suffice to exhaust the pool. (SOF_IPC4_FW_CFG_MAX_PPL_COUNT is reported per-boot; I don't have it captured in dmesg from this boot.)
  • Whether the original IPC timeout itself has a separate root cause worth chasing, independent of the leak it triggers. The DSP reported running after the timeout, so it was almost certainly a kernel↔DSP scheduling/timing issue and not a DSP crash — but I haven't investigated further.

Topology context (for cross-reference):

  • Topology strings (extracted from sof-hda-generic-2ch.tplg) confirm pipeline.1 is the Analog Playback front-end (gain.1.1 → mixin.1.1 → pipeline.1 → dai-copier.HDA.Analog.playback), the same path whose volume restore timed out. pipeline.15 is the Deepbuffer HDA Analog playback front-end (gain.15.1 "Pre Mixer Deepbuffer HDA Analog Volume" → mixin.15.1 → pipeline.15). HDMI uses entirely different pipelines (pipeline.50/.51/.60/.61/.70/.71 via dai-copier.HDA.iDisp{1,2,3}.playback).
  • The HiFi UCM profile disappearing after the pipewire restart is plausibly a downstream consequence of the analog-playback topology probe failing — if pcm 0 setup must succeed for HiFi to register, the failure would leave only pro-audio. Not verified against UCM source.
  • The original timeout happened on a DSP runtime resume, not an S3 system resume — no PM: suspend exit appears near 15:18:57. The most recent external event in dmesg is the dock disconnect 19 minutes earlier at 14:59:18. The next-day failure at 08:49:34 is an S3 resume.
