Reapply "A few fixes in the threadpool semaphore. Unify Windows/Unix implementation of LIFO policy." (#125193) #125596
VSadov wants to merge 9 commits into
Tagging subscribers to this area: @agocke, @VSadov
Pull request overview
Reapplies and reworks the ThreadPool LIFO semaphore changes (previously reverted due to NuGet restore regressions) by unifying the blocking/wake implementation across Windows and Unix using OS compare-and-wait primitives (WaitOnAddress / futex) with a monitor fallback.
Changes:
- Adds low-level compare-and-wait interop for Windows (WaitOnAddress) and Linux (futex) and wires them through System.Native/CoreLib.
- Replaces the prior per-OS `LowLevelLifoSemaphore` implementations with a unified managed implementation using a LIFO stack of per-thread blockers plus updated spin/backoff behavior.
- Adjusts worker dispatch heuristics (missed-steal handling) and configuration plumbing (cooperative blocking env var alias).
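For readers unfamiliar with compare-and-wait, here is a minimal C sketch of the Linux side, assuming the raw `futex` syscall with the private (process-local) operations. The helper names `futex_wait_on_address` and `futex_wake_one` are hypothetical and are not the PR's `System.Native` exports; the non-Linux branch only mirrors the idea of a stub.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // for syscall() on glibc
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__)
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: blocks while *address == comparand. If the value already
// differs, the kernel refuses to sleep and the call returns immediately.
// Returns 0 after a wake, -1 on value mismatch or interruption.
static int futex_wait_on_address(int32_t* address, int32_t comparand)
{
    assert(address != NULL);
    long rc = syscall(SYS_futex, address, FUTEX_WAIT_PRIVATE, comparand, NULL, NULL, 0);
    return rc == 0 ? 0 : -1;
}

// Hypothetical helper: wakes at most one thread blocked on address.
// Returns the number of threads actually woken (0 or 1).
static int futex_wake_one(int32_t* address)
{
    assert(address != NULL);
    return (int)syscall(SYS_futex, address, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}
#else
// Stub for platforms without futex; a real fallback would use a monitor instead.
static int futex_wait_on_address(int32_t* address, int32_t comparand)
{
    (void)address; (void)comparand;
    return -1;
}

static int futex_wake_one(int32_t* address)
{
    (void)address;
    return 0;
}
#endif
```

The point of the compare step is that a waiter who checks the word, then loses a race with a waker who changes it, does not go to sleep on a stale value.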
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/native/libs/System.Native/pal_threading.h | Adds exported futex-related entrypoints to the System.Native PAL surface. |
| src/native/libs/System.Native/pal_threading.c | Implements Linux futex wait/wake wrappers; provides non-Linux stubs. |
| src/native/libs/System.Native/entrypoints.c | Registers the new futex entrypoints for managed interop. |
| src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs | Adds a 1ms sleep before requesting workers when a steal was missed. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs | Switches the worker wait to the new LowLevelLifoSemaphore.Wait(timeout, activeThreadCount) signature; removes old spin-limit wiring at the call site. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.Blocking.cs | Adds the DOTNET_ThreadPool_CooperativeBlocking env var alias for cooperative blocking. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelThreadBlocker.cs | Introduces a portable blocker abstraction (futex/WaitOnAddress or LowLevelMonitor fallback). |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.cs | Replaces OS-specific semaphore core with a unified managed LIFO implementation + updated spin heuristic and wake accounting. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Windows.cs | Removes the prior Windows IOCP-based LIFO semaphore implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Unix.cs | Removes the prior Unix WaitSubsystem-based semaphore implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Windows.cs | Adds Windows WaitOnAddress-based compare-and-wait wrapper. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Unix.cs | Adds Unix futex wrapper (currently Linux-only per comments). |
| src/libraries/System.Private.CoreLib/src/System/Threading/Backoff.cs | Updates exponential backoff to return spin count and reduces max backoff. |
| src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires new threading and interop files into CoreLib build (adds/removes Compile items). |
| src/libraries/Common/src/Interop/Windows/Mincore/Interop.WaitOnAddress.cs | Adds LibraryImport declarations for WaitOnAddress/WakeByAddressSingle. |
| src/libraries/Common/src/Interop/Windows/Kernel32/Interop.CriticalSection.cs | Adds SuppressGCTransition on LeaveCriticalSection. |
| src/libraries/Common/src/Interop/Windows/Interop.Libraries.cs | Adds the Synch API-set library constant for WaitOnAddress imports. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.LowLevelMonitor.cs | Adds SuppressGCTransition on LowLevelMonitor_Release. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.Futex.cs | Adds LibraryImport declarations for futex wait/wake entrypoints. |
| src/coreclr/tools/aot/ILCompiler/reproNative/reproNative.vcxproj | Adds Synchronization.lib to link set for NativeAOT repro project. |
| src/coreclr/nativeaot/BuildIntegration/WindowsAPIs.txt | Allows WaitOnAddress/WakeByAddressSingle through the NativeAOT Windows API allowlist. |
| src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.Windows.targets | Adds Synchronization.lib to NativeAOT SDK library list. |
| docs/coding-guidelines/interop-guidelines.md | Updates interop guideline examples to match casing/structure and adds Synch library mention. |
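
The `Backoff.cs` row above says the exponential backoff now returns the spin count and uses a lower cap. A rough C sketch of that shape follows, with an assumed cap of 64 and a hypothetical function name; the actual cap and any randomization used in `Backoff.cs` are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

// Assumed cap for this sketch only; not taken from the PR.
#define MAX_BACKOFF_SPINS 64u

// Hypothetical helper: doubles the spin budget with each attempt, up to the cap,
// and returns the number of spins so the caller can both perform and account for them.
static uint32_t exponential_backoff_spins(uint32_t attempt)
{
    // Shifting by 32 or more is undefined, so clamp large attempt numbers first.
    uint32_t spins = attempt < 31 ? (1u << attempt) : MAX_BACKOFF_SPINS;
    if (spins > MAX_BACKOFF_SPINS)
        spins = MAX_BACKOFF_SPINS;

    assert(spins >= 1 && spins <= MAX_BACKOFF_SPINS);
    return spins;
}
```

Returning the count (rather than spinning internally and returning nothing) lets the caller subtract it from an overall spin budget.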
Pull request overview
Reapplies and extends the threadpool semaphore/LIFO-policy changes that were previously reverted due to a NuGet restore performance regression. The PR unifies the Windows/Unix implementation around a shared managed LIFO waiter stack and adds low-level wait/wake primitives (Linux futex, Windows WaitOnAddress) plus supporting interop/AOT wiring.
Changes:
- Add Linux futex exports in System.Native and corresponding managed interop; add Windows WaitOnAddress interop and link inputs for NativeAOT.
- Replace platform-specific `LowLevelLifoSemaphore` implementations with a unified managed implementation built on `LowLevelThreadBlocker`.
- Adjust threadpool behavior around missed steals (including a brief delay) and tweak a blocking config switch plumbing.
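To illustrate what a LIFO waiter stack buys, here is a small C sketch, assuming an intrusive singly linked stack of per-thread blocker nodes. The type and function names are hypothetical; the sketch is deliberately not thread-safe, and a real implementation would guard the stack with a lock or atomics.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical per-thread blocker node (the PR's counterpart is LowLevelThreadBlocker).
typedef struct blocker {
    int thread_id;          // illustrative identifier only
    struct blocker* next;   // the waiter that parked before this one
} blocker;

typedef struct {
    blocker* top;           // most recently parked waiter
} waiter_stack;

// A thread that is about to block pushes its own blocker.
static void waiter_stack_push(waiter_stack* s, blocker* b)
{
    assert(s != NULL && b != NULL);
    b->next = s->top;
    s->top = b;
}

// A signaler pops the most recently parked waiter, or gets NULL if none is parked.
static blocker* waiter_stack_pop(waiter_stack* s)
{
    assert(s != NULL);
    blocker* b = s->top;
    if (b != NULL)
    {
        s->top = b->next;
        b->next = NULL;
    }
    return b;
}
```

Waking the most recent waiter first tends to reuse a thread whose caches are still warm and lets the oldest waiters time out and retire.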
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/native/libs/System.Native/pal_threading.h | Adds futex-related PALEXPORT declarations. |
| src/native/libs/System.Native/pal_threading.c | Implements Linux futex wait/wake syscalls (with non-Linux stubs). |
| src/native/libs/System.Native/entrypoints.c | Exposes futex entrypoints via DllImportEntry. |
| src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs | Adds a delay before requesting a worker when missed steals occur. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs | Switches semaphore construction/Wait signature; minor comment fix. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.Blocking.cs | Adds env-var name to cooperative blocking config lookup. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelThreadBlocker.cs | Introduces a portable blocker using futex/WaitOnAddress or monitor fallback. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.cs | Reworks semaphore into a single managed implementation with LIFO waiter stack + spin heuristic. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Windows.cs | Removes prior Windows IOCP-based implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Unix.cs | Removes prior Unix WaitSubsystem-based implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Windows.cs | Adds Windows WaitOnAddress/WakeByAddressSingle wrapper. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Unix.cs | Adds Linux futex wrapper (Linux-only). |
| src/libraries/System.Private.CoreLib/src/System/Threading/Backoff.cs | Changes exponential backoff to return spin count and reduces cap. |
| src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires new threading/interop files into CoreLib build and removes old semaphore OS-specific files. |
| src/libraries/Common/src/Interop/Windows/Mincore/Interop.WaitOnAddress.cs | Adds LibraryImport for WaitOnAddress/WakeByAddressSingle. |
| src/libraries/Common/src/Interop/Windows/Kernel32/Interop.CriticalSection.cs | Adds SuppressGCTransition to LeaveCriticalSection. |
| src/libraries/Common/src/Interop/Windows/Interop.Libraries.cs | Adds Libraries.Synch constant for the synch api-set. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.LowLevelMonitor.cs | Adds SuppressGCTransition to LowLevelMonitor_Release. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.Futex.cs | Adds LibraryImport declarations for System.Native futex exports. |
| src/coreclr/tools/aot/ILCompiler/reproNative/reproNative.vcxproj | Links Synchronization.lib for WaitOnAddress/WakeByAddressSingle. |
| src/coreclr/nativeaot/BuildIntegration/WindowsAPIs.txt | Adds WaitOnAddress/WakeByAddressSingle to the NativeAOT Windows API list. |
| src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.Windows.targets | Adds Synchronization.lib to SDK native libraries for NativeAOT. |
| docs/coding-guidelines/interop-guidelines.md | Updates interop naming/examples (e.g., Mincore, Synch). |
Pull request overview
This PR re-applies and extends prior thread pool semaphore/LIFO work distribution changes by introducing a unified, portable blocking/wake mechanism (futex/WaitOnAddress where available) and reworking the CoreLib LowLevelLifoSemaphore implementation to use it.
Changes:
- Add low-level futex-style wait/wake exports on Linux in `System.Native` and corresponding managed interop (plus Windows WaitOnAddress interop).
- Replace the platform-specific `LowLevelLifoSemaphore` implementations with a unified CoreLib implementation based on a LIFO stack of per-thread blockers.
- Adjust thread pool worker dispatch/wait logic to differentiate spurious dispatches from regular dispatch and change spin/parking behavior accordingly.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/native/libs/System.Native/pal_threading.h | Adds new exported futex wait/wake APIs. |
| src/native/libs/System.Native/pal_threading.c | Implements Linux futex syscalls and non-Linux stubs. |
| src/native/libs/System.Native/entrypoints.c | Exposes the new futex entrypoints to managed code. |
| src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs | Changes dispatch return type to a new DispatchResult enum. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs | Updates worker loop to use DispatchResult and to park without spin when idle. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.Blocking.cs | Adds DOTNET_ env-var alias for cooperative blocking config. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelThreadBlocker.cs | New portable thread blocker abstraction using futex/WaitOnAddress/monitor. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.cs | Replaces prior OS-based semaphore with unified LIFO blocker-stack implementation + new spin heuristic. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Windows.cs | New Windows WaitOnAddress wrapper for futex-like operations. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Unix.cs | New Unix futex wrapper (Linux-backed via System.Native). |
| src/libraries/System.Private.CoreLib/src/System/Threading/Backoff.cs | Changes exponential backoff to return spin count and reduces max cap. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Windows.cs | Deleted (replaced by unified implementation). |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Unix.cs | Deleted (replaced by unified implementation). |
| src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires in new threading + interop sources. |
| src/libraries/Common/src/Interop/Windows/Mincore/Interop.WaitOnAddress.cs | Adds P/Invokes for WaitOnAddress/WakeByAddressSingle. |
| src/libraries/Common/src/Interop/Windows/Kernel32/Interop.CriticalSection.cs | Adds SuppressGCTransition to LeaveCriticalSection import. |
| src/libraries/Common/src/Interop/Windows/Interop.Libraries.cs | Adds Libraries.Synch constant for the synch API-set DLL. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.LowLevelMonitor.cs | Adds SuppressGCTransition to LowLevelMonitor_Release import. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.Futex.cs | Adds System.Native futex interop declarations. |
| src/coreclr/tools/aot/ILCompiler/reproNative/reproNative.vcxproj | Adds Synchronization.lib to native repro project libs. |
| src/coreclr/nativeaot/BuildIntegration/WindowsAPIs.txt | Adds WaitOnAddress/WakeByAddressSingle to allowed API list. |
| src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.Windows.targets | Ensures Synchronization.lib is linked for NativeAOT. |
| docs/coding-guidelines/interop-guidelines.md | Updates documentation for Mincore casing/folder layout and Synch library constant. |
    if (null != workStealingQueue)
    {
        TransferLocalWork();
        ThreadPoolWorkQueue.TransferAllLocalWorkItemsToHighPriorityGlobalQueue();

    // Currently where this type is used, queued work is expected to be processed
    // at high priority. The implementation could be modified to support different
    // priorities if necessary.
    if (!HasWaitersToWake(countsBeforeUpdate))
        break;

    // CAS collision, but still have waiters to wake, try again.
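
The retry-on-collision pattern in the fragment above can be sketched in C11 atomics. This is only an illustration under assumptions: the packing of the counts word (low 16 bits for blocked waiters, high 16 bits for wakes already claimed) and the name `try_claim_wake` are hypothetical, and only `has_waiters_to_wake` echoes a name from the quoted code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical packing: low 16 bits = blocked waiters, high 16 bits = claimed wakes.
static uint32_t waiter_count(uint32_t counts)  { return counts & 0xFFFFu; }
static uint32_t claimed_wakes(uint32_t counts) { return counts >> 16; }

static bool has_waiters_to_wake(uint32_t counts)
{
    return waiter_count(counts) > claimed_wakes(counts);
}

// Tries to claim one wake. Returns true if the caller now owns a wake and should
// unblock one thread; false if every blocked waiter already has a wake claimed.
static bool try_claim_wake(_Atomic uint32_t* counts)
{
    assert(counts != NULL);
    uint32_t before = atomic_load(counts);
    for (;;)
    {
        if (!has_waiters_to_wake(before))
            return false;

        uint32_t after = before + (1u << 16);
        // On failure, 'before' is refreshed with the value another thread stored.
        if (atomic_compare_exchange_weak(counts, &before, after))
            return true;

        // CAS collision, but there may still be waiters to wake: re-check and retry.
    }
}
```

The early exit matters because a collision usually means another signaler already claimed the wake, so blindly retrying would over-wake.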
    public void Signal()
    {
        // Increment signal count. This enables one-shot acquire.
        Counts counts = _separated._counts.InterlockedIncrementSignalCount();

    {
        return counts.SignalCount != 0 || WaitForSignal(timeoutMs);
        WakeOne();
        break;

    // LowLevelLock release is a full fence thus ordinary read of _pendingWake is ok
    if (_pendingWake > 0)
…implementation of LIFO policy." (dotnet#125193) This reverts commit 51b1e92.
        internal const string Ucrtbase = "ucrtbase.dll";
        internal const string Xolehlp = "xolehlp.dll";
        internal const string Comdlg32 = "comdlg32.dll";
        internal const string Gdiplus = "gdiplus.dll";
        internal const string Oleaut32 = "oleaut32.dll";
        internal const string Winspool = "winspool.drv";
        internal const string Synch = "api-ms-win-core-synch-l1-2-0.dll";
    }

    @@ -94,12 +96,9 @@ internal static partial class Interop // contents of Common\src\Interop\Windows\
        private static class Libraries
        {
            internal const string Kernel32 = "kernel32.dll";
            internal const string OleAut32 = "oleaut32.dll";
            internal const string Localization = "api-ms-win-core-localization-l1-2-0.dll";
            internal const string Handle = "api-ms-win-core-handle-l1-1-0.dll";
            internal const string ProcessThreads = "api-ms-win-core-processthreads-l1-1-0.dll";
            internal const string File = "api-ms-win-core-file-l1-1-0.dll";
            internal const string NamedPipe = "api-ms-win-core-namedpipe-l1-1-0.dll";
            internal const string IO = "api-ms-win-core-io-l1-1-0.dll";
            internal const string Synch = "api-ms-win-core-synch-l1-2-0.dll";
            ...
        }
    }
    private readonly short _maxSpinCount;
    private readonly short _threadWakeCooldownUsec;
    private readonly Action _onWait;

    // The spin count is chosen to be in the range of typical thread wake latency and some additional overhead,
    // all assuming a single spin is calibrated to around 35 nanoseconds.
    // The thread wake latency commonly measures at 2-10 microseconds (year 2026) and is unlikely to change drastically.
    private const int DefaultSemaphoreSpinCountLimit = 256;
    // The cooldown roughly serves as detection that the thread did not spend time being blocked.
    // If it woke in under 4 microseconds, it was likely a fast/trivial wake without blocking.
    private const int DefaultWakeCooldown = 4;
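
As a quick check of the numbers in those comments: 256 spins at roughly 35 ns each comes to 256 × 35 = 8960 ns, about 9 microseconds, which sits at the upper end of the quoted 2-10 microsecond wake latency. The C sketch below just encodes that arithmetic; the helper names are hypothetical, and the strict "under 4 microseconds" comparison is one reading of the cooldown comment, not the PR's code.

```c
#include <assert.h>

enum {
    DEFAULT_SEMAPHORE_SPIN_COUNT_LIMIT = 256, // value of the constant quoted above
    ASSUMED_NANOS_PER_SPIN = 35,              // calibration figure from the comment
    DEFAULT_WAKE_COOLDOWN_USEC = 4            // value of the constant quoted above
};

// Total spin budget in nanoseconds: 256 * 35 = 8960 ns.
static int spin_budget_nanos(void)
{
    return DEFAULT_SEMAPHORE_SPIN_COUNT_LIMIT * ASSUMED_NANOS_PER_SPIN;
}

// Assumed interpretation: a wait that returned in under the cooldown is treated
// as a fast wake that never really blocked.
static int woke_without_blocking(int elapsed_usec)
{
    assert(elapsed_usec >= 0);
    return elapsed_usec < DEFAULT_WAKE_COOLDOWN_USEC;
}
```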
    \Interop.DuplicateHandle_SafeTokenHandle.cs
    \Interop.DuplicateHandle_IntPtr.cs
    \Kernel32
    \Interop.DuplicateHandle_SafeFileHandle.cs

    #else // TARGET_LINUX

    #pragma clang diagnostic push
    #pragma clang diagnostic ignored "-Wunused-parameter"
    #pragma clang diagnostic ignored "-Wmissing-noreturn"
    void SystemNative_LowLevelFutex_WaitOnAddress(int32_t* address, int32_t comparand)
    {
Closing this, as there are now too many irrelevant comments and too much stale context. Will open a new one. New: #128606
Re: #125193
TODO: need to confirm that NuGet restore performance is ok with the updated change.