Skip to content

[clr-interp] Implement Managed Return Value support for the interpreter#127760

Draft
kotlarmilos wants to merge 3 commits into
dotnet:mainfrom
kotlarmilos:interp-mrv-return-value
Draft

[clr-interp] Implement Managed Return Value support for the interpreter#127760
kotlarmilos wants to merge 3 commits into
dotnet:mainfrom
kotlarmilos:interp-mrv-return-value

Conversation

@kotlarmilos
Copy link
Copy Markdown
Member

@kotlarmilos kotlarmilos commented May 4, 2026

Description

GetReturnValueLiveOffset returns the native offsets at which the return value of a call site is live, so the debugger can show the value a method just returned when stepping out. It walks the IL to native map looking for CALL_INSTRUCTION entries to find the post-call IP, then reads the value from the return register. The interpreter does not emit CALL_INSTRUCTION map entries and does not return values in registers, so the feature did not work for any interpreted method.

EmitCodeIns now emits a CALL_INSTRUCTION map entry for each managed call opcode. On the DI side, CordbNativeCode::GetReturnValueLiveOffsetImpl recognizes these entries and computes the post-call IP using a per-opcode slot length table built from intops.def. A new CordbNativeCode::GetInterpreterCallDvarOffset decodes the call at that IP and returns the FP-relative byte offset of the call's destination var.

The IL to native map previously emitted entries only at IL offsets where the evaluation stack was empty. It now emits one entry per IL offset and tags STACK_EMPTY only when the stack actually was empty. The stepper only stops on STACK_EMPTY, so stepping is unchanged, but breakpoint binding and IP to source mapping now also work at non-empty-stack offsets.

This fixes the following interpreter debugger test failures:

  • MRV.TestArrays
  • MRV.TestAsyncMethods
  • MRV.TestClasses
  • MRV.TestComplexGenerics
  • MRV.TestEnums
  • MRV.TestGenerics
  • MRV.TestMisc
  • MRV.TestNativeCalls
  • MRV.TestPointers
  • MRV.TestPrimitiveTypes
  • MRV.TestReflectionWithAbstractMethod
  • MRV.TestRefs
  • MRV.TestStructs

Copilot AI review requested due to automatic review settings May 4, 2026 15:11
@kotlarmilos kotlarmilos changed the title [interp] Implement Managed Return Value support for the interpreter [clr-interp] Implement managed return value support for the interpreter May 4, 2026
@kotlarmilos kotlarmilos changed the title [clr-interp] Implement managed return value support for the interpreter [clr-interp] Implement Managed Return Value support for the interpreter May 4, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

@kotlarmilos kotlarmilos added this to the 11.0.0 milestone May 4, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends CoreCLR interpreter debug info so managed return values can be surfaced for interpreted calls, and broadens IL/native mapping beyond stack-empty offsets. It fits into the debugger/interpreter pipeline by teaching the interpreter to emit richer boundary data and teaching DI to decode interpreter callsites.

Changes:

  • Adds interpreter-side IL/native map capacity tracking and emits per-IL-offset mappings plus CALL_INSTRUCTION entries for interpreter call opcodes.
  • Adds DI-side interpreter opcode-length decoding and synthetic stack-home lookup for interpreter return values.
  • Wires CordbJITILFrame to read interpreted return values from FP-relative dvars instead of native return registers.

Review found a blocking correctness issue: the new CALL_INSTRUCTION tagging in compiler.cpp also marks compiler-inserted helper calls (for example the helper emitted by EmitPushLdvirtftn), so GetReturnValueLiveOffset can report extra offsets for a single IL callsite and one of those offsets corresponds to the helper’s function-pointer result rather than the user call’s return value.

Show a summary per file
File Description
src/coreclr/interpreter/compiler.h Adds native-map capacity tracking for interpreter-emitted debug boundaries.
src/coreclr/interpreter/compiler.cpp Emits expanded IL/native mappings and interpreter callsite markers.
src/coreclr/debug/di/rsthread.cpp Reads interpreted return values from synthetic FP-relative variable homes.
src/coreclr/debug/di/rspriv.h Declares the new interpreter-specific DI helper.
src/coreclr/debug/di/module.cpp Decodes interpreter callsite lengths and dvar offsets in DI.

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 1

Comment thread src/coreclr/interpreter/compiler.cpp Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 3

Comment thread src/coreclr/interpreter/compiler.cpp Outdated
Comment thread src/coreclr/interpreter/compiler.cpp
Comment thread src/coreclr/interpreter/compiler.cpp Outdated
@kotlarmilos kotlarmilos force-pushed the interp-mrv-return-value branch from 3c1ac90 to 9f2787a Compare May 5, 2026 14:09
Copilot AI review requested due to automatic review settings May 5, 2026 14:15
@kotlarmilos kotlarmilos force-pushed the interp-mrv-return-value branch from 9f2787a to 946174b Compare May 5, 2026 14:15
@kotlarmilos kotlarmilos force-pushed the interp-mrv-return-value branch from 946174b to 9bffeb9 Compare May 5, 2026 14:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 1

Comment thread src/coreclr/interpreter/compiler.cpp Outdated
Comment thread src/coreclr/debug/di/module.cpp Outdated
Comment thread src/coreclr/debug/di/rsthread.cpp Outdated
Comment thread src/coreclr/interpreter/compiler.h Outdated
Comment thread src/coreclr/interpreter/compiler.cpp Outdated
@kotlarmilos kotlarmilos force-pushed the interp-mrv-return-value branch from 9bffeb9 to 4d2200a Compare May 6, 2026 09:59
Comment thread src/coreclr/debug/di/module.cpp Outdated
Comment thread src/coreclr/debug/di/module.cpp Outdated
GetReturnValueLiveOffset returns the native offsets at which the return value of a call site is live, so the debugger can show the value a method just returned when stepping out. It walks the IL to native map looking for CALL_INSTRUCTION-tagged entries to find the post-call IP, then reads the value from the architectural return register. The interpreter does not emit CALL_INSTRUCTION map entries and does not return values in registers, so the feature did not work for any interpreted method.

This change wires it up. EmitCodeIns now emits a CALL_INSTRUCTION-flagged map entry for each managed call opcode (INTOP_CALL, INTOP_CALLVIRT, INTOP_CALLI, etc.). On the DI side, CordbNativeCode::GetReturnValueLiveOffsetImpl recognizes these entries and computes the post-call IP using a per-opcode slot length table built from intops.def. A new CordbNativeCode::GetInterpreterCallDvarOffset decodes the call at that IP and returns the FP-relative byte offset of the call's destination var. CordbJITILFrame::GetReturnValueForILOffsetImpl then fabricates NativeVarInfo{VLT_STK, REGNUM_FP, dvarOffset} and routes through the existing GetNativeVariable, which already reads FP-relative slots correctly out of an interpreter frame.

Two compiler-side cleanups come along with it. The IL to native map previously emitted entries only at IL offsets where the evaluation stack was empty. It now emits one entry per IL offset and tags STACK_EMPTY only when the stack actually was empty. The stepper only stops on STACK_EMPTY, so stepping is unchanged, but breakpoint binding and IP to source mapping now also work at non-empty-stack offsets. Separately, NewIns was burning m_isFirstInstForEmptyILStack on emit-nop pseudo-ops like INTOP_DEF, which left the first real IR at the same IL offset (the leading newobj of `var x = new Foo();` for example) untagged and made step-over skip past such statements. The flag is now consumed only when a real IR instruction is emitted.

This fixes the following interpreter debugger test failures:
- MRV.TestArrays
- MRV.TestAsyncMethods
- MRV.TestClasses
- MRV.TestComplexGenerics
- MRV.TestEnums
- MRV.TestGenerics
- MRV.TestMisc
- MRV.TestNativeCalls
- MRV.TestPointers
- MRV.TestPrimitiveTypes
- MRV.TestReflectionWithAbstractMethod
- MRV.TestRefs
- MRV.TestStructs

Co-authored-by: Copilot <[email protected]>
Copilot AI review requested due to automatic review settings May 6, 2026 11:18
@kotlarmilos kotlarmilos force-pushed the interp-mrv-return-value branch from 4d2200a to 8afd1eb Compare May 6, 2026 11:18
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 8/8 changed files
  • Comments generated: 2

Comment thread src/coreclr/debug/di/rsthread.cpp
Comment thread src/coreclr/debug/di/module.cpp
@kotlarmilos kotlarmilos force-pushed the interp-mrv-return-value branch from 8afd1eb to a1932f7 Compare May 6, 2026 11:53
Caller uses IfFailRet which treats S_FALSE as success. With the IsInterpreted() gating
in place, neither S_FALSE return path is reachable on the happy path, so propagate them
as real errors.

Co-authored-by: Copilot <[email protected]>
@kotlarmilos kotlarmilos force-pushed the interp-mrv-return-value branch from a1932f7 to 23409b8 Compare May 6, 2026 11:55
Copilot AI review requested due to automatic review settings May 7, 2026 11:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 8/8 changed files
  • Comments generated: 0 new

Copy link
Copy Markdown
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kotlarmilos - If possible lets hold off on this one until we settle what data contract we want here overall. If getting this in is blocking making progress elsewhere let me know but I'm guessing its relatively isolated. If we do wind up embedding this interpreter info into the debugger code we'll probably want to define some new IDacDbiInterface APIs and put the logic that understands interpreter disassembly in the DAC instead.


int CordbNativeCode::GetCallInstructionLength(BYTE *ip, ULONG32 count)
{
#ifdef FEATURE_INTERPRETER
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakobbotsch @AndyAyersMS - the interpreter work is showing the historical rough edges on our CALL_INSTRUCTION source mappings. I wanted to explore how reasonable it would be to change it prior to doubling down on it as part of both interpreter work and upcoming cDAC work. If there are other JIT folks you think would be better to have in this discussion feel free to redirect :)

To set context, the debugger scenario is that users step through their code and they'd like to see the return value from functions they are stepping over and or stepping out of. In order to accomplish that we have two ICorDebug APIs:

  • ICorDebugCode3::GetReturnValueLiveOffset - indicates a native code offset where the return value will be live in the caller frame
  • ICorDebugILFrame3::GetReturnValueForILOffset - fetches the actual return value assuming the debugger has stopped the thread at the return value live offset

Ideally the data that the debugger would like to have looks like a native variable home: a code location when the return value is live + stack/register/memory info about where the return value is stored. The friction occurs because that isn't what the JIT debug data provides. The JIT provides:

  • the native offset of the call instruction, not the following instruction where return value is live. In order to bridge the gap the debugger code has to disassemble the call instruction to find the address of the instruction after it.
  • no direct information about register/stack/memory info where the return value is stored. In order to determine that the debugger has embedded knowledge of JIT ABI conventions.

I wouldn't be surprised if both the disassembler and the embedded JIT ABI conventions have bugs, limitations, or fail to stay up-to-date with JIT changes resulting in the debugger feature not working as robustly as we might hope. The interpreter is now starting to embed the same thing - a disassembler to skip over the call instruction + ABI knowledge to understand where to read the return value from after stopping.

I'm hoping to understand how difficult it would be to move the data the JIT provides closer to what the debugger wants so we eliminate the logic needed to bridge the gap. A couple potential options:

  1. Stop emitting CALL_INSTRUCTION mappings and instead emit additional variable home mappings. Currently variable home information has no representation for a return value temporary but I don't think it would be hard to create one. For example we could set varNumber to a well-known sentinel value and then reuse the endOffset field to store the IL offset of the callsite because the debugger doesn't care about full range information for this. We could also pick other encodings if it matters, its not likely to make much difference to the debugger. The JIT would need to set the startOffset as the instruction after the call and set the register/stack/memory location info to whatever is appropriate based on the ABI for the call. If there are unrepresentable locations we could either make the representation richer or consider some return values out-of-scope for now. The current hard-coded debugger ABI info doesn't support anything that isn't returned in a register or register pair.
  2. Replace CALL_INSTRUCTION source mappings with some new AFTER_CALL_INSTRUCTION source mapping. This would be a smaller change where the JIT identifies the instruction following the call rather than the call itself. It avoids the debugger needing to disassemble the call instruction but it still requires the debugger to embed ABI info to try reading the return value. For the interpreter this might actually be worse because it appears the debugger code needs to disassemble the interpreter call instruction as part of figuring out where the return value was stored.
  3. We could leave the debug info contract as-is and have interpreter debugging embed similar disassembler and ABI knowledge into the runtime data contract. Design-wise its more complex/fragile going forward but its doable. I think the main risk of maintaining the status quo is that we on a path where all those disassembly/ABI details will get embedded into the runtime data contract and it becomes a debugger breaking change to modify them in the future.

Option (1) I think would be best from the debugger and data contract perspective, but I don't know what cost it incurs for JIT changes. Thoughts?

fyi cDac folks - @rcj1 @max-charlamb @lateralusX @barosiak @jkotas @davidwrighton

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 1 eliminates the host-side architecture knowledge, which is favourable for wasm, I don't see how we would disassemble wasm bytecode and handle the calling convention there. What is the encoding when the return value isn't in a representable home (e.g. reg and spilled stack)? The current contract doesn't represent that case via GetReturnValueForILOffset.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the encoding when the return value isn't in a representable home (e.g. reg and spilled stack)?

reg and spilled stack sounds like the already existing VLT_REG_STK option. Its one register + the remainder of the data at memory address baseReg+offset. If that doesn't describe what the interpreter would need then we have the option to either add new location encodings or declare those return values unsupported/out-of-scope. The existing managed return value feature doesn't support all types today, but we'd at least want to support things we expected to be common.

@kotlarmilos kotlarmilos marked this pull request as draft May 12, 2026 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants