Add Arm64 AdvSimd implementation of Matrix4x4 Invert#128640

Closed
a74nh wants to merge 1 commit into dotnet:main from a74nh:matrix_github
@a74nh a74nh commented May 27, 2026

Code proposed by the Arm MCP server guided workflow. https://github.com/arm/mcp

Testing with the InvertBenchmark from dotnet/performance shows a 17% improvement on Cobalt.

Copilot AI review requested due to automatic review settings May 27, 2026 11:11
@dotnet-policy-service (bot) added the community-contribution label (indicates that the PR has been added by a community member) May 27, 2026

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an Arm64 (AdvSimd/NEON) intrinsic implementation for Matrix4x4 inversion to improve performance on Arm64 platforms.

Changes:

  • Add an AdvSimd.Arm64 fast-path in Invert.
  • Introduce AdvSimdImpl mirroring the existing DirectXMath/SSE-based inversion algorithm using NEON intrinsics.
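
For context, a fast-path of this kind is typically guarded by an `IsSupported` check, which the JIT treats as a constant so the unused branch is dropped. The following is a rough sketch using public APIs only; `InvertDispatchSketch`, `DescribePath`, and `TryInvert` are hypothetical names for illustration and are not taken from the PR or from the `Matrix4x4` internals.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical illustration only; not the code from this PR.
public static class InvertDispatchSketch
{
    // Reports which hardware-specific path a dispatch of this shape could take
    // on the current machine. The order of the checks is an assumption.
    public static string DescribePath()
    {
        if (AdvSimd.Arm64.IsSupported)
        {
            return "AdvSimd.Arm64";
        }

        if (Sse.IsSupported)
        {
            return "Sse";
        }

        return "Software";
    }

    // Thin wrapper over the public API; callers never see which path ran.
    public static bool TryInvert(Matrix4x4 matrix, out Matrix4x4 inverse)
    {
        return Matrix4x4.Invert(matrix, out inverse);
    }
}
```

Callers only ever use `Matrix4x4.Invert`; which branch runs is an implementation detail, so adding or removing a hardware-specific path should not change observable results beyond floating-point rounding.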

@dotnet-policy-service

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

}

[CompExactlyDependsOn(typeof(AdvSimd.Arm64))]
static bool AdvSimdImpl(in Impl matrix, out Impl result)
Member


As a general nit, we're looking at landing #127690 which adds a number of xplat helper APIs and will allow us to unify the Arm64 and x64 implementations to a single code path.

It provides helpers like Vector128.ConcateLowerLower(row1, row2), which avoids having to extract a Vector64<T> where that isn't viable (such as on x64, or for SVE, where no "half width" vector exists for Vector<T>).

It also provides ones like Vector128.UnzipEven(vTemp1, vTemp2) which unifies the consideration of needing to use a shuffle on some platforms or for some base types vs having a dedicated instruction on others.

If we can hold off until that lands, we should be able to just update Matrix4x4 to no longer have any architecture specific code paths.
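
To make the two helper shapes concrete, here is a rough sketch built only from APIs that already exist today. The names `ConcatLowerLowerSketch` and `UnzipEvenSketch` are hypothetical stand-ins, and the lane semantics are an assumption (low halves concatenated as `[a0, a1, b0, b1]`, even lanes gathered as `[a0, a2, b0, b2]`, following the Arm UZP1 convention); the actual signatures and behavior of the APIs in #127690 may differ.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical stand-ins for illustration; not the APIs added by #127690.
public static class XplatSketch
{
    // Assumed semantics: [a0, a1, b0, b1].
    public static Vector128<float> ConcatLowerLowerSketch(Vector128<float> a, Vector128<float> b)
    {
        if (Sse.IsSupported)
        {
            // MOVLHPS: keeps the low half of a, copies the low half of b into the high half.
            return Sse.MoveLowToHigh(a, b);
        }

        // Portable path: extract both 64-bit halves and recombine them.
        return Vector128.Create(a.GetLower(), b.GetLower());
    }

    // Assumed semantics: [a0, a2, b0, b2].
    public static Vector128<float> UnzipEvenSketch(Vector128<float> a, Vector128<float> b)
    {
        if (AdvSimd.Arm64.IsSupported)
        {
            // Dedicated instruction (UZP1) on Arm64.
            return AdvSimd.Arm64.UnzipEven(a, b);
        }

        if (Sse.IsSupported)
        {
            // SHUFPS control 0b10_00_10_00 selects a[0], a[2], b[0], b[2].
            return Sse.Shuffle(a, b, 0b10_00_10_00);
        }

        // Scalar fallback for platforms without either instruction set.
        return Vector128.Create(a.GetElement(0), a.GetElement(2), b.GetElement(0), b.GetElement(2));
    }
}
```

A unified helper hides exactly this kind of per-platform branching, which is why a single xplat code path for `Invert` becomes feasible once such helpers exist.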

Contributor Author


> If we can hold off until that lands, we should be able to just update Matrix4x4 to no longer have any architecture specific code paths.

That would be a much better solution. This PR is essentially duplicating the X86 code path.

What are the chances of landing #127690 and someone producing a combined version of Invert in time for .NET 11? If that's not likely to happen, would this PR be useful as a stopgap to help performance? Understood if you would rather not, for code size and churn reasons.

Member


#127690 should be merged in the next few days; it's just waiting on secondary sign-off. It's part of the planned work for .NET 11.

Once that's done, updating Matrix4x4 to be xplat should be trivial; I can get it done relatively quickly.


a74nh commented May 29, 2026

Closing this as it should be implemented with the new APIs once they are available.

@a74nh a74nh closed this May 29, 2026

Labels

area-System.Numerics, community-contribution
