Lower GetElement on arm64 to the correct access sequence #104288

tannergooding · 2024-07-02T07:19:11Z

Unlike xarch, where this could be handled fairly trivially in codegen, Arm64 isn't as flexible, so we get better and more correct codegen by explicitly lowering to the correct sequence instead.
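As a rough illustration of why an element access on an in-memory vector can be lowered to a plain scalar load, the sketch below models a 16-byte vector of four 32-bit floats stored little-endian. It is not JIT code: the function names and the `memory` buffer are hypothetical, and it only shows that "load the whole vector, then pick a lane" and "load just the lane at `base + index * element_size`" yield the same value.

```python
import struct

def get_element_via_vector_load(memory, base, index):
    # Model: load all 16 bytes of the vector, then select one lane.
    lanes = struct.unpack_from("<4f", memory, base)
    return lanes[index]

def get_element_via_scalar_load(memory, base, index, element_size=4):
    # Model: load only the bytes of the requested lane.
    (value,) = struct.unpack_from("<f", memory, base + index * element_size)
    return value

# Hypothetical stack slot holding a 4 x float vector at byte offset 8.
memory = bytearray(32)
struct.pack_into("<4f", memory, 8, 1.0, 2.0, 3.0, 4.0)

for lane in range(4):
    assert get_element_via_vector_load(memory, 8, lane) == \
           get_element_via_scalar_load(memory, 8, lane)
```

The scalar form touches 4 bytes instead of 16 and needs no separate lane-extract step, which is the kind of saving an explicit lowering can expose.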

dotnet-policy-service · 2024-07-02T07:19:44Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

tannergooding · 2024-07-03T02:20:35Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2024-07-03T02:20:50Z

Azure Pipelines successfully started running 1 pipeline(s).

tannergooding · 2024-07-03T05:57:06Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2024-07-03T05:57:17Z

Azure Pipelines successfully started running 1 pipeline(s).

tannergooding · 2024-07-03T18:54:54Z

CC @dotnet/jit-contrib. This is the "proper" fix for #104232, which was worked around in #104264.

No TP diff, but good asmdiff, such as for Linux Arm64:

Overall (-10,228 bytes)
MinOpts (-4,844 bytes)
FullOpts (-5,384 bytes)

It looks like there are more improvements to be had, but I think those can be handled separately. For example, we're now generating:

-            ldr     q16, [fp, #-0x68]	// [V98 tmp78]
-            dup     s16, v16.s[1]
-            ldr     q18, [fp, #-0x68]	// [V98 tmp78]
-            dup     s18, v18.s[2]
-            ldr     q19, [fp, #-0x68]	// [V98 tmp78]
-            dup     s19, v19.s[3]
+            sub     x1, fp, #104	// [V98 tmp78]
+            ; byrRegs +[x1]
+            ldr     s16, [x1, #0x04]
+            sub     x1, fp, #104	// [V98 tmp78]
+            ldr     s18, [x1, #0x08]
+            sub     x1, fp, #104	// [V98 tmp78]
+            ldr     s19, [x1, #0x0C]

This is a net improvement, since we're only loading scalars rather than repeatedly loading vectors, but it's a case that probably should have been generated as:

ldr     s16, [fp, #-0x64]
ldr     s18, [fp, #-0x60]
ldr     s19, [fp, #-0x5C]

I think there's just a general optimization missing here for when the LclAddr itself has an offset and we're adding a constant offset (so it's still representable as a single AddrMode).
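The folding described above is simple arithmetic, sketched here under stated assumptions. `fold_offset` and `fits_single_load` are hypothetical helper names, not JIT functions. The range check reflects the A64 immediate forms as I understand them: the scaled unsigned form takes a 12-bit immediate multiplied by the access size, and the unscaled form takes a signed 9-bit byte offset (so a negative offset needs the unscaled form).

```python
def fold_offset(local_offset, element_index, element_size):
    # Combine the local's frame offset with the constant element offset.
    return local_offset + element_index * element_size

def fits_single_load(offset, access_size):
    # Scaled form: non-negative, multiple of the access size, 12-bit immediate.
    scaled = offset >= 0 and offset % access_size == 0 and offset // access_size <= 4095
    # Unscaled form: signed 9-bit byte offset.
    unscaled = -256 <= offset <= 255
    return scaled or unscaled

# The local in the diff lives at fp - 104 (0x68); elements 1..3 are 4-byte floats.
folded = [fold_offset(-0x68, index, 4) for index in (1, 2, 3)]
assert folded == [-0x64, -0x60, -0x5C]
assert all(fits_single_load(offset, 4) for offset in folded)
```

Under these assumptions, each of the three folded offsets fits in one load, which is why the `sub` + `ldr` pairs could in principle collapse to a single `ldr` each.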

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jul 2, 2024

dotnet-policy-service bot assigned tannergooding Jul 2, 2024

tannergooding force-pushed the simd-getelement branch from cc4ebd7 to 2705326 Compare July 2, 2024 07:28

Lower GetElement on arm64 to the correct access sequence

09d80fa

tannergooding force-pushed the simd-getelement branch 3 times, most recently from a3b0300 to 95fdd44 Compare July 2, 2024 07:49

Use constant offset where possible

95fdd44

build-analysis bot mentioned this pull request Jul 2, 2024

[x86] stress failure in RayTracer.GetNaturalColor with DOTNET_JitStress=2 #102590

Open

tannergooding added 2 commits July 2, 2024 07:32

Ensure that lvaSIMDInitTempVarNum is marked as being used by LclAddrNode

e55da57

Fix assert

a6a4b0e

tannergooding force-pushed the simd-getelement branch from 0d4e0f3 to a6a4b0e Compare July 2, 2024 15:41

tannergooding added 2 commits July 2, 2024 11:08

Create a valid addr mode for Arm64

1d904dc

Don't lower unnecessarily

a71d7c8

This was referenced Jul 3, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

The job running on agent NetCore-Public ran longer than the maximum time #104044

Open

tannergooding force-pushed the simd-getelement branch 3 times, most recently from 263de58 to 244c8da Compare July 3, 2024 05:56

Account for index 0 and scale 1

244c8da

This was referenced Jul 3, 2024

Test failure: GC\Features\HeapExpansion\Finalizer\Finalizer.cmd #102706

Open

TimeProviderTests.TestProviderTimer failed in CI #103459

Open

Remove the offset constant node when it's unused

ee2ef31

tannergooding marked this pull request as ready for review July 3, 2024 18:45
