Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VirtualReserve is allocating 256GB on illumos rather than "reserving" #104211

Open
am11 opened this issue Jun 30, 2024 · 11 comments
Open

VirtualReserve is allocating 256GB on illumos rather than "reserving" #104211

am11 opened this issue Jun 30, 2024 · 11 comments
Labels
area-PAL-coreclr os-SunOS SunOS, currently not officially supported
Milestone

Comments

@am11
Copy link
Member

am11 commented Jun 30, 2024

Split from #34944 (comment)

We should improve this implementation to support illumos properly, so it doesn't require setting the upper limit for heap memory alloc.

With linux procfs we get more accurate or up-to-date stats than sysconf. That's why it's only used in fallback. Since on illumos procfs is binary based, it bails out early from ReadMemAvailable() function.

If there are similar concerns with sysconf() on illumos, kstat might be more accurate; something like this:

#elif defined(__sun)
    // Use kstat to get available memory on Illumos
    kstat_ctl_t *kc;
    kstat_t *ksp;
    kstat_named_t *knp;
    uint64_t free_memory = 0;

    if ((kc = kstat_open()) != NULL) {
        if ((ksp = kstat_lookup(kc, "unix", 0, "system_pages")) != NULL) {
            if (kstat_read(kc, ksp, NULL) != -1) {
                if ((knp = (kstat_named_t *)kstat_data_lookup(ksp, "pagesfree")) != NULL) {
                    free_memory = (uint64_t)knp->value.ui64 * sysconf(_SC_PAGESIZE);
                }
            }
        }
        kstat_close(kc);
    }

    if (free_memory == 0)  available = sysconf(SYSCONF_PAGES) * sysconf(_SC_PAGE_SIZE);

    available = free_memory;
@am11 am11 added area-PAL-coreclr os-SunOS SunOS, currently not officially supported labels Jun 30, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jun 30, 2024
@gwr
Copy link

gwr commented Jun 30, 2024

Actually sysconf(_SC_AVPHYS_PAGES) gets us exactly the value we want here, and is a much simpler interface (and thread-safe etc).

@gwr
Copy link

gwr commented Jun 30, 2024

It looks like we might be better off using sysconf on Linux as well.
Here's a handy little test program:

#include <stdio.h>
#include <unistd.h> // sysconf

int
main(int argc, char **argv)
{
	long mem, pgs;

	pgs = sysconf(_SC_AVPHYS_PAGES);
	mem = pgs * (getpagesize() / 1024);
	printf("Available pages: %lu\n", pgs);
	printf("Available KB: %lu\n", mem);
	return (0);
}

And comparing results:

gwr@ubuntu18:/g/ws/dotnet$ head -3 /proc/meminfo
MemTotal:        8116820 kB
MemFree:         2923380 kB
MemAvailable:    7320288 kB
gwr@ubuntu18:/g/ws/dotnet$ ./getavail.linux 
Available pages: 730782
Available KB: 2923128

so it appears that on Linux, that sysconf gets you MemFree.
I think that's generally what we want here anyway.

I ran this on a VM with 8BG physical ram. That 7+ GB of RAM reported by MemAvailable is questionable, but maybe that's the right thing to use on Linux. I'm not sure.

@am11
Copy link
Member Author

am11 commented Jun 30, 2024

Not sure I understood why it is requiring us to set DOTNET_GCHeapHardLimit=<something smaller than 256 GB> on illumos if sysconf is returning the correct value. Isn't that exactly what it ends up using on illumos change (procfs path returns false on illumos). Of course "tidying up the implementation" vs. "fixing the heap mem issue for good" are two different aspects of it, latter of which is more interesting for the moment. ;)

@gwr
Copy link

gwr commented Jun 30, 2024

Here is similar data for an 8GB VM running illumos:

gwr@oi-work:/g/ws/dotnet$ kstat -m unix -n system_pages
module: unix                            instance: 0     
name:   system_pages                    class:    pages
        availrmem                       1114195
        crtime                          0
        desfree                         16366
        desscan                         25
        econtig                         4225150976
        fastscan                        1047486
        freemem                         1178999
        kernelbase                      0
        lotsfree                        32733
        low_mem_scan                    0
        minfree                         12274
        n_throttle                      0
        nalloc                          33109279
        nalloc_calls                    17586
        nfree                           32778862
        nfree_calls                     9355
        nscan                           0
        pagesfree                       1178999
        pageslocked                     980778
        pagestotal                      2094973
        physmem                         2094974
        pp_kernel                       775618
        slowscan                        100
        snaptime                        49307.761938326

And here's what that sysconf says:

gwr@oi-work:/g/ws/dotnet$ ./getavail.illumos 
Available pages: 1179147
Available KB: 4716588

So a little over 4GB free memory.

@gwr
Copy link

gwr commented Jun 30, 2024

OK, when illumos actually uses sysconf(_SC_AVPHYS_PAGES) in
GetAvailablePhysicalMemory() then the heap size stuff works correctly.

Have a look at the latest in this comparison. This is for debugging.
We could probably clean this up and integrate something like it.
main...gwr:dotnet-runtime:illumos2

Rather than bother with config variables for HAVE__SC_AVPHYS_PAGES and
HAVE__SC_PHYS_PAGES we could use defined(...) as shown here.

I'm not sure what went wrong with the config variables, but apparently something did.
The cross build stuff makes it hard to know which config header is used where.

@am11
Copy link
Member Author

am11 commented Jun 30, 2024

OK, when illumos actually uses sysconf(_SC_AVPHYS_PAGES) in
GetAvailablePhysicalMemory() then the heap size stuff works correctly.

Is it not using sysconf without the change (i.e. ReadMemAvailable() returns false on illumos and we hit the fallback)?

IOW, is this change saving us from setting DOTNET_GCHeapHardLimit or is the change only saving one call to ReadMemAvailable() that always returns false on the platform?

@gwr
Copy link

gwr commented Jun 30, 2024

Looks like my changes were unnecessary. I must have been confused somehow. Maybe running a different build than I thought? Anyway the GetAvailablePhysicalMemory code seems to be OK as it is.
Here's what happens with the current code:

gdb dotnet
GNU gdb (GDB) 14.2
[...]
Reading symbols from dotnet...
(gdb) break GetAvailablePhysicalMemory 
Function "GetAvailablePhysicalMemory" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (GetAvailablePhysicalMemory) pending.
(gdb) run helloworld/bin/Debug/net9.0/helloworld.dll
Starting program: /tank/ws/dnt/dotnet helloworld/bin/Debug/net9.0/helloworld.dll

Thread 2 hit Breakpoint 1, GetAvailablePhysicalMemory ()
    at /g/ws/dotnet/runtime/src/coreclr/gc/unix/gcenv.unix.cpp:1236
1236	{
(gdb) next
1237	    uint64_t available = 0;
(gdb) next
1265	    if (tryReadMemInfo)
(gdb) next
1269	        tryReadMemInfo = ReadMemAvailable(&available);
(gdb) next
1272	    if (!tryReadMemInfo)
(gdb) next
1276	        available = sysconf(SYSCONF_PAGES) * sysconf(_SC_PAGE_SIZE);
(gdb) next
1280	    return available;
(gdb) print available / 1024
$1 = 4298396
(gdb) cont
Continuing.
[New LWP    6        ]
[New LWP    7        ]
Hello, World!
[Inferior 1 (process 2368    ) exited normally]

That was running with this code:
This is with my illumos1 branch: main...gwr:dotnet-runtime:illumos1
That has no changes in coreclr/gc/unix/gcenv.unix.cpp (just other chanes I needed to unblock my work)

I guess we can close this issue. Thanks.

@am11
Copy link
Member Author

am11 commented Jun 30, 2024

Thank you for confirming. Glad to know that we don't need DOTNET_GCHeapHardLimit=1C00000 or via runtimeconfig.json file anymore. :)
(I'm currently on arm64 machine, and emulating an x64 VM is quite slow but I'll try once I'm on amd64 machine)

This is with my illumos1 branch: main...gwr:dotnet-runtime:illumos1

It would be nice if we could call mlock under vfork() as you suggested (am11@f132097?w=1) or other types of fork/clone to see if it emulates IPI and saves us from requiring mlock privileges. That would fix couple of issues with linux and freebsd jail as well (e.g. #44417). Dealing with mlock() has been bit of a challenge across the board so it will be a general goodness IMHO.

@am11 am11 closed this as completed Jun 30, 2024
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Jun 30, 2024
@gwr
Copy link

gwr commented Jul 2, 2024

It turns out GetAvailablePhysicalMemory() was fine, but the mmap call needed MAP_NORESERVE.
Should we reopen this? Get a new issue?

@am11
Copy link
Member Author

am11 commented Jul 2, 2024

Could you please open a pull request with that change?

@am11 am11 reopened this Jul 2, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jul 2, 2024
gwr added a commit to gwr/dotnet-runtime that referenced this issue Jul 2, 2024
Need MAP_NORESERVE for the 256G mmap calls on illumos
(and maybe some other platforms)
gwr added a commit to gwr/dotnet-runtime that referenced this issue Jul 2, 2024
Need MAP_NORESERVE for the 256G mmap calls on illumos
(and maybe some other platforms)
gwr added a commit to gwr/dotnet-runtime that referenced this issue Jul 2, 2024
Need MAP_NORESERVE for the 256G mmap calls on illumos
(and maybe some other platforms)
@gwr
Copy link

gwr commented Jul 2, 2024

The PR is up. 104275
As I mentioned there, I might suggest changing the issue title. (I don't appear to be able to do that.) I'd prefer:
Need MAP_NORESERVE in VirtualReserve for illumos
or something like that. Or maybe:
VirtualReserve of 256GB fails on illumos

gwr added a commit to gwr/dotnet-runtime that referenced this issue Jul 2, 2024
Need MAP_NORESERVE for the 256G mmap calls on illumos,
(and maybe some other platforms) otherwise it tries to
allocate physical pages, which will fail on most VMs.
gwr added a commit to gwr/dotnet-runtime that referenced this issue Jul 2, 2024
Need MAP_NORESERVE for the 256G mmap calls on illumos,
(and maybe some other platforms) otherwise it tries to
allocate physical pages, which will fail on most VMs.
@am11 am11 changed the title GetAvailablePhysicalMemory() on illumos VirtualReserve is allocating 256GB on illumos rather than "reserving" Jul 2, 2024
gwr added a commit to gwr/dotnet-runtime that referenced this issue Jul 2, 2024
Need MAP_NORESERVE for the 256G mmap calls on illumos,
(and maybe some other platforms) otherwise it tries to
allocate physical pages, which will fail on most VMs.
gwr added a commit to gwr/dotnet-runtime that referenced this issue Jul 2, 2024
Need MAP_NORESERVE for the 256G mmap calls on illumos,
(and maybe some other platforms) otherwise it tries to
allocate physical pages, which will fail on most VMs.
@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Jul 3, 2024
@mangod9 mangod9 added this to the Future milestone Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-PAL-coreclr os-SunOS SunOS, currently not officially supported
Projects
None yet
Development

No branches or pull requests

3 participants