For relevant upcoming changes see Automatic Heap Sizing for ZGC: https://openjdk.org/jeps/8329758
The 32x virtual-to-physical memory ratio plays into relocation and colored pointers (i.e., pointers where some bits serve as flag bits).
Putting the actual addresses in 44 bits out of 64 is a neat trick that relies on the allocator being aware of the mappings between physical and virtual addresses.
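To make the bit layout concrete, here's a minimal C sketch of the colored-pointer idea. The bit positions and flag names are illustrative assumptions for the sketch, not ZGC's exact encoding:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative layout: the low 44 bits hold the address, and the
     * bits just above them act as GC metadata ("colors"). With the
     * multi-mapping trick, each color value can select a different
     * virtual alias of the same physical memory, which is where the
     * virtual-to-physical ratio comes from. */
    #define ADDRESS_MASK ((UINT64_C(1) << 44) - 1)
    #define MARKED_BIT   (UINT64_C(1) << 44)
    #define REMAPPED_BIT (UINT64_C(1) << 45)

    static uint64_t address_of(uint64_t colored) {
        /* Strip the color bits before treating it as a plain address. */
        return colored & ADDRESS_MASK;
    }

    int main(void) {
        uint64_t p = UINT64_C(0x20000001000) | MARKED_BIT;
        printf("address %#llx, marked %d\n",
               (unsigned long long)address_of(p),
               (p & MARKED_BIT) != 0);
        return 0;
    }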
At the beginning of the 32-bit revolution, when the future was here but unevenly distributed, there was a lot of talk about how 32-bit pointers would fundamentally change how people wrote code. Among other things, it got rid of a bunch of odd bookkeeping, and if you don’t have to do the bookkeeping, you don’t have to write the code in a way that supports it, so you can do other things.
Not long after, someone asked what sort of interesting changes 64-bit would bring, and I’ve been keeping that question in the back of my mind ever since.
Aliasing memory multiple times in order to implement read or write barriers and make GC much cheaper is a pretty good one. Another one I know of: one of the secrets of the L4 microkernel is that its IPC speed comes substantially from reducing the TLB work needed to switch to another process in a different address space. L4 keeps processes in the same address space and swaps out only the access rights, which cuts the call overhead in half. It’s pretty easy to put a bunch of processes into a 64-bit address space and just throw each one a randomly located 4GB slice of it.
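That aliasing trick is easy to demonstrate on Linux. A minimal sketch, assuming memfd_create is available (glibc 2.27+, error handling kept minimal): the same physical pages are mapped at two virtual addresses, so a write through one view is readable through the other.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t len = 4096;

        /* One physical backing... */
        int fd = memfd_create("alias-demo", 0);
        if (fd < 0 || ftruncate(fd, len) != 0) return 1;

        /* ...mapped at two different virtual addresses. */
        char *view_a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *view_b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (view_a == MAP_FAILED || view_b == MAP_FAILED) return 1;

        /* Same page, two pointers whose high bits differ: the spare
         * bits can carry GC metadata while both aliases stay valid. */
        strcpy(view_a, "written through view_a");
        printf("view_b (%p) reads: %s\n", (void *)view_b, view_b);
        return 0;
    }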
Yeah, would love to see the CPU vendors invent some primitives to let user code pull those kinds of privilege isolation tricks within a single process and address space.
Something like: “From now on, code on these pages can only access data on these pages, and only return to/call into other code through these gates…”
That would be pretty cool. Something like the Win32 function GetWriteWatch, but implemented in hardware instead of the page fault handler (I assume).
https://learn.microsoft.com/en-us/windows/win32/api/memoryap...
Or some sort of special write barrier store op-code, idk.
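For comparison, roughly how the existing software-visible API gets used (a minimal sketch; error handling mostly omitted). The region has to be allocated with MEM_WRITE_WATCH up front, and the OS then reports which pages were dirtied:

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        SIZE_T len = 1 << 20;   /* 1 MiB write-tracked region */
        char *base = VirtualAlloc(NULL, len,
                                  MEM_RESERVE | MEM_COMMIT | MEM_WRITE_WATCH,
                                  PAGE_READWRITE);
        if (!base) return 1;

        base[0] = 1;            /* dirty the first page */
        base[3 * 4096] = 1;     /* and the fourth */

        PVOID dirty[16];
        ULONG_PTR count = 16;   /* in: capacity; out: pages returned */
        DWORD granularity;
        /* WRITE_WATCH_FLAG_RESET also clears the watch state, so the
         * next call reports only writes made after this one. */
        if (GetWriteWatch(WRITE_WATCH_FLAG_RESET, base, len,
                          dirty, &count, &granularity) == 0) {
            for (ULONG_PTR i = 0; i < count; i++)
                printf("dirty page at %p\n", dirty[i]);
        }
        return 0;
    }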
I've had some ideas about avoiding format validation in IPC receivers when the data is encoded by trusted code, which is also the only code that has the rights to send the IPC data / to connect to the receiver. I can't really think of an important problem that it would solve, though. DBus always validates received data, but it isn't really meant for, or well suited to, large amounts of data anyway.
What I’m looking for is a way for a process to de/re-escalate its privileges to access memory, without an expensive context switch being required at the transition. The CPU would simply enforce different rules based on (say) the high-order bits of the instruction pointer.
Imagine a server process that wants to run some elaborate third-party content parser. It’d be great to be sure that no matter how buggy or malicious that code is, it can’t leak the TLS keys.
Today, high-security architectures must use process isolation to achieve this kind of architectural guarantee, but even finely tuned IPC like L4’s is an order of magnitude slower than a predictable jump.
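x86’s memory protection keys (PKU), exposed on Linux via pkey_alloc/pkey_mprotect, get partway there: a thread can drop and regain access to tagged pages with a userspace register write instead of a syscall. A minimal sketch, with the caveat that the rights are per-thread state rather than keyed to the instruction pointer, and code that can run arbitrary instructions can rewrite PKRU itself, so this hardens rather than replaces process isolation:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 4096;
        char *secret = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (secret == MAP_FAILED) return 1;
        strcpy(secret, "TLS key material");

        /* Tag the page with a protection key (requires PKU-capable
         * hardware and kernel; pkey_alloc fails otherwise). */
        int pkey = pkey_alloc(0, 0);
        if (pkey < 0 ||
            pkey_mprotect(secret, len, PROT_READ | PROT_WRITE, pkey) != 0)
            return 1;

        /* Drop access before calling untrusted code: a register write
         * (WRPKRU on x86), not a context switch. */
        pkey_set(pkey, PKEY_DISABLE_ACCESS);
        /* untrusted_parser(input);  <- hypothetical; any load or store
         * touching `secret` in here would fault. */

        /* Regain access on return, again without entering the kernel. */
        pkey_set(pkey, 0);
        printf("still intact: %s\n", secret);
        return 0;
    }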
For a brief moment Intel supported MPX, which did something similar.
You can also play tricks with the virtualization hardware, but it needs kernel support.
Eventually we will get segments back again.
Thread-based protection seems like it at least should be possible.
Isn't not swapping page tables during a call precisely the optimization that KPTI had to turn off to mitigate Meltdown?
Is that something like the memory protection scheme on the Newton OS?
When your comment and the article refer to “physical” addresses, those are physical in the context of the JVM, right? To the OS they’re virtual addresses in the JVM process space?
Correct. ZGC has no way to escape the virtualization imposed by the kernel (assuming your hardware and kernel use an MMU).
Thank you for the answer, I was wondering that as well.