Friday, 28 March 2014

Java Object Layout: A Tale Of Confusion

Buy the T-shirt
Following the twists and turns of the conversation on this thread in the Mechanical Sympathy mailing list highlights how hard it is to reason about object layout based on remembered rules. Mostly every person on the thread is right, but not completely right. Here's how it went...


The thread discusses False Sharing as described here. It is pointed out that the padding is one sided and padding using inheritance is demonstrated as solution. The merits of using inheritance vs. using an array and utilizing Unsafe to access middle element (see Disruptor's Sequence) vs. using AtomicLongArray to achieve the same effect are discussed (I think the inheritance option is best, as explored here). And then confusion erupts...

What's my layout?

At this point Peter L. makes the following point:
[...]in fact the best option may be.
class Client1 {
    private long value;
    public long[] padding = new long[5];
What follows is a series of suggestions on what the layout of this class may be.

Option 1: My recollections

I was too lazy to check and from memory penned the following:
[...] the ordering of the fields may end up shifting the reference (if it's 32bit or CompressedOop) next to the header to fill the pad required for the value field. The end layout may well be:
12b header
4b padding(oop)
8b value

Option 2: Simon's recollections

Simon B. replied:
[...] I thought Hotspot was laying out longs before
references, and that the object header was 8 Bytes.
So I would expect Client1 to be laid out in this way:
8B header
8B value
4B padding (oop)
[...] Am I out of date in object layout and header size ?

Option 3: TM Jee's doubts

Mr. Jee slightly changed the class and continued:
class Client1 {
    private long value;
    public long[] padding = new long[5]
    public Object[] o = new Object[1];
the memory layout should be something like
12b header (or is it 16b)
8b value
4b for the long[] (its just the reference which is 4b for compressed and 8b if not)
4b for the Object[] (again it's just the reference)
Is this right so far?
To which  Peter L. wisely replied:
Yes. But as you recognise the sizes of the header and sizes of references are not known until runtime.

Option 4: Check...

So I used JOL to check. And as it turns out we are all somewhat right and somewhat wrong...
I'm right for compressed oops (the default for 64bit):
Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Client1 object internals:
      0     4          (object header)
      4     4          (object header)
      8     4          (object header)
     12     4   long[] Client1.padding
     16     8     long Client1.value
     24     4 Object[] Client1.o
     28     4          (loss due to the next object alignment)

The header is 12b and the array reference is shifted up to save on space. But my casual assumption 32bit JVM layout will be the same is wrong.

Simon is right that the header is 8b (but only for 32bit JVMs) and that references will go at the end (for both 32bit and 64bit, but not with compressed oops):
Running 32-bit HotSpot VM.
Client1 object internals:
      0     4          (object header)
      4     4          (object header)
      8     8     long Client1.value
     16     4   long[] Client1.padding
     20     4 Object[] Client1.o

And finally with 64bit Mr. Jee is right too:
Running 64-bit HotSpot VM.
Client1 object internals:
      0     4          (object header)
      4     4          (object header)
      8     4          (object header)
     12     4          (object header)
     16     8     long Client1.value
     24     8   long[] Client1.padding
     32     8 Object[] Client1.o

And Peter is entirely right to point out the runtime is the crucial variable in this equation.


If you catch yourself wondering about object layout:
  1. Use JOL to check, it's better than memorizing rules
  2. Remember that 32/64/64+Oops are different for Hotspot, and other JVMs may have different layouts altogether
  3. Read another post about java memory layout

Wednesday, 26 March 2014

Where is my safepoint?

My new job (at Azul Systems) leads me to look at JIT compiler generated assembly quite a bit. I enjoy it despite, or perhaps because, the amount of time I spend scratching my increasingly balding cranium in search of meaning. On one of these exploratory rummages I found a nicely annotated line in the Zing (the Azul JVM) generated assembly:
gs:cmp4i [0x40 tls._please_self_suspend],0
jnz 0x500a0186
Zing is such a lady of a JVM, always minding her Ps and Qs! But why is self suspending a good thing?

Safepoints and Checkpoints

There are a few posts out there on what is a safepoint (here's a nice one going into when it happens, and here is a long quote from Mechnical Sympthy mailing list on the topic). Here's the HotSpot glossary entry:

A point during program execution at which all GC roots are known and all heap object contents are consistent. From a global point of view, all threads must block at a safepoint before the GC can run. (As a special case, threads running JNI code can continue to run, because they use only handles. During a safepoint they must block instead of loading the contents of the handle.) From a local point of view, a safepoint is a distinguished point in a block of code where the executing thread may block for the GC. Most call sites qualify as safepoints. There are strong invariants which hold true at every safepoint, which may be disregarded at non-safepoints. 
To summarize, a safepoint is a known state of the JVM. Many operations the JVM needs to do happen only at safepoints. The OpenJDK safepoints are global, while Zing has a thread level safepoint called a checkpoint. The thing about them is that at a safepoint/checkpoint your code must volunteer to be suspended to allow the JVM to capitalize on this known state.
What will happen while you get suspended varies. Objects may move in memory, classes may get unloaded, code will be optimized or deoptimized, biased locks will unbias.... or maybe your JVM will just chill for a bit and catch its breath. At some point you'll get your CPU back and get on with whatever you were doing.
This will not happen often, but it can happen which is why the JVM makes sure you are never too far from a safepoint  and voluntary suspension. The above instruction from Zing's generated assembly of my code is simply that check. This is called safepoint polling.
The safepoint polling mechanism for Zing is comparing a thread local flag with 0. The comparison is harmless as long as the checkpoint flag is 0, but if the flag is set to 1 it will trigger a checkpoint call (the JNZ following the CMP4i will take us there) for the particular thread. This is key to Zing's pause-less GC algorithm as application threads are allowed to operate independently.

Reader Safpoint

Having happily grokked all of the above I went looking for the OpenJDK safepoint.

Oracle/OpenJDK Safepoints

I was hoping for something equally polite in the assembly output from Oracle, but no such luck. Beautifully annotated though the Oracle assembly output is when it comes to your code, it maintains some opaqueness when it's internals are concerned. After some digging I found this:
test   DWORD PTR [rip+0xa2b0966],eax        # 0x00007fd7f7327000
                                                ;   {poll}
No 'please', but still a safepoint poll. The OpenJDK mechanism for safepoint polling is by accessing a page that is protected when requiring suspension at a safepoint, and unprotected otherwise. Accessing a
protected page will cause a SEGV (think exception) which the JVM will handle (nice explanation here). To quote from the excellent Alexey Ragozin blog:
Safepoint status check itself is implemented in very cunning way. Normal memory variable check would require expensive memory barriers. Though, safepoint check is implemented as memory reads a barrier. Then safepoint is required, JVM unmaps page with that address provoking page fault on application thread (which is handled by JVM’s handler). This way, HotSpot maintains its JITed code CPU pipeline friendly, yet ensures correct memory semantic (page unmap is forcing memory barrier to processing cores).
The [rip+0xa2b0966] addressing is a way to save on space when storing the page address in the assembly code. The address commented on the right is the actual page address, and is equal to the rip (Relative Instruction Pointer) + given constant. This saves space as the constant is much smaller than the full address representation. I thank Mr. Tene for clarifying that one up for me.
If we were to look at safepoint polls throughout the assembly of the same process they would all follow the above pattern of pointing at the same global magic address (via this local relative trick). Setting the magic page to protected will trigger the SEGV for ALL threads. Note that the Time To Safe Point (TTSP) is not reported as GC time and may prove a hidden performance killer for your application. The effective cost of this global safepoint approach goes up the more runnable (and scheduled) threads your application has (all threads must wait for a safepoint consensus before the operation to be carried out at the safepoint can start).

Find The Safpoint Summary

In short, when looking for safepoints in Oracle/OpenJDK assembly search for poll. When looking at Zing assembly search for _please_self_suspend.