zafena development

October 9, 2009

The results displayed were generated using the debug build of Shark, with assertions enabled, that I made on the 6th of October, compared against release builds of the pure Zero C++ interpreter and the optimised Zero assembler interpreter, both from Icedtea6-1.6.1.

Did we gain anything from having a jumping shark? A quick peek at the graph reveals some 15x+ speed improvements, so yes! Yeah! The Shark JIT indeed got some sharp teeth in its jaws! I am quite delighted to see that some parts of the benchmark got a 25x+ speed boost!
There are still some rough spots that of course need polishing, so let me share some ideas on how to make the Shark JIT on ARM really shine.

As can be seen in the chart, Shark uses the Zero C++ interpreter before the methods are JIT-compiled, and the extra overhead of running the JIT makes the Zero interpreter run slower during program launch on a single-core ARM CPU. This penalty disappears once the initial warm-up has completed (somewhere around 300 to 500 compiled methods). New multi-core ARM Cortex-A9 CPUs do not have this penalty, since the compiler runs in a separate thread and can be scheduled on a CPU of its own.

Some quick ways to fix the warm-up issue:
0. First of all I want to state that these results were generated using a debug build of Shark. I have a build machine working on creating a release build as I type, so hopefully I will be able to generate some improved benchmark scores in the near future, especially with regard to the warm-up penalty.
1. A quick way to reduce the warm-up penalty would be to make Shark able to use the new assembler-optimised interpreter instead of the pure C++ interpreter found in Icedtea6-1.6.1. This could become a reality quite soon, since they both share the same in-memory structures. Using the new assembler optimisations would also make the Shark JIT more usable as a client JVM, where initial GUI performance is crucial, and in this GUI area the assembler interpreter really shines.
2. I have also identified some parts of the LLVM JIT that could be quickly improved to make it compile faster. Basically I want to make the LLVM tablegen generate better lookup tables to speed up instruction lowering; currently Shark spends quite a large amount of time here, running LLVM's ExecutionEngine::getPointerToFunction(). I think that some improved formatter code in the LLVM tablegen backend could quite quickly improve the autogenerated .inc files used for the target instruction lowering.
3. Examine the possibility of implementing a bytecode cache in Shark to jump-start the JIT even further. Making the JIT able to load precalculated LLVM IR or in-memory representations of the methods would reduce some of the JIT overhead at program launch.
4. Add a PassManager framework to Shark to simplify the LLVM IR before it reaches the JIT. The tricky part is to select which passes to use and in what order. If done correctly, this might both lower compile time and improve the quality of the generated machine code.
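
The cache idea in point 3 can be illustrated with a minimal sketch, written in Python rather than Shark's C++ for brevity. It assumes the cache is keyed on a digest of the method's bytecode and stored as files in a directory; the name cached_compile, the directory layout and the stand-in compile function are all hypothetical and not part of Shark or LLVM.

```python
import hashlib
import os
import tempfile


def cached_compile(bytecode, cache_dir, compile_fn):
    """Return (compiled, was_cached) for a method's bytecode.

    Hypothetical helper: the result of compile_fn is stored on disk under
    a SHA-256 digest of the bytecode, so a later launch can skip compiling.
    """
    key = hashlib.sha256(bytecode).hexdigest()
    path = os.path.join(cache_dir, key + ".bin")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read(), True      # cache hit: no compilation needed
    compiled = compile_fn(bytecode)    # cache miss: pay the JIT cost once
    with open(path, "wb") as f:
        f.write(compiled)
    return compiled, False


def fake_compile(bytecode):
    """Stand-in for the real JIT; it only reverses the bytes."""
    return bytecode[::-1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        method = bytes([0x2A, 0xB4, 0x00, 0x01, 0xAC])  # made-up bytecode
        # First "launch" compiles, second one is served from the cache.
        print(cached_compile(method, cache_dir, fake_compile)[1])
        print(cached_compile(method, cache_dir, fake_compile)[1])
```

A real cache would also have to be invalidated when the JVM, LLVM or the target CPU changes; the sketch ignores that on purpose.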

October 6, 2009

picture of the day!

The picture that made my day!

Ok.. so what happened?

xerxes@babbage-karmic:/wd/icedtea6/openjdk/build/linux-arm/bin$ ./java -version
java version "1.6.0_0"
OpenJDK Runtime Environment (IcedTea6 1.7pre-r2a3725ce72d4) (build 1.6.0_0-b16)
OpenJDK Shark VM (build 14.0-b16-product, mixed mode)

xerxes@babbage-karmic:/wd/icedtea6/openjdk/build/linux-arm/bin$ cat /proc/cpuinfo
Processor    : ARMv7 Processor rev 1 (v7l)
BogoMIPS    : 799.53
Features    : swp half thumb fastmult vfp edsp
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x2
CPU part    : 0xc08
CPU revision    : 1
Hardware    : Freescale MX51 Babbage Board
Revision    : 51011
Serial        : 0000000000000000

xerxes@babbage-karmic:/wd/llvm$ svn info
URL: http://llvm.org/svn/llvm-project/llvm/trunk
Repository Root: http://llvm.org/svn/llvm-project
Repository UUID: 91177308-0d34-0410-b5e6-96231b3b80d8
Revision: 82896
Node Kind: directory
Schedule: normal
Last Changed Author: edwin
Last Changed Rev: 82896
Last Changed Date: 2009-09-27 11:08:03 +0000 (Sun, 27 Sep 2009)

xerxes@babbage-karmic:/wd/llvm$ quilt diff
Index: llvm/lib/Target/ARM/ARMInstrInfo.td
===================================================================
--- llvm.orig/lib/Target/ARM/ARMInstrInfo.td    2009-10-06 12:35:26.000000000 +0000
+++ llvm/lib/Target/ARM/ARMInstrInfo.td    2009-10-06 12:36:03.000000000 +0000
@@ -645,7 +645,7 @@
 IIC_Br, "mov lr, pc\n\tbx $func",
 [(ARMcall_nolink GPR:$func)]>,
 Requires<[IsARM, IsNotDarwin]> {
-    let Inst{7-4}   = 0b0001;
+    let Inst{7-4}   = 0b0011;
 let Inst{19-8}  = 0b111111111111;
 let Inst{27-20} = 0b00010010;
 }

The last patch on LLVM is currently a hack. Basically it makes LLVM emit ARM BLX instructions instead of BX instructions for ARM::CALL_NOLINK. So why did this little hack make it work?
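
As a side note, the one-bit difference made by the patch can be checked by assembling both instruction words by hand. The sketch below is in Python for convenience; the function name is made up for illustration, while the bit fields are the ones from the .td entry in the diff above plus the standard ARM condition and register fields.

```python
def encode_branch_exchange(rm, link, cond=0xE):
    """Encode ARM-mode `bx Rm` (link=False) or `blx Rm` (link=True)."""
    if not 0 <= rm <= 15:
        raise ValueError("register number must be 0..15")
    word = cond << 28                          # Inst{31-28}: condition, 0xE = always
    word |= 0b00010010 << 20                   # Inst{27-20}, as in the .td entry
    word |= 0b111111111111 << 8                # Inst{19-8},  as in the .td entry
    word |= (0b0011 if link else 0b0001) << 4  # Inst{7-4}: the field the patch changes
    word |= rm                                 # Inst{3-0}: register holding the target
    return word


bx_r2 = encode_branch_exchange(2, link=False)
blx_r2 = encode_branch_exchange(2, link=True)
print(hex(bx_r2))            # 0xe12fff12
print(hex(blx_r2))           # 0xe12fff32
print(bin(bx_r2 ^ blx_r2))   # 0b100000, the two words differ only in bit 5
```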

In order to understand that, one has to find out what made Shark on ARM crash before…

Let's rewind time to a few days ago...

Hi, I have been enjoying myself inside gdb for some days, and I have now at least found the reason why the CPU ends up in garbage memory when running Shark on ARM.

The problem can be illustrated like this:

frame manager invokes jited code
entry_zero.hpp:57 invokes jit code at 0x67c9e990

jited code runs
0x67c9e990:    push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0x67c9e994:    sub    sp, sp, #12    ; 0xc
0x67c9e998:    ldr    r12, [r3, #756]
0x67c9e99c:    ldr    lr, [r3, #764]
0x67c9e9a0:    sub    r4, lr, #56    ; 0x38
0x67c9e9a4:    cmp    r4, r12
0x67c9e9a8:    bcc    0x67c9ebd0
0x67c9e9ac:    mov    r5, r3
0x67c9e9b0:    str    r2, [sp, #4]
0x67c9e9b4:    mov    r6, r0
0x67c9e9b8:    str    r4, [r5, #764]
0x67c9e9bc:    str    r4, [r4, #20]
0x67c9e9c0:    ldr    r0, [pc, #640]    ; 0x67c9ec48
0x67c9e9c4:    str    r0, [r4, #28]
0x67c9e9c8:    ldr    r0, [r5, #768]
0x67c9e9cc:    str    r0, [r4, #32]
0x67c9e9d0:    add    r0, r4, #32    ; 0x20
0x67c9e9d4:    str    r0, [r5, #768]
0x67c9e9d8:    str    r6, [r4, #16]
0x67c9e9dc:    ldr    r7, [r1]
0x67c9e9e0:    ldr    r0, [r1, #4]
0x67c9e9e4:    str    r0, [sp]
0x67c9e9e8:    ldr    r8, [r1, #8]
0x67c9e9ec:    ldr    r9, [r1, #12]
0x67c9e9f0:    ldr    r0, [r1, #16]
0x67c9e9f4:    str    r0, [sp, #8]
0x67c9e9f8:    ldr    r10, [r1, #20]
0x67c9e9fc:    ldr    r2, [pc, #584]    ; 0x67c9ec4c   <------ jit code calls a jvm function stored in this address
0x67c9ea00:    mov    r0, r1
0x67c9ea04:    bx    r2 <---------------------------   problem!  should have been blx!

(gdb) x 0x67c9ec4c
0x67c9ec4c:    0x40836d9c
(gdb) x 0x40836d9c
0x40836d9c <_ZN13SharedRuntime17OSR_migration_endEPi>:    0xe92d41f0
(gdb)

So let's check out _ZN13SharedRuntime17OSR_migration_endEPi

0x40836d9c <_ZN13SharedRuntime17OSR_migration_endEPi+0>:    push    {r4, r5, r6, r7, r8, lr}    <------  lr is backed up, but bx did not update lr
0x40836da0 <_ZN13SharedRuntime17OSR_migration_endEPi+4>:    ldr    r4, [pc, #284]    ; 0x40836ec4 <_ZN13SharedRuntime17OSR_migration_endEPi+296>
0x40836da4 <_ZN13SharedRuntime17OSR_migration_endEPi+8>:    ldr    r7, [pc, #284]    ; 0x40836ec8 <_ZN13SharedRuntime17OSR_migration_endEPi+300>
0x40836da8 <_ZN13SharedRuntime17OSR_migration_endEPi+12>:    ldr    r6, [pc, #284]    ; 0x40836ecc <_ZN13SharedRuntime17OSR_migration_endEPi+304>
0x40836dac <_ZN13SharedRuntime17OSR_migration_endEPi+16>:    add    r4, pc, r4
0x40836db0 <_ZN13SharedRuntime17OSR_migration_endEPi+20>:    ldr    r12, [r4, r7]
0x40836db4 <_ZN13SharedRuntime17OSR_migration_endEPi+24>:    ldr    r1, [r4, r6]
0x40836db8 <_ZN13SharedRuntime17OSR_migration_endEPi+28>:    ldr    r5, [r12]
0x40836dbc <_ZN13SharedRuntime17OSR_migration_endEPi+32>:    ldrb    r2, [r1]
0x40836dc0 <_ZN13SharedRuntime17OSR_migration_endEPi+36>:    add    r3, r5, #1    ; 0x1
0x40836dc4 <_ZN13SharedRuntime17OSR_migration_endEPi+40>:    cmp    r2, #0    ; 0x0
0x40836dc8 <_ZN13SharedRuntime17OSR_migration_endEPi+44>:    sub    sp, sp, #24    ; 0x18
0x40836dcc <_ZN13SharedRuntime17OSR_migration_endEPi+48>:    str    r3, [r12]
0x40836dd0 <_ZN13SharedRuntime17OSR_migration_endEPi+52>:    mov    r7, r0
0x40836dd4 <_ZN13SharedRuntime17OSR_migration_endEPi+56>:    bne 0x40836e74 <_ZN13SharedRuntime17OSR_migration_endEPi+216>
0x40836dd8 <_ZN13SharedRuntime17OSR_migration_endEPi+60>:    ldr    r2, [pc, #240]    ; 0x40836ed0 <_ZN13SharedRuntime17OSR_migration_endEPi+308>
0x40836ddc <_ZN13SharedRuntime17OSR_migration_endEPi+64>:    ldr    r12, [r4, r2]
0x40836de0 <_ZN13SharedRuntime17OSR_migration_endEPi+68>:    ldrb    r3, [r12]
0x40836de4 <_ZN13SharedRuntime17OSR_migration_endEPi+72>:    cmp    r3, #0    ; 0x0
0x40836de8 <_ZN13SharedRuntime17OSR_migration_endEPi+76>:    beq 0x40836e20 <_ZN13SharedRuntime17OSR_migration_endEPi+132>
0x40836dec <_ZN13SharedRuntime17OSR_migration_endEPi+80>:    ldr    r6, [pc, #224]    ; 0x40836ed4 <_ZN13SharedRuntime17OSR_migration_endEPi+312>
0x40836df0 <_ZN13SharedRuntime17OSR_migration_endEPi+84>:    ldr    r5, [r4, r6]
0x40836df4 <_ZN13SharedRuntime17OSR_migration_endEPi+88>:    add    r0, r4, r6
0x40836df8 <_ZN13SharedRuntime17OSR_migration_endEPi+92>:    tst    r5, #1    ; 0x1
0x40836dfc <_ZN13SharedRuntime17OSR_migration_endEPi+96>:    beq 0x40836e8c <_ZN13SharedRuntime17OSR_migration_endEPi+240>
0x40836e00 <_ZN13SharedRuntime17OSR_migration_endEPi+100>:    ldr    r5, [pc, #208]    ; 0x40836ed8 <_ZN13SharedRuntime17OSR_migration_endEPi+316>
0x40836e04 <_ZN13SharedRuntime17OSR_migration_endEPi+104>:    ldr    r3, [r4, r5]
0x40836e08 <_ZN13SharedRuntime17OSR_migration_endEPi+108>:    cmp    r3, #0    ; 0x0
0x40836e0c <_ZN13SharedRuntime17OSR_migration_endEPi+112>:    movne r0, r3
0x40836e10 <_ZN13SharedRuntime17OSR_migration_endEPi+116>:    ldrne r6, [r3]
0x40836e14 <_ZN13SharedRuntime17OSR_migration_endEPi+120>:    ldrne r12, [r6, #16]
0x40836e18 <_ZN13SharedRuntime17OSR_migration_endEPi+124>:    movne lr, pc
0x40836e1c <_ZN13SharedRuntime17OSR_migration_endEPi+128>:    bxne    r12
0x40836e20 <_ZN13SharedRuntime17OSR_migration_endEPi+132>:    add    r6, sp, #20    ; 0x14
0x40836e24 <_ZN13SharedRuntime17OSR_migration_endEPi+136>:    mov    r0, r6
0x40836e28 <_ZN13SharedRuntime17OSR_migration_endEPi+140>:    bl 0x40596c84 <NoHandleMark>
0x40836e2c <_ZN13SharedRuntime17OSR_migration_endEPi+144>:    mov    r0, sp
0x40836e30 <_ZN13SharedRuntime17OSR_migration_endEPi+148>:    bl 0x4057909c <JRT_Leaf_Verifier>
0x40836e34 <_ZN13SharedRuntime17OSR_migration_endEPi+152>:    ldr    r3, [pc, #160]    ; 0x40836edc <_ZN13SharedRuntime17OSR_migration_endEPi+320>
0x40836e38 <_ZN13SharedRuntime17OSR_migration_endEPi+156>:    mov    r5, sp
0x40836e3c <_ZN13SharedRuntime17OSR_migration_endEPi+160>:    ldr r12, [r4, r3]
0x40836e40 <_ZN13SharedRuntime17OSR_migration_endEPi+164>:    ldrb r0, [r12]
0x40836e44 <_ZN13SharedRuntime17OSR_migration_endEPi+168>:    cmp    r0, #0    ; 0x0
0x40836e48 <_ZN13SharedRuntime17OSR_migration_endEPi+172>:    movne r0, r7
0x40836e4c <_ZN13SharedRuntime17OSR_migration_endEPi+176>:    blne 0x4039b20c <_Z15trace_heap_freePv>
0x40836e50 <_ZN13SharedRuntime17OSR_migration_endEPi+180>:    mov    r0, r7
0x40836e54 <_ZN13SharedRuntime17OSR_migration_endEPi+184>:    bl 0x407b6a94 <_ZN2os4freeEPv>
0x40836e58 <_ZN13SharedRuntime17OSR_migration_endEPi+188>:    mov    r0, sp
0x40836e5c <_ZN13SharedRuntime17OSR_migration_endEPi+192>:    bl 0x40578c5c <~JRT_Leaf_Verifier>
0x40836e60 <_ZN13SharedRuntime17OSR_migration_endEPi+196>:    mov    r0, r6
0x40836e64 <_ZN13SharedRuntime17OSR_migration_endEPi+200>:    bl 0x40596b04 <~NoHandleMark>
0x40836e68 <_ZN13SharedRuntime17OSR_migration_endEPi+204>:    add    sp, sp, #24    ; 0x18
0x40836e6c <_ZN13SharedRuntime17OSR_migration_endEPi+208>:    pop {r4, r5, r6, r7, r8, lr}
0x40836e70 <_ZN13SharedRuntime17OSR_migration_endEPi+212>:    bx    lr <------  and woho, let's enjoy a trip to garbage memory!

So when the function that the JIT-compiled code calls returns, we find ourselves executing garbage memory.

So the small hack fixed this issue quite well, but broke ARMv4T compatibility for the moment, since BLX is only available from ARMv5T onwards.
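
The difference between the two call instructions can be modelled in a few lines of Python. This is only a toy model of the link register, not a real emulator: the function name is made up, and the stale lr value is a placeholder because the post does not show it. As far as the listing shows, lr was last written by the `ldr lr, [r3, #764]` at 0x67c9e99c, so at the call it holds data rather than a return address.

```python
def resume_address(call_insn, call_site, target, stale_lr):
    """Model where execution resumes after the callee returns with `bx lr`.

    call_insn is "bx" or "blx"; all addresses are plain integers.
    """
    lr = stale_lr                # whatever lr held before the call
    if call_insn == "blx":
        lr = call_site + 4       # blx saves the address of the next instruction
    elif call_insn != "bx":
        raise ValueError("expected 'bx' or 'blx'")
    pc = target                  # both instructions branch to the target

    saved_lr = lr                # callee prologue: push {..., lr}
    lr = saved_lr                # callee epilogue: pop {..., lr}
    pc = lr                      # callee returns with bx lr
    return pc


CALL_SITE = 0x67C9EA04   # address of the `bx r2` in the listing
TARGET = 0x40836D9C      # SharedRuntime::OSR_migration_end
STALE_LR = 0x0BADF00D    # placeholder, the real value is not shown in the post

print(hex(resume_address("bx", CALL_SITE, TARGET, STALE_LR)))   # 0xbadf00d
print(hex(resume_address("blx", CALL_SITE, TARGET, STALE_LR)))  # 0x67c9ea08
```

With bx the callee faithfully saves and restores a value that was never a return address, which matches the crash seen in gdb; with blx it returns to the instruction after the call.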

My next task is to fix this properly in LLVM.

September 24, 2009

4 running ARM development boards are hidden in this picture, can you find them?

The ARM development gear that I have access to has been rapidly upgraded during the past months, and I no longer experience any memory or storage bottlenecks. The boards in the picture are running native compiles, checks and debugging sessions 24/7 to help deliver the next generation of OpenJDK ARM builds using Zero and Shark, and to make sure the software that will power our future is as stable and fast as possible. I believe these small, cool and silent boards will help us save the environment as well, since this new energy-efficient technology enables the transition to an information society where the whole IT infrastructure can become self-supplying from simple solar panels! Thanks to free software like Linux and GNU running on these small power-efficient computers, the new green computing future becomes possible!

September 18, 2009

Dear Jalimo users!

I have pushed a quite massive patch into the Jalimo source tree to bring all OpenJDK 6b16 recipes in sync with, and able to cross-compile, the latest Icedtea6-1.6.1 release!

So… What’s New?
-----------------
- Security fixes for:
CVE-2009-2670 – OpenJDK Untrusted applet System properties access
CVE-2009-2671 CVE-2009-2672 – OpenJDK Proxy mechanism information leaks
CVE-2009-2673 – OpenJDK proxy mechanism allows non-authorized socket connections
CVE-2009-2674 – Java Web Start Buffer JPEG processing integer overflow
CVE-2009-2675 – Java Web Start Buffer unpack200 processing integer overflow
CVE-2009-2625 – OpenJDK XML parsing Denial-Of-Service
CVE-2009-2475 – OpenJDK information leaks in mutable variables
CVE-2009-2476 – OpenJDK OpenType checks can be bypassed
CVE-2009-2689 – OpenJDK JDK13Services grants unnecessary privileges
CVE-2009-2690 – OpenJDK private variable information disclosure
- FAST interpreter for ARM, now with gcc 4.1.2 support!
- Timezone fix: http://icedtea.classpath.org/bugzilla/show_bug.cgi?id=377
- Stackoverflow error fix: http://icedtea.classpath.org/bugzilla/show_bug.cgi?id=381
- Backport regression (NPE) fix for AccessControlContext fix
- Bump to hs14b16

The following people helped with this release:
Gary Benson, Deepak Bhole, Andrew Haley, Andrew John Hughes, Mark
Wielaard, Lillian Angel, Matthias Klose, Ed Nevill, and many others.

We would also like to thank the bug reporters and testers!

Cheers and have a great day!
Xerxes

September 15, 2009

The Haiku team finally made it: they have released their first spin of the free software implementation of “BeOS”!
The R1/alpha 1 release and source can be fetched from: http://www.haiku-os.org/get-haiku
Dig around and you will even find a long-anticipated status update on their ARM port, where they can now display a booting kernel and framebuffer!

September 13, 2009

During the past month I have been running a public llvm-arm-linux buildbot in order to iron out the remaining bugs in the LLVM Execution Engine JIT for ARM.
My goal was to stabilise the LLVM JIT so that it can be used to speed up cool projects like OpenJDK on ARM, by fixing all the prerequisites for running Gary Benson’s Shark JIT compiler on top of Zero!

I have been following the LLVM project for about a year, and seeing the following reports from the buildbot makes me jump for joy! It marks a new era in which all cool and silent energy-efficient computing on ARM can be JIT-accelerated!

  • (Sep 12 21:57) rev=[81669] success #153: build successful
  • (Sep 12 19:17) rev=[81660] success #152: build successful
  • (Sep 12 16:31) rev=[81655] failure #151: failed test-llvm
  • (Sep 12 13:03) rev=[81626] failure #148: failed test-llvm

The next LLVM release, 2.6, comes out in about a week (21 September 2009), and I feel I have done my part in the LLVM stabilisation process for ARM. It is now up to the LLVM 2.6 release managers to merge the patches from the 2.7 svn trunk into the release branch in order to make the LLVM 2.6 release stable on ARM as well.

Life is cool!

June 10, 2009

IcedTea is served: openjdk/build/linux-i586
mkdir -p stamps
touch stamps/icedtea.stamp
printf -- '-cacao ERROR\n' >> openjdk/build/linux-i586/j2sdk-image/jre/lib/i386/jvm.cfg
touch stamps/add-cacao.stamp
printf -- '-zero ERROR\n' >> openjdk/build/linux-i586/j2sdk-image/jre/lib/i386/jvm.cfg
touch stamps/add-zero.stamp
xerxes@labbserver:~/icedtea6$ openjdk/build/linux-i586/j2sdk-image/bin/java -version
java version "1.6.0_0"
OpenJDK Runtime Environment (IcedTea6 1.5-rce70ed27635c) (build 1.6.0_0-b16)
OpenJDK Shark VM (build 14.0-b15, mixed mode)
xerxes@labbserver:~/icedtea6$

May 10, 2009

I have just pushed an update to the Jalimo project that makes the new OpenJDK 6 b16 source bundle cross-compilable for embedded devices, using Jalimo as a cross-compile layer for Icedtea6.

Using Jalimo you can now cross-compile OpenJDK b16 and have hotspot + zero, hotspot + shark or cacao as the vm built out of the box, simply awesome!

Since Shark uses the pre-2.6 LLVM sources for its JIT, I have also prepared “.bb” build recipes for OpenEmbedded that enable quick cross-compilation of LLVM based on the LLVM svn trunk, so that Jalimo can make use of them when building Shark.

The Shark VM is built with assertions enabled in order to produce better debug output for all Jalimo users.

Robert Schuster has been an excellent tutor in helping me understand all the quirks of OE recipes, which in turn helped me create all these nice new cross-compile recipes for OE and Jalimo. Thank you Robert, and thank you for pushing the LLVM recipes into the main OE dev git tree!

Andrew Haley and Gary Benson have helped me enormously to understand the lock-free code using memory barriers that is part of the Zero and Shark HotSpot implementations. I will keep working on these parts in order to make Zero and Shark rock solid on ARM before ARM Cortex-A9 multi-core CPUs become part of the pocket of every person who loves cool and silent computing.

April 15, 2009

Why go to hell in the first place?
During the past year I have involved myself in the effort to port OpenJDK to various embedded systems using the Icedtea build system. Unfortunately the embedded development boards that I had access to during the last year were equipped with inadequate amounts of RAM, making it practically impossible to build Icedtea directly on native hardware. I learnt how to work around this by using QEMU to emulate hardware with more RAM. Now the compilation of Icedtea was doable, yet it still took a week to compile Icedtea6 using QEMU. Urgh…

I spent some time in my personally created emulator hell, watching time pass by and always feeling behind, not working on the current code base like all the other free Java hackers. I finally decided something had to be done about this and wanted to break free.

My liberation was made possible by first scouting for a build environment suitable for rapid cross-compilation and porting of OpenJDK to any hardware architecture imaginable, including your toaster. Luckily for me, I got to know about the Jalimo project and even got the chance to meet one of its core developers, Robert Schuster, during FOSDEM09. Robert demonstrated how easy embedded Java development could be using the Jalimo infrastructure, and it provided me with all the tools I needed to speed up my compile, run and test cycle. I was no longer in need of an emulator; instead I could build binaries swiftly using the full power of an affordable IA32 quad-core CPU (with 12 MB of cache) and deploy my work for testing and debugging on real hardware within hours, not days. Simply bliss.

It took some time for me to understand how the four required pieces (OpenEmbedded, bitbake, Jalimo and my own goal) could be merged; it turned out they were designed to fit!
First OpenEmbedded, bitbake and Jalimo were all downloaded from their respective svn or git trees.
The only tricky part was that I (the user) need to provide configuration files containing the basic information about what kind of target I want to cross-compile for, and specify which bundles of recipes I want to use to accomplish this. Basically I had to express my mind in a way that bitbake understood.

Once this was setup I could stand in any directory and start the build by simply typing:
bitbake openjdk-6
… or any other software package, as long as I knew the name of the recipe to use.
Even bitbake openjdk-6-shark builds out of the box!

Within a day of playing with bitbake, my build computer had downloaded 2 GB of source code and eaten several gigabytes of hard drive space. Rather cool… It had built all the cross-compiler tools I needed and compiled all dependent libraries from scratch, all optimised for the target hardware that I wanted to run and debug the binaries on.
The final result was then obtained in the temp directory of my choice.

I have documented my cross-compilation experiences on the Icedtea wiki:
http://icedtea.classpath.org/wiki/CrossCompileFaq

Cheers and have a great day!
Xerxes

February 28, 2009

Cheers!

Cheers to you from Xerxes and JeNI on this flashy picture!

I have prepared for your amusement some photo galleries from the pictures I took at FOSDEM 09.
# Free Java - FOSDEM 09 photography - Arrival and first day
# Free Java - FOSDEM 09 photography - First night and dinner
# Free Java - FOSDEM 09 photography - Second day
# Free Java - FOSDEM 09 photography - The day after - a great experience

It was thrilling meeting you all during the event, many thanks to Sun for sponsoring the Free Java devroom dinner!
I will fill in more photos for the second day, so stay tuned. If you are pictured in any of these photos and would prefer not to be, please let me know ASAP: xerxes at zafena dot se.
