zafena development

February 6, 2013

I have just returned from FOSDEM where we Julen Gouesse, Sven Gothel and Xerxes Rånby presented a JogAmp freejava love talk with some live demonstrations running hardware accelerated on x86 laptop, android/meego phone/tables and GNU/LInux systems such as the AC100 and RaspberryPi.
Slides, Video and Teaser from the JogAmp FOSDEM freejava love talk are now online:

If you want to design a game then we recommend you to use JogAmp indirect through a game engine library such as libgdx or JMonkeyEngine 3. We have a forum thread inside the JogAmp community where we work on improving engine support for embedded devices such as the Raspberry Pi using said engines. By using a game engine will get you up to speed developing professional games that can be run across all devices and formfactors.
The video and teaser recordings also includes footage of the JMonkeyEngine 3 JOGL 2 backend initialized by Julien Gouesse that we for time reason never managed to show during the strict 40min talk and live demo at FOSDEM.

Inside the FOSDEM talk we ran the open-source libgdx game pax-britannica using the new libgdx JogAmp JOGL 2 port initialized by Julien Gouesse in combination with the new hardfloat fixed CACAO ARM JVM found in the new IcedTea 6 1.12 release on the Raspberry Pi.
we also ran the JogAmp Jake2 port, a port done by Sven Gothel, using the armhf JamVM from IcedTea 7 on the ac100.
Both opensource games of course rocked running using freejava!
The point that we wanted to show is that if you start to use the dedicated love using the media accelerator found in all new devices your java applications rendering will run *equally* fast regardless of the JVM implementation, since the rendering is then performed by the GPU instead of the CPU.

For demonstation purposes: I had to extend the libgdx backend with a custom mouse pointer in order for otherwise touchscreen oriented games such as pax-britannica to work on the Raspberry Pi from console, the reason why this is needed is because there is no highlevel compositing windowmanager running on the RaspPi adding the overlay mousepointer for you like what you are custom to see when running desktop applications using X11. This libgdx RaspberryPi mouse pointer branch allowed me to test all touch oriented libgdx games and demos from console using a mouse input device.

While we know that compilation of custom JVM can be tricky I have prepared a RaspberryPi armhf CACAO that you can use as an drop in replacement into any OpenJDK 6 installation (/usr/lib/jvm/java-6-openjdk-armhf/jre/lib/arm/server/ This was built using the IcedTea 6 1.12 release from inside a Raspbian chroot.
For JamVM simply install the openjdk-7-jdk package and run it using java -jamvm its already built and packaged by the Raspbian team and work great! The new CACAO armhf is found here:

The Raspberry Pi Rasbian armhf distribution have now packaged IcedTea 6 1.12.1 and included it in the distribution, this means that you can test the new armhf CACAO JVM and JamVM JVM by simply installing the openjdk-6-jdk package.

sudo apt-get update
sudo apt-get install openjdk-6-jdk
java -jamvm -version
java -cacao -version

Also a great KUDOS for Qun, our dear camera-woman!

Cheers and enjoy the love!
On behalf of the JogAmp community – Xerxes

February 27, 2012

Today JogAmp added a workaround to deal with GPU drivers that reports a bogus 0Hz screen refresh rate. With this fix in place hardware acceleration are working out of the box on Nokia N9 MeeGo phones in combination with the Nokia compiled Imaginative Technologies SGX 530 GPU drivers!

If you have OpenJDK installed on any ARMv7 board with a proper OpenGL-ES libEGL and libGLES driver setup then you can try running this for yourself by using my prebuilt jogamp-armv7 jars.


tar zxvf jogamp-armv7.tar.gz

cd jogamp

sh ./

Source and build instructions are available.

JogAmp JOGL OpenGL-ES Driver compatiblity matrix

I am tracking ARMv7 libEGL/libGLES* GPU drivers compatiblity with JogAmp here:

Chuck Norris force you to use the produced jars from the JogAmp “Chuck Norris” build-bot! uses gluegen build 510

Assemble a ARMv7 jogamp testfolder using the JogAmp daily build:



7z x gluegen-2.0-b510-20120225-linux-armv7.7z

7z x jogl-2.0-b684-20120227-linux-armv7.7z

mkdir -p jogamp/jar
cp -r jogl*/etc jogamp/etc/
cp gluegen*/jar/*.jar jogamp/jar
cp gluegen*/lib/* jogamp/jar
cp jogl*/jar/*.jar jogamp/jar
cp jogl*/lib/lib* jogamp/jar
cp /usr/share/java/hamcrest-core.jar jogamp/
cp /usr/share/java/junit4.jar jogamp/

cd jogamp

java -cp jar/gluegen.jar:jar/jogl.all-mobile.jar:jar/jogl.test.jar:hamcrest-core.jar:junit4.jar com/jogamp/opengl/test/junit/jogl/demos/es2/newt/TestGearsES2NEWT -time 40000


February 15, 2012

Jim Connors at Oracle posed a interesting valentines gift, a compare of the latest open-source OpenJDK ARM JVM inside IcedTea6, 1.12pre, HEAD against their closed source Hotspot c1 and c2 implementations.

The Oracle blog antispam system in use...

I would have liked to comment directly on your blog but your spam system kept me at bay so i posed my reply to you here instead ;)

The OpenJDK Zero *mixed-mode* JVM used in Jims compare includes the now re-maintained ARM Thumb2 JIT and assembler interpreter port that got re-introduced in the IcedTea6-1.11 release.
Many of the OpenJDK JVM like CACAO and JamVM are by design tuned for embedded and client use and thus show strength in both low memory overhead and fast startup time.

When testing JVM performance on ARM its important to remember that the default optimization settings used by the compilers to build the JVM do matter.

The Debian 6.0.4 squeeze “armel” distribution use ARMv4t optimization by default. This low optimization level enable the Debian built packages run on as many kind of different ARM broads and CPU’s as possible. The trade-off are that you basically disable all VFP, floating point, optimizations and make synchronization code slower by forcing the JVM to call the Linux kernel helper instead of using faster ARMv7 atomic instructions directly.

To give OpenJDK JVM a better match i would suggest re-running the benchmark using OpenJDK built on top of Debian wheezy “armhf”, Ubuntu Precise “armhf” or Fedora F15 that by default optimize for the ARMv7 thumb2 instruction-set and make use of the VFP unit inside the CPU, also the “armhf” ABI allows better argument passing between library functions inside the CPU VFP registers. Two OpenJDK JVM, JamVM and Zero,  are already updated to support the new “armhf” hardfloat ABI.

You could also choose to run this benchmark using OpenJDK JVMs built using the Ubuntu Precise “armel” tool-chains that still use the legacy soft-float ABI while still adding ARMv7 Thumb2 and VFP optimizations. All OpenJDK JVM tested in this compare would run better by simply using a higher optimization level during the build.

All in all thank you Jim to give an introduction to the ARM OpenJDK porting effort, I look forward to the follow up article where all the JVM makers have picked their favourite GCC/Clang/Foo compiler options and suitable matching compile flags. One idea are to create a OpenJDK binary release with a custom OpenJDK installer that would ease testing of tuned OpenJDK JVM implementations.

Cheers, Xerxes

December 2, 2011

I have been following the CACAO JVM development on ARM since 2008, back then CACAO was one of the first alternative JVM, to be used instead of Hotspot in combination with the OpenJDK 6 class libraries.

CACAO history dates back to 1997-1998 when CACAO was one of the first JIT compiler to be used instead of SUN’s Java JVM interpreter.

Today CACAO are being used in combination with OpenJDK 6 on architectures like ARM, MIPS and PPC where Oracle have not yet released code for a GPL licensed Hotspot JIT. CACAO are popular, see the Debian OpenJDK-6 popularity contest chart where up to 80% of all the Debian OpenJDK 6 JVM users have picked CACAO to be installed. This trend kept on since the beginning of 2009 up to the summer of 2011.

Carpe diem CACAO JVM!

During the summer of 2011 Oracle released OpenJDK 7 and CACAO users started to abandon the JVM in favour for JamVM, the reason “why?” are that CACAO depends on the HPI API that have been removed from the OpenJDK 7 code base. This means that CACAO currently only work in combination with the “classic” OpenJDK 6. The second black cloud for CACAO JVM on ARM was that all major ARM Linux distributions started to move from “armel” towards the new “armhf” ABI something CACAO do not support. JamVM here provided the ARM Linux distributions and users with a stable and future proof alternative.

If we for a moment forget about the future and focus on today CACAO are in great shape when built from CACAO hg HEAD.

  • CACAO are FAST,
  • CACAO are stable, thanks to Stefan Ring who have been diligent on fixing bugs found in the CACAO JIT codegen.
  • CACAO are fresh, the current CACAO hg HEAD contains the rewritten, “still unreleased” C++ version of CACAO its a totally different JVM compared to the last C based release of CACAO 0.99.4.

If you want to experience the CACAO JVM in its finest the do run the latest development version of CACAO in combination with OpenJDK 6, built using the current IcedTea6 head. Run it on ARM “armel”, PPC or MIPS and experience a fast responsive JVM burning brighter than ever before!

March 10, 2010

When making a programming tool or a virtual machine getting the tool running perfectly stable without any crash bugs are always on a higher priority than gaining more speed. A crashing tool are a broken tool so I will share some tricks that I have practised to find and fix Shark LLVM JIT CodeGen crash bugs. The main trick are to be able generate reproducable testcases that can be reported to the LLVM developers bugzilla bugtracker by using what you can extract from the Shark LLVM JIT CodeGen crashes. Here is how I do it, enjoy!

How to provoke hard to find Shark LLVM JIT bugs
Some Shark LLVM JIT bugs are hard to find because they only occour after the Shark JIT enabled JVM have been running for a long time, this are because the Shark Hotspot JVM takes advantage of the fact that a given running application spends about 90% of its time running only 10% of the applications code. Hotspot profiles the running code and only JITs the most frequently used methods of the program. Hotspot uses a threshold to determine which methods to JIT. When a method have been used more than 100000 times then it are scheduled to be optimized by the JIT. JIT bugs can stay undetected if they are located in unfrequently executed methods, those methods that makes up the 90%, of the unfrequently executed application code.

A easy trick to provoke unfrequently executed JIT bugs are to lower the JIT threshold in Hotspot so that Hotspot JITs everything. The JIT threshold can be controlled by using the -XX:CompileThreshold=1 option and -Xbatch option. -Xbatch prevents the hotspot from running the JIT in background and will make hotspot reproduce JIT bugs more determistic.

Using a low JIT threshold will of course make the program startup magnitudes slower but it will also eventually find and hit all JIT bugs for a given application. Try pass -XX:+PrintCompilation to Hotspot as well so that you can observe all the java methods that Hotspot are JITting and find out which method that failed to JIT if Hotspot hits a JIT crash bug.
java -XX:CompileThreshold=1 -Xbatch -XX:+PrintCompilation JavaApplication
1 b java.lang.Thread:: (49 bytes)
10 b java.lang.String::getChars (66 bytes)
/home/xerxes/llvm/include/llvm/CodeGen/MachineFrameInfo.h:289: int64_t
llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead object?"'

Huh.. no logfile??
Most Shark LLVM JIT CodeGen crash bugs makes the JVM instantainiously exit without producting a hs_err_pid*.log file. Whats usefull are that the JVM output will contain a Assertion, Unreachable or Unimplemented keyword and a LLVM code line numer.

So what do we do now?
Thanks by using -XX:+PrintCompilation makes us aware that the last method JITed was the java.lang.String::getChars method and that caused the Assertion in the LLVM CodeGen when running the Shark JIT so the next step are to dump the LLVM IR that Shark have generated for the method.

Extract the LLVM IR for the java method that makes the Shark JIT crash.
Ok so we got a crash and we know that it was JITing of java.lang.String::getChars that caused it.

a) shark debug build -XX:SharkPrintBitcodeOf= method:
If you have built a debuggable “Mixtech” Shark build then Shark will contain some extra usefull debug runtimeoptions where one of the more usefull are
use it and Shark will dump the LLVM IR bitcode to stdout just before jitting it.

b) gdb call F->dump() method:
I personally prefer dumping LLVM IR from inside the gnu gdb debugger since this method can be used using release Shark build in combination with release llvm builds so lets jump into the gdb debugger!

Start gdb and attach it to the java application with all the options that triggered the JIT CodeGen bug!
$ gdb -args java -XX:CompileThreshold=1 -Xbatch -XX:+PrintCompilation JavaApplication
(gdb) run
Segmentation fault

Ick gdb crashed why? This are because the JVM launcher “java” first sets up the system environment and then forks off in a new process using execve(). gdb gets killed by the linux kernel when it are trying to read memory across process boundarys so we must stop java from forking!

The easiest way to prevent java from forking are to setup the system environments before launching the application. And all this can be done from inside gdb so lets try again!
$ gdb -args java -XX:CompileThreshold=1 -Xbatch -XX:+PrintCompilation JavaApplication
(gdb) break execve
Breakpoint 1 at 0x93b8
(gdb) run
(gdb) call puts(getenv("LD_LIBRARY_PATH"))
$1 = 220

Ok now we know what the LD_LIBRARY_PATH should look like and if we set it before running the java launcher will prevent java from forking using execve, this LD_LIBRARY_PATH and execve madness are thankfully gone in JDK7!
(gdb) set env LD_LIBRARY_PATH=/media/disk/
I will do one more thing namely set a gdb breakpoint inside java_md.c:652 right after the hotspot library have been loaded by the java launcher.
(gdb) break java_md.c:652
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Breakpoint 2, LoadJavaVM ... java_md.c:652
652 if (libjvm == NULL) {

This are a good spot to setup new gdb breakpoints inside the loaded that contains the Shark JIT. Finally we are able to place a breakpoint on the line where the Shark JIT failed inside LLVM.
(gdb) break MachineFrameInfo.h:289
(gdb) continue
10 b java.lang.String::getChars (66 bytes)
[Switching to Thread 0x67ed96a490 (LWP 21127)]

Breakpoint 3, ... at ... MachineFrameInfo.h:289

Get a backtrace and try to locate the frame where Shark calls getPointerToFunction
(gdb) bt
#9 0x40d4ee68 in llvm::JIT::getPointerToFunction (this=0x9e138, F=0xda6f0)

Switch to the getPointerToFunction stack frame
(gdb) frame 9
and finnaly dump the LLVM IR for the function by calling the functions own method dump() !
(gdb) call F->dump()
define internal void @"java.lang.String::getChars"([84 x i8]* %method, i32 %base_pc, [788 x i8]* %thread) {
%1 = getelementptr inbounds [788 x i8]* %thread, i32 0, i32 756 ; [#uses=1]
%zero_stack = bitcast i8* %1 to [12 x i8]* ; <[12 x i8]*> [#uses=1]
%2 = getelementptr inbounds [12 x i8]* %zero_stack, i32 0, i32 8 ; [#uses=1]
%stack_pointer_addr = bitcast i8* %2 to i32* ; [#uses=1]
%3 = load i32* %stack_pointer_addr ; [#uses=1]
%142 = getelementptr inbounds [17 x i32]* %frame, i32 0, i32 12 ; [#uses=1]
store i32 %31, i32* %142
call void inttoptr (i32 13839116 to void ([788 x i8]*, i32)*)([788 x i8]* %thread, i32 7)
ret void

Horray! we have successfully dumped the Shark generated LLVM IR for the problematic method-call. Now simply copy the dump output from the terminal into a file named bug.ll and continue reading.

Check for LLVM CodeGen bugs by testing if the dumped LLVM IR bug.ll file can reproduce the bug using llc
After you have extracted LLVM IR for the problematic method check if you can reproduce the bug using llc..
$ llvm-as < bug.ll | llc
.syntax unified
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.file ""
int64_t llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead object?"'
0 llc 0x01368414
1 llc 0x01368ccc
2 0x4021cc10 __default_sa_restorer_v2 + 0
Stack dump:
0. Program arguments: /wd/r96575/Debug/bin/llc -march=arm
1. Running pass 'Prolog/Epilog Insertion & Frame Finalization' on function

If it crashes using llc then cheer up because you now got a reproducable CodeGen bug and thats great! These kind of crash bugs are on LLVM developers top wanted list because they can fire on any tool that uses LLVM code generation. The best way to report this kind of bugs are to first generate a compact testcase for LLVM that triggers the bug that can be used by the LLVM developers to fix it. It can also be run by the LLVM developers daily regression testing to make sure this bug never hits again.

If it fails to crash with an Aborted like above then you are probably observing a JIT CodeEmitter runtime bug, stay tuned and look forward to my next blog post on “How to fix Shark LLVM JIT CodeEmitter bugs”!

How to generate a bugpoint-reduced-simplified.bc from the bug.ll using bugpoint for CodeGen crash bugs
LLVM ships with a clever tool called bugpoint that are designed to convert dumped blocks of LLVM IR into a compact bugpoint-reduced-simplified.bc LLVM bitcode testcase file that only contains the instructions needed to reproduce the bug.

$ bugpoint -run-llc bug.ll --tool-args -march=arm

Bugpoint work by using deductive logic to break down and remove parts of the bug.ll file and automatically narrow down the LLVM IR lines needed to reproduce the bug. It can take some minutes so be patient but bugpoint will eventually stop and give you a bugpoint-reduced-simplified.bc and print some information on how to reproduce the bug.

File a LLVM bugreport containing the bugpoint-reduced-simplified.bc file
An example of a Shark JIT LLVM bug that have been fixed after submitting a bugpoint-reduced-simplified.bc produced from a dumped Shark metod are :
LLVM PR6478 ARM CodeGen Running pass ‘Prolog/Epilog Insertion & Frame Finalization’ on function ‘@”java.lang.String::getChars”‘

I hope this post have given you some inspiration on how to get Shark LLVM JIT CodeGen crash bugs fixed!
If you want to know more about how bugpoint works and how to officially prepare LLVM bugreports then take a peek at the LLVM documentation: its great.


March 9, 2010

I have from time to time been working on a automatic CPU feature tuning code for Shark to make the LLVM JIT generate better code for Cortex-A8 class ARM CPUs. Using it i was able to gain some substantial speed-improvements, all in all it made Shark generate 30% faster code on ARM and the Shark JIT are now able to beat the 2000 point CaffeineMark 3.0 score! I could not resist using CaffeineMark 3.0 for some benchmarking again :) .

While this looks all great with rainbows and unicorns the patch are sadly in limbo. I have had some trouble merging the code into LLVM 2.7 trunk before the next LLVM release since I got hit by LLVM problem report 6544 that will force me to redesign the implementation on top of LLVM before it can be commited.

And a similar optimisation like what i did for ARM Linux can easily be done for Shark on PPC Linux as well by adopting the ARM CPU features detection code from ARM to PPC!
For those who are interested the optimising code are submitted and can be fetched from LLVM PR5389.

Cheers and have a great day!

February 25, 2010

During the past months i have seen some really cool stuff done using small powerefficient ARM computers and OpenJDK.

SimpleSimon connects

Simple Simon PT connected to a hospital laboratory system by using a powerefficient plugcomputer and displaylink usb screen. All powered by OpenJDK

Simple Simon PT connects:

This project hooks up a battery powered laboratory coagulation device, a Simple Simon PT reader to standard hospital laboratory system using a ASTM-1394-1397 / LIS2-A2 connectivity over ethernet. A small ARM based plugcomputer does all data message processing and communication. User interraction are performed by using the Simon reader and a usb-barcode reader to enter laboratory identification. Optionally can a usb-touch-screen be connected for improved user feedback, by displaying charts using JFreeChart, to show and give a better understanding of the coagulation process.

Powerconsumption tops at 15W with the USB screen attached and 6W without. All running silent without any moving parts!

Shark linked against the shared

Shark linked against the shared

Shark linked against dynamic LLVM .so library

Earlier today I got Shark linked against a shared generated by using LLVM 2.7svn trunk. It work by simply building LLVM using configure –enable-shared –enable-optimized –disable-assertions and then tweak the Icedtea6 main Makefiles to use the shared library during liking:
Replace the line
LLVM_LIBS = -lLLVMX86Disassembler -lLLVMX86AsmParser -lLLVMMCParser -lLLVMX86AsmPrinter -lLLVMX86CodeGen -lLLVMSelectionDAG -lLLVMAsmPrinter -lLLVMX86Info -lLLVMJIT -lLLVMExecutionEngine -lLLVMCodeGen -lLLVMScalarOpts -lLLVMInstCombine -lLLVMTransformUtils -lLLVMipa -lLLVMAnalysis -lLLVMTarget -lLLVMMC -lLLVMCore -lLLVMSupport -lLLVMSystem
LLVM_LIBS = -lLLVM-2.7svn
in the main icedtea6/Makefile and then build Icedtea6 normally, Shark currently builds and works right out of the box when using a LLVM release build!

A cool thing by building shark against the shared library are that you can switch the LLVM JIT that Shark uses from running with or without assertions, debug code, and various extra optimizations by simply replacing the /usr/local/lib/ file with what you want. Linking time during shark builds and shark footprint are impressively smaller as well. Im really happy to see this functionallity in LLVM 2.7!

The LLVM 2.7 code freeze before the 2.7 release happens in about 1.5 weeks from now and i will stay busy for some days to observe and polish the current LLVM svn trunk to be usable with openjdk-6-shark.

Edward Nevill created a ARM Jazelle RTC Thumb2 JIT reference implementation

Meanwhile I have been busy taming Sharks a new kind of Thumb2 JIT have emerged built by Edward Nevill of Cambridge Software Labs! The new Tumb2JIT have been committed into the Icedtea6 trunk and it are a working implementation of Jazelle RTC to be used by ARM Cortex-A8+ class CPUs. It wonderfull that this have been released as free software, Wow!,

Suddenly we got three different JITs to use on ARM with OpenJDK: Cacao, Shark and T2. An opurtunity has emerged to tier them and so I did. Here comes the raw “truth” produced by Caffeine Mark 3! This will probably be the last time i will show off any Caffeine Mark 3 benchmark since it really dont give justice on real world client applications where responsiveness are more crucial than top runtime speed, nevertheless benchmarking using CM30 have always felt fun so here we go: All benchmarks running using a Sharp PC-Z1 Cortex-A8 Mobile internet tool.

Tier between Edwards Thumb 2 JIT , Shark LLVM JIT and Cacao JIT: All running on OpenJDK 6 ARM

Tier between Edwards Thumb 2 JIT , Shark LLVM JIT and Cacao JIT: All running on a ARM Sharp PC-Z1 Mobile internet tool smartbook using OpenJDK 6 compiled with Icedtea6.

This new T2 JIT’s main strenght are reduced jitting time, it basically cuts all jtting time to almost zero and client applications on ARM finnaly runs from tick one. This thumb2 jit makes a really nice java applet browser experience with about 15 seconds first applet startuptime on a ARM smartbook and and all usable instantly after being loaded.
A small 1min 12seconds .3gp movie displaying some java applets running on the Sharp PC-Z1 featuring the new thumb2jit from Icedtea6

Cheers and have a great day!

October 6, 2009

picture of the day!

The picture that made my day!

Ok.. so what happened?

xerxes@babbage-karmic:/wd/icedtea6/openjdk/build/linux-arm/bin$ ./java -version
java version "1.6.0_0"
OpenJDK Runtime Environment (IcedTea6 1.7pre-r2a3725ce72d4) (build 1.6.0_0-b16)
OpenJDK Shark VM (build 14.0-b16-product, mixed mode)

xerxes@babbage-karmic:/wd/icedtea6/openjdk/build/linux-arm/bin$ cat /proc/cpuinfo
Processor    : ARMv7 Processor rev 1 (v7l)
BogoMIPS    : 799.53
Features    : swp half thumb fastmult vfp edsp
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x2
CPU part    : 0xc08
CPU revision    : 1
Hardware    : Freescale MX51 Babbage Board
Revision    : 51011
Serial        : 0000000000000000

xerxes@babbage-karmic:/wd/llvm$ svn info
Repository Root:
Repository UUID: 91177308-0d34-0410-b5e6-96231b3b80d8
Revision: 82896
Node Kind: directory
Schedule: normal
Last Changed Author: edwin
Last Changed Rev: 82896
Last Changed Date: 2009-09-27 11:08:03 +0000 (Sun, 27 Sep 2009)

xerxes@babbage-karmic:/wd/llvm$ quilt diff
Index: llvm/lib/Target/ARM/
--- llvm.orig/lib/Target/ARM/    2009-10-06 12:35:26.000000000 +0000
+++ llvm/lib/Target/ARM/    2009-10-06 12:36:03.000000000 +0000
@@ -645,7 +645,7 @@
 IIC_Br, "mov lr, pc\n\tbx $func",
 [(ARMcall_nolink GPR:$func)]>,
 Requires<[IsARM, IsNotDarwin]> {
-    let Inst{7-4}   = 0b0001;
+    let Inst{7-4}   = 0b0011;
 let Inst{19-8}  = 0b111111111111;
 let Inst{27-20} = 0b00010010;

The last patch on LLVM are currently a hack. basically it makes LLVM emit ARM BLX instructions instead of BX instructions for ARM::CALL_NOLINK. So why did this little hack make it work?

In order to understand that, one have to find out what made Shark on ARM crash before…

Lets rewind time to some days ago... 

Hi, i have been enjoying myself inside gdb for some days, and I have now at least found the reason why the cpu
ends up in garbage memory when running shark on arm.

The problem can be illustrated like this:

frame manager invokes jited code
entry_zero.hpp:57 invokes jit code at 0x67c9e990

jited code runs
0x67c9e990:    push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0x67c9e994:    sub    sp, sp, #12    ; 0xc
0x67c9e998:    ldr    r12, [r3, #756]
0x67c9e99c:    ldr    lr, [r3, #764]
0x67c9e9a0:    sub    r4, lr, #56    ; 0x38
0x67c9e9a4:    cmp    r4, r12
0x67c9e9a8:    bcc    0x67c9ebd0
0x67c9e9ac:    mov    r5, r3
0x67c9e9b0:    str    r2, [sp, #4]
0x67c9e9b4:    mov    r6, r0
0x67c9e9b8:    str    r4, [r5, #764]
0x67c9e9bc:    str    r4, [r4, #20]
0x67c9e9c0:    ldr    r0, [pc, #640]    ; 0x67c9ec48
0x67c9e9c4:    str    r0, [r4, #28]
0x67c9e9c8:    ldr    r0, [r5, #768]
0x67c9e9cc:    str    r0, [r4, #32]
0x67c9e9d0:    add    r0, r4, #32    ; 0x20
0x67c9e9d4:    str    r0, [r5, #768]
0x67c9e9d8:    str    r6, [r4, #16]
0x67c9e9dc:    ldr    r7, [r1]
0x67c9e9e0:    ldr    r0, [r1, #4]
0x67c9e9e4:    str    r0, [sp]
0x67c9e9e8:    ldr    r8, [r1, #8]
0x67c9e9ec:    ldr    r9, [r1, #12]
0x67c9e9f0:    ldr    r0, [r1, #16]
0x67c9e9f4:    str    r0, [sp, #8]
0x67c9e9f8:    ldr    r10, [r1, #20]
0x67c9e9fc:    ldr    r2, [pc, #584]    ; 0x67c9ec4c   <------ jit code calls a jvm function stored in this address
0x67c9ea00:    mov    r0, r1
0x67c9ea04:    bx    r2 <---------------------------   problem!  should have been blx!

(gdb) x 0x67c9ec4c
0x67c9ec4c:    0x40836d9c
(gdb) x 0x40836d9c
0x40836d9c <_ZN13SharedRuntime17OSR_migration_endEPi>:    0xe92d41f0

so lets check out _ZN13SharedRuntime17OSR_migration_endEPi

0x40836d9c <_ZN13SharedRuntime17OSR_migration_endEPi+0>:    push    {r4, r5, r6, r7, r8, lr}    <------  lr are backed up..  but bx did not update lr..
0x40836da0 <_ZN13SharedRuntime17OSR_migration_endEPi+4>:    ldr    r4, [pc, #284]    ; 0x40836ec4 <_ZN13SharedRuntime17OSR_migration_endEPi+296>
0x40836da4 <_ZN13SharedRuntime17OSR_migration_endEPi+8>:    ldr    r7, [pc, #284]    ; 0x40836ec8 <_ZN13SharedRuntime17OSR_migration_endEPi+300>
0x40836da8 <_ZN13SharedRuntime17OSR_migration_endEPi+12>:    ldr    r6, [pc, #284]    ; 0x40836ecc <_ZN13SharedRuntime17OSR_migration_endEPi+304>
0x40836dac <_ZN13SharedRuntime17OSR_migration_endEPi+16>:    add    r4, pc, r4
0x40836db0 <_ZN13SharedRuntime17OSR_migration_endEPi+20>:    ldr    r12, [r4, r7]
0x40836db4 <_ZN13SharedRuntime17OSR_migration_endEPi+24>:    ldr    r1, [r4, r6]
0x40836db8 <_ZN13SharedRuntime17OSR_migration_endEPi+28>:    ldr    r5, [r12]
0x40836dbc <_ZN13SharedRuntime17OSR_migration_endEPi+32>:    ldrb    r2, [r1]
0x40836dc0 <_ZN13SharedRuntime17OSR_migration_endEPi+36>:    add    r3, r5, #1    ; 0x1
0x40836dc4 <_ZN13SharedRuntime17OSR_migration_endEPi+40>:    cmp    r2, #0    ; 0x0
0x40836dc8 <_ZN13SharedRuntime17OSR_migration_endEPi+44>:    sub    sp, sp, #24    ; 0x18
0x40836dcc <_ZN13SharedRuntime17OSR_migration_endEPi+48>:    str    r3, [r12]
0x40836dd0 <_ZN13SharedRuntime17OSR_migration_endEPi+52>:    mov    r7, r0
0x40836dd4 <_ZN13SharedRuntime17OSR_migration_endEPi+56>:    bne 0x40836e74 <_ZN13SharedRuntime17OSR_migration_endEPi+216>
0x40836dd8 <_ZN13SharedRuntime17OSR_migration_endEPi+60>:    ldr    r2, [pc, #240]    ; 0x40836ed0 <_ZN13SharedRuntime17OSR_migration_endEPi+308>
0x40836ddc <_ZN13SharedRuntime17OSR_migration_endEPi+64>:    ldr    r12, [r4, r2]
0x40836de0 <_ZN13SharedRuntime17OSR_migration_endEPi+68>:    ldrb    r3, [r12]
0x40836de4 <_ZN13SharedRuntime17OSR_migration_endEPi+72>:    cmp    r3, #0    ; 0x0
0x40836de8 <_ZN13SharedRuntime17OSR_migration_endEPi+76>:    beq 0x40836e20 <_ZN13SharedRuntime17OSR_migration_endEPi+132>
0x40836dec <_ZN13SharedRuntime17OSR_migration_endEPi+80>:    ldr    r6, [pc, #224]    ; 0x40836ed4 <_ZN13SharedRuntime17OSR_migration_endEPi+312>
0x40836df0 <_ZN13SharedRuntime17OSR_migration_endEPi+84>:    ldr    r5, [r4, r6]
0x40836df4 <_ZN13SharedRuntime17OSR_migration_endEPi+88>:    add    r0, r4, r6
0x40836df8 <_ZN13SharedRuntime17OSR_migration_endEPi+92>:    tst    r5, #1    ; 0x1
0x40836dfc <_ZN13SharedRuntime17OSR_migration_endEPi+96>:    beq 0x40836e8c <_ZN13SharedRuntime17OSR_migration_endEPi+240>
0x40836e00 <_ZN13SharedRuntime17OSR_migration_endEPi+100>:    ldr    r5, [pc, #208]    ; 0x40836ed8 <_ZN13SharedRuntime17OSR_migration_endEPi+316>
0x40836e04 <_ZN13SharedRuntime17OSR_migration_endEPi+104>:    ldr    r3, [r4, r5]
0x40836e08 <_ZN13SharedRuntime17OSR_migration_endEPi+108>:    cmp    r3, #0    ; 0x0
0x40836e0c <_ZN13SharedRuntime17OSR_migration_endEPi+112>:    movne r0, r3
0x40836e10 <_ZN13SharedRuntime17OSR_migration_endEPi+116>:    ldrne r6, [r3]
0x40836e14 <_ZN13SharedRuntime17OSR_migration_endEPi+120>:    ldrne r12, [r6, #16]
0x40836e18 <_ZN13SharedRuntime17OSR_migration_endEPi+124>:    movne lr, pc
0x40836e1c <_ZN13SharedRuntime17OSR_migration_endEPi+128>:    bxne    r12
0x40836e20 <_ZN13SharedRuntime17OSR_migration_endEPi+132>:    add    r6, sp, #20    ; 0x14
0x40836e24 <_ZN13SharedRuntime17OSR_migration_endEPi+136>:    mov    r0, r6
0x40836e28 <_ZN13SharedRuntime17OSR_migration_endEPi+140>:    bl 0x40596c84 <NoHandleMark>
0x40836e2c <_ZN13SharedRuntime17OSR_migration_endEPi+144>:    mov    r0, sp
0x40836e30 <_ZN13SharedRuntime17OSR_migration_endEPi+148>:    bl 0x4057909c <JRT_Leaf_Verifier>
0x40836e34 <_ZN13SharedRuntime17OSR_migration_endEPi+152>:    ldr    r3, [pc, #160]    ; 0x40836edc <_ZN13SharedRuntime17OSR_migration_endEPi+320>
0x40836e38 <_ZN13SharedRuntime17OSR_migration_endEPi+156>:    mov    r5, sp
0x40836e3c <_ZN13SharedRuntime17OSR_migration_endEPi+160>:    ldr r12, [r4, r3]
0x40836e40 <_ZN13SharedRuntime17OSR_migration_endEPi+164>:    ldrb r0, [r12]
0x40836e44 <_ZN13SharedRuntime17OSR_migration_endEPi+168>:    cmp    r0, #0    ; 0x0
0x40836e48 <_ZN13SharedRuntime17OSR_migration_endEPi+172>:    movne r0, r7
0x40836e4c <_ZN13SharedRuntime17OSR_migration_endEPi+176>:    blne 0x4039b20c <_Z15trace_heap_freePv>
0x40836e50 <_ZN13SharedRuntime17OSR_migration_endEPi+180>:    mov    r0, r7
0x40836e54 <_ZN13SharedRuntime17OSR_migration_endEPi+184>:    bl 0x407b6a94 <_ZN2os4freeEPv>
0x40836e58 <_ZN13SharedRuntime17OSR_migration_endEPi+188>:    mov    r0, sp
0x40836e5c <_ZN13SharedRuntime17OSR_migration_endEPi+192>:    bl 0x40578c5c <~JRT_Leaf_Verifier>
0x40836e60 <_ZN13SharedRuntime17OSR_migration_endEPi+196>:    mov    r0, r6
0x40836e64 <_ZN13SharedRuntime17OSR_migration_endEPi+200>:    bl 0x40596b04 <~NoHandleMark>
0x40836e68 <_ZN13SharedRuntime17OSR_migration_endEPi+204>:    add    sp, sp, #24    ; 0x18
0x40836e6c <_ZN13SharedRuntime17OSR_migration_endEPi+208>:    pop {r4, r5, r6, r7, r8, lr}
0x40836e70 <_ZN13SharedRuntime17OSR_migration_endEPi+212>:    bx    lr <------  and woho. lets enjoy a trip to garbage memory!

So when the function that the jit calls returns we find ourself eating
garbage memory.

So the small hack fixed this issue quite well but broke armv4t compatibility for the moment.

My next task would be to fix this properly in LLVM.

September 13, 2009

During the past month i have been running a public llvm-arm-linux buildbot in order to iron out the remaining bugs in the LLVM Execution Engine JIT for ARM.
My goal was to stabilise the LLVM JIT so that it can be used to speed up cool projects like OpenJDK on ARM by fixing all pre-requirements to run Gary Benson’s Shark JIT compiler on top of Zero!

I have been following the LLVM project for about a year and for me to see the following reports from the buildbot makes me jump of joy! It marks a new era, when all cool and silent energy efficient computing on ARM can get JIT accelerated!

  • (Sep 12 21:57) rev=[81669] success #153: build successful
  • (Sep 12 19:17) rev=[81660] success #152: build successful
  • (Sep 12 16:31) rev=[81655] failure #151: failed test-llvm
  • (Sep 12 13:03) rev=[81626] failure #148: failed test-llvm

The next LLVM release 2.6 gets out in about a week (21 of september 2009) and feel I have done my part in the LLVM stabilisation process for ARM, It are now up for the LLVM 2.6 release managers to merge in the patches from the 2.7 svn trunk to the release branch in order to make the LLVM 2.6 release stable on ARM as well.

Life is cool!

May 10, 2009

I have just pushed an update to the Jalimo project that enables the new OpenJDK 6 b16 sourcebundle to be cross-compile-able for embedded devices using Jalimo as a cross-compile layer for Icedtea6.

Using Jalimo you can now cross-compile OpenJDK b16 and have hotspot + zero, hotspot + shark or cacao as the vm built out of the box, simply awesome!

Since shark are using the pre2.6 LLVM sources for its JIT I have also prepared “.bb” build recipes for Openembedded that enables quick cross compilation of LLVM based on the LLVM svn trunk so that Jalimo can make use of them when building shark.

The shark vm are built with assertions enabled in order to produce better debug output for all Jalimo users.

Robert Schuster have been an excellent tutor for me to understand all the quirks of OE-recipes, quirks that in turn helped me to creating all these new nice cross compile recipes for OE and Jalimo. Thank you Robert and thank you for pushing the LLVM recipes into the main OE dev git tree!

Andrew Haley and Gary Benson have helped me enormously to understand the lock-free code using memory-barriers that are part of the zero and shark hotspot implementations. I will keep working on these parts in order to make zero and shark rock solid on ARM before ARM Cortex A9 multi-core CPU’s will be part of every cool and silent computing loving persons pocket.

Older Posts »

Powered by WordPress