zafena development

March 10, 2010

When making a programming tool or a virtual machine getting the tool running perfectly stable without any crash bugs are always on a higher priority than gaining more speed. A crashing tool are a broken tool so I will share some tricks that I have practised to find and fix Shark LLVM JIT CodeGen crash bugs. The main trick are to be able generate reproducable testcases that can be reported to the LLVM developers bugzilla bugtracker by using what you can extract from the Shark LLVM JIT CodeGen crashes. Here is how I do it, enjoy!

How to provoke hard to find Shark LLVM JIT bugs
Some Shark LLVM JIT bugs are hard to find because they only occour after the Shark JIT enabled JVM have been running for a long time, this are because the Shark Hotspot JVM takes advantage of the fact that a given running application spends about 90% of its time running only 10% of the applications code. Hotspot profiles the running code and only JITs the most frequently used methods of the program. Hotspot uses a threshold to determine which methods to JIT. When a method have been used more than 100000 times then it are scheduled to be optimized by the JIT. JIT bugs can stay undetected if they are located in unfrequently executed methods, those methods that makes up the 90%, of the unfrequently executed application code.

A easy trick to provoke unfrequently executed JIT bugs are to lower the JIT threshold in Hotspot so that Hotspot JITs everything. The JIT threshold can be controlled by using the -XX:CompileThreshold=1 option and -Xbatch option. -Xbatch prevents the hotspot from running the JIT in background and will make hotspot reproduce JIT bugs more determistic.

Using a low JIT threshold will of course make the program startup magnitudes slower but it will also eventually find and hit all JIT bugs for a given application. Try pass -XX:+PrintCompilation to Hotspot as well so that you can observe all the java methods that Hotspot are JITting and find out which method that failed to JIT if Hotspot hits a JIT crash bug.
java -XX:CompileThreshold=1 -Xbatch -XX:+PrintCompilation JavaApplication
1 b java.lang.Thread:: (49 bytes)

10 b java.lang.String::getChars (66 bytes)
*crash*
/home/xerxes/llvm/include/llvm/CodeGen/MachineFrameInfo.h:289: int64_t
llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && “Getting frame offset for a dead object?”‘
failed.

Huh.. no logfile??
Most Shark LLVM JIT CodeGen crash bugs makes the JVM instantainiously exit without producting a hs_err_pid*.log file. Whats usefull are that the JVM output will contain a Assertion, Unreachable or Unimplemented keyword and a LLVM code line numer.

So what do we do now?
Thanks by using -XX:+PrintCompilation makes us aware that the last method JITed was the java.lang.String::getChars method and that caused the Assertion in the LLVM CodeGen when running the Shark JIT so the next step are to dump the LLVM IR that Shark have generated for the method.

Extract the LLVM IR for the java method that makes the Shark JIT crash.
Ok so we got a crash and we know that it was JITing of java.lang.String::getChars that caused it.

a) shark debug build -XX:SharkPrintBitcodeOf= method:
If you have built a debuggable “Mixtech” Shark build then Shark will contain some extra usefull debug runtimeoptions where one of the more usefull are
-XX:SharkPrintBitcodeOf=java.package.name::MethodName
use it and Shark will dump the LLVM IR bitcode to stdout just before jitting it.

b) gdb call F->dump() method:
I personally prefer dumping LLVM IR from inside the gnu gdb debugger since this method can be used using release Shark build in combination with release llvm builds so lets jump into the gdb debugger!

Start gdb and attach it to the java application with all the options that triggered the JIT CodeGen bug!
$ gdb -args java -XX:CompileThreshold=1 -Xbatch -XX:+PrintCompilation JavaApplication
(gdb) run
...
Segmentation fault
$

Ick gdb crashed why? This are because the JVM launcher “java” first sets up the system environment and then forks off in a new process using execve(). gdb gets killed by the linux kernel when it are trying to read memory across process boundarys so we must stop java from forking!

The easiest way to prevent java from forking are to setup the system environments before launching the application. And all this can be done from inside gdb so lets try again!
$ gdb -args java -XX:CompileThreshold=1 -Xbatch -XX:+PrintCompilation JavaApplication
(gdb) break execve
Breakpoint 1 at 0x93b8
(gdb) run
(gdb) call puts(getenv("LD_LIBRARY_PATH"))
/media/disk/4mar-shark-1.8pre-b18-llvm-2.7svn.so-npplugin/jre/lib/arm/server:/media/disk/4mar-shark-1.8pre-b18-llvm-2.7svn.so-npplugin/jre/lib/arm:/media/disk/4mar-shark-1.8pre-b18-llvm-2.7svn.so-npplugin/jre/../lib/arm
$1 = 220

Ok now we know what the LD_LIBRARY_PATH should look like and if we set it before running the java launcher will prevent java from forking using execve, this LD_LIBRARY_PATH and execve madness are thankfully gone in JDK7!
(gdb) set env LD_LIBRARY_PATH=/media/disk/4mar-shark-1.8pre-b18-llvm-2.7svn.so-npplugin/jre/lib/arm/server:/media/disk/4mar-shark-1.8pre-b18-llvm-2.7svn.so-npplugin/jre/lib/arm:/media/disk/4mar-shark-1.8pre-b18-llvm-2.7svn.so-npplugin/jre/../lib/arm
I will do one more thing namely set a gdb breakpoint inside java_md.c:652 right after the hotspot library libjvm.so have been loaded by the java launcher.
(gdb) break java_md.c:652
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
...
Breakpoint 2, LoadJavaVM ... java_md.c:652
652 if (libjvm == NULL) {

This are a good spot to setup new gdb breakpoints inside the loaded libjvm.so that contains the Shark JIT. Finally we are able to place a breakpoint on the line where the Shark JIT failed inside LLVM.
(gdb) break MachineFrameInfo.h:289
(gdb) continue
Continuing.
...
10 b java.lang.String::getChars (66 bytes)
[Switching to Thread 0x67ed96a490 (LWP 21127)]

Breakpoint 3, … at … MachineFrameInfo.h:289

Get a backtrace and try to locate the frame where Shark calls getPointerToFunction
(gdb) bt
...
#9 0x40d4ee68 in llvm::JIT::getPointerToFunction (this=0x9e138, F=0xda6f0)
...

Switch to the getPointerToFunction stack frame
(gdb) frame 9
and finnaly dump the LLVM IR for the function by calling the functions own method dump() !
(gdb) call F->dump()
define internal void @"java.lang.String::getChars"([84 x i8]* %method, i32 %base_pc, [788 x i8]* %thread) {
%1 = getelementptr inbounds [788 x i8]* %thread, i32 0, i32 756 ; [#uses=1]
%zero_stack = bitcast i8* %1 to [12 x i8]* ; <[12 x i8]*> [#uses=1]
%2 = getelementptr inbounds [12 x i8]* %zero_stack, i32 0, i32 8 ; [#uses=1]
%stack_pointer_addr = bitcast i8* %2 to i32* ; [#uses=1]
%3 = load i32* %stack_pointer_addr ; [#uses=1]

%142 = getelementptr inbounds [17 x i32]* %frame, i32 0, i32 12 ; [#uses=1]
store i32 %31, i32* %142
call void inttoptr (i32 13839116 to void ([788 x i8]*, i32)*)([788 x i8]* %thread, i32 7)
ret void
}

Horray! we have successfully dumped the Shark generated LLVM IR for the problematic method-call. Now simply copy the dump output from the terminal into a file named bug.ll and continue reading.

Check for LLVM CodeGen bugs by testing if the dumped LLVM IR bug.ll file can reproduce the bug using llc
After you have extracted LLVM IR for the problematic method check if you can reproduce the bug using llc..
$ llvm-as < bug.ll | llc
.syntax unified
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.file "
llc:
/wd/buildbot/llvm-arm-linux/llvm/include/llvm/CodeGen/MachineFrameInfo.h:289:
int64_t llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && “Getting frame offset for a dead object?”‘
failed.
0 llc 0×01368414
1 llc 0×01368ccc
2 libc.so.6 0×4021cc10 __default_sa_restorer_v2 + 0
Stack dump:
0. Program arguments: /wd/r96575/Debug/bin/llc -march=arm
1. Running pass ‘Prolog/Epilog Insertion & Frame Finalization’ on function
‘@”java.lang.String::getChars”‘
Aborted

If it crashes using llc then cheer up because you now got a reproducable CodeGen bug and thats great! These kind of crash bugs are on LLVM developers top wanted list because they can fire on any tool that uses LLVM code generation. The best way to report this kind of bugs are to first generate a compact testcase for LLVM that triggers the bug that can be used by the LLVM developers to fix it. It can also be run by the LLVM developers daily regression testing to make sure this bug never hits again.

If it fails to crash with an Aborted like above then you are probably observing a JIT CodeEmitter runtime bug, stay tuned and look forward to my next blog post on “How to fix Shark LLVM JIT CodeEmitter bugs”!

How to generate a bugpoint-reduced-simplified.bc from the bug.ll using bugpoint for CodeGen crash bugs
LLVM ships with a clever tool called bugpoint that are designed to convert dumped blocks of LLVM IR into a compact bugpoint-reduced-simplified.bc LLVM bitcode testcase file that only contains the instructions needed to reproduce the bug.

$ bugpoint -run-llc bug.ll --tool-args -march=arm

Bugpoint work by using deductive logic to break down and remove parts of the bug.ll file and automatically narrow down the LLVM IR lines needed to reproduce the bug. It can take some minutes so be patient but bugpoint will eventually stop and give you a bugpoint-reduced-simplified.bc and print some information on how to reproduce the bug.

File a LLVM bugreport containing the bugpoint-reduced-simplified.bc file
An example of a Shark JIT LLVM bug that have been fixed after submitting a bugpoint-reduced-simplified.bc produced from a dumped Shark metod are :
LLVM PR6478 ARM CodeGen Running pass ‘Prolog/Epilog Insertion & Frame Finalization’ on function ‘@”java.lang.String::getChars”‘

I hope this post have given you some inspiration on how to get Shark LLVM JIT CodeGen crash bugs fixed!
If you want to know more about how bugpoint works and how to officially prepare LLVM bugreports then take a peek at the LLVM documentation: http://llvm.org/docs/HowToSubmitABug.html its great.

Xerxes

March 9, 2010


I have from time to time been working on a automatic CPU feature tuning code for Shark to make the LLVM JIT generate better code for Cortex-A8 class ARM CPUs. Using it i was able to gain some substantial speed-improvements, all in all it made Shark generate 30% faster code on ARM and the Shark JIT are now able to beat the 2000 point CaffeineMark 3.0 score! I could not resist using CaffeineMark 3.0 for some benchmarking again :) .

While this looks all great with rainbows and unicorns the patch are sadly in limbo. I have had some trouble merging the code into LLVM 2.7 trunk before the next LLVM release since I got hit by LLVM problem report 6544 that will force me to redesign the implementation on top of LLVM before it can be commited.

And a similar optimisation like what i did for ARM Linux can easily be done for Shark on PPC Linux as well by adopting the ARM CPU features detection code from ARM to PPC!
For those who are interested the optimising code are submitted and can be fetched from LLVM PR5389.

Cheers and have a great day!
Xerxes

February 25, 2010

During the past months i have seen some really cool stuff done using small powerefficient ARM computers and OpenJDK.

SimpleSimon connects

Simple Simon PT connected to a hospital laboratory system by using a powerefficient plugcomputer and displaylink usb screen. All powered by OpenJDK

Simple Simon PT connects:

This project hooks up a battery powered laboratory coagulation device, a Simple Simon PT reader to standard hospital laboratory system using a ASTM-1394-1397 / LIS2-A2 connectivity over ethernet. A small ARM based plugcomputer does all data message processing and communication. User interraction are performed by using the Simon reader and a usb-barcode reader to enter laboratory identification. Optionally can a usb-touch-screen be connected for improved user feedback, by displaying charts using JFreeChart, to show and give a better understanding of the coagulation process.

Powerconsumption tops at 15W with the USB screen attached and 6W without. All running silent without any moving parts!

Shark linked against the shared libLLVM-2.7svn.so

Shark linked against the shared libLLVM-2.7svn.so

Shark linked against dynamic LLVM .so library

Earlier today I got Shark linked against a shared libLLVM-2.7svn.so generated by using LLVM 2.7svn trunk. It work by simply building LLVM using configure –enable-shared –enable-optimized –disable-assertions and then tweak the Icedtea6 main Makefiles to use the shared library during liking:
Replace the line
LLVM_LIBS = -lLLVMX86Disassembler -lLLVMX86AsmParser -lLLVMMCParser -lLLVMX86AsmPrinter -lLLVMX86CodeGen -lLLVMSelectionDAG -lLLVMAsmPrinter -lLLVMX86Info -lLLVMJIT -lLLVMExecutionEngine -lLLVMCodeGen -lLLVMScalarOpts -lLLVMInstCombine -lLLVMTransformUtils -lLLVMipa -lLLVMAnalysis -lLLVMTarget -lLLVMMC -lLLVMCore -lLLVMSupport -lLLVMSystem
with
LLVM_LIBS = -lLLVM-2.7svn
in the main icedtea6/Makefile and then build Icedtea6 normally, Shark currently builds and works right out of the box when using a LLVM release build!

A cool thing by building shark against the shared library are that you can switch the LLVM JIT that Shark uses from running with or without assertions, debug code, and various extra optimizations by simply replacing the /usr/local/lib/libLLVM-2.7svn.so file with what you want. Linking time during shark builds and shark footprint are impressively smaller as well. Im really happy to see this functionallity in LLVM 2.7!

The LLVM 2.7 code freeze before the 2.7 release happens in about 1.5 weeks from now and i will stay busy for some days to observe and polish the current LLVM svn trunk to be usable with openjdk-6-shark.

Edward Nevill created a ARM Jazelle RTC Thumb2 JIT reference implementation

Meanwhile I have been busy taming Sharks a new kind of Thumb2 JIT have emerged built by Edward Nevill of Cambridge Software Labs! The new Tumb2JIT have been committed into the Icedtea6 trunk and it are a working implementation of Jazelle RTC to be used by ARM Cortex-A8+ class CPUs. It wonderfull that this have been released as free software, Wow!,

Suddenly we got three different JITs to use on ARM with OpenJDK: Cacao, Shark and T2. An opurtunity has emerged to tier them and so I did. Here comes the raw “truth” produced by Caffeine Mark 3! This will probably be the last time i will show off any Caffeine Mark 3 benchmark since it really dont give justice on real world client applications where responsiveness are more crucial than top runtime speed, nevertheless benchmarking using CM30 have always felt fun so here we go: All benchmarks running using a Sharp PC-Z1 Cortex-A8 Mobile internet tool.

Tier between Edwards Thumb 2 JIT , Shark LLVM JIT and Cacao JIT: All running on OpenJDK 6 ARM

Tier between Edwards Thumb 2 JIT , Shark LLVM JIT and Cacao JIT: All running on a ARM Sharp PC-Z1 Mobile internet tool smartbook using OpenJDK 6 compiled with Icedtea6.

This new T2 JIT’s main strenght are reduced jitting time, it basically cuts all jtting time to almost zero and client applications on ARM finnaly runs from tick one. This thumb2 jit makes a really nice java applet browser experience with about 15 seconds first applet startuptime on a ARM smartbook and and all usable instantly after being loaded.
A small 1min 12seconds .3gp movie displaying some java applets running on the Sharp PC-Z1 featuring the new thumb2jit from Icedtea6

Cheers and have a great day!
Xerxes

October 6, 2009

picture of the day!

The picture that made my day!

Ok.. so what happened?

xerxes@babbage-karmic:/wd/icedtea6/openjdk/build/linux-arm/bin$ ./java -version
java version "1.6.0_0"
OpenJDK Runtime Environment (IcedTea6 1.7pre-r2a3725ce72d4) (build 1.6.0_0-b16)
OpenJDK Shark VM (build 14.0-b16-product, mixed mode)

xerxes@babbage-karmic:/wd/icedtea6/openjdk/build/linux-arm/bin$ cat /proc/cpuinfo
Processor    : ARMv7 Processor rev 1 (v7l)
BogoMIPS    : 799.53
Features    : swp half thumb fastmult vfp edsp
CPU implementer    : 0x41
CPU architecture: 7
CPU variant    : 0x2
CPU part    : 0xc08
CPU revision    : 1
Hardware    : Freescale MX51 Babbage Board
Revision    : 51011
Serial        : 0000000000000000

xerxes@babbage-karmic:/wd/llvm$ svn info
URL: http://llvm.org/svn/llvm-project/llvm/trunk
Repository Root: http://llvm.org/svn/llvm-project
Repository UUID: 91177308-0d34-0410-b5e6-96231b3b80d8
Revision: 82896
Node Kind: directory
Schedule: normal
Last Changed Author: edwin
Last Changed Rev: 82896
Last Changed Date: 2009-09-27 11:08:03 +0000 (Sun, 27 Sep 2009)

xerxes@babbage-karmic:/wd/llvm$ quilt diff
Index: llvm/lib/Target/ARM/ARMInstrInfo.td
===================================================================
--- llvm.orig/lib/Target/ARM/ARMInstrInfo.td    2009-10-06 12:35:26.000000000 +0000
+++ llvm/lib/Target/ARM/ARMInstrInfo.td    2009-10-06 12:36:03.000000000 +0000
@@ -645,7 +645,7 @@
 IIC_Br, "mov lr, pc\n\tbx $func",
 [(ARMcall_nolink GPR:$func)]>,
 Requires<[IsARM, IsNotDarwin]> {
-    let Inst{7-4}   = 0b0001;
+    let Inst{7-4}   = 0b0011;
 let Inst{19-8}  = 0b111111111111;
 let Inst{27-20} = 0b00010010;
 }

The last patch on LLVM are currently a hack. basically it makes LLVM emit ARM BLX instructions instead of BX instructions for ARM::CALL_NOLINK. So why did this little hack make it work?

In order to understand that, one have to find out what made Shark on ARM crash before…

Lets rewind time to some days ago... 

Hi, i have been enjoying myself inside gdb for some days, and I have now at least found the reason why the cpu
ends up in garbage memory when running shark on arm.

The problem can be illustrated like this:

frame manager invokes jited code
entry_zero.hpp:57 invokes jit code at 0x67c9e990

jited code runs
0x67c9e990:    push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0x67c9e994:    sub    sp, sp, #12    ; 0xc
0x67c9e998:    ldr    r12, [r3, #756]
0x67c9e99c:    ldr    lr, [r3, #764]
0x67c9e9a0:    sub    r4, lr, #56    ; 0x38
0x67c9e9a4:    cmp    r4, r12
0x67c9e9a8:    bcc    0x67c9ebd0
0x67c9e9ac:    mov    r5, r3
0x67c9e9b0:    str    r2, [sp, #4]
0x67c9e9b4:    mov    r6, r0
0x67c9e9b8:    str    r4, [r5, #764]
0x67c9e9bc:    str    r4, [r4, #20]
0x67c9e9c0:    ldr    r0, [pc, #640]    ; 0x67c9ec48
0x67c9e9c4:    str    r0, [r4, #28]
0x67c9e9c8:    ldr    r0, [r5, #768]
0x67c9e9cc:    str    r0, [r4, #32]
0x67c9e9d0:    add    r0, r4, #32    ; 0x20
0x67c9e9d4:    str    r0, [r5, #768]
0x67c9e9d8:    str    r6, [r4, #16]
0x67c9e9dc:    ldr    r7, [r1]
0x67c9e9e0:    ldr    r0, [r1, #4]
0x67c9e9e4:    str    r0, [sp]
0x67c9e9e8:    ldr    r8, [r1, #8]
0x67c9e9ec:    ldr    r9, [r1, #12]
0x67c9e9f0:    ldr    r0, [r1, #16]
0x67c9e9f4:    str    r0, [sp, #8]
0x67c9e9f8:    ldr    r10, [r1, #20]
0x67c9e9fc:    ldr    r2, [pc, #584]    ; 0x67c9ec4c   <------ jit code calls a jvm function stored in this address
0x67c9ea00:    mov    r0, r1
0x67c9ea04:    bx    r2 <---------------------------   problem!  should have been blx!

(gdb) x 0x67c9ec4c
0x67c9ec4c:    0x40836d9c
(gdb) x 0x40836d9c
0x40836d9c <_ZN13SharedRuntime17OSR_migration_endEPi>:    0xe92d41f0
(gdb)

so lets check out _ZN13SharedRuntime17OSR_migration_endEPi

0x40836d9c <_ZN13SharedRuntime17OSR_migration_endEPi+0>:    push    {r4, r5, r6, r7, r8, lr}    <------  lr are backed up..  but bx did not update lr..
0x40836da0 <_ZN13SharedRuntime17OSR_migration_endEPi+4>:    ldr    r4, [pc, #284]    ; 0x40836ec4 <_ZN13SharedRuntime17OSR_migration_endEPi+296>
0x40836da4 <_ZN13SharedRuntime17OSR_migration_endEPi+8>:    ldr    r7, [pc, #284]    ; 0x40836ec8 <_ZN13SharedRuntime17OSR_migration_endEPi+300>
0x40836da8 <_ZN13SharedRuntime17OSR_migration_endEPi+12>:    ldr    r6, [pc, #284]    ; 0x40836ecc <_ZN13SharedRuntime17OSR_migration_endEPi+304>
0x40836dac <_ZN13SharedRuntime17OSR_migration_endEPi+16>:    add    r4, pc, r4
0x40836db0 <_ZN13SharedRuntime17OSR_migration_endEPi+20>:    ldr    r12, [r4, r7]
0x40836db4 <_ZN13SharedRuntime17OSR_migration_endEPi+24>:    ldr    r1, [r4, r6]
0x40836db8 <_ZN13SharedRuntime17OSR_migration_endEPi+28>:    ldr    r5, [r12]
0x40836dbc <_ZN13SharedRuntime17OSR_migration_endEPi+32>:    ldrb    r2, [r1]
0x40836dc0 <_ZN13SharedRuntime17OSR_migration_endEPi+36>:    add    r3, r5, #1    ; 0x1
0x40836dc4 <_ZN13SharedRuntime17OSR_migration_endEPi+40>:    cmp    r2, #0    ; 0x0
0x40836dc8 <_ZN13SharedRuntime17OSR_migration_endEPi+44>:    sub    sp, sp, #24    ; 0x18
0x40836dcc <_ZN13SharedRuntime17OSR_migration_endEPi+48>:    str    r3, [r12]
0x40836dd0 <_ZN13SharedRuntime17OSR_migration_endEPi+52>:    mov    r7, r0
0x40836dd4 <_ZN13SharedRuntime17OSR_migration_endEPi+56>:    bne 0x40836e74 <_ZN13SharedRuntime17OSR_migration_endEPi+216>
0x40836dd8 <_ZN13SharedRuntime17OSR_migration_endEPi+60>:    ldr    r2, [pc, #240]    ; 0x40836ed0 <_ZN13SharedRuntime17OSR_migration_endEPi+308>
0x40836ddc <_ZN13SharedRuntime17OSR_migration_endEPi+64>:    ldr    r12, [r4, r2]
0x40836de0 <_ZN13SharedRuntime17OSR_migration_endEPi+68>:    ldrb    r3, [r12]
0x40836de4 <_ZN13SharedRuntime17OSR_migration_endEPi+72>:    cmp    r3, #0    ; 0x0
0x40836de8 <_ZN13SharedRuntime17OSR_migration_endEPi+76>:    beq 0x40836e20 <_ZN13SharedRuntime17OSR_migration_endEPi+132>
0x40836dec <_ZN13SharedRuntime17OSR_migration_endEPi+80>:    ldr    r6, [pc, #224]    ; 0x40836ed4 <_ZN13SharedRuntime17OSR_migration_endEPi+312>
0x40836df0 <_ZN13SharedRuntime17OSR_migration_endEPi+84>:    ldr    r5, [r4, r6]
0x40836df4 <_ZN13SharedRuntime17OSR_migration_endEPi+88>:    add    r0, r4, r6
0x40836df8 <_ZN13SharedRuntime17OSR_migration_endEPi+92>:    tst    r5, #1    ; 0x1
0x40836dfc <_ZN13SharedRuntime17OSR_migration_endEPi+96>:    beq 0x40836e8c <_ZN13SharedRuntime17OSR_migration_endEPi+240>
0x40836e00 <_ZN13SharedRuntime17OSR_migration_endEPi+100>:    ldr    r5, [pc, #208]    ; 0x40836ed8 <_ZN13SharedRuntime17OSR_migration_endEPi+316>
0x40836e04 <_ZN13SharedRuntime17OSR_migration_endEPi+104>:    ldr    r3, [r4, r5]
0x40836e08 <_ZN13SharedRuntime17OSR_migration_endEPi+108>:    cmp    r3, #0    ; 0x0
0x40836e0c <_ZN13SharedRuntime17OSR_migration_endEPi+112>:    movne r0, r3
0x40836e10 <_ZN13SharedRuntime17OSR_migration_endEPi+116>:    ldrne r6, [r3]
0x40836e14 <_ZN13SharedRuntime17OSR_migration_endEPi+120>:    ldrne r12, [r6, #16]
0x40836e18 <_ZN13SharedRuntime17OSR_migration_endEPi+124>:    movne lr, pc
0x40836e1c <_ZN13SharedRuntime17OSR_migration_endEPi+128>:    bxne    r12
0x40836e20 <_ZN13SharedRuntime17OSR_migration_endEPi+132>:    add    r6, sp, #20    ; 0x14
0x40836e24 <_ZN13SharedRuntime17OSR_migration_endEPi+136>:    mov    r0, r6
0x40836e28 <_ZN13SharedRuntime17OSR_migration_endEPi+140>:    bl 0x40596c84 <NoHandleMark>
0x40836e2c <_ZN13SharedRuntime17OSR_migration_endEPi+144>:    mov    r0, sp
0x40836e30 <_ZN13SharedRuntime17OSR_migration_endEPi+148>:    bl 0x4057909c <JRT_Leaf_Verifier>
0x40836e34 <_ZN13SharedRuntime17OSR_migration_endEPi+152>:    ldr    r3, [pc, #160]    ; 0x40836edc <_ZN13SharedRuntime17OSR_migration_endEPi+320>
0x40836e38 <_ZN13SharedRuntime17OSR_migration_endEPi+156>:    mov    r5, sp
0x40836e3c <_ZN13SharedRuntime17OSR_migration_endEPi+160>:    ldr r12, [r4, r3]
0x40836e40 <_ZN13SharedRuntime17OSR_migration_endEPi+164>:    ldrb r0, [r12]
0x40836e44 <_ZN13SharedRuntime17OSR_migration_endEPi+168>:    cmp    r0, #0    ; 0x0
0x40836e48 <_ZN13SharedRuntime17OSR_migration_endEPi+172>:    movne r0, r7
0x40836e4c <_ZN13SharedRuntime17OSR_migration_endEPi+176>:    blne 0x4039b20c <_Z15trace_heap_freePv>
0x40836e50 <_ZN13SharedRuntime17OSR_migration_endEPi+180>:    mov    r0, r7
0x40836e54 <_ZN13SharedRuntime17OSR_migration_endEPi+184>:    bl 0x407b6a94 <_ZN2os4freeEPv>
0x40836e58 <_ZN13SharedRuntime17OSR_migration_endEPi+188>:    mov    r0, sp
0x40836e5c <_ZN13SharedRuntime17OSR_migration_endEPi+192>:    bl 0x40578c5c <~JRT_Leaf_Verifier>
0x40836e60 <_ZN13SharedRuntime17OSR_migration_endEPi+196>:    mov    r0, r6
0x40836e64 <_ZN13SharedRuntime17OSR_migration_endEPi+200>:    bl 0x40596b04 <~NoHandleMark>
0x40836e68 <_ZN13SharedRuntime17OSR_migration_endEPi+204>:    add    sp, sp, #24    ; 0x18
0x40836e6c <_ZN13SharedRuntime17OSR_migration_endEPi+208>:    pop {r4, r5, r6, r7, r8, lr}
0x40836e70 <_ZN13SharedRuntime17OSR_migration_endEPi+212>:    bx    lr <------  and woho. lets enjoy a trip to garbage memory!

So when the function that the jit calls returns we find ourself eating
garbage memory.

So the small hack fixed this issue quite well but broke armv4t compatibility for the moment.

My next task would be to fix this properly in LLVM.

September 13, 2009

During the past month i have been running a public llvm-arm-linux buildbot in order to iron out the remaining bugs in the LLVM Execution Engine JIT for ARM.
My goal was to stabilise the LLVM JIT so that it can be used to speed up cool projects like OpenJDK on ARM by fixing all pre-requirements to run Gary Benson’s Shark JIT compiler on top of Zero!

I have been following the LLVM project for about a year and for me to see the following reports from the buildbot makes me jump of joy! It marks a new era, when all cool and silent energy efficient computing on ARM can get JIT accelerated!

  • (Sep 12 21:57) rev=[81669] success #153: build successful
  • (Sep 12 19:17) rev=[81660] success #152: build successful
  • (Sep 12 16:31) rev=[81655] failure #151: failed test-llvm
  • (Sep 12 13:03) rev=[81626] failure #148: failed test-llvm

The next LLVM release 2.6 gets out in about a week (21 of september 2009) and feel I have done my part in the LLVM stabilisation process for ARM, It are now up for the LLVM 2.6 release managers to merge in the patches from the 2.7 svn trunk to the release branch in order to make the LLVM 2.6 release stable on ARM as well.

Life is cool!

May 10, 2009

I have just pushed an update to the Jalimo project that enables the new OpenJDK 6 b16 sourcebundle to be cross-compile-able for embedded devices using Jalimo as a cross-compile layer for Icedtea6.

Using Jalimo you can now cross-compile OpenJDK b16 and have hotspot + zero, hotspot + shark or cacao as the vm built out of the box, simply awesome!

Since shark are using the pre2.6 LLVM sources for its JIT I have also prepared “.bb” build recipes for Openembedded that enables quick cross compilation of LLVM based on the LLVM svn trunk so that Jalimo can make use of them when building shark.

The shark vm are built with assertions enabled in order to produce better debug output for all Jalimo users.

Robert Schuster have been an excellent tutor for me to understand all the quirks of OE-recipes, quirks that in turn helped me to creating all these new nice cross compile recipes for OE and Jalimo. Thank you Robert and thank you for pushing the LLVM recipes into the main OE dev git tree!

Andrew Haley and Gary Benson have helped me enormously to understand the lock-free code using memory-barriers that are part of the zero and shark hotspot implementations. I will keep working on these parts in order to make zero and shark rock solid on ARM before ARM Cortex A9 multi-core CPU’s will be part of every cool and silent computing loving persons pocket.

April 15, 2009

Why going to hell in the first place?
During the past year I have involved myself in the porting effort of OpenJDK to various embedded systems using the Icedtea build system. Unfortunally the embedded development boards that I had access during last year where equiped with inadequate amounts of RAM making it practically impossible to build Icedtea directly on native hardware. I learnt how to workaround this by using emulated hardware with more RAM using QEMU, now the compilation process of Icedtea was doable, yet it still took a week to compile Icedtea6 using QEMU. urgh…

I spent some time in my personal created emulator hell watching time pass by and always feeling behind not working on the current code base like all other freejava hackers. I finally decided something had to be done about this and wanted to break free.

My liberation was made possible by first scouting for a build environment suitable for rapid cross compilation and porting development of OpenJDK for any hardware architecture imaginable, including your toaster. Lucky for me I got to know about the Jalimo project and even got the chance to meet one of its core developers Robert Schuster during FOSDEM09. Robert demonstrated how easy embedded Java development could be using the Jalimo infrastructure and it provided me with all the tools I needed to speedup my compile run and test cycle, I was no longer in need of a emulator instead I could build binarys swiftly using the full power of an affordable IA32 quadcore cpu (with 12Mb of cache) and deploy my work for testing on real hardware for debugging within ohurs not days, simply bliss.

It took some time for me to understand how the four required pieces openembedded, bitbake, jalimo and my own goal could be merged, it turned out they where designed to fit!
First openembedded bitbake and jalimo where all three downloaded from their respecive svn or git trees
The only tricky part was that I (the user) needs to provide configuration files containing the basic information of what kind of target i want to crosscompile against and specify what bundles of recipe I want to use to accomplish this, basically I had to express my mind in a way that bitbake understood.

Once this was setup I could stand in any directory and start the build by simply typing:
bitbake openjdk-6
… or any other software package as long I knew the name of the recipe to use.
Even bitbake openjdk-6-shark builds out of the box!

Within a day playing with bitbake my build computer had downloaded 2gig of sourcecode, eaten several gigs of harddrive space. Rather cool… It had built all cross compilers tools I needed, compiled all dependent librarys, from scratch and all optimised for the target hardware that I wanted to run and debug the binarys on.
The final result was then obtained in the temp directory of my choice.

I have documented my work crosscompilation experiences on the Icedtea wiki:
http://icedtea.classpath.org/wiki/CrossCompileFaq

Cheers and have a great day!
Xerxes

August 29, 2008

Processing; a lightweight Java IDE targeted for creation of interactive computer art can now be run on embedded ARM hardware thanks to the Icedtea, Cacao, Classpath and OpenJDK projects!

Check out the full Processing IDE is running below on a embedded Fedora 8 ARM Linux system ! It is running using OpenJDK6 with CACAO vm, compiled using the classpath bootstrapped Icedtea buildsystem. The ARM hardware is a ATMEL AT91SAM9263-EK devkit with a 200mhz ARMv5tejl cpu and 64Mb ram.

Good news for embedded ARM users; The Debian armel port for ARM are now shipping prebuilt openjdk packages that can be installed by simply running:

apt-get install openjdk-6-jdk

OpenJDK using the CACAO JIT can also be obtained from the experimental (sid) repositorys and be installed by running:

apt-get install cacao-oj6-jdk

I look forward to see all the possibilities with interactive art created by using embedded ARM hardware, Linux, OpenJDK and Processing!

Ps. the ARM cpu is the same kind of cpu found in most cellphones including the iPhone!

Cheers!

Xerxes Rånby

August 21, 2008

OpenJDK6 got sucessfully compiled using CACAO JIT jvm for ARMv5tejl EABI softfloat using the Icedtea6 buildsystem! This compile was made using mercurial sources, Icedtea6 changeset: 1013:a469b20018d9 and CACAO changeset: 8656:140bc48ab360.

Icedtea is served!

java version “1.6.0_0″
OpenJDK Runtime Environment (build 1.6.0_0-b11)
CACAO (build 1.1.0pre, JIT mode)

The footprint of the compiled CACAO/OpenJDK6 j2re-image is 84mb.
I look forward to the next CACAO release “1.0.0″ and expect it to be a smasher for embedded ARM java developement! The release fixes some rather tricky bugs PR84 and PR99 found in CACAO 0.99.3 that could trigger sporadic crashes during the OpenJDK class compilation and when running java programs on ARM systems.

Some output from the running jvm.
For some reason red and blue gets swapped when running on my ARM displays framebuffer, red and blue looks fine if i run applications remote using “ssh -X”.

The best of enterprise business applications OneSlime on arm!


Voxel speedtest.

Powered by WordPress