zafena development

August 28, 2012

Last week I stumbled across and dipped my toes into, Avian, a new small and fast JVM that included a JIT for ARM.
Avian have been developed in the open during the last 5 years, I was quite surprised that all this was for me unheard of!
Avian got support to be used in combination with the OpenJDK 7 class library’s: http://oss.readytalk.com/avian/building.html

Build instructions to build Avian on a Raspberry Pi running Raspbian.
sudo apt-get install openjdk-7-jdk libz-dev git

# you may change these two export to match your system
# use i386, x86_64, armhf, armel or ppc here
export ARCH=armhf
# use i386, x86_64, ppc or arm here
export JVM_ARCH=arm

# clone and build avian in combination with one existing OpenJDK 7 jdk image
# It takes about 40 minutes to perform this compilation natively on a Raspberry Pi.
git clone https://github.com/ReadyTalk/avian.git
cd avian
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-${ARCH} make openjdk=/usr/lib/jvm/java-7-openjdk-${ARCH}

# Run the avian test suite to check that your newly built avian pass all expected functionality
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf/ make openjdk=/usr/lib/jvm/java-7-openjdk-armhf/ test

# Install the built avian libjvm.so into OpenJDK 7
sudo mkdir -p /usr/lib/jvm/java-7-openjdk-${ARCH}/jre/lib/${JVM_ARCH}/avian
sudo cp build/linux-${JVM_ARCH}-openjdk/libjvm.so /usr/lib/jvm/java-7-openjdk-${ARCH}/jre/lib/${JVM_ARCH}/avian
# Add -avian KNOWN to the end of the jvm.cfg file in order to make java -avian work
sudo sh -c "echo '-avian KNOWN' >> /usr/lib/jvm/java-7-openjdk-${ARCH}/jre/lib/${JVM_ARCH}/jvm.cfg"

# hack for ubuntu and debian to let openjdk libjava.so find the libjvm.so
sudo ln -s /usr/lib/jvm/java-7-openjdk-${ARCH}/jre/lib/${JVM_ARCH}/avian/libjvm.so /usr/lib/jvm/java-7-openjdk-${ARCH}/jre/lib/${JVM_ARCH}/libjvm.so
# the hack/workaround is required if you see the following error when running java -avian -version
# Error: Could not create the Java Virtual Machine.
# Error: A fatal exception has occurred. Program will exit.
# The above hack is not required if you compile your own build of IcedTea or OpenJDK from upstream source directly

# Done now try run it
java -avian -version
# The output to be seen
java version "1.7.0_03"
OpenJDK Runtime Environment (IcedTea7 2.1.1) (7~u3-2.1.1-1+rpi1)
Avian (build null, null)

Avian is heavily tested by its main developers for use running Eclipse SWT applications,
users of Avian have only had working Swing and AWT running during this last month.
expect Java2D to still be a little shaky when using Avian, I expect these Java2D issues to stabilize quickly thanks to the rapid development cycle that takes pace on git hub and thanks to the internal Avian test-suite that gets run after each build.

Avian got some interesting properties:
The JIT is good, it runs at a speed similar to CACAO JIT and the Thumb 2 JIT add-on for Zero.
It is embeddable, you may use Avian to create a small standalone executable that include both the JVM and your application, Avian then uses proguard to cut out and bundle only the needed code from the large OpenJDK class library that is required to run your application, all documented in the avian build documentation. This embeddable feature would possibly allow, free and open-source, OpenJDK applications to be deployed inside walled gardens such as mobile platform application stores, some people even got it to run on webOS. Avian makes Java viable on low footprint embedded systems, a generated standalone executable can be compressed to less than 1Mb!

August 22, 2012

Your CPU is inefficient to render graphics on large displays:

For a long time computer programs and operating systems have been updating the display memory directly using the main CPU. Many high level programming languages was able to draw graphics faster by simply optimizing the internal JIT compiler that allowed the graphics routines to run faster on the CPU. This simple approach was fine when computers got relatively small screens of less than 640x480 pixels the code running on the CPU was then able to update all the 307 200 pixels on the screen using only a couple of Mhz processing time, if you wanted 60 updates per second then your CPU had to update 640*480*60 ≃ 18Million pixels each second, each pixel is made up red, green and blue stored in four bytes on a 32bit display thus 18Million * 4 = 72Million bytes to get updated each second, this was almost still possible to do on a 200Mhz CPU since you then had some time left to do general computations.

When you attached larger pixel density displays then the work load increase, a full HD display at 1980x1080 require that your CPU would need to update 1980*1080*60*4 ≃ 5000Million bytes/second you would then need a 10Ghz CPU to perform this task.. this kind of CPU is impossible to create, it would self-combust, since it require enormous amounts of power because the power consumption and generated heat by the CPU increases exponentially with the increased clock rate.

During the work of porting OpenJDK to run on ARM CPUs a lot of effort went into optimize the JVM to speed up basic computations, this work did also speed up graphics rendering to some degree using a JIT yet we faced a wall. The main bottleneck was limited that we only tried to use the main CPU. I had to find a way to offload graphics tasks from the main CPU to the graphics GPU processor found in the ARM, system on a chip, SoC designs to get good and fluid graphics performance using OpenJDK.

Modern graphics GPUs are designed to render graphics large screens using the least amount of energy:

A modern graphics GPU solves the problem on how to update the large displays by using the least amount of energy by using parallelism at the hardware level, the GPU is made up of several small processor-cores running at a lower clock rate that can receive instructions from the main CPU. The many smaller GPU cores can operate in parallel to quickly update the large screen. The main CPUs task is now changed from updating the graphics memory pixels directly to become a command central with the main purpose to inform the many GPU hardware cores what to do. Luckily the graphics GPU vendors have agreed on a standardised way on how to let applications running on the main CPU to interact and instruct the graphics GPU. The GPU vendors let you access the GPU by using the OpenGL "c" API. OpenGL is accessible by loading a shared library, libGL or libEGL, shipped with your operating-system.

Unfortunately high level languages running on the JVM can not use the system installed libGL or libEGL directly, they require a bridge usually coded in the JNI API to get access to the system OpenGL library. Writing JNI code manually to simply forward all OpenGL library calls would be high maintenance work and error prone.

I did check if there was any existing bindings found buried inside the OpenJDK Java2D classes and yes it did contain an old backend for desktop OpenGL 1 to let Swing and AWT applications run accelerated, unfortunately this backend had not been maintained for many years and was not even enabled by default, instead all Java2D was rendered using the CPU directly, you can enable this OpenGL 1 backend by passing -Dsun.java2d.opengl=True . The existing OpenGL 1 Java2D bindings contained no code to let applications access the most recent OpenGL 2 and OpenGL ES 2 hardware, I had to look elsewhere for a suitable binding to get access to accelerate graphics using the OpenGL ES 2 GPU found inside ARM SoCs.

JogAmp solves the problem on how to get access to your fast GPU to perform rendering from high level languages running on the JVM like Java:

I have worked with the JogAmp community that provide a platform neutral binding that allows languages running on the JVM a low overhead access to the system installed OpenGL library. JogAmp JOGL uses a tool called gluegen to load and probe the system installed libGL and libEGL at runtime and is able to auto-generate the required JNI code and classes to let languages running on the JVM get a low overhead direct access to the OpenGL API!

I have published a small JogAmp JOGL OpenGL ES 2.0 Vertex and Fragment shader introduction, that demonstrates how to access OpenGL 2/ES 2 from Java using JogAmp JOGL at:
https://github.com/xranby/jogl-demos/blob/master/src/demos/es2/RawGL2ES2demo.java#L54

The nice thing about this introduction is that the demo source and the compiled java .class will run unmodified on both Desktop OpenGL GL2 systems and Mobile OpenGL ES2 systems.
It uses the JogAmp GL2ES2 GLProfile that use the common subset of OpenGL calls found in both desktop GL2 and mobile ES2. JogAmp will handle all platform specific bits for you like how to open a native drawable surface and let your application focus on rendering using the platform independent OpenGL/ES calls. Note that the demo source-code itself contains no Architecture or OS specific code at all!

Demo break... lets have some fun...

Try the demo on a java enabled Desktop/Mobile running X11/Windows/MeeGo or Mac system:

wget http://jogamp.org/deployment/jogamp-current/archive/jogamp-all-platforms.7z
7z x jogamp-all-platforms.7z
cd jogamp-all-platforms
mkdir -p demos/es2
cd demos/es2
wget https://raw.github.com/xranby/jogl-demos/master/src/demos/es2/RawGL2ES2demo.java
cd ../..
javac -cp jar/jogl-all.jar:jar/gluegen-rt.jar demos/es2/RawGL2ES2demo.java
java -cp jar/jogl-all.jar:jar/gluegen-rt.jar:. demos.es2.RawGL2ES2demo

Raspberry Pi supported!

At Siggraph 2012 i demonstrated for the first time JogAmp JOGL OpenGL ES 2 bindings running on the RaspberryPi, since then all source-code have been committed to the JogAmp JOGL git and been processed through the JogAmp "chuck" auto-builder. Raspberry Pi is supported by the current JogAmp release thus use the same instructions to compile and run as compared to desktop! The Raspberry Pi Broadcom VC IV NEWT driver is included.
I am personally *stunned* by the excellent performance you get on the small Raspberry Pi machine, at 5-watt power consumption, it runs butter-smooth and outclass my desktop Intel Q45/Q43 Chipset system when running at full HD 1980x1080 resolution.

I hope you enjoyed the demo.

Lets talk about the shader programs that got executed inside your graphics GPU:

The only way to create a program that can get run and executed on the GPU is by transmitting shader code to the GPU through the OpenGL 2 API.

The shader program itself is sent as a clear text string to your GPU driver for compilation. When the program is compiled OpenGL hands you a reference to the program in form of a number-ticket that you later on can use to activate the program. You will not be able to actually see the compiled code, what the compiled program looks like is still a secret only known by your GPU vendor.

The opengl API let you define two types of GPU programs, a vertex shader and a fragment shader:

The vertex shader gets executed one time for each vertex:

/* Introducing the OpenGL ES 2 Vertex shader
 *
 * The main loop inside the vertex shader gets executed
 * one time for each vertex.
 *
 *      vertex -> *       uniform data -> mat4 projection = ( 1, 0, 0, 0,
 *      (0,1,0)  / \                                          0, 1, 0, 0,
 *              / . \  <- origo (0,0,0)                       0, 0, 1, 0,
 *             /     \                                        0, 0,-1, 1 );
 *  vertex -> *-------* <- vertex
 *  (-1,-1,0)             (1,-1,0) <- attribute data can be used
 *                        (0, 0,1)    for color, position, normals etc.
 *
 * The vertex shader recive input data in form of
 * "uniform" data that are common to all vertex
 * and
 * "attribute" data that are individual to each vertex.
 * One vertex can have several "attribute" data sources enabled.
 *
 * The vertex shader produce output used by the fragment shader.
 * gl_Position are expected to get set to the final vertex position.
 * You can also send additional user defined
 * "varying" data to the fragment shader.
 *
 * Model Translate, Scale and Rotate are done here by matrix-multiplying a
 * projection matrix against each vertex position.
 *
 * The whole vertex shader program are a String containing GLSL ES language
 * http://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf
 * sent to the GPU driver for compilation.
 */
static final String vertexShader =
// For GLSL 1 and 1.1 code i highly recomend to not include a
// GLSL ES language #version line, GLSL ES section 3.4
// Many GPU drivers refuse to compile the shader if #version is different from
// the drivers internal GLSL version.
"#ifdef GL_ES \n" +
"precision mediump float; \n" + // Precision Qualifiers
"precision mediump int; \n" +   // GLSL ES section 4.5.2
"#endif \n" +

"uniform mat4    uniform_Projection; \n" + // Incomming data used by
"attribute vec4  attribute_Position; \n" + // the vertex shader
"attribute vec4  attribute_Color; \n" +    // uniform and attributes

"varying vec4    varying_Color; \n" + // Outgoing varying data
                                      // sent to the fragment shader
"void main(void) \n" +
"{ \n" +
"  varying_Color = attribute_Color; \n" +
"  gl_Position = uniform_Projection * attribute_Position; \n" +
"} ";

The fragment shader gets executed one time for each visible pixel fragment:

/* Introducing the OpenGL ES 2 Fragment shader
 *
 * The main loop of the fragment shader gets executed for each visible
 * pixel fragment on the render buffer.
 *
 *       vertex-> *
 *      (0,1,-1) /f\
 *              /ffF\ <- This fragment F gl_FragCoord get interpolated
 *             /fffff\                   to (0.25,0.25,-1) based on the
 *   vertex-> *fffffff* <-vertex         three vertex gl_Position.
 *  (-1,-1,-1)           (1,-1,-1)
 *
 *
 * All incomming "varying" and gl_FragCoord data to the fragment shader
 * gets interpolated based on the vertex positions.
 *
 * The fragment shader produce and store the final color data output into
 * gl_FragColor.
 *
 * Is up to you to set the final colors and calculate lightning here based on
 * supplied position, color and normal data.
 *
 * The whole fragment shader program are a String containing GLSL ES language
 * http://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf
 * sent to the GPU driver for compilation.
 */
static final String fragmentShader =
"#ifdef GL_ES \n" +
"precision mediump float; \n" +
"precision mediump int; \n" +
"#endif \n" +

"varying   vec4    varying_Color; \n" + //incomming varying data to the
                                        //frament shader
                                        //sent from the vertex shader
"void main (void) \n" +
"{ \n" +
"  gl_FragColor = varying_Color; \n" +
"} ";

Use the source!

https://github.com/xranby/jogl-demos/blob/master/src/demos/es2/RawGL2ES2demo.java#L54
When you first look at the source code you might find the many lines needed to render a single triangle scary.
Take a break and instead focus on the display function that gets executed about 60times/s here notice that the CPU only have to perform a handful of 4x4 matrix multiplications and then call about 15 function calls to pass all data information to the vertex and fragment shader programs inside the GPU. The GPU then by itself performing all rendering to the screen. This small amount of preparation done by the CPU each frame can easily be performed by the most simple JVM interpreter. The main bottleneck is gone we have successfully offloaded all the time consuming graphics rendering from the CPU to the GPU and as a bonus we find that we got a lot of free idle CPU time to perform general application logic computations on the JVM.

Edit: For game programming we recommend you to use JogAmp indirect through a game engine library such as libgdx or jMonkeyEngine3. We have a forum thread inside the JogAmp community where we work on improving engine support for Raspberry Pi using said engines. By using a game engine will get you up to speed developing professional games that is utilizing the hardware acceleration across devices! http://forum.jogamp.org/JOGL-2-0-OpenGL-OpenGL-ES-backend-for-LibGDX-td4027689.html

Edit2: New JogAmp video/teaser footage is now online for the FOSDEM 2013 talk and the Siggraph 2012 BOF to get a better idea on what is possible to using JogAmp in combination with engines across devices. You may want to read the http://labb.zafena.se/?p=681 post that include background information on the FOSDEM 2013 demo setup.

I hope this introduction have been a delight to read, cheers and have a great day!
Xerxes

Powered by WordPress