zafena development

August 22, 2012

Your CPU is inefficient to render graphics on large displays:

For a long time computer programs and operating systems have been updating the display memory directly using the main CPU. Many high level programming languages was able to draw graphics faster by simply optimizing the internal JIT compiler that allowed the graphics routines to run faster on the CPU. This simple approach was fine when computers got relatively small screens of less than 640x480 pixels the code running on the CPU was then able to update all the 307 200 pixels on the screen using only a couple of Mhz processing time, if you wanted 60 updates per second then your CPU had to update 640*480*60 ≃ 18Million pixels each second, each pixel is made up red, green and blue stored in four bytes on a 32bit display thus 18Million * 4 = 72Million bytes to get updated each second, this was almost still possible to do on a 200Mhz CPU since you then had some time left to do general computations.

When you attached larger pixel density displays then the work load increase, a full HD display at 1980x1080 require that your CPU would need to update 1980*1080*60*4 ≃ 5000Million bytes/second you would then need a 10Ghz CPU to perform this task.. this kind of CPU is impossible to create, it would self-combust, since it require enormous amounts of power because the power consumption and generated heat by the CPU increases exponentially with the increased clock rate.

During the work of porting OpenJDK to run on ARM CPUs a lot of effort went into optimize the JVM to speed up basic computations, this work did also speed up graphics rendering to some degree using a JIT yet we faced a wall. The main bottleneck was limited that we only tried to use the main CPU. I had to find a way to offload graphics tasks from the main CPU to the graphics GPU processor found in the ARM, system on a chip, SoC designs to get good and fluid graphics performance using OpenJDK.

Modern graphics GPUs are designed to render graphics large screens using the least amount of energy:

A modern graphics GPU solves the problem on how to update the large displays by using the least amount of energy by using parallelism at the hardware level, the GPU is made up of several small processor-cores running at a lower clock rate that can receive instructions from the main CPU. The many smaller GPU cores can operate in parallel to quickly update the large screen. The main CPUs task is now changed from updating the graphics memory pixels directly to become a command central with the main purpose to inform the many GPU hardware cores what to do. Luckily the graphics GPU vendors have agreed on a standardised way on how to let applications running on the main CPU to interact and instruct the graphics GPU. The GPU vendors let you access the GPU by using the OpenGL "c" API. OpenGL is accessible by loading a shared library, libGL or libEGL, shipped with your operating-system.

Unfortunately high level languages running on the JVM can not use the system installed libGL or libEGL directly, they require a bridge usually coded in the JNI API to get access to the system OpenGL library. Writing JNI code manually to simply forward all OpenGL library calls would be high maintenance work and error prone.

I did check if there was any existing bindings found buried inside the OpenJDK Java2D classes and yes it did contain an old backend for desktop OpenGL 1 to let Swing and AWT applications run accelerated, unfortunately this backend had not been maintained for many years and was not even enabled by default, instead all Java2D was rendered using the CPU directly, you can enable this OpenGL 1 backend by passing -Dsun.java2d.opengl=True . The existing OpenGL 1 Java2D bindings contained no code to let applications access the most recent OpenGL 2 and OpenGL ES 2 hardware, I had to look elsewhere for a suitable binding to get access to accelerate graphics using the OpenGL ES 2 GPU found inside ARM SoCs.

JogAmp solves the problem on how to get access to your fast GPU to perform rendering from high level languages running on the JVM like Java:

I have worked with the JogAmp community that provide a platform neutral binding that allows languages running on the JVM a low overhead access to the system installed OpenGL library. JogAmp JOGL uses a tool called gluegen to load and probe the system installed libGL and libEGL at runtime and is able to auto-generate the required JNI code and classes to let languages running on the JVM get a low overhead direct access to the OpenGL API!

I have published a small JogAmp JOGL OpenGL ES 2.0 Vertex and Fragment shader introduction, that demonstrates how to access OpenGL 2/ES 2 from Java using JogAmp JOGL at:
https://github.com/xranby/jogl-demos/blob/master/src/demos/es2/RawGL2ES2demo.java#L54

The nice thing about this introduction is that the demo source and the compiled java .class will run unmodified on both Desktop OpenGL GL2 systems and Mobile OpenGL ES2 systems.
It uses the JogAmp GL2ES2 GLProfile that use the common subset of OpenGL calls found in both desktop GL2 and mobile ES2. JogAmp will handle all platform specific bits for you like how to open a native drawable surface and let your application focus on rendering using the platform independent OpenGL/ES calls. Note that the demo source-code itself contains no Architecture or OS specific code at all!

Demo break... lets have some fun...

Try the demo on a java enabled Desktop/Mobile running X11/Windows/MeeGo or Mac system:

wget http://jogamp.org/deployment/jogamp-current/archive/jogamp-all-platforms.7z
7z x jogamp-all-platforms.7z
cd jogamp-all-platforms
mkdir -p demos/es2
cd demos/es2
wget https://raw.github.com/xranby/jogl-demos/master/src/demos/es2/RawGL2ES2demo.java
cd ../..
javac -cp jar/jogl-all.jar:jar/gluegen-rt.jar demos/es2/RawGL2ES2demo.java
java -cp jar/jogl-all.jar:jar/gluegen-rt.jar:. demos.es2.RawGL2ES2demo

Raspberry Pi supported!

At Siggraph 2012 i demonstrated for the first time JogAmp JOGL OpenGL ES 2 bindings running on the RaspberryPi, since then all source-code have been committed to the JogAmp JOGL git and been processed through the JogAmp "chuck" auto-builder. Raspberry Pi is supported by the current JogAmp release thus use the same instructions to compile and run as compared to desktop! The Raspberry Pi Broadcom VC IV NEWT driver is included.
I am personally *stunned* by the excellent performance you get on the small Raspberry Pi machine, at 5-watt power consumption, it runs butter-smooth and outclass my desktop Intel Q45/Q43 Chipset system when running at full HD 1980x1080 resolution.

I hope you enjoyed the demo.

Lets talk about the shader programs that got executed inside your graphics GPU:

The only way to create a program that can get run and executed on the GPU is by transmitting shader code to the GPU through the OpenGL 2 API.

The shader program itself is sent as a clear text string to your GPU driver for compilation. When the program is compiled OpenGL hands you a reference to the program in form of a number-ticket that you later on can use to activate the program. You will not be able to actually see the compiled code, what the compiled program looks like is still a secret only known by your GPU vendor.

The opengl API let you define two types of GPU programs, a vertex shader and a fragment shader:

The vertex shader gets executed one time for each vertex:

/* Introducing the OpenGL ES 2 Vertex shader
 *
 * The main loop inside the vertex shader gets executed
 * one time for each vertex.
 *
 *      vertex -> *       uniform data -> mat4 projection = ( 1, 0, 0, 0,
 *      (0,1,0)  / \                                          0, 1, 0, 0,
 *              / . \  <- origo (0,0,0)                       0, 0, 1, 0,
 *             /     \                                        0, 0,-1, 1 );
 *  vertex -> *-------* <- vertex
 *  (-1,-1,0)             (1,-1,0) <- attribute data can be used
 *                        (0, 0,1)    for color, position, normals etc.
 *
 * The vertex shader recive input data in form of
 * "uniform" data that are common to all vertex
 * and
 * "attribute" data that are individual to each vertex.
 * One vertex can have several "attribute" data sources enabled.
 *
 * The vertex shader produce output used by the fragment shader.
 * gl_Position are expected to get set to the final vertex position.
 * You can also send additional user defined
 * "varying" data to the fragment shader.
 *
 * Model Translate, Scale and Rotate are done here by matrix-multiplying a
 * projection matrix against each vertex position.
 *
 * The whole vertex shader program are a String containing GLSL ES language
 * http://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf
 * sent to the GPU driver for compilation.
 */
static final String vertexShader =
// For GLSL 1 and 1.1 code i highly recomend to not include a
// GLSL ES language #version line, GLSL ES section 3.4
// Many GPU drivers refuse to compile the shader if #version is different from
// the drivers internal GLSL version.
"#ifdef GL_ES \n" +
"precision mediump float; \n" + // Precision Qualifiers
"precision mediump int; \n" +   // GLSL ES section 4.5.2
"#endif \n" +

"uniform mat4    uniform_Projection; \n" + // Incomming data used by
"attribute vec4  attribute_Position; \n" + // the vertex shader
"attribute vec4  attribute_Color; \n" +    // uniform and attributes

"varying vec4    varying_Color; \n" + // Outgoing varying data
                                      // sent to the fragment shader
"void main(void) \n" +
"{ \n" +
"  varying_Color = attribute_Color; \n" +
"  gl_Position = uniform_Projection * attribute_Position; \n" +
"} ";

The fragment shader gets executed one time for each visible pixel fragment:

/* Introducing the OpenGL ES 2 Fragment shader
 *
 * The main loop of the fragment shader gets executed for each visible
 * pixel fragment on the render buffer.
 *
 *       vertex-> *
 *      (0,1,-1) /f\
 *              /ffF\ <- This fragment F gl_FragCoord get interpolated
 *             /fffff\                   to (0.25,0.25,-1) based on the
 *   vertex-> *fffffff* <-vertex         three vertex gl_Position.
 *  (-1,-1,-1)           (1,-1,-1)
 *
 *
 * All incomming "varying" and gl_FragCoord data to the fragment shader
 * gets interpolated based on the vertex positions.
 *
 * The fragment shader produce and store the final color data output into
 * gl_FragColor.
 *
 * Is up to you to set the final colors and calculate lightning here based on
 * supplied position, color and normal data.
 *
 * The whole fragment shader program are a String containing GLSL ES language
 * http://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf
 * sent to the GPU driver for compilation.
 */
static final String fragmentShader =
"#ifdef GL_ES \n" +
"precision mediump float; \n" +
"precision mediump int; \n" +
"#endif \n" +

"varying   vec4    varying_Color; \n" + //incomming varying data to the
                                        //frament shader
                                        //sent from the vertex shader
"void main (void) \n" +
"{ \n" +
"  gl_FragColor = varying_Color; \n" +
"} ";

Use the source!

https://github.com/xranby/jogl-demos/blob/master/src/demos/es2/RawGL2ES2demo.java#L54
When you first look at the source code you might find the many lines needed to render a single triangle scary.
Take a break and instead focus on the display function that gets executed about 60times/s here notice that the CPU only have to perform a handful of 4x4 matrix multiplications and then call about 15 function calls to pass all data information to the vertex and fragment shader programs inside the GPU. The GPU then by itself performing all rendering to the screen. This small amount of preparation done by the CPU each frame can easily be performed by the most simple JVM interpreter. The main bottleneck is gone we have successfully offloaded all the time consuming graphics rendering from the CPU to the GPU and as a bonus we find that we got a lot of free idle CPU time to perform general application logic computations on the JVM.

Edit: For game programming we recommend you to use JogAmp indirect through a game engine library such as libgdx or jMonkeyEngine3. We have a forum thread inside the JogAmp community where we work on improving engine support for Raspberry Pi using said engines. By using a game engine will get you up to speed developing professional games that is utilizing the hardware acceleration across devices! http://forum.jogamp.org/JOGL-2-0-OpenGL-OpenGL-ES-backend-for-LibGDX-td4027689.html

Edit2: New JogAmp video/teaser footage is now online for the FOSDEM 2013 talk and the Siggraph 2012 BOF to get a better idea on what is possible to using JogAmp in combination with engines across devices. You may want to read the http://labb.zafena.se/?p=681 post that include background information on the FOSDEM 2013 demo setup.

I hope this introduction have been a delight to read, cheers and have a great day!
Xerxes

8 Comments »

  1. Thank you! Great!!!

    Comment by Andreas — September 12, 2012 @ 21:20

  2. Thank you Xerxes, I will add a link to your blog entry when I have some time to describe what we presented at Siggraph 2012.

    Comment by Julien Gouesse — September 17, 2012 @ 14:52

  3. Thank you very much for the demo! The rendering speed is indeed very impressive at 1080p for a such small device!

    Comment by emmanuel — October 22, 2012 @ 20:54

  4. [...] A cette occasion, Xerxes fit tourner JOGL 2.0 sur Raspberry Pi qui supporte OpenGL-ES 2.0 via Mesa comme vous pouvez le voir ici. [...]

    Pingback by SIGGRAPH 2012 à Los Angeles [SIGGRAPH 2012 at Los Angeles] | Blog de Julien Gouesse — November 11, 2012 @ 01:20

  5. Thank you for this article. I very need your help. I tried to run this demo on Raspberry Pi
    I've got this exception:
    Exception in thread "main" java.lang.UnsatisfiedLinkError:
    /tmp/jogamp_0000/file_cache/jln2113502944061639206/jln6985497449694482276/libgluegen-rt.so: /tmp/jogamp_0000/file_cache/jln2113502944061639206/jln6985497449694482276/libgluegen-rt.so: cannot open shared object file: No such file or directory

    I tried to change temp directory by setting -Djava.io.tmpdir but this didn't helped me. Can we set path to this library, so it wouldn't search it in temp directory?
    I hope you will help me.

    Comment by Elena — February 4, 2013 @ 18:52

  6. This issue is discussed on the JogAmp forum and is exposed when using the Oracle JDK 8 armhf early access build.
    http://forum.jogamp.org/gluegen-rt-raspberry-problem-with-tmp-directory-td4028128.html

    Quote Elena: " Thank you very much for help.
    java version "1.8.0-ea"
    Adding this option -Djava.library.path=/usr/lib/arm-linux-gnueabihf helped me."

    You are welcome!

    Comment by xerxes — February 6, 2013 @ 15:40

  7. The compile instructions have now slightly changed
    We, the JogAmp community, have moved the demos/es2 package inside the jogl-demos repository:
    The article have been updated accoringly.

    # Compile, run and enjoy:
    wget http://jogamp.org/deployment/jogamp-current/archive/jogamp-all-platforms.7z
    7z x jogamp-all-platforms.7z
    cd jogamp-all-platforms
    mkdir -p demos/es2
    cd demos/es2
    wget https://raw.github.com/xranby/jogl-demos/master/src/demos/es2/RawGL2ES2demo.java
    cd ../..
    javac -cp jar/jogl-all.jar:jar/gluegen-rt.jar demos/es2/RawGL2ES2demo.java
    java -cp jar/jogl-all.jar:jar/gluegen-rt.jar:. demos.es2.RawGL2ES2demo

    Comment by xranby — February 7, 2013 @ 04:02

  8. I am working on a fix for JogAmp JOGL 2.0.2
    https://jogamp.org/bugzilla/show_bug.cgi?id=821 - GL2ES2 backward compatibility broken using OpenGL 4.3.0 NVIDIA drivers

    Comment by Xerxes Rånby — August 29, 2013 @ 13:17

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress