zafena development

August 22, 2012

Your CPU is inefficient to render graphics on large displays:

For a long time computer programs and operating systems have been updating the display memory directly using the main CPU. Many high level programming languages was able to draw graphics faster by simply optimizing the internal JIT compiler that allowed the graphics routines to run faster on the CPU. This simple approach was fine when computers got relatively small screens of less than 640x480 pixels the code running on the CPU was then able to update all the 307 200 pixels on the screen using only a couple of Mhz processing time, if you wanted 60 updates per second then your CPU had to update 640*480*60 ≃ 18Million pixels each second, each pixel is made up red, green and blue stored in four bytes on a 32bit display thus 18Million * 4 = 72Million bytes to get updated each second, this was almost still possible to do on a 200Mhz CPU since you then had some time left to do general computations.

When you attached larger pixel density displays then the work load increase, a full HD display at 1980x1080 require that your CPU would need to update 1980*1080*60*4 ≃ 5000Million bytes/second you would then need a 10Ghz CPU to perform this task.. this kind of CPU is impossible to create, it would self-combust, since it require enormous amounts of power because the power consumption and generated heat by the CPU increases exponentially with the increased clock rate.

During the work of porting OpenJDK to run on ARM CPUs a lot of effort went into optimize the JVM to speed up basic computations, this work did also speed up graphics rendering to some degree using a JIT yet we faced a wall. The main bottleneck was limited that we only tried to use the main CPU. I had to find a way to offload graphics tasks from the main CPU to the graphics GPU processor found in the ARM, system on a chip, SoC designs to get good and fluid graphics performance using OpenJDK.

Modern graphics GPUs are designed to render graphics large screens using the least amount of energy:

A modern graphics GPU solves the problem on how to update the large displays by using the least amount of energy by using parallelism at the hardware level, the GPU is made up of several small processor-cores running at a lower clock rate that can receive instructions from the main CPU. The many smaller GPU cores can operate in parallel to quickly update the large screen. The main CPUs task is now changed from updating the graphics memory pixels directly to become a command central with the main purpose to inform the many GPU hardware cores what to do. Luckily the graphics GPU vendors have agreed on a standardised way on how to let applications running on the main CPU to interact and instruct the graphics GPU. The GPU vendors let you access the GPU by using the OpenGL "c" API. OpenGL is accessible by loading a shared library, libGL or libEGL, shipped with your operating-system.

Unfortunately high level languages running on the JVM can not use the system installed libGL or libEGL directly, they require a bridge usually coded in the JNI API to get access to the system OpenGL library. Writing JNI code manually to simply forward all OpenGL library calls would be high maintenance work and error prone.

I did check if there was any existing bindings found buried inside the OpenJDK Java2D classes and yes it did contain an old backend for desktop OpenGL 1 to let Swing and AWT applications run accelerated, unfortunately this backend had not been maintained for many years and was not even enabled by default, instead all Java2D was rendered using the CPU directly, you can enable this OpenGL 1 backend by passing -Dsun.java2d.opengl=True . The existing OpenGL 1 Java2D bindings contained no code to let applications access the most recent OpenGL 2 and OpenGL ES 2 hardware, I had to look elsewhere for a suitable binding to get access to accelerate graphics using the OpenGL ES 2 GPU found inside ARM SoCs.

JogAmp solves the problem on how to get access to your fast GPU to perform rendering from high level languages running on the JVM like Java:

I have worked with the JogAmp community that provide a platform neutral binding that allows languages running on the JVM a low overhead access to the system installed OpenGL library. JogAmp JOGL uses a tool called gluegen to load and probe the system installed libGL and libEGL at runtime and is able to auto-generate the required JNI code and classes to let languages running on the JVM get a low overhead direct access to the OpenGL API!

I have published a small JogAmp JOGL OpenGL ES 2.0 Vertex and Fragment shader introduction, that demonstrates how to access OpenGL 2/ES 2 from Java using JogAmp JOGL at:
https://github.com/xranby/jogl-demos/blob/master/src/demos/es2/RawGL2ES2demo.java#L54

The nice thing about this introduction is that the demo source and the compiled java .class will run unmodified on both Desktop OpenGL GL2 systems and Mobile OpenGL ES2 systems.
It uses the JogAmp GL2ES2 GLProfile that use the common subset of OpenGL calls found in both desktop GL2 and mobile ES2. JogAmp will handle all platform specific bits for you like how to open a native drawable surface and let your application focus on rendering using the platform independent OpenGL/ES calls. Note that the demo source-code itself contains no Architecture or OS specific code at all!

Demo break... lets have some fun...

Try the demo on a java enabled Desktop/Mobile running X11/Windows/MeeGo or Mac system:

wget http://jogamp.org/deployment/jogamp-current/archive/jogamp-all-platforms.7z
7z x jogamp-all-platforms.7z
cd jogamp-all-platforms
mkdir -p demos/es2
cd demos/es2
wget https://raw.github.com/xranby/jogl-demos/master/src/demos/es2/RawGL2ES2demo.java
cd ../..
javac -cp jar/jogl-all.jar:jar/gluegen-rt.jar demos/es2/RawGL2ES2demo.java
java -cp jar/jogl-all.jar:jar/gluegen-rt.jar:. demos.es2.RawGL2ES2demo

Raspberry Pi supported!

At Siggraph 2012 i demonstrated for the first time JogAmp JOGL OpenGL ES 2 bindings running on the RaspberryPi, since then all source-code have been committed to the JogAmp JOGL git and been processed through the JogAmp "chuck" auto-builder. Raspberry Pi is supported by the current JogAmp release thus use the same instructions to compile and run as compared to desktop! The Raspberry Pi Broadcom VC IV NEWT driver is included.
I am personally *stunned* by the excellent performance you get on the small Raspberry Pi machine, at 5-watt power consumption, it runs butter-smooth and outclass my desktop Intel Q45/Q43 Chipset system when running at full HD 1980x1080 resolution.

I hope you enjoyed the demo.

Lets talk about the shader programs that got executed inside your graphics GPU:

The only way to create a program that can get run and executed on the GPU is by transmitting shader code to the GPU through the OpenGL 2 API.

The shader program itself is sent as a clear text string to your GPU driver for compilation. When the program is compiled OpenGL hands you a reference to the program in form of a number-ticket that you later on can use to activate the program. You will not be able to actually see the compiled code, what the compiled program looks like is still a secret only known by your GPU vendor.

The opengl API let you define two types of GPU programs, a vertex shader and a fragment shader:

The vertex shader gets executed one time for each vertex:

/* Introducing the OpenGL ES 2 Vertex shader
 *
 * The main loop inside the vertex shader gets executed
 * one time for each vertex.
 *
 *      vertex -> *       uniform data -> mat4 projection = ( 1, 0, 0, 0,
 *      (0,1,0)  / \                                          0, 1, 0, 0,
 *              / . \  <- origo (0,0,0)                       0, 0, 1, 0,
 *             /     \                                        0, 0,-1, 1 );
 *  vertex -> *-------* <- vertex
 *  (-1,-1,0)             (1,-1,0) <- attribute data can be used
 *                        (0, 0,1)    for color, position, normals etc.
 *
 * The vertex shader recive input data in form of
 * "uniform" data that are common to all vertex
 * and
 * "attribute" data that are individual to each vertex.
 * One vertex can have several "attribute" data sources enabled.
 *
 * The vertex shader produce output used by the fragment shader.
 * gl_Position are expected to get set to the final vertex position.
 * You can also send additional user defined
 * "varying" data to the fragment shader.
 *
 * Model Translate, Scale and Rotate are done here by matrix-multiplying a
 * projection matrix against each vertex position.
 *
 * The whole vertex shader program are a String containing GLSL ES language
 * http://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf
 * sent to the GPU driver for compilation.
 */
static final String vertexShader =
// For GLSL 1 and 1.1 code i highly recomend to not include a
// GLSL ES language #version line, GLSL ES section 3.4
// Many GPU drivers refuse to compile the shader if #version is different from
// the drivers internal GLSL version.
"#ifdef GL_ES \n" +
"precision mediump float; \n" + // Precision Qualifiers
"precision mediump int; \n" +   // GLSL ES section 4.5.2
"#endif \n" +

"uniform mat4    uniform_Projection; \n" + // Incomming data used by
"attribute vec4  attribute_Position; \n" + // the vertex shader
"attribute vec4  attribute_Color; \n" +    // uniform and attributes

"varying vec4    varying_Color; \n" + // Outgoing varying data
                                      // sent to the fragment shader
"void main(void) \n" +
"{ \n" +
"  varying_Color = attribute_Color; \n" +
"  gl_Position = uniform_Projection * attribute_Position; \n" +
"} ";

The fragment shader gets executed one time for each visible pixel fragment:

/* Introducing the OpenGL ES 2 Fragment shader
 *
 * The main loop of the fragment shader gets executed for each visible
 * pixel fragment on the render buffer.
 *
 *       vertex-> *
 *      (0,1,-1) /f\
 *              /ffF\ <- This fragment F gl_FragCoord get interpolated
 *             /fffff\                   to (0.25,0.25,-1) based on the
 *   vertex-> *fffffff* <-vertex         three vertex gl_Position.
 *  (-1,-1,-1)           (1,-1,-1)
 *
 *
 * All incomming "varying" and gl_FragCoord data to the fragment shader
 * gets interpolated based on the vertex positions.
 *
 * The fragment shader produce and store the final color data output into
 * gl_FragColor.
 *
 * Is up to you to set the final colors and calculate lightning here based on
 * supplied position, color and normal data.
 *
 * The whole fragment shader program are a String containing GLSL ES language
 * http://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf
 * sent to the GPU driver for compilation.
 */
static final String fragmentShader =
"#ifdef GL_ES \n" +
"precision mediump float; \n" +
"precision mediump int; \n" +
"#endif \n" +

"varying   vec4    varying_Color; \n" + //incomming varying data to the
                                        //frament shader
                                        //sent from the vertex shader
"void main (void) \n" +
"{ \n" +
"  gl_FragColor = varying_Color; \n" +
"} ";

Use the source!

https://github.com/xranby/jogl-demos/blob/master/src/demos/es2/RawGL2ES2demo.java#L54
When you first look at the source code you might find the many lines needed to render a single triangle scary.
Take a break and instead focus on the display function that gets executed about 60times/s here notice that the CPU only have to perform a handful of 4x4 matrix multiplications and then call about 15 function calls to pass all data information to the vertex and fragment shader programs inside the GPU. The GPU then by itself performing all rendering to the screen. This small amount of preparation done by the CPU each frame can easily be performed by the most simple JVM interpreter. The main bottleneck is gone we have successfully offloaded all the time consuming graphics rendering from the CPU to the GPU and as a bonus we find that we got a lot of free idle CPU time to perform general application logic computations on the JVM.

Edit: For game programming we recommend you to use JogAmp indirect through a game engine library such as libgdx or jMonkeyEngine3. We have a forum thread inside the JogAmp community where we work on improving engine support for Raspberry Pi using said engines. By using a game engine will get you up to speed developing professional games that is utilizing the hardware acceleration across devices! http://forum.jogamp.org/JOGL-2-0-OpenGL-OpenGL-ES-backend-for-LibGDX-td4027689.html

Edit2: New JogAmp video/teaser footage is now online for the FOSDEM 2013 talk and the Siggraph 2012 BOF to get a better idea on what is possible to using JogAmp in combination with engines across devices. You may want to read the http://labb.zafena.se/?p=681 post that include background information on the FOSDEM 2013 demo setup.

I hope this introduction have been a delight to read, cheers and have a great day!
Xerxes

February 27, 2012

Today JogAmp added a workaround to deal with GPU drivers that reports a bogus 0Hz screen refresh rate. With this fix in place hardware acceleration are working out of the box on Nokia N9 MeeGo phones in combination with the Nokia compiled Imaginative Technologies SGX 530 GPU drivers!

If you have OpenJDK installed on any ARMv7 board with a proper OpenGL-ES libEGL and libGLES driver setup then you can try running this for yourself by using my prebuilt jogamp-armv7 jars.

wget http://labb.zafena.se/jogamp/armv7/jogamp-armv7.tar.gz

tar zxvf jogamp-armv7.tar.gz

cd jogamp

sh ./run-desktop.sh

Source and build instructions are available.

JogAmp JOGL OpenGL-ES Driver compatiblity matrix

I am tracking ARMv7 libEGL/libGLES* GPU drivers compatiblity with JogAmp here:

http://jogamp.org/wiki/index.php/OpenGL_ES_Driver_compatibility_matrix

Chuck Norris force you to use the produced jars from the JogAmp "Chuck Norris" build-bot!

https://jogamp.org/chuck/job/jogl/684/

http://jogamp.org/deployment/autobuilds/master/jogl-b684-2012-02-27_11-04-43/
http://jogamp.org/deployment/autobuilds/master/jogl-b684-2012-02-27_11-04-43/artifact.properties uses gluegen build 510
http://jogamp.org/deployment/autobuilds/master/gluegen-b510-2012-02-25_20-44-27/

Assemble a ARMv7 jogamp testfolder using the JogAmp daily build:

wget http://jogamp.org/deployment/autobuilds/master/gluegen-b510-2012-02-25_20-44-27/gluegen-2.0-b510-20120225-linux-armv7.7z

wget http://jogamp.org/deployment/autobuilds/master/jogl-b684-2012-02-27_11-04-43/jogl-2.0-b684-20120227-linux-armv7.7z

7z x gluegen-2.0-b510-20120225-linux-armv7.7z

7z x jogl-2.0-b684-20120227-linux-armv7.7z

mkdir -p jogamp/jar
cp -r jogl*/etc jogamp/etc/
cp gluegen*/jar/*.jar jogamp/jar
cp gluegen*/lib/* jogamp/jar
cp jogl*/jar/*.jar jogamp/jar
cp jogl*/lib/lib* jogamp/jar
cp /usr/share/java/hamcrest-core.jar jogamp/
cp /usr/share/java/junit4.jar jogamp/

cd jogamp

java -cp jar/gluegen.jar:jar/jogl.all-mobile.jar:jar/jogl.test.jar:hamcrest-core.jar:junit4.jar com/jogamp/opengl/test/junit/jogl/demos/es2/newt/TestGearsES2NEWT -time 40000

Enjoy!

February 25, 2012

I have created a Twitter stream where I track my current progress on getting desktop OpenJDK applications running faster on ARM by taking advantage of the OpenGL ES, possibly in combination with the lima driver, and OpenVG hardware acceleration.

OpenJDK  currently fallback to used CPU bound software rendering to draw most of Java2D and 3D application on ARM.

I decided to look into it and noticed that OpenJDK internally currently only support fast hardware acceleration on GNU/Linux systems by using the standard libGL OpenGL library, this libGL library do not support the latest ARM system on a chip GPU designs instead it allways fallback to use the uttelry slow Mesa software rasterizer. In order to get things fast OpenJDK need to take controll of the ARM GPU's using the libEGL and libGLES* OpenGL ES library drivers.

You can get OpenJDK running 2x faster today by simply setting:

_JAVA_OPTIONS="-Dsun.java2d.xrender=true"

This will enable the xrender pipeline made by Clemens, its compiled in most OpenJDK builds and are simply waiting for you to switch it on to test it, its not as fast as the libEGL drivers but its faster than the pure software rendered X11 pipeline. :)

I expect to get OpenJDK running butter smooth when proper hardware acceleration using JogAmp or LWJGL are in place, both of these API have recently added OpenGL ES support in the latest releases. A promising candidate to make it happen are to combine the JogAmp JOGL OpenGL ES bindings with Brandon Borkholder's GLG2D.

https://twitter.com/#!/xranby -Tweeets for you!

« Newer Posts

Powered by WordPress