[Contributing] Decals

Post by Vevusio » Sun Feb 27, 2011 2:05 am

For the last few days i've been working on an alternative to Sprites.
The result is the Decal and DecalBatch classes.

Decals are basically sprites in 3d space, so there are the following additions:
+ Their position has a z component
-> You don't have to bother with rendering order, just set the z position once and let the batch do the rest
+ They can be rotated along their local x, y and z axis
-> If you only use an orthographic 2d world you can rotate them just like you would a sprite by rotating along the z axis
+ The batch takes care of things like transparency and blending
-> If some of your decals require simple 0/100% transparency, others GL_ONE/GL_ONE blending, others whatnot, the batch deals with that (SpriteBatch had a setBlendFunc)
+ Possibility to set a depth sort or frustum culling strategy
-> Do you have a 2d side scroller? If yes write your own culling strategy that checks if a decal is in the view frustum. The batch automatically queries for view checks or depth sorting
+ Uses the z buffer
-> No overdraw if decals cover each other. I first considered the option to enable/disable the use of the buffer, but the trade off between avoiding overdraw and z testing probably evens out or ends up favoring the z testing
+ Decal batch is a singleton
-> There really is no need for more than 1 batch. Instead of managing it yourself just use DecalBatch.instance()
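
To make the culling point in the list above concrete, here is a minimal sketch of a view check for a 2d side scroller. The interface and class names (CullSketch, ViewRectCull) are hypothetical and are not taken from decals.zip; the real strategy interface may look different. It only assumes that a decal can be described by its center and half extents.

```java
/** Hypothetical stand-in for a culling strategy; not the interface shipped in decals.zip. */
interface CullSketch {
    boolean isVisible(float centerX, float centerY, float halfWidth, float halfHeight);
}

/** Keeps only decals whose bounding box overlaps an axis-aligned 2d view rectangle. */
class ViewRectCull implements CullSketch {
    private final float left, right, bottom, top;

    ViewRectCull(float left, float right, float bottom, float top) {
        this.left = left;
        this.right = right;
        this.bottom = bottom;
        this.top = top;
    }

    @Override
    public boolean isVisible(float centerX, float centerY, float halfWidth, float halfHeight) {
        // Two boxes overlap when they overlap on both axes.
        boolean overlapsX = centerX + halfWidth >= left && centerX - halfWidth <= right;
        boolean overlapsY = centerY + halfHeight >= bottom && centerY - halfHeight <= top;
        return overlapsX && overlapsY;
    }

    public static void main(String[] args) {
        CullSketch cull = new ViewRectCull(0f, 800f, 0f, 480f);
        System.out.println(cull.isVisible(400f, 240f, 16f, 16f));
        System.out.println(cull.isVisible(-100f, 240f, 16f, 16f));
    }
}
```

A batch would call a check like this once per decal before copying its vertices, so off-screen decals cost one comparison instead of a draw.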

If you use Sprite and SpriteBatch you can simply switch them out.
Using Decals is very simple:

Code: Select all
//load up a texture
Texture texture = new Texture(Gdx.files.internal("texture.png"));
//instantiate a decal
Decal decal = Decal.newDecal(new TextureRegion(texture));

//when rendering, add every decal you want drawn (decal_1 to decal_n)
DecalBatch.instance().add(decal_1);
DecalBatch.instance().add(decal_2);
// ...
DecalBatch.instance().add(decal_n);
//when done
DecalBatch.instance().flush();


DecalBatch has no matrices of its own. It will use whatever projection/modelview matrix is currently applied. If you used a camera just apply the transformation before flush()ing. If you used custom matrices with SpriteBatch just do something like

Code: Select all
class MatrixPoweredBatch {
    Matrix4 projection;
    Matrix4 modelView;

    public void add(Decal decal) { DecalBatch.instance().add(decal); }

    public void flush() {
        applyMatrices(); // member method of this class which changes the matrix mode to projection etc. etc.
        DecalBatch.instance().flush();
    }
}


I wanted to keep the matrices out of the batch.

Last but not least. Performance:
I know it's not a _proper_ benchmark, but good enough. One of the 2 samples you can download with the attachment uses a modified version of the code which you can find on the blog.
The modified version has a target fps (set to 40). What happens is that the average of the FPS of the last 5 seconds is taken; if it's < 40 the amount of Decals rendered is decreased, if it's > 40 the amount of Decals rendered is increased. Exactly the same program exists using Sprites and SpriteBatch.
So after 30 sec or so the thing settles at 40 (+-1) fps. The purpose was to compare DecalBatch to SpriteBatch.
The screen resolution, camera etc. are the same on both tests.
The SpriteTest uses 2 textures (one needs blending, the other doesn't).
The DecalTest uses 3 textures (same 2 as SpriteTest + one which uses additive blending).
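
The target-fps loop described above can be sketched roughly like this. This is not the code from the attachment: the class name, the step size and the once-per-second sampling are assumptions made for illustration; only the 40 fps target and the 5 second averaging window come from the description.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper that nudges the object count toward a target frame rate. */
class AdaptiveCountController {
    private final int targetFps;
    private final int windowSeconds;
    private final int step; // assumption: objects added or removed per adjustment
    private final Deque<Integer> recentFps = new ArrayDeque<>();
    private int count;

    AdaptiveCountController(int targetFps, int windowSeconds, int step, int initialCount) {
        this.targetFps = targetFps;
        this.windowSeconds = windowSeconds;
        this.step = step;
        this.count = initialCount;
    }

    /** Call once per second with the FPS measured over that second; returns the new count. */
    int onSecondElapsed(int fps) {
        recentFps.addLast(fps);
        if (recentFps.size() > windowSeconds) recentFps.removeFirst();
        if (recentFps.size() < windowSeconds) return count; // wait until the window is full

        double sum = 0;
        for (int sample : recentFps) sum += sample;
        double average = sum / recentFps.size();

        if (average < targetFps) count = Math.max(0, count - step);
        else if (average > targetFps) count += step;
        return count;
    }

    int count() {
        return count;
    }

    public static void main(String[] args) {
        AdaptiveCountController controller = new AdaptiveCountController(40, 5, 10, 100);
        for (int second = 0; second < 5; second++) controller.onSecondElapsed(60);
        System.out.println("count after 5 fast seconds: " + controller.count());
    }
}
```

With a fixed step the count oscillates around the point where the average hits the target, which matches the "settles at 40 (+-1) fps" behaviour only approximately.
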
Code: Select all
 @40fps  |   PC   |  Device
--------------------------------
Sprites# | ~5.2k  |   ~115
Decals#  | ~31.1k |   ~480

* PC = i5, GTX260
* Device = i9000
-- Executed in direct succession to each other so roughly the same cpu load/ram usage can be assumed.


[Image: 31k decals rendered (no blending, transparency blending and additive blending)]

[Image: demonstrating rotation]

NOTE1: The Decal uses Quaternion rotation. The Quaternion in the math package of libgdx really only contains a normalize method, so i've subclassed it and added functionality. It would be nice if this could be merged with the existing Quaternion. As far as math stuff and merging goes, there are a couple of TODO tags.
NOTE2: Unlike SpriteBatch, the DecalBatch does not use OpenGL ES 2.0 if it is available (also a TODO).
Attachment: decals.zip (152.27 KiB)

Post by mzechner » Sun Feb 27, 2011 2:35 am

This is awesome. I'll check it out asap and contact you concerning inclusion and licensing issues.

The benchmark is interesting (can't test it at the moment). The early out z-tests seem to help a lot. Awesome!

Post by mzechner » Sun Feb 27, 2011 3:38 am

OK, a couple of comments:

  • this is neat :D
  • the state management doesn't work with OpenGL ES 1.0. See page 28/29 at http://www.khronos.org/registry/gles/specs/1.0/opengles_spec_1_0.pdf
  • there's a small bug in DecalBatch.java, line 293. It enables depth writes instead of alpha testing.
  • while the use of iterators, Collections.sort(), HashMaps and so on is good practice it's sadly a no go on Android. All these classes produce allocations which make the GC on Android go crazy (yes, even on 2.3/3.0).
  • alpha testing does not go down well with the deferred tile renderers in mobile GPUs. I assume you enable it so that fully translucent pixels don't get written to the z-buffer? The problem is that this disables early-z testing which is a big win in many circumstances.
  • setting the depth sort and culling strategy must be accompanied by a flush. Otherwise decals not yet flushed before setting the sort/cull will be rendered with the wrong settings.
  • instead of using a loop to transfer the vertices from the decal to the batch you should use System.arraycopy(). That might increase the performance even more.
  • having the pivot of the decal be its center is a good idea. Anything else is just asking for trouble and not really necessary
  • using quats to avoid gimbal lock and kill a few calculations is a good idea as well. the current Quaternion class lacks a few functions, we'll happily add what you have in QuaternionExt (and fix Matrix.fromQuat, mea culpa, old copy & paste job...)
  • Using srdBlend and dstBlend to decide whether a material is translucent or transparent is interesting. I have to check the other blend modes to see whether they exclude the need for alpha testing. If that's the case then there's no reason to distinguish between transparent and translucent as you need blending and alpha testing in both cases.
  • on a related note: alpha testing is not available in OpenGL ES 2.0, for the reason stated above. It kills all optimizations the GPU could do. Can we live without it? Then the transparent/translucent distinction could also go away.
  • unrolling Decal.transformVertices() might increase performance as well, especially on Android where there's no method inlining.
  • Cull and sort strategy interfaces are a very good idea. We have a coding standard that does not use I as an interface descriptor, that's more C# than Java.
  • We'd need to add shader support
  • The single instance approach is legitimate, we might consider it for SpriteBatch as well.
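
The bulk-copy suggestion from the list above can be sketched as follows. The class name and the 24 floats per decal are assumptions for illustration (4 vertices with position, color and texture coordinates); the actual vertex layout in DecalBatch may differ. System.arraycopy itself is the standard JDK call.

```java
/** Hypothetical sketch comparing an element-wise loop with System.arraycopy. */
class VertexCopySketch {
    // Assumption: 4 vertices * (x, y, z, color, u, v) per decal.
    static final int FLOATS_PER_DECAL = 24;

    /** Element-wise copy; returns the next free offset in the batch array. */
    static int copyWithLoop(float[] decalVertices, float[] batchVertices, int offset) {
        for (int i = 0; i < decalVertices.length; i++) {
            batchVertices[offset + i] = decalVertices[i];
        }
        return offset + decalVertices.length;
    }

    /** Same result using one bulk copy; returns the next free offset in the batch array. */
    static int copyWithArraycopy(float[] decalVertices, float[] batchVertices, int offset) {
        System.arraycopy(decalVertices, 0, batchVertices, offset, decalVertices.length);
        return offset + decalVertices.length;
    }

    public static void main(String[] args) {
        float[] decal = new float[FLOATS_PER_DECAL];
        for (int i = 0; i < decal.length; i++) decal[i] = i;
        float[] batch = new float[FLOATS_PER_DECAL * 2];
        int next = copyWithArraycopy(decal, batch, 0);
        next = copyWithArraycopy(decal, batch, next);
        System.out.println("floats written: " + next);
    }
}
```

Both methods fill the batch array identically; whether the bulk copy is measurably faster for arrays this small would have to be checked on the target device.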

Also, You gamed the benchmarks :D

Code: Select all
      batch.begin();
      batch.disableBlending();
      int i = 0;
      for(Sprite sprite : toRender) {
         if(i++ % 2 == 0)
            batch.enableBlending();
         sprite.setRotation(sprite.getRotation() + elapsed * 45);
         sprite.setScale(scale, scale);
         sprite.draw(batch);
      }
      batch.end();


You disable/enable blending for each sprite which will flush the SpriteBatch internally. I already wondered how the DecalBatch could be faster in the bench than SpriteBatch. Well, now i know :) In DecalBatch you can sort by material and render material groups in one go, that's a clear plus over SpriteBatch. However, that only works when using the z-buffer and giving your Decals different z-coordinates, which, depending on the scenario, might not be an option (e.g. 3D game with perspective camera plus 2D HUD with Decals needs a z-buffer clear which is costly). So, DecalBatch is clearly faster than SpriteBatch in a scenario where you want to render opaque and blended sprites in an alternating manner. Without that need (and that need is not all that common in 2D) SpriteBatch wins.
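
To illustrate why alternating blend states hurts a batch that flushes on every state change, here is a small simulation. It is not SpriteBatch code: the class and method names are hypothetical, and it only counts how often a batch would flush if each change of the blending flag forced one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical simulation: counts flushes caused by blend-state changes. */
class BlendFlushSketch {
    /** One flush per change of the blending flag, plus the final flush when the batch ends. */
    static int countFlushes(List<Boolean> needsBlending) {
        if (needsBlending.isEmpty()) return 0;
        int flushes = 1; // the flush at the end of the batch
        boolean current = needsBlending.get(0);
        for (boolean blended : needsBlending) {
            if (blended != current) {
                flushes++;
                current = blended;
            }
        }
        return flushes;
    }

    /** Reorders the draw list so opaque entries (false) come before blended ones (true). */
    static List<Boolean> groupedByBlending(List<Boolean> needsBlending) {
        List<Boolean> grouped = new ArrayList<>(needsBlending);
        grouped.sort(Comparator.naturalOrder());
        return grouped;
    }

    public static void main(String[] args) {
        List<Boolean> alternating = Arrays.asList(false, true, false, true, false, true);
        System.out.println("alternating: " + countFlushes(alternating));
        System.out.println("grouped: " + countFlushes(groupedByBlending(alternating)));
    }
}
```

Grouping changes the draw order, which is why it only gives correct results when something else, such as the z-buffer, resolves visibility.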

Another slight advantage you give to the DecalTest is that you render fewer pixels. Lightingball.png is a 128x128 image, and every third decal uses that image. The other two images are 256x256. SpriteTest only uses the 256x256 images, so DecalTest renders 25% fewer pixels per frame: (2 * 256^2 + 128^2) / (3 * 256^2) = 0.75.

Overall it's an awesome addition and we'd love to include it. For this you'd have to agree to give us the right to include it in libgdx under the Apache 2 license along with the right for us to modify it as needed. You'd of course keep the copyright (nona as we say here in Austria) and would be listed as the author of the files in libgdx. If that's fine with you i'd start integrating this for 0.9.

Post by Obli » Sun Feb 27, 2011 9:22 am

Awesome stuff yes !
Still, as Mario pointed out, could you provide a comparison with the exact same conditions for SpriteB and DecalB ?

Post by NateS » Sun Feb 27, 2011 11:41 am

Looks really cool!

mzechner wrote:
  • while the use of iterators, Collections.sort(), HashMaps and so on is good practice it's sadly a no go on Android. All these classes produce allocations which make the GC on Android go crazy (yes, even on 2.3/3.0).

Vevusio, the sort problem you reported has been fixed, so you should be ok to use the libgdx utils classes. Thanks for pointing it out!

mzechner wrote:
  • The single instance approach is legitimate, we might consider it for SpriteBatch as well.

The SpriteBatch constructor is still needed for batch size and n-buffering, but the singleton is convenient for tests and for some apps. I don't really have strong feelings either way.

BlindZSort doesn't respect the contract on compare: sgn(compare(x, y)) == -sgn(compare(y, x))
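
A quick way to check the contract quoted above is sketched below. The BROKEN comparator is a made-up example of a violation, not the actual BlindZSort code, and the Item class is a hypothetical stand-in for a decal with a z position.

```java
import java.util.Comparator;

/** Hypothetical sketch of the compare() sign contract for a z sort. */
class ZSortSketch {
    /** Stand-in for a decal; only the z position matters here. */
    static final class Item {
        final float z;

        Item(float z) {
            this.z = z;
        }
    }

    // Made-up violation: never returns a negative value, so swapping the arguments
    // does not flip the sign.
    static final Comparator<Item> BROKEN = (a, b) -> a.z > b.z ? 1 : 0;

    // Float.compare returns a negative, zero or positive value consistently.
    static final Comparator<Item> BY_Z = (a, b) -> Float.compare(a.z, b.z);

    /** Checks sgn(compare(x, y)) == -sgn(compare(y, x)) for one pair. */
    static boolean respectsSignContract(Comparator<Item> comparator, Item x, Item y) {
        return Integer.signum(comparator.compare(x, y)) == -Integer.signum(comparator.compare(y, x));
    }

    public static void main(String[] args) {
        Item near = new Item(1f);
        Item far = new Item(2f);
        System.out.println("BY_Z ok: " + respectsSignContract(BY_Z, near, far));
        System.out.println("BROKEN ok: " + respectsSignContract(BROKEN, near, far));
    }
}
```

Sort implementations are allowed to rely on this contract, so a comparator that breaks it can produce an inconsistent order.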

Post by Obli » Sun Feb 27, 2011 12:30 pm

The problem with singleton in libgdx is that using the lib for a multi-window desktop application would be much harder (but, well, I'm not sure that the OpenGL context can be duplicated for now either...)

Post by mzechner » Sun Feb 27, 2011 12:44 pm

I hear you. For 1.0 (or one of the next nightlies) i'll concentrate on making all things libgdx configurable with regards to window/context setup. This includes fullscreen options as well as context sharing.

Post by syl » Sun Feb 27, 2011 1:08 pm

Here are the results on my Phenom II with an integrated Radeon HD 4250 graphics card, on Linux:

SpritePerformanceTest2 fps: 40 at spritecount: 624

DecalPerformanceTest2 fps: 40 at spritecount: 1337


On the Milestone:

02-27 14:05:15.381: INFO/System.out(7460): SpritePerformanceTest2 fps: 42 at spritecount: 21

02-27 14:09:47.185: INFO/System.out(7495): DecalPerformanceTest2 fps: 34 at spritecount: 41
02-27 14:09:47.209: DEBUG/dalvikvm(7372): GC freed 361 objects / 19976 bytes in 130ms
02-27 14:09:48.193: INFO/System.out(7495): DecalPerformanceTest2 fps: 36 at spritecount: 34
02-27 14:09:49.217: INFO/System.out(7495): DecalPerformanceTest2 fps: 36 at spritecount: 34
02-27 14:09:50.240: INFO/System.out(7495): DecalPerformanceTest2 fps: 37 at spritecount: 34
02-27 14:09:51.256: INFO/System.out(7495): DecalPerformanceTest2 fps: 36 at spritecount: 34


I believe that's a device issue :p
The GC kicks in, but not much more than with sprites.

what is strange is that Sprites looks smoother on Desktop, but Decals looks smoother on device :/

keep up the great work :)

Post by mzechner » Sun Feb 27, 2011 1:31 pm

There shouldn't be any GC with sprites. I outlined the reason for SpriteBatch being a bitch in the given test above so the result is not a surprise. However, the Decal performance is unexpected. I'd have assumed it was faster? Hrm, i still have to lay my hands on the code, only looked at it in Notepad++.

Post by Vevusio » Sun Feb 27, 2011 4:56 pm

wooh ok that is a lot of bugs, and i would've gotten away with it, if it wasn't for this meddling open source

Also, You gamed the benchmarks
---
could you provide a comparison with the exact same conditions for SpriteB and DecalB

yes, no, maybe :)
i admit it was unfair to take exactly that code to benchmark, i realize it is the absolutely worst case a SpriteBatch can end up with, so here's the best case:
Code: Select all
 @40fps  |   PC    |  Device
--------------------------------
Sprites# | ~15.5k  |   ~890
Decals#  | ~31.5k  |   ~830

* Blending disabled once at the create() method for sprite batch, using only 1 texture so there is no switching
* DecalBatch now renders the same amount of pixels per decal as SpriteBatch per sprite

ultimately it's not really possible to do a real-real benchmark, SpriteBatch is like a mogwai, you have to watch what you feed it or else your town gets overrun by gremlins, while DecalBatch is like a blender, the order you throw stuff in doesn't matter, the result will be the same - ok enough with the bad metaphors
i guess the fast gpu on the pc wins with the z buffer but on the device it's a different story. anyway, there is a performance hit on the device with decals, versus the advantage of not having to care about the order in which objects are sent to the batch


* instead of using a loop to transfer the vertices from the decal to the batch you should use System.arraycopy(). That might increase the performance even more.
* while the use of iterators, Collections.sort(), HashMaps and so on is good practice it's sadly a no go on Android. All these classes produce allocations which make the GC on Android go crazy (yes, even on 2.3/3.0).
* there's a small bug in DecalBatch.java, line 293. It enables depth writes instead of alpha testing.
* unrolling Decal.transformVertices() might increase performance as well, especially on Android where there's no method inlining.
* We have a coding standard that does not use I as an interface descriptor, that's more C# than Java.
* the sort problem you reported has been fixed, so you should be ok to use the libgdx utils classes. Thanks for pointing it out!
* BlindZSort doesn't respect the contract on compare: sgn(compare(x, y)) == -sgn(compare(y, x))

ok gonna fix those things

* setting the depth sort and culling strategy must be accompanied by a flush. Otherwise decals not yet flushed before setting the sort/cull will be rendered with the wrong settings.

i don't really want to do that, i'd like to keep the "add stuff, render stuff" principle. if somebody wants to use different strategies within one frame i think it is cleaner if they do it like this
Code: Select all
void render() {
    renderWorld();
    renderHud();
}
// world pass: world camera, custom culling, one flush at the end
void renderWorld() {
    ActiveCamera.instance().set(this.worldCam).apply(gl);
    DecalBatch.instance().setCullStrategy(firstSphereThenPerPointCulling);
    addWorldObjects();
    DecalBatch.instance().flush();
}
// HUD pass: screen camera, no cull strategy set, flushed separately
void renderHud() {
    ActiveCamera.instance().set(this.screenCam).apply(gl);
    DecalBatch.instance().setCullStrategy(null);
    addHUDObjects();
    DecalBatch.instance().flush();
}

ultimately it has the same effect but the interface is cleaner

* the state management doesn't work with OpenGL ES 1.0. See page 28/29 at http://www.khronos.org/registry/gles/sp ... ec_1_0.pdf
* Using srdBlend and dstBlend to decide whether a material is translucent or transparent is interesting. I have to check the other blend modes to see whether they exclude the need for alpha testing. If that's the case then there's no reason to distinguish between transparent and translucent as you need blending and alpha testing in both cases.
* on a related note: alpha testing is not available in OpenGL ES 2.0, for the reason stated above. It kills all optimizations the GPU could do. Can we live without it? Then the transparent/translucent distinction could also go away.
* alpha testing does not go down well with the deferred tile renderers in mobile GPUs. I assume you enable it so that fully translucent pixels don't get written to the z-buffer? The problem is that this disables early-z testing which is a big win in many circumstances.

alright, i don't know how deferred rendering does all the lighting stuff but its obvious that there is no way to make everyone happy with the fixed state changing
i will add another strategy which hooks in before/after rendering opaque/transparent stuff, and implement different versions so there is no problem between different versions, people who want to use something like deferred rendering will just implement their own

about the alpha testing, yes - "invisible" pixels don't get written to the z buffer, i don't quite understand why that would disable early z testing? if the pixel "isn't there" it has no business of being in the z buffer in the first place, and if it was, it would be tested against other pixels, which if they don't pass the z test (because of that invis pixel) would not be rendered even though they should (because they are actually visible, instead they are covered by the z-buffer-blocking-fully-transparent-pixel)
but yes, i will merge the transparent/translucent maps together
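
The z-buffer argument above can be shown with a one-pixel software model. This is a simplified sketch, not how a GPU or DecalBatch works internally: the class name, the "smaller z is closer" convention and the all-or-nothing color write are assumptions made for illustration.

```java
/** Hypothetical one-pixel model of depth testing with and without an alpha test. */
class AlphaTestSketch {
    /** A single framebuffer pixel; smaller depth means closer to the camera. */
    static final class Pixel {
        int color = 0x000000;          // background
        float depth = Float.MAX_VALUE; // cleared depth buffer
    }

    static void drawFragment(Pixel pixel, int color, float alpha, float z, boolean alphaTest) {
        if (alphaTest && alpha <= 0f) return; // discarded: neither color nor depth is written
        if (z >= pixel.depth) return;         // depth test fails, fragment is hidden
        pixel.depth = z;                      // depth write happens even for invisible fragments
        if (alpha > 0f) pixel.color = color;  // simplified: invisible fragments leave the color alone
    }

    public static void main(String[] args) {
        Pixel withoutTest = new Pixel();
        drawFragment(withoutTest, 0xFF0000, 0f, 1f, false); // near, fully transparent
        drawFragment(withoutTest, 0x00FF00, 1f, 5f, false); // far, opaque
        System.out.println(Integer.toHexString(withoutTest.color));

        Pixel withTest = new Pixel();
        drawFragment(withTest, 0xFF0000, 0f, 1f, true);
        drawFragment(withTest, 0x00FF00, 1f, 5f, true);
        System.out.println(Integer.toHexString(withTest.color));
    }
}
```

Without the alpha test the invisible near fragment occupies the depth buffer and the opaque fragment behind it is rejected; with the test the far fragment is drawn.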

--

so anyway, i'll do a couple of changes and post a new version later