UEFI+BIOS bootloader combination

rdos
Member
Posts: 3297
Joined: Wed Oct 01, 2008 1:55 pm

Re: UEFI+BIOS bootloader combination

Post by rdos »

Brendan wrote: Yes; and sending 2 KiB of "list of commands" 60 times per second is a lot more efficient than sending 8 MiB of pixel data 60 times per second; so that "appropriate driver" can just do nothing.
I have no idea why you think command lists would avoid "pixel pounding". If some operation (like rendering a PNG) requires pixels as input, then it does so regardless of whether you use an LFB or command lists. Or are you suggesting that PNG files are processed on the GPU? And how about TrueType rendering? Are you suggesting this will also be done on the GPU?
Brendan wrote:
Owen wrote:Every good system is built upon layers. For graphics, I view three:
  • The low level layer; your OpenGL and similar; direct access for maximum performance. Similarly, you might allow direct PostScript access to PostScript printers (for example)
  • The mid level layer; your vector 2D API; useful for drawing arbitrary graphics. Cairo or OpenVG.
  • The high level layer; your GUI library; uses the vector API to draw widgets
You'll find all competent systems are built this way today, and all incompetent systems evolve to look like it (e.g. X11 has evolved Cairo, because X11's rendering APIs suck)
Every bad system is also built on layers. The most important thing is that application developers get a single consistent interface for all graphics (not just video cards), rather than a mess of several APIs and libraries that all do similar things in different ways.
Owen has a point. While you want a (Windows-like) uniform layer that can do anything from low-level graphics primitives to advanced GUI operations, you instead want to layer it with "command lists", and this will end up as a really inflexible solution. For instance, what happens if the user wants custom menus or command buttons? Will he need to implement those from the ground up because you have no middle layer that can be used as a starting point?

Additionally, it is pointless to run a GUI against a printer or file.
Antti
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Re: UEFI+BIOS bootloader combination

Post by Antti »

This is quite complicated, and the debate has no members who are "absolutely wrong" in their views on this topic. Even though I said that "a high-level description of what to draw" is superior, there are some cases where it is not. If we had a client-server model and the server side took care of all the logic and built the scene (the complex things), the client would probably want to receive a rasterized array of pixels. In this case, if the client received complex instructions for how to render some non-trivial scene, it would be slower for its own display driver infrastructure to build the image from those instructions than to simply blit the array of pixels. The same applies, of course, to photographs, which are not easy to convert to vector-like graphics. It is necessary to handle pixel-specific information.

However, I think that most desktop applications could use the descriptive definition of what to draw. I like this idea more, but it has some problems, as Owen has pointed out. For example, thinking about all the failed "retained mode" attempts, it is a challenge to succeed at this. The direction Brendan is going in is fundamentally the right one when it comes to this. I do not understand how it is not obvious that a description of what to draw gives very nice abstraction and flexibility to do whatever we want to do. Maybe it is obvious, but real-life examples tend to favor the fastest solutions currently available. I really hope that the advantages of the "command list" will outweigh the disadvantages in the long run.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: UEFI+BIOS bootloader combination

Post by Owen »

Brendan wrote:Hi,
Owen wrote:I was lazy and it was an easy test to do. I downloaded "glxgears" for Windows (Vista) and did the experiment again - exactly the same speed when everything except a thin black strip is covered, and 5 times faster when the window is minimised. This shows that both X and Vista suck (but doesn't rule out the fact that glxgears sucks).

I don't have any 3D games for Linux. Instead I tested the frame rate of Minecraft on Windows. I set the rendering distance to "far" and stood on top of a very tall tower looking out towards the horizon, so that there's lots of graphics on the bottom of the screen and nothing but blue sky at the top of the screen. With the game paused and the window entirely visible I got 25 frames per second, and with the game paused and everything except for part of the sky (and the FPS display) covered up I got the same 25 frames per second - no speedup at all. I didn't/couldn't measure it minimised. Then I decided to try Crysis and got the same frame rates (for both the menu and while playing the game) regardless of how much of the game's window is obscured.

Finally (for fun) I decided to try Minecraft and Crysis running (and both visible) at the same time. Minecraft was paused and Crysis seemed playable. After loading a saved game in Crysis, I ran up to a beach and decided to unload some ammo into a passing turtle. As soon as I pressed the trigger the graphics froze, then Minecraft and Crysis both crashed at the same time. Three cheers for mouldy-tasking OSs! ;)

Feel free to conduct your own tests using whatever games and/or 3D applications you like, running on whatever OS you like, if you need further confirmation that existing graphics systems suck.
There's no denying that in some regards they suck. Actually, modern versions of Windows are handicapped in some ways here - if you hover over the Crysis icon in your task bar you'll probably find an animated thumbnail of it.
Owen wrote:So I take it you're using an old and crappy non-compositing window manager then? Because that's the only situation in which it would become relevant that the window was interposed in between two displays
I'm using KDE 4, which is meant to support compositing (but every time I attempt to enable compositing effects it complains that something else is using the graphics accelerator and refuses). To be honest; it was hard enough just getting it to run OpenGL in a window (different versions of X and ATI drivers and libraries and whatever, all with varying degrees of "unstable"); and once I found a magic combination that seemed to work I stopped updating any of it in case I upset something.
Got it. Crappy video drivers.
Brendan wrote:
Owen wrote:Of course, in that case each graphics card isn't doing half the work each. Assuming the commands are going to both cards, they're both rendering the scene. Even if you managed to spatially divide the work up precisely between the cards.. they'd still be rendering more than 1/2 of the triangles each (because of overlap)
True; but I'd expect worst case to be the same frame rate when the window is split across 2 video cards, rather than 50 times slower.
Owen wrote:Whatever is going on, it's not smart, and it says more about the X server than GLXGears.

What I suspect is happening is that the "primary" card is rendering the image. Because it's split between the two screens, X11 can't do its normal non-composited thing and render directly into a scissor rect of the desktop. Therefore, it's falling back to rendering to a buffer, and it's probably picked something crappy like a pixmap (allocated in system memory), then copying out of said pixmap to the framebuffer on the CPU.

Even if you're using a compositing window manager, X11 barely counts as modern. If you're not using a compositing window manager, yeah, expect these problems because that's just how X11 is.
Yes, but I doubt Wayland (or Windows) will be better. The problem is that the application expects to draw to a buffer in video display memory, and that buffer can't be in 2 different video cards at the same time. There's many ways to make it work (e.g. copy from one video card to another) but the only way that doesn't suck is my way (make the application create a "list of commands", create 2 copies and do clipping differently for each copy).
Except this model is already in use today, on laptops and machines where all the outputs are connected to the IGP, such that the discrete graphics card can be powered down completely when you're not gaming. Surely you've heard of things like nVIDIA's Optimus and LucidLogix' Virtu?

For the former case the overhead is generally about 2%. Whether they're doing DMA from video RAM to system RAM, or just allocating the colour buffer in system RAM (it's the depth buffer that gets most of the traffic), I don't know. The latter option would actually work quite well with modern games, which write the colour buffer exactly once (because they have various post-processing effects).
Brendan wrote:
Owen wrote:OK, so you do expect to make the one final retained mode 3D engine to rule all 3D engines then.
I expect to create a standard set of commands for describing the contents of (2D) textures and (3D) volumes; where the commands say where (relative to the origin of the texture/volume being described) different things (primitive shapes, other textures, other volumes, lights, text, etc) should be, and some more commands set attributes (e.g. ambient light, etc). Applications create these lists of commands (but do none of the rendering).
So the final retained mode 3D engine to rule all 3D engines. Except yours is going to be more abstracted and therefore lower performance than most, never mind the fact that developers aren't going to be able to do anything innovative on it because you haven't implemented any of that yet...
Brendan wrote:So you're saying that it's entirely possible to use ray tracing on data originally intended for rasterization, and that the ray traced version can be much higher quality than the "real time rasterized" version; and your only complaint is that you'd get even higher quality images from ray tracing if mathematical models of shapes are used instead of things like meshes?
Sure, you can raytrace polygon meshes. You'd never want to.

As I said: Raytracing provides very good reproduction for specular lighting effects and terrible reproduction for diffuse effects. Rasterization natively provides nothing, but the shading systems modern engines use mean that, for diffuse lighting - the majority of lighting in the real world - they're more convincing.
Brendan wrote:
Owen wrote:
Brendan wrote:If the graphics are completely ignored; then in both cases the application decides what to draw. After that, in my case the application appends some tiny little commands to a buffer (which is sent, then ignored). In your case the application does a massive amount of pointless pixel pounding (which is sent, then ignored). If you think my way is slower, you're either a moron or a troll.
GLXGears is hardly doing any pixel pounding. In fact, it's doing none at all. It's building command lists which, if your driver is good, go straight to the hardware. If your driver is mediocre, then... well, all sorts of shenanigans can occur.
I don't think we were talking about GLXGears here; but anyway...

For GLXGears, the application sends a list of commands to the video card's hardware, and the video card's hardware does a massive pile of pointless pixel pounding, then the application sends the results to the GUI (which ignores/discards it)?

Why not just send the list of commands to the GUI instead, so that the GUI can ignore it before the prodigious pile of pointless pixel pounding occurs?
Why does the GUI not just take the application's framebuffer away?
Brendan wrote:
Owen wrote:Assuming that the application just draws its geometry to the window with no post-processing, yes, you will get a negligible speed boost (any 3D accelerator can scale and rotate bitmaps at a blistering speed).

Of course, for any modern 3D game (where modern covers the last 7 years or so) what you'll find is that the geometry you're scaling is a quad with one texture attached and a fragment shader doing the last post-processing step.
So for a modern 3D game; you start with a bunch of vertexes and textures, rotate/scale/whatever the vertexes wrong (e.g. convert "3D world-space co-ords" into "wrong 2D screen space co-ords"), then use these wrong vertexes to draw the screen wrong, then rotate/scale/whatever a second time to fix the mess you made; and for some amazing reason that defies any attempt at logic, doing it wrong and then fixing your screw-up is "faster" than just doing the original rotating/scaling/whatevering correctly to begin with? Yay!
Deferred shading is quite common in modern games. Rather than rendering the geometry with all the final shading effects, you render it with a shader which outputs the details relevant to the final shading effects. This buffer might contain data like:

Code: Select all

ubyte[3] rgb_ambient;
ubyte[3] rgb_diffuse;
ubyte[3] rgb_specular;
half_float[2] normal; // 3rd component of normal reverse engineered later because normals are normalized
plus of course don't forget the depth buffer

Next, the game will render, to either the actual frame buffer or more likely a HDR intermediate buffer, a quad with a fragment shader attached which reads this data, reverse engineers the original position from the depth and the transformation matrix, and does the lighting that the game developer desired.

HDR lighting will then require at least 2 more passes over the data in order to work out the average intensity and then apply it. If the application is doing bloom, expect another couple of passes.

Why? Well, consider a lighting system which requires 1 pass per light, with 4 lights, over 500 objects. With deferred shading, that amounts to 504 GPU state changes. Without, it becomes 2000.

Deferred shading isn't perfect of course, because it doesn't work for transparent objects (so everyone fills them in with a standard forward rendering pass later - as they do anyway, because transparent objects are expensive and have to be rendered back to front, the least efficient order, and so if you render them last hopefully they'll be occluded)
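
To make the "reverse engineers the original position from the depth" step concrete, here is a small CPU-side sketch of the per-pixel math such a lighting pass performs. It is plain C++ rather than a real shader, and it assumes an OpenGL-style column-major matrix, a [0,1] depth-buffer value, and a G-buffer normal that is already normalized; the function names are made up for illustration.

Code: Select all

#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Column-major 4x4 matrix times vector (OpenGL-style layout assumed).
static Vec4 mul(const float m[16], const Vec4 &v) {
    return { m[0]*v.x + m[4]*v.y + m[8]*v.z  + m[12]*v.w,
             m[1]*v.x + m[5]*v.y + m[9]*v.z  + m[13]*v.w,
             m[2]*v.x + m[6]*v.y + m[10]*v.z + m[14]*v.w,
             m[3]*v.x + m[7]*v.y + m[11]*v.z + m[15]*v.w };
}

// Rebuild a pixel's world-space position from its screen UV (0..1), its
// depth-buffer value (0..1) and the inverse of the view-projection matrix.
static Vec3 reconstructPosition(float u, float v, float depth, const float invViewProj[16]) {
    Vec4 ndc = { u * 2.0f - 1.0f, v * 2.0f - 1.0f, depth * 2.0f - 1.0f, 1.0f };
    Vec4 p   = mul(invViewProj, ndc);
    return { p.x / p.w, p.y / p.w, p.z / p.w };   // perspective divide
}

// Lambertian diffuse factor for one light; the G-buffer normal is assumed to
// already be normalized (as the struct above notes).
static float diffuseTerm(const Vec3 &pos, const Vec3 &normal, const Vec3 &lightPos) {
    Vec3  l     = { lightPos.x - pos.x, lightPos.y - pos.y, lightPos.z - pos.z };
    float len   = std::sqrt(l.x*l.x + l.y*l.y + l.z*l.z);
    float nDotL = (normal.x*l.x + normal.y*l.y + normal.z*l.z) / len;
    return std::max(0.0f, nDotL);
}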
Brendan wrote:
Owen wrote:
Brendan wrote:Also note that you are completely ignoring the massive amount of extra flexibility that "list of commands" provides (e.g. trivial to send the "list of commands" to multiple video cards, over a network, to a printer, to a file, etc).
Whats more efficient:
  • Executing the same commands on multiple independent GPUs
  • Executing them on one GPU (or multiple couple GPUs, i.e. CrossFire/SLI)
Answer: The later, because it uses less power (and because it lets you use the second GPU for other things, maybe accelerating physics or running the GUI)
I don't know what your point is. Are you suggesting that "list of commands" is so flexible that one GPU can execute the list of commands once and generate graphics data in 2 different resolutions (with 2 different pixel formats) and transfer data to a completely separate GPU instantly?
What I'm saying is that sending the same commands to both GPUs is a waste of a bunch of CPU time (particularly bad when most 3D is CPU bound these days) and a bunch of power and GPU time (~half of the objects on each GPU will be culled, making the entire geometry transformation a waste of time)

Buffer copying from one GPU to another is cheap (they have fast DMA engines). For two different pixel densities... render in the higher density. For two different gamuts... render into the higher gamut format.
Brendan wrote:I have no intention of supporting buffer readbacks. When one GPU only supports the fixed function pipeline and the other supports all the latest shiny features; one GPU's device driver will do what its programmer programmed it to do and the other GPU's device driver will do what its programmer programmed it to do.
OK, so any hope of doing GPGPU is gone, as is any hope of performance when you drag a window between your shiny GeForce Titan and the monitor plugged into your Intel IGP (because the IGP's driver just went into software rendering for everything because it doesn't support much, and what it does support is slow anyway).
Brendan wrote:
Owen wrote:Any GUI lets you send the commands over a network or to a file; you just need to implement the appropriate "driver"
Yes; and sending 2 KiB of "list of commands" 60 times per second is a lot more efficient than sending 8 MiB of pixel data 60 times per second; so that "appropriate driver" can just do nothing.
Did I say otherwise?

Actually, the one saving grace of X11 is that it actually manages to do this at somewhat reasonable efficiency, but AIGLX is still significantly behind direct rendering. However, even removing the X server overhead couldn't hope to save it.
Brendan wrote:Every bad system is also built on layers. The most important thing is that application developers get a single consistent interface for all graphics (not just video cards), rather than a mess of several APIs and libraries that all do similar things in different ways.
Sure.

If you want to do realtime 3D, use OpenGL or DirectX.
If you want to do some 2D, use Cairo, or CoreGraphics, or GDI.
If you want a UI... use the UI toolkit.

Nobody doing a UI wants to use the same API as someone rendering a 3D scene. That would just be mad. And maddening.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: UEFI+BIOS bootloader combination

Post by Brendan »

Hi,
rdos wrote:I have no idea why you think command lists would avoid "pixel pounding".
Imagine you've got an application, and that application's window contains 123 textured triangles (or 123 icons, or 123 thumb-nails, or 123 pieces of text, or whatever). Now imagine the application's window is positioned so that most of it is past the edge of the screen (or obscured by a different window or whatever else) and only 10 of the textured triangles are actually visible. The application itself doesn't know which triangles are visible and which ones aren't, so the application has to assume all triangles are visible.

For your "application creates pixel data" approach all 123 triangles must be drawn, then the application sends its data to something else (GUI or video driver or something) which discards most of the pixels that you wasted time drawing.

For my "lists of commands" approach all 123 triangles must be added to the "list of commands", then the application sends its data to something else (GUI or video driver or something) which determines which triangles are actually visible (which can be done extremely quickly), and only actually draws the 10 triangles that are actually visible (and completely avoids wasting time drawing the other 113 textured triangles).

Please note that almost every "rasteriser" has worked like this for the last 20+ years. There's nothing new or hard about determining which triangles (or icons, or thumb-nails, or pieces of text, or whatever) aren't visible and skipping them, or determining which triangles (or whatever) are partially visible and clipping them; especially for the very simple "bounding box" case.
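
To make that concrete, here is a minimal sketch of the kind of test the consumer of a "list of commands" might perform per command. The Rect/DrawCommand types and the executeVisible function are made up for illustration; they are not part of any real driver API.

Code: Select all

#include <vector>

struct Rect { int x0, y0, x1, y1; };

// Hypothetical command record: a bounding box plus whatever the command draws.
struct DrawCommand {
    Rect bounds;   // bounding box of the triangle/icon/text this command draws
    // ... opcode and parameters would go here ...
};

static bool intersects(const Rect &a, const Rect &b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

// Walk the list and only execute the commands that touch the visible region;
// for the example above, 113 of the 123 commands are skipped before any pixel
// is touched, and the 10 remaining ones are clipped against 'visible'.
void executeVisible(const std::vector<DrawCommand> &list, const Rect &visible) {
    for (const DrawCommand &cmd : list) {
        if (!intersects(cmd.bounds, visible))
            continue;                 // culled: never rasterised at all
        // ... rasterise/execute cmd, clipped to 'visible' ...
    }
}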
rdos wrote:If some operation (like rendering a PNG) requires pixels as input, then it does so regardless of whether you use an LFB or command lists. Or are you suggesting that PNG files are processed on the GPU? And how about TrueType rendering? Are you suggesting this will also be done on the GPU?
For these cases, the application sends a command (e.g. maybe "load_texture_from_PNG ID, file_name" or "load_texture_from_UTF8_string ID, times_new_roman, "Hello world"") and the video driver is responsible for creating the texture with the requested ID (where "responsible for" may mean that the video driver actually does the work itself, but may also mean that the video driver is responsible for asking something else, like a font engine, to do the work). Of course the video driver may cache the textures after they're loaded; and the application can send commands like "draw_texture ID, x, y" to ask the video driver to draw the textures anywhere it likes in its windows/screen (where the application's window/screen is just another texture, which also may be cached, where the GUI can send a command asking the video driver to draw the application's window/screen/texture anywhere it likes).

Also note that (examples):
  • if the "Hello world" texture isn't used for any reason (e.g. the application didn't ask for it to be used in any other texture, or it was hidden behind something, or it was past the edge of the screen, or whatever) then the video driver doesn't need to waste time creating the "Hello world" texture at all.
  • the video driver might have the texture for "Hello world" cached for one size and display it 10 times at that size, then the user (using the GUI) might zoom in on the application's window causing the video driver to ask the font engine for another texture representing "Hello world" in a much larger size (to improve image quality); and this can happen without the application knowing or caring at all, and without the GUI knowing or caring what the application's window actually contains.
  • if the user asks for a screen shot; the video driver can provide the "list of commands" that was used to create the current screen (plus any "lists of commands", file names, text strings, etc. that were used to create any other textures that are referenced by the screen's "list of commands"); and this "list of commands" (and other stuff) can be used to recreate the graphics that were on the screen in a completely different resolution and/or pixel format; and could even be sent "as is" to the printer (which may render those graphics in an extremely high resolution using CMYK and print a version that is far higher quality than the version that was originally on the screen); all without any application or GUI that created the "lists of commands" knowing or caring.
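
Purely as an illustration of the shape such a command stream might take on the application side: the Op/Command types below, the argument encoding, and the describeWindow function are hypothetical (no actual format has been published); they just mirror the example commands above.

Code: Select all

#include <cstdint>
#include <string>
#include <vector>

// Hypothetical command records mirroring the examples above.
enum class Op : std::uint32_t {
    LoadTextureFromPNG,    // create texture ID by decoding a PNG file
    LoadTextureFromUTF8,   // create texture ID by rendering a string in a font
    DrawTexture            // place texture ID at (x, y) in the texture being described
};

struct Command {
    Op            op;
    std::uint32_t textureId;
    std::string   arg;     // file name, or "font:string", depending on op
    float         x = 0.0f, y = 0.0f;
};

// The application only appends commands; the video driver decides if and when
// the expensive work (PNG decoding, font rasterisation) actually happens, and
// may cache the resulting textures.
void describeWindow(std::vector<Command> &list) {
    list.push_back({Op::LoadTextureFromPNG,  1, "logo.png"});
    list.push_back({Op::LoadTextureFromUTF8, 2, "times_new_roman:Hello world"});
    list.push_back({Op::DrawTexture, 1, "", 10.0f,  10.0f});
    list.push_back({Op::DrawTexture, 2, "", 10.0f, 200.0f});
}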
rdos wrote:
Brendan wrote:
Owen wrote:Every good system is built upon layers. For graphics, I view three:
  • The low level layer; your OpenGL and similar; direct access for maximum performance. Similarly, you might allow direct PostScript access to PostScript printers (for example)
  • The mid level layer; your vector 2D API; useful for drawing arbitrary graphics. Cairo or OpenVG.
  • The high level layer; your GUI library; uses the vector API to draw widgets
You'll find all competent systems are built this way today, and all incompetent systems evolve to look like it (e.g. X11 has evolved Cairo, because X11's rendering APIs suck)
Every bad system is also built on layers. The most important thing is that application developers get a single consistent interface for all graphics (not just video cards), rather than a mess of several APIs and libraries that all do similar things in different ways.
Owen has a point. While you want a (Windows-like) uniform layer that can do anything from low-level graphics primitives to advanced GUI operations, you instead want to layer it with "command lists", and this will end up as a really inflexible solution. For instance, what happens if the user wants custom menus or command buttons? Will he need to implement those from the ground up because you have no middle layer that can be used as a starting point?
Creating a texture (that represents a custom menu or custom command button or anything else) is no different to creating a texture (that represents an application's window), which is no different to creating a texture (that represents the GUI's screen containing many application's windows).
rdos wrote:Additionally, it is pointless to run a GUI against a printer or file.
Because nobody has ever printed a screenshot, or used something like Fraps to create a file (that they can upload to youtube)?


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: UEFI+BIOS bootloader combination

Post by Brendan »

Hi,
Owen wrote:
Brendan wrote:I'm using KDE 4, which is meant to support compositing (but every time I attempt to enable compositing effects it complains that something else is using the graphics accelerator and refuses). To be honest; it was hard enough just getting it to run OpenGL in a window (different versions of X and ATI drivers and libraries and whatever, all with varying degrees of "unstable"); and once I found a magic combination that seemed to work I stopped updating any of it in case I upset something.
Got it. Crappy video drivers.
What makes you think it's the video card drivers, and not the kernel or X or KDE or applications or mesa or openGL or other libraries? What makes you think it's any one single piece, and not a design problem with the way 2 or more pieces fit together, and not a massive design problem with the entire "graphics stack" that causes the entire graphics system to suck?

As far as I can tell, the ATI drivers themselves work well (e.g. they can handle OpenGL 3D faster than the ATI drivers on my Windows machine can).
Owen wrote:
Brendan wrote:
Owen wrote:Even if you're using a compositing window manager, X11 barely counts as modern. If you're not using a compositing window manager, yeah, expect these problems because that's just how X11 is.
Yes, but I doubt Wayland (or Windows) will be better. The problem is that the application expects to draw to a buffer in video display memory, and that buffer can't be in 2 different video cards at the same time. There's many ways to make it work (e.g. copy from one video card to another) but the only way that doesn't suck is my way (make the application create a "list of commands", create 2 copies and do clipping differently for each copy).
Except this model is already in use today, on laptops and machines where all the outputs are connected to the IGP, such that the discrete graphics card can be powered down completely when you're not gaming. Surely you've heard of things like nVIDIA's Optimus and LucidLogix' Virtu?
I was assuming that both video cards are equal (which is the case for my computer). When they aren't equal (e.g. some sort of "big/little" arrangement) it'd be easy for me to (e.g.) have one video driver that asks a different video driver to render some or all of the graphics; even if the video cards are made by different manufacturers, and (potentially) even if the video drivers are running on different computers (if/when any of these things happen to make sense for any specific situation).
Owen wrote:
Brendan wrote:
Owen wrote:OK, so you do expect to make the one final retained mode 3D engine to rule all 3D engines then.
I expect to create a standard set of commands for describing the contents of (2D) textures and (3D) volumes; where the commands say where (relative to the origin of the texture/volume being described) different things (primitive shapes, other textures, other volumes, lights, text, etc) should be, and some more commands set attributes (e.g. ambient light, etc). Applications create these lists of commands (but do none of the rendering).
So the final retained mode 3D engine to rule all 3D engines. Except yours is going to be more abstracted and therefore lower performance than most, never mind the fact that developers aren't going to be able to do anything innovative on it because you haven't implemented any of that yet...
You've never seen an actual standard or anything that describes exactly what it is, and never seen an actual implementation of it, and you've only seen a (partial/simplified) description of certain parts of it; and with this complete lack of any usable information you've ignored any/all obvious advantages and innovations, then leaped to the conclusion that it's "worse" for no apparent reason whatsoever.

Do you have one rational reason to make this claim (that hasn't already been discussed and shown as false) at all?
Owen wrote:
Brendan wrote:So you're saying that it's entirely possible to use ray tracing on data originally intended for rasterization, and that the ray traced version can be much higher quality than the "real time rasterized" version; and your only complaint is that you'd get even higher quality images from ray tracing if mathematical models of shapes are used instead of things like meshes?
Sure, you can raytrace polygon meshes. You'd never want to.
Sure, nobody would ever want to do that.

Of course the entire point of this part of the discussion was that you could render the "lists of commands" in much higher quality (using whatever technique you like) if you don't have real-time (e.g. 1/60th of a second) time limits and you can afford to throw "unlimited" amounts of processing at it (e.g. screen shots, printers, etc).
Owen wrote:
Owen wrote:GLXGears is hardly doing any pixel pounding. In fact, it's doing none at all. It's building command lists which, if your driver is good, go straight to the hardware. If your driver is mediocre, then... well, all sorts of shenanigans can occur.
I don't think we were talking about GLXGears here; but anyway...
I think this part of the discussion originally came from rdos's inability to understand how it's possible to avoid drawing things that can't be seen. I'm fairly sure you don't have the same problem (and that you do understand things like clipping polygons to the edges of viewing volumes, etc, even if you seem to be refusing to extend this knowledge to GUIs)...
Owen wrote:
Brendan wrote:For GLXGears, the application sends a list of commands to the video card's hardware, and the video card's hardware does a massive pile of pointless pixel pounding, then the application sends the results to the GUI (which ignores/discards it)?

Why not just send the list of commands to the GUI instead, so that the GUI can ignore it before the prodigious pile of pointless pixel pounding occurs?
Why does the GUI not just take the application's framebuffer away?
Because then you can't avoid doing a prodigious pile of pointless pixel pounding when none of it is visible. Also note that for a hobby OS, the prodigious pile of pointless pixel pounding is likely to be extremely expensive because native video drivers capable of full hardware acceleration don't write themselves, and it's very likely that the prodigious pile of pointless pixel pounding (that you're trying very hard to fail to avoid) will have to be done in software without any hardware acceleration at all.
Owen wrote:
Brendan wrote:So for a modern 3D game; you start with a bunch of vertexes and textures, rotate/scale/whatever the vertexes wrong (e.g. convert "3D world-space co-ords" into "wrong 2D screen space co-ords"), then use these wrong vertexes to draw the screen wrong, then rotate/scale/whatever a second time to fix the mess you made; and for some amazing reason that defies any attempt at logic, doing it wrong and then fixing your screw-up is "faster" than just doing the original rotating/scaling/whatevering correctly to begin with? Yay!
Deferred shading is quite common in modern games. Rather than rendering the geometry with all the final shading effects, you render it with a shader which outputs the details relevant to the final shading effects. This buffer might contain data like:

Code: Select all

ubyte[3] rgb_ambient;
ubyte[3] rgb_diffuse;
ubyte[3] rgb_specular;
half_float[2] normal; // 3rd component of normal reverse engineered later because normals are normalized
plus of course don't forget the depth buffer

Next, the game will render, to either the actual frame buffer or more likely a HDR intermediate buffer, a quad with a fragment shader attached which reads this data, reverse engineers the original position from the depth and the transformation matrix, and does the lighting that the game developer desired.

HDR lighting will then require at least 2 more passes over the data in order to work out the average intensity and then apply it. If the application is doing bloom, expect another couple of passes.

Why? Well, consider a lighting system which requires 1 pass per light, with 4 lights, over 500 objects. With deferred shading, that amounts to 504 GPU state changes. Without, it becomes 2000.

Deferred shading isn't perfect of course, because it doesn't work for transparent objects (so everyone fills them in with a standard forward rendering pass later - as they do anyway, because transparent objects are expensive and have to be rendered back to front, the least efficient order, and so if you render them last hopefully they'll be occluded)
And it does all of this using the wrong 2D co-ords, then fixes its screw-up with extra overhead (that could have easily been avoided) afterwards?

Let me put it another way (without burying this part of the discussion under a huge wall of unnecessary/pointless details). Is it better to:
  • a) draw things in the right place at the right scale, or
  • b) draw things in the wrong place at the wrong scale for no reason whatsoever, then waste extra time fixing your stupidity
Owen wrote:
Brendan wrote:I don't know what your point is. Are you suggesting that "list of commands" is so flexible that one GPU can execute the list of commands once and generate graphics data in 2 different resolutions (with 2 different pixel formats) and transfer data to a completely separate GPU instantly?
What I'm saying is that sending the same commands to both GPUs is a waste of a bunch of CPU time (particularly bad when most 3D is CPU bound these days) and a bunch of power and GPU time (~half of the objects on each GPU will be culled, making the entire geometry transformation a waste of time)
Ah, so you're saying that "list of commands" is so flexible that it can solve this problem in many different ways (but some of those ways are better/worse than others), and that you agree that "list of commands" is amazingly flexible. 8)
Owen wrote:
Brendan wrote:I have no intention of supporting buffer readbacks. When one GPU only supports the fixed function pipeline and the other supports all the latest shiny features; one GPU's device driver will do what its programmer programmed it to do and the other GPU's device driver will do what its programmer programmed it to do.
OK, so any hope of doing GPGPU is gone,
Whether or not my graphics API supports or doesn't support reading back graphics has nothing to do with GPGPU (which is an entirely different API for an entirely different purpose).
Owen wrote:as is any hope of performance when you drag a window between your shiny GeForce Titan and the monitor plugged into your Intel IGP (because the IGP's driver just went into software rendering for everything because it doesn't support much, and what it does support is slow anyway).
A GPU's device driver will do what its programmer programmed it to do; which (as I've already explained several times) may include asking a different device driver to do some or all of its rendering (e.g. rendering the entire screen, or rendering individual textures).
Owen wrote:If you want to do realtime 3D, use OpenGL or DirectX.
If you want to do some 2D, use Cairo, or CoreGraphics, or GDI.
If you want a UI... use the UI toolkit.
If you want to do real-time 3D, use "list of commands"
If you want to do some 2D, then that's just a special case of 3D anyway and it'd be stupid to have an entirely different API
If you want a UI, everything (applications, GUIs, "widget services", toolkits, libraries, whatever) can easily handle generating, processing and communicating with each other using "lists of commands"
Owen wrote:Nobody doing a UI wants to use the same API as someone rendering a 3D scene. That would just be mad. And maddening.
Because creating a list of commands that describes a texture (for a 3D game), is entirely different to creating a list of commands that describes a texture (for a GUI), which is entirely different to creating a list of commands that describes a texture (for an application's window), which is entirely different to creating a list of commands that describes a texture (for a widget); and because all of these things are entirely different, programmers need to deal with the extra complexity and stupidity of many completely different APIs for all of these completely different purposes (creating a list of commands that describes a texture).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
SinglePoster
Posts: 1
Joined: Sat Jan 11, 2014 10:54 am

Re: UEFI+BIOS bootloader combination

Post by SinglePoster »

I'd like to point out a few things that hopefully show the truth in this little war on design.
-The GPU is essentially a black box. You could think of it as a CPU where the opcode for add could be any number of things, but unless you're coding a low-level driver you're more likely to use the ADD(float a, float b) function.
-Supporting the abstracted add is more portable, while the low-level one is faster.
-Clipping to the visible portions of the screen and culling unseen components are separate ideas.
-Culling happens at the application level. If it's reached the video card you have not culled; you have simply clipped, and made the video card and data bus suffer (see the sketch after this list).
-Data comes in many formats: vector, raster, math formula, etc.
-There is no "one format": zip will never compete with PNG on images, and PNG will never compete with zip on general-purpose data. PERIOD. We have the same argument here for the same information-theoretic reasons: 3D and 2D are the same as a 1D byte stream when unwrapped.
-For a GUI specifically you can implement it vector style instead of raster style. Ultimately it will become raster style, because that is how the card works. (Read the final note and critique's OpenGL section for why this is likely to be a moot point.)
-Culling can only happen per element regardless: a single vector or a single pixel.
-If you have a vector square and a static noise texture, you DO NOT convert those formats to anything else, because they are the archetypical cases for those formats. They are that format's strong point and every other format's bad day.
-You should provide facilities for culling, etc. for both, because there will be scenarios where each is the best.
-If you want to abstract away everything and have it in a server setup, so your GUI and apps never have to worry about culling or ANYTHING because your layer supposedly does that... Congrats, you just remade X11! (Well, assuming you're POSIX...)
-The BEST you can do, if you don't want to support everything everyone can think up, is to provide a common ground. Enter DirectX, OpenGL, VESA, and framebuffers. You have your vector/pixel APIs and your pixels.
-These lowest layers will NEVER cull (remember, that's the app's job) and they MAY clip.
-These lowest layers are essentially "hardwired software": they handle the triangle vector model and textures. That's what their APIs have in common. If you're really concerned about "the complete package", just pipe OpenGL commands wherever you want them to go.
-If you want to do ANYTHING ELSE beyond what the APIs provide, you are looking at pixels, since that is the root format (vector triangles get rasterized into pixels, after all). Typically we do this as shaders on a quad so we can hardware-accelerate our new method.
-The both/root route is the lowest layer, and you build the other layers on top. This is why we have layered hierarchies. They aren't fun, but they're necessary because there IS NO UNIFORMITY, so we abstract as much as we can without losing capabilities. Then if we want something more special-case (a GUI's elements are vector, culled before they're rasterized/"turned into OpenGL calls", while video elements are decoded into a pixel buffer), we code a special layer for it.
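
A tiny sketch of the clip/cull distinction drawn in the list above, assuming OpenGL as the lower layer; the drawWindow/drawWindowContents functions are placeholders, not any real API.

Code: Select all

#include <GL/gl.h>

// Placeholder for whatever actually renders the window's contents.
static void drawWindowContents() { /* issue the window's real draw calls here */ }

// Culling is the application/window manager's decision: the draw calls are
// never issued. Clipping is the card's job: the draw calls are issued, but the
// GPU discards fragments outside the scissor rectangle.
void drawWindow(bool completelyOffscreen, int x, int y, int w, int h) {
    if (completelyOffscreen)
        return;                    // culled: nothing reaches the GPU or the bus

    glEnable(GL_SCISSOR_TEST);
    glScissor(x, y, w, h);         // clipped: GPU throws away out-of-rect pixels
    drawWindowContents();
    glDisable(GL_SCISSOR_TEST);
}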

Basically, both are the best. One is super easy to support; the other is extremely hard. They should both be supported for the cases where they shine, but when it all boils down, you had better at least be able to support printing a BMP to the screen and vice versa. And since your desktop is likely to be raster, that function had better be damn fast, not a bunch of rect calls.


final note and critique:
Brendan's way would be the god-mode way. It's clearly best but is unimplementable. At best it assumes a common data format for all visuals - something we have not yet created, and which triangles certainly aren't (to be fair, they're close, which is why we went with them; they are also inefficient, which is why we add detail with textures). At worst it assumes hardware that supports something beyond a basic framebuffer, or even hardware that is uniform in at least its API (read: game consoles, Apple, and possibly PCs if AMD gets their way; OpenGL is an attempt at creating this through abstraction).
Owen's is common truth across nearly all hardware in existence. It also could be hardware accelerated as texture-mapped quads. Its simplicity invites optimization to the limit. The tradeoff is larger memory usage than for vector images. But seeing how moderately complex vector images at typical monitor sizes are ~1/5 the size of the same pixel version, this isn't much. Considering SVG is probably not what your graphics card talks for vectors... you're looking at the CPU for the transform as well, probably making your graphics CPU-bound (DON'T do this).
As far as using OpenGL: you pipe drawing stuff and you run into the rat's nest of everything doing direct calls to hardware or an OS-managed abstraction. You STILL have to do culling, and if you want your special window culling you are writing a window manager middle layer that all OpenGL windows get piped through. They still become pixel buffers that you composite before you send them to the main display. OpenGL creates a texture for each window, you handle each window's culling and clipping to the viewport, and then OpenGL displays the textured quads. You STILL have to implement texture-handling functionality at this point. So if it's going to be part of your "special GUI method" at the very end for composition, you might as well do it from the start, at the very least as an option for directly displaying textures of bitmaps and the like.
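
For what it's worth, the "textured quads" composition step described above is only a handful of GL calls. A rough fixed-function sketch, assuming the window contents have already been rendered into textures and an orthographic projection in desktop pixels has been set up; the Window struct and compositeDesktop function are invented for illustration.

Code: Select all

#include <GL/gl.h>
#include <vector>

// One composited window: its contents already live in an OpenGL texture.
struct Window {
    GLuint texture;     // texture holding the window's rendered pixels
    float  x, y, w, h;  // position and size on the desktop, in pixels
    bool   visible;     // result of the window manager's own culling pass
};

// Draw every visible window as a textured quad over the desktop.
void compositeDesktop(const std::vector<Window> &windows) {
    glEnable(GL_TEXTURE_2D);
    for (const Window &win : windows) {
        if (!win.visible)
            continue;   // culled by the window manager; no GL work at all
        glBindTexture(GL_TEXTURE_2D, win.texture);
        glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f); glVertex2f(win.x,         win.y);
        glTexCoord2f(1.0f, 0.0f); glVertex2f(win.x + win.w, win.y);
        glTexCoord2f(1.0f, 1.0f); glVertex2f(win.x + win.w, win.y + win.h);
        glTexCoord2f(0.0f, 1.0f); glVertex2f(win.x,         win.y + win.h);
        glEnd();
    }
    glDisable(GL_TEXTURE_2D);
}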


personal note: I think where programming went astray was in abstraction. Abstraction in and of itself is NOT good. It is a tradeoff we developed because the market evolved that way. It's an obfuscation of usage, whether of an interface or of data. Arguing for its use are ease of developing reusable code, portability/"legacy support", and ease of coding secure programs. Arguing against are bloat, obfuscation, speed loss, and "paradigm latch". That last one is essentially what we see in this war here. We use abstraction so much on larger projects to facilitate code reuse and isolation, as well as distancing ourselves from the divergent hardware that's evolved, that we forget it is not "THE ONE WAY" and sometimes we can't abstract to the extent we desire.