bzt wrote:
OpenGL
reapersms wrote:OpenGL was not a "video card" API, it was a 3D rendering API
True, but that 3D rendering was supposed to be executed on the video card, not on the CPU. (Yes, there were CPU-only GL implementations too, but that was merely a workaround, not a design goal. CPUs were never designed for efficient parallel matrix operations for example, they are and always were, general purpose.)
After some Google digging, it appears the IRIS hardware did provide geometry operations, though there are hints that they weren't always there (explicit callouts in documentation that later era ones implemented the 'full' IRIS GL pipeline in hardware).
OpenGL was a cleaned up and generalized IRIS GL, and keep in mind that transition happened in '92.
The matrix library bits were commonly partially hardware, partially software. One particular family of consoles had essentially the OpenGL 1.2 pipeline implemented in hardware, as far as having equivalents of the GL_PROJECTION, GL_NORMAL, and GL_MODELVIEW matrices, but the matrix stack operations on them were all done in software.
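To make the hardware/software split concrete, here is a minimal sketch of a matrix stack kept in software, where only the top entry would ever be handed to the hardware. All names (MatrixStack, stack_push, and so on) and the depth limit are hypothetical, not taken from any real driver or console SDK.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_DEPTH 32                 /* hypothetical limit for this sketch */

typedef struct { float m[16]; } Mat4;  /* column-major, as in GL */

typedef struct {
    Mat4 entries[STACK_DEPTH];
    int  top;                          /* index of the current matrix */
} MatrixStack;

void stack_init(MatrixStack *s) {
    static const Mat4 identity = {{1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1}};
    assert(s != NULL);
    s->top = 0;
    s->entries[0] = identity;
}

/* glPushMatrix-style: duplicate the current matrix. Returns 0 on overflow. */
int stack_push(MatrixStack *s) {
    if (s->top + 1 >= STACK_DEPTH) return 0;
    s->entries[s->top + 1] = s->entries[s->top];
    s->top++;
    return 1;
}

/* glPopMatrix-style: discard the current matrix. Returns 0 on underflow. */
int stack_pop(MatrixStack *s) {
    if (s->top == 0) return 0;
    s->top--;
    return 1;
}

/* glTranslatef-style: current = current * T(x, y, z). */
void stack_translate(MatrixStack *s, float x, float y, float z) {
    float *m = s->entries[s->top].m;
    for (int r = 0; r < 4; r++)
        m[12 + r] += m[r] * x + m[4 + r] * y + m[8 + r] * z;
}

/* The only thing the "hardware" would see: the current top matrix. */
const float *stack_top(const MatrixStack *s) {
    return s->entries[s->top].m;
}
```

The push, pop, and multiply all happen on the CPU; a driver built this way would only write the result of stack_top() to the hardware matrix registers when it changes.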
bzt wrote:
reapersms wrote:integration with a system GUI was left to external bits.
...
API were influenced a fair bit by the demands of X11
Since X11 is a GUI, these two sentences contradict each other. Also I don't remember that GL was dependent on Xlib ever (only the other way around, X11 incorporated GL). Even if the X11 protocol had tokens for GL commands, I don't think that had any influence on how the functions were named and what arguments they needed, or what properties the GL state machine had.
After some thought, it would be more correct to say that OpenGL itself did not really care how you got the context, it just defined what you could do with the context. The particulars of how you acquire an OpenGL context are up to the external bits, in this case GLX, and (later) AGL and WGL. GLX being the X11 extension to provide an OpenGL context, it absolutely defined X11 tokens for passing GL state around.
bzt wrote:
reapersms wrote:The standard was not static, just a bit slow moving
What I meant is that it did not have something like EFI's LocateProtocol. The GL standard (and therefore the API) lacked something like glGetAvailableExtensions() and glDynamicallyLinkExtension() functions. You had to install a new version of the library to get more features. You couldn't get those at run-time without re-compilation; it was static in this sense.
It absolutely did. glGetString(GL_EXTENSIONS) existed from 1.0. Acquiring the function pointers for any added entry points was provided by GLX, WGL, or AGL. You did have to know which extensions you wanted to support when you wrote your program, but that is not a particularly onerous or unusual requirement. I'm not sure what your complaint about having to install a new version of the library to get more features is about. If you're dynamically linking anyway, you have to do that for anything else. The app links against libGL.so or opengl32.dll, and an update of X or the graphics driver substitutes its own library in, or the rough equivalent (Windows ICD vs MCD stuff).
Generally that sort of thing would be largely outside the scope of the OpenGL part of the specification anyways, as the process and mechanism of dynamic linking is inherently system specific.
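For illustration, here is a minimal sketch of the run-time check against the extension string. In a real program the list would come from (const char *)glGetString(GL_EXTENSIONS); here it is passed in as a plain string so the sketch stands alone. has_extension is a hypothetical helper name, not part of GL.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in `list`.
 * A bare strstr() is not enough: "GL_EXT_texture" is a prefix of
 * "GL_EXT_texture3D", so token boundaries have to be checked. */
int has_extension(const char *list, const char *name) {
    assert(list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0 || strchr(name, ' ') != NULL) return 0;

    for (const char *p = list; (p = strstr(p, name)) != NULL; p += len) {
        int starts_token = (p == list) || (p[-1] == ' ');
        int ends_token   = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) return 1;
    }
    return 0;
}
```

The application decides at startup which code paths to enable based on checks like this, then fetches the matching entry points through the GLX, WGL, or AGL lookup call.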
bzt wrote:
reapersms wrote:(ARB realities)
Ergo they were a bunch of people incompetent to properly design an API. You said it too
They were a bunch of people with occasionally competing interests, that managed to come up with an API that still largely works nearly 30 years later. I'd call that human, not incompetent =P
bzt wrote:
reapersms wrote:The abstraction wasn't terrible at the time
Not at the time, no, but it wasn't future proof, and manufacturers couldn't extend it with their latest features for sure. You had to wait until the board voted the feature in (which, as you said, they did not do on several occasions). For example the X11 protocol is designed in a way that it can be extended without losing compatibility or requiring a new version of Xlib.so to be installed. Therefore it can cover the features of new hardware without changing the interface. That I'd say is a good, non-static abstraction.
About the only vaguely correct notion in here is that X extensions went about things with additional DLLs for some of them, like XRender. The fact that OpenGL on X doesn't usually work like that is more an implementation detail.
Vendors could expose their shiny new features via extensions from 1.0 on, with no input from the ARB at all. They still do to this day. This isn't great for the application developer, as you end up with a bunch of conditional code blocks depending on whether NV_sliced_bread or AMD_chrome_plating existed in the extension string or not (yes, returning them all in one string was a tad shortsighted, but that got fixed over a decade ago I think).
If some of those extensions are similar to each other, or seem to be good ideas, a vendor neutral EXT version would get sorted out, usually by a cross vendor group of 2-3 people. Where the ARB comes in is in deciding whether an EXT extension is good enough (and politically feasible enough) to be a core feature, and if it is, it gets renamed ARB_whatever for querying in version 1.now, and is provided as a first class core function in version 1.(now+1). It is still accessible via the extension and glXGetProcAddress, so applications that linked against 1.0 still work if the system updates to a 1.1 lib.
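The promotion path above is why applications often resolve an entry point by trying several spellings. A minimal sketch, assuming a lookup callback that stands in for glXGetProcAddress or wglGetProcAddress; resolve_promoted and the fake driver below are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef void (*proc_t)(void);
typedef proc_t (*lookup_fn)(const char *name);

/* Try the core name first, then the ARB and EXT spellings of the same
 * function. Returns NULL if none of them resolve. */
proc_t resolve_promoted(lookup_fn lookup, const char *base) {
    static const char *const suffixes[] = { "", "ARB", "EXT" };
    char name[128];
    assert(lookup != NULL && base != NULL);
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        int n = snprintf(name, sizeof name, "%s%s", base, suffixes[i]);
        if (n < 0 || (size_t)n >= sizeof name) return NULL;
        proc_t p = lookup(name);
        if (p != NULL) return p;
    }
    return NULL;
}

/* Stand-in for an old driver that only exports the EXT spelling. */
static void fake_point_parameterf_ext(void) {}

proc_t fake_old_driver_lookup(const char *name) {
    if (strcmp(name, "glPointParameterfEXT") == 0)
        return fake_point_parameterf_ext;
    return NULL;
}
```

As far as I recall, the real lookup functions are allowed to return a non-null pointer for functions the driver does not actually support, so real code should check the extension string or GL version as well rather than rely on the pointer alone. The different spellings also do not always have identical semantics, so treat this as a sketch of the idea, not a drop-in loader.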
bzt wrote:
reapersms wrote:The GLSL situation was rather not great. It was designed a bit from on high to push towards a higher level abstraction, and get away from the vendor specific assembly interfaces. The goal was laudable, to let applications write one shader program
For that it's enough to standardize a bytecode. The hardware manufacturers can then convert the standard bytecode into their vendor specific instructions. Just as SPIR-V in Vulkan. No need for a complete compiler. I'd like to point out that IBM had experience with JIT bytecode compilation for over half a century (!) and Java already had a long history when OpenGL 3 came out... Again, the examples were there, but Khronos members simply did not pay attention.
Yes it would have been nice if they'd gone with a common bytecode or IL at the time. The extensions that the ARB shader system grew from were assembly bytecode level, but they decided to put the language itself straight in.
Given what the hardware at the time did, I am a bit glad SPIR-V took until now. The bytecode at the time would have almost certainly been based around the concept of 4-wide SIMD, as vectorizing scalar code was not particularly reliable in any compiler I recall from the period. HLSL did go with a 4-wide bytecode, and many of the optimizations done to fit that make things more difficult these days -- either by confusing the output for shader debuggers, or being flat-out suboptimal now.
bzt wrote:
reapersms wrote:While that was going on, the hardware was changing drastically under the hood, which is where the abstraction started diverging from reality. OpenGL handled it a tad better than D3D did.
Yes. However the basic concepts remained, a vertex contains 3 coordinates just as 30 years ago. (I know VBOs were introduced as a big thing, huge innovation, revolution whatsoever, but in reality programs stored vertices in CPU memory in struct arrays anyway. The only difference is, a VBO is allocated on the GPU's memory.)
Everything related to vertex buffers has been a matter of eliminating redundant copies and memory traffic. 1.0 didn't have vertex arrays in core, you pushed everything through repeated calls to glVertex*(), glColor*(), glNormal*(), etc. The EXT_vertex_array extension let you give it a base pointer, stride, and format specification. You would then either make a call to glArrayElement() per vertex, or use one of the bulk calls like glDrawArrays() or glDrawElements(). It very likely was still looping over the array moving things around itself under the hood, but at least it cut out the function call and parameter traffic. In theory it could have done some things to optimize the path a bit, but practically it tended to not have enough information, and I think DrawArrays was optional. It was promoted to standard in 1.1.
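As a rough picture of what "base pointer, stride, and format" buys you, here is a sketch of the loop a software vertex-array path might run under the hood. The Vertex layout and both function names are hypothetical; this is not how any particular driver does it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical application-side layout: position plus packed color. */
typedef struct { float pos[3]; unsigned char rgba[4]; } Vertex;

/* Per-element fetch: roughly what one glArrayElement() call implies.
 * Unlike GL, this sketch does not treat stride 0 as "tightly packed". */
void fetch_float3(const void *base, size_t stride, size_t index, float out[3]) {
    assert(base != NULL && stride >= 3 * sizeof(float));
    const unsigned char *p = (const unsigned char *)base + index * stride;
    memcpy(out, p, 3 * sizeof(float));
}

/* Bulk path in the spirit of glDrawArrays(mode, first, count): the
 * application makes one call and the loop lives on the driver side. */
size_t gather_positions(const void *base, size_t stride,
                        size_t first, size_t count, float *dst) {
    for (size_t i = 0; i < count; i++)
        fetch_float3(base, stride, first + i, dst + 3 * i);
    return count;
}
```

The data is still being copied around, but the per-vertex function call and argument passing from the application side are gone, which is the saving described above.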
Buffer objects came (much) later. GPU memory buffers were certainly a big part, but another part was the interface change to explicit map and unmap calls, letting the driver know when you could have modified the memory (and thus it would need to reupload, or reprocess if the format wasn't natively supported).
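A toy model of why explicit map/unmap matters: the "driver" only re-uploads when an unmap tells it the contents may have changed. ToyBuffer and the toy_* functions are made up for this sketch, and a second heap allocation stands in for GPU memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *cpu_copy;  /* what the application writes to while mapped */
    unsigned char *gpu_copy;  /* stand-in for device memory */
    size_t   size;
    int      mapped;
    int      dirty;           /* set on unmap: contents may have changed */
    unsigned uploads;         /* how many times we paid for a copy */
} ToyBuffer;

int toy_init(ToyBuffer *b, size_t size) {
    memset(b, 0, sizeof *b);
    b->cpu_copy = calloc(1, size);
    b->gpu_copy = calloc(1, size);
    b->size = size;
    if (!b->cpu_copy || !b->gpu_copy) {
        free(b->cpu_copy);
        free(b->gpu_copy);
        return 0;
    }
    return 1;
}

void *toy_map(ToyBuffer *b) {
    if (b->mapped) return NULL;       /* double map is an error */
    b->mapped = 1;
    return b->cpu_copy;
}

void toy_unmap(ToyBuffer *b) {
    assert(b->mapped);
    b->mapped = 0;
    b->dirty = 1;                     /* the app could have written anything */
}

/* A draw only uploads if something was unmapped since the last upload. */
void toy_draw(ToyBuffer *b) {
    assert(!b->mapped);
    if (b->dirty) {
        memcpy(b->gpu_copy, b->cpu_copy, b->size);
        b->uploads++;
        b->dirty = 0;
    }
}

void toy_free(ToyBuffer *b) {
    free(b->cpu_copy);
    free(b->gpu_copy);
}
```

With a raw client-side pointer the driver has no such signal, so it has to assume the data changed on every draw.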
bzt wrote:
Again, you are perfectly right, but let's not forget M$ was on Khronos, and if anybody, they know the best how to keep compatibility with ancient old APIs and introducing new innovation at the same time. The required knowledge was there, it should not have been such a big deal to cut ties with history, imho.
... API was unexpandable. ...
As I mentioned above, the API was expandable. What it was not particularly great at was removing features. The Compatibility profile was the way to support old software, and the Core profile was to move forward. Also, MS was pretty terrible about that. Capability bits were a horrible thing in practice, and they made little or no effort at forward or backward compatibility later.
Core was a bit overzealous about removing things, and the Compatibility profile was still used by a fair amount of new software.
The parts the hardware/driver vendors really wanted to eliminate were omitted from the OpenGL ES spec, and the explosion of mobile after that has helped a good deal at weaning people off of immediate mode, if only by encouraging utility libraries to provide it layered on top of arrays.
bzt wrote:
Vulkan
reapersms wrote:That API was turned, almost directly, into Vulkan. Its primary purpose was performance, and providing a low overhead interface to reality.
That's the thing. It does not have better performance.
An application using OpenGL shaders and another one using Vulkan has exactly the same (logical) steps to make before their triangle gets displayed on the screen. What really happened here is, that all so called "overhead" is now moved from driver space into application space. Which means IF the application developer knows what he's doing, and is a better programmer than the driver developers, then he can write a software which performs better. On the other hand, all the other application programmers (who are less talented/experienced than the driver developers) are going to write a much much worse, more buggy and slower software.
And they are the majority.
There's no way they would know, what you meant by "stuff" when you said "pCreateInfo: structure with all of the stuff it needs to create the actual object", and they are definitely the ones who are too lazy to read the documentation. They will just copy'n'paste a struct from some forums without knowing what it actually means.
And Vulkan is not aimed at them. It is not meant to be the be-all, end-all API for providing 3D rendering and GPU compute. OpenGL is still there, and still getting updated. It may be the case that the OpenGL implementation happens to be a straight translation layer to Vulkan, but the application developer does not care about that detail.
Inexperienced developers are going to write shitty code, built off of terrible tutorials, and there's nothing an API can really do to avoid that. Explicitly designing an API around the idea that you can beat that is a dead end.
It is not a matter of the application developer being better than the driver developers, it is a matter of which one has more accurate information about the workload at hand. The driver developer has to support all possible workloads, and track state accordingly. They may even be kind and allow programs that don't match the spec to run "correctly", but that approach does not scale terribly well.
A concrete example would be something like the number of simultaneous texture references. One API supports 128 texture slots per shader stage. While the vast majority of applications will never do something that absurd, the driver needs to support it somehow. Maybe it gets fancy, and starts off only tracking a more reasonable number like 32, and reallocating structures under the hood or changing to a different code path when it detects the application really using that many. Maybe it just has a single code path and throws memory at the issue. Vulkan lets the application decide exactly how many slots it cares to track, and if it really does need a lot, it can. If it only needs a lot for a few specific cases, but not the other 95% of the shaders and draw calls, it can do that as well, and it doesn't have to guess.
More than that, it is a fundamentally different architecture. Multithreading OpenGL is a complicated mess, as contexts are explicitly single threaded. Sharing resources between contexts is complicated at best, nightmarish at worst.
It has been possible to come quite close to the efficiency available via Vulkan, through careful use of some OpenGL extensions, and rearranging the way your data is structured to make the driver happy. It tends to be a lot easier said than done, but a brief on the details is here (the AZDO presentation).
bzt wrote:
But window system integration had exactly the opposite effect. It should have been the windowing system that integrates Vulkan, and not the other way around. This is called poorly designed abstraction.
In retrospect, it's more accurate to describe the window system integration as being less in the API itself, and more a matter of Khronos providing an additional helper library to present a more uniform approach to acquiring a context across platforms. OpenGL ES did the same via the EGL API.
The value there is getting some consistency, vs the application developer picking one of SDL, Allegro, GLFW, glew, SFML, glut, or any of another dozen multimedia framework libraries.
bzt wrote:
I actually never had problems about a GL window not getting updated, well, ever. Although this is not a big deal, it's not that multiplatform programs are not full of "#ifdef __OS__" precompiler directives already. It's just one more.
I didn't mean the window itself didn't get updated, I meant the OS developer just decided to quit updating their implementations. MS did that for a long, long time, leaving the system standard GL stuck at 1.2. The driver could provide better, but the process to get that went something like:
- Create a window on the screen you want to display to [Win32]
- Select a pixelformat, choosing from the options available as of 1.2, creating the WGL context (you're still talking to the stock WGL) [WGL]
- Create a GL context (this gets you talking to the actual driver WGL) [WGL]
- Query the context for extensions and version [GL]
- Get pointers to the WGL extension functions, and use them to find out whether it supports a better pixelformat than you put up with [WGL]
- Destroy everything, and create a new window [GL, WGL, Win32]
- Create your real window [Win32]
- Using the extracted WGL extension functions, select the real pixelformat you want to use
- Create a GL context
- GetProcAddress everything, including everything you had queried before, because the function pointers returned are context specific.
That situation improved a bit around Windows 7 I think.
Apple has been just as poorly behaved, and ceased updating their system GL around 3.3 in favor of forcing everyone to use Metal.
bzt wrote:
reapersms wrote:API usage validation has gotten more complicated than is reasonable to deal with via glGetError and friends.
Exactly. Also API usage validation cannot substitute glGetError, as the latter can report run-time errors too, while a validator obviously can't.
(Imagine that there's everything fine with the API calls, the program works on 9 machines without problems, but on the 10th machine the video card is faulty or the driver is buggy. No way a validation layer can handle this - unfortunately not far fetched - scenario.) => bad design
I believe you misunderstand the validation layer. It is not (just) a compile time validation, it is extensive runtime checking of all arguments and state, at or beyond the ability of glGetError. When you use it, all of your vkWhatever calls go to it, and it then forwards things on to the actual driver after checking. Error reporting tends to be via callbacks, logging, or debug prints depending on choice. D3D offers similar functionality via the debug and validated device creation flags. The checks these layers do can be quite thorough, almost annoyingly so at times (yes yes, I know the value of that matrix element is really small, and will probably be zero. Don't tell me about every single one of them please).
It is also closely related to that most useful of GPU debugging tools, the API frame capture.
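The shape of a validation layer, reduced to a sketch: the application calls the layer, the layer checks arguments against state it tracks itself, reports through a callback, and then forwards to the next entry point. Every name here (ValidationLayer, layer_draw, the fake driver) is hypothetical and only loosely modeled on how the Vulkan layers behave.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*report_fn)(const char *message, void *user);
typedef int  (*draw_fn)(unsigned first, unsigned count);

typedef struct {
    draw_fn   next;                /* the real driver entry point */
    report_fn report;              /* application-supplied callback */
    void     *user;                /* passed back to the callback */
    unsigned  bound_vertex_count;  /* state the layer tracks on its own */
} ValidationLayer;

/* Runtime check, then forward. This sketch reports and still forwards
 * the call; that is an assumption, a layer could also refuse. */
int layer_draw(ValidationLayer *l, unsigned first, unsigned count) {
    assert(l != NULL && l->next != NULL);
    if (l->report != NULL) {
        if (count == 0)
            l->report("draw with zero vertices", l->user);
        else if (first > l->bound_vertex_count ||
                 count > l->bound_vertex_count - first)
            l->report("draw reads past the end of the bound vertex buffer",
                      l->user);
    }
    return l->next(first, count);
}

/* Demonstration helpers: a driver that accepts anything, and a
 * callback that just counts how many reports it received. */
int fake_driver_draw(unsigned first, unsigned count) {
    (void)first; (void)count;
    return 0;
}

void count_reports(const char *message, void *user) {
    (void)message;
    ++*(int *)user;
}
```

Since every call passes through the layer at run time, it sees the same stream of calls a frame capture tool would record, which is part of why the two are so closely related.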
bzt wrote:
reapersms wrote:the PC environment is a good deal messier than the ideal
That's true, however I have never heard of anybody complaining about using for example the POSIX API on a PC is different than using it on IRIX. Because that's the purpose of the standardized API, to hide the hardware and lower level mess.
My meaning here was that while there are a number of APIs that are as fast or faster than Vulkan, and easier to work with, they achieve that by being specific to particular environments, i.e. consoles.
On PC D3D 9..11 you need to ask the API to kindly create and manage all of your objects. You get back opaque pointers you get to pass around to things, and if you want any information about them you need to make more API calls or track it yourself in an object you wrap around that pointer. You have to pretend the various memory blocks under the hood don't really exist and you just have the object.
On PC Vulkan and D3D 12, you get a bit more control. You ask it nicely for large memory blocks, and can parcel those blocks out as you like. You may still have to go through API calls to convert offsets into real pointers, and probably still have to go through some API calls to query properties of things. There's still a lot of OS wrappings around things to deal with PCIe and such. Your GPU pointers probably aren't exactly the same as your CPU ones, and may move underneath you, hence the API calls.
Consoles are that, but usually you have a flat unified memory model, so you can ditch most of the address translation bits. Resources are exposed as small CPU visible headers referring to a GPU address, and you can modify them as you like. All of the mechanisms used to provide the shader abstraction are laid bare, and your offline shader compiler spits out actual GPU code instead of an IR that gets turned into real GPU code later.
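The "ask for a large block and parcel it out yourself" model described for Vulkan and D3D 12 can be sketched as a simple linear sub-allocator over offsets. LinearHeap and heap_alloc are hypothetical names; in real code the block would come from the API's memory allocation call and the returned offsets would be used when binding resources to it.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t size;    /* size of the one big block obtained from the API */
    uint64_t offset;  /* next free byte within that block */
} LinearHeap;

#define HEAP_ALLOC_FAILED UINT64_MAX

/* Returns the offset of the new sub-allocation, or HEAP_ALLOC_FAILED.
 * `alignment` must be a power of two. */
uint64_t heap_alloc(LinearHeap *h, uint64_t bytes, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uint64_t start = (h->offset + alignment - 1) & ~(alignment - 1);
    if (start < h->offset || start > h->size || bytes > h->size - start)
        return HEAP_ALLOC_FAILED;
    h->offset = start + bytes;
    return start;
}

/* Typical use: throw everything away at once, e.g. at end of frame. */
void heap_reset(LinearHeap *h) {
    h->offset = 0;
}
```

The application hands out offsets, not pointers, which matches the point above that GPU addresses may not be the same as CPU ones. On a console with flat unified memory the same allocator could plausibly return usable pointers directly.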
bzt wrote:
I really don't want to break down your enthusiasm, and I really did enjoy your historical comments. But the truth is, most people don't really care why a particular interface is a big pile of sh*t, what they care about is that IT IS a big pile of sh*t. No offense of any kind meant.
Recognizing what's bad about things can be useful, and certainly cathartic. There are many times where a coworker will ask 'But reapersms, why did they build something this way?' or 'Why did anyone ever build something this terrible and not just <insert simple solution>?'.
Many of those will indeed have an answer along the lines of 'Those guys were &*%@#(heads that hate us and just want to make development difficult', but that is usually for things where it is quite obviously something being driven by marketing or some non-technical business initiative. Other times it is clearly some research project that should have stayed in the oven longer.
Many many more of those, however, will be things like:
- The hardware accepted these values directly at the time, and they needed no conversion/rearranging/etc
- Throwing memory at the problem was faster at the time, as the CPU clock was only 2x the memory clock
- It was built for a system that only did single textured alpha blended polygons, but could do them at 4X the max pixel rate of our current platforms
- It was built for a system that had unusually fast memory and a middling CPU, so throwing more memory at it helped
- It was built for a system that had 2/3 the memory of its peers, so Great Efforts were put towards reducing memory footprint above all else
- Twisting things this way results in memory access patterns that are about 20x better than the straightforward one
- Yes this is terrible, but it doesn't become terrible until you throw two orders of magnitude more objects at it than anyone could conceive of at the time
- Someone created a flexible, general purpose system to make creators' lives easier, but left too much flexibility in so the rendering can't take advantage of things
- Someone created a highly specific system to blast through at the speed of light, but if you need more than N things it becomes a nightmare
- This feature was designed to solve this one particular problem in an elegant fashion, which it did. Unfortunately the special cases it added mean that it's 5-20x faster to just brute force it in this other way.
If one starts from a position of "This API is terrible because everyone who designed it was incompetent" then it is quite easy to start designing your Newer, Better, Faster API... and then find out the hard way why certain things were done that way.
It is usually better to give them the benefit of the doubt, but still investigate. Worst case, you find out yes, that bit was built by someone who shouldn't have been anywhere near it. Often, however, you find that they were dodging some nasty, but quite well hidden land mines. Alternatively, you find that it ends up being a decision between different tradeoffs.
Also, when dealing with existing standards, it can be possible to explore how to improve things while still running on the existing ones, like the AZDO presentation above. Those have the advantage of working on other people's machines, without having to wait for your shiny new standard to get market share.
bzt wrote:
Speaking only for myself, I'm interested in history, so please. I only hope others are interested too. I particularly liked what you wrote about Vulkan and memory allocation, it was interesting.
A problem with history is that there's a lot of it.
There's plenty I can provide more details on, just some of it is more off topic than others. Some options, if anyone has a preference:
- Buffer management, command lists, and synchronization woes
- Window system concerns
- What the hardware register interfaces looked like across different eras
- Early PC accelerators and the OpenGL/D3D API fights, roughly '96-'02
- The fixed function -> shader transition, '02-'06
- Attempts at moving beyond simple vertex and fragment programs, and why most of them end up questionably esoteric at best
- Compute and GPGPU
I'll try to break some things up in the future so they aren't all multi-page walls of text; quote-response is a bit messy about that.