The theory behind Window Servers/Window Systems (GUI)

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
User avatar
skwee
Member
Posts: 73
Joined: Mon Jan 12, 2009 3:11 am

The theory behind Window Servers/Window Systems (GUI)

Post by skwee »

Hi there!

[NOTE] I'm not sure that "Window Server" is the correct term here. Display Manager? Window Manager? Window System? What do they call it? :-k

I'd like to discuss the theory behind Window Servers. I want you to move up from the hardware and assume you have a running OS on a specific device. You don't want to use the X Server or Wayland, and since it's an OSDev community, you want to challenge yourself and write a Window Server.

So, as you understood from the preamble, I'm interested in writing a display server that will run on Linux on a SoC with an ARM processor.
To my surprise, the web does not have a lot of information about this. Mainly I managed to find the architecture and design of existing window servers like the X server and Wayland. There are also overviews of Ubuntu Mir and comparisons to X and Wayland.

I have no idea where to start. I guess I somehow need to interact with OpenVG and OpenGL ES, but how?
I also fail to understand why a client-server architecture is used. Is it for historical reasons, from back in the days when there was a main computer running the X server and different terminals would connect to it over the network?

I assume some of you might have done the research (it's an OSDev community after all), or might have stumbled upon material and other people who tried to accomplish this. I'd be more than glad if you would share your knowledge.

Thank you!
Last edited by skwee on Thu Aug 15, 2013 4:35 am, edited 2 times in total.
TCP/IP: Connecting people...
User avatar
iansjack
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: The theory behind Window Servers

Post by iansjack »

I think it would help if you were to explain exactly what you are talking about. Do you mean a program that manages graphics on a local machine only or are you talking about displaying programs running on a remote computer on a local display? Obviously there will be design differences between a local-only display system and a distributed one.
User avatar
skwee
Member
Posts: 73
Joined: Mon Jan 12, 2009 3:11 am

Re: The theory behind Window Servers

Post by skwee »

I mean an alternative to the X Server/Wayland. A program that manages graphics on the local machine.
TCP/IP: Connecting people...
User avatar
iansjack
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: The theory behind Window Servers/Window Systems (GUI)

Post by iansjack »

Those two sentences seem contradictory to me. The whole design philosophy behind X Window is that it is a protocol for distributed graphics on a network. A local-only graphics system seems to be a far less complicated situation.
User avatar
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: The theory behind Window Servers/Window Systems (GUI)

Post by bluemoon »

With a client-server architecture you gain the flexibility to change the transport at different layers.
For example, VNC or NFC.
User avatar
skwee
Member
Posts: 73
Joined: Mon Jan 12, 2009 3:11 am

Re: The theory behind Window Servers/Window Systems (GUI)

Post by skwee »

It confuses me as well. For some reason both major Display Managers (X and Wayland) are designed with a client-server architecture, but used mainly for managing graphics locally.

So to make myself clearer: I'm interested in managing graphics locally only!

BUT, I'm also interested to know what design reasons led X and Wayland to use a client-server architecture (I suspect this was a historical consideration, at least for the X server).
TCP/IP: Connecting people...
User avatar
iansjack
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: The theory behind Window Servers/Window Systems (GUI)

Post by iansjack »

As far as I know, the reason they use a client-server architecture is that they are designed to work across a network. And very convenient it is, too.
User avatar
skwee
Member
Posts: 73
Joined: Mon Jan 12, 2009 3:11 am

Re: The theory behind Window Servers/Window Systems (GUI)

Post by skwee »

I see.
Anyway, currently I'm interested only in local graphics management.
TCP/IP: Connecting people...
User avatar
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: The theory behind Window Servers/Window Systems (GUI)

Post by Combuster »

skwee wrote:For some reason both major Display Managers (X and Wayland) are designed with a client-server architecture, but used mainly for managing graphics locally.
And you realize that the opposite design would be peer-to-peer, which doesn't quite make sense given the significantly distinct roles of the application and the graphics driver? :wink:

Hence, since we need that architecture already, why not choose the implementation of the communication mechanism such that it trivially extends over a network? You are not required to do it that way, but it'll be a mostly uncorrectable design choice if you don't.
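
To make that concrete, here is a rough sketch (entirely made up for illustration - the socket path, port number, and helper name are not from any real window server) of what "choose the communication mechanism such that it trivially extends over a network" can look like. Once clients talk to the display server over a stream socket, everything above this function just reads and writes a file descriptor, and local vs. networked becomes a one-line decision:

Code:

/* Hypothetical sketch of a display server's listening socket.
 * Local (AF_UNIX) and networked (AF_INET) differ only in this setup code;
 * the rest of the server reads/writes the returned fd either way.
 * Error handling is omitted to keep the sketch short. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int server_listen(int over_network)
{
    int fd;

    if (over_network) {
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family      = AF_INET;
        addr.sin_port        = htons(6000);           /* made-up port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        fd = socket(AF_INET, SOCK_STREAM, 0);
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    } else {
        struct sockaddr_un addr;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, "/tmp/mydisplay", sizeof(addr.sun_path) - 1);
        unlink(addr.sun_path);
        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    }
    listen(fd, 8);
    return fd;
}

int main(void)
{
    int listen_fd = server_listen(0);          /* 0 = local-only for now */
    int client_fd = accept(listen_fd, NULL, NULL);
    /* ... speak the (hypothetical) window protocol over client_fd ... */
    close(client_fd);
    close(listen_fd);
    return 0;
}
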
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
skwee
Member
Posts: 73
Joined: Mon Jan 12, 2009 3:11 am

Re: The theory behind Window Servers/Window Systems (GUI)

Post by skwee »

Combuster
Oh! I think I got it now! Until that point, I failed to understand where networking comes into managing a GUI.
TCP/IP: Connecting people...
User avatar
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: The theory behind Window Servers/Window Systems (GUI)

Post by bluemoon »

However, there is a third design: instead of two entities communicating, you could put everything into one big entity, like MS Windows, a.k.a. the big mess.
User avatar
skwee
Member
Posts: 73
Joined: Mon Jan 12, 2009 3:11 am

Re: The theory behind Window Servers/Window Systems (GUI)

Post by skwee »

I found a very detailed and well-explained article:
The Linux Graphics Stack
From that blog post, I understand that I need to look into DRM/DRI and KMS.
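
For anyone else going down this path, here is a minimal sketch (my own, not from the article) of the very first step on that stack: opening the DRM device and asking KMS what is connected. It assumes a Linux machine exposing /dev/dri/card0 and libdrm (build against the libdrm headers and link with -ldrm); error handling is mostly left out.

Code:

/* Sketch: enumerate connected KMS connectors via libdrm.
 * Assumes /dev/dri/card0 exists and the user is allowed to open it. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    if (fd < 0) { perror("open /dev/dri/card0"); return 1; }

    drmModeRes *res = drmModeGetResources(fd);
    if (!res) { fprintf(stderr, "drmModeGetResources failed\n"); return 1; }

    for (int i = 0; i < res->count_connectors; i++) {
        drmModeConnector *conn = drmModeGetConnector(fd, res->connectors[i]);
        if (!conn)
            continue;
        if (conn->connection == DRM_MODE_CONNECTED && conn->count_modes > 0)
            printf("connector %u: first mode %dx%d\n",
                   conn->connector_id,
                   conn->modes[0].hdisplay, conn->modes[0].vdisplay);
        drmModeFreeConnector(conn);
    }

    drmModeFreeResources(res);
    close(fd);
    return 0;
}

From there, the usual next steps are allocating a scanout buffer (e.g. a "dumb buffer") and pointing a CRTC at it with drmModeSetCrtc().
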
TCP/IP: Connecting people...
klange
Member
Posts: 679
Joined: Wed Mar 30, 2011 12:31 am
Libera.chat IRC: klange
Discord: klange

Re: The theory behind Window Servers/Window Systems (GUI)

Post by klange »

Hello. I know there is a lot of contention on this board surrounding graphics stacks, and a number of users who frown upon the "status quo" as seen in another recent thread here, but I thought I'd describe some of the systems I have worked with in the past. For reference, I worked on Compiz after Beryl was merged back in, but prior to its restructuring and porting to C++, and I also worked on OS X's Quartz (WindowServer) at Apple for a short period of time. I also have my own window system that I am constantly improving and which is in the middle of a massive rewrite.

This is going to be a bit random and stream-of-thought...

In the real world, there are essentially two different sorts of windowing systems: command-based, paint-directly-to-the-screen models like X or Windows, and compositing, draw-to-a-canvas systems like Quartz and Wayland. There is a sort of hybrid third option that can be seen in newer versions of Windows and in X window managers like Compiz, in which the core of the windowing system is still command-based and probably still involves legacy applications believing themselves to be writing directly to the screen, but where actual rendering is done through composition (compositing). I'll note I've never worked on any of the Windows systems, and I'm not really familiar with them, so if I'm wrong about how they work it's because I'm mostly guessing.

In the early command-based systems, like X, applications send requests to the server to draw individual lines or shapes or render images or text. This is how legacy X applications work, and it's how older versions of toolkits like GTK used to work. It is very easy to make these systems network transparent because all of your communication happens over a command stream, and so X was built this way. This was very useful for early usage of X where a mainframe could be running your primary applications and you could be on a dumb terminal with just barely enough power to run a graphical display. This network transparency was a major factor in the design of the various GLX implementations in current use: if we can send general rendering over this command stream, perhaps we can send OpenGL commands?
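
To picture what such a command stream looks like on the wire, here is a made-up request format (far simpler than the real X protocol, and only for illustration): the client serializes small records like these and writes them down its connection, and the server replays them into the window's contents.

Code:

/* Hypothetical wire format for a command-based window system.
 * Not the X protocol - just the general shape of one. */
#include <stdint.h>
#include <unistd.h>

enum req_opcode {
    REQ_CREATE_WINDOW = 1,
    REQ_DRAW_LINE     = 2,
    REQ_DRAW_RECT     = 3,
    REQ_PUT_TEXT      = 4,
};

struct req_header {
    uint8_t  opcode;       /* one of req_opcode */
    uint8_t  pad;
    uint16_t length;       /* total request length in bytes */
    uint32_t window_id;    /* which window the drawing targets */
};

struct req_draw_line {
    struct req_header hdr;       /* hdr.opcode = REQ_DRAW_LINE */
    int16_t  x0, y0, x1, y1;     /* endpoints in window coordinates */
    uint32_t color;              /* 0xAARRGGBB */
};

/* Client side: send one "draw a line" request to the server.
 * (Endianness and struct padding are ignored in this sketch.) */
static void send_draw_line(int server_fd, uint32_t win,
                           int16_t x0, int16_t y0, int16_t x1, int16_t y1,
                           uint32_t color)
{
    struct req_draw_line req = {
        .hdr = { REQ_DRAW_LINE, 0, sizeof(req), win },
        .x0 = x0, .y0 = y0, .x1 = x1, .y1 = y1,
        .color = color,
    };
    write(server_fd, &req, sizeof(req));
}
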

Compositing and command stacks can go together, as is the case for Compiz and all other X11 compositors. The X server still keeps track of the actual window contents on its own and builds them from commands sent by clients (except in certain instances where special extensions are used to allow the client to draw directly into a buffer, but we'll get to that later). It presents the windows to the compositor through an extension to the X protocol, usually involving a conversion of those window "pixmaps" (canvases, buffers, whatever you want to call them) to a format more suitable for use in the compositor (like an OpenGL texture - this happens through an OpenGL extension called "texture from pixmap"), and the compositor draws the windows to the screen where they are supposed to be. Meanwhile, the compositor may steal some input, as it is itself a client of the X server (but fun fact: it can't affect input to other windows beyond blocking it).

Compositing itself is not new. The Amiga used separate buffers for each window and composited them to the screen. OS X's Quartz has been around since the first release of OS X, and it has evolved from pure-software compositing to OpenGL hardware acceleration. However, in the earliest days, it was not very feasible on memory-constrained systems. Having applications use an assisted method to write directly to the screen meant that you only needed one buffer - the physical framebuffer. Now, with a full buffer-backed compositing system you need a lot more space - imagine having fifty fullscreen windows open at once; that's about 400MB on a 1080p display. Even 50 8bpp 800x600 windows is 23 megabytes - a good chunk of memory when those resolutions were popular.
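
(For anyone checking the arithmetic, that assumes roughly 4 bytes per pixel for the 1080p case: 50 × 1920 × 1080 × 4 bytes ≈ 395 MiB, and 50 × 800 × 600 × 1 byte ≈ 23 MiB.)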

The modern systems - Wayland, Mir, my compositor (Yutani) - do away with the command streams and just give applications direct access to their canvases. How they do this varies - and in fact Wayland doesn't specify exactly how, just what the format of the canvas needs to be - but typically, in a mature environment like Linux, you would use the graphics card's facilities to provide clients and the server with access to a texture in texture memory that represents the window. The end result is that the application needs some way to draw things - but we have dozens of these: Cairo, Clutter, SDL, OpenGL, and at higher levels, toolkits like GTK and Qt. Now, the problem is that this can't easily be made network transparent, so we have to use other protocols (like VNC) to send graphics regions of either individual windows (a specification for a native remote desktop system for Compiz was designed in this way, and remote use of Wayland is supposed to work this way) or the resulting composited framebuffer (VNC). If done inefficiently, this can be much slower and use much more bandwidth than command streams.
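
As a rough illustration of the "direct access to their canvases" idea (my own sketch, not how Wayland specifically does it - as noted above, Wayland leaves the mechanism to the backend): the client allocates a shareable buffer, maps it, and draws into it with whatever library it likes, and the buffer's file descriptor is handed to the server so the compositor can map the same memory. On Linux, memfd_create() is one way to get such a buffer.

Code:

/* Sketch of a client-side shared window canvas.
 * Assumes Linux with memfd_create() (glibc >= 2.27); real systems
 * (Wayland wl_shm, GEM/dma-buf handles, ...) differ in detail, but the
 * idea is the same: one buffer visible to both client and server. */
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct canvas {
    int       fd;        /* descriptor to hand to the display server */
    uint32_t *pixels;    /* client-side mapping, 0xAARRGGBB */
    int       width, height;
};

static int canvas_create(struct canvas *c, int width, int height)
{
    size_t size = (size_t)width * height * 4;

    c->fd = memfd_create("window-canvas", MFD_CLOEXEC);
    if (c->fd < 0)
        return -1;
    if (ftruncate(c->fd, size) < 0)
        return -1;

    c->pixels = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, c->fd, 0);
    if (c->pixels == MAP_FAILED)
        return -1;

    c->width  = width;
    c->height = height;
    memset(c->pixels, 0, size);   /* start with a blank canvas */

    /* The fd would now be sent to the server (e.g. via SCM_RIGHTS on a
     * Unix socket); the server mmap()s the same fd and the compositor
     * treats that memory as the window's contents. */
    return 0;
}
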

How these systems manage accelerated graphics is usually a matter of having applications render directly into their own OpenGL context, which is usually the actual canvas used to render the window (Wayland does some extra layers of buffering to ensure "every frame is perfect"; in my pure-software environment, I point software Mesa at the actual window canvas and let it go), and which is itself managed by kernel-level systems (DRI2, etc.) that eventually hit the drivers. Then the compositor has the resulting output as a texture it can do what it wants with (like rotate, or make transparent, or turn into a thumbnail preview...).
User avatar
skwee
Member
Posts: 73
Joined: Mon Jan 12, 2009 3:11 am

Re: The theory behind Window Servers/Window Systems (GUI)

Post by skwee »

Thanks a lot for all this theory! It made some things clearer!
TCP/IP: Connecting people...
h0bby1
Member
Posts: 240
Joined: Wed Aug 21, 2013 7:08 am

Re: The theory behind Window Servers/Window Systems (GUI)

Post by h0bby1 »

I'm writing a simple window manager. I have worked with 3D engines and video rendering/encoding for quite a while, under Windows and Linux and a little bit of Mac OS.

The first thing I find illogical: if you compare with an application run in console mode, that application gets access to basic console input and output because it is run from, and into, a console or terminal that already exists. The same could be done with windows. It would not be very intuitive to use, but in a way, creating a window context first and then running the program into it would seem more logical. It's just an idea I had a while ago when thinking about designing this kind of system.

Windows in themselves are already an abstract concept; they don't mean much in terms of hardware. A window is basically just a rectangle with a position and a size, used to track mouse events and keyboard focus, and generally associated with a rendering context. Alternatively it can use direct framebuffer access, in which case every rendering command targeted at that window must be clipped manually to fit the window's location in the framebuffer; otherwise you have a rendering context associated with the window in which all the rendering occurs.
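
In code terms, that "just a rectangle plus bookkeeping" view of a window boils down to something like the record below (made up here purely for illustration):

Code:

/* Hypothetical window record for a simple window manager.
 * Nothing hardware-specific: just bookkeeping for screen space and input. */
struct window {
    int   x, y;             /* position on screen (or within the parent) */
    int   width, height;    /* size of the rectangle */
    int   has_focus;        /* receives keyboard events when non-zero */
    void *render_target;    /* off-screen surface, or a clip region into the
                               framebuffer when rendering directly */
    struct window *next;    /* stacking order / sibling list */
};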

Windows are mostly to be seen as a way to share screen space and mouse/keyboard events between different applications in a multitasking graphical environment.

With modern hardware, if you want reasonably fast access, whether for simple blits/copies, resizing, or format conversion (such as YUV conversion to speed up video decoding), the rendering is mostly done by the graphics card. So there are two levels of commands chained together: commands to the window/graphics context, like "draw a filled rect into the window", after which the rendering context sends commands to the graphics card to do the job. Then, as mentioned, you can either send those graphics card commands directly to the framebuffer at the window's location, with clipping done in hardware with a scissor, or each window can have an offscreen surface into which all the rendering occurs.

That is already the case for 2D operations like copying/resizing/converting images: most modern hardware and OSes don't give access to the actual framebuffer, only to a backbuffer through a rendering context, and a lot of the work has to be handled by the graphics card, either from surfaces loaded into video memory or by direct copies from RAM.

3D is another beast, as it uses a ring buffer that commands are put into. When you use a window with something like OpenGL, you have the level of the window/2D rendering context, plus the GLX/WGL interface to match the window's 2D rendering context with a hardware 3D rendering context; the graphics card then processes the commands from the ring buffer and draws them into the backbuffer, and the OS takes care of copying all the window backbuffers into the framebuffer when it renders the frame.

From my experience with 3D engines and all kinds of multimedia programming, the best solution, and the one I'm implementing, is not to give the user direct access to rendering functions, but instead to expose an interface with containers into which objects can be added: text, images, 3D objects, whatever else. It's rather similar to the Flash AS3 system with movie clips and objects added into them. The user level just sets up a scene, associated with a window/container, to be rendered, and the rendering engine then parses this scene structure to do the actual rendering. It's much cleaner, and the code that creates the scene is independent of the rendering engine; technically the rendering could be totally asynchronous from the scene build-up. It has always been good practice, in my experience, to have most of the application code for a 3D engine or video program compile without needing any system-specific functions: in a 3D scene, all the matrices and objects are managed without knowing which rendering engine is behind them, and the renderer component takes that scene setup as input and does the actual rendering with whatever is present on the system. Most GUIs already work like that: you just add GUI components into the window/dialog context, and the rendering is taken care of internally by the renderer.

With this system, which is managed as a tree structure where children are added to objects, there is never any need to care about the actual rendering engine, only about what you want to render into the window. You set up the scene structure associated with the window, like a movie clip in Flash, and the system calls the rendering commands when it needs to render the window. User space never renders anything directly and has nothing to do with the lower-level rendering engine; it just adds text, images, shapes, and rects as child objects in a tree structure, each with a position/zoom/rotation relative to its parent.
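
A minimal sketch of what such a retained scene tree could look like (the names are invented here, loosely modeled on the Flash display-list idea described above); the application only builds the tree, and a separate renderer walks it later with whatever backend is available:

Code:

/* Hypothetical retained-mode scene node: user code adds children with
 * positions/transforms relative to the parent and never touches the
 * rendering engine directly. */
enum node_kind { NODE_CONTAINER, NODE_TEXT, NODE_IMAGE, NODE_RECT };

struct scene_node {
    enum node_kind kind;
    float x, y;                     /* position relative to the parent */
    float scale, rotation;          /* transform relative to the parent */
    void *payload;                  /* text string, image handle, ... */
    struct scene_node *first_child;
    struct scene_node *next_sibling;
};

/* Attach a node to a container, Flash movie-clip style. */
static void scene_add_child(struct scene_node *parent, struct scene_node *child)
{
    child->next_sibling = parent->first_child;
    parent->first_child = child;
}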

The need to give direct access to rendering functions is, IMO, inherited from the old days, when graphics devices could have very different color modes, very different ways of optimizing certain operations, and many things to set up. Nowadays the way most graphics hardware functions is rather standard, so there is not that much need to expose the direct rendering interface to applications.

In most cases it's also a good idea to have fonts and the cursor handled by the lowest-level drivers, as many graphics devices have specific ways to handle them; for instance, BIOS text-mode VGA has its own set of functions to handle fonts, and if you want a graphical cursor in text mode, it can be loaded as a font.

My system can actually switch from graphics to text mode with the same windowing engine, with the mouse cursor and keyboard handled the same way through a logical coordinate system, so that the end-user code that creates the windows and describes what to draw is completely independent of the actual renderer. I still need a clean DPI mechanism to handle different resolutions/monitors correctly, so that objects are positioned and scaled optimally according to their original size, the DPI they were designed for, and the actual DPI of the resolution/monitor.

I think in Windows the HDC, the basic rendering context, already handles DPI, since an HDC can also be used for printing; and if you want to use things like scanners, or even for good management of vector fonts, that can be a good thing.