Programming "Views"

Programming, for all ages and all languages.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Programming "Views"

Post by Brendan »

Hi,
Kevin wrote:Well, convert each supported device and backend into a separate process, then your process diagram is overloaded. ;)

A lot of these units don't really exist in any running process anyway, they are mostly alternative. So for your disk backend, you usually have either a raw image, or a qcow2 image, or a VMDK image, or an NBD connection, etc. but rarely you use all of them at the same time. Similarly, you have either PC hardware or some ARM board or an old PPC Mac platform, but you never need all the devices at the same time.

The other thing is that for performance reasons you'll likely want to have the device implementation in the same thread as the backend (I assume for purpose of this IDE, thread and process are mostly equivalent). So a running instance will have an IDE device and a qcow2 backend in one process, and the SCSI device and a raw image in a second process. So you actually have a lot of processes/threads involved, but from the perspective of this IDE they are just one type of process, which would contain all block backends and all block device emulations as its units, right?
I think using an emulator as an example was an unfortunate choice - I've got a few strange plans involving devices and emulation, and this is probably going to get complex (and off topic) very quickly...

Imagine you've got a real PCI device with a real device driver running on the OS where the device driver happens to export the "emulated PCI device" messaging protocol; so that any/all emulators can take advantage of the real PCI device (by using the "emulated PCI device" messaging protocol to talk to the real device driver).

Now let's assume this is a distributed system. We can have a single virtual machine that happens to be using a mixture of real and emulated devices on several different/remote computers without knowing it. For example, we could have a LAN of 20 computers where each computer has 2 separate real video cards (and a real SATA controller and 3 real USB controllers); with an OS (e.g. Windows or Linux) running inside a virtual machine that's able to use all 40 real video cards (and all 20 SATA controllers and all 60 USB controllers).

Also...

By using a complex arrangement of exception handler abuse; it's possible for a micro-kernel to trick a real device driver, such that the device driver thinks it's talking to real hardware but each time it accesses memory mapped IO or IO ports the kernel is actually trapping these accesses and silently forwarding them to a normal process that implements the "emulated PCI device" messaging protocol (and for any "virtual IRQs" sent from that normal process the kernel delivers the IRQ to the real device driver and pretends it was a real IRQ). In this way, a real kernel can have real device drivers that happen to be using emulated hardware.
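A minimal sketch of the trap-and-forward idea, in Python for brevity. The thread never specifies the "emulated PCI device" messaging protocol, so the message layout (`kind`, `offset`, `value`), the class names and the MMIO window address are all assumptions made up for illustration; IRQ delivery is left out.

```python
from dataclasses import dataclass

@dataclass
class Msg:
    kind: str      # "read" or "write" (hypothetical message kinds)
    offset: int    # register offset within the device's MMIO window
    value: int = 0

class EmulatedDevice:
    """Stands in for the normal process that emulates the hardware."""
    def __init__(self):
        self.regs = {}

    def handle(self, msg):
        if msg.kind == "write":
            self.regs[msg.offset] = msg.value
            return None
        return self.regs.get(msg.offset, 0)

class TrappingKernel:
    """Stands in for the kernel's fault handler: every access the driver
    makes to the trapped MMIO range becomes a message to the device."""
    def __init__(self, base, size, device):
        self.base, self.size, self.device = base, size, device

    def _offset(self, addr):
        if not (self.base <= addr < self.base + self.size):
            raise ValueError("address is outside the trapped MMIO window")
        return addr - self.base

    def mmio_write(self, addr, value):
        self.device.handle(Msg("write", self._offset(addr), value))

    def mmio_read(self, addr):
        return self.device.handle(Msg("read", self._offset(addr)))

kernel = TrappingKernel(base=0xFEB00000, size=0x1000, device=EmulatedDevice())
kernel.mmio_write(0xFEB00010, 0x1234)   # the driver believes this hits hardware
status = kernel.mmio_read(0xFEB00010)
```

In a real kernel the two `mmio_*` methods would be the page-fault or I/O-permission exception path, and `handle` would sit on the far side of an IPC or network hop.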

Of course it's still a distributed system. For example, you can have one computer running 20 processes that emulate 20 different video cards; and then have 20 more computers on that LAN that are running 20 real device drivers (but talking to the emulated video cards on the first machine).

Now; let's imagine you're writing a device driver for an Nvidia video card. You start by implementing a "dummy driver" that doesn't actually do much more than support the "emulated PCI device" messaging protocol. This is relatively easy (it's mostly "pass through") - e.g. if the driver is told (by the emulator) to emulate a write to a video card register it just does the write to the real video card's register. It's only when bus mastering is involved that the dummy video driver would have to actually do any emulation (e.g. do the bus mastering transfer using physical addresses of a temporary buffer rather than "virtual physical addresses" from the virtual machine).

Once the "dummy driver" is done and working right; you'd be able to run an OS (e.g. Windows/Linux) inside a virtual machine and that OS would be using your real "dummy driver" and use the real video card. Of course you'd also implement a whole pile of logging in your "dummy driver", and end up with the most powerful reverse engineering tool you could hope for (e.g. log every single read and write to the video card that Windows' native video driver makes).
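The "pass through plus logging" part can be sketched in a few lines. This is only an illustration under assumptions: `FakeCard` stands in for the real card's register file, and the `on_read`/`on_write` handler names are hypothetical, not part of any existing protocol. Bus mastering is not covered.

```python
class FakeCard:
    """Stand-in for the real video card's register file (hypothetical)."""
    def __init__(self):
        self.regs = {}

    def write(self, reg, value):
        self.regs[reg] = value

    def read(self, reg):
        return self.regs.get(reg, 0)

class DummyDriver:
    """Passes every emulated register access straight to the card and logs it."""
    def __init__(self, card):
        self.card = card
        self.log = []

    def on_write(self, reg, value):
        self.log.append(("W", reg, value))
        self.card.write(reg, value)

    def on_read(self, reg):
        value = self.card.read(reg)
        self.log.append(("R", reg, value))
        return value

driver = DummyDriver(FakeCard())
driver.on_write(0x3C0, 0x20)
driver.on_read(0x3C0)
for kind, reg, value in driver.log:
    print(f"{kind} reg={reg:#06x} value={value:#04x}")
```

The log is just an in-memory list here; a real tool would presumably timestamp entries and ship them to another machine.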

The next step would be to run a "testing" video driver that thinks it's using real hardware, but get the kernel to trap everything and send it to a copy of the previous "dummy driver" (that still does a pile of logging, and can reset/recover the device when your "testing" video driver crashes or does something silly). That way you get the most powerful device driver debugging environment you could hope for.

Of course this is still a distributed system. You could have a computer for testing somewhere on the LAN that is running the "dummy driver" and the virtual machine and/or "test video driver"; and use a different computer for viewing the logs, running the IDE, etc. That way if something goes very wrong and the test computer crashes hard (e.g. blows chunks and triple faults) then you just have to wait for it to reboot again without having the computer you're actually using interrupted.

TL;DR; The way existing emulators are designed isn't really what I'd be aiming for; and (for my purposes) each "virtual device" would be a separate project that isn't really part of the emulator project at all.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Kevin
Member
Posts: 1071
Joined: Sun Feb 01, 2009 6:11 am
Location: Germany

Re: Programming "Views"

Post by Kevin »

Okay... That was an interesting, even adventurous read, but I doubt that this would work out.

Let's leave performance aside for the moment, because that you're never going to get reasonable performance out of this design is too obvious to be interesting (especially once we talk about virtualisation instead of emulation, so that it's no longer an excuse that the CPU is going to be slow anyway). You're enabling a few corner cases and if you force the design for these corner cases on the common cases, too, they will inevitably suffer.

So you want to do PCI passthrough. Nothing wrong with that, and most parts look easy enough to implement generically (eek, the bad word!) for all PCI devices. Except for DMA, which is the tough problem that needs to be solved. Your solution is to give up on a generic solution and to have specific passthrough drivers for each device. Their job is basically "only" to forward MMIO and port accesses and rewrite memory addresses that are passed this way.

For very simple devices this is easy enough: The passthrough driver just needs to know which register takes physical addresses and convert any value written into it from guest-physical to host-physical addresses (note that already this isn't generally possible: the guest-physical address might point to MMIO of another device, or just be completely invalid). It also needs to store the guest-physical address in case the guest wants to read it back from the register.
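A minimal sketch of that simple case, assuming guest RAM is described by a list of `(guest_base, size, host_base)` regions. The class name and the region layout are made up for illustration.

```python
class AddressRegisterShim:
    """Rewrites one address register; keeps the guest's value for read-back."""
    def __init__(self, guest_to_host):
        # guest_to_host: list of (guest_base, size, host_base) RAM regions
        self.regions = guest_to_host
        self.guest_value = 0
        self.host_value = 0   # what would be written to the real register

    def translate(self, gpa):
        for guest_base, size, host_base in self.regions:
            if guest_base <= gpa < guest_base + size:
                return host_base + (gpa - guest_base)
        # Points at another device's MMIO, or nowhere at all.
        raise ValueError(f"guest-physical address {gpa:#x} is not backed by RAM")

    def write(self, gpa):
        self.host_value = self.translate(gpa)   # raises before any state changes
        self.guest_value = gpa

    def read(self):
        return self.guest_value   # the guest must see what it wrote

shim = AddressRegisterShim([(0x0, 0x100000, 0x40000000)])
shim.write(0x2000)
```

The `ValueError` branch is the "not generally possible" case: the sketch simply refuses, whereas a real passthrough driver would have to decide what the device should do instead.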

Then you get devices that have a linked list of request descriptors in memory and just pass the address of the first descriptor to a register in order to start processing. Now the passthrough driver needs to parse the whole list and rewrite all addresses in the descriptors. For this, it has to copy them to temporary buffers because when the guest reads back the memory, it should see the old addresses. Then it can forward the MMIO write to the real device and monitor the state of the descriptors. Once it sees that they have been processed, it can copy back the changed fields in the descriptor and free the temporary buffers. Perhaps it also needs to do the same for intermediate updates of the descriptors.
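The shadow-and-copy-back step might look roughly like this. Everything here is an assumption for illustration: descriptors are modelled as dicts with `buf`, `next` and `done` fields, guest memory is a dict keyed by address, and `alloc` hands out host addresses for the temporary copies. Real descriptor formats are device specific, and circular schedules are simply rejected.

```python
def shadow_chain(guest_mem, head, translate, alloc):
    """Copy a guest descriptor chain into host-side shadows with rewritten
    addresses. Returns (host address of the first shadow, guest->shadow map)."""
    mapping = {}          # guest descriptor address -> shadow descriptor
    order = []
    addr = head
    while addr != 0:
        if addr in mapping:
            raise ValueError("descriptor chain loops back on itself")
        desc = guest_mem[addr]
        shadow = {"buf": translate(desc["buf"]), "next": 0, "done": False,
                  "host_addr": alloc()}
        mapping[addr] = shadow
        order.append(shadow)
        addr = desc["next"]
    # Link the shadows to each other using host addresses.
    for prev, nxt in zip(order, order[1:]):
        prev["next"] = nxt["host_addr"]
    return (order[0]["host_addr"] if order else 0), mapping

def copy_back(guest_mem, mapping):
    """Propagate only the device-written field back to the guest's copies."""
    for guest_addr, shadow in mapping.items():
        guest_mem[guest_addr]["done"] = shadow["done"]

guest_mem = {0x1000: {"buf": 0x8000, "next": 0x1010, "done": False},
             0x1010: {"buf": 0x9000, "next": 0, "done": False}}
host_addrs = iter(range(0x500000, 0x600000, 0x10))
head, mapping = shadow_chain(guest_mem, 0x1000,
                             translate=lambda gpa: gpa + 0x40000000,
                             alloc=lambda: next(host_addrs))
for shadow in mapping.values():   # pretend the real device finished everything
    shadow["done"] = True
copy_back(guest_mem, mapping)
```

Note that the guest's own descriptors keep their original `buf` and `next` values throughout, which is the whole point of working on copies.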

And then you have devices that have a list of (lists of) descriptors, but aren't kicked by some I/O, but just poll the list 1000 times a second. (Hi, USB!) You need to parse all descriptors and monitor the memory of each descriptor for guest writes all the time and somehow synchronise it with the state of the real device.

All of this is possible, no doubt (well, you may not be able to process 1000 polls a second across the network). But your passthrough driver needs to understand a lot about how the real device works so that it can rewrite addresses and allocate/synchronise/free temporary buffers at the right point in time. It needs to know, and intercept, every single cause for DMA on this device. With that much state in the passthrough driver, you'll probably not be far away from a complete device emulation.

That, and performance, is probably the reason why nobody is doing it. Everyone uses an IOMMU for PCI passthrough, because it's the only sane way to do it, and to do it generically. It gives you all of the address rewriting in hardware, but it comes with its own share of problems (like hardware providing groups of devices that can only be virtualised together because they always use the same IOMMU context).

And even with an IOMMU, you need a physical address on the host. This might turn out troublesome in a distributed environment, because it means that the RAM of the VM must be on the machine that the device sits on. Which basically means that you need to distribute RAM across the machines as well (this will be a heavily NUMA virtual machine...). Except that this doesn't help when you have two devices on different machines accessing the same memory.

</offtopic>
Brendan wrote:TL;DR; The way existing emulators are designed isn't really what I'd be aiming for; and (for my purposes) each "virtual device" would be a separate project that isn't really part of the emulator project at all.
This isn't really related to all of the above. ;)

If you see a project as a self-contained unit of code, why not. If you see it as an organisation of people working together on something, namely the emulator, not so much. They will probably all be involved in the emulator project and in several virtual device projects. So now you have just moved the problem from the processes view to the projects view. Which I guess is the good old filesystem?
Developer of tyndur - community OS of Lowlevel (German)
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Programming "Views"

Post by Brendan »

Hi,
Kevin wrote:Let's leave performance aside for the moment, because that you're never going to get reasonable performance out of this design is too obvious to be interesting (especially once we talk about virtualisation instead of emulation, so that it's no longer an excuse that the CPU is going to be slow anyway). You're enabling a few corner cases and if you force the design for these corner cases on the common cases, too, they will inevitably suffer.
It's too hard to determine how performance will be affected. The worst case is reads from device registers (where an emulated CPU has to wait for data to arrive from the device before continuing), but even for that case it's easy enough to emulate a different CPU until the data arrives (e.g. like hyper-threading in real CPUs). The real question is whether the overhead of sending/receiving messages (and stalling in cases where time spent waiting for replies can't be hidden) is more or less than the performance improvement you get from doing everything in parallel.
Kevin wrote:So you want to do PCI passthrough. Nothing wrong with that, and most parts look easy enough to implement generically (eek, the bad word!) for all PCI devices. Except for DMA, which is the tough problem that needs to be solved. Your solution is to give up on a generic solution and to have specific passthrough drivers for each device. Their job is basically "only" to forward MMIO and port accesses and rewrite memory addresses that are passed this way.
Eventually, it'd just be an extra feature built into (some/most) native device drivers.

Don't forget that the code needs to be able to restore the device to a sane state after the client/emulator terminates. For example, consider an ancient VGA card (which has no DMA or bus mastering) where the emulator is terminated in the middle of setting a video mode - you'd want the "pass through" code to restore the state of the real video card (and possibly display some sort of picture to indicate that the video card isn't being used by any real or emulated computer). "Generic pass through" was never really a sane option (even for the few devices that don't have some sort of DMA or bus mastering).

For most devices that do DMA and/or bus mastering; software (running on a real computer talking to real hardware) has to avoid race conditions. To avoid race conditions software typically doesn't touch the data being transferred until something (e.g. an IRQ) indicates that the transfer has completed. This means that (for emulation) the guest state only has to be sane at well defined "synchronisation points" (e.g. when an IRQ occurs, not before).

For example, if the guest OS sets up bus mastering to write 123456 bytes to disk, then the real driver can copy all the data into a temporary buffer immediately, do the transfer in its own time, then send the IRQ whenever the transfer has finished. The real driver doesn't need to care about the emulated system modifying the data while the transfer is in progress. In the same way; if the guest OS sets up bus mastering to read 123456 bytes from disk, then the real driver doesn't need to care about the emulated system reading the data before the transfer completes.
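A rough sketch of the write case, assuming guest RAM is a `bytearray`, the disk is a dict keyed by LBA, and the virtual IRQ is a callback. All of these names are hypothetical stand-ins.

```python
class BounceBufferDisk:
    """Snapshots guest data at setup time; the guest only sees completion via IRQ."""
    def __init__(self, guest_ram, raise_irq):
        self.guest_ram = guest_ram      # bytearray standing in for guest memory
        self.raise_irq = raise_irq      # callback that delivers the virtual IRQ
        self.disk = {}                  # hypothetical backing store: lba -> bytes
        self.pending = []

    def start_write(self, lba, gpa, length):
        # Copy immediately, so later guest modifications cannot race the transfer.
        snapshot = bytes(self.guest_ram[gpa:gpa + length])
        self.pending.append((lba, snapshot))

    def complete_all(self):
        # Runs "in the driver's own time"; the IRQ is the synchronisation point.
        for lba, data in self.pending:
            self.disk[lba] = data
            self.raise_irq()
        self.pending.clear()

irqs = []
ram = bytearray(b"hello world!")
dev = BounceBufferDisk(ram, raise_irq=lambda: irqs.append("irq"))
dev.start_write(lba=7, gpa=0, length=5)
ram[0:5] = b"XXXXX"        # guest scribbles over the buffer mid-transfer
dev.complete_all()
```

The data that reaches the disk is whatever was in guest memory when the transfer was set up, which is one legal outcome on real hardware too, since a guest that modifies the buffer before the IRQ has a race condition of its own.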
Kevin wrote:And then you have devices that have a list of (lists of) descriptors, but aren't kicked by some I/O, but just poll the list 1000 times a second. (Hi, USB!) You need to parse all descriptors and monitor the memory of each descriptor for guest writes all the time and somehow synchronise it with the state of the real device.
I'd be willing to bet there are ways to avoid "poll everything every 1 ms". For a start, I doubt you need to care about anything other than the subset of "everything" that matters for the next 1 ms of time.
Kevin wrote:That, and performance, is probably the reason why nobody is doing it. Everyone uses an IOMMU for PCI passthrough, because it's the only sane way to do it, and to do it generically. It gives you all of the address rewriting in hardware, but it comes with its own share of problems (like hardware providing groups of devices that can only be virtualised together because they always use the same IOMMU context).

And even with an IOMMU, you need a physical address on the host. This might turn out troublesome in a distributed environment, because it means that the RAM of the VM must be on the machine that the device sits on. Which basically means that you need to distribute RAM across the machines as well (this will be a heavily NUMA virtual machine...). Except that this doesn't help when you have two devices on different machines accessing the same memory.
Let's do some maths. Let's assume having devices as separate processes means that things like reading/writing a device's registers and transferring data from temporary buffers to "emulated physical memory" becomes 500 times slower. It's not going to affect how quickly a CPU can calculate the millionth digit of PI, and it's not going to make hard disk seek times any slower. How much will it contribute to "overall guest OS performance"? My guess is that it might make the guest OS seem 5% slower, because the guest OS spends a lot of time doing other things anyway. Maybe it will affect "overall guest OS performance" more than that, and maybe it will affect "overall guest OS performance" less than that - it doesn't really matter.
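To put numbers on that guess, here is a small Amdahl-style estimate. The model itself is an assumption (only the device-access share of run time is scaled, and nothing is overlapped or hidden); the 500x factor and the 5% figure come from the paragraph above.

```python
def overall_slowdown(device_fraction, device_factor):
    """Amdahl-style estimate: only the device-access share of run time is scaled."""
    return (1.0 - device_fraction) + device_fraction * device_factor

def fraction_for(target_slowdown, device_factor):
    """Device-access share that yields a given overall slowdown."""
    return (target_slowdown - 1.0) / (device_factor - 1.0)

f = fraction_for(1.05, 500)
print(f"{f:.6f}")                    # share of time spent on device access
print(overall_slowdown(0.001, 500))  # 0.1% device time at 500x
```

Under this model a 5% overall slowdown corresponds to the guest spending roughly 0.01% of its time on the affected device accesses, while 0.1% would already push the guest towards 50% slower. So the estimate hinges almost entirely on how small that share really is.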

Now; I think what I've described is a system that's far more powerful and flexible than anything that has ever existed. Do you honestly think I'm willing to completely destroy everything that makes this awesome just to get a little more performance? That would be extremely stupid. I'll do what I can; but quite frankly if it ends up being 100% slower I don't care at all. The first rule of virtual machines is: if you need maximum performance then you shouldn't be using a virtual machine to begin with.
Kevin wrote:
TL;DR; The way existing emulators are designed isn't really what I'd be aiming for; and (for my purposes) each "virtual device" would be a separate project that isn't really part of the emulator project at all.
This isn't really related to all of the above. ;)

If you see a project as a self-contained unit of code, why not. If you see it as an organisation of people working together on something, namely the emulator, not so much. They will probably all be involved in the emulator project and in several virtual device projects. So now you have just moved the problem from the processes view to the projects view. Which I guess is the good old filesystem?
Imagine there's 4 completely separate open source projects - let's say they are Firefox, PhpBB forum software, a PHP interpreter and Apache. If someone happens to spend time working on 2 or more of these completely separate projects; does that make them all parts of the same "meta project"?

Imagine there's 4 completely separate open source projects - let's say they are a JIT emulator that lets you run 80x86 guest OSs on ARM CPUs, a process that implements a virtual PCI video card (that just displays a window on the host OS's GUI), a native SATA controller driver for an OS that happens to support "pass through", and an emulator that uses hardware virtualisation on 80x86 to emulate an 80x86. Now, if someone happens to spend time working on 2 or more of these completely separate projects; does that make them all parts of the same "meta project"?

In which way is the first case (different projects that end up being combined by a user to create their forums) any different to the second case (different projects that end up being combined by a user to create their virtual machine)?


Cheers,

Brendan
Rusky
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Programming "Views"

Post by Rusky »

Have you seen Subtext? In particular, schematic tables are pretty relevant here. It's more focused on visual representations of code itself rather than projects, but I think it's a natural extension of your ideas. Visual representations of code can convey concepts much more clearly, and enable entirely novel ways of manipulating code that are much more powerful than editing text.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Programming "Views"

Post by Brendan »

Hi,
Rusky wrote:Have you seen Subtext? In particular, schematic tables are pretty relevant here. It's more focused on visual representations of code itself rather than projects, but I think it's a natural extension of your ideas. Visual representations of code can convey concepts much more clearly, and enable entirely novel ways of manipulating code that are much more powerful than editing text.
Oh my - the schematic tables video is extremely awesome. :shock:

I want something like this for writing code and debugging (as an alternative to, rather than as a replacement of, a "text like" view of a function's code). I'm not too sure how schematic tables would handle something like a simple loop (without resorting to recursion) though.


Cheers,

Brendan
Rusky
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Programming "Views"

Post by Rusky »

The beautiful thing about schematic tables is they're entirely in the IDE. If you look at his older papers, the programming environment he's talking about is a lot like an AST. Schematic tables is just one view into it (and a pretty simple transformation at that).

Plain old schematic tables would just use recursion (see his factorial example), but he also mentions ways to re-render tail recursion as iteration (since they're trivially equivalent). IIRC, rather than scrolling horizontally into a "zoomed in" stack frame, it would just let you scroll vertically through the different iterations.
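For reference, the equivalence being relied on looks like this in ordinary code. This is plain Python rather than anything from Subtext, and the function names are made up; note that Python itself does not eliminate tail calls, so the recursive version still grows the stack.

```python
def factorial_recursive(n, acc=1):
    # Tail call: nothing remains to be done after the recursive call returns.
    if n <= 1:
        return acc
    return factorial_recursive(n - 1, acc * n)

def factorial_iterative(n):
    # Same computation; each loop pass corresponds to one recursive frame.
    acc = 1
    while n > 1:
        acc, n = acc * n, n - 1
    return acc
```

Each pass of the `while` loop carries the same `(n, acc)` pair that one recursive call would receive, which is why the two can be rendered as the same sequence of rows.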

One thing I like about his more functional-style system (since it's essentially at the expression level) is that debugging is just code browsing. There's no stepping through branching possible threads of execution, since it's all always visible (e.g. recursion can be browsed through horizontally). His idea as of the schematic tables video for I/O is similar although I think he's found some problems with it and moved onto another idea that he hasn't published much about.

His first paper on example-centric programming has some ideas that point to a good visual editor for tests as well.
AndrewAPrice
Member
Posts: 2305
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: Programming "Views"

Post by AndrewAPrice »

From a user (the end-programmer trying to use your IDE) perspective, I think the view would be very domain-specific.

If I'm developing an interactive 2D or 3D application, I love Unity3D. Unity is a great example of a domain-specific IDE.

I developed my latest game Nullify with Unity. The rendering in Nullify was done completely in C# (the level is procedurally drawn each frame), but as the game was running, I could switch to the "scene" tab - I could pause/resume the game while looking around the 'scene' arbitrarily. I could click on any game object, and in the "properties" window I could see a list of public variables and manipulate them in real time. Many properties are context-specific - selecting "Texture2D" variables would show a preview of what is inside the texture.

They are tools and features that once you're used to, it's hard to live without (I'm still fond of the good old days of developing and debugging my 3D engine in Visual Studio/C++ with none of that, but Unity definitely makes life easier when trying to accomplish the same goals.)

But, I wouldn't use Unity for database/server programming - that's when I have hundreds of small Perl and PL/SQL scripts that "do one thing and do it well", that I pipe together. An IDE with some sort of 'flow' tool that allows me to string together scripts, matching outputs to input streams and parameters (and being able to pause it, and view the input/output of each stage individually), would really help.
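The core of such a 'flow' tool can be sketched without any IDE at all. This is a toy under assumptions: stages are plain Python functions rather than external scripts, and `run_pipeline` is a made-up name. The only point is that every intermediate result is kept so that each stage's input and output can be inspected afterwards.

```python
def run_pipeline(stages, data):
    """Run named stages in order, keeping every intermediate result for inspection."""
    trace = [("input", data)]
    for name, func in stages:
        data = func(data)
        trace.append((name, data))
    return data, trace

stages = [
    ("strip",  lambda rows: [r.strip() for r in rows]),
    ("filter", lambda rows: [r for r in rows if r]),
    ("upper",  lambda rows: [r.upper() for r in rows]),
]
result, trace = run_pipeline(stages, [" alice ", "", "bob"])
for name, value in trace:
    print(f"{name:>6}: {value}")
```

Pausing would amount to stopping the loop after a given stage and handing `trace` to the viewer; wiring real scripts together would replace each lambda with a subprocess call.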
My OS is Perception.