
Re: Synchronous vs. asynchronous message passing

Posted: Wed Nov 07, 2007 9:51 am
by Colonel Kernel
jal wrote:So basically, you don't think threads are a good idea at all? Only processes? That is possible of course, but it creates a big overhead. Any shared information, even when never touched at the same time, must be passed along via messages or the like.
I will spare everyone the pain of re-iterating all my old arguments in favour of Software-Isolated Processes in this thread. :) Suffice it to say that it is the current approach to process isolation that imposes the costs you describe. With software-isolated processes, message passing can be zero-copy and extremely efficient.

Sufficiently radical changes to CPU design (which may be necessary as we approach hundreds or even thousands of cores per chip in a decade) may be another solution to the problem. For example, imagine if each core had its own dedicated L2 cache that were directly addressable by the core (like the way the Cell SPEs work). The only way for memory to be shared would be for the OS to move it around from core to core using DMA. You'd get isolation essentially for free without paging, because each running process' memory would be in a physically separate cache from all the others. Message-passing itself would be hardware-accelerated, and with a good point-to-point interconnect technology, really fast.

Of course, this assumes that processes don't move between cores very often, and that cores don't switch between processes very often either. This is much easier to achieve with asynchronous message-passing. In a world with hundreds or thousands of cores it is not as wasteful as you might think either. This scheme also assumes that processes are small enough for the working set to fit in L2 cache, which means that programmers would have to aggressively decompose programs into teams of co-operating processes (e.g. -- the Actor model in Erlang).

This guy has some neat ideas about security and the principle of least privilege that dovetail very nicely with what I'm suggesting.

This is all blue-sky stuff of course, but that's how technology advances -- by trying to imagine the future.

Posted: Thu Nov 08, 2007 12:57 am
by mystran
'Synchronous' is synonymous with the 20th century. It leads to all kinds of stupid things, like having to simulate asynchronous operations by spawning a separate thread for each message sent.

Just do asynchronous IPC and then if some idiot application programmer insists on synchronous API, just do a Send();WaitForReply() in a library wrapper kludge or something. That guy won't ever write any well-behaving apps anyway...

The thing is, it's almost trivial to simulate synchronous operations on top of asynchronous primitives, while the other way around gets really ugly really fast, and generally involves lots of threads = lots of wasted resources.
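For example, the whole wrapper can be as small as this (ipc_send() and ipc_wait_reply() are made-up primitives here, just to sketch the idea -- not anyone's actual API):

Code: Select all

// Sketch only: a library emulating a synchronous call on top of
// asynchronous primitives. ipc_send()/ipc_wait_reply() are hypothetical.
struct Message {
    int   type;
    int   tag;     // used to match a reply to its request
    void *data;
};

Message SendAndWaitForReply(int server, Message request)
{
    int tag = ipc_send(server, &request);  // queue the request, don't block
    Message reply;
    ipc_wait_reply(tag, &reply);           // block until *that* reply arrives
    return reply;                          // caller never sees the async part
}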

Besides, for purposes of an RTOS, asynchronous operation solves quite a few problems. It's just that synchronous IPC is slightly easier to implement if you need a constant upper bound. One might also argue that certain types of proofs are easier if IPC is synchronous..

Posted: Thu Nov 08, 2007 1:50 am
by jal
mystran wrote:Just do asynchronous IPC and then if some idiot application programmer insists on synchronous API, just do a Send();WaitForReply() in a library wrapper kludge or something. That guy won't ever write any well-behaving apps anyway...

The thing is, it's almost trivial to simulate synchronous operations on top of asynchronous primitives, while the other way around gets really ugly really fast, and generally involves lots of threads = lots of wasted resources.
So asynchronous is the way to go then. I'm happy it is, since we had already more or less decided that would be preferable. Then the next question arises: how to receive the answer? A special 'receive' call, a callback function, something else?
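For instance (async_write(), receive_reply() and async_write_cb() below are just made-up names to illustrate the two options):

Code: Select all

// Two ways to get the answer back; all the async_* and receive_* names
// are hypothetical, purely for illustration.

void onWriteDone(int retVal)     // Option 2: a callback, invoked on completion
{
    if (retVal < 0)
        doError();
}

void example(void)
{
    // Option 1: an explicit receive call -- kick it off, come back later.
    int tag = async_write(fd, buffer, 10);
    /* ...do something useful in the meantime... */
    int retVal = receive_reply(tag);   // block (or poll) for this reply only
    if (retVal < 0)
        doError();

    // Option 2: register a callback instead and never block at all.
    async_write_cb(fd2, buffer2, 10, onWriteDone);
}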


JAL

Posted: Thu Nov 08, 2007 3:14 am
by JamesM
I'm still trying to work out how you (as in all those who posted to this thread in favour of deprecating synchronous APIs) get around the problem of return values. That is: I have a POSIX write call. That translates into a message to the ext2 driver, but POSIX states that the return value is important (of course it is! along with errno), so the call must be synchronous! Even if (as in my implementation) it is wrapped around an asynchronous system with a Post();Spin();Return(); combo, it still has to be present.
mystran wrote:Just do asynchronous IPC and then if some idiot application programmer insists on synchronous API, just do a Send();WaitForReply() in a library wrapper kludge or something. That guy won't ever write any well-behaving apps anyway...
How in this case do you handle POSIX calls etc in a microkernel system?

Posted: Thu Nov 08, 2007 3:20 am
by jal
JamesM wrote:That is: I have a POSIX write call. That translates into a message to the ext2 driver, but POSIX states that the return value is important (of course it is! along with errno), so the call must be synchronous!
I think you do not understand what asynchronous means. At least, from what you write, you seem to think it means "kick it off and never look back". However, asynchronous means "kick it off and come back for the answer later (and do something useful in the meantime)".


JAL

Posted: Thu Nov 08, 2007 5:43 am
by JamesM
Yes, but what useful work can possibly be done (with the exception of switching threads) in the following code:

Code: Select all

int retVal = write(fd, buffer, 10);
if (retVal < 0)   /* write() returns -1 on error, with errno set */
{
  doError();
}
The next instruction explicitly relies on the return value from the call. How do you do any useful processing in the meantime?

Posted: Thu Nov 08, 2007 6:10 am
by Brendan
Hi,
mystran wrote:Besides, for purposes of an RTOS, asynchronous operation solves quite a few problems. It's just that synchronous IPC is slightly easier to implement if you need a constant upper bound. One might also argue that certain types of proofs are easier if IPC is synchronous..
IMHO it's not such a clear-cut thing - there are several other characteristics that influence it.

For example, are messages received in the order they are sent? What if threadA sends message1 to threadB (which forwards message1 to threadD) and then threadA sends message2 to threadC (which forwards message2 to threadD) - would threadD receive message1 before message2, or is there no guaranteed order? For my OS, messages sent from one thread directly to another thread are guaranteed to arrive in order, but messages sent from one thread via another thread to a third thread are definitely not guaranteed to arrive in order (the order depends on the load on each CPU, thread priorities, etc). For something like rendezvous messaging (often used in real-time systems) you can guarantee that messages are received in order in both cases.

Then there's how delivery failures are handled. For synchronous messaging this is normally simple - either the message will be received or "sendMessage()" will return an error. For my OS there's no guarantee that a successfully sent message will be successfully received - the receiver can be on a different computer, and could be terminated after the message is sent but before it's received. In this case there are only two ways to handle it - the sender (asynchronously) receives a "failed to deliver" message (like the "delivery failure" email you get several hours after you try to send an email to an unknown email address), or the message simply disappears with no way of knowing it wasn't received. For my OS the message simply disappears, as it's extremely difficult to guarantee that a "failed to deliver" message will be sent in all possible situations (including network failures), so it's easier not to bother (and to build software knowing that sent messages may not be received).
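In practice that just means the sender pairs every request with a timeout - something like this (sendMessage() and getMessageWithTimeout() are hypothetical names, for illustration only):

Code: Select all

// Sketch: coping with messages that silently disappear. The sender can't
// tell a dead receiver from a slow one, so it budgets its own waiting.
sendMessage(target, &request);              // may "succeed" yet never arrive
Message reply;
if (getMessageWithTimeout(&reply, 500) != OK) {   // wait at most 500 ms
    handleNoReply();                        // retry, or report failure upwards
}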


Cheers,

Brendan

Posted: Thu Nov 08, 2007 6:22 am
by jal
JamesM wrote:Yes, but what useful work can possibly be done (with the exception of changing threads) in the following code:

Code: Select all

int retVal = write(fd, buffer, 10);
if (retVal < 0)
{
  doError();
}
The next instruction explicitly relies on the return value from the call. How do you do any useful processing in the meantime?
None. But that's the whole point, really. You are thinking sequentially, while asynchronous programming requires a different approach. What if your code looked like this:

Code: Select all

int retVal = write(fd, buffer, 10);
if (retVal < 0)
{
  doError();
}

retVal = write(fd2, buffer2, 10);
if (retVal < 0)
{
  doError();
}

retVal = write(fd3, buffer3, 10);
if (retVal < 0)
{
  doError();
}
In that case, you could easily write three files at once, then wait for their success (or failure).
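The 'wait' step could then look something like this (Reply and wait_for_reply() are made-up names, just to illustrate):

Code: Select all

// Sketch: collect the three completions after all writes were queued.
struct Reply {
    int fd;       // which write this reply belongs to
    int status;   // bytes written, or negative on error
};

for (int i = 0; i < 3; i++) {
    struct Reply r = wait_for_reply();   // blocks until any write completes
    if (r.status < 0)
        doError();                       // the three writes ran concurrently
}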


JAL

Posted: Thu Nov 08, 2007 7:54 am
by os64dev
OK, I'll have a go at this, because the previous post was kind of a DUH.

@jal

Code: Select all

int retVal = write(fd, buffer, 10);
if (retVal < 0)
{
  doError();
}

retVal = write(fd2, buffer2, 10);
if (retVal < 0)
{
  doError();
}

retVal = write(fd3, buffer3, 10);
if (retVal < 0)
{
  doError();
}
This will give you hell, because even though the write has returned with a value, the operation itself still has more than one possible state, and you are in no position to read that state afterwards. Remember that the write call could have returned without even starting hard disk access.

The synchronous vs. asynchronous debate is IMHO kind of weird anyway. Jochen Liedtke (creator of the L4 microkernel) made all calls synchronous because of performance issues, keeping in mind that asynchronous behaviour could still be achieved by using threads.

Posted: Thu Nov 08, 2007 8:57 am
by Brendan
Hi,
os64dev wrote:The synchronous vs. asynchronous debate is IMHO kind of weird anyway. Jochen Liedtke (creator of the L4 microkernel) made all calls synchronous because of performance issues, keeping in mind that asynchronous behaviour could still be achieved by using threads.
Which performance issues?

For example, if you use segmentation to create "small address spaces" and avoid TLB flushing, and then make messages so small that they fit in CPU registers and only support single-CPU (never any need to copy or store message data), then (with synchronous messaging) sending and receiving a message can be as fast as a normal kernel API call.

Of course with 64-bit CPUs (that don't support segmentation) and multi-core everywhere, some of L4's design decisions start to look a bit silly.

I'm expecting that computers with 64 or more CPUs will be common by the time my OS is really useful. My highest priority is scalability. Removing the need for otherwise unnecessary threads and the re-entrancy concerns they cause is just a nice bonus.


Cheers,

Brendan

Posted: Thu Nov 08, 2007 9:30 am
by Colonel Kernel
JamesM wrote:How in this case do you handle POSIX calls etc in a microkernel system?
You don't, at least not efficiently - although you could have something POSIX-like if you use futures:

Code: Select all

future<int> retVal = write(fd, buffer, 10);
future<int> retVal2 = write(fd2, buffer2, 10);
future<int> retVal3 = write(fd3, buffer3, 10);

doSomethingElseUsefulForAWhile();

if ((retVal.getValue() < 0) ||
    (retVal2.getValue() < 0) ||
    (retVal3.getValue() < 0))
{
  doError();
}
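Under the hood, such a future needs little more than a reply tag that blocks on first use. A minimal sketch, assuming a hypothetical ipc_wait_for() primitive that blocks until the reply with a given tag arrives:

Code: Select all

// Minimal sketch of such a future; ipc_wait_for() is hypothetical.
template <typename T>
class future
{
public:
    explicit future(int tag) : tag_(tag), ready_(false) {}

    T getValue()
    {
        if (!ready_) {                       // block only on first access
            value_ = ipc_wait_for<T>(tag_);
            ready_ = true;
        }
        return value_;                       // cached thereafter
    }

private:
    int  tag_;
    bool ready_;
    T    value_;
};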

Posted: Thu Nov 08, 2007 1:31 pm
by mystran
The POSIX API is 20th-century design, and it's a good design exactly as long as you want to do one thing at a time. That is, it's a wonderful API for batch processes. IMHO the "portability" it gets you is hugely overrated. Sure, it's a 'standard', but that doesn't mean it's good. It's not totally bad, but it assumes synchronous processing in a lot of places, which means it's ill-suited for anything that includes the words "client" and "server" -- i.e. most modern systems.

That said, whether your IPC is synchronous or asynchronous, you have to account for the fact that sometimes you won't get a reply. Sure, in a synchronous design you can have the kernel notify the client when the server dies while a request is pending (so you can stop blocking and return an error), but then again, there's no reason you can't do this with an asynchronous system as well: just have the client tell the kernel to send a message if the server process dies..
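Something like this, say (watch_process() and the message fields are made-up, just to show the shape of it):

Code: Select all

// Sketch: turning server death into an ordinary message on the client's
// queue. watch_process() and the message fields are hypothetical.
watch_process(server_pid);            // ask the kernel: notify me if it dies
ipc_send(server_pid, &request);
Message m = ipc_receive();            // the reply OR a death notice
if (m.type == PROC_DIED && m.pid == server_pid)
    return -EIO;                      // the reply will never come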

Obviously, independent of IPC type, that only works on a single machine, thanks to partial failures in a network environment, so if your system involves stuff like network-mounted filesystems, you'll have to deal with timeouts anyway... which is like a million times easier when the API is asynchronous.

Anyway, asynchronous programming means understanding event-based programming, which in most cases involves a certain amount of inversion of control.. it's not easy for someone who's grown up in the batch world. IMHO it's still worth the trouble. IMHO we should prepare to get rid of stuff like the POSIX APIs as soon as possible. IMHO that's the only way to eventually make software more reliable.

Now, if I could just find the time, I'd write down my design for a low-kernel-overhead asynchronous message-passing API, which allows easy userspace synchronous-call emulations while still keeping the event-loop clean, and avoids unbounded kernel allocations (that is, kernel memory is reserved for threads but not queued messages). But it's complex enough that I don't have the time right now..

Posted: Thu Nov 08, 2007 2:54 pm
by os64dev
@brendan
For example, if you use segmentation to create "small address spaces" and avoid TLB flushing, and then make messages so small that they fit in CPU registers and only support single-CPU (never any need to copy or store message data), then (with synchronous messaging) sending and receiving a message can be as fast as a normal kernel API call.

Of course with 64-bit CPUs (that don't support segmentation) and multi-core everywhere, some of L4's design decisions start to look a bit silly.
Well, if you had implemented core affinity, the "as fast as a normal kernel API call" argument would still hold.

But I agree with the latter statement. As my OS will be 64-bit, and indeed multicore, it seems that some of the L4 decisions are outdated. Personally I would implement both synchronous and asynchronous, but starting with the synchronous. I am not at that point yet.

Posted: Thu Nov 08, 2007 3:49 pm
by mystran
L4 trades low messaging overhead for huge overhead elsewhere when you want to multiplex a service. Sure, you can wait for a message from any client, but if the server needs to call into another server, you need a separate thread if you want to keep serving other clients while the call is pending. Oh, and without a timeout it could be pending forever, with no way to cancel.. isn't that great?

Not to mention that the original L4 design means tons of extra messaging if you want to implement any sort of normal security on top of it, since it doesn't have any sort of capability system. Try to implement something like POSIX file descriptors on top of it, and you'll see.

L4 is really not a good example of anything except scoring high on messaging micro-benchmarks. On macro-scale, it's an awful design when it comes to performance and resource utilization.

IMHO any sane design should allow a single-threaded server to properly multiplex for an arbitrary number of clients, even if it needs to perform slow requests itself on behalf of its clients. IMHO any sane design should also provide means to program against the possibility of malicious servers causing DoS on their clients... which the L4 design only satisfies if you use thread-per-request, essentially converting a local DoS into a global DoS when memory resources are exhausted by the unnecessary extra threads.
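For the record, single-threaded multiplexing just means keeping per-request state instead of per-request threads. A rough sketch, with entirely made-up ipc_* calls and message fields:

Code: Select all

// Sketch: one server thread, many clients, slow downstream requests in
// flight concurrently. All ipc_* calls and message fields are hypothetical.
#include <map>

struct Pending { int client; };    // continuation state for one request

std::map<int, Pending> pending;    // keyed by downstream reply tag

void serverLoop()
{
    for (;;) {
        Message m = ipc_receive();              // blocks for *any* message
        if (m.type == CLIENT_REQUEST) {
            // Start the slow work on the client's behalf; don't block on it.
            int tag = ipc_send(disk_server, &m);
            pending[tag].client = m.sender;
        } else if (m.type == REPLY) {
            // Downstream answered: finish the original client's request.
            Pending p = pending[m.tag];
            pending.erase(m.tag);
            ipc_send(p.client, &m);
        }
    }
}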

Posted: Fri Nov 09, 2007 5:38 pm
by bewing
mystran wrote: we should prepare to get rid of stuff like POSIX APIs as soon as possible.

Now, if I could just find the time, I'd write down my design for a low-kernel-overhead asynchronous message-passing API, which allows easy userspace synchronous-call emulations while still keeping the event-loop clean, and avoids unbounded kernel allocations (that is, kernel memory is reserved for threads but not queued messages). But it's complex enough that I don't have the time right now..
I agree with you, mystran -- but until somebody does actually sit down and create the perfect asynch API and a new beautiful language that makes event-loops look *pretty* -- I think it's going to be best to stick with the method of spawning tens or hundreds of mini-threads that block on the completion/acknowledgement of underlying asynchronous system calls.