Replacing fork() with egg byte codes

linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: Replacing fork() with egg byte codes

Post by linguofreak »

nullplan wrote:In order to set all pages to CoW, I have to set them read-only in the CPU. This requires dumping the entire user space TLB for the calling process. Even if I set them back to being writable as soon as the child process exits or execs, it is still a major performance impact on the parent. And all just for a call to fork().
Honestly, though, how often does fork() get called by a parent that isn't just going to yield the CPU next thing? I'd bet that 99% of fork() calls don't come from a cpu-bound parent, but rather a user-input-bound or child-exit-bound parent. In other words, for most process spawns, the important thing is spinning up the child and getting it running as quickly as possible, while doing as little as possible that the child doesn't need, rather than returning to the parent quickly and minimizing performance impact on the parent. I'll note that the original implementation of fork() (on early Unix, where only one process was in memory at a time, and fork swapped out the current contents of user memory, without swapping anything in to replace it), could either have had the copy on disk be the parent and the copy in RAM be the child, or vice versa, but it treated the copy on disk as the parent and resumed execution with the child immediately after the system call.
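To make the point about the typical fork() caller concrete, here is a minimal POSIX C sketch of that pattern. The function name spawn_and_wait is hypothetical. The parent forks and immediately blocks in waitpid(), so whatever fork() costs the parent is paid at a moment when it would be idle anyway. The child's _exit(code) merely stands in for real work.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, then block until it finishes.
 * Returns the child's exit status, or -1 on error. The parent does no
 * work of its own between fork() and waitpid(): it simply yields. */
int spawn_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: stand-in for "do the real work" */

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;              /* parent: sleep until the child is done */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, spawn_and_wait(7) returns 7 once the child has exited.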
eekee
Member
Posts: 892
Joined: Mon May 22, 2017 5:56 am
Location: Kerbin
Discord: eekee

Re: Replacing fork() with egg byte codes

Post by eekee »

I like codyd51's egg as a structure; I just don't like too many options in one call. Even Plan 9, which doesn't try to get away from the nature of fork(), effectively has 9 options to its extended fork(), rfork(). The options are passed as a bitfield, so you don't have to set all 9, but it could still be messy in some uses. (I'm counting mutually exclusive flags as one.)
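A rough C sketch of what "options as a bitfield" looks like in practice. The flag names are loosely modelled on Plan 9's rfork() flags (RFPROC, RFMEM, RFNAMEG/RFCNAMEG, and so on), but the RFX_ prefix and every numeric value here are invented for illustration, and flags_valid() is a hypothetical helper, not part of any real API. Twelve bits collapse to nine options once the three copy/clean pairs are counted as one, and the caller has to avoid setting both halves of a pair.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rfork()-style option bits. Names echo Plan 9's flags;
 * the values are made up for this sketch. */
enum {
    RFX_PROC   = 1u << 0,  /* create a new process */
    RFX_MEM    = 1u << 1,  /* share data and bss with the parent */
    RFX_NOWAIT = 1u << 2,  /* parent will not wait for the child */
    RFX_NOTEG  = 1u << 3,  /* new note group */
    RFX_REND   = 1u << 4,  /* new rendezvous group */
    RFX_NOMNT  = 1u << 5,  /* disallow mounts */
    RFX_NAMEG  = 1u << 6,  /* copy the parent's namespace ...  */
    RFX_CNAMEG = 1u << 7,  /* ... or start with a clean one    */
    RFX_ENVG   = 1u << 8,  /* copy the environment ...         */
    RFX_CENVG  = 1u << 9,  /* ... or start with an empty one   */
    RFX_FDG    = 1u << 10, /* copy the fd table ...            */
    RFX_CFDG   = 1u << 11  /* ... or start with an empty one   */
};

/* Returns false if both halves of any mutually exclusive pair are set. */
bool flags_valid(unsigned flags)
{
    static const unsigned exclusive[][2] = {
        { RFX_NAMEG, RFX_CNAMEG },
        { RFX_ENVG,  RFX_CENVG  },
        { RFX_FDG,   RFX_CFDG   },
    };
    for (size_t i = 0; i < sizeof exclusive / sizeof exclusive[0]; i++)
        if ((flags & exclusive[i][0]) && (flags & exclusive[i][1]))
            return false;
    return true;
}
```

A caller only ORs in what it needs, e.g. RFX_PROC | RFX_FDG, which is the convenience of the bitfield; the messiness is that nothing in the type system stops RFX_FDG | RFX_CFDG.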
Kaph — a modular OS intended to be easy and fun to administer and code for.
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie
xeyes
Member
Posts: 212
Joined: Mon Dec 07, 2020 8:09 am

Re: Replacing fork() with egg byte codes

Post by xeyes »

My thoughts on this topic:

1. CreateProcess() is a fine way. It is clearly defined and thus highly resistant to abuse.
fork() has no arguments, CreateProcess() on Windows has around a dozen
Most of CreateProcess()'s parameters are optional. Let's also not forget that execve/execvp/... also take lists of parameters, and that a dozen more calls (fcntl, dup, pipe creation and open/close to redirect stdio, or the myriad of [set/get][r/e][xyz]id calls needed to describe security contexts) are needed to replicate what CreateProcess() can do in a single shot.
never extensible (if a new property to be set comes along, you need a new version of the API).
There's no Ex version of CreateProcess(), ever wondered why? My take is that there isn't a strong enough need for that.
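For comparison, here is a minimal sketch of the multi-call POSIX sequence mentioned above: pipe() to create the channel, fork() to clone the caller, dup2() in the child to redirect stdout, then execvp() to load the new program. It assumes a POSIX system with echo on the PATH; capture_output() is a hypothetical name. Output beyond cap-1 bytes is discarded, and a child still writing at that point may be killed by SIGPIPE once the parent closes its end, which this sketch reports as -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with stdout redirected into a pipe, copying up to cap-1
 * bytes of its output into buf (NUL-terminated).
 * Returns the child's exit status, or -1 on error. */
int capture_output(char *const argv[], char *buf, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();                         /* step 1: clone the caller */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                          /* child never reads */
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0)  /* step 2: redirect stdout */
                _exit(127);
            close(fds[1]);
        }
        execvp(argv[0], argv);                  /* step 3: replace the image */
        _exit(127);                             /* reached only if exec failed */
    }

    close(fds[1]);                              /* parent never writes */
    size_t len = 0;
    ssize_t n;
    while (len < cap - 1 && (n = read(fds[0], buf + len, cap - 1 - len)) > 0)
        len += (size_t)n;
    buf[len] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each step is a separate call that can be skipped when not needed, which is exactly the trade-off being debated: more calls, but nothing is mandatory.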

2. vfork() does invite a lot of abuse, and the documentation on how tolerant the kernel should be of that abuse seems to be system dependent; Oracle's documentation differs from Linux's on this, for example. This creates ample opportunity for the kernel and programs to both try (sometimes nonstandard) things in order to best accommodate each other, and many times they actually step on each other's toes while trying to be nice. fork() is fine, though.

My biggest beef with the UNIX way of doing this is the execve family. Every time I think about it, the image of some sort of parasite that has finished consuming its host and bursts out into the open, the host's flesh and blood flying everywhere, appears vividly before my eyes :shock:

Makes one wonder whether the guy who originally designed this method got the idea from some horror movie?

3. A process in the traditional sense is already a very big concept in computing. Few programs have any legitimate need to keep creating and killing processes (shells, service managers/init and other session managers obviously have to do this, "process drivers" like make need it, and some really fancy, huge programs might have their own pre-loaders or companion processes; I can't easily think of others with a strong need). Thus I feel it is their responsibility to understand and use the available utilities correctly, in exchange for the great power of managing the life and death of their kind. In other words, it's not the kernel's business to hold their hands.

That said, if you end up making it I'd be excited to learn more about the details of the interface.
eekee
Member
Posts: 892
Joined: Mon May 22, 2017 5:56 am
Location: Kerbin
Discord: eekee

Re: Replacing fork() with egg byte codes

Post by eekee »

xeyes wrote:My biggest beef with the UNIX way of doing this is about the execve family. Every time I think about it, the image that some sort of parasite has done consuming its host and bursts out into the open with the host's flesh and blood flying around just appears vividly before my eyes :shock:

Makes one wonder did guy who originally design this method get his idea from some horror movie?
LOL! I think my initial reaction to exec was much the same. It was a long time ago so I don't clearly recall it, but I've somehow ended up not wanting to think about exec very much. :? :) I don't think it's as bad as another Unix feature: parents have to clean up after the death of their children. And that's saying nothing about "kill" and "killed". Inappropriate analogies were common in the computers of the olden days.
xeyes wrote:3. Process in the traditional sense is already a very big concept/scope in computing.
It's become the mainstream, presumably because process=application was natural on single-core machines, but I think there have been exceptions for almost as long as there has been multitasking. In Plan 9, a process corresponds more nearly to the definition of a thread, except with optional protections.
xeyes
Member
Posts: 212
Joined: Mon Dec 07, 2020 8:09 am

Re: Replacing fork() with egg byte codes

Post by xeyes »

eekee wrote:
xeyes wrote:My biggest beef with the UNIX way of doing this is about the execve family. Every time I think about it, the image that some sort of parasite has done consuming its host and bursts out into the open with the host's flesh and blood flying around just appears vividly before my eyes :shock:

Makes one wonder did guy who originally design this method get his idea from some horror movie?
LOL! I think my initial reaction to exec was much the same. It was a long time ago so I don't clearly recall it, but I've somehow ended up not wanting to think about exec very much. :? :) I don't think it's as bad as another Unix feature: parents have to clean up after the death of their children. And that's saying nothing about "kill" and "killed". Inappropriate analogies were common in the computers of the olden days.
Glad to hear that I'm not the only one :)

Indeed, as you said, lots of inappropriate analogies.

I'm trying to fend off zombies while also keeping the POSIX apps happy; not sure how long this delicate balance can hold before I have to let the zombies in.
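For anyone else fending off zombies: a dead child stays a zombie until its parent collects its exit status. A common POSIX approach in an event loop is to reap every already-exited child without blocking, as in this minimal C sketch (reap_exited_children is a hypothetical name). Setting SIGCHLD's disposition to SIG_IGN is an alternative POSIX (XSI) mechanism that stops children from becoming zombies at all, at the cost of losing their exit statuses.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* fork()/_exit() for callers that spawn children */

/* Collect every child that has already exited, without blocking.
 * Returns how many were reaped by this call. Until a parent does this,
 * each dead child lingers as a zombie holding its exit status. */
int reap_exited_children(void)
{
    int reaped = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;           /* one zombie cleaned up */
            continue;
        }
        if (pid < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: try again */
        break;                  /* 0: children still running; -1/ECHILD: none left */
    }
    return reaped;
}
```

Calling this periodically (or from the main loop after a SIGCHLD wakeup) keeps the zombie count at zero without ever stalling the parent.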
eekee wrote:
xeyes wrote:3. Process in the traditional sense is already a very big concept/scope in computing.
It's become the mainstream, presumably because process=application was natural on single-core machines, but I think there have been exceptions for almost as long as there has been multitasking. In Plan 9, a process corresponds more nearly to the definition of a thread, except with optional protections.
Interesting; I'll have to read up on how Plan 9 makes processes similar to threads.

I actually think that SMP machines make the process an even more powerful (and bigger) concept now, because the threads inside it run truly in parallel, allowing great throughput and great responsiveness all in one entity.

Large NUMA machines do present challenges to this as there's a conflict between the symmetric view of memory inside a process, and the asymmetric memory routing and performance in the underlying HW.
linguofreak
Member
Posts: 510
Joined: Wed Mar 09, 2011 3:55 am

Re: Replacing fork() with egg byte codes

Post by linguofreak »

xeyes wrote:My thoughts on this topic:

1. CreateProcess() is a fine way. It is clearly defined and thus highly resistant to abuse.
fork() has no arguments, CreateProcess() on Windows has around a dozen
Most of CreateProcess()'s parameters are optional. Let's also not forget that execve/execvp/... also have lists of parameters,
The maximum set of arguments taken by any exec family function is equivalent to the arguments lpApplicationName, lpCommandLine, and lpEnvironment of CreateProcess(). The command line is broken up into individual args, and in the execl family those are a variadic argument list instead of an array as they are in the execv family, but either way it's just a different way of passing the same information passed in lpCommandLine.
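A small C sketch of that equivalence, assuming /bin/sh exists: run_sh() (a hypothetical helper) launches the same command either with execl(), passing argv variadically, or with execv(), passing it as an array. The new program sees an identical argument list either way. The envp-taking variants (execle(), execve()) would correspond to lpEnvironment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "/bin/sh -c <script>" in a child and return its exit status
 * (-1 on error). use_array selects execv() (array) over execl()
 * (variadic); both deliver the same argv to the new program. */
int run_sh(const char *script, int use_array)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (use_array) {
            /* argv as an array: the execv family */
            char *const argv[] = { "sh", "-c", (char *)script, NULL };
            execv("/bin/sh", argv);
        } else {
            /* argv as a variadic list: the execl family */
            execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        }
        _exit(127);             /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

run_sh("exit 3", 0) and run_sh("exit 3", 1) both report 3, since the shell receives the same three arguments.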
and a dozen more different calls (like fnctl or dup and open/close pipe create to redirect STDIO, or the myriad of [set/get][r/e][xyz]id calls needed to describe security contexts) are needed to replicate what CreateProcess() can do in a single shot.
That's not a bug, it's a feature :D. The process of creating a new process (or starting a new program in the same process; exec need not only occur after fork(), see below) can be broken down into individual elements, each of which can be omitted if not needed.

As an example of exec'ing without fork(), consider the following, from my own machine:

    % ps
        PID TTY          TIME CMD
    2432424 pts/26   00:00:00 zsh
    2432431 pts/26   00:00:00 ps
    % exec bash
    $ ps
        PID TTY          TIME CMD
    2432424 pts/26   00:00:00 bash
    2432479 pts/26   00:00:00 ps

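The same thing can be observed from C. In this sketch (exec_keeps_pid() is a hypothetical name, and /bin/sh is assumed), fork() is used only so the calling program survives to check the result; the child then calls execl() to replace its own image with sh, which prints its PID via $$. That PID matches the one fork() reported, because exec loads a new program into the existing process rather than creating one.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a program started via exec reports the same PID as the
 * process that called exec, 0 if not, -1 on error. fork() only keeps
 * the caller alive; the exec step itself creates no new process. */
int exec_keeps_pid(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0)
                _exit(127);
            close(fds[1]);
        }
        /* Replace this process image with sh; $$ expands to sh's PID. */
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);
    }

    close(fds[1]);
    char buf[32];
    size_t len = 0;
    ssize_t n;
    while (len < sizeof buf - 1 &&
           (n = read(fds[0], buf + len, sizeof buf - 1 - len)) > 0)
        len += (size_t)n;
    buf[len] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    return strtol(buf, NULL, 10) == (long)pid;
}
```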
2. vfork() does invite a lot of abuses and documentations on how tolerate should the kernel be towards abusing seems to be system dependent, like Oracle's are different from Linux documentation on this. This creates ample opportunities for the kernel and the programs both trying (sometimes non standard) things in order to best accommodate each other and many times actually stepping on each other's toes while trying to be nice. fork() is fine though.
vfork() is a red herring. It was a kludge adopted by BSD in the interim between PDP-11 Unix (where address spaces were small and fork() without COW wasn't hugely problematic) and the introduction of COW on VAX BSD. The man page for vfork() in 4.2 BSD warned that it was a temporary kludge that would eventually be made synonymous with fork(), and in 4.4 BSD it was. Since then, several systems have resurrected it for performance and memory savings that might matter on embedded systems (or did back in the 90s), but POSIX deprecated it, allowing it to be identical to fork(), and eventually removed it. For maximum portability, it's best to assume that it's identical to fork() and to avoid using it if possible. If you're implementing a new OS with a Unix or Unix-like API, vfork() can be left out or can be an alias for fork().
My biggest beef with the UNIX way of doing this is about the execve family. Every time I think about it, the image that some sort of parasite has done consuming its host and bursts out into the open with the host's flesh and blood flying around just appears vividly before my eyes :shock:

Makes one wonder did guy who originally design this method get his idea from some horror movie?
Eh. I'd say it's more "body-snatchers" than "Alien". The host doesn't explode, it just starts acting like the parasite.
3. Process in the traditional sense is already a very big concept/scope in computing. Few processes have any legitimate need to keep creating and killing them (shells, service managers/init or other session managers obviously have to do this, "process drivers" like make need this, or some really fancy and huge programs might have their own pre-loaders or accompany processes, can't easily think of others that have a strong need for this). Thus I feel that it is their responsibility to understand and use the available utilities correctly, in exchange for the great power of managing the life and death of their likes. Namely, it's not the kernel's business to handhold them.
To me, a big monolithic function like CreateProcess() feels more handholdy than a fine grained fork+exec method.

A lot of games have subsystems that could be separated from the game into separate executables: they only operate during one part of a playthrough's lifecycle, are otherwise inactive, and generate data that's used later on user-interaction timescales (i.e., the game is going to spend the intervening time waiting for the user, so tight coupling for performance isn't critical). For example, consider a typical 4X game. Map generation can be split off into a separate executable (indeed, many games do have separate map editors and then duplicate part of the map editor's functionality in the main game), and even randomly generated maps can often be saved for use in a later game. Specifying the player empire can be spun off into a separate executable too: it's done once at the beginning of play, the output is often saved for later use, etc. It's not typically done, but it's an example of the kind of program that could benefit from creating child processes without falling into one of the categories you listed. I would argue that if a component of a program can be split off into a separate executable, it generally should be, as otherwise a pointer bug in one module can screw up the heap for the whole program.
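A minimal POSIX C sketch of the isolation argument: run_isolated() runs a subsystem in a child process and reports whether it succeeded, failed, or crashed. The names run_isolated and run_result and the demo_task_* stand-ins are all hypothetical; demo_task_crashes() uses abort() to imitate a fatal bug. Because the child works on its own copy of the address space, a crash or heap corruption there leaves the parent untouched. A real game would more likely exec a separate map-generator executable and exchange data through files or pipes rather than calling a function pointer after fork().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome codes for running a subsystem out of process. */
enum run_result { RUN_ERROR = -1, RUN_OK = 0, RUN_FAILED = 1, RUN_CRASHED = 2 };

/* Run `task` in a child process so its bugs cannot touch our heap. */
enum run_result run_isolated(int (*task)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return RUN_ERROR;
    if (pid == 0)
        _exit(task() == 0 ? 0 : 1);   /* child: run the subsystem, report */

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return RUN_ERROR;
    if (WIFSIGNALED(status))
        return RUN_CRASHED;           /* e.g. SIGSEGV or SIGABRT in the child */
    return WEXITSTATUS(status) == 0 ? RUN_OK : RUN_FAILED;
}

/* Hypothetical stand-ins for a map generator. */
static int demo_task_ok(void)      { return 0; }   /* finishes normally */
static int demo_task_fails(void)   { return 1; }   /* reports an error */
static int demo_task_crashes(void) { abort(); }    /* imitates a fatal bug */
```

The parent can then decide what to do with a crashed generator (retry, fall back to a saved map, show an error) instead of going down with it.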

I think you'd be surprised, too, just how many shell instances run in the background on a typical Unix system. Under traditional init systems, all of your service management is done through shell scripts. Systemd has changed that somewhat on Linux, but it's deeply divisive, and isn't a thing on other *nixes.
Gigasoft
Member
Posts: 856
Joined: Sat Nov 21, 2009 5:11 pm

Re: Replacing fork() with egg byte codes

Post by Gigasoft »

There's no Ex version of CreateProcess(), ever wondered why? My take is that there isn't a strong enough need for that.
Not so fast: there are a bunch of different functions on Windows that create processes. The high-level ones, in addition to CreateProcess, are CreateProcessAsUser, CreateProcessWithLogon, and CreateProcessWithToken. Then there is NtCreateProcess, which does not create a main thread and only maps the main executable; NtCreateProcessEx, which also associates the new process with a job; and NtCreateUserProcess, which has a whole bunch of different attributes that can be set, similar to the OP's idea.