Developing a kernel in C11

h0bby1 · Post by **h0bby1** » Fri Mar 07, 2014 4:41 pm

So I now have a multitasking engine that works well, and I'm trying to figure out the most efficient way to implement all the semaphores, locks, memory barriers, atomics and other synchronization primitives.

For now I've just kept it as simple as possible so that it works, but there are many things I need to implement in a better way if I want everything to be perfectly safe and scalable.

So I just read through the C11 description, and it seems to already support the synchronization primitives in a built-in manner: memory barriers, code ordering/visibility, atomic operations, lock primitives (mutexes and condition variables in <threads.h>, though no semaphore type as such), and an interface to manage threads.
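To give an idea of what the built-in atomics look like, here is a minimal sketch using <stdatomic.h>. The counter and the function names (ready_tasks, task_became_ready, ready_task_count) are made up for illustration and are not from any real kernel. It assumes a compiler that actually ships C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter shared between tasks (static storage, so it starts at 0). */
static atomic_uint ready_tasks;

/* Atomically add one and return the value seen before the increment. */
static unsigned task_became_ready(void)
{
    return atomic_fetch_add_explicit(&ready_tasks, 1u, memory_order_relaxed);
}

/* Read the current value; plain atomic_load uses memory_order_seq_cst. */
static unsigned ready_task_count(void)
{
    return atomic_load(&ready_tasks);
}

/* Single-threaded usage check. */
static void counter_example(void)
{
    unsigned before = ready_task_count();
    unsigned seen = task_became_ready();
    assert(seen == before);
    assert(ready_task_count() == before + 1u);
}
```

The memory_order argument is where the ordering/visibility part comes in: relaxed only guarantees atomicity of the counter itself, while the stronger orders also constrain surrounding memory accesses.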

A few things I could find about C11 support for synchronization:

http://www.cl.cam.ac.uk/~mjb220/popl085ap-sewell.pdf

General material on synchronization issues:

http://www.1024cores.net/

From what I understood, multithreading support is basically the same in C++11 and C11.

But there is this mailing list thread where Linus discusses issues with the current state of compilers, and with parts of the standard's specification that can leave corner cases open and lead to bugs:

http://lwn.net/Articles/586838/

(There are plenty of links on the page above if you want to dig into the details.)

Apparently Linus is saying that, as things stand, the kernel is better off with its own memory barrier code than with the atomics provided by a C11 compiler.

I guess for now most people writing a kernel create their own set of spinlocks, semaphores, atomics and memory fences to suit their own needs at the kernel level, but apparently C11 can replace most of this and generate the right assembly for the target CPU to handle the synchronization correctly, which could be a huge advantage.
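As an illustration of C11 replacing a hand-written primitive, a spinlock can be sketched on top of atomic_flag, which is the one atomic type C11 guarantees to be lock-free. The struct and function names here are hypothetical, and this is only a sketch: a real kernel lock would also deal with interrupts, fairness and a CPU pause hint.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock built on atomic_flag. */
struct spinlock {
    atomic_flag held;
};

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was free and is now ours.
   Acquire ordering keeps the critical section from moving above the lock. */
static bool spin_trylock(struct spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void spin_lock(struct spinlock *l)
{
    while (!spin_trylock(l)) {
        /* busy-wait; a real kernel would add a pause/yield hint here */
    }
}

/* Release ordering keeps the critical section from sinking below the unlock. */
static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Single-threaded usage check. */
static void spinlock_example(void)
{
    static struct spinlock lock = SPINLOCK_INIT;
    spin_lock(&lock);
    bool second = spin_trylock(&lock);  /* fails: already held */
    spin_unlock(&lock);
    bool third = spin_trylock(&lock);   /* succeeds: lock was released */
    spin_unlock(&lock);
    assert(!second && third);
}
```

The compiler picks the instructions for the target CPU; the source only states the ordering that is needed.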

Has anyone already used C11 for kernel development, or does anyone have experience using the C11 synchronization primitives in a kernel? Or is there any major drawback at the moment, given the current state of compiler support and/or unclear specification, so that it's better to code memory barriers and synchronization manually in assembler or C?

It's not that it's much code, but it can be heavily CPU dependent and it is critical for the synchronization to work properly, so having all this done with C11 types could be a good thing if it works well.

I guess if C11 compilers work well, they can deal much better with the many cases where memory access order matters, and have a better understanding of how the program is supposed to run once multithreading is taken into account, which would be a very good thing. But I'm not sure whether compilers really support these features well, or whether there is a real advantage in using them.

sortie · Post by **sortie** » Fri Mar 07, 2014 4:49 pm

It is worth noting that the latest GCC release, 4.8.2, doesn't have C11 atomics; they are introduced in GCC 4.9.0, which will be released in the coming month or so.

h0bby1 · Post by **h0bby1** » Fri Mar 07, 2014 4:54 pm

Even if I have to use some exotic compiler, it can be workable if it does the job well. On Wikipedia there is a list of compilers and how well they support C11, and there seem to be a bunch of fully C11-compliant compilers out there, but I'm not sure how they work, what they're worth, what kind of binary formats they support, and whether they are suitable for kernel development, like being able to switch off references to their runtime and library and all that.

Apparently Visual Studio has no support at all for anything more recent than C89. The latest version supports C++11, but Visual Studio has mostly dropped C support and is oriented toward C++. There are some compilers that apparently support C11 fully, though.

As long as the standard and spec are clear enough that a compiler can make a 100% safe, predictable and well-defined implementation of what it is supposed to implement, it doesn't matter if I have to use a specific compiler, at least for the part of the kernel that deals with the low-level aspects of synchronization.

Linus didn't seem to be happy with many things in it, so I wonder if there is really a need to worry about it, or if it's really usable and safe for kernel development, where all the corner cases of memory synchronization can happen and will be very important.

Also, if I understood correctly what's going on, Linus is trying to drive a hard bargain so that the C11 standard fits what they need for kernel development specifically. He says the kernel will be the major customer of that kind of C feature if they adopt it, so he is pushing hard on the C11 committee, but I still wonder if it's really usable for kernel development.

h0bby1 · Post by **h0bby1** » Fri Mar 07, 2014 6:12 pm

Maybe I'll take a look at this in the meantime: http://mintomic.github.io/

xenos · Post by **xenos** » Sat Mar 08, 2014 2:00 am

I wonder which of these features are already builtins, so that they are supported by a freestanding compiler, and which require runtime support (and are thus unavailable for kernel / OS development unless you implement them yourself). But I have to admit that I haven't dug into C11 / C++11 atomics so far.
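One way to probe this from the source itself is through the feature macros C11 defines. __STDC_NO_ATOMICS__ and __STDC_NO_THREADS__ are set by implementations that lack <stdatomic.h> or <threads.h>, and the ATOMIC_*_LOCK_FREE macros say whether a type is handled lock-free. The sketch below uses hypothetical function names. It assumes that "always lock-free" means the compiler emits inline instructions rather than calls into a support library, which is typical for GCC and Clang but is not something the standard promises.

```c
#include <assert.h>

/* C11: defined when the implementation has no <stdatomic.h> / _Atomic. */
#ifdef __STDC_NO_ATOMICS__
#define HAVE_C11_ATOMICS 0
#else
#define HAVE_C11_ATOMICS 1
#include <stdatomic.h>
#endif

/* Returns 1 if atomic int is always lock-free, 0 otherwise (or if absent). */
static int int_atomics_are_lock_free(void)
{
#if HAVE_C11_ATOMICS
    /* ATOMIC_INT_LOCK_FREE: 0 = never, 1 = sometimes, 2 = always. */
    return ATOMIC_INT_LOCK_FREE == 2;
#else
    return 0;
#endif
}

/* Returns 1 unless the implementation says it has no <threads.h>.
   Unlike the atomics, <threads.h> always needs library and OS support. */
static int c11_threads_advertised(void)
{
#ifdef __STDC_NO_THREADS__
    return 0;
#else
    return 1;
#endif
}

/* Consistency check: without atomics, nothing can be lock-free. */
static void feature_check_example(void)
{
    assert(HAVE_C11_ATOMICS || !int_atomics_are_lock_free());
    assert(c11_threads_advertised() == 0 || c11_threads_advertised() == 1);
}
```

Note that <stdatomic.h> is not in the list of headers C11 requires from a freestanding implementation, so whether it is usable in a kernel build depends on the compiler.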

BTW, I'm developing my kernel in C++11 and already looking forward to GCC 4.9.0

h0bby1 · Post by **h0bby1** » Sat Mar 08, 2014 4:20 am

Do you use the atomics and memory barrier features of C++11 in a multitasking kernel?

For now I only use C and asm for the base part. I could probably switch to C++ at some point, but I also wanted to keep things simple and compatible between compilers, and easy to debug, disassemble and interface with asm, and C is more standard and simpler for certain things.

But as my system is largely based on simple node structures, hash lists in C and so on, I'm sure there are ways to handle many situations in a lockless manner. That would still require memory barriers and atomic types to be safe, since the order in which things are executed matters, and there could be some use for CPU-specific features to handle certain cases.
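A rough sketch of what a lockless node list could look like with C11 atomics: nodes are pushed with a compare-and-swap loop, and a consumer detaches the whole list in one exchange. The node layout and function names are invented for the example. Popping single nodes is deliberately left out, because that needs extra care (the ABA problem) that this sketch does not address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type standing in for a "simple node structure". */
struct node {
    struct node *next;
    int value;
};

/* Head of the list; static storage, so it starts out NULL. */
static _Atomic(struct node *) list_head;

/* Push with a compare-and-swap loop: retry until the head did not change
   between reading it and replacing it. On failure, 'old' is refreshed. */
static void list_push(struct node *n)
{
    struct node *old = atomic_load_explicit(&list_head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &list_head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the whole list in one atomic exchange; the caller then owns it. */
static struct node *list_take_all(void)
{
    return atomic_exchange_explicit(&list_head, NULL, memory_order_acquire);
}

/* Single-threaded usage check: the last node pushed comes out first. */
static void list_example(void)
{
    static struct node a = { NULL, 1 }, b = { NULL, 2 };
    list_push(&a);
    list_push(&b);
    struct node *all = list_take_all();
    struct node *again = list_take_all();
    assert(all == &b && all->next == &a && a.next == NULL);
    assert(again == NULL);
}
```

The release on the push and the acquire on the take are what make the node contents written before the push visible to whoever takes the list.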

The lib I posted above is not that great, btw.

But part of the logic of memory access ordering depends mostly on the compiler and its ability to deal with multithreaded logic. It would be nice if there were C compilers able to manage this without having to rely too much on specific directives or assembler.

Like having something like:

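struct lock
{
   unsigned int current_lock;   /* 0 when free, otherwise the owner's thread id */
   array        queue;          /* threads waiting for the lock */
};

struct lock lock;

/* inside the locking routine: */
unsigned int *current_lock = &lock.current_lock;

if ((*current_lock) != 0)
{
   add_this_thread_to_queue(&lock);
   wait();
}

(*current_lock) = this_thread;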

Logically there is nothing that would prevent the compiler from putting the assignment to (*current_lock) before the wait.

Or cases like that where order can be important. I don't see any way to deal with that other than compiler- or CPU-specific tricks, and if it relies on compiler behavior, that behavior needs to be reliable and predictable.
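For comparison, here is how the same lock word could be handled with C11 atomics. Apart from any reordering, the check and the assignment in the snippet above are two separate steps, so two threads can both see 0 and both take the lock. A compare-and-swap does both in one indivisible step. This is only a sketch under the assumption that thread ids are non-zero; the wait queue is left out, and lock_try_acquire / lock_release are made-up names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same idea as the struct above, but the owner field is atomic. */
struct lock {
    atomic_uint current_lock;   /* 0 = free, otherwise the owner's thread id */
};

/* Test and set in one indivisible step. Acquire ordering stops the compiler
   and the CPU from moving later accesses above a successful lock. */
static bool lock_try_acquire(struct lock *l, unsigned this_thread)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->current_lock, &expected, this_thread,
        memory_order_acquire, memory_order_relaxed);
}

/* Release ordering keeps earlier accesses from sinking below the unlock. */
static void lock_release(struct lock *l)
{
    atomic_store_explicit(&l->current_lock, 0u, memory_order_release);
}

/* Single-threaded usage check. */
static void lock_example(void)
{
    static struct lock l;                     /* zero-initialised: free */
    bool first = lock_try_acquire(&l, 1u);    /* free, so thread 1 gets it */
    bool second = lock_try_acquire(&l, 2u);   /* held, so thread 2 would wait */
    lock_release(&l);
    assert(first && !second);
}
```

A caller would loop on lock_try_acquire and, on failure, queue itself and wait before retrying, so the lock is only ever considered owned after a successful compare-and-swap.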

There can be the same issue when a state variable or pointer needs to be updated atomically and in the right order.
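The pointer case can be sketched with a release store paired with an acquire load: the writer fills in the data first, then publishes the pointer, and a reader that sees the pointer is guaranteed to see the data. The struct and names below are hypothetical, and the example assumes a single writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical state record that is filled in and then published. */
struct state {
    int mode;
    int flags;
};

static struct state storage;
static _Atomic(struct state *) current_state;   /* NULL until published */

/* Writer: fill in the fields first, then publish the pointer.
   The release store keeps the field writes from moving after it. */
static void publish_state(int mode, int flags)
{
    storage.mode = mode;
    storage.flags = flags;
    atomic_store_explicit(&current_state, &storage, memory_order_release);
}

/* Reader: the acquire load pairs with the release store, so a non-NULL
   result points at fully initialised fields. */
static const struct state *read_state(void)
{
    return atomic_load_explicit(&current_state, memory_order_acquire);
}

/* Single-threaded usage check. */
static void state_example(void)
{
    publish_state(1, 2);
    const struct state *s = read_state();
    assert(s != NULL && s->mode == 1 && s->flags == 2);
}
```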

If C++11 compilers can handle this correctly, I can use C++ for some parts, because there doesn't seem to be a lot of enthusiasm about C compilers progressing toward C11, and C++11 seems more likely to be implemented correctly in compilers soon.

I already need to organize everything neatly regarding read/write access to structures, and to arrange some routines so they can work in a lockless manner, at least for some things. But it seems difficult to get a safe implementation without the equivalent of memory barriers and atomic operations, or without doing some parts in assembler with CPU-specific features. If the compiler can be thread-smart and deal with that sort of thing in a standard manner, it would be better.

xenos · Post by **xenos** » Sat Mar 08, 2014 5:11 am

h0bby1 wrote: Do you use the atomics and memory barrier features of C++11 in a multitasking kernel?

Not yet, so far I have my own implementation, but at some point I might switch to the C++11 atomics.

h0bby1 · Post by **h0bby1** » Sat Mar 08, 2014 5:14 am

XenOS wrote:
h0bby1 wrote: Do you use the atomics and memory barrier features of C++11 in a multitasking kernel?
Not yet, so far I have my own implementation, but at some point I might switch to the C++11 atomics.

You implemented these in assembler directly?

What I wonder about is that there seem to be some concerns about how compilers are supposed to handle certain cases with optimization enabled, and there are things a compiler might do that are not suitable for the parts of the code that need to handle multithreaded logic.

xenos · Post by **xenos** » Sat Mar 08, 2014 6:04 am

h0bby1 wrote: You implemented these in assembler directly?

Not completely, just some very basic stuff that I needed - I also used some GCC builtins. But this is subject to change.

http://sourceforge.net/p/xenos/code/HEA ... tomicOps.h
http://sourceforge.net/p/xenos/code/HEA ... tomicOps.h
http://sourceforge.net/p/xenos/code/HEA ... tomicOps.h

h0bby1 · Post by **h0bby1** » Sat Mar 08, 2014 8:11 am

OK, I see. Well, I want to avoid inline assembly, but I'll probably end up coding the atomic operations as assembler routines for the moment.

That leaves the issue of ordering. I'm not sure compilers are supposed to take into account anything done with inline assembler, as it's not part of the language specification, so I'm not sure a compiler is supposed to take fence instructions into account. They are only there to constrain the CPU's out-of-order execution, right? So for this I probably need some tricks to force the compiler to do things in the right order.

Maybe I'll end up implementing a full C++11 runtime environment for higher-level things and recoding the whole multitasking part for user space in C++11, but I need a system to handle this at kernel level, and there doesn't seem to be a very good option for that in C. C++11 could do the job, but I need at least a version that can work at low level for the kernel part.

For now there doesn't seem to be much of a viable solution other than coding atomics in assembler routines and using compiler tricks to handle multithread-related memory access ordering.

xenos · Post by **xenos** » Sun Mar 09, 2014 3:11 am

h0bby1 wrote: That leaves the issue of ordering. I'm not sure compilers are supposed to take into account anything done with inline assembler, as it's not part of the language specification, so I'm not sure a compiler is supposed to take fence instructions into account. They are only there to constrain the CPU's out-of-order execution, right? So for this I probably need some tricks to force the compiler to do things in the right order.

IIRC, the volatile keyword keeps the compiler from reordering this asm block, so that every instruction before the asm block happens before the mfence.
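For reference, with GCC-style inline assembly it is the "memory" clobber, rather than volatile on its own, that tells the compiler not to move memory accesses across the asm statement; volatile mainly keeps the asm from being deleted. A minimal sketch, assuming GCC or Clang (the macro and function names are made up, and the mfence variant is only used on x86-64):

```c
#include <assert.h>
#include <stdatomic.h>

/* GCC/Clang extension: an empty asm with a "memory" clobber. It emits no
   instruction, but the compiler may not move memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Full hardware fence. On x86-64 this is the mfence discussed here, with
   the clobber so it is a compiler barrier too; elsewhere it falls back to
   the portable C11 fence. */
static inline void full_fence(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

static int data;
static int ready;

/* Write the payload, then the flag, with a barrier in between. */
static void produce(int value)
{
    data = value;
    full_fence();           /* data is written before ready */
    ready = 1;
    compiler_barrier();     /* compiler-only: no instruction emitted */
}

/* Single-threaded usage check. */
static void barrier_example(void)
{
    produce(42);
    assert(data == 42 && ready == 1);
}
```

This is the traditional pre-C11 pattern. Plain int variables shared this way are still formally a data race under the C11 memory model, so it relies on compiler behavior rather than on the standard.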

h0bby1 · Post by **h0bby1** » Sun Mar 09, 2014 8:17 am

I guess the volatile attribute can help, but it's not standard behavior? I want to avoid relying too much on non-standard compiler behavior, and it's not very clear to me what the C specification says exactly on this topic.

One way I see is the one explained in the article with Linus, even if this solution makes Linus's hair stand on end (and mine too, a bit). You can always write a statement like:

if (dummy_function_returning_1())
    do_something();

If the dummy function is defined in a different file, so that the compiler can't figure out that it does nothing and eliminate it, it should ensure the code is executed after the call.

That's why I'm attracted to the C11 features: there seems to be potential for the compiler to really understand the multithreaded logic, as all the locks and atomics are part of the C language spec and the compiler can deal with them accordingly. But there don't seem to be many fully implemented C11-compliant compilers.

Because maybe I'm missing something, but with semaphore logic it seems really easy to figure out that anything that comes after a lock must remain inside the block and cannot be reordered to execute outside of the lock/release block. It shouldn't be too hard for the compiler to figure that out, provided it can understand the multithreading instructions. Otherwise it's almost as stupid as executing a statement from inside an if block before the if.

I'm not sure there is any clear documentation on how a C compiler is supposed to take a fence instruction into account during compilation and optimization, or whether the C standard (before C11) even specifies anything related to multithreaded logic. I'd rather avoid relying on non-standard features even if some compilers deal with them correctly, otherwise it's a bit like expecting more from compilers than what they are really supposed to do.

jnc100 · Post by **jnc100** » Sun Mar 09, 2014 3:08 pm

h0bby1 wrote: I guess the volatile attribute can help, but it's not standard behavior? I want to avoid relying too much on non-standard compiler behavior, and it's not very clear to me what the C specification says exactly on this topic.

The whole notion of inline assembly is compiler specific, and not defined in the C standard.

If you accept this, and are happy to stick with gcc (or possibly the Intel compiler), then you may want to look at the __sync and __atomic builtins which are supported in the current gcc (4.8.2). There is a reasonably good documentation there as to which memory fences are implied by each instruction. The obvious benefit over the inline assembly version is that the compiler knows what you are doing at each step and optimizes appropriately around your intent.
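A short sketch of what those builtins look like in use. The variables and wrapper names are invented for the example; the builtins themselves (__atomic_compare_exchange_n, __atomic_store_n, __atomic_fetch_add, __sync_synchronize) are the GCC ones, and they work on ordinary variables without any special type.

```c
#include <assert.h>
#include <stdbool.h>

/* Plain variables: the __atomic builtins work on ordinary types. */
static unsigned owner;          /* 0 = free */
static unsigned long ticks;

/* Claim ownership if nobody holds it. */
static bool try_own(unsigned id)
{
    unsigned expected = 0;
    /* args: object, expected, desired, weak?, success order, failure order */
    return __atomic_compare_exchange_n(&owner, &expected, id, false,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static void disown(void)
{
    __atomic_store_n(&owner, 0u, __ATOMIC_RELEASE);
}

/* Returns the value before the add. */
static unsigned long tick(void)
{
    return __atomic_fetch_add(&ticks, 1ul, __ATOMIC_RELAXED);
}

/* Older __sync family: a full memory barrier. */
static void legacy_full_barrier(void)
{
    __sync_synchronize();
}

/* Single-threaded usage check. */
static void builtin_example(void)
{
    bool first = try_own(1u);
    bool second = try_own(2u);      /* fails: already owned by 1 */
    disown();
    unsigned long t0 = tick();
    unsigned long t1 = tick();
    legacy_full_barrier();
    assert(first && !second);
    assert(t1 == t0 + 1ul);
}
```

The __ATOMIC_* constants correspond to the C11/C++11 memory orders, so code written this way maps fairly directly onto <stdatomic.h> later.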

Regards,
John.

h0bby1 · Post by **h0bby1** » Sun Mar 09, 2014 3:37 pm

http://docwiki.embarcadero.com/RADStudi ... es_(BCC64) This compiler, based on Clang, seems to support a certain number of things. It's a C++ compiler though.

The Intel compiler seems to support a certain number of things too. I'm not sure how true it is that the Intel compiler does not guarantee anything for AMD processors. Agner Fog talks about this ( http://www.agner.org/optimize/blog/read.php?i=49 ), and I'm not sure if there is really an issue with it. Intel also has its own libs for parallelization and vector math; I'll take a look at what they have and whether it can be used for kernel development.

The latest versions of Visual Studio are supposed to have support for atomics and the C++11 features related to multithreading.

Maybe even OpenMP can give some kind of solution to this. OpenMP is a lot about compiler directives, but I've never looked too deeply into it yet.

http://openmp.org/mp-documents/OpenMP-4.0-C.pdf

atomic [2.12.6] [2.8.5]
Ensures that a specific storage location is accessed atomically. [seq_cst] is 4.0.

    #pragma omp atomic [read | write | update | capture] [seq_cst]
        expression-stmt
    #pragma omp atomic capture [seq_cst]
        structured-block

where expression-stmt may be one of:

| if clause is... | expression-stmt: |
| --- | --- |
| read | v = x; |
| write | x = expr; |
| update or is not present | x++; x--; ++x; --x; x binop= expr; x = x binop expr; x = expr binop x; |
| capture | v = x++; v = x--; v = ++x; v = --x; v = x binop= expr; v = x = x binop expr; v = x = expr binop x; |

and where structured-block may be one of the following forms:

    {v = x; x binop= expr;}      {x binop= expr; v = x;}
    {v = x; x = x binop expr;}   {v = x; x = expr binop x;}
    {x = x binop expr; v = x;}   {x = expr binop x; v = x;}
    {v = x; x = expr;}           {v = x; x++;}
    {v = x; ++x;}                {++x; v = x;}
    {x++; v = x;}                {v = x; x--;}
    {v = x; --x;}                {--x; v = x;}
    {x--; v = x;}

flush [2.12.7] [2.8.6]
Executes the OpenMP flush operation, which makes a thread’s temporary view of memory consistent with memory, and enforces an order on the memory operations of the variables.

    #pragma omp flush [(list)]

ordered [2.12.8] [2.8.7]
Specifies a structured block in a loop region that will be executed in the order of the loop iterations.

    #pragma omp ordered
        structured-block
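To make the directives above a bit more concrete, here is a small sketch of atomic update, atomic capture and flush on ordinary variables. The variable and function names are made up. Built without OpenMP support the pragmas are simply ignored, so the code still runs as plain C but is then not atomic. Whether these directives can be used in a freestanding kernel build, without the OpenMP runtime, is not something this sketch settles.

```c
#include <assert.h>

static int shared_counter;
static int published;

/* Atomic read-modify-write of one storage location. */
static void count_event(void)
{
    #pragma omp atomic update
    shared_counter++;
}

/* Atomic capture: update and read the old value in one indivisible step. */
static int next_ticket(void)
{
    int ticket;
    #pragma omp atomic capture
    ticket = shared_counter++;
    return ticket;
}

/* Store a value, then make this thread's view of it consistent with memory. */
static void publish(int value)
{
    published = value;
    #pragma omp flush(published)
}

/* Single-threaded usage check. */
static void openmp_example(void)
{
    int before = shared_counter;
    count_event();
    int ticket = next_ticket();     /* sees the value after count_event() */
    publish(ticket);
    assert(ticket == before + 1);
    assert(shared_counter == before + 2);
    assert(published == ticket);
}
```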

h0bby1 · Post by **h0bby1** » Mon Mar 10, 2014 2:43 am

volatile could probably help, but as far as I know, volatile just tells the compiler that changes to the variable must be written back to memory as soon as possible; it doesn't say much about ordering. It's "often" said that volatile also disables optimization on the variable, so I guess that includes reordering, but I'm not sure it's really supposed to be used as a memory barrier, as it shouldn't prevent the compiler from reordering non-volatile accesses around the access to the volatile variable, unless all variables are declared volatile. I don't trust it that much.
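For contrast, this is roughly what an explicit barrier looks like in C11 instead of leaning on volatile: the flag is an atomic accessed with relaxed ordering, and the ordering against the ordinary data comes from atomic_thread_fence. The names are hypothetical and the sketch assumes one writer and one reader.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;                 /* ordinary, non-volatile data */
static atomic_int payload_ready;    /* flag; relaxed accesses plus fences */

static void send_payload(int value)
{
    payload = value;
    /* Release fence: the write to payload may not be reordered past the
       atomic store that follows. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&payload_ready, 1, memory_order_relaxed);
}

/* Returns 1 and fills *out if a value was published, else 0. */
static int receive_payload(int *out)
{
    if (!atomic_load_explicit(&payload_ready, memory_order_relaxed))
        return 0;
    /* Acquire fence: the read of payload may not be reordered before the
       flag load above. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}

/* Single-threaded usage check. */
static void fence_example(void)
{
    int got = 0;
    send_payload(7);
    int ok = receive_payload(&got);
    assert(ok == 1 && got == 7);
}
```

Unlike volatile, the fences constrain the surrounding non-atomic accesses too, for both the compiler and the CPU.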

I guess I'm going to give OpenMP a try. It seems the best solution so far: it's supported by most major compilers and apparently has all the directives I need. The compiler should do all the work of inlining what needs to be inlined and should compile the code with adequate ordering following the multithreaded logic. It doesn't involve anything fancy, and the standard and its functions seem to be properly defined, in a way that every compiler supporting OpenMP should recognize in a predictable manner.

OSDev.org
