Languages for standalone

PeterX
Member
Posts: 590
Joined: Fri Nov 22, 2019 5:46 am

Languages for standalone

Post by PeterX »

Which programming languages can compile to a standalone program (= without needing a runtime library/object file)?

- C can.
- Forth can, I guess.
- C++ can't because of OOP initialization.
- Lisp/Scheme can probably if I use a compiler and don't use sophisticated list operations (like eval).
- Rust can't?
- Can Go?
- Any other language?

Greetings
Peter
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: Languages for standalone

Post by bzt »

I guess all compiled languages can if you statically link their runtime in. You can do that with C++, Rust, Pascal, Ada (gnat) etc. The big advantage of C here is that it was created for freestanding mode in the first place (libc and POSIX were added later). Other languages were designed in a way that they expect support from the OS to some extent.

C++ doesn't have a "freestanding" mode per se, but if your code avoids OOP (no exceptions, no new/delete calls, no streams etc.), you can get away with it. Rust is a bit trickier: in theory you can use it in standard-library-free mode, but that is problematic TBH (it is much easier to get the stdlib statically compiled in than to write stdlib-free Rust code). With Ada there's no way to eliminate the runtime (the language has some constructs that require runtime support, like exceptions, generics, async functions and rendezvous points etc.).

Not sure if Assembly counts for you (as it's not so much a language as a one-to-one translation of instructions), but if it does, then you can also count in many HLA-capable assemblers (each with its own dialect and macro sets).

Cheers,
bzt
nullplan
Member
Posts: 1801
Joined: Wed Aug 30, 2017 8:24 am

Re: Languages for standalone

Post by nullplan »

PeterX wrote:Which programming languages can compile to a standalone program (= without needing a runtime library/object file)?
All languages, absolutely all of them, require some kind of support. Some of that support can be implemented in the bootloader (that would be load-time support), but still some things need to be done at run time. And it depends on your compiler and version what exactly that entails. For example, GCC compiled C code can make calls to libgcc functions, and calls to memcpy, memmove, or memcmp. So something must implement those. And at least the mem* functions aren't going to fall from the sky.
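
A minimal sketch of what "something must implement those" can look like in practice. The k_ prefix is a hypothetical naming choice made so the block can be compiled and tried in a hosted environment; in a real freestanding build the functions would carry the standard names (memcpy, memmove, memset, memcmp) so that the calls emitted by the compiler resolve at link time.

```c
#include <stddef.h>  /* size_t; available in freestanding mode */

/* Hypothetical k_ prefix: a freestanding build would name these
   memcpy, memmove, memset and memcmp. */

/* Copy n bytes from src to dst; the regions must not overlap. */
void *k_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Copy n bytes from src to dst; the regions may overlap.
   Comparing d and s assumes a flat address space, which is the usual
   situation in kernel code but goes beyond what ISO C guarantees. */
void *k_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (d < s) {
        while (n--)
            *d++ = *s++;          /* forward copy */
    } else {
        d += n;
        s += n;
        while (n--)
            *--d = *--s;          /* backward copy */
    }
    return dst;
}

/* Fill n bytes at dst with the byte value c. */
void *k_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}

/* Compare n bytes; returns <0, 0 or >0 like memcmp. */
int k_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    for (; n--; ++p, ++q)
        if (*p != *q)
            return *p < *q ? -1 : 1;
    return 0;
}
```

One caveat when the functions do carry the standard names: an optimizing GCC may recognize such a loop as a memcpy or memset pattern and replace it with a call to the very function being defined. Building that file with -fno-tree-loop-distribute-patterns is one commonly used way to avoid this.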

Depending on your choice of language, the run-time support becomes larger or smaller. C has a pretty small amount, especially on AMD64 if you avoid FPU stuff (which you should, in kernel mode), because in that case libgcc contains almost nothing.
PeterX wrote:- C++ can't because of OOP initialization.
A thousand issues with using C++ for a kernel, and this is the one you pick? Initialization can be implemented rather easily. You only need to figure out when a good time for the initialization would be (for instance, before setting up paging is probably not a good time), and then run all constructors in order.
bzt wrote:C++ doesn't have a "freestanding" mode per se,
False, it does too have one. Note that that one requires a lot more from you in terms of run-time support than C's freestanding mode: <new>, <typeinfo>, and <exception> are all part of the set of required headers for freestanding mode.
bzt wrote: but if your code avoids OOP (no exceptions, no new/delete calls, no streams etc.), you can get away with it.
All of these can be implemented. For exceptions, there is an ABI that says how to do it. new and delete can be overridden even by a hosted application, never mind a freestanding one, and streams are just a library you can implement. I personally don't see the point, but you can. If you are willing to stray outside the bounds of the standard, then you can use some GCC options to prevent the use of exceptions and RTTI, to reduce the amount needed for the run-time library.
Carpe diem!
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: Languages for standalone

Post by Solar »

Kindly refer to the Languages page of the OSDev Wiki which elaborates on exactly this issue, and has links to various language-specific subpages.
Every good solution is obvious once you've found it.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: Languages for standalone

Post by bzt »

nullplan wrote:All languages, absolutely all of them, require some kind of support.
Nope, C language doesn't need anything. All of its run-time is pushed into library functions. C language simply doesn't have any complex constructs (like strings, exceptions, streams etc). Comparing strings for example is just a function call like any other, has no language specific syntax. Allocating memory likewise, just a function call like any other, no language specific syntax.

The only true dependency freestanding C code has is a stack and a zeroed-out bss; however, those are also required by almost every executable, no matter which language it was compiled from.
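
A minimal sketch of the bss part, assuming the startup code is responsible for clearing it. The function name zero_region and the linker symbols __bss_start and __bss_end are hypothetical; the actual symbol names depend on the linker script in use.

```c
#include <stdint.h>  /* uint8_t; available in freestanding mode */

/* Zero every byte in [start, end).
   The volatile pointer keeps the compiler from turning the loop into a
   call to memset(), which might not exist yet this early in startup. */
void zero_region(uint8_t *start, uint8_t *end)
{
    for (volatile uint8_t *p = start; p != end; ++p)
        *p = 0;
}

/* In a kernel entry point this might be used as follows, assuming the
   linker script exports the (hypothetical) symbols below:

   extern uint8_t __bss_start[], __bss_end[];
   zero_region(__bss_start, __bss_end);
*/
```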
nullplan wrote:For example, GCC compiled C code can make calls to libgcc functions, and calls to memcpy, memmove, or memcmp. So something must implement those. And at least the mem* functions aren't going to fall from the sky.
In freestanding mode without optimization, no C compiler will make such calls (neither gcc, nor Clang, nor TCC, nor MSVC etc.). Btw, memcmp and friends appeared first in AT&T System V UNIX, and the C language is much older than that.

It is true that the gcc optimizer might emit mem* calls, but that's compiler-specific. Furthermore, gcc will automatically inject __builtin_mem* functions if you don't link with libgcc, so they literally "fall from the sky" with gcc. But this is again a compiler-specific thing; Clang, for example, doesn't do that, nor does it have a libgcc library, meaning these aren't language features. (Not to mention that if you compile for UEFI, for example, then the Clang optimizer will automatically generate CompareMem / ZeroMem calls and not memcmp / memset; so even the optimizer is environment-specific and not language-specific.)
nullplan wrote:
bzt wrote:C++ doesn't have a "freestanding" mode per se,
False, it does too have one.
The page you linked also says that C++ freestanding mode is implementation-specific. A certain compiler might or might not implement it. It might inject the required functions transparently, or it might require a statically linked run-time. The point is, unlike C, the C++ language has features that won't automatically work unless the programmer provides run-time support. (E.g., memory allocation is part of the language.)
nullplan wrote:Note that that one requires a lot more from you in terms of run-time support than C's freestanding mode: <new>, <typeinfo>, and <exception> are all part of the set of required headers for freestanding mode.
Including headers will avoid syntax errors, but they won't give you run-time support. That's another beast to feed.
nullplan wrote:
bzt wrote: but if your code avoids OOP (no exceptions, no new/delete calls, no streams etc.), you can get away with it.
All of these can be implemented.
They CAN be, but they are typically not provided by the compiler. Of course everything can be done if you statically link or implement the run-time support into your executable.

Cheers,
bzt
sj95126
Member
Posts: 151
Joined: Tue Aug 11, 2020 12:14 pm

Re: Languages for standalone

Post by sj95126 »

bzt wrote:The only true dependency a freestanding C code has is a stack and a zeroed-out bss
Technically, you don't even need a bss, if you have no uninitialized global data. For a long time, my (admittedly small) kernel had no .bss section in its ELF header.

To be *really* pedantic, you could conceivably compile simple C code into a standalone flat binary and never use a stack either. You'd have to implement a custom ABI but it's possible.
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Languages for standalone

Post by nexos »

GCC programs require libgcc.
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
sj95126
Member
Posts: 151
Joined: Tue Aug 11, 2020 12:14 pm

Re: Languages for standalone

Post by sj95126 »

nexos wrote:GCC programs require libgcc.
My kernel is built with gcc and doesn't use libgcc.
thewrongchristian
Member
Posts: 426
Joined: Tue Apr 03, 2018 2:44 am

Re: Languages for standalone

Post by thewrongchristian »

sj95126 wrote:
nexos wrote:GCC programs require libgcc.
My kernel is built with gcc and doesn't use libgcc.
https://wiki.osdev.org/Libgcc#I_link_wi ... changes.3F
sj95126
Member
Posts: 151
Joined: Tue Aug 11, 2020 12:14 pm

Re: Languages for standalone

Post by sj95126 »

thewrongchristian wrote:
sj95126 wrote:
nexos wrote:GCC programs require libgcc.
My kernel is built with gcc and doesn't use libgcc.
https://wiki.osdev.org/Libgcc#I_link_wi ... changes.3F
Seeing as how I moved libgcc.a out of the cross-compiler directory structure, it'd be awfully hard for it to use it without me knowing.
Octocontrabass
Member
Posts: 5588
Joined: Mon Mar 25, 2013 7:01 pm

Re: Languages for standalone

Post by Octocontrabass »

bzt wrote:In freestanding mode without optimization, no C compiler will make such calls (neither gcc, nor Clang, nor TCC, nor MSVC etc.).
Here's GCC making a call to the libgcc function __divdi3() in freestanding mode without optimization.

Here's Clang making a call to memset() in freestanding mode without optimization.
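
The linked examples are not reproduced here. As a rough sketch of the kind of source that tends to trigger such calls: the first function assumes a 32-bit x86 target (for example GCC with -m32), where 64-bit signed division is normally lowered to a call to __divdi3; the second assumes Clang, which commonly lowers zero-initialization of a large local aggregate to a memset() call. The function and struct names are made up for illustration, and the exact output depends on compiler version and target.

```c
/* 64-bit signed division. On a 32-bit x86 target there is no single
   instruction for this, so GCC typically emits a call to the libgcc
   helper __divdi3, even without optimization. */
long long div64(long long a, long long b)
{
    return a / b;
}

/* A local aggregate large enough that the compiler is unlikely to
   clear it with a handful of inline stores. */
struct big {
    char bytes[256];
};

/* Zero-initializing the struct: Clang commonly turns the initializer
   into a call to memset(), even without optimization. */
int first_byte_of_zeroed(void)
{
    struct big b = {0};
    return b.bytes[0];
}
```

Compiling this with -ffreestanding -O0 -S and searching the assembly for "call" is a quick way to check what a given toolchain actually emits.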

Compiler Explorer doesn't seem to support TCC. I'm not familiar enough with MSVC to know if it automatically links against a library for the function calls it emits.
bzt wrote:Furthermore gcc will automatically inject __builtin_mem* functions if you don't link with libgcc, so they literally "fall from the sky" with gcc.
False. GCC will only emit inline code for the __builtin_mem*() functions in cases where the optimizer thinks doing so will be better than emitting a mem*() function call. This has nothing to do with linking against libgcc. You must provide implementations for the cases where the compiler chooses to emit a function call instead of inline code. (Try compiling with -Os, which tells the optimizer to prefer function calls since they're usually smaller than inline code.)

Clang does the same thing, but its optimizer will sometimes make different choices compared to GCC.
bzt wrote:But this is again a compiler specific thing, Clang for example doesn't do that, nor does it have a libgcc library, meaning these aren't language features.
You're correct that it's compiler-specific, but this is a bad example. Clang has inline implementations of the __builtin_mem*() functions, just like GCC, and emits calls to its support library, just like GCC. From what I understand, Clang typically uses compiler-rt instead of libgcc, and always links against it even in freestanding mode so you won't notice that it's a separate library unless it's missing.
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Languages for standalone

Post by nexos »

Overall, C itself appears to be a freestanding language with no dependencies. With GCC, you need libgcc, and probably a couple of other functions the optimizer will make calls to. BSS should also be cleared. But the loader will do that (hopefully, unless it's my first loader and I didn't think it was needed until a variable initialized to zero in a program contained garbage :? ).
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: Languages for standalone

Post by Solar »

Hosted C needs:
  • Setup of the standard input, standard output, and standard error file streams.
  • Setup of the "C" locale (for ctype.h and time.h functions).
  • Arrays of function pointers to be used for registering functions via atexit() and at_quick_exit().
  • Initialization of all objects with static duration with their respective init values. (Neither .bss nor .data nor .rodata is mentioned in the C standard.)
  • Setup of argc, argv in whatever way main() will expect those parameters on the platform. (Neither "stack" nor "heap" is mentioned in the C standard.)
  • Some way for getenv() to read environment variables. (POSIX handles this via a third parameter to main(), which is why I mention it separately from the library/kernel interfaces below.)
  • Jump to main().
  • On return from main, calling any functions registered via atexit(). (Functions registered via at_quick_exit() are called by quick_exit() instead.)
  • Flushing and closing of all open streams.
  • Delivering the return value of main() to the calling environment.
  • Of course, the backend functionality on which those library features rely and that cannot itself be implemented without external support: fopen(), fclose(), fputc(), fgetc(), rename(), remove(), fseek(), time(), system(), ...
The usual mechanic is for the loader to call _start(), which does the setup, calls main(), and handles the wind-down after main() returns.
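
A rough sketch of the registration table and the wind-down step from the list above. All names (my_atexit, run_exit_handlers, run_program, and the demo_ functions) are hypothetical, and stream setup, locale setup and environment handling are left out entirely; a real _start() is platform-specific and usually partly written in assembly.

```c
#define MAX_EXIT_HANDLERS 32  /* the C standard requires room for at least 32 */

typedef void (*exit_fn)(void);

static exit_fn exit_handlers[MAX_EXIT_HANDLERS];
static int exit_handler_count;

/* Hypothetical stand-in for atexit(): 0 on success, nonzero if the table is full. */
int my_atexit(exit_fn fn)
{
    if (exit_handler_count == MAX_EXIT_HANDLERS)
        return -1;
    exit_handlers[exit_handler_count++] = fn;
    return 0;
}

/* Wind-down step: call the handlers in reverse order of registration. */
void run_exit_handlers(void)
{
    while (exit_handler_count > 0)
        exit_handlers[--exit_handler_count]();
}

/* Hypothetical core of a startup routine: call the program's main,
   run the registered handlers, and hand the status back to the caller. */
int run_program(int (*program_main)(int, char **), int argc, char **argv)
{
    int status = program_main(argc, argv);
    run_exit_handlers();
    return status;
}

/* Demo program: registers two handlers that record the order they ran in. */
static char demo_log[4];
static int demo_len;

static void demo_a(void) { demo_log[demo_len++] = 'a'; }
static void demo_b(void) { demo_log[demo_len++] = 'b'; }

static int demo_main(int argc, char **argv)
{
    (void)argv;
    my_atexit(demo_a);
    my_atexit(demo_b);
    return argc;
}
```

Calling run_program(demo_main, 3, 0) returns 3 and leaves "ba" in demo_log, since the handler registered last runs first.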

Freestanding C does not need to provide any library facilities beyond <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h> (which are, incidentally, those headers that only declare constants and macros, but no functions). That does away with most of the above requirements, unless of course your freestanding environment offers such support. Objects with static duration still need to be initialized. The function called at program startup is implementation-defined, as is the effect of program termination.

C++ requires some mechanism to call constructors of objects with static duration, which is pretty easily solved with a bit of link script and two lines of plumbing in _start() (or by not having constructed objects with static duration). If you settle for a subset of C++ without exceptions and RTTI, you're done (and neither exceptions nor RTTI are of much use in kernel space anyway).
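
A sketch of the "two lines of plumbing", written in C since that is typically the language of the startup code. It assumes the toolchain collects pointers to the constructor functions into a table (commonly the .init_array section with GNU-style toolchains) and that the link script brackets the table with two symbols; the symbol names in the comment are an assumption and must match whatever the link script actually defines. The demo_ names are hypothetical stand-ins for the linker-provided table.

```c
typedef void (*ctor_fn)(void);

/* Call every function pointer in [begin, end), in order. */
void run_constructors(ctor_fn *begin, ctor_fn *end)
{
    for (ctor_fn *f = begin; f != end; ++f)
        (*f)();
}

/* In _start() this might be used as follows, assuming the link script
   defines these two symbols around the constructor table:

   extern ctor_fn __init_array_start[], __init_array_end[];
   run_constructors(__init_array_start, __init_array_end);
*/

/* Demo table standing in for the linker-provided one. */
static int demo_order[2];
static int demo_count;

static void demo_ctor_one(void) { demo_order[demo_count++] = 1; }
static void demo_ctor_two(void) { demo_order[demo_count++] = 2; }

static ctor_fn demo_table[] = { demo_ctor_one, demo_ctor_two };
```

Taking the table bounds as parameters keeps the loop itself independent of any particular link script, which also makes it easy to try out in a hosted build.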

The rest (like those libgcc dependencies you are talking about) are an issue of GCC's implementation, not of the language.
Every good solution is obvious once you've found it.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: Languages for standalone

Post by bzt »

Octocontrabass wrote:False. GCC will only emit inline code for the __builtin_mem*() functions in cases where the optimizer thinks doing so will be better than emitting a mem*() function call.
Octocontrabass wrote:Clang has inline implementations of the __builtin_mem*() functions, just like GCC
I have a different experience. Look, here's an example, where it wasn't the optimizer that emitted the memcmp call. Whether I used memcmp() or __builtin_memcmp(), I could compile this code with gcc, but not with Clang (both in freestanding mode). The solution was to use __builtin_memcmp() with gcc, and implement memcmp() with Clang.

I admit, this was with older versions (about two years ago); both gcc and Clang could have changed since. I can imagine, for example, that __builtin_memcmp() has been added to Clang since.
sj95126 wrote:My kernel is built with gcc and doesn't use libgcc.
Mine neither. I've deliberately eliminated all compiler-specific libraries (wasn't easy, but possible). Now I can compile my OS with both gcc and Clang as-is (and possibly with many other ANSI C compilers too).
sj95126 wrote:Technically, you don't even need a bss, if you have no uninitialized global data.
Yes, that's true. I was assuming a typical freestanding code will need a bss, but true, you can do without.
sj95126 wrote:To be *really* pedantic, you could conceivably compile simple C code into a standalone flat binary and never use a stack either.
On the other hand, I don't think this is possible (not on all architectures, that is). Regardless of the ABI, on x86 some CPU instructions need the stack (like "call" or "ret" for example), and I don't think you can convince a C compiler not to use such instructions.

Cheers,
bzt
Octocontrabass
Member
Posts: 5588
Joined: Mon Mar 25, 2013 7:01 pm

Re: Languages for standalone

Post by Octocontrabass »

bzt wrote:I have a different experience. Look, here's an example, where it wasn't the optimizer that emitted the memcmp call. Whether I used memcmp() or __builtin_memcmp(), I could compile this code with gcc, but not with Clang (both in freestanding mode). The solution was to use __builtin_memcmp() with gcc, and implement memcmp() with Clang.
Replace -O2 with -Os and GCC emits calls to memcmp() too.
bzt wrote:Mine neither. I've deliberately eliminated all compiler-specific libraries (wasn't easy, but possible). Now I can compile my OS with both gcc and Clang as-is (and possibly with many other ANSI C compilers too).
How do you tell Clang to not link against libgcc or compiler-rt?