turdus wrote: You totally does not understand
It's "do", and the feeling is mutual.
Why are ASM hobby OS more successful than other languages?
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
Re: Why are ASM hobby OS more successful than other language
Unnamed problems? I've named the null-pointer problem several times, but off the top of my head there's also array bounds checking, null-terminated strings, dangling pointers, uninitialized values, undefined behavior, array/function pointer decay, implicit type conversions, poor handling of co/contravariant pointers, and a string-based macro-system.
I'm not saying C makes things Unix-like, or that it makes these problems unsolvable, or even that you shouldn't use C. I'm saying C embodies a lot of these problems, and that Unix is a kind of local maximum in operating system design, so "to an extent," using C means you inherit the problems of that C/Unix design.
There are languages that solve these problems. Individually they don't solve all of them, or aren't well suited to systems programming, or don't have support in the areas you mention. I agree that mainstream acceptance is important, but that doesn't mean people should "turn a cold shoulder" on things that aren't there (yet).
Re: Why are ASM hobby OS more successful than other language
...aaaaaand didn't add anything to what was said before.
There's also the problem of world hunger. I blame martians. We should all pray.
(Zero-terminated strings? Really? I had no idea! Of course, you are right, how come we didn't see the superiority of the alternative you suggested right away?)
Every good solution is obvious once you've found it.
Re: Why are ASM hobby OS more successful than other language
First all the problems were unnamed, but then when I named them I didn't add anything?
Solutions are pretty widely available, as I said, but here are some possibilities anyway:
null pointers - don't include null as a valid value in the default pointer type; replace their use for optional types with an Option type, replace their use as sentinels with array lengths (see below), etc.
array bounds checking - specify array size as part of its type, don't let arrays decay into pointers so the type is kept across calls/etc., let compiler check for possible overruns and only force runtime checks when necessary (yes, this is possible).
null-terminated strings - strings are arrays of chars, so their length gets stored with their type. It can thus be reified in the program without looping over the string, and there are no more security holes by default with strcpy et al.
uninitialized values/dangling pointers - use typestate to make accessing an uninitialized value, an already-freed pointer, or an otherwise non-useful value illegal.
undefined behavior - a lot of undefined behavior should instead be illegal: violating sequence points is trivial for the compiler to detect, and using undefined values or freed pointers is solvable (see above).
implicit type conversions - make far, far fewer of them so that accidentally losing information or causing undefined behavior is impossible.
co/contravariance - this is easier to avoid in C, but this code should be illegal and is not even given a warning (again, by default):
Code:
struct A { int a; };
struct B : public A { int b; };
A *as = new B[10]; // accepted silently: as[1] now strides by sizeof(A), not sizeof(B)
string-based macro system - Lisp has had real macros since the 70s, and nasm has a good macro system even for assembly. Why are we still pasting strings together in C? Why not bits of AST that are immune to e.g. precedence issues that force parenthesizing every argument?
A good language to look at that attempts some of these in a systems programming context is Rust.
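For comparison, a rough C++17 approximation of two items in the list above (array size kept in the type, strings that carry their own length) can be built from library types rather than language rules. This is only a sketch: `copy_bounded` is a hypothetical helper, not a standard function, and nothing stops a caller from bypassing it with `strcpy`.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: copies src into a fixed-size buffer whose size N is
// part of the parameter type, so the array does not decay to a bare pointer.
// The source carries its own length (std::string_view), so there is no scan
// for '\0' and nothing is ever written past dst[N - 1].
template <std::size_t N>
bool copy_bounded(std::string_view src, char (&dst)[N]) {
    static_assert(N > 0, "destination needs room for the terminator");
    if (src.size() >= N) {
        return false;  // would overflow: refuse instead of truncating silently
    }
    std::memcpy(dst, src.data(), src.size());
    dst[src.size()] = '\0';  // keep a terminator for C interfaces
    return true;
}
```

Passing a `char*` instead of an actual array fails to compile, which is the "don't let arrays decay" point in miniature.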
Re: Why are ASM hobby OS more successful than other language
Rusky wrote: First all the problems were unnamed, but then when I named them I didn't add anything?
Now this post actually adds something tangible.
What follows is not an attempt to say "there is no problem", but an attempt to show that "it's not that much of a problem". I will use C++, simply because it's a) a language I am familiar with, b) a systems programming language, and c) it addresses most of the issues you raise.
Rusky wrote: null pointers - don't include null as a valid value in the default pointer type; replace their use for optional types with an Option type, replace their use as sentinels with array lengths (see below), etc.
As I said before, a NullPointerException goes a long way towards handling what was fatal to any C app. boost::optional exists, std::vector and std::string exist.
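A minimal sketch of the Option-type idea using C++17's `std::optional` (which was standardized from Boost.Optional). `find_table` and its list-of-signatures argument are hypothetical, for illustration only. The "no result" case is visible in the return type, but the check is not fully enforced: `*opt` on an empty optional is still undefined behavior, while `.value()` throws `std::bad_optional_access`.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Returns the position of the first table whose signature matches, or
// std::nullopt instead of a null/sentinel value. The caller has to unwrap
// the result explicitly before using it as an index.
std::optional<std::size_t> find_table(const std::vector<std::string_view>& signatures,
                                      std::string_view wanted) {
    for (std::size_t i = 0; i < signatures.size(); ++i) {
        if (signatures[i] == wanted) {
            return i;
        }
    }
    return std::nullopt;
}
```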
Rusky wrote: array bounds checking - specify array size as part of its type, don't let arrays decay into pointers so the type is kept across calls/etc.,
std::vector.
Rusky wrote: ...let compiler check for possible overruns and only force runtime checks when necessary (yes, this is possible).
Point to you. I'm not sure how good C++ compiler warnings are in this regard...
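Standard C++ does offer a limited split between compile-time and run-time index checks, although the programmer chooses the checked form rather than the compiler proving an access safe. A sketch, assuming a fixed four-entry `std::array`; the function names are hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Index known at compile time: std::get<I> on a std::array is ill-formed
// when I >= N, so an out-of-range constant index is rejected by the compiler.
inline int second_entry(const std::array<int, 4>& a) {
    return std::get<1>(a);  // std::get<4>(a) would not compile
}

// Index known only at run time: at() checks it and throws
// std::out_of_range instead of reading past the end.
inline int entry_at(const std::array<int, 4>& a, std::size_t i) {
    return a.at(i);
}
```

Plain `operator[]` on `std::array` or `std::vector` remains unchecked, so the safety still depends on which call the programmer writes.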
Rusky wrote: null-terminated strings - strings are arrays of chars, so their length gets stored with their type. It can thus be reified in the program without looping over the string, and there are no more security holes by default with strcpy et al.
std::string.
Rusky wrote: uninitialized values/dangling pointers - use typestate to make accessing an uninitialized value, an already-freed pointer, or otherwise non-useful value illegal.
-Wall -Wextra -Werror goes a looong way towards this end.
Rusky wrote: undefined behavior - a lot of undefined behavior should instead be illegal: violating sequence points is trivial for the compiler to detect, using undefined values or freed pointers is solvable (see above).
-Wall -Wextra -Werror plus any of std::*_ptr again don't solve the problem, but they take much of the edge off it.
Rusky wrote: implicit type conversions - make far, far fewer of them so that accidentally losing information or causing undefined behavior is impossible.
Here, C++ wins over C without moving a finger.
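In C++11 and later, list initialization such as `char c{a};` rejects a narrowing conversion (the standard requires a diagnostic), but a plain assignment `c = a;` still truncates silently. Where the value is only known at run time, a checked conversion can be written by hand. The sketch below is hypothetical, loosely modeled on `gsl::narrow` from the C++ Core Guidelines support library, and is not that library's code:

```cpp
#include <stdexcept>
#include <type_traits>

// True if v is below zero; unsigned types are never negative.
template <typename T>
constexpr bool is_negative(T v) {
    if constexpr (std::is_signed_v<T>) {
        return v < T{};
    } else {
        (void)v;
        return false;
    }
}

// Hypothetical checked narrowing: converts an integer and throws if the value
// does not survive the round trip or its sign flips, instead of silently
// truncating the way `c = a;` does.
template <typename To, typename From>
To narrow_checked(From value) {
    static_assert(std::is_integral_v<To> && std::is_integral_v<From>,
                  "integral types only");
    const To result = static_cast<To>(value);
    if (static_cast<From>(result) != value || is_negative(result) != is_negative(value)) {
        throw std::range_error("narrowing conversion lost information");
    }
    return result;
}
```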
Rusky wrote: co/contravariance - this is easier to avoid in C, but this code should be illegal and is not even given a warning (again, by default)...
ACK. While it is easy to avoid the issue, the fact that the compiler turns a blind eye is bad.
Rusky wrote: string-based macro system - Lisp has had real macros since the 70s, and nasm has a good macro system even for assembly. Why are we still pasting strings together in C? Why not bits of AST that are immune to e.g. precedence issues that force parenthesizing every argument?
Basically, you are free to use whatever macro system you want, even in C. You don't have to rely on the C preprocessor.
As for macros in C++ code, they fall into the same category as arrays in C++ code: You do, you die. At least as long as I can get away with tossing people out of the window.
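To illustrate the precedence problem with text-pasting macros raised in this thread, and the usual C++ alternative of a `constexpr` function, here is a small sketch (the macro and function names are hypothetical):

```cpp
// String-pasting macro: the argument is substituted as raw text, so
// BAD_SQUARE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5, not 9.
#define BAD_SQUARE(x) x * x

// constexpr function: the argument is evaluated once, as a value, and the
// call can still be folded at compile time.
constexpr int square(int x) { return x * x; }

static_assert(square(1 + 2) == 9, "evaluated at compile time");
```

The macro only becomes correct once both its parameter and its body are parenthesized; the function needs no such care.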
Rusky wrote: A good language to look at that attempts some of these in a systems programming context is Rust.
And here we get right down to the point: I admit that C/C++ have a couple of weak spots. Some of them got better over time, some remain firmly rooted in their 197x / 198x heritage.
But they are here, they are known, they are supported. Every programmer interested in systems programming knows them, or will happily learn them, as this knowledge significantly improves his CV. The pitfalls are well known and easily avoided by the careful programmer.
Rust? Looks nice at first glance. I'll come back in five years to find out if it has had its first release, and what other coders have to say about it.
All this is not meant to flame or to be derogatory. Yes, there are some issues in C and its descendants. But those descendants have evolved, which shouldn't be ignored for the sake of making a point, and their ubiquity is a major factor. There's a reason why virtually all tutorials on OS development are in C or C++...
Re: Why are ASM hobby OS more successful than other language
Hi,
Rusky wrote: First all the problems were unnamed, but then when I named them I didn't add anything?
Solutions are pretty widely available, as I said, but here are some possibilities anyway:
Let's have a practical example.
Let's assume that the boot loader is responsible for finding ACPI tables and telling the kernel the address of the RSDT. During kernel initialisation you want to parse the ACPI tables. Show us the code in the kernel that would replace "int parse_ACPI_tables(void *address_of_RSDT);".
This function should:
- handle the "no ACPI tables present" case (e.g. "if(address_of_ACPI_tables == NULL)")
- check the checksum for the RSDT
- display the "OEMID" string from the RSDT
- for each table pointed to by the array of pointers in the RSDT, it should:
  - check the checksum for that table, and:
    - if the checksum is OK, display a message that includes the table's signature
    - if the checksum failed, display a message that says the checksum failed
- if all tables had valid checksums, then for each table pointed to by the array of pointers in the RSDT, it should:
  - if the signature is known, call a function to parse that table based on the table's signature (e.g. call one function if the signature is 'APIC', a different function if the signature is 'FACP' (the FADT), etc.). You only need to provide one of these functions for parsing the MADT (table signature 'APIC') - assume the rest are "to be implemented later". The MADT function should:
    - Display the APIC ID for any "IO APIC Structures" it finds
    - Display the APIC ID for any "Processor Local APIC Structures" it finds
    - Correctly ignore/skip all other structures in the MADT
  - if the signature is not known, skip the table
You can assume that (if the RSDT is present) the RSDT can be accessed directly from your kernel code via the address supplied by the boot loader (e.g. no need to map physical pages into the kernel's virtual address space, no need to mess with segment bases, etc).
The latest version of the ACPI specification can be found here, but it doesn't matter if you use an earlier version of the specification if you want.
Good luck!
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
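A partial sketch of the validation side of this exercise, written as hosted C++17 over a byte buffer so it can be tested outside a kernel. It assumes the 36-byte ACPI System Description Table Header (Length at offset 4; the checksum rule is that all bytes of the table sum to zero modulo 256) and RSDT entries as 32-bit physical addresses starting at offset 36. `available` is an assumed parameter for how many bytes are actually mapped; the function names are hypothetical, and the OEMID display and per-signature dispatch are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::size_t kHeaderSize = 36;  // ACPI System Description Table Header

// Little-endian 32-bit read; the caller guarantees offset + 4 <= size.
inline std::uint32_t read_u32(const std::uint8_t* p, std::size_t offset) {
    return std::uint32_t(p[offset]) | std::uint32_t(p[offset + 1]) << 8 |
           std::uint32_t(p[offset + 2]) << 16 | std::uint32_t(p[offset + 3]) << 24;
}

// Valid if the declared Length fits in the mapped bytes and all Length bytes,
// checksum field included, sum to 0 modulo 256.
inline bool table_checksum_ok(const std::uint8_t* p, std::size_t available) {
    if (available < kHeaderSize) return false;
    const std::uint32_t length = read_u32(p, 4);
    if (length < kHeaderSize || length > available) return false;
    std::uint8_t sum = 0;
    for (std::uint32_t i = 0; i < length; ++i) sum = static_cast<std::uint8_t>(sum + p[i]);
    return sum == 0;
}

// Returns the 32-bit table addresses listed in an RSDT, or std::nullopt if the
// pointer is null ("no ACPI tables"), the checksum or signature is wrong, or
// the entry area is not a whole number of 4-byte entries.
inline std::optional<std::vector<std::uint32_t>> rsdt_entries(const std::uint8_t* rsdt,
                                                              std::size_t available) {
    if (rsdt == nullptr || !table_checksum_ok(rsdt, available)) return std::nullopt;
    if (rsdt[0] != 'R' || rsdt[1] != 'S' || rsdt[2] != 'D' || rsdt[3] != 'T') return std::nullopt;
    const std::uint32_t length = read_u32(rsdt, 4);
    if ((length - kHeaderSize) % 4 != 0) return std::nullopt;  // partial pointer at the end
    std::vector<std::uint32_t> entries;
    for (std::size_t off = kHeaderSize; off < length; off += 4) entries.push_back(read_u32(rsdt, off));
    return entries;
}
```

A kernel version would likely replace `std::vector` with a fixed-size array or a callback per entry.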
Re: Why are ASM hobby OS more successful than other language
Brendan wrote: Let's assume that the boot loader is responsible for finding ACPI tables and telling the kernel the address of the RSDT.
Finding the base in the kernel is simple enough. A few lines of assembly is all that is needed. It gets more complicated if paging is enabled, since then there is a need to mess with paging.
Brendan wrote: During kernel initialisation you want to parse the ACPI tables. Show us the code in the kernel that would replace "int parse_ACPI_tables(void *address_of_RSDT);".
This function should:
- handle the "no ACPI tables present" case (e.g. "if(address_of_ACPI_tables == NULL)")
- check the checksum for the RSDT
- display the "OEMID" string from the RSDT
- for each table pointed to by the array of pointers in the RSDT, it should:
  - check the checksum for that table, and:
    - if the checksum is OK, display a message that includes the table's signature
    - if the checksum failed, display a message that says the checksum failed
- if all tables had valid checksums, then for each table pointed to by the array of pointers in the RSDT, it should:
  - if the signature is known, call a function to parse that table based on the table's signature (e.g. call one function if the signature is 'APIC', a different function if the signature is 'FACP' (the FADT), etc.). You only need to provide one of these functions for parsing the MADT (table signature 'APIC') - assume the rest are "to be implemented later"
  - if the signature is not known, skip the table
We have ACPICA for that.
However, ACPICA requires a set of functions to operate, and those are often not supported in your typical C compiler. For instance, C doesn't have IO-access, PCI-access, synchronization, and physical memory access.
Brendan wrote: The latest version of the ACPI specification can be found here, but it doesn't matter if you use an earlier version of the specification if you want.
Understanding ACPI and building an assembler-based handler seemed too complex an issue with too little use, so I went to a lot of work to port the clib to kernel space instead.
BTW, even when using ACPICA, I don't see the functions I really need, like "give me the IRQ for this PCI device". That requires writing a lot of code to provide (which I'm currently doing, in C). At least now I have a list of devices and their resource uses and the PCI IRQ redirection table, which might be enough to solve that mapping issue. I plan to write a tool to list relevant device info, which would be good for debugging.
When it comes to raw-table access (MADT and HPET), I let the device-drivers themselves parse them (in assembly). I don't see what use I would have for C or ACPICA there.
Re: Why are ASM hobby OS more successful than other language
Hi,
rdos wrote:
Brendan wrote: Let's assume that the boot loader is responsible for finding ACPI tables and telling the kernel the address of the RSDT.
Finding the base in the kernel is simple enough. A few lines of assembly is all that is needed. It gets more complicated if paging is enabled, since then there is a need to mess with paging.
Assume the kernel can be booted from boot loaders designed for PC BIOS and UEFI. For UEFI you ask the firmware for the address of the ACPI tables and don't do a stupid (cache thrashing) search for them.
rdos wrote:
Brendan wrote: During kernel initialisation you want to parse the ACPI tables. Show us the code in the kernel that would replace "int parse_ACPI_tables(void *address_of_RSDT);".
We have ACPICA for that.
You missed the entire point of the exercise.
The idea is to get Rusky to show that his idea of a system programming language is actually capable of being used for system programming. Rusky isn't silly though - if I asked him to re-implement ACPICA in a language with strong type checking, array bounds checking, no "void *", no NULL, etc; then he'd probably be smart enough to refuse.
Cheers,
Brendan
Re: Why are ASM hobby OS more successful than other language
Brendan wrote: Assume the kernel can be booted from boot loaders designed for PC BIOS and UEFI. For UEFI you ask the firmware for the address of the ACPI tables and don't do a stupid (cache thrashing) search for them.
That would make sense then.
Brendan wrote: You missed the entire point of the exercise.
The idea is to get Rusky to show that his idea of a system programming language is actually capable of being used for system programming. Rusky isn't silly though - if I asked him to re-implement ACPICA in a language with strong type checking, array bounds checking, no "void *", no NULL, etc; then he'd probably be smart enough to refuse.
Yes, but my point was that you don't want to rewrite something for ACPI when all you need from ACPI is IRQ-mappings (basically). The other tables (MADT, HPET) can easily be parsed in any language (possibly in combination with assembly).
- Combuster
Re: Why are ASM hobby OS more successful than other language
Rusky wrote: ...let compiler check for possible overruns and only force runtime checks when necessary (yes, this is possible).
Actually, it's "check when the compiler cannot prove it's safe". In other words, it's possible for a subset of array accesses. We demonstrated that fact earlier in another thread...
Re: Why are ASM hobby OS more successful than other language
C and C++ have a lot of support, a lot of users, serve well for systems programming, and have libraries, features, or compiler warnings to address the issues with their "vanilla" forms: definitely agreed. However, these options often aren't completely satisfactory, add runtime overhead, needlessly complicate things, or don't actually prevent incorrect code from running.
Language features and compiler-enforced rules would be an improvement because they would be the default, they would be more efficient, and they would be able to do much more than a separate static analyzer. Let's continue to evolve what's available for systems programming. How do you think C++ got started?
NullPointerExceptions only happen at runtime; they cannot be caught at compile time once null is allowed. It's the same problem as dynamic typing, which is essentially what pointers are, as the values null/dangling pointer and valid address do not support the same set of operations. C++ references solve the typing problem, but they don't allow pointer arithmetic without re-entering the dynamically-typed, compiler-unchecked world of C-style pointers.
Containers like boost::optional, std::vector, and std::string are good, but they aren't perfect. std::vector and std::string add overhead that can often be avoided, and don't always guarantee in-bounds access. They can't be used for everything (including things like parsing ACPI tables), whereas simple first-class arrays are more transparent and don't have these problems. Think of them as this class, with compiler-checked access:
Code:
template <typename T, int len>
class array {
private:
    T data[len];
public:
    // imagine the compiler proving 0 <= i < len here, or inserting a check when it can't
    T &operator[](int i) { return data[i]; }
};
C++ doesn't actually solve the problems of C's implicit type conversions. For example, this compiles with no errors or warnings as both C and C++, even with -Wall -Wextra -Werror -pedantic:
Code:
int main() { char c; int a = 0xffff0000; c = a; return c; }
Macro systems are not entirely separate from their languages, especially Lisp macros, which were my first example because they are so much more powerful and useful than either the C preprocessor or C++'s various replacement features. Slapping another preprocessor on top doesn't let macros manipulate abstract syntax trees and in-language values to do things like generate tables, create custom control structures, or design DSLs.
Last edited by Rusky on Wed Dec 14, 2011 8:28 pm, edited 1 time in total.
Re: Why are ASM hobby OS more successful than other language
Brendan's challenge is somewhat biased because ACPI is designed for C, and its ABI would likely be different if another language had been dominant when it was written (yes, this is an advantage of C, but we all agree on that point). I also don't have an existing language to use so the syntax will probably be pretty rough and not everything will be implemented the best way possible, but I'll give it a shot:
Code:
// this implements a rudimentary form of inheritance polymorphism and/or algebraic data types
// depending on your preference, these could be in the language so long as they have a well-defined abi
// this would probably decrease the amount of casting, which would be good
acpi_header: struct = {
signature: char[4]
length: uint32
revision: byte
checksum: byte
oem_id: char[6]
oem_table_id: char[8]
oem_revision: uint32
creator_id: char[4]
creator_revision: uint32
}
acpi_rsdt: struct = {
header: acpi_header = { signature = "RSDT" }
// this is not the most elegant solution and is probably one thing I would change
tables: acpi_header*[header.length / 4 - sizeof header / 4]
}
acpi_madt: struct = {
header: acpi_header = {
signature = "APIC"
revision = 3
}
lapic: apic_lapic*
flags: uint32
controllers: acpi_madt_header*[header.length / 4 - sizeof header / 4]
}
acpi_madt_header: struct = {
type: byte
length: byte
}
acpi_madt_lapic: struct = {
header: acpi_madt_header = {
type = 0
length = sizeof acpi_madt_lapic
}
processor_id: byte
apic_id: byte
flags: uint32
}
acpi_madt_ioapic: struct = {
header: acpi_madt_header = {
type = 1
length = sizeof acpi_madt_ioapic
}
ioapic_id: byte
reserved: byte = 0
address: apic_ioapic*
interrupt_start: uint32
}
// the bootloader is responsible for providing an optional pointer to the rsdt
// it must conform to the language's abi, potentially exactly the same as a C acpi_rsdt*
// the memory pointed to, if not null, must be verified or trusted by the bootloader to conform to the type above
kernel_entry: (..., optional_rsdt: acpi_rsdt?, ...) -> () = {
...
// before dereferencing optional_rsdt, it must be null-checked
// if the pointer were not optional, it could be typed "acpi_rsdt*", again with the bootloader responsible for self-verification and abi compatibility
match (optional_rsdt) {
real_rsdt: acpi_rsdt* -> parse_acpi_tables(real_rsdt)
_ -> /* no rsdt */
}
...
}
checksum: (contents: byte[len]*, checksum: byte): (bool) = {
// not sure this is the correct way to calculate the checksum, but it's illustrative enough...
// contents is a pointer to an array of bytes, inferred to be the size of an acpi_rsdt
// sum is a standard, generic function of type array of addables to that same type
checksum: byte = sum(contents)
return checksum == 0
}
parse_acpi_tables: (rsdt: acpi_rsdt*) -> (results, errors, etc) = {
assert(checksum(byte[]*(rsdt)))
// straightforward enough, just note that print has varargs of type printable*
print("acpi oemid: "&, rsdt->header.oem_id&, "\n"&)
pass: bool = true
for (table in rsdt->tables) { // this could be a macro, or whatever else you want
match (checksum(byte[table->length]*(table))) {
true -> print(table->signature&, " passed\n"&)
false -> {
print(table->signature&, " failed\n"&)
pass = false
}
}
}
// based on your wording I think this is what you meant, but this could be folded into the above loop instead
if (pass) {
for (table in rsdt->tables) {
// matching on strings is possible because they're just first-class array values
match (table->signature) {
"APIC" -> parse_madt(acpi_madt*(table))
...
}
}
}
...
}
parse_madt: (madt: acpi_madt*) -> (ditto) = {
...
for (controller in madt->controllers) {
// this is a place where algebraic data types/dynamic_cast-equivalent would help
// instead of matching on the type using symbols, it would match based on type, like:
// lapic: acpi_madt_lapic* -> ...
// ioapic: acpi_madt_ioapic* -> ...
match (controller->header.type) {
// these should be symbolic constants.
0 -> print("lapic "&, acpi_madt_lapic*(controller)->apic_id&, "\n"&)
1 -> print("ioapic "&, acpi_madt_ioapic*(controller)->ioapic_id&, "\n"&)
...
}
}
...
}
Re: Why are ASM hobby OS more successful than other language
Hi,
Rusky wrote: Brendan's challenge is somewhat biased because ACPI is designed for C, and its ABI would likely be different if another language had been dominant when it was written (yes, this is an advantage of C, but we all agree on that point).
It's just parsing data. You'd have similar problems parsing most (non-text) file formats, except that "pointer to something" would be "offset of something within file" (and you'd still need type casts, only they'd be like "my_type *foo = (my_type *)&byte_array[offset]" instead).
Rusky wrote: I also don't have an existing language to use so the syntax will probably be pretty rough and not everything will be implemented the best way possible, but I'll give it a shot:
This code is like C, except that:
- some keywords were renamed (e.g. "match" instead of "switch")
- semicolons were replaced with end of line, and a few other trivial changes in syntax
- the order things appear in declarations/definitions was changed (e.g. "checksum: (contents: byte[len]*, checksum: byte): (bool)" instead of "bool checksum(byte[len] *contents, byte checksum)")
- you stole "cout" from C++ (and renamed it to "print")
- there's support for returning multiple values from functions
- the compiler is able to make (potentially wrong) assumptions about the sizes of arrays (e.g. "checksum(byte[]*(rsdt))")
- the compiler is able to handle arrays of mixed types (e.g. "for (controller in madt->controllers)")
I also don't think that the ability to handle arrays/lists of mixed types is realistic; especially for "foreign" structures where the compiler can't insert hidden fields to make it easier (e.g. turn it into a linked list of "union { struct; struct; struct;}"). It's the sort of feature that sounds easy to do until you attempt to implement a compiler that does it. Perhaps this is why you couldn't find an existing language?
Cheers,
Brendan
Re: Why are ASM hobby OS more successful than other language
Brendan wrote: It's just parsing data.
Yes, but which data? You can specify sizes and such in a variety of different ways, and using e.g. a length in table entries rather than a byte length or null terminator makes it nicer for this imaginary language.
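One common way to do the `(my_type *)&byte_array[offset]` step without the cast is to bounds-check first and then copy the bytes out with `std::memcpy`, which avoids misaligned access and strict-aliasing problems. A sketch under the assumption that the buffer uses the host's byte order (true for ACPI tables on little-endian x86); `read_at` is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

// Copies a T out of buf at the given offset instead of casting a pointer into
// the buffer. Returns std::nullopt if the object would not fit entirely inside
// the buffer. Byte order is taken as-is (host endianness assumed).
template <typename T>
std::optional<T> read_at(const std::uint8_t* buf, std::size_t size, std::size_t offset) {
    static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
    if (offset > size || size - offset < sizeof(T)) {
        return std::nullopt;
    }
    T value;
    std::memcpy(&value, buf + offset, sizeof(T));
    return value;
}
```

Writing the check as `size - offset < sizeof(T)` (after ruling out `offset > size`) avoids the overflow that `offset + sizeof(T) > size` could hit with a huge offset.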
Brendan wrote: This code is like C, except that:
- some keywords were renamed (e.g. "match" instead of "switch")
- semicolons were replaced with end of line, and a few other trivial changes in syntax
- the order things appear in declarations/definitions was changed (e.g. "checksum: (contents: byte[len]*, checksum: byte): (bool)" instead of "bool checksum(byte[len] *contents, byte checksum)")
- you stole "cout" from C++ (and renamed it to "print")
- there's support for returning multiple values from functions
This was mostly to emphasize the difference from C. Changing the little things makes it obvious at a glance that it's not the same. What did you expect?
However, note that match is a little more flexible: it doesn't only switch on values, but also on things like the presence of a value in an optional type. It's pattern matching, borrowed from functional programming btw.
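The closest standard C++17 analogue to matching on type, as the comments in the pseudocode above suggest for MADT entries, is probably `std::variant` with `std::visit`. The entry types below are hypothetical, decoded forms of the MADT structures (type 0 Processor Local APIC and type 1 I/O APIC per the ACPI specification). Unlike a functional-language `match`, nothing here forces the cases to be exhaustive; the final `else` simply catches any other alternative.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical decoded MADT entries.
struct LocalApic { std::uint8_t processor_id; std::uint8_t apic_id; };
struct IoApic { std::uint8_t ioapic_id; std::uint32_t address; };
struct OtherEntry { std::uint8_t type; };

using MadtEntry = std::variant<LocalApic, IoApic, OtherEntry>;

// std::visit calls the generic lambda with whichever alternative the variant
// currently holds; if constexpr then selects the branch for that type.
inline std::string describe(const MadtEntry& entry) {
    return std::visit([](const auto& e) -> std::string {
        using E = std::decay_t<decltype(e)>;
        if constexpr (std::is_same_v<E, LocalApic>) {
            return "lapic " + std::to_string(e.apic_id);
        } else if constexpr (std::is_same_v<E, IoApic>) {
            return "ioapic " + std::to_string(e.ioapic_id);
        } else {
            return "skipped type " + std::to_string(e.type);
        }
    }, entry);
}
```

The variant stores a hidden discriminator, which is exactly what a foreign, packed table like the MADT does not have, so the raw bytes still need to be decoded into these types first.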
Brendan wrote: ...
- the compiler is able to handle arrays of mixed types (e.g. "for (controller in madt->controllers)")
I also don't think that the ability to handle arrays/lists of mixed types is realistic; especially for "foreign" structures where the compiler can't insert hidden fields to make it easier (e.g. turn it into a linked list of "union { struct; struct; struct;}"). It's the sort of feature that sounds easy to do until you attempt to implement a compiler that does it. Perhaps this is why you couldn't find an existing language?
My example does not use heterogeneous arrays, only pointers to "base classes." Did I misunderstand the ACPI spec? If so, and the controllers are in-line, try this:
Code:
acpi_madt: struct = {
...
controllers: byte[header.length - (sizeof header + sizeof lapic + sizeof flags)]
}
Code:
for (
    controller: acpi_madt_header* = acpi_madt_header*(madt->controllers&);
    controller <= madt->controllers[-1]&; // since array bounds are checked, use negative indices to index from the end (there could be other options)
    controller = acpi_madt_header*(byte*(controller) + controller->header.length) // yet another place where a modified ABI would be nice
) {
    ...
}
Brendan wrote:
- the compiler is able to make (potentially wrong) assumptions about the sizes of arrays (e.g. "checksum(byte[]*(rsdt))")
How is that potentially wrong? Given a correct RSDT, which is the firmware's and/or bootloader's responsibility, the program knows the size of the table by reading rsdt->header.length (since the size of the RSDT is defined by it). It's just more implicit than you may be used to.
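To make the "size comes from rsdt->header.length" idea concrete in plain C: every ACPI table starts with a 36-byte header whose 32-bit little-endian Length field sits at offset 4, and all bytes of the table (checksum byte included) must sum to zero modulo 256. The sketch below is a minimal illustration, assuming the caller knows how many bytes are actually mapped (`mapped_len`, a hypothetical parameter), so a Length field larger than that is rejected rather than trusted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACPI_SDT_HEADER_LEN 36u  /* size of the standard ACPI table header */

/* Read the header's 32-bit little-endian Length field (offset 4). */
static uint32_t sdt_length(const uint8_t *table)
{
    return (uint32_t)table[4] | (uint32_t)table[5] << 8 |
           (uint32_t)table[6] << 16 | (uint32_t)table[7] << 24;
}

/* True if the table's bytes, per its own Length field, sum to 0 mod 256.
   mapped_len (an assumption of this sketch) is how many bytes the caller
   can safely read; a Length beyond that is treated as an error. */
static bool acpi_checksum_ok(const uint8_t *table, size_t mapped_len)
{
    if (mapped_len < ACPI_SDT_HEADER_LEN)
        return false;
    uint32_t len = sdt_length(table);
    if (len < ACPI_SDT_HEADER_LEN || len > mapped_len)
        return false;
    uint8_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + table[i]);  /* wraps modulo 256 */
    return sum == 0;
}
```

The size is still implicit in the table, but the sketch has to carry a second, externally known length to avoid trusting a corrupt header.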
Brendan wrote: For the code itself, the amount of validation done is lacking. For example, what happens if the MADT is 200 bytes but has a 16-byte structure that starts at offset 198? What happens if the length/size of the RSDT is odd (and there's only space for part of a pointer at the end of the table of pointers)? Does the compiler automatically generate code to check these things; and if it does, how does it report errors? Should the entire thing be wrapped in "try/catch" exception handling?
Considering I probably misunderstood the MADT layout, the new loop above should address your question. As for how the compiler responds to problems, my preference would be a compile-time error. For example, if inside that new for loop above, controller is cast to something (even acpi_madt_header*) that could permit access outside the bounds of controllers, the programmer has to add a check themselves before dereferencing it. Because that check is necessary anyway if you expect incorrectly-sized tables, this is just the compiler and language semantics reminding you of potential problems before they turn into hard-to-find bugs at runtime.
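For the odd-length RSDT case, one explicit run-time answer in plain C (no exceptions, no compiler help) is to convert the byte length into an entry count and refuse anything that isn't a whole number of 32-bit entries after the 36-byte header. The status codes and `rsdt_entry_count` below are hypothetical names for this sketch; `mapped_len` is again assumed to be the number of bytes actually readable.

```c
#include <stddef.h>
#include <stdint.h>

#define ACPI_SDT_HEADER_LEN 36u  /* standard ACPI table header size */

/* Hypothetical error codes: explicit return values instead of exceptions. */
typedef enum {
    RSDT_OK = 0,
    RSDT_TOO_SHORT,     /* Length smaller than the header itself */
    RSDT_TRUNCATED,     /* Length larger than what is actually mapped */
    RSDT_PARTIAL_ENTRY  /* trailing bytes that can't hold a full 32-bit pointer */
} rsdt_status;

/* Convert the RSDT's byte length into a count of 32-bit entries,
   rejecting lengths that don't describe a whole number of entries. */
static rsdt_status rsdt_entry_count(uint32_t length, size_t mapped_len,
                                    uint32_t *count)
{
    if (length < ACPI_SDT_HEADER_LEN)
        return RSDT_TOO_SHORT;
    if (length > mapped_len)
        return RSDT_TRUNCATED;
    if ((length - ACPI_SDT_HEADER_LEN) % 4 != 0)
        return RSDT_PARTIAL_ENTRY;
    *count = (length - ACPI_SDT_HEADER_LEN) / 4;
    return RSDT_OK;
}
```

This is the "length in table entries rather than a byte length" conversion done by hand, with each failure reported as a distinct value the caller must inspect.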
However, note that this is a systems programming language and should make allowances for dealing with memory directly. If I were designing a managed language, things would be stricter, and parsing ACPI tables would probably be reduced to reading out of a byte array.
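As a rough idea of what "reading out of a byte array" could look like even in C: the parser never casts raw pointers to structs, and every field access goes through a helper that checks the offset against a known length. ACPI tables are little-endian, so multi-byte fields are assembled byte by byte. The `byte_view`, `read_u8` and `read_u32_le` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A byte array plus its length: the only thing the parser is allowed to touch. */
typedef struct {
    const uint8_t *data;
    size_t         len;
} byte_view;

/* Read one byte at off; fail instead of reading past len. */
static bool read_u8(byte_view v, size_t off, uint8_t *out)
{
    if (off >= v.len)
        return false;
    *out = v.data[off];
    return true;
}

/* Read a little-endian 32-bit value at off; fail if fewer than 4 bytes remain. */
static bool read_u32_le(byte_view v, size_t off, uint32_t *out)
{
    if (off > v.len || v.len - off < 4)  /* avoids overflow in off + 4 */
        return false;
    *out = (uint32_t)v.data[off] | (uint32_t)v.data[off + 1] << 8 |
           (uint32_t)v.data[off + 2] << 16 | (uint32_t)v.data[off + 3] << 24;
    return true;
}
```

The cost is that every read can fail and must be checked, which is the trade-off the systems-language design above tries to avoid.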
Re: Why are ASM hobby OS more successful than other language
Hi,
Rusky wrote:
Brendan wrote: This code is like C, except that:
- some keywords were renamed (e.g. "match" instead of "switch")
- semicolons were replaced with end of line, and a few other trivial changes in syntax
- the order things appear in declarations/definitions was changed (e.g. "checksum: (contents: byte[len]*, checksum: byte): (bool)" instead of "bool checksum(byte[len] *contents, byte checksum)")
- you stole "cout" from C++ (and renamed it to "print")
- there's support for returning multiple values from functions
This was mostly to emphasize the difference from C. Changing the little things makes it obvious, at a glance, that it's not the same language. What did you expect?
I would've avoided "change for the sake of change" and tried to use a syntax that 50% of forum members would be more likely to understand, to avoid the need for lots of explanations/comments.
Note: support for returning multiple values from functions doesn't belong in the "change for the sake of change" list.
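For context on why multiple return values are more than cosmetic: C can only return one value, so the usual workarounds are out-parameters or returning a small struct by value (the standard library's `div()`, which returns a `div_t` with `quot` and `rem`, uses the latter). A minimal sketch of the struct approach, with hypothetical names `entry_split` and `split_entries`:

```c
#include <stdint.h>

/* Two results packed into one struct, since C functions return a single value. */
typedef struct {
    uint32_t count;     /* whole entries */
    uint32_t leftover;  /* bytes that don't form a whole entry */
} entry_split;

/* Split a byte count into whole entries plus leftover bytes.
   entry_size must be non-zero. */
static entry_split split_entries(uint32_t bytes, uint32_t entry_size)
{
    entry_split s = { bytes / entry_size, bytes % entry_size };
    return s;
}
```

A caller then writes `entry_split s = split_entries(n, 4);` and reads `s.count` and `s.leftover`, where a language with native multiple returns could bind both names directly.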
Rusky wrote: However, note that match is a little more flexible: it doesn't switch strictly on a value, but also on things like the presence of a value in an optional type. It's pattern matching, borrowed from functional programming, by the way.
Ah, now I understand. The underscore character on line 70 ("_ -> /* no rsdt */") is your renamed NULL.
Actually, no. I don't understand the line above it ("real_rsdt: acpi_rsdt* -> parse_acpi_tables(real_rsdt)"), which looks like it creates a variable called "real_rsdt" of type "acpi_rsdt*" and assigns the value returned by "parse_acpi_tables(real_rsdt)" to the variable. In C syntax it'd be "acpi_rsdt *real_rsdt = parse_acpi_tables(real_rsdt);", and you'd get a warning about using an uninitialised variable as the argument to "parse_acpi_tables()".
Rusky wrote:
Brendan wrote: ...
- the compiler is able to handle arrays of mixed types (e.g. "for (controller in madt->controllers)")
I also don't think that the ability to handle arrays/lists of mixed types is realistic, especially for "foreign" structures where the compiler can't insert hidden fields to make it easier (e.g. turn it into a linked list of "union { struct; struct; struct; }"). It's the sort of feature that sounds easy to do until you attempt to implement a compiler that does it. Perhaps this is why you couldn't find an existing language?
My example does not use heterogeneous arrays, only pointers to "base classes." Did I misunderstand the ACPI spec? If so, and the controllers are in-line, try this:
Code:
acpi_madt: struct = {
    ...
    controllers: byte[header.length / 4 - (sizeof header + sizeof lapic + sizeof flags) / 4]
}
Code:
for (
    controller: acpi_madt_header* = acpi_madt_header*(madt->controllers&);
    controller <= madt->controllers[-1]&; // since array bounds are checked, use negative indices to index from the end (there could be other options)
    controller = acpi_madt_header*(byte*(controller) + controller->header.length) // yet another place where a modified ABI would be nice
) {
    ...
}
How does "controller <= madt->controllers[-1]&;" make sense? Should I translate it into "stop looping if the starting address of the current controller is not lower than or equal to the starting address of the last entry in a heterogeneous array that isn't a heterogeneous array"?
Wouldn't it make more sense to do "stop looping if the address of the "length" field in the controller structure would be past the end of the parent "acpi_madt" structure; and (if the first check passes and you can safely use the length field of the controller structure) also stop looping if the "controller start address plus controller length field" is beyond the size of the parent acpi_madt structure"?
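The two-step check described above can be sketched in C as an iterator over the MADT's interrupt controller structures. In the ACPI MADT, those structures start at offset 44 (36-byte header, 4-byte local interrupt controller address, 4-byte flags), and each one begins with a Type byte followed by a Length byte. The sketch assumes `madt_len` has already been validated against the mapped size; `madt_next` and `madt_step` are hypothetical names, and the extra "length below 2" rejection is an addition of this sketch, not part of the suggestion above.

```c
#include <stdint.h>

#define MADT_ENTRIES_OFFSET 44u  /* 36-byte header + local APIC address + flags */

typedef enum { MADT_ENTRY, MADT_END, MADT_BAD } madt_step;

/* Report the entry at *off and advance *off past it.
   Returns MADT_END when *off reaches madt_len, MADT_BAD on a malformed entry. */
static madt_step madt_next(const uint8_t *madt, uint32_t madt_len, uint32_t *off,
                           uint8_t *type, const uint8_t **entry, uint8_t *len)
{
    if (*off == madt_len)
        return MADT_END;
    /* Check 1: the entry's length byte (offset 1) must lie inside the table. */
    if (*off > madt_len || madt_len - *off < 2)
        return MADT_BAD;
    uint8_t l = madt[*off + 1];
    /* Check 2: start + length must not run past the end of the table.
       A length below 2 can't cover its own type/length bytes (0 would never advance). */
    if (l < 2 || l > madt_len - *off)
        return MADT_BAD;
    *type = madt[*off];
    *entry = madt + *off;
    *len = l;
    *off += l;
    return MADT_ENTRY;
}
```

A caller starts with `uint32_t off = MADT_ENTRIES_OFFSET;` and loops while `madt_next(...) == MADT_ENTRY`; a 16-byte entry starting at offset 198 of a 200-byte MADT fails check 2 and returns `MADT_BAD`.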
Rusky wrote:
Brendan wrote:
- the compiler is able to make (potentially wrong) assumptions about the sizes of arrays (e.g. "checksum(byte[]*(rsdt))")
How is that potentially wrong? Given a correct RSDT, which is the firmware's and/or bootloader's responsibility, the program knows the size of the table by reading rsdt->header.length (since the size of the RSDT is defined by it). It's just more implicit than you may be used to.
Given a correct RSDT? Nobody said it's a correct RSDT.
Rusky wrote: As for how the compiler responds to problems, my preference would be a compile-time error. For example, if inside that new for loop above, controller is cast to something (even acpi_madt_header*) that could permit access outside the bounds of controllers, the programmer has to add a check themselves before dereferencing it. Because that check is necessary anyway if you expect incorrectly-sized tables, this is just the compiler and language semantics reminding you of potential problems before they turn into hard-to-find bugs at runtime.
I don't think that the ability to do compile-time enforcement of explicit run-time checking is realistic either (another "sounds easy to do until you attempt to implement a compiler that does it" feature).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.