What do you think about managed code and OSes written in it?

Brendan · Post by **Brendan** » Wed Feb 04, 2015 6:02 am

Hi,

HoTT wrote:The general case is where both your index and your array size may be the result of computation possibly involving input.

Of course you can say that all your arrays are of length 255 and are indexed by an 8-bit integer type. But then you introduced a big limitation already, one that I don't think is practical.

If I said "all arrays are of length 256 and must be indexed by an integer type that is only capable of storing values from 0 to 255", then it's easy to prove there are no bounds problems, but it is too limited in practice.

What if I said "all arrays are of length N and must be indexed by an integer type that is only capable of storing values from 0 to N-1, for any value of N"?

Surely you can see this is just a minor extension to your 8-bit example. It's still easy to prove there are no bounds problems, and even though it's less limited, it's still too limited in practice.

Now, what if I said "all arrays of type T are of length Tn and must be indexed by an integer type that is only capable of storing values from 0 to Tn-1, for any value of Tn; but there can be many different array types each with their own independent Tn"?

Surely you can see this is just a minor extension to the previous minor extension of your 8-bit example. It's still easy to prove there are no bounds problems. Is it still too limited?

Finally, what if I said "all arrays of type T are of length Tn and must be indexed by an integer type that is either only capable of storing values from 0 to Tn-1 or can be proven to only hold a value from 0 to Tn-1, for any value of Tn; but there can be many different array types each with their own independent Tn"?

Surely you can see this is just another minor extension. It's still easy to prove there are no bounds problems. Is it still too limited?

Cheers,

Brendan

embryo · Post by **embryo** » Wed Feb 04, 2015 6:12 am

HoTT wrote:
You can use safe language, but it requires another parts of managed environment like garbage collector,
Garbage collection is not unique to managed environments. There have been GCs long before that term (managed X) got coined. But why do you need a garbage collector to avoid errors that relate to endianess? Could you give an example of an error and how it is avoided using a GC?

GC is part of a managed language. If there is no automatic memory management then the language in unmanaged. Once you have decided to use managed language (because of endiannes issues, for example), there is no choice but to use it with some memory management facilities like GC.

HoTT wrote:
No. The world has Java runtimes, that are implemented for many platforms. So, when anybody would use my platform, then the Java's platform independence will be given for free.
Not for free, someone has to implement that runtime.

If we look closely at the issue of the managed environment implementation then we can see one very exciting thing - the environment should be implemented just once and then it can be used by millions of developers. Even if the managed environment development process takes 1000 human years (and this number is intentionally overextended) we still have a very attractive efficiency of resource spendings. If we divide millions human-years, that were spent on development of Java programs, by the 1000 human years, then it is obvious, that total investments are just an invisible fraction of the total benefits. So, I can conclude that an implementation of a managed environment can be considered as "for free" if we look at the bigger picture.

HoTT · Post by **HoTT** » Wed Feb 04, 2015 6:14 am

Finally, what if I said "all arrays of type T are of length Tn and must be indexed by an integer type that is either only capable of storing values from 0 to Tn-1 or can be proven to only hold a value from 0 to Tn-1, for any value of Tn; but there can be many different array types each with their own independent Tn"?

Problem marked bold. It's what I said prior in the thread. If it cannot be proven, you'll have to insert a runtime check somewhere to make it provable.

Is it still too limited?

Depends. Say strings are just arrays of utf8 codeunits and I want to write an algorithm that checks for any possible string if it only contains valid utf8.

How would it be typed? I get the feeling that you'll want to introduce dependent types. I think it's to cumbersome to please a dependent type system for every use case. I'd rather see a simple type system that does not account for something like that and a additional layer of checking that uses http://www.di.ens.fr/~cousot/AI/IntroAbsInt.html to provide those lengths checks. I think that Cosout proved somewhere that current type systems and abstract interpretation can express each other, so it's not really different, just another approach.

HoTT · Post by **HoTT** » Wed Feb 04, 2015 6:29 am

GC is part of a managed language. If there is no automatic memory management then the language in unmanaged. Once you have decided to use managed language (because of endiannes issues, for example), there is no choice but to use it with some memory management facilities like GC.

What not follows is that

a language with a GC is managed
endianess problems can only be solved in a managed language.

Icee · Post by **Icee** » Wed Feb 04, 2015 6:32 am

@HoTT

Your reference to Cousot's abstract semantics is well in place. I'd say that the main difference between the static N case and the dynamic N case (array size may be the result of computation possibly involving input) is how the actual dataflow analysis lattice is organized. In the former case, it's trivial. The latter, however, seems to be quite challenging. I suspect looking into affine inequality constraints may be a good thing here.

A quick googling suggests this thesis. Haven't read it thoroughly yet, though.

embryo · Post by **embryo** » Wed Feb 04, 2015 6:51 am

Brendan wrote:I'm thinking of an unmanaged language combined with a managed environment for developers to use for debugging, like C and valgrind (but not C or valgrind); and I'm not thinking of a managed language (and managed environment) that wraps programmers in bubble wrap and prevents them from doing their job properly.

Well, here you refuse to accept "a bubble wrap" because it "prevents them from doing their job properly". But in the following:

Brendan wrote:If DEFLATE was implemented as a service; then the cost of my messaging is around 1000 cycles per message (for a max. message size of 2 MiB; assuming both sender and receiver are on the same computer). To send 1 GiB of data to a "DEFLATE service" and receive 2 GiB of data back (assuming a 2:1 compression ratio) it's going to need to be split into 1536 messages and cost about 1536000 cycles for all sending and receiving. On an ancient 25 MHz 80486 (the oldest/slowest/worst CPU I intend to support) that works out to about 62 ms. Compared to the cost of compressing/decompressing the data, the overhead of messaging is almost entirely irrelevant. If it happened regularly nobody would care.

You have accepted another kind of "bubble wrap", but now it is called "services" with message passing related overhead. And you are eager to argue that in some cases the overhead is acceptable, but you refuse to listen to arguments about another "bubble wrap" and it's related pretty acceptable overhead. So, I conclude that it is just your religion, that prevents you from accepting the same thing when it is introduced by somebody else but you. But hopefully, the religion canons can be relaxed in the future and you finally will accept managed approach as efficient enough to compensate any involved overhead with really useful features.

Brendan wrote:How do you convince end users to switch from existing OSs (that are more mature - faster, less buggy and with more drivers) to your OS, when the applications run on both OSs? "Same but worse" is unlikely to be an good marketing strategy.

It's more efficient. And it's users (and developers) are more productive.

Brendan wrote:
embryo wrote:
Brendan wrote:I also want a language where unit testing is an inbuilt feature; and an IDE that automatically runs the unit tests in the background and highlights things that failed their unit tests while you type. You probably haven't thought about this, but I think it's important to detect all the bugs that can't be detected automatically (with or without run-time tests).
Here I miss the point of unit tests. Who will write them? If it is a developer, then such "inbuilt" feature works just as an ordinary unit test runner program. I mean I see no difference between the "inbuilt" and traditional approaches.
The most obvious benefit is integration - e.g. having mistakes highlighted for you while you're working on the source code in the IDE (and not having additional hassle, or steps to accidentally forget, or having to manually translate the unit test results into a location in the source code).

The less obvious benefit is that it provides "default scenarios" for debugging. For example, maybe there's a problem with an "isNumberPrime()" function buried deep within the program, and instead of running the entire program in the debugger you can just run an existing unit test in the debugger without all the unnecessary baggage.

The approach (as you describe it) was implemented long ago with the help of different plugins for different IDEs. But it was found inconvenient to run unit tests concurrently with code writing activity. It slows an IDE and produces permanent distraction for a developer because the code is just not ready to be unit tested. So, as a final solution a button "run unit tests" was chosen.

Brendan wrote:but I'm using asynchronous messaging where the service may be running on a completely different computer and there's some other things that come may into it.

May be such approach can draw some benefits. It's kin in form of web-services is accepted by many developers and still is in active development. But the overhead issue still here and another part of developers still refuse to accept such approach as efficient. But it seems that here we have more religion than actually viable statistics or similar numbers, so, it is still an open issue (just as managed vs unmanaged for you and me).

Brendan wrote:How does the managed environment "manage the low level parts" (e.g. prevent intentionally malicious assembly language code from doing unsafe things)?

It doesn't prevent all forms of malicious behavior. But it manages such code in the following way - it detects such code fragments, next it notifies user that provided application will be run in dangerous mode, next it accepts user's decision, next it compiles managed and unmanaged parts of the program, next it runs resulting program under hardware protection (and prevents some malicious activity from being possible). And all specified actions are acts of management, so we can see the management is still in place.

Brendan wrote:I think you'll find that for all the security vulnerabilities most would've been prevented by a better unmanaged language and the remainder would not have been prevented by a managed language.

Oh, Brendan, please, just try to think a bit out of the box, just let yourself to see that there are many ways to implement different things and there is no absolute winner just because requirements are always different too. And it is about your personal assessment of a weight of requirements that differs from assessment of another person. But if you remember that assessments are mostly based on some subtle human's internal argumentation, then it should be obvious for you that here we are discussing mostly religious issue of some assessments being more preferred than others.

embryo · Post by **embryo** » Wed Feb 04, 2015 6:56 am

HoTT wrote:I think it's to cumbersome to please a dependent type system for every use case

There are examples of such (and very usable) type systems. In Java it is ArrayList class. It holds an array of a suitable size and changes the array every time the size is found to be unsuitable.

embryo · Post by **embryo** » Wed Feb 04, 2015 7:00 am

HoTT wrote:What not follows is that

a language with a GC is managed

endianess problems can only be solved in a managed language.

You can set the claims from above as your goal and next to implement it, but would it be an efficient approach?

If you have to fight one problem it doesn't mean that you should refuse to fight all other possible problems. It means if you are fighting endiannes with safe language then it can be much more efficient, than if you would fight the endiannes only.

HoTT · Post by **HoTT** » Wed Feb 04, 2015 7:13 am

embryo wrote:
HoTT wrote:I think it's to cumbersome to please a dependent type system for every use case
There are examples of such (and very usable) type systems. In Java it is ArrayList class. It holds an array of a suitable size and changes the array every time the size is found to be unsuitable.

With dependent type systems I mean this. Javas ArrayList does not qualify, it's length is not part of its type.

You can set the claims from above as your goal and next to implement it, but would it be an efficient approach?

If you have to fight one problem it doesn't mean that you should refuse to fight all other possible problems. It means if you are fighting endiannes with safe language then it can be much more efficient, than if you would fight the endiannes only.

I don't know what you're aiming at. But your constantly attributing advantages that some managed ecosystems/evironments have to the fact that they are managed. Which is not true.

And I'll point out again that while managed might imply safe, safe does not imply managed. Same for GC.

What was the problem with endianess again?

Rusky · Post by **Rusky** » Wed Feb 04, 2015 9:43 am

Brendan wrote:If DEFLATE was implemented as a service; then the cost of my messaging is around 1000 cycles per message (for a max. message size of 2 MiB; assuming both sender and receiver are on the same computer). To send 1 GiB of data to a "DEFLATE service" and receive 2 GiB of data back (assuming a 2:1 compression ratio) it's going to need to be split into 1536 messages and cost about 1536000 cycles for all sending and receiving. On an ancient 25 MHz 80486 (the oldest/slowest/worst CPU I intend to support) that works out to about 62 ms. Compared to the cost of compressing/decompressing the data, the overhead of messaging is almost entirely irrelevant. If it happened regularly nobody would care.

However; it won't happen regularly either. PNG files would be converted into the OS's native file format as soon as possible (preferably, before they're stored in the OS's file system) and never converted from the OS's native file format back into PNG. None of the OS's native file formats use DEFLATE (including the OS's native "compressed file format").

Neither of these assumptions hold true for all uses of DEFLATE. Dealing with files is not particularly latency-sensitive, but on-the-fly compression/decompression as part of a network protocol (a multiplayer game? real-time data stream of some sort?), or compressed-in-RAM virtual memory (as opposed to storing it on disk), or other high-performance uses of compression, might not work out so well.

Your signature (which I assume you'd apply here) says scalable slow code can beat unscalable fast code, but scaling goes both directions and at different granularities. I will preemptively agree with you that overly-generic code is harmful, but there will always be pieces of code that need to be shared with applications that can't handle the latency of potentially-networked message passing.

Brendan · Post by **Brendan** » Wed Feb 04, 2015 9:57 am

Hi,

HoTT wrote:
Finally, what if I said "all arrays of type T are of length Tn and must be indexed by an integer type that is either only capable of storing values from 0 to Tn-1 or can be proven to only hold a value from 0 to Tn-1, for any value of Tn; but there can be many different array types each with their own independent Tn"?
Problem marked bold. It's what I said prior in the thread. If it cannot be proven, you'll have to insert a runtime check somewhere to make it provable.

I don't know why you think it's a problem - it's not.

Code: Select all

int32_t foo(uint8_t x) {
                           // the value in 'x' must range from 0 to 255 here
    uint32_t y = 4;        // the value in 'y' must range from 4 to 4 here
    uint32_t z = 0;        // the value in 'z' must range from 0 to 0 here

    y += x;                // the value in 'y' must range from 4 to 259 here
    if(y < 100) {
                           // the value in 'y' must range from 0 to 99 here
       z = x/2;            // the value in 'z' must range from 0 to 127 here
    } else {
                           // the value in 'y' must range from 100 to 259 here
       z = x + y;          // the value in 'z' must range from 100 to 514 here
    }
                           // the value in 'z' must range from 0 to 514 here
    if( (z & 1) == 1) {
                           // the value in 'z' must range from 0 to 514 here
        z = z & 0xFF;      // the value in 'z' must range from 0 to 255 here
        y = y ^ foo(z);    // the value in 'y' must range from 0 to 0xFFFFFFFF here
    }
                           // the value in 'z' must range from 0 to 514 here
                           // the value in 'y' must range from 0 to 0xFFFFFFFF here
    while(z > 0) {
                           // the value in 'z' must range from 0 to 514 here
       z--;                // the value in 'z' must range from 0 to 513 here
                           // the value in 'y' must range from 0 to 0xFFFFFFFF here
       y = y & 1234;       // the value in 'y' must range from 0 to 1234 here
    }
                           // the value in 'y' must range from 0 to 0xFFFFFFFF here
    return y;
}

HoTT wrote:
Is it still too limited?
Depends. Say strings are just arrays of utf8 codeunits and I want to write an algorithm that checks for any possible string if it only contains valid utf8.

How would it be typed?

Where does the data come from? How is it used (what are the access patterns)? What are the string's max. size? Will you be inserting/deleting data from the middle?

Cheers,

Brendan

Brendan · Post by **Brendan** » Wed Feb 04, 2015 10:32 am

Hi,

embryo wrote:
Brendan wrote:I'm thinking of an unmanaged language combined with a managed environment for developers to use for debugging, like C and valgrind (but not C or valgrind); and I'm not thinking of a managed language (and managed environment) that wraps programmers in bubble wrap and prevents them from doing their job properly.
Well, here you refuse to accept "a bubble wrap" because it "prevents them from doing their job properly". But in the following:
Brendan wrote:If DEFLATE was implemented as a service; then the cost of my messaging is around 1000 cycles per message (for a max. message size of 2 MiB; assuming both sender and receiver are on the same computer). To send 1 GiB of data to a "DEFLATE service" and receive 2 GiB of data back (assuming a 2:1 compression ratio) it's going to need to be split into 1536 messages and cost about 1536000 cycles for all sending and receiving. On an ancient 25 MHz 80486 (the oldest/slowest/worst CPU I intend to support) that works out to about 62 ms. Compared to the cost of compressing/decompressing the data, the overhead of messaging is almost entirely irrelevant. If it happened regularly nobody would care.
You have accepted another kind of "bubble wrap", but now it is called "services" with message passing related overhead. And you are eager to argue that in some cases the overhead is acceptable, but you refuse to listen to arguments about another "bubble wrap" and it's related pretty acceptable overhead. So, I conclude that it is just your religion, that prevents you from accepting the same thing when it is introduced by somebody else but you. But hopefully, the religion canons can be relaxed in the future and you finally will accept managed approach as efficient enough to compensate any involved overhead with really useful features.

The main reason I'm going for "services" is for scalability (e.g. being able to use many CPUs in many separate computers in a distributed system). It's not about "bubble wrap" at all.

embryo wrote:
Brendan wrote:How do you convince end users to switch from existing OSs (that are more mature - faster, less buggy and with more drivers) to your OS, when the applications run on both OSs? "Same but worse" is unlikely to be an good marketing strategy.
It's more efficient. And it's users (and developers) are more productive.

So, the exact same Java application will run more efficiently on your OS than it does on both Windows and Linux? That's nice...

embryo wrote:
Brendan wrote:
embryo wrote:Here I miss the point of unit tests. Who will write them? If it is a developer, then such "inbuilt" feature works just as an ordinary unit test runner program. I mean I see no difference between the "inbuilt" and traditional approaches.
The most obvious benefit is integration - e.g. having mistakes highlighted for you while you're working on the source code in the IDE (and not having additional hassle, or steps to accidentally forget, or having to manually translate the unit test results into a location in the source code).

The less obvious benefit is that it provides "default scenarios" for debugging. For example, maybe there's a problem with an "isNumberPrime()" function buried deep within the program, and instead of running the entire program in the debugger you can just run an existing unit test in the debugger without all the unnecessary baggage.
The approach (as you describe it) was implemented long ago with the help of different plugins for different IDEs. But it was found inconvenient to run unit tests concurrently with code writing activity. It slows an IDE and produces permanent distraction for a developer because the code is just not ready to be unit tested. So, as a final solution a button "run unit tests" was chosen.

I sit in a room surrounded by a pool of 25+ computers on a LAN. Almost all of them are idle almost all of the time. Are you suggesting there isn't enough processing power here for one computer to be editing text while 10 other computers do unit tests?

Worst case (e.g. a single old 80486 and nothing else) is that the unit tests take much longer to complete, but otherwise there's no user visible performance difference (because they'd be running as low priority threads).

Of course there would be some logic built in. E.g. there's no point running the unit tests while the source is constantly being modified - you have some sort of time delay (if the code the unit test relies on hasn't been modified for 60+ seconds, then start the unit test). You also wouldn't repeat tests if the pieces of source code they test weren't modified since last time.

embryo wrote:
Brendan wrote:How does the managed environment "manage the low level parts" (e.g. prevent intentionally malicious assembly language code from doing unsafe things)?
It doesn't prevent all forms of malicious behavior. But it manages such code in the following way - it detects such code fragments, next it notifies user that provided application will be run in dangerous mode, next it accepts user's decision, next it compiles managed and unmanaged parts of the program, next it runs resulting program under hardware protection (and prevents some malicious activity from being possible). And all specified actions are acts of management, so we can see the management is still in place.

In that case, it'll be like UAC in Vista, where everything caused the dialog box to pop up and users clicked on it without thinking (until they found a way to disable UAC).

Cheers,

Brendan

AndrewAPrice · Post by **AndrewAPrice** » Wed Feb 04, 2015 10:47 am

It comes back to - what is your definition of managed?

Ultimately, what are you trying to accomplish?

Are you interested in creating applications that may run in MMU-less environments and need a way to verify that they can't touch another process's memory? (By forcing then to be distributed in a bytecode or source representation that lacks a way to represent arbitrary memory access?)

Do you want your applications distributed in a portable bytecode format so that they can run on any architecture?

A language being "safe", automatic or manual memory, strict/loose/dynamic typing, etc. is a feature of the language. You can have any combination of these regardless of if your language runs natively or inside of a virtual machine. In fact, an ideal virtual machine should be able to express all of them.

So, what are you trying to accomplish?

Brendan · Post by **Brendan** » Wed Feb 04, 2015 11:00 am

Hi,

Rusky wrote:
Brendan wrote:If DEFLATE was implemented as a service; then the cost of my messaging is around 1000 cycles per message (for a max. message size of 2 MiB; assuming both sender and receiver are on the same computer). To send 1 GiB of data to a "DEFLATE service" and receive 2 GiB of data back (assuming a 2:1 compression ratio) it's going to need to be split into 1536 messages and cost about 1536000 cycles for all sending and receiving. On an ancient 25 MHz 80486 (the oldest/slowest/worst CPU I intend to support) that works out to about 62 ms. Compared to the cost of compressing/decompressing the data, the overhead of messaging is almost entirely irrelevant. If it happened regularly nobody would care.

However; it won't happen regularly either. PNG files would be converted into the OS's native file format as soon as possible (preferably, before they're stored in the OS's file system) and never converted from the OS's native file format back into PNG. None of the OS's native file formats use DEFLATE (including the OS's native "compressed file format").
Neither of these assumptions hold true for all uses of DEFLATE. Dealing with files is not particularly latency-sensitive, but on-the-fly compression/decompression as part of a network protocol (a multiplayer game? real-time data stream of some sort?), or compressed-in-RAM virtual memory (as opposed to storing it on disk), or other high-performance uses of compression, might not work out so well.

The best solution for latency-sensitive things is unlikely to be DEFLATE - compression times aren't great.

Rusky wrote:Your signature (which I assume you'd apply here) says scalable slow code can beat unscalable fast code, but scaling goes both directions and at different granularities. I will preemptively agree with you that overly-generic code is harmful, but there will always be pieces of code that need to be shared with applications that can't handle the latency of potentially-networked message passing.

There's definitely pieces of code where the (potential) network latency is unacceptable. Some are tiny (easier to re-implement than to find a pre-existing implementation), some are too small to matter ("cut & paste" is good enough), some are overly-general, some I don't care about (XML parsing).

I can't think of anything that doesn't fit in one of those categories. This doesn't mean there are none; but does imply it won't be common enough to be important.

Cheers,

Brendan

Rusky · Post by **Rusky** » Wed Feb 04, 2015 11:41 am

Brendan wrote:Finally, what if I said "all arrays of type T are of length Tn and must be indexed by an integer type that is either only capable of storing values from 0 to Tn-1 or can be proven to only hold a value from 0 to Tn-1, for any value of Tn; but there can be many different array types each with their own independent Tn"?

Surely you can see this is just another minor extension. It's still easy to prove there are no bounds problems. Is it still too limited?

That is definitely not a minor extension, as HoTT pointed out. Besides making the type system quite complicated, it has been proven that all possible type systems are incapable of proving array access to be in-bounds in every case, both because of user input and because of arbitrary computation.

The user-input problem is pretty trivial. You validate user inputs, and those validations are forced to exist by the type system because otherwise you'd be trying to (for example) access an array with an integer with a bigger range.

The arbitrary computation problem is much less trivial, and is why your type system will sometimes force more checks than just user-input validation. Luckily for you, all of these will have to be there anyway for correctness (like with my earlier example of iterators, which simply use the end-of-iteration check as the bounds check as well).

For example, let's say you're implementing a priority queue with a binary heap, stored as a complete tree in an array (it can even be fixed-size if you want). The type system cannot prove that the algorithms used on the heap never generate out-of-bounds array indices without forcing some run-time checks. Here's inserting/removing an item in a min-heap:

Code: Select all

fn insert(x) {
    int pos = ++size;

    // this check must be enforced or heap could be accessed past its end
    if (pos >= capacity)
        /* resize the array, or error */

    while (pos > 0 && x < heap[(pos - 1) / 2]) {
        heap[pos] = heap[(pos - 1) / 2];
        pos /= 2;
    }

    heap[pos] = x;
}

fn remove() {
    result = heap[0];
    heap[0] = heap[size--];

    int pos = 0;

    // this check must be enforced
    while (pos * 2 + 1 < size) {
        int child = pos * 2 + 1;

        // the first check here check must be enforced
        if (child + 1 < size && heap[child] > heap[child + 1])
            child = child + 1;

        if (heap[pos] < heap[child])
            return;

        swap(heap[pos], heap[child]);
        pos = child;
}

If this is what you meant by "all array bounds are checked statically," then I guess we all agreed from the start. But whatever you meant, the type system must enforce run-time checks or risk exposing crashes and vulnerabilities in the program.

OSDev.org

What do you think about managed code and OSes written in it?

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in

Re: What do you think about managed code and OSes written in