Null pointers and pointer safety

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
LtG
Member
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Null pointers and pointer safety

Post by LtG »

Sik wrote:How do you enforce a verifiable representation if your OS takes native binaries though? The whole point I was making is that a native binary could have been written in just about any language (or even different portions written in several languages) and the OS has absolutely no way to know, making any sort of enforcement impossible.
I'd say a bit difficult, but certainly possible.

If you exclude a few categories of apps (mainly self modifying code, which is frowned upon as is) it becomes easier. Not sure if this is exactly the same as the halting problem or just has significant overlap with it, but I'm pretty sure it really is doable to verify machine code.

As a simple example, consider translating assembly back into C, or C# or some other managed language where the verification is easier. I would guess that for most real world apps the outcome of verification would be either of the following:
- Verified ok
- Verified not ok; likely due to a bug; such bug would likely be a security vulnerability under normal circumstances; fix the bug and the code verifies ok

I'm not sure on the performance of such verification but likely it would be somewhat slow, probably at least several seconds if not minutes to days. However that's the developers problem (and cloud can help for release builds). Once verified the proof can then be attached to the binary and the end system only needs to evaluate the proof is valid for the binary given and even this need only be done once (install time) and after that you can just check hash.

As for developing such system, that might be complex and at least I'm not planning on doing it for the time being (not having enough time for OS dev as it is). =)

edit. I think the same can be done "recursively" for self modifying code, at least to some extent, but certainly becomes more complex.
User avatar
Rusky
Member
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Null pointers and pointer safety

Post by Rusky »

Sik wrote:How do you enforce a verifiable representation if your OS takes native binaries though? The whole point I was making is that a native binary could have been written in just about any language (or even different portions written in several languages) and the OS has absolutely no way to know, making any sort of enforcement impossible.

If your OS can run programs written in C in any form, you better get ready to deal with the risk of wrong addresses being accessed, because it will happen.
We're in agreement. All I was pointing out is that the problem is not "real addresses themselves as pointers." You do need more information than a raw x86/etc binary in order to enforce memory safety.
rdos
Member
Member
Posts: 3304
Joined: Wed Oct 01, 2008 1:55 pm

Re: Null pointers and pointer safety

Post by rdos »

Null values, or values near 0, can come from memory corruption errors, or from invalid retyping in C. That's why 0, but also addresses near 0, should give faults. I regard the whole first 4MBs as invalid in user-space.

As for techniques of pointer safety, and identifying problems is user code, I have a special user runtime library that will do malloc and delete in kernel-space instead of from the runtime heap. This feature will sparsely (skipping one page) allocate whole pages regardless of size of the request. It will then fill a special pattern in the part of the page that is beyond the allocated size, and check it when the memory block is freed. The tool will also detect double-frees, and using memory after freeing it. This is a tool I use to debug memory allocation problems in C.
StudlyCaps
Member
Member
Posts: 232
Joined: Mon Jul 25, 2016 6:54 pm
Location: Adelaide, Australia

Re: Null pointers and pointer safety

Post by StudlyCaps »

LtG wrote: I'd say a bit difficult, but certainly possible.
I can't do a proper proof, but I'm almost certain this is equivalent to the halting problem.
Take

Code: Select all

int i = func(); 
char c = *( ( char* ) i );
you need to know every possible return value of func() which is equivalent to the halting problem on func(), right?

Trying to translate back to a managed language is either impossible, or it actually makes no difference. Think about it, if there were a way to translate native code to a managed language then there must be a 1:1 binding for all possible operations, and if that were true, the complexity of analyzing the code is identical.
Icee
Member
Member
Posts: 100
Joined: Wed Jan 08, 2014 8:41 am
Location: Moscow, Russia

Re: Null pointers and pointer safety

Post by Icee »

StudlyCaps wrote:I can't do a proper proof, but I'm almost certain this is equivalent to the halting problem.
It indeed is, see Rice's theorem.
LtG
Member
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Null pointers and pointer safety

Post by LtG »

StudlyCaps wrote: I can't do a proper proof, but I'm almost certain this is equivalent to the halting problem.
Take

Code: Select all

int i = func(); 
char c = *( ( char* ) i );
you need to know every possible return value of func() which is equivalent to the halting problem on func(), right?
I think that's a different problem, our problem is more like:

Code: Select all

int func() {
char* c = (char*) mmap(SomeValidAddress, SomeNumberOfBytes);
c[0] = 'a';
return (int) c;
}

int i = func(); 
char c = *( ( char* ) i );
The difference being that if you can't see what "func()" does, then I think there's no way of knowing the outcome, however in practice the OS/validator must be able to see what "func()" unless "func()" doesn't exist, but if it doesn't exist then the caller would not be able call it either.

Note, I've never attempted to do this, so I don't know how impractical or slow it would become. Also it might be possible that only part of existing software could be verified as safe and the rest would be undecidable. Given how many bugs there are in software in general it might be that most software falls in to the latter category, though those parts in those programs could be made into "unsafe" functions which have bounds checking and other safety features (similar to emulation or JIT) and allowing much of the app run natively.

But realistically I'm not sure if any of that is worth the effort when designing a new language that doesn't suck would go a long way of solving that issue _and_ make programming in general a lot saner.
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Re: Null pointers and pointer safety

Post by Korona »

I doesn't matter if you can see what func() does. Turing machines can be encoded without self modifying code etc. The reduction to the halting problem still works.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
StudlyCaps
Member
Member
Posts: 232
Joined: Mon Jul 25, 2016 6:54 pm
Location: Adelaide, Australia

Re: Null pointers and pointer safety

Post by StudlyCaps »

That's the thing, any Turing complete language can construct a function which cannot be analyzed by another program. Not just that it will be slow, but that it will literally run forever without finishing.
This isn't something super rare either, most non-trivial functions will not be analyzable. Rice's theorem mentioned by Icee actually does formally prove this.

However most managed languages do actually have pretty great static analysis tools, but they can only catch super simple pointer problems. They mostly ask for null checks unless it is obvious that things aren't null when dereferenced. This is fine if you want to use them yourself, but pretty much impossible to enforce at the OS level.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Null pointers and pointer safety

Post by Brendan »

Hi,
Korona wrote:I doesn't matter if you can see what func() does. Turing machines can be encoded without self modifying code etc. The reduction to the halting problem still works.
Except, everyone is forgetting that we only care if the code is "safe" or "unsafe", and don't care if the code halts or not, and therefore the halting problem is irrelevant.

Consider this code:

Code: Select all

    do {
        x = rand() % 10;
    } while( x > 0 );
Is it safe or unsafe? It's trivial to prove that it is safe. Will it halt? Who cares.

Consider this code:

Code: Select all

    x = rand() % 10;
    y = 100 / x;
Is it safe or unsafe? It's trivial to prove that it's not safe. Will it halt? Who cares.

You can categorise any piece of code as "safe", "unsafe" or "undecided". With extremely simple analysis almost all code would be categorised as "undecided"; and as the analysis gets more complicated less code ends up being categorised as "undecided"; and with enough work/complexity you reach a point where you can just assume that "undecided" means "unsafe" (or define "safe" as "provably safe", and define "unsafe" as "not provably safe").

Essentially; with sufficiently advanced/complex analysis; for the "undecided" case you can lie and say the code is unsafe (even though you don't know if it's safe or unsafe), and the programmer that wrote the code won't be able to tell that you lied because the programmer that wrote the code won't be able to prove that it's safe either.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
goku420
Member
Member
Posts: 51
Joined: Wed Jul 10, 2013 9:11 am

Re: Null pointers and pointer safety

Post by goku420 »

The above argument made by Brendan is usually made by compiler developers (and for good reason) when people complain about lack of diagnostics, usually for catching non-trivial undefined behavior. There's a reason why the C++ specification says "no diagnostic required" in situations where it would be difficult or impossible for a vendor to reasonably produce an error. Templates and the ODR are classic examples.

I personally use youcompleteme which uses a Clang static analyzer back-end with vim. I'm sure somebody who is hardcore can port ubsan et al. for their OS. Personally, now that I've managed to port libstdc++, I can use libraries like range-v3 and GSL which help reduce a class of programming errors with misuse of pointers and containers.
alexfru
Member
Member
Posts: 1112
Joined: Tue Mar 04, 2014 5:27 am

Re: Null pointers and pointer safety

Post by alexfru »

Brendan wrote: Consider this code:

Code: Select all

    do {
        x = rand() % 10;
    } while( x > 0 );
Is it safe or unsafe? It's trivial to prove that it is safe. Will it halt? Who cares.
There are tasks where looping forever (or just taking too much time to complete) is unsafe. Right?
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Null pointers and pointer safety

Post by Brendan »

Hi,
alexfru wrote:There are tasks where looping forever (or just taking too much time to complete) is unsafe. Right?
No. If something is full of bugs, never does the right thing, and locks up when it shouldn't; then it can still be perfectly safe.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Icee
Member
Member
Posts: 100
Joined: Wed Jan 08, 2014 8:41 am
Location: Moscow, Russia

Re: Null pointers and pointer safety

Post by Icee »

Some replies here are missing an essential thing: in reality you don't get to analyze a snippet of code. You have to decide whether a given program (as in isolated service) is "safe" or not as a whole. This does not really boil down to proof by composition. A program composed of multiple "safe" pieces it not necessarily "safe". I am putting "safe" in irony quotes here because even providing the formal definition of what "safe" is is a huge task by itself, with PhD theses on this subject popping out every now and then.

Mathematical apparatus exists for whole-program analysis in form of lattices, transfer functions, fixpoint solution, etc. The key problem here is that you have to consider all possible paths in the program, i.e. analyze its control flow in its entirety (cf. MOP solution versus MFP solution), and this is where the halting problem comes into play. Anytime you are looking at interprocedural analysis you also probably want context sensitivity, again a huge thing that it hard to tune right (i.e. lower the amount of false positives while not giving rise to false negatives).

There are also verification techniques that employ automatic or automated proof construction, and that is even more complicated because you also have to provide formal semantics of the entire machine. You cannot simply analyze the C sources because of possible flaws introduced during compilation and linking (mostly due to undefined behavior in the source code). Multicore/multithread stuff comes into play, too. The result is that the specification you are verifying the code against is comparable in its size to the actual code, and therefore may contain errors of its own.

I could go on rambling on this topic forever, but what I'm really trying to say here is not that such analysis is practically impossible at all, but rather that it comes with a ton of limitations that may or may not work for a given scenario and are not to be taken lightly.
goku420
Member
Member
Posts: 51
Joined: Wed Jul 10, 2013 9:11 am

Re: Null pointers and pointer safety

Post by goku420 »

Icee wrote:I could go on rambling on this topic forever, but what I'm really trying to say here is not that such analysis is practically impossible at all, but rather that it comes with a ton of limitations that may or may not work for a given scenario and are not to be taken lightly.
You could argue that allowing undefined behavior to occur in the code (i.e a null pointer dereference) and trying to catch it via debugging classes (like "safe" iterators) is a pointless exercise since at the point of undefined behavior, the entire program becomes undefined behavior. And the danger is not that your harddrive will be reformatted (though in kernel code that could very well be the case) or you'll be attacked by nasal demons, but that the compiler is no longer bound by the specification and optimize your code in "unsafe" ways.

Further, the simple existence of the safe iterators can affect your code in unintended ways that will cause problems of its own; ubsan can corrupt the stack, libstdc++'s debug mode can make a program not crash by coincidence even if you've corrupted the stack and so on.

At the least static analysis tools don't touch the binaries; and no static analysis tool is pretending to not have false positives or be unlimited in scope. Still static analyzers have an advantage in that they won't have to compile/link the entire program to point out errors in a specific source file.
Icee
Member
Member
Posts: 100
Joined: Wed Jan 08, 2014 8:41 am
Location: Moscow, Russia

Re: Null pointers and pointer safety

Post by Icee »

goku420 wrote:At the least static analysis tools don't touch the binaries; and no static analysis tool is pretending to not have false positives or be unlimited in scope. Still static analyzers have an advantage in that they won't have to compile/link the entire program to point out errors in a specific source file.
I'm pretty much aware of that (mostly because developing program analysis tools has been my bread and butter for the last ten years), and this is exactly what I've been trying to convey here: it is unlikely for a static analyzer to be developed for allowing "safe" code through that at the same time:
(a) has no false negatives under a given additional set of limitations imposed on the program;
(b) has small amount of predictable false positives so that it's actually possible to restructure the program in such a way that is passes;
(c) the imposed limitations still allow to implement every program the system requires.

Static analysis undoubtedly makes software better but the thread strikes me as an attempt to say that static analysis is sufficient to mark generic binary code "safe" which I tend to doubt.
Post Reply