Optimising JIT'ing
- AndrewAPrice
- Member
- Posts: 2303
- Joined: Mon Jun 05, 2006 11:00 pm
- Location: USA (and Australia)
Optimising JIT'ing
I've been thinking about an optimising JIT implementation.
Each executable file includes two copies of the program: one in bytecode and (optionally) one in native code. When a program is compiled, the compiler generates bytecode and also attaches a list of CPU optimisations that the program can take advantage of.
When the program is run, the OS will look at the list of optimisations that can be applied to the program, see which ones are supported by the processor/system (like certain registers, instruction sets, ways to re-arrange code to run faster), and then optimise the bytecode into native code for that system. This will probably take a while for large executables, which is why the resulting native program could also be stored back in the executable.
When a native program is built and stored back into the executable, it will also carry a list of the processor optimisations that have been applied to it (along with the architecture of the system, etc).
The next time you run the executable, the OS will see that a native version exists. It will look at the list of system-specific optimisations that have been applied, and if one of them isn't supported on this system, or an optimisation is available that hasn't been applied yet, the native image will be rebuilt. This solves compatibility problems when transferring programs between systems.
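The check described above could be sketched roughly like this, in Python for brevity. Everything here is an assumption made for illustration: the `Executable` and `NativeImage` records, their field names, and the feature strings are hypothetical and not part of any real executable format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NativeImage:
    """Hypothetical metadata stored alongside the cached native code."""
    arch: str            # architecture the image was built for
    applied: frozenset   # optimisations that were actually used


@dataclass(frozen=True)
class Executable:
    """Hypothetical executable: bytecode hints plus an optional native image."""
    hints: frozenset                      # optimisations the compiler says could help
    native: Optional[NativeImage] = None  # None until the first native build


def needs_rebuild(exe, system_arch, system_features):
    """Return True if the native image is missing or no longer fits this system."""
    if exe.native is None:
        return True                               # never compiled to native
    if exe.native.arch != system_arch:
        return True                               # built for another architecture
    if exe.native.applied - system_features:
        return True                               # uses something this CPU lacks
    wanted = exe.hints & system_features          # hints this CPU could exploit
    return bool(wanted - exe.native.applied)      # a usable hint was never applied


exe = Executable(
    hints=frozenset({"sse", "sse2"}),
    native=NativeImage(arch="x86", applied=frozenset({"sse"})),
)
old_cpu = needs_rebuild(exe, "x86", frozenset({"sse"}))          # image still fits
new_cpu = needs_rebuild(exe, "x86", frozenset({"sse", "sse2"}))  # sse2 is now usable
```

With these inputs `old_cpu` is `False` and `new_cpu` is `True`, so the image is only rebuilt when it stops matching the machine or a new usable feature shows up.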
Of course, there will be ways to run a program without optimising it (direct bytecode to native) and without storing the native version back into the program (the default when running from a read-only or network file system).
Why do I think this is worthwhile? Because instruction sets like SSE2/3/4/5 are constantly being developed, and not all processors support them. So if a program was optimised on each system to take advantage of whatever instruction sets were available, you could increase program performance on new processors while maintaining compatibility with older systems.
My OS is Perception.
Sounds like a pretty cool concept, but surely this will add a significant amount of load time or lag while the program is running? I know this will only happen the first time on each system, but it could take a significant amount of time for large programs.
Also, this will increase the actual size of the file by a fair amount. That may not be an issue, as HDD prices are constantly coming down, but taking the case of large programs again, it could mean that a 500MB binary is now 800-900MB or even a gigabyte.
- AndrewAPrice
- Member
- Posts: 2303
- Joined: Mon Jun 05, 2006 11:00 pm
- Location: USA (and Australia)
The list will be simple. For example, attached to the bytecode:
- This program will benefit from hyperthreading.
- This program will benefit from SSE.
And attached to the native code:
- This program has been compiled for an x86 system.
- This program utilises SSE.
And the OS will compare which features are or are not available on that system and rebuild the program as necessary.
A 500MB binary would probably have barely 10KB of executable code; the rest (roughly 499.99MB) would be embedded resources that the kernel wouldn't even need to touch.
My OS is Perception.
Ah, that makes sense about the executable vs. included resource sizes. The size argument is still valid, but it will have much less impact.
In that case, I think it's an even cooler concept, and if you can cut down the recompile (conversion to native code) time, then I think you're onto a winner.
Re: Optimising JIT'ing
MessiahAndrw wrote: Why do I think this is worth while?
Probably for the same reason I do. This has been a design plan of mine for many years.
However, I will not be focusing entirely on actual JIT, as that will be reserved for sandboxed (unsafe) apps.
Instead, I intend to focus on native compilation. I tentatively call this "optimizing installation" for any program that is installed from a bytecode installation source and assembled/compiled/linked/converted to run directly on the system architecture. Stand-alone programs will be forced to run in the sand-box, as their native versions must be accounted for in some manner after installation/optimization occurs.
When CPU upgrades/downgrades occur and new optimizations are available/needed, this can trigger applications to "re-optimize" from the program's bytecode source, which will be cached somewhere on the hard drive.
There are many more details, of course, but I did not want to hijack this thread too much.
Overall, it is a massive undertaking, but I can always shoot for the bare-minimum CPU in an architecture line (e.g. i386) and work my way up the optimization chain as time permits.
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
Re: Optimising JIT'ing
MessiahAndrw wrote: I've been thinking about a optimizing JIT-implementation.
I don't think what you've described could be called JIT -- it's more like the install-time pre-compilation done by Singularity.
In a JIT system, the executable that's loaded is nearly all bytecode, with only a small bootstrap portion that calls the run-time environment. The v-table entries for each class in such systems are initially set to point to JIT-compiler routines, so that the first time a method is called, it is compiled to native code and the corresponding v-table entries are patched. At least that's the general idea...
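To make the stub-and-patch idea concrete, here is a toy sketch in Python. It is not how any real runtime is implemented: `LazyMethodTable` and `toy_compile` are hypothetical names, the "bytecode" is just a list of `(op, constant)` pairs, and the "native code" is an ordinary Python closure standing in for generated machine code.

```python
class LazyMethodTable:
    """Toy v-table: every slot starts as a stub that compiles on first call."""

    def __init__(self, bytecode, compile_fn):
        self.compile_count = 0  # how many methods have been compiled so far
        self.slots = {
            name: self._make_stub(name, code, compile_fn)
            for name, code in bytecode.items()
        }

    def _make_stub(self, name, code, compile_fn):
        def stub(*args):
            native = compile_fn(code)   # "JIT compile" this one method
            self.compile_count += 1
            self.slots[name] = native   # patch the slot so later calls skip the stub
            return native(*args)
        return stub

    def call(self, name, *args):
        return self.slots[name](*args)


def toy_compile(ops):
    """Stand-in for a code generator: folds (op, constant) pairs into a closure."""
    def native(x):
        for op, k in ops:
            x = x + k if op == "add" else x * k
        return x
    return native


table = LazyMethodTable({"f": [("mul", 2), ("add", 1)]}, toy_compile)
first = table.call("f", 3)    # goes through the stub, which compiles and patches
second = table.call("f", 3)   # goes straight to the patched entry
```

Both calls return 7, and `table.compile_count` stays at 1 after the second call, because the stub only ever runs once per method.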
Whether you need JIT or just pre-compilation depends a lot on whether you allow dynamic loading. How would that work in your system?
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
Re: Optimising JIT'ing
MessiahAndrw wrote: I've been thinking about a optimizing JIT-implementation.
Colonel Kernel wrote: I don't think what you've described could be called JIT -- it's more like the install-time pre-compilation done by Singularity.
This is true.
However, and believe it or not, my design goals pre-date the announcement of Singularity by at least a year. I think C# was out already, but as soon as I heard and saw what Singularity was, I dropped my design in utter frustration. I knew they could fully release in 1 year (not including the many patches/updates that will follow) what it would take me another 5 just to prototype. I have an overbearing habit of not wanting to waste my minuscule amount of free time in reinventing the wheel.
Thankfully, C#/CIL/.Net have ultimately panned out to be quite different from my designs. So after a few years of neglect, I am back to developing DynatOS and the Dynatos VM based upon my original plans.
Re: Optimising JIT'ing
MessiahAndrw wrote: I've been thinking about a optimizing JIT-implementation.
Colonel Kernel wrote: I don't think what you've described could be called JIT -- it's more like the install-time pre-compilation done by Singularity. In a JIT system, the executable that's loaded is nearly all bytecode, with only a small bootstrap portion that calls the run-time environment. The v-table for each class in such systems are initially set to point to JIT-compiler routines so that the first time a method is called, it is compiled to native code and the corresponding v-tables are patched. At least that's the general idea... Whether you need JIT or just pre-compilation depends a lot on whether you allow dynamic loading. How would that work in your system?
Would you call .NET JIT? I would, and it does the above with a few changes.
Firstly, native code is not placed back into the original file; it is stored separately. Secondly, not all code is compiled to native for the long term, only that which is specified. Lastly, the compiler runs in the background whenever there are assemblies to compile, instead of waiting for them to be run.
Compilers themselves have a hard time deciding when to apply the right optimization, which jump is more likely to be taken, or whether a program needs SSE or not. In many cases the benefit of using multiple threads can only be verified by testing the algorithm (measuring the time). I would probably start by writing the compiler first.
You'll need to put in a lot of time before this yields a decent (5%+) speed increase.
If you plan to store the optimized version, I would run the optimization process on all files on the hard drive only once, during program installation.
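The point about verifying a benefit by measuring can be sketched as a small timing harness, again in Python. This is only an illustration under stated assumptions: `pick_fastest` is a hypothetical helper, and the two summing functions stand in for two differently optimized builds of the same routine.

```python
import time


def pick_fastest(variants, data, repeats=5):
    """Time each candidate on the same input and return the name of the quickest."""
    best_name, best_time = None, float("inf")
    for name, fn in variants.items():
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(data)
            # Keep the best of several runs to damp scheduling noise.
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def sum_loop(values):
    """Plain loop, standing in for an unoptimized build."""
    total = 0
    for v in values:
        total += v
    return total


def sum_builtin(values):
    """Built-in sum, standing in for an optimized build."""
    return sum(values)


variants = {"loop": sum_loop, "builtin": sum_builtin}
winner = pick_fastest(variants, list(range(100_000)))
```

Which variant wins depends on the machine and the input, which is exactly why the choice has to be measured rather than assumed.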
I think a simpler solution would be for the program writers to cross compile their program to a lot of different platforms. If someone has a significantly different system, they could just compile from source.
There's an option in the Intel compiler to do a runtime analysis of a program where it will insert the correct branch hints based on actual program execution examples.
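The general idea of profile-guided branch hints can be sketched as follows. This is not how the Intel compiler does it internally; `BranchProfile`, the branch identifiers, and the "likely"/"unlikely" labels are all hypothetical and only show the principle of counting outcomes during a training run and deriving hints from the counts.

```python
from collections import Counter


class BranchProfile:
    """Counts how often each branch was taken during a training run."""

    def __init__(self):
        self.taken = Counter()
        self.total = Counter()

    def record(self, branch_id, was_taken):
        self.total[branch_id] += 1
        if was_taken:
            self.taken[branch_id] += 1

    def hints(self, threshold=0.5):
        """Label each observed branch by the fraction of times it was taken."""
        return {
            branch: "likely" if self.taken[branch] / self.total[branch] > threshold else "unlikely"
            for branch in self.total
        }


profile = BranchProfile()
for i in range(10):
    profile.record("loop_continue", i < 9)   # taken 9 times out of 10
    profile.record("error_path", i == 0)     # taken 1 time out of 10
hints = profile.hints()
```

Here `hints` maps `"loop_continue"` to `"likely"` and `"error_path"` to `"unlikely"`; a compiler could use that kind of information to lay out the hot path first.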