brute forcer
- Kevin McGuire
- Member
- Posts: 843
- Joined: Tue Nov 09, 2004 12:00 am
- Location: United States
- Contact:
- Kevin McGuire
- Member
- Posts: 843
- Joined: Tue Nov 09, 2004 12:00 am
- Location: United States
- Contact:
What I Did
I did what osdev64 was wanting me to do, which was integrate his idea into my threaded implementation. I had already noticed early on before he asked, and before Brynet-Inc posted that the method he was using showed a much faster way.
Clearing The MD5 Buffer
The reason osdev64's is much faster is because he does not clear the entire MD5 buffer for each call to the md5_hash function. This was the actual slow down. So, thanks for the idea osdev64.
Why Brynet-Inc Botched The Results From His Test
That also brings me to why Brynet-Inc is a dork with his last post. Simply, because he could not have been testing the implementations correctly or gathering the test results and presenting it to us correctly.
And, this one when compiled takes three seconds to run which is much higher than the number of seconds he gave in his last post, but has a higher hashes per second.
Anyway, I got 3.318144 million hashes per second using osdev64's idea of only clearing as much of the MD5 block buffer as you need. This is good, right? He was right too.
The Implementation
Just in case someone gets to comparing numbers again here is a official implementation. This is not a implementation by Kevin McGuire, but rather a implementation by GLNeo, osdev64, and Kevin McGuire. Anyway, that means no single one of us takes the blame - lol.
(implementation below)
http://kmcguire.jouleos.galekus.com/dok ... _threading
Changes In 'md5_hash'
The changes in md5_hash are the removal of code preparing the buffer. This code made sure it was a multiple of 64 bytes, added the padding, and cleared the remaining to zeros, and added the length in bits. This code would execute each time a hash was made even if only a byte or two needed clearing, or no bytes needed clearing which is a common case when simply advancing digits in the sequence.
I did what osdev64 was wanting me to do, which was integrate his idea into my threaded implementation. I had already noticed early on before he asked, and before Brynet-Inc posted that the method he was using showed a much faster way.
Clearing The MD5 Buffer
The reason osdev64's is much faster is because he does not clear the entire MD5 buffer for each call to the md5_hash function. This was the actual slow down. So, thanks for the idea osdev64.
Why Brynet-Inc Botched The Results From His Test
That also brings me to why Brynet-Inc is a dork with his last post. Simply, because he could not have been testing the implementations correctly or gathering the test results and presenting it to us correctly.
And, this one when compiled takes three seconds to run which is much higher than the number of seconds he gave in his last post, but has a higher hashes per second.
The Resultsocaine 5b9104dc42e2f8fdd3f6a801f7656d65 5
Collision Found!: 5b9104dc42e2f8fdd3f6a801f7656d65 is : glich
Cracking took: 2.38s
Average h/s: 1364077.47 h/s
$ ./brute.exe 5b9104dc42e2f8fdd3f6a801f7656d65 5
Collision Found!
hash[5b9104dc42e2f8fdd3f6a801f7656d65] = 'glich'
- time: 1.49s
- avg. hash/s: 2181605.39 h/s
Anyway, I got 3.318144 million hashes per second using osdev64's idea of only clearing as much of the MD5 block buffer as you need. This is good, right? He was right too.
The Implementation
Just in case someone gets to comparing numbers again here is a official implementation. This is not a implementation by Kevin McGuire, but rather a implementation by GLNeo, osdev64, and Kevin McGuire. Anyway, that means no single one of us takes the blame - lol.
(implementation below)
http://kmcguire.jouleos.galekus.com/dok ... _threading
Changes In 'md5_hash'
The changes in md5_hash are the removal of code preparing the buffer. This code made sure it was a multiple of 64 bytes, added the padding, and cleared the remaining to zeros, and added the length in bits. This code would execute each time a hash was made even if only a byte or two needed clearing, or no bytes needed clearing which is a common case when simply advancing digits in the sequence.
- Brynet-Inc
- Member
- Posts: 2426
- Joined: Tue Oct 17, 2006 9:29 pm
- Libera.chat IRC: brynet
- Location: Canada
- Contact:
well i don't think Brynet-Inc botched the results because they are consistent with what i am getting
also my previous post had code attached that is the sum of all our ideas and runs at ~2,190,000 on a 1.4GHz, so it is ~60% faster than Kevin McGuire's latest posting.
to end this lets post our latest build and let others try a test?, might work? os64dev, Kevin McGuire post yours and we can see which will be the official build, the threading is good for multi-core but on my computer it is the slowest build
p.s. lets try to keep this friendly
also my previous post had code attached that is the sum of all our ideas and runs at ~2,190,000 on a 1.4GHz, so it is ~60% faster than Kevin McGuire's latest posting.
to end this lets post our latest build and let others try a test?, might work? os64dev, Kevin McGuire post yours and we can see which will be the official build, the threading is good for multi-core but on my computer it is the slowest build
p.s. lets try to keep this friendly
- Attachments
-
- cocaine.c
- (8.33 KiB) Downloaded 157 times
- Kevin McGuire
- Member
- Posts: 843
- Joined: Tue Nov 09, 2004 12:00 am
- Location: United States
- Contact:
Umm.. I only run one thread.. I am not sure why someone would run more than one thread for testing hashes per second, although it can produce quicker searches in certain cases.
How can 2.1 million be faster than 3.3 million - and sometimes 4.0 million depending on the sequence length? Are you crazy? I had the same code as you. In actuality, I wrote the darn code and I should know how fast it runs.
And... if Brynet-Inc used the code I saw it only searched for sequences of six in length, unlike mine which correctly searches through the minimum and maximum length you specify... AND I did not see Brynet-Inc mention what he had those set on... ?? ? ? ? ? ? ? ?
osdev64 had the fastest implementation... not you. It was his idea to clear only select parts of the MD5 block between hashing messages..which is another reason why I can not understand how yours was right along at the same hashes per second as him.
---- pause ----
I did not want this to turn into who wrote what. I tried resolving it down to the three of us above, but man you are underhanded. So let me just bring the truth out and tell you that the code you just posted is really osdev64's. It looks like you just copied it and attached it calling it your own implementation.
And I could care less who's is the official build. I was giving it away for free..
How can 2.1 million be faster than 3.3 million - and sometimes 4.0 million depending on the sequence length? Are you crazy? I had the same code as you. In actuality, I wrote the darn code and I should know how fast it runs.
And... if Brynet-Inc used the code I saw it only searched for sequences of six in length, unlike mine which correctly searches through the minimum and maximum length you specify... AND I did not see Brynet-Inc mention what he had those set on... ?? ? ? ? ? ? ? ?
osdev64 had the fastest implementation... not you. It was his idea to clear only select parts of the MD5 block between hashing messages..which is another reason why I can not understand how yours was right along at the same hashes per second as him.
---- pause ----
I did not want this to turn into who wrote what. I tried resolving it down to the three of us above, but man you are underhanded. So let me just bring the truth out and tell you that the code you just posted is really osdev64's. It looks like you just copied it and attached it calling it your own implementation.
And I could care less who's is the official build. I was giving it away for free..
- Attachments
-
- md5.c
- (11.02 KiB) Downloaded 71 times
- Kevin McGuire
- Member
- Posts: 843
- Joined: Tue Nov 09, 2004 12:00 am
- Location: United States
- Contact:
gcc cocaine.c -lm -lpthread -O3 -march=i686 -o md5 && time ./md5 5b9104dc42e2f8fdd3f6a801f7656d65 5
Cracking took: 1.34s
Collision Found!: glitch
Average h/s: 2417674 h/s
real 2.006s
user 1.571s
sys 0.003s
gcc md5.c -lm -lpthread -O3 -march=i686 -o md5 && time ./md5 5b9104dc42e2f8fdd3f6a801f7656d65 1 47
initializing..
digest_target is: dc04915bfdf8e24201a8f6d3656d65f7
h/s: 4317414
(1) I found a collision!
message: glitch
g_thread_active_cnt:0
real 2.006s
user 1.566s
sys 0.005s
I also want you to notice that mine takes the same amount of time (slight longer in fractions of a second), but has more hashes per second. Why?
It is quite simple, and it also the reason why Brynet-Inc's result were botched, or more accurately the way he presented them ticked me off. You should notice by reviewing your code that it only searches for a sequenced of a specified length which is 5 in your case. You can see the 5 being passed on the command line. If you look at mine you will notice it takes two options which are the minimum and maximum sequence (message) length to search for.
So when someone does some tests and throw the results up and puts my name at the end with a bunch of periods trailing saying twenty six seconds I think I will get a little bent out of shape wondering what size brain this person has.
If you do not believe me then look at the code. My implementation is actually mine from the start using osdev64 crucial idea boosting the hashes per second beyond what I would have done by my self. So thanks, osdev64. -- Not the mention mine and osdev64's actually support multi-threading for multi-processor systems.
Yeah. You and Brynet-Inc make a good team with misrepresented results and underhanded tactics!
Cracking took: 1.34s
Collision Found!: glitch
Average h/s: 2417674 h/s
real 2.006s
user 1.571s
sys 0.003s
gcc md5.c -lm -lpthread -O3 -march=i686 -o md5 && time ./md5 5b9104dc42e2f8fdd3f6a801f7656d65 1 47
initializing..
digest_target is: dc04915bfdf8e24201a8f6d3656d65f7
h/s: 4317414
(1) I found a collision!
message: glitch
g_thread_active_cnt:0
real 2.006s
user 1.566s
sys 0.005s
I also want you to notice that mine takes the same amount of time (slight longer in fractions of a second), but has more hashes per second. Why?
It is quite simple, and it also the reason why Brynet-Inc's result were botched, or more accurately the way he presented them ticked me off. You should notice by reviewing your code that it only searches for a sequenced of a specified length which is 5 in your case. You can see the 5 being passed on the command line. If you look at mine you will notice it takes two options which are the minimum and maximum sequence (message) length to search for.
So when someone does some tests and throw the results up and puts my name at the end with a bunch of periods trailing saying twenty six seconds I think I will get a little bent out of shape wondering what size brain this person has.
If you do not believe me then look at the code. My implementation is actually mine from the start using osdev64 crucial idea boosting the hashes per second beyond what I would have done by my self. So thanks, osdev64. -- Not the mention mine and osdev64's actually support multi-threading for multi-processor systems.
Yeah. You and Brynet-Inc make a good team with misrepresented results and underhanded tactics!
run it as: i don't know what you'll get but i would just like to see
besides I can add range "1 47" support easily, its not that big of a thing
Code: Select all
./md5 5b9104dc42e2f8fdd3f6a801f7656d65 5 5
besides I can add range "1 47" support easily, its not that big of a thing
- Kevin McGuire
- Member
- Posts: 843
- Joined: Tue Nov 09, 2004 12:00 am
- Location: United States
- Contact:
- Kevin McGuire
- Member
- Posts: 843
- Joined: Tue Nov 09, 2004 12:00 am
- Location: United States
- Contact:
The creation time for the one thread is keeping the times high. If I remove that and make a direct call I can get:
real 1.871s
user 1.618s
sys 0.003s
I imagine if I removed the mutual exclusion code it would become even faster, and would place it on a level playing field.
---
Removing one of the locks gives:
real: 1.793s
user: 1.536s
sys: 0.005s
My character map included 26 lower case letters and a space. If I removed the space and had twenty-six letters like you I get:
real: 1.614s
user: 1.382s
sys: 0.007s
real 1.871s
user 1.618s
sys 0.003s
I imagine if I removed the mutual exclusion code it would become even faster, and would place it on a level playing field.
---
Removing one of the locks gives:
real: 1.793s
user: 1.536s
sys: 0.005s
My character map included 26 lower case letters and a space. If I removed the space and had twenty-six letters like you I get:
real: 1.614s
user: 1.382s
sys: 0.007s
Last edited by Kevin McGuire on Wed Jul 04, 2007 10:15 pm, edited 1 time in total.
- Kevin McGuire
- Member
- Posts: 843
- Joined: Tue Nov 09, 2004 12:00 am
- Location: United States
- Contact:
Well. Lets compare then with a larger message? Say 11 characters long?
... way too long.... Lets try 8 characters long..
hash: a341ecf4f58085643d0a7a6454842b20 ./log
message: 6d 63 67 75 69 72 65 0a mcguire.
char charset[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x0a}; // lower_alpha
It should take no more than 14 hours for a decent brute force, right?
I started yours and mine together, and I will see tomorrow which one was the fastest, or if I screwed up and typed the numbers wrong or some weird crap.
... way too long.... Lets try 8 characters long..
hash: a341ecf4f58085643d0a7a6454842b20 ./log
message: 6d 63 67 75 69 72 65 0a mcguire.
char charset[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x0a}; // lower_alpha
It should take no more than 14 hours for a decent brute force, right?
I started yours and mine together, and I will see tomorrow which one was the fastest, or if I screwed up and typed the numbers wrong or some weird crap.
Interesting topic this becomes after i go to sleep.
Keep in mind that you are comparing apples with apples under a different microscope . Meaning do you both have exactly the same hardware, processor, memory, programs running etc... This is also the reason why Brynet-Inc could have his outcome. He was running the test on a pentium 2 @ 400. Well clearing and coping memory is certainly very expensive on that type of processor.
For the 'official' build stuff. *sigh* i don't care about that really. I will test Kevin's implementation in a few and probably will do a complete rewrite of the code if i wish to use it for myself. (if its ok with you Kev).
Regarding the threading and normal sollution. This is kind of normal behaviour: multithreading is allways expensive on a single core/processor computer system, simple because you add overhead(read context switching). Multi-threading is not the same as parallel computing, hence my remark about 1 thread per core/cpu. So to have a fast sollution on any system, the code needs to determing the number of cpus and adapt to it. no threading on single core, 2 threads on dual core, 4 threads on quad core, etc.. This is not hard just a lot more work. Suddenly i remember a discussion about bloat-ware... this might also be a reason... ok back on topic.
Still the fastcall and inline a tenths of a percent compared to the time taken in MD5 hash. so the optimisation is IMHO small, though it might work for your system.
So what did we learn about this 3 people can make a darn fast brute forcer and are instantly known as hackers in the osdev community
Keep in mind that you are comparing apples with apples under a different microscope . Meaning do you both have exactly the same hardware, processor, memory, programs running etc... This is also the reason why Brynet-Inc could have his outcome. He was running the test on a pentium 2 @ 400. Well clearing and coping memory is certainly very expensive on that type of processor.
For the 'official' build stuff. *sigh* i don't care about that really. I will test Kevin's implementation in a few and probably will do a complete rewrite of the code if i wish to use it for myself. (if its ok with you Kev).
Regarding the threading and normal sollution. This is kind of normal behaviour: multithreading is allways expensive on a single core/processor computer system, simple because you add overhead(read context switching). Multi-threading is not the same as parallel computing, hence my remark about 1 thread per core/cpu. So to have a fast sollution on any system, the code needs to determing the number of cpus and adapt to it. no threading on single core, 2 threads on dual core, 4 threads on quad core, etc.. This is not hard just a lot more work. Suddenly i remember a discussion about bloat-ware... this might also be a reason... ok back on topic.
hopefully not on the same system because then the outcome would be meaningless. You don't know how the OS schedules the processes/threads, however the 14 hour time might level it out. A more fairer test would be to run each instance in solitude.Kevin wrote:It should take no more than 14 hours for a decent brute force, right?
I started yours and mine together, and I will see tomorrow which one was the fastest, or if I screwed up and typed the numbers wrong or some weird crap.
That would be the inline stuff i guess.. I've also tested with inline and fastcall but it made my implementation slower. The reason for this is code locality when not inlining the compiler alligns the code to a new 16 byte boundry and takes a fresh set of register values. in my case this means better cache and register usage. The only memory moves i got were the writes to the digest. so in 64-bit long mode this will also be register moves. i need to test this one day..GLneo wrote:i don't see why your code would be any faster than mine, i don't see any optimizations that you have the I don't, but i do see optimizations I have that you don't.
Still the fastcall and inline a tenths of a percent compared to the time taken in MD5 hash. so the optimisation is IMHO small, though it might work for your system.
So what did we learn about this 3 people can make a darn fast brute forcer and are instantly known as hackers in the osdev community
Author of COBOS