brute forcer

GLneo · Post by **GLneo** » Wed Jul 04, 2007 5:11 pm

have you tried the code i just posted above you post?

Kevin McGuire · Post by **Kevin McGuire** » Wed Jul 04, 2007 5:54 pm

GLneo wrote:have you tried the code i just posted above you post?

It is no telling what he tried. He is a dork, and has no idea what he was putting together or timing.

Kevin McGuire · Post by **Kevin McGuire** » Wed Jul 04, 2007 6:41 pm

What I Did
I did what osdev64 was wanting me to do, which was integrate his idea into my threaded implementation. I had already noticed early on before he asked, and before Brynet-Inc posted that the method he was using showed a much faster way.

Clearing The MD5 Buffer
The reason osdev64's is much faster is because he does not clear the entire MD5 buffer for each call to the md5_hash function. This was the actual slow down. So, thanks for the idea osdev64.

Why Brynet-Inc Botched The Results From His Test
That also brings me to why Brynet-Inc is a dork with his last post. Simply, because he could not have been testing the implementations correctly or gathering the test results and presenting it to us correctly.

And, this one when compiled takes three seconds to run which is much higher than the number of seconds he gave in his last post, but has a higher hashes per second.

ocaine 5b9104dc42e2f8fdd3f6a801f7656d65 5
Collision Found!: 5b9104dc42e2f8fdd3f6a801f7656d65 is : glich
Cracking took: 2.38s
Average h/s: 1364077.47 h/s

$ ./brute.exe 5b9104dc42e2f8fdd3f6a801f7656d65 5
Collision Found!
hash[5b9104dc42e2f8fdd3f6a801f7656d65] = 'glich'
- time: 1.49s
- avg. hash/s: 2181605.39 h/s

The Results
Anyway, I got 3.318144 million hashes per second using osdev64's idea of only clearing as much of the MD5 block buffer as you need. This is good, right? He was right too.

The Implementation
Just in case someone gets to comparing numbers again here is a official implementation. This is not a implementation by Kevin McGuire, but rather a implementation by GLNeo, osdev64, and Kevin McGuire. Anyway, that means no single one of us takes the blame - lol.
(implementation below)
http://kmcguire.jouleos.galekus.com/dok ... _threading

Changes In 'md5_hash'
The changes in md5_hash are the removal of code preparing the buffer. This code made sure it was a multiple of 64 bytes, added the padding, and cleared the remaining to zeros, and added the length in bits. This code would execute each time a hash was made even if only a byte or two needed clearing, or no bytes needed clearing which is a common case when simply advancing digits in the sequence.

Brynet-Inc · Post by **Brynet-Inc** » Wed Jul 04, 2007 7:26 pm

I'm a dork? Interesting.. if you want to resort to childish name calling.. I could call you a brainless American.

We even?

GLneo · Post by **GLneo** » Wed Jul 04, 2007 9:07 pm

well i don't think Brynet-Inc botched the results because they are consistent with what i am getting

also my previous post had code attached that is the sum of all our ideas and runs at ~2,190,000 on a 1.4GHz, so it is ~60% faster than Kevin McGuire's latest posting.

to end this lets post our latest build and let others try a test?, might work? os64dev, Kevin McGuire post yours and we can see which will be the official build, the threading is good for multi-core but on my computer it is the slowest build

p.s. lets try to keep this friendly

Kevin McGuire · Post by **Kevin McGuire** » Wed Jul 04, 2007 9:15 pm

Umm.. I only run one thread.. I am not sure why someone would run more than one thread for testing hashes per second, although it can produce quicker searches in certain cases.

How can 2.1 million be faster than 3.3 million - and sometimes 4.0 million depending on the sequence length? Are you crazy? I had the same code as you. In actuality, I wrote the darn code and I should know how fast it runs.

And... if Brynet-Inc used the code I saw it only searched for sequences of six in length, unlike mine which correctly searches through the minimum and maximum length you specify... AND I did not see Brynet-Inc mention what he had those set on... ?? ? ? ? ? ? ? ?

osdev64 had the fastest implementation... not you. It was his idea to clear only select parts of the MD5 block between hashing messages..which is another reason why I can not understand how yours was right along at the same hashes per second as him.

---- pause ----

I did not want this to turn into who wrote what. I tried resolving it down to the three of us above, but man you are underhanded. So let me just bring the truth out and tell you that the code you just posted is really osdev64's. It looks like you just copied it and attached it calling it your own implementation.

And I could care less who's is the official build. I was giving it away for free..

GLneo · Post by **GLneo** » Wed Jul 04, 2007 9:42 pm

I'm not clamming it is my code, i did take os64dev's code an add it to mine so mine has my optimizations AND os64dev AND yours so that is why mine is more "official", the last thing we need is ten forks of the same little project

Kevin McGuire · Post by **Kevin McGuire** » Wed Jul 04, 2007 9:42 pm

gcc cocaine.c -lm -lpthread -O3 -march=i686 -o md5 && time ./md5 5b9104dc42e2f8fdd3f6a801f7656d65 5
Cracking took: 1.34s
Collision Found!: glitch
Average h/s: 2417674 h/s
real 2.006s
user 1.571s
sys 0.003s

gcc md5.c -lm -lpthread -O3 -march=i686 -o md5 && time ./md5 5b9104dc42e2f8fdd3f6a801f7656d65 1 47
initializing..
digest_target is: dc04915bfdf8e24201a8f6d3656d65f7
h/s: 4317414
(1) I found a collision!
message: glitch
g_thread_active_cnt:0
real 2.006s
user 1.566s
sys 0.005s

I also want you to notice that mine takes the same amount of time (slight longer in fractions of a second), but has more hashes per second. Why?

It is quite simple, and it also the reason why Brynet-Inc's result were botched, or more accurately the way he presented them ticked me off. You should notice by reviewing your code that it only searches for a sequenced of a specified length which is 5 in your case. You can see the 5 being passed on the command line. If you look at mine you will notice it takes two options which are the minimum and maximum sequence (message) length to search for.

So when someone does some tests and throw the results up and puts my name at the end with a bunch of periods trailing saying twenty six seconds I think I will get a little bent out of shape wondering what size brain this person has.

If you do not believe me then look at the code.

My implementation is actually mine from the start using osdev64 crucial idea boosting the hashes per second beyond what I would have done by my self. So thanks, osdev64. -- Not the mention mine and osdev64's actually support multi-threading for multi-processor systems.

Yeah. You and Brynet-Inc make a good team with misrepresented results and underhanded tactics!

GLneo · Post by **GLneo** » Wed Jul 04, 2007 9:54 pm

run it as:

Code: Select all

./md5 5b9104dc42e2f8fdd3f6a801f7656d65 5 5

i don't know what you'll get but i would just like to see

besides I can add range "1 47" support easily, its not that big of a thing

Kevin McGuire · Post by **Kevin McGuire** » Wed Jul 04, 2007 9:59 pm

....
h/s: 4317414
....
message: glitch
....
real: 2.005s
user: 1.566s
sys: 0.005s

GLneo · Post by **GLneo** » Wed Jul 04, 2007 10:01 pm

if yours is doing more h/s then why are ours finding it at the same time???

Kevin McGuire · Post by **Kevin McGuire** » Wed Jul 04, 2007 10:03 pm

The creation time for the one thread is keeping the times high. If I remove that and make a direct call I can get:
real 1.871s
user 1.618s
sys 0.003s

I imagine if I removed the mutual exclusion code it would become even faster, and would place it on a level playing field.

---
Removing one of the locks gives:
real: 1.793s
user: 1.536s
sys: 0.005s

My character map included 26 lower case letters and a space. If I removed the space and had twenty-six letters like you I get:

real: 1.614s
user: 1.382s
sys: 0.007s

GLneo · Post by **GLneo** » Wed Jul 04, 2007 10:08 pm

i don't see why your code would be any faster than mine, i don't see any optimizations that you have the I don't, but i do see optimizations I have that you don't.

Kevin McGuire · Post by **Kevin McGuire** » Wed Jul 04, 2007 10:16 pm

Well. Lets compare then with a larger message? Say 11 characters long?
... way too long.... Lets try 8 characters long..

hash: a341ecf4f58085643d0a7a6454842b20 ./log
message: 6d 63 67 75 69 72 65 0a mcguire.

char charset[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 0x0a}; // lower_alpha

It should take no more than 14 hours for a decent brute force, right?

I started yours and mine together, and I will see tomorrow which one was the fastest, or if I screwed up and typed the numbers wrong or some weird crap.

os64dev · Post by **os64dev** » Thu Jul 05, 2007 12:32 am

Interesting topic this becomes after i go to sleep.

Keep in mind that you are comparing apples with apples under a different microscope

. Meaning do you both have exactly the same hardware, processor, memory, programs running etc... This is also the reason why Brynet-Inc could have his outcome. He was running the test on a pentium 2 @ 400. Well clearing and coping memory is certainly very expensive on that type of processor.

For the 'official' build stuff. *sigh* i don't care about that really. I will test Kevin's implementation in a few and probably will do a complete rewrite of the code if i wish to use it for myself. (if its ok with you Kev).

Regarding the threading and normal sollution. This is kind of normal behaviour: multithreading is allways expensive on a single core/processor computer system, simple because you add overhead(read context switching). Multi-threading is not the same as parallel computing, hence my remark about 1 thread per core/cpu. So to have a fast sollution on any system, the code needs to determing the number of cpus and adapt to it. no threading on single core, 2 threads on dual core, 4 threads on quad core, etc.. This is not hard just a lot more work. Suddenly i remember a discussion about bloat-ware... this might also be a reason... ok back on topic.

Kevin wrote:It should take no more than 14 hours for a decent brute force, right?

I started yours and mine together, and I will see tomorrow which one was the fastest, or if I screwed up and typed the numbers wrong or some weird crap.

hopefully not on the same system because then the outcome would be meaningless. You don't know how the OS schedules the processes/threads, however the 14 hour time might level it out. A more fairer test would be to run each instance in solitude.

GLneo wrote:i don't see why your code would be any faster than mine, i don't see any optimizations that you have the I don't, but i do see optimizations I have that you don't.

That would be the inline stuff i guess.. I've also tested with inline and fastcall but it made my implementation slower. The reason for this is code locality when not inlining the compiler alligns the code to a new 16 byte boundry and takes a fresh set of register values. in my case this means better cache and register usage. The only memory moves i got were the writes to the digest. so in 64-bit long mode this will also be register moves. i need to test this one day..

Still the fastcall and inline a tenths of a percent compared to the time taken in MD5 hash. so the optimisation is IMHO small, though it might work for your system.

So what did we learn about this

3 people can make a darn fast brute forcer and are instantly known as hackers in the osdev community