brute forcer

GLneo · Post by **GLneo** » Thu Jul 05, 2007 4:59 pm

@jhawthorn I have also found that if you remove the unused variable "digest", it reports "Collision NOT Found", so I've been walking on eggshells, trying not to rearrange things too much or things stop working.

Ninjarider · Post by **Ninjarider** » Thu Jul 05, 2007 5:54 pm

I'm not sure whether it could be done in C; I know it is done in asm. All processors have a V-pipe and a U-pipe. Certain instructions issue on the U-pipe, others on the V-pipe, and some can go on either pipe. There might be a way to arrange the instructions in the loops so that the opcodes line up and pair, possibly making the loops up to twice as fast.

Starting work on an asm module. It might take a while to get it posted here since I don't have internet at the house.

GLneo · Post by **GLneo** » Thu Jul 05, 2007 6:38 pm

I don't believe pipelining will double the speed. It would also break compatibility: currently the code should compile on any 32-bit CPU.

Ninjarider · Post by **Ninjarider** » Thu Jul 05, 2007 8:40 pm

As far as compatibility goes, there shouldn't be any issue using the pipelines on an Intel 32-bit processor. It would not have the same speed going from an Intel to an AMD. It will not exactly double the speed, but it gives the possibility of speeding up any loop by up to a factor of two.

Not to mention that when implementing something like that you can run into computation errors, because the V-pipe has executed an instruction before the U-pipe, and the U-pipe required a value that the V-pipe changed.

Brynet-Inc · Post by **Brynet-Inc** » Thu Jul 05, 2007 9:08 pm

Ninjarider wrote:As far as compatibility goes, there shouldn't be any issue using the pipelines on an Intel 32-bit processor. It would not have the same speed going from an Intel to an AMD. It will not exactly double the speed, but it gives the possibility of speeding up any loop by up to a factor of two.

Not to mention that when implementing something like that you can run into computation errors, because the V-pipe has executed an instruction before the U-pipe, and the U-pipe required a value that the V-pipe changed.

I think he meant it would work on "any" 32-bit CPU. You do know there are more than just x86 processors, right?

Ninjarider · Post by **Ninjarider** » Thu Jul 05, 2007 9:35 pm

yeah. amd and intel are the main x86 processors i know of.

jhawthorn · Post by **jhawthorn** » Thu Jul 05, 2007 11:35 pm

Found myself 5 minutes to look at it and found the source of the problem.

Please change

Code: Select all

sscanf(&argv[1][t*2], "%2x", (unsigned int *)&raw_inhash[t]);

to

Code: Select all

unsigned int tmp;
sscanf(&argv[1][t*2], "%2x", &tmp);
raw_inhash[t] = tmp;

Keep up the good work. I love watching projects like this.
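
For anyone following along, here is a minimal sketch of why the temporary matters. With the cast, `sscanf` stores a full `unsigned int` through a pointer to a one-byte array element, so each call can overwrite the neighbouring bytes, and the last few calls can write past the end of the array. That would plausibly explain why removing or rearranging nearby variables changed the result. The helper name `parse_md5_hex` and its return convention are made up for illustration; only the `tmp` pattern comes from the fix above.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): parse a 32-character hex
 * string into 16 raw bytes. Returns 0 on success, -1 on bad input. */
int parse_md5_hex(const char *hex, unsigned char out[16])
{
    size_t t;

    if (strlen(hex) != 32)
        return -1;

    /* Reject anything that is not a hex digit before converting. */
    for (t = 0; t < 32; ++t)
        if (!isxdigit((unsigned char)hex[t]))
            return -1;

    for (t = 0; t < 16; ++t) {
        unsigned int tmp;   /* "%2x" stores a full unsigned int */

        if (sscanf(&hex[t * 2], "%2x", &tmp) != 1)
            return -1;
        out[t] = (unsigned char)tmp;   /* narrow only after the store */
    }
    return 0;
}
```

Calling it with the hash used in this thread should leave `0xd6` in the first byte and `0x03` in the last.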

os64dev · Post by **os64dev** » Fri Jul 06, 2007 12:01 am

jhawthorn wrote:Found myself 5 minutes to look at it and found the source of the problem.

Please change

Code: Select all

sscanf(&argv[1][t*2], "%2x", (unsigned int *)&raw_inhash[t]);

to

Code: Select all

unsigned int tmp;
sscanf(&argv[1][t*2], "%2x", &tmp);
raw_inhash[t] = tmp;

Keep up the good work. I love watching projects like this.

Doh, that error was kind of obvious. I didn't look into it because I didn't rearrange the code, but thanks. Of course we will continue; I have to beat Cain.

os64dev · Post by **os64dev** » Fri Jul 06, 2007 2:15 am

An additional 5%, though it is now getting problematic to increase speed. The MD5 hash is almost entirely done in registers. Now I will use this version as a basis for multithreading.

old version:

$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
- time: 48.81s
- avg. hash/s: 4178977.22 h/s

real 0m48.938s
user 0m48.859s
sys 0m0.000s

new version:

$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
- time: 22.98s
- avg. hash/s: 4371943.88 h/s

real 0m23.047s
user 0m23.015s
sys 0m0.015s

jhawthorn · Post by **jhawthorn** » Fri Jul 06, 2007 4:13 am

Latest changes have put me up to ~4050000 h/s on my AMD64 3000+. I have to disagree with changing digest, raw_inhash, and charset into global variables. Digest will almost certainly be used individually by each thread eventually. charset will, hopefully, not always be a constant. Moreover, all the functions are being inlined, so there shouldn't (assuming your compiler is half sane) be a big performance hit from passing them an additional argument.

I look forward to seeing the threaded and then distributed versions of this piece of code.

GLneo · Post by **GLneo** » Fri Jul 06, 2007 4:28 am

Well, why even use digest? Now that the comparison is made in the MD5 hasher, why do we still use that variable?

Code: Select all

int __attribute__((__always_inline__)) md5_hash(unsigned char *message, unsigned int mlength, unsigned char input[16])
{
    uint32_t AA, BB, CC, DD;
    uint32_t *X; 
    uint32_t A, B, C, D; 
    uint32_t i;

    AA = 0x67452301;
    BB = 0xefcdab89;
    CC = 0x98badcfe;
    DD = 0x10325476;

    for(i = 0; i < (mlength / 64); ++i) 
    { 
        A = AA; 
        B = BB; 
        C = CC; 
        D = DD; 

        X = (uint32_t *)&message[i * 64];

        /// round one (unrolled) 
        A = B + ROTATE_LEFT((A + F(B, C, D) + X[ 0] + 0xd76aa478),  7); 
        D = A + ROTATE_LEFT((D + F(A, B, C) + X[ 1] + 0xe8c7b756), 12); 
        C = D + ROTATE_LEFT((C + F(D, A, B) + X[ 2] + 0x242070db), 17); 
        B = C + ROTATE_LEFT((B + F(C, D, A) + X[ 3] + 0xc1bdceee), 22); 
        A = B + ROTATE_LEFT((A + F(B, C, D) + X[ 4] + 0xf57c0faf),  7); 
        D = A + ROTATE_LEFT((D + F(A, B, C) + X[ 5] + 0x4787c62a), 12); 
        C = D + ROTATE_LEFT((C + F(D, A, B) + X[ 6] + 0xa8304613), 17); 
        B = C + ROTATE_LEFT((B + F(C, D, A) + X[ 7] + 0xfd469501), 22); 
        A = B + ROTATE_LEFT((A + F(B, C, D) + X[ 8] + 0x698098d8),  7); 
        D = A + ROTATE_LEFT((D + F(A, B, C) + X[ 9] + 0x8b44f7af), 12); 
        C = D + ROTATE_LEFT((C + F(D, A, B) + X[10] + 0xffff5bb1), 17); 
        B = C + ROTATE_LEFT((B + F(C, D, A) + X[11] + 0x895cd7be), 22); 
        A = B + ROTATE_LEFT((A + F(B, C, D) + X[12] + 0x6b901122),  7); 
        D = A + ROTATE_LEFT((D + F(A, B, C) + X[13] + 0xfd987193), 12); 
        C = D + ROTATE_LEFT((C + F(D, A, B) + X[14] + 0xa679438e), 17); 
        B = C + ROTATE_LEFT((B + F(C, D, A) + X[15] + 0x49b40821), 22); 
        /// round two (unrolled) 
        A = B + ROTATE_LEFT((A + G(B, C, D) + X[ 1] + 0xf61e2562),  5); 
        D = A + ROTATE_LEFT((D + G(A, B, C) + X[ 6] + 0xc040b340),  9); 
        C = D + ROTATE_LEFT((C + G(D, A, B) + X[11] + 0x265e5a51), 14); 
        B = C + ROTATE_LEFT((B + G(C, D, A) + X[ 0] + 0xe9b6c7aa), 20); 
        A = B + ROTATE_LEFT((A + G(B, C, D) + X[ 5] + 0xd62f105d),  5); 
        D = A + ROTATE_LEFT((D + G(A, B, C) + X[10] + 0x02441453),  9); 
        C = D + ROTATE_LEFT((C + G(D, A, B) + X[15] + 0xd8a1e681), 14); 
        B = C + ROTATE_LEFT((B + G(C, D, A) + X[ 4] + 0xe7d3fbc8), 20); 
        A = B + ROTATE_LEFT((A + G(B, C, D) + X[ 9] + 0x21e1cde6),  5); 
        D = A + ROTATE_LEFT((D + G(A, B, C) + X[14] + 0xc33707d6),  9); 
        C = D + ROTATE_LEFT((C + G(D, A, B) + X[ 3] + 0xf4d50d87), 14); 
        B = C + ROTATE_LEFT((B + G(C, D, A) + X[ 8] + 0x455a14ed), 20); 
        A = B + ROTATE_LEFT((A + G(B, C, D) + X[13] + 0xa9e3e905),  5); 
        D = A + ROTATE_LEFT((D + G(A, B, C) + X[ 2] + 0xfcefa3f8),  9); 
        C = D + ROTATE_LEFT((C + G(D, A, B) + X[ 7] + 0x676f02d9), 14); 
        B = C + ROTATE_LEFT((B + G(C, D, A) + X[12] + 0x8d2a4c8a), 20); 
        /// round three (unrolled) 
        A = B + ROTATE_LEFT((A + H(B, C, D) + X[ 5] + 0xfffa3942),  4); 
        D = A + ROTATE_LEFT((D + H(A, B, C) + X[ 8] + 0x8771f681), 11); 
        C = D + ROTATE_LEFT((C + H(D, A, B) + X[11] + 0x6d9d6122), 16); 
        B = C + ROTATE_LEFT((B + H(C, D, A) + X[14] + 0xfde5380c), 23); 
        A = B + ROTATE_LEFT((A + H(B, C, D) + X[ 1] + 0xa4beea44),  4); 
        D = A + ROTATE_LEFT((D + H(A, B, C) + X[ 4] + 0x4bdecfa9), 11); 
        C = D + ROTATE_LEFT((C + H(D, A, B) + X[ 7] + 0xf6bb4b60), 16); 
        B = C + ROTATE_LEFT((B + H(C, D, A) + X[10] + 0xbebfbc70), 23); 
        A = B + ROTATE_LEFT((A + H(B, C, D) + X[13] + 0x289b7ec6),  4); 
        D = A + ROTATE_LEFT((D + H(A, B, C) + X[ 0] + 0xeaa127fa), 11); 
        C = D + ROTATE_LEFT((C + H(D, A, B) + X[ 3] + 0xd4ef3085), 16); 
        B = C + ROTATE_LEFT((B + H(C, D, A) + X[ 6] + 0x04881d05), 23); 
        A = B + ROTATE_LEFT((A + H(B, C, D) + X[ 9] + 0xd9d4d039),  4); 
        D = A + ROTATE_LEFT((D + H(A, B, C) + X[12] + 0xe6db99e5), 11); 
        C = D + ROTATE_LEFT((C + H(D, A, B) + X[15] + 0x1fa27cf8), 16); 
        B = C + ROTATE_LEFT((B + H(C, D, A) + X[ 2] + 0xc4ac5665), 23); 
        /// round four (unrolled) 
        A = B + ROTATE_LEFT((A + I(B, C, D) + X[ 0] + 0xf4292244),  6); 
        D = A + ROTATE_LEFT((D + I(A, B, C) + X[ 7] + 0x432aff97), 10); 
        C = D + ROTATE_LEFT((C + I(D, A, B) + X[14] + 0xab9423a7), 15); 
        B = C + ROTATE_LEFT((B + I(C, D, A) + X[ 5] + 0xfc93a039), 21); 
        A = B + ROTATE_LEFT((A + I(B, C, D) + X[12] + 0x655b59c3),  6); 
        D = A + ROTATE_LEFT((D + I(A, B, C) + X[ 3] + 0x8f0ccc92), 10); 
        C = D + ROTATE_LEFT((C + I(D, A, B) + X[10] + 0xffeff47d), 15); 
        B = C + ROTATE_LEFT((B + I(C, D, A) + X[ 1] + 0x85845dd1), 21); 
        A = B + ROTATE_LEFT((A + I(B, C, D) + X[ 8] + 0x6fa87e4f),  6); 
        D = A + ROTATE_LEFT((D + I(A, B, C) + X[15] + 0xfe2ce6e0), 10); 
        C = D + ROTATE_LEFT((C + I(D, A, B) + X[ 6] + 0xa3014314), 15); 
        B = C + ROTATE_LEFT((B + I(C, D, A) + X[13] + 0x4e0811a1), 21); 
        A = B + ROTATE_LEFT((A + I(B, C, D) + X[ 4] + 0xf7537e82),  6); 
        D = A + ROTATE_LEFT((D + I(A, B, C) + X[11] + 0xbd3af235), 10); 
        C = D + ROTATE_LEFT((C + I(D, A, B) + X[ 2] + 0x2ad7d2bb), 15); 
        B = C + ROTATE_LEFT((B + I(C, D, A) + X[ 9] + 0xeb86d391), 21); 

        AA += A; 
        BB += B; 
        CC += C; 
        DD += D; 
    }

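    /* Compare as 32-bit words. unsigned long is 64 bits on LP64 targets,
       so the old casts would read 8 bytes per comparison and run past
       the end of input. */
    if((*(uint32_t *)(input)      == AA) &&
       (*(uint32_t *)(input + 4)  == BB) &&
       (*(uint32_t *)(input + 8)  == CC) &&
       (*(uint32_t *)(input + 12) == DD))
        return 1;
    return 0;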
}
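
The listing above uses `F`, `G`, `H`, `I` and `ROTATE_LEFT` without showing them. The project's own definitions are not in this thread, so the following is only a sketch of what they would need to look like, written from the auxiliary functions described in RFC 1321.

```c
#include <stdint.h>

/* Auxiliary functions as described in RFC 1321. These are an assumed
 * reconstruction; the thread does not show the project's own macros. */
#define F(x, y, z) (((x) & (y)) | (~(x) & (z)))
#define G(x, y, z) (((x) & (z)) | ((y) & ~(z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define I(x, y, z) ((y) ^ ((x) | ~(z)))

/* 32-bit left rotate; n must be in the range 1..31. */
#define ROTATE_LEFT(x, n) \
    (((uint32_t)(x) << (n)) | ((uint32_t)(x) >> (32 - (n))))
```

Every step in the unrolled rounds then has the RFC's shape: the new value is the next register plus the rotated sum of the old value, the auxiliary function, one message word and one constant.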

os64dev · Post by **os64dev** » Fri Jul 06, 2007 4:45 am

jhawthorn wrote:Latest changes have put me up to ~4050000 h/s on my AMD64 3000+. I have to disagree with changing digest, raw_inhash, and charset into global variables. Digest will almost certainly be used individually by each thread eventually. charset will, hopefully, not always be a constant. Moreover, all the functions are being inlined, so there shouldn't (assuming your compiler is half sane) be a big performance hit from passing them an additional argument.

I look forward to seeing the threaded and then distributed versions of this piece of code.

Well, you can disagree, but I just did it to gain performance. The MD5 hash function now uses registers for the whole MD5 processing. The additional argument did cost a few percent, for the same reason as above. I should test it on 64-bit, because there the global variables will be accessed with RIP-relative addressing.

I added a new version because there was a bug in the previous versions (try aaa as a password). However, it seems to have slowed down a bit.
The process takes longer now, but the hashes per second are still high. The total time is bogus anyway; for instance, test with the passwords zzzzza and zzzzzz.

$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
- time: 47.14s
- avg. hash/s: 4327197.45 h/s

real 0m47.297s
user 0m47.187s
sys 0m0.030s
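
To put the point about total time into numbers, here is a small sketch. How long a run takes depends on where the password sits in the enumeration order, so the comparable figure between versions is h/s, and the useful time figure is the worst case for the whole keyspace. The function names are hypothetical, and the 26-character charset is an assumption based on the lowercase test password.

```c
#include <stdint.h>

/* Number of candidates of exactly `length` characters: charset_len^length. */
uint64_t keyspace(uint64_t charset_len, unsigned length)
{
    uint64_t n = 1;

    while (length-- > 0)
        n *= charset_len;
    return n;
}

/* Seconds needed to try every candidate at a fixed hash rate. */
double worst_case_seconds(uint64_t candidates, double hashes_per_second)
{
    return (double)candidates / hashes_per_second;
}
```

With 26 letters and length 6 there are 308,915,776 candidates, so at the roughly 4.33 million h/s reported above a full sweep would take a little over 71 seconds, whatever the password is.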

@glneo
I tested that and it produced about the same result. I even made an MD5 hash that doesn't take the length parameter either, based on the assumption that a password is generally no longer than 64-9 = 55 characters, but even that didn't improve much.
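
The 64-9 figure comes from MD5 padding: a 64-byte block has to hold the message, one 0x80 marker byte and an 8-byte bit count, which leaves 55 bytes for the password. Below is a rough sketch of building that single block; `md5_pad_single_block` is a made-up name and this is not the code from the attachment.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the one 64-byte MD5 block for a short message.
 * Returns 0 on success, -1 if the message does not fit in a single block. */
int md5_pad_single_block(const unsigned char *msg, size_t len,
                         unsigned char block[64])
{
    uint64_t bits;
    int i;

    if (len > 55)           /* 64 - 1 (0x80 marker) - 8 (length field) */
        return -1;

    memset(block, 0, 64);
    memcpy(block, msg, len);
    block[len] = 0x80;      /* a single 1 bit, then zero padding */

    bits = (uint64_t)len * 8;
    for (i = 0; i < 8; ++i) /* 64-bit bit count, least significant byte first */
        block[56 + i] = (unsigned char)(bits >> (8 * i));
    return 0;
}
```

With the block laid out this way the hash loop runs exactly once, which is why dropping the length parameter is possible at all.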

@all
I've been trying to get the multithreaded function running and succeeded; however, the performance didn't even get close to the single-threaded version, so I'm puzzled. I think I'll leave the MT version for Kevin.

os64dev · Post by **os64dev** » Fri Jul 06, 2007 8:32 am

Ahh, for such sweet moments we live. I am glad to announce that multi-threading is working \:D/ . And for your pleasure, here it is. I know that some of you are eager for the stats. I limited the sequence to 32 characters.

$ g++ brute-mt.cc -foptimize-register-move -finline-functions -fno-exceptions -fno-rtti -fomit-frame-pointer -O3 -march=i686 -o brute.exe

$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6 2
threadList[t].sequence[0]: 0
threadList[t].sequence[0]: 13
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
time: 23.23s
- avg. hash/s: 7106282.43 h/s
#done.

real 0m23.454s
user 0m46.655s
sys 0m0.015s
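
The two `threadList[t].sequence[0]` lines in the run above (0 and 13) suggest that each thread starts at a different first character of what is presumably a 26-letter charset. The attachment is not shown here, so the following is only a guess at that kind of split, with a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: charset index at which thread `t` of `nthreads`
 * starts its first character. Thread t covers the half-open range
 * [first_char_start(t, ...), first_char_start(t + 1, ...)). */
size_t first_char_start(size_t t, size_t nthreads, size_t charset_len)
{
    return t * charset_len / nthreads;
}
```

For 2 threads and 26 characters this gives 0 and 13, matching the output; with 3 threads the ranges would start at 0, 8 and 17.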

frank · Post by **frank** » Fri Jul 06, 2007 8:53 am

Well, I have a Core 2 Duo running at 1.4 GHz, and I have some results for the code os64dev posted above.

2 Threads

Code: Select all

$ time ./brute d6a6bc0db10694a2d90e3a69648f3a03 6 2
threadList[t].sequence[0]: 0
threadList[t].sequence[0]: 13
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
time: 38.87s
- avg. hash/s: 4493377.81 h/s
#done.

real    0m38.977s
user    1m14.256s
sys     0m0.093s

1 Thread

Code: Select all

$ time ./brute d6a6bc0db10694a2d90e3a69648f3a03 6 1
threadList[t].sequence[0]: 0
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
time: 26.98s
- avg. hash/s: 3084769.79 h/s
#done.

real    0m27.098s
user    0m25.864s
sys     0m0.139s

EDIT: Cain takes 52 seconds on my computer when I set it to min 6 max 6 and the lowercase alpha charset. It says 4050000 pass/s.

Kevin McGuire · Post by **Kevin McGuire** » Fri Jul 06, 2007 9:22 am

Does this one do any better? I am too afraid to report my findings, since the last thing I pursued because I thought it was faster turned out not to be.

Compile
gcc md5.c -o md5 -O3
Options
md5 [hash] [minimum-length] [maximum-length] [thread-count]

Try it with the hash:
d6a6bc0db10694a2d90e3a69648f3a03 = hacker (longer run time)

Also, even on a uniprocessor (UNI) machine, multiple threads can find a password sooner, because they start at different offsets in the message space.
