brute forcer

GLneo
Member
Posts: 237
Joined: Wed Dec 20, 2006 7:56 pm

Post by GLneo »

@jhawthorn i have also found that if you remove the unused variable "digest" it "Collision NOT Found"'s, so I've been walking on egg-shells trying not to rearrange things too much or things stop working :P
Ninjarider
Member
Posts: 62
Joined: Fri Jun 29, 2007 8:36 pm

Post by Ninjarider »

i'm not sure if it would be possible for it to be done in c, i know it is done in asm. all processors have a v-pipe and a u-pipe. certain instructions come in on the u-pipe, others on the v-pipe, and some go in on either pipe. there might be a way to play with all the stuff in the loops to line up the opcodes to possibly make it twice as fast in the loops.

starting work on an asm module. might take a while to get it on here since i don't have internet at the house.
GLneo
Member
Posts: 237
Joined: Wed Dec 20, 2006 7:56 pm

Post by GLneo »

i don't believe pipelining will double the speed. also, that would break compatibility: currently the code should compile on any 32-bit CPU.
Ninjarider
Member
Posts: 62
Joined: Fri Jun 29, 2007 8:36 pm

Post by Ninjarider »

as far as compatibility goes, there shouldn't be any issue with an intel 32-bit processor using the pipelines. it would not have the same speed going from an intel to an amd. it will not exactly double the speed, but it gives the possibility of increasing speed up to double for any loops.

not to mention that when implementing something like that you can run into computation errors, because the v-pipe has executed an instruction before the u-pipe and the u-pipe required a value the v-pipe changed.
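
The pairing idea can be approximated in portable C without writing asm, by giving the compiler and CPU two dependency chains that do not wait on each other. The sketch below is only an illustration of that idea on a plain array sum; `sum_two_chains` is a made-up name and whether it actually runs faster depends on the compiler and the processor.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum an array with two independent accumulators, so that consecutive
 * additions do not depend on each other's result. */
uint32_t sum_two_chains(const uint32_t *v, size_t n)
{
    uint32_t even = 0, odd = 0;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        even += v[i];       /* chain 1 */
        odd  += v[i + 1];   /* chain 2, independent of chain 1 */
    }
    if (i < n)
        even += v[i];       /* leftover element when n is odd */
    return even + odd;
}
```

In the brute forcer the equivalent would be interleaving the rounds of two candidate hashes, since the steps inside a single MD5 computation each depend on the previous one.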
Brynet-Inc
Member
Posts: 2426
Joined: Tue Oct 17, 2006 9:29 pm
Libera.chat IRC: brynet
Location: Canada

Post by Brynet-Inc »

Ninjarider wrote:as far as compatibility goes, there shouldn't be any issue with an intel 32-bit processor using the pipelines. it would not have the same speed going from an intel to an amd. it will not exactly double the speed, but it gives the possibility of increasing speed up to double for any loops.

not to mention that when implementing something like that you can run into computation errors, because the v-pipe has executed an instruction before the u-pipe and the u-pipe required a value the v-pipe changed.
I think he meant it would work on "any" 32-bit CPU. You do know there are more than just x86 processors, right? :roll:
Twitter: @canadianbryan. Award by smcerm, I stole it. Original was larger.
Ninjarider
Member
Posts: 62
Joined: Fri Jun 29, 2007 8:36 pm

Post by Ninjarider »

yeah. amd and intel are the main x86 processors i know of.
jhawthorn
Member
Posts: 58
Joined: Sun Nov 26, 2006 4:06 pm
Location: Victoria, BC, Canada

segfault

Post by jhawthorn »

Found myself 5 minutes to look at it and found the source of the problem.

Please change

Code: Select all

sscanf(&argv[1][t*2], "%2x", (unsigned int *)&raw_inhash[t]);
to

Code: Select all

unsigned int tmp;
sscanf(&argv[1][t*2], "%2x", &tmp);
raw_inhash[t] = tmp;
Keep up the good work. I love watching projects like this.
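
Why the change matters: `%2x` makes `sscanf` store a whole `unsigned int` through the pointer it is given. If `raw_inhash` is an array of single bytes (an assumption, its declaration is not shown in the thread), the cast version writes `sizeof(unsigned int)` bytes into a one-byte slot and spills into whatever sits next to it. That would fit the layout-dependent behaviour reported earlier, where removing an unused variable changed the result. A minimal self-contained sketch of the corrected parse, with `parse_md5_hex` as a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>

/* Parse a 32-character hex string into 16 raw bytes.
 * Returns 0 on success, -1 if the length is wrong or a pair
 * cannot be read as hex. */
int parse_md5_hex(const char *hex, unsigned char out[16])
{
    int t;

    if (strlen(hex) != 32)
        return -1;

    for (t = 0; t < 16; ++t) {
        unsigned int tmp;   /* %x needs a real unsigned int to write into */

        if (sscanf(&hex[t * 2], "%2x", &tmp) != 1)
            return -1;
        out[t] = (unsigned char)tmp;   /* narrow only after the read */
    }
    return 0;
}
```

Called as `parse_md5_hex(argv[1], raw_inhash)`, it also rejects a hash argument of the wrong length instead of reading past the end of the string.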
os64dev
Member
Posts: 553
Joined: Sat Jan 27, 2007 3:21 pm
Location: Best, Netherlands

Re: segfault

Post by os64dev »

jhawthorn wrote:Found myself 5 minutes to look at it and found the source of the problem.

Please change

Code: Select all

sscanf(&argv[1][t*2], "%2x", (unsigned int *)&raw_inhash[t]);
to

Code: Select all

unsigned int tmp;
sscanf(&argv[1][t*2], "%2x", &tmp);
raw_inhash[t] = tmp;
Keep up the good work. I love watching projects like this.
Doh, that error was kind of obvious. I didn't look into it because I didn't rearrange the code, but thanks. Of course we will continue, I have to beat Cain :twisted:
Author of COBOS
os64dev
Member
Posts: 553
Joined: Sat Jan 27, 2007 3:21 pm
Location: Best, Netherlands

Post by os64dev »

an additional 5%, though now it's getting problematic to increase speed. The MD5 hash is almost entirely done in registers. Now I will use this version as a basis for multithreading.

old version:
$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
- time: 48.81s
- avg. hash/s: 4178977.22 h/s

real 0m48.938s
user 0m48.859s
sys 0m0.000s
new version:
$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
- time: 22.98s
- avg. hash/s: 4371943.88 h/s

real 0m23.047s
user 0m23.015s
sys 0m0.015s
jhawthorn
Member
Posts: 58
Joined: Sun Nov 26, 2006 4:06 pm
Location: Victoria, BC, Canada

Post by jhawthorn »

Latest changes have put me up to ~4050000 h/s on my AMD64 3000+. I have to disagree with changing digest, raw_inhash, and charset into global variables. Digest will almost certainly be used individually by each thread eventually. charset will, hopefully, not always be a constant. Moreover, all the functions are being inlined, so there shouldn't (assuming your compiler is half sane) be a big performance hit from passing them an additional argument.

I look forward to seeing the threaded and then distributed versions of this piece of code.
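
One way to keep that separation while still passing only a single extra argument is a small per-thread context. This is a sketch only: `struct worker_ctx` and `ctx_is_match` are hypothetical names that do not appear in the posted code, and the split between shared and private members is an assumption about how a threaded version might look.

```c
#include <string.h>

/* Hypothetical per-thread state; none of these names come from the
 * posted code. */
struct worker_ctx {
    const char *charset;            /* shared, read-only */
    size_t charset_len;
    const unsigned char *target;    /* shared, read-only 16-byte hash */
    unsigned char digest[16];       /* private scratch, one per thread */
};

/* Returns 1 when this thread's digest equals the target hash. */
static inline int ctx_is_match(const struct worker_ctx *ctx)
{
    return memcmp(ctx->digest, ctx->target, 16) == 0;
}
```

Because only a pointer is passed and the function is inlined, the cost of the extra argument should be small, which is the point made above.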
GLneo
Member
Posts: 237
Joined: Wed Dec 20, 2006 7:56 pm

Post by GLneo »

well, why even use digest? now that the comparison is made in the md5 hasher why do we still use that variable?:

Code: Select all

int __attribute__((__always_inline__)) md5_hash(unsigned char *message, unsigned int mlength, unsigned char input[16])
{
    uint32_t AA, BB, CC, DD;
    uint32_t *X; 
    uint32_t A, B, C, D; 
    uint32_t i;

    AA = 0x67452301;
    BB = 0xefcdab89;
    CC = 0x98badcfe;
    DD = 0x10325476;

    for(i = 0; i < (mlength / 64); ++i) 
    { 
        A = AA; 
        B = BB; 
        C = CC; 
        D = DD; 

        X = (uint32_t *)&message[i * 64];

        /// round one (unrolled) 
        A = B + ROTATE_LEFT((A + F(B, C, D) + X[ 0] + 0xd76aa478),  7); 
        D = A + ROTATE_LEFT((D + F(A, B, C) + X[ 1] + 0xe8c7b756), 12); 
        C = D + ROTATE_LEFT((C + F(D, A, B) + X[ 2] + 0x242070db), 17); 
        B = C + ROTATE_LEFT((B + F(C, D, A) + X[ 3] + 0xc1bdceee), 22); 
        A = B + ROTATE_LEFT((A + F(B, C, D) + X[ 4] + 0xf57c0faf),  7); 
        D = A + ROTATE_LEFT((D + F(A, B, C) + X[ 5] + 0x4787c62a), 12); 
        C = D + ROTATE_LEFT((C + F(D, A, B) + X[ 6] + 0xa8304613), 17); 
        B = C + ROTATE_LEFT((B + F(C, D, A) + X[ 7] + 0xfd469501), 22); 
        A = B + ROTATE_LEFT((A + F(B, C, D) + X[ 8] + 0x698098d8),  7); 
        D = A + ROTATE_LEFT((D + F(A, B, C) + X[ 9] + 0x8b44f7af), 12); 
        C = D + ROTATE_LEFT((C + F(D, A, B) + X[10] + 0xffff5bb1), 17); 
        B = C + ROTATE_LEFT((B + F(C, D, A) + X[11] + 0x895cd7be), 22); 
        A = B + ROTATE_LEFT((A + F(B, C, D) + X[12] + 0x6b901122),  7); 
        D = A + ROTATE_LEFT((D + F(A, B, C) + X[13] + 0xfd987193), 12); 
        C = D + ROTATE_LEFT((C + F(D, A, B) + X[14] + 0xa679438e), 17); 
        B = C + ROTATE_LEFT((B + F(C, D, A) + X[15] + 0x49b40821), 22); 
        /// round two (unrolled) 
        A = B + ROTATE_LEFT((A + G(B, C, D) + X[ 1] + 0xf61e2562),  5); 
        D = A + ROTATE_LEFT((D + G(A, B, C) + X[ 6] + 0xc040b340),  9); 
        C = D + ROTATE_LEFT((C + G(D, A, B) + X[11] + 0x265e5a51), 14); 
        B = C + ROTATE_LEFT((B + G(C, D, A) + X[ 0] + 0xe9b6c7aa), 20); 
        A = B + ROTATE_LEFT((A + G(B, C, D) + X[ 5] + 0xd62f105d),  5); 
        D = A + ROTATE_LEFT((D + G(A, B, C) + X[10] + 0x02441453),  9); 
        C = D + ROTATE_LEFT((C + G(D, A, B) + X[15] + 0xd8a1e681), 14); 
        B = C + ROTATE_LEFT((B + G(C, D, A) + X[ 4] + 0xe7d3fbc8), 20); 
        A = B + ROTATE_LEFT((A + G(B, C, D) + X[ 9] + 0x21e1cde6),  5); 
        D = A + ROTATE_LEFT((D + G(A, B, C) + X[14] + 0xc33707d6),  9); 
        C = D + ROTATE_LEFT((C + G(D, A, B) + X[ 3] + 0xf4d50d87), 14); 
        B = C + ROTATE_LEFT((B + G(C, D, A) + X[ 8] + 0x455a14ed), 20); 
        A = B + ROTATE_LEFT((A + G(B, C, D) + X[13] + 0xa9e3e905),  5); 
        D = A + ROTATE_LEFT((D + G(A, B, C) + X[ 2] + 0xfcefa3f8),  9); 
        C = D + ROTATE_LEFT((C + G(D, A, B) + X[ 7] + 0x676f02d9), 14); 
        B = C + ROTATE_LEFT((B + G(C, D, A) + X[12] + 0x8d2a4c8a), 20); 
        /// round three (unrolled) 
        A = B + ROTATE_LEFT((A + H(B, C, D) + X[ 5] + 0xfffa3942),  4); 
        D = A + ROTATE_LEFT((D + H(A, B, C) + X[ 8] + 0x8771f681), 11); 
        C = D + ROTATE_LEFT((C + H(D, A, B) + X[11] + 0x6d9d6122), 16); 
        B = C + ROTATE_LEFT((B + H(C, D, A) + X[14] + 0xfde5380c), 23); 
        A = B + ROTATE_LEFT((A + H(B, C, D) + X[ 1] + 0xa4beea44),  4); 
        D = A + ROTATE_LEFT((D + H(A, B, C) + X[ 4] + 0x4bdecfa9), 11); 
        C = D + ROTATE_LEFT((C + H(D, A, B) + X[ 7] + 0xf6bb4b60), 16); 
        B = C + ROTATE_LEFT((B + H(C, D, A) + X[10] + 0xbebfbc70), 23); 
        A = B + ROTATE_LEFT((A + H(B, C, D) + X[13] + 0x289b7ec6),  4); 
        D = A + ROTATE_LEFT((D + H(A, B, C) + X[ 0] + 0xeaa127fa), 11); 
        C = D + ROTATE_LEFT((C + H(D, A, B) + X[ 3] + 0xd4ef3085), 16); 
        B = C + ROTATE_LEFT((B + H(C, D, A) + X[ 6] + 0x04881d05), 23); 
        A = B + ROTATE_LEFT((A + H(B, C, D) + X[ 9] + 0xd9d4d039),  4); 
        D = A + ROTATE_LEFT((D + H(A, B, C) + X[12] + 0xe6db99e5), 11); 
        C = D + ROTATE_LEFT((C + H(D, A, B) + X[15] + 0x1fa27cf8), 16); 
        B = C + ROTATE_LEFT((B + H(C, D, A) + X[ 2] + 0xc4ac5665), 23); 
        /// round four (unrolled) 
        A = B + ROTATE_LEFT((A + I(B, C, D) + X[ 0] + 0xf4292244),  6); 
        D = A + ROTATE_LEFT((D + I(A, B, C) + X[ 7] + 0x432aff97), 10); 
        C = D + ROTATE_LEFT((C + I(D, A, B) + X[14] + 0xab9423a7), 15); 
        B = C + ROTATE_LEFT((B + I(C, D, A) + X[ 5] + 0xfc93a039), 21); 
        A = B + ROTATE_LEFT((A + I(B, C, D) + X[12] + 0x655b59c3),  6); 
        D = A + ROTATE_LEFT((D + I(A, B, C) + X[ 3] + 0x8f0ccc92), 10); 
        C = D + ROTATE_LEFT((C + I(D, A, B) + X[10] + 0xffeff47d), 15); 
        B = C + ROTATE_LEFT((B + I(C, D, A) + X[ 1] + 0x85845dd1), 21); 
        A = B + ROTATE_LEFT((A + I(B, C, D) + X[ 8] + 0x6fa87e4f),  6); 
        D = A + ROTATE_LEFT((D + I(A, B, C) + X[15] + 0xfe2ce6e0), 10); 
        C = D + ROTATE_LEFT((C + I(D, A, B) + X[ 6] + 0xa3014314), 15); 
        B = C + ROTATE_LEFT((B + I(C, D, A) + X[13] + 0x4e0811a1), 21); 
        A = B + ROTATE_LEFT((A + I(B, C, D) + X[ 4] + 0xf7537e82),  6); 
        D = A + ROTATE_LEFT((D + I(A, B, C) + X[11] + 0xbd3af235), 10); 
        C = D + ROTATE_LEFT((C + I(D, A, B) + X[ 2] + 0x2ad7d2bb), 15); 
        B = C + ROTATE_LEFT((B + I(C, D, A) + X[ 9] + 0xeb86d391), 21); 

        AA += A; 
        BB += B; 
        CC += C; 
        DD += D; 
    }

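    /* Compare as 32-bit words: unsigned long is 64 bits on LP64 targets,
       where each compare would read 8 bytes and the match would fail. */
    if((*(uint32_t *)(input) == AA) &&
       (*(uint32_t *)(input + 4) == BB) &&
       (*(uint32_t *)(input + 8) == CC) &&
       (*(uint32_t *)(input + 12) == DD))
        return 1;
    return 0;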
}
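
The function above relies on `F`, `G`, `H`, `I` and `ROTATE_LEFT`, which are not part of the snippet. For reference, these are the auxiliary functions as RFC 1321 defines them, written here as macros so the names match. How the project's own source spells them is not shown in the thread, so treat this as a sketch rather than the actual definitions.

```c
#include <stdint.h>

/* Auxiliary functions from RFC 1321, as macros matching the names
 * used in md5_hash(). Arguments are expected to be uint32_t. */
#define F(x, y, z) (((x) & (y)) | (~(x) & (z)))
#define G(x, y, z) (((x) & (z)) | ((y) & ~(z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define I(x, y, z) ((y) ^ ((x) | ~(z)))

/* Rotate a 32-bit value left by n bits, for 0 < n < 32. */
#define ROTATE_LEFT(x, n) \
    ((uint32_t)(((uint32_t)(x) << (n)) | ((uint32_t)(x) >> (32 - (n)))))
```

Note that the macro name `I` clashes with the imaginary unit from `<complex.h>`, so that header should not be included in the same file.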
os64dev
Member
Posts: 553
Joined: Sat Jan 27, 2007 3:21 pm
Location: Best, Netherlands

Post by os64dev »

jhawthorn wrote:Latest changes have put me up to ~4050000 h/s on my AMD64 3000+. I have to disagree with changing digest, raw_inhash, and charset into global variables. Digest will almost certainly be used individually by each thread eventually. charset will, hopefully, not always be a constant. Moreover, all the functions are being inlined, so there shouldn't (assuming your compiler is half sane) be a big performance hit from passing them an additional argument.

I look forward to seeing the threaded and then distributed versions of this piece of code.
Well, you can disagree, but I just did it to gain performance. The md5 hash function now uses registers for the whole md5 processing. The additional argument did take a few percent for the same reason as above. I should test it on 64-bit because then the global variables stuff will be converted to RIP-relative addressing.

I added a new version because there was a bug in the previous versions (try aaa as a password). However, it seems to have slowed down a bit.
The process takes longer now but the hashes per second are still high. The total time is bogus anyway; for instance, test with the passwords zzzzza and zzzzzz.
$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
- time: 47.14s
- avg. hash/s: 4327197.45 h/s

real 0m47.297s
user 0m47.187s
sys 0m0.030s
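
To see why the total time says little, it helps to compute where a password sits in the search order. The sketch below assumes an odometer-style enumeration of fixed-length lowercase candidates in which the first character changes fastest. That ordering is an assumption (the attached source is not shown), and `candidate_index` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Position of a lowercase password in an odometer-style enumeration
 * where the FIRST character varies fastest (assumed ordering).
 * Only fixed-length candidates over 'a'..'z' are counted. */
uint64_t candidate_index(const char *pw)
{
    uint64_t index = 0, weight = 1;
    size_t i, len = strlen(pw);

    for (i = 0; i < len; ++i) {
        index  += (uint64_t)(pw[i] - 'a') * weight;
        weight *= 26;
    }
    return index;
}
```

Under that assumption 'hacker' is candidate 203,988,415, which is roughly what 47.14 s at 4,327,197 h/s works out to (about 203,984,000 hashes). By the same formula zzzzza is candidate 11,881,375 and zzzzzz is candidate 308,915,775, so two passwords that differ in one letter can be about 26 times apart in run time.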
@glneo
I tested that and it produced about the same result. I even made an MD5 hash that doesn't have the length parameter either, based on the assumption that a password generally is no longer than 64-9 = 55 characters, but even that didn't improve much.
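
The 55 comes from MD5's padding: a 64-byte block has to hold the message, one 0x80 marker byte and an 8-byte bit length, so anything up to 55 bytes fits in a single block. A minimal sketch of building that block follows; `md5_pad_single_block` is a hypothetical helper, not a function from the attached source.

```c
#include <stdint.h>
#include <string.h>

/* Build the single 64-byte MD5 block for a message of at most 55 bytes.
 * Returns 0 on success, -1 if the message does not fit in one block. */
int md5_pad_single_block(const unsigned char *msg, size_t len,
                         unsigned char block[64])
{
    uint64_t bits = (uint64_t)len * 8;
    int i;

    if (len > 55)
        return -1;

    memset(block, 0, 64);
    memcpy(block, msg, len);
    block[len] = 0x80;                  /* mandatory 1 bit, then zeros */
    for (i = 0; i < 8; ++i)             /* 64-bit length, little-endian */
        block[56 + i] = (unsigned char)(bits >> (8 * i));
    return 0;
}
```

With the block fixed at 64 bytes the hash loop runs exactly once, which is why dropping the length parameter is possible at all.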

@all
i've been trying to get the multithreaded function running and succeeded, however the performance didn't even get close to the single-threaded version, so I'm puzzled. I think I'll leave the MT version for Kevin :wink:
os64dev
Member
Posts: 553
Joined: Sat Jan 27, 2007 3:21 pm
Location: Best, Netherlands

Post by os64dev »

Ahh, for such sweet moments we live. I am glad to announce that multi-threading is working \:D/ . And for your pleasure here it is. I know that some of you are eager for the stats. I limited the sequence to 32 characters.
$ g++ brute-mt.cc -foptimize-register-move -finline-functions -fno-exceptions -fno-rtti -fomit-frame-pointer -O3 -march=i686 -o brute.exe

$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6 2
threadList[t].sequence[0]: 0
threadList[t].sequence[0]: 13
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
time: 23.23s
- avg. hash/s: 7106282.43 h/s
#done.

real 0m23.454s
user 0m46.655s
sys 0m0.015s
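
The two start values printed in the run above (0 and 13) are what an even split of a 26-letter charset across two threads would give. A minimal sketch of such a split, assuming each thread is handed a contiguous slice of one character position; `thread_start` is a hypothetical helper and the real brute-mt.cc may compute this differently.

```c
/* First charset index handled by thread t when charset_len symbols are
 * split as evenly as possible across nthreads (hypothetical helper).
 * Thread t covers [thread_start(t), thread_start(t + 1)). */
unsigned int thread_start(unsigned int t, unsigned int nthreads,
                          unsigned int charset_len)
{
    return (t * charset_len) / nthreads;
}
```

With three threads the same formula gives starts of 0, 8 and 17, so the slices differ by at most one symbol.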
frank
Member
Posts: 729
Joined: Sat Dec 30, 2006 2:31 pm
Location: East Coast, USA

Post by frank »

Well I have a Core 2 Duo running at 1.4 GHz and I have some results for the code os64dev posted above.

2 Threads

Code: Select all

$ time ./brute d6a6bc0db10694a2d90e3a69648f3a03 6 2
threadList[t].sequence[0]: 0
threadList[t].sequence[0]: 13
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
time: 38.87s
- avg. hash/s: 4493377.81 h/s
#done.

real    0m38.977s
user    1m14.256s
sys     0m0.093s
1 Thread

Code: Select all

$ time ./brute d6a6bc0db10694a2d90e3a69648f3a03 6 1
threadList[t].sequence[0]: 0
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
time: 26.98s
- avg. hash/s: 3084769.79 h/s
#done.

real    0m27.098s
user    0m25.864s
sys     0m0.139s
EDIT: Cain takes 52 seconds on my computer when I set it to min 6 max 6 and the lowercase alpha charset. It says 4050000 pass/s.
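
For comparing the two runs, the throughput figures can be turned into a speedup and an efficiency. The helpers below are just that arithmetic spelled out; the function names are made up for this example.

```c
/* Throughput ratio of an n-thread run over a 1-thread run. */
double throughput_speedup(double rate_n, double rate_1)
{
    return rate_n / rate_1;
}

/* Fraction of ideal linear scaling achieved with nthreads. */
double parallel_efficiency(double rate_n, double rate_1,
                           unsigned int nthreads)
{
    return throughput_speedup(rate_n, rate_1) / nthreads;
}
```

With the numbers above, `throughput_speedup(4493377.81, 3084769.79)` is about 1.46 and the efficiency about 0.73. The wall-clock time to the hit was still longer with two threads (38.87 s against 26.98 s), so aggregate h/s alone does not say how soon the thread that owns the right slice gets there.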
Kevin McGuire
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States

Post by Kevin McGuire »

Does this one do any better? I am too afraid to report my findings, since the last time I pursued what I thought was faster, it was not.

Compile
gcc md5.c -o md5 -O3
Options
md5 [hash] [minimum-length] [maximum-length] [thread-count]

Try it with the hash:
d6a6bc0db10694a2d90e3a69648f3a03 = hacker (longer run time)

Also, multiple threads on a UNI (uniprocessor) machine can improve the cracking time by starting at different offsets in the message space.
Attachments
md5.c
(11.55 KiB) Downloaded 127 times