Posted: Thu Jul 05, 2007 4:59 pm
@jhawthorn I have also found that if you remove the unused variable "digest", it reports "Collision NOT Found", so I've been walking on eggshells, trying not to rearrange things too much or things stop working.
I think he meant it would work on "any" 32-bit CPU. You do know there are more than just x86 processors, right?
Ninjarider wrote:
As far as compatibility goes, there shouldn't be any issue with an Intel 32-bit processor using the pipelines. It would not have the same speed going from an Intel to an AMD. It will not exactly double the speed, but it gives the possibility to increase speed up to double for any loops.
Not to mention that when implementing something like that, you can run into computation errors because the V pipe has executed an instruction before the U pipe, and the U pipe required a value the V pipe changed.
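The way to benefit from a second pipeline is to give it work that does not depend on the first one's results. The hardware keeps dependent instructions in order (on the Pentium, a dependent pair is simply not issued together), so the practical concern is throughput rather than wrong results. A minimal sketch of the idea on a plain sum, with hypothetical function names that are not code from the brute-forcer:

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every addition needs the previous result, so the
 * additions form a single serial dependency chain. */
uint32_t sum_one_chain(const uint32_t *v, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}

/* Two accumulators: even and odd elements feed independent chains, so a
 * superscalar core (e.g. the Pentium's U and V pipes) is free to overlap
 * an addition from each chain. */
uint32_t sum_two_chains(const uint32_t *v, size_t n)
{
    uint32_t acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += v[i];
        acc1 += v[i + 1];
    }
    if (i < n)              /* odd element count: pick up the last one */
        acc0 += v[i];
    return acc0 + acc1;     /* unsigned wraparound makes the order irrelevant */
}
```

In a brute-forcer the analogous trick would be hashing two candidates with their MD5 rounds interleaved. At -O3, GCC may also reassociate or vectorize the one-accumulator loop on its own, so it is worth timing before splitting chains by hand.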
Doh, that error was kind of obvious. I didn't look into it because I hadn't rearranged that part of the code, but thanks. Of course we will continue; I have to beat Cain.
jhawthorn wrote:
Found myself 5 minutes to look at it and found the source of the problem.
Please change
Code: Select all
sscanf(&argv[1][t*2], "%2x", (unsigned int *)&raw_inhash[t]);
to
Code: Select all
unsigned int tmp;   /* %x needs a full unsigned int to store into */
sscanf(&argv[1][t*2], "%2x", &tmp);
raw_inhash[t] = tmp;   /* then narrow the value to one byte */
Keep up the good work. I love watching projects like this.
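The original line is a buffer overflow, not a style issue: %x always stores a full unsigned int (typically 4 bytes), so casting &raw_inhash[t] makes each call write past raw_inhash[t], and the final iterations presumably land beyond the end of the array, in whatever the compiler placed next to it. That would also explain why deleting the unused digest variable changed the result. A slightly more defensive version of the fix, assuming raw_inhash is a 16-byte unsigned char array (parse_hex_digest is a hypothetical helper name):

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Parse a 32-character hex MD5 digest into 16 raw bytes.
 * Returns 0 on success, -1 if hex is not exactly 32 hex digits. */
int parse_hex_digest(const char *hex, unsigned char out[16])
{
    /* Validate first, so sscanf never reads past the end of the string
     * and never sees whitespace, a sign or a "0x" prefix that %x accepts. */
    if (strlen(hex) != 32)
        return -1;
    for (size_t i = 0; i < 32; ++i)
        if (!isxdigit((unsigned char)hex[i]))
            return -1;

    for (size_t t = 0; t < 16; ++t) {
        unsigned int tmp;               /* %x stores a full unsigned int */
        if (sscanf(&hex[t * 2], "%2x", &tmp) != 1)
            return -1;
        out[t] = (unsigned char)tmp;    /* narrow only after the store */
    }
    return 0;
}
```

Called as parse_hex_digest(argv[1], raw_inhash), it also rejects a target hash that is not exactly 32 hex digits instead of reading past the end of argv[1].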
New version:
$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
- time: 48.81s
- avg. hash/s: 4178977.22 h/s
real 0m48.938s
user 0m48.859s
sys 0m0.000s
$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
- time: 22.98s
- avg. hash/s: 4371943.88 h/s
real 0m23.047s
user 0m23.015s
sys 0m0.015s
Code: Select all
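#include <stdint.h>

/* Standard RFC 1321 round functions and rotation; guarded in case the
 * program already defines them. */
#ifndef F
#define F(x, y, z) (((x) & (y)) | (~(x) & (z)))
#define G(x, y, z) (((x) & (z)) | ((y) & ~(z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define I(x, y, z) ((y) ^ ((x) | ~(z)))
#endif
#ifndef ROTATE_LEFT
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#endif

/* Hashes mlength bytes of an already padded message (mlength must be a
 * multiple of 64; no padding is done here) and returns 1 if the digest
 * equals the 16 bytes in input. GCC expects always_inline functions to
 * also be declared inline; otherwise it warns that the function might
 * not be inlinable. */
static inline __attribute__((__always_inline__))
int md5_hash(unsigned char *message, unsigned int mlength, unsigned char input[16])
{
uint32_t AA, BB, CC, DD;   /* chaining values, initialised to the MD5 IV */
uint32_t *X;               /* current 64-byte block viewed as 16 words */
uint32_t A, B, C, D;
uint32_t i;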
AA = 0x67452301;
BB = 0xefcdab89;
CC = 0x98badcfe;
DD = 0x10325476;
for(i = 0; i < (mlength / 64); ++i)
{
A = AA;
B = BB;
C = CC;
D = DD;
X = (uint32_t *)&message[i * 64];
/// round one (unrolled)
A = B + ROTATE_LEFT((A + F(B, C, D) + X[ 0] + 0xd76aa478), 7);
D = A + ROTATE_LEFT((D + F(A, B, C) + X[ 1] + 0xe8c7b756), 12);
C = D + ROTATE_LEFT((C + F(D, A, B) + X[ 2] + 0x242070db), 17);
B = C + ROTATE_LEFT((B + F(C, D, A) + X[ 3] + 0xc1bdceee), 22);
A = B + ROTATE_LEFT((A + F(B, C, D) + X[ 4] + 0xf57c0faf), 7);
D = A + ROTATE_LEFT((D + F(A, B, C) + X[ 5] + 0x4787c62a), 12);
C = D + ROTATE_LEFT((C + F(D, A, B) + X[ 6] + 0xa8304613), 17);
B = C + ROTATE_LEFT((B + F(C, D, A) + X[ 7] + 0xfd469501), 22);
A = B + ROTATE_LEFT((A + F(B, C, D) + X[ 8] + 0x698098d8), 7);
D = A + ROTATE_LEFT((D + F(A, B, C) + X[ 9] + 0x8b44f7af), 12);
C = D + ROTATE_LEFT((C + F(D, A, B) + X[10] + 0xffff5bb1), 17);
B = C + ROTATE_LEFT((B + F(C, D, A) + X[11] + 0x895cd7be), 22);
A = B + ROTATE_LEFT((A + F(B, C, D) + X[12] + 0x6b901122), 7);
D = A + ROTATE_LEFT((D + F(A, B, C) + X[13] + 0xfd987193), 12);
C = D + ROTATE_LEFT((C + F(D, A, B) + X[14] + 0xa679438e), 17);
B = C + ROTATE_LEFT((B + F(C, D, A) + X[15] + 0x49b40821), 22);
/// round two (unrolled)
A = B + ROTATE_LEFT((A + G(B, C, D) + X[ 1] + 0xf61e2562), 5);
D = A + ROTATE_LEFT((D + G(A, B, C) + X[ 6] + 0xc040b340), 9);
C = D + ROTATE_LEFT((C + G(D, A, B) + X[11] + 0x265e5a51), 14);
B = C + ROTATE_LEFT((B + G(C, D, A) + X[ 0] + 0xe9b6c7aa), 20);
A = B + ROTATE_LEFT((A + G(B, C, D) + X[ 5] + 0xd62f105d), 5);
D = A + ROTATE_LEFT((D + G(A, B, C) + X[10] + 0x02441453), 9);
C = D + ROTATE_LEFT((C + G(D, A, B) + X[15] + 0xd8a1e681), 14);
B = C + ROTATE_LEFT((B + G(C, D, A) + X[ 4] + 0xe7d3fbc8), 20);
A = B + ROTATE_LEFT((A + G(B, C, D) + X[ 9] + 0x21e1cde6), 5);
D = A + ROTATE_LEFT((D + G(A, B, C) + X[14] + 0xc33707d6), 9);
C = D + ROTATE_LEFT((C + G(D, A, B) + X[ 3] + 0xf4d50d87), 14);
B = C + ROTATE_LEFT((B + G(C, D, A) + X[ 8] + 0x455a14ed), 20);
A = B + ROTATE_LEFT((A + G(B, C, D) + X[13] + 0xa9e3e905), 5);
D = A + ROTATE_LEFT((D + G(A, B, C) + X[ 2] + 0xfcefa3f8), 9);
C = D + ROTATE_LEFT((C + G(D, A, B) + X[ 7] + 0x676f02d9), 14);
B = C + ROTATE_LEFT((B + G(C, D, A) + X[12] + 0x8d2a4c8a), 20);
/// round three (unrolled)
A = B + ROTATE_LEFT((A + H(B, C, D) + X[ 5] + 0xfffa3942), 4);
D = A + ROTATE_LEFT((D + H(A, B, C) + X[ 8] + 0x8771f681), 11);
C = D + ROTATE_LEFT((C + H(D, A, B) + X[11] + 0x6d9d6122), 16);
B = C + ROTATE_LEFT((B + H(C, D, A) + X[14] + 0xfde5380c), 23);
A = B + ROTATE_LEFT((A + H(B, C, D) + X[ 1] + 0xa4beea44), 4);
D = A + ROTATE_LEFT((D + H(A, B, C) + X[ 4] + 0x4bdecfa9), 11);
C = D + ROTATE_LEFT((C + H(D, A, B) + X[ 7] + 0xf6bb4b60), 16);
B = C + ROTATE_LEFT((B + H(C, D, A) + X[10] + 0xbebfbc70), 23);
A = B + ROTATE_LEFT((A + H(B, C, D) + X[13] + 0x289b7ec6), 4);
D = A + ROTATE_LEFT((D + H(A, B, C) + X[ 0] + 0xeaa127fa), 11);
C = D + ROTATE_LEFT((C + H(D, A, B) + X[ 3] + 0xd4ef3085), 16);
B = C + ROTATE_LEFT((B + H(C, D, A) + X[ 6] + 0x04881d05), 23);
A = B + ROTATE_LEFT((A + H(B, C, D) + X[ 9] + 0xd9d4d039), 4);
D = A + ROTATE_LEFT((D + H(A, B, C) + X[12] + 0xe6db99e5), 11);
C = D + ROTATE_LEFT((C + H(D, A, B) + X[15] + 0x1fa27cf8), 16);
B = C + ROTATE_LEFT((B + H(C, D, A) + X[ 2] + 0xc4ac5665), 23);
/// round four (unrolled)
A = B + ROTATE_LEFT((A + I(B, C, D) + X[ 0] + 0xf4292244), 6);
D = A + ROTATE_LEFT((D + I(A, B, C) + X[ 7] + 0x432aff97), 10);
C = D + ROTATE_LEFT((C + I(D, A, B) + X[14] + 0xab9423a7), 15);
B = C + ROTATE_LEFT((B + I(C, D, A) + X[ 5] + 0xfc93a039), 21);
A = B + ROTATE_LEFT((A + I(B, C, D) + X[12] + 0x655b59c3), 6);
D = A + ROTATE_LEFT((D + I(A, B, C) + X[ 3] + 0x8f0ccc92), 10);
C = D + ROTATE_LEFT((C + I(D, A, B) + X[10] + 0xffeff47d), 15);
B = C + ROTATE_LEFT((B + I(C, D, A) + X[ 1] + 0x85845dd1), 21);
A = B + ROTATE_LEFT((A + I(B, C, D) + X[ 8] + 0x6fa87e4f), 6);
D = A + ROTATE_LEFT((D + I(A, B, C) + X[15] + 0xfe2ce6e0), 10);
C = D + ROTATE_LEFT((C + I(D, A, B) + X[ 6] + 0xa3014314), 15);
B = C + ROTATE_LEFT((B + I(C, D, A) + X[13] + 0x4e0811a1), 21);
A = B + ROTATE_LEFT((A + I(B, C, D) + X[ 4] + 0xf7537e82), 6);
D = A + ROTATE_LEFT((D + I(A, B, C) + X[11] + 0xbd3af235), 10);
C = D + ROTATE_LEFT((C + I(D, A, B) + X[ 2] + 0x2ad7d2bb), 15);
B = C + ROTATE_LEFT((B + I(C, D, A) + X[ 9] + 0xeb86d391), 21);
AA += A;
BB += B;
CC += C;
DD += D;
}
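/* Compare against the target digest word by word. This assumes a
 * little-endian CPU such as x86, where the digest bytes in input line
 * up with AA..DD as stored. uint32_t is 4 bytes on every platform,
 * whereas unsigned long is 8 bytes on 64-bit Linux and would compare
 * the wrong bytes there. */
if((*(uint32_t *)(input) == AA) &&
(*(uint32_t *)(input+4) == BB) &&
(*(uint32_t *)(input+8) == CC) &&
(*(uint32_t *)(input+12) == DD))
return 1;
return 0;
}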
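md5_hash only runs the compression loop over whole 64-byte blocks (mlength / 64) and never pads, so the caller has to hand it an already padded buffer. For the short candidates a brute-forcer generates, that is a single block. A sketch of that padding step (md5_pad_one_block is a hypothetical name; the layout is the standard RFC 1321 one: message, a 0x80 byte, zeros, then the bit length as a 64-bit little-endian integer):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build the single padded 64-byte block MD5 hashes for a message shorter
 * than 56 bytes. Returns the value to pass as mlength (always 64), or 0
 * if the message does not fit in one block. */
unsigned int md5_pad_one_block(const char *msg, size_t len, unsigned char block[64])
{
    if (len > 55)                   /* 0x80 byte + 8 length bytes must fit too */
        return 0;
    memset(block, 0, 64);
    memcpy(block, msg, len);
    block[len] = 0x80;              /* mandatory single 1 bit after the message */
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; ++i)     /* bit length, least significant byte first */
        block[56 + i] = (unsigned char)(bits >> (8 * i));
    return 64;
}
```

Passing such a block for the candidate "hacker" to md5_hash with mlength = 64 should then match the target hash used in the timing runs.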
Well, you can disagree, but I just did it to gain performance. The md5 hash function now keeps the whole MD5 computation in registers. The additional argument did cost a few percent, for the same reason as above. I should test it on 64-bit, because there the accesses to the global variables will be compiled to RIP-relative addressing.
jhawthorn wrote:
Latest changes have put me up to ~4050000 h/s on my AMD64 3000+. I have to disagree with changing digest, raw_inhash, and charset into global variables. Digest will almost certainly be used individually by each thread eventually. charset will, hopefully, not always be a constant. Moreover, all the functions are being inlined, so there shouldn't (assuming your compiler is half sane) be a big performance hit from passing them an additional argument.
I look forward to seeing the threaded and then distributed versions of this piece of code.
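jhawthorn's alternative to globals can be sketched as a per-thread context struct passed by pointer. Once the helpers are inlined, GCC may keep the fields it actually uses in registers, though whether that recovers the few percent measured above would need timing. The struct layout, ctx_init and candidate_from_index are hypothetical, not the brute-forcer's actual code; the candidate is simply the index written in base charset_len:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-thread search state: each thread owns one of these
 * and passes it by pointer instead of reading globals. */
struct search_ctx {
    const char *charset;         /* chosen at run time, not a constant */
    size_t charset_len;
    unsigned char target[16];    /* raw digest being searched for */
    unsigned char block[64];     /* this thread's private message buffer */
};

static void ctx_init(struct search_ctx *ctx, const char *charset,
                     const unsigned char target[16])
{
    memset(ctx, 0, sizeof *ctx);
    ctx->charset = charset;
    ctx->charset_len = strlen(charset);
    memcpy(ctx->target, target, 16);
}

/* Write candidate number index (first character most significant) into
 * out[0..length-1] and NUL-terminate it; out needs length + 1 bytes. */
static inline void candidate_from_index(const struct search_ctx *ctx,
                                        unsigned long long index,
                                        size_t length, char *out)
{
    for (size_t pos = length; pos-- > 0; ) {
        out[pos] = ctx->charset[index % ctx->charset_len];
        index /= ctx->charset_len;
    }
    out[length] = '\0';
}
```

With the lowercase alphabet as charset and a length of 6, index 0 yields "aaaaaa" and index 83211665 yields "hacker".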
@glneo
$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
- time: 47.14s
- avg. hash/s: 4327197.45 h/s
real 0m47.297s
user 0m47.187s
sys 0m0.030s
$ g++ brute-mt.cc -foptimize-register-move -finline-functions -fno-exceptions -fno-rtti -fomit-frame-pointer -O3 -march=i686 -o brute.exe
$ time ./brute.exe d6a6bc0db10694a2d90e3a69648f3a03 6 2
threadList[t].sequence[0]: 0
threadList[t].sequence[0]: 13
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
time: 23.23s
- avg. hash/s: 7106282.43 h/s
#done.
real 0m23.454s
user 0m46.655s
sys 0m0.015s
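In the two-thread run the user time (46.655 s) is about twice the real time (23.454 s), so both threads kept a core busy. The two sequence[0] lines (0 and 13) are consistent with the threads splitting the first character of a 26-letter charset in half, although the actual partitioning code is not shown in the thread. A hypothetical helper that divides the first position evenly among any number of threads:

```c
#include <stddef.h>

/* Hypothetical helper: give thread t (0-based, of nthreads) the half-open
 * range [*begin, *end) of first-character indices into a charset of
 * charset_len symbols. Integer division spreads any remainder, so range
 * sizes differ by at most one and together cover 0..charset_len-1. */
void first_char_range(size_t charset_len, size_t nthreads, size_t t,
                      size_t *begin, size_t *end)
{
    *begin = t * charset_len / nthreads;
    *end = (t + 1) * charset_len / nthreads;
}
```

With 26 letters and 2 threads this gives 0 to 12 and 13 to 25; with 3 threads it gives 0 to 7, 8 to 16 and 17 to 25.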
Code: Select all
$ time ./brute d6a6bc0db10694a2d90e3a69648f3a03 6 2
threadList[t].sequence[0]: 0
threadList[t].sequence[0]: 13
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
time: 38.87s
- avg. hash/s: 4493377.81 h/s
#done.
real 0m38.977s
user 1m14.256s
sys 0m0.093s
Code: Select all
$ time ./brute d6a6bc0db10694a2d90e3a69648f3a03 6 1
threadList[t].sequence[0]: 0
Collision Found!
hash[d6a6bc0db10694a2d90e3a69648f3a03] = 'hacker'
time: 26.98s
- avg. hash/s: 3084769.79 h/s
#done.
real 0m27.098s
user 0m25.864s
sys 0m0.139s