Hi,
Thomas wrote: You guys are incorrect, it seems. The output is 2.00 if a = 3.0 and b = 2.0, and 3.00 if a = 3.0 and b = 1.0
That's because the rounding errors cancel each other out. For example, using three significant digits:
2/3 = 667*10^(-3)
667*10^(-3) * 3 = 2001*10^(-3), which rounds to 200*10^(-2) = 2
Now try "((4/3)-1)*3" from my post:
Code:
#include <stdio.h>

int main(void) {
    float a = 3.0f;
    float b = 4.0f;
    float c = 1.0f;
    float x;

    /* 4/3 is rounded to float precision; subtracting 1 exposes the error */
    x = (b/a - c) * a;
    printf("%#.30f\n", x);
    return 0;
}
And now with "double":
Code:
#include <stdio.h>

int main(void) {
    double a = 3.0;
    double b = 4.0;
    double c = 1.0;
    double x;

    /* Same expression; the error is smaller but still not zero */
    x = (b/a - c) * a;
    printf("%#.30f\n", x);
    return 0;
}
Now, let's do it with 8-bit rational numbers (4-bit numerator and 4-bit divisor):
Code:
a = {0011b / 0001b};
b = {0100b / 0001b};
c = {0001b / 0001b};
x = (b/a - c) * a
  = ( {0100b / 0001b} / {0011b / 0001b} - {0001b / 0001b} ) * {0011b / 0001b}; // Replace variables
  = ( {0100b / 0001b} * {0001b / 0011b} - {0001b / 0001b} ) * {0011b / 0001b}; // Multiply by reciprocal rather than divide
  = ( {1100b / 0011b} * {0001b / 0011b} - {0001b / 0001b} ) * {0011b / 0001b}; // Get divisors the same
  = ( {1100b / 1001b} - {0001b / 0001b} ) * {0011b / 0001b}; // Do first multiplication
  = ( {0100b / 0011b} - {0001b / 0001b} ) * {0011b / 0001b}; // Remove highest common factor (3)
  = ( {0100b / 0011b} - {0011b / 0011b} ) * {0011b / 0001b}; // Get divisors the same
  = {0001b / 0011b} * {0011b / 0001b}; // Do subtraction
  = {0001b / 0011b} * {0011b / 0001b}; // Remove highest common factor (1), nothing changes
  = {0001b / 0011b} * {1001b / 0011b}; // Get divisors the same
  = {1001b / 1001b}; // Do second multiplication
  = {0001b / 0001b}; // Remove highest common factor (9)
  = 1
Who wants to know what 0.1 looks like in binary? Here it is (it's recurring):
0.000110011001100110011001100110011...
In 32-bit floating point it becomes "1.100110011001100110011010b * 2^-4" (which is about 0.10000000149011611938 in decimal).
In 64-bit floating point it becomes "1.1001100110011001100110011001100110011001100110011010b * 2^-4" (which is about 0.10000000000000000555 in decimal).
In 8-bit rational, it becomes "0001b / 1010b" (which is a perfect representation of 0.1).
For the examples above, binary floating point would need an infinite number of bits to represent exactly what rational numbers can represent in 8 bits!
Cheers,
Brendan