From d27623e797ca7c8a1d53ba08088b0a1d8a07ae15 Mon Sep 17 00:00:00 2001
From: Perry Kivolowitz
Date: Thu, 16 Jun 2022 12:30:06 -0500
Subject: [PATCH] refactored the text and completed what

---
 section_1/float/.vscode/settings.json |   3 +
 section_1/float/floatster.cpp         |  34 +++++
 section_1/float/what.md               | 207 +++++++++++++++++++-------
 3 files changed, 194 insertions(+), 50 deletions(-)
 create mode 100644 section_1/float/floatster.cpp

diff --git a/section_1/float/.vscode/settings.json b/section_1/float/.vscode/settings.json
index 9a7112f..2f37ace 100644
--- a/section_1/float/.vscode/settings.json
+++ b/section_1/float/.vscode/settings.json
@@ -5,5 +5,8 @@
     },
     "cSpell.ignoreWords": [
         "SIZF"
+    ],
+    "cSpell.words": [
+        "isinf"
     ]
 }
\ No newline at end of file
diff --git a/section_1/float/floatster.cpp b/section_1/float/floatster.cpp
new file mode 100644
index 0000000..08e0879
--- /dev/null
+++ b/section_1/float/floatster.cpp
@@ -0,0 +1,34 @@
+#include
+#include
+#include
+#include
+#include
+
+using namespace std;
+
+union U {
+    float f;
+    uint32_t i;
+};
+
+int main(int argc, char ** argv) {
+    U u;
+    int e;
+
+    cout << "Enter a number (-100 causes divide by 0, -200 causes sqrt(-1): ";
+    cin >> u.f;
+    if (u.f == -100) {
+        cout << "Dividing by zero." << endl;
+        u.f /= 0.0;
+    } else if (u.f == -200) {
+        cout << "Using sqrt(-1))." << endl;
+        u.f = sqrtf(-1);
+    }
+    e = ((u.i >> 23) & 0xFF);
+    cout << "sign: " << hex << ((u.i >> 31) & 0x01) << endl;
+    cout << "exp: " << hex << e << " debiased: " << dec << e - 127 << endl;
+    cout << "frac: " << hex << setw(7) << setfill('0') << (u.i & 0x7FFFFF) << endl;
+    cout << "NaN: " << isnan(u.f) << endl;
+    cout << "Inf: " << isinf(u.f) << endl;
+    return 0;
+}
\ No newline at end of file
diff --git a/section_1/float/what.md b/section_1/float/what.md
index 39a4342..a065918 100644
--- a/section_1/float/what.md
+++ b/section_1/float/what.md
@@ -21,16 +21,15 @@ apart.
 
 Here are some examples:
 
-```text
-% ./a.out
+```text
 Must supply a floating point value as a command line argument.
-%
 ```
 
-This is what happens when you do not provide a value to examine.
+The above is what happens when you do not provide a value to examine.
+
+Next, let's see what is output from the value of 1.
 
 ```text
-% ./a.out 1
 Component         Double            Float               Comment
 Value:            1                 1                   Delta(F - D): 0
 Sign:             0                 0
@@ -42,27 +41,61 @@ Quarters: 0 0
 Eighths:          0                 0
 Sixteenths:       0                 0
 Thirty seconds:   0                 0
-%
+Full fraction:    0                 0
+Equation:         1 x 2^0           1 x 2^0
 ```
 
-Above, we examine the value of 1.
+On the line marked "Value" you can see the values represented as double precision
+and as single precision. Under "Comment" you can see that there
+is no difference between the double and the single precision numbers. Remember
+the key thing about floating point numbers: they are approximations. Sometimes,
+as in the case of whole numbers like 1, the approximation is exact. When there
+is a difference, the difference will be small and printed in the Comment
+column.
 
-On the line marked "Value" you can see the values represented as double precision and as single precious. Under "Comment" you can see that there
-is no difference between the double and the single precision numbers.
+The Sign field is 0. This indicates that the whole floating point value is positive.
+There is no other sign bit, not even in the exponent. However, exponents can
+be negative... this is explained next.
 
-| Line | Meaning |
-| ---- | ------- |
-| Sign | 1 is a positive number so the sign bits are 0 |
-| Exponent | First, notice that the double precision exponent is 11 bits wide while the single precision exponent is only 8 bits wide. Next, notice the values... 1023 and 127 respectively. The value of 1 is 1 raised to the power of 0 base 2. So why 1023 or 127?
There is no sign bit for the exponent yet the exponent must support negative numbers. It does this by incorporating an offset of 1023 and 127 respectively (where both work out to a value of 0). Anything above 1023 and 127 are positive exponents. Anything below these values are negative exponents.
-| De-biased | These are the values of the exponent with their bias removed. Notice they work out to 0. So, the value of 1 is represented by 1 raised to the power of 0 base 2. |
-| Fraction | Zero??? Where's the 1 that we've been talking about get stored? It isn't. A value of 1 is always assumed to be the only value in front of the decimal place in a `float` or `double`. Every floating point value is 1 plus a fraction all raised to some power base 2. |
-| Halves | There are no halves in the value of 1.|
-| Quarters | There are no quarters in the value of 1.|
-| Eighths | There are no eighths in the value of 1.|
-| Sixteenths | There are no sixteenths in the value of 1.|
-| Thirty Seconds | There are no thirty seconds in the value of 1.|
+First, notice that the double precision exponent is 11 bits wide while the single
+precision exponent is only 8 bits wide. Next, notice the values... 1023 and 127
+respectively. The value of 1 is 1 times 2 raised to the power of 0. So why 1023
+or 127?
 
-Of course, there are more fractional values to `float` and `doubles` but listing them all wouldn't be a fun tasks and we're all about fun. :)
+There is no sign bit for the exponent yet the exponent must support negative numbers.
+It does this by incorporating an offset (the bias) of 1023 and 127 respectively (both
+representing an exponent of 0). Stored values above 1023 or 127 are positive exponents.
+Stored values below them are negative exponents.
+
+The De-biased line shows the values of the exponent with their bias removed.
+Notice they work out to 0. So, the value of 1 is represented by 1 times 2 raised to the power of 0.
+
+The Fraction has a value of zero. Where does the 1 that we've been talking about get stored?
+It isn't. A value of 1 is always assumed to be the only value in front of the binary point
+in a `float` or `double`. Every normalized floating point value is 1 plus a fraction, all
+multiplied by some power of 2.
+
+We thought we'd highlight a few of the bits in the fractional part of a floating point
+number. These are most illuminating when the magnitude of the value being shown is in
+the range 1 <= |x| < 2. Notice that the values of -2 and 2 are outside this range. In other
+words, showing the first few bits of the fraction is illuminating when the exponent works
+out to 0.
+
+* Halves - There are no halves in the value of 1.
+
+* Quarters - There are no quarters in the value of 1.
+
+* Eighths - There are no eighths in the value of 1.
+
+* Sixteenths - There are no sixteenths in the value of 1.
+
+* Thirty Seconds - There are no thirty seconds in the value of 1.
+
+Of course, there are more fractional values to `float` and `double` but listing them all
+wouldn't be a fun task and we're all about fun. :)
+
+Finally, the Equation line rebuilds the floating point value in its actual "scientific"
+notation. The value of 1 is 1 times 2 raised to the zeroth power.
 
 How about a value of 1.5?
 
@@ -77,16 +110,13 @@ Halves: 1 1
 Quarters:         0                 0
 Eighths:          0                 0
 Sixteenths:       0                 0
-Thirty seconds:   0                 0
+Thirty seconds:   0                 0
+Full fraction:    0.5               0.5
+Equation:         1.5 x 2^0         1.5 x 2^0
 ```
 
-The only difference is that there is a bit turned on in the fraction. It is the most significant bit... there is a half in one and a half.
-
-1 ^ 0 = 1 +
-
-1 ^ -1 = ½
-
-Altogether makes 1.5.
+The only difference is that there is a bit turned on in the fraction.
+It is the most significant bit... there is a half in one and a half.
 
 How about 1.875?
 
@@ -101,43 +131,120 @@ Halves: 1 1
 Quarters:         1                 1
 Eighths:          1                 1
 Sixteenths:       0                 0
-Thirty seconds:   0                 0
+Thirty seconds:   0                 0
+Full fraction:    0.875             0.875
+Equation:         1.875 x 2^0       1.875 x 2^0
 ```
 
-This says 1.875 is:
+How about 8.5?
 
-1 ^ 0 = 1 +
-
-1 ^ -1 = ½ +
-
-1 ^ -2 = ¼ +
-
-1 ^ -3 = ⅛
-
-How about 8.375? This is the first time we are looking at
+This is the first time we are looking at
 a value which increases the (de-biased) exponent to non-zero.
-Things get a little more complicated.
+Things get a little more complicated. Now, there isn't an
+obvious mapping of the fraction bits to the final number they
+represent. This is the impact of the non-zero exponent.
 
 ```text
 Component         Double            Float               Comment
-Value:            8.375             8.375               Delta(F - D): 0
+Value:            8.5               8.5                 Delta(F - D): 0
 Sign:             0                 0
 Exponent (hex):   402               82
 De-biased (dec):  3                 3
-Fraction (hex):   c00000000000      60000
+Fraction (hex):   1000000000000     80000
 Halves:           0                 0
 Quarters:         0                 0
 Eighths:          0                 0
-Sixteenths:       0                 0
-Thirty seconds:   1                 1
+Sixteenths:       1                 1
+Thirty seconds:   0                 0
+Full fraction:    0.0625            0.0625
+Equation:         1.0625 x 2^3      1.0625 x 2^3
 ```
 
-Notice the exponent has changed. This says:
+Even though there is a half in eight and a half, the Halves bit
+is 0. What is 8? Eight is 2 raised to the power of 3. In
+other words, the bit for the half in 8.5 is shifted to the
+right by three bits. Confirm this by looking at the
+Sixteenths. *There's our bit!*
 
-1 ^ 0 = 1 +
+Turn your attention to the Equation. 1.0625 multiplied by 8
+is 8.5. Cool, huh?
 
-1 ^ -1 = ½ +
+How about something harder? Like 8.51 - just a teensy bit
+different from the previous example.
 
-1 ^ -2 = ¼ +
+```text
+Component         Double            Float               Comment
+Value:            8.51              8.510000229         Delta(F - D): 2.288818362e-07
+Sign:             0                 0
+Exponent (hex):   402               82
+De-biased (dec):  3                 3
+Fraction (hex):   1051eb851eb85     828f6
+Halves:           0                 0
+Quarters:         0                 0
+Eighths:          0                 0
+Sixteenths:       1                 1
+Thirty seconds:   0                 0
+Full fraction:    0.06375           0.06375002861
+Equation:         1.06375 x 2^3     1.0637500286 x 2^3
+```
 
-1 ^ -3 = ⅛
+For the first time we're seeing that 8.51 cannot be perfectly
+represented by `float`. `double` is not exact either, but it comes
+much closer. The difference between the `double` and `float` is the
+very small number shown on the first line of output.
+
+## When a Number is Not a Number and How About Infinity?
+
+`NaN` is an actual value. It means `not a number`.
+
+[Here](./floatster.cpp) is the source code to another program we
+have written that explores both `NaN` and `Inf`.
+
+Let's examine `NaN` which is produced when you do naughty things
+like take the square root of a negative number.
+
+```text
+Enter a number (-100 causes divide by 0, -200 causes sqrt(-1): -200
+Using sqrt(-1)).
+sign: 0
+exp: ff debiased: 128
+frac: 0400000
+NaN: 1
+Inf: 0
+```
+
+A `float` is `NaN` when its exponent is 0xFF and its fraction
+is not zero. The sign bit does not matter. So, you'll
+never get a `float` that is 2 raised to the power of 128 because
+that exponent is reserved for `NaN` and `Inf`.
+
+How about `Inf`?
+
+```text
+Enter a number (-100 causes divide by 0, -200 causes sqrt(-1): -100
+Dividing by zero.
+sign: 1
+exp: ff debiased: 128
+frac: 0000000
+NaN: 0
+Inf: 1
+```
+
+Once again, notice the out-of-bounds value for the exponent: 0xFF.
+Second, notice that the fraction is 0. Together these mean `Inf`, an
+infinite result. The sign bit of 1 makes this one negative infinity.
+
+## Testing for Naughty Values
+
+Thankfully, there exist two functions that will do the inspection
+for you, looking for `NaN` and `Inf`.
+
+* `isnan(floating point value)` and
+
+* `isinf(floating point value)`
+
+Both of these functions work with `double` and `float`.
+
+Once a variable goes `NaN`, arithmetic on it keeps producing `NaN`. `Inf`
+mostly stays `Inf` or turns into `NaN` (though a finite value divided by `Inf`
+is 0). Either way, the variable stays suspect until it is reset to a valid number.