Q: What is the difference between 32-bit and 64-bit floating point?

A 32-bit single-precision float uses 1 sign bit, 8 exponent bits (bias 127), and 23 mantissa bits, giving about 7 significant decimal digits. A 64-bit double-precision float uses 1 sign bit, 11 exponent bits (bias 1023), and 52 mantissa bits, giving about 15-17 significant decimal digits. Double precision is the default in most programming languages; single precision is preferred in GPU and ML workloads where memory and throughput matter.
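The field layouts described above can be inspected directly. The following is a minimal sketch using Python's standard `struct` module; the helper names (`float32_fields`, `float64_fields`, `round_to_float32`) are made up for this illustration and are not part of the calculator.

```python
import struct

def float32_fields(x):
    """Return (sign, biased exponent, mantissa) of x rounded to 32 bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def float64_fields(x):
    """Return (sign, biased exponent, mantissa) of x as a 64-bit double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def round_to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(float32_fields(1.0))    # (0, 127, 0): exponent field equals the bias
print(float64_fields(1.0))    # (0, 1023, 0)
print(round_to_float32(0.1))  # 0.10000000149011612: only about 7 digits match 0.1
```

Python's own `float` is a 64-bit double, so the 32-bit view here is obtained by packing and unpacking through the `f` format.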

Q: What are subnormal (denormalized) numbers?

Subnormal numbers are very small values between zero and the smallest normal float. In 32-bit format, normal numbers have exponent bits from 00000001 to 11111110. When the exponent field is all zeros and the mantissa is nonzero, the number is subnormal: the implicit leading bit is 0 instead of 1, and the effective exponent is fixed at -126, so the value is 0.mantissa x 2^-126. Subnormals fill the gap near zero so that results underflow gradually instead of jumping straight to zero, but operations on them can be much slower on hardware that handles them in microcode rather than dedicated circuits.
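As a rough illustration of the subnormal range, the sketch below builds a few 32-bit patterns by hand and decodes them with Python's `struct` module. The helper names `bits_to_float32` and `subnormal_value` are hypothetical, chosen for this example only.

```python
import struct

def bits_to_float32(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def subnormal_value(mantissa):
    """Value of a positive subnormal: no implicit 1, scale fixed at 2^-126."""
    return (mantissa / 2**23) * 2.0**-126

smallest_subnormal = bits_to_float32(0x00000001)  # exponent 0, mantissa 1
largest_subnormal = bits_to_float32(0x007FFFFF)   # exponent 0, mantissa all ones
smallest_normal = bits_to_float32(0x00800000)     # exponent 1, mantissa 0

print(smallest_subnormal == 2.0**-149)            # True
print(largest_subnormal < smallest_normal)        # True
print(subnormal_value(1) == smallest_subnormal)   # True
```

The largest subnormal sits just below the smallest normal value, which is how the format closes the gap between zero and 2^-126.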

Q: What does NaN mean and how does it arise?

NaN stands for Not a Number. IEEE 754 reserves it for the result of invalid operations: 0 divided by 0, infinity minus infinity, 0 times infinity, and the square root of a negative real number. A computation that produces NaN will propagate it through subsequent arithmetic, making it easy to detect that something went wrong. NaN also compares unequal to every value, including itself. In the bit pattern, NaN has all exponent bits set to 1 and a nonzero mantissa.
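A short sketch of this behaviour in Python follows. One caveat specific to Python rather than to IEEE 754: `0.0 / 0.0` raises `ZeroDivisionError` and `math.sqrt(-1)` raises `ValueError`, so the example produces NaN from infinity minus infinity instead.

```python
import math
import struct

nan = math.inf - math.inf      # an invalid operation that yields NaN in Python

# NaN propagates through later arithmetic.
result = (nan + 1.0) * 2.0
print(math.isnan(result))      # True

# NaN is not equal to anything, including itself, so test with math.isnan.
print(nan == nan)              # False

# Bit pattern in 32-bit form: exponent all ones, mantissa nonzero.
(bits,) = struct.unpack(">I", struct.pack(">f", nan))
exponent = (bits >> 23) & 0xFF
mantissa = bits & 0x7FFFFF
print(exponent == 0xFF and mantissa != 0)  # True
```

The sign bit and the exact mantissa bits of a NaN can vary between platforms, which is why the sketch checks only the exponent field and that the mantissa is nonzero.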

Q: Can I convert from hex back to decimal using this calculator?

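This calculator converts decimal to binary/hex only. To go the other direction, use a dedicated hex-to-float tool or decode the value by hand. To decode by hand, interpret the 8 hex digits as 32 bits and extract the sign (bit 31), exponent (bits 30-23), and mantissa (bits 22-0). For a normal number (exponent field from 1 to 254), apply the IEEE 754 formula: (-1)^sign x 2^(exp - 127) x (1 + mantissa / 2^23), where exp and mantissa are the raw field values read as unsigned integers. An exponent field of 0 means zero or a subnormal, with value (-1)^sign x 2^-126 x (mantissa / 2^23), and an exponent field of 255 means infinity or NaN.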
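The manual decoding steps can be sketched in Python as follows. The function name `decode_float32_hex` is an assumption for this example; the mantissa field is treated as a fraction of 2^23, and zero, subnormals, infinities, and NaN get their own branches.

```python
import math

def decode_float32_hex(hex_digits):
    """Decode 8 hex digits as an IEEE 754 single-precision value."""
    bits = int(hex_digits, 16)
    sign = bits >> 31                # bit 31
    exp = (bits >> 23) & 0xFF        # bits 30-23
    mantissa = bits & 0x7FFFFF       # bits 22-0

    if exp == 0xFF:                  # all ones: infinity or NaN
        if mantissa != 0:
            return math.nan
        return -math.inf if sign else math.inf
    if exp == 0:                     # zero or subnormal: no implicit leading 1
        magnitude = (mantissa / 2**23) * 2.0**-126
    else:                            # normal number
        magnitude = (1 + mantissa / 2**23) * 2.0**(exp - 127)
    return -magnitude if sign else magnitude

print(decode_float32_hex("3F800000"))  # 1.0
print(decode_float32_hex("C0200000"))  # -2.5
```

For everyday use, `struct.unpack(">f", bytes.fromhex("C0200000"))[0]` gives the same result in one step; the longer version is only meant to make the field arithmetic visible.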

Q: What is the precision error shown in the results?

The precision error is the difference between the exact decimal you typed and the closest value the binary format can store. For 32-bit float, the relative error can be as large as about 5.96e-8 (2^-24) for values in the normal range. For 64-bit double, it is at most about 1.11e-16 (2^-53). A precision error of 0 (exact) means the number you entered happens to be exactly representable as a binary fraction (e.g. 0.5, 1.25, -8.0).
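One way to reproduce such an error figure is with exact rational arithmetic. This is a sketch of the idea, not the calculator's actual implementation; `stored_value` and `precision_error` are hypothetical names. It also assumes that rounding to 64 bits first and then to 32 bits gives the same result as rounding directly, which holds for typical inputs but is not guaranteed in every case.

```python
import struct
from fractions import Fraction

def stored_value(text, bits=64):
    """Value actually stored when the decimal string is converted to a float."""
    x = float(text)                       # nearest 64-bit double
    if bits == 32:
        x = struct.unpack(">f", struct.pack(">f", x))[0]  # round to 32 bits
    return x

def precision_error(text, bits=64):
    """Exact difference (stored - typed) as a Fraction."""
    exact = Fraction(text)                # exact value of the decimal string
    return Fraction(stored_value(text, bits)) - exact

print(precision_error("0.5", 32) == 0)    # True: 0.5 is a binary fraction
print(precision_error("0.1", 32) == 0)    # False: 0.1 is not
print(float(precision_error("0.1", 32) / Fraction("0.1")))  # relative error
```

`Fraction` keeps both the typed decimal and the stored binary value exact, so the subtraction itself introduces no further rounding.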

Q: Why does 0.1 + 0.2 not equal 0.3 in most programming languages?

Because 0.1 and 0.2 cannot be represented exactly in binary. Each is stored as the nearest binary fraction, and adding two slightly-off values gives a result that is also slightly off: 0.30000000000000004 in 64-bit. The error is small (the sum differs from the stored value of 0.3 by about 5.6e-17) but nonzero. To check equality of floats, compare with a small tolerance chosen relative to the size of the values, |a - b| < epsilon, rather than a == b.
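In Python, where `float` is a 64-bit double, the effect and the tolerance-based comparison look like this. The sketch uses `math.isclose` from the standard library, which applies a relative tolerance (`rel_tol`, default 1e-09); the absolute tolerance shown at the end is an arbitrary choice for illustration.

```python
import math

total = 0.1 + 0.2

print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(abs(total - 0.3))          # 5.551115123125783e-17

# Compare with a tolerance instead of ==.
print(math.isclose(total, 0.3))  # True

# A hand-written absolute tolerance also works when the magnitudes are known.
epsilon = 1e-12                  # assumption: suitable for values near 1
print(abs(total - 0.3) < epsilon)  # True
```

A relative tolerance is usually the safer default, because a fixed epsilon that suits values near 1 is far too loose for tiny values and too strict for very large ones.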

Sign	Exponent (8 bits)	Mantissa (23 bits)	Value
0	00000000	00000000000000000000000	+Zero
1	00000000	00000000000000000000000	-Zero
0	00000000	any nonzero	Positive subnormal
1	00000000	any nonzero	Negative subnormal
0	00000001 ... 11111110	any	Positive normal number
1	00000001 ... 11111110	any	Negative normal number
0	11111111	00000000000000000000000	+Infinity
1	11111111	00000000000000000000000	-Infinity
x	11111111	any nonzero	NaN (Not a Number)
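The table translates fairly directly into a classifier over raw bit patterns. The following sketch assumes the input is an unsigned 32-bit integer; `classify_float32` is a name invented for this example.

```python
def classify_float32(bits):
    """Classify a 32-bit pattern using the rows of the special-values table."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF

    if exponent == 0xFF:                      # exponent 11111111
        if mantissa != 0:
            return "NaN (Not a Number)"       # sign bit is ignored for NaN
        return "-Infinity" if sign else "+Infinity"
    if exponent == 0:                         # exponent 00000000
        if mantissa == 0:
            return "-Zero" if sign else "+Zero"
        return "Negative subnormal" if sign else "Positive subnormal"
    # exponent 00000001 ... 11111110
    return "Negative normal number" if sign else "Positive normal number"

print(classify_float32(0x00000000))  # +Zero
print(classify_float32(0x80000001))  # Negative subnormal
print(classify_float32(0x3F800000))  # Positive normal number
print(classify_float32(0x7FC00000))  # NaN (Not a Number)
```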

Floating-Point Calculator (IEEE 754)

What is IEEE 754 floating-point?

How the three fields work

Precision, rounding, and why 0.1 + 0.2 is not 0.3

Comparing 32-bit and 64-bit formats

IEEE 754 special values (32-bit single precision)

Frequently asked questions

Sources