IAR Embedded Workbench for RL78 5.20

Basic data types—floating-point types

In the IAR C/C++ Compiler for RL78, floating-point values are represented in standard IEC 60559 format. The sizes for the different floating-point types are:

Type          Size if double=32    Size if double=64
float         32 bits              32 bits
double        32 bits (default)    64 bits
long double   32 bits              64 bits

Table 77. Floating-point types 


Note

The size of double and long double depends on the ‑‑double={32|64} option; see ‑‑double. The type long double uses the same precision as double.
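Because the width of double depends on a build option, code that relies on 64-bit precision may want to check it. The following is a minimal sketch in standard C; the function names are hypothetical and are not part of the IAR library.

```c
#include <limits.h>

/* Width of each floating-point type in bits, as seen by the current build. */
static unsigned float_bits(void)
{
    return (unsigned)(sizeof(float) * CHAR_BIT);
}

static unsigned double_bits(void)
{
    return (unsigned)(sizeof(double) * CHAR_BIT);
}

static unsigned long_double_bits(void)
{
    return (unsigned)(sizeof(long double) * CHAR_BIT);
}

/* Returns 1 when double is only as wide as float, which is what the
   table above describes for double=32. */
static int double_is_32_bit(void)
{
    return double_bits() == float_bits();
}
```

With the default setting described in the table, double_is_32_bit() would be expected to return 1. On a build where double is 64 bits, it returns 0.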

Floating-point environment

Exception flags are not supported. The feraiseexcept function does not raise any exceptions.
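Because exception flags are not available, an error cannot be detected by testing flags after an operation. One possible alternative, sketched here, is to classify the result itself. checked_div is a hypothetical helper; isnan and isinf are the standard C99 math.h macros, assumed to be available in the library configuration in use.

```c
#include <math.h>

/* Divides a by b and stores the quotient in *out.
   Returns 1 on success, or 0 if the result is infinite or not a number.
   The check inspects the result value, not any exception flag. */
static int checked_div(float a, float b, float *out)
{
    float q = a / b;

    if (isnan(q) || isinf(q)) {
        return 0;   /* for example division by zero, overflow, or 0/0 */
    }
    *out = q;
    return 1;
}
```

A caller would test the return value before using the quotient. For example, dividing 1.0f by 0.0f produces infinity under IEC 60559 arithmetic, so the helper reports failure.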

32-bit floating-point format

The representation of a 32-bit floating-point number as an integer is:

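[Figure 32bitFloatFormat_1.png: bit 31 is the sign bit (S), bits 30 to 23 hold the exponent, and bits 22 to 0 hold the mantissa.]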

The exponent is 8 bits, and the mantissa is 23 bits.

The value of the number is:

(-1)^S * 2^(Exponent-127) * 1.Mantissa
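To illustrate the formula, the three fields can be extracted from the integer representation. This is a minimal sketch that assumes float is 4 bytes and that the sign occupies the most significant bit, followed by the exponent and then the mantissa. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Fields of a 32-bit floating-point number (hypothetical helper type). */
typedef struct {
    unsigned sign;       /* 1 bit: 0 = positive, 1 = negative */
    unsigned exponent;   /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;   /* 23 bits, without the implicit leading 1 */
} float32_fields;

static float32_fields split_float32(float value)
{
    uint32_t bits;
    float32_fields f;

    /* Copy the object representation; assumes sizeof(float) == 4. */
    memcpy(&bits, &value, sizeof bits);

    f.sign     = (unsigned)(bits >> 31);
    f.exponent = (unsigned)((bits >> 23) & 0xFFu);
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For 1.0f, the fields are sign 0, exponent 127, and mantissa 0, which gives (-1)^0 * 2^(127-127) * 1.0 = 1.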

The range of the number is at least:

±1.18E-38 to ±3.39E+38

The precision of the float operators (+, -, *, and /) is approximately 7 decimal digits.
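The limited precision becomes visible when a small value is added to a large one. The sketch below uses a hypothetical helper that reports whether an addition changed the stored float value. The volatile qualifier is there only to force the sum to be rounded to float before the comparison.

```c
/* Returns 1 if adding small to big changes the stored float value,
   0 if the contribution of small is lost to rounding. */
static int float_add_is_visible(float big, float small)
{
    volatile float sum = big + small;   /* rounded to 32-bit float */

    return sum != big;
}
```

With a 24-bit significand (23 stored bits plus the implicit 1), 16777216.0f + 1.0f rounds back to 16777216.0f, so the helper returns 0 for that pair. Adding 2.0f instead is representable, so it returns 1.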

64-bit floating-point format

The representation of a 64-bit floating-point number as an integer is:

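[Figure 64bitFloatFormat_1.png: bit 63 is the sign bit (S), bits 62 to 52 hold the exponent, and bits 51 to 0 hold the mantissa.]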

The exponent is 11 bits, and the mantissa is 52 bits.

The value of the number is:

(-1)^S * 2^(Exponent-1023) * 1.Mantissa

The range of the number is at least:

±2.23E-308 to ±1.79E+308

The precision of the floating-point operators (+, -, *, and /) is approximately 15 decimal digits.
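The 64-bit fields can be extracted in the same way as for the 32-bit format. This sketch only applies to builds where double is 64 bits, and it assumes that a 64-bit integer type (uint64_t) is available. The names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time guard: the array size is negative, and compilation fails,
   unless double is 8 bytes (that is, unless double=64 is in effect). */
typedef char assert_double_is_64_bits[(sizeof(double) == 8) ? 1 : -1];

/* Fields of a 64-bit floating-point number (hypothetical helper type). */
typedef struct {
    unsigned sign;       /* 1 bit: 0 = positive, 1 = negative */
    unsigned exponent;   /* 11 bits, stored with a bias of 1023 */
    uint64_t mantissa;   /* 52 bits, without the implicit leading 1 */
} float64_fields;

static float64_fields split_float64(double value)
{
    uint64_t bits;
    float64_fields f;

    memcpy(&bits, &value, sizeof bits);   /* copy the object representation */

    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.mantissa = bits & (((uint64_t)1 << 52) - 1u);
    return f;
}
```

For 1.0, the fields are sign 0, exponent 1023, and mantissa 0, which gives (-1)^0 * 2^(1023-1023) * 1.0 = 1.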

Representation of special floating-point numbers

This list describes the representation of special floating-point numbers:

  • Zero is represented by zero mantissa and exponent. The sign bit signifies positive or negative zero.

  • Infinity is represented by setting the exponent to the highest value and the mantissa to zero. The sign bit signifies positive or negative infinity.

  • Not a number (NaN) is represented by setting the exponent to the highest positive value and the mantissa to a non-zero value. The value of the sign bit is ignored. The uppermost bits must be set.

  • Subnormal numbers are used for representing values smaller than what can be represented by normal values. The drawback is that the precision will decrease with smaller values. The exponent is set to 0 to signify that the number is subnormal, even though the number is treated as if the exponent were 1. Unlike normal numbers, subnormal numbers do not have an implicit 1 as the most significant bit (the MSB) of the mantissa. The value of a subnormal number is:

    (-1)^S * 2^(1-BIAS) * 0.Mantissa

    where BIAS is 127 for 32-bit and 1023 for 64-bit floating-point values.
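The rules in this list can be sketched as a classifier that works on the bit pattern of a 32-bit value. The enumeration and function names are hypothetical, and the code assumes that float is 4 bytes with the field layout described for the 32-bit format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical categories matching the list above. */
typedef enum {
    KIND_ZERO,
    KIND_SUBNORMAL,
    KIND_NORMAL,
    KIND_INFINITY,
    KIND_NAN
} float_kind;

/* Builds a float from a raw 32-bit pattern; assumes sizeof(float) == 4. */
static float float32_from_bits(uint32_t bits)
{
    float value;

    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Classifies a float by its exponent and mantissa fields only;
   the sign bit does not affect the category. */
static float_kind classify_float32(float value)
{
    uint32_t bits;
    uint32_t exponent;
    uint32_t mantissa;

    memcpy(&bits, &value, sizeof bits);
    exponent = (bits >> 23) & 0xFFu;
    mantissa = bits & 0x7FFFFFu;

    if (exponent == 0u) {
        /* Exponent 0: zero if the mantissa is 0, otherwise subnormal. */
        return (mantissa == 0u) ? KIND_ZERO : KIND_SUBNORMAL;
    }
    if (exponent == 0xFFu) {
        /* Highest exponent: infinity if the mantissa is 0, otherwise NaN. */
        return (mantissa == 0u) ? KIND_INFINITY : KIND_NAN;
    }
    return KIND_NORMAL;
}
```

For example, the pattern 0x7F800000 has the highest exponent and a zero mantissa, so it is classified as infinity, while 0x00000001 has a zero exponent and a non-zero mantissa, so it is subnormal.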