Basic data types—floating-point types

In the IAR C/C++ Compiler for RISC-V, floating-point values are represented in standard IEC 60559 format. The sizes for the different floating-point types are:

Type	Size	Range (+/-)	Decimals	Exponent	Mantissa	Alignment
`float`	32 bits	±1.18E-38 to ±3.40E+38	7	8 bits	23 bits	4
`double`	64 bits	±2.23E-308 to ±1.79E+308	15	11 bits	52 bits	8
`long double`	64 bits	±2.23E-308 to ±1.79E+308	15	11 bits	52 bits	8

Table 78. Floating-point types
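
The layout in the table can be cross-checked against whatever compiler is in use through the standard `<float.h>` and `<limits.h>` macros. The following is a minimal sketch, not part of the IAR documentation; the type `fp_layout` and the function names are hypothetical, and it assumes a radix-2 format (`FLT_RADIX == 2`).

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper type: storage size and stored mantissa bits. */
typedef struct {
    size_t size_bits;     /* storage size in bits */
    int    mantissa_bits; /* explicitly stored mantissa bits */
} fp_layout;

/* FLT_MANT_DIG counts the implicit leading 1, so subtract it. */
fp_layout float_layout(void)
{
    fp_layout l = { sizeof(float) * CHAR_BIT, FLT_MANT_DIG - 1 };
    return l;
}

/* DBL_MANT_DIG also counts the implicit leading 1. */
fp_layout double_layout(void)
{
    fp_layout l = { sizeof(double) * CHAR_BIT, DBL_MANT_DIG - 1 };
    return l;
}

/* Nonzero if long double has the same size and precision as double. */
int long_double_is_double(void)
{
    return sizeof(long double) == sizeof(double)
        && LDBL_MANT_DIG == DBL_MANT_DIG;
}
```

On a target that matches the table, `long_double_is_double()` would be expected to return a nonzero value. On a desktop host, where `long double` is often wider, it may return 0.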

Floating-point environment

The feraiseexcept function does not raise any exceptions; it only sets the corresponding exception flags.

Exception flags for floating-point values are supported for operations performed by the FPU. For devices with a 64-bit FPU, they are defined in the fenv.h file.

32-bit floating-point format

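The representation of a 32-bit floating-point number as an integer is, from the most significant bit down: the sign bit S (bit 31), the exponent (bits 30 to 23), and the mantissa (bits 22 to 0).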

The exponent is 8 bits, and the mantissa is 23 bits.

The value of the number is:

(-1)^S * 2^(Exponent-127) * 1.Mantissa

The range of the number is at least:

±1.18E-38 to ±3.40E+38

The precision of the float operators (+, -, *, and /) is approximately 7 decimal digits.
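
The 32-bit format described above can be explored by splitting a `float` into its three fields and rebuilding the value from them. The following is a sketch under stated assumptions, not code from the IAR documentation: the names `float_fields`, `split_float`, and `normal_float_value` are hypothetical, and the rebuild step covers only normal numbers (exponent field 1 to 254).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a 32-bit float. */
typedef struct {
    uint32_t sign;     /* S, 1 bit */
    uint32_t exponent; /* biased exponent, 8 bits */
    uint32_t mantissa; /* stored mantissa, 23 bits */
} float_fields;

/* Copies the bit pattern into an integer and extracts the fields. */
float_fields split_float(float f)
{
    uint32_t bits;
    float_fields r;
    memcpy(&bits, &f, sizeof bits); /* well-defined way to read the bits */
    r.sign     = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}

/* Evaluates (-1)^S * 2^(Exponent-127) * 1.Mantissa for a normal number. */
double normal_float_value(float_fields p)
{
    double v = 1.0 + (double)p.mantissa / 8388608.0; /* 1.Mantissa, 2^23 */
    int e = (int)p.exponent - 127;                   /* remove the bias */
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v /= 2.0; e++; }
    return p.sign ? -v : v;
}
```

For example, `split_float(-2.5f)` yields sign 1, exponent 128, and mantissa 0x200000, because -2.5 equals -1.25 * 2^1.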

64-bit floating-point format

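The representation of a 64-bit floating-point number as an integer is, from the most significant bit down: the sign bit S (bit 63), the exponent (bits 62 to 52), and the mantissa (bits 51 to 0).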

The exponent is 11 bits, and the mantissa is 52 bits.

The value of the number is:

(-1)^S * 2^(Exponent-1023) * 1.Mantissa

The range of the number is at least:

±2.23E-308 to ±1.79E+308

The precision of the double operators (+, -, *, and /) is approximately 15 decimal digits.
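
The same field extraction applies to the 64-bit format with wider masks. A brief sketch, with the hypothetical names `double_fields` and `split_double`:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a 64-bit double. */
typedef struct {
    uint64_t sign;     /* S, 1 bit */
    uint64_t exponent; /* biased exponent, 11 bits */
    uint64_t mantissa; /* stored mantissa, 52 bits */
} double_fields;

/* Copies the bit pattern into an integer and extracts the fields. */
double_fields split_double(double d)
{
    uint64_t bits;
    double_fields r;
    memcpy(&bits, &d, sizeof bits);
    r.sign     = bits >> 63;
    r.exponent = (bits >> 52) & 0x7FFu;
    r.mantissa = bits & (((uint64_t)1 << 52) - 1); /* low 52 bits */
    return r;
}
```

For example, `split_double(-0.75)` yields sign 1, exponent 1022, and a mantissa with only bit 51 set, because -0.75 equals -1.5 * 2^-1.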

Representation of special floating-point numbers

This list describes the representation of special floating-point numbers:

Zero is represented by a zero mantissa and a zero exponent. The sign bit signifies positive or negative zero.
Infinity is represented by setting the exponent to the highest value and the mantissa to zero. The sign bit signifies positive or negative infinity.
Not a number (NaN) is represented by setting the exponent to the highest value and the mantissa to a non-zero value. The value of the sign bit is ignored.
Subnormal numbers are used for representing values smaller than those that can be represented by normal values. The drawback is that the precision decreases as the values get smaller. The exponent is set to 0 to signify that the number is subnormal, even though the number is treated as if the exponent were 1. Unlike normal numbers, subnormal numbers do not have an implicit 1 as the most significant bit (the MSB) of the mantissa. The value of a subnormal number is:
```
(-1)^S * 2^(1-BIAS) * 0.Mantissa
```
where BIAS is 127 and 1023 for 32-bit and 64-bit floating-point values, respectively.
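
The rules in the list above can be expressed as a small classifier over the exponent and mantissa fields of a 32-bit value. This is an illustrative sketch only; the enumeration `fp_kind` and the functions `float_from_bits`, `classify_float_bits`, and `subnormal_float_value` are hypothetical names, not part of any IAR library.

```c
#include <stdint.h>
#include <string.h>

typedef enum {
    FPK_ZERO, FPK_SUBNORMAL, FPK_NORMAL, FPK_INFINITY, FPK_NAN
} fp_kind;

/* Builds a float from a raw 32-bit pattern (useful for experiments). */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Classifies a float by inspecting its exponent and mantissa fields. */
fp_kind classify_float_bits(float f)
{
    uint32_t bits, exponent, mantissa;
    memcpy(&bits, &f, sizeof bits);
    exponent = (bits >> 23) & 0xFFu;
    mantissa = bits & 0x7FFFFFu;
    if (exponent == 0)       /* zero or subnormal */
        return mantissa == 0 ? FPK_ZERO : FPK_SUBNORMAL;
    if (exponent == 0xFFu)   /* highest exponent: infinity or NaN */
        return mantissa == 0 ? FPK_INFINITY : FPK_NAN;
    return FPK_NORMAL;
}

/* Evaluates (-1)^S * 2^(1-127) * 0.Mantissa for a subnormal float. */
double subnormal_float_value(uint32_t sign, uint32_t mantissa)
{
    double v = (double)mantissa / 8388608.0; /* 0.Mantissa, 2^23 */
    int i;
    for (i = 0; i < 126; i++)                /* multiply by 2^(1-127) */
        v /= 2.0;
    return sign ? -v : v;
}
```

For example, `classify_float_bits(float_from_bits(0x7F800000u))` returns `FPK_INFINITY`, and `subnormal_float_value(0, 1)` gives 2^-149, the smallest positive subnormal 32-bit value.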

IAR Embedded Workbench for RISC-V 3.40