Float 8

class onnx_array_api.validation.f8.CastFloat8

Helpers to cast float8 into float32 or the other way around.

static find_closest_value(value, sorted_values)

Searches for a value in a sorted array of values.

Parameters:
  • value – float32 value to search

  • sorted_values – list of tuples [(float32, byte)]

Returns:

byte

The function searches the first column for the closest value and returns the corresponding value from the second column.
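A minimal sketch of the same lookup, using the standard bisect module on a list of (float, byte) pairs sorted by their float. closest_byte is a hypothetical standalone function, not the library's static method, and preferring the lower entry on an exact tie is an assumption of this sketch. The sample table holds a few E4M3 codes.

```python
import bisect


def closest_byte(value, sorted_values):
    """Return the byte paired with the float closest to value.

    sorted_values: list of (float, byte) tuples sorted by the float.
    On an exact tie the lower entry wins (assumption of this sketch).
    """
    keys = [v for v, _ in sorted_values]
    i = bisect.bisect_left(keys, value)  # first index with keys[i] >= value
    if i == 0:
        return sorted_values[0][1]  # value is at or below the smallest key
    if i == len(keys):
        return sorted_values[-1][1]  # value is above the largest key
    below, above = sorted_values[i - 1], sorted_values[i]
    # keep whichever neighbour is closer
    if value - below[0] <= above[0] - value:
        return below[1]
    return above[1]


# a few E4M3 codes: 0.0 -> 0x00, 0.5 -> 0x30, 1.0 -> 0x38, 1.5 -> 0x3C
table = [(0.0, 0x00), (0.5, 0x30), (1.0, 0x38), (1.5, 0x3C)]
print(hex(closest_byte(0.9, table)))  # 0x38
```

Binary search keeps each lookup logarithmic in the table size, which matters little for 256 float 8 codes but keeps the sketch general.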

exception onnx_array_api.validation.f8.UndefinedCastError

Unable to cast a number.

onnx_array_api.validation.f8.display_fe4m3(value, sign=1, exponent=4, mantissa=3)

Displays a float 8 E4M3 in binary format.

Parameters:
  • value – value to display (int)

  • sign – number of bits for the sign

  • exponent – number of bits for the exponent

  • mantissa – number of bits for the mantissa

Returns:

string

onnx_array_api.validation.f8.display_fe5m2(value, sign=1, exponent=4, mantissa=3)

Displays a float 8 E5M2 in binary format.

Parameters:
  • value – value to display (int)

  • sign – number of bits for the sign

  • exponent – number of bits for the exponent

  • mantissa – number of bits for the mantissa

Returns:

string

onnx_array_api.validation.f8.display_fexmx(value, sign, exponent, mantissa)

Displays any float encoded with 1 bit for the sign, exponent bits for the exponent and mantissa bits for the mantissa, where exponent and mantissa are the parameters of the same name.

Parameters:
  • value – value to display (int)

  • sign – number of bits for the sign

  • exponent – number of bits for the exponent

  • mantissa – number of bits for the mantissa

Returns:

string
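To illustrate the idea behind display_fexmx, the hypothetical helper format_bits below splits the low sign + exponent + mantissa bits of an integer into three dot-separated groups. The dotted layout is an assumption made for this sketch; the exact string returned by display_fexmx may differ.

```python
def format_bits(value: int, sign: int, exponent: int, mantissa: int) -> str:
    """Render the low (sign + exponent + mantissa) bits of value as S.E.M.

    Hypothetical helper: the dot-separated layout is an assumption.
    """
    width = sign + exponent + mantissa
    # keep only the bits that belong to the encoding, zero-padded on the left
    bits = format(value & ((1 << width) - 1), f"0{width}b")
    return ".".join(
        [bits[:sign], bits[sign:sign + exponent], bits[sign + exponent:]]
    )


print(format_bits(0x38, 1, 4, 3))  # E4M3 encoding of 1.0 -> 0.0111.000
print(format_bits(0x3C, 1, 5, 2))  # E5M2 encoding of 1.0 -> 0.01111.00
```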

onnx_array_api.validation.f8.display_float16(value, sign=1, exponent=5, mantissa=10)

Displays a float16 in binary format.

Parameters:
  • value – value to display (float16)

  • sign – number of bits for the sign

  • exponent – number of bits for the exponent

  • mantissa – number of bits for the mantissa

Returns:

string

onnx_array_api.validation.f8.display_float32(value, sign=1, exponent=8, mantissa=23)

Displays a float32 in binary format.

Parameters:
  • value – value to display (float32)

  • sign – number of bits for the sign

  • exponent – number of bits for the exponent

  • mantissa – number of bits for the mantissa

Returns:

string
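Displaying a float32 first requires its raw bit pattern. Below is a sketch that uses the standard struct module to reinterpret the four bytes of a float32 as an unsigned 32-bit integer, then splits it into 1 sign bit, 8 exponent bits and 23 mantissa bits. float32_bits is a hypothetical name and its dotted output is an assumption, not necessarily what display_float32 prints.

```python
import struct


def float32_bits(value: float) -> str:
    """Show a float32 as sign.exponent.mantissa bits (hypothetical format)."""
    # pack as a little-endian float32, unpack the same bytes as a uint32
    (ival,) = struct.unpack("<I", struct.pack("<f", value))
    bits = format(ival, "032b")
    return f"{bits[0]}.{bits[1:9]}.{bits[9:]}"


print(float32_bits(1.0))   # '0.01111111.' followed by 23 zeros
print(float32_bits(-2.5))  # sign 1, exponent 10000000, mantissa 01 then zeros
```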

onnx_array_api.validation.f8.display_int(ival, sign=1, exponent=8, mantissa=23)

Displays an integer as bits.

Parameters:
  • ival – value to display (int)

  • sign – number of bits for the sign

  • exponent – number of bits for the exponent

  • mantissa – number of bits for the mantissa

Returns:

string

onnx_array_api.validation.f8.fe4m3_to_float32(ival: int, fn: bool = True, uz: bool = False) → float

Casts a float E4M3 encoded as an integer into a float.

Parameters:
  • ival – byte

  • fn – no infinite values

  • uz – no negative zero

Returns:

float (float 32)
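A minimal decoding sketch for the default case (fn=True, uz=False), assuming the usual E4M3FN layout: exponent bias 7, no infinities, a single NaN pattern S.1111.111 per sign, and a largest finite value of 448. e4m3fn_to_float is a hypothetical name; it does not reproduce the library's handling of the other fn/uz combinations.

```python
import math


def e4m3fn_to_float(ival: int) -> float:
    """Decode one E4M3FN byte (fn=True, uz=False): bias 7, no infinities."""
    if not 0 <= ival <= 255:
        raise ValueError(f"{ival} is not a byte")
    sign = -1.0 if ival & 0x80 else 1.0
    exp = (ival >> 3) & 0x0F  # 4 exponent bits
    man = ival & 0x07         # 3 mantissa bits
    if exp == 0x0F and man == 0x07:
        return math.nan  # S.1111.111 is the only NaN pattern
    if exp == 0:
        # subnormal: 0.mmm * 2**(1 - bias)
        return sign * (man / 8) * 2.0 ** -6
    # normal: 1.mmm * 2**(exp - bias)
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)


print(e4m3fn_to_float(0x38))  # 1.0
print(e4m3fn_to_float(0x7E))  # 448.0, the largest finite value
```

Because S.1111.110 is still a finite number here, E4M3FN trades the infinities of an IEEE-like layout for one extra binade of range.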

onnx_array_api.validation.f8.fe4m3_to_float32_float(ival: int, fn: bool = True, uz: bool = False) → float

Casts a float 8 encoded as an integer into a float.

Parameters:
  • ival – byte

  • fn – no infinite values

  • uz – no negative zero

Returns:

float (float 32)

onnx_array_api.validation.f8.fe5m2_to_float32(ival: int, fn: bool = False, uz: bool = False) → float

Casts a float E5M2 encoded as an integer into a float.

Parameters:
  • ival – byte

  • fn – no infinite values

  • uz – no negative zero

Returns:

float (float 32)
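The same kind of sketch for the default E5M2 case (fn=False, uz=False), assuming the standard layout with exponent bias 15, where an all-ones exponent encodes infinity (zero mantissa) or NaN (non-zero mantissa). That layout matches the top byte of an IEEE 754 half-precision number, so the result can be cross-checked with the 'e' format of the struct module. e5m2_to_float is a hypothetical name.

```python
import math
import struct


def e5m2_to_float(ival: int) -> float:
    """Decode one E5M2 byte (fn=False, uz=False): bias 15, IEEE-like specials."""
    sign = -1.0 if ival & 0x80 else 1.0
    exp = (ival >> 2) & 0x1F  # 5 exponent bits
    man = ival & 0x03         # 2 mantissa bits
    if exp == 0x1F:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        # subnormal: 0.mm * 2**(1 - bias)
        return sign * (man / 4) * 2.0 ** -14
    return sign * (1 + man / 4) * 2.0 ** (exp - 15)


# cross-check: the byte is the high byte of a little-endian float16
as_half = struct.unpack("<e", bytes([0, 0x3C]))[0]
print(e5m2_to_float(0x3C), as_half)  # 1.0 1.0
```

For example, e5m2_to_float(0x7B) gives 57344.0, the largest finite E5M2 value.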

onnx_array_api.validation.f8.fe5m2_to_float32_float(ival: int, fn: bool = False, uz: bool = False) → float

Casts a float 8 encoded as an integer into a float.

Parameters:
  • ival – byte

  • fn – no infinite values

  • uz – no negative zero

Returns:

float (float 32)

onnx_array_api.validation.f8.float32_to_fe4m3(x, fn: bool = True, uz: bool = False, saturate: bool = True)

Converts a float32 into a float E4M3.

Parameters:
  • x – numpy.float32

  • fn – no infinite values

  • uz – no negative zero

  • saturate – if True, out-of-range values and infinities are converted to the maximum representable value

Returns:

byte

onnx_array_api.validation.f8.float32_to_fe5m2(x, fn: bool = False, uz: bool = False, saturate: bool = True)

Converts a float32 into a float E5M2.

Parameters:
  • x – numpy.float32

  • fn – no infinite values

  • uz – no negative zero

  • saturate – if True, out-of-range values and infinities are converted to the maximum representable value

Returns:

byte

onnx_array_api.validation.f8.search_float32_into_fe4m3(value: float, fn: bool = True, uz: bool = False, saturate: bool = True) → int

Casts a float32 into a float E4M3.

Parameters:
  • value – float

  • fn – no infinite values

  • uz – no negative zero

  • saturate – if True, out-of-range values and infinities are converted to the maximum representable value

Returns:

byte

onnx_array_api.validation.f8.search_float32_into_fe5m2(value: float, fn: bool = False, uz: bool = False, saturate: bool = True) → int

Casts a float32 into a float E5M2.

Parameters:
  • value – float

  • fn – no infinite values

  • uz – no negative zero

  • saturate – if True, out-of-range values and infinities are converted to the maximum representable value

Returns:

byte