Float 8

class onnx_array_api.validation.f8.CastFloat8

Helpers to cast float8 into float32 or the other way around.

static find_closest_value(value, sorted_values)

Searches for a value in a sorted array of values.

Parameters:

value – float32 value to search for
sorted_values – list of tuples [(float32, byte)]

Returns:

byte

The function searches the first column for the closest value and returns the corresponding value from the second column.
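The lookup described above can be sketched with a binary search. This is a minimal illustration, not the library's implementation: the name `closest_byte` and the sample table are hypothetical, and the tie-breaking rule is an assumption.

```python
from bisect import bisect_left


def closest_byte(value, sorted_values):
    """Return the byte paired with the float closest to `value`.

    `sorted_values` is a list of (float, byte) tuples sorted by the float.
    """
    keys = [v for v, _ in sorted_values]
    pos = bisect_left(keys, value)
    if pos == 0:
        # value is at or below the smallest entry
        return sorted_values[0][1]
    if pos == len(keys):
        # value is above the largest entry
        return sorted_values[-1][1]
    before, after = sorted_values[pos - 1], sorted_values[pos]
    # Assumption: ties go to the lower neighbour; the library may differ.
    return before[1] if value - before[0] <= after[0] - value else after[1]


# Hypothetical excerpt of a (float32, byte) table for E4M3.
table = [(-1.0, 0xB8), (0.0, 0x00), (0.5, 0x30), (1.0, 0x38)]
print(hex(closest_byte(0.6, table)))
```

The binary search keeps each lookup logarithmic in the table size, which is why the input must be sorted on its first column.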

exception onnx_array_api.validation.f8.UndefinedCastError

Unable to cast a number.

onnx_array_api.validation.f8.display_fe4m3(value, sign=1, exponent=4, mantissa=3)

Displays a float 8 E4M3 in binary format.

Parameters:

value – value to display (int)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa

Returns:

string

onnx_array_api.validation.f8.display_fe5m2(value, sign=1, exponent=4, mantissa=3)

Displays a float 8 E5M2 in binary format.

Parameters:

value – value to display (int)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa

Returns:

string

onnx_array_api.validation.f8.display_fexmx(value, sign, exponent, mantissa)

Displays any float encoded with 1 bit for the sign, exponent bits for the exponent, and mantissa bits for the mantissa.

Parameters:

value – value to display (int)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa

Returns:

string
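To illustrate how a bit pattern splits into its three fields, here is a minimal sketch. The helper `show_bits` is hypothetical, and the dot-separated output format is an assumption rather than the documented output of `display_fexmx`.

```python
def show_bits(ival, sign=1, exponent=4, mantissa=3):
    """Format an integer as 'sign.exponent.mantissa' bit groups."""
    width = sign + exponent + mantissa
    # Keep only the lowest `width` bits and left-pad with zeros.
    bits = format(ival & ((1 << width) - 1), f"0{width}b")
    return ".".join(
        [bits[:sign], bits[sign:sign + exponent], bits[sign + exponent:]]
    )


print(show_bits(0x38))                          # E4M3 layout
print(show_bits(0x7B, exponent=5, mantissa=2))  # E5M2 layout
```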

onnx_array_api.validation.f8.display_float16(value, sign=1, exponent=5, mantissa=10)

Displays a float16 in binary format.

Parameters:

value – value to display (float16)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa

Returns:

string

onnx_array_api.validation.f8.display_float32(value, sign=1, exponent=8, mantissa=23)

Displays a float32 in binary format.

Parameters:

value – value to display (float32)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa

Returns:

string
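For a float32, the bits first have to be reinterpreted as an integer. The sketch below does this with the standard `struct` module; `float32_bits` is a hypothetical helper and its output format is an assumption, not the documented output of `display_float32`.

```python
import struct


def float32_bits(x):
    """Return the IEEE 754 single-precision bits of x as 'sign.exponent.mantissa'."""
    # Pack as a big-endian float32, then read the same 4 bytes as an unsigned int.
    (ival,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(ival, "032b")
    return f"{bits[0]}.{bits[1:9]}.{bits[9:]}"


print(float32_bits(1.0))
```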

onnx_array_api.validation.f8.display_int(ival, sign=1, exponent=8, mantissa=23)

Displays an integer as bits.

Parameters:

ival – value to display (int)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa

Returns:

string

onnx_array_api.validation.f8.fe4m3_to_float32(ival: int, fn: bool = True, uz: bool = False) → float

Casts a float E4M3 encoded as an integer into a float.

Parameters:

ival – byte
fn – no infinite values
uz – no negative zero

Returns:

float (float 32)
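As an illustration of the decoding, the sketch below handles only the default variant (fn=True, uz=False), assuming the usual E4M3FN layout: 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and NaN encoded as S.1111.111. The function `decode_e4m3fn` is hypothetical and is not the library's code.

```python
import math


def decode_e4m3fn(byte):
    """Decode one E4M3FN byte into a Python float (assumed layout, see above)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the only NaN pattern; this variant has no infinity
    if exponent == 0:
        # subnormal: no implicit leading 1, fixed exponent of 1 - bias
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


print(decode_e4m3fn(0x38), decode_e4m3fn(0x7E))
```

Under these assumptions the largest finite value is 448, stored as 0x7E.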

onnx_array_api.validation.f8.fe4m3_to_float32_float(ival: int, fn: bool = True, uz: bool = False) → float

Casts a float 8 E4M3 encoded as an integer into a float.

Parameters:

ival – byte
fn – no infinite values
uz – no negative zero

Returns:

float (float 32)

onnx_array_api.validation.f8.fe5m2_to_float32(ival: int, fn: bool = False, uz: bool = False) → float

Casts a float E5M2 encoded as an integer into a float.

Parameters:

ival – byte
fn – no infinite values
uz – no negative zero

Returns:

float (float 32)
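For the default variant (fn=False, uz=False), E5M2 shares its sign and exponent layout with IEEE float16 and keeps only the top two mantissa bits, so one byte can be decoded by treating it as the high byte of a float16. The sketch below relies on that assumption; `decode_e5m2` is a hypothetical name, not a library function.

```python
import struct


def decode_e5m2(byte):
    """Decode one E5M2 byte by padding it to a float16 with a zero low byte."""
    # "<e" reads a little-endian half-precision float: low byte first, high byte second.
    (value,) = struct.unpack("<e", bytes([0, byte & 0xFF]))
    return value


print(decode_e5m2(0x3C), decode_e5m2(0x7B), decode_e5m2(0x7C))
```

Unlike the E4M3 variant without infinities, this layout reserves exponent 11111 for infinities and NaN.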

onnx_array_api.validation.f8.fe5m2_to_float32_float(ival: int, fn: bool = False, uz: bool = False) → float

Casts a float 8 E5M2 encoded as an integer into a float.

Parameters:

ival – byte
fn – no infinite values
uz – no negative zero

Returns:

float (float 32)

onnx_array_api.validation.f8.float32_to_fe4m3(x, fn: bool = True, uz: bool = False, saturate: bool = True)

Converts a float32 into a float E4M3.

Parameters:

x – numpy.float32
fn – no infinite values
uz – no negative zero
saturate – if True, converts out-of-range values and infinities to the maximum value

Returns:

byte

onnx_array_api.validation.f8.float32_to_fe5m2(x, fn: bool = False, uz: bool = False, saturate: bool = True)

Converts a float32 into a float E5M2.

Parameters:

x – numpy.float32
fn – no infinite values
uz – no negative zero
saturate – if True, converts out-of-range values and infinities to the maximum value

Returns:

byte

onnx_array_api.validation.f8.search_float32_into_fe4m3(value: float, fn: bool = True, uz: bool = False, saturate: bool = True) → int

Casts a float 32 into a float E4M3.

Parameters:

value – float
fn – no infinite values
uz – no negative zero
saturate – if True, converts out-of-range values and infinities to the maximum value

Returns:

byte
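A search-based cast can be sketched by enumerating all 256 bytes and picking the nearest finite value. This is only an illustration for the default variant (fn=True, uz=False): `decode_e4m3fn` and `encode_e4m3fn` are hypothetical helpers, ties are not rounded to even, and every value beyond the largest finite magnitude is treated as out of range, which simplifies the real rounding rules.

```python
import math


def decode_e4m3fn(byte):
    """Decode one E4M3FN byte (bias 7, no infinity, NaN is S.1111.111)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan
    if exponent == 0:
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# Every finite E4M3FN value paired with its byte.
TABLE = [
    (decode_e4m3fn(b), b) for b in range(256)
    if not math.isnan(decode_e4m3fn(b))
]


def encode_e4m3fn(value, saturate=True):
    """Return the byte whose decoded value is nearest to `value`."""
    if math.isnan(value):
        return 0x7F
    if abs(value) > 448.0:  # beyond the largest finite magnitude
        if saturate:
            return 0x7E if value > 0 else 0xFE  # clamp to +/- max
        return 0x7F if value > 0 else 0xFF      # no infinity available: NaN
    # Linear scan; the first entry with the smallest distance wins.
    return min(TABLE, key=lambda pair: abs(pair[0] - value))[1]


print(hex(encode_e4m3fn(0.3)), hex(encode_e4m3fn(1000.0)))
```

A brute-force search like this is slow but easy to check by hand, which makes it a convenient reference for validating a faster bit-manipulation cast.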

onnx_array_api.validation.f8.search_float32_into_fe5m2(value: float, fn: bool = False, uz: bool = False, saturate: bool = True) → int

Casts a float 32 into a float E5M2.

Parameters:

value – float
fn – no infinite values
uz – no negative zero
saturate – if True, converts out-of-range values and infinities to the maximum value

Returns:

byte