Float 8
- class onnx_array_api.validation.f8.CastFloat8
Helpers to cast float8 into float32 or the other way around.
- static find_closest_value(value, sorted_values)
Searches for a value in a sorted array of values.
- Parameters:
value – float32 value to search
sorted_values – list of tuples [(float32, byte)], sorted by the float32 value
- Returns:
byte
The function searches the first column for the closest value and returns the corresponding value from the second column.
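The lookup described above can be sketched with the standard `bisect` module. This is only an illustration under assumptions, not the library's implementation: the name `closest_byte` is hypothetical, and ties are resolved toward the lower neighbour, which may differ from the rounding rule the library applies.

```python
from bisect import bisect_left


def closest_byte(value, sorted_values):
    """Return the byte paired with the float closest to `value`.

    `sorted_values` is a list of (float, byte) tuples sorted by the float.
    Hypothetical helper, written for illustration only.
    """
    keys = [v for v, _ in sorted_values]
    i = bisect_left(keys, value)
    if i == 0:
        return sorted_values[0][1]       # below the smallest entry
    if i == len(keys):
        return sorted_values[-1][1]      # above the largest entry
    before, after = sorted_values[i - 1], sorted_values[i]
    # Assumption: ties go to the lower neighbour in this sketch.
    if value - before[0] <= after[0] - value:
        return before[1]
    return after[1]


# A tiny table using E4M3 encodings of -1.0, 0.0, 0.5 and 1.0.
table = [(-1.0, 0xB8), (0.0, 0x00), (0.5, 0x30), (1.0, 0x38)]
print(hex(closest_byte(0.6, table)))  # 0x30, the byte paired with 0.5
```

A real table would hold all 256 byte values of the target format rather than the four entries used here.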
- onnx_array_api.validation.f8.display_fe4m3(value, sign=1, exponent=4, mantissa=3)
Displays a float 8 E4M3 in binary format.
- Parameters:
value – value to display (int)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa
- Returns:
string
- onnx_array_api.validation.f8.display_fe5m2(value, sign=1, exponent=4, mantissa=3)
Displays a float 8 E5M2 in binary format.
- Parameters:
value – value to display (int)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa
- Returns:
string
- onnx_array_api.validation.f8.display_fexmx(value, sign, exponent, mantissa)
Displays any float encoded with sign bits for the sign (typically 1), exponent bits for the exponent, and mantissa bits for the mantissa.
- Parameters:
value – value to display (int)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa
- Returns:
string
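To make the sign/exponent/mantissa split concrete, here is a minimal sketch that formats an integer as three bit fields. The name `show_bits` and the `.` separator between fields are assumptions made for this example; the exact string returned by the library functions may differ.

```python
def show_bits(value, sign=1, exponent=4, mantissa=3):
    """Format an integer as 'sign.exponent.mantissa' bit fields.

    Hypothetical helper; the '.' separator is an assumption.
    """
    total = sign + exponent + mantissa
    bits = format(value, f"0{total}b")  # zero-padded binary string
    return ".".join(
        [bits[:sign], bits[sign:sign + exponent], bits[sign + exponent:]]
    )


print(show_bits(0x38))                          # 0.0111.000 (1.0 in E4M3)
print(show_bits(0x3C, exponent=5, mantissa=2))  # 0.01111.00 (1.0 in E5M2)
```

The same byte reads very differently under the two layouts, which is why the field widths have to match the format being inspected.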
- onnx_array_api.validation.f8.display_float16(value, sign=1, exponent=5, mantissa=10)
Displays a float16 in binary format.
- Parameters:
value – value to display (float16)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa
- Returns:
string
- onnx_array_api.validation.f8.display_float32(value, sign=1, exponent=8, mantissa=23)
Displays a float32 in binary format.
- Parameters:
value – value to display (float32)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa
- Returns:
string
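For float16 and float32, the bit pattern first has to be extracted from the floating-point value. One way to do that, shown here as a sketch rather than as the library's method, is to reinterpret the bytes with the standard `struct` module. The names `float_bits` and `_LAYOUTS` are hypothetical, and the `.` separator is again an assumption.

```python
import struct

# Hypothetical lookup: (float format, unsigned int format,
# exponent bits, mantissa bits) for each supported type.
_LAYOUTS = {
    "float16": ("<e", "<H", 5, 10),
    "float32": ("<f", "<I", 8, 23),
}


def float_bits(x, kind="float32"):
    """Return the IEEE 754 bit fields of `x` as 'sign.exponent.mantissa'."""
    float_fmt, int_fmt, exponent, mantissa = _LAYOUTS[kind]
    # Reinterpret the float's bytes as an unsigned integer of the same size.
    (ival,) = struct.unpack(int_fmt, struct.pack(float_fmt, x))
    bits = format(ival, f"0{1 + exponent + mantissa}b")
    return ".".join([bits[:1], bits[1:1 + exponent], bits[1 + exponent:]])


print(float_bits(1.0))             # sign 0, exponent 01111111, mantissa all zeros
print(float_bits(1.0, "float16"))  # 0.01111.0000000000
```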
- onnx_array_api.validation.f8.display_int(ival, sign=1, exponent=8, mantissa=23)
Displays an integer as bits.
- Parameters:
ival – value to display (int)
sign – number of bits for the sign
exponent – number of bits for the exponent
mantissa – number of bits for the mantissa
- Returns:
string
- onnx_array_api.validation.f8.fe4m3_to_float32(ival: int, fn: bool = True, uz: bool = False) → float
Casts a float E4M3 encoded as an integer into a float.
- Parameters:
ival – byte
fn – no infinite values
uz – no negative zero
- Returns:
float (float 32)
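As an illustration of what this cast computes, the sketch below decodes one byte under the E4M3FN layout (the case `fn=True`, `uz=False`): 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and NaN encoded as exponent and mantissa all ones. The function name `decode_e4m3fn` is hypothetical and the code is not taken from the library.

```python
import math


def decode_e4m3fn(ival):
    """Decode a byte in E4M3FN layout into a Python float (sketch)."""
    if not 0 <= ival <= 0xFF:
        raise ValueError(f"expected a byte, got {ival!r}")
    sign = -1.0 if ival & 0x80 else 1.0
    exponent = (ival >> 3) & 0x0F
    mantissa = ival & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan                      # the only NaN pattern per sign
    if exponent == 0:
        # Subnormal numbers: no implicit leading 1.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


print(decode_e4m3fn(0x38))  # 1.0
print(decode_e4m3fn(0x7E))  # 448.0, the largest finite value
```

Every value produced this way is exactly representable in float32, so no rounding is involved in this direction.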
- onnx_array_api.validation.f8.fe4m3_to_float32_float(ival: int, fn: bool = True, uz: bool = False) → float
Casts a float 8 encoded as an integer into a float.
- Parameters:
ival – byte
fn – no infinite values
uz – no negative zero
- Returns:
float (float 32)
- onnx_array_api.validation.f8.fe5m2_to_float32(ival: int, fn: bool = False, uz: bool = False) → float
Casts a float E5M2 encoded as an integer into a float.
- Parameters:
ival – byte
fn – no infinite values
uz – no negative zero
- Returns:
float (float 32)
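The E5M2 layout with `fn=False` and `uz=False` follows the usual IEEE conventions: 1 sign bit, 5 exponent bits with bias 15, 2 mantissa bits, with the all-ones exponent reserved for infinities and NaN. The sketch below decodes one byte under those assumptions. The name `decode_e5m2` is hypothetical and the code is illustrative only.

```python
import math


def decode_e5m2(ival):
    """Decode a byte in E5M2 layout into a Python float (sketch)."""
    if not 0 <= ival <= 0xFF:
        raise ValueError(f"expected a byte, got {ival!r}")
    sign = -1.0 if ival & 0x80 else 1.0
    exponent = (ival >> 2) & 0x1F
    mantissa = ival & 0x03
    if exponent == 0x1F:
        # All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # Subnormal numbers: no implicit leading 1.
        return sign * (mantissa / 4.0) * 2.0 ** -14
    return sign * (1.0 + mantissa / 4.0) * 2.0 ** (exponent - 15)


print(decode_e5m2(0x3C))  # 1.0
print(decode_e5m2(0x7B))  # 57344.0, the largest finite value
print(decode_e5m2(0x7C))  # inf
```

Compared with E4M3, this layout trades one bit of precision for a much wider range.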
- onnx_array_api.validation.f8.fe5m2_to_float32_float(ival: int, fn: bool = False, uz: bool = False) → float
Casts a float 8 encoded as an integer into a float.
- Parameters:
ival – byte
fn – no infinite values
uz – no negative zero
- Returns:
float (float 32)
- onnx_array_api.validation.f8.float32_to_fe4m3(x, fn: bool = True, uz: bool = False, saturate: bool = True)
Converts a float32 into a float E4M3.
- Parameters:
x – numpy.float32
fn – no infinite values
uz – no negative zero
saturate – to convert out of range and infinities to max value if True
- Returns:
byte
- onnx_array_api.validation.f8.float32_to_fe5m2(x, fn: bool = False, uz: bool = False, saturate: bool = True)
Converts a float32 into a float E5M2.
- Parameters:
x – numpy.float32
fn – no infinite values
uz – no negative zero
saturate – to convert out of range and infinities to max value if True
- Returns:
byte
- onnx_array_api.validation.f8.search_float32_into_fe4m3(value: float, fn: bool = True, uz: bool = False, saturate: bool = True) → int
Casts a float 32 into a float E4M3.
- Parameters:
value – float
fn – no infinite values
uz – no negative zero
saturate – to convert out of range and infinities to max value if True
- Returns:
byte
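A search-based cast can be pictured as follows: decode every byte of the target format, then pick the byte whose decoded value is nearest to the input. The sketch below does this for E4M3FN only. It is a simplified illustration under several assumptions, not the library's code: the names `decode_e4m3fn`, `POSITIVE`, and `search_e4m3fn` are hypothetical, ties go to the smaller magnitude rather than following round-to-nearest-even, and any magnitude above the largest finite value (448) is treated as out of range.

```python
import math


def decode_e4m3fn(ival):
    """Decode a byte in E4M3FN layout (bias 7, no infinities)."""
    sign = -1.0 if ival & 0x80 else 1.0
    exponent = (ival >> 3) & 0x0F
    mantissa = ival & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan
    if exponent == 0:
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# Non-negative finite values only (bytes 0x00..0x7E), in ascending order.
POSITIVE = [(decode_e4m3fn(b), b) for b in range(0x7F)]
NAN_BYTE = 0x7F


def search_e4m3fn(value, saturate=True):
    """Return the E4M3FN byte nearest to `value` (simplified sketch)."""
    if math.isnan(value):
        return NAN_BYTE
    sign = 0x80 if math.copysign(1.0, value) < 0 else 0x00
    magnitude = abs(value)
    max_value, max_byte = POSITIVE[-1]
    if magnitude > max_value:
        # Assumption: out of range maps to the max value when saturating,
        # and to NaN otherwise, because E4M3FN has no infinity.
        return sign | (max_byte if saturate else NAN_BYTE)
    # Linear scan; min() keeps the first (smaller) candidate on ties.
    _, byte = min(POSITIVE, key=lambda pair: abs(pair[0] - magnitude))
    return sign | byte


print(hex(search_e4m3fn(1.0)))      # 0x38
print(hex(search_e4m3fn(1000.0)))   # 0x7e, saturated to 448
```

A search like this is slow compared with bit manipulation, but it is easy to reason about, which makes it useful as a reference when validating a faster conversion.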
- onnx_array_api.validation.f8.search_float32_into_fe5m2(value: float, fn: bool = False, uz: bool = False, saturate: bool = True) → int
Casts a float 32 into a float E5M2.
- Parameters:
value – float
fn – no infinite values
uz – no negative zero
saturate – to convert out of range and infinities to max value if True
- Returns:
byte