onnx_diagnostic.helpers._log_helper

onnx_diagnostic.helpers._log_helper.align_dataframe_with(df: DataFrame, baseline: DataFrame, fill_value: float = 0) → DataFrame | None[source]

Modifies the first dataframe df so that it has exactly the same rows and columns as baseline. Both dataframes must share the same levels on both axes. Empty cells are filled with fill_value (0 by default). Only the numerical columns are kept. The function returns None if the output is empty.
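The alignment described above can be approximated with plain pandas. This is a minimal sketch, not the library's implementation: the helper name align_like is hypothetical, and it assumes that "numerical columns" refers to the columns of df and that a reindex on both axes is an acceptable stand-in for the alignment.

```python
import pandas as pd


def align_like(df, baseline, fill_value=0.0):
    """Hypothetical helper: gives df the row and column labels of baseline."""
    # Keep only the numerical columns of df.
    numeric = df.select_dtypes(include="number")
    # Reindex on both axes; labels missing from df are created and filled.
    aligned = numeric.reindex(
        index=baseline.index, columns=baseline.columns, fill_value=fill_value
    )
    # reindex leaves NaN values that were already in df, so fill them too.
    aligned = aligned.fillna(fill_value)
    return None if aligned.empty else aligned


df = pd.DataFrame({"a": [1.0, 2.0], "label": ["x", "y"]}, index=["r1", "r2"])
baseline = pd.DataFrame({"a": [0.0] * 3, "b": [0.0] * 3}, index=["r1", "r2", "r3"])
aligned = align_like(df, baseline)
```

Here aligned has the three rows and two columns of baseline; the non-numerical column label is dropped, and the missing row r3 and column b are filled with the fill value.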

onnx_diagnostic.helpers._log_helper.apply_excel_style(filename_or_writer: Any, f_highlights: Dict[str, Callable[[Any], CubeViewDef.HighLightKind]] | None = None, time_mask_view: Dict[str, DataFrame] | None = None, verbose: int = 0)[source]

Applies styles on all sheets in a file unless the sheet is too big.

Parameters:
  • filename_or_writer – filename, modified in place

  • f_highlights – color function to apply, one per sheet

  • time_mask_view – if specified, it contains dataframes with the same shape and values in {-1, 0, +1}, which indicate whether a value is unexpectedly lower (-1) or higher (+1); the background color of the cell is changed accordingly.

  • verbose – verbosity, displays a progress loop

onnx_diagnostic.helpers._log_helper.breaking_last_point(series: Sequence[float], threshold: float = 1.2)[source]

Assuming the time series is constant, checks that the last value is not an outlier.

Parameters:

series – series

Returns:

significant change (-1, 0, +1), test value

onnx_diagnostic.helpers._log_helper.enumerate_csv_files(data: DataFrame | List[str | Tuple[str, str]] | str | Tuple[str, str, str, str], verbose: int = 0, filtering: Callable[[str], bool] | None = None) → Iterator[DataFrame | str | Tuple[str, str, str, str]][source]

Enumerates the files considered for the aggregation. Only csv files are considered. If a zip file is given, the function looks inside it and loops over the csv candidates.

Parameters:
  • data – dataframe with the raw data or a file or list of files

  • verbose – verbosity

  • filtering – function to filter in or out files in zip files, must return true to keep the file, false to skip it.

Returns:

a generator yielding tuples with the filename, date, full path and zip file

data can contain:

  • a dataframe

  • a string for a filename, zip or csv

  • a list of strings

  • a tuple
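The zip handling described above can be sketched with the standard zipfile module. This is a sketch under stated assumptions, not the library's code: the helper name iter_csv_candidates is hypothetical, and the date format and the exact content of each tuple field are guesses based on the description "filename, date, full path and zip file".

```python
import os
import tempfile
import zipfile
from datetime import datetime


def iter_csv_candidates(path, filtering=None):
    """Hypothetical helper: yields a csv path, or one tuple per csv in a zip."""
    if not path.endswith(".zip"):
        # A plain csv file is yielded as is; anything else is ignored.
        if path.endswith(".csv"):
            yield path
        return
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if not info.filename.endswith(".csv"):
                continue
            # filtering must return True to keep the file, False to skip it.
            if filtering is not None and not filtering(info.filename):
                continue
            # Assumed date: the timestamp stored in the archive entry.
            date = datetime(*info.date_time).isoformat()
            yield (os.path.basename(info.filename), date, info.filename, path)


# Build a small archive to exercise the helper.
tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, "logs.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("runs/a.csv", "x,y\n1,2\n")
    zf.writestr("runs/b.csv", "x,y\n3,4\n")
    zf.writestr("runs/readme.txt", "not a csv")

found = list(iter_csv_candidates(archive, filtering=lambda name: "b" not in name))
```

With this filter, only runs/a.csv is kept: readme.txt is not a csv file and b.csv is rejected by the filtering function.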

onnx_diagnostic.helpers._log_helper.filter_data(df: DataFrame, filter_in: str | None = None, filter_out: str | None = None, verbose: int = 0) → DataFrame[source]

Arguments filter_in and filter_out follow the syntax <column1>:<fmt1>//<column2>:<fmt2>.

The format is the following:

  • a value or a set of values separated by ;
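The filter syntax can be illustrated with a small parser. This is a sketch only: parse_filter and apply_filters are hypothetical helpers, the column names exporter and opset are made up for the example, and comparing values as strings is an assumption.

```python
import pandas as pd


def parse_filter(spec):
    """Hypothetical helper: '<col1>:<v1>;<v2>//<col2>:<v3>' -> {col: [values]}."""
    parsed = {}
    for clause in spec.split("//"):
        column, _, fmt = clause.partition(":")
        parsed[column] = fmt.split(";")
    return parsed


def apply_filters(df, filter_in=None, filter_out=None):
    # Assumption: cell values are compared with the filter values as strings.
    if filter_in:
        for column, values in parse_filter(filter_in).items():
            df = df[df[column].astype(str).isin(values)]
    if filter_out:
        for column, values in parse_filter(filter_out).items():
            df = df[~df[column].astype(str).isin(values)]
    return df


# Illustrative data; the column names are not taken from the library.
df = pd.DataFrame({"exporter": ["a", "b", "c", "a"], "opset": [18, 18, 20, 20]})
kept = apply_filters(df, filter_in="exporter:a;b//opset:18", filter_out="exporter:b")
```

The filter_in string keeps rows whose exporter is a or b and whose opset is 18; filter_out then removes the rows whose exporter is b, which leaves only the first row.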

onnx_diagnostic.helpers._log_helper.mann_kendall(series: Sequence[float], threshold: float = 0.5)[source]

Computes the Mann-Kendall test.

Parameters:
  • series – series

  • threshold – 1.96 is the usual value, 0.5 means a short timeseries (0, 1, 2, 3, 4) has a significant trend

Returns:

trend (-1, 0, +1), test value

S =\sum_{i=1}^{n}\sum_{j=i+1}^{n} sign(x_j - x_i)

where the function sign is:

sign(x) = \left\{ \begin{array}{l} -1 \text{ if } x < 0 \\ 0 \text{ if } x = 0 \\ +1 \text{ otherwise} \end{array} \right.

And:

Var(S)= \frac{n(n-1)(2n+5) - \sum_t t(t-1)(2t+5)}{18}
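The two formulas can be checked numerically with a few lines of Python. This is a sketch, not the library's implementation: the name mann_kendall_sketch is hypothetical, t is taken to be the size of each group of tied values (the usual convention), and the normalization of S into a test value uses the textbook continuity correction, which the text above does not specify. The test value returned by the library may therefore differ.

```python
import math
from collections import Counter


def mann_kendall_sketch(series, threshold=0.5):
    """Hypothetical helper returning (trend, test value)."""
    n = len(series)
    # S: sum of sign(x_j - x_i) over all pairs i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = series[j] - series[i]
            s += (d > 0) - (d < 0)
    # Var(S) with the tie correction; groups of size 1 contribute 0.
    ties = Counter(series).values()
    var = (n * (n - 1) * (2 * n + 5) - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    if var <= 0:
        return 0, 0.0
    # Assumed normalization: standard Z statistic with continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var)
    elif s < 0:
        z = (s + 1) / math.sqrt(var)
    else:
        z = 0.0
    trend = 1 if z > threshold else (-1 if z < -threshold else 0)
    return trend, z


trend, z = mann_kendall_sketch([0, 1, 2, 3, 4])
```

For the series (0, 1, 2, 3, 4), S = 10 and Var(S) = 300 / 18, so the sketch reports an increasing trend under the default threshold.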

onnx_diagnostic.helpers._log_helper.open_dataframe(data: str | Tuple[str, str, str, str] | DataFrame) → DataFrame[source]

Opens a file described by an item yielded by function onnx_diagnostic.helpers.log_helper.enumerate_csv_files().

Parameters:

data – a dataframe, a filename, or a tuple indicating that the file comes from a zip file

Returns:

a dataframe
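The three accepted input kinds can be handled as in the sketch below. It is not the library's code: read_candidate is a hypothetical helper, and the tuple layout (filename, date, full path inside the zip, zip file) is an assumption taken from the description of enumerate_csv_files.

```python
import io
import os
import tempfile
import zipfile

import pandas as pd


def read_candidate(data):
    """Hypothetical helper: returns a dataframe for each supported input kind."""
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, str):
        return pd.read_csv(data)
    # Assumed tuple layout: (filename, date, full path inside the zip, zip file).
    _, _, full_name, zip_path = data
    with zipfile.ZipFile(zip_path) as zf:
        return pd.read_csv(io.BytesIO(zf.read(full_name)))


# Build a small archive to exercise the helper.
tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, "logs.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("runs/a.csv", "x,y\n1,2\n")

loaded = read_candidate(("a.csv", "2024-01-01T00:00:00", "runs/a.csv", archive))
```

The csv entry is read from the archive without extracting it to disk; a dataframe given as input is returned unchanged.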