onnx_diagnostic.helpers.log_helper

class onnx_diagnostic.helpers.log_helper.CubeLogs(data: Any, time: str = 'date', keys: Sequence[str] = ('version_.*', 'model_.*'), values: Sequence[str] = ('time_.*', 'disc_.*'), ignored: Sequence[str] = (), recent: bool = False, formulas: Sequence[str] | Dict[str, str | Callable[[DataFrame], Series]] | None = None, fill_missing: Sequence[Tuple[str, Any]] | None = None, keep_last_date: bool = False)[source][source]

Processes logs coming from experiments. A cube is basically a database with certain columns playing specific roles.

  • time: a single column; it is not mandatory but it is recommended to have one

  • keys: they act as coordinates; they cannot be aggregated and are not numbers, more like categories. (time, *keys) identifies an element of the database in a unique way: no two rows can share the same key and time values

  • values: they are not necessarily numerical, but if they are, they can be aggregated

Every other column is ignored. More columns can be added by using formulas.
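The role assignment driven by the keys, values and ignored patterns can be pictured with a small stand-alone sketch (plain Python, not the library's actual implementation; whether the library uses full or partial regex matching is an assumption here):

```python
import re

def classify(columns, keys=("version_.*", "model_.*"),
             values=("time_.*", "disc_.*"), ignored=()):
    """Assigns a role ('key' or 'value') to every column by regex match;
    ignored patterns act as a negative filter, unmatched columns are dropped."""
    roles = {}
    for col in columns:
        if any(re.fullmatch(p, col) for p in ignored):
            continue  # negative filter for the other two
        if any(re.fullmatch(p, col) for p in keys):
            roles[col] = "key"
        elif any(re.fullmatch(p, col) for p in values):
            roles[col] = "value"
        # any other column is ignored
    return roles

roles = classify(["date", "version_torch", "model_name", "time_export", "comment"])
```

Here `date` would be the time column and `comment` matches nothing, so neither appears among the keys and values.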

Parameters:
  • data – the raw data

  • time – the time column

  • keys – the keys, can include regular expressions

  • values – the values, can include regular expressions

  • ignored – ignores some columns, acts as a negative regular expression for the other two

  • recent – if more than one row shares the same keys, the cube only keeps the most recent one

  • formulas – columns to add, defined with formulas

  • fill_missing – a dictionary, defines replacement values for missing ones in some columns

  • keep_last_date – overwrites all the times with the most recent one; this makes timeseries easier to handle
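The recent option can be pictured with a small stand-alone sketch (plain Python; the library itself works on a DataFrame, and the column names below are only illustrative):

```python
def keep_most_recent(rows, time="date", keys=("exporter", "model_name")):
    """Sketch of the `recent` behaviour: when several rows share the same
    key values, only the row with the latest time survives."""
    best = {}
    for row in rows:
        k = tuple(row[c] for c in keys)
        if k not in best or row[time] > best[k][time]:
            best[k] = row
    return list(best.values())

rows = [
    {"date": "2024-01-01", "exporter": "E1", "model_name": "m", "time_export": 2.0},
    {"date": "2024-02-01", "exporter": "E1", "model_name": "m", "time_export": 1.5},
]
deduped = keep_most_recent(rows)
```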

clone(data: DataFrame | None = None, keys: Sequence[str] | None = None) CubeLogs[source][source]

Makes a copy of the dataframe. It copies the processed data, not the original one.

property columns: Sequence[str]

Returns the columns.

cube_time(fill_other_dates: bool = False, threshold: float = 1.2) CubeLogs[source][source]

Aggregates the data over time to detect changes in the last value. If fill_other_dates is True, all dates are kept but missing values are filled with 0. threshold determines the bandwidth within which the values are expected; it is a factor applied to the standard deviation.
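The kind of check the threshold implies can be sketched as follows (a hypothetical illustration; the actual detection logic may differ): the last value is flagged when it deviates from the earlier values by more than threshold times their standard deviation.

```python
import statistics

def last_value_unexpected(series, threshold=1.2):
    """Flags the last value when it is further from the mean of the
    history than threshold * (population) standard deviation."""
    history, last = series[:-1], series[-1]
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    return abs(last - mean) > threshold * std

stable = last_value_unexpected([10.0, 10.2, 9.8, 10.1], threshold=1.2)
jump = last_value_unexpected([10.0, 10.2, 9.8, 15.0], threshold=1.2)
```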

describe() DataFrame[source][source]

Basic description of all variables.

load(verbose: int = 0)[source][source]

Loads and preprocesses the data. Returns self.

make_view_def(name: str) CubeViewDef | None[source][source]

Returns a view definition.

Parameters:

name – name of a value

Returns:

a CubeViewDef or None if name does not make sense

post_load_process_piece(df: DataFrame, unique: bool = False) DataFrame[source][source]

Postprocesses a piece when a cube is made of multiple pieces before it gets merged.

sbs(configs: Dict[str, Dict[str, Any]], column_name: str = 'CONF') Tuple[DataFrame, DataFrame][source][source]

Creates a side-by-side comparison for two configurations. Every configuration is a dictionary column:value which filters the rows to keep in order to compute the side by side. Every configuration is given a name (the key in configs) which is added in column column_name.

Parameters:
  • configs – example dict(CFA=dict(exporter="E1", opt="O"), CFB=dict(exporter="E2", opt="O"))

  • column_name – column to add with the name of the configuration

Returns:

data and aggregated data
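The filtering step can be illustrated with plain dictionaries (a sketch of the idea only; the method itself works on the cube's DataFrame):

```python
def side_by_side(rows, configs, column_name="CONF"):
    """Keeps the rows matching each configuration's column:value filters
    and tags them with the configuration name in `column_name`."""
    out = []
    for name, flt in configs.items():
        for row in rows:
            if all(row.get(col) == val for col, val in flt.items()):
                out.append({**row, column_name: name})
    return out

rows = [
    {"exporter": "E1", "opt": "O", "time": 1.0},
    {"exporter": "E2", "opt": "O", "time": 0.8},
    {"exporter": "E2", "opt": "X", "time": 0.9},
]
tagged = side_by_side(rows, {"CFA": {"exporter": "E1", "opt": "O"},
                             "CFB": {"exporter": "E2", "opt": "O"}})
```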

property shape: Tuple[int, int]

Returns the shape.

to_excel(output: str, views: Sequence[str] | Dict[str, str | CubeViewDef], main: str | None = 'main', raw: str | None = 'raw', verbose: int = 0, csv: Sequence[str] | None = None, time_mask: bool = False, sbs: Dict[str, Dict[str, Any]] | None = None)[source][source]

Creates an Excel file with a list of views.

Parameters:
  • output – output file to create

  • views – sequence or dictionary of views to append

  • main – add a page with statistics on all variables

  • raw – add a page with the raw data

  • csv – views to dump as csv files (same name as the output + view name)

  • verbose – verbosity

  • time_mask – color the background of the cells if one of the values for the last date is unexpected, assuming they should remain stable

  • sbs – configurations to compare side-by-side, this adds two tabs: one gathering the raw data of both configurations, the other aggregated by metric

view(view_def: str | CubeViewDef, return_view_def: bool = False, verbose: int = 0) DataFrame | Tuple[DataFrame, CubeViewDef][source][source]

Returns a dataframe, a pivot view. key_index determines the index, the other key columns determine the columns. If ignore_unique is True, every column with a unique value is removed.

Parameters:
  • view_def – view definition

  • return_view_def – returns the view as well

  • verbose – verbosity level

Returns:

dataframe
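The pivot underlying a view can be sketched with pandas alone (a minimal illustration on a made-up log DataFrame; the actual view logic adds regex filtering, aggregation and unique-column removal on top):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": ["d1", "d1", "d1", "d1"],
        "exporter": ["E1", "E1", "E2", "E2"],
        "model_name": ["m1", "m2", "m1", "m2"],
        "speedup": [1.1, 1.3, 1.2, 1.5],
    }
)

# key_index goes to the rows, the remaining key columns go to the columns
view = df.pivot_table(index=["date", "model_name"], columns="exporter", values="speedup")
```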

class onnx_diagnostic.helpers.log_helper.CubeLogsPerformance(data: Any, time: str = 'DATE', keys: Sequence[str] = ('^version_.*', '^model_.*', 'device', 'opt_patterns', 'suite', 'memory_peak', 'machine', 'exporter', 'dynamic', 'rtopt', 'dtype', 'device', 'architecture'), values: Sequence[str] = ('^time_.*', '^disc.*', '^ERR_.*', 'CMD', '^ITER', '^onnx_.*', '^op_onnx_.*', '^peak_gpu_.*'), ignored: Sequence[str] = ('version_python',), recent: bool = True, formulas: Sequence[str] | Dict[str, str | Callable[[DataFrame], Series]] | None = ('speedup', 'bucket[speedup]', 'ERR1', 'n_models', 'n_model_eager', 'n_model_running', 'n_model_acc01', 'n_model_acc001', 'n_model_dynamic', 'n_model_pass', 'n_model_faster', 'n_model_faster2x', 'n_model_faster3x', 'n_model_faster4x', 'n_node_attention', 'n_node_control_flow', 'n_node_scatter', 'n_node_function', 'n_node_initializer', 'n_node_initializer_small', 'n_node_constant', 'n_node_shape', 'n_node_expand', 'onnx_n_nodes_no_cst', 'peak_gpu_torch', 'peak_gpu_nvidia', 'time_export_unbiased'), fill_missing: Sequence[Tuple[str, Any]] | None = (('model_attn_impl', 'eager'),), keep_last_date: bool = False)[source][source]

Processes logs coming from experiments.

clone(data: DataFrame | None = None, keys: Sequence[str] | None = None) CubeLogs[source][source]

Makes a copy of the dataframe. It copies the processed data, not the original one. keys can be changed as well.

make_view_def(name: str) CubeViewDef | None[source][source]

Returns a view definition.

Parameters:

name – name of the view

Returns:

a CubeViewDef or None if name does not make sense

Available views:

  • agg-suite: aggregation per suite

  • disc: discrepancies

  • speedup: speedup

  • bucket_speedup: speedup in buckets

  • time: latency

  • time_export: time to export

  • counts: status, running, faster, has control flow, …

  • err: important errors

  • cmd: command lines

  • raw-short: raw data without all the unused columns

post_load_process_piece(df: DataFrame, unique: bool = False) DataFrame[source][source]

Postprocesses a piece when a cube is made of multiple pieces before it gets merged.

view(view_def: str | CubeViewDef | None, return_view_def: bool = False, verbose: int = 0) DataFrame | None | Tuple[DataFrame | None, CubeViewDef | None][source][source]

Returns a dataframe, a pivot view.

If view_def is a string, it is replaced by a predefined view.

Parameters:
  • view_def – view definition or a string

  • return_view_def – returns the view definition as well

  • verbose – verbosity level

Returns:

dataframe or a couple (dataframe, view definition); both of them can be None if view_def cannot be interpreted

class onnx_diagnostic.helpers.log_helper.CubePlot(df: DataFrame, kind: str = 'bar', orientation='col', split: bool = True, timeseries: str | None = None)[source][source]

Creates a plot.

Parameters:
  • df – dataframe

  • kind – kind of graph to plot: bar, barh, or line

  • split – draws one graph per line in the dataframe

  • timeseries – this assumes the time is one level of the columns; this argument indicates the level name

It defines a graph. Usually bar or barh is used to compare experiments for every metric, with one subplot per metric.

CubePlot(df, kind="barh", orientation="row", split=True)

line is usually used to plot timeseries showing the evolution of metrics over time.

CubePlot(
    df,
    kind="line",
    orientation="row",
    split=True,
    timeseries="time",
)

classmethod group_columns(columns: List[str], sep: str = '/', depth: int = 2) List[List[str]][source][source]

Groups columns to have nice display.
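A plausible sketch of such a grouping, assuming columns are grouped by a shared prefix of up to depth components split on sep (the actual grouping rule is not documented here, so this is a hypothetical illustration):

```python
from itertools import groupby

def group_by_prefix(columns, sep="/", depth=2):
    """Hypothetical sketch: groups column names sharing the same first
    `depth` components (split on `sep`) so they can be displayed together."""
    keyf = lambda c: tuple(c.split(sep)[:depth])
    ordered = sorted(columns, key=keyf)
    return [list(g) for _, g in groupby(ordered, key=keyf)]

groups = group_by_prefix(["a/b/x", "a/b/y", "a/c/z", "d"])
```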

to_images(verbose: int = 0, merge: bool = True, title_suffix: str | None = None) List[bytes][source][source]

Converts data into plots and images.

Parameters:
  • verbose – verbosity

  • merge – returns all graphs in a single image (True) or an image for every graph (False)

  • title_suffix – suffix appended to the title of every graph

Returns:

list of binary images (format PNG)

class onnx_diagnostic.helpers.log_helper.CubeViewDef(key_index: Sequence[str], values: Sequence[str], ignore_unique: bool = True, order: Sequence[str] | None = None, key_agg: Sequence[str] | None = None, agg_args: Sequence[Any] | Callable[[str], Any] = ('sum',), agg_kwargs: Dict[str, Any] | None = None, agg_multi: Dict[str, Callable[[DataFrameGroupBy], Series]] | None = None, ignore_columns: Sequence[str] | None = None, keep_columns_in_index: Sequence[str] | None = None, dropna: bool = True, transpose: bool = False, f_highlight: Callable[[Any], HighLightKind] | None = None, name: str | None = None, no_index: bool = False, plots: bool = False)[source][source]

Defines how to compute a view.

Parameters:
  • key_index – keys to put in the row index

  • values – values to show

  • ignore_unique – ignore keys with a unique value

  • order – to reorder keys in the columns index

  • key_agg – aggregate according to these columns before creating the view

  • agg_args – see pandas.core.groupby.DataFrameGroupBy.agg(); it can also be a callable returning a different aggregation method depending on the column name

  • agg_kwargs – see pandas.core.groupby.DataFrameGroupBy.agg()

  • agg_multi – aggregation over multiple columns

  • ignore_columns – columns to ignore, when they are known to overload the view

  • keep_columns_in_index – keeps the columns even if there is only one unique value

  • dropna – drops rows with nan if not relevant

  • transpose – transpose

  • f_highlight – to highlight some values

  • name – name of the view, used mostly to debug

  • plots – adds plot to the Excel sheet

  • no_index – removes the index (but keeps the columns)

Some examples of views. First example is an aggregated view for many metrics.

cube = CubeLogs(...)

CubeViewDef(
    key_index=cube._filter_column(fs, cube.keys_time),
    values=cube._filter_column(
        ["TIME_ITER", "speedup", "time_latency.*", "onnx_n_nodes"],
        cube.values,
    ),
    ignore_unique=True,
    key_agg=["model_name", "task", "model_task", "suite"],
    agg_args=lambda column_name: "sum" if column_name.startswith("n_") else "mean",
    agg_multi={"speedup_weighted": mean_weight, "speedup_geo": mean_geo},
    name="agg-all",
    plots=True,
)

Next one focuses on a couple of metrics.

cube = CubeLogs(...)

CubeViewDef(
    key_index=cube._filter_column(fs, cube.keys_time),
    values=cube._filter_column(["speedup"], cube.values),
    ignore_unique=True,
    keep_columns_in_index=["suite"],
    name="speedup",
)
class HighLightKind(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source][source]

Codes to highlight values.