onnx_diagnostic.helpers.log_helper

class onnx_diagnostic.helpers.log_helper.CubeLogs(data: Any, time: str = 'date', keys: Sequence[str] = ('version_.*', 'model_.*'), values: Sequence[str] = ('time_.*', 'disc_.*'), ignored: Sequence[str] = (), recent: bool = False, formulas: Sequence[str] | Dict[str, str | Callable[[DataFrame], Series]] | None = None, fill_missing: Sequence[Tuple[str, Any]] | None = None, keep_last_date: bool = False)

Processes logs coming from experiments.
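The defaults of keys, values and ignored look like regular expressions applied to column names. The sketch below shows how such patterns could split the columns of a log into groups. It assumes that a pattern must match from the start of the name (re.match), which may differ from what CubeLogs does internally. classify_columns is a hypothetical helper, not part of the library.

```python
import re
from typing import Dict, List, Sequence


def classify_columns(
    columns: Sequence[str],
    keys: Sequence[str] = ("version_.*", "model_.*"),
    values: Sequence[str] = ("time_.*", "disc_.*"),
    ignored: Sequence[str] = (),
) -> Dict[str, List[str]]:
    """Hypothetical helper: groups column names by the pattern family they match."""

    def matches(name: str, patterns: Sequence[str]) -> bool:
        # Assumption: a pattern must match from the start of the column name.
        return any(re.match(p, name) for p in patterns)

    groups: Dict[str, List[str]] = {"keys": [], "values": [], "other": []}
    for name in columns:
        if matches(name, ignored):
            continue  # ignored columns are dropped entirely
        if matches(name, keys):
            groups["keys"].append(name)
        elif matches(name, values):
            groups["values"].append(name)
        else:
            groups["other"].append(name)
    return groups


columns = ["date", "version_torch", "model_name", "time_latency", "disc_abs", "comment"]
print(classify_columns(columns, ignored=("comment",)))
```

Here the date column ends up in the "other" group because the time column is passed separately through the time argument.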

property columns: Sequence[str]: Returns the columns.

describe() → DataFrame: Basic description of all variables.

load(verbose: int = 0): Loads and preprocesses the data. Returns self.

post_load_process_piece(df: DataFrame, unique: bool = False) → DataFrame: Postprocesses a piece when a cube is made of multiple pieces before it gets merged.

property shape: Tuple[int, int]: Returns the shape.

Creates an Excel file with a list of views.

Parameters:

output – output file to create
views – sequence or dictionary of views to append
main – add a page with statistics on all variables
raw – add a page with the raw data
csv – views to dump as csv files (same name as output + view name)
verbose – verbosity

view(view_def: str | CubeViewDef, return_view_def: bool = False, verbose: int = 0) → DataFrame | Tuple[DataFrame, CubeViewDef]

Returns a dataframe, a pivot view. key_index determines the index, the other key columns determine the columns. If ignore_unique is True, every column with a unique value is removed.

Parameters:

view_def – view definition
return_view_def – returns the view definition as well
verbose – verbosity level

Returns:

dataframe
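The pivot described above can be illustrated without pandas. The sketch below is a simplified model, not the library's implementation: pivot_view is a hypothetical function working on a list of dictionaries, and it assumes that ignore_unique drops every key column holding a single distinct value before the remaining keys are split between the index and the columns.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def pivot_view(
    rows: Sequence[Dict[str, Any]],
    keys: Sequence[str],
    key_index: Sequence[str],
    value: str,
    ignore_unique: bool = True,
) -> Tuple[Dict[tuple, Dict[tuple, Any]], List[str], List[str]]:
    """Hypothetical pivot: key_index goes to the rows, the other keys to the columns."""
    if ignore_unique:
        # Assumption: a key with a single distinct value carries no information.
        keys = [k for k in keys if len({r[k] for r in rows}) > 1]
    index_keys = [k for k in key_index if k in keys]
    column_keys = [k for k in keys if k not in index_keys]
    table: Dict[tuple, Dict[tuple, Any]] = defaultdict(dict)
    for r in rows:
        row_label = tuple(r[k] for k in index_keys)
        col_label = tuple(r[k] for k in column_keys)
        table[row_label][col_label] = r[value]
    return dict(table), index_keys, column_keys


rows = [
    {"model_name": "A", "exporter": "e1", "device": "cpu", "time_latency": 1.0},
    {"model_name": "A", "exporter": "e2", "device": "cpu", "time_latency": 0.5},
    {"model_name": "B", "exporter": "e1", "device": "cpu", "time_latency": 2.0},
    {"model_name": "B", "exporter": "e2", "device": "cpu", "time_latency": 1.0},
]
table, index_keys, column_keys = pivot_view(
    rows, ["model_name", "exporter", "device"], ["model_name"], "time_latency"
)
print(index_keys, column_keys)
print(table)
```

In this example device only takes the value cpu, so it disappears from the view and exporter is the only key left in the columns.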

class onnx_diagnostic.helpers.log_helper.CubeLogsPerformance(data: Any, time: str = 'DATE', keys: Sequence[str] = ('^version_.*', '^model_.*', 'device', 'opt_patterns', 'suite', 'memory_peak', 'machine', 'exporter', 'dynamic', 'rtopt', 'dtype', 'device', 'architecture'), values: Sequence[str] = ('^time_.*', '^disc.*', '^ERR_.*', 'CMD', '^ITER', '^onnx_.*', '^op_onnx_.*', '^peak_gpu_.*'), ignored: Sequence[str] = ('version_python',), recent: bool = True, formulas: Sequence[str] | Dict[str, str | Callable[[DataFrame], Series]] | None = ('speedup', 'bucket[speedup]', 'ERR1', 'n_models', 'n_model_eager', 'n_model_running', 'n_model_acc01', 'n_model_acc001', 'n_model_dynamic', 'n_model_pass', 'n_model_faster', 'n_model_faster2x', 'n_model_faster3x', 'n_model_faster4x', 'n_node_attention', 'n_node_control_flow', 'n_node_scatter', 'n_node_function', 'n_node_initializer', 'n_node_constant', 'n_node_shape', 'n_node_expand', 'peak_gpu_torch', 'peak_gpu_nvidia', 'time_export_unbiased'), fill_missing: Sequence[Tuple[str, Any]] | None = (('model_attn_impl', 'eager'),), keep_last_date: bool = False)

Processes logs coming from experiments.
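The default fill_missing=(('model_attn_impl', 'eager'),) is a sequence of (column, value) pairs. One plausible reading, sketched below, is that each pair provides a default for rows where the column is absent or empty. This is an assumption about the semantics, and apply_fill_missing is a hypothetical helper working on dictionaries rather than on a DataFrame, where missing values would typically be NaN.

```python
from typing import Any, Dict, List, Sequence, Tuple


def apply_fill_missing(
    rows: Sequence[Dict[str, Any]],
    fill_missing: Sequence[Tuple[str, Any]],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: fills absent or None entries with the given defaults."""
    filled = []
    for row in rows:
        new_row = dict(row)  # keep the input untouched
        for column, default in fill_missing:
            if new_row.get(column) is None:
                new_row[column] = default
        filled.append(new_row)
    return filled


rows = [
    {"model_name": "A"},
    {"model_name": "B", "model_attn_impl": "sdpa"},
]
print(apply_fill_missing(rows, (("model_attn_impl", "eager"),)))
```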

make_view_def(name: str) → CubeViewDef

Returns a view definition.

Parameters:

name – name of the view

Returns:

a CubeViewDef

Available views:

agg-suite: aggregation per suite
disc: discrepancies
speedup: speedup
bucket_speedup: speedup in buckets
time: latency
time_export: time to export
counts: status, running, faster, has control flow, …
err: important errors
cmd: command lines
raw-short: raw data without all the unused columns

post_load_process_piece(df: DataFrame, unique: bool = False) → DataFrame: Postprocesses a piece when a cube is made of multiple pieces before it gets merged.

view(view_def: str | CubeViewDef, return_view_def: bool = False, verbose: int = 0) → DataFrame | Tuple[DataFrame, CubeViewDef]

Returns a dataframe, a pivot view.

If view_def is a string, it is replaced by a predefined view.

Parameters:

view_def – view definition or a string
return_view_def – returns the view definition as well
verbose – verbosity level

Returns:

dataframe
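A minimal usage sketch based only on the signatures documented on this page: load() returns the cube itself and view() accepts the name of a predefined view such as "speedup". The function speedup_view and its data argument are illustrative, and the exact inputs accepted by the constructor are an assumption (the page mentions a dataframe, a file or a list of files). Running it requires onnx_diagnostic and pandas to be installed.

```python
from typing import Any


def speedup_view(data: Any):
    """Builds the predefined "speedup" view from benchmark logs (illustrative)."""
    # Imported lazily so the sketch can be defined without the package installed.
    from onnx_diagnostic.helpers.log_helper import CubeLogsPerformance

    cube = CubeLogsPerformance(data).load()  # load() returns the cube itself
    # A string is replaced by the predefined view of the same name;
    # return_view_def=True also returns the CubeViewDef that was used.
    df, view_def = cube.view("speedup", return_view_def=True)
    return df, view_def
```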

class onnx_diagnostic.helpers.log_helper.CubePlot(df: DataFrame, kind: str = 'bar', orientation='col', split: bool = True)

Creates a plot.

to_charts(writer: ExcelWriter, sheet, empty_row: int = 1)

Draws plots on a page. The data is copied on this page.

Parameters:

name – sheet name
writer – writer (from pandas)
sheet_name – sheet
graph_index – graph index

Returns:

list of graphs

to_images(verbose: int = 0, merge: bool = True, title_suffix: str | None = None): Converts data into plots and images.

class onnx_diagnostic.helpers.log_helper.CubeViewDef(key_index: Sequence[str], values: Sequence[str], ignore_unique: bool = True, order: Sequence[str] | None = None, key_agg: Sequence[str] | None = None, agg_args: Sequence[Any] | Callable[[str], Any] = ('sum',), agg_kwargs: Dict[str, Any] | None = None, agg_multi: Dict[str, Callable[[DataFrameGroupBy], Series]] | None = None, ignore_columns: Sequence[str] | None = None, keep_columns_in_index: Sequence[str] | None = None, dropna: bool = True, transpose: bool = False, f_highlight: Callable[[Any], HighLightKind] | None = None, name: str | None = None, no_index: bool = False, plots: bool = False)

Defines how to compute a view.

Parameters:

key_index – keys to put in the row index
values – values to show
ignore_unique – ignore keys with a unique value
order – reorders the keys in the column index
key_agg – aggregate according to these columns before creating the view
agg_args – see pandas.core.groupby.DataFrameGroupBy.agg(), it can also be a callable returning a different aggregation method depending on the column name
agg_kwargs – see pandas.core.groupby.DataFrameGroupBy.agg()
agg_multi – aggregation over multiple columns
ignore_columns – ignore the following columns if known to overload the view
keep_columns_in_index – keeps the columns even if there is only one unique value
dropna – drops rows with nan if not relevant
transpose – transpose
f_highlight – function used to highlight some values
name – name of the view, used mostly to debug
plots – adds plots to the Excel sheet
no_index – remove the index (but keeps the columns)
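The callable form of agg_args chooses an aggregation from the column name. The sketch below illustrates the idea on plain dictionaries. Both functions are hypothetical, and the rule (average the timings, sum everything else) is only an example. With pandas, such a callable would more likely return a name like 'mean' or 'sum' passed on to DataFrameGroupBy.agg().

```python
import statistics
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence


def agg_for_column(name: str) -> Callable[[List[float]], float]:
    """Hypothetical rule: average the timings, sum every other column."""
    if name.startswith("time_"):
        return statistics.mean
    return sum


def aggregate(
    rows: Sequence[Dict[str, Any]],
    group_by: Sequence[str],
    values: Sequence[str],
    agg: Callable[[str], Callable[[List[float]], float]],
) -> Dict[tuple, Dict[str, float]]:
    """Groups rows by the given keys and aggregates each value column."""
    groups: Dict[tuple, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_by)].append(row)
    return {
        label: {v: agg(v)([r[v] for r in members]) for v in values}
        for label, members in groups.items()
    }


rows = [
    {"suite": "s1", "time_latency": 1.0, "n_models": 1},
    {"suite": "s1", "time_latency": 3.0, "n_models": 1},
    {"suite": "s2", "time_latency": 2.0, "n_models": 1},
]
print(aggregate(rows, ["suite"], ["time_latency", "n_models"], agg_for_column))
```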

class HighLightKind(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

onnx_diagnostic.helpers.log_helper.apply_excel_style(filename_or_writer: Any, f_highlights: Dict[str, Callable[[Any], HighLightKind]] | None = None)

Applies styles on all sheets in a file unless the sheet is too big.

Parameters:

filename_or_writer – filename, modified in place
f_highlights – color functions to apply, one per sheet

Enumerates files considered for the aggregation. Only csv files are considered. If a zip file is given, the function digs into the zip files and loops over csv candidates.

Parameters:

data – dataframe with the raw data or a file or list of files
verbose – verbosity
filtering – function to filter in or out files in zip files, must return true to keep the file, false to skip it.

Returns:

a generator yielding tuples with the filename, date, full path and zip file

data can contain:

a dataframe
a string for a filename, zip or csv
a list of strings
a tuple
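The enumeration described above can be sketched with the standard library. iter_csv_candidates is a hypothetical stand-in, not the library function. It assumes the date is the modification time of the file (or of the zip member) and yields None as the last element when the file is not inside an archive.

```python
import datetime
import os
import zipfile
from typing import Callable, Iterator, Optional, Sequence, Tuple, Union


def iter_csv_candidates(
    paths: Union[str, Sequence[str]],
    filtering: Optional[Callable[[str], bool]] = None,
) -> Iterator[Tuple[str, str, str, Optional[str]]]:
    """Hypothetical helper yielding (name, date, full path, zip file or None)."""
    if isinstance(paths, str):
        paths = [paths]
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext == ".csv":
            stamp = datetime.datetime.fromtimestamp(os.path.getmtime(path))
            yield os.path.basename(path), stamp.strftime("%Y-%m-%d"), path, None
        elif ext == ".zip":
            with zipfile.ZipFile(path) as zf:
                for info in zf.infolist():
                    if info.is_dir() or not info.filename.lower().endswith(".csv"):
                        continue  # only csv members are candidates
                    if filtering is not None and not filtering(info.filename):
                        continue  # the filter returned False: skip this member
                    year, month, day = info.date_time[:3]
                    date = f"{year:04d}-{month:02d}-{day:02d}"
                    yield os.path.basename(info.filename), date, info.filename, path
```

A call such as iter_csv_candidates("logs.zip", filtering=lambda name: "skip" not in name) would then only yield the csv members whose name does not contain "skip".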

onnx_diagnostic.helpers.log_helper.open_dataframe(data: str | Tuple[str, str, str, str] | DataFrame) → DataFrame

Opens a filename.

Parameters:

data – a dataframe, a filename, a tuple indicating the file is coming from a zip file

Returns:

a dataframe
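Reading a csv stored in a zip file can be sketched as follows. read_csv_rows is a hypothetical stand-in that returns a list of dictionaries instead of a DataFrame. The tuple layout (name, date, path inside the archive, path of the zip file) is an assumption derived from the four-string tuple in the signature.

```python
import csv
import io
import zipfile
from typing import Dict, List, Tuple, Union


def read_csv_rows(data: Union[str, Tuple[str, str, str, str]]) -> List[Dict[str, str]]:
    """Hypothetical stand-in returning rows as dictionaries instead of a DataFrame."""
    if isinstance(data, str):
        # A plain filename: read the csv file directly.
        with open(data, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    # Assumed layout: (name, date, path inside the archive, path of the zip file).
    _name, _date, member, zip_path = data
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # Zip members are opened in binary mode, so decode them for the csv reader.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```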