experimental_experiment.torch_bench._bash_bench_benchmark_runner_agg

Enumerates files considered for the aggregation. Only csv files are considered. If a zip file is given, the function digs into the zip files and loops over csv candidates.

Parameters: data – a dataframe with the raw data, a file, or a list of files

data can contain:

a dataframe
a string for a filename, zip or csv
a list of strings
a tuple
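The enumeration described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the helper name `iter_csv_candidates` and the `(container, member)` pair it yields are assumptions made for this example.

```python
import zipfile
from typing import Iterator, Tuple


def iter_csv_candidates(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (container, member) pairs for every csv candidate.

    Hypothetical helper: for a plain csv file the member is an empty string,
    for a zip file one pair is produced per csv entry in the archive.
    """
    if path.endswith(".zip"):
        # Look inside the archive and keep only the csv entries.
        with zipfile.ZipFile(path) as zf:
            for name in zf.namelist():
                if name.endswith(".csv"):
                    yield path, name
    elif path.endswith(".csv"):
        yield path, ""
    # Any other extension is ignored: only csv files are considered.
```

Returning the archive path together with the member name keeps enough information to open the file later without extracting the whole archive.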

experimental_experiment.torch_bench._bash_bench_benchmark_runner_agg.merge_benchmark_reports(data: DataFrame | List[str] | str, model=('suite', 'model_task', 'model_name'), keys=('architecture', 'exporter', 'opt_patterns', 'rtopt', 'device', 'device_name', 'dynamic', 'model_attn_impl', 'flag_fake_tensor', 'flag_no_grad', 'flag_training', 'machine', 'processor', 'processor_name', 'version_python', 'version_onnx', 'version_onnxruntime', 'version_onnxscript', 'version_tag', 'version_torch', 'version_transformers', 'version_monai', 'version_timm', 'strategy'), column_keys=('stat', 'exporter', 'opt_patterns', 'dtype', 'dynamic', 'rtopt', 'model_attn_impl'), report_on=('speedup', 'speedup_increase', 'speedup_med', 'discrepancies_*', 'TIME_ITER', 'time_*', 'ERR_*', 'onnx_*', 'op_*', 'memory_*', 'mem_*', 'config_*', 'torch_*'), formulas=('export', 'memory_peak', 'buckets', 'status', 'memory_delta', 'control_flow', 'pass_rate', 'accuracy_rate', 'accuracy_dynamic_rate', 'date', 'correction', 'error'), timestamp_column: str = 'timestamp', excel_output: str | None = None, exc: bool = True, filter_in: str | None = None, filter_out: str | None = None, verbose: int = 0, output_clean_raw_data: str | None = None, baseline: DataFrame | None = None, export_simple: str | None = None, export_correlations: str | None = None, broken: bool = False, disc: float | None = None, slow: float | None = None, fast: float | None = None, slow_script: float | None = None, fast_script: float | None = None, exclude: List[int] | None = None, keep_more_recent: bool = False) → Dict[str, DataFrame]

Merges multiple files produced by bash_benchmark…

_index,DATE,ERR_export,ITER,TIME_ITER,capability,cpu,date_start,device,device_name,...
101Dummy-custom,2024-07-08,,0,7.119158490095288,7.0,40,2024-07-08,cuda,...
101Dummy-script,2024-07-08,,1,6.705480073112994,7.0,40,2024-07-08,cuda,...
101Dummy16-custom,2024-07-08,,2,6.970448340754956,7.0,40,2024-07-08,cuda,...

Parameters:

data – dataframe with the raw data or a file or list of files
model – columns defining a unique model
keys – columns defining a unique experiment
report_on – report on those metrics, <prefix>* means all columns starting with this prefix
formulas – add computed metrics
timestamp_column – a day, used to tell the user this was run on this day
excel_output – output the computed dataframe into an Excel document
exc – raise exception by default
filter_in – filter in some data to make the report smaller (see below)
filter_out – filter out some data to make the report smaller (see below)
verbose – verbosity
output_clean_raw_data – output the concatenated raw data so that it can be used later to make a comparison
baseline – to compute difference
export_simple – if not None, export simple in this file.
export_correlations – if not None, export correlations between exporters
broken – produce a document for the broken models per exporter
slow – produce a document for the slow models per exporter
fast – produce a document for the fast models per exporter
slow_script – produce a document for the slow models per exporter compared to torch_script
fast_script – produce a document for the fast models per exporter compared to torch_script
exclude – exclude a list of files in the list
keep_more_recent – in case of duplicates, keep the most recent value
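The `<prefix>*` convention used by report_on can be illustrated with a small selector. This is a sketch under assumptions: the helper names `matches` and `select_columns` are hypothetical, and the case-sensitive comparison is an assumption suggested by the default value listing both 'TIME_ITER' and 'time_*'.

```python
from typing import Iterable, List


def matches(column: str, pattern: str) -> bool:
    """Return True if the column is selected by one report_on entry.

    A pattern ending with '*' selects every column starting with the prefix;
    any other pattern must be equal to the column name.
    """
    if pattern.endswith("*"):
        return column.startswith(pattern[:-1])
    return column == pattern


def select_columns(columns: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep the columns selected by at least one pattern, in input order."""
    patterns = list(patterns)
    return [c for c in columns if any(matches(c, p) for p in patterns)]


columns = ["speedup", "time_export", "time_warmup", "TIME_ITER", "ERR_export", "dtype"]
selected = select_columns(columns, ("speedup", "time_*", "ERR_*"))
```

Here `selected` keeps speedup, time_export, time_warmup and ERR_export; TIME_ITER is left out because it does not start with the lowercase prefix time_.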

Returns:

dictionary of dataframes

Every key that takes a single value across the data is removed. Every column that takes a single value is displayed on main. List of known columns:

DATE
ERR_export
ERR_warmup
ITER
TIME_ITER
capability
cpu
date_start
device
device_name
discrepancies_abs
discrepancies_rel
dtype
dump_folder
dynamic
executable
exporter
...
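The rule about keys and columns holding a single value can be sketched as follows. This is only an illustration on plain dictionaries, not the function's code: the helper name `split_constant_columns` is hypothetical, and reading "unique value" as "the same value on every row" is an assumption.

```python
from typing import Dict, List, Tuple


def split_constant_columns(
    rows: List[Dict[str, str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Separate constant columns from the columns that vary across rows.

    Returns (constants, varying): constants maps each column holding a single
    value to that value, varying lists the remaining column names in order.
    """
    constants: Dict[str, str] = {}
    varying: List[str] = []
    if not rows:
        return constants, varying
    for name in rows[0]:
        # Collect the distinct values taken by this column.
        values = {row.get(name) for row in rows}
        if len(values) == 1:
            constants[name] = rows[0][name]
        else:
            varying.append(name)
    return constants, varying


rows = [
    {"device": "cuda", "exporter": "custom"},
    {"device": "cuda", "exporter": "script"},
]
constants, varying = split_constant_columns(rows)
```

With these two rows, device is constant and could be reported once, while exporter still distinguishes the experiments.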

The arguments filter_in and filter_out follow the syntax <column1>:<fmt1>/<column2>:<fmt2>.

The format is the following:

a value or a set of values separated by ;
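A minimal parser for this syntax might look like the sketch below. The names `parse_filter` and `apply_filter_in` are hypothetical, only the "value or set of values separated by ;" format is handled, and requiring every listed column to match (a logical AND) is an assumption, since the text does not say how several columns are combined.

```python
from typing import Dict, List, Set


def parse_filter(spec: str) -> Dict[str, Set[str]]:
    """Parse '<column1>:<fmt1>/<column2>:<fmt2>' into {column: allowed values}."""
    result: Dict[str, Set[str]] = {}
    for part in spec.split("/"):
        if not part:
            continue
        column, sep, fmt = part.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in filter part {part!r}")
        # A format is one value or several values separated by ';'.
        result[column] = set(fmt.split(";"))
    return result


def apply_filter_in(rows: List[Dict[str, str]], spec: str) -> List[Dict[str, str]]:
    """Keep the rows whose values are allowed for every filtered column."""
    allowed = parse_filter(spec)
    return [
        row
        for row in rows
        if all(str(row.get(col)) in values for col, values in allowed.items())
    ]


rows = [
    {"device": "cuda", "exporter": "custom"},
    {"device": "cuda", "exporter": "script"},
    {"device": "cpu", "exporter": "custom"},
]
kept = apply_filter_in(rows, "device:cuda/exporter:custom;script")
```

In this example `kept` holds the two cuda rows. A filter_out counterpart would drop the matching rows instead of keeping them.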

experimental_experiment.torch_bench._bash_bench_benchmark_runner_agg.open_dataframe(data: str | Tuple[str, str, str, str] | DataFrame) → DataFrame

Opens a filename.

Parameters: data – a dataframe, a filename, or a tuple indicating that the file comes from a zip file
Returns: a dataframe