experimental_experiment.torch_bench._bash_bench_benchmark_runner_agg

experimental_experiment.torch_bench._bash_bench_benchmark_runner_agg.enumerate_csv_files(data: DataFrame | List[str | Tuple[str, str]] | str | Tuple[str, str, str, str], verbose: int = 0) → Iterator[DataFrame | str | Tuple[str, str, str, str]]

Enumerates the files considered for the aggregation. Only csv files are considered. If a zip file is given, the function looks inside the archive and loops over the csv candidates it contains.

Parameters:

data – dataframe with the raw data or a file or list of files

data can contain:

  • a dataframe

  • a string for a filename, zip or csv

  • a list of strings

  • a tuple
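The enumeration described above can be pictured with a minimal sketch. It is not the library's implementation: `iter_csv_candidates` is a hypothetical helper that uses only the standard library and assumes files are recognized by their `.csv` or `.zip` extension. It yields one `(container, member)` pair per csv candidate, with an empty member name for a plain csv file.

```python
import zipfile
from typing import Iterator, List, Tuple


def iter_csv_candidates(paths: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (container, member) pairs for every csv candidate.

    Hypothetical helper, not the library's code: a plain csv file yields
    (path, ""), a zip archive yields one pair per csv member it contains.
    """
    for path in paths:
        if path.endswith(".csv"):
            yield path, ""
        elif path.endswith(".zip"):
            with zipfile.ZipFile(path) as zf:
                for name in zf.namelist():
                    if name.endswith(".csv"):
                        yield path, name
        # Any other file is skipped: only csv files are considered.
```

Returning the archive name together with the member name keeps track of where each csv came from, which is presumably the role of the tuple form accepted by `data`.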

experimental_experiment.torch_bench._bash_bench_benchmark_runner_agg.merge_benchmark_reports(data: DataFrame | List[str] | str, model=('suite', 'model_name'), keys=('architecture', 'exporter', 'opt_patterns', 'rtopt', 'device', 'device_name', 'dtype', 'dynamic', 'flag_fake_tensor', 'flag_no_grad', 'flag_training', 'machine', 'processor', 'processor_name', 'version_python', 'version_onnx', 'version_onnxruntime', 'version_onnxscript', 'version_tag', 'version_torch', 'version_transformers', 'version_monai', 'version_timm', 'strategy'), column_keys=('stat', 'exporter', 'opt_patterns', 'dynamic', 'rtopt'), report_on=('speedup', 'speedup_increase', 'speedup_med', 'discrepancies_*', 'TIME_ITER', 'time_*', 'ERR_*', 'onnx_*', 'op_*', 'memory_*', 'mem_*', 'config_*', 'torch_*'), formulas=('export', 'memory_peak', 'buckets', 'status', 'memory_delta', 'control_flow', 'pass_rate', 'accuracy_rate', 'date', 'correction', 'error'), timestamp_column: str = 'timestamp', excel_output: str | None = None, exc: bool = True, filter_in: str | None = None, filter_out: str | None = None, verbose: int = 0, output_clean_raw_data: str | None = None, baseline: DataFrame | None = None, export_simple: str | None = None, export_correlations: str | None = None, broken: bool = False, disc: float | None = None, slow: float | None = None, fast: float | None = None, slow_script: float | None = None, fast_script: float | None = None, exclude: List[int] | None = None, keep_more_recent: bool = False) → Dict[str, DataFrame]

Merges multiple files produced by bash_benchmark…

_index,DATE,ERR_export,ITER,TIME_ITER,capability,cpu,date_start,device,device_name,...
101Dummy-custom,2024-07-08,,0,7.119158490095288,7.0,40,2024-07-08,cuda,...
101Dummy-script,2024-07-08,,1,6.705480073112994,7.0,40,2024-07-08,cuda,...
101Dummy16-custom,2024-07-08,,2,6.970448340754956,7.0,40,2024-07-08,cuda,...

Parameters:
  • data – dataframe with the raw data or a file or list of files

  • model – columns defining a unique model

  • keys – columns defining a unique experiment

  • report_on – report on those metrics, <prefix>* means all columns starting with this prefix

  • formulas – add computed metrics

  • timestamp_column – name of the column holding the day of the run, used to tell the user when the benchmark was run

  • excel_output – output the computed dataframe into an Excel document

  • exc – raise an exception on error (the default)

  • filter_in – filter in some data to make the report smaller (see below)

  • filter_out – filter out some data to make the report smaller (see below)

  • verbose – verbosity

  • output_clean_raw_data – output the concatenated raw data so that it can be used later to make a comparison

  • baseline – baseline dataframe used to compute differences

  • export_simple – if not None, write the simple export to this file

  • export_correlations – if not None, export correlations between exporters

  • broken – produce a document for the broken models per exporter

  • slow – produce a document for the slow models per exporter

  • fast – produce a document for the fast models per exporter

  • slow_script – produce a document for the slow models per exporter compared to torch_script

  • fast_script – produce a document for the fast models per exporter compared to torch_script

  • exclude – files to exclude from the list, given as a list of integers

  • keep_more_recent – in case of duplicates, keep the most recent value

Returns:

dictionary of dataframes
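To illustrate what keep_more_recent is described as doing, here is a minimal sketch on plain dictionaries rather than dataframes. It is not the library's code: `keep_most_recent` is a hypothetical helper, and it assumes that duplicates are rows sharing the same values for the key columns and that timestamps are ISO-formatted strings, so that string comparison orders them chronologically.

```python
from typing import Dict, Iterable, List, Sequence


def keep_most_recent(
    rows: Iterable[Dict[str, str]],
    key_columns: Sequence[str],
    timestamp_column: str = "timestamp",
) -> List[Dict[str, str]]:
    """Keep, for each combination of key columns, the row with the latest timestamp.

    Hypothetical helper: assumes ISO-formatted timestamps (e.g. 2024-07-08).
    """
    latest: Dict[tuple, Dict[str, str]] = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in latest or row[timestamp_column] > latest[key][timestamp_column]:
            latest[key] = row
    return list(latest.values())


rows = [
    {"model_name": "101Dummy", "exporter": "custom", "timestamp": "2024-07-08", "TIME_ITER": "7.1"},
    {"model_name": "101Dummy", "exporter": "custom", "timestamp": "2024-07-09", "TIME_ITER": "6.9"},
    {"model_name": "101Dummy", "exporter": "script", "timestamp": "2024-07-08", "TIME_ITER": "6.7"},
]
deduplicated = keep_most_recent(rows, ["model_name", "exporter"])
```

In this example the two `custom` rows collapse into the one dated 2024-07-09, and the `script` row is kept unchanged.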

Every key with a unique value is removed. Every column with a unique value is displayed on main. List of known columns:

DATE
ERR_export
ERR_warmup
ITER
TIME_ITER
capability
cpu
date_start
device
device_name
discrepancies_abs
discrepancies_rel
dtype
dump_folder
dynamic
executable
exporter
...
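The `<prefix>*` convention used by report_on can be tried against the column names listed above. The sketch below is only an illustration of that convention, not the library's code: `select_columns` is a hypothetical helper, and it assumes a pattern without a trailing `*` must match a column name exactly.

```python
from typing import Iterable, List


def select_columns(columns: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the columns selected by the patterns, in their original order.

    Hypothetical helper: 'prefix*' selects every column starting with
    'prefix', any other pattern selects the column with that exact name.
    """
    patterns = list(patterns)

    def matches(column: str) -> bool:
        for pattern in patterns:
            if pattern.endswith("*"):
                if column.startswith(pattern[:-1]):
                    return True
            elif column == pattern:
                return True
        return False

    return [c for c in columns if matches(c)]


columns = [
    "DATE", "ERR_export", "ERR_warmup", "ITER", "TIME_ITER",
    "discrepancies_abs", "discrepancies_rel", "dtype",
]
selected = select_columns(columns, ["TIME_ITER", "ERR_*", "discrepancies_*"])
print(selected)
# ['ERR_export', 'ERR_warmup', 'TIME_ITER', 'discrepancies_abs', 'discrepancies_rel']
```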

Arguments filter_in and filter_out follow the syntax <column1>:<fmt1>/<column2>:<fmt2>.

The format is the following:

  • a value or a set of values separated by ;
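As a rough illustration of this syntax, the sketch below splits a filter string into columns and accepted values. It covers only the form described here (values separated by `;`) and is not the library's parser: `parse_filter` and `keep_row` are hypothetical helpers, and the assumption that a row is kept when every listed column holds one of the listed values is a guess at how filter_in behaves.

```python
from typing import Dict, List


def parse_filter(spec: str) -> Dict[str, List[str]]:
    """Parse '<column1>:<fmt1>/<column2>:<fmt2>' into {column: [values]}.

    Hypothetical helper: each <fmt> is read as values separated by ';'.
    """
    result: Dict[str, List[str]] = {}
    for part in spec.split("/"):
        if not part:
            continue
        column, sep, fmt = part.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {part!r}")
        result[column] = fmt.split(";")
    return result


def keep_row(row: Dict[str, str], parsed: Dict[str, List[str]]) -> bool:
    """Assumed filter_in behavior: every listed column must hold a listed value."""
    return all(str(row.get(col)) in values for col, values in parsed.items())


parsed = parse_filter("exporter:custom;script/device:cuda")
print(parsed)
# {'exporter': ['custom', 'script'], 'device': ['cuda']}
```

Under the same assumption, filter_out would simply drop the rows for which `keep_row` returns True.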

experimental_experiment.torch_bench._bash_bench_benchmark_runner_agg.open_dataframe(data: str | Tuple[str, str, str, str] | DataFrame) → DataFrame

Opens a file and returns it as a dataframe.

Parameters:

data – a dataframe, a filename, or a tuple indicating that the file comes from a zip file

Returns:

a dataframe