-m yobx agg … aggregate statistics from benchmark runs
The command aggregates statistics produced by benchmarks. It reads one or more CSV files (or ZIP archives containing CSV files), combines them into a single dataset, and writes an Excel workbook with multiple tabs – one per view.
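The input model (rows from many CSV files concatenated together, including CSV files stored inside ZIP archives) can be illustrated with the standard library alone. This is a minimal sketch, not how yobx implements it. The function name `read_rows` and its `member_filter` parameter are hypothetical. `member_filter` only loosely mirrors `--filter`, whose exact matching rules (regex, glob, or substring) are an assumption here.

```python
import csv
import io
import re
import zipfile
from pathlib import Path


def read_rows(paths, member_filter=None):
    """Read every CSV file, or CSV file inside a ZIP archive, into one list of dicts.

    Hypothetical helper: member_filter is a regular expression applied to the
    names of files inside ZIP archives (an assumption about what --filter does).
    All values are read as strings; UTF-8 encoding is assumed.
    """
    rows = []
    for path in map(Path, paths):
        if path.suffix.lower() == ".zip":
            with zipfile.ZipFile(path) as archive:
                for name in archive.namelist():
                    # Only CSV members are read; other files in the archive are skipped.
                    if not name.lower().endswith(".csv"):
                        continue
                    if member_filter and not re.search(member_filter, name):
                        continue
                    with archive.open(name) as raw:
                        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                        rows.extend(csv.DictReader(text))
        else:
            with open(path, encoding="utf-8", newline="") as f:
                rows.extend(csv.DictReader(f))
    return rows
```

Each run becomes one dict keyed by the CSV header, so files with different columns can still be concatenated; missing columns simply do not appear in those rows.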
Description
See yobx.helpers.cube_helper.CubeLogsPerformance.
usage: agg [-h] [--filter FILTER] [--recent | --no-recent] [--keep-last-date | --no-keep-last-date] [--raw | --no-raw] [-t TIME]
[-k KEYS] [--drop-keys DROP_KEYS] [-w VALUES] [-i IGNORED] [-f FORMULA] [--views VIEWS] [--csv CSV] [-v VERBOSE]
[--filter-in FILTER_IN] [--filter-out FILTER_OUT] [--sbs SBS]
output inputs [inputs ...]
Aggregates statistics coming from benchmarks.
Every run is a row. Every row is indexed by some keys,
and produces values. Every row has a date.
The data can come any csv files produces by benchmarks,
it can concatenates many csv files, or csv files inside zip files.
It produces an excel file with many tabs, one per view.
positional arguments:
output output excel file
inputs input csv or zip files, at least 1, it can be a name, or search path
options:
-h, --help show this help message and exit
--filter FILTER filter for input files inside zip files
--recent, --no-recent
Keeps only the most recent experiment for the same set of keys.
--keep-last-date, --no-keep-last-date
Rewrite all dates to the last one to simplify the analysis, this assume changing the date does not add ambiguity, if any, option --recent should be added.
--raw, --no-raw Keeps the raw data in a sheet.
-t TIME, --time TIME Date or time column
-k KEYS, --keys KEYS List of columns to consider as keys, multiple values are separated by `,`
regular expressions are allowed
--drop-keys DROP_KEYS
Drops keys from the given list. Something it is faster to remove one than to select all the remaining ones.
-w VALUES, --values VALUES
List of columns to consider as values, multiple values are separated by `,`
regular expressions are allowed
-i IGNORED, --ignored IGNORED
List of columns to ignore
-f FORMULA, --formula FORMULA
Columns to compute after the aggregation was done.
--views VIEWS
Views to add to the output files. Each view becomes a tab.
A view is defined by its name, among
agg-suite, agg-all, disc, speedup, time, time_export, err,
cmd, bucket-speedup, raw-short, counts, peak-gpu, onnx.
Their definition is part of class CubeLogsPerformance.
--csv CSV Views to dump as csv files.
-v VERBOSE, --verbose VERBOSE
verbosity
--filter-in FILTER_IN
adds a filter to filter in data, syntax is
``"<column1>:<value1>;<value2>//<column2>:<value3>"`` ...
--filter-out FILTER_OUT
adds a filter to filter out data, syntax is
``"<column1>:<value1>;<value2>//<column2>:<value3>"`` ...
--sbs SBS
Defines an exporter to compare to another, there must be at least
two arguments defined with --sbs. Example:
--sbs dynamo:exporter=onnx-dynamo,opt=ir,attn_impl=eager
--sbs custom:exporter=custom,opt=default,attn_impl=eager
examples:
python -m yobx agg test_agg.xlsx raw/*.zip -v 1
python -m yobx agg agg.xlsx raw/*.zip raw/*.csv -v 1 \
--no-raw --keep-last-date --filter-out "exporter:test-exporter"
Another to create timeseries:
python -m yobx agg history.xlsx raw/*.csv -v 1 --no-raw \
--no-recent
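`-k/--keys` and `-w/--values` take a comma-separated list in which each item may be a regular expression. The following minimal sketch shows how such a specification could be resolved against a CSV header. `select_columns` is a hypothetical helper, the column names are invented, and two points are assumptions: that each pattern must match the whole column name (`re.fullmatch`), and that `--drop-keys` items are also treated as regular expressions.

```python
import re


def select_columns(spec, columns, drop=""):
    """Return the columns matched by a comma-separated list of regexes.

    Assumption: each item must match the whole column name (re.fullmatch).
    `drop` loosely mimics --drop-keys: matching columns are removed afterwards.
    """
    patterns = [re.compile(p) for p in spec.split(",") if p]
    dropped = [re.compile(p) for p in drop.split(",") if p]
    selected = []
    for col in columns:
        keep = any(p.fullmatch(col) for p in patterns)
        if keep and not any(p.fullmatch(col) for p in dropped):
            selected.append(col)
    return selected


# Hypothetical header of a benchmark CSV file.
header = ["model_name", "exporter", "opt", "time_export", "time_latency", "date"]
print(select_columns("model_.*,exporter,opt", header))  # candidate keys
print(select_columns("time_.*", header))  # candidate values
```

Regular expressions make it possible to pick up new metric columns (for example, any `time_*` column) without updating the command line each time a benchmark adds one.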
Examples
Basic aggregation from a set of ZIP archives:
python -m yobx agg test_agg.xlsx raw/*.zip -v 1
Combine ZIP and CSV inputs, drop the raw-data sheet, rewrite every date to the latest one (--keep-last-date), and filter out a specific exporter:
python -m yobx agg agg.xlsx raw/*.zip raw/*.csv -v 1 \
--no-raw --keep-last-date --filter-out "exporter:test-exporter"
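The two row transformations requested above, `--filter-out` with the `"<column1>:<value1>;<value2>//<column2>:<value3>"` syntax and `--keep-last-date`, can be sketched on plain dicts. This is an illustration under stated assumptions, not yobx code. The helper names are hypothetical. Conditions on different columns are combined with OR for filtering out (an assumption). The date column is called `date` here (the real one is selected with `-t`) and is assumed to hold ISO-formatted strings.

```python
def parse_filter(spec):
    """Parse "<col1>:<v1>;<v2>//<col2>:<v3>" into {col1: {v1, v2}, col2: {v3}}."""
    result = {}
    for part in spec.split("//"):
        if not part:
            continue
        column, _, values = part.partition(":")
        result.setdefault(column, set()).update(v for v in values.split(";") if v)
    return result


def filter_out(rows, spec):
    """Drop rows whose value in any listed column is one of the listed values.

    Assumption: conditions on different columns are combined with OR.
    """
    conditions = parse_filter(spec)
    return [
        row
        for row in rows
        if not any(row.get(col) in values for col, values in conditions.items())
    ]


def keep_last_date(rows, date_column="date"):
    """Mimic --keep-last-date: overwrite every date with the latest one.

    Assumes ISO-formatted date strings, so string comparison orders them.
    """
    if not rows:
        return []
    last = max(row[date_column] for row in rows)
    return [{**row, date_column: last} for row in rows]


rows = [
    {"exporter": "onnx-dynamo", "date": "2024-05-01", "time": "1.2"},
    {"exporter": "test-exporter", "date": "2024-05-02", "time": "0.9"},
    {"exporter": "custom", "date": "2024-05-03", "time": "1.0"},
]
for row in keep_last_date(filter_out(rows, "exporter:test-exporter")):
    print(row)
```

As the help text warns, collapsing dates is only safe when it introduces no ambiguity. If several runs share the same keys, adding `--recent` keeps a single run per key set.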
Build a history for time-series analysis; --no-recent keeps every run instead of only the most recent one per set of keys:
python -m yobx agg history.xlsx raw/*.csv -v 1 --no-raw \
--no-recent
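The difference between `--recent` and `--no-recent` is what makes the last example usable as a history. With `--recent`, only the latest run for each combination of key values survives. The following minimal sketch illustrates that rule. `keep_most_recent` is a hypothetical helper, not part of yobx. The key and date column names are invented, and dates are assumed to be ISO-formatted strings.

```python
def keep_most_recent(rows, keys, date_column="date"):
    """Mimic --recent: keep the latest row per combination of key values.

    Assumes ISO-formatted date strings; on a tie, the row read last wins.
    """
    latest = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in latest or row[date_column] >= latest[key][date_column]:
            latest[key] = row
    return list(latest.values())


rows = [
    {"model": "m1", "exporter": "custom", "date": "2024-05-01", "time": "1.4"},
    {"model": "m1", "exporter": "custom", "date": "2024-05-02", "time": "1.1"},
    {"model": "m1", "exporter": "onnx-dynamo", "date": "2024-05-01", "time": "1.3"},
]
# --recent would keep one row per (model, exporter); --no-recent keeps all three.
for row in keep_most_recent(rows, keys=["model", "exporter"]):
    print(row["exporter"], row["date"], row["time"])
```

Keeping every run preserves one point per date and key set, which is what a time-series view needs.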