-m yobx agg … aggregate statistics from benchmark runs

The command aggregates statistics produced by benchmarks. It reads one or more CSV files (or ZIP archives containing CSV files), combines them into a single dataset, and writes an Excel workbook with multiple tabs – one per view.
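
The reading step can be pictured with a minimal sketch that uses only the Python standard library. It is not the implementation found in yobx: the function name `read_benchmark_rows` is hypothetical, and the sketch assumes every CSV file is UTF-8 encoded and starts with a header row.

```python
import csv
import io
import zipfile
from pathlib import Path


def read_benchmark_rows(paths):
    """Collects rows (as dicts) from CSV files and from CSV members of ZIP archives."""
    rows = []
    for path in map(Path, paths):
        if path.suffix.lower() == ".zip":
            with zipfile.ZipFile(path) as archive:
                for name in archive.namelist():
                    if not name.lower().endswith(".csv"):
                        continue  # skip members that are not CSV files
                    with archive.open(name) as raw:
                        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                        rows.extend(csv.DictReader(text))
        else:
            with open(path, newline="", encoding="utf-8") as handle:
                rows.extend(csv.DictReader(handle))
    return rows
```

Every row ends up in one list regardless of which file it came from, which is the "single dataset" idea described above. Writing the Excel workbook is left out because it requires a third-party library.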

Description

See yobx.helpers.cube_helper.CubeLogsPerformance.

    usage: agg [-h] [--filter FILTER] [--recent | --no-recent] [--keep-last-date | --no-keep-last-date] [--raw | --no-raw] [-t TIME] [-k KEYS]
               [--drop-keys DROP_KEYS] [-w VALUES] [-i IGNORED] [-f FORMULA] [--views VIEWS] [--csv CSV] [-v VERBOSE] [--filter-in FILTER_IN]
               [--filter-out FILTER_OUT] [--sbs SBS]
               output inputs [inputs ...]
    
    Aggregates statistics coming from benchmarks.
    Every run is a row. Every row is indexed by some keys,
    and produces values. Every row has a date.
    The data can come from any csv files produced by benchmarks,
    it can concatenate many csv files, or csv files inside zip files.
    It produces an excel file with many tabs, one per view.
    
    positional arguments:
      output                output excel file
      inputs                input csv or zip files, at least 1, it can be a name, or search path
    
    options:
      -h, --help            show this help message and exit
      --filter FILTER       filter for input files inside zip files
      --recent, --no-recent
                            Keeps only the most recent experiment for the same set of keys.
      --keep-last-date, --no-keep-last-date
                            Rewrite all dates to the last one to simplify the analysis, this assumes changing the date does not add ambiguity, if any, option --recent should be added.
      --raw, --no-raw       Keeps the raw data in a sheet.
      -t TIME, --time TIME  Date or time column
      -k KEYS, --keys KEYS  List of columns to consider as keys, multiple values are separated by `,`
                            regular expressions are allowed
      --drop-keys DROP_KEYS
                            Drops keys from the given list. Sometimes it is faster to remove one than to select all the remaining ones.
      -w VALUES, --values VALUES
                            List of columns to consider as values, multiple values are separated by `,`
                            regular expressions are allowed
      -i IGNORED, --ignored IGNORED
                            List of columns to ignore
      -f FORMULA, --formula FORMULA
                            Columns to compute after the aggregation was done.
      --views VIEWS         
                            Views to add to the output files. Each view becomes a tab.
                            A view is defined by its name, among
                            agg-suite, agg-all, disc, speedup, time, time_export, err,
                            cmd, bucket-speedup, raw-short, counts, peak-gpu, onnx.
                            Their definition is part of class CubeLogsPerformance.
      --csv CSV             Views to dump as csv files.
      -v VERBOSE, --verbose VERBOSE
                            verbosity
      --filter-in FILTER_IN
                            adds a filter to filter in data, syntax is
                            ``"<column1>:<value1>;<value2>//<column2>:<value3>"`` ...
      --filter-out FILTER_OUT
                            adds a filter to filter out data, syntax is
                            ``"<column1>:<value1>;<value2>//<column2>:<value3>"`` ...
      --sbs SBS             
                            Defines an exporter to compare to another, there must be at least
                            two arguments defined with --sbs. Example:
                                --sbs dynamo:exporter=onnx-dynamo,opt=ir,attn_impl=eager
                                --sbs custom:exporter=custom,opt=default,attn_impl=eager
    
    examples:
    
        python -m yobx agg test_agg.xlsx raw/*.zip -v 1
        python -m yobx agg agg.xlsx raw/*.zip raw/*.csv -v 1 \
            --no-raw  --keep-last-date --filter-out "exporter:test-exporter"
    
    Another to create timeseries:
    
        python -m yobx agg history.xlsx raw/*.csv -v 1 --no-raw \
            --no-recent

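The `--sbs` values shown in the help follow the pattern `<name>:<key>=<value>,<key>=<value>`. As an illustration only, the sketch below splits such a string into a name and a dictionary. The function `parse_sbs` is hypothetical, and how yobx itself interprets the argument is not covered by this page.

```python
def parse_sbs(argument):
    """Splits '<name>:<key1>=<value1>,<key2>=<value2>' into (name, {key: value})."""
    name, _, spec = argument.partition(":")
    if not name or not spec:
        raise ValueError(f"expected '<name>:<key>=<value>,...', got {argument!r}")
    configuration = {}
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        configuration[key] = value
    return name, configuration


# The two sides of the comparison taken from the help text.
left = parse_sbs("dynamo:exporter=onnx-dynamo,opt=ir,attn_impl=eager")
right = parse_sbs("custom:exporter=custom,opt=default,attn_impl=eager")
```

Each side is then a label plus a set of key values that select the runs to compare.
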
Examples

Basic aggregation from a set of ZIP archives:

    python -m yobx agg test_agg.xlsx raw/*.zip -v 1

Drop the raw-data sheet, rewrite all dates to the most recent one, and filter out a specific exporter:

    python -m yobx agg agg.xlsx raw/*.zip raw/*.csv -v 1 \
        --no-raw --keep-last-date --filter-out "exporter:test-exporter"
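
The filter syntax `"<column1>:<value1>;<value2>//<column2>:<value3>"` can be read as a list of columns, each with the values to match. The sketch below parses that syntax and applies it as a filter-out on rows stored as dictionaries. Both function names are hypothetical, and the rule used when several columns are listed (a row is dropped as soon as one listed column matches) is an assumption, not something the help text states.

```python
def parse_filter(expression):
    """Turns '<col1>:<v1>;<v2>//<col2>:<v3>' into {column: set of values}."""
    conditions = {}
    for part in expression.split("//"):
        column, sep, values = part.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {part!r}")
        conditions[column] = set(values.split(";"))
    return conditions


def filter_out(rows, expression):
    """Drops the rows matching the filter.

    Assumption: a row is dropped as soon as one listed column
    holds one of the values listed for that column.
    """
    conditions = parse_filter(expression)
    return [
        row
        for row in rows
        if not any(row.get(column) in values for column, values in conditions.items())
    ]


rows = [
    {"exporter": "test-exporter", "time": "1"},
    {"exporter": "custom", "time": "2"},
]
kept = filter_out(rows, "exporter:test-exporter")
```

With the single-column filter from the command above, only the row whose `exporter` is `custom` remains in `kept`.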

Build a time series by keeping every run instead of only the most recent one per key set:

    python -m yobx agg history.xlsx raw/*.csv -v 1 --no-raw \
        --no-recent
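
To see what `--no-recent` preserves, it helps to picture what `--recent` removes: for each combination of key values, only the row with the latest date survives. The sketch below is a rough illustration of that idea, not the code used by yobx. The function name, the `date` column name, and the key names are hypothetical, and dates are assumed to be ISO-formatted strings so that string comparison matches chronological order.

```python
def keep_most_recent(rows, keys, time_column="date"):
    """Keeps, for every combination of key values, the row with the latest date."""
    latest = {}
    for row in rows:
        index = tuple(row[key] for key in keys)
        if index not in latest or row[time_column] > latest[index][time_column]:
            latest[index] = row
    return list(latest.values())


rows = [
    {"model": "m1", "exporter": "custom", "date": "2024-01-01", "time": "1.0"},
    {"model": "m1", "exporter": "custom", "date": "2024-02-01", "time": "0.9"},
    {"model": "m2", "exporter": "custom", "date": "2024-01-15", "time": "2.0"},
]
recent_only = keep_most_recent(rows, keys=["model", "exporter"])
```

Here `recent_only` holds two rows, one per `(model, exporter)` pair, while a time series needs all three rows so that the evolution of `m1` between the two dates stays visible.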