onnxruntime.SessionOptions Guide#
onnxruntime.SessionOptions controls how
onnxruntime.InferenceSession loads and runs a model. This page
lists every property, method, and related enum together with a short
description, its default value, and — where applicable — the matching
yobx wrapper parameter.
yobx wrappers
The following classes accept individual SessionOptions fields as
keyword arguments so that callers rarely need to build an
onnxruntime.SessionOptions object by hand:
_InferenceSession, InferenceSessionForNumpy, and InferenceSessionForTorch.
Pass a fully-configured onnxruntime.SessionOptions object via the
session_options argument to bypass all individual keyword arguments.
Properties#
| Property | Default | Description |
|---|---|---|
| | | When |
| | | When |
| | | When |
| | | When |
| | | Controls whether operators are executed sequentially or in parallel. See ExecutionMode for the available values. Use |
| | | Determines the order in which nodes are scheduled for execution. See ExecutionOrder for the available values. |
| | | Sets the level of graph optimizations applied before execution. See GraphOptimizationLevel for the available levels. Corresponds to the |
| | | Number of threads used for parallelism between independent graph nodes when |
| | | Number of threads used for parallelism within a single operator (e.g., matrix multiplication). |
| | | Severity threshold for session-level log messages. Messages below this level are suppressed. Common values: |
| | | Verbosity sub-level for VERBOSE messages ( |
| | | String tag prepended to log messages emitted by this session. Useful when multiple sessions run concurrently. |
| | | Path where the optimized ONNX model is saved after graph optimization. Leave empty to skip saving. When set, a companion data file is also configured automatically by the yobx wrappers. Corresponds to the |
| | | Prefix of the JSON file written when |
| | | When |
| | | When |
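The named properties are plain attributes, so a small helper can set several of them at once and fail loudly on a misspelled name. The following is a minimal sketch, not part of onnxruntime or yobx: `apply_session_fields` is a hypothetical helper, and the stand-in object only mimics three of the property names used in the Example section so that the snippet runs without onnxruntime installed.

```python
from types import SimpleNamespace


def apply_session_fields(opts, **fields):
    """Set named properties on a SessionOptions-like object.

    Unknown names raise AttributeError instead of being silently added.
    """
    for name, value in fields.items():
        if not hasattr(opts, name):
            raise AttributeError(f"unknown session option: {name!r}")
        setattr(opts, name, value)
    return opts


# Stand-in object (hypothetical); the initial values are placeholders,
# not onnxruntime's defaults. With onnxruntime installed, pass
# onnxruntime.SessionOptions() instead.
stand_in = SimpleNamespace(
    enable_profiling=False,
    profile_file_prefix="",
    intra_op_num_threads=0,
)

apply_session_fields(
    stand_in,
    enable_profiling=True,
    profile_file_prefix="/tmp/my_model_profile_",
    intra_op_num_threads=2,
)
print(stand_in.intra_op_num_threads)  # 2
```

Because the helper relies only on `hasattr` and `setattr`, it should behave the same way on a real SessionOptions instance.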
Methods#
| Method | Description |
|---|---|
| | Sets a single session configuration entry as a key/value string pair. This is the primary way to pass advanced options that are not exposed as named properties. See Session Configuration Entries for a selection of commonly used keys. Used by the yobx wrappers to set |
| | Returns the string value previously set with |
| | Binds the symbolic input dimension named |
| | Like |
| | Shares a pre-allocated |
| | Supplies a list of external initializers (by name and |
| | Like |
| | Adds an explicit execution provider with a string options mapping. Prefer passing the |
| | Like |
| | Returns |
| | Registers a shared library ( |
| | When |
| | Sets an automatic EP selection policy ( |
| | Provides a Python callable that ONNX Runtime calls to choose an execution provider. The callable receives the candidate |
GraphOptimizationLevel#
onnxruntime.GraphOptimizationLevel is an enum that controls which
optimization passes run before model execution.
| Name | Value | Description |
|---|---|---|
| | | Disables all graph optimizations. The model is executed exactly as exported. Use this when diagnosing incorrect results caused by graph rewrites, or when benchmarking the unoptimized graph. |
| | | Enables constant folding, redundant node elimination, and other cheap algebraic simplifications that are always safe. |
| | | Adds more aggressive fusions on top of |
| | | Adds layout-transformation optimizations (e.g. NCHWc) on top of |
| | | Enables all optimizations, including the ones that depend on the selected execution provider. This is the default. |
In the yobx wrappers the graph_optimization_level parameter also accepts
a plain bool: True maps to ORT_ENABLE_ALL and False maps to
ORT_DISABLE_ALL.
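The bool shorthand described above can be sketched as a small translation function. This is an illustration of the stated rule only, not the actual yobx implementation; `resolve_graph_optimization_level` is a hypothetical name, and it returns the enum member name as a string so that the snippet runs without onnxruntime installed.

```python
def resolve_graph_optimization_level(value):
    """Translate the bool shorthand into a GraphOptimizationLevel member name.

    Non-bool values are returned unchanged so that an explicit level
    passed by the caller is left alone.
    """
    if isinstance(value, bool):
        return "ORT_ENABLE_ALL" if value else "ORT_DISABLE_ALL"
    return value


print(resolve_graph_optimization_level(True))   # ORT_ENABLE_ALL
print(resolve_graph_optimization_level(False))  # ORT_DISABLE_ALL
```

With onnxruntime installed, `getattr(onnxruntime.GraphOptimizationLevel, name)` turns the returned name into the enum member that `graph_optimization_level` expects.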
ExecutionMode#
onnxruntime.ExecutionMode controls whether independent graph nodes are
run sequentially or concurrently.
| Name | Value | Description |
|---|---|---|
| | | Nodes are executed one after another in topological order. This is the default and is usually fastest for single-batch inference because it avoids thread-synchronization overhead. |
| | | Independent nodes may run concurrently on different threads. Use together with |
ExecutionOrder#
onnxruntime.ExecutionOrder determines the scheduling order for nodes
in the execution plan.
| Name | Value | Description |
|---|---|---|
| | | Nodes are scheduled in the default topological order computed by ONNX Runtime. |
| | | Nodes are scheduled according to a priority that ONNX Runtime assigns to minimize peak memory usage while maximizing throughput. |
| | | Nodes are scheduled to minimize peak memory consumption, at the potential cost of some throughput. |
Session Configuration Entries#
add_session_config_entry(key, value) accepts string key/value pairs for
advanced session options that are not exposed as named properties. The full
set of recognised keys is defined in the ONNX Runtime source file
include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h.
The tables below enumerate all keys, grouped by prefix.
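Because every entry is a string pair, it can be convenient to keep the configuration in a dictionary and apply it in one pass. The sketch below assumes that on/off keys take "1" and "0", which holds for the two keys shown but should be checked against the header file for any other key. `apply_config_entries` and `RecordingOptions` are hypothetical names; the stand-in class only records entries so that the snippet runs without onnxruntime installed.

```python
def apply_config_entries(opts, entries):
    """Call opts.add_session_config_entry for every key/value pair.

    Values are converted to strings; booleans become "1" or "0"
    (an assumption that fits on/off keys only).
    """
    for key, value in entries.items():
        if isinstance(value, bool):
            value = "1" if value else "0"
        opts.add_session_config_entry(key, str(value))
    return opts


class RecordingOptions:
    """Stand-in exposing the two config-entry methods, for demonstration only."""

    def __init__(self):
        self._entries = {}

    def add_session_config_entry(self, key, value):
        self._entries[key] = value

    def get_session_config_entry(self, key):
        return self._entries[key]


opts = apply_config_entries(
    RecordingOptions(),
    {
        "session.disable_prepacking": True,
        "session.intra_op.allow_spinning": False,
    },
)
print(opts.get_session_config_entry("session.disable_prepacking"))       # 1
print(opts.get_session_config_entry("session.intra_op.allow_spinning"))  # 0
```

Replacing `RecordingOptions()` with `onnxruntime.SessionOptions()` should apply the same entries to a real session configuration.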
session.* keys#
| Key | Default | Description |
|---|---|---|
| | | Set to |
| | | Set to |
| | | Force the model format: |
| | | Controls the format used when saving the optimized model ( |
| | | Set to |
| | | Set to |
| | | Set to |
| | | Set to |
| | | Set to |
| | | Set to |
| | | Controls whether graph optimizations run in a feedback loop. |
| | | Set to |
| | | Set to |
| | | Same as |
| | (iteration-based) | Duration in microseconds that intra-op threads spin before blocking. Requires |
| | (iteration-based) | Same as |
| | | Maximum exponential-backoff cap for the intra-op spin loop. Values ≥ 2 reduce CPU load during spinning. Clamped to 64. |
| | | Same as |
| | | Set to |
| | | Set to |
| | | Set to |
| | | Set to |
| | (disabled) | Set to a positive integer (e.g. |
| | | Set to |
| | | Set to |
| | | Set to |
| | | Path to a file that specifies which nodes are assigned to which execution providers (logic-stream partitioning). |
| | | Semicolon-separated CPU affinity specification for intra-op threads. Example: |
| | | Set to |
| | | Set to |
| | | When |
| | | Minimum initializer size in bytes above which initializers are placed in the external data file during serialization. |
| | | Folder path for external data files when loading a model from a memory buffer. All external data files must reside in this folder. |
| | | Set to |
| | (not set) | Full path to a CSV file where per-node memory statistics (initializer size, dynamic output sizes, temp allocations) are written. Useful for estimating runtime memory requirements. |
| | | Composite |
| | | Semicolon-separated per-device annotation strings that guide node assignment during partitioning, matched against node metadata |
| | | Accuracy level used when converting |
| | | Block size for the |
| | | Set to |
| | | Set to |
| | | Set to |
| | | Set to |
optimization.* keys#
| Key | Default | Description |
|---|---|---|
| | | Set to |
| | | Set to |
| | | Comma-separated list of optimizer names to skip (e.g. |
| | | Controls how minimal-build optimizations are applied in a full build: |
| | | (Training only) Path to a JSON file describing memory optimization configurations (recompute subgraph patterns) for |
| | | (Training only) Integer pair controlling subgraph detection for memory-footprint reduction via recompute. |
ep.* keys#
| Key | Default | Description |
|---|---|---|
| | (default set) | Comma-separated list of op types at which the NNAPI EP stops graph partitioning. Set to |
| | | Set to |
| | (original name + _ctx.onnx) | Path for the EPContext ONNX file written when |
| | | |
| | | Prefix added to EPContext node names to make them unique when multiple EPContext graphs are merged into one model. |
| | | Set to |
| | | Set to |
| | | When generating an EPContext model and some nodes fall back to the CPU EP, this entry names the external data file into which all initializers are placed in the generated ONNX file. |
| | | Set to |
mlas.* keys#
| Key | Default | Description |
|---|---|---|
| | | Set to |
| | | Set to |
| | | Set to |
Dynamic EP options (ep.dynamic.*)#
These keys are intended for use with SetEpDynamicOptions and may be
changed at any time, not just at session creation.
| Key | Default | Description |
|---|---|---|
| | | Scheduling-priority hint for the session workload. |
| | | QNN HTP performance mode. Allowed values: |
Example#
The following snippet shows how to configure a session that disables all graph optimizations, enables profiling, and restricts execution to two intra-op threads:
```python
import onnxruntime

opts = onnxruntime.SessionOptions()
opts.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
opts.enable_profiling = True
opts.profile_file_prefix = "/tmp/my_model_profile_"
opts.intra_op_num_threads = 2

sess = onnxruntime.InferenceSession(
    "model.onnx",
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)
```
The same can be achieved with the yobx wrappers by passing individual keyword arguments:
```python
from yobx.reference import OnnxruntimeEvaluator

evaluator = OnnxruntimeEvaluator(
    "model.onnx",
    graph_optimization_level=False,  # False → ORT_DISABLE_ALL
    enable_profiling=True,
)
```
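Once profiling is enabled as in the snippets above, the resulting JSON file can be summarised with a few lines of standard-library code. The sketch below assumes the profile is a JSON list of Chrome-trace-style events carrying `cat`, `name`, and `dur` (microseconds) fields, which is what onnxruntime appears to write; `total_duration_by_name` is a hypothetical helper, not part of onnxruntime or yobx.

```python
import json
from collections import defaultdict


def total_duration_by_name(profile_path, category="Node"):
    """Sum the `dur` field per event name for one event category.

    Returns (name, total_duration) pairs, slowest first.
    """
    with open(profile_path, encoding="utf-8") as f:
        events = json.load(f)

    totals = defaultdict(int)
    for event in events:
        # Events from other categories (e.g. session-level ones) are skipped.
        if event.get("cat") == category:
            totals[event["name"]] += event.get("dur", 0)

    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

`InferenceSession.end_profiling()` returns the path of the file it wrote, so `total_duration_by_name(sess.end_profiling())` is one way to obtain the summary after running the model.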
See also
Evaluators — overview of the three evaluators provided
by yobx and when to use each one.
onnxruntime.RunOptions#
onnxruntime.RunOptions controls a single call to
InferenceSession.run(). It exposes a small set of named properties
and, like SessionOptions, an add_run_config_entry(key, value)
method for advanced per-run settings.
Named properties#
| Property | Default | Description |
|---|---|---|
| | | Minimum verbosity of messages logged during the run. Same severity scale as |
| | | Verbosity level for VERBOSE-severity messages. Only effective when |
| | | Tag prepended to log messages emitted during this run. |
| | | Set to |
| | | When |
| | | Set to |
Run Configuration Entries#
add_run_config_entry(key, value) sets per-run configuration that
cannot be expressed through named properties. The full set of
recognised keys is defined in
include/onnxruntime/core/session/onnxruntime_run_options_config_keys.h.
| Key | Default | Description |
|---|---|---|
| | | Semicolon-separated |
| | | Set to |
| | | HTP performance mode applied before the run for the QNN EP. Allowed values: |
| | | HTP performance mode restored after the run for the QNN EP. Accepts the same values as |
| | (not set) | RPC control latency setting for the QNN HTP backend. |
| | (not set) | Path to a QNN LoRA configuration file used to apply LoRA weights inside the QNN context binary during inference. |
| | | Graph annotation ID for CUDA EP when |
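Semicolon-separated run entries are easy to get wrong when typed by hand, so the value can be built from structured data instead. The sketch below assumes the `device:id` form used by the arena-shrinkage key in the ONNX Runtime header (for example `cpu:0;gpu:0`); `format_arena_shrinkage` is a hypothetical helper, and the key name in the usage note should be confirmed against the header file.

```python
def format_arena_shrinkage(devices):
    """Join (device_type, device_id) pairs into a "cpu:0;gpu:0"-style string."""
    return ";".join(f"{kind}:{index}" for kind, index in devices)


value = format_arena_shrinkage([("cpu", 0), ("gpu", 0)])
print(value)  # cpu:0;gpu:0
```

With onnxruntime installed, the string would be passed as `run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", value)` on an `onnxruntime.RunOptions()` instance.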