.. _l-design-session-options: ================================ onnxruntime.SessionOptions Guide ================================ :class:`onnxruntime.SessionOptions` controls how :class:`onnxruntime.InferenceSession` loads and runs a model. This page lists every property, method, and related enum together with a short description, its default value, and — where applicable — the matching ``yobx`` wrapper parameter. .. rubric:: yobx wrappers The following classes accept individual ``SessionOptions`` fields as keyword arguments so that callers rarely need to build a :class:`onnxruntime.SessionOptions` object by hand: * :class:`~yobx.reference._inference_session._InferenceSession` * :class:`~yobx.reference._inference_session_numpy.InferenceSessionForNumpy` * :class:`~yobx.reference._inference_session_torch.InferenceSessionForTorch` * :class:`~yobx.reference.onnxruntime_evaluator.OnnxruntimeEvaluator` Pass a fully-configured :class:`onnxruntime.SessionOptions` object via the ``session_options`` argument to bypass all individual keyword arguments. Properties ========== .. list-table:: :widths: 30 12 58 :header-rows: 1 * - Property - Default - Description * - ``enable_cpu_mem_arena`` - ``True`` - When ``True``, enables the CPU memory arena. The arena pre-allocates a large block of memory and serves subsequent allocations from it, which reduces allocation overhead for many small tensors. Set to ``False`` to disable the arena and use the system allocator directly (useful when memory is tight). * - ``enable_mem_pattern`` - ``True`` - When ``True``, ONNX Runtime analyses the graph to determine a static memory layout that can be reused across runs. Disabling this can slightly reduce peak memory at the cost of some per-run overhead. * - ``enable_mem_reuse`` - ``True`` - When ``True``, output buffers are reused across inference calls where possible. Set to ``False`` to always allocate fresh output buffers (useful for debugging memory issues). * - ``enable_profiling`` - ``False`` - When ``True``, ONNX Runtime collects per-node timing data during inference. The profile is written to a JSON file whose name is derived from ``profile_file_prefix``. Corresponds to the ``enable_profiling`` parameter of the yobx wrappers. * - ``execution_mode`` - ``ORT_SEQUENTIAL`` - Controls whether operators are executed sequentially or in parallel. See :ref:`l-ort-execution-mode` for the available values. Use ``ORT_PARALLEL`` together with ``inter_op_num_threads`` to exploit multi-node parallelism. * - ``execution_order`` - ``DEFAULT`` - Determines the order in which nodes are scheduled for execution. See :ref:`l-ort-execution-order` for the available values. * - ``graph_optimization_level`` - ``ORT_ENABLE_ALL`` - Sets the level of graph optimizations applied before execution. See :ref:`l-ort-graph-optimization-level` for the available levels. Corresponds to the ``graph_optimization_level`` parameter of the yobx wrappers (also accepts a plain ``bool``). * - ``inter_op_num_threads`` - ``0`` (auto) - Number of threads used for parallelism *between* independent graph nodes when ``execution_mode`` is ``ORT_PARALLEL``. ``0`` lets ONNX Runtime choose automatically. * - ``intra_op_num_threads`` - ``0`` (auto) - Number of threads used for parallelism *within* a single operator (e.g., matrix multiplication). ``0`` lets ONNX Runtime choose automatically. * - ``log_severity_level`` - ``-1`` (default) - Severity threshold for session-level log messages. Messages below this level are suppressed. Common values: ``0`` = VERBOSE, ``1`` = INFO, ``2`` = WARNING, ``3`` = ERROR, ``4`` = FATAL. Corresponds to the ``log_severity_level`` parameter of the yobx wrappers. * - ``log_verbosity_level`` - ``0`` - Verbosity sub-level for VERBOSE messages (``log_severity_level == 0``). Higher values produce more output. Corresponds to the ``log_verbosity_level`` parameter of the yobx wrappers. * - ``logid`` - ``""`` - String tag prepended to log messages emitted by this session. Useful when multiple sessions run concurrently. * - ``optimized_model_filepath`` - ``""`` - Path where the optimized ONNX model is saved after graph optimization. Leave empty to skip saving. When set, a companion data file is also configured automatically by the yobx wrappers. Corresponds to the ``optimized_model_filepath`` parameter of the yobx wrappers. * - ``profile_file_prefix`` - ``"onnxruntime_profile_"`` - Prefix of the JSON file written when ``enable_profiling`` is ``True``. ONNX Runtime appends a timestamp and ``.json`` suffix. * - ``use_deterministic_compute`` - ``False`` - When ``True``, forces ONNX Runtime to use deterministic algorithms everywhere, at the cost of potentially lower performance. Useful for reproducible debugging. * - ``use_per_session_threads`` - ``True`` - When ``True``, each session owns its own thread pool. Set to ``False`` to share a global thread pool across sessions, which reduces thread-creation overhead when many short-lived sessions are created. Methods ======= .. list-table:: :widths: 40 60 :header-rows: 1 * - Method - Description * - ``add_session_config_entry(key, value)`` - Sets a single session configuration entry as a key/value string pair. This is the primary way to pass advanced options that are not exposed as named properties. See :ref:`l-ort-session-config-entries` for a selection of commonly used keys. Used by the yobx wrappers to set ``session.disable_aot_function_inlining`` and the external-data file name for ``optimized_model_filepath``. * - ``get_session_config_entry(key)`` - Returns the string value previously set with ``add_session_config_entry``. * - ``add_free_dimension_override_by_name(dim_name, value)`` - Binds the symbolic input dimension named ``dim_name`` to a concrete integer ``value`` for this session. Allows ONNX Runtime to specialize and optimize the model for a fixed shape without re-exporting. * - ``add_free_dimension_override_by_denotation(denotation, value)`` - Like ``add_free_dimension_override_by_name`` but identifies the dimension by its ONNX *denotation* string (e.g. ``"DATA_BATCH"``). * - ``add_initializer(name, ort_value)`` - Shares a pre-allocated :class:`onnxruntime.OrtValue` as a named initializer with the session. Avoids copying large weight tensors into the session at load time. * - ``add_external_initializers(names, ort_values)`` - Supplies a list of external initializers (by name and ``OrtValue``) that override the initializers stored in the ONNX model. Useful for sharing weights across sessions. * - ``add_external_initializers_from_files_in_memory(filenames, buffers, lengths)`` - Like ``add_external_initializers`` but reads the initializer data from in-memory byte buffers that correspond to external data files. * - ``add_provider(provider_name, options)`` - Adds an explicit execution provider with a string options mapping. Prefer passing the ``providers`` list to :class:`onnxruntime.InferenceSession` directly; this method is useful when building options programmatically. * - ``add_provider_for_devices(ort_ep_devices, options)`` - Like ``add_provider`` but identifies the provider through a sequence of ``OrtEpDevice`` descriptors returned by the device-selection API. * - ``has_providers()`` - Returns ``True`` if the ``SessionOptions`` object already has execution providers, ``OrtEpDevices``, or policies configured. * - ``register_custom_ops_library(path)`` - Registers a shared library (``*.so`` / ``*.dll``) that implements custom ONNX operator kernels required by the model. Must be called before the :class:`onnxruntime.InferenceSession` is created. * - ``set_load_cancellation_flag(cancel)`` - When ``cancel=True``, requests that an in-progress session load be aborted. Useful for implementing load timeouts in long-running services. * - ``set_provider_selection_policy(policy)`` - Sets an automatic EP selection policy (``OrtExecutionProviderDevicePolicy``) that ONNX Runtime uses to pick the best execution provider at runtime. * - ``set_provider_selection_policy_delegate(callable)`` - Provides a Python callable that ONNX Runtime calls to choose an execution provider. The callable receives the candidate ``OrtEpDevice`` list and must return the chosen provider name and options. .. _l-ort-graph-optimization-level: GraphOptimizationLevel ====================== ``onnxruntime.GraphOptimizationLevel`` is an enum that controls which optimization passes run before model execution. .. list-table:: :widths: 30 10 60 :header-rows: 1 * - Name - Value - Description * - ``ORT_DISABLE_ALL`` - ``0`` - Disables all graph optimizations. The model is executed exactly as exported. Use this when diagnosing incorrect results caused by graph rewrites, or when benchmarking the unoptimized graph. * - ``ORT_ENABLE_BASIC`` - ``1`` - Enables constant folding, redundant node elimination, and other cheap algebraic simplifications that are always safe. * - ``ORT_ENABLE_EXTENDED`` - ``2`` - Adds more aggressive fusions on top of ``ORT_ENABLE_BASIC`` such as GELU, attention, and layer-normalization fusions. * - ``ORT_ENABLE_LAYOUT`` - ``3`` - Adds layout-transformation optimizations (e.g. NCHWc) on top of ``ORT_ENABLE_EXTENDED``. * - ``ORT_ENABLE_ALL`` - ``99`` - Enables all optimizations, including the ones that depend on the selected execution provider. This is the default. In the yobx wrappers the ``graph_optimization_level`` parameter also accepts a plain ``bool``: ``True`` maps to ``ORT_ENABLE_ALL`` and ``False`` maps to ``ORT_DISABLE_ALL``. .. _l-ort-execution-mode: ExecutionMode ============= ``onnxruntime.ExecutionMode`` controls whether independent graph nodes are run sequentially or concurrently. .. list-table:: :widths: 30 10 60 :header-rows: 1 * - Name - Value - Description * - ``ORT_SEQUENTIAL`` - ``0`` - Nodes are executed one after another in topological order. This is the default and is usually fastest for single-batch inference because it avoids thread-synchronization overhead. * - ``ORT_PARALLEL`` - ``1`` - Independent nodes may run concurrently on different threads. Use together with ``inter_op_num_threads`` to set the thread count. Most beneficial when the graph contains wide parallelism (many independent branches) and the hardware has many cores. .. _l-ort-execution-order: ExecutionOrder ============== ``onnxruntime.ExecutionOrder`` determines the scheduling order for nodes in the execution plan. .. list-table:: :widths: 30 10 60 :header-rows: 1 * - Name - Value - Description * - ``DEFAULT`` - ``0`` - Nodes are scheduled in the default topological order computed by ONNX Runtime. * - ``PRIORITY_BASED`` - ``1`` - Nodes are scheduled according to a priority that ONNX Runtime assigns to minimize peak memory usage while maximizing throughput. * - ``MEMORY_EFFICIENT`` - ``2`` - Nodes are scheduled to minimize peak memory consumption, at the potential cost of some throughput. .. _l-ort-session-config-entries: Session Configuration Entries ============================== ``add_session_config_entry(key, value)`` accepts string key/value pairs for advanced session options that are not exposed as named properties. The full set of recognised keys is defined in the ONNX Runtime source file ``include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h``. The tables below enumerate all keys, grouped by prefix. ``session.*`` keys ------------------ .. list-table:: :widths: 55 10 35 :header-rows: 1 * - Key - Default - Description * - ``session.disable_prepacking`` - ``"0"`` - Set to ``"1"`` to disable pre-packing of constant initializers. Pre-packing rearranges weights at load time for faster kernels; disabling it reduces session-creation time at the cost of slower inference. * - ``session.use_env_allocators`` - ``"0"`` - Set to ``"1"`` to use allocators registered in the ``OrtEnv`` instead of per-session allocators. Allows allocator sharing across sessions. * - ``session.load_model_format`` - ``""`` - Force the model format: ``"ORT"`` for the ORT flatbuffer format, ``"ONNX"`` for the protobuf format. Leave empty for automatic detection based on file extension or byte signature. * - ``session.save_model_format`` - ``""`` - Controls the format used when saving the optimized model (``optimized_model_filepath``). ``"ORT"`` saves as a flatbuffer, ``"ONNX"`` saves as a protobuf. Leave empty for automatic detection based on the file extension. * - ``session.set_denormal_as_zero`` - ``"0"`` - Set to ``"1"`` to enable flush-to-zero and denormal-as-zero in floating-point arithmetic for all threads in the session thread pool. Can improve performance on some hardware but may reduce accuracy for models that rely on denormal values. * - ``session.disable_quant_qdq`` - ``"0"`` - Set to ``"1"`` to disable QDQ (QuantizeLinear/DequantizeLinear) fusion optimizations. Defaults to ``"1"`` automatically when the DirectML EP is registered. * - ``session.disable_qdq_constant_folding`` - ``"0"`` - Set to ``"1"`` to prevent DequantizeLinear nodes from being individually constant-folded, even when ``session.disable_quant_qdq`` is ``"1"``. Useful for EPs such as WebNN that disable QDQ fusion but still need the original DQ/Q nodes. * - ``session.disable_double_qdq_remover`` - ``"0"`` - Set to ``"1"`` to keep the middle two nodes in ``Q→(DQ→Q)→DQ`` patterns instead of removing them. * - ``session.enable_quant_qdq_cleanup`` - ``"0"`` - Set to ``"1"`` to remove residual ``Q→DQ`` pairs after all QDQ handling is complete. Can improve performance but may affect accuracy; test carefully. Available since ORT 1.11. * - ``session.disable_aot_function_inlining`` - ``"0"`` - Set to ``"1"`` to prevent ONNX Runtime from inlining ONNX functions ahead-of-time. Useful when function boundaries are needed for debugging or profiling. Corresponds to the ``disable_aot_function_inlining`` parameter of the yobx wrappers. * - ``session.graph_optimizations_loop_level`` - ``"1"`` - Controls whether graph optimizations run in a feedback loop. ``"0"`` = single pass; ``"1"`` = loop re-runs if Level 4 optimizations were applied; ``"2"`` = loop re-runs if any Level 2+ optimization was applied. * - ``session.use_device_allocator_for_initializers`` - ``"0"`` - Set to ``"1"`` to allocate initialized tensor memory through the device allocator (i.e., ``malloc``/``new``) instead of the session arena. * - ``session.inter_op.allow_spinning`` - ``"1"`` - Set to ``"0"`` to make inter-op threads block immediately when idle instead of spinning. Reduces CPU utilization at the cost of potentially higher latency. Defaults to ``"0"`` in client/on-device builds (``ORT_CLIENT_PACKAGE_BUILD``). * - ``session.intra_op.allow_spinning`` - ``"1"`` - Same as ``session.inter_op.allow_spinning`` but for intra-op threads. * - ``session.intra_op.spin_duration_us`` - *(iteration-based)* - Duration in microseconds that intra-op threads spin before blocking. Requires ``session.intra_op.allow_spinning`` to be enabled. Typical range: ``500``–``2000``. * - ``session.inter_op.spin_duration_us`` - *(iteration-based)* - Same as ``session.intra_op.spin_duration_us`` but for inter-op threads. * - ``session.intra_op.spin_backoff_max`` - ``"1"`` - Maximum exponential-backoff cap for the intra-op spin loop. Values ≥ 2 reduce CPU load during spinning. Clamped to 64. * - ``session.inter_op.spin_backoff_max`` - ``"1"`` - Same as ``session.intra_op.spin_backoff_max`` but for inter-op threads. * - ``session.use_ort_model_bytes_directly`` - ``"0"`` - Set to ``"1"`` to use the raw in-memory model bytes without copying them. The caller must keep the buffer alive for the lifetime of the session. * - ``session.use_ort_model_bytes_for_initializers`` - ``"0"`` - Set to ``"1"`` to read initializer data directly from the flatbuffer bytes (requires ``session.use_ort_model_bytes_directly``). Reduces peak memory during loading. * - ``session.qdqisint8allowed`` - ``"0"`` - Set to ``"1"`` when exporting an ORT format model for use on ARM platforms (enables INT8 QDQ). Available since ORT 1.11. * - ``session.x64quantprecision`` - ``"0"`` - Set to ``"1"`` to use U8U8 (instead of U8S8) matrix multiplication on x64 platforms with AVX2/AVX512 to avoid overflow. Slower but more numerically correct. * - ``session.dynamic_block_base`` - *(disabled)* - Set to a positive integer (e.g. ``"4"``) to enable dynamic block-sizing for the thread pool. Helps reduce E2E latency variance on wide-parallelism graphs. Available since ORT 1.11. * - ``session.force_spinning_stop`` - ``"0"`` - Set to ``"1"`` to force thread-pool threads to stop spinning immediately when the last concurrent ``Run()`` call returns. Reduces idle CPU usage between infrequent requests. * - ``session.strict_shape_type_inference`` - ``"0"`` - Set to ``"1"`` to turn shape/type inference inconsistencies into hard failures instead of logged warnings. * - ``session.allow_released_opsets_only`` - ``"0"`` - Set to ``"1"`` to reject models using opsets newer than the latest released version. Useful to catch accidental use of pre-release opsets in production. * - ``session.node_partition_config_file`` - ``""`` - Path to a file that specifies which nodes are assigned to which execution providers (logic-stream partitioning). * - ``session.intra_op_thread_affinities`` - ``""`` - Semicolon-separated CPU affinity specification for intra-op threads. Example: ``"1,2,3;4,5"`` pins thread 0 to CPUs 1–3 and thread 1 to CPUs 4–5. The number of entries must equal ``intra_op_num_threads - 1``. * - ``session.debug_layout_transformation`` - ``"0"`` - Set to ``"1"`` to dump intermediate ONNX models during layout transformation (e.g. NHWC conversion for NNAPI/XNNPACK/QNN EPs). Intended for developer debugging only. * - ``session.disable_cpu_ep_fallback`` - ``"0"`` - Set to ``"1"`` to prevent unsupported nodes from falling back to the CPU EP. Session creation fails if the selected EP cannot handle all nodes. Incompatible with explicitly adding the CPU EP. * - ``session.optimized_model_external_initializers_file_name`` - ``""`` - When ``optimized_model_filepath`` is set, this entry names the companion ``.data`` file for large initializers stored externally. Set automatically by the yobx wrappers. * - ``session.optimized_model_external_initializers_min_size_in_bytes`` - ``"1024"`` - Minimum initializer size in bytes above which initializers are placed in the external data file during serialization. * - ``session.model_external_initializers_file_folder_path`` - ``""`` - Folder path for external data files when loading a model from a memory buffer. All external data files must reside in this folder. * - ``session.save_external_prepacked_constant_initializers`` - ``"0"`` - Set to ``"1"`` to write pre-packed constant initializers to an external data file. Allows memory-mapping them on load, reducing heap usage for large models. * - ``session.collect_node_memory_stats_to_file`` - *(not set)* - Full path to a CSV file where per-node memory statistics (initializer size, dynamic output sizes, temp allocations) are written. Useful for estimating runtime memory requirements. * - ``session.resource_cuda_partitioning_settings`` - ``""`` - Composite ``"memory_limit_kb,stats_file"`` string enabling capacity-aware partitioning for the CUDA EP. * - ``session.layer_assignment_settings`` - ``""`` - Semicolon-separated per-device annotation strings that guide node assignment during partitioning, matched against node metadata ``layer_ann`` entries. * - ``session.qdq_matmulnbits_accuracy_level`` - ``"4"`` - Accuracy level used when converting ``DQ + MatMul`` to ``MatMulNBits``. See the ``MatMulNBits`` op schema for allowed values. * - ``session.qdq_matmulnbits_block_size`` - ``"0"`` (→ 32) - Block size for the ``DQ + MatMul → MatMulNBits`` conversion. ``"0"`` uses the default of 32; ``"-1"`` picks the largest power-of-2 ≤ min(K, 256) that minimizes padding. * - ``session.enable_dq_matmulnbits_fusion`` - ``"0"`` - Set to ``"1"`` to enable the ``DQ → MatMulNBits`` fusion graph transformer. Typically enabled automatically by the NvTensorRTRTX EP. * - ``session.disable_model_compile`` - ``"0"`` - Set to ``"1"`` to fail session creation if any EP needs to compile the model (i.e. require a pre-compiled EPContext model). * - ``session.fail_on_suboptimal_compiled_model`` - ``"0"`` - Set to ``"1"`` to fail session creation when the compiled model compatibility is ``SUPPORTED_PREFER_RECOMPILATION`` (suboptimal). * - ``session.record_ep_graph_assignment_info`` - ``"0"`` - Set to ``"1"`` to record which nodes were assigned to which EPs. Retrieve the information via ``Session_GetEpGraphAssignmentInfo()``. ``optimization.*`` keys ----------------------- .. list-table:: :widths: 55 10 35 :header-rows: 1 * - Key - Default - Description * - ``optimization.enable_gelu_approximation`` - ``"0"`` - Set to ``"1"`` to enable the fast GELU approximation in graph optimization. May change inference results slightly. * - ``optimization.enable_cast_chain_elimination`` - ``"0"`` - Set to ``"1"`` to enable elimination of chains of Cast nodes. May change inference results in edge cases. * - ``optimization.disable_specified_optimizers`` - ``""`` - Comma-separated list of optimizer names to skip (e.g. ``"ConstantFolding,MatMulAddFusion"``). Useful when a specific optimizer causes incorrect results or excessive load time. Not available in minimal builds. * - ``optimization.minimal_build_optimizations`` - ``""`` - Controls how minimal-build optimizations are applied in a full build: ``"save"`` saves them when exporting an ORT model; ``"apply"`` only applies optimizations available in a minimal build; leave empty for all full-build optimizations. Available since ORT 1.11. * - ``optimization.memory_optimizer_config`` - ``""`` - *(Training only)* Path to a JSON file describing memory optimization configurations (recompute subgraph patterns) for ``onnxruntime-training``. * - ``optimization.enable_memory_probe_recompute_config`` - ``"0:0"`` - *(Training only)* Integer pair controlling subgraph detection for memory-footprint reduction via recompute. ``ep.*`` keys ------------- .. list-table:: :widths: 55 10 35 :header-rows: 1 * - Key - Default - Description * - ``ep.nnapi.partitioning_stop_ops`` - *(default set)* - Comma-separated list of op types at which the NNAPI EP stops graph partitioning. Set to ``""`` to disable stop-op exclusion entirely. * - ``ep.context_enable`` - ``"0"`` - Set to ``"1"`` to enable the EPContext feature: after session creation the partitioned graph (with compiled EP context blobs) is saved to an ONNX file for reuse in future inference sessions, avoiding repeated compile overhead. * - ``ep.context_file_path`` - *(original name + ``_ctx.onnx``)* - Path for the EPContext ONNX file written when ``ep.context_enable`` is ``"1"``. Must be a file path, not a directory. * - ``ep.context_embed_mode`` - ``"0"`` - ``"0"`` stores the EP context blob in a separate file (path kept in the ONNX model); ``"1"`` embeds the blob directly inside the ONNX model. * - ``ep.context_node_name_prefix`` - ``""`` - Prefix added to EPContext node names to make them unique when multiple EPContext graphs are merged into one model. * - ``ep.share_ep_contexts`` - ``"0"`` - Set to ``"1"`` to share EP resources (e.g. compiled binaries) across sessions. * - ``ep.stop_share_ep_contexts`` - ``"0"`` - Set to ``"1"`` to stop sharing EP resources from this point on. * - ``ep.context_model_external_initializers_file_name`` - ``""`` - When generating an EPContext model and some nodes fall back to the CPU EP, this entry names the external data file into which all initializers are placed in the generated ONNX file. * - ``ep.enable_weightless_ep_context_nodes`` - ``"0"`` - Set to ``"1"`` to request that EPs create EPContext nodes without embedded weights (weights are provided as explicit inputs). Requires ``ep.context_enable`` to be ``"1"``. ``mlas.*`` keys --------------- .. list-table:: :widths: 55 10 35 :header-rows: 1 * - Key - Default - Description * - ``mlas.enable_gemm_fastmath_arm64_bfloat16`` - ``"0"`` - Set to ``"1"`` to enable BFloat16-accelerated GEMM on ARM64 (fastmath mode). * - ``mlas.use_lut_gemm`` - ``"0"`` - Set to ``"1"`` to use lookup-table-based GEMM kernels for quantized models when available. * - ``mlas.disable_kleidiai`` - ``"0"`` - Set to ``"1"`` to disable KleidiAI kernels even if the platform supports them. Dynamic EP options (``ep.dynamic.*``) -------------------------------------- These keys are intended for use with ``SetEpDynamicOptions`` and may be changed at any time, not just at session creation. .. list-table:: :widths: 55 10 35 :header-rows: 1 * - Key - Default - Description * - ``ep.dynamic.workload_type`` - ``"Default"`` - Scheduling-priority hint for the session workload. ``"Default"`` lets the OS choose; ``"Efficient"`` signals an efficiency-oriented, low-priority workload. * - ``ep.dynamic.qnn_htp_performance_mode`` - ``"default"`` - QNN HTP performance mode. Allowed values: ``"burst"``, ``"balanced"``, ``"default"``, ``"high_performance"``, ``"high_power_saver"``, ``"low_balanced"``, ``"extreme_power_saver"``, ``"low_power_saver"``, ``"power_saver"``, ``"sustained_high_performance"``. Example ======= The following snippet shows how to configure a session that disables all graph optimizations, enables profiling, and restricts execution to two intra-op threads: .. code-block:: python import onnxruntime opts = onnxruntime.SessionOptions() opts.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL opts.enable_profiling = True opts.profile_file_prefix = "/tmp/my_model_profile_" opts.intra_op_num_threads = 2 sess = onnxruntime.InferenceSession( "model.onnx", sess_options=opts, providers=["CPUExecutionProvider"], ) The same can be achieved with the yobx wrappers by passing individual keyword arguments: .. code-block:: python from yobx.reference import OnnxruntimeEvaluator evaluator = OnnxruntimeEvaluator( "model.onnx", graph_optimization_level=False, # False → ORT_DISABLE_ALL enable_profiling=True, ) .. seealso:: :ref:`l-design-evaluator` — overview of the three evaluators provided by ``yobx`` and when to use each one. .. _l-ort-run-options: onnxruntime.RunOptions ====================== :class:`onnxruntime.RunOptions` controls a single call to ``InferenceSession.run()``. It exposes a small set of named properties and, like ``SessionOptions``, an ``add_run_config_entry(key, value)`` method for advanced per-run settings. Named properties ---------------- .. list-table:: :widths: 35 12 53 :header-rows: 1 * - Property - Default - Description * - ``log_severity_level`` - ``2`` (WARNING) - Minimum verbosity of messages logged during the run. Same severity scale as ``SessionOptions.log_severity_level``: 0 = VERBOSE, 1 = INFO, 2 = WARNING, 3 = ERROR, 4 = FATAL. * - ``log_verbosity_level`` - ``0`` - Verbosity level for VERBOSE-severity messages. Only effective when ``log_severity_level`` is 0. * - ``logid`` - ``""`` - Tag prepended to log messages emitted during this run. * - ``terminate`` - ``False`` - Set to ``True`` to request early termination of an in-progress ``run()`` call (e.g. from another thread). Resets to ``False`` automatically before the next run. * - ``only_execute_path_to_fetches`` - ``False`` - When ``True``, only the nodes required to compute the requested output names are executed, skipping the rest of the graph. Can reduce latency when fetching a subset of outputs. * - ``training_mode`` - ``False`` - Set to ``True`` to execute the model in training mode (e.g. keep dropout active). .. _l-ort-run-config-entries: Run Configuration Entries -------------------------- ``add_run_config_entry(key, value)`` sets per-run configuration that cannot be expressed through named properties. The full set of recognised keys is defined in ``include/onnxruntime/core/session/onnxruntime_run_options_config_keys.h``. .. list-table:: :widths: 55 12 33 :header-rows: 1 * - Key - Default - Description * - ``memory.enable_memory_arena_shrinkage`` - ``""`` - Semicolon-separated ``device:id`` pairs identifying memory arenas to shrink after the run (e.g. ``"cpu:0;gpu:0"``). Reduces peak RSS between runs at the cost of re-allocation overhead. The CPU arena must not have been disabled via ``SessionOptions.enable_cpu_mem_arena = False`` when ``"cpu"`` is listed. * - ``disable_synchronize_execution_providers`` - ``"0"`` - Set to ``"1"`` to skip synchronising execution providers (e.g. the CUDA compute stream) with the CPU at the end of the run. Useful for pipelined workloads where the caller synchronises manually. * - ``qnn.htp_perf_mode`` - ``"default"`` - HTP performance mode applied before the run for the QNN EP. Allowed values: ``"burst"``, ``"balanced"``, ``"default"``, ``"high_performance"``, ``"high_power_saver"``, ``"low_balanced"``, ``"extreme_power_saver"``, ``"low_power_saver"``, ``"power_saver"``, ``"sustained_high_performance"``. * - ``qnn.htp_perf_mode_post_run`` - ``"default"`` - HTP performance mode restored after the run for the QNN EP. Accepts the same values as ``qnn.htp_perf_mode``. * - ``qnn.rpc_control_latency`` - *(not set)* - RPC control latency setting for the QNN HTP backend. * - ``qnn.lora_config`` - *(not set)* - Path to a QNN LoRA configuration file used to apply LoRA weights inside the QNN context binary during inference. * - ``gpu_graph_id`` - ``"0"`` - Graph annotation ID for CUDA EP when ``enable_cuda_graph`` is enabled. Allows capturing and replaying multiple distinct CUDA graphs in the same session. Set to ``"-1"`` to disable CUDA graph capture/replay for a specific run. Value ``"0"`` is reserved for internal use.