onnx_diagnostic.ci_models.export_qwen25_vl

Export visual and embedding parts of Qwen/Qwen2.5-VL-7B-Instruct

Requirements

git+https://github.com/sdpython/experimental-experiment.git  # optional
huggingface_hub>=1.2.1
onnx-diagnostic>=0.8.6
onnxruntime>=1.23
torch>=2.9  # weekly is better
tqdm
transformers>=4.57

Examples

python -m onnx_diagnostic.ci_models.export_qwen25_vl \
    -m Qwen/Qwen2.5-VL-7B-Instruct \
    --device cpu --dtype float32 --exporter onnx-dynamo --pretrained --second-input --zip

To choose a specific Attention schema:

QWEN25ATTENTION=LOOPMHA python -m onnx_diagnostic.ci_models.export_qwen25_vl \
    -m Qwen/Qwen2.5-VL-7B-Instruct \
    --device cpu --dtype float32 --exporter onnx-dynamo --pretrained --second-input --zip

Cheat sheet for tar commands. To create an archive:

tar -czvf model.tar.gz model.onnx model.data

To extract it:

tar -xzvf model.tar.gz

Rewritings

Attention

The attention is implemented either with MultiHeadAttention in a loop or with PackedMultiHeadAttention. The choice is made based on the device. It is possible to override this by setting the environment variable QWEN25ATTENTION to:

  • PACKED: PackedMultiHeadAttention

  • LOOPMHA: Loop over MultiHeadAttention

  • LOOPA23: Loop over Attention(23), needs opset 23+.
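The selection logic above can be sketched as follows. This is a minimal illustration, not the package's actual code: `pick_attention_schema` is a hypothetical helper, and the device-based default (packed kernel on CUDA, loop on CPU) is an assumption.

```python
import os

# Hypothetical sketch of the attention-schema selection described above.
# Assumption: CUDA devices default to PACKED, other devices to LOOPMHA.
def pick_attention_schema(device: str = "cpu") -> str:
    override = os.environ.get("QWEN25ATTENTION", "")
    if override in {"PACKED", "LOOPMHA", "LOOPA23"}:
        return override  # explicit override via the environment variable
    return "PACKED" if device.startswith("cuda") else "LOOPMHA"
```

Setting QWEN25ATTENTION=LOOPA23 would then take precedence over the device-based default, as in the command-line example above.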

onnx_diagnostic.ci_models.export_qwen25_vl.get_untrained_model(model_id: str, second_input: bool, verbose: int) → Dict[str, Any][source]

Returns an untrained model.

Parameters:
  • model_id – model id

  • second_input – if True, generates a second input set

  • verbose – verbosity

Returns:

model and data

onnx_diagnostic.ci_models.export_qwen25_vl.main(model_id: str = 'Qwen/Qwen2.5-VL-7B-Instruct', device: str = 'cpu', dtype: str = 'float32', exporter: str = 'onnx-dynamo', pretrained: bool = True, second_input: bool = True, make_zip: bool = False, output_folder: str = 'dump_models', existing_onnx: str | None = None, part: str = 'visual', atol: float = 0.01, mismatch01: float = 0.1, profile_exporter: bool = False)[source]

Exports the model Qwen/Qwen2.5-VL-7B-Instruct or pieces of it. The script also applies to other models based on the same architecture.

The function saves everything on disk. It does not generate new inputs on the second run but reuses the saved ones. The same goes for the expected outputs, which are also saved on disk.

Parameters:
  • model_id – model id

  • device – device

  • dtype – dtype

  • exporter – exporter to use

  • pretrained – loads pretrained weights; pretrained=False is usually used for testing

  • second_input – checks discrepancies on more examples

  • make_zip – creates a zip at the end

  • output_folder – output folder

  • part – "" to export the whole model, "visual" for the visual part, "embedding" for the embedding part

  • atol – raises an exception if the maximum absolute discrepancy is above this threshold

  • mismatch01 – raises an exception if the ratio of mismatches is above that threshold

  • profile_exporter – profiles the exporter
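The save-once, reuse-later behaviour described for main (inputs and expected outputs are generated on the first run and read back from disk afterwards) can be sketched like this. `load_or_create` is a hypothetical helper, not the package's actual implementation:

```python
import os
import pickle

# Hypothetical sketch: reuse data saved on disk by a previous run
# instead of generating it again, as main() does for inputs and
# expected outputs.
def load_or_create(path, create):
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)  # second run: reuse the saved data
    data = create()
    with open(path, "wb") as f:
        pickle.dump(data, f)  # first run: save for subsequent runs
    return data
```

With this pattern, discrepancy checks on a second run compare against exactly the same inputs and expected outputs as the first one.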