onnx_diagnostic.ci_models.export_qwen25_vl¶
Export visual and embedding parts of Qwen/Qwen2.5-VL-7B-Instruct¶
Requirements¶
git+https://github.com/sdpython/experimental-experiment.git # optional
huggingface_hub>=1.2.1
onnx-diagnostic>=0.8.6
onnxruntime>=1.23
torch>=2.9 # weekly is better
tqdm
transformers>=4.57
Examples¶
python -m onnx_diagnostic.ci_models.export_qwen25_vl \
-m Qwen/Qwen2.5-VL-7B-Instruct \
--device cpu --dtype float32 --exporter onnx-dynamo --pretrained --second-input --zip
To choose a specific Attention schema:
QWEN25ATTENTION=LOOPMHA python -m onnx_diagnostic.ci_models.export_qwen25_vl \
-m Qwen/Qwen2.5-VL-7B-Instruct \
--device cpu --dtype float32 --exporter onnx-dynamo --pretrained --second-input --zip
Cheat sheet for tar commands. To create an archive:
tar -czvf model.tar.gz model.onnx model.data
And to untar:
tar -xzvf model.tar.gz
Rewritings¶
Attention¶
The attention is implemented either with MultiHeadAttention in a loop
or with PackedMultiHeadAttention. The choice is made based on the device.
It is possible to override this by setting the environment variable
QWEN25ATTENTION to one of the following values:
- PACKED: PackedMultiHeadAttention
- LOOPMHA: Loop over MultiHeadAttention
- LOOPA23: Loop over Attention(23), needs opset 23+.
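For example, from Python, a minimal sketch (assuming the environment variable is read when the export runs, so it is set before calling main):

import os

# Choose the attention schema before the export runs; LOOPA23 needs opset 23+.
os.environ["QWEN25ATTENTION"] = "LOOPA23"

from onnx_diagnostic.ci_models.export_qwen25_vl import main

main(device="cpu", dtype="float32", exporter="onnx-dynamo", part="visual")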
- onnx_diagnostic.ci_models.export_qwen25_vl.get_untrained_model(model_id: str, second_input: bool, verbose: int) → Dict[str, Any][source]¶
Returns an untrained model.
- Parameters:
model_id – model id
second_input – second input set
verbose – verbosity
- Returns:
model and data
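A minimal usage sketch; the keys of the returned dictionary are not listed on this page, so the snippet only inspects them:

from onnx_diagnostic.ci_models.export_qwen25_vl import get_untrained_model

data = get_untrained_model(
    "Qwen/Qwen2.5-VL-7B-Instruct",  # model id
    second_input=True,  # also builds a second input set
    verbose=1,  # verbosity
)
# data holds the untrained model and its inputs; inspect the keys to confirm.
print(sorted(data))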
- onnx_diagnostic.ci_models.export_qwen25_vl.main(model_id: str = 'Qwen/Qwen2.5-VL-7B-Instruct', device: str = 'cpu', dtype: str = 'float32', exporter: str = 'onnx-dynamo', pretrained: bool = True, second_input: bool = True, make_zip: bool = False, output_folder: str = 'dump_models', existing_onnx: str | None = None, part: str = 'visual', atol: float = 0.01, mismatch01: float = 0.1, profile_exporter: bool = False)[source]¶
Exports the model Qwen/Qwen2.5-VL-7B-Instruct or pieces of it. The script also applies to other models based on the same architecture.
The function saves everything on disk. It does not generate new inputs on the second run but reuses the saved ones. The same goes for the expected outputs, which are also saved on disk.
- Parameters:
model_id – model id
device – device
dtype – dtype
exporter – exporter to use
pretrained – pretrained=False is usually used for testing
second_input – checks discrepancies on more examples
make_zip – creates a zip at the end
output_folder – output folder
part – "" to export the whole model, "visual" for the visual part, "embedding" for the embedding part
atol – raises an exception if the tolerance is above that threshold
mismatch01 – raises an exception if the ratio of mismatches is above that threshold
profile_exporter – profiles the exporter
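A minimal sketch calling main from Python, equivalent to the command-line example above (all arguments shown are the documented parameters):

from onnx_diagnostic.ci_models.export_qwen25_vl import main

# Export the visual part on CPU in float32 with the onnx-dynamo exporter,
# check discrepancies on a second input set and zip the resulting files.
main(
    model_id="Qwen/Qwen2.5-VL-7B-Instruct",
    device="cpu",
    dtype="float32",
    exporter="onnx-dynamo",
    pretrained=True,
    second_input=True,
    make_zip=True,
    output_folder="dump_models",
    part="visual",
)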