Change Logs

0.3.0

  • #181: adds the MaskedScatterNDOfShape custom operator

  • #175: adds custom operators MulSub and SubMul on CUDA

  • #173: adds custom operators AddSharedInput and MulSharedInput on CUDA

  • #170: adds custom operator TriMatrix on CUDA

  • #169: adds custom operator ReplaceZero on CUDA

  • #168: adds custom operator MulSigmoid on CUDA (see the sketch after this list)

  • #167: adds custom operator Rotary on CUDA

  • #166, #178: adds custom operators AddMul and MulAdd on CUDA

  • #165: adds custom operators AddAddAdd and MulMulMul on CUDA

  • #163: uses onnxruntime==1.17.3 as the default

  • #162: adds a ScatterNDOfShape implementation on CUDA without atomics

  • #159: adds the AddAdd custom operator on CUDA

  • #158: adds the MulMul custom operator on CUDA

  • #157: adds the ScatterNDOfShape custom operator

  • #155: adds a function to draw a timeline from a profile

  • #154: improves the plotting legend for profiling

  • #151: refactors the TreeEnsemble code to make it faster

  • #129, #132: supports sparse features for TreeEnsemble
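
The CUDA operators above live in a custom domain and are loaded into
onnxruntime as a custom-ops library. Below is a minimal sketch for
MulSigmoid (#168); the domain name, the get_ort_ext_libs module path,
and the single-input signature are assumptions mirroring how the
package exposes its other custom-op libraries, not verified here:

    import numpy as np
    from onnx import TensorProto
    from onnx.helper import (
        make_graph, make_model, make_node, make_opsetid,
        make_tensor_value_info)
    from onnxruntime import InferenceSession, SessionOptions
    # Assumed module path for the compiled custom-op library.
    from onnx_extended.ortops.optim.cuda import get_ort_ext_libs

    # One node calling the custom operator; the domain name is an
    # assumption.
    node = make_node("MulSigmoid", ["X"], ["Y"],
                     domain="onnx_extended.ortops.optim.cuda")
    graph = make_graph(
        [node], "mulsigmoid",
        [make_tensor_value_info("X", TensorProto.FLOAT, [None])],
        [make_tensor_value_info("Y", TensorProto.FLOAT, [None])])
    model = make_model(graph, opset_imports=[
        make_opsetid("", 18),
        make_opsetid("onnx_extended.ortops.optim.cuda", 1)])

    opts = SessionOptions()
    # register_custom_ops_library is the standard onnxruntime API.
    opts.register_custom_ops_library(get_ort_ext_libs()[0])
    sess = InferenceSession(model.SerializeToString(), opts,
                            providers=["CUDAExecutionProvider"])
    print(sess.run(None, {"X": np.random.rand(16).astype(np.float32)})[0])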

0.2.4

  • #120: uses onnxruntime==1.16.3 as the default

  • #115, #116, #118: adds a C implementation of SVMRegressor and SVMClassifier, reference operators based on it, and matching custom kernels for onnxruntime

  • #111, #117, #119: adds a C implementation of TfIdfVectorizer, a Python implementation of Tokenizer, and a custom kernel for onnxruntime

  • #110: allows LEQ as an alias for BRANCH_LEQ for nodes_modes in TreeEnsemble* operators (see the sketch after this list)

  • #108: improves the documentation of the command lines, fixes an issue in command line stats

  • #103: adds methods to compute statistics on TreeEnsemble and initializers
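
Below is a minimal sketch of the LEQ alias (#110): a one-tree
TreeEnsembleRegressor run through the package's CReferenceEvaluator.
The tree (one split at 0.5, two leaves) is illustrative only:

    import numpy as np
    from onnx import TensorProto
    from onnx.helper import (
        make_graph, make_model, make_node, make_opsetid,
        make_tensor_value_info)
    from onnx_extended.reference import CReferenceEvaluator

    node = make_node(
        "TreeEnsembleRegressor", ["X"], ["Y"], domain="ai.onnx.ml",
        n_targets=1,
        nodes_treeids=[0, 0, 0],
        nodes_nodeids=[0, 1, 2],
        nodes_featureids=[0, 0, 0],
        # "LEQ" is accepted as an alias for "BRANCH_LEQ" (#110).
        nodes_modes=["LEQ", "LEAF", "LEAF"],
        nodes_values=[0.5, 0.0, 0.0],
        nodes_truenodeids=[1, 0, 0],
        nodes_falsenodeids=[2, 0, 0],
        target_treeids=[0, 0],
        target_nodeids=[1, 2],
        target_ids=[0, 0],
        target_weights=[10.0, 20.0])
    graph = make_graph(
        [node], "tree",
        [make_tensor_value_info("X", TensorProto.FLOAT, [None, 1])],
        [make_tensor_value_info("Y", TensorProto.FLOAT, [None, 1])])
    model = make_model(graph, opset_imports=[
        make_opsetid("", 18), make_opsetid("ai.onnx.ml", 3)])

    ref = CReferenceEvaluator(model)
    x = np.array([[0.2], [0.9]], dtype=np.float32)
    print(ref.run(None, {"X": x})[0])  # -> [[10.], [20.]]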

0.2.3

  • #99: uses onnxruntime==1.16.1 as the default

  • #96: implements a function to convert a ModelProto into a string (not bytes), adds a function to multiply the number of trees in a TreeEnsemble

  • #75: adds an implementation of murmurhash3 to validate some options

  • #93: validates the wheels in CI

  • #89: adds a function to merge two models and update them if they use different opsets

0.2.2

  • #87: updates the quantization tools to use a simplified dynamic linear quantization to float 8

  • #85: adds load_model and save_model to help save models with or without external data (see the sketch after this list)

  • #82: fixes the benchmark across multiple versions of onnxruntime
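
Below is a minimal sketch of #85; the module path
onnx_extended.tools.onnx_io and the behaviour around external data are
assumptions, not verified signatures:

    from onnx import TensorProto
    from onnx.helper import (
        make_graph, make_model, make_node, make_opsetid,
        make_tensor_value_info)
    # Assumed module path based on the package layout.
    from onnx_extended.tools.onnx_io import load_model, save_model

    # A tiny model, just to have something to write and read back.
    node = make_node("Neg", ["X"], ["Y"])
    graph = make_graph(
        [node], "neg",
        [make_tensor_value_info("X", TensorProto.FLOAT, [2])],
        [make_tensor_value_info("Y", TensorProto.FLOAT, [2])])
    model = make_model(graph, opset_imports=[make_opsetid("", 18)])

    # save_model is expected to offer a switch for writing weights as
    # external data; the keyword name is not verified here.
    save_model(model, "neg.onnx")
    # load_model is expected to resolve external data when present.
    restored = load_model("neg.onnx")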

0.2.1

  • #79: updates to onnxruntime v1.16.0

  • #77: adds helpers to benchmark a model

  • #74: adds a function to enumerate all intermediate results with onnxruntime

  • #71, #72, #73: adds a function to analyse a profile produced by onnxruntime

  • #68, #69, #70: adds a CPU implementation for CustomGemmFloat8

  • #67: adds a function to extract a subgraph of a model

  • #59, #60, #61, #62, #63, #65, #66, #68, #69, #70: adds local functions to quantize into float 8 and float 16

  • #57: adds a C implementation of DynamicQuantizeLinear (for experimentation)

  • #56: adds a C implementation to cast a float into float 8

  • #55, #58: adds basic functionality to transform a graph, starting with basic quantization

  • #51: fixes the optimized TreeEnsembleRegressor and adds TreeEnsembleClassifier as custom ops

  • #50: adds the command line store to save intermediate outputs

  • #49: adds an option to save intermediate results in CReferenceEvaluator

  • #45: adds option cuda-link to setup.py to specify how to link with the CUDA library

  • #41: implements a custom kernel for RandomForestRegressor that is easier to optimize

  • #34: updates to onnxruntime v1.15.1

  • #31: implements a custom CUDA kernel (gemm)

  • #32: updates to onnxruntime v1.15.0

  • #27: adds a custom kernel with parameters to onnxruntime

  • #26: adds a custom kernel to onnxruntime

  • #24: uses Eigen to implement the Conv operator (see the sketch after this list)

  • #23: makes pip wheel . work

  • #22: renames cmake into _cmake to avoid warnings related to the cmake package

  • #19: minimal settings to use onnxruntime

  • #14: minimal settings to use CUDA

  • #8: support for C++ unit tests
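
Below is a minimal sketch of CReferenceEvaluator running the
Eigen-based Conv kernel (#24, #49); the class is imported from
onnx_extended.reference and follows the onnx ReferenceEvaluator API:

    import numpy as np
    from onnx import TensorProto
    from onnx.helper import (
        make_graph, make_model, make_node, make_opsetid,
        make_tensor_value_info)
    from onnx_extended.reference import CReferenceEvaluator

    node = make_node("Conv", ["X", "W"], ["Y"],
                     kernel_shape=[3, 3], pads=[1, 1, 1, 1])
    graph = make_graph(
        [node], "conv",
        [make_tensor_value_info("X", TensorProto.FLOAT, [1, 1, 8, 8]),
         make_tensor_value_info("W", TensorProto.FLOAT, [1, 1, 3, 3])],
        [make_tensor_value_info("Y", TensorProto.FLOAT, [1, 1, 8, 8])])
    model = make_model(graph, opset_imports=[make_opsetid("", 18)])

    # CReferenceEvaluator swaps the Python Conv for the C (Eigen) one.
    ref = CReferenceEvaluator(model)
    x = np.random.rand(1, 1, 8, 8).astype(np.float32)
    w = np.ones((1, 1, 3, 3), dtype=np.float32)
    print(ref.run(None, {"X": x, "W": w})[0].shape)  # (1, 1, 8, 8)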