Linear Regression and export to ONNX

This example trains a linear regression with scikit-learn and with torch, then exports the torch model to ONNX.

Data

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import torch
from onnxruntime import InferenceSession
from experimental_experiment.helpers import pretty_onnx
from onnx_array_api.plotting.graphviz_helper import plot_dot


X, y = make_regression(1000, n_features=5, noise=10.0, n_informative=2)
print(X.shape, y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y)

(1000, 5) (1000,)

scikit-learn: the simple regression

When $X'X$ is invertible, the least-squares problem has the closed-form solution $A^* = (X'X)^{-1}X'Y$.

clr = LinearRegression()
clr.fit(X_train, y_train)

print(f"coefficients: {clr.coef_}, {clr.intercept_}")

coefficients: [ 0.3459669   0.45336692  0.65710949 26.14819898 25.6202578 ], -0.15630463825218383
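As a cross-check of the closed-form formula $A^* = (X'X)^{-1}X'Y$, here is a minimal NumPy sketch on synthetic data. The data, the seed, and the names `X_demo`, `y_demo`, `Xb`, `true_coef` and `true_intercept` are illustrative assumptions, not part of the original script. A column of ones is appended so that the intercept is estimated together with the coefficients.

```python
import numpy as np

# Hypothetical synthetic data (not the dataset used in this example).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
true_intercept = 0.7
y_demo = X_demo @ true_coef + true_intercept + rng.normal(scale=0.01, size=200)

# Append a column of ones so the last entry of the solution is the intercept.
Xb = np.hstack([X_demo, np.ones((X_demo.shape[0], 1))])

# Solve (X'X) A = X'y rather than forming the inverse explicitly.
A = np.linalg.solve(Xb.T @ Xb, Xb.T @ y_demo)

# np.linalg.lstsq minimizes the same least-squares objective
# and is usually the more numerically robust choice.
A_lstsq, *_ = np.linalg.lstsq(Xb, y_demo, rcond=None)

print("normal equations:", A)
print("lstsq:           ", A_lstsq)
```

Both solutions should agree up to rounding, and both should be close to the coefficients used to generate the data.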

Evaluation

y_pred = clr.predict(X_test)
l2 = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"LinearRegression: l2={l2}, r2={r2}")

LinearRegression: l2=102.12270199862125, r2=0.9443646784589387

scikit-learn: SGD algorithm

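SGD stands for Stochastic Gradient Descent. Setting max_iter=5 limits the training to five epochs, which is why a ConvergenceWarning appears in the output below.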

clr = SGDRegressor(max_iter=5, verbose=1)
clr.fit(X_train, y_train)

print(f"coefficients: {clr.coef_}, {clr.intercept_}")

-- Epoch 1
Norm: 31.18, NNZs: 5, Bias: -0.284870, T: 750, Avg. loss: 186.075143
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 35.18, NNZs: 5, Bias: -0.270052, T: 1500, Avg. loss: 58.010925
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 36.24, NNZs: 5, Bias: -0.194536, T: 2250, Avg. loss: 53.378042
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 36.57, NNZs: 5, Bias: -0.231132, T: 3000, Avg. loss: 52.981006
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 36.50, NNZs: 5, Bias: -0.166303, T: 3750, Avg. loss: 52.930445
Total training time: 0.00 seconds.
~/vv/this312/lib/python3.12/site-packages/sklearn/linear_model/_stochastic_gradient.py:1608: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
  warnings.warn(
coefficients: [ 0.24442643  0.51140703  0.67514459 26.11876261 25.47591706], [-0.16630295]
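For intuition, here is a minimal sketch of plain stochastic gradient descent for the squared loss, written with NumPy. It is not SGDRegressor's implementation: it uses a constant learning rate and no regularization, whereas SGDRegressor adds a penalty term and a decaying learning-rate schedule by default. The data, the seed, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical synthetic data (not the dataset used in this example).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 0.3
y_demo = X_demo @ true_coef + true_intercept + rng.normal(scale=0.1, size=500)

coef = np.zeros(X_demo.shape[1])
intercept = 0.0
lr = 0.01  # constant learning rate, chosen for this sketch

for epoch in range(5):
    total = 0.0
    # Visit the samples one by one in a random order.
    for i in rng.permutation(X_demo.shape[0]):
        err = X_demo[i] @ coef + intercept - y_demo[i]
        total += 0.5 * err**2
        # Gradient of 0.5 * err**2 with respect to coef and intercept.
        coef -= lr * err * X_demo[i]
        intercept -= lr * err
    print(f"epoch {epoch + 1}: avg loss={total / X_demo.shape[0]:.4f}")

print("coefficients:", coef, intercept)
```

The average loss should drop sharply during the first epoch and then flatten, the same pattern as in the log above.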

Evaluation

y_pred = clr.predict(X_test)
sl2 = mean_squared_error(y_test, y_pred)
sr2 = r2_score(y_test, y_pred)
print(f"SGDRegressor: sl2={sl2}, sr2={sr2}")

SGDRegressor: sl2=102.6936035240574, sr2=0.9440536576054526

Linear Regression with pytorch

class TorchLinearRegression(torch.nn.Module):
    def __init__(self, n_dims: int, n_targets: int):
        super().__init__()
        self.linear = torch.nn.Linear(n_dims, n_targets)

    def forward(self, x):
        return self.linear(x)


def train_loop(dataloader, model, loss_fn, optimizer):
    total_loss = 0.0

    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for X, y in dataloader:
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred.ravel(), y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # training loss, detached so the running sum does not keep the autograd graph
        total_loss += loss.detach()

    return total_loss


model = TorchLinearRegression(X_train.shape[1], 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

device = "cpu"
model = model.to(device)
dataset = torch.utils.data.TensorDataset(
    torch.Tensor(X_train).to(device), torch.Tensor(y_train).to(device)
)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=1)


for i in range(5):
    loss = train_loop(dataloader, model, loss_fn, optimizer)
    print(f"iteration {i}, loss={loss}")

iteration 0, loss=405055.0
iteration 1, loss=96568.84375
iteration 2, loss=80600.0
iteration 3, loss=79749.5546875
iteration 4, loss=79699.8359375
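The losses printed above are sums over one pass of the training set, because `train_loop` adds up one loss per sample (`batch_size=1`). Dividing by the number of training samples gives a per-sample average that can be compared with a mean squared error. A short sketch of that arithmetic, assuming 750 training samples (the value `T: 750` reported after the first epoch in the SGDRegressor log):

```python
# Summed losses copied from the training output above.
epoch_losses = [405055.0, 96568.84375, 80600.0, 79749.5546875, 79699.8359375]

# Assumption: 750 training samples, as suggested by "T: 750" in the SGDRegressor log.
n_train = 750

avg_losses = [loss / n_train for loss in epoch_losses]
for i, avg in enumerate(avg_losses):
    print(f"iteration {i}: average loss per sample = {avg:.2f}")
```

The last average is about 106.3. It is measured on the training set while the weights are still being updated, so it is only roughly comparable to an error computed on the test set with the final weights.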

Let’s check the error on the test set.

y_pred = model(torch.Tensor(X_test)).detach().numpy()
tl2 = mean_squared_error(y_test, y_pred)
tr2 = r2_score(y_test, y_pred)
print(f"TorchLinearRegression: tl2={tl2}, tr2={tr2}")

TorchLinearRegression: tl2=102.52056525913498, tr2=0.9441479269434102

And the coefficients.

print("coefficients:")
for p in model.parameters():
    print(p)

coefficients:
Parameter containing:
tensor([[ 0.3743,  0.3008,  0.7140, 26.3154, 25.4603]], requires_grad=True)
Parameter containing:
tensor([-0.2682], requires_grad=True)

Conversion to ONNX

Let’s convert it to ONNX.

ep = torch.onnx.export(model, (torch.Tensor(X_test[:2]),), dynamo=True)
onx = ep.model_proto

~/github/onnxscript/onnxscript/converter.py:816: FutureWarning: 'onnxscript.values.Op.param_schemas' is deprecated in version 0.1 and will be removed in the future. Please use '.op_signature' instead.
  param_schemas = callee.param_schemas()
~/github/onnxscript/onnxscript/converter.py:816: FutureWarning: 'onnxscript.values.OnnxFunction.param_schemas' is deprecated in version 0.1 and will be removed in the future. Please use '.op_signature' instead.
  param_schemas = callee.param_schemas()
[torch.onnx] Obtain model graph for `TorchLinearRegression([...]` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `TorchLinearRegression([...]` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅

Let’s check that it works.

sess = InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])
res = sess.run(None, {"x": X_test.astype(np.float32)[:2]})
print(res)

[array([[ 69.933266],
       [-15.684591]], dtype=float32)]
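The input is cast to float32 before calling the session because the model was exported from float32 tensors, while the arrays produced by scikit-learn are float64. Small differences against a float64 computation are therefore expected. The sketch below illustrates their order of magnitude with NumPy only; the weights are rounded values copied from the parameters printed earlier, and the input rows are hypothetical, not taken from X_test.

```python
import numpy as np

# Rounded weights and bias copied from the printed torch parameters.
weight = np.array([[0.3743, 0.3008, 0.7140, 26.3154, 25.4603]])
bias = np.array([-0.2682])

# Hypothetical input rows (not taken from X_test).
x64 = np.array(
    [
        [0.1, -0.2, 0.3, 1.5, 1.2],
        [0.0, 0.4, -0.1, -0.5, -0.1],
    ]
)

# Reference computation in float64.
expected = x64 @ weight.T + bias

# Same computation after casting everything to float32.
x32 = x64.astype(np.float32)
got = x32 @ weight.astype(np.float32).T + bias.astype(np.float32)

print(got.dtype, "max abs difference:", np.abs(expected - got).max())

# A tolerance suited to float32 is needed when comparing with float64 results.
np.testing.assert_allclose(got, expected, rtol=1e-5, atol=1e-5)
```

The same kind of tolerance is a reasonable starting point when comparing the output of onnxruntime with predictions computed in float64.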

And here is the graph of the model.

plot_dot(onx)

Optimization

By default, the exported model is not optimized and contains many local functions. Calling the optimize method inlines them and optimizes the model.

ep.optimize()
onx = ep.model_proto

plot_dot(onx)

With dynamic shapes

The dynamic shapes are passed on to torch.export.export() and must follow the convention described in its documentation. Here, dimension 0 of input x is declared dynamic and named batch.

ep = torch.onnx.export(
    model,
    (torch.Tensor(X_test[:2]),),
    dynamic_shapes={"x": {0: torch.export.Dim("batch")}},
    dynamo=True,
)
ep.optimize()
onx = ep.model_proto

print(pretty_onnx(onx))

[torch.onnx] Obtain model graph for `TorchLinearRegression([...]` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `TorchLinearRegression([...]` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅
opset: domain='' version=18
input: name='x' type=dtype('float32') shape=['batch', 5]
init: name='linear.weight' type=float32 shape=(1, 5)
init: name='linear.bias' type=float32 shape=(1,) -- array([-0.2681867], dtype=float32)
Gemm(x, linear.weight, linear.bias, beta=1.00, transB=1, alpha=1.00, transA=0) -> linear
output: name='linear' type=dtype('float32') shape=['batch', 1]
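The single Gemm node in this graph computes alpha * A @ B' + beta * C when transB=1, which is the same affine map as torch.nn.Linear. A minimal NumPy reference of that formula follows. The function name `gemm_reference` and the random input batches are hypothetical, and the weights are the rounded values printed earlier. Because the first dimension is dynamic, the same weights apply to any batch size.

```python
import numpy as np


def gemm_reference(a, b, c, alpha=1.0, beta=1.0, trans_a=False, trans_b=False):
    """NumPy reference for the ONNX Gemm formula: alpha * A' @ B' + beta * C."""
    a = a.T if trans_a else a
    b = b.T if trans_b else b
    # C is broadcast against the (M, N) product, as a bias of shape (N,) would be.
    return alpha * (a @ b) + beta * c


# Rounded weights and bias copied from the printed torch parameters.
weight = np.array([[0.3743, 0.3008, 0.7140, 26.3154, 25.4603]], dtype=np.float32)
bias = np.array([-0.2682], dtype=np.float32)

rng = np.random.default_rng(0)
for batch in (1, 2, 7):
    # Hypothetical inputs with a varying batch dimension.
    x = rng.normal(size=(batch, 5)).astype(np.float32)
    out = gemm_reference(x, weight, bias, trans_b=True)
    print(f"batch={batch}: output shape={out.shape}")
```

Each call returns an array of shape (batch, 1), matching the output shape ['batch', 1] reported for the exported model.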

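For simplicity, it is possible to use torch.export.Dim.DYNAMIC or torch.export.Dim.AUTO instead of a named dimension. The dynamic dimension then receives an automatically generated name, such as s35 in the output below.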

ep = torch.onnx.export(
    model,
    (torch.Tensor(X_test[:2]),),
    dynamic_shapes={"x": {0: torch.export.Dim.AUTO}},
    dynamo=True,
)
ep.optimize()
onx = ep.model_proto

print(pretty_onnx(onx))

[torch.onnx] Obtain model graph for `TorchLinearRegression([...]` with `torch.export.export(..., strict=False)`...
[torch.onnx] Obtain model graph for `TorchLinearRegression([...]` with `torch.export.export(..., strict=False)`... ✅
[torch.onnx] Run decomposition...
[torch.onnx] Run decomposition... ✅
[torch.onnx] Translate the graph into ONNX...
[torch.onnx] Translate the graph into ONNX... ✅
opset: domain='' version=18
input: name='x' type=dtype('float32') shape=['s35', 5]
init: name='linear.weight' type=float32 shape=(1, 5)
init: name='linear.bias' type=float32 shape=(1,) -- array([-0.2681867], dtype=float32)
Gemm(x, linear.weight, linear.bias, beta=1.00, transB=1, alpha=1.00, transA=0) -> linear
output: name='linear' type=dtype('float32') shape=['s35', 1]
