Trees
Digging into the tree structure
- mlinsights.mltree.tree_structure.predict_leaves(model, X)
Returns the leaf each observation of X falls into.
- Parameters:
model – a decision tree
X – observations
- Returns:
array of leaves
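scikit-learn exposes the same notion directly: `apply(X)` on a fitted tree returns, for each row of X, the index of the node (in `tree_`) where that row ends up, and that node is always a leaf. The sketch below only illustrates what "the leaf an observation falls into" means with that method; it makes no claim about how `predict_leaves` is implemented.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# 3x3 grid of points, one class per point.
X = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
y = np.arange(X.shape[0])

clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# apply() returns the node index of the leaf reached by each observation.
leaves = clf.apply(X)
print(leaves)

# Every returned index is a leaf: it has no left child (-1 in scikit-learn).
assert np.all(clf.tree_.children_left[leaves] == -1)
```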
- mlinsights.mltree.tree_structure.tree_find_common_node(tree, i, j, parents=None)
Finds the common ancestor of nodes i and j.
- Parameters:
tree – tree
i – node index (tree.nodes[i])
j – node index (tree.nodes[j])
parents – precomputed parents (None -> calls tree_node_parents())
- Returns:
common root, remaining path to i, remaining path to j
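The idea can be sketched on a fitted scikit-learn tree, assuming only its public `tree_` arrays `children_left` and `children_right` (-1 marks a missing child). Both paths are built from the root down, and the last node they share is the common ancestor; what follows it on each path is the remaining path to i and to j. `node_parents` and `find_common_node` are hypothetical names for this illustration, not the mlinsights functions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def node_parents(tree):
    """Hypothetical helper: maps every non-root node index to its parent index."""
    parents = {}
    for node in range(tree.node_count):
        for child in (tree.children_left[node], tree.children_right[node]):
            if child != -1:  # -1 means "no child", i.e. node is a leaf
                parents[int(child)] = node
    return parents


def find_common_node(tree, i, j, parents=None):
    """Hypothetical sketch: (common ancestor, path below it to i, path below it to j)."""
    if parents is None:
        parents = node_parents(tree)

    def path_from_root(k):
        path = [k]
        while path[-1] in parents:
            path.append(parents[path[-1]])
        return path[::-1]  # root first

    pi, pj = path_from_root(i), path_from_root(j)
    # Walk down both paths while they agree; both start with the root.
    n = 0
    while n < min(len(pi), len(pj)) and pi[n] == pj[n]:
        n += 1
    return pi[n - 1], pi[n:], pj[n:]


X = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, np.arange(9))
tree = clf.tree_
leaves = np.flatnonzero(tree.children_left == -1)
root, to_i, to_j = find_common_node(tree, int(leaves[0]), int(leaves[-1]))
print(root, to_i, to_j)
```

With scikit-learn's depth-first numbering, the first and last leaves sit on opposite sides of the root, so their common ancestor here is node 0.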
- mlinsights.mltree.tree_structure.tree_find_path_to_root(tree, i, parents=None)
Lists the nodes on the path between the root and node i.
- Parameters:
tree – tree
i – node index (tree.nodes[i])
parents – precomputed parents (None -> calls tree_node_parents())
- Returns:
list of the node indices on the path between node i and the root
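A path to the root only needs the parent of each node. The sketch below, assuming a fitted scikit-learn tree, stores parents in an array (-1 for the root) and climbs from node i until it reaches the root. `path_to_root` is a hypothetical name, and listing the path leaf first is a choice of this sketch rather than a statement about the order returned by `tree_find_path_to_root`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def path_to_root(tree, i):
    """Hypothetical helper: node indices from node i up to the root (node 0)."""
    # Parent of each node; the root keeps -1.
    parent = np.full(tree.node_count, -1, dtype=np.int64)
    for node in range(tree.node_count):
        for child in (tree.children_left[node], tree.children_right[node]):
            if child != -1:
                parent[child] = node
    path = [int(i)]
    while parent[path[-1]] != -1:
        path.append(int(parent[path[-1]]))
    return path


rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = X[:, 0] + X[:, 1]
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

leaf = int(reg.apply(X[:1])[0])  # leaf reached by the first observation
path = path_to_root(reg.tree_, leaf)
print(path)
```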
- mlinsights.mltree.tree_structure.tree_node_parents(tree)
Returns a dictionary {node_id: parent_id}.
- Parameters:
tree – tree
- Returns:
parents
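For a scikit-learn tree, such a dictionary can be derived from the `children_left` and `children_right` arrays of `tree_`: every split node is the parent of both of its children, and leaves have `-1` as children. The function below is a hypothetical re-implementation for illustration only; the root (node 0) has no entry.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def node_parents(tree):
    """Hypothetical sketch: {node_id: parent_id} for every non-root node."""
    parents = {}
    for parent in range(tree.node_count):
        left, right = tree.children_left[parent], tree.children_right[parent]
        if left != -1:  # split node: both children exist
            parents[int(left)] = parent
            parents[int(right)] = parent
    return parents


X = np.array([[0.0], [1.0], [2.0], [3.0]])
clf = DecisionTreeClassifier(random_state=0).fit(X, [0, 1, 2, 3])
parents = node_parents(clf.tree_)
print(parents)
```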
- mlinsights.mltree.tree_structure.tree_node_range(tree, i, parents=None)
Determines the range of a node along every dimension. nan means infinity, i.e. no bound on that side.
- Parameters:
tree – tree
i – node index (tree.nodes[i])
parents – precomputed parents (None -> calls tree_node_parents())
- Returns:
one array of size (D, 2) where D is the number of dimensions; row d holds the lower and upper bounds along dimension d
The following example shows what the function returns for a simple grid in two dimensions.
<<<
import numpy
from sklearn.tree import DecisionTreeClassifier
from mlinsights.mltree import tree_leave_index, tree_node_range

# 3x3 grid of points, each one with its own class.
X = numpy.array(
    [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [2, 0], [2, 1], [2, 2]]
)
y = list(range(X.shape[0]))
clr = DecisionTreeClassifier(max_depth=4)
clr.fit(X, y)
leaves = tree_leave_index(clr)
# Range of the first leaf along both dimensions.
ra = tree_node_range(clr, leaves[0])
print(ra)
>>>
[[nan 0.5]
 [nan 0.5]]
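The ranges follow from the splits met between the node and the root: reaching a node through a left branch means `x[f] <= t`, which caps the upper bound of feature f, and a right branch means `x[f] > t`, which raises its lower bound. The sketch below applies that rule on a fitted scikit-learn tree and, like the function above, uses nan for a missing bound. `node_range` is a hypothetical name, not the library's code.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def node_range(tree, i):
    """Hypothetical sketch: (D, 2) array of [lower, upper] per feature, nan = unbounded."""
    parent = {}
    for node in range(tree.node_count):
        if tree.children_left[node] != -1:
            parent[int(tree.children_left[node])] = node
            parent[int(tree.children_right[node])] = node
    bounds = np.empty((tree.n_features, 2))
    bounds[:, 0] = -np.inf
    bounds[:, 1] = np.inf
    node = int(i)
    while node in parent:
        p = parent[node]
        f, t = tree.feature[p], tree.threshold[p]
        if node == tree.children_left[p]:
            # Left branch: x[f] <= t, tighten the upper bound.
            bounds[f, 1] = min(bounds[f, 1], t)
        else:
            # Right branch: x[f] > t, tighten the lower bound.
            bounds[f, 0] = max(bounds[f, 0], t)
        node = p
    bounds[np.isinf(bounds)] = np.nan  # nan stands for "no bound"
    return bounds


X = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, np.arange(9))
first_leaf = int(np.flatnonzero(clf.tree_.children_left == -1)[0])
r = node_range(clf.tree_, first_leaf)
print(r)
```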
- mlinsights.mltree.tree_structure.tree_leave_index(model)
Returns the indices of every leaf in a tree.
- Parameters:
model – any object with a tree_ attribute (a fitted scikit-learn decision tree, for instance)
- Returns:
leaf indices
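In scikit-learn's tree representation, a leaf is a node without children, so its entry in `tree_.children_left` is -1. The hypothetical helper below relies on that convention alone and cross-checks the result against `get_n_leaves()` and `apply()`; it is an illustration, not the mlinsights code.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def leaf_indices(model):
    """Hypothetical helper: indices of the leaves of model.tree_."""
    tree = model.tree_
    # A leaf has no children: children_left == children_right == -1.
    return np.flatnonzero(tree.children_left == -1)


X = np.linspace(0, 1, 20).reshape(-1, 1)
reg = DecisionTreeRegressor(max_depth=2).fit(X, X.ravel() ** 2)
leaves = leaf_indices(reg)
print(leaves)
```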
- mlinsights.mltree.tree_structure.tree_leave_neighbors(model)
Determines which leaves are neighbors. The method can use a lot of memory because it builds a grid over the feature space, and each split multiplies the number of cells by two.
- Parameters:
model – a sklearn.tree.DecisionTreeRegressor, a sklearn.tree.DecisionTreeClassifier, or any model with a tree_ attribute
- Returns:
a dictionary {(i, j): (dimension, x1, x2)} where i < j are node indices: if \(X_d \cdot sign < th \cdot sign\), the observation goes to node i, otherwise to node j. The border lies somewhere in the segment [x1, x2]. In the example below, each value is a list of such tuples and x1, x2 are given as coordinates.
The following example shows what the function returns for a simple grid in two dimensions.
<<<
import pprint

import numpy
from sklearn.tree import DecisionTreeClassifier
from mlinsights.mltree import tree_leave_neighbors

# 3x3 grid of points, each one with its own class.
X = numpy.array(
    [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [2, 0], [2, 1], [2, 2]]
)
y = list(range(X.shape[0]))
clr = DecisionTreeClassifier(max_depth=4)
clr.fit(X, y)
nei = tree_leave_neighbors(clr)
pprint.pprint(nei)
>>>
{(np.int32(2), np.int32(4)): [(np.int64(0), (np.float64(0.0), np.float64(0.0)), (np.float64(1.0), np.float64(0.0)))],
 (np.int32(2), np.int32(8)): [(np.int64(1), (np.float64(0.0), np.float64(0.0)), (np.float64(0.0), np.float64(1.0)))],
 (np.int32(4), np.int32(5)): [(np.int64(0), (np.float64(1.0), np.float64(0.0)), (np.float64(2.0), np.float64(0.0)))],
 (np.int32(4), np.int32(12)): [(np.int64(1), (np.float64(1.0), np.float64(0.0)), (np.float64(1.0), np.float64(1.0)))],
 (np.int32(5), np.int32(13)): [(np.int64(1), (np.float64(2.0), np.float64(0.0)), (np.float64(2.0), np.float64(1.0)))],
 (np.int32(8), np.int32(9)): [(np.int64(1), (np.float64(0.0), np.float64(1.0)), (np.float64(0.0), np.float64(2.0)))],
 (np.int32(8), np.int32(12)): [(np.int64(0), (np.float64(0.0), np.float64(1.0)), (np.float64(1.0), np.float64(1.0)))],
 (np.int32(9), np.int32(15)): [(np.int64(0), (np.float64(0.0), np.float64(2.0)), (np.float64(1.0), np.float64(2.0)))],
 (np.int32(12), np.int32(13)): [(np.int64(0), (np.float64(1.0), np.float64(1.0)), (np.float64(2.0), np.float64(1.0)))],
 (np.int32(12), np.int32(15)): [(np.int64(1), (np.float64(1.0), np.float64(1.0)), (np.float64(1.0), np.float64(2.0)))],
 (np.int32(13), np.int32(16)): [(np.int64(1), (np.float64(2.0), np.float64(1.0)), (np.float64(2.0), np.float64(2.0)))],
 (np.int32(15), np.int32(16)): [(np.int64(0), (np.float64(1.0), np.float64(2.0)), (np.float64(2.0), np.float64(2.0)))]}
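The grid idea can be sketched with public scikit-learn pieces only: collect the thresholds used on each feature, place one representative point inside every interval they delimit, send the Cartesian product of those points through `apply`, and call two leaves neighbors when adjacent grid cells land in different leaves. `grid_leaf_neighbors` is a hypothetical name; unlike `tree_leave_neighbors`, it only returns the set of dimensions along which two leaves touch, not the (dimension, x1, x2) tuples, and it is not claimed to match the library's implementation. In this sketch the grid has the product over features of (distinct thresholds + 1) cells, so it grows multiplicatively with the splits, which is where the memory cost comes from.

```python
import itertools

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def grid_leaf_neighbors(model):
    """Hypothetical sketch: {(leaf_a, leaf_b): {dimensions}} for leaves sharing a border."""
    tree = model.tree_
    is_split = tree.children_left != -1
    axes = []
    for d in range(tree.n_features):
        th = np.unique(tree.threshold[is_split & (tree.feature == d)])
        if th.size == 0:
            axes.append(np.array([0.0]))  # feature never used: a single cell
            continue
        # One point inside every interval delimited by the thresholds.
        mids = (th[:-1] + th[1:]) / 2
        axes.append(np.concatenate([[th[0] - 1.0], mids, [th[-1] + 1.0]]))

    # itertools.product varies the last axis fastest, matching a C-order reshape.
    points = np.array(list(itertools.product(*axes)))
    cells = model.apply(points).reshape([len(a) for a in axes])

    neighbors = {}
    for d in range(tree.n_features):
        # Pair every cell with the next one along dimension d.
        a = np.take(cells, range(cells.shape[d] - 1), axis=d).ravel()
        b = np.take(cells, range(1, cells.shape[d]), axis=d).ravel()
        for u, v in zip(a, b):
            if u != v:
                key = (int(min(u, v)), int(max(u, v)))
                neighbors.setdefault(key, set()).add(d)
    return neighbors


X = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, np.arange(9))
nei = grid_leaf_neighbors(clf)
print(len(nei))
```

On the 3x3 grid every cell ends in its own leaf, so the sketch finds one pair per pair of adjacent cells, the same count as in the output above.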
Experiments, exercise
- mlinsights.mltree.tree_digitize.digitize2tree(bins, right=False)
Builds a decision tree which returns the same result as lambda x: numpy.digitize(x, bins, right=right) (see numpy.digitize).
- Parameters:
bins – array of bins. It has to be 1-dimensional and monotonic.
right – indicates whether the intervals include the right or the left bin edge. The default (right=False) means each interval includes its left edge and excludes its right edge, i.e., bins[i-1] <= x < bins[i] for monotonically increasing bins.
- Returns:
decision tree
Note
The implementation of decision trees in scikit-learn only allows one type of decision (<=). That's why the function raises an exception when right=False. This could be overcome with ONNX, which implements all kinds of decision rules. The default value of right is still False to follow the numpy API, even though this value raises an exception in digitize2tree.
The following example shows what the tree looks like.
<<<
import numpy
from sklearn.tree import export_text
from mlinsights.mltree import digitize2tree

x = numpy.array([0.2, 6.4, 3.0, 1.6])
bins = numpy.array([0.0, 1.0, 2.5, 4.0, 7.0])
expected = numpy.digitize(x, bins, right=True)
tree = digitize2tree(bins, right=True)
pred = tree.predict(x.reshape((-1, 1)))
print("Comparison with numpy:")
print(expected, pred)
print("Tree:")
print(export_text(tree, feature_names=["x"]))
>>>
Comparison with numpy:
[1 4 3 2] [1. 4. 3. 2.]
Tree:
|--- x <= 2.50
|   |--- x <= 1.00
|   |   |--- x <= 0.00
|   |   |   |--- value: [0.00]
|   |   |--- x > 0.00
|   |   |   |--- value: [1.00]
|   |--- x > 1.00
|   |   |--- value: [2.00]
|--- x > 2.50
|   |--- x <= 4.00
|   |   |--- x <= 2.50
|   |   |   |--- value: [2.00]
|   |   |--- x > 2.50
|   |   |   |--- value: [3.00]
|   |--- x > 4.00
|   |   |--- x <= 7.00
|   |   |   |--- x <= 4.00
|   |   |   |   |--- value: [3.00]
|   |   |   |--- x > 4.00
|   |   |   |   |--- value: [4.00]
|   |   |--- x > 7.00
|   |   |   |--- value: [5.00]
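The Note above says scikit-learn trees only test x <= threshold. That single comparison is enough for numpy.digitize(x, bins, right=True) with increasing bins: the result is the smallest index i with x <= bins[i], or len(bins) if there is none. The hypothetical `digitize_le` below is a plain binary search written with that comparison only, descending the bins the way a tree descends its nodes; it illustrates the idea and is not the code of digitize2tree.

```python
import numpy as np


def digitize_le(x, bins):
    """Hypothetical sketch of numpy.digitize(x, bins, right=True) for increasing bins,
    using only `<=` tests, like the nodes of a scikit-learn decision tree."""
    out = np.empty(len(x), dtype=np.int64)
    for k, v in enumerate(x):
        lo, hi = 0, len(bins)
        # Invariant: the answer lies in [lo, hi].
        while lo < hi:
            mid = (lo + hi) // 2
            if v <= bins[mid]:  # same test as a tree node "x <= threshold"
                hi = mid
            else:
                lo = mid + 1
        out[k] = lo
    return out


x = np.array([0.2, 6.4, 3.0, 1.6, 2.5, 7.0, -1.0, 9.0])
bins = np.array([0.0, 1.0, 2.5, 4.0, 7.0])
print(digitize_le(x, bins), np.digitize(x, bins, right=True))
```

With right=False the boundary test would become x < bins[i], which a `<=` node cannot express directly; that is the case the Note says raises an exception.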