mlstatpy.ml.roc

class mlstatpy.ml.roc.ROC(y_true=None, y_score=None, sample_weight=None, df=None)

Helper to draw a ROC curve.

Initialisation with a dataframe and two or three columns:

column 1: score (y_score)
column 2: expected answer (boolean) (y_true)
column 3: weight (optional) (sample_weight)

Parameters:

y_true – if df is None, y_true, y_score and sample_weight must be filled. y_true tells whether or not the answer is true, that is, whether the prediction is right.
y_score – score of the prediction
sample_weight – weights
df – dataframe, array or list; it must contain 2 or 3 columns, always in the same order

class CurveType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Curve types:

PROBSCORE: 1 - False Positive / True Positive
ERRPREC: error / recall
RECPREC: precision / recall
ROC: False Positive / True Positive
SKROC: False Positive / True Positive (scikit-learn)

property Data: Returns the underlying dataframe.

auc(cloud=None)

Computes the area under the curve (AUC).

Parameters: cloud – data, or None to use self.data; the function assumes the data is sorted.
Returns: AUC

The first column is the label, the second one is the score, the third one is the weight.
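As an illustration of what an area under the curve measures, here is a minimal standalone sketch, not the library's implementation. It assumes rows laid out as (label, score, weight), in the column order described above, and computes the weighted probability that a right answer gets a higher score than a wrong one, counting ties as one half. The name `weighted_auc` is hypothetical.

```python
def weighted_auc(rows):
    """Weighted pairwise AUC over (label, score, weight) rows.

    Hypothetical helper for illustration only; it does not require
    the rows to be sorted.
    """
    pos = [(s, w) for y, s, w in rows if y]
    neg = [(s, w) for y, s, w in rows if not y]
    total = sum(w for _, w in pos) * sum(w for _, w in neg)
    if total == 0:
        raise ValueError("both classes need a nonzero total weight")
    wins = 0.0
    for sp, wp in pos:
        for sn, wn in neg:
            if sp > sn:
                wins += wp * wn          # right answer ranked above wrong one
            elif sp == sn:
                wins += 0.5 * wp * wn    # ties count for one half
    return wins / total


rows = [(1, 0.9, 1.0), (1, 0.8, 1.0), (0, 0.7, 1.0), (1, 0.6, 1.0), (0, 0.2, 1.0)]
auc_value = weighted_auc(rows)  # 5 of the 6 (right, wrong) pairs are well ordered
```

The double loop is quadratic in the number of rows; it is written for readability rather than speed.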

auc_interval(bootstrap=10, alpha=0.95)

Determines a confidence interval for the AUC with bootstrap.

Parameters:

bootstrap – number of random estimations
alpha – defines the confidence interval

Returns: dictionary of values
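The general idea of a bootstrap interval can be sketched as follows. This is a standalone illustration under stated assumptions, not the method's actual code: rows are (label, score) pairs, the interval is a simple percentile interval, and the dictionary keys `auc`, `lower` and `upper` are hypothetical (the keys returned by the library are not documented here).

```python
import random


def pairwise_auc(rows):
    """AUC of (label, score) rows, or None when one class is missing."""
    pos = [s for y, s in rows if y]
    neg = [s for y, s in rows if not y]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(rows, bootstrap=10, alpha=0.95, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative sketch)."""
    if bootstrap < 1 or pairwise_auc(rows) is None:
        raise ValueError("need bootstrap >= 1 and both classes in the data")
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < bootstrap:
        # Resample the rows with replacement, keeping the sample size.
        sample = [rng.choice(rows) for _ in rows]
        value = pairwise_auc(sample)
        if value is not None:  # skip samples that lost one of the classes
            estimates.append(value)
    estimates.sort()
    tail = (1.0 - alpha) / 2.0
    last = len(estimates) - 1
    lower = estimates[int(tail * last)]
    upper = estimates[int(round((1.0 - tail) * last))]
    return {"auc": pairwise_auc(rows), "lower": lower, "upper": upper}


rows = [(1, 0.9), (1, 0.8), (0, 0.7), (1, 0.6), (0, 0.2)]
interval = bootstrap_auc_interval(rows, bootstrap=50, alpha=0.95, seed=0)
```

With so few rows the interval is very wide; the fixed seed only makes the sketch reproducible.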

compute_roc_curve(nb=100, curve=CurveType.ROC, bootstrap=False)

Computes a ROC curve with nb points. If nb == -1, the curve has as many points as the data contains. If bootstrap == True, it draws random numbers to build a confidence interval based on the bootstrap method.

Parameters:

nb – number of points for the curve
curve – see CurveType
bootstrap – builds the curve after resampling

Returns: DataFrame (metrics and threshold)

If curve is SKROC, the parameter nb is not taken into account. It should be set to 0.
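To make the notion of a curve built from thresholds concrete, the sketch below computes one (false positive rate, true positive rate, threshold) point per distinct score, which loosely corresponds to the nb == -1 case. It is a standalone illustration in plain Python, not the library's code, and the function name `roc_points` is hypothetical.

```python
def roc_points(y_true, y_score):
    """Return a list of (fpr, tpr, threshold) tuples, one per distinct score."""
    n_pos = sum(1 for y in y_true if y)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    # Walk through the observations from the highest score to the lowest.
    order = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0, None)]  # nothing is accepted above the highest score
    tp = fp = 0
    for i, (score, label) in enumerate(order):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all observations tied at this score are counted.
        if i + 1 == len(order) or order[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos, score))
    return points


points = roc_points([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2])
```

Each point answers the question: if every observation with a score at or above the threshold is accepted, which fractions of the wrong and of the right answers get accepted?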

confusion(score=None, nb=10, curve=CurveType.ROC, bootstrap=False)

Computes the confusion matrix for a specific score, or for all scores if score is None.

Parameters:

score – score or None
nb – number of scores (if score is None)
curve – see CurveType
bootstrap – builds the curve after resampling

Returns: one row if score is specified, many rows if score is None
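For a single score used as a threshold, a confusion matrix reduces to four counts. The sketch below shows that computation in plain Python; it assumes an observation is accepted when its score is greater than or equal to the threshold, and the keys `TP`, `FP`, `TN`, `FN` are hypothetical names, not the columns returned by the method.

```python
def confusion_at(y_true, y_score, threshold):
    """Count TP, FP, TN, FN when accepting scores >= threshold (assumed rule)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for label, score in zip(y_true, y_score):
        accepted = score >= threshold
        if accepted and label:
            counts["TP"] += 1
        elif accepted and not label:
            counts["FP"] += 1
        elif not accepted and label:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


matrix = confusion_at([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2], threshold=0.65)
```

Repeating the call over a grid of thresholds gives one row per threshold, which is the shape described for the case where score is None.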

plot(nb=100, curve=CurveType.ROC, bootstrap=0, ax=None, thresholds=False, **kwargs)

Plots a ROC curve.

Parameters:

nb – number of points
curve – see CurveType
bootstrap – number of curves for the bootstrap (0 for none)
ax – axis
thresholds – use thresholds for the X axis
kwargs – sent to pandas.plot

Returns: ax

precision(): Computes the precision.

random_cloud()

Resamples among the data.

Returns: DataFrame

roc_intersect(roc, x)

The ROC curve is defined by a set of points. This function interpolates those points to determine y for any x.

Parameters:

roc – ROC curve
x – x

Returns: y
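Linear interpolation is one natural way to read y off a curve given as a set of points, and it is the choice assumed in the sketch below; the interpolation scheme actually used by the method is not stated here. The helper `interpolate_curve` is hypothetical, takes (x, y) pairs with distinct x values, and clamps x to the range covered by the points.

```python
def interpolate_curve(points, x):
    """Linearly interpolate y at x on a curve given as (x, y) pairs."""
    pts = sorted(points)
    if not pts:
        raise ValueError("the curve needs at least one point")
    # Outside the covered range, clamp to the nearest end point.
    if x <= pts[0][0]:
        return pts[0][1]
    if x >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            if x1 == x0:  # vertical step: keep the upper value
                return max(y0, y1)
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise AssertionError("unreachable for sorted points")


curve = [(0.0, 0.0), (0.5, 0.6), (1.0, 1.0)]
y_quarter = interpolate_curve(curve, 0.25)        # halfway along the first segment
y_three_quarters = interpolate_curve(curve, 0.75)  # halfway along the second segment
```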

roc_intersect_interval(x, nb, curve=CurveType.ROC, bootstrap=10, alpha=0.05)

Computes a confidence interval for the value returned by roc_intersect.

Parameters:

x – x
nb – number of curves to draw
curve – see CurveType
bootstrap – number of random estimations
alpha – confidence interval

Returns: dictionary