Model error quantification

September 11, 2020

When we train a machine-learning model, we almost always report some performance metric, such as accuracy, recall, or F1-score.

However, training involves some inherent randomness (for example, in how the data is split), so different training runs will produce different metric values.
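To make this concrete, here is a minimal sketch using only the Python standard library. It is not taken from the article: the synthetic data, the simple least-squares line fit, and the helper names (`make_data`, `fit_line`, `r2_score`, `score_for_split`) are all assumptions chosen for illustration. It fits the same model on the same data and changes only the seed used for the train/test split.

```python
import random
import statistics


def make_data(n_points=200, noise=1.0, seed=0):
    """Generate hypothetical data: y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n_points)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_score(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def score_for_split(xs, ys, seed, test_size=0.3):
    """Shuffle with the given seed, hold out test_size, return the test R2."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
    preds = [slope * xs[i] + intercept for i in test]
    return r2_score([ys[i] for i in test], preds)


xs, ys = make_data()
# Same data, same model; only the split seed changes.
scores = [score_for_split(xs, ys, seed) for seed in range(5)]
print(["%.4f" % s for s in scores])
```

The printed scores should differ slightly from seed to seed, which is the spread we want to quantify.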

Benjamin Obi Tayo’s article “Random Error Quantification in Machine Learning” has a nice summary of how to quantify and visualize the error associated with your metric, using the output of your cross-validation steps:

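import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split

# Assumed to be defined earlier (see the notebook): X and y (NumPy arrays),
# n (number of random states to try), pipe_lr (a regression pipeline),
# and sc_y (a scaler for the target, e.g. StandardScaler).
train_score = []

for i in range(n):
    # Draw a different 70/30 train/test split for each random state.
    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        test_size=0.3,
        random_state=i,
    )
    # Standardize the target; the scaler expects a 2-D column.
    y_train_std = sc_y.fit_transform(y_train[:, np.newaxis]).flatten()
    # Record the mean R2 over 10 cross-validation folds.
    train_score.append(
        np.mean(
            cross_val_score(pipe_lr, X_train, y_train_std, scoring='r2', cv=10)
        )
    )

# Convert to an array so the arithmetic in the plot below works element-wise.
train_score = np.array(train_score)
train_mean = np.mean(train_score)
train_std = np.std(train_score)
print('R2 train: %.3f +/- %.3f' % (train_mean, train_std))

# Plot the score for each random state with a +/- 2 standard deviation band.
plt.figure(figsize=(11, 7))
plt.plot(range(n), train_score, color='blue', linestyle='dashed',
         marker='o', markerfacecolor='red', markersize=10)
plt.fill_between(range(n),
                 train_score + 2*train_std,
                 train_score - 2*train_std,
                 alpha=0.15, color='green')
plt.grid()
plt.ylim(0.8, 1)
plt.title('Mean cross-validation R2 score vs. random state parameter', size=14)
plt.xlabel('Random state parameter', size=14)
plt.ylabel('Mean cross-validation R2 score', size=14)
plt.show()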

Figure: Metric values (mean cross-validation R2 score vs. random state parameter)

You can see the full details in the associated Jupyter Notebook.
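The summary step itself (mean plus or minus standard deviation) can be sketched independently of scikit-learn. The helper `summarize_scores` and the score values below are hypothetical placeholders for illustration, not results from the article or the notebook. Note that `np.std` computes the population standard deviation by default (`ddof=0`), which `statistics.pstdev` matches; with only a handful of runs, the sample standard deviation (`statistics.stdev`, or `np.std(..., ddof=1)`) may be the more appropriate choice.

```python
import statistics


def summarize_scores(scores, n_sigma=2):
    """Return the mean, standard deviation, and an n_sigma band for the scores."""
    mean = statistics.fmean(scores)
    # Population standard deviation, matching np.std's default (ddof=0).
    std = statistics.pstdev(scores)
    return {
        "mean": mean,
        "std": std,
        "low": mean - n_sigma * std,
        "high": mean + n_sigma * std,
    }


# Made-up scores, one per random state, purely for illustration.
example_scores = [0.91, 0.93, 0.92, 0.90, 0.94]
summary = summarize_scores(example_scores)
print("R2 train: %.3f +/- %.3f" % (summary["mean"], summary["std"]))
# prints: R2 train: 0.920 +/- 0.014
```

Reporting the spread alongside the mean makes it clear how much of a difference between two models could be explained by the choice of random state alone.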

Francis T. O'Donovan
