AI News, Machine Learning for Diabetes

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline

    diabetes = pd.read_csv('diabetes.csv')
    print(diabetes.columns)

Output:

    Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
           'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'],
          dtype='object')

    diabetes.head()

Figure 1

The diabetes data set consists of 768 data points with 9 columns each: 8 features plus the "Outcome" column.

    print('dimension of diabetes data: {}'.format(diabetes.shape))

Output:

    dimension of diabetes data: (768, 9)

"Outcome" is the column we are going to predict: 0 means no diabetes, 1 means diabetes.

To make a prediction for a new data point, the k-nearest neighbors algorithm finds the closest data points in the training data set, its "nearest neighbors."

First, let's investigate whether we can confirm the connection between model complexity and accuracy:

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(
        diabetes.loc[:, diabetes.columns != 'Outcome'], diabetes['Outcome'],
        stratify=diabetes['Outcome'], random_state=66)

    from sklearn.neighbors import KNeighborsClassifier
    training_accuracy = []
    test_accuracy = []
    # The loop header is missing from this excerpt; the range of
    # n_neighbors values below is an assumption.
    neighbors_settings = range(1, 11)
    for n_neighbors in neighbors_settings:
        # build the model
        knn = KNeighborsClassifier(n_neighbors=n_neighbors)
        knn.fit(X_train, y_train)
        # record training set accuracy
        training_accuracy.append(knn.score(X_train, y_train))
        # record test set accuracy
        test_accuracy.append(knn.score(X_test, y_test))

    plt.plot(neighbors_settings, training_accuracy, label='training accuracy')
    plt.plot(neighbors_settings, test_accuracy, label='test accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('n_neighbors')
    plt.legend()
    plt.savefig('knn_compare_model')

Figure 5

The above plot shows the training and test set accuracy on the y-axis against the setting of n_neighbors on the x-axis.
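The nearest-neighbor idea itself can be sketched in a few lines of NumPy. This is only an illustration of majority voting among the closest training points, not how scikit-learn implements KNeighborsClassifier; the function name knn_predict and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, n_neighbors=3):
    """Predict the class of x_new by majority vote among its nearest neighbors."""
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the n_neighbors closest training points
    nearest = np.argsort(distances)[:n_neighbors]
    # Majority vote over the neighbors' labels (ties go to the smaller label)
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))

# Toy data (hypothetical values, not taken from diabetes.csv)
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_low = knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), n_neighbors=3)
pred_high = knn_predict(X_toy, y_toy, np.array([4.0, 4.0]), n_neighbors=3)
print(pred_low, pred_high)
```

With n_neighbors=1 the prediction depends on a single training point, which is the most complex model; larger values average over more neighbors and give a smoother decision boundary.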

    from sklearn.linear_model import LogisticRegression
    logreg = LogisticRegression().fit(X_train, y_train)
    print('Training set accuracy: {:.3f}'.format(logreg.score(X_train, y_train)))
    print('Test set accuracy: {:.3f}'.format(logreg.score(X_test, y_test)))

Output:

    Training set accuracy: 0.781
    Test set accuracy: 0.771

The default value of C=1 gives 78% accuracy on the training set and 77% accuracy on the test set.

    logreg100 = LogisticRegression(C=100).fit(X_train, y_train)
    print('Training set accuracy: {:.3f}'.format(logreg100.score(X_train, y_train)))
    print('Test set accuracy: {:.3f}'.format(logreg100.score(X_test, y_test)))

Output:

    Training set accuracy: 0.785
    Test set accuracy: 0.766

Using C=100 results in slightly higher accuracy on the training set and slightly lower accuracy on the test set, which suggests that less regularization and a more complex model may not generalize better than the default setting.
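To see what C does to the model itself, the following sketch fits logistic regression with three values of C and prints the size of the coefficient vector. It uses synthetic data from make_classification rather than diabetes.csv, and the names X_demo, y_demo and coef_norms are hypothetical; max_iter is raised only to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a data set with 8 features (an assumption for illustration)
X_demo, y_demo = make_classification(n_samples=300, n_features=8,
                                     n_informative=4, random_state=0)

coef_norms = {}
for C in (0.001, 1, 100):
    # Smaller C means stronger L2 regularization
    model = LogisticRegression(C=C, max_iter=5000).fit(X_demo, y_demo)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print('C={}: coefficient norm {:.3f}'.format(C, coef_norms[C]))
```

Strong regularization (small C) pulls the coefficients toward zero, while a large C lets them grow to fit the training data more closely.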

    diabetes_features = [x for i, x in enumerate(diabetes.columns) if i != 8]

    # logreg001 is not defined in this excerpt; C=0.001 is assumed from the plot label
    logreg001 = LogisticRegression(C=0.001).fit(X_train, y_train)

    plt.figure(figsize=(8, 6))
    plt.plot(logreg.coef_.T, 'o', label='C=1')
    plt.plot(logreg100.coef_.T, '^', label='C=100')
    plt.plot(logreg001.coef_.T, 'v', label='C=0.001')
    # one tick per feature (8 features, the Outcome column is excluded)
    plt.xticks(range(len(diabetes_features)), diabetes_features, rotation=90)
    plt.hlines(0, 0, diabetes.shape[1])
    plt.ylim(-5, 5)
    plt.xlabel('Feature')
    plt.ylabel('Coefficient magnitude')
    plt.legend()
    plt.savefig('log_coef')

Figure 6

Decision Tree

    from sklearn.tree import DecisionTreeClassifier
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train, y_train)
    print('Accuracy on training set: {:.3f}'.format(tree.score(X_train, y_train)))
    print('Accuracy on test set: {:.3f}'.format(tree.score(X_test, y_test)))

Output:

    Accuracy on training set: 1.000
    Accuracy on test set: 0.714

The accuracy on the training set is 100%, while the test set accuracy is much worse.
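A perfect training score with a much lower test score is the usual sign of a tree that has grown until it memorizes the training set. One common response is pre-pruning, for example limiting max_depth. The sketch below compares an unrestricted tree with a depth-limited one on synthetic data; the data, the depth of 3 and the variable names are assumptions for illustration, not values from the article.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (flip_y) so that memorizing hurts
X_demo, y_demo = make_classification(n_samples=500, n_features=8,
                                     n_informative=4, flip_y=0.1,
                                     random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          stratify=y_demo, random_state=0)

# Unrestricted tree: grows until every training leaf is pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Pre-pruned tree: stops splitting after three levels
pruned_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

for name, model in [('full', full_tree), ('max_depth=3', pruned_tree)]:
    print('{}: depth {}, train {:.3f}, test {:.3f}'.format(
        name, model.get_depth(),
        model.score(X_tr, y_tr), model.score(X_te, y_te)))
```

The depth-limited tree gives up some training accuracy; whether it improves test accuracy depends on the data, so the depth is normally chosen by validation.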

Random Forest

Let's apply a random forest consisting of 100 trees to the diabetes data set:

    from sklearn.ensemble import RandomForestClassifier
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)
    print('Accuracy on training set: {:.3f}'.format(rf.score(X_train, y_train)))
    print('Accuracy on test set: {:.3f}'.format(rf.score(X_test, y_test)))

Output:

    Accuracy on training set: 1.000
    Accuracy on test set: 0.786

The random forest gives us a test set accuracy of 78.6%, better than the logistic regression model or a single decision tree, without tuning any parameters.

To reduce overfitting, we could either apply stronger pre-pruning by limiting the maximum depth or lower the learning rate:

    from sklearn.ensemble import GradientBoostingClassifier

    gb1 = GradientBoostingClassifier(random_state=0, max_depth=1)
    gb1.fit(X_train, y_train)
    print('Accuracy on training set: {:.3f}'.format(gb1.score(X_train, y_train)))
    print('Accuracy on test set: {:.3f}'.format(gb1.score(X_test, y_test)))

Output:

    Accuracy on training set: 0.804
    Accuracy on test set: 0.781

    gb2 = GradientBoostingClassifier(random_state=0, learning_rate=0.01)
    gb2.fit(X_train, y_train)
    print('Accuracy on training set: {:.3f}'.format(gb2.score(X_train, y_train)))
    print('Accuracy on test set: {:.3f}'.format(gb2.score(X_test, y_test)))

Output:

    Accuracy on training set: 0.802
    Accuracy on test set: 0.776

Both methods of decreasing the model complexity reduced the training set accuracy, as expected.

Support Vector Machine

    from sklearn.svm import SVC
    svc = SVC()
    svc.fit(X_train, y_train)
    print('Accuracy on training set: {:.2f}'.format(svc.score(X_train, y_train)))
    print('Accuracy on test set: {:.2f}'.format(svc.score(X_test, y_test)))

Output:

    Accuracy on training set: 1.00
    Accuracy on test set: 0.65

The model overfits quite substantially, with a perfect score on the training set and only 65% accuracy on the test set.

We will need to rescale our data so that all the features are approximately on the same scale:

    from sklearn.preprocessing import MinMaxScaler
    scaler = MinMaxScaler()
    # learn the scaling from the training set
    X_train_scaled = scaler.fit_transform(X_train)
    # apply the scaling learned from the training set to the test set
    X_test_scaled = scaler.transform(X_test)

    svc = SVC()
    svc.fit(X_train_scaled, y_train)
    print('Accuracy on training set: {:.2f}'.format(svc.score(X_train_scaled, y_train)))
    print('Accuracy on test set: {:.2f}'.format(svc.score(X_test_scaled, y_test)))

Output:

    Accuracy on training set: 0.77
    Accuracy on test set: 0.77

Scaling the data made a huge difference!
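When rescaling, the minimum and maximum of each feature are normally learned from the training set only and then reused on the test set, so both sets go through the same transformation. The sketch below shows this on random data; the arrays and variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)
# Hypothetical training and test features on an arbitrary 0-100 scale
X_tr_demo = rng.uniform(0, 100, size=(50, 3))
X_te_demo = rng.uniform(0, 100, size=(20, 3))

scaler_demo = MinMaxScaler()
# fit_transform learns per-feature min and max from the training data
X_tr_scaled = scaler_demo.fit_transform(X_tr_demo)
# transform reuses those training-set values; it does not refit
X_te_scaled = scaler_demo.transform(X_te_demo)

print(X_tr_scaled.min(axis=0), X_tr_scaled.max(axis=0))
print(X_te_scaled.min(axis=0), X_te_scaled.max(axis=0))
```

The scaled training features span exactly 0 to 1, while the scaled test features can fall slightly outside that range. That is expected, because the test set is mapped with the training set's minimum and maximum.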


Accuracy Measures For A Forecast Model

Returns a range of summary measures of the forecast accuracy.

If x is not provided, the function only produces training set accuracy measures of the forecasts based on f['x']-fitted(f).

S3 method for default:

    accuracy(f, x, test = NULL, d = NULL, D = NULL, ...)

Arguments

f
    It will also work with Arima, ets and lm objects if x is omitted, in which case training set accuracy measures are returned.

x
    An optional numerical vector containing actual values of the same length as object, or a time series overlapping with the times of f.

By default, the MASE calculation is scaled using MAE of training set naive forecasts for non-seasonal time series, training set seasonal naive forecasts for seasonal time series and training set mean forecasts for non-time series data.
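The scaling described above can be sketched in Python for the time series cases. This is an illustration of the idea, not the forecast package's implementation; the function name mase, its seasonal_period parameter and the toy numbers are hypothetical, and the mean-forecast scaling for non-time series data is not covered.

```python
import numpy as np

def mase(actual, forecast, train, seasonal_period=1):
    """Mean absolute scaled error.

    The forecast MAE is divided by the MAE of in-sample naive forecasts
    (seasonal_period=1) or seasonal naive forecasts (seasonal_period=m).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    train = np.asarray(train, dtype=float)
    # In-sample naive forecast errors: y[t] - y[t - m]
    naive_errors = train[seasonal_period:] - train[:-seasonal_period]
    scale = np.mean(np.abs(naive_errors))
    return float(np.mean(np.abs(actual - forecast)) / scale)

# Toy series (hypothetical values)
train_demo = [10.0, 12.0, 11.0, 13.0, 12.0]
actual_demo = [14.0, 13.0]
forecast_demo = [13.0, 13.0]

mase_value = mase(actual_demo, forecast_demo, train_demo)
print('MASE: {:.3f}'.format(mase_value))
```

A value below 1 means the forecasts have a smaller mean absolute error than the naive method had on the training set.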

Documentation reproduced from package forecast, version 8.1, License: GPL (>= 3)

Improved alternate test accuracy using weighted training sets

The alternate test paradigm has been proposed as a low-cost replacement to expensive and time consuming conventional specification tests of analog/radio-frequency (RF) integrated circuits.

To construct accurate models across the whole design space, a large set of real data ideally needs to be collected from different wafers and lots over a long period of time, which is not possible in the early production stage.

How To Calibrate Your Monitor

Many monitors don't have accurate colors out of the box. Fortunately, you can often correct this by calibrating your display. How do you do this, and do you necessarily need expensive hardware?...

Ilios Photon 2 Accuracy & Resolution Tests

There are many numbers thrown around when it comes to 3D printer resolution. However not many realize that the actual resolution a 3D printer can achieve is far from what the theoretical numbers...

How to Find the Perfect Print Settings For Your 3D Printer

In this video I show you how to get the perfect print settings for your new printer or filament, enjoy! All music was used with permission from the creator and is royalty free.

Accuracy Flight Test - DCS UH 1H FAT Joe - Part 2 Others

In depth flight dynamics tests of the UH 1H in DCS. This video is part of our accuracy flight test series, which compares helicopter simulator platform and add-on in search for the most realistic...

Accuracy of a print head sensor in a DIY linear drive

Table Of Contents: 00:05 Introduction 00:48 Measurement setup 02:29 Controlling the drive with a microcontroller 04:05 Backlash 06:46 Limits of the linear sensor The project page:

Digital Thickness Gauge Accuracy

It amazes me how accurate this device is. This is a very inexpensive tool from Harbor Freight, and I put it to the test by measuring the thickness of a feeler gauge.

Robot Arm Accuracy Test

First test picking up & stacking 5/8" wooden blocks. 6 DOF arm controlled by PicAxe 20M2, cost ~$50 to build. Described in ROBOT magazine May/June 2012. Quite accurate for a simple analog...

IVA PRO26DX Digital Speaker Management - Crossover Filters Accuracy Test

Introducing the most affordable and professionally accurate Digital Speaker Management, IVA PRO26DX. Our use of the word professional is based on the performance and accuracy of our IVA digital...

3D Scanning 101: Resolution vs Accuracy Difference

To learn more about our 3D Scanning services, including reverse engineering and inspection visit - To learn more about portable 3D scanners visit -

Ruger 10/22 Accuracy Gains by Torquing ONE Screw?!

Wheeler Firearm Accurizing Torque FAT Wrench vs 10/22 takedown screw. Will it tighten my groups?