Page code:model_selection: created by asa (2015/10/05 13:25), revised by asa (2015/10/05 13:56).
</code>
The simplest form of model evaluation holds out part of the data as a test set. The classifier is fit on the remaining examples, and its accuracy is measured on the held-out ones. Here 40% of the examples are held out:
<code python>
In [9]: X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)
In [10]: classifier = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
In [11]: classifier.score(X_test, y_test)
Out[11]: 0.7592592592592593
</code>
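The session above uses the sklearn.cross_validation module from scikit-learn of that time. That module was deprecated in scikit-learn 0.18 and removed in 0.20. Its functions, including train_test_split, cross_val_score, cross_val_predict and StratifiedKFold, now live in sklearn.model_selection. Below is a minimal sketch of the same hold-out evaluation using the newer module. It assumes a synthetic two-class dataset from make_classification as a stand-in for the page's X and y. The sizes of 270 examples and 13 features are arbitrary choices, so the score will not match Out[11].

<code python>
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Assumption: a synthetic two-class dataset stands in for the page's X and y.
X, y = datasets.make_classification(n_samples=270, n_features=13, random_state=0)

# Hold out 40% of the examples; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# Fit on the training portion only, then measure accuracy on the held-out portion.
classifier = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
test_accuracy = classifier.score(X_test, y_test)
print(len(X_train), len(X_test), test_accuracy)
</code>

Only X_train and y_train are used for fitting. If the held-out examples influenced training, the test score would overestimate accuracy on new data.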
Next, let's perform 5-fold cross-validation on the full dataset. cross_val_score fits a fresh copy of the classifier on each training split and returns one score per test fold. For a classifier, it uses stratified folds by default:
<code python>
In [18]: cross_validation.cross_val_score(classifier, X, y, cv=5, scoring='accuracy')
Out[18]: array([ 0.7962963 ,  0.83333333,  0.88888889,  0.83333333,  0.83333333])
In [19]:
In [19]: # other metrics can be requested through the scoring argument, e.g. the area under the ROC curve:
In [20]: cross_validation.cross_val_score(classifier, X, y, cv=5, scoring='roc_auc')
Out[20]: array([ 0.89166667,  0.89166667,  0.95833333,  0.87638889,  0.91388889])
In [21]:
In [21]: # you can also obtain out-of-fold predictions by cross-validation and then compute the accuracy:
In [22]: y_predict = cross_validation.cross_val_predict(classifier, X, y, cv=5)
In [23]: metrics.accuracy_score(y, y_predict)
Out[23]: 0.83703703703703702
</code>
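Here is a quick arithmetic check on the outputs above. The accuracy computed from the cross-validated predictions (Out[23]) equals the plain mean of the five fold accuracies in Out[18]. Pooled accuracy is the size-weighted mean of the per-fold accuracies, so the two coincide when all folds have the same size. The sketch below copies the fold scores from Out[18]. The helper pooled_accuracy and the equal fold size of 54 are illustrative assumptions, not part of scikit-learn. This shortcut does not carry over to ROC AUC, which is not an average over individual examples.

<code python>
# Per-fold accuracies copied from Out[18] above.
fold_accuracies = [0.7962963, 0.83333333, 0.88888889, 0.83333333, 0.83333333]


def pooled_accuracy(accuracies, sizes):
    """Hypothetical helper: accuracy over all examples, i.e. the
    size-weighted mean of the per-fold accuracies."""
    correct = sum(acc * n for acc, n in zip(accuracies, sizes))
    return correct / sum(sizes)


# Assumption: five folds of equal size (54 is illustrative; any common size works).
equal_sizes = [54] * len(fold_accuracies)

mean_of_folds = sum(fold_accuracies) / len(fold_accuracies)
pooled = pooled_accuracy(fold_accuracies, equal_sizes)
print(round(mean_of_folds, 6), round(pooled, 6))  # 0.837037 0.837037, matching Out[23]

# With unequal folds the plain mean and the pooled accuracy differ:
# (30 * 1.0 + 10 * 0.5) / 40 = 0.875, while the plain mean is 0.75.
print(pooled_accuracy([1.0, 0.5], [30, 10]))  # 0.875
</code>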
Here's an alternative way of doing cross-validation: construct the stratified folds explicitly and pass them as the cv argument. The resulting scores are identical to Out[20], because cv=5 builds the same stratified folds for a classifier.
<code python>
In [25]: # first divide the data into 5 stratified folds (each fold keeps roughly the class proportions of y):
In [26]: cv = cross_validation.StratifiedKFold(y, 5)
In [27]: # now use these folds:
In [28]: print(cross_validation.cross_val_score(classifier, X, y, cv=cv, scoring='roc_auc'))
[ 0.89166667  0.89166667  0.95833333  0.87638889  0.91388889]
In [29]:
In [29]: # you can see which fold each example was assigned to by looking at the test_folds attribute:
In [30]: print(cv.test_folds)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 3 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4]
</code>
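StratifiedKFold(y, 5) and its test_folds attribute belong to the old cross_validation module. In sklearn.model_selection (scikit-learn 0.18 and later), the constructor takes only the number of folds, and the labels are passed to split(X, y). There is also no test_folds attribute. The minimal sketch below rebuilds the same fold-membership view from split(). It again assumes a synthetic make_classification dataset in place of the page's X and y, so the fold assignment it prints will not match the one above.

<code python>
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumption: a synthetic two-class dataset stands in for the page's X and y.
X, y = datasets.make_classification(n_samples=270, n_features=13, random_state=0)
classifier = svm.SVC(kernel='linear', C=1)

# The newer constructor takes only the number of folds; labels go to split().
cv = StratifiedKFold(n_splits=5)
scores = cross_val_score(classifier, X, y, cv=cv, scoring='roc_auc')
print(scores)

# Rebuild a test_folds-style array: entry i is the fold in which example i is held out.
test_folds = np.full(len(y), -1)
for fold_index, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    test_folds[test_idx] = fold_index
print(test_folds)

# Stratification check: number of positive examples in each test fold.
print(np.bincount(test_folds[y == 1], minlength=5))
</code>

Every example is held out in exactly one fold. The last line counts the positive examples per fold, and stratification keeps these counts nearly equal.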

CS545 fall 2016