code:model_selection [CS545 fall 2016]

Revisions shown: code:model_selection [2015/10/05 13:56] by asa (previous) and code:model_selection [2016/10/06 14:58] by asa (current).

===== model selection in scikit-learn =====

<code python>
"""classifier evaluation using scikit-learn

more details at:
http://scikit-learn.org/stable/modules/cross_validation.html
http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html
"""

import numpy as np
from sklearn import cross_validation
from sklearn import svm
from sklearn import metrics

# read in the heart dataset (first column: label, remaining columns: features)
data = np.genfromtxt("../data/heart_scale.data", delimiter=",")
X = data[:, 1:]
y = data[:, 0]

# first let's do regular cross-validation:
cv = cross_validation.StratifiedKFold(y, 5, shuffle=True, random_state=0)
print(cv.test_folds)  # the test fold (0-4) assigned to each example

classifier = svm.SVC(kernel='linear', C=1)

# each example is predicted by the model trained on the other four folds
y_predict = cross_validation.cross_val_predict(classifier, X, y, cv=cv)
print(metrics.accuracy_score(y, y_predict))

# grid search: let's perform model selection using grid search
from sklearn.grid_search import GridSearchCV
Cs = np.logspace(-2, 3, 6)
classifier = GridSearchCV(estimator=svm.LinearSVC(), param_grid=dict(C=Cs))
classifier.fit(X, y)

# print the best accuracy, classifier and parameters:
print(classifier.best_score_)
print(classifier.best_estimator_)
print(classifier.best_params_)

# performing nested cross-validation: the grid search is rerun on the
# training part of each outer fold, so the test folds never influence the choice of C
y_predict = cross_validation.cross_val_predict(classifier, X, y, cv=cv)
print(metrics.accuracy_score(y, y_predict))

# if we want to do grid search over multiple parameters:
param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
classifier = GridSearchCV(estimator=svm.SVC(), param_grid=param_grid)

y_predict = cross_validation.cross_val_predict(classifier, X, y, cv=cv)
print(metrics.accuracy_score(y, y_predict))
</code>

==== previous revision: model selection and cross validation in scikit-learn ====

First, let's import some modules and read in some data:

<code python>
In [1]: import numpy as np

In [2]: from sklearn import cross_validation

In [3]: from sklearn import svm

In [4]: from sklearn import metrics

In [5]: data=np.genfromtxt("../data/heart_scale.data", delimiter=",")

In [6]: X=data[:,1:]

In [7]: y=data[:,0]
</code>

The simplest form of model evaluation uses a held-out validation/test set:

<code python>
In [9]: X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)

In [10]: classifier = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)

In [11]: classifier.score(X_test, y_test)
Out[11]: 0.7592592592592593
</code>

Next, let's perform cross-validation:

<code python>
In [18]: cross_validation.cross_val_score(classifier, X, y, cv=5, scoring='accuracy')
Out[18]: array([ 0.7962963 ,  0.83333333,  0.88888889,  0.83333333,  0.83333333])

In [19]: # you can also compute other metrics, such as the area under the ROC curve:

In [20]: cross_validation.cross_val_score(classifier, X, y, cv=5, scoring='roc_auc')
Out[20]: array([ 0.89166667,  0.89166667,  0.95833333,  0.87638889,  0.91388889])

In [21]: # you can also obtain the predictions by cross-validation and then compute the accuracy:

In [22]: y_predict = cross_validation.cross_val_predict(classifier, X, y, cv=5)

In [23]: metrics.accuracy_score(y, y_predict)
Out[23]: 0.83703703703703702
</code>

Here's an alternative way of doing cross-validation:

<code python>
In [25]: # first divide the data into folds:

In [26]: cv = cross_validation.StratifiedKFold(y, 5)

In [27]: # now use these folds:

In [28]: print(cross_validation.cross_val_score(classifier, X, y, cv=cv, scoring='roc_auc'))
[ 0.89166667  0.89166667  0.95833333  0.87638889  0.91388889]

In [29]: # you can see how examples were divided into folds by looking at the test_folds attribute:

In [30]: print(cv.test_folds)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 3 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4]
</code>
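
The held-out evaluation in the previous revision (`train_test_split` with `test_size=0.4`) amounts to shuffling the examples once and setting a fraction of them aside for testing. Here is a minimal stdlib-only sketch of that idea; `holdout_split` is a hypothetical helper, not scikit-learn's implementation, although it rounds a fractional test size up the way scikit-learn does. Because the score depends on one random split, a number like the one in Out[11] can change with `random_state`, which is the motivation for cross-validation.

```python
import math
import random


def holdout_split(n_examples, test_size=0.4, seed=0):
    """Hypothetical helper: shuffle example indices and hold out a fraction for testing.

    Returns (train_indices, test_indices); the test-set size is rounded up.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = math.ceil(test_size * n_examples)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = holdout_split(10, test_size=0.4)
print(len(train_idx), len(test_idx))  # 6 4
```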
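
`cross_val_predict` returns, for every example, the prediction of the model trained on the folds that do not contain it, so the accuracy of `y_predict` is computed from pooled out-of-fold predictions. A stdlib-only sketch of that loop follows; the nearest-centroid classifier and all function names are hypothetical stand-ins for `svm.SVC`, chosen only so the example runs without scikit-learn.

```python
def nearest_centroid_fit(X, y):
    """Hypothetical stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def nearest_centroid_predict(centroids, X):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda label: sq_dist(centroids[label], x)) for x in X]


def cross_val_predict_sketch(fit, predict, X, y, test_folds):
    """Predict every example with a model that never saw it during training."""
    y_pred = [None] * len(y)
    for fold in sorted(set(test_folds)):
        train = [i for i, f in enumerate(test_folds) if f != fold]
        test = [i for i, f in enumerate(test_folds) if f == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i, label in zip(test, predict(model, [X[i] for i in test])):
            y_pred[i] = label
    return y_pred


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [-1, -1, -1, 1, 1, 1]
test_folds = [0, 1, 2, 0, 1, 2]  # one example of each class per fold
y_pred = cross_val_predict_sketch(nearest_centroid_fit, nearest_centroid_predict,
                                  X, y, test_folds)
print(y_pred, accuracy(y, y_pred))  # [-1, -1, -1, 1, 1, 1] 1.0
```

When all folds have the same size, this pooled accuracy equals the mean of the per-fold accuracies from `cross_val_score`: in the previous revision, the five values in Out[18] average to 0.837..., the value reported in Out[23].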
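
The `scoring='roc_auc'` runs in the previous revision report the area under the ROC curve. Unlike accuracy, it depends on the classifier's continuous scores (for an SVC, for example, its decision function) rather than only on the predicted labels: it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal pairwise sketch, assuming labels where `positive` marks the positive class; `roc_auc` here is a hypothetical helper, not scikit-learn's scorer.

```python
def roc_auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs where the positive example scores higher.

    Ties count as half a win; this pairwise form gives the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([-1, -1, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Three of the four positive/negative pairs are ranked correctly here (0.35 falls below the negative scored 0.4), giving 0.75.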
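
`cv.test_folds` stores, for each example, the index of the fold in which that example is used for testing; stratification means each fold keeps roughly the class proportions of the whole data set. The sketch below illustrates the idea with a simple round-robin assignment within each class. That assignment rule is an assumption made for illustration, not scikit-learn's algorithm (the unshuffled `test_folds` printed in the previous revision comes out in contiguous blocks instead), and `stratified_test_folds` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_test_folds(y, n_folds, seed=None):
    """Assign each example a test-fold index in 0..n_folds-1, class by class.

    Within each class the examples are (optionally shuffled, then) dealt
    round-robin over the folds, so every fold gets close to the class
    proportions of the full data set.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    rng = random.Random(seed)
    test_folds = [0] * len(y)
    offset = 0  # carry the rotation from one class to the next to balance fold sizes
    for label in sorted(by_class):
        indices = by_class[label]
        if seed is not None:
            rng.shuffle(indices)
        for k, i in enumerate(indices):
            test_folds[i] = (offset + k) % n_folds
        offset += len(indices)
    return test_folds


y = [1, 1, 1, 1, 1, 1, -1, -1, -1, -1]
print(stratified_test_folds(y, 2))  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

Each of the two folds ends up with three examples of class 1 and two of class -1, the same 60/40 split as the full list.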
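
When `param_grid` is a list of dicts, `GridSearchCV` explores the grid spanned by each dict separately, so `gamma` is varied only for the RBF kernel. The sketch below expands the multi-parameter grid from the script by hand. scikit-learn exposes this expansion as `ParameterGrid`; `expand_grid` here is a hypothetical helper, and its ordering is not claimed to match scikit-learn's.

```python
from itertools import product


def expand_grid(param_grid):
    """List every parameter setting described by a dict or a list of dicts.

    Each dict is expanded separately into the Cartesian product of its value
    lists, which lets different kernels use different sets of parameters.
    """
    if isinstance(param_grid, dict):
        param_grid = [param_grid]
    candidates = []
    for sub_grid in param_grid:
        keys = sorted(sub_grid)
        for values in product(*(sub_grid[key] for key in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates


param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
candidates = expand_grid(param_grid)
print(len(candidates))  # 4 linear settings + 4 * 2 rbf settings = 12
```

With its default `refit=True`, `GridSearchCV` cross-validates each of these 12 settings and then refits the best one on all of the data it was given.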
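
The nested cross-validation step passes the `GridSearchCV` object itself to `cross_val_predict`, so the choice of C is repeated inside every outer training fold and the outer test folds never influence it. By contrast, `best_score_` from a plain grid search is the score of the setting that won on those same folds, so it tends to be optimistic. Here is a stdlib-only sketch of the two-level structure, with a hypothetical k-nearest-neighbours classifier and its `k` swapped in for the SVM and C purely to keep the example dependency-free; the strided, unstratified folds are also a simplification.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def folds(n, n_folds):
    """Strided (unstratified) folds: fold f holds indices f, f + n_folds, ..."""
    return [list(range(f, n, n_folds)) for f in range(n_folds)]


def split(X, y, test):
    """Return the training examples, i.e. everything not in the test indices."""
    test_set = set(test)
    train = [i for i in range(len(y)) if i not in test_set]
    return [X[i] for i in train], [y[i] for i in train]


def cv_accuracy(X, y, k, n_folds):
    correct = 0
    for test in folds(len(y), n_folds):
        train_X, train_y = split(X, y, test)
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test)
    return correct / len(y)


def grid_search(X, y, ks, n_folds=3):
    """Inner loop: pick the k with the best cross-validated accuracy (first k wins ties)."""
    return max(ks, key=lambda k: cv_accuracy(X, y, k, n_folds))


def nested_cv_accuracy(X, y, ks, outer_folds=5, inner_folds=3):
    """Outer loop: evaluate the whole 'grid search, then fit' procedure on held-out folds."""
    correct, chosen = 0, []
    for test in folds(len(y), outer_folds):
        train_X, train_y = split(X, y, test)
        k = grid_search(train_X, train_y, ks, inner_folds)  # never sees the outer test fold
        chosen.append(k)
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test)
    return correct / len(y), chosen


X = [[float(i)] for i in range(10)] + [[100.0 + i] for i in range(10)]
y = [-1] * 10 + [1] * 10
accuracy, chosen_ks = nested_cv_accuracy(X, y, ks=[1, 3, 5])
print(accuracy, chosen_ks)  # 1.0 [1, 1, 1, 1, 1]
```

The outer loop may pick a different `k` in each fold; the reported accuracy describes the selection procedure as a whole, not any single chosen `k`.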
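
The `sklearn.cross_validation` and `sklearn.grid_search` modules used on this page were deprecated in scikit-learn 0.18 in favor of `sklearn.model_selection` and removed in 0.20. Here is a sketch of the same workflow against the newer API, assuming scikit-learn 0.18 or later. Synthetic data from `make_classification` stands in for `heart_scale.data`, and the C grid `np.logspace(-2, 3, 6)` (six values from 0.01 to 1000) is an assumption. The newer `StratifiedKFold` no longer takes `y` in its constructor or exposes `test_folds`, so the fold labels are rebuilt from `cv.split`.

```python
# Assumes scikit-learn >= 0.18; make_classification data stands in for heart_scale.data.
import numpy as np
from sklearn import metrics, svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=270, n_features=13, random_state=0)

# regular cross-validation with shuffled, stratified folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# rebuild a test_folds-style array: the fold in which each example is tested
test_folds = np.empty(len(y), dtype=int)
for fold, (_, test_index) in enumerate(cv.split(X, y)):
    test_folds[test_index] = fold
print(test_folds)

classifier = svm.SVC(kernel='linear', C=1)
y_predict = cross_val_predict(classifier, X, y, cv=cv)
cv_accuracy = metrics.accuracy_score(y, y_predict)
print(cv_accuracy)

# grid search over C (assumed grid: 0.01, 0.1, 1, 10, 100, 1000)
Cs = np.logspace(-2, 3, 6)
grid = GridSearchCV(estimator=svm.LinearSVC(max_iter=10000), param_grid=dict(C=Cs))
grid.fit(X, y)
print(grid.best_score_, grid.best_params_)

# nested cross-validation: the grid search is refit inside each outer training fold
y_predict = cross_val_predict(grid, X, y, cv=cv)
nested_accuracy = metrics.accuracy_score(y, y_predict)
print(nested_accuracy)
```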
  
code/model_selection.txt · Last modified: 2016/10/06 14:58 by asa