code:model_selection [CS545 fall 2016]


This page shows the differences between two versions of the page:
code:model_selection [2015/10/05 13:56] asa
code:model_selection [2016/10/06 14:58] asa
===== model selection in scikit-learn =====

Current revision (2016/10/06 14:58):

<file python model_selection.py>
"""classifier evaluation using scikit-learn

more details at:
http://scikit-learn.org/stable/modules/cross_validation.html
http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html
"""

import numpy as np
# sklearn.cross_validation and sklearn.grid_search were deprecated in
# scikit-learn 0.18 and removed in 0.20 in favor of sklearn.model_selection
from sklearn import cross_validation
from sklearn.grid_search import GridSearchCV
from sklearn import svm
from sklearn import metrics

# read in the heart dataset; the label is in the first column
data = np.genfromtxt("../data/heart_scale.data", delimiter=",")
X = data[:, 1:]
y = data[:, 0]

# first let's do regular cross-validation with 5 shuffled, stratified folds:
cv = cross_validation.StratifiedKFold(y, 5, shuffle=True, random_state=0)
print(cv.test_folds)

classifier = svm.SVC(kernel='linear', C=1)

y_predict = cross_validation.cross_val_predict(classifier, X, y, cv=cv)
print(metrics.accuracy_score(y, y_predict))

# grid search: let's perform model selection by choosing C from a grid
Cs = np.logspace(-2, 3, 6)
classifier = GridSearchCV(estimator=svm.LinearSVC(), param_grid=dict(C=Cs))
classifier.fit(X, y)

# mean cross-validated accuracy of the best C, the best classifier
# (refit on all the data) and its parameters:
print(classifier.best_score_)
print(classifier.best_estimator_)
print(classifier.best_params_)

# nested cross-validation: the outer folds (cv) evaluate the whole
# grid-search procedure, which selects C on inner folds of each training set
y_predict = cross_validation.cross_val_predict(classifier, X, y, cv=cv)
print(metrics.accuracy_score(y, y_predict))

# if we want to do grid search over multiple parameters:
param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
classifier = GridSearchCV(estimator=svm.SVC(), param_grid=param_grid)

y_predict = cross_validation.cross_val_predict(classifier, X, y, cv=cv)
print(metrics.accuracy_score(y, y_predict))
</file>

===== model selection and cross validation in scikit-learn =====

Previous revision (2015/10/05 13:56):

First, let's import some modules and read in some data:

<code python>
In [1]: import numpy as np

In [2]: from sklearn import cross_validation

In [3]: from sklearn import svm

In [4]: from sklearn import metrics

In [5]: data=np.genfromtxt("../data/heart_scale.data", delimiter=",")

In [6]: X=data[:,1:]

In [7]: y=data[:,0]
</code>

The simplest form of model evaluation holds out part of the data as a validation/test set. Here 40% of the examples (test_size=0.4) are held out for testing, and the classifier is trained on the remaining 60%:

<code python>
In [9]: X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)

In [10]: classifier = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)

In [11]: classifier.score(X_test, y_test)
Out[11]: 0.7592592592592593
</code>

Next, let's perform 5-fold cross-validation, which reports one score per fold:

<code python>
In [18]: cross_validation.cross_val_score(classifier, X, y, cv=5, scoring='accuracy')
Out[18]: array([ 0.7962963 ,  0.83333333,  0.88888889,  0.83333333,  0.83333333])

In [19]: # other metrics can be used as well, such as the area under the ROC curve:

In [20]: cross_validation.cross_val_score(classifier, X, y, cv=5, scoring='roc_auc')
Out[20]: array([ 0.89166667,  0.89166667,  0.95833333,  0.87638889,  0.91388889])

In [21]: # you can also obtain the predictions by cross-validation and then compute the accuracy:

In [22]: y_predict = cross_validation.cross_val_predict(classifier, X, y, cv=5)

In [23]: metrics.accuracy_score(y, y_predict)
Out[23]: 0.83703703703703702
</code>
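The walkthrough above uses the sklearn.cross_validation module, which later scikit-learn releases removed. As a minimal sketch, assuming scikit-learn 0.20 or newer, the same hold-out evaluation, per-fold scores and cross-validated predictions can be written with sklearn.model_selection. The synthetic data from make_classification is an assumption standing in for heart_scale.data, so the numbers it prints will not match the outputs shown above.

<code python>
# Minimal sketch for scikit-learn >= 0.20 (an assumption): the functions used in the
# 2015 walkthrough now live in sklearn.model_selection.
from sklearn import metrics, svm
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, cross_val_score, train_test_split

# Synthetic binary data standing in for ../data/heart_scale.data (an assumption).
X, y = make_classification(n_samples=270, n_features=13, random_state=0)

# Hold-out evaluation: train on 60% of the examples, test on the other 40%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
classifier = svm.SVC(kernel="linear", C=1).fit(X_train, y_train)
holdout_accuracy = classifier.score(X_test, y_test)

# 5-fold cross-validation, one score per fold. cross_val_score fits a fresh clone
# of the classifier on each training fold, so the fit above is not reused.
accuracy_scores = cross_val_score(classifier, X, y, cv=5, scoring="accuracy")
auc_scores = cross_val_score(classifier, X, y, cv=5, scoring="roc_auc")

# Cross-validated predictions: each example is predicted by the model trained on
# the folds that exclude it; accuracy is then computed over all examples at once.
y_predict = cross_val_predict(classifier, X, y, cv=5)
pooled_accuracy = metrics.accuracy_score(y, y_predict)

print(holdout_accuracy)
print(accuracy_scores, accuracy_scores.mean())
print(auc_scores)
print(pooled_accuracy)
</code>

The mean of the per-fold accuracies and the accuracy of the pooled predictions are two slightly different summaries; they coincide only when all folds have the same size.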

Here's an alternative way of doing cross-validation: instead of passing the number of folds, construct the folds explicitly and pass the resulting object as the cv argument. For a classifier, an integer cv already uses stratified k-fold, which is why the scores below are identical to the ROC AUC scores obtained with cv=5 above:

<code python>
In [25]: # first divide the data into folds:

In [26]: cv = cross_validation.StratifiedKFold(y, 5)

In [27]: # now use these folds:

In [28]: print(cross_validation.cross_val_score(classifier, X, y, cv=cv, scoring='roc_auc'))
[ 0.89166667  0.89166667  0.95833333  0.87638889  0.91388889]

In [29]: # you can see how examples were divided into folds by looking at the test_folds attribute:

In [30]: print(cv.test_folds)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 3 3 2 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4]
</code>
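In scikit-learn 0.20 and later (an assumption about the reader's version), StratifiedKFold takes the number of splits in its constructor, receives y only when split() is called, and no longer exposes a test_folds attribute. The sketch below rebuilds the same per-example fold index from split() and then reuses the splitter object for cross_val_score; the synthetic data is again a stand-in for the heart data, not the original dataset.

<code python>
# Sketch for scikit-learn >= 0.20 (an assumption): explicit stratified folds.
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary data standing in for the heart dataset (an assumption).
X, y = make_classification(n_samples=270, n_features=13, random_state=0)

cv = StratifiedKFold(n_splits=5)

# test_folds[i] is the index of the fold in which example i is used for testing,
# i.e. the information the old cv.test_folds attribute provided.
test_folds = np.empty(len(y), dtype=int)
for fold, (_, test_index) in enumerate(cv.split(X, y)):
    test_folds[test_index] = fold
print(test_folds)

# Passing the splitter object instead of an integer makes every call use these folds.
classifier = svm.SVC(kernel="linear", C=1)
print(cross_val_score(classifier, X, y, cv=cv, scoring="roc_auc"))
</code>

Because the folds are stratified, each fold keeps roughly the same class proportions as the full dataset.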
  
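The grid search and nested cross-validation steps of model_selection.py can be sketched the same way, assuming scikit-learn 0.20 or newer, where GridSearchCV moved to sklearn.model_selection. The synthetic data and the explicit cv=5 for the inner searches are assumptions made here; the 2016 script relied on the default number of inner folds of its scikit-learn version.

<code python>
# Sketch for scikit-learn >= 0.20 (an assumption): grid search and nested CV.
import numpy as np
from sklearn import metrics, svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict

# Synthetic binary data standing in for the heart dataset (an assumption).
X, y = make_classification(n_samples=270, n_features=13, random_state=0)

# Outer folds, shuffled with a fixed seed as in the 2016 script.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Model selection: choose C for a linear SVM by cross-validation on all the data.
Cs = np.logspace(-2, 3, 6)
search = GridSearchCV(estimator=svm.LinearSVC(), param_grid={"C": Cs}, cv=5)
search.fit(X, y)
print(search.best_score_, search.best_params_)

# Nested cross-validation: each outer training split runs its own inner grid
# search, so the outer test examples never influence which C is chosen.
y_predict = cross_val_predict(search, X, y, cv=outer_cv)
print(metrics.accuracy_score(y, y_predict))

# A list of dicts searches separate sub-grids: linear kernel, and RBF with gamma.
param_grid = [
    {"C": [1, 10, 100, 1000], "kernel": ["linear"]},
    {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.0001], "kernel": ["rbf"]},
]
multi_search = GridSearchCV(estimator=svm.SVC(), param_grid=param_grid, cv=5)
y_predict_multi = cross_val_predict(multi_search, X, y, cv=outer_cv)
print(metrics.accuracy_score(y, y_predict_multi))
</code>

best_score_ is computed on the same folds that were used to pick C, so it tends to be optimistic; the nested estimate avoids this because the outer test examples play no part in choosing C.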
code/model_selection.txt · Last modified: 2016/10/06 14:58 by asa