assignments:assignment5 [CS545 fall 2016]

Revisions compared: 2015/10/31 09:07 and 2015/10/31 09:24, both by asa
Compare the accuracy of an L1 SVM with that of an SVM that uses recursive feature elimination (RFE) to select relevant features.
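The comparison above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not a reference solution: it assumes synthetic data from make_classification in place of the assignment's datasets, and the values C=0.1, n_features_to_select=10 and step=0.1 are arbitrary choices. RFE repeatedly fits the wrapped linear SVM and discards the features with the smallest absolute weights until the requested number remains.

```python
# Sketch only: synthetic data stands in for the assignment's datasets,
# and C=0.1, n_features_to_select=10, step=0.1 are arbitrary choices.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           random_state=0)

# L1-regularized linear SVM: the L1 penalty drives many weights to exactly zero.
l1_svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000)

# RFE-SVM: each round removes 10% of the original features (those with the
# smallest |weight| in an L2 linear SVM) until 10 remain; an L2 SVM is then
# trained on the surviving features.
rfe_svm = Pipeline([
    ("rfe", RFE(LinearSVC(dual=False), n_features_to_select=10, step=0.1)),
    ("svc", LinearSVC(dual=False)),
])

# 5-fold cross-validated accuracy; RFE is refit inside every training fold.
acc_l1 = cross_val_score(l1_svm, X, y, cv=5).mean()
acc_rfe = cross_val_score(rfe_svm, X, y, cv=5).mean()
print(f"L1-SVM accuracy:  {acc_l1:.3f}")
print(f"RFE-SVM accuracy: {acc_rfe:.3f}")
```

Wrapping RFE and the final SVM in one Pipeline keeps the feature elimination inside the cross-validation loop, so the test fold never influences which features are kept.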
  
Compare the accuracy of a regular L2 SVM trained on the features selected by the L1 SVM with the accuracy of an L2 SVM trained on all the features, estimating both accuracies using 5-fold cross-validation.
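One way to set this up, sketched under the same assumptions (synthetic stand-in data, an arbitrary C for the L1-SVM), is to chain SelectFromModel and an L2 LinearSVC in a Pipeline. For an estimator with penalty="l1", SelectFromModel's default threshold is 1e-5, so in effect it keeps the features that receive a non-zero L1-SVM weight. Because the selector is a pipeline step, it is refit on every training fold.

```python
# Sketch only: synthetic stand-in data; C=0.1 for the L1-SVM is arbitrary.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           random_state=0)

# Step 1 keeps the features with non-zero L1-SVM weight;
# step 2 trains an ordinary L2 SVM on those features only.
l1_then_l2 = Pipeline([
    ("select", SelectFromModel(
        LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000))),
    ("svc", LinearSVC(dual=False)),
])
all_features = LinearSVC(dual=False)

# 5-fold cross-validation; the selector is refit on each training fold.
acc_selected = cross_val_score(l1_then_l2, X, y, cv=5).mean()
acc_all = cross_val_score(all_features, X, y, cv=5).mean()

# Number of features the L1-SVM keeps when fit on all the data.
n_selected = l1_then_l2.fit(X, y).named_steps["select"].get_support().sum()
print(f"L2 SVM on L1-selected features: {acc_selected:.3f} ({n_selected} features)")
print(f"L2 SVM on all features:         {acc_all:.3f}")
```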
  
It has been argued in the literature that L1-SVMs often lead to solutions that are too sparse. As a workaround, implement the following strategy:
  * Create $k$ sub-samples of the data, each consisting of a randomly chosen 80% of the examples.
  * For each sub-sample, train an L1-SVM.
  * For each feature, compute a score equal to the number of sub-samples in which the L1-SVM assigned that feature a non-zero weight.
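A possible implementation of this sub-sampling score is sketched below. The function name subsample_l1_scores and its defaults (k=20 sub-samples, C=0.1, and a tolerance of 1e-8 for treating a weight as non-zero) are hypothetical choices, binary labels are assumed, and the data are again synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC


def subsample_l1_scores(X, y, k=20, frac=0.8, C=0.1, tol=1e-8, random_state=0):
    """For each feature, count the sub-samples in which an L1-SVM gives it a
    non-zero weight. Hypothetical helper; assumes binary labels."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(random_state)
    n_samples, n_features = X.shape
    m = int(round(frac * n_samples))          # size of each sub-sample
    counts = np.zeros(n_features, dtype=int)
    for _ in range(k):
        idx = rng.choice(n_samples, size=m, replace=False)  # random 80% of examples
        svm = LinearSVC(penalty="l1", dual=False, C=C, max_iter=10000)
        svm.fit(X[idx], y[idx])
        counts += (np.abs(svm.coef_.ravel()) > tol)  # +1 where the weight is non-zero
    return counts


X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           random_state=0)
scores = subsample_l1_scores(X, y)
ranking = np.argsort(-scores, kind="stable")  # feature indices, best first
print(ranking[:10], scores[ranking[:10]])
```

Features can then be ranked by decreasing score; ties are common because every score is an integer between 0 and k.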
  
  
===== Part 3: Method comparison =====

Compute the accuracy of a linear L2 SVM as a function of the number of selected features on the leukemia and Arcene datasets, for each of the following feature selection methods:

  * The Golub score
  * L1-SVM feature selection using subsamples
  * RFE-SVM

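For reference, the Golub score of feature $j$ is commonly defined as $|\mu_j^+ - \mu_j^-| / (\sigma_j^+ + \sigma_j^-)$, where $\mu_j^\pm$ and $\sigma_j^\pm$ are the mean and standard deviation of feature $j$ within the positive and negative class. The sketch below assumes two classes and population standard deviations; the name golub_scores is hypothetical.

```python
import numpy as np


def golub_scores(X, y):
    """Golub score of every feature: |mu_+ - mu_-| / (sigma_+ + sigma_-).

    Assumes exactly two classes; uses population standard deviations (ddof=0).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("the Golub score is defined for two classes")
    neg, pos = X[y == classes[0]], X[y == classes[1]]
    numerator = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    denominator = pos.std(axis=0) + neg.std(axis=0)
    # Guard against division by zero for features that are constant in both classes.
    return numerator / np.maximum(denominator, 1e-12)


X = [[1, 0], [2, 2], [3, 1], [4, 3]]
y = [0, 0, 1, 1]
scores = golub_scores(X, y)  # feature 0 scores 2.0, feature 1 scores 0.5
```

Since it returns one score per feature, a function like this can be passed directly as the score function of scikit-learn's SelectKBest.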
Make sure that your evaluation provides an unbiased estimate of classifier performance.
Comment on the results.

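The following sketch illustrates why feature selection has to happen inside the cross-validation loop. It uses pure-noise data (an assumption made only for illustration), so any accuracy clearly above 0.5 is an artifact. The first estimate selects the 10 top-scoring features using all the examples before cross-validating; the second makes selection a Pipeline step, so it only ever sees the training folds. The Golub score function is repeated here to keep the block self-contained.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def golub_scores(X, y):
    """Golub score |mu_+ - mu_-| / (sigma_+ + sigma_-) for two classes."""
    c0, c1 = np.unique(y)
    a, b = X[y == c0], X[y == c1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.maximum(
        a.std(axis=0) + b.std(axis=0), 1e-12)


# Pure noise (illustrative assumption): no feature is related to y.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2000))
y = np.repeat([0, 1], 20)

# Biased: the 10 best features are chosen using ALL examples, then cross-validated.
top = np.argsort(golub_scores(X, y))[-10:]
biased = cross_val_score(LinearSVC(dual=False), X[:, top], y, cv=5).mean()

# Unbiased: selection is a Pipeline step, so it only sees each training fold.
pipe = Pipeline([("select", SelectKBest(golub_scores, k=10)),
                 ("svc", LinearSVC(dual=False))])
unbiased = cross_val_score(pipe, X, y, cv=5).mean()
print(f"selection outside CV: {biased:.3f}, inside CV: {unbiased:.3f}")
```

On data like this the first estimate is typically far above chance, while the second stays near 0.5.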
For the above experiment, you do not need to select the optimal value of the SVM soft-margin constant.
Compare these results with those obtained when the soft-margin constant $C$ is selected by internal cross-validation over a grid of values.

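Internal cross-validation for $C$ can be expressed as nested cross-validation: a GridSearchCV object wrapping the pipeline is itself evaluated with cross_val_score. In this sketch the grid of $C$ values, the RFE settings and the synthetic data are all assumptions; the key svc__C follows the Pipeline convention of naming a parameter as step__parameter.

```python
# Sketch only: synthetic data; the RFE settings and the C grid are arbitrary.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           random_state=0)

pipe = Pipeline([
    ("rfe", RFE(LinearSVC(dual=False), n_features_to_select=10, step=0.1)),
    ("svc", LinearSVC(dual=False)),
])

# Fixed C (the LinearSVC default, C=1.0), as in the first experiment.
fixed_acc = cross_val_score(pipe, X, y, cv=5).mean()

# Inner loop: GridSearchCV picks C for the final SVM by 3-fold CV on each
# outer training fold; "svc__C" addresses parameter C of the step named "svc".
inner = GridSearchCV(pipe, {"svc__C": [0.01, 0.1, 1, 10, 100]}, cv=3)
# Outer loop: 5-fold CV of the whole procedure (selection, C search, final fit).
nested_acc = cross_val_score(inner, X, y, cv=5).mean()
print(f"fixed C: {fixed_acc:.3f}, C chosen by internal CV: {nested_acc:.3f}")
```

Only the final SVM's $C$ is tuned here; the $C$ of the SVM used inside RFE could be added to the same grid under the key rfe__estimator__C.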
In writing your code, use scikit-learn's ability to combine analysis steps using the [[http://scikit-learn.org/stable/modules/pipeline.html|Pipeline class]]. This will be particularly useful for performing model selection.
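Wrapping a custom selection method as a scikit-learn transformer lets it sit in a Pipeline and be tuned together with the classifier. The sketch below does this for the sub-sample L1-SVM score; the class SubsampleL1Selector and its parameters are hypothetical, and binary labels are assumed. It relies on scikit-learn's SelectorMixin, which provides transform and get_support from the boolean mask returned by _get_support_mask.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectorMixin
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


class SubsampleL1Selector(SelectorMixin, BaseEstimator):
    """Hypothetical selector: keep the n_features features that get a non-zero
    L1-SVM weight in the most random sub-samples (binary labels assumed)."""

    def __init__(self, n_features=10, k=20, frac=0.8, C=0.1, random_state=0):
        self.n_features = n_features
        self.k = k
        self.frac = frac
        self.C = C
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        m = int(round(self.frac * len(y)))
        counts = np.zeros(X.shape[1], dtype=int)
        for _ in range(self.k):
            idx = rng.choice(len(y), size=m, replace=False)
            svm = LinearSVC(penalty="l1", dual=False, C=self.C, max_iter=10000)
            svm.fit(X[idx], y[idx])
            counts += (np.abs(svm.coef_.ravel()) > 1e-8)
        self.scores_ = counts
        self.n_features_in_ = X.shape[1]
        return self

    def _get_support_mask(self):
        # Boolean mask selecting the n_features highest-scoring features.
        mask = np.zeros(self.n_features_in_, dtype=bool)
        mask[np.argsort(-self.scores_, kind="stable")[: self.n_features]] = True
        return mask


X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           random_state=0)
pipe = Pipeline([("select", SubsampleL1Selector()),
                 ("svc", LinearSVC(dual=False))])
# Tune the number of selected features and the final SVM's C together.
search = GridSearchCV(pipe, {"select__n_features": [5, 10, 20],
                             "svc__C": [0.1, 1, 10]}, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same GridSearchCV object could in turn be passed to cross_val_score to obtain a nested, unbiased accuracy estimate for this selection method.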
  
  
  
assignments/assignment5.txt · Last modified: 2016/10/18 09:18 by asa