Differences

This shows you the differences between two versions of the page.

--- assignments:assignment5 [2015/10/31 09:07]
asa
+++ assignments:assignment5 [2015/10/31 09:24]
asa
@@ Line 41: / Line 41: @@
 Compare the accuracy of an L1 SVM to an SVM that uses RFE to select relevant features.
-Compare the accuracy of a regular L2 SVM trained on those features with an L2 SVM trained on all the features computed using 5-fold cross-validation.
+Compare the accuracy of a regular L2 SVM trained on the features selected by the L1 SVM with the accuracy of an L2 SVM trained on all the features (compute accuracy using 5-fold cross-validation).
 It has been argued in the literature that L1-SVMs often lead to solutions that are too sparse.  As a workaround, implement the following strategy:
@@ Line 47: / Line 47: @@
   * Create $k$ sub-samples of the data in which you randomly choose 80% of the examples.
   * For each sub-sample train an L1-SVM.
-  * For each feature compute a score that is the average weight vector
+  * For each feature compute a score that is the number of sub-samples for which that feature yielded a non-zero weight.
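
The sub-sampling strategy above can be sketched as follows. This is a minimal illustration, not the assignment's reference solution: the function name `l1_svm_selection_counts`, the defaults `k=20` and `C=0.1`, and the synthetic data from `make_classification` are all assumptions made for the example, and "non-zero" is taken to refer to the feature's weight in the trained L1-SVM.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC


def l1_svm_selection_counts(X, y, k=20, frac=0.8, C=0.1, seed=0):
    """Count, per feature, the sub-samples in which an L1-SVM gives it a non-zero weight.

    Hypothetical helper: the name and default values are illustrative only.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    n_sub = int(round(frac * n_samples))
    counts = np.zeros(n_features, dtype=int)
    for _ in range(k):
        # Draw 80% of the examples without replacement.
        idx = rng.choice(n_samples, size=n_sub, replace=False)
        # The L1 penalty requires the primal formulation (dual=False) in LinearSVC.
        svm = LinearSVC(penalty="l1", dual=False, C=C, max_iter=5000)
        svm.fit(X[idx], y[idx])
        # For a binary problem coef_ has shape (1, n_features).
        counts += (svm.coef_.ravel() != 0).astype(int)
    return counts


# Synthetic stand-in data; replace with the real dataset.
X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=5, random_state=0)
counts = l1_svm_selection_counts(X, y, k=10)
ranking = np.argsort(-counts)  # features ordered from most to least frequently selected
```

Each count lies between 0 and `k`, so features can be ranked by how consistently they are selected, which is less sensitive to the sparsity of any single L1-SVM fit.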
+===== Part 3:  Method comparison =====
+Compute the accuracy of a linear L2 SVM as a function of the number of selected features on the leukemia and Arcene datasets for the following feature selection methods:
+  * The Golub score
+  * L1-SVM feature selection using subsamples
+  * RFE-SVM
+Make sure that your evaluation provides an unbiased estimate of classifier performance.
+Comment on the results.
+For the above experiment you do not need to select the optimal value for the SVM soft-margin constant.
+Compare these results to results obtained using internal cross-validation for selecting
+the soft margin constant $C$ over a grid of values.
+In writing your code, use scikit-learn's ability to combine analysis steps using the [[http://scikit-learn.org/stable/modules/pipeline.html |Pipeline class]].  This will be particularly useful for performing model selection.
-Do your results change if you do model selection for the resulting classifier over a grid of values for the soft margin constant $C$?
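
One possible way to combine feature selection, the classifier, and model selection over $C$ with the Pipeline class is sketched below. Several pieces are assumptions made for illustration: the data are synthetic, `SelectKBest` with `f_classif` is a stand-in filter (scikit-learn does not ship a Golub score), and the grid of $C$ values and fold counts are arbitrary. Because feature selection sits inside the pipeline, it is refit on each training fold, which is what keeps the outer accuracy estimate from being biased by the selection step.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data; replace with the leukemia or Arcene data.
X, y = make_classification(n_samples=120, n_features=200,
                           n_informative=10, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # Stand-in filter method; swap in the feature selector under study.
    ("select", SelectKBest(f_classif, k=20)),
    ("svm", LinearSVC(penalty="l2", C=1.0, max_iter=10000)),
])

# The inner folds choose C; the outer folds estimate accuracy of the whole procedure.
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(pipe, {"svm__C": [0.01, 0.1, 1, 10]}, cv=inner)
scores = cross_val_score(search, X, y, cv=outer)
print("nested CV accuracy: %.3f" % scores.mean())
```

To trace accuracy as a function of the number of selected features, the same pipeline can be re-evaluated for several values of `select__k`, for example via `pipe.set_params(select__k=...)`.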

CS545 fall 2016
