Differences

This shows you the differences between two versions of the page.

--- assignments:assignment5 [2015/10/29 11:55]
asa
+++ assignments:assignment5 [2015/10/30 12:53]
asa
@@ Line 17: / Line 17: @@
 In order for your function to work with the scikit-learn filter framework it needs to have two parameters: ''golub(X, y)'', where X is the feature matrix, and y is a vector of labels.  All scikit-learn filter methods return two values - a vector of scores, and a vector of p-values.  For our purposes, we won't use p-values associated with the Golub scores, so just return the computed vector of scores twice:  if your vector of scores is stored in an array called scores, have the return statement be:
-<code python>
+''return scores,scores''
-return scores,scores
-</code>
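A minimal sketch of a scoring function that satisfies this interface. The assignment text here does not restate the Golub score, so the formula used below, $|\mu_+ - \mu_-| / (\sigma_+ + \sigma_-)$ computed per feature, is an assumption, as are the binary-label requirement, the small floor on the denominator, and the synthetic demo data.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest


def golub(X, y):
    """Per-feature Golub score, returned twice to mimic (scores, p-values)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    neg, pos = X[y == classes[0]], X[y == classes[1]]
    # Assumption: floor the denominator so constant features do not divide by zero.
    denom = np.maximum(pos.std(axis=0) + neg.std(axis=0), 1e-12)
    scores = np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / denom
    return scores, scores


# Hypothetical demo data: feature 0 is shifted for the positive class.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 10))
y_demo = np.repeat([0, 1], 20)
X_demo[y_demo == 1, 0] += 3.0

selector = SelectKBest(golub, k=3).fit(X_demo, y_demo)
print(selector.get_support(indices=True))
```

Because the function returns a pair, it can be passed directly as the ''score_func'' of ''SelectKBest''; the second element is stored as the p-values and otherwise ignored here.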
-===== Part 2:  Comparison of filter and embedded methods =====
+===== Part 2:  Embedded methods:  L1 SVM =====
+The L1-SVM is an SVM that uses the L1 norm as the regularization term by replacing $w^Tw$ with $\sum_{i=1}^d |w_i|$.  As discussed in class, the L1 SVM leads to very sparse solutions, and can therefore be used to perform feature selection.
+Run the L1-SVM on the datasets mentioned above.
+In scikit-learn use ''LinearSVC(penalty='l1', dual=False)'' to create one.
+How many features have non-zero weight vector coefficients?  (Note that you can obtain the weight vector of a trained SVM by looking at its ''coef_'' attribute.)
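As an illustration of counting the non-zero coefficients, here is a short sketch. The synthetic dataset from ''make_classification'', the value ''C=0.1'', and the raised ''max_iter'' are assumptions made for the example; they are not the assignment's datasets or settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Hypothetical stand-in for one of the assignment datasets.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)
clf.fit(X, y)

# For a two-class problem coef_ has shape (1, n_features).
w = clf.coef_.ravel()
n_nonzero = int(np.count_nonzero(w))
print(f"{n_nonzero} of {w.size} features have non-zero weight")
```

Smaller values of ''C'' regularize more strongly and generally drive more coefficients to exactly zero.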
+Compare the accuracy of an L1 SVM to an SVM that uses RFE to select relevant features.
+Using 5-fold cross-validation, compare the accuracy of a regular L2 SVM trained on those features with that of an L2 SVM trained on all the features.
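One possible way to set up these comparisons is sketched below. Wrapping feature selection and the classifier in a pipeline keeps the selection inside each cross-validation fold. The synthetic data, ''C=0.1'', ''n_features_to_select=10'', and ''step=0.1'' are illustrative assumptions, not values prescribed by the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-in for one of the assignment datasets.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)


def l1_svm():
    return LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)


def l2_svm():
    return LinearSVC(penalty="l2", dual=False, max_iter=5000)


models = {
    "L1 SVM": l1_svm(),
    # RFE with an L2 SVM picks the features, then an L2 SVM is trained on them.
    "RFE + L2 SVM": make_pipeline(
        RFE(l2_svm(), n_features_to_select=10, step=0.1), l2_svm()),
    # Features with non-zero L1 SVM weight, then an L2 SVM trained on them.
    "L1-selected + L2 SVM": make_pipeline(SelectFromModel(l1_svm()), l2_svm()),
    "L2 SVM, all features": l2_svm(),
}

# Mean accuracy over 5 folds for each model.
results = {name: cross_val_score(model, X, y, cv=5).mean()
           for name, model in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```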
+It has been argued in the literature that L1-SVMs often lead to solutions that are too sparse.  As a workaround, implement the following strategy:
+  * Create $k$ sub-samples of the data in which you randomly choose 80% of the examples.
+  * For each sub-sample train an L1-SVM.
+  * For each feature compute a score that is the average of its weight vector coefficient over the $k$ sub-samples.
+Do your results change if you do model selection for the resulting classifier over a grid of values for the soft margin constant $C$?
 ===== Submission =====

CS545 fall 2016
