assignments:assignment5 [2015/10/30 12:53] asa
The L1-SVM is an SVM that uses the L1 norm as the regularization term by replacing $w^Tw$ with $\sum_{i=1}^d |w_i|$. As discussed in class, the L1-SVM leads to very sparse solutions, and can therefore be used to perform feature selection.
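Concretely, replacing the regularizer gives the following optimization problem (a sketch, assuming the standard soft-margin hinge-loss formulation with trade-off parameter $C$, which this page does not spell out):

$\min_{w,b} \; \sum_{i=1}^d |w_i| + C \sum_{j=1}^n \max\left(0,\; 1 - y_j(w^T x_j + b)\right)$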
Run the L1-SVM on the datasets mentioned above. In scikit-learn use ''LinearSVC(penalty='l1', dual=False)'' to create one. How many features have non-zero weight vector coefficients? (Note that you can obtain the weight vector of a trained SVM by looking at its ''coef_'' attribute.) Compare the accuracy of an L1-SVM to an SVM that uses RFE to select relevant features. Compare the accuracy of a regular L2 SVM trained on those features with an L2 SVM trained on all the features, computed using 5-fold cross-validation.
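The comparison above can be sketched as follows. This is not a full solution, just a starting point: the dataset (''load_breast_cancer'' as a stand-in for the assignment's datasets), the value ''C=1.0'', and the choice to match RFE's feature count to the L1 selection are all illustrative assumptions.

```python
# Sketch of the L1-SVM feature-selection comparison described above.
# The dataset and C value are stand-ins, not prescribed by the assignment.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# L1-penalized linear SVM; dual=False is required with penalty='l1'
l1_svm = LinearSVC(penalty='l1', dual=False, C=1.0, max_iter=10000).fit(X, y)

# Features with non-zero coefficients (the weight vector lives in coef_)
selected = np.flatnonzero(l1_svm.coef_.ravel())
print(len(selected), "of", X.shape[1], "features have non-zero weights")

# 5-fold CV accuracy: L2 SVM on the selected features vs. all features
l2_svm = LinearSVC(penalty='l2', dual=False, max_iter=10000)
acc_selected = cross_val_score(l2_svm, X[:, selected], y, cv=5).mean()
acc_all = cross_val_score(l2_svm, X, y, cv=5).mean()

# RFE comparison: select the same number of features recursively
rfe = RFE(LinearSVC(dual=False, max_iter=10000),
          n_features_to_select=len(selected)).fit(X, y)
acc_rfe = cross_val_score(l2_svm, X[:, rfe.support_], y, cv=5).mean()

print(f"L1-selected: {acc_selected:.3f}  all: {acc_all:.3f}  RFE: {acc_rfe:.3f}")
```

In practice you would repeat this for each assignment dataset and may want to tune ''C'', since it controls how aggressively the L1 penalty zeroes out coefficients.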
It has been argued in the literature that L1-SVMs often lead to solutions that are too sparse. As a workaround, implement the following strategy:
  * Create $k$ sub-samples of the data in which you randomly choose 80% of the examples.
  * For each sub-sample, train an L1-SVM.
  * For each feature compute a score that is the average weight vector
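The steps above can be sketched as follows. The last step is cut off on this page, so the scoring rule here, averaging each feature's absolute weight-vector coefficient over the $k$ sub-samples, is an assumed reading; the choice of ''k = 10'' and the stand-in dataset are also illustrative.

```python
# Sketch of the sub-sampling workaround above. The 80% fraction comes from
# the text; averaging absolute coefficients over the k runs is an assumption,
# since the page's description of the scoring step is truncated.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
rng = np.random.default_rng(0)
k = 10  # illustrative number of sub-samples
scores = np.zeros(X.shape[1])

for _ in range(k):
    # randomly choose 80% of the examples, without replacement
    idx = rng.choice(len(y), size=int(0.8 * len(y)), replace=False)
    svm = LinearSVC(penalty='l1', dual=False, max_iter=10000).fit(X[idx], y[idx])
    scores += np.abs(svm.coef_.ravel())

scores /= k  # per-feature score: average absolute weight over sub-samples
top = np.argsort(scores)[::-1][:10]  # e.g. the ten highest-scoring features
print("top features:", top)
```

Features that receive a non-zero weight in only one or two sub-samples end up with small averaged scores, so thresholding the scores yields a less brittle selection than a single L1-SVM fit.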