assignments:assignment5 [2015/11/16 16:27] asa
In scikit-learn use ''LinearSVC(penalty='l1', dual=False)'' to create one.
How many features have non-zero weight vector coefficients? (Note that you can obtain the weight vector of a trained SVM by looking at its ''coef_'' attribute.)
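As a sketch, fitting an L1 SVM and counting its non-zero weights might look like the following; the ''make_classification'' data is only a stand-in for the assignment's datasets:

```python
# Sketch: train an L1-regularized linear SVM and count the features
# with non-zero weights. Synthetic data stands in for the real dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

clf = LinearSVC(penalty='l1', dual=False)
clf.fit(X, y)

# For binary classification coef_ has shape (1, n_features);
# the L1 penalty drives many of these coefficients exactly to zero.
n_selected = int(np.sum(clf.coef_ != 0))
print(n_selected, "of", X.shape[1], "features have non-zero weights")
```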
Compare the accuracy of the following approaches using cross-validation on the two datasets:

  * L1 SVM
  * L2 SVM trained on the features selected by the L1 SVM
  * L2 SVM trained on all the features
  * L2 SVM that uses RFE (with an L2-SVM) to select relevant features; use the class ''RFECV'', which automatically selects the number of features.
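One possible way to set up this comparison is sketched below; the use of ''SelectFromModel'' and ''make_pipeline'' is an assumption (a pipeline keeps the feature selection inside each cross-validation fold), and the synthetic data is a placeholder for the two datasets:

```python
# Sketch: compare the four approaches with 5-fold cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

models = {
    'L1 SVM': LinearSVC(penalty='l1', dual=False),
    # L1 SVM selects the features, then an L2 SVM is trained on them
    'L1-selected features + L2 SVM': make_pipeline(
        SelectFromModel(LinearSVC(penalty='l1', dual=False)),
        LinearSVC()),
    'L2 SVM, all features': LinearSVC(),
    # RFECV picks the number of features itself, then refits the estimator
    'RFE (RFECV) + L2 SVM': RFECV(LinearSVC(), step=5),
}

scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```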
It has been argued in the literature that L1-SVMs often lead to solutions that are too sparse. As a workaround, implement the following strategy:
  * Create $k$ sub-samples of the training data. For each sub-sample randomly choose a subset consisting of 80% of the training examples.
  * For each sub-sample train an L1-SVM.
  * For each feature compute a score that is the number of sub-samples for which that feature yielded a non-zero weight vector coefficient.
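The steps above could be sketched as follows; the value of $k$ and the synthetic data are placeholders, not prescribed by the assignment:

```python
# Sketch of the subsampling strategy: train an L1 SVM on k random 80%
# subsets of the training data and score each feature by how often it
# receives a non-zero weight.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

k = 20                      # number of sub-samples (placeholder value)
rng = np.random.RandomState(0)
n = len(y)
feature_scores = np.zeros(X.shape[1], dtype=int)

for _ in range(k):
    # randomly choose 80% of the training examples, without replacement
    idx = rng.choice(n, size=int(0.8 * n), replace=False)
    clf = LinearSVC(penalty='l1', dual=False).fit(X[idx], y[idx])
    # feature_scores[j] counts the sub-samples in which feature j
    # received a non-zero weight
    feature_scores += (clf.coef_.ravel() != 0).astype(int)

print(feature_scores)
```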
In the next part of the assignment you will compare this approach to RFE and the Golub filter method that you implemented in part 1.
===== Part 3: Method comparison =====