Due: November 15th at 11pm
In this assignment you will compare several feature selection methods on several datasets.
The first dataset is the [[https://archive.ics.uci.edu/ml/datasets/Arcene|Arcene]] dataset, which was used in a feature selection competition.
The datasets we will use are the yeast gene expression dataset
The L1-SVM is an SVM that uses the L1 norm as the regularization term by replacing $w^Tw$ with $\sum_{i=1}^d |w_i|$. As discussed in class, the L1 SVM leads to very sparse solutions, and can therefore be used to perform feature selection.
Run the L1-SVM on the datasets mentioned above.
In scikit-learn use ''LinearSVC(penalty='l1', dual=False)'' to create one.
How many features have non-zero weight vector coefficients? (Note that you can obtain the weight vector of a trained SVM by looking at its ''coef_'' attribute.)
Compare the accuracy of an L1 SVM to an SVM that uses RFE to select relevant features.
Compare the accuracy, computed using 5-fold cross-validation, of a regular L2 SVM trained on those features with an L2 SVM trained on all the features.
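A minimal sketch of these comparisons, using synthetic data as a stand-in for the assignment datasets (the use of ''make_classification'', the choice ''C=1.0'', and reusing the L1-selected feature count for RFE are all illustrative assumptions, not part of the assignment):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real datasets (NOT Arcene or the yeast data).
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=10, random_state=0)

# L1-regularized linear SVM; dual=False is required when penalty='l1'.
l1_svm = LinearSVC(penalty='l1', dual=False, C=1.0, max_iter=10000).fit(X, y)

# Features with non-zero weight vector coefficients (from coef_).
selected = np.flatnonzero(l1_svm.coef_.ravel())
print('non-zero features:', selected.size)

# 5-fold CV accuracy of an L2 SVM on the selected vs. all features.
l2_svm = LinearSVC(penalty='l2', max_iter=10000)
acc_selected = cross_val_score(l2_svm, X[:, selected], y, cv=5).mean()
acc_all = cross_val_score(l2_svm, X, y, cv=5).mean()

# RFE wrapping an L2 SVM, selecting the same number of features
# as the L1 SVM kept (an arbitrary choice for this sketch).
rfe = RFE(l2_svm, n_features_to_select=selected.size)
acc_rfe = cross_val_score(rfe, X, y, cv=5).mean()
print(acc_selected, acc_all, acc_rfe)
```

On the real, much higher-dimensional data you would load the dataset files instead of generating one, but the fit / select / cross-validate pattern stays the same.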
+ | |||
It has been argued in the literature that L1-SVMs often lead to solutions that are too sparse. As a workaround, implement the following strategy:
  * Create $k$ sub-samples of the data in which you randomly choose 80% of the examples.
  * For each sub-sample train an L1-SVM.
  * For each feature compute a score that is the average weight vector coefficient across the sub-samples.
Do your results change if you do model selection for the resulting classifier over a grid of values for the soft margin constant $C$?
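One way to run that model selection is with ''GridSearchCV''; the grid of $C$ values below is an assumed example, not one prescribed by the assignment:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for the assignment datasets.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=10, random_state=0)

# 5-fold cross-validated search over the soft margin constant C.
grid = GridSearchCV(LinearSVC(penalty='l2', max_iter=10000),
                    param_grid={'C': [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)
print(grid.best_params_['C'], grid.best_score_)
```

You would run this once on all features and once on the selected features and check whether tuning $C$ changes the comparison.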
+ | |||
+ | |||
===== Submission =====