====== Assignment 5: Feature selection ======
  
Due:  November 15th at 11pm
  
In this assignment you will compare several feature selection methods on several datasets.
The first dataset is the [[https://archive.ics.uci.edu/ml/datasets/Arcene| Arcene]] dataset, which was used in a feature selection competition.
We will also use the yeast gene expression dataset.
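
For example, assuming you have downloaded ''arcene_train.data'' and ''arcene_train.labels'' from the UCI page linked above (the file names follow the UCI distribution), the training data can be loaded with NumPy:

<code python>
import numpy as np

# Whitespace-separated feature matrix and label vector,
# as distributed on the UCI Arcene page.
X = np.loadtxt('arcene_train.data')    # shape: (n_examples, n_features)
y = np.loadtxt('arcene_train.labels')  # labels in {-1, +1}
</code>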
  
===== Part 1:  Filter methods =====
  
Implement a Python function that returns an array with the Golub score of a labeled dataset.  Recall that the Golub score for feature $i$ is defined as:
  
$$
\frac{|\mu_i^{(+)} - \mu_i^{(-)}|}{\sigma_i^{(+)} + \sigma_i^{(-)}}
$$
where $\mu_i^{(+)}$ is the average of feature $i$ in the positive examples, $\sigma_i^{(+)}$ is the standard deviation of feature $i$ in the positive examples, and $\mu_i^{(-)}, \sigma_i^{(-)}$ are defined analogously for the negative examples.
In order for your function to work with the scikit-learn filter framework, it needs to have two parameters: ''golub(X, y)'', where X is the feature matrix and y is a vector of labels.  All scikit-learn filter methods return two values - a vector of scores and a vector of p-values.  For our purposes we won't use p-values associated with the Golub scores, so just return the computed vector of scores twice:  if your vector of scores is stored in an array called scores, have the return statement be:
  
''return scores, scores''
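
Here is a minimal sketch of such a function, assuming binary labels where the larger of the two label values marks the positive class (features whose class standard deviations are both zero are not handled):

<code python>
import numpy as np

def golub(X, y):
    # Golub score of each feature for a binary-labeled dataset.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos = X[y == y.max()]  # positive examples
    neg = X[y == y.min()]  # negative examples
    scores = np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / \
             (pos.std(axis=0) + neg.std(axis=0))
    # No p-values for the Golub score, so return the scores twice.
    return scores, scores
</code>

A function with this signature can then be passed to a scikit-learn filter, for example ''SelectKBest(score_func=golub, k=100)''.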
  
===== Part 2:  Embedded methods:  L1 SVM =====

The L1-SVM is an SVM that uses the L1 norm as the regularization term by replacing $w^Tw$ with $\sum_{i=1}^d |w_i|$.  As discussed in class, the L1 SVM leads to very sparse solutions, and can therefore be used to perform feature selection.
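
Written out, one common formulation of the resulting soft-margin optimization problem (with the hinge loss) is:

$$
\min_{w,b} \; \sum_{i=1}^d |w_i| + C \sum_{j=1}^n \max\left(0,\, 1 - y_j (w^T x_j + b)\right)
$$
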
Run the L1-SVM on the datasets mentioned above.
In scikit-learn, use ''LinearSVC(penalty='l1', dual=False)'' to create one.
How many features have non-zero weight vector coefficients?  (Note that you can obtain the weight vector of a trained linear SVM by looking at its ''coef_'' attribute.)
Compare the accuracy of an L1 SVM to an SVM that uses RFE to select relevant features.
Compare, using 5-fold cross-validation, the accuracy of a regular L2 SVM trained on the selected features with that of an L2 SVM trained on all the features.

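
One possible setup for these comparisons is sketched below, assuming ''X'' and ''y'' hold one of the datasets; the default value of $C$ and the use of ''LinearSVC'' (with its default L2 penalty) as the L2 SVM are assumptions you should replace with your own protocol:

<code python>
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Train an L1 SVM and count the features with non-zero weights.
l1_svm = LinearSVC(penalty='l1', dual=False).fit(X, y)
selected = np.abs(l1_svm.coef_).ravel() > 0
print('features with non-zero weights:', selected.sum())

# L2 SVM on the L1-selected features vs. on all features (5-fold CV).
acc_selected = cross_val_score(LinearSVC(), X[:, selected], y, cv=5).mean()
acc_all = cross_val_score(LinearSVC(), X, y, cv=5).mean()

# L2 SVM with RFE selecting the same number of features; the pipeline
# makes the selection happen inside each training fold.
rfe_svm = make_pipeline(
    RFE(LinearSVC(), n_features_to_select=int(selected.sum())),
    LinearSVC())
acc_rfe = cross_val_score(rfe_svm, X, y, cv=5).mean()
</code>

Note that selecting features on the full dataset and then cross-validating on the same data introduces selection bias; the pipeline used for RFE avoids this, and you may want to do the same for the L1-based selection.
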
It has been argued in the literature that L1-SVMs often lead to solutions that are too sparse.  As a workaround, implement the following strategy (a sketch follows the list):

  * Create $k$ sub-samples of the data in which you randomly choose 80% of the examples.
  * For each sub-sample, train an L1-SVM.
  * For each feature, compute a score that is the average of its weight vector coefficient across the $k$ sub-samples.
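
A minimal sketch of this strategy, assuming we average the absolute value of each weight so that large negative weights also count (the function name and defaults are placeholders):

<code python>
import numpy as np
from sklearn.svm import LinearSVC

def subsampled_l1_scores(X, y, k=20, frac=0.8, C=1.0, seed=0):
    # Average absolute L1-SVM weight of each feature over k sub-samples,
    # each containing a random 80% of the examples.
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    scores = np.zeros(n_features)
    for _ in range(k):
        idx = rng.choice(n_samples, size=int(frac * n_samples), replace=False)
        svm = LinearSVC(penalty='l1', dual=False, C=C).fit(X[idx], y[idx])
        scores += np.abs(svm.coef_).ravel()
    return scores / k
</code>
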
Do your results change if you do model selection for the resulting classifier over a grid of values for the soft margin constant $C$?
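
For the model selection step, ''GridSearchCV'' can be used; the grid of $C$ values below is only an example:

<code python>
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(LinearSVC(penalty='l1', dual=False),
                    param_grid={'C': [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
</code>
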
===== Submission =====

Submit the pdf of your report via Canvas.  Python code can be displayed in your report if it is succinct (no more than a page or two) or submitted separately.  The latex sample document shows how to display Python code in a latex document.  Code needs to be there so we can make sure that you implemented the algorithms and data analysis methodology correctly.  Canvas allows you to submit multiple files for an assignment, so DO NOT submit an archive file (tar, zip, etc.).  Canvas will only allow you to submit pdfs (.pdf extension) or Python code (.py extension).
For this assignment there is a strict 8 page limit (not including references and code that is provided as an appendix).  We will take off points for reports that go over the page limit.
In addition to the code snippets that you include in your report, make sure you provide complete code from which we can see exactly how your results were generated.
  
===== Grading =====
  
A few general guidelines for this and future assignments in the course:
  
  * Always provide a description of the method you used to produce a given result, in sufficient detail that the reader can reproduce your results on the basis of the description (UNLESS the method was provided in class or is in the book).  Your code needs to be provided in sufficient detail so we can make sure that your implementation is correct.  The saying that "the devil is in the details" holds true for machine learning, and sometimes makes the difference between correct and incorrect results.  If your code is more than a few lines, you can include it as an appendix to your report, or submit it as a separate file.  Make sure your code is readable!
  * You can provide results in the form of tables, figures or text - whatever form is most appropriate for a given problem.
  * In any machine learning paper there is a discussion of the results.  There is a similar expectation from your assignments that you reason about your results.  For example, for the learning curve problem, what can you say on the basis of the observed learning curve?
  * Write succinct answers.  We will take off points for rambling answers that are not to the point, and similarly if we have to wade through a lot of data/results that are not relevant.
  
<code>
Grading sheet for assignment 3

Part 1:  40 points.
(10 points):  Primal SVM formulation is correct
( 5 points):  Lagrangian found correctly
( 5 points):  Derivation of saddle point equations
(10 points):  Derivation of the dual
( 5 points):  Discussion of the implication of the form of the dual for SMO-like algorithms

Part 2:  10 points.

Part 3:  40 points.
(20 points):  Accuracy as a function of parameters and discussion of the results
(15 points):  Comparison of normalized and non-normalized kernels and correct model selection
( 5 points):  Visualization of the kernel matrix and observations made about it

Report structure, grammar and spelling:  10 points
(10 points):  Heading and subheading structure easy to follow and clearly divides report into logical sections.
              Code, math, figure captions, and all other aspects of the report are well-written and formatted.
              Grammar, spelling, and punctuation.  Answers are clear and to the point.
</code>