Warning: Declaration of action_plugin_tablewidth::register(&$controller) should be compatible with DokuWiki_Action_Plugin::register(Doku_Event_Handler $controller) in /s/bach/b/class/cs545/public_html/fall16/lib/plugins/tablewidth/action.php on line 93
assignments:assignment3 [CS545 fall 2016]

User Tools

Site Tools


assignments:assignment3

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
Next revision Both sides next revision
assignments:assignment3 [2013/10/04 15:32]
asa created
assignments:assignment3 [2016/09/15 12:53]
asa
Line 1: Line 1:
-========= Assignment ​3: Support Vector Machines ============+====== Assignment ​======
  
-===== Part 1:  SVM with no bias term =====+**Due:** 10/at 11:59pm.
  
-Formulate a soft-margin SVM without ​the bias term, i.e$f(\x) = \w^{\tr} \x$. +In this assignment you will explore ridge regression applied to the task of predicting wine quality. 
-Derive ​the saddle point conditionsKKT conditions ​and the dual. +You will use the [[http://​archive.ics.uci.edu/​ml/​datasets/​Wine+Quality | wine quality]] dataset from the UCI machine learning repository, and compare accuracy obtained using ridge regression to the results from a [[http://​www.sciencedirect.com/​science/​article/​pii/​S0167923609001377#​ | recent publication]] (if you have trouble accessing that version of the paper, here's a link to a [[http://​www3.dsi.uminho.pt/​pcortez/​wine5.pdf| preprint]]
-Compare it to the standard SVM formulation+The wine data is composed ​of two datasets - one for white wines, and one for reds.  ​Perform all your analyses on both.
-What is the implication ​of the difference on the design of SMO-like algorithms?​ +
-Recall that SMO algorithms work by iteratively optimizing ​two variables at a time. +
-Hint:  consider the difference in the constraints.+
  
-Discuss ​the merit of the bias-less formulation ​as the dimensionality +=== Part 1 === 
-of the data (or the feature ​space) is varied+Implement ridge regression and functions for computing ​the following measures ​of error: 
-When using this SVM formulation ​it may be useful ​to add constant ​to the + 
-kernel matrix.  ​Explain why this can be beneficial.+  * The Root Mean Square Error (RMSE). 
 +  * The Maximum Absolute Deviation (MAD). 
 + 
 + 
 +For a hypothesis $h$, they are defined ​as follows: 
 + 
 +$$RMSE(h) = \sqrt{\frac{1}{N}\sum_{i=1}^N (y_i - h(\mathbf{x}_i))^2}$$ 
 + 
 +and 
 + 
 +$$MAD(h) = \frac{1}{N}\sum_{i=1}^N |y_i - h(\mathbf{x}_i)|.$$ 
 + 
 +Compute these measures of error for ridge regression applied 
 +to the wine dataset over a range of the regularization parameter, $\lambda$ (choose values on a logarithmic scale, e.g. 0.01, 0.1, 1, 10, 100, 1000) and plot the results (use a fixed test set for computing them!) 
 +The features for the wine dataset are not standardized,​ so make sure you do this, especially since we are going to consider the magnitude ​of the weight vector ​(recall that standardization entails subtracting ​the mean and then dividing by the standard deviation for each feature; you can use the [[http://​docs.scipy.org/​doc/​numpy/​reference/​routines.statistics.html | Numpy statistics module]] to perform the required calculations).  What is the potential advantage of MAD over RMSE? 
 + 
 +In addition to RMSE and MAD, plot the Regression Error Characteristic (REC) curve of a representative classifier  
 +REC curves are described in the following [[http://​machinelearning.wustl.edu/​mlpapers/​paper_files/​icml2003_BiB03.pdf|paper]]. 
 +What can you learn from this curve that you cannot learn from RMSE or MAD?   
 + 
 +Compare the results that you are getting with the published results in the paper. 
 + 
 +=== Part 2 === 
 + 
 +As we discussed in class, the magnitude of the weight vector can be interpreted as a measure of feature importance. 
 +Train a ridge regression classifier on a subset of the dataset that you reserved for training. 
 +We will explore the relationship between the magnitude of weight vector components and their relevance to the classification task in several ways. 
 +Each feature is associated with a component of the weight vector. ​ It can also be associated with the correlation of that feature with the vector of labels. 
 +Create a scatter plot of the weight vector component against the [[https://​en.wikipedia.org/​wiki/​Pearson_product-moment_correlation_coefficient | Pearson correlation coefficient]] of a feature against the labels (again, you can use the [[http://​docs.scipy.org/​doc/​numpy/​reference/​routines.statistics.html | Numpy statistics module]] to compute ​it). 
 +What can you conclude from this plot? 
 +The paper ranks features according to their importance using a different approach. ​ Compare your results with what they obtain. 
 + 
 +Next, perform the following experiment:​ 
 +Incrementally remove the feature with the lowest absolute value of the weight vector and retrain the ridge regression classifier. 
 +Plot RMSE and MAD as a function of the number of features that remain on the test set which you have set aside. 
 + 
 +===== Submission ===== 
 + 
 +Submit your report via Canvas. ​ Python code can be displayed in your report if it is succinct (not more than a page or two at the most) or submitted separately. ​ The latex sample document shows how to display Python code in latex document. ​ Code needs to be there so we can make sure that you implemented ​the algorithms and data analysis methodology correctly. ​ Canvas allows you to submit multiple files for an assignment, so DO NOT submit an archive file (tar, zip, etc). 
 + 
 +===== Grading ===== 
 + 
 +Here is what the grade sheet will look like for this assignment.  ​A few general guidelines for this and future assignments in the course: 
 + 
 +  * Always provide a description of the method you used to produce a given result in sufficient detail such that the reader ​can reproduce your results on the basis of the description (UNLESS the method has been provided in class or is there in the book). ​ Your code needs to be provided in sufficient detail so we can make sure that your implementation is correct The saying that "the devil is in the details"​ holds true for machine learning, and is sometimes the makes the difference between correct and incorrect results. ​ If your code is more than a few lines, you can include it as an appendix to your report, or submit it as a separate file.  Make sure your code is readable! 
 +  * You can provide results in the form of tables, figures or text - whatever form is most appropriate for a given problem. 
 +  * In any machine learning paper there is a discussion of the results. ​ There is a similar expectation from your assignments that you reason about your results. ​ For example, for the learning curve problem, what can you say on the basis of the observed learning curve? 
 +  * Write succinct answers. ​ We will take off points for rambling answers that are not to the point, and and similarly, if we have to wade through a lot of data/​results that are not to the point. 
 + 
 +<​code>​ 
 +Grading sheet for assignment 2 
 + 
 +Part 1:  50 points. 
 +(20 points): ​ Plots of MAD and RMSE as a function of lambda are generated correctly. 
 +(20 points): ​ REC curves are generated correctly 
 +( 5 points): ​ discussion of REC curves 
 +( 5 points): ​ Discussion of the MAD and RMSE plots and comparison of results to the published ones. 
 + 
 +Part 2:  40 points. 
 +(30 points): ​ Weight vector analysis 
 +(10 points): ​ Comparison to the published results 
 + 
 +Report structure, grammar and spelling: ​ 10 points 
 +(10 points): ​ Heading and subheading structure easy to follow and clearly divides report into logical sections. ​  
 +              Code, math, figure captions, and all other aspects of the report are well-written and formatted. 
 +              Grammar, spelling, and punctuation. 
 +</​code>​
assignments/assignment3.txt · Last modified: 2016/09/20 09:34 by asa