assignments:assignment3 [CS545 fall 2016]

Differences between two revisions of this page:

  * assignments:assignment3 [2013/10/04 15:32], asa (created)
  * assignments:assignment3 [2016/09/15 14:43], asa

**Revision 2013/10/04 15:32:**

====== Assignment 3: Support Vector Machines ======

===== Part 1: SVM with no bias term =====

Formulate a soft-margin SVM without the bias term, i.e. $f(\mathbf{x}) = \mathbf{w}^{\top} \mathbf{x}$.
Derive the saddle point conditions, the KKT conditions and the dual.
Compare it to the standard SVM formulation.
What is the implication of the difference for the design of SMO-like algorithms?
Recall that SMO algorithms work by iteratively optimizing two variables at a time.
Hint: consider the difference in the constraints.

Discuss the merit of the bias-less formulation as the dimensionality
of the data (or the feature space) is varied.
When using this SVM formulation it may be useful to add a constant to the
kernel matrix.  Explain why this can be beneficial.

**Revision 2016/09/15 14:43:**

~~NOTOC~~

====== Assignment 3 ======

**Due:** 10/2 at 11pm.

===== Preliminaries =====

In this assignment you will explore ridge regression applied to the task of predicting wine quality.
You will use the [[http://archive.ics.uci.edu/ml/datasets/Wine+Quality | wine quality]] dataset from the UCI machine learning repository, and compare the accuracy obtained using ridge regression to the results from a [[http://www.sciencedirect.com/science/article/pii/S0167923609001377# | recent publication]] (if you have trouble accessing that version of the paper, here is a link to a [[http://www3.dsi.uminho.pt/pcortez/wine5.pdf| preprint]]).
The wine data is composed of two datasets, one for white wines and one for reds.  In this assignment, perform all your analyses on the red wine data only.

The features in the wine dataset are not standardized, so make sure you standardize them, especially since we are going to consider the magnitude of the weight vector (recall that standardization entails subtracting the mean and then dividing by the standard deviation for each feature; you can use the [[http://docs.scipy.org/doc/numpy/reference/routines.statistics.html | Numpy statistics module]] to perform the required calculations).
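
A minimal NumPy sketch of the standardization step described above. The function name standardize is hypothetical, and computing the mean and standard deviation on the training portion only (then reusing them for the test portion) is an assumption on our part; the assignment itself only asks that the features be standardized.

```python
import numpy as np


def standardize(X_train, X_test):
    """Standardize each feature (column) using training-set statistics.

    Hypothetical helper: subtracts the training mean and divides by the
    training standard deviation, applying the same transform to X_test.
    """
    mean = np.mean(X_train, axis=0)
    std = np.std(X_train, axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std


# Usage on a tiny made-up matrix (not the wine data).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])
Z_train, Z_test = standardize(X_train, X_test)
print(Z_train.mean(axis=0))  # each entry is (numerically) 0
print(Z_train.std(axis=0))   # each entry is 1
```

Reusing the training statistics for the test rows keeps information from the held-out set out of the preprocessing.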

==== Part 1 ====

Implement ridge regression, keeping the same API you used when implementing the classifiers in assignment 2, along with functions for computing the following measures of error:

  * The Root Mean Square Error (RMSE).
  * The Mean Absolute Deviation (MAD).

For a hypothesis $h$, they are defined as follows:

$$RMSE(h) = \sqrt{\frac{1}{N}\sum_{i=1}^N (y_i - h(\mathbf{x}_i))^2}$$

and

$$MAD(h) = \frac{1}{N}\sum_{i=1}^N |y_i - h(\mathbf{x}_i)|.$$
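
The page does not show the assignment 2 API, so the sketch below assumes a scikit-learn-style fit/predict interface; the class name RidgeRegression, the parameter name lam and the helper names rmse and mad are hypothetical. It solves the closed form $\mathbf{w} = (X_c^{\top} X_c + \lambda I)^{-1} X_c^{\top} \mathbf{y}_c$ on centered data $X_c, \mathbf{y}_c$ and leaves the intercept unregularized, which is a common design choice rather than something the assignment specifies. The two error functions follow the RMSE and MAD formulas above.

```python
import numpy as np


class RidgeRegression:
    """Closed-form ridge regression (hypothetical fit/predict API).

    Minimizes sum_i (y_i - w^T x_i - b)^2 + lam * ||w||^2. The intercept b
    is not penalized: X and y are centered before solving for w.
    """

    def __init__(self, lam=1.0):
        self.lam = lam
        self.w = None
        self.b = 0.0

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        x_mean, y_mean = X.mean(axis=0), y.mean()
        Xc, yc = X - x_mean, y - y_mean
        # Solve (Xc^T Xc + lam I) w = Xc^T yc instead of forming an inverse.
        A = Xc.T @ Xc + self.lam * np.eye(X.shape[1])
        self.w = np.linalg.solve(A, Xc.T @ yc)
        self.b = y_mean - x_mean @ self.w
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.w + self.b


def rmse(y, y_pred):
    """Root mean square error."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y - y_pred) ** 2))


def mad(y, y_pred):
    """Mean of the absolute deviations |y_i - h(x_i)|."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y - y_pred))


# Usage: an exact line y = 2x + 1 is recovered when lam is tiny.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
model = RidgeRegression(lam=1e-8).fit(X, y)
print(model.w, model.b)  # close to [2.] and 1.0
print(rmse([1, 2, 3], [1, 2, 5]), mad([1, 2, 3], [1, 2, 5]))
```

Using np.linalg.solve rather than an explicit inverse is the usual, numerically safer way to apply the closed form.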

With the code you just implemented, your next task is to explore the dependence of the error on the value of the regularization parameter, $\lambda$.
In what follows, set aside 30% of the data as a test set, and compute the in-sample error and the test-set error as a function of $\lambda$ on the red wine data.  Choose the values of $\lambda$ on a logarithmic scale (0.01, 0.1, 1, 10, 100, 1000) and plot the RMSE only.
Repeat the same experiment, but instead of using all the training data, use 20 randomly chosen training examples.
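
One way to organize the $\lambda$ experiment is sketched below, assuming X holds the already standardized red-wine features and y the quality scores as NumPy arrays (for simplicity the sketch does not re-standardize after the split). The names ridge_fit and lambda_sweep and the fixed seed are hypothetical; a compact closed-form ridge solver is repeated so the block runs on its own.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def rmse(y, y_pred):
    return np.sqrt(np.mean((y - y_pred) ** 2))


def lambda_sweep(X, y, lambdas, test_fraction=0.3, n_train=None, seed=0):
    """Return (train_rmse, test_rmse) lists, one entry per lambda.

    A random test_fraction of the rows is held out as the test set. If
    n_train is given (e.g. 20), only that many random rows of the remaining
    data are used for training; the test set stays the same.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    if n_train is not None:
        train_idx = rng.choice(train_idx, size=n_train, replace=False)
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_te, y_te = X[test_idx], y[test_idx]
    train_err, test_err = [], []
    for lam in lambdas:
        w, b = ridge_fit(X_tr, y_tr, lam)
        train_err.append(rmse(y_tr, X_tr @ w + b))
        test_err.append(rmse(y_te, X_te @ w + b))
    return train_err, test_err
```

Calling lambda_sweep(X, y, [0.01, 0.1, 1, 10, 100, 1000]) gives the full-training-set curves and adding n_train=20 gives the small-sample variant; both lists can then be plotted against $\lambda$ with matplotlib's semilogx.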

Now answer the following:

  * What is the optimal value of $\lambda$?
  * What observations can you make on the basis of these plots?  (The concepts of overfitting/underfitting should be addressed in your answer.)
  * Finally, compare the results you are getting with the published results in the paper linked above.  In particular, is the performance you obtained comparable to that reported in the paper?

==== Part 2 ====

Regression Error Characteristic (REC) curves are an interesting way of visualizing regression error, as described
in the following [[http://machinelearning.wustl.edu/mlpapers/paper_files/icml2003_BiB03.pdf|paper]].
Write a function that plots the REC curve of a regression method, and plot the REC curve of the best regressor you found in Part 1 of the assignment.
What can you learn from this curve that you cannot learn from an error measure such as RMSE or MAD?
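
An REC curve plots, for each error tolerance $\epsilon$, the fraction of examples whose error is at most $\epsilon$. The sketch below uses the absolute deviation $|y_i - h(\mathbf{x}_i)|$ as the error measure, which is an assumption (other error measures can be plugged in). The function names rec_curve and plot_rec are hypothetical, and matplotlib is imported only inside the plotting function.

```python
import numpy as np


def rec_curve(y, y_pred):
    """Return (tolerances, accuracies) for a REC curve.

    accuracies[k] is the fraction of examples whose absolute deviation
    |y_i - y_pred_i| is <= tolerances[k]. Tolerances start at 0 and run
    through every observed deviation, so the last accuracy is 1.
    """
    err = np.sort(np.abs(np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)))
    tolerances = np.concatenate(([0.0], err))
    # side="right" counts ties: deviations equal to the tolerance are included.
    accuracies = np.searchsorted(err, tolerances, side="right") / len(err)
    return tolerances, accuracies


def plot_rec(y, y_pred, label=None, ax=None):
    """Draw the REC curve as a step function on a matplotlib Axes."""
    import matplotlib.pyplot as plt

    tolerances, accuracies = rec_curve(y, y_pred)
    if ax is None:
        ax = plt.gca()
    ax.step(tolerances, accuracies, where="post", label=label)
    ax.set_xlabel("Absolute deviation tolerance")
    ax.set_ylabel("Accuracy (fraction within tolerance)")
    ax.set_ylim(0.0, 1.05)
    return ax


# Deviations here are [0, 0.5, 0, 2]: half the predictions are exact.
tol, acc = rec_curve([1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 3.0, 6.0])
```

For Part 2 one would call plot_rec on the test labels and the predictions of the best ridge model from Part 1; drawing several models on the same Axes makes their error distributions directly comparable.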

==== Part 3 ====

As we discussed in class, the magnitude of the weight vector can be interpreted as a measure of feature importance.
Train a ridge regression model on the portion of the data that you reserved for training.
We will explore the relationship between the magnitude of the weight vector components and their relevance to the regression task in several ways.
Each feature is associated with a component of the weight vector.  It can also be associated with the correlation of that feature with the vector of labels.
Create a scatter plot of the weight vector components against the [[https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient | Pearson correlation coefficient]] of each feature with the labels (again, you can use the [[http://docs.scipy.org/doc/numpy/reference/routines.statistics.html | Numpy statistics module]] to compute it).
What can you conclude from this plot?
The paper ranks features according to their importance using a different approach.  Compare your results with what they obtain.
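
The correlation and the scatter plot can be produced with np.corrcoef from the Numpy statistics module. A minimal sketch, assuming w is the weight vector of a ridge model trained on standardized features with the same column order as X; the function names are hypothetical.

```python
import numpy as np


def feature_label_correlations(X, y):
    """Pearson correlation of each feature (column of X) with the labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(x_j, y).
    return np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])


def plot_weights_vs_correlation(w, corr, names=None):
    """Scatter plot of ridge weights against feature/label correlations."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(corr, w)
    if names is not None:
        for r, wj, name in zip(corr, w, names):
            ax.annotate(name, (r, wj))
    ax.axhline(0.0, color="gray", linewidth=0.5)
    ax.axvline(0.0, color="gray", linewidth=0.5)
    ax.set_xlabel("Pearson correlation with label")
    ax.set_ylabel("Weight vector component")
    return ax


# Usage on made-up data: a copy of y, a negated copy, and an uncorrelated column.
y = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([y, -2.0 * y + 1.0, [1.0, -1.0, -1.0, 1.0]])
corr = feature_label_correlations(X, y)
```

Annotating each point with its feature name makes it easier to compare the plot with the feature ranking reported in the paper.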

Next, perform the following experiment:
Repeatedly remove the feature whose weight vector component has the smallest absolute value, and retrain the ridge regression model.
Plot the RMSE on the test set you have set aside as a function of the number of features that remain.
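
The elimination loop described above can be sketched as follows, assuming standardized features (comparing $|w_j|$ across features is only meaningful after standardization) and a fixed $\lambda$ chosen in Part 1. The names ridge_fit, rmse and eliminate_features are hypothetical, and the compact solver is repeated so the block is self-contained.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def rmse(y, y_pred):
    return np.sqrt(np.mean((y - y_pred) ** 2))


def eliminate_features(X_tr, y_tr, X_te, y_te, lam):
    """Repeatedly drop the feature with the smallest |w_j| and retrain.

    Returns (history, dropped): history holds (n_features, test_rmse)
    pairs, starting with all features; dropped lists the original column
    indices in the order they were removed.
    """
    remaining = list(range(X_tr.shape[1]))
    history, dropped = [], []
    while remaining:
        w, b = ridge_fit(X_tr[:, remaining], y_tr, lam)
        history.append((len(remaining), rmse(y_te, X_te[:, remaining] @ w + b)))
        # Remove the column whose weight has the smallest magnitude.
        dropped.append(remaining.pop(int(np.argmin(np.abs(w)))))
    return history, dropped
```

Plotting the first element of each history pair against the second gives the requested RMSE curve, and the dropped list can be compared with the feature ranking from the scatter plot.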

===== Submission =====

Submit your report via Canvas.  Python code can be displayed in your report if it is succinct (not more than a page or two at the most) or submitted separately.  The LaTeX sample document shows how to display Python code in a LaTeX document.  Code needs to be there so we can make sure that you implemented the algorithms and data analysis methodology correctly.  Canvas allows you to submit multiple files for an assignment, so DO NOT submit an archive file (tar, zip, etc.).

===== Grading =====

Here is what the grade sheet will look like for this assignment.  A few general guidelines for this and future assignments in the course:

  * Always provide a description of the method you used to produce a given result in sufficient detail that the reader can reproduce your results on the basis of the description (UNLESS the method has been provided in class or is in the book).  Your code needs to be provided in sufficient detail so we can make sure that your implementation is correct.  The saying that "the devil is in the details" holds true for machine learning, and the details sometimes make the difference between correct and incorrect results.  If your code is more than a few lines, you can include it as an appendix to your report, or submit it as a separate file.  Make sure your code is readable!
  * You can provide results in the form of tables, figures or text, whichever form is most appropriate for a given problem.
  * In any machine learning paper there is a discussion of the results.  There is a similar expectation from your assignments that you reason about your results.  For example, for the learning curve problem, what can you say on the basis of the observed learning curve?
  * Write succinct answers.  We will take off points for rambling answers that are not to the point, and similarly if we have to wade through a lot of data/results that are not to the point.

<code>
Grading sheet for assignment 3

Part 1:  50 points.
(20 points):  Plots of MAD and RMSE as a function of lambda are generated correctly.
(20 points):  REC curves are generated correctly.
( 5 points):  Discussion of REC curves.
( 5 points):  Discussion of the MAD and RMSE plots and comparison of results to the published ones.

Part 2:  40 points.
(30 points):  Weight vector analysis.
(10 points):  Comparison to the published results.

Report structure, grammar and spelling:  10 points
(10 points):  Heading and subheading structure easy to follow and clearly divides report into logical sections.
              Code, math, figure captions, and all other aspects of the report are well-written and formatted.
              Grammar, spelling, and punctuation.
</code>
assignments/assignment3.txt · Last modified: 2016/09/20 09:34 by asa