Due: November 3rd at 6pm
a. In a recent lecture, your instructor claimed that when the data is very imbalanced, ROC curves are not always a good indicator of the usefulness of a classifier's results if the user is mainly interested in the classifier's top predictions. Explain!
b. True or False: A classifier with a high AUC score on a given problem will have a high success rate. Explain!
Suppose you are training an SVM classifier on datasets with a varying number of training examples, and for each dataset you select the optimal C parameter. How do you expect the optimal C parameter to scale with the number of training examples?
a. Suppose you have a dataset for which the kernel matrix has the following property: $K_{ij} \ll K_{ii}$ for $i \neq j$ (i.e. the off-diagonal elements are much smaller than the diagonal elements). Do you expect a classifier to perform well using such a kernel? (Hint: when does this happen with the Gaussian kernel?)
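To explore the hint, here is a small numeric sketch (the helper name and the example points are illustrative, not part of the assignment) showing how the Gaussian kernel matrix changes with $\gamma$:

```python
import numpy as np

def gaussian_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = (X ** 2).sum(axis=1)
    # Squared Euclidean distances between all pairs of rows
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
K_small = gaussian_kernel(X, gamma=0.1)    # off-diagonals comparable to the diagonal
K_large = gaussian_kernel(X, gamma=100.0)  # off-diagonals vanish; K is close to the identity
```

Comparing `K_small` and `K_large` shows one regime in which $K_{ij} \ll K_{ii}$ arises.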
b. A good heuristic for choosing the parameter $\gamma$ of the Gaussian kernel is the inverse of the median squared distance between pairs of examples. Explain why this is a good idea.
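For concreteness, the heuristic can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def median_heuristic_gamma(X):
    """Gamma = 1 / (median squared Euclidean distance over distinct pairs of rows)."""
    sq = (X ** 2).sum(axis=1)
    # Squared pairwise distances via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    # Keep only the strictly upper triangle, i.e. each distinct pair once
    iu = np.triu_indices_from(d2, k=1)
    return 1.0 / np.median(d2[iu])
```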
In the description of the wearable computing dataset, the contributors of the data mention that "we have lower performance on 'leave-one-subject-out' tests" (meaning lower than standard 10-fold cross-validation). Explain what they mean, and why this is the case. Which form of cross-validation is more relevant?
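As an illustration of the protocol in question, here is a minimal sketch of generating leave-one-subject-out train/test splits, assuming each example is labeled with a subject ID (the function name and the ID array are hypothetical):

```python
import numpy as np

def leave_one_subject_out_splits(subjects):
    """Yield (train_idx, test_idx) pairs, holding out all examples of one subject at a time.

    `subjects` is a per-example array of subject IDs.
    """
    subjects = np.asarray(subjects)
    for s in np.unique(subjects):
        test_idx = np.where(subjects == s)[0]
        train_idx = np.where(subjects != s)[0]
        yield train_idx, test_idx
```

Each fold tests on a subject the classifier has never seen during training, unlike standard 10-fold cross-validation, where examples from every subject can appear in both the training and test folds.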
Compare the performance of the one-vs-one and one-vs-all multi-class classifiers on the following datasets:
* Amazon commerce reviews (a 50-class dataset). This dataset is in ARFF format, and Python parsers exist for it. Here's one that works on this dataset: https://github.com/renatopp/liac-arff.
* The ISOLET spoken letter recognition.
For the Amazon dataset use cross-validation. For the ISOLET data, either use the provided dataset or perform cross-validation. In these two datasets the number of classes is large, so displaying the confusion matrix as numbers is not a good option. Find an alternative visual representation and comment on the results.
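One possible starting point for such a representation is a row-normalized confusion matrix rendered as a heatmap. Here is a minimal sketch (the function name is illustrative, and the plotting call in the usage note assumes matplotlib):

```python
import numpy as np

def normalized_confusion(y_true, y_pred, n_classes):
    """Confusion matrix with each row normalized by the true-class count.

    Rows are true classes, columns are predicted classes; entries are
    fractions in [0, 1], which render well as a heatmap for many classes.
    """
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # Guard against empty rows (classes absent from y_true)
    return cm / np.maximum(row_sums, 1)
```

The matrix can then be displayed with, e.g., `plt.imshow(cm)` followed by `plt.colorbar()`; a strong diagonal indicates good per-class accuracy, and bright off-diagonal cells reveal which class pairs are confused.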
Here is what the grading sheet will look like for this assignment. A few general guidelines for this and future assignments in the course:
Grading sheet for assignment 4

Part 1: 15 points.
* (7 points): Part a
* (8 points): Part b

Part 2: 10 points.

Part 3: 15 points.

Part 4: 10 points.

Part 5: 40 points.
* (15 points): Experimental protocol
* (15 points): Results for the two classifiers on both datasets and their visualization
* (10 points): Discussion of the results

Report structure, grammar and spelling: 15 points
* (3 points): Heading and subheading structure easy to follow and clearly divides report into logical sections.
* (4 points): Code, math, figure captions, and all other aspects of report are well-written and formatted.
* (3 points): Grammar, spelling, and punctuation.