Due: November 3rd at 6pm
a. In a recent lecture, your instructor claimed that when the data is very imbalanced, ROC curves are not always a good indicator of the usefulness of a classifier's results if the user is mainly interested in the classifier's top predictions. Explain!
b. True or False: A classifier with a high AUC score on a given problem will have a high success rate. Explain!
Suppose you are training an SVM classifier on datasets with a varying number of training examples, and for each dataset you select the optimal C parameter. How do you expect the optimal C parameter to scale with the number of training examples?
a. Suppose you have a dataset for which the kernel matrix has the following property: $K_{ij} \ll K_{ii}$ for $i \neq j$ (i.e. the off-diagonal elements are much smaller than the diagonal elements). Do you expect a classifier to perform well using such a kernel? (Hint: when does this happen with the Gaussian kernel?)
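To explore the hint, here is a small numeric sketch (the helper name and the example points are illustrative, not part of the assignment) showing how the Gaussian kernel matrix changes with $\gamma$:

```python
import numpy as np

def gaussian_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = (X ** 2).sum(axis=1)
    # Squared Euclidean distances between all pairs of rows
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
K_small = gaussian_kernel(X, gamma=0.1)    # off-diagonals comparable to the diagonal
K_large = gaussian_kernel(X, gamma=100.0)  # off-diagonals vanish; K is close to the identity
```

Comparing `K_small` and `K_large` shows one regime in which $K_{ij} \ll K_{ii}$ arises.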
b. A good heuristic for choosing the parameter $\gamma$ of the Gaussian kernel is the inverse of the median squared distance between pairs of examples. Explain why this is a good idea.
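For concreteness, the heuristic can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def median_heuristic_gamma(X):
    """Gamma = 1 / (median squared Euclidean distance over distinct pairs of rows)."""
    sq = (X ** 2).sum(axis=1)
    # Squared pairwise distances via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    # Keep only the strictly upper triangle, i.e. each distinct pair once
    iu = np.triu_indices_from(d2, k=1)
    return 1.0 / np.median(d2[iu])
```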
In the description of the wearable computing dataset, the contributors of the data mention that "we have lower performance on 'leave-one-subject-out' tests" (meaning lower than standard 10-fold cross-validation). Explain what they mean, and why this is the case. Which form of cross-validation is more relevant?
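As an illustration of the protocol in question, here is a minimal sketch of generating leave-one-subject-out train/test splits, assuming each example is labeled with a subject ID (the function name and the ID array are hypothetical):

```python
import numpy as np

def leave_one_subject_out_splits(subjects):
    """Yield (train_idx, test_idx) pairs, holding out all examples of one subject at a time.

    `subjects` is a per-example array of subject IDs.
    """
    subjects = np.asarray(subjects)
    for s in np.unique(subjects):
        test_idx = np.where(subjects == s)[0]
        train_idx = np.where(subjects != s)[0]
        yield train_idx, test_idx
```

Each fold tests on a subject the classifier has never seen during training, unlike standard 10-fold cross-validation, where examples from every subject can appear in both the training and test folds.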
Compare the performance of the one-vs-one and one-vs-all multi-class classifiers on the following datasets:
* Amazon commerce reviews (a 50-class dataset). This dataset is in ARFF format, and Python parsers exist for it. Here's one that works on this dataset: https://github.com/renatopp/liac-arff.
* The ISOLET spoken letter recognition.
For the Amazon dataset use cross-validation. For the ISOLET data, either use the provided dataset or perform cross-validation. In these two datasets the number of classes is large, so displaying the confusion matrix as numbers is not a good option. Find an alternative visual representation and comment on the results.
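One possible starting point for such a representation is a row-normalized confusion matrix rendered as a heatmap. Here is a minimal sketch (the function name is illustrative, and the plotting call in the usage note assumes matplotlib):

```python
import numpy as np

def normalized_confusion(y_true, y_pred, n_classes):
    """Confusion matrix with each row normalized by the true-class count.

    Rows are true classes, columns are predicted classes; entries are
    fractions in [0, 1], which render well as a heatmap for many classes.
    """
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # Guard against empty rows (classes absent from y_true)
    return cm / np.maximum(row_sums, 1)
```

The matrix can then be displayed with, e.g., `plt.imshow(cm)` followed by `plt.colorbar()`; a strong diagonal indicates good per-class accuracy, and bright off-diagonal cells reveal which class pairs are confused.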
Here is what the grading sheet will look like for this assignment. A few general guidelines for this and future assignments in the course:
Grading sheet for assignment 4

Part 1: 15 points.
* (7 points): Part a
* (8 points): Part b

Part 2: 10 points.

Part 3: 15 points.

Part 4: 10 points.

Part 5: 40 points.
* (15 points): Experimental protocol
* (15 points): Results for the two classifiers on both datasets and their visualization
* (10 points): Discussion of the results

Report structure, grammar and spelling: 15 points
* (3 points): Heading and subheading structure easy to follow and clearly divides report into logical sections.
* (4 points): Code, math, figure captions, and all other aspects of report are well-written and formatted.
* (3 points): Grammar, spelling, and punctuation.