===== Part 2: Closest Centroid Algorithm =====
Express the closest centroid algorithm in terms of kernels, i.e. determine how the coefficients $\alpha_i$ will be computed using a given labeled dataset.
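As a reminder (the notation here is ours, not part of the prompt): the closest centroid classifier computes the mean of each class and assigns a new point to the class whose mean is nearest,
$$
\mathbf{c}_{\pm} = \frac{1}{n_{\pm}} \sum_{i : y_i = \pm 1} \mathbf{x}_i , \qquad
f(\mathbf{x}) = \mathrm{sign}\left( || \mathbf{x} - \mathbf{c}_{-} ||^2 - || \mathbf{x} - \mathbf{c}_{+} ||^2 \right) .
$$
Expanding the squared distances into dot products is the natural starting point for writing the decision function in terms of kernel evaluations.
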
===== Part 3: Soft-margin SVM for separable data =====

Consider training a soft-margin SVM
with $C$ set to some positive constant. Suppose the training data is linearly separable.
Since increasing the $\xi_i$ can only increase the objective of the primal problem (which
we are trying to minimize), at the optimal solution to the primal problem, all the
training examples will have $\xi_i$ equal
to zero. True or false? Explain!
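For reference, a standard statement of the soft-margin primal problem (the notation is ours):
$$
\min_{\mathbf{w}, b, \xi} \; \frac{1}{2} || \mathbf{w} ||^2 + C \sum_{i} \xi_i
\quad \textrm{subject to} \quad y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1 - \xi_i , \;\; \xi_i \geq 0 .
$$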
Given a linearly separable dataset, is it necessarily better to use a
hard-margin SVM over a soft-margin SVM?
+ | |||
+ | ===== Part 4: Using SVMs ===== | ||
The data for this question comes from a database called SCOP (structural
classification of proteins), which classifies proteins into classes
according to their structure (download it from {{assignments:scop_motif.data|here}}).
The data is a two-class classification
problem
  * A. Ben-Hur and D. Brutlag. [[http://bioinformatics.oxfordjournals.org/content/19/suppl_1/i26.abstract | Remote homology detection: a motif based approach]]. In: Proceedings, eleventh international conference on intelligent systems for molecular biology. Bioinformatics 19(Suppl. 1): i26-i33, 2003.
In this part of the assignment we will explore the dependence of classifier accuracy on
the kernel, kernel parameters, kernel normalization, and the SVM soft-margin parameter.
The use of the SVM class is discussed in the PyML [[http://pyml.sourceforge.net/tutorial.html#svms|tutorial]], and by typing help(SVM) in the Python interpreter.
By default, a dataset is instantiated with a linear kernel attached to it.
To use a different kernel you need to attach a new kernel to the dataset:
<code python>
>>> data.attachKernel(ker.Polynomial(degree = 3))
</code>
Alternatively, you can instantiate an SVM with a given kernel:
<code python>
>>> classifier = SVM(ker.Gaussian(gamma = 0.1))
</code>
This will override the kernel the data is associated with.

In this question we will consider both the Gaussian and polynomial kernels:
$$
K_{gauss}(\mathbf{x}, \mathbf{x}') = \exp(-\gamma || \mathbf{x} - \mathbf{x}' ||^2)
$$
$$
K_{poly}(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^{p}
$$
Here $\gamma$ corresponds to the gamma argument of ker.Gaussian, and the degree $p$ to the degree argument of ker.Polynomial.
Plot the accuracy of the SVM, measured using the balanced success rate,
as a function of both the soft-margin parameter of the SVM and the free parameter
of the kernel function.
Show a couple of representative cross sections of this plot for a given value
Comment on the results. When exploring the values of a continuous
classifier/kernel parameter it is
useful to use values that are distributed on an exponential grid,
i.e. something like 0.01, 0.1, 1, 10, 100 (note that the degree of the
polynomial kernel is not such a parameter).
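As an illustration, here is a sketch of such a grid scan over the Gaussian width and the soft-margin parameter. It assumes, as in the PyML documentation, that SVM accepts a C keyword argument, that cv performs cross-validation, and that the resulting object exposes getBalancedSuccessRate; verify these against help(SVM):
<code python>
from PyML import *

data = SparseDataSet('scop_motif.data')   # the downloaded dataset
gammas = [0.01, 0.1, 1, 10, 100]          # exponential grid for the Gaussian width
Cs = [0.01, 0.1, 1, 10, 100]              # exponential grid for the soft-margin parameter

accuracy = {}
for gamma in gammas :
    data.attachKernel(ker.Gaussian(gamma = gamma))
    for C in Cs :
        classifier = SVM(C = C)
        results = classifier.cv(data, 5)  # 5-fold cross-validation
        accuracy[(gamma, C)] = results.getBalancedSuccessRate()
</code>
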
For this type of sparse dataset it is useful to normalize the input features.
One way to do so is to divide each input example by its norm. This is
accomplished in PyML by:
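<code python>
>>> # a sketch, assuming the dataset's normalize method, whose argument
>>> # is the order of the norm (2 divides each example by its L2 norm)
>>> data.normalize(2)
</code>
Check help(data.normalize) for the exact signature.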