Hint: consider the difference in the constraints.

===== Part 2: Closest Centroid Algorithm =====

Express the closest centroid algorithm in terms of kernels, i.e. determine how the coefficients $\alpha_i$ will be computed using a given labeled dataset.
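For reference, here is a minimal sketch of the closest centroid classifier in its plain vector-space form (plain numpy, outside of PyML); the exercise asks you to rewrite its decision function so that the data is accessed only through kernel evaluations $K(\mathbf{x}_i, \mathbf{x})$:
<code python>
import numpy as np

def closest_centroid_train(X, y):
    # centroids of the positive and negative classes (labels +1/-1)
    mu_pos = X[y == 1].mean(axis = 0)
    mu_neg = X[y == -1].mean(axis = 0)
    return mu_pos, mu_neg

def closest_centroid_predict(x, mu_pos, mu_neg):
    # assign x to the class whose centroid is closer
    d_pos = np.sum((x - mu_pos) ** 2)
    d_neg = np.sum((x - mu_neg) ** 2)
    return 1 if d_pos < d_neg else -1
</code>
Expanding the two squared distances is a good starting point for the kernel formulation.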

===== Part 3: Soft-margin for separable data =====

Consider training a soft-margin SVM with $C$ set to some positive constant. Suppose the training data is linearly separable.
Since increasing the $\xi_i$ can only increase the objective of the primal problem (which we are trying to minimize), at the optimal solution to the primal problem all the training examples will have $\xi_i$ equal to zero. True or false? Explain!

Given a linearly separable dataset, is it necessarily better to use a hard-margin SVM over a soft-margin SVM?

===== Part 4: Using SVMs =====

The data for this question comes from a database called SCOP (structural classification of proteins), which classifies proteins into classes according to their structure (download it from {{assignments:scop_motif.data|here}}).
The task is a two-class classification problem: distinguishing a particular class of proteins from a selection of examples sampled from the rest of the SCOP database.
I chose to represent the proteins in terms of their motif composition. A sequence motif is a pattern of nucleotides/amino acids that is conserved in evolution.
Motifs are usually associated with regions of the protein that are important for its function, and are therefore useful in differentiating between classes of proteins.
A given protein will typically contain only a handful of motifs, so the data is very sparse. It is also very high dimensional, since the number of conserved patterns in the space of all proteins is large.
The data was constructed as part of the following work on detecting distant relationships between proteins:

  * A. Ben-Hur and D. Brutlag. [[http://bioinformatics.oxfordjournals.org/content/19/suppl_1/i26.abstract|Remote homology detection: a motif based approach]]. In: Proceedings, eleventh international conference on intelligent systems for molecular biology. Bioinformatics 19(Suppl. 1): i26-i33, 2003.

In this part of the assignment we will explore the dependence of classifier accuracy on the kernel, kernel parameters, kernel normalization, and the SVM soft-margin parameter.
The use of the SVM class is discussed in the PyML [[http://pyml.sourceforge.net/tutorial.html#svms|tutorial]] and via help(SVM) in the Python interpreter.

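For instance, a minimal training run might look like the following sketch; the choice of SparseDataSet as the reader and the absence of any format flags are assumptions, so check the data file and the tutorial for the exact invocation:
<code python>
>>> from PyML import *
>>> data = SparseDataSet('scop_motif.data')  # reader class is an assumption
>>> s = SVM()                                # linear kernel by default
>>> s.train(data)
</code>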
By default, a dataset is instantiated with a linear kernel attached to it.
To use a different kernel you need to attach a new kernel to the dataset:
<code python>
>>> data.attachKernel(ker.Polynomial(degree = 3))
</code>
Alternatively, you can instantiate an SVM with a given kernel:
<code python>
>>> classifier = SVM(ker.Gaussian(gamma = 0.1))
</code>
This will override the kernel the data is associated with.

In this question we will consider both the Gaussian and polynomial kernels:
$$
K_{gauss}(\mathbf{x}, \mathbf{x}') = \exp(-\gamma || \mathbf{x} - \mathbf{x}' ||^2)
$$
$$
K_{poly}(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^{p}
$$
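To make the formulas concrete, here is a direct numpy transcription of the two kernels (for illustration only; in PyML you would use ker.Gaussian and ker.Polynomial as above):
<code python>
import numpy as np

def gaussian_kernel(x, xp, gamma):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    return np.exp(-gamma * np.sum((x - xp) ** 2))

def polynomial_kernel(x, xp, p):
    # K(x, x') = (1 + x^T x')^p
    return (1.0 + np.dot(x, xp)) ** p
</code>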
Plot the accuracy of the SVM, measured using the balanced success rate, as a function of both the soft-margin parameter of the SVM and the free parameter of the kernel function.
Show a couple of representative cross sections of this plot for a given value of the other parameter.
Comment on the results. When exploring the values of a continuous classifier/kernel parameter it is useful to use values that are distributed on an exponential grid, i.e. something like 0.01, 0.1, 1, 10, 100 (note that the degree of the polynomial kernel is not such a parameter).
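One convenient way to generate such a grid is numpy.logspace. The sketch below loops over the two parameters and records cross-validation accuracy; the C constructor keyword, the cv signature, and the accessor for the balanced success rate are assumptions based on PyML's conventions, so verify them with help(SVM) and by inspecting the returned Results object:
<code python>
import numpy as np
from PyML import *

# exponential grids for the soft-margin parameter and the kernel width
C_values = np.logspace(-2, 2, 5)      # 0.01, 0.1, 1, 10, 100
gamma_values = np.logspace(-2, 2, 5)

data = SparseDataSet('scop_motif.data')
accuracy = np.zeros((len(C_values), len(gamma_values)))
for i, C in enumerate(C_values):
    for j, gamma in enumerate(gamma_values):
        s = SVM(ker.Gaussian(gamma = gamma), C = C)
        result = s.cv(data, 5)        # 5-fold cross-validation
        # accessor name is an assumption; check the Results object
        accuracy[i, j] = result.getBalancedSuccessRate()
</code>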

For this type of sparse dataset it is useful to normalize the input features.
One way to do so is to divide each input example by its norm. This is accomplished in PyML by:
<code python>
>>> data.normalize()
</code>
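In vector-space terms, data.normalize() scales each example to unit length; the following numpy sketch performs the same operation outside PyML's dataset containers:
<code python>
import numpy as np

def normalize_rows(X):
    # scale each row (example) to unit Euclidean norm
    norms = np.linalg.norm(X, axis = 1, keepdims = True)
    norms[norms == 0] = 1.0   # leave all-zero examples unchanged
    return X / norms
</code>
After this normalization the linear kernel between two examples equals their cosine similarity, $K(\mathbf{x}, \mathbf{x}') / \sqrt{K(\mathbf{x}, \mathbf{x}) K(\mathbf{x}', \mathbf{x}')}$, which is why it can also be viewed as a form of kernel normalization.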
Compare the results under this normalization with what you obtain without it.
Explain the structure that you are seeing in the plot (it is more interesting when the data is normalized).