This shows you the differences between two versions of the page.
assignments:assignment3 [2015/10/02 10:04] asa |
assignments:assignment3 [2015/10/02 12:05] asa |
||
---|---|---|---|
Line 32: | Line 32: | ||
of distinguishing a particular class of proteins from a selection of | of distinguishing a particular class of proteins from a selection of | ||
examples sampled from the rest of the SCOP database | examples sampled from the rest of the SCOP database | ||
- | using features derived from their sequence (note that a protein is an arbitrary length sequence over the alphabet of the 20 amino acids). | + | using features derived from their sequence (a protein is a chain of amino acids, so as computer scientists, we can consider it as a sequence over the alphabet of the 20 amino acids). |
- | I chose to represent the proteins in | + | I chose to represent the proteins in terms of their motif composition. A sequence motif is a |
- | terms of their motif composition. A sequence motif is a | + | |
pattern of amino acids that is conserved in evolution. | pattern of amino acids that is conserved in evolution. | ||
Motifs are usually associated with regions of the protein that are | Motifs are usually associated with regions of the protein that are | ||
Line 41: | Line 40: | ||
so the data is very sparse. | so the data is very sparse. | ||
Therefore, only the non-zero elements of the data are represented. | Therefore, only the non-zero elements of the data are represented. | ||
- | Each line in the file describes a single example and has the format: | + | Each line in the file describes a single example. Here's an example from the file: |
<code> | <code> |