This shows you the differences between two versions of the page.
assignments:assignment3 [2015/10/02 12:10] asa |
assignments:assignment3 [2015/10/02 12:12] asa |
||
---|---|---|---|
Line 45: | Line 45: | ||
d1scta_,a.1.1.2 31417:1.0 32645:1.0 39208:1.0 42164:1.0 .... | d1scta_,a.1.1.2 31417:1.0 32645:1.0 39208:1.0 42164:1.0 .... | ||
</code> | </code> | ||
- | The first column is the ID of the protein, the second is the class it belongs to (the values for the class variable are ''a.1.1.2'', which is the given class of proteins, and ''rest'' which is the negative class representing the rest of the database), and the rest of the elements are pairs of the form ''feature_id:value'' - an id of a feature and the value associated with it. | + | The first column is the ID of the protein, the second is the class it belongs to (the values for the class variable are ''a.1.1.2'', which is the given class of proteins, and ''rest'' which is the negative class representing the rest of the database); the remainder consists of elements of the form ''feature_id:value'' which provide an id of a feature and the value associated with it. |
This is an extension of the format used by LibSVM, which scikit-learn can read. | This is an extension of the format used by LibSVM, which scikit-learn can read. | ||
- | See a discussion [[http://scikit-learn.org/stable/datasets/#datasets-in-svmlight-libsvm-format | here]]. | + | See a discussion of this format and how to read it [[http://scikit-learn.org/stable/datasets/#datasets-in-svmlight-libsvm-format | here]]. |
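The record layout described above can be illustrated with a minimal parsing sketch. It is not part of the assignment page: the helper name ''parse_line'' is hypothetical, and the sketch assumes that the ID and class are joined by a comma and that the ''feature_id:value'' pairs are separated by whitespace, as in the sample line.

```python
def parse_line(line):
    """Parse one record of the form '<id>,<class> <feature_id>:<value> ...'.

    Hypothetical helper, written only to illustrate the layout.
    """
    head, *pairs = line.split()
    # The first whitespace-delimited token holds the ID and the class.
    protein_id, label = head.split(",", 1)
    features = {}
    for pair in pairs:
        if ":" not in pair:
            continue  # ignore tokens that are not feature_id:value pairs
        feature_id, value = pair.split(":", 1)
        features[int(feature_id)] = float(value)
    return protein_id, label, features


record = "d1scta_,a.1.1.2 31417:1.0 32645:1.0 39208:1.0 42164:1.0"
protein_id, label, features = parse_line(record)

# Map the two class values to +1 / -1 (an assumed encoding, not from the page).
y = 1 if label == "a.1.1.2" else -1
```

Storing each example as a dictionary keeps only the nonzero features, which suits data where feature ids run into the tens of thousands but each line lists only a few of them.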
We note that the data is very high dimensional since | We note that the data is very high dimensional since |