assignments:assignment3 [CS545 fall 2016]

Revision of 2015/10/02 10:04 by asa (previous revision: 2015/10/02 09:48 by asa)
important for its function, and are therefore useful in differentiating between classes of proteins.
A given protein will typically contain only a handful of motifs, and
so the data is very sparse.
Therefore, only the non-zero elements of the data are represented.
Each line in the file describes a single example and has the format:

<code>
d1scta_,a.1.1.2 31417:1.0 32645:1.0 39208:1.0 42164:1.0 ....
</code>
The first field is the ID of the protein and the second, after the comma, is the class it belongs to. The class variable takes the values ''a.1.1.2'', which is the given class of proteins, and ''rest'', the negative class representing the rest of the database. The remaining elements are pairs of the form ''feature_id:value'': the ID of a feature and the value associated with it.
This is an extension of the format used by LibSVM, which scikit-learn can read.
See a discussion [[http://scikit-learn.org/stable/datasets/#datasets-in-svmlight-libsvm-format | here]].
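
A minimal Python sketch of reading this format by hand, assuming each line is whitespace-separated and its first token is exactly ''protein_id,class'' as in the example above. It is not part of the assignment handout, and the names ''parse_line'' and ''parse_lines'' are hypothetical. Each example comes back as the protein ID, the class string and a dict holding only the non-zero features. Because the first token combines the ID and the class, scikit-learn's ''load_svmlight_file'' will most likely not accept the file unmodified (it expects a numeric label as the first token), so some preprocessing along these lines is probably needed.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical alias for one parsed example: (protein_id, label, features).
Example = Tuple[str, str, Dict[int, float]]


def parse_line(line: str) -> Example:
    """Parse 'protein_id,class feature_id:value feature_id:value ...'."""
    tokens = line.split()
    # Assumption: the first token is '<protein_id>,<class>' with one comma.
    protein_id, label = tokens[0].split(",", 1)
    features: Dict[int, float] = {}
    for pair in tokens[1:]:
        feature_id, value = pair.split(":", 1)
        # Only non-zero entries appear in the file, so the dict is sparse too.
        features[int(feature_id)] = float(value)
    return protein_id, label, features


def parse_lines(lines: Iterable[str]) -> Iterator[Example]:
    """Parse every non-blank line, e.g. from an open file object."""
    for line in lines:
        if line.strip():
            yield parse_line(line)
```

For example, ''parse_line("d1scta_,a.1.1.2 31417:1.0 32645:1.0 39208:1.0 42164:1.0")'' returns ''("d1scta_", "a.1.1.2", {31417: 1.0, 32645: 1.0, 39208: 1.0, 42164: 1.0})''. Passing an open file to ''parse_lines'' reads a whole data file lazily, one example at a time.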

We note that the data is also very high-dimensional, since
the number of conserved patterns in the space of all proteins is
large.
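
Because each example has only a handful of non-zeros out of a very large number of possible features, a compressed sparse row (CSR) layout is one common way to hold the whole dataset in memory. The sketch below is illustrative only: ''to_csr'' and ''density'' are hypothetical helpers, and it assumes feature IDs can be used directly as column indices, so the number of columns is taken to be the largest ID plus one. The three arrays it returns are in the layout accepted by ''scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_features))'', if you prefer to hand them to SciPy.

```python
from typing import Dict, List, Tuple


def to_csr(rows: List[Dict[int, float]]) -> Tuple[List[float], List[int], List[int], int]:
    """Pack sparse rows (feature id -> value) into CSR arrays.

    Returns (data, indices, indptr, n_features): the stored values of row i
    are data[indptr[i]:indptr[i + 1]], at columns indices[indptr[i]:indptr[i + 1]].
    """
    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    n_features = 0
    for row in rows:
        for feature_id in sorted(row):  # keep column indices sorted within a row
            data.append(row[feature_id])
            indices.append(feature_id)
            # Assumption: feature IDs are used directly as column indices.
            n_features = max(n_features, feature_id + 1)
        indptr.append(len(data))  # row boundary
    return data, indices, indptr, n_features


def density(rows: List[Dict[int, float]]) -> float:
    """Fraction of stored entries in the implied n_rows x n_features matrix."""
    data, _, indptr, n_features = to_csr(rows)
    n_cells = (len(indptr) - 1) * n_features
    return len(data) / n_cells if n_cells else 0.0
```

For the single line shown in the format example (only its first four pairs), ''to_csr'' stores 4 values in a row that is 42165 columns wide, a density below 0.01%. A dense array would spend almost all of its memory on zeros.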
assignments/assignment3.txt · Last modified: 2016/09/20 09:34 by asa