notes:evaluating_classifier_performance [CS545 fall 2013]
Hands-on work in PyML and evaluating classifier performance
Let's start using our perceptron on some data and see how it's doing.
Reading in the data
First we need to import PyML:
In [1]: from PyML import *
Next let's load the gisette dataset that you are going to use in assignment 1:
In [2]: data = vectorDatasets.PyVectorDataSet("gisette_train.data")
This loads the feature matrix, and creates an unlabeled dataset, because the labels for this dataset are provided separately.
To attach labels to the data we read the labels:
In [3]: data.attachLabels(Labels("gisette_train.labels"))
Note that PyML has several data containers: VectorDataSet and SparseDataSet, which have an underlying C++ implementation, and PyVectorDataSet, which uses a NumPy array. We are using PyVectorDataSet in this case since our perceptron is implemented in pure Python.
Let's find out a few things about the dataset:
In [4]: print data
<PyVectorDataSet instance>
number of patterns: 6000
number of features: 5000
class Label / Size
-1 : 3000
1 : 3000
This tells us there are 6000 labeled examples in the dataset, and how many there are of each class, as well as the dimensionality of the data (number of features).
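You can also access this information directly. The example below does so for tr, the training half of the data produced by the split later in these notes:
In [4]: print len(tr), tr.numFeatures
3000 5000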
The labels are stored in a Labels object that is associated with the dataset and is accessible as data.labels.
The labels themselves are stored as a list in the Y attribute of the labels object, such that the label associated with the ith training example is data.labels.Y[i].
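To make the indexing concrete, here is a minimal sketch that counts class sizes from a plain list of labels. The list Y below is a made-up stand-in for data.labels.Y (the notes do not show how PyML encodes the entries), and the code is written for Python 3, unlike the Python 2 session in these notes.

```python
from collections import Counter

# Hypothetical stand-in for data.labels.Y: one label per example.
# The values are invented for illustration only.
Y = [-1, 1, 1, -1, 1]

# Label of the i-th example, mirroring data.labels.Y[i].
i = 2
label_of_example_i = Y[i]

# Number of examples in each class, like the "class Label / Size" summary.
class_sizes = Counter(Y)
for label in sorted(class_sizes):
    print(label, ":", class_sizes[label])
```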
As noted above, PyVectorDataSet uses a NumPy array to store its data, and this array is accessible as the X attribute of a dataset:
In [6]: print type(data.X), data.X.shape
<type 'numpy.ndarray'> (6000, 5000)
Let's split the dataset into two parts, one for training and one for testing:
In [7]: tr, tst = data.split(0.5)
The argument to split indicates what fraction of the data to use for the first dataset.
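As a rough illustration of what such a split does, here is a minimal sketch of a random split in plain Python 3. The function split_dataset and its seed parameter are hypothetical, and whether PyML's own split shuffles or stratifies the data is not stated in these notes, so treat this as a generic random split rather than PyML's implementation.

```python
import random

def split_dataset(examples, labels, fraction, seed=0):
    """Randomly split paired examples/labels; `fraction` goes to the first part."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    cut = int(round(fraction * len(indices)))
    first, second = indices[:cut], indices[cut:]

    def pick(idx):
        return [examples[i] for i in idx], [labels[i] for i in idx]

    return pick(first), pick(second)

# Toy data: 10 one-dimensional examples with alternating labels.
X = [[float(i)] for i in range(10)]
Y = [1 if i % 2 == 0 else -1 for i in range(10)]

(train_X, train_Y), (test_X, test_Y) = split_dataset(X, Y, 0.5)
print(len(train_X), len(test_X))  # 5 5
```

Every example ends up in exactly one of the two parts, so no test example is ever seen during training.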
Using the classifier
Import the perceptron module and create an instance of the classifier:
In [6]: import perceptron
In [7]: p = perceptron.Perceptron()
Every classifier has a train method that constructs the model based on some training data:
In [8]: p.train(tr)
converged in 10 iterations
Notice that the perceptron has converged, which means the training data is linearly separable.
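To see why convergence tells us something about the data, here is a minimal pure-Python 3 sketch of the classic perceptron update rule. It is not the course's perceptron module: the function train_perceptron, its max_iterations parameter, and the toy datasets are assumptions made for illustration. Training counts as converged when a full pass over the data makes no mistakes, which can only happen if some hyperplane separates the two classes.

```python
def train_perceptron(X, Y, max_iterations=100):
    """Classic perceptron updates; returns (weights, bias, converged)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_iterations):
        mistakes = 0
        for x, y in zip(X, Y):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a clean pass: the data has been separated
            return w, b, True
    return w, b, False

# Linearly separable toy data: the label is the sign of x1 - x2.
X = [[2.0, 1.0], [3.0, 0.5], [1.0, 2.0], [0.5, 3.0]]
Y = [1, 1, -1, -1]
w, b, converged = train_perceptron(X, Y)

# XOR-style toy data, which no line can separate.
X_xor = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
Y_xor = [-1, -1, 1, 1]
_, _, converged_xor = train_perceptron(X_xor, Y_xor)

print(converged, converged_xor)  # True False
```

On the XOR-style data the loop stops only because it reaches max_iterations, which is why an iteration cap is needed in practice.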
Now let's run the classifier on the data we have used for training:
In [9]: results1 = p.test(tr)
In [10]: print results1
Confusion Matrix:
predicted labels:
        -1      1
-1    1500      0
 1       0   1500
success rate: 1.000000
balanced success rate: 1.000000
area under ROC curve: 1.000000
area under ROC 50 curve: 0.980000
Since the perceptron has converged, it perfectly separates the positive from negative examples, and achieves perfect classification accuracy. Does this mean we have a good classifier? Not necessarily. Let's apply it to our testing data.
In [12]: results = p.test(tst)
In [13]: print results
Confusion Matrix:
predicted labels:
        -1      1
-1    1439     61
 1      62   1438
success rate: 0.959000
balanced success rate: 0.959000
area under ROC curve: 0.992884
area under ROC 50 curve: 0.852040
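The classifier is still doing well, but it is no longer perfect: the success rate drops from 100% on the training data to 95.9% on the test data, with 123 of the 3000 test examples misclassified.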
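The two success rates in the test output can be recomputed by hand from the confusion matrix. The Python 3 sketch below assumes that the rows of the matrix are the given labels and the columns the predicted labels, and that the balanced success rate is the average of the per-class success rates (the usual definition; the notes do not spell it out). The function success_rates is a hypothetical helper, not part of PyML.

```python
def success_rates(confusion):
    """confusion[given][predicted] -> (success rate, balanced success rate)."""
    classes = sorted(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c][c] for c in classes)
    # Fraction of each class that was classified correctly.
    per_class = [confusion[c][c] / sum(confusion[c].values()) for c in classes]
    return correct / total, sum(per_class) / len(per_class)

# Test-set counts from the output above (assumed layout: rows = given labels).
test_confusion = {-1: {-1: 1439, 1: 61}, 1: {-1: 62, 1: 1438}}
rate, balanced = success_rates(test_confusion)
print("%.6f %.6f" % (rate, balanced))  # 0.959000 0.959000
```

The two numbers coincide here because the classes are the same size; on an unbalanced dataset the balanced rate keeps a classifier from looking good merely by favoring the larger class.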
notes/evaluating_classifier_performance.txt · Last modified: 2013/09/06 09:52 by asa