feature_selection_bias [CS545 fall 2013]
When using feature selection, you need to be very careful about how you evaluate your classifier.
Here's the wrong way of doing it:
from PyML import *
# the wrong way of using feature selection
data = SparseDataSet('colon.data')
# distinguish between normal tissue and tissue affected by colon cancer
# data is available from:
# http://mldata.org/repository/data/viewslug/colon-cancer/

# create an instance of the RFE feature selection method
rfe = featsel.RFE()
# a feature selector's train method selects a subset of features
rfe.train(data)
results1 = SVM().stratifiedCV(data)
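If you run this, cross-validation will report perfect accuracy. That estimate is biased: the features were selected using the labels of all the examples, including the ones that later serve as test examples in cross-validation, so information about the test data has leaked into training. Now let's do it the right way: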
# the right way to perform feature selection:
# feature selection is performed as part of training the classifier
data = SparseDataSet('colon.data')
results2 = composite.FeatureSelect(SVM(), featsel.RFE()).stratifiedCV(data)
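The difference between the two approaches above can be reproduced without PyML. The following is a minimal sketch under stated assumptions, not the course's code: it uses only the Python standard library, replaces RFE with a simple mean-difference filter and the SVM with a nearest-centroid classifier, and runs on synthetic data that is pure noise, so any accuracy well above 50% can only come from selection bias. All function names and parameter values here are hypothetical.

```python
import random

def make_noise_data(n_samples=40, n_features=2000, seed=0):
    """Features are pure noise, so no classifier can truly beat 50% accuracy."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced labels, unrelated to X
    return X, y

def class_means(X, y, rows, features):
    """Per-class mean of the given features, computed over the given rows only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members) for j in features]
    return means

def select_features(X, y, rows, k):
    """Simple filter (a stand-in for RFE): keep the k features whose class means differ most."""
    all_features = list(range(len(X[0])))
    means = class_means(X, y, rows, all_features)
    gap = [abs(a - b) for a, b in zip(means[0], means[1])]
    return sorted(all_features, key=lambda j: gap[j], reverse=True)[:k]

def predict(X, means, features, i):
    """Nearest-centroid prediction (a stand-in for the SVM)."""
    def dist(label):
        return sum((X[i][j] - m) ** 2 for j, m in zip(features, means[label]))
    return 0 if dist(0) <= dist(1) else 1

def cross_validate(X, y, k=20, n_folds=5, select_inside=True):
    """Return cross-validated accuracy with selection inside or outside the folds."""
    n = len(X)
    all_rows = list(range(n))
    # the wrong way: select once, using every example (including future test examples)
    global_features = None if select_inside else select_features(X, y, all_rows, k)
    correct = 0
    for fold in range(n_folds):
        test = [i for i in all_rows if i % n_folds == fold]
        train = [i for i in all_rows if i % n_folds != fold]
        # the right way: select using the training part of this fold only
        features = select_features(X, y, train, k) if select_inside else global_features
        means = class_means(X, y, train, features)
        correct += sum(predict(X, means, features, i) == y[i] for i in test)
    return correct / n

X, y = make_noise_data()
biased = cross_validate(X, y, select_inside=False)
honest = cross_validate(X, y, select_inside=True)
print("selection outside CV: %.2f" % biased)
print("selection inside CV:  %.2f" % honest)
```

Because the labels carry no real signal, the estimate with selection inside the folds should stay near chance, while the estimate with selection done once on all the data should come out much higher. That gap is the bias that composite.FeatureSelect avoids by making feature selection part of training.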
feature_selection_bias.txt · Last modified: 2013/11/12 18:50 by asa