Differences

This shows you the differences between two versions of the page.

--- assignments:assignment5 [2013/11/04 20:18]
asa created
+++ assignments:assignment5 [2016/10/17 19:20]
asa
@@ Line 1: / Line 1: @@
-========= Assignment 5: Naive Bayes ============
+~~NOTOC~~
-Due:  November 17th at 6pm
+======== Assignment 5: Neural networks ===========
-===== Part 1:  A few short questions about naive Bayes =====
+Due:  October 31st at 11:59pm
+===== Part 1:  Multi-layer perceptrons =====
-  - Can you use naive Bayes for data that contains both categorical and real-valued features?
+In the first few slides about neural networks (also section 7.1 in chapter e-7) we discussed the expressive power of multi-layer perceptrons with a "sign" activation function.  Describe in detail a multi-layer perceptron that implements the following decision boundary:
-  - The basic assumption in naive Bayes is that all attributes are independent given the label.  How can you model just 2 of $d$ features as dependent?
-  - Given a trained naive Bayes classifier, and without access to the training data, how would you select a subset of features that are most predictive of the class label?
+{{ :assignments:boundary.png?200 |}}
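As a rough illustration of the construction discussed in those slides, the sketch below builds a two-layer perceptron with a sign activation whose positive region is the intersection of two half-planes. The half-planes (x1 >= 0 and x2 >= 0), the function names, and the test points are assumptions chosen for illustration; this is not the boundary shown in the figure.

```python
import numpy as np

def sign(z):
    # Sign activation returning +1 or -1 (z == 0 is mapped to +1 by convention here).
    return np.where(z >= 0, 1.0, -1.0)

def mlp_and_of_halfplanes(X):
    # Hidden layer: two perceptrons, each testing one hypothetical half-plane:
    #   h1 = sign(x1), h2 = sign(x2)
    W1 = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
    b1 = np.array([0.0, 0.0])
    H = sign(X @ W1.T + b1)
    # Output unit: logical AND of the hidden units, sign(h1 + h2 - 1.5),
    # which is +1 only when both hidden units output +1.
    w2 = np.array([1.0, 1.0])
    b2 = -1.5
    return sign(H @ w2 + b2)

# One point per quadrant; only the first lies in both half-planes.
X = np.array([[1.0, 2.0], [-1.0, 2.0], [1.0, -2.0], [-1.0, -2.0]])
labels = mlp_and_of_halfplanes(X)
print(labels)
```

More complicated regions can be assembled the same way, with one hidden unit per linear piece of the boundary and further units that combine them with AND/OR.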
+===== Part 2:  Exploring neural networks for digit classification =====
+In this segment of the assignment we will explore classification of handwritten digits with neural networks.  For that task, we will use part of the [[http://yann.lecun.com/exdb/mnist/ |MNIST]] dataset, which is very commonly used in the machine learning community.
+Your task is to explore various aspects of multi-layer neural networks using this dataset.
+For simplicity, use 25 percent of the data for evaluating network performance, and reserve the rest for training.
+Normalize the data by dividing the features by the maximum value, which will normalize them to the range [0,1] (since the minimum is 0).
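The normalization and the 75/25 split described above might look roughly like the following sketch. The function name `normalize_and_split`, the random seed, and the synthetic stand-in data are assumptions for illustration; loading the actual MNIST subset is not shown.

```python
import numpy as np

def normalize_and_split(X, y, test_fraction=0.25, seed=0):
    # Divide by the maximum feature value; since the minimum is 0,
    # this maps all features into the range [0, 1].
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X = X / X.max()
    # Shuffle the examples, then hold out test_fraction of them for evaluation.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Synthetic stand-in for digit images: 200 examples, 784 pixel values in 0..255.
rng = np.random.default_rng(1)
X_demo = rng.integers(0, 256, size=(200, 784))
y_demo = rng.integers(0, 10, size=200)
X_train, y_train, X_test, y_test = normalize_and_split(X_demo, y_demo)
print(X_train.shape, X_test.shape)
```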
+As a basis for your implementation use the neural network code I showed in class.
+Here's what you need to do:
+  * Plot network accuracy as a function of the number of hidden units for a single-layer network with a logistic activation function.  Use a range of values where the network displays both under-fitting and over-fitting.
+  * Plot network accuracy as a function of the number of hidden units for a two-layer network with a logistic activation function.  Here, also demonstrate performance in a range of values where the network exhibits both under-fitting and over-fitting.  Does this dataset benefit from the use of more than one layer?
+  * Add weight decay regularization to the neural network class you used (explain in your report how you did it).  Does the network demonstrate less over-fitting on this dataset with the addition of weight decay?
+  * The provided implementation uses the same activation function in each layer.  For solving regression problems we need to use a linear activation function to produce the output of the network.  Explain why, and what changes need to be made in the code.
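For the weight decay item, one common formulation adds an L2 penalty (weight_decay / 2) * ||W||^2 to the loss, which contributes an extra weight_decay * W term to the gradient. The sketch below shows a single gradient step under that assumption; the function name and parameter values are hypothetical and are not taken from the code shown in class.

```python
import numpy as np

def sgd_step_with_weight_decay(W, grad, learning_rate=0.1, weight_decay=1e-3):
    # The penalty (weight_decay / 2) * ||W||^2 adds weight_decay * W to the
    # gradient of the loss, so every step also shrinks the weights toward zero.
    return W - learning_rate * (grad + weight_decay * W)

W = np.array([[1.0, -2.0],
              [0.5, 0.0]])
# With a zero data gradient, only the decay term acts on the weights.
W_new = sgd_step_with_weight_decay(W, np.zeros_like(W))
print(W_new)
```

With a zero data gradient each weight is simply scaled by (1 - learning_rate * weight_decay), which is where the name "weight decay" comes from.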
+The provided code includes a bias term only in the first layer.  For 5 extra points, modify the code so that it correctly uses a bias in all layers.
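One common way to give every layer a bias is to append a constant 1 to the input of each layer, so that the last row of each weight matrix acts as the bias. The sketch below shows only a forward pass under that assumption; the names `add_bias_column` and `forward_with_bias` and the layer sizes are hypothetical, and the structure of the class code may differ.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias_column(A):
    # Append a column of ones so the last row of the next weight matrix is a bias.
    return np.hstack([A, np.ones((A.shape[0], 1))])

def forward_with_bias(X, weights):
    # Each weight matrix has shape (n_inputs + 1, n_outputs); the bias column
    # is re-appended before every layer, not just the first one.
    A = X
    for W in weights:
        A = logistic(add_bias_column(A) @ W)
    return A

rng = np.random.default_rng(0)
# Hypothetical network: 4 inputs, 3 hidden units, 2 outputs.
weights = [rng.normal(size=(4 + 1, 3)), rng.normal(size=(3 + 1, 2))]
out = forward_with_bias(rng.normal(size=(5, 4)), weights)
print(out.shape)
```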
+===== Submission =====
+Submit your report via Canvas.  Python code can be displayed in your report if it is short and helps the reader understand what you have done.  The sample LaTeX document provided in assignment 1 shows how to display Python code.  Submit the Python code that was used to generate the results as a file called ''assignment5.py'' (you can split the code into several .py files; Canvas allows you to submit multiple files).  Typing
+<code>
+$ python assignment5.py
+</code>
+should generate all the tables/plots used in your report.
-===== Part 2:  naive Bayes implementation =====
-Implement a naive Bayes classifier for either categorical or continuous data.  Compare its performance to that of an SVM (make sure to perform proper model selection for classifier parameters using internal cross-validation).  Use two UCI repository datasets for this task.  There are several datasets that have categorical data: e.g. [[http://archive.ics.uci.edu/ml/datasets/Nursery | nursery school application ranking]], [[http://archive.ics.uci.edu/ml/datasets/Adult | census income prediction]], and [[http://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Splice-junction+Gene+Sequences)| splice junction detection]].  If you are implementing naive Bayes for categorical data, make sure to include pseudo-counts to avoid over fitting.
 ===== Grading =====
-Here is what the grading sheet will look like for this assignment.  A few general guidelines for this and future assignments in the course:
+A few general guidelines for this and future assignments in the course:
-  * Always provide a description of the method you used to produce a given result in sufficient detail such that the reader can reproduce your results on the basis of the description.  You can use a few lines of python code or pseudo-code.  If your code is more than a few lines, you can include it as an appendix to your report.  For example, for the first part of the assignment, provide the protocol you use to evaluate classifier accuracy.
+  * Your answers should be concise and to the point.
+  * You need to use LaTeX to write the report.
+  * The report should be well structured and clearly written, with good grammar and correct spelling, and with well-formatted math, code, figures and captions (every figure and table needs a caption that explains what is being shown).
+  * Whenever you use information from the web or published papers, a reference should be provided.  Failure to do so is considered plagiarism.
+  * Always provide a description of the method you used to produce a given result in sufficient detail such that the reader can reproduce your results on the basis of the description.  You can use a few lines of python code or pseudo-code.
   * You can provide results in the form of tables, figures or text - whatever form is most appropriate for a given problem.  There are no rules about how much space each answer should take.  BUT we will take off points if we have to wade through a lot of redundant data.
-  * In any machine learning paper there is a discussion of the results.  There is a similar expectation from your assignments that you reason about your results.  For example, for the learning curve problem, what can you say on the basis of the observed learning curve?
+  * In any machine learning paper there is a discussion of the results.  There is a similar expectation from your assignments that you reason about your results.
+We will take off points if these guidelines are not followed.
 <code>
-Grading sheet for assignment 5
+Grading sheet for assignment 4
+Part 1:  40 points.
+( 5 points):  Primal SVM formulation is correct
+(10 points):  Lagrangian found correctly
+(10 points):  Derivation of saddle point equations
+(15 points):  Derivation of the dual
+Part 2:  15 points.
-Part 1:  30 points.
+Part 2:  15 points.
-(10 points):  1st question
-(10 points):  1st question
-(10 points):  1st question
-Part 2:  50 points.
+Part 3:  30 points.
-(10 points):  Experimental protocol
+(15 points):  Accuracy as a function of parameters and discussion of the results
-(20 points):  Correct classifier implementation
+(10 points):  Comparison of normalized and non-normalized kernels and correct model selection
-(10 points):  Results for the two classifiers on both datasets
+( 5 points):  Visualization of the kernel matrix and observations made about it
-(10 points):  Discussion of the results
-Report structure, grammar and spelling:  10 points
-( 3 points):  Heading and subheading structure easy to follow and
-              clearly divides report into logical sections.
-( 4 points):  Code, math, figure captions, and all other aspects of
-              report are well-written and formatted.
-( 3 points):  Grammar, spelling, and punctuation.
 </code>

CS545 fall 2016
