Are you curious about trends in who signs up for which majors at Colorado State University? The diversity data looks at enrollment numbers and genders across the Colleges of Natural Sciences and Engineering.
Details
The gender diversity dataset looks at major enrollment at Colorado State University. Due to file restrictions in ZyBooks, it is broken up into two files: CS150_Gender_Data_Older.csv and CS150_Gender_Data.csv. The examples below use CS150_Gender_Data.csv, which includes this semester's data. The following columns are set up in the CSV.
- PRIMARY_COLLEGE
  Either NS or EG.
- PRIMARY_COLLEGE_DESC
  The description paired with the code above: Natural Science or Engineering.
- PRIMARY_DEPARTMENT_DESC
  The department in which the major is hosted. For example, the Computer Science department has both Applied Computing Technology and Computer Science majors.
- PRIMARY_MAJOR_DESC
  The name of the major. This does not include concentrations (some engineering majors simply have very descriptive names).
- TERM
  The term in which this data was pulled. It is the year, the term (by start month), and a 0. For example:
  - 201890 is Fall (September) 2018
  - 201610 is Spring 2016
  - 201760 is Summer 2017
- GENDER
  M or F. It is safe to assume every row has an M or F due to how CSU currently handles the data. For those interested, there are a few of us working on a change to include non-binary for future years, but as this is ‘historic’ data, it is still stored as the binary descriptors.
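The TERM encoding described above can be unpacked with a couple of string operations. The following is a minimal sketch, not part of the graded specification: the class name TermCode and its methods are hypothetical, and it only recognizes the three start-month digits listed above (9, 1, and 6).

```java
// Hypothetical helper for illustration; not required by the assignment.
final class TermCode {
    // The first four characters of a TERM value such as "201890" are the year.
    static int year(String term) {
        return Integer.parseInt(term.substring(0, 4));
    }

    // The fifth character is the start month of the term. Only the three
    // codes listed in the column description are mapped; others are "Unknown".
    static String season(String term) {
        switch (term.charAt(4)) {
            case '9': return "Fall";
            case '1': return "Spring";
            case '6': return "Summer";
            default:  return "Unknown";
        }
    }

    // Combines both parts, for example "201890" becomes "Fall 2018".
    static String describe(String term) {
        return season(term) + " " + year(term);
    }
}
```

A helper like this is only useful once you start comparing terms; the required tests look at a single term.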
Required Methods To Implement (graded)
We will grade the following methods. There may be different ways to implement them, and you are free to write helper methods (which we did ourselves), but the method names need to match the following specification.
CSVReader
This file is used to read the comma-separated value (CSV) files using a Scanner object. We also used it to store the indices of the columns in constant variables for easy use in GenderStats.java.
public void initialize(String file)
This method will initialize a class-level Scanner object based on a File (new File(…)). The name of the file will be passed in. You can assume it is a correct name, but you should still try and catch the IOException (specifically a FileNotFoundException) that constructing a Scanner from a File can throw. You may also want to look at the Digital Humanities lab for an example.
Here is some example code that will help you get started.
try {
    fileScanner = new Scanner(new File(file));
} catch (IOException io) {
    io.printStackTrace();
}
public boolean hasNext()
Returns true if the scanner has been initialized and has more lines to read. If the scanner hasn’t been initialized, it returns false. Looking at how Scanner checks whether more lines remain to be read will help with this method.
public String[] getNext()
If the scanner has more lines to read, it reads the line and returns a String array of all the values in the line - broken up by the comma (‘,’) delimiter! This is essentially how CSV files are stored.
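Putting the three method descriptions together, one possible shape for the class is sketched below. This is an illustration under stated assumptions rather than the reference solution: the class is named CSVReaderSketch here to keep it distinct from your own CSVReader, returning null when no line is available is an assumption (the description does not say what should happen), and a plain split on commas assumes no field contains a quoted comma.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Sketch only: one way the three graded methods could fit together.
class CSVReaderSketch {
    // Class-level scanner; stays null until initialize() succeeds.
    private Scanner fileScanner;

    public void initialize(String file) {
        try {
            fileScanner = new Scanner(new File(file));
        } catch (IOException io) {
            io.printStackTrace();
        }
    }

    // False when the scanner was never initialized or no lines remain.
    public boolean hasNext() {
        return fileScanner != null && fileScanner.hasNextLine();
    }

    // Reads one line and breaks it up on the comma delimiter.
    public String[] getNext() {
        if (!hasNext()) {
            return null; // assumption: behavior for "no more lines" is not specified
        }
        return fileScanner.nextLine().split(",");
    }
}
```

A typical caller would loop with while (reader.hasNext()) and process each String[] in turn.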
Stats.java
This file is the main driver file of your program. It will help figure out what things you want to calculate.
Output
The other tests all look at the output to standard out, so you are free to name the methods whatever you want, as long as they print in a similar manner. I strip all white space in the tests and ignore case, so those details aren’t as big of a deal. The output should be in the following format:
Computer Engineering: Males: 93.33% Females: 6.67%
The line above shows the average share of males and females in Computer Engineering, as percentages. All tests that we require look at the average. Also, all tests only look at the 201890 term, so it may make sense to ignore any data from another term until you have completed the tests and want to dive deeper into the data.
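Producing a line in that format mostly comes down to a single String.format call. The sketch below assumes the "average" is each gender's share of the total count for a major; the class name PercentLine, the method name, and the counts used in the usage note are hypothetical.

```java
import java.util.Locale;

// Hypothetical formatting helper; the name and signature are illustrative.
final class PercentLine {
    // Builds "Label: Males: xx.xx% Females: yy.yy%" from two raw counts.
    static String format(String label, int males, int females) {
        int total = males + females;
        // Guard against a label with no rows so we never divide by zero.
        double malePct = total == 0 ? 0.0 : 100.0 * males / total;
        double femalePct = total == 0 ? 0.0 : 100.0 * females / total;
        // %% prints a literal percent sign; Locale.US keeps the decimal point a '.'.
        return String.format(Locale.US, "%s: Males: %.2f%% Females: %.2f%%",
                label, malePct, femalePct);
    }
}
```

For instance, PercentLine.format("Computer Engineering", 14, 1) returns a line matching the format shown above; those two counts are made up to illustrate the arithmetic and are not taken from the dataset.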
Suggestions / Insights
When working through this dataset, we found helper methods essential for breaking up the work. For example, we put together a getMajor(String major) method that returned the index of the major from an array of majors. This helped simplify the code later. We also used that index to place the counter variables for the different things we wanted to count. Arrays are essential to get this done without a bunch of typing. Our actual solution didn’t have that many lines of code. It also helped to put static final variables in the CSVReader file that match up with the column names. This way, our other code would know which index was which column. For example:
// in the CSVReader file
public final static int GENDER = 5;
// and in Stats.java
if (line[CSVReader.GENDER].equalsIgnoreCase("F")) { /* do something */ }
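The getMajor helper and the index-aligned counters described above might look something like the following. This is a sketch, not our actual solution: the class name MajorCounter, the count method, and the accessor names are hypothetical, and treating every non-F value as M relies on the dataset containing only those two values.

```java
// Hypothetical sketch of index-aligned counters; names are illustrative.
final class MajorCounter {
    private final String[] majors;
    private final int[] males;    // males[i] counts rows for majors[i]
    private final int[] females;  // females[i] counts rows for majors[i]

    MajorCounter(String[] majors) {
        this.majors = majors;
        this.males = new int[majors.length];
        this.females = new int[majors.length];
    }

    // Returns the index of the major in the array, or -1 if it is not listed.
    int getMajor(String major) {
        for (int i = 0; i < majors.length; i++) {
            if (majors[i].equalsIgnoreCase(major)) {
                return i;
            }
        }
        return -1;
    }

    // Adds one row to the matching counter; unknown majors are ignored.
    void count(String major, String gender) {
        int i = getMajor(major);
        if (i < 0) {
            return;
        }
        if (gender.equalsIgnoreCase("F")) {
            females[i]++;
        } else {
            males[i]++; // assumption: the data only ever holds M or F
        }
    }

    int males(int i) { return males[i]; }
    int females(int i) { return females[i]; }
}
```

Because every array shares the same index, adding a new statistic only means adding another array of the same length.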
It is also good to figure out what tests you want to look at before you start to write code. Does the number of students in a major matter just as much as a percentage? We are asking for the average, but maybe standard deviation is something to look at? Maybe you want to look at the raw numbers more than the average, as some majors only have a handful of students - making percentages exaggerated.
Lastly, the sort test was by far the hardest - and is intentionally only worth 1 point. You can probably perform all the tests that you need without sorting, but we found the sorted answer to be insightful. We also only look at the 201890 term, but you may want to look at other terms, since seeing a trend may be helpful.
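If you do attempt the sort, one approach that works with parallel arrays is to sort an array of indices instead of the data itself, so the major names and percentages stay lined up. The sketch below is one possible technique, not necessarily how the reference solution does it; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: order majors by male percentage, highest first.
final class SortByMalePercent {
    // Returns the indices of malePercent arranged from largest to smallest value.
    static Integer[] order(double[] malePercent) {
        Integer[] idx = new Integer[malePercent.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        // Compare indices by the percentage they point at, then reverse for descending.
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> malePercent[i]).reversed());
        return idx;
    }
}
```

You would then loop over the returned indices and print majors[idx[k]] with its percentages, leaving the original arrays untouched.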
Example Outputs
Natural Sciences: Males: 41.03% Females: 58.97%
Psychology: Males: 24.94% Females: 75.06%
Chemistry: Males: 53.29% Females: 46.71%
Computer Science: Males: 88.39% Females: 11.61%
Biological Science: Males: 30.05% Females: 69.95%
Mechanical Engineering: Males: 87.29% Females: 12.71%
Chemical & Biological Engineer: Males: 67.00% Females: 33.00%
Mathematics: Males: 58.41% Females: 41.59%
Zoology: Males: 20.22% Females: 79.78%
Applied Computing Technology: Males: 87.77% Females: 12.23%
Civil Engineering: Males: 74.51% Females: 25.49%
Environmental Engineering: Males: 54.29% Females: 45.71%
Electrical Engineering: Males: 89.41% Females: 10.59%
Biomedical Engineering with EE: Males: 66.67% Females: 33.33%
Biochemistry: Males: 45.48% Females: 54.52%
Computer Engineering: Males: 93.33% Females: 6.67%
Physics: Males: 79.44% Females: 20.56%
Engineering Science: Males: 77.36% Females: 22.64%
Statistics: Males: 67.69% Females: 32.31%
Biomedical Engineering with ME: Males: 53.91% Females: 46.09%
Biomedical Engineering with EL: Males: 80.00% Females: 20.00%
Biomedical Engineering with CB: Males: 46.75% Females: 53.25%
Engrg Sci and Intl Studies: Males: 27.27% Females: 72.73%
Engineering Open Option: Males: 82.93% Females: 17.07%
==========
Sorted Values
Computer Engineering: Males: 93.33% Females: 6.67%
Electrical Engineering: Males: 89.41% Females: 10.59%
Computer Science: Males: 88.39% Females: 11.61%
Applied Computing Technology: Males: 87.77% Females: 12.23%
Mechanical Engineering: Males: 87.29% Females: 12.71%
Engineering Open Option: Males: 82.93% Females: 17.07%
Biomedical Engineering with EL: Males: 80.00% Females: 20.00%
Physics: Males: 79.44% Females: 20.56%
Engineering Science: Males: 77.36% Females: 22.64%
Civil Engineering: Males: 74.51% Females: 25.49%
Statistics: Males: 67.69% Females: 32.31%
Chemical & Biological Engineer: Males: 67.00% Females: 33.00%
Biomedical Engineering with EE: Males: 66.67% Females: 33.33%
Mathematics: Males: 58.41% Females: 41.59%
Environmental Engineering: Males: 54.29% Females: 45.71%
Biomedical Engineering with ME: Males: 53.91% Females: 46.09%
Chemistry: Males: 53.29% Females: 46.71%
Biomedical Engineering with CB: Males: 46.75% Females: 53.25%
Biochemistry: Males: 45.48% Females: 54.52%
Natural Sciences: Males: 41.03% Females: 58.97%
Biological Science: Males: 30.05% Females: 69.95%
Engrg Sci and Intl Studies: Males: 27.27% Females: 72.73%
Psychology: Males: 24.94% Females: 75.06%
Zoology: Males: 20.22% Females: 79.78%
Natural Sciences: Males: 42.65% Females: 57.35%
Engineering: Males: 74.91% Females: 25.09%
Beyond that, the rest is up to you! Make sure to figure out additional tests to look at to help you better analyze the data. Start simple and build up.
Reference
- Colorado State University, Majors in Colleges of Natural Sciences and Engineering, 2016-2018, Colorado State University, Fort Collins, Colorado, retrieved 2018.