The objectives of this assignment are to (1) use instance variables and (2) use an ArrayList of objects. You will build on P1 and add new functionality. You will implement two classes, Tweet and TwitterDB. The Tweet class captures each tweet along with the user ID and the date/time of posting. The TwitterDB class is used to gather information from the tweets as a whole.
You will be able to reuse some code from P1 (copy-paste with appropriate modification). However, you must fix an error that was present in the sample code provided for setting up the scanner delimiter. Use the following code instead (note the two backslashes before the dash):
useDelimiter("[ *\\-,!?.]+")
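To see what this delimiter does, here is a minimal sketch that splits a piece of text into words. The class name DelimiterDemo and the helper method words are made up for illustration and are not part of the required interface. Inside a character class, an unescaped dash between two characters denotes a range, so `*-,` would match `*`, `+`, and `,` rather than a literal dash; escaping it makes the dash itself a delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the assignment's required classes.
class DelimiterDemo {
    // Splits text into words using the delimiter pattern given above.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            // "\\-" in Java source becomes the regex \- (a literal dash).
            sc.useDelimiter("[ *\\-,!?.]+");
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Hello, world, Is, this, well, known]
        System.out.println(words("Hello, world! Is this well-known?"));
    }
}
```

Note that "well-known" is split into two words because the dash is a delimiter.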
Implement the Tweet class in a file called Tweet.java. You must use the following three instance variables:
This constructor takes three strings corresponding to the three columns of the tweets file. A Tweet object is constructed from the arguments. The String dateTime is converted to a java.util.Date in order to make date comparisons easier. Refer to the documentation of Date and SimpleDateFormat to figure out what methods are useful. Here is how to convert the String dateTime to Date date:
// Put the import statements at the top of your file
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
...
...
// Put the following statements in an appropriate place in the constructor
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
try {
    this.date = sdf.parse(dateTime);
} catch (ParseException e) {
    // dateTime does not match the expected format, so stop the program
    System.exit(0);
}
This returns the userID instance variable inside the class.
This returns the date instance variable inside the class.
This returns the tweet instance variable inside the class.
This returns the number of characters in the tweet instance variable.
Two tweets are equal if the tweeted messages (instance variable String tweet) in the two objects are equal. Note that we do not care about the user ID or the date.
Also note that the argument is of type Object instead of Tweet. That is because you are overriding the equals method provided in the Java class called Object. That means you need to check that the argument is of type Tweet when you compare the tweets.
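The type check described above can be sketched as follows. TweetSketch is a hypothetical stripped-down class (it omits the date), used only to show the pattern; the hashCode method is an addition that the assignment does not ask for, included because Java expects equal objects to have equal hash codes.

```java
// Hypothetical stripped-down class that only illustrates the equals pattern.
class TweetSketch {
    private final String userID;
    private final String tweet;

    TweetSketch(String userID, String tweet) {
        this.userID = userID;
        this.tweet = tweet;
    }

    @Override
    public boolean equals(Object obj) {
        // instanceof is false for null and for objects of any other type.
        if (!(obj instanceof TweetSketch)) {
            return false;
        }
        TweetSketch other = (TweetSketch) obj;
        // Only the message matters; userID is deliberately ignored.
        return tweet.equals(other.tweet);
    }

    @Override
    public int hashCode() {
        // Not required by the assignment; keeps equals and hashCode consistent.
        return tweet.hashCode();
    }
}
```

With this sketch, two objects with different user IDs but the same message compare as equal, while comparing against a plain String or null returns false.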
This method returns a new String that is the concatenation of the userID, a tab character, the date converted to a String format using the toString() method in the Date class, a tab character, and the tweeted message in the end. Note that there should not be a newline character at the end.
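As a rough illustration of that format, the sketch below builds the string with plain concatenation. The class name TweetToStringSketch is hypothetical, and the constructor here takes a Date directly to keep the example short, which differs from the three-String constructor you are asked to write.

```java
import java.util.Date;

// Hypothetical class showing only the toString format described above.
class TweetToStringSketch {
    private final String userID;
    private final Date date;
    private final String tweet;

    TweetToStringSketch(String userID, Date date, String tweet) {
        this.userID = userID;
        this.date = date;
        this.tweet = tweet;
    }

    @Override
    public String toString() {
        // userID, tab, date as text, tab, message; no trailing newline.
        return userID + "\t" + date.toString() + "\t" + tweet;
    }
}
```

The text produced by Date's toString() depends on the default time zone of the machine running the program, so the middle column may look different on different computers.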
Implement the following constructor:
The string given as input to the method will contain the name of a file that contains a collection of tweets. The file has three columns that are separated by tabs ('\t'). The first column contains the user ID, the second contains the date and time the post was made, and the third is the post itself. Here's an example file. Your constructor will read the file line-by-line, construct a Tweet instance from each line, and fill the instance variable called tweets with all the Tweet instances. You can assume the file exists and follows the required format. The tweets need to be stored in the same order as they appear in the file.
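One possible way to read such a file is sketched below. The class TweetFileReaderSketch and its readRows methods are made-up names; the sketch returns the three columns of each line as a String array instead of building Tweet objects, and it assumes every line has exactly the three tab-separated columns described above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not one of the required classes.
class TweetFileReaderSketch {
    // Reads every line from the source and splits it into its columns.
    static List<String[]> readRows(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Limit of 3 keeps any later tab inside the tweet column.
                rows.add(line.split("\t", 3));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows; // same order as the lines in the file
    }

    // Convenience overload that opens the named file.
    static List<String[]> readRows(String filename) {
        try {
            return readRows(new FileReader(filename));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In your constructor, each row would instead be passed to the Tweet constructor and the resulting object added to the tweets list.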
Implement the following methods:
This method returns an ArrayList of Tweet instances corresponding to the specified userID. The order of tweets must be preserved. If there are no tweets by the specified userID, return an empty ArrayList (not a null ArrayList).
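The filtering idea can be sketched with a simple loop. The names TweetsBySketch and MiniTweet are hypothetical stand-ins, and this version is a static helper that takes the list as a parameter only so the example is self-contained; your method must be an instance method that uses the tweets instance variable.

```java
import java.util.ArrayList;

// Hypothetical sketch of filtering by user ID while preserving order.
class TweetsBySketch {
    // Minimal stand-in for the real Tweet class.
    static class MiniTweet {
        final String userID;
        final String tweet;

        MiniTweet(String userID, String tweet) {
            this.userID = userID;
            this.tweet = tweet;
        }
    }

    static ArrayList<MiniTweet> tweetsBy(ArrayList<MiniTweet> tweets, String userID) {
        // Start with an empty list so "no matches" never returns null.
        ArrayList<MiniTweet> result = new ArrayList<>();
        for (MiniTweet t : tweets) {
            if (t.userID.equals(userID)) {
                result.add(t); // appended in the original order
            }
        }
        return result;
    }
}
```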
This method returns an ArrayList of Tweet instances that were posted strictly before the date specified by dateTime. Assume the format of the String dateTime is what is used in the tweets file (i.e., "yyyy-MM-dd'T'HH:mm:ss"). The order of tweets must be preserved. If there are no tweets before the specified dateTime, return an empty ArrayList (not a null ArrayList).
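The date comparison can be sketched using the before method of java.util.Date, which is true only for strictly earlier instants. The class TweetsBeforeSketch and its helpers are hypothetical, and the sketch filters a list of Date objects instead of Tweet objects to stay short. Throwing IllegalArgumentException on a malformed date is a choice made for this example only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;

// Hypothetical sketch of the "strictly before" comparison.
class TweetsBeforeSketch {
    static final String FORMAT = "yyyy-MM-dd'T'HH:mm:ss";

    // Converts text in the tweets-file format to a Date.
    static Date parse(String dateTime) {
        try {
            return new SimpleDateFormat(FORMAT).parse(dateTime);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad date: " + dateTime, e);
        }
    }

    static ArrayList<Date> strictlyBefore(ArrayList<Date> dates, String dateTime) {
        Date cutoff = parse(dateTime);
        ArrayList<Date> result = new ArrayList<>();
        for (Date d : dates) {
            // before() is false when d equals the cutoff.
            if (d.before(cutoff)) {
                result.add(d);
            }
        }
        return result;
    }
}
```

In your own method, the same test would be applied to the date of each Tweet, and the Tweet itself added to the result.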
Returns the most common word in the file of tweets. Note that only the third column of the file is to be considered, i.e., the third instance variable called String tweet (not userID or date). You will use the same delimiters as in P1. Your method should be case-insensitive, i.e., if the word "help" also appears capitalized (for example, "Help"), both forms count as occurrences of the same word. The return value should be all lower case. Note that we are not using stop-words in P2.
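One way to count words is with a map from each lower-cased word to its number of occurrences, as in the sketch below. The class MostCommonWordSketch is hypothetical and works on a list of message strings. Two behaviors are assumptions made for this example because the text above does not specify them: ties go to the word that appeared first, and an input with no words gives null.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical sketch of case-insensitive word counting.
class MostCommonWordSketch {
    static String mostCommonWord(List<String> tweets) {
        // LinkedHashMap remembers insertion order, which makes ties predictable.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String text : tweets) {
            try (Scanner sc = new Scanner(text)) {
                sc.useDelimiter("[ *\\-,!?.]+");
                while (sc.hasNext()) {
                    String word = sc.next().toLowerCase();
                    counts.merge(word, 1, Integer::sum); // add 1 to the count
                }
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Strict > keeps the earliest word when counts are tied (assumption).
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best; // null when there are no words (assumption)
    }
}
```

For the messages "Help me", "help! HELP you", and "me too", this sketch returns "help", since the three spellings are counted together.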
Returns the first tweet (don't include userID or date in the search or the return value) that contains the word. You will use the same delimiters as in P1. Your method must be case-insensitive. If no tweet contains the word, return null (not an empty String). Return the tweet as is, i.e., don't convert the tweet to lower case or upper case when returning.
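The search can be sketched by splitting each message into words and comparing them without regard to case. The class SearchSketch is hypothetical and works on a list of message strings. It assumes that "contains the word" means a whole-word match after splitting on the delimiters, so searching for "spir" would not match "spirit"; check this interpretation against the test results.

```java
import java.util.List;
import java.util.Scanner;

// Hypothetical sketch of a case-insensitive whole-word search.
class SearchSketch {
    static String search(List<String> tweets, String word) {
        for (String text : tweets) {
            try (Scanner sc = new Scanner(text)) {
                sc.useDelimiter("[ *\\-,!?.]+");
                while (sc.hasNext()) {
                    if (sc.next().equalsIgnoreCase(word)) {
                        return text; // original message, case unchanged
                    }
                }
            }
        }
        return null; // no message contains the word
    }
}
```

For the messages "No match here", "Team SPIRIT, always!", and "spirit again", searching for "spirit" returns "Team SPIRIT, always!" with its original capitalization.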
None of the methods above should be declared static. In fact, compilation of your program will fail if the method signatures in your program are different from those above. To help you develop your program, here is a tweets file. Preliminary testing using the online system will be performed using this file. You get the results of preliminary testing almost instantaneously using the online submission system. To grade your program (final testing) we will use different files, so do not hardcode any parameters into your program. We encourage you to construct your own tweet files that test various scenarios (e.g., that you successfully ignore case).
Keep in mind that we will not test your main method -- the methods you implement will be tested directly. However, you should use your main to test the methods that you write. A barebones main can include something like:
public static void main(String[] args) {
    TwitterDB tdb = new TwitterDB("tweets.txt");
    System.out.println("Number of tweets: " + tdb.getNumberOfTweets());

    // Inspect the first tweet in the file
    Tweet t = tdb.getTweet(0);
    System.out.println("Printing t:");
    System.out.println(t);
    System.out.println("User ID: " + t.getUserID());
    System.out.println("Tweet: " + t.getTweet());
    System.out.println("Number of characters: " + t.numChars());
    System.out.println();

    // Requires import java.util.ArrayList; at the top of the file
    ArrayList<Tweet> tweets = tdb.tweetsBy("USER_989b85bb");
    System.out.println("USER_989b85bb sent how many tweets? " + tweets.size());
    tweets = tdb.tweetsBefore("2010-03-07T18:26:13");
    System.out.println("How many tweets before 2010-03-07T18:26:13? " + tweets.size());
    System.out.println("Most common word: " + tdb.mostCommonWord());
    System.out.println("First tweet that contains \"spirit\": " + tdb.search("spirit"));
}
You can make the following assumptions about how we will test your methods:
During preliminary testing, your score equals the number of test cases passed. However, during final testing, certain test cases may be weighted more than others. More difficult methods will be worth more points.
This assignment requires you to create two files, but the assignment submission system can only accept one file. Thus, we will need to "combine" the two Java files into one file. Here's how to do this.
Create a single jar file called P2.jar from the two program files using the instructions provided here. Check that you are combining only the source files (i.e., Tweet.java and TwitterDB.java) and not the class files (i.e., Tweet.class and TwitterDB.class) by mistake. Submit the P2.jar file via the online checkin system. This system performs preliminary testing of your program on the same data file provided above. Final grading will be performed on a different set of files.
The Twitter dataset we provided to you is a subset of a much larger dataset, which is available here.