The goal of this assignment is to understand the concept of Java interfaces while implementing a classical substitution cipher. A cipher is used to encrypt and decrypt information. Encryption refers to the process or algorithm that converts plaintext into a coded or encrypted version, called cipher text. Decryption refers to the conversion of cipher text to plaintext. For details on what a cipher is, refer to this link on Wikipedia.
Because of time constraints, we will implement only one cipher called the Caesar Cipher. We will also implement a code breaker because sometimes there is a need to decrypt ciphertext, but you do not know the key used for encryption.
You will code the CaesarCipher class, which implements the Cipher interface using the Caesar cipher algorithm. You may refer to this link on Wikipedia for a detailed description. The Caesar cipher is one of the simplest classical encryption algorithms, and belongs to the category of substitution ciphers.
Each character in the plaintext is subsituted with a character coming a fixed number of places down the alphabet. For example, using a shift of 3 characters, we get the following table of character substitutions:
Original character |
Substituted character |
Original character |
Substituted character |
A | D | a | d |
B | E | b | e |
C | F | c | f |
... | ... | ... | ... |
Z | C | z | c |
Thus, a plaintext string "Cab" is encrypted as "Fde". A cipher text string "Fde" is decrypted as "Cab" by substituting each character with the character coming 3 position earlier in the alphabet. Also note that uppercaese letters are treated separately, i.e. Z is encrypted as C, and z is encrypted as c.
For this assignment assume that the plaintext to be encrypted can contain the letters A-Z, a-z, and spaces (but no tabs or other whitespace). No punctuation characters will be included in the text that needs to be encrypted. The ciphertext used as an argument to the decrypt method can only contain characters that correspond to valid substitutions of the characters specified for the plaintext.
Implement the CaesarCipher class as follows:
Constructor:
List of attributes:
List of Methods:
The Sentry class is used to encrypt and decrypt text files using a given cipher. The Sentry class stores information on which cipher to use and can encrypt the contents of a text file.
Implement the Sentry class as follows:
Constructor:
List of attributes:
List of methods:
System.err.println(e); System.exit(0);
The encrypt method in the Sentry class treats each line as a string and calls the encrypt method of the associated cipher instance variable. The encrypt method of Sentry uses recursion to process the lines from the input file in reverse order, and encrypts each line separately. The encrypted lines are printed in an output text file. The output file thus contains the encrypted version of each line in the input file, but the lines appear in the reverse order. Assume that the lines in the text file only contain characters as described above in the Caesar cipher algorithm above.
For example, suppose that a Caesar cipher is used with a shift parameter of 3. Given this plaintext file, here is the resulting encrypted file. If you decrypt the encrypted file, you should get back the original input file.
System.err.println(e); System.exit(0);
Decryption works in a corresponding way as encryption. It recursively reverses the order of lines in an encrypted file while decrypting each line separately, and finally produces a file containing the decrypted lines.
The Caesar cipher can be easily broken because there are only 26 possible shifts possible for the 26 letters in the English alphabet. Thus, a brute force approach can be applied by trying out all possible shifts. One of these shifts must be able to decrypt the text correctly.
While a human can easily detect the correct decryption of a ciphertext, it is also straightforward to design an algorithm that does that. Our approach is based on the observation that the letters in English tend to appear with certain characteristic frequencies. If one has access to a large piece of text, then one can train the program to learn the frequencies. A trained program can decrypt new text by determining which of the 26 shifts results in frequencies that best match the known frequencies.
There are two main steps used by our code breaker:
Here is a sample plain text file that is "large enough" to give us a pretty good idea of the frequency of occurrence of each letter. Note that this file contains spaces and punctuation but you are only interested in determining the frequencies of the letters A-Z and a-z. The frequency is defined as the fraction of times each letter occurs in the text. That is, it is the number of times a particular letter occurs divided by the total number of letters in the text. Do not include non-letter characters in any of the counts. Furthermore, do not distinguish between uppercase and lowercase. That is, there is only one frequency for the letters 'A' and 'a' (not two), one for 'B' and 'b', and so on.
The training step produces an array of doubles. This array has 26
entries, each storing the frequency of a letter. This array is called knownFrequencies.
public final int NUMBER_OF_LETTERS = 26; private double[] knownFrequencies = new double[NUMBER_OF_LETTERS];
The decryption algorithm needs to determine the value of the shift parameter that was used to produce the ciphertext. Once the shift is known, decryption can be performed using the Caesar cipher decryption technique.
In the training step we already determined the array of known frequencies. If you perform a similar frequency analysis of the ciphertext file, you will obtain an array of frequencies of letters in the ciphertext. Let us call this array observedFrequencies. Note that we cannot directly compare the frequencies because the observedFrequencies array is for letters in the ciphertext, while the knownFrequencies array is for letters in the plaintext. The plaintext letters were substituted to obtain the letters in the ciphertext.
Try out all possible shifts (0..25) of the observedFrequencies array using wraparound. That is, for shift of one A becomes B, B becomes C, ..., and Z becomes A. With shift of two, A becomes C, B becomes D, ... , Y becomes A, and Z becomes B. A shift of 0 corresponds to the original observedFrequencies array. For one of those shifts, the observedFrequencies will match the knownFrequencies best. To do this we need a measure that quantifies how well observedFrequencies match knownFrequencies. There exist several sophisticated distance measures (e.g., chi-squared statistic). However, in this assignment we will use a simple measure to calculate the distance between two frequency arrays, A and B.
Distance = ∑ (Math.abs(A[i] - B[i])),
where the summation is over i, which ranges over 0..NUMBER_OF_LETTERS-1.
Note that a small value of the distance implies a better match.
Download the class CodeBreaker from here. Implement the following methods in the class.
This method is needed for testing to make sure you generated the correct array of known frequencies. It helps with grading and providing partial credit.
This method is needed when we test your code. We will need to bypass the training method while testing the decryption functionality just in case your training method isn't working correctly. It helps with grading and providing partial credit.
This method implements the training step (section 3.1). It sets the array knownFrequencies.
Assume the training file exists. For any exceptions that you need to catch, use the following text inside the catch block.
System.err.println(e); System.exit(0);
This method decrypts the contents of the file referred to by cipherTextFileName, and produces an output file called outputFileName, which contains the decrypted text. Note that the order of lines will be the same unlike in the case with the Sentry class that was also reversing the order of lines in a file. This method returns the value of the shift parameter that was used to encrypt the text.
Assume the file to be decrypted exists. For any exceptions that you need to catch, use the following text inside the catch block.
System.err.println(e); System.exit(0);
As always, remember that we will not test your main method. The methods you implement will be tested directly by the test server. Ideally, you should create your own main methods, perhaps one in each class, to test each class separately and together. Preliminary testing will cover some methods and a limited set of situations in these methods. Final testing may use other files (both names and contents may differ).
Here is a sample barebones main method for the CaesarCipher class. It does not test all the methods, nor does it test for various situations.
public static void main(String[] args){ CaesarCipher cipher = new CaesarCipher(3); String plainText = "The enemy was spotted"; System.out.println("Plaintext: " + plainText); String cipherText = cipher.encrypt(plainText); System.out.println("After encryption, cipherText: " + cipherText); String backToPlainText = cipher.decrypt(cipherText); System.out.println("After decryption, plainText: " + backToPlainText); if(!plainText.equals(backToPlainText)) System.err.println("Fix your program!"); }
Here is a sample barebones main method for the Sentry class. It does not test all the methods, nor does it test for various situations.
public static void main(String[] args) { Sentry sentry = new Sentry(new CaesarCipher(3)); sentry.encrypt("encrypt-input.txt", "encrypt-output.txt"); sentry.decrypt("encrypt-output.txt", "decrypt-output.txt"); }
The CodeBreaker.java file contails a sample barebones main method to help you get started testing that program.
Create a single jar file called P5.jar from the four Java files (Cipher, CaesarCipher, Sentry, and CodeBreaker) using the instructions provided here. Do not add other files (e.g., class files).
Submit the P5.jar file via the online checkin system.