All assignments are due at 8:00 PM on the due date. Late submissions incur a penalty of 7.5% per day, up to a maximum of 2 days. Each assignment will be posted at least 2 weeks before its due date. We will have a mix of written and programming assignments. All assignments will be posted on this page and should be submitted using Canvas.
The tentative assignment release schedule is listed below:
| Assignment | Posted | Due |
|---|---|---|
| CSx55-HW1 | 1/16/24 | 2/14/24 |
| CS455-HW2 | 2/7/24 | 3/6/24 |
| CS455-HW3 | 3/20/24 | 4/17/24 |
| CS455-HW4 | 4/3/24 | 5/1/24 |
| CS555-HW2 | 2/7/24 | 3/6/24 |
| CS555-HW3 | 3/18/24 | 4/17/24 |
| CS555-HW4 | 4/3/24 | 5/1/24 |
| CSx55-TermProject | | |
CS455/555: HW1 Using Dijkstra’s Shortest Paths to Route Packets in a Network Overlay
The objective of this assignment is to get you familiar with coding in a distributed setting where you need to manage the underlying communications between nodes. Upon completion of this assignment, you will have a set of reusable classes that you can draw upon later. As part of this assignment you will be: (1) constructing a logical overlay over a distributed set of nodes, and then (2) computing shortest paths using Dijkstra’s algorithm to route packets in the system. Additional details are available here.
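To illustrate the shortest-paths computation at the heart of the routing step, here is a minimal sketch of Dijkstra's algorithm over a weighted adjacency map. The graph representation, integer node IDs, and class name are illustrative assumptions only; the assignment defines its own wire formats and classes (and in practice you would also record predecessors to recover the actual routes).

```java
import java.util.*;

// Minimal sketch of Dijkstra's shortest paths over a weighted overlay.
// The adjacency-map representation and integer node IDs are assumptions
// for illustration, not the assignment's required API.
public class DijkstraSketch {
    // graph.get(u) maps each neighbor v to the weight of the link u -> v
    public static Map<Integer, Integer> shortestDistances(
            Map<Integer, Map<Integer, Integer>> graph, int source) {
        Map<Integer, Integer> dist = new HashMap<>();
        PriorityQueue<int[]> pq =
                new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[1]));
        dist.put(source, 0);
        pq.add(new int[]{source, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0], d = top[1];
            if (d > dist.getOrDefault(u, Integer.MAX_VALUE)) continue; // stale queue entry
            for (Map.Entry<Integer, Integer> e : graph.getOrDefault(u, Map.of()).entrySet()) {
                int v = e.getKey(), nd = d + e.getValue();
                if (nd < dist.getOrDefault(v, Integer.MAX_VALUE)) {
                    dist.put(v, nd); // found a shorter path to v
                    pq.add(new int[]{v, nd});
                }
            }
        }
        return dist;
    }
}
```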
CS455: HW2 Synchronization and Coordination Using Thread Pools
The objective of this assignment is to make you comfortable with threads and synchronization mechanisms. Another objective is to introduce the role that data structures and locking mechanisms play in designing concurrent programs. Additional details are available here.
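As a rough illustration of the kind of structure involved (not the assignment's required design), a thread pool can be built from a shared blocking queue and a fixed set of worker threads; the class and method names below are placeholders.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a thread pool: a fixed set of worker threads pulling Runnables
// off a shared blocking queue. Names and structure are illustrative only.
public class SimpleThreadPool {
    private final LinkedBlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

    public SimpleThreadPool(int numThreads) {
        for (int i = 0; i < numThreads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        tasks.take().run(); // blocks until a task is available
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // allow shutdown via interrupt
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    public void submit(Runnable task) {
        tasks.add(task); // hand the task to whichever worker takes it first
    }
}
```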
CS555: HW2 Distributed Load Balancing of Computational Tasks Using Thread Pools
As part of this assignment, you will be leveraging thread pools in a distributed environment to alleviate computational load imbalances across a set of computational nodes. The computational task being load balanced is similar to the proof-of-work computation performed in cryptocurrencies such as Bitcoin. Additional details about this assignment are available here.
Both the Task and Miner are available.
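The general shape of a proof-of-work style computation is to repeatedly hash a payload together with a nonce until the digest meets a difficulty target. The sketch below uses SHA-256 and a leading-zero-bits check purely for illustration; it is not the provided Task/Miner implementation, which defines the real contract.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of a proof-of-work style search: find a nonce such that
// SHA-256(payload || nonce) begins with `difficulty` zero bits.
// Illustrative only; the provided Task/Miner define the actual interface.
public class ProofOfWorkSketch {
    public static long mine(String payload, int difficulty) throws NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        for (long nonce = 0; ; nonce++) {
            byte[] hash = sha256.digest((payload + nonce).getBytes(StandardCharsets.UTF_8));
            if (leadingZeroBits(hash) >= difficulty) {
                return nonce; // this nonce satisfies the difficulty target
            }
        }
    }

    private static int leadingZeroBits(byte[] hash) {
        int count = 0;
        for (byte b : hash) {
            if (b == 0) { count += 8; continue; }
            count += Integer.numberOfLeadingZeros(b & 0xFF) - 24; // zeros in this byte
            break;
        }
        return count;
    }
}
```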
CS455: HW3 Analyzing the Million Song Dataset Using MapReduce
As part of this assignment you will be developing MapReduce programs that parse and process the Million Song dataset to support knowledge extraction over different features such as song genres, artists, etc. You will be using Apache Hadoop (version 3.3.6) to implement this assignment. Additional details can be found here.
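As a sketch of the MapReduce structure involved, the mapper/reducer pair below counts songs per artist. The input layout (tab-separated lines with the artist name in column 3) is a placeholder assumption; consult the assignment spec for the dataset's real schema.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of a Hadoop MapReduce job that counts songs per artist.
// Field positions are assumptions made only for this illustration.
public class SongsPerArtist {
    public static class ArtistMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 3) {
                context.write(new Text(fields[3]), ONE); // emit (artist, 1)
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text artist, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable c : counts) total += c.get();
            context.write(artist, new IntWritable(total)); // emit (artist, songCount)
        }
    }
}
```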
CS555: HW3 Implementing the Chord P2P System
As part of this assignment you will be implementing a structured P2P system. Specifically, you will be implementing the Chord P2P network, where individual peers have a 32-bit identifier and the system can therefore support up to about 4 billion peers. This assignment will account for 15 points towards your cumulative course grade. Additional details can be found here.
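Two small computations recur throughout Chord: the start of each finger-table entry and the wrap-around interval test used during lookups. The sketch below shows both for a 32-bit identifier space; the class name and the use of unsigned IDs stored in longs are illustrative choices, and all node state and messaging are left to the assignment spec.

```java
// Sketch of core Chord arithmetic for a 32-bit identifier space.
// Unsigned 32-bit IDs are held in Java longs; peers and messaging are omitted.
public class ChordMath {
    private static final long RING_SIZE = 1L << 32; // 2^32 identifiers

    // Start of the i-th finger for node n: (n + 2^i) mod 2^32, with i in [0, 31]
    public static long fingerStart(long nodeId, int i) {
        return (nodeId + (1L << i)) % RING_SIZE;
    }

    // True if id lies in the half-open ring interval (from, to], wrapping around 0.
    public static boolean inIntervalExclusiveInclusive(long id, long from, long to) {
        if (from < to) {
            return id > from && id <= to;
        }
        return id > from || id <= to; // interval wraps past zero on the ring
    }
}
```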
CS455: HW4 Analyzing the MovieLens Dataset Using Spark
The objective of this assignment is to gain experience in developing Spark programs. As part of this assignment, you will be working with the MovieLens dataset, which describes ratings and free-text tagging activities from MovieLens, a movie recommendation service. This dataset was created by GroupLens and is primarily hosted at Kaggle. You will be using Apache Spark to implement this assignment. Additional details are available here.
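As a flavor of what a Spark program over this dataset looks like, the sketch below computes the average rating per movie using Spark's Java API. The CSV path is a placeholder, and the column names (movieId, rating) follow the public MovieLens layout but should be checked against the assignment's copy of the dataset.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.col;

// Sketch of a Spark job computing the average rating per movie
// from the MovieLens ratings file. Paths and columns are assumptions.
public class AverageRatings {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("MovieLensAverageRatings")
                .getOrCreate();

        Dataset<Row> ratings = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("ratings.csv"); // placeholder path to the ratings file

        Dataset<Row> avgByMovie = ratings
                .groupBy(col("movieId"))
                .agg(avg(col("rating")).alias("avgRating"))
                .orderBy(col("avgRating").desc());

        avgByMovie.show(20); // print the top 20 movies by average rating
        spark.stop();
    }
}
```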
CS555: HW4 Building a Distributed, Replicated, and Fault Tolerant File System: Contrasting Replication and Erasure Coding
The objective of this assignment is to build a distributed, failure-resilient file system. The fault tolerance for files is achieved using two techniques: replication and erasure coding. This assignment has several sub-items associated with it. Additional details are available here.
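As a back-of-the-envelope comparison of the two fault-tolerance techniques, the sketch below contrasts the storage overhead of 3-way replication with a Reed-Solomon style scheme of k data and m parity fragments. The parameter values (3 replicas; 6 data + 3 parity fragments) are illustrative defaults, not the assignment's required configuration.

```java
// Rough comparison of storage overhead for replication versus erasure coding.
// The parameters below are illustrative, not the assignment's required values.
public class StorageOverhead {
    public static void main(String[] args) {
        int replicas = 3;          // full copies kept under replication
        int dataFragments = 6;     // k: data fragments under erasure coding
        int parityFragments = 3;   // m: parity fragments under erasure coding

        double replicationOverhead = replicas;                               // 3.0x raw storage
        double erasureOverhead =
                (dataFragments + parityFragments) / (double) dataFragments;  // 1.5x raw storage

        System.out.printf("Replication: %.1fx storage, tolerates %d lost copies%n",
                replicationOverhead, replicas - 1);
        System.out.printf("Erasure coding (k=%d, m=%d): %.1fx storage, tolerates %d lost fragments%n",
                dataFragments, parityFragments, erasureOverhead, parityFragments);
    }
}
```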
Term Project & Paper: Scalable Distributed Analytics
[Group assignment: Teams of 2-3 CS455 students and 2 CS555 students]
As part of this assignment you will be doing a term project that involves using Apache Spark, TensorFlow, or PyTorch to perform analytics over 2 or more spatial datasets; a rich set of datasets is available at https://urban-sustain.org. Your system or application should execute on a minimum of 10 machines. The problem should be data-intensive and/or compute-intensive. Additional details about the Term Project are available here.