CPSC 404

Taught by Dr. Ed Knorr

Overview

A theory- and calculation-based class about the internals of a relational database: how SQL queries are executed and how different query plans compare, managing the buffer pool, using indexes for faster lookups, handling concurrency, recovering from crashes, and so on. The course is very different from its predecessor, CPSC 304; the material is most similar to the lecture portion of operating systems classes like CPSC 313.

Lecture Topics

  • Database indexes (tree and hash)
  • Buffer pool management
  • Disk scheduling
  • Query evaluation & optimization
  • Transaction management and concurrency control
  • Crash recovery

Coursework

3 midterms + final exam (most of the marks are allocated to exams)
In-class and pre-class exercises
A couple of miscellaneous written assignments

Experience

The class material is generally very straightforward if you're okay with theory and some basic math, like calculating the number of I/Os for a SQL query or for sorting a table. And of course, the exams do require some memorization.
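To give a flavour of the arithmetic, here is the standard external merge sort cost formula with a worked example (the numbers are my own illustration, not from a course exercise):

```latex
% External merge sort I/O cost: N data pages, B buffer pages.
% Pass 0 produces ceil(N/B) sorted runs; each later pass merges B-1 runs.
\[
  \text{cost} = 2N\left(1 + \Big\lceil \log_{B-1} \big\lceil N/B \big\rceil \Big\rceil\right)
\]
% e.g. N = 1000 pages, B = 10 buffers: pass 0 yields 100 runs, then
% ceil(log_9 100) = 3 merge passes, so 2(1000)(1 + 3) = 8000 I/Os.
```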

However, I really have to complain about the pre-class and in-class exercises. The pre-class exercise is due at 9am the day of every lecture, and the in-class one is due at 12pm the day after every lecture. They're usually several questions long and take up most of a page (written). As someone who might not get to every lecture right away, please tell me how I'm supposed to keep up with all of them.

Overall, a really easy and interesting class. It would pair well with heavy coding-based classes, or work on its own if you don't like coding. I actually enjoyed this class a lot more than CPSC 304.

Workload: 1/5 Difficulty: 1/5 Usefulness: 2/5

CPSC 416

Taught by Professor Ivan Beschastnikh

Overview

The focus of this course is distributed systems: systems whose components are spread across multiple computers on a network. Because the system is distributed, we get better scalability by adding more computers (nodes), as well as fault tolerance, meaning the system can continue operating even if one or more computers fail. However, these advantages come with a lot of design considerations and complexity, for example: synchronising time or ordering the events of all the nodes in the system, handling failures of both centralised components and regular nodes, mutual exclusion and transactions, having multiple nodes reach a consensus, etc. The course goes through different design choices and tradeoffs for these issues, and also introduces current topics like Bitcoin and MapReduce.
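To make one of those problems concrete, here is a minimal Lamport logical clock in Go, one classic way to order events across nodes (my own toy illustration, not code from the course):

```go
package main

import (
	"fmt"
	"sync"
)

// LamportClock is a minimal logical clock for ordering events across nodes.
type LamportClock struct {
	mu   sync.Mutex
	time uint64
}

// Tick advances the clock for a local event and returns its timestamp.
func (c *LamportClock) Tick() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.time++
	return c.time
}

// Recv merges in a timestamp received from another node: the local clock
// jumps past the sender's, preserving the happens-before ordering.
func (c *LamportClock) Recv(remote uint64) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if remote > c.time {
		c.time = remote
	}
	c.time++
	return c.time
}

func main() {
	var a, b LamportClock
	t1 := a.Tick()      // event on node A
	t2 := b.Recv(t1)    // node B receives A's message
	fmt.Println(t1, t2) // prints "1 2": B's event is ordered after A's
}
```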

Lecture material aside, this is a very coding-heavy course, with a large portion of the grade going towards the assignments (30% combined in our year) and a project (45%); the balance between the two changes year to year. The language is Go, and the assignments are marked on correctness, determined by automated tracing scripts, whereas the project is open-ended and marked on how well you complete your own project proposal.

Grading

Assignments: three, worth 5%, 10%, and 15% respectively
Group project: 45%
Final exam: 20%
Participation (including Piazza): 5%

Assignments

There were three assignments in total, with the first two being individual and the last one done in a group of three. I noticed the assignments change year to year, but our year's are displayed on the class website if you'd like to take a look. All assignments involve writing some client/server code in Go using remote procedure calls (RPC).
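To give a sense of the plumbing, here is a minimal Go net/rpc client/server sketch; the Echo service and its types are a made-up toy example, not any assignment's actual API:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

// Args and Reply are hypothetical request/response types; the real
// assignments define their own.
type Args struct{ Msg string }
type Reply struct{ Msg string }

type EchoService struct{}

// Echo follows net/rpc's required method shape: exported method,
// exported argument types, pointer reply, error return.
func (s *EchoService) Echo(args *Args, reply *Reply) error {
	reply.Msg = "echo: " + args.Msg
	return nil
}

func main() {
	// Server side: register the service and accept connections.
	if err := rpc.Register(new(EchoService)); err != nil {
		log.Fatal(err)
	}
	ln, err := net.Listen("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	go rpc.Accept(ln)

	// Client side: dial the server and make a synchronous call.
	client, err := rpc.Dial("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	var reply Reply
	if err := client.Call("EchoService.Echo", &Args{Msg: "hi"}, &reply); err != nil {
		log.Fatal(err)
	}
	fmt.Println(reply.Msg) // "echo: hi"
}
```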

The first assignment was creating a client that interacts with a server to play a game of Nim. The second was creating a failure detection library and adding it to the client so that it can switch to a different server when one goes down.

The third assignment (groups of three) was creating a coherent key-value store system, including the client, the servers, and a coordinator node. The servers are arranged in a linked-list topology and serve get/put requests from the client; the system should behave as if there were a single server, and should withstand server failures. The spec is less prescriptive here, describing desired functionality rather than how you should write the protocols, so there are many more design choices to make.
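A rough sketch of one possible design for the chain (all names here are hypothetical, and the actual assignment spec is far more detailed): each server applies a put locally, then forwards it to its successor before acknowledging, so every server down the chain ends up with the write.

```go
// Package kvchain sketches one design idea for a chained key-value store.
package kvchain

import "net/rpc"

type PutArgs struct {
	Key, Value string
}

type PutReply struct {
	OK bool
}

type Server struct {
	store     map[string]string
	successor *rpc.Client // nil at the tail of the chain
}

func NewServer(successor *rpc.Client) *Server {
	return &Server{store: make(map[string]string), successor: successor}
}

// Put applies the write locally, then forwards it down the chain before
// acknowledging, so the whole chain has the write when the client hears back.
func (s *Server) Put(args *PutArgs, reply *PutReply) error {
	s.store[args.Key] = args.Value
	if s.successor != nil {
		var next PutReply
		if err := s.successor.Call("Server.Put", args, &next); err != nil {
			return err
		}
	}
	reply.OK = true
	return nil
}
```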

Lastly, the group project (groups of six) was open-ended, meaning you could build any distributed system you'd like. Nearly every group got 100% for this portion, because it was marked on whether or not you completed the tasks you specified in your project proposal.

Experience

I found the first two individual assignments pretty straightforward. The more difficult part was figuring out how to use Go and how to send/receive objects via UDP, but otherwise they were very manageable. I spent about 5 hours on the first assignment, and a few more days at 3-4 hours each on the second.
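The part that took some figuring out was serialising structs into datagrams. Something like this gob-over-UDP pattern (a minimal sketch, with a made-up Heartbeat type) was the core of it:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
	"net"
)

// Heartbeat is a hypothetical message type; the assignments define their own.
type Heartbeat struct {
	SeqNum uint64
}

func main() {
	// Receiver: bind a UDP socket.
	conn, err := net.ListenPacket("udp", "127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Sender: gob-encode the struct into a buffer, then write one datagram.
	go func() {
		var buf bytes.Buffer
		if err := gob.NewEncoder(&buf).Encode(Heartbeat{SeqNum: 1}); err != nil {
			log.Fatal(err)
		}
		addr, err := net.ResolveUDPAddr("udp", "127.0.0.1:8080")
		if err != nil {
			log.Fatal(err)
		}
		c, err := net.DialUDP("udp", nil, addr)
		if err != nil {
			log.Fatal(err)
		}
		defer c.Close()
		c.Write(buf.Bytes())
	}()

	// Receiver: read one datagram and gob-decode it back into the struct.
	raw := make([]byte, 1024)
	n, _, err := conn.ReadFrom(raw)
	if err != nil {
		log.Fatal(err)
	}
	var hb Heartbeat
	if err := gob.NewDecoder(bytes.NewReader(raw[:n])).Decode(&hb); err != nil {
		log.Fatal(err)
	}
	fmt.Println("received heartbeat", hb.SeqNum)
}
```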

On the other hand, the third (group) assignment required a lot of work, both coding and reading the spec, and if you have even one bad teammate, it will not be fun. This assignment gave me a lot of headaches, and not because the coding was super hard: given how much coding needed to be done, how many requirements and protocols had to be implemented, and how long the spec was, it was really hard to find the time to look over another group member's part, read the spec for it, check Piazza for more information, review their code, and communicate what they needed to add.

So the difficulty of the assignment itself is manageable (if everyone does their part), but not really if you have to deal with a bad team member. I wrote roughly 900 lines of code for my part of this assignment alone.

I think many people found the third assignment to be a lot of work, and when the professor released the outline for the project, worth 45% and due in a month, some people took to Piazza to voice their discontent. However, the project is actually a lot easier than the assignment: it's open-ended, so your group can pick which distributed system to build, based on a paper or similar.

Our group chose to build a key-value store database backed by Raft, a distributed consensus algorithm. Instead of building the full algorithm as detailed in the paper, we used a centralised node for failure detection and for creating the system topology. This was a really good decision, as it meant we didn't have to build out half of the Raft protocol, but we still had plenty to do.
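For flavour, here is a much-simplified sketch of how a Raft log drives a key-value store: once an entry is known to be committed, every replica applies it to its local map in log order, so all replicas converge on the same state. The types here are my own simplification, not our project's actual code.

```go
// Package raftkv sketches applying committed Raft log entries to a KV map.
package raftkv

type Command struct {
	Op    string // "put" (gets don't need a log entry in this sketch)
	Key   string
	Value string
}

type LogEntry struct {
	Term    uint64
	Command Command
}

type KVStore struct {
	log         []LogEntry
	commitIndex int // highest log index known to be committed
	lastApplied int // highest log index applied to the state machine
	state       map[string]string
}

func NewKVStore() *KVStore {
	return &KVStore{
		// Index 0 is an unused sentinel so indices match Raft's 1-based logs.
		log:   []LogEntry{{}},
		state: make(map[string]string),
	}
}

// applyCommitted replays newly committed entries, in order, into the map.
func (kv *KVStore) applyCommitted() {
	for kv.lastApplied < kv.commitIndex {
		kv.lastApplied++
		entry := kv.log[kv.lastApplied]
		if entry.Command.Op == "put" {
			kv.state[entry.Command.Key] = entry.Command.Value
		}
	}
}
```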

Workload: 5/5 Difficulty: 5/5 Usefulness: 5/5

Overall, I think this class is a must-take as a senior computer science student, as it introduces the core concepts of decentralisation, scalability, and fault tolerance. There is definitely a lot of coding and learning to do, but what else are you supposed to do as a CS student?