wiki:cs122a-2021-fall

Version 137 (modified by Mike Carey, 4 years ago) ( diff )

--

CS122A Fall 2021

Introduction to Data Management: The Hybrid Edition

Course Personnel

Instructor:

Discussion TAs:

Assignment TAs:

Readers:

Meeting Times & Places

What When Where Who
Lecture A Mon/Wed/Fri 3-3:50 PM SSLH 100 Mike
Discussion A1 Wed 5:00-5:50 PM RH 184 Kyle
Discussion A2 Wed 6:00-6:50 PM RH 184 Kyle
Discussion A3 Thu 7-7:50 PM DBH 1300 Kyle
Discussion A4 Thu 8-8:50 PM DBH 1300 Kyle
Lecture B Mon/Wed/Fri 4-4:50 PM SSLH 100 Mike
Discussion B1 Thu 5:00-5:50 PM DBH 1300 Nada
Discussion B2 Thu 6:00-6:50 PM DBH 1300 Nada
Discussion B3 Wed 7-7:50 PM DBH 1300 Nada
Discussion B4 Wed 8-8:50 PM DBH 1300 Nada

Course Objectives

This course provides students with an introduction to the design of databases and the use of database management systems in support of applications. It covers the entity-relationship (E-R) approach to logical database design. It then covers the relational data model, mapping of E/R designs to relations, relational database design principles, abstract query languages such as the relational algebra and relational calculus, and the industry-standard query language, SQL. It also covers indexing and physical database design, Students will gain exposure to how relational database management systems are used to manage an actual database. Time permitting, the course will also touch briefly on advanced database management topics such as semi-structured data management ("NoSQL"), more advanced data analytics, and transactions.

This course is aimed at database design and the use of database management systems in building database applications. It feeds into a follow-on project course, CS122B, whose focus is data-centric Web applications. The CS122A/B course sequence does NOT cover the internal workings of database systems; that material is covered in the undergraduate course CS122C (or its graduate equivalent, CS222) and the graduate-level follow-on course CS223. (The course textbook also delves further into that material for those students who are curious about what goes on under the hood.) Interested students are strongly encouraged to take one, two, or all of these courses; CS122B and CS122C/CS222 are independent, as each one requires only CS122A as their required database background. Also available as a newer CS122A follow-on course is CS1222D, whose focus is on new post-relational data management technologies, i.e., "NoSQL" databases and Big Data management platforms.

Prerequisites

Students should ideally have some experience programming in Python, Java, C++, or C#.

Required Textbooks

Database System Concepts (Seventh Edition) by A. Silberschatz, H. Korth, and S. Sudarshan. (Note: Older editions are not okay, as this book has just been chosen for its up-to-date content (as of 2020)!) The book is available via the UCI bookstore, from the publisher as a permanently-owned eBook, or from online sources such as Amazon.com.


Topic Coverage and Exam Schedule

Syllabus

Topic Reading (Required!)
Databases and DB Systems Ch. 1
Entity-Relationship (E-R) Data Model Ch. 6.1-6.5, 6.8-6.9
Relational Data Model Ch. 2.1-2.4, 3.1-3.2
E-R to Relational Translation Ch. 6.6-6.7
Relational Design Theory Ch. 7.1-7.4.2
Midterm Exam 1 Fri, Oct 22 (during lecture time)
Relational Algebra Ch. 2.5-2.7
Relational Calculus Wikipedia: Tuple relational calculus
SQL Basics (SPJ and Nested Queries) Ch. 3.3-3.5
SQL Analytics: Aggregation, Nulls, and Outer Joins Ch. 3.6-3.9, 4.1
Advanced SQL: Constraints, Triggers, Views, and Security Ch. 4.2, 4.4-4.5, 4.7
Midterm Exam 2 Mon, Nov 15 (during lecture time)
Storage Ch. 12.1-12.4, 12.6-12.7
Indexing Ch. 14.1-14.4, 14.5
Physical DB Design Ch. 14.6-14.7, 15.1-15.3, 15.5.3
Semistructured Data Management (a.k.a. NoSQL) Ch. 8.1, AsterixDB SQL++ Primer, Couchbase SQL++ Book
Data Science 1: Advanced SQL Analytics Ch. 5.5, 11.3
Data Science 2: Notebooks, Dataframes, and Python/Pandas Lecture notes and Jupyter notebook
Basics of Transactions Ch. 4.3, Ch. 17
Endterm Exam Fri, Dec 3 (during lecture time)

Midterm Exam 1

Time: Fri, Oct 22, Lecture Time
Place: SSLH 100
Midterm 1 (3PM)
Midterm 1 (3PM) Solution
Midterm 1 (4PM)
Midterm 1 (4PM) Solution

Midterm Exam 2

Time: Mon, Nov 15, Lecture Time
Place: SSLH 100
Midterm 2 (3PM)
Midterm 2 (3PM) Solution
Midterm 2 (4PM)
Midterm 2 (4PM) Solution

Endterm Exam

Time: Fri, Dec 3, Lecture Time
Place: SSLH 100
Endterm
Endterm Solution

Helpful Exam Resources

Example Midterm 1
Solution to example Midterm 1
Example Midterm 2
Solution to example Midterm 2
Example Endterm
Solution to example Endterm


Exams, Assignments, and Grading

Grading Criteria

Exams: 36% (3 x 12%)
Homework: 56%
Quizzes: 5%
Participation: 3%

Exam Policies

There will be three equally-weighted, non-cumulative exams this quarter. Together they will contribute a total of 36% towards your final grade in the class.

The exam dates have been provided at the start of the term (see above!). These dates are not flexible and makeup exams will not be offered, so please avoid scheduling interviews, mini-vacations, or any other activities in ways that would interfere with exam-taking. All students will be required to take each exam on the same date in the same time window(s) in order to ensure that we can give, grade, and fairly curve their whole class using a single exam. The exams will be held in-person this quarter. (Location - but not timing - accommodations will be made for those few students who obtain UCI exceptions to in-person participation or have truly hardship reasons - and explicit prior permission - to be remote during the exams.)

Homework and Participation

Homework assignments must be turned in on Gradescope by the assigned due dates/times. Details of exactly how to turn in a given assignment will be included in each assignment's handout. The fraction of your grade attributed to homework will be based on your top 7 out of 8 homework scores. (You can spend the 8th one however you like, but if you do decide to skip an assignment, keep in mind that the material will still be on the exams and that the homework is an excellent study/practice tool for those -- so you will still be responsible for mastering any "skipped" material.)

There will also be short weekly quizzes to give you further practice with the course material; those will count very little towards your final grade (taking them will contribute 5%), and they're just there for your benefit (so you can check your understanding of things and see what questions you might want to ask). Taking each quiz (regardless of score) will earn you it's weekly points. Quizzes other than the very first one must be taken by its posted deadline (no exceptions). The quizzes will be made available by Wednesday's lecture time each week, and time will be carved out to let you take them in class (and then review the answers via Q&A in discussions).

Grade Change Policy

For all of the graded assignments as well as the two midterm exams, if you disagree with the grading, you may raise your concerns with the relevant instructor (Professor, TA, or Reader) within two weeks after they are returned. After that, all grades will be considered final. 'Gradescope's regrading request feature (not e-mail) is the place to handle grading issues.' Do not wait until the end of the term, as two weeks means two weeks, and no regrading requests will be entertained after final grades have been posted.

Collaboration Policy

Homework assignments are to be completed individually, but you are strongly encouraged to pair up with a fellow student -- your brainstorming buddy -- for the duration of the quarter. It is okay to discuss assignments with other peers as well, e.g., to clarify the interpretation of a question or to compare thoughts on very rough approaches, but discussions of the details of your work are to stay within your team of two. You should pick a brainstorming partner at the start of the term and then stay with that person for the remainder of the quarter. See http://www.ics.uci.edu/ugrad/policies/index.php#academic_honesty for a good discussion of what is/isn't considered honest collaboration. The three exams are to be done solo, but will be open cheat sheet and open book. :-) Note that UCI's academic honesty policy applies both to person-to-person and to social media interactions (more on this below).

Late Policy

Due dates will be clearly indicated on all HW assignments. Assignments will still be accepted for up to one day (24 hours) after the due date, but you will lose 10 points (out of the 100-point total per assignment) for such lateness. We will not accept assignments in any form whatsoever after that time. "Stuff happens" sometimes, especially in COVID times (!), so you should anticipate that reality and avoid working up until the very last minute. Assignments MUST be turned in per the assignment's instructions by the indicated deadline date/time in order to get full credit. (Note: The best-7-out-of-8 grading policy is another fallback for what might otherwise be late homework assignments; that fallback will work once, but obviously just once.) We will release an official solution for most assignments 24 hours after its due date/time; as a result, we will stop accepting turn-in's at that time.

Academic Honesty Policy

Cheating is the one area where the mostly empathetic instructor for this course has zero patience or sympathy. You are here at UCI to learn, and cheating totally defeats that purpose. If you cheat in real life, on the job, you could actually bring down a company - so I do my part to avoid graduating students who would do that. All students are expected to adhere to the UCI/ICS Academic Honesty policies (see https://aisc.uci.edu/policies/academic-integrity/index.php and http://www.ics.uci.edu/ugrad/policies/index.php#academic_honesty to read their details). Any student found to be involved in cheating or aiding others in doing so may be academically prosecuted to the maximum extent possible: you could even fail the course in its entirety. (Ask around - I have done that before.) Just say no to cheating!!! By the way, it is entirely fine to look at old CS122A exams, old assignments, old quizzes, old solutions, etc. (Studying such materials will help you practice and learn!) Again, your goal here should be to learn - not get some particular grade - and in this course we will ask you to please be on the honor system and behave accordingly! Note that with the test-like way that companies interview prospective employees nowadays, you won't make it far past the door if you don't know your stuff.

Late Enrollment Policy

This class moves fast and it pretty much hits the ground running, so enrolling after missing more than the first lecture or two is highly discouraged. If you foresee a need to enroll late, e.g., due to waiting list or financial aid issues, you must do all of the work for the class from the beginning as if you are already enrolled. (The instructor will be happy to add you as an "Additional Student" so that you are fully able to do so w.r.t. accessing all course materials.) No special accommodations will be made for students who do not do this -- so whatever you miss, you miss and will lose points for. (The course already has some flexibility built into the grading policy.)


DBMS Platform

This class will use an industrial-strength relational database management system (RDBMS) for the hands-on homework assignments. In some past terms we have allowed students to choose between MySQL (an open source DBMS), DB2 (from IBM), and/or another RDBMS of their choosing. This quarter we will be suggesting that everyone use the same system - and a different one namely PostgreSQL. (This is a good RDBMS to have on your resume.) We will tell you when it's time to install and start to use PostgreSQL. (More information about that will appear here soon, so keep an eye on this space.) You will also find that Google searches are a great source of information and issues/solutions related to open source DBMSs, as there are frequently other users out there in the Googlesphere having the same problem that you're having. Again, we will not be letting students use a different DBMS this quarter because some assignments will use SQL features that PostgreSQL has but less complete systems, e.g., MySQL, do not have, and/or features that are not entirely standardized across different DBMSs.


CS122A In Person Again? For Real?

Nope! (Thank you Delta variant, sigh.) Well, okay, it will be sort of in person - let the Great Hybrid Database Adventure begin...

The last few offerings of CS122A have been online, so many kinks have been worked out related to remote participation in the class. And, with the sizes of CS122A classes in recent years, much of what we do was already happening online anyway. This term CS122A will be run in a way that will offer some in-person-ness but should also be remote-friendly. Lectures will be still be prerecorded (and distributed via Yuja/Canvas). They will be available for viewing by the day of the appointed lecture time at the latest - and hopefully sooner in most cases. Students will be expected to watch them before the lecture times, and then the instructor will appear on campus, in person (!), in the appointed lecture hall, during some of the official lecture times. The agenda for those in-person sessions will be Q&A, clarifications, review of sticking points, quizzes/tests, and whatever else we end up deciding would be useful. (Please feel free to suggest ways to improve those times as we go along - this is an experiment with an instructional method called the "flipped classroom"!) We will also try to monitor the class discussion forum during these sessions, and remote students can inject questions that way - and the live answers will then be available for later viewing by the remote questioners, as the "flipped" sessions will also be recorded for later viewing. The course's three exams will be held in person - those will be the only times when attendance is mandatory, in fact. The discussion sessions will be held live (!) on campus as well, and they will focus more on homework-related issues; those may also be Zoom-recorded if we can make that work. PDF lecture slides and quiz solutions will be made available after the relevant sessions for later reference as well.

For all of this to work, everyone will need a laptop capable of running PostgreSQL, workable Internet access, and the ability to access (or forward) UCI e-mail, Piazza, Gradescope, and Canvas+Yuja. If this is not you, this unfortunately won't be a good time for you to take this class, as we won't be able to provide workarounds for those dependencies. (Sorry!) You should verify your ability to meet these requirements during the first week of class, before committing to taking it for sure. (There is often a waiting list for this class, so it's important to make space for those who are will be able to fully participate.) You will want - actually need - to bring your laptop with you to each class meeting. Doing so will allow you to fully participate in class (as well as checking your e-mail and social media when the instructor is boring you. :-)). We will be using both Gradescope and PostgreSQL live during some of the class meetings.

Warning: This will not be like a real, well-designed, smoothly orchestrated "we meant to use this flipped classroom approach" class. This will be us doing our best to give you the best possible CS122A educational experience during a time when the world is still in a bit of a messy pandemic state. Please be patient and set your expectations accordingly! Don't be surprised if the lecture quality is "earthy" or if my self-appointed "emotional support cat" (Tustin) wanders through the lecture video background at some point... (I am often fighting a losing battle for at-home desktop real estate with this particular cat...) That being said, however, I am optimistic - I'm confident that we will indeed be able to deliver a full-quality version of CS122A this way! Spring and Fall 2020 both went surprisingly well online, so I'm hoping that our hybrid Fall 2020 will too. Exactly how we used our flipped times may evolve over the course of the quarter - e.g., if there is not enough interest in having that much F2F time, we might move to a model where Mondays and Wednesdays are F2F but Fridays are on-line (except during exams). Let's see how it goes and what you all seem to want!


Discussion Forums for All Things CS122A

We will use Piazza (heavily!) for online class discussions and announcements. Piazza aims to get you the help you need fast and efficiently from classmates, the TAs, the readers, and the instructor. Rather than emailing course or HW content questions to the teaching staff, you will be expected to post them on Piazza. We've used Piazza for this class many times before, and it works remarkably well - you have probably used it in at least a few of your other classes too. You can find our class Piazza page at http://piazza.com/uci/fall2021/cs122a/home. With roughly 360 students enrolled in the two halves of the course, even when it is offered face-to-face, Piazza is essentially the only hope for managing the class and getting everyone's questions answered -- office hours become hopeless at this scale. Your Piazza activity will also contribute 2% to your overall grade - non-involvement or misbehavior on Piazza may lead to loss of those points. Note: Although some courses at UCI will be migrating to EdDiscussion instead of Piazza, we will be sticking with Piazza for this quarter, as we don't want to experiment with a new discussion platform while also experimenting for the first time with a flipped classroom (!).

Note: You are expected to pay attention to Piazza, as that will be the primary communication mechanism for all sorts of course announcements. If you miss one, that'll be on you; "I didn't know" won't fly. (:-))

When using Piazza, there are a few things to keep in mind. Firstly, despite an unfortunate national trend towards accepting nastiness in social media, hurtful Piazza behavior will not be tolerated - so be kind to your classmates. Secondly, Piazza is a wonderful resource for asking and answering questions - as long as it is used thoughtfully. Please avoid re-asking questions that have already been asked and answered - you are responsible for reading others' questions and not re-asking them. With 400+ eyes on each message, lazy question-asking on Piazza is costly and inconsiderate, and such repeated behavior may lead to a loss of Piazza points. Finally, Piazza is not a place to discuss the details of answers to HW problems - i.e., it is not a place to post, request, or compare answers to specific assigned problems! Doing so would actually risk your violating the Academic Honesty Policy, as everyone is expected to ultimately do their own work. (Piazza is a fine place to chat with one another about answers to old sample exams, however.)

As you are aware, in addition to our lecture times, you have a weekly discussion session meeting - and attendance of those is highly encouraged. If for some reason you have to miss your session during one week, you can request permission (in advance!) to attend a different one - but please try to avoid that for room load-balancing and health/safety reasons. You will be responsible for being aware of any and all material covered in your discussion sessions, so if you do opt out or have to miss one, be sure to talk to a classmate and/or watch the Zoom recording of the missed session (if available). The discussion sessions will also serve (importantly!) as more "group office hours" - especially for homework-related issues - so come armed with things you'd like to have clarified or expanded on related to the homework assignments.


Homework Assignments

Due Date Topic HW Assignment HW Solution Other Useful Info
Wed, Oct 6 (6:00 PM Pacific) E-R Modeling HW1 Details Template HW1 Solution Setting Up PostgreSQL
Wed, Oct 13 (6:00 PM Pacific) E-R to Relational Translation HW2 Details Template HW2 Solution Solution SQL File
Wed, Oct 20 (6:00 PM Pacific) Principled Relational DB Design HW3 Details Template HW3 Solution
Fri, Oct 29 (6:00 PM Pacific) Relational Algebra HW4 Details Template HW4 Solution RelaX Instructions
Fri, Nov 5 (6:00 PM Pacific) SQL HW5 Details Template HW5 Solution
Fri, Nov 12 (6:00 PM Pacific) More SQL HW6 Details Template HW6 Solution
Mon, Nov 22 (6:00 PM Pacific) Physical DB Design HW7 Details Template HW7 Solution
Wed, Dec 1 (6:00 PM Pacific) NoSQL and Analytics HW8 Details Template AsterixDB Instructions

Attachments (80)

Note: See TracWiki for help on using the wiki.