CS222P Fall 2018: Principles of Data Management
Course Personnel
Instructor:
- Chen Li
E-Mail: chenli AT ics DOT uci DOT edu
Office: DBH 2092
Office hours: Tuesdays 11 am - noon, Wednesdays 2:00 - 3:00 pm - Office Hours of Prof. Li during the finals week: Dec. 10, Monday, 8 - 10 am, DBH 2065, Dec. 11, Tuesday, 1 - 3 pm, DBH 2065
Assistants:
- Qiushi Bai
E-mail: qbai1 AT uci DOT edu
Office: ICS 464C
Office hours: Friday 8:30 am - 9:30 am
- Abdulrahman Abdulhamid Alsaudi
E-mail: alsaudia AT uci DOT edu
Office: ICS 464C
Office hours: Mondays 1:00 pm - 2:00 pm
Online Discussion: https://piazza.com/uci/fall2018/cs222p
- Please use Piazza properly. It's a place for students to exchange ideas. Don't post "easy" or "random" questions without much thinking.
- To encourage students to actively participate in Piazza discussions and provide high-quality answers, we will select one student with the best Piazza performance. The student will get 2% extra credits in the overall scores.
Schedule & Location
Lecture
Days: Mondays and Wednesdays
Time: 5:00 pm – 6:20 pm
Location: SSL 228
Lab
Days: Fridays
Time: 7:00 pm – 7:50 pm
Location: SSL 270
Lectures
ID | Date | Topic | Readings | Notes |
1 | 10/01/18, M | DBMS Architecture | Ch. 1; Chamberlin et al 1981 | PPT1 (Updated 10/2, 11:35pm) Setup MySQL |
2 | 10/03/18, W | Record/Page Formats | Ch. 9.1, 9.2, 9.6, 9.7 | PPT2 |
3 | 10/8/18, M | PAX, Column/Row stores, OLTP/OLAP, Project 1, Part 2 | Ditto | Ditto |
4 | 10/10/18, W | Heap Files, Buffer Manager, Catalogs, Project 2 | Ch. 9 (remaining chapters) | PPT3 |
5 | 10/15/18, M | Schema versioning, File Organizations | Ch. 8.4 | PPT4 |
6 | 10/17/18, W | Index Overview and ISAM Tree Index | Ch. 8.4, Ch. 10 | PPT5 |
7 | 10/22/18, M | B+ tree | Graefe 2011 (Sec. 1-3) | PPT6 |
8 | 10/24/18, W | Static Hashing, Extendible Hashing, Linear Hashing | Ch. 11; Fagin et al 1979; Litwin 1980 | PPT7 |
9 | 10/29/18, M | Comparisons of Indexes and Indexing Performance | Ch. 8.4 - 8.6 | PPT8 |
10 | 10/31/18, W | Project 3, Ditto | Ditto | Ditto |
11 | 11/5/18, M | Index-only plans, Index selection, Midterm Review | Ditto | Ditto |
12 | 11/7/18, W | In-class midterm | ||
13 | 11/14/18, W | External Sorting | Ch. 13 | PPT9 |
14 | 11/19/18, M | Selection, Projection | Ch. 14.1 - 14.3 | PPT10 |
15 | 11/21/18, W | Joins: Blocked-based NLJ, Index-based NLJ | Ch. 14.4 | PPT11 |
16 | 11/26/18, M | Joins: Merge-based, Hash-based (Grace), Project 4, Operator interface | Ditto | Ditto |
17 | 11/28/18, W | Project 4, Joins: Simple Hash Join, Hybrid Hash Join, Multi-attribute join, Theta join | Ditto | Ditto |
18 | 12/3/18, M | Set operations, Aggregation, Cost estimation | Ch. 14.5, 14.6, 15.2 | PPT12 |
19 | 12/5/18, W | System-R Optimizer, course wrap up | Ch. 15.3 - 15.6, Selinger et al 1979 | PPT13 |
Projects
Project | Due Date | Topic | Teams | Proportions (50%) | Graded By | Private Tests |
Project 1 | Tu, week 2, Oct. 9 | Record-Based File Manager (RBF) | Pair Project | 9% | ||
Project 2 | Tu, week 5, Oct. 30 | Relation Manager (RM) | Pair Project | 14% | ||
Project 3 | Tu, week 8, Nov. 22 | Index Manager (IX) | Pair Project | 14% | ||
Project 4 | Fri, week 10, Dec. 7 | Query Engine (QE) | Pair Project | 13% |
Project team sign-up sheet Github account sheet Search for teamates
Note: Using Github in CS222P
Lab Section
Date | Assistant | Topic | Notes |
10/05/18, Fri | Qiushi Bai | Page Format, Project Environment Setup | Lab01 |
10/12/18, Fri | Qiushi Bai | Disk I/O, Record Manager | Quiz01 |
10/19/18, Fri | Abdulrahman Alsaudi | Buffer Manager, Schema versioning, | Quiz02 |
10/26/18, Fri | Abdulrahman Alsaudi | Index Overview and ISAM Tree Index, B+ tree | Quiz03 |
11/02/18, Fri | Qiushi Bai | Composite Key, Index Performance | Quiz04 |
11/16/18, Fri | Qiushi Bai | Merge Sort | Quiz05 |
11/30/18, Fri | Abdulrahman Alsaudi | Projection, Blocked-based NLJ, Index-based NLJ, Merge-based join | Quiz06 |
12/7/18, Fri | Abdulrahman Alsaudi | Simple Hash Join, Grace Hash Join, Hybrid Hash Join | Quiz07 |
Course Information
Course Description
This course provides an implementor's view of database management systems. It covers fundamental principles and implementation techniques, issues, and trade-offs related to database management systems. Topics covered include storage management, buffer management, record-oriented file systems, access methods, query processing, and query optimization. This course is a must for those students wishing to explore database management as an area of research and/or who plan on taking CS223 or CS224. A significant portion of database systems research as well as industrial database and information system development deals with adapting the basic database techniques covered in this course to new advances in hardware and software technologies or to the requirements of different applications and data types.
Using Github
You are required to use Github to manage your source code. Here are the instructions about how to use github to manage your projects and do submissions. For a team of 2 students, each member is expected to do the best effort to make equal contributions. We reserve the rights to deduct points for *both* students if we see unbalanced contributions.
Grading Breakdown
Midterm Exam: 25%
Final Exam: 25%
Four-Part Programming Project: 50%
This mixture of grading criteria for CS222P is intended to give equal "excelling opportunities" to both thinkers and do-ers. It's also intended to avoid overly weighting either team-based performance or individual performance. This is a hands-on project class, so expect to put a significant effort into your projects. The exams will be comprehensive with respect to the course material and will also ping you individually with respect to project knowledge, so be sure that you and your project partner make design decisions together. The two exams will be closed-book, but you may bring an 8.5"x11" two-sided cheat sheet to each exam if you like.
For all the graded projects and exams, if you disagree with the grading, you can discuss with us within two weeks after they are returned. After that, all the grades will be finalized.
Textbooks
Required: Database Management Systems, 3rd edition, by R. Ramakrishnan and J. Gehrke, McGraw Hill, 2003.
Recommended: Readings in Database Systems, 4th edition, by J. Hellerstein and M. Stonebraker, MIT Press, 2005.
Other Readings
The following papers are mostly drawn from the Hellerstein and Stonebraker book or can be found via UCI's ACM Digital Library subscription (http://dl.acm.org). You will be responsible both for the material covered in the lectures and the material covered in the readings listed in the Syllabus below.
- Abadi, D., Madden, S., and Hachem, N. 2008. Column-stores vs. Row-stores: How Different are They Really?. Proc. ACM SIGMOD Intl. Conf. on Management of Data (Vancouver, Canada, June 2008).
- Alsubaiee, S., et a., 2014. AsterixDB: a scalable, open source BDMS. Proc. VLDB Endow. 7, 14 (Oct. 2014), 1905-1916.
- Chamberlin, D., et al, 1981. A history and evaluation of System R. Commun. ACM 24, 10 (Oct. 1981), 632-646.
- Dean, J., and Ghemawat, S., 2004. MapReduce: Simplified data processing on large clusters. Proc. 6th USENIX Symp. on Op. Sys. Design and Impl. (OSDI'04) (San Francisco, CA, December 2004).
- DeWitt, D. and Gray, J. 1992. Parallel database systems: The future of high performance database systems. Commun. ACM 35, 6 (June 1992), 85-98.
- Fagin, R., et al, 1979. Extendible Hashing—A fast access method for dynamic files. ACM Trans. Database Syst. 4, 3 (Sept. 1979), 315-344.
- Graefe, G., 2011. Modern B-Tree Techniques. Foundations and Trends in Databases 3, 4 (Sept. 2011) 203-402. (Available from the UCI intranet at: http://dx.doi.org/10.1561/1900000028)
- Guttman, A. 1984. R-trees: A dynamic index structure for spatial searching. Proc. ACM SIGMOD Intl. Conf. on Management of Data (Boston, Massachusetts, June 1984).
- Litwin, W., 1980. Linear hashing: A new tool for file and table addressing. Proc. 6th Int'l. Conf. on Very Large Data Bases, Montreal, Oct. 1980.
- Nievergelt, J., Hinterberger, H., and Sevcik, K. C. 1984. The Grid File: An adaptable, symmetric multikey file structure. ACM Trans. Database Syst. 9, 1 (March 1984), 38-71. (optional)
- Selinger, P. G. et al, 1979. Access path selection in a relational database management system. Proc. ACM SIGMOD Int'l. Conf. on Management of Data (Boston, Massachusetts, May-June 1979).
- Shapiro, L. D. 1986. Join processing in database systems with large main memories. ACM Trans. Database Syst. 11, 3 (Aug. 1986), 239-264.
- Stonebraker, M. 1981. Operating system support for database management. Commun. ACM 24, 7 (July 1981), 412-418.
Class Discussion Forum
We will be using Piazza for online class discussions. This system is highly catered to getting you help fast and efficiently from classmates, the Reader(s), and the instructor. Rather than emailing questions to any of us on the teaching staff, we'll ask you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@…. Please make Piazza participation (at least monitoring the activity there) a part of your regular CS222P routine!
You can find our class page at: https://piazza.com/uci/fall2018/cs222p
Programming Project
This class is intended to teach the principles involved in DBMS implementation, so it will include a significant programming component. The project will be aimed at giving you a sense of what goes on inside a DBMS, and what the issues and challenges are in building a system "now" that will be managing data of various user-defined shapes and sizes "later". You will try your hand at record-based file storage, indexing, and relational runtime system programming (a.k.a. query operators). The language for the project will be C++ - hopefully you either have C++ experience or you have Java experience and can learn C++ quickly. Our assistant(s) will run the project, and they can provide you with additional suggestions for material to help you "spin up" on C++ programming, debugging, and tools in case you need to do a bit of remedial work on the side in order to tackle the project successfully.
Notice about Projects
- Making teams: Project grades will be team based. Each team can have at most two people. Please let us know about your group by adding your team members' names here in front of your group id (please do so even if you are alone in your group). A student can always leave his/her team during the quarter with a notice at least two weeks prior to the next project's deadline. Also, please note that it is not permitted to join a new team after splitting.
- The projects are doable by individuals, which can be more challenging but rewarding.
- Students are allowed to add attributes, methods, and classes, as long as the public signatures provided in the codebase are kept and implemented. We will be testing your code by calling the methods that are declared in the signatures.
- Students also have the freedom to use other open-source libraries or packages (e.g., Gtest and Boost) to implement and test their code. Just please make sure that you: (1) get the okay from us before using any additional packages; (2) list such packages clearly in your project document; and (3) write your makefile correctly so that your project can still be built by calling the 'make' command.
Project Late Policy
- The official due date for each assignment is listed here on the wiki, and it is expected that students will turn most work in on or before that date.
- We will offer a 48-hour "grace period" for each assignment, and will therefore accept solution submissions that are turned in within 48 hours of the due date (i.e., less than two days late). No need to ask - this time is yours in the event that you need it for some unforeseen reason. If you use it, your score will be deducted by 10 points (out of 100). You are recommended NOT to use the grace period, as you will lose not only 10%, but also valuable time from the next assignment's working period for every hour that you run late.
- Late assignments after the grace period will NOT be accepted, so do always aim to be on time! (Please don't even ask, as this is what the two-day grace period is intended for. :-))
Cheating Policy
Cheating is an area where the instructor for this course has absolutely no patience or sympathy whatsoever. You are all here to learn, and cheating defeats that purpose and is unfair to your fellow students. Students will be expected to adhere to the UCI and ICS Academic Honesty policies (and should see http://www.editor.uci.edu/catalogue/appx/appx.2.htm#academic and http://www.ics.uci.edu/ugrad/policies/index.php#academic_honesty to read their details). We run a software package to search for cheating cases in projects. Any student found to be cheating or aiding others in cheating will be academically prosecuted to the maximum extent possible — so you may fail the course with no option for a second chance. This policy will be strictly adhered to; no exceptions will be possible regardless of the impact it may have on your studies, your work plans, your visa status, or anything else. Don’t cheat and you won’t have issues. Just say no to cheating!
C/C++ Resources
- A helpful reference for C/C++ libraries: http://www.cplusplus.com/reference/
- A tutorial on programming in C/C++ and guide on related resources: http://www.cprogramming.com/tutorial.html#c++tutorial
- A tutorial on learning C++ for Java programmers: http://pages.cs.wisc.edu/~hasti/cs368/CppTutorial/index.html
Midterm Exam
Time: Nov. 7, Wednesday, In-class. Closed books. You can bring an A4-sized two-sided "cheat sheet." Sample midterm with solutions Midterm solutions
Final Exam
Time: Dec. 12, Wednesday, 10:30-12:30pm
Usual classroom. Closed books, Closed notes. You can bring an A4-sized two-sided "cheat sheet." The exam will cover all materials taught in this quarter, with a more emphasis on those after the midterm.
Sample final Final solution
Attachments (28)
- setup-mysql.txt (1.2 KB) - added by qiushi 6 years ago.
-
cs222p-lecture01.ppt
(646.5 KB) -
added by chenli 6 years ago.
Lecture 01: introduction
-
cs222p-lecture02.ppt
(315.0 KB) -
added by chenli 6 years ago.
Lecture 02
- cs222p-lab01.pptx (72.5 KB) - added by qiushi 6 years ago.
-
cs222p-lecture03.ppt
(341.5 KB) -
added by chenli 6 years ago.
Lecture 03
- cs222p-fall18-quiz01-solution.pdf (52.4 KB) - added by qiushi 6 years ago.
-
cs222p-lecture04.ppt
(237.5 KB) -
added by chenli 6 years ago.
Lecture 04
-
cs222p-lecture05.ppt
(453.5 KB) -
added by chenli 6 years ago.
Lecture 05
-
cs222p-lecture06.ppt
(1.1 MB) -
added by chenli 6 years ago.
Lecture 06
-
cs222-lecture06.ppt
(1.1 MB) -
added by chenli 6 years ago.
Lecture 06
-
cs222p-lecture07.ppt
(444.0 KB) -
added by chenli 6 years ago.
Lecture 07
-
CS222P-Quiz03.pdf
(61.3 KB) -
added by alsaudi.ad 6 years ago.
Quiz 3
-
cs222p-lecture08.ppt
(333.5 KB) -
added by chenli 6 years ago.
Lecture 08
- 2017-midterm-solution.pdf (509.0 KB) - added by qiushi 6 years ago.
- cs222p-fall18-quiz04-solution.pdf (21.7 KB) - added by qiushi 6 years ago.
-
CS222P-Quiz02.pdf
(52.2 KB) -
added by alsaudi.ad 6 years ago.
Quiz 2
- cs222p-fall18-midterm-solution.pdf (312.9 KB) - added by qiushi 6 years ago.
-
cs222p-lecture09.ppt
(206.0 KB) -
added by chenli 6 years ago.
Lecture 09
-
cs222p-lecture10.ppt
(148.0 KB) -
added by chenli 6 years ago.
Lecture 10
-
cs222p-lecture11.ppt
(896.0 KB) -
added by chenli 6 years ago.
Lecture 11
-
cs222p-lecture12.ppt
(417.5 KB) -
added by chenli 6 years ago.
Lecture 12
- cs222p-fall18-quiz05-solution.pdf (32.3 KB) - added by qiushi 6 years ago.
-
cs222p-lecture13.ppt
(277.5 KB) -
added by chenli 6 years ago.
Lecture 13
-
system-R-selinger79.pdf
(1.2 MB) -
added by chenli 6 years ago.
System-R Selinger paper
- CS222P-Quiz06.pdf (71.3 KB) - added by alsaudi.ad 6 years ago.
- CS222P-Quiz07.pdf (65.9 KB) - added by alsaudi.ad 6 years ago.
-
2017-fall-final-solution.pdf
(333.4 KB) -
added by alsaudi.ad 6 years ago.
2017-fall-final-solution
- 2018-fall-final-solution.pdf (211.4 KB) - added by alsaudi.ad 6 years ago.