OGI School of Science & Engineering

CSE541 Database Implementation: Winter 2006

Class time/location Wednesdays, 5:30pm-8:15pm in WCC 403 (Jan 11-Mar 15)

Text Database System Implementation by Garcia-Molina et al. (Prentice Hall, ISBN:0-13-040264-8)

Database Tuning: Principles, Experiments, and Troubleshooting Techniques by Shasha & Bonnet (Morgan Kaufmann, ISBN:1-55-860753-6)

Instructor Tamara Hayes
Office: BCB330
Phone: 503-748-7372 (but try email first)
Fax: 503-748-7038
Prerequisites CSE514 and programming experience, or permission of instructor.
Class mailing list Visit cse541-list mailing list web page.

News (Last updated March 15, 2006)

Final exam has been posted to this website: (pdf) | (Word) |
Murali's talk on real-time databases: pdf

Weekly Schedule

Final lecture notes will be posted on the day of the class. I will print copies for class, but you can always print another if you need it. Warning: Class lecture notes are subject to change up to the day of the class. Print early at your own risk!

All lecture notes and assignments will be posted in pdf format. You can get Adobe Reader here.

Readings are from the Garcia-Molina text and the Shasha book. Additional optional readings from primary literature sources will be posted to the web in pdf format.
Lectures
Date Topic Reading
11-Jan Course overview; Intro to database implementation; Storage management; Buffer management
Lecture notes (pdf) | Homework 1 (pdf) | assign1.txt (for assignment 1) (tab-delimited)
Ch. 3
Shasha: Ch. 2.4, 2.5
Related papers
18-Jan Indexing (hashing, Btrees); Performance tuning 1
Lecture notes Indexes (pdf) | Homework 2 (pdf)
Ch. 4
Shasha: Ch. 3
Related papers
25-Jan Multi-dimensional indexing; Logical query plans
Lecture notes Multi-D indexing (pdf) | Logical query plans (pdf)
Ch. 5,6
Related papers
1-Feb Query Execution; Query Optimization
Lecture notes (pdf)
Homework 3 (pdf) | Files for assignment 3
Project deliverable 1 due
Ch. 6, Ch. 7.1-7.4
Shasha: Ch. 4.5-4.6, 5.4, App. D
Related papers
8-Feb Schedules and serializability; Locking
Lecture notes (pdf)
Ch. 7.5-7.7, 9.1-9.5
Shasha: Ch. 7.1-7.3
15-Feb NO CLASS  
22-Feb Optimistic concurrency control, transaction management
Project deliverable 2 due
Lecture notes (pdf)
Ch. 9.6-9.9, 10
Shasha: Ch. 2.2
Related papers
1-Mar Undo/redo logging; ARIES logging
Lecture notes (pdf)
Homework 4 (pdf) | (Word) |
Ch. 8, Original ARIES paper
Shasha: Ch. 2.3
8-Mar Distributed databases
Lecture notes (pdf)
Ch. 10.4
15-Mar Catch-up if needed; Project presentations
Final project due
Final exam: (pdf) | (Word) | Murali's talk on Real-time Databases pdf
 
22-Mar Exam week
Final exam due
 
 
Additional reading
Week Reading
11-Jan D.L. Moniz, P.J. Fortier, Simulation analysis of a real-time database buffer manager 29th Annual Simulation Symposium, April 08 - 11, 1996, New Orleans, LA.

This paper examines buffer management strategies in the context of time constraints on critical and non-critical transactions. Simulations are used to compare and contrast the strategies.

18-Jan J. Rao , K.A. Ross, Making B+- trees cache conscious in main memory. Proc. ACM SIGMOD 2000, 475-486.

An approach to storing B+-trees that increases utilization of cache lines.

R.A. Hankins, J.M. Patel, Effect of node size on the performance of cache-conscious B+-trees. ACM SIGMETRICS 2003, 31(1).

Analytical and experimental analysis of a model of node size on the search performance of cache-conscious B+-trees.

25-Jan J. F. Roddick, M. Spiliopoulou, A survey of temporal knowledge discovery paradigms and methods IEEE Transactions on Knowledge and Data Engineering, 14(4), 2002, 750-767.

This paper provides an overview of a problem space - temporal mining - that has begun to influence database implementation, particularly in the area of indexing. If you present this paper for a class project, you must include a discussion what types of indexes would be particularly suited to temporal data, and how one would go about implementing support for those indexes in a relational database.

1-Feb S. Manegold, P. Boncz, M. Kersten, Optimizing Main-Memory Join on Modern Hardware IEEE Transactions on Knowledge and Data Engineering, 14(4), 2002, 709-730.

An excellent paper that addresses the issue of adapting existing implementations of query support to better utilize high-performance CPU and memory architectures on modern machines.

G. Slivinskas, C.S. Jensen, R.T. Snodgrass, A Foundation for Conventional and Temporal Query Optimization Addressing Duplicates and Ordering IEEE Transactions on Knowledge and Data Engineering, 13(1), 2001, 21-49.

Proposes an extension to SQL to support temporal queries, and discusses how query plan enumeration would proceed for these extensions. This is a very technical paper, and if this paper is used for a class project, you must discuss what effect the proposed SQL extensions would have on the implementation and performance of the query processor and query optimizer. Presentation of the algebra of the transformation rules is neither necessary nor appropriate, although explanation of those rules and the need for them would be.

22-Feb P.K. Reddy, S. Bhalla, Asynchronous Operations in Distributed Concurrency Control IEEE Transactions on Knowledge and Data Engineering, 15(3), 2003, 721-733.

Graph-based resolution of deadlocks in a distributed environment.

S. Lee, J.L. Kim, Performance Analysis of Distributed Deadlock Detection Algorithms IEEE Transactions on Knowledge and Data Engineering, 13(4), 2001, 623-636.

This is an analytical analysis of the performance of deadlock detection schemes. This is a technical paper not suitable for the class project, but may be of interest to some students.

2-Mar C. Mohan, Repeating History Beyond ARIES Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.

Grading Policy

Each course deliverable and oral presentation will be graded based on its completeness, clarity, depth, and presentation. Grades will be assigned as follows:
4   Work and presentation are superior
3   Work is accurate and its presentation is of high quality
2   Work is complete, but there are some problems with its presentation or accuracy
1   Work is incomplete, inaccurate, or its presentation is poor
0   Work was not submitted

Final grades in the course will be determined based on the assessment criteria above. There will be no curve. Final grades have the following meanings:
A+   Superior performance in all aspects of the course with work exemplifying the highest quality.
A   Superior performance in most aspects of the course; high quality work in the remainder.
A-   High quality performance in all or most aspects of the course.
B+   High quality performance in some of the course; satisfactory performance in the remainder.
B   Satisfactory performance in the course.
B-   Satisfactory performance in most of the course, with the remainder being somewhat substandard.
C   Evidence of some learning but generally marginal performance.

Late assignments will be accepted only with prior approval from the professor.

Assignments

Work in the course will be weighted as follows:
40%   Homework assignments
Homework assignments will consist of relating to the material covered in class, in somes cases exploring how the database mySQL has implemented support for certain capabilities
30%   Exam
The exam will cover all the material in the course (readings and lectures) up to the point of the exam. It will be a take-home exam.
30%   Course project
Students have a choice of 2 projects:
  1. Presenting in detail one of the papers from the Optional Reading List. This project will include describing how the material in the paper relates to concepts discussed in class, and providing a critique of the approach and conclusions of the paper.
  2. A benchmarking project, in which a team of 2-3 students will evaluate the performance of mySQL for a particular type of workload.

Project Details

Details of deliverables and ideas for the benchmarking project can be found here: pdf.
Details of deliverables and ideas for the paper review project can be found here: pdf

Course Description

In this course we will look at how relational database systems are designed, and how different implementation approaches influence performance and data integrity. Topics will include data storage, caching and buffer management, how query plans are chosen and query processing is optimized, concurrency control, logging and recovery, and performance tuning. We will look at the tuning parameters of large commercial databases such as Oracle, DB/2, Informix and Sybase and discuss how these parameters influence the software behavior and what they teach us about the implementation.

Students will gain hands-on experience by working with an existing OpenSource SQL database (mySQL). We will also look at some recent research in the field to get a better understanding of new algorithms and approaches in database implementation.

Who should take this course?

This course will be of interest to:
  • DBA's responsible for maintaining and tuning the database(s) at their sites. After taking this course, you should have a better understanding of how database tuning parameters work, and when to use them.
  • Database application developers, who are designing large database applications. After taking this course, you should have a better understanding of issues influencing physical and logical database design.
  • Students interested in working in database system development. After taking this course you will understand the key implementation issues in relational database design.
[an error occurred while processing this directive]