Previous page
Next page

3.5.2.6 Queries involving “select”, “project”, and “join”

The power of join really becomes apparent when it is used with the other two relational operators: select and project. For example, let’s say we want to produce a listing of the instructor contact information for every course offered in the spring of 2013. The listing should include for each course: the course name, section sequence number, instructor’s name, office location, and phone number.

Reviewing the three relations that make up our database: Faculty (Figure 3.14), Students (Figure 3.15), and Courses (Figure 3.18); we see that instructor names, office locations, and phone numbers are stored in Faculty, while course names and sequence numbers are stored in Courses. In other words, the information we want is not stored in any single table of the database, but is instead spread among multiple tables. Upon further examination we might also note that the two relations, Faculty and Courses, share an attribute in common: faculty names. Because Faculty and Courses share a field in common, they may be joined. The relational expression to perform this operation is:

Rel1 ← Join Courses and Faculty over Fname

The relation produced by this expression, Rel1, will contain all of the attributes of both Courses and Faculty (with the exception that Fname will be listed only once). Hence, there would be 11 attributes in the resulting relation: Fname, Seq_no, Course, Quarter, Year, Credits, Dept, Office, Phone, SSN, and Salary. Due to the fact that the Fname field in each row of Courses will match up with one (and only one) Fname entry in the Faculty relation, Rel1 will contain exactly the same number of tuples as Courses, eleven.

A relation, Rel4, together with the expressions that generated it

Figure 3.21: A relation, Rel4, together with the expressions that generated it

Rel1 contains all of the information we are interested in, but it also contains a large number of rows and attributes we are not interested in – rows containing information about classes offered in quarters other than spring 2013, and attributes such as faculty social security numbers and salaries. Thus, we need to construct a relation that contains only the relevant data.

Reviewing the problem statement, we were asked to provide contact information on all classes offered during the spring of 2013. The following two select statements produce a relation with the relevant tuples.

Rel2 ← Select from Rel1 where Year = 2013

Rel3 ← Select from Rel2 where Quarter = “Spring”

Rel3 is nearer the mark since it does not contain any extraneous rows. However, it still contains a number of attributes that were not requested, such as the aforementioned Faculty social security numbers and salaries. The following project statement produces the final relation:

Rel4 ← Project Course, Seq_no, Fname, Office, Phone from Rel3

This sequence of relational expressions is summarized, and the resulting relation shown, in Figure 3.21.

When we began our discussion of relational databases we posed a number of example queries of the type that could be asked against a university database. One of those questions was:

The “Grades” relation – part of a university database

Figure 3.22: The “Grades” relation – part of a university database

If you attempt to answer this question using the Faculty, Students, and Courses tables, you will quickly discover that they do not contain the requested information. There is information on Dr. Carpenter and her Math 241 offering in the spring of 2013, but nowhere in the three tables are the courses actually taken by students specified.

The association between students and the courses they have taken is provided by the final relation of our university database, the Grades relation, which is shown in its entirety in Figure 3.22. This rather lengthy table contains three attributes: ID, Seq_no, and Grade; and has an entry for every course taken by every student. The ID attribute, which represents student ID numbers, is shared by both Grades and Students, and thus provides a link between those two tables. Likewise, a link is provided between Grades and Courses via the shared “Seq_no” field, which represents the sequence numbers of individual course offerings or sections.

Using the Grades relation it becomes possible to find the names and ID numbers of the students who took Dr. Carpenter’s spring 2013 offering of Math 241.

In general, when trying to answer a query involving information from a number of different tables (such as this one) it is important that you be fully aware of the types of information stored in each of the tables and aware of the fields that they share in common. The shared fields are critical, since they are used to “work your way” from table to table collecting the information you are interested in along the way.

For example, to discover the students who took a particular class, it is first necessary to obtain the sequence number of that class from the Courses relation. Once we have the course sequence number we can select the rows of the Grades relation that match that sequence number, giving us the ID numbers of the students who took the class. Using those student IDs, we can retrieve the names of the matching students from the Students relation.

This approach for determining the names and ID numbers of the students in Dr. Carpenter’s spring 2013 Math 241 course can be summarized in the following way:

  1. Determine the course sequence number of the indicated course.

  2. Determine the Student ID numbers of the people taking that course.

  3. Retrieve the student information on each of those students.

  4. Project out the name and student ID fields from those student records.

Step one can be accomplished via the following compound select statement:

We now have a relation that contains a single tuple – the one concerning Jenna Carpenter’s spring 2013 offering of Math 241.[6]

Although relation R1 consists of the one tuple from Courses that concerns the course of interest, all of the fields, save one, are not really needed. This is because the only piece of information we need from Courses is the sequence number of Dr. Carpenter’s course, which is “100010”. To keep the relations we are working with from becoming cluttered with unnecessary fields, let’s project out this single attribute.

R2 ← Project Seq_no from R1

Given the course sequence number, we can now construct a relation containing only the grade records of the students who took the course. This can be accomplished by the following relational query:

R3 ← Join Grades and R2 over Seq_no

The contents of relation R3 is shown in Figure 3.23. As you can see, R3 consists of all of the grade records for the course with sequence number “100010”. While we needed the course sequence number to produce this table, the only information contained in the table that is currently of interest to us is the student ID numbers. Hence, we can project out this column:

R4 ← Project ID from R3

Relation R3 – the student numbers and grades of all students who took Math 241 in the spring of 2013 under Dr. Carpente

Figure 3.23: Relation R3 – the student numbers and grades of all students
who took Math 241 in the spring of 2013 under Dr. Carpenter

We can now begin the third step outlined above: retrieving the student records for the individuals whose ID numbers appear in R4. Doing so will involve a join of R4 with the Students relation over ID.

R5 ← Join Students and R4 over ID

Relation R5 contains the records of those students who took Dr. Carpenter’s Math 241 course in the Spring of 2013. The final step in solving the problem involves projecting out only those fields we are interested in; namely Sname and ID.

R6 ← Project Sname, ID from R5

The final table resulting from this series of relational expressions is shown in Figure 3.24.

Although the rows of this table are shown sorted by both last name and student ID number, relational database theory treats relations are unordered sets of tuples, so the order of the rows in the displayed table is technically unpredictable. Most commercially available database applications, however, allow the user to easily sort the rows of a table into any order he or she desires.

Relation R6 – The names and ID numbers of all students who took Math 241 in the spring of 2013 under Dr. Carpenter

Figure 3.24: Relation R6 – The names and ID numbers of all students
who took Math 241 in the spring of 2013 under Dr. Carpenter

Exercises for Section 3.5.2.6

Using the relations presented in Figures 3.14, 3.15, 3.18, and 3.22, develop relational queries to solve each of the following problems.

  1. Produce a listing of the grades given by Dr. O’Neal (“ONeal M. B.”) in all of his courses. The relation you produce should contain only the student names, their grades, the name of the course in which they made the grade, and the quarter and year of offering.

  2. Create a relation containing only the name, office location, and phone number of the instructor who taught the course with sequence number “100004”.

  3. Generate the fall 2012 course schedule for student number “55555510”. The schedule should consist of course names, sequence numbers, and instructors.

  4. Produce an academic transcript for “Walker J.”. The transcript should consist of the course name, quarter, year, and grade earned for every course taken by Mr. Walker.

  5. The Chair of the CS program is interested in how well his students are doing. Generate a table of student names, courses taken, and grades earned by every CS major in winter 2013.

  6. Produce a list of the names of students who earned one or more A’s in winter 2013.

  7. What is the SSN of the only professor who gave “Kleinpeter J.” a B in fall 2012?


Footnotes

[6] If Dr. Carpenter had taught multiple sections of Math 241 during the Spring 2013 quarter, multiple tuples would be listed in this relation – one for each section. However, the wording of the original question makes it unlikely that she was teaching multiple sections of that class that quarter.

Return to top