SQL relationships
tags:
Introduction
This post is actually a note for myself about relationships in SQL. Most of the time tables are related to others through relationships. The principle is to avoid non-atomic values (first normal form) and data duplication (second normal form). There are 3 relations ships: one-to-many, many-to-many and one-to-one.
The data
We’ll use fake data. Imagine we want to watch movies at the cinema. Let’s suppose I’m interested in adventure movies but I have no idea what movies are worth watching. So I read the reviews of these movies, which is a score in 5 and chooses the one with the best score.
We have:
- a table for the movies called movies with the schema:
CREATE TABLE movies (
id SERIAL PRIMARY KEY,
title CHARACTER(50) NOT NULL,
duration INTEGER CHECK (duration > 0),
CONSTRAINT unique_title UNIQUE (title)
);
id | title | duration |
---|---|---|
1 | Don Juan | 110 |
2 | Peter Pan | 120 |
3 | The Lost World | 105 |
4 | Robin Hood | 143 |
- a table for the genres called genres with the schema:
CREATE TABLE genres(
id SERIAL PRIMARY KEY,
genre CHARACTER(15) NOT NULL UNIQUE
);
id | genre |
---|---|
1 | Romance |
2 | Adventure |
3 | Fantasy |
- table for the promotions called reviews:
CREATE TABLE reviews (
id SERIAL PRIMARY KEY,
movie_id INTEGER REFERENCES movies(id),
review INTEGER
FOREIGN KEY (movie_id) REFERENCES movies(id)
);
Note that we create the relationship by creating a foreign key. The foreign key movie_id references the primary key id in the table movies.
id | movie_id | review |
---|---|---|
1 | 1 | 3 |
2 | 1 | 4 |
3 | 2 | 4 |
4 | 3 | 5 |
5 | 4 | 4 |
6 | 4 | 5 |
(Note: the examples used in this post are directly taken from a course on CodeSchool, The Sequel to SQL)
One-to-many
One row in table A can relate to many rows in table B. One row in table B relates to only one row in table A.
In our example: one movie can have many reviews but a review adresses only one movie.
SELECT m.title, r.review
FROM movies m JOIN reviews r
ON m.id=r.movie_id;
title | review |
---|---|
Don Juan | 3 |
Don Juan | 4 |
Peter Pan | 4 |
The Lost World | 5 |
Robin Hood | 4 |
Robin Hood | 5 |
Many-to-many
One row in table A can relate to many rows in table B. One row in table B can relate to many rows in table A. To define this relationship we need a table C (the junction table) whose columns will be foreign keys referencing the primary keys of tables A and B.
In this example: A movie can have many genres and a genre can be associated to many movies.
CREATE TABLE movies_genres(
movie_id INTEGER,
genre_id INTEGER,
FOREIGN KEY (movie_id) REFERENCES movies(id),
FOREIGN KEY (genre_id) REFERENCES genres(id)
);
movie_id | genre_id |
---|---|
1 | 1 |
2 | 2 |
2 | 3 |
3 | 3 |
4 | 2 |
Now if we want to know the genres of the movies we have to use 2 joins:
SELECT m.id, m.title films, g.genre
FROM movies m
INNER JOIN movies_genres mg
ON m.id=mg.movie_id
INNER JOIN genres g
ON g.id=mg.genre_id;
id | films | genre |
---|---|---|
1 | Don Juan | Romance |
2 | Peter Pan | Adventure |
2 | Peter Pan | Fantasy |
3 | The Lost World | Fantasy |
4 | Robin Hood | Adventure |
One-to-one
One row in table A relates to only one row in table B. One row in table B relates to only one row in table A. The primary key in B is the foreign key references the primary key in A.
In our example, we can imagine that each movie has a given budget and gross. These information would be stored in the table information:
CREATE TABLE information (
movie_id INTEGER PRIMARY KEY,
budget INTEGER,
gross INTEGER,
FOREIGN KEY (movie_id) REFERENCES movies(id)
);
movie_id | budget | gross |
---|---|---|
1 | 1000000 | 3000000 |
2 | 1500000 | 4000000 |
3 | 5000000 | 10000000 |
4 | 2000000 | 6000000 |
The important thing to notice here is that the primary key of this table (movie_id) is also the foreign key references id in the table movies.
If we want to have the movies and these information we use a join on these 2 tables:
SELECT m.title, i.budget, i.gross
FROM movies m JOIN information i
ON m.id = i.movie_id;
title | budget | gross |
---|---|---|
Don Juan | 1000000 | 3000000 |
Peter Pan | 1500000 | 4000000 |
The Lost World | 5000000 | 10000000 |
Robin Hood | 2000000 | 6000000 |
What movie am I going to watch?
I want to watch the adventure movie with the best average review. First let’s create a view to determine the adventure movies:
CREATE VIEW adventure_movies_view AS
SELECT m.id, m.title, g.genre
FROM movies m
INNER JOIN movies_genres mg
ON m.id=mg.movie_id
INNER JOIN genres g
ON g.id=mg.genre_id
where g.genre = 'Adventure';
Then we use join between the adventure movies and the reviews tables. We group by movie and we average the score of the reviews:
SELECT amv.title, AVG(r.review) as avg_review
FROM adventure_movies_view amv JOIN reviews r
ON amv.id=r.movie_id
GROUP BY amv.title;
title | avg_review |
---|---|
Peter Pan | 4.0 |
Robin Hood | 4.5 |
OK. It seems I’m going to watch Robin Hood
.