SQL tutorial •

Using the JOIN function to combine tables

Right now the information in the three tables from the sample e-commerce data is quite dispersed. For example, the order table contains no other information about who made the order than the customer id. So if you need to contact a certain customer, you would have to take the customer id from the order table and then manually look that up in the customer table to find the corresponding email address. It would be better to have one combined table with all of this information. Here’s how this could be done in SQL:

Overview of an SQL query

As you see, this looks quite similar to a usual <span class = "code">SELECT</span> statement. The main difference in this query is the left join statement after the <span class = "code">FROM</span> statement. The left join statement is structured into the clause <span class = "code">LEFT JOIN</span> followed by the destination of the table being combined with the primary table. After this is an <span class = "code">ON</span> clause, which defines the common identifier in the two tables. This is what’s required for a left join to work.

Another new feature in the query above is the table alias preceding the selected columns. This gives an alias or nickname for the whole table, using <span class = "code">AS</span> to define the destination of the table. As you’ve learned, you can use <span class = "code">AS</span> to give aliases to individual columns (<span class = "code">SELECT</span> x_column <span class = "code">AS</span> x_alias). In a similar way, you can give tables an alias by adding the <span class = "code">AS</span> statement after the table destination in the <span class = "code">FROM</span> and <span class = "code">JOIN</span> statements. You can then refer to these table aliases when picking the columns in the <span class = "code">SELECT</span> statement by adding the alias and a <span class = "code">.</span> before the column name.

Table aliases are not a requirement for all columns when using joins, but if there are columns in the two tables that have the same header, you need to specify which of the tables you’re referring to. It can also be helpful to easily go back and see which columns belong to which tables, so it’s a good idea to add aliases if you start to combine many tables and columns.

You can try to type this query in the editor on your own:

The result should look something like this:

A GIF showing a finished query

In this example, columns from two different tables were selected and displayed together. But as you’ve probably figured out, left joins can be combined with all the other functions you’ve now learned to use in SQL.

For example: Say you want to know your revenue for every month. The sales numbers are stored in the Order line table, while the information about when the order was received is in the Order table. You need to join these two tables to get this view, then sum the quantity times the price, and finally group by the year and month of when the order was received. You can also order it by year and month. Here’s what the query looks like:

If you try on your own, the resulting table should look like this:

resulting table

Exercise 11: Revenue per order

Say you want a table showing when each order was received and the revenue for each order. This would require joining the order line table with the order table, and then selecting the order id, the date when the order was received, and the sum of each quantity times the price. Remember that each order can have several order lines, so you’ll need to sum these up to get this for each order. Try to write a query that returns these columns in a table.

If you run your query, it should give a result like this:

Result of the exercise

In SQL there are often many means to an end, so you’ll usually be able to get the required result by writing different variations of the same query. A simple way to solve the exercise above would be to write a query like this:

Next up:
Combining CTEs with JOIN