Here’s an example of how the aggregate functions work. Say you want to have a quick understanding of how many orders there have been using the sample e-commerce data. As each order has a unique id, you can simply count the id’s column in the table, like this:
If you try on your own, you’ll see that the number of orders is 500. This counts all orders placed in the store. Orders can have different statuses, for example, Order Received, Shipped, or Fulfilled. You might be more interested to see how many orders have been fulfilled. The <span class="code">COUNT</span> function counts all the entries in the column it’s asked to count. As you might have seen in the order table, it has columns with date stamps for when the order changes status — and until an entry is printed in this cell, it simply says NULL:
To see how many orders have been fulfilled, you could therefore count how many entries there are in this column, which would give you the answer to how many cells are filled with a date, and disregard the cells filled with NULL, like this:
As you see, the answer to this is 170.
Pro Tip: The <span class="code">COUNT</span> function will be among your most used aggregate functions. In general, your go-to functions will most likely be:
- <span class="code">COUNT</span> Counting all entries in the column specified
- <span class="code">SUM</span> Calculating the sum of the entries in a column
- <span class="code">AVG</span> Calculating the average of the entries in a column
- <span class="code">MAX</span> Calculating the maximum of the entries in a column
- <span class="code">MIN</span> Calculating the minimum of the entries in a column
The aggregate functions work as you would intuitively expect when working with numbers, but they can also find a logic to give an answer to other types of data. For example, you could find the earliest date of an order in your table like this:
Or, you could find the first person if you ranked all your customers alphabetically by their first name (in case you for some reason want to know that…) like this:
Looking up the minimum value of people’s first names rarely adds value, but this showcases how the aggregate function can work on values other than numerical ones if some kind of logic can be applied to it.
Exercise 8: Find the highest value in the order_line table
Say you want to understand what the highest value is in the lines in your order_line table (different from the customer table you just worked with). Try to find the highest value of quantity multiplied by price in the <span class="code">order_line</span> table by using the <span class="code">MAX</span> function. If you run your query, it should give a result like this:
In SQL there are often many means to an end, so you’ll usually be able to get the required result by writing different variations of the same query. A simple way to solve the exercise above would be to write a query like this:
As you’ve probably noticed, there are typically several lines for each order. So, the small analysis you did in this exercise to find the highest value of the order lines would rarely bring any interesting insights for your business. It would be much more interesting, for example, to find the highest value per order instead of per order line. This is what will be covered in the next section.