SQL tutorial •
Aggregating Data


In the last section you learned how to make aggregations across the entire table, which enables you to get insights about the whole data sample. So you've now gone from looking at all rows in a table to being able to summarize all of these into one row. But sometimes there will be situations where you'll need to summarize the results on a level in between looking at all the rows in a table and only looking at one. For example, you might want to see the sum of your sales for each month, the sales for each order, or the average sales for each customer.

To do this, you want to use the aggregate functions on certain defined groupings. You can use the <span class="code">GROUP BY</span> function to do this. For example, to find how many orders each customer has made:

See the query here:

This is how the <span class="code">GROUP BY</span> function works. You write your aggregate function as you would otherwise, and then add the column(s) you want your resulting table to be grouped by. For example, if you’d like to count how many orders you get per year, a query to investigate this could look like this:

As you see, you can use the count of the id column in the <span class="code">Order</span> table to get the number of orders in the same way as you did before, but on top of this, you also need to create a new column to extract only the year from the <span class="code">received_at</span> date column. After the <span class="code">FROM</span> statement, you also add a <span class="code">GROUP BY</span> statement to define how you’d like to put your aggregations into groups. In this case, you’d like these aggregations to be grouped by the new column you just made, <span class="code">received_at_year</span>. If you try it on your own, the result would look like this:

As you see, the result looks a little weird with a lot of orders in 2021 and few in 2020 and 2022. You can take a deeper look by adding another level of granularity:

This time, add one more column extracting the month from the <span class="code">received_at</span> date column, and group by this column. The result will look like this:

Now the 12th month in 2020 and the 1st month in 2022 are the only months included, so the result seems to make sense. And this is how the <span class="code">GROUP BY</span> function works — defining which column(s) you’d like your aggregations to be grouped by.

The result of the grouped count of orders is a bit messy, and it would be nice to have the results ordered chronologically. You can use the <span class="code">ORDER BY</span> function to define how you'd like the results ordered. This function is written similarly to the <span class="code">GROUP BY</span> function and is simply added after the <span class="code">GROUP BY</span> function. For example:

To order the results by both year and month, add both to the <span class="code">ORDER BY</span> statement. If you type this in the Weld Editor on your own, your result should look like this:

You can define whether the <span class="code">ORDER BY</span> function should order the value in ascending or descending order by adding <span class="code">ASC</span> or <span class="code">DESC</span> to the end of the statement. If you want to reverse the order in the query above, you could change the <span class="code">ORDER BY</span> statement to <span class="code">recived_at_year desc, recived_at_month desc</span>.

The <span class="code">ORDER BY</span> function statement not only works for aggregated data, it can also define the order of all rows in a table. If you want to play around with this, you can go back to some of the queries you wrote earlier.

Exercise 9: Find the total revenue for each order, and order it from largest to smallest

As noted at the end of the last exercise, finding the highest value of the Order lines doesn’t make much sense, since each order can have several lines. Now with the <span class="code">GROUP BY</span> function in your toolbox, you can instead try to find the largest order on your webshop. All the data you’ll need is in your <span class="code">order_line</span> table. As each order has a unique order id that’s included in the <span class="code">order_line</span> table, you can group it by these values. Order your table from the largest to the smallest revenue per order.

If you run your query, it should give a result like this:

As you can see, the order with the highest value is the order with 48vZj27vQz as the id.

In SQL there are often many means to an end, so you’ll usually be able to get the required result by writing different variations of the same query. A simple way to solve the exercise above would be to write a query like this:

You might remember from looking at the raw data in this table that the orders are actually in different currencies. To convert the values into one currency, you’d need to add another step in your analysis. This way, you could find the real revenue for the e-commerce business per month. To do this, you’d need to know the conversion rates between all the currencies in the table. From there, you could calculate it in the editor. If you want to get a deeper look into the sales in the sample e-commerce data, feel free to push yourself with this exercise.

Next up:
Avoid failing queries with aggregate functions and GROUP BY