SQL tutorial •
Tips & Tricks

LIMITing the amount of data with SQL

SQL can handle very large amounts of data. But if you’re working with very large data sets, this requires a lot of computing power, which can take up resources both in terms of time and money. So it’s a good idea to limit the amount of data you work with in SQL. Especially when you’re developing your models, it’s often enough to work with a smaller sample of the full data set, and then you can apply the final model on the full table. You can use the <span class="code">LIMIT</span> function for this. The order line table contains 1000 rows. If you wanted to develop a model for this table by only experimenting with the first 5 rows, it could be done like this:

This returns this table:

A Table showing the LIMIT function

The order in which you place your ‘limit’ statement in your query is important. For example, if you want to add an ‘order by’ statement to the query above and order the query as:

  1. <span class = "code">SELECT</span>
  2. <span class = "code">FROM</span>
  3. <span class = "code">LIMIT</span>
  4. <span class = "code">ORDER BY</span>

The result would be to first select the 5 top rows in the raw table, and then order those 5 rows. If you change the order of the statements to:

  1. <span class = "code">SELECT</span>
  2. <span class = "code">FROM</span>
  3. <span class = "code">ORDER BY</span>
  4. <span class = "code">LIMIT</span>

The resulting table would instead be the 5 top rows after you’ve ordered them. You can try this on your own.

Next up
Formatting best practices