How to use this guide
This document should be as a rough guide to SQL. Let’s start with some concepts and build from there…..
What is SQL and why is it useful for me?
SQL is a Structured Query Language and will enable you to ask a Database the questions that you want to know and exclude other information.
SQL is officially amazing!
Don’t be put off by the way SQL looks. It’s quite easy to get to grips with and is very, very powerful in our business and will be a useful skill to learn in any industry where big data is used.
What can SQL do?
- SQL can return large data sets very quickly (useful for management and long range reports)
- Allows you to interrogate (ask questions) to a Database directly
- There is no limit to the questions you can ask of the Database using SQL.
WHAT IS A DATABASE?
A Database is a collection of information that is organised so that it can be easily accessed, managed and updated.
Traditionally, companies had all their data in one large table.
This is problematic when trying to update parts of the database without affecting the whole. For example, using the traditional method of Database management, if we want to change details on the way we categorize Publishers, then it would mean a change to the whole Database.
A solution to this is to create a Relational Database.
A Relational Database is a collection of data items organized as a set of tables from which data can be accessed or reassembled in many different ways without having to modify the entire Database.
Tables are made up by rows and columns. The columns are linked together by shared values:
SQL works within a Relational Database.
The function of SQL is to SELECT columns of data, then specify which tables the data is FROM.
Once you have an initial table that you have chosen to SELECT, then you can JOIN other tables ON unique values.
SQL will then allow you to specify commands to help determine WHERE a range of data comes from (for example, between two dates) and then allow you to GROUP them effectively.
Most businesses are built on a series of
If you understand how a relationsional Database works, it will
massively increase your understanding of data sets and how to structure reports.
Understanding what information is in what table is key to getting the most out of SQL.
Tables are organized in Columns and Rows.
Columns contain types of information, such as Network name, Network ID, Account Manager etc.
Rows contain values that relate to those Columns. For example, Mobusi (Network name) 145 (Network ID) Zach Measures (Account Manager).
Each row has a primary key. A primary key is a special relational Database table column (or combination of columns) designated to uniquely identify all table records.
A primary key's main features are:
- It must contain a unique value for each row of data.
- It cannot contain null values.
Primary keys cannot be deleted and should be used later when you come to JOIN tables.
WHAT DO THE TABLES LOOK LIKE AND HOW MANY ARE THERE?
One way to see the tables as a starting point is to log into SQL and use the Content tab.
The tables can be accessed by typing in the table name.
Some tables like ‘revenue’ or ‘rollup_offer_link’ are vast, with lots of data (by lots, we mean over 315 MIL rows on some tables). Some tables like ‘office’ are tiny (3 rows in the table)
To see what information is on what table, you can search in the filter on the table. Remember, that you will only be able to search on an exact match in this view of SQL.
To see which columns are in the selected table use the drop down here:
Example with drop down displayed
As you can see, some tables contain more information than others. In order to link different tables to encompass all the data we want, we need to JOIN one table to another.
The JOIN function is critical in SQL. Tables link to each other through shared values (like publisher, network or offer ID). The below shows some common ways to link to tables.
WRITING A QUERY- BASIC COMMANDS
I have found the following mnemonic useful for remembering the order of commands. “San Francisco Joggers Open Windows Gently” (SFJOWG).
BASIC RULES OF SQL
Always SELECT data from the largest table you can. This will help when you JOIN other tables.
JOIN ON on unique IDs if possible. Dates, names, descriptions, numeric values (like revenue) are not good values to JOIN on.
Try to JOIN as few tables as possible
- MAKE SURE THE QUERY IS RIGHT!! SQL IS ONLY AS GOOD AS THE DATA PUT IN!
- Always ensure you write your commands in the correct format. It will make it easier for you to troubleshoot your own queries.
on the largest data sets
- Test on small date ranges first
- Make sure you name the tables correctly
- Make sure you use the right syntax for , ; and so on
Be very careful running queries on key
- Make sure that you use the SUM() function if you want to add up rather than have the Database return a huge output of data.
- Use ‘AND revenue > 0’ to avoid running reports which will carry through all revenue records (including those with 0 revenue)
APPENDIX - USEFUL RESOURCES AND QUERIES