Photo by Campaign Creators on Unsplash
How to - Data Analysis With SQL
A continuation on our earlier SQL tutorial while linking it with Data Analysis
Are you a Data Enthusiast? Do you want to learn more about Data Analysis & SQL?
Then you have come to the right place! In this blog you will be introduced to practical Data Analysis using SQL, and we will get to dive deeper into SQL and continue our previous SQL tutorial, If you haven't already then please go check How I Met Your Database -& Mainly Relational Using SQL-
Northwind Database
In this blog, we will be working with Northwind Database, which is a sample database created by Microsoft to practice fictional data in a company environment.
The database consists of the following tables:
Suppliers: Suppliers and vendors of Northwind
Customers: Customers who buy products from Northwind
Employees: Employee details of Northwind traders
Products: Product information
Shippers: The names and contact information for the shippers who transport the goods from the traders to the end-users.
Orders and Order_Details: Customers' and the companies' sales order interactions.
To have a broader idea of the database structure, we can check the below Entity Relationship Diagram -ERD-:
*Image credits goes to: yugabyte *
The Power Of SQL Queries
You now have a database that is completely ready for you to explore, so let's start by utilizing SQL and asking ourselves, what insights can we gain from our Database ?
Highest Performing Employees
Let's say that we measure our performance goals solely on the volume of orders, regardless of their monetary value.
We can use our SQL knowledge to know who the highest performing employees based on number of orders.
Based on the ERD above, the reference that links - Foreign Key - between the Orders & the Employees Tables is the Employee ID.
Foreign Keys
A FOREIGN KEY
is a field (or set of fields) in one table that points to a PRIMARY KEY
in another table.
The child table is the table with the foreign key, and the referred or parent table is the table with the primary key.
SQL Query
The Query to get the volume of orders and the employee responsible looks like this:
SELECT E.EmployeeID As Employee_ID,
COUNT(O.EmployeeID) AS Order_Count,
CONCAT(E.FirstName, ' ' ,E.LastName) As Full_Name
FROM Orders as O
inner join Employees as E on O.EmployeeID = E.EmployeeID
group by E.EmployeeID, E.FirstName, E.LastName;
Lets dissect the query and understand what each part does, starting with GROUP BY
GROUP BY
The GROUP BY
statement is used to groups rows with the same values into summary rows.
To group the result set by one or more columns, the GROUP BY
statement is frequently used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()).
In the above query however we can see that we grouped by The Employee's ID, First Name & Last Name.
Aliases
SQL aliases are used to give a table, or a column in a table, a temporary name, they are often used to make column names more readable.
An alias only exists for the duration of that query and is created with the AS
keyword.
Inner Join
The INNER JOIN
keyword picks records in both tables that have the same value.
You can think about Inner Join
in the term of the intersection between two tables
If the columns in both tables match, the INNER JOIN keyword selects all rows from both tables. These orders will not be displayed if there are records in the "Orders" table that do not have matches in the "Employees" table!
There are a lot of table join types, we will discuss them as the need arises but you can check the below Venn Diagram as it will provide you with a brief explanation for the use-case of each join type:
Query Output
Our executed query will yield the following result -as shown below- and this provided us with insight on which employees created the most orders for the customers.
Employee_ID Order_Count Full_Name
1 123 Nancy Davolio
2 96 Andrew Fuller
3 127 Janet Leverling
4 156 Margaret Peacock
5 42 Steven Buchanan
6 67 Michael Suyama
7 72 Robert King
8 104 Laura Callahan
9 43 Anne Dodsworth
But this is still missing something fundamental, did you notice it?
If your answer was their order then you are absolutely correct!, our query didn't order the elements based on volume of orders, it is still missing the ORDER BY
clause.
ORDER BY
As the name suggests, if we want to sort the result set in ascending or descending order, use the ORDER BY keyword.
By default, the ORDER BY keyword organizes the records in ascending order. Use the DESC keyword to sort the records in descending order.
For example, In our query, we should add the order by
clause after the group by
like this:
ORDER BY COUNT(O.EmployeeID) DESC;
OR
-- Since we defined Order_Count as an Alias for COUNT(O.EmployeeID)
-- we can use them interchangeably
order by Order_Count desc;
And that yeilds the result set organized as below:
Employee_ID Order_Count Full_Name
4 156 Margaret Peacock
3 127 Janet Leverling
1 123 Nancy Davolio
8 104 Laura Callahan
2 96 Andrew Fuller
7 72 Robert King
6 67 Michael Suyama
9 43 Anne Dodsworth
5 42 Steven Buchanan
How have discounts affected the number of items sold?
It is a known fact that we all LOVE sales, no matter what the sale is on, we lways manage to find something that we like and buy something new or an even larger quantity of something that we use on a regular basis.
But, this is not a scientific way to prove it, so how can we ?
We can start by getting the number of items sold with a sale and those sold without while comparing them with the number of orders made.
Let's see how we can achieve that with SQL.
SQL Queries
We will be keeping the Queries simple and using only the basic syntax, but I strongly advice you to start reading about Left Join
and joins in general and to start experimenting by yourself on various databases and datasets like The Famous Titanic Dataset on Kaggle
The below queries check for the total number of items purchased and the number of orders made.
-- For Items With Discounts:
SELECT SUM(OD.Quantity) AS Total_Items_Sold, Count(OD.OrderID)
FROM [Order Details] as OD
WHERE OD.Discount > 0;
-- For Items Without Discounts:
SELECT SUM(OD.Quantity) AS Total_Items_Sold, Count(OD.OrderID)
FROM [Order Details] as OD
WHERE OD.Discount = 0;
For the Results:
Items With Discounts: A total of
22718
items were sold in838
orders.Total_Items_Sold Number_Of_Orders 22718 838
Items Without Discounts: A total of
28599
items in1317
ordersTotal_Items_Sold Number_Of_Orders 28599 1317
If we calculate the average number of items per order, We will find that clearly the number of items purchased increases when a sale is present.
What discount percentage should the company offer?
Now that we have seen that discounts affect the volume of our sales, we have to determine the best percentage that will help us maximize our profit & market share.
Let's check the increments of 5% discounts in our database and see what our data is trying to tell us.
SQL QUERY
SELECT SUM(OD.Quantity) AS Total_Items_Sold, Count(OD.OrderID)AS Number_Of_Orders, OD.Discount AS Discount_Percentage
FROM [Order Details] as OD
WHERE OD.Discount = 0.05 or OD.Discount =0.1 or OD.Discount =0.2 or OD.Discount =0.25
GROUP BY OD.Discount
ORDER BY OD.Discount DESC;
This shows the following results, which surprisingly show a very little change in the order sizes and numbers is an indicator for the Northwind company that if the offer a discount as low as 5% , they will notice an increase in sales with no need to decrease their revenue any further.
Total_Items_Sold Number_Of_Orders Discount_Percentage
4349 154 0.25
4351 161 0.2
4366 173 0.1
5182 185 0.05
We can take this even a step further and see the effect of discounts on different product categories or If discounts affect certain regions more than others, but I will leave that for you to answer and test your skills.
This brings us to the end of today's blog about using SQL for Data Analysis, In the upcoming blog we will be getting to know Python and later on we will create a mini project with full code examples to handle the Northwind Database and other various datasets.
Hope you enjoyed reading this blog as much as I enjoyed writing it. If you have any comments or suggestions, please feel free to reach out ❤