This blog is gonna be Legen-WAIT FOR IT-dary! or at least I hope so. As you may have guessed I am a huge HIMYM fan.
In this blog we will give an introduction on SQL, the basic syntax, the joins and everything you need to start your way in handling Data from relational databases in general.
Different types of Databases
We deal with databases on an almost daily basis, even without computers. Every time we access our bank accounts, add a friend on any social media application or buy anything online really.
In 1970, Edgar F. Codd of IBM published an academic paper titled, A Relational Model of Data for Large Shared Banks. The paper introduced an innovative way to model data. It paved the way of building a group of cross-linked tables that would allow you to store data just once. Introducing the capability to find answers to any data that we need so long as the answer was stored somewhere in it.
In the 1980s and ’90s, relational databases skyrocketed; Making the ability to query and index data extremely efficient. Table joins and Transactions were introduced.
There are various types of databases, that include but are not limited to:
The sort of database that stores data in the database system using an object-based data paradigm. The information is represented and saved as objects, which are analogous to the objects used in object-oriented programming.
Object databases are frequently employed in applications that call for high performance, calculations, and quick results. Real-time systems, architectural & engineering for 3D modelling, telecommunications, and scientific products, molecular science, and astronomy are some of the common applications that use object databases.
The database is normally the one that follows the network data model. The data is represented in this manner as a network of nodes connected by links. It allows each record to have several children and parent nodes, forming a flexible network structure.
A database that stores information in a virtual environment and runs on a cloud computing platform. It offers users access to the database through a variety of cloud - based services. There are numerous cloud platforms available, however the below are the best:
- Amazon Web Services(AWS)
- Microsoft Azure
- Google Cloud SQL
Non-SQL databases are a form of database that can store a wide range of data sets. It is not a relational database because it stores data in a variety of formats, not only tabular. It was created in response to a rise in the need for modern applications. As a result, in response to the demands, NoSQL introduced a wide range of database systems. A NoSQL database can be further classified into the following four types:
Key-value storage: It is the most basic sort of database storage, in which each object is stored as a key (or attribute name) that holds its value.
Document-oriented Database: A database that stores data in the form of a JSON-like document. It facilitates data storage for developers by using the same document-model format as the application code.
Graph Databases: It's a graph-like structure for storing large volumes of data. The graph database is most typically used by social networking websites.
Wide-column stores: It's akin to how data is stored in relational databases. Instead of storing data in rows, data is stored in huge columns.
This database uses the relational data model, which stores data in the form of rows (tuples) and columns (attributes), which are combined to make a table (relation). SQL is used to store, manipulate, and maintain data in a relational database. Each table in the database has a key that distinguishes the data from that of other tables.
We will focus mainly in this blog on Relational databases and how to manage them using SQL.
What is SQL?
SQL stands for Structured Query Language, It is the way by which you can manipulate the database and perform various operations like Inserting, updating or deleting records and creating tables.
SQL became an ANSI standard -American National Standards Institute- in 1986, and of ISO -International Organization for Standardization- in 1987. However, there are different versions of the SQL language, they all support the major commands (such as
WHERE). You don't have to worry if all of that sounds gibberish to you right now as we will discuss each of these terms and more in details.
Relational Databases Properties
The ACID properties are the four most well-known properties of a relational model, and they are as follows:
A means Atomicity: This assures that the data operation will complete successfully or unsuccessfully. It employs an all-or-nothing approach. A transaction, for example, will either be committed or aborted.
C means Consistency: If we conduct any operation on the data, its value should be retained both before and after the operation. For example, the account balance should be correct before and after the transaction, i.e., it should be conserved.
I means Isolation: There can be multiple concurrent users accessing data from the database at the same time. As a result, data isolation should be maintained. When many transactions occur at the same time, for example, one transaction's effects should not be apparent to the database's other transactions.
D means Durability: It ensures that data modifications are permanent after the process is completed and the data is committed.
Most common SQL Commands
You will work with these commands in an almost daily basis, as they are essential in working with any query:
CREATE DATABASE: creates a new database
ALTER DATABASE: modifies a database
CREATE TABLE: creates a new table
ALTER TABLE: modifies a table
DROP TABLE: deletes a table
SELECT: extracts data from a database
UPDATE: updates data in a database
DELETE: deletes data from a database
INSERT INTO: inserts new data into a database
Note: SQL keywords are not case sensitive:
select is the same as
SELECT and can be used interchangeably.
One or more tables are commonly seen in a database. A name is assigned to each table (e.g. "Customers" or "Purchases"). Tables include data records (rows).
We'll utilize our HIMYM knowledge and create a really simple database in this blog. The following is the script used to create the database, table and populating the table with some sample data:
IF NOT EXISTS(SELECT * FROM sys.databases WHERE name = 'HIMYM') BEGIN CREATE DATABASE HIMYM; END; use HIMYM; create table Characters( id INTEGER PRIMARY KEY IDENTITY (1, 1), CharacterName NVARCHAR(32) NOT NULL, Age FLOAT, Occupancy NVARCHAR(32) ); INSERT INTO Characters(CharacterName,Age,Occupancy) VALUES ('Ted Mosby', 27, 'Architect'), ('Marshall Eriksen', 27, 'Law-Student'), ('Barney Stinson', 29, 'PLEASE'), ('Lily Aldrin', 27, 'Pre-School Teacher'), ('Robin Scherbatsky', 25, 'News Reporter');
We performed a lot of operations in the script above , so let's provide a brief step-by-step explanation:
- Checking if the database exists and if it does not, then we create it.
- We then utilize the
USEkeyword to specify the database that we will use in the rest of the script to avoid any ambiguity.
- Creating our Characters table with the specified columns (id, Character Name, Age & Occupation)
- Populating the Characters table with some sample data.
We will dive deeper into
IDENTITY & more SQL features and keywords in further blogs; so you do not have to worry about them right now if you do not understand them.
if we run the query
SELECT * FROM Characters, we will retrieve the following data:
id CharacterName Age Occupancy 1 Ted Mosby 27 Architect 2 Marshall Eriksen 27 Law-Student 3 Barney Stinson 29 PLEASE 4 Lily Aldrin 27 Pre-School Teacher 5 Robin Scherbatsky 25 News Reporter
We have five records, each row corresponds to a unique character with each column holding all information linked with a certain field.
Let's breakdown this simple select query structure and understand what each keyword does; the general structure of a query contains the table name, selected columns to retrieve and the condition to retrieve by.
SELECT column_name(s) FROM table_name WHERE condition;
In our example we did not have a
WHERE condition but we will dig deeper on when to use it later on.
SELECT: used to retrieve data.
*: a wildcard expression used to indicate that we want to retrieve all the columns in our query.
From table_name: the name of the table from which we want to retrieve our data.
Phew! that was a lot of information if you are still new to the world of databases & SQL, that's it for this blog and In the upcoming blog posts, we will start going deeper into SQL and provide an introduction to programming with python as part of our coding & data analysis journey!
Hope you enjoyed reading this blog as much as I enjoyed writing it. If you have any comments or suggestions please feel free to reach out ❤