In this article, we will learn about the basics of SQL (SELECT, FROM & WHERE) with exciting visuals, and the bonus is my personal story of learning SQL. I tell you, it is no less than Moneyball. You all must have used Excel in your life to do something, and in the article, we will connect your Excel experience to help you understand how SQL functions. Let’s start with the story, and you will learn SQL Basics.
I was working in customer support for a company, let’s call it ‘Big Name,’ and I was as frustrated as one could be when some toddler sits next to you in a theater, and it is a Tom Hanks movie that you have been planning for quite a long time.
I hope you can anticipate why people get irritated because of a call center job. I have been thinking of switching jobs for a long time but didn’t have enough skills to pursue my thoughts.
I was going through LinkedIn one day and discovered that one of my friends, Vijay, is working for a company called CricDope Analytics.
I messaged him, and we had a crisp conversation where he told me they were looking for multiple interns because they had won a contract with a big cricket analytics company. I asked him if I could do it, and he agreed to help.
All I knew at that time was some Excel basics and nothing else. Vijay assured me they to help.
I got this internship just because of Vijay. I resigned from ‘Big Name’ and joined CricDope.
I was assigned to figure out some interesting facts and data about the IPL 2020 Season. They needed it to write one blog: ‘ Why is IPL 2020 the most interesting season so far?’
So, I didn’t have any idea how to go about it! Vijay came to the rescue. He shared the dataset of IPL 2020 with me in an Excel sheet, and he told me it is already available on the SQL server.
And he shared the plan ‘how I do it?’. It sounded simple. He said he would share questions with me, and I had to get those data from Excel, so I could try and learn SQL and compare to check. Simple, right?
What is the first thing you do if you have to read the data in an Excel file? Open the folder (i.e., IPL) where the Excel file resides and double-click the file (IPL2020) to open, right?
That’s the exact step we do when we have to read the data of a table from the Database.
USE IPL; — Opens the folder
SELECT * FROM IPL2020; — Read the Excel file
USE IPL means to open the database IPL where all tables from IPL2008 to IPL2022 are present. It is the folder where all tables are organized.
Don’t focus on SELECT and *; we will learn those in the next part. Here ‘FROM IPL2020’ means we are asking to open the IPL2020 table from all existing tables in our database.
What happens the moment the file opens (after double-clicking)? You see all the data(all columns and rows) on our screen, right?
SELECT * exactly does that. It SELECTs all (that’s what * means) the column to show in the output.
The SELECT statement is used to see columns from the table. This is the formal definition that you can find all over the internet.
This is the first thing I learned on my first day at CricDope. How to open the data?
What does this query do?
What if I don’t want to see all the columns but only want to see selected columns like “Match Date” and “Venue”?
SELECT match_date, venue FROM IPL2020;
SQL Engine fetches this IPL2020 table in this query and then prints columns “match_date” and “venue” because we have specified that we want to see data from two columns for our use.
SELECT with Aggregation Function
As the year passes, IPL is getting very competitive. Every team is trying to break the existing records. In the IPL 2020 season, it would be interesting to find the highest score of the innings is?
That means the maximum score by any team in an inning is 228 in the IPL2020 season.
Wait, we can see here the column name is “MAX(team_score),” which does not look very good. Well, we can rename column names if required. Here is how we do it.
You can see the result column is renamed to ‘season_topscore.’ This is how easily we can put an alias.
Writing Formula in SELECT
Suppose we have to find a run rate for this top score. In this case, we can use simple mathematical operations as well.
11.40 is the run rate of the match, with the highest score inning in IPL2020. 🙂
Easy, like writing a sentence in English, right?
I learned how to fetch the data with SQL queries on my first day. On the second day, Vijay asked me to learn about WHERE. And he also mentioned I would be able to solve most of the questions on the list after this.
Let’s dive into how I learned WHERE and solved the questions without asking Vijay for help.
Let’s now understand filtering data with WHERE. In Excel, we click on the filter icon and put conditions and apply. In SQL, we use and put the conditions.
Let me share an interesting incident that happened on my second day. I was talking to a guy in the office about the Ashes and how England had beaten Australia on a thrilling last day of the test, and Vijay saw me. He thought I was taking this casually and I wouldn’t be able to finish it by the next day.
He messaged me and asked…
Please share the match number from the table where Virat Kohli won the Player of the Match award.
And I didn’t know how to use WHERE because I didn’t learn it the previous day and went home after I learned SELECT and FROM.
I saw the end of my chance to work in the world of data. And suddenly realized I could use the IPL2020 Excel sheet and answer him. I took the chances. Here is what I did in Excel.
So I opened this IPL2020 file in excel, and in the column ‘player_of_match,’ I clicked on Virat Kohli and applied a filter. Here is what I got-
‘Match 25’, I replied to the message, and he acknowledged the hard work which I was yet to do. 🙂
WHERE is used in the same way to filter data. I implemented this with SQL. Let me show you. I will continue using the following SQL code every time as it reads the data using FROM and show all (*) the columns. Remember?
SELECT * FROM IPL2020;
Now filter the “Virat Kohli” as a player of the match. Here is how I did the same thing on SQL:
SELECT match_number, player_of_match FROM IPL2020 WHERE player_of_match = 'V Kohli'
Here it is, ‘the Match 25’!
As soon as I saw this, my chest got puffed up. I was able to do it on the very first attempt. It is a simple English statement. Instead of *, We have used two-column names because we don’t need to see other columns to solve this; we need Virat Kohli and the Match number.
Order of Execution-
In the previous question, I wanted to filter the Player_of_match column with the value = V Kohli.
So this is what I did-
I was ready to answer the questions required for the blog ‘Why is IPL 2020 the most interesting season so far?’ Because Vijay told me, ‘Once you know WHERE you will be able to solve these questions!’
Here is the list of questions he shared with me-
IPL2020 has many tie matches; find out precisely how many were tied.
“Trivia: Before this season, a maximum of 2 matches had been tied in a season.”
I solved these questions in Excel and SQL to cross-verify. In Excel, I filtered the outcome column and selected ties like this-
You could see four tied matches (match numbers- 2,35,36,10). This is the maximum number of ties in any season of IPL.
SELECT * FROM IPL2020 WHERE outcome = ‘tie’;
Both outcomes were similar. Yaay! The first question cracked!
In this outcome, columns are two values, ‘tie’ and NULL. That means either a match was tied, or if there was a final result, it is NULL.
“A NULL value in a relational DB is used when the value in a column is either “unknown” or “missing.” A null is neither an empty string(i.e. “”) nor a zero value (i.e., 0).”
So I figured the other way of doing it-
SELECT * FROM IPL2020 WHERE Outcome IS NOT NULL ;
When we use NULL or NOT NULL to compare, we do not use the = operator. We use NULL or IS NOT NULL.
So opposite of this query will be (if we have to find matches that didn’t end tie)-
SELECT * FROM IPL2020 WHERE Outcome is NULL ;
How many matches were won with at least eight wickets in hand (won by at least eight wickets)?
I opened Excel sheets and found two columns that can be useful in this case. The first is winner_type (wickets or runs), and the second is win_by (it contains the margin of win, wickets, or runs). I applied filters in both columns- winner_type (selected wickets) and win_by (more than or equals eight wickets). Have a look here-
So there were 13 matches( match no 15,18,31,32,39,40,41,44,45,46,51,53 & 56) which resulted in at least 8-wicket wins.
Now, we have to use the same math in SQL.
SELECT match_number, win_by, winner_type FROM IPL2020 WHERE winner_type = 'wickets' AND win_by >= 8;
We have done the same things in the SQL query as in WHERE. We have applied two conditions and joined both with an “AND.” So the SQL Engine will look for ‘wickets’ in the winner_type column, and if it is satisfied, it will show records with more than or equal to 8 wickets, i.e., >=8 (same results can be attained with win_by >7).
After solving this, I read this boring post on operators and extracted some exciting outcomes about operators that I want to share with you.
Find the teams that won the toss, elected to field, and won the match.
Trivia: As per the General trends of previous seasons, teams that chase the score have slightly more chances of winning than others. With this, we will see the number against 60 matches.
I solved this in Excel first.
- I applied filter in toss_decision = ‘field’
- In winner_type = ‘wickets’ (because the team that chases the score will win by wickets :)).
The answer here is 14 matches.
Now I had to find the SQL way-
SELECT * FROM IPL2020 WHERE toss_decision = ‘field’ AND toss_winner = winner;
SELECT * FROM IPL2020 WHERE toss_decision = 'field' AND toss_winner = winner;
Look at this query; you will be amazed at how SQL can pull off complex operations. In Excel, we had to filter on win_type = ‘wickets’ while SQL can compare the toss_winner column to the winner column and extract results in a fraction of a second.
It seems simple, right? We have to check if toss_winner and winner are the same.
Look at what this SQL query does.
It compares the toss_winner and winner columns and shows the rows where both are the same. :))
After solving this, I was amazed at what I was dealing with. :)) also, I had this sense of happiness about leaving the ‘Big Name’ to pursue my passion for working as a Data Analyst/Data Scientist; I don’t know why! 🙂
I had to find teams that ‘won the toss and elected to field’ and won the match.
Find the teams that scored over 200 or under 60.
Trivia- To analyze the dominant performances and chances that could result in NRR (Net Run Rate) boosts for teams.
I did it in Excel first. Here is how-
We have put a filter on the team_score column where we asked to show values >=200 OR <= 60.
Here is the SQL way-
SELECT * FROM IPL2020 WHERE team_score >= 200 OR team_score <= 60 ;
And both results were the same.
My takeaway after this was; that I understood the difference between AND and OR very clearly and for the first time in my life.
OR is used when we want results that satisfy any condition, like in this question, we want results where the score is less than 60 OR more than 200. It is generally applied to a single column. In our case, we have applied conditions on the same column, i.e., match_score.
AND is used where strictly both conditions have to be true. It is generally applied to multiple columns. Like in the above question: (toss_decision = ‘field’ AND toss_winner = winner;) we applied on two columns.
Team scores above 200 or below 60. Who fielded first(chased)?
Trivia- To check how many huge totals chases (above 200) and the fastest chases of the season (chasing below 60).
Let’s try this the Excel way.
I applied a filter on team_score(>=200 or <=60) and on toss_winner = ‘field’. It seems easy on Excel.
Let’s check the SQL way-
SELECT * FROM IPL2020 WHERE (team_score >= 200 AND toss_decision = 'field') OR (team_score <= 60 AND toss_decision = 'field') ;
It also prints the same results. :))
Find out the matches where “V Kohli, KL Rahul, and AB de Villiers” were the players of the match.
I had this confidence in writing basic SQL queries for this question, so I wanted to try some exciting approaches.
Just common approach would be;
SELECT * FROM IPL2020 WHERE Player_of_match = ‘KL Rahul’ OR Player_of_match = ‘V Kohli’ OR Player_of_match = ‘AB DE Villiers’;
This gives the match details for this question.
But what if you have to find it for 15 players instead of these 3? An interesting approach is here with IN operator-
SELECT * FROM IPL2020 WHERE Player_of_match IN (‘KL Rahul’, ‘V Kohli’, ‘AB DE Villiers’);
This will make your work easier. The IN operator checks if any values mentioned IN the brackets match the column values.
You can specify multiple values in the IN operator, just like we did here. It is a shortcut to save yourself from multiple OR conditions. 🙂
We can also use NOTIN to perform the opposite operation of IN. So if we have to list out players of matches except ‘KL Rahul,’ ‘V Kohli’, and ‘AB DE Villiers, we will use the same query with NOT IN except for IN.
So if we have to find matches where ‘KL Rahul’, ‘V Kohli’, and ‘AB DE Villiers’ were not player_of_match? This is how we can do it.
SELECT * FROM IPL2020 WHERE Player_of_match <> 'KL Rahul' or Player_of_match <> 'V Kohli' or Player_of_match <> 'AB de Villiers’ ;
Or, this way-
<> and != both denotes not equals.
The limitation with these is if we have to use 100 player names, we would not be able to do it till our time to go home, but with NOT IN is cool and faster-
Look at the same query with NOT IN-
This is how I completed my first assignment with CricDope and gained the confidence to march into more advanced things. After I solved all these questions and shared data on each one with Vijay, I was more than relieved and confident. To describe it in a close approach.
I felt like this! 😂