top of page

Product Managers Get Ready for Advance SQL - Part 3

This is part 3 of my post on learning SQL. I wrote about how to set up Postgres and PgAdmin in my post 1 and covered a few basic SQL queries in post 2. In this post, I want to discuss the aggregate functions, use of GROUP BY, JOINS ( I know...the most dreadful topic), and UNIONs


Aggregate Functions


The main role of aggregate functions is to take multiple inputs and return a single output. Here are the most common aggregate functions.

  • AVG ()- returns average value

  • COUNT()- returns the number of values

  • MAX ()- returns the maximum value

  • MIN ()- returns the minimum value

  • SUM ()- returns the sum of all values

The aggregate functions are used in the SELECT clause. Here are a few examples:


1) Find the average, minimum, and maximum rental duration from the film table


SELECT AVG (rental_duration), MIN (rental_duration), MAX (rental_duration) FROM film


2) Find the total amount from the payment table

SELECT SUM (amount) FROM payment


Few key things to note are that aggregate functions return only 1 result and hence we cannot combine them to be displayed with other columns unless we are using GROUP BY .


GROUP BY and HAVING


This is a very powerful function that product managers used extensively. What we do basically is to use an aggregate function on a category. For example, you want to know who are your top paying customers. If you think about this, you will need to do a SUM on the amount paid for all the transactions per customer and then sort them in descending order. Let's look at it by an example. Let's explore the payment table.



If you will notice, you will see the amount column and customer id column in this table. In order to find the top-paying customer, we need the SUM of the amount on all transactions per customer.


SELECT customer_id, SUM(amount) FROM payment

GROUP BY customer_id

ORDER BY SUM(amount) DESC


You will notice that customer_id '148' is our highest paying customer. We can additionally use LIMIT to identify the top 3 paying customers. We can also use COUNT instead of SUM to find out the total number of transactions per customer. We can use these results to identify customer segments for various promotions and whatnot. If you want to use the

filter on aggregate function like only display customers whose SUM >100, we can use HAVING


SELECT customer_id, SUM(amount) FROM payment

GROUP BY customer_id

HAVING SUM(amount) >100

ORDER BY SUM(amount) DESC




JOINS


Until now, all the queries that I have written are executed on a single table. But more than often, we find ourselves running queries that will require us to connect two or more tables. And that's where the JOINS come into play. I think this concept will be best understood with the help of the Venn diagram.



Consider there are 2 tables, Table A and Table B. Now, depending on our requirements we may find ourselves in one of 4 situations

  1. Common records between 2 tables - Can be retrieved using Inner Join

  2. Records that are unique to both Table A and Table B. Can be retrieved using FULL OUTER Join

  3. Records that are only exclusive to Table A- Can be retrieved using LEFT OUTER Join

  4. Records that are only exclusive to Table B- Can be retrieved using RIGHT OUTER Join

In order to join the tables, we need Table B to have the primary key of Table A present as a foreign key ( Remember Primary Key and Foreign Keys from post 1 of this series).


Let's look at each join and their syntaxes


Inner Join- The Syntax of the query is


SELECT * FROM Table A

INNER JOIN Table B

ON Table A.col_match = Table B.col_match


Here is an example. Let's fetch payment_id, customer_id and first_name by joining payment and customer table


SELECT payment_id, payment.customer_id,customer.first_name FROM payment

INNER JOIN customer ON

payment.customer_id = customer.customer_id




FULL OUTER Join- The Syntax of the query is


SELECT * FROM Table A

FULL OUTER JOIN Table B

ON Table A.col_match = Table B.col_match

WHERE Table A.id IS NULL OR Table B.ID IS NULL


Here is an example

SELECT * FROM payment

FULL OUTER JOIN customer ON

payment.customer_id = customer.customer_id

WHERE payment.payment_id IS NULL OR customer.customer_id IS NULL




LEFT OUTER Join


The Syntax of the query is


SELECT * FROM Table A

LEFT JOIN Table B

ON Table A.col_match = Table B.col_match

WHERE Table B.ID IS NULL


Here is an example to find out films that are not in inventory


SELECT film.film_id, title, inventory_id

FROM film

LEFT JOIN inventory ON

inventory.film_id= film.film_id

WHERE inventory.film_id IS NULL




RIGHT OUTER Join


This is similar to LEFT JOIN except that the tables are changed


The Syntax of the query is


SELECT * FROM Table A

RIGHT JOIN Table B

ON Table A.col_match = Table B.col_match

WHERE Table A.ID IS NULL


Here is an example


SELECT film.film_id, title, inventory_id

FROM film

RIGHT JOIN inventory ON

inventory.film_id= film.film_id

WHERE film.film_id IS NULL


UNION


The UNION operator is used to combine the result-set of two or more SELECT statements.


The Syntax of the query is


SELECT column_name(s) FROM Table A

UNION

SELECT column_name(s) FROM Table B


That's all I wanted to cover in this post. There is so much more in SQL like the use of various date operators, operators for conditions such as CASE, COALESCE, operator for converting data type such as CAST, etc but the ones I mentioned in my 3 posts will give us sufficient knowledge to run the basic queries all by ourselves. In my next posts, I want to cover some basic DevOps concepts like CI/CD pipelines, Jenkins, Dockers, and Ansible.


Stay Tuned !!

85 views0 comments

Recent Posts

See All
bottom of page