SQL: Joining and Grouping

18th September 2019

With so much choice out there, it can be easy to concentrate on emerging technologies rather than the basics. But from time-time, going over the basics can be beneficial to get the best performance. It's a term used a lot in sport, and it's no different with Software Development.

In this article, I'm going to go over some basic SQL statements, in particular, joining tables together. The examples will be written for SQL Server, but should also work for MySQL.

The Example

The example below shows part of an eCommerce database. There are two tables. A Product table and a OrderProduct table. The Product table has a one-to-many relationship with the OrderProduct Table.

Product

ProductIdNamePrice
1T-Shirt10.00
2Shoes25.00
3Hat4.00
4Tie2.00
5Jumper20.00

OrderProduct

OrderProductIdProductIdNamePrice
11T-Shirt10.00
21T-Shirt10.00
31T-Shirt10.00
42Shoes25.00
52Shoes25.00
65Jumper20.00

Joining

Both tables join each other by using the ProductId. The ProductId on the Product table joins the OrderProduct table by the exact same key.

Inner Join

Inner join allows you to return the results of both tables, as long as the joining column exists in both table records. If the joining column record does not exist on either table record, it will not be returned.

In the example below, we are inner joining both Product and OrderProduct together and returning the columns from the Product table in the SELECT statement. Because ProductId 3 and 4 do not exist in the OrderProduct table, they do not get returned.

SELECT Product.ProductId, Product.[Name], Product.Price FROM dbo.Product
INNER JOIN dbo.OrderProduct ON Product.ProductId=OrderProduct.ProductId
ORDER BY Product.ProductId
ProductIdNamePrice
1T-Shirt10.00
1T-Shirt10.00
1T-Shirt10.00
2Shoes25.00
2Shoes25.00
5Jumper20.00

But wait, this is returning duplicate records, how do we only return unique records from the Product table? By using GROUP BY. This allows us to group the results by particular columns. The example below shows us grouping the results by ProductId and Name.

SELECT Product.ProductId, Product.[Name] FROM dbo.Product
INNER JOIN dbo.OrderProduct ON Product.ProductId=OrderProduct.ProductId
GROUP BY Product.ProductId, Product.[Name]
ORDER BY Product.ProductId
ProductIdName
1T-Shirt
2Shoes
5Jumper

Now, we can amend this further by working how much was spent for each product by using the SUM function. As you can probably guess, the SUM function adds up all the columns and returns the total. We can also use the COUNT function to work out the quantity for each product. You can only use these functions when grouping data together.

SELECT Product.ProductId, Product.[Name], COUNT(OrderProduct.OrderProductId) AS Quantity, SUM(OrderProduct.Price) AS TotalPrice FROM dbo.Product
INNER JOIN dbo.OrderProduct ON Product.ProductId=OrderProduct.ProductId
GROUP BY Product.ProductId, Product.[Name]
ORDER BY Product.ProductId
ProductIdNameQuantityTotalPrice
1T-Shirt330.00
2Shoes250.00
5Jumper150.00

Left Join and Right Join

Left join (or left outer join) allows you to join two tables, but the initial table will return all records, regardless of whether the key exists in the joining table.

SELECT Product.ProductId, Product.[Name] FROM dbo.Product
LEFT JOIN dbo.OrderProduct ON Product.ProductId=OrderProduct.ProductId
GROUP BY Product.ProductId, Product.[Name]
ORDER BY Product.ProductId
ProductIdName
1T-Shirt
2Shoes
3Hat
4Tie
5Jumper

Right join does the exact opposite. It will return all results from the joining table, even if the join doesn't exist in the initial table. But it won't return results from the initial table if it does not exist in the joining table.

SELECT Product.ProductId, Product.[Name] FROM dbo.Product
RIGHT JOIN dbo.OrderProduct ON Product.ProductId=OrderProduct.ProductId
GROUP BY Product.ProductId, Product.[Name]
ORDER BY Product.ProductId
ProductIdName
1T-Shirt
2Shoes
5Jumper

Again, with left and right joins, you can take advantage of grouping to get the totals in the exact same way that you would do it for inner joins.

Cross Join

Cross join is an interesting one. For each record in the initial table, it will return all the results in the joining table. This leads to a large number of results being returned.

As I don't want a large number of results returned, I've put in a where clause that only returns a Product if the ID is 1 or 2.

SELECT Product.ProductId, Product.[Name] FROM dbo.Product
CROSS JOIN dbo.OrderProduct
WHERE Product.ProductId IN (1,2)
ORDER BY Product.ProductId
ProductIdName
1T-Shirt
1 T-Shirt
1 T-Shirt
1 T-Shirt
1 T-Shirt
1 T-Shirt
2Shoes
2Shoes
2Shoes
2Shoes
2Shoes
2Shoes

I haven't used cross joins on too many occasions in my career. The only real example that I can think off where it could be beneficial is if you were creating a sports league and you wanted to output all the fixtures. If you had a Team table, you could cross join it with the same table and exclude records where both ID's are the same.

Conclusion

It's worth noting that joins can be expensive to the performance of your SQL Server. In our example above, we are only working with a very limited number of tables and records. But the more tables you join and the more records you have within those tables, the longer it will return the results.

When planning out your database architecture, always have this in your mind!

About the author

David Grace

David Grace

Senior .NET web developer | ASP.NET Core | C# | Software developer

Free .NET videos

  • Do you want to watch free videos featuring .NET 7 new features?
  • How about what's new in C# 11?
  • Or a recap on the SOLID principles?
Watch our .NET videos