DEV Community

roadpilot
roadpilot

Posted on

SQL: Find the sum using two tables (Join and Subquery)

The problem statement:
"Given the CITY and COUNTRY tables, query the sum of the populations of all cities where the CONTINENT is 'Asia'.

Note: CITY.CountryCode and COUNTRY.Code are matching key columns."

The CITY table is a table of cities with a "POPULATION" field. The field is numeric so we can use native "sum" to add up the populations of the cities that meet the criteria. The CITY table uses "CountryCode" as it's foreign key to create the relationship to the COUNTRY table.

The COUNTRY table is a table of countries and with a "Code" primary key and a "CONTINENT" which has a value for each continent in which that the country is found. The "Code" field is the object of the CITY table foreign key. The COUNTRY table also has "POPULATION" field.

For our purposes, we want to first find the the countries from the COUNTRY table where 'Asia' is the continent:

select code from country where continent='Asia'
Enter fullscreen mode Exit fullscreen mode

This gives us the "Code" of all countries that are in Asia. We have fulfilled our criteria. But this is in the "Country" table, and we need to filter the "City" table to get the sum of the populations.

Now that we know the Country "Code" recordset, we can filter the CITY table by the cities that have the "CountryCode" from our previous query. We will use our previous query as a subquery for this part of the filtering:

select population from city 
where countrycode in 
(select code from country where continent='Asia')
Enter fullscreen mode Exit fullscreen mode

This will now filter our City table and return the populations for only the cities that are in Asia.

The final step is to add the "sum" function:

select sum(population) from city 
where countrycode in 
(select code from country where continent='Asia')
Enter fullscreen mode Exit fullscreen mode

The sum function takes the result of the filtering criteria and returns a number that is the sum of all of the field values, in this all (POPULATION) values that meet the criteria (continent='Asia')

This was the subquery method of solving this problem. We can also create the relation between the two tables by using a JOIN statement. The JOIN statement's main purpose is to combine rows from one or more tables based on a match condition. The combined row set is then available by the select statement for use to display, filter, or group by the columns. There was a little hint in the problem statement:

Note: CITY.CountryCode and COUNTRY.Code are matching key columns.
Enter fullscreen mode Exit fullscreen mode

We used this same information to form the relationship with our subquery but now we're going to use it in a JOIN statement.

select sum(CITY.population) from city 
inner join country 
on CITY.CountryCode=COUNTRY.Code 
where continent='Asia'
Enter fullscreen mode Exit fullscreen mode

Notice we had to specify CITY.population because when you join two or more tables, there may be duplication of field names and if you don't specify, the query engine will not know which "population" field you are referring to (it will give you an error on execution).

We use our select statement to query the sum of the population field from the CITY table when joined to the COUNTRY table. We match the CITY.CountryCode value to the COUNTRY.Code value to create the relationship (JOIN the tables). Then we use the 'continent' field to filter to the 'Asia' continents. Notice we don't have to specify 'COUNTRY.contentent' - that's because there is no "continent" field in the CITY table. Only when there are duplicated field names do you have to specify the "table.fieldname" structure. There may be times when you want to, to prevent confusion, but you don't have to.

In this particular scenario, using a subquery probably doesn't make much difference from from using a JOIN statement, but in a larger scale database or a more complex query where every subquery has to be evaluated, it may make a difference in processing time. Some things to consider when choosing one over the other. Subqueries are nice because they can stand alone and you can test as you go, but you might end up taking more of a processing hit than if you'd used a JOIN.

Top comments (0)