SQL Distinct

Retrieving Distinct Values

When retrieving data from a database table, you may encounter duplicate rows or values in the result set. To eliminate these duplicates, you can use the DISTINCT keyword immediately after the SELECT keyword, as shown below:

Syntax

The DISTINCT clause is used to remove duplicate rows from the result set:

SELECT DISTINCT column_list FROM table_name;

Here, column_list is a comma-separated list of column or field names from a database table (e.g., name, age, country, etc.) whose values you wish to retrieve.

Note: The DISTINCT clause operates similarly to the UNIQUE constraint, except in its treatment of null values. Two NULL values are considered unique, though they are not considered distinct from each other.

Let's explore some examples to see how it functions in practice.

Imagine we have a customers table in our database with the following records:

+---------+--------------------+-----------+-------------+
| cust_id | cust_name          | city      | postal_code |
+---------+--------------------+-----------+-------------+
|       1 | Maria Anders       | Berlin    | 12209       |
|       2 | Fran Wilson        | Madrid    | 28023       |
|       3 | Dominique Perrier  | Paris     | 75016       |
|       4 | Martin Blank       | Turin     | 10100       |
|       5 | Thomas Hardy       | Portland  | 97219       |
|       6 | Christina Aguilera | Madrid    | 28001       |
+---------+--------------------+-----------+-------------+

Now, execute the following query to retrieve all the rows from the city column of this table:

SELECT city FROM customers;

After executing the query, the output will look something like this:

+-----------+
| city      |
+-----------+
| Berlin    |
| Madrid    |
| Paris     |
| Turin     |
| Portland  |
| Madrid    |
+-----------+

Upon examining the output, you'll notice that the city "Madrid" appears twice in our result set, which is undesirable. Let's address this issue.

Removing Duplicate Data

The following query utilizes DISTINCT to produce a list of unique cities from the customers table.

SELECT DISTINCT city FROM customers;

After executing the above query, the output will resemble this:

+-----------+
| city      |
+-----------+
| Berlin    |
| Madrid    |
| Paris     |
| Turin     |
| Portland  |
+-----------+

As you can see, this time there are no duplicate values in our result set.

Note: When using the SELECT DISTINCT statement on a column with multiple NULL values, SQL retains one NULL value and eliminates the others from the result set. This is because DISTINCT considers all NULL values as identical.