When retrieving data from a database table, you may encounter duplicate rows or values in the result set. To eliminate these duplicates, you can use the DISTINCT
keyword immediately after the SELECT
keyword, as shown below:
The DISTINCT
clause is used to remove duplicate rows from the result set:
column_list
FROM table_name
;Here, column_list is a comma-separated list of column or field names from a database table (e.g., name, age, country, etc.) whose values you wish to retrieve.
Note: The DISTINCT
clause operates similarly to the UNIQUE
constraint, except in its treatment of null values. Two NULL
values are considered unique, though they are not considered distinct from each other.
Let's explore some examples to see how it functions in practice.
Imagine we have a customers table in our database with the following records:
+---------+--------------------+-----------+-------------+ | cust_id | cust_name | city | postal_code | +---------+--------------------+-----------+-------------+ | 1 | Maria Anders | Berlin | 12209 | | 2 | Fran Wilson | Madrid | 28023 | | 3 | Dominique Perrier | Paris | 75016 | | 4 | Martin Blank | Turin | 10100 | | 5 | Thomas Hardy | Portland | 97219 | | 6 | Christina Aguilera | Madrid | 28001 | +---------+--------------------+-----------+-------------+
Now, execute the following query to retrieve all the rows from the city column of this table:
SELECT city FROM customers;
After executing the query, the output will look something like this:
+-----------+ | city | +-----------+ | Berlin | | Madrid | | Paris | | Turin | | Portland | | Madrid | +-----------+
Upon examining the output, you'll notice that the city "Madrid" appears twice in our result set, which is undesirable. Let's address this issue.
The following query utilizes DISTINCT
to produce a list of unique cities from the customers table.
SELECT DISTINCT city FROM customers;
After executing the above query, the output will resemble this:
+-----------+ | city | +-----------+ | Berlin | | Madrid | | Paris | | Turin | | Portland | +-----------+
As you can see, this time there are no duplicate values in our result set.
Note: When using the SELECT DISTINCT
statement on a column with multiple NULL values, SQL retains one NULL
value and eliminates the others from the result set. This is because DISTINCT
considers all NULL values as identical.