Pages

Tuesday, December 25, 2007

Finding Duplicates

In many situations, users need to have the ability to determine when duplicate data exists in the database. You can use a GROUP BY and COUNT to see how many duplicates exist in each group. However, in most instances, users want to see the primary key identifier of the duplicate rows so that you can display the data or delete the duplicates. You cannot add the primary key to a query that uses a GROUP BY because this would cause all counts to equal one since the primary key by definition is unique.

To get a query that shows duplicates along with the primary key, you must use an "In" clause in the WHERE clause.

Using Northwind:

SELECT
Employees.LastName, Employees.FirstName, Employees.Address, Employees.EmployeeID, Employees.Title

FROM
Employees

WHERE
(
(
(Employees.LastName) In
(
SELECT [LastName]
FROM [Employees] As Tmp
GROUP BY [LastName],[FirstName],[Address]
HAVING Count(*)>1 And [FirstName] = [Employees].[FirstName] And [Address] = [Employees].[Address]
)
)
)

ORDER BY
Employees.LastName, Employees.FirstName, Employees.Address


This query will return the first name, last name, address, employeeid and title of all rows where the last name, first name, and address are duplicated.

No comments: