Movatterモバイル変換


[0]ホーム

URL:


You don’t need a vector database - just use Postgres for everything. Read the case study on switching from Pinecone to Neon
PostgreSQL Tutorial
PostgreSQL Tutorial

How To Delete Duplicate Rows in PostgreSQL

Summary: in this tutorial, you will learn how to use various techniques to delete duplicate rows in PostgreSQL.

Preparing sample data

First,create a new table namedbasket that stores fruits:

CREATE TABLE basket(    idSERIAL PRIMARY KEY,    fruitVARCHAR(50)NOT NULL);

Second,insert some fruits into thebasket table.

INSERT INTO basket(fruit)VALUES  ('apple'),  ('apple'),  ('orange'),  ('orange'),  ('orange'),  ('banana');

Third,query data from thebasket table:

SELECT    id,    fruitFROM    basket;

Output:

id | fruit----+--------  1 | apple  2 | apple  3 | orange  4 | orange  5 | orange  6 | banana(6 rows)

The output indicates some duplicate rows such as 2 apples and 3 oranges in thebasket table.

Finding duplicate rows

If the table has few rows, you can easily see which ones are duplicates immediately. However, this is not the case with a table that has lots of rows.

The find the duplicate rows, you use the following statement:

SELECT    fruit,    COUNT( fruit )FROM    basketGROUP BY    fruitHAVING    COUNT( fruit )> 1ORDER BY    fruit;

Output:

fruit  | count--------+------- apple  |     2 orange |     3(2 rows)

Deleting duplicate rows using DELETE USING statement

The following statement uses theDELETE USING statement to remove duplicate rows:

DELETE FROM    basket a        USING basket bWHERE    a.id < b.id    AND a.fruit = b.fruit;

In this example, we joined thebasket table to itself and checked if two different rows (a.id < b.id) have the same value in thefruit column.

The following query retrieves data from thebasket table to verify the duplication removal:

SELECTid,fruitFROMbasket;

Output:

id | fruit----+--------  2 | apple  5 | orange  6 | banana(3 rows)

The output indicates that the statement removes the duplicate rows with the lowest IDs and keeps the one with the highest id.

If you want to keep the duplicate rows with the lowest id, you need to flip the operator in theWHERE clause:

DELETE  FROM    basket a        USING basket bWHERE    a.id > b.id    AND a.fruit = b.fruit;

To check whether the statement works correctly, let’s verify the data in thebasket table:

SELECT    id,    fruitFROM    basket;

Output:

id | fruit----+--------  1 | apple  3 | orange  6 | banana(3 rows)

The output indicates that duplicate rows with the lowest ids are retained.

Deleting duplicate rows using subquery

The following statement uses a subquery to delete duplicate rows and keep the row with the lowest id.

DELETE FROM basketWHERE id IN    (SELECT id    FROM        (SELECT id,         ROW_NUMBER() OVER(PARTITION BY fruit        ORDER BY  id ) AS row_num        FROM basket )t        WHERE t.row_num > 1 );

In this example, the subquery returned the duplicate rows except for the first row in the duplicate group. The outerDELETE statement deleted the duplicate rows returned by the subquery.

If you want to keep the duplicate row with the highest ID, just change the order in the subquery:

DELETE FROM basketWHERE idIN    (SELECT id    FROM        (SELECT id,         ROW_NUMBER()OVER(PARTITION BY fruit        ORDER BY  idDESC )AS row_num        FROM basket ) t        WHERE t.row_num> 1 );

In case you want to delete duplicates based on values of multiple columns, here is the query template:

DELETE FROM table_nameWHERE idIN    (SELECT id    FROM        (SELECT id,         ROW_NUMBER()OVER(PARTITION BY column_1,         column_2        ORDER BY  id )AS row_num        FROM table_name ) t        WHERE t.row_num> 1 );

In this case, the statement will delete all rows with duplicate values in thecolumn_1 andcolumn_2 columns.

Deleting duplicate rows using an immediate table

To delete rows using an immediate table, you use the following steps:

  1. Create a new table with the same structure as the one whose duplicate rows should be removed.
  2. Insertdistinct rows from the source table to the immediate table.
  3. Drop the source table.
  4. Rename the immediate table to the name of the source table.

The following illustrates the steps for removing duplicate rows from thebasket table:

-- step 1CREATE TABLE basket_temp (LIKE basket);-- step 2INSERT INTO basket_temp(fruit, id)SELECT    DISTINCT ON (fruit) fruit,    idFROM basket;-- step 3DROP TABLE basket;-- step 4ALTER TABLE basket_tempRENAMETO basket;

In this tutorial, you have learned how to delete duplicate rows in PostgreSQL using theDELETE USING statement, subquery, and the immediate table techniques.

Last updated on

Was this page helpful?
Thank you for your feedback!

[8]ページ先頭

©2009-2025 Movatter.jp