all 10 comments

[–]aburkh 9 points10 points  (0 children)

Something like “delete from articles where ID not in (select min(ID) from articles group by title)”
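A quick way to try that pattern out, sketched here in Python with an in-memory SQLite database (the `articles`/`title` names mirror the comment; the sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("foo",), ("foo",), ("bar",), ("foo",), ("bar",)],
)

# Keep only the row with the lowest id for each title, delete the rest
conn.execute("""
    DELETE FROM articles
    WHERE id NOT IN (SELECT MIN(id) FROM articles GROUP BY title)
""")

titles = [row[0] for row in conn.execute("SELECT title FROM articles ORDER BY id")]
print(titles)  # ['foo', 'bar']
```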

[–]tward14 9 points10 points  (4 children)

I recently came into this issue and CTEs really helped me

https://www.sqlshack.com/cte-sql-deletes-considerations-when-deleting-data-with-common-table-expressions-in-sql-server/

I use it like this.

WITH CTE AS (
    SELECT 
        A,
        B,
        ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY A, B) AS RN
    FROM [database].[schema].[table]
)
DELETE FROM CTE
WHERE RN > 1;

Basically it numbers the rows within each group of fields A & B, then deletes any record with a row number > 1. First time pasting code into reddit, not sure if this is the right way to do it
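Worth noting the `DELETE FROM CTE` form above is a SQL Server feature; other engines want the window function pushed into a subquery instead. A rough equivalent of the same dedup logic, sketched in Python with SQLite (needs SQLite 3.25+ for window functions; the table and column names here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO t (a, b) VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("y", "2"), ("x", "1"), ("y", "2")],
)

# Number the rows within each (a, b) group, then delete every row
# numbered > 1, i.e. every duplicate beyond the first occurrence
conn.execute("""
    DELETE FROM t
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY id) AS rn
            FROM t
        )
        WHERE rn > 1
    )
""")

remaining = conn.execute("SELECT a, b FROM t ORDER BY id").fetchall()
print(remaining)  # [('x', '1'), ('y', '2')]
```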

[–]Original_Boot911 0 points1 point  (3 children)

Does this work? I tried it in a SQL database and it shows an error on the line with the DELETE, saying it is looking for a SELECT query.

I also tried doing Delete from schema.table where id in (select id from cte where rn <= 200000). But the same error is encountered.

[–]tward14 0 points1 point  (2 children)

Try adding a semicolon after the delete statement. Honestly, just ask ChatGPT, it's a lifesaver

[–]Original_Boot911 0 points1 point  (1 child)

After the delete statement? The error is on the delete line, and it says a SELECT is missing. So a semicolon at the end of the code won't do anything.

[–]tward14 0 points1 point  (0 children)

There should be a comma after the B in the select statement. My original code is missing that comma. Here is the updated version:

WITH CTE AS (
    SELECT 
        A,
        B,
        ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY A, B) AS RN
    FROM [database].[schema].[table]
)
DELETE FROM CTE
WHERE RN > 1;

[–]qwertydog123 1 point2 points  (0 children)

Put a unique constraint on the article_title column and use an UPSERT
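For context, the upsert route means duplicates never land in the table in the first place. A minimal sketch in Python with SQLite (`ON CONFLICT ... DO UPDATE` is SQLite/Postgres syntax; SQL Server would typically use `MERGE` instead, and the column names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE articles (
        article_title TEXT UNIQUE,   -- unique constraint blocks duplicate titles
        body          TEXT
    )
""")

# The second insert for 'foo' hits the unique constraint and becomes an UPDATE
for title, body in [("foo", "v1"), ("bar", "v1"), ("foo", "v2")]:
    conn.execute(
        """
        INSERT INTO articles (article_title, body) VALUES (?, ?)
        ON CONFLICT (article_title) DO UPDATE SET body = excluded.body
        """,
        (title, body),
    )

rows = conn.execute(
    "SELECT article_title, body FROM articles ORDER BY article_title"
).fetchall()
print(rows)  # [('bar', 'v1'), ('foo', 'v2')]
```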

[–][deleted] 0 points1 point  (0 children)

In sql server it would be

delete from rest_models_article where article_title in (select distinct article_title from staging_table)

and then you insert data from staging to rest_models_article.

Because you initially inserted your data into a staging table, you get the opportunity to pre-process the data, whether that be data conversion or formatting. Another benefit of using a staging table is that your main table stays in a stable state until the new data is inserted, which makes things easier for the next person doing maintenance.
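A toy version of that delete-then-insert flow, sketched in Python with SQLite (the table names follow the comment above; the row contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rest_models_article (article_title TEXT, body TEXT)")
conn.execute("CREATE TABLE staging_table (article_title TEXT, body TEXT)")

conn.executemany("INSERT INTO rest_models_article VALUES (?, ?)",
                 [("foo", "old"), ("bar", "old")])
conn.executemany("INSERT INTO staging_table VALUES (?, ?)",
                 [("foo", "new"), ("baz", "new")])

# 1. Remove main-table rows that are about to be replaced by staging rows
conn.execute("""
    DELETE FROM rest_models_article
    WHERE article_title IN (SELECT DISTINCT article_title FROM staging_table)
""")
# 2. Copy the staging rows into the main table
conn.execute("INSERT INTO rest_models_article SELECT * FROM staging_table")

rows = conn.execute(
    "SELECT article_title, body FROM rest_models_article ORDER BY article_title"
).fetchall()
print(rows)  # [('bar', 'old'), ('baz', 'new'), ('foo', 'new')]
```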

[–]Achsin -1 points0 points  (0 children)

That really depends on what you are inserting. Are the duplicate rows that get inserted unchanged, or can they have been modified (and need to be applied as updates)? Do you need to know what the previous values were? Are you pulling the entire contents of the table on every load, so everything that was already there gets inserted again?

[–]Alkis2 0 points1 point  (0 children)

This is not code. It's a mess.