How can I delete duplicate rows when no unique row id exists?
My table is:
col1 col2 col3 col4 col5 col6 col7
john 1 1 1 1 1 1
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
sally 2 2 2 2 2 2
I want to be left with the following after the duplicates are removed:
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
I have tried a few queries, but I think they depend on having a row id, because I don't get the desired result. For example:
DELETE
FROM table
WHERE col1 IN (
SELECT id
FROM table
GROUP BY id
HAVING (COUNT(col1) > 1)
)
I like CTEs and ROW_NUMBER, as the two combined let us see which rows are deleted (or updated); to preview, just change DELETE FROM CTE... to SELECT * FROM CTE:
WITH CTE AS(
SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1
DEMO (the result is different; I assume that is due to a typo on your part)
COL1 COL2 COL3 COL4 COL5 COL6 COL7
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
Because of the PARTITION BY col1, this example determines duplicates by the single column col1. If you want to include multiple columns, simply add them to the PARTITION BY:
ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
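To make the effect of a multi-column PARTITION BY concrete, here is a minimal runnable sketch. It is an illustration under assumptions, not the answer's T-SQL: it uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, the three-column Table1 and its rows are made up, and because SQLite cannot delete through a CTE the delete goes through SQLite's implicit rowid (which SQL Server does not have).

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the table and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (col1 TEXT, col2 INT, col3 INT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [("john", 1, 1), ("john", 1, 1), ("john", 1, 2), ("sally", 2, 2), ("sally", 2, 2)],
)

# Number the rows inside each (col1, col2, col3) group; RN > 1 marks a duplicate.
NUMBERED = """
    SELECT rowid AS rid, col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY rowid) AS RN
    FROM Table1
"""

# Preview first: these are the rows the DELETE below will remove.
to_delete = conn.execute(
    f"SELECT col1, col2, col3 FROM ({NUMBERED}) WHERE RN > 1 ORDER BY col1"
).fetchall()

# SQLite-specific workaround: delete by rowid instead of DELETE FROM CTE.
conn.execute(f"DELETE FROM Table1 WHERE rowid IN (SELECT rid FROM ({NUMBERED}) WHERE RN > 1)")
remaining = conn.execute("SELECT col1, col2, col3 FROM Table1 ORDER BY col1, col3").fetchall()
```

The row ('john', 1, 2) survives because it differs in col3; partitioning by col1 alone would have treated it as a duplicate of the other john rows.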
I prefer a CTE for deleting duplicate rows from a SQL Server table.
I strongly recommend following this article: http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/
Keeping the original:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)
DELETE FROM CTE WHERE RN<>1
Without keeping the original:
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
Without using a CTE and ROW_NUMBER(), you can delete the records just by using GROUP BY with the MAX function. Here is an example:
DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
Use MIN(ID) instead of MAX(ID) to keep the original record.
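If you have no references, such as foreign keys, you can do this. I do it a lot when testing proofs of concept and the test data gets duplicated.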
SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]
INTO [newTable]
FROM [oldTable]
Go into the Object Explorer and delete the old table.
Rename the new table with the old table's name.
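The copy-distinct-then-swap idea above can be sketched end to end. This is only an illustration under assumptions: SQLite via Python's sqlite3 stands in for SQL Server, so CREATE TABLE ... AS SELECT replaces SELECT ... INTO and the drop and rename are done in SQL rather than in the Object Explorer. The helper name rebuild_without_duplicates and the _dedup suffix are hypothetical.

```python
import sqlite3

def rebuild_without_duplicates(conn, table):
    """Copy the distinct rows into a new table, then swap it in for the old one."""
    tmp = f"{table}_dedup"  # hypothetical name for the intermediate table
    conn.execute(f'CREATE TABLE "{tmp}" AS SELECT DISTINCT * FROM "{table}"')
    conn.execute(f'DROP TABLE "{table}"')
    conn.execute(f'ALTER TABLE "{tmp}" RENAME TO "{table}"')

# Assumption: a small made-up table that mirrors the shape of the question's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oldTable (col1 TEXT, col2 INT)")
conn.executemany(
    "INSERT INTO oldTable VALUES (?, ?)",
    [("john", 1), ("john", 1), ("sally", 2), ("sally", 2)],
)
rebuild_without_duplicates(conn, "oldTable")
rows = conn.execute("SELECT col1, col2 FROM oldTable ORDER BY col1").fetchall()
```

In SQLite, a table created this way does not carry over the indexes or constraints of the original, which is one more reason this approach suits test data rather than tables that other objects reference.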
DELETE from search
where id not in (
select min(id) from search
group by url
having count(*)=1
union
SELECT min(id) FROM search
group by url
having count(*) > 1
)
This deletes all duplicates except the very first one (the one with the minimum ID).
It should work equally well in other SQL servers (like Postgres):
DELETE FROM table
WHERE id NOT IN (
select min(id) from table
group by col1, col2, col3, col4, col5, col6, col7
)
The id column must be meaningful for these rows.
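As a quick check of the MIN(id) pattern, here is a small runnable sketch. It assumes SQLite via Python's sqlite3 as the engine, and a made-up search table with an auto-assigned integer id and a url column; it is meant as an illustration of the query shape, not as a tuned implementation.

```python
import sqlite3

# Assumption: invented table and data; id is assigned automatically (1, 2, 3, 4).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO search (url) VALUES (?)",
    [("a.example",), ("b.example",), ("a.example",), ("a.example",)],
)

# Keep the lowest id of every url group and delete everything else.
cur = conn.execute(
    "DELETE FROM search WHERE id NOT IN (SELECT MIN(id) FROM search GROUP BY url)"
)
deleted = cur.rowcount
kept = conn.execute("SELECT id, url FROM search ORDER BY id").fetchall()
```

A single GROUP BY is enough here: groups that contain only one row contribute that row's id as their minimum, so unique rows are kept without a separate branch.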
There are two solutions in MySQL:
A) Delete duplicate rows using the DELETE JOIN statement
DELETE t1 FROM contacts t1
INNER JOIN contacts t2
WHERE
t1.id < t2.id AND
t1.email = t2.email;
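This query references the contacts table twice; therefore, it uses the table aliases t1 and t2.
The output is:
Query OK, 4 rows affected (0.10 sec)
If you want to delete duplicate rows and keep the lowest id, you can use the following statement: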
DELETE c1 FROM contacts c1
INNER JOIN contacts c2
WHERE
c1.id > c2.id AND
c1.email = c2.email;
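The self-join condition can be tried out without a MySQL server. The sketch below is an approximation under assumptions: SQLite (through Python's sqlite3) has no DELETE ... JOIN, so the same join is run as a subquery that lists the doomed ids, and the delete then uses IN. The contacts rows are invented.

```python
import sqlite3

# Assumption: invented rows; ids are assigned 1..5 in insertion order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO contacts (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",),
     ("b@example.com",), ("a@example.com",)],
)

# A row is a duplicate to delete if some other row has the same email and a lower id.
DOOMED = """
    SELECT DISTINCT c1.id
    FROM contacts c1
    INNER JOIN contacts c2 ON c1.email = c2.email AND c1.id > c2.id
"""
doomed_ids = sorted(r[0] for r in conn.execute(DOOMED))

# SQLite substitute for MySQL's DELETE c1 FROM ... INNER JOIN ...
conn.execute(f"DELETE FROM contacts WHERE id IN ({DOOMED})")
kept = conn.execute("SELECT id, email FROM contacts ORDER BY id").fetchall()
```

Flipping the comparison to c1.id < c2.id would keep the highest id of each email instead, as in the first DELETE JOIN statement.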
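B) Delete duplicate rows using an intermediate table
The following steps remove duplicate rows using an intermediate table:
1. Create a new table whose structure is the same as the original table from which you want to delete duplicate rows.
2. Insert the distinct rows from the original table into the intermediate table.
3. Drop the original table and rename the intermediate table to the original table's name.
Step 1. Create a new table whose structure is the same as the original table: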
CREATE TABLE source_copy LIKE source;
Step 2. Insert the distinct rows from the original table into the new table:
INSERT INTO source_copy
SELECT * FROM source
GROUP BY col; -- column that has duplicate values
Step 3. Drop the original table and rename the intermediate table to the original table's name:
DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;
Source: http://www.mysqltutorial.org/mysql-delete-duplicate-rows/
Please see the way of deleting below.
Declare @table table
(col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
Insert into @table values
('john',1,1,1,1,1,1),
('john',1,1,1,1,1,1),
('sally',2,2,2,2,2,2),
('sally',2,2,2,2,2,2)
I created a sample table named @table and loaded it with the given data.
https://i.stack.imgur.com/DsYzT.png
Delete aliasName from (
Select *,
ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
From @table) aliasName
Where rowNumber > 1
Select * from @table
https://i.stack.imgur.com/qjapy.png
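Note: If you list all the columns in the Partition by part, then the order by does not have much significance.
I know the question was asked three years ago, and my answer is another version of what Tim posted, but I am posting it just in case it is helpful to anyone.
It can be done in many ways in SQL Server. The simplest way is to insert the distinct rows from the table with duplicates into a new temporary table, then delete all the data from the table with duplicates, and then insert all the data from the temporary table, which has no duplicates, as shown below.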
select distinct * into #tmp From table
delete from table
insert into table
select * from #tmp drop table #tmp
select * from table
Delete duplicate rows using a Common Table Expression (CTE)
With CTE_Duplicates as
(select id,name , row_number()
over(partition by id,name order by id,name ) rownumber from table )
delete from CTE_Duplicates where rownumber!=1
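Microsoft has a very neat guide on how to remove duplicates. Check out http://support.microsoft.com/kb/139444
In brief, here is the easiest way to delete duplicates when you have just a few rows to delete: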
SET rowcount 1;
DELETE FROM t1 WHERE myprimarykey=1;
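myprimarykey is the identifier for the row.
I set rowcount to 1 because I only had two duplicated rows. If I had had 3 duplicated rows, I would have set rowcount to 2 so that it deletes the first two it sees and leaves only one in table t1.
Try using: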
SELECT linkorder
,Row_Number() OVER (
PARTITION BY linkorder ORDER BY linkorder DESC
) AS RowNum
FROM u_links
https://i.stack.imgur.com/pUtkX.png
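After trying the solutions suggested above, which work for small and medium tables, I can suggest this solution for very large tables, since it runs in iterations.
1. Drop all dependent views of LargeSourceTable. You can find the dependencies in SQL Management Studio: right-click the table and click "View Dependencies".
2. Rename the table:
sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp';
GO
3. Create LargeSourceTable again, but now add a primary key over all the columns that define a duplicate, and add WITH (IGNORE_DUP_KEY = ON). For example:
CREATE TABLE [dbo].[LargeSourceTable]
(
ID int IDENTITY(1,1),
[CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL,
[Column1] CHAR (36) NOT NULL,
[Column2] NVARCHAR (100) NOT NULL,
[Column3] CHAR (36) NOT NULL,
PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON)
);
GO
4. Re-create, for the newly created table, the views that you dropped in the first step.
5. Now run the following SQL script. You will see the results in pages of 1,000,000 rows; you can change the number of rows per page to see results more often. Note that I set IDENTITY_INSERT on and off because one of the columns contains an auto-increment id, which I am also copying.
SET IDENTITY_INSERT LargeSourceTable ON
DECLARE @PageNumber AS INT, @RowspPage AS INT
DECLARE @TotalRows AS INT
declare @dt varchar(19)
SET @PageNumber = 0
SET @RowspPage = 1000000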
select @TotalRows = count (*) from LargeSourceTable_TEMP
While ((@PageNumber - 1) * @RowspPage < @TotalRows )
Begin
begin transaction tran_inner
; with cte as
(
SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
OFFSET ((@PageNumber) * @RowspPage) ROWS
FETCH NEXT @RowspPage ROWS ONLY
)
INSERT INTO LargeSourceTable
(
ID
,[CreateDate]
,[Column1]
,[Column2]
,[Column3]
)
select
ID
,[CreateDate]
,[Column1]
,[Column2]
,[Column3]
from cte
commit transaction tran_inner
PRINT 'Page: ' + convert(varchar(10), @PageNumber)
PRINT 'Transfered: ' + convert(varchar(20), @PageNumber * @RowspPage)
PRINT 'Of: ' + convert(varchar(20), @TotalRows)
SELECT @dt = convert(varchar(19), getdate(), 121)
RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
SET @PageNumber = @PageNumber + 1
End
SET IDENTITY_INSERT LargeSourceTable OFF
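The paging idea can be tried at toy scale. The following sketch is an analogy under stated assumptions rather than the T-SQL above: it uses SQLite through Python's sqlite3, where INSERT OR IGNORE on a table with a composite primary key plays the role that IGNORE_DUP_KEY = ON plays in SQL Server, and LIMIT/OFFSET replaces OFFSET ... FETCH NEXT. The function name copy_in_pages, the reduced column list, and the five sample rows are invented.

```python
import sqlite3

def copy_in_pages(conn, page_size):
    """Copy LargeSourceTable_Temp into LargeSourceTable one page at a time."""
    total = conn.execute("SELECT COUNT(*) FROM LargeSourceTable_Temp").fetchone()[0]
    page = 0
    while page * page_size < total:
        # OR IGNORE skips rows whose (Column1, Column2) key is already present.
        conn.execute(
            """INSERT OR IGNORE INTO LargeSourceTable (ID, Column1, Column2)
               SELECT ID, Column1, Column2 FROM LargeSourceTable_Temp
               ORDER BY ID LIMIT ? OFFSET ?""",
            (page_size, page * page_size),
        )
        conn.commit()  # one transaction per page keeps each unit of work small
        page += 1
    return page

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LargeSourceTable_Temp (ID INTEGER, Column1 TEXT, Column2 TEXT)")
conn.execute(
    "CREATE TABLE LargeSourceTable (ID INTEGER, Column1 TEXT, Column2 TEXT, "
    "PRIMARY KEY (Column1, Column2))"
)
conn.executemany(
    "INSERT INTO LargeSourceTable_Temp VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"), (4, "a", "x"), (5, "b", "z")],
)
pages = copy_in_pages(conn, page_size=2)
keys = conn.execute(
    "SELECT Column1, Column2 FROM LargeSourceTable ORDER BY Column1, Column2"
).fetchall()
```

Because the key constraint does the de-duplication, duplicates that fall into different pages are still rejected, which is what makes the page-by-page copy safe.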
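To delete the duplicate rows from a table in SQL Server, follow these steps:
1. Find the duplicate rows using the GROUP BY clause or the ROW_NUMBER() function.
2. Use the DELETE statement to remove the duplicate rows.
Setting up a sample table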
DROP TABLE IF EXISTS contacts;
CREATE TABLE contacts(
contact_id INT IDENTITY(1,1) PRIMARY KEY,
first_name NVARCHAR(100) NOT NULL,
last_name NVARCHAR(100) NOT NULL,
email NVARCHAR(255) NOT NULL,
);
Insert values
INSERT INTO contacts
(first_name,last_name,email)
VALUES
('Syed','Abbas','syed.abbas@example.com'),
('Catherine','Abel','catherine.abel@example.com'),
('Kim','Abercrombie','kim.abercrombie@example.com'),
('Kim','Abercrombie','kim.abercrombie@example.com'),
('Kim','Abercrombie','kim.abercrombie@example.com'),
('Hazem','Abolrous','hazem.abolrous@example.com'),
('Hazem','Abolrous','hazem.abolrous@example.com'),
('Humberto','Acevedo','humberto.acevedo@example.com'),
('Humberto','Acevedo','humberto.acevedo@example.com'),
('Pilar','Ackerman','pilar.ackerman@example.com');
https://i.stack.imgur.com/N9pU1.png
Query
SELECT
contact_id,
first_name,
last_name,
email
FROM
contacts;
Delete duplicate rows from a table
WITH cte AS (
SELECT
contact_id,
first_name,
last_name,
email,
ROW_NUMBER() OVER (
PARTITION BY
first_name,
last_name,
email
ORDER BY
first_name,
last_name,
email
) row_num
FROM
contacts
)
DELETE FROM cte
WHERE row_num > 1;
The duplicate records should now be deleted.
https://i.stack.imgur.com/B0T6K.png
with myCTE
as
(
select productName,ROW_NUMBER() over(PARTITION BY productName order by slno) as Duplicate from productDetails
)
Delete from myCTE where Duplicate>1
-- this query will keep only one instance of a duplicate record.
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3-- based on what? --can be multiple columns
ORDER BY ( SELECT 0)) RN
FROM Mytable)
delete FROM cte
WHERE RN > 1
You need to group the duplicate records by the relevant field(s), then keep one of the records and delete the rest. For example:
DELETE prg.Person WHERE Id IN (
SELECT dublicateRow.Id FROM
(
select MIN(Id) MinId, NationalCode
from prg.Person group by NationalCode having count(NationalCode ) > 1
) GroupSelect
JOIN prg.Person dublicateRow ON dublicateRow.NationalCode = GroupSelect.NationalCode
WHERE dublicateRow.Id <> GroupSelect.MinId)
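Removing duplicates from a huge table (several million records) might take a long time. I suggest that you do a bulk insert of the selected rows into a temp table rather than deleting.
--REWRITING YOUR CODE (TAKE NOTE OF THE 3RD LINE)
WITH CTE AS (SELECT NAME, ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY NAME) ID FROM @TB)
SELECT * INTO #unique_records FROM CTE
WHERE ID = 1;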
This might help in your case:
DELETE t1 FROM table t1 INNER JOIN table t2 WHERE t1.id > t2.id AND t1.col1 = t2.col1
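The idea of deleting duplicates involves
a) protecting those rows that are not duplicates
b) retaining one of the many rows that together qualify as duplicates.
Step by step:
1) First identify the rows that satisfy the definition of a duplicate and insert them into a temp table, say #tableAll.
2) Select the non-duplicate (single) or distinct rows into a temp table, say #tableUnique.
3) Delete from the source table, joining #tableAll, to remove the duplicates.
4) Insert all the rows from #tableUnique into the source table.
5) Drop #tableAll and #tableUnique.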
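The five steps above might look like this in practice. This is one possible reading, sketched under assumptions: SQLite via Python's sqlite3 is used instead of SQL Server, so TEMP tables named tableAll and tableUnique replace #tableAll and #tableUnique, and the delete uses a row-value IN rather than a join. The src table and its rows are invented.

```python
import sqlite3

# Assumption: invented source table; 'tim' is the non-duplicate row to protect.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (name TEXT, val INT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [("john", 1), ("john", 1), ("sally", 2), ("sally", 2), ("tim", 3)],
)

conn.executescript("""
    -- 1) every row that meets the definition of a duplicate
    CREATE TEMP TABLE tableAll AS
        SELECT name, val FROM src
        WHERE (name, val) IN (
            SELECT name, val FROM src GROUP BY name, val HAVING COUNT(*) > 1);
    -- 2) one copy of each duplicated row
    CREATE TEMP TABLE tableUnique AS SELECT DISTINCT name, val FROM tableAll;
    -- 3) remove all copies of the duplicated rows from the source
    DELETE FROM src WHERE (name, val) IN (SELECT name, val FROM tableAll);
    -- 4) put a single copy of each back
    INSERT INTO src SELECT name, val FROM tableUnique;
    -- 5) clean up
    DROP TABLE tableAll;
    DROP TABLE tableUnique;
""")
rows = conn.execute("SELECT name, val FROM src ORDER BY name").fetchall()
```

Rows that were never duplicated are not touched at any point, which is the "protecting" part of the idea; only the duplicated groups are removed and re-inserted once.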
If you are able to add a column to the table temporarily, this is a solution that worked for me:
ALTER TABLE dbo.DUPPEDTABLE ADD RowID INT NOT NULL IDENTITY(1,1)
Then perform a DELETE using a combination of MIN and GROUP BY:
DELETE b
FROM dbo.DUPPEDTABLE b
WHERE b.RowID NOT IN (
SELECT MIN(RowID) AS RowID
FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
GROUP BY a.ITEM_NUMBER,
a.CHARACTERISTIC,
a.INTVALUE,
a.FLOATVALUE,
a.STRINGVALUE
);
Verify that the DELETE performed correctly:
SELECT a.ITEM_NUMBER,
a.CHARACTERISTIC,
a.INTVALUE,
a.FLOATVALUE,
a.STRINGVALUE, COUNT(*)--MIN(RowID) AS RowID
FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
GROUP BY a.ITEM_NUMBER,
a.CHARACTERISTIC,
a.INTVALUE,
a.FLOATVALUE,
a.STRINGVALUE
ORDER BY COUNT(*) DESC
The result should contain no rows with a count greater than 1. Finally, remove the RowID column:
ALTER TABLE dbo.DUPPEDTABLE DROP COLUMN RowID;
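Oh wow, I feel so stupid reading all these answers; they read like expert answers, with all the CTEs and temp tables and so on.
All I did to get it working was aggregate the ID column using MAX.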
DELETE FROM table WHERE id IN (
SELECT MAX(id) FROM table GROUP BY col1 HAVING ( COUNT(col1) > 1 )
)
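NOTE: You might need to run it multiple times to remove all the duplicates, as each run only deletes one row from each set of duplicates.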
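The "run it multiple times" part can be automated with a loop that stops when a pass deletes nothing. The sketch below is illustrative and rests on assumptions: it uses SQLite via Python's sqlite3, a made-up table t with an integer id and a col1 column, and it treats rows with the same col1 as duplicates. The function name delete_until_clean is hypothetical.

```python
import sqlite3

def delete_until_clean(conn):
    """Repeat the one-row-per-group delete until a pass removes nothing."""
    passes = 0
    while True:
        cur = conn.execute(
            """DELETE FROM t WHERE id IN (
                   SELECT MAX(id) FROM t GROUP BY col1 HAVING COUNT(*) > 1)"""
        )
        if cur.rowcount == 0:
            return passes  # number of passes that actually deleted rows
        passes += 1

# Assumption: invented data; john appears three times, sally twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany(
    "INSERT INTO t (col1) VALUES (?)",
    [("john",), ("john",), ("john",), ("sally",), ("sally",)],
)
passes = delete_until_clean(conn)
rows = conn.execute("SELECT id, col1 FROM t ORDER BY id").fetchall()
```

With three john rows, two passes are needed: each pass removes only the highest id of every group that still has more than one row.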
Another way of removing duplicated rows without losing information is as follows:
delete t1
from dublicated_table t1
join (
select t2.dublicated_field
, min(len(t2.field_kept)) as min_field_kept
from dublicated_table t2 (nolock)
group by t2.dublicated_field having COUNT(*)>1
) t3
on t1.dublicated_field=t3.dublicated_field
and len(t1.field_kept)=t3.min_field_kept
DECLARE @TB TABLE(NAME VARCHAR(100));
INSERT INTO @TB VALUES ('Red'),('Red'),('Green'),('Blue'),('White'),('White')
--**Delete by Rank**
;WITH CTE AS(SELECT NAME,DENSE_RANK() OVER (PARTITION BY NAME ORDER BY NEWID()) ID FROM @TB)
DELETE FROM CTE WHERE ID>1
SELECT NAME FROM @TB;
--**Delete by Row Number**
;WITH CTE AS(SELECT NAME,ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY NAME) ID FROM @TB)
DELETE FROM CTE WHERE ID>1;
SELECT NAME FROM @TB;
DELETE FROM TBL1 WHERE ID IN
(SELECT ID FROM TBL1 a WHERE ID!=
(select MAX(ID) from TBL1 where DUPVAL=a.DUPVAL
group by DUPVAL
having count(DUPVAL)>1))
...AND COl1='John') is more efficient. In general you should apply the filter inside the CTE: ...FROM dbo.Table1 WHERE Col1='John'. Here is the fiddle: sqlfiddle.com/#!6/fae73/744/0
set rowcount 1 delete from t1 where col1=1 and col2=1, as seen here