I need to do a bulk update of one field in a data table (removing
zero-padding). The update is expected to touch 52.5 million records.
What techniques could I use to ensure:
1) Consistency in the transaction log backups (every 10 mins)
2) Consistency in the full DB backup every day
3) The transaction log disk drive does not fill up completely (I suspect
that even if I set the row count to 10,000 and loop through, disk space
would still be used)

Target environment:
SQL Server 2000 cluster on 2 nodes running Windows Server 2003
(different disk drives for data and log files)

|||A normal set-based operation will do fine:

update mytable set myfield = 'mynewvalue'
However, this will create significant locking, so you should only be this
aggressive on a database that's not in use (overnight, for most sites).
Alternatively, write a cursor that includes some batching logic, but that
will be slow.
DatabaseAdmins.com
Databases Reliable. Available.
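An editorial aside on the zero-padding removal itself, not part of the original thread: a set-based expression that strips all leading zeros regardless of length could look like the sketch below. The table and column names are taken from a script posted later in the thread, and the all-zeros guard is an assumption worth testing.

```sql
-- Strip every leading '0' by starting the SUBSTRING at the first
-- non-zero character (PATINDEX returns its 1-based position).
UPDATE cor_transactions
SET id_Entity = SUBSTRING(id_Entity,
                          PATINDEX('%[^0]%', id_Entity),
                          LEN(id_Entity))
WHERE id_Entity LIKE '0%'
  AND id_Entity LIKE '%[^0]%'   -- skip all-zero values; handle those separately
```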
"Patrick" wrote:
> I need to do a bulk update of 1 field in a data-table (removing
> zero-padding). The update is expected to update 52.5 million records.
> What techniques could I use so I can ensure
> 1) Consistency in the transactional log backups (every 10 mins)
> 2) Consistency in the full DB backup every day
> 3) Disk drive of Transaction logs do not get filled up completely (I suspect
> even if I set row count to 10,000 and loop through, disk-space would still
> be used)
> Target environment
> SQL Server 2000 cluster on 2 nodes running Windows 2003 Server
> (Different disk drives for data and log files)

|||SQL Server takes care of consistency. To make sure you don't blow up the
transaction log, dividing the update into smaller chunks of updates (e.g.
10,000 rows at a time as you mentioned) is the right approach in general.
I'd like to add that you may have to pace your updates (e.g. sleep for some
time after each iteration) so that your transaction log backups at the
10-minute interval have a chance to come in and clear out the inactive
portion of the log for reuse. Alternatively, you can back up the transaction
log yourself after each iteration (or every so many iterations) in the loop.
You need to experiment a bit to find the best chunk size and to see whether
the loop without any wait time would be too tight in your particular
configuration.
Also, it's certainly possible you can choose to update all the rows in a
single shot, depending on the size of your transaction log.
Linchi
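Linchi's chunk-and-pace advice could be sketched like this (an illustrative outline, not from the thread; `mytable`, `myfield`, and the log backup path are placeholders, and the WHERE clause assumes already-updated rows can be excluded so each pass touches fresh rows):

```sql
DECLARE @done int
SET @done = 0
SET ROWCOUNT 10000                 -- SQL Server 2000 way to cap a batch

WHILE @done = 0
BEGIN
    UPDATE mytable
    SET myfield = 'mynewvalue'
    WHERE myfield <> 'mynewvalue'  -- skip rows already converted

    IF @@ROWCOUNT = 0 SET @done = 1

    -- Pause so the scheduled 10-minute log backups can clear the
    -- inactive log; alternatively, back the log up here yourself:
    -- BACKUP LOG mydb TO DISK = 'L:\backup\mydb_log.trn'
    WAITFOR DELAY '00:00:05'
END
SET ROWCOUNT 0                     -- always reset, or later DML stays capped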
"burt_king" wrote:
> A normal set based operation will do fine. update my table set myfield =
> 'mynewvalue'
> However, this will create significant locking so you're only going to be
> this aggressive on a database that's not in use (Overnight, for most) or
> write a cursor that includes some logic in it but that's going to be slow.
> --
> DatabaseAdmins.com
> Databases Reliable. Available.
>
> "Patrick" wrote:
>

|||I have composed the following SQL; could you tell me what is wrong? The
SELECT getdate() only printed fewer than 10 dates, when I am expecting
millions of records to be updated!
DECLARE @recCount int

SELECT @recCount = 50001
SET ROWCOUNT 50000
WHILE @recCount > 0
BEGIN
    SELECT getdate()
    WAITFOR DELAY '000:00:5'
    BEGIN TRAN
    UPDATE cor_transactions
    SET id_Entity = SUBSTRING(id_Entity, 2, 3)
    WHERE LEN(id_Entity) > 3
      AND id_Entity LIKE '0%'
    IF @@ERROR = 0
        COMMIT TRAN
    ELSE
    BEGIN
        ROLLBACK TRAN
        PRINT 'ERROR' + CAST(@@ERROR AS varchar(50))
        BREAK
    END
    SELECT @recCount = @@ROWCOUNT
END

SELECT @recCount = 50001
SET ROWCOUNT 50000
WHILE @recCount > 0
BEGIN
    SELECT getdate()
    WAITFOR DELAY '000:00:5'
    BEGIN TRAN
    UPDATE cor_stocks
    SET id_Entity = SUBSTRING(id_Entity, 2, 3)
    WHERE LEN(id_Entity) > 3
      AND id_Entity LIKE '0%'
    IF @@ERROR = 0
        COMMIT TRAN
    ELSE
    BEGIN
        ROLLBACK TRAN
        PRINT 'ERROR' + CAST(@@ERROR AS varchar(50))
        BREAK
    END
    SELECT @recCount = @@ROWCOUNT
END

SELECT @recCount = 50001
SET ROWCOUNT 50000
WHILE @recCount > 0
BEGIN
    SELECT getdate()
    WAITFOR DELAY '000:00:5'
    BEGIN TRAN
    UPDATE cor_inventory
    SET id_Entity = SUBSTRING(id_Entity, 2, 3)
    WHERE LEN(id_Entity) > 3
      AND id_Entity LIKE '0%'
    IF @@ERROR = 0
        COMMIT TRAN
    ELSE
    BEGIN
        ROLLBACK TRAN
        PRINT 'ERROR' + CAST(@@ERROR AS varchar(50))
        BREAK
    END
    SELECT @recCount = @@ROWCOUNT
END
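An editorial note, not part of the original thread: the likely bug is that COMMIT TRANSACTION resets @@ROWCOUNT to 0, so by the time `SELECT @recCount = @@ROWCOUNT` runs, the UPDATE's row count is already gone and each WHILE loop exits after its first pass (which would explain fewer than 10 getdate() prints across three loops). A sketch of one table's loop with @@ERROR and @@ROWCOUNT captured in a single statement immediately after the UPDATE (table and column names as posted):

```sql
DECLARE @recCount int, @err int
SELECT @recCount = 50001
SET ROWCOUNT 50000               -- cap each batch at 50,000 rows

WHILE @recCount > 0
BEGIN
    BEGIN TRAN
    UPDATE cor_transactions
    SET id_Entity = SUBSTRING(id_Entity, 2, 3)
    WHERE LEN(id_Entity) > 3
      AND id_Entity LIKE '0%'
    -- Capture both values at once, before IF/COMMIT can reset them.
    SELECT @err = @@ERROR, @recCount = @@ROWCOUNT
    IF @err = 0
        COMMIT TRAN
    ELSE
    BEGIN
        ROLLBACK TRAN
        PRINT 'ERROR ' + CAST(@err AS varchar(50))
        BREAK
    END
    WAITFOR DELAY '000:00:05'    -- let the 10-minute log backups keep up
END
SET ROWCOUNT 0                   -- reset, or later statements stay capped
```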
"Linchi Shea" wrote:
> SQL Server takes care of consistency. To make sure you don't blow up the
> transaction log, dividing the update into smaller chunks of updates (e.g.
> 10,000 rows at a time as you mentioned) is the right approach in general.
> I'd like to add that you may have to pace your updates (e.g. sleep for some
> time after each iteration) so that your transaction log backups at the
> 10-minute interval have a chance coming in to clear out the inactive portion
> of the log for reuse. Alternative, you can backup the transaction log after
> each iteration (or every so many iterations) in the loop. You need to
> experiment a bit to find out the best chunk size and whether the loop without
> any wait time would be too tight in your particular configuration.
> Also, it's certainly possible you can choose to update all the rows in a
> single shot, depending on the size of your transaction log.
> Linchi
> "burt_king" wrote:
>
Thursday, March 29, 2012
Bulk UPDATE on SQL Server 2000
Bulk repopulation of a table
Hi, I have a table T with 3 int columns. I need to update it every X
hours with a query that returns the same 3 int columns. Currently I'm
doing it using a cursor, iterating over each row and checking whether I
need to DELETE/UPDATE/INSERT it; this, however, isn't very efficient.
So I tested 2 additional methods: one involving deleting all the rows
from the target table and then inserting everything, and the other
dropping the table, recreating and inserting.
The results are not very surprising:
PROC,CPU,READS,WRITES,DURATION
DROP,2720,177757,218,4000
DELE,3150,183060,230,5300
CURS,8200,504818,247,8240
Which indicates that DROPing and reCREATing is the way to go. Any
suggestions?
I don't like the DROP/CREATE scheme since it might involve downtime for the
table (even when I CREATE a T1, then DROP T and sp_rename T1 -> T). If
there is no better way, will there be problems due to sp_rename
changing the name but not the ID of an object?
P.S. it's SQL2005, and the query itself executes in (2300,11596,0,2730)
[same column names as above].
Thanks.

|||Can you give some more information so we can provide a more realistic
answer? The DDL for the existing table would help along with how or where
you are getting the data to update the table with. If the new data is a
complete up to date set then you have several options. I am assuming it is a
flat file but since you didn't provide that information it is only a guess.
I would look at using BULK INSERT to load the new data into a table that you
create. Let's call it X. Then you can prep this table to get it exactly like
you want it to be, drop the original table, and rename X to what the old one
was. You may find that renaming the old one first, then renaming X, and then
dropping might yield the least time that the original table is off-line to
the users. But either way it should be a matter of a second or less.
Andrew J. Kelly SQL MVP
<johnsolver@.gmail.com> wrote in message
news:1137942204.960433.107600@.g49g2000cwa.googlegroups.com...
> Hi, I have a table T with 3 int columns, I need to update it every X
> hours with a query that returns the same 3 int columns. Currently I'm
> doing it using a cursor iterating over each row and checking whether I
> need to DELETE/UPDATE/INSERT it, this however isn't very efficient.
> So I tested 2 additional methods, one involving deleting all the rows
> from the target table and then inserting everything and the other by
> dropping the table, recreating and inseting.
> The results are not very surprising:
> PROC,CPU,READS,WRITES,DURATION
> DROP,2720,177757,218,4000
> DELE,3150,183060,230,5300
> CURS,8200,504818,247,8240
> Which indicates that DROPing and reCREATing is the way to go, any
> suggestions?
> I don't like the DROP/CREATE scheme since it might involve the TABLEs
> downtime (even when I CREATE a T1 then DROP T and sp_rename T1->T). If
> there is no better way, will there be problems due to sp_rename
> changing the name but not the ID of an object?
> P.S. it's SQL2005, and the query itself executes in (2300,11596,0,2730)
> [same column names as above].
> Thanks.
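Andrew's load-and-swap approach might look like this in outline (editorial sketch, not from the thread; the staging table X, file path, and format options are placeholders):

```sql
-- Load the fresh data into a staging table (path/format are placeholders).
BULK INSERT X FROM 'D:\feeds\t_data.dat'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')

-- Prep X (indexes, constraints) to match T, then swap the names so the
-- original table is offline only for the instant of the renames.
BEGIN TRAN
EXEC sp_rename 'T', 'T_old'
EXEC sp_rename 'X', 'T'
COMMIT TRAN
DROP TABLE T_old   -- drop outside the swap to keep the window short
```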
>

|||Instead of deleting all the rows from the table, you can truncate it. It's
not a good idea to drop and recreate the table.
Or you can first store the data in a temporary table and then update the
related data from the temporary table into the permanent table using T-SQL.
Please post the data and how you want to update it.
Regards
Amish Shah
*** Sent via Developersdex http://www.examnotes.net ***

|||Hi, sorry for not providing enough info:
the table:
CREATE TABLE [dbo].[T](
[tid] [int] NOT NULL,
[pid] [int] NOT NULL DEFAULT (0),
[r] [int] NOT NULL,
CONSTRAINT [PK_T] PRIMARY KEY CLUSTERED
(
[tid] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY];
The query returns the exact 3 columns, simplistically: SELECT tid,pid,r
FROM A (it returns ~90k rows) I don't think that the exact query
matters... I'll be happy to provide additional info if needed.
Thanks.

|||As to updating data, assuming T contains the following:
tid,pid,r
1,3,6
2,3,40
5,4,60
and the query (Q) returns
tid,pid,r
1,3,50
8,1,1000
Then I would like T to contain exactly what Q returned, namely:
DELETE FROM T WHERE tid=2
DELETE FROM T WHERE tid=5
UPDATE T SET r=50 WHERE tid=1
INSERT INTO T (tid,pid,r) VALUES(8,1,1000)
etc.

|||If you are updating the value for r in table T, based on a match of both tid
and pid from the source table, then
--create a temporary table that has the data to be imported.
SELECT tid, pid, r INTO #importtable FROM (...the rest of your import query
...)
--Delete records from T that aren't in the new import batch
DELETE FROM T WHERE NOT EXISTS
(SELECT i.tid FROM #importtable i WHERE i.tid = T.tid)
--update the records that have matching tid and pid values
UPDATE T SET r = i.r FROM #importtable i
INNER JOIN T ON T.tid=i.tid AND T.pid = i.pid
--INSERT new records into T that didn't exist before
INSERT INTO T
SELECT tid,pid,r FROM #importtable i
WHERE NOT EXISTS
(SELECT tid FROM T WHERE T.tid = i.tid)
This gets more complicated if the PK on T is composite; I couldn't really
tell from the DDL you posted earlier.
Of course, it may be faster to just truncate the table and import in the new
data, as suggested earlier by Amish.
Another suggestion is to encapsulate the above into an INSTEAD OF trigger,
and then you could simply run
INSERT INTO T SELECT ...rest of your import query here ...
The INSTEAD OF trigger would fire before any primary key constraints would
be checked, so it could achieve what you want.
"johnsolver@.gmail.com" wrote:
> As to updating data, assuming T contains the following:
> tid,pid,r
> 1,3,6
> 2,3,40
> 5,4,60
> and the query (Q) returns
> tid,pid,r
> 1,3,50
> 8,1,1000
> Then I would like T to contain exactly what Q returned, namely:
> DELETE FROM T WHERE tid=2
> DELETE FROM T WHERE tid=5
> UPDATE T SET r=50 WHERE tid=1
> INSERT INTO T (tid,pid,r) VALUES(8,1,1000)
> etc.
>

|||Thanks for the reply Mark,
I've tried your suggestion. Performance-wise it's better than the
CURSOR solution (not surprising), but it's not as good as the DROP or
TRUNCATE approaches. The key isn't composite; it's only one column, as
you guessed: tid.
Amish: Why isn't it a good idea to DROP/CREATE the table? (except for
the minimal downtime due to sp_rename)?
Thanks.

|||Johnsol,
I also suggested that you can first store the data in a temporary table and
then update the related data from the temporary table into the permanent
table using T-SQL. But I was unable to give a solution until you posted the
DDL.
Second, changing the table name makes the stored procedures, views and
references that use this table invalid until you recreate the new table with
the old name.
If you drop and recreate the table, you have to recreate all relations and
indexes again. You cannot guarantee referential integrity of the data in
other tables if this table is part of any foreign key relationship with
them, since this table is dropped many times.
In some cases we have seen that updating data in the table was not easily
possible using T-SQL, and you had to create a large number of temp tables
and check a number of conditions and data from a number of tables.
So I have given you all the options, but until I get the data I am not
able to give you a specific solution.
Regards
Amish Shah

|||Thanks for the reply Amish,
in the end I'll probably go with CREATE and DROP, simply because its
performance is about 1.5 times faster than the temp table solution, and I
don't have any foreign keys/views built on the table.
So I've opted for the following (DDL follows). By the way, should I wrap the
whole sequence in BEGIN TRAN ... COMMIT TRAN?
CREATE TABLE [dbo].[T_1](
[tid] [int] NOT NULL,
[pid] [int] NOT NULL DEFAULT ((0)),
[r] [int] NOT NULL,
CONSTRAINT [PK_T_1] PRIMARY KEY CLUSTERED
(
[tid] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO T_1 SELECT tid,pid,r FROM A;
DROP TABLE T;
EXEC sp_rename 'T_1','T';
EXEC sp_rename 'T.PK_T_1','PK_T','INDEX';

|||Yes, at the very least you should do it all in a transaction.
Regards
Amish Shah
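The sequence from the previous post, with the slow INSERT kept outside the transaction and only the swap inside, could be sketched as follows (editorial sketch, not from the thread):

```sql
-- Populate the replacement first; this is the slow part and does not
-- need to block readers of T.
INSERT INTO T_1 SELECT tid, pid, r FROM A;

-- Swap atomically: other sessions see either the old T or the new one.
BEGIN TRAN;
DROP TABLE T;
EXEC sp_rename 'T_1', 'T';
EXEC sp_rename 'T.PK_T_1', 'PK_T', 'INDEX';
COMMIT TRAN;
```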
hours with a query that returns the same 3 int columns. Currently I'm
doing it using a cursor iterating over each row and checking whether I
need to DELETE/UPDATE/INSERT it, this however isn't very efficient.
So I tested 2 additional methods, one involving deleting all the rows
from the target table and then inserting everything and the other by
dropping the table, recreating and inseting.
The results are not very surprising:
PROC,CPU,READS,WRITES,DURATION
DROP,2720,177757,218,4000
DELE,3150,183060,230,5300
CURS,8200,504818,247,8240
Which indicates that DROPing and reCREATing is the way to go, any
suggestions?
I don't like the DROP/CREATE scheme since it might involve the TABLEs
downtime (even when I CREATE a T1 then DROP T and sp_rename T1->T). If
there is no better way, will there be problems due to sp_rename
changing the name but not the ID of an object?
P.S. it's SQL2005, and the query itself executes in (2300,11596,0,2730)
[same column names as above].
Thanks.Can you give some more information so we can provide a more realistic
answer? The DDL for the existing table would help along with how or where
you are getting the data to update the table with. If the new data is a
complete up to date set then you have several options. I am assuming it is a
flat file but since you didn't provide that information it is only a guess.
I would look at using BULK INSERT to load the new data into a table that you
create. Lets call it X. Then you can prep this table to get it exactly like
you want it to be, drop the original table and rename X to what the old one
was. You may find that renaming the old one first, renaming X and then
dropping might yield the least time that the original table is off-line to
the users. But either way it should be a matter of a second or less.
Andrew J. Kelly SQL MVP
<johnsolver@.gmail.com> wrote in message
news:1137942204.960433.107600@.g49g2000cwa.googlegroups.com...
> Hi, I have a table T with 3 int columns, I need to update it every X
> hours with a query that returns the same 3 int columns. Currently I'm
> doing it using a cursor iterating over each row and checking whether I
> need to DELETE/UPDATE/INSERT it, this however isn't very efficient.
> So I tested 2 additional methods, one involving deleting all the rows
> from the target table and then inserting everything and the other by
> dropping the table, recreating and inseting.
> The results are not very surprising:
> PROC,CPU,READS,WRITES,DURATION
> DROP,2720,177757,218,4000
> DELE,3150,183060,230,5300
> CURS,8200,504818,247,8240
> Which indicates that DROPing and reCREATing is the way to go, any
> suggestions?
> I don't like the DROP/CREATE scheme since it might involve the TABLEs
> downtime (even when I CREATE a T1 then DROP T and sp_rename T1->T). If
> there is no better way, will there be problems due to sp_rename
> changing the name but not the ID of an object?
> P.S. it's SQL2005, and the query itself executes in (2300,11596,0,2730)
> [same column names as above].
> Thanks.
>|||Instead of deleting all rows from the table you can truncate it. Its not
a good idea to drop and recreate table.
Or
you can first store the data in one temporary and then update related
data from temporary table in permanent table using TSQL.
Please post data and how you want to update it.
Regards
Amish Shah
*** Sent via Developersdex http://www.examnotes.net ***|||Hi, sorry for not providing enough info:
the table:
CREATE TABLE [dbo].[T](
[tid] [int] NOT NULL,
[pid] [int] NOT NULL DEFAULT (0),
[r] [int] NOT NULL,
CONSTRAINT [PK_T] PRIMARY KEY CLUSTERED
(
[t] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY];
The query returns the exact 3 columns, simplistically: SELECT tid,pid,r
FROM A (it returns ~90k rows) I don't think that the exact query
matters... I'll be happy to provide additional info if needed.
Thanks.|||As to updating data, assuming T contains the following:
tid,pid,r
1,3,6
2,3,40
5,4,60
and the query (Q) returns
tid,pid,r
1,3,50
8,1,1000
Then I would like T to contain exactly what Q returned, namely:
DELETE FROM T WHERE tid=2
DELETE FROM T WHERE tid=5
UPDATE T SET r=50 WHERE tid=1
INSERT INTO T (tid,pid,r) VALUES(8,1,1000)
etc.|||If you are updating the value for r in table T, based on a match of both tid
and pid from the source table, then
--create a temporary table that has the data to be imported.
SELECT tid, pid, r INTO #importtable FROM (...the rest of your import query
...)
--Delete records from T that aren't in the new import batch
DELETE FROM T WHERE NOT EXISTS
(SELECT i.tid FROM #importtable i WHERE i.tid = T.tid)
--update the records that have matching tid and pid values
UPDATE T SET r = i.r FROM #importtable i
INNER JOIN T ON T.tid=i.tid AND T.pid = i.pid
--INSERT new records into T that didn't exist before
INSERT INTO T
SELECT tid,pid,r FROM #importtable i
WHERE NOT EXISTS
(SELECT tid FROM T WHERE T.tid = i.tid)
This gets more complicated if the PK on T is composite; I couldn't really
tell from the DDL you posted earlier.
Of course, it may be faster to just truncate the table and import in the new
data, as suggested earlier by Amish.
Another suggestion is to encapsulate the above into an INSTEAD OF trigger,
and then you could simply run
INSERT INTO T SELECT ...rest of your import query here ...
The INSTEAD OF trigger would fire before any primary key constraints would
be checked, so it could achieve what you want.
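A minimal sketch of such a trigger, assuming the single-column key tid from the DDL posted earlier (the trigger name is hypothetical):

```sql
CREATE TRIGGER trg_T_sync ON T
INSTEAD OF INSERT
AS
BEGIN
    -- remove rows not present in the incoming batch
    DELETE FROM T WHERE NOT EXISTS
        (SELECT 1 FROM inserted i WHERE i.tid = T.tid)

    -- update rows that match on the key
    UPDATE T SET r = i.r
    FROM T INNER JOIN inserted i ON T.tid = i.tid

    -- add rows that are new to T
    INSERT INTO T (tid, pid, r)
    SELECT i.tid, i.pid, i.r FROM inserted i
    WHERE NOT EXISTS (SELECT 1 FROM T WHERE T.tid = i.tid)
END
```

With this in place, INSERT INTO T SELECT ... performs the whole synchronization in one statement.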
"johnsolver@.gmail.com" wrote:
> As to updating data, assuming T contains the following:
> tid,pid,r
> 1,3,6
> 2,3,40
> 5,4,60
> and the query (Q) returns
> tid,pid,r
> 1,3,50
> 8,1,1000
> Then I would like T to contain exactly what Q returned, namely:
> DELETE FROM T WHERE tid=2
> DELETE FROM T WHERE tid=5
> UPDATE T SET r=50 WHERE tid=1
> INSERT INTO T (tid,pid,r) VALUES(8,1,1000)
> etc.
>|||Thanks for the reply Mark,
I've tried your suggestion; performance-wise it's better than the
CURSOR solution (not surprising), but it's not as good as the DROP or
TRUNCATE approaches. The key isn't composite; it's only one column, tid, as
you guessed.
Amish: Why isn't it a good idea to DROP/CREATE the table (except for
the minimal downtime due to sp_rename)?
Thanks.|||Johnsol,
I also suggested that
you can first store the data in a temporary table and then update the related
data in the permanent table from it using T-SQL.
But I was unable to give a solution until you posted the DDL.
Second,
renaming the table invalidates the stored procedures, views, and references
that use this table until you recreate the new table with the old
name.
If you drop and recreate the table, you have to recreate all relations and
indexes again. You cannot guarantee referential integrity of the data
in other tables if this table is part of any foreign key relationship
with them, since this table is dropped many times.
In some cases we have seen that updating data in the table was not
easily possible using T-SQL, and you had to create a large number of temp
tables and check a number of conditions and data from a number of
tables.
So I gave you all the options, but until I got the data I was not
able to give you a specific solution.
Regards
Amish Shah|||Thanks for the reply Amish,
in the end I'll probably go with CREATE and DROP, simply because its
performance is about 1.5 times faster than the temp table solution, and I
don't have any foreign keys/views built on the table.
So I've opted for the following (DDL follows); btw, should I wrap the
whole sequence in BEGIN TRAN ... COMMIT TRAN?
CREATE TABLE [dbo].[T_1](
[tid] [int] NOT NULL,
[pid] [int] NOT NULL DEFAULT ((0)),
[r] [int] NOT NULL,
CONSTRAINT [PK_T_1] PRIMARY KEY CLUSTERED
(
[tid] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO T_1 SELECT tid,pid,r FROM A;
DROP TABLE T;
EXEC sp_rename 'T_1','T';
EXEC sp_rename 'T.PK_T_1','PK_T','INDEX';|||Yes, at the very least you should do it all in a transaction.
Regards
Amish Shah
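For completeness, the swap above might be wrapped roughly like this (a sketch; SQL Server 2000 has no TRY/CATCH, so error handling is limited to @@ERROR checks):

```sql
BEGIN TRAN

INSERT INTO T_1 SELECT tid, pid, r FROM A
IF @@ERROR <> 0 BEGIN ROLLBACK TRAN RETURN END

-- swap the new table in under the old name
DROP TABLE T
EXEC sp_rename 'T_1', 'T'
EXEC sp_rename 'T.PK_T_1', 'PK_T', 'INDEX'

COMMIT TRAN
```

Keeping the DROP and both renames inside one transaction means readers either see the old table or the fully loaded new one.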
Sunday, March 25, 2012
Bulk Insert/Update Ideas
I need a fresh set of eyes.
On a daily basis I need to perform a bulk update. The table totals about 50,000 records, with approximately 5,000 changing (deletes, edits, and new records) per day. I'd like to push just the updates somehow, but VB is too slow and I haven't found a way to handle it in DTS. Not much experience w/ DTS.
I'm transferring between two SQL 2000 servers w/ a VB app sitting in the middle.
Any ideas?|||You can break the incoming file into 3 files corresponding to the operations (insert, update, delete) using findstr with redirection (>),
then for insert you just do a straight bulk insert, while for update and delete you do a bulk insert into a staging table and then delete and update by joining it with the live table.
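A rough sketch of that staging approach, with all table and column names hypothetical:

```sql
-- assuming the incoming file was already split by operation type
-- and bulk-inserted into three staging tables:
-- Stage_Ins, Stage_Upd, Stage_Del

-- deletes: join the staged keys against the live table
DELETE L
FROM Live L INNER JOIN Stage_Del d ON L.id = d.id

-- updates: one set-based join update instead of row-by-row VB calls
UPDATE L SET L.val = u.val
FROM Live L INNER JOIN Stage_Upd u ON L.id = u.id

-- inserts: straight append
INSERT INTO Live (id, val)
SELECT i.id, i.val FROM Stage_Ins i
```

Three set-based statements over ~5,000 staged rows should be far faster than pushing individual changes through the VB layer.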
Wednesday, March 7, 2012
Bulk insert bypasses trigger
Hey Folks,
A 3rd party product uses some kind of bulk insert to update a table
several times during the day. Sometimes the data is overwritten, which
effectively loses history. A trigger on the table is bypassed by the bulk
process. I'm hoping there is another way to get notified when the table is
updated so I can quickly copy it to a safer location for reporting purposes.
Maybe an event? Alert?
Thanks!> A 3rd party product uses some kind of bulk insert to update a table
> several times during the day. Sometimes the data is overwritten, which
> effectively loses history. A trigger on the table is bypassed by the bulk
> process. I'm hoping there is another way to get notified when the table is
> updated so I can quickly copy it to a safer location for reporting
> purposes.
Bulk Insert to a staging table, then perform a regular insert ... select ...
from, to ensure the trigger is fired.|||BULK INSERT? See the FIRE_TRIGGERS argument for the BULK INSERT statement in
BOL.
HTH
Jerry
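For example (the file path, terminators, and table name are hypothetical):

```sql
-- FIRE_TRIGGERS makes BULK INSERT execute any INSERT triggers
-- defined on the target table; by default a bulk load skips them
BULK INSERT dbo.History
FROM 'C:\feeds\daily.dat'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRE_TRIGGERS)
```

This only helps if you control the load command, which (as noted below) the original poster does not.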
"MnFisher" <mnfisher@.community.nospam> wrote in message
news:utwybjoyFHA.2516@.TK2MSFTNGP12.phx.gbl...
> Hey Folks,
> A 3rd party product uses some kind of bulk insert to update a table
> several times during the day. Sometimes the data is overwritten, which
> effectively loses history. A trigger on the table is bypassed by the bulk
> process. I'm hoping there is another way to get notified when the table is
> updated so I can quickly copy it to a safer location for reporting
> purposes.
> Maybe an event? Alert?
> Thanks!
>
>|||Jerry,
Thanks for the response. I don't have access to the command itself. Is there
any other way to catch a bulk insert?
"Jerry Spivey" <jspivey@.vestas-awt.com> wrote in message
news:uJbnaooyFHA.1960@.TK2MSFTNGP10.phx.gbl...
> BULK INSERT? See the FIRE_TRIGGERS argument for the BULK INSERT statment
> in BOL.
> HTH
> Jerry
> "MnFisher" <mnfisher@.community.nospam> wrote in message
> news:utwybjoyFHA.2516@.TK2MSFTNGP12.phx.gbl...
>|||To "catch"? view? Yes. Profiler.
HTH
Jerry
"MnFisher" <mnfisher@.community.nospam> wrote in message
news:OptkiQpyFHA.2812@.TK2MSFTNGP14.phx.gbl...
> Jerry,
> Thanks for the response. I don't have access to the command itself. Is
> there any other way to catch a bulk insert?
> "Jerry Spivey" <jspivey@.vestas-awt.com> wrote in message
> news:uJbnaooyFHA.1960@.TK2MSFTNGP10.phx.gbl...
>
Tuesday, February 14, 2012
Builtin alternatives to UDF?
I'm using some simple scalar UDFs in expressions that update sets of
around 500,000 rows, but have some worries about possible performance
hits. Question is, are there relatively simple ways of accomplishing the
same inline using the built-in functions?
create function dbo.f_MaxOf(@.x1 float, @.x2 float) returns float
AS
begin
return(case when @.x1 is null then @.x2
when @.x2 is null then @.x1
when @.x1 < @.x2 then @.x2
else @.x1
end)
end
go
----
--
create function dbo.f_IfZero(@.x float, @.minValue float) returns float
AS
begin
return(case when ABS(@.x) < @.minValue then @.minValue
else @.x
end)
end
(simple) Example usage:
UPDATE T set colA = dbo.f_MaxOf(colB, colE) / dbo.f_IfZero(colC + colD,
0.001)
The "real life" code is somewhat more complex ... :-)
An obvious alternative to the 'f_IfZero' function would be something
like
(case when ABS(colC + colD) < 0.001 then 0.001 else colC + colD
end)
but this quickly gets ugly when the expression being evaluated is more
complex, and the update has multiple occurences of the same function
that need to be replaced ...
Ideas anyone?
/Erik|||Have you ever heard of the COALESCE function? It might simplify some of your
logic. Look it up in SQL Books Online. Also, it seems that all of what you are
doing can be done as one update with no functions.
<eox_conceptos@.despammed.com> wrote in message
news:1144239792.905239.41070@.i40g2000cwc.googlegroups.com...
> I'm using some simple scalar UDFs in expressions that update sets of
> around 500,000 rows, but have some worries about possible performance
> hits. Question is, are there relativly simple ways of accomplishing the
> same inline using the builtin functions?
> create function dbo.f_MaxOf(@.x1 float, @.x2 float) returns float
> AS
> begin
> return(case when @.x1 is null then @.x2
> when @.x2 is null then @.x1
> when @.x1 < @.x2 then @.x2
> else @.x1
> end)
> end
> go
> ----
--
> create function dbo.f_IfZero(@.x float, @.minValue float) returns float
> AS
> begin
> return(case when ABS(@.x) < @.minValue then @.minValue
> else @.x
> end)
> end
> (simple) Example usage:
> UPDATE T set colA = dbo.f_MaxOf(colB, colE) / dbo.f_IfZero(colC + colD,
> 0.001)
> The "real life" code is somewhat more complex ... :-)
> An obvious alternative to the 'f_IfZero' function would be something
> like
> (case when ABS(colC + colD) < 0.001 then 0.001 else colC + colD
> end)
> but this quickly gets ugly when the expression being evaluated is more
> complex, and the update has multiple occurences of the same function
> that need to be replaced ...
> Ideas anyone?
> /Erik
>|||Farmer wrote:
> have you ever heard of COALESCE function? It might simplify some of you
> logic. Lookup in SQL Books on Line. Also, it seems that all of what you are
> doing can be done as one update with no functions.
>
I'm familiar with coalesce. Of course, the example update could have
been written as:
UPDATE T set colA =
(case when colB is null then colE
when colE is null then colB
when colB < colE then colE
else colB
end) / (case when ABS(colC + colD) < 0.001 then 0.001 else colC + colD
end)
but this gets messy and obscures the underlying logic.
Could you clarify what you mean by "can be done as one update with no
functions"? This *is* one update ...|||I guess the following could work instead of "f_IsZero":
COALESCE(NULLIF(ROUND( expression , 3, 1 ), 0.0), 0.001)
... and that just leaves finding a suitable replacement for
"dbo.f_MaxOf"
:-)
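For what it's worth, dbo.f_MaxOf can also be inlined in a single expression; a sketch using the column names from the earlier example:

```sql
-- greater of colB and colE, falling back to the non-NULL one:
-- if colB is NULL the comparison is unknown, so the ELSE branch
-- returns COALESCE(colE, colB) = colE; if colE is NULL it returns colB
CASE WHEN colB > colE THEN colB ELSE COALESCE(colE, colB) END
```

That matches f_MaxOf's NULL handling case by case, though it still suffers the readability problem Erik raised when repeated several times in one UPDATE.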