Thursday, March 29, 2012

Bulk repopulation of a table

Hi, I have a table T with 3 int columns, I need to update it every X
hours with a query that returns the same 3 int columns. Currently I'm
doing it using a cursor iterating over each row and checking whether I
need to DELETE/UPDATE/INSERT it, this however isn't very efficient.
So I tested 2 additional methods, one involving deleting all the rows
from the target table and then inserting everything and the other by
dropping the table, recreating and inseting.
The results are not very surprising:
PROC,CPU,READS,WRITES,DURATION
DROP,2720,177757,218,4000
DELE,3150,183060,230,5300
CURS,8200,504818,247,8240
Which indicates that DROPing and reCREATing is the way to go, any
suggestions?
I don't like the DROP/CREATE scheme since it might involve the TABLEs
downtime (even when I CREATE a T1 then DROP T and sp_rename T1->T). If
there is no better way, will there be problems due to sp_rename
changing the name but not the ID of an object?
P.S. it's SQL2005, and the query itself executes in (2300,11596,0,2730)
[same column names as above].
Thanks.Can you give some more information so we can provide a more realistic
answer? The DDL for the existing table would help along with how or where
you are getting the data to update the table with. If the new data is a
complete up to date set then you have several options. I am assuming it is a
flat file but since you didn't provide that information it is only a guess.
I would look at using BULK INSERT to load the new data into a table that you
create. Lets call it X. Then you can prep this table to get it exactly like
you want it to be, drop the original table and rename X to what the old one
was. You may find that renaming the old one first, renaming X and then
dropping might yield the least time that the original table is off-line to
the users. But either way it should be a matter of a second or less.
Andrew J. Kelly SQL MVP
<johnsolver@.gmail.com> wrote in message
news:1137942204.960433.107600@.g49g2000cwa.googlegroups.com...
> Hi, I have a table T with 3 int columns, I need to update it every X
> hours with a query that returns the same 3 int columns. Currently I'm
> doing it using a cursor iterating over each row and checking whether I
> need to DELETE/UPDATE/INSERT it, this however isn't very efficient.
> So I tested 2 additional methods, one involving deleting all the rows
> from the target table and then inserting everything and the other by
> dropping the table, recreating and inseting.
> The results are not very surprising:
> PROC,CPU,READS,WRITES,DURATION
> DROP,2720,177757,218,4000
> DELE,3150,183060,230,5300
> CURS,8200,504818,247,8240
> Which indicates that DROPing and reCREATing is the way to go, any
> suggestions?
> I don't like the DROP/CREATE scheme since it might involve the TABLEs
> downtime (even when I CREATE a T1 then DROP T and sp_rename T1->T). If
> there is no better way, will there be problems due to sp_rename
> changing the name but not the ID of an object?
> P.S. it's SQL2005, and the query itself executes in (2300,11596,0,2730)
> [same column names as above].
> Thanks.
>|||Instead of deleting all rows from the table you can truncate it. Its not
a good idea to drop and recreate table.
Or
you can first store the data in one temporary and then update related
data from temporary table in permanent table using TSQL.
Please post data and how you want to update it.
Regards
Amish Shah
*** Sent via Developersdex http://www.examnotes.net ***|||Hi, sorry for not providing enough info:
the table:
CREATE TABLE [dbo].[T](
[tid] [int] NOT NULL,
[pid] [int] NOT NULL DEFAULT (0),
[r] [int] NOT NULL,
CONSTRAINT [PK_T] PRIMARY KEY CLUSTERED
(
[t] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY];
The query returns the exact 3 columns, simplistically: SELECT tid,pid,r
FROM A (it returns ~90k rows) I don't think that the exact query
matters... I'll be happy to provide additional info if needed.
Thanks.|||As to updating data, assuming T contains the following:
tid,pid,r
1,3,6
2,3,40
5,4,60
and the query (Q) returns
tid,pid,r
1,3,50
8,1,1000
Then I would like T to contain exactly what Q returned, namely:
DELETE FROM T WHERE tid=2
DELETE FROM T WHERE tid=5
UPDATE T SET r=50 WHERE tid=1
INSERT INTO T (tid,pid,r) VALUES(8,1,1000)
etc.|||If you are updating the value for r in table T, based on a match of both tid
and pid from the source table, then
--create a temporary table that has the data to be imported.
SELECT tid, pid, r INTO #importtable FROM (...the rest of your import query
...)
--Delete records from T that aren't in the new import batch
DELETE FROM T WHERE NOT EXISTS
(SELECT i.tid FROM #importtable i WHERE i.tid = T.tid)
--update the records that have matching tid and pid values
UPDATE T SET r = i.r FROM #importtable i
INNER JOIN T ON T.tid=i.tid AND T.pid = i.pid
--INSERT new records into T that didn't exist before
INSERT INTO T
SELECT tid,pid,r FROM #importtable i
WHERE NOT EXISTS
(SELECT tid FROM T WHERE T.tid = i.tid)
This gets more complicated if the PK on T is composite; I couldn't really
tell from the DDL you posted earlier.
Of course, it may be faster to just truncate the table and import in the new
data, as suggested earlier by Amish.
Another suggestion is to encapsulate the above into an INSTEAD OF trigger,
and then you could simply run
INSERT INTO T SELECT ...rest of your import query here ...
The INSTEAD OF trigger would fire before any primary key constraints would
be checked, so it could acheive what you want.
"johnsolver@.gmail.com" wrote:

> As to updating data, assuming T contains the following:
> tid,pid,r
> 1,3,6
> 2,3,40
> 5,4,60
> and the query (Q) returns
> tid,pid,r
> 1,3,50
> 8,1,1000
> Then I would like T to contain exactly what Q returned, namely:
> DELETE FROM T WHERE tid=2
> DELETE FROM T WHERE tid=5
> UPDATE T SET r=50 WHERE tid=1
> INSERT INTO T (tid,pid,r) VALUES(8,1,1000)
> etc.
>|||Thanks for the reply Mark,
I've tried your suggestion, performance-wise it's better than the
CURSOR solution (not surprising) but it's not as good as DROP or
TRUNCATE approaches. The key isn't a composite it's only one column as
you guessed, tid.
Amish: Why isn't it a good idea to DROP/CREATE the table? (except for
the minimal downtime due to sp_rename)?
Thanks.|||Johnsol
I also suggested you
you can first store the data in one temporary and then update related
data from temporary table in permanent table using TSQL.
But I was unable to give solution untill go post DDL.
Second
when you change the table name causes the sp, view and references
invalid which uses this table until you recreate new table with old
name.
If you drop and recreate table you have to recreate all relations ,
indexes again. You can not guarantee referential integrity of data
of othe tables if this table is part of any foregin key relationship
with them, since this table drops many times.
In some cases we have seen that updating data in the table was not
possible using TSQL easily and you have to create large number of temp
tables and check number of conditions and check data from number of
tables.
So I have gave you all the options but untill I get the data I was not
able to give you some solution.
Regards
Amish Shah|||Thanks for the reply Amish,
in the end I'll probably go with CREATE and DROP simply because it's
performance is about 1.5 times faster than the temp table solution, I
don't have any foreign keys/views built on the table.
So I've opted for the following (DDL follows), btw. should I wrap the
whole sequence in BEGIN TRANS ... COMIT TRANS?
CREATE TABLE [dbo].[T_1](
[tid] [int] NOT NULL,
[pid] [int] NOT NULL DEFAULT ((0)),
[r] [int] NOT NULL,
CONSTRAINT [PK_T_1] PRIMARY KEY CLUSTERED
(
[tid] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO T_1 SELECT tid,pid,r FROM A;
DROP TABLE T;
EXEC sp_rename 'T_1','T';
EXEC sp_rename 'T.PK_T_1','PK_T','INDEX';|||Yes at least you should do all in transaction.
Regards
Amish Shah

No comments:

Post a Comment