build sql script: BULK INSERT performance

The fact that you're using GUIDs says much. In SQL 2005, they have a new
function - NEWSEQUENTIALID() - that can alleviate the problems that you get
with NEWID(). NEWID() creates random GUIDs, so when you do INSERTs, they're
all over the place. I would try lowering the fill factor on the
nonclustered index. Start with 50 or 60. After the bulk insert, defrag the
index.
Try running DBCC SHOWCONTIG to report how fragmented that index is.
As for the clustered index, is it a monotonically increasing key? If so,
make the fill factor 100, since a lower fill factor won't buy you anything.
Tom
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
SQL Server MVP
Toronto, ON Canada
https://mvp.support.microsoft.com/profile/Tom.Moreau
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
Good questions. Fill factor is 80% (the default?). The write pattern in
general is no writes, then bursts every two minutes or so
at checkpoint. During bursts, write queue goes to 100 or more. Each burst
is about 8 seconds. The read pattern is more consistent
and lighter, with read queue never higher than 6, but normally 0 (and that's
with other concurrent activity in the database). The
reads are from the RAID 10 set with the non-clustered index. There is very
little reading from the set with the clustered index.
On a development machine where I can run the BULK INSERT without other
activity, the disk activity on the RAID set with the
clustered index consists of one write burst of 4 seconds. The disk activity
on the RAID set with the non-clustered index shows
consistent reads for the rest of the load, with intermiittent write bursts.
The feeling we get from the reads is that SQL Server is reading through the
existing index nodes to find leaves that need to be
updated (splitting when necessary, etc.).
The values in the non-clustered index are GUIDs and thus distributed
throughout the index.
Scott Nichol
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:%23HGP4$LHIHA.1316@.TK2MSFTNGP02.phx.gbl...
> What fill factors do you have? I'm thinking that there may be a lot of
> page
> splitting going on.
> There could also be disk issues, but you did say that things looked OK
> there - "Perfmon does not indicate any resource being pinned". What is
> the
> average disk queue length? Is it > 24?
> It may come to pass that you need to use horizontal partitioning and break
> the table up.
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
> SQL Server MVP
> Toronto, ON Canada
> https://mvp.support.microsoft.com/profile/Tom.Moreau
>
> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
> news:OFCIl3KHIHA.4684@.TK2MSFTNGP06.phx.gbl...
> Thanks for the responses. If the speed of the BULK INSERT were the only
> criterion, dropping the non-clustered index would be great.
> Without it, but with the clustered index and data still there, I load at
> about 20,000 rows/sec. Unfortuntely, re-building an index
> on a table with 500+ million records takes much longer than the 3000
> seconds
> to load with the index intact.
> What I am really wondering is whether anyone else has a similar scenario
> (clustered and non-clustered indexes on a large table) and
> whether the affect on BULK INSERT performance is the same order of
> magnitude
> as I am experiencing (2 order of magnitude drop).
> --
> Scott Nichol
> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
> news:%238rvAkIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message news:eHmpCuTHIHA.4040@.TK2MSFTNGP02.phx.gbl...
> The fact that you're using GUIDs says much. In SQL 2005, they have a new
> function - NEWSEQUENTIALID() - that can alleviate the problems that you get
> with NEWID().
That sounds very useful. One more reason to look forward to migration next year.

> NEWID() creates random GUIDs, so when you do INSERTs, they're
> all over the place. I would try lowering the fill factor on the
> nonclustered index. Start with 50 or 60. After the bulk insert, defrag the
> index.
> Try running DBCC SHOWCONTIG to report how fragmented that index is.
Good idea: I will definitely check/alleviate fragmentation. I may toy with the fill factor on the dev machine first.

> As for the clustered index, is it a monotonically increasing key? If so,
> make the fill factor 100, since a lower fill factor won't buy you anything.
No such luck. It is 6 columns. Because the first column is a date, *most* of the data added each day is at the "end" of the index.
Unfortunately, there are some data laggards that arrive days later, so the BULK INSERT is not a pure append.
Thanks for all your help.
Scott Nichol

> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
> SQL Server MVP
> Toronto, ON Canada
> https://mvp.support.microsoft.com/profile/Tom.Moreau
>
> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
> news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
> Good questions. Fill factor is 80% (the default?). The write pattern in
> general is no writes, then bursts every two minutes or so
> at checkpoint. During bursts, write queue goes to 100 or more. Each burst
> is about 8 seconds. The read pattern is more consistent
> and lighter, with read queue never higher than 6, but normally 0 (and that's
> with other concurrent activity in the database). The
> reads are from the RAID 10 set with the non-clustered index. There is very
> little reading from the set with the clustered index.
> On a development machine where I can run the BULK INSERT without other
> activity, the disk activity on the RAID set with the
> clustered index consists of one write burst of 4 seconds. The disk activity
> on the RAID set with the non-clustered index shows
> consistent reads for the rest of the load, with intermiittent write bursts.
> The feeling we get from the reads is that SQL Server is reading through the
> existing index nodes to find leaves that need to be
> updated (splitting when necessary, etc.).
> The values in the non-clustered index are GUIDs and thus distributed
> throughout the index.
> --
> Scott Nichol
> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
> news:%23HGP4$LHIHA.1316@.TK2MSFTNGP02.phx.gbl...
>

Monday, March 19, 2012

BULK INSERT performance

No comments:

Post a Comment

build sql script

Blog Archive

About Me