Question/sanity check about BULK INSERT performance.
I load a 300,000 row text file into my largest table every day. The file loads into an empty table with the same structure and defaults but no indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same file takes more like 3000 seconds to load into the "real" table, which has a clustered index, one non-clustered index and 547,640,055 rows of data (as of today). Is the discrepancy in rate (30,000 rows/sec vs. 100 rows/sec) reasonable? Is it because the indexes on the table are so deep that updating them requires lots of page-at-a-time I/O?
Some useful info:
1. The file being loaded is sorted in clustered index order
2. The BULK INSERT has the ORDER clause
3. The TABLOCK option on BULK INSERT makes no difference in performance (lock is quickly escalated to table when the option is not specified)
4. Database recovery mode is Bulk-Logged
5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT is not even twice as fast
6. SQL Server 2000 SP3 on Windows 2003
7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
8. Transaction log is on RAID 1, data on 12-spindle RAID 10, non-clustered index on 6-spindle RAID 10. All drives SCSI/320, 15k
9. Perfmon does not indicate any resource being pinned (or even challenged)
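For reference, a statement matching points 1-3 would look roughly like this; the table name, file path, delimiters, and key columns here are hypothetical placeholders, not the actual schema:

```sql
-- Hypothetical sketch only; the real table, path, and key columns differ.
BULK INSERT dbo.BigTable
FROM 'D:\loads\daily.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    ORDER (EventDate ASC, EventId ASC),  -- point 2: matches clustered index order
    TABLOCK                              -- point 3: table lock from the start
);
```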
--
Scott Nichol
|||Scott
http://sqlblog.com/blogs/linchi_shea/archive/2007/08/27/performance-impact-the-most-optimal-insert-script-can-t-beat-bulkcopy.aspx
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:uDASsWIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
|||That was an excellent blog post, but it was for SQL 2005. In SQL 2005, improvements were made in bulk copy performance, alleviating the need to drop indexes. The OP says that the system is SQL 2000 SP3a.
Looking at the fact that the data file is sorted in clustered index order
and there is one nonclustered index, I'd be inclined to drop the
nonclustered index, do the bulk insert and then recreate the nonclustered
index.
I'm also curious as to whether the data to be inserted are spread throughout the table or added to the "end" of the table, i.e. whether the clustered index keys to be inserted are greater than all existing clustered index keys. If the data are spread throughout the table, I'd want to reduce the fill factor.
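That approach would be sketched as follows, with made-up table and index names (SQL 2000 syntax):

```sql
-- Hypothetical sketch of drop / load / recreate (names are placeholders).
DROP INDEX dbo.BigTable.IX_BigTable_Guid;

-- Load with the file already in clustered index order.
BULK INSERT dbo.BigTable
FROM 'D:\loads\daily.txt'
WITH (ORDER (EventDate ASC, EventId ASC), TABLOCK);

-- Recreate the nonclustered index after the load completes.
CREATE NONCLUSTERED INDEX IX_BigTable_Guid
    ON dbo.BigTable (RowGuid)
    WITH FILLFACTOR = 80;
```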
--
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
SQL Server MVP
Toronto, ON Canada
https://mvp.support.microsoft.com/profile/Tom.Moreau
"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:u6AXXaIHIHA.700@.TK2MSFTNGP05.phx.gbl...
Scott
http://sqlblog.com/blogs/linchi_shea/archive/2007/08/27/performance-impact-the-most-optimal-insert-script-can-t-beat-bulkcopy.aspx
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:uDASsWIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
> Question/sanity check about BULK INSERT performance.
> I load a 300,000 row text file into my largest table every day. The file
> loads into an empty table with the same structure and defaults but no
> indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same file
> takes more like 3000 seconds to load into the "real" table, which has a
> clustered index, one non-clustered index and 547,640,055 rows of data (as
> of today). Is the discrepency in rate (30,000 rows/sec vs. 100 rows/sec)
> reasonable? Is it because the indexes on the table are so deep that
> updating them requires lots of page-at-a-time I/O?
> Some useful info:
> 1. The file being loaded is sorted in clustered index order
> 2. The BULK INSERT has the ORDER clause
> 3. The TABLOCK option on BULK INSERT makes no difference in performance
> (lock is quickly escalated to table when the option is not specified)
> 4. Database recovery mode is Bulk-Logged
> 5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT is
> not even twice as fast
> 6. SQL Server 2000 SP3 on Windows 2003
> 7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
> 8. Transaction log is on RAID 1, data on 12-spindle RAID 10, non-clustered
> index on 6-spindle RAID 10. All drives SCSI/320, 15k
> 9. Perfmon does not indicate any resource being pinned (or even
> challenged)
> --
> Scott Nichol
|||Tom
> drop indexes. The OP says that the system is SQL 2000 SP3a.
Correct, just missed it.
> Looking at the fact that the data file is sorted in clustered index order
> and there is one nonclustered index, I'd be inclined to drop the
> nonclustered index, do the bulk insert and then recreate the nonclustered
> index.
Agreed as well
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:%238rvAkIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
|||Thanks for the responses. If the speed of the BULK INSERT were the only criterion, dropping the non-clustered index would be great. Without it, but with the clustered index and data still there, I load at about 20,000 rows/sec. Unfortunately, re-building an index on a table with 500+ million records takes much longer than the 3000 seconds to load with the index intact.
What I am really wondering is whether anyone else has a similar scenario (clustered and non-clustered indexes on a large table) and whether the effect on BULK INSERT performance is the same order of magnitude as I am experiencing (a two-order-of-magnitude drop).
--
Scott Nichol
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message news:%238rvAkIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
|||What fill factors do you have? I'm thinking that there may be a lot of page
splitting going on.
There could also be disk issues, but you did say that things looked OK
there - "Perfmon does not indicate any resource being pinned". What is the
average disk queue length? Is it > 24?
It may come to pass that you need to use horizontal partitioning and break
the table up.
--
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
SQL Server MVP
Toronto, ON Canada
https://mvp.support.microsoft.com/profile/Tom.Moreau
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:OFCIl3KHIHA.4684@.TK2MSFTNGP06.phx.gbl...
|||Good questions. Fill factor is 80% (the default?). The write pattern in general is no writes, then bursts every two minutes or so at checkpoint. During bursts, the write queue goes to 100 or more. Each burst is about 8 seconds. The read pattern is more consistent and lighter, with the read queue never higher than 6, but normally 0 (and that's with other concurrent activity in the database). The reads are from the RAID 10 set with the non-clustered index. There is very little reading from the set with the clustered index.
On a development machine where I can run the BULK INSERT without other activity, the disk activity on the RAID set with the clustered index consists of one write burst of 4 seconds. The disk activity on the RAID set with the non-clustered index shows consistent reads for the rest of the load, with intermittent write bursts.
The feeling we get from the reads is that SQL Server is reading through the existing index nodes to find leaves that need to be updated (splitting when necessary, etc.).
The values in the non-clustered index are GUIDs and thus distributed throughout the index.
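That scatter is easy to demonstrate; this is an illustration only, not part of the actual load:

```sql
-- Illustration: NEWID() values sort into arbitrary positions, so new keys
-- land on leaf pages scattered across the whole nonclustered index.
SELECT TOP 5 NEWID() AS RandomGuid
FROM sysobjects
ORDER BY RandomGuid;
-- Each run yields GUIDs in a different, unpredictable key range.
```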
--
Scott Nichol
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message news:%23HGP4$LHIHA.1316@.TK2MSFTNGP02.phx.gbl...
|||The fact that you're using GUIDs says much. In SQL 2005, they have a new
function - NEWSEQUENTIALID() - that can alleviate the problems that you get
with NEWID(). NEWID() creates random GUIDs, so when you do INSERTs, they're
all over the place. I would try lowering the fill factor on the
nonclustered index. Start with 50 or 60. After the bulk insert, defrag the
index.
Try running DBCC SHOWCONTIG to report how fragmented that index is.
As for the clustered index, is it a monotonically increasing key? If so,
make the fill factor 100, since a lower fill factor won't buy you anything.
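In T-SQL (SQL 2000 syntax), the checks and rebuild suggested above might look like this, with placeholder object names:

```sql
-- Report leaf-level fragmentation for the nonclustered index.
DBCC SHOWCONTIG ('dbo.BigTable', 'IX_BigTable_Guid');

-- Rebuild it with extra free space per page to absorb random GUID inserts.
DBCC DBREINDEX ('dbo.BigTable', 'IX_BigTable_Guid', 60);

-- Or defragment in place between loads without a full rebuild:
DBCC INDEXDEFRAG (0, 'dbo.BigTable', 'IX_BigTable_Guid');
```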
--
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
SQL Server MVP
Toronto, ON Canada
https://mvp.support.microsoft.com/profile/Tom.Moreau
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
>|||"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message news:eHmpCuTHIHA.4040@.TK2MSFTNGP02.phx.gbl...
> The fact that you're using GUIDs says much. In SQL 2005, they have a new
> function - NEWSEQUENTIALID() - that can alleviate the problems that you get
> with NEWID().
That sounds very useful. One more reason to look forward to migration next year.
> NEWID() creates random GUIDs, so when you do INSERTs, they're
> all over the place. I would try lowering the fill factor on the
> nonclustered index. Start with 50 or 60. After the bulk insert, defrag the
> index.
> Try running DBCC SHOWCONTIG to report how fragmented that index is.
Good idea: I will definitely check/alleviate fragmentation. I may toy with the fill factor on the dev machine first.
> As for the clustered index, is it a monotonically increasing key? If so,
> make the fill factor 100, since a lower fill factor won't buy you anything.
No such luck. It is 6 columns. Because the first column is a date, *most* of the data added each day is at the "end" of the index.
Unfortunately, there are some data laggards that arrive days later, so the BULK INSERT is not a pure append.
Thanks for all your help.
--
Scott Nichol
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
> SQL Server MVP
> Toronto, ON Canada
> https://mvp.support.microsoft.com/profile/Tom.Moreau
>
> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
> news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
> Good questions. Fill factor is 80% (the default?). The write pattern in
> general is no writes, then bursts every two minutes or so
> at checkpoint. During bursts, write queue goes to 100 or more. Each burst
> is about 8 seconds. The read pattern is more consistent
> and lighter, with read queue never higher than 6, but normally 0 (and that's
> with other concurrent activity in the database). The
> reads are from the RAID 10 set with the non-clustered index. There is very
> little reading from the set with the clustered index.
> On a development machine where I can run the BULK INSERT without other
> activity, the disk activity on the RAID set with the
> clustered index consists of one write burst of 4 seconds. The disk activity
> on the RAID set with the non-clustered index shows
> consistent reads for the rest of the load, with intermiittent write bursts.
> The feeling we get from the reads is that SQL Server is reading through the
> existing index nodes to find leaves that need to be
> updated (splitting when necessary, etc.).
> The values in the non-clustered index are GUIDs and thus distributed
> throughout the index.
> --
> Scott Nichol
> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
> news:%23HGP4$LHIHA.1316@.TK2MSFTNGP02.phx.gbl...
>> What fill factors do you have? I'm thinking that there may be a lot of
>> page
>> splitting going on.
>> There could also be disk issues, but you did say that things looked OK
>> there - "Perfmon does not indicate any resource being pinned". What is
>> the
>> average disk queue length? Is it > 24?
>> It may come to pass that you need to use horizontal partitioning and break
>> the table up.
>> --
>> Tom
>> ----
>> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
>> SQL Server MVP
>> Toronto, ON Canada
>> https://mvp.support.microsoft.com/profile/Tom.Moreau
>>
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:OFCIl3KHIHA.4684@.TK2MSFTNGP06.phx.gbl...
>> Thanks for the responses. If the speed of the BULK INSERT were the only
>> criterion, dropping the non-clustered index would be great.
>> Without it, but with the clustered index and data still there, I load at
>> about 20,000 rows/sec. Unfortunately, re-building an index
>> on a table with 500+ million records takes much longer than the 3000
>> seconds
>> to load with the index intact.
>> What I am really wondering is whether anyone else has a similar scenario
>> (clustered and non-clustered indexes on a large table) and
>> whether the effect on BULK INSERT performance is the same order of
>> magnitude
>> as I am experiencing (a two-order-of-magnitude drop).
>> --
>> Scott Nichol
>> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
>> news:%238rvAkIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>> That was an excellent blog post but it was for SQL 2005. In SQL 2005,
>> improvements were made in bulk copy performance, alleviating the need to
>> drop indexes. The OP says that the system is SQL 2000 SP3a.
>> Looking at the fact that the data file is sorted in clustered index order
>> and there is one nonclustered index, I'd be inclined to drop the
>> nonclustered index, do the bulk insert and then recreate the nonclustered
>> index.
>> I'm also curious as to whether the data to be inserted are spread
>> throughout the table or added to the "end" of the table, i.e. whether
>> the clustered index keys to be inserted are greater than all existing
>> clustered index keys. If the data are spread throughout the table, I'd
>> want to reduce the fill factor.
>> --
>> Tom
>> ----
>> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
>> SQL Server MVP
>> Toronto, ON Canada
>> https://mvp.support.microsoft.com/profile/Tom.Moreau
>>
>> "Uri Dimant" <urid@.iscar.co.il> wrote in message
>> news:u6AXXaIHIHA.700@.TK2MSFTNGP05.phx.gbl...
>> Scott
>> http://sqlblog.com/blogs/linchi_shea/archive/2007/08/27/performance-impact-the-most-optimal-insert-script-can-t-beat-bulkcopy.aspx
>>
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:uDASsWIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>> Question/sanity check about BULK INSERT performance.
>> I load a 300,000 row text file into my largest table every day. The
>> file
>> loads into an empty table with the same structure and defaults but no
>> indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same
>> file
>> takes more like 3000 seconds to load into the "real" table, which has a
>> clustered index, one non-clustered index and 547,640,055 rows of data
>> (as
>> of today). Is the discrepancy in rate (30,000 rows/sec vs. 100
>> rows/sec)
>> reasonable? Is it because the indexes on the table are so deep that
>> updating them requires lots of page-at-a-time I/O?
>> Some useful info:
>> 1. The file being loaded is sorted in clustered index order
>> 2. The BULK INSERT has the ORDER clause
>> 3. The TABLOCK option on BULK INSERT makes no difference in performance
>> (lock is quickly escalated to table when the option is not specified)
>> 4. Database recovery mode is Bulk-Logged
>> 5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT
>> is
>> not even twice as fast
>> 6. SQL Server 2000 SP3 on Windows 2003
>> 7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
>> 8. Transaction log is on RAID 1, data on 12-spindle RAID 10,
>> non-clustered
>> index on 6-spindle RAID 10. All drives SCSI/320, 15k
>> 9. Perfmon does not indicate any resource being pinned (or even
>> challenged)
>> --
>> Scott Nichol
>>
>>
>|||> The values in the non-clustered index are GUIDs and thus distributed
> throughout the index.
As Tom said, random GUID values are a real bulk insert performance killer.
Buffer cache efficiency with these random values decreases as the table
grows, until you reach the point of requiring at least one physical I/O for
every row inserted.
One workaround I've used with SQL 2000 is to use a composite key instead of
the GUID alone. I don't know the details of your application, but in my
case I used a unique index consisting of FileId (a sequential integer) and
RecordId (a GUID value). This greatly improved bulk insert performance with
very large tables.
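To make the buffer-efficiency point concrete, here is a rough simulation of how many distinct index leaf pages a batch of random keys touches versus a batch of sequential keys. The page and row counts are illustrative assumptions, not measurements from this system:

```python
import random

# Illustrative assumptions (not measurements from the system discussed):
ROWS_PER_PAGE = 100           # rows per 8 KB index leaf page
EXISTING_ROWS = 100_000_000   # stand-in for a very large table
BATCH = 10_000                # rows in one bulk insert

# Model the index as leaf pages that each own a contiguous key range.
num_pages = EXISTING_ROWS // ROWS_PER_PAGE

def pages_touched(keys):
    # Distinct leaf pages a batch of keys lands on, assuming keys in [0, 1)
    # map uniformly onto pages in key order.
    return len({int(k * num_pages) for k in keys})

random.seed(42)
# Random GUID-like keys scatter uniformly over the whole key space.
random_keys = [random.random() for _ in range(BATCH)]
# Sequential keys all land at the "end" of the index.
sequential_keys = [0.999999999] * BATCH

print(pages_touched(random_keys))      # nearly one distinct page per row
print(pages_touched(sequential_keys))  # a single hot page
```

With 10,000 random keys spread over a million leaf pages, almost every inserted row needs its own page read, which matches the steady read activity seen on the RAID set holding the non-clustered index.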
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
>|||Thanks. You confirm our suspicions about the GUID: we seem to have lots of single I/Os when the index gets updated. There's
virtually no I/O queuing and the machine does not seem taxed at all.
If we use the sequential integer approach, we only need the GUID in the index when we hit 4G rows and start duplicating the integer
values we generate (assuming we use negative as well as positive values), right?
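The 4G figure comes straight from the range of a signed 32-bit int; a quick back-of-the-envelope check (Python, assuming the daily 300,000-row load stays constant):

```python
# A SQL Server int is a signed 32-bit integer.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
distinct_values = INT_MAX - INT_MIN + 1
print(distinct_values)   # 4294967296, i.e. the "4G rows" ceiling

# Rough time to exhaust the range at 300,000 new rows per day,
# starting from the ~547M rows already in the table.
rows_remaining = distinct_values - 547_640_055
years = rows_remaining / 300_000 / 365
print(round(years, 1))   # decades of headroom at the current load rate
```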
Since the value of the index is used as a pseudo-FK from another table, we'd have some work there as well. Not to mention, of
course, adding the new sequential column to the existing half billion rows. Too bad we did not attack the problem earlier!
It sounds like NEWSEQUENTIALID() would help us tremendously if we moved to SQL Server 2005. Is that correct? We want to migrate
during 2008 anyway. If it might do at least nearly as well, it would certainly be easier than retrofitting with the sequential
integer.
--
Scott Nichol
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message news:8187C560-FBFB-40E0-BE14-AB089E2936B3@.microsoft.com...
>|||> If we use the sequential integer approach, we only need the GUID in the
> index when we hit 4G rows and start duplicating the integer values we
> generate (assuming we use negative as well as positive values), right?
You don't need the GUID in the index at all if you go with a bigint and let
SQL Server assign the values with IDENTITY. The application I mentioned in
my response provided both the integer and GUID values to SQL Server. This
allowed the application to assign the surrogate key and bulk insert into
related tables.
> Since the value of the index is used as a pseudo-FK from another table,
> we'd have some work there as well. Not to mention, of course, adding the
> new sequential column to the existing half billion rows. Too bad we did
> not attack the problem earlier!
I learned this same lesson the hard way. I started with only a GUID
clustered PK and found that bulk insert performance dropped at about 5M
rows. I then changed to a non-clustered index to mitigate the problem, but
performance still dropped off with larger tables. Once I added the integer,
I was able to use the clustered primary key with good insert performance.
My experience was also a good lesson on Perfmon. The only way to really
know whether you have a disk bottleneck with a batch application is to know
the maximum throughput the disk subsystem can sustain. Once that limit is
hit, the only ways to get more throughput are to add hardware or to revise
the indexing strategy/application design.
> It sounds like NEWSEQUENTIALID() would help us tremendously if we moved to
> SQL Server 2005. Is that correct? We want to migrate during 2008 anyway.
> If it might do at least nearly as well, it would certainly be easier than
> retrofitting with the sequential integer.
Yes, a NEWSEQUENTIALID() default constraint (available in SQL 2005 and
2008), a bigint identity, or the composite key method will certainly improve
insert performance.
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:Ovyu9OWHIHA.280@.TK2MSFTNGP03.phx.gbl...
>|||Trying to avoid the pain of making any "real" change, I implemented my own alternative to NEWSEQUENTIALID(), and it ROCKS.
Using the function to generate GUID values (during data massaging) to load, BULK INSERT on my development machine took just 23
seconds to load each of two files, one 110,766 rows, the other 118,829. That's 4991 rows/sec on average. For comparison, 117,755
rows loaded in 14 seconds (8411 rows/sec) with the non-clustered index dropped. With "normal" GUIDs, I was loading just 90 rows/sec
on that rig. I look forward to seeing this on our production box.
The caveats are
1. I had dropped the non-clustered index to run the test without it, so the index that subsequent tests ran against is a fresh
re-build, presumably as unfragmented as possible.
2. The function I used to generate GUIDs was taken from an article on the web at
http://www.informit.com/articles/article.aspx?p=25862&seqNum=7. It combines 10 bytes of a normally generated GUID with 6 bytes of
the current time. The time portion provides the sequentialness; the GUID part provides the randomness. The algorithm seems very
unlikely to produce a duplicate, but I will review it to confirm before trying it in production.
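For reference, here is a sketch of the COMB-style idea described above (Python, purely for illustration). The exact byte layout is an assumption based on the description, not the code from the informit article:

```python
import time
import uuid

def comb_guid(now=None):
    # 10 random bytes from a normal GUID plus 6 bytes of current time in
    # milliseconds, so values generated close together in time also sort
    # close together. (Byte layout is an assumption, not the article's code.)
    millis = int((time.time() if now is None else now) * 1000)
    time_bytes = millis.to_bytes(6, "big")
    rand_bytes = uuid.uuid4().bytes[:10]
    # SQL Server sorts uniqueidentifier values with the final byte group
    # most significant, so putting the time last yields ascending inserts.
    return uuid.UUID(bytes=rand_bytes + time_bytes)

# Two GUIDs generated a second apart compare in generation order when the
# last six bytes are compared first (a stand-in for SQL Server's ordering).
a = comb_guid(now=1_000_000.0)
b = comb_guid(now=1_000_001.0)
print(a.bytes[-6:] < b.bytes[-6:])   # True
```

The duplicate risk mentioned above comes down to the 10 random bytes: two rows only collide if they draw the same 80 random bits within the same millisecond.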
Thanks for all the great feedback.
--
Scott Nichol
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message news:B0C99286-2CBB-4C26-AC61-B09D02965D41@.microsoft.com...
>> If we use the sequential integer approach, we only need the GUID in the index when we hit 4G rows and start duplicating the
>> integer values we generate (assuming we use negative as well as positive values), right?
> You don't need the GUID in the index at all if you go with a bigint and let SQL Server assign the values with IDENTITY. The
> application I mentioned in my response provided both the integer and GUID values to SQL Server. This allowed the application to
> assign the surrogate key and bulk insert into related tables.
>> Since the value of the index is used as a pseudo-FK from another table, we'd have some work there as well. Not to mention, of
>> course, adding the new sequential column to the existing half billion rows. Too bad we did not attack the problem earlier!
> I learned this same lesson the same lesson the hard way. I started with only GUID clustered PK and found that bulk performance
> dropped at about 5M rows. I then changed to a non-clustered index to mitigate the problem but performance still dropped off with
> larger tables. Once I added the integer, I was able to use the clustered primary key with good insert performance.
> My experience was also a good lesson on PMON. The only way to really know if you have a disk bottleneck with a batch application
> is to know the maximum throughput the disk subsystem can sustain. Once the limit is hit, the only ways to get more throughput is
> to either add hardware or revise the indexing strategy/application design.
>> It sounds like NEWSEQUENTIALID() would help us tremendously if we moved to SQL Server 2005. Is that correct? We want to migrate
>> during 2008 anyway. If it might do at least nearly as well, it would certainly be easier than retrofitting with the sequential
>> integer.
> Yes, a NEWSEQUENTIALID() default constraint (available in SQL 2005 and 2008), a bigint identity, or the composite key method will
> certainly improve insert performance.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message news:Ovyu9OWHIHA.280@.TK2MSFTNGP03.phx.gbl...
>> Thanks. You confirm our suspicions about the GUID: we seem to have lots of single I/Os when the index gets updated. There's
>> virtually no I/O queuing and the machine does not seem taxed at all.
>> If we use the sequential integer approach, we only need the GUID in the index when we hit 4G rows and start duplicating the
>> integer values we generate (assuming we use negative as well as positive values), right?
>> Since the value of the index is used as a pseudo-FK from another table, we'd have some work there as well. Not to mention, of
>> course, adding the new sequential column to the existing half billion rows. Too bad we did not attack the problem earlier!
>> It sounds like NEWSEQUENTIALID() would help us tremendously if we moved to SQL Server 2005. Is that correct? We want to migrate
>> during 2008 anyway. If it might do at least nearly as well, it would certainly be easier than retrofitting with the sequential
>> integer.
>> --
>> Scott Nichol
>> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message news:8187C560-FBFB-40E0-BE14-AB089E2936B3@.microsoft.com...
>> The values in the non-clustered index are GUIDs and thus distributed throughout the index.
>> As Tom said, random GUID values are a real bulk insert performance killer. Buffer efficiency with these random decreases
>> proportionally with table size until you get to the point of requiring at least one I/Os for every row inserted.
>> One workaround I've used with SQL 2000 is to use a composite key instead of GUID alone. I don't know of the details of your
>> application but in my case, I used a unique index consisting of FileId (a sequential integer) and RecordId (a GUID value). This
>> greatly improved bulk insert performance with very large tables.
>> --
>> Hope this helps.
>> Dan Guzman
>> SQL Server MVP
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
>> Good questions. Fill factor is 80% (the default?). The write pattern in general is no writes, then bursts every two minutes
>> or so at checkpoint. During bursts, write queue goes to 100 or more. Each burst is about 8 seconds. The read pattern is more
>> consistent and lighter, with read queue never higher than 6, but normally 0 (and that's with other concurrent activity in the
>> database). The reads are from the RAID 10 set with the non-clustered index. There is very little reading from the set with
>> the clustered index.
>> On a development machine where I can run the BULK INSERT without other activity, the disk activity on the RAID set with the
>> clustered index consists of one write burst of 4 seconds. The disk activity on the RAID set with the non-clustered index shows
>> consistent reads for the rest of the load, with intermittent write bursts.
>> The feeling we get from the reads is that SQL Server is reading through the existing index nodes to find leaves that need to be
>> updated (splitting when necessary, etc.).
>> The values in the non-clustered index are GUIDs and thus distributed throughout the index.
>> --
>> Scott Nichol
>> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message news:%23HGP4$LHIHA.1316@.TK2MSFTNGP02.phx.gbl...
>> What fill factors do you have? I'm thinking that there may be a lot of page
>> splitting going on.
>> There could also be disk issues, but you did say that things looked OK
>> there - "Perfmon does not indicate any resource being pinned". What is the
>> average disk queue length? Is it > 24?
>> It may come to pass that you need to use horizontal partitioning and break
>> the table up.
>> --
>> Tom
>> ----
>> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
>> SQL Server MVP
>> Toronto, ON Canada
>> https://mvp.support.microsoft.com/profile/Tom.Moreau
>>
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:OFCIl3KHIHA.4684@.TK2MSFTNGP06.phx.gbl...
>> Thanks for the responses. If the speed of the BULK INSERT were the only
>> criterion, dropping the non-clustered index would be great.
>> Without it, but with the clustered index and data still there, I load at
>> about 20,000 rows/sec. Unfortunately, re-building an index
>> on a table with 500+ million records takes much longer than the 3000 seconds
>> to load with the index intact.
>> What I am really wondering is whether anyone else has a similar scenario
>> (clustered and non-clustered indexes on a large table) and
>> whether the effect on BULK INSERT performance is the same order of magnitude
>> as I am experiencing (a two-order-of-magnitude drop).
>> --
>> Scott Nichol
>> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
>> news:%238rvAkIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>> That was an excellent blog post but it was for SQL 2005. In SQL 2005,
>> improvements were made in bulk copy performance, alleviating the need to
>> drop indexes. The OP says that the system is SQL 2000 SP3a.
>> Looking at the fact that the data file is sorted in clustered index order
>> and there is one nonclustered index, I'd be inclined to drop the
>> nonclustered index, do the bulk insert and then recreate the nonclustered
>> index.
>> I'm also curious as to whether or not the data to be inserted are spread
>> throughout the table or are they added to the "end" of the table, i.e. are
>> the clustered index keys to be inserted greater than all existing
>> clustered
>> index keys. If the data are spread throughout the table, I'd want to
>> reduce
>> the fill factor.
>> --
>> Tom
>> ----
>> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
>> SQL Server MVP
>> Toronto, ON Canada
>> https://mvp.support.microsoft.com/profile/Tom.Moreau
>>
>> "Uri Dimant" <urid@.iscar.co.il> wrote in message
>> news:u6AXXaIHIHA.700@.TK2MSFTNGP05.phx.gbl...
>> Scott
>> http://sqlblog.com/blogs/linchi_shea/archive/2007/08/27/performance-impact-the-most-optimal-insert-script-can-t-beat-bulkcopy.aspx
>>
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:uDASsWIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>>> Question/sanity check about BULK INSERT performance.
>>>
>>> I load a 300,000 row text file into my largest table every day. The file
>>> loads into an empty table with the same structure and defaults but no
>>> indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same
>>> file
>>> takes more like 3000 seconds to load into the "real" table, which has a
>>> clustered index, one non-clustered index and 547,640,055 rows of data (as
>>> of today). Is the discrepancy in rate (30,000 rows/sec vs. 100 rows/sec)
>>> reasonable? Is it because the indexes on the table are so deep that
>>> updating them requires lots of page-at-a-time I/O?
>>>
>>> Some useful info:
>>>
>>> 1. The file being loaded is sorted in clustered index order
>>> 2. The BULK INSERT has the ORDER clause
>>> 3. The TABLOCK option on BULK INSERT makes no difference in performance
>>> (lock is quickly escalated to table when the option is not specified)
>>> 4. Database recovery mode is Bulk-Logged
>>> 5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT is
>>> not even twice as fast
>>> 6. SQL Server 2000 SP3 on Windows 2003
>>> 7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
>>> 8. Transaction log is on RAID 1, data on 12-spindle RAID 10,
>>> non-clustered
>>> index on 6-spindle RAID 10. All drives SCSI/320, 15k
>>> 9. Perfmon does not indicate any resource being pinned (or even
>>> challenged)
>>>
>>> --
>>> Scott Nichol
>>>
>>
>>
>>
>>
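For reference, the daily load described in the original post would be issued roughly like this (a sketch; the file path, table name, column name, and terminators are assumptions, not from the thread):

```sql
-- Sketch of the daily load: the file is pre-sorted on the clustered key,
-- the ORDER hint lets SQL Server skip an internal sort, and TABLOCK
-- requests a table-level lock for minimally logged bulk loading.
BULK INSERT dbo.BigTable
FROM 'D:\loads\daily.txt'
WITH (
    ORDER (ClusteredKeyCol ASC),  -- matches the clustered index order
    TABLOCK,
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
);
```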
>

> The algorithm seems very unlikely to produce a duplicate, but I will
> review it to confirm before trying in production.
I'm glad you found an easy solution. As long as you use a primary key to
guarantee unique values, I don't see a problem using it.
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:eYlG65bHIHA.4272@.TK2MSFTNGP06.phx.gbl...
> Trying to avoid the pain of making any "real" change, I implemented my own
> alternative to NEWSEQUENTIALID(), and it ROCKS.
> Using the function to generate GUID values (during data massaging) to
> load, BULK INSERT on my development machine took just 23 seconds to load
> each of two files, one 110,766 rows, the other 118,829. That's 4991
> rows/sec on average. For comparison, 117,755 rows loaded in 14 seconds
> (8411 rows/sec) with the non-clustered index dropped. With "normal"
> GUIDs, I was loading just 90 rows/sec on that rig. I look forward to
> seeing this on our production box.
> The caveats are
> 1. I had dropped the non-clustered index to run the test without it, so
> the index that subsequent tests ran against is a fresh re-build,
> presumably as unfragmented as possible.
> 2. The function I used to generate GUIDs was taken from an article on the
> web at http://www.informit.com/articles/article.aspx?p=25862&seqNum=7. It
> combines 10 bytes of a normally generated GUID with 6 bytes of the current
> time. The time portion provides the sequential-ness; the GUID part the
> random-ness. The algorithm seems very unlikely to produce a duplicate,
> but I will review it to confirm before trying in production.
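The technique from that article (Jimmy Nilsson's "COMB" GUID) can be expressed as a T-SQL one-liner: keep the first 10 bytes of a fresh GUID and append 6 bytes of the current datetime. SQL Server compares the last 6 bytes of a uniqueidentifier as the most significant group when sorting, so values generated this way sort roughly by creation time:

```sql
-- COMB GUID: 10 random bytes from NEWID() + 6 bytes of current time.
-- SQL Server treats the final byte group of a uniqueidentifier as most
-- significant when sorting, so new values land near the end of the index.
SELECT CAST(
    CAST(NEWID() AS binary(10)) + CAST(GETDATE() AS binary(6))
    AS uniqueidentifier) AS CombGuid;
```

Note that GETDATE() has roughly 3.33 ms resolution, so rows generated in the same instant still rely on the 10 random bytes for uniqueness.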
> Thanks for all the great feedback.
> --
> Scott Nichol
> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
> news:B0C99286-2CBB-4C26-AC61-B09D02965D41@.microsoft.com...
>> If we use the sequential integer approach, we only need the GUID in the
>> index when we hit 4G rows and start duplicating the integer values we
>> generate (assuming we use negative as well as positive values), right?
>> You don't need the GUID in the index at all if you go with a bigint and
>> let SQL Server assign the values with IDENTITY. The application I
>> mentioned in my response provided both the integer and GUID values to SQL
>> Server. This allowed the application to assign the surrogate key and
>> bulk insert into related tables.
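A bigint IDENTITY surrogate as suggested here might be declared like this (a sketch; the names are hypothetical):

```sql
-- bigint IDENTITY surrogate key: server-assigned and strictly
-- increasing, so the clustered index only ever appends at the end.
-- The GUID column remains for external references but is no longer
-- part of any index key.
CREATE TABLE dbo.Records (
    RecordKey bigint IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Records PRIMARY KEY CLUSTERED,
    RecordId  uniqueidentifier NOT NULL,
    Payload   varchar(100)     NULL
);
```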
>> Since the value of the index is used as a pseudo-FK from another table,
>> we'd have some work there as well. Not to mention, of course, adding
>> the new sequential column to the existing half billion rows. Too bad we
>> did not attack the problem earlier!
>> I learned the same lesson the hard way. I started with
>> only a GUID clustered PK and found that bulk insert performance dropped at about
>> 5M rows. I then changed to a non-clustered index to mitigate the problem
>> but performance still dropped off with larger tables. Once I added the
>> integer, I was able to use the clustered primary key with good insert
>> performance.
>> My experience was also a good lesson on Perfmon. The only way to really
>> know if you have a disk bottleneck with a batch application is to know
>> the maximum throughput the disk subsystem can sustain. Once the limit is
>> hit, the only ways to get more throughput are to either add hardware or
>> revise the indexing strategy/application design.
>> It sounds like NEWSEQUENTIALID() would help us tremendously if we moved
>> to SQL Server 2005. Is that correct? We want to migrate during 2008
>> anyway. If it might do at least nearly as well, it would certainly be
>> easier than retrofitting with the sequential integer.
>> Yes, a NEWSEQUENTIALID() default constraint (available in SQL 2005 and
>> 2008), a bigint identity, or the composite key method will certainly
>> improve insert performance.
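On SQL 2005 and later, the NEWSEQUENTIALID() option mentioned here is only usable as a column DEFAULT; a sketch (names are hypothetical):

```sql
-- SQL Server 2005+: sequential GUIDs assigned by the server.
-- NEWSEQUENTIALID() is valid only in a DEFAULT constraint, so the
-- value cannot be generated ahead of time by the application.
CREATE TABLE dbo.SequentialDemo (
    RecordId uniqueidentifier NOT NULL
        CONSTRAINT DF_SequentialDemo_RecordId DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_SequentialDemo PRIMARY KEY,
    Payload  varchar(100) NULL
);
```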
>> --
>> Hope this helps.
>> Dan Guzman
>> SQL Server MVP
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:Ovyu9OWHIHA.280@.TK2MSFTNGP03.phx.gbl...
>> Thanks. You confirm our suspicions about the GUID: we seem to have lots
>> of single I/Os when the index gets updated. There's virtually no I/O
>> queuing and the machine does not seem taxed at all.
>> If we use the sequential integer approach, we only need the GUID in the
>> index when we hit 4G rows and start duplicating the integer values we
>> generate (assuming we use negative as well as positive values), right?
>> Since the value of the index is used as a pseudo-FK from another table,
>> we'd have some work there as well. Not to mention, of course, adding
>> the new sequential column to the existing half billion rows. Too bad we
>> did not attack the problem earlier!
>> It sounds like NEWSEQUENTIALID() would help us tremendously if we moved
>> to SQL Server 2005. Is that correct? We want to migrate during 2008
>> anyway. If it might do at least nearly as well, it would certainly be
>> easier than retrofitting with the sequential integer.
>> --
>> Scott Nichol
>

On Nov 1, 1:49 pm, "Scott Nichol" <snicholn...@.scottnichol.com> wrote:
> Question/sanity check about BULK INSERT performance.
> I load a 300,000 row text file into my largest table every day. The file loads into an empty table with the same structure and
> defaults but no indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same file takes more like 3000 seconds to load into
> the "real" table, which has a clustered index, one non-clustered index and 547,640,055 rows of data (as of today). Is the
> discrepancy in rate (30,000 rows/sec vs. 100 rows/sec) reasonable? Is it because the indexes on the table are so deep that updating
> them requires lots of page-at-a-time I/O?
> Some useful info:
> 1. The file being loaded is sorted in clustered index order
> 2. The BULK INSERT has the ORDER clause
> 3. The TABLOCK option on BULK INSERT makes no difference in performance (lock is quickly escalated to table when the option is not
> specified)
> 4. Database recovery mode is Bulk-Logged
> 5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT is not even twice as fast
> 6. SQL Server 2000 SP3 on Windows 2003
> 7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
> 8. Transaction log is on RAID 1, data on 12-spindle RAID 10, non-clustered index on 6-spindle RAID 10. All drives SCSI/320, 15k
> 9. Perfmon does not indicate any resource being pinned (or even challenged)
> --
> Scott Nichol
Hi Scott,
we have seen this behaviour for tables with clustered
indexes...remember that when a table has a clustered index the data
is in the index...our testing showed that putting data into tables
with clustered indexes would result in a 2-3x performance improvement for
dimensional style queries...alas...inserting rows into them is
terribly slow.
Other tips...
1. Make sure you pad the index so there is somewhere to put new rows.
2. We used multipart keys for the partitioning column based on country
and time...this is not 'standard' but it means that each query that
hits the database only goes to the country data it needs for the
period it needs. Create lots of small partitions and you will get
faster query response.
3. We also created _tmp tables and used the bulk loader to load the
temp table......we then perform a 'delete from _ins table where row
exists in _tmp table...and then we perform an insert into _ins table
select * from _tmp table. We found this the fastest way to get data
into a table with a clustered index. (In fact we wrote a utility to do
this so that it is only a one statement call in our schedule...we
prepared a Load Interface File and pass its name to this utility...)
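The staging-table pattern in tip 3 can be sketched in T-SQL as follows (table, file, and key names here are assumptions, not from the thread):

```sql
-- 1) Bulk load into an empty, index-free staging table (fast path).
BULK INSERT dbo.Fact_tmp
FROM 'D:\loads\daily.txt'
WITH (TABLOCK);

-- 2) Delete any rows in the real table that the new file replaces.
DELETE f
FROM dbo.Fact AS f
INNER JOIN dbo.Fact_tmp AS t ON t.KeyCol = f.KeyCol;

-- 3) Insert from staging into the clustered table in one set-based pass.
INSERT INTO dbo.Fact
SELECT * FROM dbo.Fact_tmp;
```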
As you point out...dropping and recreating the index is a non
starter.
On other databases (notably oracle)...it is possible to drop indexes
on individual partitions so that the indexes that are on the
partitions that are likely to be updated can be dropped and then put
back and the indexes on partitions that are not changed can stay.
My thoughts on all this are that MSFT are not very serious about
making SQL Server a good dimensional model query database...they are
spending their money on SSAS and advising most people to go with
cubes....we found very little materials anywhere in the world as to
how to best set up the underlying tables for fast query performance in
a dimensional model...we had to do a few weeks of research to
figure out how to make it go as fast as it is possible to make it
go...
Good Luck
Peter Nolan
www.peternolan.com

Thanks.
In other postings on this topic, I have discovered the root of the problem and a possible resolution.
The clustered index itself reduces speed very slightly compared to no indexes at all. The additional nonclustered index, the key
for which is a GUID, caused a two orders of magnitude reduction in speed. The solution I found with the least work is to generate
sequential GUIDs, somewhat like the NEWSEQUENTIALID() function in SQL Server 2005. Using these, the performance drop is less than
50% from the scenario with no nonclustered index.
--
Scott Nichol