On Nov 1, 1:49 pm, "Scott Nichol" <snicholn...@.scottnichol.com> wrote:
> Question/sanity check about BULK INSERT performance.
> I load a 300,000 row text file into my largest table every day. The file loads into an empty table with the same structure and
> defaults but no indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same file takes more like 3000 seconds to load into
> the "real" table, which has a clustered index, one non-clustered index and 547,640,055 rows of data (as of today). Is the
> discrepency in rate (30,000 rows/sec vs. 100 rows/sec) reasonable? Is it because the indexes on the table are so deep that updating
> them requires lots of page-at-a-time I/O?
> Some useful info:
> 1. The file being loaded is sorted in clustered index order
> 2. The BULK INSERT has the ORDER clause
> 3. The TABLOCK option on BULK INSERT makes no difference in performance (lock is quickly escalated to table when the option is not
> specified)
> 4. Database recovery mode is Bulk-Logged
> 5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT is not even twice as fast
> 6. SQL Server 2000 SP3 on Windows 2003
> 7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
> 8. Transaction log is on RAID 1, data on 12-spindle RAID 10, non-clustered index on 6-spindle RAID 10. All drives SCSI/320, 15k
> 9. Perfmon does not indicate any resource being pinned (or even challenged)
> --
> Scott Nichol
Hi Scott,
we have seen this behaviour for tables with clustered
indexes.....remember that when a table has a clustered index the data
is in the index...our testing showed that putting data into tables
with clustered indexs would result in 2-3x performance improvement for
dimensional style queries...alas....inserting rows into them is
terribly slow.....
Other tips...
1. Make sure you pad the index so there is somewere to put new rows.
2. We used multipart keys for the partitioning column based on country
and time...this is not 'standard' but it means that each query that
hits the database only goes to the country data it needs for the
period it needs. Create lots of small partitions and you will get
faster query response.
3. We also created _tmp tables and used the bulk loader to load the
temp table......we then perform a 'delete from _ins table where row
exists in _tmp table...and then we perform an insert into _ins table
select * from _tmp table. We found this the fastest way to get data
into a table with a clustered index. (In fact we wrote a utility to do
this so that it is only a one statement call in our schedule...we
prepared a Load Interface File and pass it's name to this utility...)
As you point out...dropping and recreating the index is a non
starter.
On other databases (notably oracle)...it is possible to drop indexes
on individual partitions so that the indexes that are on the
partitions that are likely to be updated can be dropped and then put
back and the indexes on partitions that are not changed can stay.
My thoughts on all this are that MSFT are not very serious about
making SQL Server a good dimensional model query database...they are
spending their money on SSAS and advising most people to go with
cubes....we found very little materials anywhere in the world as to
how to best set up the underlying tables for fast query performance in
a dimensional model......we had to do a fwe weeks of research to
figure out how to make it go as fast as it is possible to make it
go...
Good Luck
Peter Nolan
www.peternolan.com
Thanks.
In other postings on this topic, I have discovered the root of the problem and a possible resolution.
The clustered index itself reduces speed very slightly compared to no indexes at all. The additional nonclustered index, the key
for which is a GUID, caused a two orders of magnitude reduction in speed. The solution I found with the least work is to generate
sequential GUIDs, somewhat like the NEWSEQUENTIALID() function in SQL Server 2005. Using these, the performance drop is less than
50% from the scenario with no nonclustered index.
Scott Nichol
"Peter Nolan" <peter@.peternolan.com> wrote in message news:1194107237.342015.198360@.v3g2000hsg.googlegro ups.com...
> On Nov 1, 1:49 pm, "Scott Nichol" <snicholn...@.scottnichol.com> wrote:
> Hi Scott,
> we have seen this behaviour for tables with clustered
> indexes.....remember that when a table has a clustered index the data
> is in the index...our testing showed that putting data into tables
> with clustered indexs would result in 2-3x performance improvement for
> dimensional style queries...alas....inserting rows into them is
> terribly slow.....
> Other tips...
> 1. Make sure you pad the index so there is somewere to put new rows.
> 2. We used multipart keys for the partitioning column based on country
> and time...this is not 'standard' but it means that each query that
> hits the database only goes to the country data it needs for the
> period it needs. Create lots of small partitions and you will get
> faster query response.
> 3. We also created _tmp tables and used the bulk loader to load the
> temp table......we then perform a 'delete from _ins table where row
> exists in _tmp table...and then we perform an insert into _ins table
> select * from _tmp table. We found this the fastest way to get data
> into a table with a clustered index. (In fact we wrote a utility to do
> this so that it is only a one statement call in our schedule...we
> prepared a Load Interface File and pass it's name to this utility...)
> As you point out...dropping and recreating the index is a non
> starter.
> On other databases (notably oracle)...it is possible to drop indexes
> on individual partitions so that the indexes that are on the
> partitions that are likely to be updated can be dropped and then put
> back and the indexes on partitions that are not changed can stay.
> My thoughts on all this are that MSFT are not very serious about
> making SQL Server a good dimensional model query database...they are
> spending their money on SSAS and advising most people to go with
> cubes....we found very little materials anywhere in the world as to
> how to best set up the underlying tables for fast query performance in
> a dimensional model......we had to do a fwe weeks of research to
> figure out how to make it go as fast as it is possible to make it
> go...
> Good Luck
> Peter Nolan
> www.peternolan.com
>
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment