Tuesday, March 27, 2012
bulk load option native? or formatfile?
I tested bulk load. I had understood that the native option is faster, but my test showed that the format file option is faster than the native option.
The test data is 120,000 rows, and the text file is 25 MB. I monitored log records, log bytes, and elapsed time.
Result: log records, log bytes, elapsed time (sec)
native 7934, 468212, 12
formatfile 5041, 297452, 9
Is this a special case because the data size is so small? Or is it generally true?
Thank you!
There are always exceptions to a rule. SQL Server documentation says that
native format is the fastest form of data transfer. There is also a table
in SQL Server Books Online that tells you which format to use under which
circumstances.
Hope this helps,
Eric Cárdenas
Senior support professional
This posting is provided "AS IS" with no warranties, and confers no rights.
Sunday, March 25, 2012
Bulk Insert: One Row not importing in...
Ok so I got this working:
BULK INSERT
dbo.tbCheckPointTest
FROM 'c:\inetpub\wwwroot\upload\maxie_task_aging.csv'
WITH
(DATAFILETYPE = 'char', FIELDTERMINATOR = ',',
FIRSTROW = 2, ROWS_PER_BATCH = 6, ROWTERMINATOR = '\n')
My data file looks like this:
Total of 6 rows, so I indicate to start on Row 2, yet only the last three lines are getting into my table.
I have changed and manipulated the query in various ways, but still just the last three lines get imported.
Thoughts?
Thanks!
Hi,
From the looks of your data file, it is difficult to tell how many blank lines are associated with your data file.
It appears that "Research" may begin on line 4, in which case you would have only three rows of data: "Research", "State", and "Program".
I am confused by the fact that you get the last three rows: "State", "Program", and "Program Tech".
Is it possible that some rows are terminated by "0A0D" and others only by "0A" or "0D", and "\n" is being interpreted as strictly "0A0D" (hexadecimal)?
I also am confused by your use of "FIELDTERMINATOR = ',' ", since I don't see any commas in the data you have displayed.
Dan
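Dan's hunch about inconsistent terminators can be checked before loading. Here is a small Python sketch (not from the original thread; the function name is mine) that counts how each line of the raw file ends:

```python
def terminator_counts(data: bytes) -> dict:
    """Count line endings: CRLF (0D 0A), bare CR (0D), and bare LF (0A)."""
    counts = {"crlf": 0, "cr": 0, "lf": 0}
    i = 0
    while i < len(data):
        if data[i:i + 2] == b"\r\n":
            counts["crlf"] += 1
            i += 2
        elif data[i:i + 1] == b"\r":
            counts["cr"] += 1
            i += 1
        elif data[i:i + 1] == b"\n":
            counts["lf"] += 1
            i += 1
        else:
            i += 1
    return counts
```

If the counts come back mixed, some rows will not match whatever single ROWTERMINATOR you pick.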
|||Sorry. I had the .csv open in Excel when I copied and pasted it:
REPORT_EXECUTED_ON:20070404_110001
Research Specialist - Research , 340, 91, 2635
State Worker (Interim) , 918, 3, 0
Program Specialist - Quality Assurance Specialist , 16, 1, 0
Program Technician - Data Entry Hold , 82, 28, 160
I am going to see if anything you suggested will help.
|||The problem is that BULK INSERT expects all the rows in the file to have the correct number of fields for the target table (even the rows being skipped). So it ignores the newlines until it has found enough commas (and then counts any further commas, until it finds the row terminator, as part of that field).
Try loading the file:
REPORT_EXECUTED_ON:20070404_110001,,,
,,,
Research Specialist - Research , 340, 91, 2635
State Worker (Interim) , 918, 3, 0
Program Specialist - Quality Assurance Specialist , 16, 1, 0
Program Technician - Data Entry Hold , 82, 28, 160
Though you will need a FIRSTROW = 3
If you put the last field into a varchar rather than an int, then if you add commas to the end (e.g. 2635,,,,) you will find that these commas end up in that field (if not, it will complain about unexpected characters when it tries to convert the value to an int or whatever).
So any rows to be skipped should have the correct number of field separators in them (3) or they will not be included in the skip count.
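The workaround above can be automated by padding every short line with enough separators before loading. A hypothetical Python sketch (the function and its defaults are mine, not from the thread):

```python
def pad_fields(lines, n_fields, sep=","):
    """Append separators so every line has n_fields - 1 of them, making
    header/blank lines count as complete records for FIRSTROW skipping."""
    out = []
    for line in lines:
        missing = (n_fields - 1) - line.count(sep)
        out.append(line + sep * max(missing, 0))
    return out
```

Note this naive count would also pad data lines that legitimately contain fewer separators; quoted CSV would need a real parser.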
|||Thank you, that solved the issue. Now I have to get w/the programmer who delivers the file to me.
Nice catch, Dhericean!
Dan
Bulk Insert vs. Data Flow Task (different row results using flat file source)
With the Bulk Import Task I'm putting all the csv rows in one column. With the Data Flow Task I'm mapping each csv value to its own column in the SQL table.
I used two different flat file sources and got the following:
Flat file 1: Bulk Import Task = 12,649,499 rows; Data Flow Task = 4,215,817 rows
Flat file 2: Bulk Import Task = 3,403,254 rows; Data Flow Task = 1,134,359 rows
Anyone have any guess as to why this is happening?
It seems that there's a factor of 3 between the two components, which could mean that the Flat File Connection Manager is interpreting 3 rows as 1. This might be related to the row delimiter in the Flat File connection manager.
Can you try to use preview in the Flat File connection manager, and see if rows show up correctly?
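One quick way to test the factor-of-3 theory outside the designer is to count how often each candidate row delimiter occurs in the file. A small Python sketch (the name is mine, not from the thread):

```python
def row_counts(data: str, delimiters=("\r\n", "\n", "\r")):
    """Count occurrences of each candidate row delimiter.  Note the counts
    overlap: every "\r\n" also contains one "\r" and one "\n".  If the
    configured delimiter's count is about a third of another's, the
    connection manager is likely merging rows."""
    return {d: data.count(d) for d in delimiters}
```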
|||Yeah, thanks for the reply. I've checked the preview and it looks fine. I've also tracked down some of the skipped rows and there's nothing significant about them. They look exactly the same as all the others. Perhaps there is a hidden character or something? Is this possible?|||
Can you compare the rows imported by FlatFile with the rows imported by Bulk insert task and see if there is any difference between them?
Thanks,
Ovidiu Burlacu
Can you put a data viewer in your pipeline and see what rows pass through at execution time?
Thanks,
Ovidiu Burlacu
BULK INSERT UTF8 File ERROR 4864 on first row first column
We are trying to import a UTF-8 file (at first with the SSIS BULK INSERT task, but now with plain SQL to track down the errors) and always face the same issue: the first row does not import, with error "Msg 4864, Level 16, State 1, Line 10 Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (PSGT_ID)."
The statements are below. We need to use UTF-8 because that's the only format we can get; with ANSI or Unicode it works, so why not UTF-8? (Codepage 65001?)
--CREATE TABLE [BULKTEST](
--  [PSGT_ID] [numeric](38, 0) NOT NULL primary key,
--  [PSGT_PSG_ID] [numeric](38, 0) NOT NULL,
--  [PSGT_PSP_ID] [numeric](38, 0) NOT NULL,
--  [PSGT_KURZTEXTM] [nvarchar](5) COLLATE Latin1_General_CI_AS NULL,
--  [PSGT_TEXT] [nvarchar](40) COLLATE Latin1_General_CI_AS NULL,
--  [PSGT_AENDDATUM] [datetime] NOT NULL)
truncate table [BULKTEST]
BULK INSERT [BULKTEST] FROM 'C:\test.txt' WITH( CHECK_CONSTRAINTS, CODEPAGE='65001', DATAFILETYPE='char', FIELDTERMINATOR='|', ROWTERMINATOR='\n')
select * from [BULKTEST]
As test data you could use something like this
1|2|3|4|5|2006-01-01
2|2|3|4|5|2006-01-01
3|2|3|4|5|2006-01-01
4|2|3|4|5|2006-01-01
in notepad in a text file stored as UTF8!
We really need help on this!
Best regards, HANNES
Try using the DATAFILETYPE = 'widechar'. You can also try using a format file.|||I have tried widechar and also a format file - but it stays the same.
Maybe we are a little bit further - could it be that Notepad incorrectly adds some characters at the beginning of the file?
Like  - if I display them on the command line with type <filename>?
LG HANNES
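Those three characters are almost certainly the UTF-8 byte-order mark: EF BB BF, which renders as  when viewed as Windows-1252. A Python sketch (my own, not from the thread) that strips it before handing the file to BULK INSERT:

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop the UTF-8 BOM (EF BB BF) if the file starts with one.
    Viewed as Windows-1252, those three bytes display as ""."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Even without the BOM, whether codepage 65001 is accepted at all depends on the SQL Server version, so this may only be part of the fix.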
|||Check this Blog http://sqlblogcasts.com/blogs/dong/archive/2006/11/27/how-to-bulk-insert-your-user-defined-types-udt.aspx in this case.
Thursday, March 22, 2012
Bulk Insert Task - Where Does The First Row Begin?
I am using the Bulk Insert Task to bulk copy data from a flat file to a SQL Server 2005 table. The flat file contains pipe ( | ) delimited columns and several thousand records which are {CR}{LF} delimited. The file also contains a header record and trailer record, which contain several words but also contains a ( | ) symbol among those words.
E.g.:
HEDR | yadi yadi yada
500 | Data Data | More Data | 600
460 | Datum | More More | 705
550 | Data | Data | 603
FOOTR | yadi yadi yada
I need to ignore the header and trailer records and just pick up the proper records. But even though I manually set the First Row property to 2 and the Last Row property to 4, it does not pick up the first true record, i.e. the record which begins with 500; and if I set First Row to 1, it throws an error citing 'truncation error' or similar. I think it is treating the header row and the first data row as one record, and since there are then more pipes ( | ) than expected, the error is thrown.
I've tried setting different values for these properties but to no avail... Any help will be deeply appreciated...
Thanks
Gogula
Bulk insert task expects the header and footer records to be in the same format as the other data rows, so to do what you are trying to do, your data needs to be in this format:
HEDR | yadi | yadi | yada
500 | Data Data | More Data | 600
460 | Datum | More More | 705
550 | Data | Data | 603
FOOTR| yadi | yadi | yada
This is the same behavior as Transact-SQL bulk insert, and if you try something like the following with your file, you will encounter errors as well. Therefore, this is by design, according to what bulk insert supports.
BULK INSERT tempdb.dbo.Table_1
FROM 'C:\\File.txt'
WITH
(
FIELDTERMINATOR ='|',
ROWTERMINATOR ='\n',
firstrow = 2,
lastrow = 4
)
|||Hi Rangeeta,
Thanks for the reply. But what if there can be more than one header and footer (which I programmatically count and take into consideration), and those headers cannot be separated by the delimiters (since that is how the client sends the file to us)? It seems a little unfair that Bulk Insert expects the headers to be in the same format as the other rows...
Are there any alternate methods to do this?
Thanks again
Gogula
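One alternate method is simply to pre-process the file and drop any record whose first field marks it as a header or footer, so BULK INSERT never sees a non-conforming row. A hypothetical Python sketch (the prefixes come from the sample data above; adjust for the real feed):

```python
def strip_header_footer(lines, prefixes=("HEDR", "FOOTR")):
    """Drop any line starting with a header/footer marker, leaving a
    uniform pipe-delimited file that loads without FIRSTROW/LASTROW."""
    return [line for line in lines if not line.startswith(prefixes)]
```

This also handles multiple headers and footers for free, since every marked line is removed regardless of position.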
Bulk insert skips rows
Hi Guys,
My little bulk insert is only bringing in every second row of a CSV file. This is not good, as I need every row.
My SQLcommand is thus.
InsertCommand="BULK INSERT TBL_Unitel_services FROM 'C:/webroot/servicedesk/csvs_Services/csv.csv' WITH (FIRSTROW = 1, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', MAXERRORS = 0) "
Everything looks fine; just change this
ROWTERMINATOR = '\n'
as
ROWTERMINATOR = '\r\n'
Let me know if you still have issues or Mark the thread as answered if it answers your issue.
|||No joy.
I am sure it's because I have a computed column at the end of the table. If I remove it, all the rows are entered; with it, only half come in. GAHHHH!!
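When you are unsure whether a file uses \n, \r\n, or a mixture, normalizing the endings before the load removes the guesswork entirely. A Python sketch of that pre-processing step (my own, not from the thread):

```python
def normalize_newlines(data: bytes) -> bytes:
    """Rewrite CRLF, bare CR, and bare LF endings all as CRLF, so a
    single ROWTERMINATOR of CRLF matches every row."""
    data = data.replace(b"\r\n", b"\n")   # collapse CRLF to LF
    data = data.replace(b"\r", b"\n")     # stray CRs become LF
    return data.replace(b"\n", b"\r\n")   # emit uniform CRLF
```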
BULK INSERT skips a row
Hi. Not sure if this is a bug with bulk insert or standard behaviour.
We have a flat file with a header record. A simplified example follows:
HEADER
John,Smith
Fred,Bloggs
But when we set bulk insert to skip the first row it actually skips the
John,Smith row as well. Run the script below after saving the CSV data out
to a file called user.csv...
create table [User]
(
FirstName varchar ( 32 ),
LastName varchar ( 32 )
)
bulk insert [User] from 'user.csv'
with
(
FIRSTROW = 2,
DATAFILETYPE = 'char',
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
select * from [user] -- only one record
Please help!!
McG
|||Some solutions are
1. Pre-process the file to remove the header line (use a DOS
or Unix-like utility, for example).
2. Bulk insert the data into a staging table with one column,
then parse the data into columns using SQL instead of the
bulk insert process.
3. Use SQL Server Integration Services. I'm no expert, but I believe
it can skip (or import into a different table) rows based on the
contents of the row, which bulk insert cannot do.
The FIRSTROW parameter for bulk insert, as you have discovered,
is not a FIRST_LINE_OF_THE_TEXT_FILE parameter.
Steve Kass
Drew University
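Steve's first suggestion, sketched here in Python rather than a DOS utility (the function name is mine):

```python
def drop_header(data: bytes) -> bytes:
    """Return the file contents without their first line, so the
    remaining rows all conform and FIRSTROW is no longer needed."""
    nl = data.find(b"\n")
    return data[nl + 1:] if nl >= 0 else data
```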
|||Anyone with ideas on how to get around this limitation?
McG
|||McG,
Alas, this is standard behaviour.
BULK INSERT and BCP live in a very squared world. They don't think in
terms of lines; they think in terms of records and fields (though they
call them rows and columns).
The first field runs from the beginning of the file until the terminator for
the first field. (For terminated fields. Fields can also be fixed-length,
or include a length-prefix.) The second field lasts until the terminator
for that field. And so on, up to the last field which lasts until the
terminator for that field. That terminator is called a "row terminator",
but as long as BCP is not reading the last field, the terminator carries
no meaning. Once the last field has been read, bulk copy starts with
the first field again.
This has some advantages and disadvantages. The good part is that
you can handle embedded row terminators, for instance embedded newlines
(except in the last field). The bad part is that you can't skip a header
which does not conform with the file. Because as far as BCP is concerned,
that header is part of the first field.
The workaround is to run a program first that strips the header, which
for a large file is not very appealing.
I have submitted a suggestion for an improvement to bulk copy
for a future release of SQL Server, so that you should be able to describe
a header to be skipped. The suggestion is on
http://lab.msdn.microsoft.com/productfeedback/ but alas, the site
appears to be experiencing some problems right now, so I cannot give the
exact link. But when it's back, search by name for "bulk" and SQL 2005
to find it and vote for it.
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx
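Erland's description of the scan can be modeled in a few lines. This Python sketch (mine, a simplification that ignores fixed-length and length-prefixed fields) shows exactly why FIRSTROW = 2 swallows the John,Smith line: the header becomes part of the first record's first field.

```python
def bcp_records(data: str, field_sep: str, row_sep: str, n_fields: int):
    """Model the bulk-copy scan: read n_fields - 1 fields up to field_sep,
    then one field up to row_sep.  Line breaks inside earlier fields carry
    no meaning, exactly as described above."""
    records, rec, i = [], [], 0
    while i < len(data):
        sep = field_sep if len(rec) < n_fields - 1 else row_sep
        j = data.find(sep, i)
        if j < 0:
            break
        rec.append(data[i:j])
        i = j + len(sep)
        if len(rec) == n_fields:
            records.append(rec)
            rec = []
    return records
```

For the user.csv example the first record comes out as ['HEADER\nJohn', 'Smith'], so skipping "row" 1 discards the header and John together. Pad the header line with a comma and the records line up again.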
Bulk Insert Row Terminator
Thanks,
If it's just a line feed (hex 0A), then you should be able to use '\l' as the rowterminator. (Note that is a lowercase "L".)
Terri
|||Ryan, did that work for you?
BULK INSERT row terminator
I have a multiple text files with an odd row terminator. If you were to
examine it in VB it would be like a "CrCrLf" instead of just "CrLf". In
hex it is 0D0D0A instead of just 0D0A. When I am trying to import into my
table using BULK INSERT I use "\n" as the row terminator but that is
putting the previous character into the column, and then it signals
a carriage return when I attempt to query the data.
Any suggestions on what I should use as the row terminator? Is it
possible to tell BULK INSERT to use something like "CHAR(10)\n"?
"\n\n" does NOT work.
Thanks in advance.
|||I found it!
"\r\n" works like a charm. "\r" is for the carriage return only. I'm
surprised I didn't already know that.
I found the answer here:
http://www4.dogus.edu.tr/bim/bil_ka...l65dba/ch16.htm
under table:
Table 16.4. Valid field terminators.
Terminator type            Syntax
tab                        \t
new line                   \n
carriage return            \r
backslash                  \\
NULL terminator            \0
user-defined terminator    character (^, %, *, and so on)
Tuesday, March 20, 2012
bulk insert problem
hi guys, I have a data file that I like to import to a table.
the column delimiter is @., the row delimiter is \n, a line feed.
But here is the problem: we have a column called "comment", and basically the user can enter any characters into it. So if the "comment" column contains the @. character, SQL Server thinks it's an extra column. That's why we put \ before the @. character; it's \@. in the comment column now. BUT how come it doesn't treat \@. as escaped? Is it possible to create an exception for the delimiter, so that if SQL Server sees \@. it will ignore it and not treat it as a delimiter?
thanks
|||Why not do a global replace of @. with some other sequence of characters that
is not already found in the file? Import the file and then do another
replace in the column to put the characters back to @..
Andrew J. Kelly SQL MVP
|||In this case, is it possible to write a perl script to do the global replacement on
Windows?
That way I can automate the process.
|||I thought this might have been a one-time import, but I don't see why you
can't use perl (or some other utility) to automate this.
Andrew J. Kelly SQL MVP
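The replace-and-restore round trip Andrew describes could look like this in Python (perl would work just as well; the placeholder byte is my arbitrary choice, anything guaranteed absent from the data will do):

```python
PLACEHOLDER = "\x01"  # any sequence guaranteed absent from the data

def protect_delimiter(text: str, escaped: str = "\\@.") -> str:
    """Before the import: hide the escaped delimiter so the loader
    only splits on real @. delimiters."""
    return text.replace(escaped, PLACEHOLDER)

def restore_delimiter(text: str, delim: str = "@.") -> str:
    """After the import: turn the placeholder back into a literal @.
    (the backslash was only ever an escape marker)."""
    return text.replace(PLACEHOLDER, delim)
```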
Monday, March 19, 2012
BULK INSERT performance
Question/sanity check about BULK INSERT performance.
I load a 300,000 row text file into my largest table every day. The file loads into an empty table with the same structure and
defaults but no indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same file takes more like 3000 seconds to load into
the "real" table, which has a clustered index, one non-clustered index and 547,640,055 rows of data (as of today). Is the
discrepancy in rate (30,000 rows/sec vs. 100 rows/sec) reasonable? Is it because the indexes on the table are so deep that updating
them requires lots of page-at-a-time I/O?
Some useful info:
1. The file being loaded is sorted in clustered index order
2. The BULK INSERT has the ORDER clause
3. The TABLOCK option on BULK INSERT makes no difference in performance (lock is quickly escalated to table when the option is not
specified)
4. Database recovery mode is Bulk-Logged
5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT is not even twice as fast
6. SQL Server 2000 SP3 on Windows 2003
7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
8. Transaction log is on RAID 1, data on 12-spindle RAID 10, non-clustered index on 6-spindle RAID 10. All drives SCSI/320, 15k
9. Perfmon does not indicate any resource being pinned (or even challenged)
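As a rough sanity check on the "indexes so deep" theory, one can estimate the depth of the clustered index B-tree. The densities below are invented round numbers, not measurements from this server:

```python
import math

def btree_depth(rows, rows_per_leaf=100, keys_per_internal=500):
    """Estimate B-tree depth: compute the leaf count, then repeatedly
    divide by the internal-node fanout until one root page remains."""
    depth, nodes = 1, math.ceil(rows / rows_per_leaf)
    while nodes > 1:
        nodes = math.ceil(nodes / keys_per_internal)
        depth += 1
    return depth
```

Under these assumptions a 547-million-row table is only about 4 levels deep, which suggests tree depth alone does not explain a 300x slowdown; the number of distinct leaf pages each batch touches (and the resulting I/O) is the more likely cost.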
--
Scott Nichol
Scott
http://sqlblog.com/blogs/linchi_shea/archive/2007/08/27/performance-impact-the-most-optimal-insert-script-can-t-beat-bulkcopy.aspx
>|||That was an excellent blog post but it was for SQL 2005. In SQL 2005,
improvements were made in bulk copy performance, alleviating the need to
drop indexes. The OP says that the system is SQL 2000 SP3a.
Looking at the fact that the data file is sorted in clustered index order
and there is one nonclustered index, I'd be inclined to drop the
nonclustered index, do the bulk insert and then recreate the nonclustered
index.
I'm also curious as to whether or not the data to be inserted are spread
throughout the table or are they added to the "end" of the table, i.e. are
the clustered index keys to be inserted greater than all existing clustered
index keys. If the data are spread throughout the table, I'd want to reduce
the fill factor.
--
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
SQL Server MVP
Toronto, ON Canada
https://mvp.support.microsoft.com/profile/Tom.Moreau
|||Tom
> drop indexes. The OP says that the system is SQL 2000 SP3a.
Correct, just missed it.
> Looking at the fact that the data file is sorted in clustered index order
> and there is one nonclustered index, I'd be inclined to drop the
> nonclustered index, do the bulk insert and then recreate the nonclustered
> index.
Agreed as well
>|||Thanks for the responses. If the speed of the BULK INSERT were the only criterion, dropping the non-clustered index would be great.
Without it, but with the clustered index and data still there, I load at about 20,000 rows/sec. Unfortunately, re-building an index
on a table with 500+ million records takes much longer than the 3000 seconds to load with the index intact.
What I am really wondering is whether anyone else has a similar scenario (clustered and non-clustered indexes on a large table) and
whether the effect on BULK INSERT performance is the same order of magnitude as I am experiencing (a 2-order-of-magnitude drop).
--
Scott Nichol
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message news:%238rvAkIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
> That was an excellent blog post but it was for SQL 2005. In SQL 2005,
> improvements were made in bulk copy performance, alleviating the need to
> drop indexes. The OP says that the system is SQL 2000 SP3a.
> Looking at the fact that the data file is sorted in clustered index order
> and there is one nonclustered index, I'd be inclined to drop the
> nonclustered index, do the bulk insert and then recreate the nonclustered
> index.
> I'm also curious as to whether or not the data to be inserted are spread
> throughout the table or are they added to the "end" of the table, i.e. are
> the clustered index keys to be inserted greater than all existing clustered
> index keys. If the data are spread throughout the table, I'd want to reduce
> the fill factor.
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
> SQL Server MVP
> Toronto, ON Canada
> https://mvp.support.microsoft.com/profile/Tom.Moreau
>
> "Uri Dimant" <urid@.iscar.co.il> wrote in message
> news:u6AXXaIHIHA.700@.TK2MSFTNGP05.phx.gbl...
> Scott
> http://sqlblog.com/blogs/linchi_shea/archive/2007/08/27/performance-impact-the-most-optimal-insert-script-can-t-beat-bulkcopy.aspx
>
> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
> news:uDASsWIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>> Question/sanity check about BULK INSERT performance.
>> I load a 300,000 row text file into my largest table every day. The file
>> loads into an empty table with the same structure and defaults but no
>> indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same file
>> takes more like 3000 seconds to load into the "real" table, which has a
>> clustered index, one non-clustered index and 547,640,055 rows of data (as
>> of today). Is the discrepency in rate (30,000 rows/sec vs. 100 rows/sec)
>> reasonable? Is it because the indexes on the table are so deep that
>> updating them requires lots of page-at-a-time I/O?
>> Some useful info:
>> 1. The file being loaded is sorted in clustered index order
>> 2. The BULK INSERT has the ORDER clause
>> 3. The TABLOCK option on BULK INSERT makes no difference in performance
>> (lock is quickly escalated to table when the option is not specified)
>> 4. Database recovery mode is Bulk-Logged
>> 5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT is
>> not even twice as fast
>> 6. SQL Server 2000 SP3 on Windows 2003
>> 7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
>> 8. Transaction log is on RAID 1, data on 12-spindle RAID 10, non-clustered
>> index on 6-spindle RAID 10. All drives SCSI/320, 15k
>> 9. Perfmon does not indicate any resource being pinned (or even
>> challenged)
>> --
>> Scott Nichol
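For reference, points 1-4 of the quoted post correspond to a load along these lines. This is an illustrative sketch only: the table, file path, and column names (DailyLoad, TradeDate, RecordId) are assumptions, not the poster's actual schema.

```sql
-- Hypothetical sketch of the daily load described above (SQL Server 2000).
BULK INSERT dbo.DailyLoad
FROM 'D:\loads\daily.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    -- File is pre-sorted in clustered index order, so declaring ORDER
    -- lets SQL Server skip an internal sort (point 1 and 2)
    ORDER (TradeDate ASC, RecordId ASC),
    -- Take a table lock up front rather than escalating row locks (point 3)
    TABLOCK,
    -- Batching reduces log pressure under Bulk-Logged recovery (point 4)
    BATCHSIZE = 50000
);
```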
>|||What fill factors do you have? I'm thinking that there may be a lot of page
splitting going on.
There could also be disk issues, but you did say that things looked OK
there - "Perfmon does not indicate any resource being pinned". What is the
average disk queue length? Is it > 24?
It may come to pass that you need to use horizontal partitioning and break
the table up.
--
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
SQL Server MVP
Toronto, ON Canada
https://mvp.support.microsoft.com/profile/Tom.Moreau
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:OFCIl3KHIHA.4684@.TK2MSFTNGP06.phx.gbl...
Thanks for the responses. If the speed of the BULK INSERT were the only
criterion, dropping the non-clustered index would be great.
Without it, but with the clustered index and data still there, I load at
about 20,000 rows/sec. Unfortunately, re-building an index
on a table with 500+ million records takes much longer than the 3000 seconds
to load with the index intact.
What I am really wondering is whether anyone else has a similar scenario
(clustered and non-clustered indexes on a large table) and
whether the effect on BULK INSERT performance is of the same order of magnitude
as I am experiencing (a two-orders-of-magnitude drop).
--
Scott Nichol
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:%238rvAkIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
> That was an excellent blog post but it was for SQL 2005. In SQL 2005,
> improvements were made in bulk copy performance, alleviating the need to
> drop indexes. The OP says that the system is SQL 2000 SP3a.
> [rest of quoted text snipped]
>|||Good questions. Fill factor is 80% (the default?). The write pattern in general is no writes, then bursts every two minutes or so
at checkpoint. During bursts, write queue goes to 100 or more. Each burst is about 8 seconds. The read pattern is more consistent
and lighter, with read queue never higher than 6, but normally 0 (and that's with other concurrent activity in the database). The
reads are from the RAID 10 set with the non-clustered index. There is very little reading from the set with the clustered index.
On a development machine where I can run the BULK INSERT without other activity, the disk activity on the RAID set with the
clustered index consists of one write burst of 4 seconds. The disk activity on the RAID set with the non-clustered index shows
consistent reads for the rest of the load, with intermittent write bursts.
The feeling we get from the reads is that SQL Server is reading through the existing index nodes to find leaves that need to be
updated (splitting when necessary, etc.).
The values in the non-clustered index are GUIDs and thus distributed throughout the index.
--
Scott Nichol
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message news:%23HGP4$LHIHA.1316@.TK2MSFTNGP02.phx.gbl...
> [quoted text snipped]
>>
>|||The fact that you're using GUIDs explains a lot. SQL 2005 introduces a new
function - NEWSEQUENTIALID() - that can alleviate the problems you get
with NEWID(). NEWID() creates random GUIDs, so when you do INSERTs, they land
all over the place. I would try lowering the fill factor on the
nonclustered index. Start with 50 or 60. After the bulk insert, defrag the
index.
Try running DBCC SHOWCONTIG to report how fragmented that index is.
As for the clustered index, is it a monotonically increasing key? If so,
make the fill factor 100, since a lower fill factor won't buy you anything.
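The checks and rebuilds suggested above might look like the following on SQL 2000. The table and index names are illustrative, not taken from the thread.

```sql
-- Report how fragmented the nonclustered index is:
DBCC SHOWCONTIG ('dbo.BigTable', 'IX_BigTable_RecordGuid');

-- Rebuild it with a lower fill factor (e.g. 60) to leave free space in
-- leaf pages for the randomly distributed GUID inserts:
DBCC DBREINDEX ('dbo.BigTable', 'IX_BigTable_RecordGuid', 60);

-- After the bulk insert, defragment in place without a full offline
-- rebuild (0 = current database):
DBCC INDEXDEFRAG (0, 'dbo.BigTable', 'IX_BigTable_RecordGuid');
```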
--
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
SQL Server MVP
Toronto, ON Canada
https://mvp.support.microsoft.com/profile/Tom.Moreau
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
[quoted text snipped]
>|||"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message news:eHmpCuTHIHA.4040@.TK2MSFTNGP02.phx.gbl...
> The fact that you're using GUIDs says much. In SQL 2005, they have a new
> function - NEWSEQUENTIALID() - that can alleviate the problems that you get
> with NEWID().
That sounds very useful. One more reason to look forward to migration next year.
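For anyone planning the same migration, a minimal sketch of how NEWSEQUENTIALID() is used (SQL 2005 only; table and column names are assumed):

```sql
-- NEWSEQUENTIALID() can only be used as a column DEFAULT, unlike NEWID(),
-- which can be called anywhere.
CREATE TABLE dbo.BigTable
(
    RecordId uniqueidentifier NOT NULL
        CONSTRAINT DF_BigTable_RecordId DEFAULT NEWSEQUENTIALID(),
    TradeDate datetime NOT NULL
    -- ... remaining columns ...
);
-- Each generated GUID is greater than the previous one on a given machine,
-- so inserts land at the end of the index instead of splitting pages
-- throughout it.
```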
> NEWID() creates random GUIDs, so when you do INSERTs, they're
> all over the place. I would try lowering the fill factor on the
> nonclustered index. Start with 50 or 60. After the bulk insert, defrag the
> index.
> Try running DBCC SHOWCONTIG to report how fragmented that index is.
Good idea: I will definitely check/alleviate fragmentation. I may toy with the fill factor on the dev machine first.
> As for the clustered index, is it a monotonically increasing key? If so,
> make the fill factor 100, since a lower fill factor won't buy you anything.
No such luck. It is 6 columns. Because the first column is a date, *most* of the data added each day is at the "end" of the index.
Unfortunately, there are some data laggards that arrive days later, so the BULK INSERT is not a pure append.
Thanks for all your help.
--
Scott Nichol
> [rest of quoted text snipped]
>|||> The values in the non-clustered index are GUIDs and thus distributed
> throughout the index.
As Tom said, random GUID values are a real bulk insert performance killer.
Buffer cache efficiency with these random values decreases as the table
grows, until you reach the point of requiring at least one I/O for every row
inserted.
One workaround I've used with SQL 2000 is to use a composite key instead of
GUID alone. I don't know of the details of your application but in my case,
I used a unique index consisting of FileId (a sequential integer) and
RecordId (a GUID value). This greatly improved bulk insert performance with
very large tables.
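A sketch of that workaround, using the column names from the post (the table name is assumed):

```sql
-- FileId leads the key so all rows from one load file land in the same
-- narrow region of the index; the random RecordId GUID then only has to
-- be unique within that range, instead of scattering inserts across the
-- whole index.
CREATE UNIQUE NONCLUSTERED INDEX IX_BigTable_FileId_RecordId
    ON dbo.BigTable (FileId, RecordId);
```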
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
> [quoted text snipped]
>|||Thanks. You confirm our suspicions about the GUID: we seem to have lots of single I/Os when the index gets updated. There's
virtually no I/O queuing and the machine does not seem taxed at all.
If we use the sequential integer approach, we only need the GUID in the index when we hit 4G rows and start duplicating the integer
values we generate (assuming we use negative as well as positive values), right?
Since the value of the index is used as a pseudo-FK from another table, we'd have some work there as well. Not to mention, of
course, adding the new sequential column to the existing half billion rows. Too bad we did not attack the problem earlier!
It sounds like NEWSEQUENTIALID() would help us tremendously if we moved to SQL Server 2005. Is that correct? We want to migrate
during 2008 anyway. If it performs nearly as well, it would certainly be easier than retrofitting with the sequential
integer.
--
Scott Nichol
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message news:8187C560-FBFB-40E0-BE14-AB089E2936B3@.microsoft.com...
>> The values in the non-clustered index are GUIDs and thus distributed throughout the index.
> As Tom said, random GUID values are a real bulk insert performance killer. Buffer efficiency with these random decreases
> proportionally with table size until you get to the point of requiring at least one I/Os for every row inserted.
> One workaround I've used with SQL 2000 is to use a composite key instead of GUID alone. I don't know of the details of your
> application but in my case, I used a unique index consisting of FileId (a sequential integer) and RecordId (a GUID value). This
> greatly improved bulk insert performance with very large tables.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
>> Good questions. Fill factor is 80% (the default?). The write pattern in general is no writes, then bursts every two minutes or
>> so at checkpoint. During bursts, write queue goes to 100 or more. Each burst is about 8 seconds. The read pattern is more
>> consistent and lighter, with read queue never higher than 6, but normally 0 (and that's with other concurrent activity in the
>> database). The reads are from the RAID 10 set with the non-clustered index. There is very little reading from the set with the
>> clustered index.
>> On a development machine where I can run the BULK INSERT without other activity, the disk activity on the RAID set with the
>> clustered index consists of one write burst of 4 seconds. The disk activity on the RAID set with the non-clustered index shows
consistent reads for the rest of the load, with intermittent write bursts.
>> The feeling we get from the reads is that SQL Server is reading through the existing index nodes to find leaves that need to be
>> updated (splitting when necessary, etc.).
>> The values in the non-clustered index are GUIDs and thus distributed throughout the index.
>> --
>> Scott Nichol
>> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message news:%23HGP4$LHIHA.1316@.TK2MSFTNGP02.phx.gbl...
>> What fill factors do you have? I'm thinking that there may be a lot of page
>> splitting going on.
>> There could also be disk issues, but you did say that things looked OK
>> there - "Perfmon does not indicate any resource being pinned". What is the
>> average disk queue length? Is it > 24?
>> It may come to pass that you need to use horizontal partitioning and break
>> the table up.
>> --
>> Tom
>> ----
>> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
>> SQL Server MVP
>> Toronto, ON Canada
>> https://mvp.support.microsoft.com/profile/Tom.Moreau
>>
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:OFCIl3KHIHA.4684@.TK2MSFTNGP06.phx.gbl...
>> Thanks for the responses. If the speed of the BULK INSERT were the only
>> criterion, dropping the non-clustered index would be great.
>> Without it, but with the clustered index and data still there, I load at
about 20,000 rows/sec. Unfortunately, re-building an index
>> on a table with 500+ million records takes much longer than the 3000 seconds
>> to load with the index intact.
>> What I am really wondering is whether anyone else has a similar scenario
>> (clustered and non-clustered indexes on a large table) and
whether the effect on BULK INSERT performance is the same order of magnitude
as I am experiencing (a 2-order-of-magnitude drop).
>> --
>> Scott Nichol
>> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
>> news:%238rvAkIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>> That was an excellent blog post but it was for SQL 2005. In SQL 2005,
>> improvements were made in bulk copy performance, alleviating the need to
>> drop indexes. The OP says that the system is SQL 2000 SP3a.
>> Looking at the fact that the data file is sorted in clustered index order
>> and there is one nonclustered index, I'd be inclined to drop the
>> nonclustered index, do the bulk insert and then recreate the nonclustered
>> index.
>> I'm also curious as to whether or not the data to be inserted are spread
>> throughout the table or are they added to the "end" of the table, i.e. are
>> the clustered index keys to be inserted greater than all existing
>> clustered
>> index keys. If the data are spread throughout the table, I'd want to
>> reduce
>> the fill factor.
>> --
>> Tom
>> ----
>> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
>> SQL Server MVP
>> Toronto, ON Canada
>> https://mvp.support.microsoft.com/profile/Tom.Moreau
>>
>> "Uri Dimant" <urid@.iscar.co.il> wrote in message
>> news:u6AXXaIHIHA.700@.TK2MSFTNGP05.phx.gbl...
>> Scott
>> http://sqlblog.com/blogs/linchi_shea/archive/2007/08/27/performance-impact-the-most-optimal-insert-script-can-t-beat-bulkcopy.aspx
>>
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:uDASsWIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>> Question/sanity check about BULK INSERT performance.
>> I load a 300,000 row text file into my largest table every day. The file
>> loads into an empty table with the same structure and defaults but no
>> indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same
>> file
>> takes more like 3000 seconds to load into the "real" table, which has a
>> clustered index, one non-clustered index and 547,640,055 rows of data (as
of today). Is the discrepancy in rate (30,000 rows/sec vs. 100 rows/sec)
>> reasonable? Is it because the indexes on the table are so deep that
>> updating them requires lots of page-at-a-time I/O?
>> Some useful info:
>> 1. The file being loaded is sorted in clustered index order
>> 2. The BULK INSERT has the ORDER clause
>> 3. The TABLOCK option on BULK INSERT makes no difference in performance
>> (lock is quickly escalated to table when the option is not specified)
>> 4. Database recovery mode is Bulk-Logged
>> 5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT is
>> not even twice as fast
>> 6. SQL Server 2000 SP3 on Windows 2003
>> 7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
>> 8. Transaction log is on RAID 1, data on 12-spindle RAID 10,
>> non-clustered
>> index on 6-spindle RAID 10. All drives SCSI/320, 15k
>> 9. Perfmon does not indicate any resource being pinned (or even
>> challenged)
>> --
>> Scott Nichol
>>
>>
>>
> If we use the sequential integer approach, we only need the GUID in the
> index when we hit 4G rows and start duplicating the integer values we
> generate (assuming we use negative as well as positive values), right?
You don't need the GUID in the index at all if you go with a bigint and let
SQL Server assign the values with IDENTITY. The application I mentioned in
my response provided both the integer and GUID values to SQL Server. This
allowed the application to assign the surrogate key and bulk insert into
related tables.
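A minimal sketch of the two keying options Dan describes follows; all table, column, and constraint names here are made up for illustration and do not come from the actual application.

```sql
-- Option 1: bigint IDENTITY surrogate key. SQL Server assigns sequential
-- values, so the clustered index always appends at the end; the GUID
-- survives as a plain (non-key) column.
CREATE TABLE dbo.BigTable (
    RowId    bigint IDENTITY(1,1) NOT NULL,
    RecordId uniqueidentifier NOT NULL,
    -- ...data columns...
    CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (RowId)
);

-- Option 2: composite key. A sequential integer leads the index, so the
-- random GUID no longer scatters inserts across the whole index.
CREATE UNIQUE INDEX UX_BigTable_File_Record
    ON dbo.BigTable (FileId, RecordId);
```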
> Since the value of the index is used as a pseudo-FK from another table,
> we'd have some work there as well. Not to mention, of course, adding the
> new sequential column to the existing half billion rows. Too bad we did
> not attack the problem earlier!
I learned this same lesson the hard way. I started with
only GUID clustered PK and found that bulk performance dropped at about 5M
rows. I then changed to a non-clustered index to mitigate the problem but
performance still dropped off with larger tables. Once I added the integer,
I was able to use the clustered primary key with good insert performance.
My experience was also a good lesson on Perfmon. The only way to really know
if you have a disk bottleneck with a batch application is to know the
maximum throughput the disk subsystem can sustain. Once the limit is hit,
the only way to get more throughput is to either add hardware or revise the
indexing strategy/application design.
> It sounds like NEWSEQUENTIALID() would help us tremendously if we moved to
> SQL Server 2005. Is that correct? We want to migrate during 2008 anyway.
> If it might do at least nearly as well, it would certainly be easier than
> retrofitting with the sequential integer.
Yes, a NEWSEQUENTIALID() default constraint (available in SQL 2005 and
2008), a bigint identity, or the composite key method will certainly improve
insert performance.
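On SQL 2005 and later the NEWSEQUENTIALID() route looks roughly like this (table, column, and constraint names are hypothetical):

```sql
-- NEWSEQUENTIALID() is only valid in a DEFAULT constraint. It generates
-- GUIDs that are sequential on a given machine, so inserts into the
-- clustered index append rather than splitting pages throughout it.
CREATE TABLE dbo.BigTable2005 (
    RecordId uniqueidentifier NOT NULL
        CONSTRAINT DF_BigTable2005_RecordId DEFAULT NEWSEQUENTIALID(),
    -- ...data columns...
    CONSTRAINT PK_BigTable2005 PRIMARY KEY CLUSTERED (RecordId)
);
```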
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:Ovyu9OWHIHA.280@.TK2MSFTNGP03.phx.gbl...
Trying to avoid the pain of making any "real" change, I implemented my own alternative to NEWSEQUENTIALID(), and it ROCKS.
Using the function to generate GUID values (during data massaging) to load, BULK INSERT on my development machine took just 23
seconds to load each of two files, one 110,766 rows, the other 118,829. That's 4991 rows/sec on average. For comparison, 117,755
rows loaded in 14 seconds (8411 rows/sec) with the non-clustered index dropped. With "normal" GUIDs, I was loading just 90 rows/sec
on that rig. I look forward to seeing this on our production box.
The caveats are
1. I had dropped the non-clustered index to run the test without it, so the index that subsequent tests ran against is a fresh
re-build, presumably as unfragmented as possible.
2. The function I used to generate GUIDs was taken from an article on the web at
http://www.informit.com/articles/article.aspx?p=25862&seqNum=7. It combines 10 bytes of a normally generated GUID with 6 bytes of
the current time. The time portion provides the sequential-ness; the GUID part the random-ness. The algorithm seems very unlikely
to produce a duplicate, but I will review it to confirm before trying in production.
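The technique from that article (the "COMB" GUID) can be expressed in T-SQL roughly as below; whether this matches Scott's exact function is an assumption on my part.

```sql
-- Build a "COMB" GUID: keep the first 10 bytes of a random GUID and
-- replace the last 6 bytes with the current datetime, so values
-- generated over time sort (approximately) in generation order.
SELECT CAST(
    CAST(NEWID() AS binary(10)) + CAST(GETDATE() AS binary(6))
    AS uniqueidentifier) AS CombGuid;
```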
Thanks for all the great feedback.
--
Scott Nichol
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message news:B0C99286-2CBB-4C26-AC61-B09D02965D41@.microsoft.com...
> The algorithm seems very unlikely to produce a duplicate, but I will
> review it to confirm before trying in production.
I'm glad you found an easy solution. As long as you use a primary key to
guarantee unique values, I don't see a problem using it.
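Dan's point about letting a primary key backstop the generator can be sketched as follows (names are hypothetical):

```sql
-- The PRIMARY KEY constraint guarantees uniqueness: if the COMB
-- generator ever produced a duplicate, the insert would fail with a
-- constraint violation rather than silently storing bad data.
ALTER TABLE dbo.BigTable
    ADD CONSTRAINT PK_BigTable_RecordId PRIMARY KEY (RecordId);
```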
--
Hope this helps.
Dan Guzman
SQL Server MVP
"Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
news:eYlG65bHIHA.4272@.TK2MSFTNGP06.phx.gbl...
> Trying to avoid the pain of making any "real" change, I implemented my own
> alternative to NEWSEQUENTIALID(), and it ROCKS.
> Using the function to generate GUID values (during data massaging) to
> load, BULK INSERT on my development machine took just 23 seconds to load
> each of two files, one 110,766 rows, the other 118,829. That's 4991
> rows/sec on average. For comparison, 117,755 rows loaded in 14 seconds
> (8411 rows/sec) with the non-clustered index dropped. With "normal"
> GUIDs, I was loading just 90 rows/sec on that rig. I look forward to
> seeing this on our production box.
> The caveats are
> 1. I had dropped the non-clustered index to run the test without it, so
> the index that subsequent tests ran against is a fresh re-build,
> presumably as unfragmented as possible.
> 2. The function I used to generate GUIDs was taken from an article on the
> web at http://www.informit.com/articles/article.aspx?p=25862&seqNum=7. It
> combines 10 bytes of a normally generated GUID with 6 bytes of the current
> time. The time portion provides the sequential-ness; the GUID part the
> random-ness. The algorithm seems very unlikely to produce a duplicate,
> but I will review it to confirm before trying in production.
> Thanks for all the great feedback.
> --
> Scott Nichol
> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
> news:B0C99286-2CBB-4C26-AC61-B09D02965D41@.microsoft.com...
>> If we use the sequential integer approach, we only need the GUID in the
>> index when we hit 4G rows and start duplicating the integer values we
>> generate (assuming we use negative as well as positive values), right?
>> You don't need the GUID in the index at all if you go with a bigint and
>> let SQL Server assign the values with IDENTITY. The application I
>> mentioned in my response provided both the integer and GUID values to SQL
>> Server. This allowed the application to assign the surrogate key and
>> bulk insert into related tables.
>> Since the value of the index is used as a pseudo-FK from another table,
>> we'd have some work there as well. Not to mention, of course, adding
>> the new sequential column to the existing half billion rows. Too bad we
>> did not attack the problem earlier!
>> I learned this same lesson the same lesson the hard way. I started with
>> only GUID clustered PK and found that bulk performance dropped at about
>> 5M rows. I then changed to a non-clustered index to mitigate the problem
>> but performance still dropped off with larger tables. Once I added the
>> integer, I was able to use the clustered primary key with good insert
>> performance.
>> My experience was also a good lesson on PMON. The only way to really
>> know if you have a disk bottleneck with a batch application is to know
>> the maximum throughput the disk subsystem can sustain. Once the limit is
>> hit, the only ways to get more throughput is to either add hardware or
>> revise the indexing strategy/application design.
>> It sounds like NEWSEQUENTIALID() would help us tremendously if we moved
>> to SQL Server 2005. Is that correct? We want to migrate during 2008
>> anyway. If it might do at least nearly as well, it would certainly be
>> easier than retrofitting with the sequential integer.
>> Yes, a NEWSEQUENTIALID() default constraint (available in SQL 2005 and
>> 2008), a bigint identity, or the composite key method will certainly
>> improve insert performance.
>> --
>> Hope this helps.
>> Dan Guzman
>> SQL Server MVP
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:Ovyu9OWHIHA.280@.TK2MSFTNGP03.phx.gbl...
>> Thanks. You confirm our suspicions about the GUID: we seem to have lots
>> of single I/Os when the index gets updated. There's virtually no I/O
>> queuing and the machine does not seem taxed at all.
>> If we use the sequential integer approach, we only need the GUID in the
>> index when we hit 4G rows and start duplicating the integer values we
>> generate (assuming we use negative as well as positive values), right?
>> Since the value of the index is used as a pseudo-FK from another table,
>> we'd have some work there as well. Not to mention, of course, adding
>> the new sequential column to the existing half billion rows. Too bad we
>> did not attack the problem earlier!
>> It sounds like NEWSEQUENTIALID() would help us tremendously if we moved
>> to SQL Server 2005. Is that correct? We want to migrate during 2008
>> anyway. If it might do at least nearly as well, it would certainly be
>> easier than retrofitting with the sequential integer.
>> --
>> Scott Nichol
>> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
>> news:8187C560-FBFB-40E0-BE14-AB089E2936B3@.microsoft.com...
>> The values in the non-clustered index are GUIDs and thus distributed
>> throughout the index.
>> As Tom said, random GUID values are a real bulk insert performance
>> killer. Buffer efficiency with these random decreases proportionally
>> with table size until you get to the point of requiring at least one
>> I/Os for every row inserted.
>> One workaround I've used with SQL 2000 is to use a composite key
>> instead of GUID alone. I don't know of the details of your application
>> but in my case, I used a unique index consisting of FileId (a
>> sequential integer) and RecordId (a GUID value). This greatly improved
>> bulk insert performance with very large tables.
>> --
>> Hope this helps.
>> Dan Guzman
>> SQL Server MVP
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:%23OYnHQQHIHA.2268@.TK2MSFTNGP02.phx.gbl...
>> Good questions. Fill factor is 80% (the default?). The write pattern
>> in general is no writes, then bursts every two minutes or so at
>> checkpoint. During bursts, write queue goes to 100 or more. Each
>> burst is about 8 seconds. The read pattern is more consistent and
>> lighter, with read queue never higher than 6, but normally 0 (and
>> that's with other concurrent activity in the database). The reads are
>> from the RAID 10 set with the non-clustered index. There is very
>> little reading from the set with the clustered index.
>> On a development machine where I can run the BULK INSERT without other
>> activity, the disk activity on the RAID set with the clustered index
>> consists of one write burst of 4 seconds. The disk activity on the
>> RAID set with the non-clustered index shows consistent reads for the
>> rest of the load, with intermittent write bursts.
>> The feeling we get from the reads is that SQL Server is reading
>> through the existing index nodes to find leaves that need to be
>> updated (splitting when necessary, etc.).
>> The values in the non-clustered index are GUIDs and thus distributed
>> throughout the index.
>> --
>> Scott Nichol
>> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
>> news:%23HGP4$LHIHA.1316@.TK2MSFTNGP02.phx.gbl...
>> What fill factors do you have? I'm thinking that there may be a lot
>> of page
>> splitting going on.
>> There could also be disk issues, but you did say that things looked
>> OK
>> there - "Perfmon does not indicate any resource being pinned". What
>> is the
>> average disk queue length? Is it > 24?
>> It may come to pass that you need to use horizontal partitioning and
>> break
>> the table up.
>> --
>> Tom
>> ----
>> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
>> SQL Server MVP
>> Toronto, ON Canada
>> https://mvp.support.microsoft.com/profile/Tom.Moreau
>>
>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>> news:OFCIl3KHIHA.4684@.TK2MSFTNGP06.phx.gbl...
>> Thanks for the responses. If the speed of the BULK INSERT were the
>> only
>> criterion, dropping the non-clustered index would be great.
>> Without it, but with the clustered index and data still there, I load
>> at
>> about 20,000 rows/sec. Unfortunately, re-building an index
>> on a table with 500+ million records takes much longer than the 3000
>> seconds
>> to load with the index intact.
>> What I am really wondering is whether anyone else has a similar
>> scenario
>> (clustered and non-clustered indexes on a large table) and
>> whether the effect on BULK INSERT performance is of the same order of
>> magnitude as I am experiencing (a two-order-of-magnitude drop).
>> --
>> Scott Nichol
>> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
>> news:%238rvAkIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>>> That was an excellent blog post but it was for SQL 2005. In SQL
>>> 2005,
>>> improvements were made in bulk copy performance, alleviating the
>>> need to
>>> drop indexes. The OP says that the system is SQL 2000 SP3a.
>>>
>>> Looking at the fact that the data file is sorted in clustered index
>>> order
>>> and there is one nonclustered index, I'd be inclined to drop the
>>> nonclustered index, do the bulk insert and then recreate the
>>> nonclustered
>>> index.
>>>
>>> I'm also curious as to whether or not the data to be inserted are
>>> spread
>>> throughout the table or are they added to the "end" of the table,
>>> i.e. are
>>> the clustered index keys to be inserted greater than all existing
>>> clustered
>>> index keys. If the data are spread throughout the table, I'd want
>>> to
>>> reduce
>>> the fill factor.
>>>
>>> --
>>> Tom
>>>
>>> ----
>>> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
>>> SQL Server MVP
>>> Toronto, ON Canada
>>> https://mvp.support.microsoft.com/profile/Tom.Moreau
>>>
>>>
>>> "Uri Dimant" <urid@.iscar.co.il> wrote in message
>>> news:u6AXXaIHIHA.700@.TK2MSFTNGP05.phx.gbl...
>>> Scott
>>> http://sqlblog.com/blogs/linchi_shea/archive/2007/08/27/performance-impact-the-most-optimal-insert-script-can-t-beat-bulkcopy.aspx
>>>
>>>
>>>
>>> "Scott Nichol" <snicholnews@.scottnichol.com> wrote in message
>>> news:uDASsWIHIHA.1548@.TK2MSFTNGP05.phx.gbl...
>>> Question/sanity check about BULK INSERT performance.
>>>
>>> I load a 300,000 row text file into my largest table every day.
>>> The file
>>> loads into an empty table with the same structure and defaults but
>>> no
>>> indexes in about 10 seconds, for a rate of 30,000 rows/sec. The
>>> same
>>> file
>>> takes more like 3000 seconds to load into the "real" table, which
>>> has a
>>> clustered index, one non-clustered index and 547,640,055 rows of
>>> data (as
>>> of today). Is the discrepancy in rate (30,000 rows/sec vs. 100
>>> rows/sec)
>>> reasonable? Is it because the indexes on the table are so deep
>>> that
>>> updating them requires lots of page-at-a-time I/O?
>>>
>>> Some useful info:
>>>
>>> 1. The file being loaded is sorted in clustered index order
>>> 2. The BULK INSERT has the ORDER clause
>>> 3. The TABLOCK option on BULK INSERT makes no difference in
>>> performance
>>> (lock is quickly escalated to table when the option is not
>>> specified)
>>> 4. Database recovery mode is Bulk-Logged
>>> 5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK
>>> INSERT is
>>> not even twice as fast
>>> 6. SQL Server 2000 SP3 on Windows 2003
>>> 7. SQL Server uses approximately 2 GB RAM (max for our OS + SS
>>> versions)
>>> 8. Transaction log is on RAID 1, data on 12-spindle RAID 10,
>>> non-clustered
>>> index on 6-spindle RAID 10. All drives SCSI/320, 15k
>>> 9. Perfmon does not indicate any resource being pinned (or even
>>> challenged)
>>>
>>> --
>>> Scott Nichol
>>>
>>>
>>>
>>
>>
>>
>>
>|||On Nov 1, 1:49 pm, "Scott Nichol" <snicholn...@.scottnichol.com> wrote:
> Question/sanity check about BULK INSERT performance.
> I load a 300,000 row text file into my largest table every day. The file loads into an empty table with the same structure and
> defaults but no indexes in about 10 seconds, for a rate of 30,000 rows/sec. The same file takes more like 3000 seconds to load into
> the "real" table, which has a clustered index, one non-clustered index and 547,640,055 rows of data (as of today). Is the
> discrepancy in rate (30,000 rows/sec vs. 100 rows/sec) reasonable? Is it because the indexes on the table are so deep that updating
> them requires lots of page-at-a-time I/O?
> Some useful info:
> 1. The file being loaded is sorted in clustered index order
> 2. The BULK INSERT has the ORDER clause
> 3. The TABLOCK option on BULK INSERT makes no difference in performance (lock is quickly escalated to table when the option is not
> specified)
> 4. Database recovery mode is Bulk-Logged
> 5. Doing row-by-row INSERTs takes 5700 seconds or so, i.e. BULK INSERT is not even twice as fast
> 6. SQL Server 2000 SP3 on Windows 2003
> 7. SQL Server uses approximately 2 GB RAM (max for our OS + SS versions)
> 8. Transaction log is on RAID 1, data on 12-spindle RAID 10, non-clustered index on 6-spindle RAID 10. All drives SCSI/320, 15k
> 9. Perfmon does not indicate any resource being pinned (or even challenged)
> --
> Scott Nichol
Hi Scott,
we have seen this behaviour for tables with clustered
indexes...remember that when a table has a clustered index the data
is in the index...our testing showed that putting data into tables
with clustered indexes would result in a 2-3x performance improvement for
dimensional-style queries...alas, inserting rows into them is
terribly slow...
Other tips...
1. Make sure you pad the index so there is somewhere to put new rows.
2. We used multipart keys for the partitioning column based on country
and time...this is not 'standard' but it means that each query that
hits the database only goes to the country data it needs for the
period it needs. Create lots of small partitions and you will get
faster query response.
3. We also created _tmp tables and used the bulk loader to load the
temp table...we then perform a 'delete from _ins table where row
exists in _tmp table', and then we perform an 'insert into _ins table
select * from _tmp table'. We found this the fastest way to get data
into a table with a clustered index. (In fact we wrote a utility to do
this so that it is only a one-statement call in our schedule...we
prepare a Load Interface File and pass its name to this utility...)
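Tips 1 and 3 above can be sketched in SQL Server 2000 T-SQL roughly as follows; all table, column, index, and file names here are hypothetical stand-ins, and the fill factor and clustered-key columns are assumptions for illustration:

```sql
-- Tip 1: rebuild the clustered index leaving free space on each page
-- (SQL 2000 syntax); 80 here is a hypothetical fill factor.
DBCC DBREINDEX ('dbo.Sales_ins', 'PK_Sales_ins', 80);

-- Tip 3: bulk load into an index-free staging table first
-- (fast, minimally logged under the Bulk-Logged recovery model)...
BULK INSERT dbo.Sales_tmp
FROM 'D:\loads\sales_extract.txt'
WITH (TABLOCK, FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

-- ...delete any rows the extract re-sends, so the insert cannot collide...
DELETE i
FROM dbo.Sales_ins i
WHERE EXISTS (SELECT * FROM dbo.Sales_tmp t WHERE t.SaleId = i.SaleId);

-- ...then insert in clustered-index order to limit page splits.
INSERT INTO dbo.Sales_ins (SaleId, Country, SaleDate, Amount)
SELECT SaleId, Country, SaleDate, Amount
FROM dbo.Sales_tmp
ORDER BY Country, SaleDate;  -- hypothetical (country, time) clustered key

TRUNCATE TABLE dbo.Sales_tmp;
```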
As you point out...dropping and recreating the index is a non
starter.
On other databases (notably Oracle) it is possible to drop indexes
on individual partitions, so the indexes on the partitions that are
likely to be updated can be dropped and then put back, while the
indexes on partitions that are not changed can stay.
My thoughts on all this are that MSFT are not very serious about
making SQL Server a good dimensional model query database...they are
spending their money on SSAS and advising most people to go with
cubes....we found very little materials anywhere in the world as to
how to best set up the underlying tables for fast query performance in
a dimensional model...we had to do a few weeks of research to
figure out how to make it go as fast as it is possible to make it
go...
Good Luck
Peter Nolan
www.peternolan.com|||Thanks.
In other postings on this topic, I have discovered the root of the problem and a possible resolution.
The clustered index itself reduces speed very slightly compared to no indexes at all. The additional nonclustered index, the key
for which is a GUID, caused a two-order-of-magnitude reduction in speed. The solution I found with the least work is to generate
sequential GUIDs, somewhat like the NEWSEQUENTIALID() function in SQL Server 2005. Using these, the performance drop is less than
50% from the scenario with no nonclustered index.
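On SQL Server 2000, one widely used way to approximate sequential GUIDs is the "COMB" pattern: overwrite the trailing bytes of a NEWID() value with the current datetime. This is a sketch of that common workaround, not necessarily the exact code Scott used:

```sql
-- SQL Server compares uniqueidentifier values starting from their last
-- 6 bytes, so placing the current datetime there makes successive GUIDs
-- sort (and therefore insert into the index) nearly in order.
SELECT CAST(
         CAST(NEWID() AS BINARY(10)) + CAST(GETDATE() AS BINARY(6))
       AS UNIQUEIDENTIFIER) AS SequentialGuid;
```

Because datetime resolution is about 3 ms, GUIDs generated within the same tick still land on the same index pages, which is what restores bulk insert locality.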
--
Scott Nichol