Hi. Not sure if this is a bug with bulk insert or standard behaviour.
We have a flat file with a header record. A simplified example follows:
HEADER
John,Smith
Fred,Bloggs
But when we set bulk insert to skip the first row, it actually skips the
John,Smith row as well. Run the script below after saving the CSV data
above to a file called user.csv...
create table [User]
(
    FirstName varchar(32),
    LastName varchar(32)
)

bulk insert [User] from 'user.csv'
with
(
    FIRSTROW = 2,
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)

select * from [User]  -- only one record comes back
Please help!!
--
McG
|||Some solutions are:
1. Pre-process the file to remove the header line (use a DOS
   or Unix-like utility, for example).
2. Bulk insert the data into a staging table with one column,
   then parse the data into columns using SQL instead of the
   bulk insert process.
3. Use SQL Server Integration Services. I'm no expert, but I believe
   it can skip (or import into a different table) rows based on the
   contents of the row, which bulk insert cannot do.
The FIRSTROW parameter for bulk insert, as you have discovered,
is not a FIRST_LINE_OF_THE_TEXT_FILE parameter.
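Solution 1 above can be sketched in a few lines. This is a minimal example of my own; the helper name and file paths are illustrative, not part of any SQL Server tooling:

```python
# Minimal sketch of solution 1: strip the header line before handing
# the file to BULK INSERT. Function name and paths are illustrative.
def strip_header(src_path, dst_path, header_lines=1):
    """Copy src_path to dst_path, dropping the first header_lines lines."""
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        for line_no, line in enumerate(src):
            if line_no >= header_lines:
                dst.write(line)
```

With the header gone, you can drop the FIRSTROW option (or leave it at 1) and both data rows load.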
Steve Kass
Drew University
|||Anyone with ideas on how to get around this limitation?
--
McG

|||Alas, this is standard behaviour.
BULK INSERT and BCP live in a very squared world. They don't think in
terms of lines; they think in terms of records and fields. (The syntax
calls them rows and columns, though.)
The first field runs from the beginning of the file until the terminator for
the first field. (For terminated fields. Fields can also be fix-length,
or include a length-prefix.) The second field lasts until the terminator
for that field. And so on, up to the last field which lasts until the
terminator for that field. That terminator is called a "row terminator",
but as long as BCP is not reading the last field, the terminator carries
no meaning. Once the last field has been read, bulk copy starts with
the first field again.
This has some advantages and disadvantages. The good part is that
you can handle embedded row terminators, for instance embedded newlines
(except in the last field). The bad part is that you can't skip a header
which does not conform with the file. Because as far as BCP is concerned,
that header is part of the first field.
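As a rough illustration of that parsing model, here is a toy sketch of my own (not BCP's actual code) that consumes the stream purely by terminators, with no notion of lines:

```python
# Toy model of how BCP/BULK INSERT consume a data file: no "lines",
# just fields delimited by terminators. Two fields per record, with
# FIELDTERMINATOR=',' and ROWTERMINATOR='\n', as in the script above.
def bcp_style_records(data, n_fields=2, field_term=",", row_term="\n"):
    records, pos = [], 0
    while pos < len(data):
        rec = []
        for i in range(n_fields):
            # Only the last field of a record ends at the row terminator.
            term = row_term if i == n_fields - 1 else field_term
            end = data.find(term, pos)
            if end == -1:              # no terminator left: stop reading
                return records
            rec.append(data[pos:end])
            pos = end + len(term)
        records.append(tuple(rec))
    return records

records = bcp_style_records("HEADER\nJohn,Smith\nFred,Bloggs\n")
# records == [('HEADER\nJohn', 'Smith'), ('Fred', 'Bloggs')]
```

The header is glued to the first field of record 1, so FIRSTROW = 2 discards John,Smith along with it, leaving only one record.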
The workaround is to run a program first that strips the header, which
for a large file is not very appealing.
I have submitted a suggestion for an improvement to bulk copy
for a future release of SQL Server, so that you should be able to describe
a header to be skipped. The suggestion is on
http://lab.msdn.microsoft.com/productfeedback/ but alas, the site
appears to be experiencing some problems right now, so I cannot give the
exact link. But when it's back, search by name for "bulk" and SQL 2005
to find it and vote for it.
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx