Friday, February 24, 2012

Bulk Insert

Sorry for the piece-by-piece nature of this post; I moved it from a
dormant group to this one, and it was 3 separate posts in the other
group. Anyway...

I'm trying to bulk insert a text file of 10 columns into a table with
12. How can I specify which columns to insert to? I think format
files are what I'm supposed to use, but I can't figure them out. I've
also tried using a view, as was suggested on one of the many websites
I've searched, but I clearly did that incorrectly as well.

----------
Update:

I'm working with the view, and I've got a view that contains the exact
columns from the table I want. I ran my bulk insert command,

BULK INSERT Test..IV10401 FROM 'c:\bulkInsertFile2.txt'

and it returned the error:

Server: Msg 2601, Level 14, State 3, Line 1
Cannot insert duplicate key row in object 'IV10401' with unique index
'AK2IV10401'.
Note: Bulk Insert through a view may result in base table default
values being ignored for NULL columns in the data file.
The statement has been terminated.

The AK2IV10401 key is comprised of 3 columns and I'm sure that each of
my rows in the insert file is unique according to those three. What
should I be checking for?

--------
Update 2:

I can only successfully insert 1 row. It seems to be treating each row
as an individual primary key when it should be treating them as
composite keys. I cannot alter the table, since it was created by
Great Plains Dynamics. Is there some sort of switch that I'm missing
in my bulk insert statement, or can I suppress the errors?

----------

Ughh, bulk insert is going to be the end of me. I just need to insert
two separate .txt files into two separate tables, but I can't do it. I
did finally get one to go through by not demanding that the index
AK2IV10401 be unique. I don't know what problems that will cause for
me in the future, but I would at least like to get to the future to see
SOMETHING happen. As for the second table, there is a Primary Key that
is blocking all my progress and I don't know how to get around this.
Here is the error I get.

Violation of PRIMARY KEY constraint 'PKIV10402'. Cannot insert
duplicate key in object 'IV10402'.
The statement has been terminated.

I REALLY don't think I'm violating anything, so why is it kicking and
screaming at me?

-pk

----------

pk (philip.kluss@.gmail.com) writes:
> Sorry for the piece-by-piece nature of this post; I moved it from a
> dormant group to this one, and it was 3 separate posts in the other
> group. Anyway...
> I'm trying to bulk insert a text file of 10 columns into a table with
> 12. How can I specify which columns to insert to? I think format
> files are what I'm supposed to use, but I can't figure them out. I've
> also tried using a view, as was suggested on one of the many websites
> I've searched, but I clearly did that incorrectly as well.

Format files are a bit tedious, but for 10 columns it's not that
bad. Here is an example:

8.0
10
1 SQLCHAR 0 0 "\t" 1 X ""
2 SQLCHAR 0 0 "\t" 2 X ""
...
10 SQLCHAR 0 0 "\r\n" 12 X ""

The first row is the version of the file format. The next row lists the
number of fields in the bulk file. The next ten rows detail the fields.

The first column is the field number. The second column is the data type.
This is always SQLCHAR for an ANSI file, and SQLNCHAR for a Unicode
file. Other data types apply only to binary data files.

The third column is the prefix length, which is always 0 for a text file.
The fourth column is the column length; use this for fixed-length columns,
or leave it as 0. The fifth column is the field terminator. In the example
I'm assuming tab, save for the last field, which is terminated by carriage
return + line feed.

The sixth column is the column number for the table column in SQL Server.
This does not have to follow the numbers in the file. If the number is 0,
that field in the text file is not imported.

The seventh column is the column name, but this column is informational
only.

The eighth column specifies the collation. This is useful if you need to
convert data between charsets when importing.
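
To make the mapping concrete: if, purely for illustration, it were table
columns 5 and 11 that have no counterpart in the text file, the complete
format file could read:

8.0
10
1 SQLCHAR 0 0 "\t" 1 X ""
2 SQLCHAR 0 0 "\t" 2 X ""
3 SQLCHAR 0 0 "\t" 3 X ""
4 SQLCHAR 0 0 "\t" 4 X ""
5 SQLCHAR 0 0 "\t" 6 X ""
6 SQLCHAR 0 0 "\t" 7 X ""
7 SQLCHAR 0 0 "\t" 8 X ""
8 SQLCHAR 0 0 "\t" 9 X ""
9 SQLCHAR 0 0 "\t" 10 X ""
10 SQLCHAR 0 0 "\r\n" 12 X ""

Fields 5-9 shift over to columns 6-10, and field 10 jumps to column 12,
so the two skipped columns get their default or NULL values.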

> ----------
> Update:
> I'm working with the view, and I've got a view that contains the exact
> columns from the table I want. I ran my bulk insert command,
> BULK INSERT Test..IV10401 FROM 'c:\bulkInsertFile2.txt'
> and it returned the error:
> Server: Msg 2601, Level 14, State 3, Line 1
> Cannot insert duplicate key row in object 'IV10401' with unique index
> 'AK2IV10401'.
> Note: Bulk Insert through a view may result in base table default
> values being ignored for NULL columns in the data file.
> The statement has been terminated.
> The AK2IV10401 key is comprised of 3 columns and I'm sure that each of
> my rows in the insert file is unique according to those three. What
> should I be checking for?

Maybe the keys are already in the table?
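
For instance, a quick check to run before the load:

SELECT COUNT(*) FROM Test..IV10401

If that returns more than zero, the file may simply be clashing with
rows that are already there.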

> --------
> Update 2:
> I can only successfully insert 1 row. It seems to be treating each row
> as an individual primary key when it should be treating them as
> composite keys. I cannot alter the table, since it was created by
> Great Plains Dynamics.

Without access to table definition, data file and the BCP command
it's hard to tell what is going on.

A common technique is to bulk load into a staging table, and then
clean up data there, before moving to the target table.
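
A minimal sketch of that approach, with a hypothetical staging table
IV10401_stage (same columns as IV10401, but no keys or indexes) and
keycol1-keycol3 standing in for the three AK2IV10401 columns:

-- Load into the keyless staging table first.
-- (Add a FORMATFILE option here if the file layout needs one.)
BULK INSERT Test..IV10401_stage FROM 'c:\bulkInsertFile2.txt'

-- Then move across only rows whose key is not already in the target.
INSERT INTO Test..IV10401
SELECT s.*
FROM Test..IV10401_stage s
WHERE NOT EXISTS (SELECT *
                  FROM Test..IV10401 t
                  WHERE t.keycol1 = s.keycol1
                    AND t.keycol2 = s.keycol2
                    AND t.keycol3 = s.keycol3)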

> Is there some sort of switch that I'm missing
> in my bulk insert statement or can I suppress the errors?

Well, with bcp you can use -b and -m to set the batch size and to
increase the number of errors permitted; the BULK INSERT equivalents
are the BATCHSIZE and MAXERRORS options. See Books Online for further
details.
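
A sketch of what those options look like, with illustrative values only:

BULK INSERT Test..IV10401 FROM 'c:\bulkInsertFile2.txt'
WITH (BATCHSIZE = 1000,  -- commit every 1000 rows
      MAXERRORS = 50)    -- tolerate up to 50 rejected rows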

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

----------

I'll describe the table the best I can. Then I'm headed home for the
day in hopes that someone can explain my error. I appreciate your
response Erland but I feel I haven't given enough info. So here it
goes.

I've got an empty table IV10402. I didn't create the table, Great
Plains Dynamics did, but I need to import to it. There are several
indexes and primary keys defined on it that I can't, with good
confidence, alter. It has 11 columns and I have a txt file that
consists of 10, so I've created a format file which I'm fairly certain
is correct. It appears as follows.

-----------

8.0
10
1 SQLCHAR 0 15 "\t" 1 PRCSHID SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 1 "\t" 2 EPITMTYP SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 31 "\t" 3 ITEMNMBR SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 9 "\t" 4 UOFM SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 41 "\t" 5 QTYFROM ""
6 SQLCHAR 0 41 "\t" 6 QTYTO ""
7 SQLCHAR 0 41 "\t" 7 PSITMVAL ""
8 SQLCHAR 0 41 "\t" 8 EQUOMQTY ""
9 SQLCHAR 0 41 "\t" 9 QTYBSUOM ""
10 SQLCHAR 0 12 "\n" 10 SEQNUMBR ""

--------

So when I run my bulk insert command, which appears as follows,

BULK INSERT Test..IV10402 FROM 'c:\bulkInsertFile.txt'
WITH (DATAFILETYPE='char',
ROWTERMINATOR='\n',
FORMATFILE='c:\iv10402.fmt')

I get this error,

Server: Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PKIV10402'. Cannot insert
duplicate key in object 'IV10402'.
The statement has been terminated.

I then go to check what the Primary Key constraint PKIV10402 is and
here is the info that I can offer up.

It is not clustered. It is based on 6 columns, PRCSHID, EPITMTYP,
ITEMNMBR, UOFM, QTYFROM, and QTYTO. The Create UNIQUE checkbox is
checked and greyed out, so I can't use my previous workaround of
checking the "Ignore Duplicate Key" box. Index Filegroup is PRIMARY.
Fill Factor is 90%.

One last thing is that the Table Identity Column for IV10402 is set to
DEX_ROW_ID, which happens to be the one column that I'm not inserting.
Is this a problem?

Again, this table is empty when I run this insert. I'm almost positive
that there aren't actually duplicate primary keys. Did Microsoft
really offer no way to find out which rows it feels are duplicates?
That seems very shortsighted in my opinion. Thanks for reading. I'll
see you all tomorrow.

-pk

----------

pk (philip.kluss@.gmail.com) writes:
> Ughh, bulk insert is going to be the end of me. I just need to insert
> two separate .txt files into two separate tables, but I can't do it. I
> did finally get one to go through by not demanding that the index
> AK2IV10401 be unique. I don't know what problems that will cause for
> me in the future, but I would at least like to get to the future to see
> SOMETHING happen. As for the second table, there is a Primary Key that
> is blocking all my progress and I don't know how to get around this.
> Here is the error I get.
> Violation of PRIMARY KEY constraint 'PKIV10402'. Cannot insert
> duplicate key in object 'IV10402'.
> The statement has been terminated.
> I REALLY don't think I'm violating anything, so why is it kicking and
> screaming at me?

Because you are violating something.

Get the data into a keyless staging table, then do a SELECT WHERE EXISTS
to find clashes with existing data, and "SELECT keycol, COUNT(*) FROM
tbl GROUP BY keycol HAVING COUNT(*) > 1" to find the dups in the file.
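
Spelled out against IV10402's six key columns, with a hypothetical
staging table named IV10402_stage, the two checks could look like this:

-- Rows in the file that clash with rows already in the target table:
SELECT s.*
FROM IV10402_stage s
WHERE EXISTS (SELECT *
              FROM IV10402 t
              WHERE t.PRCSHID = s.PRCSHID
                AND t.EPITMTYP = s.EPITMTYP
                AND t.ITEMNMBR = s.ITEMNMBR
                AND t.UOFM = s.UOFM
                AND t.QTYFROM = s.QTYFROM
                AND t.QTYTO = s.QTYTO)

-- Duplicates within the file itself:
SELECT PRCSHID, EPITMTYP, ITEMNMBR, UOFM, QTYFROM, QTYTO, COUNT(*)
FROM IV10402_stage
GROUP BY PRCSHID, EPITMTYP, ITEMNMBR, UOFM, QTYFROM, QTYTO
HAVING COUNT(*) > 1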

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

----------

pk (philip.kluss@.gmail.com) writes:
> It is not clustered. It is based on 6 columns, PRCSHID, EPITMTYP,
> ITEMNMBR, UOFM, QTYFROM, and QTYTO. The Create UNIQUE checkbox is
> checked and greyed out, so I can't use my previous workaround of
> checking the "Ignore Duplicate Key" box. Index Filegroup is PRIMARY.
> Fill Factor is 90%.
> One last thing is that the Table Identity Column for IV10402 is set to
> DEX_ROW_ID, which happens to be the one column that I'm not inserting.
> Is this a problem?
> Again, this table is empty when I run this insert. I'm almost positive
> that there aren't actually duplicate primary keys. Did Microsoft
> really offer no way to find out which rows it feels are duplicates?

Either there are duplicates in the file, or the format file is incorrect
somehow, so that data ends up in the wrong columns.

Create a copy of the table, but put no indexes or constraints on the
table. Bulk load data into that table. Check for duplicates with

SELECT col1, col2, ..., COUNT(*)
FROM tbl
GROUP BY col1, col2, ...
HAVING COUNT(*) > 1

Also, do something like "SELECT TOP 100 * FROM tbl" to see whether the
data makes any sense.

You can use the Object Browser in Query Analyzer to create a script
for the table. Find the table; the scripting options are on the context
menu. Create the table in tempdb.
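
A shortcut, if scripting feels like overkill (note that SELECT INTO
copies the IDENTITY property, but neither indexes nor constraints):

SELECT *
INTO tempdb..IV10402_stage   -- hypothetical name for the keyless copy
FROM Test..IV10402
WHERE 1 = 0   -- copies the structure only, no rows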

As for finding directly which rows are problematic, BULK INSERT
does not seem to offer this option. BCP does, but I think the error
file covers only format errors, not insertion errors.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

----------

Erland,

You are a lifesaver. It took me a while to figure out what that SQL
statement you were telling me to use was supposed to do, but as soon as
I did, it found the 2 lines out of 15000 that managed to trip the
duplicate key error. I've since corrected it and am feeling much more
confident in my troubleshooting skills for the future. Thank you very
much.

-pk
