Saturday, February 25, 2012

BULK INSERT /BCP with UDT

Dear all,

I tried to do BULK INSERT / BCP of a table with a UDT column there.

The data file looks like this: (Tab, \n, second column is the UDT column)

9556987 C,C,0.84 4974588
9845355 C,C,0.84 4974588
9556988 A,C,0.82 4974589
9845356 A,C,0.82 4974589
9556989 A,T,0.90 4974590

This is the SQL:

BULK INSERT dbo.TEST
FROM 'E:\My Documents\Visual Studio 2005\Projects\Genotype\Genotype_UDT\BULK_in1.txt'
WITH
(
KEEPIDENTITY,
KEEPNULLS,
ROWS_PER_BATCH = 5,
TABLOCK
)

This is the error msg:

Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 2 (GENOTYPE).

My UDT works fine if I do this:

INSERT INTO TEST VALUES('A,A,0.1101', 'Whatever')

Or INSERT INTO SELECT ...

So what's the problem?

Does SQL CLR engine call UDT Parse(...) for BULK INSERT? If not, must we provide the UDT value in binary like 0x11222ccffdedd, not a string?It has been too long nobody answer this question, so bring it up again. Also turn the question in a simple way:

How to do BULK load for UDT?

Thanks everybody!|||

Finally got the solution, and I spend some time to blog it here:

http://sqlblogcasts.com/blogs/dong/archive/2006/11/27/how-to-bulk-insert-your-user-defined-types-udt.aspx

Comments please, and your view on UDT or CLR in general.

dong

|||I have same problem and i am getting nuts about it.

Target table:
CREATE TABLE [dbo].[organiz] (
[cislo_subjektu] [int] NULL ,
[reference_subjektu] [varchar] (30) COLLATE SQL_Czech_CP1250_CI_AS NULL ,
[nazev_subjektu] [varchar] (100) COLLATE SQL_Czech_CP1250_CI_AS NULL ,
[nazev_zkraceny] [varchar] (40) COLLATE SQL_Czech_CP1250_CI_AS NULL ,
[ulice] [char] (40) COLLATE SQL_Czech_CP1250_CI_AS NULL ,
[psc] [char] (15) COLLATE SQL_Czech_CP1250_CI_AS NULL ,
[misto] [char] (40) COLLATE SQL_Czech_CP1250_CI_AS NULL ,
[ico] [char] (15) COLLATE SQL_Czech_CP1250_CI_AS NULL ,
[dic] [char] (15) COLLATE SQL_Czech_CP1250_CI_AS NULL ,
[uverovy_limit] [money] NULL ,
[stav_limitu] [money] NULL
) ON [PRIMARY]
GO

Insert command:

BULK INSERT pokus.dbo.organiz
FROM 'D:\organizace.xml' /* my file */
WITH (FORMATFILE = 'D:\pok.xml')

Formatfile:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="12"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="30" COLLATION="SQL_Czech_CP1250_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="100" COLLATION="SQL_Czech_CP1250_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="40" COLLATION="SQL_Czech_CP1250_CI_AS"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="40" COLLATION="SQL_Czech_CP1250_CI_AS"/>
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="15" COLLATION="SQL_Czech_CP1250_CI_AS"/>
<FIELD ID="7" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="40" COLLATION="SQL_Czech_CP1250_CI_AS"/>
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="15" COLLATION="SQL_Czech_CP1250_CI_AS"/>
<FIELD ID="9" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="15" COLLATION="SQL_Czech_CP1250_CI_AS"/>
<FIELD ID="10" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="30"/>
<FIELD ID="11" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="30"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="cislo_subjektu" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="reference_subjektu" xsi:type="SQLBINARY"/>
<COLUMN SOURCE="3" NAME="nazev_subjektu" xsi:type="SQLBINARY"/>
<COLUMN SOURCE="4" NAME="nazev_zkraceny" xsi:type="SQLBINARY"/>
<COLUMN SOURCE="5" NAME="ulice" xsi:type="SQLBINARY"/>
<COLUMN SOURCE="6" NAME="psc" xsi:type="SQLBINARY"/>
<COLUMN SOURCE="7" NAME="misto" xsi:type="SQLBINARY"/>
<COLUMN SOURCE="8" NAME="ico" xsi:type="SQLBINARY"/>
<COLUMN SOURCE="9" NAME="dic" xsi:type="SQLBINARY"/>
<COLUMN SOURCE="10" NAME="uverovy_limit" xsi:type="SQLBINARY"/>
<COLUMN SOURCE="11" NAME="stav_limitu" xsi:type="SQLBINARY"/>
</ROW>
</BCPFORMAT>

All i get is

Bulk load data conversion error (type mismatch or invalid character for the specified codepage)

no matter what i have changed(so far) in format file.|||

Hi, Pkv

I'm not sure your problem is the same as mine. What I asked and later found the solution is about the UDT bulk load, and the fundamental issue is UDT's Parse(String) method won't be called during bulk op.

By looking at your data, I didn't see any UDT in your table. So I reckon your problem could lie somewhere between the data file, format file and codepage.

If you can show a raw data file segment, and the real error msg, I'll try to replicate on my machine.

|||Here is the sample of the file:

<?xml version="1.0" encoding="windows-1250"?>
<root>
<organizace>
<cislo_subjektu>65673</cislo_subjektu>
<reference_subjektu>0000002135</reference_subjektu>
<nazev_subjektu>Martin Prudky</nazev_subjektu>
<nazev_zkraceny>PRUDKY LOUKA U LITV.</nazev_zkraceny>
<ulice>U nádra?í</ulice>
<psc>435 33</psc>
<misto>Louka u Litvínova</misto>
<ico>15137970</ico>
<dic>CZ6011151003</dic>
<uverovy_limit>200000</uverovy_limit>
<stav_limitu>172081.0000</stav_limitu>
</organizace>
<organizace>
<cislo_subjektu>67125</cislo_subjektu>
<reference_subjektu>0000003301</reference_subjektu>
<nazev_subjektu>Lesní dru?stvo obcí Ledec nad Sázavou</nazev_subjektu>
<nazev_zkraceny>LESNI DRUZSTVO LEDEC</nazev_zkraceny>
<ulice>Hnevkovice 56</ulice>
<psc>582 94</psc>
<misto>Ledec nad Sázavou</misto>
<ico>25251937</ico>
<dic>CZ25251937</dic>
<uverovy_limit>50000</uverovy_limit>
<stav_limitu>0.0000</stav_limitu>
</organizace>
<organizace>
<cislo_subjektu>67159</cislo_subjektu>
<reference_subjektu>0000004170</reference_subjektu>
<nazev_subjektu>D.D.L.Servis, s.r.o.</nazev_subjektu>
<nazev_zkraceny>D.D.L. SERVIS LOMNICE</nazev_zkraceny>
<ulice>?elechy 37</ulice>
<psc>512 51</psc>
<misto>Lomnice nad Popelkou</misto>
<ico>25291521</ico>
<dic>CZ25291521</dic>
<uverovy_limit>50000</uverovy_limit>
<stav_limitu>0.0000</stav_limitu>
</organizace>
.
.
.

It goes for next 30 000 table rows.

Here is list of errors:
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 3, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 5, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 6, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 7, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 8, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 9, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 10, column 1 (cislo_subjektu).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 11, column 1 (cislo_subjektu).
Msg 4865, Level 16, State 1, Line 1
Cannot bulk load because the maximum number of errors (10) was exceeded.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".|||

Hi, pkv,

Sorry for the late reply. After looking at your data, I'm now sure your problem isn't same as mine.

Your task is to BULK INSERT XML data file into a relational table. Also called 'shredding'.

There are 3 ways to do this: (from the MCTS trainning kit book)

1, Use OPENXML and the XML stored procedures

2, Use the XML data type's node() method

3, Use the SQLXML API.

Here are my code for method number 2:

DECLARE @.X XML

SELECT @.X = X.C

FROM OPENROWSET(BULK

'E:\My Documents\SQL Server Management Studio\Projects\pkv_problem\organizace.xml',

SINGLE_BLOB) AS X(C)

INSERT INTO dbo.organiz

SELECT

C.value('(./cislo_subjektu/text())[1]', 'int') AS 'cislo_subjektu'

,C.value('(./reference_subjektu/text())[1]', 'varchar(30)') AS 'reference_subjektu'

,C.value('(./nazev_subjektu/text())[1]', 'varchar(100)') AS 'nazev_subjektu'

,C.value('(./nazev_zkraceny/text())[1]', 'varchar(40)') AS 'nazev_zkraceny'

,C.value('(./ulice/text())[1]', 'char(40)') AS 'ulice'

,C.value('(./psc/text())[1]', 'char(15)') AS 'psc'

,C.value('(./misto/text())[1]', 'char(40)') AS 'misto'

,C.value('(./ico/text())[1]', 'char(15)') AS 'ico'

,C.value('(./dic/text())[1]', 'char(15)') AS 'dic'

,C.value('(./uverovy_limit/text())[1]', 'money') AS 'uverovy_limit'

,C.value('(./stav_limitu/text())[1]', 'money') AS 'stav_limitu'

FROM @.X.nodes('/root/organizace') T(C)

As you can see, this solution has nothing to do with a format file here.

(The SELECT clause in the INSERT INTO statement can be replaced by this one:

SELECT

C.value('*[1]', 'int') AS 'cislo_subjektu'

,C.value('*[2]', 'varchar(30)') AS 'reference_subjektu'

,C.value('*[3]', 'varchar(100)') AS 'nazev_subjektu'

,C.value('*[4]', 'varchar(40)') AS 'nazev_zkraceny'

,C.value('*[5]', 'char(40)') AS 'ulice'

,C.value('*Devil', 'char(15)') AS 'psc'

,C.value('*[7]', 'char(40)') AS 'misto'

,C.value('*Music', 'char(15)') AS 'ico'

,C.value('*[9]', 'char(15)') AS 'dic'

,C.value('*[10]', 'money') AS 'uverovy_limit'

,C.value('*[11]', 'money') AS 'stav_limitu'

FROM @.X.nodes('/root/organizace') T(C)

should give you same result.)

I guess you may have a misconception here, for bcp.exe or BULK INSERT, a format file can be in two formats: a text one or a XML one. But both only work for a flat data file. It seems you are binding together a XML data file with a XML format file, hoping to shred data from XML into relational, no, that won't work.

Regards,

dong

|||Hi Dong.

Thank you for helping me.

Pkv

No comments:

Post a Comment