Monday, March 19, 2012

Bulk insert of long unicode strings

Here is the situation, please let me know if you have any tips:

.TXT files in a share at \\foo

SPROCS run daily parses of many things, including data on that share. The
other day, we encountered rows in the TXT files which looked like:

column1Row1data,column2Row1data
column1Row2data,column2Row2data

...etc..

However, column2 was about 6000 bytes of unicode. We are bulk inserting
into a table specifying nvarchar(4000). When it encounters high unicode
rows, it throws a truncation error (16).
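
For reference, the failing load looks roughly like this; the table, column, and file names below are simplified placeholders:

-- Simplified sketch of the current setup; real object and file names differ.
CREATE TABLE dbo.ParsedData (
    column1 nvarchar(100)  NULL,
    column2 nvarchar(4000) NULL
)

-- Rows whose second field runs past 4000 characters make this statement
-- throw the truncation error and abort the calling SPROC.
BULK INSERT dbo.ParsedData
    FROM '\\foo\daily.txt'
    WITH (DATAFILETYPE    = 'widechar',
          FIELDTERMINATOR = ',',
          ROWTERMINATOR   = '\n')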

We really need information contained in the first 200 bytes of the string in
column2. However, the errors are causing the calling SPROC to abort.
Please let me know if you have any suggestions on workarounds for this
situation. Ideally, we would only Bulk Insert a sub-section of column2 if
possible.

Thanks!
/Ty

Ty (tybala on the server at hotmail.com) writes:
> Here is the situation, please let me know if you have any tips:
> .TXT files in a share at \\foo
> SPROCS run daily parses of many things, including data on that share. The
> other day, we encountered rows in the TXT files which looked like:
> column1Row1data,column2Row1data
> column1Row2data,column2Row2data
> ..etc..
> However, column2 was about 6000 bytes of unicode. We are bulk inserting
> into a table specifying nvarchar(4000). When it encounters high unicode
> rows, it throws a truncation error (16).
> We really need information contained in the first 200 bytes of the
> string in column2.

You can use a format file like this one:

8.0
3
1 SQLNCHAR 0 200 "" 1 a Finnish_Swedish_CS_AS
2 SQLNCHAR 0 0 "," 0 dummy ""
3 SQLNCHAR 0 0 "\r\n" 2 b Finnish_Swedish_CS_AS

Here you define the host file to have three fields: the first is a
fixed-length field of 200 characters, the second is terminated by a
comma, and the third is terminated by the end of the line. By
specifying a 0 in the sixth column of the format file for the second
field, you specify that this field is not to be imported into SQL
Server.

You may want to change the collation to whatever fits the collation
you use in your database.

Note that this only works if all occurrences of the first field are
more than 200 characters long. If a record had a shorter value in
this field, it would steal characters from the second field. (You
would probably get an error when importing the file, as BCP would not
find the delimiter for the second field.)
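
For completeness, here is a minimal sketch of how you could use the format file from the calling procedure; the table name, column sizes, and file names are only placeholders:

-- Sketch only; names and paths are placeholders.
CREATE TABLE dbo.ImportTarget (
    a nvarchar(200)  NULL,   -- host field 1: the fixed-length prefix of each record
    b nvarchar(4000) NULL    -- host field 3: the data after the comma, up to end of line
)

BULK INSERT dbo.ImportTarget
    FROM '\\foo\daily.txt'
    WITH (FORMATFILE = '\\foo\truncate200.fmt')

From the command line you would instead pass the same format file to BCP with its -f option.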

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

I wanted to thank you for answering my question. We successfully
implemented a variant of your solution last night. The assistance is much
appreciated.

/Ty

"Erland Sommarskog" <sommar@.algonet.se> wrote in message
news:Xns9401F28A16A1CYazorman@.127.0.0.1...
> Ty (tybala on the server at hotmail.com) writes:
> > Here is the situation, please let me know if you have any tips:
> > .TXT files in a share at \\foo
> > SPROCS run daily parses of many things, including data on that share.
The
> > other day, we encountered rows in the TXT files which looked like:
> > column1Row1data,column2Row1data
> > column1Row2data,column2Row2data
> > ..etc..
> > However, column2 was about 6000 bytes of unicode. We are bulk inserting
> > into a table specifying nvarchar(4000). When it encounters high unicode
> > rows, it throws a truncation error (16).
> > We really need information contained in the first 200 bytes of the
> > string in column2.
> You can use a format file like this one:
> 8.0
> 3
> 1 SQLNCHAR 0 200 "" 1 a Finnish_Swedish_CS_AS
> 2 SQLNCHAR 0 0 "," 0 dummy ""
> 3 SQLNCHAR 0 0 "\r\n" 2 b Finnish_Swedish_CS_AS
> Here you defined the host file to have three fields: the first is a
> 200-character long fixed length field, the second is closed by a ,
> and the third field is close by end-of-line. By specifying a 0 in
> the sixth column in the format file for the second field, you specify
> that this field is is not be imported into SQL Server.
> You may want to change the collation what fits with the collation you
> use in your database.
> Note that this only works if all occurrances of the first field is
> more than 200 characters. Would there be a record with a shorter
> length of this field, it will steal characters from the second field.
> (You would probably get an error when importing the file, as BCP will
> not find the delimiter for the second field.)
>
> --
> Erland Sommarskog, SQL Server MVP, sommar@.algonet.se
> Books Online for SQL Server SP3 at
> http://www.microsoft.com/sql/techin.../2000/books.asp
