Thursday, March 22, 2012

Bulk Insert Unicode

Good day,

We are using bulk insert with a formatfile to load a text file into sqlexpress. One field in the text file contains non-ascii (unicode) charaters and the corresponding database field is nvarchar.

When the record and row are specified in the format file as:

<FIELD ID="23" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400" COLLATION="Latin1_General_CI_AS"/>
<COLUMN SOURCE="23" NAME="FullName" xsi:type="SQLNVARCHAR"/>

the value "Boca CurĂ¡" gets imported as "Boca Cur??". This is true even with datafiletype set to widenative or widechar, and/or codepage set to raw or acp.

When the field is specfied in the record section of the format file as NCharTerm, the bulk insert terminates immediately with:

Msg 4863, Level 16, State 1, Server MINT\SQLEXPRESS, Line 1
Bulk load data conversion error (truncation) for row 2, column 23 (FullName).

This is true regardless of which bulk insert options specified.

What could be going wrong and how can we address it?

Thanks in advance...

Hi,

I see it's been a while since you posted this, but I have just run into this same issue today and I was wondering if you were able to resolve it gracefully?

regards

|||

Hi knightEknight!

We solved this with MS using a developer support incident. I should have updated this post when the information was fresh! Just went back through my notes and the readme for the project, and the solution was to convert the input file from UTF8 text encoding to ANSI text encoding. This is a Save As option in Notepad and most text editors. If you have a large file and need some C# code to do this, let me know....

|||Thanks! My input comes from various sources and I've pretty much concluded the same thing. I'll just have to find a way to convert all the input to ANSI. - Regards|||

Here is a C# example:

// utf8 in, ansi out

StreamReader inStream = File.OpenText(@."E:\Data\DEM\Panama\panama_canal_final.txt");

FileStream outStream = File.Create(@."E:\Data\DEM\Panama\panama_canal_fsubset.txt");

Encoding ansi = Encoding.Default;

string inLine;

byte[] outBytes;

while (true)

{

inLine = inStream.ReadLine();

if (inLine == null)

break;

outBytes = ansi.GetBytes(inLine + "\r\n");

outStream.Write(outBytes, 0, outBytes.Length);

}

inStream.Close();

outStream.Close();

|||

Instead of converting your files to ANSI as suggested you could also fix your delimiter problem for Unicode files. The TERMINATOR should be '\t\0' instead of just '\t' for tabs and '\r\0\n\0' for the row delimiter if you are using standard CR+LF.

Regards,

Lars

sql

No comments:

Post a Comment