
Tuesday, March 27, 2012

bulk inserting uniqueidentifier column

Hi all,
I'm trying to bulk insert a uniqueidentifier column from a Unicode file.
The file contains GUIDs generated by a C# application, formatted like
this (separated by "|"):

guid | field1 | field2
fc0c0c42-438e-4897-96db-8b0489e873ef|field1|field2

In my destination table I have three columns:
id (uniqueidentifier)
field1 (nvarchar)
field2 (nvarchar)

For the bulk insert I use a format file like this:

9.0
3
1 SQLNCHAR 0 0 "|\0" 1 ID Latin1_General_CI_AS
2 SQLNCHAR 0 0 "|\0" 2 Field1 Latin1_General_CI_AS
3 SQLNCHAR 0 0 "|\0" 3 Field2 Latin1_General_CI_AS

and I use this script

BULK INSERT [dbo].[KWTA2] FROM 'd:\WTA2.txt'
WITH (FORMATFILE = 'd:\wta2Format.FMT')

It doesn't work; it fails with
Msg 8152, Level 16, State 13, Line 2
String or binary data would be truncated.

I've also tried specifying SQLUNIQUEID instead of SQLNCHAR in the FMT
file; the insert then runs, but it imports different data. For example the
guid fc0c0c42-438e-4897-96db-8b0489e873ef became
00350031-0039-0033-3100-300030003000

Please can you help me?
Why does SQL Server convert the alphanumeric GUID into a digits-only ID?
How can I bulk insert GUIDs? (I didn't find anything googling around :\ )

Thanks!

Bob(bob.speaking@.gmail.com) writes:

Quote:

Originally Posted by

I'm trying to bulk insert a uniqueidentifier column from unicode file.
In my file I have guid generated from c# application and they are
formatted in this way (separated by "|") :
>
guid | field1 | field2
fc0c0c42-438e-4897-96db-8b0489e873ef|field1|field2
>
In my destination table I have three column:
id (uniqueidentifier)
field1 (nvarchar)
field2 (nvarchar)
>
I use in bulk insert a format file like this :
>
9.0
3
1 SQLNCHAR 0 0 "|\0" 1 ID Latin1_General_CI_AS
2 SQLNCHAR 0 0 "|\0" 2 Field1 Latin1_General_CI_AS
3 SQLNCHAR 0 0 "|\0" 3 Field2 Latin1_General_CI_AS

Does the file really consist of one single line?

Assuming that you have one record per line in the file, the terminator
for field 3 should be \r\0\n\0. What happens now is that Field2 in the
first record extends into the GUID in the second record, and then it
goes downhill from there.
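For the three-column layout in question, a corrected non-XML format file might look like the following. This is a sketch assuming one record per line terminated by CR+LF in a UTF-16LE file; it is not verified against the poster's actual data file:

```
9.0
3
1 SQLNCHAR 0 0 "|\0"      1 ID     Latin1_General_CI_AS
2 SQLNCHAR 0 0 "|\0"      2 Field1 Latin1_General_CI_AS
3 SQLNCHAR 0 0 "\r\0\n\0" 3 Field2 Latin1_General_CI_AS
```

Only the last field's terminator changes: it carries the row terminator for the whole record.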

Quote:

Originally Posted by

I've also tried to specify in FMT file SQLUNIQUEID instead of SQLNCHAR
and it works perfectly but it imports another data. For example the
guid fc0c0c42-438e-4897-96db-8b0489e873ef became
00350031-0039-0033-3100-300030003000


SQLUNIQUEID is what you would use in a binary file. It's not applicable
here.
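As an aside, the mangled value is consistent with SQLUNIQUEID reading the UTF-16 text bytes as if they were a binary GUID: the first GUID field 0x00350031, stored little-endian, is the byte sequence 31 00 35 00, which is exactly the UTF-16LE encoding of the text "15". A quick Python check (an illustration, not part of the original thread):

```python
# The first dword of the imported GUID, 0x00350031, stored little-endian,
# is the bytes 31 00 35 00 -- the UTF-16LE encoding of the text "15".
# So SQLUNIQUEID read the widechar *text* as raw binary GUID data.
raw = "15".encode("utf-16-le")
print(raw.hex())                           # 31003500
print(hex(int.from_bytes(raw, "little")))  # 0x350031
```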

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Quote:

Originally Posted by

Does the file really consist of one single line?
>
Assuming that you have one record per line in the file, the terminator
for field 3 should be \r\0\n\0. What happens now is that Field2 in the
first record extends into the GUID in the second record, and then it
goes downhill from there.


I'm sorry, when cutting and pasting the sample text file I removed the
correct syntax :\
In fact the last line has the terminator you specified. :)

Quote:

Originally Posted by

Quote:

Originally Posted by

I've also tried to specify in FMT file SQLUNIQUEID instead of SQLNCHAR
and it works perfectly but it imports another data. For example the
guid fc0c0c42-438e-4897-96db-8b0489e873ef became
00350031-0039-0033-3100-300030003000


>
SQLUNIQUEID is what you would use in a binary file. It's not applicable
here.


I'm migrating this bulk insert from SQL Server 2000... in 2005 it
doesn't work. Is it caused by stricter rules in the 2005 engine?
Is SQL Server converting my "char" guid to binary?

Thanks for the prompt reply :)

bobsql

BULK INSERTing UNICODE data with format files

Sorry, but could you please show me how to convert this bcp format file
(specifically the separators) to Unicode so that it works (with a
Unicode file)? I have tried various methods without success.
8.0
2
1 SQLNCHAR 2 100 "," 1 RecordId
Latin1_General_CI_AS
2 SQLNCHAR 4 0 "\r\n" 2 TheText
Latin1_General_CI_AS
Thanks
Peter Doyle
"Erland Sommarskog" wrote:

> Nitin M (nitin@.nowhere.com) writes:
> Ah! Glad to hear that you were able to find it out yourself.
>
> The obvious idea would be to save the format file as Unicode, but that
> does not work; you only get a message about unknown version. I tried in
> SQL 2005 as well, but SQL 2005 appears to think that a Unicode file must
> be an XML format file.
> --
> Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/pr...oads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodin...ions/books.mspx
>Peter Doyle (PeterDoyle@.discussions.microsoft.com) writes:
> Sorry but could you please show me how I convert (specifically the
> separators) this bcp format file to unicode so that it works (with a
> unicode file). I have tried various methods without success.
> 8.0
> 2
> 1 SQLNCHAR 2 100 "," 1 RecordId
> Latin1_General_CI_AS
> 2 SQLNCHAR 4 0 "\r\n" 2 TheText
> Latin1_General_CI_AS
The above won't work for several reasons. You have specified a
prefix length, but prefix lengths are only for binary formats.
The problem with importing Unicode files is that the format file itself
must be an 8-bit file. At least I seem to recall that BCP freaked out
when I tried with a Unicode file. (Hm, did I try BULK INSERT as well?)
Anyway, with the format indicated by your format file, you don't need any
format file at all. For a table like:
CREATE TABLE test (a nvarchar(100) NOT NULL,
b nvarchar(100) NOT NULL)
And this data (saved as Unicode):
Detta är lite data,Och detta är mer data.
Sedan kommer det än mer data här,Och nu kommer det än mer
Trams!, Det är vad det är!
This command line will do:
bcp tempdb..test in slask.bcp -w -t, -T
-w specifies that the file is a Unicode file.
But in case that your real case calls for a format file, here is one
that works with the above table and data:
8.0
2
1 SQLNCHAR 0 0 ",\0" 1 a ""
2 SQLNCHAR 0 0 "\r\0\n\0" 2 b ""
The tricky part is the \0 that must come after the character it belongs
to, due to endianness.
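The endianness point can be checked directly: in UTF-16LE (the encoding SQL Server means by a "Unicode" data file here), each ASCII character is stored as its character byte followed by a zero byte, which is why the \0 goes after the character in the format-file terminator. A quick Python check, not part of the original thread:

```python
# UTF-16LE stores ASCII characters as <byte> <0x00>,
# so a "," terminator in a widechar file is the bytes 2C 00,
# written in a format file as ",\0".
comma = ",".encode("utf-16-le")
print(comma)   # b',\x00'

crlf = "\r\n".encode("utf-16-le")
print(crlf)    # b'\r\x00\n\x00'  -> "\r\0\n\0" in the format file
```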
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

BULK INSERTing UNICODE data with format files

Hi,
I am trying to bulk insert data with UNICODE characters into a table using a
format file. I am using SQL Server 2000 with all the latest SPs.
When I try to bulk insert the data I get the following error.
"Bulk Insert: Unexpected end-of-file (EOF) encountered in data file."
What am I doing incorrectly?
Please help me out with this. What is the correct way to do it? My data
file will have UNICODE characters (for nchar, nvarchar sql types) and
will also have data for other types (int, datetime, etc.), and I want to
use a format file.
Thanks in anticipation,
Nitin M
I have created the data file in the following way.
---
StreamWriter DataWriter = new
StreamWriter("data.txt",false,System.Text.Encoding.Unicode);
DataWriter.WriteLine("1/1/2005@.@.aa@.@.aaaa@.@.23@.@.-1.9879@.@.");
DataWriter.Close();
----
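For comparison, here is a rough Python equivalent of the writer above (a sketch; .NET's System.Text.Encoding.Unicode is UTF-16LE with a byte-order mark, which is what DATAFILETYPE = 'widechar' expects):

```python
# Write the same test record as the C# StreamWriter: UTF-16LE with a BOM.
# Python's "utf-16" codec emits the BOM and defaults to little-endian.
line = "1/1/2005@.@.aa@.@.aaaa@.@.23@.@.-1.9879@.@."
with open("data.txt", "w", encoding="utf-16", newline="") as f:
    f.write(line + "\r\n")   # WriteLine appends CR+LF
```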
--
This is the definition of my table.
---
CREATE TABLE [dbo].[AllTypes] (
[mydate] [datetime] NULL ,
[mychar] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[myvarchar] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[myint] [int] NULL ,
[myreal] [real] NULL
) ON [PRIMARY]
---
The bulk insert query that I use
---
bulk insert alltypes from 'data.txt' with
(
datafiletype='widechar',
formatfile = 'bcp.format.txt',
tablock
)
---
My format file
---
8.0
5
1 SQLNCHAR 0 0 "@.@." 1 mydate ""
2 SQLNCHAR 0 0 "@.@." 2 mychar SQL_Latin1_General_CP1_CI_AS
3 SQLNCHAR 0 0 "@.@." 3 myvarchar SQL_Latin1_General_CP1_CI_AS
4 SQLNCHAR 0 0 "@.@." 4 myint ""
5 SQLNCHAR 0 0 "@.@.\r\n" 5 myreal ""
---

Nitin M (nitin@.nowhere.com) writes:

> I am trying to bulk insert data with UNICODE characters into a table
> using a format file. I am using SQL Server 2000 with all the latest
> SPs.
> When I try to bulk insert the data I get the following error.
> "Bulk Insert: Unexpected end-of-file (EOF) encountered in data file."
> What am I doing incorrectly?
> Please help me out with this. What is the correct way to do this. My
> data file will have UNICODE characters (for nchar, nvarchar sql types)
> and would also have data for other types like (int, datetime etc). And I
> want to use a format file.
I only got halfway to solving this puzzle. You need to specify the
separators as Unicode as well. I tried this:
8.0
5
1 SQLNCHAR 0 0 "\0@.\0@." 1 mydate ""
2 SQLNCHAR 0 0 "\0@.\0@." 2 mychar SQL_Latin1_General_CP1_CI_AS
3 SQLNCHAR 0 0 "\0@.\0@." 3 myvarchar SQL_Latin1_General_CP1_CI_AS
4 SQLNCHAR 0 0 "\0@.\0@." 4 myint ""
5 SQLNCHAR 0 0 "\0@.\0@.\0\r\0\n" 5 myreal ""
This got me past the EOF error, but instead I got conversion errors for
the numeric values. You could make all the columns character columns, so
you see what BCP actually finds, and then maybe modify the separators
from there.
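Nitin's follow-up confirms the cause: the separator bytes were in the wrong order for a little-endian file. In UTF-16LE each ASCII character is the character byte followed by a zero byte, so the \0 in a format-file separator must follow each character; putting it first corresponds to big-endian order. A Python illustration:

```python
# Little-endian UTF-16 ("widechar") stores each ASCII char as <byte> <0x00>,
# so the format-file separator needs the \0 AFTER each character.
sep = "@"
print(sep.encode("utf-16-le"))  # b'@\x00'  -> write as "@\0"
print(sep.encode("utf-16-be"))  # b'\x00@'  -> the byte order that fails here
```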
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Hi Erland,
Thanks a lot for looking into this.
I also got around the problem by specifying the separators as Unicode. The
reason you are getting conversion errors is the byte ordering of the
separators. Try the other byte order. It works for me; I am not getting any
conversion errors.
Is there no other cleaner way around this?
Thanks,
Nitin
"Erland Sommarskog" <esquel@.sommarskog.se> wrote in message
news:Xns973F74A723AEFYazorman@.127.0.0.1...
> Nitin M (nitin@.nowhere.com) writes:
>
> I only got half-way of solving this puzzle. You need to specify the
> separators as Unicode as well. I tried this:
> 8.0
> 5
> 1 SQLNCHAR 0 0 "\0@.\0@." 1 mydate ""
> 2 SQLNCHAR 0 0 "\0@.\0@." 2 mychar SQL_Latin1_General_CP1_CI_AS
> 3 SQLNCHAR 0 0 "\0@.\0@." 3 myvarchar SQL_Latin1_General_CP1_CI_AS
> 4 SQLNCHAR 0 0 "\0@.\0@." 4 myint ""
> 5 SQLNCHAR 0 0 "\0@.\0@.\0\r\0\n" 5 myreal ""
> This got me past the EOF error, but instead I got conversion errors for
> the numeric values. You could make all columns characters columns, so
> you see what BCP actually finds, and then maybe modify the separators
> from this.
>
> --
> Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/pr...oads/books.mspx
> Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Nitin M (nitin@.nowhere.com) writes:
> Thanks a lot for looking into this.
> Even I got around the problem by specifying separators as Unicode. The
> reason you are getting conversion errors is due to the byte ordering of
> the separators. Try the other byte order. It works for me. I am not
> getting any conversion errors.
Ah! Glad to hear that you were able to find it out yourself.

> Is there no other cleaner way around this?
The obvious idea would be to save the format file as Unicode, but that
does not work; you only get a message about unknown version. I tried in
SQL 2005 as well, but SQL 2005 appears to think that a Unicode file must
be an XML format file.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Thursday, March 22, 2012

Bulk insert Unicode data

Hi,
I have to import Unicode data from a txt file to a sql server table.
I'd like to use BULK INSERT with the FORMAT FILE option. I've tried with
DATAFILETYPE = WIDECHAR but without success.
The error is "Cannot perform bulk insert. Invalid collation name for source
column xx in format file".
Format file is :
8.0
19
1 SQLNCHAR 0 50 "~" 1 BatchGroup Latin1_General_CI_AS
2 SQLNCHAR 0 50 "~" 2 ProductID Latin1_General_CI_AS
3 SQLNCHAR 0 100 "~" 3 ProducerIDERP Latin1_General_CI_AS
4 SQLNCHAR 0 100 "~" 4 ProducerName Latin1_General_CI_AS
5 SQLNCHAR 0 50 "~" 5 ProductDefinitionName Latin1_General_CI_AS
6 SQLNCHAR 0 500 "~" 6 Name Latin1_General_CI_AS
7 SQLNCHAR 0 500 "~" 14 DisplayName Latin1_General_CI_AS
8 SQLFLT8 0 8 "~" 7 MeasureEquivalenceFactor ""
9 SQLNCHAR 0 10 "~" 8 MeasureUnitID Latin1_General_CI_AS
10 SQLNCHAR 0 10 "~" 9 MeasureUnitIDAlt Latin1_General_CI_AS
11 SQLMONEY 0 8 "~" 12 Price ""
12 SQLFLT8 0 8 "~" 13 VAT ""
13 SQLNCHAR 0 100 "~" 10 PricingCategoryName Latin1_General_CI_AS
14 SQLNCHAR 0 100 "~" 11 SupplierName Latin1_General_CI_AS
15 SQLNCHAR 0 250 "~" 15 Tipo Latin1_General_CI_AS
16 SQLNCHAR 0 250 "~" 16 SottoTipo Latin1_General_CI_AS
17 SQLINT 0 4 "~" 17 Pagina ""
18 SQLNCHAR 0 250 "\r\n" 18 ImageName Latin1_General_CI_A
19 SQLINT 0 0
(~ is field separator)
Thanks!
MB

Hi
It is not clear what you are trying to do with field 19. If it is not
required, then check out
http://msdn.microsoft.com/library/d...>
bcp_9y43.asp
At a guess try:
18 SQLNCHAR 0 250 "" 18 ImageName Latin1_General_CI_A
19 SQLCHAR 0 0 "\r\n" 0 Dump
John
"Michela Burello" wrote:

> Hi,
> I have to import Unicode data from a txt file to a sql server table.
> I'd like to use BULK INSERT with the FORMAT FILE option. I 've tryed with
> Datafiletype = WIDECHAR but without success.
> The error is "Cannot perform bulk insert. Invalid collation name for source
> column xx in format file".
> Format file is :
> 8.0
> 19
> 1 SQLNCHAR 0 50 "~" 1 BatchGroup Latin1_General_CI_AS
> 2 SQLNCHAR 0 50 "~" 2 ProductID Latin1_General_CI_AS
> 3 SQLNCHAR 0 100 "~" 3 ProducerIDERP Latin1_General_CI_AS
> 4 SQLNCHAR 0 100 "~" 4 ProducerName Latin1_General_CI_AS
> 5 SQLNCHAR 0 50 "~" 5 ProductDefinitionName Latin1_General_CI_AS
> 6 SQLNCHAR 0 500 "~" 6 Name Latin1_General_CI_AS
> 7 SQLNCHAR 0 500 "~" 14 DisplayName Latin1_General_CI_AS
> 8 SQLFLT8 0 8 "~" 7 MeasureEquivalenceFactor ""
> 9 SQLNCHAR 0 10 "~" 8 MeasureUnitID Latin1_General_CI_AS
> 10 SQLNCHAR 0 10 "~" 9 MeasureUnitIDAlt Latin1_General_CI_AS
> 11 SQLMONEY 0 8 "~" 12 Price ""
> 12 SQLFLT8 0 8 "~" 13 VAT ""
> 13 SQLNCHAR 0 100 "~" 10 PricingCategoryName Latin1_General_CI_AS
> 14 SQLNCHAR 0 100 "~" 11 SupplierName Latin1_General_CI_AS
> 15 SQLNCHAR 0 250 "~" 15 Tipo Latin1_General_CI_AS
> 16 SQLNCHAR 0 250 "~" 16 SottoTipo Latin1_General_CI_AS
> 17 SQLINT 0 4 "~" 17 Pagina ""
> 18 SQLNCHAR 0 250 "\r\n" 18 ImageName Latin1_General_CI_A
> 19 SQLINT 0 0
> (~ is field separator)
> Thanks!
> MB

Hi,
thank you for your answer but I still have problems!
I tried to bulk insert without field 19.
The error is "Cannot perform bulk insert. Invalid collation name for source
column 18 in format file".
My format file is now:
8.0
18
1 SQLNCHAR 0 50 "~" 1 BatchGroup Latin1_General_CI_AS
2 SQLNCHAR 0 50 "~" 2 ProductID Latin1_General_CI_AS
3 SQLNCHAR 0 100 "~" 3 ProducerIDERP Latin1_General_CI_AS
4 SQLNCHAR 0 100 "~" 4 ProducerName Latin1_General_CI_AS
5 SQLNCHAR 0 50 "~" 5 ProductDefinitionName Latin1_General_CI_AS
6 SQLNCHAR 0 500 "~" 6 Name Latin1_General_CI_AS
7 SQLNCHAR 0 500 "~" 14 DisplayName Latin1_General_CI_AS
8 SQLFLT8 0 8 "~" 7 MeasureEquivalenceFactor ""
9 SQLNCHAR 0 10 "~" 8 MeasureUnitID Latin1_General_CI_AS
10 SQLNCHAR 0 10 "~" 9 MeasureUnitIDAlt Latin1_General_CI_AS
11 SQLMONEY 0 8 "~" 12 Price ""
12 SQLFLT8 0 8 "~" 13 VAT ""
13 SQLNCHAR 0 100 "~" 10 PricingCategoryName Latin1_General_CI_AS
14 SQLNCHAR 0 100 "~" 11 SupplierName Latin1_General_CI_AS
15 SQLNCHAR 0 250 "~" 15 Tipo Latin1_General_CI_AS
16 SQLNCHAR 0 250 "~" 16 SottoTipo Latin1_General_CI_AS
17 SQLINT 0 4 "~" 17 Pagina ""
18 SQLNCHAR 0 250 "\r\n" 18 ImageName Latin1_General_CI_A
My query is:
bulk insert MyTable
from 'myfile.txt'
with ( fieldterminator = '~',
firstrow = 2,
DATAFILETYPE = 'widechar',
FORMATFILE = 'myformatfile.fmt',
codepage = 1200
)
The table MyTable has this structure:
BatchGroup nvarchar 50
ProductID varchar 100 1
ProducerIDERP nvarchar 100
ProducerName nvarchar 100
ProductDefinitionName varchar 100
Name nvarchar 500 1
MeasureEquivalenceFactor float 8
MeasureUnitID nvarchar 10 1
MeasureUnitIDAlt nvarchar 10
PricingCategoryName nvarchar 100
SupplierName nvarchar 100
Price money 8
VAT float 8
DisplayName nvarchar 500
Tipo varchar 500
SottoTipo varchar 500
Pagina int 4
ImageName varchar 500
ItemID int 4 (identity)
Thanks
MB
"John Bell" wrote:
> Hi
> It is not clear what you are trying to do with field 19? if it is not
> required then check out
> http://msdn.microsoft.com/library/d...
t_bcp_9y43.asp
> At a guess try:
> 18 SQLNCHAR 0 250 "" 18 ImageName Latin1_General_CI_A
> 19 SQLCHAR 0 0 "\r\n" 0 Dump
>
> John
> "Michela Burello" wrote:
>

On Thu, 28 Apr 2005 00:06:03 -0700, Michela Burello <Michela
Burello@.discussions.microsoft.com> wrote:

>The error is "Cannot perform bulk insert. Invalid collation name for source
>column xx in format file".
(snip)
>18 SQLNCHAR 0 250 "\r\n" 18 ImageName Latin1_General_CI_A
Hi Michela,
There is no collation "Latin1_General_CI_A". You need to add either an S
or an I at the end of this line.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

Hi
This one works for me!!
8.0
19
1 SQLNCHAR 2 100 "~" 1 BatchGroup SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 2 100 "~" 2 ProductID SQL_Latin1_General_CP1_CI_AS
3 SQLNCHAR 2 200 "~" 3 ProducerIDERP SQL_Latin1_General_CP1_CI_AS
4 SQLNCHAR 2 200 "~" 4 ProducerName SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 2 100 "~" 5 ProductDefinitionName SQL_Latin1_General_CP1_CI_AS
6 SQLNCHAR 2 1000 "~" 6 Name SQL_Latin1_General_CP1_CI_AS
7 SQLFLT8 1 8 "~" 7 MeasureEquivalenceFactor ""
8 SQLNCHAR 2 20 "~" 8 MeasureUnitID SQL_Latin1_General_CP1_CI_AS
9 SQLNCHAR 2 20 "~" 9 MeasureUnitIDAlt SQL_Latin1_General_CP1_CI_AS
10 SQLNCHAR 2 200 "~" 10 PricingCategoryName SQL_Latin1_General_CP1_CI_AS
11 SQLNCHAR 2 200 "~" 11 SupplierName SQL_Latin1_General_CP1_CI_AS
12 SQLMONEY 1 8 "~" 12 Price ""
13 SQLFLT8 1 8 "~" 13 VAT ""
14 SQLNCHAR 2 1000 "~" 14 DisplayName SQL_Latin1_General_CP1_CI_AS
15 SQLCHAR 2 500 "~" 15 Tipo SQL_Latin1_General_CP1_CI_AS
16 SQLCHAR 2 500 "~" 16 SottoTipo SQL_Latin1_General_CP1_CI_AS
17 SQLINT 1 4 "~" 17 Pagina ""
18 SQLCHAR 2 500 "" 18 ImageName SQL_Latin1_General_CP1_CI_AS
19 SQLINT 0 0 "\r\n" 19 ItemID ""
And the statement:
bulk insert MyTable
from 'C:\temp\myfile.txt'
with ( firstrow = 1,
FORMATFILE = 'C:\temp\myformatfile.fmt'
)
John

Bulk Insert Unicode

Good day,

We are using bulk insert with a format file to load a text file into SQL Express. One field in the text file contains non-ASCII (Unicode) characters and the corresponding database field is nvarchar.

When the record and row are specified in the format file as:

<FIELD ID="23" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400" COLLATION="Latin1_General_CI_AS"/>
<COLUMN SOURCE="23" NAME="FullName" xsi:type="SQLNVARCHAR"/>

the value "Boca Curá" gets imported as "Boca Cur??". This is true even with datafiletype set to widenative or widechar, and/or codepage set to raw or acp.

When the field is specified in the record section of the format file as NCharTerm, the bulk insert terminates immediately with:

Msg 4863, Level 16, State 1, Server MINT\SQLEXPRESS, Line 1
Bulk load data conversion error (truncation) for row 2, column 23 (FullName).

This is true regardless of which bulk insert options are specified.

What could be going wrong and how can we address it?

Thanks in advance...

Hi,

I see it's been a while since you posted this, but I have just run into this same issue today and I was wondering if you were able to resolve it gracefully?

regards


Hi knightEknight!

We solved this with MS using a developer support incident. I should have updated this post when the information was fresh! Just went back through my notes and the readme for the project, and the solution was to convert the input file from UTF8 text encoding to ANSI text encoding. This is a Save As option in Notepad and most text editors. If you have a large file and need some C# code to do this, let me know....

Thanks! My input comes from various sources and I've pretty much concluded the same thing. I'll just have to find a way to convert all the input to ANSI. - Regards

Here is a C# example:

// utf8 in, ansi out
StreamReader inStream = File.OpenText(@"E:\Data\DEM\Panama\panama_canal_final.txt");
FileStream outStream = File.Create(@"E:\Data\DEM\Panama\panama_canal_fsubset.txt");
Encoding ansi = Encoding.Default;
string inLine;
byte[] outBytes;

while (true)
{
    inLine = inStream.ReadLine();
    if (inLine == null)
        break;
    outBytes = ansi.GetBytes(inLine + "\r\n");
    outStream.Write(outBytes, 0, outBytes.Length);
}

inStream.Close();
outStream.Close();


Instead of converting your files to ANSI as suggested you could also fix your delimiter problem for Unicode files. The TERMINATOR should be '\t\0' instead of just '\t' for tabs and '\r\0\n\0' for the row delimiter if you are using standard CR+LF.

Regards,

Lars
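With Lars's terminators applied, the FIELD element from the original format file would look something like the following sketch (surrounding RECORD/COLUMN elements unchanged; the final-field ID shown is illustrative, since the thread only quotes field 23):

```xml
<!-- a tab-separated field in a UTF-16LE file: terminator is "\t\0" -->
<FIELD ID="23" xsi:type="CharTerm" TERMINATOR="\t\0" MAX_LENGTH="400"
       COLLATION="Latin1_General_CI_AS"/>
<!-- hypothetical last field: the CR+LF row terminator becomes "\r\0\n\0" -->
<FIELD ID="24" xsi:type="CharTerm" TERMINATOR="\r\0\n\0" MAX_LENGTH="24"/>
```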


Bulk Insert Unicode

Good day,
We are using bulk insert with a format file to load a text file into
SQL Express. One field in the text file contains non-ASCII (Unicode)
characters and the corresponding database field is nvarchar.
When the record and row are specified in the format file as:
<FIELD ID="23" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400"
COLLATION="Latin1_General_CI_AS"/>
<COLUMN SOURCE="23" NAME="FullName" xsi:type="SQLNVARCHAR"/>
the value "Boca Curá" gets imported as "Boca CurÃ¡". This is true
even with datafiletype set to widenative or widechar, and/or codepage
set to raw or acp.
When the field is specified in the record section of the format file as
NCharTerm, the bulk insert terminates immediately with:
Msg 4863, Level 16, State 1, Server MINT\SQLEXPRESS, Line 1
Bulk load data conversion error (truncation) for row 2, column 23
(FullName).
This is true regardless of the bulk insert options specified.
What could be going wrong and how can we address it?
Thanks in advance...

Hi
Can you specify SQLNVARCHAR(50), for example? If I remember correctly, it
truncates the string if you do not specify a length. As it is NVARCHAR, it
behaves like NVARCHAR(2).
<rgreene@.icanmarine.com> wrote in message
news:1143039002.249599.122720@.g10g2000cwb.googlegroups.com...
Good day,
We are using bulk insert with a formatfile to load a text file into
sqlexpress. One field in the text file contains non-ascii (unicode)
charaters and the corresponding database field is nvarchar.
When the record and row are specified in the format file as:
<FIELD ID="23" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400"
COLLATION="Latin1_General_CI_AS"/>
<COLUMN SOURCE="23" NAME="FullName" xsi:type="SQLNVARCHAR"/>
the value "Boca Curá" gets imported as "Boca CurÃ¡". This is true
even with datafiletype set to widenative or widechar, and/or codepage
set to raw or acp.
When the field is specfied in the record section of the format file as
NCharTerm, the bulk insert terminates immediately with:
Msg 4863, Level 16, State 1, Server MINT\SQLEXPRESS, Line 1
Bulk load data conversion error (truncation) for row 2, column 23
(FullName).
This is true regardless of the bulk insert options specified.
What could be going wrong and how can we address it?
Thanks in advance...

Hi Uri. Thanks for the quick reply.
When I use NCharTerm in the record section of the formatfile and
SQLNVARCHAR(200) in the row section I get:
Msg 4858, Level 16, State 1, Server MINT\SQLEXPRESS, Line 1
Line 49 in format file "C:\Documents and Settings\rgreene\My Documents\PlaceNames\format.xml": bad value SQLNVARCHAR(200) for attribute "xsi:type".
Any other ideas?

Well, can you show us a format file? Is it an XML file?
<rgreene@.icanmarine.com> wrote in message
news:1143039873.112229.184880@.g10g2000cwb.googlegroups.com...
> Hi Uri. Thanks for the quick reply.
> When I use NCharTerm in the record section of the formatfile and
> SQLNVARCHAR(200) in the row section I get:
> Msg 4858, Level 16, State 1, Server MINT\SQLEXPRESS, Line 1
> Line 49 in format file "C:\Documents and Settings\rgreene\My Documents\PlaceNames\format.xml": bad value SQLNVARCHAR(200) for attribute "xsi:type".
> Any other ideas?
>

Here is the original XML format file (the one that imports, but
changes, the unicode text):
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="5"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="12"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="12"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="30"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="30"/>
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="7"/>
<FIELD ID="7" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="8"/>
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="4"/>
<FIELD ID="9" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="7"/>
<FIELD ID="10" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="2"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="11" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="10"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="12" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="5"/>
<FIELD ID="13" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="4"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="14" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="4"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="15" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="16" xsi:type="CharTerm" TERMINATOR="\t"
MAX_LENGTH="12"/>
<FIELD ID="17" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="4"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="18" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="2"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="19" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="6"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="20" xsi:type="CharTerm" TERMINATOR="\t"
MAX_LENGTH="128"/>
<FIELD ID="21" xsi:type="CharTerm" TERMINATOR="\t"
MAX_LENGTH="128"/>
<FIELD ID="22" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="23" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="24" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="400"
COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="25" xsi:type="CharTerm" TERMINATOR="\r\n"
MAX_LENGTH="24"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="RegionCode" xsi:type="SQLTINYINT"/>
<COLUMN SOURCE="2" NAME="UniqueFeatureIdentifier"
xsi:type="SQLINT"/>
<COLUMN SOURCE="3" NAME="UniqueNameIdentifier" xsi:type="SQLINT"/>
<COLUMN SOURCE="4" NAME="Lat" xsi:type="SQLFLT8"/>
<COLUMN SOURCE="5" NAME="Lon" xsi:type="SQLFLT8"/>
<COLUMN SOURCE="10" NAME="FeatureClassificationCode"
xsi:type="SQLNCHAR"/>
<COLUMN SOURCE="11" NAME="FeatureDesignationCode"
xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="12" NAME="PopulatedPlaceClassification"
xsi:type="SQLTINYINT"/>
<COLUMN SOURCE="13" NAME="PrimaryCountryCode"
xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="14" NAME="ADM1Code" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="15" NAME="ADM2" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="16" NAME="Dimension" xsi:type="SQLINT"/>
<COLUMN SOURCE="17" NAME="SecondaryCountryCode"
xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="18" NAME="NameType" xsi:type="SQLNCHAR"/>
<COLUMN SOURCE="19" NAME="LanguageCode" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="22" NAME="SortName" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="23" NAME="FullName" xsi:type="SQLNVARCHAR()"/>
<COLUMN SOURCE="24" NAME="FullNameND" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="25" NAME="ModificationDate"
xsi:type="SQLDATETIME"/>
</ROW>
</BCPFORMAT>
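An editorial aside: corruption of accented characters during bulk load is the classic signature of an encoding mismatch, where the bytes of a multi-byte encoding are decoded one byte at a time. A small Python sketch (illustrative only; not the poster's actual data or files) shows the effect:

```python
# Illustrative only: decoding UTF-8 bytes with a single-byte code page
# (here Latin-1) splits each accented character into two characters.
original = "Boca Curá"
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)  # Boca CurÃ¡
```

If the data file is genuinely UTF-16 (widechar), the usual fix on the format-file side is to declare the fields as NCharTerm rather than CharTerm, so terminators and data are read as two-byte characters.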

Monday, March 19, 2012

Bulk insert of long unicode strings

Here is the situation, please let me know if you have any tips:

.TXT files in a share at \\foo

SPROCS run daily parses of many things, including data on that share. The
other day, we encountered rows in the TXT files which looked like:

column1Row1data,column2Row1data
column1Row2data,column2Row2data

...etc..

However, column2 was about 6000 bytes of unicode. We are bulk inserting
into a table column specified as nvarchar(4000). When it encounters these long
unicode rows, it throws a truncation error (severity 16).

We really need information contained in the first 200 bytes of the string in
column2. However, the errors are causing the calling SPROC to abort.
Please let me know if you have any suggestions on workarounds for this
situation. Ideally, we would only Bulk Insert a sub-section of column2 if
possible.

Thanks!
/Ty

Ty (tybala on the server at hotmail.com) writes:
> Here is the situation, please let me know if you have any tips:
> .TXT files in a share at \\foo
> SPROCS run daily parses of many things, including data on that share. The
> other day, we encountered rows in the TXT files which looked like:
> column1Row1data,column2Row1data
> column1Row2data,column2Row2data
> ..etc..
> However, column2 was about 6000 bytes of unicode. We are bulk inserting
> into a table specifying nvarchar(4000). When it encounters high unicode
> rows, it throws a truncation error (16).
> We really need information contained in the first 200 bytes of the
> string in column2.

You can use a format file like this one:

8.0
3
1 SQLNCHAR 0 200 "" 1 a Finnish_Swedish_CS_AS
2 SQLNCHAR 0 0 "," 0 dummy ""
3 SQLNCHAR 0 0 "\r\n" 2 b Finnish_Swedish_CS_AS

Here you define the host file to have three fields: the first is a
200-character fixed-length field, the second is terminated by a comma,
and the third is terminated by end-of-line. By specifying a 0 in the
sixth column of the format file for the second field, you specify that
this field is not to be imported into SQL Server.

You may want to change the collation to one that matches the collation
you use in your database.

Note that this only works if all occurrences of the first field are at
least 200 characters long. If a record has a shorter value in this
field, it will steal characters from the second field.
(You would probably get an error when importing the file, as BCP will
not find the delimiter for the second field.)
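To make the mechanics concrete, here is a rough Python analogue of what this format file asks BCP to do (parse_record is a hypothetical helper for illustration, not part of any SQL Server tooling): keep a fixed-length prefix as the first field, discard the remainder up to the comma, and keep the rest of the line as the second field.

```python
def parse_record(line: str, prefix_len: int = 200):
    """Mimic the three-field format file: fixed prefix, skipped middle, kept tail."""
    prefix = line[:prefix_len]               # field 1: fixed-length, imported
    rest = line[prefix_len:]
    _skipped, _, tail = rest.partition(",")  # field 2: up to ",", not imported
    return prefix, tail.rstrip("\r\n")       # field 3: up to end-of-line

# With a shortened prefix_len for illustration:
a, b = parse_record("shortXXXtrailing,second field\r\n", prefix_len=8)
# a == "shortXXX", b == "second field"
```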

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||I wanted to thank you for answering my question. We successfully
implemented a variant of your solution last night. The assistance is much
appreciated.

/Ty

"Erland Sommarskog" <sommar@.algonet.se> wrote in message
news:Xns9401F28A16A1CYazorman@.127.0.0.1...
> Ty (tybala on the server at hotmail.com) writes:
> > Here is the situation, please let me know if you have any tips:
> > .TXT files in a share at \\foo
> > SPROCS run daily parses of many things, including data on that share.
The
> > other day, we encountered rows in the TXT files which looked like:
> > column1Row1data,column2Row1data
> > column1Row2data,column2Row2data
> > ..etc..
> > However, column2 was about 6000 bytes of unicode. We are bulk inserting
> > into a table specifying nvarchar(4000). When it encounters high unicode
> > rows, it throws a truncation error (16).
> > We really need information contained in the first 200 bytes of the
> > string in column2.
> You can use a format file like this one:
> 8.0
> 3
> 1 SQLNCHAR 0 200 "" 1 a Finnish_Swedish_CS_AS
> 2 SQLNCHAR 0 0 "," 0 dummy ""
> 3 SQLNCHAR 0 0 "\r\n" 2 b Finnish_Swedish_CS_AS
> Here you defined the host file to have three fields: the first is a
> 200-character long fixed length field, the second is closed by a ,
> and the third field is close by end-of-line. By specifying a 0 in
> the sixth column in the format file for the second field, you specify
> that this field is is not be imported into SQL Server.
> You may want to change the collation what fits with the collation you
> use in your database.
> Note that this only works if all occurrances of the first field is
> more than 200 characters. Would there be a record with a shorter
> length of this field, it will steal characters from the second field.
> (You would probably get an error when importing the file, as BCP will
> not find the delimiter for the second field.)
>
> --
> Erland Sommarskog, SQL Server MVP, sommar@.algonet.se
> Books Online for SQL Server SP3 at
> http://www.microsoft.com/sql/techin.../2000/books.asp