Sunday, March 25, 2012

Bulk Insert vs. Data Flow Task (different row results using flat file source)

I'm importing a large CSV file two different ways: one with the Bulk Insert Task and the other with a Data Flow Task (flat file source -> OLE DB destination).

With the Bulk Insert Task I'm putting each entire CSV row into a single column. With the Data Flow Task I'm mapping each CSV value to its own column in the SQL table.

I used two different flat file sources and got the following:

Flat file 1: Bulk Insert Task = 12,649,499 rows; Data Flow Task = 4,215,817 rows
Flat file 2: Bulk Insert Task = 3,403,254 rows; Data Flow Task = 1,134,359 rows

Anyone have any guess as to why this is happening?

There's roughly a factor of 3 between the two components, which could mean the Flat File Connection Manager is interpreting 3 physical rows as 1. This is likely related to the row delimiter setting in the Flat File connection manager; a quick way to check is sketched below.
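As a minimal sketch of that check (not something the thread itself ran, and the file name is a placeholder), you can count the candidate row terminators in the raw file and see which count matches the rows you expect:

```python
# Count candidate row terminators in the raw file to see which one
# matches the expected row count. File name is a placeholder.
path = "flatfile1.csv"

with open(path, "rb") as f:
    data = f.read()

crlf = data.count(b"\r\n")
lf = data.count(b"\n")   # includes the \n inside every \r\n
cr = data.count(b"\r")   # includes the \r inside every \r\n

print("CRLF terminators:", crlf)
print("LF-only terminators:", lf - crlf)
print("CR-only terminators:", cr - crlf)
```

If the dominant terminator in the file doesn't match the row delimiter configured in the Flat File Connection Manager, that mismatch would explain SSIS merging several physical rows into one.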

Can you try the preview in the Flat File connection manager and see if the rows show up correctly?

Yeah, thanks for the reply. I've checked the preview and it looks fine. I've also tracked down some of the skipped rows and there's nothing significant about them. They look exactly the same as all the others. Perhaps there is a hidden character or something? Is this possible?
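If a hidden character is the culprit, scanning the file for bytes outside the printable range can surface it. A minimal sketch, assuming the file is plain ASCII/ANSI (the file name is a placeholder):

```python
path = "flatfile1.csv"                         # placeholder file name
allowed = set(range(0x20, 0x7F)) | {0x09}      # printable ASCII plus tab

with open(path, "rb") as f:
    for lineno, line in enumerate(f, start=1):
        # Strip the expected terminator, then flag anything left over that
        # isn't printable: embedded NULs, stray control characters, or a
        # lone \r in the middle of a row.
        stray = [b for b in line.rstrip(b"\r\n") if b not in allowed]
        if stray:
            print(lineno, [hex(b) for b in stray])
```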

Can you compare the rows imported by the Flat File source with the rows imported by the Bulk Insert Task and see if there is any difference between them?

Thanks,
Ovidiu Burlacu
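As a hedged aside, that comparison can also be done mechanically rather than by eye, assuming both tables are exported to text with the same column delimiter (the file names below are hypothetical):

```python
# Compare the two exported row sets and list what the Data Flow dropped.
def load_rows(path):
    with open(path, encoding="utf-8", errors="replace") as f:
        return set(line.rstrip("\r\n") for line in f)

bulk_rows = load_rows("bulk_insert_export.txt")   # hypothetical export
flow_rows = load_rows("data_flow_export.txt")     # hypothetical export

missing = bulk_rows - flow_rows
print(len(missing), "rows are in the Bulk Insert table but not the Data Flow table")
for row in sorted(missing)[:20]:                  # show a sample
    print(row)
```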

Yes, I can do that, and the result is that they look exactly the same. Ugh. There really is no indication I can see as to why rows get dropped. The dropped rows aren't all of one type, and the rows before and after them show no significant difference. It's really very puzzling.

Can you put a data viewer in your pipeline and see what rows pass through at execution time?

Thanks,
Ovidiu Burlacu
