Tuesday, March 27, 2012

Bulk insertion

Hi,

I am working on an application that is to read a large number of XML files, take out specific values from each file, and store these in a SQL server so that reports can be generated from these values. There are some 15-20,000 files for each month of the year. I am OK with parsing the files and getting the fields that I need but I don't want to insert one record at a time as I parse the files. I was told that I can create a .exe file that parses the xml files and stores the required values in a csv file and use these csv files to initiate a bulk insert, using Business Intelligence Studio. I have not been able to find any info or article on how to do this. Any help on how I can accomplish this, or alternate solutions is greatly appreciated.

You can make use of SQL BulkCopy feature of ADO.NET. Load your XML into a DataSet/DataTable and run SQL Bulk copy into your table. Very few lines of code.

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
{
foreach (string tableName in tableNames)
{
SqlBulkCopy(dataSet.Tables[tableName], tableName, bulkCopy);
}
}


private static void SqlBulkCopy(DataTable dataTable, string tableName, SqlBulkCopy bulkCopy)
{
bulkCopy.DestinationTableName = tableName;
bulkCopy.ColumnMappings.Clear();
foreach (DataColumn myCol in dataTable.Columns)
bulkCopy.ColumnMappings.Add(myCol.ColumnName, myCol.ColumnName);
bulkCopy.WriteToServer(dataTable);
}

|||

Thanks for the reply Raghu. I don't need all the data elements in the XML files, only a selected few. This is an example XML file:

<xml xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'
xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882'
xmlns:rs='urn:schemas-microsoft-com:rowset'
xmlns:z='#RowsetSchema'>
<s:Schema id='RowsetSchema'>
<s:ElementType name='row' content='eltOnly' rs:updatable='true'>
<s:AttributeType name='c0' rs:name='AFR Finalization' rs:number='1' rs:write='true'>
<s:datatype dt:type='string' rs:dbtype='str' dt:maxLength='18' rs:precision='0' rs:fixedlength='true' rs:maybenull='false'/>
</s:AttributeType>
<s:AttributeType name='Processed' rs:number='2' rs:write='true'>
<s:datatype dt:type='int' dt:maxLength='4' rs:precision='0' rs:fixedlength='true' rs:maybenull='false'/>
</s:AttributeType>
<s:AttributeType name='Rate' rs:number='3' rs:write='true'>
<s:datatype dt:type='string' rs:dbtype='str' dt:maxLength='7' rs:precision='0' rs:fixedlength='true' rs:maybenull='false'/>
</s:AttributeType>
<s:extends type='rs:rowbase'/>
</s:ElementType>
</s:Schema>
<rs:data>
<rs:insert>
<z:row c0='CIF '/>
<z:row c0=' Input ' Processed='12028'/>
<z:row c0=' Finalized ' Processed='5444' Rate=' 45.26%'/>
<z:row c0='RTS '/>
<z:row c0=' Input ' Processed='9802'/>
<z:row c0=' Finalized ' Processed='5504' Rate=' 56.15%'/>
<z:row c0='Interception '/>
<z:row c0=' Input ' Processed='12639'/>
<z:row c0=' Finalized ' Processed='8220' Rate=' 65.04%'/>
</rs:insert>
</rs:data>
</xml>

I only need CIF/Input and RTS/INPUT from this particular file. There will be one of this file for each day of the month and I will have to process a month's worth of xml files. There are tens of different xml files but they all follow the same schema.

Thanks.

No comments:

Post a Comment