I am using SQL Server 2005 Management Studio Express. I am trying to insert records in bulk into the "Students" table.
Students:
uniqueid varchar(9) (primary Key)
lname varchar(35)
fname varchar(35)
My problem is that when even one record in the bulk load already exists in the Students table, the entire bulk insert fails. I don't want that; I want the rest of the records to be inserted without error.
How can I do this? Are there any disadvantages to this approach?
You could bulk insert into a staging table and then apply the logic to insert only non-duplicated keys. I rarely (if ever) insert directly into a "master table". Once the data is in the staging table, you can do a simple insert:
INSERT INTO Students
SELECT uniqueid, lname, fname, ...
FROM staging_Students
WHERE uniqueid NOT IN (SELECT uniqueid FROM Students)
Drop the staging table after you are done, or leave it there to determine why you are getting duplicated keys from the insert file...
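For example, the whole load could look roughly like this; the file path, delimiters, and the choice to truncate rather than drop the staging table are placeholders you would adjust for your own environment:

-- Load the raw file into the staging table first (hypothetical path and delimiters).
BULK INSERT staging_Students
FROM 'C:\data\students.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')

-- Move only the rows whose key is not already in Students.
INSERT INTO Students (uniqueid, lname, fname)
SELECT s.uniqueid, s.lname, s.fname
FROM staging_Students s
WHERE s.uniqueid NOT IN (SELECT uniqueid FROM Students)

-- Clear the staging table for the next load (or keep the rows to inspect the duplicates).
TRUNCATE TABLE staging_Students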
|||
I am using a staging table and inserting with a NOT IN condition, but I am trying to see if there is a way around it. The reason is that I think query performance will suffer if I do another SELECT. Will performance go down? The Students table will have 60,000 records.
|||You could also use OPENROWSET with the BULK option (see Books Online for details). You would have to define a format file to use with the BULK option.
INSERT INTO Students
SELECT uniqueid, lname, fname, ...
FROM OPENROWSET(BULK <bulk parameters>) AS A
WHERE uniqueid NOT IN (SELECT uniqueid FROM Students)
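Filled in, it might look something like this; the data file path and format file name are just examples:

INSERT INTO Students (uniqueid, lname, fname)
SELECT a.uniqueid, a.lname, a.fname
FROM OPENROWSET(BULK 'C:\data\students.csv',          -- hypothetical data file
                FORMATFILE = 'C:\data\students.fmt')  -- hypothetical format file
     AS a
WHERE a.uniqueid NOT IN (SELECT uniqueid FROM Students)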
|||
Yes, the performance depends on the number of rows in the main table and the number of rows being inserted. But another option is to remove the primary key on the uniqueid column and create a unique index with the IGNORE_DUP_KEY option set to ON. This will cause the duplicate rows to be ignored during BULK INSERT and eliminates the additional join used in the staging table approach. In any case, you should use a staging table if you have more complex data cleaning tasks to perform.
So do the following:
1. Drop primary key on uniqueid column
2. Add a unique index, for example:
CREATE UNIQUE INDEX ix_uq_uniqueid ON Students(uniqueid) WITH (IGNORE_DUP_KEY = ON)
3. Now, bulk insert as before and the duplicate rows will be ignored automatically.
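Put together, the sequence might look like this; the primary key constraint name and file path are assumptions you would replace with your own, and the index is made clustered here only to match the existing clustered key on uniqueid:

-- 1. Drop the existing primary key (assumed constraint name).
ALTER TABLE Students DROP CONSTRAINT PK_Students

-- 2. Recreate uniqueness as an index that silently discards duplicate keys.
CREATE UNIQUE CLUSTERED INDEX ix_uq_uniqueid
ON Students (uniqueid)
WITH (IGNORE_DUP_KEY = ON)

-- 3. Bulk insert as before; duplicate uniqueid values now raise a warning
--    and are skipped, while the remaining rows are inserted (hypothetical path).
BULK INSERT Students
FROM 'C:\data\students.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')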
|||Umachandar, the Students table can have 60,000 records, and at any one time I insert at most 300 records into it. This is my scenario. Can you tell me which approach is better to follow (using IGNORE_DUP_KEY, or using a NOT IN condition to check for duplicates)? Do I need a staging table if I use the IGNORE_DUP_KEY option? Right now I have a clustered index on the uniqueid column in the Students table.
Scenario:
60 users will be using my application. Each user can upload a .csv file with at most 300 records.
This is what I am doing right now: when the user uploads a .csv file from the front end, the .NET application invokes the SqlBulkCopy class built into .NET 2.0 and bulk inserts up to 300 records into a staging table. After that I invoke a stored procedure which puts the data into 3 different tables, so my procedure has 3 INSERT statements (with NOT IN conditions). After a successful insertion I delete that user's data from the staging table. The number of users and the number of records in the Students table can grow further in the future. Roughly, the procedure looks like the sketch below.
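A minimal sketch of that procedure, assuming the staging table carries a userid column to separate each user's upload (the procedure name, parameter, and column are made up for illustration):

CREATE PROCEDURE dbo.LoadStudentsFromStaging
    @userid int
AS
BEGIN
    SET NOCOUNT ON

    -- One such INSERT per target table; NOT IN filters out keys already present.
    INSERT INTO Students (uniqueid, lname, fname)
    SELECT s.uniqueid, s.lname, s.fname
    FROM staging_Students s
    WHERE s.userid = @userid
      AND s.uniqueid NOT IN (SELECT uniqueid FROM Students)

    -- ...two more INSERT ... NOT IN statements for the other two tables...

    -- Remove this user's rows from staging once the inserts succeed.
    DELETE FROM staging_Students WHERE userid = @userid
END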
Or should I use the IGNORE_DUP_KEY option and avoid the staging table?
Thanks...