Tuesday, March 27, 2012

Bulk inserts to Data Warehouse - Best Practices?

Hello all,

I just started a new job this week and they complain about the length of
time it takes to load data into their data warehouse,
which they do once a month.

From what I can gather, they rebuild the indexes before the insert with an
80% Fillfactor, then insert the data (with the
indexes enabled), then rebuild the indexes with a 100% Fillfactor.
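
In other words, I believe the monthly job amounts to something like the
following (the table, column, and index names are just stand-ins, and the
ALTER INDEX syntax assumes SQL Server 2005 or later; on SQL Server 2000 the
equivalent would be DBCC DBREINDEX):

-- Rebuild the fact table's indexes at 80% fill factor, leaving
-- 20% free space in each page for the incoming rows.
ALTER INDEX ALL ON dbo.FactSales REBUILD WITH (FILLFACTOR = 80);

-- Load the month's data with all of the indexes still enabled,
-- so every inserted row also has to maintain every index.
INSERT INTO dbo.FactSales (SaleDate, CustomerKey, ProductKey, Amount)
SELECT SaleDate, CustomerKey, ProductKey, Amount
FROM dbo.MonthlyStage;

-- Rebuild again at 100% fill factor so the pages are packed
-- tightly for the read-mostly reporting queries.
ALTER INDEX ALL ON dbo.FactSales REBUILD WITH (FILLFACTOR = 100);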

Most of my RDBMS experience is with a different product. We would have
disabled the indexes and Foreign Keys, loaded the data, then
re-enabled them, moving any records that violated the constraints into an
appropriate audit table to be checked after.
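
Translated into what I understand to be the SQL Server equivalent, that
approach would look roughly like the sketch below (the object names are
invented, the syntax assumes SQL Server 2005 or later, and since, as far as
I can tell, there is no automatic way to divert the violating rows, that
step is written out by hand):

-- Disable a nonclustered index and a foreign key before the load
ALTER INDEX IX_FactSales_CustomerKey ON dbo.FactSales DISABLE;
ALTER TABLE dbo.FactSales NOCHECK CONSTRAINT FK_FactSales_DimCustomer;

-- ... bulk load the month's rows into dbo.FactSales ...

-- Move any rows that would violate the foreign key into an audit table
INSERT INTO dbo.FactSales_Audit (SaleDate, CustomerKey, ProductKey, Amount)
SELECT f.SaleDate, f.CustomerKey, f.ProductKey, f.Amount
FROM dbo.FactSales AS f
WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS c
                  WHERE c.CustomerKey = f.CustomerKey);

DELETE f
FROM dbo.FactSales AS f
WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS c
                  WHERE c.CustomerKey = f.CustomerKey);

-- Re-enable: rebuilding the index brings it back online, and
-- WITH CHECK revalidates the constraint against the existing rows
ALTER INDEX IX_FactSales_CustomerKey ON dbo.FactSales REBUILD;
ALTER TABLE dbo.FactSales WITH CHECK CHECK CONSTRAINT FK_FactSales_DimCustomer;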

Can someone share with me what the accepted "best practices" are for loading
data efficiently into a data warehouse?

Any thoughts would be deeply appreciated.

Steve

In article <kZI4d.27460$pA.1692759@.news20.bellglobal.com>,
steveee_ca@.yahoo.com says...
> [original post quoted in full; snipped]

Your method of dropping the indexes (unless they are part of a constraint)
would be the proper one. You could BCP the data into a flat staging table,
clean it there, and then process it from the staging table into the
destination table.
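
Something along these lines (the file path, table names, columns, and the
cleanup rule are only placeholders; bcp's -c, -T and -S switches mean
character format, trusted connection, and server name):

-- From the command line, roughly:
--   bcp MyDW.dbo.StageSales in C:\loads\sales.dat -c -T -S MYSERVER
-- or the T-SQL equivalent:
BULK INSERT dbo.StageSales
FROM 'C:\loads\sales.dat'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- Clean the staging data, e.g. throw out rows with no matching customer
DELETE s
FROM dbo.StageSales AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS c
                  WHERE c.CustomerKey = s.CustomerKey);

-- Then move the cleaned rows into the destination table
INSERT INTO dbo.FactSales (SaleDate, CustomerKey, ProductKey, Amount)
SELECT SaleDate, CustomerKey, ProductKey, Amount
FROM dbo.StageSales;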

--
spamfree999@.rrohio.com
(Remove 999 to reply to me)

Here's a link to a DTS/BI best practices doc:
http://msdn.microsoft.com/library/d...ntbpwithdts.asp

Every situation is different, with a lot depending on how much data is being
loaded and what (if any) indexes need to be in place to support the ETL
process. It seems a bit odd to rebuild indexes twice, though. Normally,
one drops all but the required indexes beforehand and recreates the other
(reporting) indexes afterward.
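
For example, with made-up index names and SQL Server 2005 or later syntax:

-- Before the load: drop the reporting (nonclustered) indexes,
-- keeping only whatever the ETL process itself needs to run.
DROP INDEX IX_FactSales_ProductKey ON dbo.FactSales;
DROP INDEX IX_FactSales_SaleDate ON dbo.FactSales;

-- ... run the monthly load ...

-- After the load: build the reporting indexes once, in bulk,
-- instead of maintaining them row by row during the insert.
CREATE NONCLUSTERED INDEX IX_FactSales_ProductKey ON dbo.FactSales (ProductKey);
CREATE NONCLUSTERED INDEX IX_FactSales_SaleDate ON dbo.FactSales (SaleDate);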

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Steve_CA" <steveee_ca@.yahoo.com> wrote in message
news:kZI4d.27460$pA.1692759@.news20.bellglobal.com...
> [original post quoted in full; snipped]
