We've been using the Business Intelligence Studio as part of Visual Studio 2005. In general, we find its a well designed ETL tool, but for one problem - its too SLOW!
From what I can see, there are two basic problems: 1) the need to repeatedly validate packages against the database; and 2) the assumption that each package gets its own connection to the database.
Problem #1: Validation
The basic problem with the speed issue seems to be its constant need to validate against the underlying database. For example by default:
I have a project where we're doing replication of tables from a SYBASE database to a SQL Server 2005 database. For each table, we've created a package. We now have 50-60 packages.
Every time you open up the project, it re-validates each package against both the SYBASE database and the SQL database. Each build re-validates. Each execute in Debug mode re-validates. Each run in production re-validates. All this validation on a slow database connection makes basic development tasks very slow.
The same thing happens when running the package in DTExec or through the SQL Agent - it pre-validates every package. This is both good and bad - its good in that it means it will fail your package before trying to execute some bad SQL or badly mapped columns, but given that you're going to run the same package over and over again without a change, there should be a way to turn off this step for packages that haven't changed.
I've looked around and haven't seen any method to change this behaviour - it would be great to have some more lazy validation, e.g. once I have validated it once, don't re validate unless I actually make a change to the package.
Problem #2: Data Sources
In an SSIS Project, there are data sources that you can define once as a common data source. However, there seems to be two problems with the data source model:
1. You cannot store an ADO.NET data source configuration as a shared data source - these have to be re-created in each package. Given how common these of connections are, this seems to me to be a big oversight.
2. Each package when its loaded creates its own connection to the database. Its very slow to create a connection and then tear it down especially with either a slow network or a slow database (e.g. an old legacy connection).
This causes problems for us with so many packages. When I open my project, Visual Studio opens each package and validates it. This causes a connection to created to every database in the package in parallel! With our old SYBASE database, it simply gets overwhelmed when Studio tries to validate 50 packages all in parallel.
My only solution so far is to consolidate packages together - I've been taking 20-30 packages and combining them into a single monster package. This reduces the connections created and speeds up validation. However, this strikes me as unreasonable from a re-usability perspective as it reduces package modularity.
I've done a bit of searching around these settings and I haven't found any way to configure the way that packages are validated - if anyone knows of a way, put a comment up so I can be corrected!
Subscribe to:
Post Comments (Atom)
Subscribe Now
Blog Archive
-
▼
2007
(99)
-
▼
April
(27)
- MIX07 Update: Dynamic Data Controls
- MIX07 Update: Feedback on Expression Web
- Bringing You The Best from MIX07
- Finding People Across Multiple Social Networks
- The Growing Phenomenon of Internet
- Classic Presentation from Uncle Bob Martin
- Service Oriented Architecture - "I think we spend ...
- Where Will The Cool Kids Be Hanging Out Next?
- Do We Really Need an AJAX Conference?
- Microsoft .NET is Hit with Patent Suit
- Amazing Use of Google Earth - Crisis in Darfur
- Is Posting on Facebook Enough Action To Ease Your ...
- The Cause for RIM Going Down is a Software Failure
- We Need a New Paradigm for Web Analytics
- Are You A Guerrilla Project Manager?
- The Biggest Security Threat - Paper and the Front ...
- Fun with Naming Conventions
- E-File Follow-up
- Best Tool for Task and Bug Tracking is JIRA!
- Vintage Material on YouTube
- Desktop R.I.P? Don't Count on It!
- SSIS and Visual Studio 2005 are very slow...
- Is it a Web Site or Blog?
- Expression Web and Blend will be available for MSD...
- Microsoft should let Apple have the Toy Market
- Countdown to Mix 07 in Las Vegas
- Test Driven Development within Visual Studio 2005
-
▼
April
(27)

3 comments:
Hi Chris,
Good post - you've pointed out some good problems here. A couple of points tho:
"Every time you open up the project, it re-validates each package"
That's not true. If the package is not open in the designer (i.e. if it wasn't open last time you closed the project) then it won't get validated.
You're aware that you can "work offline" in the designer so that nothing gets validated, right?
"should be a way to turn off [validation when running from dtexec]"
This is a very good idea. Have you requested it at connect.microsoft.com? However, I do't ever recommend using it i a production environment :)
Are you aware that if you set property DelayValidation=TRUE on an executable then it won't get validated when the package spins up? This could be the "lazy validation" that you talk about.
"You cannot store an ADO.NET data source configuration as a shared data source - these have to be re-created in each package. Given how common these of connections are, this seems to me to be a big oversight."
You can store the connection string in a configuration thus allowing it to be used in different packages. Is that what you mean?
"Each package when its loaded creates its own connection to the database. Its very slow to create a connection and then tear it down especially with either a slow network or a slow database (e.g. an old legacy connection)."
Yes, it does. In fact more than that, by default each executable in each package has its own connection to the database. This was an intentional move away from the DTS behaviour where the same connection was used because this could create blocking. This is correct in my opinion. Note that the DTS behaviour can be mimicked using the RetainSameConnection property on the connection manager.
Hope this helps.
-Jamie
Hi, do you solve the speed problem?
I also meet this problem, and dtexec is also too slow to run a package. actual running time is 0.1s, but the total time is more than 40s..
email to me if you solve it. Thanks.
brucenan#gmail.com
In principle, a good happen, support the views of the author
Post a Comment