Hello,
I have a transformation in which the column of data at the flat file source is nine characters long, and typically contains a string of six or seven zeros with a non-zero number in the last two or three characters. If none of the records in that column were an empty string, I think I could get away with this:
DTSDestination("TTLCrd") = CInt(DTSSource("Col004"))
The destination is a SQL Server 2000 table, and the column is of type Integer. What do I do when Col004 is an empty string? I've tried a couple of different IF statements, but they have not worked. Empty strings need to become zero values.
Thank you for your help.
cdun2Avoid using DTS to insert directly into production tables.
Avoid putting logic or code that manipulates data in your DTS package.
Use DTS for moving data from point A to point B. All the other features of DTS (or the new SSIS) are crap, and lead inevitably to bad application design.
Best practice is to use DTS to pipe your data to a staging table and then kick off a stored procedure to process the data in the staging table, verifying and cleansing the data before pushing it into the production tables. The stored procedure will hold all of the data logic, including the COALESCE() function, which will easily convert your NULL values to zeros.|||Thank you for your response!
cdun2|||Best practice is to use DTS to pipe your data to a staging table and then kick off a stored procedure to process the data in the staging table, verifying and cleansing the data before pushing it into the production tables. The stored procedure will hold all of the data logic, including the COALESCE() function, which will easily convert your NULL values to zeros.
So what you say is that you should have staging tables (potentially with all nvarchar field if you for instance import from text files), generate a whole lot of disk activity, before you kick off stored procs to do all the work for you? I cannot see why one would want to do it that way. I'm pretty satisfied with the way SSIS work, and would very much like to know why you discourage use of the SSIS features.|||I'm totally with blindman on this. My question in turn would be why would you want to incorporate data\ business logic into discrete, proprietry DTS\ SSIS packages? Virtually everything else I do in T-SQL unless there is no alternative. Even if I used DTS or SSIS at all it would be to get stuff from outside the database to inside it with as little fuss as possible - nothing more. I see nothing gained throwing these ETL tools at the problem when the standard SS language is perfectly capable of everything I have come across so far.
You probably know though that I go further even than blindman and do not use DTS or SSIS at all.
As far as the disk usage is concerned, from my perspective I work with large batch systems with well specced servers. We are not on a big time or resource pressure when we load so for me it is not a consideration.|||KISS.
By not creating code in my DTS package, I keep all of my logic in one place(the database) and in one language(SQL). That makes debugging much easier.
It also compartmentalizes my process, meaning I could use DTS, or BCP, or SSIS, or ASP, or a friggin' Access macro to load data, and my sproc will process it. I can even have multiple data flows going into the staging table.
My staging tables contain columns that record the source of each record, the time it entered the database, and a column for recording any processing errors. As my sproc cleanses, verifies, and loads the staging data, any discrepancies are noted within the ErrorStatus column. At the end of the process finding any records that failed and the reasons for their failure is a snap, and all I need to do is fix the existing staging records and reset the ErrorStatus column, and then I can rerun the sproc.
ETL has become bread and butter to me now. I can almost write these sprocs and packages with my eyes closed.
ROAC, how easily did your DTS packages upgrade to SSIS?|||As far as the disk usage is concerned, from my perspective I work with large batch systems with well specced servers. We are not on a big time or resource pressure when we load so for me it is not a consideration.
Well, I see. In that case I would do the same. However, there are many MANY companies around, especially in smaller countries, that cannot afford this kind of systems. The gap between for instance an EVA4000 and EVA6000 is huge for many companies, and they have to take disk performance into consideration.|||KISS.
ROAC, how easily did your DTS packages upgrade to SSIS?
Agreed. Keep things simple. If you are having a lot of data sources, I find it more easy working with SSIS than having tons of procs.
When it comes to upgrading, my DTS packages upgraded pretty well, but I know quite a few that did not, which of course is an issue. However, with almost the same arguments you come up with, you could tell to do the work in ASP.NET or Java as well, and have issues when .NET Framework or Java language is upgraded. Or, for that matter, when SQL Syntax changes.
As I said in my last post, please keep in mind that there are smaller countries and companies in the world. What's best for an enterprise is not neccessarily best for a small or medium sized business in Scandinavia or the Baltics. Thus, I think your advices perhaps should be something like "If you can afford ..., you should ...". Got my idea?
I have no problem seeing your points, I just cannot see it as the only solution.|||Well, I see. In that case I would do the same. However, there are many MANY companies around, especially in smaller countries, that cannot afford this kind of systems. The gap between for instance an EVA4000 and EVA6000 is huge for many companies, and they have to take disk performance into consideration.I might add that the organisation I worked for prior to this was certainly not large, nor specialist like my present company, but I stuck to the same philosophy then. Actually - I think I just did :)
The most expensive resource of all is the bum in the seat supporting the hands keying in the code. This bum is likely to know T-SQL if he\ she is working with SQL Server. Why is it cost effective to introduce a GUI based ETL tool into the mix when bog standard T-SQL is perfectly capable? The worst short term ROI an employer gets from me is when I am wrestling with a new language\ gui\ tool etc.. I think the time aspect of my point might be a pressure for using SSIS\ DTS but not the resources - I'm afraid I have no idea what the difference is between EVA4000 and EVA6000 :)|||ROAC, why do you think my method of storing the data in staging tables and then running a sproc against them is going to entail more disk activity/server resources than an SSIS package?
Presumably the amount of data being imported is some fraction of the data that already exists in production, so if the server is beefy enough to handle day-to-day processing it should not choke on running sprocs against staging data. Especially since these are normally run as batch processes during maintenance hours.|||As the data is written do disk twice instead of once, yes it WILL consume more disk resources. If you are lucky enough to do all the job at night, well then you are definitely more lucky than I am. As I said previously, if you can have the extra disk load, your approach is the best. I'm not questioning that. I just want to make it clear, for you and other reders, that your scenario is not the only one around. Other people may have other needs, which will lead to other solutions. Doubling the disk activity required to import data is not always an option. You are lucky enough to have that option, but is it so hard to believe that other people not neccessarily have the same situation as you?
I think you perhaps should open your eyes a bit and look around, because there are solutions out there, behaving quite differently from those you are working with.|||Disk activity, hmmm..
Well, as a consultant who has created production and data warehouse ETL solutions for dozens of companies in many industries I can't say I've run into a situation where that was the deciding factor in the application architecture. But I'll grant such a situation may be possible.
Obviously I don't think my method is the only way to go. There are certainly many implementations using DTS, SSIS, or 3rd party tools (I recently had to suffer through a project where the client used a tool called DataStage).
The problem is, too many inexperienced people jump into creating DTS/SSIS solutions simply because THEY assume that THAT is the only method. When in reality (and I'm not backing down on this), these GUI tools are SELDOM the best method and lead to fragile designs that are difficult to debug or modify.|||BTW - Roac - totally agreed that everything deserves evaluation in the context of the entire circumstance. There are no absolutes - that is why we are having this fun discussion where we are sharing our opinions :)|||Its all good. <\TouchGloves>|||Lol - Roac - I've just remembered my first post addressing you was when I thought you had made a somewhat absolute statement too. What goes around comes around eh? On that one you were in agreement with blindman.|||I sit on the fence in these issues...
There are many things that I've written in DTS/SSIS where there is simple conversion of incoming data, and as long as there is a well defined way to deal with "bad" data this works very well. The thing that most DTS packages seemed to (incorrectly) assume was that all of the incoming data would be processed correctly on the first try.
When there are enough resources and time, the staging table approach works very well. You need to keep in mind that time is often my biggest problem, and that large amounts of data usually take large amounts of time to process. This can make it functionally impossible to stage some kinds of data because the processing window isn't large enough to support physically handling the data multiple times.
I don't have any inherant problem with either approach. Both work, and with the proper discipline for the ETL approach and sufficient resources (disk, time, etc) for the staging approach they can both produce the same results. Unfortunately, both discipline and resources are often scarce, so you need to find the best solution for the problem at hand.
-PatP
No comments:
Post a Comment