Official CozyRoc Blog

November 13, 2007

Implementing custom SSIS Control Flow Task in C++

Filed under: SSIS, Tips — COZYROC @ 11:41 pm

This topic is a work in progress. It will stay this way until Microsoft SSIS product team decides to document their native interfaces for third-parties to use. If you want to develop custom SSIS task in C++, you have to load yourself with patience and be ready to spend some time in the debugger and with Reflector. The undocumented native interfaces for implementing SSIS Control Flow Task can be found by opening the COM type library information in the DTS.dll library. The following interfaces represent roughly the base class Task from which all custom SSIS task must inherit:

There is one more interface IDTSComponentPersist90, which has to be implemented for serializing your task. After you are done implementing these four interfaces, you have to make your task visible to the SSIS framework. This is done by including additional information in the registry for your COM object. The additional registry keys under the specific task CLSID are:

  • DefaultIcon – Must point to an icon resource located in your DLL. Should have the format <your library>.dll,0 where 0 is my guess the index of the icon resource.
  • Programmable – I’m not sure if this is mandatory, but I saw other stock SSIS tasks using it.
  • Implemented Categories – Should contain {CD0DB111-958E-440D-9111-7D70D1490063} subkey. Categorizes your COM object as SSIS task?
  • DTSInfo – Various additional SSIS task metadata.
    • Description – task description.
    • ResourceFile – string resources containing additional task descriptions. Should have the format <your library>,10,20 where 10, 20 are string resource ids.
    • TaskContact – information describing where the SSIS originated from, copyright info, etc.
    • TaskType – Describes task compatibility. For SQL 2005, it should say DTS90.
    • UITypeName – Task user interface type reference.

After you compile and open an SSIS package, you should be able to browse and find your custom SSIS Control Flow Task listed.

p.s.

We haven’t spent time yet developing custom SSIS Data Flow Components, but you may check the type library in the DTSPipeline.dll library.

November 5, 2007

Even Easier SSIS Package Configuration

Filed under: SSIS — COZYROC @ 3:42 am

We all know the standard hard-coded SSIS package configuration types like XML, registry, INI file, SQL Server. Our opinion is that the SSIS package configuration implementation in the SQL 2005 incarnation is half baked, with no way to extend it with other configuration types. We have used for a long time in DTS a different configuration setup based on Excel worksheets. It gives you better maintenability and flexibility because you can use the Excel engine as a powerful initialization model. We have created a small Script Task to demonstrate the concept. You can use it as a sample to create your own configuration mechanisms. Please install SSIS+ 1.2 to use it. Enjoy.

November 4, 2007

Comparing performance of a raw file vs. CozyRoc Data Flow Source and Destination

Filed under: SSIS — COZYROC @ 12:42 am

As we have promised earlier, we have measured the performance of our new components Data Flow Source and Data Flow Destination. We have used the package provided by Jamie Thomson who has done similar test, comparing the performance of raw data file vs. ADO recordset. We have extended the package to include test for our components and another test without intermediate data flow step. The test without intermediate step is important because it establishes the baseline comparison. The current performance cannot be improved beyond the baseline, unless Microsoft makes changes to the SSIS data flow pipeline implementation. Here are the results:

Number of rows Baseline execution time Raw file execution time Data Flow execution time Data Flow vs. Baseline Data Flow vs. Raw
100000 5.531 6.156 6.203 -10.83% -0.76%
200000 7.359 8.5 8.313 -11.48% 2.25%
300000 9.14 10.891 10.438 -12.44% 4.34%
400000 11 13.218 12.625 -12.87% 4.70%
500000 12.765 15.563 14.75 -13.46% 5.51%
600000 14.563 17.797 16.828 -13.46% 5.76%
700000 16.391 20.218 18.875 -13.16% 7.12%
800000 18.282 22.578 21.297 -14.16% 6.01%
1000000 21.781 27.218 25.453 -14.43% 6.93%

Also in chart form:

Data Flow Performance

The performance improvement compared to the raw data file approach is evident. But what is even better is that the Data Flow Source and Data Flow Destination components are implemented to be non-blocking. So if the same test is executed on a dual or more processor machine, it will execute the reading and writing steps in parallel and show even better performance compared to the sequential raw data file approach. Another shortcoming of the raw data file approach is the fact that it depends on the file system and requires space to transfer the data. Some might say these days you have powerful subsystems with advanced caching, which is true of course if you can afford it.

If you want to check the test package, you may download it from here.

November 1, 2007

CozyRoc SSIS+ 1.2 Beta 1 Released

Filed under: Announcement, SSIS — COZYROC @ 8:59 am

It is November 1 and a perfect day to announce Beta 1 of CozyRoc SSIS+ 1.2 library. We have included some really powerful components, which will serve as foundation for other advanced components:

  • Package SSIS Connection Manager – Provides access to the current or different package at runtime.
  • Data Flow Source SSIS Data Flow Component – Very fast, non-blocking, in-memory data read from another data flow.
  • Data Flow Destination SSIS Data Flow Component – Exposes data from a data flow for in-memory read in another data flow.

We will soon blog about the performance of these new components. Our preliminary tests show they match the performance of the raw data file source/destination components, but there is one important difference. Our components do not need to go thru intermediate file write/read step. They are reading/writing in parallel and the process is not sequential as the Microsoft recommended solution for transferring data from one data flow to another. You can try it for yourself by downloading the package.

And of course the new release is available for download here.

Create a free website or blog at WordPress.com.