I would like to build an ETL process using T-SQL to read data from various data sources and perform data cleasing, aggregation, etc..
This ETL process runs several times each month and move data from Source systems to a SQL DWH (SQL 2016 DB).
The ETL will do the following:
1-Read data from Source SQL tables.
2-Insert Data in Staging tables
4-Delete/truncate data from Final tables
5-Insert new data in Final tables
6-generate error log, rollback mechanism etc..
1) How many tables involved?
- 10 tables to read from
2) What is the Final destination?
- a single Fact table in the DWH, several Dimension tables
3) what do we want to accomplish?
- A process to read data from source and populate fact, dimension table
4) How do you want the process to run?
- interactively like: "exec ETL_process "
1) The process will be divided in steps and will display a message after each step like that
"Step 1: Load employee data completed" for instance.
2) If the ETL process is divided to 20 steps for instance and we run into errors at Step 14. I should be able to restart the process from Step 14 once the issue fixed. Meaning I don't want to rollback the entire 14 steps rather I would like to do something like :
"exec ETL_process Stp 14"
This will restart the process from where it failed.
Nice to have:
-Be able to run the procedure from a GUI/Web interface.
-Be able to convert this SP to a SSIS package to be part of SQL job
Ideally you are:
-Very familiar and comfortable with writing SQL code
-Have been engaged in DWH project in the past involving data movement/validation and can share best practices.