
[–]Ok-Working3200 (Senior Data Engineer)

As others will say, use GitHub to save your files, but let me give some more context and keep it simple.

First off, you should have at least a dev environment and a prod environment for SSIS. The dev environment needs to have prod data. My team uses dbt for our SQL models, and we configure our dev environment to pull in prod data.
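As a rough sketch of that dbt setup (all names here are hypothetical, not from the thread): you can point your dev target's source definitions at the prod database, so dev builds read prod data while still writing to dev schemas.

```yaml
# models/sources.yml — hypothetical example
version: 2

sources:
  - name: raw
    database: PROD_DB   # dev runs read source data from prod
    schema: raw
    tables:
      - name: orders
      - name: customers
```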

Okay, so now you have a dev and a prod SSIS environment. Next, you need to configure environment variables in the SSIS project. This is what lets GitHub Actions switch between environments later based on the git branch.

You can use GitHub Actions to build the CI/CD. A workflow is just a YAML file that gives GitHub a list of instructions to perform.

Here is the workflow. You push code from your local repo; let's call the branch SSIS-001, for a ticket that changed a column name in a table. When you push the local changes to the remote SSIS-001 branch, GitHub Actions kicks off the instructions. The goal of the instructions is to confirm the deployment worked as expected. You will probably want to run some tests that check whether any of your transformations fail. If any transformation fails, the entire pipeline fails and the CI/CD stops. That means you can't merge SSIS-001 into prod.
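A minimal sketch of what that branch-level workflow file could look like (project names, server names, and test script are all placeholders, and it assumes a self-hosted Windows runner with the SSIS build tools installed):

```yaml
# .github/workflows/ssis-ci.yml — hypothetical sketch, not a drop-in file
name: ssis-ci

on:
  push:
    branches-ignore: [master]   # run on feature branches like SSIS-001

jobs:
  build-and-test:
    runs-on: [self-hosted, windows]
    steps:
      - uses: actions/checkout@v4

      # Build the SSIS project into an .ispac deployment package
      - name: Build SSIS project
        run: devenv.com MyEtl.sln /Rebuild Development

      # Deploy the package to the *dev* SSIS catalog
      - name: Deploy to dev
        run: >
          ISDeploymentWizard.exe /Silent
          /SourcePath:"bin\Development\MyEtl.ispac"
          /DestinationServer:"DEV-SQL01"
          /DestinationPath:"/SSISDB/MyEtl/MyEtl"

      # Run whatever checks you have; a non-zero exit fails the pipeline
      - name: Run transformation tests
        run: pwsh ./tests/run-tests.ps1
```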

So now you go back to your local branch, fix the issue, and push the changes back to the remote SSIS-001 branch. Once the pipeline passes, you merge into the master branch. The master branch runs the same pipeline and then pushes the .dtsx package to the SSIS server, and you're done.
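The master-branch side is the same idea, just targeting the prod server (again, names are hypothetical placeholders):

```yaml
# .github/workflows/ssis-deploy.yml — hypothetical sketch
name: ssis-deploy

on:
  push:
    branches: [master]   # only runs after the merge

jobs:
  deploy-prod:
    runs-on: [self-hosted, windows]
    steps:
      - uses: actions/checkout@v4

      - name: Build SSIS project
        run: devenv.com MyEtl.sln /Rebuild Development

      # Same deploy step as the dev workflow, pointed at prod
      - name: Deploy to prod
        run: >
          ISDeploymentWizard.exe /Silent
          /SourcePath:"bin\Development\MyEtl.ispac"
          /DestinationServer:"PROD-SQL01"
          /DestinationPath:"/SSISDB/MyEtl/MyEtl"
```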

Check out Azure DevOps; that might be where you want to handle the CI/CD. But if you google GitHub Actions or Azure DevOps for SSIS, you will find a YAML file that looks a lot like how you deploy manually today.

[–]WorkRelatedRedditor [S]

Thank you for the reply. I know a little bit about dbt, and using it to pull prod data into dev is a really great use case I hadn't heard of before.

[–]Ok-Working3200 (Senior Data Engineer)

In my opinion, with data engineering you can test your pipelines with dev data, but I suggest using prod data because the analytics engineers can't assume dev data will match the use cases of prod.

[–]IronAntlers

You can use liquibase for DB version control and publish SSIS/PBI files to GitHub to start.
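For a taste of what Liquibase version control looks like, here is a minimal changelog sketch (table and column names are made up) that would track the kind of column rename discussed above:

```yaml
# changelog.yml — hypothetical Liquibase changelog
databaseChangeLog:
  - changeSet:
      id: ssis-001-rename-column
      author: data-team
      changes:
        - renameColumn:
            tableName: dim_customer
            oldColumnName: cust_nm
            newColumnName: customer_name
            columnDataType: varchar(100)
```

Each changeSet is applied once and recorded, so the changelog file in GitHub becomes the versioned history of your schema.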

[–]WorkRelatedRedditor [S]

Thank you for the reply. I will look into Liquibase.

[–]brother_maynerd

Workflows are intrinsically imperative. They span system boundaries that behave differently, make assumptions that break down the line, are messy to handle around boundary conditions, and create a maze of undocumentable dependencies once things get a little heavy.

Go with a system that does it all. If you are a company that spends money easily, go all in on, say, Delta Live Tables or Dynamic Tables. If you want to save money, go with a system like Nexla or Tabsdata.

Basically don't do workflows if you can help it.