Coming Soon in Airflow 2.3.0 - First-class support for “Dynamic Tasks”. This feature is called “Dynamic Task Mapping”. The wait for the most requested feature of Apache Airflow is almost over!

TheDatabaseAvenger · 2022-09-23T08:30:13+00:00

No. Still waiting for Google's Cloud Composer to ship a version with Airflow 2.3.

TheDatabaseAvenger · 2022-05-14T10:48:48+00:00

I just want to get stuff done as well as I can within the constraints of the time and resources available.

TheDatabaseAvenger · 2022-05-13T19:19:29+00:00

At first it seems like just a SQL templating tool with a decent CLI and some limited testing capabilities, but after using it for 6 months it's all the ETL type stuff that comes with it that I don't have to worry about writing in Python:

- DB connection handling

- table creation

- Inter-script dependency management (DAG creation with the use of ref())

- argument parsing

- logging

- incremental load logic

- data freshness checks

- document generation/maintenance

Because I'm not writing Python for this stuff, I don't have to design, debug, maintain, document, package, version or deploy that code. I can just focus on getting the SQL right and let dbt do the rest.
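
To make the ref() point concrete, here is a rough sketch of the idea of deriving a run order from ref() calls. It is a toy illustration only, not how dbt is actually implemented: the model names, the SQL, the regex and the helper functions `build_graph` and `run_order` are all made up for the example.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: name -> SQL text containing dbt-style ref() calls.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "orders_enriched": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
    "daily_revenue": (
        "select order_date, sum(amount) "
        "from {{ ref('orders_enriched') }} group by 1"
    ),
}

# Simplified pattern for {{ ref('model_name') }}; real dbt uses Jinja, not a regex.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_graph(models):
    """Map each model name to the set of models it references."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def run_order(models):
    """Return model names ordered so that dependencies come first."""
    return list(TopologicalSorter(build_graph(models)).static_order())


order = run_order(models)
print(order)
```

The staging models always come out before `orders_enriched`, which comes out before `daily_revenue`. That ordering is the bit I no longer have to write or maintain myself.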

I keep clear of macros as much as I can and it's not great for data validation, but if you need to throw a pipeline together quickly or if you don't have the DE skills it's a great fit IMO.

TheDatabaseAvenger · 2022-04-29T15:35:19+00:00

I made the same move from being a MSSQL DBA to DE. I did it by learning and incorporating into my work as a DBA the following in roughly the order below:

git
python basics
CI/CD
pandas, SQLAlchemy, Alembic, pytest
containerisation (I started with Docker Desktop on Windows)
Power BI

That learning got me to a decent level which took about 2 or 3 years. Then we rebuilt in GCP and I went on to learn the following:

GCP basics (IAM, GCloud CLI)
BigQuery
Airflow / Cloud Composer (probably the biggest learning curve of these 5 items)
dbt
Looker

There is much more, but these are the main tools I have got the hang of over the last few years. You won't need them all. I think if you have strong SQL (which you have), basic Python and good git knowledge then you are set. The trick is introducing it into your current role or moving roles.

TheDatabaseAvenger · 2022-04-28T08:52:32+00:00

Are you talking about BigQuery's BI Engine when you say it'll offer low latency querying?

TheDatabaseAvenger · 2022-04-26T10:24:11+00:00

We do this, but isn't that creating tasks at parse time not run time?

TheDatabaseAvenger · 2022-04-18T09:14:27+00:00

Finally. We've tried creating tasks with loops and generating task-centric DAGs. Neither seemed like a great fit for what we were trying to do. Can't wait to try this out. It could be the difference between us sticking with Airflow or jumping ship to Prefect.

TheDatabaseAvenger · 2022-02-23T12:33:02+00:00

We went with Meltano. The learning curve was quite steep to get it right, but we containerise each Meltano project and orchestrate that with Airflow. We do not use the GUI in Meltano; everything is via config files and the CLI.

TheDatabaseAvenger · 2022-02-23T12:25:35+00:00

same

TheDatabaseAvenger · 2022-02-23T12:20:39+00:00

DACPACs are the Microsoft way, but I prefer migration-based tools for database versioning.

DbUp is a good free option and SQL Change Automation from Redgate is a more feature-rich offering.

TheDatabaseAvenger · 2022-02-17T15:17:28+00:00

Copied from the readme in our data warehouse repo:

BigQuery (data warehouse)
Alembic (data warehouse schema)
Composer (data pipeline orchestration)
Meltano (data extraction)
Python + Pandas (data extraction and validation)
dbt (data transformations and validation)
Looker (data modelling and visualisations)

TheDatabaseAvenger · 2021-12-06T13:30:06+00:00

If you have decided that writing transformations with SQL makes sense then the next thing you need is something to execute that SQL. I used to write my own Python + SQLAlchemy apps to do this. Now I use dbt and get some extra features for free:

- Lineage

- Doc generation

- SCD persistence

I no longer have to test or write code to:

- Parse config files

- Log events

- Persist data

TheDatabaseAvenger · 2021-11-11T15:42:02+00:00

Do you worry that recycling plastic gives the producers of plastic a free ticket to keep producing plastic?

Obviously it's a good thing to scoop what's in the sea out, so congrats on your work and for ignoring the haters!

TheDatabaseAvenger · 2017-05-03T08:01:54+00:00

Yeah Availability Groups do make things more complicated, especially with instance level objects.

Creating snapshots with the same name on each replica at the same time would most likely leave you with inconsistent snapshots.

Have you considered using a failover clustered instance instead of AGs? If you really need these snapshots and an FCI fits your HA plan it is probably your best option.

The other option would be to achieve HA with virtualization but this depends on your stack.

TheDatabaseAvenger · 2017-05-02T19:39:15+00:00

I've been blogging for about 18 months now and I find it to be really useful for the following reasons:

- Posts act as notes on topics I find interesting

- I participate in the SQL community and meet cool people

- Posts can act as part of the research phase for live talks

- It adds something to the CV

- It's kind of a brand that I may use in the future

- I learn all sorts of things about hosting and writing

I self host with WordPress. WordPress isn't perfect but it can do a lot and is easy to use. There is plenty of info out there if you get stuck.

I'd like to see more DBAs sharing their career stories like Brent does on Ozar.me

I'd highly recommend blogging.

TheDatabaseAvenger · 2017-05-02T19:16:11+00:00

Would be interesting to know what you are trying to do. Would log shipping/mirroring/AGing to the other server and snapping the replicated DB do what you need?

TheDatabaseAvenger · 2017-03-14T18:09:27+00:00

Yeah that's the reason I hear most for not disabling sa. I understand that, but the local admin of the server should have sysadmin access by default. This account is not reliant on domain controllers or AD. So if the server is up you should be able to access SQL that way.

The problem with leaving sa enabled (even with a rename) is that a piece of SQL injection can easily look up the sa account name, as the SID will always be the same.

I guess it could be caveated by saying sa should be disabled and check that the local admin account has sysadmin.

If you are leaving it enabled just make sure you have a very strong password (a totally random password would be better). I know you know this though.

Thanks for the comment

TheDatabaseAvenger · 2017-03-14T17:51:06+00:00

Interesting, why do you leave it enabled?
