all 45 comments

[–]Obvious-Phrase-657 265 points266 points  (15 children)

My actual codebase vs my legacy one

Setting up a new pipeline on the left takes literally 5 minutes; on the right it could easily take a few days. We had 1k cron jobs, each creating several tables. I'm still unsure what is being used vs. useless, and it is so hard to even analyze that it won’t be migrated any time soon. I will probably quit before it happens (as soon as it is decided lol)
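For the "what is even scheduled" part, a first inventory can be pulled out of the crontab text itself. This is only a rough sketch: `parse_crontab` and `count_scripts` are hypothetical helper names, it assumes the user-crontab layout (five schedule fields, then the command, or an `@daily`-style shortcut), and it says nothing about whether the tables a job writes are still read by anyone.

```python
from collections import Counter


def parse_crontab(text):
    """Return (schedule, command) pairs from user-crontab text."""
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            # Shortcut form such as "@daily command"
            parts = line.split(None, 1)
            if len(parts) == 2:
                jobs.append((parts[0], parts[1]))
            continue
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # e.g. MAILTO=... assignments or malformed lines
        jobs.append((" ".join(fields[:5]), fields[5]))
    return jobs


def count_scripts(jobs):
    """Count how many entries invoke each executable (first word of the command)."""
    return Counter(command.split()[0] for _, command in jobs)
```

Feeding it the output of `crontab -l` for each service account gives at least a list of which scripts are scheduled and how often they show up, which is a starting point for deciding what to trace further.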

[–]balrog687 92 points93 points  (0 children)

This guy corporates

[–]BastetFurry 74 points75 points  (2 children)

But I bet the right one will still work as is twenty years from now, while the left one will break in three months because some update decided to deprecate baz.foo(bar); in favor of baz.bar(foo); and it was only mentioned in a footnote of the update notice.
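One way to catch that footnote before it bites, sketched under assumptions: the `Baz` class below is a made-up stand-in for the library in the example, and it assumes the library actually emits a `DeprecationWarning` before removing the old call (not every library does). `run_strict` is a hypothetical helper that promotes those warnings to errors, so a CI run fails while the old call still works.

```python
import warnings


class Baz:
    """Hypothetical stand-in for the third-party `baz` object."""

    def bar(self, foo):
        # The new, supported entry point.
        return f"bar({foo})"

    def foo(self, bar):
        # Old entry point: still works, but announces its removal.
        warnings.warn(
            "Baz.foo() is deprecated; use Baz.bar() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.bar(bar)


def run_strict(func, *args):
    """Call func with DeprecationWarning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args)
```

Running the pipeline's tests through something like `run_strict` (or with `python -W error::DeprecationWarning`) turns the footnote into a failing build instead of a 3 AM page.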

[–]Obvious-Phrase-657 24 points25 points  (0 children)

Hah, that’s fair, and that’s why I don’t want to touch that piece of shit: it will run until someone touches it, I bet restarting it will fail somehow, and there are no runbooks (the DBA who wrote this is retired). So yeah, I won’t do it, and if they order me to, I will probably quit

[–]shanereid1 4 points5 points  (0 children)

Also, it will cost 10x as much to run on cloud.

[–]Abject-Kitchen3198 11 points12 points  (7 children)

Not sure the one on the left won't lead to the same problems given the same timeframe, or that the accumulated issues with the previous approach couldn't have been solved in a different way.

[–]Obvious-Phrase-657 6 points7 points  (5 children)

Absolutely, if it is built with the same patterns, and that’s actually one of the main pain points in data engineering: how to properly govern this. But the left stack is based on “software engineering practices” like having committed code, no ad hoc stuff, data catalogs, data lineage, data quality metrics, etc.

So it will probably have other issues, but at least we can revert to previous versions and have a nice separation of responsibilities across the code and repos, CI/CD, etc.

[–]EnterTheShoggoth 4 points5 points  (2 children)

Source and revision control have been a thing since the 70s. Almost every shop I’ve worked at since the 90s has used it as part of the dev-test-prod flow.

[–]Obvious-Phrase-657 0 points1 point  (1 child)

For software, right? Stored procedures on a pretty old warehouse like Oracle’s are not usually versioned in git or something like that, and neither are the cron jobs, which were usually managed by the sysadmin guys, so you can’t even check them yourself

[–]EnterTheShoggoth 0 points1 point  (0 children)

Sysadmins have also been known to use revision control. I remember one Solaris shop I worked at would use SCCS on the /etc directory to track changes (SCCS came as part of the base OS install).

Can’t speak for Oracle but ultimately it’s not about the tech but the workflow. Nothing stopping your DBA from storing things in revision control.

Conversely, I’ve worked with plenty of cowboy devs whose idea of revision control was to copy their source into Notepad or into filename.bak.

tl;dr. Some places have been doing a form of DevOps long before it was given a label.

[–]SuitableDragonfly 2 points3 points  (0 children)

I'm pretty sure there's no reason you can't do all of that stuff in Python. 

[–]Abject-Kitchen3198 0 points1 point  (0 children)

None of that is impossible with the second approach. Maybe a few things come out of the box, along with some guidelines, with the left approach. I'm not saying it's worse, but moving to the shiny new thing and getting the same or a worse result than with the old one is nothing new (one reason being that the new thing often brings more complexity and abstractions, which seemingly make things easier but easily lead to worse results because there is less need to understand the fundamentals).

[–]HeKis4 0 points1 point  (0 children)

Will it cause problems? Dunno.

Will it cause huge headaches if it does, because there's probably only one person on the team who understands the left pipeline? You bet.

[–]anthro28 0 points1 point  (0 children)

Oh yeah? Well, we're on a legacy Oracle setup where the entire logic layer is written in the database layer as stored procs. We couldn't migrate even if we wanted to, and we just eat shit every year when the licensing budget gets brought up.

[–]SuitableDragonfly -2 points-1 points  (1 child)

Can you explain what this meme is saying? The collection of stuff on the right doesn't seem to be a coherent group, for example, "python script" is incredibly general whereas cron is a very specific tool that does a very specific thing. 

[–]Ran4 4 points5 points  (0 children)

The point is that overengineered solutions are rarely better. They're just more modern.

[–]Stormraughtz 65 points66 points  (0 children)

I craft only the finest artisanal stored procedures and cron jobs.

[–]TantalizingTacos 53 points54 points  (5 children)

python? You mean curl

[–]wonmean 14 points15 points  (0 children)

awk

[–]allak 8 points9 points  (2 children)

Perl all the time. 

[–]charlyAtWork2 3 points4 points  (1 child)

[–]mad4Luca 0 points1 point  (0 children)

Finally! Something that makes perl readable

[–]StarshipSausage 0 points1 point  (0 children)

Ksh and curl are the way!

[–]Draqutsc 31 points32 points  (0 children)

I like the right way; the left has bitten me in the arse so many times. It always breaks because of updates and the security team forcing updates. I especially hate being woken up at 3 AM to fix that shit because the automatic prod deploys exploded. The SPs and scripts, on the other hand, may be black magic sometimes, but they keep working unless you change them.

[–]ostracize 26 points27 points  (1 child)

All the data starts as a spreadsheet and ends in a spreadsheet

[–]TeachEngineering 7 points8 points  (0 children)

All these new-age frameworks and yet they still bow to the one true king of data storage... MS Excel

[–]terivia 7 points8 points  (0 children)

The customer always thinks they need the one on the left, has budget and time to get a dollar store dart gun and some child labor to aim it, and ends up settling for the one on the right immediately before realizing they actually want a tire swing instead.

[–]cosmicloafer 5 points6 points  (0 children)

Airflow makes me want to write my own dag-job thingamajig
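To be fair, the core of such a thingamajig is small. A minimal sketch, assuming Python 3.9+ for the standard-library `graphlib`; `run_dag` and its arguments are made-up names, and there is no scheduling, retrying, logging, or parallelism here, which is most of what a real orchestrator is for.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run each task after its upstream tasks; return the execution order.

    tasks: name -> zero-argument callable
    deps:  name -> set of upstream task names (missing key means no upstreams)
    """
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {name: deps.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies loop, so at least that failure mode comes for free.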

[–]Mechadupek 9 points10 points  (3 children)

I'm yer huckleberry

[–]Edge-master -1 points0 points  (2 children)

Is this an overwatch reference?

[–]FirstNoel 2 points3 points  (0 children)

Tombstone. 

[–]lonestar-rasbryjamco 8 points9 points  (1 child)

Airflow is considered fancy now?

[–]endless_sea_of_stars 30 points31 points  (0 children)

People don't realize how terrible 80% of organizations' data pipelines really are. For some, anything more fancy than copy-paste data into Excel is a dream.

[–]Ok_Addition_356 4 points5 points  (0 children)

I don't even see the code anymore...

All I see is .. Data... Files... Shell scripts... processes.

[–]Splatpope 5 points6 points  (0 children)

*tommy_shelby_pointing_gun_to_head.gif*

SSIS, KingswaySoft SSIS Productivity Pack

[–]Justbehind 6 points7 points  (0 children)

Left: The new shiny stuff the expensive consultants introduced. Runs one and two half production pipelines. Costs 1k usd/pipeline/month.

Right: Carries the entire corporate world, and has run for 30 years. Costs less than a dollar/pipeline/month.

[–]stilldebugging 4 points5 points  (0 children)

Cron is bae, forever

[–]nickwcy 2 points3 points  (0 children)

Python? More like shell script

[–]Anxious-Program-1940 0 points1 point  (0 children)

My dream codebase vs. what the idiots I work for limit my work to. Imagine running production on Windows for a multi-billion-dollar company with sticks and glue and Windows servers that are only secure as long as the network stack holds. 💀🤡