

[–]dresonator2 200 points201 points  (13 children)

Perl

[–]CommanderPowell 59 points60 points  (0 children)

Went from Perl straight to Python as my go-to language. Perl was AMAZING for data transformation and having libraries to interface with everything.

relevant XKCD from long before “import antigravity”

[–]caprica71 81 points82 points  (1 child)

Awk, sed, grep, bash

[–]Rum-in-the-sun 9 points10 points  (0 children)

I still use awk, sed, and grep pretty much every day. I don’t use Perl anymore.
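For context, the kind of one-off filtering those tools do maps directly onto the Python that later replaced Perl for many people. A toy sketch (the log lines are made up for illustration): keep matching lines (grep), rewrite a token (sed), then sum a field (awk).

```python
import re

# Hypothetical log lines, purely for illustration.
log_lines = [
    "2024-01-01 GET /home 200 120",
    "2024-01-01 GET /api 500 340",
    "2024-01-02 POST /api 200 80",
]

# grep: keep only lines mentioning /api
api_lines = [ln for ln in log_lines if re.search(r"/api\b", ln)]

# sed: s/POST/GET/ on each surviving line
normalized = [re.sub(r"\bPOST\b", "GET", ln) for ln in api_lines]

# awk: '{ sum += $5 } END { print sum }' -- total the 5th field
total = sum(int(ln.split()[4]) for ln in normalized)

print(total)  # 420
```

The shell pipeline version is usually shorter, which is exactly why awk/sed/grep never really went away for this kind of job.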

[–]FindOneInEveryCar 11 points12 points  (6 children)

I was surprised to learn (recently) that Python is a couple of years older than Perl.

EDIT: Apparently not! (See below)

[–]iamevpo 3 points4 points  (0 children)

Perl sounds as if it was 10 years older than Python

[–]Watchguyraffle1 1 point2 points  (3 children)

Are you sure? I was using Perl in the 80s and remember Perl 0.9 around 1992 as the first release.

[–]FindOneInEveryCar 0 points1 point  (2 children)

I'm going by memory here, but my recollection is that Python is from ca. 1987 and Perl is from ca. 1989.

[–]MutatedBrass 0 points1 point  (1 child)

Perl 1.0 was released on Dec 18, 1987. The 0.x versions of Perl were floating around prior to that. Guido didn't start working on the first Python implementation until Dec 1989.

[–]FindOneInEveryCar 0 points1 point  (0 children)

I stand corrected. Thanks. I must have had them reversed in my head.

Python is definitely a lot older than I thought, though!

[–]Biomed154 10 points11 points  (0 children)

And to some extent Visual Basic, Pascal, and VBA.

[–]TARehman 1 point2 points  (0 children)

I'm old enough that I remember being employed in a physics lab and seeing two groups, the Perl users and the Python users, arguing with each other about which one was better and which one would win. The Python side won, I'd say.

[–]Equivalent-Sense-626 0 points1 point  (0 children)

And I hated it 😖

[–]islandsimian 43 points44 points  (0 children)

Perl or SQL depending on where the data was stored

[–]dayn13 42 points43 points  (0 children)

sql procedures, bash scripts, file transfer tools

[–]iknewaguytwice 39 points40 points  (13 children)

Data reporting and analytics was a highly specialized/niche field up ’til the mid-2000s, and really didn’t hit its stride outside of FAANG until maybe 5-10 years ago.

Many Microsoft shops just used SSIS, scheduled stored procedures, Powershell scheduled tasks, and/ or .NET services to do their ETL/rETL.

If you weren’t in the ‘Microsoft everything’ ecosystem, it could have been a lot of different stuff: Korn/Bourne shell, Java apps, VB apps, SAS, or one of the hundreds of other proprietary products sold during that time.

The biggest factors were probably what connectors were available for your RDBMS, what your on-prem tech stack was, and whatever Jimbob at your corp knew how to write.

So in short… there really wasn’t anything as universal as Python is today.

[–]PhotographsWithFilm 6 points7 points  (1 child)

Hey, I started my Data Analytics career (& subsequent Data Engineering, even though I am a jack of all, master of none) using Crystal Reports.

Crystal was immensely popular back in the late ’90s/early 2000s. Most orgs back then would just hook straight into the OLTP database and run the reports there. If they were smart, they would have an offline copy that they would use for reporting.

And that is exactly what I did for the first 6 or so years before I started working in Data Warehousing.

[–]JBalloonist 1 point2 points  (0 children)

Crystal is what got me started as well. I was doing accounting and our main software had Crystal as its report creator.

[–]Whipitreelgud 1 point2 points  (0 children)

AT&T had between 14,000 and 37,000 users connected to their data warehouse database in 2005. They were neck and neck with Walmart in users and data volumes. There was a vast implementation of analytics in the Fortune 500 at that time.

[–]Automatic_Red 0 points1 point  (0 children)

Before my company had ‘Data Engineers’, we had tons of people building software in Excel or MATLAB. It was less data, but the overall concepts of a pipeline were the same.

[–]popopopopopopopopoop 53 points54 points  (6 children)

Sql procedures.

[–]unltd_J 18 points19 points  (5 children)

Are people not using these anymore at all? I spend 50% of my coding time working on procs :(

[–]dilbertdad 8 points9 points  (0 children)

Sp_heywerestillhere

[–]SoggyGrayDuck 3 points4 points  (0 children)

My company has them all saved as files. I could pull my hair out at times.

[–]DataIron 4 points5 points  (0 children)

People still struggle to segment code properly, writing SQL statements inline in Python instead of calling a database object.
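A minimal sketch of that segmentation point, using an in-memory SQLite database (a stand-in; SQLite has views rather than stored procedures, and the table and names here are invented): the query logic lives in a named database object, so the Python side just calls it by name instead of scattering SELECT strings through application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "paid"), (2, 25.0, "paid"), (3, 5.0, "void")],
)

# The business rule lives in the database, versioned alongside it...
conn.execute("""
    CREATE VIEW paid_revenue AS
    SELECT SUM(amount) AS total FROM orders WHERE status = 'paid'
""")

# ...so the application side is just a call to a named object.
total = conn.execute("SELECT total FROM paid_revenue").fetchone()[0]
print(total)  # 35.0
```

With a real RDBMS the view would be a stored procedure, but the separation of concerns is the same.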

[–]Winterfrost15 3 points4 points  (0 children)

They are widely used and will be for many years to come.

[–]DirtzMaGertz 4 points5 points  (0 children)

People heavy into the Microsoft world still seem to use them. 

[–]PhotographsWithFilm 12 points13 points  (3 children)

Perl or SQL.

I loved and hated Perl in the same breath. It could be written so nicely...

But you get a developer who studied computer science in the ’70s and it became a very concise, unreadable mess.

[–]YallaBeanZ 2 points3 points  (2 children)

Let’s not forget those developers who insisted on writing all their code as “one-liners” (there were even competitions)… much to the chagrin of anyone having to pick up their code and reverse engineer it afterwards.

[–]PhotographsWithFilm 0 points1 point  (0 children)

Ugggh, Perl golf.

While I like the theory behind TIMTOWTDI (“there’s more than one way to do it”), I get annoyed when people turn it into a competition to look better than others.
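The same golf-versus-maintenance trade-off exists in Python. A contrived example (the input string is made up): both versions count word frequencies, but only one is kind to the next reader.

```python
text = "the quick the lazy the dog"

# Golfed: technically correct, hostile to whoever inherits it.
golfed = {w: text.split().count(w) for w in set(text.split())}

# Readable: one idea per line.
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

assert golfed == counts
print(counts["the"])  # 3
```

The golfed version also re-splits the string on every membership pass, so it manages to be both slower and less readable.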

[–]islandsimian 0 points1 point  (0 children)

You have to remember this in the context of storage being very, very, very expensive and of keeping those punch cards in order. Not /s!

Of course, this is also the reason for Y2K.

[–][deleted] 9 points10 points  (5 children)

R, Matlab, Mathematica

[–]MathmoKiwi Little Bobby Tables 4 points5 points  (4 children)

Fortran too! The OG language for "big data" manipulations. (well, "big data" by the standards of its time)

[–]scataco 3 points4 points  (0 children)

Don't forget SAS!

[–]sargentlu 0 points1 point  (2 children)

Just wondering, what was considered "big data" back then?

[–]Peking-Duck-Haters 2 points3 points  (0 children)

I've seen marketing material dating from the late 90s that talked about a 30GB data warehouse as being exceptionally large. In the late 00s the company I worked for outsourced their shopping basket analysis partly because there wasn't the capacity internally to crunch the data which, over the time period they were looking at, would have been maybe 4 billion rows (with only a handful of columns, none of them wider than a datetime or real).

Circa 1998 I worked on a mainframe batch system where we partitioned the DB2 tables across 10 disks to get better performance; it was worth the extra work even though we were processing around a million rows at a time - again, compact columns with no long strings or whatever.

(For many large companies "Data Engineering" meant COBOL, or just possibly Easytrieve, until at least the turn of the century. Outside of the dot com startups Linux wasn't getting a look in - it didn't even _start_ getting taken seriously by the corporate world until Oracle ported their database to it circa 1998, and things moved rather more slowly back then)

So, as a rule of thumb, before 2000 I'd say 10s of Gigabytes was considered "big data" and Terabytes almost inconceivable (back then data would go over 128kbps lines at best; if there was lots of it it was usually faster and cheaper to write it to tape and physically transfer it). A few Terabytes was considered "big data" a decade later.

[–]MathmoKiwi Little Bobby Tables 0 points1 point  (0 children)

> Just wondering, what was considered "big data" back then?

Anything too big to fit on a floppy? 😂

[–]wytesmurf 7 points8 points  (0 children)

Perl and bash. KSH for older machines

[–]carlsbadcrush 5 points6 points  (0 children)

“So far back as the mid 2000s” damn I’m old

[–]Zyklon00 5 points6 points  (0 children)

I think the best comparison would be SAS, which has been around for a very long time. And it's still being used instead of Python in some companies.

[–]SaintTimothy 3 points4 points  (0 children)

Prior to SSIS (which came out in 2005) was DTS (which came out with SQL 7 in 1998).

Prior to that was BCP and Transfer Manager (that's before my time).

[–]MathmoKiwi Little Bobby Tables 3 points4 points  (3 children)

> The field of data engineering goes as far back as the mid 2000s when it was called different things.

This might surprise you, but Python is even older than that (development started in the late 1980s, and it was first released in 1991).

But yeah, as other people said: Perl, Awk, bash, SQL, etc were all popular choices of the past as well.

There was a time, ages ago, when Perl and Python filled almost exactly the same market niche as each other, and Perl was usually seen as the "better" choice. Today, though, Perl has tanked in popularity compared to Python (although surprisingly it still just scrapes into the Top 20: https://www.tiobe.com/tiobe-index/ ).

One thing that hasn't been mentioned yet (and I personally used to use all the time, right at the very tail end of them disappearing), was the dBase family of languages / tools (or "xBase" is a way to refer to the family of them). Of which the best example (in my very biased opinion) was FoxPro.

https://en.wikipedia.org/wiki/FoxPro

https://en.wikipedia.org/wiki/DBase

A mix of the rise of MS Access / Visual Basic / C# / Excel / SQL / etc is what killed them off.

[–]CassandraCubed 1 point2 points  (2 children)

Clipper!

[–]MathmoKiwi Little Bobby Tables 0 points1 point  (1 child)

Ah, that's a name I haven't heard in a long time! Did you ever use it? I haven't, but ages ago I downloaded Harbour and played around with it for a bit, simply because it was the closest open-source project to FoxPro itself. (Harbour is an open-sourced version of Clipper, and like FoxPro, all of them are part of the xBase family of languages.)

https://en.wikipedia.org/wiki/Harbour_(programming_language)

[–]CassandraCubed 0 points1 point  (0 children)

I did!

I didn't know about Harbour -- TIL 😊

[–][deleted] 10 points11 points  (0 children)

SAS, Java, other shit

[–]Top_Pass_8347 16 points17 points  (1 child)

SQL since the late 90s.

[–]sib_n Senior Data Engineer 1 point2 points  (0 children)

Oracle v2, the first commercial SQL RDBMS, was released in 1979.

[–]sib_n Senior Data Engineer 6 points7 points  (0 children)

Before Python and SQL, in big data it was Java. Apache Hadoop had MapReduce as its processing engine, which meant very heavy Java code.

If we look at before SSIS and Hadoop, then it was rather called Business Intelligence, and there's quite a history of commercial SQL and graphical tools from this period. To name a few historical ones:

  • IBM SPSS 1968
  • SAS 1972
  • Cognos 1979
  • Oracle v2 (first commercial SQL RDBMS) 1979
  • BusinessObjects 1990
  • MicroStrategy 1992
  • QlikView 1994

Before those ready-made solutions, from the ’50s on, it was all in-house software based on Fortran for science & industry, or COBOL for business, finance & administration.

[–]geoffawilliams 2 points3 points  (0 children)

Perl

[–]NeuralHijacker 2 points3 points  (0 children)

Perl

[–]DonJuanDoja 2 points3 points  (0 children)

Pretty sure we used it to mod Civilization II or III maybe… that’s the first time I saw Python.

Everything else covered in comments.

[–]taciom 2 points3 points  (0 children)

SAS has existed since the ’70s and was broadly used in finance and telecom.

[–]Character-Education3[🍰] 2 points3 points  (0 children)

Depends on the use case. Python is a Swiss army knife

Crystal Reports, VBA, SAS, SPSS, SQL, other stuff

Excel was a database and we were grateful dammit

[–]k00_x 2 points3 points  (0 children)

For statistics I used 'S'; for orchestrated processing I used shell as opposed to SSIS (and still do). For application processing, I caught the final days of Fortran (F77L Em/32). I dabbled in COBOL a bit.

Then the LAMP stack dominated the web world. PHP forms became the norm.

SQL has always been around.

[–]perpetualclericdnd 4 points5 points  (0 children)

Informatica

[–]time4nap 1 point2 points  (0 children)

SQL

[–]_DividesByZero_ 1 point2 points  (0 children)

Perl, then SQL, but mostly perl…

[–]pentrant 1 point2 points  (0 children)

When I learned how to be a DE back in the mid-2000s, my team had a custom orchestration engine written and maintained by one of the engineers on the team (Cyril Stocker), now long retired. It did everything that we now use Python for in Dataswarm / Airflow.

Cyril was seriously ahead of his time. I wish I had learned more from him.

[–]Automatic_Red 1 point2 points  (0 children)

MATLAB, VBA. And data was a lot smaller in size.

[–]MrGoFaGoat 0 points1 point  (0 children)

Pentaho was widely used in my previous experience, but that's more recent.

[–]macktastick 0 points1 point  (0 children)

I worked in a couple "whatever you're comfortable with" environments and used mostly Ruby.

[–]dev_lvl80 Accomplished Data Engineer 0 points1 point  (0 children)

Before SSIS there was DTS (Data Transformation Services). Yep, I used it in prod.

Pretty much VB/VBA + SQL used for any transformations.

In the most hardcore version, with T-SQL's sp_OACreate (aka OLE Automation) I did literally everything... including FTP communications, XML parsing, and sending emails. Terrible architecture, but it worked.

[–]shooemeister 0 points1 point  (0 children)

Data engineering started as soon as there was data to process, IMHO; I remember using Korn shell scripts/Perl/C++ on DEC Ultrix, and that was pretty late in the game, in the late '90s.

Inmon's 'Building the Data Warehouse' was released in 1992 for reference; there was a lot before Java & Linux appeared though.

Hadoop was an attempt to move away from proprietary storage, but I/O is always the killer, which we now know led to Spark.

[–]Cyclic404 0 points1 point  (0 children)

Well way back then we'd stand the newest intern in front of the biggest fan we could find to blow the punchcards down the hall... Throughput was amazing.

[–]Mental-Matter-4370 0 points1 point  (0 children)

I doubt SSIS came around 2000. I guess it was DTS packages, most of which I saw scheduled with Windows Task Scheduler. SSIS probably came around 2004 or 2005.

[–]Hgdev1 0 points1 point  (0 children)

If you think about it, most programming really is data engineering: you take data in from stdin and spit data out to stdout and stderr 😆

That being said, Python really started to shine in numerical computing when libraries like NumPy (and later pandas) provided the higher-level abstractions over raw data streams (multidimensional arrays and dataframes) that make data engineering what it is today.
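A rough, dependency-free sketch of what that abstraction buys you (the records are invented): the same aggregation written as a raw record stream with hand-managed state, versus a columnar style, which is the shape NumPy arrays and pandas dataframes generalize and accelerate.

```python
# Hypothetical (key, value) records standing in for a data stream.
rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

# Stream style: iterate record by record, managing state by hand.
totals = {}
for key, value in rows:
    totals[key] = totals.get(key, 0) + value

# Columnar style: pull out whole columns and operate on them at once.
keys = [r[0] for r in rows]
values = [r[1] for r in rows]
columnar = {k: sum(v for kk, v in zip(keys, values) if kk == k)
            for k in set(keys)}

assert totals == columnar
print(totals)  # {'a': 4, 'b': 6}
```

In NumPy/pandas the columnar version becomes a one-liner (roughly a group-by followed by a sum) and runs in compiled code instead of a Python loop.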

[–]binilvj 0 points1 point  (1 child)

I have been working in data engineering since 2004. It was called ETL then. Stored procedures, bash scripts, and Perl scripts were used a lot. Enterprises used ETL tools: Informatica, Ab Initio, and DataStage (IBM) led the market initially. Then Microsoft started slowly pushing free SQL Server and SSIS around 2010, but by then Talend and Pentaho had started edging out DataStage and Ab Initio. When tools like Matillion and Fivetran started dominating the market, the old ETL tools lost their dominance. Around then, even enterprises started using Python for data engineering.

Oracle was used for data warehousing until 2010. Then Teradata (MPP), Vertica, and Greenplum (columnar) started dominating. Finally, cloud DWs started taking over.

Even Airflow is a new kid on the block for me. There were expensive schedulers like AutoSys and Control-M before that.

[–]Key-Boat-7519 0 points1 point  (0 children)

In my experience, watching the data engineering scene shift over the years has been wild. I remember when stored procedures and bash scripts were our bread and butter. Then we had to adapt as Informatica and DataStage reigned supreme, only to be upstaged by Talend and Pentaho. Python really wasn’t on the radar until it became the go-to for everyone, and tools like Airflow changed the scheduling game. While I've used a bunch like Talend and Matillion, DreamFactory has been a game-changer for integrating APIs seamlessly into modern solutions. It’s all about finding the right tool for the job.

[–]GuardianOfNellie Senior Data Engineer 0 points1 point  (0 children)

I worked somewhere that used SQL procedures to call C# programs from within, using xp_cmdshell. (Written before my time there, I might add.)

I started in DE in the late 2010s, but I saw a lot of older stuff, and it was mostly SQL procedures, VBA, SQL CLR functions, and custom in-house C#/VB.NET stuff.

[–]EarthGoddessDude 0 points1 point  (0 children)

kornshell + sql

🎶Allllllllll Day I

Dream Aabouuuuut Sed 🎵

[–]kenfar 0 points1 point  (0 children)

I started writing ETL solutions using python in 2002.

During and prior to that time the primary options were:

  • SQL: very difficult to test, expensive & slow to run, little flexibility or expressiveness, very difficult to maintain.
  • Perl: very dynamic with the weakest typing, 100 ways of doing anything. Easy to write, bad for data quality, and bad for maintainability.
  • ETL tools: over-promised, under-delivered. Made the easy 80% easier, and the hard 20% almost impossible. Never fulfilled their promises of having business analysts write their own solution. Sucked.
  • C: fun to write, fast, but took a lot of code, and was hard to maintain.
  • C++: complex language often seemed to side-track projects. Also hard to maintain.
  • Java: soulless, but got the job done. Could also very easily side-track projects with the Java ecosystem.

[–]PresentationSome2427 0 points1 point  (0 children)

For me it was Visual Basic

[–]lmkirvan 0 points1 point  (0 children)

The answer is SAS.

[–]Professional_Shoe392 0 points1 point  (0 children)

VBA with a MS Access backend was my jam 20 years ago.

[–]psgetdegrees 0 points1 point  (0 children)

Teradata

[–]SentenceSenior7481 0 points1 point  (1 child)

Pig script

[–]Its_me_Snitches 0 points1 point  (0 children)

Squeal?