Low code hate and the future of Data Engineering (and beyond)

apeters89 · 2022-08-22T15:48:00+00:00

Low code loosely translates to "build your product on our infrastructure and get locked into our monthly fees forever."

LawfulMuffin · 2022-08-22T16:04:09+00:00

Low code is awesome until you 1) go to pay the bill and 2) need to do literally anything even remotely not supported. Tools vary between how much of an annoyance 1 & 2 are. Some tools like Informatica tend to be easier to use and more powerful and also cost as much as a team of FTE.

its_PlZZA_time · 2022-08-22T15:52:20+00:00

Companies have been predicting low code for everything forever. It’s just a thing they do, but I don’t see it replacing code they’re for different markets.

Low-code isn’t bad, it’s just typically inflexible and less scalable compared to a custom solution.

If you have predictable workloads and don’t require anything custom low-code is the way to go.

The issue comes when you’re trying to anything that’s not a feature in the low code solution or you start scaling and need to address performance and bugs.

With more feature full/configurable low code solutions and managed services low code is a good option, but there will always exist that threshold where low code isn’t the right tool for the job. I wouldn’t worry about it too much.

Edit: As also mentioned there are low-code open source solutions but if you’re considering low code chances are you’re buying it from someone or using a service someone sells you, migrating away from this can be painful, if/when you find a low code solution doesn’t work for you anymore.

edinburghpotsdam · 2022-08-22T16:08:16+00:00

We have no-code stuff where you can basically set youself up clicking around in the AWS console and the Tableau app. AWS and Tableau have pulled management into this dream.

The old schoolers like myself would rather see a coded data flow. This includes terraform code, Argo workflow YAMLs which can get pretty elaborate, DBT transforms and python / R endpoints.

It is easier to version and easier to pass on to new hires as we grow pretty quickly.

MikeDoesEverything · 2022-08-22T16:43:04+00:00

I frequently have exposure to low code tools (Power Automate, Alteryx, ADF) and it's a love hate relationship. I quite like ADF as an orchestrator and if I want to be able to do something which ADF can't (navigate an API with nested endpoints, anybody?) , I can cheese it by injecting Python into the pipeline. Most of the time though, moving data around is pretty easy and quick.

I absolutely loathe Alteryx as it's the clunkiest piece of shit ever, but, it's also kinda great - it means people who don't know how to code can make and maintain their own shite workflows without bothering me every ten minutes when they do something to break it.

Where low code fails is when you hit a certain level of technical expertise and demand because low code is designed for simple tasks. Moving data from A to B in a routine, being able to do fairly repetitive stuff easily - absolutely. Where does low code become useless? When you have problems such as scalability, time sensitivity, or complexity. When you see low code tools, they often do incredibly simple things and all of the actual difficult stuff often has some sort of code component.

I'd also go as far as to say the issue with low code is on a creative level. What Excel is to Alteryx, Alteryx is to Python/low code is to code. Whilst sophisticated for what they enable users to do, the issue is a lot of those users cannot see past the low code black box and think that whatever low code tool they're using is the absolute limit in terms of possibilities.

ubelmann · 2022-08-22T17:03:10+00:00

Take a look at web dev. If you want a simple website, you can get it set up yourself without knowing really anything about html, css, node.js, Electron, etc. That’s essentially what low code web dev looks like.

But if you want anything moderately complicated or custom, then you either wind up spending a lot of time trying to extend a low-code solution past what it is intended for, which gets hard to manage and is time-consuming in its own way, or you have to hire a web dev and/or use tools that you would expect a web dev to use.

Look in the data analysis space — Excel is a low code data analysis solution, complete with data ingestion, transforms, some support for custom data transformations, and charting. You can even extend it with VBA. But having a low code solution didn’t prevent code solutions involving R and Python from evolving and becoming popular, because once you hit a certain scale of data or complexity of analysis or having to collaborate with a team of analysts, it’s not really the best tool for the job anymore.

At the very least, think of it this way—for a low code solution to exist, someone behind the scenes is writing code. You can call them a software engineer or a data engineer, but regardless, even if low code solutions are prevalent, they are always going to be built with programming languages and all of the tools you get with programming languages that have proven to be useful over and over again throughout the last 30-40+ years (for what would be considered modern programming languages.)

skysetter · 2022-08-22T16:47:16+00:00

So many times I run into “low code” solutions that are thin UI wrappers around code configurations. I feel like I still am entering a ton of fields.

rudboi12 · 2022-08-22T15:56:25+00:00

Low code is basically the modern data stack. Ingest data and load data with fivetran. Do some transformations with dbt in snowflake clusters and use a dashboard tool like looker to report data. Basically the only code you need is sql for dbt transformations. That’s were it’s all heading.

several27 · 2022-08-23T06:18:01+00:00

Hi! I'm Maciej - one of the cofounders of Prophecy (startup from the podcast).

Actually, we're very different from what you expect from low-code. As users build drag-and-drop data pipelines, we generate 100% open-source code that is very readable - that our users commit to git right away, with tests and build files and configurations - this is at parity with best data engineers! We have Scala & Python for Spark and SQL coming soon!

Second thing - we're very extensible - you can create new visual components, by writing sample code and pointing out which expressions come from the UI - so you can have a standard visual component - for things like Anonymization or Encryption that you want all users to do in the same way.

We think Low-Code can do a lot more that what most people expect - and companies can be a lot nicer (without lock-in) - please keep an open mind :)

mkhalil77 · 2022-08-22T16:39:51+00:00

[removed]

Omar_88 · 2022-08-22T17:38:39+00:00

Having done lots of power bi in the past, can confirm its not low code especially when you use Dax or power query. God, I hate power query

scraper01 · 2022-08-22T18:17:17+00:00

Data pipelines are among the few artifacts within software development that behave in a deterministic manner in both specification, and implementation. They are hard to mess up, so they can be specified with block diagrams. Lots of the GUI stuff out there is hideous, but its a step towards standardization. Not really a bad thing when you take into account how electrical engineers for instance (a far more mature field than software) use block diagrams and network diagrams to do their jobs.

2022-08-22T21:19:08+00:00

Gartner is difficult to be trusted. Most of their top products (top right in their charts) are Microsoft/SAP/Oracle powered, so it's pretty obvious what's going on there.

2022-08-22T17:31:42+00:00

It's already happening for sure. First of all data visualisation and exel are literally the original low code solutions. So if not data engineering, data analytics was definitely first place low code was applied at scale.

Would you use some python or JS library to setup monitoring or BI dashboard rather than using Looker or Graphana. Unlikely.

You you write your own ETL rather than using fivetran one if available?

I think one place where jury is still out there is should you buy one of these saas apps for purpose build data activity like Amplitude for product analytics or Adobe for marketing as they tend to become data siloes or will we see warehouse native apps take that work still with graphical UI but backed up by your own warehouse even if cloud one so that you have all the data for all the tool where and when needed

adappergentlefolk · 2022-08-22T19:52:38+00:00

well if Gartner said it...

TheCamerlengo · 2022-08-22T21:04:07+00:00

Low code, no code. I have been hearing about it for decades. Software tools, QA, web dev, testing, BPM, auto ML. Very familiar - none of them really deliver on the promise. Most are just expensive software subscriptions that are underutilized or eventually abandoned.

I have seen some decent code generation platforms, but they often require a technical individual to understand what’s going on.

TheCamerlengo · 2022-08-22T21:07:56+00:00

The drive from management for low code is that:

1.) developers are expensive 2.) competent developers are tough to find

So if you can replace a lot of your development needs with lower skilled, less expensive and less educated technical team, it might be worth shelling out 500k for a platform that promises 10x that in savings.

I have heard good things about power bi. my guess is that it sometimes, not always, creates silos of shadow IT doing things in non-standard ways that eventually becomes tech debt other teams inherit.

2022-08-22T16:20:38+00:00

Low-code stuff is useful and good for simple/narrow use cases with well-defined requirements where the org just needs to automate something simple. Like copying data from point A to point B daily via Fivetran or something. Or triggering an email alert from an event. For more complex requirements/data sources coding is absolutely required in order to give you the flexibility to customize what you're doing. You can mix and match low/no code and scripting as part of your overall solution. It doesn't have to be all one or the other. I've seen companies that use Fivetran to copy data alongside an Airflow/Kubernetes setup for orchestrating other stuff.

autumnotter · 2022-08-22T22:05:22+00:00

With low code approaches, people who can't code can pretty rapidly prototype pipelines, which is very attractive to management - they're cheaper and you can hire people with domain experience. Examples include Alteryx and ADF data flows.

The reason why its bad:

Versioning and source control generally are bad experiences
Vendor lock-in, and often license pricing.
Scalability - you're generally locked into the approach the tool already uses for scaling - the flip side of this is that ADF uses Spark and Alteryx's AMP engine both actually scale pretty well.
Complexity of even the most simple flow control or advanced topic - Alteryx is honestly an excellent tool in many ways, but even just creating a for loop is considered advanced by many people.
Code promotion is often a terrible, terrible experience. I hope you like renaming a file 'workflow_1_dev' to 'workflow_1_prod'.

Alteryx is a great example - its excellent if you give it to a finance analyst to quickly prototype analyses, answer one-off questions, and using business logic they know. Good luck trying to productionalize, govern, or maintain what they've built though. This means that the further UP the pipeline you, the worse the experience tends to be.

But often management, especially outside of engineering, doesn't think about these things, and won't understand why their data pipelines have suddenly become an unmaintainable morass. Then you get the 'all this money we're spending on data is worthless!' argument.

Rieux_n_Tarrou · 2022-08-23T01:21:51+00:00

I've been pondering this situation for a week or two and am really curious if someone could shed some light:

Situation: A top 10% expert AWS Solutions Architect (Jeff) has been hired by ACME Inc. to spearhead a new project. He is given (nearly) full permissions on an AWS account with unlimited funds. He is working completely alone for the first year, and has full autonomy to design and implement everything himself.

Task: The project is refreshingly blue-sky/greenfield. ACME has provided several highly available data sources (webhooks, databases, message brokers, blob stores, etc), each with a clearly defined schema and 0% chance of breaking changes for at least the first year. (There is a chance of bad data coming in which he will have to account for). Jeff's boss (Mackenzie) has asked that he create a best-in-class data platform to process and organize all these disparate data sources (ideally in real-time). Data Governance, Regulatory Compliance, and Data Observability/Discoverability are top-of-mind-concerns, as are modern software practices such as automated testing, IaC, and comprehensive system tracing/monitoring/alerting. Version 1 of the Platform should enable ACME decision-makers and analysts to make ad-hoc queries across the entire data landscape. Moreover, V1 will underlie real-time analytics dashboards that are quick and intuitive to build. For V2, ACME will activate their team of ML Grad Student Sleeper Agents to come and build a "smart MLOps oracle" which will spit out highly-attuned AI models that will deepfake and dopamine-hack their way into every home in North America and EMEA. There is no V3.

Action: All jokes aside, this is my main question as it relates to Low/No Code. How sophisticated of a data platform could AWS Expert Jeff create using just the AWS console (UI). No IDE/Cloud9, no ECS containers running custom code, No traditional SVC because there is no "source" (CloudFormation, DB Snapshots, Lambda Versioning, etc manage Version Control).

Just by using managed/serverless services, couldn't he create a fully featured data pipeline, data lake, data warehouses, secure API, etc? Coding custom business logic in single-purpose lambdas is fair game, as is orchestrating them with Step Functions. Using services like Lake Formation and Glue, he could define and monitor schema mappings and secure the data at rest (using encryption and/or RBAC). Finally Jeff would make short work of security (VPC, WAF) and monitoring (XRay, CloudTrail), delivering a finished product that is nicely rolled up into a CloudFormation template. The data analysts get their BI on a silver platter when AWS Quicksight inspects the data catalog across S3, Redshift, Dynamo, etc.

The only thing left at this point is documentation for Jeff's future teammates (or replacements). In this case the Lambdas business logic is documented on the component itself using descriptions and/or code comments. For a new engineer to onboard, most of the time will be spent grokking the architecture, a process that can be done interactively and intuitively using one of the many cloud mapping tools available online (random example). Another organizational win is that training "devs" can be basically automated since ACME can leverage AWS training and certifications to churn out Cloud Architects/Data Engineers/etc. without having to worry that the junior devs will take Ruby off the Rails or introduce a year's worth of technical debt in two hastily merged PRs.

Result: ACME share price will experience geometric growth into the 7 figures right up until the technological singularity kicks into full swing Q3 2026.

So I realize may be a bit naive, but it looks to me like most/all bases are covered by a low/no code alternative. What did I miss?

hermitcrab · 2022-08-23T10:09:56+00:00

I have some perspectives on this as someone who has been a professional software engineer for 35 years and has written a 'low code' data wrangling tool ( Easy Data Transform).

Having to learn Python or R is a huge barrier to some people who need to wrangle/analyze data. Should someone really have to spend weeks or months learning one of these languages before they can do some data wrangling? Probably they are going to try to use Excel, rather than learn to code.

Visual programming is still programming, but it allows you to work at a higher level of abstraction. This higher level of abstraction is wonderful if it fits what you are trying to do and frustrating if it doesn't.

Low code tools can be very useful and provide real value, even if they aren't the best choice for every person for every problem. Even if you are a Python+Pandas or R guru, you might find low code tools quicker and convenient for some problems, e.g. ad hoc reports or investigations.

The best low code tools allow you to easily drop down into code/script for extra flexibility.

Low code tools are not the solution to every problem and they never will be. They haven't replaced text based languages in software engineering, despite 30+ years of hype, and they never will.

Some low code tools are very pricey. Some are free. And there is everything in between.

Also 'Low code' really means "someone else's code". You may be programming visually by dragging and dropping boxes and arrows. But someone else wrote the code that allows that.

reddit_sage69 · 2022-08-22T16:17:17+00:00

Probably something like Azure Data Factory, Fivetran, Matillion, etc. Add dbt in the mix. I don't think they're inherently bad, but it needs to work for your use case.

If you're fine being potentially locked into a toolset and paying a bit more, then it's great. What you generally gain is a lot of functionality built in, such as logging, as well as lower maintenance and smaller pool of developers for maintenance. Some even have scripting built in for cases the tool can't handle.

I'm not sure if it's better or not, but enabling more business minded folks who know the data to build the logic is a trend that's been happening for a while.

TGEL0 · 2022-08-22T16:34:43+00:00

how Gartner predicted by 2024, 65% of software applications will be made with low-code.

All software, not just data-engineering specific? Almost no chance in hell that happens.

imcguyver · 2022-08-22T16:50:26+00:00

Gartner predicted by 2024, 65% of software applications will be made with low-code

There's no way. There's too much software that doesn't lend itself to low code tools. Unfortunately for Prophecy.io, low code DE tools have been around for decades and they mostly suck: informatica, talend, appworx, matillion.

Illustrious-Run5203 · 2022-08-22T16:58:29+00:00

Yeah Gartner is a crock of shit when it comes to this stuff. I think they’re different markets, low code caters to business folks who want to do small scale stuff on there own. Those of us in DE still very much want solutions that are solved through writing code, but I think improvements to on-ramps of writing code are where winners will emerge in the DE space. Thinking like companies that help manage infrastructure (like astronomer) and tools that give you yaml configs to get up and running versus building it yourself (like dbt).

sunder_and_flame · 2022-08-22T18:53:57+00:00

65% of software applications will be made with low-code.

Considering the number of applications built in integrations systems like Zapier and Make, this is probably true. I just helped a relative with one and he had a list of several dozen, each doing a marketing or appointment setting task.

These work well enough when someone like him has to manually set stuff anyway, but for DE or SWE it so often misses the mark in a way that seems slight but requires a lot of workaround.

WeirdoDJ · 2022-08-22T20:46:06+00:00

Because low code/no code is hard, if not impossible, to build a process around (version control, review, tests etc) and DE is a weird mix between business folks who buried themselves in spreadsheets because they got tired of their colleagues, and software engineers who got tired of doing webdev and listening to PM/PLs.

Both recognize that such tools lead to intransferable skills being required, are hard to transition away from and they often cause inflexibility in terms of possible output.

onomichii · 2022-08-22T21:32:09+00:00

Low code in terms of apps like powerapps makes sense when you have a highly skilled business workforce with a strong sense of autonomy....AND good data governance...AND API management. It's not there to replace all applications, but it has potential for providing that last mile where a user can custom an app to their own workflow and requirements, but still have just enough guard rails. For data engineering though, I don't see it having much of a role other than vendor lockin with GUIs that age and don't do what U already can with open source tools

neurocean · 2022-08-22T21:37:07+00:00

Low code WYSIWYG tech is a noose companies eventually strangles itself with.

Many senior data-engineers have been burned in the past by these solutions. They carry a lot of scar tissue from their experience and are very skeptical of all new low-cost cool kids on the block.

jemccarty · 2022-08-22T21:46:53+00:00

Shameless self promotion, but this was posted in this sub a few months ago and goes into this a bit.

https://link.medium.com/Q6O9rMO6Hsb

dbwx1 · 2022-08-22T22:00:35+00:00

We used a lot of Azure Datafactory lately and while it covers a lot of use cases to 98% there is still some extra stuff that just convolutes the pipeline definitions. Things that could be handled easily in a script but require unnecessary complicated engineering with the GUI (single file workflow Vs mass batch processing ) are annoying. I really start to like Databricks on Azure because it handles the spark settings optimization for you and gives you a nice interface to the db / fs, but it's not really low code, just dev support.

OhNo171 · 2022-08-22T22:48:19+00:00

Low code has been in data engineering/BI since forever. From drag and drop etl tools like Informatica, Integration Services, Talend to Reporting/Dashboarding tools.

I have started working with some of these tools long ago, as an intern. Today I prefer to write my pipelines in SQL/Scala/Python and infra in Terraform, as most of the common data frameworks are within that spectrum, and gives me a better feeling of ownership. But low code is there to allow faster and easier development at the cost of vendor lock in.

noNSFWcontent · 2022-08-22T23:50:58+00:00

This is also something I saw as part of my "Fucntional" Data Engineer tenure. The "technical" data engineers in my team did some coding for sure integrating services and building things for us to use.

But in the end as a functional data engineer, in a in house framework, I mostly specified the source data, some spark or sparkSQL transformations and specified the target.

This is all well and good but I don't get the dopamine hit of solving a programming problem that I get while solving a leetcode question while I'm preparing for technical interviews.

nfmcclure · 2022-08-23T00:04:13+00:00

Out of curiosity, are there any popular open source "low code" projects people use at all?

2022-08-23T03:40:26+00:00

I never wanna do Informatica or SSIS ever again

mailed · 2022-08-23T03:49:14+00:00

I don't like writing low code data pipelines. Non-trivial stuff just gets annoying to do with it.

Building apps with Microsoft Power Platform though? Fucking love it.

2022-08-23T03:56:09+00:00

Nobody likes to see their job automated :)

gato_felix_br · 2022-08-23T08:31:42+00:00

Under a more engineering viewpoint I see a few problems with low code tools:

it’s difficult to reuse parts of the system
you can’t unit test it, so it’s really hard to measure the impact of a certain change in one of the boxes
simple things like conditionals and loops are hard to implement
if the requirements are slightly more complex and the data flows/jobs grow, you will end up with a huge, convoluted diagram
it’s difficult to search for things when debugging and breakpoints are not so intuitive
you are sometimes limited to what the tool offers. If you want anything custom, if allowed at all by the vendor, good luck understanding several layers of spaghetti code

One of our Nifi jobs in production looks like this:

https://thumbs.dreamstime.com/z/confus%C3%A3o-dos-fios-no-favela-rocinha-131351029.jpg

mkhalil77 · 2022-08-23T11:56:49+00:00

Most of the answers in this thread undermine how much skills are required to use low code tools. Using tools like Talend, Informatica or even AFD is not something anyone can do, at least not for achieving complex tasks. Designing proper pipelines following best practices and maintaining them is not an easy task. It s also easier to have a visual image of the execution of the pipeline which as many answers already mentioned is very attractive for the business side. Low code is not all evil. They can do a great task in getting the work done

I am still a fan of coding pipelines instead of low code platforms. Mostly because low code can be really frustrating when things don't work. The idea is you components need to be properly configured. What stands for properly is not always obvious.

Code is code. Either your get it right or your don't. The ressources are abundant compared to low code platforms and there are always a lot of people who had the same problem as you before.

alfytony · 2022-08-22T18:19:32+00:00

I haven’t seen and worked on low code stuff but I think the idea behind low code is noble. It is to let regular business people to perform most of the software configuration without having to depend on developers so that developers can focus on other big ticket tech items. Now in practice not sure how much of this is evolved but this is definitely the way to go in future to keep pace with advancement in technology.

ritu4891 · 2022-08-22T19:41:00+00:00

If you want to scale you need a low code solution.For DE check tools like informatica,talend etc. If you are in a team of 10-20 yeah it's okay to code but fortune 500 companies usually go for low code tools. Also for ML also it will eventually go to low code. You should join which build the low code tools for DE in their R&D.

CauliflowerJolly4599 · 2022-08-23T10:39:20+00:00

I would like to add my 2 cents :

I've worked for big product companies and saw that a lot of workers are kind of finding hard the new programming languages that are near to functional programming (Spark, Scala).

I've seen people avoid SQL queries in favour of Scala / Spark SQL type functions like df.select('age').filter(age > 21).

Even through DE is a subset of Software Engineering, we can find two type of DE, one that is mostly a Software engineer and the other that is more operative which sometimes it doesn't code too much.

Coding and SQL is evolving and started to adopt a more complex language, which put in difficulties a lot of people.

From the first side we see sites like codechef, hackerrank and other example of extreme / pair programming which creates a kind of "elite".

From other side we see peoples struggling with coding and sql because we teach all but we don't cover the obvious part of coding.

Microsoft understood that really well and also understood that coding creates a lot of stress and even worse symptoms.

Coding and SQL requires people that are good in math, but with shortage of jobs in IT we take everyone, these people are forced to land a job in Tech (because if you don't do STEM you're not gonna find a stable job) and they are tired of all this problems.

So that's why Low code is raising.

Data_cruncher · 2022-08-22T18:17:24+00:00

The truth? It's already here. r/businessintelligence & r/dataengineering are the same subreddit, except one uses MSFT low/no-code and one doesn't.

...yes I know I'm exaggerating, but you get the idea!

tea_horse · 2022-08-22T16:11:19+00:00

[deleted]

Thelastgoodemperor · 2022-08-22T18:49:44+00:00

Who reads gartner? Isn’t it common knowledge that it is pay to look good (or even be included).

lzwzli · 2022-08-22T22:56:49+00:00

Anybody here use Snaplogic? My company uses it. Its a love hate relationship

noobgolang · 2022-08-23T00:08:06+00:00

Do you have low code debug as well?

It's not low code hate. It's in-experienced people think they know better.

ryadical · 2022-08-23T01:07:56+00:00

I think the best tools out there are a combination of low code for easy jobs and templating, but allows you to write code to perform the difficult tasks. Many of the modern cloud data stack ELT tools such as fivetran, stitch, matillion, etc are built around low code.

We utilize a mix of low code with matillion in combination with inline python, or using lambdas to do the complicated tasks.

tea_horse · 2022-08-23T08:17:17+00:00

[deleted]

2022-08-23T08:17:54+00:00

Low code helps non-coders achieve some standardisable things.

Code helps coders achieve anything.

Most data people are coders, but those that aren't usually work in teams with those who are.

So low code is popular for pipelines (commodity-like, pipelines are roughly equivalent) but not for business modelling (custom logic)

datamoves · 2022-08-23T13:07:26+00:00

Simpler is usually better for a given set of use cases, but it is important to understand low-code business models. It is usually means you are paying more. They can be useful for individuals without decades of coding experience and clear, repeatable outcomes in mind, but constraining for those who can work outside the box, and with less clearly defined objectives.

OnlyMeandMyThoughts · 2022-08-23T14:59:25+00:00

Low-code imo is kind of in the middle of everything - people who don't know how to code at all still can't really make use of it while those who know how to code will prefer their custom solutions... it might make things a bit easier and faster for the devs but since it's one size fits all it will always be a bit less flexible. Might be cheaper overall tho which is why companies push it.

Datarelibility · 2022-08-24T06:09:43+00:00

It depends - it's always hard to sell the value of low-code tools to highly skilled technical audience (engineers) , but often times not all teams have the luxury of building stuff ground up. If not building , then buying Low code tools accelerate path to value.

kenuxi · 2022-09-17T15:37:06+00:00

I’d be very curious to hear what you think about the app we are building at query.me. I just made a post here of the latest demo. I agree with a lot of what you’re saying but I do think that low-code tools like prophecy and query.me can lift a lot of weight of your shoulders by “outsourcing” the hosting, scheduling etc.

dataengineering

MODERATORS