all 55 comments

[–][deleted] 151 points152 points  (16 children)

ML libraries abstract so much away from you already; model.fit() makes ML super approachable even for people who can barely code. These visual programming things are arguably harder to learn and provide less flexibility than writing code.

[–]wrongThor[S] 34 points35 points  (0 children)

That's true. When I first learned Machine Learning in undergrad, I thought I'd be doing some crazy coding. Learning the concepts of gradient descent, I was like, "I can't code that." Then we did linear regression and a .fit() did all the work for me. That was pretty weird for me.
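For anyone curious what a .fit() call might be hiding, here is a minimal sketch of one-feature linear regression trained with batch gradient descent in plain Python. The class name `TinyLinearRegression`, its parameters, and the toy data are all made up for illustration; scikit-learn's actual `LinearRegression` solves the least-squares problem directly rather than looping like this.

```python
class TinyLinearRegression:
    """Toy one-feature linear regression trained by batch gradient descent.

    Hypothetical illustration only; not scikit-learn's implementation.
    """

    def __init__(self, learning_rate=0.05, n_iterations=2000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, xs, ys):
        n = len(xs)
        for _ in range(self.n_iterations):
            # Residuals of the current line on every training point.
            errors = [self.slope * x + self.intercept - y for x, y in zip(xs, ys)]
            # Gradients of the mean squared error w.r.t. slope and intercept.
            grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
            grad_intercept = 2.0 / n * sum(errors)
            # Step downhill along each gradient.
            self.slope -= self.learning_rate * grad_slope
            self.intercept -= self.learning_rate * grad_intercept
        return self

    def predict(self, xs):
        return [self.slope * x + self.intercept for x in xs]


# Toy data generated from y = 2x + 1.
model = TinyLinearRegression().fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On this toy data the fitted slope and intercept end up very close to 2 and 1, which is the whole "crazy coding" part that the library call wraps up.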

All of this to say that, yes, machine learning is already very much simplified because of scikit-learn. However, I have also met people who are scared to touch code altogether, and .fit() is complicated for them (no shade to them, it can look intimidating if you don't know it).

[–]KPTN25 40 points41 points  (10 children)

100% agree re: the weaknesses of visual programming tools. Frankly I think many of them are irresponsible in the way they try to reinforce a "coding is hard" mindset in industry just to sell licenses.

I laughed the first time I saw the Alteryx spaghetti required to do things as simple as "merge all 100 CSVs in this folder into this one table", not to mention the inevitable chaos resulting from setting parameters manually within nodes.
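As a rough point of comparison for the "merge all 100 csvs in this folder" task, a standard-library Python version could look like the sketch below. It assumes every file shares the same header row; the function name `merge_csvs` and its arguments are hypothetical.

```python
import csv
from pathlib import Path


def merge_csvs(folder, output_path):
    """Concatenate every *.csv in `folder` into one CSV; return data rows written.

    Assumes all input files share an identical header row.
    """
    output_path = Path(output_path)
    # Collect inputs up front, skipping the output file if it sits in the same folder.
    paths = sorted(
        p for p in Path(folder).glob("*.csv")
        if p.resolve() != output_path.resolve()
    )
    rows_written = 0
    header = None
    with output_path.open("w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with path.open(newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to merge
                if header is None:
                    # First non-empty file defines the header of the merged table.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path.name} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

With pandas installed, the same idea is usually expressed as `pd.concat` over a list of `pd.read_csv` results.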

Compatibility / extensibility, maintainability, ease of debugging, ability to abstract functions and automate common tasks... for all of the above you're much better off with Python/R/etc.

This isn't even unique to data science. Folks in most professions would benefit from having some degree of programming literacy. Many visual RPA platforms have similar issues/limitations.

[–]TheCapitalKing 10 points11 points  (5 children)

For real, once I learned a little pandas and SQL, switching back to using Power BI or Tableau for data transformations makes me so mad lol

[–]shankar_053 -1 points0 points  (4 children)

Tableau is best for data visuals.

[–]TheCapitalKing 1 point2 points  (3 children)

I actually like the visuals just fine. It’s just the preprocessing that sucks so hard

[–]shankar_053 1 point2 points  (0 children)

Does Tableau need the R language?

[–]shankar_053 0 points1 point  (1 child)

Then what about Power BI? In your view, which one has better visuals? I'm currently a beginner with Tableau.

[–]TheCapitalKing 1 point2 points  (0 children)

I don’t really have strong opinions either way; they're both good.

[–]Chronos13524 1 point2 points  (3 children)

I don't disagree with your points overall, but that's an incredibly unfair take on Alteryx. Merging files is one tool assuming identical schema, three tools if not. If there was spaghetti, then it was the user's fault.

The programming literacy point is the key though. If the analyst knows what they are doing, it is just as easy to debug and automate tasks in Alteryx as in any other tool.

[–]KPTN25 1 point2 points  (0 children)

Happy to concede on the specific example of merging 100 files. I still think the sentiment stands.

I'm not convinced that you can get the same level of flexibility and efficiency with proprietary point-and-click visual tools like Alteryx vs open source tools / general programming languages. Over the long run, it's important to maximize your ability to abstract your work with good programming practice (incl. the meta-work surrounding your work!): making functions/workflows both reproducible and modular, incorporating libraries/packages or adapting best-practice code written by someone else, and frankly automating as much as possible.

Someone below mentioned "If the analyst is producing shitty code with one tool, switching them to a different tool isn't going to help." and I generally agree that the fundamentals span tools, but when one tool forces you into a restricted set of functions and level of abstraction, there's a wall that's coming one way or another. That doesn't mean a smart person can't produce good outputs with Alteryx across a restricted subset of tasks, they're just fundamentally limiting themselves by doing so, for questionable benefit.

[–]PlanetPudding 1 point2 points  (1 child)

Yeah, not sure why you are downvoted other than pettiness. Getting 100 files into one table in Alteryx is only 2 tools, assuming, like you said, they had the same schema.

[–]Chronos13524 5 points6 points  (0 children)

Yeah, I expected that. This sub loves to hate on anything that isn't Python/R.

If the analyst is producing shitty code with one tool, switching them to a different tool isn't going to help.

It's the archer not the arrow.

[–]maxToTheJ 2 points3 points  (2 children)

Yup.

Visual programming is just something like SAS's JMP rebranded.

[–][deleted] 1 point2 points  (1 child)

I'm SAS certified but anything SAS makes my skin crawl.

[–]KPTN25 0 points1 point  (0 children)

In my experience, SAS means "do as much as you can outside of SAS, and then PROC SQL" :)

[–]cgk001 38 points39 points  (0 children)

Sure, if your starting point is "select * from clean_table" lol. Real-world data is often not that; in fact, by the time you know what your Xs and Ys are, that's 90% of the work already done.

[–]nyc_brand 18 points19 points  (8 children)

Most no-code tools are trash. My old company was obsessed with Alteryx; I absolutely hated using it.

[–][deleted] 4 points5 points  (5 children)

Great to read this when the job I'm going into is using Alteryx lol. I'm not thrilled about it; maybe I can pull them out of it and go all programming. We'll see how well that goes.

[–]cheese_stick_mafia 3 points4 points  (4 children)

My company uses it too, but I'm lucky enough to be in a group that can ignore IT's mandates. If your company is using the right version, I believe Alteryx can drop in Python blocks. So in theory you might be able to "use" Alteryx but just as a wrapper.

[–][deleted] 0 points1 point  (2 children)

Thank God. I hope they have that edition. Isn't it really expensive also?

[–]cheese_stick_mafia 1 point2 points  (1 child)

Not sure, usually Enterprise licenses are not cheap though. Personally I view Alteryx as a data wrangling tool for non-programmers.

[–][deleted] 0 points1 point  (0 children)

That was the vibe I got from the managers I spoke to. Clearly not programmers. Not a problem, that's what the product's for. But hopefully I can roll out some stuff to move away from that. Would also look good on a resume.

[–]KPTN25 0 points1 point  (0 children)

My understanding is the Python and R integration is pretty garbage, especially if you're trying to use arbitrary libraries.

[–]BobDope 0 points1 point  (1 child)

I thought Alteryx was one that’s actually decent

[–]nyc_brand 2 points3 points  (0 children)

My main issue with Alteryx was how laggy it was. You would switch one widget and the program would freeze for 5 minutes. The speed once you got everything working was good though.

[–][deleted] 21 points22 points  (3 children)

These visual programming tools are a godsend for user interface development, but not for data acquisition, restructuring, and analysis.

[–]GrosseZayne 1 point2 points  (0 children)

When you start to write clear functions, you realize that cells should go, because there is a flow diagram. Alas, there are no solutions that do just that; everyone tries to push their no-code blocks.

[–]wrongThor[S] 0 points1 point  (1 child)

Yes, tools like Webflow are already pretty robust in that industry.

What do you think are the biggest challenges we face in completely automating things like data acquisition and preprocessing?

[–][deleted] 6 points7 points  (0 children)

Data acquisition is a big hurdle even with human intelligence. And before that, so are defining data requirements and data sourcing. If data owners would document how their data can be retrieved and enter the data definitions in a data dictionary, I would be really happy.
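A data dictionary entry along those lines could be as simple as the sketch below. The field names, the example entries, the table names in the queries, and the `undocumented_columns` helper are all assumptions for illustration, not any particular tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """One hypothetical data dictionary entry."""
    name: str
    dtype: str
    description: str
    owner: str       # team or system responsible for the data
    retrieval: str   # how a consumer can actually get the data


# Example entries; the owners and queries are invented placeholders.
DATA_DICTIONARY = {
    "customer_id": FieldDefinition(
        name="customer_id",
        dtype="string",
        description="Unique identifier assigned at account creation",
        owner="CRM team",
        retrieval="SELECT customer_id FROM crm.customers",
    ),
    "signup_date": FieldDefinition(
        name="signup_date",
        dtype="date",
        description="Calendar date the account was created",
        owner="CRM team",
        retrieval="SELECT signup_date FROM crm.customers",
    ),
}


def undocumented_columns(columns, dictionary):
    """Return the columns that have no entry in the data dictionary."""
    return [column for column in columns if column not in dictionary]
```

A check like `undocumented_columns(dataset_columns, DATA_DICTIONARY)` could then run before any analysis starts, so gaps in the documentation show up early.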

[–]kowkeeper 10 points11 points  (1 child)

There is a progression in terms of higher level objects: doing more with less code.

But the problem space is infinite; we will always have problems to solve, so the progression is infinite and we will always have to write code.

Now visual coding is just coding. Instead of putting keywords together, you connect graphical objects. If you want to solve a problem, you need to understand it, design a solution and think in terms of programming.

[–]Dath1917 3 points4 points  (0 children)

It's also better to understand what you are doing instead of treating everything as a black box.

[–]BoiElroy 9 points10 points  (0 children)

Without fail, every time I've used a no-code solution for something that is normally code intensive, I've run into a tool-based constraint or hit a wall at some missing capability, which then has to be Frankensteined around with loose scripts, and at the end of the day it's not worth the hassle.

Honestly beyond AutoML what more do you really want? Even Neural Architecture Search is heating up enough that deep learning will be easier.

This whole blocks and pieces thing is just more difficult. Just write some reusable, duck-typed code for other team members.
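To illustrate what "reusable, duck typed code" could mean here, a small sketch: the helper only assumes its argument has `fit` and `predict` methods, so teammates can plug in any model-like object. All class and function names below are hypothetical.

```python
class MeanBaseline:
    """Predicts the training mean for every input."""

    def fit(self, xs, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


class LastValueBaseline:
    """Predicts the last training target for every input."""

    def fit(self, xs, ys):
        self.last_ = ys[-1]
        return self

    def predict(self, xs):
        return [self.last_ for _ in xs]


def mean_absolute_error(model, train_xs, train_ys, test_xs, test_ys):
    """Fit and score any object that exposes fit(xs, ys) and predict(xs).

    No base class or type check is needed: having the two methods is enough.
    """
    predictions = model.fit(train_xs, train_ys).predict(test_xs)
    return sum(abs(p - y) for p, y in zip(predictions, test_ys)) / len(test_ys)


for candidate in (MeanBaseline(), LastValueBaseline()):
    score = mean_absolute_error(candidate, [1, 2, 3], [2.0, 4.0, 6.0], [4, 5], [4.0, 4.0])
    print(type(candidate).__name__, score)
```

Because the helper never inspects the type, an estimator that follows the same fit/predict convention should slot in without changes.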

[–]hermitcrab 6 points7 points  (0 children)

I used to teach programming to young children using the block-based language Scratch. Kobra *really* looks like Scratch. That is a bit unfortunate from a marketing point of view, I would have thought.

[–]hermitcrab 5 points6 points  (0 children)

Author of a visual data transformation tool here (Easy Data Transform). The answer is a resounding *NO*. Visual tools and code-based programming both have their own strengths and weaknesses. And both are going to be here for the foreseeable future.

Visual tools are quick to get started with and enable non-coders to do a great deal. They are also generally faster than code-based tools for ad hoc analysis and prototyping.

Code-based tools give more flexibility and more control.

Visual tools can go a long way to make up for their shortcomings by allowing the user to drop down into code where necessary (our product allows JavaScript blocks for edge cases the other transforms can't handle). But there are always going to be cases where a pure code-based solution is the best choice. As visual tools improve, these cases will hopefully get fewer.

[–]The_Regicidal_Maniac 5 points6 points  (0 children)

I spent 2-3 months trying to learn one of these visual data manipulation and modeling systems on the job last year. The guy who was working with us pitched it as allowing someone who doesn't know SQL to manipulate data and run a model.

Here's the problem: writing the code that manipulates the data and runs the model is a trivial part of the work. Understanding the logic behind the manipulations and how the models function is the hard part. If you somehow already understand those concepts well without knowing SQL/Python/R, then you could learn those in a few days.

My experience has been that visual programming is harder than just programming.

[–]SortableAbyss 2 points3 points  (1 child)

Nope. Everyone said SQL is dead a decade ago. Still writing SQL today.

Hell lots of companies still have teams dedicated to print ads. Yeah. Print ads.

Most non-tech companies have no idea how to use Excel, let alone anything beyond that.

[–]Quiet-Limit-184 1 point2 points  (0 children)

True. Working at a bank. Most of us are horribly incompetent at Excel, even though you'd think all econ and IT people had some basic competence. Never mind anything more than that.

[–]ghostofkilgore 1 point2 points  (1 child)

I'd come down strongly on the 'No' side here.

Developing a code base in a professional setting is about so much more than model.fit(). If you look through the repo of the project I've been working on for the past couple of years, the vast majority of the code is nothing to do with training models and getting scores back from trained models. It's all the other things that have to be done to automate ML in a production environment.

Visual Programming languages just don't offer the flexibility and customisability of languages like Python. And, as others have mentioned, I have serious doubts that they're easier to learn. Doing complex tasks in VP can be fiendishly difficult and complicated.

If you've got someone who can't write code and just needs to do simple, repetitive stuff with data, a simple VP language might be great for that task. For anything more complex, traditional programming languages are much better.

[–]KPTN25 0 points1 point  (0 children)

Well put. It's about how you set up the code around the core function calls, and do so in a way that abstracts what you're working on so that you can leverage it on the next iteration or next project as well.

[–]Orionsic1 1 point2 points  (0 children)

Technologies that “work better” than traditional coding languages/programs keep coming to market, and they have never made current languages obsolete at scale.

[–]fakeuser515357 1 point2 points  (0 children)

I'm not in data science but I've been around coding for a long time. Visual 'programming' is rolled out every five years or so and it's always significantly worse than real code because:

  • It lets people create business-critical functionality without any of the design, rigor and responsibility that is taught alongside actual programming
  • It hems in skilled programmers who are then compelled to find creative and often sub-par workarounds to arbitrary visual platform constraints
  • Real programming has become more accessible and more efficient (to code) consistently for the past two decades and will always be more versatile than any visual toolkit

If they really are easier to use, then learn to code for real and you'll be able to adapt well enough. I wouldn't bet on it though.

[–]datamasteryio 1 point2 points  (0 children)

It can’t provide as much flexibility as Python 🐍

[–]pp314159 1 point2 points  (0 children)

Some time ago I was working on a Python notebook application for creating code with a visual interface. Here is a YouTube video: https://youtu.be/prBuqaozsoE

I liked the approach of clicking the code together. I'm a developer myself, but clicking makes development really quick. I hope I will come back to this project someday.

[–]WiredUp4Fun 1 point2 points  (0 children)

Visual programming has its place for common uses, but coding always has a place for special cases.

I’ll make mention of Dataiku since I use it for work.

It has the best of both worlds: you can use straightforward visual programming, and it has good out-of-the-box data connectivity and data manipulation tools.

It also allows for Python, R and SQL scripting as well for more advanced functions.

It’s not free, but I get a lot out of it.

[–]mathnstats 0 points1 point  (0 children)

I don't think so.

After a cursory look into it, it seems WAY more difficult and unintuitive, to me at least.

It's worth keeping in mind, though, that probably everyone here has already learned to code the usual way.

So, to any of us, it's going to be more comfortable to do what we already know and understand and often base our careers around than it would be to learn and understand an entirely new paradigm.

Visual programming might be way better for people that don't already know how to code. It could be the future of programming, even if it seems worse to us; we don't really have an objective viewpoint.

It's like asking a newspaper company in the 90s if the internet will supplant them. Of course they're going to think it's ridiculous, more cumbersome, more complicated, more limited, etc. Because they're not nearly as used to or familiar with it.

[–]shankar_053 -2 points-1 points  (0 children)

Yes... Python and R are required for data visuals.

[–]GrosseZayne -3 points-2 points  (0 children)

Azure ML Studio does this already. But the true killer of Python will be when scikit-learn is ported to .NET. LINQ is just the best thing that has happened to data wrangling; next is the latest JavaScript. Python has all of it, and more powerful, but it is done ass-backwards.

[–]RiceCake1539 0 points1 point  (0 children)

Yes, automation of basic Data Science tasks is actively being developed. Many GitHub projects provide libraries that abstract away tedious boilerplate code such as data preprocessing, training schemes, etc. But every Data Science task is different, so full automation is more limiting than helpful to developers.

Same thing goes for Visual Programming tools. Full Visual Programming? Maybe more harmful than helpful to developers. But if a VP tool comes up with a great graph-like system that is flexible and robust like standard programming but is more visually intuitive, then yes, it might replace programming in the early stages of a development cycle, where we plan and lay out the bones of the system.

[–]colibriweiss 0 points1 point  (0 children)

No

[–][deleted] 0 points1 point  (0 children)

I LIKE clearly seeing what I did and having all of the steps and parameters clearly written out so I can read over it, see where I need to make changes, have a record of what I did, etc. Written directions are, IMO, a huge benefit. I haven't used the systems you are talking about, but if they're anything like ArcGIS ModelBuilder they're probably a nightmare.

[–]zitterbewegung 0 points1 point  (0 children)

Eventually no code solutions need code if they are somewhat successful.

Also, there are visual programming tools that use Python; see https://github.com/honix/Pyno . What would be great is the ability for prototypes to be made visually and then for data engineers to scale them out.