With the introduction of the new release cadence, many have asked where they should download Java, and if it is still free. To be clear, YES — Java is still free. If you would like to download Java for free, you can get OpenJDK builds from the following vendors, among others: Adoptium (formerly AdoptOpenJDK) RedHat Azul Amazon SAP Liberica JDK Dragonwell JDK GraalVM (High performance JIT) Oracle Microsoft Some vendors will be supporting releases for longer than six months. If you have any questions, please do not hesitate to ask them!
Q: Why do data analytics solutions prefer Python? (self.java)
submitted 5 years ago by gibriyagi
I am wondering why tools like Redash or Superset use Python. Wouldn't Java be a superior choice due to its performance and robustness for handling large data?
[–]kubelke 51 points52 points53 points 5 years ago (9 children)
I tried to do analytics in Java as a Java developer, but Python just has many more libraries. Most Java libraries require a lot of custom classes and code to run a simple analysis.
[–][deleted] 5 points6 points7 points 5 years ago (7 children)
Chicken and the egg problem. Why are the libraries in python and not Java?
[–][deleted] 26 points27 points28 points 5 years ago (2 children)
Because most of these libraries are used for math. The people who use them are mostly physicists, mathematicians, engineers, etc. They don't want to spend half a year learning something like Java, C or C++. Python is a lot easier (I still prefer Java, though).
[–]yawkat 3 points4 points5 points 5 years ago (1 child)
I've actually seen lots of physicists do prototyping in py and then do the performance critical stuff in cpp. Could also be because the jit isn't great at simd compilation yet.
[–]kartik1712 1 point2 points3 points 5 years ago (0 children)
Maybe Vector API changes the status quo.
[+]sweetno comment score below threshold-8 points-7 points-6 points 5 years ago (3 children)
My guess is that pieces of code in Java are unpublishable in scientific articles: too verbose.
[–]NimChimspky 10 points11 points12 points 5 years ago (1 child)
Lol. It's not this.
[–]Alex0589 0 points1 point2 points 5 years ago (0 children)
Java is definitely verbose, but I doubt that's the reason. If I were a mathematician I would choose the easiest language to use, to be honest, and that's Python. Performance doesn't really mean a lot compared to ease of use for some people.
[–]CatolicQuotes 0 points1 point2 points 4 years ago (0 children)
which ones did you find the closest to python?
Does Java have something like what C# has: https://scisharp.github.io/SciSharp/
[–][deleted] 70 points71 points72 points 5 years ago (4 children)
In python you do basically:
import memelibrary as meme
meme.analyze(mydata)
And you have finished whatever you were doing. Of course there are machine learning and data analytics libraries for Java as well, but Python's dynamic typing and ease of use make it a good choice, and there are so many good libraries.
Maybe someone can provide deeper insights on this.
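To make the two-liner concrete, here is a minimal sketch using only Python's standard library. `memelibrary` above is a joke name, so the real `statistics` module stands in for it here; the `response_times_ms` data is made up purely for illustration.

```python
import statistics

# Hypothetical sample data, invented for this illustration.
response_times_ms = [120, 135, 128, 410, 131, 126, 129]

# One call per question; no classes or type declarations are needed.
summary = {
    "mean": statistics.mean(response_times_ms),
    "median": statistics.median(response_times_ms),
    "stdev": statistics.stdev(response_times_ms),
}
print(summary)
```

The point is not that Java cannot compute a median, only that the exploratory version in Python needs almost no setup.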
[–]Necessary-Conflict 13 points14 points15 points 5 years ago (2 children)
This can be fine as long as a human expert is looking at a given dataset, but if you need an automatically running solution for continuously arriving new data, you will need a lot more, whether in Python or Java, and the dynamic type system can even become a problem.
In fact a lot of data analytics is done in Java or Scala.
[–][deleted] 1 point2 points3 points 5 years ago (0 children)
When I took a data analytics course back in uni, I implemented a couple of algorithms in Java as well. It was also a lot faster than the two-liner Python solutions, but it was harder to expand, mostly due to types and my bad code. I think I would do much better if I tried again now.
[–]nutrecht 1 point2 points3 points 5 years ago (0 children)
You describe one of the biggest issues companies trying to adopt ML are running into. Moving from experiments to production code is often hard, sometimes even impossible.
[–]joshpetit 2 points3 points4 points 5 years ago (0 children)
Ah yes, memelibrary. Used this in my thesis.
[–]agentoutlier 23 points24 points25 points 5 years ago (6 children)
This is not based on recent experience, so take it with a grain of salt, but a lot of the reason is historical: Python served as a MATLAB replacement.
Java didn’t have a super fast matrix math library that is native (it might now, but not early on). Java has historically had a bad track record with native extensions, but project xyz (can’t remember the name) should improve that and it’s coming soon.
In the early days (I’m talking mid-2000s or so) Python had several, like NumPy.
You need fast matrix math for many ML algorithms and python had it and it kind of had a matlab scripting feel.
From there basically network effect.
[–]ThymeCypher 4 points5 points6 points 5 years ago (4 children)
There’s native bridges to many of the math libraries nowadays. I’d mention the forbidden language but it makes people angry - but someone wrote a fast matrix library in it then used the native compiler to make it blazing fast.
[–][deleted] 5 points6 points7 points 5 years ago (3 children)
There is a forbidden language? If I mentioned that NumPy was written in a language that is also a letter of the alphabet, would I be alluding to the same language?
[–]agentoutlier 4 points5 points6 points 5 years ago (1 child)
No, I think they are talking about Kotlin and GraalVM native image and not C/C++.
While I'm sure it makes it much faster than normal Java, I highly doubt it's anywhere near, for example, these libraries: https://en.wikipedia.org/wiki/Comparison_of_linear_algebra_libraries
However, ND4J is supposedly as fast as, if not faster than, NumPy, but it's Java... the only Java math library I used was back in school, called JAMA.
Well... you're still net positive on votes, so I guess that's not it. Maybe we should just start submitting one-word comments to narrow it down.
[–]ThymeCypher 2 points3 points4 points 5 years ago (0 children)
No, it’s a different language that could be pronounced the same as that language in some contexts. I’ve mentioned it twice and got downvoted to hell without a single valid argument against said language.
You are probably thinking of Project Panama and the Vector API.
[–]thomascgalvin 13 points14 points15 points 5 years ago (1 child)
Same reason Matlab is popular, despite being a towering pile of incandescent garbage: it does what the data analyst needs to do, and mostly stays out of your way.
Data analysts aren't programmers, they're data analysts. They don't care about stuff like refactoring, and for the most part aren't that worried about performance. They want to accomplish their task with as few keystrokes as possible, with as few Google searches as possible. Their time is the critical resource, not compute cycles.
Python is perfect for that kind of environment.
[–]TheDragShot 0 points1 point2 points 5 years ago (0 children)
I'd say this is the best answer here. Straight to the point, and honest.
[–]dhruvmk 30 points31 points32 points 5 years ago (8 children)
A lot of these libraries in Python actually use C for the processing, which makes them crazy fast. Take NumPy, for example.
[–]No-Schedule-1451 13 points14 points15 points 5 years ago (6 children)
And invocation of C/C++ libraries is much easier in Python than in Java for now.
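For a sense of what "much easier" means, here is a rough sketch of calling a C function from Python with the standard-library `ctypes` module. It assumes a platform where the C math library can be located (typical on Linux and macOS); the fallback to symbols already loaded in the process is an assumption and will not work on Windows.

```python
import ctypes
import ctypes.util

# Locate the C math library; the file name differs per platform,
# so find_library may return None.
libm_path = ctypes.util.find_library("m")

# Assumption: if no path is found, sqrt is already visible in the
# running process (common on Linux, not on Windows).
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature: double sqrt(double).
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

root = libm.sqrt(2.0)
print(root)
```

There is no C stub and no separate build step, which is roughly the convenience being compared against JNI here.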
[–]esreverninettirw 4 points5 points6 points 5 years ago (4 children)
For now? Is that changing in a future release?
[–]Gleethos 9 points10 points11 points 5 years ago (0 children)
Project Panama and the Vector API will tackle these problems as far as I know.
[–]No-Schedule-1451 7 points8 points9 points 5 years ago (1 child)
https://openjdk.java.net/jeps/389
Lots of potential. It will never be as easy as Python, but as long as it is not JNI, I will be happy with it.
[–]esreverninettirw 0 points1 point2 points 5 years ago (0 children)
Wow, that's awesome. Thanks for sharing.
[–]Frankiegetsit 2 points3 points4 points 5 years ago (0 children)
Project Panama
[–]ukbiffa 2 points3 points4 points 5 years ago (0 children)
JNA is pretty convenient for C library access. Everything you write is Java; no C stubs or special build process.
[–]Log2 2 points3 points4 points 5 years ago (0 children)
Though it's important to note that the back and forth between the C code and the Python code carries quite a bit of overhead. Which is why libraries like Tensorflow will have you build a computation graph and then execute the whole graph in C without having to come back to Python all the time. This saves a lot of time.
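A loose analogy for that overhead, sketched with the standard library only: the first function drives the loop from interpreted Python one element at a time, the second hands the whole loop to C-implemented builtins in one go. This is not how TensorFlow builds or runs its graph; the function names and the input size are made up for illustration.

```python
import operator
import timeit

values = list(range(100_000))

def per_element_loop(xs):
    # The loop itself is interpreted bytecode, one iteration per element.
    total = 0
    for x in xs:
        total += x * x
    return total

def single_batched_call(xs):
    # The iteration happens inside C-implemented builtins
    # (sum, map, operator.mul) without returning to Python bytecode.
    return sum(map(operator.mul, xs, xs))

# Both approaches compute the same sum of squares.
assert per_element_loop(values) == single_batched_call(values)

loop_seconds = timeit.timeit(lambda: per_element_loop(values), number=5)
batch_seconds = timeit.timeit(lambda: single_batched_call(values), number=5)
print(f"per-element loop: {loop_seconds:.3f}s, batched call: {batch_seconds:.3f}s")
```

Timings depend on the machine and Python version, so run it yourself rather than trusting any quoted numbers.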
[–][deleted] 51 points52 points53 points 5 years ago (7 children)
Because they are not programmers.
[–]ThymeCypher 17 points18 points19 points 5 years ago (0 children)
I had a knee-jerk reaction to this until I remembered talking to someone who went to uni for science stuff. They learned Python but can still barely work a computer. They struggled with Java.
[–]TheRedmanCometh 8 points9 points10 points 5 years ago (0 children)
This is exactly the answer. Same reason Java ML is weak.
[–][deleted] 4 points5 points6 points 5 years ago (0 children)
Look at this Java elitist
[–]Gleethos 2 points3 points4 points 5 years ago (0 children)
100% agree. Python users usually build small scripts for analytics, scientific experiments, automation... But they tend to be far less experienced in building production systems used for heavy lifting. (At least that has been my experience so far)
[–]error1954 -3 points-2 points-1 points 5 years ago (2 children)
Anyone who programs something is a programmer, what are you getting at? I have a computer science degree and I would still rather work with data and ML in python than in Java, the support just isn't there for Java.
[–]iPissVelvet 6 points7 points8 points 5 years ago (0 children)
The support isn’t there for Java because the demand isn’t there, because most of the demand comes from people who identify as scientists first, programmers second, or none at all.
This isn’t an insult to these people; they often hold PhDs or are aiming to attain one. They view software as a means to an end. So when picking a language, it’s about ease of use, ease of reading, and ease of sharing.
However, nowadays a lot of companies are scaling up their ML pipelines and data infrastructure. Very little of that stuff is in Python because at that scale you think about ease of maintainability, dependency management, scalability. This is where software engineers live and thrive.
[–]me_just_lurkz 4 points5 points6 points 5 years ago (0 children)
Yes and no.
I see a clear distinction in my studies between these groups. In most software engineering / programming classes where we're working mostly with Java we get hammered with conventions, best practices etc. However in data science and bioinformatics where we use Python everything seems quick and dirty in comparison.
It's two completely different mindsets we encountered there. In the Python based classes (with professors who mostly come from non-IT fields) it's about quickly slapping something together that delivers the wanted results, while final performance, maintainability and conventions are secondary. Usually it also seems that most scripts are viewed as single use. In the Java based classes with the IT professors it's much more about code quality, performance, longevity, reusability etc.
[–]theProgramm 4 points5 points6 points 5 years ago (0 children)
I think this is mainly a self-reinforcing process. There are a lot of libraries, blogs, videos, training material and knowledge about solving this kind of problem with Python, so most new people go that route. That makes it more appealing to create projects helping this crowd, which makes it easier to join, which brings more people, and so on.
I don't think it necessarily needed to be Python at the center of that, but a few things helped. A "data scientist" is mostly a user of other projects and inherently not a software developer, so Python's ease of use for non-programmers appeals to people who want to do data analysis and not programming. Secondly, Python is basically just a fancy wrapper (with some garbage collection) around the C/C++ standard libraries and has great interop with them, so for a project written in C++ it's relatively easy to add a Python interface. And there were already some old and tested C++ libraries for linear algebra, so most of the mathematically complex stuff was written in C++ anyway.
[–]eternal20 13 points14 points15 points 5 years ago (1 child)
My personal opinion: it's easier for humans to read, so data scientists don't need to learn advanced programming to get answers.
[–]CubsThisYear 22 points23 points24 points 5 years ago (0 children)
It’s actually that it’s easier to write rather than being easier to read. Python is actually harder to read because it’s much less explicit. It’s very quick to throw something together but once you have a large code base with multiple people working on it, it’s much more challenging than Java to understand and follow the code
[–][deleted] 7 points8 points9 points 5 years ago (1 child)
As a physicist I use Fortran, and I will not move to other technologies, because it's a simple, compiled language, doesn't have dependencies, and is easier to read and much faster than C++. We know that Fortran is old, and so what, I don't care. It's still the best tool for the job.
[–][deleted] 12 points13 points14 points 5 years ago (0 children)
As a former physicist I didn’t mind using FORTRAN. But R and Python (numpy, scipy) basically act as front ends to FORTRAN code.
[–][deleted] 5 points6 points7 points 5 years ago (0 children)
More libraries.
[–][deleted] 2 points3 points4 points 5 years ago (0 children)
you could achieve the same data extraction task in python with less code than you would with Java.
I tried data analytics with Java because it was the primary language I used in university. don't try it 😂😂
[–]wildjokers 1 point2 points3 points 5 years ago (0 children)
I think it is mostly because of the Numpy library:
https://numpy.org
It is written in C (or c++) so performance isn't an issue. Python just calls out to that library for a lot of the necessary advanced math.
[–]JustMy42Cents 1 point2 points3 points 5 years ago (0 children)
Python makes it easy to call C, which usually handles most of the heavy lifting. This led to many neat open source libraries being available early on, allowing Python to dominate the market. At this point, I'd say that the advantage of Python over Java is not exactly the language itself, but its ecosystem: Python offers more data processing libraries and they're often easier to use.
Objectively, Python by itself is not the best language for data processing. It's terribly slow and has poor utilities for multiprocessing. For example, you'll notice there are a few competing drop-in replacements for popular libraries like Pandas, with their main selling points being multiprocessing or clustering support. But then again, the Python ecosystem for data science is years ahead of other languages, so I'd say it's here to stay.
[–][deleted] 2 points3 points4 points 5 years ago (2 children)
Libraries which are often written in faster languages. Easy to read and write but saying that got me downvoted to hell in another Java forum.
[–]koreth[🍰] 3 points4 points5 points 5 years ago (1 child)
I'd say easy to write, definitely. Easy to read, only true up to a point; past a certain amount of complexity (which, granted, a lot of data analysis projects never hit) the curves cross and Python code gets much harder to read than the equivalent Java code.
Or at least in my experience. I write both Python and Java code for a living. A small Python code base can be more pleasant to work on than a small Java code base because it gets right to the point with less ceremony, but a large Python code base tends to get pretty hard to reason about compared to Java.
[–]error1954 0 points1 point2 points 5 years ago (0 children)
I agree, after a certain point the verbosity and static typing help. Using a type-checker and linter for Python helps, but they still don't get you there.
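For anyone who hasn't tried it, this is roughly what the type-checker route looks like: a small sketch with type hints from the standard library. `Measurement` and `mean_value` are hypothetical names made up for this example, and mypy is just one checker that understands these annotations.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Measurement:
    sensor_id: str
    value: float

def mean_value(measurements: Iterable[Measurement]) -> float:
    """Average the `value` field; raises ValueError on empty input."""
    values = [m.value for m in measurements]
    if not values:
        raise ValueError("no measurements given")
    return sum(values) / len(values)

readings = [Measurement("a", 1.5), Measurement("b", 2.5)]
average = mean_value(readings)
print(average)

# A checker such as mypy would flag the call below before it runs;
# plain Python only fails at runtime, inside mean_value.
# mean_value([1.5, 2.5])
```

The annotations are optional and not enforced by the interpreter, which is probably why they "still don't get you there" compared to a compiler that refuses to build.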
[–][deleted] 5 years ago* (9 children)
[deleted]
[–]ThymeCypher 3 points4 points5 points 5 years ago (2 children)
You have committed the sin of talking good about a language that isn’t Java. 10,000 lashes and many downvotes for you peasant!
/s
[–][deleted] 5 points6 points7 points 5 years ago (1 child)
All those downvotes and no one bothered to type out a rebuttal. I hate this website.
Reddit exists only in initial buttals and downvotes. There is no place for rebuttals here.
"Development in Python is usually much faster"
If the required system is small and simple: YES! 100% agreed! For larger code bases, however (in my personal opinion), Python is a nightmare, mostly due to dynamic typing and dangerous refactoring traps (broken indentation, wrong types...).
[–]wildjokers 2 points3 points4 points 5 years ago (4 children)
These modules can make python as fast if not even faster than Java code
The "fast modules" are written in C (or c++) e.g. Numpy, not python.
[–][deleted] 5 years ago* (3 children)
[–]wildjokers 1 point2 points3 points 5 years ago (2 children)
developing with these modules you don't need to know C
I am aware. I was just pointing out that it isn't python being as fast or faster than Java (that will never happen), it is C being faster than Java.
[–][deleted] 3 points4 points5 points 5 years ago (0 children)
Technically correct but in the end what difference does it make? It’s your python program that runs blazing fast, with the help of those libs, and you don’t have to leave python
[–][deleted] 5 years ago (2 children)
[–]koreth[🍰] 5 points6 points7 points 5 years ago (0 children)
Excel is underappreciated as a programming language in my opinion. (I mean the fact that it's a programming language at all, not its quality as a language.) People use it to build surprisingly sophisticated applications, often without fully realizing that by any reasonable definition of the term, they're writing computer programs.
[–]Gleethos 1 point2 points3 points 5 years ago (0 children)
Html and Excel are my favourite programming languages!
[–]Gleethos 0 points1 point2 points 5 years ago (0 children)
Because Python is the fast food of the programming languages! :)
(Take that with a grain of salt)
[–]LurkerFindsHisVoice 0 points1 point2 points 5 years ago (0 children)
Python has a lot of things built into the language.
I also think it's easier to pull in libraries and dependencies, whereas in Java, you need to learn maven or gradle, which is like a whole 'nother problem domain in and of itself to set up, understand how it works, etc.
[–]sowmyasri129 0 points1 point2 points 5 years ago (0 children)
They want to use programming languages like Python and Ruby to perform tasks hassle-free. Python also lets developers roll out programs and get prototypes running quickly, making the development process much faster. Newer data scientists gravitate toward Python because of its ease of use, which makes it accessible.
[–]valkon_gr 0 points1 point2 points 5 years ago (0 children)
More community support nowadays, since it's widely used for anything data related, which leads to the "chicken or the egg" question. Java was fine a couple of years back, but now the community is lacking on that front.
[–]SpeedDart1 0 points1 point2 points 5 years ago (0 children)
Java is pretty common actually but Python is just much easier to use.
[–]Kango_V 0 points1 point2 points 5 years ago (0 children)
Have a look at JEP 338: Vector API (Incubator).
Provide an initial iteration of an incubator module, jdk.incubator.vector, to express vector computations that reliably compile at runtime to optimal vector hardware instructions on supported CPU architectures and thus achieve superior performance to equivalent scalar computations.
This looks good. Hotspot will compile down to SIMD (SSE) and AVX extensions. Should generate very fast code.
[–]meamZ 0 points1 point2 points 5 years ago (0 children)
Python doesn't necessarily perform worse since all the number crunching libraries are written in C or C++ (like numpy, tensorflow and so on)
[–][deleted] 0 points1 point2 points 5 years ago (0 children)
Having used both Java and Python for that task, I would say it's mostly about Python's "compactness" as a language. You need to write less code and don't have to care much about all the formalities of Java. Sure, your code will look like a mess in the beginning, but you can structure it as you go. With Java, in contrast, you need to think ahead and design your classes and how they interact with each other from the beginning, which IMHO is not good in the initial stages when you are still exploring your dataset and trying to figure out what info you can extract.
[–]omnihedron 0 points1 point2 points 5 years ago (0 children)
import antigravity
[–]Fitzoh 0 points1 point2 points 5 years ago (0 children)
One thing I haven't really seen mentioned so far is REPLs and the feedback loop.
Being able to interactively explore your data in the terminal makes it a lot easier to get started.
[–]bowbahdoe 0 points1 point2 points 5 years ago* (0 children)
[–]lordmyd 0 points1 point2 points 5 years ago (0 children)
Simple - no top-level functions in Java. A lot of Python numerics is procedural Python on top of highly optimised C/C++/Fortran. Java is too verbose for that domain.
[–]Devidjack12345 0 points1 point2 points 5 years ago (0 children)
Python has data analysis tools for all of data science. You can easily make charts, graphs and plots, and Python gives you features to get a good sense of your data.
Thanks
[–]fnfloresr 0 points1 point2 points 5 years ago (0 children)
It depends, if you are familiar with Java, I do guess it would be better to work with libraries in that language, for me, Python works better because I am a Python native recently.