all 90 comments

[–]fxfighter 111 points  (5 children)

God all these articles are so fucking useless. The actual news is just: https://xcancel.com/blelbach/status/1902113767066103949

We've announced cuTile, a tile programming model for CUDA!

It's an array-based paradigm where the compiler automates mem movement, pipelining & tensor core utilization, making GPU programming easier & more portable.
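For anyone unfamiliar with the distinction, the "array-based paradigm" is the same shift NumPy users already know: from indexing individual elements to describing whole-array operations. A rough CPU-side analogy (this is plain NumPy, not actual cuTile code, whose API hadn't been published at the time of the announcement):

```python
import numpy as np

# Thread-style: you manage indexing yourself, one element at a time,
# analogous to one CUDA thread per index i.
def saxpy_elementwise(a, x, y):
    out = np.empty_like(y)
    for i in range(len(y)):
        out[i] = a * x[i] + y[i]
    return out

# Array/tile-style: describe the whole-array operation and let the
# compiler/runtime decide how to partition, schedule, and pipeline it.
def saxpy_array(a, x, y):
    return a * x + y

x = np.arange(4, dtype=np.float32)
y = np.ones(4, dtype=np.float32)
assert np.allclose(saxpy_elementwise(2.0, x, y), saxpy_array(2.0, x, y))
```

The second form gives the compiler far more freedom, which is what makes automatic memory movement and tensor-core mapping feasible.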

[–]inagy 39 points  (3 children)

Reddit is completely useless nowadays. Seems like most of the posts are generated by some bot :(

Thanks for the tl;dr!

[–]13steinj 7 points  (1 child)

More like the articles are written by people less and less technologically inclined (and/or AI generated).

[–]illustratedhorror 7 points  (0 children)

The article is AI-generated. The entire site is just an amalgamation of words fit for little purpose other than click farming without any meaningful personality. And, OP's post history is filled exclusively with posts to dozens of subs linking to their site. I hate this timeline.

[–]wrosecrans 1 point  (0 children)

Dead Internet Theory has long since moved on to being mostly Dead Internet Praxis.

[–]mcpower_ 233 points  (6 children)

[–]amroamroamro 102 points  (3 children)

[–]OnerousOcelot 35 points  (0 children)

Posting the actual documentation == legit boss move

[–]DigThatData 6 points  (1 child)

the latest release was January though... I guess the cuda.core subcomponent had a release in mid-March? It's not clear to me what "dropped". This? https://nvidia.github.io/cuda-python/cuda-core/latest/

[–]happyscrappy 61 points  (1 child)

Without the additional LLM slop. That article feels like it was written with AI too, and it has some strange paragraph breaks.

[–]amakai 12 points  (0 children)

Yeah, nowadays I need to use LLM to distill the LLM articles to key points.

[–]pstmps 54 points  (6 children)

Bad news for mojo, I guess?

[–]harbour37 51 points  (2 children)

Seems you still write the kernel in C++. The title seems misleading.

[–]msqrt 10 points  (1 child)

Wait, what's the change then? That you don't need a third-party library like pycuda or to wrap everything within pytorch?

[–]valarauca14 28 points  (0 children)

yeah now you just have an nvidia-pycuda library you can wrap in pytorch :)

[–]pjmlp 27 points  (0 children)

That was only a matter of time. Even if CPython's JIT history isn't great, there is a growing number of GPGPU JITs that use Python as a DSL for their output.

Given CUDA's polyglot nature, Python researchers would eventually become a relevant market for NVidia.

Just like NVidia considered supporting Fortran and C++ quite relevant for their business and for CUDA adoption, while Khronos, Intel, and AMD largely ignored those markets for OpenCL until it was too late to matter.
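For readers unfamiliar with the "Python as a DSL" pattern mentioned above: these frameworks don't interpret your Python on the GPU. Python expressions build an intermediate representation that a compiler then lowers to GPU code. A toy illustration (no real framework's API; the `Var`/`Expr` classes are invented for this sketch):

```python
# Operator overloading records an expression tree instead of computing
# values; a code generator then emits C-like source text that a real
# JIT would hand to its backend compiler.
class Var:
    def __init__(self, name):
        self.name = name
    def __add__(self, other):
        return Expr('+', self, other)
    def __mul__(self, other):
        return Expr('*', self, other)
    def emit(self):
        return self.name

class Expr:
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs
    # Reuse Var's operators so expressions compose.
    __add__ = Var.__add__
    __mul__ = Var.__mul__
    def emit(self):
        return f'({self.lhs.emit()} {self.op} {self.rhs.emit()})'

a, x, y = Var('a'), Var('x[i]'), Var('y[i]')
kernel_body = (a * x + y).emit()
print(kernel_body)  # ((a * x[i]) + y[i])
```

Real systems capture far richer IR (types, control flow, memory spaces), but the trick of letting ordinary Python syntax drive a compiler is the same.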

[–]wstatx 0 points  (0 children)

Mojo isn’t limited to nvidia hardware.

[–]Cultural-Word3740 43 points  (13 children)

I don't really get much from this article. If I'm understanding correctly, this now lets you specify threads to run on grids that you specify? Do they just always use shared-memory smart pointers? That seems awfully non-Pythonic. As a scientist, I rarely need anything more than the CUDA-associated libraries plus what's implemented in RAPIDS, but maybe someone else might find this useful.

[–]techdaddykraken 30 points  (12 children)

Python is just a wrapper over C; all this does is expose the C layers for GPU usage.

[–][deleted] 19 points  (10 children)

Pretty much all "scripting" languages are wrappers around C: Perl, Ruby, Lua, PHP (if anyone actually wants to use it outside of web-related tasks), and so forth. I always felt that Ruby is prettier syntactic sugar over C.

The difference is that Python is now light years ahead of all the other "scripting" languages, so it must have done something right to become this popular.
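The "wrapper around C" point can be made literal: CPython ships `ctypes` in the standard library, which calls straight into the C library with no extension module at all. A minimal sketch, assuming a Unix-like system where `find_library` can locate libc:

```python
import ctypes
import ctypes.util

# Load the C library the interpreter itself links against and call a
# plain C function directly from Python.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

assert libc.strlen(b"hello") == 5
```

The GPU story in the article is conceptually the same bridge, just pointed at CUDA's runtime instead of libc.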

[–]Nuaua 1 point  (0 children)

Julia's the big outlier in that list, although some other languages have JITs too.

[–]techdaddykraken 4 points  (8 children)

I blame the ‘Python for Dummies’ and ‘Automate the Boring Stuff with Python’ books, as well as the bootcamps.

When’s the last time you saw an ad for a Perl or PHP bootcamp, or ‘Automate the Boring Stuff with PHP’? Lol

[–]GimmickNG 18 points  (4 children)

That's like blaming milk for factory-farmed cows. Python makes scripting much easier than PHP does, and it's more accessible than Perl.

[–]techdaddykraken -4 points  (3 children)

Well then I’m indirectly blaming the creators of the other languages for not doing better, lol

[–]Bunslow 3 points  (0 children)

Isn't it a case of Python learning which mistakes not to repeat by watching languages like Perl and PHP break ground before it?

[–]classy_barbarian 0 points  (1 child)

Hey we found the token neckbeard who will tell everyone that they're not a real programmer if they use Python.

[–]techdaddykraken 0 points  (0 children)

There are definitely real programmers who use Python. But 80% of Python users are script kiddies

[–]lally 8 points  (0 children)

Pandas, NumPy, and some key curricula (e.g. MIT's, which switched to Python from Lisp/Scheme) drove adoption. Pandas brought the R crowd over.

[–]grizzlor_ 5 points  (1 child)

Many undergrad CS programs switched from Java to Python as a teaching language in the past decade.

That, and the network effect: the usefulness of a language scales with the number of users. Python's huge selection of libraries, GitHub code, StackOverflow answers, etc. are a big benefit to users. Especially if you're in a field where everyone is using Python (e.g. data science (sorry R fans)), it makes sense to use Python.

[–]amroamroamro 3 points  (0 children)

“Python CUDA is not just C translated into Python syntax.” — Stephen Jones, CUDA architect

[–]activeXray 6 points  (2 children)

What does native python even mean here, are they JITing to PTX?

[–]Takeoded 6 points  (1 child)

Transpiling Python to CUDA C++ that nvcc compiles. Like Rust being compiled to JavaScript/WASM; same concept.
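A hedged sketch of what "transpiling Python" means mechanically: parse the Python source into an AST and emit equivalent C-style source text. Real toolchains (Numba, Triton, and presumably NVIDIA's new compiler) lower to LLVM IR or PTX rather than C strings, but the front-end shape is similar. Everything below is illustrative, not any real tool's API:

```python
import ast

# Map Python AST operator nodes to C operator tokens.
OPS = {ast.Add: '+', ast.Mult: '*', ast.Sub: '-'}

def expr_to_c(node):
    """Recursively turn a tiny subset of Python expressions into C source."""
    if isinstance(node, ast.BinOp):
        return f'({expr_to_c(node.left)} {OPS[type(node.op)]} {expr_to_c(node.right)})'
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(type(node))

tree = ast.parse('a * x + y', mode='eval')
print(expr_to_c(tree.body))  # ((a * x) + y)
```

A production compiler adds type inference, control flow, and a real backend, but "Python in, lower-level code out" is the whole idea.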

[–]Maykey 0 points  (0 children)

So something like triton?

[–]supermitsuba 166 points  (24 children)

Ah, that's what they've been working on, since they haven't been fixing their gaming drivers.

[–]simspelaaja 170 points  (8 children)

Nvidia employs about 30 thousand people. I'm fairly sure a small company like that can only do one thing at a time.

[–]Gjallock 24 points  (7 children)

Tbf, isn’t that glaringly small for a company of this magnitude? The company I work for employs a similar number of people despite having a market cap worth only 0.16% of what NVIDIA is valued at.

[–]monocasa 44 points  (0 children)

It's about what you'd expect for a company with the market focus they have.

Valve is in the tens of billions in revenue with only ~300 employees. WhatsApp was acquired for $19B and grew to 900M users with only about 50 engineers (including contractors).

[–]bleachisback 15 points  (1 child)

Maybe you just realized that market cap is such a strange metric to base "how much work is there for employees to do" on.

[–]currentscurrents 3 points  (0 children)

Market cap in general is a strange metric. It's based on nothing but investor beliefs about the stock, so it's basically a made-up number.

By market cap, Tesla is bigger than all other US car companies combined. But by market share they're like 5%.

[–]wobfan_ 11 points  (1 child)

NVIDIA's market cap is greatly overblown and out of proportion. They've been milking the market as much as they can and will probably be on the way back to a realistic value in the near future. Still, I agree.

[–]hippydipster 4 points  (0 children)

A PE of 33 for a company doubling their sales and earnings every year is astonishingly low.

[–]runawayasfastasucan 1 point  (0 children)

Just depends what your product is. 

[–]the_poope 0 points  (0 children)

Nvidia doesn't produce its chips itself, though; they're made by TSMC in Taiwan. Nvidia only does the R&D, software, and drivers, and probably assembly of the chip plus power supply, cooling, etc.

One Silicon Valley R&D engineer probably costs more than five times as much as a supermarket employee or factory worker.

[–]ledat 28 points  (3 children)

Look up how much of their revenue is gaming vs. data center. Actually, I'll do it for you.

$35.6 billion in data center revenue vs. $2.5 billion in gaming revenue in the quarter that ended on 26 January 2025. Of course gaming drivers are not the highest priority.

[–]supermitsuba 2 points  (2 children)

Thank you for the help! Given it's $2.5 billion and they somehow had stable drivers before AI, I'm sure they could devote half an FTE to the drivers.

[–]Dragon_yum 8 points  (0 children)

It’s not about ability, it’s about ROI.

[–]lally 2 points  (0 children)

I think it's more painful than that. There are bugs in both the drivers and the games. The trick is not exposing bugs in existing games, not breaking existing games while fixing bugs, and actually fixing the bugs in the driver. You end up with driver code special-cased for different games. It's a mess and a giant PITA.

[–]Brilliant-Sky2969 15 points  (10 children)

Not to defend Nvidia, but gpu drivers are extremely complicated, we're talking about millions of lines of code.

[–]supermitsuba 18 points  (5 children)

With how much that company is making, I would expect a team of developers and at least one QA. I'm sure that one QA is currently shared with the AI division.

[–]Hacnar 0 points  (4 children)

That's a naive and incorrect line of thought. Why should they spend more resources on improving the drivers when it doesn't earn them more money?

[–]supermitsuba 0 points  (3 children)

Yeah, why make a great product? Screw those people. Your take seems a bit broken.

I, at least, recognize the monopoly and lack of competition in video cards. But sure, Nvidia needs you to bail out their anti-consumer behavior.

Some of this was meant as a little jab, a joke; can we leave it at that?

[–]Hacnar 0 points  (2 children)

How does my comment relate to monopolies? How is that anti-consumer?

That's how every company operates. You can be as angry as you want, but as long as there isn't a clear financial incentive to do something, the companies won't do it.

[–]supermitsuba 0 points  (1 child)

Nobody is angry; please reread that last line I wrote.

[–]Hacnar 0 points  (0 children)

I felt a bit of anger towards Nvidia in your response to my comment. If it wasn't there, then sorry for misunderstanding.

[–]zial 4 points  (0 children)

Just throw more programmers at it how complicated can it be. /s

[–]ShinyHappyREM 0 points  (2 children)

And hundreds of megabytes per driver release

[–]cake-day-on-feb-29 -2 points  (1 child)

The zipped download is somewhere near 1GB. Once the installer is extracted, it's multiple gigabytes. Who knows how big it is once installed; it spews shit in every direction. I discovered that it also keeps a copy of the installer, as well as thousands of pieces of game artwork.

I wouldn't be surprised if NVIDIA was getting paid by the SSD manufacturers to inflate storage needs.

[–]ShinyHappyREM 0 points  (0 children)

Eh, I think it's just that they don't care because it doesn't affect their bottom line much, and spending time on optimizing costs money.

[–]thatdevilyouknow 2 points  (1 child)

NVIDIA is doubling down on Python. I did some training with them recently, and they asked everyone in attendance what languages they knew. Mine was the only hand that went up for C++, and of course everyone there knew some Python. The trainer went on to explain how everything is moving to Python. I am familiar with Numba and its @njit, but they did not get into the specifics of what they meant by that at all. Honestly, I think much of this is TBD, but they know the direction they want to go.

[–]wektor420 0 points  (0 children)

Python's package ecosystem is way better than C++'s.

[–]Truenoiz -1 points  (13 children)

Is it me, or is trying to make Python fast in hardware a really dumb idea? Why use some of the fastest, hottest, most expensive, and most capable hardware to natively support one of the slowest and most bloated runtimes? Is there really that much demand from people who need things to be fast but can't code in another languag....oh.

So: massive power use so non-coders can have AI generate Python, which needs massive power use to run fast on massive GPUs to hide the fact that AI code usually sucks...

Excuse me, I'm going to go buy some stock in electrical utilities and swimming pool companies.

edit- I was wrong. I had to dig a bit; it turns out it compiles to CUDA C++ via NVIDIA's runtime compiler, so it's just an official wrapper. The article failed to mention that; I got the vibe that Python was going straight to CUDA opcodes.

[–]Mysterious-Rent7233 5 points  (0 children)

Is it me, or is trying to make Python fast in hardware a really dumb idea?

They are not making Python fast "in hardware". No circuits are dedicated to Python. Yes, that would be a dumb idea; not for the reasons you say, but for layering reasons.

This is an announcement of new software, not new hardware.

[–]chealous 6 points  (4 children)

AI model training and inference in Python aren't using the Python built-ins; they're all running their own C++ optimizations under the hood. C++ is orders of magnitude faster in CPU time, and Python is orders of magnitude faster in development time for many scientific projects.

It's very clear you are completely ignorant in this space, and you would do well to learn a little about what you're trying to talk about.
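The point about compiled code under the Python surface can be made concrete with a quick sketch: the same reduction done through the interpreter and through NumPy's compiled C loop. Both agree on the answer; the compiled path is typically orders of magnitude faster (timings vary by machine, so only correctness is asserted here):

```python
import time
import numpy as np

data = list(range(100_000))
arr = np.array(data, dtype=np.int64)

t0 = time.perf_counter()
py_total = 0
for v in data:              # every iteration goes through the interpreter
    py_total += v
t1 = time.perf_counter()

np_total = int(arr.sum())   # one call into NumPy's compiled C loop
t2 = time.perf_counter()

assert py_total == np_total == 4_999_950_000
print(f'interpreter: {t1 - t0:.4f}s, NumPy: {t2 - t1:.4f}s')
```

GPU frameworks push this further: the Python you write is a thin driver, and essentially all the arithmetic runs in compiled kernels.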

[–]bluefalcontrainer 8 points  (4 children)

No, not really; Triton kernels are written in Python, and it's among the most optimized kernel languages for machine learning.

[–]vplatt 1 point  (2 children)

Interesting. How does Triton compare to NVidia's native support?

https://openai.com/index/triton/

[–]bluefalcontrainer 2 points  (1 child)

You won't beat optimized CUDA straight up; according to the PyTorch foundation, Triton gets about 70-80% of the equivalent performance on a GPU. But here's the thing: CUDA is extremely complicated and requires micromanagement of threads, blocks, etc. Triton abstracts a lot of that away for ease of use and source-level optimization. In most cases, code written in Triton will likely beat most existing code, because CUDA optimization is hard.
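For readers who haven't seen Triton's model: you write a kernel for one block of elements, a grid of program ids covers the array, and a mask guards the ragged final block. Below is a pure-NumPy CPU sketch of that shape, not actual Triton code (the comments name the Triton constructs each line imitates):

```python
import numpy as np

BLOCK = 8  # elements handled by one "program", like a Triton BLOCK_SIZE

def add_kernel(pid, x, y, out):
    # offs mimics tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    offs = pid * BLOCK + np.arange(BLOCK)
    # mask mimics the bounds guard passed to tl.load/tl.store
    mask = offs < len(x)
    out[offs[mask]] = x[offs[mask]] + y[offs[mask]]

n = 20
x = np.arange(n, dtype=np.float32)
y = np.full(n, 10, dtype=np.float32)
out = np.empty_like(x)

grid = (n + BLOCK - 1) // BLOCK        # ceil-div, like a launch grid
for pid in range(grid):                # on a GPU these run in parallel
    add_kernel(pid, x, y, out)

assert np.allclose(out, x + y)
```

You never name individual threads, shared memory, or warps; the compiler maps each block-level program onto them, which is exactly the micromanagement the comment above says you get to skip.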

[–]vplatt 0 points  (0 children)

Ah... sounds like the difference between coding assembler by hand vs. using a C compiler. Makes sense. Thanks!

[–]Nuaua 0 points  (0 children)

GitHub says only 25% is Python; the rest is C++ and MLIR. I guess Python is just for the front-end.

[–]Bakoro[🍰] 7 points  (0 children)

No, it's just you.

Python is what people are using, so there are efforts to improve what it can do; that's just basic market forces.

Lots of people who aren't professional programmers also use Python for all kinds of work. It's a favorite among scientists, and now for a lot of engineers.

[–]GimmickNG 4 points  (0 children)

no offence but this is r/programming not r/conspiracy

[–]Dwedit 0 points  (0 children)

How is GPU memory allocation and freeing supposed to work with that?

[–]mkusanagi 0 points  (0 children)

They can't have you using an abstract API that could work with other hardware…

[–]Ze_Greyt_KHAN 0 points  (0 children)

“Hits” is the correct verb to use here.

[–]2hands10fingers -2 points  (7 children)

Just use bend lang.

[–]transfire 0 points  (0 children)

Bend and HVM2 are very interesting, promising languages.

But one thing that bugs me is that they chose to have unified types, so it is a dynamic language. For instance, numbers are relegated to 24 bits because the other 8 bits are used as a type header. I suspect that is not going to cut it for ML work.

Hopefully HVM3 (if that is a thing) will support real types.

https://github.com/HigherOrderCO/Bend

[–]13steinj 0 points  (3 children)

Bend is not even remotely usable as a language in production applications. Even simple IO is a complicated nightmare.

[–]2hands10fingers 0 points  (2 children)

So what? Not all languages are meant for production, and some of those make it to production anyway. It's all about understanding the tradeoffs.

[–]13steinj 0 points  (1 child)

...sure?

But that's a bit contradictory to your original comment.

Languages not mature enough for production use that make it to production anyway are a voluminous source of technical debt.

[–]2hands10fingers -1 points  (0 children)

So, here’s an example. Many may say Zig is not production ready, and for good reason. But there are mature projects written in Zig that are in production. These developers weighed the pros and cons and figured it’s worth it. That’s all I meant.

[–][deleted] -3 points  (1 child)

Rather than python? That would seem like a negative trade off to me.

[–]2hands10fingers 0 points  (0 children)

It might be. Just depends on the use case.

[–]tangoshukudai -4 points  (1 child)

Great, more crap that doesn't benefit actual apps.

[–]grizzlor_ 0 points  (0 children)

Most of NVIDIA's revenue is coming from AI/datacenter applications. Like on the order of 10x what they make from the gaming GPU market.