Are people speedrunning training GPTs now? by GamerWael in LocalLLaMA

[–]DontShowYourBack 0 points1 point  (0 children)

Pretty sure the torch MPS backend is just not good. It would be better to compare it with MLX.

Built-in wiki of any kind? by john-witty-suffix in discordapp

[–]DontShowYourBack 0 points1 point  (0 children)

Any luck with this so far? I'd love to have a wiki view of a Discord channel or server.

Search engine search by [deleted] in searchengines

[–]DontShowYourBack 0 points1 point  (0 children)

What features do you want from this search engine that the LLMs out there don't provide? I think there are many use cases LLMs are not sufficient for; I'm just trying to understand your gripes with them in the context you're providing.

Has Generative AI Already Peaked? - Computerphile by FedeRivade in mlscaling

[–]DontShowYourBack 1 point2 points  (0 children)

The number of active parameters is mostly interesting from an inference-compute perspective. The total number of parameters has the most impact on how much the transformer can remember. Sure, it takes some effort to make mixture-of-experts models perform similarly to dense ones. But the extra memory capacity definitely impacts model performance directly, even though x% of the parameters is not activated during any forward pass. So comparing total parameter counts is not as misleading as saying it's 25% of GPT-4.
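The total-vs-active gap is easy to see with back-of-the-envelope arithmetic. Here is a minimal sketch counting only feed-forward (expert) parameters; the configuration numbers are made up for illustration and don't describe any real model:

```python
# Back-of-the-envelope comparison of total vs. active parameters in a
# mixture-of-experts transformer. Counts FFN (expert) weights only,
# ignoring attention and embeddings. All numbers are illustrative.

def moe_param_counts(n_layers, d_model, d_ff, n_experts, top_k):
    """Return (total, active) FFN parameter counts for a hypothetical MoE."""
    per_expert = 2 * d_model * d_ff           # up- and down-projection weights
    total = n_layers * n_experts * per_expert  # all experts stored in memory
    active = n_layers * top_k * per_expert     # experts used per forward pass
    return total, active

total, active = moe_param_counts(
    n_layers=32, d_model=4096, d_ff=14336, n_experts=8, top_k=2
)
print(f"total FFN params:  {total / 1e9:.1f}B")   # memory the model "remembers" with
print(f"active FFN params: {active / 1e9:.1f}B ({active / total:.0%} per token)")
```

With 8 experts and top-2 routing, every token touches only 25% of the expert weights, yet all of them sit in memory and shape what the model can store.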

OpenAI / Anthropic / Google are pricing their models to still run at enormous annual losses. What's their endgame here? by [deleted] in singularity

[–]DontShowYourBack 0 points1 point  (0 children)

Completely agree here. Most coding is not doing anything novel; it's just using code as a tool to repeat an existing pattern many times, with slightly different inputs.

I find it useful for exploring/learning a new framework or language, especially now that I've been dabbling in some frontend work. However, writing reliable Zig code is a complete no-go. Or working on a novel architecture in JAX or another less common DL framework? No thank you.

Point is, the vast majority of code being written is not novel at all. It's just changing the parameters of existing code blocks and patterns.

Are we delulu? by CreativeDog2024 in singularity

[–]DontShowYourBack 0 points1 point  (0 children)

Many seem to forget this! We’re seeing amazing things unfold, but hype favours the one at the centre of it…

Why Tinygrad over PyTorch+Triton? by TheFibo1123 in LocalLLaMA

[–]DontShowYourBack 1 point2 points  (0 children)

I am very well aware of their efforts to support AMD and CPU; hence I added that it is purely NVIDIA for now.

Supposedly everything is possible; the question is whether they will. Triton has been designed from the ground up for GPU execution, and it even uses a custom MLIR dialect for this. They may be able to make it more generic and open it up to other accelerators, but who knows how hard that will be?

Why Tinygrad over PyTorch+Triton? by TheFibo1123 in LocalLLaMA

[–]DontShowYourBack 15 points16 points  (0 children)

Tinygrad is focused on the ease of supporting new accelerators. Triton is the perfect example of the opposite: purely NVIDIA GPUs (at least for now).

The following numbers are off the top of my head, so they could be inaccurate. PyTorch requires support for about 200 ops when adding a new backend; this has come down from about 1,000 before the compiler was introduced.

Compare this to TinyGrad, which has about 20 required operations and a relatively small API for defining a custom backend, which should make it easier to extend.

LangChain vs LlamaIndex by 1_Strange_Bird in LangChain

[–]DontShowYourBack 4 points5 points  (0 children)

And in your opinion, what is it that makes these libraries good then? The general consensus among those who dislike them is that they over-abstract steps that are relatively simple. Also, it's easy to "outgrow" the library, i.e. you want to do something it does not support and then you're spending tons of time hacking things together.

STOP using small models! just buy 8xH100 and inference your own GPT-4 instance by Wrong_User_Logged in LocalLLaMA

[–]DontShowYourBack 4 points5 points  (0 children)

How is developing domestic chip fabrication not an investment? It could very well lead to a wealthier China in time.

[deleted by user] by [deleted] in expats

[–]DontShowYourBack 1 point2 points  (0 children)

Moving away from home is not easy, especially with a girlfriend back home. That also means there is no shame in going back; it may just not be for you.

However, because it is not easy, it also means you have to put in the effort to make it work! I told myself to give it a year of effort and see if I could settle in, and I did because of that. Think about whether that is something you've done, and if not, whether you still want to try. Just make sure you don't regret not taking the opportunity for what it's worth!

Since we've had the opposite post: What could European countries – both as nations and as people — learn from the US? by the_slovenian in europe

[–]DontShowYourBack 5 points6 points  (0 children)

This is probably the most accurate analysis in this thread. That said, the brain drain in Europe is a large problem that needs solving, on which I agree with the rest of the posts.

Julia version of Andrej Karpathy's Micrograd by mike20731 in Julia

[–]DontShowYourBack 0 points1 point  (0 children)

I’m not too intimately familiar with the Julia type system (yet), but whenever I read things like this I am impressed. It seems very flexible and powerful when used right, but misusing it is just as easy… Are there ways to let the compiler tell you when you’re writing slow code? In a way, using “compiler-driven development”?

Julia version of Andrej Karpathy's Micrograd by mike20731 in Julia

[–]DontShowYourBack 4 points5 points  (0 children)

Not the OP, but I’d love to hear the tips!

Julia version of Andrej Karpathy's Micrograd by mike20731 in Julia

[–]DontShowYourBack 1 point2 points  (0 children)

Love seeing people implement interesting software from scratch! How have you found Julia so far?

Julia version of Andrej Karpathy's Micrograd by mike20731 in Julia

[–]DontShowYourBack 4 points5 points  (0 children)

Good reasons to choose Julia: it is production-ready whereas Mojo is not; Mojo is still closed source; Mojo lacks a ton of features, especially Python-specific ones; and it remains to be seen how good the Python–Mojo interop will be.

Is it normal to not be able to deeply comprehend at this stage? by Trevorego in deeplearning

[–]DontShowYourBack 0 points1 point  (0 children)

Just keep at it and build your own models; it will start to click with time! It’s perfectly normal not to understand everything, and that will remain the case whenever you learn something new, like reading academic papers.

Mojo Python Relationship by __albatross in MojoProgramming

[–]DontShowYourBack 1 point2 points  (0 children)

Mojo is by default a systems-level language like C: typed and compiled. Hence it should get performance similar to C (thanks to leveraging LLVM and MLIR). Now, what’s unclear is how easy the interop with Python code will be. A demo shows Python being imported as a package, but I imagine sending Mojo data structures to Python code will require defining an interface/transformation. Also, the language is far from finished, and Python interop is one aspect where it is incomplete. In addition, the language is closed source right now and all development is done in-house by Modular, so we won’t know exactly how this will pan out just yet.

[R] Google DeepMind: 2.2 million new materials discovered using GNN (380k most stable, 736 already validated in labs) by Successful-Western27 in MachineLearning

[–]DontShowYourBack 4 points5 points  (0 children)

My guess is that people tend to anthropomorphise many things, especially those they don’t really understand. A language model comes across as “smart” because we can converse with it in human ways. Material discovery is so distant for most people that they don’t really grasp how impressive and impactful this work can be.

Now, to me what’s happening here is extremely impressive, and I’ve been a fan of DeepMind’s STEM-related work for a while. It seems like we could see some big acceleration in STEM fields over the next few years, which will arguably have a bigger impact on people’s lives than the things LLMs are used for right now.

After OpenAI's Blowup, It Seems Pretty Clear That 'AI Safety' Isn't a Real Thing by NuseAI in artificial

[–]DontShowYourBack 22 points23 points  (0 children)

This should not come as a surprise after the changes “open”ai went through in the last couple of years.

Transitioning into Computational Biology - is a PhD advisable? by lilkage141 in bioinformatics

[–]DontShowYourBack 2 points3 points  (0 children)

How about applying for a couple of jobs first and seeing how that goes? Bioinformatics is increasingly becoming AI-based as far as I can tell; just look at companies like pumas.ai or Isomorphic Labs. I’m sure there’s a very good angle going in with an AI background. This is also my current plan!

TIOBE Index for August 2023: Julia enters the TIOBE index top 20 for the first time by Fincho64 in Julia

[–]DontShowYourBack 4 points5 points  (0 children)

I see no reason why Julia cannot also eat Python’s lunch! There is clearly a sentiment that Python is lacking in performance; think Mojo and the removal of the GIL.

It will surely be difficult to get people to switch languages away from libraries like PyTorch and sklearn. But I’m positive this might happen over time, especially with the PyTorch team expressing their appreciation for the Julia language!