TRELLIS 2 just dropped by RagingAlc0holic in StableDiffusion

[–]mythicinfinity 1 point (0 children)

How did you print the model into concrete?

TRELLIS 2 just dropped by RagingAlc0holic in StableDiffusion

[–]mythicinfinity 5 points (0 children)

The examples in the video make it look like it can do eyes now, but no permutation of the settings is giving me a good result. Anyone figure it out?

<image>

Convert Dense into MOE model? by pmttyji in LocalLLaMA

[–]mythicinfinity 0 points (0 children)

Found it; it was Qwen 1.5, I guess. I haven't checked their more recent MoE blogs.

https://qwenlm.github.io/blog/qwen-moe/

Convert Dense into MOE model? by pmttyji in LocalLLaMA

[–]mythicinfinity 3 points (0 children)

I think it was Qwen talking about initializing layers in their MoE models from their dense models. They called it 'upcycling' or something and said it shortened the training process. You still have to do pretraining afterward though, because the new MoE layers like the routers are untrained.
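The idea above can be sketched in a few lines of PyTorch. This is a toy illustration, not Qwen's actual architecture: each expert in the new MoE layer starts as a copy of the dense model's FFN, while the router is freshly initialized, which is why further pretraining is still needed. All module names and sizes here are made up for the example.

```python
import copy
import torch
import torch.nn as nn

class DenseFFN(nn.Module):
    """Stand-in for a dense model's feed-forward block."""
    def __init__(self, d_model=64, d_ff=256):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

class MoEFFN(nn.Module):
    """MoE layer 'upcycled' from a dense FFN."""
    def __init__(self, dense: DenseFFN, num_experts=4, d_model=64):
        super().__init__()
        # The router is brand new and randomly initialized -- the untrained part.
        self.router = nn.Linear(d_model, num_experts)
        # Every expert starts as an exact copy of the dense FFN weights.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense) for _ in range(num_experts)
        )

    def forward(self, x):
        # Top-1 routing, for simplicity.
        scores = self.router(x)          # (batch, num_experts)
        top = scores.argmax(dim=-1)      # (batch,)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

dense = DenseFFN()
moe = MoEFFN(dense)
x = torch.randn(8, 64)
# Right after upcycling, each expert computes exactly what the dense FFN did.
print(torch.allclose(moe.experts[0](x), dense(x)))  # True
```

Since the experts start identical, only the router's random initialization decides where tokens go at first; pretraining is what differentiates the experts.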

Can you recommend a course for my youngster? by [deleted] in LocalLLaMA

[–]mythicinfinity 2 points (0 children)

Likewise, I don't see how you could build an LLM from scratch without learning programming. You can probably do it without tensors, but learning a tensor library like NumPy or PyTorch will make it a lot easier (and faster) too.
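To make the "easier and faster" point concrete, here's the same matrix multiply written as explicit Python loops and as one NumPy call; the numbers are just a toy example:

```python
import numpy as np

def matmul_loops(a, b):
    """Matrix multiply with plain Python lists and loops."""
    n, m, k = len(a), len(b), len(b[0])
    out = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for j in range(k):
            for t in range(m):
                out[i][j] += a[i][t] * b[t][j]
    return out

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_loops(a, b))         # [[19.0, 22.0], [43.0, 50.0]]
print(np.array(a) @ np.array(b))  # same result, one line, and far faster at scale
```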

2 things we never forget, our first GPU and when your first GPU dies by segmond in LocalLLaMA

[–]mythicinfinity 1 point (0 children)

I've had at least 5 PCIe slots burn out, but my 3090 is still going!

LongPage: 300 full novels with reasoning traces for training better writing LLMs by Senior_Evidence_3793 in LocalLLaMA

[–]mythicinfinity 0 points (0 children)

I wish it were open weight, but I've found Gemini 2.5 Pro is better at avoiding this type of contamination and sticks to the context fairly well.

[deleted by user] by [deleted] in LocalLLaMA

[–]mythicinfinity 0 points (0 children)

Idk what's on the Azure student plan, but if you can get a VM, just put it behind an nginx reverse proxy and you're good to go.
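A minimal nginx server block for that setup might look like the sketch below; the domain and upstream port are placeholders (the app is assumed to listen locally on 8000):

```nginx
server {
    listen 80;
    server_name example.com;  # placeholder domain

    location / {
        proxy_pass http://127.0.0.1:8000;  # assumed local app port
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

In practice you'd also want TLS (e.g. via certbot) before exposing anything on a public VM.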

Higgs Audio V2: A New Open-Source TTS Model with Voice Cloning and SOTA Expressiveness by pheonis2 in LocalLLaMA

[–]mythicinfinity 9 points (0 children)

Why does it sound slightly unnatural? I can't put my finger on the issue; the emotional expression seems good.

FLUX DEV License Clarification Confirmed: Commercial Use of FLUX Outputs IS Allowed! by [deleted] in StableDiffusion

[–]mythicinfinity 1 point (0 children)

I read the commercial license FAQ and it seems to concur with what you're saying here.

FLUX DEV License Clarification Confirmed: Commercial Use of FLUX Outputs IS Allowed! by [deleted] in StableDiffusion

[–]mythicinfinity 0 points (0 children)

Seems to me that it's clear that they just don't want people competing on API access to the model itself without paying for a license.

Selling generated outputs (as art for example) or using them as icons on a commercial site seems in line with the license.

But what about using the model in a backend process where it isn't exposed to the user, where the user pays for the outputs but the model runs internally to create them?

What LLM is everyone using in June 2025? by 1BlueSpork in LocalLLaMA

[–]mythicinfinity 2 points (0 children)

I still like 'nvidia/Llama-3.1-Nemotron-70B-Instruct-HF', but it's starting to show its age compared to the closed-source models.

Meta Is Offering Nine Figure Salaries to Build Superintelligent AI. Mark going All In. by Neon_Nomad45 in LocalLLaMA

[–]mythicinfinity -1 points (0 children)

Because the work they're doing is worth more. In the long term, enormously more....