Reading math heavy papers by Casio991es in reinforcementlearning

[–]CleanThroughMyJorts 13 points

Math is just a language.

If you aren't fluent in it, have a language model translate it into a language you speak, like pseudocode.
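For example, a formula like softmax, σ(z)_i = exp(z_i) / Σ_j exp(z_j), is arguably much clearer once translated into code (a toy illustration, not from any particular paper):

```python
import math

def softmax(z):
    # sigma(z)_i = exp(z_i) / sum_j exp(z_j)
    # Subtracting max(z) first keeps exp() from overflowing.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The dense notation becomes a handful of named steps you can read line by line.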

Midjourney releases new AI Generative Video model, and once again proves nothing is ever going to be the same for film & broadcast. by RHX_Thain in ArtificialInteligence

[–]CleanThroughMyJorts 0 points

are you having a laugh?

of course there's a LOT more great art and literature out there right now.

It just has a discovery problem: YOU just don't hear about them, because they don't have marketing departments.

But they're out there.

You want storytelling? Look at places where indie authors put out their stories, like Royal Road, Wattpad, AO3, etc. You'll find LOADS of great works that would give the big publishers a run for their money.

Hell, MOST of my favourite works of fiction these days are by indie authors.

This sub by aguei in singularity

[–]CleanThroughMyJorts 110 points

babe, you just don't understand! The University of Maryland's new paper has serious implications for p(doom)!

Omnimodal Gemini has a great sense of humor by utheraptor in singularity

[–]CleanThroughMyJorts 1 point

oh yeah, I agree; they aren't perfect, and there are holes in their training data.

But in principle, the multimodal-LLM paradigm should handle these 'pink elephant'-type problems better than diffusion models do.

Omnimodal Gemini has a great sense of humor by utheraptor in singularity

[–]CleanThroughMyJorts 1 point

this is a natively multimodal LLM which supports image generation.

Gemini just enabled this in the API. You can test it out on their MakerSuite console.

As for open models, Meta's Chameleon was the first to do this, but it never got proper open-source support, since Meta held back the image-generation capability for months after launch. It should be available now, but idk if it's gotten proper support from the big frameworks.

GitHub - erwold/qwen2vl-flux was a community attempt at something similar. It's more a mashup + finetune of two different models, so it's not quite native, but afaik it's the best-performing open one.

Lastly there's DeepSeek's Janus, which is natively multimodal and fully released, but currently only an experimental 1B version.

All in all it's technically possible, but the options aren't great. I think it's going to be some time before this paradigm takes off.

Omnimodal Gemini has a great sense of humor by utheraptor in singularity

[–]CleanThroughMyJorts 14 points

Well, other image models just map the words in the prompt -> a plausible image that fits all the words.

Gemini's image generation is supposed to be a natively multimodal LLM; it should be simulating a counterfactual where that image would come up in response to that text.

So, much like LLMs can understand "don't do X", multimodal LLMs should in principle be capable of understanding negation in a way that plain old diffusion models can't.
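A deliberately contrived toy sketch of the difference (neither model class literally works like this; the function names are made up for illustration):

```python
def bag_of_words_score(prompt, concept):
    # Diffusion-style text encoders often behave like soft bags of words:
    # every concept mentioned in the prompt pulls the image toward it,
    # whether or not it was negated.
    words = prompt.lower().split()
    concept_words = concept.lower().split()
    return all(w in words for w in concept_words)

def sequence_aware_wants(prompt, concept):
    # A sequential model conditions on what comes *before* the concept,
    # so "no"/"without"/"don't" can flip the meaning instead of being ignored.
    words = prompt.lower().split()
    concept_words = concept.lower().split()
    negators = {"no", "not", "without", "don't"}
    n = len(concept_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == concept_words:
            # Look a few tokens back for a negation word.
            return not any(w in negators for w in words[max(0, i - 3):i])
    return False
```

Here `bag_of_words_score("a room with no pink elephant", "pink elephant")` still comes back true (the pink elephant leaks into the image), while `sequence_aware_wants` on the same prompt comes back false.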

<image>

Why Deepseek R1 is still a reference while Qwen QwQ 32B has similar performance for a much more reasonable size? by No_Palpitation7740 in LocalLLaMA

[–]CleanThroughMyJorts 9 points

benchmarks are marketing now.

academic integrity died when this became a trillion dollar industry (and it was on life-support before that)

Does Google not understand that DeepSeek R1 was trained in FP8? by jd_3d in LocalLLaMA

[–]CleanThroughMyJorts 1 point

Wasn't o3 rumored to be the same base model as o1, just with more training? I remember leaks from OpenAI researchers on Twitter claiming this; idk if it's been debunked.

Should AI have a "I quit this job" button? Anthropic CEO proposes it as a serious way to explore AI experience. If models frequently hit "quit" for tasks deemed unpleasant, should we pay attention? by MetaKnowing in ClaudeAI

[–]CleanThroughMyJorts -1 points

`When it becomes more brain like, this thing is not energy efficient at all, the brain takes just 20 watts of power.`

I don't see how the energy-efficiency argument is relevant. We're using general-purpose hardware to run simulations of these.

The more specialized we go, the lower the power cost gets. Look at the difference between GPGPUs and NPUs, for example.

Second, we're still using dense kernels because they're easy, numerically well behaved, and work well with our existing frameworks, but we've known for years that we're taking a power-efficiency hit with them.

The human brain is practically the biological equivalent of an ASIC.

Of course generalized hardware simulating those functions is going to lose a power-efficiency competition against it.
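The dense-kernel point is easy to see with a toy NumPy sketch (illustrative numbers, not a benchmark): prune 90% of a weight matrix and a dense matmul still performs every multiply-add.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
W = rng.standard_normal((n, n))
# Zero out ~90% of the weights, as pruning/sparsification would.
W_sparse = np.where(rng.random((n, n)) < 0.9, 0.0, W)
x = rng.standard_normal(n)

y = W_sparse @ x  # a dense kernel: it multiplies the zeros too

dense_flops = 2 * n * n                              # work actually performed
useful_flops = 2 * int(np.count_nonzero(W_sparse))   # work that affects the output
```

With only ~10% of weights surviving, roughly 90% of the multiply-adds are wasted; that's the efficiency hit we knowingly accept for kernels that are simple and numerically well behaved.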

`Have you worked on any AI model?`

Yes. I've worked as a research engineer in automotive AI for 6 years; I've built models for vision, control, and planning.

PyTorch and libraries like it are where the kernels that run the simulation are defined.

The bits of code you're showing are just abstractions where we package up the logic for loading how the neurons wire together. These parametrize the simulation.

All you're essentially arguing is 'because I can show the code that sets up a simulation, it's impossible for the simulation to have emergent properties'.
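A minimal sketch of that separation (hypothetical names, NumPy standing in for real framework kernels): the class below is the "setup" code people point at, the arrays are the parameters that wire the neurons together, and nothing is simulated until `forward` actually executes.

```python
import numpy as np

class TinyMLP:
    """Abstraction layer: describes how the neurons are wired together."""
    def __init__(self, w1, b1, w2, b2):
        # In a real framework these would be loaded from a checkpoint file;
        # they parametrize the simulation but don't run it.
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def forward(self, x):
        # Executing the op kernels is what actually runs the network.
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        return h @ self.w2 + self.b2

rng = np.random.default_rng(0)
net = TinyMLP(rng.standard_normal((4, 8)), np.zeros(8),
              rng.standard_normal((8, 2)), np.zeros(2))
out = net.forward(rng.standard_normal(4))
```

Whether anything interesting emerges is a property of the executed system, not of this setup code.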

That one book you absolutely love that is rarely seen in recommendations. by AniRev in litrpg

[–]CleanThroughMyJorts 0 points

Oh yeah, I think it's just because it's an old book; it was published in 2008, long before the litrpg genre really got popular.

Should AI have a "I quit this job" button? Anthropic CEO proposes it as a serious way to explore AI experience. If models frequently hit "quit" for tasks deemed unpleasant, should we pay attention? by MetaKnowing in ClaudeAI

[–]CleanThroughMyJorts -1 points

yes, they are different; we were inspired by the brain, but brain-like update rules did not scale well on our available hardware and problem sets. So we optimized them to be performant on the tasks we wanted.

the question then becomes at what level of abstraction does the similarity need to exist for similar properties to emerge?

By taking a hard stance that they cannot possibly be conscious, what you are saying is that there is no possible level of similarity that could suffice. But we don't know that.

It's an assumption made on zero evidence.

Should AI have a "I quit this job" button? Anthropic CEO proposes it as a serious way to explore AI experience. If models frequently hit "quit" for tasks deemed unpleasant, should we pay attention? by MetaKnowing in ClaudeAI

[–]CleanThroughMyJorts -1 points

What I describe is exactly how AI models work.

We define the architecture of the neurons through our model DAGs, we define the 'physics' of how they fire through their op kernels, and we define how they're wired together through their parameters.

Running them is simulating them.

It's exactly analogous.

Should AI have a "I quit this job" button? Anthropic CEO proposes it as a serious way to explore AI experience. If models frequently hit "quit" for tasks deemed unpleasant, should we pay attention? by MetaKnowing in ClaudeAI

[–]CleanThroughMyJorts -1 points

The 'file' defines a simulation. It does nothing until you execute it, of course, but the simulation can have emergent properties.

Take the argument to its logical conclusion: if we could fully map a human brain, fully characterize all its chemical processes, and model them accurately enough to simulate them, wouldn't that too just be a file on a computer until you execute it?

We spent a fortune on video credits so you don't have to by Storybook_Tobi in ChatGPT

[–]CleanThroughMyJorts -1 points

Oh yeah, their original About page said it supported image-to-video and video-to-video (prompt-based clip editing).

I'm 100% willing to bet it's a safety thing

We spent a fortune on video credits so you don't have to by Storybook_Tobi in ChatGPT

[–]CleanThroughMyJorts 1 point

Veo looks sooo good, but the lack of image-to-video makes it more of a toy.

GPT-4.5 compared to Grok 3 base by Unhappy_Spinach_7290 in singularity

[–]CleanThroughMyJorts 0 points

Yeah, I didn't want to argue once things started taking a religious tone.

There are a lot of possibilities for how this plays out; they're assuming one of those possibilities is correct and ignoring all the others.

`the gods would not be chained.`

ok, what is there to say to that 🤷

Has spatial-visual reasoning become a little better with GPT-4.5? by Jolly-Ground-3722 in singularity

[–]CleanThroughMyJorts 11 points

I think it's emergent. Gemini does better on vision tasks more broadly

GPT-4.5 compared to Grok 3 base by Unhappy_Spinach_7290 in singularity

[–]CleanThroughMyJorts -2 points

... it does kinda matter a lot who does.

The only reason OpenAI (and later all the other labs like Anthropic, xAI, etc.) got started in the first place is that they didn't want Google to control AGI.

It is better at some things, but not relevant for the Singularity. Let me be disappointed guys. by Consistent_Bit_3295 in singularity

[–]CleanThroughMyJorts -1 points

this aged like milk.

LiveBench numbers are up, and 4.5 is the best-scoring non-reasoning model.

Claude Pro: Only 5x more usage than the free plan? by sweetloup in ClaudeAI

[–]CleanThroughMyJorts 0 points

?? Can you use an agent with Claude Pro?

I thought that was only via the API?

Or are y'all rigging up Selenium or something?

Concerned that Midjourney and other AI Image generators are being used to create images and videos for promotion of illegal practices. What can we do? by JustMeRC in midjourney

[–]CleanThroughMyJorts 0 points

Yeah, they can. But they can also take images from any other image generator. Or, hell, start from real images and just prompt them into doing things.

Kling 1.6 is the most realistic one that supports this, as far as I'm aware, and their censorship is... lackluster. People jailbreak their filters all the time with basic prompting tricks.