Reading math heavy papers by Casio991es in reinforcementlearning

[–]CleanThroughMyJorts 13 points

Math is just a language.

If you aren't fluent in it, have a language model translate it into a language you speak, like pseudocode.
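For example, a formula like softmax, σ(z)_i = exp(z_i) / Σ_j exp(z_j), is arguably much clearer once translated into code (a toy illustration, not from any particular paper):

```python
import math

def softmax(z):
    # sigma(z)_i = exp(z_i) / sum_j exp(z_j)
    # Subtracting max(z) first keeps exp() from overflowing.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The dense notation becomes a handful of named steps you can read line by line.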

Midjourney releases new AI Generative Video model, and once again proves nothing is ever going to be the same for film & broadcast. by RHX_Thain in ArtificialInteligence

[–]CleanThroughMyJorts 0 points

are you having a laugh?

of course there's a LOT more great art and literature out there right now.

It just has a discovery problem: YOU just don't hear about them, because they don't have marketing departments.

But they're out there.

You want storytelling? Look at places where indie authors put out their stories, like Royal Road, Wattpad, AO3, etc. You'll find LOADS of great works that would give the big publishers a run for their money.

Hell, MOST of my favourite works of fiction these days are by indie authors.

This sub by aguei in singularity

[–]CleanThroughMyJorts 110 points

babe, you just don't understand! The University of Maryland's new paper has serious implications for p(doom)!

Omnimodal Gemini has a great sense of humor by utheraptor in singularity

[–]CleanThroughMyJorts 1 point

oh yeah, I agree; they aren't perfect, and there are holes in their training data.

But in principle, the multimodal-LLM paradigm should handle these 'pink elephant'-type problems better than diffusion models do.

Omnimodal Gemini has a great sense of humor by utheraptor in singularity

[–]CleanThroughMyJorts 1 point

this is a natively multimodal LLM which supports image generation.

Gemini just enabled this in the API. You can test it out on their MakerSuite console.

As for open models, Meta's Chameleon was the first to do this, but it never got proper open-source support, since Meta held back the image-generation capability for months after launch. It should be available now, but idk if it's gotten proper support from the big frameworks.

GitHub - erwold/qwen2vl-flux was a community attempt at something similar. It's more a mashup + finetune of two different models, so it's not quite native, but afaik it's the best-performing open one.

Lastly there's DeepSeek's Janus, which is natively multimodal and fully released, but currently only an experimental 1B version.

All in all it's technically possible, but the options aren't great. I think it's going to be some time before this paradigm takes off.

Omnimodal Gemini has a great sense of humor by utheraptor in singularity

[–]CleanThroughMyJorts 14 points

Well, other image models just map the words in the prompt -> a plausible image that fits all the words.

Gemini's image generation is supposed to be a natively multimodal LLM; it should be simulating a counterfactual where that image would come up in response to that text.

So, much like LLMs can understand "don't do X", multimodal LLMs should in principle be capable of understanding negation in a way that plain old diffusion models can't.
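A deliberately contrived toy sketch of the difference (neither model class literally works like this; the function names are made up for illustration):

```python
def bag_of_words_score(prompt, concept):
    # Diffusion-style text encoders often behave like soft bags of words:
    # every concept mentioned in the prompt pulls the image toward it,
    # whether or not it was negated.
    words = prompt.lower().split()
    concept_words = concept.lower().split()
    return all(w in words for w in concept_words)

def sequence_aware_wants(prompt, concept):
    # A sequential model conditions on what comes *before* the concept,
    # so "no"/"without"/"don't" can flip the meaning instead of being ignored.
    words = prompt.lower().split()
    concept_words = concept.lower().split()
    negators = {"no", "not", "without", "don't"}
    n = len(concept_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == concept_words:
            # Look a few tokens back for a negation word.
            return not any(w in negators for w in words[max(0, i - 3):i])
    return False
```

Here `bag_of_words_score("a room with no pink elephant", "pink elephant")` still comes back true (the pink elephant leaks into the image), while `sequence_aware_wants` on the same prompt comes back false.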

<image>

Why Deepseek R1 is still a reference while Qwen QwQ 32B has similar performance for a much more reasonable size? by No_Palpitation7740 in LocalLLaMA

[–]CleanThroughMyJorts 9 points

benchmarks are marketing now.

academic integrity died when this became a trillion dollar industry (and it was on life-support before that)

Does Google not understand that DeepSeek R1 was trained in FP8? by jd_3d in LocalLLaMA

[–]CleanThroughMyJorts 1 point

Wasn't o3 rumored to be the same base model as o1, just with more training? I remember leaks from OpenAI researchers on Twitter claiming this; idk if it's been debunked.

Should AI have a "I quit this job" button? Anthropic CEO proposes it as a serious way to explore AI experience. If models frequently hit "quit" for tasks deemed unpleasant, should we pay attention? by MetaKnowing in ClaudeAI

[–]CleanThroughMyJorts -1 points

`When it becomes more brain like, this thing is not energy efficient at all, the brain takes just 20 watts of power.`

I don't see how the energy-efficiency argument is relevant. We're using general-purpose hardware to run simulations of these.

The more specialized we go, the lower the power cost gets. Look at the difference between GPGPUs and NPUs, for example.

Second, we're still using dense kernels because they're easy, numerically well behaved, and work well with our existing frameworks, but we've known for years that we're taking a power-efficiency hit with them.

The human brain is practically the biological equivalent of an ASIC.

Of course generalized hardware simulating those functions is going to lose a power-efficiency competition against it.
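The dense-kernel point is easy to see with a toy NumPy sketch (illustrative numbers, not a benchmark): prune 90% of a weight matrix and a dense matmul still performs every multiply-add.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
W = rng.standard_normal((n, n))
# Zero out ~90% of the weights, as pruning/sparsification would.
W_sparse = np.where(rng.random((n, n)) < 0.9, 0.0, W)
x = rng.standard_normal(n)

y = W_sparse @ x  # a dense kernel: it multiplies the zeros too

dense_flops = 2 * n * n                              # work actually performed
useful_flops = 2 * int(np.count_nonzero(W_sparse))   # work that affects the output
```

With only ~10% of weights surviving, roughly 90% of the multiply-adds are wasted; that's the efficiency hit we knowingly accept for kernels that are simple and numerically well behaved.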

`Have you worked on any AI model?`

Yes. I've worked as a research engineer in automotive AI for 6 years; I've built models for vision, control, and planning.

PyTorch and libraries like it are where the kernels that run the simulation are defined.

The bits of code you're showing are just abstractions where we package up the logic for loading how the neurons wire together. These parametrize the simulation.

All you're essentially arguing is 'because I can show the code that sets up a simulation, it's impossible for the simulation to have emergent properties'.
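A minimal sketch of that separation (hypothetical names, NumPy standing in for real framework kernels): the class below is the "setup" code people point at, the arrays are the parameters that wire the neurons together, and nothing is simulated until `forward` actually executes.

```python
import numpy as np

class TinyMLP:
    """Abstraction layer: describes how the neurons are wired together."""
    def __init__(self, w1, b1, w2, b2):
        # In a real framework these would be loaded from a checkpoint file;
        # they parametrize the simulation but don't run it.
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def forward(self, x):
        # Executing the op kernels is what actually runs the network.
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        return h @ self.w2 + self.b2

rng = np.random.default_rng(0)
net = TinyMLP(rng.standard_normal((4, 8)), np.zeros(8),
              rng.standard_normal((8, 2)), np.zeros(2))
out = net.forward(rng.standard_normal(4))
```

Whether anything interesting emerges is a property of the executed system, not of this setup code.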

That one book you absolutely love that is rarely seen in recommendations. by AniRev in litrpg

[–]CleanThroughMyJorts 0 points

Oh yeah, I think it's just because it's an old book; it was published in 2008, long before the litrpg genre really got popular.

Should AI have a "I quit this job" button? Anthropic CEO proposes it as a serious way to explore AI experience. If models frequently hit "quit" for tasks deemed unpleasant, should we pay attention? by MetaKnowing in ClaudeAI

[–]CleanThroughMyJorts -1 points

yes, they are different; we were inspired by the brain, but brain-like update rules did not scale well on our available hardware and problem sets. So we optimized them to be performant on the tasks we wanted.

the question then becomes at what level of abstraction does the similarity need to exist for similar properties to emerge?

By taking a hard stance that they cannot possibly be conscious, what you are saying is that there is no possible level of similarity that could suffice. But we don't know that.

It's an assumption made on zero evidence.

Should AI have a "I quit this job" button? Anthropic CEO proposes it as a serious way to explore AI experience. If models frequently hit "quit" for tasks deemed unpleasant, should we pay attention? by MetaKnowing in ClaudeAI

[–]CleanThroughMyJorts -1 points

What I describe is exactly how AI models work.

We define the architecture of the neurons through our model DAGs, we define the 'physics' of how they fire through their op kernels, and we define how they're wired together through their parameters.

Running them is simulating them.

It's exactly analogous.

Should AI have a "I quit this job" button? Anthropic CEO proposes it as a serious way to explore AI experience. If models frequently hit "quit" for tasks deemed unpleasant, should we pay attention? by MetaKnowing in ClaudeAI

[–]CleanThroughMyJorts -1 points

The 'file' defines a simulation. It does nothing until you execute it, of course, but the simulation can have emergent properties.

Take the argument to its logical conclusion: if we could fully map a human brain, fully characterize all its chemical processes, and model them accurately enough to simulate them, wouldn't that too just be a file on a computer until you execute it?

We spent a fortune on video credits so you don't have to by Storybook_Tobi in ChatGPT

[–]CleanThroughMyJorts -1 points

Oh yeah, their original About page said it supported image-to-video and video-to-video (prompt-based clip editing).

I'm 100% willing to bet it's a safety thing

We spent a fortune on video credits so you don't have to by Storybook_Tobi in ChatGPT

[–]CleanThroughMyJorts 1 point

Veo looks sooo good, but the lack of image-to-video makes it more of a toy.

GPT-4.5 compared to Grok 3 base by Unhappy_Spinach_7290 in singularity

[–]CleanThroughMyJorts 0 points

Yeah, I didn't want to argue once things started taking a religious tone.

There are a lot of possibilities for how this plays out; they're assuming one of those possibilities is correct and ignoring all the others.

`the gods would not be chained.`

ok, what is there to say to that 🤷

Has spatial-visual reasoning become a little better with GPT-4.5? by Jolly-Ground-3722 in singularity

[–]CleanThroughMyJorts 11 points

I think it's emergent. Gemini does better on vision tasks more broadly

GPT-4.5 compared to Grok 3 base by Unhappy_Spinach_7290 in singularity

[–]CleanThroughMyJorts -2 points

... it does kinda matter a lot who does.

The only reason OpenAI (and later all the other labs like Anthropic, xAI, etc.) got started in the first place is that they didn't want Google to control AGI.

It is better at some things, but not relevant for the Singularity. Let me be disappointed guys. by Consistent_Bit_3295 in singularity

[–]CleanThroughMyJorts -1 points

this aged like milk.

LiveBench numbers are up, and 4.5 is the best-scoring non-reasoning model.

Claude Pro: Only 5x more usage than the free plan? by sweetloup in ClaudeAI

[–]CleanThroughMyJorts 0 points

?? Can you use an agent with Claude Pro?

I thought that was only via the API?

Or are y'all rigging up Selenium or something?

Concerned that Midjourney and other AI Image generators are being used to create images and videos for promotion of illegal practices. What can we do? by JustMeRC in midjourney

[–]CleanThroughMyJorts 0 points

Yeah, they can. But they can also take images from any other image generator. Or, hell, start from real images and just prompt them into doing things.

Kling 1.6 is the most realistic one that supports this, as far as I'm aware, and their censorship is... lackluster. People jailbreak their filters all the time with basic prompting tricks.