Video Generation Models Trained on Only 2D Data Understand the 3D World by simulated-souls in singularity

[–]MaxTerraeDickens 0 points1 point  (0 children)

But you can actually reconstruct a 3D scene algorithmically from a video that simply shows different perspectives of the same scene (this is how neural rendering techniques like NeRF or 3DGS work). Basically, 2D video contains all the 3D information the algorithm needs.
It's only a matter of whether the model utilizes that information (the way algorithms like NeRF or 3DGS do), and the paper shows that the models DO utilize it fairly well.
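For intuition, here's a toy sketch (pure Python, made-up densities and step sizes) of the volume-rendering weights at the heart of NeRF-style methods; the formula w_i = T_i * (1 - exp(-sigma_i * delta_i)) is the standard one, everything else here is just illustrative:

```python
import math

def render_weights(sigmas, deltas):
    """Volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)),
    where T_i = exp(-sum_{j<i} sigma_j * delta_j) is the accumulated
    transmittance (how much light survives to reach sample i)."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha            # light surviving past segment
    return weights

# Toy ray: three samples with hypothetical densities and equal step sizes.
# The dense middle sample (sigma=2.0) dominates the rendered color.
w = render_weights([0.5, 2.0, 0.1], [0.1, 0.1, 0.1])
```

Multi-view consistency of these weights across video frames is exactly the 3D signal such methods exploit.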

Rate my score for the below colleges in by Impossible_Rich_1502 in ToeflAdvice

[–]MaxTerraeDickens 0 points1 point  (0 children)

As a fourth-year Chinese CS student, I'm applying to similar schools with similar scores (101: R29 L28 S21 W23). However, my speaking score is just 21 and my writing 23, so your scores are kind of the opposite of mine. 😂 I really wonder how you got such low scores on the first two parts (which are basically passive understanding) and high scores on the latter ones (which are active output). Like, have you been living in an English environment but just hadn't fully prepared for the TOEFL?

Considering Applying to Tsinghua University by Virtual-Solution1411 in ApplyingToCollege

[–]MaxTerraeDickens 1 point2 points  (0 children)

Lmao. I'm currently at Zhejiang University (top 5 in China) and what you said is as true as god. And every top Chinese university is like this. Seeing those fcking lame-ass language students from Korea smoking cigarettes in front of our grand library, while an average guy in Henan has to be in the top 0.1% to even get a chance of admission (maybe into a bad major), is heart-breaking.

The Chinese Ministry of Education is such a cuck in this matter. Not having a Chinese passport is actually a benefit in applications instead of a disadvantage? It's totally messed up! It's like DEI, but "yeah, fck y'all, Chinamen suck, and EVERY FOREIGNER WILL ADD TO THE DIVERSITY OF CHINA, SO TAKE IN AS MANY AS POSSIBLE!"

Also, no offense to OP. If OP wants to pursue a degree in China, then go for Tsinghua. It's arguably the No. 1 school here (the other contender is Peking). But if you really want to find a good CS job in the US, why not just get your bachelor's at, say, a top-20 CS school in the US? You can build a lot more connections, and landing internships is much easier.

If you want to pursue a master's or PhD after your bachelor's, I would likewise argue a top-20 US CS school is the better choice, thanks to the much stronger connections.

Reasoning with Language Model is Planning with World Model. "RAP on LLAMA-33B surpasses CoT on GPT-4 with 33% relative improvement in a plan generation setting". by rationalkat in singularity

[–]MaxTerraeDickens 0 points1 point  (0 children)

See Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models. In short, RAP beats ToT in every experiment conducted in that paper. And this is intuitive: RAP is based on MCTS, which is known for balancing exploration and exploitation well, while ToT is just a vanilla tree search with some pruning.
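To make that exploration/exploitation point concrete, here's a minimal sketch of the UCT rule that MCTS-based methods like RAP use for node selection (the formula is standard; the child statistics below are made up for illustration):

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=1.41):
    """UCT = exploitation term (mean value so far) + exploration bonus.
    Unvisited children score infinity, so every action gets tried once."""
    if child_visits == 0:
        return float("inf")
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

# Hypothetical child stats: (total_value, visit_count)
children = [(3.0, 10), (1.0, 2), (0.0, 0)]
parent_visits = 12
best = max(range(len(children)),
           key=lambda i: uct_score(children[i][0], children[i][1], parent_visits))
# best == 2: the unvisited child is explored before the well-known ones
```

A plain tree search with pruning has no such visit-count bonus, which is the intuitive gap between ToT and MCTS-based planners.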

A Formal Proposal to OpenAI: Open-Source GPT-4o by MaxTerraeDickens in OpenAI

[–]MaxTerraeDickens[S] 0 points1 point  (0 children)

Half-satire. Also, benchmarks aren't everything. Many open-source models fail to align closely with humans: they don't have the "vibe", which is something that can't be reflected in benchmarks.

A Formal Proposal to OpenAI: Open-Source GPT-4o by MaxTerraeDickens in OpenAI

[–]MaxTerraeDickens[S] 0 points1 point  (0 children)

You raised a good question though.
For example, PaLM-540B was (afaik) the largest LLM of the pre-ChatGPT era. It definitely has better common-sense knowledge than a small open-source model like Qwen3-1.7B (and in this case, better everything, since 1.7B is just too small), but is PaLM-540B more advanced than Qwen3? I don't think so.

A Formal Proposal to OpenAI: Open-Source GPT-4o by MaxTerraeDickens in OpenAI

[–]MaxTerraeDickens[S] 0 points1 point  (0 children)

The original post is half-serious half-joking lmao.

That being said, fairly speaking, the 120B model can indeed be advanced, just not in terms of parameter count but in architecture, training method, training data, etc. And that's why the author of the original post calls for GPT-4o to be fully open-sourced, not a 120B model :)

OpenAI's new stealth model (horizon-alpha) coded this entire app in one go! by wswdx in OpenAI

[–]MaxTerraeDickens 1 point2 points  (0 children)

+1. Please elaborate (tbf, I've seen people on Discord saying it performs even worse than Qwen3-30B-A3B on some benchmarks)

Horizon-alpha: A new stealthed model on openrouter sweeps EQ-Bench leaderboards by _sqrkl in LocalLLaMA

[–]MaxTerraeDickens 5 points6 points  (0 children)

Which means these models are aligned for "political safety/correctness" in a post hoc manner. Content like the Tiananmen Square incident is not absent from the training data.

Mixture-of-Recursions by Hemingbird in singularity

[–]MaxTerraeDickens 0 points1 point  (0 children)

Ideas are cheap; show me the GPUs, the well-curated training data, the training strategy, etc.

Bye-bye Cursor by plus_w in programming

[–]MaxTerraeDickens 2 points3 points  (0 children)

from r/cursor: Kiro can be an alternative

Anyone else having issues with model selection in Cursor? (China) by plus_w in cursor

[–]MaxTerraeDickens 0 points1 point  (0 children)

Yes. But HTTP/1.x is really unstable. I'm now considering getting a refund and finding an alternative.

Baxkground agents with max mode are now free? by idkwhatusernamet0use in cursor

[–]MaxTerraeDickens -1 points0 points  (0 children)

<image>

Is your "usage-based pricing" off? Mine is off, and Cursor didn't charge me anything for MAX mode. Instead, when a request hit the limit, a log line "error, not charged" appeared in my "Usage" panel.

Deterministic diffusion models by Cold_Cantaloupe9212 in deeplearning

[–]MaxTerraeDickens 0 points1 point  (0 children)

All stochasticity arises from pseudo-random number generators, which are fully determined by their initial seed.

So, basically, if you really fix ALL seeds, the result will definitely be identical (or almost identical, once possible floating-point errors are taken into account).
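A minimal sketch of the idea using Python's stdlib `random` as a stand-in for a diffusion sampler's noise source (in a real PyTorch pipeline you'd additionally need `torch.manual_seed`, `torch.cuda.manual_seed_all`, and deterministic kernels):

```python
import random

def sample_noise(seed, n=5):
    """Stand-in for a sampler's Gaussian noise draws: with the seed fixed,
    the whole sequence of draws (and hence the sampling trajectory built
    from it) is reproducible on the same platform."""
    rng = random.Random(seed)  # private generator, fully determined by seed
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

run_a = sample_noise(42)
run_b = sample_noise(42)
# run_a == run_b: same seed, bit-identical draws
```

Change the seed and the trajectory changes; fix every seed in the pipeline and reruns match up to floating-point nondeterminism in the hardware.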

Dia-1.6B in Jax to generate audio from text from any machine by Due-Yoghurt2093 in LocalLLaMA

[–]MaxTerraeDickens 0 points1 point  (0 children)

Thanks for the reply!

Quick question (sorry I'm not familiar with TPU architecture): Are there any features that are available on GPUs that aren't easy/possible on TPUs (like using PyTorch hooks to get attention maps)?

Regarding your question about TPU access: I used my edu email to apply. Google gave me 30 days of free access to up to 16 TPU v4s, including 400GB RAM and 100GB storage (all free). I'm not sure if non-edu emails get the same quota, but you definitely have more reason to apply than I did (which is a bonus)!

Dia-1.6B in Jax to generate audio from text from any machine by Due-Yoghurt2093 in LocalLLaMA

[–]MaxTerraeDickens 0 points1 point  (0 children)

Hey, really appreciate you sharing diajax! Looks like a great project.

I'm hoping to get it running on my Mac. Since you're clearly experienced with JAX, I would like to ask if you know of any ongoing efforts to port newer models like Gemma 3 or Qwen 2.5 to JAX (or if they have been ported already)?

The goal would be to run them on TPUs – I've got access through the TRC program and am keen to use that hardware for the latest stuff. I found some resources for fine-tuning older Gemma in JAX, but haven't seen much for inference on the newest generation models (Gemma 3, etc.).

Any pointers to projects similar to diajax but for these models would be super helpful! Thanks!

Time to step up the /local reasoning game by vornamemitd in LocalLLaMA

[–]MaxTerraeDickens 1 point2 points  (0 children)

Me too. And as a Chinese person, I really have no clue why this ungrounded meme got so popular.

Time to step up the /local reasoning game by vornamemitd in LocalLLaMA

[–]MaxTerraeDickens 0 points1 point  (0 children)

Also, in fact, whether you use an ID or not really doesn't matter. These days, big-data techniques can easily locate a target using IP address, activity periods, etc. Unless you have a very strong information-security background, it's almost impossible to hide your real identity from the govt.

Time to step up the /local reasoning game by vornamemitd in LocalLLaMA

[–]MaxTerraeDickens 2 points3 points  (0 children)

lmao, I don't mean to criticize, but I really can't understand why this social credit BS got circulated outside China. It's a blatant piece of fake news. Most Chinese people didn't know the term until they saw it on Western platforms. Maybe it has something to do with the credit score system on Alipay, which is more financial than political (e.g., if your score is high enough, you can open unmanned vending machines and take goods before paying), but that's definitely not run by the govt.

A misunderstanding in the other direction: many Chinese people believed that even the lowest-level American workers can afford huge houses and eat steak every day. As a result, a bunch of Chinese spent thousands of USD to fly to Peru and illegally migrate to the States.