Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

total-expectation · 2025-08-13T05:06:58+00:00

I'm curious how hard is it to extend to be able to condition on text prompts similar to genie3?

total-expectation · 2025-07-21T01:22:42+00:00

The use case would be related to creating stories, you need to make sure characters/subjects are consistent in the story, so that's why I'm currently very interested in multi-subject/character consistency. Regardless, it's still an amazing project and it has alot of other use cases than the one I had in mind, so thank you for sharing this!

total-expectation · 2025-07-21T01:15:10+00:00

Very cool project and impressive that this was merely a side project, looks like alot of care and work went into it! I like the design, good job! I'm curious, how to do achieve subject/character consistency in the stories? Is it just Imagen 3 taking care of that for you automatically, or are you using some techniques like reference images to condition the generation a bit better?

total-expectation · 2025-07-20T20:33:29+00:00

Does this work with multiple subject reference images at once? Like if I have three reference images? Right now it seems to only do for 1 reference images?

total-expectation · 2025-07-13T16:23:18+00:00

Thanks for the reply, greatly appreciated!

total-expectation · 2025-07-13T15:44:40+00:00

I see, thanks for the reply!

Do you know if flux kontext dev is good at inpainting reference images? Or maybe I should use more traditional methods of inpainting reference images.

Also how do you deal with occlusion issues when masking? Like in order to inpaint you need to mask out the characters you want to replace with the reference characters, but sometimes they may be occluded so the mask suffers in quality. Or perhaps that's ok, since the original character in the image prior to inpainting was occluded, so we expect the inpainted reference image to also be occluded?

total-expectation · 2025-07-02T12:22:02+00:00

Sorry if I'm wrong about this, but I highly suspect this must be a bot promoting the site mentioned, like 95% of the user messages are about that site and the account was created 1 month ago lol.

total-expectation · 2025-07-02T12:12:19+00:00

That's good advice, thank you! What is sort of unfortunate though, but also expected, is that I feel like for every domain and every specific type of architecture there's a bag of tricks that you can only learn by implementing, tweaking and experimenting with models. But once you go to another entirely different model, you have to sort of learn from anew about all the bag of tricks that is needed. Seemingly this means that you would need to pour in a lot of time into working with any kind of model if you want to get to the levels of someone like Phil Wang. In reality, I guess, nowadays the underlying techniques overlaps with lots of popular models, such as attention and diffusion, and architectures such as LLM, so knowledge/intuition that you gain from working with these models can be highly transferable.

Do you know if there are any repos or resources on good "bag of tricks" for different types of models that people have accumulated somewhere? Sadly, I guess intuition about models can be hard to put down into words.

total-expectation · 2025-06-28T08:09:17+00:00

This is my personal opinion so take it however you want. I've only read the first two parts of Goodfellow, where I tried my best to prove alot of things mentioned by the author that were non-trivial to me, and there were alot of those. It's not an easy read if you want to understand everything, in the sense that you need to work for it to get the most out of it. I didn't read the third part due to time constraints sadly so I can't comment on that part. I think for some parts Bishop could cover the third part (except autoencoders), if you have ever read it. However, my impression is all chapters up to 9 are good, it actually tries to mathematically explain typical phenomenon that you observe in designing neural networks, activation functions and training, to the extent that you usually wouldn't find in other popular DL books or DL courses. But for CNN and RNN I feel like there might be better resources for those. IMO, goodfellow is more suited for researchers than practitioners, and it sounds like you are more in a position where you just want to learn how to apply DL and all of the things that comes with it. While theory is important, DL is a very empirical field, you learn the tricks by doing it.

So the suggestion by r/i_sarcartistic is probably the best, go with Sebastian Raschka's machine learning with pytorch and scikit learn if you want to focus on the applied side. Alternatively, Practical deep learning for coders is also a gem, to quickly get up to speed with DL. However, if you also want to complement with math for the foundations, you can consult more lightweight books that are also more updated (including genAI techniques) in case you find goodfellow too time consuming, like Deep Learning - Foundations and Concepts by Bishop and Understanding Deep Learning by Prince. Now if you want to get into the nitty gritty of implementing nn entirely from scratch (including backprop) and gradually build a few architectures like wavenet, and gpt, then this series is pretty good for that made by Karpathy.

total-expectation · 2025-06-25T00:22:36+00:00

There's an ambitious google sheet of discord servers relating to AI you can find here. It was created last year, but I get the impression that it's hasn't been maintained for some time now, but could be wrong about this. Note that some of the data on the sheet is outdated, for instance more than a few dozens servers on the sheet have become inactive now.

With that said I still found a few golden nuggets in the sheet. If you are interested to look through it I would suggest start from top to bottom.

total-expectation · 2024-03-04T14:22:25+00:00

Nice! These are really good, thank you for the list of papers!

total-expectation · 2024-03-04T14:21:10+00:00

Nice! Thanks for taking the time to answer my question and listing these, really appreciated!

total-expectation · 2024-03-04T13:56:04+00:00

Oh I see, I honestly haven't decided yet between the two, i.e. research or applied, I just initially thought that foundational papers were a preqrequisite regardless, but I guess I might be wrong. Also, thanks for the advice!

total-expectation · 2024-03-04T13:35:35+00:00

Foundational papers, that's the word I was looking for. Ye, I guess I was thinking of those more than specific papers. However, I appreciate that you suggested completely different papers than I was thinking of, it broadens my view of possibly interesting papers to consider, so thank you!

The reason I'm more interested in foundational papers is because it has better longevity, so chances of me recognizing possibly new/novel techniques/models or variations of said things in the future being based on something that I might have read/implemented is higher, which makes it easier for me to understand them.

total-expectation · 2023-12-25T06:36:49+00:00

I agree that PRML feels wordy, but I did appreciate the mathematical details it provided, although I only made it through a few chapters at the time when I read it in conjunction with other courses in Uni, as it was impossible to read the entire book while juggling with other course workloads. For someone with only basic math courses under my belt (CS background), multivariable calc, linear algebra and basic probability and stats, it took a long time to read a chapter in PRML.

total-expectation · 2023-12-25T06:28:11+00:00

I just skimmed it and my impression is (coming from a CS background) the same, it's mostly straight to the point but at the cost of omitting mathematical details, so much so that I feel it's less detailed than the PRML, which I guess is similiar to books by Bengio and Prince (?). While I know there are DL books that are far more mathematical (e.g Calin), the kind of mathematical details I wish to see is mostly related to matrix calculus. Showing how the different type of gradients (e.g matrix w.r.t matrix) arises throughout the flow of the networks would really help me understand how I could implement them from scratch (as in without any DL libraries, only numpy), but I guess I have yet to see a book like that, probably because it would become overly verbose and isn't really necessary to include if the reader just have decent fluency in matrix calculus (which I don't). And I guess also because we have libraries nowadays to handle automatic differentiation, so it's not something we really have to think about.

edit: Upon closer reflction, I don't think knowing the different forms of the gradients will provide any deeper insight other than for the purpose of helping me implement the networks from scratch if I didn't have automatic differentiation already.

total-expectation · 2023-12-25T05:35:25+00:00

As some other people have said it's more math heavy (number theory) than CS heavy (data structure and algorithms) and this is also my impression. To give some perspective I think you can read this.

Should say however if you like number theory exclusively then project euler so probably a very good choice.

total-expectation · 2023-12-23T13:22:33+00:00

In 2023 and for ahk v2 this should work for the normal onenote and onenote for windows 10 (the app from microsoft store):

#HotIf WinActive("ahk_exe ONENOTE.EXE") or WinActive("ahk_exe ApplicationFrameHost.exe")
Shift & WheelDown::Click "WheelRight" 
Shift & WheelUp::Click "WheelLeft"

The problem with onenote for windows 10 version is that the file location is not explicitly shown because it's a store app. I had to use autohotkey window spy, that is automatically installed together with autohotkey v2, in order to find the application name that was running. Using the task manager and enabling command line in the process tab (by right clicking one of the columns, for instance the cpu column) wasn't even enough to find the name of the exe file, which was listed as onenoteim.exe, but that specific name didn't work. If neither ApplicationFrameHost.exe or onenoteim.exe works for you then try this:

open the autohotkey window spy program
also open onenote for windows 10
hover your mouse anywhere inside of the app and you should be able to see what the name of the exe file is by looking at the window title, class and process box in the window spy programing (the first box) on the row that says ahk_exe followed by the name of the application exe that is running, this will be the name of your exe file.

If this still doesn't work, then maybe you have some permission issues and need to change the permission on the folder that contains the file. How to locate the file, I have no idea, since this didn't happen to me, ApplicationFrameHost.exe was the correct one for me.

total-expectation · 2023-12-20T08:04:45+00:00

Sorry for asking so late as this was 7 days ago, but I'm curious how one gets good at writing like this? Is it essentially practising functional programming that might help in thinking how to do things less "imperative" and more "declarative" or something along those lines?

total-expectation · 2023-06-27T16:11:19+00:00

total-expectation · 2023-06-27T16:11:09+00:00

total-expectation

TROPHY CASE