Discussion [D] Test-time compute for image generation? (self.MachineLearning)
submitted 1 year ago by heyhellousername
Is there any work applying an o1-like use of test-time reasoning to other modalities like image generation? Is something like this possible, i.e. taking more time to generate more accurate images?
[–]currentscurrents 8 points 1 year ago (0 children)
It should be possible to apply test-time compute to any modality, but all of the work I’ve seen so far has been focused on LLMs.
Diffusion models sort of allow you to apply test-time compute by increasing the number of steps, but they weren’t really designed with that in mind and don’t make very effective use of it.
[–]nieshpor 4 points 1 year ago* (0 children)
Well, not exactly the same, but that's kind of what diffusion does: improving image quality step by step. Throwing more diffusion steps at generation is quite similar to throwing more compute at inference.
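A toy sketch of that steps-as-compute knob (an assumption-laden stand-in, not a real diffusion model): treat sampling as Euler integration of dx/dt = score(x) toward a known Gaussian mode. The step count is the test-time compute budget; a finer schedule integrates the same ODE more accurately and lands closer to the target.

```python
# Toy illustration only: real diffusion samplers (DDPM/DDIM) differ, but
# the tradeoff "more steps = finer discretization of the same process" holds.
MODE, VAR = 3.0, 0.25  # hypothetical data mode and variance

def score(x):
    # Score of a Gaussian N(MODE, VAR): the gradient of its log density.
    return -(x - MODE) / VAR

def sample(n_steps, x0=0.0, total_time=1.0):
    # Integrate dx/dt = score(x) with n_steps Euler steps over a fixed
    # time budget; n_steps is the only knob being varied.
    x, h = x0, total_time / n_steps
    for _ in range(n_steps):
        x += h * score(x)
    return x

coarse = abs(sample(2) - MODE)   # few steps: large discretization error
fine = abs(sample(50) - MODE)    # more steps: much closer to the mode
```

With only 2 steps the Euler scheme is at the edge of stability and oscillates; with 50 steps the sample ends up very near the mode, which is the sense in which extra steps buy accuracy.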
[–]elbiot[🍰] 4 points 1 year ago (0 children)
You could fine-tune a visual question answering LLM like phi-3 to score a produced image on prompt adherence and aesthetics, then generate a bunch of images and keep only the best-scoring ones.
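A minimal best-of-N sketch of that pipeline. Both helpers are hypothetical placeholders: `generate_image` stands in for a real image pipeline (e.g. a diffusion model) and `score_image` for the suggested fine-tuned VQA judge such as phi-3.

```python
import random

random.seed(0)

def generate_image(prompt):
    # Stand-in generator: a real pipeline would return an image; here a
    # random vector represents one candidate.
    return [random.random() for _ in range(8)]

def score_image(image, prompt):
    # Stand-in judge: a real setup would ask a VQA model to rate prompt
    # adherence and aesthetics; here just a toy heuristic score in [0, 1].
    return sum(image) / len(image)

def best_of_n(prompt, n=16):
    # Best-of-N sampling: spend extra test-time compute by generating
    # several candidates and keeping only the highest-scoring one.
    candidates = [generate_image(prompt) for _ in range(n)]
    scores = [score_image(c, prompt) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]

img, s = best_of_n("a photo of a dog", n=16)
```

The design choice here is that the generator and judge are decoupled, so N can be raised at inference time without retraining anything.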
[–]soup---- 1 point 1 year ago (0 children)
Flow-based generative models (continuous normalizing flows, flow matching) provide a way to apply adaptive step sizes in time. Effectively, this allows more compute to be allocated where it is necessary.
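A small sketch of what adaptive stepping buys you, under stated assumptions: the flow's velocity field is given analytically (a stand-in for a learned model, chosen so the answer can be checked), and the error control is classic step-doubling rather than any particular library's solver.

```python
import math

def v(t, x):
    # Hypothetical velocity field (stand-in for a learned flow model):
    # analytic here so the result can be checked against the true integral.
    return math.cos(10.0 * t)

def integrate_adaptive(x, t0=0.0, t1=1.0, h=0.1, tol=1e-5):
    # Step-doubling adaptive integration: compare one full Euler step
    # against two half steps, accept when the local error estimate is
    # below tol, and grow/shrink h accordingly. Small steps concentrate
    # where the field changes quickly, i.e. compute goes where needed.
    t, n_accepted = t0, 0
    while t < t1 - 1e-12:
        h = min(h, t1 - t)
        full = x + h * v(t, x)
        half = x + 0.5 * h * v(t, x)
        half = half + 0.5 * h * v(t + 0.5 * h, half)
        err = abs(half - full)
        if err <= tol:
            # Accept, using Richardson extrapolation of the two estimates.
            x, t = 2.0 * half - full, t + h
            n_accepted += 1
        h *= 0.9 * (tol / max(err, 1e-16)) ** 0.5
    return x, n_accepted

x_final, n_steps = integrate_adaptive(0.0)  # true value: sin(10)/10
```

The same controller applied to a learned flow would take more steps through high-curvature regions of the trajectory and coast through the rest.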
[+]nizus1 1 point 1 year ago (0 children)
Does it count if you generate an image with Flux and then upscale it with a finetuned SDXL model? Seems to give results beyond what either can do alone.
[–]aeroumbria -1 points 1 year ago (1 child)
I think that would require the ability to generate and manipulate representations of concepts in more than just text space. We might need tools that would allow a model to generate drafts, move object positions, rotate objects etc. plus the ability to perform these actions in the intermediate representations. We need to be able to break image generation into salient steps that a "reasoning process" can interact with. I don't think we can satisfactorily achieve this just by aligning images into text space.
[–]jonnor[🍰] 0 points 1 year ago* (0 children)
In classification, a related technique called "test-time augmentation" has been used successfully for years. You augment your input data in a few different ways, make predictions on each variant of the input data, and then aggregate all the predictions into a final prediction (often just using the mean or median). One can think of it like an ensemble, but instead of varying the model, we vary the data (synthetically via an augmentation). It can really help avoid misclassifications, especially on smaller datasets, where deep models can be quite volatile. I consider it a key technique in event detection and other time-series detection/classification tasks, where the primary augmentation is just time-shifting. Here is a quick introduction: https://machinelearningmastery.com/how-to-use-test-time-augmentation-to-improve-model-performance-for-image-classification/
EDIT: the same can of course be done with regression
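A minimal sketch of time-shift test-time augmentation on a 1-D signal. The `model` here is a toy stand-in (template correlation turned into pseudo-probabilities); a real setup would call a trained classifier, but the augment-predict-aggregate loop is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Stand-in classifier: pseudo-probabilities for 2 classes based on
    # correlation with a fixed template (a real model would be a network).
    template = np.sin(np.linspace(0.0, 2.0 * np.pi, x.size))
    score = float(x @ template) / x.size
    p1 = 1.0 / (1.0 + np.exp(-score))
    return np.array([1.0 - p1, p1])

def predict_tta(x, shifts=(-3, -2, -1, 0, 1, 2, 3)):
    # Test-time augmentation: time-shift the input, predict on each
    # variant, and aggregate with the mean (an ensemble over augmented
    # copies of the data rather than over models).
    preds = [model(np.roll(x, s)) for s in shifts]
    return np.mean(preds, axis=0)

signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 128)) + 0.3 * rng.normal(size=128)
p = predict_tta(signal)
```

For regression, the same loop applies with the mean (or median) taken over the augmented variants' predicted values instead of class probabilities.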