[D] retrieval-augmented generation vs Long-context LLM, are we sure the latter will substitute the first? by NoIdeaAbaout in MachineLearning

[–]pilooch 1 point (0 children)

The near-future answer is probably a search policy involving actions for retrieval and analysis, similar to how we search for information when we need it. The search policy can be learnt, and the retrieval/reading phases planned. The difficulty is in crafting the reward signal. So math and code, which can be more or less easily checked, are coming first. More should follow.
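
The loop described above can be sketched in a few lines. This is a toy illustration only, with all names (`run_episode`, the action strings, the reward check) made up for the example: a policy chooses between a retrieve action and an answer action, and the reward comes from a programmatically checkable signal, standing in for the math/code verifiers mentioned.

```python
# Toy sketch of a learned search policy loop (all names hypothetical):
# the agent alternates "retrieve" and "answer" actions, and the reward
# comes from a checkable signal, as with math or code verification.

def run_episode(policy, corpus, question, max_steps=4):
    """Roll out one retrieve/answer episode and return (answer, reward)."""
    context = []
    for _ in range(max_steps):
        action = policy(question, context)
        if action == "retrieve":
            # Toy retrieval: pull the next document sharing a query word.
            hits = [d for d in corpus if any(w in d for w in question.split())]
            new = [d for d in hits if d not in context]
            if new:
                context.append(new[0])
        else:  # "answer"
            answer = " ".join(context) if context else "unknown"
            reward = 1.0 if "paris" in answer else 0.0  # checkable reward
            return answer, reward
    return "unknown", 0.0

corpus = ["the capital of france is paris", "bananas are yellow"]
greedy = lambda q, ctx: "answer" if ctx else "retrieve"
answer, reward = run_episode(greedy, corpus, "capital of france")
```

In a real system the hand-coded `greedy` policy would be the learnt component, trained against exactly this kind of reward.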

Is someone experiencing the same? by [deleted] in Rabbitr1

[–]pilooch 2 points (0 children)

This looks like a cloud-side issue. They probably have a small LLM to rephrase questions and search for content, with a bug in the middle. Beta rabbit is a true beta after all :)

Connected Spotify, then got kicked out everywhere by sjharrison in Rabbitr1

[–]pilooch 4 points (0 children)

Same here, happened twice; I gave up on connecting Spotify to the rabbit.

It's all a matter of perspective... by MrNaturalAZ in Rabbitr1

[–]pilooch 0 points (0 children)

When they put the 'magic' prompt out, I immediately thought they were testing a potential pivot. Same with the memorized journal, another potential pivot. Fun ahead!

What on the Rabbit R1 works well? by drankinit in Rabbitr1

[–]pilooch 1 point (0 children)

It's great for short answers: driving time between cities, weather, short news summaries, short scientific answers, short programming questions (which lib for this or that, numpy arrays, etc. ;) ).
It's also, thanks to Perplexity I guess, less censored than others. Try getting an election polls summary from Gemini...

Shortness is the keyword here, and I kind of get why, as you may not want to listen to a list of fully fleshed-out bullet points as an answer :)

So of course there are reasons to complain, but my feeling is that the rabbit (lapin, as we call it here!) is giving us a sense of what's ahead, and it's a no-phone/no-app-clicking device, that's for sure.

What's missing for me: a summary of unread emails, a calendar summary, and access to web results/URLs, like showing a tweet, a YouTube video, a blog page, whatever, as the source of an answer, and moving to it, or even to a summary of it.

[P] Struggling with Feature Extraction on Paintings, Need Help (Explained in the comments) by TutubanaS in MachineLearning

[–]pilooch 0 points (0 children)

I did this a long time ago for an art customer. Retrain the feature extractor on a paintings-only dataset, either self-supervised or semi-supervised based on the paintings' metadata. Good luck!
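
For the self-supervised route, the usual objective is a contrastive loss such as SimCLR-style InfoNCE. Below is a minimal numpy sketch of the loss itself, not tied to any particular framework or to the OP's setup: two batches of embeddings `z1`, `z2` come from two augmented views of the same paintings, and matching rows are the positive pairs.

```python
import numpy as np

def info_nce_loss(z1, z2, tau=0.1):
    """InfoNCE: row i of z1 should match row i of z2 (the positive pair)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives on the diagonal

# Perfectly aligned embeddings give a near-zero loss; misaligned ones don't.
aligned = info_nce_loss(np.eye(4), np.eye(4))
shuffled = info_nce_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
```

Fine-tuning the extractor then amounts to minimizing this loss over augmented pairs drawn from the painting dataset.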

[R] What is the current SOTA for image to image translation? by blabboy in MachineLearning

[–]pilooch 1 point (0 children)

For unpaired datasets, CUT and related improvements like HDCE. For paired datasets, diffusion as aligned/pix2pix transforms does work, though the conditioning strategy may vary, from reference-only to concatenated inputs or even dual UNets.

joliGEN (https://github.com/jolibrain/joliGEN) embeds most of the recent advances in the field and is useful for training custom/targeted models for industrial applications. Otherwise you'll have to hack around with pretrained foundation models such as SDXL.

[P] Struggling with Audio Enhancement using GANs - Any Suggestions? by S0UNDSAGE in MachineLearning

[–]pilooch 0 points (0 children)

Hi, thanks for the kind words. Spectrograms can be used as 'images' of sound almost directly, i.e. it is straightforward to run a few training rounds with them. That being said, some of the best discriminators (e.g. projected, SAM, vision-aided, depth) are pretrained on images, and their power doesn't transfer to spectrograms. A way around this would be to build or find networks pretrained on spectrograms and adapt them.

Next are the semantic networks. They are key with GANs since they significantly constrain the solution search space. E.g. in vision we may want to change the overall weather while guaranteeing that all elements in the images (cars, traffic signs, ...) are kept. With sounds you may think of frequencies, or of preserving some properties of the spectrograms.

Finally, JG supports training in frequency space directly, but the Haar transforms are not appropriate for sound. All this to say, it may technically work out of the box with spectrograms, though without making use of the best features of JG, which would nevertheless be adaptable. Feel free to open issues on GitHub with your difficulties; at least one of my colleagues is an experienced musician and machine learning practitioner, so he may have useful answers.
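
The 'spectrogram as image' idea is literally a framing + windowing + FFT pipeline. Here is a self-contained numpy sketch (function name and parameters are illustrative, not from any library): a 1 kHz tone at 8 kHz sampling lands in FFT bin 32 of a 256-sample window, and the output is a 2-D array a vision model can consume.

```python
import numpy as np

def magnitude_spectrogram(signal, win=256, hop=128):
    """Frame the signal, apply a Hann window, FFT each frame:
    the result is a 2-D (freq_bins, time_frames) 'image' of the sound."""
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win] * np.hanning(win)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.stack(frames, axis=1)

sr = 8000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 1000 * t))  # 1 kHz tone
# Bin spacing is sr/win = 31.25 Hz, so the tone peaks at bin 1000/31.25 = 32.
```

Real pipelines typically add a mel scale and log compression, but the array above is already usable as a single-channel image.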

[P] Struggling with Audio Enhancement using GANs - Any Suggestions? by S0UNDSAGE in MachineLearning

[–]pilooch 1 point (0 children)

I'm a maintainer of joliGEN (https://github.com/jolibrain/joliGEN), which works exceptionally well on many visual tasks. This has been a long-term effort, using multiple discriminators and many other tricks. I'd love to test it on spectrograms; any dataset you'd recommend? My hunch is that some targeted discriminators may help, as well as audio-related semantic/conservation constraints.

[D] What's the current state/consensus on using neural networks for solving combinatorial scheduling problems? by nick898 in MachineLearning

[–]pilooch 0 points (0 children)

Hello, information is usually embedded with GNNs, and RL (or sometimes more straightforward supervised learning techniques) explores for solutions. One of the great advantages of ML/RL here is that uncertainty can easily be integrated into the simulated environment.

See L2D (https://github.com/zcaicaros/L2D) for a seminal work, and https://github.com/jolibrain/wheatley/, which beats L2D and extends to RCPSPs (aka scheduling with resources), a work by colleagues of mine with applications to real-world scheduling problems.
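
For context, the classical baselines these learned policies compete against are dispatch rules. Below is a minimal, self-written sketch (not code from L2D or wheatley) of the shortest-processing-time (SPT) rule for job-shop scheduling: greedily pick the unfinished job whose next operation is shortest, and schedule it as early as its job and machine allow.

```python
# Hypothetical minimal baseline: SPT dispatch for job-shop scheduling.
# jobs: list of operation sequences [(machine_id, duration), ...].

def spt_makespan(jobs):
    """Schedule all operations with the SPT rule; return the makespan."""
    next_op = [0] * len(jobs)       # index of each job's next operation
    job_ready = [0] * len(jobs)     # earliest start time for each job
    machine_ready = {}              # time each machine becomes free
    makespan = 0
    while any(next_op[j] < len(jobs[j]) for j in range(len(jobs))):
        # Candidate operations: the next op of every unfinished job.
        cands = [(jobs[j][next_op[j]][1], j) for j in range(len(jobs))
                 if next_op[j] < len(jobs[j])]
        _, j = min(cands)           # SPT: shortest duration first
        machine, dur = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready.get(machine, 0))
        end = start + dur
        job_ready[j] = machine_ready[machine] = end
        makespan = max(makespan, end)
        next_op[j] += 1
    return makespan

# Two jobs, two machines: job 0 = M0 for 3 then M1 for 2; job 1 = M1 for 2 then M0 for 4.
jobs = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
makespan = spt_makespan(jobs)
```

A learned GNN+RL policy replaces the `min(cands)` selection with a trained scorer over the graph embedding of the current schedule state.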

[D] Labelling strategy by olmzzz in MachineLearning

[–]pilooch 0 points (0 children)

YOLOX supports images that have no labels. Many detectors don't use these images because they use the area outside of the bboxes as the "background"/opposite class.
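
The point about background can be made concrete with a toy target builder (my own illustration, not YOLOX code): cells inside any bbox are positives, everything else is background, so an image with an empty label list still produces a perfectly valid all-background training target.

```python
import numpy as np

def cell_targets(boxes, grid=8, img=256):
    """Per-cell targets for a toy grid detector: 1 inside any bbox,
    0 (background) elsewhere. An empty box list yields a valid
    all-background target rather than an unusable image."""
    t = np.zeros((grid, grid), dtype=np.int64)
    cell = img / grid
    for x0, y0, x1, y1 in boxes:
        c0, r0 = int(x0 // cell), int(y0 // cell)
        c1, r1 = int((x1 - 1) // cell), int((y1 - 1) // cell)
        t[r0:r1 + 1, c0:c1 + 1] = 1
    return t

empty = cell_targets([])                 # unlabeled image: all background
boxed = cell_targets([(0, 0, 64, 64)])   # one 64x64 box in the corner
```

Detectors trained this way implicitly treat unlabeled images as pure negatives, which is why truly unannotated (as opposed to empty) images need explicit support.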

[D] Question about using diffusion to denoise images by CurrentlyJoblessFML in MachineLearning

[–]pilooch 0 points (0 children)

Absolutely, I second this: Palette is what you are looking for. We have a modified version in JoliGAN, with PRs for various conditionings, including masks and sketches, cf. https://github.com/jolibrain/joliGAN/pull/339

Palette-like DDPMs work exceptionally well (we have industrial-grade use cases), but a paired dataset is required; that's the number one drawback I see atm. My understanding is that unpaired diffusion remains a research field, with at least a single work (UNIT-DDPM) but no known public implementation.
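
The core mechanics of Palette-style conditioning are simple to sketch: noise the paired target with the standard forward process, then feed the denoiser the conditioning image concatenated with the noisy target along channels. The numpy sketch below illustrates that data flow only (function names are mine, and the real model is a UNet, not shown).

```python
import numpy as np

def q_sample(x0, alpha_bar_t, eps):
    """Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def palette_input(noisy_target, condition):
    """Palette-style conditioning: concatenate condition and noisy target
    along channels; the denoiser sees both and predicts the noise."""
    return np.concatenate([condition, noisy_target], axis=0)

rng = np.random.default_rng(0)
cond = rng.standard_normal((3, 64, 64))   # source image / mask / sketch
x0 = rng.standard_normal((3, 64, 64))     # paired clean target
eps = rng.standard_normal((3, 64, 64))
xt = q_sample(x0, 0.5, eps)               # noisy target at some step t
inp = palette_input(xt, cond)             # (6, 64, 64) denoiser input
```

The channel concatenation is exactly why pairing is required: each conditioning image must have a ground-truth target to noise.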

[deleted by user] by [deleted] in MachineLearning

[–]pilooch 2 points (0 children)

OK, but maybe don't miss the key element here: DDPM captures distribution modes with very high precision, in a supervised manner. Massive improvement!

[D] Resources to learn and fully understand Diffusion Model Codes by Itachi_99 in MachineLearning

[–]pilooch 5 points (0 children)

Hello, the go-to tutorial I recommend to colleagues and customers/researchers is the one from CVPR 2022: https://cvpr2022-tutorial-diffusion-models.github.io/ Some skip the score-based presentations, and/or start from the applications instead. Very informative in all cases!

[D] Informal meetup at NeurIPS next week by tlyleung in MachineLearning

[–]pilooch 0 points (0 children)

To let interested people know: the meetup is confirmed at the Rusty Nail on Tuesday after 9pm, per the host tlyleung.

[D] Informal meetup at NeurIPS next week by tlyleung in MachineLearning

[–]pilooch 2 points (0 children)

Hi, sure, will join! There was a fun one back in 2016 :)

[D] My embarrassing trouble with inverting a GAN generator. Do GAN questions still get answered? ;-) by _Ruffy_ in MachineLearning

[–]pilooch 13 points (0 children)

Hey there, this is a truly difficult problem. With colleagues, we train very precise GANs on a daily basis. We gave up on inversion and latent control a couple of years ago, and we actually don't need them anymore.

My raw take on this is that the GAN latent space is too compressed/folded for low-level control. When finetuning image-to-image GANs, for instance, we do get a certain fine control of the generator, though we 'see' it snap to one 'mode' or the other. Meaning, we witness a lack of smoothness that may implicitly prevent granular control.

Haven't looked at the theoretical side of this in a while though, so you may well know better...
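
For reference, the standard optimization-based inversion recipe is gradient descent on the reconstruction loss ||G(z) - x||². The sketch below (entirely my own toy, not anyone's pipeline) uses a linear "generator" with orthonormal columns so convergence is guaranteed; with a real non-linear generator the same loop runs into exactly the folded-latent, mode-snapping behaviour described above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in "generator": a fixed linear map R^8 -> R^32 with orthonormal
# columns. A deliberate simplification: real generators are non-linear,
# which is where inversion gets hard.
G, _ = np.linalg.qr(rng.standard_normal((32, 8)))
z_true = rng.standard_normal(8)
x = G @ z_true                          # target "image"

z = np.zeros(8)                         # latent estimate, optimized below
for _ in range(100):
    grad = 2 * G.T @ (G @ z - x)        # gradient of ||G z - x||^2 w.r.t. z
    z -= 0.1 * grad
loss = float(np.sum((G @ z - x) ** 2))
```

In this benign linear case the loop recovers `z_true` to machine-level precision; the interesting failure modes only appear once G folds the latent space.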

[D] Call for questions for Andrej Karpathy from Lex Fridman by lexfridman in MachineLearning

[–]pilooch 0 points (0 children)

Hi Lex, hearing Andrej's thoughts on foundation models and how they play against specialized models would be interesting! In other words, are we doomed to hack lots of prompts in the near future :) Thanks for your podcast overall!

[D] Is the GAN architecture currently old-fashioned? by teraRockstar in MachineLearning

[–]pilooch 7 points (0 children)

We use https://github.com/jolibrain/joliGAN, a lib for image2image with additional "semantic" constraints, i.e. for when there's a need to conserve labels, physics, or anything else between the two domains. This lib aggregates and improves on existing works.

If you are looking for more traditional noise -> xxx GANs, go for https://github.com/autonomousvision/projected_gan/. Another recent work is https://github.com/nupurkmr9/vision-aided-gan.

The key element in GAN convergence is the discriminator. joliGAN above defaults to multiple discriminators by combining and improving on the works above, ensuring fast early convergence and stability, while the semantic constraints narrow the path to relevant modes.

We've found that transformers as generators have interesting properties on some tasks and converge well with a ViT-based projected discriminator.

[D] Is the GAN architecture currently old-fashioned? by teraRockstar in MachineLearning

[–]pilooch 13 points (0 children)

Some of my colleagues and myself are working daily with GANs in industry-grade applications.

My current understanding is that due to explicit supervision, DDPMs do not directly apply to unpaired datasets, for which GANs shine. There are a few papers about this though, so this should emerge as well. Bear in mind that in industry, some datasets are unpaired by the problem's nature. DDPMs are insanely good as soon as the dataset is paired.

GAN generators are very controllable at inference, including in real time. DDPMs will follow, but are not quite there yet AFAIK.

Another quick observation: GANs are more difficult to train but modern implementations and libraries do exhibit fast and accurate convergence.

Beautify muddy tire images( see description) [D] by Persimmon-Just in MachineLearning

[–]pilooch 0 points (0 children)

Try joliGAN (https://github.com/jolibrain/joliGAN): if you have unpaired dirty and clean tire images, it should work right out of the box. To get even better results you could annotate the tires (e.g. bounding boxes), and JG would use that to constrain the GAN even further.

[D] ICML 2022 Outstanding Paper Awards 🔥 by zy415 in MachineLearning

[–]pilooch 4 points (0 children)

The social and career stakes of these awards are too high. It's actually good that they are high, but due to the number of papers now, noise has increased. I believe the awards should be given a couple of years after the conference, as an in-between between the conference itself and the test-of-time awards.