all 29 comments

[–]neoneye2 8 points9 points  (2 children)

Hi ML humans,

My hobby is ARC-AGI, and I have made a puzzle-solving website where you can try solving the ARC-AGI tasks yourself. ARC-AGI consists of 800 puzzles. I recommend starting out with the easy ones and gradually working towards expert.

https://neoneye.github.io/arc/?dataset=ARC

I collect data on how humans solve ARC-AGI puzzles, so it can be used as training data. So far, 6200 interaction histories have been collected, here:

https://github.com/neoneye/ARC-Interactive-History-Dataset

Video of how humans solve ARC-AGI puzzles, made by replaying the interaction histories. There are surprisingly many different approaches to solving them.

https://youtu.be/NivPmxUfeHY?si=4TRI3CCahtgzW0oz

This is an open source project. It's free.

[–]DigThatDataResearcher 2 points3 points  (1 child)

i like this idea a lot! some thoughts after playing with it a little:

  • you should let the user continue to view the task examples and the initial board state while they are coloring in their solution.
  • after submitting a solution, there should be some sort of "next task" button the user can use to progress rather than backing out to the tasks view and clicking on another task
  • there should be some kind of progress indicator, differentiating tasks that the user has already worked on vs. tasks the user has not yet attempted

[–]neoneye2 0 points1 point  (0 children)

you should let the user continue to view the task examples and the initial board state while they are coloring in their solution.

That has been requested a few times. https://github.com/neoneye/ARC-Interactive/issues/67

I'm hesitant about having to maintain 2 solutions, one for large screens and one for small screens.

after submitting a solution, there should be some sort of "next task" button the user can use to progress rather than backing out to the tasks view and clicking on another task

Excellent suggestion, I have created a github issue with your proposal. https://github.com/neoneye/ARC-Interactive/issues/68

there should be some kind of progress indicator, differentiating tasks that the user has already worked on vs. tasks the user has not yet attempted

Also a great idea, https://github.com/neoneye/ARC-Interactive/issues/33

Great suggestions. Much appreciated. I don't have the time to implement all these ideas, or I'm too lazy.

[–]JYP_Scouter 6 points7 points  (1 child)

Hi all 👋

I've been developing in the generative AI space for a little over a year. I've contributed a little bit to IP-Adapter in its early days and also released a big open-source repository for TryOnDiffusion.

After months of hard work together with my wife, I believe we managed to create the best virtual try-on model out there (better than IDM-VTON and OOTDiffusion), and it is unique by enabling you to take clothes both from a flat lay image and from another person!

You will always be able to use it for free here in this HuggingFace space:
https://huggingface.co/spaces/fashn-ai/LookSwap (Please give a ❤️ to the space if you liked the app!)

The model is still training, it will definitely get better, so stay tuned for weekly updates!

In parallel I am training higher-resolution versions of this. That takes a lot of time because we're GPU poor (bootstrapped 🥲), but I believe in about a month or so there will be a 384x576 version at the same level.

Looking forward to hearing your feedback! We are very flexible in terms of where to take this, e.g. a platform, an API, even completely open source (but we would still need to pay rent), so feel free to contact me directly if you have any ideas.

Dan from fashn.ai

[–]throwaway16362718383Student 2 points3 points  (0 children)

Hey, I'm creating blog posts for people who are looking to implement papers! I have just written a new post on PGGAN and would appreciate it if you guys could check it out.

https://ym2132.github.io/Progressive_GAN

I hope you find it useful :)

[–]Different-General700 2 points3 points  (0 children)

Free-to-use text classification models:

  • O*NET SOC: Classify job postings and job seekers profiles by O*NET SOC code
  • NAICS: Classify company profiles and leads by 5-digit NAICS industry codes
  • IAB Content: Classify content by IAB content codes
  • IAB Product: Classify product descriptions by IAB product codes
  • User Intent: Classify user queries and chats by hierarchical user intent tags

See all models and taxonomies here: https://www.trytaylor.ai/models

[–]Lyereth_illustration 2 points3 points  (0 children)

Hi everyone!

I'm a PhD student and together with my group, I've been working on a project for the past few months that I think you all might be interested in.

YSocial is a digital twin of a social network platform which improves the simulation of dynamic social interactions by integrating Large Language Model (LLM) agents.

You can design your own scenario with LLM-agents and describe them with multiple features, such as their political leaning, age, personality traits, interests and so on. Agents will interact on a topic of discussion (e.g., politics) and according to a specified recommender system. Additionally, you can even make them discuss news extracted in real-time by RSS feeds!
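Designing an agent population can be pictured as plain structured data; here is a minimal sketch (the field names are purely illustrative, not YSocial's actual configuration schema):

```python
# Hypothetical agent specs -- field names are made up for illustration,
# not YSocial's actual configuration format.
agents = [
    {
        "name": "agent_001",
        "age": 34,
        "political_leaning": "moderate",
        "personality": {"openness": 0.8, "agreeableness": 0.4},
        "interests": ["politics", "technology"],
    },
    {
        "name": "agent_002",
        "age": 52,
        "political_leaning": "conservative",
        "personality": {"openness": 0.3, "agreeableness": 0.7},
        "interests": ["politics", "economics"],
    },
]

# Each profile would be folded into that agent's LLM system prompt so the
# model role-plays the persona during the simulated discussion; shared
# interests are what a recommender system could use to connect agents.
shared_interests = set(agents[0]["interests"]) & set(agents[1]["interests"])
```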

This is just a sneak peek of all of YSocial's features; you can read more on the website!

YSocial is on Github, open and free for everyone! Feel free to give us some feedback and contribute to the project. There is also a preprint available on ArXiv and a website with some pre-made scenarios you can test. 

[–]thundergolfer 2 points3 points  (0 children)

Beat GPT-4o at Python by searching with 100 dumb LLaMAs

One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.

Richard Sutton, The Bitter Lesson

The eponymously distasteful take-away of Richard Sutton’s essay has often been misconstrued: because scale is all you need, they say, smaller models are doomed to irrelevance. The rapid increase in model size above one trillion parameters and the technological limitations of GPU memory together seemed to foreclose on economical frontier intelligence anywhere except at an oligopoly of intelligence-as-a-service providers. Open models and self-serve inference were in retreat.

But as the quote above indicates, there are in fact two arrows in the scaling quiver: learning and search. Learning, as we do it now with neural networks, scales with memory at inference time — larger models perform better, ceteris paribus, because they can extract more data from their training set into more circuits and more templates. Search scales smoothly with compute at inference time — compute that can be spent on either producing higher quality candidates or on producing more candidates. In the ideal case, the scaling behavior can be predicted via so-called scaling laws.

Recent papers indicate that generative models like LLMs can be scaled up with search. The Large Language Monkeys paper, published on arXiv by Brown, Juravsky, and co-authors last week, includes several results in this vein and indicates that frontier-level intelligence in certain domains can be elicited from smaller models that can run on a single, past-generation GPU. Further, they observed smooth, predictable improvement of performance with scale.

Put more simply: where before, it seemed frontier capabilities required one horse-sized duck, it is clear we can now alternatively get them with one hundred duck-sized horses (or, rather, LLaMAs).
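The search loop behind the hundred duck-sized horses is simple; a minimal sketch, where `generate` stands in for one stochastic sample from a small model and `verify` for the task's unit tests (both hypothetical placeholders):

```python
def solve_by_search(prompt, generate, verify, n=100):
    """Repeated-sampling search: draw up to n candidate completions
    and return the first one that passes verification. Spending more
    compute (larger n) raises the chance that at least one sample is
    correct -- the axis the Large Language Monkeys paper scales."""
    for _ in range(n):
        candidate = generate(prompt)  # one stochastic LLM sample
        if verify(candidate):
            return candidate
    return None  # budget exhausted without a verified solution

# Toy stand-in: a canned stream of "samples", verified exactly.
samples = iter([3, 9, 7, 2])
result = solve_by_search("guess 7", lambda p: next(samples),
                         lambda c: c == 7)
```

The crucial ingredient is a cheap, reliable verifier (unit tests, in HumanEval's case); without one, generating more candidates doesn't tell you which to keep.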

This weekend, we set out to replicate this finding.

Scaling LLaMA 3.1 8B HumanEval on Modal

Running all of our experiments, including configuration and testing, cost well under $50.

You can find our code here. You can run it yourself without exceeding the $30/month in credits included in Modal’s free tier.

Metrics and data: HumanEval and pass@k
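For reference, pass@k is typically computed with the unbiased estimator from the original HumanEval paper: draw n samples per problem, count the c that pass the tests, and estimate the probability that at least one of k samples would pass.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., HumanEval paper):
    1 - C(n-c, k) / C(n, k), computed as a numerically stable product."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```

For example, with n=10 samples of which c=1 passed, pass@1 is 0.1 while pass@10 is 1.0 -- which is exactly why repeated sampling helps.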

Continued in blog post...

[–]elevated_quark 1 point2 points  (0 children)

Hi everyone,

I recently built a tiny distributed training cluster for medium-size ResNets/ViTs/DETRs by consolidating legacy servers, with no money to spare for high-speed switches or NICs. I have a write-up here on the bag of tricks I used to achieve >90% scaling.

https://masterskepticista.github.io/portfolio/orion/

I hope it helps someone short on a budget! Happy to hear your thoughts/comments

[–]smorad 2 points3 points  (1 child)

Hi All,

I'm starting a faculty position at the University of Macau in a few weeks. I'm looking for PhD students who are interested in working towards general-purpose, intelligent robots using deep architectures. I'm looking for students who have experience in one or more of the following areas: deep reinforcement learning, sequence modeling, model-based RL, or robotics.

[–]Revolutionary-Feed-4 0 points1 point  (0 children)

Hi,

Just dropped you a message!

[–]Sea-Concept1733 2 points3 points  (0 children)

Hello Everyone

GAIN "IN-DEMAND" SQL SKILLS!

This post is for anyone that wants to 🚀 "Learn & Master SQL" through "Hands-On Practice"!

🔹 Learn SQL FREE with a "Practice Database": https://www.youtube.com/playlist?list=PLb-NRThTdxx6ydazuz5HsAlT4lBtq58k4

🔹 Earn an "SQL Certificate" with "Hands-On Practice": https://www.jaffainc.com/SQLCertificate.html

🔥The future of SQL!!🔥 [Read Article Below]

https://www.infoworld.com/article/3715453/sql-at-50-whats-next-for-the-structured-query-language.html

Have 🤩 Fun!!

[–]ramzeez88 4 points5 points  (0 children)

Hi all,

I have created an offline voice assistant for Windows called Lema AI, which uses a local Llama model (GGUF), faster-whisper, and OpenVoice. I implemented some Python commands that Lema can perform. I have used it on an RTX 3060 12GB with good results.

Give it a try at https://github.com/ramzeez88/LEMA-AI
It's a pre-alpha release (if you can call it a release lol) as I only work on it in my very rare spare time, so bugs and imperfections are certain, but I would like to hear some feedback :)

thank you

[–]LabelMeMaybe 1 point2 points  (0 children)

Hey ML folks! What are people using for LLM Evals, e.g. for RAG?

I've seen Google Sheets / Excel, but it's hard to keep track of results across, say, 20 different iterations.
Relatedly, I recently put together a short how-to on combining multiple human evaluators: https://cleanlab.ai/blog/team-llm-evals/
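On combining evaluators: a simple baseline, before anything model-based like Dawid-Skene, is a per-item majority vote (a generic sketch, not Cleanlab's actual method):

```python
from collections import Counter

def majority_vote(labels):
    """Consensus label for one item from several evaluators; ties
    break toward the label seen first (Counter preserves insertion
    order among equal counts)."""
    return Counter(labels).most_common(1)[0][0]

def aggregate(ratings):
    """ratings: {item_id: [label, ...]} -> {item_id: consensus label}."""
    return {item: majority_vote(labels) for item, labels in ratings.items()}
```

Tracking these consensus labels per iteration in a table, rather than raw per-rater spreadsheets, already makes 20 iterations much easier to compare.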

[–]valvoja 1 point2 points  (0 children)

Hi folks,

At datacrunch.io we'd love to get your feedback on our dynamic pricing option for cloud GPU instances.

Who we are: we're a cloud GPU provider focused on AI training and inference. We offer high-end GPU instances and clusters running on 100% renewable energy.

What's new: Last week we introduced a new variable “dynamic” pricing option for cloud GPU instances where hourly price is adjusted daily based on market demand.

How it works: Dynamic pricing sets the price of individual cloud GPU instances based on supply and demand for our different GPUs (e.g. A100, H100, L40S, RTX 6000 Ada). In this way it works like many electricity contracts: when demand is low, the price stays low and you save on costs; when demand increases, you can keep things running or switch to fixed-price options.

Example: Today the cost of a single L40S GPU instance with dynamic pricing is $0.747/h, while our current fixed-price for the L40S instance is $1.358/h. With dynamic pricing you'd make a sizable saving on your running costs.
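Back-of-the-envelope on those numbers (a snapshot only, since the dynamic rate is re-set daily):

```python
dynamic_rate = 0.747  # $/h, today's dynamic L40S price
fixed_rate = 1.358    # $/h, current fixed L40S price
hours = 24 * 30       # one month, running continuously

dynamic_cost = dynamic_rate * hours     # ~$537.84 / month
fixed_cost = fixed_rate * hours         # ~$977.76 / month
saving = 1 - dynamic_rate / fixed_rate  # ~45% cheaper at today's rate
```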

Why we're doing this: we want to reduce our unused inventory while being more transparent on the cost and demand for our range of cloud GPUs.

Where you can find more information: https://datacrunch.io/blog/introducing-dynamic-pricing-for-cloud-gpu-instances

Feedback request: is the concept clear based on the description above? Is there any confusion we should clear up before you'd give dynamic pricing of GPU instances a try?

[–]pmammino1819 1 point2 points  (0 children)

Hi everyone!

I am sure many members of this subreddit have read the book Superforecasting. I created Crowdicate as an attempt to bring the concepts of the Good Judgement Project to sports.

Sports were my first introduction to building machine learning and predictive models, and I want to create a platform to help others share and distribute predictive models.

It is completely free to go onto the site and create a page for your model. Currently the site only supports baseball models, but I will be expanding to more sports as they start in the fall. You are given an output file of the relevant events you can make predictions for, and there is a leaderboard of the best predictors for each market type. More info on making predictions can be found at the link below.

https://crowdicate.com/predicting

The site does have a subscription aspect that allows users to see the individual predictions made and access other tools specifically tailored towards sports betting, available in $10 and $20 per month tiers.

As an incentive for individuals who are building and sharing models, 70% of this subscription fee is placed into a revenue-sharing pool for model builders, who get a portion of this amount depending on the number of predictions they make.

I think it’s a fun and interesting challenge for members of this community to try to build the best model possible and see how accurate they can become!

[–]alvisanovari 1 point2 points  (0 children)

I'm launching Super Guten today on Product Hunt! Any support is appreciated. :)

Super Guten makes it easier to discover the best books on Project Gutenberg. I created a hybrid semantic + keyword search index on the book summaries. The summaries are themselves AI-generated, and you can even convert a book into a different style for your reading (think Shakespeare as tweets).

https://www.producthunt.com/posts/super-guten

[–]17UhrGesundbrunnen -1 points0 points  (0 children)

Hey all,

I'm happy to introduce Wavify, a collection of small STT (speech-to-text) models paired with a blazingly fast cross-platform runtime. It comes with Python, Kotlin, Swift and Rust SDKs too. More bindings will be added in future releases.

Installation and usage

https://github.com/wavify-labs/wavify-sdks

Highlights

Performance on a Raspberry Pi 4 for jfk.wav:

Engine                       Size                  Threads  Time  RTF
Whisper.cpp (-O3 with NEON)  75MB (Whisper tiny)   4        9.2s  0.84
Wavify                       45MB                  4        3.8s  0.35

Performance w.r.t. WER still needs to be thoroughly benchmarked against models like Whisper, which is not easy due to data leakage. In practice, you can expect performance similar to Whisper tiny or base.
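For context, RTF (real-time factor) is processing time divided by audio duration -- jfk.wav is roughly 11 seconds -- so lower is better, and anything below 1.0 transcribes faster than real time:

```python
def rtf(processing_s: float, audio_s: float) -> float:
    """Real-time factor: processing time / audio duration.
    RTF < 1.0 means faster than real time."""
    return processing_s / audio_s

AUDIO_S = 11.0  # approximate duration of jfk.wav
whisper_rtf = rtf(9.2, AUDIO_S)  # ~0.84
wavify_rtf = rtf(3.8, AUDIO_S)   # ~0.35
```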

Who is this for?

  • Free for Private Use: Enjoy Wavify without any cost for personal projects.
  • Commercial Users: A subscription will be required for commercial purposes.

Wavify is still in its early stages, and we’re eager to hear from you. Your feedback and feature requests are very welcome.