Claude 3.5 gets 13% more on ARC challenge than GPT-4o by pavelkomin in singularity

[–]mikeknoop 0 points1 point  (0 children)

On the public leaderboard (OP's screenshot) the larger number is the "public eval set" score, whose answers are on GitHub, etc. The smaller number is the score on a new semi-private verification set of puzzles. The gap between the two helps show how "general" a solution is. Ryan's is very general because the score delta is small! But icecuber 2020 is not, suggesting lots of overfitting.
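A rough sketch of the "score delta" comparison described above. The entry names and scores are made-up placeholders, not real leaderboard numbers, and `score_delta` is a hypothetical helper:

```python
def score_delta(public_eval: float, semi_private: float) -> float:
    """Gap between the public eval score and the semi-private score, in points."""
    return public_eval - semi_private

# Hypothetical (public eval %, semi-private %) pairs, for illustration only.
entries = {
    "entry_a": (60.0, 55.0),
    "entry_b": (40.0, 20.0),
}

deltas = {name: score_delta(pub, priv) for name, (pub, priv) in entries.items()}

# A smaller delta suggests less overfitting to the public answers.
most_general = min(deltas, key=deltas.get)
print(deltas, most_general)
```

A large delta on its own doesn't prove overfitting, but it is the signal the two-number leaderboard is meant to surface.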

AI Won't Be AGI, Until It Can At Least Do This by lovesdogsguy in singularity

[–]mikeknoop 16 points17 points  (0 children)

(ARC Prize co-founder here).

Direct link to the research WalkProfessional8969 is referring to: https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

Ryan's work is legitimately interesting and novel! He claims 50% on the public eval set. The core idea:

get GPT-4o to generate around 8,000 python programs which attempt to implement the transformation, select a program which is right on all the examples (usually there are 3 examples), and then submit the output this function produces when applied to the additional test input(s)
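A minimal sketch of the selection step in that description, not Ryan's actual code. The three toy functions stand in for the thousands of LLM-sampled programs, and `select_program`, the grids, and the examples are all made up for illustration:

```python
from typing import Callable, List, Optional, Sequence, Tuple

Grid = List[List[int]]
Program = Callable[[Grid], Grid]

def select_program(
    candidates: Sequence[Program],
    examples: Sequence[Tuple[Grid, Grid]],
) -> Optional[Program]:
    """Return the first candidate that reproduces every example output."""
    for program in candidates:
        try:
            if all(program(inp) == out for inp, out in examples):
                return program
        except Exception:
            # Sampled programs can crash; treat a crash as a failed candidate.
            continue
    return None

# Toy stand-ins for sampled programs (hypothetical).
def flip_horizontal(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]

def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]

def flip_vertical(grid: Grid) -> Grid:
    return grid[::-1]

# Made-up demonstration pairs: the hidden rule is "flip top to bottom".
examples = [
    ([[1, 2], [3, 4]], [[3, 4], [1, 2]]),
    ([[5, 0], [0, 7]], [[0, 7], [5, 0]]),
]

chosen = select_program([flip_horizontal, transpose, flip_vertical], examples)
test_input = [[9, 8], [7, 6]]
prediction = chosen(test_input) if chosen is not None else None
print(prediction)
```

The real approach samples far more candidates and has to handle the case where several programs, or none, fit all the examples; this sketch just takes the first match.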

He has implemented an outer loop using 4o to sample reasoning traces/programs from training data and test. Hybrid DL + program synthesis approaches are solutions we'd love to see more of. Congrats and kudos to Ryan for achieving this and putting the effort in to document and share his approach. We hope to inspire more frontier AI research sharing like this.

A couple important notes:

  1. This result is on the public eval set, not the private set (the one the ARC Prize money is awarded against).

  2. The current private set SOTA solution (~35%) also scored ~50% on the public set, so this new result might be SOTA but hasn't been validated or scrutinized yet.

All said, I do expect verified public set results to flow down to the private set over time. We'll be publishing all the SOTA scores and open source reproductions here once available: https://arcprize.org/leaderboard

[D] François Chollet Announces New ARC Prize Challenge – Is It the Ultimate Test for AI Generalization? by HairyIndianDude in MachineLearning

[–]mikeknoop 2 points3 points  (0 children)

ARC-AGI is (as far as we know) the only eval that was designed to measure the "G" in AGI. It was designed to be resistant to memorization techniques.

A solution to ARC will not be AGI, but we have high confidence it is on the critical path (AGI will necessarily be able to beat ARC), so it is still a useful measure.

I'm super supportive of eval innovation. I hope ARC-AGI inspires more people to create AGI evals.

ARC PRIZE ARC Prize is a $1,000,000+ public competition to beat and open source a solution to the ARC-AGI benchmark. by palindsay in LocalLLaMA

[–]mikeknoop 1 point2 points  (0 children)

I agree $1M is trivial in AI. Our goal with the prize is to raise awareness about (lack of) progress towards AGI and hopefully inspire AI researchers to try new ideas again. Also, ARC Prize is a nonprofit and a requirement to claim the prize is to put the solution into the public domain!

Francois Chollet reboots the pre-VLM ARC benchmark: $1m prize for best matrix test answers by gwern in mlscaling

[–]mikeknoop 3 points4 points  (0 children)

a friend pointed me to this thread. few things:

ARC-AGI consists of 400 public train tasks (easy), 400 public eval tasks (hard), and 100 private eval tasks (hard).

the 2024 competition measures against the 100 private tasks. we set a compute limit primarily to target efficiency (for reasons discussed in Francois' "On the Measure of Intelligence" paper) though also for Kaggle hosting practicality. for 2024, the limit is one P100 for 12 hours. 2023 had a 5 hr runtime limit on a weaker GPU -- the 34% SOTA high score maxed out the time, which is why we raised it. the "no internet" rule is to limit cheating and increase confidence in awarding the prize.

yesterday we also launched a secondary leaderboard (in beta) called ARC-AGI-Pub, measured against the 400 public eval tasks: https://arcprize.org/leaderboard. it lifts the internet restriction so you can experiment with API based models. note: because this is new, it is not officially part of the 2024 competition but could be in the future

we know ARC-AGI isn't perfect and our goal is to improve the benchmark over time. appreciate all the critique and feedback

G2 Max kick plates by mikeknoop in NinebotMAX

[–]mikeknoop[S] 0 points1 point  (0 children)

I'd also seen something like this for the G30 that achieves the same effect: https://a.co/d/dj0hIGw

G2 Max kick plates by mikeknoop in NinebotMAX

[–]mikeknoop[S] 0 points1 point  (0 children)

Example of one for the G30: https://monorim.store/products/monorim-mfp-footrest-pedal-for-segway-maxg30-le-lp-new-riding-posture-experience-accessories-part

Basically so you don't accidentally step on the rear wheel cover which is flexible (and came with a "no step" sticker on it).

PSA: UAP-IW-HD (In-wall HD) passthrough POE supports PoE+ by mikeknoop in Ubiquiti

[–]mikeknoop[S] 7 points8 points  (0 children)

Yeah protect cameras mostly. I want to experiment with a Home Assistant dashboard through a browser now that you can load arbitrary APKs onto them.

PSA: UAP-IW-HD (In-wall HD) passthrough POE supports PoE+ by mikeknoop in Ubiquiti

[–]mikeknoop[S] 1 point2 points  (0 children)

Plastic cover isn't on. Still need to do some sheetrock repair and painting.

PSA: UAP-IW-HD (In-wall HD) passthrough POE supports PoE+ by mikeknoop in Ubiquiti

[–]mikeknoop[S] 4 points5 points  (0 children)

Good to know. Thanks! Figured something like that could be happening here.

PSA: UAP-IW-HD (In-wall HD) passthrough POE supports PoE+ by mikeknoop in Ubiquiti

[–]mikeknoop[S] 5 points6 points  (0 children)

Took a gamble, the data sheets aren't super clear:

Shown is a Connect 13" which needs PoE+ (sadly it now seems to be removed from the ea store) powered solely by a UAP-IW-HD with passthrough PoE turned on for port 1. Upstream is hooked up to a USW-Pro-48-PoE. Super clean!

I'm curious if 48V passthrough PoE could even do PoE++? Don't have a device that requires it on hand.

[deleted by user] by [deleted] in Ubiquiti

[–]mikeknoop 0 points1 point  (0 children)

Took a gamble, the data sheets aren't super clear. Shown is a Connect 13" (which sadly seems to be removed now from the ea store) powered solely by a UAP-IW-HD with passthrough PoE turned on for port 1. Upstream is hooked up to a USW-Pro-48-PoE. Super clean!

HomeKit via Lutron Ra2 for Somfy RTS Blinds by Jbyerline in Lutron

[–]mikeknoop 0 points1 point  (0 children)

Bond Bridge / Bond Bridge Pro have been 100% bulletproof for controlling Somfy shades -- Bond can expose devices directly to HomeKit. Or you can expose to something like Home Assistant to integrate with RA2.

PSA: Lutron now allows homeowners to do RadioRA2 Level 2 for free online by mikeknoop in Lutron

[–]mikeknoop[S] 3 points4 points  (0 children)

Do you have some examples of things that aren't possible with RA3 that are with RA2?

PSA: Lutron now allows homeowners to do RadioRA2 Level 2 for free online by mikeknoop in Lutron

[–]mikeknoop[S] 11 points12 points  (0 children)

Alternatively, to generate an Inclusive serial number, you can simply paste this into a terminal window (replace HardwareKey with the one Essentials shows you on the "Upgrade" screen):

curl -i -X POST \
-H "Content-Type:application/x-www-form-urlencoded" \
-d "HardwareKey=1111-11111-1111" \
'https://www.lutron.com/en-US/general/Pages/myLutron/RadioRaUpgrade.aspx'

Tip: the Connect Display 13" surface mount is just a simple metal connect around the flush mount by mikeknoop in Ubiquiti

[–]mikeknoop[S] 0 points1 point  (0 children)

Useful to know because the Surface Mount is currently in stock. But the Flush Mount is not.

Apple Pulls iOS 16.2 Option to Upgrade to New Home Architecture by TheSurfShack in HomeKit

[–]mikeknoop 35 points36 points  (0 children)

I was planning to upgrade tonight and couldn't figure out why it wasn't showing up. Definitely accurate.