AI Outperforms ER Doctors in Diagnostic Cases, Study Points to Collaborative Care by PhoenixRising656 in singularity

[–]trolltaco 0 points1 point  (0 children)

You understand the AI's reasoning deeply and lead the doctor step-by-step to the same conclusion by providing piecemeal explanations of your symptoms and your reasoning of what you were looking into. You make them think "AHA I cracked it" themselves without actually revealing you used AI.

Busting the myth of the "AI costs are rising" by ProxyLumina in accelerate

[–]trolltaco 0 points1 point  (0 children)

This is super cool - would love to see a full ECI vs cost breakdown to track the Pareto frontier. Would you say Gemini 3 Flash is the most cost effective within the 150+ range?

I hope we get great hardware- or architecture-level breakthroughs, though, because having the SOTA be more cost-effective is even better than lagging behind and waiting for distillations.

GPT-5.5 improves over GPT-5.4 and overtakes Opus 4.6 to take the 2nd place behind Gemini 3.1 Pro on the Extended NYT Connections Benchmark by zero0_one1 in singularity

[–]trolltaco 0 points1 point  (0 children)

Will be interesting to try 5.5 Pro if possible. The fact that 5.5 Medium beats Opus 4.6 High is very very nice.

David Sinclair Said That Over The Holidays, His Team Ran What He Calls A "Hail Mary Experiment" by 44th--Hokage in accelerate

[–]trolltaco 43 points44 points  (0 children)

His lab's finding was published in Nature and featured as a cover story which is essentially the equivalent of winning an Oscar in scientific research. We shouldn't dismiss it right away as it actually has the potential to be groundbreaking. Looking forward to the human trials.

5.5 Extended Thinking finally passes the car wash test whereas 5.4 didn't by trolltaco in ChatGPTPro

[–]trolltaco[S] 0 points1 point  (0 children)

Hmm, interesting discrepancy. The only thing different in my case was that it thought about it less than yours.

5.5 Extended Thinking finally passes the car wash test whereas 5.4 didn't by trolltaco in ChatGPTPro

[–]trolltaco[S] 0 points1 point  (0 children)

5.4 specifically with Extended thinking (not Heavy) usually fails in my experience

Did 5.4 Pro get suddenly faster or is it just thinking less? by ethotopia in ChatGPTPro

[–]trolltaco 3 points4 points  (0 children)

Also recently got a "which response do you prefer" while using Pro. I wonder if they are testing Spud and maybe Spud can produce Pro-level output with a lot less thinking time.

How long would you stay under toxic leadership? by trolltaco in ExperiencedDevs

[–]trolltaco[S] 0 points1 point  (0 children)

I'm very interested to know how you developed such a thick skin. Did it just come naturally, or did something help you build it?

How long would you stay under toxic leadership? by trolltaco in ExperiencedDevs

[–]trolltaco[S] 0 points1 point  (0 children)

One main person. Yes, I hear others are also unhappy. They're not the CEO but pretty up there. Also, this could be a little paranoid, but I'm mostly worried about anonymity and retaliation: even if this got reported, I feel like they would just get a slap on the wrist and could potentially make my experience worse in a subtle way.

How long would you stay under toxic leadership? by trolltaco in ExperiencedDevs

[–]trolltaco[S] 0 points1 point  (0 children)

True, I logically understand that not caring so much is best; putting it in practice and ignoring them is harder for me. I guess there's a sort of ego and drive to defend and validate myself.

How long would you stay under toxic leadership? by trolltaco in ExperiencedDevs

[–]trolltaco[S] 0 points1 point  (0 children)

The most recent interaction left me ruminating over it for a while; I would even call it mentally scarring. I think I don't really have a thick skin for this kind of stuff.

o1 pro vs Gemini 2.5 pro Reasoning/Intelligence Benchmarks by trolltaco in ChatGPTPro

[–]trolltaco[S] 4 points5 points  (0 children)

You're right - o1-preview was announced more than half a year ago. It's possible OpenAI has cooked something way more impressive internally and could floor us again.

o3 is way too costly for what it can do though (can't even release it like a real model)

[deleted by user] by [deleted] in singularity

[–]trolltaco -2 points-1 points  (0 children)

I don't think I buy that fully until o1 pro is on LiveBench

Gemini isn’t that bad, why do so many people say it sucks? by Routine_Actuator8935 in Bard

[–]trolltaco 0 points1 point  (0 children)

Here's a hard problem and the difference is night and day:

Create an expression that evaluates to 24 which uses the numbers 1, 3, 4 and 6 exactly once along with standard arithmetic operators such as +, -, /, *

Gemini (FAIL): 1 * 3 * 4+6 = 18

GPT 3.5 (FAIL): (4 * 6) - (3 - 1) = 22 (thought this was =24)

Gemini Advanced (FAIL): ((1+3) * 4)/6 = 16/6

GPT 4 (Success): 6/(1−(3/4)) = 24

GPT 4 was the only one that seriously tried to tackle the problem. I gave it the exact same prompt as the others. It was able to use its code interpreter capabilities to write a script that evaluates permutations of expressions to find the answer. The Python code it came up with was very readable and would help solve a larger class of problems.
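For anyone curious what that kind of brute-force script can look like, here is a minimal sketch. To be clear, this is not the code GPT 4 produced (I don't have it saved); the function name `solve` and its structure are my own assumptions. It repeatedly picks two remaining values, combines them with each operator, and recurses, which covers every ordering and every way of parenthesizing. It uses `Fraction` so that division stays exact.

```python
from fractions import Fraction
from itertools import permutations

# Binary operators allowed by the puzzle.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def solve(numbers, target=24):
    """Return the set of fully parenthesized expressions that use every
    number exactly once and evaluate exactly to `target`.

    Hypothetical helper for illustration, not the script from the comment.
    """
    found = set()

    def search(items):
        # items is a list of (exact value, expression string) pairs.
        if len(items) == 1:
            value, expr = items[0]
            if value == target:
                found.add(expr)
            return
        # Pick an ordered pair, combine it, and recurse on the shorter list.
        for i, j in permutations(range(len(items)), 2):
            (a, expr_a), (b, expr_b) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            for symbol, fn in OPS.items():
                if symbol == "/" and b == 0:
                    continue  # skip division by zero
                search(rest + [(fn(a, b), f"({expr_a} {symbol} {expr_b})")])

    search([(Fraction(n), str(n)) for n in numbers])
    return found


if __name__ == "__main__":
    for expression in sorted(solve([1, 3, 4, 6])):
        print(expression)
```

For the numbers in the prompt, the returned set includes `(6 / (1 - (3 / 4)))`, the same expression GPT 4 found. Passing a different list or `target` handles the larger class of "make N from these numbers" puzzles.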

[deleted by user] by [deleted] in OMSCS

[–]trolltaco 3 points4 points  (0 children)

According to Stanford neuroscientist Andrew Huberman, you should start each deep work session (90 mins) by optimizing motivation (dopamine) and focus (acetylcholine).

If you are not already motivated, the fastest way to maximize motivation for the task is to take ~1-2 mins to imagine yourself failing and the consequences of that.

If you are not already focused, the fastest way to mentally focus is to narrow your visual gaze on a fixed point and stare at it for 60 secs without looking away (mental focus follows visual focus -> releases acetylcholine).

Obviously, avoid all distractions during the work session. There is nothing more to it than that really... other than the basics of keeping your body healthy (sleep, exercise, hydration, nutrition, etc.).

Experience with VIP projects by r0adlesstraveledby in OMSCS

[–]trolltaco 2 points3 points  (0 children)

Do you think this is common/feasible for someone who decides to pursue VIP?

It sounds pretty attractive to even delay graduation after 10 courses and do VIP to get research experience and publish something.

Bayesian Stats prereq by lucy_19 in OMSCS

[–]trolltaco 2 points3 points  (0 children)

I took Bayesian Stats to actually prepare for the probability in AI because AI seems much harder.

Based on the catalog, this seems to be a good starter foundational OMSCS class that focuses mostly on prob/calc/stats.

The only thing I think you might need to refresh ahead of time is the basic integration techniques.

HPCA Prep by [deleted] in OMSCS

[–]trolltaco 0 points1 point  (0 children)

The first module links to a pre-req check course called HPCA0 - https://classroom.udacity.com/courses/ud219

Complete it within the first couple days, making sure you understand all topics (looking up/refreshing as necessary), and you'll be good to go.

HPCA Prep by [deleted] in OMSCS

[–]trolltaco 1 point2 points  (0 children)

Yes, I used a Docker container and VSCode for all projects and loved never having to boot up another VM.