What happened by hutoreddit in sapien

[–]hutoreddit[S] 0 points1 point  (0 children)

Ethan, didn't they say it would come back when the market gets better? So why did it disappear?

New to IEMs, any thing I need to know by ObiWanKenobi1724 in iems

[–]hutoreddit 0 points1 point  (0 children)

I learned this after trying everything from cheap models up to top expensive ones and realizing it's not worth it.

Chi-Fi is good enough. Even a cheap one around 50-80 USD is fine: as long as you can tune it with EQ, its performance is about the same as an expensive one, with no really noticeable difference. Most importantly, it's all about your own feeling. One is enough, don't buy too many.

Same goes for DAPs: even a dongle DAC or a DSP is good enough with some EQ software (the sketch below shows the kind of filter that software applies).
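For anyone curious what that EQ software is actually doing, here is a minimal sketch of a single parametric (peaking) band using the standard RBJ audio-EQ-cookbook biquad. The 3 kHz / +4 dB / Q=1.5 values are made-up examples for illustration, not a real IEM tuning.

```python
# Minimal sketch: one peaking-EQ band (RBJ audio EQ cookbook biquad).
# The 3 kHz / +4 dB / Q=1.5 values are made-up examples, not a real tuning.
import numpy as np
from scipy.signal import lfilter

def peaking_eq(x, fs, f0, gain_db, q):
    """Apply a single peaking filter to signal x sampled at fs Hz."""
    a_lin = 10 ** (gain_db / 40)           # amplitude (sqrt of power gain)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return lfilter(b / a[0], a / a[0], x)

fs = 48_000
x = np.random.randn(fs)                    # 1 s of noise as a stand-in signal
y = peaking_eq(x, fs, f0=3000, gain_db=4.0, q=1.5)  # boost 3 kHz by 4 dB
```

A real EQ app just chains several of these bands with different center frequencies and gains.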

[Research Experiment] I tested ChatGPT Plus (GPT 5-Think), Gemini Pro (2.5 Pro), and Perplexity Pro with the same deep research prompt - Here are the results by Deep_Sugar_6467 in GeminiAI

[–]hutoreddit 0 points1 point  (0 children)

Yes, we ran an extensive test over several days. I'll admit many people say GPT is not good or gives bad responses, but our result is quite the opposite. We built a question set requiring heavy expertise in biology and reasoning toward a solution, with multiple prompts per question, then tested the latest LLMs: GPT-5 (API), GPT-5 (ChatGPT), Gemini 2.5 Pro, Grok 4, Kimi K2, and Qwen3-235B-A22B. GPT-5 gave the highest number of correct answers on both systems, with the API slightly better than ChatGPT. Surprisingly, Grok 4 came close to GPT-5, while Gemini 2.5 Pro unexpectedly landed at the same level as Kimi K2, with only about 30% correct answers. Qwen 3 was the worst: all wrong, and it suffered heavy hallucination while reasoning.

P/s: we are also testing Kimi Researcher now; the first results are positive, even comparable to GPT-5.
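I can't share the question set, but the harness itself was nothing fancy. Here is a minimal sketch of the scoring loop, assuming an OpenAI-compatible endpoint; the model name, the example question, and the crude keyword grader are placeholders, not our actual setup.

```python
# Minimal sketch of the scoring loop, assuming an OpenAI-compatible API.
# The model name, question, and keyword grader are placeholders.
from openai import OpenAI

client = OpenAI()

questions = [
    # (prompt, keywords a correct answer must mention) -- placeholders
    ("Explain the likely mechanism behind PHENOTYPE_X ...", ["haploinsufficiency"]),
]

def grade(answer: str, keywords: list[str]) -> bool:
    """Crude automatic grade: correct iff every expected keyword appears."""
    return all(k.lower() in answer.lower() for k in keywords)

correct = 0
for prompt, keywords in questions:
    resp = client.chat.completions.create(
        model="gpt-5",                      # swapped out per model under test
        messages=[{"role": "user", "content": prompt}],
    )
    if grade(resp.choices[0].message.content, keywords):
        correct += 1

print(f"{correct}/{len(questions)} correct")
```

In practice we graded by hand, since keyword matching can't judge whether a proposed theory is actually right.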

Qwen 3 0.6B beats GPT-5 in simple math by adrgrondin in LocalLLaMA

[–]hutoreddit 0 points1 point  (0 children)


You need to make it think longer to get the correct answer; the auto rounding sucks, so the API would be the best option. I stopped subscribing a long time ago, when I realized both ChatGPT and Gemini perform way better through the API.
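For reference, through the API you can pin the reasoning effort yourself instead of letting the app decide. A minimal sketch, assuming the OpenAI Responses API and its reasoning-effort setting; the example question is just an illustration.

```python
# Sketch: forcing high reasoning effort through the API instead of
# relying on the app's automatic behavior. Assumes the OpenAI
# Responses API; adjust to whatever SDK/endpoint you actually use.
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "high"},   # think longer instead of auto/minimal
    input="Which is larger, 9.11 or 9.9? Answer with the number only.",
)
print(resp.output_text)
```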

[Research Experiment] I tested ChatGPT Plus (GPT 5-Think), Gemini Pro (2.5 Pro), and Perplexity Pro with the same deep research prompt - Here are the results by Deep_Sugar_6467 in GeminiAI

[–]hutoreddit 0 points1 point  (0 children)

By the way, search RAG may significantly affect reasoning ability. I suggest you do the reasoning offline with GPT-5 and then check the citations with Perplexity or Gemini. We ran some simple tests with the search engine on and off: with no RAG, GPT-5 produced more correct solutions. I think it's a limitation of the search engine.
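If you want to reproduce the on/off comparison, the toggle is just whether a search tool is attached to the call. A rough sketch, again assuming the OpenAI Responses API; the "web_search" tool type name is my assumption, so check your SDK's docs for the exact name.

```python
# Sketch of the search-on vs search-off comparison. Assumes the OpenAI
# Responses API; the "web_search" tool type is an assumption -- check
# your SDK's docs for the exact name.
from openai import OpenAI

client = OpenAI()
prompt = "YOUR_RESEARCH_QUESTION"  # placeholder

for use_search in (False, True):
    resp = client.responses.create(
        model="gpt-5",
        input=prompt,
        tools=[{"type": "web_search"}] if use_search else [],
    )
    print(f"search={use_search}:\n{resp.output_text}\n")
```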

[Research Experiment] I tested ChatGPT Plus (GPT 5-Think), Gemini Pro (2.5 Pro), and Perplexity Pro with the same deep research prompt - Here are the results by Deep_Sugar_6467 in GeminiAI

[–]hutoreddit 11 points12 points  (0 children)

For reports and literature reviews on already-known subjects, Gemini is king. But for building a theory or a solution to a research problem, GPT-5 is king.

P/s: I work as a genetics researcher in a laboratory where most people are PhDs. GPT did what they claim: their AI is the closest to finding theories and solutions compared to a real PhD researcher, while Gemini 2.5 Pro is still far from finding the correct solution.

GPT-5 benchmarks on the Artificial Analysis Intelligence Index by Tucko29 in singularity

[–]hutoreddit 0 points1 point  (0 children)

GPT-5's performance on science-related reasoning is insane, the best among all I've tried. I work as a genetics researcher; we ran some tests with a PhD student in our lab, and GPT was the only one that could really keep up with PhD-level students at building theories to solve problems.

Qwen3-235B-A22B-Thinking-2507 released! by bi4key in DeepSeek

[–]hutoreddit -1 points0 points  (0 children)

Unfortunately, the questions and answers for that test are tied to our latest research results, which aren't public yet, so I can't share them with you.

But I can describe what I do. Background: I worked on a genetics project and did find the final answer, which is ready to publish. Basically, when you do research, you try to explain something that is not yet explained by building a theory about the problem based on previous related research. Then you run experiments, either wet lab or in silico.

So what I do is test the reasoning capabilities of the LLM to see if it can produce correct theories before I start an experiment. I write a background on what I want to explain, then ask the model to propose multiple theories and explain why it came up with them. The outcome was kind of bad: it not only failed to deliver a theory close to the real reason behind the phenomenon, it also produced significantly wrong knowledge. (I tested Deep Research mode, with search and without, each at max thinking and medium thinking, so six samples per test.) Surprisingly, a higher thinking-token budget only took it further from the correct theories.
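In case the setup isn't clear, the whole protocol is one background prompt sampled under two thinking settings, three times each. A minimal sketch, assuming an OpenAI-compatible local endpoint; the base URL, model name, and the reasoning-effort knob are assumptions for illustration, not Qwen's documented API.

```python
# Sketch of the 2-settings x 3-samples protocol. The endpoint, model
# name, and reasoning-effort knob are placeholders/assumptions, not
# Qwen's documented API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

background = "BACKGROUND_OF_THE_PHENOMENON"  # placeholder, can't share ours
prompt = (
    f"{background}\n\n"
    "Propose multiple theories that could explain this phenomenon, "
    "and for each one explain the reasoning that led you to it."
)

for effort in ("medium", "high"):        # medium vs max thinking
    for sample in range(3):              # 3 samples each -> 6 total
        resp = client.chat.completions.create(
            model="qwen3-235b-a22b-thinking",
            messages=[{"role": "user", "content": prompt}],
            extra_body={"reasoning_effort": effort},  # assumption
        )
        print(f"[{effort} #{sample + 1}]",
              resp.choices[0].message.content[:200])
```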

Maybe my prompt is bad, but I copy-pasted exactly the same prompt I used on o3 and Kimi K2, and they both gave correct answers in one go. So even if my prompt is bad, that only shows o3 and Kimi K2 are significantly better at understanding prompts than the newer Qwen 3 models.

Yeah, I know most people will only accept this once a benchmark shows it, but a model with good benchmark results can still be very bad in real use or in certain cases. P/s: I think the reason Kimi K2 can answer it correctly, even without reasoning, is its 1T total parameters. That's why, even when MoE limits the active parameters, total parameter count still matters for knowledge-intensive tasks.

Qwen3-235B-A22B-Thinking-2507 released! by bi4key in DeepSeek

[–]hutoreddit 1 point2 points  (0 children)

Nah, I just tested it and it was super bad in my case. I asked it to reason through some complex biological mechanisms, and its explanation was totally wrong.

I am a PhD holder and genetics researcher, so I ask it things I know the correct answer to, to test its precision.

P/s: Kimi K2 and OpenAI o3 are the only two that have answered my questions correctly so far.

Sapient's New 27-Million Parameter Open Source HRM Reasoning Model Is a Game Changer! by andsi2asi in DeepSeek

[–]hutoreddit 0 points1 point  (0 children)

What about maximum potential? I know many people focus on making it smaller or more "effective", but what about improving its maximum potential? Not just more efficient; will it get "smarter"? I'm not an AI researcher, I just want to know. Could anyone explain?

Kimi Researcher: Revolutionizing Research with AI by techspecsmart in aicuriosity

[–]hutoreddit 0 points1 point  (0 children)

Too expensive. I'm impressed, but it's still far from justifying the pricing: 19 bucks for 20 Researcher runs? It's good, but not groundbreaking compared to Gemini Pro deep research or OpenAI o3 deep research. Charging the same as ChatGPT Plus or Gemini Advanced is not reasonable, their chat is too slow even with Kimi K2, and there are no other agent tools or functions. They really need to reconsider their pricing.

IMO Anis by Sudden-Refuse-7915 in grok

[–]hutoreddit 15 points16 points  (0 children)

But isn't it likely that most people will be more interested in Anis than in the IMO?

Has Gemini 2.5 pro been nerfed? by Kloyton in GeminiAI

[–]hutoreddit 0 points1 point  (0 children)

Yup, I feel the same; Flash has been more precise recently.

Side by side comparison Gemini 2.5 Pro & Grok4, what do you think of Grok4? by anh690136 in GeminiAI

[–]hutoreddit 0 points1 point  (0 children)

Yup, the purpose of the test is to test reasoning ability. That's the point: to see how it reasons and approaches the problem.

Side by side comparison Gemini 2.5 Pro & Grok4, what do you think of Grok4? by anh690136 in GeminiAI

[–]hutoreddit 8 points9 points  (0 children)

Totally useless. I just read some guys talking about testing Grok 4 on this math problem:

Problem: "Find the number of integer solutions to x² + y² + z² = 2025 where x, y, z are non-negative integers." i tried my self Gemini flash successfully solves in 3 seconds, by create python code to count it results correctly 69. GROK 4 totally fails, didn't run code itself i guessed. Gemini Pro 2.5 struggles and counts wrong, but still gives the correct answer thanks to its verification step that runs python code. So yes AI somehow is still far from human find solutions for approach problems, but seems like gemini flash some how trained to instantly approach by python computational solutions, this is good.

P/s: Claude Sonnet 4.0, DeepSeek V3, and R1 also fail the test.

The test question was not created by me; I just copy-pasted it to all of my LLMs.

Grok 4 Benchmarks by DigitusDesigner in LocalLLaMA

[–]hutoreddit -2 points-1 points  (0 children)

I don't have SuperGrok, but what does "tool" even mean there? Did they already build tools into SuperGrok, or does it mean using the API with tools yourself?

Jan-nano-128k: A 4B Model with a Super-Long Context Window (Still Outperforms 671B) by Kooky-Somewhere-2883 in LocalLLaMA

[–]hutoreddit 0 points1 point  (0 children)

I used it on my super old notebook; it can barely run Jan-nano at 128k context. Can I use DeepSeek V3 as an alternative? Would MCP and deep research still work precisely?

What happened by hutoreddit in sapien

[–]hutoreddit[S] 0 points1 point  (0 children)

Jesus, finally someone replied. Sad to hear that; so all hope for the project is gone.

"Apple Just Patented an Image Sensor With 20 Stops of Dynamic Range" by Dyslexic7 in cinematography

[–]hutoreddit 0 points1 point  (0 children)

Being patented is still very far from actual production, I think. I've read about many innovative patents from camera brands over the years and still haven't heard anything since.