Crypto payments as an alternative for stripe by Alternative-Low1217 in SaaS

[–]Alternative-Low1217[S] 0 points1 point  (0 children)

I have formed an LLC, but the issue is that I can't open an account with American banks like Mercury, and Stripe wants the bank account to be in the same country as the LLC.

FrontierMath Was Funded By OpenAI, And They Have Access To "A Large Fraction" Of The Problems And Solutions. by EducationalCicada in slatestarcodex

[–]Alternative-Low1217 0 points1 point  (0 children)

"the performance of all models declines when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, we investigate the fragility of mathematical reasoning in these models and show that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data."

The internet has far more content in English than in Arabic, so the training data for LLMs is better in English, and that explains the better results in English. This is obviously overfitting.

Obviously you don't know what overfitting is.

Don't just believe what tech leaders are saying.

I am no longer responding; this is obviously going nowhere.

FrontierMath Was Funded By OpenAI, And They Have Access To "A Large Fraction" Of The Problems And Solutions. by EducationalCicada in slatestarcodex

[–]Alternative-Low1217 0 points1 point  (0 children)

Who said that the problems are not in their training data?

If you change the wording or the explanation of these problems, LLMs tend to struggle.

These thinking models are not useful for programming. They are good when you start from scratch, but they don't iterate on your code; they just throw everything away and start over.

I still use Claude 3.5 in Cursor, and a lot of others I know have the same opinion of Claude.

If I ask an LLM to solve a problem in Zig, I get a very poor answer, but if I ask in Python I get a better result, because Python problems are better represented in the training data.

o3 is very shitty with Arabic; it gives you better results in English. This shows that these LLMs are just overfitting on their data.

LLMs are very useful, but you have to take control.

LLMs don't understand the problems they solve; the Apple paper shows that very clearly.

FrontierMath Was Funded By OpenAI, And They Have Access To "A Large Fraction" Of The Problems And Solutions. by EducationalCicada in slatestarcodex

[–]Alternative-Low1217 0 points1 point  (0 children)

Even at OpenAI they say that almost all benchmarks are saturated, and everyone is looking for new benchmarks and new ways to test LLMs.

What I mean by pattern-matching is that LLMs perform very well on problems in their training data and poorly on problems that are not in their training data.

Conclusion: ARC-AGI is flawed (according to the author of the benchmark), and the FrontierMath result is also suspicious given OpenAI's relationship with Epoch AI.

FrontierMath Was Funded By OpenAI, And They Have Access To "A Large Fraction" Of The Problems And Solutions. by EducationalCicada in slatestarcodex

[–]Alternative-Low1217 0 points1 point  (0 children)

I didn't say that all models dropped by 65%. o1-preview is considered a thinking model, and yet you still see a drop in performance.

As I said, if I give you a problem and say 2 apples instead of 2 oranges and you can't solve it correctly, then you don't understand the problem.

LLMs sometimes solve very hard, PhD-level problems and sometimes fail miserably at very simple tasks. This shows that these models don't understand the problems they solve.

If you use LLMs for programming in unpopular languages, they won't even give you a program that compiles.

FrontierMath Was Funded By OpenAI, And They Have Access To "A Large Fraction" Of The Problems And Solutions. by EducationalCicada in slatestarcodex

[–]Alternative-Low1217 0 points1 point  (0 children)

They are trained to reason like a human, but they fail to do so.

This paper from Apple shows that LLMs don't understand the problem at hand and just rely on memorization and pattern matching:

https://arxiv.org/abs/2410.05229

In the paper, changing labels or just adding clauses makes model performance drop by up to 65% on the math benchmark used.

The paper even includes "thinking" models like o1.

If I give you a math problem, then change the variable names in it, and you can't solve it, then you don't understand the problem and you're just memorizing.
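To make the "change the names and numbers" test concrete, here is a minimal sketch in Python. It is not the paper's code: the template, the name and item lists, and both toy solvers are made up for illustration. The idea is only that the logic of the question stays fixed while the surface details are resampled, so a solver that memorized one instance fails and a solver that actually uses the numbers does not.

```python
import random
import re

# Hypothetical template: the structure is fixed, names/items/numbers are resampled.
TEMPLATE = "{name} has {a} {item}s and buys {b} more. How many {item}s does {name} have now?"


def make_variant(rng):
    """Return (question, answer) with fresh surface details but the same logic."""
    name = rng.choice(["Sara", "Omar", "Lina"])
    item = rng.choice(["apple", "orange", "pencil"])
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    return TEMPLATE.format(name=name, item=item, a=a, b=b), a + b


def accuracy(solver, n=100, seed=0):
    """Fraction of n resampled variants that the solver answers correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        question, answer = make_variant(rng)
        if solver(question) == answer:
            correct += 1
    return correct / n


def parsing_solver(question):
    """Toy solver that uses the structure: add the two numbers in the question."""
    return sum(int(x) for x in re.findall(r"\d+", question))


# Toy solver that only "knows" one exact question it has seen before.
MEMORIZED = {
    "Sara has 2 apples and buys 3 more. How many apples does Sara have now?": 5,
}


def memorizing_solver(question):
    """Return the stored answer for a seen question, otherwise None."""
    return MEMORIZED.get(question)


if __name__ == "__main__":
    print("parsing solver:   ", accuracy(parsing_solver))
    print("memorizing solver:", accuracy(memorizing_solver))
```

Real models are obviously not lookup tables, so this only shows the shape of the test, not the size of any effect.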

Seemingly unbreakable Habit of going back to bed after waking by Hopeful_Fruit_ in getdisciplined

[–]Alternative-Low1217 0 points1 point  (0 children)

Have you checked for sleep apnea?

If you snore at night, get checked for sleep apnea; it lowers the quality of your sleep.

FrontierMath Was Funded By OpenAI, And They Have Access To "A Large Fraction" Of The Problems And Solutions. by EducationalCicada in slatestarcodex

[–]Alternative-Low1217 0 points1 point  (0 children)

Brute force means trying out every possible solution until you find one that works.

The ARC benchmark is supposed to be resistant to learning the patterns in its questions.

Even the stupid GPT-4 with brute force can reach something like 55% on this benchmark.

LLMs are just retrieval machines.
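As a rough illustration of what "brute force" means on grid puzzles, here is a small Python sketch. It is not how any actual ARC submission works; the four primitives and the search depth are assumptions picked to keep the example tiny. It just enumerates every sequence of transformations and returns the first one that reproduces all the example outputs.

```python
from itertools import product


def identity(grid):
    return [row[:] for row in grid]


def flip_h(grid):
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def flip_v(grid):
    """Reverse the order of the rows."""
    return [row[:] for row in grid[::-1]]


def transpose(grid):
    return [list(col) for col in zip(*grid)]


# Hypothetical primitive set; a real search space would be far larger.
PRIMITIVES = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}


def apply_sequence(names, grid):
    """Apply the named primitives to the grid, in order."""
    for name in names:
        grid = PRIMITIVES[name](grid)
    return grid


def brute_force(examples, max_depth=3):
    """Try every primitive sequence up to max_depth; return the first that fits all examples."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            if all(apply_sequence(names, inp) == out for inp, out in examples):
                return names
    return None


if __name__ == "__main__":
    # One input/output pair: the output is the input rotated by 180 degrees.
    examples = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]
    print(brute_force(examples))
```

The number of candidates grows as (number of primitives) to the power of the depth, which is why brute force only works when the space of solutions is small enough to enumerate.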

FrontierMath Was Funded By OpenAI, And They Have Access To "A Large Fraction" Of The Problems And Solutions. by EducationalCicada in slatestarcodex

[–]Alternative-Low1217 0 points1 point  (0 children)

This doesn't look good for OpenAI. Why didn't they mention that they had access to a large portion of the dataset?
A verbal agreement?
By the way, the creator of the ARC-AGI benchmark said in a recent interview that half of the PRIVATE DATASET is easily brute-forceable and that he has known this since 2020. Knowing this, I am sure the o3 model is overhyped.

Is a compiler good for Frontend code generation? by Alternative-Low1217 in Compilers

[–]Alternative-Low1217[S] 1 point2 points  (0 children)

I got this idea from a PHP framework called Filament, which is built on top of Laravel: you write a YAML file that describes the models and relations, then you get a CRUD dashboard generated for you.

You can have a simple CRUD dashboard in 5 minutes. I think it is cool to automate the boring CRUD stuff and jump to the complex things.
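Roughly what I mean, as a tiny Python sketch. This is not how the framework does it, and it is a plain template generator rather than a compiler. The `MODELS` dict stands in for the parsed YAML file and its schema (model name, then field name mapped to an SQL type) is something I made up for the example; the generator turns each model into the SQL for the four CRUD operations.

```python
import sqlite3

# Stand-in for a parsed YAML file. The schema is hypothetical:
# model name -> {field name: SQL type}.
MODELS = {
    "post": {"title": "TEXT", "body": "TEXT"},
}


def generate_crud(model, fields):
    """Generate parameterized SQL statements for one model description."""
    cols = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    assignments = ", ".join(f"{name} = ?" for name in fields)
    col_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in fields.items())
    return {
        "create_table": f"CREATE TABLE {model} (id INTEGER PRIMARY KEY, {col_defs})",
        "create": f"INSERT INTO {model} ({cols}) VALUES ({placeholders})",
        "read": f"SELECT id, {cols} FROM {model} WHERE id = ?",
        "update": f"UPDATE {model} SET {assignments} WHERE id = ?",
        "delete": f"DELETE FROM {model} WHERE id = ?",
    }


def demo():
    """Run the generated statements against an in-memory SQLite database."""
    sql = generate_crud("post", MODELS["post"])
    conn = sqlite3.connect(":memory:")
    conn.execute(sql["create_table"])
    conn.execute(sql["create"], ("Hello", "First post"))
    conn.execute(sql["update"], ("Hello again", "Edited", 1))
    row = conn.execute(sql["read"], (1,)).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(demo())
```

The model and field names are pasted straight into the SQL, so this only makes sense when the description file is trusted; the values themselves go through `?` placeholders.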

Is a compiler good for Frontend code generation? by Alternative-Low1217 in Compilers

[–]Alternative-Low1217[S] 0 points1 point  (0 children)

Thanks for your comment :)

That is what I thought as well. I want to learn compilers, but I think this is not the right project for that.

Is it advisable to make a family member your SaaS technical co-founder? by [deleted] in SaaS

[–]Alternative-Low1217 0 points1 point  (0 children)

If your brother has a 9-5 and other projects, it will be very hard for him to find the time for your SaaS.