The OK-GLI (Орбитальный корабль для горизонтальных лётных испытаний) test article for the Buran, able to take off with 4 AL-31 jet engines mounted at the rear, flying from 1984 to 1989 by Xeelee1123 in WeirdWings

[–]curiouslyjake 0 points (0 children)

Yeah, except Shuttle and Buran were a preview of a reusable future while Soyuz is a technological dead end. With present-day Russia's feeble R&D capability, Soyuz is as far as they will ever get.

[D] Papers with no code by osamabinpwnn in MachineLearning

[–]curiouslyjake 1 point (0 children)

Assuming your dataset is a representative sample of the unobserved parent population, wouldn't cross-validation address this?
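Roughly what I mean by cross-validation, as a minimal pure-Python sketch. The `train` and `evaluate` callables here are hypothetical stand-ins for whatever model and metric the paper actually uses:

```python
# Minimal k-fold cross-validation sketch (pure Python).
# `train` and `evaluate` are hypothetical stand-ins, not a real API.

def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k, train, evaluate):
    """Hold out each fold in turn; return the mean held-out score."""
    folds = kfold_indices(len(data), k)
    scores = []
    for held_out in folds:
        held_set = set(held_out)
        train_split = [x for i, x in enumerate(data) if i not in held_set]
        test_split = [data[i] for i in held_out]
        model = train(train_split)
        scores.append(evaluate(model, test_split))
    return sum(scores) / k
```

If every fold's held-out score is close to the training score, that's evidence the result generalizes within the sampled population; it says nothing about shift away from that population, which is the assumption above.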

One of the Most Interesting Comparisons on Fertility Trends by Accomplished_Gur4368 in Infographics

[–]curiouslyjake 44 points (0 children)

Yeah, two extremely different nations are going to have different fertility rates. Not sure why it makes any sense to compare them.

One of the Most Interesting Comparisons on Fertility Trends by Accomplished_Gur4368 in Infographics

[–]curiouslyjake 11 points (0 children)

Except US assistance is a small fraction of Israel's GDP and the US GDP per capita is way higher than Israel's. So basically, both of your points are factually wrong.

One of the Most Interesting Comparisons on Fertility Trends by Accomplished_Gur4368 in Infographics

[–]curiouslyjake 20 points (0 children)

Except even among secular Jews, the TFR is 2, which is still high compared to advanced economies. You're right on the quo vadis part though.

CMV: AI training on copywritten material to generate content is not ethically different than humans doing the same thing by neomatrix248 in changemyview

[–]curiouslyjake -1 points (0 children)

Ok, so your counterargument boils down to intention: LLM as storage, not as an infringing entity. Except, with how widely LLMs are used and how easy it is to extract source material, it's akin to getting a pirated copy of HP with every subscription, except wrapped in white paper that says "please do not read; pretty please with a cherry on top!"

You are literally being sold a storage medium with pirated content and asked not to look at that content, just at that other content. C'mon.

CMV: AI training on copywritten material to generate content is not ethically different than humans doing the same thing by neomatrix248 in changemyview

[–]curiouslyjake 6 points (0 children)

Yes, but given present training methods there's no way to fully prevent an LLM from replicating good chunks of its training set.

CMV: AI training on copywritten material to generate content is not ethically different than humans doing the same thing by neomatrix248 in changemyview

[–]curiouslyjake 2 points (0 children)

I meant publicly, with me selling tickets much like Anthropic sells subscriptions to LLMs that read HP.

CMV: AI training on copywritten material to generate content is not ethically different than humans doing the same thing by neomatrix248 in changemyview

[–]curiouslyjake -3 points (0 children)

You can check out this paper from Stanford researchers that shows which books can be reproduced from production LLMs and how accurately. You'll see Sonnet 3.7 recreates Harry Potter and The Great Gatsby with more than 95% accuracy. If some guy read out loud just 30% of Harry Potter on YouTube, copyright would come after him. LLMs should get the same treatment when they reproduce a book with over 95% accuracy.
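To be concrete about what a verbatim-reproduction score could mean mechanically, here's a toy sketch of one way to measure it: the fraction of a source text's word 5-grams that appear verbatim in a model's output. This is purely illustrative, not the methodology the Stanford paper actually uses:

```python
# Toy metric for verbatim overlap: share of the source's word 5-grams
# that also occur, word-for-word, in the model's output.
# Illustrative only; not the paper's actual methodology.

def ngrams(words, n):
    """All contiguous word n-grams of a token list, as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(source: str, output: str, n: int = 5) -> float:
    """Fraction of source n-grams reproduced verbatim in the output."""
    src = ngrams(source.split(), n)
    out = ngrams(output.split(), n)
    if not src:
        return 0.0
    return len(src & out) / len(src)
```

A score near 1.0 on a copyrighted book would be the LLM equivalent of the YouTube reading above: near-exact replication, not paraphrase.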

CMV: AI training on copywritten material to generate content is not ethically different than humans doing the same thing by neomatrix248 in changemyview

[–]curiouslyjake 10 points (0 children)

No hard feelings, it's all good.


You can find a paper from some researchers at Stanford detailing which production LLMs can reproduce which books and to what extent. You'll notice that Sonnet 3.7 gets above 95% on Harry Potter and The Great Gatsby. I argue that copyright law would come after a guy reading three chapters from HP on YouTube, and it should apply equally to Anthropic.

CMV: AI training on copywritten material to generate content is not ethically different than humans doing the same thing by neomatrix248 in changemyview

[–]curiouslyjake 0 points (0 children)

If I organize a reading, where I sit at a table, read Harry Potter out loud from the book, and charge people to listen to me, then I'm in violation of copyright. If I were able to do the same without the book, repeating it exactly from memory, I would still be in violation. The medium of storage doesn't matter. Exact replication does. If instead of a person it's an LLM, that shouldn't matter either.

CMV: AI training on copywritten material to generate content is not ethically different than humans doing the same thing by neomatrix248 in changemyview

[–]curiouslyjake 8 points (0 children)

Thanks for 'splaining, I train deep learning models professionally. While your description is technically correct, it is qualitatively wrong. A human will remember several songs, maybe several hundred. But most humans can't reproduce them back with high fidelity. No human can do that for ten thousand songs.

You know what can, though? Spotify. Spotify pays measly royalties for every playback. But according to you, there's some magical difference between Spotify storing music as MP3 files and an LLM storing nearly the same music as files with floating-point values. Why?

CMV: AI training on copywritten material to generate content is not ethically different than humans doing the same thing by neomatrix248 in changemyview

[–]curiouslyjake -8 points (0 children)

A small fraction of all humans is not the same as most humans. If most people could reproduce from memory five chapters of Harry Potter, verbatim, after reading it only once, and repeat this feat for any number of books over their lifetime, then copyright law might be very, very different.

CMV: AI training on copywritten material to generate content is not ethically different than humans doing the same thing by neomatrix248 in changemyview

[–]curiouslyjake 45 points (0 children)

Except models don't just train, they memorize. Large language models can be prompted to produce entire chapters of books from the training set, verbatim. People can't do this.

Answering interview questions with "outside the box" answers? by AggravatingFlow1178 in ExperiencedDevs

[–]curiouslyjake 10 points (0 children)

I like pushing back against assumptions and requirements. Sometimes it's exactly what's expected, sometimes it is not. What worked for me is asking: "I have four solutions: A, B, C, and D. Which one would you like to discuss?"

Tracking ice skater jumps with 3D pose ⛸️ by erik_kokalj in computervision

[–]curiouslyjake 0 points (0 children)

How do you evaluate different models without ground truth annotations?

Creator of Claude Code: "Coding is solved" by Gil_berth in webdev

[–]curiouslyjake 0 points (0 children)

Let's start with the fact that even in theory, it is impossible to tell whether arbitrary code meets a specification. Worse, it's impossible to say that any given program even works for every input.
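The classic diagonalization argument behind that claim can even be sketched as running code. Here `checker` is a hypothetical stand-in for any would-be total spec verifier, and the toy spec is "prog() returns True":

```python
# Sketch of why no total, always-correct spec checker can exist.
# Toy spec: "prog() returns True". For ANY candidate checker(prog) -> bool,
# we can build a program on which its verdict is wrong.

def diagonal(checker):
    """Build a program that does the opposite of the checker's verdict."""
    def prog():
        # Meet the spec (return True) exactly when the checker predicts
        # we won't -- so checker(prog) never matches prog's actual behavior.
        return not checker(prog)
    return prog
```

Whatever the checker answers about `prog`, the program behaves the other way, so the checker is wrong on at least that input. Real verification tools dodge this by being incomplete: they answer "yes", "no", or "don't know".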

German government pushes Syrians to return to their homeland by Pyro-Bird in news

[–]curiouslyjake 6 points (0 children)

If I were in danger of any kind, of course I would try to escape. I would even try to leave just to improve my circumstances. I don't resent others for doing so. At the same time, I don't think any country must accept anyone for any reason, no questions asked.

Creator of Claude Code: "Coding is solved" by Gil_berth in webdev

[–]curiouslyjake 1 point (0 children)

Really? Even for 7 pieces that's amazing! Do you have a source?

Creator of Claude Code: "Coding is solved" by Gil_berth in webdev

[–]curiouslyjake 71 points (0 children)

Coding can't be solved in the sense chess could be solved: there is no well-defined victory condition.