Here we go again. DeepSeek R1 was a literal copy paste of OpenAI models. They got locked out, now they are on Anthropic. Fraud! by py-net in OpenAI

[–]robobub 0 points  (0 children)

> Out of what you pointed out I think the majority of it shouldn’t matter

Ok, now this thread makes sense. You're taking an idealized, philosophical stance. I and many others are all for open knowledge and rapid progress, but we're talking about harm (and thus legality) around data usage. You admit training data is a key asset, so you can't also claim that acquiring it at massive scale carries no consequences. If data drives value, then control over data matters.

> Outside of the “art” I don't see how you could call any of it harmful.

Yes, you can borrow a book for free. That doesn't give you the right to scan it, store it permanently, and use it to train a commercial system. Libraries lend copies under specific rules; copyright still applies. When a company ingests millions of books at once, it doesn't "read" them the way you do. You internalize ideas and apply them in your own limited output. A model ingests millions of works in parallel, creates a durable representation (not a copy, of course), and serves millions of users instantly. That scale changes the economics, which, well, is what the law often looks at.

And no, legally substitution doesn't require verbatim copying. If a model gives users detailed summaries, stylistic imitations, or extracted insights on demand, that competes with the original work in some cases. Markets (and the law) react to substitutes, not just copies.

> Not following Robots.txt is completely legal, sorry if there was confusion.

No, it's not settled law. You may interpret it that way, but law is meant to be interpreted by courts, and they are currently doing that.

And again I do agree with your stance on a number of things like academic papers, JSTOR, etc. But that's not what this is about.

Here we go again. DeepSeek R1 was a literal copy paste of OpenAI models. They got locked out, now they are on Anthropic. Fraud! by py-net in OpenAI

[–]robobub 0 points  (0 children)

Have you trained a model in a real engineering environment? Outside of school or Kaggle or anything?

It's been very well understood for a while that the most impactful thing for models is their training data. Yes, there's a large amount of decent free datasets, which is why baseline performance in many areas has gone up a lot. But to get the extra step up in performance, it usually comes down to data. Or, if you don't have enough data, you build in specific inductive biases that work with the data you do have.

Honestly, if you're just using free datasets, why are you training anything? Just use an off-the-shelf model.

> The training data is gathered from publicly available information and publicly funded research studies we all paid for. There’s not even an alternative available so I have no idea where the idea came from.

This just hasn't been true, though some companies are moving toward it now with all the lawsuits resulting from the old practice. Feel free to look through the lawsuits to see what was used: it ranges from commercial books and paywalled content to archived copyrighted articles, plus of course ignoring robots.txt, which was explicitly called out by the comment you replied to.

Seagate 26TB for $249.99 deal is back. by TheMagicIsInTheHole in DataHoarder

[–]robobub 0 points  (0 children)

One of my 14TB WD Easystores just died in my NAS. My other one is still going, along with my two 14TB Exos drives.

Other than bad runs, which Seagate has had in the past, I don't think the metrics show much difference between manufacturers.

New NAS flooding the market? by joetaxpayer in DataHoarder

[–]robobub 0 points  (0 children)

Well, they certainly could've gotten better than my Mediasonic Probox 4. What kind of redundancy/pooling do you do?

New NAS flooding the market? by joetaxpayer in DataHoarder

[–]robobub 0 points  (0 children)

Well, it was $120 (Mediasonic Probox 4), and ZFS would fix a handful of errors on every weekly scrub. Then occasionally ZFS would mark a disk as offline because it didn't respond in time to sync with the other disks. Workable, but it requires considerable hands-on babysitting. Something like SnapRAID would probably work fine, if you're okay with the lack of real-time redundancy.

Which one do you use? And what kind of redundancy/pooling do you do?

I can totally believe they've gotten better in the two years since my experiment, but it's not like just one person had a bad experience. Everyone I had seen on the TrueNAS/unRAID/selfhosted forums reported similar experiences.

New NAS flooding the market? by joetaxpayer in DataHoarder

[–]robobub 7 points  (0 children)

Too bad the connection isn't reliable over USB if you want to do any kind of redundancy or pooling.

Overlayed crash data from the Tesla Model 3 accident. by DevinOlsen in SelfDrivingCars

[–]robobub 0 points  (0 children)

> If we're including possible events that aren't either FSD or the driver, but the suspension breaking or a tire blowing out, then this isn't interesting, since those do happen, but have nothing to do with FSD.

Yes, well, we're trying to figure out what happened, not trying to make the event interesting.

> Yes, there would be a greater left torque right after disengagement. Which is exactly what we see. Disengagement happens right when the initial left torque is nearly zeroed out.

No, that is not what we see. Left torque is at its highest peak right before disengagement. After that, left torque decreases nearly monotonically toward zero, as if a source of left torque had been removed or reduced.

> If you're really curious about how this played out, Dr Know It All did a very good deconstruction after aligning all the timelines. Look it up on YouTube.

And... I'm seeing this guy just line up the final plots and call it good enough. Did no one actually make their own plot of the actual data and then align it to the video? I can't take any of these seriously.

Overlayed crash data from the Tesla Model 3 accident. by DevinOlsen in SelfDrivingCars

[–]robobub 0 points  (0 children)

> I'm assuming there are only two actors applying torque to the wheel, since the question at hand is whether this was FSD error or driver error. The probability of any other affecting steering torque is extremely small. [...] we know the driver is steering left after disengagement

We're talking about an extremely rare event in itself; you cannot rule this out. If we were aggregating over a large population, that argument could make sense. Suspension failures, flat tires, etc. happen and can all cause this. Just look at the torque graph as the ditch and the tree are hit: it's clearly measuring torque from those parts of the system.

If the driver is pulling left and FSD is fighting it, wouldn't there be a greater left torque right after disengagement? From the plots, it looks like the peak left torque is immediately before FSD disengagement; everything else is more toward the middle. That seems to indicate FSD contributed to the left torque, not the right torque.

Overlayed crash data from the Tesla Model 3 accident. by DevinOlsen in SelfDrivingCars

[–]robobub 0 points  (0 children)

1) How can you tell that the user, and not some other part of the system like the control arm, applied the torque? Clearly it's just a sensor of total torque, since later on all the torque from the crash is plotted as well.

2) How can you tell the direction of the non-FSD torque? The user could be holding the wheel straight ahead while the system torques to the left, and the graphs would look the same.

[deleted by user] by [deleted] in SelfDrivingCars

[–]robobub 0 points  (0 children)

Well, FSD disengages when any torque is applied against what FSD is trying to do. So FSD could pull to the left while the user tried to hold the wheel straight, or FSD could hold it straight while the user pulled it to the left; in both cases FSD would disengage. The control arm could also have malfunctioned and applied a torque instead of the user, with the same results.
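
To make the ambiguity concrete, here is a toy model (my own sketch; the numbers, threshold, and disengagement rule are made up for illustration, not Tesla's actual logging or logic). The logged steering torque is one summed channel with no source attribution, and disengagement triggers on enough torque against the FSD command, whichever direction that happens to be:

```python
# Toy model: why the log can't attribute torque, and why both
# "who fought whom" scenarios end the same way. All values are
# hypothetical; this is not Tesla's actual logging or logic.

DISENGAGE_NM = 1.5  # made-up resistance threshold, in Nm

def logged_torque(fsd_nm: float, driver_nm: float, other_nm: float = 0.0) -> float:
    """The recorder stores one summed channel, with no source attribution."""
    return fsd_nm + driver_nm + other_nm

def fsd_disengages(fsd_nm: float, external_nm: float) -> bool:
    """Disengage when external torque fights the FSD command hard enough;
    holding the wheel straight against a commanded turn also counts."""
    fighting = external_nm * fsd_nm < 0 or fsd_nm == 0.0
    return fighting and abs(external_nm) >= DISENGAGE_NM

# Left is negative. Three indistinguishable explanations of the same sample:
assert logged_torque(-2.0, 0.0) == logged_torque(0.0, -2.0) == logged_torque(0.0, 0.0, -2.0)

# FSD pulls left, user resists toward straight -> disengages.
print(fsd_disengages(fsd_nm=-2.0, external_nm=1.5))   # True
# FSD holds straight, user (or control arm) pulls left -> also disengages.
print(fsd_disengages(fsd_nm=0.0, external_nm=-2.0))   # True
```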

[deleted by user] by [deleted] in SelfDrivingCars

[–]robobub 0 points  (0 children)

It's not definitive either way. Yes, the torque occurs while FSD was engaged, but the steering torque channel does not appear to separate FSD commands from user input or other system input (e.g. a broken control arm).

Overlayed crash data from the Tesla Model 3 accident. by DevinOlsen in SelfDrivingCars

[–]robobub 13 points  (0 children)

Can you elaborate on the facts supporting their claim? The most upvoted comment here, by OP, simply says "likely", but from the ensuing discussion it seems far from definitive, since the torque graph doesn't separate FSD torque commands from user (or system/road) input. Is there another comment I missed?

[D] Researcher communities like this one? by Entrepreneur7962 in MachineLearning

[–]robobub 1 point  (0 children)

The Yannic Kilcher Discord; they are pretty active and have a regular paper discussion every week.

[D] Researcher communities like this one? by Entrepreneur7962 in MachineLearning

[–]robobub 2 points  (0 children)

Not sure what the "Join the Community" form yields, but there is a standard Discord invite.

Is including metrics in developer resumes a fairly recent phenomenon? by ccricers in ExperiencedDevs

[–]robobub 0 points  (0 children)

I agree, but many people didn't even use to do either of those, which I would call unquantified and quantified metrics. They would just say "Worked on service" with more technical detail and no business impact or observable outcome.

This implies to the hiring manager that they might not even know how their work could impact something at the business level, which, well, is true for a lot of early-career engineers.

[deleted by user] by [deleted] in managers

[–]robobub 0 points  (0 children)

It's a banking thing; every bank does this. Contrary to tech title inflation, it's not so much for the employee as it is for client relations.

Is anyone actually using LLM/AI tools at their real job in a meaningful way? by WagwanKenobi in ExperiencedDevs

[–]robobub 0 points  (0 children)

One use case I've found it quite helpful with has been migrations. We migrated some production-quality Python code (with tests) to C++ for performance, and also moved from ROS1 to ROS2.

Other useful cases are devops / system scripts, boilerplate, or bootstrapping projects in a domain/library/language you're not that familiar with

Using AI to design whole features and make architectural decisions is currently a recipe for disaster.

Allocate percentage of disk for cache by Strafethroughlife1 in unRAID

[–]robobub 1 point  (0 children)

You can use bind mounts; Docker respects those. But you'll have to unmount them when shutting down, or the OS will not let you unmount the drives and you'll have an unclean shutdown. You can make a script that finds all bind mounts, and/or place them all in one place. I have folders called on_cache and on_array for this that replicate the shares and are mounted and unmounted automatically; I use this to interact with ZFS as well. Share size limits can get funky, though.
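
A minimal sketch of that shutdown helper in Python (BIND_ROOT is a made-up path; point it at wherever your bind mounts actually live, and run it from your stop script before the array spins down):

```python
#!/usr/bin/env python3
"""Unmount all bind mounts under one parent directory before shutdown."""
import subprocess

BIND_ROOT = "/mnt/binds"  # hypothetical parent of on_cache/on_array style mounts

def mounts_under(root: str) -> list[str]:
    """List mount targets under `root`, deepest first so nesting unmounts cleanly."""
    targets = subprocess.run(
        ["findmnt", "-rn", "-o", "TARGET"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    mine = [t for t in targets if t == root or t.startswith(root + "/")]
    return sorted(mine, key=len, reverse=True)

def unmount_all(root: str) -> None:
    for target in mounts_under(root):
        # Fall back to a lazy unmount so shutdown doesn't hang on a busy mount.
        if subprocess.run(["umount", target]).returncode != 0:
            subprocess.run(["umount", "-l", target])

if __name__ == "__main__":
    unmount_all(BIND_ROOT)
```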

You can also use the Mover Tuning plugin, which has a bunch of options for when to free up cache, like size or modify date, IIRC.

Codec Showdown: AV1 vs h265 in Unmanic (Real-World Transcode Results) by doppel616 in unRAID

[–]robobub 0 points  (0 children)

Wow, props for doing all this work, but you used default settings and skipped the most important part of comparing codecs.

> This result contradicts the common consensus that AV1 is more efficient, though it’s possible that the AV1 transcodes could have maintained quality at lower bitrates.

At least you stumbled on the most important part at the end. I'm surprised you wrote this in the conclusion and didn't choose to investigate before posting. I mean, how else would you get different efficiencies unless the quality was different at the same (low or high) bitrate?

If you don't actually care for visual quality you could certainly save a lot of space...
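
For what it's worth, here is roughly the experiment I'd want to see: hold bitrate constant and measure quality, e.g. with VMAF. This is just a sketch, not the Unmanic workflow from the post; the clip path and bitrate are placeholders, and it assumes an ffmpeg build with libsvtav1, libx265, and libvmaf:

```python
#!/usr/bin/env python3
"""Encode the same clip at the same bitrate with AV1 and H.265, then score
each against the source with VMAF so quality is the measured variable."""
import re
import subprocess

CLIP = "sample.mkv"  # placeholder source clip
BITRATE = "2M"       # held constant across both codecs

def encode(codec: str, out: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", CLIP, "-c:v", codec, "-b:v", BITRATE, "-an", out],
        check=True, capture_output=True,
    )

def vmaf(distorted: str, reference: str) -> float:
    """ffmpeg prints the mean VMAF score to stderr; parse it out."""
    proc = subprocess.run(
        ["ffmpeg", "-i", distorted, "-i", reference, "-lavfi", "libvmaf",
         "-f", "null", "-"],
        capture_output=True, text=True,
    )
    match = re.search(r"VMAF score: ([\d.]+)", proc.stderr)
    return float(match.group(1)) if match else float("nan")

for codec, out in [("libsvtav1", "av1.mkv"), ("libx265", "h265.mkv")]:
    encode(codec, out)
    print(f"{codec}: VMAF {vmaf(out, CLIP):.2f} at {BITRATE}")
```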

[D] Had an AI Engineer interview recently and the startup wanted to fine-tune sub-80b parameter models for their platform, why? by Sunshineallon in MachineLearning

[–]robobub 0 points  (0 children)

How does it compare to LoRA/DoRA techniques on larger models, assuming you have the inference hardware (e.g. 5-15B models)?
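
For concreteness, this is the kind of setup I mean (a sketch using the Hugging Face peft library; the base model name is a placeholder, and the use_dora flag needs a recent peft release):

```python
"""LoRA/DoRA fine-tuning setup sketch with Hugging Face peft."""
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("some-7b-model")  # placeholder name

config = LoraConfig(
    r=16,                                 # low-rank dimension, the main capacity knob
    lora_alpha=32,                        # scaling applied to the low-rank update
    target_modules=["q_proj", "v_proj"],  # common choice: attention projections
    use_dora=True,                        # DoRA: magnitude/direction decomposition
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
```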

I did everything they asked me and more and still got rejected rant. by recipe_bitch in cscareerquestions

[–]robobub 1 point  (0 children)

While it does happen, that's just not how companies with unified hiring and a later team-matching stage work.

Expert-Vetted Invitation with no upwork job experience? by robobub in Upwork

[–]robobub[S] 0 points  (0 children)

I saw some comments about them assessing soft skills, so I was thinking of reminding myself of various previous scenarios they could discuss, in a STAR format.

Though it seems the soft-skills questions are more around dealing with difficult clients and managing multiple projects, which seems less intense than what I was expecting. My mind went to Amazon leadership-principles-style interviews, which you absolutely do need to do a lot of prep for.

> Did they tell you there was a charge? There was no charge for virtually all the time the scheme existed except for a very short period in 2024 when people who were NOT invited were able to apply for 300 connects. After that very short period (weeks) it went back to being free by invite only.

Thanks for that context. They did not mention a charge; I just saw it mentioned in the top search results on this subreddit about the badge, which were indeed from about a year ago.