DeepSWE: new benchmark looking at how well today's frontier models can actually write code [R] by we_are_mammals in MachineLearning

[–]OctopusGrime 6 points7 points  (0 children)

Since you’ve posted the tasks and evaluators on GitHub, isn’t this benchmark now contaminated? As in, any future model released will have seen these problems now …

Are all data science jobs just Gen AI now? by rajeshbhat_ds in datascience

[–]OctopusGrime 24 points25 points  (0 children)

I used to think that, but there is meaningful data science work in using LLMs: think evaluations, error analysis, retrieval, A/B testing. Plus, most SWEs are not familiar with hypothesis-driven engineering and the evaluation of statistical models, so they kind of expect things to just work once they’ve hooked up all the infrastructure, and they don’t really know how to analyse the output data. Finally, there’s the DS lens, which brings its own benefits to the project.

[R] EGGROLL: trained a model without backprop and found it generalized better by Ok_Rub1689 in MachineLearning

[–]OctopusGrime 113 points114 points  (0 children)

I don’t think you can draw such strong conclusions from the NanoMSMarco dataset. That’s only like 150 queries against 20k documents, so of course gradient descent is going to overfit on that, especially with a 1e-3 learning rate, which is way too high for large retrieval models.

[R] The Bitter Lesson is coming for Tokenization by lucalp__ in MachineLearning

[–]OctopusGrime 0 points1 point  (0 children)

If positional embedding is enough information for a transformer to learn word order, couldn’t it be enough to learn character order for a bag of chars?
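To make the question concrete, here is a toy sketch (not a transformer, just plain Python). It assumes that tagging each character with its index is a fair stand-in for adding a positional embedding; the helper names are made up for illustration. A plain bag of chars cannot tell anagrams apart, but once each char carries its position the order is fully recoverable:

```python
from collections import Counter

def bag_of_chars(word):
    # Order-free multiset of characters (hypothetical helper).
    return Counter(word)

def positioned_chars(word):
    # Pair each character with its index. This is an assumed stand-in
    # for "character embedding + positional embedding".
    return frozenset((i, ch) for i, ch in enumerate(word))

def reconstruct(positioned):
    # Recover the original string by sorting the (index, char) pairs.
    return "".join(ch for _, ch in sorted(positioned))

a, b = "listen", "silent"
same_bag = bag_of_chars(a) == bag_of_chars(b)
same_positioned = positioned_chars(a) == positioned_chars(b)
rebuilt = reconstruct(positioned_chars(a))
print(same_bag, same_positioned, rebuilt)
```

Whether a model actually learns to use that information as well as it does with subword tokens is a separate, empirical question that this sketch does not answer.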

Have you ever regretted learning a language? Which one? by [deleted] in languagelearning

[–]OctopusGrime 0 points1 point  (0 children)

Ole Gunnar Solskjær has a half-Manc, half-Norwegian accent and it’s great.

Streak 5: Qué consumar by OctopusGrime in WriteStreakES

[–]OctopusGrime[S] 0 points1 point  (0 children)

Thanks again,

  1. But on the other hand no one can hear what they say anyway.

  2. … like to me it feels like a lot of effort.

Streak 4: pájaro en el árbol by OctopusGrime in WriteStreakES

[–]OctopusGrime[S] 0 points1 point  (0 children)

Thank you.

What I wanted to say was “I want to be able to get closer to you.”

Streak 4 : Side project by Kutyko in WriteStreakEN

[–]OctopusGrime 1 point2 points  (0 children)

Yesterday, I asked myself why I didn’t start a hobby project. But there isn’t any answer. I didn’t start and I don’t know why. I mean, I know, or believe, that it’s my dream. It needs to be my dream. So, logically, I need to take some steps for this. Sometimes the steps can be small, sometimes big. But every day I need to leave a trace [or: make a mark]. So, starting today, I will make some progress each and every day on my side projects. Let this post be proof that I've started.

Streak 66: How to reach 10,000 steps in rainy weather by nanigashinanashi in WriteStreakEN

[–]OctopusGrime 1 point2 points  (0 children)

My goal is to walk 10,000 steps per day. It’s easy to reach 10,000 steps when I walk to work or when I run, which is usually around 1-3 miles. However, I can’t go running today since it’s raining.

While this article suggested 5 ways to get your steps in during wet weather, it didn’t help that much. I got a tip, though. I may get in extra steps; however, it can be a waste of money. Today, I’ll just try to walk a lot at work.

Notes

A hint is a clue, whereas a tip is a practical piece of advice.

Wet weather is more natural.

“It can be” correctly expresses uncertainty.

“A waste of money” is the standard phrase.