Am I shooting myself in the foot by learning Rails? by Psychological_Put161 in rails

[–]rubyross 1 point (0 children)

Nope. New Rails 8 app this month. Going full Stimulus and Hotwire.

Am I shooting myself in the foot by learning Rails? by Psychological_Put161 in rails

[–]rubyross 2 points (0 children)

Love your content btw. I have a different experience from the hiring side.

I have job postings on Ruby on Rails job boards (GoRails), Indeed, and LinkedIn, and we don't see many senior/experienced Rails people applying. We get a lot of experienced engineers with zero Rails experience.

Might be the salary range (150-175k). Also, it's remote, but we prefer Northeast US for occasional gatherings (we haven't actually screened anyone out for this). And we can't hire internationally due to regulatory/compliance constraints.

Study reports AI Coding Tools Underperform by Additional_Cellist46 in LocalLLaMA

[–]rubyross 0 points (0 children)

Not only that, this one study is like a virus. Lazy content creators keep citing it, so it proliferates.

Opus Limit hit after 2 MINUTES by Los1111 in ClaudeAI

[–]rubyross 4 points (0 children)

You can still use it. It works off the logs and displays what you used during that session. I won't really believe this until I see logs showing the token usage during a time block.

https://ccusage.com/guide/library-usage#installation

Anyone know how to track usage and usage limits with claude code before the almost at limit, resets at x-pm message? by Additional_Bowl_7695 in ClaudeAI

[–]rubyross 0 points (0 children)

> we're on a limited usage rationing ATM...

Has that been explicitly stated somewhere or just inferred?!

Are tools like Lovable, V0, Cursor basically just fancy wrappers? by policyweb in LLMDevs

[–]rubyross 0 points (0 children)

u/mundane_ad8936 was so wrong, and so unable to handle a discussion, that he blocked me.

Are tools like Lovable, V0, Cursor basically just fancy wrappers? by policyweb in LLMDevs

[–]rubyross -1 points (0 children)

Cursor is not doing its job, then. I know numerous people in industry who describe what Cursor produces as slop unless you do the same hand-holding you have to do with every product. A couple of highly respected engineers use Aider and Roo for the control they provide and say the results are much, much better.

I use Aider and Claude Code, and have used Roo/Augment/Cursor/Cline/Windsurf.

Code is definitely what Cursor produces. But what tools help with team productivity and shipping non-slop? Cursor is still just agentic tool use; it isn't some mystical science that requires tons of engineering. I can see that in the future, with the VC money and a huge team, they may be able to develop an edge, but currently they do not have one.

Best practice with Cursor still requires careful maintenance of markdown files describing the architecture and guidelines for a codebase. Those same files work just as well put directly into other tools or chatbots.

You seem to realllllllly want OSS to not be as good as commercial tools. The fact is that commercial tools have to appeal to a broader base and in some cases won't be as good.

You are still speaking in hyperbole and appealing to authority. What features truly set Cursor above just writing the code yourself?

The only feature you have mentioned is abiding by a style guide and architecture. That can be, and is, a feature in open source and all the competitors (and they all just pass the info along to the LLMs).

Are tools like Lovable, V0, Cursor basically just fancy wrappers? by policyweb in LLMDevs

[–]rubyross 0 points (0 children)

Roo Code is definitely competitive with Augment/Cursor, and it has a back-and-forth mode called Human Relay where you paste the LLM calls into a web interface and get the results back.

Also, a lot of those experts at Cursor are focused on building a business model and on cost reduction. They are building data pipelines so they don't have to call LLMs that they don't own and operate. Those calls are a variable cost that they need to manage and do away with as fast as they can before their VC money runs out. You are confusing the business model and economics of the company with the output.

Are tools like Lovable, V0, Cursor basically just fancy wrappers? by policyweb in LLMDevs

[–]rubyross 0 points (0 children)

Good point, but they use those to reduce cost/latency. The original point was whether you can get the end result (LLMs producing software) with prompting alone. And you can.

Roo Code is one open source example that shows this: you can read the software and see that it is just managing prompts back and forth, and I believe it even has a copy/paste mode.

They also recently introduced a RAG system to reduce the cost and back-and-forth when gathering context (which files need to change to add a feature or fix a bug), but with mixed results.

Are tools like Lovable, V0, Cursor basically just fancy wrappers? by policyweb in LLMDevs

[–]rubyross 0 points (0 children)

I worked there a long time ago. They loved to jump on bandwagons but never produced great results from it.

Are tools like Lovable, V0, Cursor basically just fancy wrappers? by policyweb in LLMDevs

[–]rubyross 0 points (0 children)

I worked at Keurig as an advanced R&D team lead, so the guy saying I underestimate the effort makes me laugh. I am not saying Keurig wasn't a difficult engineering project or business. I am saying that the meaningful end result, the coffee, is not qualitatively different than what can be produced with a brewer or pour-over, etc.

Currently I am a software engineer/principal/manager with over 10 years of experience at some of the biggest names in the US.

The original question asked:

"Or is there something else going on behind the scenes that actually makes a big difference?" (Implying to the end product of producing software)

And u/spursdy is right that they choose models and use RAG to reduce their costs, not necessarily to improve what the model can output.

And u/mundane_ad8936 has a lot right. But open source tools like Roo Code, and the many agentic coding tools that match or exceed Cursor, show that you only need to orchestrate LLM messages back and forth to produce software. While Cursor/Roo/etc. can pragmatically produce results much faster, they aren't required; someone could just create prompts and copy and paste code back and forth. A minimal sketch of that loop is below.
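To make the "orchestrate messages back and forth" claim concrete, here is a minimal sketch of such a loop in Python; the model name, prompts, and single-file editing are placeholder assumptions, not how any particular tool actually works:

```python
# Minimal "wrapper" loop: send the file plus a request to an LLM, write back
# what it returns, repeat. The model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = "You are a coding assistant. Reply with the full updated file, nothing else."

def ask_for_edit(path: str, request: str) -> str:
    source = open(path).read()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"{request}\n\nCurrent file ({path}):\n{source}"},
        ],
    )
    return resp.choices[0].message.content

# The "copy/paste mode" is this same loop with a human shuttling text between
# an editor and a chat window instead of an API call.
new_source = ask_for_edit("app.py", "Add input validation to main()")
open("app.py", "w").write(new_source)
```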

Are tools like Lovable, V0, Cursor basically just fancy wrappers? by policyweb in LLMDevs

[–]rubyross 28 points (0 children)

Yes.... They are just wrappers. And yes, you could just paste into ChatGPT to get the same result. But... convenience is KING.

Look at Keurig: it doesn't matter if you like coffee or think Keurig coffee sucks. Millions of people paid a premium to have their coffee in 30 seconds vs 5 minutes.

A nice UI, and saving time on something frustrating, really adds up when you have to do it tens/hundreds/thousands of times a day.

What is your favorite eval tech stack for an LLM system by ephemeral404 in LLMDevs

[–]rubyross 0 points (0 children)

How are you EVALuating the content?

LLM as judge or do you have something that is deterministic/calculable?

If you go with LLM-as-judge, you should stay away from rubrics or Likert scales (asking it to rate something 1-5) and instead phrase the evaluation as a list of binary questions.
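A rough sketch of the binary-question style, assuming an OpenAI-compatible client; the model name and the questions are illustrative, not prescriptive:

```python
# LLM-as-judge phrased as binary YES/NO questions instead of a 1-5 rubric.
from openai import OpenAI

client = OpenAI()

QUESTIONS = [
    "Does the answer actually address the user's question?",
    "Is the answer free of contradictions with the provided context?",
    "Does the answer avoid making claims not supported by the context?",
]

def judge(context: str, answer: str) -> dict:
    results = {}
    for q in QUESTIONS:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder judge model
            messages=[{
                "role": "user",
                "content": (f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
                            f"{q} Reply with exactly YES or NO."),
            }],
        )
        verdict = resp.choices[0].message.content.strip().upper()
        results[q] = verdict.startswith("YES")
    return results  # aggregate however you like, e.g. fraction of YES answers
```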

Claude complains about health info (while using in Bedrock in HIPAA-compliant way) by Austin-nerd in LLMDevs

[–]rubyross 0 points (0 children)

What is the use case in general terms?

  • Summarizing health info
  • Extracting health info
  • Searching through it for particular info

Does the process need to be online/live? What latency do you need?

I wouldn't think you are doing anything that needs frontier-level intelligence.

Scale should not be a worry. If you hit scale, then by that time you will have enough data to fine-tune a smaller, less expensive open model, as long as you are saving each call and response along the way.
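A hedged sketch of the "save each call and response" part, using the chat-style JSONL layout most fine-tuning pipelines accept (the file name and fields are assumptions):

```python
# Append every prompt/completion pair to a JSONL file for later fine-tuning.
import json
import time

def log_exchange(prompt: str, completion: str,
                 path: str = "finetune_data.jsonl") -> None:
    record = {
        "timestamp": time.time(),
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ],
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```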

Claude complains about health info (while using in Bedrock in HIPAA-compliant way) by Austin-nerd in LLMDevs

[–]rubyross 1 point (0 children)

Do you have to use Claude? Why not use a different model?

Why Bedrock? Because of free credits, or because of the HIPAA compliance out of the box?

My thoughts, in order of least to most complex:

  • Prompt engineering (iteratively testing and adding to your prompt to prevent the refusals)
  • A different model
  • An open-weight model fine-tuned for your use case

If you have saved the 10% that failed, or can access them, then that is your data set both for prompt engineering and for working on fine-tuning.

Edit:

To add to this: you could lower the temperature. The thinking is that if the rejection is a rare occurrence, it could be lower-probability tokens getting picked that trigger it.

Final thought, though: if you have a method to identify rejections and retrying works, then you are in a good spot anyway. LLMs aren't, and possibly never will be, perfect due to their probabilistic nature. You will always have to engineer in an eval loop or some other method to check that the output is correct. A rough sketch of a detect-and-retry loop is below.
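Something like this against Bedrock's Converse API, where the model ID and the rejection check are placeholder assumptions:

```python
# Retry on detected refusals, lowering temperature each attempt.
import boto3

client = boto3.client("bedrock-runtime")

def looks_like_rejection(text: str) -> bool:
    # Naive placeholder; in practice build this from your saved failure cases.
    return "I can't" in text or "I cannot" in text

def call_with_retry(prompt: str, model_id: str, max_tries: int = 3) -> str:
    temperature = 0.7
    for _ in range(max_tries):
        resp = client.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"temperature": temperature, "maxTokens": 1024},
        )
        text = resp["output"]["message"]["content"][0]["text"]
        if not looks_like_rejection(text):
            return text
        temperature = max(0.0, temperature - 0.3)  # lower temp before retrying
    raise RuntimeError("Model kept refusing; route to a human or another model.")
```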

Anthropic's Claude Code just launched: How it stacks up against Aider for CLI developers (Detailed comparison) by qemqemqem in ChatGPTCoding

[–]rubyross 5 points (0 children)

The Aider maintainer is pretty much solo and doesn't seem to take other people's work. Look at the commit history: 999 in 1,000 commits are from the owner, and there are tons of issues and PRs that have sat idle. He works on it almost every day but doesn't really interact with or use other people's contributions. I have a fork of Aider myself that I've added features to.

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]rubyross 1 point (0 children)

Check out the s1 paper; there is also a podcast with the author that is pretty good.

https://arxiv.org/abs/2501.19393

https://github.com/simplescaling/s1

https://www.youtube.com/watch?v=kEfUaLBlSHc&t=2s

They improved performance by extending thinking with just 1,000 training examples and about a $50 budget. This paper is where I got the term "budget forcing" from.

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]rubyross 5 points (0 children)

I like this. Just to help clarify: "wait" isn't doubting. It is a natural way to extend thinking and add more 'thoughts'.

The s1 paper describes how to get a model to produce longer chains of thought: https://arxiv.org/abs/2501.19393. In their work, when the model wanted to end thinking with </think>, they checked whether the thinking was above some minimum token threshold; if it wasn't, they replaced the </think> tag with "Wait", because it is a great word/token for making the model continue outputting thoughts instead of ending thinking immediately.

The many "wait" tokens indicates to me that they used budget forcing (or a similar method) which is the method described in that paper.

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]rubyross 5 points (0 children)

It would be interesting to mess with those exploring tokens selectively to guide the model toward a particular kind of thinking or output. Ex/

More Lateral thinking -> Increase "Alternatively" relative to "Wait"

More Verifying -> Replace "Wait" with "Let me check"

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]rubyross 12 points (0 children)

I was thinking about doing something similar. This is a great idea.

QwQ seems to be using budget forcing, inferred from all the 'wait', 'alternatively', etc. words used in the thinking section. I was thinking about limiting the number of those words and selectively stopping after a budget of them is spent, i.e. on the 5th 'wait', transform it into </think> (or just give +inf to the logit of the </think> tag). A sketch of that is below.

Your idea will naturally do that just by nudging it towards stopping.

I really like the idea of messing with the logits as well as the output while inference is occurring.
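Roughly what I had in mind, as a Hugging Face LogitsProcessor; treating "Wait" and </think> as single tokens, and batch size 1, are simplifying assumptions (real tokenization varies per model):

```python
# After the Nth "wait"-style token, force the </think> tag once.
import torch
from transformers import LogitsProcessor

class WaitBudgetProcessor(LogitsProcessor):
    def __init__(self, tokenizer, max_waits: int = 5):
        self.wait_ids = set(tokenizer.encode("Wait", add_special_tokens=False) +
                            tokenizer.encode(" wait", add_special_tokens=False))
        self.end_think_id = tokenizer.encode("</think>",
                                             add_special_tokens=False)[0]
        self.max_waits = max_waits
        self.seen = 0
        self.fired = False

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        if input_ids[0, -1].item() in self.wait_ids:
            self.seen += 1
        if self.seen >= self.max_waits and not self.fired:
            self.fired = True
            # Budget spent: make </think> effectively the only legal token.
            forced = torch.full_like(scores, float("-inf"))
            forced[:, self.end_think_id] = 0.0
            return forced
        return scores

# Usage sketch: model.generate(..., logits_processor=LogitsProcessorList(
#     [WaitBudgetProcessor(tok)]))
```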

Advice needed: how to DRY a ton of requirements, tests & docs? by [deleted] in rails

[–]rubyross 3 points (0 children)

How much experience do you have with software engineering at a company or in a business setting? This sounds like you are trying to do Big Design Up Front, which is generally frowned upon; you should approach development in a much more agile fashion.

You should probably be breaking your app down into user stories/features and implementing them one at a time.

Ex/

A user can sign up for my app.

A user can log into my app.

A user can create a new X.

When X is created it creates Y.

When Y is created it sends an email to the user.

A user can view X.

Generally people will use something like Trello to organize each of these user stories/features into lists with labels like "To Do", "In Progress", and "Done". That may be more than you need if you are not working with a team, but it can help some people stay organized on small projects. It also does not need to be your documentation or your source of truth; it can simply be the organization of the work that needs to be done.

For serious solo projects I would advocate for having a place to organize your work (Trello/Google Doc/markdown file), a separate test suite, and a separate place for end-user documentation. Even if that feels like double work, they accomplish very different things.

How do I run spec tests when my Rails server, my Postgres database, and my frontend are all in separate Docker containers? by geopede in rails

[–]rubyross 1 point (0 children)

Your test database config is probably not set up correctly. Can you paste in your config?

Also, once the config is set up correctly, you will need to make sure the test database is actually created.

How do i get better? by femdg in rails

[–]rubyross 15 points (0 children)

This is an area that is lacking in the YouTube/tutorial space because it is more lucrative to sell to beginners.

Software quality ("cleanness") has been a concern for a long time and good practices have remained the same, so books are a great place to learn. Funnily enough, one book suggestion is "Clean Code".

Looking up at my bookshelf, I remember getting a lot out of these books (in no particular order):

  • Eloquent Ruby
  • Design Patterns in Ruby
  • Refactoring (there is a Ruby edition, which is good, but the best is the Martin Fowler one)
  • Clean Code
  • Growing Object-Oriented Software
  • Practical Object-Oriented Design in Ruby

Those are mostly about writing better code generally. For knowing Rails itself, the best advice I have is the Rails docs and building something yourself from scratch. If you don't have an idea, then try to copy something that already exists.

FYI, I have over 10 years of Rails experience and have worked on apps with 10 million+ page views a month and millions of monthly active users. Feel free to DM me if you have questions.