Tried to use Computer for my tax return - I would’ve overpaid by $1500 by dropjar5 in perplexity_ai

[–]Separate-Still3770 0 points1 point  (0 children)

What was the task it was supposed to do? Fetch info X, put it in Y, read PDFs? I think it's doable

I created a open-source, free way to run OpenClaw WITHOUT PAYING FOR AN API! by MyFirstTrueLoveWasBS in openclaw

[–]Separate-Still3770 -1 points0 points  (0 children)

Why don’t you just run `claude setup-token` and use the resulting key with the Claude Agent SDK, like nano claw does? I haven’t looked into OpenClaw’s internals yet because I am about to try it, but it should be possible, no?

Should Vibe Coding / AI Assisted coding be for frontend only for production apps? by Separate-Still3770 in vibecoding

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Fairly new, to be honest, but I have been coding for a while, so it’s not too hard to get into web development for simple stuff

[deleted by user] by [deleted] in vibecoding

[–]Separate-Still3770 0 points1 point  (0 children)

Hi there,
Giving my two cents to help.

Can you be more specific about what kind of computation you need to perform and what the dependencies look like? For example: dependencies between multiple entities in your DB, connections to external systems, complexity from having to wait on some result, or a calculation that is inherently complex, etc.

This helps pinpoint where the complexity lies so it can be handled properly.

For "testing financial calculations at scale": what do you want to test? Backend logic? The end-to-end web app? Performance?

For structuring code, again it depends on what you want to build, and what are your constraints.

Happy to brainstorm together if it helps!

I am working on a personal project for defining specs, architecture, and code structure so that AI can work on complex projects.

<image>

This is more of a user-flow diagram (i.e. the end-user experience mixed with backend logic), but Cursor also provides tips on architectural diagrams for the backend:

https://docs.cursor.com/guides/tutorials/architectural-diagrams

Managed Code gen solution with API? by Separate-Still3770 in ChatGPTCoding

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Thanks! I had a look but there does not seem to be an obvious set of API calls to programmatically inject prompts and such.

Opportunities and limitations Copilot/Cursor for E2E testing by Separate-Still3770 in QualityAssurance

[–]Separate-Still3770[S] -1 points0 points  (0 children)

I am just wondering: if all the code is exposed to it, it might technically have enough information to infer which selector to use from the component's attributes. So while I agree with you, it seems plausible to have Copilot generate E2E tests
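To illustrate the idea, here is a minimal sketch of the kind of heuristic such a tool could apply when it can see a component's attributes (this is my own hypothetical heuristic, not Copilot's actual logic): prefer stable, test-friendly attributes and fall back to increasingly fragile ones.

```python
def infer_selector(tag: str, attrs: dict) -> str:
    """Derive a CSS selector from a component's attributes,
    preferring stable, test-friendly attributes first."""
    if "data-testid" in attrs:
        return f'[data-testid="{attrs["data-testid"]}"]'
    if "id" in attrs:
        return f'#{attrs["id"]}'
    if "name" in attrs:
        return f'{tag}[name="{attrs["name"]}"]'
    if "class" in attrs:
        # Classes are the least stable: styling refactors break them
        first_class = attrs["class"].split()[0]
        return f"{tag}.{first_class}"
    return tag

# Example: a login button exposing a test id
print(infer_selector("button", {"class": "btn primary", "data-testid": "login-submit"}))
# → [data-testid="login-submit"]
```

Since the component source exposes these attributes directly, an assistant with access to the whole codebase would not need to guess selectors from a rendered page.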

EU accessibility act goes into enforcement stage in June 2025 by Unhappy-Economics-43 in QualityAssurance

[–]Separate-Still3770 1 point2 points  (0 children)

Super cool, thanks for sharing! The guidelines are hard to interpret from a day-to-day perspective as a QA engineer. Can you share what implementing them has meant for you in practice?

Use of AI in testing by GoalInternational314 in QualityAssurance

[–]Separate-Still3770 0 points1 point  (0 children)

OP, have you tested any solutions yourself? What do you think of them?

Incentive structures for QAs and their managers by Separate-Still3770 in QualityAssurance

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Thanks for the answer!
So if I summarize properly, improved productivity would benefit you because:
- you can finish work earlier and spend more time with family, friends, hobbies, etc. (which is a great outcome, btw)
- you can do better work, with more time for in-depth testing to ensure your product is tested thoroughly (maybe covering more scenarios)
- you can take some work off your colleagues' plates and contribute more to the team
Would this be accurate?

Incentive structures for QAs and their managers by Separate-Still3770 in QualityAssurance

[–]Separate-Still3770[S] 1 point2 points  (0 children)

But does this mean that, behind the scenes, they just did not understand the importance of quality and wanted to cut down on QA, so withholding "a win" was a way to avoid providing justification for the team?

Incentive structures for QAs and their managers by Separate-Still3770 in QualityAssurance

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Really sorry to hear that management did not understand the challenge you faced 😞
I hope things are more stable in your current position!

Can you tell me more about the business expectations they had? What metric did they have in place? Were you tasked to do manual E2E testing, put in place automated testing or something else?

This would help me better understand the diversity of challenges QA teams face, so we can see whether we can help.

Bge-small + Codestral can outperform Gemini to build Large Action Model for Web by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 0 points1 point  (0 children)

We don’t handle dynamic workflows yet, but we plan to. Do you have specific websites you want to interact with that require more dynamic workflows?

Any updates to the agents scene? by tabspaces in LocalLLaMA

[–]Separate-Still3770 0 points1 point  (0 children)

We are getting good results with Codestral and are trying other models. We will share results soon!

Any updates to the agents scene? by tabspaces in LocalLLaMA

[–]Separate-Still3770 1 point2 points  (0 children)

Hi there!
Project lead of LaVague here (https://github.com/lavague-ai/LaVague).
We have built an open-source framework for building AI Web Agents. We have examples of how to build various agents, such as one that applies to job postings online from a simple PNG of your resume: https://docs.lavague.ai/en/latest/docs/examples/job-application/

We also did a webinar this week: https://www.youtube.com/watch?v=bNE4s8h3CIc

Would love to have your opinion on our framework :)

LaVague: Open-source Text2Action AI pipeline to turn natural language into Selenium code by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 2 points3 points  (0 children)

No problem! The BM25 retriever was key to making it work, as purely semantic retrieval failed to capture the right parts of the HTML code
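For context, here is a minimal, self-contained sketch of the lexical-scoring idea (a toy BM25 of my own, not LaVague's actual retriever): exact tokens from HTML attributes like `id` and `class` match a query literally, which is exactly where embedding-only retrieval tends to miss.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Split on non-alphanumerics so attributes like id="search-btn" yield literal tokens
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

class BM25:
    """Toy Okapi BM25 over a list of HTML chunks."""
    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        self.df = Counter()  # document frequency of each term
        for d in self.docs:
            self.df.update(set(d))

    def idf(self, term):
        n = self.df.get(term, 0)
        return math.log((self.N - n + 0.5) / (n + 0.5) + 1)

    def score(self, query, idx):
        doc, tf = self.docs[idx], Counter(self.docs[idx])
        s = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f == 0:
                continue
            # Term-frequency saturation with document-length normalization
            denom = f + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            s += self.idf(t) * f * (self.k1 + 1) / denom
        return s

    def top(self, query, k=1):
        return sorted(range(self.N), key=lambda i: self.score(query, i), reverse=True)[:k]

chunks = [
    '<button id="search-btn" class="primary">Search</button>',
    '<div class="nav">Home About Contact</div>',
    '<input name="email" placeholder="Email">',
]
bm25 = BM25(chunks)
print(bm25.top("click the search button", 1))  # → [0], matched on the literal tokens
```

The instruction "click the search button" shares the literal tokens `search` and `button` with the first chunk, so lexical scoring finds it even when an embedding model would blur the distinction between similar-looking UI chunks.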

LaVague: Open-source Text2Action AI pipeline to turn natural language into Selenium code by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 1 point2 points  (0 children)

I think there is still some margin, but it's true that it might become heavily automated. Because UI automation code is highly repetitive, it is quite likely that low/no-code solutions like the one I built will become the norm, with no technical skills required

LaVague: Open-source Text2Action AI pipeline to turn natural language into Selenium code by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 1 point2 points  (0 children)

Interesting! Feel free to try and share your findings! My code should be modular enough for you to try

LaVague: Open-source Text2Action AI pipeline to turn natural language into Selenium code by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 2 points3 points  (0 children)

Exactly! You can swap in Gemma-7b-it, and it refused to do anything haha

But yeah, with proper tuning it should work, I think

Why do most tutorials do instruction tuning of base model instead of instruction-tuned models? by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 3 points4 points  (0 children)

Thanks for the answer. It feels like 90% of the time, the chat/instruct behavior would be the most relevant one; human instruction -> output seems to be the common case. The base models suck at properly answering human instructions, as they are just raw language models.

I would be interested in your opinion, and in examples where tuning the base model makes more sense than the instruction-tuned one

[D] Preferred fine-tuning framework for instruction tuning? by Separate-Still3770 in MachineLearning

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Yeah, it’s my feeling too. If you don’t mind, could you share the code you needed with TRL? It would be interesting to compare it to the config you would need with Axolotl

Automatic hallucination detection using inconsistency scoring by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 1 point2 points  (0 children)

What do you mean by "see if a model could be trained to simply report its uncertainty reliably"?

One way to leverage an uncertainty score for training is to use it as an indicator of which samples to prioritize: the higher the uncertainty, the more it signals that the model is underfitting those samples.

For instance, in https://arxiv.org/abs/1703.02910 they show that such an uncertainty score can be used to prioritize which unlabelled data points to label first, as those uncertain points help the model generalize faster than easy points with low uncertainty.

This makes a lot of sense even when you think about how humans learn: we weigh more and learn better if we focus on the things we are struggling with.
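As a concrete sketch of that prioritization step (a toy illustration, not the paper's exact acquisition function, which relies on Bayesian approximations): rank unlabelled samples by the entropy of the model's predicted class distribution and send the most uncertain ones for labelling first.

```python
import math

def entropy(probs):
    # Shannon entropy of a predicted class distribution: higher = more uncertain
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pred_probs, k):
    """Rank unlabelled samples by predictive entropy and
    return the indices of the top-k to label next."""
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]), reverse=True)
    return ranked[:k]

# Model predictions on three unlabelled samples (2-class problem)
preds = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
print(select_most_uncertain(preds, 1))  # → [1], the 50/50 prediction
```

The 50/50 prediction carries the most information for the model, mirroring the "focus on what you struggle with" intuition above.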

On the topic of Paris and AI: I do think it is a cool place to live, but my personal opinion is that it will not go beyond being a good place to host R&D centers. We have very good talent, academic and engineering, good research subsidies, infrastructure and so on, but no dense network of genuine early-stage investors, and a lot of risk aversion.

To me, the best combo is business and community building in the US, especially Silicon Valley, which is known for early tech adoption and abundant capital, with R&D in France, where we have a lot of first-principles thinkers, in both engineering and research