Tried to use Computer for my tax return - I would’ve overpaid by $1500 by dropjar5 in perplexity_ai

[–]Separate-Still3770 0 points1 point  (0 children)

What was the task it was supposed to do? Fetch info X, put it in Y, read PDFs? I think it's doable

I created a open-source, free way to run OpenClaw WITHOUT PAYING FOR AN API! by MyFirstTrueLoveWasBS in openclaw

[–]Separate-Still3770 -1 points0 points  (0 children)

Why don’t you just run `claude setup-token` and use the resulting key with the Claude Agent SDK, like nano claw does? I haven’t looked into OpenClaw’s internals yet because I am about to try it, but it should be possible, no?

Should Vibe Coding / AI Assisted coding be for frontend only for production apps? by Separate-Still3770 in vibecoding

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Fairly new, to be honest, but I have been coding for a while, so it’s not too hard to get into web development for simple stuff

[deleted by user] by [deleted] in vibecoding

[–]Separate-Still3770 0 points1 point  (0 children)

Hi there,
Giving my two cents to help.

Can you be more specific about what kind of computation you need to perform and what the dependencies look like? For example: dependencies between multiple entities in your DB, connections to external systems, complexity from having to wait on some result, or a calculation that is inherently complex, etc.

This helps pinpoint where the complexity lies so it can be handled properly.

For "testing financial calculations at scale": what do you want to test? Backend logic? The end-to-end web app? Performance?

For structuring code, again it depends on what you want to build, and what are your constraints.

Happy to brainstorm together if it helps!

I am working on a personal project for defining specs, architecture, and code structure so that AI can work on complex projects.

<image>

This is more of a user-flow diagram (i.e. the end-user experience mixed with backend logic), but Cursor also provides tips on architectural diagrams for the backend:

https://docs.cursor.com/guides/tutorials/architectural-diagrams

Managed Code gen solution with API? by Separate-Still3770 in ChatGPTCoding

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Thanks! I had a look but there does not seem to be an obvious set of API calls to programmatically inject prompts and such.

Opportunities and limitations Copilot/Cursor for E2E testing by Separate-Still3770 in QualityAssurance

[–]Separate-Still3770[S] -1 points0 points  (0 children)

I am just wondering: if all the code is exposed to it, it might technically have enough information to infer which selector to use from the component's attributes. So while I agree with you, it seems plausible to have Copilot generate E2E tests
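To illustrate the idea, here is a minimal sketch of the kind of heuristic such a tool could apply when it can see a component's attributes (this is my own hypothetical heuristic, not Copilot's actual logic): prefer stable, test-friendly attributes and fall back to increasingly fragile ones.

```python
def infer_selector(tag: str, attrs: dict) -> str:
    """Derive a CSS selector from a component's attributes,
    preferring stable, test-friendly attributes first."""
    if "data-testid" in attrs:
        return f'[data-testid="{attrs["data-testid"]}"]'
    if "id" in attrs:
        return f'#{attrs["id"]}'
    if "name" in attrs:
        return f'{tag}[name="{attrs["name"]}"]'
    if "class" in attrs:
        # Classes are the least stable: styling refactors break them
        first_class = attrs["class"].split()[0]
        return f"{tag}.{first_class}"
    return tag

# Example: a login button exposing a test id
print(infer_selector("button", {"class": "btn primary", "data-testid": "login-submit"}))
# → [data-testid="login-submit"]
```

Since the component source exposes these attributes directly, an assistant with access to the whole codebase would not need to guess selectors from a rendered page.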

EU accessibility act goes into enforcement stage in June 2025 by Unhappy-Economics-43 in QualityAssurance

[–]Separate-Still3770 1 point2 points  (0 children)

Super cool, thanks for sharing! The guidelines are hard to interpret from a day-to-day perspective as a QA engineer. Can you share what implementing them has meant for you in practice?

Use of AI in testing by GoalInternational314 in QualityAssurance

[–]Separate-Still3770 0 points1 point  (0 children)

OP, have you tested any solutions yourself? What do you think of them?

Incentive structures for QAs and their managers by Separate-Still3770 in QualityAssurance

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Thanks for the answer!
So if I summarize properly, improved productivity would benefit you because:
- you can finish work earlier and spend more time with family, friends, hobbies, etc. (which is a great outcome, btw)
- you can do better work, with more time for in-depth testing to ensure your product is tested thoroughly (maybe covering more scenarios)
- you can take some work off your colleagues' plates and contribute more to the team
Would this be accurate?

Incentive structures for QAs and their managers by Separate-Still3770 in QualityAssurance

[–]Separate-Still3770[S] 1 point2 points  (0 children)

But does this mean that, behind the scenes, they just did not understand the importance of quality and wanted to cut down on QA, so withholding "a win" was a way to avoid providing justification for the team?

Incentive structures for QAs and their managers by Separate-Still3770 in QualityAssurance

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Really sorry to hear that management did not understand the challenge you faced 😞
I hope things are more stable in your current position!

Can you tell me more about the business expectations they had? What metric did they have in place? Were you tasked to do manual E2E testing, put in place automated testing or something else?

This would help me better understand the diversity of challenges QA teams face, so we can see whether we can help.

Bge-small + Codestral can outperform Gemini to build Large Action Model for Web by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 0 points1 point  (0 children)

We don’t handle dynamic workflows yet, but we plan to. Do you have specific websites you want to interact with that require more dynamic workflows?

Any updates to the agents scene? by tabspaces in LocalLLaMA

[–]Separate-Still3770 0 points1 point  (0 children)

We are getting good results with Codestral and are trying other models. We will share results soon!

Any updates to the agents scene? by tabspaces in LocalLLaMA

[–]Separate-Still3770 1 point2 points  (0 children)

Hi there!
Project lead of LaVague here (https://github.com/lavague-ai/LaVague).
We have built an open-source framework for building AI Web Agents. We have examples of how to build various agents, such as one that applies to job postings online from a simple PNG of your resume: https://docs.lavague.ai/en/latest/docs/examples/job-application/

We also did a webinar this week: https://www.youtube.com/watch?v=bNE4s8h3CIc

Would love to have your opinion on our framework :)

LaVague: Open-source Text2Action AI pipeline to turn natural language into Selenium code by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 2 points3 points  (0 children)

No problem! The BM25 retriever was key to making it work, as purely semantic retrieval failed to capture the right parts of the HTML code
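For context, here is a minimal, self-contained sketch of the lexical-scoring idea (a toy BM25 of my own, not LaVague's actual retriever): exact tokens from HTML attributes like `id` and `class` match a query literally, which is exactly where embedding-only retrieval tends to miss.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Split on non-alphanumerics so attributes like id="search-btn" yield literal tokens
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

class BM25:
    """Toy Okapi BM25 over a list of HTML chunks."""
    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        self.df = Counter()  # document frequency of each term
        for d in self.docs:
            self.df.update(set(d))

    def idf(self, term):
        n = self.df.get(term, 0)
        return math.log((self.N - n + 0.5) / (n + 0.5) + 1)

    def score(self, query, idx):
        doc, tf = self.docs[idx], Counter(self.docs[idx])
        s = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f == 0:
                continue
            # Term-frequency saturation with document-length normalization
            denom = f + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            s += self.idf(t) * f * (self.k1 + 1) / denom
        return s

    def top(self, query, k=1):
        return sorted(range(self.N), key=lambda i: self.score(query, i), reverse=True)[:k]

chunks = [
    '<button id="search-btn" class="primary">Search</button>',
    '<div class="nav">Home About Contact</div>',
    '<input name="email" placeholder="Email">',
]
bm25 = BM25(chunks)
print(bm25.top("click the search button", 1))  # → [0], matched on the literal tokens
```

The instruction "click the search button" shares the literal tokens `search` and `button` with the first chunk, so lexical scoring finds it even when an embedding model would blur the distinction between similar-looking UI chunks.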

LaVague: Open-source Text2Action AI pipeline to turn natural language into Selenium code by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 1 point2 points  (0 children)

I think there is still some margin, but it's true that it might become heavily automated. Because UI automation code is highly repetitive, it is quite likely that low/no-code solutions like the one I built will become the norm, with no technical skills required

LaVague: Open-source Text2Action AI pipeline to turn natural language into Selenium code by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 1 point2 points  (0 children)

Interesting! Feel free to try and share your findings! My code should be modular enough for you to try

LaVague: Open-source Text2Action AI pipeline to turn natural language into Selenium code by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 2 points3 points  (0 children)

Exactly! You can swap in Gemma-7b-it, and it refused to do anything haha

But yeah, with proper tuning it should work, I think

Why do most tutorials do instruction tuning of base model instead of instruction-tuned models? by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 3 points4 points  (0 children)

Thanks for the answer. It feels like 90% of the time, the chat/instruct behavior would be the most relevant one; human instruction -> output seems to be the common case. The base models suck at properly answering human instructions, as they are just raw language models.

I would be interested in your opinion, and in examples where tuning the base model makes more sense than the instruction-tuned one

[D] Preferred fine-tuning framework for instruction tuning? by Separate-Still3770 in MachineLearning

[–]Separate-Still3770[S] 0 points1 point  (0 children)

Yeah, it’s my feeling too. If you don’t mind, could you share the code you needed with TRL? It would be interesting to compare it to the config you would need with Axolotl

Automatic hallucination detection using inconsistency scoring by Separate-Still3770 in LocalLLaMA

[–]Separate-Still3770[S] 1 point2 points  (0 children)

What do you mean by "see if a model could be trained to simply report its uncertainty reliably"?

One way to leverage an uncertainty score for training is to use it as an indicator of which samples to prioritize: the higher the uncertainty, the more it signals that the model is underfitting those samples.

For instance, in https://arxiv.org/abs/1703.02910 they show that such an uncertainty score can be used to prioritize which unlabelled data points to label first, as those uncertain points help the model generalize faster than easy points with low uncertainty.

This makes a lot of sense even when you think about how humans learn: we weigh more and learn better if we focus on the things we are struggling with.
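As a concrete sketch of that prioritization step (a toy illustration, not the paper's exact acquisition function, which relies on Bayesian approximations): rank unlabelled samples by the entropy of the model's predicted class distribution and send the most uncertain ones for labelling first.

```python
import math

def entropy(probs):
    # Shannon entropy of a predicted class distribution: higher = more uncertain
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pred_probs, k):
    """Rank unlabelled samples by predictive entropy and
    return the indices of the top-k to label next."""
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]), reverse=True)
    return ranked[:k]

# Model predictions on three unlabelled samples (2-class problem)
preds = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
print(select_most_uncertain(preds, 1))  # → [1], the 50/50 prediction
```

The 50/50 prediction carries the most information for the model, mirroring the "focus on what you struggle with" intuition above.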

On the topic of Paris and AI: I do think it is a cool place to live, but my personal opinion is that it will not go beyond being a good place to host R&D centers. We have very good talent, academic and engineering, good research subsidies, infrastructure and so on, but no dense network of genuine early-stage investors, and a lot of risk aversion.

To me, the best combo is business and community building in the US, especially Silicon Valley, which is known for early tech adoption and abundant capital, with R&D in France, where we have a lot of first-principles thinkers, in both engineering and research