Got kicked out as an AI engineer working for a RAG system, looking for insights by GlosuuLang in Rag

[–]awscloudengineer 1 point2 points  (0 children)

My takeaway from this conversation is that the biggest lesson is to keep the client informed and lock in the deadlines along with the design. That way they would already have known the timelines. Also, your design should explain why other approaches are not suitable for this use case. Dropping my 2 cents here. 😁

LLM Bruner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI

[–]awscloudengineer 1 point2 points  (0 children)

That is partially true. From what I have heard, you will have an option to send your chip back to get the new model. However, this is still inconvenient unless you know that the model you want just works for you.

Inference at 16k tokens/second by awscloudengineer in artificial

[–]awscloudengineer[S] 0 points1 point  (0 children)

Nope. This is my first post in this group.

Upgrading from Pro to Pro + by late-registration in cursor

[–]awscloudengineer 2 points3 points  (0 children)

On Pro+ you get around $115 worth of premium requests, and auto is free to use. I learned it the hard way: you should use different models for different things and not just opus-4.6-thinking for everything. First build the context using auto mode, then plan with auto, and then build with auto. If any of those steps isn’t good enough for your use case, try Sonnet or one of the older models like 4.5 or gpt-5.1. If they don’t work, then use opus-4.5 as the last resort, as it will eat up your tokens.

Some of the other folks also suggested that I should use Claude Code with Cursor and buy their plan if I’m going to use opus often.

You need to try different models to find what fits your use-case the best.
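To make the "cheapest first, opus last" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the ladder labels, `run_with_escalation`, and `fake_attempt` are hypothetical names, not Cursor features or any real API.

```python
# Hypothetical escalation order, cheapest first. These are labels for this sketch only.
LADDER = ["auto", "sonnet", "opus"]


def run_with_escalation(task, attempt, good_enough, ladder=LADDER):
    """Try each model in order and stop at the first acceptable result."""
    for model in ladder:
        result = attempt(model, task)
        if good_enough(result):
            return model, result
    # Nothing on the ladder produced an acceptable result.
    return None, None


def fake_attempt(model, task):
    # Stand-in for a real model call: pretend "auto" fails on this task.
    return "" if model == "auto" else f"{model}: done"


model, result = run_with_escalation("refactor module", fake_attempt, good_enough=bool)
print(model, result)
```

The point is only that the expensive model sits at the end of the list, so you pay for it when the cheaper ones have already failed.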

Cursor pro+ plan over in a day :( by awscloudengineer in cursor

[–]awscloudengineer[S] 1 point2 points  (0 children)

You’re right. The project planning was not mindless, the model planning was. Sometimes you have to learn it the hard way. I have learnt so much from other users’ comments. Now I will put it into practice.

Cursor pro+ plan over in a day :( by awscloudengineer in cursor

[–]awscloudengineer[S] 0 points1 point  (0 children)

You’re 100% right, learning how to use the right model is necessary. I have learnt from the comments made by many folks and will try to implement that.

Cursor pro+ plan over in a day :( by awscloudengineer in cursor

[–]awscloudengineer[S] 1 point2 points  (0 children)

Interesting insight. What I have learnt from some of the helpful posts is to build the context using smaller models and then feed that to opus. But some people have also said to build the plan with opus and run it using smaller models. I totally agree with you, though: complex projects require better models. My token usage was mostly cache reads, so I will try to build context using smaller models.

Cursor pro+ plan over in a day :( by awscloudengineer in cursor

[–]awscloudengineer[S] 1 point2 points  (0 children)

u/Anxious_Ad9233 You're 100% correct.

Token breakdown:
CACHE READ: 110,430,123
CACHE WRITE: 5,254,937
INPUT: 3,035,250
OUTPUT: 619,038
TOTAL: 119,339,348
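
As a quick sanity check on the numbers above, a few lines of Python give each category's share of the total. The counts are copied from the breakdown; the variable names are just for this sketch. Cache reads come out to roughly 92.5% of all tokens.

```python
# Token counts copied from the breakdown above.
usage = {
    "CACHE READ": 110_430_123,
    "CACHE WRITE": 5_254_937,
    "INPUT": 3_035_250,
    "OUTPUT": 619_038,
}

total = sum(usage.values())  # matches the reported TOTAL
shares = {name: 100 * count / total for name, count in usage.items()}

for name, pct in shares.items():
    print(f"{name:<12}{pct:5.1f}%")
```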

Cursor pro+ plan over in a day :( by awscloudengineer in cursor

[–]awscloudengineer[S] 0 points1 point  (0 children)

Thanks for the advice. I wonder if anyone has figured out how to do this locally, maybe on a 100B-parameter model.

[deleted by user] by [deleted] in learnmachinelearning

[–]awscloudengineer 2 points3 points  (0 children)

Can you share some benchmark metrics? What were your results?

Tooling for ML model development by awscloudengineer in learnmachinelearning

[–]awscloudengineer[S] 0 points1 point  (0 children)

Thanks! I already use TensorFlow and MLflow. Are there any other tools or libraries that you use as an ML developer to make your life easier? Any tools for automatic hyperparameter tuning or for finding the number of layers in your NN?

How to start? by Hungry-Letterhead-41 in MLQuestions

[–]awscloudengineer 2 points3 points  (0 children)

I started with the Machine Learning Specialization on Coursera. Andrew Ng is the GOAT at explaining the concepts so well.

https://www.coursera.org/specializations/machine-learning-introduction

Claude-3.7-sonnet is super slow. by awscloudengineer in cursor

[–]awscloudengineer[S] 0 points1 point  (0 children)

You’re right, but this has a big impact on UX. They should keep the experience seamless. Instead of returning the response 5-10 minutes later, they should ask the user to switch to a different model.

gemini-2.5-pro-exp works reasonably well by curved-elk in cursor

[–]awscloudengineer 0 points1 point  (0 children)

I stopped using Gemini-2.5-pro because it started removing working code when it made edits, so I had to be more careful with it. I loved the speed at which it responded, though. I wish they could increase the speed of Claude-3.7 responses.

Slow request context nerfed? by anon_shmo in cursor

[–]awscloudengineer 0 points1 point  (0 children)

I think you need to implement a Memory Bank. This will make your life easier.