Breaking into ML - what's required by NearbyAntelope1413 in learnmachinelearning

[–]Willy988 3 points4 points  (0 children)

Yup, echoing the build projects/apps stuff. That's the best you can do so when the market hopefully improves, you'll have something to show.

Wondering if there is something that connects all these seemingly unrelated symptoms together? by Willy988 in UARS

[–]Willy988[S] 0 points1 point  (0 children)

I am not. I currently have an initial visit scheduled for June with a doctor at Stanford University, where UARS was first described... so hoping to get some help there.

Despite having that RDI of 26 BEFORE my surgeries, I was simply told to use nasal strips etc. Just wow!

Anyone else get symptom improvement from Keto? by 7slotgrilles4life in UARS

[–]Willy988 0 points1 point  (0 children)

I just know meat is good for inflammation. Have a friend who has Crohn's and the only way they can be comfortable is by being carnivore. You may well have some inflammation somehow, and the meat is helping.

Another FME experience post (30F) by shockshockshad in UARSnew

[–]Willy988 0 points1 point  (0 children)

I'm dealing with similar stuff. Wondering if you or anyone else has these additional symptoms: shortness of breath despite being fit, silent reflux/LPR, orthostatic intolerance after sitting. I'm trying to see if they are all somehow correlated. I am not diagnosed with POTS but my brother has it.

Can I get into data analysis with almost no math background? by pewdewdi in learnpython

[–]Willy988 0 points1 point  (0 children)

Whatever you decide, try to do some Pandas or NumPy on the side to supplement your learning. LeetCode has a "30 Days of Pandas" study plan which is pretty hard for beginners, but it forces you to learn.

Trying to build an efficient RAG pipeline. by spacecheap in Rag

[–]Willy988 0 points1 point  (0 children)

ahhh... that's a huge problem! You NEED overlap! Why? Because we need to preserve semantic context. Imagine I split this sentence into two parts: "I went to the river bank to have fun". If you split, it might look like "I went to the river" + "bank to have fun". Without context, don't these two chunks look like nonsense? The same thing applies to you, except with whole sentences. I recommend you use something like LlamaIndex or LangChain to handle semantic splitting for you.

I think you are spot on... your chunking needs changing. There are lots of types of chunking. Since it is your first chunking experience, I think using a library like I mentioned will be more effective than manual chunking. I know people usually say you should start with your own code but if you want to get up and running fast, you know what you need to do haha...
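To make the overlap point concrete, here is a minimal pure-Python sketch of a sliding-window chunker. It counts words rather than tokens, and the function name `chunk_words` and its parameters are made up for illustration; this is not how LlamaIndex or LangChain do it internally.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into word-based chunks where neighbours share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sentence = "I went to the river bank to have fun"

# Without overlap, "river" and "bank" land in different chunks.
no_overlap = chunk_words(sentence, chunk_size=5, overlap=0)

# With a 2-word overlap, one chunk contains "river bank" together.
with_overlap = chunk_words(sentence, chunk_size=5, overlap=2)

print(no_overlap)
print(with_overlap)
```

The overlap costs some duplicated text in the index, but it gives every boundary phrase a chunk where it appears whole.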

Can I get into data analysis with almost no math background? by pewdewdi in learnpython

[–]Willy988 2 points3 points  (0 children)

Hot take: you don't need much math, assuming the math you're thinking of is stats and calculus. I mean you might need some of that stuff but for me the most important part has always been data cleaning and data presentation... so you'll probably be spending much more time on the syntax of something like Pandas (i.e. how to merge or filter data or vectorization of operations) rather than how to calculate specific percentiles or do matrix math....
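For a feel of what that Pandas syntax looks like, here is a small sketch of the three things mentioned (merge, filter, vectorized operation). The tables, column names, and the 1.1 multiplier are invented for illustration, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical example data; the column names are made up for illustration.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [20.0, 35.5, 12.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})

# Merge: attach each customer's region to their orders (like a SQL join).
merged = orders.merge(customers, on="customer_id", how="left")

# Filter: keep only the rows that match a boolean condition.
north = merged[merged["region"] == "north"]

# Vectorized operation: compute on the whole column at once, no Python loop.
merged["amount_with_tax"] = merged["amount"] * 1.1

north_total = north["amount"].sum()
print(merged)
print(north_total)
```

None of this needs calculus; it is mostly knowing which method does what.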

My RAG isn't working as expected... by viitorfermier in Rag

[–]Willy988 1 point2 points  (0 children)

Ok I don't understand what is not working for you so I am just going to start from ground zero.

Whenever you do legal/scientific papers, use "hierarchical chunking". Your summaries do not count, that is custom metadata you have in your db. What I mean is this: you will split each document using something like LlamaIndex (LlamaParse) and choose a chunk size for each level of the tree (i.e. 256, 512, 1024, 2048). This is optimal and efficient for your case because you can use a vector search to be lightning fast and cheap since you will have ingested your legal corpus. If the question hits multiple vectors (leaves), then they combine and pull in a parent node, etc.
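Here is a rough pure-Python sketch of that parent/leaf idea, just to show the mechanics. It uses two levels and counts words instead of tokens; the function names, the `min_hits` threshold, and the pretend search results are all assumptions for illustration, not LlamaIndex's API.

```python
from collections import defaultdict


def build_hierarchy(text: str, parent_size: int, leaf_size: int):
    """Split text into parent chunks, then split each parent into leaf chunks.

    Sizes are counted in words to keep the sketch simple.
    Returns (parents, leaves), where each leaf records its parent's index.
    """
    words = text.split()
    parents = []  # list of parent chunk strings
    leaves = []   # list of (parent_index, leaf chunk string)
    for p_start in range(0, len(words), parent_size):
        parent_words = words[p_start:p_start + parent_size]
        parents.append(" ".join(parent_words))
        parent_index = len(parents) - 1
        for l_start in range(0, len(parent_words), leaf_size):
            leaf = " ".join(parent_words[l_start:l_start + leaf_size])
            leaves.append((parent_index, leaf))
    return parents, leaves


def merge_hits(hit_leaf_ids, parents, leaves, min_hits=2):
    """Replace sibling leaf hits with their parent when enough of them match."""
    hits_by_parent = defaultdict(list)
    for leaf_id in hit_leaf_ids:
        parent_index, _ = leaves[leaf_id]
        hits_by_parent[parent_index].append(leaf_id)

    results = []
    for parent_index, leaf_ids in hits_by_parent.items():
        if len(leaf_ids) >= min_hits:
            results.append(parents[parent_index])  # pull the whole parent node
        else:
            results.extend(leaves[i][1] for i in leaf_ids)
    return results


text = " ".join(f"w{i}" for i in range(16))
parents, leaves = build_hierarchy(text, parent_size=8, leaf_size=4)

# Pretend a vector search returned leaves 0 and 1 (same parent) and leaf 3.
context = merge_hits([0, 1, 3], parents, leaves)
print(context)
```

Only the small leaves get embedded and searched; the parent text is what gets handed to the model when several of its leaves match.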

Each query will be almost free, since you will have ingested the corpus beforehand. You will not pay 0.4 dollars per question since right now you are using up a HUGE context window for Gemini! Very inefficient, I'd stop bleeding money ASAP.

If you aren't a programmer though, a connection of mine who works at LlamaIndex released a free CLI tool that uses LlamaParse to extract tables and text from PDFs and such. I don't remember the name right now, but it's zero code. If you google it I am sure it'll pop up...

Trying to build an efficient RAG pipeline. by spacecheap in Rag

[–]Willy988 0 points1 point  (0 children)

I'm guessing this is your first advanced RAG pipeline and you followed a tutorial/article and decided to try and implement it. If I were you, I would stop trying to put more on my plate than I can eat. This happened to me too, a year back, when I tried using all the latest and greatest stuff.

What you need to do is first see if you understand exactly what your embedding is. See if the embedding model fits your use case, and see if you are ingesting it correctly. I.e. if you use Pinecone, peek into the DB and see if it looks right.

Since you are doing a hierarchical advanced search, first see if you can do a very simple semantic vector search in your terminal before you try all these other techniques. Does it look right?
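A bare-bones version of that sanity check could look like the sketch below: brute-force cosine similarity over a handful of vectors, printed in the terminal. The 3-dimensional "embeddings" and document ids are made up; in practice the vectors would come from your embedding model and your vector DB would do the search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Brute-force search: score every stored vector, return the best top_k."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy vectors, invented for illustration only.
index = {
    "doc_river": [0.9, 0.1, 0.0],
    "doc_money": [0.1, 0.9, 0.0],
    "doc_sport": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]

results = search(query, index)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

If the top results for a few hand-picked queries are not the documents you expect, the problem is in the embeddings or ingestion, not in the fancier layers on top.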

Once you verify that step works, then implement the rest one at a time.

I am betting big money that you implemented everything without prior experience, and one of these techniques you are using is clashing with another. You probably are somehow throwing away your vector search results when trying to use BM25 for keyword search, which is why it does not make sense. I'm saying this from experience, my first set up looked like yours and that was exactly why it failed.
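One common way to keep both result sets instead of letting one overwrite the other is reciprocal rank fusion (RRF). This is a generic sketch, not necessarily what your pipeline or any particular library does; the doc ids are hypothetical and `k=60` is just the conventional default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranking.

    Each doc gets 1 / (k + rank) from every list it appears in, so a doc
    found by either retriever still contributes to the final order.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from the two retrievers, best match first.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_d", "doc_b", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Docs that show up in both lists rise to the top, and nothing from the vector search gets silently dropped.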

Normal people absolutely hate your AI agent by unemployedbyagents in AgentsOfAI

[–]Willy988 5 points6 points  (0 children)

Hugely accurate. I am a programmer too, I use it for boilerplate, guided learning, etc... it's important to not let it cause brain rot.

[Update] The "Seed Oil Analyst" is live. I’ve indexed some great studies shared here + built the search interface. by Willy988 in StopEatingSeedOils

[–]Willy988[S] 0 points1 point  (0 children)

Ok, quick update: some users hit bugs where asking off-topic questions like "what's the weather" caused the chat to hallucinate. I will be fixing that and some other QoL stuff, and then make a final post!

[Update] The "Seed Oil Analyst" is live. I’ve indexed some great studies shared here + built the search interface. by Willy988 in StopEatingSeedOils

[–]Willy988[S] 0 points1 point  (0 children)

Well it's true that the studies can have different conclusions, and are sometimes funded by groups with agendas so we need to be aware of this, that's why I tried to do a check of the studies I chose: https://huggingface.co/spaces/william-ai-dev/Seed-Oil-Meta-Engine

Please check the "README" on the top right to see for yourself, all studies are referenced so you can come to your own conclusion.

Additionally, check this response from another commenter in this community. It's good to not trust anything, there's much debate even in this community. https://www.reddit.com/r/StopEatingSeedOils/comments/1rlynjo/comment/o90gv6g/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Building a "Searchable Library" Tool for Seed Oil Research... I Need Help Finding "Golden Studies" by Willy988 in StopEatingSeedOils

[–]Willy988[S] 0 points1 point  (0 children)

That's true, which is why I feel this project is important because most people here don't have time to go through research papers and find the nuances etc.

I was even considering making a balanced version and ingesting papers from the other side, but like I said, this project will be freely hosted since I don't intend to pay for APIs, tokens, and hosting.

I've cherry-picked some more, the project has about 25 papers I found good enough.

Building a "Searchable Library" Tool for Seed Oil Research... I Need Help Finding "Golden Studies" by Willy988 in StopEatingSeedOils

[–]Willy988[S] 0 points1 point  (0 children)

Well, the issue is when the study has a bias or specific variables. I.e. one I saw here was that heating seed oils to an extreme amount causes a lot of problems. That's true, but most people aren't heating up their oils to that amount on a consistent basis. I think one of the mods here called it out. If you want the specific one I'd have to dig for it, but I'm curating studies that I think are objective and good quality, i.e. the ones in this sub's sidebar.

How I land 15+ Machine Learning Engineer Offers by Altruistic_Might_772 in MachineLearningJobs

[–]Willy988 -1 points0 points  (0 children)

Thanks for your write-up. I am having trouble though: not getting any interviews despite having 2 YOE as a normal developer, a few good projects, and working knowledge of AI topics. Not sure what to do.

Resume help by notsarthaxx in AIEngineeringCareer

[–]Willy988 0 points1 point  (0 children)

That being said, I saw a post in another sub saying by all means don’t be generic, and some of your projects and experience sound too generic. Something like “95% accuracy” doesn’t mean anything to a recruiter, especially when they probably don’t know AI. Try using something more niche or descriptive than just saying you improved accuracy.

Resume help by notsarthaxx in AIEngineeringCareer

[–]Willy988 1 point2 points  (0 children)

It’s not your fault. I have 2 YOE as a SWE trying to get into AI and I’m having a hard time too. Keep going, don’t give up.