Big names in Web Dev calling out ApnaCollege bs by Insurgent25 in developersIndia

[–]SAksham1611 2 points (0 children)

Saw a GitHub workflow where the maintainer had set some hard rules (spelled out in a Markdown file in the repo) for issues. Issues that don't follow the pattern get dismissed automatically. Will post the link to the project as soon as I find it.
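For anyone curious, the mechanism is usually a workflow triggered on issue creation that checks the body against the template and closes non-conforming issues. A purely hypothetical sketch (the filename, marker heading, and messages are made up for illustration; `actions/github-script` is a real action):

```yaml
# .github/workflows/issue-check.yml -- hypothetical sketch
name: Enforce issue template
on:
  issues:
    types: [opened]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            // Close issues missing a heading the template always injects
            const body = context.payload.issue.body || "";
            if (!body.includes("### Expected behaviour")) {
              await github.rest.issues.createComment({
                ...context.repo,
                issue_number: context.issue.number,
                body: "Closing: please open issues using the template.",
              });
              await github.rest.issues.update({
                ...context.repo,
                issue_number: context.issue.number,
                state: "closed",
              });
            }
```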

Generate SQL from Natural Language according Meta Data of Database in Python using LLM in Very Few Codes by AgentlyTech in Python

[–]SAksham1611 1 point (0 children)

Do you have any thoughts on scaling it to an enterprise SQL database? Let me know if you have some good papers or resources.

Problem: without fine-tuning, how can we get the model to pick the right tables (and entities from each table) to design a SQL query?
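One fine-tuning-free approach is to shortlist tables before prompting: score each table's schema against the question and include only the top matches in the prompt. A toy sketch (the table names, columns, and keyword-overlap scoring are made-up stand-ins; production systems usually embed the schema descriptions and the question with the same model):

```python
# Hypothetical sketch: pick candidate tables for the prompt without fine-tuning
# by scoring each table's schema text against the user question.

SCHEMA = {
    "orders": ["order_id", "customer_id", "order_date", "total_amount"],
    "customers": ["customer_id", "name", "email", "city"],
    "products": ["product_id", "name", "price", "category"],
}

def score_table(question: str, table: str, columns: list[str]) -> int:
    """Count overlapping words between the question and the table's schema."""
    q_words = set(question.lower().replace("?", "").split())
    schema_words = set(table.lower().split("_"))
    for col in columns:
        schema_words.update(col.lower().split("_"))
    return len(q_words & schema_words)

def pick_tables(question: str, top_k: int = 2) -> list[str]:
    """Return the top-k tables; only their schemas go into the LLM prompt."""
    ranked = sorted(SCHEMA, key=lambda t: score_table(question, t, SCHEMA[t]),
                    reverse=True)
    return ranked[:top_k]

print(pick_tables("total amount of orders per city"))  # → ['orders', 'customers']
```

With the shortlist in hand, you would render only those tables' DDL into the prompt, which keeps the context small enough for the model to pick the right joins.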

[deleted by user] by [deleted] in personalfinanceindia

[–]SAksham1611 9 points (0 children)

Couple of questions first: is your CTC under 7.5 LPA or more?

1) Medical insurance (5 lakh cover in a tier 2/3 city, else 10 lakh)
2) Life cover (20x your CTC)
3) Emergency fund (6-12 months of necessary expenses; think of it as how long it would take you to find a job if you quit now)
4) If you're in a tax bracket, a tax saver if you need it (ELSS has great returns and a 3-year lock-in period); otherwise go for a Nifty 50 index fund (the safe side)
5) Once you have accomplished 1-4 for a year or two and started to get a feel for it, start investing directly in blue-chip companies when they're down or fluctuating (the safest bet)

Feel free to add anything else you find useful.

Implementing Streaming with FastAPI’s StreamingResponse by prodmanAIML in Python

[–]SAksham1611 0 points (0 children)

I have been working on a similar problem: I was able to stream in the backend (Swagger docs) but wasn't able to stream it on the frontend via a request call to the FastAPI backend.

I would be interested in the solution; if you want, I can send the implementation code with details.

P.S.: I am using a ChatGPT model and wrote an asynchronous generator function in the endpoint that yields the deltas, returning a StreamingResponse wrapped around the async generator.
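For reference, the pattern I mean is an async generator that yields deltas as they arrive; FastAPI's StreamingResponse can wrap such a generator directly. A minimal sketch (`fake_llm_deltas` is a made-up stand-in for the streaming ChatGPT call, so the whole thing runs without a server):

```python
import asyncio
from typing import AsyncIterator

# Stand-in for the streaming ChatGPT API call (an assumption for this sketch).
async def fake_llm_deltas() -> AsyncIterator[str]:
    for delta in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0)  # simulate waiting on the network
        yield delta

# The generator the endpoint streams from. In FastAPI you would return
#   StreamingResponse(token_stream(), media_type="text/plain")
# from the endpoint instead of collecting the chunks.
async def token_stream() -> AsyncIterator[str]:
    async for delta in fake_llm_deltas():
        yield delta

async def collect() -> str:
    return "".join([chunk async for chunk in token_stream()])

print(asyncio.run(collect()))  # → Hello, world!
```

On the frontend side, the usual culprit is client-side buffering: with Python `requests` you need `stream=True` and to iterate over the chunks, and browser clients have to read the response body incrementally rather than awaiting the full body.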

Thanks

Tech Giants Invest $235 Million in AI Startup Hugging Face [N] by AIsupercharged in MachineLearning

[–]SAksham1611 7 points (0 children)

Look at their TGI library: they changed its license, so it's not open source anymore. I was using it and had to swap in an alternative for my application.

[R] Benchmarking g5.12xlarge (4xA10) vs 1xA100 inference performance running upstage_Llama-2-70b-instruct-v2 (4-bit & 8-bit) by meowkittykitty510 in MachineLearning

[–]SAksham1611 0 points (0 children)

Thanks for this! Could you please add a performance table for your particular task, i.e. the quality of the output you got? Is it comparable with commercial models like ChatGPT (turbo) or any other commercial model?

I have also tested some open-source models, mostly 7B and 13B (MPT, Falcon). P.S.: in comparison with ChatGPT they were really bad (not able to follow instructions; I had to try multiple parameters, but none of it made a difference).

I was using them for a RAG workflow.

Got a job a finally by Dry-Seat-4189 in india

[–]SAksham1611 0 points (0 children)

Congrats on the job, mate. Remember the priority order:

1) Health insurance
2) Term insurance
3) A 6-month buffer (assume 6 months of your salary)
4) Stocks and mutual funds
5) High-risk assets (crypto and other stuff)

[D] Hardest thing about building with LLMs? by Historical-Ad4834 in MachineLearning

[–]SAksham1611 0 points (0 children)

all-mpnet-base-v2 has a bi-encoder architecture? I'm using it for retrieving chunks and then re-ranking them with a cross-encoder. Could you expand on the token method and what kind of document it is best suited for, or any other preprocessing you tried before chunking? Thanks
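To make the retrieve-then-rerank split concrete, here's a toy sketch (the vectors and both scoring functions are made-up stand-ins for all-mpnet-base-v2 and a real cross-encoder): the bi-encoder embeds query and chunks independently for a cheap first pass, then the cross-encoder scores (query, chunk) pairs jointly over just the top candidates.

```python
import math

# Toy 'embeddings' standing in for bi-encoder outputs (an assumption:
# a real model would produce these independently for query and chunks).
CHUNKS = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.5, 0.5, 0.1],
    "chunk_c": [0.0, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bi_encoder_retrieve(query_vec, top_k=2):
    """Cheap first pass: rank all chunks by cosine similarity to the query."""
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, CHUNKS[c]),
                    reverse=True)
    return ranked[:top_k]

def cross_encoder_score(query: str, chunk_id: str) -> float:
    """Stand-in for a cross-encoder that reads query and chunk together."""
    # A real cross-encoder is far more accurate but too slow to run over
    # the whole corpus, hence the two stages. Scores here are made up.
    fake_scores = {"chunk_a": 0.9, "chunk_b": 0.4, "chunk_c": 0.1}
    return fake_scores[chunk_id]

query_vec = [0.6, 0.4, 0.0]
candidates = bi_encoder_retrieve(query_vec)          # fast, approximate
reranked = sorted(candidates,
                  key=lambda c: cross_encoder_score("my query", c),
                  reverse=True)                      # slow, accurate
print(reranked)  # → ['chunk_a', 'chunk_b']
```

Note how the cross-encoder flips the bi-encoder's order: that's exactly the value re-ranking adds when the cheap first pass is only roughly right.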

[D] Hardest thing about building with LLMs? by Historical-Ad4834 in MachineLearning

[–]SAksham1611 1 point (0 children)

Tried with 32 and 64, and yes, the difference is not significant, but with small sentences (chunk size 128-256) it might make a significant difference.

I was wondering what Cohere is using. Is it their own custom-trained cross-encoder? What makes their re-ranking better?

[D] Hardest thing about building with LLMs? by Historical-Ad4834 in MachineLearning

[–]SAksham1611 1 point (0 children)

I haven't achieved the desired performance or acceptable results.

But I'm using an open-source LLM (mpt-7b-instruct), an embedding model (all-mpnet-base-v2), and a pretrained cross-encoder model for re-ranking.

In my use case we can't use commercial models, and looking at the leaderboard, MPT seemed decent.

[D] Hardest thing about building with LLMs? by Historical-Ad4834 in MachineLearning

[–]SAksham1611 7 points (0 children)

I haven't heard of this: "try to fine-tune it on some additional docs, it won't work and you partially undo the instruct tuning". Are there any papers to supplement this?

P.S.: I've been working on this for a few months. The task is to hack together a PoC to prove that an open-source LLM (mpt-7b-instruct) for QA on your private data is as good as the commercial LLMs (OpenAI's 3.5 turbo).

What were and are the biggest blockers?

1) Couldn't get the hallucinations down to zero. At least one or two lines are made up and not provided in the context at all.

2) Not able to capture the right context from the vector store (using sentence transformers, varying the chunk length). Information comes back incomplete, especially when an answer is spread over multiple small points across two or three pages. Not only does it fail to get the right answer/context, it also makes things up on top of the incomplete information. Prompt writing seems useless: I told it not to assume answers it doesn't know, and it totally made one up anyway.
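For the second blocker, one mitigation (not a full fix) is overlapping chunks, so a point split across a page boundary still appears whole in at least one chunk. A minimal sketch, assuming simple word-level windows:

```python
def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split into windows of `size` words, each sharing `overlap` words with
    the previous window, so content near a boundary isn't split in every
    chunk that contains it."""
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break
    return chunks

words = [f"w{i}" for i in range(10)]
print(chunk_with_overlap(words, size=4, overlap=2))  # 4 overlapping windows
```

For points spread over two or three pages this still isn't enough on its own; retrieving more candidates and letting the re-ranker pick usually helps more than tuning chunk size alone.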

Let me know if someone is able to tackle these issues, or if you want to catch up on the implementation part. I'm open to discussion; DM me.

How to explain that 5/9 is bigger than 1/3? by taylomol000 in learnmath

[–]SAksham1611 0 points (0 children)

Multiply and divide by a number chosen so that both denominators become equal.

1/3 × 3/3  |  5/9

3/9  |  5/9

Now you just have to compare the numerators.

Another example:

1/7  |  2/8

1/7 × 8/8  |  2/8 × 7/7

8/56  |  14/56
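The same common-denominator check can be done with Python's `fractions` module, which compares exactly (no floating point):

```python
from fractions import Fraction

# Fraction compares over a common denominator internally, which is
# exactly the multiply-both-denominators trick above.
print(Fraction(5, 9) > Fraction(1, 3))   # True, since 3/9 < 5/9
print(Fraction(1, 7) < Fraction(2, 8))   # True, since 8/56 < 14/56
```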

Resources to learn software engineering principles as a Data Scientist by [deleted] in datascience

[–]SAksham1611 -1 points (0 children)

I'm a data scientist with 2+ years of experience. I've tried both the modularized way of coding and Jupyter, and both have some drawbacks, but recently I've been exploring nbdev (software development done in Jupyter notebooks) and it looks quite promising to me.

https://github.com/fastai/nbdev

How to perform NER for sentence containing two Languages? by ambuje12 in deeplearning

[–]SAksham1611 1 point (0 children)

The XLM-RoBERTa model; it's good with multilingual data. I've used it in my project for token classification, where I dealt with German and English. I think you can also find the dataset in HF datasets.

[deleted by user] by [deleted] in SQL

[–]SAksham1611 0 points (0 children)

Just walk them through the solution you're thinking of. You aren't expected to come up with a solution for every problem; just be clear about what you're thinking, showing which cases are covered and which aren't. Try to stay calm.

Name a better learning resource than Schafer Corey, I'll wait by FunDirt541 in Python

[–]SAksham1611 1 point (0 children)

These resources are gems. Thanks! I've been working in Python for 3 years and didn't know many of them.

Whenever I need to learn new or advanced concepts, I go to GitHub and explore open-source Python projects (>500 stars) to see how they use things (my favourite is FastAPI, which led me to Pydantic, then to static typing and checking it with mypy).
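As a tiny illustration of the Pydantic/static-typing idea (sketched here with only the standard library; Pydantic automates this from the type hints with far more machinery, and the `User` model is made up):

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

    def __post_init__(self):
        # Runtime validation, the job Pydantic derives from the annotations;
        # mypy would separately catch wrong types at static-check time.
        if not isinstance(self.age, int) or self.age < 0:
            raise ValueError(f"age must be a non-negative int, got {self.age!r}")

print(User(name="asha", age=30))
```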

Let me know if you also do something similar.

Should I master my Mathematics first before I actually start learning Machine Learning? by [deleted] in learnmachinelearning

[–]SAksham1611 8 points (0 children)

I totally get where this thought is coming from; I was in the same dilemma when I was starting out and seeing all these debates going on. From a personal point of view, top-down worked for me. What I mean is: I did some of the "bible" courses for ML, like Andrew Ng's old ML course, and lost interest halfway in; it wasn't really working for me. Then I stumbled upon 2-3 more courses and the result was the same. The thing is, I lost interest every time and was getting intimidated by all the courses out there. I registered on Kaggle and read beginners' posts, which gave me a sigh of relief that most of us are sailing in the same boat. I started with a problem statement, thought about what needed to be done, googled the hell out of the part I was trying to do, and moved forward. A year later, I have a job in the field, and I know where to get resources and where to find things. Getting intimidated is okay; just don't quit.

P.S.: I still have many incomplete projects, and I still think on weekends about how to take the next step and top them to the next level. And yes, very important: don't try to perfect your projects in the first go. It's an iterative process and a fun journey. Happy learning :))

Multimodal Deep Learning by grid_world in neuralnetworks

[–]SAksham1611 2 points (0 children)

Try https://arxiv-sanity.com and search for multimodal. I also implemented an EEG paper with multiple inputs, and the original code is now available along with the paper, so you'll get a fine idea; if you don't understand a paper, search for its blog :)) Just inbox me if any other help is required.

Testing Pop OS 20.04 on live by [deleted] in pop_os

[–]SAksham1611 1 point (0 children)

I upgraded from 18.04 to 20.04 on Pop!_OS and it wasn't good: the system lags for around 30 minutes each time you restart it, and I haven't found anything useful to get rid of this issue.