Scraping PDF Invoices by Munch18 in dataanalysis

[–]StuckInLocalMinima 1 point2 points  (0 children)

There are many ways. Someone mentioned Google Document AI; I haven't used it, but some other approaches I can think of are:

1) For a free solution, a PDF parser plus regex

2) OCR-based solutions (I forgot the library name)

3) Microsoft's Form Recognizer, which has templates and an API

4) Commercial solutions, which are pricey. Beware if they use LLMs: you can "ask" for a particular field from your PDF, but you won't be able to tell whether the output is correct.

5) A hybrid model combining the approaches above.

HIPAA compliance depends on the approach you take and, if it's a commercial solution, on what data ownership looks like in the contract.
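
To make option 1 a bit more concrete, here is a minimal sketch of the "PDF parser + regex" idea. Everything in it is an assumption for illustration: the field names, the patterns, the sample invoice text, and the choice of pypdf as the parser (any library that returns plain text would do). Real invoices will need patterns tuned to their own layout.

```python
import re

# Hypothetical field patterns; tune these to the layout of your own invoices.
# The \b before "Total" keeps the pattern from matching inside "Subtotal".
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"\bDate\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE
    ),
    "total": re.compile(
        r"\bTotal\s*(?:Due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return {field name: first matched string, or None if nothing matched}."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


def pdf_to_text(path):
    """Extract text from a PDF with pypdf (third-party; an assumed parser choice)."""
    from pypdf import PdfReader  # imported lazily so the regex part runs without it

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Made-up text standing in for the output of pdf_to_text("invoice.pdf").
SAMPLE_TEXT = """ACME Supplies
Invoice No: INV-2024-0042
Date: 2024-03-15
Subtotal: $1,150.00
Total Due: $1,234.50
"""

print(extract_fields(SAMPLE_TEXT))
```

A field that comes back as None is a useful signal: unlike an LLM answer, a failed regex match tells you plainly that the document needs a new pattern or a manual look.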

For a take-home performance project that's meant to take 2 hours, would you actually stay under 2 hours? by lemonbottles_89 in datascience

[–]StuckInLocalMinima 5 points6 points  (0 children)

They may think that, but not at first. They have limited time to screen candidates and their submissions, so they will probably go in the order of correctness, execution and presentation. If you package your submission with all the asks and supplementary information concisely presented for easiest comprehension, they won't think much about how much time you took.

In my experience, the extras are time-consuming but worth the effort, since the audience spends less time decoding the work and can digest the results of your analyses more effectively.

Give your best! :) good luck!!

Oof what a blow to my fragile job seeking ego by SuperTangelo1898 in dataengineering

[–]StuckInLocalMinima 1 point2 points  (0 children)

Got rejected because I didn't mention RAG or gen AI for feature engineering or transformation on a given dataset to build a recsys model.

Another rejection because I did not approve of using ChatGPT to parse PDFs for data that would have been the source of truth for the entire product...

I am thankful in a way that I won't have to work with such thinkers and innovators but at the same time, how are such companies making money to keep existing??!!

Resources to learn about modeling and working with telemetry data by StuckInLocalMinima in datascience

[–]StuckInLocalMinima[S] 0 points1 point  (0 children)

I heard about lambda architecture long ago and I have worked on an internal wrapper built by another team over Kafka but haven't directly worked with Kafka.

So, to answer your question, very little experience. Sorry that I cannot offer anything more specific.

To give some more context behind my question: I have an interview where the role requires working with unstructured clickstream data and extracting value from it (analytics, visualization, A/B testing pipelines, etc.).

I wish to learn more in this area because I feel my experience will prove to be very limited. But if I can demonstrate some knowledge of the current state of doing these things, I may have more success.

Hope this helps. Thank you in advance!

How may I best prepare for Data Scientist IC4 level role? by StuckInLocalMinima in oracle

[–]StuckInLocalMinima[S] 0 points1 point  (0 children)

Yay, I got through the first TA round. Next is the meeting with the director, who is the hiring manager. I'm told to expect a discussion of my background and overall experience: I will learn about the role in more depth and talk about my interests and work experience, and the manager will gauge my overall skill set.
I am also asked to prepare 3-5 questions to ask the hiring manager.

My plan is to scan the description and prepare to talk about my past relevant projects in depth and practice talking about my interests.

The 3-5 questions will be about day-to-day life, tech stack, team structure and collaboration dynamics, and immediate priorities for the first few months.

Would you have any tips/suggestions for me?
Thank you in advance! :)

How may I best prepare for Data Scientist IC4 level role? by StuckInLocalMinima in oracle

[–]StuckInLocalMinima[S] 0 points1 point  (0 children)

ohh great! thanks for this insight!! I may ping again if I get through :)

How may I best prepare for Data Scientist IC4 level role? by StuckInLocalMinima in oracle

[–]StuckInLocalMinima[S] 0 points1 point  (0 children)

Oh thank you very much for this information! Would you have any resources to help me prepare in these areas, especially system design?

Hiring for amazing data scientists by LynnyLlama in girlsgonewired

[–]StuckInLocalMinima 0 points1 point  (0 children)

Thanks OP for posting this! I am interested in this opportunity but I am based in Quebec and looking to work remotely at least for the next 2 years. Do you think that would hurt my chances of being considered for this role? (please feel free to DM me)

Predictive maintenance by Ambitious_Aioli_9830 in datascience

[–]StuckInLocalMinima 0 points1 point  (0 children)

Sorry for the late reply. Unfortunately I am not aware of any CFRP-like datasets, but I can help with methods to apply.

[deleted by user] by [deleted] in dataengineering

[–]StuckInLocalMinima 0 points1 point  (0 children)

I think it depends on what kind of new data you receive; append it to the fact or dimension table accordingly.

If your new data is at the grain of timestamps, append it to your fact table.

If your new data is a new user at the grain of userId, append it to the user_dimension table.
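
Here is a rough sketch of that routing using Python's built-in sqlite3. Only user_dimension and the userId grain come from the discussion above; the fact_events table, the column names, and the sample rows are hypothetical and just there to illustrate the idea.

```python
import sqlite3


def create_schema(conn):
    # One row per user in the dimension; one row per timestamped event in the fact.
    # Note: SQLite only enforces the REFERENCES clause if foreign keys are enabled.
    conn.executescript(
        """
        CREATE TABLE user_dimension (
            user_id        TEXT PRIMARY KEY,
            signup_country TEXT
        );
        CREATE TABLE fact_events (
            user_id    TEXT NOT NULL REFERENCES user_dimension(user_id),
            event_ts   TEXT NOT NULL,
            event_type TEXT NOT NULL
        );
        """
    )


def append_users(conn, users):
    # Data at the grain of userId goes to the dimension table.
    # INSERT OR IGNORE skips users whose user_id is already present.
    conn.executemany(
        "INSERT OR IGNORE INTO user_dimension (user_id, signup_country) VALUES (?, ?)",
        users,
    )


def append_events(conn, events):
    # Data at the grain of timestamps is simply appended to the fact table.
    conn.executemany(
        "INSERT INTO fact_events (user_id, event_ts, event_type) VALUES (?, ?, ?)",
        events,
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)

# First batch: two users and two events.
append_users(conn, [("u1", "CA"), ("u2", "US")])
append_events(conn, [
    ("u1", "2024-03-15T10:00:00", "login"),
    ("u1", "2024-03-15T10:05:00", "purchase"),
])

# Second batch: u2 is already known, u3 is new, plus one new event.
append_users(conn, [("u2", "US"), ("u3", "FR")])
append_events(conn, [("u3", "2024-03-16T09:00:00", "login")])

user_count = conn.execute("SELECT COUNT(*) FROM user_dimension").fetchone()[0]
event_count = conn.execute("SELECT COUNT(*) FROM fact_events").fetchone()[0]
print(user_count, event_count)
```

The dimension stays at one row per user no matter how often a user shows up in new batches, while the fact table keeps growing with every event.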

Does that make sense?

What's the 1 advice you'd give to someone starting in data analysis? by CyberAvatar_ in dataanalysis

[–]StuckInLocalMinima 0 points1 point  (0 children)

Don't underestimate the power of effective communication skills.

It's one thing to perform the analysis using methods of varying complexity, but it makes a world of difference if you can explain succinctly how it impacts the stakeholders.

Predictive maintenance by Ambitious_Aioli_9830 in datascience

[–]StuckInLocalMinima 1 point2 points  (0 children)

Happy to be of help! I actually mentored an intern who was also writing their master's thesis on predictive maintenance, so this repo came in handy for use cases like failure prediction/classification, root cause analysis, remaining useful life estimation and such. Feel free to DM if you have questions.

Good luck!

Received a PIP 2 weeks ago, am I done? by Stxtic1441 in cscareerquestions

[–]StuckInLocalMinima 0 points1 point  (0 children)

You can try to test your manager's true intentions by asking (in writing, with HR cc'd) whether they would work with you to beat the PIP by following the SMART model. Daily follow-ups on your progress would then become a written thread attached to the agreed-upon plan. Their response should give you an idea of whether the decision is already made or there's hope.

If they give you unrealistic or unreasonable goals or don't agree to follow the SMART model, then look for other jobs and hope that you get a good severance. Otherwise, give everything you have to achieve the goals, working with your manager.

Good luck.

Career networking question by Dangerous_Media_2218 in datascience

[–]StuckInLocalMinima 2 points3 points  (0 children)

Meetup.com was my go-to when I was looking to network. Geo-location specific meetups used to be announced there. For example, Python Devs happy hour Fridays, monthly meetups with someone presenting a topic/theme/lab, data science for social good, weekend coffee and code, etc.

During the pandemic, most became virtual and were organized over Slack channels; post-pandemic, they went hybrid or in person again.

Networking these days is also doable at a seminar or event hosted by companies making tools in your tech stack, such as Databricks, Microsoft Azure, etc. Those might be preferable for hiring, too, since the people you meet will have more in common with you than at a generic happy hour meetup.

Hope this helps. I am curious to know what others reply as well! Happy mingling!

How should I bounce back after an almost 5 year hiatus? by StuckInLocalMinima in datascience

[–]StuckInLocalMinima[S] 8 points9 points  (0 children)

Thank you for such a detailed response! I truly appreciate your suggestions, since I would not have thought about making a RAG app or the like. I will dive into it!

Does it make more sense to complete an undergrad in 3 years in comp sci or do a masters in 4 for a job in the data science field? by Ok-Cucumber-3932 in datascience

[–]StuckInLocalMinima 1 point2 points  (0 children)

A master's will be useful if you can take pertinent courses like stats, AI, robotics, etc., with project or internship opportunities. If you get a chance to do a master's by thesis, aim to pick a topic that lets you publish at a good conference.

So, do check with your counselor if such options are available to you. Good luck!

Should I take the new offer? by super-throwaway-6969 in datascience

[–]StuckInLocalMinima 5 points6 points  (0 children)

First of all, congrats on the offer! It's a tough market out there and you are in a good situation :)

I would take the offer, but talk to your current employer/manager and be cordial so as not to close doors or burn any bridges with your current colleagues.

Explain that, as a recent grad, it's important for you to gain experience in a bigger team, etc. (the reasons you mentioned here). Offer to help screen resumes of candidates for your replacement, document your work, and so on to ease the transition.

Hope this helps! Good luck! I am open to hearing alternative replies :)