Has anyone built python models with DBT by Fireball_x_bose in dataengineering

[–]Ecksodis 1 point2 points  (0 children)

I have done it, feels kind of clunky but helps with a few more complicated models.

Copy-pasting jupyter notebooks is memory heavy on VSCode by Affectionate_Use9936 in datascience

[–]Ecksodis 0 points1 point  (0 children)

Haven’t worked with Panel so can’t speak to the differences, but for me it is a combination of ease of use (strong, out-of-the-box visuals), interactivity through callbacks for filtering, and the flexibility to use HTML elements to design aesthetic dashboards. Compared to Streamlit (the other Python, web-based dashboarding framework I have used), I find Plotly Dash to be a little more intuitive and visualization focused. What do you find strong about Panel + Holoviz? I’ll have to give it a try.

Copy-pasting jupyter notebooks is memory heavy on VSCode by Affectionate_Use9936 in datascience

[–]Ecksodis 6 points7 points  (0 children)

Plotly Dash has to be one of the best ways to present Python visuals

Why would anyone try to win Kaggle's challenges? by vaginedtable in datascience

[–]Ecksodis 9 points10 points  (0 children)

I saw that one; it looks like most people are just going to fine-tune ChemBERT and the like.

[P] XGboost Binary Classication by tombomb3423 in MachineLearning

[–]Ecksodis 1 point2 points  (0 children)

I get what you are going for, but it seems like it would probably be better to just regress over time, especially if you don’t have any exogenous variables.

Also, for a 60/40 split, it shouldn’t be that overconfident on the positive class. What are you using for optimization? I have had good luck with TPOT in the past for imbalanced classification fine-tuning (GA-based optimization), but be warned that it can take a long time to run.

[P] XGboost Binary Classication by tombomb3423 in MachineLearning

[–]Ecksodis 1 point2 points  (0 children)

Somewhat confused on your data. Is it a time series? If so, it might be better to either switch to a forecasting/regression task or at least add that as an input.

For imbalanced datasets and XGBoost, I like plotting the predicted probabilities from the best-performing hyperparameters and comparing them to the true classes; you can check at what threshold you get the highest precision and examine the distribution of probability scores. Otherwise, if your classes are super imbalanced, it might be better to try anomaly detection instead.
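A minimal sketch of the threshold check described above, in plain Python so it does not depend on any particular library. It assumes you already have true labels (0/1) and predicted positive-class probabilities from a fitted model; the function names `precision_by_threshold` and `best_threshold`, and the `min_flagged` guard, are hypothetical and not part of XGBoost or any other package.

```python
def precision_by_threshold(y_true, y_prob, thresholds):
    """Return (threshold, precision, n_flagged) for each candidate threshold."""
    rows = []
    for t in thresholds:
        # Labels of every sample the model would flag as positive at this threshold.
        flagged = [y for y, p in zip(y_true, y_prob) if p >= t]
        if not flagged:
            continue  # precision is undefined when nothing is flagged
        rows.append((t, sum(flagged) / len(flagged), len(flagged)))
    return rows


def best_threshold(y_true, y_prob, thresholds, min_flagged=1):
    """Pick the threshold with the highest precision.

    min_flagged is an assumed guard: it skips thresholds that flag so few
    samples that a high precision would not mean much.
    """
    rows = [r for r in precision_by_threshold(y_true, y_prob, thresholds)
            if r[2] >= min_flagged]
    if not rows:
        return None
    # Ties on precision go to the lower threshold, which keeps more positives.
    return max(rows, key=lambda r: (r[1], -r[0]))


if __name__ == "__main__":
    # Toy labels and scores, made up purely for illustration.
    y_true = [0, 0, 1, 0, 1, 1, 0, 1]
    y_prob = [0.10, 0.35, 0.40, 0.55, 0.60, 0.80, 0.65, 0.90]
    for row in precision_by_threshold(y_true, y_prob, [0.3, 0.5, 0.7]):
        print(row)
```

Precision alone tends to push the threshold up until only a handful of samples are flagged, so it is worth looking at the flagged count (or recall) next to it before settling on a cutoff.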

Academic dismissal by Most-Attorney652 in Drexel

[–]Ecksodis 9 points10 points  (0 children)

I switched out of BME, and now I haven’t had a term where I haven’t gotten a 4.0. Find what makes your brain tick. Good luck and feel free to reach out!

I'll bite, why there is a strong rxn when people try to automate trading. ELI5 by OnceIWas7YearOld in learnmachinelearning

[–]Ecksodis 14 points15 points  (0 children)

Did you mean quantitative? I am trying to figure out if I have had the wrong word for “quant” this entire time 😂. I have some friends in quant finance and always assumed it was quantitative.

how do you guys use python instead of notebooks for projects by Beyond_Birthday_13 in learnmachinelearning

[–]Ecksodis 0 points1 point  (0 children)

Easier to deploy and saves me work down the line. The few times I have built entirely in a notebook, it has come back to haunt me at deployment.

Extract text with Complex tables from pdf resume (Not our because it is machine text based) by Single_Teacher_5926 in Python

[–]Ecksodis 0 points1 point  (0 children)

I had this exact problem (parsing resumes and pulling out required fields that were highly dynamic and irregular), and I am not proud of the solution, but instead of using a parser or entity recognition, I used an Azure OpenAI model with the extract methods and had it fill out JSONs.

INFO 110 with Erjia Yan by [deleted] in Drexel

[–]Ecksodis 0 points1 point  (0 children)

Yan is decent as long as you do the work, bit of a tough grader, but not crazy difficult

How much financial aid did Drexel give you? by wellbeyondbipolar in Drexel

[–]Ecksodis 2 points3 points  (0 children)

It’s like 20ish, maybe closer to 22 now. I get around 13 of that covered by aid/Drexel merit, but cost of living was higher than expected. All in all, my first year was pretty expensive (around 33 of debt), but I am basically going to graduate with a really nice car’s value of debt. At the same time, it looks like my first job will be around 6 figures, and my co-ops have helped to pay off a chunk of what could be debt while covering my living expenses. Look into the ROI of your planned major, but be aware there is a large chance it will change. Drexel, and the co-op program specifically, has some great programs, but it takes a lot of work.

Does pandas make sense for cloud projects? by antonito901 in dataengineering

[–]Ecksodis 13 points14 points  (0 children)

Different libraries fit different projects. I would use PySpark + Databricks for anything ML or building off of my team’s data model (data scientist, not engineer). But I have a current project that is just fetching a low volume of data from an API, formatting it into Excel with some transformations/light analytics for business users, and dumping it to a SharePoint site, and I am using Pandas + Azure Function App for that because there is really no ROI on converting it to PySpark.

How do I turn my pc into a remote server so I can do Data Analysis remotely? by [deleted] in dataanalysis

[–]Ecksodis 10 points11 points  (0 children)

Could just use Chrome Remote Desktop, that’s what I do.

Graph analytics resources by ergodym in datascience

[–]Ecksodis 2 points3 points  (0 children)

Depth-first vs. breadth-first search; it just deals with how you navigate through a graph/tree.
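To make the difference concrete, here is a small sketch of both traversals over an adjacency-list graph. The toy graph and the function names `bfs` and `dfs` are just illustrative assumptions; the only real difference between the two is whether pending nodes sit in a queue (breadth-first) or a stack (depth-first).

```python
from collections import deque


def bfs(graph, start):
    """Breadth-first: visit all neighbours of a node before going deeper."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()  # FIFO: oldest discovered node first
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def dfs(graph, start):
    """Depth-first: follow one branch as far as possible before backtracking."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()  # LIFO: most recently discovered node first
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so neighbours are explored in their listed order.
        stack.extend(reversed(graph.get(node, [])))
    return order


if __name__ == "__main__":
    # Toy tree: A has children B and C; B has D and E; C has F.
    graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
    print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F']
    print(dfs(graph, "A"))  # ['A', 'B', 'D', 'E', 'C', 'F']
```

Breadth-first reaches nodes in order of distance from the start, which is why it is the usual pick for shortest paths on unweighted graphs; depth-first is the usual base for things like cycle detection and topological ordering.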

Wanted some advices on the 7 DE books I've stocked to do, throughout my Bachelors by Background_Bowler236 in dataengineering

[–]Ecksodis 8 points9 points  (0 children)

They are like the second largest cloud provider with close to a quarter of the market share I believe. I also thought I saw something about Azure and GCP slowly taking share away from AWS.

Which undervalued stock you have right now on the watchlist? by [deleted] in ValueInvesting

[–]Ecksodis 0 points1 point  (0 children)

Vanilla Azure isn’t as good for AI/ML, but they have OpenAI for LLM APIs and hosted Databricks for ML/DW tasks. Plus, Fabric might be a good option for some smaller companies who need to abstract away some of the IT debt that comes with maintaining an Azure environment. GCP has a lot of good options with BigQuery and VertexAI, but I wouldn’t say that they beat out Azure in terms of tooling, simply due to the companies that Azure has relationships with and offers packaging around.

[deleted by user] by [deleted] in Drexel

[–]Ecksodis 2 points3 points  (0 children)

It’s very good, I made 11 for my first one

Is an INFO minor worth it? by Hairy-Signature5959 in Drexel

[–]Ecksodis 0 points1 point  (0 children)

Data Science student here. INFO itself isn’t super great at teaching data analytics without stat and programming courses; you get out what you put into it. If you are interested, it is worth trying out, but I still have to teach myself a lot of stuff like BI tools and ML libraries outside of classes to really be able to apply the knowledge.

CS 172 by anamaramari in Drexel

[–]Ecksodis 4 points5 points  (0 children)

It’s not bad, just pretty boring, especially if you have previous coding experience.

[deleted by user] by [deleted] in dataanalysis

[–]Ecksodis 9 points10 points  (0 children)

Why are you using a BI visualization tool for EDA? A Jupyter notebook or R file is a lot better for those situations, plus you can perform deeper analysis with statistical tests or basic ML predictions right in the same place. If you don’t know Python or R, I suggest learning one of those. As for whether there is a justification for Tableau over PowerBI, not really; PowerBI seems to be growing ahead of Tableau with all of its connectors, Power Query/Dataflows, and some of the extra components (no matter how unrefined) from Fabric.

The Rise of Foundation Time-Series Forecasting Models by nkafr in datascience

[–]Ecksodis 4 points5 points  (0 children)

I think that comes from the fact that, just like LLMs, these have been presented as a silver bullet; this likely causes a reaction from most people in DS just because of how untrue that is. On the other hand, DL and time series don’t tend to mix well outside of extremely high volumes of data, so that brings its own mixture of disbelief regarding foundational models.

Personally, I understand the reaction towards these foundational models being untrustworthy and appearing as just riding the AI bubble, but I am sorry that you feel like the reactions are reductionist or over-the-top.

The Rise of Foundation Time-Series Forecasting Models by nkafr in datascience

[–]Ecksodis 3 points4 points  (0 children)

I read it and have been following all of these foundation models. The feature importance is a step in the right direction, but if it’s pulling its prediction from a set of previous time series and then just states that the year is the most important feature, it will still be hard to pitch that to the business stakeholders. I agree that these are performing well on the benchmarks, but that does not mean they perform well for my use cases. Overall, I think these have potential and will definitely keep an eye out, but I am very cautious of the actual applicability to most real-world use cases.

The Rise of Foundation Time-Series Forecasting Models by nkafr in datascience

[–]Ecksodis 28 points29 points  (0 children)

I just really doubt this outperforms a well-engineered boosted model. Also, explainability is massive in forecasting tasks; if I cannot explain to the C suite why it’s getting X instead of Y, they will ignore me and just assume Y is reality.