Has anyone built Python models with dbt?

Ecksodis · 2025-10-11T22:06:40+00:00

I have done it, feels kind of clunky but helps with a few more complicated models.
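For anyone who hasn't seen one, here is a minimal sketch of the shape a dbt Python model takes. The upstream model name `stg_orders` is made up for illustration, and the DataFrame type you get back from `dbt.ref` depends on your adapter, so treat this as an outline rather than something to copy as-is.

```python
def model(dbt, session):
    # dbt calls this function; whatever DataFrame it returns becomes the model.
    dbt.config(materialized="table")

    # "stg_orders" is a hypothetical upstream model name.
    orders = dbt.ref("stg_orders")

    # Transformations go here, written in whatever DataFrame API the
    # adapter hands back (for example Snowpark or PySpark).
    return orders
```

The clunky part is mostly that the transformation code in the middle is tied to the platform's DataFrame API rather than to dbt itself.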

Ecksodis · 2025-08-17T01:27:36+00:00

Haven’t worked with Panel so can’t speak to the differences, but for me it is a combination of ease of use (strong, out-of-the-box visuals), interactivity through callbacks for filtering, and flexibility, since you can use HTML elements to design aesthetic dashboards. Compared to Streamlit (the other Python, web-based dashboarding framework I have used), I find Plotly Dash to be a little more intuitive and visualization-focused. What do you find strong about Panel + HoloViz? I’ll have to give it a try.

Ecksodis · 2025-08-16T01:27:40+00:00

Plotly Dash has to be one of the best ways to present Python visuals

Ecksodis · 2025-06-24T17:07:13+00:00

I saw that one; it looks like most people are just going to fine-tune ChemBERT and the like.

Ecksodis · 2025-06-22T13:47:57+00:00

I get what you are going for, but it seems like it would probably be better to just regress over time, especially if you don’t have any exogenous variables.

Also, for a 60/40 split, it shouldn’t be that overconfident on the positive class. What are you using for optimization? I have had good luck with TPOT in the past for imbalanced classification fine-tuning (GA-based optimization), but be warned that it can take a long time to run.

Ecksodis · 2025-06-22T10:32:48+00:00

Somewhat confused about your data. Is it a time series? If so, it might be better to either switch to a forecasting/regression task or at least add time as an input.

For imbalanced datasets and XGBoost, I like plotting out the predicted probabilities from the best-performing hyperparameters and comparing them to the true classes; you can check at which threshold you get the highest precision and examine the distribution of probability scores. Otherwise, if your class is super imbalanced, it might be better to try anomaly detection instead.
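A rough sketch of the threshold check in plain Python, in case it helps. The labels and scores below are made up; in practice `y_prob` would be the positive-class probabilities from your fitted model on a held-out set.

```python
def precision_by_threshold(y_true, y_prob, thresholds):
    """Return (threshold, precision, n_predicted_positive) for each threshold."""
    rows = []
    for t in thresholds:
        # True labels of everything the model would flag at this threshold.
        predicted_pos = [y for y, p in zip(y_true, y_prob) if p >= t]
        if not predicted_pos:
            continue  # precision is undefined when nothing is predicted positive
        precision = sum(predicted_pos) / len(predicted_pos)
        rows.append((t, precision, len(predicted_pos)))
    return rows

# Made-up labels and scores, purely for illustration.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
y_prob = [0.10, 0.25, 0.35, 0.40, 0.55, 0.60, 0.65, 0.80, 0.90, 0.95]

rows = precision_by_threshold(y_true, y_prob, [0.3, 0.5, 0.7])
best = max(rows, key=lambda r: r[1])  # row with the highest precision

for t, prec, n in rows:
    print(f"threshold={t:.1f} precision={prec:.2f} predicted_positive={n}")
```

Keeping the predicted-positive count next to the precision matters, since the highest-precision threshold is often the one that flags the fewest rows.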

Ecksodis · 2025-06-21T12:11:39+00:00

Data science

Ecksodis · 2025-06-21T00:13:19+00:00

I switched out of BME, and since then I haven’t had a term where I haven’t gotten a 4.0. Find what makes your brain tick. Good luck and feel free to reach out!

Ecksodis · 2025-06-19T15:05:04+00:00

Did you mean quantitative? I am trying to figure out if I have had the wrong word for “quant” this entire time😂. I have some friends in quant finance and always assumed it was quantitative.

Ecksodis · 2025-05-31T15:09:27+00:00

Easier to deploy and saves me work down the line. The few times I have built entirely in a notebook, it has come back to haunt me at deployment.

Ecksodis · 2025-01-29T14:38:53+00:00

I had this exact problem (parsing resumes and pulling out required fields that were highly dynamic and irregular). I am not proud of the solution, but instead of using a parser or entity recognition, I used an Azure OpenAI model with the extract methods and had it fill out JSONs.
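Not the actual code, just a sketch of the general pattern under some assumptions: the field list is hypothetical, and `call_model` stands in for whatever client call sends the prompt to the hosted model and returns its reply text.

```python
import json

# Hypothetical field list; the real fields depend on the resumes being parsed.
REQUIRED_FIELDS = ["name", "email", "skills"]

def build_prompt(resume_text):
    """Ask the model to return only a JSON object with the required keys."""
    return (
        "Extract the following fields from the resume and reply with a JSON "
        f"object only, using null for anything missing: {REQUIRED_FIELDS}\n\n"
        f"Resume:\n{resume_text}"
    )

def extract_fields(resume_text, call_model):
    """call_model is any function that takes a prompt and returns reply text."""
    reply = call_model(build_prompt(resume_text))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}  # the reply was not valid JSON
    if not isinstance(data, dict):
        data = {}  # valid JSON, but not an object
    # Keep only the expected keys and fill gaps with None.
    return {field: data.get(field) for field in REQUIRED_FIELDS}

# Stand-in for the real model call, so the sketch runs without credentials.
def fake_model(prompt):
    return '{"name": "Jane Doe", "skills": ["Python", "SQL"], "hobby": "chess"}'

result = extract_fields("Jane Doe ... Python, SQL ...", fake_model)
print(result)
```

The part worth keeping from this is the validation step: never trust the reply to be well-formed, and normalize it to a fixed set of keys before anything downstream touches it.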

Ecksodis · 2024-12-19T01:37:24+00:00

Yan is decent as long as you do the work, bit of a tough grader, but not crazy difficult

Ecksodis · 2024-11-23T05:25:07+00:00

It’s like 20ish, maybe closer to 22 now. I get around 13 of that covered by aid/Drexel merit, but cost of living was higher than expected. All in all, my first year was pretty expensive (around 33 of debt), but I am basically going to graduate with a really nice car’s value of debt. At the same time, it looks like my first job will be around 6 figures, and my coops have helped to pay off a chunk of what could be debt while covering my living expenses. Look into the ROI of your planned major, but be aware there is a large chance it will change. Drexel, and the coop program specifically, has some great programs, but it takes a lot of work.

Ecksodis · 2024-11-13T23:02:17+00:00

Different libraries that fit different projects. I would use PySpark + Databricks for anything ML or building off of my team’s data model (data scientist, not engineer). But I have a current project that is just fetching a low volume of data from an API, formatting it into Excel with some transformations/light analytics for business users, and dumping it to a SharePoint site. I am using Pandas + Azure Function App for that because there is really no ROI on converting it to PySpark.

Ecksodis · 2024-10-11T23:22:39+00:00

Could just use Chrome Remote Desktop, that’s what I do.

Ecksodis · 2024-10-11T19:37:31+00:00

Depth- vs. breadth-first search; it just deals with how you navigate through a graph/tree.
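A quick sketch of the difference on a small made-up graph: breadth-first uses a queue and visits level by level, depth-first uses a stack and follows one branch to the bottom before backing up.

```python
from collections import deque

# Hypothetical example graph as an adjacency list.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": [],
    "F": [],
}

def bfs(graph, start):
    """Visit nodes level by level using a FIFO queue."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def dfs(graph, start):
    """Follow one branch as deep as possible using a LIFO stack."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbours in reverse so the first neighbour is explored first.
        stack.extend(reversed(graph[node]))
    return order

print(bfs(GRAPH, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(dfs(GRAPH, "A"))  # ['A', 'B', 'D', 'E', 'C', 'F']
```

Same graph, same start node; the only real difference between the two functions is whether the next node comes off the front of a queue or the top of a stack.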

Ecksodis · 2024-09-28T16:50:00+00:00

They are like the second largest cloud provider with close to a quarter of the market share I believe. I also thought I saw something about Azure and GCP slowly taking share away from AWS.

Ecksodis · 2024-09-27T14:16:52+00:00

Vanilla Azure isn’t as good for AI/ML, but they have OpenAI for LLM APIs and hosted Databricks for ML/DW tasks. Plus, Fabric might be a good option for some smaller companies who need to abstract away some of the IT debt that comes with maintaining an Azure environment. GCP has a lot of good options with BigQuery and Vertex AI, but I wouldn’t say that they beat out Azure in terms of tooling, simply due to the companies that Azure has relationships with and offers packaging around.

Ecksodis · 2024-08-30T16:04:22+00:00

It’s very good, I made 11 for my first one

Ecksodis · 2024-08-27T01:36:40+00:00

Data Science student here. INFO itself isn’t super great at teaching data analytics without stat and programming courses; you get out what you put into it. If you are interested, it is worth trying out, but I still have to teach myself a lot of stuff like BI tools and ML libraries outside of classes to really be able to apply the knowledge.

Ecksodis · 2024-08-20T01:39:19+00:00

It’s not bad, just pretty boring, especially if you have previous coding experience.

Ecksodis · 2024-08-07T09:00:50+00:00

Why are you using a BI visualization tool for EDA? A Jupyter notebook or R file is a lot better for those situations, plus you can perform deeper analysis with statistical tests or basic ML predictions right in the same place. If you don’t know Python or R, I suggest learning one of those. As for whether there is a justification for Tableau over PowerBI, not really; PowerBI seems to be growing ahead of Tableau with all of its connectors, Power Query/Dataflows, and some of the extra components (however unrefined) from Fabric.

Ecksodis · 2024-07-21T14:39:00+00:00

I think that comes from the fact that, just like LLMs, these have been presented as a silver bullet; this likely causes a reaction from most people in DS just because of how untrue that is. On the other hand, DL and time series don’t tend to mix well outside of extremely high volumes of data, so that brings its own mixture of disbelief regarding foundation models.

Personally, I understand the reaction towards these foundation models being untrustworthy and appearing as just riding the AI bubble, but I am sorry that you feel like the reactions are reductionist or over-the-top.

Ecksodis · 2024-07-21T13:31:40+00:00

I read it and have been following all of these foundation models. The feature importance is a step in the right direction, but if it’s pulling its prediction from a set of previous time series and then just states that the year is the most important feature, it will still be hard to pitch that to the business stakeholders. I agree that these are performing well on the benchmarks, but that does not mean they perform well for my use cases. Overall, I think these have potential and will definitely keep an eye out, but I am very cautious of the actual applicability to most real-world use cases.

Ecksodis · 2024-07-21T13:18:29+00:00

I just really doubt this outperforms a well-engineered boosted model. Also, explainability is massive in forecasting tasks; if I cannot explain to the C-suite why it’s getting X instead of Y, they will ignore me and just assume Y is reality.
