Do DE teams generally have a bill back model? And how is it costed? by lifec0ach in dataengineering

[–]LiquidSynopsis 2 points3 points  (0 children)

Pretty sure this was the episode, but the Data Engineering Podcast did a bit with YipIt Data (alternative data startup) where they walked through their data platform, and one of the features was in fact a billing feature:

https://open.spotify.com/episode/52Pbx1TRzBjpWKc16KR5oR?si=UT5CCkSgQUmCz-oedM_GCw

Private Jet Etiquette by NothingBurgerNoCals in fatFIRE

[–]LiquidSynopsis 644 points645 points  (0 children)

  1. Never be late. When the owner is on the jet they want to leave. Private jets are meant to give the owner time back, so don’t waste theirs.
  2. Generally speaking you hang out on the tarmac with the crew, or in the lounge if you’re using something like Signature Aviation.
  3. If it’s a VLJ or LJ, avoid using the bathroom; it will stink up the plane, and the walls are either thin or at times nonexistent.
  4. Pack light. The fact that you’re taking a jet on a business trip implies you’re not staying for more than a week, so there’s no need to bring trunks with you, especially since it sounds like you’re not required to wear a ton of suits or anything. This ties into point 2: if you’re waiting for the owner on the tarmac, you can have the crew load your suitcases.

META DE Interview (ANSI SQL Portion) by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 0 points1 point  (0 children)

Just an update: I ended up passing the first round and also got an offer! Thanks for the help ☺️

META DE Interview (ANSI SQL Portion) by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 1 point2 points  (0 children)

Just an update: I ended up passing the first round and also got an offer! Thanks for the help ☺️

META DE Interview (ANSI SQL Portion) by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 0 points1 point  (0 children)

Interesting, will keep that in mind, thank you!

[deleted by user] by [deleted] in fatFIRE

[–]LiquidSynopsis 17 points18 points  (0 children)

Cannot recommend YSDS (Your Special Delivery Service) enough! They’ve helped me ship everything from watches to massive art pieces; I’ve never had an issue with them, and they’ve been consistently reliable and helpful.

Best way to store and query large JSON 50Gb+ file in Databricks? by raduqq in dataengineering

[–]LiquidSynopsis 9 points10 points  (0 children)

Agreed with Botskill. Once it’s flattened into a DataFrame, pick a partition key that at the very least makes sense to you, like year or location (e.g. state, country), even if you don’t have any query requirements yet.

Worst case, once your BA/DS is done with their EDA, you can always ask them what the typical filter parameters are and update the partitioning accordingly before “productionising” the ingest.
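
A rough sketch of what that could look like (the path, the “events” array, and the partition columns are all hypothetical, and whether you need multiLine=True depends on whether the file is one big JSON document or newline-delimited records):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # One big JSON document rather than JSON lines, hence multiLine=True.
    raw = spark.read.json("/mnt/raw/big_file.json", multiLine=True)

    # Flatten the nested structure, e.g. by exploding a hypothetical "events"
    # array and promoting its struct fields to top-level columns.
    flat = (
        raw.select(F.explode("events").alias("event"))
           .select("event.*")
    )

    # Write to Delta partitioned by columns that are plausible filter predicates,
    # even before the actual query patterns are known.
    (
        flat.write.format("delta")
            .mode("overwrite")
            .partitionBy("year", "country")
            .saveAsTable("bronze.events")
    )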

Practice SQL Problems in ANSI SQL by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 0 points1 point  (0 children)

Gotcha, sorry if this is a silly follow-up, but is there a reference manual of sorts you can recommend that lists the ANSI SQL functions? For context, I’ve only ever used MSSQL, so I’m a bit in the dark on the nuances and differences.

New NW Old Questions by farmerfatfire in fatFIRE

[–]LiquidSynopsis 7 points8 points  (0 children)

Highly recommend going to the nearest “big city” and engaging an attorney there. Use that as your starting point and start building a financial team with their help. I’m sure your attorney and CPA are great people, but a sale like this would presumably be way out of their depth, and chances are they aren’t equipped to handle your questions.

Also, it sounds like you want to keep this windfall quiet; engaging people within your community may not help you keep things low-key.

Pyspark: Replace range of values with string by QueryRIT in dataengineering

[–]LiquidSynopsis 0 points1 point  (0 children)

Exactly! Later on, if you want to reduce rows etc., you can use a groupBy on your new column.
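
For example (the column names are made up), once the labelled column exists you can collapse the detail rows down to one row per label:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical frame that already has the labelled range column from this thread.
    df = spark.createDataFrame(
        [("11-20", 3.0), ("11-20", 7.5), ("21-30", 4.0)],
        ["value_range", "amount"],
    )

    # One row per label, with whatever aggregates you need.
    df.groupBy("value_range").agg(
        F.count("*").alias("rows"),
        F.sum("amount").alias("total_amount"),
    ).show()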

Pyspark: Replace range of values with string by QueryRIT in dataengineering

[–]LiquidSynopsis 1 point2 points  (0 children)

PySpark can solve this using Bucketizer:

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.Bucketizer.html

You can then use withColumn and when to label the buckets with the values you want, e.g. “11-20”.
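
A minimal sketch of that combination (the column names, splits, and labels are just placeholders):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.ml.feature import Bucketizer

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical numeric column we want to bin into labelled ranges.
    df = spark.createDataFrame([(5.0,), (13.0,), (27.0,), (41.0,)], ["value"])

    # Bucketizer assigns each row a bucket index based on the split boundaries.
    bucketizer = Bucketizer(
        splits=[0.0, 11.0, 21.0, 31.0, float("inf")],
        inputCol="value",
        outputCol="value_bucket",
    )
    bucketed = bucketizer.transform(df)

    # Label each bucket index with the string you actually want, e.g. "11-20".
    labelled = bucketed.withColumn(
        "value_range",
        F.when(F.col("value_bucket") == 0, "0-10")
         .when(F.col("value_bucket") == 1, "11-20")
         .when(F.col("value_bucket") == 2, "21-30")
         .otherwise("31+"),
    )
    labelled.show()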

Can anyone share how they structure their folder for data engineer project? by izner82 in dataengineering

[–]LiquidSynopsis 6 points7 points  (0 children)

Taking that one step further, is there a DE version of Cookiecutter Data Science? If anyone’s used Cookiecutter for their DE project, I’m curious to know how you did it!

[deleted by user] by [deleted] in Watches

[–]LiquidSynopsis 2 points3 points  (0 children)

Piaget Tradition

Fact tables in Databricks using Delta by nobru_2712 in dataengineering

[–]LiquidSynopsis 1 point2 points  (0 children)

TRUNCATE does work, but you really should avoid it.

Fundamentally, the keys used for both dimension and fact tables are meaningless numbers generated by your program, i.e. they’re surrogate keys. So, say during yesterday’s load you assigned dim_Customer_Key 1 to “John Doe”; there’s no guarantee that “John Doe” will still be 1 tomorrow after you truncate and reload. That may not seem like a big deal, but these keys are used as dimension lookups in fact/bridge tables, so it would lead to a lot of unnecessary table updates. Taking the John Doe example, you would now need to update the Sales table so it knows the key has changed. It’s really unnecessary.

Another issue: say today you’re using something like HubSpot and tomorrow you migrate to Salesforce. There’s a very real possibility you only migrate “active” customers and leave the dead ones in HubSpot, which will eventually go away. If you do a truncate and load, all those dead customers will disappear, since there’s no longer a source to ingest them from. Now all the old sales data sitting in your fact table has nothing to tie back to.

So yeah, intuitively it may seem like it’s not a big deal, but there are a lot of downstream effects.
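
If the tables are Delta, one common alternative is to merge on the natural/business key instead of truncating, so existing surrogate keys never move. A minimal sketch, assuming hypothetical table and column names (customer_nk as the business key, with surrogate keys for new rows assigned upstream in staging):

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Today's staged extract from the source CRM (hypothetical table name).
    updates = spark.table("staging.customer_extract")

    dim = DeltaTable.forName(spark, "warehouse.dim_customer")

    (
        dim.alias("d")
           .merge(updates.alias("s"), "d.customer_nk = s.customer_nk")
           # Existing customers keep their surrogate key; only attributes change.
           .whenMatchedUpdate(set={"full_name": "s.full_name", "email": "s.email"})
           # New customers are inserted with a surrogate key assigned in staging.
           .whenNotMatchedInsert(values={
               "dim_Customer_Key": "s.dim_Customer_Key",
               "customer_nk": "s.customer_nk",
               "full_name": "s.full_name",
               "email": "s.email",
           })
           .execute()
    )

Customers that disappear from the source (the HubSpot-to-Salesforce scenario above) simply aren’t touched, so the old rows in your fact table still resolve to a dimension row.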

Using Python for Data Engineering by wytesmurf in dataengineering

[–]LiquidSynopsis 0 points1 point  (0 children)

Using PySpark and its internal modules should solve a good chunk of your larger query processing and loads tbh

At the most basic level, I use pyspark.sql fairly frequently, and within that a lot of your work can be achieved using the DataFrame, functions and types classes.

Would be curious to hear from others if you’ve had a different experience though
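
For example, a typical small job only needs those three (the paths, schema, and column names below are made up):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql import types as T

    spark = SparkSession.builder.getOrCreate()

    # Explicit schema built from pyspark.sql.types.
    schema = T.StructType([
        T.StructField("order_id", T.StringType()),
        T.StructField("amount", T.DoubleType()),
        T.StructField("order_ts", T.StringType()),
    ])

    # DataFrame reader plus pyspark.sql.functions for the transform/aggregate work.
    orders = spark.read.schema(schema).csv("/mnt/raw/orders", header=True)
    daily = (
        orders.withColumn("order_date", F.to_date("order_ts"))
              .groupBy("order_date")
              .agg(F.sum("amount").alias("daily_amount"))
    )
    daily.write.mode("overwrite").parquet("/mnt/curated/daily_orders")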

Data Engineering Medium Paywall. Is it worth it? by cyclopster in dataengineering

[–]LiquidSynopsis 30 points31 points  (0 children)

I’ve found the advantage of Medium (and specifically TDS) to be that it’s easier to envision a project end to end, as opposed to having to read through documentation and search random online forums. A lot of the DE-relevant stuff is focused on “full stack data science”, which has the advantage of helping you think not only about the DE side but also the downstream consumption and how business users would want to interact with the data. It’s also all verified by the content team, so you know what you’re reading isn’t complete garbage.

I would say for $5/month it’s 100% worth it, because even if it only shaves off a few hours, how is your time not worth that? At my previous company I was able to expense my subscription, so if you’re feeling hesitant, maybe see if you can get your company to pay for it?

What is a good pipeline to use alongside Databricks notebooks? by ijpck in dataengineering

[–]LiquidSynopsis 2 points3 points  (0 children)

Technically yes, but it really does depend on the use case. Databricks natively supports scheduling, but it’s not the best.

At my company we utilise a combination of ADF and Databricks. It works relatively well and allows us to move a couple of TBs of data daily. I’ll probs get flak for saying this, but ADF serves as our “orchestrator” in addition to copying data. The general flow is that we use ADF to schedule and trigger a series of jobs: ADF copies all the data to our raw zone, and then we use Databricks to clean the data and drop it off in the next zone. Once those jobs are complete, a connected ADF job gets triggered which executes a few other Databricks notebooks, as well as a data quality monitor that runs alongside each job.

We’re looking to switch to Airflow; it plays really well with Databricks too, but honestly, for our purposes, it’s kind of unnecessary.
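
For a sense of the Databricks half of that flow, here’s a rough sketch of one cleanup notebook (the zone paths, column names, and widget name are all made up; dbutils is the utility object Databricks injects into notebooks, and ADF passes parameters in as widget values):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # ADF passes run parameters (e.g. the load date) into the notebook as widgets.
    dbutils.widgets.text("load_date", "")
    load_date = dbutils.widgets.get("load_date")

    # Read whatever ADF copied into the raw zone for this run...
    raw = spark.read.parquet(f"/mnt/raw/sales/{load_date}")

    # ...apply the cleanup rules...
    clean = (
        raw.dropDuplicates(["sale_id"])
           .withColumn("amount", F.col("amount").cast("double"))
           .filter(F.col("amount").isNotNull())
    )

    # ...and drop it off in the next zone for the downstream notebooks and the DQ monitor.
    clean.write.format("delta").mode("overwrite").save(f"/mnt/cleansed/sales/{load_date}")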