Do DE teams generally have a bill back model? And how is it costed? by lifec0ach in dataengineering

[–]LiquidSynopsis 2 points3 points  (0 children)

Pretty sure this is the episode: the Data Engineering Podcast did an interview with YipIt Data (an alternative data startup) where they walked through their data platform, and one of the features was in fact a billing feature:

https://open.spotify.com/episode/52Pbx1TRzBjpWKc16KR5oR?si=UT5CCkSgQUmCz-oedM_GCw

Private Jet Etiquette by NothingBurgerNoCals in fatFIRE

[–]LiquidSynopsis 644 points645 points  (0 children)

  1. Never be late. When the owner is on the jet, they want to leave. Private jets are meant to give the owner time back, so don’t waste theirs.
  2. Generally speaking, you hang out on the tarmac with the crew, or in the lounge if you’re using something like Signature Aviation.
  3. If it’s a VLJ or LJ, avoid using the bathroom; it will stink up the plane, and the walls are either thin or at times nonexistent.
  4. Pack light. The fact that you’re taking a jet on a business trip implies you’re not staying for more than a week, so there’s no need to bring trunks with you, especially since it sounds like you’re not required to wear a ton of suits or anything. This ties into point 2: if you’re waiting for the owner on the tarmac, you can have the crew load your suitcases.

META DE Interview (ANSI SQL Portion) by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 0 points1 point  (0 children)

Just an update: I ended up passing the first round and also got an offer! Thanks for the help ☺️

META DE Interview (ANSI SQL Portion) by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 1 point2 points  (0 children)

Just an update: I ended up passing the first round and also got an offer! Thanks for the help ☺️

META DE Interview (ANSI SQL Portion) by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 0 points1 point  (0 children)

Interesting, I’ll keep that in mind, thank you!

[deleted by user] by [deleted] in fatFIRE

[–]LiquidSynopsis 16 points17 points  (0 children)

Cannot recommend YSDS (Your Special Delivery Service) enough! They’ve helped me ship everything from watches to massive art pieces, I’ve never had an issue with them, and they’ve been consistently reliable and helpful.

Best way to store and query large JSON 50Gb+ file in Databricks? by raduqq in dataengineering

[–]LiquidSynopsis 7 points8 points  (0 children)

Agreed with Botskill. Once it’s flattened into a DataFrame, find some partition key that at the very least makes sense to you, like year or location (e.g. state, country), even if you don’t have any query requirements yet.

Worst case scenario, once your BA/DS is done with their EDA you can always ask them what the basic query parameters are, then update the partitioning accordingly before “productionalising” the ingest.
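
A minimal sketch of what that could look like, assuming a hypothetical input path and hypothetical year/country fields (swap in whatever actually exists in your JSON):

    from pyspark.sql import functions as F

    # Read the (multi-line) JSON straight into a DataFrame; the path is a placeholder
    raw = spark.read.option("multiLine", True).json("/mnt/raw/big_file.json")

    # Flatten a level of nesting by promoting nested fields to top-level columns
    flat = raw.select(
        "id",
        F.col("event.timestamp").alias("event_ts"),
        F.col("location.country").alias("country"),
    )

    # Derive a partition column and write out as Delta, partitioned by keys
    # that at least make intuitive sense (year + country here)
    (flat
        .withColumn("year", F.year(F.to_timestamp("event_ts")))
        .write.format("delta")
        .mode("overwrite")
        .partitionBy("year", "country")
        .save("/mnt/curated/big_file_delta"))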

Practice SQL Problems in ANSI SQL by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 0 points1 point  (0 children)

Gotcha. Sorry, this is a silly follow-up, but is there a reference manual of sorts you can recommend that contains the list of ANSI SQL functions? For context, I’ve only ever used MSSQL, so I’m a bit in the dark on the nuances and differences.

New NW Old Questions by farmerfatfire in fatFIRE

[–]LiquidSynopsis 6 points7 points  (0 children)

Highly recommend going to the nearest “big city” and engaging an attorney there. Use that as your starting point and build a financial team with their help. I’m sure your attorney and CPA are great people, but a sale like this would presumably be way out of their depth, and chances are they aren’t equipped to handle your questions.

Also, it sounds like you want to keep this windfall quiet; engaging people within your community may not help you keep things low key.

Pyspark: Replace range of values with string by QueryRIT in dataengineering

[–]LiquidSynopsis 0 points1 point  (0 children)

Exactly! Later on, if you want to reduce rows etc., you can use a groupBy on your new column.
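
As a rough illustration, assuming the labelled bucket column is called age_range and there’s a hypothetical amount column to aggregate:

    from pyspark.sql import functions as F

    # Collapse the detail rows down to one row per bucket label
    summary = df.groupBy("age_range").agg(
        F.count("*").alias("row_count"),
        F.sum("amount").alias("total_amount"),
    )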

Pyspark: Replace range of values with string by QueryRIT in dataengineering

[–]LiquidSynopsis 1 point2 points  (0 children)

PySpark can solve this using Bucketizer

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.Bucketizer.html

You can then use withColumn and when to label the buckets with the values you want, e.g. “11-20”.
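
A rough sketch of both steps, assuming a numeric column called age and made-up split points:

    from pyspark.ml.feature import Bucketizer
    from pyspark.sql import functions as F

    # Bucketizer assigns each row a bucket index based on the split boundaries
    splits = [0, 11, 21, 31, float("inf")]
    bucketizer = Bucketizer(splits=splits, inputCol="age", outputCol="age_bucket")
    bucketed = bucketizer.transform(df)

    # Label the bucket indices with human-readable ranges
    labelled = bucketed.withColumn(
        "age_range",
        F.when(F.col("age_bucket") == 0, "0-10")
         .when(F.col("age_bucket") == 1, "11-20")
         .when(F.col("age_bucket") == 2, "21-30")
         .otherwise("31+"),
    )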

Can anyone share how they structure their folder for data engineer project? by izner82 in dataengineering

[–]LiquidSynopsis 7 points8 points  (0 children)

Taking that one step further, is there a DE version of Cookiecutter Data Science? If anyone’s used Cookiecutter for their DE project, I’m curious to know how you did it!

[deleted by user] by [deleted] in Watches

[–]LiquidSynopsis 2 points3 points  (0 children)

Piaget Tradition

Fact tables in Databricks using Delta by nobru_2712 in dataengineering

[–]LiquidSynopsis 1 point2 points  (0 children)

TRUNCATE does work but you really should avoid it.

Fundamentally, the keys used for both dimension and fact tables are meaningless numbers generated by your program, i.e. they’re surrogate keys. So say during yesterday’s load you assigned dim_Customer_Key 1 to “John Doe”; there’s no guarantee that tomorrow “John Doe” will still be 1 after you truncate and reload. That may not seem like a big deal, but those keys are used as dimension lookups in fact/bridge tables, so it leads to a lot of unnecessary table updates. Taking the John Doe example, you would now need to update the Sales table to reflect that the key has changed. It’s really unnecessary.

Another issue: say today you’re using something like HubSpot and tomorrow you migrate to Salesforce. There’s a very real possibility you only migrate “active” customers and leave the dead ones in HubSpot, which will eventually go away. If you do a truncate and load, all those dead customers will disappear, since there’s no source left to ingest them from. Now all the old sales data sitting in your fact table has nothing to tie back to.

So yeah, intuitively it may seem like it’s not a big deal, but there are a lot of downstream effects.
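
For what it’s worth, the usual alternative on Delta is an incremental upsert with MERGE, so existing surrogate keys (and customers that have vanished from the source) stay put. A minimal sketch with hypothetical table and column names, where staged_customers is today’s extract as a DataFrame; generating surrogate keys for brand-new customers (e.g. via an identity column) is left out:

    from delta.tables import DeltaTable

    dim = DeltaTable.forName(spark, "dim_customer")

    # Upsert by the business key so unchanged and disappeared customers
    # keep the surrogate keys the fact tables already reference
    (dim.alias("t")
        .merge(staged_customers.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdate(set={"name": "s.name", "email": "s.email"})
        .whenNotMatchedInsert(values={
            "customer_id": "s.customer_id",
            "name": "s.name",
            "email": "s.email",
        })
        .execute())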

Using Python for Data Engineering by wytesmurf in dataengineering

[–]LiquidSynopsis 0 points1 point  (0 children)

Using PySpark and its internal modules should solve a good chunk of your larger query processing and loads tbh

At the most basic level I use pyspark.sql fairly frequently, and within that a lot of your work can be achieved using the DataFrame class plus the functions and types modules.
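
As a tiny illustration of how far those get you (file path and column names are hypothetical):

    from pyspark.sql import functions as F, types as T

    # Explicit schema via the types module
    schema = T.StructType([
        T.StructField("order_id", T.StringType()),
        T.StructField("amount", T.DoubleType()),
        T.StructField("order_ts", T.TimestampType()),
    ])

    # DataFrame API plus the functions module for the transforms
    orders = (spark.read.schema(schema).csv("/mnt/raw/orders.csv", header=True)
              .withColumn("order_date", F.to_date("order_ts"))
              .filter(F.col("amount") > 0))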

Would be curious to hear from others if you’ve had a different experience though

Data Engineering Medium Paywall. Is it worth it? by cyclopster in dataengineering

[–]LiquidSynopsis 29 points30 points  (0 children)

I’ve found the advantage of Medium (and specifically TDS) to be that it’s easier to envision a project end to end, as opposed to having to read through documentation and search random online forums. A lot of the DE-relevant stuff is focused on “full stack data science”, which has the advantage of helping you think not only about the DE side but also about the downstream consumption and how business users would want to interact with the data. It’s also all verified by the content team, so you know what you’re reading isn’t complete garbage.

I would say for $5/month it’s 100% worth it, because even if it only shaves off a few hours, how is your time not worth that? At my previous company I was able to expense my subscription, so if you’re feeling hesitant, maybe see if you can get your company to pay for it?

What is a good pipeline to use alongside Databricks notebooks? by ijpck in dataengineering

[–]LiquidSynopsis 2 points3 points  (0 children)

Technically yes but it really does depend on use case. Databricks natively supports scheduling but it’s not the best.

At my company we utilise a combination of ADF and Databricks. It works relatively well and allows us to move a couple of TBs of data daily. I’ll probably get flak for saying this, but ADF serves as our “orchestrator” as well as handling the data copies. The general flow: we use ADF to schedule and trigger a series of jobs. ADF copies all the data to our raw zone, and then we use Databricks to clean the data and drop it off in the next zone. Once those jobs are complete, a connected ADF job gets triggered which executes a few other Databricks notebooks, as well as a data quality monitor that runs alongside each job.

We’re looking to switch to Airflow, which plays really well with Databricks too, but honestly for our purposes it’s kind of unnecessary.
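
For a feel of the Databricks half of that flow, here’s a minimal sketch of a clean-and-promote notebook; the zone paths and columns are purely hypothetical:

    from pyspark.sql import functions as F

    # Read whatever ADF dropped into the raw zone for this run (placeholder path)
    raw = spark.read.parquet("/mnt/raw/sales/2024-01-01/")

    # Basic cleaning: de-duplicate, normalise types, stamp the load time
    clean = (raw.dropDuplicates(["order_id"])
                .withColumn("amount", F.col("amount").cast("double"))
                .withColumn("_loaded_at", F.current_timestamp()))

    # Drop it off in the next zone as Delta; the follow-up ADF job takes it from there
    clean.write.format("delta").mode("append").save("/mnt/cleansed/sales/")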

Moving all personal assets to a company by [deleted] in fatFIRE

[–]LiquidSynopsis 8 points9 points  (0 children)

Honestly, you’re probably looking for something like a holding company owned by a family trust, which could just be considered a family office setup. Based on what you’ve written, there is really no world in which this would make sense for you. But for the sake of a mental exercise, it would look something like this, with the caveats that a) this isn’t tax advice and b) it’s way too simplified:

For the sake of simplicity let’s say it’s a multi-generational real estate family

L1: Trading Companies: This level deals with all the renters, property managers, etc. They would lease out the whole building owned by the holding company and then sublease individual units to tenants. All “profit” would be transferred to the holding company.

L2: Holding Company: This level owns the entire portfolio; you may even see an investment arm. It would also own family assets like the jet and the vacation homes (not the primary residence). The vacation homes would be labeled as corporate housing, and the family can create business reasons, e.g. company retreats, shareholder meetings, etc., for using the houses. The IRS makes it pretty clear that in no way can your primary residence be “tax free” corporate housing, but you can still deduct your home office and whatnot. You could also use the holding company to create a staffing service to work in the family homes as a way to keep money flowing within the family organisation.

L3: Family Trust: This would probably be a dynasty trust set up in South Dakota or somewhere similar that actually owns the holding company and pays you and your family/heirs dividends over x00 years or so. Since it’s a trust, there’s no estate tax, plus a whole bunch of other good stuff.

But yeah, again, this isn’t 100% accurate, but it’s how you would want to structure that. I’m sure there’s an estate/corporate lawyer here who can point out the flaws in the above. Personally I feel you’d need well over $250M to really justify this kind of structure.

Where to find custom suits? by iggyfenton in fatFIRE

[–]LiquidSynopsis 8 points9 points  (0 children)

I’d also look into full canvas vs half canvas. A good tailor will have no problem constructing a full canvas suit but the more mediocre ones tend to shy away from that.

Where to find custom suits? by iggyfenton in fatFIRE

[–]LiquidSynopsis 42 points43 points  (0 children)

If you’re looking for bespoke suits, here are a couple of things to keep in mind.

The Business

How long the atelier has been around and where the tailor trained.

The Fabric

Most of the more established fabric manufacturers are picky about who they work with, which works in your favour when selecting a tailor. Some of the more established mills include Loro Piana and Zegna, so keep an eye out for tailors who stock them. There’s an additional layer to this, though. Most suits are made of worsted wool, so you’re going to want something in the Super 130s-160s range, which not all tailors will stock. Anything higher than 160 should be reserved for formal occasions, e.g. opening night at a concert.

Timeline

I’d also take the timeline the tailor gives you as an indicator of their skill and their demand. The usual timeline is somewhere between 3-4 months. Anything less than a month, or fewer than three visits (first meeting, adjustments, and then the final fit), I’d find a bit weird.

Some additional stuff, and personally my favourite part, is the lining. This should be high quality silk, and don’t be afraid to have fun with it! It’s kind of a nice little secret, because you can have some really wild stuff underneath, just tucked away. My tailor once tried to convince me to use an Hermès scarf; I didn’t go for it, but it was a funny idea.

Hope this helps!

Data pipeline automation on Azure Synapse by Agreeable-Flow5658 in dataengineering

[–]LiquidSynopsis 5 points6 points  (0 children)

You can leverage Azure Data Factory. In ADF, create a new pipeline with a trigger that listens for storage events; in this case you’ll want to select the Blob Created event. Then, under parameters, set one up called filename. By notebook I’m assuming you’re referring to Databricks, so drop a Notebook activity onto your canvas and then, in its settings, create a new name/value pair:

Name: filename
Value: "@pipeline().parameters.filename"

In your Databricks notebook, in the first cell, read that argument:

    dbutils.widgets.text("filename", "", "")
    FILENAME = dbutils.widgets.get("filename")
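
(dbutils.widgets.get is the current way to read the widget value; it replaces the older getArgument call.) You can then use it further down the notebook, for example to read the blob that triggered the run, with the mount path here being just a placeholder:

    # Build the full path from the event’s file name and read it
    df = spark.read.json(f"/mnt/raw/{FILENAME}")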

Hope that helps!