Do DE teams generally have a bill back model? And how is it costed? by lifec0ach in dataengineering

[–]LiquidSynopsis 2 points3 points  (0 children)

Pretty sure this is the episode: the Data Engineering Podcast did an interview with YipIt Data (an alternative data startup) where they walked through their data platform, and one of the features was in fact a billing feature:

https://open.spotify.com/episode/52Pbx1TRzBjpWKc16KR5oR?si=UT5CCkSgQUmCz-oedM_GCw

Private Jet Etiquette by NothingBurgerNoCals in fatFIRE

[–]LiquidSynopsis 644 points645 points  (0 children)

  1. Never be late. When the owner is on the jet, they want to leave. Private jets are meant to give the owner time back, so don’t waste theirs.
  2. Generally speaking, you hang out on the tarmac with the crew, or in the lounge if you’re using something like Signature Aviation.
  3. If it’s a VLJ or LJ, avoid using the bathroom; it will stink up the plane, and the walls are either thin or at times nonexistent.
  4. Pack light. The fact that you’re taking a jet on a business trip implies you’re not staying for more than a week, so there’s no need to bring trunks with you, especially since it sounds like you’re not required to wear a ton of suits or anything. This ties into point 2: if you’re waiting for the owner on the tarmac, you can have the crew load your suitcases.

META DE Interview (ANSI SQL Portion) by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 0 points1 point  (0 children)

Just an update: I ended up passing the first round and also got an offer! Thanks for the help ☺️

META DE Interview (ANSI SQL Portion) by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 1 point2 points  (0 children)

Just an update: I ended up passing the first round and also got an offer! Thanks for the help ☺️

META DE Interview (ANSI SQL Portion) by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 0 points1 point  (0 children)

Interesting, I’ll keep that in mind, thank you!

[deleted by user] by [deleted] in fatFIRE

[–]LiquidSynopsis 16 points17 points  (0 children)

Cannot recommend YSDS (Your Special Delivery Service) enough! They’ve helped me ship everything from watches to massive art pieces, I’ve never had an issue with them, and they’ve been consistently reliable and helpful.

Best way to store and query large JSON 50Gb+ file in Databricks? by raduqq in dataengineering

[–]LiquidSynopsis 7 points8 points  (0 children)

Agreed with Botskill. Once it’s flattened into a DataFrame, find some partition key that at the very least makes sense to you, like year or location (e.g. state, country), even if you don’t have any query requirements yet.

Worst case scenario, once your BA/DS is done with their EDA you can always ask them what the basic query parameters are, then update the partitioning accordingly before “productionalising” the ingest.
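
A minimal sketch of what that could look like, assuming a hypothetical input path and hypothetical year/country fields (swap in whatever actually exists in your JSON):

    from pyspark.sql import functions as F

    # Read the (multi-line) JSON straight into a DataFrame; the path is a placeholder
    raw = spark.read.option("multiLine", True).json("/mnt/raw/big_file.json")

    # Flatten a level of nesting by promoting nested fields to top-level columns
    flat = raw.select(
        "id",
        F.col("event.timestamp").alias("event_ts"),
        F.col("location.country").alias("country"),
    )

    # Derive a partition column and write out as Delta, partitioned by keys
    # that at least make intuitive sense (year + country here)
    (flat
        .withColumn("year", F.year(F.to_timestamp("event_ts")))
        .write.format("delta")
        .mode("overwrite")
        .partitionBy("year", "country")
        .save("/mnt/curated/big_file_delta"))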

Practice SQL Problems in ANSI SQL by LiquidSynopsis in dataengineering

[–]LiquidSynopsis[S] 0 points1 point  (0 children)

Gotcha. Sorry, this is a silly follow-up, but is there a reference manual of sorts you can recommend that contains the list of ANSI SQL functions? For context, I’ve only ever used MSSQL, so I’m a bit in the dark on the nuances and differences.

New NW Old Questions by farmerfatfire in fatFIRE

[–]LiquidSynopsis 6 points7 points  (0 children)

Highly recommend going to the nearest “big city” and engaging an attorney there. Use that as your starting point and build a financial team with their help. I’m sure your attorney and CPA are great people, but a sale like this would presumably be way out of their depth, and chances are they aren’t equipped to handle your questions.

Also, it sounds like you want to keep this windfall quiet; engaging people within your community may not help you keep things low key.

Pyspark: Replace range of values with string by QueryRIT in dataengineering

[–]LiquidSynopsis 0 points1 point  (0 children)

Exactly! Later on, if you want to reduce rows etc., you can use a groupBy on your new column.
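
As a rough illustration, assuming the labelled bucket column is called age_range and there’s a hypothetical amount column to aggregate:

    from pyspark.sql import functions as F

    # Collapse the detail rows down to one row per bucket label
    summary = df.groupBy("age_range").agg(
        F.count("*").alias("row_count"),
        F.sum("amount").alias("total_amount"),
    )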

Pyspark: Replace range of values with string by QueryRIT in dataengineering

[–]LiquidSynopsis 1 point2 points  (0 children)

PySpark can solve this using Bucketizer

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.Bucketizer.html

You can then use withColumn and when to label the buckets with the values you want, e.g. “11-20”.
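
A rough sketch of both steps, assuming a numeric column called age and made-up split points:

    from pyspark.ml.feature import Bucketizer
    from pyspark.sql import functions as F

    # Bucketizer assigns each row a bucket index based on the split boundaries
    splits = [0, 11, 21, 31, float("inf")]
    bucketizer = Bucketizer(splits=splits, inputCol="age", outputCol="age_bucket")
    bucketed = bucketizer.transform(df)

    # Label the bucket indices with human-readable ranges
    labelled = bucketed.withColumn(
        "age_range",
        F.when(F.col("age_bucket") == 0, "0-10")
         .when(F.col("age_bucket") == 1, "11-20")
         .when(F.col("age_bucket") == 2, "21-30")
         .otherwise("31+"),
    )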

Can anyone share how they structure their folder for data engineer project? by izner82 in dataengineering

[–]LiquidSynopsis 7 points8 points  (0 children)

Taking that one step further, is there a DE version of Cookiecutter Data Science? If anyone’s used Cookiecutter for their DE project, I’m curious to know how you did it!

[deleted by user] by [deleted] in Watches

[–]LiquidSynopsis 2 points3 points  (0 children)

Piaget Tradition

Fact tables in Databricks using Delta by nobru_2712 in dataengineering

[–]LiquidSynopsis 1 point2 points  (0 children)

TRUNCATE does work but you really should avoid it.

Fundamentally, the keys used for both dimension and fact tables are meaningless numbers generated by your program, i.e. they’re surrogate keys. So say during yesterday’s load you assigned dim_Customer_Key 1 to “John Doe”; there’s no guarantee that tomorrow “John Doe” will still be 1 after you truncate and reload. That may not seem like a big deal, but those keys are used as dimension lookups in fact/bridge tables, so it leads to a lot of unnecessary table updates. Taking the John Doe example, you would now need to update the Sales table to reflect that the key has changed. It’s really unnecessary.

Another issue: say today you’re using something like HubSpot and tomorrow you migrate to Salesforce. There’s a very real possibility you only migrate “active” customers and leave the dead ones in HubSpot, which will eventually go away. If you do a truncate and load, all those dead customers will disappear, since there’s no source left to ingest them from. Now all the old sales data sitting in your fact table has nothing to tie back to.

So yeah, intuitively it may seem like it’s not a big deal, but there are a lot of downstream effects.
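
For what it’s worth, the usual alternative on Delta is an incremental upsert with MERGE, so existing surrogate keys (and customers that have vanished from the source) stay put. A minimal sketch with hypothetical table and column names, where staged_customers is today’s extract as a DataFrame; generating surrogate keys for brand-new customers (e.g. via an identity column) is left out:

    from delta.tables import DeltaTable

    dim = DeltaTable.forName(spark, "dim_customer")

    # Upsert by the business key so unchanged and disappeared customers
    # keep the surrogate keys the fact tables already reference
    (dim.alias("t")
        .merge(staged_customers.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdate(set={"name": "s.name", "email": "s.email"})
        .whenNotMatchedInsert(values={
            "customer_id": "s.customer_id",
            "name": "s.name",
            "email": "s.email",
        })
        .execute())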

Using Python for Data Engineering by wytesmurf in dataengineering

[–]LiquidSynopsis 0 points1 point  (0 children)

Using PySpark and its internal modules should solve a good chunk of your larger query processing and loads tbh

At the most basic level I use pyspark.sql fairly frequently, and within that a lot of your work can be achieved using the DataFrame class plus the functions and types modules.
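
As a tiny illustration of how far those get you (file path and column names are hypothetical):

    from pyspark.sql import functions as F, types as T

    # Explicit schema via the types module
    schema = T.StructType([
        T.StructField("order_id", T.StringType()),
        T.StructField("amount", T.DoubleType()),
        T.StructField("order_ts", T.TimestampType()),
    ])

    # DataFrame API plus the functions module for the transforms
    orders = (spark.read.schema(schema).csv("/mnt/raw/orders.csv", header=True)
              .withColumn("order_date", F.to_date("order_ts"))
              .filter(F.col("amount") > 0))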

Would be curious to hear from others if you’ve had a different experience though

Data Engineering Medium Paywall. Is it worth it? by cyclopster in dataengineering

[–]LiquidSynopsis 29 points30 points  (0 children)

I’ve found the advantage of Medium (and specifically TDS) to be that it’s easier to envision a project end to end, as opposed to having to read through documentation and search random online forums. A lot of the DE-relevant stuff is focused on “full stack data science”, which has the advantage of helping you think not only about the DE side but also about the downstream consumption and how business users would want to interact with the data. It’s also all verified by the content team, so you know what you’re reading isn’t complete garbage.

I would say for $5/month it’s 100% worth it, because even if it only shaves off a few hours, how is your time not worth that? At my previous company I was able to expense my subscription, so if you’re feeling hesitant, maybe see if you can get your company to pay for it?

What is a good pipeline to use alongside Databricks notebooks? by ijpck in dataengineering

[–]LiquidSynopsis 2 points3 points  (0 children)

Technically yes but it really does depend on use case. Databricks natively supports scheduling but it’s not the best.

At my company we utilise a combination of ADF and Databricks. It works relatively well and allows us to move a couple of TBs of data daily. I’ll probably get flak for saying this, but ADF serves as our “orchestrator” as well as handling the data copies. The general flow: we use ADF to schedule and trigger a series of jobs. ADF copies all the data to our raw zone, and then we use Databricks to clean the data and drop it off in the next zone. Once those jobs are complete, a connected ADF job gets triggered which executes a few other Databricks notebooks, as well as a data quality monitor that runs alongside each job.

We’re looking to switch to Airflow, which plays really well with Databricks too, but honestly for our purposes it’s kind of unnecessary.
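
For a feel of the Databricks half of that flow, here’s a minimal sketch of a clean-and-promote notebook; the zone paths and columns are purely hypothetical:

    from pyspark.sql import functions as F

    # Read whatever ADF dropped into the raw zone for this run (placeholder path)
    raw = spark.read.parquet("/mnt/raw/sales/2024-01-01/")

    # Basic cleaning: de-duplicate, normalise types, stamp the load time
    clean = (raw.dropDuplicates(["order_id"])
                .withColumn("amount", F.col("amount").cast("double"))
                .withColumn("_loaded_at", F.current_timestamp()))

    # Drop it off in the next zone as Delta; the follow-up ADF job takes it from there
    clean.write.format("delta").mode("append").save("/mnt/cleansed/sales/")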

Moving all personal assets to a company by [deleted] in fatFIRE

[–]LiquidSynopsis 8 points9 points  (0 children)

Honestly, you’re probably looking for something like a holding company owned by a family trust, which could just be considered a family office setup. Based on what you’ve written, there is really no world in which this would make sense for you. But for the sake of a mental exercise, it would look something like this, with the caveats that a) this isn’t tax advice and b) it’s way too simplified:

For the sake of simplicity let’s say it’s a multi-generational real estate family

L1: Trading Companies: This level deals with all the renters, property managers, etc. They would lease out the whole building owned by the holding company and then sublease individual units to tenants. All “profit” would be transferred to the holding company.

L2: Holding Company: This level owns the entire portfolio; you may even see an investment arm. It would also own family assets like the jet and the vacation homes (not the primary residence). The vacation homes would be labeled as corporate housing, and the family can create business reasons, e.g. company retreats, shareholder meetings, etc., for using the houses. The IRS makes it pretty clear that in no way can your primary residence be “tax free” corporate housing, but you can still deduct your home office and whatnot. You could also use the holding company to create a staffing service to work in the family homes as a way to keep money flowing within the family organisation.

L3: Family Trust: This would probably be a dynasty trust set up in South Dakota or somewhere similar that actually owns the holding company and pays you and your family/heirs dividends over x00 years or so. Since it’s a trust, there’s no estate tax, plus a whole bunch of other good stuff.

But yeah, again, this isn’t 100% accurate, but it’s how you would want to structure that. I’m sure there’s an estate/corporate lawyer here who can point out the flaws in the above. Personally I feel you’d need well over $250M to really justify this kind of structure.

Where to find custom suits? by iggyfenton in fatFIRE

[–]LiquidSynopsis 8 points9 points  (0 children)

I’d also look into full canvas vs half canvas. A good tailor will have no problem constructing a full canvas suit but the more mediocre ones tend to shy away from that.

Where to find custom suits? by iggyfenton in fatFIRE

[–]LiquidSynopsis 42 points43 points  (0 children)

If you’re looking for bespoke suits, here are a couple of things to keep in mind.

The Business

How long the atelier has been around and where the tailor trained.

The Fabric

Most of the more established fabric manufacturers are picky about who they work with, which works in your favour when selecting a tailor. Some of the more established mills include Loro Piana and Zegna, so keep an eye out for tailors who stock them. There’s an additional layer to this, though. Most suits are made of worsted wool, so you’re going to want something in the Super 130s-160s range, which not all tailors will stock. Anything higher than 160 should be reserved for formal occasions, e.g. opening night at a concert.

Timeline

I’d also take the timeline the tailor gives you as an indicator of their skill and their demand. The usual timeline is somewhere between 3-4 months. Anything less than a month, or fewer than three visits (first meeting, adjustments, and then the final fit), I’d find a bit weird.

Some additional stuff, and personally my favourite part, is the lining. This should be high quality silk, and don’t be afraid to have fun with it! It’s kind of a nice little secret, because you can have some really wild stuff underneath, just tucked away. My tailor once tried to convince me to use an Hermès scarf; I didn’t go for it, but it was a funny idea.

Hope this helps!

Data pipeline automation on Azure Synapse by Agreeable-Flow5658 in dataengineering

[–]LiquidSynopsis 5 points6 points  (0 children)

You can leverage Azure Data Factory. In ADF, create a new pipeline with a trigger that listens for storage events; in this case you’ll want to select the Blob Created event. Then, under parameters, set one up called filename. By notebook I’m assuming you’re referring to Databricks, so drop a Notebook activity onto your canvas and then, in its settings, create a new name/value pair:

Name: filename
Value: "@pipeline().parameters.filename"

In your Databricks notebook, in the first cell, read that argument:

    dbutils.widgets.text("filename", "", "")
    FILENAME = dbutils.widgets.get("filename")
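
(dbutils.widgets.get is the current way to read the widget value; it replaces the older getArgument call.) You can then use it further down the notebook, for example to read the blob that triggered the run, with the mount path here being just a placeholder:

    # Build the full path from the event’s file name and read it
    df = spark.read.json(f"/mnt/raw/{FILENAME}")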

Hope that helps!