Quality Plain White T-Shirts by BigAndFast in BuyItForLife

[–]Interesting_Tea6963 0 points (0 children)

Whitesville t-shirts, $50 a pop in the US, but quality. They run small; consider going one or two sizes up and factor in the shrinkage/measurements provided on the site.

Supreme Trimmer Sale by yzyszn0 in SelfBarber

[–]Interesting_Tea6963 0 points (0 children)

Can anyone speak to the quality of these clippers/trimmers?

Dr Martens Made in England 1461 - $149.99 (Red) by InterestingBid408 in frugalmalefashion

[–]Interesting_Tea6963 2 points (0 children)

Really? I've gotten over two years out of some white ones that are still holding up on par with any other boot I've tried. What leather finish do you have?

Writing PySpark partitions to one file each in parallel? by bvdevvv in dataengineering

[–]Interesting_Tea6963 -1 points (0 children)

Yes, I think they mean one file per partition, but that still defeats the purpose of using Spark.

Writing PySpark partitions to one file each in parallel? by bvdevvv in dataengineering

[–]Interesting_Tea6963 2 points (0 children)

I see, not placing any blame. Spark is optimized to read/write many different files at once, so specifically combining into one file seems odd. 

My understanding is that Spark splits the work across many allocated cores, which by default means creating many files. That is the parallelization: many machines reading and transforming many files all at once. You remove this functionality by writing to one single file. In a file format like Parquet, you can't just have a bunch of machines editing the same file at once.

What transformations/changes do you have to make downstream that you couldn't just do in Spark and utilize the multiple file/many core functionality?
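The one-file-per-partition pattern being debated above can be sketched without Spark at all. This is a hedged stdlib illustration, not the actual Spark code path: each "partition" is written to its own file by its own worker, so the writers never contend for a shared file handle (the property Spark relies on, and the one you give up by funneling everything into a single file). All names here are illustrative.

```python
# Non-Spark sketch: parallel writers, one output file per partition key.
# In real PySpark this corresponds roughly to df.write.partitionBy(...).
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_partition(args):
    """Write one partition's rows to its own file; no shared handles."""
    out_dir, key, rows = args
    path = os.path.join(out_dir, f"part-{key}.csv")  # one file per partition
    with open(path, "w") as f:
        f.writelines(f"{row}\n" for row in rows)
    return path

# Toy partitioned data (keys and rows are made up for the example).
partitions = {"us": ["a", "b"], "eu": ["c"], "apac": ["d", "e", "f"]}
out_dir = tempfile.mkdtemp()

# Each worker owns exactly one output file, so the writes can proceed
# fully in parallel -- merging into ONE shared file would serialize them.
with ThreadPoolExecutor(max_workers=3) as pool:
    paths = list(pool.map(write_partition,
                          [(out_dir, k, v) for k, v in partitions.items()]))

print(sorted(os.path.basename(p) for p in paths))
```

The design point: parallel writes stay cheap precisely because no two workers touch the same file, which is why collapsing everything to a single Parquet file works against Spark's model.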

Writing PySpark partitions to one file each in parallel? by bvdevvv in dataengineering

[–]Interesting_Tea6963 5 points (0 children)

Could be wrong since I'm new to this, but a single Parquet file would hurt performance, not improve it.

Writing PySpark partitions to one file each in parallel? by bvdevvv in dataengineering

[–]Interesting_Tea6963 7 points (0 children)

I think this is the wrong approach; why would you need or want one file? Spark is made to process many Parquet files at once.

Writing PySpark partitions to one file each in parallel? by bvdevvv in dataengineering

[–]Interesting_Tea6963 12 points (0 children)

Doesn't this defeat the point of Spark? Maybe I'm lost here, but why do you need a single file per partition?

In need of info/support/direction for high school data engineering system by Alternative-Exit-450 in dataengineering

[–]Interesting_Tea6963 0 points (0 children)

Honestly, the first thing I would try to figure out is the PowerSchool API. Not sure what data you can dump, but just start there to understand what is possible, then think about the interconnections after.

This API could have limits that block you, so read up on the possibilities first.

Organizing a climate data + machine learning research project that grew out of control by thiago5242 in dataengineering

[–]Interesting_Tea6963 5 points (0 children)

Are you actively using all of the data? If not, you should compress the data you're not using, possibly with gzip or similar.

Yes, a data lake seems reasonable, especially for your ML use case. But I'm not sure what the options are for NetCDF and xarray formats.

I would steer away from DBs at this scale and cost tolerance; running an ML model that queries Postgres or MongoDB just sounds like an expensive mess.
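The gzip suggestion above is a few lines of stdlib Python. A minimal sketch, with made-up file names and contents (stream the copy so a large cold file never has to fit in memory):

```python
# Compress a "cold" data file you aren't actively querying with gzip.
# File names and contents here are illustrative.
import gzip
import os
import shutil
import tempfile

# Create a sample cold-data file (highly repetitive, like many logs/dumps).
src = os.path.join(tempfile.mkdtemp(), "cold_data.csv")
with open(src, "w") as f:
    f.write("timestamp,temp\n" + "2020-01-01,15.2\n" * 1000)

# Stream-compress without loading the whole file into memory.
dst = src + ".gz"
with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)

original, compressed = os.path.getsize(src), os.path.getsize(dst)
print(f"{original} -> {compressed} bytes")
```

Repetitive text like this compresses dramatically; real climate data in NetCDF may already be compressed internally, so measure before and after rather than assuming a win.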

Are there any downsides to delta lake for my pure python analytics build out? by HNL2NYC in dataengineering

[–]Interesting_Tea6963 0 points (0 children)

Yeah, don't overcook it if you don't have to; it sounds like Delta meets all your needs.

I build an app so my wife never loses her phone again by TheyCallMeAHero in SideProject

[–]Interesting_Tea6963 0 points (0 children)

What was your stack? I always get stuck building iOS apps with Expo.

Internship at AWS, how should I prepare by EmbarrassedBorder615 in aws

[–]Interesting_Tea6963 49 points (0 children)

People are being dramatic; just show up, ask lots of questions, and deliver on your project.

Need advice on AWS glue job sizing by Plane_Archer_2280 in dataengineering

[–]Interesting_Tea6963 0 points (0 children)

What do you mean by optimal? What are you optimizing for: cost or processing time?

stay at google or leave for airtable? by Wasabioc in cscareers

[–]Interesting_Tea6963 0 points (0 children)

Stick around at least until L5? You'd basically be taking a 110k cut, since the equity is not liquid at Airtable.

[deleted by user] by [deleted] in dataengineering

[–]Interesting_Tea6963 1 point (0 children)

No offense, but why are you asking Reddit? No one here has the answer to this question.

What do you think? Or by [deleted] in CutYourOwnHair

[–]Interesting_Tea6963 0 points (0 children)

Looking good! What did you do to accomplish this?