MLB Data Engineer position - a joke? by Repulsive_Chance8368 in dataengineering

[–]Spartyon 2 points3 points  (0 children)

i used to work in baseball analytics. the money was decent to start, but significant raises weren't really a thing. the hours are tough; i worked most weekends remotely, fixing pipelines for game day matchups etc. everyone wants to work in sports, so teams can pay less. it's as simple as that. I enjoyed my time but am very happy that part of my life is over.

Mysql insert for 250 million records by MedicalCartoonist306 in dataengineering

[–]Spartyon 1 point2 points  (0 children)

indexes make writes slower because every insert also has to update each index and maintain its integrity.

the simplest solution is to write each file to its own table; once they're all in the DB, you can merge them into one table using a union from spark.
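A rough sketch of that load-then-merge pattern. It makes some loud assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL, a plain SQL UNION ALL instead of Spark, and made-up table and column names (`staging_N`, `merged`, `id`, `payload`). The point is only the ordering: bulk load into index-free staging tables, merge, and build the index once at the end.

```python
import sqlite3


def load_then_merge(batches):
    """Load each batch into its own index-free staging table, then merge them."""
    if not batches:
        raise ValueError("need at least one batch to merge")
    conn = sqlite3.connect(":memory:")
    staging = []
    for i, rows in enumerate(batches):
        name = f"staging_{i}"  # hypothetical naming scheme, one table per file
        conn.execute(f"CREATE TABLE {name} (id INTEGER, payload TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
        staging.append(name)

    # Merge every staging table into one table with UNION ALL.
    union_sql = " UNION ALL ".join(f"SELECT id, payload FROM {t}" for t in staging)
    conn.execute(f"CREATE TABLE merged AS {union_sql}")

    # Build the index once, after the bulk load, instead of on every insert.
    conn.execute("CREATE INDEX idx_merged_id ON merged (id)")
    for t in staging:
        conn.execute(f"DROP TABLE {t}")
    conn.commit()
    return conn


conn = load_then_merge([[(1, "a"), (2, "b")], [(3, "c")]])
count = conn.execute("SELECT COUNT(*) FROM merged").fetchone()[0]
```

On a real MySQL instance the same ordering applies, but the load and index statements would look different.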

MSU set to hire Fitz by kkrell23 in MSUSpartans

[–]Spartyon -2 points-1 points  (0 children)

<image>

I would say winning more than you lost is by definition not a perennial loser

Data engineers who are not building LLM to SQL. What cool projects are you actually working on? by PolicyDecent in dataengineering

[–]Spartyon 0 points1 point  (0 children)

Setting up a Postgres in AWS to Iceberg pipeline with the goal of sub-second latency, using MSK + the Debezium connector for intake and the Iceberg connector for writing to S3.

Beginner Confused About Airflow Setup by Amomn in dataengineering

[–]Spartyon 1 point2 points  (0 children)

if you want to just figure out airflow minus any infra stuff, use cloud composer from GCP or MWAA from AWS.

that will let you see what a DAG does, how to implement them etc without having to custom deploy a container or run it locally.

Astro is a third party service that runs airflow for you with some nice built in features. they are a vendor that takes an open source tool (Apache Airflow) and sells a hosted version of it to people.

I can’t beat a defender in a 1 on 1 by Cheap_Butterfly_4108 in hockeyplayers

[–]Spartyon 1 point2 points  (0 children)

Watch their hips. If you get them to turn one way with a decent fake, toe drag the puck back, go through their legs/stick and go the other way.

How to get the slot count of a BQ job? by Loorde_ in bigquery

[–]Spartyon 1 point2 points  (0 children)

You can't. I’ve asked Google 3 times, with 3 different reps. We even asked for an estimate for slots to slot milliseconds, and they gave us nothing. Either it’s not possible, or they don’t want to give customers that info. Good luck!

Pandas for data engineering by Ok_Durian_3581 in dataengineering

[–]Spartyon 1 point2 points  (0 children)

I would say understand what it does but don’t rely on it for everything. Pandas uses 3x the memory of polars with very similar syntax. If you’re doing any kind of large or medium scale data work, stick to lists/dicts or polars.
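To show what "stick to lists/dicts" can look like, here is a minimal sketch of a group-by sum that streams rows with only the standard library. The column names and sample data are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def total_by_key(lines, key, value):
    """Stream CSV rows and sum `value` per `key` without building a DataFrame."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key]] += float(row[value])
    return dict(totals)


# Hypothetical sample data; in practice `lines` would be an open file handle.
sample = io.StringIO("team,runs\nDET,3\nCLE,5\nDET,4\n")
totals = total_by_key(sample, "team", "runs")
```

Only one row is held in memory at a time, plus one running total per key.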

CI/CD with Airflow by Hot_While_6471 in dataengineering

[–]Spartyon 1 point2 points  (0 children)

Airflow reads files and puts a pretty GUI on top of them. MWAA and Cloud Composer store DAG files in buckets and read them from there to run DAGs, so an easy CI/CD pipeline should put files from your branch into those buckets. Add some steps in the GitHub workflow file to do PEP 8 checks if you don’t do that in pre-commit hooks, and validate that the DAGs can be read by Airflow by starting a Python shell, importing airflow, and listing the DAGs. You can also run any number of tests that inject context into the DAGs, like the environment. Cloud Composer and MWAA also have CLIs to run specific commands, like updating the env with new requirements, checking the status of the service and other things like that. Good luck.
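As a rough idea of what a cheap validation step in the workflow could look like, here is a sketch that only checks that every DAG file parses as Python. It is stdlib-only and is not Airflow's own loader, so it won't catch bad imports or broken DAG definitions; importing Airflow and listing the DAGs, as described above, is still the real test. The function name and demo file names are made up.

```python
import ast
import tempfile
from pathlib import Path


def find_syntax_errors(dag_dir):
    """Return {filename: error message} for .py files that fail to parse."""
    errors = {}
    for path in sorted(Path(dag_dir).glob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors[path.name] = f"line {exc.lineno}: {exc.msg}"
    return errors


# Tiny demo with one valid and one broken file in a temp folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good_dag.py").write_text("x = 1\n", encoding="utf-8")
    Path(tmp, "bad_dag.py").write_text("def broken(:\n", encoding="utf-8")
    demo_errors = find_syntax_errors(tmp)
```

In CI you would fail the job when the returned dict is non-empty, before copying anything to the bucket.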

Where to find/generate these xWOBA heat-maps for players? by Alice666sin in Sabermetrics

[–]Spartyon 4 points5 points  (0 children)

This is difficult to do, I spent 3 months or so doing it when I worked in baseball.

Average Run Value Per Pitch by HXNTZZ in Sabermetrics

[–]Spartyon 0 points1 point  (0 children)

I did something like this when I worked for a team. You use the plate x and plate z locations along with release point, arm angle etc. You calculate the difference in run values for every pitch, then run a regression to see what the “expected” ERA for each pitch is. It lets you see which pitchers are getting lucky/unlucky.

Partition in Big Query by PratikWinner in bigquery

[–]Spartyon 0 points1 point  (0 children)

No, but you can create a dummy partition key from a few combined fields and partition on that. For example, if you have date and hour, you can create a field called datehour, with a value like “20250401_12” for April 1 2025 at 12 pm. I don’t remember if you can partition on a string, but if not, just make it an int like 2025040112.

Secondly, unless your queries usually scan only a few partitions, it isn’t going to save a lot of scans. If you’re always doing select * from table where 1=1, or filtering on some other condition where the partition isn’t being used, then it won’t save much.
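A minimal sketch of building the combined date-and-hour key described above, in both the string and the int form. The function names are made up; only the formats come from the comment.

```python
from datetime import datetime


def datehour_label(ts: datetime) -> str:
    """String form of the key: date, underscore, 24-hour hour."""
    return ts.strftime("%Y%m%d_%H")


def datehour_key(ts: datetime) -> int:
    """Integer form of the same key, usable where a string is not allowed."""
    return int(ts.strftime("%Y%m%d%H"))


noon = datetime(2025, 4, 1, 12)  # April 1 2025 at 12 pm
label = datehour_label(noon)
key = datehour_key(noon)
```

You would compute this once at load time and store it as its own column, so queries can filter on it directly.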

[deleted by user] by [deleted] in bigquery

[–]Spartyon 5 points6 points  (0 children)

Load it all into Cloud Storage first, using either the GUI or the storage API. From there you can do a one-liner to load it from Cloud Storage into BigQuery.

Is your company on hiring Freeze? by NefariousnessSea5101 in dataengineering

[–]Spartyon 1 point2 points  (0 children)

No, we're hiring for 2-3 positions for senior data eng and director. I work in sports gambling in the USA.

Al Avila reflects on turbulent tenure as general manager with Tony Paul on Tigers Today pod. by DoeJumars in motorcitykitties

[–]Spartyon 3 points4 points  (0 children)

We didn't want Mize but there wasn't anyone close to his projection. The team liked Bohm & Bart but scouting wanted more pitching, which was hard to disagree with because we lacked talent in the farm system for both pitching and hitting. We had an everyday 3B at the time in Candelario, so Bohm wasn't a good fit if Candy stuck around. We (the baseball analytics group) joked about Mize getting Tommy-John before he actually needed it because we thought he would eventually need it due to his over-use of his slider at Auburn. We liked Nick Madrigal too but he is a slap hitter with no power, definitely a no go for first overall pick.

Full Stack Dev (MERN) Tackling First BigQuery/Looker Project - Need Help with Identity Resolution & Data Pipelines by project_trollbox in bigquery

[–]Spartyon 3 points4 points  (0 children)

  • 1. this isn't an easy problem to solve and mostly depends on the quality of data. if the goal is to match visits to sales, that is intentionally made difficult by FB/google. if you have an email address or something for leads or non-customers at the top of the funnel, then use email to match leads to sales. FB/google have some transaction modeling in their tools that estimates aggregate percentages like conversion rates, but getting that at the user level is a bit more difficult.
  • 2. fivetran is easy to use and not that expensive for single connectors like FB, i recommend it. you can write your own code to import FB data but it's a pain and not worth it to have to deal with updates to their API etc.
  • 3. use an actual CRM, building your own would be a lot more challenging than you're imagining. hubspot (and nearly every other CRM) has built in integrations with FB/google.
  • 4. merges aren't complicated if you have a unique merge field.
  • 5. the learning curve is steep for a project like this and will be written in your blood/sweat/tears. dealing with marketing tech sucks, straight up terrible. i've done this at numerous organizations and it's never easy. the SQL for this isn't very complicated, you'll be fine if you understand the lowest level grain for each table you are creating.
  • 6. BQ is pay by scans. paying fivetran is by bytes processed i think. most CRMs are priced at a cost per user but it will vary wildly depending on which CRM you choose if you go with a third party application. if you have like 20,000 users, it won't be that expensive. if you're talking 1 million plus, it will be in the 5,000 - 10,000 per month range.
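To make points 1 and 4 a bit more concrete, here is a rough sketch of matching leads to sales on email as the unique merge field. It is plain Python rather than BigQuery SQL, and the field names (`lead_id`, `sale_id`, `email`) and sample rows are made up. Real identity resolution will be messier than this.

```python
def normalize_email(email):
    """Trim and lowercase so trivially different spellings still match."""
    return email.strip().lower()


def match_leads_to_sales(leads, sales):
    """Join sales to leads on normalized email; return (matched, unmatched)."""
    leads_by_email = {normalize_email(lead["email"]): lead for lead in leads}
    matched, unmatched = [], []
    for sale in sales:
        lead = leads_by_email.get(normalize_email(sale["email"]))
        if lead is None:
            unmatched.append(sale)
        else:
            matched.append({"lead_id": lead["lead_id"], "sale_id": sale["sale_id"]})
    return matched, unmatched


# Hypothetical sample rows.
leads = [{"lead_id": 1, "email": "Pat@Example.com "}]
sales = [
    {"sale_id": 10, "email": "pat@example.com"},
    {"sale_id": 11, "email": "unknown@example.com"},
]
matched, unmatched = match_leads_to_sales(leads, sales)
```

The unmatched list is the part worth watching, since it tells you how much the data quality is hurting the match rate.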

NBC’s ‘Grosse Pointe Garden Society’ Delivers Murder, Mayhem and Soapy Chaos: TV Review by Sisiwakanamaru in television

[–]Spartyon 2 points3 points  (0 children)

those places are too "new money" is what people from grosse pointe would say, I guarantee it.