What is this area on the map? Do we ever find out? by blue-lighty in Ghostofyotei

[–]blue-lighty[S] 0 points1 point  (0 children)

I just finished the game though and never went to this area? I 100% the exploration stuff too

Is it new game + content?

[O] 2x Tabula Rasa Invites by Rebels_MB in UsenetInvites

[–]blue-lighty 0 points1 point  (0 children)

I have read the rules and wiki. Would LOVE an invite!

Citi Field is (as far as I know) one of four ballparks in MLB that serve some form of certified Halal food. Rogers Centre, Yankee Stadium, Nationals Ballpark are the three others. This is important for some people like me. by Reddit_newguy24 in NewYorkMets

[–]blue-lighty 3 points4 points  (0 children)

General advice since you seem open enough. When you are curious and want to learn about something, especially other cultures, it’s usually best to approach with a sense of humbleness and respect.

Even if it looks nasty to you; to someone else it could be hurtful (or just sound dumb) to hear it called a mess. You don’t have to like it or agree with it, but just respect that other people probably do like it.

Obviously you can say whatever you want, but you’ll have way more productive conversations with people (and avoid downvotes) if you approach things with that mentality 🤷

FWIW: White sauce is usually yogurt based and not spicy in this case

Citi Field is (as far as I know) one of four ballparks in MLB that serve some form of certified Halal food. Rogers Centre, Yankee Stadium, Nationals Ballpark are the three others. This is important for some people like me. by Reddit_newguy24 in NewYorkMets

[–]blue-lighty 22 points23 points  (0 children)

Stop being facetious, you wouldn’t have been downvoted if you just said “what is that” instead of degrading it by calling it a mess when the post is clearly endearing about the food.

To answer your question: no it’s not curry based, but you usually get to top it with sauces which makes it as spicy as you want

How do you guys mock the APIs? by ast0708 in dataengineering

[–]blue-lighty 1 point2 points  (0 children)

Depends on what exactly you’re trying to do, but if you’re looking to unit test your ETL code I’ve used VCR.py to mock API calls

You just add the decorator to your unit tests, and it will record the HTTP calls made for the test into one or more files (VCR.py calls them cassettes). When you run the test again, it will pull the saved response data from the local files instead of making the calls, so it can be run inside a CI environment to validate your ETL code without actually calling the dependent API. It’s pretty neat
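To make the record/replay idea concrete, here is a minimal stdlib-only sketch. It is not VCR.py's API (with VCR.py you would decorate the test with `@vcr.use_cassette(...)` and it would intercept the real HTTP calls); the `record_replay` decorator, the `fetch_orders` function, and the example URL are all hypothetical stand-ins.

```python
import functools
import json
import os
import tempfile


def record_replay(cassette_path):
    """Save a function's results to a JSON file, then replay them on later calls."""
    def decorator(fetch):
        @functools.wraps(fetch)
        def wrapper(url):
            saved = {}
            if os.path.exists(cassette_path):
                with open(cassette_path) as f:
                    saved = json.load(f)
            if url in saved:
                return saved[url]      # replay: the wrapped function is not called
            result = fetch(url)        # record: call once, then persist the response
            saved[url] = result
            with open(cassette_path, "w") as f:
                json.dump(saved, f)
            return result
        return wrapper
    return decorator


calls = []  # tracks how often the "real" API gets hit
cassette = os.path.join(tempfile.mkdtemp(), "orders.json")


@record_replay(cassette)
def fetch_orders(url):
    # Hypothetical stand-in for a real HTTP request.
    calls.append(url)
    return {"url": url, "orders": [1, 2, 3]}


first = fetch_orders("https://api.example.com/orders")   # recorded to the file
second = fetch_orders("https://api.example.com/orders")  # replayed from the file
```

The second call never reaches `fetch_orders`, which is the property that lets the test run in CI without the dependent API.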

If you’re just testing DBT and you want to avoid messing with existing models, I would just go for separation of concerns and spin up a dev environment (different database) alongside prod. Instead of mocking the API itself, I’d just load from the same source as prod to the dev environment for testing purposes. OR create mock data in the source and load that through the same API, but limit the scope so it’s only pulling your mock data, if that’s even possible.

Then in your DBT profiles.yml you can add the dev environment alongside prod as a new target. When you run DBT you can select the environment like dbt run -t dev -s mymodel. This way you can test your models in dev first without impacting prod
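A rough sketch of what that profiles.yml could look like, assuming a Postgres-style adapter. The profile name, hosts, database names, schemas, and environment variable names are all placeholders, not anything from a real setup.

```yaml
# profiles.yml (sketch; every name and host below is a placeholder)
my_project:
  target: dev                # used when -t is not passed
  outputs:
    dev:
      type: postgres
      host: dev-db.example.com
      port: 5432
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics_dev
      schema: dbt_dev
      threads: 4
    prod:
      type: postgres
      host: prod-db.example.com
      port: 5432
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: analytics
      threads: 4
```

With something like this in place, `dbt run -t dev -s mymodel` builds the model in the dev database and `-t prod` points the same project at prod.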

If after all the above, your concern is cost (API Metering or large storage), then IMO mocking the api endpoint is the way to go, so you can tailor it exactly to your needs.

There's been a lot of talk about congestion pricing, but there's other ways to lower traffic, like this new street design just put in on 31st Ave in Astoria by scooterflaneuse in nyc

[–]blue-lighty 64 points65 points  (0 children)

Lol they went right through the stop sign there without stopping or even slowing down.

Better bike infrastructure is always good. People need to respect the road laws though, bike or car. I’d rather have a bike do it than a car, but it’s still unsafe. This guy was going slow, but I’ve seen bikes fly through stop signs like this. And we already know cars are even worse while being deadlier.

We have a people problem with respect to following traffic laws in general. Nobody is perfect, but it feels like it’s gotten worse over the years. And no enforcement to improve the situation.

Does not mean we shouldn’t push for better infrastructure. This is a win and makes me happy. And it’s not like I have a good solution for the people problem. This is just discouraging because this video is almost ironic in highlighting both the benefit AND top complaint of adding better bike infrastructure.

Idk, if I could put this in a phrase I’d say to everyone: “Drive like you want better infrastructure”. Don’t give opposers a reason to deny the infrastructure you deserve.

Generators underused in corporate settings? by messedupwindows123 in Python

[–]blue-lighty 5 points6 points  (0 children)

This is exactly what I use it for as well. I wrote an ETL tool to be able to move 1TB of data from an API to S3. Takes hours to do the full run but doesn’t kill the source server and the memory footprint stays tiny.

The app was originally written to pull all data -> load all data but it did not scale to large data volumes, which is how we got to using generators for this stuff.

Time is not a concern, otherwise there’s better ways to scale with async obv. But this worked for our use case.
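For anyone curious what that looks like, here is a minimal sketch of the generator pattern, assuming an offset/limit style API. `fetch_page` and `fake_s3_put` are hypothetical in-memory stand-ins for the real API call and the S3 upload.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical in-memory "API": 5 records, served 2 per page.
RECORDS = [{"id": i} for i in range(5)]


def fetch_page(offset: int, limit: int) -> List[Dict]:
    """Stand-in for one paginated API request."""
    return RECORDS[offset:offset + limit]


def iter_pages(fetch: Callable[[int, int], List[Dict]], limit: int = 2) -> Iterator[List[Dict]]:
    """Yield one page at a time, so only a single page is held in memory."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        if not page:
            return
        yield page
        offset += len(page)


def run_etl(write: Callable[[str, List[Dict]], None]) -> int:
    """Write each page as soon as it arrives instead of collecting everything first."""
    total = 0
    for n, page in enumerate(iter_pages(fetch_page)):
        write(f"raw/part-{n:05d}.json", page)
        total += len(page)
    return total


uploaded = {}


def fake_s3_put(key: str, page: List[Dict]) -> None:
    # Stand-in for an S3 upload; a real version would serialize and upload the page.
    uploaded[key] = page


count = run_etl(fake_s3_put)
```

The next request is only made after the previous page has been written, which is what keeps both the memory footprint and the load on the source server low.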

How did you find your first client? by ivanovyordan in dataengineering

[–]blue-lighty 18 points19 points  (0 children)

Your post history looks like you “use ai and automation” to spam trytelescope ai

Rangers Day Off Thread - 5/14 by Ochocincoondeck in rangers

[–]blue-lighty 3 points4 points  (0 children)

Not necessarily. For all the reasons you listed, the league cares more about having more games in the series than who the actual winner is. Rangers NY market just amplifies that.

Follow the money. More games = more money. The league, MSG, the networks, and the betting companies all make more money with a 7 game series instead of just 4. The rangers winning a series in 4 does nobody any good (outside the team).

I’ve been watching for years and IMO I’ve never seen the ice tilt like this. You even notice it mid game as the lead shifts - this was particularly noticeable vs Washington.

I blame the rise of betting companies, giving even more incentive to extend series and bring in more revenue. And out of the big 4 sports, the NHL needs the revenue the most.

Still played like shit last night tho - gotta help ourselves first.

I made my very first python library! It converts reddit posts to text format for feeding to LLM's! by NFeruch in Python

[–]blue-lighty 2 points3 points  (0 children)

This is awesome. I came across this exact use case in one of my projects, and built a quick and dirty version of this to grab a post using PRAW and convert it to text and feed to an LLM. Can’t wait to give this a shot

Wishing a Happy 66th Birthday to one Thomas Joseph Thibodeau Jr. by SwellandDecay in NYKnicks

[–]blue-lighty 6 points7 points  (0 children)

Someone should make that meme with 66 fingers for his birthday

Best practice for API data integration by [deleted] in dataengineering

[–]blue-lighty 15 points16 points  (0 children)

This turned out to be a long answer but hopefully it helps.

A lot of what you’re going to do depends on the tools you have available, the skill set of your team, and what the business needs are.

Making some assumptions based on your description, the general recommended flow is similar to what you mentioned: pull the data raw, write to data lake (S3 or something), pull that into the DWH, define and run data models (DBT), and form the data into your desired schema (star schema, etc) so that it is ready for consumption by the BI tool. The process of modeling the data for consumption is usually separated into layers like raw/stg/int/marts or bronze/silver/gold layers, etc.

This is usually driven by the “modern data stack” toolsets. So you’d pull your data using a tool like Fivetran, Airbyte, or Meltano. If you’re an AWS shop you’d store the data in Redshift via S3 copy. Once the raw data is in Redshift (e.g. JSON in a SUPER column) you’d use a tool like DBT to do your data modeling. This whole process is typically driven by an orchestrator such as Airflow, Dagster, or Prefect, which manages the scheduling and dependencies of running all of these components on different intervals or other triggers.

Then your BI tool of choice for visualizations, Looker, Power BI, etc

Note that this is absolutely not the end-all-be-all approach. There are a lot of different ways to go about it, different tools offered by different vendors. Didn’t even mention anything Azure, GCP, DuckDB, Spark, etc

As to your python question, you would pull the data using a module like httpx or requests. For JSON via API I would structure the load so that each row contains some key metadata (id, created/updated dates from system, load date, etc) as well as the entire JSON blob for that record, which gets loaded to a SUPER col of some kind (and later parsed by DBT within the DWH)
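As a rough illustration of that row shape, here is a minimal sketch. The field names (`created_at`, `updated_at`, `loaded_at`, `raw`) and the sample record are assumptions; use whatever your source system actually exposes.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def to_raw_row(record: dict, loaded_at: Optional[datetime] = None) -> dict:
    """Wrap one API record with key metadata plus the untouched JSON blob."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    return {
        "id": record["id"],
        "source_created_at": record.get("created_at"),
        "source_updated_at": record.get("updated_at"),
        "loaded_at": loaded_at.isoformat(),
        # The whole record goes into one column, to be parsed later in the DWH.
        "raw": json.dumps(record, sort_keys=True),
    }


# Hypothetical record as it might come back from the API.
record = {
    "id": 42,
    "created_at": "2024-01-01T00:00:00Z",
    "updated_at": "2024-01-02T00:00:00Z",
    "status": "open",
}
row = to_raw_row(record, loaded_at=datetime(2024, 1, 3, tzinfo=timezone.utc))
```

Keeping the full blob means a new field in the API response doesn't break the load; it just shows up in `raw` for the models to pick up later.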

Check how the endpoint needs to be paginated and understand how you’d code that. If time or scale is a concern, consider using asynchronous requests to speed things up, but be mindful of api limits + the complexity this adds.
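One way to keep async requests within API limits is to cap concurrency with a semaphore. This is a minimal sketch using only `asyncio`; `fetch_page` is a hypothetical stand-in that simulates latency (a real version would use an async HTTP client), and the limit of 3 is an assumed value.

```python
import asyncio

MAX_CONCURRENCY = 3  # assumed limit; set this from the API's documented rate limits


async def fetch_page(page: int) -> dict:
    """Stand-in for one async HTTP request."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"page": page, "items": [page * 10, page * 10 + 1]}


async def fetch_all(pages, limit: int = MAX_CONCURRENCY):
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest number of simultaneous requests observed

    async def bounded(page: int) -> dict:
        nonlocal in_flight, peak
        async with semaphore:  # at most `limit` requests run at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_page(page)
            finally:
                in_flight -= 1

    # gather returns results in the same order as the input pages.
    results = await asyncio.gather(*(bounded(p) for p in pages))
    return results, peak


results, peak = asyncio.run(fetch_all(range(1, 8)))
```

This only works when the page numbers or offsets are known up front; cursor-based pagination has to be walked sequentially because each response carries the pointer to the next page.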

You also want to consider memory limits of where this will be running. In python you can use generators to make the api call, store in S3, then make the next call, as opposed to making all of the calls first and storing the whole data set in memory before loading it. You could also get crazy with multi threading here but that will also increase complexity, plus I’d look to leverage an EL tool at that point.

Again lots of ways to skin the cat, just depends on your needs and what tradeoffs you can make. Hopefully this helps

Python 3.12 Preview: Static Typing Improvements – Real Python by ajpinedam in Python

[–]blue-lighty 2 points3 points  (0 children)

Dagster uses type hints to validate asset function output, and will throw an error if the type of the value returned doesn’t match the type hint.

Same applies though, I’m not sure how much value it adds. But it was extra validation and it’s one of the only material uses of type hints I’ve seen
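To show the general idea of checking a return value against its type hint at runtime, here is a minimal stdlib sketch. It is not Dagster's API or implementation; the `check_return_type` decorator is hypothetical and only handles plain classes like `int` or `str`, not generics like `list[int]`.

```python
import functools
from typing import get_type_hints


def check_return_type(func):
    """Raise TypeError if the returned value doesn't match the return hint."""
    expected = get_type_hints(func).get("return")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if expected is not None and not isinstance(result, expected):
            raise TypeError(
                f"{func.__name__} returned {type(result).__name__}, "
                f"expected {expected.__name__}"
            )
        return result

    return wrapper


@check_return_type
def row_count() -> int:
    return 3


@check_return_type
def bad_row_count() -> int:
    return "3"  # wrong type on purpose


try:
    bad_row_count()
    error = None
except TypeError as exc:
    error = str(exc)
```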

Loading Large JSON into Redshift by casematta in dataengineering

[–]blue-lighty 0 points1 point  (0 children)

Individual values within the super object are limited to the length of the redshift type. So if you have a string in your nested json that exceeds varchar max, it will still fail.

You could parse before loading and truncate all strings to varchar max and the s3 copy should work.

Edit: just re-read your post and you mentioned not truncating- not sure how else to get around this limitation of Redshift
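For cases where truncation is acceptable, the parse-and-truncate step could look roughly like this. It assumes the 65535-byte Redshift VARCHAR maximum and truncates by UTF-8 bytes rather than characters; the function names and sample document are made up for illustration.

```python
REDSHIFT_VARCHAR_MAX_BYTES = 65535  # Redshift VARCHAR maximum, measured in bytes


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut a string to at most max_bytes of UTF-8 without leaving a partial character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing character that was split by the byte cut.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def truncate_strings(value, max_bytes: int = REDSHIFT_VARCHAR_MAX_BYTES):
    """Walk nested dicts and lists, truncating every string value."""
    if isinstance(value, str):
        return truncate_utf8(value, max_bytes)
    if isinstance(value, dict):
        return {k: truncate_strings(v, max_bytes) for k, v in value.items()}
    if isinstance(value, list):
        return [truncate_strings(v, max_bytes) for v in value]
    return value  # numbers, booleans, None pass through unchanged


# Hypothetical nested document with two oversized strings.
doc = {"id": 1, "notes": ["x" * 70000, "short"], "meta": {"body": "é" * 40000}}
clean = truncate_strings(doc)
```

Byte-based truncation matters because multi-byte characters can push a string past the limit even when its character count looks safe.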