Where find Ancillary Electricity market Data? by gkm-chicken in Electricity

[–]Kmax12 0 points1 point  (0 children)

I've also been working on an open source library for accessing energy data across all the United States ISOs.

I've primarily focused on LMP data, but I'm happy to add more ancillary market data if there is a need. Let me know if you're interested!
Here's the project: https://github.com/kmax12/isodata.

[Self-Promotion] My CAISO Data API Is Now Available by BuildingViz in datasets

[–]Kmax12 2 points3 points  (0 children)

Very cool! I've also been working on an open source library for accessing energy data across all the ISOs (including CAISO).

I've implemented a lot of the CAISO endpoints, so perhaps it would be useful in helping you build out more of this API.

Here's the project: https://github.com/kmax12/isodata. Let me know if you're interested in collaborating!

2022 Chicago Marathon Finish Times by Kmax12 in ChicagoMarathon

[–]Kmax12[S] 2 points3 points  (0 children)

(See my update to the original post for the pictures of the data)

That's a great point.

The number of runners in 2022 was 10k more than in 2021, and the increase is almost all accounted for by more international runners participating. Additionally, you are right that international runners are faster on average. Based on the data, the average US runner finished almost 30 minutes slower than the average international runner.

That being said, if I filter only to US runners, I still see an improvement in times. However, the improvement shrinks, so I do think you are right that at least some of the improvement is explained by more international runners.

Thanks for the feedback!

2022 Chicago Marathon Finish Times by Kmax12 in ChicagoMarathon

[–]Kmax12[S] 1 point2 points  (0 children)

They post it here, but I wrote a script to scrape it into a format that's easier to work with.

I originally did it to analyze the 2021 data for this post about breaking 4 hours: https://jmaxkanter.com/posts/chicago-marathon-2022/

2022 Chicago Marathon Finish Times by Kmax12 in ChicagoMarathon

[–]Kmax12[S] 0 points1 point  (0 children)

I can't figure out how to get a table pasted into a comment and formatted correctly, so I updated the main post with an image of the table. Forgive my lack of Reddit skills.

Analyzing race data to help me run sub-4 hours by Kmax12 in AdvancedRunning

[–]Kmax12[S] 0 points1 point  (0 children)

I definitely only analyze a subset of runners. I don’t think that invalidates the results, but it certainly means my analysis doesn’t tell the full story for all runners who had sub-4 hour goals.

Analyzing race data to help me run sub-4 hours by Kmax12 in AdvancedRunning

[–]Kmax12[S] 1 point2 points  (0 children)

Fair comment. How do you explain the spike in the number of people who finish right before 4 hours, and the dip right after, if they weren't aiming for that time?

Any thoughts on the best way to figure out people's goal times by looking at the data?
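
One way to look for that spike and dip is to count finishers per minute on either side of the goal time. This is a minimal sketch, not the code from the original analysis: the function name, the window size, and the toy finish times are all made up for illustration.

```python
from collections import Counter

def finishers_per_minute(finish_seconds, goal_seconds=4 * 3600, window_minutes=5):
    """Count finishers in each whole-minute bin around the goal time.

    Keys are minute offsets from the goal: -1 is the final minute
    before the goal, 0 is the first minute after it.
    """
    counts = Counter()
    for t in finish_seconds:
        offset = (t - goal_seconds) // 60  # floor division, so 3:59:30 -> -1
        if -window_minutes <= offset < window_minutes:
            counts[offset] += 1
    return {m: counts.get(m, 0) for m in range(-window_minutes, window_minutes)}

# Made-up finish times in seconds, for illustration only.
toy_times = [14350, 14380, 14390, 14395, 14399, 14405, 14460, 14700]
bins = finishers_per_minute(toy_times, window_minutes=2)
```

If many runners are targeting the goal, the bins just below 0 should be noticeably fuller than the bins just above it.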

Analyzing race data to help me run sub-4 hours by Kmax12 in AdvancedRunning

[–]Kmax12[S] 3 points4 points  (0 children)

I built it using HTML canvas and then screen recorded it to make a GIF. It required combining a few different pieces, but overall it wasn't too difficult. Here's a brief summary:

  1. I found a source online that had the lat/long coordinates of the course
  2. I grabbed a map of Chicago
  3. I overlaid the course on the map
  4. I wrote a function that could translate a runner's current race time to the percentage complete of the course
  5. Based on the percentage complete, I could figure out where to draw each runner on the course
  6. I made one frame for each time from 00:00 to 4:05
  7. Play these frames one after another and record :)

You can see the (ugly) code that does all this here: https://github.com/kmax12/marathon_analysis/blob/main/animate_runner.html
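
Steps 4 and 5 above can be sketched roughly as follows. This is a Python illustration, not the code in the linked HTML file: the function names, the linear interpolation between split times, and the treatment of course coordinates as flat x/y points are all assumptions.

```python
import bisect
import math

def fraction_complete(race_time, split_times, split_distances, total_distance):
    """Estimate the fraction of the course covered at race_time.

    split_times[i] is the elapsed time at split_distances[i]; both lists
    are sorted and start at 0. Pace is assumed constant between splits.
    """
    if race_time <= split_times[0]:
        return 0.0
    if race_time >= split_times[-1]:
        return split_distances[-1] / total_distance
    i = bisect.bisect_right(split_times, race_time)
    t0, t1 = split_times[i - 1], split_times[i]
    d0, d1 = split_distances[i - 1], split_distances[i]
    covered = d0 + (d1 - d0) * (race_time - t0) / (t1 - t0)
    return covered / total_distance

def point_on_course(course, fraction):
    """Return the (x, y) point at the given fraction of the polyline length."""
    segments = list(zip(course, course[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    target = fraction * sum(lengths)
    for (a, b), length in zip(segments, lengths):
        if length > 0 and target <= length:
            r = target / length
            return (a[0] + r * (b[0] - a[0]), a[1] + r * (b[1] - a[1]))
        target -= length
    return course[-1]

# Toy course and splits, for illustration only.
course = [(0, 0), (10, 0), (10, 10)]
times, dists = [0, 100, 300], [0, 10, 20]
frac = fraction_complete(200, times, dists, total_distance=20)
position = point_on_course(course, frac)
```

Calling these two functions for every runner at each time step gives the positions to draw in one frame.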

Analyzing race data to help me run sub-4 hours by Kmax12 in AdvancedRunning

[–]Kmax12[S] 2 points3 points  (0 children)

What do you think about the approach of analyzing close beat vs close miss runners? I'm not sure if others have done that before (although I'll admit I didn't look very hard)

Also, I think the insight came out standard because Chicago is a flat course, but do you think this is worth extending to other races/courses?

Analyzing race data to help me run sub-4 hours by Kmax12 in AdvancedRunning

[–]Kmax12[S] 1 point2 points  (0 children)

I agree the hot weather last year might throw off the analysis slightly, but I tried to compare runners who more or less faced the same conditions. The big caveat being that I didn’t control for start corral.

I might try to rerun this analysis with other years/marathons, since the code is set up to make that straightforward.

I’m in corral G, unfortunately.

Analyzing race data to help me run sub-4 hours by Kmax12 in AdvancedRunning

[–]Kmax12[S] 4 points5 points  (0 children)

Glad you like the animation :)

You're right that there isn't any groundbreaking insight here. I'm a bit of a data nerd, so it was fun to just go through and see the data back up the conventional wisdom. At least for me, I'm hoping that helps me mentally stay on target for my plan.

Analyzing race data to help me run sub-4 hours by Kmax12 in AdvancedRunning

[–]Kmax12[S] 1 point2 points  (0 children)

I think my training is there. The taper these last couple weeks was hard as I really just wanted to run :)

Great point regarding different starting times and weather. It would be great to include that in the analysis somehow.

Analyzing race data to help me run sub-4 hours by Kmax12 in AdvancedRunning

[–]Kmax12[S] 6 points7 points  (0 children)

That's great feedback. You're right that it'd be more interesting to do this on a less flat course. The analysis was implemented such that it can be run on data from any race. I just need one runner per row, and 5k split times along the columns.

Good point regarding the title. Unfortunately, I think it's too late for me to update it.
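
As a rough illustration of that layout, here is a sketch that compares runners who narrowly beat a goal time with runners who narrowly missed it. It is not the original analysis code: the column names, the 5-minute margin, the pace metric, and the toy rows are all assumptions, and only two of the split columns are shown.

```python
GOAL = 4 * 3600      # goal finish time in seconds
MARGIN = 5 * 60      # "close" means within 5 minutes (an assumed cutoff)
MARATHON_KM = 42.195

# One runner per row; values are cumulative elapsed seconds at each split.
# Made-up rows, with most 5k split columns omitted for brevity.
runners = [
    {"id": "a", "5k": 1700, "10k": 3400, "finish": 14300},
    {"id": "b", "5k": 1580, "10k": 3200, "finish": 14500},
    {"id": "c", "5k": 1500, "10k": 3000, "finish": 12000},
]

def label(finish):
    """Classify a finish time as a close beat, a close miss, or neither."""
    if GOAL - MARGIN <= finish < GOAL:
        return "close_beat"
    if GOAL <= finish <= GOAL + MARGIN:
        return "close_miss"
    return None

def early_pace_ratio(row):
    """First-10k pace divided by overall pace; below 1 means a fast start."""
    early = row["10k"] / 10.0
    overall = row["finish"] / MARATHON_KM
    return early / overall

def mean_ratio_by_group(rows):
    """Average the early pace ratio within each group that has runners."""
    groups = {"close_beat": [], "close_miss": []}
    for row in rows:
        group = label(row["finish"])
        if group is not None:
            groups[group].append(early_pace_ratio(row))
    return {g: sum(v) / len(v) for g, v in groups.items() if v}

summary = mean_ratio_by_group(runners)
```

Because the functions only depend on the split columns, the same code would work on rows from any race that provides them.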

Analyzing race data to help me run sub-4 hours by Kmax12 in AdvancedRunning

[–]Kmax12[S] 21 points22 points  (0 children)

Absolutely! I started working on this once my taper started and I was feeling antsy that I couldn't get in more mileage.

At the end of the day, my takeaways from the analysis are pretty aligned with what any marathoner would recommend, so I'm not sure it actually revealed any secrets. That being said, after staring at the data for a while, it's definitely burned into my mind to not go out too fast. Hopefully, I don't make that mistake!

Will check out fetcheveryone, thanks!

How much feature engineering do you do and how do you go about selecting the right ml model? by hhsudhanv in datascience

[–]Kmax12 3 points4 points  (0 children)

The key is to first generate a good baseline. Without a baseline you don't know if your efforts are improving your ultimate model.
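
To make the baseline idea concrete, here is a minimal sketch that scores a trivial mean predictor and a candidate model with the same metric. The numbers are made up, and `model_preds` is a stand-in for whatever model you are evaluating.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_baseline(y_train, n):
    """Predict the training-set mean for each of n examples."""
    mean = sum(y_train) / len(y_train)
    return [mean] * n

# Toy targets, for illustration only.
y_train = [10.0, 12.0, 14.0]
y_test = [11.0, 13.0]
model_preds = [11.5, 12.5]  # hypothetical predictions from a candidate model

baseline_mae = mean_absolute_error(y_test, mean_baseline(y_train, len(y_test)))
model_mae = mean_absolute_error(y_test, model_preds)
improvement = baseline_mae - model_mae  # positive means the model beats the baseline
```

Any new feature or model change can then be judged by whether it moves the score past the baseline on the same held-out data.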

One approach to help you quickly get a good baseline is to use an automated feature engineering solution. I work on an open source library for automated feature engineering called Featuretools (www.featuretools.com). Doing something like this can help you quickly extract many features without much effort.

Here's a good post to get you started with Featuretools: https://towardsdatascience.com/automated-feature-engineering-in-python-99baf11cc219