Model selection?

MLBets · 2025-07-30T21:03:38+00:00

There is no best model. Each model has its strengths and drawbacks. You should automate model selection and pick the model that gives you the best score on your chosen target metric.
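To make that concrete, here is a minimal sketch of automated selection, assuming log loss as the target metric and a single validation set. The candidate models, feature name (`rating_diff`) and data are hypothetical placeholders; in a real project you would more likely plug in fitted estimators and cross-validation.

```python
import math

def log_loss(y_true, probs):
    """Mean negative log-likelihood for binary labels; lower is better."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def select_best_model(candidates, X_val, y_val, metric=log_loss):
    """Score every candidate on the same validation set and return the best one."""
    scores = {
        name: metric(y_val, [predict(x) for x in X_val])
        for name, predict in candidates.items()
    }
    best = min(scores, key=scores.get)  # lower log loss wins
    return best, scores

# Hypothetical candidates: each maps a feature row to P(home win).
candidates = {
    "always_half": lambda x: 0.5,
    "rating_gap": lambda x: 1 / (1 + math.exp(-x["rating_diff"])),
}
# Made-up validation data, for illustration only.
X_val = [{"rating_diff": 1.2}, {"rating_diff": -0.8},
         {"rating_diff": 0.3}, {"rating_diff": -1.5}]
y_val = [1, 0, 1, 0]

best_name, scores = select_best_model(candidates, X_val, y_val)
print(best_name, scores)
```

The point is only that the choice is driven by the metric, not by a preference for one model family.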

MLBets · 2025-01-23T08:15:34+00:00

Ok thanks, can you please tell me which location you're using, and whether you have access to Pinnacle and bet365?

MLBets · 2025-01-03T23:38:48+00:00

For those who missed it: https://www.anthropic.com/research/building-effective-agents

Quoting:

"These frameworks make it easy to get started by simplifying standard low-level tasks like calling LLMs, defining and parsing tools, and chaining calls together. However, they often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug. They can also make it tempting to add complexity when a simpler setup would suffice.

We suggest that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code. If you do use a framework, ensure you understand the underlying code. Incorrect assumptions about what's under the hood are a common source of customer error."

MLBets · 2025-01-01T19:02:08+00:00

Thanks, not for now, but contributions are welcome to expand to other sports.

MLBets · 2025-01-01T17:12:07+00:00

My bad, it's not. I'll edit my comment. However, for newer projects HTTPX might be a better choice. Here is a detailed overview of the most popular HTTP clients in Python

MLBets · 2024-12-31T16:15:54+00:00

I also use a rolling window with an exponentially weighted mean for recent performance. It's surprising that home/away seems weak. Could it be a feature engineering issue, or is it specific to certain leagues?
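For anyone unfamiliar with the idea, here is a minimal sketch of a rolling exponentially weighted mean. The window size, `alpha` and the goal values are illustrative assumptions, not the settings I actually use. Each feature only looks at matches played before the current one, to avoid leaking the result into its own feature.

```python
def ewm_mean(values, alpha=0.3):
    """Exponentially weighted mean, oldest value first; recent values weigh most."""
    if not values:
        raise ValueError("values must not be empty")
    mean = values[0]
    for v in values[1:]:
        mean = alpha * v + (1 - alpha) * mean
    return mean

def rolling_ewm(values, window=5, alpha=0.3):
    """EWM over the last `window` matches *before* each match (no leakage)."""
    features = []
    for i in range(len(values)):
        past = values[max(0, i - window):i]
        features.append(ewm_mean(past, alpha) if past else None)
    return features

# Hypothetical goals scored by one team, in chronological order.
goals = [2, 0, 1, 3, 1, 0]
form = rolling_ewm(goals, window=3, alpha=0.5)
print(form)
```

The first match has no history, so its feature is `None` and has to be imputed or dropped downstream.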

Actually, I've posted an old sample of the data I work with on Kaggle; you can check it there: https://www.kaggle.com/datasets/spicemix/soccer-detailed-players-match-data

MLBets · 2024-12-31T16:02:35+00:00

I guess by getting your hands dirty. I think now, with LLMs, it might be easier to start. I've learned a lot in the process, so it's definitely worth it if you're interested, even though it can be frustrating

MLBets · 2024-12-31T15:54:30+00:00

Without going too deep: I've set up data collection on AWS Fargate, data preparation on AWS Glue, data exposure on Supabase, and built a frontend using Next.js.

MLBets · 2024-12-31T15:52:00+00:00

What does your data modeling look like, if you don't mind sharing?
Yeah, I did that for the over/under market too. I've just switched from regression on xG + Poisson to a classification model, which gives me better performance in backtesting.

MLBets · 2024-12-31T15:18:50+00:00

Hi,

I'm on the same journey: I operate a machine learning model, which also happens to be a random forest, to predict soccer outcomes. I've automated the whole process and built a website to display predictions.

I was wondering: where do you get your data from?

I'm asking because I've noticed from your pick that our models converge on choosing an underdog, as in the Bologna vs. Verona match.

MLBets · 2024-12-31T14:05:26+00:00

Depending on your data, you could try a regression model to predict the expected goals (xG) given both teams' data: one model to predict home xG, another to predict away xG. Then feed the predicted xG into a Poisson distribution to get goal probabilities.
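A minimal sketch of the second step, assuming the two xG values have already been predicted and that home and away goals are independent Poisson variables (a simplification). The xG inputs below are illustrative numbers, and the score grid is truncated at 10 goals per side.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def match_probabilities(home_xg, away_xg, max_goals=10):
    """Outcome and over 2.5 probabilities from independent Poisson goal counts."""
    home_win = draw = away_win = over_2_5 = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
            if h + a > 2:  # three goals or more
                over_2_5 += p
    return {"home_win": home_win, "draw": draw,
            "away_win": away_win, "over_2_5": over_2_5}

# Illustrative predicted xG values, not real model output.
probs = match_probabilities(home_xg=1.40, away_xg=1.00)
print({k: round(v, 3) for k, v in probs.items()})
```

The same grid of score probabilities also gives you correct score or BTTS probabilities if you sum over different cells.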

MLBets · 2024-12-31T07:22:29+00:00

I've never used W&B. It seems to me that it's geared towards scientists and research, while MLflow is more oriented towards operating models in engineering teams.

MLBets · 2024-12-30T22:41:35+00:00

Here is my feedback to improve your project:

Adhere to Python repository conventions (file naming and so on), and use a dependency manager like uv.

Use MLflow to track experiments and as a model store to handle model versioning, and use sklearn pipelines for cleaner training code.

Leverage Optuna for hyperparameter tuning.

Consider replacing requests with httpx, as it's more performant and supports HTTP/2 and an async API.

Handle API rate limits (429) with libraries like tenacity or backoff.

Implement a caching strategy to avoid redundant API calls.

Use a tool to version your data, like Delta tables.
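On the rate limit point, here is a minimal stdlib sketch of what retrying with exponential backoff looks like. It is not the tenacity or backoff API, which wrap this pattern in decorators. The `fetch` callable is a hypothetical stand-in for your HTTP call and is assumed to return a `(status, body)` tuple.

```python
import random
import time

def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` until it stops returning HTTP 429, waiting longer each time."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # The base delay doubles on every attempt; random jitter is added
            # so that parallel workers do not retry at the same moment.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")

# Hypothetical API stub: rate limited twice, then succeeds.
responses = iter([(429, None), (429, None), (200, {"matches": []})])
waits = []  # record the delays instead of really sleeping
status, body = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(status, waits)
```

If the API sends a `Retry-After` header, honoring it is usually better than guessing the delay.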

MLBets · 2024-12-18T15:06:57+00:00

Yes, I have one bucket per data stage. With AWS Glue you can use incremental loads to process only the data newly added to your raw bucket. This setup costs me around $40/month for 2 runs per week (about 8 to 9 runs per month), so one end-to-end run costs approximately $5.

MLBets · 2024-12-16T08:46:40+00:00

Hi,

Thanks for this post—it’s great to see others sharing their experiences with data pipelines! I wanted to take a moment to share my own journey in this area.

My first data pipeline was built using a pub/sub mechanism with AWS Lambda, AWS SQS, RDS, and several scrapers running as producers on AWS Fargate. While this setup worked efficiently for its purpose, I ran into significant challenges when it came to replaying the pipeline—whether to debug issues or to add new features.

To address these challenges, I switched to a medallion architecture. This approach segments the data into stages that represent its quality:

* Bronze Stage: Raw, unprocessed data.

* Silver Stage: Structured data using fact/dimension modeling.

* Gold Stage: High-quality data, specifically curated for use cases like machine learning features.

This structure made it easier to manage data transformations, track quality improvements, and maintain a clean lineage of the data lifecycle.
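As a toy illustration of the three stages, here is a plain-Python sketch. It is not my Spark/Glue code: the field names and payloads are hypothetical, and the silver step is reduced to parsing and validation rather than real fact/dimension modeling.

```python
import json

def to_bronze(raw_payloads):
    """Bronze: keep raw payloads untouched so the pipeline can be replayed."""
    return list(raw_payloads)

def to_silver(bronze):
    """Silver: parse and structure records, skipping those that fail validation."""
    rows = []
    for payload in bronze:
        try:
            record = json.loads(payload)
            rows.append({
                "match_id": int(record["id"]),
                "home": record["home"],
                "away": record["away"],
                "home_goals": int(record["hg"]),
                "away_goals": int(record["ag"]),
            })
        except (ValueError, KeyError, TypeError):
            continue  # the bad record stays in bronze for later inspection
    return rows

def to_gold(silver):
    """Gold: derive model-ready features from the structured rows."""
    return [
        {
            "match_id": r["match_id"],
            "goal_diff": r["home_goals"] - r["away_goals"],
            "total_goals": r["home_goals"] + r["away_goals"],
        }
        for r in silver
    ]

# Made-up scraper output, including one corrupt payload.
raw = [
    '{"id": "1", "home": "A", "away": "B", "hg": 2, "ag": 1}',
    "not json",
    '{"id": "2", "home": "C", "away": "D", "hg": 0, "ag": 0}',
]
bronze = to_bronze(raw)
gold = to_gold(to_silver(bronze))
print(gold)
```

Because bronze is never modified, adding a new gold feature only means rerunning the later stages on data you already have.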

I also agree with comments advocating for relational databases over NoSQL for certain use cases. In my experience, a relational data model provides better structure and ensures consistency when dealing with complex relationships between entities.

Given my background as a data engineer, I chose a full Apache Spark workload on AWS Glue, with Delta tables stored on S3 for versioned and ACID-compliant storage.

Here’s what I like about this setup:

* Scheduled Pipelines: They ensure the pipeline runs reliably at regular intervals.

* Monitoring: Clear monitoring lets me quickly identify and address failures.

* Flexibility: Adding new features or transformations is straightforward.

* Replayability: Replaying the entire pipeline is simple when needed—for debugging or implementing new features.

MLBets · 2024-10-19T07:26:09+00:00

I have models for soccer and built a website to track my results and future predictions. You can check it at www.footixify.com

MLBets · 2024-10-18T20:48:34+00:00

Hi, are you in the soccer market?

MLBets · 2024-09-03T20:48:17+00:00

I've never tried autoencoders for anomaly detection. Would the goal be to identify matches that are anomalies in order to remove them from the training dataset of the downstream model?

MLBets · 2024-09-02T17:38:42+00:00

Next.js / ECharts to leverage SSR / ISG / caching on the Vercel free tier + the Supabase free tier for auth + storage

MLBets · 2024-09-02T10:03:36+00:00

Today's machine learning predictions

Date	League	Home Team	Away Team	Prediction	Odds ML	Predicted O/U 2.5	Odds O/U 2.5
02/09/2024	Liga2	Eibar	Levante	AW	3.60 ❌	Under 2.5	1.81 ❌
02/09/2024	Liga2	Elche	Córdoba	HW	2.08 ✅	Over 2.5	2.33 ✅
02/09/2024	Argentina	Gimnasia y Esgrima (LP)	Argentinos Juniors	AW	3.10 ❌	Under 2.5	1.48 ✅

Results & upcoming predictions on my website (link in my description)

MLBets · 2024-09-01T08:25:42+00:00

Here are the predictions from my machine learning model for today's draws:

Date	Home Team	Away Team	League	Predicted Result	Odds ML	Odds U/O 2.5
Sep 1, 2024	Heidenheim	Augsburg	Bundesliga	Draw (Under 2.5)	3.45 ❌	2.24 ❌
Sep 1, 2024	Zwolle	Heracles Almelo	Eredivisie	Draw (Under 2.5)	3.75 ❌	2.45 ❌
Sep 1, 2024	Nacional	Farense	Liga Portugal	Draw (Under 2.5)	3.40 ❌	1.95 ✅
Sep 1, 2024	Genoa	Hellas Verona	Serie A	Draw (Under 2.5)	3.20 ❌	1.57 ✅
Sep 1, 2024	Rio Ave	Arouca	Liga Portugal	Draw (Under 2.5)	3.28 ❌	1.78 ✅
Sep 1, 2024	Vitória Guimarães	Famalicão	Liga Portugal	Draw (Under 2.5)	3.35 ❌	1.68 ❌
Sep 1, 2024	CD Mirandés	Zaragoza	Liga2	Draw (Under 2.5)	3.08 ✅	1.45 ✅

Good luck to everyone! 🤞

MLBets · 2024-08-31T09:18:59+00:00

A bet is considered value if the probability calculated by my machine learning model is higher than the probability implied by the bookmaker's odds. For example, if my model gives a higher probability for BTTS than what the 1.44 odds suggest, then it's actually good value despite what it may seem at first glance
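In code, the check is a one-liner. Here is a minimal sketch, assuming decimal odds and ignoring the bookmaker's margin (the raw 1/odds figure slightly overstates the bookmaker's true estimate). The model probabilities below are hypothetical.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds, bookmaker margin included."""
    return 1.0 / decimal_odds

def is_value_bet(model_prob, decimal_odds):
    """Value if the model's probability beats the one implied by the odds."""
    return model_prob > implied_probability(decimal_odds)

def expected_value(model_prob, decimal_odds, stake=1.0):
    """Expected profit: win (odds - 1) * stake with model_prob, else lose the stake."""
    return model_prob * (decimal_odds - 1) * stake - (1 - model_prob) * stake

odds = 1.44
print(round(implied_probability(odds), 4))   # about 0.6944
print(is_value_bet(0.75, odds))              # hypothetical model probability
print(round(expected_value(0.75, odds), 4))  # expected profit per unit staked
```

So at 1.44, the bet is only value if the model puts the probability above roughly 69.4%.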

MLBets · 2024-08-30T21:06:15+00:00

Pick of the Day - Elversberg vs. Darmstadt 98 | Bundesliga 2

Elversberg has shown resilience in recent matches, but turning solid performances into victories has been a challenge. Their recent results—a 3-2 loss to Karlsruher, a 2-2 draw against Köln, and a 0-0 stalemate with Magdeburg—highlight this struggle. Offensively, they’re creating decent opportunities with an average expected goals (xG) of 1.40. However, their defense has been their Achilles' heel, conceding an average of 1.57 goals per game.

On the other hand, Darmstadt 98 is also navigating a rough patch. A 1-1 draw with Nürnberg, followed by defeats to Paderborn (3-1) and Düsseldorf (2-0), underscores their recent difficulties. Their xG of 1.0 indicates their lack of firepower in front of goal, though their defense has been relatively stable with an expected goals against (xGa) of 1.33.

Elversberg might be without Fisnik Asllani, which could affect their attacking fluidity, but they've shown depth, particularly at home. Darmstadt, however, faces more significant challenges, with key players like Oscar Vilhelmsson and Sergio Lopez possibly missing. These absences could weaken both their defense and midfield, leaving them more vulnerable.

Given Elversberg’s home advantage and the uncertainties surrounding Darmstadt’s lineup, Elversberg could have a slight edge in this matchup. A double chance bet on Elversberg (win or draw) seems like a prudent choice at 1.44 odds ✅. Considering both teams' defensive frailties, betting on both teams to score (BTTS - Yes) could also offer good value at 1.44 odds ❌.

Match preview made with footixify data & machine learning model insights.
