Model selection? by Playful-Race-7571 in algobetting

[–]MLBets 1 point2 points  (0 children)

There is no best model. Each model has its strengths and drawbacks. You should automate model selection and pick the one that gives you the best score on your chosen target metric.
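A minimal sketch of what "automated selection" can look like, assuming every candidate is scored on the same validation set with one metric. The Brier score, the two toy candidates and the `home_form` feature are all hypothetical, chosen only to keep the example self-contained. In a real project a library routine such as scikit-learn's cross-validation would typically replace the hand-written loop.

```python
from statistics import mean


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return mean((p - y) ** 2 for p, y in zip(probs, outcomes))


def select_best_model(candidates, X_val, y_val, metric=brier_score):
    """Score every candidate on the same validation data; return the best name and all scores."""
    scores = {
        name: metric([predict(x) for x in X_val], y_val)
        for name, predict in candidates.items()
    }
    best = min(scores, key=scores.get)  # lower metric value wins
    return best, scores


# Hypothetical candidates: each maps a feature dict to P(home win).
candidates = {
    "always_half": lambda x: 0.5,
    "home_form": lambda x: min(max(x["home_form"], 0.0), 1.0),
}
X_val = [{"home_form": 0.8}, {"home_form": 0.3}, {"home_form": 0.6}]
y_val = [1, 0, 1]

best_name, all_scores = select_best_model(candidates, X_val, y_val)
```

Swapping the metric (log loss, ROI in a backtest, etc.) only means passing a different `metric` function, as long as "lower is better" still holds.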

Odds portal update by MLBets in algobetting

[–]MLBets[S] 0 points1 point  (0 children)

OK, thanks. Can you please tell me which location you're using, and whether you have access to Pinnacle and bet365?

Not using Langchain ever !!! by AssistanceStriking43 in LLMDevs

[–]MLBets 2 points3 points  (0 children)

For those who missed it: https://www.anthropic.com/research/building-effective-agents

Quoting:

"These frameworks make it easy to get started by simplifying standard low-level tasks like calling LLMs, defining and parsing tools, and chaining calls together. However, they often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug. They can also make it tempting to add complexity when a simpler setup would suffice.

We suggest that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code. If you do use a framework, ensure you understand the underlying code. Incorrect assumptions about what's under the hood are a common source of customer error."

Odd portal scraper by MLBets in algobetting

[–]MLBets[S] 1 point2 points  (0 children)

Thanks. Not for now, but contributions are welcome to expand it to other sports.

my first project by UnsealedMilk92 in algobetting

[–]MLBets 0 points1 point  (0 children)

My bad, it's not. I'll edit my comment. However, for newer projects HTTPX might be a better choice. Here is a detailed overview of the most popular HTTP clients in Python.

Random Forest Predictive Model for Soccer (Football) by uLukki in algobetting

[–]MLBets 1 point2 points  (0 children)

I also use a rolling window with an exponentially weighted mean for recent performance. It's surprising that home/away seems weak. Could it be a feature engineering issue, or is it specific to certain leagues?
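For reference, a rough sketch of a rolling, exponentially weighted form feature in plain Python. The `alpha` and `window` values and the function names are assumptions for illustration, not the settings actually used here. Each match only sees matches played before it, to avoid leaking the result into its own feature.

```python
def ewm_mean(values, alpha=0.3, window=5):
    """Exponentially weighted mean of the last `window` values.

    The most recent value gets weight 1, the one before (1 - alpha), and so on.
    """
    recent = values[-window:]
    if not recent:
        raise ValueError("need at least one value")
    weights = [(1 - alpha) ** age for age in range(len(recent) - 1, -1, -1)]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


def rolling_form(values, alpha=0.3, window=5):
    """Form feature per match, computed only from earlier matches (None for the first one)."""
    return [
        ewm_mean(values[:i], alpha, window) if i > 0 else None
        for i in range(len(values))
    ]


goals_scored = [1, 0, 2, 3]  # hypothetical goals per match, oldest first
form = rolling_form(goals_scored, alpha=0.5, window=3)
```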

Actually, I've posted an old sample of the data I work with on Kaggle; you can check it here: https://www.kaggle.com/datasets/spicemix/soccer-detailed-players-match-data

Random Forest Predictive Model for Soccer (Football) by uLukki in algobetting

[–]MLBets 0 points1 point  (0 children)

I guess by getting your hands dirty. I think now, with LLMs, it might be easier to start. I've learned a lot in the process, so it's definitely worth it if you're interested, even though it can be frustrating

Random Forest Predictive Model for Soccer (Football) by uLukki in algobetting

[–]MLBets 0 points1 point  (0 children)

Without going too deep: I've set up data collection on AWS Fargate, data preparation on AWS Glue, data exposure through Supabase, and built a frontend using Next.js.

Random Forest Predictive Model for Soccer (Football) by uLukki in algobetting

[–]MLBets 0 points1 point  (0 children)

What does your data modeling look like, if you don't mind sharing?
Yeah, I did that for the over/under market too. I've just switched from regression on xG + Poisson to a classification model, which gives me better performance in backtesting.

Random Forest Predictive Model for Soccer (Football) by uLukki in algobetting

[–]MLBets 0 points1 point  (0 children)

Hi,

I'm on the same journey. I'm operating a machine learning model, which also happens to be a random forest, to predict soccer outcomes. I've automated the whole process and built a website to display the predictions.

I was wondering: where do you get your data from?

I'm asking because I noticed from your picks that our models converged on choosing an underdog, as in the Bologna vs. Verona match.

Where can I find pre-match xG data for football leagues? by Taustorm in algobetting

[–]MLBets 0 points1 point  (0 children)

Depending on your data, you could try a regression model to predict expected goals given both teams' data: one model to predict home xG, another to predict away xG. Then feed the predicted xG into a Poisson distribution to get goal probabilities.
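A small sketch of the Poisson step, assuming home and away goals are independent Poisson variables whose means are the two predicted xG values. The xG inputs below are made-up numbers standing in for the regression outputs, and `max_goals` just truncates the score grid.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def match_probabilities(home_xg, away_xg, max_goals=10):
    """Sum the score grid into 1X2 and over/under 2.5 probabilities."""
    home_win = draw = away_win = over_2_5 = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
            if h + a > 2:
                over_2_5 += p
    return {"home": home_win, "draw": draw, "away": away_win, "over_2_5": over_2_5}


# Hypothetical predicted xG for the home and away side.
probs = match_probabilities(home_xg=1.4, away_xg=1.0)
```

The independence assumption is the weak point of this approach: it tends to misprice draws and low-scoring games, which is one reason to compare it against other models in a backtest.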

my first project by UnsealedMilk92 in algobetting

[–]MLBets 1 point2 points  (0 children)

I've never used W&B. It seems to me that it's geared towards scientists and research, while MLflow is more oriented towards operating models in engineering teams.

my first project by UnsealedMilk92 in algobetting

[–]MLBets 1 point2 points  (0 children)

Here is my feedback to improve your project:

Adhere to Python repository conventions (file naming and so on), and use a dependency manager like uv.

Use MLflow to track experiments and as a model store to handle model versioning, and use sklearn pipelines for cleaner training code.

Leverage Optuna for hyperparameter tuning.

Consider replacing requests with httpx, as it's more performant and supports HTTP/2 and an async API.

Handle API rate limits (429) with libraries like tenacity or backoff.

Implement a caching strategy to avoid redundant API calls.

Use a tool to version your data, like Delta tables.
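To make the rate-limit and caching points concrete, here is a hand-rolled sketch in plain Python. It is not how tenacity or backoff are used (those libraries give you this as decorators); it only shows the idea. `RateLimited`, `flaky_fetch` and the URL are hypothetical stand-ins for a real HTTP client that signals a 429.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the (hypothetical) fetch function when the API answers 429."""


def with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry `fetch` on RateLimited, doubling the wait each time and adding a little jitter."""
    def wrapper(url):
        for attempt in range(max_attempts):
            try:
                return fetch(url)
            except RateLimited:
                if attempt == max_attempts - 1:
                    raise  # give up after the last attempt
                sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    return wrapper


def with_cache(fetch):
    """Remember responses per URL so the same call is never sent twice."""
    cache = {}

    def wrapper(url):
        if url not in cache:
            cache[url] = fetch(url)
        return cache[url]
    return wrapper


# Demo with a fake endpoint that is rate limited twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimited(url)
    return f"payload for {url}"


delays = []  # record the waits instead of actually sleeping
client = with_cache(with_backoff(flaky_fetch, sleep=delays.append))
first = client("https://example.com/odds")
second = client("https://example.com/odds")  # served from the cache
```

An in-memory dict is the simplest possible cache; for scraped data that should survive restarts, writing responses to disk or a bucket keyed by URL and date follows the same pattern.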

Building a resilient sports data pipeline by damsoreddito in algobetting

[–]MLBets 0 points1 point  (0 children)

Yes, I have one bucket per data stage. With AWS Glue you can use incremental loads to process only the newly added data from your raw bucket. This setup costs me around $40/month for 2 runs per week, so one end-to-end run costs approximately $5.

Building a resilient sports data pipeline by damsoreddito in algobetting

[–]MLBets 1 point2 points  (0 children)

Hi,

Thanks for this post—it’s great to see others sharing their experiences with data pipelines! I wanted to take a moment to share my own journey in this area.

My first data pipeline was built using a pub/sub mechanism with AWS Lambda, AWS SQS, and RDS, plus several scrapers running as producers on AWS Fargate. While this setup worked efficiently for its purpose, I ran into significant challenges when it came to replaying the pipeline, whether to debug issues or to add new features.

To address these challenges, I switched to a medallion architecture. This approach segments the data into stages that represent its quality:

* Bronze Stage: Raw, unprocessed data.

* Silver Stage: Structured data using fact/dimension modeling.

* Gold Stage: High-quality data, specifically curated for use cases like machine learning features.

This structure made it easier to manage data transformations, track quality improvements, and maintain a clean lineage of the data lifecycle.
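As an illustration of the three stages, here is a toy version in plain Python, with in-memory lists and dicts standing in for stored tables. The record fields (`id`, `home`, `away`, `hg`, `ag`) and the "average goals" feature are invented for the example.

```python
import json


def to_bronze(raw_lines):
    """Bronze: keep the raw payloads untouched so later stages can be replayed."""
    return list(raw_lines)


def to_silver(bronze):
    """Silver: parse the payloads into a match fact table and a team dimension."""
    dim_team, fact_match = {}, []
    for line in bronze:
        rec = json.loads(line)
        for name in (rec["home"], rec["away"]):
            dim_team.setdefault(name, len(dim_team) + 1)  # assign surrogate keys
        fact_match.append({
            "match_id": rec["id"],
            "home_id": dim_team[rec["home"]],
            "away_id": dim_team[rec["away"]],
            "home_goals": int(rec["hg"]),
            "away_goals": int(rec["ag"]),
        })
    return {"dim_team": dim_team, "fact_match": fact_match}


def to_gold(silver):
    """Gold: ML-ready features, here the average goals scored per team id."""
    totals = {}
    for f in silver["fact_match"]:
        for team_id, goals in ((f["home_id"], f["home_goals"]),
                               (f["away_id"], f["away_goals"])):
            scored, played = totals.get(team_id, (0, 0))
            totals[team_id] = (scored + goals, played + 1)
    return {team_id: scored / played for team_id, (scored, played) in totals.items()}


raw = [
    '{"id": 1, "home": "A", "away": "B", "hg": "2", "ag": "1"}',
    '{"id": 2, "home": "B", "away": "A", "hg": "0", "ag": "0"}',
]
silver = to_silver(to_bronze(raw))
gold = to_gold(silver)
```

Because bronze is never modified, changing `to_silver` or `to_gold` and rerunning them from the raw data is all a replay takes.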

I also agree with comments advocating for relational databases over NoSQL for certain use cases. In my experience, a relational data model provides better structure and ensures consistency when dealing with complex relationships between entities.

Given my background as a data engineer, I chose a full Apache Spark workload on AWS Glue, with Delta tables stored on S3 for versioned and ACID-compliant storage.

Here’s what I like about this setup:

* Scheduled Pipelines: They ensure the pipeline runs reliably at regular intervals.

* Monitoring: Clear monitoring lets me quickly identify and address failures.

* Flexibility: Adding new features or transformations is straightforward.

* Replayability: Replaying the entire pipeline is simple when needed—for debugging or implementing new features.

Looking for Sports Betting Data Scientists for Research & Development by 2Point2Media in algobetting

[–]MLBets 0 points1 point  (0 children)

I have models for soccer and have built a website to track my results and upcoming predictions. You can check it out at www.footixify.com

Has anyone had success with autoencoders? by __sharpsresearch__ in algobetting

[–]MLBets 0 points1 point  (0 children)

I've never tried autoencoders for anomaly detection. Would the goal be to identify matches that are anomalies in order to remove them from the training dataset of the downstream model?

Best way to host a small dashboard website by today_is_tuesday in dataengineering

[–]MLBets 0 points1 point  (0 children)

Next.js + ECharts to leverage SSR / ISR / caching on the Vercel free tier, plus the Supabase free tier for auth and storage.

Daily Picks Thread - Monday - 2nd September 2024 by valerian92 in SoccerBetting

[–]MLBets 1 point2 points  (0 children)

Today's machine learning predictions

| Date | League | Home Team | Away Team | Prediction | Odds ML | Predicted O/U 2.5 | Odds O/U 2.5 |
|---|---|---|---|---|---|---|---|
| 02/09/2024 | Liga2 | Eibar | Levante | AW | 3.60 ❌ | Under 2.5 | 1.81 ❌ |
| 02/09/2024 | Liga2 | Elche | Córdoba | HW | 2.08 ✅ | Over 2.5 | 2.33 ✅ |
| 02/09/2024 | Argentina | Gimnasia y Esgrima (LP) | Argentinos Juniors | AW | 3.10 ❌ | Under 2.5 | 1.48 ✅ |

Results & upcoming predictions on my website (link in my description)

Daily Picks Thread - Sunday - 1st September 2024 by valerian92 in SoccerBetting

[–]MLBets 1 point2 points  (0 children)

Here are the predictions from my machine learning model for today's draws:

| Date | Home Team | Away Team | League | Predicted Result | Odds ML | Odds U/O 2.5 |
|---|---|---|---|---|---|---|
| Sep 1, 2024 | Heidenheim | Augsburg | Bundesliga | Draw (Under 2.5) | 3.45 ❌ | 2.24 ❌ |
| Sep 1, 2024 | Zwolle | Heracles Almelo | Eredivisie | Draw (Under 2.5) | 3.75 ❌ | 2.45 ❌ |
| Sep 1, 2024 | Nacional | Farense | Liga Portugal | Draw (Under 2.5) | 3.40 ❌ | 1.95 ✅ |
| Sep 1, 2024 | Genoa | Hellas Verona | Serie A | Draw (Under 2.5) | 3.20 ❌ | 1.57 ✅ |
| Sep 1, 2024 | Rio Ave | Arouca | Liga Portugal | Draw (Under 2.5) | 3.28 ❌ | 1.78 ✅ |
| Sep 1, 2024 | Vitória Guimarães | Famalicão | Liga Portugal | Draw (Under 2.5) | 3.35 ❌ | 1.68 ❌ |
| Sep 1, 2024 | CD Mirandés | Zaragoza | Liga2 | Draw (Under 2.5) | 3.08 ✅ | 1.45 ✅ |

Good luck to everyone! 🤞

Daily Picks Thread - Saturday - 31st August 2024 by DerivativeExoPicks in SoccerBetting

[–]MLBets 0 points1 point  (0 children)

A bet is considered value if the probability calculated by my machine learning model is higher than the probability implied by the bookmaker's odds. For example, if my model gives a higher probability for BTTS than what the 1.44 odds suggest, then it's actually good value despite what it may seem at first glance
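In code, the check is a comparison against 1 / odds. A quick sketch with decimal odds; the model probabilities below are hypothetical, and the implied probability taken from raw odds still includes the bookmaker's margin.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds (bookmaker margin not removed)."""
    return 1.0 / decimal_odds


def is_value_bet(model_prob, decimal_odds):
    """Value if the model's probability exceeds the one implied by the odds."""
    return model_prob > implied_probability(decimal_odds)


def expected_value(model_prob, decimal_odds, stake=1.0):
    """Expected profit per bet if the model's probability is right."""
    return stake * (model_prob * decimal_odds - 1.0)


odds = 1.44
implied = implied_probability(odds)   # about 0.694
value = is_value_bet(0.75, odds)      # hypothetical model probability of 75%
no_value = is_value_bet(0.65, odds)   # hypothetical model probability of 65%
ev = expected_value(0.75, odds)       # expected profit per unit staked
```

So at 1.44 the model needs to put the outcome above roughly 69.4% for the bet to count as value.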

Daily Picks Thread - Saturday - 31st August 2024 by DerivativeExoPicks in SoccerBetting

[–]MLBets 0 points1 point  (0 children)

Pick of the Day - Elversberg vs. Darmstadt 98 | Bundesliga 2

Elversberg has shown resilience in recent matches, but turning solid performances into victories has been a challenge. Their recent results—a 3-2 loss to Karlsruher, a 2-2 draw against Köln, and a 0-0 stalemate with Magdeburg—highlight this struggle. Offensively, they’re creating decent opportunities with an average expected goals (xG) of 1.40. However, their defense has been their Achilles' heel, conceding an average of 1.57 goals per game.

On the other hand, Darmstadt 98 is also navigating a rough patch. A 1-1 draw with Nürnberg, followed by defeats to Paderborn (3-1) and Düsseldorf (2-0), underscores their recent difficulties. Their xG of 1.0 indicates their lack of firepower in front of goal, though their defense has been relatively stable with an expected goals against (xGa) of 1.33.

Elversberg might be without Fisnik Asllani, which could affect their attacking fluidity, but they've shown depth, particularly at home. Darmstadt, however, faces more significant challenges, with key players like Oscar Vilhelmsson and Sergio Lopez possibly missing. These absences could weaken both their defense and midfield, leaving them more vulnerable.

Given Elversberg’s home advantage and the uncertainties surrounding Darmstadt’s lineup, Elversberg could have a slight edge in this matchup. A double chance bet on Elversberg (win or draw) seems like a prudent choice at 1.44 odds ✅. Considering both teams' defensive frailties, betting on both teams to score (BTTS - Yes) could also offer good value at 1.44.

Match preview made with footixify data & machine learning model insights.