Description
The online sports gambling industry employs teams of data analysts to build forecast models that turn the odds at sports games in their favour. While several betting strategies have been proposed to beat bookmakers, from expert prediction models and arbitrage strategies to odds bias exploitation, their returns have been inconsistent and it remains to be shown that a betting strategy can outperform the online sports betting market. We designed a strategy to beat football bookmakers with their own numbers:
"Beating the bookies with their own numbers - and how the online sports betting market is rigged", by Lisandro Kaunitz, Shenjun Zhong and Javier Kreiner.
Here, we make the full dataset publicly available to the Kaggle community. We also provide the code, the raw SQL database and the online real-time dashboard used for our study on GitHub.
Our strategy proved profitable in a 10-year historical simulation using closing odds, a 6-month historical simulation using minute-to-minute odds, and a 5-month period during which we staked real money with the bookmakers. We would like to challenge the Kaggle community to improve our results:
Can your strategy consistently beat the sports betting market over thousands of bets across leagues around the world?
Do time series odds movements offer insightful information that a betting strategy can exploit?
Can you outperform the bookmakers’ predictions included in the odds data by creating a better model?
What's inside the Beat The Bookie dataset
10-year historical closing odds:
479,440 football games from 818 leagues around the world
Games from 2005-01-01 to 2015-07-30
Maximum, average and count of active odds at closing time (start of the match)
Betting odds from up to 32 providers
Details about the match: date and time, league, teams, 90-minute score
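As a starting point for working with the maximum and average closing odds, the sketch below converts decimal odds for the three match outcomes (home win, draw, away win) into implied probabilities and measures the bookmaker margin. It is only an illustration: it assumes the odds are in decimal format, the function names are our own, and the example odds are made-up values rather than rows from the dataset.

```python
def implied_probabilities(odds):
    """Convert decimal odds to raw implied probabilities (1 / odds)."""
    return [1.0 / o for o in odds]


def overround(odds):
    """Bookmaker margin: how far the raw implied probabilities exceed 1."""
    return sum(implied_probabilities(odds)) - 1.0


def normalized_probabilities(odds):
    """Rescale the raw implied probabilities so that they sum to 1."""
    raw = implied_probabilities(odds)
    total = sum(raw)
    return [p / total for p in raw]


# Hypothetical average closing odds for home win, draw, away win.
avg_odds = [2.10, 3.30, 3.60]

margin = overround(avg_odds)
probs = normalized_probabilities(avg_odds)
print(f"margin: {margin:.4f}")
print("normalized probabilities:", [round(p, 4) for p in probs])
```

Dividing by the total is the simplest way to remove the margin; other normalization schemes exist and may suit a given model better.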
14-month time series odds:
92,647 football games from 1,005 leagues around the world
Games from 2015-09-01 to 2016-11-22
Hourly sampled odds time series from up to 32 bookmakers, starting 72 hours before the start of each game
Details about the match: date and time, league, teams, 90-minute score
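One simple way to start exploring the hourly series is to measure how far the odds for an outcome moved between the first and last available samples. The sketch below is a hypothetical illustration: it assumes one bookmaker's odds for one outcome are held in a plain Python list ordered from earliest to latest, with missing samples represented as None. The actual file layout may differ, and the function name and example values are our own.

```python
def odds_drift(series):
    """Relative change between the first and last available odds.

    `series` is a list of decimal odds ordered in time; None marks a
    missing sample. Returns None when fewer than two samples exist.
    """
    available = [o for o in series if o is not None]
    if len(available) < 2:
        return None
    return available[-1] / available[0] - 1.0


# Made-up hourly odds for a home win, earliest sample first.
home_win_series = [2.0, None, 2.2, 2.4, 2.5]

print(odds_drift(home_win_series))
```

A positive drift means the odds lengthened over the sampled window (the outcome was priced as less likely); a negative drift means they shortened.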
The dataset was assembled over months of scraping online sports portals.
We hope you enjoy your sports betting simulations (but remember… the house always wins in the end).