I have been meaning to write this post for a long time. One day, while I was playing with the capabilities of the Apache Spark MLlib library, I came up with an idea …
What if it were possible to predict which team would win a football match? And so it began …
I could already see in my mind's eye the millions that could be made at the bookmakers 🙂 Well, to the point.
When I started thinking about it more deeply, I concluded that predicting the result before kick-off was not enough: it mattered more to me to know how each team's chances change during the match.
In addition, I wanted this presented graphically, e.g. as a chart. Then came another idea: to create a web application that shows the match schedule for several leagues, marking "today's matches" and the matches currently in progress.
For matches in progress, it would show the changes in each team's chances in real time. That way I could follow how each team's trend changes. And that is how I found myself work for a few months (about 3). 🙂
From the moment I started designing the architecture, I wanted the individual elements not to be rigidly coupled to each other, so I approached the problem in a modular way. This is how separate applications began to appear, each responsible for only one thing:
- Downloading data from the source (historical and during the match) – Java
- Data cleaning, normalization and enrichment – Java
- Predicting results (Machine Learning) – Spark + Scala
- Web application – HTML, CSS, PHP, JS, Bootstrap
I should point out right away that I am not a web developer, so I did not take the web application part too seriously: it was supposed to work and that's it.
As the number of applications grew, I needed a convenient way to schedule the launch of individual applications and to inspect their logs. This is where Apache Airflow came to the rescue, and it worked very well.
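To illustrate the idea of running loosely coupled applications in dependency order, here is a minimal sketch in plain Python. It is not Airflow and not the configuration used in this project: the stage names, the `PIPELINE` mapping and the placeholder tasks are all assumptions made up for the example, and the ordering relies on the standard library's `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stage names mirroring the four applications listed above.
# Each key maps to the set of stages that must finish before it can start.
PIPELINE = {
    "download": set(),
    "clean": {"download"},
    "predict": {"clean"},
    "web": {"predict"},
}


def run_order(dependencies):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies, tasks):
    """Run each stage in order and collect one log line per stage."""
    log = []
    for stage in run_order(dependencies):
        result = tasks[stage]()
        log.append(f"{stage}: {result}")
    return log


# Placeholder tasks (assumption): in a real setup each one would launch
# a separate application instead of returning a string.
TASKS = {name: (lambda name=name: "ok") for name in PIPELINE}

if __name__ == "__main__":
    for line in run_pipeline(PIPELINE, TASKS):
        print(line)
```

A scheduler such as Airflow adds what this sketch leaves out: timed runs, retries and a place to browse the logs of each task.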
Machine Learning (ML)
The result of a football match may be a win for either team or a draw. Something obvious! So we have 3 possible outcomes. The classification algorithm that I implemented focused on calculating the probability of each of these outcomes at a given moment of the match. It took over 100 features into account to calculate these probabilities.
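To make "a probability for each of three outcomes" concrete, here is a minimal sketch of a softmax (multinomial logistic) scorer in plain Python. This post does not say which classifier was actually used, so treat this as one possible approach rather than the model from the project. The three features, the weights and the biases are invented purely for illustration.

```python
import math

OUTCOMES = ("home_win", "draw", "away_win")


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def outcome_probabilities(features, weights, biases):
    """Score each outcome as a weighted sum of features, then normalize."""
    scores = [
        sum(w * x for w, x in zip(weights[outcome], features)) + biases[outcome]
        for outcome in OUTCOMES
    ]
    return dict(zip(OUTCOMES, softmax(scores)))


# Hypothetical features at one moment of a match (assumption):
# goal difference, shots-on-target difference, fraction of the match played.
features = [1.0, 2.0, 0.5]

# Made-up weights and biases; a trained model would learn these from data.
weights = {
    "home_win": [1.2, 0.3, 0.4],
    "draw": [0.0, 0.0, 0.0],
    "away_win": [-1.2, -0.3, -0.4],
}
biases = {"home_win": 0.0, "draw": 0.0, "away_win": 0.0}

probs = outcome_probabilities(features, weights, biases)

if __name__ == "__main__":
    for outcome, p in probs.items():
        print(f"{outcome}: {p:.3f}")
```

Recomputing the probabilities every time the features change (a goal, a card, a shot) gives one point per outcome per moment, which is the kind of series a live chart can plot.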
Once I had put together the whole application, split into microservices, and launched it, it was time for testing. I watched selected matches live, following the situation on the pitch while keeping an eye on my artificial intelligence model. It was great fun to look at the chart and see that it largely reflected the actual state of play on the pitch.
Below are some selected matches, along with an analysis of how the match unfolded and which factors influenced it.
Match 1: Ajax Amsterdam [3 – 3] Bayern Munich (12 Dec 2018)
Bayern Munich was the clear favorite. But Ajax showed fighting spirit and skill on the pitch, and the 3:3 draw was a fair reflection of their good play.
Course of the match:
- 13′ – goal for Bayern (Robert Lewandowski) – the chart shows that from this minute the visitors' chances increased by several percent.
- around 35′ – after two yellow cards for Bayern and a weaker spell in their play, Ajax's chances increased. The upward trend continued for a long time.
- 61′ – Ajax goal to make it 1:1 – from this point on you can see that Ajax took the initiative and did not stop there.
- between 61′ and 87′ – the chart shows Ajax steadily increasing its chances of winning the match, which it confirmed by scoring a second goal to make it 2:1 in the 84th minute. The chart shows that the ML algorithm predicted the steady rise in the hosts' chances well.
- between 87′ and 90′ – the home team collapses after two quick goals from Bayern and it is already 2:3 for the visitors. But Ajax does not give up: as you can see, a draw remained very likely (green line).
- 95′ – in stoppage time Ajax equalizes to make it 3:3.
Match 2: Borussia Dortmund [2 – 1] Borussia M’gladbach (21 Dec 2018)
In general, Borussia Dortmund played a weak match, but the winner is not always the team that creates more chances and is statistically better; it is the team that converts them. Still, the chart makes it easy to see the better and worse spells of both teams.
Course of the match:
- 43′ – goal for Borussia Dortmund
- 45′ – goal for Borussia M’gladbach
- 54′ – second goal for Borussia Dortmund (Reus)
After Reus' goal, Borussia Dortmund rested on its laurels and its game became static. Meanwhile, the visitors' desire to win what had become an even contest kept growing. Interestingly, the algorithm "stated" that Borussia M'gladbach had the same chance of winning the match as the hosts. All they lacked was time and luck.
Match 3: Lazio [1 – 2] Eintracht Frankfurt (13 Dec 2018)
Eintracht Frankfurt was the favorite from the first minutes, yet Lazio scored the first goal, in the 56th minute. Eintracht equalized in the 65th minute and then took the lead in the 71st.
Course of the match:
- 50′ – yellow card for Luis Alberto (Lazio)
- 56′ – first goal for Lazio
- 65′ and 71′ – two goals for Eintracht Frankfurt
Unfortunately, the cost of maintaining the servers and the subscription fees for the match data provider exceeded the budget I had planned. To break through the competition today, you need time and money to keep things running.
The second point is that if I were to take up this topic again today, I would approach it differently. In some cases I would use other technologies 🙂