Pandascore is the world’s AI-powered hub of esports data, whose odds feed is the main product serving the esports betting industry.
We use artificial intelligence to process and model more than 300 data points per second to empower our dedicated esports traders with the most up to date information in real-time.
First-person shooter Counter-Strike: Global Offensive (CS:GO) is one of the big 3 esports titles for Western audiences, along with MOBAs League of Legends and Dota 2. When it comes to esports betting, CS:GO is far and away the most popular for punters. The biggest tournaments in the world regularly draw in betting turnover in the tens of millions of euros.
Mapping and modeling professional Counter-Strike as accurately as possible is essential to providing the best statistics and odds available on the market.
Counter-Strike has been an esports staple for over 20 years, with established tournaments featuring teams from every corner of the globe competing year-round and a growing Players’ Association to boot. The game itself boasts almost 35 million players and roughly 53m hours watched on Twitch every month.
To break the game down in simple terms:
- 2 teams of 5 players face each other in a 30 round matchup
- Each round is 1 minute and 55 seconds, with either team having specific win conditions
- For the T-side, they can win a round by either eliminating every member of the opposing team OR by planting a bomb in a designated zone, which will detonate after 40 seconds
- For the CT-side, they can win a round by eliminating all members of the T-side, or defusing the bomb after it has been planted
- After playing 15 rounds, the two teams will switch sides from T-side to CT-side and vice versa
- The first side to win 16 rounds wins the match
There is also overtime to account for:
- A team must win by 2 rounds, otherwise the game will go into overtime after the 30 rounds are completed
- In overtime, the two teams will need to win in a best-of-6 rounds format. If the teams are tied at the end of overtime, another round of overtime will start until there is a victor
In the competitive scene, teams will compete over multiple matches and maps in a best-of-3 or best-of-5 series. As you can imagine, there is a ton of data to process in any CS:GO esports match.
We need to have access to a lot of predictions (called markets in the betting industry) regarding CS:GO matches such as:
- Who is going to win the match?
- Who is going to have 10 rounds first?
- What will be the round difference at the end of the match?
- How many rounds will be played in the match?
It is not possible to have one model per prediction since it could potentially lead to contradictory predictions easily exploitable by bettors. We need to model the round dynamic of CS:GO all at once. This means that we need to have access to probabilities to reach any given intermediate and final round score.
Since our model will run also during live for in-play betting, the model also needs to be able to infer predictions fast — typically a few dozens of milliseconds. This motivated us to try analytical-based solutions for inference instead of simulation-based ones.
For the model we are describing in this blog post, we are going to make the following assumption:
- The probability for a team to win a round only depends on the outcome of the previous round
Anyone knowing a bit about the game CS:GO could argue that this assumption is too naïve for the problem we are trying to solve. Indeed, it is commonly accepted that the best feature in CS:GO to predict the outcome of the coming round in live is the economy which has its own round dynamic. However, we found it interesting to check how a simple model like the one we are going to present would perform.
Binomial distribution with Markov states
A 2 states Markov Chain Binomial (MB in this post) can be seen as an extension of a binomial distribution in which the probability for the nth trial to be a success or a failure depends on the outcome of the previous trial.
Let’s walk through the equations while keeping in mind CS:GO.
More formally, we can define:
- The probability for team A to win (j=1) or lose (j=0) the round m+1 round given they won (i=1) or lost (i=0) round m
- Alongside a transition matrix P representation
Following this paper, we can rewrite P
- p being the probability of success of the first trial (first pistol round in CS:GO)
- q = 1-p being the probability of failure of the first trial
- delta can be interpreted as the first-order autocorrelation between 2 rounds
The details to go from the first matrix to the second one is well explained in the paper mentioned above.
The nth transition probability matrix is then equal to:
Only 2 parameters need to be estimated to fit such model to some data: p and delta.
Again following the paper, the likelihood of a sequence of trials parametrized by N1, N0 and Nij (i,j = 0|1) following a MB model can be expressed as:
N1 is a boolean equal to 1 if the first round was won by team A.
N0 is a boolean equal to 1 if the first round was lost team A.
Nij denotes the total number of transitions from state i to state j within the sequence.
Let’s change the notations from the paper for the application to CS:GO. We will call i the outcome of the first round, j the outcome of the last round, n the number of rounds, k the number of rounds won, l the number of pairs of successive round wins.
We then have the following equalities:
- N1 + N01 is equal to the number of “individual successes”:
- N0 + N10 is equal to the number of “individual failures”:
- N11 is equal to the number of pairs of consecutive successes:
- N00 is equal to the number of pairs consecutive failures:
- N01 + N10 is the number of times the state (win, lose) has changed in the sequence:
We won’t take a detailed dive into the calculus that leads to the above equations, but you can easily check that they are correct on the specific example shown below!
The probability to observe a path with n rounds, k wins, l pairs of consecutive wins given that the first pistol round is i and the last round is j, under an MB model parametrized with p and delta can now be written as:
i, j, n, k, l together define a sequence pattern, and we therefore consider all sequences that have the same structure equal in terms of probabilities. To compute the probability of having a given sequence pattern, we have to multiply the likelihood of a certain pattern to occur with the number of possible patterns, given by Ѱ.
Following Lemma 2 of Rudolfer (1990) “A Markov chain model of extrabinomial variation”.
The picture above shows one counting example of the sequence patterns of a MB model.
We can then write the probability for team A to reach k rounds in n rounds, with l pairs of consecutive rounds given than the outcome of the first round is i and the last is j:
To get the probability of team A to win, we just need to marginalise over n, l and i. Indeed, if team A wins, they will win the last round so j = 1. However there is something we have not considered at all for now which is the half time.
Indeed, in a CS:GO game, after round 15, both teams switch sides (T-side becomes CT-side and vice-versa) and the outcome of the 16th round (second pistol round) can be considered as independent from what happened before.
This brings us to model a CS:GO game as follows:
- The first half (from round 1 to 15) should be modeled with a fixed-n Markov Binomial distribution (a fixed number of trials is run, in our case 15)
- The second half (after round 15) should be modeled with a Negative Markov Binomial (some trials are run until there is a winner, 16 rounds if no overtime)
Following what we have said in the previous section we have:
- The probability to have k successes in n trials for a sequence modeled with a Markov Binomial distribution can be written as:
- The probability to reach k successes and r failures, the last trial being a success
We are almost there!
Imagine you are watching a game of CS:GO and the score at the half of the game is 8–7. Then the probability for team A to win 16–14 at the end of the game (8 successes and 7 failures in the second half) is:
Without any assumption on the first half (prematch setup), we can still compute the probability for team A to win the game 16–14 by just marginalizing on all possible scores for the first half:
- the probability for team A to win the game without overtime is:
- the probability to go to overtime is:
- the probability for team B to win the game without overtime is:
And this is it! We manage to have a closed-form expression that models the round dynamic in CS:GO.
N.B. It’s possible to model the overtimes with the same idea since they are just a sequence of 6 rounds maximum with a “first to 4” rule.
How to use such a model?
First, the delta parameter can be found easily from the data we collect at PandaScore. The other parameter p (probability to win the pistol rounds) can be found using a machine learning model or with the help of humans.
Secondly, it’s also possible to work the other way around. We can for instance have another model estimating the probability for team A to win the game, and then infer p using some optimisation algorithms.
We have fixed delta to 0.35 which appeared to work well. Let’s take as example 2 scenarios.
- Team A has a 50% of chance to win the game and we deduce p from that
The first plot below shows the probability for every possible in-play round score to be reached. The anti-diagonals sum to 1 since they gather all possible scores for a given fixed number of rounds played. Since both teams have the same chance to win, the plot is symmetrical and the biggest probabilities stand near the diagonal.
The second plot below shows the probabilities on all end scores. Since both teams are equal, the highest probabilities are on scores ranging from 16–14 to 16–10.
2. Team A has only 5% of chance to win the game and we deduce p from that
The plots below are the same as described above. But obviously the matrices are not symmetrical anymore!
Obviously the assumption made with this model is too strong. While the model proposed has its advantages (fast to compute with analytic solutions, easy to interpret, good estimates of the round distribution), it has shown some limitations e.g.
- Overestimation of close games for teams of similar strengths
- Underestimation of stomps for teams of similar strengths
- Fine grained dynamics are averaged
The plot above shows for teams of the same level, difference in % between empirical distribution (historical data) and our Markov Binomial model (Old model)
Many aspects of CS:GO round dynamic were not taken into account in this attempt. To mention just a few of them:
- “Eco rounds”
- Catchup mechanisms
- Pistol rounds
This led us to go for a more complex model for production which captures better these mechanisms.
When we model some esports at PandaScore, we always try to evaluate simple solutions first and then iterate from this baseline. Markov Binomial models is not a common distribution — we did not know about it before working on CS:GO! — but has proven to be a good first approximation of the round dynamic of the game.