

Behavioural Modelling & Classification in Counter-Strike

January 2026

An analytical exploration into professional Counter-Strike 2 player behaviour. Using a custom dataset to quantify abstract behavioural patterns and predict player 'roles' with Machine Learning.

Data Analysis · Machine Learning · Feature Engineering · Statistics · Python · Counter-Strike
GitHub Repo

TL;DR

Project Goal: To determine if player "roles" in Counter-Strike are measurable behavioural archetypes that can be classified using machine learning, rather than just theoretical concepts.

Key Outcomes:

  • Data Processing & Feature Engineering: Processed over 1,000 match replay files (>400GB) to engineer a robust dataset of behavioural features, representing aspects of 'playstyle' such as aggression, trading, and positioning.
  • Quantifying Behaviour: Used statistical analysis (including visualisation, hypothesis testing and regression analysis) to formally model behavioural differences across roles, providing empirical validation for domain intuition (e.g. verifying how specific roles differ in positioning).
  • Role Classification: Developed a supervised learning model as a proof-of-concept for automating role labelling. Achieved F1-Macro scores of ~0.92 (T-side) and ~0.75 (CT-side), demonstrating that player roles are statistically distinct and enabling the algorithmic identification of mislabelled data.

Potential Utility: This framework offers a scalable method for automated role labelling, applicable when manual annotation is infeasible, such as on commercial analytics platforms or for large datasets.


Introduction

Counter-Strike (CS) has a rich history of competition spanning over two decades; it functions both as a casual game and a high-stakes sport with a healthy professional circuit (exceeding $30 million in annual prize pools). Each match of Counter-Strike acts as a complex system, with ten players (agents) split into two sides of five and constrained by a virtual arena. Out of this system, player "roles" have emerged as optimal ways to play the game. Similar to positions in football, terms like "Lurker" and "Anchor" describe a player's functional responsibility within the team structure.

One motivation for this project is that while these roles are widely discussed, they are determined largely by the "eye test" or community consensus. They do not appear on the scoreboard. Quantifying these behaviours allows us to add rigour to the ongoing discussion in the scene. For example, we can move from debating if a player is an "Entry-Fragger" to measuring exactly where they sit on a statistical spectrum of aggression, and why.

Beyond the theory, automated classification holds potential for significant commercial value. Services like Leetify or Scope.gg rely on providing users with a sense of "Identity". A tool that can mathematically validate a user's playstyle and tell them "You play like an Anchor" or "You played like (Pro Player) that match" could be a powerful engagement driver.


The Data

Source & Scale

The dataset for this project was constructed from scratch using raw match replay files (.dem) from the 2024 competitive season. To ensure high-quality "labels" for the roles, the data was restricted to Tier 1 Events (Majors and International LANs with >$250k prize pools), filtering out lower-tier matches where roles may not be recorded by my source, or may be less well-defined than in more developed teams.

  • Total Demos Processed: ~1,150 Maps
  • Total Raw Size: >416 GB
  • Parsing Tool: awpy (Python)

Parsing binary files at this scale presented significant engineering challenges, including handling corrupted demos, technical pauses, and the sheer computational cost of processing hundreds of gigabytes of semi-structured event data into usable pandas DataFrames.
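
To make the scale manageable, parsing had to be wrapped in a fault-tolerant batch loop. Below is a minimal sketch of that loop, assuming awpy's Demo interface; the exact attribute names (parse, kills) vary between awpy versions, and the directory path is illustrative.

```python
# Fault-tolerant batch-parsing sketch. Assumes awpy exposes a Demo class
# with a per-kill event table; exact attributes vary by awpy version.
from pathlib import Path

import pandas as pd
from awpy import Demo

DEMO_DIR = Path("demos/")  # illustrative directory of raw .dem files

kill_frames = []
for dem_path in sorted(DEMO_DIR.glob("*.dem")):
    try:
        demo = Demo(str(dem_path))
        if hasattr(demo, "parse"):  # some awpy versions parse lazily
            demo.parse()
        kills = demo.kills          # assumption: per-kill event table
    except Exception as exc:        # corrupted demos are common at scale
        print(f"Skipping {dem_path.name}: {exc}")
        continue
    # Normalise to pandas (newer awpy versions return polars frames).
    if hasattr(kills, "to_pandas"):
        kills = kills.to_pandas()
    kills["source_demo"] = dem_path.name
    kill_frames.append(kills)

all_kills = pd.concat(kill_frames, ignore_index=True)
all_kills.to_parquet("kills.parquet")  # events are tiny next to raw demos
```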

Data Processing Diagram
The data processing pipeline: from raw .dem files to an aggregated player-feature matrix.

Feature Engineering: Behaviour over Outcome

The core philosophy of this feature set was to quantify style, not skill. Most available statistics (K/D ratio, ADR, Win %) measure performance. I deliberately excluded these to ensure that a "good" Anchor and a "bad" Anchor would still be grouped together based on how they played, rather than their success.

Constraint: I explicitly excluded the potential feature "% of kills with sniper rifles". While this would have made classification trivial (AWPers (snipers) use AWPs), it would have masked the underlying behavioural patterns. I wanted to see if the model could identify an AWPer purely by how they behave, not just by the gun they hold.

I engineered three categories of behavioural features (the trading metrics are sketched in code after the list):

Aggression:

  • TAPD (Time Alive Per Death): Do they die early or survive late?
  • OAP (Opening Attempt %): How often do they take the first duel of the round?

Teamwork (Trading):

  • PODT (Proportion of Deaths Traded): When they die, is a teammate close enough to avenge them?
  • POKT (Proportion of Kills which were Trades): Are they the ones doing the avenging?

Positioning (Novel Metrics):

  • ADNT (Avg Distance to Nearest Teammate): Are they a "pack" player or a lone wolf?
  • ADAT (Avg Distance to Avg Teammate): Do they play central or peripheral angles?
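
To make the trading metrics concrete, here is a minimal sketch of the trade logic, assuming a kills DataFrame (tick, attacker, victim columns) like the one produced by the parsing step. The 5-second window is an assumed definition, not necessarily the project's exact one, and team kills are ignored for brevity.

```python
import pandas as pd

TRADE_WINDOW = 5 * 64  # assumed: ~5 seconds at 64 ticks per second

def flag_trades(kills: pd.DataFrame) -> pd.DataFrame:
    """Mark a kill as a trade when its victim had themselves scored a
    kill within the window, i.e. the attacker avenged a teammate."""
    kills = kills.sort_values("tick").reset_index(drop=True)
    kills["is_trade"] = False       # this kill avenged a teammate (POKT)
    kills["death_traded"] = False   # this kill's victim was avenged (PODT)
    for i, row in kills.iterrows():
        # Earlier kills made by this kill's victim, inside the window.
        avenged = kills[
            (kills["attacker"] == row["victim"])
            & (kills["tick"] < row["tick"])
            & (kills["tick"] >= row["tick"] - TRADE_WINDOW)
        ]
        if not avenged.empty:
            kills.at[i, "is_trade"] = True
            # The victims of those earlier kills had their deaths traded.
            kills.loc[avenged.index, "death_traded"] = True
    return kills

kills = flag_trades(all_kills)
pokt = kills.groupby("attacker")["is_trade"].mean()    # POKT per player
podt = kills.groupby("victim")["death_traded"].mean()  # PODT per player
```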

Normalising Geometry: The 0.2–1.0 Scale

One major challenge with positional data is that maps have different sizes and geometries (e.g., Nuke has more verticality, Overpass is large). A distance of 100 units can mean something different on each map.

To solve this, I implemented a Team-Relative Rank system. For every sampled tick, players were ranked 1–5 based on their distance from teammates, then normalised to a 0.2–1.0 scale.

  • The scale runs from 0.2 (closest) to 1.0 (farthest).

This normalisation allows one to compare the positional style of players, despite differences in which maps their teams tend to play (decided by a veto process).
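
A minimal sketch of the rank normalisation, assuming a per-tick positions DataFrame with tick, team, player and a raw distance column (names illustrative), and that all five teammates are present in each sampled tick:

```python
import pandas as pd

def team_relative_rank(positions: pd.DataFrame, col: str) -> pd.Series:
    """Rank the 5 teammates 1-5 by `col` within each (tick, team) group,
    then divide by 5 so the scale runs 0.2 (closest) to 1.0 (farthest)."""
    rank = positions.groupby(["tick", "team"])[col].rank(method="first")
    return rank / 5.0

positions["adnt_rank"] = team_relative_rank(positions, "dist_nearest_teammate")
# Per-player style metric: average the map-agnostic rank over all ticks.
adnt = positions.groupby("player")["adnt_rank"].mean()
```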

The Target Labels

To supervise the analysis, I used expert role labels from Harry Richards (HLTV Analyst), whose Positions Databases are a gold standard in the community. The dataset covers these primary roles: Anchor, Rotator, Spacetaker, Lurker, and AWPer.

It is important to note that these labels are "fuzzy". Unlike positions in sports like Baseball (where a Pitcher is defined by the rules), or videogames such as Overwatch (where roles are fixed by hero selection), Counter-Strike roles are strategic concepts. A "Lurker" might group up with the team for a specific execute. This ambiguity makes the classification task significantly harder (and more interesting) than a simple rule-based sort.

Dataset available on: Kaggle / Hugging Face / GitHub


Exploratory Analysis

Before building any models, I needed to answer a fundamental question: how many matches (maps) does a player need before their playstyle metrics stabilise?

Using bootstrap resampling to estimate measurement uncertainty, I established a 40-map threshold. Below this, behavioural estimates are too noisy to distinguish signal from variance. This filtered the dataset to a stable cohort of 84 professional players, enough to draw meaningful conclusions while ensuring each player's statistics are reliable.
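
A minimal sketch of the stability check for a single player, assuming `player_maps` holds one feature value per map (e.g. per-map OAP); watching how the bootstrap standard error shrinks with sample size shows where estimates settle:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_se(values: np.ndarray, n_maps: int, n_boot: int = 1000) -> float:
    """Standard error of the feature mean if only n_maps maps were available."""
    means = [
        rng.choice(values, size=n_maps, replace=True).mean()
        for _ in range(n_boot)
    ]
    return float(np.std(means))

# How does uncertainty shrink as the number of maps grows?
for n in (10, 20, 40, 80):
    print(n, bootstrap_se(np.asarray(player_maps), n))
```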

Side Asymmetry: T vs CT

One of the clearest patterns in the data is the difference between attacking (T) and defending (CT) sides:

  • T-side players trade more consistently: Both the proportion of deaths that get traded (+4.6pp) and the proportion of kills that are trades (+7.6pp) are significantly higher on T-side. This reflects the coordinated nature of T-side executes: attackers push together, ensuring deaths are avenged.
  • T-side shows greater behavioural diversity: The standard deviation across most features is wider on T, while CT distributions are tighter. Defenders fall into more constrained patterns; attackers have more room for individual expression.
  • Yet T-side roles are more distinct: Using Mahalanobis distance (a multivariate measure of "how different" each role is from the others), T-side roles showed clearer separation (a sketch of the computation follows below). The most distinct role? T-side AWPers (D² = 4.40), whose weapon constraints enforce a recognisable playstyle across all players.

This asymmetry makes tactical sense: T-side strategies are executed, CT-side strategies are reactive.
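
A minimal sketch of the separation measure referenced above, assuming a player-level feature matrix `X` (players × features) and role labels `y`; it uses the pooled covariance and a pseudo-inverse for numerical safety:

```python
import numpy as np

def role_mahalanobis_sq(X: np.ndarray, y: np.ndarray, role: str) -> float:
    """Squared Mahalanobis distance between a role's mean feature vector
    and the mean of all other players, under the pooled covariance."""
    in_role = y == role
    diff = X[in_role].mean(axis=0) - X[~in_role].mean(axis=0)
    cov = np.cov(X, rowvar=False)  # pooled over all players
    return float(diff @ np.linalg.pinv(cov) @ diff)

for role in np.unique(y):
    print(role, round(role_mahalanobis_sq(X, y, role), 2))
```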

The Positioning Problem (More Feature Engineering)

Early analysis revealed a problem: the two positioning metrics, ADNT (distance to nearest teammate) and ADAT (distance from team centre), were highly correlated (r = 0.92 on T-side, 0.78 on CT-side). This makes intuitive sense: players far from their nearest teammate tend to be far from the team centre.

But this correlation masked an interesting question: do some roles position further from the centre than their isolation alone would predict?

To answer this, I decomposed the relationship using linear regression. By fitting ADAT ~ ADNT for each side and extracting the residuals, I created an orthogonal positioning feature that captures "central vs peripheral tendency" independent of isolation.
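
A minimal sketch of the residualisation, assuming a per-player DataFrame with side, adnt and adat columns (names illustrative):

```python
import numpy as np

def adat_residual(adnt: np.ndarray, adat: np.ndarray) -> np.ndarray:
    """Residuals of the least-squares fit ADAT ~ ADNT."""
    slope, intercept = np.polyfit(adnt, adat, deg=1)
    return adat - (slope * adnt + intercept)

# Fit T and CT separately, since the ADAT~ADNT relationship differs by side.
for side, group in players.groupby("side"):
    players.loc[group.index, "adat_residual"] = adat_residual(
        group["adnt"].to_numpy(), group["adat"].to_numpy()
    )
```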

The results were revealing:

  • Lurkers (T) and Anchors (CT) consistently position further from the team centre than their isolation predicts, holding off-angles and peripheral positions.
  • AWPers (CT) show the opposite: they stay closer to the centre despite isolation, holding fixed sightlines with team support nearby.
  • Rotators (CT) showed the highest variance in this residual, reflecting their adaptive, context-dependent positioning.

This feature engineering step reduced multicollinearity while preserving (and clarifying) the positional signal that distinguishes roles.

Deep Dive: View the interactive HTML report (11.4 MB) or the raw Jupyter Notebook on GitHub.


Classifying Roles

If roles are behaviourally distinct, can a machine learning model predict them purely from statistics?

I tested four algorithms (Logistic Regression, SVM, Random Forest, and XGBoost) using nested cross-validation (4 splits × 20 repeats = 80 folds) to ensure robust performance estimates on a small dataset. The results revealed a fundamental difference between the two sides of the game.
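
A minimal sketch of that setup in scikit-learn; the model and hyperparameter grid shown are illustrative rather than the exact configuration used:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     StratifiedKFold, cross_val_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

inner = StratifiedKFold(n_splits=4)  # hyperparameter selection
outer = RepeatedStratifiedKFold(n_splits=4, n_repeats=20,
                                random_state=0)  # 80 evaluation folds

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
grid = GridSearchCV(
    pipe,
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},
    cv=inner,
    scoring="f1_macro",
)

# X: player-level feature matrix, y: role labels (from the earlier steps).
scores = cross_val_score(grid, X, y, cv=outer, scoring="f1_macro")
print(f"F1-Macro: {scores.mean():.2f} ± {scores.std():.2f}")
```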

T-Side: Linear Separability

Champion Model: Logistic Regression | F1-Macro: ~0.92

Logistic Regression Coefficients for T-side Roles
Logistic regression coefficients showing how each feature influences role classification on T-side. Positive values push predictions towards that role, negative values push away.

The simplest model won. T-side roles are linearly separable: a straight line (or hyperplane) can cleanly divide AWPers, Lurkers, and Spacetakers in feature space. Complex ensembles like XGBoost offered no improvement.

This means T-side role identities are crisp: positioning metrics (ADNT, ADAT Residual) combined with aggression (OAP, TAPD) fully define the archetypes. The model's coefficients tell a clear story:

  • Lurkers: High isolation (ADNT), high trade-kill participation (POKT)
  • Spacetakers: High opening attempts (OAP), low survival time (TAPD)
  • AWPers: Low isolation (pack play), high survival time

CT-Side: Non-Linear Boundaries

Champion Model: Random Forest | F1-Macro: ~0.75

SHAP Feature Importance for CT-side Roles
SHAP beeswarm plots showing how features influence CT-side role predictions. Each dot represents a player; red indicates high feature values, blue indicates low values. Position along the x-axis shows impact on prediction.

CT-side required more complexity. Logistic Regression plateaued at F1 ~0.67, while Random Forest's non-linear decision boundaries pushed performance to ~0.75.

One of the limiting factors was Rotator vs AWPer confusion. These roles share similar positioning profiles on defence, and the features struggle to distinguish them. Interestingly, trade metrics (PODT, POKT) provided almost no discriminatory value on CT-side. Defensive role identity is almost entirely defined by where you play, not how you trade.
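
A minimal sketch of the attribution step behind the beeswarm plots, assuming a fitted RandomForestClassifier `model` and a `feature_names` list; note that older and newer shap versions return multiclass attributions in different shapes, which the sketch handles explicitly:

```python
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # per-class attributions

# Older shap returns a list (one array per class); newer versions return
# a single (samples, features, classes) array. Pick class 0 either way.
values = shap_values[0] if isinstance(shap_values, list) else shap_values[..., 0]
shap.summary_plot(values, X, feature_names=feature_names)
```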

The "Ambiguous Role" Test

To test whether the model struggled with features or fuzzy definitions, I ran a sensitivity analysis excluding the hybrid roles (Half-Lurker on T, Mixed on CT).

The result: F1 improved by +0.28 (T) and +0.21 (CT).

This confirms that "Core" roles (Lurker, Spacetaker, Anchor, Rotator, AWPer) are statistically distinct archetypes. The hybrid roles weren't necessarily poorly captured; they seem to genuinely blur the boundaries between playstyles.
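
The analysis itself is a one-line filter on top of the nested CV from earlier; a minimal sketch, assuming the label strings match the role names above:

```python
import numpy as np

# Drop the hybrid labels and re-run the same nested cross-validation.
core_mask = ~np.isin(y, ["Half-Lurker", "Mixed"])
core_scores = cross_val_score(grid, X[core_mask], y[core_mask],
                              cv=outer, scoring="f1_macro")
print(f"Core-only F1-Macro: {core_scores.mean():.2f}")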

The Sniper-Captain Confound

The most interesting finding came from analysing errors.

Four CT-side Rotators (biguzera, chopper, bLitz, and apEX) were misclassified as AWPers, sometimes confidently. What do they have in common? They're all In-Game Leaders (IGLs).

Digging into IGL feature values revealed the cause: almost all have unusually low ADAT Residuals, meaning they positioned closer to the team centre than typical Rotators. This "central positioning" pattern mimics the spacing profile of CT AWPers.

IGL Positioning Distribution
Comparison of ADAT Residual values for IGLs versus the wider CT role distribution. IGLs across all roles show a clear tendency towards central positioning (lower residual values).

Why would IGLs play centrally? Information flow. Central positions (think: Connector on Mirage, Middle on Ancient) allow the IGL to process information from both bombsites without relying solely on callouts. The trade-off is that these are high-engagement areas typically reserved for star players (often AWPers), which helps explain why IGLs are often criticised for underwhelming stats.

This isn't a novel discovery. Analyst Harry Richards documented the "supportive rotator" IGL trend in 2024. But seeing the model independently surface this pattern validates both the features and the classification approach.

Diagnostic Value

Perhaps the most practical outcome: high-confidence "misclassifications" could point to labelling errors.

For example, Spinx was labelled as an Anchor but the model confidently predicted Rotator. Upon reviewing his 2024 matches, he did indeed play rotating positions on Vitality. I had simply mislabelled him when filling gaps in the dataset.

This suggests a potential application: even if the model isn't accurate enough (yet) for fully automated labelling, it can serve as a diagnostic tool to flag inconsistencies in manually-curated role databases.
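
A minimal sketch of that audit, reusing the pipeline from the classification section; the 0.8 confidence threshold is an assumption:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_predict

classes = np.unique(y)
proba = cross_val_predict(pipe, X, y, cv=4, method="predict_proba")
pred = classes[proba.argmax(axis=1)]
conf = proba.max(axis=1)

audit = pd.DataFrame({"label": y, "pred": pred, "confidence": conf})
# High-confidence disagreements are candidates for manual label review.
flags = audit[(audit.label != audit.pred) & (audit.confidence > 0.8)]
print(flags.sort_values("confidence", ascending=False))
```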

Deep Dive: View the interactive HTML report (3.2 MB) or the raw Jupyter Notebook on GitHub.