01 - Exploratory Data Analysis and Feature Engineering
Executive Summary
This analysis characterises the playstyles of 84 professional CS2 players to determine if distinct in-game roles exhibit measurable statistical signatures.
Key Findings:
- Data Stability: Established a reliability threshold of 40 maps, confirming that playstyle metrics stabilize sufficiently to distinguish signal from noise.
- Role Signatures:
- Lurkers (T) and Anchors (CT) show distinct positioning behaviour, consistently operating far from the team centre even when controlling for isolation.
- Spacetakers (T) and Rotators (CT) exhibit measurably more aggressive profiles, engaging in opening duels significantly more often (OAP) and surviving for less time (TAPD) than their passive counterparts.
- CT Rotators display significantly higher variance in positioning than Anchors, quantitatively validating their role as adaptive/reactive rather than static.
- Feature Engineering: Identified high multicollinearity ($r > 0.9$) between isolation metrics (ADNT) and team-distance metrics (ADAT).
- Solution: Decomposed this relationship using linear regression to create orthogonal residual features.
- Outcome: New features that isolate "positioning preference" from "teammate proximity," theoretically improving discriminative power for downstream classification models.
Outcome: A validated, engineered dataset containing unique behavioural fingerprints for each player, ready for supervised role classification.
Note: All analysis is performed separately for T and CT sides to preserve tactical context and enable side-specific modelling decisions.
Objectives
1. Data Stability & Inclusion Criteria
Estimate per-player measurement uncertainty as a function of map count using analytical approximations (a binomial standard error for rate metrics and a 1/√maps proxy for TAPD). Determine a principled inclusion threshold (MIN_MAPS) balancing precision and sample size.
2. Feature Distributions & Role Profiles
Examine univariate distributions of behavioral metrics by side and role using kernel density estimates. Visualise role-level playstyle signatures via standardised radar plots to assess feature discriminability.
3. Statistical Validation
Quantify key patterns via formal hypothesis testing:
- Side-level differences (T vs CT) in trading behavior and variability
- Role contrasts (aggressive vs passive) across all behavioral dimensions
- Role distinctiveness via Mahalanobis distance in feature space
4. Feature Correlations
Compute pairwise Pearson correlations to identify shared variance and inform feature selection for clustering and classification.
5. Positional Feature Engineering
Decompose the ADNT–ADAT correlation via side-specific linear regression. Extract positioning residuals that capture role-specific tendencies (peripheral vs central positioning) independent of isolation effects.
6. Dataset Export
Produce a refined, analysis-ready dataset with engineered features, metadata flags, and complete documentation for reproducible downstream modelling.
Outputs
Tables & Summaries:
- Data coverage statistics and inclusion threshold justification
- Feature summary statistics by side and role
- Hypothesis test results with effect sizes and confidence intervals
- Correlation matrices (T and CT sides)
- Role distinctiveness rankings with feature contributions
Visualisations:
- Stability curves (standard error vs map count)
- Side-comparison KDEs for all features
- Role-stratified KDEs (T and CT)
- Interactive radar plots (role profiles by side)
- Split-diagonal correlation heatmap
- Positioning regression scatter plots with role stratification
- Role-stratified residual distributions
Processed Dataset:
data/processed/cs2_playstyles_2024_with_residuals.parquet
Contains all original features plus engineered positioning residuals (adat_residual_t, adat_residual_ct). Data remains unscaled to preserve interpretability and enable flexible preprocessing in subsequent notebooks.
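The residual features can be illustrated with a minimal sketch. It assumes each residual comes from an ordinary least-squares fit of the ADAT rank on the ADNT rank for one side; `adat_residuals` is a hypothetical helper, not the project's `fit_positioning_regressions` / `compute_positioning_residuals` code, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def adat_residuals(df: pd.DataFrame, side: str) -> pd.Series:
    """Residuals of adat_rank_<side> after an OLS fit on adnt_rank_<side>.

    Hypothetical helper for illustration; not the project's features module.
    """
    x = df[f"adnt_rank_{side}"].to_numpy(dtype=float)
    y = df[f"adat_rank_{side}"].to_numpy(dtype=float)
    # Degree-1 polyfit returns [slope, intercept] of the least-squares line
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    return pd.Series(y - fitted, index=df.index, name=f"adat_residual_{side}")


# Made-up values for illustration only (not project data)
toy = pd.DataFrame({
    "adnt_rank_t": [0.40, 0.55, 0.60, 0.70, 0.75],
    "adat_rank_t": [0.42, 0.50, 0.66, 0.72, 0.80],
})
toy["adat_residual_t"] = adat_residuals(toy, "t")
print(toy.round(3))
```

Because OLS residuals sum to zero and are uncorrelated with the predictor within the fitted sample, the residual keeps only the part of the ADAT rank that the ADNT rank does not explain linearly.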
Note: This notebook establishes the MIN_MAPS = 40 inclusion threshold and defines the stable cohort (n=84) used throughout the project. The derived residual features provide orthogonal positional information that reduces multicollinearity while retaining role-specific positioning signal.
1. Setup & Data Overview
No analysis here; feel free to skip to Section 2.
Configure imports, display options, and project paths. Load the dataset and report structure (shape, dtypes, missingness) and coverage tables to anchor later analysis. Plots are deferred to their relevant sections to keep the notebook focused.
# === Setup: paths, imports, theme ===
from pathlib import Path
import sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Dev convenience
%load_ext autoreload
%autoreload 2
# Display options
pd.set_option("display.max_columns", 120)
pd.set_option("display.width", 120)
# Resolve repository root (if in notebooks/, step up one level)
REPO_ROOT = Path.cwd().resolve().parent if Path.cwd().name.lower() == "notebooks" else Path.cwd().resolve()
# Canonical paths
DATA_DIR = REPO_ROOT / "data"
RESULTS_DIR = REPO_ROOT / "results" / "eda"
FIG_DIR = RESULTS_DIR / "figures"
TAB_DIR = RESULTS_DIR / "tables"
# Ensure results dirs exist
FIG_DIR.mkdir(parents=True, exist_ok=True)
TAB_DIR.mkdir(parents=True, exist_ok=True)
# Dataset path
DATA_PATH = DATA_DIR / "raw" / "cs2_playstyle_roles_2024.csv"
assert DATA_PATH.exists(), f"Dataset not found at {DATA_PATH}"
# Local helpers
SRC_DIR = REPO_ROOT / "src"
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))
from style import (
    set_mpl_theme,
    set_seaborn_theme,
    ROLE_COLOURS,
    get_role_colour,
)
from stability import (
    rates_quantile_table,
    tapd_quantile_table,
    retained_summary,
)
from viz import (
    plot_rates_quantile,
    plot_tapd_quantile,
    plot_mapcount_hist,
    summarize_side_stats,
    plot_kdes_roles_by_side,
    plot_kdes_side_compare,
    summarize_role_stats_by_side,
    plot_correlation_split_heatmap,
    compute_feature_correlations,
    plot_positioning_residuals_by_role,
)
from hypo_tests import (
    test_side_trade_increase,
    test_side_variability_ct_less,
    test_role_contrast,
    compute_role_distinctiveness,
)
from viz_plotly import (
    plot_role_radars_interactive,
    plot_positioning_regression_interactive,
)
from features import (
    fit_positioning_regressions,
    compute_positioning_residuals,
)
# Themes
set_mpl_theme(mode="dark", preferred_font="Georgia")
set_seaborn_theme(mode="dark", preferred_font="Georgia")
# Echo key paths
REPO_ROOT, DATA_PATH, FIG_DIR, TAB_DIR
(WindowsPath('P:/cs2-playstyle-analysis-2024'),
WindowsPath('P:/cs2-playstyle-analysis-2024/data/raw/cs2_playstyle_roles_2024.csv'),
WindowsPath('P:/cs2-playstyle-analysis-2024/results/eda/figures'),
WindowsPath('P:/cs2-playstyle-analysis-2024/results/eda/tables'))
Load & structural overview
Inspect shape, preview rows, data types, and missingness. Identify side-suffixed fields for later grouping.
# Load dataset and run structural checks
df = pd.read_csv(DATA_PATH)
# Shape and a compact preview
print("shape:", df.shape)
display(df.head(3))
# Column summary
col_info = (
pd.DataFrame({
"column": df.columns,
"dtype": df.dtypes.astype(str),
"missing_rate": df.isna().mean().round(4)
})
.sort_values(["missing_rate", "column"], ascending=[False, True])
)
display(col_info.head(25)) # preview top 25 by missingness
# Quick counts by dtype for a high-level feel
dtype_counts = col_info["dtype"].value_counts().rename_axis("dtype").to_frame("n")
display(dtype_counts)
# Light check for commonly expected identifiers (report only)
expected = ["steamid", "player_name", "team_clan_name", "map_count"]
present = [c for c in expected if c in df.columns]
missing = [c for c in expected if c not in df.columns]
print("present expected cols:", present)
print("missing expected cols:", missing)
# Detect side-suffixed families (useful for later grouping)
suffixes = ("_t", "_ct", "_overall")
side_cols = {suf: [c for c in df.columns if c.endswith(suf)] for suf in suffixes}
{key: len(val) for key, val in side_cols.items()}
shape: (306, 25)
|  | steamid | player_name | team_clan_name | map_count | tapd_ct | tapd_t | tapd_overall | oap_ct | oap_t | oap_overall | podt_ct | podt_t | podt_overall | pokt_ct | pokt_t | pokt_overall | adnt_rank_ct | adnt_rank_t | adnt_rank_overall | adat_rank_ct | adat_rank_t | adat_rank_overall | role_overall | role_t | role_ct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 76561198041683378 | NiKo | G2 Esports | 158 | 60.952893 | 59.136540 | 60.136000 | 24.745965 | 24.093423 | 24.424242 | 21.020276 | 24.857741 | 22.507740 | 17.051295 | 21.586555 | 19.197995 | 0.562493 | 0.621199 | 0.593089 | 0.547525 | 0.695336 | 0.616046 | Lurker | Spacetaker | Rotator |
| 1 | 76561198012872053 | huNter | G2 Esports | 158 | 62.048685 | 62.871661 | 62.589800 | 16.852540 | 14.807692 | 15.847511 | 21.585198 | 27.994772 | 24.696747 | 17.195516 | 27.180894 | 22.538284 | 0.480859 | 0.643004 | 0.571875 | 0.406082 | 0.615021 | 0.455108 | Flex | Lurker | Rotator |
| 2 | 76561198074762801 | m0NESY | G2 Esports | 155 | 62.786553 | 66.632594 | 64.362519 | 23.914373 | 17.754078 | 20.873335 | 19.122381 | 23.094640 | 21.473108 | 17.397469 | 26.423178 | 21.056274 | 0.577785 | 0.423733 | 0.453028 | 0.515617 | 0.409889 | 0.427645 | AWPer | AWPer | AWPer |
|  | column | dtype | missing_rate |
|---|---|---|---|
| role_overall | role_overall | object | 0.4183 |
| role_ct | role_ct | object | 0.3856 |
| role_t | role_t | object | 0.3856 |
| adat_rank_ct | adat_rank_ct | float64 | 0.0000 |
| adat_rank_overall | adat_rank_overall | float64 | 0.0000 |
| adat_rank_t | adat_rank_t | float64 | 0.0000 |
| adnt_rank_ct | adnt_rank_ct | float64 | 0.0000 |
| adnt_rank_overall | adnt_rank_overall | float64 | 0.0000 |
| adnt_rank_t | adnt_rank_t | float64 | 0.0000 |
| map_count | map_count | int64 | 0.0000 |
| oap_ct | oap_ct | float64 | 0.0000 |
| oap_overall | oap_overall | float64 | 0.0000 |
| oap_t | oap_t | float64 | 0.0000 |
| player_name | player_name | object | 0.0000 |
| podt_ct | podt_ct | float64 | 0.0000 |
| podt_overall | podt_overall | float64 | 0.0000 |
| podt_t | podt_t | float64 | 0.0000 |
| pokt_ct | pokt_ct | float64 | 0.0000 |
| pokt_overall | pokt_overall | float64 | 0.0000 |
| pokt_t | pokt_t | float64 | 0.0000 |
| steamid | steamid | int64 | 0.0000 |
| tapd_ct | tapd_ct | float64 | 0.0000 |
| tapd_overall | tapd_overall | float64 | 0.0000 |
| tapd_t | tapd_t | float64 | 0.0000 |
| team_clan_name | team_clan_name | object | 0.0000 |
| dtype | n |
|---|---|
| float64 | 18 |
| object | 5 |
| int64 | 2 |
present expected cols: ['steamid', 'player_name', 'team_clan_name', 'map_count']
missing expected cols: []
{'_t': 7, '_ct': 7, '_overall': 7}
Coverage overview
Summarise player/team counts and map_count distribution buckets as tables, and record role-label missingness. A binned map_count figure will be shown in the Stability section where its interpretation is most relevant.
# Coverage summary, binned map-count table, role-label missingness, and threshold retention
import pandas as pd
# Coverage summary
coverage = pd.DataFrame({
"n_players": [df["steamid"].nunique()],
"n_teams": [df["team_clan_name"].nunique()],
"mean_maps": [df["map_count"].mean()],
"median_maps": [df["map_count"].median()],
"p10_maps": [df["map_count"].quantile(0.10)],
"p90_maps": [df["map_count"].quantile(0.90)],
"min_maps": [df["map_count"].min()],
"max_maps": [df["map_count"].max()],
}).round(2)
display(coverage)
coverage.to_csv(TAB_DIR / "eda_coverage_summary.csv", index=False)
# Binned map-count table (no plot)
bins = [0, 10, 20, 30, 40, 60, 80, 120, 160, 1_000]
map_bins = pd.cut(df["map_count"], bins=bins, right=False)
map_hist = (
map_bins.value_counts()
.sort_index()
.rename_axis("map_count_bin")
.reset_index(name="n_players")
)
display(map_hist)
map_hist.to_csv(TAB_DIR / "eda_mapcount_hist.csv", index=False)
# Role label missingness per field
role_cols = [c for c in ("role_overall", "role_t", "role_ct") if c in df.columns]
role_missing = df[role_cols].isna().mean().rename("missing_rate").to_frame()
display(role_missing)
role_missing.to_csv(TAB_DIR / "eda_role_missingness.csv")
# Players retained under common min-map thresholds (feeds into stability analysis)
thresholds = [20, 30, 40, 60, 80, 100, 120]
thr_table = (
pd.DataFrame({"min_map_count": thresholds})
.assign(n_players=lambda t: t["min_map_count"].apply(lambda m: (df["map_count"] >= m).sum()))
)
thr_table["share_players"] = (thr_table["n_players"] / df["steamid"].nunique()).round(3)
display(thr_table)
thr_table.to_csv(TAB_DIR / "eda_threshold_coverage.csv", index=False)
|  | n_players | n_teams | mean_maps | median_maps | p10_maps | p90_maps | min_maps | max_maps |
|---|---|---|---|---|---|---|---|---|
| 0 | 306 | 61 | 35.72 | 15.0 | 4.0 | 113.0 | 1 | 158 |
|  | map_count_bin | n_players |
|---|---|---|
| 0 | [0, 10) | 111 |
| 1 | [10, 20) | 58 |
| 2 | [20, 30) | 23 |
| 3 | [30, 40) | 30 |
| 4 | [40, 60) | 10 |
| 5 | [60, 80) | 31 |
| 6 | [80, 120) | 16 |
| 7 | [120, 160) | 27 |
| 8 | [160, 1000) | 0 |
|  | missing_rate |
|---|---|
| role_overall | 0.418301 |
| role_t | 0.385621 |
| role_ct | 0.385621 |
|  | min_map_count | n_players | share_players |
|---|---|---|---|
| 0 | 20 | 137 | 0.448 |
| 1 | 30 | 114 | 0.373 |
| 2 | 40 | 84 | 0.275 |
| 3 | 60 | 74 | 0.242 |
| 4 | 80 | 43 | 0.141 |
| 5 | 100 | 33 | 0.108 |
| 6 | 120 | 27 | 0.088 |
2. Stability threshold analysis (Map Count)
TL;DR. We set MIN_MAPS = 40 to balance precision and sample size.
Objective. Pick a minimum map threshold where per-player metrics are precise enough for analysis.
Metrics. Rates: oap_overall, podt_overall, pokt_overall. Duration-like: tapd_overall.
*Positional rank metrics (adnt_*, adat_*) are not used for this threshold decision* (see “Omitted metrics” below).
Uncertainty model.
• Rates: per-player standard error (SE) in percentage points (pp) via a binomial approximation with trials ≈ k × maps (rounds for opener; kills/deaths for trades).
• tapd: relative proxy proportional to 1/√maps (arbitrary units); decision is based on the plateau rather than a numeric tolerance.
Aggregation & display. Players are binned by map_count; figures show the 75th percentile (p75) of per-player SE vs. bin center.
The rates figure includes a 2.0 pp tolerance line; both figures mark the chosen MIN_MAPS.
Assumptions. The trials-per-map factor k is assumed; reasonable changes shift levels but not the 1/√maps shape or the cutoff region.
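The uncertainty model above can be sketched in a few lines. This assumes the per-player SE for a rate p (given in %) is the binomial formula with n = k × maps trials, reported in percentage points, and that bins are summarised by the 75th percentile; `binomial_se_pp` and `quantile_by_bin` are hypothetical helpers, not the `stability` module's API. For instance, an OAP of 20% over 40 maps with k = 20 gives 800 assumed trials and SE = 100 × √(0.2 × 0.8 / 800) ≈ 1.41 pp.

```python
import numpy as np
import pandas as pd


def binomial_se_pp(rate_pct: pd.Series, maps: pd.Series, k: float) -> pd.Series:
    """Per-player binomial SE in percentage points, with trials approximated as k * maps."""
    p = rate_pct / 100.0
    n_trials = k * maps
    return 100.0 * np.sqrt(p * (1.0 - p) / n_trials)


def quantile_by_bin(values: pd.Series, maps: pd.Series, bins: list, q: float = 0.75) -> pd.Series:
    """q-quantile of per-player values within left-closed map_count bins."""
    binned = pd.cut(maps, bins=bins, right=False)
    return values.groupby(binned, observed=True).quantile(q)


# Made-up players for illustration only (not project data)
players = pd.DataFrame({
    "map_count": [5, 12, 25, 45, 70, 130],
    "oap_overall": [20.0, 18.0, 22.0, 20.0, 24.0, 16.0],
})
players["se_oap_pp"] = binomial_se_pp(players["oap_overall"], players["map_count"], k=20.0)
# TAPD proxy: proportional to 1/sqrt(maps), arbitrary units
players["tapd_proxy"] = 1.0 / np.sqrt(players["map_count"])

print(players.round(3))
print(quantile_by_bin(players["se_oap_pp"], players["map_count"], [0, 10, 20, 40, 80, 160]))
```

Quadrupling the map count halves the SE, which is the 1/√maps shape the figures show; changing k rescales every curve without moving that shape.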
Omitted metrics (positional ranks).
Rank features are ordinal and team/context-relative; without per-map variability we can’t construct a meaningful sampling-error model.
Future data / improvements.
Collect per-map (or per-round/per-life) microdata to compute actual standard errors or confidence/credible intervals for opener attempts, trades, time-alive, and positional measures. With microdata, the threshold can be revisited using measured uncertainty instead of approximations.
Parameters
# === Parameters ===
# Map-count bins for summaries and plots
MAP_BINS = [0, 10, 20, 30, 40, 50, 60, 80, 100, 120, 140, 160]
# Trials-per-map assumptions (sensitivity moves levels, not shape)
K_OAP, K_PODT, K_POKT = 20.0, 18.0, 14.0
# Quantile for bin summary (p75 is conservative and robust)
Q = 0.75
# Decision lines for figures
MIN_MAPS = 40
TOL_PP = 2.0 # percentage points for rate metrics; set None to hide
Proportional Features
Uncertainty falls quickly and is ≤ 2.0 pp by ~40 maps for oap/podt/pokt; gains flatten beyond.
# === Stability: rates (oap/podt/pokt), pQ in percentage points ===
rates_tbl = rates_quantile_table(
df, MAP_BINS, k_oap=K_OAP, k_podt=K_PODT, k_pokt=K_POKT, q=Q, map_col="map_count"
)
plot_rates_quantile(
rates_tbl, q=Q, tol_pp=TOL_PP, min_maps=MIN_MAPS,
savepath=FIG_DIR / f"stability_rates_p{int(Q*100)}_pp.png",
savepath_svg=FIG_DIR / f"stability_rates_p{int(Q*100)}_pp.svg",
)
Time Alive Per Death (TAPD)
Steep early drop then plateau around ~40 maps; higher thresholds give diminishing returns.
# === Stability: tapd (duration proxy), pQ in proxy units ===
tapd_tbl, _ = tapd_quantile_table(
df, MAP_BINS, q=Q, map_col="map_count", scale=None, ref_mask=None
)
plot_tapd_quantile(
tapd_tbl, q=Q, min_maps=MIN_MAPS,
savepath=FIG_DIR / f"stability_tapd_p{int(Q*100)}_proxy.png",
savepath_svg=FIG_DIR / f"stability_tapd_p{int(Q*100)}_proxy.svg",
)
Coverage — players by map count
MIN_MAPS = 40 trims the low-map tail while retaining a substantial cohort.
# === Coverage: players by map_count and retention at MIN_MAPS ===
ret = retained_summary(df, MIN_MAPS, map_col="map_count")
plot_mapcount_hist(
df, bins=list(range(0, 180, 20)), min_maps=MIN_MAPS,
savepath=FIG_DIR / "mapcount_hist.png",
savepath_svg=FIG_DIR / "mapcount_hist.svg",
)
print(f"Total players: {ret['total']}")
print(f"Retained at MIN_MAPS={ret['min_maps']}: {ret['retained']} ({ret['retained_pct']:.1f}%)")
Total players: 306
Retained at MIN_MAPS=40: 84 (27.5%)
Creating the stable cohort dataframe and checking for missing roles.
# === Stable cohort: create df_stable and check for missing roles ===
# Stable cohort dataframe with players having at least MIN_MAPS
df_stable = df.loc[df["map_count"] >= MIN_MAPS].copy()
# Sanity check: no missing/empty roles in stable cohort. If you select a lower threshold, roles may be missing.
role_cols = ["role_overall", "role_t", "role_ct"]
for col in role_cols:
    # Treat NaN and empty/whitespace strings as missing
    mask = df_stable[col].isna() | (df_stable[col].astype(str).str.strip() == "")
    n = int(mask.sum())
    print(f"{col} missing: {n}")
    assert n == 0, f"Some players in the stable cohort are missing {col}."
role_overall missing: 0
role_t missing: 0
role_ct missing: 0
Summary & cutoff
- Rates figure: p75 SE for oap/podt/pokt drops rapidly and is ≤ 2.0 pp by ~40 maps for all three; beyond 40 the curves flatten.
- tapd figure: p75 proxy shows a clear diminishing-returns plateau around ~40 maps.
- Coverage figure: a large low-map tail; 40 maps removes the noisiest segment while retaining a substantial cohort.
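The rates criterion in the summary can be written as an explicit rule: take the smallest bin at which every rate metric's p75 SE is within the tolerance and stays within it for all larger bins. This sketches that one criterion only (the TAPD plateau and coverage are judged visually here); the helper name and the input table are hypothetical, and the values are made up rather than taken from `rates_tbl`.

```python
from typing import Optional

import pandas as pd


def smallest_stable_threshold(p75_se: pd.DataFrame, tol_pp: float) -> Optional[int]:
    """Smallest bin lower edge from which every metric's p75 SE stays <= tol_pp.

    p75_se: rows indexed by ascending bin lower edge, one column per rate metric.
    Hypothetical helper for illustration.
    """
    ok = (p75_se <= tol_pp).all(axis=1).astype(int)
    # Reverse cumulative minimum: a bin qualifies only if all larger bins also pass
    holds_onwards = ok.iloc[::-1].cummin().iloc[::-1].astype(bool)
    passing = holds_onwards[holds_onwards].index
    return int(passing.min()) if len(passing) else None


# Made-up p75 SE values (pp) for illustration only
p75 = pd.DataFrame(
    {"oap": [6.1, 3.9, 2.6, 1.9, 1.5, 1.2],
     "podt": [6.8, 4.3, 2.9, 2.0, 1.6, 1.3],
     "pokt": [7.5, 4.8, 3.2, 1.95, 1.8, 1.4]},
    index=[0, 10, 20, 40, 60, 80],
)
print(smallest_stable_threshold(p75, tol_pp=2.0))
```

With this made-up table the rule returns 40; with `tol_pp=1.0` no bin qualifies and it returns `None`.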
Decision: MIN_MAPS = 40.
Why: Meets the rates tolerance, tapd has plateaued, and a substantial cohort is retained (84 of 306 players, 27.5%).
3. Feature Distributions & Role Profiles
TL;DR — Feature Distributions & Role Profiles: Filled KDEs show side/role distribution shapes; tables give μ/σ/n; radars summarise role differences.
Objective. Describe how the six core features vary by side (CT/T) and role, quantify location/dispersion, and provide compact role profiles to support comparison and downstream clustering.
Questions.
- Do CT and T differ in opener activity, trading, time-alive, or spacing?
- Within a side, which roles occupy distinct regions of each feature (shape/shift/overlap)?
- What are the exact means (μ) and standard deviations (σ) per side/role?
- How do roles compare across features at once (radar “shape”)?
Visuals.
- KDEs (CT vs T): 2×3 grid with semi-transparent fills; dashed means only for non-positional features (TAPD, OAP, PODT, POKT). Positional ranks (ADNT/ADAT) are read by shape/shift, not mean.
- KDEs (by role, per side): 2×3 grids overlaying roles; low-n roles excluded; roles_order can limit which roles are shown.
- Tables: compact μ/σ/n summaries for sides and for roles (per side).
- Radars (per side): normalised multi-feature role profiles for quick comparison.
Conventions & filters.
- Colors: CT = blue, T = orange; roles use the project role palette.
- Scales: X-limits fixed per feature for like-for-like comparison; adjust bw_adjust if smoothing looks off.
- Inclusion: optional min_map_count for players, min_role_n for roles; roles_order acts as a role toggle.
How to read.
- In KDEs, focus on location, spread, and overlap; watch for bimodality.
- Use tables for exact μ/σ/n (don’t interpret means for rank features).
- Radars summarise cross-feature role differences; they complement, not replace, the distributions.
Scope.
- Positional ranks are ordinal and team-relative; means are not meaningful, so mean lines are omitted.
CT vs T — Distributions (KDE) + Side Summary Table
2×3 KDE grid comparing CT vs T with filled densities; dashed means only for non-positional features (TAPD, OAP, PODT, POKT). X-limits are fixed per feature for comparability. The table below reports μ/σ/n for each side and feature.
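The side summary is essentially a reshape plus an aggregation over the `<feature>_ct` / `<feature>_t` columns. A minimal pandas sketch, assuming only that column naming; `side_summary` is a hypothetical helper, not the `summarize_side_stats` implementation, and the toy values are made up.

```python
import pandas as pd


def side_summary(df: pd.DataFrame, features: list) -> pd.DataFrame:
    """Mean, SD (ddof=1) and n per feature and side from '<feature>_ct' / '<feature>_t' columns."""
    rows = []
    for feat in features:
        for side in ("ct", "t"):
            col = df[f"{feat}_{side}"]
            rows.append({"feature": feat, "side": side.upper(),
                         "mean": col.mean(), "sd": col.std(ddof=1), "n": int(col.count())})
    long = pd.DataFrame(rows)
    # One row per feature; columns are (statistic, side) pairs such as ('sd', 'T')
    return long.pivot(index="feature", columns="side", values=["mean", "sd", "n"])


# Made-up values for illustration only
toy_sides = pd.DataFrame({"oap_ct": [18.0, 20.0, 22.0], "oap_t": [14.0, 20.0, 26.0]})
print(side_summary(toy_sides, ["oap"]).round(2))
```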
# === Side comparison: KDEs (CT vs T) and side summary table ===
plot_kdes_side_compare(
df=df_stable,
bw_adjust=1,
min_map_count=None, # Select a min_map_count if desired (use df instead of df_stable)
savepath=FIG_DIR / "kde_side_compare.png",
savepath_svg=FIG_DIR / "kde_side_compare.svg",
)
# Side summary table
stats_tbl = summarize_side_stats(df_stable, min_map_count=None)
display(stats_tbl.round(2))
(stats_tbl.round(3)
.to_csv(RESULTS_DIR / "tables" / "side_kde_summary.csv"))
|  | Feature | CT μ | CT σ | T μ | T σ | n |
|---|---|---|---|---|---|---|
| 0 | Time Alive Per Death (TAPD) (s) | 60.67 | 3.34 | 60.29 | 4.60 | 84 |
| 1 | Opening Attempt % (OAP) | 20.18 | 3.50 | 20.17 | 5.90 | 84 |
| 2 | Proportion of Deaths Traded % (PODT) | 19.66 | 2.82 | 24.27 | 3.33 | 84 |
| 3 | Proportion of Kills that are Trades % (POKT) | 16.07 | 2.13 | 23.66 | 3.72 | 84 |
| 4 | Distance to Nearest Teammate (ADNT) – rank | 0.60 | 0.11 | 0.60 | 0.14 | 84 |
| 5 | Distance from Average Teammate (ADAT) – rank | 0.60 | 0.15 | 0.60 | 0.15 | 84 |
Observations
- T side shows a noticeably wider spread across most features, suggesting greater behavioural diversity.
- The clearest side difference appears in the trading measures, where T values are generally higher.
- Most distributions are smooth and single-peaked, while T-side spacing (ADAT) stands out as distinctly bimodal, hinting at mixed positional tendencies.
Roles — T Side (KDE) + Summary Table
2×3 KDE grid for T-side overlaid by role. Use t_roles to include specific roles; roles with low n are excluded via min_role_n. The table below gives μ/σ/n by role × feature.
# T roles to choose from: "AWPer","Spacetaker","Lurker","Half-Lurker"
t_roles = ["AWPer","Spacetaker","Lurker","Half-Lurker"]
plot_kdes_roles_by_side(
df=df_stable, side="t",
roles_order=t_roles, min_role_n=8, bw_adjust=1.2, min_map_count=None,
savepath=FIG_DIR / "kde_roles_T.png",
savepath_svg=FIG_DIR / "kde_roles_T.svg",
)
# Summary table for T side, use wide=False for long format (easier role comparisons)
tbl_roles_t = summarize_role_stats_by_side(
df_stable, side="t", roles_order=t_roles, min_role_n=8,
round_to=2, wide=True
)
display(tbl_roles_t)
# Optional save
tbl_roles_t.to_csv(RESULTS_DIR / "tables" / "role_kde_summary_T.csv", index=False)
|  | Feature | AWPer μ | AWPer σ | Half-Lurker μ | Half-Lurker σ | Lurker μ | Lurker σ | Spacetaker μ | Spacetaker σ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Distance from Average Teammate (ADAT) – rank | 0.43 | 0.06 | 0.66 | 0.09 | 0.75 | 0.08 | 0.55 | 0.10 |
| 1 | Distance to Nearest Teammate (ADNT) – rank | 0.43 | 0.08 | 0.65 | 0.08 | 0.72 | 0.09 | 0.57 | 0.11 |
| 2 | Opening Attempt % (OAP) | 14.20 | 2.85 | 22.42 | 4.12 | 17.68 | 3.82 | 24.51 | 5.30 |
| 3 | Proportion of Deaths Traded % (PODT) | 21.57 | 2.30 | 25.05 | 2.51 | 24.90 | 3.55 | 24.97 | 3.27 |
| 4 | Proportion of Kills that are Trades % (POKT) | 25.81 | 2.48 | 23.41 | 2.79 | 25.07 | 3.50 | 21.49 | 3.69 |
| 5 | Time Alive Per Death (TAPD) (s) | 64.36 | 4.40 | 57.24 | 3.78 | 61.89 | 3.25 | 58.00 | 3.77 |
Observations (T-side Roles)
- AWPers tend to have more distinct distributions, with relatively sharp peaks suggesting less intra-role playstyle diversity.
- Positional features display the most separation between all roles.
- All roles overlap with all other roles, implying inherent fluidity in playstyle.
- Positionally, the more ambiguous role "Half-Lurker" tends to sit between the aggressive and passive roles (Spacetaker/Lurker), with AWPers positioning closest to teammates.
Roles — CT Side (KDE) + Summary Table
2×3 KDE grid for CT-side overlaid by role. Use ct_roles to include specific roles; roles with low n are excluded via min_role_n. The table below gives μ/σ/n by role × feature.
# CT roles to choose from: "AWPer","Anchor","Rotator","Mixed"
# Toggle which roles to include (optional)
ct_roles = ["AWPer","Anchor","Rotator","Mixed",]
plot_kdes_roles_by_side(
df=df_stable, side="ct",
roles_order=ct_roles, min_role_n=8, bw_adjust=1.2, min_map_count=None,
savepath=FIG_DIR / "kde_roles_CT.png",
savepath_svg=FIG_DIR / "kde_roles_CT.svg",
)
# Summary table for CT side, use wide=False for long format (easier role comparisons)
tbl_roles_ct = summarize_role_stats_by_side(
df_stable, side="ct", roles_order=ct_roles, min_role_n=8,
round_to=2, wide=True
)
display(tbl_roles_ct)
# Optional save
tbl_roles_ct.to_csv(RESULTS_DIR / "tables" / "role_kde_summary_CT.csv", index=False)
|  | Feature | AWPer μ | AWPer σ | Anchor μ | Anchor σ | Mixed μ | Mixed σ | Rotator μ | Rotator σ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Distance from Average Teammate (ADAT) – rank | 0.53 | 0.03 | 0.77 | 0.10 | 0.63 | 0.10 | 0.49 | 0.11 |
| 1 | Distance to Nearest Teammate (ADNT) – rank | 0.61 | 0.05 | 0.71 | 0.09 | 0.61 | 0.08 | 0.51 | 0.10 |
| 2 | Opening Attempt % (OAP) | 21.37 | 3.35 | 17.80 | 3.11 | 19.38 | 2.97 | 21.76 | 3.15 |
| 3 | Proportion of Deaths Traded % (PODT) | 18.80 | 2.69 | 19.06 | 2.44 | 20.42 | 3.46 | 20.15 | 2.60 |
| 4 | Proportion of Kills that are Trades % (POKT) | 15.86 | 2.21 | 15.88 | 2.30 | 16.39 | 2.44 | 16.15 | 1.78 |
| 5 | Time Alive Per Death (TAPD) (s) | 62.05 | 3.42 | 61.95 | 3.21 | 60.40 | 3.13 | 59.03 | 2.85 |
Observations (CT-side Roles)
- Roles appear far less distinct on the CT side, with much more overlap in distributions. This likely reflects the more reactive and fluid nature of CT playstyles.
- Positional features again show the most separation between roles, with AWPers having noticeably sharper peaks, suggesting less intra-role positional diversity.
- Positional features mirror the T-side pattern: the ambiguous role (Mixed) tends to sit between the aggressive and passive roles (Rotator/Anchor).
Interactive Role Profile Radars
Explore standardised radar plots comparing T and CT role profiles across all six features.
Use the interactive controls above the plot to toggle uncertainty bands and baseline visibility.
Notes:
- Click on a role to toggle it
- Bands are best used when looking at one or two roles at a time
- Double-Click a plot to return the view back to normal
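One common way to standardise radar profiles is to z-score each feature within a side and then average by role, so 0 is the side average and the units are standard deviations. The plotting module may normalise differently (for example min-max scaling), so this is an illustration under that assumption; `role_profiles_z` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def role_profiles_z(df: pd.DataFrame, side: str, features: list) -> pd.DataFrame:
    """Role means of z-scored '<feature>_<side>' columns (0 = side average, units = SD)."""
    cols = [f"{f}_{side}" for f in features]
    z = (df[cols] - df[cols].mean()) / df[cols].std(ddof=1)
    z.columns = features
    # Group the z-scores by the side-specific role label
    return z.groupby(df[f"role_{side}"]).mean()


# Made-up values for illustration only
toy_roles = pd.DataFrame({
    "role_t": ["Lurker", "Lurker", "Spacetaker", "Spacetaker"],
    "oap_t": [16.0, 18.0, 24.0, 26.0],
    "tapd_t": [63.0, 61.0, 58.0, 57.0],
})
prof = role_profiles_z(toy_roles, "t", ["oap", "tapd"])
print(prof.round(2))
```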
# === Role radar plots (interactive) ===
fig = plot_role_radars_interactive(df, min_map_count=MIN_MAPS)
fig.show(renderer="notebook")
# Static PNG export requires Chrome and Kaleido; commented out for portability
# fig.write_image(FIG_DIR / "role_radars_t_vs_ct.png", width=1500, height=840)
# Saving as HTML
fig.write_html(FIG_DIR / "role_radars_t_vs_ct.html", include_plotlyjs="embed")
Observations
- T side AWPers appear to have the most unique playstyle profile of all roles, aligning with distribution observations.
- Profiles of the more "passive" roles (Lurker, Anchor) appear very similar to each other across sides.
- The "aggressive" roles (Spacetaker, Rotator) also appear to have similar profile shapes.
- The positional features (ADNT/ADAT) seem to delineate roles best, aligning with distribution observations.
Summary and Interpretations
The presence of clear, interpretable patterns in the playstyle–role data provides quantitative support for two key assumptions:
- The selected behavioural features form a valid representation of player playstyle.
- The expert-assigned role labels capture meaningful differences in player tendencies and responsibilities.
Key insights
For readers familiar with Counter-Strike, much of this will feel intuitive—the figures mostly quantify expectations rather than overturn them. The value here is turning those intuitions into measurable evidence.
Substantial overlap across roles
Counter-Strike has no class system; everyone has the same tools. Playstyle comes from choices within each round, so overlap in feature distributions is expected. Role labels reflect general responsibilities, not hard rules. For example, an individual hyper-aggressive AWPer could resemble a typical Spacetaker on engagement metrics.
An aggression spectrum among rifler roles
On T, Spacetaker ↔ Lurker, and on CT, Rotator ↔ Anchor, broadly map onto aggressive ↔ passive behaviour. The aggressive end tends to take more opening duels (OAP), survive for less time (TAPD), and play closer to teammates (supporting trades). The passive end tends to avoid first contact and hold space more conservatively. These differences are less pronounced on the CT side, presumably because the T side typically determines where engagements take place.
Positioning features separate roles most clearly
This is unsurprising: many roles are fundamentally positional responsibilities (e.g., “Spacetaker” implies taking territory). Even the AWPer, though defined by weapon choice rather than a position, is constrained by the AWP’s cost and handling, typically yielding more conservative spacing and greater reliance on nearby support.
T-side AWPers are the most distinct group
Their distributions are tighter and their positional/engagement profiles stand apart on T, suggesting a more standardised way of playing the role. Adding further behavioural features could confirm or refute this and reveal additional role-specific nuances.
4. Exploratory Hypothesis Checks
TL;DR — Exploratory Hypothesis Checks: This section quantifies a few side- and role-related patterns surfaced in Section 3. We keep it exploratory and lean: effect sizes with 95% CIs, exact p-values (where applicable), and a small set of targeted questions.
What’s included (and why):
Side: Are trades higher on T?
We compare each player’s T vs CT values for the trading metrics (PODT, POKT). A paired, one-sided check on Δ = T − CT estimates how much higher T is, on average. This directly tests the Section 3 observation that trading is more prevalent on T.
Side: Is CT less variable (tighter spread)?
We compare spread between sides per feature (OAP, TAPD, PODT, POKT). Effect size is the SD ratio (CT/T) with a bootstrap CI; inference uses a one-sided permutation on IQR (robust to non-normality). This targets the idea that CT play is more constrained/reactive.
Roles: Aggressive vs Passive contrasts (within side)
We test predefined role contrasts (e.g., Spacetaker vs Lurker on T; Rotator vs Anchor on CT) for features tied to aggression and trading. Primary is a Welch one-sided mean difference (standardized, with CI); we include a Mann–Whitney sensitivity (Cliff’s δ + p) to check robustness with modest n.
Roles: Distinctiveness index (side-specific)
We compute a centroid-distance index in standardized feature space and rank roles within side, highlighting where separation is strongest (often T AWPers). This provides a concise “how distinct is each role?” summary without building a classifier here.
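The distinctiveness index can be sketched as a Mahalanobis distance between each role's centroid and the side-wide centroid in standardised feature space. The body of `compute_role_distinctiveness` is not shown, so the standardisation, covariance estimate and reference centroid used here are assumptions; `role_centroid_mahalanobis` is a hypothetical helper and the data are made up.

```python
import numpy as np
import pandas as pd


def role_centroid_mahalanobis(X: pd.DataFrame, roles: pd.Series) -> pd.Series:
    """Mahalanobis distance from each role's centroid to the overall centroid (one side).

    X: players x features. Covariance is estimated on all players after z-scoring.
    """
    Z = (X - X.mean()) / X.std(ddof=1)            # standardise features
    cov_inv = np.linalg.pinv(np.cov(Z.to_numpy(), rowvar=False))  # pinv guards near-singular cov
    centre = Z.mean().to_numpy()                  # ~0 after standardising
    out = {}
    for role, grp in Z.groupby(roles):
        d = grp.mean().to_numpy() - centre
        out[role] = float(np.sqrt(max(d @ cov_inv @ d, 0.0)))
    return pd.Series(out, name="mahalanobis").sort_values(ascending=False)


# Made-up data: roles A and B are identical, role C sits lower on "adat"
base_oap = [17.0, 19.0, 20.0, 21.0, 23.0]
base_adat = [0.55, 0.58, 0.60, 0.62, 0.65]
X = pd.DataFrame({
    "oap": base_oap * 3,
    "adat": base_adat * 2 + [a - 0.3 for a in base_adat],
})
roles = pd.Series(["A"] * 5 + ["B"] * 5 + ["C"] * 5, index=X.index)
dist = role_centroid_mahalanobis(X, roles)
print(dist.round(3))
```

Ranking the resulting series within a side gives the "how distinct is each role?" ordering; in this toy example role C's centroid is twice as far from the centre as A's or B's.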
Framing: These are post-hoc exploratory checks that quantify patterns; they are not pre-registered confirmatory tests. We emphasize effect sizes and uncertainty over dichotomous decisions.
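For the role contrasts, a minimal sketch of the Welch plus Mann–Whitney pairing with Cliff's δ is shown below. It assumes SciPy 1.6 or later (for the `alternative` argument), uses the average of the two sample variances for the standardised difference (one of several conventions), and omits the CI for brevity; `role_contrast` and `cliffs_delta` are hypothetical helpers, not the `hypo_tests` implementation, and the OAP values are made up.

```python
import numpy as np
from scipy import stats


def cliffs_delta(a: np.ndarray, b: np.ndarray) -> float:
    """Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs."""
    diff = np.subtract.outer(a, b)
    return float((np.sum(diff > 0) - np.sum(diff < 0)) / diff.size)


def role_contrast(a, b) -> dict:
    """One-sided contrast with H1: role a scores higher than role b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    welch = stats.ttest_ind(a, b, equal_var=False, alternative="greater")
    mwu = stats.mannwhitneyu(a, b, alternative="greater")
    # Standardised difference using the average of the two sample variances (assumption)
    d = (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return {"diff": a.mean() - b.mean(), "d": d, "p_welch": welch.pvalue,
            "cliffs_delta": cliffs_delta(a, b), "p_mwu": mwu.pvalue}


# Made-up OAP values for illustration only
spacetaker_oap = [24, 27, 22, 30, 25, 21, 26]
lurker_oap = [18, 16, 20, 17, 19, 22, 15]
res = role_contrast(spacetaker_oap, lurker_oap)
print({k: round(float(v), 4) for k, v in res.items()})
```

Cliff's δ ranges from −1 to 1 and depends only on ordering, which is why it pairs naturally with the rank-based Mann–Whitney check.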
#1 Are trades higher on T? (PODT & POKT)
Design. For each player, compute Δ = T − CT for:
- PODT — proportion of deaths that are traded (5s window).
- POKT — proportion of kills which are trades (5s window).
We then run a one-sided paired comparison (H₁: Δ > 0) to estimate the typical increase on T.
Reporting: mean Δ with 95% bootstrap Confidence Interval (CI), exact one-sided p-value, and the number of paired players after the ≥ MIN_MAPS filter. Positive Δ supports “more trading on T,” and the CI shows its magnitude.
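The paired design can be sketched with SciPy: a one-sided `ttest_rel` on T vs CT (equivalent to a one-sample t-test on Δ) plus a percentile bootstrap over players for the CI. This assumes SciPy 1.6 or later; `paired_increase` is a hypothetical helper rather than `test_side_trade_increase`, and the PODT values are made up.

```python
import numpy as np
from scipy import stats


def paired_increase(t_vals, ct_vals, n_boot=5000, seed=0) -> dict:
    """One-sided paired t-test (H1: mean(T - CT) > 0) plus a percentile bootstrap CI."""
    t_vals = np.asarray(t_vals, dtype=float)
    ct_vals = np.asarray(ct_vals, dtype=float)
    delta = t_vals - ct_vals
    res = stats.ttest_rel(t_vals, ct_vals, alternative="greater")
    rng = np.random.default_rng(seed)
    # Resample players (their deltas) with replacement
    idx = rng.integers(0, delta.size, size=(n_boot, delta.size))
    boot_means = delta[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return {"n": delta.size, "mean_delta": delta.mean(), "ci95_lo": lo,
            "ci95_hi": hi, "t_stat": res.statistic, "p_one_sided": res.pvalue}


# Made-up PODT values (%) for illustration only
t_toy = [24.0, 26.5, 22.0, 25.0, 27.0, 23.5]
ct_toy = [20.0, 21.0, 19.5, 20.5, 22.0, 19.0]
print(paired_increase(t_toy, ct_toy))
```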
# === Hypothesis test: is trading higher on T? ===
# Reuse MIN_MAPS from Section 2 (minimum maps to include a player in the test)
res_trades = test_side_trade_increase(df, min_maps=MIN_MAPS)
display(res_trades)
res_trades.to_csv(TAB_DIR / "hypo_test_t_more_trading.csv", index=False)
|  | feature | n | mean_delta | ci95_lo | ci95_hi | t_stat | p_one_sided | alternative | test | note |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PODT | 84 | 4.609876 | 3.898434 | 5.350810 | 12.486384 | 4.732547e-21 | T > CT | paired_t (mean Δ) | Players with ≥40 maps; Δ = podt_t − podt_ct |
| 1 | POKT | 84 | 7.587971 | 6.826470 | 8.352514 | 19.608965 | 3.363318e-33 | T > CT | paired_t (mean Δ) | Players with ≥40 maps; Δ = pokt_t − pokt_ct |
Summary. Both trading metrics are higher on T at the player level:
- PODT: mean Δ ≈ +4.6 pp (95% CI +3.90 to +5.35), one-sided p ≈ 4.7e-21.
- POKT: mean Δ ≈ +7.6 pp (95% CI +6.83 to +8.35), one-sided p ≈ 3.4e-33.
Interpretation. These effect sizes are consistent with Section 3’s interpretation that T sides trade more—both in how often deaths are traded (PODT) and how often kills are trades (POKT). The paired design isolates within-player differences, so the reported lifts reflect a typical per-player increase on T rather than differences driven by player mix.
#2 Are CT playstyle features tighter (less variable) than T?
What we measure
- Effect size: Standard Deviation ratio = SD(CT) / SD(T) across the same set of players. Values below 1 mean CT is tighter. We show a 95% bootstrap confidence interval for this ratio using a paired bootstrap on log-ratios.
- Directional p-value (robust): One-sided paired permutation test on the IQR difference (IQR_CT − IQR_T). If side has no effect on spread, swapping the CT/T labels within each player should not systematically change this difference. We report the probability of obtaining a value as small as or smaller than the observed one (alternative: CT spread < T spread).
Why this combo? SD ratio is intuitive for the effect size; IQR is robust to outliers for the p-value. Together: an interpretable magnitude plus an honest, distribution-free check.
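Both pieces can be sketched for one feature, assuming per-player CT and T arrays, a percentile CI on the bootstrapped log SD ratio, and a permutation that swaps each player's CT/T values with probability 0.5. The reported p-values are consistent with 10,000 permutations and an add-one correction (e.g. OAP's 0.000100 ≈ 1/10001, the smallest attainable value), so the sketch uses those settings; this is an inference, not something the source states. `ct_tighter_test` is a hypothetical name.

```python
import numpy as np


def iqr(x):
    """Interquartile range (75th minus 25th percentile)."""
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25


def ct_tighter_test(ct, t, n_boot=10_000, n_perm=10_000, seed=0):
    """Hypothetical helper: SD(CT)/SD(T) with a paired-bootstrap 95% CI
    (percentile CI on the log ratio) and a one-sided paired permutation
    p-value for the alternative IQR_CT - IQR_T < 0."""
    ct = np.asarray(ct, dtype=float)
    t = np.asarray(t, dtype=float)
    n = len(ct)
    rng = np.random.default_rng(seed)

    ratio = ct.std(ddof=1) / t.std(ddof=1)

    # Paired bootstrap: resample players, keeping each CT/T pair together
    log_ratios = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, n, size=n)
        log_ratios[b] = np.log(ct[i].std(ddof=1) / t[i].std(ddof=1))
    lo, hi = np.exp(np.percentile(log_ratios, [2.5, 97.5]))

    # Permutation: under H0 the CT/T labels are exchangeable within a player,
    # so swap each player's two values at random and recompute the statistic
    observed = iqr(ct) - iqr(t)
    count = 0
    for _ in range(n_perm):
        swap = rng.random(n) < 0.5
        ct_p = np.where(swap, t, ct)
        t_p = np.where(swap, ct, t)
        count += int(iqr(ct_p) - iqr(t_p) <= observed)
    p = (count + 1) / (n_perm + 1)  # add-one correction keeps p > 0

    return ratio, (lo, hi), p


# Synthetic example (not the real data): CT spread well below T spread
rng = np.random.default_rng(3)
ct_feat = rng.normal(20, 2, size=84)
t_feat = rng.normal(20, 6, size=84)
ratio, (lo, hi), p = ct_tighter_test(ct_feat, t_feat)
print(f"SD ratio {ratio:.2f} (95% CI {lo:.2f}-{hi:.2f}), p = {p:.4f}")
```

Working on the log scale makes the bootstrap symmetric in CT and T (a ratio of 0.5 and of 2 are equally far from 1) before the CI is mapped back with `np.exp`.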
# === Cell — #2: CT tighter spread test (SD ratio + perm test on IQR) ===
res_var = test_side_variability_ct_less(df, min_maps=MIN_MAPS)
display(res_var)
res_var.to_csv(TAB_DIR / "hypo_test_ct_tighter_variability.csv", index=False)
| | feature | n_players | sd_ct | sd_t | sd_ratio_ct_over_t | ci95_lo_ratio | ci95_hi_ratio | iqr_ct | iqr_t | p_one_sided_perm_IQR |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OAP | 84 | 3.496475 | 5.903650 | 0.592256 | 0.488798 | 0.713436 | 4.467156 | 8.924401 | 0.000100 |
| 1 | TAPD | 84 | 3.335992 | 4.596741 | 0.725730 | 0.608445 | 0.871978 | 4.089794 | 6.483488 | 0.011199 |
| 2 | PODT | 84 | 2.818093 | 3.328184 | 0.846736 | 0.684628 | 1.054527 | 3.974669 | 4.594908 | 0.279072 |
| 3 | POKT | 84 | 2.126360 | 3.719068 | 0.571745 | 0.457596 | 0.716249 | 2.493735 | 4.831818 | 0.024998 |
Summary. CT distributions show tighter spread on most features:
- OAP: SD(CT)/SD(T) ≈ 0.59 (95% CI 0.49–0.71), permutation p ≈ 0.0001 → strong evidence CT is tighter.
- TAPD: SD ratio ≈ 0.73 (95% CI 0.61–0.87), p ≈ 0.011 → CT moderately tighter.
- PODT: SD ratio ≈ 0.85 (95% CI 0.68–1.05), p ≈ 0.279 → CI includes 1.00; no clear side difference.
- POKT: SD ratio ≈ 0.57 (95% CI 0.46–0.72), p ≈ 0.025 → CT tighter.
Read: These results support Section 3’s interpretation that CT play is generally less variable, especially for opening duels (OAP), time alive per death (TAPD) and trade-kill behaviour (POKT). For PODT, the CI spans 1.00, so the direction is uncertain.
#3 Do "aggressive" and "passive" roles show distinct behavioral profiles?
What we test
- Roles compared:
- T-side: Spacetaker (aggressive) vs Lurker (passive)
- CT-side: Rotator (aggressive) vs Anchor (passive)
- Features: All six behavioral metrics (OAP, TAPD, PODT, POKT, ADNT_rank, ADAT_rank) for the respective side.
- Statistical tests:
- Welch's t-test (two-sided): Compares means without assuming equal variances. Reports t-statistic and p-value.
- Mann-Whitney U test (two-sided): Nonparametric alternative, robust to non-normality. Provides sensitivity check on p-values.
- Cohen's d: Standardised mean difference (effect size). Values around 0.2, 0.5, and 0.8 are typically considered small, medium, and large effects respectively.
Why this approach? Welch's t-test handles unequal group sizes and variances, and Cohen's d gives an interpretable standardised effect size alongside it. The Mann–Whitney U test checks that the conclusions hold under weaker distributional assumptions. Together: a parametric estimate with nonparametric validation.
Alternative comparisons: The function accepts any role pair with side suffix (e.g., 'AWPer_t' vs 'AWPer_ct' to compare the same role across sides, or any other combination like 'AWPer_t' vs 'Spacetaker_t').
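A minimal sketch of the three statistics for one feature, assuming two arrays of per-player values (one per role). It uses the pooled-SD form of Cohen's d; whether `test_role_contrast` uses a pooled or an averaged SD is an assumption. In SciPy ≥ 1.7, `mannwhitneyu` returns U for the first sample, so U above n_a·n_b/2 means the first role tends to rank higher, which is consistent with the OAP_T row (U = 634 of 31 × 24 = 744 pairs). `role_contrast` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def role_contrast(a, b):
    """Hypothetical helper: Cohen's d (pooled SD), Welch's t-test and
    Mann-Whitney U (both two-sided) for two independent groups."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    d = (a.mean() - b.mean()) / pooled_sd
    welch = stats.ttest_ind(a, b, equal_var=False)          # Welch's t-test
    mw = stats.mannwhitneyu(a, b, alternative="two-sided")   # rank-based check
    return {"cohens_d": float(d), "t_stat": float(welch.statistic),
            "p_welch": float(welch.pvalue), "U_stat": float(mw.statistic),
            "p_mw": float(mw.pvalue)}


# Synthetic OAP-like values (not the real data)
rng = np.random.default_rng(2)
spacetaker_oap = rng.normal(24.5, 4.5, size=31)
lurker_oap = rng.normal(17.7, 4.5, size=24)
oap_demo = role_contrast(spacetaker_oap, lurker_oap)
print(oap_demo)
```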
# Role contrast: Aggressive vs Passive roles
# T-side: Spacetaker (aggressive) vs Lurker (passive)
result_t = test_role_contrast(
df=df,
role_a='Spacetaker_t',
role_b='Lurker_t',
min_maps=MIN_MAPS
)
print("T-Side Role Contrast: Spacetaker vs Lurker")
print("=" * 80)
display(result_t)
result_t.to_csv(RESULTS_DIR / "tables" / "hypo_test_t_spac_vs_lurk.csv", index=False)
T-Side Role Contrast: Spacetaker vs Lurker
================================================================================
| | feature | n_Spacetaker | n_Lurker | mean_Spacetaker | mean_Lurker | cohens_d | t_stat | p_welch | U_stat | p_mw |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OAP_T | 31 | 24 | 24.509 | 17.683 | 1.447 | 5.545 | 9.59e-07 | 634.0 | 9.08e-06 |
| 1 | TAPD_T | 31 | 24 | 58.001 | 61.890 | -1.095 | -4.105 | 0.0001 | 167.0 | 0.0005 |
| 2 | PODT_T | 31 | 24 | 24.966 | 24.899 | 0.020 | 0.073 | 0.9425 | 375.0 | 0.9662 |
| 3 | POKT_T | 31 | 24 | 21.489 | 25.073 | -0.993 | -3.677 | 0.0006 | 186.0 | 0.0016 |
| 4 | ADNT_RANK_T | 31 | 24 | 0.572 | 0.722 | -1.498 | -5.602 | 8.15e-07 | 107.0 | 7.16e-06 |
| 5 | ADAT_RANK_T | 31 | 24 | 0.545 | 0.755 | -2.205 | -8.351 | 3.09e-11 | 46.0 | 3.31e-08 |
Observations:
- OAP: Spacetakers engage opening duels substantially more (24.5% vs 17.7%, d = 1.45, p < 0.001).
- TAPD: Spacetakers die ~3.9s earlier (58.0s vs 61.9s, d = -1.09, p < 0.001).
- PODT: No difference in being traded (both ~25%, d = 0.02, p = 0.94).
- POKT: A larger share of Lurkers' kills are trades (25.1% vs 21.5%, d = -0.99, p < 0.001).
- Positioning: Lurkers position further from nearest teammate (ADNT: d = -1.50) and from team centroid (ADAT: d = -2.20), both p < 0.001. Largest effect sizes observed.
# CT-side: Rotator (aggressive) vs Anchor (passive)
result_ct = test_role_contrast(
df=df,
role_a='Rotator_ct',
role_b='Anchor_ct',
min_maps=MIN_MAPS
)
print("CT-Side Role Contrast: Rotator vs Anchor")
print("=" * 80)
display(result_ct)
result_ct.to_csv(RESULTS_DIR / "tables" / "hypo_test_ct_rot_vs_anc.csv", index=False)
CT-Side Role Contrast: Rotator vs Anchor
================================================================================
| | feature | n_Rotator | n_Anchor | mean_Rotator | mean_Anchor | cohens_d | t_stat | p_welch | U_stat | p_mw |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OAP_CT | 28 | 21 | 21.756 | 17.804 | 1.261 | 4.376 | 7.44e-05 | 484.0 | 0.0001 |
| 1 | TAPD_CT | 28 | 21 | 59.032 | 61.949 | -0.969 | -3.299 | 0.0020 | 142.0 | 0.0022 |
| 2 | PODT_CT | 28 | 21 | 20.151 | 19.057 | 0.431 | 1.507 | 0.1388 | 372.0 | 0.1174 |
| 3 | POKT_CT | 28 | 21 | 16.150 | 15.878 | 0.135 | 0.449 | 0.6560 | 295.0 | 0.9919 |
| 4 | ADNT_RANK_CT | 28 | 21 | 0.512 | 0.711 | -2.088 | -7.323 | 3.52e-09 | 41.0 | 3.37e-07 |
| 5 | ADAT_RANK_CT | 28 | 21 | 0.493 | 0.773 | -2.591 | -9.100 | 8.65e-12 | 22.0 | 4.13e-08 |
Observations:
- OAP: Rotators engage opening duels more (21.8% vs 17.8%, d = 1.26, p < 0.0001).
- TAPD: Rotators die ~2.9s earlier (59.0s vs 61.9s, d = -0.97, p = 0.002).
- PODT & POKT: No statistically significant difference in trading metrics (Welch p = 0.14 and 0.66).
- Positioning: Anchors position further from nearest teammate (ADNT: d = -2.09) and from team centroid (ADAT: d = -2.59), both p < 0.001.
- Effect sizes: Positioning effects match or exceed the T-side contrast (|d| = 2.09 and 2.59 vs 1.50 and 2.20), while aggression effects are somewhat smaller (OAP d = 1.26 vs 1.45; TAPD |d| = 0.97 vs 1.09).
#4 Role Distinctiveness: Which roles have the most unique behavioral signatures?
What we measure
- Mahalanobis D²: Squared distance from each role's mean feature vector to the mean of all other roles on the same side. Accounts for feature scales and correlations via the covariance matrix. Higher values = more distinct playstyle.
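- Feature contributions: Squared standardised difference per feature, ((mean_role - mean_others) / pooled_SD)². Shows which behaviors drive distinctiveness. These terms ignore feature correlations, so they sum to D² only when features are uncorrelated. With strongly correlated features such as ADNT and ADAT, shared variance is counted twice and the sum can far exceed D² (T AWPer: ≈ 8.8 vs D² = 4.40); deviations that run against a correlation can instead make D² larger than the sum (CT AWPer: ≈ 0.95 vs D² = 1.90).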
Why Mahalanobis distance? Unlike simple Euclidean distance, it automatically handles different feature scales (OAP is a percentage, TAPD is in seconds) and doesn't double-count correlated features (e.g., ADNT and ADAT). In effect, it measures distance after rescaling and decorrelating the features, so a given D² is equally "surprising" whichever combination of features produces it.
Interpretation: Roles are compared within their tactical context (T roles vs other T roles; CT roles vs other CT roles). This isolates role-specific distinctiveness from side-level behavioral differences. Comparing maximum D² values across sides reveals whether one side's roles are generally more distinct than the other's.
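The gap between D² and the summed per-feature contributions can be reproduced on synthetic data. The sketch assumes D² is computed with the pooled covariance of the role and the remaining players; the estimator inside `compute_role_distinctiveness` is not shown, so treat this as an illustration of the technique, and `role_distinctiveness` as a hypothetical name.

```python
import numpy as np


def role_distinctiveness(X_role, X_others):
    """Hypothetical helper: Mahalanobis D² between one role's mean and the
    mean of all other players on the same side (pooled covariance, an
    assumption), plus per-feature squared standardised differences."""
    X_role = np.asarray(X_role, dtype=float)
    X_others = np.asarray(X_others, dtype=float)
    n1, n2 = len(X_role), len(X_others)
    delta = X_role.mean(axis=0) - X_others.mean(axis=0)

    # Pooled covariance: weighted average of the two sample covariances
    S = ((n1 - 1) * np.cov(X_role, rowvar=False)
         + (n2 - 1) * np.cov(X_others, rowvar=False)) / (n1 + n2 - 2)

    d2 = float(delta @ np.linalg.solve(S, delta))  # δᵀ S⁻¹ δ
    contrib = delta**2 / np.diag(S)                 # (δ_j / s_j)², ignores correlations
    return d2, contrib


# Two synthetic features with r = 0.9 (similar to ADNT/ADAT on T)
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
others = rng.multivariate_normal([0, 0], cov, size=2000)
along = rng.multivariate_normal([0.5, 0.5], cov, size=2000)     # shift with the correlation
against = rng.multivariate_normal([0.5, -0.5], cov, size=2000)  # shift against it

d2_along, c_along = role_distinctiveness(along, others)
d2_against, c_against = role_distinctiveness(against, others)
print(f"along:   D2 = {d2_along:.2f}, sum of contributions = {c_along.sum():.2f}")
print(f"against: D2 = {d2_against:.2f}, sum of contributions = {c_against.sum():.2f}")
```

With the population values, D² is about 0.26 for the shift along the correlation and 5.0 for the shift against it, while the per-feature sum is 0.5 in both cases. The large synthetic samples only keep the estimates close to those values; real role groups (n = 12 to 31) are much noisier.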
# T-side distinctiveness
distinct_t = compute_role_distinctiveness(
df=df,
side='t',
min_maps=MIN_MAPS
)
distinct_t.to_csv(RESULTS_DIR / "tables" / "distinct_t_roles.csv", index=False)
print("T-Side Role Distinctiveness")
print("=" * 80)
display(distinct_t)
T-Side Role Distinctiveness
================================================================================
| | role | n | mahal_d_sq | oap_contrib | tapd_contrib | podt_contrib | pokt_contrib | adnt_contrib | adat_contrib |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AWPer | 17 | 4.396 | 1.607 | 1.234 | 1.036 | 0.523 | 2.318 | 2.052 |
| 1 | Lurker | 24 | 2.999 | 0.349 | 0.237 | 0.070 | 0.282 | 1.576 | 2.199 |
| 2 | Spacetaker | 31 | 2.126 | 1.354 | 0.623 | 0.110 | 0.858 | 0.082 | 0.308 |
| 3 | Half-Lurker | 12 | 0.922 | 0.196 | 0.600 | 0.074 | 0.006 | 0.211 | 0.223 |
T-side observations:
- AWPer is most distinct (D² = 4.40), driven primarily by positioning (ADNT: 2.32, ADAT: 2.05) and aggression metrics (OAP: 1.61, TAPD: 1.23). All features contribute meaningfully.
- Lurker shows moderate distinctiveness (D² = 3.00), primarily from positioning (ADAT: 2.20, ADNT: 1.58). Aggression and trading metrics contribute minimally.
- Spacetaker is moderately distinct (D² = 2.13), driven by aggression and trading metrics (OAP: 1.35, POKT: 0.86, TAPD: 0.62) rather than positioning.
- Half-Lurker is least distinct (D² = 0.92), with modest TAPD contribution (0.60) and minimal separation on other features.
# CT-side distinctiveness
distinct_ct = compute_role_distinctiveness(
df=df,
side='ct',
min_maps=MIN_MAPS
)
distinct_ct.to_csv(RESULTS_DIR / "tables" / "distinct_ct_roles.csv", index=False)
print("CT-Side Role Distinctiveness")
print("=" * 80)
display(distinct_ct)
CT-Side Role Distinctiveness
================================================================================
| | role | n | mahal_d_sq | oap_contrib | tapd_contrib | podt_contrib | pokt_contrib | adnt_contrib | adat_contrib |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Anchor | 21 | 2.746 | 0.822 | 0.263 | 0.082 | 0.015 | 1.693 | 2.462 |
| 1 | AWPer | 17 | 1.895 | 0.182 | 0.272 | 0.148 | 0.016 | 0.002 | 0.332 |
| 2 | Rotator | 28 | 1.757 | 0.456 | 0.539 | 0.068 | 0.003 | 1.408 | 1.233 |
| 3 | Mixed | 18 | 0.375 | 0.084 | 0.010 | 0.117 | 0.035 | 0.005 | 0.074 |
CT-side observations:
- Anchor is most distinct (D² = 2.75), driven primarily by positioning (ADAT: 2.46, ADNT: 1.69) with moderate OAP contribution (0.82).
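- AWPer shows moderate distinctiveness (D² = 1.90) even though every single-feature contribution is small (largest: ADAT 0.33, TAPD 0.27; ADNT only 0.002). The contributions sum to only ≈ 0.95, so the separation most likely comes from an unusual combination of feature values rather than any one extreme; this is consistent with Section 6, where CT AWPers hold positions closer to the team center than their isolation predicts.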
- Rotator is moderately distinct (D² = 1.76), driven by positioning (ADNT: 1.41, ADAT: 1.23) and TAPD (0.54).
- Mixed is least distinct (D² = 0.38), with minimal separation across all features—consistent with its label as a hybrid/flexible role.
Summary and Interpretations
The exploratory hypothesis checks quantify key patterns observed in Section 3, providing statistical support for claims about side differences, role separation, and behavioral distinctiveness.
Side-level differences confirmed
T-side trades more consistently
Both PODT and POKT are significantly higher on T (mean Δ ≈ +4.6 pp and +7.6 pp respectively, both p < 0.001). This is consistent with T-side's coordinated aggression: attackers push together, so deaths are more often traded and kills more often come from follow-up engagements. CT-side trading is lower (and, for POKT, less variable across players), consistent with defenders reacting to attacks rather than initiating contact.
CT-side playstyles are tighter but less distinct
CT features generally show lower variability (SD ratios 0.57–0.85; permutation p < 0.05 for OAP, TAPD and POKT, with PODT inconclusive), consistent with more constrained defensive positioning and timing. However, role distinctiveness (Mahalanobis D²) is lower on CT: the most distinct CT role (Anchor, D² = 2.75) is less separated than T AWPers (D² = 4.40) or even T Lurkers (D² = 3.00). This suggests CT roles are structurally rigid but behaviorally homogeneous: players hold specific positions but express similar engagement patterns within those constraints.
The aggression spectrum quantified
Role contrast tests confirm Section 3's observation of an aggression ↔ passive continuum:
T-side (Spacetaker vs Lurker): Large effect sizes on OAP (d = 1.45) and positioning (ADNT: d = -1.50, ADAT: d = -2.20). Spacetakers engage early and stay closer to teammates; Lurkers delay contact and occupy peripheral positions. Notably, there is no significant difference in the proportion of deaths traded (PODT: p = 0.94), which runs counter to the expectation that the more "sacrificial" Spacetakers would be traded more often. Two possible explanations: the 5s trade window may be too wide to isolate "true" trades, or professional players may trade each other so reliably that role differences in trading only show up in the proportion of kills that are trades (POKT).
CT-side (Rotator vs Anchor): A similar pattern, with a large OAP effect (d = 1.26) and even stronger positioning effects (ADNT: d = -2.09, ADAT: d = -2.59). Trading metrics show no significant difference. Positioning effect sizes match or exceed the T-side contrast, while aggression effects are somewhat smaller.
Positioning dominates role identity
Distinctiveness analysis reveals that positioning features (ADNT, ADAT) contribute most to role separation:
T AWPers are uniquely distinct (D² = 4.40), with roughly balanced contributions from positioning (ADNT: 2.32, ADAT: 2.05) and aggression (OAP: 1.61, TAPD: 1.23). No other role shows this multi-dimensional separation, which is also visible in their radar chart in Section 3.
Lurkers (D² = 3.00) separate almost entirely on positioning (ADAT: 2.20, ADNT: 1.58), with minimal aggression contributions—consistent with their peripheral, late-contact playstyle.
Spacetakers (D² = 2.13) separate primarily on aggression and trading metrics (OAP: 1.35, POKT: 0.86), reflecting frontline engagement rather than spatial positioning.
CT roles show lower peak distinctiveness and rely heavily on positioning: Anchors (D² = 2.75) separate via ADAT (2.46) and ADNT (1.69). CT AWPers show almost no ADNT separation (0.002), suggesting they are not unusually isolated from their nearest teammate, although Section 6 shows they sit closer to the team center than that isolation predicts.
Key takeaway: T AWPers emerge as the most behaviorally standardised and distinct role, supporting Section 3's hypothesis. Their weapon constraints and economic cost plausibly enforce a recognisable playstyle across players. In contrast, CT roles, particularly "Mixed" (D² = 0.38), show minimal separation, reflecting the reactive, context-dependent nature of CT-side play where positional assignments exist but behavioral execution converges.
Implications for modelling
These findings suggest:
- T-side roles should cluster more cleanly in unsupervised analysis due to higher distinctiveness and separation.
- Positioning features will be critical for role classification, especially for Lurkers and Anchors.
- CT-side classification will be harder due to lower distinctiveness and greater behavioral homogeneity—expect higher confusion between Rotator/Anchor/Mixed.
Note on label corrections: After correcting two mislabelled players in the original dataset (Rotator ↔ Anchor swaps), the CT-side results strengthened noticeably. Anchor distinctiveness increased from D² ≈ 2.1 to D² = 2.75, and the positioning effect sizes in the Rotator vs Anchor contrast grew from |d| ≈ 1.7–2.0 to |d| > 2.0 (ADNT: -2.09, ADAT: -2.59). These improvements suggest the original labels contained noise that compressed role separation; the corrected labels better reflect the underlying behavioral differences between CT roles.
5. Feature Correlations
Purpose: This section examines pairwise Pearson correlations between behavioral features on each side. Understanding feature relationships informs downstream modelling choices—particularly whether features provide independent signal or share underlying structure. We compute correlations separately for T-side and CT-side to capture any side-specific patterns in how features co-vary.
Methodology: Correlations are computed on the stable cohort (map_count ≥ 40, n=84) to ensure reliable estimates. We display both sides in a split-diagonal heatmap for direct comparison: each cell's upper-right triangle shows the T-side correlation, while the lower-left triangle shows CT-side. The diagonal uses side colors to clearly present this structure.
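One way to build such a split-cell heatmap with plain matplotlib patches is sketched below, assuming `corr_t` and `corr_ct` are square DataFrames with identical labels. It is not the notebook's `plot_correlation_split_heatmap`: it colours the diagonal by r rather than by side and omits a colorbar. Building a `matplotlib.figure.Figure` directly avoids depending on a pyplot backend; it needs matplotlib ≥ 3.5 for `matplotlib.colormaps` and `set_xticks(..., labels)`. `split_diagonal_heatmap` is a hypothetical name.

```python
import matplotlib
import numpy as np
import pandas as pd
from matplotlib.colors import Normalize
from matplotlib.figure import Figure
from matplotlib.patches import Polygon


def split_diagonal_heatmap(corr_t, corr_ct, cmap_name="RdBu_r"):
    """Hypothetical sketch: draw every cell as two triangles, upper-right
    coloured by the T-side r and lower-left by the CT-side r."""
    labels = list(corr_t.columns)
    k = len(labels)
    cmap = matplotlib.colormaps[cmap_name]
    norm = Normalize(vmin=-1, vmax=1)

    fig = Figure(figsize=(6, 6))
    ax = fig.add_subplot()
    for i in range(k):      # row index
        for j in range(k):  # column index
            # After invert_yaxis(), y = i is the top edge of row i
            upper_right = [(j, i), (j + 1, i), (j + 1, i + 1)]
            lower_left = [(j, i), (j, i + 1), (j + 1, i + 1)]
            ax.add_patch(Polygon(upper_right, closed=True,
                                 facecolor=cmap(norm(corr_t.iloc[i, j]))))
            ax.add_patch(Polygon(lower_left, closed=True,
                                 facecolor=cmap(norm(corr_ct.iloc[i, j]))))
    ax.set_xlim(0, k)
    ax.set_ylim(0, k)
    ax.invert_yaxis()
    ax.set_aspect("equal")
    ax.set_xticks(np.arange(k) + 0.5, labels, rotation=45, ha="right")
    ax.set_yticks(np.arange(k) + 0.5, labels)
    return fig, ax


# ADNT/ADAT correlations taken from the tables below
names = ["ADNT_RANK", "ADAT_RANK"]
corr_t_demo = pd.DataFrame([[1.0, 0.919], [0.919, 1.0]], index=names, columns=names)
corr_ct_demo = pd.DataFrame([[1.0, 0.781], [0.781, 1.0]], index=names, columns=names)
fig_demo, ax_demo = split_diagonal_heatmap(corr_t_demo, corr_ct_demo)
```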
# Compute correlation matrices for each side
corr_t = compute_feature_correlations(df, min_maps=MIN_MAPS, side="t")
corr_ct = compute_feature_correlations(df, min_maps=MIN_MAPS, side="ct")
# Display tables
print("T-side Feature Correlations (Pearson):")
display(corr_t.round(3))
print("\n" + "="*60 + "\n")
print("CT-side Feature Correlations (Pearson):")
display(corr_ct.round(3))
# Save tables
corr_t.to_csv(TAB_DIR / "correlation_matrix_t.csv")
corr_ct.to_csv(TAB_DIR / "correlation_matrix_ct.csv")
T-side Feature Correlations (Pearson):
| | TAPD | OAP | PODT | POKT | ADNT_RANK | ADAT_RANK |
|---|---|---|---|---|---|---|
| TAPD | 1.000 | -0.600 | -0.146 | 0.402 | -0.084 | -0.077 |
| OAP | -0.600 | 1.000 | 0.327 | -0.601 | 0.051 | 0.047 |
| PODT | -0.146 | 0.327 | 1.000 | -0.091 | 0.033 | 0.098 |
| POKT | 0.402 | -0.601 | -0.091 | 1.000 | -0.230 | -0.159 |
| ADNT_RANK | -0.084 | 0.051 | 0.033 | -0.230 | 1.000 | 0.919 |
| ADAT_RANK | -0.077 | 0.047 | 0.098 | -0.159 | 0.919 | 1.000 |
============================================================
CT-side Feature Correlations (Pearson):
| | TAPD | OAP | PODT | POKT | ADNT_RANK | ADAT_RANK |
|---|---|---|---|---|---|---|
| TAPD | 1.000 | -0.567 | -0.112 | 0.094 | 0.204 | 0.207 |
| OAP | -0.567 | 1.000 | 0.099 | -0.310 | -0.195 | -0.313 |
| PODT | -0.112 | 0.099 | 1.000 | 0.090 | -0.117 | -0.060 |
| POKT | 0.094 | -0.310 | 0.090 | 1.000 | -0.214 | -0.022 |
| ADNT_RANK | 0.204 | -0.195 | -0.117 | -0.214 | 1.000 | 0.781 |
| ADAT_RANK | 0.207 | -0.313 | -0.060 | -0.022 | 0.781 | 1.000 |
# Generate split-diagonal correlation heatmap
n_stable = (df["map_count"] >= MIN_MAPS).sum()
fig, ax = plot_correlation_split_heatmap(
corr_t=corr_t,
corr_ct=corr_ct,
save_path=FIG_DIR / "correlation_heatmap_split",
title="CT vs T Side Feature Correlations",
n_stable=n_stable,
)
plt.show()
Summary and Interpretations
Overall structure: weak to moderate correlations
Most feature pairs show weak correlations (|r| < 0.3 for 10 of 15 pairs on T and 11 of 15 on CT), indicating that the behavioral dimensions captured by our features are largely linearly unrelated. This is favorable for downstream modelling: features provide distinct perspectives on playstyle rather than redundant information. The few moderate-to-strong correlations that do exist reveal interpretable patterns.
Aggression and timing: moderate inverse relationship
TAPD and OAP correlate negatively on both sides (r = -0.60 T, -0.57 CT), consistent with players who engage in opening duels dying earlier in rounds. This is mechanically expected: early contact increases risk. The similar magnitude across sides suggests this is a general behavioral pattern rather than a side-dependent one.
Trading metrics: side-dependent structure
Trading features (PODT, POKT) show different patterns by side:
- T-side: POKT correlates negatively with OAP (r = -0.60), suggesting that players who initiate openings are less likely to secure trade kills themselves, consistent with entry fraggers creating space for teammates to trade. PODT and POKT are essentially uncorrelated (r = -0.09), indicating these capture different aspects of trading behavior.
- CT-side: The negative correlation between OAP and POKT persists but is weaker (r = -0.31). PODT and POKT are again essentially uncorrelated (r = 0.09).
Positioning features: strong correlation demands further investigation
ADNT and ADAT correlate strongly on both sides (r = 0.92 T, 0.78 CT), indicating that players distant from their nearest teammate also tend to be far from the team centroid. This is geometrically intuitive—peripheral players are both isolated and far from center—but raises a key question: do these features provide redundant information, or does each capture unique aspects of positioning?
The correlation is not perfect (especially on CT: r = 0.78), suggesting residual variance that may relate to role-specific positioning patterns. For instance, a player could be far from the centroid but still near one teammate (lurking near but not with the group), or equidistant from all teammates (true isolation). Section 6 will investigate this relationship directly by fitting ADAT ~ ADNT regressions per side and analyzing the residuals, allowing us to separate shared positional variance from role-specific positioning tendencies.
6. Positional Feature Analysis & Engineering
TL;DR — Positional Feature Analysis & Engineering: We decompose the correlation between ADNT and ADAT via linear regression, extracting positioning residuals that capture players' tendency to position far from (or close to) the team center after accounting for their isolation from the nearest teammate. Role-stratified residual patterns reveal systematic positioning styles, and we save an enriched dataset with these orthogonal features for downstream modelling.
Motivation.
ADNT (distance to nearest teammate) and ADAT (distance from team center) are strongly correlated (ρ = 0.92 for T, 0.78 for CT, from Section 5). This reflects a natural tendency: players isolated from their nearest teammate are often also far from the team center. However, the residual variation—how far from center a player positions given their isolation—may encode strategically meaningful positioning information (e.g., lurkers deliberately positioning away from the group). By extracting this orthogonal component, we create an interpretable positioning feature uncorrelated with ADNT.
What we do.
Fit side-specific linear regressions (T and CT separately) on the stable subset (map_count ≥ 40) to model the ADNT–ADAT relationship. This establishes the "expected" distance from center given isolation level.
Compute residuals for all players, representing positioning tendency independent of isolation. Positive residuals indicate positioning farther from center than predicted; negative residuals indicate staying closer to center.
Visualise the regression relationship interactively, allowing inspection of how individual players and roles cluster around the fitted line, and exploration of which players deviate most from the expected pattern.
Analyse role-stratified residual patterns to identify which roles systematically position away from (or closer to) the team center after accounting for isolation, and which roles exhibit high within-role variability in positioning style.
Why this matters.
Creating an orthogonal positioning feature (residual) alongside ADNT gives us two uncorrelated dimensions that, together, capture the same positioning information as the original ADNT–ADAT pair but without redundancy. This aids interpretability and reduces multicollinearity for downstream modelling. We retain all three features (ADNT, ADAT, residual) in the saved dataset, allowing feature selection to determine the optimal subset for each task.
Output.
- New features: `adat_residual_t`, `adat_residual_ct` added to all players.
- Saved dataset: `cs2_playstyles_2024_with_residuals.parquet` in `data/processed/`, ready for downstream analysis.
- All three positioning features per side (ADNT, ADAT, residual) retained; modelling notebooks will determine optimal feature subsets.
Technical notes.
- Regressions fit only on stable players (≥40 maps) to ensure robust model estimates.
- Residuals computed for all players, including those below the stability threshold, but interpret cautiously for small samples.
- IQR error bars show the middle 50% of players within each role, providing interpretable spread rather than statistical uncertainty in the mean.
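The fit-on-stable, residualise-everyone procedure can be sketched with `np.polyfit`, assuming the column names seen in the residuals table (`adnt_rank_{side}`, `adat_rank_{side}`, `map_count`). `fit_positioning_regressions` and `compute_positioning_residuals` may be implemented differently (e.g. with scikit-learn or statsmodels); `add_adat_residuals` is hypothetical and the data are synthetic.

```python
import numpy as np
import pandas as pd


def add_adat_residuals(df, side, min_maps=40):
    """Hypothetical helper: fit ADAT ~ ADNT by OLS on stable players only,
    then compute residuals for every player (stable or not)."""
    x_col, y_col = f"adnt_rank_{side}", f"adat_rank_{side}"
    stable = df[df["map_count"] >= min_maps]
    # np.polyfit returns coefficients highest power first: [slope, intercept]
    slope, intercept = np.polyfit(stable[x_col], stable[y_col], deg=1)
    out = df.copy()
    # Positive residual = farther from team center than isolation predicts
    out[f"adat_residual_{side}"] = out[y_col] - (slope * out[x_col] + intercept)
    return out, slope, intercept


# Synthetic players (not the real data); some fall below the map threshold
rng = np.random.default_rng(4)
n = 120
players_demo = pd.DataFrame({"map_count": rng.integers(10, 160, size=n),
                             "adnt_rank_t": rng.uniform(0.3, 0.9, size=n)})
players_demo["adat_rank_t"] = (0.98 * players_demo["adnt_rank_t"] + 0.01
                               + rng.normal(0, 0.03, size=n))
players_res, slope, intercept = add_adat_residuals(players_demo, "t", min_maps=40)
print(f"ADAT = {slope:.3f} × ADNT + {intercept:.3f}")
```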
Fitting the linear regressions
models = fit_positioning_regressions(df, MIN_MAPS)
T-side regression: ADAT = 0.979 × ADNT + 0.012
CT-side regression: ADAT = 1.009 × ADNT + -0.006
Produce a dataframe containing the linear regression residuals for both sides
pos_residuals_df = compute_positioning_residuals(df, models)
pos_residuals_df.head()
| | steamid | player_name | team_clan_name | map_count | adnt_rank_t | adat_rank_t | role_t | adnt_rank_ct | adat_rank_ct | role_ct | adat_residual_t | adat_residual_ct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 76561198041683378 | NiKo | G2 Esports | 158 | 0.621199 | 0.695336 | Spacetaker | 0.562493 | 0.547525 | Rotator | 0.074668 | -0.014643 |
| 1 | 76561198012872053 | huNter | G2 Esports | 158 | 0.643004 | 0.615021 | Lurker | 0.480859 | 0.406082 | Rotator | -0.026997 | -0.073699 |
| 2 | 76561198074762801 | m0NESY | G2 Esports | 155 | 0.423733 | 0.409889 | AWPer | 0.577785 | 0.515617 | AWPer | -0.017438 | -0.061984 |
| 3 | 76561197991272318 | ropz | FaZe Clan | 152 | 0.871810 | 0.879350 | Lurker | 0.713902 | 0.761098 | Anchor | 0.013306 | 0.046121 |
| 4 | 76561197997351207 | rain | FaZe Clan | 152 | 0.647796 | 0.616937 | Half-Lurker | 0.607561 | 0.606098 | Mixed | -0.029772 | -0.001556 |
Verify that the computed residuals are uncorrelated with ADNT, as guaranteed by OLS regression properties.
print("Correlation between ADNT and ADAT residuals:\n")
for side in ['t', 'ct']:
adnt_col = f'adnt_rank_{side}'
resid_col = f'adat_residual_{side}'
stable_data = pos_residuals_df[pos_residuals_df['map_count'] >= MIN_MAPS]
corr = stable_data[[adnt_col, resid_col]].corr().iloc[0, 1]
print(f"{side.upper()}-side: ρ(ADNT, residual) = {corr:.4f}")
Correlation between ADNT and ADAT residuals:

T-side: ρ(ADNT, residual) = 0.0000
CT-side: ρ(ADNT, residual) = 0.0000
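The zero correlations follow directly from the OLS normal equations. Writing the fitted model on the stable subset, with $e_i$ the residual for player $i$ and sums over the $n$ players used in the fit:

$$
\mathrm{ADAT}_i = \hat\beta_0 + \hat\beta_1\,\mathrm{ADNT}_i + e_i
$$

the normal equations give

$$
\sum_{i=1}^{n} e_i = 0, \qquad \sum_{i=1}^{n} \mathrm{ADNT}_i\, e_i = 0,
$$

so the sample covariance vanishes:

$$
\sum_{i=1}^{n} \left(\mathrm{ADNT}_i - \overline{\mathrm{ADNT}}\right)\left(e_i - \bar e\right) = \sum_{i=1}^{n} \mathrm{ADNT}_i\, e_i - n\,\overline{\mathrm{ADNT}}\,\bar e = 0.
$$

These identities hold only over the players used to fit the regression, which is why the check filters to `map_count >= MIN_MAPS`; residuals for players below the threshold use the same coefficients but need not be uncorrelated with ADNT.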
Interactive Positioning Scatter: ADNT vs ADAT by Role
Side-by-side scatter plots showing the relationship between distance to nearest teammate (ADNT) and distance from team center (ADAT) for T and CT sides. The regression line shows the expected ADAT given ADNT; points are colored by role. Toggle roles via the legends to isolate specific positioning patterns, and hover over points to view player names and residual values.
- Double-click a plot to return it to the default view; shift-drag to pan around.
fig_positioning = plot_positioning_regression_interactive(pos_residuals_df, models, MIN_MAPS)
fig_positioning.show(renderer="notebook")
fig_positioning.write_html(FIG_DIR / "pos_features_scatter.html", include_plotlyjs="embed")
Observations
T side:
- Lurkers tend to position farther from the team center than predicted by their isolation (positive residuals).
- Spacetakers show the opposite pattern (negative residuals), staying closer to the center despite isolation, though with more variability.
- AWPers and Half-Lurkers show no observable systematic deviation from the regression line.
CT side:
- Residuals show heteroscedasticity: variance decreases as ADNT increases, largely driven by high-variance Rotators at low ADNT and tighter-clustering Anchors at high ADNT. This contributes to the lower overall correlation (ρ = 0.78 vs 0.92 on T).
- Anchors cluster above the line (positive residuals), positioning farther from center than their isolation predicts.
- AWPers show negative residuals, holding positions closer to the team center.
- Rotators exhibit high residual variance with no clear directional tendency, reflecting diverse positioning approaches within the role.
Role-Stratified Residual Analysis
Mean residuals (left) show systematic positioning tendencies after accounting for isolation (ADNT): positive values indicate roles positioning farther from team center than predicted; negative values indicate closer positioning. IQR error bars show the middle 50% of players within each role. Standard deviations (right) quantify within-role variability in positioning style.
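The plotted quantities (per-role mean, IQR bounds and SD of the residual) can be tabulated with a pandas `groupby`, sketched here on a toy frame. Column names follow the residuals table (`role_{side}`, `adat_residual_{side}`, `map_count`); restricting to stable players mirrors the technical notes, but whether `plot_positioning_residuals_by_role` filters the same way is an assumption, and `residual_summary_by_role` is hypothetical.

```python
import pandas as pd


def residual_summary_by_role(df, side, min_maps=40):
    """Hypothetical helper: per-role n, mean, IQR bounds and SD of the
    ADAT residual, computed on stable players only."""
    stable = df[df["map_count"] >= min_maps]
    g = stable.groupby(f"role_{side}")[f"adat_residual_{side}"]
    summary = pd.DataFrame({
        "n": g.size(),
        "mean": g.mean(),
        "q25": g.quantile(0.25),  # lower end of the IQR bar
        "q75": g.quantile(0.75),  # upper end of the IQR bar
        "sd": g.std(),            # sample SD (pandas default ddof=1)
    })
    return summary.sort_values("mean")


# Toy frame (not real players); the last Spacetaker is below 40 maps
toy = pd.DataFrame({
    "map_count": [50, 60, 45, 70, 80, 30],
    "role_t": ["Lurker", "Lurker", "Lurker", "Spacetaker", "Spacetaker", "Spacetaker"],
    "adat_residual_t": [0.05, 0.07, 0.03, -0.02, -0.04, 0.50],
})
print(residual_summary_by_role(toy, "t"))
```

The 0.50 outlier is dropped by the map filter, so it affects neither the Spacetaker mean nor its SD.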
fig_residuals = plot_positioning_residuals_by_role(pos_residuals_df, models, MIN_MAPS)
plt.show()
Observations from role-stratified residuals:
Mean residuals (systematic tendencies):
- Patterns observed in the scatter plots are confirmed: Lurkers (T) and Anchors (CT) consistently position farther from center than predicted (entirely positive IQRs), while Spacetakers (T) and AWPers (CT) position closer (predominantly negative residuals).
- CT AWPers show the strongest systematic deviation (mean ≈ -0.07), holding positions notably closer to the team center than their isolation alone would predict.
Variance (within-role consistency):
- Most roles exhibit similar residual variance (SD ≈ 0.04–0.07), indicating moderate consistency in positioning style within roles.
- CT Rotators are a clear outlier with substantially higher variance (SD ≈ 0.11), reflecting diverse positioning approaches within this role—consistent with the "rotator" label implying adaptive, context-dependent positioning rather than fixed positioning patterns.
Saving residual features to a new parquet file
# Add residuals to original dataframe
# Use merge to ensure proper alignment on steamid (prevents NaN values from index misalignment)
enriched_df = df.merge(
pos_residuals_df[['steamid', 'adat_residual_t', 'adat_residual_ct']],
on='steamid',
how='left'
)
# Save as parquet
enriched_df.to_parquet(DATA_DIR / "processed" / "cs2_playstyles_2024_with_residuals.parquet", index=False)
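The `merge` above already guards against index misalignment. A slightly more defensive variant, offered as a suggestion rather than the notebook's code, also asks pandas to confirm that `steamid` is unique in both frames (`validate="one_to_one"` raises `pandas.errors.MergeError` otherwise) and uses `indicator=True` to catch players without residuals. `attach_residuals` is a hypothetical name.

```python
import pandas as pd


def attach_residuals(df, residuals):
    """Hypothetical helper: left-join residual columns on steamid, enforcing
    a one-to-one key and failing loudly if any player has no residuals."""
    cols = ["steamid", "adat_residual_t", "adat_residual_ct"]
    merged = df.merge(residuals[cols], on="steamid", how="left",
                      validate="one_to_one", indicator=True)
    unmatched = int((merged["_merge"] != "both").sum())
    if unmatched:
        raise ValueError(f"{unmatched} players have no residuals")
    return merged.drop(columns="_merge")


# Toy frames (not real players); residual rows arrive in a different order
players = pd.DataFrame({"steamid": [1, 2], "map_count": [50, 60]})
resid = pd.DataFrame({"steamid": [2, 1],
                      "adat_residual_t": [0.1, -0.1],
                      "adat_residual_ct": [0.0, 0.2]})
print(attach_residuals(players, resid))
```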
Summary and Interpretations
Positioning features decomposed via regression
The strong correlation between ADNT and ADAT (r = 0.92 T, 0.78 CT) indicated shared variance—players far from their nearest teammate tend to be far from the team center. By fitting side-specific linear regressions (ADAT ~ ADNT) and extracting residuals, we have successfully decomposed this relationship into two orthogonal components: ADNT (isolation from nearest teammate) and residual (positioning tendency relative to team center, after accounting for isolation). This transformation retains positioning information whilst eliminating multicollinearity, improving interpretability for downstream modelling.
Role-specific positioning tendencies revealed
Residual analysis reveals systematic positioning strategies that distinguish roles beyond simple isolation:
- T-side Lurkers and CT-side Anchors consistently position farther from the team center than their isolation predicts (entirely positive IQRs), reflecting deliberate off-angle or site-holding strategies that prioritise map control over proximity to teammates.
- T-side Spacetakers and CT-side AWPers show the opposite pattern—staying closer to the center despite isolation. CT AWPers exhibit the strongest effect (mean ≈ -0.07), consistent with holding fixed, central sightlines rather than peripheral positions.
- These patterns align with tactical expectations: lurkers operate independently on the periphery; anchors hold isolated sites; spacetakers lead coordinated pushes (staying near the action); AWPers hold key angles with team support.
Within-role variance highlights positioning flexibility
Most roles show moderate positioning consistency (SD ≈ 0.04–0.07), but CT Rotators are a clear outlier with substantially higher variance (SD ≈ 0.11). This reflects the adaptive nature of rotation: players must adjust positioning dynamically based on information, rather than holding fixed positions like Anchors. This high variance validates the "rotator" role label and suggests this role may be harder to characterise via static positioning metrics than other roles.
CT positioning is more heterogeneous than T
The weaker CT correlation (r = 0.78 vs 0.92) and heteroscedastic residuals (variance decreasing with ADNT) indicate more varied positioning relationships on defense. This is driven by role-specific patterns: low-ADNT Rotators show high variance (adaptive positioning), whilst high-ADNT Anchors cluster more tightly (fixed site holds).
Feature engineering outcome
We have added adat_residual_t and adat_residual_ct to the dataset, providing interpretable, orthogonal positioning features for clustering and classification. The enriched dataset (cs2_playstyles_2024_with_residuals.parquet) retains all three positioning features per side (ADNT, ADAT, residual), allowing downstream feature selection to determine optimal subsets empirically. Early indications suggest ADNT + residual may be preferable to ADNT + ADAT due to reduced redundancy, but this will be validated through modelling performance.
Summary of Exploratory Data Analysis and Feature Engineering
In this notebook we:
- Analysed the number of maps required to stabilise the playstyle features (~40). This gives us confidence that further analysis picks up more signal than noise.
- Visualised and quantified (mean/sd) the distribution of features when considering side (T/CT) and assigned "role". This allows for analysis and presentation of how playstyles differ across sides and roles.
- Quantified patterns observed in the feature distributions via exploratory hypothesis tests and Mahalanobis distances. This made our analysis more robust, so we can be more confident in our inferences.
- Computed and visualised correlations between features within each side, enabling some analysis of how features linearly interact pairwise.
- Investigated the relationship between the positional features (ADNT / ADAT), which led to a new residual feature that removes their collinearity for future modelling work.