01 - Exploratory Data Analysis and Feature Engineering¶
Executive Summary¶
This analysis characterises the playstyles of 84 professional CS2 players to determine if distinct in-game roles exhibit measurable statistical signatures.
Key Findings:
- Data Stability: Established a reliability threshold of 40 maps, confirming that playstyle metrics stabilize sufficiently to distinguish signal from noise.
- Role Signatures:
- Lurkers (T) and Anchors (CT) show distinct positioning behaviour, consistently operating far from the team centre even when controlling for isolation.
- Spacetakers (T) and Rotators (CT) exhibit measurably more aggressive profiles, engaging in opening duels significantly more often (OAP) and surviving for less time (TAPD) than their passive counterparts.
- CT Rotators display significantly higher variance in positioning than Anchors, quantitatively validating their role as adaptive/reactive rather than static.
- Feature Engineering: Identified high multicollinearity ($r > 0.9$) between isolation metrics (ADNT) and team-distance metrics (ADAT).
- Solution: Decomposed this relationship using linear regression to create orthogonal residual features.
- Outcome: New features that isolate "positioning preference" from "teammate proximity," theoretically improving discriminative power for downstream classification models.
Outcome: A validated, engineered dataset containing unique behavioural fingerprints for each player, ready for supervised role classification.
Note: All analysis is performed separately for T and CT sides to preserve tactical context and enable side-specific modelling decisions.
Objectives¶
1. Data Stability & Inclusion Criteria
Estimate per-player measurement uncertainty as a function of map count using approximate standard errors (a binomial approximation for rates and a 1/√maps proxy for TAPD). Determine a principled inclusion threshold (MIN_MAPS) balancing precision and sample size.
2. Feature Distributions & Role Profiles
Examine univariate distributions of behavioural metrics by side and role using kernel density estimates. Visualise role-level playstyle signatures via standardised radar plots to assess feature discriminability.
3. Statistical Validation
Quantify key patterns via formal hypothesis testing:
- Side-level differences (T vs CT) in trading behaviour and variability
- Role contrasts (aggressive vs passive) across all behavioural dimensions
- Role distinctiveness via Mahalanobis distance in feature space
4. Feature Correlations
Compute pairwise Pearson correlations to identify shared variance and inform feature selection for clustering and classification.
5. Positional Feature Engineering
Decompose the ADNT–ADAT correlation via side-specific linear regression. Extract positioning residuals that capture role-specific tendencies (peripheral vs central positioning) independent of isolation effects.
6. Dataset Export
Produce a refined, analysis-ready dataset with engineered features, metadata flags, and complete documentation for reproducible downstream modelling.
Outputs¶
Tables & Summaries:
- Data coverage statistics and inclusion threshold justification
- Feature summary statistics by side and role
- Hypothesis test results with effect sizes and confidence intervals
- Correlation matrices (T and CT sides)
- Role distinctiveness rankings with feature contributions
Visualisations:
- Stability curves (standard error vs map count)
- Side-comparison KDEs for all features
- Role-stratified KDEs (T and CT)
- Interactive radar plots (role profiles by side)
- Split-diagonal correlation heatmap
- Positioning regression scatter plots with role stratification
- Role-stratified residual distributions
Processed Dataset:
data/processed/cs2_playstyles_2024_with_residuals.parquet
Contains all original features plus engineered positioning residuals (adat_residual_t, adat_residual_ct). Data remains unscaled to preserve interpretability and enable flexible preprocessing in subsequent notebooks.
Note: This notebook establishes the MIN_MAPS = 40 inclusion threshold and defines the stable cohort (n=84) used throughout the project. The derived residual features provide orthogonal positional information that reduces multicollinearity while retaining role-specific positioning signal.
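As a rough illustration of the residual idea described above, the sketch below regresses ADAT on ADNT for one side and keeps the residual. It runs on synthetic data and uses `np.polyfit`; the project's own `fit_positioning_regressions` and `compute_positioning_residuals` helpers are not shown in this notebook section, so the direction of the regression (ADAT on ADNT) and the fitting method are assumptions here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one side's rank features (illustrative only,
# not the project dataset). ADAT is built to correlate strongly with ADNT.
rng = np.random.default_rng(42)
n = 84
adnt = rng.uniform(0.2, 0.9, size=n)
adat = 0.05 + 0.9 * adnt + rng.normal(0.0, 0.04, size=n)
demo = pd.DataFrame({"adnt_rank_t": adnt, "adat_rank_t": adat})

# Ordinary least squares fit: ADAT ~ intercept + slope * ADNT (assumed direction)
slope, intercept = np.polyfit(
    demo["adnt_rank_t"].to_numpy(), demo["adat_rank_t"].to_numpy(), deg=1
)

# Residual = observed ADAT minus the ADAT predicted from ADNT
demo["adat_residual_t"] = demo["adat_rank_t"] - (
    intercept + slope * demo["adnt_rank_t"]
)

r_before = demo["adnt_rank_t"].corr(demo["adat_rank_t"])
r_after = demo["adnt_rank_t"].corr(demo["adat_residual_t"])
print(f"r(ADNT, ADAT) = {r_before:.3f}; r(ADNT, residual) = {r_after:.1e}")
```

By construction, least-squares residuals are uncorrelated with the predictor, which is why the residual can sit alongside ADNT in a model without reintroducing the collinearity.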
1. Setup & Data Overview¶
No analysis here; feel free to skip to Section 2.
Configure imports, display options, and project paths. Load the dataset and report structure (shape, dtypes, missingness) and coverage tables to anchor later analysis. Plots are deferred to their relevant sections to keep the notebook focused.
# === Setup: paths, imports, theme ===
from pathlib import Path
import sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Dev convenience
%load_ext autoreload
%autoreload 2
# Display options
pd.set_option("display.max_columns", 120)
pd.set_option("display.width", 120)
# Resolve repository root (if in notebooks/, step up one level)
REPO_ROOT = Path.cwd().resolve().parent if Path.cwd().name.lower() == "notebooks" else Path.cwd().resolve()
# Canonical paths
DATA_DIR = REPO_ROOT / "data"
RESULTS_DIR = REPO_ROOT / "results" / "eda"
FIG_DIR = RESULTS_DIR / "figures"
TAB_DIR = RESULTS_DIR / "tables"
# Ensure results dirs exist
FIG_DIR.mkdir(parents=True, exist_ok=True)
TAB_DIR.mkdir(parents=True, exist_ok=True)
# Dataset path
DATA_PATH = DATA_DIR / "raw" / "cs2_playstyle_roles_2024.csv"
assert DATA_PATH.exists(), f"Dataset not found at {DATA_PATH}"
# Local helpers
SRC_DIR = REPO_ROOT / "src"
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))
from style import (
set_mpl_theme,
set_seaborn_theme,
ROLE_COLOURS,
get_role_colour,
)
from stability import (
rates_quantile_table,
tapd_quantile_table,
retained_summary,
)
from viz import (
plot_rates_quantile,
plot_tapd_quantile,
plot_mapcount_hist,
summarize_side_stats,
plot_kdes_roles_by_side,
plot_kdes_side_compare,
summarize_role_stats_by_side,
plot_correlation_split_heatmap,
compute_feature_correlations,
plot_positioning_residuals_by_role
)
from hypo_tests import (
test_side_trade_increase,
test_side_variability_ct_less,
test_role_contrast,
compute_role_distinctiveness,
)
from viz_plotly import (
plot_role_radars_interactive,
plot_positioning_regression_interactive,
)
from features import (
fit_positioning_regressions,
compute_positioning_residuals,
)
# Themes
set_mpl_theme(mode="dark", preferred_font="Georgia")
set_seaborn_theme(mode="dark", preferred_font="Georgia")
# Echo key paths
REPO_ROOT, DATA_PATH, FIG_DIR, TAB_DIR
(WindowsPath('P:/cs2-playstyle-analysis-2024'),
WindowsPath('P:/cs2-playstyle-analysis-2024/data/raw/cs2_playstyle_roles_2024.csv'),
WindowsPath('P:/cs2-playstyle-analysis-2024/results/eda/figures'),
WindowsPath('P:/cs2-playstyle-analysis-2024/results/eda/tables'))
Load & structural overview¶
Inspect shape, preview rows, data types, and missingness. Identify side-suffixed fields for later grouping.
# Load dataset and run structural checks
df = pd.read_csv(DATA_PATH)
# Shape and a compact preview
print("shape:", df.shape)
display(df.head(3))
# Column summary
col_info = (
pd.DataFrame({
"column": df.columns,
"dtype": df.dtypes.astype(str),
"missing_rate": df.isna().mean().round(4)
})
.sort_values(["missing_rate", "column"], ascending=[False, True])
)
display(col_info.head(25)) # preview top 25 by missingness
# Quick counts by dtype for a high-level feel
dtype_counts = col_info["dtype"].value_counts().rename_axis("dtype").to_frame("n")
display(dtype_counts)
# Light check for commonly expected identifiers (report only)
expected = ["steamid", "player_name", "team_clan_name", "map_count"]
present = [c for c in expected if c in df.columns]
missing = [c for c in expected if c not in df.columns]
print("present expected cols:", present)
print("missing expected cols:", missing)
# Detect side-suffixed families (useful for later grouping)
suffixes = ("_t", "_ct", "_overall")
side_cols = {suf: [c for c in df.columns if c.endswith(suf)] for suf in suffixes}
{key: len(val) for key, val in side_cols.items()}
shape: (306, 25)
| | steamid | player_name | team_clan_name | map_count | tapd_ct | tapd_t | tapd_overall | oap_ct | oap_t | oap_overall | podt_ct | podt_t | podt_overall | pokt_ct | pokt_t | pokt_overall | adnt_rank_ct | adnt_rank_t | adnt_rank_overall | adat_rank_ct | adat_rank_t | adat_rank_overall | role_overall | role_t | role_ct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 76561198041683378 | NiKo | G2 Esports | 158 | 60.952893 | 59.136540 | 60.136000 | 24.745965 | 24.093423 | 24.424242 | 21.020276 | 24.857741 | 22.507740 | 17.051295 | 21.586555 | 19.197995 | 0.562493 | 0.621199 | 0.593089 | 0.547525 | 0.695336 | 0.616046 | Lurker | Spacetaker | Rotator |
| 1 | 76561198012872053 | huNter | G2 Esports | 158 | 62.048685 | 62.871661 | 62.589800 | 16.852540 | 14.807692 | 15.847511 | 21.585198 | 27.994772 | 24.696747 | 17.195516 | 27.180894 | 22.538284 | 0.480859 | 0.643004 | 0.571875 | 0.406082 | 0.615021 | 0.455108 | Flex | Lurker | Rotator |
| 2 | 76561198074762801 | m0NESY | G2 Esports | 155 | 62.786553 | 66.632594 | 64.362519 | 23.914373 | 17.754078 | 20.873335 | 19.122381 | 23.094640 | 21.473108 | 17.397469 | 26.423178 | 21.056274 | 0.577785 | 0.423733 | 0.453028 | 0.515617 | 0.409889 | 0.427645 | AWPer | AWPer | AWPer |
| | column | dtype | missing_rate |
|---|---|---|---|
| role_overall | role_overall | object | 0.4183 |
| role_ct | role_ct | object | 0.3856 |
| role_t | role_t | object | 0.3856 |
| adat_rank_ct | adat_rank_ct | float64 | 0.0000 |
| adat_rank_overall | adat_rank_overall | float64 | 0.0000 |
| adat_rank_t | adat_rank_t | float64 | 0.0000 |
| adnt_rank_ct | adnt_rank_ct | float64 | 0.0000 |
| adnt_rank_overall | adnt_rank_overall | float64 | 0.0000 |
| adnt_rank_t | adnt_rank_t | float64 | 0.0000 |
| map_count | map_count | int64 | 0.0000 |
| oap_ct | oap_ct | float64 | 0.0000 |
| oap_overall | oap_overall | float64 | 0.0000 |
| oap_t | oap_t | float64 | 0.0000 |
| player_name | player_name | object | 0.0000 |
| podt_ct | podt_ct | float64 | 0.0000 |
| podt_overall | podt_overall | float64 | 0.0000 |
| podt_t | podt_t | float64 | 0.0000 |
| pokt_ct | pokt_ct | float64 | 0.0000 |
| pokt_overall | pokt_overall | float64 | 0.0000 |
| pokt_t | pokt_t | float64 | 0.0000 |
| steamid | steamid | int64 | 0.0000 |
| tapd_ct | tapd_ct | float64 | 0.0000 |
| tapd_overall | tapd_overall | float64 | 0.0000 |
| tapd_t | tapd_t | float64 | 0.0000 |
| team_clan_name | team_clan_name | object | 0.0000 |
| dtype | n |
|---|---|
| float64 | 18 |
| object | 5 |
| int64 | 2 |
present expected cols: ['steamid', 'player_name', 'team_clan_name', 'map_count']
missing expected cols: []
{'_t': 7, '_ct': 7, '_overall': 7}
Coverage overview¶
Summarise player/team counts and map_count distribution buckets as tables, and record role-label missingness. A binned map_count figure will be shown in the Stability section where its interpretation is most relevant.
# Coverage summary, binned map-count table, role-label missingness, and threshold retention
import pandas as pd
# Coverage summary
coverage = pd.DataFrame({
"n_players": [df["steamid"].nunique()],
"n_teams": [df["team_clan_name"].nunique()],
"mean_maps": [df["map_count"].mean()],
"median_maps": [df["map_count"].median()],
"p10_maps": [df["map_count"].quantile(0.10)],
"p90_maps": [df["map_count"].quantile(0.90)],
"min_maps": [df["map_count"].min()],
"max_maps": [df["map_count"].max()],
}).round(2)
display(coverage)
coverage.to_csv(TAB_DIR / "eda_coverage_summary.csv", index=False)
# Binned map-count table (no plot)
bins = [0, 10, 20, 30, 40, 60, 80, 120, 160, 1_000]
map_bins = pd.cut(df["map_count"], bins=bins, right=False)
map_hist = (
map_bins.value_counts()
.sort_index()
.rename_axis("map_count_bin")
.reset_index(name="n_players")
)
display(map_hist)
map_hist.to_csv(TAB_DIR / "eda_mapcount_hist.csv", index=False)
# Role label missingness per field
role_cols = [c for c in ("role_overall", "role_t", "role_ct") if c in df.columns]
role_missing = df[role_cols].isna().mean().rename("missing_rate").to_frame()
display(role_missing)
role_missing.to_csv(TAB_DIR / "eda_role_missingness.csv")
# Players retained under common min-map thresholds (feeds into stability analysis)
thresholds = [20, 30, 40, 60, 80, 100, 120]
thr_table = (
pd.DataFrame({"min_map_count": thresholds})
.assign(n_players=lambda t: t["min_map_count"].apply(lambda m: (df["map_count"] >= m).sum()))
)
thr_table["share_players"] = (thr_table["n_players"] / df["steamid"].nunique()).round(3)
display(thr_table)
thr_table.to_csv(TAB_DIR / "eda_threshold_coverage.csv", index=False)
| | n_players | n_teams | mean_maps | median_maps | p10_maps | p90_maps | min_maps | max_maps |
|---|---|---|---|---|---|---|---|---|
| 0 | 306 | 61 | 35.72 | 15.0 | 4.0 | 113.0 | 1 | 158 |
| | map_count_bin | n_players |
|---|---|---|
| 0 | [0, 10) | 111 |
| 1 | [10, 20) | 58 |
| 2 | [20, 30) | 23 |
| 3 | [30, 40) | 30 |
| 4 | [40, 60) | 10 |
| 5 | [60, 80) | 31 |
| 6 | [80, 120) | 16 |
| 7 | [120, 160) | 27 |
| 8 | [160, 1000) | 0 |
| | missing_rate |
|---|---|
| role_overall | 0.418301 |
| role_t | 0.385621 |
| role_ct | 0.385621 |
| | min_map_count | n_players | share_players |
|---|---|---|---|
| 0 | 20 | 137 | 0.448 |
| 1 | 30 | 114 | 0.373 |
| 2 | 40 | 84 | 0.275 |
| 3 | 60 | 74 | 0.242 |
| 4 | 80 | 43 | 0.141 |
| 5 | 100 | 33 | 0.108 |
| 6 | 120 | 27 | 0.088 |
2. Stability threshold analysis (Map Count)¶
TL;DR. We set MIN_MAPS = 40 to balance precision and sample size.
Objective. Pick a minimum map threshold where per-player metrics are precise enough for analysis.
Metrics. Rates: oap_overall, podt_overall, pokt_overall. Duration-like: tapd_overall.
*Positional rank metrics (adnt_*, adat_*) are not used for this threshold decision* (see “Omitted metrics” below).
Uncertainty model.
• Rates: per-player standard error (SE) in percentage points (pp) via a binomial approximation with trials ≈ k × maps (rounds for opener; kills/deaths for trades).
• tapd: relative proxy proportional to 1/√maps (arbitrary units); decision is based on the plateau rather than a numeric tolerance.
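The two uncertainty models above can be sketched in a few lines. This is a minimal illustration of the binomial approximation and the 1/√maps proxy, not the code in `stability.py` (which is not shown here); the function names `rate_se_pp` and `tapd_proxy` are hypothetical.

```python
import numpy as np

def rate_se_pp(rate_pct, maps, k):
    """Approximate binomial SE of a rate, in percentage points.

    rate_pct: observed rate in percent (e.g. 20.0 for 20%)
    maps:     number of maps played
    k:        assumed trials per map (mirrors K_OAP / K_PODT / K_POKT)
    """
    p = np.asarray(rate_pct, dtype=float) / 100.0
    n_trials = k * np.asarray(maps, dtype=float)
    return 100.0 * np.sqrt(p * (1.0 - p) / n_trials)

def tapd_proxy(maps):
    """Relative uncertainty proxy for TAPD, proportional to 1/sqrt(maps)."""
    return 1.0 / np.sqrt(np.asarray(maps, dtype=float))

# A 20% rate with k = 20 trials per map, at 40 and 160 maps
se_40 = rate_se_pp(20.0, 40, k=20.0)
se_160 = rate_se_pp(20.0, 160, k=20.0)
print(f"SE at 40 maps: {se_40:.2f} pp; at 160 maps: {se_160:.2f} pp")
```

Quadrupling the map count halves the standard error, which is the 1/√maps shape that drives the plateau in both stability figures.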
Aggregation & display. Players are binned by map_count; figures show the 75th percentile (p75) of per-player SE vs. bin center.
The rates figure includes a 2.0 pp tolerance line; both figures mark the chosen MIN_MAPS.
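The aggregation step can be illustrated as follows. The sketch bins synthetic players by `map_count` and takes the 75th percentile of per-player SE in each bin; it approximates what `rates_quantile_table` is described as doing, but the helper's actual implementation is not shown here, and the data and column name `se_pp` are made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic per-player data (illustrative only, not the project dataset)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "map_count": rng.integers(1, 160, size=300),
    "oap_overall": rng.normal(20.0, 4.0, size=300).clip(5.0, 40.0),
})

# Per-player binomial SE in percentage points, assuming k = 20 trials per map
K_OAP = 20.0
p = demo["oap_overall"] / 100.0
demo["se_pp"] = 100.0 * np.sqrt(p * (1.0 - p) / (K_OAP * demo["map_count"]))

# Left-closed bins over map_count, then the p75 of SE within each bin
bins = [0, 10, 20, 30, 40, 60, 80, 120, 160]
demo["bin"] = pd.cut(demo["map_count"], bins=bins, right=False)
p75_by_bin = (
    demo.groupby("bin", observed=True)["se_pp"]
    .quantile(0.75)
    .rename("se_pp_p75")
    .reset_index()
)

# Bin centres give the x-positions for a stability curve
p75_by_bin["bin_center"] = [iv.mid for iv in p75_by_bin["bin"]]
print(p75_by_bin)
```

Using p75 rather than the median means a bin only counts as precise when most of its players are, which is the conservative choice the `Q = 0.75` parameter reflects.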
Assumptions. The trials-per-map factor k is assumed; reasonable changes shift levels but not the 1/√maps shape or the cutoff region.
Omitted metrics (positional ranks).
Rank features are ordinal and team/context-relative; without per-map variability we can’t construct a meaningful sampling-error model.
Future data / improvements.
Collect per-map (or per-round/per-life) microdata to compute actual standard errors or confidence/credible intervals for opener attempts, trades, time-alive, and positional measures. With microdata, the threshold can be revisited using measured uncertainty instead of approximations.
Parameters¶
# === Parameters ===
# Map-count bins for summaries and plots
MAP_BINS = [0, 10, 20, 30, 40, 50, 60, 80, 100, 120, 140, 160]
# Trials-per-map assumptions (sensitivity moves levels, not shape)
K_OAP, K_PODT, K_POKT = 20.0, 18.0, 14.0
# Quantile for bin summary (p75 is conservative and robust)
Q = 0.75
# Decision lines for figures
MIN_MAPS = 40
TOL_PP = 2.0 # percentage points for rate metrics; set None to hide
Proportional Features¶
Uncertainty falls quickly and is ≤ 2.0 pp by ~40 maps for oap/podt/pokt; gains flatten beyond.
# === Stability: rates (oap/podt/pokt), pQ in percentage points ===
rates_tbl = rates_quantile_table(
df, MAP_BINS, k_oap=K_OAP, k_podt=K_PODT, k_pokt=K_POKT, q=Q, map_col="map_count"
)
plot_rates_quantile(
rates_tbl, q=Q, tol_pp=TOL_PP, min_maps=MIN_MAPS,
savepath=FIG_DIR / f"stability_rates_p{int(Q*100)}_pp.png",
savepath_svg=FIG_DIR / f"stability_rates_p{int(Q*100)}_pp.svg",
)
Time Alive Per Death (TAPD)¶
Steep early drop then plateau around ~40 maps; higher thresholds give diminishing returns.
# === Stability: tapd (duration proxy), pQ in proxy units ===
tapd_tbl, _ = tapd_quantile_table(
df, MAP_BINS, q=Q, map_col="map_count", scale=None, ref_mask=None
)
plot_tapd_quantile(
tapd_tbl, q=Q, min_maps=MIN_MAPS,
savepath=FIG_DIR / f"stability_tapd_p{int(Q*100)}_proxy.png",
savepath_svg=FIG_DIR / f"stability_tapd_p{int(Q*100)}_proxy.svg",
)
Coverage — players by map count.¶
MIN_MAPS = 40 trims the low-map tail while retaining a substantial cohort.
# === Coverage: players by map_count and retention at MIN_MAPS ===
ret = retained_summary(df, MIN_MAPS, map_col="map_count")
plot_mapcount_hist(
df, bins=list(range(0, 180, 20)), min_maps=MIN_MAPS,
savepath=FIG_DIR / "mapcount_hist.png",
savepath_svg=FIG_DIR / "mapcount_hist.svg",
)
print(f"Total players: {ret['total']}")
print(f"Retained at MIN_MAPS={ret['min_maps']}: {ret['retained']} ({ret['retained_pct']:.1f}%)")
Total players: 306
Retained at MIN_MAPS=40: 84 (27.5%)
Creating the stable cohort dataframe and checking for missing roles.
# === Stable cohort: create df_stable and check for missing roles ===
# Stable cohort dataframe with players having at least MIN_MAPS
df_stable = df.loc[df["map_count"] >= MIN_MAPS].copy()
# Sanity check: no missing/empty roles in stable cohort. If you select a lower threshold, roles may be missing.
role_cols = ["role_overall", "role_t", "role_ct"]
for col in role_cols:
    # Treat both NaN and blank strings as missing labels
    mask = df_stable[col].isna() | (df_stable[col].astype(str).str.strip() == "")
    n = int(mask.sum())
    print(f"{col} missing: {n}")
    assert n == 0, f"Some players in the stable cohort are missing {col}."
role_overall missing: 0
role_t missing: 0
role_ct missing: 0
Summary & cutoff¶
- Rates figure: p75 SE for oap/podt/pokt drops rapidly and is ≤ 2.0 pp by ~40 maps for all three; beyond 40 the curves flatten.
- tapd figure: p75 proxy shows a clear diminishing-returns plateau around ~40 maps.
- Coverage figure: a large low-map tail; 40 maps removes the noisiest segment while retaining a substantial cohort.
Decision: MIN_MAPS = 40.
Why: The rates tolerance is met, tapd has plateaued, and coverage remains workable (84 of 306 players, 27.5%).
3. Feature Distributions & Role Profiles¶
TL;DR — Feature Distributions & Role Profiles: Filled KDEs show side/role distribution shapes; tables give μ/σ/n; radars summarise role differences.
Objective. Describe how the six core features vary by side (CT/T) and role, quantify location/dispersion, and provide compact role profiles to support comparison and downstream clustering.
Questions.
- Do CT and T differ in opener activity, trading, time-alive, or spacing?
- Within a side, which roles occupy distinct regions of each feature (shape/shift/overlap)?
- What are the exact means (μ) and standard deviations (σ) per side/role?
- How do roles compare across features at once (radar “shape”)?
Visuals.
- KDEs (CT vs T): 2×3 grid with semi-transparent fills; dashed means only for non-positional features (TAPD, OAP, PODT, POKT). Positional ranks (ADNT/ADAT) are read by shape/shift, not mean.
- KDEs (by role, per side): 2×3 grids overlaying roles; low-n roles excluded; roles_order can limit which roles are shown.
- Tables: compact μ/σ/n summaries for sides and for roles (per side).
- Radars (per side): normalised multi-feature role profiles for quick comparison.
Conventions & filters.
- Colours: CT = blue, T = orange; roles use the project role palette.
- Scales: X-limits fixed per feature for like-for-like comparison; adjust bw_adjust if smoothing looks off.
- Inclusion: Optional min_map_count for players, min_role_n for roles; roles_order acts as a role toggle.
How to read.
- In KDEs, focus on location, spread, and overlap; watch for bimodality.
- Use tables for exact μ/σ/n (don’t interpret means for rank features).
- Radars summarise cross-feature role differences; they complement, not replace, the distributions.
Scope.
- Positional ranks are ordinal and team-relative; means are not meaningful, so mean lines are omitted.
CT vs T — Distributions (KDE) + Side Summary Table¶
2×3 KDE grid comparing CT vs T with filled densities; dashed means only for non-positional features (TAPD, OAP, PODT, POKT). X-limits are fixed per feature for comparability. The table below reports μ/σ/n for each side and feature.
# === Role summaries and plots by side ===
plot_kdes_side_compare(
df=df_stable,
bw_adjust=1,
min_map_count=None, # Select a min_map_count if desired (use df instead of df_stable)
savepath=FIG_DIR / "kde_side_compare.png",
savepath_svg=FIG_DIR / "kde_side_compare.svg",
)
# Side summary table
stats_tbl = summarize_side_stats(df_stable, min_map_count=None)
display(stats_tbl.round(2))
(stats_tbl.round(3)
.to_csv(RESULTS_DIR / "tables" / "side_kde_summary.csv"))
| | Feature | CT μ | CT σ | T μ | T σ | n |
|---|---|---|---|---|---|---|
| 0 | Time Alive Per Death (TAPD) (s) | 60.67 | 3.34 | 60.29 | 4.60 | 84 |
| 1 | Opening Attempt % (OAP) | 20.18 | 3.50 | 20.17 | 5.90 | 84 |
| 2 | Proportion of Deaths Traded % (PODT) | 19.66 | 2.82 | 24.27 | 3.33 | 84 |
| 3 | Proportion of Kills that are Trades % (POKT) | 16.07 | 2.13 | 23.66 | 3.72 | 84 |
| 4 | Distance to Nearest Teammate (ADNT) – rank | 0.60 | 0.11 | 0.60 | 0.14 | 84 |
| 5 | Distance from Average Teammate (ADAT) – rank | 0.60 | 0.15 | 0.60 | 0.15 | 84 |
Observations
- T side shows a noticeably wider spread across most features, suggesting greater behavioural diversity.
- The clearest side difference appears in the trading measures, where T values are generally higher.
- Most distributions are smooth and single-peaked, while T-side spacing (ADAT) stands out as distinctly bimodal, hinting at mixed positional tendencies.
Roles — T Side (KDE) + Summary Table¶
2×3 KDE grid for T-side overlaid by role. Use t_roles to include specific roles; roles with low n are excluded via min_role_n. The table below gives μ/σ by role × feature.
# T roles to choose from: "AWPer","Spacetaker","Lurker","Half-Lurker"
t_roles = ["AWPer","Spacetaker","Lurker","Half-Lurker"]
plot_kdes_roles_by_side(
df=df_stable, side="t",
roles_order=t_roles, min_role_n=8, bw_adjust=1.2, min_map_count=None,
savepath=FIG_DIR / "kde_roles_T.png",
savepath_svg=FIG_DIR / "kde_roles_T.svg",
)
# Summary table for T side, use wide=False for long format (easier role comparisons)
tbl_roles_t = summarize_role_stats_by_side(
df_stable, side="t", roles_order=t_roles, min_role_n=8,
round_to=2, wide=True
)
display(tbl_roles_t)
# Optional save
tbl_roles_t.to_csv(RESULTS_DIR / "tables" / "role_kde_summary_T.csv", index=False)
| | Feature | AWPer μ | AWPer σ | Half-Lurker μ | Half-Lurker σ | Lurker μ | Lurker σ | Spacetaker μ | Spacetaker σ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Distance from Average Teammate (ADAT) – rank | 0.43 | 0.06 | 0.66 | 0.09 | 0.75 | 0.08 | 0.55 | 0.10 |
| 1 | Distance to Nearest Teammate (ADNT) – rank | 0.43 | 0.08 | 0.65 | 0.08 | 0.72 | 0.09 | 0.57 | 0.11 |
| 2 | Opening Attempt % (OAP) | 14.20 | 2.85 | 22.42 | 4.12 | 17.68 | 3.82 | 24.51 | 5.30 |
| 3 | Proportion of Deaths Traded % (PODT) | 21.57 | 2.30 | 25.05 | 2.51 | 24.90 | 3.55 | 24.97 | 3.27 |
| 4 | Proportion of Kills that are Trades % (POKT) | 25.81 | 2.48 | 23.41 | 2.79 | 25.07 | 3.50 | 21.49 | 3.69 |
| 5 | Time Alive Per Death (TAPD) (s) | 64.36 | 4.40 | 57.24 | 3.78 | 61.89 | 3.25 | 58.00 | 3.77 |
Observations (T-side Roles)
- AWPers tend to have more distinct distributions, with relatively sharp peaks suggesting less intra-role playstyle diversity.
- Positional features display the most separation between all roles.
- All roles overlap with all other roles, implying inherent fluidity in playstyle.
- Positionally, the more ambiguous role "Half-Lurker" tends to sit between the aggressive and passive roles (Spacetaker/Lurker), with AWPers positioning closest to teammates.
Roles — CT Side (KDE) + Summary Table¶
2×3 KDE grid for CT-side overlaid by role. Use ct_roles to include specific roles; roles with low n are excluded via min_role_n. The table below gives μ/σ/n by role × feature.
# CT roles to choose from: "AWPer","Anchor","Rotator","Mixed"
# Toggle which roles to include (optional)
ct_roles = ["AWPer","Anchor","Rotator","Mixed",]
plot_kdes_roles_by_side(
df=df_stable, side="ct",
roles_order=ct_roles, min_role_n=8, bw_adjust=1.2, min_map_count=None,
savepath=FIG_DIR / "kde_roles_CT.png",
savepath_svg=FIG_DIR / "kde_roles_CT.svg",
)
# Summary table for CT side, use wide=False for long format (easier role comparisons)
tbl_roles_ct = summarize_role_stats_by_side(
df_stable, side="ct", roles_order=ct_roles, min_role_n=8,
round_to=2, wide=True
)
display(tbl_roles_ct)
# Optional save
tbl_roles_ct.to_csv(RESULTS_DIR / "tables" / "role_kde_summary_CT.csv", index=False)
| | Feature | AWPer μ | AWPer σ | Anchor μ | Anchor σ | Mixed μ | Mixed σ | Rotator μ | Rotator σ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Distance from Average Teammate (ADAT) – rank | 0.53 | 0.03 | 0.77 | 0.10 | 0.63 | 0.10 | 0.49 | 0.11 |
| 1 | Distance to Nearest Teammate (ADNT) – rank | 0.61 | 0.05 | 0.71 | 0.09 | 0.61 | 0.08 | 0.51 | 0.10 |
| 2 | Opening Attempt % (OAP) | 21.37 | 3.35 | 17.80 | 3.11 | 19.38 | 2.97 | 21.76 | 3.15 |
| 3 | Proportion of Deaths Traded % (PODT) | 18.80 | 2.69 | 19.06 | 2.44 | 20.42 | 3.46 | 20.15 | 2.60 |
| 4 | Proportion of Kills that are Trades % (POKT) | 15.86 | 2.21 | 15.88 | 2.30 | 16.39 | 2.44 | 16.15 | 1.78 |
| 5 | Time Alive Per Death (TAPD) (s) | 62.05 | 3.42 | 61.95 | 3.21 | 60.40 | 3.13 | 59.03 | 2.85 |
Observations (CT-side Roles)
- Roles appear far less distinct on the CT side, with much more overlap in distributions. This reflects the more reactive and fluid nature of CT playstyles.
- Positional features again display the most separation between roles, and AWPers have noticeably sharper peaks, suggesting less intra-role positional diversity.
- Positional features mirror the pattern on T side where the ambiguous role tends to sit between the aggressive and passive roles.
Interactive Role Profile Radars¶
Explore standardised radar plots comparing T and CT role profiles across all six features.
Use the interactive controls above the plot to toggle uncertainty bands and baseline visibility.
Notes:
- Click on a role to toggle it
- Bands are best used when looking at one or two roles at a time
- Double-Click a plot to return the view back to normal
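For readers who want a sense of what a "standardised" role profile means, the sketch below z-scores each feature across players and then averages within each role. This is one common way to put features on a shared radar axis; the normalisation actually used inside `plot_role_radars_interactive` is not shown in this notebook, so treat the method as an assumption. The values are made up for illustration.

```python
import pandas as pd

# Tiny made-up sample (illustrative only, not the project dataset)
demo = pd.DataFrame({
    "role_t": ["AWPer", "AWPer", "Lurker", "Lurker", "Spacetaker", "Spacetaker"],
    "oap_t": [14.0, 15.0, 17.0, 18.0, 24.0, 26.0],
    "tapd_t": [64.0, 65.0, 62.0, 61.0, 58.0, 57.0],
})
features = ["oap_t", "tapd_t"]

# Z-score each feature across all players so axes share a common scale
z = (demo[features] - demo[features].mean()) / demo[features].std(ddof=0)

# Role profile = mean z-score per role; 0 is the cohort average on every axis
profiles = z.groupby(demo["role_t"]).mean().round(2)
print(profiles)
```

A role whose profile sits above zero on an axis is above the cohort average for that feature, which is how the baseline ring on a radar plot would be read under this scheme.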
# === Role radar plots (interactive) ===
fig = plot_role_radars_interactive(df, min_map_count=MIN_MAPS)  # reuse the threshold set in Section 2
fig.show(renderer="notebook")
# Static PNG export requires Chrome and Kaleido; commented out for portability
# fig.write_image(FIG_DIR / "role_radars_t_vs_ct.png", width=1500, height=840)
# Saving as HTML
fig.write_html(FIG_DIR / "role_radars_t_vs_ct.html", include_plotlyjs="embed")