Methodology · Playing Styles

How the playing styles are built

From the raw Wyscout event feed to the nine transparent "Offensive × Defensive" styles: every step is anchored in theory, validated against three empirical criteria and fully reproducible.

Model structure

Each team-season is classified along two independent axes; the composite style name is 'Offensive × Defensive'.

Playing styles in the Swiss Super League are described by two independent axes: an offensive build-up axis (possession → direct) and a defensive pressing-height axis (low block → high press). Each axis is reduced from a small set of style indicators with PCA and clustered into three archetypes with k-means. The cross-product yields 3 × 3 = 9 transparent playing styles, each named directly by the combination of its two parts, e.g. "Direct Vertical Play × High Press".

Offensive pipeline

9 cleaned ratio/share/index features → PCA (≥ 70 % cumulative variance → 2 components retained) → k-means with k = 3.

Defensive pipeline

5 pressing-height features → PCA (PC1 alone explains ~84 % → 1 component retained) → k-means with k = 3. A separate 2-component PCA is computed only for the drilldown scatter visualisation.

Data foundation

Season-aggregated Wyscout event data for the SFL across ten seasons (2015/16 – 2024/25).

The training set contains 104 team-season observations from 16 clubs across the ten seasons 2015/16 to 2024/25, provided by Wyscout. The current 2025/26 season is held out for out-of-sample classification (e.g. newly promoted FC Thun). Season-level aggregation smooths match-specific noise from opponent strength, home/away effects and game state, and captures the recurring patterns that define a playing style (Hewitt et al. 2016, Fernández-Navarro et al. 2016).

Clean feature design

Ratios, shares and indices — not volumes. Set-pieces, crosses and long balls are deliberately excluded.

Volume KPIs such as passes_per90, shots_per90 or xg_per90 strongly correlate with team strength and would dominate PC1. Instead the feature set is built from ratios (forward-pass share), shares (possession %) and indices (recovery height index, loss height index). Set-piece, cross and long-ball indicators are intentionally excluded — they signal a tactical accent rather than a main style and would otherwise bias PC1.

Offensive features (9)

possession_pct, avg_passes_per_possession, pass_acc_pct, forward/back/lateral_pass_share, progressive_pass_share, passes_per_shot, match_tempo. Each is documented in Plakias 2024 / Fernández-Navarro 2016 / Castellano 2019 / Memmert 2022 as a style indicator.

Defensive features (5)

ppda (Bialkowski 2014), recovery_height_index, high/low_recovery_share, loss_height_index. Pressing-height trichotomy following Bauer & Anzer 2021.

Excluded from main-style features

Set-pieces, crosses, long balls, shot profile and tackle/aerial style. These dimensions are largely orthogonal to the main style axes; including them would pull PC1 toward set-piece dependence rather than build-up direction.

Where each feature comes from

Every input is either a raw Wyscout column or a season-level ratio/index; the exact formula is given for each.

Wyscout records ~100 columns per match. The 14 features that drive the playing-style model are either taken directly from Wyscout's published columns or computed deterministically at season level from Wyscout sums. Below is the full provenance table — for each feature its source, what it measures, and (where applicable) the exact formula. A convention worth stating upfront: shares and ratios are built from season-summed numerators and denominators (e.g. season-total forward_passes ÷ season-total passes), not as means of per-match quotients. This keeps the ratios numerically stable when individual matches have very small denominators.
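To illustrate why the season-sum convention matters, here is a minimal Python sketch with hypothetical per-match numbers (not Wyscout data). The helper names `pooled_ratio` and `mean_of_ratios` are illustrative and not part of the original pipeline. A single match with almost no recorded passes pulls the mean of per-match quotients far away from the pooled season value.

```python
def pooled_ratio(numerators, denominators):
    """Season-level share: sum of numerators divided by sum of denominators."""
    total_den = sum(denominators)
    if total_den == 0:
        raise ValueError("season denominator is zero")
    return sum(numerators) / total_den


def mean_of_ratios(numerators, denominators):
    """Naive alternative: average of per-match quotients (sensitive to tiny denominators)."""
    ratios = [n / d for n, d in zip(numerators, denominators)]
    return sum(ratios) / len(ratios)


# Three hypothetical matches: forward passes and total passes per match.
forward_passes = [120, 110, 3]
passes = [400, 380, 4]  # third match: almost no passes recorded

print(round(pooled_ratio(forward_passes, passes), 3))    # 0.297
print(round(mean_of_ratios(forward_passes, passes), 3))  # 0.446
```

The pooled value stays close to the two regular matches, whereas the per-match mean is inflated by the 3/4 quotient of the near-empty match.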

Offensive features (9)

All offensive indicators describe how a team moves the ball forward and how often it cycles before shooting.

Possession % (possession_pct)

Wyscout · direct

Share of match time the team had control of the ball, averaged across all league matches in the season.

Source: Wyscout column “Possession %” per match

Passes per Possession (avg_passes_per_possession)

Wyscout · direct

Average number of passes a team strings together per ball-possession sequence — a direct measure of build-up patience.

Source: Wyscout column “Average passes per possession” per match

Pass Accuracy % (pass_acc_pct)

Derived

Share of passes that found a teammate — proxy for build-up control and pass selection.

Formula

pass_acc_pct = (Σ passes_acc / Σ passes) × 100

Source: Wyscout columns passes_acc, passes — season sums

Forward Pass Share (forward_pass_share)

Derived

Share of passes played forward in the direction of the opponent's goal. Core indicator of directness.

Formula

forward_pass_share = Σ forward_passes / Σ passes

Source: Wyscout columns forward_passes, passes — season sums

Back Pass Share (back_pass_share)

Derived

Share of passes played backwards — indicator of recycling / re-building from a deeper position.

Formula

back_pass_share = Σ back_passes / Σ passes

Source: Wyscout columns back_passes, passes — season sums

Lateral Pass Share (lateral_pass_share)

Derived

Share of passes played sideways — indicator of horizontal circulation, common in possession-oriented build-up.

Formula

lateral_pass_share = Σ lateral_passes / Σ passes

Source: Wyscout columns lateral_passes, passes — season sums

Progressive Passes % (progressive_pass_share)

Derived

Share of passes that move the ball significantly closer to the opponent's goal (Wyscout's own threshold definition is applied per pass).

Formula

progressive_pass_share = Σ progressive_passes / Σ passes

Source: Wyscout columns progressive_passes, passes — season sums

Passes per Shot (passes_per_shot)

Derived

Possession efficiency: how many passes a team strings together for every shot generated. Low = direct, high = patient.

Formula

passes_per_shot = Σ passes / Σ shots

Source: Wyscout columns passes, shots — season sums
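The six derived offensive ratios can be sketched in Python as below. This assumes the season sums are available in a dict keyed by the Wyscout column names from the provenance table; the function name `offensive_ratios` and the numbers are hypothetical. possession_pct, avg_passes_per_possession and match_tempo are taken directly from Wyscout and are therefore not computed here.

```python
def offensive_ratios(s):
    """Derive the ratio-based offensive features from season-summed Wyscout columns.

    `s` maps column names (passes, passes_acc, forward_passes, back_passes,
    lateral_passes, progressive_passes, shots) to season totals.
    """
    passes = s["passes"]
    return {
        "pass_acc_pct": 100.0 * s["passes_acc"] / passes,
        "forward_pass_share": s["forward_passes"] / passes,
        "back_pass_share": s["back_passes"] / passes,
        "lateral_pass_share": s["lateral_passes"] / passes,
        "progressive_pass_share": s["progressive_passes"] / passes,
        "passes_per_shot": passes / s["shots"],
    }


# Hypothetical season totals for one team-season (illustrative only).
season = {
    "passes": 14000, "passes_acc": 11200, "forward_passes": 4900,
    "back_passes": 2100, "lateral_passes": 7000,
    "progressive_passes": 2800, "shots": 400,
}
feats = offensive_ratios(season)
print(feats["pass_acc_pct"], feats["forward_pass_share"], feats["passes_per_shot"])
# 80.0 0.35 35.0
```

Every share uses season-total passes as its denominator, in line with the season-sum convention stated above.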

Match Tempo (match_tempo)

Wyscout · direct

Wyscout's tempo metric — passes per minute of possession. Higher = faster ball circulation; tempo correlates with direct or pressing-oriented teams.

Source: Wyscout column “Match tempo” per match

Defensive features (5)

All defensive indicators describe where on the pitch a team wins or loses the ball — the pressing-height axis.

PPDA (ppda)

Wyscout · direct

Passes per Defensive Action: opponent passes in the team's attacking 60 % of the pitch (i.e. the opponent's own build-up zone) divided by the team's defensive actions (tackles, fouls, interceptions, challenges) in that area. Low PPDA = aggressive high pressing.

Source: Wyscout column “PPDA” per match

Recovery Height Index (recovery_height_index)

Derived

Wyscout records ball recoveries in three discrete pitch zones (low / mid / high — the team's own, middle, and opponent's third). To get a single scalar that captures *where* a team typically wins the ball back, the three categories are encoded with equidistant weights (1, 2, 3) and the share-weighted mean is computed. This operationalisation corresponds to the standard Likert-style scoring of ordinal categories in statistics (cf. Stevens 1946 on ordinal scaling). Conceptually the index follows the well-established style dimension *pressing height* (Bauer & Anzer 2021; Memmert 2022); methodologically it is our own operationalisation that works with the available Wyscout event categories. Output is bounded between 1 (all recoveries in own third) and 3 (all recoveries in opponent third).

Formula

recovery_height_index = (3·rec_high + 2·rec_med + 1·rec_low) / (rec_high + rec_med + rec_low)

Source: Wyscout columns recoveries_low, recoveries_med, recoveries_high — season sums
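A minimal sketch of the Likert-style index, assuming the three zone counts are already season sums. The helper name `height_index` is hypothetical. Because the loss-height index uses identical logic, the same function also covers it when fed losses_low, losses_med and losses_high.

```python
def height_index(low, med, high):
    """Equidistant 1/2/3 scoring of three ordinal pitch zones, share-weighted.

    1 = all events in the team's own third, 3 = all events in the opponent's third.
    Inputs are season sums per zone.
    """
    total = low + med + high
    if total == 0:
        raise ValueError("no events recorded in any zone")
    return (3 * high + 2 * med + 1 * low) / total


# Hypothetical season sums of recoveries per zone.
print(height_index(low=300, med=500, high=200))  # 1.9
```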

High Recoveries % (high_recovery_share)

Derived

Share of all ball recoveries that happen in the opponent's third; a direct indicator of pressing intensity.

Formula

high_recovery_share = Σ recoveries_high / (Σ recoveries_low + Σ recoveries_med + Σ recoveries_high)

Source: Wyscout columns recoveries_low, recoveries_med, recoveries_high — season sums

Low Recoveries % (low_recovery_share)

Derived

Share of all ball recoveries that happen in the team's own third — characteristic of deep block defences.

Formula

low_recovery_share = Σ recoveries_low / (Σ recoveries_low + Σ recoveries_med + Σ recoveries_high)

Source: Wyscout columns recoveries_low, recoveries_med, recoveries_high — season sums
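Both zone shares follow from the same three season sums. Below is a short sketch with hypothetical counts; the function name `recovery_shares` is illustrative. A mid-third share would add no information, since it equals 1 − high_recovery_share − low_recovery_share.

```python
def recovery_shares(low, med, high):
    """Season-level zone shares from summed recoveries (all three zones in the denominator)."""
    total = low + med + high
    if total == 0:
        raise ValueError("no recoveries recorded")
    return {
        "high_recovery_share": high / total,
        "low_recovery_share": low / total,
    }


# Hypothetical season sums: 300 own-third, 500 middle-third, 200 opponent-third recoveries.
shares = recovery_shares(low=300, med=500, high=200)
print(shares)  # {'high_recovery_share': 0.2, 'low_recovery_share': 0.3}
```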

Loss Height Index (loss_height_index)

Derived

Built with identical logic to the recovery-height index (Likert-style scoring of the three Wyscout zone categories), but on losses instead of recoveries. The loss-height index captures *where* a team typically gives the ball away: high values mean it frequently loses possession in opponent territory (signature of ambitious build-up), low values mean losses happen mostly in the team's own third (signature of a low block under sustained opponent pressure). Together with the recovery-height index, the pair captures the team's overall pressing-and-build-up footprint along the length of the pitch.

Formula

loss_height_index = (3·loss_high + 2·loss_med + 1·loss_low) / (loss_high + loss_med + loss_low)

Source: Wyscout columns losses_low, losses_med, losses_high — season sums

Principal component analysis

Each cleaned feature set is reduced to the components needed to reach 70 % cumulative variance.

Both feature sets are standardised (mean 0, SD 1) and transformed into uncorrelated principal components. The number of retained components is set by the cumulative-variance rule (≥ 70 %). The first principal component is sign-fixed: PC1+ is "direct" for the offensive set (forward_pass_share loads positively) and "high press" for the defensive set (recovery_height_index loads positively). This produces an asymmetric dimensionality that mirrors the underlying tactical reality: the offensive style space is genuinely two-dimensional, while the defensive style space is essentially one-dimensional (pressing height). The feature design passes the three standard PCA suitability tests (KMO sampling adequacy, Bartlett's test of sphericity and per-feature VIF), and a feature-leave-out sensitivity analysis (Adjusted Rand Index of the defensive cluster solution ≥ 0.85) confirms that the result is robust to the choice of indicators.

Standardisation

X = (X_raw − mean) / sd (StandardScaler)

Loadings

loading[i, j] = component[i, j] · sqrt(eigenvalue[j])
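The standardisation, the cumulative-variance rule, the PC1 sign fix and the loading formula can be sketched with scikit-learn's `StandardScaler` and `PCA`. This is an illustrative reconstruction, not the original code: the function `fit_axis`, its `anchor_feature` argument and the synthetic demo data are assumptions. scikit-learn's `explained_variance_` holds the eigenvalues of the sample covariance matrix, so the loadings follow the formula above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def fit_axis(X, feature_names, anchor_feature, threshold=0.70):
    """Fit one style axis: standardise, keep the fewest PCs reaching `threshold`
    cumulative explained variance, and sign-fix PC1 on `anchor_feature`."""
    scaler = StandardScaler()
    Z = scaler.fit_transform(X)  # mean 0, SD 1 per feature

    # Cumulative-variance rule: first component count whose cumulative share >= threshold.
    cum = np.cumsum(PCA().fit(Z).explained_variance_ratio_)
    n_comp = int(np.searchsorted(cum, threshold)) + 1

    pca = PCA(n_components=n_comp).fit(Z)
    j = feature_names.index(anchor_feature)
    if pca.components_[0, j] < 0:   # PC signs are arbitrary:
        pca.components_[0] *= -1     # flip PC1 so the anchor feature loads positively

    scores = pca.transform(Z)
    # loading[i, j] = component[j, i] * sqrt(eigenvalue[j]); rows = features, cols = PCs
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    return scaler, pca, scores, loadings


# Demo on synthetic data (hypothetical): one latent "pressing height" factor drives
# five features, two of which point the opposite way.
rng = np.random.default_rng(0)
factor = rng.normal(size=(104, 1))
weights = np.array([[1.0, 1.0, -1.0, 1.0, -1.0]])
X_demo = factor @ weights + 0.5 * rng.normal(size=(104, 5))
names = ["recovery_height_index", "high_recovery_share", "low_recovery_share",
         "loss_height_index", "ppda"]

scaler, pca, scores, loadings = fit_axis(X_demo, names, "recovery_height_index")
print(pca.n_components_, scores.shape)
```

The returned scaler and PCA objects are the ones a later out-of-sample step would reuse without re-fitting.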

k-Means clustering & three-criteria validation

k = 3 is fixed a priori by theory and validated empirically with a three-criteria consensus.

k-means with n_init = 200 is applied to the principal-component scores. The cluster count k = 3 is fixed a priori by theory (Plakias et al. 2024: direct / pragmatic / possession for offensive play; Bauer & Anzer 2021: high press / mid press / low block for defensive play). The empirical validation rests on a consensus of three independent indicators across k = 2 … 7:

1. Inertia (elbow method): diminishing returns in the within-cluster sum of squares.
2. Silhouette score: geometric separation of the clusters.
3. Bootstrap-ARI (200 resamples): reproducibility of the cluster assignments.

A pure silhouette argmax would prefer k = 2 (a well-known mathematical bias toward small k; Halkidi 2001, Plakias 2023), which would also forfeit the theoretically grounded *pragmatic* middle category. Only the consensus of theory, elbow and stability justifies the choice of k = 3.

Inertia

Sum of squared distances from each point to its centroid

Bootstrap-ARI

Mean ARI(km(X), km(X*)) over 100 bootstrap samples X*
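A sketch of the three-criteria sweep over k with scikit-learn. Assumptions to flag: the bootstrap-ARI is operationalised here by fitting k-means on each resample X* and comparing its predicted labels on the full sample X with the reference partition; the smaller `boot_n_init` for the bootstrap refits is a runtime shortcut; `validate_k` is a hypothetical name. The demo uses three synthetic blobs and reduced settings so that it runs quickly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score


def validate_k(scores, ks=range(2, 8), n_init=200, n_boot=100, boot_n_init=10, seed=0):
    """Return inertia, silhouette and mean bootstrap-ARI for each candidate k."""
    scores = np.asarray(scores)
    rng = np.random.default_rng(seed)
    n = len(scores)
    results = {}
    for k in ks:
        ref = KMeans(n_clusters=k, n_init=n_init, random_state=seed).fit(scores)
        aris = []
        for b in range(n_boot):
            idx = rng.integers(0, n, size=n)  # resample rows with replacement
            boot = KMeans(n_clusters=k, n_init=boot_n_init, random_state=b).fit(scores[idx])
            # Compare both partitions on the full sample; ARI ignores label permutations.
            aris.append(adjusted_rand_score(ref.labels_, boot.predict(scores)))
        results[k] = {
            "inertia": ref.inertia_,  # within-cluster sum of squares
            "silhouette": silhouette_score(scores, ref.labels_),
            "boot_ari": float(np.mean(aris)),
        }
    return results


# Demo on three synthetic, well-separated blobs (hypothetical data).
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
pts = np.vstack([c + 0.3 * rng.normal(size=(30, 2)) for c in centres])
res = validate_k(pts, ks=range(2, 5), n_init=10, n_boot=5)
for k, r in res.items():
    print(k, round(r["inertia"], 1), round(r["silhouette"], 2), round(r["boot_ari"], 2))
```

On real PC scores the three columns are read together, as described above, rather than by taking the argmax of any single one.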

Offensive validation

Across k = 2 … 7, evaluated on the 2 retained offensive PCs.

Figure panels: Inertia (diminishing returns at k = 3) · Silhouette (geometric separation, higher is better) · Bootstrap-ARI (reproducibility over 100 bootstraps)

Cluster count is fixed a priori from theory and validated with a three-criteria consensus. Pure silhouette would prefer k = 2 (mathematical bias to small k); only the elbow inflection + ARI stability + theoretical grounding justify k = 3.

Defensive validation

Across k = 2 … 7, evaluated on the single retained defensive PC.

Figure panels: Inertia (diminishing returns at k = 3) · Silhouette (geometric separation, higher is better) · Bootstrap-ARI (reproducibility over 100 bootstraps)

Transparent style names — Offensive × Defensive

Composite name = offensive cluster × defensive cluster. No invented combo labels.

Cluster naming is heuristic and reproducible from the z-profile of each cluster. Offensive thresholds: forward_pass_share > 0.5 SD and avg_passes_per_possession < −0.3 SD → "Direct Vertical Play"; avg_passes_per_possession > 0.5 SD and pass_acc_pct > 0.3 SD → "Possession-Oriented"; otherwise "Pragmatic Style". Defensive thresholds: recovery_height_index > 0.7 SD → "High Press", < −0.7 SD → "Low Block", otherwise "Mid Press". The final style name of a team-season is then simply the cross-product of its two axes: "{offensive} × {defensive}". This keeps the model entirely transparent — every label can be derived directly from the cluster assignment.
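The naming rules translate directly into code. This is a sketch that assumes a cluster's z-profile is the mean of its members' standardised feature values; `z_profile`, `name_offensive`, `name_defensive` and `composite_name` are hypothetical helper names, and the example profiles are invented.

```python
import numpy as np


def z_profile(Z, labels, feature_names, cluster):
    """Mean standardised value of each feature within one cluster (assumed definition)."""
    members = Z[labels == cluster]
    return dict(zip(feature_names, members.mean(axis=0)))


def name_offensive(z):
    # Thresholds in SD units, as stated in the text.
    if z["forward_pass_share"] > 0.5 and z["avg_passes_per_possession"] < -0.3:
        return "Direct Vertical Play"
    if z["avg_passes_per_possession"] > 0.5 and z["pass_acc_pct"] > 0.3:
        return "Possession-Oriented"
    return "Pragmatic Style"


def name_defensive(z):
    if z["recovery_height_index"] > 0.7:
        return "High Press"
    if z["recovery_height_index"] < -0.7:
        return "Low Block"
    return "Mid Press"


def composite_name(offensive, defensive):
    return f"{offensive} × {defensive}"


# Hypothetical cluster z-profiles.
off_z = {"forward_pass_share": 0.9, "avg_passes_per_possession": -0.8, "pass_acc_pct": -0.4}
def_z = {"recovery_height_index": 1.1}
print(composite_name(name_offensive(off_z), name_defensive(def_z)))
# Direct Vertical Play × High Press
```

The two offensive rules cannot both fire, because they require avg_passes_per_possession below −0.3 SD and above 0.5 SD respectively.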

Offensive	Defensive	Composite style name
Direct Vertical Play	High Press	Direct Vertical Play × High Press
Direct Vertical Play	Mid Press	Direct Vertical Play × Mid Press
Direct Vertical Play	Low Block	Direct Vertical Play × Low Block
Pragmatic Style	High Press	Pragmatic Style × High Press
Pragmatic Style	Mid Press	Pragmatic Style × Mid Press
Pragmatic Style	Low Block	Pragmatic Style × Low Block
Possession-Oriented	High Press	Possession-Oriented × High Press
Possession-Oriented	Mid Press	Possession-Oriented × Mid Press
Possession-Oriented	Low Block	Possession-Oriented × Low Block

Out-of-sample classification

New seasons are projected through the trained scalers, PCA and k-means — no re-fitting.

For a held-out season such as Thun 2025/26 the model re-uses the trained transformations. The features are standardised with the trained scaler, projected into the PCA space, and assigned to the nearest k-means centroid by Euclidean distance. This keeps the style space stable: results from the current season are directly comparable to historical ones, and the same composite style name (e.g. "Direct Vertical Play × High Press") follows immediately.

Projection

scores_new = pca.transform(scaler.transform(X_new))

Assignment

label_new = argmin_c || scores_new − centroid_c ||₂
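Out-of-sample classification as described above, sketched with scikit-learn objects fitted on synthetic training data. The data, the function name `classify_new` and the reduced `n_init` in the demo are assumptions. The explicit distance computation makes the argmin rule visible. It should agree with `KMeans.predict`, which also assigns each sample to the closest centroid.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def classify_new(X_new, scaler, pca, km):
    """Classify held-out team-seasons with already-fitted objects (no re-fitting)."""
    scores_new = pca.transform(scaler.transform(X_new))  # projection into the trained PC space
    # Euclidean distance from every new row to every trained centroid.
    dists = np.linalg.norm(scores_new[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    return dists.argmin(axis=1)  # nearest-centroid label


# Demo with synthetic data (hypothetical): fit once on the training seasons ...
rng = np.random.default_rng(2)
X_train = rng.normal(size=(104, 9))
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pca.transform(scaler.transform(X_train)))

# ... then only transform and assign for a new season.
X_new = rng.normal(size=(1, 9))
print(classify_new(X_new, scaler, pca, km))
```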

Visual encoding

Three colours (offensive) + three shapes (defensive) + amber ring (champion).

Every chart on the analysis page uses the same visual vocabulary so a team-season's full composite style can be read from a single marker — without consulting a legend.

Colour · Offensive cluster: Possession-Oriented, Pragmatic Style, Direct Vertical Play

Shape · Defensive cluster: High Press, Mid Press, Low Block

Amber ring · Champion season (SFL title); the outline hugs the shape

Reading a marker

Colour + shape together identify the composite style; the amber ring adds the champion flag.

Possession × High Press · 🏆 Champion season · e.g. Young Boys 2019/20

Pragmatic × Mid Press · The league's ground state · e.g. Lugano 2022/23

Direct × Low Block · Counter / long-ball pattern · e.g. Sion 2017/18

Limitations

Small sample, season aggregation, and Wyscout coding noise.

104 team-seasons is a small sample. Season-level aggregation smooths match-specific noise but also erases within-season variation and the influence of opponent strength, home/away effects and game state. Wyscout's event coding is not perfectly consistent (e.g. the definition of "possession" has changed over time) and some features are more affected by noise than others (e.g. avg_passes_per_possession is more volatile than possession_pct). The model captures broad style patterns but cannot explain every nuance of a team's tactical identity and should be complemented with qualitative analysis for a full picture.