Jun 24, 2026
CS2-10k: A Large-Scale Egocentric Counter-Strike 2 Dataset
Training interactive world models requires data that is notoriously hard to find: egocentric video sequences with densely aligned action signals (keyboard inputs, camera motion, and ego state), all synchronized to the visual stream. Real-world embodied data is costly to collect, while synthetic data often lacks the visual richness or behavioral diversity needed for generalization. Counter-Strike 2 demos offer a compelling middle ground: because matches are recorded as deterministic replays, we can reconstruct clean first-person video at any point in a match and extract the precise control inputs that drove each visual change. For these reasons, Counter-Strike is fast becoming a popular substrate for embodied AI and world-model research, and recent efforts such as EgoCS-400k reflect growing community interest in it as a rich source of egocentric training data.

Today we release CS2-10k, a large-scale egocentric gameplay dataset built from professional CS2 matches. It contains 600,000+ player-round videos spanning 10,000+ hours of first-person footage, paired with per-frame annotations covering keyboard state, mouse movement, and 3D player trajectory. Alongside this ready-to-use dataset, we are also releasing cs2-dem-renderer, the ready-to-extend, open-source pipeline used to produce it. We are releasing all of this so we can build better world models, together.
Dataset Overview

CS2-10k is built from public professional match demos sourced from HLTV. For each demo, we render clean first-person video at 720p and 48 fps using CS2's built-in demo replay tool, producing one video per player per round. Alongside each video, we store a parquet file containing per-frame annotations synchronized to the video timeline.

Annotation Schema

Every video clip has its corresponding annotations stored in a .parquet file:

| Field | Type | Description |
|---|---|---|
| map | string | Map name (e.g. "mirage", "dust2") |
| round_number | int | Round within the match |
| team | int | 0 = Counter-Terrorist, 1 = Terrorist |
| num_frames | int | Total frames in the clip |
| fps | float | Video frame rate (48.0) |
| total_time | float | Clip duration in seconds |
| fov | float | Camera field of view (90.0°) |
| frame_data | list[dict] | Per-frame annotation array (see below) |
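As a rough illustration of how the clip-level fields fit together, here is a minimal Python sketch that wraps one annotation record in a dataclass and runs a few sanity checks. It assumes the record has already been read into a dict (for example, one row of the .parquet file loaded with pandas.read_parquet, if each file stores one clip per row), and it assumes total_time ≈ num_frames / fps, which the schema suggests but does not state. ClipMeta and check_clip are hypothetical names, not part of the released tooling.

```python
from dataclasses import dataclass, field
from typing import Any

# Team ids as documented in the annotation schema.
TEAM_NAMES = {0: "Counter-Terrorist", 1: "Terrorist"}


@dataclass
class ClipMeta:
    """Clip-level fields of one CS2-10k annotation record (hypothetical wrapper)."""
    map: str
    round_number: int
    team: int
    num_frames: int
    fps: float
    total_time: float
    fov: float
    frame_data: list[dict[str, Any]] = field(default_factory=list)

    @property
    def team_name(self) -> str:
        return TEAM_NAMES[self.team]

    def frame_time(self, i: int) -> float:
        # Assumes frames are evenly spaced at `fps`, starting at t = 0.
        return i / self.fps


def check_clip(meta: ClipMeta, tol_frames: float = 1.0) -> list[str]:
    """Return a list of problems; an empty list means the record looks consistent."""
    problems = []
    if len(meta.frame_data) != meta.num_frames:
        problems.append(
            f"frame_data has {len(meta.frame_data)} entries, expected {meta.num_frames}"
        )
    if meta.team not in TEAM_NAMES:
        problems.append(f"unknown team id {meta.team}")
    # Assumption: total_time should be close to num_frames / fps.
    implied_frames = meta.total_time * meta.fps
    if abs(implied_frames - meta.num_frames) > tol_frames:
        problems.append(
            f"total_time * fps = {implied_frames:.1f}, but num_frames = {meta.num_frames}"
        )
    return problems
```

A check like this is cheap to run over every clip before training and flags truncated frame_data arrays early.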
Per-Frame Annotations

Each entry in frame_data contains:

| Field | Description |
|---|---|
| actions | Concatenated active keys: W/A/S/D (movement), J (jump), C (crouch), R (run), V (freefall), [ (fire), ] (scope/secondary), - (no input) |
| mouse_x_delta | Horizontal camera delta (proxy for mouse X movement) |
| mouse_y_delta | Vertical camera delta (proxy for mouse Y movement) |
| position_x / y / z | Player world position in game units |
| rotation_yaw | Camera yaw angle (−180° to 180°) |
| rotation_pitch | Camera pitch angle (−90° to 90°) |
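Most action-conditioned models expect a fixed-size action vector per frame rather than a string. The sketch below is a hypothetical helper, not an official loader. It turns the actions string into a multi-hot vector over the ten key symbols listed above and appends the two mouse deltas. It assumes every character in actions is exactly one key symbol from the table and that "-" only ever appears on its own.

```python
# Key symbols used in the `actions` field, in a fixed order for the multi-hot vector.
KEYS = ("W", "A", "S", "D", "J", "C", "R", "V", "[", "]")
KEY_SET = frozenset(KEYS)
NO_INPUT = "-"


def parse_actions(actions: str) -> frozenset[str]:
    """Split a concatenated actions string such as "WAJ[" into its set of active keys."""
    if actions == NO_INPUT:
        return frozenset()
    unknown = set(actions) - KEY_SET
    if unknown:
        # Assumption: every character is one key symbol, and "-" never mixes with keys.
        raise ValueError(f"unknown action symbols: {sorted(unknown)}")
    return frozenset(actions)


def frame_action_vector(frame: dict) -> list[float]:
    """Multi-hot key state (in KEYS order) followed by mouse_x_delta and mouse_y_delta."""
    active = parse_actions(frame["actions"])
    multi_hot = [1.0 if key in active else 0.0 for key in KEYS]
    return multi_hot + [float(frame["mouse_x_delta"]), float(frame["mouse_y_delta"])]
```

For example, a frame with actions "WD]" yields ones at the W, D, and ] positions, followed by the raw mouse deltas.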
The combination of video and per-frame control signals creates a tight action-observation loop.

No Abrupt Visual Changes

Each clip is a contiguous segment of a single round from a single player's perspective. There are no mid-round cuts, no editing transitions, and no HUD overlay. The camera moves through the world in a physically plausible way, and we hide the player's weapon model to eliminate sudden visual changes caused by recoil, reloads, and weapon switching.

Many Use Cases

CS2-10k is designed for training interactive world models that learn how first-person visual observations change in response to player actions. The same aligned video, control, and state signals also support a range of related research workflows:
Action-Conditioned Video Generation
Train models to generate the next N frames given the current frame and a keyboard+mouse action sequence. Dense per-frame controls make CS2-10k a natural fit for models like GameNGen, Genie, DIAMOND, and OASIS.
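Here is a minimal sketch of how per-frame annotations could be cut into training examples for this setting. Each window pairs a few context frames with the next N target frames and the actions in between. The alignment rule (the action on frame i drives the transition to frame i + 1) is an assumption, not something the dataset documents, and training_windows is a hypothetical helper.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    context: range  # frames the model conditions on (last one is the "current" frame)
    actions: range  # frames whose action annotations condition the prediction
    targets: range  # frames the model is trained to generate


def training_windows(
    num_frames: int, n_context: int, n_future: int, stride: int = 1
) -> Iterator[Window]:
    """Yield frame-index windows that fit entirely inside one clip.

    Assumption: the action annotated on frame i drives the transition from frame i
    to frame i + 1, so targets t+1 .. t+N use the actions of frames t .. t+N-1.
    """
    if min(n_context, n_future, stride) < 1:
        raise ValueError("n_context, n_future and stride must all be >= 1")
    last_start = num_frames - n_context - n_future
    for start in range(0, last_start + 1, stride):
        t = start + n_context - 1  # index of the current frame
        yield Window(
            context=range(start, t + 1),
            actions=range(t, t + n_future),
            targets=range(t + 1, t + 1 + n_future),
        )
```

At 48 fps, n_future=48 corresponds to predicting one second ahead. A larger stride reduces overlap between neighboring windows.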
Egocentric Navigation
With 3D player positions and yaw/pitch per frame, the dataset supports learning navigation priors: what does moving forward look like in a confined corridor versus an open site? How does camera control correlate with positional change?
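To relate camera control to positional change, it helps to express each step in the player's own frame and to handle the yaw seam at ±180°. The sketch below assumes that yaw is measured counter-clockwise from the world +x axis in the horizontal x-y plane (the usual Source-engine convention, with z up). It also assumes the position columns are named position_x and position_y. Both assumptions should be verified against the data. The function names are hypothetical.

```python
import math


def wrap_degrees(angle: float) -> float:
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def yaw_delta(yaw_prev: float, yaw_next: float) -> float:
    """Signed yaw change that stays small across the ±180° seam (179 -> -179 gives +2)."""
    return wrap_degrees(yaw_next - yaw_prev)


def egocentric_step(prev: dict, nxt: dict) -> tuple[float, float]:
    """Displacement between two frames as (forward, left) in the player's heading frame.

    Assumes yaw is counter-clockwise from world +x in the ground (x-y) plane and
    uses the earlier frame's yaw as the heading.
    """
    dx = nxt["position_x"] - prev["position_x"]
    dy = nxt["position_y"] - prev["position_y"]
    theta = math.radians(prev["rotation_yaw"])
    forward = dx * math.cos(theta) + dy * math.sin(theta)
    left = -dx * math.sin(theta) + dy * math.cos(theta)
    return forward, left
```

Computing yaw changes with the wrap step avoids spurious ±358° spikes whenever a player turns through the seam, which would otherwise dominate any statistics on camera motion.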
Long-Horizon Planning
Full rounds of 60–90 seconds provide significantly longer temporal horizons than most embodied datasets. A model can learn tactical structure: site entry, holds, rotations, and retakes — each as a coherent visual sequence.
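At 48 fps, a 60–90 second round spans 2,880–4,320 frames, which is more than many video models can attend to at once. One simple option is uniform temporal subsampling to a fixed frame budget, keeping the first and last frames of the clip. This is sketched below as an assumption rather than a recommendation from the dataset authors. subsample_indices is a hypothetical helper.

```python
def subsample_indices(num_frames: int, budget: int) -> list[int]:
    """Pick at most `budget` evenly spaced frame indices, keeping the first and last frame."""
    if num_frames <= 0 or budget <= 0:
        return []
    if num_frames <= budget:
        return list(range(num_frames))
    if budget == 1:
        return [0]
    # Spacing > 1 here because num_frames > budget, so rounded indices stay distinct.
    step = (num_frames - 1) / (budget - 1)
    return [round(i * step) for i in range(budget)]
```

For a 90-second clip (4,320 frames) and a 256-frame budget, this keeps roughly every 17th frame.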
Multi-Agent World Modeling
All 10 players per match are recorded simultaneously with shared round and map identifiers, making it possible to model how one agent's actions causally affect another's observations.
Rendering Pipeline

If CS2-10k does not cover the scale, matches, or annotations you need, you can use our open-source pipeline at github.com/reka-ai/cs2-dem-renderer to render your own CS2 datasets. Given a .dem file, it performs a two-pass parse to extract per-player spawn/death intervals and per-frame button inputs, then drives CS2's built-in demo replay system to render first-person video for each player in each round. Frames are streamed in real time from CS2's movie output to ffmpeg (VAAPI HEVC), producing .mp4 clips alongside synchronized .parquet annotation files. A worker mode processes entire directories of demos with automatic deduplication, making it straightforward to run at the scale of...
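The first parsing pass reduces a demo to one time interval per player per round. The sketch below illustrates only that pairing step, using a hypothetical event format of (tick, round_number, player, kind) tuples. It is not code from cs2-dem-renderer. It also assumes a demo rate of 64 ticks per second when converting ticks to seconds and to 48 fps frame counts. Finally, it closes the interval of a player who survives at that round's end tick, supplied through a round_number-to-tick mapping.

```python
from dataclasses import dataclass
from typing import Iterable

TICK_RATE = 64.0  # assumed demo tick rate (ticks per second)
FPS = 48.0        # CS2-10k video frame rate


@dataclass(frozen=True)
class Interval:
    player: str
    round_number: int
    start_tick: int
    end_tick: int

    def duration_s(self) -> float:
        return (self.end_tick - self.start_tick) / TICK_RATE

    def num_frames(self) -> int:
        # Whole video frames that fit in the interval at the dataset frame rate.
        return int(self.duration_s() * FPS)


def player_intervals(
    events: Iterable[tuple[int, int, str, str]], round_end_ticks: dict[int, int]
) -> list[Interval]:
    """Pair each player's spawn with their death, or with the round end if they survive.

    events: hypothetical (tick, round_number, player, kind) tuples, kind in {"spawn", "death"}.
    """
    open_spawns: dict[tuple[int, str], int] = {}
    intervals = []
    for tick, rnd, player, kind in sorted(events):
        if kind == "spawn":
            open_spawns[(rnd, player)] = tick
        elif kind == "death" and (rnd, player) in open_spawns:
            intervals.append(Interval(player, rnd, open_spawns.pop((rnd, player)), tick))
    # Survivors: close their interval at the end of the round.
    for (rnd, player), start in open_spawns.items():
        intervals.append(Interval(player, rnd, start, round_end_ticks[rnd]))
    return sorted(intervals, key=lambda iv: (iv.round_number, iv.player))
```

Each resulting interval corresponds to one player-round clip to render, and num_frames gives its expected length at 48 fps under the assumed tick rate.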