Within a recording, behavioral clustering was performed in 30-second windows (10 s pre-tone, 10 s of tone, 10 s post-tone). Clustering was performed on every third video frame (video frame rate: 3 frames per second), so there are occasionally ~70 ms gaps between behavioral clusters.
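The windowing and frame subsampling described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `windowed_frames`, its parameters, and the tone onset and frame rate used in the usage lines are hypothetical, chosen only to show how a 30-second window splits into pre-tone, tone, and post-tone epochs when every third frame is kept.

```python
from typing import List, Tuple


def windowed_frames(
    tone_onset_s: float,
    fps: float,
    step: int = 3,
    pre_s: float = 10.0,
    tone_s: float = 10.0,
    post_s: float = 10.0,
) -> List[Tuple[int, str]]:
    """Return (frame_index, epoch_label) for every `step`-th frame in the window.

    The window runs from `pre_s` seconds before tone onset to `post_s`
    seconds after tone offset. All names and defaults are illustrative.
    """
    # Convert the epoch boundaries from seconds to frame indices.
    start = round((tone_onset_s - pre_s) * fps)
    tone_start = round(tone_onset_s * fps)
    tone_end = round((tone_onset_s + tone_s) * fps)
    stop = round((tone_onset_s + tone_s + post_s) * fps)
    if start < 0:
        raise ValueError("window starts before the first video frame")

    labeled = []
    # Keep only every `step`-th frame, as in the subsampling described above.
    for frame in range(start, stop, step):
        if frame < tone_start:
            label = "pre"
        elif frame < tone_end:
            label = "tone"
        else:
            label = "post"
        labeled.append((frame, label))
    return labeled


# Usage with assumed values (tone at 20 s, 30 frames per second);
# neither number is taken from the recordings described here.
frames = windowed_frames(tone_onset_s=20.0, fps=30.0)
counts = {name: sum(1 for _, lab in frames if lab == name)
          for name in ("pre", "tone", "post")}
print(len(frames), counts)
```

With equal-length epochs and a fixed step, each epoch contributes the same number of sampled frames, and the spacing between consecutive sampled frames is `step / fps` seconds.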