Using PyTorch, computer vision techniques, and a Convolutional Neural Network (CNN), I worked on a model that tracks players, teams, and basic performance statistics
These days, I don't play hockey as much as I'd like to, but it's been a part of me since I was a kid. Recently, I had the chance to help at the referee desk and keep some stats at the first Ice Hockey Tournament in Lima (3 on 3). This event involved an extraordinary effort by the Peruvian Inline Hockey Association (APHL) and a kind visit from the Friendship League. To add an AI twist, I used PyTorch, computer vision techniques, and a Convolutional Neural Network (CNN) to build a model that tracks players and teams and gathers some basic performance stats.
This article aims to be a quick guide to designing and deploying the model. Although the model still needs some fine-tuning, I hope it can help anyone get started in the interesting world of computer vision applied to sports. I would like to acknowledge and thank the Peruvian Inline Hockey Association (APHL) for allowing me to use a 40-second video sample of the tournament for this project (you can find the video input sample in the project's GitHub repository).
Before moving on with the project, I did some quick research to find a baseline I could work from and avoid "reinventing the wheel". I found that, in terms of using computer vision to track players, there is a lot of interesting work on soccer (not surprising, it being the most popular team sport in the world). However, I didn't find many resources for ice hockey. Roboflow has some interesting pre-trained models and datasets for training your own, but working with a hosted model presented some latency issues that I'll explain further on. In the end, I leveraged the soccer material for reading the video frames and obtaining the individual track IDs, following the basic concepts and tracking methodology explained in this tutorial (if you are interested in gaining a better understanding of some basic computer vision techniques, I suggest watching at least the first hour and a half of the tutorial).
With the tracking IDs covered, I then built my own path. As we walk through this article, we'll see how the project evolves from a simple object detection task to a model that fully detects players and teams and delivers some basic performance metrics (sample clips 01 to 08 are the author's own creation).
The tracking mechanism is the backbone of the model. It ensures that each detected object in the video is identified and assigned a unique identifier, maintaining this identity across every frame. The main components of the tracking mechanism are:
- YOLO (You Only Look Once): A powerful real-time object detection algorithm originally introduced in 2015 in the paper "You Only Look Once: Unified, Real-Time Object Detection". It stands out for its speed and its versatility in detecting around 80 pre-trained classes (it's important to note that it can also be trained on custom datasets to detect specific objects). For our use case, we will rely on YOLOv8x, a computer vision model built by Ultralytics on top of previous YOLO versions. You can download it here.
- ByteTrack Tracker: To understand ByteTrack, we have to understand MOT (Multiple Object Tracking), which involves tracking the movements of multiple objects over time in a video sequence and linking the objects detected in the current frame with the corresponding objects in previous frames. To accomplish this, we will use ByteTrack (introduced in 2021 in the paper "ByteTrack: Multi-Object Tracking by Associating Every Detection Box"). To implement the ByteTrack tracker and assign track IDs to detected objects, we will rely on Python's supervision library.
- OpenCV: a well-known library for various computer vision tasks in Python. For our use case, we will rely on OpenCV to visualize and annotate video frames with bounding boxes and text for each detected object.
To build our tracking mechanism, we'll begin with these two initial steps:
- Deploying the YOLO model with ByteTrack to detect objects (in our case, players) and assign unique track IDs.
- Initializing a dictionary to store the object tracks in a pickle (pkl) file. This will be extremely useful for avoiding the frame-by-frame object detection process every time we run the code, saving significant time.
For the next step, these are the Python packages that we'll need:
pip install ultralytics
pip install supervision
pip install opencv-python
Next, we'll specify our libraries and the paths for our sample video file and pickle file (if it exists; if not, the code will create one and save it at the same path):
#**********************************LIBRARIES*********************************#
from ultralytics import YOLO
import supervision as sv
import pickle
import os
import cv2

# INPUT-video file
video_path = 'D:/PYTHON/video_input.mp4'
# OUTPUT-video file
output_video_path = 'D:/PYTHON/output_video.mp4'
# PICKLE FILE (IF AVAILABLE, LOADS IT; IF NOT, SAVES IT AT THIS PATH)
pickle_path = 'D:/PYTHON/stubs/track_stubs.pkl'
Now let's go ahead and define our tracking mechanism (you can find the video input sample in the project's GitHub repository):
#*********************************TRACKING MECHANISM**************************#
class HockeyAnalyzer:
    def __init__(self, model_path):
        self.model = YOLO(model_path)
        self.tracker = sv.ByteTrack()

    def detect_frames(self, frames):
        batch_size = 20
        detections = []
        for i in range(0, len(frames), batch_size):
            detections_batch = self.model.predict(frames[i:i+batch_size], conf=0.1)
            detections += detections_batch
        return detections

    #********LOAD TRACKS FROM FILE OR DETECT OBJECTS-SAVES PICKLE FILE************#
    def get_object_tracks(self, frames, read_from_stub=False, stub_path=None):
        if read_from_stub and stub_path is not None and os.path.exists(stub_path):
            with open(stub_path, 'rb') as f:
                tracks = pickle.load(f)
            return tracks

        detections = self.detect_frames(frames)
        tracks = {"person": []}

        for frame_num, detection in enumerate(detections):
            cls_names = detection.names
            cls_names_inv = {v: k for k, v in cls_names.items()}

            # Tracking Mechanism
            detection_supervision = sv.Detections.from_ultralytics(detection)
            detection_with_tracks = self.tracker.update_with_detections(detection_supervision)

            tracks["person"].append({})
            for frame_detection in detection_with_tracks:
                bbox = frame_detection[0].tolist()
                cls_id = frame_detection[3]
                track_id = frame_detection[4]
                if cls_id == cls_names_inv.get('person', None):
                    tracks["person"][frame_num][track_id] = {"bbox": bbox}

            # Raw (untracked) detections are also available here if you need them
            for frame_detection in detection_supervision:
                bbox = frame_detection[0].tolist()
                cls_id = frame_detection[3]

        if stub_path is not None:
            with open(stub_path, 'wb') as f:
                pickle.dump(tracks, f)
        return tracks

    #***********************BOUNDING BOXES AND TRACK-IDs**************************#
    def draw_annotations(self, video_frames, tracks):
        output_video_frames = []
        for frame_num, frame in enumerate(video_frames):
            frame = frame.copy()
            player_dict = tracks["person"][frame_num]

            # Draw players
            for track_id, player in player_dict.items():
                color = player.get("team_color", (0, 0, 255))
                bbox = player["bbox"]
                x1, y1, x2, y2 = map(int, bbox)
                # Bounding boxes
                cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
                # Track_id
                cv2.putText(frame, str(track_id), (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, 2)

            output_video_frames.append(frame)
        return output_video_frames
The method starts by initializing the YOLO model and the ByteTrack tracker. Next, the frames are processed in batches of 20, using the YOLO model to detect the objects in each batch. If the pickle file is available at its path, the tracks are loaded from the file. If the pickle file is not available (you're running the code for the first time or have deleted a previous pickle file), get_object_tracks converts each detection into the format required by ByteTrack, updates the tracker with these detections, and stores the tracking information in a new pickle file at the designated path. Finally, the code iterates over each frame, drawing bounding boxes and track IDs for each detected object.
To execute the tracker and save a new output video with bounding boxes and track IDs, you can use the following code:
#*************** EXECUTES TRACKING MECHANISM AND OUTPUT VIDEO****************#
# Read the video frames
video_frames = []
cap = cv2.VideoCapture(video_path)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    video_frames.append(frame)
cap.release()

#********************* EXECUTE TRACKING METHOD WITH YOLO**********************#
tracker = HockeyAnalyzer('D:/PYTHON/yolov8x.pt')
tracks = tracker.get_object_tracks(video_frames, read_from_stub=True, stub_path=pickle_path)
annotated_frames = tracker.draw_annotations(video_frames, tracks)

#*********************** SAVES VIDEO FILE ************************************#
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
height, width, _ = annotated_frames[0].shape
out = cv2.VideoWriter(output_video_path, fourcc, 30, (width, height))
for frame in annotated_frames:
    out.write(frame)
out.release()
If everything in your code worked correctly, you should expect a video output similar to the one shown in sample clip 01.
TIP #01: Don't underestimate your compute power! When running the code for the first time, expect the frame processing to take some time, depending on your compute capacity. For me, it took between 45 and 50 minutes using a CPU-only setup (consider CUDA as an option). The YOLOv8x tracking mechanism, while powerful, demands significant compute resources (at times, my memory hit 99%, fingers crossed it didn't crash! 🙄). If you run into issues with this version of YOLO, lighter models are available on Ultralytics' GitHub to balance accuracy and compute capacity.
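If you want to check up front whether you can offload the work to a GPU, here is a minimal sketch (assuming a CUDA-capable build of PyTorch; yolov8n.pt is mentioned only as a lighter substitute you would download separately, not something used elsewhere in this project):

import torch
from ultralytics import YOLO

# Use the GPU if PyTorch can see one, otherwise stay on the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Running inference on: {device}")

# Swap in a lighter checkpoint such as yolov8n.pt if yolov8x.pt is too heavy for your machine
model = YOLO('D:/PYTHON/yolov8x.pt')

# Ultralytics' predict() accepts a device argument, so the same call works on CPU or GPU
results = model.predict('D:/PYTHON/video_input.mp4', conf=0.1, device=device)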
As you've seen from this first step, we have some challenges. Firstly, as expected, the model picks up all moving objects: players, referees, even people outside the rink. Secondly, those red bounding boxes can make tracking players a bit unclear and not very neat for presentation. In this section, we'll focus on narrowing our detection to objects within the rink only. Plus, we'll swap out those bounding boxes for ellipses at the players' feet, ensuring clearer visibility.
Let's switch from boxes to ellipses first. To accomplish this, we'll simply add a new method above the labels-and-bounding-boxes method in our existing code:
#************ Design of Ellipse for tracking players instead of Bounding boxes**************#
def draw_ellipse(self, frame, bbox, color, track_id=None, team=None):
    y2 = int(bbox[3])
    x_center = (int(bbox[0]) + int(bbox[2])) // 2
    width = int(bbox[2]) - int(bbox[0])
    color = (255, 0, 0)
    text_color = (255, 255, 255)

    cv2.ellipse(
        frame,
        center=(x_center, y2),
        axes=(int(width) // 2, int(0.35 * width)),
        angle=0.0,
        startAngle=-45,
        endAngle=235,
        color=color,
        thickness=2,
        lineType=cv2.LINE_4
    )

    if track_id is not None:
        rectangle_width = 40
        rectangle_height = 20
        x1_rect = x_center - rectangle_width // 2
        x2_rect = x_center + rectangle_width // 2
        y1_rect = (y2 - rectangle_height // 2) + 15
        y2_rect = (y2 + rectangle_height // 2) + 15
        cv2.rectangle(frame,
                      (int(x1_rect), int(y1_rect)),
                      (int(x2_rect), int(y2_rect)),
                      color,
                      cv2.FILLED)
        x1_text = x1_rect + 12
        if track_id > 99:
            x1_text -= 10
        font_scale = 0.4
        cv2.putText(
            frame,
            f"{track_id}",
            (int(x1_text), int(y1_rect + 15)),
            cv2.FONT_HERSHEY_SIMPLEX,
            font_scale,
            text_color,
            thickness=2
        )
    return frame
We'll also need to update the annotation step, replacing the bounding boxes and IDs with a call to the ellipse method:
#***********************BOUNDING BOXES AND TRACK-IDs**************************#
def draw_annotations(self, video_frames, tracks):
    output_video_frames = []
    for frame_num, frame in enumerate(video_frames):
        frame = frame.copy()
        player_dict = tracks["person"][frame_num]

        # Draw players
        for track_id, player in player_dict.items():
            bbox = player["bbox"]
            # Draw ellipse and tracking IDs
            self.draw_ellipse(frame, bbox, (0, 255, 0), track_id)
            x1, y1, x2, y2 = map(int, bbox)

        output_video_frames.append(frame)
    return output_video_frames
With these changes, your output video should look much neater, as shown in sample clip 02.
Now, to work with the rink boundaries, we need some basic knowledge of resolution in computer vision. In our use case, we're working with a 720p (1280×720 pixels) format, which means that each frame or image we process has dimensions of 1280 pixels (width) by 720 pixels (height).
What does it mean to work with a 720p (1280×720 pixels) format? It means that the image is made up of 1280 pixels horizontally and 720 pixels vertically. Coordinates in this format start at (0, 0) in the top-left corner of the image, with the x-coordinate increasing as you move right and the y-coordinate increasing as you move down. These coordinates are used to mark specific areas in the image, like using (x1, y1) for the top-left corner and (x2, y2) for the bottom-right corner of a box. Understanding this helps us measure distances and speeds, and decide where in the video we want to focus our analysis.
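If you are working with your own footage, a quick way to confirm the resolution before hard-coding any coordinates is to query it from OpenCV (a minimal sketch, reusing the same sample video path):

import cv2

cap = cv2.VideoCapture('D:/PYTHON/video_input.mp4')
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
cap.release()

print(f"Frame size: {width}x{height}")  # Expecting 1280x720 for this sample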
That said, we will start by marking the frame borders with green lines using the following code:
#********************* Border Definition for Frame***********************
import cv2

video_path = 'D:/PYTHON/video_input.mp4'
cap = cv2.VideoCapture(video_path)

#**************Read, Define and Draw corners of the frame****************
ret, frame = cap.read()
bottom_left = (0, 720)
bottom_right = (1280, 720)
upper_left = (0, 0)
upper_right = (1280, 0)
cv2.line(frame, bottom_left, bottom_right, (0, 255, 0), 2)
cv2.line(frame, bottom_left, upper_left, (0, 255, 0), 2)
cv2.line(frame, bottom_right, upper_right, (0, 255, 0), 2)
cv2.line(frame, upper_left, upper_right, (0, 255, 0), 2)

#*******************Save the frame with marked corners*********************
output_image_path = 'rink_area_marked_VALIDATION.png'
cv2.imwrite(output_image_path, frame)
print("Rink area saved:", output_image_path)
The result should be a green rectangle as shown in (a) in sample clip 03. But in order to track only the moving objects within the rink, we need a delimitation more like the one in (b).
Getting (b) right is an iterative process of trial and error, where you test different coordinates until you find the boundaries that best fit your model. Initially, I aimed to match the rink borders exactly. However, the tracking system struggled near the edges. To improve accuracy, I expanded the boundaries slightly to ensure all tracked objects within the rink were captured while excluding those outside. The result, shown in (b), was the best I could get (you may still find a better setup), defined by these coordinates (a quick way to validate them visually is sketched right after the list):
- Bottom Left Corner: (-450, 710)
- Bottom Right Corner: (2030, 710)
- Upper Left Corner: (352, 61)
- Upper Right Corner: (948, 61)
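Here is a minimal sketch (reusing the frame-saving approach above) for checking any candidate boundary: draw the four corners as a closed polygon on the first frame and inspect the saved image.

import cv2
import numpy as np

cap = cv2.VideoCapture('D:/PYTHON/video_input.mp4')
ret, frame = cap.read()
cap.release()

# Candidate rink boundary, ordered bottom-left, bottom-right, upper-right, upper-left
rink_coordinates = np.array([[-450, 710], [2030, 710], [948, 61], [352, 61]], dtype=np.int32)

# Draw the closed polygon in green and save it for visual inspection
cv2.polylines(frame, [rink_coordinates], isClosed=True, color=(0, 255, 0), thickness=2)
cv2.imwrite('rink_area_marked_VALIDATION.png', frame)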
Finally, we will define two more areas: the offensive zones for both the White and Yellow teams (where each team aims to score). This will enable us to gather some basic positional statistics and pressure metrics for each team inside its opponent's zone.
#**************YELLOW TEAM OFFENSIVE ZONE****************
Bottom Left Corner: (-450, 710)
Bottom Right Corner: (2030, 710)
Upper Left Corner: (200, 150)
Upper Right Corner: (1160, 150)

#**************WHITE TEAM OFFENSIVE ZONE****************
Bottom Left Corner: (180, 150)
Bottom Right Corner: (1100, 150)
Upper Left Corner: (352, 61)
Upper Right Corner: (900, 61)
We will set these coordinates aside for now and explain in the next step how we'll classify each team. Then, we'll bring it all together in our original tracking method.
Over 80 years have passed since the release of "A Logical Calculus of the Ideas Immanent in Nervous Activity", the paper written by Warren McCulloch and Walter Pitts in 1943, which laid solid ground for early neural network research. Later, in 1957, the mathematical model of a simplified neuron (receiving inputs, applying weights to those inputs, summing them up, and outputting a binary result) inspired Frank Rosenblatt to build the Mark I. This was the first hardware implementation designed to demonstrate the concept of a perceptron, a neural network model capable of learning from data to make binary classifications. Since then, the quest to make computers think like us hasn't slowed down. If this is your first deep dive into neural networks, or if you want to refresh and strengthen your knowledge, I recommend reading this series of articles by Shreya Rao as a great starting point for deep learning. Additionally, you can access my collection of stories (by different contributors) that I've gathered here, which you may find useful.
Why choose a Convolutional Neural Network (CNN)? Honestly, it wasn't my first choice. Initially, I tried building a model with LandingAI, a user-friendly platform for cloud deployment and Python connection via APIs. However, latency issues appeared (over 1,000 frames to process online). Similar latency problems occurred with pre-trained models in Roboflow, despite their quality datasets and pre-trained models. Realizing the need to run everything locally, I tried an MSE-based method to classify jersey colors for team and referee detection. While it seemed like the final solution, it showed low accuracy. After days of trial and error, I switched to CNNs. Among the different deep learning approaches, CNNs are well-suited for object detection, unlike LSTMs or RNNs, which are better fits for sequential data like language transcription or translation.
Before diving into the code, let's cover some basic concepts about its architecture:
- Sample dataset for learning: The dataset has been labeled into three classes: Referee, Team_Away (white-jersey players), and Team_Home (yellow-jersey players). A sample of each class has been divided into two sets: training data and validation data. The training data will be used by the CNN in each iteration (epoch) to "learn" patterns across multiple layers. The validation data will be used at the end of each iteration to evaluate the model's performance and measure how well it generalizes to new data. Creating the sample dataset wasn't too hard; it took me around 30 to 40 minutes to crop sample images of each class from the video and organize them into subdirectories. I managed to create a sample dataset of roughly 90 images that you can find in the project's GitHub repository.
- How does the model learn?: Input data moves through each layer of the neural network, which can have one or multiple layers connected together to make predictions. Every layer uses an activation function that processes data to make predictions or introduce changes to the data. Each connection between these layers has a weight, which determines how much influence one layer's output has on the next. The goal is to find the right combination of these weights that minimizes errors when predicting outcomes. Through a process called backpropagation, together with a loss function, the model adjusts these weights to reduce errors and improve accuracy. This process repeats in what's called an epoch (forward pass + backpropagation), with the model getting better at making predictions in each cycle as it learns from its mistakes.
- Activation function: As mentioned before, the activation function plays an important role in the model's learning process. I chose ReLU (Rectified Linear Unit) because it's known for being computationally efficient and for mitigating what is called the vanishing gradient problem (where networks with multiple layers may stop learning effectively). While ReLU works well, other functions like sigmoid, tanh, or swish also have their uses depending on how complex the network is.
- Epochs: Setting the right number of epochs involves experimentation. You should keep in mind factors such as the complexity of the dataset, the architecture of your CNN model, and your computational resources. In general, you should monitor the model's performance in each iteration and stop training when improvements become minimal, to prevent overfitting. Given my small training dataset, I decided to start with 10 epochs as a baseline. However, adjustments may be necessary in other scenarios based on metric performance and validation results.
- Adam (Adaptive Moment Estimation): Ultimately, the goal is to reduce the error between predicted and true outputs. As mentioned before, backpropagation plays a key role here by adjusting and updating the neural network weights to improve predictions over time. While backpropagation handles weight updates based on gradients from the loss function, the Adam algorithm enhances this process by dynamically adjusting the learning rate to gradually lower the error or loss function. In other words, it fine-tunes how quickly the model learns (a tiny standalone example of one such training step follows right after this list).
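To make these ideas concrete before building the full classifier, here is a minimal, self-contained sketch (a toy two-layer network on random data, not part of the project) showing one cycle of forward pass, cross-entropy loss, backpropagation, and an Adam weight update:

import torch
import torch.nn as nn
import torch.optim as optim

# Toy data: 8 samples, 4 features, 3 classes (stand-ins for Referee/Team_Away/Team_Home)
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))

# A tiny network: linear layer -> ReLU activation -> linear layer
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# One step: forward pass, loss, backpropagation, Adam update
outputs = model(inputs)            # forward pass
loss = criterion(outputs, labels)  # how far predictions are from the labels
optimizer.zero_grad()
loss.backward()                    # backpropagation: gradients for every weight
optimizer.step()                   # Adam adjusts the weights
print(f"Loss after one update: {loss.item():.4f}")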
That said, in order to run our CNN model we will need the following Python packages:
pip install torch torchvision
pip install matplotlib
pip install scikit-learn
Tip #02: Make sure PyTorch is installed properly. All my tools are set up in an Anaconda environment, and when I installed PyTorch, at first it seemed to be set up correctly. However, some issues appeared while running some libraries. Initially, I thought it was the code, but after several revisions and no success, I had to reinstall Anaconda and install PyTorch in a clean environment, and with that, problem fixed!
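A quick sanity check I now run after installing (a small sketch; the versions printed will differ on your machine) is to import the core libraries and run a tiny tensor operation before touching any project code:

import torch
import torchvision

print("PyTorch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())

# A tiny tensor operation to confirm the install actually works end to end
x = torch.rand(2, 3)
print(x @ x.T)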
Next, we'll specify our libraries and the path to our sample dataset:
# ************CONVOLUTIONAL NEURAL NETWORK-THREE CLASSES DETECTION**************************
# REFEREE
# WHITE TEAM (Team_away)
# YELLOW TEAM (Team_home)

import os
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt

# Training and Validation Datasets
# Download the teams_sample_dataset folder from the project's GitHub repository
data_dir = 'D:/PYTHON/teams_sample_dataset'
First, we will make sure every image is equally sized (resized to 150×150 pixels), then convert each one to a format the code can understand (in PyTorch, input data is typically represented as tensor objects). Finally, we will adjust the colors to make them easier for the model to work with (normalization) and set up a procedure to load the images. Together, these steps prepare the images and organize them so the model can effectively start learning from them, avoiding deviations caused by data format.
#******************************Data transformation***********************************
transform = transforms.Compose([
    transforms.Resize((150, 150)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

# Load dataset
train_dataset = datasets.ImageFolder(os.path.join(data_dir, 'train'), transform=transform)
val_dataset = datasets.ImageFolder(os.path.join(data_dir, 'val'), transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
Next, we'll define our CNN's architecture:
#********************************CNN Model Architecture**************************************
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128 * 18 * 18, 512)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, 3)  # Three classes (Referee, Team_away, Team_home)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 128 * 18 * 18)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
You'll notice that our CNN model has three convolutional layers (conv1, conv2, conv3). The data starts in the convolutional layer (conv), where the activation function (ReLU) is applied. This function enables the network to learn complex patterns and relationships in the data. Following this, the pooling layer is applied. What is max pooling? It's a technique that reduces the image size while retaining important features, which helps with efficient training and optimizes memory resources. This process repeats across conv1 to conv3. Finally, the data passes through the fully connected layers (fc1, fc2) for the final classification (or decision-making).
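If you are wondering where fc1's 128 * 18 * 18 input size comes from: each max-pooling layer halves the spatial dimensions (rounding down), so a 150×150 input shrinks to 75, then 37, then 18, while the last convolution outputs 128 channels. A small sketch to confirm this is to pass a dummy tensor through the convolutional part of the architecture above and inspect its shape:

import torch

model = CNNModel()
dummy = torch.zeros(1, 3, 150, 150)  # one fake RGB image at 150x150

with torch.no_grad():
    x = model.pool(torch.relu(model.conv1(dummy)))  # -> (1, 32, 75, 75)
    x = model.pool(torch.relu(model.conv2(x)))      # -> (1, 64, 37, 37)
    x = model.pool(torch.relu(model.conv3(x)))      # -> (1, 128, 18, 18)

print(x.shape)  # torch.Size([1, 128, 18, 18]) -> flattened size is 128 * 18 * 18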
As the next step, we initialize our model, configure categorical cross-entropy as the loss function (commonly used for classification tasks), and designate Adam as our optimizer. As mentioned earlier, we'll run the model over a full cycle of 10 epochs.
#********************************CNN TRAINING**********************************************
# Model-loss function-optimizer
model = CNNModel()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

#*********************************Training*************************************************
num_epochs = 10
train_losses, val_losses = [], []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        labels = labels.type(torch.LongTensor)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    train_losses.append(running_loss / len(train_loader))

    model.eval()
    val_loss = 0.0
    all_labels = []
    all_preds = []
    with torch.no_grad():
        for inputs, labels in val_loader:
            outputs = model(inputs)
            labels = labels.type(torch.LongTensor)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, preds = torch.max(outputs, 1)
            all_labels.extend(labels.tolist())
            all_preds.extend(preds.tolist())
To track performance, we will add some code to monitor the training progress, print the validation metrics, and plot them. Finally, we save the model as hockey_team_classifier.pth to a designated path of your choice.
#********************************METRICS & PERFORMANCE************************************
    # (still inside the epoch loop)
    val_losses.append(val_loss / len(val_loader))
    val_accuracy = accuracy_score(all_labels, all_preds)
    val_precision = precision_score(all_labels, all_preds, average='macro', zero_division=1)
    val_recall = recall_score(all_labels, all_preds, average='macro', zero_division=1)
    val_f1 = f1_score(all_labels, all_preds, average='macro', zero_division=1)
    print(f"Epoch [{epoch + 1}/{num_epochs}], "
          f"Loss: {train_losses[-1]:.4f}, "
          f"Val Loss: {val_losses[-1]:.4f}, "
          f"Val Acc: {val_accuracy:.2%}, "
          f"Val Precision: {val_precision:.4f}, "
          f"Val Recall: {val_recall:.4f}, "
          f"Val F1 Score: {val_f1:.4f}")

#*******************************SHOW METRICS & PERFORMANCE**********************************
plt.plot(train_losses, label='Train Loss')
plt.plot(val_losses, label='Validation Loss')
plt.legend()
plt.show()

# SAVE THE MODEL FOR THE GH_CV_track_teams CODE
torch.save(model.state_dict(), 'D:/PYTHON/hockey_team_classifier.pth')
Additionally, alongside your "pth" file, after working through all the steps described above (you can find the complete code in the project's GitHub repository), you should expect to see an output like the following (metrics may vary slightly):
#**************CNN PERFORMANCE ACROSS TRAINING EPOCHS************************
Epoch [1/10], Loss: 1.5346, Val Loss: 1.2339, Val Acc: 47.37%, Val Precision: 0.7172, Val Recall: 0.5641, Val F1 Score: 0.4167
Epoch [2/10], Loss: 1.1473, Val Loss: 1.1664, Val Acc: 55.26%, Val Precision: 0.6965, Val Recall: 0.6296, Val F1 Score: 0.4600
Epoch [3/10], Loss: 1.0139, Val Loss: 0.9512, Val Acc: 57.89%, Val Precision: 0.6054, Val Recall: 0.6054, Val F1 Score: 0.5909
Epoch [4/10], Loss: 0.8937, Val Loss: 0.8242, Val Acc: 60.53%, Val Precision: 0.7222, Val Recall: 0.5645, Val F1 Score: 0.5538
Epoch [5/10], Loss: 0.7936, Val Loss: 0.7177, Val Acc: 63.16%, Val Precision: 0.6667, Val Recall: 0.6309, Val F1 Score: 0.6419
Epoch [6/10], Loss: 0.6871, Val Loss: 0.7782, Val Acc: 68.42%, Val Precision: 0.6936, Val Recall: 0.7128, Val F1 Score: 0.6781
Epoch [7/10], Loss: 0.6276, Val Loss: 0.5684, Val Acc: 78.95%, Val Precision: 0.8449, Val Recall: 0.7523, Val F1 Score: 0.7589
Epoch [8/10], Loss: 0.4198, Val Loss: 0.5613, Val Acc: 86.84%, Val Precision: 0.8736, Val Recall: 0.8958, Val F1 Score: 0.8653
Epoch [9/10], Loss: 0.3959, Val Loss: 0.3824, Val Acc: 92.11%, Val Precision: 0.9333, Val Recall: 0.9213, Val F1 Score: 0.9243
Epoch [10/10], Loss: 0.2509, Val Loss: 0.2651, Val Acc: 97.37%, Val Precision: 0.9762, Val Recall: 0.9792, Val F1 Score: 0.9769
After finishing 10 epochs, the CNN model shows clear improvement in its performance metrics. Initially, in epoch 1, the model starts with a training loss of 1.5346 and a validation accuracy of 47.37%. How should we understand this starting point?
Accuracy is one of the most common metrics for evaluating classification performance. In our case, it represents the proportion of correctly predicted classes out of the total. However, high accuracy alone doesn't guarantee overall model performance; you can still have poor predictions for specific classes (as I experienced in early trials). Regarding training loss, it measures how effectively the model learns to map input data to the correct labels. Since we're doing classification, cross-entropy loss quantifies the difference between the predicted class probabilities and the actual labels. A starting value like 1.5346 indicates significant differences between predicted and actual classes; ideally, this value should approach 0 as training progresses. As the epochs progress, we observe a significant drop in training loss and an increase in validation accuracy. By the final epoch, the training and validation losses reach lows of 0.2509 and 0.2651, respectively.
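As a reference point for those loss values, here is a small sketch (toy numbers, not project data) showing how nn.CrossEntropyLoss behaves with three classes: a near-uniform prediction sits a little above 1 (the fully uniform case would be ln(3) ≈ 1.10), while a confident correct prediction drives the loss toward 0:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
label = torch.tensor([0])  # the true class

# Undecided model: almost equal logits for the three classes -> loss close to ln(3)
undecided = torch.tensor([[0.1, 0.0, 0.0]])
print(criterion(undecided, label).item())  # ~1.03

# Confident, correct model: a large logit on the true class -> loss near 0
confident = torch.tensor([[8.0, 0.0, 0.0]])
print(criterion(confident, label).item())  # ~0.0007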
To test our CNN model, we can select a sample of player images and evaluate its prediction ability. For testing, you can run the following code and use the validation_dataset folder in the project's GitHub repository.
# *************TEST CNN MODEL WITH SAMPLE DATASET***************************
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image

# SAMPLE DATASET FOR VALIDATION
test_dir = 'D:/PYTHON/validation_dataset'

# CNN MODEL FOR TEAM PREDICTIONS
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128 * 18 * 18, 512)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, 3)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 128 * 18 * 18)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# CNN MODEL PREVIOUSLY SAVED
model = CNNModel()
model.load_state_dict(torch.load('D:/PYTHON/hockey_team_classifier.pth'))
model.eval()

transform = transforms.Compose([
    transforms.Resize((150, 150)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

#******************ITERATION ON SAMPLE IMAGES-ACCURACY TEST*****************************
class_names = ['team_referee', 'team_away', 'team_home']

def predict_image(image_path, model, transform):
    # LOADS IMAGE
    image = Image.open(image_path)
    image = transform(image).unsqueeze(0)
    # MAKES PREDICTIONS
    with torch.no_grad():
        output = model(image)
        _, predicted = torch.max(output, 1)
        team = class_names[predicted.item()]
    return team

for image_name in os.listdir(test_dir):
    image_path = os.path.join(test_dir, image_name)
    if os.path.isfile(image_path):
        predicted_team = predict_image(image_path, model, transform)
        print(f'Image {image_name}: The player belongs to {predicted_team}')
The output should look something like this:
# *************CNN MODEL TEST - OUTPUT ***********************************#
Image Away_image04.jpg: The player belongs to team_away
Image Away_image12.jpg: The player belongs to team_away
Image Away_image14.jpg: The player belongs to team_away
Image Home_image07.jpg: The player belongs to team_home
Image Home_image13.jpg: The player belongs to team_home
Image Home_image16.jpg: The player belongs to team_home
Image Referee_image04.jpg: The player belongs to team_referee
Image Referee_image09.jpg: The player belongs to team_referee
Image Referee_image10.jpg: The player belongs to team_referee
Image Referee_image11.jpg: The player belongs to team_referee
As you can see, the model shows quite good ability at identifying the teams and excluding the referee as a team player.
Tip #03: Something I learned during the CNN design process is that adding complexity doesn't always improve performance. Initially, I experimented with deeper models (more convolutional layers) and color-based augmentation to improve the recognition of players' jerseys. However, on my small dataset I ran into overfitting rather than learning generalizable features (all images were predicted as white team players or referees). Regularization techniques like dropout and batch normalization are also important; they help impose constraints during training, ensuring the model can generalize well to new data. Less can sometimes mean more in terms of results 😁.
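For reference, this is roughly how batch normalization would slot into one of the convolutional blocks alongside the dropout we already use (a hypothetical, simplified variation for illustration only, not the architecture trained in this article):

import torch.nn as nn
import torch.nn.functional as F

class CNNModelBN(nn.Module):
    """Illustrative variant with batch normalization after the first convolution."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)        # normalizes activations per channel
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.5)       # randomly zeroes activations during training
        self.fc1 = nn.Linear(32 * 75 * 75, 3)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))  # conv -> batch norm -> ReLU -> pool
        x = x.view(-1, 32 * 75 * 75)
        x = self.dropout(x)
        return self.fc1(x)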
Putting it all together requires some adjustments to the tracking mechanism described earlier. Here's a breakdown of the updated code, step by step.
First, we'll set up the libraries and paths we need. Note that the paths for our pickle file and the CNN model are now specified. This time, if the pickle file isn't found at its path, the code will throw an error. Use the earlier code to generate the pickle file if needed, and use this updated version to perform the video analysis:
import cv2
import numpy as np
from ultralytics import YOLO
import pickle
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image

# MODEL INPUTS
model_path = 'D:/PYTHON/yolov8x.pt'
video_path = 'D:/PYTHON/video_input.mp4'
output_path = 'D:/PYTHON/output_video.mp4'
tracks_path = 'D:/PYTHON/stubs/track_stubs.pkl'
classifier_path = 'D:/PYTHON/hockey_team_classifier.pth'
Next, we will load the models, specify the rink coordinates, and start the process of detecting objects in each frame in batches of 20, as we did before. Note that for now, we will only use the rink boundaries to focus the analysis on the rink. In the final steps of the article, when we include the performance stats, we'll use the offensive zone coordinates.
#*************************** Loads models and rink coordinates********************#
class_names = ['Referee', 'Tm_white', 'Tm_yellow']

class HockeyAnalyzer:
    def __init__(self, model_path, classifier_path):
        self.model = YOLO(model_path)
        self.classifier = self.load_classifier(classifier_path)
        self.transform = transforms.Compose([
            transforms.Resize((150, 150)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
        ])
        self.rink_coordinates = np.array([[-450, 710], [2030, 710], [948, 61], [352, 61]])
        self.zone_white = [(180, 150), (1100, 150), (900, 61), (352, 61)]
        self.zone_yellow = [(-450, 710), (2030, 710), (1160, 150), (200, 150)]

    #******************** Detect objects in each frame **********************************#
    def detect_frames(self, frames):
        batch_size = 20
        detections = []
        for i in range(0, len(frames), batch_size):
            detections_batch = self.model.predict(frames[i:i+batch_size], conf=0.1)
            detections += detections_batch
        return detections
Next, we'll add the methods to predict each player's team:
#*********************** Loads CNN Model**********************************************#
def load_classifier(self, classifier_path):
    model = CNNModel()
    model.load_state_dict(torch.load(classifier_path, map_location=torch.device('cpu')))
    model.eval()
    return model

def predict_team(self, image):
    with torch.no_grad():
        output = self.classifier(image)
        _, predicted = torch.max(output, 1)
        predicted_index = predicted.item()
        team = class_names[predicted_index]
    return team
As the next step, we'll add the method described earlier to switch from bounding boxes to ellipses:
#************ Ellipse for tracking players instead of Bounding boxes*******************#
def draw_ellipse(self, frame, bbox, color, track_id=None, team=None):
    y2 = int(bbox[3])
    x_center = (int(bbox[0]) + int(bbox[2])) // 2
    width = int(bbox[2]) - int(bbox[0])

    if team == 'Referee':
        color = (0, 255, 255)
        text_color = (0, 0, 0)
    else:
        color = (255, 0, 0)
        text_color = (255, 255, 255)

    cv2.ellipse(
        frame,
        center=(x_center, y2),
        axes=(int(width) // 2, int(0.35 * width)),
        angle=0.0,
        startAngle=-45,
        endAngle=235,
        color=color,
        thickness=2,
        lineType=cv2.LINE_4
    )

    if track_id is not None:
        rectangle_width = 40
        rectangle_height = 20
        x1_rect = x_center - rectangle_width // 2
        x2_rect = x_center + rectangle_width // 2
        y1_rect = (y2 - rectangle_height // 2) + 15
        y2_rect = (y2 + rectangle_height // 2) + 15
        cv2.rectangle(frame,
                      (int(x1_rect), int(y1_rect)),
                      (int(x2_rect), int(y2_rect)),
                      color,
                      cv2.FILLED)
        x1_text = x1_rect + 12
        if track_id > 99:
            x1_text -= 10
        font_scale = 0.4
        cv2.putText(
            frame,
            f"{track_id}",
            (int(x1_text), int(y1_rect + 15)),
            cv2.FONT_HERSHEY_SIMPLEX,
            font_scale,
            text_color,
            thickness=2
        )
    return frame
Now, it's time to add the analyzer, which reads the pickle file, narrows the analysis to the rink boundaries we defined earlier, and calls the CNN model to identify each player's team and add labels. Note that we include a feature to label referees with a different color and change the color of their ellipses as well. The code ends by writing the processed frames to an output video.
#******************* Loads Tracked Data (pickle file)**********************************#
def analyze_video(self, video_path, output_path, tracks_path):
    with open(tracks_path, 'rb') as f:
        tracks = pickle.load(f)

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print("Error: Could not open video.")
        return

    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height))

    frame_num = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        #***********Checks if the player falls within the rink area**********************************#
        mask = np.zeros(frame.shape[:2], dtype=np.uint8)
        cv2.fillConvexPoly(mask, self.rink_coordinates, 1)
        mask = mask.astype(bool)
        # Draw rink area
        #cv2.polylines(frame, [self.rink_coordinates], isClosed=True, color=(0, 255, 0), thickness=2)

        # Get tracks from frame
        player_dict = tracks["person"][frame_num]
        for track_id, player in player_dict.items():
            bbox = player["bbox"]

            # Check if the player is within the rink area
            x_center = int((bbox[0] + bbox[2]) / 2)
            y_center = int((bbox[1] + bbox[3]) / 2)
            if not mask[y_center, x_center]:
                continue

            #**********************************Team Prediction********************************************#
            x1, y1, x2, y2 = map(int, bbox)
            cropped_image = frame[y1:y2, x1:x2]
            cropped_pil_image = Image.fromarray(cv2.cvtColor(cropped_image, cv2.COLOR_BGR2RGB))
            transformed_image = self.transform(cropped_pil_image).unsqueeze(0)
            team = self.predict_team(transformed_image)

            #************ Ellipse for tracked players and labels******************************************#
            self.draw_ellipse(frame, bbox, (0, 255, 0), track_id, team)
            font_scale = 1
            text_offset = -20

            if team == 'Referee':
                rectangle_width = 60
                rectangle_height = 25
                x1_rect = x1
                x2_rect = x1 + rectangle_width
                y1_rect = y1 - 30
                y2_rect = y1 - 5
                # Different setup for Referee
                cv2.rectangle(frame,
                              (int(x1_rect), int(y1_rect)),
                              (int(x2_rect), int(y2_rect)),
                              (0, 0, 0),
                              cv2.FILLED)
                text_color = (255, 255, 255)
            else:
                if team == 'Tm_white':
                    text_color = (255, 215, 0)  # White team: blue labels
                else:
                    text_color = (0, 255, 255)  # Yellow team: yellow labels

            # Draw team labels
            cv2.putText(
                frame,
                team,
                (int(x1), int(y1) + text_offset),
                cv2.FONT_HERSHEY_PLAIN,
                font_scale,
                text_color,
                thickness=2
            )

        # Write output video
        out.write(frame)
        frame_num += 1

    cap.release()
    out.release()
Finally, we add the CNN's architecture (defined during the CNN design process) and execute the hockey analyzer:
#**********************CNN Model Architecture ******************************#
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128 * 18 * 18, 512)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, len(class_names))

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 128 * 18 * 18)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

#*********Execute HockeyAnalyzer/classifier and Save Output************#
analyzer = HockeyAnalyzer(model_path, classifier_path)
analyzer.analyze_video(video_path, output_path, tracks_path)
After running all the steps, your video output should look something like this:
Note that in this latest update, objects are detected only within the ice rink, and the teams are differentiated, as is the referee. While the CNN model still needs fine-tuning and occasionally loses stability with some players, it remains mostly reliable and accurate throughout the video.
The ability to track teams and players opens up exciting possibilities for measuring performance, such as generating heatmaps, analyzing speed and distance covered, tracking actions like zone entries or exits, and diving into detailed player metrics. To get a taste of this, we'll add three performance metrics: average speed per player, skating distance covered by each team, and offensive pressure (measured as the percentage of each team's distance covered that is skated in its opponent's zone). I'll leave more detailed statistics up to you!
We begin by adapting the coordinates of the ice rink from pixel-based measurements to approximate meters. This adjustment allows us to read our data in meters rather than pixels. The real-world dimensions of the ice rink seen in the video are roughly 15 m × 30 m (15 meters wide and 30 meters long). To facilitate this conversion, we introduce a method that converts pixel coordinates to meters. By defining the rink's actual dimensions and using the pixel coordinates of its corners (from left to right and top to bottom), we obtain conversion factors. These factors will support our process of estimating distances in meters and speeds in meters per second. (Another interesting approach you can explore and apply is perspective transformation; a small sketch of it follows the code below.)
#*********************Loads models and rink coordinates*****************#
class_names = ['Referee', 'Tm_white', 'Tm_yellow']

class HockeyAnalyzer:
    def __init__(self, model_path, classifier_path):
        *
        *
        *
        *
        *
        *
        self.pixel_to_meter_conversion()  #<------ Add this utility method

#***********Pixel-based measurements to meters***************************#
def pixel_to_meter_conversion(self):
    # Rink real dimensions in meters
    rink_width_m = 15
    rink_height_m = 30

    # Pixel coordinates for the rink dimensions
    left_pixel, right_pixel = self.rink_coordinates[0][0], self.rink_coordinates[1][0]
    top_pixel, bottom_pixel = self.rink_coordinates[2][1], self.rink_coordinates[0][1]

    # Conversion factors
    self.pixels_per_meter_x = (right_pixel - left_pixel) / rink_width_m
    self.pixels_per_meter_y = (bottom_pixel - top_pixel) / rink_height_m

def convert_pixels_to_meters(self, distance_pixels):
    # Convert a pixel distance to meters along each axis
    return distance_pixels / self.pixels_per_meter_x, distance_pixels / self.pixels_per_meter_y
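If you want to experiment with the perspective-transformation route mentioned above instead of these simple linear factors, a minimal sketch with OpenCV could look like this (the pixel corners are the rink coordinates defined earlier; the target rectangle that maps them onto a 15 × 30 metric grid is my own assumption for illustration):

import cv2
import numpy as np

# Rink corners in pixel space: bottom-left, bottom-right, upper-right, upper-left
rink_pixels = np.float32([[-450, 710], [2030, 710], [948, 61], [352, 61]])
# The same corners in a top-down, metric view (15 m wide, 30 m long)
rink_meters = np.float32([[0, 30], [15, 30], [15, 0], [0, 0]])

# Homography that maps any pixel position to rink coordinates in meters
M = cv2.getPerspectiveTransform(rink_pixels, rink_meters)

# Example: project a hypothetical player position (x_center, y_center) into metric space
player_px = np.float32([[[640, 400]]])
player_m = cv2.perspectiveTransform(player_px, M)
print(player_m)  # approximate (x, y) position on the rink, in meters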
We are now ready to add each player's speed, measured in meters per second. To do this, we'll need to make three modifications. First, initialize an empty dictionary named previous_positions in the HockeyAnalyzer class to help us compare each player's current and previous positions. Similarly, we'll create a team_stats structure to store each team's stats for later visualization.
Next, we will add a speed method that estimates players' speed in pixels per second, and then use the conversion factors (explained earlier) to transform it into meters per second. Finally, from the analyze_video method, we'll call our new speed method and add the speed to each tracked object (players and the referee). This is what the changes look like:
#*********************Loads models and rink coordinates*****************#
class_names = ['Referee', 'Tm_white', 'Tm_yellow']

class HockeyAnalyzer:
    def __init__(self, model_path, classifier_path):
        *
        *
        *
        *
        *
        *
        *
        self.pixel_to_meter_conversion()
        self.previous_positions = {}  #<------ Add this. Initializes an empty dictionary
        self.team_stats = {
            'Tm_white': {'distance': 0, 'speed': [], 'count': 0, 'offensive_pressure': 0},
            'Tm_yellow': {'distance': 0, 'speed': [], 'count': 0, 'offensive_pressure': 0}
        }  #<------ Add this. Initializes the team stats structure

#**************** Speed: meters per second********************************#
def calculate_speed(self, track_id, x_center, y_center, fps):
    current_position = (x_center, y_center)
    if track_id in self.previous_positions:
        prev_position = self.previous_positions[track_id]
        distance_pixels = np.linalg.norm(np.array(current_position) - np.array(prev_position))
        distance_meters_x, distance_meters_y = self.convert_pixels_to_meters(distance_pixels)
        speed_meters_per_second = (distance_meters_x**2 + distance_meters_y**2)**0.5 * fps
    else:
        speed_meters_per_second = 0
    self.previous_positions[track_id] = current_position
    return speed_meters_per_second
#******************* Loads Tracked Data (pickle file)**********************************#
def analyze_video(self, video_path, output_path, tracks_path):
    with open(tracks_path, 'rb') as f:
        tracks = pickle.load(f)
    *
    *
    *
    *
    *
    *
    *
    *
            # Draw team label
            cv2.putText(
                frame,
                team,
                (int(x1), int(y1) + text_offset),
                cv2.FONT_HERSHEY_PLAIN,
                font_scale,
                text_color,
                thickness=2
            )

            #**************Add these lines of code --->:
            speed = self.calculate_speed(track_id, x_center, y_center, fps)

            # Speed label
            speed_font_scale = 0.8
            speed_y_position = int(y1) + 20
            if speed_y_position > int(y1) - 5:
                speed_y_position = int(y1) - 5
            cv2.putText(
                frame,
                f"Speed: {speed:.2f} m/s",
                (int(x1), speed_y_position),
                cv2.FONT_HERSHEY_PLAIN,
                speed_font_scale,
                text_color,
                thickness=2
            )

        # Write output video
        out.write(frame)
        frame_num += 1

    cap.release()
    out.release()
If you have trouble adding these new lines of code, you can always go to the project's GitHub repository, where you'll find the complete integrated code. Your video output at this point should look like this (notice that the speed has been added to each player's label):
Finally, let's add a stats board where we can track the average speed per player for each team, along with other metrics such as distance covered and offensive pressure in the opponent's zone.
We've already defined the offensive zones and integrated them into our code. Now, we need to track how often each player enters their opponent's zone. To achieve this, we'll implement a method using the ray casting algorithm. This algorithm checks whether a player's position is inside the white or yellow team's offensive zone. It works by drawing an imaginary ray from the player's position and counting how many of the zone's borders it crosses: if the ray crosses an odd number of borders, the point is inside; if it crosses an even number, the point is outside. The code then scans the entire video to determine each tracked object's zone status.
#************ Locate player's position in Target Zone***********************#
def is_inside_zone(self, position, zone):
    x, y = position
    n = len(zone)
    inside = False
    p1x, p1y = zone[0]
    for i in range(n + 1):
        p2x, p2y = zone[i % n]
        if y > min(p1y, p2y):
            if y <= max(p1y, p2y):
                if x <= max(p1x, p2x):
                    if p1y != p2y:
                        xinters = (y - p1y) * (p2x - p1x) / (p2y - p1y) + p1x
                    if p1x == p2x or x <= xinters:
                        inside = not inside
        p1x, p1y = p2x, p2y
    return inside
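A quick way to sanity-check the zones (a small sketch using the zone polygons stored in the analyzer and a couple of hypothetical on-ice positions) is to call the method directly:

analyzer = HockeyAnalyzer(model_path, classifier_path)

# A point high in the frame should land inside the white team's offensive zone...
print(analyzer.is_inside_zone((640, 100), analyzer.zone_white))    # expected: True
# ...and a point near the bottom should land inside the yellow team's offensive zone
print(analyzer.is_inside_zone((640, 600), analyzer.zone_yellow))   # expected: True
# The same bottom point falls outside the white team's offensive zone
print(analyzer.is_inside_zone((640, 600), analyzer.zone_white))    # expected: False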
Now we'll handle the performance metrics by adding a method that displays average player speed, total distance covered, and offensive pressure (the percentage of distance covered that was skated in the opponent's zone) in a table format for each team. Using OpenCV, we'll format these metrics into a table overlaid on the video, and we'll incorporate a dynamic update mechanism to maintain real-time statistics during gameplay.
#*******************************Performance metrics*********************************************#
def draw_stats(self, frame):
    avg_speed_white = np.mean(self.team_stats['Tm_white']['speed']) if self.team_stats['Tm_white']['count'] > 0 else 0
    avg_speed_yellow = np.mean(self.team_stats['Tm_yellow']['speed']) if self.team_stats['Tm_yellow']['count'] > 0 else 0
    distance_white = self.team_stats['Tm_white']['distance']
    distance_yellow = self.team_stats['Tm_yellow']['distance']

    offensive_pressure_white = self.team_stats['Tm_white'].get('offensive_pressure', 0)
    offensive_pressure_yellow = self.team_stats['Tm_yellow'].get('offensive_pressure', 0)
    Pressure_ratio_W = offensive_pressure_white / distance_white * 100 if self.team_stats['Tm_white']['distance'] > 0 else 0
    Pressure_ratio_Y = offensive_pressure_yellow / distance_yellow * 100 if self.team_stats['Tm_yellow']['distance'] > 0 else 0

    table = [
        ["", "Away_White", "Home_Yellow"],
        ["Average Speed\nPlayer", f"{avg_speed_white:.2f} m/s", f"{avg_speed_yellow:.2f} m/s"],
        ["Distance\nCovered", f"{distance_white:.2f} m", f"{distance_yellow:.2f} m"],
        ["Offensive\nPressure %", f"{Pressure_ratio_W:.2f} %", f"{Pressure_ratio_Y:.2f} %"],
    ]
    text_color = (0, 0, 0)
    start_x, start_y = 10, 590
    row_height = 30     # Manage height between rows
    column_width = 150  # Manage width between columns
    font_scale = 1

    def put_multiline_text(frame, text, position, font, font_scale, color, thickness, line_type, line_spacing=1.0):
        y0, dy = position[1], int(font_scale * 20 * line_spacing)  # Adjust line spacing here
        for i, line in enumerate(text.split('\n')):
            y = y0 + i * dy
            cv2.putText(frame, line, (position[0], y), font, font_scale, color, thickness, line_type)

    for i, row in enumerate(table):
        for j, text in enumerate(row):
            if i in [1, 2, 3]:
                put_multiline_text(
                    frame,
                    text,
                    (start_x + j * column_width, start_y + i * row_height),
                    cv2.FONT_HERSHEY_PLAIN,
                    font_scale,
                    text_color,
                    1,
                    cv2.LINE_AA,
                    line_spacing=0.8
                )
            else:
                cv2.putText(
                    frame,
                    text,
                    (start_x + j * column_width, start_y + i * row_height),
                    cv2.FONT_HERSHEY_PLAIN,
                    font_scale,
                    text_color,
                    1,
                    cv2.LINE_AA,
                )
#****************** Track and update game stats****************************************#
def update_team_stats(self, team, speed, distance, position):
    if team in self.team_stats:
        self.team_stats[team]['speed'].append(speed)
        self.team_stats[team]['distance'] += distance
        self.team_stats[team]['count'] += 1

        if team == 'Tm_white':
            if self.is_inside_zone(position, self.zone_white):
                self.team_stats[team]['offensive_pressure'] += distance
        elif team == 'Tm_yellow':
            if self.is_inside_zone(position, self.zone_yellow):
                self.team_stats[team]['offensive_pressure'] += distance
For the stats to display in the video, we need to call these methods from analyze_video, so make sure you add these extra lines of code after the speed label is defined and just before the output frame is written:
*
*
*
*
*
*
*
            # Speed label
            speed_font_scale = 0.8
            speed_y_position = int(y1) + 20
            if speed_y_position > int(y1) - 5:
                speed_y_position = int(y1) - 5
            cv2.putText(
                frame,
                f"Speed: {speed:.2f} m/s",
                (int(x1), speed_y_position),
                cv2.FONT_HERSHEY_PLAIN,
                speed_font_scale,
                text_color,
                thickness=2
            )

            #**************Add these lines of code --->:
            distance = speed / fps
            position = (x_center, y_center)
            self.update_team_stats(team, speed, distance, position)

        # Draw the stats board on each frame so the table shows up in the output
        self.draw_stats(frame)

        # Write output video
        out.write(frame)
        frame_num += 1
The distance in meters covered by each player between consecutive frames is estimated by dividing their speed (in meters per second) by the frame rate (frames per second); for example, at 30 fps, a player skating at 3 m/s covers roughly 0.1 meters per frame. If everything works well, your final video output should look like this:
This model is a basic setup of what can be achieved using computer vision to track players in an ice hockey game (or any team sport). However, there's a lot of fine-tuning that could be done to improve it and add new capabilities. Here are a few ideas that I'm working on for a 2.0 version that you might also consider:
The challenge of following the puck: Depending on which direction your camera is facing and the resolution, tracking the puck is tricky given its size compared to a soccer ball or a basketball. But if you achieve this, interesting possibilities open up for tracking performance, such as possession-time metrics, goal opportunities, or shot data. This also applies to individual performances; in ice hockey, players change significantly more often than in other team sports, so tracking each player's performance across one period presents a challenge.
Compute resources, oh why, compute! I ran all the code on a CPU setup but faced issues (sometimes resulting in blue screens 😥) due to running out of memory during the design process (consider using a CUDA setup). Our sample video is about 40 seconds long and initially 5 MB in size, but after running the model, the output grows to as much as 34 MB. Imagine the size for a full 20-minute game period. So, you should factor in compute resources and storage when scaling up.
Don't underestimate MLOps: To deploy and scale quickly, we need machine learning pipelines that are efficient, support frequent execution, and are reliable. This involves considering a Continuous Integration-Deployment-Training approach. Our use case has been built for a specific scenario, but what if conditions change, such as the camera direction or the jersey colors? To scale up, we must adopt a CI/CD/CT mindset.