Article

Long-Term Target Tracking of UAVs Based on Kernelized Correlation Filter

1 School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2 Software Engineering Institute, East China Normal University, Shanghai 200062, China
* Author to whom correspondence should be addressed.
Submission received: 13 November 2021 / Revised: 21 November 2021 / Accepted: 22 November 2021 / Published: 24 November 2021
(This article belongs to the Section Mathematics and Computer Science)

Abstract
During the target tracking process of unmanned aerial vehicles (UAVs), the target may disappear from view or be fully occluded by other objects, resulting in tracking failure. Determining how to identify tracking failure and re-detect the target is therefore the key to long-term target tracking of UAVs. The kernelized correlation filter (KCF) has been very popular for its satisfactory speed and accuracy since it was proposed, which makes it well suited to UAV target tracking systems with high real-time requirements. However, it cannot detect tracking failure, so it is not suitable for long-term target tracking. We therefore propose an improved KCF to meet long-term target tracking requirements. Firstly, we introduce a confidence mechanism that evaluates the tracking results to determine the tracking status. Secondly, we design a model update strategy that reduces interference from background information, thereby improving the robustness of the algorithm. Finally, normalized cross correlation (NCC) template matching is used to make a region proposal, and the tracking model is then applied for target re-detection. We apply the algorithm to a UAV system that uses binocular cameras to estimate the target position accurately, and we design a control method to keep the target in the UAV's field of view. Our algorithm achieves the best results in both short-term and long-term evaluations on tracking benchmarks, which shows that it outperforms the baseline algorithm and has quite good overall performance. Outdoor experiments show that the developed UAV system can achieve long-term, autonomous target tracking.

1. Introduction

An unmanned aerial vehicle (UAV) is an aircraft that is operated without a pilot on board, either by radio remote control equipment and self-contained program control devices, or completely or intermittently by an on-board computer. UAVs were initially developed and used for military applications. Recently, with the advancement of electronic, information, control, and sensor technologies and the reduction of manufacturing costs, UAVs have begun to be used in civilian fields and scientific research to solve various problems, such as environmental monitoring [1], search and rescue [2], and transportation [3].
In these applications, UAVs are required to track a target autonomously for a long time. Due to limited payload and power, it is impossible to use large sensing devices such as radar to estimate the position of the target, so a camera-based visual target tracking system is the usual alternative. Such a system consists of two parts: a tracker and a controller. The tracker is responsible for locating the target in each frame, and the controller uses the resulting position information to track the target accurately. Existing UAV target tracking works focus either on the target tracking methods [4,5] or on the control methods. Most works studying target tracking methods evaluate their algorithms on general datasets. During tracking, however, the movement of the UAV itself causes the camera angle to change continuously, resulting in a more complicated situation than tracking on the ground, so such evaluations cannot accurately reflect the performance of these algorithms in UAV target tracking. Moreover, although these algorithms perform well, most cannot perform real-time target tracking due to the limited computing ability of the onboard computer. This paper mainly focuses on target tracking methods, so we assume that the control problems of UAVs can be solved well; interested readers may refer to related papers, such as the works of Mahony et al. [6] and Carli et al. [7].
During UAV target tracking, camera motion and changes in the UAV's speed easily blur and deform the target's appearance. In addition, in long-term tracking the target may disappear from the field of view or be completely occluded, resulting in the loss of the target. Moreover, after obtaining the target location information, a reasonable control method is needed to make the UAV's tracking continuous. We summarize the problems of long-term UAV target tracking as follows: (1) How can one design a robust tracker that responds reasonably to changes in the target and the environment and deals with occlusion and disappearance of the target? (2) How can one judge the position of the target and design a closed-loop control method so that the UAV can track the target continuously? (3) Given the limited computing ability of the onboard computer, how can one ensure that the system runs in real time?
Based on the above statements, we improved the kernelized correlation filter (KCF) algorithm. Firstly, we evaluate the tracking state by calculating the peak-to-sidelobe ratio (PSR) of the response map and use it as the basis for model updates, which makes the model less affected by background information and more stable. Secondly, the existing tracking model is reused for re-detection after target loss; before that, we apply normalized cross correlation (NCC) template matching for region proposal to reduce the amount of calculation and thus speed up the algorithm. Finally, we propose a UAV target tracking system to track the target continuously. The main contributions of this article are summarized as follows: (1) KCF is improved to detect target loss and re-detect targets, making it suitable for long-term target tracking, and we evaluate our tracker appropriately; (2) we use a binocular camera to accurately estimate the position of the target and design a control strategy that keeps the target in the UAV's field of view and at a certain distance from the UAV; (3) we propose a framework in which the entire system is integrated on a Jetson TX2 mounted on a DJI M100, and we successfully conduct outdoor experiments, verifying the feasibility of the framework.
The rest of this article is organized as follows. In Section 2, we introduce the related methods and research on UAV target tracking. In Section 3, we introduce the KCF tracker and our algorithms. In Section 4, we conduct experiments on tracking benchmarks, evaluate our tracker, and give the experimental results. In Section 5, we introduce the UAV target tracking system and the results of outdoor flight experiments. Finally, our conclusions are given.

2. Related Work

UAV target tracking includes two parts: the target tracking method and the design of the tracking system. In this section, related work is introduced from these two aspects.

2.1. Target Tracking Methods

Existing target tracking methods can be divided into two categories according to the observation model: generative methods and discriminative methods. The generative tracking method is based on target detection; it does not consider background information, but builds a model to represent the target through learning and uses the model to match the target directly to achieve tracking. The discriminative tracking method transforms target tracking into a binary classification problem: it extracts target and background information to train a classifier, with the target area of the current frame considered a positive sample and the background area a negative sample, and then uses the classifier to find the target in the next frame. Compared with generative tracking methods, discriminative methods are more robust to strong occlusion and appearance changes of the target. More details can be found in other works, such as [8,9]. In recent years, discriminative methods, represented by correlation filter methods and deep learning methods, have become the main approaches to target tracking and have achieved satisfactory results. The features extracted by deep learning methods have a strong ability to represent the target, so they have high accuracy; however, due to their high computational requirements, deep learning methods are generally slow and cannot achieve real-time tracking. Correlation filter methods, by contrast, offer considerable accuracy at high speed, so they are more suitable than deep learning methods for real-time target tracking, as shown by Fu et al. [10] in their work.
Due to the limitation of the UAV's onboard computing capacity, it is difficult for deep learning methods to achieve real-time tracking, so we mainly introduce correlation filter target tracking methods. Correlation filters were first applied to target tracking in MOSSE by Bolme et al. [11]. MOSSE trains a filter that minimizes the sum of squared errors between the actual and desired outputs of the convolution, so as to map training inputs to the desired training outputs. It reached 669 fps, but due to its single grayscale feature, its tracking accuracy is low. Henriques et al. [12] proposed CSK to solve the problem of sample redundancy caused by sparse sampling in MOSSE, applying ridge regression, approximate dense sampling based on cyclic shifts, and the kernel method. However, CSK still uses a relatively simple grayscale feature, and the use of the circulant matrix introduces a boundary effect. Henriques et al. [13] subsequently proposed KCF, which improves CSK by extending it to multi-channel features and using Histogram of Oriented Gradients (HOG) features instead of grayscale features; however, it cannot handle changes in target scale. Danelljan et al. [14] proposed DSST, which regards target tracking as both a target center translation problem and a target scale change problem, training two filters separately for these two aspects. DSST solves multi-scale tracking, but its performance is still limited by the direct use of circular convolution as a classifier and the lack of training samples. Li et al. [15] proposed SAMF based on KCF, which uses HOG and Color Names (CN) features and achieves multi-scale target tracking; SAMF searches for the maximum response over different scales to estimate the position and scale of the target in the new frame, resulting in a decrease in speed. Bertinetto et al. [16] combined DSST and DAT [17] to propose STAPLE. Correlation filter template features do not handle fast deformation and fast motion well but cope well with motion blur and illumination changes, while color statistical features are insensitive to deformation but perform poorly under illumination changes and when the background contains similar colors; the two types of features therefore complement each other. With the development of deep learning, deep features were added to correlation filters, and subsequent algorithms such as DeepSRDCF [18], SRDCFdecon [19], C-COT [20], and ASRCF [21] appeared. Although these trackers perform better and better, they are becoming more and more complex and cannot achieve real-time performance. ECO-HC, proposed by Danelljan et al. [22], takes both accuracy and speed into account: it not only exceeds most deep learning methods but also runs at 60 fps, making it a superior correlation filter method in recent years.
Although the correlation filter tracking algorithms mentioned above all meet the requirements of real-time tracking, they all assume that the target is never lost during tracking. They are therefore more suitable for short-term target tracking. In long-term target tracking, however, the target is easily occluded and lost, so solving this problem is the key to successful long-term tracking.

2.2. Target Tracking on UAVs

Much research on UAV target tracking systems has been conducted. Pestana et al. [23] used OpenTLD [24] on a UAV to track targets. OpenTLD uses a tracking module to track the target and a detection module for re-detection after the target is lost. It has certain target tracking and re-detection capabilities, but it updates the detection model every frame, which leads to model drift and false target re-detection [25]. Mueller et al. [26] proposed UAV50 as a UAV target tracking dataset and evaluated several trackers on it, among which Struck [27] performed best. They improved Struck to enable multi-scale detection of targets and applied it on two UAVs to successfully achieve continuous tracking of the same target. However, with the development of target tracking, the performance of Struck can no longer compare with the latest algorithms. In recent years, due to their efficiency, accuracy, and simple implementation, target tracking algorithms based on correlation filters have achieved great success. KCF is a representative of them, and many trackers, including several in UAV target tracking, are based on it. Cheng et al. [28] designed a target detector based on frame differencing and combined it with KCF; it can re-detect the target after it is lost, and they successfully carried out tracking experiments outdoors. However, it easily produces detection errors when similar objects are near the target. Ma et al. [29] improved KCF by using a motion model to adjust the search window during target re-detection and verified it indoors, but they did not build a complete UAV system or conduct outdoor experiments to verify the feasibility of the algorithm. Li et al. [30] proposed a tracker named FAST based on KCF and evaluated it on OTB50 [31], demonstrating good performance; they then implemented the algorithm on their UAV platform and successfully conducted tracking experiments outdoors. However, OTB50 is a general dataset and cannot accurately describe performance in UAV target tracking. Li et al. [32] proposed SKCF based on KCF; they solve the problem of target scale change by training an extra correlation filter and adopt coarse-to-fine tuning through a Gaussian distribution to precisely estimate the target position. They evaluated their algorithm on three benchmarks with good results, but they also did not implement it on a real UAV.
The long-term target tracking of UAVs is still a complicated problem: it involves not only the target tracking algorithm but also its implementation on a real UAV. Therefore, both a suitable target tracking method and a target tracking framework on the UAV are necessary.

3. Tracker Based on KCF

Our tracker is based on KCFLabCPP [13], which adds sub-pixel peak estimation and Lab color features to KCF, together with the multi-scale detection method of SAMF. In this section, we first review KCF, then introduce the main differences between KCFLabCPP and KCF, and finally introduce our model update strategy and re-detection module.

3.1. KCF Tracker

KCF ranked third in the VOT2014 challenge [33]. It is characterized by fast running speed and high accuracy, and it has influenced many subsequent works. The main contribution of KCF is to represent image blocks with a circulant matrix and train the target detector with ridge regression, converting matrix computations into element-wise operations through the Fourier transform, which greatly reduces the amount of calculation and improves speed. We briefly review the idea of KCF.
KCF is a tracking-by-detection tracker: it trains a classifier through online learning to predict the position of the target in the image. Given an $n \times 1$ vector $x = [x_1, x_2, \ldots, x_n]^T$ representing the target image block, KCF uses it as the base sample. Let $P$ be the permutation matrix

$$P = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \qquad (1)$$

Then $Px = [x_n, x_1, x_2, \ldots, x_{n-1}]^T$ shifts $x$ by one element, and all cyclic shift samples $\{P^u x \mid u = 0, 1, \ldots, n-1\}$ constitute the data matrix $X$. Since every row of $X$ is the cyclic shift of the previous row, $X$ is a circulant matrix, and circulant matrices can be diagonalized by the discrete Fourier transform (DFT). This can be expressed as
$$X = C(x) = F\,\mathrm{diag}(\hat{x})\,F^{H} \qquad (2)$$

where $F$ is a constant matrix known as the DFT matrix, which does not depend on $x$; $\hat{x}$ denotes the DFT of $x$; and $F^H$ is the Hermitian transpose, i.e., $F^H = (F^*)^T$, with $F^*$ the complex conjugate of $F$.
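As a quick numerical illustration of Equation (2) (not part of the original derivation), the following Python sketch builds the data matrix from cyclic shifts of a toy base vector and verifies that the unitary DFT matrix diagonalizes it:

```python
import numpy as np

# Minimal numerical check of Equation (2): the circulant data matrix X,
# whose rows are cyclic shifts of the base sample x, equals F diag(x_hat) F^H.
x = np.array([1.0, 2.0, 3.0, 4.0])                # toy base sample
n = len(x)

X = np.stack([np.roll(x, u) for u in range(n)])   # rows: P^u x

F = np.fft.fft(np.eye(n)) / np.sqrt(n)            # unitary DFT matrix
x_hat = np.fft.fft(x)                             # DFT of the base sample
X_rec = F @ np.diag(x_hat) @ F.conj().T           # F diag(x_hat) F^H

print(np.allclose(X_rec, X))                      # True
```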
KCF trains a classifier $f(z) = w^T z$ through ridge regression, minimizing the squared error over samples $x_i$ and their regression targets $y_i$,

$$\min_{w} \sum_i \left( f(x_i) - y_i \right)^2 + \lambda \left\| w \right\|^2 \qquad (3)$$
where $\lambda$ is a regularization parameter that controls overfitting. The minimizer has a closed form,

$$w = \left( X^T X + \lambda I \right)^{-1} X^T y \qquad (4)$$
To work in the Fourier domain, KCF uses the complex version of Equation (4),

$$w = \left( X^H X + \lambda I \right)^{-1} X^H y \qquad (5)$$
Using Equation (2), KCF transforms this into

$$\hat{w} = \frac{\hat{x}^{*} \odot \hat{y}}{\hat{x}^{*} \odot \hat{x} + \lambda} \qquad (6)$$

where $\odot$ is the element-wise product. KCF uses the kernel method to further speed up the calculation, expressing the solution $w$ as a linear combination of the samples,
$$w = \sum_i \alpha_i\, \varphi(x_i) \qquad (7)$$

where $\varphi(x)$ is a non-linear mapping function. The optimization variable changes from $w$ to $\alpha$; we call the new parameter $\alpha$ the solution in the dual space, and the original parameter $w$ the solution in the primal space. The classifier becomes

$$f(z) = w^T z = \sum_{i=1}^{n} \alpha_i\, \kappa(z, x_i) \qquad (8)$$
Replacing $f(x_i)$ in Equation (3) with Equation (8) yields the solution

$$\alpha = \left( K + \lambda I \right)^{-1} y \qquad (9)$$
where $K$ is the kernel matrix and $\alpha$ is the vector of coefficients $\alpha_i$, which represents the solution in the dual space. When $K$ is a circulant matrix, it can be diagonalized as in the linear case, obtaining

$$\hat{\alpha} = \frac{\hat{y}}{\hat{k}^{xx} + \lambda} \qquad (10)$$
where $k^{xx}$ is the first row of the kernel matrix $K = C(k^{xx})$, and $\hat{\;\cdot\;}$ denotes the DFT of a vector. For the kernel matrix $K$ to be circulant, the kernel must treat each dimension of the data equally, a requirement most kernels meet. In our method, the Gaussian kernel is used,

$$k^{xx'} = \exp\left( -\frac{1}{\sigma^2} \left( \left\| x \right\|^2 + \left\| x' \right\|^2 - 2\,\mathcal{F}^{-1}\!\left( \hat{x}^{*} \odot \hat{x}' \right) \right) \right) \qquad (11)$$
Therefore, only a few DFT/IDFT and element-wise operations are needed to calculate the kernel correlation; the time complexity is $O(n \log n)$ [11].
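For concreteness, a minimal Python sketch of the kernel correlation of Equation (11) is shown below. It assumes single-channel 2-D patches of equal size (the actual tracker sums the per-channel correlations of its HOG + Lab features), and practical implementations often also normalize the distance term by the number of elements:

```python
import numpy as np

def gaussian_correlation(x, x2, sigma):
    """Gaussian kernel correlation k^{xx'} of Equation (11), computed via FFTs.

    x, x2: real 2-D patches of the same shape.
    """
    # Cross-correlation of x with all cyclic shifts of x2, via the DFT.
    c = np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(x2)).real
    d = (x ** 2).sum() + (x2 ** 2).sum() - 2.0 * c
    return np.exp(-np.maximum(d, 0.0) / (sigma ** 2))
```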
KCF also computes the regression function for all candidate patches using the kernel method,

$$f(z) = K^{z} \alpha \qquad (12)$$

To compute Equation (12) efficiently, KCF diagonalizes it to obtain

$$\hat{f}(z) = \hat{k}^{xz} \odot \hat{\alpha} \qquad (13)$$
After evaluation, KCF uses linear interpolation to update the coefficients $\alpha$ and the base sample $x$,

$$\alpha_p = (1 - \eta)\,\alpha_{p-1} + \eta\,\alpha \qquad (14)$$

$$x_p = (1 - \eta)\,x_{p-1} + \eta\,x \qquad (15)$$

where $p$ is the index of the current frame and $\eta$ is the learning rate.
So far, we have reviewed the main ideas of KCF.
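To make the review concrete, the following sketch strings Equations (10)-(15) together into the train/detect/update cycle. It uses the gaussian_correlation() function sketched above; the regularization and learning-rate values are assumed typical settings, not the paper's tuned parameters:

```python
import numpy as np

LAMBDA = 1e-4   # regularization of Equation (10); an assumed value
ETA = 0.02      # learning rate of Equations (14)-(15); an assumed value

def train(x, y, sigma):
    # alpha_hat = y_hat / (k_hat^{xx} + lambda), Equation (10)
    k = gaussian_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + LAMBDA)

def detect(alpha_hat, x_model, z, sigma):
    # f_hat(z) = k_hat^{xz} (element-wise) alpha_hat, Equation (13);
    # returns the spatial response map over all cyclic shifts of z.
    k = gaussian_correlation(x_model, z, sigma)
    return np.fft.ifft2(np.fft.fft2(k) * alpha_hat).real

def update(alpha_hat_model, x_model, alpha_hat, x):
    # Linear interpolation of coefficients and base sample, Equations (14)-(15)
    new_alpha = (1 - ETA) * alpha_hat_model + ETA * alpha_hat
    new_x = (1 - ETA) * x_model + ETA * x
    return new_alpha, new_x
```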

3.2. KCFLabCPP Tracker

The original version of KCF does not support multi-scale detection; the author introduced the multi-scale method into KCFLabCPP. The template size is fixed to $s_T = (s_x, s_y)$, and a scale pool $S = \{t_1, t_2, \ldots, t_k\}$ is used. If the target window size in the original image is $s_T$, then in the current frame the tracker takes $k$ samples of sizes $\{t_i s_T \mid t_i \in S\}$ to find a suitable target. The final response is

$$\arg\max_{t_i} \mathcal{F}^{-1}\!\left( \hat{f}(z_{t_i}) \right) \qquad (16)$$

where $z_{t_i}$ is an image patch of size $t_i s_T$, which is finally resized to $s_T$.
The scale pool used by KCFLabCPP is $S = \{0.95, 1.0, 1.05\}$. In addition, KCFLabCPP uses HOG + Lab features.
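A sketch of this scale search is given below. Here extract_patch() is a hypothetical helper that crops a window of the given size around the current target center, detect() is the function from the previous sketch, and a tracker object holding alpha_hat, x_model, and sigma is assumed:

```python
import cv2
import numpy as np

SCALE_POOL = (0.95, 1.0, 1.05)   # the scale pool S of KCFLabCPP

def detect_multiscale(tracker, frame, center, template_size):
    """Sample one patch per scale, resize it back to s_T, keep the best response."""
    best_peak, best_response, best_scale = -np.inf, None, None
    for t in SCALE_POOL:
        w = int(round(template_size[0] * t))
        h = int(round(template_size[1] * t))
        patch = extract_patch(frame, center, (w, h))   # hypothetical helper
        patch = cv2.resize(patch, template_size)       # back to s_T
        response = detect(tracker.alpha_hat, tracker.x_model, patch, tracker.sigma)
        if response.max() > best_peak:
            best_peak, best_response, best_scale = response.max(), response, t
    return best_peak, best_response, best_scale
```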

3.3. Model Update Strategy and Re-Detection Module

Since KCF updates its model every frame and cannot detect target loss, it performs well in short-term tracking but is not ideal for long-term tracking; Lukežič et al. [34] classified it as an ST0 tracker. In this section, we describe how we improve KCF from an ST0 tracker to an LT1 tracker so as to obtain better long-term tracking performance.
In long-term tracking, if the tracking model is updated when the prediction bounding box is wrong, the model deteriorates; as the error accumulates, the target is eventually lost. It is therefore important to update the model only when the prediction bounding box is right. We introduce the peak-to-sidelobe ratio (PSR) [11] to evaluate the confidence of the prediction bounding box. It is defined as

$$\mathrm{PSR} = \frac{g_{\max} - \mu_{s1}}{\sigma_{s1}} \qquad (17)$$
where $g_{\max}$ is the maximum value of the response map, an $11 \times 11$ window is selected around the corresponding peak, and $\mu_{s1}$ and $\sigma_{s1}$ are the mean and standard deviation of the response map in that window, respectively. The larger the PSR, the higher the confidence of the prediction bounding box. In our experiments, we set two thresholds $t_1$ and $t_2$ ($t_2 > t_1$). When the confidence is lower than $t_1$, the target is considered lost. When the confidence is between $t_1$ and $t_2$, the target is not lost but the confidence is low, so the model is not updated. When the confidence is higher than $t_2$, the model is updated. In the multi-scale case, the PSR of the response map is calculated for each scale, and the scale and position corresponding to the maximum value are selected, so Equation (16) becomes

$$\arg\max_{t_i} \mathrm{PSR}\!\left( \mathcal{F}^{-1}\!\left( \hat{f}(z_{t_i}) \right) \right) \qquad (18)$$
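A minimal sketch of the PSR computation of Equation (17), assuming the 11 × 11 sidelobe window described above:

```python
import numpy as np

def psr(response, win=11):
    """Peak-to-sidelobe ratio of a response map (Equation (17))."""
    gy, gx = np.unravel_index(np.argmax(response), response.shape)
    g_max = response[gy, gx]
    half = win // 2
    y0, y1 = max(0, gy - half), min(response.shape[0], gy + half + 1)
    x0, x1 = max(0, gx - half), min(response.shape[1], gx + half + 1)
    window = response[y0:y1, x0:x1]   # 11 x 11 window around the peak
    return (g_max - window.mean()) / (window.std() + 1e-12)
```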
Above, we optimized the model update mechanism and solved the problem of judging target loss. Next, we describe how to re-detect the target after it is lost. According to the research of Li et al. [30], the tracking model itself can be used to detect the target. However, detecting over the entire image makes the calculation too complex and reduces the running speed. Therefore, we first use NCC [35] template matching for region proposal and then detect over the proposed area, which reduces the computational complexity. The specific method is introduced below.
As noted above, a larger PSR indicates higher confidence in the prediction bounding box, so when the target state remains stable, the PSR value gradually increases. We set a third threshold $t_3$ ($t_3 > t_2$); when the confidence is greater than $t_3$, the prediction bounding box is cropped and stored as a template, referred to as T. When the target is lost, we use NCC to match T against the image and take the area with the largest response value as the area to be detected. We then use the tracking model to calculate the response map over that area; when its PSR is higher than $t_2$, we use it to obtain the new prediction bounding box (otherwise, we repeat the above process). The overall procedure is summarized as Algorithm 1, and a minimal sketch of the NCC proposal step follows the algorithm.
Algorithm 1. Improved KCF algorithm
Input:
Y: the newly arrived observation,
B_old: the last target bounding box,
Output:
B_new: the new target bounding box,
  • for every t_i in S do
  •   Sample the new patch z_i based on size t_i s_T and resize it to s_T, with multiple features.
  •   Calculate the response with Equations (11) and (13).
  • end for
  • Compute the response map corresponding to the largest PSR according to Equations (17) and (18); record the maximum value as PSR_track and the corresponding position as P_track.
  • if PSR_track < t_1 then
  •   repeat
  •     Do template matching using NCC to get the region proposal ROI.
  •     Calculate the response map with Equations (11) and (13) on the ROI.
  •     Get PSR_detect and the corresponding position P_detect according to Equation (18).
  •   until PSR_detect > t_2
  •   if PSR_detect > t_3 then
  •     Update T.
  •   end if
  •   Update α and x according to Equations (14) and (15).
  •   Get B_new based on P_detect.
  • else
  •   if PSR_track > t_2 then
  •     Update α and x according to Equations (14) and (15).
  •   end if
  •   if PSR_track > t_3 then
  •     Update T.
  •   end if
  •   Get B_new based on P_track.
  • end if
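A minimal sketch of the NCC region proposal step of Algorithm 1, using OpenCV's normalized cross-correlation; the template T is assumed to be stored as a grayscale patch:

```python
import cv2

def propose_region(frame_gray, template_gray):
    """Return the window (x, y, w, h) that best matches template T under NCC."""
    scores = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCORR_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(scores)   # location of the largest NCC score
    h, w = template_gray.shape[:2]
    return max_loc[0], max_loc[1], w, h
```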

4. Evaluation on Tracking Benchmark

4.1. Experimental Setup

In most UAV target tracking works, the tracker is evaluated on a general dataset. To emphasize the performance of our tracker in long-term UAV target tracking, we select UAV20L as our benchmark. UAV20L is a tracking dataset proposed by Mueller et al. [36], captured by a low-altitude UAV. It contains 20 long sequences with a total of 58,670 frames and 12 attributes indicating challenge types. Due to the real-time requirements of UAV target tracking, we only compare trackers with speeds of more than 10 fps against ours. According to the classification standard given by Lukežič et al. [34], CSK, KCF, KCFCPP [13], KCFLabCPP, DSST, and STAPLE are ST0 trackers; ECO-HC is an ST1 tracker; and CMT [37] and TLD are LT1 trackers.
In our experiments, the values of $t_1$, $t_2$, and $t_3$ are set to 7, 20, and 60, respectively. Due to the ARM architecture of the onboard computer, some trackers cannot be evaluated on it directly, so all of our experiments are conducted on a computer with an i5-8500 3.00 GHz CPU to ensure a fair evaluation. Of all the trackers, CMT is implemented in Python and the others in C++. The running speed of the trackers on this computer is three to four times their speed on the onboard computer.

4.2. Evaluation Method

We evaluate each tracker with two methods, short-term tracking performance evaluation and long-term tracking performance evaluation, so as to fully compare the performance of the trackers.
We use the method proposed by Wu et al. [31] for short-term tracking performance evaluation. It uses precision and success plots to analyze the trackers. A precision plot shows the percentage of frames in which the Euclidean distance between the center of the prediction bounding box and the center of the ground truth bounding box is within a given threshold; we use the score at a threshold of 20 pixels as the representative precision score of a tracker. Given the prediction bounding box $B_t$ and the ground truth bounding box $B_{gt}$, the overlap score of each frame is defined as

$$\mathrm{Overlap} = \frac{\left| B_t \cap B_{gt} \right|}{\left| B_t \cup B_{gt} \right|} \qquad (19)$$

where $\cap$ and $\cup$ represent the intersection and union of the two regions, respectively, and $\left| \cdot \right|$ denotes the number of pixels in a region. The success plot shows the percentage of frames with an overlap score greater than a given threshold; we use the area under the curve (AUC) of each success plot to rank the trackers.
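For reference, a sketch of the overlap score of Equation (19) and the resulting success statistic for axis-aligned boxes given as (x, y, w, h):

```python
import numpy as np

def overlap(bt, bgt):
    """Overlap (IoU) score of Equation (19) for boxes (x, y, w, h)."""
    x1 = max(bt[0], bgt[0])
    y1 = max(bt[1], bgt[1])
    x2 = min(bt[0] + bt[2], bgt[0] + bgt[2])
    y2 = min(bt[1] + bt[3], bgt[1] + bgt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = bt[2] * bt[3] + bgt[2] * bgt[3] - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold):
    """Fraction of frames whose overlap exceeds the threshold (one success-plot point)."""
    scores = [overlap(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return float(np.mean([s > threshold for s in scores]))
```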
We use the method proposed by Lukežič et al. [34] for long-term tracking performance evaluation. Long-term evaluation differs from short-term evaluation: it should reflect the accuracy of the tracker's detection of target loss and its ability to re-detect the target. Lukežič et al. [34] use precision and recall to quantify these two attributes. Let $G_t$ denote the ground truth bounding box of the target, $A_t^{\tau_\theta}$ the prediction bounding box of the tracker, $\theta_t$ the confidence score of the prediction bounding box at time step $t$, and $\tau_\theta$ the discrimination threshold. When the target does not exist, $G_t = \emptyset$. Similarly, when the tracker does not predict the position of the target, or the confidence of the prediction bounding box is lower than the given threshold (that is, $\theta_t < \tau_\theta$), $A_t^{\tau_\theta} = \emptyset$. Let $\Omega(A_t^{\tau_\theta}, G_t)$ denote the overlap score of the prediction bounding box and the ground truth bounding box; it is 0 when either $A_t^{\tau_\theta}$ or $G_t$ is $\emptyset$. As in target detection, when $\Omega(A_t^{\tau_\theta}, G_t)$ is greater than a given threshold $\tau_\Omega$, the prediction bounding box and the ground truth bounding box match. Given the two thresholds $\tau_\theta$ and $\tau_\Omega$, the precision $Pr$ and the recall $Re$ are defined as:

$$Pr(\tau_\theta, \tau_\Omega) = \frac{\left| \{ t : \Omega(A_t^{\tau_\theta}, G_t) \geq \tau_\Omega \} \right|}{N_p} \qquad (20)$$

$$Re(\tau_\theta, \tau_\Omega) = \frac{\left| \{ t : \Omega(A_t^{\tau_\theta}, G_t) \geq \tau_\Omega \} \right|}{N_g} \qquad (21)$$

where $\left| \cdot \right|$ is the number of time steps $t$ that satisfy $\Omega(A_t^{\tau_\theta}, G_t) \geq \tau_\Omega$, $N_p$ is the number of frames with $A_t^{\tau_\theta} \neq \emptyset$, and $N_g$ is the number of frames with $G_t \neq \emptyset$.
In target detection, $\tau_\Omega$ is set to 0.5 or higher, but in tracking such a high threshold cannot correctly indicate target loss in practice. By integrating over $\tau_\Omega$, the formulas reduce to a single threshold,

$$Pr(\tau_\theta) = \int_0^1 Pr(\tau_\theta, \tau_\Omega)\, d\tau_\Omega = \frac{1}{N_p} \sum_{t\,:\,A_t(\theta_t) \neq \emptyset} \Omega\!\left( A_t(\theta_t), G_t \right) \qquad (22)$$

$$Re(\tau_\theta) = \int_0^1 Re(\tau_\theta, \tau_\Omega)\, d\tau_\Omega = \frac{1}{N_g} \sum_{t\,:\,G_t \neq \emptyset} \Omega\!\left( A_t(\theta_t), G_t \right) \qquad (23)$$
$Pr(\tau_\theta)$ is called tracking precision and $Re(\tau_\theta)$ tracking recall, to distinguish them from their target detection counterparts. The long-term tracking performance of a tracker can be analyzed by drawing the precision/recall plot. $Pr(\tau_\theta)$ and $Re(\tau_\theta)$ are combined into their harmonic mean to obtain the F-score,

$$F(\tau_\theta) = \frac{2\, Pr(\tau_\theta)\, Re(\tau_\theta)}{Pr(\tau_\theta) + Re(\tau_\theta)} \qquad (24)$$

This metric is called the F-measure. Visualizing the F-score over different values of $\tau_\theta$ yields the F-score plot. Since the F-scores at different $\tau_\theta$ differ, the F-score of a tracker is defined as the maximum over all $\tau_\theta$, which avoids the measurement differences caused by manually setting the threshold; trackers are ranked according to this value.
Since ST0 and ST1 trackers cannot detect target loss, they give a prediction bounding box in every frame. To evaluate their long-term tracking performance, Lukežič et al. [34] manually determined an uncertainty score for each tracker, but manually setting the score may introduce errors. To reduce this error, in our experiments we use the Euclidean distance between the center of the prediction bounding box and the center of the ground truth bounding box as the uncertainty score, and Equations (22) and (23) are modified to

$$Pr(\tau_\theta) = \int_0^1 Pr(\tau_\theta, \tau_\Omega)\, d\tau_\Omega = \frac{1}{N_p} \sum_{t\,:\,d(A_t, G_t) < \tau_\theta} \Omega\!\left( A_t, G_t \right) \qquad (25)$$

$$Re(\tau_\theta) = \int_0^1 Re(\tau_\theta, \tau_\Omega)\, d\tau_\Omega = \frac{1}{N_g} \sum_{t\,:\,d(A_t, G_t) < \tau_\theta} \Omega\!\left( A_t, G_t \right) \qquad (26)$$

where $A_t$ is the prediction bounding box of the tracker, $G_t$ is the ground truth bounding box, and $d(A_t, G_t)$ is the Euclidean distance between their centers; when either $A_t$ or $G_t$ is $\emptyset$, $d(A_t, G_t) = +\infty$. $N_p$ is the number of frames (time steps $t$) satisfying $d(A_t, G_t) < \tau_\theta$, $N_g$ is the number of frames satisfying $G_t \neq \emptyset$, and $\tau_\theta$ is the discrimination threshold. This evaluation can be shown to be reasonable by considering the short-term tracking scenario: the target is always visible, the tracker gives a prediction bounding box every frame, and it believes every prediction is credible; the F-score calculated with the modified formulas is then the average overlap score.
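A sketch of the modified metrics of Equations (25) and (26), with absent boxes represented as None (their center distance is treated as +∞, so they never match); overlap() is the function from the earlier sketch:

```python
import numpy as np

def center_dist(a, g):
    """Euclidean distance between box centers; +inf when either box is absent."""
    if a is None or g is None:
        return np.inf
    ax, ay = a[0] + a[2] / 2.0, a[1] + a[3] / 2.0
    gx, gy = g[0] + g[2] / 2.0, g[1] + g[3] / 2.0
    return np.hypot(ax - gx, ay - gy)

def long_term_scores(pred, gt, tau_theta):
    """Tracking precision, recall, and F-score (Equations (24)-(26))."""
    matched = [(a, g) for a, g in zip(pred, gt) if center_dist(a, g) < tau_theta]
    n_p = len(matched)                            # frames with d(A_t, G_t) < tau_theta
    n_g = sum(g is not None for g in gt)          # frames where the target exists
    total = sum(overlap(a, g) for a, g in matched)
    pr = total / n_p if n_p else 0.0
    re = total / n_g if n_g else 0.0
    f = 2 * pr * re / (pr + re) if pr + re else 0.0
    return pr, re, f
```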

4.3. Qualitative Evaluation

Figure 1 shows partial tracking snapshots of our tracker and three other trackers on five video sequences in UAV20L: bike1, car16, group1, person14, and uav1, from left to right. We draw the prediction bounding boxes of the trackers in different colors; to facilitate comparison, the ground truth bounding box is marked in green and the result of our tracker in red. The number in the upper left corner of each image is the frame number in the sequence. Analyzing and comparing the experimental results, we find that our tracker performs quite well.
(1) bike1: In this sequence, the tracking target is a moving bicycle. The aspect ratio of the object and the camera's angle of view change constantly. Only our tracker and ECO-HC track the target correctly from beginning to end, and the images show that the result of our tracker is closer to the ground truth. Since our tracking model discards low-confidence tracking results when updating, the model accumulates less error.
(2) car16: In this sequence, the tracking target is a moving car. The target moves fast and its distance from the camera changes constantly. In the end, only our tracker and KCFLabCPP track the target correctly.
(3) group1: The tracking target of this sequence is a walking person. There are similar objects around the target, and they interlace with the target many times during the tracking process. Our tracker still tracks the target correctly.
(4) person14: The tracking target in this sequence is a running person. During tracking, the target is completely occluded by obstacles for a period of time. When the target reappears in the field of view, only our tracker successfully re-detects it; the other trackers lose the target. This shows that the re-detection module of our tracker works.
(5) uav1: The tracking target of this sequence is a fast-moving UAV, and the video resolution is low, which is a great challenge for trackers. Although no tracker tracks the target successfully from beginning to end, our tracker has the highest evaluation score, indicating its ability to track low-resolution targets.

4.4. Quantitative Evaluation

Short-term evaluation: Figure 2 shows the precision and success plots of all trackers with the OPE measure on UAV20L. Although ECO-HC is an ST1 tracker, it still ranks second due to its outstanding performance, and our tracker has the best results. Compared with the KCFLabCPP tracker, our tracker improves precision by 21.0% and AUC by 13.8%. Compared with ECO-HC, it improves precision by 5.2% and AUC by 4.1%. The baseline algorithm performs worse than ECO-HC, but our improved algorithm is better. Moreover, ECO-HC runs at a low frame rate, so it is not suitable for real-time tracking. Although our algorithm runs 20% slower than the baseline algorithm, it still achieves 175 fps, which is enough for real-time tracking on the onboard computer.
To analyze the performance of the trackers under different conditions, Figure 3 shows the evaluation results on selected attributes. The attributes we selected are common challenges in long-term target tracking: FOC (full occlusion), OV (out of view), FM (fast motion), and ARC (aspect ratio change). The performance of the trackers under specific attributes is clearly worse than their overall performance on all sequences, and each tracker excels at different challenges. However, our tracker achieves the best results under each of the four attributes. Especially for the main challenges of long-term target tracking, FOC and OV, our tracker's evaluation results are much better than those of the second-ranked tracker. On the FOC attribute, our tracker improves AUC by 10.4% over KCFCPP and precision by 9.1% over CMT. On the OV attribute, it improves AUC and precision by 11.1% and 11.5%, respectively, over the second-place tracker ECO-HC. The baseline algorithm only ranks in the middle under these four attributes, while the improved algorithm achieves the best results, confirming the superiority of our algorithm.
Long-term evaluation: Figure 4 shows the long-term evaluation results of the trackers on UAV20L. With the modified formulas, the F-score gradually increases as the threshold increases. When the deviation in actual UAV target tracking exceeds 50 pixels, the target position calculated from the center of the prediction bounding box is no longer credible, so we only evaluate trackers with thresholds up to 50. Our tracker is still better than all the others. Under our evaluation method, the F-score of each tracker reaches its maximum at a threshold of 50. The maximum F-score of our tracker is 7.6% higher than that of the second-place tracker ECO-HC and 14.0% higher than that of KCFLabCPP.
From the long-term and short-term evaluation results, our tracker has the best performance. ECO-HC is the second-place tracker in both evaluations and performs better than KCFLabCPP, but its running speed is only 36.83 fps, which would be further reduced on the onboard computer, so it cannot track in real time. By improving KCFLabCPP, our tracker exceeds ECO-HC in performance, and its running speed reaches 175.54 fps, meeting the real-time requirement on the onboard computer.

5. Implementation on UAV Platform

5.1. System Architecture

Our tracking system is implemented on a DJI M100. Figure 5 shows the overall architecture of the system. The system obtains image information through the binocular camera ZED 2; the target location information is computed by the target tracking algorithm on the onboard computer, a Jetson TX2, and then converted into UAV control signals, achieving continuous tracking of the target.

5.2. Target Position Estimation

Our system uses a binocular camera to estimate the relative position of the UAV and the target. Binocular stereo vision imitates the principle of human vision, using two identical cameras to shoot the same object from different angles. From the parallax between the two images, geometry and algebra give the actual distance between the object and the camera. The ZED 2 camera integrates such a calculation, so the distance between the target and the camera can be obtained by calling its API. In our experiments, the distance relative to the left camera is used.
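The ZED 2 SDK reports this distance directly; the underlying rectified-stereo relation is simply Z = fB/d, as in the illustrative sketch below (focal length f in pixels, baseline B in meters, disparity d in pixels; not our system's actual code path):

```python
def stereo_depth(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a matched point from a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("point not matched between the two views")
    return f_px * baseline_m / disparity_px
```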

5.3. Flight Control

After obtaining the position of the target through the binocular camera, the system controls the UAV according to the control strategy so as to maintain a certain distance from the target. In our experiments, the height of the UAV remains stable: in most UAV target tracking processes, the height of the target does not change, so there is no need to change the height of the UAV. Keeping the UAV at a constant height is simply convenient, and changes in the height of the UAV would not greatly affect the tracking of the target. Thus, flight control can be simplified to a problem on the XY plane, as shown in Figure 6.
In the X direction, we control the distance between the target and the UAV by controlling the speed. After initialization, the distance between the target and the UAV is recorded as $d_1$. During tracking, the distance between the UAV and the target at the $k$th frame is $d_k$, and the corresponding time is $t_k$. When $\left| d_k - d_1 \right| \geq \delta$ ($\delta$ is a threshold), the UAV moves in the X direction. Our system measures the distance between the target and the UAV every four frames, and the speed in the X direction is calculated as

$$v_x = \frac{d_k - d_{k-4}}{t_k - t_{k-4}} \qquad (27)$$
In the Y direction, we maintain the distance between the UAV and the target by controlling the yaw angle instead of the speed. During tracking, the position of the target in the camera image is shown in Figure 7. A central area is defined, bounded by the two dotted lines in the figure; the box bounded by the green lines is the target bounding box. The black dot represents the center of the target bounding box, and the distance between the target and the UAV is calculated from the coordinates of this point. During initialization, we keep the target in the central area of the image. During tracking, when the center of the target leaves the central area, the system controls the UAV to yaw in the corresponding direction to keep the target in the UAV's field of view.
When the UAV needs to be adjusted in both the X and Y directions, the two control methods are independent of each other, so simultaneous adjustment in both directions is possible.
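A sketch of this XY control step is shown below; the dead-band width, the center-area half-width, and the yaw gain are assumed values for illustration, not our tuned parameters:

```python
DELTA = 0.3          # distance dead-band (m), an assumed value
CENTER_MARGIN = 60   # half-width of the image center area (px), assumed
YAW_GAIN = 0.002     # proportional yaw gain (rad/s per px), assumed

def control_step(d_hist, t_hist, d_init, target_cx, img_width):
    """One control update on the XY plane (Section 5.3).

    d_hist/t_hist: per-frame distance and time histories, so d_hist[-5]
    is the measurement taken four frames earlier (Equation (27)).
    """
    # X direction: command forward speed only outside the dead-band.
    v_x = 0.0
    if abs(d_hist[-1] - d_init) >= DELTA:
        v_x = (d_hist[-1] - d_hist[-5]) / (t_hist[-1] - t_hist[-5])
    # Y direction: yaw toward the target once it leaves the center area.
    offset = target_cx - img_width / 2.0
    yaw_rate = YAW_GAIN * offset if abs(offset) > CENTER_MARGIN else 0.0
    return v_x, yaw_rate
```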

5.4. Outdoor Experiments

We implemented the target tracking system on the UAV and conducted tracking experiments outdoors on the school playground. We used a moving person as the experimental subject, so similar targets were present nearby. During the experiment, the subject moved irregularly around the playground, and the UAV could always track the target correctly. Figure 8 shows an instance of our outdoor experiment. The upper right corner of the image shows the picture from the onboard camera, which displays the current tracking status and the distance to the target; the yellow rectangle is the current target bounding box. The experimental results prove that the UAV can autonomously track the target after initialization. In our experiments, the UAV achieved continuous tracking for about six minutes at the longest; tracking eventually fails due to the long-term accumulation of target position drift. The tracking algorithm can therefore be further improved. A feasible approach is to prepare more templates for target re-detection, although this would reduce the speed of the algorithm.

6. Discussion and Conclusions

A KCF-based moving target tracking algorithm has been proposed in this paper. By improving the model update strategy of KCF, interference from background information in the tracking model is reduced. Our algorithm combines NCC template matching with the tracking model to perform target re-detection, giving it the ability to track targets over the long term. To highlight the performance of the algorithm in UAV target tracking, we evaluated it with different methods on the UAV20L benchmark, where all sequences were captured by a low-altitude UAV. The short-term evaluation method we used is widely used in target tracking; our algorithm achieves the best results both on the entire sequence set and under the four selected long-term tracking challenge attributes. To highlight the performance of the algorithm in long-term target tracking, we also used the long-term evaluation method proposed by Lukežič et al. [34]. In its original version, the authors manually set a score for each short-term tracking method to detect tracking failures, which may introduce errors; we therefore modified the method to use a unified score to determine tracking failure. Our algorithm again achieves the best results. In conclusion, the experimental results show that our algorithm not only performs better than the baseline algorithm but also exceeds the other algorithms, and can cope with various complex scenarios in long-term target tracking. In addition, we integrated it into a UAV system and conducted outdoor flight experiments. The experiments prove that the algorithm runs in real time on the onboard computer and that the UAV can respond to target loss during long-term tracking. Moreover, the tracker and controller in our UAV target tracking system are independent, which means that better trackers can be integrated into the system to achieve better tracking results. In future work, we will therefore study better target tracking algorithms to further optimize the UAV target tracking system.

Author Contributions

Conceptualization, J.Y. and W.T.; methodology, J.Y.; software, J.Y.; validation, J.Y. and W.T.; writing—original draft preparation, J.Y.; writing—review and editing, W.T. and Z.D.; supervision, Z.D.; funding acquisition, Z.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the fund of TICPSH of Shanghai.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. De Smedt, F.; Hulens, D.; Goedemé, T. On-board real-time tracking of pedestrians on a UAV. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 1–8. [Google Scholar]
  2. Scherer, J.; Yahyanejad, S.; Hayat, S.; Yanmaz, E.; Andre, T.; Khan, A.; Vukadinovic, V.; Bettstetter, C.; Hellwagner, H.; Rinner, B. An autonomous multi-UAV system for search and rescue. In Proceedings of the First Workshop on Micro Aerial Vehicle Networks Systems, and Applications for Civilian Use, Florence, Italy, 18 May 2015; pp. 33–38. [Google Scholar]
  3. San, K.T.; Mun, S.J.; Choe, Y.H.; Chang, Y.S. UAV delivery monitoring system. In MATEC Web of Conferences; EDP Sciences: Les Ulis, France, 2018; p. 04011. [Google Scholar]
  4. Huang, Z.; Fu, C.; Li, Y.; Lin, F.; Lu, P. Learning aberrance repressed correlation filters for real-time uav tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 2891–2900. [Google Scholar]
  5. Fu, C.; Zhang, Y.; Duan, R.; Xie, Z. Robust scalable part-based visual tracking for UAV with background-aware correlation filter. In Proceedings of the 2018 IEEE International Conference on Robotics Biomimetics (ROBIO), Kuala Lumpur, Malaysia, 12–15 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2245–2252. [Google Scholar]
  6. Mahony, R.; Kumar, V.; Corke, P. Multirotor aerial vehicles: Modeling, estimation, and control of quadrotor. IEEE Robot. Autom. Mag. 2012, 19, 20–32. [Google Scholar] [CrossRef]
  7. Carli, R.; Cavone, G.; Epicoco, N.; Di Ferdinando, M.; Scarabaggio, P.; Dotoli, M. Consensus-Based Algorithms for Controlling Swarms of Unmanned Aerial Vehicles. In Proceedings of the International Conference on Ad-Hoc Networks and Wireless, Bari, Italy, 19–21 October 2020; Springer: Cham, Switzerland, 2020; pp. 84–99. [Google Scholar]
  8. Hao, J.; Zhou, Y.; Zhang, G.; Lv, Q.; Wu, Q. A review of target tracking algorithm based on UAV. In Proceedings of the 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS), Shenzhen, China, 25–27 October 2018; pp. 328–333. [Google Scholar]
  9. Chen, P.; Zhou, Y. The Review of target tracking for UAV. In Proceedings of the 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), Xi’an, China, 19–21 June 2019; pp. 1800–1805. [Google Scholar]
  10. Fu, C.; Li, B.; Ding, F.; Lin, F.; Lu, G. Correlation Filters for Unmanned Aerial Vehicle-Based Aerial Tracking: A Review and Experimental Evaluation. IEEE Geosci. Remote Sens. Mag. 2021, 2–387. [Google Scholar] [CrossRef]
  11. Bolme, D.S.; Beveridge, J.R.; Draper, B.A.; Lui, Y.M. Visual object tracking using adaptive correlation filters. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2544–2550. [Google Scholar]
  12. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. Exploiting the circulant structure of tracking-by-detection with kernels. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 702–715. [Google Scholar]
  13. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intel. 2014, 37, 583–596. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Danelljan, M.; Häger, G.; Khan, F.; Felsberg, M. Accurate scale estimation for robust visual tracking. In Proceedings of the British Machine Vision Conference, Nottingham, UK, 1–5 September 2014; BMVA Press: Durham, UK, 2014. [Google Scholar]
  15. Li, Y.; Zhu, J. A scale adaptive kernel correlation filter tracker with feature integration. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 254–265. [Google Scholar]
  16. Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P.H. Staple: Complementary learners for real-time tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1401–1409. [Google Scholar]
  17. Possegger, H.; Mauthner, T.; Bischof, H. In defense of color-based model-free tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2113–2120. [Google Scholar]
  18. Danelljan, M.; Hager, G.; Shahbaz Khan, F.; Felsberg, M. Learning spatially regularized correlation filters for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4310–4318. [Google Scholar]
  19. Danelljan, M.; Hager, G.; Shahbaz Khan, F.; Felsberg, M. Adaptive decontamination of the training set: A unified formulation for discriminative visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1430–1438. [Google Scholar]
  20. Danelljan, M.; Robinson, A.; Khan, F.S.; Felsberg, M. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 472–488. [Google Scholar]
  21. Dai, K.; Wang, D.; Lu, H.; Sun, C.; Li, J. Visual tracking via adaptive spatially-regularized correlation filters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4670–4679. [Google Scholar]
  22. Danelljan, M.; Bhat, G.; Shahbaz Khan, F.; Felsberg, M. ECO: Efficient convolution operators for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6638–6646. [Google Scholar]
  23. Pestana, J.; Sanchez-Lopez, J.L.; Saripalli, S.; Campoy, P. Computer vision based general object following for gps-denied multirotor unmanned vehicles. In Proceedings of the 2014 American Control Conference, Portland, OR, USA, 4–6 June 2014; pp. 1886–1891. [Google Scholar]
  24. Kalal, Z.; Mikolajczyk, K.; Matas, J. Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intel. 2011, 34, 1409–1422. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Ma, C.; Yang, X.; Zhang, C.; Yang, M.H. Long-term correlation tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5388–5396. [Google Scholar]
  26. Mueller, M.; Sharma, G.; Smith, N.; Ghanem, B. Persistent aerial tracking system for uavs. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, South Korea, 9–14 October 2016; pp. 1562–1569. [Google Scholar]
  27. Hare, S.; Golodetz, S.; Saffari, A.; Vineet, V.; Cheng, M.M. Struck: Structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intel. 2015, 38, 2096–2109. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Cheng, H.; Lin, L.; Zheng, Z.; Guan, Y.; Liu, Z. An autonomous vision-based target tracking system for rotorcraft unmanned aerial vehicles. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 1732–1738. [Google Scholar]
  29. Ma, Y.; Pei, P.; Xiang, C.; Yao, S.; Gao, Y. KCF based 3D Object Tracking via RGB-D Camera of a Quadrotor. In Proceedings of the 2017 International Conference on Unmanned Aircraft Systems (ICUAS), Miami, FL, USA, 13–16 June 2017; pp. 939–944. [Google Scholar]
  30. Li, R.; Pang, M.; Zhao, C.; Zhou, G.; Fang, L. Monocular long-term target following on uavs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 29–37. [Google Scholar]
  31. Wu, Y.; Lim, J.; Yang, M.H. Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2411–2418. [Google Scholar]
  32. Li, C.; Liu, X.; Su, X.; Zhang, B. Robust kernelized correlation filter with scale adaption for real-time single object tracking. J. Real-Time Image Process. 2018, 15, 583–596. [Google Scholar] [CrossRef]
  33. Hadfield, S.J.; Lebeda, K.; Bowden, R. The visual object tracking VOT2014 challenge results. In Proceedings of the European Conference on Computer Vision (ECCV) Visual Object Tracking Challenge Workshop, University of Surrey, Zurich, Switzerland, 6–7, 12 September 2014. [Google Scholar]
  34. Lukežič, A.; Zajc, L.Č.; Vojíř, T.; Matas, J.; Kristan, M. Now you see me: Evaluating performance in long-term visual tracking. arXiv 2018, arXiv:1804.07056. [Google Scholar]
  35. Briechle, K.; Hanebeck, U.D. Template matching using fast normalized cross correlation. In Proceedings of Optical Pattern Recognition XII; International Society for Optics and Photonics: Bellingham, WA, USA, 2001; Volume 4387, pp. 95–102. [Google Scholar]
  36. Mueller, M.; Smith, N.; Ghanem, B. A benchmark and simulator for uav tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 445–461. [Google Scholar]
  37. Nebehay, G.; Pflugfelder, R. Clustering of static-adaptive correspondences for deformable object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2784–2791. [Google Scholar]
Figure 1. Part of the tracking results of our tracker and the other three trackers.
Figure 2. Precision and success plots with OPE measure for all sequences.
Figure 3. Precision and success plots for all sequences on selected attributes.
Figure 4. Long-term evaluation results of trackers on UAV20L.
Figure 5. Architecture of the target tracking system.
Figure 6. The relative position between the UAV and the target on the XY plane.
Figure 7. Target position in the camera image.
Figure 8. An instance of an outdoor flight experiment.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
