1. Introduction
Automotive radar systems commonly use frequency-modulated continuous wave (FMCW) [1,2] technology. By employing frequency modulation on a continuous wave (CW) signal, it becomes possible to simultaneously estimate both the distance and velocity of a target. Moreover, a multiple-input and multiple-output (MIMO) antenna system [3,4], consisting of transmitting and receiving antenna elements, is used to estimate the angle of a target. Therefore, the recently developed concept of imaging radar can provide high-resolution point cloud images with enhanced imaging performance [5]. Target classification is a key technology directly related to driver safety in driving assistance systems. Traditionally, target classification in radar systems was mostly based on Doppler information [6,7,8]. However, recent advancements in radar sensors have enabled the acquisition of high-resolution point cloud data, leading to active research on target classification based on point cloud data. The authors in [9] proposed a method for classification and segmentation in driving environments using point clouds. As such, studies using various clustering techniques and classifiers have been conducted to improve target classification performance [10,11,12]. In addition, the authors in [13] proposed a method for detecting and classifying the positions of dynamic road users by combining various classifiers. The authors in [14] proposed efficient real-time road user detection for multi-target traffic scenarios through FMCW measurement simulation.
Research on target classification methods using deep learning algorithms in radar systems has recently been conducted [15,16,17]. The authors of [15] applied a multi-view convolutional neural network (CNN) to the point clouds acquired using a high-resolution MIMO FMCW radar system for target classification. The authors in [16] proposed graph neural networks for radar object-type classification, which jointly process the radar reflection list and spectra. Also, the authors of [17] performed multi-person activity recognition tasks through a four-channel CNN classification model based on the Doppler, range, azimuth, and elevation features of the point cloud. Moreover, research has also been conducted using spatial features in conjunction with target information such as distance, velocity, and angle [18,19]. The authors of [18] transformed sparse point cloud data into radio frequency (RF) images to infer precise target shapes. In addition, the authors in [19] used the rolling ball method to extract accurate contour information from high-resolution radar point cloud data. In this paper, we focus on target classification in driving environments. We propose a method that effectively classifies stationary targets based on the spatial features of point clouds. We apply the density-based spatial clustering of applications with noise (DBSCAN) [20] method to cluster commonly encountered pedestrians, cyclists, sedans, and sport utility vehicles (SUVs) in road scenarios and define convex hull boundaries that enclose the point clouds in 3D space and in the 2D spaces obtained by orthogonally projecting the data in three different directions (i.e., onto the x-y, y-z, and x-z planes). Using the vertices of the convex hull, we calculate the volume of the targets in 3D space and their areas in the 2D spaces. These spatial features are then complemented with the number of points in the point cloud. Additionally, we identify the most significant features affecting classification and validate the performance of the classification method with the corresponding feature vectors.
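The following minimal sketch (in Python, assuming scikit-learn for DBSCAN and SciPy for the convex hull) illustrates this feature extraction step; the function name, parameter values, and synthetic point cloud are illustrative assumptions and not taken from the experimental setup.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from scipy.spatial import ConvexHull

def extract_spatial_features(points, eps=0.5, min_samples=5):
    """Cluster a radar point cloud with DBSCAN and, for each cluster, compute
    the 3D convex hull volume, the convex hull areas of its three orthogonal
    projections, and the number of points in the cluster."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    features = []
    for label in set(labels) - {-1}:          # -1 marks DBSCAN noise points
        cluster = points[labels == label]
        volume = ConvexHull(cluster).volume   # 3D hull volume
        # 2D hulls of the projections onto the x-y, y-z, and x-z planes;
        # for 2D input, ConvexHull.volume is the enclosed area.
        areas = [ConvexHull(cluster[:, dims]).volume
                 for dims in ((0, 1), (1, 2), (0, 2))]
        features.append(areas + [volume, len(cluster)])
    return np.array(features)

# Example with a synthetic cluster of points (x, y, z in meters)
cloud = np.random.rand(200, 3) * [4.5, 1.8, 1.5]
print(extract_spatial_features(cloud, eps=1.0, min_samples=10))
```

Note that, for 2D inputs, the SciPy ConvexHull volume attribute returns the enclosed area, so the same routine yields both the projected areas and the 3D volume.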
In summary, the key contributions of our work can be outlined as follows:
Through the proposed method, we obtain the vertices of the convex hull that encloses the point cloud and extract spatial features by calculating the volume in 3D space and the areas in the 2D projections. Unlike conventional methods that merely cluster point cloud data, our approach considers the shape and perimeter of each cluster, enabling deeper utilization of the target's spatial features.
The proposed spatial feature-based target classification method shows improved target classification performance compared to the case using spatial features extracted only with the DBSCAN method.
By integrating spatial information into the classification process, our proposed method not only achieves higher accuracy but also reduces the training time compared to deep learning-based object classification methods that do not use spatial information.
The remainder of the paper is organized as follows. Section 2 provides an introduction to target estimation in the MIMO FMCW radar system, covering the fundamental concepts and describing the experimental environment. In Section 3, we describe the process of extracting the vertices of the convex hull through the proposed method and the feature selection process for target classification. In Section 4, we describe the structure of the classifier. We also perform an analysis to identify the most significant features and evaluate the performance by comparing it with other target classification methods. Finally, we conclude this paper in Section 5.
4. Results
In this section, we describe the structure of the deep neural network (DNN) used for target classification and configure the feature vectors from the features considered in Section 3.2 through performance evaluation. Furthermore, we compare the performance with other target classification methods that do not use spatial information.
4.1. The Structure of the DNN for Target Classification
DNNs have the advantage of enhanced training ability compared to shallow artificial neural networks, as they have multiple hidden layers. As a result, there have been numerous studies applying radar sensor data to DNNs. The authors in [25] proposed a machine learning-based method to classify pedestrians using automotive radar. They used the DBSCAN method for clustering detected targets and calculated features to identify targets belonging to the same moving object. In addition, the authors in [26] proposed a method to improve the target classification accuracy of automotive radar sensors by directly applying deep convolutional neural networks to the region of interest on the radar spectrum.
Figure 9 shows the structure of the DNN for target classification. The DNN model used in this paper consists of three hidden layers, each composed of 30 nodes. The activation functions of the hidden layers are applied in the order of sigmoid, hyperbolic tangent, and hyperbolic tangent. Finally, the result passes through the softmax layer and the output layer to generate the output. Training is carried out via feedforward propagation, in which the outputs of the input layer nodes are multiplied by weights and added to biases as they are passed to the next layer. The result then passes through the activation functions and reaches the output layer. When the prediction is incorrect, the weights are adjusted by propagating the gradients of the nodes backward through error backpropagation.
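As a concrete reference, the following is a minimal sketch of the described network; the use of PyTorch and the class name are assumptions for illustration, not the framework actually used in the experiments.

```python
import torch
import torch.nn as nn

class SpatialFeatureDNN(nn.Module):
    """DNN matching the described structure: three 30-node hidden layers
    (sigmoid, tanh, tanh) followed by an output layer over four classes."""
    def __init__(self, num_features: int = 5, num_classes: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 30), nn.Sigmoid(),   # hidden layer 1
            nn.Linear(30, 30), nn.Tanh(),                # hidden layer 2
            nn.Linear(30, 30), nn.Tanh(),                # hidden layer 3
            nn.Linear(30, num_classes),                  # output layer (raw scores)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: one feature vector (three areas, volume, number of points) -> class probabilities
model = SpatialFeatureDNN()
scores = model(torch.rand(1, 5))
probs = torch.softmax(scores, dim=1)
```

In this sketch, the softmax is applied outside the module (or implicitly by a cross-entropy loss during training), so the network itself outputs raw class scores.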
For the input, we started with five types of features (i.e., the areas of the convex hull observed from three directions, the volume, and the number of points) and empirically reduced them to the most significant features. The output classes were fixed to pedestrians, cyclists, sedans, and SUVs. We obtained a total of 11,200 feature vectors for different instances of pedestrians, cyclists, sedans, and SUVs. The input data are divided such that 75% of the data are allocated to training and 25% to testing. The maximum number of epochs for training is set to 1500, where each epoch represents one complete pass over the entire set of feature vectors. The training data are further divided into random independent samples, with 70% for training, 15% for validation, and 15% for testing purposes.
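A minimal sketch of this data partitioning is shown below, assuming scikit-learn; the placeholder arrays, stratified sampling, and fixed random seed are illustrative assumptions and are not stated in the experimental setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# placeholder data standing in for the 11,200 five-dimensional feature vectors
X = np.random.rand(11200, 5)
y = np.random.randint(0, 4, size=11200)   # 0: pedestrian, 1: cyclist, 2: sedan, 3: SUV

# 75% of the feature vectors for training, 25% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# the training portion is further split 70/15/15 into training,
# validation, and internal test samples
X_tr, X_rest, y_tr, y_rest = train_test_split(
    X_train, y_train, test_size=0.30, stratify=y_train, random_state=0)
X_val, X_itest, y_val, y_itest = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=0)
```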
4.2. Performance Evaluation
For efficient target classification, we constructed feature vectors for two cases and evaluated their performance. First, the performance was evaluated considering all five types of features described in Section 3.2. Then, the classification performance was evaluated using only the areas and volume of the target obtained through the proposed method, excluding the number of points. The average classification accuracy is presented in Table 3.
4.2.1. Classification Using All Features
As shown in Table 3, the feature vector in this case consists of all five features considered (i.e., the three projected areas, the volume, and the number of points). Figure 10a shows the confusion matrix for this full feature vector. As shown in Figure 10a, pedestrians showed the highest classification accuracy at 97.1%. In contrast, the prediction accuracy for cyclists was relatively low at 89.6%, and the highest error rate was 9.4%, which occurred when cyclists were confused with pedestrians. The average classification accuracy when using all features was 91.7%.
4.2.2. Classification Using Selected Features
To determine which features significantly affect the classification performance, we constructed a feature vector containing only the spatial features (i.e., the three projected areas and the volume), excluding the number of points. Figure 10b presents the confusion matrix obtained with this reduced feature vector. The classification performance decreased by 4.7% when the number of points comprising the targets was removed. Despite this slight decrease in classification performance, the spatial features of areas and volume still exhibited a high classification accuracy of 87.0%. This finding confirms that these spatial features are significant factors in the classification process.
4.3. Comparison of Spatial Features from a Virtual Bounding Box and the Proposed Method
To evaluate the performance of the proposed method, we compared it with the use of spatial features extracted from a virtual bounding box. In Section 2.4, we used the camera mounted on the radar sensor to define a virtual bounding box corresponding to each target. The spatial features extracted in Section 4.2.2 (i.e., the three projected areas and the volume) were replaced by the areas and volume obtained by orthogonally projecting the virtual bounding box in the same three directions. The feature vectors were organized in the same form as the full five-feature vector, which yields the highest classification accuracy.
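For comparison, the bounding box counterparts of these spatial features reduce to simple products of the box extents; the following sketch illustrates this, with the function name and example dimensions being hypothetical.

```python
import numpy as np

def bounding_box_features(extent_x, extent_y, extent_z, num_points):
    """Spatial features of an axis-aligned virtual bounding box:
    projected areas onto the x-y, y-z, and x-z planes, volume,
    and the number of points associated with the target."""
    area_xy = extent_x * extent_y
    area_yz = extent_y * extent_z
    area_xz = extent_x * extent_z
    volume = extent_x * extent_y * extent_z
    return np.array([area_xy, area_yz, area_xz, volume, num_points])

# Example: a sedan-sized box (hypothetical dimensions in meters)
print(bounding_box_features(4.5, 1.8, 1.5, 120))
```

Unlike the convex hull features, these values depend only on the overall extents of the target and carry no information about the shape of the cluster inside the box.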
Table 4 shows the classification accuracy when using the spatial features from virtual bounding boxes and from the proposed method. The accuracy of predicting pedestrians was improved by 2.8% when using the feature vectors processed with the proposed method. In particular, the accuracy of predicting cyclists was improved by 4.9%, and the overall average classification accuracy was enhanced by 3.5%.
4.4. Comparison with Other Target Classification Methods
In this section, we compare the proposed method with other deep learning-based target classification methods. We compare the performance of the proposed method against the following models, which are widely used for target classification: PointNet and SqueezeNet. PointNet uses the coordinate information of the point cloud as its input, while SqueezeNet uses point cloud images projected onto the x-y, y-z, and x-z planes.
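As an illustration of how such plane-projected images can be produced from a point cloud, the following is a minimal sketch; the grid size, metric extent, and binary occupancy encoding are assumptions and not the preprocessing actually used for SqueezeNet in our experiments.

```python
import numpy as np

def project_to_image(points, dims=(0, 1), grid=(64, 64), extent=5.0):
    """Rasterize the orthogonal projection of a point cloud onto one plane
    (e.g., x-y) into an occupancy image suitable as input to an image CNN."""
    img = np.zeros(grid, dtype=np.float32)
    proj = points[:, dims]
    # map metric coordinates in [-extent, extent) to pixel indices
    idx = np.floor((proj + extent) / (2 * extent) * np.array(grid)).astype(int)
    idx = np.clip(idx, 0, np.array(grid) - 1)
    img[idx[:, 0], idx[:, 1]] = 1.0
    return img

# One image per projection plane: x-y, y-z, and x-z
cloud = np.random.randn(150, 3)
images = [project_to_image(cloud, dims=d) for d in ((0, 1), (1, 2), (0, 2))]
```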
Figure 11 shows the confusion matrices for the target classification results. The training time was measured on an Intel Core i7-9750H CPU (Intel Corporation, Santa Clara, CA, USA), a GeForce GTX 1650 GPU (NVIDIA Corporation, Santa Clara, CA, USA), and 16 GB of Samsung RAM (Samsung Electronics, Suwon, Republic of Korea). The required training time and average accuracy are shown in Table 5. As shown in Table 5, the average classification accuracies when using PointNet and SqueezeNet were 67.9% and 91.1%, respectively, which are 23.8% and 0.6% lower than that of the proposed method. Additionally, when comparing the time required for training, the proposed method showed a faster training time than the other two methods. As a result, the proposed method achieved the best performance relative to the required training time compared with target classification methods that do not use spatial information.
Moreover, we also compared the proposed method with other recently highlighted methods that classify targets using spatial features. The first method converts the sparse point cloud into an RF image to obtain an accurate shape of the target. For instance, PointNet can accurately capture the local structure and geometry of a vehicle from dense point clouds; however, millimeter-wave radar captures only the vehicle's edges in the point cloud, leaving other regions unknown. Consequently, PointNet cannot accurately infer the shape and category of vehicles from such sparse point clouds. The researchers therefore noted that RF images from automotive radars provide more information for target detection than point clouds. However, RF images contain significant noise, which increases neural network complexity and slows down processing. In contrast, radar point clouds offer simpler data collection, lower noise, and faster processing through peak detection algorithms such as the constant false alarm rate detector. Therefore, our method uses radar point clouds not only to identify clusters but also to accurately describe the outlines of these clusters, thereby improving the overall accuracy of the deep learning classifier even for sparse point clouds.
The second method estimates target outline information from a high-resolution point cloud using the rolling ball technique. This technique is effective at capturing the fine contours of radar point cloud data, similar to the proposed method; however, its parameters must be tuned to obtain optimal results for each type of target, whereas the proposed method requires no target-dependent parameter adjustment. In addition, our method based on convex hull processing is less sensitive to noise and outliers than the rolling ball method because the convex hull naturally excludes extremes. This property also improves computational efficiency, making convex hull computation practical even for large data sets and well suited to computer vision and machine learning tasks. These characteristics highlight the advantages of our approach over existing methods, providing a strong foundation for improved target classification and tracking in automotive radar applications.
5. Conclusions
In this paper, we proposed a DNN-based target classification method for high-resolution automotive radar systems. We processed the raw data obtained by the MIMO FMCW radar sensor and transformed it into point clouds representing four different target types: pedestrians, cyclists, sedans, and SUVs. Then, we extracted the vertices of the convex hull enclosing the point cloud of each target in 3D and 2D space. Using the vertices constituting the convex hull of the targets, we obtained more accurate spatial information about the targets. We configured the feature vectors by combining the obtained spatial features with the number of points. Then, we evaluated the classification performance of the DNN classifier using the selected features. Finally, we compared the proposed method with other target classification methods that do not use spatial information; the proposed target classification method exhibited a faster training time and higher classification accuracy.