Improved Gaussian Convolution Kernel Boosts Substation Equipment Detection in Infrared Images

In the ever-evolving landscape of power grid intelligence, the accurate and efficient detection of substation equipment through infrared imaging has become a cornerstone of reliable power system operation. Infrared image target detection systems offer unparalleled advantages such as strong anti-interference capabilities and all-weather operation, making them indispensable for real-time monitoring, fault early warning, and routine inspection of substation equipment. However, traditional infrared image target detection algorithms have long been plagued by low detection accuracy and high computational latency, compounded by the inherent challenges of infrared imaging in substation scenarios—including incomplete target information, image defocus, occlusion by real-time data displays, low contrast, and a wide variety of complex equipment types. These issues have severely limited the practical application of infrared detection technology in smart substations, creating an urgent need for optimized algorithmic solutions that balance precision and efficiency.

A team of researchers led by Wu Tianquan and Gou Xiantai has addressed this industry pain point with a groundbreaking infrared image detection method for substation equipment based on an improved Gaussian convolution kernel, built on the anchor-free CenterNet model. The research, published in Infrared Technology, delivers a streamlined network structure with reduced computational complexity, achieving accurate identification and positioning of substation equipment in infrared images and providing a novel, practical technical approach for the intelligent infrared detection of power grid equipment. The study leverages on-site data collected by substation inspection robots equipped with infrared thermal imagers, training and validating the model with deep learning techniques that locate equipment center points to carry out target classification and regression. Rigorous experimental results confirm that the proposed method significantly elevates the accuracy of substation target detection, marking a notable advancement in the field of computer vision applications for the power industry.

The Limitations of Traditional Target Detection in Substation Infrared Imaging

Computer vision-based target detection has long been both a research focus and a persistent challenge in the field, with widespread applications in industrial production, autonomous driving, and video surveillance—especially in the power industry, where it directly impacts the effective working distance of infrared image early warning systems and the configuration of monitoring equipment and personnel. Infrared imaging, as a non-contact detection technology, is critical for identifying abnormal heating of primary substation equipment, a key indicator of potential faults. However, infrared image resolution is highly susceptible to environmental factors such as atmospheric absorption and scattering, which prevent the images from restoring true resolution and fully reflecting the texture information of detected targets. This leads to insufficient contrast, blurry temperature-interval boundaries, and even partial loss of target information in the images, making it hard to capture the complete contour of substation equipment.

Before the advent of deep learning, traditional target detection methods were divided into instance detection and category detection, relying on manual feature description to identify targets. While these methods achieved certain results in controlled scenarios, their detection accuracy was too low to be widely applicable in the complex real-world environment of substations, failing to meet the practical needs of power grid operation. Since 2014, deep learning-based target detection has evolved into two mainstream categories: one-stage methods represented by YOLO and SSD, and two-stage methods typified by the RCNN series. Both categories have their own strengths and weaknesses, but their practical application in substation infrared image detection has fallen short of expectations due to constraints in model architecture size and detection criteria, as well as high computational resource requirements.

The emergence of the anchor-free CenterNet algorithm in April 2019 marked a pivotal shift in target detection technology. Developed by a joint research team from the Chinese Academy of Sciences, the University of Oxford, and Huawei Noah’s Ark Lab, CenterNet transforms target detection into a key point detection task, enabling the perception of internal object information with a streamlined model structure and low computational demands. The algorithm achieved 47% AP on the COCO dataset, outperforming previous one-stage algorithms by a significant margin and providing a new research direction for substation infrared image detection. However, the application of the original CenterNet model in substation equipment detection was still in its infancy, with room for optimization in adapting to the unique characteristics of substation infrared images—such as low contrast and complex equipment types. This research gap became the core starting point for the research team’s work.

Building a Specialized Substation Infrared Image Dataset

The foundation of any effective deep learning-based target detection algorithm is a high-quality, scenario-specific dataset, and substation infrared imaging is no exception. The research team collected infrared images of substation equipment using inspection robots equipped with infrared thermal imagers, a method that ensures the authenticity and practicality of the data by capturing images under real on-site operating conditions of substations. The collected images were preprocessed using OpenCV (the Open Source Computer Vision Library) to enhance dataset diversity and the model’s generalization ability, with operations including flip transformation, random cropping, rotation, affine transformation, and scale transformation. These preprocessing steps help the model better adapt to variations in infrared image capture angles, distances, and environmental conditions in actual substation scenarios.
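To make this preprocessing stage concrete, the flip, crop, and scale operations can be sketched in plain NumPy; the paper itself uses OpenCV's full toolkit (including rotation and affine warps), so the function names, the nearest-neighbour resize, and the crop ratio below are purely illustrative:

```python
import numpy as np

def flip_h(img):
    # horizontal flip (the equivalent of cv2.flip(img, 1))
    return img[:, ::-1].copy()

def resize_nn(img, out_h, out_w):
    # nearest-neighbour resize (a simple stand-in for cv2.resize)
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

def random_crop(img, rng, keep=0.75):
    # crop a random `keep`-sized window, then resize back to the original size
    h, w = img.shape[:2]
    ch, cw = int(h * keep), int(w * keep)
    y0 = rng.integers(0, h - ch + 1)
    x0 = rng.integers(0, w - cw + 1)
    return resize_nn(img[y0:y0 + ch, x0:x0 + cw], h, w)
```

Each transform preserves the output resolution, so augmented images can be fed to the network unchanged.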

The final substation infrared dataset used for model training and testing consists of 1,570 images with a resolution of 640×480, categorized into 10 key substation equipment types—Arrester, Breaker, Current transformer, Disconnector, Electric reactor, Voltage transformer, Aerial conductor, Condenser, Main transformer, and Tubular busbar. The number of images for each category is tailored to the actual occurrence frequency of the equipment in substation scenarios, with Voltage transformer accounting for the largest share (303 images) and Tubular busbar the smallest (72 images). This proportional distribution ensures the model can learn the features of each equipment type in line with real-world substation operation, avoiding overfitting to high-frequency equipment or underfitting to low-frequency but critical equipment.

The dataset exhibits the typical challenging characteristics of substation infrared images: low contrast with unclear temperature interval boundaries, incomplete target information with unrecognizable complete contours, image defocus with minimal changes between large pixel values, occlusion of equipment by real-time display data during collection, and a large variety of complex equipment types. These characteristics make the dataset a rigorous test for the proposed algorithm, and its use in training ensures the model’s practical applicability in actual substation inspection scenarios, rather than just in controlled laboratory conditions.

The Anchor-Free CenterNet: A Foundation for Optimization

The research team selected the anchor-free CenterNet model as the base framework for its substation equipment detection method, drawn to its streamlined structure, low computational cost, and innovative key point-based detection logic. Unlike anchor-based algorithms that require the predefinition of a large number of anchor boxes and subsequent complex calculation of intersection over union (IOU) for box matching, CenterNet converts target detection into the detection of object center key points, eliminating the need for anchor box design and reducing computational complexity—an essential advantage for the real-time detection requirements of substation inspection robots.

The target detection process of the CenterNet network begins with feeding a 512×512×3 infrared image into the backbone network, which extracts high-dimensional feature maps of the image. The model then generates a key point heatmap from the feature maps, with the heatmap defined as \(\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}\), where R is the transformation scale (the stride relative to the original image) and C is the number of key point types (i.e., the number of substation equipment categories). For this study, C is set to 10 to correspond with the 10 equipment types in the dataset; a value of \(\hat{Y}_{x,y,c} \approx 1\) at a given coordinate (x, y) indicates the detection of the corresponding equipment type at that position, while a value close to 0 indicates the background.
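The heatmap geometry described above can be made concrete with a small NumPy sketch (the particular coordinates and score value are illustrative, not from the paper):

```python
import numpy as np

W, H, R, C = 512, 512, 4, 10            # input size, stride R, equipment classes
Yhat = np.zeros((H // R, W // R, C))    # heatmap \hat{Y}, values in [0, 1]
Yhat[30, 40, 2] = 0.97                  # ~1: class-2 equipment centred near here

# recover the image-space centre of the strongest class-2 response
y, x = np.unravel_index(np.argmax(Yhat[..., 2]), Yhat[..., 2].shape)
center_xy = (x * R, y * R)              # heatmap cell scaled back by the stride
```

With a stride of R = 4, the 512×512 input yields a 128×128×10 heatmap, and each heatmap cell maps back to a 4×4 patch of the original image.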

The model first screens the top 100 points on the heatmap that are greater than or equal to the values of their 8 neighboring points as preliminary predicted center points, then predicts the offsets of these center key points and the width and height of the target corresponding to each point. By fusing the center point coordinates, offsets, and target size information, the model calculates the coordinates of the predicted bounding box, realizing the positioning of substation equipment. A critical step in this process is the generation of the ground truth (GT) heatmap, where the research team uses a Gaussian kernel function to distribute key points onto the feature map. For each target in the GT, the center point is calculated, and the Gaussian kernel function is used to assign values to the surrounding areas of the center point on the feature map, with the value decreasing as the distance from the center point increases—this process helps the model learn the spatial distribution characteristics of the target center and improve detection accuracy.
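The peak-screening and box-fusion steps above can be sketched as follows; this is a NumPy illustration of the decoding logic described in the text, not the authors' code:

```python
import numpy as np

def topk_peaks(heat, k=100):
    """Keep points >= all 8 neighbours (3x3 local maxima), return the k strongest."""
    h, w = heat.shape
    padded = np.pad(heat, 1, constant_values=-np.inf)
    # max over each 3x3 neighbourhood, computed from the 9 shifted views
    neigh = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)]).max(axis=0)
    scores = np.where(heat >= neigh, heat, 0.0).ravel()
    order = np.argsort(scores)[::-1][:k]
    ys, xs = np.unravel_index(order, (h, w))
    return list(zip(xs, ys, scores[order]))

def decode_box(x, y, ox, oy, bw, bh, R=4):
    """Fuse a centre peak with its predicted offset and size into a bounding box."""
    cx, cy = (x + ox) * R, (y + oy) * R
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
```

A peak survives only if it equals the maximum of its 3×3 neighbourhood, which is exactly the "greater than or equal to its 8 neighbours" screening rule; the offset then corrects for the quantization introduced by the stride R.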

The total loss function of the CenterNet model is composed of three parts: the center point loss function \(L_k\), the target size loss function \(L_{size}\), and the target center point offset loss function \(L_{off}\), combined as \(L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{off} L_{off}\). The weight coefficients \(\lambda_{size}\) and \(\lambda_{off}\) are set to 0.1 and 1 respectively, a configuration that balances the contribution of each loss component to the model training and ensures the model prioritizes the accurate detection of center points and their offsets—critical for the positioning of substation equipment in infrared images.
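In code, combining the three terms with the quoted weights is a one-liner; the individual losses are assumed to be computed elsewhere, and this only illustrates the weighting:

```python
def total_loss(l_k, l_size, l_off, lam_size=0.1, lam_off=1.0):
    # L_det = L_k + lambda_size * L_size + lambda_off * L_off
    return l_k + lam_size * l_size + lam_off * l_off
```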

Innovating the Gaussian Convolution Kernel: The Core of the Improvement

The core innovation of the research lies in the improvement of the Gaussian convolution kernel used for GT heatmap generation in the CenterNet model. The original Gaussian kernel function in CenterNet generates a circular heatmap for all targets, regardless of the actual shape (aspect ratio) of the substation equipment, leading to two key problems in practical application: the Gaussian radius of the detected target is fixed (circular), failing to match the irregular shapes of substation equipment; and the coordinate range calculated by the bounding box (bbox) does not cover the entire heatmap, resulting in non-zero heat values outside the bbox. These issues cause ambiguous division of positive and negative samples during model training, inaccurate weighting in loss calculation, confusion of the convolutional network’s autonomous learning, and increased model computational load—directly leading to reduced detection accuracy in substation infrared images.

To address these limitations, the research team proposed an improved two-dimensional Gaussian convolution kernel, designed to make the heatmap adapt to the aspect ratio of the detected substation equipment and limit the heatmap to the interior of the target bbox. The team first recognized that in actual detection, there is a controllable error range between the top-left and bottom-right corners of the GT box and the predicted box: as long as the predicted range is within a certain radius r of these key points and the IOU between the predicted box and the GT box is greater than the threshold of 0.7, the predicted box can effectively enclose the target and should not be directly labeled as a negative sample (value 0). Based on this insight, the team derived three equations for calculating the Gaussian radius r by analyzing three different positional relationships between the predicted box and the GT box, with overlap representing the IOU ratio between the two boxes. The final Gaussian radius r is set to the minimum of the three calculated radii (\(r = \min(r_1, r_2, r_3)\)), ensuring the radius is adaptive to the relative position of the predicted and GT boxes and the actual size of the target.
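The three equations are not reproduced in this summary, but a version consistent with widely used open-source CenterNet implementations (the `gaussian_radius` helper; treat the exact coefficients as an assumption here) solves each positional case as a quadratic in r and takes the minimum:

```python
import math

def gaussian_radius(det_size, min_overlap=0.7):
    """Largest corner-shift radius that still guarantees IOU >= min_overlap,
    taken as the minimum over three predicted-box/GT-box positional cases."""
    height, width = det_size

    # positional case 1: quadratic a1*r^2 - b1*r + c1 = 0
    a1 = 1
    b1 = height + width
    c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
    r1 = (b1 - math.sqrt(b1 ** 2 - 4 * a1 * c1)) / (2 * a1)

    # positional case 2
    a2 = 4
    b2 = 2 * (height + width)
    c2 = (1 - min_overlap) * width * height
    r2 = (b2 - math.sqrt(b2 ** 2 - 4 * a2 * c2)) / (2 * a2)

    # positional case 3
    a3 = 4 * min_overlap
    b3 = -2 * min_overlap * (height + width)
    c3 = (min_overlap - 1) * width * height
    r3 = (b3 + math.sqrt(b3 ** 2 - 4 * a3 * c3)) / (2 * a3)

    return min(r1, r2, r3)
```

Because every coefficient scales with the box dimensions, the resulting radius grows with the target, which is what makes the kernel adaptive to equipment size.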

The improved Gaussian convolution kernel adheres to two key design principles: the Gaussian radius changes with the width (w) and height (h) of the detected substation equipment, so the heatmap is an ellipse matching the equipment’s aspect ratio instead of a fixed circle; and the calculated coordinate range of the heatmap is limited to the interior of the target bbox, with all heat values outside the bbox set to 0. This optimization resolves the ambiguity of positive and negative sample division in the original model: in the original method, negative samples outside the bbox had a loss weight greater than 1, which conflicted with the theoretical expectation that these points should be pure background (weight 0) and even made some strict negative samples lean toward positive samples. The improved heatmap ensures that negative samples outside the bbox have a loss weight of 0, while negative samples inside the bbox have a weight that decreases as the distance from the center point increases (a value less than 1), making the sample weighting more accurate and the loss calculation more in line with the actual detection scenario.

In addition, the research team visualized the improved Gaussian convolution kernel using 3D mesh plots with different standard deviation (\(\sigma\)) values (1 and 5), demonstrating that the heatmap value follows the rule of \(e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}}\) and decreases from the center point (value 1) to the surrounding area. The standard deviation \(\sigma\) is correlated with the target size (w, h) of the substation equipment, further ensuring the heatmap is tailored to the characteristics of each equipment type and enhancing the model’s ability to learn the unique features of different substation equipment in infrared images.
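A minimal sketch of such an elliptical kernel, with separate sigmas following the box width and height and the patch clipped to the bbox (the sigma-to-size ratio below is an illustrative assumption, not the paper's exact choice):

```python
import numpy as np

def elliptical_gaussian(w, h, scale=6.0):
    """GT heat patch the size of the target bbox: an elliptical Gaussian whose
    sigmas track the box's width and height. Values are 1 at the centre and
    fall off with distance; everything outside the w x h patch is implicitly 0."""
    sx, sy = w / scale, h / scale             # sigma per axis (assumed ratio)
    xs = np.arange(w) - (w - 1) / 2.0         # pixel offsets from the centre
    ys = np.arange(h) - (h - 1) / 2.0
    X, Y = np.meshgrid(xs, ys)
    return np.exp(-(X ** 2 / (2 * sx ** 2) + Y ** 2 / (2 * sy ** 2)))
```

For a wide box the heat spreads further along x than y, which is exactly the aspect-ratio adaptation the improved kernel is designed to provide.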

Rigorous Experimental Validation: Proving the Algorithm’s Superiority

To fully verify the performance of the proposed method based on the improved Gaussian convolution kernel, the research team designed a series of comparative and practical experiments, with the experiment platform configured with a Windows 10 operating system and the PyTorch deep learning framework—hardware including an Intel Core i7-9700K CPU, an NVIDIA RTX 2080 Ti GPU, 512 GB of memory, and a 4 TB SAS 7.2K hard disk. This high-performance hardware configuration ensures the efficient operation of model training and testing, avoiding computational bottlenecks that could affect experimental results. The team used LabelImg to annotate the collected infrared images, organizing the annotated data into a training set that conforms to the COCO standard image format—an industry-wide standard that ensures the compatibility and reproducibility of the experimental results.

The experiments were designed with two core objectives: to verify the improvement in detection accuracy of the model with the improved Gaussian convolution kernel compared to the original CenterNet model, and to test the model’s practical performance in real substation scenarios. For the first objective, the research team conducted comparative experiments on three mainstream backbone networks—DLA-34, ResNet-101, and ResNet-18—training each model for 200 epochs with a learning rate of 0.001, and evaluating the model performance using key metrics including mean average precision (mAP), heatmap loss (hm_loss), width and height loss (wh_loss), offset loss (off_loss), and total loss (Loss). The mAP is the core evaluation metric for target detection algorithms, calculated by first computing the average precision (AP) for each equipment category (with an IOU threshold of 0.5) and then taking the mean of all APs—an indicator that comprehensively reflects the model’s detection accuracy across all substation equipment types.
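The IOU matching that underlies the AP computation (thresholded at 0.5 here) reduces to a few lines; a generic sketch:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

A prediction counts as a true positive for AP only when its IOU with a ground-truth box of the same class reaches the threshold; per-class APs are then averaged into the mAP reported below.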

The comparative experimental results revealed a significant improvement in model performance after the Gaussian convolution kernel optimization: the DLA-34 backbone model achieved an mAP increase from 0.685 to 0.705 (a 2.9% rise), the ResNet-101 model from 0.661 to 0.723 (a 9.4% increase), and the ResNet-18 model from 0.463 to 0.582 (a remarkable 25.7% boost). The total loss of all three models also decreased significantly, with the ResNet-18 model showing the most pronounced improvement in both mAP and loss reduction—indicating that the improved Gaussian convolution kernel has a more significant optimization effect on lightweight backbone networks, a critical advantage for the deployment of the algorithm on substation inspection robots with limited computational resources. Among the three backbone networks, the ResNet-101 model achieved the highest mAP (0.723) after optimization, making it the selected backbone network for the practical scenario test of the improved CenterNet model, as it balances detection accuracy and computational efficiency for real-world substation applications.

For the practical scenario test, the research team used 387 infrared images collected from actual substation operating environments, covering all 10 equipment categories in the dataset, and evaluated the model performance using metrics including average accuracy, miss ratio, fallout ratio, and total detection time for each equipment type. The experimental results showed that the improved model achieved an overall average accuracy of 85.4% across all 10 equipment types, fully meeting the high-precision requirements of substation infrared image target detection. Three key substation equipment types—Breaker, Current transformer, and Voltage transformer—achieved an average accuracy of over 90%, with the Breaker reaching the highest at 0.912, a testament to the model’s ability to accurately detect high-priority and frequently inspected equipment in substations. The miss ratio (the ratio of undetected targets to the total number of targets) for all equipment types was kept at a low level, with the Breaker having the lowest miss ratio of 0.024, and the fallout ratio (the ratio of false detections to the total number of detections) close to 0 for most equipment types—indicating the model has strong anti-interference ability and low false detection rate, critical for avoiding unnecessary maintenance and fault misjudgment in power grid operation.

In terms of detection efficiency, the model maintained a low total detection time for each equipment type, with the Tubular busbar (8 images, 15 targets) taking only 0.874 seconds and the Voltage transformer (76 images, 146 targets) taking 6.519 seconds—demonstrating the model’s computational efficiency and suitability for real-time detection by substation inspection robots. The detection results also confirmed that the improved method can effectively eliminate various interference factors in infrared images (e.g., occlusion, defocus, low contrast) and automatically locate and identify substation equipment categories, achieving accurate detection even in the complex on-site environment of substations.

Implications for Smart Grid Development and Future Research

The research’s proposed infrared image detection method for substation equipment based on an improved Gaussian convolution kernel delivers a dual breakthrough in accuracy and efficiency, addressing the long-standing challenges of traditional algorithms in substation infrared detection and providing a new technical solution for the intelligent inspection of smart grids. The streamlined network structure and reduced computational complexity of the model make it highly deployable on substation inspection robots and mobile monitoring equipment, enabling real-time, on-site detection of substation equipment—this not only improves the efficiency of substation inspection and fault early warning but also reduces the reliance on manual inspection, lowering the operational cost of power grids and enhancing the safety and reliability of power system operation.

In the context of the global push for energy transition and smart grid construction, the integration of computer vision and infrared detection technology has become an inevitable trend in the digital transformation of the power industry. This research enriches the application of anchor-free target detection algorithms in the power field, providing a reference for the optimization of deep learning models for scenario-specific infrared image detection—such as the detection of transmission line equipment, wind power generation equipment, and photovoltaic power station equipment. The core idea of the improved Gaussian convolution kernel, i.e., adapting the model’s feature learning process to the actual characteristics of the detected target and the imaging scenario, can be extended to other computer vision tasks in the power industry, including equipment fault segmentation, abnormal heating analysis, and real-time monitoring of power grid operation status.

While the research achieves notable results, the team also identifies potential directions for further optimization: the current dataset, although practical, can be expanded to include more substation equipment types and infrared images collected under extreme environmental conditions (e.g., heavy fog, rain, snow) to further enhance the model’s generalization ability; the model can be combined with lightweight neural network technologies (e.g., model pruning, quantization) to further reduce computational complexity and enable deployment on low-power edge computing devices; and the integration of the detection method with infrared temperature measurement technology can realize the simultaneous detection of equipment position and temperature, providing more comprehensive fault information for power grid maintenance personnel.

Author Information and Publication Details

Authors: Wu Tianquan¹, Guo Jing², Gou Xiantai², Huang Qinqin², Zhou Weichao³

  1. State Grid Chaozhou Electric Power Co., Ltd, Chaozhou 521000, China
  2. College of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
  3. Sichuan Scom Intelligent Technology Co., Ltd, Chengdu 610041, China

Journal: Infrared Technology, Volume 43, Issue 3, March 2021

Article Title: Method of Detecting Substation Equipment in Infrared Images Based on Improved Gaussian Convolution Kernel

DOI: 10.11846/j.issn.1001_8891.202103008

Funding: Sichuan Provincial Major Artificial Intelligence Special Project (2018GZDZX0043); China Southern Power Grid Science and Technology Project (035100KK52190003)
