1 Introduction

Scrap steel serves as a significant source of iron in the modern steel industry [1, 2] and is crucial for steel companies to achieve carbon neutrality [3, 4]. The prices of different grades of scrap steel vary greatly, directly impacting the production costs and product quality of steel companies [5].

Scrap steel has a significant impact on the steelmaking costs of steel mills [6, 7]. During the steelmaking process, the yield of steel and the quality of scrap steel are closely related [8]. For example, in electric arc furnace smelting, the yield of heavy scrap steel with a thickness above 6 mm can reach 98–99%, while the yield of thin-sheet metal with a thickness below 3 mm is only around 90%. Since scrap steel is an important source of iron for electric arc furnaces and basic oxygen furnaces, the disparity in yield rates between different grades of scrap steel has a decisive influence on cost control for steel companies. Scrap steel also directly affects the quality of finished steel products [9]. For instance, there are significant differences in the harmful element content, rust, and hydrocarbon compound content in scrap steel of different grades [10]. Poor-quality scrap steel with high levels of harmful elements increases the impurity content in the molten steel. Excessive rust increases oxidation, leading to decreased cleanliness of the steel liquid, while high hydrocarbon compound content increases hydrogen content in the molten steel. Therefore, strict and fair inspections are required at the source of raw materials entering the steelmaking plant.

Currently, most steel mills still rely on manual operations for the classification and grading of scrap steel, which leads to a series of drawbacks such as high risk, inaccurate grading, lack of standardization, and unfairness [5], severely hindering the rational and efficient utilization of scrap steel, affecting the quality of molten steel in electric arc furnaces, increasing production costs for steel mills, and causing significant losses to steel enterprises [11, 12].

The paper proposes an intelligent scrap steel quality inspection system based on machine vision and deep learning technologies, aiming to achieve intelligent classification and grading of scrap steel raw materials. This system standardizes, simplifies, and accelerates the classification and grading of scrap steel, thereby enabling strict control over the quality of materials entering the furnace [13, 14]. Regarding settlement between scrap steel suppliers and steel mills after classification and grading, the relatively objective grading results can effectively resolve billing disputes and potential transaction risks. Additionally, accurately identifying the types of scrap steel before entering the furnace has significant benefits for temperature prediction, steel output prediction, and cost control. The intelligent scrap steel classification and grading system is of great significance for achieving rational resource utilization and promoting energy-efficient transformation and green development in the steel industry [15,16,17].

The innovative contributions of this paper are as follows:

(1) Proposed a machine vision-based scrap steel classification and grading method, which demonstrates excellent performance and generalization, enhancing the accuracy, safety, and fairness of scrap steel quality inspection.

(2) Introduced the Deeplabv3 + carriage segmentation model, which reduces the influence of complex backgrounds in scrap steel images on classification and grading by employing the carriage segmentation method, thereby improving the accuracy of classification and grading.

(3) Presented the application of the SAHI (Slicing Aided Hyper Inference) image slicing prediction method, which enables accurate classification and grading of small-target scrap steel in high-resolution images.

Fig. 1 Scrap intelligent classification and rating system architecture

Figure 1 depicts the architecture of the intelligent classification and grading system for scrap steel. Carriage segmentation effectively mitigates the influence of complex backgrounds on classification and grading. Additionally, by combining the scrap steel classification and grading model (CSBFNet) with the SAHI slicing method, the intelligent quality inspection model addresses the suboptimal classification and grading results for small or less distinct scrap steel targets caused by excessively high image resolution. This design significantly enhances the accuracy and generalization of scrap steel classification and grading. The remainder of this paper is organized as follows: Sect. 2 introduces related work on scrap steel recovery and classification, summarizing the strengths and limitations of existing research. Section 3 describes the dataset establishment, data preprocessing methods, model evaluation criteria, and the architecture of the intelligent classification and grading system for scrap steel (comprising the scrap steel intelligent quality inspection model, the carriage segmentation model, and the SAHI image slicing method). Section 4 offers a detailed analysis of the experimental data and results. Section 5 summarizes the paper. Section 6 discusses future work.

2 Related work

Scrap steel is an important and environmentally friendly substitute for iron ore in the steel industry, and its quality directly affects the quality of molten steel [9]. Effectively classifying and rating scrap steel can reduce environmental impacts and minimize the use of primary raw materials, while also facilitating carbon neutrality in the steel industry and improving production efficiency [18, 19]. The rapid identification and classification of scrap metal (primarily referring to scrap steel) using methods such as thermoelectricity, spectroscopy, and X-ray measurements are relatively common [20]. Riley et al. surveyed these technologies and applied them in specific waste identification systems [21]. Mesina et al. discussed new sensor-based methods for scrap metal recycling, including an automatic sorting prototype developed at Delft University of Technology [22]. Although manual sorting is widely used, it is an inaccurate method for alloy sorting. Spencer proposed an alternative method for waste identification and classification, utilizing optoelectronic technology for rapid and accurate waste identification [23]. Cuce et al. proposed a novel method based on thermal conductivity for material identification in the waste industry, aiming to address the challenges of rapid and accurate sorting. This method utilizes a device with a constant heat flux source and cooling system to measure the temperature gradient of sample metals and determine their thermal conductivity. Experimental results show good agreement with literature data, achieving acceptable measurement errors ranging from 0.56 to 4.46% for various metal samples [24]. Brooks et al. demonstrated that handheld analyzers using X-ray Fluorescence Technology (XRF) and Laser-Induced Breakdown Spectroscopy (LIBS) can improve the detection process by bridging the gap in expertise between experts and novices. 
These technologies help improve functionality, performance, and accessibility, and are becoming more cost-effective. However, their performance on unprepared materials (such as old, used, weathered, or warped waste) is still limited in providing reliable composition percentages. While handheld analyzers can indicate and verify the contents of many materials, certain metal types and conditions in scrapyard environments still pose challenges [25, 26].

Spectrometers and LIBS technology enable the identification and classification of alloy elements. Auer et al. proposed a machine-learning approach in which optical emission spectroscopy data are fed into supervised learning algorithms to quickly and reliably automate the identification of alloys in recycling [27]. Kashiwakura et al. constructed a LIBS system to identify five types of austenitic stainless steels and used Partial Least Squares Regression (PLSR) to determine element concentrations; by selecting specific emission lines with higher excitation levels, accurate results for Cr, Ni, Mo, and Nb were obtained [28]. Automatic identification of recyclable scrap metal is also significant for ecological conservation. Li et al. proposed a recognition method for different non-ferrous metal scraps, particularly aluminum (Al) and copper (Cu), utilizing a Convolutional Neural Network (CNN) and SEEDS (Superpixels Extracted via Energy-Driven Sampling) for classification, improving recycling and sorting techniques, enhancing metal resource utilization, and promoting sustainability [29]. Diaz-Romero et al. proposed single-output and multi-output models, combining LIBS with two DenseNets for late fusion, and a two-output network for reinforcement learning and avoiding overfitting. This method achieved high accuracy, with the single-output model effectively separating cast and forged aluminum, highlighting the potential to improve recycling quality and efficiency [30]. Park et al. proposed two image processing algorithms to measure the three-dimensional shape of metal fragments and calculate optimized (relatively clean and flat) surface areas to improve the maximum classification accuracy of LIBS spectra [31]. Zeng et al. proposed a hybrid algorithm combining a Support Vector Machine (SVM) and peak detection for element identification; experimental results on a simulated alloy LIBS database showed significantly improved identification accuracy, particularly for common metal elements. This method effectively enhances the detection accuracy of identifying recyclable scrap metal [32].

Deep learning techniques have also been used to recognize and classify scrap metal. Koyanaka and Kobayashi integrated neural network analysis into fragment identification algorithms, combining 3D imaging cameras and weight-measurement devices to automatically sort light metal fragments [33]. The development of multi-object detection technology [34, 35] provides new research directions for scrap steel recognition and classification. Daigo et al. used deep learning-based image analysis to classify the thickness or diameter of steel materials in heavy scrap. The developed model effectively distinguishes thickness and diameter categories, demonstrating the potential of image analysis in scrap steel classification while addressing challenges in image acquisition and annotation procedures [36]. Gao et al. proposed a scrap steel grading system based on 3D vision [37] and introduced a novel thickness feature descriptor, which performs edge detection on organized point clouds and detects point pairs meeting thickness criteria. By filtering invalid thickness features based on contextual information and merging correct ones, a thickness feature histogram (TFH) is computed and used to grade the scrap steel [38]. Xu et al. proposed a deep learning-based model for efficient and accurate classification and grading of scrap steel, overcoming the limitations of traditional manual methods; the model outperforms traditional approaches and provides a solution for evaluating scrap steel quality in the recycling process [39]. Tu et al. proposed a novel framework comprising a carriage attention module (CaM), a scrap detection module (SDM), and a scrap grading module (SGM) to evaluate the final grade from all images captured from wagons during the scrap unloading process [40].

Deep learning has also been applied elsewhere in the steel industry. To address the issue of insufficient defect sample data in the identification and classification of strip steel defects, Yi et al. proposed the Surface Defect-ConSinGAN (SDE-ConSinGAN) model for strip steel defect identification, which constructs an image feature segmentation and stitching framework on a single-image model trained by Generative Adversarial Networks (GANs) [41]. Zhang et al. proposed a quantitative identification method based on the Continuous Wavelet Transform (CWT) and Convolutional Neural Networks (CNNs) to detect internal and external wire breakage in steel wires [42]. Nickel-based superalloys are widely used for their strength and high-temperature resistance, and laser cutting has become a suitable method for preserving these properties. Wang et al. investigated the influence of laser cutting parameters on cutting quality and surface roughness, using response surface methodology to create predictive models and revealing the significant effects of cutting speed, laser power, and focal length on temperature and surface quality [43]. In thermal engineering applications, the temperature field during laser welding significantly affects welding quality, microstructure, and mechanical properties. Rawa et al. proposed a novel artificial intelligence approach integrating artificial neural networks and particle swarm optimization for predicting the melting rate and maximum temperature, demonstrating satisfactory performance in regression tasks [44]. Sun et al. analyzed the influence of pulsed laser welding parameters on the melt pool velocity field and temperature distribution in dissimilar laser welding of stainless steel 420 and stainless steel 304 using numerical simulation and artificial neural networks (ANNs) [45]. Other creative contributions include [46,47,48,49,50].

In summary, most scholars focus on the detection of alloy elements in research on the classification and grading of scrap metal. The detection of scrap steel is still in the experimental exploration stage, and current research faces challenges such as low accuracy, poor generalization, inadequate performance in detecting small-target scrap steel, interference from background environments, and inability to meet real-time requirements. There remains significant room for research and development before industrial applications can meet actual production needs. Therefore, this paper proposes a machine vision-based intelligent classification and grading method for scrap steel. It achieves high accuracy even for small-target scrap steel while meeting real-time requirements, significantly reducing the influence of background environments, controlling costs, and demonstrating good generalization. This approach represents an initial step toward industrial application (Table 1).

Table 1 Related work on the classification and grading of scrap steel

3 Materials and methods

3.1 Dataset

3.1.1 Image capture

Since there was no publicly available scrap steel dataset, we procured scrap steel image acquisition equipment that meets the requirements of our research work. We installed and debugged the equipment on-site to obtain the dataset needed for the experiments. Scrap steel image acquisition was conducted using three cameras at the scrap steel site, as shown in Fig. 2.

Fig. 2 Scrap image capture

3.1.2 Scrap dataset

A laboratory scrap steel image dataset (HK_L) has been established. The HK_L dataset comprises seven categories: thickness less than 3 mm (< 3 mm), thickness between 3 and 6 mm (3–6 mm), thickness greater than 6 mm (> 6 mm), galvanized, painted, greasy dirt, and inclusions. Further optimization and adjustments were made to the dataset based on field surveys and standards for scrap steel recycling in the steel industry. Subsequently, a field scrap steel dataset (HK_T) was established. Categories such as carriages and cranes, which affect detection accuracy at the site, were removed. The HK_T dataset now consists of nine categories: airtight, scattered, inclusions, ungraded, overlength (1.2 to 1.5 m), overlength (1.5 to 2 m), thickness less than 3 mm (< 3 mm), thickness between 3 and 6 mm (3–6 mm), and thickness greater than 6 mm (> 6 mm). The images in the training and validation sets are divided in a ratio of 9:1. Detailed data for both datasets are presented in Table 2.
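The 9:1 train/validation split can be sketched as follows; this is a minimal illustration with hypothetical file names, not the exact procedure used in this work. A fixed seed keeps the split reproducible.

```python
import random

def split_dataset(items, val_ratio=0.1, seed=42):
    """Shuffle deterministically, then split into train/val lists (9:1 by default)."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical image file names standing in for the HK_L/HK_T images.
train, val = split_dataset([f"img_{i:04d}.jpg" for i in range(100)])
```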

Table 2 HK_L and HK_T datasets

Because the company produces specialty steel, the targeted category for scrap steel procurement is heavy scrap. Under the procurement standards, materials with a thickness greater than 6 mm (> 6 mm) are acceptable raw materials, leading to a higher proportion of this type of scrap steel in the dataset. To enhance the model’s robustness and generalization despite this imbalance, two approaches, class weighting and image weighting, were employed during model training.
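The paper does not give its exact weighting formula; a common choice, shown here purely as an illustrative assumption, is inverse-frequency class weighting, where rarer classes receive proportionally larger loss weights:

```python
def class_weights(counts):
    """Inverse-frequency weights: w_c = N_total / (K * N_c), so rare classes weigh more."""
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

# Hypothetical per-class image counts dominated by >6 mm heavy scrap.
w = class_weights({">6mm": 800, "3-6mm": 150, "<3mm": 50})
```

With these made-up counts, the dominant > 6 mm class is down-weighted while the scarce < 3 mm class is up-weighted, which counteracts the imbalance described above.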

3.1.3 Carriage segmentation dataset

The on-site scrap steel unloading and quality inspection scene is illustrated in Fig. 3. From the image, it is evident that the background of the vehicles awaiting grading is quite complex. Moreover, compared to the scrap steel inside the carriage, the background occupies a relatively large proportion of the image, significantly interfering with effective quality grading of the cargo. To reduce the interference of background information on the classification and rating performance of the model, a carriage segmentation dataset was established, and an image segmentation model was applied to segment the images collected on-site. The carriage segmentation dataset consists of 930 images with one label category (carriage). The training and validation sets of the carriage dataset are likewise divided in a 9:1 ratio, as detailed in Table 3.

Fig. 3 Scene of scrap unloading and quality inspection

Table 3 Carriage segmentation dataset

3.2 Image preprocessing

Image preprocessing is a critical step in machine vision and image analysis. It involves various techniques and methods aimed at enhancing the quality of images by correcting and removing various image artifacts, such as noise, distortion, blurriness, and changes in lighting conditions. Commonly used techniques and methods include brightness adjustment, filtering, and geometric transformations.

In this study, image preprocessing techniques were applied to the laboratory and on-site image datasets for brightness adjustment, noise reduction, and geometric transformations. (1) For image denoising [51], four filtering algorithms, namely median filtering, bilateral filtering, mean filtering, and Gaussian filtering, were compared. A 3 × 3 median filter was ultimately chosen for denoising, as it effectively removes noise while preserving image clarity. (2) For geometric transformations [52], image rotation and cropping were employed to address the high resolution of on-site images and the performance limitations of the experimental equipment, which made it impractical to train models on the original images. (3) For brightness adjustment [53], a gamma value of 1.5 was selected to simulate the impact of weather changes on the brightness of recycled steel raw material images.
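The gamma adjustment with γ = 1.5 can be sketched as follows; this is a minimal NumPy lookup-table version, since the paper does not specify its implementation. A gamma greater than 1 darkens the image, simulating duller lighting conditions:

```python
import numpy as np

def adjust_gamma(img, gamma=1.5):
    """Gamma-correct an 8-bit image via a 256-entry lookup table."""
    lut = (255.0 * (np.arange(256) / 255.0) ** gamma).astype(np.uint8)
    return lut[img]  # advanced indexing applies the LUT per pixel

dark = adjust_gamma(np.full((4, 4), 128, dtype=np.uint8), gamma=1.5)
```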

3.2.1 Image filtering and denoising

In the application of digital image processing, the quality of images captured by cameras is often affected by the inherent noise of electronic components, such as dark current, readout noise, thermal noise, etc. These noises can lead to a degradation in image quality and impact the effectiveness of subsequent model training. Filtering is used to remove noise and unnecessary information from images, thereby improving image quality. Common filtering methods include Gaussian filtering, median filtering, mean filtering, and bilateral filtering.

(1) Gaussian filtering is a filtering method based on the Gaussian function. It achieves smoothing by applying a Gaussian kernel to each pixel of the image. Gaussian filtering is typically used to remove Gaussian noise and provides effective smoothing.

(2) Median filtering is a nonlinear filtering method that smooths an image by sorting the pixel values in the neighborhood of each pixel and replacing the pixel value with the median. Median filtering is commonly used to remove salt-and-pepper noise and effectively removes noise while preserving image details.

(3) Mean filtering is a linear filtering method that smooths an image by averaging the pixel values in the neighborhood of each pixel. Mean filtering suppresses Gaussian noise but tends to blur edges and fine details in the image.

(4) Bilateral filtering is a nonlinear filtering method that weights pixels by both spatial distance and intensity difference. It is often used to remove Gaussian noise and can suppress noise effectively while preserving edge information in the image.

To compare the denoising effects of different filters, this section uses a poor-quality local scrap steel image. It adds a certain amount of Gaussian noise, salt-and-pepper noise, and Poisson noise to generate an image with mixed noise, as shown in Fig. 4. The denoising effects of using four different filters with two different filter kernel sizes are shown in Figs. 5 and 6.

From Figs. 5 and 6, it can be observed that bilateral filtering is generally ineffective in dealing with salt-and-pepper noise, resulting in poor denoising performance for mixed noise. Mean filtering not only fails to remove noise effectively but also blurs the image, especially when using a 5 × 5 filter kernel. Gaussian filtering shows little difference in denoising effectiveness between the two different kernel sizes and performs poorly in removing salt-and-pepper noise.

In contrast, median filtering demonstrates significantly better denoising performance than the other three filters. Although using a 5 × 5 median filter can effectively remove noise, it slightly blurs the image compared to a 3 × 3 median filter. Considering all factors, a 3 × 3 median filter is selected for denoising, as it effectively removes noise while maintaining image clarity. This confirms that, in terms of image denoising, median filtering is a more effective method.
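To illustrate why the 3 × 3 median filter handles salt-and-pepper noise so well, the following NumPy-only sketch (our own minimal implementation, not the one used in this work, which likely relied on an image processing library) removes an isolated outlier pixel completely:

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter with edge replication, implemented with NumPy only."""
    padded = np.pad(img, 1, mode="edge")
    # Stack the 9 shifted views covering each pixel's 3x3 neighborhood.
    stack = np.stack([padded[i:i + img.shape[0], j:j + img.shape[1]]
                      for i in range(3) for j in range(3)])
    return np.median(stack, axis=0).astype(img.dtype)

# Flat gray image corrupted by a single "salt" pixel: the median removes it,
# because one outlier can never be the middle value of a 9-pixel neighborhood.
noisy = np.full((5, 5), 100, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_filter3(noisy)
```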

Fig. 4 Noise addition. (a) Original scrap steel image; (b) Image after adding mixed noise

Fig. 5 Comparison of the denoising effect of different filters with a 3 × 3 filter kernel. (a) Median filtering; (b) Bilateral filtering; (c) Mean filtering; (d) Gaussian filtering

Fig. 6 Comparison of the denoising effect of different filters with a 5 × 5 filter kernel. (a) Median filtering; (b) Bilateral filtering; (c) Mean filtering; (d) Gaussian filtering

3.2.2 Image geometric transformation

Because the on-site scrap steel images have high resolution and the experimental equipment cannot train on the full original images, local image cropping is employed. The laboratory dataset has a suitable resolution and therefore requires no geometric transformation. Geometric transformation of an image refers to altering the positions, angles, scales, and other attributes of pixels within a two-dimensional image. Common geometric transformations include rotation and cropping, both of which are applied to the dataset, as illustrated in Fig. 7.

(1) Rotation: rotating the image by a given angle. Common rotation methods include clockwise and counterclockwise rotations, with the rotation center generally being the image center.

(2) Cropping: cutting or trimming a specific part of an image to retain the desired portion, yielding the necessary localized image. Image cropping can be employed to eliminate irrelevant backgrounds or noise.
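The two transformations above can be sketched in a few lines of NumPy (the array contents are placeholders standing in for real image data):

```python
import numpy as np

# A small integer array stands in for a scrap steel image.
img = np.arange(24, dtype=np.uint8).reshape(4, 6)

# Rotation: 90 degrees clockwise about the image center.
rotated = np.rot90(img, k=-1)

# Cropping: keep only a region of interest (e.g., the area inside the carriage).
cropped = img[1:3, 2:5]
```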

Fig. 7 Geometric transformation of scrap image. (a) Original image; (b) Rotation; (c) Cropping

3.3 Methods

3.3.1 CSBFNet intelligent classification rating model

Based on the characteristics of on-site scrap steel types, the CSBFNet model is proposed for quality inspection and rating during the on-site scrap steel unloading process. The network framework is illustrated in Fig. 8, where CSP denotes Cross Stage Partial networks, primarily responsible for feature extraction; SE denotes the Squeeze-Excitation attention mechanism, which enhances the network’s feature representation capability by explicitly modeling dependencies between channels; and BiFPN (Bidirectional Feature Pyramid Network) is an efficient feature pyramid network that improves the propagation and fusion of feature information across different scales, thereby enhancing the model’s performance.
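The SE computation can be illustrated in isolation. The following NumPy sketch uses random, untrained weights purely for demonstration; in CSBFNet these parameters are learned end-to-end inside the network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, w1, w2):
    """Squeeze-Excitation: global-average-pool each channel into a descriptor,
    pass it through two FC layers, then rescale every channel by a weight in (0, 1)."""
    squeeze = feat.mean(axis=(1, 2))                       # (C,) channel descriptor
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))   # (C,) channel weights
    return feat * excite[:, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))        # (C, H, W) feature map
w1 = rng.standard_normal((2, 8)) * 0.1       # reduction: 8 -> 2 channels
w2 = rng.standard_normal((8, 2)) * 0.1       # expansion: 2 -> 8 channels
out = se_block(feat, w1, w2)
```

The excitation weights lie strictly between 0 and 1, so each channel is attenuated according to its modeled importance rather than amplified without bound.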

Fig. 8 Network diagram of CSBFNet model

3.3.2 Deeplabv3 + carriage segmentation model

To further improve the accuracy of scrap steel classification and grading, a Deeplabv3+ [54] carriage segmentation model is employed in front of the CSBFNet scrap steel intelligent quality inspection model to perform image segmentation of the carriage. This step is taken to reduce the impact of complex background in scrap steel images on the classification and grading performance, making the classification and grading of scrap steel more precise. The architecture of the Deeplabv3 + carriage segmentation model is illustrated in Fig. 9.

Fig. 9 Deeplabv3 + carriage segmentation model

The overall structure of the Deeplabv3 + carriage segmentation model consists of two parts: an encoder and a decoder. The encoder typically employs a pre-trained convolutional neural network as the backbone network, such as ResNet, Xception, MobileNet, etc., to extract features from the input image. The Xception architecture is employed as the backbone network in this study, tasked with extracting features from input images. The decoder is responsible for upsampling and interpolating the feature maps obtained from the encoder, restoring them to the original input image size, while performing semantic feature fusion and detail enhancement to achieve more refined segmentation results.

In the encoder part, deep convolutional neural networks and spatial pyramid pooling modules are utilized to address the challenge of wagon scale variation in the image segmentation task. Specifically, after the last convolutional layer of the network, multiple pooling layers of different sizes are added to capture wagon features at different scales. This approach enhances the model’s segmentation accuracy for wagons of different sizes. Additionally, it aims to reduce the impact of reduced feature map resolution on segmentation results, while improving the model’s receptive field and feature extraction capabilities.
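The multi-scale pooling idea can be sketched as a simplified spatial pyramid pooling step; the grid sizes below are illustrative assumptions, not the exact configuration of the model:

```python
import numpy as np

def spatial_pyramid_pool(feat, grids=(1, 2, 4)):
    """Average-pool a (C, H, W) feature map over successively finer grids and
    concatenate the results into one fixed-length, multi-scale descriptor."""
    c, h, w = feat.shape
    parts = []
    for g in grids:
        for i in range(g):
            for j in range(g):
                cell = feat[:, i * h // g:(i + 1) * h // g,
                            j * w // g:(j + 1) * w // g]
                parts.append(cell.mean(axis=(1, 2)))
    return np.concatenate(parts)  # length = C * (1 + 4 + 16) for grids (1, 2, 4)

desc = spatial_pyramid_pool(np.ones((8, 16, 16)))
```

Because the grid counts are fixed, the descriptor length is independent of the input resolution, which is what lets the model capture wagon features at multiple scales.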

In the decoder part, a structure similar to U-Net [55] is employed. It involves upsampling and fusion of the feature maps obtained from the encoder, along with specialized processing of both low-level and high-level features to achieve more accurate segmentation results. Concretely, a series of dilated convolutional layers and transpose convolutional layers are used to respectively enlarge the receptive field of the feature maps and restore the image resolution. Furthermore, skip connections are used to fuse features from different levels in the encoder with corresponding features in the decoder, aiming to enhance segmentation precision and detail representation in the results.

3.3.3 SAHI image slicing

Due to the high resolution of the steel scrap images collected on-site and limitations in experimental equipment performance, the training images are cropped to local regions. However, during model prediction, the entire image is used for inference, leading to suboptimal performance in steel scrap detection. To address this issue, the CSBFNet model employs the SAHI (Slicing Aided Hyper Inference) [56] prediction method to enhance the model’s classification performance.

SAHI is a slicing-aided inference approach for detecting small objects in high-resolution images. It slices a large image into multiple overlapping smaller images, runs the detection model on each slice, and then merges the slice-level predictions back into the coordinate frame of the full image. The method works with detectors trained on annotated datasets or with pre-trained models from other network frameworks, improving the accuracy of small-target recognition. During prediction, SAHI slices the large image and rapidly infers on each small image, enabling efficient processing of high-resolution images.
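The slice-grid computation at the heart of this method can be sketched as follows. The slice size and overlap ratio are illustrative defaults, and the merging of per-slice detections back into full-image coordinates is omitted:

```python
def slice_coords(size, slice_size=512, overlap=0.2):
    """Start/stop offsets of overlapping windows along one image dimension.
    The last window is shifted back so it ends exactly at the image border."""
    if size <= slice_size:
        return [(0, size)]
    step = int(slice_size * (1 - overlap))
    starts = list(range(0, size - slice_size, step)) + [size - slice_size]
    return [(s, s + slice_size) for s in starts]

def slice_image_grid(h, w, slice_size=512, overlap=0.2):
    """Full 2D slicing grid: cartesian product of row and column windows."""
    return [(y0, y1, x0, x1)
            for (y0, y1) in slice_coords(h, slice_size, overlap)
            for (x0, x1) in slice_coords(w, slice_size, overlap)]

tiles = slice_image_grid(1080, 1920)
```

Each tile is then fed to the detector at a resolution where small scrap targets occupy a usable fraction of the input.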

3.4 Model evaluation criteria

The model’s evaluation metrics include precision (P), recall (R), accuracy (Acc), and mean Average Precision (mAP), as described in Eq. (1) to (4) [57]. Average Precision (AP) represents the average precision under the precision-recall (P-R) curve for a single category.

$$P=\frac{TP}{TP+FP}\times 100\%$$
(1)
$$R=\frac{TP}{TP+FN}\times 100\%$$
(2)
$$mAP=\frac{1}{N}\sum _{i=1}^{N}{AP}_{i}$$
(3)
$$Acc=\frac{TP+TN}{TP+FP+TN+FN}$$
(4)

In the formula: “\(i\)” represents the i-th class; “\(N\)” represents the total number of classes; “TP” (true positive) denotes predicted positive samples that are actually positive; “FP” (false positive) denotes predicted positive samples that are actually negative; “TN” (true negative) denotes predicted negative samples that are actually negative; “FN” (false negative) denotes predicted negative samples that are actually positive; “AP” stands for average precision.
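For concreteness, Eqs. (1), (2), and (4) can be computed directly from the confusion counts; the counts below are made-up illustrative values:

```python
def precision(tp, fp):
    """Eq. (1): fraction of predicted positives that are correct, in percent."""
    return 100.0 * tp / (tp + fp)

def recall(tp, fn):
    """Eq. (2): fraction of actual positives that are found, in percent."""
    return 100.0 * tp / (tp + fn)

def accuracy(tp, fp, tn, fn):
    """Eq. (4): fraction of all predictions that are correct, in percent."""
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)

def mean_ap(aps):
    """Eq. (3): mAP as the mean of per-class average precisions."""
    return sum(aps) / len(aps)

p = precision(tp=8, fp=2)
r = recall(tp=8, fn=2)
acc = accuracy(8, 2, 8, 2)
```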

4 Results and discussion

The experiments of this study were conducted on an Ubuntu 18.04 system, utilizing the PyTorch framework and programmed using the Python language with Visual Studio Code as the programming tool. The model training and testing were accelerated using two NVIDIA GeForce RTX 3090 24GB graphics cards.

4.1 Analysis of scrap intelligent rating results

4.1.1 Ablation experiment

To validate the contributions of the constructed CSBFNet components to the overall model, ablation experiments were conducted on two components: the attention mechanism and the feature fusion module. The experiments were run on the HK_T dataset. “w/o” indicates the removal of a module, with the rest of CSBFNet left unchanged.

The experimental parameter settings are shown in Table 4. The Adam optimizer was used as the optimization algorithm, and the classic CIoU loss was adopted. The input image size was set to 640 × 640 pixels. The number of training epochs was determined based on previous experimental experience, ensuring that the model achieved good performance without overfitting. The batch size was set to a moderate value, providing sufficient feature information per batch without exceeding hardware limits. The results of the ablation experiments are shown in Table 5.

The image size was set to 640 × 640 pixels during training for two main reasons. First, because the background occupies a large proportion of the scrap steel images collected on-site, the images were cropped to retain only the recycled steel raw materials inside the carriage, eliminating the influence of the background; 640 × 640 matches the cropped image size well and facilitates the extraction of category features. Second, due to the high original resolution of the on-site images, training on the original images was not feasible given the limitations of the experimental equipment, and resizing the originals to 1280 × 1280 pixels would make the targets occupy too small a proportion of the image, hindering the extraction of category feature information.

Table 4 Training parameters
Table 5 Ablation experimental results on the HK_T dataset

From the results in Table 5, it can be observed that: (1) Removing SE and BiFPN simultaneously has the greatest impact, indicating that together they significantly improve the model’s feature extraction and fusion capabilities and thereby help the model accurately identify the features of each scrap steel category. (2) The SE attention mechanism improves the network’s representational power by explicitly modeling dependencies between channels; removing it degrades feature extraction. (3) BiFPN exchanges information bidirectionally between high-level features (rich in semantic information) and low-level features (rich in detail). This bidirectional interaction enables the network to better integrate semantic and detail information; removing BiFPN degrades feature fusion and accuracy.
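The squeeze-excite-scale computation that SE performs on a feature map can be sketched in NumPy as follows. The weight shapes and the bottleneck ratio are illustrative, not the paper's exact configuration.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x of shape (C, H, W).
    w1: (C//r, C) reduction weights, w2: (C, C//r) expansion weights."""
    # Squeeze: global average pooling collapses the spatial dims to one value per channel
    z = x.mean(axis=(1, 2))                    # shape (C,)
    # Excitation: bottleneck MLP models cross-channel dependencies
    s = np.maximum(w1 @ z, 0.0)                # ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))        # sigmoid gate in (0, 1)
    # Scale: reweight each channel of the original feature map
    return x * s[:, None, None]
```

With zero weights the gate is sigmoid(0) = 0.5 for every channel, so the block halves all activations; trained weights instead emphasize the channels most informative for each scrap category.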

4.1.2 Comparative experiment

To further verify the detection performance of the CSBFNet model for scrap steel classification and grading, we compared it on the HK_L dataset with the classic two-stage model Faster R-CNN and the one-stage models YOLOv4, the YOLOv5 series, and YOLOv7. The comparative detection results are shown in Table 6. From Table 6, it can be observed that CSBFNet improves mAP over these models by 23.5%, 19.5%, 23.5%, 2.3%, and 1.8%, respectively. Despite having slightly more network layers than the other models, CSBFNet still achieves a competitive inference speed. These results indicate that the proposed CSBFNet has outstanding detection performance.

Table 6 Detection results of different network models on HK_L

4.2 Analysis of experimental results of carriage segmentation

The loss curves during training and validation of the Deeplabv3+ carriage segmentation model are shown in Fig. 10. From Fig. 10, it can be observed that the loss curves exhibit distinct stage-like characteristics: loss values are relatively high in the initial stages of training, but as the number of training iterations increases, the model’s performance gradually improves and the loss decreases, eventually reaching stable convergence without overfitting.

Furthermore, the performance of the Deeplabv3+ carriage segmentation model on the validation set is evaluated in Table 7. The metrics, mean Intersection over Union (mIoU), mean Average Precision (mAP), and Accuracy (Acc), all reach high values: mIoU is 97.14%, mAP is 98.43%, and Acc is 99.24%.
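For reference, the mIoU metric reported above is the per-class intersection-over-union averaged over classes. A minimal sketch on integer label maps:

```python
import numpy as np

def miou(pred, target, num_classes):
    """Mean Intersection-over-Union between predicted and ground-truth label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```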

Figure 11 displays the segmentation results for the carriage. It is evident from the figure that the Deeplabv3+ carriage segmentation model provides precise segmentation: the carriage is accurately segmented with no significant missing portions. The segmentation results demonstrate that the model meets the requirement of making the scrap steel classification and grading region more accurate, confirming that the Deeplabv3+ carriage segmentation model can be applied to real-world carriage segmentation tasks.

Fig. 10
figure 10

Loss curve of the model. (a) Loss curve of training; (b) Loss curve of validation

Table 7 Evaluation indexes of segmentation results
Fig. 11
figure 11

Carriage segmentation effect. (a) Original image of scrap steel; (b) Segmented carriage after segmentation

4.3 SAHI scrap steel image slice detection

During subsequent testing, it was observed that the CSBFNet model sometimes produced false positives when detecting scrap steel around carriages and grabber machines. To address this, images containing the grabber machine category were removed before detection; the remaining images underwent carriage segmentation, and the CSBFNet model was then applied for scrap steel classification and grading. Figure 12 illustrates the classification and grading results after applying SAHI slicing and image segmentation with the CSBFNet model: Fig. 12 (a) shows a scrap steel recycling scene, Fig. 12 (b) the classification and grading results on the original image, Fig. 12 (c) the results after SAHI slicing prediction, and Fig. 12 (d) the detection results on the segmented image.

In Fig. 12 (b), the direct application of the CSBFNet model for scrap steel classification and grading resulted in identifying mainly larger-sized or more prominent features due to the high resolution of the on-site images. Consequently, the classification and grading performance for smaller-sized and less distinct scrap steel was not ideal. To address this issue, the SAHI slicing prediction method was applied, with the slice size set to 640 × 640 pixels for detection, as seen in Fig. 12 (c). This modification allowed for the accurate classification and grading of smaller-sized and less distinct scrap steel.
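The tiling step of SAHI slicing can be sketched as follows. The 20% overlap ratio is an assumed illustrative value, not a parameter stated in this paper, and the sketch assumes the image is at least one tile wide and tall.

```python
def slice_coords(img_w, img_h, tile=640, overlap=0.2):
    """Top-left/bottom-right coordinates of overlapping tiles, SAHI-style."""
    step = int(tile * (1 - overlap))
    xs = list(range(0, max(img_w - tile, 0) + 1, step))
    ys = list(range(0, max(img_h - tile, 0) + 1, step))
    # Ensure the last row/column of tiles reaches the image border
    if xs[-1] + tile < img_w:
        xs.append(img_w - tile)
    if ys[-1] + tile < img_h:
        ys.append(img_h - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]

tiles = slice_coords(1280, 1280)  # nine overlapping 640 x 640 tiles
```

Each tile is detected at full model resolution, so small scrap pieces that shrink to a few pixels in a downscaled whole image remain large enough to classify; the per-tile detections are then mapped back to the original image coordinates and merged.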

While Fig. 12 (c) accurately classified and graded scrap steel within the carriage, it also recognized a significant amount of scrap steel in the background. This led to substantial errors in the quantity and grading statistics of the graded scrap steel. To address this issue, the Deeplabv3+ carriage segmentation model was employed to segment the carriage, resulting in highly precise classification and grading regions. Subsequently, the CSBFNet model and SAHI slicing prediction method were used for the classification and grading of the segmented images. Figure 12 (d) demonstrates that the classification and grading performance of scrap steel within the segmented carriage is comparable to the unsegmented results in Fig. 12 (c), with all categories being accurately identified.
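Suppressing background detections with the segmentation result can be as simple as a centre-in-mask test; this is a hedged sketch of one possible post-processing step, not the exact pipeline used on-site.

```python
import numpy as np

def filter_by_mask(boxes, mask):
    """Keep detections whose centre falls inside the segmented carriage region.
    boxes: (x1, y1, x2, y2) tuples; mask: 2-D array where nonzero = carriage."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        inside = 0 <= cy < mask.shape[0] and 0 <= cx < mask.shape[1]
        if inside and mask[cy, cx]:
            kept.append((x1, y1, x2, y2))
    return kept
```

Only the detections inside the carriage then enter the quantity and grading statistics, which removes the background-scrap errors seen in Fig. 12 (c).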

The Deeplabv3+ carriage segmentation model effectively mitigates the impact of complex backgrounds on scrap steel classification and grading. Combining SAHI image slicing with the CSBFNet intelligent quality inspection model addresses the poor classification and grading performance for small-sized, indistinct scrap steel caused by the high resolution of on-site images. The proposed method therefore significantly improves the accuracy of scrap steel classification and grading.

Fig. 12
figure 12

Effect of model classification and rating. (a) Scrap recycling site; (b) Classification results of the original images; (c) Prediction results of SAHI slice; (d) Classification and grading results after carriage segmentation

4.4 Model generalization testing

To validate the model’s generalization ability, this study conducted a generalization test on 399 new data samples from real-world scenarios. A confusion matrix was used to evaluate the accuracy of the model’s classification and grading. The test results, shown in Fig. 13, plot the true class labels on the horizontal axis and the predicted class labels on the vertical axis. “Background FN” counts missed detections: ground-truth objects that the model incorrectly predicted as background. “Background FP” counts false detections: background regions or non-existent objects that the model incorrectly identified as present. The number in each cell is the normalized proportion of predictions for that class pair.
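The normalization shown in Fig. 13 can be sketched as follows, assuming true classes on the columns and predictions on the rows to match the figure's layout; each column is scaled to sum to one, so a cell gives the fraction of that true class receiving each prediction.

```python
import numpy as np

def normalize_confusion(cm):
    """Column-normalise a confusion matrix (columns = true classes,
    rows = predicted classes) so each column sums to 1."""
    cm = np.asarray(cm, dtype=float)
    col_sums = cm.sum(axis=0, keepdims=True)
    # Guard against empty classes to avoid division by zero
    return cm / np.where(col_sums == 0, 1, col_sums)
```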

From Fig. 13, it is evident that the proposed model still exhibits good detection performance on new data in real-world scenarios. The classification and grading accuracy for the six categories is above 75%, with most samples accurately detected. However, a small number of scrap steel vehicles were not positioned accurately within the designated range, making the carriage segmentation less precise; background scrap steel was consequently also detected, lowering overall detection accuracy. Additionally, scrap steel thicker than 6 mm constitutes the largest proportion of the dataset, while categories such as “airtight” and “ungraded” are under-represented, resulting in less accurate detection for these minority categories. Subsequent optimizations will focus on improving on-site personnel’s adherence to production standards, enhancing the accuracy of the carriage segmentation model, and increasing the dataset size to address these issues.

Apart from background interference and operational errors by on-site personnel, the proposed method effectively and accurately classifies and grades scrap steel, demonstrating good performance and generalization ability.

Fig. 13
figure 13

Model test confusion matrix

5 Conclusion

This paper introduces an efficient machine-vision method for the intelligent grading of scrap steel. The method comprises the Deeplabv3+ carriage segmentation model, which reduces the impact of complex backgrounds in scrap steel images on classification and grading; the CSBFNet model for precise scrap steel classification and grading; and the SAHI image slicing method for efficient detection of small-sized scrap steel in high-resolution images. Experimental results demonstrate that our method achieves an mAP of 90.7% on the scrap steel dataset and also performs excellently in generalization testing on new on-site data, providing accurate classification and grading for all categories. Comprehensive experiments validate the effectiveness of the method. Its performance meets practical production requirements, and it has undergone initial industrial application.

6 Future work

The intelligent classification and grading of scrap steel, characterized by unmanned operation and automation, can effectively address issues such as low accuracy, poor fairness, and high risk associated with manual quality inspection in the recycling process, thereby providing a powerful tool for enterprises in scrap steel procurement. In future research, we plan to: (1) expand the scrap steel dataset to further enhance model accuracy and generalization ability; (2) enlarge the carriage dataset with various carriage types (large, medium, small) to further reduce the influence of background parts; and (3) develop weight prediction algorithms based on the area and density of each detected scrap steel category for accurate weight estimation.

In terms of future applications, the intelligent classification and grading system for scrap steel can be installed in the charging preparation stage of electric furnaces. It can provide real-time classification and grading of different grades of scrap steel for each furnace, combined with subsequently developed temperature and steel output prediction models. This integration enables precise forecasting of furnace temperature and steel output, thereby providing essential foundational data for optimizing production pace, reducing alloy material costs, and ensuring quality control homogeneity.