Introduction

Non-Cartesian acquisitions are widely used in cardiac MRI owing to their efficient coverage of k-space, less visually apparent aliasing artifacts and higher motion robustness [1,2,3,4]. Among these, 3D non-Cartesian trajectories [5, 6] have been applied in multiple applications, including coronary MRI, for their efficient volumetric coverage [7, 8] and potential to resolve respiratory motion [9,10,11,12] by sampling the center of k-space at every readout. In particular, 3D radial or kooshball trajectories are seldom acquired at the Nyquist rate due to lengthy scan times (about 1.5 times the Nyquist-rate Cartesian acquisition time for the same coverage and resolution), and several accelerated imaging strategies have been studied [9, 13,14,15,16] to shorten the scans, though higher acceleration with improved image quality remains desirable.

Physics-guided deep learning (PG-DL) has emerged as a powerful reconstruction method for accelerated MRI with improved image quality compared to conventional methods [17,18,19,20,21,22]. A common strategy for PG-DL relies on algorithm unrolling, where an algorithm for solving a regularized least squares problem is unrolled for a fixed number of iterations [17, 20, 21]. The unrolled network alternates between a neural network-based regularizer and a linear data fidelity (DF) unit, which is often solved via gradient descent or the conjugate gradient (CG) method, and which is itself unrolled. A sufficient number of unrolled iterations, as well as CG steps, is required for satisfactory image quality. Consequently, PG-DL has high GPU memory demands, making its implementation challenging for high-resolution MRI, especially with 3D non-Cartesian trajectories, such as kooshball acquisitions [5, 23], where data fidelity cannot be separated across dimensions.

The difficulties of memory consumption for 3D non-Cartesian MRI are evident in existing studies [24,25,26,27]. In [24], a PG-DL reconstruction of 2D non-Cartesian data was implemented using simulated single-coil acquisitions; in [25], PG-DL of a 3D multi-coil spiral acquisition was performed for low-resolution navigator data. Alternative strategies that alter the model formulation have also been proposed: in [26], a 3D Cartesian volume was broken into smaller 3D slabs along the frequency-encoding dimension to fit into GPU memory, while [27] proposed block-wise DF using limited CG iterations to reduce GPU memory consumption. An additional issue for implementing PG-DL reconstruction for 3D non-Cartesian trajectories is the limited availability of training data. In this case, each imaging volume corresponds to a single subject, and data augmentation strategies such as the one in [26] are not applicable.

In this study, we combined several recent advances in DL and MRI reconstruction to enable PG-DL reconstruction of multi-coil 3D kooshball coronary MRI data. From an implementation perspective, we combined recently developed memory-efficient learning (MEL) techniques [28], the Toeplitz formulation for the operator used in DF units [29, 30], mixed-precision processing [31] and distributed learning [32] to ensure deep networks can be implemented without restrictions on the architectures. Furthermore, to tackle the challenge of limited training data, we proposed a 2.5D strategy that efficiently uses the 3D training data with a regularization process that comprises three 2D convolutional neural networks (CNNs) in orthogonal views. Our results suggest that both the 3D and 2.5D PG-DL implementations outperform conventional linear reconstructions [33, 34], with the 2.5D strategy leading to higher vessel sharpness and improved qualitative image scores.

Materials and methods

Physics-guided deep learning formulation

PG-DL solves the regularized multi-coil reconstruction model:

$$\begin{array}{c}\underset{\mathbf{x}}{{\text{argmin}}}{\left|\left|\mathbf{y}-\mathbf{E}\mathbf{x}\right|\right|}_{2}^{2}+\mathcal{R}\left(\mathbf{x}\right)\end{array}$$
(1)

where \(\mathbf{y}\) is the acquired k-space, \(\mathbf{E}\) is the multi-coil encoding operator, \(\mathbf{x}\) is the underlying image, and \(\mathcal{R}\left(\cdot \right)\) is a regularizer. In PG-DL, the objective function in (1) can be solved with various algorithms [35], including variable splitting with quadratic penalty [17, 19, 21]:

$$\begin{array}{c}{\mathbf{z}}^{\left(i\right)}=\underset{\mathbf{z}}{{\text{argmin}}}\mu {\Vert {\mathbf{x}}^{\left({{i}}\right)}-\mathbf{z}\Vert }_{2}^{2}+\mathcal{R}\left(\mathbf{z}\right)\end{array}$$
(2)
$$\begin{array}{c}{\mathbf{x}}^{\left({{i}}+1\right)}=\underset{\mathbf{x}}{{\text{argmin}}}\mu {\Vert \mathbf{x}-{\mathbf{z}}^{\left(i\right)}\Vert }_{2}^{2}+{\Vert \mathbf{y}-\mathbf{E}\mathbf{x}\Vert }_{2}^{2}={\left({\mathbf{E}}^{H}\mathbf{E}+\mu \mathbf{I}\right)}^{-1}\left({\mathbf{E}}^{H}\mathbf{y}+\mu {\mathbf{z}}^{\left(i\right)}\right)\end{array}$$
(3)

where \(\mu\) is a learnable scalar. The proximal operator for the regularizer in Eq. (2) is solved by a trained CNN-based regularizer, followed by the data fidelity (DF) step in Eq. (3), whose solution can be estimated via the conjugate gradient (CG) method [19, 21]. The reconstruction is given by a pre-determined number of unrolled iterations alternating between Eqs. (2) and (3) (Fig. 1). Since the DF term itself performs multiple CG iterations, the resulting computational graph may lead to large GPU memory demands for backpropagation.
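
For concreteness, the DF update in Eq. (3) can be sketched as a fixed number of CG iterations on the normal equations. The following is a minimal PyTorch sketch, assuming a helper `EHE(x)` that applies \({\mathbf{E}}^{H}\mathbf{E}\); the function and argument names are hypothetical, not the authors' implementation:

```python
import torch

def dc_cg(EHy, z, mu, EHE, n_iter=10):
    """Solve (E^H E + mu*I) x = E^H y + mu*z via conjugate gradient (Eq. 3)."""
    b = EHy + mu * z
    x = torch.zeros_like(b)
    r = b.clone()                      # residual b - A x, with x = 0 initially
    p = r.clone()
    rs = torch.vdot(r.flatten(), r.flatten()).real
    for _ in range(n_iter):
        Ap = EHE(p) + mu * p           # apply the regularized normal operator
        alpha = rs / torch.vdot(p.flatten(), Ap.flatten()).real
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = torch.vdot(r.flatten(), r.flatten()).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Because each unrolled iteration contains such a CG loop, the backpropagation graph grows with both the number of unrolls and the number of CG steps, motivating the memory-efficient techniques below.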

Fig. 1

PG-DL scheme for 3D high-resolution multi-coil non-Cartesian MRI. a PG-DL starts from a linear reconstruction with one CG iteration. MEL is applied to the unrolled network, supporting an unlimited number of unrolled iterations using the GPU memory of a single unrolled step. The CNN regularizer is implemented in half precision, fitting into a single GPU. b In MEL, intermediate data in the forward propagation are stored in host memory, and the GPU resource is released and reused for the next unrolled iteration. During backpropagation, the intermediate results in host memory are fed to the GPU so that the gradients are computed through each unrolled step. c CG iterations in DF are performed on an individual GPU, which distributes coil images to additional GPUs during multi-coil encoding and decoding operations. DF is implemented in single precision to support k-space processing

Memory efficient learning for algorithm unrolling

The alternating feedforward structure of the unrolled network suggests that each unrolled iteration can be performed sequentially. To this end, an MEL scheme for PG-DL was proposed in [28], which reduces the GPU memory consumption of a PG-DL network with \(N\) unrolled iterations from \(\mathcal{O}(N)\) to \(\mathcal{O}(1)\), trading increased computational complexity for lower memory consumption. The scheme in [28] is a special case of the broader technique of memory checkpointing [36,37,38], which builds the computation graph for a single unrolled step on the device (GPU) and saves the intermediate outputs in host (CPU) memory. During forward propagation, the output images of each unrolled step are saved in host memory, to be used for computing the gradient of each unrolled step during backpropagation, where the gradient is also saved step by step in host memory. The resulting PG-DL scheme uses GPU memory corresponding to a single unrolled step while supporting an arbitrary number of unrolling steps (Fig. 1b).
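
The mechanism can be illustrated with a minimal PyTorch sketch of checkpointed unrolling with host-memory offloading, assuming `step` denotes one unrolled iteration (CNN regularizer plus DF unit); all names are hypothetical, and practical details such as pinned-memory transfers and wrapping in a custom `autograd.Function` are omitted:

```python
import torch

def mel_forward(x0, step, n_unrolls):
    """Forward pass: run unrolls without a graph, storing outputs on the host."""
    checkpoints = [x0.detach().cpu()]
    x = x0
    with torch.no_grad():                    # no computation graph across unrolls
        for _ in range(n_unrolls):
            x = step(x)
            checkpoints.append(x.detach().cpu())
    return x, checkpoints

def mel_backward(grad_out, checkpoints, step, device):
    """Backward pass: recompute one unroll at a time on the GPU."""
    for i in reversed(range(len(checkpoints) - 1)):
        x_in = checkpoints[i].to(device).requires_grad_(True)
        with torch.enable_grad():
            x_out = step(x_in)               # rebuild the graph for this step only
        x_out.backward(grad_out)             # accumulates grads into step's params
        grad_out = x_in.grad                 # pass the gradient to the previous step
    return grad_out                          # gradient w.r.t. the network input
```

The trade-off is that each unrolled step is computed twice, once in the graph-free forward pass and once during backpropagation, in exchange for the \(\mathcal{O}(1)\) device memory footprint.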

Fig. 2

2.5D PG-DL for 3D MRI. 2.5D PG-DL consists of three 2D CNNs applied over the coronal, sagittal and axial planes individually. Each 2D CNN treats a 3D volume as a batch of 2D images, which effectively augments the training data by hundreds-fold depending on the image dimensions

Toeplitz method for data fidelity

The DF step (Eq. 3) computes \({\mathbf{E}}^{H}\mathbf{E}\), which consists of the NUFFT and its adjoint operation, in every CG iteration. This incurs large GPU memory consumption during backpropagation, especially for the convolutions in the NUFFT, and often prevents the training of PG-DL for 3D multi-coil non-Cartesian MRI. The Toeplitz structure of \({\mathbf{E}}^{H}\mathbf{E}\) has been noted [29, 30] and used to convert this calculation into a series of fast Fourier transforms and point-wise multiplications with a trajectory-dependent operator on a zero-padded image:

$$\begin{array}{c}{\mathbf{E}}^{H}\mathbf{E}=\sum_{n=1}^{{n}_{c}}{\mathbf{S}}_{n}^{H}{\mathbf{Z}}^{H}{\mathbf{F}}^{H}\mathbf{MFZ}{\mathbf{S}}_{n}\end{array}$$
(4)

where \({\mathbf{S}}_{n}\) denotes the \(n\)th coil sensitivities, \({n}_{c}\) is the number of coils, \(\mathbf{Z}\) is zero-padding to double the FOV along each dimension and \({\mathbf{Z}}^{H}\) is cropping to the original FOV, \(\mathbf{F}\) is the Cartesian FFT, and \(\mathbf{M}\) is a trajectory-dependent operator that can be pre-computed as:

$$\begin{array}{c}\mathbf{M}=\text{diag}\left(\mathbf{F}{\mathcal{F}}^{H}\mathbf{W}\mathcal{F}\boldsymbol{\delta }\right)\end{array}$$
(5)

where \({\mathcal{F}}^{H}\) denotes the adjoint NUFFT that grids non-Cartesian k-space onto a grid of doubled FOV along each dimension, \(\mathbf{W}\) contains the density compensation weights, and \(\boldsymbol{\delta }\) is the delta impulse function. Under this setup, non-Cartesian PG-DL is implemented without the NUFFT or its adjoint, using only Cartesian FFTs and diagonal matrix multiplications, leading to substantial savings in computational time.
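
A minimal PyTorch sketch of Eq. (4) is given below, assuming the kernel \(\mathbf{M}\) of Eq. (5) has been pre-computed on the 2×-oversampled grid with a padding convention matching the zero-padding used here; all names are illustrative, not the authors' code:

```python
import torch
import torch.fft as fft

def toeplitz_EHE(x, smaps, M):
    """Apply Eq. (4): x (nx,ny,nz) image, smaps (nc,nx,ny,nz), M (2nx,2ny,2nz)."""
    nx, ny, nz = x.shape
    out = torch.zeros_like(x)
    for s in smaps:                                  # coil-wise sum in Eq. (4)
        xs = s * x                                   # S_n x
        pad = torch.zeros(2 * nx, 2 * ny, 2 * nz,
                          dtype=x.dtype, device=x.device)
        pad[:nx, :ny, :nz] = xs                      # Z: zero-pad to double FOV
        k = fft.fftn(pad)                            # F
        img = fft.ifftn(M * k)                       # F^H M F
        out = out + s.conj() * img[:nx, :ny, :nz]    # S_n^H Z^H
    return out
```

During backpropagation, the gradient of this operation again consists only of FFTs and point-wise multiplications, which is the source of the memory and speed advantage over differentiating through the NUFFT interpolation.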

Mixed precision

Half-precision processing uses 16-bit floating point data (float16), which naturally occupies less memory than conventional single precision (float32). Although float16 has a limited representable range, its use in computer vision is well-established, with virtually no impact on network accuracy [31]. However, unlike the images used in computer vision, k-space data exhibit large intensity variations, hindering the use of half precision. Thus, in this study, we apply half precision to the CNN regularizer only, while the DF unit employs conventional single precision (float32) for a sufficient representable range. As a result, the CNN regularizer can be deep while requiring only a single GPU, and the DF unit does not incur any risk of over/underflow.
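
A minimal sketch of this split, assuming the CNN has been cast to float16 beforehand (e.g., via `cnn.half()`), complex images are carried as real/imaginary channels for the CNN input, and `data_fidelity` stands for the CG-based update of Eq. (3); names are hypothetical:

```python
import torch
import torch.nn as nn

def unrolled_step(x, EHy, cnn: nn.Module, data_fidelity, mu):
    """One unrolled iteration: float16 CNN regularizer, float32 DF (Eqs. 2-3)."""
    z = cnn(x.to(torch.float16)).to(torch.float32)  # Eq. (2), half precision
    return data_fidelity(EHy, z, mu)                # Eq. (3), single precision
```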

Distributed learning

The zero-padding in the Toeplitz method increases the image matrix size by \(2^3 = 8\) times for 3D data. Additionally, as noted above, k-space operations are not well-suited for half-precision processing. Thus, considerable GPU memory is required to support multi-coil DF units. Noting the separability of Eq. (4) across coils, we distribute the DF operation across multiple devices. Based on the GPU configuration, we split the \({n}_{c}\) coils into three subsets evaluated on three GPUs respectively, and combine the results on a primary device (Fig. 1c).
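
A minimal sketch of this coil-distributed evaluation, reusing the hypothetical `toeplitz_EHE` sketch above and assuming three available GPUs; a practical implementation would overlap transfers and computation rather than looping sequentially:

```python
import torch

def distributed_EHE(x, smaps, M, devices=("cuda:1", "cuda:2", "cuda:3")):
    """Split the coil sum in Eq. (4) across GPUs; combine on the primary device."""
    coil_subsets = torch.chunk(smaps, len(devices), dim=0)
    partials = []
    for dev, s in zip(devices, coil_subsets):
        # each device evaluates Eq. (4) over its own subset of coils
        partials.append(toeplitz_EHE(x.to(dev), s.to(dev), M.to(dev)))
    return sum(p.to(x.device) for p in partials)  # gather on the primary GPU
```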

2.5D PG-DL for 3D reconstruction

To train the PG-DL network with limited training data, we propose a 2.5D PG-DL formulation using 3D DF and three 2D CNN regularizers \({\mathcal{R}}_{c}(\cdot )\), \({\mathcal{R}}_{s}(\cdot )\), \({\mathcal{R}}_{a}(\cdot )\) over coronal, sagittal, and axial planes, respectively:

$$\begin{array}{c}\underset{\mathbf{x}}{{\text{min}}}{\Vert \mathbf{E}\mathbf{x}-\mathbf{y}\Vert }_{2}^{2}+{\mathcal{R}}_{c}\left({\mathbf{u}}_{c}\right)+{\mathcal{R}}_{s}\left({\mathbf{u}}_{s}\right)+{\mathcal{R}}_{a}\left({\mathbf{u}}_{a}\right)\end{array}$$
(6)
$$\text{s.t. }\;\mathbf{x}={\mathbf{u}}_{c},\;\mathbf{x}={\mathbf{u}}_{s},\;\mathbf{x}={\mathbf{u}}_{a}$$

Using the same variable splitting with quadratic penalty approach, we have:

$$\underset{\mathbf{x},{\mathbf{u}}_{c},{\mathbf{u}}_{s},{\mathbf{u}}_{a}}{{\text{min}}}{\Vert \mathbf{E}\mathbf{x}-\mathbf{y}\Vert }_{2}^{2}+{\mathcal{R}}_{c}\left({\mathbf{u}}_{c}\right)+{\mathcal{R}}_{s}\left({\mathbf{u}}_{s}\right)+{\mathcal{R}}_{a}\left({\mathbf{u}}_{a}\right)+{\mu }_{c}{\Vert \mathbf{x}-{\mathbf{u}}_{c}\Vert }_{2}^{2}+{\mu }_{s}{\Vert \mathbf{x}-{\mathbf{u}}_{s}\Vert }_{2}^{2}+{\mu }_{a}{\Vert \mathbf{x}-{\mathbf{u}}_{a}\Vert }_{2}^{2}$$
(7)

which is solved via:

$$\begin{array}{c}{\mathbf{u}}_{c}^{(i)}=\underset{\mathbf{u}}{{\text{argmin}}}\,{\mu }_{c}{\Vert {\mathbf{x}}^{(i-1)}-\mathbf{u}\Vert }_{2}^{2}+{\mathcal{R}}_{c}\left(\mathbf{u}\right)\end{array}$$
(8)
$$\begin{array}{c}{\mathbf{u}}_{s}^{(i)}=\underset{\mathbf{u}}{{\text{argmin}}}\,{\mu }_{s}{\Vert {\mathbf{x}}^{(i-1)}-\mathbf{u}\Vert }_{2}^{2}+{\mathcal{R}}_{s}\left(\mathbf{u}\right)\end{array}$$
(9)
$$\begin{array}{c}{\mathbf{u}}_{a}^{(i)}=\underset{\mathbf{u}}{{\text{argmin}}}\,{\mu }_{a}{\Vert {\mathbf{x}}^{(i-1)}-\mathbf{u}\Vert }_{2}^{2}+{\mathcal{R}}_{a}\left(\mathbf{u}\right)\end{array}$$
(10)
$$\begin{array}{c}{\mathbf{x}}^{(i)}=\underset{\mathbf{x}}{{\text{argmin}}}{\Vert \mathbf{E}\mathbf{x}-\mathbf{y}\Vert }_{2}^{2}+{\mu }_{c}{\Vert \mathbf{x}-{\mathbf{u}}_{c}^{(i)}\Vert }_{2}^{2}+{\mu }_{s}{\Vert \mathbf{x}-{\mathbf{u}}_{s}^{(i)}\Vert }_{2}^{2}+{\mu }_{a}{\Vert \mathbf{x}-{\mathbf{u}}_{a}^{(i)}\Vert }_{2}^{2}={\left({\mathbf{E}}^{H}\mathbf{E}+\left({\mu }_{c}+{\mu }_{s}+{\mu }_{a}\right)\mathbf{I}\right)}^{-1}\left({\mathbf{E}}^{H}\mathbf{y}+{\mu }_{c}{\mathbf{u}}_{c}^{\left(i\right)}+{\mu }_{s}{\mathbf{u}}_{s}^{\left(i\right)}+{\mu }_{a}{\mathbf{u}}_{a}^{(i)}\right),\end{array}$$
(11)

where the proximal operators in (8)–(10) are solved by three distinct 2D CNNs, and the DF term in (11) is solved via CG as before. The 2D CNNs take a 3D volume as input while treating the direction orthogonal to the plane of interest as the batch dimension; thus, each 3D volume provides hundreds of 2D images for training (Fig. 2).
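
A minimal sketch of this batching strategy, assuming a single-channel volume for simplicity (complex images would be handled as two channels); each 2D CNN processes all slices along its orthogonal direction as one batch, so no slice-wise looping is needed:

```python
import torch
import torch.nn as nn

def prox_2d(vol, cnn: nn.Module, dim: int):
    """Apply a 2D CNN to a 3D volume, batching over the slice axis `dim`."""
    slices = vol.movedim(dim, 0).unsqueeze(1)   # (n_slices, 1, H, W)
    out = cnn(slices)                           # one 2D CNN pass over all slices
    return out.squeeze(1).movedim(0, dim)       # restore the 3D volume layout

def regularize_25d(x, cnn_c, cnn_s, cnn_a):
    """Three orthogonal proximal updates of Eqs. (8)-(10)."""
    u_c = prox_2d(x, cnn_c, dim=0)              # coronal
    u_s = prox_2d(x, cnn_s, dim=1)              # sagittal
    u_a = prox_2d(x, cnn_a, dim=2)              # axial
    return u_c, u_s, u_a
```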

Coronary MRI datasets

3D kooshball coronary MRI was acquired on a 1.5 T scanner (Magnetom Aera, Siemens Healthcare, Erlangen, Germany) in 18 subjects, using an ECG-triggered, T2-prepared, fat-saturated, navigator-gated bSSFP sequence, with resolution = (1.15 mm)³, matrix size = 192³ (grid size = 384³), FOV = (220 mm)³ and twofold readout oversampling. A total of 12,320 radial projections (sub-Nyquist rate of 5) were acquired in 385 heartbeats using the spiral phyllotaxis pattern [23] with one interleaf of 32 readouts per heartbeat, resulting in a scan efficiency of 41.16 ± 11.75%. The data were further retrospectively sixfold undersampled prior to any processing.

Implementation details

As detailed earlier, the CNN-based regularizer is implemented in half precision to fit a ResNet [21] of sufficient depth on a single GPU. The proposed 3D PG-DL uses a ResNet regularizer comprising 10 residual blocks, where each residual block has 2 convolutional layers using 3 × 3 × 3 convolutions without bias, followed by ReLU activation. This ResNet has 1,444,611 learnable parameters. 2.5D PG-DL employs three 2D ResNets of the same architecture as the 3D counterpart, only replacing the 3D convolutions with 3 × 3 2D convolutions. In this case, the three 2D ResNets have the same total number of learnable parameters as a 3D ResNet using 3 × 3 × 3 convolutions. Both 3D and 2.5D PG-DL use 10 unrolled iterations and 10 CG iterations in the DF. The DF is implemented in single precision on an individual GPU, which further distributes coil images to 2 additional GPUs to perform the \({\mathbf{E}}^{H}\mathbf{E}\) operations. All experiments were implemented using PyTorch 1.14 and executed on a server with 4 NVIDIA A100 GPUs (40 GB). PG-DL networks were trained using the ADAM optimizer with a learning rate of 0.0005 for 100 epochs. A mixed \({\ell}_{1}\)–\({\ell}_{2}\) loss was employed [21] for supervised training using 5 of the 18 subjects as training data, with the remaining 13 subjects used for testing. For each subject, the full 3D volume was used for 3D PG-DL, while 2.5D PG-DL took 192 2D slices chosen equally from the 3 orthogonal views. Peak memory consumption during training was 107.1 GB and 74.0 GB for the 3D and 2.5D set-ups, respectively, and 79.4 GB and 46.1 GB, respectively, during testing. For comparison, the datasets were also reconstructed using conventional gridding and Tikhonov-regularized CG-SENSE reconstruction.
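
A minimal sketch of the described regularizer architecture is given below; the channel width (64 here) is an assumption and is not matched to the reported parameter count of 1,444,611, and the ReLU placement within each block follows common ResNet practice rather than a confirmed detail:

```python
import torch.nn as nn

class ResBlock3D(nn.Module):
    """Residual block: two bias-free 3x3x3 convolutions with ReLU in between."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.body(x)

class ResNetRegularizer(nn.Module):
    """10 residual blocks between input/output convolutions; the 2.5D variant
    replaces Conv3d with Conv2d using 3x3 kernels."""
    def __init__(self, in_ch=2, ch=64, n_blocks=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, ch, 3, padding=1, bias=False),
            *[ResBlock3D(ch) for _ in range(n_blocks)],
            nn.Conv3d(ch, in_ch, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return self.net(x)
```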

Image analysis

Quantitative assessments using SSIM and PSNR were performed with respect to the fully-sampled reference image. Vessel sharpness scores and vessel lengths were obtained using the SoapBubble software [39]. Since CG-SENSE and PG-DL output complex coil-combined images as their final results, we adopted complex coil combination [40, 41] instead of root-sum-of-squares combination [23, 42] to generate the reference images, for a fair comparison between the reference and the other reconstructions. Statistical differences in SSIM, PSNR and vessel sharpness scores were assessed using a paired signed-rank test, with a P value < 0.05 considered significant.

Furthermore, a qualitative image quality assessment was performed by an experienced cardiologist (15 years of experience). The reader was blinded to the reconstruction methods, and the presentation order was randomized. Four test subjects were evaluated on a 4-point ordinal scale for overall image quality (1: excellent; 2: good; 3: fair; 4: poor), perceived SNR (1: high SNR; 2: minor noise with moderate SNR; 3: major noise but not limiting clinical diagnosis; 4: poor SNR and non-diagnostic), aliasing artifacts (1: no visually apparent aliasing; 2: mild; 3: moderate; 4: severe) and blurring (1: none; 2: mild; 3: moderate; 4: severe).

Results

Figure 3 depicts representative reconstruction results of retrospectively sixfold undersampled data, using gridding, Tikhonov-regularized CG-SENSE, 3D PG-DL and 2.5D PG-DL. Due to the relatively high undersampling rate, the gridding reconstruction exhibited blurring and undersampling artifacts. Tikhonov-regularized CG-SENSE showed less blurring than gridding but suffered from noise amplification due to the high acceleration rate. Both 3D and 2.5D PG-DL substantially reduced the noise compared to Tikhonov-regularized CG-SENSE, while 2.5D PG-DL offered visually improved image sharpness compared to its 3D counterpart.

Fig. 3

Representative reconstruction results with retrospectively undersampled data at rate 6, using gridding, Tikhonov-regularized CG-SENSE, 3D PG-DL and 2.5D PG-DL. Gridding exhibits blurring and undersampling artifacts. Tikhonov-regularized CG-SENSE displays noise amplification and blurring. Both 3D and 2.5D PG-DL offer reduced noise and artifacts, while 2.5D PG-DL outperforms 3D PG-DL in terms of sharpness

Figure 4 depicts the quantitative metrics of the tested reconstruction methods. Boxplots of SSIM (Fig. 4a) and PSNR (Fig. 4b) show that gridding led to the lowest quantitative metrics due to the relatively high undersampling rate. Tikhonov-regularized CG-SENSE improved SSIM and PSNR compared to gridding, while both 3D and 2.5D PG-DL achieved significantly better scores (P < 0.05). No statistical difference was observed between 3D and 2.5D PG-DL. Similar conclusions apply to the vessel sharpness measurements (Fig. 4c), with gridding having the lowest sharpness in all subjects, CG-SENSE outperforming gridding with limited improvement, and 2.5D and 3D PG-DL substantially improving on these methods. In particular, 2.5D PG-DL had higher vessel sharpness than 3D PG-DL in all subjects, performing close to the fully-sampled reference. As with PSNR and SSIM, 2.5D and 3D PG-DL significantly improved upon the classical methods, though the difference between them was not significant. Measured vessel lengths (Fig. 4d) suggest that gridding suffered from significant sub-sampling artifacts and failed to provide reasonable vessel lengths; both 3D and 2.5D PG-DL gave more accurate vessel length measurements than Tikhonov-regularized CG-SENSE, while 2.5D PG-DL showed an advantage over its 3D counterpart.

Fig. 4

Quantitative metrics of the tested methods. a SSIM computed on 2D slices in the 3 orthogonal views of the 13 testing subjects. Medians are 0.7402, 0.7770, 0.8412 and 0.8441 for gridding, Tikhonov-regularized CG-SENSE, 3D PG-DL and 2.5D PG-DL, respectively. b PSNR of the tested methods, computed from 2D slices in the three orthogonal views of all testing data, with medians 22.5034, 30.4560, 32.1610 and 32.2839 for gridding, Tikhonov-regularized CG-SENSE, 3D PG-DL and 2.5D PG-DL, respectively. c Vessel sharpness scores (%) of the reference fully sampled image and the tested methods, with medians 76.1980, 38.7044, 63.3126, 74.3423 and 76.7669 for the reference, gridding, CG-SENSE, 3D and 2.5D PG-DL, respectively. 2.5D PG-DL and 3D PG-DL had similar SSIM and PSNR, while 2.5D PG-DL gave higher sharpness scores than 3D PG-DL in all tested subjects. d Measured vessel lengths of the fully sampled image and the tested methods for all tested subjects, with average vessel lengths of 4.8187 ± 1.1595 cm, 4.562 ± 0.6267 cm, 4.6477 ± 0.8476 cm, 4.7911 ± 1.1259 cm and 4.8039 ± 1.1465 cm for the reference, gridding, Tikhonov-regularized CG-SENSE, 3D PG-DL and 2.5D PG-DL, respectively. Both PG-DL methods gave more accurate vessel length measurements with respect to the fully sampled image, with 2.5D PG-DL showing lower error than its 3D counterpart

Figure 5 depicts the reader study results for all methods. 2.5D PG-DL gave excellent overall image quality. Both 3D and 2.5D PG-DL showed high SNR, no visually apparent aliasing and mild blurring. CG-SENSE showed moderate to severe blurring, while gridding suffered from low SNR and severe blurring.

Fig. 5

Clinical reader study results of the tested methods. A 4-point ordinal scale (1: excellent, 2: good, 3: fair, 4: poor) was employed for overall image quality, perceived SNR, aliasing artifacts and blurring. 2.5D PG-DL gave an overall image quality similar to the reference image. Both 2.5D and 3D PG-DL showed high SNR, no visually apparent aliasing and mild blurring, outperforming gridding and CG-SENSE

Discussion

In this study, we investigated PG-DL of high-resolution, multi-coil non-Cartesian coronary MRI from a practical perspective. In addition to several memory-efficient techniques to tackle hardware limitations, a 2.5D processing strategy was proposed to improve PG-DL using limited training data. Our results show that PG-DL significantly improved image quality compared to conventional methods, while 2.5D PG-DL further led to sharper and higher-rated image quality compared to 3D PG-DL in this limited training data setup.

The combination of MEL, the Toeplitz method and distributed learning is necessary during training to ensure a sufficient number of unrolled iterations, as well as CG iterations, for reconstruction. Without MEL, only 1 unrolled iteration would be supported on the 4-GPU system with a total memory of 160 GB. In a preliminary study [43], we showed that half precision was needed to accommodate a ResNet of sufficient depth, and distributed learning was necessary to facilitate the use of all 32 coils in the DF without coil compression.

The Toeplitz method was first proposed to accelerate non-Cartesian reconstruction [29, 30] by trading memory (8-fold for the 3D case) for processing speed. Note that it has the same memory consumption as a NUFFT with 2 × oversampling. Moreover, the Toeplitz method benefits PG-DL training by simplifying backpropagation, using point-wise multiplications instead of convolutions. As a result, PG-DL using the Toeplitz method required less memory during training than using a NUFFT with 1.25 × oversampling. Even with MEL, we failed to execute training using a NUFFT [44] for DF instead of the Toeplitz method due to higher memory demands.

Existing studies on 2.5D networks focus on their comparable performance, faster processing speed and lower memory consumption relative to 3D networks [45,46,47,48]. These studies, mostly confined to computer vision applications, investigate potentially better performance than 3D networks under memory constraints. In our work, the proposed 2.5D PG-DL is formulated via a 3D DF and 2D regularization terms, where view-aggregation is achieved via quadratic relaxation with learnable regularization strengths. As reported in [48], 2.5D networks with view-aggregation outperform 3D networks under constrained resources, while 3D networks are believed to eventually outperform 2.5D given unlimited resources, owing to their ability to better model 3D textures [48]. Due to the lack of sufficient training data (5 volumes in total), it is unclear whether 3D PG-DL can eventually outperform 2.5D PG-DL given sufficient training data in the context of MRI reconstruction. In this study, the 2.5D processing allowed improved PG-DL networks to be trained from a large number of 2D slices instead of a limited number of 3D volumes. As limited training data is an additional challenge for 3D non-Cartesian PG-DL, the 2.5D formulation provides a feasible strategy to exploit the training data efficiently, without introducing artificial augmentation strategies, which may be insufficient with so few training subjects [49]. Finally, we note that while the improvements of 2.5D PG-DL over 3D PG-DL were not statistically significant, this may be due to the limited size of the test database.

In this study, we used normalized vessel sharpness, as determined by the SoapBubble software [39], to evaluate the sharpness of the reconstructions, as is commonly done in coronary MRI studies [13, 50, 51]. However, we note that other edge sharpness measurements have been proposed for cardiac MRI that may also be useful in other applications [52].

This study focused on enabling PG-DL reconstruction under two major restrictions, GPU memory and limited training data, and on building a first-of-its-kind baseline for PG-DL reconstruction of large 3D non-Cartesian MRI data. To this end, a combination of techniques was used to ensure the results were compatible with commonly used techniques in the field. Despite the proposed combination of memory-efficient techniques, several recent advances are worth considering in future work. For model unrolling, deep equilibrium models [53, 54] are an alternative that offers a memory-efficient strategy supporting arbitrary numbers of unrolled iterations run until convergence. However, many existing studies in the field use a fixed number of unrolled steps, so we adapted MEL to keep the framework compatible with commonly used reconstruction techniques. Although the Toeplitz method costs less memory than NUFFTs, it significantly increases the matrix size during DF. It is worth investigating memory-efficient approximations to replace gridding and its adjoint operation, such as [27, 55, 56]. However, all of these compression-type techniques cause error accumulation through the unrolls; thus, in this study we adopted the Toeplitz method without coil compression as a baseline. To handle insufficient training data, transfer learning approaches have also been suggested [57, 58]. However, the performance of transfer learning depends largely on the prior task from which the transfer is made; thus, it was not considered in this work but may be worth investigating in future studies. In conclusion, PG-DL of 3D high-resolution multi-coil non-Cartesian coronary MRI was achieved in this study, without compromising multi-coil data size or network depth. We further formulated a 2.5D PG-DL for 3D reconstruction to efficiently utilize limited training data, and obtained visually improved image quality compared to conventional 3D PG-DL.