Super resolution for root imaging

Premise: High-resolution cameras are very helpful for plant phenotyping, as their images enable tasks such as target vs. background discrimination and the measurement and analysis of fine above-ground plant attributes. However, the acquisition of high-resolution images of plant roots is more challenging than above-ground data collection. An effective super-resolution (SR) algorithm is therefore needed for overcoming the resolution limitations of sensors, reducing storage space requirements, and boosting the performance of subsequent analyses.

Methods: We propose an SR framework for enhancing images of plant roots using convolutional neural networks. We compare three alternatives for training the SR model: (i) training with non-plant-root images, (ii) training with plant-root images, and (iii) pretraining the model with non-plant-root images and fine-tuning with plant-root images. The architectures of the SR models were based on two state-of-the-art deep learning approaches: a fast SR convolutional neural network and an SR generative adversarial network.

Results: In our experiments, we observed that the SR models improved the quality of low-resolution images of plant roots in an unseen data set in terms of the signal-to-noise ratio. We used a collection of publicly available data sets to demonstrate that the SR models outperform basic bicubic interpolation, even when trained with non-root data sets.

Discussion: The incorporation of a deep learning–based SR model in the imaging process enhances the quality of low-resolution images of plant roots. We demonstrate that SR preprocessing boosts the performance of a machine learning system trained to separate plant roots from their background. Our segmentation experiments also show that high performance on this task can be achieved independently of the signal-to-noise ratio. We therefore conclude that the quality of the image enhancement depends on the desired application.

Key words: convolutional neural networks; generative adversarial networks; plant phenotyping; root phenotyping; super-resolution.

INTRODUCTION
In the last decade, advances in sensing devices and computer systems have allowed for the proliferation of high-throughput plant phenotyping systems (Das Choudhury et al. 2019).
These systems are designed to acquire and analyze a large number of plant traits (Han et al. 2014; Krieger 2014), including the measurement of small structures such as the venation network of leaves (Green et al. 2014; Endler 1998). However, the characterization of plant roots is more challenging since they are "hidden" in the soil (Atkinson et al. 2019), which limits the type of sensors and techniques that can be applied.
We categorize the methods that have been used for root analysis as follows:

i) Non-imaging-based in situ methods: these methods estimate traits of the root system architecture (RSA) through their correlations with chemical or physical properties. For example, in (Cseresnyés et al. 2018; Dalton 1995), the plant root electrical capacitance is used to estimate the root mass (the RSA is modeled as a resistance-capacitance circuit); likewise, in (Cao et al. 2011), the electrical impedance spectroscopy (EIS) approach is employed to model the RSA based on the frequency response. The disadvantage of these methods is that they do not provide morphological details, since they give only a simplified description of the RSA.

ii) Destructive methods: in this category, we include the techniques that destroy the RSA during or after the imaging process. The most basic of this type is "shovelomics", which consists of washing the roots out of the soil (Trachsel et al. 2011). Shovelomics can be applied in any type of soil, in contrast with other root phenotyping techniques that have limitations regarding the physical properties of the environment. However, it is not ideal for high throughput because the manual excavation of the roots is labor-intensive and tedious. Also, most of the thin roots are lost in this process.

iii) Imaging under controlled conditions: roots can be observed using rhizotrons, which are structures with windows that contain the soil where the plants are grown (Taylor et al. 1990). Also, 3-D imaging of the RSA can be carried out by using special substrates, e.g., transparent substrates or easy-to-remove types of soil (Clark et al. 2011). These procedures allow the acquisition of high-quality images, but their main disadvantage is that the image acquisition is not made in situ. Therefore, the knowledge that can be inferred from them is limited.

iv) Intrusive methods: this category encompasses the techniques in which the acquisition device is introduced into the ground. We consider as intrusive methods the minirhizotrons, which use a camera fixed into the soil through a tube to record sequences of pictures of parts of the RSA (Johnson et al. 2001), as well as soil coring (Wu et al. 2018). Although these methods do not necessarily result in the destruction of the RSA, they disturb the roots and soil, which might affect the natural root-soil interactions (Kolb et al. 2017). The disturbance can be worse when the devices are introduced and extracted frequently or when they are installed in stony soils (Majdi 1996).

v) Non-intrusive methods: these techniques aim to image the RSA in situ, without disturbing the roots or the soil. In (Barton and Montagu 2004), ground-penetrating radar (GPR) technology was tested for this purpose; it was possible to detect tree roots of 1 cm diameter buried in soil at 50 cm depth. However, GPR is currently limited to the detection of roots of trees or woody plants (Araus and Cairns 2014; Hirano et al. 2009). X-ray computerized tomography (CT) (Tabb et al. 2018) and magnetic resonance imaging (MRI) (Pflugfelder et al. 2017), which consist of scanning by devices traditionally used for medical applications, can be grouped into this category when the complete plant can be scanned in the device (e.g., plants grown in pots). On the other hand, X-ray CT and MRI based analyses are intrusive when scanning extracted and washed root systems or soil cores removed from the field. In addition to these available approaches, there is ongoing development of additional non-intrusive root imaging approaches, including backscatter radiography (Cui et al. 2017).
The root system is responsible for water and nutrient absorption, and it is the first barrier to the changing environment. It affects many processes, such as plant growth, CO2 assimilation, and fruit development (Chen et al. 2019; Akinnifesi et al. 1998). Thus, the development of high-throughput root phenotyping methods with low labor inputs is crucial to plant sciences. As mentioned above, the acquisition of high-resolution (HR) imagery of roots in the field by non-intrusive methods remains a challenge. An effective super-resolution (SR) algorithm that complements the image formation process, by inferring HR details not clearly delineated by the sensing device, is desired for the deployment of these systems in real-world applications.
The SR problem consists of estimating HR images from low-resolution (LR) images. SR has been used to overcome hardware limitations in applications that rely heavily on high-quality images, such as medical diagnosis (Zhang et al. 2012; Zhang and An 2017). Many SR methods in the literature use mathematical transformations of the original data to learn the LR-to-HR mapping (Yang et al. 2010; Zeyde et al. 2012). For instance, methods based on sparse representations reconstruct each image by a weighted combination of words from a set of basic patterns, called a dictionary. A set of LR and HR words is learned from training data, and an SR image is obtained by replacing the LR dictionary words with HR dictionary words. Recently, data-driven SR models based on deep learning algorithms with convolutional neural networks (CNNs) have become more popular than the sparse-representation-based models. SR deep learning algorithms are preferred in many cases because they generally exhibit better performance; additionally, they can be applied as a "black box" when enough training data are available (Wang et al. 2015; Ledig et al. 2017). In particular, super-resolution generative adversarial networks (SRGANs) have shown high performance in the estimation of HR details lost in a degradation process (Ledig et al. 2017). To the best of our knowledge, SR deep learning models for root imagery have not been extensively studied. Additionally, an effective SR performance measure in this context is unclear, since previous studies have observed that reconstruction accuracy (pixel-by-pixel comparison of an HR-SR pair) and perceptual quality (comparison of visual features of an HR-SR pair) are not directly correlated (Blau et al. 2019).
To enhance plant root imagery, we adapt two state-of-the-art deep learning approaches: the Fast Super-Resolution Convolutional Neural Network (FSRCNN) proposed in (Dong et al. 2016), and the Super-Resolution Generative Adversarial Network (SRGAN). We train the SR models with LR-HR data from two non-root datasets (DIV2K and 91-image) and three plant root datasets (Arabidopsis, Wheat, and Barley). These datasets were selected because they have considerable differences in textures and shapes, which encourages the model to find a general solution. Also, in order to facilitate the training of the generator (the part of the SRGAN that converts LR into HR images), we introduce a modification of the SRGAN by implementing multiple discriminators (the part of the SRGAN that evaluates the quality of the SR images). In the loss function, we consider the mean square error between the HR and SR images (which reduces the reconstruction error, since it is low when the pixel values are similar) and the adversarial loss (which encourages the network to learn to add HR details to the LR image). To evaluate the SR performance, we use two methods: i) computing the standard signal-to-noise ratio (SNR) between the SR image and the original HR image; and ii) computing the intersection over union (IoU) when applying the SegRoot network (Wang et al. 2019).
The remainder of this paper is organized as follows. In Methods, we describe the models used for training and testing the SR algorithms. In Results, we report the performance of the SR models. In Conclusion, we explain the relevant findings of our study and provide recommendations for the implementation of SR algorithms for root imaging.

METHODS
In this section, we explain the SR method and the settings that we use to train and test the SR models.

Super Resolution Model
Many CNN architectures that map LR images into SR images can be found in the machine learning literature. In this effort, we use two state-of-the-art CNN-based models, FSRCNN and SRGAN. FSRCNN is a model that exhibits performance similar to other state-of-the-art SR techniques, but its execution is considerably faster; this characteristic makes it convenient for comparing different training datasets. Appendix 1 contains a description of the parts of this network. SRGAN is a machine learning system formed by two blocks, a discriminator D and a generator G. The function of D is to tell apart SR images and real HR images. On the other hand, G aims to generate SR images capable of fooling D. In Appendix 2, we describe the SRGAN model in detail.
For evaluation purposes, we apply an automatic segmentation on the SR images and quantitatively evaluate the performance of the segmentation. Several U-net encoder-decoder architectures have been proposed for automatic detection and segmentation of plant roots (Xu et al. 2020). In this work, we rely on the SegRoot model. Figure 1 shows the stages of the SR framework applied to enhance plant roots. We quantitatively evaluate the SR performance by two measures: SNR and IoU. SNR is a classic measure for estimating the quality of a recovered signal. It is computed by a pixel-by-pixel comparison between the original HR image I_HR and the estimated SR image I_SR, as follows:

SNR = 10 log10( Σ I_HR² / Σ (I_HR − I_SR)² ).

However, the SNR might not necessarily highlight HR detail enhancement. For example, in Fig. 2, the SNR (the higher the better) of the image estimated by bicubic interpolation (1.83) is higher than the SNR of the SR image (1.62), even though the interpolated image looks blurred.
For this reason, we also estimate the effect of applying the SR enhancement as a preprocessing step in an automatic root-to-background segmentation process. To this end, we trained the state-of-the-art SegRoot network (Wang et al. 2019) with HR data. Therefore, we assume that the segmentation is more accurate if the input data contain HR details like the ones used for training. We compare the binary segmented images Bseg ('1'-pixels indicate root, and '0'-pixels indicate background) with manually labeled images Bgt by the IoU (Rahman and Wang 2016), also known as the Jaccard index, computed by

IoU = |Bseg ∩ Bgt| / |Bseg ∪ Bgt|.
The IoU values are between 0 and 1 (the higher the better); an IoU of 1 means that all the target pixels are correctly classified and there are no false positives. For all the SR training experiments, we use as the validation dataset a subset of 100 images from the Roots dataset. The validation dataset is used to estimate the performance of the model in terms of the SNR after each iteration. After finishing the training process, we take as the parameters of the model the ones that yield the highest SNR on the validation set.
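The two measures above can be sketched in a few lines of NumPy. This is a minimal implementation of the SNR and IoU definitions given in the text, not the evaluation code used in the study.

```python
import numpy as np

def snr(hr, sr):
    """Signal-to-noise ratio (dB) between a reference HR image and an SR
    estimate: signal power over reconstruction-error power, pixel by pixel."""
    hr = hr.astype(np.float64)
    sr = sr.astype(np.float64)
    noise = np.sum((hr - sr) ** 2)
    if noise == 0:
        return np.inf  # identical images: no reconstruction error
    return 10.0 * np.log10(np.sum(hr ** 2) / noise)

def iou(b_seg, b_gt):
    """Intersection over union (Jaccard index) of two binary masks."""
    b_seg = b_seg.astype(bool)
    b_gt = b_gt.astype(bool)
    union = np.logical_or(b_seg, b_gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(b_seg, b_gt).sum() / union)
```

Both functions take arrays of matching shape; `iou` expects binary masks such as those produced by SegRoot and the manual labels.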
Each model is trained for 100 iterations (the loss function converges within this number of iterations).
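The selection rule described above (keep the parameter snapshot with the highest validation SNR seen during training) can be sketched as follows. Here `train_one_iteration` and `validation_snr` are hypothetical stand-ins for the actual training step and validation routine, and the model is represented as a plain parameter dictionary.

```python
def select_best_params(params, train_one_iteration, validation_snr, n_iterations=100):
    """Run n_iterations of training and keep the parameter snapshot that
    achieves the highest SNR on the validation set.
    Sketch: train_one_iteration mutates `params` in place;
    validation_snr scores the current parameters."""
    best_snr = float("-inf")
    best_params = dict(params)
    for _ in range(n_iterations):
        train_one_iteration(params)
        score = validation_snr(params)
        if score > best_snr:
            best_snr = score
            best_params = dict(params)  # snapshot of the best model so far
    return best_params, best_snr
```

The returned snapshot, not the final-iteration parameters, is what the text uses as the trained model.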
To evaluate the performance of the SR models, we downscale the images of the Soybean dataset by a factor of four. We use one of the SR models listed above to upscale the test images to their original resolution. We estimate the SNR by comparing the estimated SR images with the original HR images. Afterward, we use the SegRoot network to classify each pixel in the input image as root or non-root. As lower and upper bounds, we take the images upscaled by bicubic interpolation and the original HR images, respectively. Table 1 contains the SNR and IoU obtained on the grayscale Soybean dataset. Segmentation carried out on HR images always exhibits the best performance. We infer that the HR details in the images boost the performance of the SegRoot model on these data. Also, all the SR models outperform bicubic interpolation in terms of both SNR and IoU. Among the SR models, three of them, FSRCNN-91-image, FSRCNN-roots, and SRGAN-MULDIS, exhibit the highest SNR (we consider that there is no statistical evidence to prefer one of them, since their standard errors overlap). However, there is a mismatch between the SNR and IoU results. The model that performs best in terms of the IoU is FSRCNN-91-image&roots. Therefore, the features enhanced by the SR models that increase the SNR are not necessarily useful for every task, such as the automatic segmentation applied here. Figure 4 contains examples of SR and segmented images.
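The evaluation protocol above (downscale by four, upscale with the model under test, then score with SNR and IoU) can be sketched as follows. `upscale`, `segment`, `snr_fn`, and `iou_fn` are hypothetical stand-ins for the SR model, the SegRoot network, and the two measures; block-mean downscaling stands in for the actual downscaling step.

```python
import numpy as np

def evaluate_sr(hr_images, gt_masks, upscale, segment, snr_fn, iou_fn, factor=4):
    """Score an SR model on a test set: downscale each HR image by `factor`,
    restore it with `upscale`, and report the mean SNR (against the HR
    originals) and mean IoU (after segmentation against the ground truth)."""
    snrs, ious = [], []
    for hr, gt in zip(hr_images, gt_masks):
        h, w = hr.shape
        # Simulate the LR acquisition with block-mean downscaling (a sketch;
        # any downscaling scheme could be substituted here).
        lr = hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
        sr = upscale(lr)
        snrs.append(snr_fn(hr, sr))
        ious.append(iou_fn(segment(sr), gt))
    return float(np.mean(snrs)), float(np.mean(ious))
```

Passing a bicubic upscaler or the identity on the HR images reproduces the lower and upper bounds described in the text.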
To choose the architectures of D and G, we need a two-class classification network and a network that outputs a matrix of the same size as the input (since the LR image is interpolated to the size of the desired SR image), respectively. We evaluated several architectures and selected two for their balance between performance and computational requirements. As the G network, we use the convolutional super-resolution layers of the resolution-aware convolutional neural network (RACNN) proposed in (Cai et al. 2019). For D, we design a two-class classifier with three convolutional layers and one fully connected layer. For training, we use a batch size of 100 and, as an update rule, Adaptive Moment Estimation (Adam), also applied in the method proposed by (Ledig et al. 2017), with a learning rate of 0.001. To create LR training images, we randomly select 64x64 chunks, downsample them to 16x16, and upsample them again to the original size by bicubic interpolation.
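The patch-preparation step can be sketched as follows. Note this is a dependency-free approximation: the paper uses bicubic resampling, while here block-mean downsampling and nearest-neighbor upsampling are used so the example runs with NumPy alone.

```python
import numpy as np

def make_training_pair(image, rng, patch=64, factor=4):
    """Cut a random 64x64 HR chunk, downsample it to 16x16, and upsample it
    back to 64x64 to form an (LR, HR) training pair.
    Sketch: block-mean downsampling and nearest-neighbor upsampling
    approximate the bicubic resampling used in the paper."""
    h, w = image.shape
    y = int(rng.integers(0, h - patch + 1))
    x = int(rng.integers(0, w - patch + 1))
    hr = image[y:y + patch, x:x + patch].astype(np.float64)
    # Block-mean downsampling: 64x64 -> 16x16
    small = hr.reshape(patch // factor, factor, patch // factor, factor).mean(axis=(1, 3))
    # Nearest-neighbor upsampling back to the original patch size
    lr = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return lr, hr
```

The LR patch has the same spatial size as the HR patch, matching the setup in which the LR image is interpolated to the size of the desired SR output before entering the network.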
In an SRGAN, x~p(x) is a sample from a set of LR images, and x~q(x) is a sample from a set of HR images. After several iterations, it is expected that D is not able to tell apart HR and SR images, i.e., G learns to convert LR images into SR images very similar to the original HR images.
Note that in (1), it is not required that the output of the generator match the HR version of the LR input, i.e., the content of the generated image might not be the same as that of the LR input.
To enforce the matching between HR-LR pairs, we add the squared error between the HR and SR images to the loss function as follows
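The combined objective implied above can be reconstructed as follows; this is a sketch of the intended formula rather than the exact expression from the paper, and λ denotes an assumed weighting of the adversarial term.

```latex
\mathcal{L}_G \;=\;
\underbrace{\left\lVert I^{HR} - G\!\left(I^{LR}\right) \right\rVert_2^{2}}_{\text{squared (content) error}}
\;+\;
\lambda \, \underbrace{\mathcal{L}_{\mathrm{adv}}\!\left(G\right)}_{\text{adversarial loss}}
```

The squared-error term ties the generator output to the HR reference, while the adversarial term encourages the addition of realistic HR detail.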

Figure 3. Examples of plant-root images used to train SR models. RGB images of Arabidopsis roots (a) and wheat roots (b) were converted to grayscale.

Figure 4. SR and segmentation example images (128x64) from the Soybean dataset. From top to bottom: a) ground truth, HR image, and segmentation on the HR image; b) bicubic image and its segmentation; c) FSRCNN-91-image model and its segmentation; d) SRGAN-MULDIS model and its segmentation; and e) FSRCNN-91-image&roots model and its segmentation.

Table 1. Evaluation of SR models on the Soybean dataset: SNR and IoU mean (standard error in parentheses).

DISCUSSION

where x denotes a sample (e.g., an image), and p and q are data distributions (e.g., the distributions of LR and HR images). Since this is a min-max problem, the expression in (1) is both a loss function and a reward function. The optimization problem is solved in an alternating manner. In one step, the loss function is minimized w.r.t. G, such that the output of G(x)|x~p(x) is optimized when D(G(x)) equals 1. On the other hand, the expression in (1) is seen as a reward function that is maximized w.r.t. D. In this case, D(x) is a classifier that is trained to output one when x~p(x) and zero when x~q(x).