Fast upper airway magnetic resonance imaging for assessment of speech production and sleep apnea
Abstract
The human upper airway is involved in various functions, including speech, swallowing, and respiration. Magnetic resonance imaging (MRI) can visualize the motion of the upper airway and has been used in scientific studies to understand the dynamics of vocal tract shaping during speech and for assessment of upper airway abnormalities related to obstructive sleep apnea and swallowing disorders. Acceleration technologies in MRI are crucial in improving spatiotemporal resolution or spatial coverage. Recent trends in technical aspects of upper airway MRI are to develop state-of-the-art image acquisition methods for improved dynamic imaging of the upper airway and develop automatic image analysis methods for efficient and accurate quantification of upper airway parameters of interest. This review covers the fast upper airway magnetic resonance (MR) acquisition and reconstruction, MR experimental issues, image analysis techniques, and applications, mainly with respect to studies of speech production and sleep apnea.
INTRODUCTION
The human upper airway is involved in various functions, including speech, swallowing, and respiration. Speaking involves airflow from the lungs and trachea, vibration of the vocal folds, and rapid motion of articulators such as the tongue, soft palate, and lips. Chewing and swallowing involve motion of the tongue and soft palate as well as the food itself, which is transported from the mouth to the esophagus. The epiglottis is moved back toward the esophagus to prevent the food bolus from entering the trachea. Respiration during sleep involves periodic motion of the pharyngeal wall, soft palate, and tongue to maintain the patency of the upper airway during the process of oxygen and carbon dioxide exchange. These functions are dynamic in nature, and the relevant organs are coordinated in a timely manner.
Medical imaging modalities, including X-ray imaging, computed tomography (CT), ultrasound, and magnetic resonance imaging (MRI), are non-invasive in nature and have been used to obtain images of the upper airway. The electromagnetic articulometer is used to track the movement of transmitter coils attached to the tongue and lips for linguistic studies [1,2]. However, it is invasive and only provides limited information related to the locations of the coils. Fiberoptic endoscopy is used to visualize the airway lumen during drug-induced sleep in patients with obstructive sleep apnea (OSA) [3]. Optical coherence tomography (OCT) [4] has the potential to provide three-dimensional (3D) images of the upper airway with high spatial resolution [5,6]. Both endoscopy and OCT are invasive and do not provide anatomical information of the surrounding soft tissue.
Unlike X-ray imaging and CT, MRI involves no ionizing radiation, and results in no harmful radiation exposure to the subjects. It can non-invasively visualize the internal structure of the human upper airway and soft tissue, although it is expensive and loud and has a low acquisition speed. The organs involved in speech and swallowing tasks typically move faster than the organs involved in other activities, such as a beating heart or a knee or temporomandibular joint in motion.
One of the goals of upper airway MRI is to obtain high-quality dynamic image frames of the upper airway with high temporal fidelity. Clinical MRI protocols applied to the brain, heart, abdomen, and knee may not be directly applicable to the upper airway. Thus, magnetic resonance (MR) pulse sequences and image reconstruction should be tailored to the upper airway functions that researchers are interested in visualizing. From the perspective of fast dynamic imaging, vocal tract motion during speech production is faster than pharyngeal airway motion during sleep, so speech MRI generally requires higher temporal resolution than sleep MRI. Table 1 lists several fast upper airway imaging protocols used for speech and sleep apnea studies in dynamic 2D, dynamic multi-slice, static 3D, and dynamic 3D imaging.
This review covers recent technologies used in upper airway MR pulse sequence, image reconstruction, and image analysis methods for human speech production and sleep apnea research. Notably, several previous review papers by Lingala et al. [7], Scott et al. [8], and Bresch et al. [9] addressed the technical aspects of upper airway MR acquisition for speech production. Ramanarayanan et al. [10] presented an in-depth review of image analysis techniques on real-time MRIs of vocal tract motion. Nayak and Fleck [11] introduced MRI techniques for assessment of OSA. Compared with these previous review papers, this article presents a more comprehensive and up-to-date review of fast upper airway MRI, including imaging strategies for both speech production and sleep apnea research, and puts more emphasis on MR experimental issues that may be of interest to researchers involved in experimental MR studies of the upper airway.
IMAGE ACQUISITION
Improvement of the acquisition speed in MRI has been an active area of research for more than two decades. Hardware improvements in the design of the gradient and radiofrequency (RF) coils greatly contributed to reducing the image acquisition time and increasing the signal-to-noise ratio (SNR). In addition, parallel imaging [12,13] and compressed sensing [14] revolutionized MRI speeds. MR protocols that use both parallel imaging and compressed sensing have been recently introduced in clinical MR examination protocols under various vendor-specific names, such as the Compressed SENSE by Philips (Best, the Netherlands), Compressed Sensing GRASP-VIBE by Siemens (Erlangen, Germany), and HyperSense by GE (Chicago, IL, USA).
Unlike the beating of the heart, the upper airway and tongue motion during natural speech or swallowing are not necessarily periodic. For example, cardiac electrocardiogram (ECG)-gated imaging, which is typically used in conventional cardiac MRI exams, assumes that the heart motion is periodic. The MRI raw data acquired over multiple heart beats are sorted into appropriate bins of the cardiac cycle based on the ECG gating information. However, the gated technique is not ideal for dynamic upper airway imaging of fluent speech, during which the motion of the articulators is not periodic in general. Real-time imaging [15] aims to freeze motion by simply reducing the acquisition window and is preferred over gated imaging for visualizing the motion of the articulators as is.
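The retrospective binning step described above can be sketched as follows. This is a minimal illustration, assuming each readout carries a timestamp and that R-peak times are available from the ECG; the function name `bin_by_cardiac_phase` and the example numbers are hypothetical and not taken from any cited work.

```python
from bisect import bisect_right

def bin_by_cardiac_phase(readout_times, r_peaks, n_bins):
    """Assign each readout time to a cardiac-phase bin (0 .. n_bins - 1).

    Readouts before the first or after the last R peak get None, because
    the length of their heartbeat is unknown.
    """
    bins = []
    for t in readout_times:
        i = bisect_right(r_peaks, t) - 1           # index of the preceding R peak
        if i < 0 or i + 1 >= len(r_peaks):
            bins.append(None)
            continue
        rr = r_peaks[i + 1] - r_peaks[i]           # duration of this heartbeat
        phase = (t - r_peaks[i]) / rr              # fraction of the cycle, in [0, 1)
        bins.append(min(int(phase * n_bins), n_bins - 1))
    return bins

# Hypothetical example: R peaks at 0, 1.0 and 1.8 s, four phase bins.
print(bin_by_cardiac_phase([0.1, 0.6, 1.0, 1.7, 2.0], [0.0, 1.0, 1.8], 4))
# -> [0, 2, 0, 3, None]
```

The sketch makes the limitation visible: binning needs a recurring trigger (here the R peak) and a cycle that repeats, neither of which exists for fluent speech.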
Dynamic 2D imaging
The midsagittal slice is typically acquired in speech imaging, since it shows a slice of the entire vocal tract from the lips to the glottis (Fig. 1). Dynamic real-time 2D speech imaging requires each image frame to be acquired within 80 ms at most. Hence, a rapid acquisition technique is necessary to meet the temporal resolution requirement. The gradient echo (GRE; also known as gradient recalled echo) sequence is usually the choice for rapid imaging of the upper airway. Fig. 2 shows four representative k-space sampling trajectories in 2D imaging.
Cartesian sampling (also known as two-dimensional Fourier transform [2DFT]) is widely adopted in clinical imaging protocols since it is robust to system imperfections, but it may be slow for real-time speech imaging. Fig. 3 shows an example of airway narrowing in the retroglossal slice during sleep-disordered breathing, where a dynamic 2D Cartesian imaging sequence was used for data acquisition [16]. Cartesian imaging with partial k-space undersampling was combined with projection onto convex sets sensitivity encoding (POCSENSE) reconstruction [17] in order to recover an unaliased image and thus speed up real-time dynamic imaging.
Radial sampling is less sensitive to motion than Cartesian sampling and exhibits incoherent spatial aliasing artifacts in undersampling, thus being well-suited to the sparse reconstruction known as compressed sensing. Niebergall et al. [18] demonstrated a radial fast low-angle shot (FLASH) sequence with ultrashort repetition time (i.e., TR) of 2.22 ms to achieve a 33.3 ms acquisition time in speech imaging. More recently, Iltis et al. [19] demonstrated a radial FLASH sequence to achieve a 10.0-ms temporal resolution in capturing rapid tongue motion.
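The relation between TR and acquisition time per frame is simple arithmetic. As a sketch, assuming each frame is formed from N_s consecutive spokes without view sharing (N_s is a symbol introduced here, and the value of 15 is inferred from the two quoted numbers rather than stated in the text):

```latex
T_{\text{frame}} = N_s \cdot \mathrm{TR}, \qquad
N_s = \frac{33.3\ \text{ms}}{2.22\ \text{ms}} = 15, \qquad
\text{frame rate} = \frac{1}{T_{\text{frame}}} \approx 30\ \text{frames/s}.
```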
Echo-planar sampling (EPI) is a time-efficient approach that involves acquiring the entire k-space in a single shot or in a few shots, but it may show ghosting and distortion in the presence of motion and off-resonance. EPI is seldom used for speech imaging, mainly because off-resonance from air-tissue boundaries results in a large degree of distortion in images.
Spiral imaging is another time-efficient approach that involves covering the k-space and has been demonstrated in real-time cardiac imaging [20,21], speech imaging [22], and OSA studies [23]. A drawback of spiral imaging is image quality degradation, including spatial blurring due to off-resonance [24]. Given the same readout duration, spatial blurring is more severe at higher magnetic field strengths. In real-time speech imaging, center-frequency adjustment and pre-scan shimming are required to reduce spatial blurring. Off-resonance correction can be performed to de-blur the images, and, in this case, a field map needs to be estimated at each frame either with a pulse sequence with two different echo times [25] or with a focus metric [26,27]. In practice, interleaved spiral trajectories are used to reduce spatial blurring with reduced readout duration. Interleaved spiral trajectories in combination with view-sharing reconstruction increase the frame rate in real-time speech imaging (Fig. 4).
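The combination of interleaved acquisition and view sharing can be sketched as a sliding window over the stream of readouts. This is a minimal illustration, assuming that the interleaves are acquired cyclically and that one full set of interleaves forms a frame; the function names, the 13 interleaves, and the 6-ms TR are hypothetical values chosen for the example.

```python
def sliding_window_frames(n_readouts, n_interleaves, step):
    """Return, for each frame, the readout indices combined into that frame.

    Each frame uses n_interleaves consecutive readouts (one full set of
    spiral interleaves). Successive frames start `step` readouts apart, so
    frames share views whenever step < n_interleaves.
    """
    frames = []
    start = 0
    while start + n_interleaves <= n_readouts:
        frames.append(list(range(start, start + n_interleaves)))
        start += step
    return frames

def frame_rate_hz(tr_ms, step):
    """Displayed frame rate when a new frame is formed every `step` readouts."""
    return 1000.0 / (tr_ms * step)

# Hypothetical protocol: 13 interleaves, TR = 6 ms, 39 readouts in total.
no_sharing = sliding_window_frames(n_readouts=39, n_interleaves=13, step=13)
sharing = sliding_window_frames(n_readouts=39, n_interleaves=13, step=1)
print(len(no_sharing), len(sharing))               # number of frames in each case
print(frame_rate_hz(6.0, 13), frame_rate_hz(6.0, 1))
```

Note that view sharing raises the displayed frame rate only; each frame still spans the time needed to acquire a full set of interleaves, so the temporal footprint is unchanged.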
Golden angle sampling [28] emerged as a promising acquisition technique for dynamic imaging [29]. In golden angle sampling, the angle of a radial spoke is increased by the golden angle (111.246°) at every TR, so that each new spoke falls within the largest remaining azimuthal gap and divides it according to the golden ratio. It is known to provide flexible retrospective selection of temporal resolution and time offset from continuously acquired real-time MR data. It was demonstrated with radial and spiral trajectories in speech imaging [30-32].
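The angular coverage obtained with the golden angle increment can be checked with a few lines of code. This is a minimal sketch; the function names and the window sizes are illustrative assumptions, and spokes are folded into the half circle [0°, 180°) because a radial spoke and its 180° rotation cover the same line in k-space.

```python
import math

# 180 degrees divided by the golden ratio, about 111.246 degrees.
GOLDEN_ANGLE_DEG = 180.0 / ((1.0 + math.sqrt(5.0)) / 2.0)

def spoke_angles(n_spokes, start=0):
    """Angles (degrees, folded into [0, 180)) of n_spokes consecutive spokes."""
    return [((start + k) * GOLDEN_ANGLE_DEG) % 180.0 for k in range(n_spokes)]

def largest_gap(angles):
    """Largest azimuthal gap between neighbouring spokes on the half circle."""
    s = sorted(angles)
    gaps = [b - a for a, b in zip(s, s[1:])]
    gaps.append(180.0 - s[-1] + s[0])              # wrap-around gap
    return max(gaps)

# Any window of consecutive spokes gives the same coverage, whatever its
# time offset, which is what allows retrospective choice of the frame length.
for n in (13, 21, 34):
    print(n, round(largest_gap(spoke_angles(n)), 3),
          round(largest_gap(spoke_angles(n, start=100)), 3))
```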
The SNR may be insufficient for high-resolution real-time imaging of the soft palate, which is located far from the surface coil, where the coil sensitivity is relatively low. Adaptive averaging was demonstrated to improve the SNR from data acquired during repetition of a speech utterance [33].
Tagged cine MRI sequences were used to visualize internal deformation in the tongue [34,35]. MR tagging is performed using a spatial modulation of magnetization (SPAMM) imaging protocol [36,37]. Many repetitions of the utterance are required to track internal tongue motion on one slice [34]. For every repetition, the subjects should maintain the same articulatory postures and speech rate to avoid image mis-registration. Thus, tagged cine acquisition requires pre-training of the subjects to reduce variability in speech rate and articulation.
Dynamic multi-slice 2D imaging
For speech imaging, the midsagittal slice covers the entire vocal tract from the lips to the glottis, but it does not provide any information about articulation in the parasagittal regions, such as the grooving/doming of the tongue, asymmetries in tongue shape, and lateral shaping of the pharyngeal airway. A multi-slice real-time imaging technique was developed to capture the vocal tract shaping in three orthogonal planes (i.e., midsagittal, axial, coronal) during fluent speech [38]. Fig. 5 illustrates the acquisition and reconstruction of three slice images during the utterance of /θ/ in the vowel contexts of /a_a/ and /i_i/. Notably, multi-slice imaging reduces temporal resolution by a factor equal to the number of scan planes. A recent study demonstrated dual-planar real-time imaging for assessment of velopharyngeal function [39].
For sleep apnea imaging, simultaneous imaging of the midsagittal slice and multiple axial slices is preferable to single-slice imaging, because additional anatomical information is available from multi-slice imaging. Shin et al. [40] demonstrated imaging of the pharyngeal airway in one midsagittal and two axial planes during natural sleep and noted that the subject’s head motion during sleep could leave the prescribed midsagittal scan plane off-center with respect to the upper airway. Hence, the midsagittal view, if it is not corrected during imaging, may lead to mis-interpretation of a patent airway as a collapsed airway.
Simultaneous multi-slice (SMS) imaging [41] uses multiband RF pulses to excite multiple parallel slices simultaneously and resolves the slices with parallel imaging. The benefit of SMS imaging is the reduced geometric factor, which facilitates higher acceleration in multi-slice imaging. SMS real-time MRI was demonstrated in four parallel axial slices using a radial controlled aliasing in parallel imaging results in higher acceleration (CAIPIRINHA) sequence with golden angle view order for upper airway compliance measurement [42].
Static 3D imaging
High-resolution 3D MRI of vocal tract shaping provides insight into the modeling of the vocal tract in association with speech sounds [43-45]. The production of speech sounds is performed within a subject’s breath-hold. It is desirable to acquire an entire vocal tract shape in 3D within sustained speech over a period typically ranging from 6 to 8 seconds. This requires highly accelerated imaging to achieve high spatial resolution with complete coverage of the vocal tract. Kim et al. [46] demonstrated the first application of compressed sensing to 3D imaging of the upper airway for speech and achieved a resolution of 1.5 × 1.5 × 2.0 mm³ in 7 seconds of sustained speech production using a single-channel head coil. Recently, Burdumy et al. [47] demonstrated improved full 3D imaging of the vocal tract using a stack-of-stars sequence and compressed sensing to reduce the scan time to 1.3 seconds.
Dynamic 3D imaging
Respiratory gating was used to acquire the 3D dynamics of the upper airway during tidal breathing while awake in patients with OSA [48]. The scan time was proportional to the number of respiratory phases, spatial resolution, and SNR. For successful data acquisition using this technique, subjects’ breathing patterns need to be steady without severe movements near the upper airway.
Real-time dynamic 3D acquisition of the upper airway during spontaneous sleep was demonstrated in obese adolescents [49]. The technique showed the potential to provide information of the upper airway collapse site in obstructive apnea events during the subject’s natural sleep in MRI. Fig. 6 shows the 2D sagittal and axial slices of the 3D upper airway at a time frame prior to the OSA event and at a time frame during the OSA event, demonstrating the benefit of real-time 3D imaging in evaluating airway obstruction patterns.
In speech imaging, dynamic 3D visualization of the vocal tract with high temporal resolution was demonstrated [50]. The technique covers the entire vocal tract with 2D real-time MRI while the subject repeats the same utterance for every sagittal slice. The gated technique is capable of yielding dynamic 3D visualization of the vocal tract, but it requires many repetitions of the utterance and substantial post-processing effort for alignment and segmentation. Another method demonstrated the use of low-rank modeling and sparse sampling to substantially accelerate the imaging speed [51].
RF coils
Commercial coils such as the birdcage head coil and the multi-channel head-neck array coil were designed to produce optimal SNR in the brain or neck regions of interest. Custom RF receiver coils were designed and demonstrated to increase image SNR in the tongue, lips, soft palate, and pharyngeal wall [31,52] and to enhance the parallel imaging performance for high acceleration factors (Fig. 7).
IMAGE RECONSTRUCTION
Highly accelerated imaging is realized by image reconstruction from undersampled k-space data. Conventional reconstruction of undersampled k-space data results in spatial aliasing artifacts in images. Parallel imaging reconstruction exploits the spatial information available from multiple channel coils to recover images without spatial aliasing [12]. Compressed sensing reconstruction exploits transform sparsity and incoherent aliasing from a pseudo-random undersampling scheme to recover images [14]. This typically involves minimization of the sum of the data consistency L2 norm and the sparsity-promoting L1 norm weighted by a regularization parameter.
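The minimization described above is commonly written in the following form. The notation is an assumption introduced here for illustration: x is the image, y the acquired undersampled multi-coil k-space data, S the coil sensitivity operator, F_u the undersampled Fourier operator, Ψ the sparsifying transform, and λ the regularization parameter; the squared norm and the factor 1/2 are conventions rather than something stated in the text.

```latex
\hat{x} = \arg\min_{x} \; \tfrac{1}{2}\, \lVert F_u S x - y \rVert_2^2
          \; + \; \lambda \, \lVert \Psi x \rVert_1
```

A larger λ favors sparser solutions in the Ψ domain at the cost of fidelity to the measured data.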
Low-latency reconstruction
The image reconstruction period is a particularly important factor for real-time interactive imaging, where an ideal time interval between image acquisition and display is within 100 ms for fast interactions and scan parameter changes by the MRI operator. Spiral imaging is inherently fast and was an option for real-time interactive imaging [20,53]. Gridding reconstruction is performed in radial or spiral imaging to map non-Cartesian data to a Cartesian grid followed by fast Fourier transform (FFT) to reconstruct an image [54]. The gridding reconstruction process is sufficiently fast to guarantee acceptable latency in real-time interactive imaging, but it does not use any image acceleration framework such as parallel imaging.
Recently, Lingala et al. [55] demonstrated the feasibility of through-time spiral generalized autocalibrating partial parallel acquisition (GRAPPA) [21] for low-latency reconstruction in real-time speech MRI. The through-time spiral GRAPPA achieved four-fold acceleration and reconstructed image frames at a rate of 18 ms/frame with eight processors [55].
Iterative reconstruction
Constrained reconstruction (also known as compressed sensing parallel imaging) is used to obtain accurate image estimates iteratively from undersampled k-space data. Since the image estimate is updated at every iteration, the reconstruction time is proportional to the number of iterations. The iterative reconstruction is based on the minimization of a cost function that is typically the sum of the data consistency L2 norm and the sparsity-promoting L1 norm weighted by a regularization parameter. Iterative reconstruction is therefore often performed off-line and takes more time than conventional FFT-based reconstruction. Parallelization over multiple graphics processing units can shorten the reconstruction time [56,57].
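The structure of such an iterative scheme can be illustrated with the iterative soft-thresholding algorithm (ISTA). This is a generic textbook solver swapped in for illustration, not the method of any cited study. It is a minimal sketch, assuming a tiny dense matrix `A` that stands in for the coil-weighted undersampled Fourier encoding and sparsity in the image domain itself; all names and numbers are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(A, y, lam, n_iter):
    """Minimise 0.5 * ||A x - y||_2^2 + lam * ||x||_1 by iterative soft thresholding."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm is an upper bound on the largest eigenvalue
    # of A^T A, so its reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x

def cost(A, y, lam, x):
    """Value of the cost function at x (data consistency plus weighted L1 term)."""
    r = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)

# Two measurements, three unknowns: underdetermined, like undersampled k-space.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 1.0]
x_hat = ista(A, y, lam=0.1, n_iter=500)
print(x_hat, cost(A, y, 0.1, x_hat))
```

Each pass through the loop applies the forward model and its adjoint once, which is why the run time grows linearly with the number of iterations.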
Deep learning-based reconstruction
Recent work in MRI reconstruction investigates the effectiveness of machine learning in optimizing parameters related to image reconstruction algorithms [58,59]. The parameter learning involves learning of the regularization parameter and filter coefficients concerned with image unaliasing. Once the parameters have been trained by learning algorithms, they in principle need no further tuning after data acquisition. Another advantage of learning-based reconstruction over compressed sensing iterative reconstruction is its shorter reconstruction time [58]. The learning-based reconstruction framework is expected to be applied to upper airway MRI.
IMAGE ANALYSIS
Vocal tract analysis
Dynamic real-time speech MRI typically acquires thousands of image frames over 20 seconds of fluent speech. Articulatory information is obtained from a single midsagittal slice image; thus, manual segmentation and analysis of all image frames is time-consuming and laborious. Hence, custom automatic or semi-automatic analysis techniques were developed by speech scientists who were interested in articulatory analysis of MRI data.
A variety of analysis methods have been developed by several research groups. A statistical shape model was built from training images and was used to automatically segment unseen vocal tract images [60]. Automatic segmentation and labeling of the vocal tract articulators in midsagittal dynamic images was demonstrated in another study [61]. Another report demonstrated an approach based on a graphical user interface for vocal tract segmentation in midsagittal dynamic images [62]. Machine learning has also been used to automatically segment individual articulators in real-time midsagittal images [63]. Semi-automatic estimation of vocal tract area function was demonstrated using a graphical user interface in accelerated 3D MRI data of sustained speech [64]. It involves the user’s annotation of anatomical landmarks, centerline extraction, cross-sectional slicing of the airway, and automatic segmentation of the cross-sectional airway. Fig. 8 illustrates the semi-automatic procedures for estimating vocal tract area functions.
Internal tissue deformation analysis
Internal tissue motion tracking is often performed on images acquired with preparation pulses of SPAMM. Tagged images in the tongue can be analyzed using harmonic phase (HARP) MRI [34]. Measurements involve displacement and velocity of tissue points and strain of specific muscles. The trajectories of tissue points can be visualized as path lines.
Airway narrowing/collapse analysis
A custom graphical user interface was developed to synchronously visualize real-time 3D MRI movies and measured physiological signals (Fig. 9). This allowed the user to rapidly inspect sleep apnea events and associated upper airway images from approximately 20 to 30 minutes of data. Frame-by-frame semi-automatic quantification of pharyngeal airway volume was demonstrated by using 3D region growing segmentation in real-time 3D MRI data [65]. The technique also enabled automatic detection of an airway collapse event from 4D airway data.
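The idea of 3D region growing for airway volume measurement can be sketched as follows. This is a minimal illustration, assuming 6-connectivity, a fixed intensity threshold, and a user-supplied seed voxel inside the airway; these choices and the function names are assumptions and do not reproduce the implementation of the cited study.

```python
from collections import deque

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def region_grow_3d(volume, seed, threshold):
    """Return the set of voxels 6-connected to `seed` with intensity < threshold.

    `volume` is a nested list indexed as volume[z][y][x]. Air appears dark
    in MRI, so the airway is grown over low-intensity voxels.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    z, y, x = seed
    if volume[z][y][x] >= threshold:
        return set()                               # seed is not inside the airway
    region = {seed}
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBOURS:
            p = (z + dz, y + dy, x + dx)
            inside = 0 <= p[0] < nz and 0 <= p[1] < ny and 0 <= p[2] < nx
            if inside and p not in region and volume[p[0]][p[1]][p[2]] < threshold:
                region.add(p)
                queue.append(p)
    return region

def airway_volume_mm3(volume, seed, threshold, voxel_mm3):
    """Airway volume as the number of segmented voxels times the voxel volume."""
    return len(region_grow_3d(volume, seed, threshold)) * voxel_mm3
```

Applied to every time frame, a function like this yields a volume-versus-time curve. One hypothetical way to flag a collapse is a region that no longer extends from the seed slice to the slices below it, which shows up as a sudden drop in the measured volume.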
Computational fluid dynamics (CFD) simulations were performed to predict flow pressure measurements in human airways from 3D anatomical MRI or CT data [66-68]. These studies demonstrated the potential of CFD modeling to elucidate the mechanism of OSA. Finite element analysis modeling was developed to predict the airway closing pressure and airway collapse site under different surgical treatment options [69].
Deep learning
Deep learning [70], and more specifically the deep convolutional neural network (CNN), has shown markedly better performance than conventional machine learning methods in computer vision [71], and its applicability has been investigated in other domains, including medical image analysis. Because open-source development platforms (e.g., Keras, TensorFlow, PyTorch, Caffe) are available for deep learning, researchers can readily choose a library suited to their purposes and adapt their scripts to implement deep learning algorithms. In MRI, deep CNN methods have been demonstrated in a variety of applications: for example, brain tissue segmentation [72], cerebral infarct segmentation [73], and cerebral microbleed detection [74].
Deep learning-based image analysis was recently demonstrated in vocal tract shape analysis. An encoder-decoder CNN was demonstrated to automatically extract the vocal tract air-tissue boundaries [75,76].
MR EXPERIMENTAL ISSUES
Acoustic noise
MRI scans produce loud sounds, which are caused by the vibrations of the three pairs of gradient coils and have been reported to exceed 130 dBA in extreme cases on commercial 3 T scanners [77]. The noise is more severe at higher magnetic field strengths [78]. This may cause patients to experience difficulties in falling asleep in sleep MRI. Sequence parameters such as TR can be adjusted to potentially reduce the sound pressure level. Continuous scanning with a lengthened TR is empirically known to produce a low tone and a steady “humming” sound.
Mirror projector setup
In speech MRI, a mirror projector setup wherein the subject lies on the scanner bed and reads the sentences through the mirror is useful. The experimenter, who is positioned outside the MRI scanner room, operates a laptop and plays the slides containing sentences to read. A projector is used to display the content of the slides to the subject. If the subject is near-sighted and the subject’s glasses contain metallic objects, MR-compatible glasses may be provided prior to the scan in order to help the subject read the stimuli.
Synchronized acquisition
Obstructive apneic events occur during a subject’s sleep. Electroencephalograms are recorded synchronously with MRI to objectively determine sleep and wakefulness [79]. Respiration, heart rate, and oxygen saturation signals are simultaneously recorded to infer central or obstructive apneic events [49]. A pressure transducer can be used to monitor mask pressure. A continuous positive airway pressure device is used in an MRI environment to control airway pressure [80].
Speech MRI synchronously records the audio signals from the subject. The optical microphone is an MR-compatible device, so it can be used inside the magnet room. The raw speech signal acquired during MRI scans is corrupted with the MR noise. Recovery of uncorrupted speech sounds requires signal-processing techniques, including adaptive signal processing [81], dictionary learning [82], and combined operations in the time and frequency domains [83]. Audio denoising is often applied off-line after the MRI and audio acquisitions are completed.
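Adaptive noise cancellation can be illustrated with a normalized least-mean-squares (NLMS) filter. This is a generic textbook technique swapped in for illustration, not the algorithm of the cited studies. The sketch assumes that a reference signal correlated with the scanner noise is available, and it uses synthetic signals; all names and parameter values are hypothetical.

```python
import math
import random

def nlms_cancel(primary, reference, n_taps=4, mu=0.1, eps=1e-8):
    """Normalised LMS noise canceller.

    primary   : microphone samples (speech plus scanner noise)
    reference : samples correlated with the scanner noise only
    Returns the error signal, which serves as the speech estimate.
    """
    w = [0.0] * n_taps
    out = []
    for k in range(len(primary)):
        # Most recent n_taps reference samples, zero-padded at the start.
        x = [reference[k - i] if k - i >= 0 else 0.0 for i in range(n_taps)]
        noise_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = primary[k] - noise_estimate
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out

def demo(n=4000, seed=0):
    """Synthetic test: return (noise power before, residual power after)."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    speech = [0.1 * math.sin(2.0 * math.pi * k / 50.0) for k in range(n)]
    noisy = [speech[k] + 0.8 * ref[k] + (0.3 * ref[k - 1] if k else 0.0)
             for k in range(n)]
    cleaned = nlms_cancel(noisy, ref)
    half = n // 2                                   # evaluate after convergence

    def mse(a, b):
        return sum((a[k] - b[k]) ** 2 for k in range(half, n)) / (n - half)

    return mse(noisy, speech), mse(cleaned, speech)

before, after = demo()
print(before, after)
```

The filter adapts sample by sample, so in principle it could run during acquisition, although, as noted above, denoising is often applied off-line.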
Supine position
The supine position in MRI examinations is not an ideal position for speech and swallowing tasks. An open-type MRI scanner can be used to acquire images in the upright position, but its availability is typically more limited than conventional MRI scanners. One study compared vocal tract configurations between the supine and upright positions [84].
Real-time interactive imaging
Real-time interactive imaging software [85] has proved useful for a variety of applications, including high-intensity focused ultrasound ablation of the liver [86], OSA [23,40], and speech imaging [53]. Real-time interactive imaging for speech involves (1) the operator’s rapid prescription of a midsagittal scan plane based on anatomical landmarks (e.g., nose tip, pharyngeal airway) in the upper airway, (2) center-frequency adjustment and linear shimming for rapid and interactive de-blurring in spiral imaging, and (3) the user-interactive gradient delay correction. With real-time interactive imaging, the subject’s inappropriate behaviors, such as non-speaking or non-responsiveness, can be monitored during scanning. In particular, RTHawk (HeartVista Inc., Los Altos, CA, USA), a commercial real-time interactive imaging software package, has been mainly developed for comprehensive cardiac MRI exams and also has been used for more than a decade to establish a database of real-time speech MRI [22].
Patient comfort
The MRI examination is a loud and claustrophobic process, and thus is not a friendly environment for the subject’s natural sleep. Instructions to properly use the ear plugs are helpful in reducing the noise perceived by the subjects. Bedding with memory foam can also help improve patient comfort.
Repeated real-time speech MRI scans without sufficient pauses between the scans can lead to excessive heating in the gradient amplifiers. The gradient amplifier heating issue sometimes results in inadvertent interruptions in the scanner’s operation and causes long waiting times for the subjects until the scanner resumes working. This can be avoided by adequate pauses between the scans.
Teeth
Visualizing the contours of the teeth is important in measuring the degree of constriction in the production of fricative sounds. However, dynamic real-time MRI cannot visualize tooth contours because teeth, which are bony structures, lack hydrogen. Several workarounds have been reported. In one, MRI data were acquired while the subject held blueberry juice in the mouth as a contrast medium [87]. In another, trays containing ferric ammonium citrate gels, which are T1-shortening contrast media, were used to visualize the teeth [88]. A relatively simple approach involves acquiring MRI data while the subject presses the tongue against the upper teeth and then against the lower teeth. These methods enable extraction of the contours of the teeth, which are superimposed over the dynamic real-time images.
Image artifact issues
Dental work often causes image artifacts near the mouth in the subject’s upper airway. MRI screening procedures prior to MRI scanning should check if the subject is wearing dental braces or has undergone metallic dental work. Fig. 10A illustrates an example of the signal voids caused by the presence of dental work.
RF interference can cause image artifacts during real-time MRI. In spiral imaging, the artifact appears as a ring-like pattern (Fig. 10B) instead of the zipper artifact pattern observable in Cartesian imaging.
APPLICATIONS
Upper airway MRI has a wide range of applications in research on sleep apnea and speech production as well as in research on other areas such as swallowing and singing. Related works are briefly described for each application.
Obstructive sleep apnea
OSA is a disease characterized by repetitive episodes of upper airway collapse during sleep [89-91]. Dynamic upper airway MRI was used to identify airway narrowing or obstruction sites in OSA patients with sedation [92,93]. It was also used to evaluate airway narrowing during wakefulness and natural sleep [94].
Deformation of airway tissue can be measured using tagged MRI sequences. Analysis of displacements of the tag lines indicated that the genioglossus moved anteriorly during inspiration in healthy awake subjects [95]. Different patterns of tongue motion were observed during awake breathing in OSA patients [96].
Static 3D imaging methods with conventional MR pulse sequences have been used to quantify anatomical regions of interest for clinical OSA research. A T1-weighted sequence was used to measure the tongue volume and lateral pharyngeal wall volume [97]. A three-point Dixon water-fat separation sequence was used to measure tongue fat volume from fat images [98].
Speech and singing
Dynamic real-time MRI of speech can provide unique and valuable insights into understanding the spatial and temporal aspects in vocal tract shaping. This technique has been adopted by linguists or speech scientists to investigate speech production and articulatory gestures in a variety of speech tasks or language settings. For example, real-time speech MRI was exploited to investigate the temporal dynamics of vocal tract articulators of interest in nasal sounds [99], English diphthongs [100], Tamil retroflex consonants [101], etc.
Speech pathology has been investigated using dynamic real-time MRI. Real-time MRI of a patient with speech apraxia could capture vocal tract shaping in silent initiation gestures at speech onset and that during covert articulation of words [102].
Vocal tract shaping from real-time MRI was investigated in resonance tuning in soprano singing [103] and in tenors’ passaggio [104].
Velopharyngeal function
Three-dimensional high-resolution anatomical MRI can be used to visualize the velopharynx including the levator muscle. Early work demonstrated the use of real-time MRI to evaluate velopharyngeal closure in patients with velopharyngeal insufficiency [105]. High frame-rate dynamic speech imaging was used to assess the velopharyngeal anatomy in a midsagittal and an oblique coronal scan plane [106]. The technique enabled visualization of the movements of the soft palate and pharyngeal wall during speech production containing nasal sounds. The imaging is known to be useful in assessing velopharyngeal function in subjects with cleft lip and palate.
Swallowing disorder
An early study investigated three different pulse sequences to evaluate image quality during swallowing [107]. The potential of real-time spiral MRI at 1.5 T was demonstrated in evaluating swallowing function in patients receiving tongue cancer treatment [108]. Real-time MRI was performed at 3 T using GRE sequences [109,110] and a radial FLASH with undersampling [111].
CONCLUSION
Upper airway imaging techniques for speech production and sleep apnea research have been covered in this review. Image acquisition and reconstruction methods for high spatial and temporal resolution as well as full 3D coverage of the upper airway have been investigated by numerous MRI research groups. These techniques involve the use of compressed sensing, parallel imaging, and custom or commercial RF coils, which are sensitive to the upper airway regions of interest. When these research protocols on pulse sequence and reconstruction are not available, the use of commercial pulse sequences is possible for dynamic upper airway MRI [112]. The use of real-time interactive imaging software is beneficial especially for efficient and robust data acquisition in non-Cartesian imaging. In addition, speech and sleep MRI experimental research studies require other MR-compatible measurement devices (e.g., fiberoptic microphone, facial mask), which are not often available for routine clinical exams. High-field MRI (e.g., 7-Tesla MR) is gaining popularity in MR research, but, for dynamic real-time imaging of speech and sleep apnea, lower field strength magnets (e.g., 1.5-Tesla or below) may be advantageous because of the more moderate sound pressure level and the smaller off-resonance arising from the large magnetic susceptibility difference between air and tissue.
Deep learning has recently gained popularity in computer vision [113] and medical image analysis [114]. It is also emerging in the areas of MRI reconstruction and image segmentation. Current trends in deep learning-based image reconstruction suggest that real-time upper airway imaging may benefit from faster reconstruction times using a reconstruction method with a learned model than with conventional iterative reconstructions (e.g., compressed sensing parallel imaging). Automatic post-processing of upper airway image data may benefit from deep learning-based techniques in the areas of image segmentation and landmark detection.
Notes
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (Grant Number: NRF-2015 R1C1A1A02036340, NRF-2018 R1D1A1B07042692). The author acknowledges the Speech Production and Articulation kNowledge group at the University of Southern California and thanks anonymous reviewers for valuable comments.