Localization of cervical spine fractures in medical images is a challenging task that requires a large amount of labeled data for accurate diagnosis. However, obtaining labeled data is time-consuming and difficult, which limits the application of supervised learning methods. In this thesis, we propose a semi-supervised learning approach to improve the accuracy of cervical spine fracture localization by combining a small amount of labeled data with a larger amount of unlabeled data.
Our approach exploits semi-supervised learning techniques to learn patterns and features from a larger set of unlabeled CT images, improving the model's ability to generalize to new and unseen cases. Additionally, our approach is more robust to noisy or inaccurate labels, as the model can learn to ignore or down-weight labeled examples based on its confidence in the label. To increase the amount of labeled data available for training, we also apply data augmentation techniques such as rotation, flipping, and cropping.
We demonstrate the effectiveness of our approach through experiments on a dataset of CT scans for the localization of cervical spine fractures. Our results show that our semi-supervised learning approach improves the accuracy of cervical spine fracture localization compared to traditional supervised learning methods, even when trained on a limited amount of labeled data. Overall, our approach has the potential to improve the diagnosis of cervical spine fractures in medical images, ultimately leading to better patient outcomes.
This work was carried out with the support of Grant Award Number under the project "Cervical Spine Fracture Localization and Vertebrae segmentation using Semi-Supervised Learning" from Nazarbayev University.
Introduction / Literature Review
- Traditional Vertebrae fracture screening
- Machine learning
- Classification of deep learning
- Model architecture
- Segmentation
- Fracture Detection
This literature review provides a comprehensive overview of the state of the art in cervical spine fracture detection and localization using machine learning techniques, as well as the potential of semi-supervised learning in this area. The cervical spine is located in the upper part of the spine and connects the head to the shoulders. Processing 3D volumes as stacks of 2D slices can reduce computational cost compared with a full 3D CNN while still capturing some of the 3D spatial features.
Normalization involves scaling the pixel values of an image to a specific range to improve the efficiency of the training process. Image augmentation refers to the process of artificially increasing the size of the training dataset by applying transformations to the input images. These transformations help introduce variability into the training data and improve the robustness of the model to variations in input images.
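As an illustration, below is a minimal sketch of such a preprocessing and augmentation pipeline using torchvision transforms; the target size, normalization statistics, and augmentation strengths are assumptions chosen for illustration rather than the exact settings used in this work.

```python
from torchvision import transforms

# Hypothetical preprocessing/augmentation pipeline for single-channel CT slices.
# The 512 x 512 size, the rotation/crop strengths, and the normalization
# statistics are illustrative assumptions, not the values used in this thesis.
train_transforms = transforms.Compose([
    transforms.ToTensor(),                                # scale pixel values to [0, 1]
    transforms.Resize((512, 512)),                        # standardize slice size
    transforms.RandomRotation(degrees=10),                # small random rotations
    transforms.RandomHorizontalFlip(p=0.5),               # left/right flip
    transforms.RandomResizedCrop(512, scale=(0.9, 1.0)),  # mild random cropping
    transforms.Normalize(mean=[0.5], std=[0.5]),          # rescale to roughly [-1, 1]
])

# Validation/test images receive only the deterministic steps.
eval_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((512, 512)),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])
```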
In addition to improving the quality of the input data, preprocessing and image augmentation can also help prevent overfitting. By introducing variability into the training data, they reduce the risk that the model memorizes the training set and improve overall model performance. In most cases, clinical image datasets also suffer from an uneven distribution of image classes, and several strategies exist to address this imbalance.
Common strategies for handling class imbalance include:
- Oversampling: randomly duplicating samples from the minority class to balance the class distribution.
- Undersampling: randomly removing samples from the majority class to balance the class distribution.
- Cost-sensitive learning: assigning different misclassification costs to the classes, so that misclassifying the minority class costs more than misclassifying the majority class.
- Ensemble learning: combining multiple models, each trained on a different subset of the data, to improve performance on the underrepresented class.
Beyond data handling, the model architecture itself is built from a few standard CNN components:
- Convolutional layers: the first layers of a CNN apply filters to the input image to extract features such as edges, corners, and other low-level features.
- Pooling layers: after each convolutional layer, a pooling layer is used to reduce the spatial dimensions of the output feature maps.
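To make the convolutional and pooling building blocks concrete, the following is a minimal PyTorch sketch of a small CNN classifier; the layer sizes, the single-channel 512 x 512 input, and the seven-output head are illustrative assumptions, not the architecture used in this thesis.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Illustrative CNN: convolutions extract local features, pooling shrinks
    the spatial dimensions, and fully connected layers produce the scores."""
    def __init__(self, num_outputs: int = 7):  # e.g. one output per vertebra C1-C7 (assumption)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # low-level features (edges, corners)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 512 -> 256
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 256 -> 128
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                     # collapse the spatial dimensions
            nn.Flatten(),
            nn.Linear(32, 64),                           # fully connected layers
            nn.ReLU(),
            nn.Linear(64, num_outputs),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a batch of four single-channel 512 x 512 slices -> (4, 7) logits.
logits = TinyCNN()(torch.randn(4, 1, 512, 512))
```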
The most common CT image size is 512 x 512 pixels, although other sizes may be used depending on the scanner and the preferences of the radiologist or technologist. Adding fully connected (FC) layers increases the capacity of the classification head, which can improve model accuracy, although deeper heads also carry a higher risk of overfitting.
Methods
Materials
- Dataset
Figure 2-3 shows that the fracture targets are roughly balanced, with a 52/48 split, meaning that the distribution of the binary target variable is relatively even between the two categories and that there is no significant bias or imbalance at this level. The distribution of fractures across vertebral levels, however, is uneven, suggesting that some cervical vertebrae may be more prone to fracture than others.
Many patients have more than one fracture, suggesting that some patients are at greater risk for fractures than others and that the occurrence of one fracture may increase the likelihood of additional fractures. This information is useful for understanding patterns of fracture occurrence and may be important for developing prevention or treatment strategies. The remaining images are resized to 512 x 512 to standardize the input batches fed to the CNN.
The dataset contains metadata and segmentation files for each patient, which we use for semi-supervised learning; the metadata was further enhanced by removing low-contrast image slices [22].
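One hypothetical way to implement the removal of low-contrast slices is a simple intensity-variation threshold, as sketched below; both the criterion and the threshold value are assumptions made for illustration and not necessarily those used in [22].

```python
import numpy as np

def filter_low_contrast(slices: np.ndarray, std_threshold: float = 10.0) -> np.ndarray:
    """Keep only slices whose pixel-intensity standard deviation exceeds a
    threshold; very flat (low-contrast) slices are dropped.

    slices: array of shape (num_slices, H, W). The threshold is illustrative."""
    stds = slices.reshape(slices.shape[0], -1).std(axis=1)
    return slices[stds > std_threshold]

# Example with random data standing in for a CT volume.
volume = np.random.randint(0, 255, size=(50, 512, 512)).astype(np.float32)
filtered = filter_low_contrast(volume)
```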
Segmentation
- Baseline model
- Data analysis
Fracture detection
- Baseline model
- Solver parameters
- Data Analysis
The EfficientNet backbone is used to extract features from the images, and a semi-supervised learning algorithm then learns a mapping between the extracted features and the location of the fracture. The images are first loaded from the training set and converted into 3 x 384 x 384 tensors using the same transforms used to pre-train EfficientNet_V2_S on ImageNet-1K. The pre-trained EfficientNetV2 encoder is then used to extract features from the images, and the final classification layer is discarded because it is not relevant to the current task.
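A minimal sketch of this feature-extraction setup with torchvision is shown below; it uses the standard pre-trained weights and their preset transforms, while the surrounding data loading and training code is omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights

# Load the ImageNet-1K pre-trained EfficientNetV2-S and its preset transforms
# (which produce 3 x 384 x 384 tensors).
weights = EfficientNet_V2_S_Weights.IMAGENET1K_V1
preprocess = weights.transforms()
encoder = efficientnet_v2_s(weights=weights)

# Discard the ImageNet classification head; keep only the feature extractor.
encoder.classifier = nn.Identity()

# Example: one preprocessed slice -> a 1280-dimensional feature vector.
dummy = torch.randn(1, 3, 384, 384)
with torch.no_grad():
    features = encoder(dummy)   # shape: (1, 1280)
```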
Only visible fractures are predicted on the current slice by masking the fracture targets with visible vertebra targets, and the visible vertebra targets are used in the loss function without any changes. The predictions and targets are then passed to the BCELoss function, which optimizes seven independent binary classification targets for C1-C7. A non-parametric model is used to combine the predictions of the underlying models to estimate the final outcome with one record per patient instead of one record per scan.
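The masking of fracture targets by vertebra visibility could be implemented as in the following sketch; the tensor names, the equal weighting of the two loss terms, and other details are assumptions about the implementation rather than a verbatim reproduction of it.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def slice_loss(frac_pred: torch.Tensor,
               vert_pred: torch.Tensor,
               frac_target: torch.Tensor,
               vert_target: torch.Tensor) -> torch.Tensor:
    """Per-slice loss over seven C1-C7 targets (all shapes (batch, 7), values in [0, 1]).

    Fracture targets are masked by vertebra visibility, so only fractures of
    vertebrae visible in the slice are penalized; the vertebra-visibility
    targets enter the loss unchanged."""
    masked_frac_target = frac_target * vert_target  # hide fractures of invisible vertebrae
    frac_loss = bce(frac_pred, masked_frac_target)
    vert_loss = bce(vert_pred, vert_target)
    return frac_loss + vert_loss                    # equal weighting is an assumption

# Example with random probabilities standing in for model outputs.
p_frac, p_vert = torch.rand(8, 7), torch.rand(8, 7)
t_frac = torch.randint(0, 2, (8, 7)).float()
t_vert = torch.randint(0, 2, (8, 7)).float()
loss = slice_loss(p_frac, p_vert, t_frac, t_vert)
```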
For example, if it is uncertain whether C3 is visible in a slice but the predicted probability that C3 is fractured is high, that prediction is still included in the final weighted aggregate. The OneCycleLR learning rate scheduler is designed to increase the learning rate for the first epochs and then gradually decrease it, following a "one cycle" pattern [24]. The learning rate schedule roughly follows a triangular wave: the learning rate starts low, gradually increases to a maximum value, and then decreases back toward the initial value.
The OneCycleLR scheduler has several advantages over other learning rate schedulers, including faster convergence, improved generalization, and reduced sensitivity to the initial learning rate. By gradually increasing and decreasing the learning rate, the OneCycleLR scheduler allows the model to quickly converge to a good solution and then fine-tune the parameters to achieve better performance. To use the OneCycleLR scheduler, the user specifies the initial learning rate, the maximum learning rate, and the total number of epochs.
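A minimal usage sketch of PyTorch's OneCycleLR scheduler is given below; the optimizer, maximum learning rate, and schedule length are placeholder values, not the settings used in our experiments.

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(1280, 7)                 # placeholder model head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

epochs, steps_per_epoch = 10, 100                # placeholder schedule length
scheduler = OneCycleLR(
    optimizer,
    max_lr=1e-3,                                 # peak learning rate
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    pct_start=0.3,                               # fraction of the cycle spent increasing the LR
)

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        optimizer.step()                         # would follow loss.backward() in real training
        scheduler.step()                         # update the learning rate every batch
```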
The exact shape of the triangular wave can be modified using additional hyperparameters, such as the percentage of the cycle used to increase the learning rate and the percentage used to decrease it. To measure the performance of the cervical spine fracture localization system, we used several evaluation metrics, including accuracy, precision, recall, and F1 score. These metrics were calculated on the validation and test sets to assess model performance on unseen data.
Results
Segmentation
- EfficientNet vertebrae detection
Based on the true positive and false positive values, we can calculate the precision of the model, which is the ratio of true positives to the total number of positive predictions made by the model. On the other hand, based on the true positive and false negative values, we can calculate the recall of the model, which is the ratio of true positives to the total number of actual positive cases.
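For reference, the following small sketch computes precision, recall, and F1 score directly from confusion-matrix counts; the counts in the example are placeholders, not results from our experiments.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # TP / all positive predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # TP / all actual positives
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Placeholder counts, not results from our experiments.
print(precision_recall_f1(tp=80, fp=20, fn=10))  # (0.8, 0.888..., 0.842...)
```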
Fracture detection
Conclusion
References
Stiell et al., "A multicenter program to implement the Canadian C-spine rule by emergency department triage nurses," Ann Emerg Med, vol.
Pinto et al., "Errors in imaging patients in the emergency setting," British Journal of Radiology, vol.
Gaillard, "Computer versus human: deep learning versus perceptual training for the detection of femoral neck fractures," J Med Imaging Radiat Oncol, vol.
Wu, "Handling Imbalanced Medical Image Data: A Deep Learning-Based Single-Class Classification Approach," Artif Intell Med, vol.
Langlotz et al., "A roadmap for foundational research on artificial intelligence in medical imaging: from the 2018 NIH/RSNA/ACR/The Academy workshop," Radiology, vol.
Hassanpour, "Deep neural networks for automatic detection of osteoporotic vertebral fractures on CT scans," Comput Biol Med, vol.
Zhou et al., "Automatic detection and classification of rib fractures on thoracic CT using convolutional neural network: accuracy and feasibility," Korean J Radiol, vol.
Ukai et al., "Pelvic fracture detection on 3D-CT using deep convolutional neural networks with multi-oriented slab images," Sci Rep, vol.
Kunst, "CT detection of cervical spine fractures using a convolutional neural network," American Journal of Neuroradiology, vol.