REVIEW
Artificial intelligence and machine learning in spine research
Fabio Galbusera | Gloria Casaroli | Tito Bassani
Laboratory of Biological Structures Mechanics,
IRCCS Istituto Ortopedico Galeazzi, Milan,
Italy
Correspondence
Fabio Galbusera, Laboratory of Biological
Structures Mechanics, IRCCS Istituto
Ortopedico Galeazzi, via Galeazzi 4, 20161
Milan, Italy.
Email: fabio.galbusera@grupposandonato.it
Artificial intelligence (AI) and machine learning (ML) techniques are revolutionizing several indus-
trial and research fields like computer vision, autonomous driving, natural language processing,
and speech recognition. These novel tools are already having a major impact in radiology, diag-
nostics, and many other fields in which the availability of automated solutions may benefit the
accuracy and repeatability of the execution of critical tasks. In this narrative review, we first pre-
sent a brief description of the various techniques that are being developed nowadays, with spe-
cial focus on those used in spine research. Then, we describe the applications of AI and ML to
problems related to the spine which have been published so far, including the localization of ver-
tebrae and discs in radiological images, image segmentation, computer-aided diagnosis, predic-
tion of clinical outcomes and complications, decision support systems, content-based image
retrieval, biomechanics, and motion analysis. Finally, we briefly discuss major ethical issues
related to the use of AI in healthcare, namely, accountability, risk of biased decisions as well as
data privacy and security, which are nowadays being debated in the scientific community and
by regulatory agencies.
KEYWORDS
artificial neural networks, deep learning, ethical implications, outcome prediction,
segmentation
1| INTRODUCTION
The last decade has seen a massive increase in the use of artificial
intelligence (AI), especially machine learning (ML) technologies, for
several applications. For example, personal assistants able to understand vocal natural language and to perform simple tasks such as retrieving information from a calendar, managing home automation devices, and placing online orders are now being used on millions of smartphones. A notable example of state-of-the-art AI is the self-driving car, which employs computer vision and other sensors to sense the surrounding environment, and automated control systems to make decisions and move without any human input.
While AI and ML are sometimes used in the generalist press as synonyms, ML constitutes only a branch of AI, the one dealing with methods to give a machine the capability to "learn," that is, to improve the performance in specific tasks, based on previous experience or on provided data.1 Although other AI branches such as symbolic reasoning, heuristics, and evolutionary algorithms have had a tremendous impact on science and technology,2 ML arguably constitutes the most interesting and promising field of AI for applications in medical research (Figure 1).
ML is based on the availability of data, which is used to train the
machine to perform the desired tasks. Due to its nature, ML lends
itself well to applications in which input data are used to generate an
output based on some features of the inputs themselves, for example,
to perform image classification. Indeed, a research area which was
dramatically advanced by ML in recent years is image processing.
Thanks to continuous technical improvements, in 2015 a deep neural network achieved for the first time superhuman performance in a famous image classification contest, the ImageNet Large Scale Visual Recognition Challenge.3 Computers can nowadays perform tasks such as image classification, object detection (eg, face detection and recognition), and landmark localization better than expert human operators. Although the deployment of such powerful technologies to medical imaging is still in its infancy, radiologists generally agree that ML is a truly disruptive technology which can deeply transform how
imaging data are interpreted and exploited for treatment planning and follow-up.4 The impact of ML and AI on other basic medical research fields has been less conspicuous so far; nevertheless, numerous novel applications, for example, in motion analysis and mechanical characterization of tissues, are starting to emerge.
As testified by the sharp increase in the number of published papers in recent years, AI and ML are increasingly being used to investigate issues related to the spine, especially in radiological imaging but also in other fields such as the outcome prediction of treatments. The reported results are either promising or already surpassing the previous state of the art in several applications; for example, ML techniques nowadays allow for an accurate and perfectly repeatable grading of intervertebral disc degeneration on magnetic resonance imaging (MRI) scans. Indeed, the current pace of technical improvements is expected to bring further benefits in the near future.
With this narrative literature review, we aim at raising the aware-
ness of the current achievements and potential spine-related applica-
tions of AI in the spine science community, including readers working
in different fields who are not familiar with the technical aspects of
such technologies. To this aim, the paper first presents a brief general
overview of AI, with special emphasis on ML and its recent advances
which are having a practical or potential impact on spine research.
The following paragraphs describe the state of the art of the use of
ML and AI in spine science, including diagnostic spine imaging, the
prediction of the outcome of therapeutic interventions, clinical deci-
sion support systems, information retrieval, biomechanical analysis
and characterization of biological tissues, and motion analysis.
2| HISTORICAL PERSPECTIVE
The first steps toward AI date back to the development of general purpose computers, which were pioneered during the Second World War and became available for nonmilitary use in the 1950s. The newly available computing power allowed creating symbolic AI programs, that is, algorithms that apply a set of rules in order to imitate reasoning and to draw decisions.2 Notable examples of such programs are those aimed at checkers1 and chess gaming, which achieved very good performances already in the 1970s,5 and the first chatbots, which could simulate to some extent a conversation in natural language.6 In parallel, taking advantage of the recent advances in neurological research which showed that the central nervous system consists of a large network of units communicating via electric signals, research groups started developing the so-called artificial neural networks (ANNs), that is, networks of artificial neurons mimicking the brain structure (Figure 2A),7 by means of analog systems.8 These networks, such as, for example, the perceptron,9 proved able to perform simple logical functions and to recognize classes of patterns, although with significant limitations.10
After the first two decades of research, there was a succession of phases of general skepticism (the so-called "AI winters"), mainly due to an underestimation of the complexity of the problems to be solved and a lack of the necessary computer power, and optimistic phases with larger funding and technological breakthroughs.2 In the 1980s, expert systems, that is, computer programs able to deal with practical problems based on sets of rules derived from human expert knowledge, were successfully employed in several research and industrial fields. In the same years, ANNs were revamped by the development of backpropagation,11 a powerful training algorithm which is still the basis for their use nowadays.
In the last two decades, the increase in computer power and its improved accessibility even for small research institutes, made possible by graphics processing units (GPUs) with tremendous parallel computing capabilities, fostered the adoption of AI solutions for many practical applications.12 While the achievement of strong AI, that is, a computer program with a flexible intelligence which can perform any task feasible for humans, remains out of the foreseeable future, narrow AI, that is, a machine able to apply AI only to a specific problem, has found widespread use. Internet search engines and speech recognition software are good showcases of the huge potential of the recent advances.
One of the branches of AI which is seeing the fastest improvements is deep learning12 (Figure 2B). In most implementations, this ML method is based on deep neural networks, that is, network architectures with several layers, and is revolutionizing research fields such as image processing, voice recognition, and natural language processing. In addition to the improved computer power, a key driver for the success of deep learning was the availability of big data, massive datasets collected from various sources, including the Internet and medical institutions (eg, imaging databases), which are extremely valuable for an effective exploitation of deep learning in practical applications. As a matter of fact, most of the scientific papers applying AI to spine research, which are described in the paragraph "Applications of AI and ML in spine research," are based on deep learning.

FIGURE 1 Schematic overview of the main branches of artificial intelligence (AI), including machine learning (ML) methods which are having an impact on spine research
3| MACHINE LEARNING
The expression "ML " was introduced by Arthur Samuel in 1959, who
defined it as the field of study that gives computers the ability to learn
without being explicitly programmed.
1
This paragraph summarizes the
main concepts of ML, which are presented in deeper details
elsewhere.
13,14
The general aim of ML is to make a prediction, that is, to estimate
the value of a desired output given an input, based solely on features
provided by the model developer or automatically learned from train-
ing data. More specifically, common applications of ML include:
(a) Classification: the input is assigned to a specific category among a group of two or more. An example of binary classification is the automated diagnosis of cancer based on histopathological images, in which the machine should decide if an image shows features (eg, texture and color information) depicting a pathological condition. The automation of Pfirrmann grading for disc degeneration exemplifies a multiclass classification problem, in which an MRI scan of the disc should be assigned to a category ranging from 1 (healthy disc) to 5 (severe disc degeneration).15 Image segmentation, in which each pixel is labeled based on its belonging to a specific region or anatomical structure, can also be considered as a subclass of classification problems.
(b) Regression: the output of the task is continuous rather than discrete. An example of a regression problem is the determination of the coordinates of an anatomical landmark in a radiographic image.
(c) Clustering: the provided inputs are divided into groups, based on features learned from the inputs themselves. Cluster analysis is used to classify data when no a priori knowledge about the belonging to a specific class is available. Clustering has been used, for example, to subdivide patients suffering from osteoporotic vertebral fractures into groups based on pain progression.16 A brief illustrative sketch of these three types of tasks is given below.
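As an illustration only, not drawn from any of the cited studies, the following minimal Python sketch shows the three types of tasks on synthetic data with scikit-learn; the features, labels, and variable names are hypothetical placeholders.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # two generic features per subject (hypothetical)

# (a) Classification: predict a discrete label (eg, pathological vs healthy)
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)
print(clf.predict(X[:5]))

# (b) Regression: predict a continuous value (eg, a landmark coordinate)
y_reg = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)
reg = LinearRegression().fit(X, y_reg)
print(reg.predict(X[:5]))

# (c) Clustering: group the inputs without any ground truth labels
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])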
Another way to describe the different forms of ML is based on
the nature of the tasks to be performed:
(a) Supervised learning: the machine learns to predict the output based on a collection of inputs for which the correct output (ground truth) is known. In most implementations, supervised learning consists in learning the optimal manner to map the inputs to the outputs, by minimizing the value of a loss function representing the difference between the machine predictions and the ground truth. It is the most common type of learning used in medical research.
(b) Unsupervised learning: the machine learns from input data for which there is no ground truth. This type of learning task identifies patterns and features in the inputs, with the aim of extracting new knowledge from the available data. Clustering is an application of unsupervised learning.
(c) Reinforcement learning: instead of having ground truth data available at the beginning of the task, feedback about the correctness of the execution is provided after the task has been completed, thus acting like a reward or a punishment. Reinforcement learning is typically used in dynamic or interactive environments, for example, in gaming. Clinical decision-making is rapidly gaining interest as another field of application. Models of reinforcement learning are also valuable tools for the investigation of how nonhuman animals and humans learn the causal structure of tasks and phenomena.
Regardless of the task to be performed, the availability of large
datasets to be used for training the algorithm and to test its accuracy
is essential for a successful implementation of ML. Especially in medi-
cal research, this requirement poses serious challenges related to data
privacy, ethics, regulation, and liability, which are described in
Section 6.
4| METHODS USED IN SUPERVISED LEARNING
The next paragraphs provide a brief summary of the methods used for supervised learning, which play a cardinal role among the ML tasks in medical research, and are described in detail elsewhere.13 The concept of supervised learning is based on the estimation of a function which maps an input, which can be, for example, an image or a collection of clinical data regarding a patient, to an output value. The training data therefore consist of a set of pairs including an input and the relative output, which is known. When the mapping function has been determined, it can be used to process new inputs for which the value of the output is not available. If the number of training examples is sufficient and an appropriate learning algorithm has been chosen, the algorithm itself should be able to generalize well, that is, to provide accurate results for inputs similar to, but different from, those included in the training data. Conversely, the predictions may reveal overfitting, that is, results fitting precisely the training data but not able to make accurate predictions on additional data, or underfitting, which happens when the learning model is not sufficiently complex to capture the features of the input data17 (Figure 3).

FIGURE 2 Schematic representation of an artificial neural network (A), a deep network (B), and a unit, also called artificial neuron (C). In each unit, the inputs ("x1,3") are multiplied by weights ("w1,3"), summed to a bias term ("+t"), and the total sum is processed by a linear or nonlinear activation function ("φ")

FIGURE 3 Examples of a plausible good fitting (left), underfitting (center), and overfitting (right) in a binary classification task
4.1 | Methods derived from statistics
Although considering linear regression in the realm of ML might be
counterintuitive, it constitutes a good example of a simple method to
create a function which maps an input (a number, or more frequently
a vector of numbers) to an output. Indeed, any form of input can be
mathematically formulated as a multidimensional vector of numbers,
conventionally named features, which can be processed by linear regression. Features are a set of variables which characterize the data, and can be either simple and human readable (such as, eg, age and sex of a patient) or more difficult to interpret, such as the image features extracted with specialized algorithms like SIFT18 and ORB,19 or with texture and shape analysis. Even without feature extraction, an image such as a radiograph can also be viewed as an array of integer numbers with length equal to the number of pixels in the image; each element of the array would contain the color (gray level) of the specific pixel. From this perspective, the application of linear regression even in the case of complex and large inputs is straightforward.
The linear regression function is commonly fitted by means of the least squares method, which therefore acts as the learning algorithm. In this case, performing a linear regression corresponds to minimizing the mean square error (MSE) between the predictions and the known outputs; MSE therefore represents the loss function of the algorithm. In the ML literature, MSE is also commonly referred to as the L2 loss, whereas the L1 loss is the mean absolute error (MAE), which can also be an effective choice for regression problems. Due to its simplicity and its inherent incapability of capturing nonlinear behavior, linear regression is prone to underfitting, and therefore is not the method of choice for complex ML regression tasks.
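The following minimal NumPy sketch, an illustration rather than an implementation from the reviewed literature, fits a linear regression by least squares on synthetic data and evaluates both loss functions mentioned above; all names and values are hypothetical.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                  # synthetic feature vectors
y = X @ np.array([0.5, -2.0, 1.0]) + 0.3 + rng.normal(scale=0.2, size=200)

# least squares fit (a column of ones provides the intercept term)
Xb = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

y_pred = Xb @ w
mse = np.mean((y - y_pred) ** 2)               # L2 loss (MSE)
mae = np.mean(np.abs(y - y_pred))              # L1 loss (MAE)
print("MSE:", mse, "MAE:", mae)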
Logistic regression can be seen as the equivalent of linear regression for classification problems. In its simplest form, the inputs (one or multiple continuous numbers) are fitted to a binary output (0 or 1) by means of a nonlinear curve, the logistic sigmoid function, which represents the probability that an input is mapped to the "1" output. If the output probability is greater than or equal to 0.5, a "1" is predicted; otherwise, the output is "0." In addition to predicting binary outputs, logistic regression can be effectively generalized to multiclass classification problems. MSE is not the most appropriate choice of loss function for logistic regression; specialized functions such as the cross entropy are employed in this respect. Similar to linear regression, logistic regression is outperformed by more complex algorithms in most ML classification tasks.
Another method derived from statistical inference which has found its place in the ML literature is the Bayes classifier,20 which is based on Bayes' theorem of conditional probability. The naive Bayes classifier, which assumes the independence of the features from each other, is especially simple to implement, fast to train even for very large training datasets, and potentially very effective in tasks where the assumption of feature independence is reasonable. In spine research, Bayes classifiers have been used for the classification of vertebral fractures21 and for computer-aided diagnosis.22
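As a purely illustrative sketch, with hypothetical data and feature names unrelated to the cited fracture and diagnosis studies, logistic regression and a Gaussian naive Bayes classifier can be trained and compared with scikit-learn as follows; cross entropy (log loss) serves as the metric for the logistic model, as discussed above.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, log_loss

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))          # four generic clinical features (hypothetical)
y = (X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

logreg = LogisticRegression().fit(X, y)
nb = GaussianNB().fit(X, y)            # assumes feature independence

print("logistic regression accuracy:", accuracy_score(y, logreg.predict(X)))
print("logistic regression cross entropy:", log_loss(y, logreg.predict_proba(X)))
print("naive Bayes accuracy:", accuracy_score(y, nb.predict(X)))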
4.2 | Support vector machines
Considering each input belonging to the training data as a multidimen-
sional vector and therefore as a point in a multidimensional space,
performing a classification task corresponds to determining a partition
of the space which divides the points belonging to the various classes.
A support vector machine (SVM) is an algorithm which builds the hyperplane, or a number of them, which can divide the space so that the points of the different classes are effectively and optimally partitioned23 (Figure 4).
SVMs are powerful tools to perform multiclass linear classification tasks, including image segmentation. Although the original publication of the method dates back to 1963,24 SVMs are still widely used nowadays and may outperform the most recent techniques in specific cases, for example, when the dataset available for training has a limited size. In spine science, SVMs have been used, for example, for the grading of disc degeneration25 and for the classification of scoliosis curve types.26 SVMs can be adapted to nonlinear classification and regression problems, as well as to unsupervised learning (eg, for clustering).

FIGURE 4 Schematic representation of a simple support vector machine (SVM) used for binary classification. In brief, the SVM builds the optimal hyperplane (in green) which separates the two classes maximizing the gap between them. A non-optimal hyperplane (in orange) which correctly separates the two classes, but with a smaller gap, is also shown. The SVM operates in the feature space ("x1" and "x2" in the exemplary figure)
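A minimal sketch of a linear SVM in a two-dimensional feature space, in the spirit of Figure 4, might look as follows; it uses scikit-learn on synthetic data, and all values are hypothetical.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
class0 = rng.normal(loc=[-1.5, -1.5], scale=0.7, size=(50, 2))
class1 = rng.normal(loc=[1.5, 1.5], scale=0.7, size=(50, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="linear", C=1.0).fit(X, y)   # maximal-margin separating hyperplane
print("coefficients:", svm.coef_, "intercept:", svm.intercept_)
print("prediction for (0.5, 0.5):", svm.predict([[0.5, 0.5]]))
# a nonlinear problem could be addressed by changing the kernel, eg kernel="rbf"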
4.3 | Classification and regression decision trees
The use of tree-like structures in AI dates back to the pioneering checkers programs by Arthur Samuel.1 Decision trees were first employed for classification and regression purposes already in the 1950s.27,28 Nowadays, decision trees are valuable support tools in various fields including economics and the military; notably, they are commonly used for the choice of the most appropriate medical treatment in health care.
In ML, a classification and regression decision tree (CART) links the values of the features to the possible outputs, therefore implementing a classification or a regression task, by means of a set of conditions.29 For each condition, the tree splits into branches, which end with terminal nodes representing the outcome of the decision; due to this peculiar structure, CARTs are easier for humans to understand with respect to other ML techniques. CARTs can be trained on large sets of input data by means of specialized algorithms,30,31 which are generally not computationally intensive and thus suitable for very large datasets. Regarding downsides, CARTs are prone to overfitting, which can be limited by using special techniques such as pruning, which reduces the size of the tree, and random forests,32 which exploit multiple decision trees built on random subsets of the features and average their predictions. CARTs and random forests have been used for several applications in spine research. As clinical decision support systems, decision trees have been used for the management of low back pain33 and for the preoperative selection of patients with adult spinal deformity.34 Other applications include the evaluation of the primary fixation strength of pedicle screws35,36 (Figure 5), and the prediction of proximal junctional failure.37

FIGURE 5 Example of a decision tree trained to predict the risk of failure of pedicle screws. Reproduced with permission from Varghese et al36
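As an illustration only, with hypothetical predictors that do not reproduce the cited pedicle screw or deformity models, a classification tree and a random forest can be trained and inspected with scikit-learn as follows.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
# hypothetical predictors, eg bone density, screw diameter, insertion depth
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.5 * X[:, 1] - 0.7 * X[:, 2] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)   # a shallow tree limits overfitting
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# the tree structure is human readable, one of the strengths of CARTs
print(export_text(tree, feature_names=["density", "diameter", "depth"]))
print("forest predictions:", forest.predict(X[:5]))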
4.4 | Artificial neural networks
ANNs constitute the branch of ML which has seen the most impres-
sive improvements in recent years, so much that it has been identified
by the general public with ML itself. Applications of ANNs in medical
research as well as in spine science are countless, and are described in
detail in the paragraph " Applications of AI and ML in spine research."
ANNs are biologically inspired networks which loosely resemble how neurons are connected and interact in the brain.7 Mimicking the principles of Hebbian learning,38 information flows from the inputs to the outputs through artificial neurons, which are organized in layers and perform simple operations such as making linear combinations of their inputs multiplied by a weight, and then processing the result through a linear or nonlinear activation function (Figure 2C). The networks may include regularization terms, which are aimed at reducing the risk of overfitting by penalizing large values of the weights through a penalty coefficient. Training the ANN consists in finding the optimal values of the weights, so that the inputs belonging to the training data are processed and transmitted through the layers, resulting in outputs which fit the ground truth well.
The same loss functions described in the previous paragraphs, that is, MSE, MAE, and cross entropy, are commonly used to train ANNs and as metrics for their performance. In its simplest implementation, the training algorithm, named backpropagation,11 consists in calculating the derivatives of the loss function with respect to each weight, and adjusting the specific weight by the value of the respective derivative multiplied by a coefficient, the learning rate. Iterating the process determines a decrease of the loss function, which would reach a minimum after convergence has been achieved. This gradient descent algorithm has been superseded by more sophisticated methods, such as, for example, the stochastic gradient descent39 and Adam,40 which can generally achieve a faster and more robust convergence.
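A didactic NumPy sketch of backpropagation with plain gradient descent for a tiny one-hidden-layer network is given below; it illustrates the principle only, on hypothetical data, and is not the training procedure of any cited study.

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = ((X[:, 0] ** 2 + X[:, 1] ** 2) < 1.0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros((1, 8))   # hidden layer
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros((1, 1))   # output layer
lr = 0.5                                                          # learning rate

for epoch in range(2000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    loss = np.mean((p - y) ** 2)                                  # MSE loss

    # backward pass: derivatives of the loss with respect to each weight
    dp = 2 * (p - y) / len(X) * p * (1 - p)
    dW2 = h.T @ dp
    db2 = dp.sum(axis=0, keepdims=True)
    dh = (dp @ W2.T) * h * (1 - h)
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0, keepdims=True)

    # gradient descent step: weight -= learning rate * derivative
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final MSE:", loss, "training accuracy:", np.mean((p > 0.5) == y))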
ANNs are used in several industrial and research fields, for both classification and regression problems. Although the applications of ANNs in spine research are mostly based on supervised learning, these networks are also proficiently employed for unsupervised tasks and reinforcement learning. Starting from earlier examples such as the single-layer perceptron,9 a simple linear binary classifier consisting of a single layer of outputs directly connected to the inputs via a series of weights, high-performance network architectures which are optimized to deal with specific problems have been developed. For example, ANNs are nowadays used to generate new data which share some characteristics with known data by means of so-called generative models,41 and to process data keeping memory of previous inputs, for example, with recurrent neural networks.42 The latter methods found widespread use in speech recognition and automated language translation.
4.5 | Convolutional neural networks
Image processing, as well as computer vision in general, is arguably the largest field of application of ANNs. The design of convolutional neural networks (CNNs or ConvNets) has been inspired by the structure of the animal visual cortex, based on experiments carried out in cats and monkeys.43,44 In the 1960s, Hubel and Wiesel described that specific groups of neurons in the visual cortex are stimulated only by small areas of the visual field, and extract features and information from those areas. Specific groups of neurons are sensitive to features such as a certain edge orientation, and others to other directions or shapes. Visual perception then results from combining the information coming from the neuron groups and exploiting information about their architecture.
CNNs mimic rather closely such a neuronal architecture.45,46 In a convolutional layer, which is the characterizing component of a CNN, a small filter (most commonly having a size of 3 × 3 × 3 or 5 × 5 × 3) slides, or convolves, over the input image; for each possible position in the image, a number is calculated by element-wise multiplication of the weights of the filter by the corresponding values of the input of the layer. The collection of all calculated numbers constitutes the so-called activation map (Figure 6). Since a typical convolutional layer consists of several filters, the convolution process results in a three-dimensional matrix, each layer of which is an activation map. Convolutional layers are usually combined with pooling layers,47 which downsample the data and help in reducing the risk of overfitting, and dense (fully connected) layers, which are the standard nonconvolutional layers used in ANNs, to generate an output and thus to perform a classification or a regression task. Dropout layers, in which a predefined fraction of artificial neurons is artificially canceled, force the network to learn different ways of achieving the same output and are frequently integrated in CNNs to reduce the risk of overfitting. Training the convolutional layers consists in finding the optimal values of the weights of the filters, and is performed by means of optimization algorithms similar to those used for standard ANNs.48

FIGURE 6 Schematic representation of a convolutional neural network (CNN), here exemplarily aimed at performing the grading of disc degeneration on T2-weighted MRI scans based on the scheme presented by Pfirrmann et al.15 In a convolutional layer, a small filter convolves over the data, creating a series of activation maps; these maps can be downsampled by pooling layers, and then processed by another convolutional layer. In the simplest forms of a CNN, one or more fully connected layers perform the final classification or regression decision
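A minimal sketch of a small CNN in the spirit of Figure 6, written with the Keras API of TensorFlow, is shown below; the input size, number of filters, and the five-class output (Pfirrmann-like grades) are illustrative assumptions, not the architecture of any cited study.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),                # eg, a cropped disc image (hypothetical size)
    layers.Conv2D(16, (3, 3), activation="relu"),   # convolutional layer -> activation maps
    layers.MaxPooling2D((2, 2)),                    # pooling layer: downsampling
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),                            # dropout reduces the risk of overfitting
    layers.Dense(64, activation="relu"),            # fully connected layer
    layers.Dense(5, activation="softmax"),          # five output classes
])

# cross entropy loss and the Adam optimizer, as discussed in the text
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()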
4.6 | Deep learning
In simple terms, deep learning is the branch of ML which employs methods involving multiple layers of processing units, with the final aim of being able to capture different levels of abstraction. Practically, deep learning is most commonly based on the use of multilayer ANNs, commonly referred to as deep neural networks. Although such ANNs with several layers were developed in conjunction with CNNs and had been available already in the 1970s, they never gained widespread use due to the computational resources required for training and the lack of effective learning algorithms. In 1989, the research group of LeCun introduced the first of a family of networks, LeNet-1, which featured two convolutional layers and two pooling layers and could be trained with standard backpropagation.49,50 LeNet-1 scored state-of-the-art results in an image classification task; later developments, notably LeNet-5, showed that increasing the depth of the network, that is, adding layers, could drastically improve the accuracy of the predictions.51 These pioneering studies, together with the improved accessibility of computer power, opened the way to deep learning, which is nowadays considered the most advanced frontier in ML. It should be noted that deep learning architectures are not only based on ANNs and aimed at computer vision, but also cover other domains, such as, for example, the deep Boltzmann machines commonly used for making music and movie recommendations on the Internet, and deep recurrent neural networks for speech recognition and natural language understanding.49
Recent developments of deep architectures are continuously raising the bar in image classification tasks. In 2012, AlexNet,52 a CNN having five convolutional layers followed by three dense layers, won several competitions and demonstrated that deep CNNs have more potential for computer vision than any other current ML technique. Among the various designs that were introduced afterward, some are worth mentioning. The Visual Geometry Group (VGG) architecture was developed at the University of Oxford and is a large network with 138 million trainable parameters, 13 convolutional layers, and two dense layers.53 GoogLeNet, introduced in 2014, has 22 layers (including nine Inception layers, a novel design) but a smaller number of parameters (11 million), benefiting the computational resources necessary for training.54 The ResNet family of networks, presented by Microsoft in 2015, features a large number of layers, up to 152, none of which is fully connected.3 ResNet was the first architecture to achieve superhuman performance in image classification; its foundational innovation, the concept of residual learning, that is, skipping layers in order to make the deep network easier to train, is still exploited in many of the most recent architectures.
A key driver for the widespread diffusion of deep learning is its
easy accessibility. In the spirit of knowledge sharing and cooperative
work which characterizes computer science and is gaining momentum
also in other fields, the vast majority of the recently developed algo-
rithms are publicly available on the Internet. ML frameworks such as
Torch (http://torch.ch/), Tensorflow (https://www.tensorflow.org/),
and Caffe (http://caffe.berkeleyvision.org/), as well as high-level librar-
ies such as Keras (https://keras.io/) and PyTorch (https://pytorch.org/)
are also freely available, even for commercial use.
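As an example of how such freely available software can be combined with a published architecture, the sketch below loads a ResNet50 backbone with pretrained ImageNet weights through the Keras API and attaches a new two-class head; the task, input size, and head are illustrative assumptions only.

import tensorflow as tf
from tensorflow.keras import layers, models

backbone = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False                      # keep the pretrained filters fixed

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax"),      # new task-specific classification head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()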
Together with the improved accessibility of powerful GPUs and
of cloud computing platforms offering AI products and services, the
availability of state-of-the-art deep learning software is fostering its
use in a wide range of research fields. Although the adoption of deep
learning for real-world problems in spine science is still limited by the
short time passed since its first introduction, we expect it to become a
disruptive technology in the near future, especially for spine imaging
applications.
4.7 | Assessing the accuracy and robustness of ML
tools
Before any ML tool can be used to address practical problems and deployed to industrial or research environments, its accuracy and robustness need to be proven by performing a proper validation. To do so, in supervised learning, the available data are typically split into two or three datasets, which serve different purposes.55 The first one is the training dataset in the strictest sense of the word, which includes the majority of the available data (typically around 70%-80%) and is actually used to train the model, that is, to calculate the weights of the artificial neurons in the case of ANNs. The second set is named the validation dataset and is aimed at tuning the model hyperparameters, such as learning and dropout rates, penalty coefficients in regularization terms, or even the number of units or layers, in order to improve the model fit on the training data. The validation dataset might not be present in the simplest ML implementations, when all hyperparameters have been defined by the developer prior to training. The last set is the test dataset, which includes data which have not been seen by the model, that is, neither used for learning the weights nor for tuning the hyperparameters, and which therefore allows for an unbiased assessment of the model accuracy and robustness. The test dataset should be used only when the model is completely trained; if modifications to the model architecture or hyperparameters are performed after testing, for example, to further improve the accuracy or to reduce overfitting, a new test should be performed on another set of data which has not been seen previously by the model.
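In practice, such a split can be obtained, for example, with scikit-learn as in the following sketch (a rough 70/15/15 partition on placeholder data; the proportions are illustrative assumptions).

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 10))                 # placeholder inputs
y = rng.integers(0, 2, size=1000)               # placeholder ground truth

# first split off the test set, then carve the validation set out of the remainder
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.15 / 0.85, random_state=0, stratify=y_train)

print(len(X_train), len(X_val), len(X_test))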
For a proper assessment of the model performance, it is critical that training, validation, and test datasets do not overlap. Besides, selection bias should be avoided when creating the three datasets from the available data; all sets should be equally representative samples of the data of interest. The quality of the ground truth data is a further issue of utmost importance, especially when the size of the database is limited; noisy ground truth would result in outputs which are inaccurate to some extent, depending on the amount of data available.56 As a matter of fact, there is no precise rule to estimate the minimum size of the training database needed for a good performance of the model. Heuristic methods as well as naive guesses are sometimes used for this purpose; a more comprehensive evaluation requires training the model on databases of different sizes and creating a learning curve representing accuracy vs data size. An estimation of the minimum required size can then be extrapolated from the curve.
Test automation is currently widely used in software engineering to execute a large number of tests in a controlled and formalized environment, by means of specifically designed software. Although this technology still has to find its place in the rapidly evolving field of ML, especially regarding medical applications, its adoption for model validation is easy to foresee in the near future. A prerequisite for such an advance is the definition of standardized datasets, which shall be used to perform quantitative comparisons between different models.
The validation process may reveal either underfitting, which
results in poor performance of the model on all the three datasets, or
overfitting, which can be detected when good accuracies are achieved
on the training data, but the unbiased evaluation on the test dataset
reveals a poor outcome. Whereas addressing underfitting typically
involves increasing the complexity of the model, overfitting can be
remediated by means of specific techniques such as pooling and drop-
out layers or regularization, as mentioned above, or by simplifying the
model architecture.
5| APPLICATIONS OF AI AND ML IN SPINE
RESEARCH
AI technologies are having a major impact in several research fields
related to the spine, which is expected to further increase in the
future. In the following paragraphs, we summarize the published appli-
cations of AI and ML in various domains of spine research, such as
diagnostic imaging, prediction of treatment outcomes, and decision
support systems. Applications more closely related to basic science
such as biomechanics and motion analysis are covered as well.
5.1 | Localization and labeling of spinal structures
ML approaches have been employed to extract information such as
the location of vertebrae, discs and spinal shape from radiological
images like planar radiographs, computed tomography (CT) and MRI
scans. As a matter of fact, localizing anatomical structures in an imag-
ing dataset is commonly a first step toward the development of fully
automated methods for the detection and classification of pathologi-
cal features, or to predict the outcome of therapies.
In addition to methods not strictly related to ML, based, for example, on thresholding and heuristic search,57,58 proper ML techniques have been used for localization tasks. Schmidt used a classification tree to generate a probability map of the location of each intervertebral disc centroid in MRI scans, which was then used by a probabilistic graphical model to infer the most likely location, resulting in an average localization error of 6.2 mm with respect to a human-created reference.59 Oktay and Akgul trained an SVM for disc localization based on a feature descriptor, the pyramidal histogram of oriented gradients, obtaining mean localization errors ranging between 2.6 and 3.6 mm depending on the disc level.60 In simple words, the method was based on a sliding window, that is, a rectangular region which slides over a multiscaled version of the original image; for each position of the window, the value of the feature descriptor is calculated and passed as input to the SVM to determine if the current window contains an intervertebral disc. When a set of the most likely disc locations has been calculated, a graphical model is used to infer the position of each specific disc. The same authors expanded and improved the method to also allow localizing the vertebrae, achieving average errors lower than 4 mm.25 Glocker et al confronted the challenging topic of localization of vertebrae in CT datasets of pathological spines, including severe scoliosis, sagittal deformity, and presence of fixation devices, obtaining mean localization errors between 6 and 8.5 mm61,62 (Figure 7). The proposed method was based on classification random forests trained to determine the location of the vertebral centroid, and employed novel techniques to generate appropriate training data and to eliminate false positive predictions.
More recently, ANNs and deep learning were also employed for the localization of spinal structures. Chen et al used a hybrid method involving a random forest classifier which performs a first coarse localization used to drive a deep CNN63,64; this approach allowed for a clear improvement with respect to the previous state of the art not based on deep learning,62 that is, average localization errors for the centroid of the intervertebral disc of 1.6 to 2 mm. The same research group also used CNNs, both based on a 2D convolution, that is, processing the single slices separately, and on a novel 3D convolutional layer.65 Suzani et al used a six-layer neural network to localize the vertebral centroids by means of a regression task: for each voxel in the dataset, the network voted for the vector connecting the voxel itself to the centroid. The votes were then used to statistically estimate the most probable location of the vertebral centroid.66
An alternative approach was presented by Payer et al, who used 2D and 3D CNNs to build regression heatmaps of the landmark locations67; the method was, however, not applied to spine images. In several papers, after a satisfactory localization of the vertebral or disc centroids had been achieved, the labeling task was performed by fitting a graphical model.68,69

FIGURE 7 Examples of localization of the vertebral centroids from a literature study,61 dealing with different types of CT images (from left to right: standard, low resolution, noisy, cropped). Manual annotations by an expert operator are shown in yellow, whereas the computer predictions are in red. The numbers indicate the mean absolute error (MAE) with respect to the manual annotations. Reproduced with permission from Glocker et al61
Recent works achieved high accuracies with complex models able to perform the localization of landmarks and vertebral centroids by taking as input the whole 3D dataset, without any preliminary coarse localization or sliding window approach. Yang et al were able to achieve localization errors for the vertebral centroid between 6.9 and 9 mm in CT scans of patients suffering from various pathologies as well as subjected to surgical instrumentation, with strongly variable fields of view as well as image resolution.70
As a matter of fact, state-of-the-art techniques for localizing and labeling spinal structures have achieved high performance, comparable to that of expert human observers. Detection and labeling functions are nowadays already integrated in commercial Picture Archiving and Communication Systems and commercially available clinical imaging software, although technical details about those have not been publicly disclosed.
5.2 | Segmentation
A key problem in image analysis is understanding the content of the image, that is, subdividing the image into regions at a pixel level so that each pixel belongs to a specific region. This process is named semantic segmentation and can be conducted either manually or automatically; this topic has been the subject of a vast body of literature, since it is fundamental for applications such as computer vision and autonomous driving.71 In medical imaging, in addition to identifying if a pixel belongs, for example, to a disc, the segmentation algorithm should typically determine to which specific instance it belongs (eg, either L1-L2 or L2-L3). This type of segmentation is named instance segmentation, and is the most relevant for spine research.72
Assessing the quality of a segmentation algorithm involves the definition of quantitative metrics, which might be less intuitive than the localization error employed in localization tasks. Among the several metrics which have been introduced in previous studies, the most common ones are the Dice similarity coefficient (DSC), which expresses the amount of spatial overlap between the segmented image and the ground truth, and the mean surface distance (MSD), which describes the mean distance of every surface voxel of the segmented region from the closest surface voxel in the ground truth.
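A didactic NumPy/SciPy sketch of both metrics for binary masks is given below; it is an illustration only (the MSD is computed symmetrically here) and not the implementation used in the cited papers.

import numpy as np
from scipy import ndimage

def dice(seg, gt):
    seg, gt = seg.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(seg, gt).sum() / (seg.sum() + gt.sum())

def mean_surface_distance(seg, gt, spacing=1.0):
    seg, gt = seg.astype(bool), gt.astype(bool)
    # surface voxels = voxels removed by a one-voxel erosion
    seg_surf = seg & ~ndimage.binary_erosion(seg)
    gt_surf = gt & ~ndimage.binary_erosion(gt)
    # distance of every voxel to the other surface
    dist_to_gt = ndimage.distance_transform_edt(~gt_surf, sampling=spacing)
    dist_to_seg = ndimage.distance_transform_edt(~seg_surf, sampling=spacing)
    return 0.5 * (dist_to_gt[seg_surf].mean() + dist_to_seg[gt_surf].mean())

# toy example: two slightly shifted spheres in a small volume
zz, yy, xx = np.mgrid[:40, :40, :40]
gt = (zz - 20) ** 2 + (yy - 20) ** 2 + (xx - 20) ** 2 < 100
seg = (zz - 22) ** 2 + (yy - 20) ** 2 + (xx - 20) ** 2 < 100
print("DSC:", dice(seg, gt), "MSD (voxels):", mean_surface_distance(seg, gt))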
Many papers introduced methods for spine segmentation not involving ML techniques, which in several cases required the intervention of the user73–75; fully automated methods were described as well.76 Other methods relied on fitting deformable anatomical models to the images by means of optimization procedures.76–78 Among the many published techniques, the ones based on graphs and normalized cuts were especially successful,79,80 as well as methods derived from them.81–83 For example, by using normalized cuts, Ayed et al79 achieved DSC values of 0.88 and an MSD of 2.7 mm. Marginal space learning assumes that the pose and shape of the object to be segmented are quantized in a number of parameters.84,85 A large number of hypotheses covering the parameter space, that is, describing all the possible poses of the object, are then formulated; the best hypothesis is selected by means of a classifier.
In recent years, CNNs specifically designed for instance segmentation tasks were employed. Chen et al65 used a deep CNN including 3D convolutional layers to generate the probability of belonging to a specific region at the voxel level. Postprocessing techniques including thresholding and smoothing were used to refine the segmentation. Lessmann et al86 introduced a 3D CNN with a memory component in order to remember which vertebrae were already classified. In order to be able to process large datasets, the technique uses a 3D sliding window approach which first determines the position in which the window contains an entire vertebra, and then performs the pixel-level segmentation with a deep classifier. The memory is then updated so that, if a portion of the already segmented vertebrae is detected while looking for the next ones, it is ignored. This method allowed achieving outstanding accuracies, with an average DSC of 0.94 and an MSD of 0.2 mm.
Although promising results have been achieved, the segmentation of the anatomical structures of the spine still appears to have large room for improvement. Indeed, spine segmentation challenges have been proposed even very recently (Computational Methods and Clinical Applications for Spine Imaging (Figure 8), xVertSeg (http://lit.fe.uni-lj.si/xVertSeg/overview.php)),87,88 and databases hosting annotated images to be used for the development of new segmentation methods are currently publicly available (http://spineweb.digitalimaginggroup.ca/spineweb/).

FIGURE 8 Five automated segmentation methods for CT scans developed in the frame of the grand challenge organized by the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) Workshop on Computational Spine Imaging (CSI 2014). Reprinted with permission from Yao et al87
5.3 | Computer-aided diagnosis and diagnostic
imaging
The use of ML for diagnostic purposes dates back to the 1980s. In 1988, Bounds et al89 trained a multilayer perceptron to diagnose low back pain and sciatica, with reported accuracies ranging between 77% and 82%, better than those obtained by human medical doctors (68%-76%). Symptoms and previous medical history, in a standardized form, were used as training data; as output, the ANN classified the back pain into four categories, namely simple back pain, radicular pain, spinal pathology (tumor, inflammation, or infection), and back pain with significant psychological overlay. More recently, most papers exploited the availability of imaging data to perform the automated diagnosis of a spinal disorder. Nowadays, the use of ML for diagnostic imaging of the spine encompasses several types of disorders, such as degenerative diseases, spinal deformities as well as oncology.
Similar to the detection and segmentation of spinal structures, the first published works about computer-aided diagnosis based on medical imaging employed non-ML approaches based on classical image processing techniques,90 or simple ML methods such as Bayesian classifiers.91 Shallow ANNs such as perceptrons were also used in the 2000s for various purposes, for example, detecting osteophytes.92 Two automated classification systems for degenerated intervertebral discs on T2-weighted MRI images were presented in 2009,74,93 and both provided a binary output ("normal" vs "degenerated"). One study was based on a simple statistical model trained on 30 MRI datasets,93 whereas the other paper employed a Bayesian binary classifier and exploited MRI scans from 34 patients74; both studies took into account information about the signal intensity and the texture of the disc. In 2011, Ghosh et al tested several different classifiers in performing the same task, including an SVM, all trained on 35 MRI stacks,94 obtaining accuracies ranging between 80% and 94%; the SVM proved to be the most accurate technique. Hao et al95 proposed an SVM-based method which considered, in addition to the intensity and texture information, the shape of the disc in order to classify it as degenerated or not; accuracies up to 91.6% were achieved. Oktay et al96 further refined such an approach by including information from the T1-weighted MRI scan. A significant advance was provided by the works of Ruiz-Espana et al97 and Castro-Mateos et al,98 who classified disc degeneration not on a binary basis but following the classification scheme published by Pfirrmann et al,15 which describes five degeneration degrees and is commonly employed in clinical practice. Both studies included the extraction of features describing the intensity as well as the shape of the discs, which were then passed to a classifier, a custom solution in the former paper and a simple ANN in the latter one. Prior to the feature extraction, the discs were segmented automatically in both works. The paper by Jamaludin et al99 introduced several improvements and innovations, such as the collection of a high number of disc images to be used for training and testing, namely 12 018 discs from 2009 patients (whereas most previous papers involved fewer than 100 MRI datasets), and the use of a CNN as a classifier, which obviated the need for a segmentation prior to the classification (Figure 9). The method achieved an agreement with human observations of 70.1%, comparable to the reported inter-rater agreement between distinct expert radiologists of 70.4%. Furthermore, the same method was used to successfully detect other features such as endplate lesions and marrow changes. Recently, Niemeyer and coworkers used a deep CNN and further increased the size of the training set, setting the state-of-the-art accuracy for automatic degeneration grading with the Pfirrmann classification system at 97%.100
Aside from the degenerative spine, ML techniques have also been applied to the study of spinal deformities. The research area which has been impacted to the largest extent by ML is the evaluation of the severity of adolescent idiopathic scoliosis by means of noninvasive techniques such as surface topography. As a matter of fact, such techniques do not offer a direct visualization of the spine; the extraction of clinically relevant conclusions can therefore take decisive advantage from inference tools which can exploit subtle patterns in the data which may not be visible to human observers. Ramirez et al101 classified surface topographies of scoliotic patients into three categories, namely mild, moderate, and severe curves, by means of an SVM, a decision tree, and a technique derived from statistics, linear discriminant analysis. The authors achieved an accuracy of 85% with the SVM, which outperformed the other classifiers. Bergeron et al102 used a regression SVM to extract the spinal centerline from surface topography, using as ground truth data obtained from biplanar radiographs of 149 scoliotic subjects. The first attempt to predict the curve type, a simplified version of the Lenke classification system distinguishing three types of scoliotic curves,103 was performed by Seoud et al,26 who used an SVM trained on radiographs from 97 adolescent subjects suffering from idiopathic scoliosis, and achieved an overall accuracy of 72.2% with respect to diagnoses based on measurements conducted on planar radiographs. More recently, Komeili et al104 trained a decision tree to classify surface topography data into mild, moderate, and severe curves as well as to identify the curve location (thoracic-thoracolumbar, proximal thoracic, or lumbar), in order to determine the risk of curve progression. The model was able to detect 85.7% of the progression curves and 71.6% of the nonprogression ones.
The analysis of radiographic data of patients suffering from spinal deformities has also been tackled exploiting ML techniques. The challenging automated analysis of the Cobb angle describing the severity of a scoliotic curve has been confronted with various approaches, ranging from non-ML methods such as the fuzzy Hough transform105 to deep learning techniques. Sun et al106 used a regression SVM to predict the Cobb angle from coronal radiographs, with a very good accuracy (relative root mean squared error of 21.6%) highlighting a potential clinical use. Zhang et al107 trained a deep ANN to predict the vertebral slopes on coronal radiographic images and used the slope data to estimate the Cobb angle, achieving absolute errors lower than 3°. Wu et al108 and Galbusera et al109 exploited the three-dimensional information contained in biplanar radiographs to perform a more comprehensive assessment of the pathological curvature. Seeing the problem from another perspective, Thong et al110 attempted to use an unsupervised clustering method to obtain a novel classification scheme for adolescent idiopathic scoliosis which effectively describes the variability of the curves among the subjects. Based on 915 biplanar radiographs, the clustering method defined 11 classes differing based on the location of the main curve, in particular of the apical vertebra, as well as kyphosis and lordosis (Figure 10).
Although the definition of computer-aided detection (CADe) systems is rather general and may cover all the studies which have been summarized in this paragraph, this name is commonly employed in the scientific literature to describe computer programs able to identify and localize relevant features such as lesions and fractures in medical images, with the aim of reducing the risk of missed diagnoses and favoring incidental findings. In the spine field, CADe systems have been used to detect and classify vertebral fractures with good success, using either a regression SVM112 or a CNN,113 with accuracies up to 95% for vertebral body compression fractures. CADe systems are also being developed for the detection of spine metastases on CT scans, which has been undertaken by using a classifier trained on a number of features extracted from the image of each single vertebra.114,115 The developed systems were able to detect both lytic and blastic lesions in real time, with occasional false positives requiring the judgment of a human operator.

FIGURE 9 Top: workflow to perform classification tasks on lumbar MRI scans from a literature study.99 First, vertebrae are detected, then the volumes corresponding to the intervertebral discs are extracted and passed to a classifier. Bottom: the various radiological parameters (Pfirrmann grading of disc degeneration15; disc narrowing; spondylolisthesis; central canal stenosis; endplate defects; marrow changes) automatically extracted from the images in the same study. Reproduced from Jamaludin et al99

FIGURE 10 Eleven clusters of spine curves of patients suffering from adolescent idiopathic scoliosis, automatically determined from a large database of biplanar radiographs.110 For each cluster, exemplary radiographs, da Vinci views,111 coronal and top views of the three-dimensional reconstructions are shown. Reproduced with permission from Thong et al110

Burns et al116 developed an alternative approach, in which a watershed segmentation algorithm was used to identify large regions with similar intensities, which were considered
as candidate lesions. By means of an SVM classifier processing fea-
tures extracted from the shape, location, and intensity of the region,
the method determined if the candidate region is indeed a tumoral
lesion. This method was also rather prone to produce false positives
(620 false-positive detections vs 439 true-positive lesions), which
appear to be an issue requiring further research efforts.
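A minimal sketch of the second stage of such a CADe pipeline is given below: candidate regions, for example obtained from a watershed segmentation, are described by shape, location, and intensity features and classified as lesion versus non-lesion by an SVM. The features and labels are synthetic and do not reproduce any of the cited systems.

```python
# Sketch of the second stage of a CADe pipeline: candidate regions are described by
# shape, location and intensity features and classified as lesion vs non-lesion by
# an SVM. All data here are synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_candidates = 500
features = rng.normal(size=(n_candidates, 8))   # e.g., volume, sphericity, mean intensity, position
is_lesion = (features[:, 0] + 0.5 * features[:, 3] > 0.8).astype(int)  # synthetic ground truth

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True))
clf.fit(features, is_lesion)

# Candidate regions from a new scan would be scored the same way:
new_regions = rng.normal(size=(5, 8))
print(clf.predict_proba(new_regions)[:, 1])     # probability of being a true lesion
```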
In summary, in light of the tremendous advances observed in recent years, there is no doubt that ML is bringing a revolution to diagnostic imaging, both in general and with respect to the study of spine disorders. Although human radiologists are not going to be replaced by computers soon, also in view of ethical aspects such as individual responsibility, the potential impact of accurate and reliable automated diagnostic tools is enormous.
5.4 | Outcome prediction and clinical decision
support
Predictive analytics is a branch of statistics aimed at making predictions about the future based on available data from the past, and has been strongly influenced by novel AI technologies and big data sources.^117 Healthcare has shown interest in predictive analytics since its early days, due to its large potential for improving patient care and financial management. Applications of predictive analytics in healthcare include the identification of chronic patients who are at risk of poor health outcomes and may benefit from interventions, the development of personalized medicine and therapies, the prediction of adverse events during the hospital stay, and the optimization of the supply chain.
In the last decade, several studies have presented models aimed at predicting various aspects of the outcome of spine surgeries; a selection of them is described below. McGirt et al^118 used simple statistics-derived techniques such as linear and logistic regression to predict values such as the Oswestry Disability Index (ODI)^119 1 year after surgery, the occurrence of complications, readmission to the hospital, and return to work. The prediction models were based on data from 750 to 1200 patients, and scored accuracies between 72% and 84% regarding complications and return to work. The model took into account more than 40 predictors, including the preoperative ODI, age, ethnicity, body mass index, a detailed description of the symptoms, the possible presence of other spinal disorders, as well as various scores describing the health and functional status of the patient. More recently, Kim et al^120 used logistic regression and a shallow ANN to specifically predict the occurrence of four types of major complications in patients undergoing spine fusion, namely cardiac complications, wound complications, venous thromboembolism, and mortality, and achieved results clearly better than those obtained with the clinical score commonly employed for such applications (Figure 11). A similar approach was used by Lee et al,^121 who focused on the prediction of surgical site infection. Interestingly, a subsequent study performed an external validation of this predictive model, that is, on another sample of patients, highlighting several limitations and showing generally poor performance.^122 Recently, a large retrospective study^123 presented an ensemble of decision trees to predict, with an overall accuracy of 87.6%, major intraoperative or perioperative complications following adult spine deformity surgery. Durand et al^124 investigated a different outcome, the necessity of blood transfusion after adult deformity surgery, which was predicted with good success using single decision trees and a random forest.
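The following sketch illustrates the general flavor of these models by comparing a logistic regression with a shallow ANN for a binary complication outcome on synthetic tabular data; the predictors named in the comments are examples only, and the code is not a reimplementation of any of the cited studies.

```python
# Sketch comparing a logistic regression and a shallow ANN for predicting a binary
# surgical complication from preoperative variables. The tabular data are synthetic;
# real models would use registry or institutional data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_patients = 1000
X = rng.normal(size=(n_patients, 12))   # e.g., age, BMI, preoperative ODI, comorbidity scores, ...
logits = 0.8 * X[:, 0] - 0.6 * X[:, 2] + 0.3 * X[:, 5]
y = (rng.random(n_patients) < 1 / (1 + np.exp(-logits))).astype(int)  # complication yes/no

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("shallow ANN", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.2f}")
```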
An application of predictive analytics that is nowadays finding wide use in clinical practice is the decision support tool (DST), which exploits the predictive power of the models to support clinical decisions by providing personalized predictions. A recent example of a DST in spine care is the Nijmegen Decision Tool for Chronic Low Back Pain,^125,126 which is based on predictors covering various aspects of the patient's health (namely, sociodemographic, pain, somatic, psychological, functioning, and quality of life) to suggest either surgical treatment, conservative care, or no intervention. This DST is still under development, and the technical implementation of the decision has not been finalized yet.
Compared to the other applications of AI and ML in spine research, predictive analytics and clinical decision support currently appear to be at a lower level of development. As a matter of fact, there is no DST based on ML techniques to support decisions in spine surgery, for example regarding the length of the instrumentation or the choice of the anchoring implants in spine deformity surgery. Imaging data are usually not exploited by predictive models, which are generally not based on state-of-the-art techniques such as deep learning. Indeed, large databases including clinical and imaging data, which would be necessary to train such models, are still lacking, under construction, or inaccessible to AI researchers. Nevertheless, the recent proliferation of national and local spine registries, some of which include imaging data, will likely allow for significant advances in this field in the near future.
5.5 | Content-based image retrieval
The digital imaging databases of large hospitals typically contain several thousands of images for each anatomical region and imaging modality. To facilitate image retrieval for clinical studies or educational purposes, many institutions implement an indexing based on the content of each image, so that the whole imaging database can be easily searched by means of keywords. This indexing process is commonly performed manually, but is a cumbersome, error-prone, and expensive task.^127 Automated content-based image retrieval (CBIR) has become an active area of research in recent years, and is strongly benefiting from the introduction of ML techniques.
Several CBIR frameworks employ the so-called relevance feedback, which consists of an evaluation of the relevance of each item returned by the query.^128 This feedback can be either explicit, that is, the user is asked to grade the relevance of the returned items, or implicit, that is, derived automatically from the user behavior, for example based on which documents the user selects for a closer inspection or on the time spent looking at each item. Recent studies introduced ML techniques such as SVMs to implement relevance feedback.^129 For the classification of the images, most CBIR systems are based on simple solutions such as SVMs rather than on deep learning architectures.^130,131 Nevertheless, recent studies have started to employ deep learning.^132,133
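A possible minimal implementation of explicit relevance feedback, assuming generic feature vectors for the stored images, is sketched below: the user labels a few retrieved items and an SVM trained on those labels re-ranks the database. This is an illustrative toy example rather than the method of any specific CBIR system mentioned above.

```python
# Sketch of explicit relevance feedback: after a first query, the user marks a few
# returned images as relevant or not; an SVM trained on those marks re-ranks the
# remaining database by its decision score. Feature vectors are synthetic stand-ins
# for whatever descriptors (or deep features) the CBIR system extracts.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
database = rng.normal(size=(1000, 64))       # one feature vector per stored image

# Indices and labels collected from the user during one feedback round
marked_idx = np.array([3, 17, 42, 99, 150, 212])
marked_rel = np.array([1, 1, 0, 1, 0, 0])    # 1 = relevant, 0 = not relevant

svm = SVC(kernel="linear")
svm.fit(database[marked_idx], marked_rel)

scores = svm.decision_function(database)     # higher score = more likely relevant
ranking = np.argsort(-scores)                # re-ranked retrieval order
print("Top 10 images after feedback:", ranking[:10])
```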
Regarding spine imaging, a few sophisticated algorithms tailored to exploit the features of spine images have been presented. Xu et al^134 proposed a novel relevance feedback algorithm for the retrieval of spine radiographs based on the vertebral contour. The algorithm includes a short-term memory feature which keeps track of the user's choices across different feedback iterations; the final decision about the relevance of each image is then made by a decision tree. The same research group presented a CBIR system which also took into account the shape of the intervertebral space.^135
5.6 | Biomechanics
So far, AI and ML have impacted basic biomechanics to a lesser extent than applied clinical and radiological research. Nevertheless, in recent years, a few papers describing applications of ANNs to typical biomechanical problems such as the estimation of loads and stresses have started to appear. Although studies specifically addressing spine biomechanics are currently not available, we believe it is worthwhile to briefly mention here some ML-based studies investigating other musculoskeletal regions, since the analysis of the state of the art may help in delineating possible future fields of application of ML techniques in spine biomechanics.
ML has been used to estimate the material properties of biological tissues. Chande et al^136 employed shallow ANNs to estimate the relationship between the stiffness of the ligaments and the kinematics of the foot in patients suffering from adult acquired flatfoot deformity. In order to create the training data, the authors constructed and employed patient-specific computer models of the foot anatomy. Zadpoor et al^137 investigated a related problem, that is, the prediction of the mechanical loads that determine certain mechanical properties of a biological tissue subject to remodeling, namely trabecular bone. The authors employed an existing biomechanical computational model able to predict bone tissue adaptation under mechanical loading based on the local strains, and used it to run a series of simulations in which random loads were applied to a small trabecular bone sample. The outputs of the simulations, that is, the remodeled local bone densities, were used to train an ANN to predict the loads which had induced that form of remodeling.

FIGURE 11 Example of a heatmap showing the importance of the various factors (first column) in determining an outcome, namely the risk of complications following posterior lumbar spine fusion, as predicted with machine learning (ML) techniques in a literature study.^120 Reproduced with permission from Kim et al.^120
Another field of application of ML is the calculation of stresses in patient-specific analyses, eliminating the need for computationally expensive finite element models. For example, Lu et al^138 developed a shallow ANN able to predict the stress in the cartilage of the tibial plateau and femoral condyles of the knee joint. A finite element model of the knee was used to generate a dataset for training the ANN, which was then able to predict the stress in each element of the articular cartilage with a dramatic reduction in time and cost with respect to creating and solving the finite element model itself.
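The surrogate-modeling idea can be sketched as follows: input parameters and outputs of many simulations (here replaced by a synthetic linear relationship standing in for actual finite element runs) are used to train a small neural network, which can then evaluate new cases almost instantaneously.

```python
# Sketch of a neural-network surrogate for a finite element model: input parameters
# of many simulations (loads) and the resulting stresses are used as training pairs,
# so new cases can be evaluated without re-solving the model. The "simulations"
# below are a synthetic stand-in for actual finite element runs.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n_simulations = 2000
loads = rng.uniform(0.0, 1.0, size=(n_simulations, 4))    # e.g., joint forces and moments
stress = loads @ np.array([3.0, 1.5, 0.5, 2.0]) + 0.1 * rng.normal(size=n_simulations)

X_train, X_test, y_train, y_test = train_test_split(loads, stress, random_state=0)
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
surrogate.fit(X_train, y_train)
print(f"Surrogate R^2 on held-out simulations: {surrogate.score(X_test, y_test):.3f}")
```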
In general, the use of ML techniques in musculoskeletal biomechanics appears to be still in its infancy; the few published papers have not yet exploited the potential of the latest innovations such as deep learning. Nevertheless, the available papers clearly demonstrate the potential of ML in this field. Computational models able to predict the biomechanical response of bones, joints, as well as the spine are widely available and could be used to generate large datasets to serve as training data for ML models, as suggested previously.^138 This approach would facilitate a more widespread adoption of patient-specific modeling in bench-to-bedside applications, where the computational resources and time required for the construction and solution of a traditional biomechanical model may conflict with the clinical demands.
5.7 | Motion and gait analysis
The quantitative analysis of human motion, and especially gait, with cameras, optoelectronic systems, wearable inertial devices, electromyography systems, force plates, and pressure sensors is widely employed for the scientific and clinical investigation of several pathologies. Indeed, the study of gait pattern alterations in patients suffering from spinal disorders is a very active area of research.^139,140 Traditional gait analysis aims at the measurement of spatiotemporal parameters such as walking velocity, stride and step lengths, cadence, and duration of the stance and swing phases; kinematic parameters such as the angles of rotation of the various joints; and kinetic parameters such as forces and moments in the joints, which typically involve the use of force platforms. The values of these parameters are then compared to reference ranges and used for diagnostic purposes, or to monitor patient recovery. In addition to the study of gait, specific motion analysis protocols have been developed for the investigation of spine motion during common activities such as standing, chair rising and sitting, stair climbing, and flexing the trunk.^141
In the last two decades, this consolidated approach has been revisited as ML techniques have gained wide use in several research fields.^142 Recent papers employed ML techniques such as SVMs^143-145 and ANNs^146 for the classification of abnormal gait patterns with good success. However, only a few studies involving ML techniques to investigate spinal disorders have been presented so far; this scarcity reflects the technical difficulties in assessing the position and motion of the vertebrae due to soft tissue artifacts.^147 An example of a pioneering study in this field is offered by Hayashi et al,^148 who trained an SVM to distinguish gait patterns associated with either L4 or L5 radiculopathy in patients suffering from lumbar canal stenosis, achieving an accuracy of 80.4%.
ML has also been successfully employed to investigate spine disorders by means of electromyography.^149 The authors of that study built an SVM to identify patients responding to a functional restoration rehabilitation program for chronic low back pain, based on dynamic surface electromyography readings, and achieved an accuracy of 96% on a sample of 30 patients.
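In both the gait and the electromyography examples, the core classification step can be sketched as an SVM operating on per-subject feature vectors, as below; features, labels, and sample size are synthetic and purely illustrative.

```python
# Sketch of classifying gait (or surface EMG) patterns with an SVM: each subject is
# represented by a vector of spatiotemporal or kinematic features and labeled by
# condition. Data and labels are synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_subjects = 60
features = rng.normal(size=(n_subjects, 10))  # e.g., walking speed, cadence, joint angle ranges
labels = rng.integers(0, 2, size=n_subjects)  # two hypothetical gait-pattern classes

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
accuracy = cross_val_score(model, features, labels, cv=5).mean()
print(f"Mean cross-validated accuracy: {accuracy:.2f}")
```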
A radically different research field related to gait and ML concerns humanoid or animal-shaped agents, that is, computer models, learning how to walk and move in a simulated environment, which may be geometrically complex and include obstacles. The process of learning to walk consists in appropriately activating the actuators, which act as the muscles of a human subject, while keeping equilibrium and achieving the locomotion goal, and has proven very challenging to replicate in an ML framework. Indeed, the implementation of such models requires sophisticated reinforcement learning algorithms, which typically provide rewards when the model accomplishes its goal, that is, reaching the target location, and punishments when the agent fails, for example if it falls to the ground. A good example of the state of the art is offered by Heess et al^150 (https://www.youtube.com/watch?v=hx_bgoTF7bs).
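The reward-and-punishment structure described above can be sketched with a toy training loop; the environment and the agent below are abstract placeholders with no physics and a random policy, intended only to show where rewards for progress and penalties for falling enter the learning process.

```python
# Highly simplified sketch of the reward-driven loop underlying reinforcement
# learning of locomotion. No physics, no actual learning: placeholders only.
import random

class WalkerEnvironment:
    """Stand-in for a physics simulation of a walking agent."""
    def reset(self):
        self.position = 0.0
        return self.position
    def step(self, action):
        displacement = random.uniform(-0.1, 0.3)        # stand-in for forward progress
        self.position += displacement
        fell = random.random() < 0.05                   # stand-in for losing equilibrium
        reward = displacement - (1.0 if fell else 0.0)  # reward progress, punish falling
        return self.position, reward, fell

class RandomAgent:
    """Placeholder policy; a real agent would learn from (state, action, reward)."""
    def act(self, state):
        return random.uniform(-1.0, 1.0)                # actuator command
    def update(self, state, action, reward):
        pass                                            # policy update would happen here

env, agent = WalkerEnvironment(), RandomAgent()
for episode in range(3):
    state, total_reward = env.reset(), 0.0
    for _ in range(200):                                # cap on episode length
        action = agent.act(state)
        state, reward, done = env.step(action)
        agent.update(state, action, reward)
        total_reward += reward
        if done:
            break
    print(f"Episode {episode}: cumulative reward = {total_reward:.2f}")
```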
6 | ETHICS ISSUES AND REGULATION
The implementation of AI technologies in healthcare, especially
regarding tools with a direct clinical impact such as those aimed at
supporting diagnosis or clinical decisions, is undoubtedly determining
a paradigm shift. Such a change of perspective involves the emer-
gence of several major ethics issues, which are being heatedly dis-
cussed both in the scientific community and by regulatory agencies.
Most AI technologies, notably including deep learning networks, which now play a major role, appear as a black box to an external user.^151 Although methods to visualize the inner structure and behavior of AI tools have been presented (eg,^152) and more human-readable technologies such as decision trees are also being used, AI predictions appear largely to be determined by an obscure logic which cannot be understood or interpreted by a human observer.^153 This limitation directly leads to the issue of the accountability of the decisions, which is nowadays being debated at a regulatory level. In other words, if a prediction fails, for example in the case of a misdiagnosis, determining whether the responsibility lies with the radiologist who used the AI system, with the device itself, or with the manufacturer is of critical importance. This obscure nature also has severe implications regarding the marketing approval of novel AI tools, which require deeper testing and verification than other technologies, and thus entail longer times-to-market and higher costs.
A second issue concerns possible biases in the predictions, which may be either intentional, that is, fraudulent, or unintended. Examples of intentional biases are a DST preferentially promoting the use of drugs or devices by a specific manufacturer, or a tool designed to maximize a specific quality metric relevant for the hospital but not necessarily optimizing patient care.^153 Unintended biases may be related to the scarce availability of data regarding rare pathologies or phenotypes, which may then be insufficiently covered in the training dataset with respect to more common conditions, or to ethnicities for which datasets are nonexistent or limited.^151 In addition, insufficient data collection efforts, for example privileging data sources that are easier to access, may also lead to unintended biases. To limit the impact of such issues, efforts toward a governance of AI are starting to be undertaken, with the final aim of building robust public trust.^154 It should be noted that cultural differences between the European Union, the United States, and East Asian countries will likely result in dramatically different attitudes from a regulatory and governance point of view.^155
The use of AI in healthcare also raises serious concerns about data privacy and security, due to the massive amount of clinical and imaging data required for the training and validation of the tools, which involves issues about data collection, transmission, and storage, as well as informed consent. Data anonymization is commonly used to enhance privacy and security; nevertheless, patients retain rights on their anonymized data, which are subject to strict regulations about storage, transmission, and use, especially when the data are used in a for-profit environment. The recent introduction of the General Data Protection Regulation in the European Union considerably expanded the rights of patients by adopting an explicit opt-in policy regarding the permission for data processing; on the other hand, it further enlarges the policy differences with the less strictly regulated United States, thus possibly strengthening the leading role of that country in AI innovation.^155 Due to the large investments related to AI technologies and their potential economic consequences, policy makers and regulatory agencies need to take these aspects into account as well.
Following in the footsteps of the free software movement, providing open access to ML models and training data would be a possible way to foster public trust, as well as to improve accountability and mitigate prediction bias, by giving the scientific community the possibility of further testing and developing these technologies. As a matter of fact, the source code for most recent AI and ML algorithms is publicly available, released by public research institutions as well as by companies such as Google (Mountain View, California) and Nvidia (Santa Clara, California). However, due to business and regulatory reasons, the public release of detailed technical information about production-ready ML software intended for clinical applications is highly unlikely.
7 | CONCLUSIONS
AI and ML are emerging disruptive technologies which have nowadays reached a substantial level of development, enabling them to already have a practical impact on several research fields. Computer vision and image processing are especially gaining momentum, due to the latest innovations in deep learning and the improved accessibility of computational resources such as powerful GPUs. Indeed, most recent spine research studies using AI and ML techniques are related to medical imaging, but an increasing impact on other fields such as spine biomechanics should be expected in the near future. Ethical aspects related to accountability, data privacy and security, as well as the risk of biased predictions are relevant and are currently receiving the attention of policy makers and regulatory agencies.
CONFLICTS OF INTEREST
The authors declare that there is no conflict of interest regarding the
publication of this article.
Author contributions
F.G.: literature analysis, manuscript preparation and revision; G.C.: lit-
erature analysis, manuscript revision; T.B.: literature analysis, manu-
script preparation and revision.
ORCID
Fabio Galbusera https://orcid.org/0000-0003-1826-9190
REFERENCES
1. Samuel AL. Some studies in machine learning using the game of
checkers. IBM J Res Dev . 1959;3:210-229.
2. Nilsson NJ. The Quest for Artificial Intelligence . Cambridge, UK: Cam-
bridge University Press; 2009.
3. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recog-
nition. Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition; Piscataway, NJ: Institute of Electrical and Elec-
tronics Engineers (IEEE); 2016. ArXiv Preprint arXiv:1512.03385.
4. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJ. Artificial
intelligence in radiology. Nat Rev Cancer . 2018;18:510-518.
5. Slate DJ, Atkin LR. Chess 4.5— the Northwestern University chess
program. In: Frey PW, ed. Chess Skill in Man and Machine . New York,
NY: Springer; 1983.
6. Weizenbaum J. ELIZA— a computer program for the study of natural
language communication between man and machine. Commun ACM.
1966;9:36-45.
7. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in
nervous activity. Bull Math Biophys . 1943;5:115-133.
8. Minsky M. Neural Nets and the Brain-Model Problem [doctoral dis-
sertation]. Princeton University, NJ; 1954.
9. Rosenblatt F. The perceptron: a probabilistic model for information
storage and organization in the brain. Psychol Rev . 1958;65:386-408.
10. Minsky M, Papert SA. Perceptrons: An Introduction to Computational
Geometry. Cambridge, UK: MIT Press; 1969.
11. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by
back-propagating errors. Nature . 1986;323:533-536.
12. Schmidhuber J. Deep learning in neural networks: An overview. Neu-
ral Netw. 2015; 61:85-117.
13. Michalski RS, Carbonell JG, Mitchell TM. Machine Learning: An Artifi-
cial Intelligence Approach. Berlin, Germany: Springer Science & Busi-
ness Media; 2013.
14. Shalev-Shwartz S, Ben-David S. Understanding Machine Learning:
From Theory to Algorithms. Cambridge, UK: Cambridge University
Press; 2014.
15. Pfirrmann CW, Metzdorf A, Zanetti M, Hodler J, Boos N. Magnetic
resonance classification of lumbar intervertebral disc degeneration.
Spine. 2001;26:1873-1878.
16. Toyoda H, Takahashi S, Hoshino M, et al. Characterizing the course
of back pain after osteoporotic vertebral fracture: a hierarchical clus-
ter analysis of a prospective cohort study. Arch Osteoporosis . 2017;
12:82.
17. Domingos P. A few useful things to know about machine learning.
Commun ACM. 2012; 55:78-87.
18. Lowe DG. Object recognition from local scale-invariant features.
Computer Vision, 1999, the Proceedings of the Seventh IEEE
International Conference. Piscataway, NJ: Institute of Electrical and
Electronics Engineers (IEEE); 1999. https://doi.org/10.1109/ICCV.
1999.790410.
19. Rublee E, Rabaud V, Konolige K, Bradski G. ORB: an efficient
alternative to SIFT or SURF. Computer Vision (ICCV), 2011 IEEE
International Conference. Piscataway, NJ: Institute of Electrical and
Electronics Engineers (IEEE); 2011. https://doi.org/10.1109/ICCV.
2011.6126544.
20. Russell SJ, Norvig P. Artificial Intelligence: A Modern Approach.
London, UK: Pearson Education Limited; 2016.
21. Frighetto-Pereira L, Rangayyan RM, Metzner GA, de Azevedo-
Marques PM, Nogueira-Barbosa MH. Shape, texture and statistical
features for classification of benign and malignant vertebral compres-
sion fractures in magnetic resonance images. Comput Biol Med . 2016;
73:147-156.
22. Unal Y, Polat K, Kocer HE. Pairwise FCM based feature weighting for
improved classification of vertebral column disorders. Comput Biol
Med. 2014;46:61-70.
23. Cortes C, Vapnik V. Support-vector networks. Mach Learn . 1995;20:
273-297.
24. Vapnik V. Pattern recognition using generalized portrait method.
Automat Rem Control. 1963; 24:774-780.
25. Oktay AB, Akgul YS. Simultaneous localization of lumbar vertebrae
and intervertebral discs with SVM-based MRF. IEEE Trans Biomed
Eng. 2013;60:2375-2383.
26. Seoud L, Adankon MM, Labelle H, Dansereau J, Cheriet F. Prediction
of scoliosis curve type based on the analysis of trunk surface topogra-
phy. Biomedical Imaging: From Nano to Macro, 2010 IEEE International
Symposium. Piscataway, NJ: Institute of Electrical and Electronics Engi-
neers (IEEE); 2010. https://doi.org/10.1109/ISBI.2010.5490322.
27. Belson WA. Matching and prediction on the principle of biological
classification. Applied Statistics . 1959;8(2):65-75.
28. Morgan JN, Sonquist JA. Problems in the analysis of survey data, and
a proposal. J Am Stat Assoc . 1963;58:415-434.
29. Breiman L. Classification and Regression Trees . London, UK: Routle-
gde; 2017.
30. Quinlan JR. C4. 5: Programs for Machine Learning . Burlington: Morgan
Kaufmann; 2014.
31. Quinlan JR. Induction of decision trees. Mach Learn . 1986;1:81-106.
32. Breiman L. Random forests. Mach Learn . 2001;45:5-32.
33. Nijeweme-d'Hollosy WO, van Velsen LS, Soer R, Hermens HJ.
Design of a web-based clinical decision support system for guiding
patients with low back pain to the best next step in primary health-
care. Proceedings of the 9th International Joint Conference on Biomedi-
cal Engineering Systems and Technologies (BIOSTEC 2016). Cham:
Springer International Publishing; 2016.
34. Oh T, Scheer JK, Smith JS, et al. Potential of predictive computer
models for preoperative patient selection to enhance overall quality-
adjusted life years gained at 2-year follow-up: a simulation in
234 patients with adult spinal deformity. Neurosurg Focus . 2017;
43:E2.
35. Varghese V, Kumar GS, Krishnan V. Effect of various factors on pull
out strength of pedicle screw in normal and osteoporotic cancellous
bone models. Med Eng Phys . 2017;40:28-38.
36. Varghese V, Krishnan V, Kumar GS. Evaluating pedicle-screw instrumentation using decision-tree analysis based on pullout strength. Asian Spine J. 2018;12(4):611-621.
37. Yagi M, Akilah KB, Boachie-Adjei O. Incidence, risk factors and classi-
fication of proximal junctional kyphosis: surgical outcomes review of
adult idiopathic scoliosis. Spine . 2011;36:E60-E68.
38. Hebb DO. The First Stage of Perception: Growth of the Assembly.
London: Psychology Press; 2005:102-120.
39. Bottou L, Bousquet O. The tradeoffs of large scale learning. Advances
in Neural Information Processing Systems 20; Cambridge, MA: MIT
Press; 2008 https://papers.nips.cc/paper/3323-the-tradeoffs-of-
large-scale-learning.
40. Kingma DP, Ba J. Adam: a method for stochastic optimization. ArXiv
Preprint arXiv. 2014; 1412:6980.
41. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial
nets. Advances in Neural Information Processing Systems ; Cambridge,
MA: MIT Press; 2014.
42. Lipton ZC, Berkowitz J, Elkan C. A critical review of recurrent neural
networks for sequence learning. ArXiv Preprint arXiv . 2015;1506:
00019.
43. Hubel DH, Wiesel TN. Receptive fields and functional architecture of
monkey striate cortex. J Physiol . 1968;195:215-243.
44. Hubel DH, Wiesel TN. Receptive fields, binocular interaction and
functional architecture in the cat's visual cortex. J Physiol . 1962;160:
106-154.
45. Denker JS, Gardner W, Graf HP, et al. Neural network recognizer for
hand-written zip code digits. Advances in Neural Information Proces-
sing Systems; Cambridge, MA: MIT Press; 1989. https://papers.nips.
cc/paper/107-neural-network-recognizer-for-hand-written-zip-
code-digits.
46. Fukushima K, Miyake S. Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. Pattern Recognition. 1982;15(6):455-469.
47. Scherer D, MĂŒller A, Behnke S. Evaluation of pooling operations in
convolutional architectures for object recognition. In: Diamantaras K,
Duch W, Iliadis LS, eds. Artificial Neural Networks— ICANN 2010.
ICANN 2010. Lecture Notes in Computer Science. Vol 6354. Berlin,
Heidelberg: Springer; 2010.
48. LeCun Y, Boser B, Denker JS, et al. Backpropagation applied to hand-
written zip code recognition. Neural Comput . 1989;1:541-551.
49. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature . 2015;521:
436-444.
50. LeCun Y, Boser BE, Denker JS, et al. Handwritten digit recognition
with a back-propagation network. Advances in Neural Information Pro-
cessing Systems. Cambridge, MA: MIT Press; 1990 https://papers.
nips.cc/paper/293-handwritten-digit-recognition-with-a-back-
propagation-network.
51. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning
applied to document recognition. Proc IEEE . 1998;86:2278-2324.
52. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with
deep convolutional neural networks. NIPS'12 Proceedings of the 25th
International Conference on Neural Information Processing Systems—
Volume 1. Cambridge, MA: MIT Press; 2012:1097-1105.
53. Simonyan K, Zisserman A. Very deep convolutional networks for
large-scale image recognition. ArXiv Preprint arXiv . 2014;1409:1556.
54. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. ArXiv
Preprint arXiv. 2015; 1409:4842.
55. Ripley BD. Pattern Recognition and Neural Networks . Cambridge, UK:
Cambridge University Press; 1996.
56. Krig S. Ground truth data, content, metrics, and analysis. Computer
Vision Metrics. Berkeley, CA: Apress; 2014.
57. Chwialkowski MP, Shile PE, Pfeifer D, Parkey RW, Peshock RM.
Automated localization and identification of lower spinal anatomy in
magnetic resonance images. Comput Biomed Res . 1991;24:99-117.
58. Peng Z, Zhong J, Wee W, Lee J. Automated vertebra detection and
segmentation from the whole spine MR images. Conference Proceed-
ings IEEE Engineering in Medicine and Biology Society. Vol 3. Piscat-
away, NJ: Institute of Electrical and Electronics Engineers (IEEE);
2006:2527-2530.
59. Schmidt S, Kappes J, Bergtholdt M, et al. Spine detection and labeling
using a parts-based graphical model. Inf Process Med Imaging . 2007;
20:122-133.
60. Oktay AB, Akgul YS. Localization of the lumbar discs using machine
learning and exact probabilistic inference. In: Fichtinger G, Martel A,
Peters T, eds. Medical Image Computing and Computer-Assisted Inter-
vention – MICCAI 2011. MICCAI 2011. Lecture Notes in Computer Sci-
ence. Vol 6893. Berlin, Heidelberg: Springer; 2011.
61. Glocker B, Feulner J, Criminisi A, Haynor DR, Konukoglu E. Auto-
matic localization and identification of vertebrae in arbitrary field-of-
view CT scans. In: Ayache N, Delingette H, Golland P, Mori K, eds.
Medical Image Computing and Computer-Assisted Intervention – MIC-
CAI 2012. MICCAI 2012. Lecture Notes in Computer Science. Vol
7512. Berlin, Heidelberg: Springer; 2012.
62. Glocker B, Zikic D, Konukoglu E, Haynor DR, Criminisi A. Vertebrae
localization in pathological spine CT via dense classification from
sparse annotations. In: Mori K, Sakuma I, Sato Y, Barillot C, Navab N,
eds. Medical Image Computing and Computer-Assisted Intervention –
MICCAI 2013. MICCAI 2013. Lecture Notes in Computer Science. Vol
8150. Berlin, Heidelberg: Springer; 2013.
63. Chen C, Belavy D, Yu W, et al. Localization and segmentation of 3D
intervertebral discs in MR images by data driven estimation. IEEE
Trans Med Imaging. 2015a; 34:1719-1729.
64. Chen H, Shen C, Qin J, et al. Automatic localization and identification
of vertebrae in spine CT via a joint learning model with deep neural
networks. In: Navab N, Hornegger J, Wells W, Frangi A, eds. Medical
Image Computing and Computer-Assisted Intervention— MICCAI 2015.
MICCAI 2015. Lecture Notes in Computer Science. Vol 9349. Cham:
Springer; 2015b.
65. Chen H, Dou Q, Wang X, Qin J, Cheng JC, Heng P. 3D fully convolu-
tional networks for intervertebral disc localization and segmentation.
International Conference on Medical Imaging and Virtual Reality. Cham:
Springer; 2016. https://doi.org/10.1007/978-3-319-43775-0_34.
66. Suzani A, Seitel A, Liu Y, Fels S, Rohling RN, Abolmaesumi P. Fast
automatic vertebrae detection and localization in pathological ct
scans-a deep learning approach. In: Navab N, Hornegger J, Wells W,
Frangi A, eds. Medical Image Computing and Computer-Assisted Inter-
vention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Sci-
ence. Vol 9351. Cham: Springer; 2015.
67. Payer C, Ć tern D, Bischof H, Urschler M. Regressing heatmaps for
multiple landmark localization using CNNs. In: Ourselin S,
Joskowicz L, Sabuncu M, Unal G, Wells W, eds. Medical Image Com-
puting and Computer-Assisted Intervention – MICCAI 2016. MICCAI
2016. Lecture Notes in Computer Science. Vol 9901. Cham: Springer;
2016.
68. Forsberg D, Sjöblom E, Sunshine JL. Detection and labeling of verte-
brae in MR images using deep learning with clinical annotations as
training data. J Digit Imaging . 2017;30:406-412.
69. Lootus M, Kadir T, Zisserman A. Vertebrae Detection and Labelling in
Lumbar MR Images. Cham: Springer; 2014:219-230.
70. Yang D, Xiong T, Xu D, et al. Deep image-to-image recurrent net-
work with shape basis learning for automatic vertebra labeling in
large-scale 3D CT volumes. In: Descoteaux M, Maier-Hein L, Franz A,
Jannin P, Collins D, Duchesne S, eds. Medical Image Computing and
Computer-Assisted Intervention − MICCAI 2017. MICCAI 2017. Lecture
Notes in Computer Science. Vol 10435. Cham: Springer; 2017.
71. Thoma M. A survey of semantic segmentation. ArXiv Preprint arXiv.
2016;1602 :06541.
72. Romera-Paredes B, Torr PHS. Recurrent instance segmentation.
ArXiv Preprint arXiv. 2016; 1511:08250.
73. Law MW, Tay K, Leung A, Garvin GJ, Li S. Intervertebral disc seg-
mentation in MR images using anisotropic oriented flux. Med Image
Anal. 2013;17:43-61.
74. Michopoulou SK, Costaridou L, Panagiotopoulos E, Speller R,
Panayiotakis G, Todd-Pokropek A. Atlas-based segmentation of
degenerated lumbar intervertebral discs from MR images of the
spine. IEEE Trans Biomed Eng . 2009;56:2225-2231.
75. Neubert A, Fripp J, Engstrom C, et al. Automated detection, 3D seg-
mentation and analysis of high resolution spine MR images using sta-
tistical shape models. Phys Med Biol . 2012;57:8357-8376.
76. Klinder T, Ostermann J, Ehm M, Franz A, Kneser R, Lorenz C. Auto-
mated model-based vertebra detection, identification, and segmenta-
tion in CT images. Med Image Anal . 2009;13:471-482.
77. Korez R, Ibragimov B, Likar B, PernuĆĄ F, Vrtovec T. Deformable
model-based segmentation of intervertebral discs from MR spine
images by using the SSC descriptor. In: Vrtovec T et al., eds. Compu-
tational Methods and Clinical Applications for Spine Imaging. CSI 2015.
Lecture Notes in Computer Science. Vol 9402. Cham: Springer; 2016.
78. Korez R, Ibragimov B, Likar B, PernuĆĄ F, Vrtovec T. Interpolation-
based shape-constrained deformable model approach for segmenta-
tion of vertebrae from CT spine images. In: Yao J, Glocker B,
Klinder T, Li S, eds. Recent Advances in Computational Methods and
Clinical Applications for Spine Imaging. Lecture Notes in Computational
Vision and Biomechanics. Vol 20. Cham: Springer; 2015.
79. Ayed IB, Punithakumar K, Garvin G, Romano W, Li S. Graph cuts with
invariant object-interaction priors: application to intervertebral disc
segmentation. Inf Process Med Imaging . 2011;22:221-232.
80. Carballido-Gamio J, Belongie SJ, Majumdar S. Normalized cuts in 3-D
for spinal MRI segmentation. IEEE Trans Med Imaging . 2004;23:
36-44.
81. Egger J, Kapur T, Dukatz T, et al. Square-cut: a segmentation algo-
rithm on the basis of a rectangle shape. PloS One . 2012;7 :e31064.
82. Huang S, Chu Y, Lai S, Novak CL. Learning-based vertebra detection
and iterative normalized-cut segmentation for spinal MRI. IEEE Trans
Med Imaging. 2009; 28:1595-1605.
83. Schwarzenberg R, Freisleben B, Nimsky C, Egger J. Cube-cut: verte-
bral body segmentation in MRI-data through cubic-shaped diver-
gences. PloS One . 2014;9 :e93389.
84. Kelm BM, Wels M, Zhou SK, et al. Spine detection in CT and MR
using iterated marginal space learning. Med Image Anal . 2013;17:
1283-1292.
85. Zheng Y, Barbu A, Georgescu B, Scheuering M, Comaniciu D. Four-
chamber heart modeling and automatic segmentation for 3-D cardiac
CT volumes using marginal space learning and steerable features.
IEEE Trans Med Imaging. 2008; 27:1668-1681.
86. Lessmann N, van Ginneken B, IĆĄ gum I. Iterative convolutional neural
networks for automatic vertebra identification and segmentation in
CT images. ArXiv Preprint arXiv . 2018;1804 :04383.
87. Yao J, Burns JE, Forsberg D, et al. A multi-center milestone study of
clinical vertebral CT segmentation. Comput Med Imaging Graph.
2016;49 :16-28. https://doi.org/10.1016/j.compmedimag.2015.
12.006.
88. Zheng G, Chu C, BelavĂœ DL, et al. Evaluation and comparison of 3D
intervertebral disc localization and segmentation methods for 3D T2
MR data: A grand challenge. Med Image Anal . 2017;35:327-344.
89. Bounds DG, Lloyd PJ, Mathew B, Waddell G. A multilayer perceptron
network for the diagnosis of low back pain. Proceedings of IEEE Inter-
national Conference on Neural Networks, San Diego. Piscataway, NJ:
Institute of Electrical and Electronics Engineers (IEEE); 1988. https://
doi.org/10.1109/ICNN.1988.23963.
90. Tsai M, Jou S, Hsieh M. A new method for lumbar herniated inter-
vertebral disc diagnosis based on image analysis of transverse sec-
tions. Comput Med Imaging Graph . 2002;26:369-380.
91. Koompairojn S, Hua KA, Bhadrakom C. Automatic classification sys-
tem for lumbar spine X-ray images. Computer-Based Medical Systems,
2006. CBMS 2006. 19th IEEE International Symposium. Piscataway,
NJ: Institute of Electrical and Electronics Engineers (IEEE); 2006.
https://doi.org/10.1109/CBMS.2006.54.
92. Cherukuri M, Stanley RJ, Long R, Antani S, Thoma G. Anterior osteo-
phyte discrimination in lumbar vertebrae using size-invariant fea-
tures. Comput Med Imaging Graph . 2004;28:99-108.
93. Raja'S A, Corso JJ, Chaudhary V, Dhillon G. Desiccation diagnosis in
lumbar discs from clinical MRI with a probabilistic model. Biomedical
Imaging: From Nano to Macro, 2009. ISBI'09. IEEE International Sympo-
sium. Piscataway, NJ: Institute of Electrical and Electronics Engineers
(IEEE); 2009. https://doi.org/10.1109/ISBI.2009.5193105.
94. Ghosh S, Raja'S A, Chaudhary V, Dhillon G. Computer-aided diagno-
sis for lumbar MRI using heterogeneous classifiers. Biomedical Imag-
ing: From Nano to Macro, 2011 IEEE International Symposium.
Piscataway, NJ: Institute of Electrical and Electronics Engineers
(IEEE); 2011. https://doi.org/10.1109/ISBI.2011.5872612.
95. Hao S, Jiang J, Guo Y, Li H. Active learning based intervertebral disk
classification combining shape and texture similarities. Neurocomput-
ing. 2013;101:252-257.
96. Oktay AB, Albayrak NB, Akgul YS. Computer aided diagnosis of
degenerative intervertebral disc diseases from lumbar MR images.
Comput Med Imaging Graph. 2014; 38:613-619.
97. Ruiz-España S, Arana E, Moratal D. Semiautomatic computer-aided
classification of degenerative lumbar spine disease in magnetic reso-
nance imaging. Comput Biol Med . 2015;62:196-205.
98. Castro-Mateos I, Pozo JM, Lazary A, Frangi AF. 2D segmentation of
intervertebral discs and its degree of degeneration from T2-weighted
magnetic resonance images. Medical Imaging 2014: Computer-Aided
Diagnosis. Bellingham, WA: The International Society for Optics and
Photonics (SPIE); 2014. https://doi.org/10.1117/12.2043755.
99. Jamaludin A, Lootus M, Kadir T, et al. ISSLS PRIZE IN BIOENGI-
NEERING SCIENCE 2017: automation of reading of radiological fea-
tures from magnetic resonance images (MRIs) of the lumbar spine
without human intervention is comparable with an expert radiologist.
Eur Spine J. 2017; 26:1374-1383.
100. Niemeyer F, Galbusera F, Kienle A, Wilke H. A Deep Learning System
for Consistent Automatic Disc Degeneration Grading. Dublin: World
Congress of Biomechanics; 2018.
101. Ramirez L, Durdle NG, Raso VJ, Hill DL. A support vector machines
classifier to assess the severity of idiopathic scoliosis from surface
topography. IEEE Trans Inf Technol Biomed . 2006;10:84-91.
102. Bergeron C, Cheriet F, Ronsky J, Zernicke R, Labelle H. Prediction of
anterior scoliotic spinal curve from trunk surface using support vec-
tor regression. Eng Appl Artif Intel . 2005;18:973-983.
103. Lenke LG, Edwards CC, Bridwell KH. The Lenke classification of ado-
lescent idiopathic scoliosis: how it organizes curve patterns as a tem-
plate to perform selective fusions of the spine. Spine . 2003;28:S199-
S207.
104. Komeili A, Westover L, Parent EC, El-Rich M, Adeeb S. Monitoring
for idiopathic scoliosis curve progression using surface topography
asymmetry analysis of the torso in adolescents. Spine J . 2015;15:
743-751.
105. Zhang J, Lou E, Le LH, Hill DL, Raso JV, Wang Y. Automatic Cobb
measurement of scoliosis based on fuzzy Hough transform with ver-
tebral shape prior. J Digit Imaging . 2009;22:463-472.
106. Sun H, Zhen X, Bailey C, Rasoulinejad P, Yin Y, Li S. Direct estimation
of spinal Cobb angles by structured multi-output regression. In:
Niethammer M et al., eds. Information Processing in Medical Imaging.
IPMI 2017. Lecture Notes in Computer Science. Vol 10265. Cham:
Springer; 2017.
107. Zhang J, Li H, Lv L, Zhang Y. Computer-aided cobb measurement
based on automatic detection of vertebral slopes using deep neural
network. Int J Biomed Imaging . 2017;2017 :1-6. https://doi.org/10.
1155/2017/9083916.
108. Wu H, Bailey C, Rasoulinejad P, Li S. Automated comprehensive
Adolescent Idiopathic Scoliosis assessment using MVC-Net. Med
Image Anal. 2018; 48:1-11.
109. Galbusera F, Niemeyer F, Wilke HJ, et al. Fully automated radiologi-
cal analysis of spinal disorders and deformities: a deep learning
approach. Eur Spine J. under review.
110. Thong WE, Parent S, Wu J, Aubin CE, Labelle H, Kadoury S. Three-
dimensional morphology study of surgical adolescent idiopathic sco-
liosis patient from encoded geometric models. Eur Spine J . 2016;15:
3104-3113.
111. Labelle H, Aubin CE, Jackson R, Lenke L, Newton P, Parent S. Seeing
the spine in 3D: how will it change what we do? J Pediatr Orthop.
2011;31 (1 Suppl):S37-S45.
112. Burns JE, Yao J, Summers RM. Vertebral Body Compression Frac-
tures and Bone Density: Automated Detection and Classification on
CT Images. Radiology . 2017;284:788-797.
113. Roth HR, Wang Y, Yao J, Lu L, Burns JE, Summers RM. Deep convo-
lutional networks for automated detection of posterior-element frac-
tures on spine CT. ArXiv Preprint arXiv . 2016;1602:0020.
114. Hammon M, Dankerl P, Tsymbal A, et al. Automatic detection of lytic
and blastic thoracolumbar spine metastases on computed tomogra-
phy. Eur Radiol . 2013;23:1862-1870.
115. O'Connor SD, Yao J, Summers RM. Lytic metastases in thoracolum-
bar spine: computer-aided detection at CT— preliminary study. Radi-
ology. 2007;242:811-816.
116. Burns JE, Yao J, Wiese TS, Muñoz HE, Jones EC, Summers RM.
Automated detection of sclerotic metastases in the thoracolumbar
spine at CT. Radiology . 2013;268:69-78.
117. Amarasingham R, Patzer RE, Huesch M, Nguyen NQ, Xie B. Imple-
menting electronic health care predictive analytics: considerations
and challenges. Health Aff . 2014;33:1148-1154.
118. McGirt MJ, Sivaganesan A, Asher AL, Devin CJ. Prediction model for
outcome after low-back surgery: individualized likelihood of compli-
cation, hospital readmission, return to work, and 12-month improve-
ment in functional disability. Neurosurg Focus . 2015;39 :E13.
119. Fairbank JC, Couper J, Davies JB, O'Brien JP. The Oswestry low back
pain disability questionnaire. Physiotherapy . 1980;66:271-273.
120. Kim JS, Merrill RK, Arvind V, et al. Examining the ability of artificial
neural networks machine learning models to accurately predict
complications following posterior lumbar spine fusion. Spine . 2018;
43:853-860.
121. Lee MJ, Cizik AM, Hamilton D, Chapman JR. Predicting surgical site
infection after spine surgery: a validated model using a prospective
surgical registry. Spine J . 2014;14:2112-2117.
122. Janssen DM, van Kuijk SM, d'Aumerie BB, Willems PC. External vali-
dation of a prediction model for surgical site infection after thoraco-
lumbar spine surgery in a Western European cohort. J Orthop Surg
Res. 2018;13:114.
123. Scheer JK, Smith JS, Schwab F, et al. Development of a preoperative
predictive model for major complications following adult spinal
deformity surgery. J Neurosurg Spine . 2017;26:736-743.
124. Durand WM, DePasse JM, Daniels AH. Predictive modeling for blood
transfusion after adult spinal deformity surgery: a tree-based
machine learning approach. Spine . 2018;43 :1058-1066.
125. Coupé VM, van Hooff ML, de Kleuver M, Steyerberg EW,
Ostelo RW. Decision support tools in low back pain. Best Pract Res
Clin Rheumatol. 2016; 30:1084-1097.
126. van Hooff ML, van Loon J, van Limbeek J, de Kleuver M. The Nijme-
gen decision tool for chronic low back pain. Development of a clinical
decision tool for secondary or tertiary spine care specialists. PLoS
One. 2014;9:e104226.
127. Antani SK, Long LR, Thoma GR. Content-based image retrieval for
large biomedical image archives. Stud Health Technol Inform . 2004;
107(Pt 2):829-833.
128. SchĂŒtze H, Manning CD, Raghavan P. Introduction to Information
Retrieval. Cambridge, UK: Cambridge University Press; 2008.
129. Liu R, Wang Y, Baba T, Masumoto D, Nagata S. SVM-based active
feedback in image retrieval using clustering and unlabeled data. Pat-
tern Recognition. 2008; 41:2645-2655.
130. Hoi SC, Jin R, Zhu J, Lyu MR. Semisupervised SVM batch mode
active learning with applications to image retrieval. ACM Trans Inf
Syst. 2009;27:16.
131. Rahman MM, Bhattacharya P, Desai BC. A framework for medical
image retrieval using machine learning and statistical similarity
matching techniques with relevance feedback. IEEE Trans Inf Technol
Biomed. 2007;11:58-69.
132. Anavi Y, Kogan I, Gelbart E, Geva O, Greenspan H. Visualizing and
enhancing a deep learning framework using patients age and gender
for chest x-ray image retrieval. Medical Imaging 2016: Computer-
Aided Diagnosis, Proceedings Volume 9785. Bellingham, WA: The
International Society for Optics and Photonics (SPIE); 2016. https://
doi.org/10.1117/12.2217587.
133. Shah A, Conjeti S, Navab N, Katouzian A. Deeply learnt hashing for-
ests for content based image retrieval in prostate MR images. Medi-
cal Imaging 2016: Image Processing. Bellingham, WA: The
International Society for Optics and Photonics (SPIE); 2016. https://
doi.org/10.1117/12.2217162.
134. Xu X, Lee D, Antani SK, Long LR, Archibald JK. Using relevance feed-
back with short-term memory for content-based spine X-ray image
retrieval. Neurocomputing . 2009;72:2259-2269.
135. Lee D, Antani S, Chang Y, Gledhill K, Long LR, Christensen P. CBIR of
spine X-ray images on inter-vertebral disc space and shape profiles
using feature ranking and voting consensus. Data Knowl Eng . 2009;
68:1359-1369.
136. Chande RD, Hargraves RH, Ortiz-Robinson N, Wayne JS. Predictive
behavior of a computational foot/ankle model through artificial neu-
ral networks. Comput Math Methods Med . 2017;2017:3602928.
137. Zadpoor AA, Campoli G, Weinans H. Neural network prediction of
load from the morphology of trabecular bone. App Math Model.
2013;37:5260-5276.
138. Lu Y, Pulasani PR, Derakhshani R, Guess TM. Application of neural
networks for the prediction of cartilage stress in a musculoskeletal
system. Biomed Signal Process Control . 2013;8:475-482.
139. Haddas R, Belanger T. Clinical gait analysis on a patient undergoing
surgical correction of kyphosis from severe ankylosing spondylitis.
Int J Spine Surg. 2017; 11:18.
140. Mahaudens P, Mousny M. Gait in adolescent idiopathic scoliosis.
Kinematics, electromyographic and energy cost analysis. Stud Health
Technol Inform. 2010; 158:101-106.
141. Leardini A, Biagi F, Merlo A, Belvedere C, Benedetti MG. Multi-
segment trunk kinematics during locomotion and elementary exer-
cises. Clin Biomech (Bristol, Avon) . 2011;26(6):562-571.
142. Prakash C, Kumar R, Mittal N. Recent developments in human gait
research: parameters, approaches, applications, machine learning
techniques, datasets and challenges. Artif Intell Rev . 2018;49:1-40.
143. Fukuchi RK, Eskofier BM, Duarte M, Ferber R. Support vector
machines for detecting age-related changes in running kinematics.
J Biomech. 2011; 44:540-542.
144. Lai DT, Begg RK, Palaniswami M. Computational intelligence in gait
research: a perspective on current applications and future challenges.
IEEE Trans Inf Technol Biomed. 2009; 13:687-702.
145. Zhang J, Lockhart TE, Soangra R. Classifying lower extremity muscle
fatigue during walking using machine learning and inertial sensors.
Ann Biomed Eng. 2014; 42:600-612.
146. Begg R, Kamruzzaman J. A machine learning approach for automated
recognition of movement patterns using basic, kinetic and kinematic
gait data. J Biomech . 2005;38:401-408.
147. Stucovitz E, Vitale J, Galbusera F. In vivo measurements: motion
analysis. Biomechanics of the Spine— Basic Concepts, Spinal Disorders
and Treatments. London, UK: Academic Press; 2018.
148. Hayashi H, Toribatake Y, Murakami H, Yoneyama T, Watanabe T,
Tsuchiya H. Gait analysis using a support vector machine for lumbar
spinal stenosis. Orthopedics . 2015;38:e959-e964.
149. Jiang N, Luk KD, Hu Y. A machine learning-based surface electromy-
ography topography evaluation for prognostic prediction of func-
tional restoration rehabilitation in chronic low back pain. Spine . 2017;
42:1635-1642.
150. Heess N, Sriram S, Lemmon J, et al. Emergence of locomotion
behaviours in rich environments. ArXiv Preprint arXiv . 2017;1707 :
02286.
151. Pesapane F, Volonté C, Codari M, Sardanelli F. Artificial intelligence
as a medical device in radiology: ethical and regulatory issues in
Europe and the United States. Insights Imaging . 2018;9(5):745-753.
152. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional net-
works: Visualising image classification models and saliency maps.
ArXiv Preprint arXiv. 2013; 1312:6034.
153. Char DS, Shah NH, Magnus D. Implementing machine learning in
health care - addressing ethical challenges. N Engl J Med . 2018;378:
981-983.
154. Winfield AF, Jirotka M. Ethical governance is essential to building
trust in robotics and artificial intelligence systems. Philos Trans Royal
Soc A. 2018;376:20180085.
155. Thierer AD, Castillo A, Russell R. Artificial Intelligence and Public Policy. Arlington, VA: Mercatus Research, Mercatus Center at George Mason University; 2017. https://www.mercatus.org/system/files/thierer-artificial-intelligence-policy-mr-mercatus-v1.pdf
How to cite this article: Galbusera F, Casaroli G, Bassani T.
Artificial intelligence and machine learning in spine research.
JOR Spine. 2019;e1044. https://doi.org/10.1002/jsp2.1044
... In the last decade, a significant increase in the use of Artificial Intelligence (AI) has been experienced in the most disparate fields, ranging from vocal assistants commonly employed during our daily life to self-driving cars. Thanks to the unique ability of intelligent machines to be trained and automatically acquire new tasks based on previous experience or provided data, the use of AI is being increasingly investigated for applications in medical research [1]. Indeed, AI-based computers have already shown to potentially revolutionize drug design and discovery [2,3], automatic segmentation and relevant data extraction from radiological datasets [4] as well as the formulation of diagnosis, outcome prediction and treatment planning in different medical fields [5][6][7]. ...
... Indeed, AI-based computers have already shown to potentially revolutionize drug design and discovery [2,3], automatic segmentation and relevant data extraction from radiological datasets [4] as well as the formulation of diagnosis, outcome prediction and treatment planning in different medical fields [5][6][7]. The adoption of this ground-breaking technology is being explored in spine surgery as well [1]. Indeed, thanks to its interdisciplinary nature and the wide utilization of radiological images to inspect the anatomical structures of the spine, the use of AI may be of particular value in determining, for example, which are the pathological discs [8], classifying a scoliotic curve [9] and predict its progression [10]. ...
... The other four reviews do not focus specifically on LBP. In detail, in 2019 Tack [14] focused on musculoskeletal medicine in general, and determined in which fields AI had reached human prediction levels; in 2020, Azimi et al. [15] focused on the use of NNs for the treatment of the whole spine; in 2019, Galbusera et al. [1] described the application of AI to problems related to the whole spine; finally, in 2016 Yao et al. [16] performed a multi-center milestone comparative study for vertebral segmentation methods based on CT images. Two articles presenting databases were also found: LUMINOUS, which is a database of ultrasound images from 109 patients for multifidus muscle segmentation [17], and MyoSegmentum, which includes MRI images of 54 patients for the segmentation of lumbar muscles and vertebral bodies [18]. ...
Chronic Low Back Pain (LBP) is a symptom that may be caused by several diseases, and it is currently the leading cause of disability worldwide. The increased amount of digital images in orthopaedics has led to the development of methods related to artificial intelligence, and to computer vision in particular, which aim to improve diagnosis and treatment of LBP. In this manuscript, we have systematically reviewed the available literature on the use of computer vision in the diagnosis and treatment of LBP. A systematic research of PubMed electronic database was performed. The search strategy was set as the combinations of the following keywords: "Artificial Intelligence", "Feature Extraction", "Segmentation", "Computer Vision", "Machine Learning", "Deep Learning", "Neural Network", "Low Back Pain", "Lumbar". Results: The search returned a total of 558 articles. After careful evaluation of the abstracts, 358 were excluded, whereas 124 papers were excluded after full-text examination, taking the number of eligible articles to 76. The main applications of computer vision in LBP include feature extraction and segmentation, which are usually followed by further tasks. Most recent methods use deep learning models rather than digital image processing techniques. The best performing methods for segmentation of vertebrae, intervertebral discs, spinal canal and lumbar muscles achieve SĂžrensen–Dice scores greater than 90%, whereas studies focusing on localization and identification of structures collectively showed an accuracy greater than 80%. Future advances in artificial intelligence are expected to increase systems' autonomy and reliability, thus providing even more effective tools for the diagnosis and treatment of LBP.
... There was no thorough data review and no explicit measures to ensure data quality. In line with current ideas on the establishment of artificial intelligence as discussed in the current literature, we see this as the main reason for the not entirely convincing predictive power of this AI model [2][3][4]18,20]. ...
... The application of artificial intelligence algorithms in spine therapy is slowly gaining momentum. A few years ago, there were hardly any approaches to using the latest algorithms to optimize therapy for patients with back pain and spine related problems [20,24]. ...
Patients with back pain are common and present a challenge in everyday medical practice due to the multitude of possible causes and the individual effects of treatments. Predicting causes and therapy efficien cy with the help of artificial intelligence could improve and simplify the treatment. In an exemplary collective of 1000 conservatively treated back pain patients, it was investigated whether the prediction of therapy efficiency and the underlying diagnosis is possible by combining different artificial intelligence approaches. For this purpose, supervised and unsupervised artificial intelligence methods were analyzed and a methodology for combining the predictions was developed. Supervised AI is suitable for predicting therapy efficiency at the borderline of minimal clinical difference. Non-supervised AI can show patterns in the dataset. We can show that the identification of the underlying diagnostic groups only becomes possible through a combination of different AI approaches and the baseline data. The presented methodology for the combined application of artificial intelligence algorithms shows a transferable path to establish correlations in heterogeneous data sets when individual AI approaches only provide weak results.
... ML can provide subtle information that cannot be detected by eye in the image tasks of interest. 5 In the era of big data, ML will dramatically improve diagnostic accuracy and prognosis. 3,6 Publications on applications of ML in the spine have increased significantly in recent years. ...
... The classification and regression tree (CART) implements a classification or a regression task, and is more interpretable and easier to understand than other modalities. The tree comprises internal nodes (conditions), branches (decisions), and leaves (ends); it is not computationally intensive and is therefore suitable for big data 5,12 (Figure 1). SVMs accomplish classification tasks by creating a maximum-margin hyperplane between two outcomes, or regression tasks by fitting a best-fit plane 13 (Figure 2). ...
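As a rough illustration of the two model families described in this excerpt, the following minimal sketch (Python with scikit-learn, purely synthetic data; none of the settings come from the cited work) fits a CART-style decision tree and a support vector machine on the same binary classification task:

```python
# Minimal sketch, assuming scikit-learn and synthetic placeholder data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# CART: internal nodes hold conditions, branches encode decisions, leaves give the class.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# SVM: a maximum-margin boundary between the two outcomes (RBF kernel here).
svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)

print("tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))
print("svm accuracy:", accuracy_score(y_test, svm.predict(X_test)))
```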
- GuanRui Ren
- Kun Yu
- Zhiyang Xie
- Xiaotao Wu
Study design: Narrative review. Objectives: This review aims to present current applications of machine learning (ML) in the spine domain to clinicians. Methods: We conducted a comprehensive PubMed search of peer-reviewed articles published between 2006 and 2020 using the terms (spine, spinal, lumbar, cervical, thoracic, machine learning) to examine ML in the spine. We then excluded research from other domains, case reports, reviews or meta-analyses, and studies without an available abstract or full text. Results: A total of 1738 articles were retrieved from the database, and 292 studies were finally included. Key findings of current applications were compiled and summarized in this review. The main clinical applications of these techniques include image processing, diagnosis, decision support, operative assistance, rehabilitation, surgical outcomes, complications, hospitalization, and cost. Conclusions: ML has achieved excellent performance and holds immense potential in the spine field. ML can help clinical staff to improve the level of care, enhance work efficiency, and reduce adverse events. However, more randomized controlled trials and improved interpretability are essential for clinicians to accept the assistance of models in real-world practice.
... Galbusera et al. [2] presented the various AI and ML techniques used in the localization and labeling of spinal structures. Recently, ANNs and deep learning have also been employed for the localization of spinal structures. ...
... [Figure: a fully connected network with an input layer (x), two hidden layers (h(1), h(2)), and an output layer (Ć·).] Our algorithm returns an int, which determines the class of the input image. ...
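The repaired excerpt above describes a small fully connected network whose output determines the class of an input image. Below is a hypothetical sketch of such a classifier (PyTorch); the layer sizes and the flattened-image input dimension are assumptions for illustration, not the cited authors' architecture:

```python
# Hypothetical sketch of a small fully connected image classifier
# (input layer, two hidden layers, output layer). Sizes are assumed.
import torch
import torch.nn as nn

class SmallMLP(nn.Module):
    def __init__(self, n_inputs=64 * 64, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_inputs, 128),  # hidden layer 1
            nn.ReLU(),
            nn.Linear(128, 64),        # hidden layer 2
            nn.ReLU(),
            nn.Linear(64, n_classes),  # output layer (class scores)
        )

    def forward(self, x):
        return self.net(x)

    def predict_class(self, image: torch.Tensor) -> int:
        # Returns an int that determines the class of the input image.
        with torch.no_grad():
            logits = self.forward(image.unsqueeze(0))
        return int(logits.argmax(dim=1).item())

model = SmallMLP()
dummy = torch.rand(1, 64, 64)  # placeholder grayscale image
print(model.predict_class(dummy))
```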
- Nicolas Dimeglio
- SĂ©bastien Romano
- Alexandre Vesseron
- Samir Ouchani
Two years after the first COVID-19 infections and their rapid propagation, death and infection cases are still increasing exponentially. Unfortunately, during this not fully controlled situation, we noticed that the existing solutions for COVID-19 detection based on chest X-rays were not reliable enough in relation to the number of infected patients and the severity of the outbreak. To handle this issue by increasing the reliability and the efficiency of COVID-19 detection, we deploy and compare the results of a set of reconfigurable classification approaches and deep learning techniques. Indeed, we achieved a score of up to 99% accuracy with a dataset of 15,000 X-ray images, which makes the selected detection technique, deep learning, more reliable and effective.
... However, use of these tools still requires time, from a few minutes for 2D analysis to up to 20 min for complex 3D reconstructions [71][72][73]. Computer-assisted artificial intelligence and machine learning algorithms (MLAs) are currently being developed and tested in spine research with promising potential, from anatomic localization to predictive analytics and clinical decision support [4,74]. Yet these tools are far from smartphone/mobile integration. ...
Purpose of Review While limited to case reports or small case series, emerging evidence advocates the inclusion of smartphone-interfacing mobile platforms and wearable technologies, consisting of internet-powered mobile and wearable devices that interface with smartphones, in the orthopaedic surgery practice. The purpose of this review is to investigate the relevance and impact of this technology in orthopaedic surgery. Recent Findings Smartphone-interfacing mobile platforms and wearable technologies are capable of improving the patients' quality of life as well as the extent of their therapeutic engagement, while promoting the orthopaedic surgeons' abilities and level of care. Offered advantages include improvements in diagnosis and examination, preoperative templating and planning, and intraoperative assistance, as well as postoperative monitoring and rehabilitation. Supplemental surgical exposure, through haptic feedback and realism of audio and video, may add another perspective to these innovations by simulating the operative environment and potentially adding a virtual tactile feature to the operator's visual experience. Summary Although encouraging in the field of orthopaedic surgery, surgeons should be cautious when using smartphone-interfacing mobile platforms and wearable technologies, given the lack of a current academic governing board certification and clinical practice validation processes.
... Tech optimists believe there is no doubt that in the next 21 years, augmented artificial intelligence technological advancement will lead to a leap in deep learning that emulates the way youngsters learn, instead of the arduous guidance of custom-made programs directed at particular applications that depend on logic, decision trees, and if-then rules (Galbusera et al., 2019). For example, DeepMind relies on a neural program using deep learning that works out by itself how to play various Atari games, like Breakout, as well as or better than humans, without detailed guidance on how to do it, but by playing a dozen games and revamping itself every time. ...
As artificial intelligence's potential and pervasiveness continue to increase, its strategic importance, effects, and management must be closely examined. Societies, governments, and business organizations need to view artificial intelligence (AI) technologies and their usage from an entirely different perspective. AI is poised to have a tremendous impact on every aspect of our lives. Therefore, we must take a broader view that transcends AI's technical capabilities and perceived value, including the areas of AI's impact and influence. Nicholas G. Carr's seminal paper "IT Doesn't Matter" (Carr, 2003) explained how IT's potential and ubiquity have increased, but IT's strategic importance has declined with time. AI is poised to meet the same fate as IT. In fact, the commoditization of AI has already begun. This paper presents arguments to demonstrate that AI is moving rapidly in this direction. It also proposes an artificial intelligence-based organizational framework to gain value-added elements for lowering the impact of AI commoditization.
... approaches, both in the case of the reduced and the full model (Figure 3). Support vector machine (SVM), predictive discriminant analysis (PDA), naive Bayes classifier (BAY), decision tree (DET), k-nearest neighbors (KNN), and ensemble method (ENS) were considered (Scholz and Wimmer, 2021; Galbusera et al., 2019; Minasny, 2009; Harper, 2005). Preliminary tuning of the hyperparameters was performed (Table 1). ...
A major clinical challenge in adolescent idiopathic scoliosis (AIS) is the difficulty of predicting curve progression at initial presentation. The early detection of progressive curves can offer the opportunity to better target effective non-operative treatments, reducing the need for surgery and the risks of related complications. Predictive models for the detection of scoliosis progression in subjects before the growth spurt have been developed. These models accounted for geometrical parameters of the global spine and local descriptors of the scoliotic curve, but neglected contributions from biomechanical measurements such as trunk muscle activation and intervertebral loading, which could provide advantageous information. The present study exploits a musculoskeletal model of the thoracolumbar spine, developed in the AnyBody software and adapted and validated for the subject-specific characterization of mild scoliosis. A dataset of 100 AIS subjects with mild scoliosis, in pre-pubertal age at first examination, and recognized as stable (60) or progressive (40) after a follow-up period of at least 6 months was exploited. Anthropometrical data and geometrical parameters of the spine at first examination, as well as biomechanical parameters from a musculoskeletal simulation replicating relaxed upright posture, were accounted for as predictors of scoliosis progression. Predicted height and weight were used for model scaling because they were not available in the original dataset; a robust procedure for obtaining these parameters from radiographic images was developed by exploiting a comparable dataset with real values. Six predictive modelling approaches based on different algorithms for the binary classification of stable and progressive cases were compared. The best-fitting approaches were used to evaluate the effect of accounting for the biomechanical parameters on the prediction of scoliosis progression. The performance of two sets of predictors was compared: one accounting for anthropometrical and geometrical parameters only, the other considering in addition the biomechanical ones. The median accuracy of the best-fitting algorithms ranged from 0.76 to 0.78. No differences were found in the classification performance when including or neglecting the biomechanical parameters. Median sensitivity was 0.75, and median specificity ranged from 0.75 to 0.83. In conclusion, accounting for biomechanical measures did not enhance the prediction of curve progression, thus not supporting a potential clinical application at this stage.
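The kind of cross-validated comparison of binary classifiers described in this abstract can be sketched as follows (scikit-learn, placeholder data only; the predictors, labels, algorithm choices, and hyperparameters are illustrative assumptions, not the study's dataset or tuning):

```python
# Illustrative sketch only: cross-validated comparison of several binary
# classifiers for "stable" vs. "progressive" curves on random placeholder data.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))      # 100 subjects, 12 candidate predictors (placeholders)
y = rng.integers(0, 2, size=100)    # 0 = stable, 1 = progressive (placeholder labels)

models = {
    "SVM": SVC(),
    "PDA": LinearDiscriminantAnalysis(),  # stand-in for predictive discriminant analysis
    "BAY": GaussianNB(),
    "DET": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "ENS": RandomForestClassifier(random_state=0),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=cv)
    print(f"{name}: median accuracy = {np.median(scores):.2f}")
```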
Cervical myelopathy (CM) is a pathology of the spinal cord that causes upper limb disorders. CM is often screened by conducting the 10-s grip and release (G&R) test, which mainly focuses on hand dysfunction caused by CM. This test has patients repeat gripping and releasing their hands as quickly as possible. Spine surgeons observe the quickness of this repetition to screen for CM. We propose an automatic screening method for CM that involves patients' hands recorded as videos while they are performing the G&R test. The videos are used to estimate feature values, i.e., the positions of each part of the hand, which are obtained through image processing. A support vector machine classifier classifies CM patients and controls with these feature values after pre-processing. We validated our method with 10-fold cross-validation and the videos of 20 CM patients and 15 controls. The results indicate that sensitivity, specificity, and area under the receiver operating characteristic curve were 90.0%, 93.3%, and 0.947, respectively.
- Simon P. Lalehzarian
- Anirudh K Gowd
- Joseph N. Liu
Artificial intelligence and machine learning in orthopaedic surgery have gained mass interest over the last decade or so. In prior studies, researchers have demonstrated that machine learning in orthopaedics can be used for different applications such as fracture detection, bone tumor diagnosis, detecting hip implant mechanical loosening, and grading osteoarthritis. As time goes on, the utility of artificial intelligence and machine learning algorithms, such as deep learning, continues to grow and expand in orthopaedic surgery. The purpose of this review is to provide an understanding of the concepts of machine learning and a background of current and future orthopaedic applications of machine learning in risk assessment, outcomes assessment, imaging, and basic science fields. In most cases, machine learning has proven to be just as effective, if not more effective, than prior methods such as logistic regression in assessment and prediction. With the help of deep learning algorithms, such as artificial neural networks and convolutional neural networks, artificial intelligence in orthopaedics has been able to improve diagnostic accuracy and speed, flag the most critical and urgent patients for immediate attention, reduce the amount of human error, reduce the strain on medical professionals, and improve care. Because machine learning has shown diagnostic and prognostic uses in orthopaedic surgery, physicians should continue to research these techniques and be trained to use these methods effectively in order to improve orthopaedic treatment.
- Julio C Furlan
- Jefferson R. Wilson
- Eric M. Massicotte
- Fehlings G Michael
The field of spinal oncology has substantially evolved over the past decades. This review synthesizes and appraises what was learned and what will potentially be discovered from the recently completed and ongoing clinical studies related to the treatment of primary and secondary spinal neoplasms. This scoping review included all clinical studies on the treatment of spinal neoplasms registered in the ClinicalTrials.gov website from February/2000 to December/2020. The terms "spinal cord tumor", "spinal metastasis", and "metastatic spinal cord compression" were used. Of the 174 registered clinical studies on primary spinal tumors and spinal metastasis, most of the clinical studies registered in this American registry were interventional studies led by single institutions in North America (n=101), Europe (n=43), Asia (n=24) or other continents (n=6). The registered clinical studies mainly focused on treatment strategies for spinal neoplasms (90.2%) that included investigating stereotactic radiosurgery (n=33), radiotherapy (n=21), chemotherapy (n=20), and surgical technique (n=11). Of the 69 completed studies, the results from 44 studies were published in the literature. In conclusion, this review highlights the key features of the 174 clinical studies on spinal neoplasms that were registered from 2000 to 2020. Clinical trials were heavily skewed towards the metastatic population as opposed to the primary tumours which likely reflects the rarity of the latter condition and associated challenges in undertaking prospective clinical studies in this population. This review serves to emphasize the need for a focused approach to enhancing translational research in spinal neoplasms with a particular emphasis on primary tumors.
Purpose We present an automated method for extracting anatomical parameters from biplanar radiographs of the spine, which is able to deal with a wide scenario of conditions, including sagittal and coronal deformities, degenerative phenomena, as well as images acquired with different fields of view. Methods The location of 78 landmarks (end plate centers, hip joint centers, and margins of the S1 end plate) was extracted from three-dimensional reconstructions of 493 spines of patients suffering from various disorders, including adolescent idiopathic scoliosis, adult deformities, and spinal stenosis. A fully convolutional neural network featuring an additional differentiable spatial to numerical (DSNT) layer was trained to predict the location of each landmark. The values of some parameters (T4–T12 kyphosis, L1–L5 lordosis, Cobb angle of scoliosis, pelvic incidence, sacral slope, and pelvic tilt) were then calculated based on the landmarks' locations. A quantitative comparison between the predicted parameters and the ground truth was performed on a set of 50 patients. Results The spine shape predicted by the models was perceptually convincing in all cases. All predicted parameters were strongly correlated with the ground truth. However, the standard errors of the estimated parameters ranged from 2.7° (for the pelvic tilt) to 11.5° (for the L1–L5 lordosis). Conclusions The proposed method is able to automatically determine the spine shape in biplanar radiographs and calculate anatomical and posture parameters in a wide scenario of clinical conditions with a very good visual performance, despite limitations highlighted by the statistical analysis of the results. Graphical abstract: these slides can be retrieved under the Electronic Supplementary Material.
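The differentiable spatial-to-numerical (DSNT) idea mentioned in this abstract can be sketched as follows: a landmark heatmap produced by a fully convolutional network is softmax-normalized and reduced to expected (x, y) coordinates. The tensor shapes and names below are assumptions for illustration (PyTorch), not the authors' implementation:

```python
# Sketch of a DSNT-style reduction from heatmaps to landmark coordinates.
import torch
import torch.nn.functional as F

def dsnt(heatmap: torch.Tensor) -> torch.Tensor:
    """heatmap: (batch, landmarks, H, W) -> coordinates (batch, landmarks, 2) in [-1, 1]."""
    b, l, h, w = heatmap.shape
    probs = F.softmax(heatmap.view(b, l, -1), dim=-1).view(b, l, h, w)
    ys = torch.linspace(-1.0, 1.0, h).view(1, 1, h, 1)
    xs = torch.linspace(-1.0, 1.0, w).view(1, 1, 1, w)
    x = (probs * xs).sum(dim=(-2, -1))   # expected column coordinate
    y = (probs * ys).sum(dim=(-2, -1))   # expected row coordinate
    return torch.stack([x, y], dim=-1)

heatmaps = torch.randn(1, 78, 64, 32)    # 78 landmarks, placeholder resolution
coords = dsnt(heatmaps)
print(coords.shape)                      # torch.Size([1, 78, 2])
```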
- Alan F T Winfield
- Marina Jirotka
This paper explores the question of ethical governance for robotics and artificial intelligence (AI) systems. We outline a roadmap—which links a number of elements, including ethics, standards, regulation, responsible research and innovation, and public engagement—as a framework to guide ethical governance in robotics and AI. We argue that ethical governance is essential to building public trust in robotics and AI, and conclude by proposing five pillars of good ethical governance. This article is part of the theme issue 'Governing artificial intelligence: ethical, legal, and technical opportunities and challenges'.
Worldwide interest in artificial intelligence (AI) applications is growing rapidly. In medicine, devices based on machine/deep learning have proliferated, especially for image analysis, presaging significant new challenges for the utility of AI in healthcare. This inevitably raises numerous legal and ethical questions. In this paper we analyse the state of AI regulation in the context of medical device development, and strategies to make AI applications safe and useful in the future. We analyse the legal framework regulating medical devices and data protection in Europe and in the United States, assessing developments that are currently taking place. The European Union (EU) is reforming these fields with new legislation (General Data Protection Regulation [GDPR], Cybersecurity Directive, Medical Devices Regulation, In Vitro Diagnostic Medical Device Regulation). This reform is gradual, but it has now made its first impact, with the GDPR and the Cybersecurity Directive having taken effect in May 2018. As regards the United States (U.S.), the regulatory scene is predominantly controlled by the Food and Drug Administration. This paper considers issues of accountability, both legal and ethical. The decision-making processes of these medical devices are largely unpredictable, and therefore holding their creators accountable for them clearly raises concerns. There is a lot that can be done in order to regulate AI applications. If this is done properly and in a timely manner, the potential of AI-based technology, in radiology as well as in other fields, will be invaluable. Teaching Points • AI applications are medical devices supporting detection/diagnosis, work-flow, cost-effectiveness. • Regulations for safety, privacy protection, and ethical use of sensitive information are needed. • EU and U.S. have different approaches for approving and regulating new medical devices. • EU laws consider cyberattacks, incidents (notification and minimisation), and service continuity. • U.S. laws ask for opt-in data processing and use as well as for clear consumer consent.
Study design: A biomechanical study of pedicle-screw pullout strength. Purpose: To develop a decision tree based on pullout strength for evaluating pedicle-screw instrumentation. Overview of literature: Clinically, a surgeon's understanding of the holding power of a pedicle screw is based on perioperative intuition (which is like insertion torque) while inserting the screw. This is a subjective feeling that depends on the skill and experience of the surgeon. With the advent of robotic surgery, there is an urgent need for the creation of a patient-specific surgical planning system. A learning-based predictive model is needed to understand the sensitivity of pedicle-screw holding power to various factors. Methods: Pullout studies were carried out on rigid polyurethane foam, representing extremely osteoporotic to normal bone for different insertion depths and angles of a pedicle screw. The results of these experimental studies were used to build a pullout-strength predictor and a decision tree using a machine-learning approach. Results: Based on analysis of variance, it was found that all the factors under study had a significant effect (p <0.05) on the holding power of a pedicle screw. Of the various machine-learning techniques, the random forest regression model performed well in predicting the pullout strength and in creating a decision tree. Performance was evaluated, and a correlation coefficient of 0.99 was obtained between the observed and predicted values. The mean and standard deviation of the normalized predicted pullout strength for the confirmation experiment using the current model was 1.01±0.04. Conclusions: The random forest regression model was used to build a pullout-strength predictor and decision tree. The model was able to predict the holding power of a pedicle screw for any combination of density, insertion depth, and insertion angle for the chosen range. The decision-tree model can be applied in patient-specific surgical planning and a decision-support system for spine-fusion surgery.
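The kind of random forest regression described in this abstract, mapping foam density, insertion depth, and insertion angle to pullout strength, can be sketched as follows (scikit-learn; the value ranges and the generating relationship below are synthetic placeholders for illustration, not the experimental measurements of the study):

```python
# Hedged sketch of a random forest pullout-strength predictor on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 200
density = rng.uniform(0.08, 0.32, n)   # g/cm^3, assumed range from osteoporotic to normal foam
depth = rng.uniform(20, 40, n)         # mm of screw insertion (assumed range)
angle = rng.uniform(0, 30, n)          # degrees off the nominal trajectory (assumed range)
# Placeholder relationship: strength rises with density and depth, falls with angle.
pullout = 2000 * density + 15 * depth - 5 * angle + rng.normal(0, 20, n)

X = np.column_stack([density, depth, angle])
X_train, X_test, y_train, y_test = train_test_split(X, pullout, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("R^2 on held-out data:", round(r2_score(y_test, forest.predict(X_test)), 3))
```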
Background: A prediction model for surgical site infection (SSI) after spine surgery was developed in 2014 by Lee et al. This model was developed to compute an individual estimate of the probability of SSI after spine surgery based on the patient's comorbidity profile and the invasiveness of surgery. Before any prediction model can be validly implemented in daily medical practice, it should be externally validated to assess how the prediction model performs in patients sampled independently from the derivation cohort. Methods: We included 898 consecutive patients who underwent instrumented thoracolumbar spine surgery. Overall performance was quantified using Nagelkerke's R2 statistic, and discriminative ability was quantified as the area under the receiver operating characteristic curve (AUC). We computed the slope of the calibration plot to judge prediction accuracy. Results: Sixty patients developed an SSI. The overall performance of the prediction model in our population was poor: Nagelkerke's R2 was 0.01. The AUC was 0.61 (95% confidence interval (CI) 0.54-0.68). The estimated slope of the calibration plot was 0.52. Conclusions: The previously published prediction model showed poor performance in our academic external validation cohort. To predict SSI after instrumented thoracolumbar spine surgery in the present population, a better fitting prediction model should be developed.
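Two of the validation measures used in this abstract, discrimination (AUC) and the calibration slope, can be computed as in the sketch below. The predictions are synthetic placeholders, and the calibration slope is obtained here by regressing the observed outcome on the logit of the predicted probability, one common approach that may differ in detail from the study's procedure:

```python
# Illustrative sketch: AUC and calibration slope on synthetic predictions.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                                   # observed SSI (0/1)
p_pred = np.clip(rng.beta(2, 8, size=500) + 0.1 * y_true, 0.01, 0.99)   # predicted risk

auc = roc_auc_score(y_true, p_pred)

logit = np.log(p_pred / (1 - p_pred)).reshape(-1, 1)        # linear predictor
# Large C approximates an unpenalized logistic fit; its coefficient is the calibration slope.
calibration_slope = LogisticRegression(C=1e6).fit(logit, y_true).coef_[0][0]

print(f"AUC = {auc:.2f}, calibration slope = {calibration_slope:.2f}")
```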
- Leo Breiman
- Jerome H. Friedman
- Richard A. Olshen
- Charles J. Stone
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
- Hongbo Wu
- Chris Bailey
- Parham Rasoulinejad
- Shuo Li
Automated quantitative estimation of spinal curvature is an important task for the ongoing evaluation and treatment planning of Adolescent Idiopathic Scoliosis (AIS). It solves the widely accepted disadvantage of manual Cobb angle measurement (time-consuming and unreliable), which is currently the gold standard for AIS assessment. Attempts have been made to improve the reliability of automated Cobb angle estimation. However, it is very challenging to achieve accurate and robust estimation of Cobb angles due to the need for correctly identifying all the required vertebrae in both Anterior-posterior (AP) and Lateral (LAT) view x-rays. The challenge is especially evident in LAT x-rays, where occlusion of vertebrae by the ribcage occurs. We therefore propose a novel Multi-View Correlation Network (MVC-Net) architecture that can provide a fully automated end-to-end framework for spinal curvature estimation in multi-view (both AP and LAT) x-rays. The proposed MVC-Net uses our newly designed multi-view convolution layers to incorporate joint features of multi-view x-rays, which allows the network to mitigate the occlusion problem by utilizing the structural dependencies of the two views. The MVC-Net consists of three closely-linked components: (1) a series of X-modules for joint representation of spinal structure, (2) a Spinal Landmark Estimator network for robust spinal landmark estimation, and (3) a Cobb Angle Estimator network for accurate Cobb angle estimation. By utilizing an iterative multi-task training algorithm to train the Spinal Landmark Estimator and Cobb Angle Estimator in tandem, the MVC-Net leverages the multi-task relationship between landmark and angle estimation to reliably detect all the required vertebrae for accurate Cobb angle estimation. Experimental results on 526 x-ray images from 154 patients show an impressive 4.04° Circular Mean Absolute Error (CMAE) in AP Cobb angle and 4.07° CMAE in LAT Cobb angle estimation, which demonstrates the MVC-Net's capability of robust and accurate estimation of Cobb angles in multi-view x-rays. Our method therefore provides clinicians with a framework for efficient, accurate, and reliable estimation of spinal curvature for comprehensive AIS assessment.
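The circular mean absolute error (CMAE) reported in this abstract can be computed as in the following small sketch; this wrapping-based definition is an assumption about the metric for illustration, not a formula taken from the paper:

```python
# Sketch of a circular mean absolute error for angle estimates (degrees).
import numpy as np

def circular_mae(pred_deg: np.ndarray, true_deg: np.ndarray) -> float:
    diff = np.abs(pred_deg - true_deg) % 360.0
    diff = np.minimum(diff, 360.0 - diff)   # wrap so that 355 vs. 5 counts as 10 degrees
    return float(diff.mean())

pred = np.array([12.0, 44.5, 355.0])   # placeholder predicted angles
true = np.array([10.0, 40.0, 5.0])     # placeholder reference angles
print(circular_mae(pred, true))        # 5.5
```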
Artificial intelligence (AI) algorithms, particularly deep learning, have demonstrated remarkable progress in image-recognition tasks. Methods ranging from convolutional neural networks to variational autoencoders have found myriad applications in the medical image analysis field, propelling it forward at a rapid pace. Historically, in radiology practice, trained physicians visually assessed medical images for the detection, characterization and monitoring of diseases. AI methods excel at automatically recognizing complex patterns in imaging data and providing quantitative, rather than qualitative, assessments of radiographic characteristics. In this Opinion article, we establish a general understanding of AI methods, particularly those pertaining to image-based tasks. We explore how these methods could impact multiple facets of radiology, with a general focus on applications in oncology, and demonstrate ways in which these methods are advancing the field. Finally, we discuss the challenges facing clinical implementation and provide our perspective on how the domain could be advanced. Full text: https://rdcu.be/O1xz
- Nikolas Lessmann
- Bram van Ginneken
- Pim A. de Jong
- Ivana IĆĄgum
Precise segmentation of the vertebrae is often required for automatic detection of vertebral abnormalities. This especially enables incidental detection of abnormalities such as compression fractures in images that were acquired for other diagnostic purposes. While many CT and MR scans of the chest and abdomen cover a section of the spine, they often do not cover the entire spine. Additionally, the first and last visible vertebrae are likely only partially included in such scans. In this paper, we therefore approach vertebra segmentation as an instance segmentation problem. A fully convolutional neural network is combined with an instance memory that retains information about already segmented vertebrae. This network iteratively analyzes image patches, using the instance memory to search for and segment the first not yet segmented vertebra. At the same time, each vertebra is classified as completely or partially visible, so that partially visible vertebrae can be excluded from further analyses. We evaluated this method on spine CT scans from a vertebra segmentation challenge and on low-dose chest CT scans. The method achieved an average Dice score of 95.8% and 92.1%, respectively, and a mean absolute surface distance of 0.194 mm and 0.344 mm.
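The Dice score used in this abstract to quantify segmentation overlap can be computed as in the toy sketch below (2D binary masks for brevity; the cited work evaluates 3D vertebra masks):

```python
# Minimal sketch of the Dice overlap between a predicted and a reference mask.
import numpy as np

def dice_score(pred: np.ndarray, ref: np.ndarray) -> float:
    pred, ref = pred.astype(bool), ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    return 2.0 * intersection / denom if denom else 1.0

pred = np.zeros((10, 10), dtype=bool); pred[2:8, 2:8] = True   # toy predicted mask
ref = np.zeros((10, 10), dtype=bool);  ref[3:9, 3:9] = True    # toy reference mask
print(round(dice_score(pred, ref), 3))                         # ~0.694
```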