The state of Artificial Intelligence in medical imaging • Part II
Are radiologists’ neurons faster and cheaper?

Rinckside 2022; 33,3: 5-9.


The overall positive picture of AI painted in many presentations at the European Congress of Radiology in Vienna this summer and at other conferences is not necessarily mirrored outside the IT business and the technologists' world of medical imaging. The same features are seen in a different light by radiologists in hospitals and private offices, as well as by independent AI experts and major consulting companies.

Reservations are building up against naive and simplistic promises of what AI will be able to deliver. The oft-described neural networks are clearly gross oversimplifications of the actual neurons of a human brain. The neurons of a well-trained radiologist work faster and more efficiently — although computer assistance can facilitate administrative, diagnostic, and research routines.

Jack Copeland, a leading professor and researcher in the field, wrote about artificial intelligence:

“Exaggerated claims of success, in professional journals as well as the popular press, have damaged its reputation. At the present time even an embodied system displaying the overall intelligence of a cockroach is proving elusive, let alone a system that can rival a human being. The difficulty of scaling up AI’s modest achievements cannot be overstated [1].”

Eric Daimler, the chief executive of Conexus AI in San Francisco, shares that opinion:

“The trendy foundational models of deep learning are not software composable. This is a limitation of the models and means that they will always have weaknesses that are more appropriate to jobs with low-consequence outcomes. Deploying this tech alone in life-critical environments is not currently solvable with just bigger models [2].”


The classification of AI

AI research attempts to reach one of three goals:

(1) Strong AI, which aims to build machines that think;
(2) Cognitive Simulation, in which computers are used to test theories about how the human mind works — for example, theories about how people recognize faces or recall memories; and
(3) Applied AI, also known as advanced information processing, which aims to produce commercially viable ‘smart’ systems — for example, expert medical diagnosis systems such as supervised or unsupervised computer-assisted detection or diagnosis (CAD) [3] or machine (deep) learning (ML) [4].

Software designs offered for medical imaging are not genuine AI, but rather basic or sophisticated CAD or ML systems. Machine learning is concerned with the question of how to construct computer programs that automatically improve with experience; a toy sketch of what that means in practice follows the list below. Their aim in radiology is that more routine imaging, including diagnosis and reporting, be done in an automated way. For this purpose four prerequisites must be met:

data of sufficient quantity and quality,
a powerful algorithm,
a narrowly defined task area,
a concrete goal to be achieved.
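
To make ‘improvement with experience’ concrete, here is a minimal sketch in Python: invented synthetic ‘case features’ rather than real images, with no connection to any vendor’s product. A simple classifier is trained on a narrowly defined binary task, and its test accuracy rises as the training set grows.

    # Purely illustrative: a classifier on synthetic, made-up "case features"
    # whose test accuracy improves as it sees more training examples.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)

    def make_cases(n):
        # Two hypothetical feature measurements per case; the label is a
        # noisy function of them.
        X = rng.normal(size=(n, 2))
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n) > 0).astype(int)
        return X, y

    X_test, y_test = make_cases(5000)
    for n_train in (10, 100, 1000):              # growing "experience"
        X_train, y_train = make_cases(n_train)
        model = LogisticRegression().fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"trained on {n_train:4d} cases: test accuracy {acc:.2f}")

The sketch satisfies all four prerequisites by construction; real imaging data, as the following paragraphs argue, rarely do.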

Of the four prerequisites, sufficient amounts of data will be easily available; however, its quality is and will remain imprecise, inadequate, and often irreproducible, as described for instance by Lloret [5]:

“One of the problems comes from the variability of the data itself (e.g., contrast, resolution, signal-to-noise) which make the Deep Learning models suffer from a poor generalization when the training data come from different machines (different vendor, model, etc.) with different acquisition parametrization or any underlying component that can cause the data distribution to shift.”


Data quality is and will remain
imprecise, inadequate, and often irreproducible



More so, it is well known that scanner effects can subtly yet significantly affect machine learning [6]. This holds for both quantification and detection, the most common AI/ML applications for which prospective vendors seek FDA approval. We have discussed the pitfalls of such quantifications earlier [7].
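
How little it takes can be shown with a similar minimal sketch. Again the numbers are invented; a simple gain-and-offset change stands in for a different scanner’s intensity calibration:

    # Invented illustration of a scanner effect: the model is trained on
    # "scanner A" data and tested on the same cases after a gain/offset
    # change, mimicking a different machine's intensity calibration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 2))               # two synthetic features
    y = (X[:, 0] > 0).astype(int)                # diagnosis depends on feature 0

    model = LogisticRegression().fit(X, y)       # trained on "scanner A"

    X_scanner_b = 1.5 * X + 0.8                  # same cases, recalibrated values

    print("scanner A accuracy:", accuracy_score(y, model.predict(X)))
    print("scanner B accuracy:", accuracy_score(y, model.predict(X_scanner_b)))

In this toy, accuracy on the shifted data drops markedly although the underlying cases are identical. The distribution shift alone does the damage.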

Suitable algorithms will be obtainable — yet each AI vendor is producing its own variants. Some will be better than others, and all will presumably deliver — slightly or distinctly — different results.

As for the task areas and concrete goals, there will be dozens, perhaps hundreds, of different software packages for different organs or diagnostic questions. There won’t be one general algorithm based on training datasets for the whole human body — with all the variations from children to old people [8] and covering sufficient geographic locations representing diverse cohorts [9,10].

In the end, the software should be able to draw inferences relevant to the solution of the particular task or situation. Often, validation of CAD and ML systems is missing [11]. As one example among many, a group from the University of Cambridge scrutinized several thousand publications and concluded:

“Despite the huge efforts of researchers to develop machine learning models for COVID-19 diagnosis and prognosis, we found methodological flaws and many biases throughout the literature, leading to highly optimistic reported performance [12].”


Fast Science

AI in medicine, and particularly in medical imaging, has long slipped out of the control of dependable scientists. Looking at the publications and talks at meetings, there are more unqualified than qualified contributions. Similar to the frenzied hype around functional imaging (fMRI) that led to some 40,000 fMRI papers of ‘questionable validity’ [13,14], it is to be feared that the way applied AI is used in medical imaging carries an analogous risk.

A new approach to research is catching on: Fast Science. All and sundry presume to have expertise in anything, including AI, but most lack the competence to explain and judge. There is a quasi-religious belief in artificial intelligence, fed by science fiction fantasies. Everybody wants to beat a possible financial or career competitor by a whisker. Often the arguments are not scientific but ad hominem:

“… beneficial AI applications run the risk of not being adopted because of a lack of proven health and economic benefits and may lead to potential health loss and unnecessary costs, which are likely to persist until AI, with its seemingly endless possibilities, is recognized as an intervention that can and should be properly assessed [15].”

In other words, if you don’t jump on the AI train immediately you are guilty — you hurt patients and waste their money. You are coerced into jumping on the train of ‘endless possibilities’. Crowd psychology teaches that there is a human desire to be a member of a group — thinking, behaving, and deciding the same way without individual critical evaluation — to minimize conflict and avoid exclusion: don’t check whether AI works and has a proven positive impact — just be part of it.

Medicine is on its way back to arbitrary research that no longer questions its own scientific groundwork. Thus, half-baked laymen’s wishes determine the direction — and in most cases these laymen are IT specialists, health administrators, and even natural scientists getting involved in medicine. A wave of hocus-pocus and hocus-bogus is rolling.

However, reassessment during the last few years has led to a certain pensiveness. It seems as if many of the promised benefits have failed to materialize. Will the results and the outcome be cheaper, faster, more reliable, and better than the evaluation of medical images by a trained radiologist?


Surveys by radiological associations and acceptance by potential users

Surveys by radiological societies and consulting firms are sobering: artificial intelligence faces slow acceptance and is achieving fairly limited success. One conclusion of an analysis by the news magazine The Economist together with the Swiss Pictet banking group reads:

“Findings suggest that AI investment is increasingly concentrated in a narrowing field of commercial applications, which may come at the expense of more exploratory and foundational research [16].”

The acceptance by radiologists is guarded and sluggish, as surveys by both the European Society of Radiology (ESR) and the American College of Radiology (ACR) reveal:

ESR: “In the previous ESR survey conducted in 2018, 51% of respondents expected that the use of AI tools would lead to a reduced reporting workload. The actual contributions of AI to the workload of diagnostic radiologists were assessed in a recent analysis based on large number of published studies. It was concluded that although there was often added value to patient care, workload was decreased in only 4% but increased in 48% and remained unchanged in 46% institutions. In summary, this survey suggests that, compared with initial expectations, the use of AI-powered algorithms in practical clinical radiology today is limited, most importantly because the impact of these tools on the reduction of radiologists’ workload remains unproven [17].”

ACR: “Approximately 30% of radiologists [in the U.S.A.] are currently using AI as part of their practice. Large practices were more likely to use AI than smaller ones, and of those using AI in clinical practice, most were using AI to enhance interpretation, most commonly detection of intracranial hemorrhage, pulmonary emboli, and mammographic abnormalities. Of practices not currently using AI, 20% plan to purchase AI tools in the next 1 to 5 years. … Conclusion: … The survey results indicate a modest penetrance of AI in clinical practice [18].”

AI is mostly used and tried out in university and other teaching hospitals — to produce articles and conference talks that promote the careers of younger doctors. The increase in the number of examinations, particularly at high-throughput institutions, doesn’t necessarily go hand in hand with quality.

“Recently published medical imaging studies often add value to radiological patient care. However, they likely increase the overall workload of diagnostic radiologists, and this particularly applies to AI studies [19].”


Risks

There are numerous risks, ‘second-order’ effects, and unexpected, uncontrollable implications of employing AI/CAD/ML.

With only small amounts of training data, deep learning models can figure out demographic features such as age, sex, body-mass index, and race even from corrupted, cropped, and noisy anonymized chest x-rays and CT images, with high discriminative performance — often in cases where clinical experts are unable to pinpoint these features. This ability creates an enormous risk for all possible deployments in medical imaging, because the AI software could run amok invisibly in the background. It is a bias that might lead to wrong diagnoses and therapy, as well as to discrimination against patients [20,21,22].

The results are artificial ‘gossip’ and ‘rumors’. You can’t trust this secondary outcome, and it has nothing to do with the task the software has been asked to perform. Artificial intelligence of this kind is not intelligent enough to distinguish real facts from self-created fiction.
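
A simple way to probe for such leakage is to test whether a plain ‘probe’ classifier can predict a protected attribute from the same features a diagnostic model would see; if it can, the diagnostic model can exploit the same shortcut. The sketch below uses invented data and is an illustration, not a validated audit procedure:

    # Hypothetical "leakage" audit on invented data: if a simple probe can
    # predict a protected attribute from the features a diagnostic model
    # uses, that attribute is available to the model as a shortcut.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    features = rng.normal(size=(1000, 16))          # stand-in for image embeddings
    attribute = (features[:, 3] > 0).astype(int)    # attribute leaks into one dimension

    probe = LogisticRegression(max_iter=1000)
    scores = cross_val_score(probe, features, attribute, cv=5)
    print(f"probe accuracy: {scores.mean():.2f} (about 0.5 would mean no leakage)")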

A report by the US consulting company McKinsey discusses other potential risks of AI in detail. It claims on the one hand that AI will improve our lives by “enhancing our healthcare experiences” — whatever that might mean — but also sees:

“There also are second-order effects, such as the atrophy of skills (for example, the diagnostic skills of medical professionals) as AI systems grow in importance [23].”

However, trained radiologists are essential: while regulatory authorities may develop a series of test cases to compare products from different vendors, these cases will only reflect a limited range of pathologies — rarer pathologies may not be included. The end user has no concept of how good or bad a particular algorithm is at making a correct diagnosis in a particular case, and the system is likely to provide a black-or-white response.

Can the software say ‘I don't know, I am not sure — we need an expert opinion in this area’? The user of the software will be unaware that there may be a degree of uncertainty or bias.
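
Technically, software can be made to say ‘I don’t know’: a common, if partial, remedy is to abstain and refer the case whenever the predicted probability is close to chance. The sketch below uses an invented threshold and invented data; whether a given commercial product exposes anything of the sort is exactly the question buyers should ask:

    # Sketch of abstention ("I don't know"): cases whose predicted
    # probability is too close to 0.5 are referred to a human expert.
    # Threshold and data are illustrative only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 2))
    y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
    model = LogisticRegression().fit(X, y)

    p = model.predict_proba(X)[:, 1]
    confident = np.abs(p - 0.5) > 0.3            # keep only clear-cut cases
    print(f"answered by the model: {confident.mean():.0%}, "
          f"referred to an expert: {(~confident).mean():.0%}")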

Radiologists tend to know colleagues who have particular expertise in certain fields. They can refer difficult cases to them for a second opinion. Does this kind of referral have any equivalent in AI? The pundits will say that AI will improve as the training data increases — but what happens when radiologists providing difficult or rare diagnostic solutions no longer exist because AI has superseded them?

What happens if a health system fails (in this case the British NHS) and there are no radiologists available? Then it is: any port in a storm. For most European countries this was an unthinkable development, although even on the continent it was brought up earlier [24]. Now we read:

“Radiographer reporting is accepted practice in the UK. With a national shortage of radiographers and radiologists, artificial intelligence (AI) support in reporting may help minimise the backlog of unreported images [25].”

The authors explain that a minimum of 50% of plain x-ray images should be reported by a radiographer with the help of computer-assisted diagnosis. They admit that the complexity of these systems means that the processes are not transparent, sometimes even to the developers.

Meanwhile, of all institutions, the European Union has woken up and wants ‘a risk-based approach’ to AI [26].


The commercial side and conflicts of interest

Preconceived ideas deceive the senses. Who is credible and trustworthy in AI development? A study of industry ties reveals:

“We found that the prevalence of financial ties to industry … was high. For nearly 30% of comments, we were unable to determine whether or not there was a financial tie, and disclosure of ties was non-existent. The proportion of academic submitters was relatively low, and the use of scientific evidence to support comments was sparse. We recommend that the FDA requires disclosure of potential conflict-of-interest, and encourages greater academic participation and use of scientific evidence in public comments [27].”

There are several hundred vendors of “AI” for medical imaging applications, among them a large number of start-ups and spin-offs. There is no place in the market for more than 90% of them. They will not stay alive and will disappear because there will be no return on investment — be it government or European Union money, venture capital, or other sources. Some have already merged with competitors because they cannot survive as standalone companies. What will happen to their employees, their founders, the venture capital invested, the state grants given?

One of the taboo topics in AI sales in medicine is the question of accountability: Who is liable if a computer’s decision causes damage? Is it the manufacturer or the user? If a company tries to sell you an AI program, you have to insist that in the sales contract the company underwrites its use and takes full responsibility for possible performance failures.


The end — if and what to buy

A long time ago I wrote a column ‘How to purchase an MR machine • In ten easy lessons.’ It began with this sentence:

“Murphy’s Law is the most reliable guideline when buying an MR machine: anything that can go wrong usually does [28].”

The column could easily be adapted to AI/CAD/ML. Thus, I was not puzzled when a ‘saleswoman scientist’ I knew well confessed to me: “I know that our software doesn’t work, but we sell it anyway.”

What was taken for granted yesterday will change today. The high-technology wonderland needs permanent change to earn money.

My advice to department heads: Train your people even better than today, and wait and see until the method is established and proven — or not. Don’t waste time and money. And never forget: Neither radiologists nor AI are infallible.



References

1. Copeland BJ. Artificial intelligence. Encyclopædia Britannica (retrieved July 2022).
2. Daimler E. Lower expectations for AI. The Economist. 2 July 2022. 14.
3. Alaux A, Rinck PA. Multispectral analysis of magnetic resonance images: a comparison between supervised and unsupervised classification techniques. In Proc. Int. Symp. on Tissue Characterisation in Magnetic Resonance Imaging, European Soc. for Mag. Res. in Med. & Biol, ISBN 354051532. 19th–21st April 1989, Wiesbaden, Germany, 165–169;
— Rinck PA. Chapter Fifteen: Image Processing and Visualization. In: Rinck PA. Magnetic Resonance in Medicine. The Basic Textbook of the European Magnetic Resonance Forum. 12th edition; 2018|2020. Free offprint.
4. Montagnon M, Cerny M, Cadrin-Chênevert A, et al. Deep learning workflow in radiology: a primer. Insights Imaging. 2020; 11:22. doi.org/10.1186/s13244-019-0832-5
5. Lloret Iglesias L, Sanz Bellón P, Pérez del Barrio A, Menéndez Fernández-Miranda P, Rodríguez González D, Vega JA, González Mandly AA, Parra Blanco JA. A primer on deep learning and convolutional neural networks for clinicians. Insights Imaging. 2021; 12: 117.
6. Ferrari E, Bosco P, Calderoni S, Oliva P, Palumbo L, Spera G, Fantacci ME, Retico A. Dealing with confounders and outliers in classification medical studies: The Autism Spectrum Disorders case study. Artif Intell Med. 2020; 108: 101926. doi: 10.1016/j.artmed.2020.101926.
7. Rinck PA. All is not what it seems in the messy world of research. Don’t play it again, Sam. Rinckside 2021; 32,6: 17-18;
— Rinck PA. Mapping the biological world. Rinckside 2017; 28,7: 13-15.
8. Gauriau R, Bizzo BC, Kitamura FC, et al. A Deep Learning-based model for detecting abnormalities on brain MR images for triaging: preliminary results from a multisite experience. Radiol Artif Intell. 21 April 2021; 3(4): e200184. doi: 10.1148/ryai.2021200184.
9. Kaushal A, Altman R, Langlotz C. Geographic distribution of US cohorts used to train deep learning algorithms. JAMA. 2020; 324 (12): 1212–1213. doi:10.1001/jama.2020.12067
10. Wu E, Wu K, Daneshjou R, Ouyang D, Ho DE, Zou J. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat Med 2021; 27: 582–584. doi.org/10.1038/s41591-021-01312-x
11. Rinck PA. Some reflections on artificial intelligence in medicine. Rinckside 2018; 29,5: 11-13;
— Rinck PA. Artificial intelligence meets validity. Rinckside 2019; 30,5: 13-15.
12. Roberts M, Driggs D, Thorpe M, et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell. 2021; 3: 199–217. doi.org/10.1038/s42256-021-00307-0
13. Eklund A, Nichols TE, Knutsson H. Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. PNAS 2016; 113, 28: 7900-7905.
14. Rinck PA. Debacles mar “Big Science” and fMRI research. Rinckside 2016; 27,7: 17-18.
15. Voets MM, Veltman J, Slump CH, Siesling S, Koffijberg H. Systematic review of health economic evaluations focused on artificial intelligence in healthcare: the tortoise and the cheetah. Value in Health. 2022; 25,3: 340-349.
16. The Economist and The Pictet Group. AI is currently enjoying a heyday, but is innovation slowing? 2022 (retrieved July 2022).
17. European Society of Radiology (ESR). Current practical experience with artificial intelligence in clinical radiology: a survey of the European Society of Radiology. Insights Imaging. 2022; 13: 107. doi.org/10.1186/s13244-022-01247-y
18. Allen B, Agarwal S, Coombs L, Wald C, Dreyer K. 2020 ACR data science institute artificial intelligence survey. J Am Coll Radiol 2021; 18: 1153–1159.
19. Kwee TC, Kwee RM. Workload of diagnostic radiologists in the foreseeable future based on recent scientific advances: growth expectations and role of artificial intelligence. Insights Imaging. 29 June 2021; 12(1): 88. doi: 10.1186/s13244-021-01031-4.
20. Jabbour S, Fouhey D, Kazerooni E, Sjoding MW, Wiens J. Deep learning applied to chest x-rays: exploiting and preventing shortcuts. PMLR 2020; 126: 750–782.
21. Seyyed-Kalantari L, Zhang H, McDermott MBA, Chen IY, Ghassemi M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat Med. 2021;27: 2176-2182. doi: 10.1038/s41591-021-01595-0.
22. Gichoya JW, Banerjee I, Bhimireddy AR, et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health. 2022; 4(6):e406-e414. doi: 10.1016/S2589-7500(22)00063-2.
23. Cheatham B, Javanmardian K, Samandari H. Confronting the risks of artificial intelligence. McKinsey Quarterly. 26 April 2019.
24. Rinck PA. Rude awakening: Will radiographers eventually take over? Rinckside 2011; 22,4: 7-8.
25. Rainey C, O'Regan T, Matthew J, et al. UK reporting radiographers’ perceptions of AI in radiographic image interpretation – current perspectives and future developments. Radiography, 2022; 28: 881-888.
26. European Commission. Regulatory framework proposal on artificial intelligence. 2022 (retrieved July 2022).
27. Smith JA, Abhari RE, Hussain Z, Heneghan C, Collins GS, Carr AJ. Industry ties and evidence in public comments on the FDA framework for modifications to artificial intelligence/machine learning-based medical devices: a cross sectional study. BMJ Open. 14 October 2020. 10(10): e039969. doi: 10.1136/bmjopen-2020-039969.
28. Rinck PA. How to purchase an MR machine • In ten easy lessons. Rinckside 1992; 3,4: 9-10.



Citation: Rinck PA. The state of Artificial Intelligence in medical imaging • Part II
Are radiologists’ neurons faster and cheaper?
Rinckside 2022; 33,3: 5-9.


