Autism overdiagnosis

Journal of Child Psychology and Psychiatry, Volume 64, Issue 5, p. 711-714
Editorial
Editorial: Is autism overdiagnosed?
Eric Fombonne
First published: 13 April 2023
https://doi.org/10.1111/jcpp.13806
Conflict of interest statement: None.

After attention was drawn in the late 1960s to the poor reproducibility of psychiatric diagnosis between clinicians, the methods and procedures used to diagnose psychiatric disorders were greatly improved. The sources of variance identified as contributing to the poor reliability of psychiatric diagnosis included information variance (how clinicians go about enquiring about symptoms), interpretation variance (how clinicians weigh the observed symptomatology towards diagnostic formulations), and criterion variance (how clinicians arrange symptom constellations to generate specific diagnoses). To improve the reliability of diagnosis, progress was made in two major directions. First, diagnostic instruments were developed to standardize the way symptoms are elicited, evaluated, and scored. Some of these diagnostic interviews were highly structured (e.g. the Diagnostic Interview Schedule (DIS)): they were designed for use in large-scale studies by lay interviewers without a clinical background, with a style of questioning that emphasized adherence to the exact wording of probes, reliance on closed questions with simple response formats (Yes/No), and verbatim recording of respondents' answers without any contribution of interviewer judgment. By contrast, semi-structured interviews (e.g. the SADS) were designed to be used by clinically trained interviewers and adopted a more flexible, conversational style, using open-ended questions, utilizing all behavioral descriptions generated in the interview, and developing scoring conventions that called upon the clinical judgment of the interviewer. Second, diagnostic criteria and algorithms were introduced into nosographies, in 1980 for the Diagnostic and Statistical Manual (DSM) and soon after for the International Classification of Diseases (ICD). Algorithm-derived diagnoses could subsequently be tested for their validity using follow-up, family history, treatment response studies, or other external criteria.

In child and adolescent psychiatry too, diagnostic interviews of both the structured (e.g. the Diagnostic Interview Schedule for Children (DISC)) and semi-structured (e.g. the Child and Adolescent Psychiatric Assessment (CAPA)) types were developed. Guidelines to assess child and adolescent psychopathology emphasized the necessity to rely on multiple informants and data sources, to evaluate the situational specificity or pervasiveness of relevant behaviors, and to assess functional impairment beyond merely taking symptom inventories. The field of autism benefited from these improvements, which led to the development in the late 1980s of diagnostic tools such as the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule (ADOS). Both instruments are semi-structured and were initially devised to assist expert clinicians in their diagnostic evaluations. Use of these instruments, especially the ADOS, has now extended to large numbers of community providers who may not have prior experience in autism and its diagnosis but become ADOS ‘reliable’ or ‘certified’ following two days of training seminars. But are the training and certification sufficient to qualify ADOS testers as diagnosticians?

In this issue, Bishop and Lord (2023) underscore that evaluation of an individual for possible autism spectrum disorder (ASD) requires a process (not just test results) with flexible administration and scoring of tools, followed by integration of all findings by a clinician with autism knowledge and experience. They remind us that autism diagnostic tools have been essential in providing much-needed, standardized, replicable ways to conduct diagnostic interviews or observations for the purpose of diagnosis (an issue of reliability); and in affirming that ‘clinicians make diagnoses, not instruments’, they rightly caution against mechanical interpretations of test scores (an issue of validity). They also point out that these tools are deployed in particular contexts (complex clinical profiles, different languages and cultures, diverse populations) for which these instruments were not initially calibrated, necessitating adjustments. Bishop and Lord further warn that the requirement of a ‘positive’ ADOS score as an eligibility criterion for receiving services may result in denying access to services to children who need them. We could not agree more. Here, we expand the discussion by envisaging another consequence: that inadequate diagnostic processes may also result in erroneous diagnoses of ASD.

Is autism overdiagnosed? Although largely indirect and anecdotal, evidence for possible overdiagnosis nonetheless exists. Epidemiological estimates keep soaring with, as an example, California-specific ASD prevalence reaching 4.5% in the latest CDC survey (Maenner et al., 2023). Referrals for suspected autism continue to flow, especially among school-aged children and adults. Autism center teams keep receiving requests to train scores of school professionals and other primary care providers in ADOS administration, resulting in a larger community capacity to ascribe ASD diagnoses to children evaluated with streamlined procedures. That at least some of these community diagnoses are false positives was illustrated in one of our studies where, of 232 school-age children and adolescents with a pre-existing community diagnosis of ASD referred to our academic center for a neuroimaging study, only 47% met research criteria for ASD after an extensive diagnostic re-evaluation process (Duvall et al., 2022). Yet many were deemed to have been ‘meeting DSM criteria’ or ‘above the ADOS cut-off’ in prior records.

Overdiagnosis can result from shortcomings at either the diagnostic instrument or the diagnostic process levels. With regard to the diagnostic instrument, ADOS training workshops provide testers with a roadmap for organizing structured activities and social interactions designed to elicit diagnostically informative behaviors. Techniques of test administration are straightforward to learn. However, scoring instructions are complex and necessitate a careful analysis and interpretation of the behaviors observed. Even though the coding conventions are rigorously operationalized, a good deal of clinical judgment and experience remains necessary for the tester to accurately map observed behaviors to underlying autistic disturbances. For example, children may talk repetitively about dinosaurs they saw during a recent museum visit, which may be age-appropriate in young children; however, for this intense interest to be considered excessive or ‘circumscribed’, other features must be demonstrated (odd quality, interference with demands, etc.). Likewise, abnormal eye gaze is not specific to ASD and is observed across a number of other clinical conditions. To count toward an ASD diagnosis, it is not enough for eye contact to be absent or decreased; evidence of poor modulation in the dynamic context of a social interaction must be demonstrated. The issue is that many atypical behaviors that are linked to ASD are not specific to ASD. Counting abnormal behaviors without establishing their specific autistic quality or nature is a source of overdiagnosis. The issue is compounded in ADOS scoring conventions whereby abnormal behaviors can be scored as present (2, indicative of ASD) or absent (0, which means either no abnormality or an abnormal response that does not have an autistic quality). Moreover, normative reference data do not exist for many behaviors evaluated during an ADOS (e.g. what is a normal range for facial expressions?), facilitating the pathologizing of behaviors that are not part of common experience albeit not autistic in nature. Testers' lack of familiarity with psychopathology can also result in overdiagnosing ASD. Given the high prevalence of co-occurring psychiatric disorders in both children (Lai et al., 2019) and adults (Fombonne et al., 2020), accounting for the confounding influences of comorbid symptoms in ADOS scores is essential, especially when assessing older, more able individuals. For example, turn-taking in a conversation can be impaired in both ASD (Failure of normal back-and-forth conversation) and Attention Deficit Hyperactivity Disorder (ADHD) (Often talks excessively, Blurts out answers). Ascribing a symptom to either disorder requires a clinical analysis and judgment about the mechanism underlying conversation difficulty (pragmatic deficit? or impulsivity?) that comes with experience in general psychopathology.

Overdiagnosis may also occur due to deficiencies in the overall diagnostic process and formulation. As Bishop and Lord articulated, the diagnostic decision process must transcend the results of any particular tool, even if the administration of that test is considered to be a gold standard. Findings from different observations across contexts, informants, and data collection procedures (direct observation, caregiver report, school evaluations, medical records, etc.) must be combined. Discrepancies between test results are common; there is no simple algorithmic solution to resolve them, and expert clinical judgment is necessary to that end. In reputable research enterprises, the final diagnostic decision often relies on arbitration by a clinical expert (e.g. Fischbach & Lord, 2010); field trials testing competing ASD diagnostic algorithms have used the clinical judgment of expert clinicians as the external validity criterion (e.g. Volkmar et al., 1994). Additionally, diagnostic criteria for ASD include an assessment of the trajectory of autistic symptoms and their resulting impact on functioning. Even though an early age of symptom onset was removed from DSM-5 as a formal diagnostic criterion, the concept of autism as a neurodevelopmental disorder manifesting in the early years remains. Nosographies specify that symptoms may be present ‘currently or by history’ without much guidance on how to elicit earlier developmental anomalies or on which constellation of symptoms should be considered at which developmental time. Experienced clinicians evaluate both past and current symptoms, de facto performing a private, retrospective, longitudinal evaluation of symptoms and of their trajectory. Reflecting the importance of a developmental perspective in diagnosing ASD, some autism instruments like the ADI-R (or its derivative, the Social Communication Questionnaire (SCQ)) have incorporated this capture of past symptoms into their design so that it contributes to the final score. When the evaluation process relies mostly or exclusively on current symptom profiles (like drawing diagnostic conclusions based on the ADOS only), the gain of specificity obtained by demonstrating an early origin to the symptom trajectory is lost, possibly contributing to falsely positive ASD diagnoses.

Evidence that ASD symptoms cause impairment is another mandatory diagnostic criterion. Yet social impairment is not specific to ASD and occurs in the context of most psychiatric disorders. That said, social impairment arising from autism is not quite the same as social impairment resulting from other psychiatric disorders. For example, peer relationship difficulties may result from anxiety or fear of social evaluation in an anxious child, from disruptive and aggressive behaviors in individuals with externalizing disorders, or from a lack of social motivation or know-how in a child with autism. Attributing impairment specifically to ASD symptoms in the context of comorbid presentations requires clinical acumen in autism and in general psychopathology, and experience in differential diagnosis.

The concern about false positive diagnoses extends beyond the realm of diagnostic evaluation in clinical settings. In recent research studies, inclusion criteria increasingly rely on loose characterization of participants. For example, studies have employed web-based recruitment strategies where inclusion criteria were limited to self-completed autism checklists and/or unverified self-reported diagnoses, likely resulting in the unintended inclusion of non-autistic participants in ‘case’ groups. Likewise, some large-scale population-based studies investigating the association of risk factors with ASD have relied on autism-specific scale scores (e.g. Social Responsiveness Scale (SRS)) as sole outcome measures (e.g. Alampi et al., 2021), ignoring both the generally low positive predictive value of autism checklists’ total scores and the well-known confounding effects of co-occurring psychopathology on those scores (Havdahl et al., 2016). In both instances, these overinclusive measurement approaches leave an unsolvable problem of interpretation. Reliance on single informants, on isolated administration of instruments, and on current symptomatology only, together with a failure to adjust for the effect of co-occurring non-autistic conditions, has the convergent effect of increasing misclassification and decreasing specificity.
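The point about the low positive predictive value of checklist scores can be made concrete with Bayes' rule. The sketch below is only an illustration: the function name and the sensitivity, specificity, and prevalence values are hypothetical and are not taken from any of the studies cited here. It shows how the same instrument yields very different proportions of true cases among screen-positives depending on how common the condition is in the population tested.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of the condition given a positive result."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical instrument characteristics, chosen for illustration only.
SENSITIVITY = 0.80
SPECIFICITY = 0.90

# Hypothetical settings: general population, enriched sample, specialist referral.
for prevalence in (0.02, 0.10, 0.50):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence = {prevalence:.2f} -> PPV = {ppv:.2f}")
```

Under these assumed values, most positive scores in a low-prevalence population are false positives, which is why a checklist total score used as a sole outcome measure is difficult to interpret.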

There may be broader societal forces contributing to the phenomenon of overdiagnosis. In the US, universal screening for autism in 18- and 24-month-old toddlers with the Modified Checklist for Autism in Toddlers (M-CHAT) is recommended by the American Academy of Pediatrics (Hyman et al., 2020), generating large numbers of screen-positive toddlers, which build up expectations of a future confirmatory diagnosis. Major changes in state and federal laws have mandated insurance companies to provide coverage for Applied Behavior Analysis (ABA) services in most states, resulting in a rapid burgeoning of the market offer for these types of services. As eligibility for services requires a formal ASD diagnosis, pressure on providers to diagnose ASD has increased. Finally, in several US states, ASD diagnosis and eligibility to access services can be determined not only by medical teams but also by school-based special education teams. ASD eligibility for services soared in the US education system when laws were passed to expand services for autism (Newschaffer, Falb, & Gurney, 2005), illustrating how rates of diagnoses can be strongly influenced by social policies. Collectively, these factors may have contributed to overinclusive practices in diagnosing ASD, although for now these are hypotheses that remain to be empirically tested. As most of our observations originate from the US context and may, therefore, reflect specific features of the US health care system, accounts from other countries would form a useful contrast, especially from those that have universal health coverage and tiered health care systems with tertiary specialized teams still providing the bulk of ASD diagnoses.

The consequences of overdiagnosis should be appreciated in full. Many would argue that the priority is to provide access to services for children presenting with neurodevelopmental disorders and that the consequences of underdiagnosis are far more deleterious than those due to overdiagnosis. It may be so, but that does not mean that erroneously diagnosing a child with ASD is harmless. At the individual level, carrying an ASD diagnosis may unduly constrain an individual's range of social and educational experiences and have long-lasting effects on his/her/their identity formation. At a population level, the unjustified use of intensive services raises concerns about equity and fairness in access to services for children who have neurodevelopmental disorders other than autism and struggle to access support services that they need as much as their peers with ASD. In etiologic studies, the inclusion of phenocopies in ASD case groups will bias the results towards the null; in randomized clinical trials, it will decrease the power to detect treatment effects.
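The bias towards the null in etiologic studies can be sketched numerically. The example below rests on stated assumptions that are not drawn from the editorial: the exposure prevalences are hypothetical, and misclassified participants (phenocopies) are assumed to have the same exposure prevalence as controls. Under those assumptions, the exposure prevalence in a contaminated case group is a weighted mix of the two, and the observed odds ratio shrinks towards 1 as the share of phenocopies grows.

```python
def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)


def observed_odds_ratio(p_exposed_true_cases, p_exposed_controls, phenocopy_fraction):
    """Odds ratio seen when a fraction of the 'case' group is not truly affected.

    Assumption: phenocopies share the exposure prevalence of controls.
    """
    p_exposed_case_group = (
        (1.0 - phenocopy_fraction) * p_exposed_true_cases
        + phenocopy_fraction * p_exposed_controls
    )
    return odds(p_exposed_case_group) / odds(p_exposed_controls)


# Hypothetical exposure prevalences, for illustration only.
P_TRUE_CASES = 0.30
P_CONTROLS = 0.15

for fraction in (0.0, 0.25, 0.50):
    ratio = observed_odds_ratio(P_TRUE_CASES, P_CONTROLS, fraction)
    print(f"phenocopy fraction = {fraction:.2f} -> observed OR = {ratio:.2f}")
```

The same dilution logic applies to trials: if part of the treated group cannot respond to an autism-specific intervention, the average observed effect is smaller and harder to detect.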

The complexities, costs, and resources involved in diagnostic confirmation are considerable, justifying the calls for streamlined diagnostic procedures in clinical settings and ‘rapid phenotyping’ in large-scale studies. Yet, while lighter instrumentation and a compressed diagnostic evaluation/confirmation process may be needed (or indeed be the only available option), investigators should keep in mind the risk of overdiagnosis of ASD and devise measurement strategies that limit misclassification and false positives. The particular combination of data sources, instruments, and informants optimal to attain this goal will vary across clinical settings and study designs, precluding any formulaic solution to the problem. Nevertheless, adherence to the core principles of the diagnostic process discussed above, as well as by Bishop and Lord, may achieve improved levels of specificity in case ascertainment and outcome measurement. These cardinal principles include: (a) reliance on several informants or data sources, not just one; (b) supplementing the evaluation of current functioning with other data points that confirm a trajectory of autistic symptomatology; (c) demonstrating that functional impairment results from underlying autistic disturbances as opposed to co-occurring conditions or specific contextual constraints; and (d) providing a clinically informed and validated procedure to integrate all measurements at the individual level.
