AutismInsights
Emerging

Identifying homogenous patient subgroups using transformer based hierarchical clustering of heterogeneous Mixed-Modality medical data.

Journal of biomedical informatics, 2025

Baskett William, Black Benjamin, Qureshi Adnan I, Shyu Chi-Ren

What this study means for families

Researchers developed a new computer program called Med-ROAR that can group patients with similar characteristics together. They tested it on nearly 150,000 people with autism and found it could identify meaningful patterns and subgroups within the autism community. The program can work with different types of medical information and even incomplete records, potentially helping doctors provide more personalized care by better understanding patient similarities.

Summary by AutismInsights from published abstract. This is not a substitute for reading the original paper.

Research summary

This computational study introduces Med-ROAR, a transformer-based machine learning model designed to identify homogeneous patient subgroups from heterogeneous medical data. The researchers evaluated the model on data from 147,469 individuals diagnosed with Autism Spectrum Disorder (ASD) and on 50,458 clinical records from Intensive Care Unit (ICU) patients. Med-ROAR builds a hierarchical clustering from learned semantic similarities between patients rather than from a fixed distance metric. The model can process mixed data types (tabular and time-series) and handle incomplete records.

Results suggest Med-ROAR discovers more cohesive patient clusters than standard agglomerative clustering and identifies clinically meaningful autism phenotype patterns, including atypical patient subgroups within the ASD population.
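For readers who want to see what the distance-based baseline in this comparison looks like, the following is a minimal sketch of agglomerative clustering with a simple cohesion measure. It is not the authors' code or evaluation protocol: the average-linkage rule, the Euclidean distance, the mean intra-cluster distance used as a cohesion proxy, and the toy data points are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Returns a list of clusters, each a list of indices into `points`.
    Illustrative only: the linkage rule and metric are assumptions.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average distance over every cross-cluster pair of members.
            total = sum(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        # Merge the closest pair; b > a, so deleting b leaves a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def mean_intra_cluster_distance(points, clusters):
    """Cohesion proxy (an assumption): lower means tighter clusters."""
    dists = [
        dist(points[i], points[j])
        for cluster in clusters
        for i, j in combinations(cluster, 2)
    ]
    return sum(dists) / len(dists) if dists else 0.0


# Hypothetical toy data: two well-separated groups of three points.
toy_points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
]
toy_clusters = average_linkage(toy_points, n_clusters=2)
toy_cohesion = mean_intra_cluster_distance(toy_points, toy_clusters)
```

A method like this depends entirely on the chosen distance metric, which is the contrast the study draws with Med-ROAR's learned similarities.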

Key findings

  1. Med-ROAR identified clinically meaningful patterns of phenotypes within the ASD population.

     Confidence: The abstract states this was demonstrated but provides limited detail on validation.
     Relevance: Could help identify autism subgroups for more targeted interventions.

  2. The model was more likely to discover more cohesive high-level clusters than distance-based agglomerative clustering.

     Confidence: A comparative analysis was conducted, but specific metrics are not provided in the abstract.
     Relevance: May improve patient stratification for clinical decision-making.

  3. Med-ROAR can predict patient subgroups using incomplete, preliminary information collected shortly after admission.

     Confidence: Demonstrated on ICU patient data, but detailed validation metrics are not specified.
     Relevance: Could enable early identification of patient subgroups for timely interventions.

Clinical implications

Med-ROAR represents a promising computational approach for identifying autism subgroups that could inform personalized care strategies. However, clinical validation is needed to establish whether the identified subgroups translate to meaningful differences in treatment response or outcomes. The ability to work with incomplete data could be valuable in real-world clinical settings.

Limitations

This appears to be a computational methodology study focused on algorithm development rather than clinical validation. The abstract does not report specific performance metrics or clinical outcome measures, and it does not explain how the clinical meaningfulness of the identified autism subgroups was assessed. No information is provided about generalizability to other populations or clinical settings.

Original abstract

Patients are highly heterogeneous, with varying needs and responses to treatment. Identifying clinically homogenous patient subgroups is critical to improve personalized care. Patient records are often heterogeneous, may include multiple modalities which conventionally require separate data processing considerations, and are often incomplete, leading to difficulties in identifying meaningful clusters of patients. We introduce Med-ROAR, a transformer-based Random Order AutoRegressive (ROAR) embedding model for medical data.

Med-ROAR hierarchically clusters data by encoding it into hierarchical discrete embeddings using a modified self-attention operation to facilitate random order mixed modality autoregressive modeling. This allows the model to accept arbitrary mixes of record types without special considerations. We compare our method's clustering effectiveness to standard agglomerative clustering using 147,469 individuals diagnosed with Autism Spectrum Disorder (ASD). We also evaluate its use on data with mixed modalities and its resilience to missing information using 50,458 clinical records from Intensive Care Unit (ICU) patients which include both tabular and time-series components.

We demonstrate that Med-ROAR is more likely to discover more cohesive high-level clusters than distance-based methods like agglomerative clustering. Our exploratory analysis of the autism data identifies clinically meaningful patterns of phenotypes within ASD. We identify homogenous, but atypical, patient subgroups within the ASD population. We also demonstrate Med-ROAR's effectiveness in clustering patients using mixes of both tabular and time series clinical records from ICU patients.

We demonstrate that Med-ROAR can predict patient subgroups even using incomplete, preliminary information collected shortly after admission. Med-ROAR is a flexible hierarchical clustering technique which learns to cluster patients based on learned high-level semantic similarities rather than rule-based metrics. It can accept whatever patient data may be available without modification to the underlying model architecture. The data modalities which Med-ROAR can accept are primarily constrained by computational resources, rather than architectural limitations.
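The random-order autoregressive idea named in the abstract can be illustrated with a small sketch. It shows only how one record with mixed field types and a missing value could be turned into a sequence of "predict the next field from the fields seen so far" steps under a random field ordering. It does not reproduce Med-ROAR's modified self-attention or its hierarchical discrete embeddings, which the abstract does not detail; the function name, field names, and record layout are hypothetical.

```python
import random


def random_order_steps(record, rng):
    """Return autoregressive steps for one random ordering of the fields.

    Each step is (context, target_field, target_value), where `context`
    holds the fields already revealed. Fields whose value is None are
    treated as missing and skipped, so an incomplete record simply
    yields a shorter sequence. Illustrative assumption, not the paper's code.
    """
    present = [(name, value) for name, value in record.items() if value is not None]
    rng.shuffle(present)  # a fresh random order each time this is called

    steps = []
    context = {}
    for name, value in present:
        steps.append((dict(context), name, value))  # copy the context so far
        context[name] = value
    return steps


# Hypothetical record mixing tabular values, a short time series,
# and one missing measurement.
toy_record = {
    "age": 7,
    "sex": "F",
    "heart_rate_series": [88, 92, 90],
    "lab_glucose": None,
}
toy_steps = random_order_steps(toy_record, random.Random(0))
```

Because the ordering is random and missing fields are dropped, a model trained on steps like these never relies on a fixed set or sequence of inputs, which is one plausible reading of why such a model could accept partial records collected shortly after admission.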

Evidence Grade

Emerging

Grade assigned by AutismInsights based on study type and published abstract.

Study Details

Journal: Journal of biomedical informatics
Year: 2025
PMID: 40664278
DOI: 10.1016/j.jbi.2025.104878

MeSH Terms

Humans; Cluster Analysis; Autism Spectrum Disorder; Electronic Health Records; Medical Informatics; Algorithms; Intensive Care Units; Male; Female