Introduction
Hospital ratings help patients make informed healthcare decisions. In the United States, the Centers for Medicare & Medicaid Services (CMS) publishes quality ratings for thousands of hospitals based on several performance indicators.
In this analysis, I explored CMS hospital quality data to better understand what factors influence hospital ratings and whether machine learning can predict those ratings from hospital performance metrics.
Questions I Wanted to Answer
- What factors influence hospital ratings?
- Can hospital ratings be predicted using hospital quality metrics?
- Which performance indicators matter the most?
The Dataset
The dataset comes from the CMS Provider Data catalog and includes information for more than 5,000 hospitals across the United States.
The data contains several quality indicators related to:
- Mortality
- Patient safety
- Hospital readmissions
- Patient experience
- Timely and effective care
Each hospital also has an overall rating from 1 to 5 stars.
Exploring the Data
The first step was to understand how hospital ratings are distributed.
Most hospitals fall in the middle of the rating scale: ratings of 3 and 4 are the most common, while ratings of 1 and 5 are less frequent.
This imbalance makes predicting extreme ratings more difficult for machine learning models.
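As a rough illustration of this step, the sketch below computes the share of hospitals at each star level. The ratings list is toy data invented for the example, not the CMS data, and representing a missing rating as `None` is an assumption about how the file was loaded.

```python
from collections import Counter


def rating_distribution(ratings):
    """Return {stars: share of hospitals} for 1-5 stars, ignoring missing ratings."""
    valid = [r for r in ratings if r is not None]
    if not valid:
        return {star: 0.0 for star in range(1, 6)}
    counts = Counter(valid)
    return {star: counts.get(star, 0) / len(valid) for star in range(1, 6)}


# Toy ratings for illustration only (NOT the CMS data); None marks a missing rating.
toy_ratings = [3, 4, 3, 2, 5, 3, 4, None, 1, 4]

shares = rating_distribution(toy_ratings)
# The share of the most common rating is also the accuracy of an
# "always predict the most common rating" baseline.
majority_share = max(shares.values())

for star, share in shares.items():
    print(f"{star} stars: {share:.0%}")
```

The largest share is worth keeping around, because it is the number any classifier has to beat.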
Can We Predict Hospital Ratings?
To test this, I trained a Random Forest classification model using several hospital performance metrics.
The model achieved an accuracy of approximately 39%, which is better than a simple baseline strategy of always predicting the most common rating (about 33%).
While the model is far from perfect, it beats the baseline by about six percentage points, which suggests it captures some real patterns in the data.
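The comparison against a majority-class baseline can be sketched with scikit-learn as follows. This is a minimal example on synthetic data generated by `make_classification`, where five classes stand in for the star ratings. The feature count, split size, and hyperparameters are assumptions for illustration, not the settings used in this analysis, so the printed accuracies will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the hospital metrics (NOT the CMS data):
# 5 classes play the role of the 1-5 star ratings.
X, y = make_classification(
    n_samples=1000,
    n_features=8,
    n_informative=5,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

# Stratify so every class appears in both splits in similar proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: always predict the most common class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
forest_acc = accuracy_score(y_test, forest.predict(X_test))

print(f"baseline accuracy:      {baseline_acc:.2f}")
print(f"random forest accuracy: {forest_acc:.2f}")
```

Evaluating both models on the same held-out split keeps the comparison fair: the gap between the two numbers, rather than the raw accuracy, shows how much the metrics actually contribute.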
Which Factors Influence Ratings the Most?
Feature importance scores from the Random Forest model ranked patient safety and readmission measures as the most influential variables, which suggests these outcomes are strong indicators of hospital quality.
Hospitals that perform better in these areas tend to receive higher overall ratings.
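For readers who want to reproduce this kind of ranking, here is a minimal sketch using scikit-learn's `feature_importances_` attribute. The data is synthetic and the `metric_0` to `metric_4` names are placeholders, so the ranking it prints says nothing about real hospitals. Note also that impurity-based importances measure how useful a feature was for the model's splits, which indicates association rather than causation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder names for synthetic columns; the real CMS columns differ.
feature_names = [f"metric_{i}" for i in range(5)]

# Synthetic data (NOT the CMS data): 5 classes stand in for star ratings.
X, y = make_classification(
    n_samples=500,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Pair each feature with its impurity-based importance and sort descending.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```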
Model Insights
The model performed best at identifying:
- Average hospitals (rating 3)
- High-performing hospitals (rating 5)
However, distinguishing between intermediate ratings such as 2 and 4 proved more difficult because their performance characteristics overlap.
This suggests that hospital performance often falls along a continuum rather than into clearly separated categories.
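One way to check which ratings a model identifies well is per-class recall: of the hospitals that truly have a given rating, what fraction did the model label correctly? The sketch below computes it in plain Python. The label lists are made up for the example and are not the model's actual predictions.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Return {label: fraction of true examples of that label predicted correctly}."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in sorted(total)}


# Made-up labels for illustration only (NOT results from this analysis).
y_true = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
y_pred = [2, 3, 2, 3, 3, 3, 3, 4, 5, 5]

recall = per_class_recall(y_true, y_pred)
for stars, value in recall.items():
    print(f"rating {stars}: recall {value:.2f}")
```

Looking at recall per rating, rather than a single accuracy number, makes it easier to see when neighboring ratings are being confused with each other.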
Conclusion
This analysis shows that hospital ratings can be partially predicted using quality metrics such as safety and readmission measures.
However, the moderate predictive performance also suggests that hospital ratings depend on multiple complex factors that are not fully captured by the available metrics.
Overall, patient safety and post-treatment outcomes appear to be the strongest predictors of a hospital's overall rating.
Final Thoughts
Healthcare data offers valuable opportunities for data-driven insights.
Even relatively simple machine learning models can help highlight the most important factors affecting healthcare quality.
Future work could explore additional features or more advanced models to improve prediction accuracy.

