What is biomedical data science?
Biomedical data science spans a range of biological and medical research challenges that are data-intensive and focused on the creation of novel methodologies to advance biomedical science discovery. (Annual Review of Biomedical Data Science)
Here is a list of resources that I found while researching and studying the field of biomedical data science and analytics. Unfortunately, many of the books and courses listed here are paid, but I have tried my best to include some free and open-source resources too. Let’s go to them!
Table of Contents
 What is biomedical data science?
 Notice of non-affiliation and disclaimer
 Statistics and math
 Data engineering

Data manipulation, data analysis, and machine learning
 Data Science and Predictive Analytics: Biomedical and Health Applications using R
 Computational Learning Approaches to Data Analytics in Biomedical Applications
 Statistical Learning for Biomedical Data
 Case Studies in Neural Data Analysis
 Neural Data Science: A Primer with MATLAB and Python
 Computational Genomics with R
 Bioinformatics: The Machine Learning Approach
 Biomedical Image Analysis in Python
 Datasets
 Conclusions
Notice of non-affiliation and disclaimer
I am neither the author of any resource mentioned here nor associated with any of their authors, publishing companies, or digital platforms. I was also not paid, endorsed, or compensated in any way for this post. Any reference in this post is for the information and convenience of the public and does not constitute an endorsement, recommendation, or favoring.
Statistics and math
A good understanding of statistics and mathematics is fundamental to any data science or machine learning analysis. The most basic and key concepts include probability distributions, statistical significance, hypothesis testing, and regression. Here are some resources dedicated to teaching you all of that (and more) with examples from biomedical sciences.
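To make one of these concepts concrete, here is a minimal sketch of hypothesis testing: a two-sided permutation test for a difference in means, written with only the Python standard library. It is not taken from any of the books below. The function name `permutation_test` and the measurements are made up purely for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical, made-up measurements (not real data).
control = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8]
treated = [4.9, 5.1, 4.7, 5.3, 4.8, 5.0]

p_value = permutation_test(control, treated)
print(f"p = {p_value:.4f}")
```

A small p-value means that a difference at least as large as the observed one rarely appears when the labels are shuffled at random. The resources below explain when such a test is appropriate and how it relates to classical tests such as the t-test.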
Modern Statistics for Modern Biology
Susan Holmes, Wolfgang Huber
Book 📘  Code: R  Free: ✅  Link ↗️
The aim of this book is to enable scientists working in biological research to quickly learn many of the important ideas and methods that they need to make the best of their experiments and of other available data. The book takes a hands-on approach.
This book is not heavy on mathematics; it goes straight to the core concepts and has a lot of R code examples and exercises! It ranges from the basics of data distributions and hypothesis testing to more advanced topics like multivariate analysis and supervised learning.
Statistics for Biomedical Engineers and Scientists
Andrew King, Robert Eckersley
Book 📘  Code: MATLAB  Free: ❌  Link ↗️
Readers will learn how to understand the fundamental concepts of descriptive and inferential statistics, analyze data and choose an appropriate hypothesis test to answer a given question, compute numerical statistical measures and perform hypothesis tests “by hand”, and visualize data and perform statistical analysis using MATLAB.
This is just what you would expect from a regular undergraduate-level book about probability and statistics. It is not heavy on math and has a lot of exercises.
Applied Mathematics for the Analysis of Biomedical Data: Models, Methods, and MATLAB
Peter J. Costa
Book 📘  Code: MATLAB  Free: ❌  Link ↗️
Features a practical approach to the analysis of biomedical data via mathematical methods and provides a MATLAB® toolbox for the collection, visualization, and evaluation of experimental and real-life data.
This one is heavier on maths and assumes you are familiar with elementary differential equations, linear algebra, and statistics.
Data-Handling in Biomedical Science
Peter White
Book 📘  Code: ❌  Free: ❌  Link ↗️
Packed with worked examples and problems, this book will help the reader improve their confidence and skill in data-handling.
This one is a little different from the previous ones, but it is worth listing. The book has no code examples and it is not about computational methods of data handling and analysis. It teaches basic math and statistics needed for biochemistry and microbiology experiments.
Data engineering
Just as important as analyzing data is knowing how to design and maintain data pipelines. Biomedical data can be messy, heterogeneous, and big, but fortunately, these authors are here to help us!
Data Warehousing for Biomedical Informatics
Richard E. Biehl
Book 📘  Code: SQL  Free: ❌  Link ↗️
A step-by-step how-to guide for designing and building an enterprise-wide data warehouse across a biomedical or healthcare institution, using a four-iteration lifecycle and standardized design pattern.
This book is a gem. Classical content about data warehousing and ETL pipelines, but really focused on biomedical and healthcare data. Lots of SQL code snippets!
Big Biomedical Data Engineering
Ripon Patgiri, Sabuzima Nayak
Book chapter 📄  Code: ❌  Free: ✅  Link ↗️
This chapter exploits the role of Big Data in biomedical data engineering and its storage dilemma.
A short book chapter that discusses some scenarios of biomedical big data applications and their possible future.
Data manipulation, data analysis, and machine learning
This is where most people have fun. Let’s see how to handle, clean, analyze and extract insights from biomedical data.
Data Science and Predictive Analytics: Biomedical and Health Applications using R
Ivo D. Dinov
Book and MOOC 📘 💻  Code: R  Free: ✅ ❌  Link ↗️  Free online material ↗️
Complete and self-contained treatment of the theory, experimental modeling, system development, and validation of predictive health analytics.
A comprehensive data science book: introduction to R, data manipulation, data visualization, classification, regression, NLP, and even a little Deep Learning! All of this with well-documented R code. The book is not free, but you can find the videos, class notes, and R code on the author’s page linked above.
Computational Learning Approaches to Data Analytics in Biomedical Applications
Khalid Al-Jabery, Tayo Obafemi-Ajayi, Gayla Olbricht, Donald Wunsch
Book 📘  Code: Python, MATLAB  Free: ❌  Link ↗️
It presents insights on biomedical data processing, innovative clustering algorithms and techniques, and connections between statistical analysis and clustering.
An interesting and more theoretical approach to data preprocessing and clustering algorithms. Examples are given in pseudocode and some math knowledge is required. The last chapter takes a hands-on approach using MATLAB and Python code.
Statistical Learning for Biomedical Data
James D. Malley, Karen G. Malley, Sinisa Pajevic
Book 📘  Code: MATLAB  Free: ❌  Link ↗️
This book is for anyone who has biomedical data and needs to identify variables that predict an outcome, for two-group outcomes such as tumor/not-tumor, survival/death, or response from treatment.
It is not heavy on math and does not have many code examples. Great theoretical explanations covering regression, single decision trees, and Random Forests.
Case Studies in Neural Data Analysis
Mark Kramer, Uri Eden
Book 📘  Code: Python  Free: ✅  Link ↗️
The intended audience is the practicing neuroscientist, e.g., the students, researchers, and clinicians collecting neuronal data in the hospital or lab. The material can get pretty math-heavy, but we’ve tried to outline the main concepts as directly as possible, with hands-on implementations of all concepts.
Great hands-on material for neuroscientists interested in analyzing spike trains and electric fields. All notebooks are in Python and include a short explanation of the concepts and goal of each analysis.
Neural Data Science: A Primer with MATLAB and Python
Erik Lee Nylen, Pascal Wallisch
Book 📘  Code: Python, MATLAB  Free: ❌  Link ↗️
A beginner’s introduction to the principles of computation and data analysis in neuroscience, using both Python and MATLAB, giving readers the ability to transcend platform tribalism and enable coding versatility.
This book is beautifully organized and filled with images. The coolest thing about it is the MATLAB and Python code written side by side. The content ranges from the basics of programming to advanced techniques such as analog signal processing, biophysical modeling, clustering, and classification.
Computational Genomics with R
Altuna Akalin
Book 📘  Code: R  Free: ✅  Link ↗️
The aim of this book is to provide the fundamentals for data analysis for genomics. We want this book to be a starting point for computational genomics students and a guide for further data analysis in more specific topics in genomics.
This book has a great introduction to genomics that will help a lot if you do not come from a biology-related field. It covers many topics such as introduction to R, statistics, exploratory data analysis, supervised learning, RNA-seq, and more!
Bioinformatics: The Machine Learning Approach
Pierre Baldi, Søren Brunak
Book 📘  Code: ❌  Free: ❌  Link ↗️
The book is aimed both at biologists and biochemists who need to understand new data-driven algorithms and at those with a primary background in physics, mathematics, statistics, or computer science who need to know more about applications in molecular biology.
This one is a little heavy on math; you will probably need some calculus, algebra, and probability theory. The book is really about the theoretical aspects of machine learning applied to bioinformatics, including definitions of the main concepts and proofs of the main theorems.
Biomedical Image Analysis in Python
DataCamp
Videos and interactive code 💻  Code: Python  Free: ❌  Link ↗️
In this introductory course, you’ll learn the fundamentals of image analysis using NumPy, SciPy, and Matplotlib. You’ll navigate through a whole-body CT scan, segment a cardiac MRI time series, and determine whether Alzheimer’s disease changes brain structure.
Great content, and it follows the DataCamp course structure: short videos and hands-on coding exercises directly in the browser!
Datasets
Here are some places where you can find datasets to explore and exercise your skills:
Synthea: Synthetic Patient Generation
MITRE Corporation
Link ↗️
Synthea™ is an open-source, synthetic patient generator that models the medical history of synthetic patients. The resulting data is free from cost, privacy, and security restrictions, enabling research with Health IT data that is otherwise legally or practically unavailable.
PhysioNet: The Research Resource for Complex Physiologic Signals
MIT Laboratory for Computational Physiology
Link ↗️
PhysioNet is a repository of freely-available medical research data, managed by the MIT Laboratory for Computational Physiology.
Computational Biology Datasets Suitable For Machine Learning
Ben Lengerich
Link ↗️
This is a curated list of computational biology datasets that have been preprocessed for machine learning.
Kaggle: Healthcare tag
Link ↗️
Kaggle is the world’s largest data science community with powerful tools and public datasets.
NIH: Data Sharing Resources
Trans-NIH BioMedical Informatics Coordinating Committee
Link ↗️
To help researchers locate an appropriate resource for sharing their data, as well as to promote awareness of resources where datasets can be located for reuse, BMIC maintains lists of several types of data sharing resources.
Conclusions
That’s it! This comprehensive list covers many areas of biomedical data science and analytics, but there are many more great resources out there! Do you think I might have left out something important? Share with us in the comments!