In a 24-minute session titled "Decoding Diseases with Big Data", Dr. Allison Goff and Dr. Shuhan He take us through the vital role of bioinformatics and health data analytics in uncovering disease mechanisms and shaping the future of healthcare. The talk begins with Dr. Goff’s deep dive into bioinformatics (first ~20 minutes), followed by Dr. He’s overview of the MSDA (Master of Science in Health Data Analytics) program (final ~4 minutes).

"Bioinformatics connects big data to biological insight. It allows us to ask — and begin to answer — questions that were once out of reach."

Part I: The Power of Bioinformatics in Disease Research — Dr. Allison Goff

Dr. Allison Goff, a bioinformatician with a PhD in genetics, shares how big data and computational tools enable the discovery of novel disease pathways. Her experience spans analytic pipeline development, machine learning, and RNA sequencing in research settings.

What is Big Data in Biology?

Big data refers to:

  • Large, complex datasets from high-throughput experiments.
  • Data that requires computational tools for cleaning, validation, and analysis.
  • A combination of volume, quality, and speed that exceeds what manual interpretation can handle.


Bioinformatics: A Crossroads of Science and Computation

Bioinformatics merges biology, statistics, and computer science to:

  • Analyze and interpret large-scale biological data.
  • Enable storage, organization, and retrieval of complex datasets.
  • Extract meaningful patterns that inform disease understanding and therapy development.


Key Techniques and Tools
Dr. Goff introduces common bioinformatics methods and software:

  • Variant calling: Identifying genomic changes (e.g., SNPs) using tools like GATK and SAMtools.
  • Molecular docking: Predicting protein-drug interactions using AutoDock and Schrödinger's docking software.
  • Differential gene expression: Analyzing how gene activity varies between conditions using tools like DESeq2 and edgeR.
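Variant callers such as GATK and the SAMtools/bcftools family typically write their calls in the standard VCF text format, one variant per tab-separated line. The following is a minimal sketch, not part of Dr. Goff's workflow, that reads VCF data lines and labels each alternate allele as a SNP, a multi-nucleotide change, or an indel by comparing REF and ALT lengths. The function name `classify_variants` is hypothetical. For simplicity, symbolic or missing alleles (`<DEL>`, `*`, `.`) are skipped.

```python
from typing import Iterable, Iterator, Tuple

def classify_variants(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (chrom, pos, ref, alt, kind) for each ALT allele in VCF data lines.

    Simplified sketch: only the first five VCF columns are used.
    """
    for line in vcf_lines:
        # Skip meta-information (##) lines, the #CHROM header, and blank lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # A multi-allelic site lists several ALT alleles separated by commas
        for allele in alt.split(","):
            # Skip missing, spanning-deletion, and symbolic alleles in this sketch
            if allele in (".", "*") or allele.startswith("<"):
                continue
            if len(ref) == 1 and len(allele) == 1:
                kind = "SNP"    # single-base substitution
            elif len(ref) == len(allele):
                kind = "MNP"    # multi-base substitution of equal length
            else:
                kind = "indel"  # length change: insertion or deletion
            yield chrom, int(pos), ref, allele, kind
```

As a usage example, feeding in a few made-up records such as `"chr1\t100\t.\tA\tG\t50\tPASS\t."` would yield one `("chr1", 100, "A", "G", "SNP")` tuple per allele. Real pipelines would normally use a dedicated VCF library rather than hand parsing.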


Case Study: Premenstrual Dysphoric Disorder (PMDD)
Dr. Goff illustrates the bioinformatics pipeline through her work on PMDD:

  • Model: Created cell lines from blood samples of women with PMDD and matched controls.
  • RNA Sequencing (RNA-seq): Measured gene expression in these samples.
  • Processing pipeline:
    • Raw data quality check with FastQC.
    • Read alignment with tools like STAR and HISAT2.
    • Gene-level read counting with HTSeq (htseq-count) and featureCounts.
  • Analysis:
    • Used DESeq2 in R to find differentially expressed genes.
    • Visualized results with heatmaps, volcano plots, and mean-difference plots.
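DESeq2 itself is an R package. The following is a simplified Python sketch of three ideas behind the analysis steps above. It is not DESeq2 and not the PMDD study's code. The first idea is median-of-ratios size factors, the normalization DESeq2 uses to correct for sequencing depth. The second is Benjamini-Hochberg adjustment of p-values, which DESeq2 applies by default to produce adjusted p-values. The third is the (log2 fold change, -log10 adjusted p) coordinates of a volcano plot. All function names are hypothetical. The naive fold change uses a pseudocount of 1, which is an assumption.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts maps gene -> raw read counts (same sample order for every gene).
    Genes with a zero in any sample are skipped, since their geometric mean is 0.
    """
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean across samples for each gene with no zero counts
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        # Median of log(count / geometric mean) over genes, back-transformed
        log_ratios = [math.log(counts[g][j]) - lg for g, lg in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def volcano_coords(mean_a, mean_b, padj, pseudocount=1.0):
    """Naive volcano-plot point: (log2 fold change of B vs A, -log10 padj).

    mean_a and mean_b are mean normalized counts; padj must be > 0.
    """
    log2_fc = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return log2_fc, -math.log10(padj)
```

Dividing each sample's raw counts by its size factor gives depth-normalized counts. The real DESeq2 goes much further. It fits a negative binomial model per gene, estimates and shares dispersion information across genes, and computes Wald or likelihood-ratio test p-values, so its fold changes will generally differ from this naive ratio.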


One notable discovery was the role of VEGFA (vascular endothelial growth factor A, a growth factor involved in neurogenesis). Further exploration through resources like GeneCards and Enrichr pointed to a potential connection between VEGFA and both PMDD symptoms and treatment response to SSRIs.
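Enrichment tools such as Enrichr ask whether a list of genes overlaps a curated gene set more than chance would predict. A standard way to score that overlap is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. Enrichr reports a Fisher exact test p-value alongside other scores. The following is a minimal sketch of that test only, not Enrichr's implementation, and the function name `hypergeom_sf` is hypothetical.

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: genes in the annotated set,
    n: genes in the query list, k: observed overlap between list and set.
    """
    total = comb(N, n)
    upper = min(K, n)  # the overlap cannot exceed the set size or the list size
    # Sum the probabilities of every overlap at least as large as observed
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

As a hypothetical usage example, `hypergeom_sf(8, 20000, 200, 100)` would give the probability of seeing at least 8 of 100 query genes in a 200-gene set drawn from a 20,000-gene background. When many gene sets are tested, the resulting p-values would still need multiple-testing correction.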

[Title slide: "Decoding Diseases with Big Data", Dr. Allison Goff, PhD, over an orange and black illustration of medical and science icons such as microscopes and graphs.]