We live in an era where data is everywhere. It has transformed numerous domains, including advertising, engineering, and finance. While many of these fields have already benefited enormously from leveraging data for effective decision making, biomedical science is just starting to extract value from big data. DNA sequencing costs are plunging, making it increasingly easy and affordable for people to get their genomes sequenced. Fitness trackers like the Fitbit and the Apple Watch constantly monitor and log information about our physiological state. Data generation at such a massive scale requires the right infrastructure for storage, retrieval, and complex analysis. Luckily, numerous companies and open-source projects are working to create scalable data processing pipelines for genomics and healthcare data.