Data science blog
I write about data science, programming, and other things I find interesting. Here you will also find some older posts related to my PhD work in computational biology.
08 Sep 2019
I started my first full-time data science job about eight months ago, and getting here has certainly been a learning experience! While the experience is still fresh in my mind, I wanted to share my thoughts. This post is a reflection on how I went from a PhD program in Cell and Molecular Biology to an industry data scientist.
02 Feb 2018
Last year I published a blog post that explained how to structure a Python package with a C++ extension module. My goal was to craft a Python package that leveraged C++ for performance and had an easily maintainable and testable structure. Well, seven months later, I'm revisiting Python/C++ packaging. I'm now convinced that the structure that I described in my original post is not ideal. This post lays out the problems I discovered with my prior approach and provides a complete guide to my new approach. I'll conclude with some thoughts on the big picture of working on projects with mixed codebases.
12 Jun 2017
For my research, I've spent the better part of the last year developing a simulation tool in Python. Python abstracts away things like memory management and type information, making it a great language for working through high-level design decisions. But for simulation software, pure Python is slow. So I've taken to a workflow of prototyping in Python and then rewriting portions of the codebase in C++ for performance. I'm left with a high-performance Python package that mixes both Python modules and compiled C++-based extension modules. This combination leverages both the simplicity of Python and the efficiency of C++.
11 Jun 2016
I use R with ggplot2 to create publication-ready figures in PDF format. Using a combination of ggplot2 and cowplot, I rarely make any manual changes with Adobe Illustrator or any other vector-based editing tool. These script-generated PDF figures make it easy to modify my analyses without worrying about manually reformatting my figures. Unfortunately, when R generates a PDF, it does not embed the fonts. To understand why embedding fonts matters, compare the two plots below:
02 Jan 2016
I recently wiped my MacBook Pro and reinstalled Mac OS X El Capitan. I did this for several reasons, but mostly because I was thoroughly frustrated with MacPorts and wanted to switch to Homebrew (more on that later). To cleanly rid my laptop of MacPorts, I started with a fresh copy of El Capitan. This guide describes the steps I took to configure El Capitan for biocomputing.
16 Jun 2015
As my second year of graduate school approaches, I have decided to launch a personal website. I am including this blog as an opportunity to develop my writing skills outside of academic journals and conference proceedings. I will use this space to discuss recent publications, offer tutorials for useful scripts I have written, and discuss compelling topics in science and programming.