Data science blog
I write about data science, programming, and other things I find interesting. Here you will also find some older posts related to my PhD work in computational biology.
-
08 Sep 2019
I started my first full-time data science job about eight months ago, and getting here has certainly been a learning experience! While the experience is still fresh in my mind, I wanted to share my thoughts. This post is a reflection on how I went from a PhD program in Cell and Molecular Biology to an industry data scientist.
-
02 Feb 2018
Last year I published a blog post that explained how to structure a Python package with a C++ extension module. My goal was to craft a Python package that leveraged C++ for performance and had an easily maintainable and testable structure. Well, seven months later, I'm revisiting Python/C++ packaging. I'm now convinced that the structure I described in my original post is not ideal. This post lays out the problems I discovered with my prior approach and provides a complete guide to my new approach. I'll conclude with some thoughts on the big picture of working on projects with mixed codebases.
-
12 Jun 2017
For my research, I've spent the better part of the last year developing a simulation tool in Python. Python abstracts away things like memory management and type information, making it a great language for working through high-level design decisions. But for simulation software, pure Python is slow. So I've taken to a workflow of prototyping in Python and then rewriting portions of the code base into C++ for performance. I'm left with a high-performance Python package that mixes both Python modules and compiled C++-based extension modules. This combination leverages both the simplicity of Python and the efficiency of C++.
-
11 Jun 2016
I use R with ggplot2 to create publication-ready figures in PDF format. Using a combination of ggplot2 and cowplot, I rarely make any manual changes with Adobe Illustrator or any other vector-based editing tool. These script-generated PDF figures make it easy to modify my analyses without worrying about manually reformatting my figures. Unfortunately, when R generates a PDF, it does not embed the fonts. To understand why embedding fonts matters, compare the two plots below:
-
02 Jan 2016
I recently wiped my MacBook Pro and reinstalled Mac OS X El Capitan. I did this for several reasons, but mostly because I was thoroughly frustrated with MacPorts and wanted to switch to Homebrew (more on that later). To cleanly rid my laptop of MacPorts, I started with a fresh copy of El Capitan. This guide describes the steps I took to configure El Capitan for biocomputing.
-
16 Jun 2015
As my second year of graduate school approaches, I have decided to launch a personal website. I am including this blog as an opportunity to develop my writing skills outside of academic journals and conference proceedings. I will use this space to discuss recent publications, offer tutorials for useful scripts I have written, and discuss compelling topics in science and programming.