Until recently, computational constraints have limited modeling of transcription and translation at individual base-pair resolution to relatively short sequences. I want to move beyond these short sequences, and modeling transcription and translation across a whole virus is the next logical step. Many viruses have compact genomes and short life cycles, and they come with a wealth of detailed experimental data. My goal is to construct base-pair resolution models of viral transcription and translation. Such models could predict the effects of promoter knockouts, genome rearrangements, codon changes, and many other manipulations that are now possible thanks to advances in synthetic biology. I plan to constrain these models with experimental data from collaborators at UT Austin. Ultimately, I want to use these models to understand how basic transcriptional and translational processes drive viral fitness and adaptation.
In a given protein, different amino-acid sites evolve at different rates. Amino acids near a catalytic site tend to be more conserved than residues elsewhere in the protein, and the protein must fold into the correct form to perform its catalytic activity. It follows, then, that there must be some relationship between the three-dimensional structure of a protein and the rate at which different parts of the protein evolve. My goal is to uncover the relationship between protein structure and protein evolution. Improving our understanding of the physical constraints on protein evolution could give us broad insights into viral protein evolution, drug design, antibiotic resistance, and other areas of study.
ParseMSF is an R package for parsing ThermoFisher MSF files and estimating protein abundances. The package provides several R functions for inspecting ThermoFisher MSF files, a proprietary file format commonly used in mass spectrometry-based protein quantitation. These functions retrieve peptide information, map peptides within a parent protein sequence, and estimate protein abundances based on peak peptide areas. I developed this package to facilitate automatic processing and manipulation of proteomics data in R. The ParseMSF package is publicly available on CRAN.
In synthetic biology, organisms are engineered to produce compounds that they would not otherwise produce. Production of these foreign compounds often has a high fitness cost, so engineered organisms may accumulate mutations that disable this production. In other words, the engineered organisms break. I worked with Jeff Barrick to construct the Evolutionary Failure Mode (EFM) Calculator, a computational tool that addresses the evolutionary instability of engineered organisms. The EFM Calculator predicts hyper-mutable sites in a given DNA sequence, which allows researchers to test sequences in silico before placing them into organisms. The EFM Calculator began as a rotation project during my first few months of graduate school, and I continue to improve the Calculator's predictions and interface.
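To give a rough sense of what scanning a sequence for potentially hyper-mutable sites can look like, here is a minimal Python sketch that flags homopolymer runs (stretches of a single repeated base), one commonly cited class of mutation-prone sequence. This is an illustration only and is not the EFM Calculator's code or method: the function name `find_homopolymer_runs` and the `min_len` threshold of 6 are hypothetical choices made for the example.

```python
import re


def find_homopolymer_runs(seq, min_len=6):
    """Return (start, base, length) for each single-base run of at least min_len.

    Hypothetical helper for illustration; min_len=6 is an assumed threshold.
    Start positions are 0-based.
    """
    seq = seq.upper()
    # One base followed by at least (min_len - 1) copies of that same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq)]


# Toy sequence containing a run of seven A's and a run of six G's.
example = "ATGC" + "A" * 7 + "GCTT" + "G" * 6 + "CAT"
runs = find_homopolymer_runs(example)
for start, base, length in runs:
    print(f"position {start}: {base} x {length}")
```

A sequence designer could run a check like this on a candidate construct and recode any flagged regions before synthesis.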