I maintain most of my projects on GitHub.

Microbial ecology bioinformatics tools

Distribution-based OTU calling (dbotu3)

dbOTU3 is the third implementation of the distribution-based (or “ecologically-informed”) OTU-calling algorithm. The distribution-based algorithm calls OTUs based on the similarity between sequences as well as their distribution across samples: sequences that are distributed similarly across samples are likely to be technically related (i.e., one is a sequencing error of the other) or biologically related (e.g., they are part of the same population of organisms). If two sequences are similar to each other but distributed differently across samples, they go into different OTUs.
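To make the idea concrete, here is a minimal toy sketch of the two-criterion rule, not dbOTU3's actual code or interface. Everything in it is an assumption made for illustration: the function names (`hamming_fraction`, `pearson`, `call_otus`), the thresholds, the use of Hamming distance for sequence similarity, and the use of a Pearson correlation as a stand-in for a proper statistical test of distribution similarity.

```python
from math import sqrt


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pearson(x, y):
    """Pearson correlation of two count vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def call_otus(counts, max_dist=0.1, min_corr=0.8):
    """Group sequences into OTUs (hypothetical helper, illustrative only).

    counts maps each sequence to its list of counts across samples.
    A sequence joins an existing OTU only if it is BOTH sequence-similar
    to that OTU's representative AND distributed similarly across samples.
    """
    # Visit sequences from most to least abundant, so abundant sequences
    # become OTU representatives first.
    order = sorted(counts, key=lambda s: sum(counts[s]), reverse=True)
    otus = {}  # representative sequence -> list of member sequences
    for seq in order:
        for rep in otus:
            similar_seq = hamming_fraction(seq, rep) <= max_dist
            similar_dist = pearson(counts[seq], counts[rep]) >= min_corr
            if similar_seq and similar_dist:
                otus[rep].append(seq)
                break
        else:
            # No existing OTU passed both criteria: start a new one.
            otus[seq] = [seq]
    return otus


# Made-up example data: three sequences counted in four samples.
example_counts = {
    "AAAAAAAAAA": [100, 50, 0, 0],
    "AAAAAAAAAT": [10, 5, 0, 0],   # one mismatch, same distribution
    "AAAAAAAAAC": [0, 0, 40, 20],  # one mismatch, different distribution
}
example_otus = call_otus(example_counts, max_dist=0.15)
```

In this made-up example, `AAAAAAAAAT` is folded into the OTU represented by `AAAAAAAAAA`, while `AAAAAAAAAC` starts its own OTU even though it is just as close in sequence, because its counts fall in different samples.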


Caravan, a fork of SmileTrain, is a Python library for preliminary processing of 16S or other next-generation amplicon sequencing data. It has a simpler command-line interface, a set of commands that reminds me how to use usearch, and a bucket of dirty-and-purty Perl tools.

Microbial ecology analytics and modeling

Treatment Effect Explorer for Microbial Ecology Experiments (texmex)

Texmex is a tool designed to help visualize and interpret community dynamics observed in microbial ecology experiments that use sequencing count data. Given one inoculum split into two replicates, one control and one experimental, which OTUs are growing or shrinking in response to the treatment? Texmex is designed to help account for changes in the control unit’s community composition as well as dampen compositional effects that might occur in either unit.

Texmex was first implemented in R (CRAN, GitHub). It has a beta Python implementation (GitHub, docs).


Mystic is a collection of MATLAB and Python scripts that implement a dynamic, conceptual simulation of the chemistry and microbial ecology of a dimictic lake. It was originally developed to model microbial metabolisms in the epilimnion of Upper Mystic Lake outside Boston, MA.

Programming and productivity


Laterna is a Racket package that builds on the slideshow package. It’s like other text-to-slideshow tools in the sense that your slideshow content can be edited mostly as plain text, but it’s more powerful in the sense that you can use the Racket language to design your slides. If you’re writing a lot of code, I’d go with Remark. If you’re writing a lot of equations, I’d go with Beamer. If you’re showing just a little text and a lot of pictures, I’d go with Laterna.


Arginine is a Ruby gem that parses command-line arguments, options, and flags. I made it for myself because I wanted the minimal features of optparse, the positional argument parsing of Python’s argparse, and the breezy syntax of gems like trollop.