This document covers common pitfalls when working with bioinformatics data and how to avoid them.
Use janitor::clean_names() to standardize column names to snake_case.
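As a minimal sketch of the name cleaning, assuming the janitor package is installed (the data frame and its column names are made up for illustration):

```r
library(janitor)

# Hypothetical input with inconsistent column names.
raw <- data.frame(
  `Sample ID`  = c("s1", "s2"),
  `Read Count` = c(1200, 950),
  check.names  = FALSE  # keep the messy names as-is so clean_names() has work to do
)

# clean_names() converts column names to snake_case by default.
cleaned <- clean_names(raw)
names(cleaned)
```

Only the column names change; the values in the data frame are left untouched.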
Use a standardized column name: chr for chromosome, instead of seqnames etc. Sometimes you have to change the name to fit a particular package (e.g. GenomicRanges), but convert the name only within the call of the function itself, and change it back immediately. Never propagate the name change to the next function, because the dependencies between functions then become a headache to manage.
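One way to keep the rename local is a small wrapper that swaps the column name just for the call and restores it on the result. The sketch below uses base R only; needs_seqnames() is a hypothetical stand-in for a package function that insists on a seqnames column, and with_seqnames() is an illustrative helper, not part of any package.

```r
# Hypothetical stand-in for a function that requires a `seqnames` column.
needs_seqnames <- function(df) {
  stopifnot("seqnames" %in% names(df))
  # Keep only rows where the interval is well-formed.
  df[df$start < df$end, , drop = FALSE]
}

# Rename chr -> seqnames only for the duration of the call, then rename back.
with_seqnames <- function(df, fun, ...) {
  names(df)[names(df) == "chr"] <- "seqnames"
  out <- fun(df, ...)
  names(out)[names(out) == "seqnames"] <- "chr"
  out
}

regions <- data.frame(
  chr   = c("chr1", "chr2"),
  start = c(100, 500),
  end   = c(200, 400)
)

valid <- with_seqnames(regions, needs_seqnames)
names(valid)  # the caller still sees `chr`, never `seqnames`
```

Because R copies the data frame on modification, the caller's regions object keeps its chr column throughout, so downstream functions never see the temporary name.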
Decide on one naming convention for chromosome values. For now, I use chr# (e.g. chr1) instead of a bare # (e.g. 1), because most BCF files that I work with contain such names.
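A small helper can enforce the chr-prefixed convention when data arrive with bare chromosome names. This is a base R sketch; to_chr_prefix() is a hypothetical name, and it only adds the prefix. It does not reconcile other differences between conventions, such as MT versus chrM.

```r
# Add a "chr" prefix to chromosome names that lack one; leave NA untouched.
to_chr_prefix <- function(x) {
  x <- as.character(x)
  needs_prefix <- !is.na(x) & !startsWith(x, "chr")
  x[needs_prefix] <- paste0("chr", x[needs_prefix])
  x
}

to_chr_prefix(c("1", "chr2", "X", NA))
```

Applying the helper once, right after reading the data, keeps every later step on the same convention.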