What’s in a Name? C. elegans Edition

The famed model organism Caenorhabditis elegans has quite a few feathers in its cap. It was the first multicellular organism to have its genome sequenced and the first organism to have its entire neuron wiring (connectome) mapped. Moreover, C. elegans research has led to discoveries in everything from cell life cycles and gene silencing to the biology of aging, sleep, addiction, and even space travel!

Neural anatomy of C. elegans. Source: OpenWorm project. Shared under an MIT License

Fueling these discoveries is a veritable arsenal of genetically distinct lines called strains. A strain is defined as a set of individuals of a particular genotype with the capacity to produce more individuals of that same genotype. Over 3,000 strains of C. elegans are available for purchase.

How do scientists keep track of all these strains? By giving them each a unique name!

C. elegans strain names are short and may initially seem somewhat nonsensical. However, each one follows a simple naming convention first established by the geneticist Dr. Sydney Brenner.

Each strain name consists of two or three uppercase letters followed by a number. The letters refer to the lab that first registered the strain. This means that every lab that has produced a C. elegans strain has its own special alphabetical code name! These code names are managed by the Caenorhabditis Genetics Center (CGC). You can check out the lab name list here (https://cgc.umn.edu/laboratories) or, if your lab has a new strain but no code name, request one here (https://cgc.umn.edu/laboratory/request).

Because labs often introduce multiple strains, each lab code name is followed by a number.
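
For readers who like to see a rule written down precisely, here is a minimal Python sketch of the strain name pattern described above. The function name split_strain_name is made up for this illustration, and the check only tests the shape of a name (two or three uppercase letters, then digits); it does not look anything up at the CGC.

```python
import re

# Shape described in the text: two or three uppercase letters (the lab code)
# followed by a number (the strain's number within that lab).
STRAIN_NAME = re.compile(r"([A-Z]{2,3})(\d+)")


def split_strain_name(name):
    """Return (lab_code, strain_number), or None if the name does not fit the pattern.

    Hypothetical helper for illustration only.
    """
    match = STRAIN_NAME.fullmatch(name.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(split_strain_name("CB1112"))  # ('CB', 1112)
print(split_strain_name("BA17"))    # ('BA', 17)
print(split_strain_name("let-37"))  # None, because this is a gene name
```

Anything that comes back as None either is not a strain name or uses a format this simple pattern does not cover.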

For example: CB1112 – a dopamine-deficient strain that’s often used in addiction research – was first created at Oxford University in the lab of Dr. Jonathan Hodgkin. Dr. Hodgkin was one of the first geneticists to use C. elegans as a model organism, so it’s little surprise that the strains registered by his lab now number more than a thousand, hence 1112. You can find out more about the CB1112 strain here (https://cgc.umn.edu/strain/CB1112).

If you’ve worked with C. elegans before, you’ve likely come across similarly short letter/number combo names that are also italicized. These are gene names.

The convention for C. elegans gene names is three or four italic lowercase letters followed by a hyphen, then a number, and sometimes a Roman numeral. The letters describe a key property of the gene, usually the protein it produces or the phenotype that it’s associated with. The number reflects the gene’s discovery order, which helps to distinguish between genes that share the same letter prefix. Finally, the optional Roman numeral refers to the linkage group (chromosome) that the gene maps onto.

For example: let-37 is a gene that was discovered because worms with a mutation in it quickly die (let for LEThal). Similarly, dpy-5 I is a gene involved in collagen production that was discovered because worms with a mutation in this gene were shorter and stouter than normal. In this instance dpy stands for DumPY, and the I tells you that dpy-5 sits on the first linkage group.
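
The gene name convention can be sketched the same way. This is an illustrative Python snippet, not an official parser: parse_gene_name is a hypothetical helper, the pattern assumes prefixes of three or four lowercase letters, and it assumes the linkage groups are I through V plus X. Italics are a typesetting detail, so the code ignores them.

```python
import re

# Letters, a hyphen, a number, and an optional linkage group.
# Assumption: linkage groups are I, II, III, IV, V, or X.
GENE_NAME = re.compile(r"([a-z]{3,4})-(\d+)(?:\s+(I|II|III|IV|V|X))?")


def parse_gene_name(text):
    """Split a gene name into its prefix, number, and optional linkage group.

    Hypothetical helper for illustration only; returns None if the text
    does not fit the pattern.
    """
    match = GENE_NAME.fullmatch(text.strip())
    if match is None:
        return None
    prefix, number, linkage_group = match.groups()
    return {
        "prefix": prefix,                # e.g. "dpy" for DumPY
        "number": int(number),           # discovery order within that prefix
        "linkage_group": linkage_group,  # None when no Roman numeral is given
    }


print(parse_gene_name("dpy-5 I"))
# {'prefix': 'dpy', 'number': 5, 'linkage_group': 'I'}
print(parse_gene_name("let-37"))
# {'prefix': 'let', 'number': 37, 'linkage_group': None}
```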

In the world of C. elegans the acronyms can continue. Sometimes after a gene name a scientist will include the name of the specific mutation/allele in parentheses. Mutation names are a combination of one to three lowercase letters and a number, sometimes followed by a short letter suffix that describes the mutation. Wild type alleles are always represented by a + sign.

So next time you work with or read about a “BA17 (fem-1 (hc17ts)) IV” worm, decode the information! You’ll figure out that you’re working with strain number 17 from Dr. Herman’s lab at the University of Minnesota, and that this strain carries a mutation (hc17) in a gene on linkage group IV that affects sex determination (fem for FEMinization) by making this process temperature sensitive (ts). Most importantly, you’ll know there’s a logic and system behind the name!
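
To make that decoding concrete, here is a rough Python sketch that pulls a description like the one above apart into its pieces. It is only an illustration: decode_strain_description and the field names are invented for this example, the pattern assumes one to three lowercase letters in the mutation name, and it is written loosely enough to accept the description with or without the outer parentheses.

```python
import re

# Strain name, gene name, mutation name in parentheses, optional linkage group.
# Assumptions: linkage groups are I-V or X; any letters after the mutation
# number (such as "ts") are treated as a free-form suffix.
DESCRIPTION = re.compile(
    r"(?P<lab>[A-Z]{2,3})(?P<strain_number>\d+)\s*"
    r"\(?\s*(?P<gene>[a-z]{3,4}-\d+)\s*"
    r"\(\s*(?P<allele>[a-z]{1,3}\d+)(?P<suffix>[a-z]*)\s*\)\s*\)?\s*"
    r"(?P<linkage_group>I|II|III|IV|V|X)?"
)


def decode_strain_description(text):
    """Return the parts of a strain description as a dict, or None if it does not fit.

    Hypothetical helper for illustration only.
    """
    match = DESCRIPTION.fullmatch(text.strip())
    if match is None:
        return None
    parts = match.groupdict()
    parts["strain_number"] = int(parts["strain_number"])
    return parts


decoded = decode_strain_description("BA17 (fem-1 (hc17ts)) IV")
print(decoded["lab"], decoded["strain_number"])   # BA 17
print(decoded["gene"], decoded["linkage_group"])  # fem-1 IV
print(decoded["allele"], decoded["suffix"])       # hc17 ts
```

The lab code still has to be looked up in the CGC lab list by hand; the sketch only separates the parts of the name.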

Interested in finding the science secrets hidden in other names? Check out this post about restriction enzymes.
