C. elegans online

Contents:


General resources 

Comprehensive web site

Central database

    WormBase
    WormBase provides worm genetic and genomic information through a useful web interface.  The project is a consortium of biologists and computer scientists, and is run by the bioinformatics guru Lincoln Stein.  It has many features not available through the ACEDB format.  It is the best place to start looking for information on a worm gene.  See also Worm PD.

ACEDB

    "A C. elegans database"
  • WebACEDB -- The Sanger Centre's web version - try the AltaVista interface

  • This is the comprehensive worm genetics and genomics database.  "ACEDB" refers to both the software program that accesses the data and the worm data itself.  ACeDB can show you the genetic map, physical map, genomic sequence, ESTs, information on characterized genes, references, and pictures.  It is particularly useful for seeing the correspondence between loci on the genetic and physical maps.  You can download your own copy of ACeDB (4MB) by anonymous ftp (the Macintosh version is called macace).
  • Other general information sources

  • Sequences 

    Genome sequence

    Microarray data

    The C. elegans "proteome"

    • All predicted proteins can be found through Genbank, WormBase, and Sanger's ACEDB.
    • Proteome, Inc maintains a well-annotated database of predicted C. elegans proteins, Worm PD, that includes such information as size, pI, expression patterns (when known), and relevant references.  This database has a variety of search approaches, such as post-translational modifications and subcellular localization.  See a sample output
    • The C. elegans ORFeome cloning project at Harvard -- a searchable/BLASTable database of the worm ORFs put together for the purposes of expressing the sequences in yeast and bacteria
    • Southeast Collaboratory Structural Genomics -- a C. elegans protein expression and structural biology project
    Expressed sequence tags (ESTs)

    BLAST servers

    • BLAST at NCBI -- allows you to BLAST sequences from specific organisms, including C. elegans
    • C. elegans BLAST (Sanger)

           -- all finished and unfinished genomic sequence, ESTs, and predicted proteins (WormPep)
           -- blastn, tblastn, blastx, tblastx, blastp
           -- searches of genomic sequence (blastn, tblastn, tblastx) return links to cosmid sequences and their WebACEDB entries
           -- searches of protein sequence (blastp, blastx) return predicted locus names and links to their WebACEDB entries
    • C. elegans BLAST (Wash U)

           -- all finished and unfinished genomic sequence
           -- blastn, tblastn
           -- returns only cosmid names, no links
    • C. elegans ESTs BLAST (Japan) -- all ESTs; blastn; complete links
    • C. briggsae BLAST at WashU -- same genus, different species, about 20 to 50 million years diverged.

           -- comparing genes between C. elegans and C. briggsae can help identify important conserved sequences
      C. briggsae assembled sequence BLAST at WashU and at Sanger -- preliminary assembly of 10x sequence data of the whole C. briggsae genome. Very useful for finding regulatory sequences, miRNAs, protein motifs, etc.

      Advice: To find a worm homolog, use a protein sequence to search WormPep by blastp.  Then to get the DNA and protein sequences of the homolog, go to Genbank and use the name of the hit to retrieve the annotated cosmid sequence.

      (What is the difference between the various BLAST programs, blastx and tblastn, for example? See BLAST search programs.)
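
      As a quick reference, the five programs differ only in what kind of query they accept, what kind of database they search, and whether either side is translated in all six reading frames first.  The small Python sketch below encodes that; the helper name programs_for is made up for illustration and is not part of any BLAST software.

```python
# What each classic BLAST program takes as a query, what it searches,
# and which side(s) it translates in six frames before comparing.
BLAST_PROGRAMS = {
    "blastn":  {"query": "nucleotide", "database": "nucleotide", "translates": ()},
    "blastp":  {"query": "protein",    "database": "protein",    "translates": ()},
    "blastx":  {"query": "nucleotide", "database": "protein",    "translates": ("query",)},
    "tblastn": {"query": "protein",    "database": "nucleotide", "translates": ("database",)},
    "tblastx": {"query": "nucleotide", "database": "nucleotide", "translates": ("query", "database")},
}


def programs_for(query, database):
    """List the programs usable for a 'protein' or 'nucleotide' query/database pair."""
    return [
        name
        for name, spec in BLAST_PROGRAMS.items()
        if spec["query"] == query and spec["database"] == database
    ]


if __name__ == "__main__":
    # A protein query against genomic DNA needs the database translated.
    print(programs_for("protein", "nucleotide"))
    # A DNA query against predicted proteins needs the query translated.
    print(programs_for("nucleotide", "protein"))
```

      For example, searching genomic sequence with a protein calls for tblastn, while searching WormPep with a cDNA or EST sequence calls for blastx.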

    Other sequence databases


    Obtaining DNAs and worm strains 

    ESTs (cDNA clones)

  • See Yuji Kohara's EST pages -- you can order worm ESTs (which are long and sequenced from both ends) free of charge from: ykohara@lab.nig.ac.jp
  • Genomic clones

    • Specific cosmids or YACs of genomic DNA can be requested from Alan Coulson at the Sanger Centre (alan@sanger.ac.uk).

    Expression vectors

    • worm expression vectors from the Fire lab -- Andy Fire has provided the worm community with an extraordinarily complete and useful collection of gene expression vectors, from a variety of promoters, to multiple beta-gal and GFP fusion vectors in all frames.

    Worm strains

    Gene knockouts

    Want to name a C. elegans gene or get your own designation for strains and alleles?


    Abstracts, news, and people 

    Abstracts and literature

    News and information

    • bionet.celegans newsgroup

    • Have your questions answered, seek advice, or solicit information.  Announcements concerning ACeDB updates, meetings, etc. always appear here.  You can also search archived posts for topics previously discussed.  This link is to an HTML version.  You can also monitor it using a newsreader or by email: send the message "subscribe CELEGANS" to biosci-server@net.bio.net.  The newsgroup is moderated, meaning it is free of 'spam'.
    • What's happening at the Boston Area Worm Meeting and the New York Area Worm Meeting

    Contacting worm labs

    Genetic Map and Nomenclature

    • For all issues concerning new genetic data and gene names, contact Jonathan Hodgkin jah@bioch.ox.ac.uk


    Protocols for worm work 

    Protocol collections

    Reverse genetics

    RNAi


    Things to know about the worm 

    C. elegans is a nematode

      Caenorhabditis elegans (Caenorhabditis means "new rod-shaped thing") is a nematode, or roundworm.  The earthworm is an annelid, or segmented worm, and is in a different phylum.  Nematodes are some of the most abundant animals on the planet.  They are found in almost every environment, and many are harmful parasites of animals and plants.  C. elegans, however, is not a parasite.  It is a free-living nematode that lives in the soil.  In a teaspoon of soil from a garden it is possible to find many nematodes, some of which may be C. elegans or its relatives.  Soil nematodes eat bacteria.

     C. elegans is microscopic and grows fast

      C. elegans is barely visible with the naked eye -- a fully grown adult is approximately 1 millimeter long, or about the size of Lincoln's nose on a penny.  Its eggs are among the smallest in the animal kingdom.  C. elegans grows in about three days and has hundreds of offspring.  An egg is laid after being fertilized inside the mother and takes about 15 hours to develop.  After hatching from the egg into a larva that looks like a miniature version of the adult, the worms develop through four larval stages (called L1, L2, L3, and L4).  At the end of each larval stage they synthesize a new cuticle (a layer of protein and carbohydrates that covers the animal's hypodermis, or skin) and shed the previous cuticle by molting.  After the last molt they are mature adults capable of reproducing.  Development to adulthood takes about 2 1/2 days at 25°C, and 6 days at 15°C.  The total life-span of a worm under the best growth conditions is about 12 to 18 days at 20°C.

      In the laboratory we grow C. elegans in petri plates that contain an agar medium supplemented with cholesterol.  On the surface of the agar we put a lawn of E. coli.  (Sometimes we accidentally contaminate the petri dishes with mold, yeast or other bacteria from our hands, but it usually doesn't bother the worms much.)

      The worms eat all the bacteria in a few days, get crowded and begin to starve, so we transfer them to fresh plates.  In response to crowding, C. elegans can arrest development at the end of the second larval stage, and last in that dormant state for months to years.  When these arrested worms, called dauer ("enduring") larvae, are moved to fresh plates with bacteria to eat, they resume development where they left off.

    C. elegans is useful for experimental genetics

      C. elegans has two sexes, male and hermaphrodite.  The hermaphrodites produce both sperm and eggs and are self-fertilizing, or automictic.  Hermaphrodites typically lay 300 fertilized eggs during life.  If they are fertilized by a male, they can produce hundreds more.  We can freeze all of our mutant worm strains in liquid nitrogen (something one can't do with Drosophila).  It's not the dauers or eggs that survive the freezing best, but the young larvae.

      Because the worms are very small, we use binocular dissecting microscopes when we are examining them and moving them from plate to plate.  For other purposes, such as to see the anatomy of the worm in great detail or to watch individual cells dividing, we use compound microscopes with Nomarski (or Differential Interference Contrast, DIC) optics.

      One of the most useful things about C. elegans to us is how easy it is to find mutants that affect many different kinds of processes.  Using mutants, we can deduce how these processes work normally.  Mutants of C. elegans have been found that have altered development, behavior, movement, ability to smell and taste, feeding, defecation, rate of growth, aging and programmed cell death.

    C. elegans is arguably the most thoroughly understood animal

      C. elegans is the only animal for which the entire cell lineage is known from zygote to adult.  It is also the only animal for which the entire wiring of its nervous system is known.  It is the first animal to have its entire genome sequenced.

      The genome size of C. elegans is 100 megabases or 1/30 the size of the human genome.  It has six chromosomes, five autosomes and a sex chromosome, all of similar size.  The chromosomes are holokinetic, that is, they don't have a centromere.  There is little repeated DNA in the genome, which makes it good for sequencing.  There are about 18,000 protein coding genes, and 1000 RNA genes.  About half the genes have similarity to genes in other organisms, and some are homologs of disease genes in humans.  Experiments using C. elegans are done by thousands of scientists around the world and even on the space shuttle.
    Some movies (Quicktime):


    Things to know about C. elegans sequences 

    Genomic sequences are deposited in Genbank with annotations.

      The genome sequencing project proceeds cosmid by cosmid (where there are gaps between cosmids, YACs are sequenced).  Once the sequence of a cosmid is determined, its entire sequence (or at least that part that does not overlap with that of neighboring clones) is deposited in Genbank with the gene annotations.  Therefore, usually there are not separate Genbank entries for each gene predicted on a cosmid (however, for some there are; go figure).  To retrieve the annotated sequence of a cosmid, use its name (F19A6, for example) to search Genbank.  The Genbank file will have each predicted protein sequence within it (F19A6.4, for example).  To make a file of the DNA sequence of your gene alone you will have to cut and paste according to the numbers given for your gene in the Genbank file.  The sequence annotations begin with the ATG and end with the termination codon.
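
      The cut-and-paste step can also be done with a few lines of Python.  The sketch below is only an illustration: the function names are made up, and it assumes you have already copied the cosmid DNA and the gene's location string (for example "join(3..8,17..22)") out of the Genbank file by hand.  It handles only plain ranges, join(...), and a single outer complement(...); partial markers such as "<" and references to other entries are not supported.

```python
import re

# Translation table used to complement DNA bases (case is preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(cosmid_seq, location):
    """Return the coding DNA for a simple Genbank location string.

    Coordinates are 1-based and inclusive, as in Genbank feature tables.
    Exons are joined in the order listed; if the whole location is wrapped
    in complement(...), the joined sequence is reverse-complemented.
    """
    location = location.replace(" ", "")
    on_minus_strand = location.startswith("complement(")
    # Refuse forms this sketch does not understand, e.g. join(complement(...),...).
    if location.count("complement(") > (1 if on_minus_strand else 0):
        raise ValueError("nested complement() is not supported: " + location)

    spans = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
    if not spans:
        raise ValueError("no start..end ranges found in: " + location)

    coding = "".join(cosmid_seq[start - 1:end] for start, end in spans)
    return reverse_complement(coding) if on_minus_strand else coding


if __name__ == "__main__":
    # Toy "cosmid": two exons (ATGAAA and GGTTAA) separated by a short intron.
    toy = "CCATGAAAGTTTTCAGGGTTAACC"
    print(extract_cds(toy, "join(3..8,17..22)"))
```

      Because the annotations begin with the ATG and end with the termination codon, the extracted string should start with ATG and end with a stop codon; if it does not, the coordinates or the strand were probably copied incorrectly.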

    Most gene sequences are only predictions.

      Be aware that most of the protein sequences are only predictions and have not been verified experimentally.  The DNA sequence itself is very rarely in error, but the locations of splices, starts and stops are sometimes predicted incorrectly by the Genefinder software.  If there are ESTs indicated that correspond to the gene (yk6b1.5, for example), then there has been some confirmation of the gene structure given.  Careful re-analysis of the predicted sequence may be worth the effort.  In particular, be suspicious of 'bifunctional' genes -- see the note below on trans-splicing.

    Predicted genes are given cosmid names, confirmed genes are given different names.

      All predicted genes are named after the cosmids on which they are found: for example, F19A6.4 is the fourth gene on the cosmid F19A6.  Here are examples of other types of DNA clones: Y69A9 is a YAC; yk6b1.5 and yk6b1.3 are the 5' and 3' reads of the EST yk6b1.  If you are searching ACeDB with the name of a sequence that begins with "CE" or "CEL", you probably aren't finding anything; search with the name without the "CE" or "CEL".  "Claimed" genes, including those identified by mutation, are given prefixes of three lowercase italicized letters and a number (list of gene prefixes and what they mean). Proteins are given the same names as genes, but are in all caps, not italicized.  For example, the protein encoded by the gene lin-28 is LIN-28.  For more information on how genes, alleles, and genotypes are written, see the official genetic nomenclature and list of gene names. If you want to request a new gene name, write to Jonathan Hodgkin jah@bioch.ox.ac.uk
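
      The naming rules above are regular enough to handle in a script.  The following Python sketch is illustrative only (all function names are hypothetical); it splits a predicted gene name into cosmid and gene number, identifies which end an EST read comes from, drops a leading "CEL" or "CE" before an ACeDB search, and writes the protein name for a three-letter gene name.

```python
import re


def split_predicted_gene(name):
    """'F19A6.4' -> ('F19A6', 4): the cosmid and the gene's number on it."""
    cosmid, _, number = name.rpartition(".")
    if not cosmid or not number.isdigit():
        raise ValueError("not a cosmid-style gene name: " + name)
    return cosmid, int(number)


def parse_est_read(name):
    """'yk6b1.5' -> ('yk6b1', "5'"); a '.3' suffix marks the 3' read."""
    clone, _, end = name.rpartition(".")
    if not clone or end not in ("5", "3"):
        raise ValueError("not an EST read name: " + name)
    return clone, end + "'"


def strip_ce_prefix(name):
    """Drop a leading 'CEL' or 'CE' so the name can be searched in ACeDB.

    The check is case-sensitive on purpose, so lowercase gene names
    that happen to start with 'ce' are left alone.
    """
    for prefix in ("CEL", "CE"):
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


def protein_name(gene):
    """'lin-28' -> 'LIN-28': same name as the gene, in all caps."""
    if not re.fullmatch(r"[a-z]{3}-\d+", gene):
        raise ValueError("expected three lowercase letters, a hyphen and a number: " + gene)
    return gene.upper()


if __name__ == "__main__":
    print(split_predicted_gene("F19A6.4"))
    print(parse_est_read("yk6b1.5"))
    print(strip_ce_prefix("CELF19A6"))
    print(protein_name("lin-28"))
```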

    Many worm genes are trans-spliced at their 5' ends.

      Worm introns most often begin with GU and end with UUUCAG.  Seventy percent of worm mRNAs begin with a trans-spliced leader.  The cis-signal for trans-splicing is the same as the most common intron splice acceptor: UUUCAG.  Because the cis and trans acceptors are the same, Genefinder can predict a false 5' exon or fuse two closely spaced genes.  You can tell that an EST goes all the way to the 5' end if you see some of the sequence of a spliced leader at the beginning of the ".5" sequence.  The sequences of the spliced leaders are below.  Over 90% of trans-spliced leaders are SL1.  A few percent of worm genes are "bi-cistronic": two products are produced from a single primary transcript by an "internal" trans-splice, commonly using SL2.
    spliced leader   length   sequence
    SL1              22       cap-GGUUUAAUUACCCAAGUUUGAG-
    SL2              22       cap-GGUUUUAACCCAGUUACUCAAG-
    SL3              22       cap-GGUUUUAACCCAGUUAACCAAG-
    SL4              22       cap-GGUUUUAACCCAUAUAACCAAG-
    SL5              23       cap-GGUUUUAACCCAAGUUAACCAAG-
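
      Checking a ".5" read against the table above can be automated.  The Python sketch below is one possible approach, not an established tool: it asks whether the read begins with the 3' end of any leader, since a read often captures only part of the leader.  The function name and the default minimum overlap of 8 nucleotides are arbitrary choices made for illustration, and matching is exact, so sequencing errors in the read will cause misses.

```python
# Spliced leader sequences from the table above (RNA, 5' to 3', without the cap).
SPLICED_LEADERS = {
    "SL1": "GGUUUAAUUACCCAAGUUUGAG",
    "SL2": "GGUUUUAACCCAGUUACUCAAG",
    "SL3": "GGUUUUAACCCAGUUAACCAAG",
    "SL4": "GGUUUUAACCCAUAUAACCAAG",
    "SL5": "GGUUUUAACCCAAGUUAACCAAG",
}


def find_spliced_leader(read, min_overlap=8):
    """Report which leader(s), if any, the start of a 5' read matches.

    Returns (names, overlap): the leaders whose 3' end matches the start
    of the read over the longest stretch, and the length of that stretch.
    Returns None if no leader matches over at least min_overlap bases.
    Several names are returned when leaders share the matching 3' end.
    """
    read = read.upper().replace("U", "T")
    best_len, best_names = 0, []
    for name, rna in SPLICED_LEADERS.items():
        dna = rna.replace("U", "T")  # EST reads are DNA
        # Try the longest leader suffix first and stop at the first hit.
        for n in range(len(dna), min_overlap - 1, -1):
            if read.startswith(dna[-n:]):
                if n > best_len:
                    best_len, best_names = n, [name]
                elif n == best_len:
                    best_names.append(name)
                break
    return (best_names, best_len) if best_names else None


if __name__ == "__main__":
    # Made-up read: the last 12 bases of SL1 followed by a start codon.
    print(find_spliced_leader("CCCAAGTTTGAGATGTCTAAA"))
```

      For the made-up read in the example, the function returns (['SL1'], 12).  A short overlap can match more than one leader, because SL3, SL4 and SL5 share their last several bases, so treat short matches with caution.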