Rat Genome Database
The Rat Genome Database (RGD) is a comprehensive database of rat genomics, genetics, physiology, and functional data, as well as data for comparative genomics between rat, human, and mouse. It serves as a critical resource for researchers who use the rat as a model organism for investigating pharmacology, toxicology, general physiology, and the biology and pathophysiology of disease. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community.
Organism
RGD focuses on the organism Rattus norvegicus (rat), a widely used model organism in biomedical research due to its physiological and genetic similarities to humans.
Research Center and Contact
RGD is based at the Medical College of Wisconsin and is associated with the Biomedical Engineering Laboratory. The principal investigator is Anne E. Kwitek, who leads a team of researchers and curators dedicated to maintaining and expanding the database.
Authors and Primary Citation
The RGD team includes Ysabel Chen and numerous other contributors. The primary citation for RGD is available via PMID 31713623, which provides a comprehensive overview of the database’s scope and functionalities.
Access and Data Release
RGD is accessible via its website at rgd.mcw.edu. The database provides regular data releases, ensuring that researchers have access to the most up-to-date information.
Data
RGD’s data consists of manual annotations from RGD researchers as well as annotations imported from a variety of sources. RGD also exports its own annotations to share with others. The database’s data page lists eight types of data stored in the database: Genes, QTLs, Markers, Maps, Strains, Ontologies, Sequences, and References. Of these, six are actively used and regularly updated.
Genes
Initial gene records are imported and updated from the National Center for Biotechnology Information’s (NCBI’s) Gene database on a weekly basis. Data imported during this process includes the Gene ID, GenBank/RefSeq nucleotide and protein sequence identifiers, HomoloGene group IDs, and Ensembl Gene, Transcript, and Protein IDs. Additional protein-related data is imported from the UniProtKB database. RGD curators review the literature and manually curate Gene Ontology (GO), disease, phenotype, and pathway annotations for rat genes; disease and pathway annotations for mouse genes; and disease, phenotype, and pathway annotations for human genes. The site imports GO annotations for mouse and human genes from the GO Consortium, rat electronic annotations from UniProt, and mouse phenotype annotations from the Mouse Genome Database/Mouse Genome Informatics (MGD/MGI).
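As a rough illustration of the kind of record a weekly import like the one described above might maintain, the sketch below models a gene's external identifiers and an insert-or-overwrite step keyed on the NCBI Gene ID. The class, field, and function names are hypothetical and the identifier values are placeholders; the source does not describe RGD's actual schema or pipeline code.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class GeneXrefs:
    """Hypothetical container for the identifier types named in the text."""
    ncbi_gene_id: str
    refseq_nucleotide_ids: List[str] = field(default_factory=list)
    refseq_protein_ids: List[str] = field(default_factory=list)
    homologene_group_id: Optional[str] = None
    ensembl_gene_ids: List[str] = field(default_factory=list)
    ensembl_transcript_ids: List[str] = field(default_factory=list)
    ensembl_protein_ids: List[str] = field(default_factory=list)


def apply_weekly_import(
    store: Dict[str, GeneXrefs], imported: Iterable[GeneXrefs]
) -> Dict[str, int]:
    """Insert new records and overwrite existing ones, keyed on Gene ID.

    Assumption: the latest import simply replaces the stored record.
    """
    counts = {"inserted": 0, "updated": 0}
    for record in imported:
        if record.ncbi_gene_id in store:
            counts["updated"] += 1
        else:
            counts["inserted"] += 1
        store[record.ncbi_gene_id] = record
    return counts


# Placeholder IDs, not real database identifiers.
store: Dict[str, GeneXrefs] = {}
first = apply_weekly_import(store, [GeneXrefs("GENE-1"), GeneXrefs("GENE-2")])
second = apply_weekly_import(
    store, [GeneXrefs("GENE-2", homologene_group_id="HG-9")]
)
```

Keying on the Gene ID keeps repeated imports idempotent: running the same batch twice changes the counts but not the stored records.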
QTLs
RGD’s staff manually curates data for rat and human QTLs from the literature, where such publications exist, or from records submitted directly by researchers. Mouse QTL records, including Mammalian Phenotype (MP) ontology assignments, are imported directly from MGI. For rat and human QTLs, curation includes assigning MP, Human Phenotype (HP), and disease ontology annotations. QTL positions are automatically assigned based on the genomic positions of peak and/or flanking markers or single nucleotide polymorphisms (SNPs). QTL records link to information about related strains, candidate genes, associated markers, and related QTLs.
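One way to picture deriving a QTL position from its markers is sketched below. It assumes the simplest possible rule: the interval spans the outermost coordinates of whichever peak and flanking markers are known. The source does not spell out RGD's actual rules, so the function, class, and marker names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class MarkerPosition:
    """Genomic placement of a marker or SNP (coordinates in base pairs)."""
    symbol: str
    chromosome: str
    start: int
    stop: int


def assign_qtl_interval(
    flanking: Sequence[MarkerPosition] = (),
    peak: Optional[MarkerPosition] = None,
) -> Optional[Tuple[str, int, int]]:
    """Return (chromosome, start, stop) covering all supplied markers.

    Assumed rule: take the outermost coordinates of the markers given.
    Returns None when no marker has a position.
    """
    markers = list(flanking)
    if peak is not None:
        markers.append(peak)
    if not markers:
        return None

    chromosomes = {m.chromosome for m in markers}
    if len(chromosomes) != 1:
        raise ValueError(f"markers map to different chromosomes: {sorted(chromosomes)}")

    start = min(m.start for m in markers)
    stop = max(m.stop for m in markers)
    return (chromosomes.pop(), start, stop)


# Made-up markers and coordinates, for illustration only.
left = MarkerPosition("MarkerA", "2", 1_000_000, 1_000_150)
right = MarkerPosition("MarkerB", "2", 5_000_000, 5_000_200)
peak = MarkerPosition("MarkerC", "2", 3_000_000, 3_000_100)

interval = assign_qtl_interval(flanking=[left, right], peak=peak)
peak_only = assign_qtl_interval(peak=peak)
```

With only a peak marker the interval collapses to that marker's own coordinates; a real pipeline might widen it by some fixed amount, which this sketch does not attempt.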
Strains
Like QTL records, RGD strain records are either manually curated from the literature or submitted by researchers. Strain records include information about the official symbol of the strain, origin and availability of the strain, associated phenotypes, whether the strain is a model for a human disease, and any information that is available about breeding, behavior, husbandry, etc. Strain records link to information about related genes, alleles, and QTLs, associated strains (e.g., parental strains or substrains), and, where available, strain-specific damaging nucleotide variants. For congenic and mutant strains, genomic positions are assigned for the introgressed region (congenic strains) or the location of the mutated sequence (mutant strains).
Markers
Because genetic markers such as simple sequence length polymorphisms (SSLPs) and expressed sequence tags (ESTs) have been, and continue to be, used for QTLs and strains, RGD stores marker data for rat, human, and mouse. Marker data includes the sequences of the associated forward and reverse PCR primers, genomic positions, and links to NCBI’s Probe database. Marker records link to associated QTL, strain, and gene records.
Cell Lines
RGD stores cell line records based on imports from Cellosaurus. Although the largest numbers of these are human and mouse cell lines, records are also available for rat, bonobo, dog, squirrel, pig, green monkey, and naked mole-rat.
Ontologies
To make its data both human-readable and available for computational analysis and retrieval, RGD relies on multiple ontologies. As of July 2021, RGD used 19 different ontologies to express the various types of data it holds. Ontology annotations are assigned manually by curators or are imported from external sources through automated pipelines. Six of the ontologies in use at RGD were created or co-created at RGD. Seven are under development by RGD staff members or collaborators: Pathway (PW), Rat Strains (RS), Vertebrate Traits (VT), Disease (RDO), Clinical Measurements (CMO), Measurement Methods (MMO), and Experimental Conditions (XCO). Ontologies that are imported from outside sources are updated weekly.
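For readers working with annotation files, a small sketch of how the seven ontology abbreviations named above could be used to recognize terms from RGD-developed ontologies follows. It assumes term identifiers follow the common `PREFIX:digits` convention (for example `PW:0000000`, a placeholder rather than a real term); the function names are hypothetical and are not part of any RGD tool.

```python
import re
from typing import Optional

# Abbreviations and names exactly as listed in the text above.
RGD_DEVELOPED = {
    "PW": "Pathway",
    "RS": "Rat Strains",
    "VT": "Vertebrate Traits",
    "RDO": "Disease",
    "CMO": "Clinical Measurements",
    "MMO": "Measurement Methods",
    "XCO": "Experimental Conditions",
}

# Assumed identifier shape: letters, a colon, then digits.
_TERM_ID = re.compile(r"^([A-Za-z]+):(\d+)$")


def ontology_prefix(term_id: str) -> Optional[str]:
    """Return the upper-cased prefix of a term ID, or None if malformed."""
    match = _TERM_ID.match(term_id.strip())
    return match.group(1).upper() if match else None


def rgd_developed_name(term_id: str) -> Optional[str]:
    """Name of the RGD-developed ontology a term belongs to, if any."""
    prefix = ontology_prefix(term_id)
    return RGD_DEVELOPED.get(prefix) if prefix else None
```

A term such as `GO:0000000` parses correctly but returns `None` from `rgd_developed_name`, because GO is maintained outside RGD.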