SoyBase Database

SoyBase
Content
Description	SoyBase, the USDA-ARS soybean genetics and genomics database
Data types captured	Nucleic Acid, Protein, Expression, Metabolism, Epigenetics
Organisms	Glycine max, Glycine soja (soy, soya, soybean)
Contact
Research center	USDA Agricultural Research Service
Laboratory	Corn Insects and Crop Genetics Research Unit
Primary citation	PMID 20008513
Access
Data format	Various
Website	soybase.org
Miscellaneous
License	Public domain (U.S. Government work)
Versioning	None
Data release frequency	Continuously
Curation policy	Professionally curated

SoyBase is a database created by the United States Department of Agriculture's Agricultural Research Service (USDA-ARS) that contains genetic and genomic information about soybeans. It includes genetic maps, information about Mendelian genetics, and molecular data on genes and sequences. It was started in 1990 and is freely available to individuals and organizations worldwide.

History

SoyBase was established by the Corn Insects and Crop Genetics Research Unit (CICGRU) in Ames, Iowa, as a central repository for the soybean genetics community's published information.^[1] Originally, the database concentrated on genetic information such as genetic linkage maps and other Mendelian data. SoyBase genetic maps are manually curated composites of all published mapping and QTL studies, and thus provide a species-level view of markers and QTL.

In 2010, the soybean genome sequence was released,^[2] along with gene models and many other types of genome annotations, which were integrated into SoyBase. SoyBase genetic linkage maps were integral to assembling the soybean genome.

In 2018, the database received approximately 63,000 page requests per month from about 2,600 users in 130 countries. Each year, about 40 organizations in the United States and 82 foreign educational institutions access SoyBase. SoyBase supplies data to U.S. and foreign government organizations and to corporate entities.

Data submission and release policy

Data are accepted only from the original data generators. Users who independently identify data for inclusion in the database can contact SoyBase directly. A number of Excel-based spreadsheet templates are available to make it easier to submit data to SoyBase.

All data in SoyBase are available without restriction. Several data-subsetting and download tools are provided, and ad hoc subsets of the data can be requested from the SoyBase curator when needed.

Search tool

The SoyBase Database Search Tool accepts queries through a text entry box and returns results both as text and as graphical displays. Soybean genetic and genomic data are displayed using Generic Model Organism Database (GMOD) open-source software. In addition to returning SoyBase objects identified by exact lexical matches to the query term, the tool uses a soybean-specific ontology to identify biologically related SoyBase objects.
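The combination of exact lexical matching and ontology-based expansion can be illustrated with a minimal sketch. It assumes a toy in-memory collection of objects, each annotated with ontology terms, and an ontology expressed as links from each term to its narrower terms. Every identifier, term name, and function here (`OBJECTS`, `ONTOLOGY_CHILDREN`, `descendants`, `search`) is hypothetical and does not reflect SoyBase's actual schema, ontology, or search implementation.

```python
from collections import deque

# Hypothetical toy objects: ID -> display name and annotated ontology terms.
# Illustrative only; not SoyBase's data model.
OBJECTS = {
    "GENE_1": {"name": "seed protein gene", "terms": {"seed_protein"}},
    "QTL_OIL_1": {"name": "seed oil QTL", "terms": {"seed_oil"}},
    "QTL_HEIGHT_1": {"name": "plant height QTL", "terms": {"plant_height"}},
}

# Hypothetical ontology: each term maps to its narrower (child) terms.
ONTOLOGY_CHILDREN = {
    "seed_composition": {"seed_protein", "seed_oil"},
    "seed_protein": set(),
    "seed_oil": set(),
    "plant_height": set(),
}


def descendants(term):
    """Return the term plus every narrower term reachable from it (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for child in ONTOLOGY_CHILDREN.get(queue.popleft(), set()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(query):
    """Return (exact_matches, ontology_related) as sorted lists of object IDs."""
    q = query.strip().lower()
    # Step 1: exact lexical matches against object IDs and names (case-insensitive).
    exact = {
        oid for oid, obj in OBJECTS.items()
        if q in (oid.lower(), obj["name"].lower())
    }
    # Step 2: if the query names an ontology term, add objects annotated with
    # that term or any narrower term, excluding objects already matched exactly.
    related = set()
    if q in ONTOLOGY_CHILDREN:
        terms = descendants(q)
        related = {oid for oid, obj in OBJECTS.items() if obj["terms"] & terms} - exact
    return sorted(exact), sorted(related)


print(search("seed_composition"))  # ([], ['GENE_1', 'QTL_OIL_1'])
print(search("Seed Oil QTL"))      # (['QTL_OIL_1'], [])
```

In this sketch, a broad term such as `seed_composition` matches no object by name but still retrieves objects annotated with narrower terms; following synonyms or broader terms would be a separate design choice that is left out here.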

Some SoyBase sequence data and annotations are available through an InterMine instance (SoyMine), which is a collaboration with the Legume Information System Project.^[3]

Graphical displays

Genetic maps contain information on markers (SSR, RFLP, SNP, etc.), genes, and quantitative trait loci (QTL) from both biparental and genome-wide association studies (GWAS). Soybean genetic maps are displayed with the CMap comparative genetic map viewer, and soybean genomic sequence and gene model data are displayed with the GBrowse sequence viewer. Other genome annotations in this viewer include epigenetic data, such as DNA methylation, and gene expression data from various soybean tissues and cultivars, including plants subjected to different treatments. Metabolic data and biochemical pathway information are displayed using Pathway Tools. Soybean metabolic pathway information (SoyCyc) was inferred by the Plant Metabolic Network project^[4] and was used to populate the Pathway Tools displays.
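The relationship between markers and QTL on a genetic map can be made concrete with a minimal sketch. It assumes a single linkage group whose markers have positions in centimorgans (cM), and a QTL represented simply as a start and end position on that group. The class names, marker names, and positions are hypothetical; this is not the data model used by SoyBase or CMap, only an illustration of how markers inside and flanking a QTL interval can be looked up.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    kind: str  # marker type, e.g. "SSR", "RFLP", or "SNP"
    position_cm: float  # position on the linkage group in centimorgans


@dataclass(frozen=True)
class QTL:
    name: str
    start_cm: float
    end_cm: float


class LinkageGroup:
    """One hypothetical linkage group with markers kept sorted by map position."""

    def __init__(self, name, markers):
        self.name = name
        self.markers = sorted(markers, key=lambda m: m.position_cm)
        self._positions = [m.position_cm for m in self.markers]

    def markers_in(self, qtl):
        """Markers whose positions fall inside the QTL interval (inclusive)."""
        lo = bisect_left(self._positions, qtl.start_cm)
        hi = bisect_right(self._positions, qtl.end_cm)
        return self.markers[lo:hi]

    def flanking(self, qtl):
        """Nearest markers just outside each end of the QTL interval, or None."""
        lo = bisect_left(self._positions, qtl.start_cm)
        hi = bisect_right(self._positions, qtl.end_cm)
        left = self.markers[lo - 1] if lo > 0 else None
        right = self.markers[hi] if hi < len(self.markers) else None
        return left, right


lg = LinkageGroup("LG1", [
    Marker("M3", "SNP", 30.1),
    Marker("M1", "SSR", 0.0),
    Marker("M4", "RFLP", 45.0),
    Marker("M2", "SSR", 12.5),
])
qtl = QTL("hypothetical_seed_oil_qtl", 10.0, 35.0)
print([m.name for m in lg.markers_in(qtl)])  # ['M2', 'M3']
left, right = lg.flanking(qtl)
print(left.name, right.name)  # M1 M4
```

Keeping the marker positions sorted lets both lookups use binary search (Python's `bisect` module) instead of scanning every marker on the linkage group.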

References

^ Grant, David; Nelson, Rex T.; Cannon, Steven B.; Shoemaker, Randy C. (2010). "SoyBase, the USDA-ARS soybean genetics and genomics database". Nucleic Acids Research. 38 (Suppl 1) (Database issue): D843–D846. doi:10.1093/nar/gkp798. PMC 2808871. PMID 20008513.
^ Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; et al. (2010). "Genome sequence of the palaeopolyploid soybean". Nature. 463 (7278): 178–183. Bibcode:2010Natur.463..178S. doi:10.1038/nature08670. PMID 20075913. S2CID 4372224.
^ Dash, S.; Campbell, J.D.; Cannon, E.K.; et al. (2016). "Legume information system (LegumeInfo.org): a key component of a set of federated data resources for the legume family". Nucleic Acids Research. 44 (D1): D1181–D1188. doi:10.1093/nar/gkv1159. PMC 4702835. PMID 26546515.
^ Schlapfer, P.; Zhang, P.; Wang, C.; et al. (2017). "Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants". Plant Physiology. 173 (4): 2041–2059. doi:10.1104/pp.16.01942. PMC 5373064. PMID 28228535.
