Data from: Genetic variation among 481 diverse soybean accessions
<p dir="ltr">These data are from the manuscript titled "Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing". SNP calls were obtained by resequencing 481 diverse soybean lines comprising 52 wild (<i>Glycine soja</i>) and 429 cultivated (<i>Glycine max</i>) accessions. This dataset contains 6 gzipped VCF (Variant Call Format) files with variant calls for all 481 USB accessions, all <i>G. max</i> accessions, <i>G. soja</i> accessions, accessions sequenced at 15x coverage, accessions sequenced at 40x coverage, and 106 accessions re-sequenced from a previous study (Valliyodan et al. 2016). SNPs were called using the HaplotypeCaller algorithm from the Genome Analysis Toolkit (GATK), version gatk-2.5-2-gf57256b. A total of 7.8 million SNPs were identified among the 481 re-sequenced accessions. SNPs were assigned IDs using the script "assign_name.awk", available at <a href="https://github.com/soybase/SoySNP-Names">https://github.com/soybase/SoySNP-Names</a>. SNP effects were predicted using SnpEff 3.0.</p><p dir="ltr">The dataset is also available at <a href="https://soybase.org/data/v2/Glycine/max/diversity/Wm82.gnm2.div.Valliyodan_Brown_2021/">https://soybase.org/data/v2/Glycine/max/diversity/Wm82.gnm2.div.Valliyodan_Brown_2021/</a>.</p><p dir="ltr">Funding support was provided by the United Soybean Board for the large-scale sequencing of soybean genomes (project #1320-532-5615), Bayer (previously Monsanto and Bayer), and Corteva (previously Dow AgroSciences), with in-kind support for analysis from USDA Agricultural Research Service project 5030-21000-069-00-D.</p><p dir="ltr">Resources in this dataset:</p><ul><li>Resource Title: Data Dictionary<br>File Name: Data_Dictionary_USB481.csv<br>Resource Description: Provides the name of each data file with its data type, a description of its content, the corresponding SoyBase Data Store file, and the file size.</li><li>Resource Title: List_of_Accessions.txt.gz<br>File Name: 
glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB481_list.txt.gz<br>Resource Description: Table listing all the accessions that were re-sequenced and the metadata associated with each accession.</li><li>Resource Title: Alignment_used_for_Phylogenetic_trees.fna.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.sampled_25Kpos.fna_.gz<br>Resource Description: Aligned SNP data for USB481 accessions, based on SNPs sampled at one SNP per 25 kb.</li><li>Resource Title: Phylogenetic_tree.nh.txt.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.sampled_25Kpos.nh_.txt.gz<br>Resource Description: Phylogenetic tree (Newick format) of SNP data for USB481 accessions, based on SNPs sampled at one SNP per 25 kb.</li><li>Resource Title: Phylogenetic_tree.pxml.txt.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.sampled_25Kpos.pxml_.txt.gz<br>Resource Description: Phylogenetic tree (phyloXML format; colored) of SNP data for USB481 accessions, based on SNPs sampled at one SNP per 25 kb.</li><li>Resource Title: SNP_Effect_predictions.gff3.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.snpEff.gff3_.gz<br>Resource Description: Output from the SnpEff program using the SNPs from the full USB481.vcf file as input.</li><li>Resource Title: Soja_SNP_calls.vcf.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.Soja_.vcf.gz<br>Resource Description: Genotype information in VCF format for 45 Soja lines from USB-funded project.</li><li>Resource Title: Soy106.vcf.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.Soy106.vcf.gz<br>Resource Description: Genotype information in VCF format for 106 accessions from USB-funded project; from Valliyodan et al., Sci Rep 2016.</li><li>Resource Title: USB481_index.vcf.gz.tbi<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB481_list.txt.gz<br>Resource Description: Binary index for USB481.vcf.gz, produced using tabix.</li><li>Resource Title: 
USB-40x.vcf.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB-40x.vcf.gz<br>Resource Description: Genotype information in VCF format for 46 accessions sequenced at 40x coverage from USB-funded project.</li><li>Resource Title: SnpEff_predictions_Gmax_Accessions.gff.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.snpEff_Gmax.gff_.gz<br>Resource Description: SnpEff results in GFF format using the USB481_nosoja.vcf file as input.</li><li>Resource Title: SnpEff_predictions_Gsoja_Accessions.gff.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.snpEff_Gsoja.gff_.gz<br>Resource Description: SnpEff output in GFF format using Soja_SNP_calls.vcf.gz as input.</li><li>Resource Title: USB-15x.vcf.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB-15x.vcf.gz<br>Resource Description: Genotype information in VCF format for 284 accessions sequenced at 15x coverage from USB-funded project.</li><li>Resource Title: USB481.vcf.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB481.vcf.gz<br>Resource Description: Genotype information in VCF format for all 481 accessions from USB-funded project. <a href="https://soybase.org/data/public/Glycine_max/Wm82.gnm2.div.G787/glyma.Wm82.gnm2.div.G787.USB481.vcf.gz" target="_blank">https://soybase.org/data/public/Glycine_max/Wm82.gnm2.div.G787/glyma.Wm82.gnm2.div.G787.USB481.vcf.gz</a></li><li>Resource Title: USB481_nosoja.vcf.gz<br>File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB481_nosoja.vcf.gz<br>Resource Description: Combined genotype information, in VCF format, for all USB lines excluding the Soja lines from USB-funded project. https://soybase.org/data/public/Glycine_max/Wm82.gnm2.div.G787/glyma.Wm82.gnm2.div.G787.USB481_nosoja.vcf.gz</li></ul>
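As a quick sanity check after downloading one of the VCF resources listed above, the sample count and number of variant records can be compared with the resource description. The following is a minimal sketch using only the Python standard library; the function name `summarize_vcf` is hypothetical, and it assumes the file follows the standard VCF layout (meta lines starting with `##`, one `#CHROM` header line whose columns after the ninth are sample names). It streams the whole file, so it may be slow on the full 481-accession call set.

```python
import gzip


def summarize_vcf(path):
    """Return (sample_names, n_records) for a gzipped VCF file.

    Hypothetical helper for illustration; assumes standard VCF layout.
    """
    samples = []
    n_records = 0
    # gzip.open also reads bgzip-compressed files, which are valid
    # multi-member gzip streams.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Columns 1-9 are fixed (CHROM through FORMAT);
                # the remaining columns are sample names.
                samples = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                n_records += 1  # one variant record per data line
    return samples, n_records
```

For example, calling `summarize_vcf` on a local copy of the USB-40x file would be expected, going by the description above, to return 46 sample names; a mismatch would suggest an incomplete download or a different file.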
Complete Metadata
| bureauCode | [ "005:18" ] |
|---|---|
| identifier | 10.15482/USDA.ADC/1518301 |
| programCode | [ "005:040" ] |