Barleymap

(Map markers to the barley genome - Morex Genome 2017 edition)




Overview


Barleymap was designed to search for the genetic and physical positions of barley markers on the Barley Physical Map (IBSC[1]) and the POPSEQ map[2]. The current version also supports the Morex Genome[3].

Barleymap provides three tools to retrieve data from the maps:
  • "Find markers": to retrieve the position of markers by providing their identifiers.
  • "Align sequences": to obtain the position of FASTA sequences by pairwise alignment.
  • "Locate by position": to examine specific loci by map position.



Find markers


The "Find markers" tool allows searching for loci which are commonly used by the barley community. These loci include genetic markers, genes, BAC contigs, WGS contigs, etc. from different datasets. Their map positions have been previously computed and stored, so that the users can retrieve them by providing the identifier of the locus.

Be aware that the "Find markers" datasets were generated using fixed parameters. When a more specific search is needed, e.g. with a particular alignment tool or parameters, it is recommended to obtain the FASTA sequence of the query and use the "Align sequences" tool instead.

As input data, the user must provide a list of identifiers to use as queries. The user must also choose the map (or maps) from which to obtain the positions, using the selection list "Choose maps".

The "Genes/Markers enrichment" area allows the user to customize which additional data will be output along with the map positions of the queries. First, the user can choose whether to show "genes", "markers" and/or "anchored". The last usually refers to WGS contigs, BAC contigs, or other elements associated with map positions (anchored), but which lack a biological meaning per se. The user can also choose whether to "show only main features" for each map. For example, for the Morex Genome, "HORVU" genes are configured as "main" whereas "MLOCs" are not. The "Add features" option offers 2 ways to add data to the results:
  • "on markers": the additional data is searched for each marker independently. Each additional row is appended after the query position.
  • "on intervals": the additional data is searched in the regions defined by all the queries. Each additional row is added only once, and in its actual position in relation to the queries.
Therefore, the "on markers" option is better for obtaining detailed data associated with each query, whereas the "on intervals" option is better for obtaining a map-like result. Note that, by default, both options show only additional data located at the same position as the queries. To obtain additional data around the markers, the "Extend genes/markers search" option must be activated and the interval adjusted, in cM or bp depending on the value of the "Sort by" option (see below).
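The difference between the two "Add features" modes can be illustrated with a minimal Python sketch. It is not Barleymap's implementation: the `Feature` record, the function names and the `extend` window are hypothetical, and "regions defined by all the queries" is assumed here to mean the union of the windows around each query.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    name: str
    chrom: str
    pos: float  # cM or bp, depending on the map

def near(feature, query, extend):
    """True if the feature lies within `extend` units of the query."""
    return feature.chrom == query.chrom and abs(feature.pos - query.pos) <= extend

def add_on_markers(queries, features, extend=0.0):
    """Each query row is followed by the features found around that query."""
    rows = []
    for q in queries:
        rows.append(("query", q))
        rows.extend(("feature", f) for f in features if near(f, q, extend))
    return rows

def add_on_intervals(queries, features, extend=0.0):
    """Each feature appears once, at its own position among the queries."""
    hits = [f for f in features if any(near(f, q, extend) for q in queries)]
    rows = [("query", q) for q in queries] + [("feature", f) for f in hits]
    # Sort by chromosome and position; queries come first on ties.
    return sorted(rows, key=lambda r: (r[1].chrom, r[1].pos, r[0] != "query"))
```

With two queries close to the same gene, `add_on_markers` lists that gene twice (once per query), whereas `add_on_intervals` lists it once, between the two queries.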

Other parameters include whether to show markers with multiple mappings, whether to sort the output by centimorgans (cM) or basepairs (bp), and an option to send the results to an email address provided by the user. Note that the "Sort by" option is applied only to IBSC2012, which has both cM and bp positions available. POPSEQ data are always sorted by cM, and Morex Genome data by bp.
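The "Sort by" rule above can be written down as a small lookup. This is only an illustrative sketch: the `MAP_UNITS` table and the function name are assumptions, and the map labels are used merely as dictionary keys.

```python
# Units available for each map, as described in the text above (illustrative table).
MAP_UNITS = {
    "IBSC2012": ("cM", "bp"),   # both units available
    "POPSEQ": ("cM",),          # genetic positions only
    "MorexGenome": ("bp",),     # physical positions only
}

def effective_sort_unit(map_name, requested):
    """Return the unit actually used for sorting results of `map_name`."""
    units = MAP_UNITS[map_name]
    # The user's choice is honoured only if the map offers that unit.
    return requested if requested in units else units[0]
```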

Datasets included in Barleymap web


The following is a list of datasets whose map positions have been pre-computed and stored in this instance of the Barleymap web application. Note that the standalone version or a custom web version of barleymap could be used to create other datasets.
  • BOPA1 dataset[4]: 1,536 sequences. "BOPA consensus" (e.g.: 11_20003) or "POPA12" identifiers must be provided (e.g.: ABC09016-2-2-348, 7174-365, BOPA1_7174-365, ...).
    A full list of markers, different identifiers and their sequences can be found at [4] (supplementary Table S9).
  • BOPA2 dataset[4]: 1,536 sequences. "BOPA consensus" identifiers must be provided (e.g.: 12_31342, i_12_31342, BOPA2_12_31342).
    A full list of markers, different identifiers and their sequences can be found at [4] (supplementary Table S10).
  • Illumina iSelect Infinium[5]: 7,864 sequences. Identifiers can be provided in different formats (e.g.: i_11_10882, 11_10882, 6964-414, BOPA1_6964-414, ...).
    A full list of markers, different identifiers and their sequences can be found at [5] (supplementary Table 6).
    (Illumina Infinium iSelect technology belongs to Illumina®)
  • Illumina 50K[6]: X,XXX sequences. "Illumina 50K" identifiers must be provided (e.g.: JHI-Hv50k-2016-7), but identifiers of markers from previous datasets are also accepted (e.g.: SCRI_RS_10006).
    A full list of markers, different identifiers and their sequences can be found at [6] (supplementary Table XX).
    (Illumina Infinium technology belongs to Illumina®)
  • DArTs[7]: 2,000 sequences (e.g.: bPb-3150 or bPb-3150_PUR_f+r, bPb-2614 or bPb-2614_WSU_r).
    Sequences for DArTs can be found at [7'].
  • DArTseq SNPs[8]: 8,535 sequences (e.g.: 3254894|F|0 or 3254894).
  • DArTseq PAVs (SilicoDArTs)[8]: 15,526 sequences (e.g.: 3271396|F|0 or 3271396).
    NOTE that 1,761 sequences from DArTseq are PAVs and also contain SNPs, so the identifier is the same for both markers.
    (DArTs™ and DArTseq™ technologies belong to Diversity Arrays Technology®)
  • Oregon Wolfe Barley GBS SNPs[9]: 34,396 sequences (e.g.: owbGBS1162 or owbGBS34926).
    A full list of markers and their sequences can be found at [9] (supplementary Dataset S1).
  • Haruna nijo cultivar flcDNAs[10]: 28,620 sequences (e.g.: AK358336 or AK358336.1).
  • HarvEST Unigenes (assembly #36)[11]: 70,148 sequences (e.g.: U36_70143 or U36_998).
  • IBSC2012 genes[2]: 14,923 HC and 19,415 LC genes (e.g.: MLOC_67805).
  • IBSC2012 BES[2]: IBSC_2012 and Morex Genome only. More than 400,000 BAC-End sequences (e.g.: HV_MBa0001A01.f.scf).
  • IBSC2012 BAC contigs[2]: IBSC_2012 only. 377,144 BAC contigs (e.g.: HVVMRX83KHA0104A24_HVVMRXALLhA0391C07_v16_c28).
  • IBSC2012 WGS contigs (Morex, Barke and Bowman)[2]: Barke and Bowman contigs mapped in IBSC_2012 and Morex Genome only. Morex contigs in POPSEQ map also (e.g.: morex_contig_15371, barke_contig_975766, bowman_contig_387623).
  • NCBI barley genes[11]: Morex Genome only. 894 sequences (e.g.: AAD02252.1, dhn11, AAF01699.1).
  • IBSC2016 genes[3]: Morex Genome only. 39,734 HC and 41,949 LC genes (e.g.: HORVU1Hr1G000090).

We shall be pleased to add any dataset you suggest to the web application, granted that its use is free and public.



Align sequences


The "Align sequences" tool finds the map position of FASTA-formatted sequences through alignment. This process is slower than "Find markers", but allows adjusting the alignment parameters as needed and searching for any DNA sequence.

Some of the features of "Align sequences" are:
  • Barleymap results are map positions, which may come from different sequence references; these references are searched in a pan-genome or multi-reference fashion.
  • It allows using different alignment algorithms, which makes it possible to search for sequences with and without introns.
  • Most of the details of this process are hidden from the user, who is interested only in the map and its map positions.
As such, most of the parameters are equivalent to those explained for the "Find markers" tool above. As in "Find markers", the user has to choose a map (or maps). Note that choosing a map actually selects all the sequence references associated with that map (in the internal Barleymap configuration) as references for performing the alignments.

In "Align sequences" the user can choose different options for the alignment algorithm, under the option "Choose an action".
  • cdna: the recommended option, especially when the queries come from sequences which could have introns, for example those from CDS or from markers produced from RNAseq data. All the alignments are performed using the GMAP aligner[12].
  • genomic: uses the most popular alignment tool, BLASTN[13], to perform all the alignments.
  • auto: every query is searched with GMAP. For those queries without hits, the search is repeated with BLASTN.
Note that the user can also choose the parameters which define the minimum thresholds for alignment results to be reported. The minimum alignment identity can be set with the "min. id." parameter, whereas the minimum query coverage of the alignment can be set with the "min. query cov." parameter. Any alignment whose identity or query coverage falls below the corresponding threshold is discarded by barleymap and thus not reported in the output tables.
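The "auto" action and the two thresholds can be sketched in Python as follows. This is a hedged illustration, not Barleymap's code: the `Hit` record, the `gmap` and `blastn` callables (stand-ins for running the real aligners) and the default threshold values are all assumptions, as is the choice to apply the thresholds before deciding which queries are "without hits".

```python
from typing import Callable, List, NamedTuple

class Hit(NamedTuple):
    query: str
    target: str
    identity: float    # percent identity of the alignment
    query_cov: float   # percent of the query covered by the alignment

# A hypothetical aligner: takes query identifiers, returns alignment hits.
Aligner = Callable[[List[str]], List[Hit]]

def passes(hit: Hit, min_id: float, min_cov: float) -> bool:
    """Keep a hit only if it reaches both minimum thresholds."""
    return hit.identity >= min_id and hit.query_cov >= min_cov

def align_auto(queries: List[str], gmap: Aligner, blastn: Aligner,
               min_id: float = 98.0, min_cov: float = 95.0) -> List[Hit]:
    """Try GMAP first; repeat with BLASTN only for queries left without hits."""
    first = [h for h in gmap(queries) if passes(h, min_id, min_cov)]
    found = {h.query for h in first}
    missing = [q for q in queries if q not in found]
    second = [h for h in blastn(missing) if passes(h, min_id, min_cov)] if missing else []
    return first + second
```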

Besides that, Barleymap can use 3 different algorithms when searching maps that have more than one database associated with them. The details of how these algorithms work can be found here. Only a brief description of the maps and databases included in this Barleymap web application, and the algorithms used on them, is provided below.

References included in Barleymap web

  • Morex Genome[3]

    The Morex Genome is an actual genome assembly. Most of the datasets precomputed in Barleymap web are available for this reference (the one exception being the IBSC2012 BAC contigs). The main datasets associated with this physical map are the IBSC2016 HC and LC genes (the "HORVUs"), the Illumina 50K markers ("JHIs", "SCRIs", etc.), the Morex WGS contigs and the NCBI genes.

  • POPSEQ map[2]

    The POPSEQ map is a genetic map with Morex WGS contigs anchored to it. The main datasets associated with this map are the IBSC2012 HC and LC genes (the "MLOCs"), the Illumina 50K markers ("JHIs", "SCRIs", etc.), and the Morex WGS contigs.

  • IBSC2012 genetic/physical map[1]

    The IBSC2012 genetic and physical map has sequences of different nature anchored to it:
    • Three WGS assemblies from different cultivars: Morex, Barke and Bowman.
    • Morex cultivar sequenced BAC contigs.
    • Morex cultivar BAC End sequences.

    When a search is performed against the IBSC2012 map, an "exhaustive" algorithm (see Figure below) is used. First, the queries are aligned against the first reference, using GMAP, BLASTN or both depending on the aligner chosen (see the discussion above about the parameters of "Align sequences"). For every query with a hit in the reference, a map position is retrieved. Those queries without a map position are searched in the second reference. This is repeated until all the queries have a map position or all the references have been used once. The order in which databases are used as alignment targets is the same as in the list above.
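The "exhaustive" search described for the IBSC2012 map can be sketched as a loop over an ordered list of references. The function below is only an illustration under stated assumptions: `align` stands in for GMAP and/or BLASTN, `positions` is a hypothetical lookup from aligned targets to map positions, and the reference names in the usage example are made up.

```python
def exhaustive_search(queries, references, align, positions):
    """Search queries against each reference in order until they get a map position.

    references: ordered list of reference names.
    align(queries, reference) -> {query: target_id} for queries with a hit.
    positions: {reference: {target_id: (chromosome, position)}}.
    Returns (mapped, pending): mapped queries and queries left without position.
    """
    mapped = {}
    pending = list(queries)
    for ref in references:
        if not pending:
            break  # every query already has a map position
        hits = align(pending, ref)
        for q in pending:
            target = hits.get(q)
            pos = positions.get(ref, {}).get(target)
            if pos is not None:
                mapped[q] = (ref, target, pos)
        # Only queries still lacking a map position go to the next reference.
        pending = [q for q in pending if q not in mapped]
    return mapped, pending
```

For example, with references `["morex", "barke"]`, a query whose Morex hit has no map position is searched again in the second reference, while a query already positioned through Morex is not.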





Locate by position


The "Locate by position" tool allows examining the regions of specific map positions, mainly with the purpose of checking which genes, markers or other loci are present in those regions.

The input data are "tuples", with chromosome (or contig) and position (local position within the chromosome or contig) in basepairs or centimorgans (e.g. chr1H 100200).
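As an illustration of the input format, the following Python sketch parses such tuples from pasted text. The function name is hypothetical, and whitespace-separated fields with one tuple per line are assumptions based on the example above; positions are read as floats so that both bp and cM values are accepted.

```python
def parse_positions(text):
    """Parse lines like 'chr1H 100200' into (chromosome, position) tuples."""
    tuples = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip empty lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 'chromosome position', got {line!r}"
            )
        chrom, pos = fields
        tuples.append((chrom, float(pos)))
    return tuples
```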

All the other parameters are identical to those in "Find markers".



Barleymap output


At the top of the results page, Barleymap outputs a list of the maps selected by the user. The links in that list can be used to navigate to the results of a specific map.

For every map which the user selected, Barleymap shows up to five tables of results:

Map


The first result shown by Barleymap is a graphical representation of the seven barley chromosomes. Queries with map position are shown on top of those chromosomes. Using the magnifying glass button, the user can toggle between complete chromosomes or just the mapped region.

Below the graphical representation is the "Map" table, with the following fields:
  • Marker: identifier of the query sequence, either the user-supplied value in "Find markers", the FASTA header of the sequence in "Align sequences", or an arbitrary code "chromosome_position" created in "Locate by position".
  • chr: chromosome (or contig or equivalent).
  • cM: centimorgans position. Only for anchored maps with cM positions (IBSC2012 and POPSEQ).
  • bp: basepairs position. Only for anchored maps with bp positions (IBSC2012).
  • start: basepairs starting position. Only for physical maps (MorexGenome).
  • end: basepairs ending position. Only for physical maps (MorexGenome).
  • strand: whether the query aligns to the target strand (+) or to the complementary strand (-). Only for physical maps (MorexGenome).
  • multiple positions: whether the current query sequence has more than one different mapping position in the current map.
    This field is shown only if the "Markers with multiple mappings" option has been selected.
  • other alignments: whether the current query sequence has other alignment targets which lack a map position.
    At least one unmapped alignment should be found for such a query.

Map with markers


The Map with markers table shows the mapping results along with the genetic markers that are located in the same positions (or regions if the search is extended). The table has the same fields as the Map table.

Map with genes


The Map with genes table shows the mapping results along with the genes that are located in the same positions (or regions if the search is extended). The table has all the fields of the previous tables, plus some additional fields, related to functional annotation of genes:
  • Gene class: High Confidence or Low Confidence classification.
  • Description: human-readable description of the gene.
  • InterPro: IPR identifiers for the gene.
  • GeneOntologies: GO identifiers for the gene.
  • PFAM: Protein Families identifiers for the gene.

Map with anchored elements


The Map with anchored elements table shows the mapping results along with the elements that are located in the same positions (or regions if the search is extended). In this case, they are not genes or markers; anchored elements have map position but often lack biological meaning (e.g. WGS contigs, BAC contigs, etc.). The table has the same fields as the Map and the Map with markers tables.

Unmapped and unaligned markers


In addition to the mapping results, two more tables are shown for each map.
  • Unmapped: shows those queries which have an alignment hit (field "Target ID") that lacks a map position.
    Note that queries in this table could still have a map position, through a different alignment.
  • Unaligned: shows those queries which lack an alignment hit (and thus a map position).
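How a query ends up in the mapped, unmapped or unaligned results can be sketched as below. This is an interpretation of the descriptions above rather than Barleymap's code; the function name and the two input dictionaries are hypothetical.

```python
def classify(queries, alignments, positions):
    """Split queries into mapped, unmapped and unaligned results.

    alignments: {query: [target_id, ...]} for queries with alignment hits.
    positions: {target_id: (chromosome, position)} for anchored targets.
    """
    mapped, unmapped, unaligned = {}, {}, []
    for q in queries:
        targets = alignments.get(q, [])
        if not targets:
            unaligned.append(q)  # no alignment hit, hence no map position
            continue
        with_pos = [t for t in targets if t in positions]
        without_pos = [t for t in targets if t not in positions]
        if with_pos:
            mapped[q] = [positions[t] for t in with_pos]
        if without_pos:
            # A query can be listed here and still be mapped via another hit.
            unmapped[q] = without_pos
    return mapped, unmapped, unaligned
```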



Confidentiality


We cannot guarantee the security of the data used with the web tool. Currently, Barleymap is NOT using any kind of encryption for sending and receiving data (such as https).

Should this level of confidentiality not be acceptable to some users, we recommend installing the standalone barleymap version, or setting up their own instance of the barleymap web version.



Disclaimer


This service is available AS IS and at your own risk. EEAD/CSIC do not give any representation or warranty nor assume any liability or responsibility for the service or the results posted (whether as to their accuracy, completeness, quality or otherwise). Access to the service is available free of charge for ordinary use in the course of academic research.




References


[1] IBSC 2012
[2] Mascher et al. 2013
[3] Mascher et al. 2017
[4] Close et al. 2009
[5] Comadran et al. 2012
[6] Bayer et al. 2017
[7] Wenzl et al. 2004
[7'] www.diversityarrays.com
[8] Kilian et al. 2012
[9] Poland et al. 2012
[10] Matsumoto et al. 2011
[11] HarvEST
[12] Wu and Watanabe 2005
[13] Altschul et al. 1990

