
Map data is a critical component in many aspects of genetics and
biological research.  Well-defined toolkits for manipulating map data
do not exist at this point, so we propose to build a system for
manipulating most types of map data (Genetic, RH, RFLP, Sequence, and
LD).

Map Proposal

This document proposes an object hierarchy for maps, markers, and
their manipulation.

Key Points
* A Map is an object which contains mappable elements.
* A Map can be defined for a given organism or population of individuals.
* A Mappable element is an element with a position within a map.

Background information
 Maps are made up of elements which are mappable.  This includes
 genetic and physical markers.  

 A genetic map consists of markers which have a given recombination
 distance between them.  This distance is usually given in
 centimorgans (cM), where 1 cM corresponds to 1% recombination
 between two markers.  Other distances include ... Examples of
 genetic maps are the publicly available Marshfield and Genethon maps.
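
 As a minimal sketch of how inter-marker distances relate to map
 positions, the snippet below turns a list of distances between
 consecutive markers into cumulative positions.  The function name,
 marker names, and distances are hypothetical and not taken from any
 published map.

```python
from itertools import accumulate


def cumulative_positions(markers, gaps_cm):
    """Place the first marker at 0 cM and each later marker at the
    running sum of the gaps between consecutive markers."""
    if len(gaps_cm) != len(markers) - 1:
        raise ValueError("need exactly one gap per adjacent marker pair")
    return dict(zip(markers, accumulate([0.0] + list(gaps_cm))))


# Hypothetical markers and inter-marker distances in cM.
positions = cumulative_positions(["M1", "M2", "M3", "M4"], [2.5, 1.0, 4.0])
for marker, position in positions.items():
    print(marker, position)
```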

 Radiation hybrid maps consist of markers which have been mapped to
 radiation hybrid panels.  Typically these markers are STSes which
 have been processed on RH panels.  The distance between markers is
 calculated in centiRays (cR) which represent ...  Examples of these
 include the Whitehead STS map and GeneMap '99.

 Restriction Enzyme (RE) maps are used to describe RE cut points in a
 given sequence and can be used to "fingerprint" sections of DNA
 (typically BAC clones).  Clones which share a statistically
 significant (based on the known frequency of RE cutting) collection
 of fingerprints are likely to overlap.  Additionally ...
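
 To illustrate the fingerprint comparison, here is a rough sketch that
 counts the fragments two clones share, treating each fingerprint as a
 list of fragment sizes.  The function name, the size tolerance, and
 the example sizes are assumptions for illustration.  Deciding whether
 the shared count is statistically significant is not shown; that
 would depend on the known frequency of RE cutting.

```python
def shared_fragments(fingerprint_a, fingerprint_b, tolerance=0):
    """Count fragments matched one-to-one between two fingerprints.

    Two fragments match when their sizes differ by at most `tolerance`.
    Both lists are walked in sorted order, so each fragment is used at
    most once.
    """
    a = sorted(fingerprint_a)
    b = sorted(fingerprint_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical fragment sizes (in bp) for two clones.
clone_1 = [1200, 3400, 5600, 800]
clone_2 = [800, 1210, 9000]
print(shared_fragments(clone_1, clone_2, tolerance=15))
```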

 Physical maps or BAC/PAC/YAC maps represent clone fragment overlap.
 These maps are used to represent how clones overlap and form a
 consensus sequence of a genomic or cDNA region.
 
 Sequence maps represent the known consensus sequence for a given
 region of typically genomic DNA.

 LD and Haplotype maps ...
 
 Comparisons between maps from different organisms can yield useful
 observations about trends in evolution.  Additionally, comparisons of
 maps for the same species can provide insight into information such
 as recombination hot spots and DNA stability.

Object proposal
 Maps are objects which are made up of mappable elements.  A mappable
 element has a position on a map and can be tested for equality and
 relative position to other mappable element positions. 
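
 A minimal sketch of these comparison semantics follows.  The class
 name and the choice to pass the map name explicitly to each
 comparison are assumptions made for this example; the interface
 listing in this proposal does not settle how the map is selected.

```python
class Marker:
    """A mappable element that may have a position on several maps."""

    def __init__(self, name, positions=None):
        self.name = name
        # Hypothetical storage: map name -> position in that map's units.
        self.positions = dict(positions or {})

    def position(self, map_name):
        if map_name not in self.positions:
            raise KeyError(f"{self.name} has no position on map {map_name!r}")
        return self.positions[map_name]

    def equals(self, other, map_name):
        return self.position(map_name) == other.position(map_name)

    def less_than(self, other, map_name):
        return self.position(map_name) < other.position(map_name)

    def greater_than(self, other, map_name):
        return self.position(map_name) > other.position(map_name)


# Hypothetical markers placed on a map called "genetic".
a = Marker("A", {"genetic": 10.0})
b = Marker("B", {"genetic": 12.5})
print(a.less_than(b, "genetic"), a.equals(b, "genetic"))
```

 Raising an error for a marker with no position on the requested map
 is one possible choice; relative maps, where a position may be
 undefined, would need different handling.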
 
 These are some baseline interface and object definitions.  Other work
 has been done by Philip Lijnzaad, Emmanuel Barillot and OMG folks to
 create definitions for maps.

 Interfaces
  Bio::IdentifiableI 
    string    getID // unique identifier -- this goes with Juha's
                    // identifiable property?
   
  Bio::NameableI
    string    getName
    
  Bio::AliasableI isa Bio::NameableI
    string    getAliases


  Bio::Map::MapI isa Bio::NameableI isa Bio::IdentifiableI
    MapIterator        getAllElements // for in-order iterator access
    ?Bio::ChromosomeI? chromosome     // Should maps be built one per
                                      // chromosome and aggregated for
                                      // a whole report set?
    Bio::SpeciesI      species        // use existing BP species object
                                      // which may need to be more robust
    numeric            length         // not sure what to return for
                                      // relative or RFLP maps
    string             units          // Map units
    string             name           // Map name


  Bio::Map::MappableI 
   // Where to handle the fact that RFLP
   // markers have multiple map positions?
    PositionI position(MapI) 
    boolean   equals(MappableI)
    boolean   less_than(MappableI)
    boolean   greater_than(MappableI)

  Bio::Map::PositionI
    // may be undef to handle relative maps [RE].
    // This is where a known position for a marker can be retrieved.
    // Multiple positions are possible for RE on a sequence map.
    Array<string>  positionValues
  
  Bio::MarkerI isa Bio::MappableI isa Bio::AliasableI

  // heikki to help fill in Variant and Allele information
  Bio::LiveSeq::AlleleI

  Bio::LiveSeq::VariantI isa Bio::MarkerI
    Bio::PrimarySeqI getFwdPrimer()
    Bio::PrimarySeqI getRevPrimer()
    // I assume there should always be a primary set
    // of markers which define start/end points;
    // should this be hidden inside more methods to
    // handle RFLP, etc?
    Bio::LiveSeq::AlleleI getAlleles()
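
 To make the interface listing more concrete, the following is a rough
 sketch of the map side of it: abstract base classes standing in for
 the NameableI, IdentifiableI, and MapI interfaces, plus one
 hypothetical implementation.  The Python method names, the SimpleMap
 class, and the definition of length as the span between the first and
 last element are all assumptions for illustration, not part of the
 proposal.

```python
import bisect
from abc import ABC, abstractmethod


class NameableI(ABC):
    @abstractmethod
    def get_name(self): ...


class IdentifiableI(ABC):
    @abstractmethod
    def get_id(self): ...


class MapI(NameableI, IdentifiableI):
    @abstractmethod
    def get_all_elements(self):
        """Iterate over (element name, position) pairs in map order."""

    @abstractmethod
    def length(self): ...


class SimpleMap(MapI):
    """Hypothetical implementation that keeps elements sorted by position."""

    def __init__(self, map_id, name, units):
        self._id, self._name, self.units = map_id, name, units
        self._elements = []  # sorted list of (position, element name)

    def get_id(self):
        return self._id

    def get_name(self):
        return self._name

    def add_element(self, element_name, position):
        bisect.insort(self._elements, (position, element_name))

    def get_all_elements(self):
        return ((name, pos) for pos, name in self._elements)

    def length(self):
        # Assumption: length is the span from first to last element.
        if not self._elements:
            return 0.0
        return self._elements[-1][0] - self._elements[0][0]


# Hypothetical map with three elements added out of order.
example = SimpleMap("map1", "Example", "cM")
example.add_element("D2", 5.0)
example.add_element("D1", 1.0)
example.add_element("D3", 9.5)
for name, pos in example.get_all_elements():
    print(name, pos, example.units)
```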
     
 Implementations
   Bio::Marker::RestrictionEnzyme isa Bio::MarkerI
   Bio::Marker::STS isa Bio::MarkerI
   Bio::Marker::Microsat isa Bio::LiveSeq::VariantI
   Bio::Marker::CytogeneticBand isa Bio::MarkerI
   Bio::Marker::VLTR isa Bio::MarkerI
   Bio::Marker::SNP
   Bio::Bin 
   
   Bio::Map::Cytogenetic isa Bio::Map::MapI 
     
   Bio::Map::RadiationHybrid
   Bio::Map::Genetic
   Bio::Map::GeneticMap 
     string	       getSex         // code as a string? - only 
   Bio::Map::RFLP
   Bio::Map::Sequence // Should probably be Bio::Assembly, or these two
                      // need to work together; a Sequence Map could
                      // be built with Bio::Assemblies
   Bio::Map::Haplotype // what would this entail -- SNP components?
   

Caveats, questions, etc
-----------------------
Namespace is very flexible here.  

An important and useful result of this toolkit will be the ability to
programmatically go from one map to another.  Querying maps for a
marker, perhaps based on that marker's unique id, will allow one to
compare distances on different maps or go from genetic to sequence
maps very easily.
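
As a sketch of what such a cross-map query could look like, the
function below looks up two markers by unique id on every map that
contains both and reports the distance in that map's own units.  The
dictionary layout, function name, marker ids, and positions are all
hypothetical.

```python
def compare_distance(marker_a, marker_b, maps):
    """Return {map name: (distance, units)} for each map holding both markers."""
    result = {}
    for map_name, map_data in maps.items():
        positions = map_data["positions"]
        if marker_a in positions and marker_b in positions:
            distance = abs(positions[marker_b] - positions[marker_a])
            result[map_name] = (distance, map_data["units"])
    return result


# Hypothetical maps keyed by name; positions are keyed by marker unique id.
maps = {
    "genetic": {"units": "cM", "positions": {"m1": 10.0, "m2": 12.5}},
    "sequence": {"units": "bp", "positions": {"m1": 1000000, "m2": 3500000}},
    "rh": {"units": "cR", "positions": {"m1": 40.0}},
}
for map_name, (distance, units) in compare_distance("m1", "m2", maps).items():
    print(map_name, distance, units)
```

Maps that lack either marker are skipped, so the caller only sees maps
on which the comparison is meaningful.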

It is not clear whether we need a Bio::ChromosomeI or can just code
the chromosome as a string/numeric.  Does polyploidy cause any problems
in maps or just in population/allele issues?