Family phaser
Establishes haplotype phase from family data. The input is a UNIFORMAT file with modified identifiers. There is no limit on the number of families included in each file.
Valid UNIFORMAT files are expected (e.g. loci are tab separated). Putative homozygous typings are indicated by a single allele, real homozygous typings by allele,allele, and unknown typings should be left blank.
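The following Python sketch shows how these typing conventions could be read. It interprets a single locus field and is a minimal illustration, not the tool's own parser. The function name classify_locus, the "heterozygous" category and the allele names in the usage lines are hypothetical. The sketch also assumes that a field holds at most two comma-separated alleles.

```python
from typing import Optional, Tuple


def classify_locus(field: str) -> Tuple[str, Optional[Tuple[str, ...]]]:
    """Interpret one tab-separated locus field (hypothetical helper).

    Assumed conventions, taken from the description above:
      blank            -> unknown typing
      allele           -> putative homozygous (single allele given)
      allele,allele    -> real homozygous (same allele twice)
    Two different alleles are treated here as heterozygous.
    """
    field = field.strip()
    if not field:
        return "unknown", None
    alleles = tuple(a.strip() for a in field.split(","))
    # Reject empty alleles such as a trailing comma ("A*01:01,").
    if any(not a for a in alleles):
        raise ValueError(f"empty allele in locus field: {field!r}")
    if len(alleles) == 1:
        return "putative_homozygous", alleles
    if len(alleles) == 2:
        if alleles[0] == alleles[1]:
            return "homozygous", alleles
        return "heterozygous", alleles
    raise ValueError(f"more than two alleles in locus field: {field!r}")
```

For example, classify_locus("A*01:01") reports a putative homozygous typing. classify_locus("A*01:01,A*01:01") reports a real homozygous one, and an empty field is reported as unknown.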
The relevant modification of the file concerns the identifiers, which must include some additional information about the family.
Short identifier description
These information tokens are separated by a colon (:). They give, in order, the family name, a number identifier (position) in the family, the position (text description) in the family, the number identifiers of the parents (or zero if unknown) and the regular typing identifier, as in Fxxx1:1:child:2:3:regular_id. Both parents must be included in the file even if their typing is not complete. Alleles of haplotypes that cannot be confirmed by the family data are considered undetermined and are returned as undet. They may refer to a homozygous allele, but not necessarily.
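The next Python sketch shows how an identifier could be split into its six tokens and how the requirement that both parents appear in the file could be checked. It is a minimal illustration under stated assumptions, not the tool's implementation. The names Member, parse_identifier and missing_parents are hypothetical. The sketch assumes that the position and parent tokens are integers and that no token contains a colon.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Member:
    family: str
    number: int      # position in the family (assumed to be an integer)
    role: str        # free-text description, ignored by the calculations
    parent_a: int    # number of one parent, 0 if unknown
    parent_b: int    # number of the other parent, 0 if unknown
    typing_id: str   # regular typing identifier


def parse_identifier(identifier: str) -> Member:
    # Assumption: ':' never occurs inside a token, so a plain split is safe.
    tokens = identifier.split(":")
    if len(tokens) != 6:
        raise ValueError(f"expected 6 tokens, got {len(tokens)}: {identifier!r}")
    family, number, role, pa, pb, typing_id = tokens
    return Member(family, int(number), role, int(pa), int(pb), typing_id)


def missing_parents(identifiers: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (family, parent number) pairs referenced but absent from the input."""
    members = [parse_identifier(i) for i in identifiers]
    present = {(m.family, m.number) for m in members}
    missing = {
        (m.family, parent)
        for m in members
        for parent in (m.parent_a, m.parent_b)
        if parent != 0 and (m.family, parent) not in present
    }
    return sorted(missing)
```

Suppose the father of the example family below (Fxxx1:3:father:0:0:typid73) were left out of the file. In that case missing_parents would report ("Fxxx1", 3) for the child's identifier.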
Detailed identifier description
The identifiers are composed of six tokens (elements), separated by a colon (:). Each family is assumed to have a number or a name, and any regular characters can be used; for this example, let's say it is Fxxx1. Each element (person) in the family must have a unique number, although the numbers do not need to be in any specific order. Let's assume that our Fxxx1 family has three individuals, numbered 1 (the child), 2 (the mother) and 3 (the father). We further assume that all three have a regular typing identifier; if that is not the case, just provide any identifier.
Each individual's identifier is then composed by joining the tokens with a colon (:). For the family in this example, the identifiers are:
Fxxx1:1:child:2:3:typid32
Fxxx1:2:mother:0:0:typid72
Fxxx1:3:father:0:0:typid73
As can be seen, the third token is a description that is ignored during the calculations. It can be used to describe the position of the individual in the family, and there are no specific restrictions on what you can use.
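The composition step can be sketched in Python as follows. build_identifiers is a hypothetical helper, not part of the tool. The sketch assumes that member numbers are integers and that no token contains a colon. It rejects a family in which a number is used twice, since each person must have a unique number.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical record: (number, role, parent_a, parent_b, typing_id); 0 = unknown parent.
Person = Tuple[int, str, int, int, str]


def build_identifiers(family: str, people: List[Person]) -> List[str]:
    # Each person in a family must have a unique number.
    counts = Counter(p[0] for p in people)
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    if duplicated:
        raise ValueError(f"family {family}: numbers used more than once: {duplicated}")
    identifiers = []
    for number, role, parent_a, parent_b, typing_id in people:
        tokens = [family, str(number), role, str(parent_a), str(parent_b), typing_id]
        # ':' is the token separator, so it must not appear inside a token.
        if any(":" in t for t in tokens):
            raise ValueError(f"token contains ':' in {tokens!r}")
        identifiers.append(":".join(tokens))
    return identifiers


ids = build_identifiers("Fxxx1", [
    (1, "child", 2, 3, "typid32"),
    (2, "mother", 0, 0, "typid72"),
    (3, "father", 0, 0, "typid73"),
])
print(" ".join(ids))
# Fxxx1:1:child:2:3:typid32 Fxxx1:2:mother:0:0:typid72 Fxxx1:3:father:0:0:typid73
```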
The filenames are expected to end in .unif (e.g. Demo-EFI_SE-EUR_Czech-families_A~B~C~DRB1.unif).