Validate UNIFORMAT syntax
Checks UNIFORMAT files for errors and performs expansion of valid abbreviations (@, allele&allele), transliteration (or recoding) of allele names, and validation of those names against the current version of the nomenclature or a custom definition (see details below).
Recoding allele names
This version of uniformate also performs data transliteration (recoding) when an optional transliteration file is provided.
Substitution files are one-column lists of old_allele - new_allele rules, where new_allele can be an abridged UNIFORMAT allele expression:
allele or allele&allele or ...
Substitutions related to different loci are grouped under the LOCUS keyword, and the keyword for each locus is mandatory even if a given locus is not being changed (see example below where LOCUS 2 does not contain substitution rules):
    # Example substitution file
    LOCUS 1
    A*01:01:01 - A*01:01
    A*02 - A*02:01&A*02:02&A*02:03
    # note no substitution data below
    LOCUS 2
    LOCUS 3
    C*01:02 - C*01
    C*01:03 - C*01
    null - C*01:02&null
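To make the file layout concrete, here is a minimal Python sketch of how such a substitution file could be read. It is not the uniformate implementation: the function names `parse_substitutions` and `recode` are hypothetical, and the sketch assumes one rule per line, `#` comment lines, a hyphen (or en dash) surrounded by spaces as the separator, and that alleles without a rule are left unchanged.

```python
import re

def parse_substitutions(text):
    """Return {locus_number: {old_allele: [new_allele, ...]}} (hypothetical helper)."""
    rules = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        header = re.fullmatch(r"LOCUS\s+(\d+)", line)
        if header:
            current = int(header.group(1))
            rules.setdefault(current, {})  # a locus may legitimately have no rules
            continue
        if current is None:
            raise ValueError(f"line {lineno}: rule found before any LOCUS keyword")
        # Assumed separator: hyphen or en dash with whitespace on both sides
        parts = re.split(r"\s+[-\u2013]\s+", line, maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'old_allele - new_allele'")
        old, new = parts
        rules[current][old] = new.split("&")  # allele&allele -> list of alleles
    return rules

def recode(allele, locus, rules):
    """Apply a rule if one exists; otherwise return the allele unchanged (assumption)."""
    new = rules.get(locus, {}).get(allele)
    return "&".join(new) if new else allele

SAMPLE = """\
# Example substitution file
LOCUS 1
A*01:01:01 - A*01:01
A*02 - A*02:01&A*02:02&A*02:03
# note no substitution data below
LOCUS 2
LOCUS 3
C*01:02 - C*01
C*01:03 - C*01
null - C*01:02&null
"""

rules = parse_substitutions(SAMPLE)
print(recode("A*02", 1, rules))
print(recode("B*07", 2, rules))
```

In this sketch LOCUS 2 is still registered, with an empty rule table, which mirrors the requirement that every locus keyword be present even when nothing is substituted.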
Validation of allele names
This version of uniformate also validates allele names, either against the current version of the IMGT/HLA nomenclature or against a user-supplied nomenclature file.
Nomenclature files can be older versions of the official nomenclature, similar JSON files of two-value entries (only the second element of each entry is taken as a valid name) or, more simply, one-column lists of allele names (all loci combined) as shown below.
    # Example simple nomenclature file
    A*01:01
    A*01:02
    A*02
    ## more lines [suppressed in this example]
    ## allele can be defined at very different levels and even be redundant
    ## order is not important
    DRB1*123:456:789
    ## valid allele names do not need to be HLA names
    DiseaseMarker
    NonDisease
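The one-column format lends itself to a simple set lookup. The Python sketch below is an illustration only, not the uniformate code: `load_nomenclature` and `invalid_alleles` are hypothetical names, lines starting with `#` are assumed to be comments, and a name is assumed to be valid only when it matches a listed name exactly (the actual tool may treat resolution levels differently).

```python
def load_nomenclature(text):
    """Collect valid names from a one-column list, ignoring comments and blank lines."""
    names = set()  # a set makes order irrelevant and tolerates redundant entries
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names

def invalid_alleles(expression, valid_names):
    """Return the names in an allele&allele expression that are not listed (exact match)."""
    return [allele for allele in expression.split("&") if allele not in valid_names]

NOMENCLATURE = """\
# Example simple nomenclature file
A*01:01
A*01:02
A*02
DRB1*123:456:789
## valid allele names do not need to be HLA names
DiseaseMarker
NonDisease
"""

names = load_nomenclature(NOMENCLATURE)
print(invalid_alleles("A*01:01&A*03", names))
print(invalid_alleles("DiseaseMarker", names))
```

An empty list means every name in the expression was found in the nomenclature file.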