RBCeq Manual

1. About RBCeq:

RBCeq is an integrated bioinformatics webserver to characterize blood group profiles from genomics data. The software uses manually curated, publicly available database(s) listing antigen variants and reports known, novel and rare blood group variants. RBCeq is fully automated and allows seamless analysis and visualization through a user-friendly interface.

BAM files are processed in accordance with GATK best practices. Processing includes BAM pre-processing to remove duplicates, followed by BaseRecalibrator, ApplyBQSR, HaplotypeCaller, and variant filtration. Variant calling and haplotype phasing are performed only for a restricted set of 45 genes that define blood group systems, which keeps the analysis fast and memory efficient.

2. How to run RBCeq: RBCeq accepts two types of input

2.1 Input BAM file (contig names must start with "chr", e.g. chr1, chr2, and read groups must be present in the BAM file)

If you have a whole-genome or exome sequencing BAM file, we recommend using Bam Trimmer (https://github.com/MayurDivate/BamTrimmer/archive/v2.zip). Bam Trimmer trims the BAM file to only the 45 blood group antigen defining genes and removes duplicate reads. Trimming is not mandatory, but it reduces upload and analysis time; alternatively, users can trim the BAM file with their own tools and upload the result to RBCeq. To download and start Bam Trimmer, follow the protocol in Figure1. To generate a blood group gene-specific BAM file and submit a job to RBCeq with it, follow the protocol in Figure2. Note that Bam Trimmer does not accept BAM files processed with the “IndelRealigner” tool (IndelRealigner was removed in GATK4). Such files can be uploaded directly to RBCeq for blood group profiling.
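The two input requirements named in the heading of this section (contig names starting with "chr" and read groups present) can be checked before uploading. The following is a minimal sketch, not part of RBCeq or Bam Trimmer: the function name `check_sam_header` is hypothetical, and it assumes you have already obtained the BAM header as text (for example with `samtools view -H your.bam`).

```python
def check_sam_header(header_text):
    """Return a list of problems found in a SAM/BAM text header."""
    problems = []
    contigs = []
    has_read_group = False
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@SQ":
            # The SN tag carries the contig (reference sequence) name.
            for field in fields[1:]:
                if field.startswith("SN:"):
                    contigs.append(field[3:])
        elif fields[0] == "@RG":
            has_read_group = True
    bad = [name for name in contigs if not name.startswith("chr")]
    if bad:
        problems.append("contigs without 'chr' prefix: " + ", ".join(bad))
    if not has_read_group:
        problems.append("no @RG (read group) line found")
    return problems


# Hypothetical header: one contig lacks the prefix and no @RG line exists.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:249250621",
    "@SQ\tSN:2\tLN:243199373",
])
for problem in check_sam_header(example_header):
    print(problem)
```

An empty result means both requirements appear to be met; any reported problem should be fixed before the file is trimmed or uploaded.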

Bam Trimmer requirement: Java must be installed (https://www3.ntu.edu.sg/home/ehchua/programming/howto/JDK_Howto.html)

Figure1: Step-by-step protocol to download and use Bam Trimmer. Bam Trimmer is downloaded as a zip archive, which must be extracted before use. Here we used the 7zip tool, but any tool that extracts zip archives will work. Note that Java is required to run Bam Trimmer.

Figure2: Step-by-step protocol for submitting a job using the button “I have BAM file”. The first step is to trim the BAM file to the blood group-specific genes; the trimmed BAM file is then submitted to RBCeq. Input parameters: variants that are present in the gnomAD database with a frequency below the specified MAF (minor allele frequency) are defined as rare variants. Variants below the specified Allele Depth and Genotype Quality thresholds are not used to call known blood group alleles.
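How the three input parameters act on a single variant can be illustrated with a short sketch. This is not RBCeq's implementation: the function names and the example thresholds are hypothetical, and the sketch assumes that a variant must meet both the Allele Depth and the Genotype Quality threshold to be used.

```python
def is_rare(gnomad_af, maf_threshold):
    """Rare = present in gnomAD (frequency known) and below the MAF threshold."""
    return gnomad_af is not None and gnomad_af < maf_threshold


def usable_for_allele_calling(allele_depth, genotype_quality,
                              min_depth, min_quality):
    """Assumption: both thresholds must be met for the variant to be used."""
    return allele_depth >= min_depth and genotype_quality >= min_quality


# Hypothetical thresholds and variant values, for illustration only.
print(is_rare(0.002, maf_threshold=0.05))   # present in gnomAD, below threshold
print(is_rare(None, maf_threshold=0.05))    # absent from gnomAD
print(usable_for_allele_calling(12, 45, min_depth=10, min_quality=30))
print(usable_for_allele_calling(4, 45, min_depth=10, min_quality=30))
```

Under these assumptions, a variant that is absent from gnomAD is never labelled rare, regardless of the chosen MAF.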

2.2 Input VCF file and blood group gene coverage BED file

RBCeq also accepts a VCF file together with a coverage BED file. The VCF file must be gzip-compressed (.gz) and the coverage file must be in BED format. RBCeq only accepts coverage for the 45 blood group genes as generated by the BamStats04 tool (http://lindenb.github.io/jvarkit/BamStats04.html). The coverage file can also be generated using Bam Trimmer (Figure4). To submit a job to RBCeq using a VCF file and a blood group coverage BED file, follow the step-by-step protocols in Figure3 and Figure4.

2.2.1 How to gzip a VCF file

Windows users: use the 7zip tool (https://www.7-zip.org/); for details, follow Figure3.

Figure3: Step-by-step protocol for generating the vcf.gz file needed to run RBCeq through the button “I have VCF and blood group coverage BED file”.
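On systems where 7zip is not available, a gzip-compressed copy of a VCF file can also be produced with Python's standard library. This is a minimal sketch and an alternative to the 7zip route, not a step from the RBCeq protocol; the helper name `gzip_file` and the file name in the usage comment are hypothetical. It writes plain gzip output.

```python
import gzip
import shutil


def gzip_file(source_path, target_path=None):
    """Write a gzip-compressed copy of source_path and return its path."""
    if target_path is None:
        target_path = source_path + ".gz"
    # Stream the file in chunks so large VCF files are not loaded into memory.
    with open(source_path, "rb") as src, gzip.open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target_path


# Usage (hypothetical file name):
# gzip_file("sample.vcf")  # writes sample.vcf.gz next to the input file
```

The original file is left in place, so the uncompressed VCF remains available after compression.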

2.2.2 How to generate a coverage BED file for the blood group genes: Follow the step-by-step protocol in Figure4 to generate the blood group gene coverage BED file using Bam Trimmer.

Figure4: Step-by-step protocol for submitting a job using the button “I have VCF and blood group coverage BED file”. The first step is to generate the blood group coverage BED file using Bam Trimmer as described above; then upload the coverage BED file and the VCF.gz file (for details, see Figure3) to RBCeq. Input parameters: variants that are present in the gnomAD database with a frequency below the specified MAF (minor allele frequency) are defined as rare variants. Variants below the specified Allele Depth and Genotype Quality thresholds are not used to call known blood group alleles.

3. Output

3.1 Job Summary

Figure5: The Job Summary window lists your completed jobs with their names and the input parameters provided, and offers an “Export Excel file” button to export the complete results to an Excel file.

3.2 Overall summary

Figure6: Summary of the distribution and annotation of variants detected in the blood group antigen defining genes. It includes the average coverage of the 45 blood group defining genes, the number of blood group alleles that differ from the reference, and the distribution of all variants detected in blood group associated genes by annotation (known blood group, ClinVar, and rare).

3.3 Blood Group Change Summary

Figure7: The blood group change allele pie chart summarises the known blood group alleles that differ from the reference, together with their associated phenotypes. In the inner circle, each colour represents a changed blood group allele; the outer circle shows its respective phenotype.

3.4 Per Blood Group Variants and Coverage Statistics

Figure8: The “Per Blood Group Variants and Coverage Statistics” graph is an interactive graph that gives a quantitative analysis of blood group gene coverage and detected variants. A: Line/bar combo plot; the x-axis shows the blood group antigen defining genes, the left y-axis shows the coverage of each gene, and the right y-axis shows the number of variants for each gene. Each line colour represents a different coverage value. B: The table gives the number of variants detected in the input sample for each blood group antigen defining gene.

3.5 Rh Blood Group System Coverage Statistics

Figure9: The CNV ratio formulas, RHD = 2 * (RHD gene average coverage / RHCE gene average coverage) and RHCE = 2 * (RHCE gene exon 2 average coverage / RHCE gene average coverage), and their interpretation for exonic rearrangements or deletion/duplication/triplication.
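The two CNV ratios in Figure9 are simple coverage quotients scaled by 2. The sketch below only reproduces that arithmetic; the function names and the coverage values are hypothetical, and the interpretation of the resulting ratios should be taken from Figure9 itself.

```python
def rhd_cnv_ratio(rhd_avg_coverage, rhce_avg_coverage):
    """RHD = 2 * (RHD gene average coverage / RHCE gene average coverage)."""
    return 2 * (rhd_avg_coverage / rhce_avg_coverage)


def rhce_exon2_cnv_ratio(rhce_exon2_avg_coverage, rhce_avg_coverage):
    """RHCE = 2 * (RHCE exon 2 average coverage / RHCE gene average coverage)."""
    return 2 * (rhce_exon2_avg_coverage / rhce_avg_coverage)


# Hypothetical average coverage values, for illustration only.
print(rhd_cnv_ratio(15.0, 30.0))         # 1.0
print(rhce_exon2_cnv_ratio(30.0, 30.0))  # 2.0
```

With equal coverage in numerator and denominator the ratio is 2; half the coverage gives 1.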

3.6 Known Blood Group Alleles

Figure10: The Known Blood Group Alleles table reports the complete profile of the input sample for 36 blood groups. Reference alleles and blood group alleles that differ from the reference are reported with supporting information such as variants, zygosity, allele depth and allele frequency. The first column gives the blood group phenotype; the second column gives the allele; the third column lists the variants that define the allele ("|"-separated variants are hg19 reference variants that are reported as SNPs in the ISBT tables); the fourth column gives the zygosity of the allele (if any one variant is heterozygous, the whole allele is called heterozygous); the fifth column gives the average allele depth for the allele; and the sixth column gives the average allele frequency.
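The per-allele zygosity rule and the two averages described in Figure10 can be sketched as follows. This is an illustration under stated assumptions, not RBCeq's code: the function name, the dictionary keys, and the example values are hypothetical, and the averages are assumed to be plain arithmetic means over the allele's defining variants.

```python
def summarise_allele(variants):
    """Combine the defining variants of one allele into a single table row."""
    # One heterozygous variant makes the whole allele heterozygous.
    if any(v["zygosity"] == "heterozygous" for v in variants):
        zygosity = "heterozygous"
    else:
        zygosity = "homozygous"
    count = len(variants)
    return {
        "zygosity": zygosity,
        "avg_allele_depth": sum(v["allele_depth"] for v in variants) / count,
        "avg_allele_frequency": sum(v["allele_frequency"] for v in variants) / count,
    }


# Hypothetical allele defined by two variants.
row = summarise_allele([
    {"zygosity": "homozygous", "allele_depth": 20, "allele_frequency": 1.0},
    {"zygosity": "heterozygous", "allele_depth": 10, "allele_frequency": 0.5},
])
print(row)
```

In this example the allele is reported as heterozygous with an average allele depth of 15.0 and an average allele frequency of 0.75.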

3.7 Clinically Significant, Rare, and Potentially Novel Allele Annotation

Figure11: This window lists variants that have not previously been reported in relation to blood group antigens but are nonsynonymous and predicted to be deleterious by in silico tools. The first listing contains ClinVar variants: variants that have a ClinVar annotation. The second listing contains rare variants: variants with a MAF below the user-provided value in the gnomAD database. The third listing contains novel variants: variants that are predicted to be deleterious by in silico tools and whose exonic function is one of “nonsynonymous / frameshift / stop-gain / stop-loss / splicing”. All lists include known variant details such as dbSNP ID, exonic function, refGene annotation, and gnomAD genome and exome frequencies in 9 different populations.
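The criteria for the three listings in Figure11 can be expressed as a small decision function. This is a sketch only: the function name, the dictionary keys, and the example variant are hypothetical, and it assumes that the listings are not mutually exclusive, which the description above does not state.

```python
NOVEL_EXONIC_FUNCTIONS = {
    "nonsynonymous", "frameshift", "stop-gain", "stop-loss", "splicing",
}


def assign_listings(variant, maf_threshold):
    """Return the names of the listings a variant would appear in."""
    listings = []
    if variant.get("clinvar"):
        listings.append("ClinVar")
    gnomad_af = variant.get("gnomad_af")
    if gnomad_af is not None and gnomad_af < maf_threshold:
        listings.append("Rare")
    if (variant.get("predicted_deleterious")
            and variant.get("exonic_function") in NOVEL_EXONIC_FUNCTIONS):
        listings.append("Novel")
    return listings


# Hypothetical variant: no ClinVar annotation, absent from gnomAD,
# predicted deleterious with a frameshift effect.
example = {
    "clinvar": None,
    "gnomad_af": None,
    "predicted_deleterious": True,
    "exonic_function": "frameshift",
}
print(assign_listings(example, maf_threshold=0.05))
```

A variant that matches none of the criteria returns an empty list and would not appear in this window.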

4. How to get help and report bugs

If you have difficulty running RBCeq, please report your issue at https://github.com/sudhirjadhao2009/RBCeq or contact us at sudhirshriram.jadhao@hdr.qut.edu.au.