Apparently the option for calculating HBD was removed in Beagle 5.0, but in version 4.1, you can use `ibd=true` to calculate HBD and IBD: https://faculty.washington.edu/brown...agle/b4_1.html.
Code:
wget https://faculty.washington.edu/browning/beagle/beagle.27Jan18.7e1.jar
wget https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V50/V50.0/SHARE/public.dir/v50.0_HO_public.{anno,ind,snp,geno}
f=v50.0_HO_public;convertf -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind outputformat:\ PACKEDPED genotypeoutname:\ $f.bed snpoutname:\ $f.bim indivoutname:\ $f.fam)
printf %s\\n Besermyan Enets Estonian Finnish Hungarian Karelian Mansi Mordovian Nganasan Selkup Udmurt Veps|awk 'NR==FNR{a[$0];next}$3 in a&&++b[$3]<=2{print$1}' - v50.0_HO_public.ind|awk 'NR==FNR{a[$0];next}$2 in a' - v50.0_HO_public.fam|plink --bfile v50.0_HO_public --allow-no-sex --keep <(cat) --autosome --recode vcf --out ural
java -jar beagle.27Jan18.7e1.jar gt=ural.vcf ibd=true out=ural
gzip -dc ural.hbd.gz|cut -d_ -f2-|awk 'NR==FNR{a[$1]=$3;next}{print a[$1],$0}' v50.0_HO_public.ind -|awk '{a[$1]+=$9;n[$1]++}END{for(i in a)print a[i]/n[i],i}' OFMT=%.4f|sort -n
Here's the output which shows the average LOD score for each population:
12.3450 Hungarian
14.3075 Estonian
26.5012 Finnish
27.6500 Mordovian
29.4841 Enets
30.4700 Veps
31.1556 Karelian
31.6653 Nganasan
41.9403 Mansi
42.1547 Udmurt
43.1838 Besermyan
47.1471 Selkup
I only included two samples from each population above, so the output is definitely not representative, because when I included all samples, I got an error like this with some samples:
Code:
ERROR: Missing one or both alleles for a genotype:
23 24300228 rs186929726 G A . . PR GT 0
Exiting Program
Or this:
Code:
Refined IBD
model scale: 2.00
run time: 8 seconds
mean edges/level: 8 max edges/level: 21
mean edges/node: 1.222 mean count/edge: 45
Exception in thread "main" java.lang.IllegalArgumentException:
2 225834818 rs73081884 T C
3 1706284 rs76302465 G A
at ibd.IbdSegment.checkArguments(IbdSegment.java:85)
at ibd.IbdSegment.(IbdSegment.java:68)
at main.WindowWriter.merge(WindowWriter.java:329)
at main.WindowWriter.printIbd(WindowWriter.java:307)
at main.WindowWriter.printIbd(WindowWriter.java:286)
at main.Main.printOutput(Main.java:211)
at main.Main.phaseData(Main.java:157)
at main.Main.main(Main.java:115)
Beagle 3.3 has an `estimatehbd=true` option which was removed in Beagle 4.0. It requires an additional markers file, but it gave me an error like this regardless of which format I tried to use for the markers file (http://faculty.washington.edu/browning/beagle/b3.html):
Code:
$ wget http://faculty.washington.edu/browning/beagle/beagle.jar # Beagle 3.3
$ sort -snk1,1 -nk3 ural.bim|cut -f2,3,5,6 >ural.markers # sort by chromosome and them by cM
$ # sort -nk4 ural.bim|cut -f2,3,5,6 >ural.markers # sort by integer position
$ # sort -nk4 ural.bim|cut -f2,4,5,6 >ural.markers # sort by integer position and use integer position instead of cM in the second field
$ java -jar beagle.jar unphased=ural.vcf estimatehbd=true markers=ural.markers out=ural missing=.
Beagle version 3.3.2 (31 Oct 2011)
Enter "java -jar beagle.jar" for summary of command line arguments.
Start time: 01:12 PM EEST on 28 Oct 2021
Command line: java -Xmx3641m -jar beagle.jar
unphased=ural11.vcf
estimatehbd=true
markers=ural11.markers
out=ural11
missing=.
Exception in thread "main" java.lang.IllegalArgumentException: Error: Marker positions must be nondecreasing: marker rs150294240
at phaser.ag.(Unknown Source)
at phaser.ag.a(Unknown Source)
at phaser.v.(Unknown Source)
at phaser.PhaseMain.(Unknown Source)
at phaser.PhaseMain.main(Unknown Source)
The manual of Beagle 3.3 says that the marker position must be given as cM for HBD detection (https://faculty.washington.edu/brown....2_31Oct11.pdf):
Each line of the marker file will contain four or more white-space delimited fields. The first field is the marker identifier. The second field is the marker position. The remaining fields are the marker alleles. The markers must be given in chromosomal order. If you are performing HBD/IBD detection, the position of each marker must be given using the centiMorgan scale. If you are not performing HBD/IBD detection, the marker positions on a chromosome can be base positions, genetic positions, or sequential integers. The fastIBD algorithm does not require cM marker positions
In the Beagle file given in Example 1 above, the corresponding markers file when marker positions are given in NCBI Build 35 coordinates is
Code:
rs2289311 79235661 A G
rs1248628 79236165 C T
rs10762764 79236371 G T
Bookmarks