
Relationship of the gene pool of the Khants with the peoples of Western Siberia, Cis-Urals and the Altai-Sayan Region according to the data on the polymorphism of autosomic locus and the Y-chromosome
Relationship of the Khanty gene pool with the peoples of Western Siberia, the Cis-Urals and the Altai-Sayan region based on data on the polymorphism of autosomal loci and the Y-chromosome
VN Kharkov
NA Kolesnikov
LV Valikhova
AA Zarubin
MG Svarovskaya
AV Marusin
IYu Khitrinskaya
VA Stepanov
Correspondence to: V.N. Kharkov, vladimir-kharkov@medgenetics.ru
Received 2022 Oct 14; Revised 2022 Nov 23; Accepted 2022 Nov 27.
This work is licensed under a Creative Commons Attribution 4.0 License
Abstract
The Khanty are an indigenous Siberian people living in Western Siberia, mainly in the Khanty-Mansi and Yamalo-Nenets Autonomous Okrugs. The present study is aimed at a comprehensive analysis of the structure of the Khanty gene pool and its comparison with other indigenous populations of Southern and Western Siberia. To address the genetic proximity of the Khanty to other indigenous peoples, we genotyped a wide genomic set of autosomal markers using high-density biochips, as well as an expanded set of SNP and STR markers of the Y-chromosome, in various ethnic groups: Khakas, Tuvans, Southern Altaians, Siberian Tatars, Chulyms (Turkic language family) and Kets (Yeniseian language family). The structure of the gene pool of the Khanty and other West Siberian and South Siberian populations was studied using a genome-wide panel of autosomal single nucleotide polymorphism markers and Y-chromosome markers. The results of the analysis of autosomal SNP frequencies by various methods, together with the similarities in the composition of Y-chromosome haplogroups and YSTR haplotypes, indicate that the Khanty gene pool is quite specific. In the analysis of autosomal SNPs, the Ugric genetic component completely dominates in both samples (up to 99–100 %). The Khanty samples showed the maximum match in IBD blocks with each other, followed by the samples of the Kets, Chulyms, Tuvans, Tomsk Tatars, Khakas-Kachins, and Southern Altaians. The degree of coincidence of IBD blocks between the Khanty, Kets, and Tomsk Tatars is consistent with the distribution of allele frequencies and common genetic components in these populations. According to the composition of the Y-chromosome haplogroups, the two samples of the Khanty differ significantly from each other. A detailed phylogenetic analysis of various Y-chromosome haplogroups made it possible to describe and clarify the differences in the phylogeny and structure of individual ethnospecific sublines, and to determine their relationships and the traces of population expansion in the Khanty gene pool. Variants of different Y-chromosome haplogroups in the Khanty, Khakas and Tuvans go back to their common ancestral lines. The results of a comparative analysis of male samples indicate a close genetic relationship between the Khanty and the Nenets, Komi, Udmurts and Kets. The specificity of the haplotypes and the discovery of various terminal SNPs confirm that the Khanty did not come into contact with other ethnic groups for a long time, except for the Nenets, who absorbed many Khanty clans.
Keywords: gene pool, human population, genetic diversity, genetic components, Y-chromosome, Khanty
Abstract
The Khanty are an indigenous Siberian people living in Western Siberia, mainly in the Khanty-Mansi and Yamalo-Nenets Autonomous Okrugs. This study is aimed at a comprehensive analysis of the structure of the Khanty gene pool and its comparison with other indigenous populations of Southern and Western Siberia. To resolve questions of the genetic proximity of the Khanty to other indigenous peoples, a broad genomic set of autosomal markers was genotyped using high-density biochips, together with an extended set of Y-chromosome SNP and STR markers, in various ethnic groups: Khakas, Tuvans, Southern Altaians, Siberian Tatars, Chulyms (Turkic language family) and Kets (Yeniseian language family). The results of analyzing autosomal SNP frequencies by various methods, and the similarities in the composition of Y-chromosome haplogroups and YSTR haplotypes, indicate that the Khanty gene pool is rather specific. In the analysis of autosomal SNPs, the Ugric genetic component completely dominates in both samples (up to 99–100 %). The Khanty samples showed the greatest sharing of IBD blocks with each other, followed by the samples of Kets, Chulyms, Tuvans, Tomsk Tatars, Khakas-Kachins and Southern Altaians. The degree of IBD-block sharing between the Khanty, Kets and Tomsk Tatars agrees with the distribution of allele frequencies and common genetic components in these populations. In the composition of Y-chromosome haplogroups, the two Khanty samples differ considerably from each other. A detailed phylogenetic analysis of various Y-chromosome haplogroups made it possible to describe and refine the differences in the phylogeny and structure of individual ethnospecific sublines and to determine their relatedness and the traces of population expansion in the Khanty gene pool. Variants of different Y-chromosome haplogroups in the Khanty, Khakas and Tuvans go back to ancestral lines common to them. The results of a comparative analysis of male samples also indicate a close genetic relationship between the Khanty and the Nenets, Komi, Udmurts and Kets. The specificity of the haplotypes and the discovery of various terminal SNPs confirm that the Khanty had no contact with other ethnic groups for a rather long time, except for the Nenets, who absorbed many Khanty clans.
Keywords: gene pool, human populations, genetic diversity, genetic components, Y-chromosome, Khanty
Introduction
The study of the structure of the gene pools of populations of various Siberian regions is one of the priority areas of modern human genetics and helps to reveal in detail some of the issues related to their ethnogenesis.
The Khanty are an indigenous people living in Western Siberia, mainly in the Khanty-Mansi and Yamalo-Nenets Autonomous Okrugs, as well as the Tyumen Region. Small groups of Khanty live in the north of the Tomsk Region and in the Komi Republic. According to the All-Russian census of 2010, the number of Khanty was 30,943 people, of which 61.6 % lived in the Khanty-Mansi Autonomous Okrug and 30.7 % in the Yamalo-Nenets Autonomous Okrug. The Khanty have three large ethnographic groups that coincide with the groups of their language dialects: northern, southern and eastern. The southern (Irtysh) Khanty were Turkified and became part of the Siberian Tatars, having mixed with them, and were also assimilated by Russian settlers (Peoples of West Siberia…, 2005).
Khanty populations are of considerable interest for population genetic studies, both because they remain relatively poorly studied with modern genomic technologies and because of the specificity of the gene pools of their individual groups, which developed under conditions of long-term genetic isolation.
In antiquity the Khanty were settled very widely: from the lower reaches of the Ob in the north to the Baraba steppes in the south, and from the Yenisey in the east to the Trans-Urals, including the rivers Northern Sosva and Lyapin, as well as part of the rivers Pelym and Konda, in the west. Since the 19th century, the Mansi began to move beyond the Urals from the Kama and Ural regions, being pressed by the Komi-Zyryans and Russians. Even earlier, part of the southern Mansi had also left for the north in connection with the creation in the XIV–XV centuries of the Tyumen and Siberian khanates, the states of the Siberian Tatars, and later (XVI–XVII centuries) with the development of Siberia by the Russians. In the XVII–XVIII centuries, the Mansi already lived on the Pelym and Konda. Part of the Khanty also moved from the western regions to the east and north (to the Ob from its left tributaries), which is recorded in archival statistical data. Their place was taken by the Mansi. Thus, by the end of the XIX century, there was no Ostyak (Khanty) population left on the rivers Northern Sosva and Lyapin: they had either moved to the Ob or merged with the newcomers (The Peoples of Russia, 1994).
In the north, the Khanty came into contact with the Nenets, and some of them were assimilated by the Nenets, which is confirmed by ethnographic data as well as by our study of the tribal structure of the Gydan Nenets according to Y-chromosome markers (Kharkov et al., 2021). The migration of the Khanty to the north and east continued into the 20th century. By the 20th century, the southern Khanty were almost completely assimilated by the Tatars and Russians.
Historically, the Khanty population was not homogeneous either in language or culture. Some scientists divide the Khanty language into two large groups, western and eastern, while others further subdivide the western dialects into southern and northern. In anthropological terms, the Khanty are the most characteristic representatives of the Ural type, which also includes the Mansi, Selkups, Nenets, Baraba Tatars, Shors, Northern Altaians and Khakas. The closest relatives of the Khanty in origin, language and culture are the Mansi (Brook, 1986).
The purpose of this study is a comprehensive analysis of the structure of the Khanty gene pool and the reconstruction of their origin in comparison with other indigenous populations of Southern and Western Siberia. To address the genetic proximity of the Khanty to other indigenous peoples, genotyping of a wide genomic set of autosomal markers using high-density biochips, as well as of an expanded set of SNP and STR markers of the Y-chromosome, was performed in various ethnic groups: Khakas, Tuvans, Southern Altaians, Siberian Tatars, Chulyms (Turkic language family) and Kets (Yeniseian language family).
Materials and methods
The material of the study was DNA samples of men and women from two populations of the Khanty in the village of Russkinskaya, Surgut district, and the village of Kazym, Beloyarsky district, of the Khanty-Mansi Autonomous Okrug. The sampling of primary biological material (venous blood) from donors was carried out in compliance with the procedure of written informed consent for the study. For each donor, a questionnaire was compiled with a brief pedigree, indicating ethnicity and places of birth of ancestors. An individual was assigned to a given ethnic group based on their own ethnic identity, that of their parents, and their place of birth.
For the analysis of Y-chromosome haplogroups and haplotypes of the Khanty, 120 DNA samples of men from the village of Russkinskaya (N = 64) and the village of Kazym (N = 54) of the Khanty-Mansi Autonomous Okrug were used. For genotyping on high-density microchips, unrelated samples from the village of Kazym (N = 30) and the village of Russkinskaya (N = 26) were selected. Other indigenous Siberian populations are represented by: Chulyms (N = 22), Khakas (Sagays of the Tashtyp district, N = 29 and Kachins of the Shirinsky district, N = 26), Southern Altaians (village of Beshpeltir of the Chemal district, N = 24 and Kulada village, Ongudaysky district, N = 25), Kets (Kellogg village, Turukhansky district, Krasnoyarsk Territory, N = 15), Tomsk Tatars (Chernaya Rechka village, Eushta village and Takhtamyshevo village, Tomsky district, N = 20), and Tuvans (Teeli village of Bai-Taiginsky kozhuun, N = 28).
Genome-wide genotyping data were obtained using Infinium Multi-Ethnic Global-8 (Illumina) microarrays for SNP genotyping, including over 1.7 million markers. The material was deposited in the bioresource collection “Biobank of the Population of Northern Eurasia”.
Clustering of autosomal SNP (single nucleotide polymorphism) genotype arrays and quality control were performed following the protocol of Guo et al. (2014) in GenomeStudio (Illumina; genotyping module v2.0.3), a software package that Illumina developed for various genomic analyses. For filtering, normalizing and calculating standard genomic statistics and indicators, the standard set of programs, including vcftools, bcftools, and plink, was used.
To analyze linkage blocks identical by descent, the Refined IBD algorithm (Browning B.L., Browning S.R., 2013) was used, which gives more accurate results than the algorithms built into plink. The genotypes were first phased using the Beagle 5.1 software (Browning S.R., Browning B.L., 2007). To compare the populations, the sums of the average lengths of identical-by-descent (IBD) segments were obtained between pairs of individuals.
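The pairwise IBD summary described in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `(id1, id2, length_cm)` tuple layout, the function name `mean_ibd_sharing`, and the population labels are hypothetical, and the exact normalization used in the study is not given in the text. Here, segment lengths are summed per pair of individuals and then averaged over all possible pairs between two populations.

```python
from collections import defaultdict
from itertools import combinations, product


def mean_ibd_sharing(segments, pop_of, pop_a, pop_b, min_cm=1.5):
    """Average total IBD length (cM) per pair of individuals from pop_a and pop_b.

    segments: iterable of (id1, id2, length_cm) tuples (hypothetical layout).
    pop_of:   dict mapping individual id -> population label.
    min_cm:   segments shorter than this are ignored (assumed threshold).
    """
    # Sum segment lengths for every unordered pair of individuals.
    totals = defaultdict(float)
    for id1, id2, length_cm in segments:
        if length_cm >= min_cm:
            totals[frozenset((id1, id2))] += length_cm

    members_a = [i for i, p in pop_of.items() if p == pop_a]
    members_b = [i for i, p in pop_of.items() if p == pop_b]

    # Within one population use unordered pairs; between two, all cross pairs.
    if pop_a == pop_b:
        pairs = list(combinations(members_a, 2))
    else:
        pairs = list(product(members_a, members_b))
    if not pairs:
        return 0.0

    # Pairs that share no segment contribute zero to the average.
    return sum(totals.get(frozenset(pair), 0.0) for pair in pairs) / len(pairs)
```

Averaging over all possible pairs, including those with no shared segment, keeps the statistic comparable between populations of different sample sizes.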
The tSNE method was used to analyze genetic relationships between populations. The NGSadmix method (Skotte et al., 2013) and the ADMIXTURE program (Alexander et al., 2009; Alexander, Lange, 2011) were used to analyze the component composition and the admixture proportions in individuals and populations.
To study the composition and structure of Y-chromosome haplogroups, two systems of genetic markers were included in the study: diallelic loci represented by SNPs and polyallelic highly variable microsatellites (YSTRs). The haplogroup membership of the samples was determined using 138 SNP markers. The classification of haplogroups follows the data of the International Society of Genetic Genealogy (website www.isogg.org).
Analysis of STR haplotypes within haplogroups was performed using 44 STR markers of the non-recombining part of the Y chromosome (DYS19, 385a, 385b, 388, 389I, 389II, 390, 391, 392, 393, 426, 434, 435, 436, 437, 438, 439, 442, 444, 445, 448, 449, 456, 458, 460, 461, 481, 504, 505, 518, 525, 531, 533, 537, 552, 570, 576, 635, 643, YCAIIa, YCAIIb, GATA H4.1, Y-GATA-A10, GGAAT1B07). STR markers were genotyped using capillary electrophoresis on an ABI Prism 3730 genetic analyzer. Genotyping of SNP markers was performed using PCR and subsequent analysis of DNA fragments using RFLP analysis.
Experimental studies were carried out at the Center for the Collective Use of Research Equipment “Medical Genomics” (Research Institute of Medical Genetics of the Tomsk National Research Medical Center). Median networks of Y-chromosome haplotypes were constructed in Network v.10.2.0.0 (Fluxus Technology Ltd; www.fluxus-engineering.com) using the Bandelt median network method (Bandelt et al., 1999). The age of the observed haplotype diversity within haplogroups was estimated using the ASD method (Zhivotovsky et al., 2004), based on the mean squared differences in the number of repeats across all markers.
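As a rough illustration of ASD-based dating, the sketch below assumes a stepwise mutation model in which the average squared difference in repeat number from the founder haplotype grows linearly with time, so that the age in generations is ASD divided by the per-locus mutation rate. The function names are hypothetical, the founder is approximated by the modal haplotype, and the mutation rate and generation length are caller-supplied placeholders rather than values taken from this study.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Per-locus most frequent repeat count, used as a stand-in for the founder."""
    return tuple(Counter(locus).most_common(1)[0][0] for locus in zip(*haplotypes))


def asd_age(haplotypes, mutation_rate, years_per_generation):
    """Estimate cluster age in years from Y-STR haplotypes (tuples of repeat counts).

    mutation_rate:        assumed mutations per locus per generation.
    years_per_generation: assumed generation length in years.
    """
    founder = modal_haplotype(haplotypes)
    n_loci = len(founder)

    # Squared repeat-count difference from the founder, averaged over loci.
    per_haplotype = [
        sum((allele - anc) ** 2 for allele, anc in zip(hap, founder)) / n_loci
        for hap in haplotypes
    ]
    asd0 = sum(per_haplotype) / len(haplotypes)

    # Under the stepwise model, ASD to the founder is mutation_rate * generations.
    generations = asd0 / mutation_rate
    return generations * years_per_generation
```

The resulting age scales inversely with the chosen mutation rate, so estimates obtained with different rate calibrations are not directly comparable.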
Results and discussion
The large array of data on autosomal SNPs obtained as a result of genotyping of high-density microarrays in samples of the Khanty and other indigenous Siberian peoples makes it possible to characterize the gene pool of the studied samples in the most detailed way using various methods. Genotyping of an extended set of specific Y-chromosome SNPs from various haplogroups makes it possible to describe the molecular phylogenetic and phylogeographic structure of individual Y-chromosome haplogroups much more accurately.
After processing the microarray data, we filtered the genotyped samples before further calculations by searching for admixed individuals among the Khanty using the NGSadmix program. The algorithm of this program makes it possible to determine the ratio of ancestral components from NGS data with a relatively shallow coverage depth. The calculation principle is similar to that of other programs such as FRAPPE and ADMIXTURE, but NGSadmix, unlike them, works effectively when there is statistical uncertainty in individual genotypes. When run on our data set, NGSadmix showed that almost all Khanty samples carry no admixture, which is fully consistent with the data from the DNA donor questionnaires. Admixture with Russians (up to 30 %) was found in only one man from the village of Russkinskaya. His belonging to the European Y-chromosomal lineage R1b1a1b-L407 confirms the admixture on the paternal side. This sample was excluded from further calculations.
The obtained data on the frequencies of SNPs in the studied samples were used to elucidate the genetic relationships between the population samples included in the work. For dimensionality reduction, spatial analysis, and identification of genetic components, we settled on two algorithms: tSNE and ADMIXTURE. Compared to PCA, the tSNE method divides the data array more clearly into separate ethnospecific groups of samples.
Genetic relationships of the Khanty with other populations of Western and Southern Siberia
Analysis of the autosomal SNP frequency data by the tSNE method at the level of individual samples (Fig. 1) shows that the two samples of the Khanty are very close, although the samples of the Kazym and Russkinskaya Khanty do not intersect on the graph and are separated from each other.
Fig. 1. Differentiation of the genomes of the population of Southern and Western Siberia by three components of tSNE.

The Khanty are characterized by specific features of the gene pool and do not cluster with other populations. Compared with subethnic groups of the Khakas and Southern Altaians from different settlements, the more geographically distant samples of the Khanty demonstrate a much greater genetic closeness. The samples of the Kets and Tomsk Tatars are closest to the Khanty. The genetic distances between the Khanty and the populations of Southern Siberia are much greater. Samples that are ethnically and geographically close to each other are located quite close together in Fig. 1, but each sample forms a separate ethnospecific cluster. The only exceptions are a few single samples of the Khakas.
Component composition of the gene pool of populations. Modern methods used in genomic studies and new bioinformatic approaches make it possible to reliably identify ancestral genetic components of different origins in the gene pool of various populations and individuals. To identify individual genetic components in the gene pool of the studied populations, the ADMIXTURE program was used, which makes it possible to identify the mixed composition of a set of individuals based on genotype data and, thereby, to make assumptions about the origin of the population.
Modeling using ADMIXTURE has recently become one of the main methods of analysis in the study of the gene pools of modern and ancient human populations, allowing the same data to be analyzed at different hierarchical levels. When the number of ancestral components is set to more than two, a genetic component specific to the Khanty is revealed in most of the studied populations. It is most clearly manifested in the analyzed array of population samples at K = 8 and can be interpreted as the “Ugric” genetic layer in the gene pool of modern populations. The Khanty are characterized by the dominance of this component, which is their genetic basis (up to 99–100 % at the level of most individuals). A significant proportion of this component is also found in the Kets (up to 45–50 % in some individuals) and Tomsk Tatars (up to 5–9 %). Previously, it was shown that this component also occupies a significant share in the gene pool of the populations of the Volga-Ural region: the Bashkirs (up to 25 %), Maris (up to 20 %), Komi, Udmurts and Chuvashs (up to 15 %). It is present at lower frequency in almost all South Siberian samples, among the Tuvans, Chulyms, Altaians, and Khakas-Sagays (from 5 to 10 %) (Kharkov et al., 2020).
The dominance of the Ugric component in all Khanty samples, starting from K = 3, and the almost complete absence of other genetic components in their genomes at the individual and population level indicate that their ancestral populations were in genetic isolation for a very long time. This suggests that the ancient Ugric population of the modern territory of Khanty settlement did not mix with other ethnic groups and confirms the absence of other groups of migrants from Southern Siberia and the steppe zone.
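Population-level shares of a component, such as those quoted above, can be summarized from an ancestry-proportion matrix roughly as follows. This sketch assumes the whitespace-separated layout of an ADMIXTURE `.Q` file (one row per individual, one column per ancestral component, rows in the same order as the sample list). The function names and population labels are hypothetical, and which column corresponds to a given component has to be determined by inspecting the output.

```python
from collections import defaultdict


def parse_q(text):
    """Parse whitespace-separated ancestry proportions, one individual per line."""
    return [[float(v) for v in line.split()] for line in text.strip().splitlines()]


def mean_component_by_population(q_rows, populations, component):
    """Mean proportion of one ancestral component in each population.

    q_rows:      list of per-individual proportion lists (rows of the Q matrix).
    populations: population label of each individual, in the same row order.
    component:   zero-based column index of the component of interest.
    """
    if len(q_rows) != len(populations):
        raise ValueError("Q matrix rows and population labels differ in length")

    sums = defaultdict(float)
    counts = defaultdict(int)
    for row, pop in zip(q_rows, populations):
        sums[pop] += row[component]
        counts[pop] += 1
    return {pop: sums[pop] / counts[pop] for pop in sums}
```

Because component columns are not ordered consistently between runs or values of K, the column index should be re-checked for every run before comparing populations.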
The result obtained shows that the overall picture of the distribution of the components is in good agreement with the geographical location of the studied populations, binding to a specific region, anthropological and linguistic differences. This information makes it possible to more accurately judge the similarities and differences between the compared populations, the composition of ancestral components, as well as the process of formation of their gene pool.
Identical-by-descent linkage blocks. As a result of bioinformatic processing of genotyping data from high-density biochips for various Siberian populations, we analyzed the sharing of DNA fragments of common origin between populations and individuals. A segment with identical nucleotide sequences is IBD in two or more individuals if they inherit it from a common ancestor without recombination, that is, in these individuals the segment has a common origin. The expected length of an IBD segment depends on the number of generations since the last common ancestor. One of the applications of the analysis of genome regions of common origin is the quantitative assessment of the degree of relationship between individuals, which can also supplement information on the genetic relationships of populations (Gusev et al., 2012).
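The dependence of segment length on the number of generations can be made concrete with a standard approximation, which is not a calculation reported by the authors. If recombination is modeled as a Poisson process along the genome, a segment inherited by two individuals from a common ancestor g generations back has passed through 2g meioses, and its expected length is about 100 / (2g) cM. The function name below is hypothetical.

```python
def expected_ibd_length_cm(generations_to_ancestor):
    """Approximate expected IBD segment length (cM) for a shared ancestor.

    Assumes recombination is a Poisson process, so segment lengths are
    exponentially distributed with mean 1 / (2 * g) Morgans.
    """
    if generations_to_ancestor <= 0:
        raise ValueError("generations_to_ancestor must be positive")
    meioses = 2 * generations_to_ancestor  # g meioses down each of the two lineages
    return 100.0 / meioses


# An ancestor 5 generations back gives an expected length of about 10 cM;
# an ancestor 50 generations back gives about 1 cM.
```

Under this approximation, long segments point to recent common ancestry and short segments to older shared ancestry.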
The samples of the Khanty showed the maximum match in IBD blocks with each other (6 %), then with the samples of the Kets (1.45 %), Chulyms (0.71 %), Tuvans (0.35 %), Tomsk Tatars (0.33 %), Khakas-Kachins (0.32 %), and Southern Altaians (0.28 %). At the same time, among the Khanty, a greater coincidence of IBD blocks is observed in Russkinskaya (23.5 %) than in Kazym (18.1 %).
The degree of overlap of IBD blocks between the Khanty, Kets, and Tomsk Tatars is consistent with the results of tSNE and ADMIXTURE in terms of the distribution of allele frequencies and common genetic components in these populations. At the same time, in the Khanty population from Russkinskaya, which has the largest sum of average lengths of IBD segments between pairs of individuals, the greatest contribution is made by IBD segments longer than 10 cM (42–46 %), which indicates strong recent inbreeding within the population. To confirm this, the FROH inbreeding coefficient was calculated for all individuals for the three classes of runs of homozygosity (ROH). Among the West Siberian populations, the maximum values, which are close to one another, are found in the Chulyms (0.0292), the Kazym (0.0280) and Russkinskaya (0.0266) Khanty, and the Kets (0.0259). Among the South Siberian populations, including the Altaians, Tomsk Tatars, Tuvans and Khakas, the maximum value was found for the sample of Khakas-Sagays from the foothill Tashtyp district (0.0318), twice as high as for the Khakas-Kachins of the plain Shirinsky district. The minimum value is typical for the Tomsk Tatars (0.0071).
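The text reports FROH values without spelling out the computation. A common definition, assumed here, is the fraction of the autosomal genome covered by ROH above a length threshold. In the sketch below the function name is hypothetical, lengths are assumed to be in Mb, the default threshold of 1.5 is an assumption about the unit of the FROH > 1.5 class used in the paper, and the autosome length is a caller-supplied value.

```python
def froh(roh_lengths_mb, autosome_length_mb, min_length_mb=1.5):
    """Genomic inbreeding coefficient: share of the autosomes lying in ROH.

    roh_lengths_mb:     lengths of one individual's ROH segments (Mb, assumed unit).
    autosome_length_mb: total autosomal length covered by the array (Mb).
    min_length_mb:      only ROH longer than this are counted (assumed threshold).
    """
    if autosome_length_mb <= 0:
        raise ValueError("autosome_length_mb must be positive")
    covered = sum(length for length in roh_lengths_mb if length > min_length_mb)
    return covered / autosome_length_mb
```

Population values such as those listed above would then be averages of this per-individual coefficient.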
There is a high correlation between FROH > 1.5 and the sum of mean IBD segment lengths (IBD > 1.5 cM) between pairs of individuals within Siberian populations (r = 0.9246, p < 5.612e-09). The Spearman correlation coefficient was calculated with cor.test in R. The ratio of the sum of the average lengths of IBD segments (IBD > 1.5 cM) between pairs of individuals to the coefficient of genomic inbreeding (FROH > 1.5) is higher in the Russkinskaya Khanty than in the Kazym Khanty. These indicators of genomic inbreeding and distribution of IBD lengths within Khanty populations are in good agreement with their territorial isolation and confirm the absence of recent gene flows between populations for hundreds of years.
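The authors computed the Spearman coefficient with cor.test in R. For readers without R, the coefficient itself can be sketched in plain Python as the Pearson correlation of average ranks. The function names are hypothetical, and no p-value is computed here.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Applied to per-population FROH values and sums of mean IBD lengths, this gives the kind of rank-based association reported above.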
Haplogroups of the Y-chromosome. Analysis of the frequencies of the SNP markers used in the studied samples of the Khanty identified eight haplogroups of the Y-chromosome. According to the composition and frequencies of haplogroups, the samples of Russkinskaya and Kazym Khanty men are very different from each other. Only two haplogroups are present in both samples (see the Table).
Table 1. Frequency of occurrence of Y-chromosome haplogroups in the Khanty.

Thirty-nine samples belong to the N1a2b1b1 subline in the Russkinskaya Khanty, and only three in the Kazym ones. The terminal SNPs of this line in the Khanty are Y68212, Y70717, Y70315, and Y70327. This Khanty subline is close to the N1a2b1b1 variants in the Chulyms (VL65, Z35095, Z35099, Z35102) and Khakas-Kachins (Z35093, Z35097, Z35103) (Valikhova et al., 2022).
The haplogroup N1a2b1b1 among the Khanty is ethnospecific and does not coincide in terminal SNPs and haplotypes with N1a2b1b1a~ (B171, B170, Z35091, Z35092), which is dominant among the Nenets (Kharkov et al., 2021).
A feature of the ethnic composition of the majority of the South Siberian peoples is the presence of clans (seoks), in which kinship is counted along the male line. Such a clan structure is typical for the Shors, Khakas, northern and southern Altaians, and Teleuts. All other samples of men from various West and South Siberian populations (the Enets, Khakas-Sagays, Shors, Chelkans and Tuvans, as well as the Khakas seoks formerly part of the Beltirs and Biryusins, assimilated in the late 19th and early 20th centuries) belong to other sublines of haplogroup N1a2b. The median network of haplotypes (Fig. 2) demonstrates a star-like phylogeny in the Khanty with a recent founder effect and a predominance of the ancestral haplotype in frequency.
Fig. 2. The median network of YSTR haplotypes of the N1a2b1b1 haplogroup in the Khanty, Chulyms and Khakas-Kachins.

The Khanty are marked in light blue, the Chulyms in red, the Khakas of the Sokhkhy seok in blue, the Khakas of the Yzyr seok in green, the Khakas of the Khaskha seok in yellow, and the Khakas of the Purut seok in dark green.
The specific cluster of Khanty haplotypes is equidistant from all seoks of the Khakas-Kachins. The age of this cluster among the Khanty was 858 years (SD = 338 years), which is approximately one and a half to two times higher than the age of the clusters of the Kachin seoks Khaskha, 487 years (SD = 153 years), Yzyr, 501 years (SD = 203 years), and Sokhkhy, 585 years (SD = 215 years) (Kharkov et al., 2020), and of the Chulym Turks, 667 years (SD = 194 years). Thus, the Khanty in this haplogroup have a direct genetic connection with the Kachins, Chulyms and Nenets, whose ancestral lines diverged quite a long time ago and reflect their connection with the peoples of the Samoyedic language group.
The second haplogroup, N1a2b2a1 (VL97, L1419, Y3185, Y3188, Y3189, Y3190, Y111190), is common to the two Khanty samples (it was previously designated as the European N1b-E lineage). This subline was found among the Bashkirs, Kazan Tatars, Komi, Mari, Karelians, Vepsians, Finns and Russians (https://www.yfull.com/). Phylogenetically closest to the Khanty along this line are the Komi samples. The ethnospecific branches of the Khanty and Komi are united by SNPs Y65017 and Y89655, which are not found in other populations. The Khanty and Komi have the most recent common ancestor for this haplogroup, compared to other European populations.
According to the YFull website, this branch split from the ancestral line about 2800 years ago. Theoretically, there are two options for the appearance of this haplogroup among the modern Khanty and Komi: 1) inheritance from a common ancient ancestral group of Ugric tribes; 2) the recent mixing of Khanty with ethnic Komi migrants to Siberia. However, the results of the analysis of genomic data using NGSadmix, ADMIXTURE, IBD blocks and differences in terminal SNPs of Y-chromosomes do not confirm the second variant. The YSTR haplotypes of this line in the Khanty and Komi also differ by several mutations. Previously, V.N. Pimenoff et al. suggested that when the Ob-Ugric Khanty and Mansi went to the western slopes of the Ural Mountains and to the north-west of Siberia, a unique association of N1b-A and N1b-E was formed (Pimenoff et al., 2008). This combination of N1b sublines in the Khanty and Mansi suggests a recent confluence of the western and eastern lineages in northwestern Siberia. Our new data do not contradict this version.
All other haplogroups are represented only in individual samples of the Khanty. The haplogroup N1a2b2b1~ (Z35076) includes three samples of the Kazym Khanty. The lineage N1a2b2b1~ (B528, Y24382, Z35076, Z35077) closest to it is also common among the Komi. The Udmurts, Tatars, Chuvashs and Bashkirs have its more recent line (B226). The YSTR haplotypes of this haplogroup in the Komi and Udmurts are closer to each other than to the Khanty samples. The presence among the Khanty and Komi of two haplogroups, N1a2b2a1 and N1a2b2b1~, with ethnospecific terminal SNPs and different haplotypes indicates their inheritance from fairly ancient common ancestors, most likely part of the early Ugric population of these territories.
Thirteen samples of the Kazym Khanty belong to the haplogroup N1a1a1a1a2a1c1~ (Y13850, Y13852). Seven of them have the surname Pyak, which is Nenets in origin and refers to the Forest Nenets. All seven of these samples have very close haplotypes and are descendants of a relatively recent common Nenets ancestor. In the questionnaires of these men, who consider themselves Khanty, Nenets ancestors were indicated on the paternal line at different depths. The remaining six men of this haplogroup differ in haplotypes from the Pyak clan.
In our study of the Taz Nenets (Kharkov et al., 2021), it was found that all men representing the Salinder, Lar and Tibichi clans, which are of Khanty origin, belong to this haplogroup. These clans formed in the XVIII–XIX centuries in the lower reaches of the Ob as a result of the development of Nenets large-herd reindeer husbandry and the involvement of part of the northern Khanty in it (Kvashnin, 2003). All haplotypes of the Kazym Khanty of this haplogroup differ significantly from the haplotypes of the Taz Nenets.
The other five samples of the Kazym Khanty belong to the haplogroup N1a2b1b1b1~ (B172, Z35108). All previously surveyed Nenets men from the Vanuito phratry belonging to the Vanuito, Puiko and Yaungat clans, and the Purungui clan of Khanty origin, belong to it. Four samples of the Khanty differ in haplotypes from the Nenets, but one almost completely coincides with them. Such a division into haplotypes specific to the Khanty and haplotypes close to the Nenets coincides with the data on the haplogroup N1a1a1a1a2a1c1~. It is evident that the gene pool of the Kazym Khanty includes variants of these haplogroups that are of Khanty origin, but that relatively recent marriages with the Forest Nenets also took place. The absence of these haplogroups in the Russkinskaya Khanty is in good agreement with the data on the distribution of IBD blocks and the coefficient of genomic inbreeding.
The distribution of various haplogroups of the N clade of the Y-chromosome in the studied populations is in good agreement with the frequency of the Ugric genetic component. Phylogenetic analysis of Y-chromosomal sublines and haplotypes of various haplogroups of the N clade shows that the center of origin and distribution of the carriers of the Ugric component in Southern, Western Siberia and Eastern Europe is the territory of modern Altai and Sayan Mountains. The obtained results are well comparable with the data of ethnology, anthropology and linguistics on the contribution of the Uralic component to the formation of various peoples of the Altai-Sayan and the historical areas of Ugric and other languages of the Uralic language family.
Almost 40 % of men from Kazym belong to the haplogroup Q1b1a3b1a2~ (Z35974 xB32, B33, Z35993). The lineage Q1b1a3b1a2~ (B33, Z35991) specific to the Kets population is closest to it. In addition to the Kets, this variant also prevails among the Selkups from the Tomsk Region and the Krasnoyarsk Region. A more distant line Q1b1a3b1a~ (B30, YP1693 xZ35991) is common in Tuvan populations, with a maximum frequency in the eastern mountainous regions of Tuva (up to 25 %). Khanty samples show a specific haplotype spectrum with a recent founder effect that is not observed in the Kets (Fig. 3).
Fig. 3. Median network of YSTR haplotypes of haplogroup Q1b1a3b1a2~ in Khanty and Kets.

The distribution of these sublines in the populations of the Khanty, Kets, and Tuvans is in good agreement with the shares of matches in IBD blocks between them, the t-SNE plot, and the distribution of the Ugric genetic component in these populations over the autosomal part of their gene pool. The presence of this lineage among the Khanty is not due to recent borrowing from other aboriginal populations (Kets and Selkups), but to the fact that it was already part of the settling ancestral groups.
Three men from the village of Russkinskaya have a completely different haplogroup of the Q clade, Q1a2b~ (M25, L716, YP1674, YP1676). This is a very rare haplogroup not found in other Siberian populations. It reaches its maximum frequency among the ethnic Turkmens of Karakalpakstan, Iran and Afghanistan (Grugni et al., 2012; Skhalyakho et al., 2016). In most other ethnic groups, its frequency is very low. Khanty haplotypes are quite different from those of other populations. Most likely, the presence of this line among them is not a consequence of recent admixture, but is a legacy of the Ugric groups that migrated from southern Siberia and the Urals to the north.
The last haplogroup, which includes 16 Khanty men from the village of Russkinskaya, is R1a1a1b2-Y43850. The haplotypes of all samples are quite close, which indicates a recent founder effect (Fig. 4).
Fig. 4. Median network of YSTR haplotypes of haplogroup R1a1a1b2-Y43850 in the Khanty, Khakas, Shors, Tuvans and Altaians.

Khanty are in light blue, Khakas are in blue, Shors are in crimson, Tuvans are in dark green, Altaians are in green.
Khanty-specific terminal SNPs are S7280, FGC687, and FGC38304. The R1a1a1b2-Y43850 variants closest to this lineage are represented with a high frequency in the Khakas and Shors, and less frequently in the Tuvans and Northern Altaians. According to YFull, this haplogroup is approximately 3800 years old. All of these variants belong to four different lineages that split a long time ago. The age of the haplotype cluster in the Khanty was 933 years (SD = 336 years), which is approximately one and a half times lower than the age of the South Siberian lines. The lineage of the Khakas seok Piltir (Y39884 xY43109) is 1469 years old (SD = 342 years). The lineage of this haplogroup (Y62155.2) specific to the Biryusa Khakas seoks of Turan and Khyzyl Khaya and the Shor seoks of Tartkyn, Shor-Kyzai and Kara-Shor has approximately the same age, 1315 years (SD = 227 years). The branch with a wider distribution in the Sayan-Altai populations (Y43109) is even older, at 1566 years (SD = 350 years). The difference in SNPs and STRs among the Khakas, Shors, Tuvans, and Northern Altaians is greater than with the Khanty.
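The "one and a half times" comparison can be checked directly from the point estimates quoted in this paragraph. The following is a minimal sketch, assuming only the ages and SDs reported above; the dictionary labels and the function name `age_ratios` are illustrative and are not part of the original analysis.

```python
# Ages (years) and SDs as quoted in the text; the labels are illustrative.
AGES = {
    "Khanty cluster": (933, 336),
    "Khakas seok Piltir (Y39884 xY43109)": (1469, 342),
    "Y62155.2 (Khakas and Shor seoks)": (1315, 227),
    "Y43109 (Sayan-Altai)": (1566, 350),
}


def age_ratios(ages, reference="Khanty cluster"):
    """Return the age of each lineage divided by the age of the reference cluster."""
    ref_age, _ref_sd = ages[reference]
    return {
        name: age / ref_age
        for name, (age, _sd) in ages.items()
        if name != reference
    }


ratios = age_ratios(AGES)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} times the Khanty cluster age")
```

The sketch compares point estimates only. The reported SDs are large relative to the differences between the lineages, so the ratios should be read as approximate.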
The studied samples of the Khanty show strong heterogeneity in the composition and frequencies of various haplogroups. The phylogeny of various lineages of two haplogroups, N1a2b1b1 and R1a1a1b2-Y43850, indicates their South Siberian origin in the Khanty gene pool. The territory of the Sayan and Altai was the primary focus of the generation of diversity and the expansion of the number of ancestral groups of carriers of these haplogroups in Siberia. It is most likely that the distribution of most Y-chromosome haplogroups among the Khanty occurred during the initial settlement of the Ugric tribes to the north and west.
It is necessary to take into account the fact that the range of the modern Khanty is located to the north of the territory of their ancestors. The West Siberian and Volga-Ural regions were the place of secondary generation of diversity, but not of the formation of the N1a2 haplogroup itself. At the moment, there is no final opinion regarding the place of formation of the ethnoi of the Uralic language family, but numerous data, including the results of studies of the phylogeny and phylogeography of clade N haplogroups, point to Southern Siberia. Linguistic paleontology points to the Proto-Uralic ecological area as a territory limited in the west by the Ural Range, in the north by approximately the Arctic Circle, in the east by the area of the lower reaches of the Angara and Podkamennaya Tunguska and the middle reaches of the Yenisey, and in the south by approximately the modern southern border of the West Siberian taiga from the northern foothills of the Sayan and Altai to the lower reaches of the Tobol and the Middle Urals inclusively (Napolskikh, 2018).
Conclusion
Thus, the gene pools of the two Khanty populations are heterogeneous in their sets of Y-chromosome haplogroups, but very similar in autosomal markers. The expanded composition of terminal SNPs for the identified haplogroups made it possible to describe in detail and clarify the differences in the phylogeny and structure of individual ethnospecific sublines, and to determine their relationship and traces of population expansion in the Khanty gene pool. The results of a comparative analysis of male samples indicate a close genetic relationship of the Khanty with the Altai-Sayan Khakas and Tuvans, as well as with the Nenets, Komi, Udmurts and Kets. The specificity of haplotypes and the detection of various terminal SNPs indicate that the Khanty did not come into contact with other ethnic groups for a long time. The only exception is the Nenets, which included many Khanty clans. For the northern population of the Kazym Khanty, Y-chromosomal lines show a small contribution of the Forest Nenets.
The results obtained do not contradict the generally accepted versions of the Khanty ethnogenesis, but allow us to take a fresh look at this process. The main factor in the formation of the Khanty gene pool was their territorial genetic isolation and later mixing with the newcomer Samoyed population, which, when switching to tundra reindeer husbandry, led to a strong demographic growth of their clans as part of the Nenets. The relatively low genetic diversity in autosomal SNPs and the rather high level of inbreeding in the Khanty confirm this. New information about the structure of the Khanty gene pool is an important addition to the existing anthropological, archaeological, ethnological and linguistic data on their formation and kinship with other peoples.
Conflict of interest
The authors declare no conflict of interest.
References
Alexander D.H., Lange K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics. 2011;12:246. DOI 10.1186/1471-2105-12-246.
Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655-1664. DOI 10.1101/gr.094052.109.
Bandelt H.J., Forster P., Röhl A. Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 1999;16(1):37-48. DOI 10.1093/oxfordjournals.molbev.a026036.
Brook S.I. The World Population. Ethnodemographic Reference Book. Moscow: Nauka Publ., 1986. (in Russian)
Browning B.L., Browning S.R. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics. 2013;194(2):459-471. DOI 10.1534/genetics.113.150029.
Browning S.R., Browning B.L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 2007;81(5):1084-1097. DOI 10.1086/521987.
Grugni V., Battaglia V., Hooshiar Kashani B., Parolo S., Al-Zahery N., Achilli A., Olivieri A., Gandini F., Houshmand M., Sanati M.H., Torroni A., Semino O. Ancient migratory events in the Middle East: new clues from the Y-chromosome variation of modern Iranians. PLoS One. 2012;7(7):e41252. DOI 10.1371/journal.pone.0041252.
Guo Y., He J., Zhao S., Wu H., Zhong X., Sheng Q., Samuels D.C., Shyr Y., Long J. Illumina human exome genotyping array clustering and quality control. Nat. Protoc. 2014;9(11):2643-2662. DOI 10.1038/nprot.2014.174.
Gusev A., Palamara P.F., Aponte G., Zhuang Z., Darvasi A., Gregersen P., Pe’er I. The architecture of long-range haplotypes shared within and across populations. Mol. Biol. Evol. 2012;29(2):473-486. DOI 10.1093/molbev/msr133.
Kharkov V.N., Novikova L.M., Shtygasheva O.V., Luzina F.A., Khitrinskaya I.Y., Volkov V.G., Stepanov V.A. Gene pool of Khakass and Shors for Y chromosome markers: common components and tribal genetic structure. Russ. J. Genet. 2020;56(7):849-855. DOI 10.1134/S1022795420070078.
Kharkov V.N., Valikhova L.V., Yakovleva E.L., Serebrova V.N., Kolesnikov N.A., Petelina T.I., Khitrinskaya I.Yu., Stepanov V.A. Reconstruction of the origin of the Gydan Nenets based on genetic analysis of their tribal structure using a new set of YSTR markers. Russ. J. Genet. 2021;57(12):1414-1423. DOI 10.1134/S1022795421120061.
Kvashin Yu.N. Gydan Nenets: the History of the Formation of the Modern Generic Structure (18-20th Centuries). Moscow: IEA RAS Publ., 2003. (in Russian)
Napolskikh V.V. Essays on Ethnic History. Kazan: Khalikov Institute of Archeology, 2018. (in Russian)
Peoples of West Siberia: Khanty. Mansi. Selkups. Nenets. Enets. Nganasans. Kets. Moscow: Nauka Publ., 2005. (in Russian)
Pimenoff V.N., Comas D., Palo J.U., Vershubsky G., Kozlov A., Sajantila A. Northwestern Siberian Khanty and Mansi in the junction of West and East Eurasian gene pools as revealed by uniparental markers. Eur. J. Hum. Genet. 2008;16(10):1254-1264. DOI 10.1038/ejhg.2008.101.
Skhalyakho R.A., Zhabagin M.K., Yusupov Yu.M., Agdzhoyan A.T., Sabitov Zh.M., Gurianov V.M., Balaganskaya O.A., Dalimova D.A., Davletchurin D.Kh., Turdikulova Sh.U., Chukhryaeva M.Ch., Asilgujin R.R., Akilzhanova A.R., Balanovsky O.P., Balanovska E.V. Gene pool of Turkmens from Karakalpakstan in their Central Asian context (Y-chromosome polymorphism). Vestnik Moskovskogo Universiteta. Seriya 23. Antropologiya = Moscow University Anthropology Bulletin. 2016;3:86-96. (in Russian)
Skotte L., Korneliussen T., Albrechtsen A. Estimating individual admixture proportions from next generation sequencing data. Genetics. 2013;195(3):693-702. DOI 10.1534/genetics.113.154138.
The Peoples of Russia: Encyclopedia. Moscow: Bol’shaya Rossiyskaya Entsyklopedia Publ., 1994. (in Russian)
Valikhova L.V., Kharkov V.N., Zarubin A.A., Kolesnikov N.A., Svarovskaya M.G., Khitrinskaya I.Yu., Shtygasheva O.V., Volkov V.G., Stepanov V.A. Genetic interrelation of the Chulym Turks with Khakass and Kets according to autosomal SNP data and Y-chromosome haplogroups. Russ. J. Genet. 2022;58(10):1228-1234. DOI 10.1134/S1022795422100118.
Zhivotovsky L.A., Underhill P.A., Cinnioglu C., Kayser M., Morar B., Kivisild T., Scozzari R., Cruciani F., Destro-Bisol G., Spedini G., Chambers G.K., Herrera R.J., Yong K.K., Gresham D., Tournev I., Feldman M.W., Kalaydjieva L. The effective mutation rate at Y-chromosome STRs with application to human population divergence time. Am. J. Hum. Genet. 2004;74(1):50-61. DOI 10.1086/380911.
Acknowledgments
The study was supported by the Russian Science Foundation grant No. 22-64-00060 (https://rscf.ru/project/22-64-00060/).
Contributor Information
V.N. Kharkov, Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
N.A. Kolesnikov, Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
L.V. Valikhova, Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
A.A. Zarubin, Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
M.G. Svarovskaya, Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
A.V. Marusin, Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
I.Yu. Khitrinskaya, Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
V.A. Stepanov, Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia