
Genetic stability in the lower Yangtze River basin from Song to Qing Dynasty
Haifeng He
Xinyuan Kong
Le Tao
Liangsai Zhu
Xuanbo Wang
Mengting Xu
Yuanming Chen
Kongyang Zhu
Yu Xu
Haodong Chen
Hao Ma
Rui Wang
Xiaomin Yang
Tianyou Bai
Jianxin Guo
Yang Yang
Xin Jia
Chuan-Chao Wang
Corresponding author.
Contributed equally.
Received 2024 Dec 18; Accepted 2025 Jul 15; Collection date 2025.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Abstract
Background
The lower Yangtze River basin holds a pivotal role in Chinese history. As previous genetic research in this region has primarily focused on modern population datasets, the limited availability of ancient human genomes has hindered our capacity to reconstruct detailed ancient population histories and evaluate the genetic impact of Yellow River-related groups.
Results
Here, we present the first set of ancient human genomes from the lower Yangtze River basin, comprising eight individuals from the Song to Qing Dynasties (960–1912 CE). We observed a high degree of genetic homogeneity in most samples, suggesting long-term regional genetic stability. Seven individuals were estimated to derive 69.3–100% of their ancestry from ancient Yellow River-related populations, while the remainder can be attributed to a southern East Asian substrate. Contemporary Han Chinese residing in the lower Yangtze basin can be modelled as direct genetic descendants of historical individuals from this area. Notably, one Qing Dynasty sample reveals a genetic link to the Eastern Mediterranean.
Conclusions
Our findings illustrate enduring genetic continuity in the lower Yangtze River basin throughout historical times. These findings underscore the region’s role as a genetic bridge between northern and southern East Asia, retaining local rice-farming ancestry while being shaped by southward expansions of Yellow River-related ancestry.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12915-025-02343-3.
Keywords: The lower Yangtze River basin, Historical Jiangsu, Population history, Rice farming-related ancestry
Background
The Yangtze River basin populations were subjected to demographic changes throughout history due to factors such as war, cultural shifts, and domestication. Rice, one of the most important staple foods in the world, was first domesticated in the lower Yangtze River basin 11,000 years ago [1]. While rice-farming techniques spread beyond the basin, previous studies have also identified rice-millet mixed cultivation in the middle and lower Yangtze River basin [2–4]. This mixed cultivation pattern suggests southward migration and genetic influence from millet-farming ancestries in the Yellow River basin. Archaeological evidence further revealed that the Liangzhu culture (5300–4000 years before present (BP)), the last Chinese Neolithic jade culture in the lower Yangtze River basin, maintained strong connections with Neolithic cultures in the Yellow River basin, such as the Dawenkou, Longshan, and Erlitou cultures [5–7]. This cultural interaction may reflect admixture between northern and southern populations in this region.
During the Western Jin Dynasty (266–316 Common Era (CE)), the forces of the Xiongnu sacked the capital, prompting a large population displacement from Northern China to the middle and lower Yangtze River basin. These displaced populations encompassed a broad demographic spectrum that included intelligentsia, farmers, and merchants, who subsequently contributed to the cultural and economic development of the Yangtze River basin. The lower Yangtze River basin, which remained largely unaffected by conflicts during the subsequent Northern and Southern Dynasties (420–589 CE) and the Tang Dynasty (618–907 CE), continued to attract war-displaced populations from the Central Plain region, ultimately cementing its status as China’s foremost economic center [8,9]. This southward displacement of people fleeing war continued until the end of the Southern Song Dynasty (1127–1279 CE). The Mongolian invasion of southern China forced people of the lower Yangtze River basin to move further south and settle in southern China (i.e., present-day Fujian, Guangdong, and Guangxi) and Southeast Asia. However, at the beginning of the Ming Dynasty (1368–1644 CE), a migratory influx from southern China settled in the depopulated eastern region of the lower Yangtze River basin (modern-day Jiangsu Province), driven by the increased demand for manpower to support economic revitalization. During the late Qing Dynasty (1644–1912 CE), the lower Yangtze River basin experienced severe demographic decline due to the Taiping Rebellion, which resulted in an estimated 70% population loss in the Jiangsu region [10,11]. Because of the frequent demographic shifts during these historical periods, characterizing the temporal genetic profiles of populations in the lower Yangtze River basin may present a significant challenge.
Ancient deoxyribonucleic acid (aDNA) research in China has rapidly developed over the past decades, offering genetic insights into past population structure and migration events. However, due to poor DNA preservation conditions, few studies have successfully retrieved high-quality genetic data from ancient individuals in southern China. Our understanding of the genetic structure and population history in this region remains limited. Nevertheless, ancient DNA studies from surrounding areas, including the southeastern coastline, southern China, and southwestern China, have provided important insights into ancient population dynamics [12–14]. These studies reveal that populations along the southeastern coast had already migrated and admixed with southern inland populations before 6400 years ago. Moreover, the expansion of Yellow River-related populations has shaped the genetic profile of southwest China as early as the Neolithic period and of southern China approximately 1500 years ago. Despite these advances, the genetic profile of ancient individuals from the lower Yangtze River basin, a key transitional region between the north and south, has remained largely unknown. Existing studies of population history in this area are based on archaeological and documentary records, which cannot directly reflect shifts in population structure. To date, no ancient genome-wide data from this region have been published due to poor DNA preservation, thus hindering the study of population history in the Yangtze River basin. In this study, we collected and processed a new set of ancient human genomes from the lower Yangtze River basin, with which we aim to contribute to a deeper understanding of the demographic history of the region.
Results
Ancient genomic data production
To clarify the genetic components of the lower Yangtze River basin, we collected 67 individuals from seven archaeological sites dating from the Han Dynasty to the Qing Dynasty in Jiangsu province, east of the lower Yangtze River basin. We applied the single-strand deoxyribonucleic acid (ssDNA) library preparation procedure and an in-solution capture strategy to all samples. We identified aDNA damage patterns in our libraries and estimated modern contamination of all individuals using schmutzi (Additional file 1: Figure S1 and Additional file 2: Table S1). We retained the libraries with contamination rates below 3% and more than 50,000 single nucleotide polymorphisms (SNPs) targeted in the 1240k panel. To further assess contamination, we utilized ANGSD and ContamLD. All fragments were included in the analysis for libraries with contamination rates under 3%. For samples with contamination rates over 3%, we restricted the downstream analysis to DNA fragments presenting aDNA damage patterns. Libraries with fewer than 50,000 SNPs targeted in the 1240k panel after filtering were excluded from the study. We obtained eight individuals with endogenous DNA ranging from 3.14 to 70.29% and 130,884 to 948,990 SNPs targeted in the 1240k panel (Tables 1 and 2). After trimming six bp from each end of all reads, we performed a kinship analysis and detected no kinship among our samples (Additional file 3: Figure S2). All eight individuals were included in the subsequent population genetic analysis.
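The retention rule described above can be sketched as a small decision function. This is only an illustration of the stated thresholds, not the authors' pipeline: the `Library` class, the `classify` function and the label strings are hypothetical names, the treatment of values exactly at a threshold is an assumption, and the example inputs borrow the ContamLD estimates and 1240k SNP counts from Table 2 without claiming which contamination estimator drove each decision in the study.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CONTAMINATION = 0.03  # 3% threshold stated in the text
MIN_SNPS_1240K = 50_000   # minimum number of covered 1240k SNPs stated in the text


@dataclass
class Library:
    """Hypothetical container for the per-library QC values."""
    name: str
    contamination: float                 # contamination estimate as a fraction
    snps_all: int                        # 1240k SNPs covered using all fragments
    snps_damaged: Optional[int] = None   # 1240k SNPs covered by damaged fragments only


def classify(lib: Library) -> str:
    """Return which fragments to use for a library, or 'exclude'."""
    if lib.contamination < MAX_CONTAMINATION:
        label, snps = "all_fragments", lib.snps_all
    else:
        # Above the threshold only fragments with aDNA damage are kept,
        # so the SNP count after damage filtering is the relevant one.
        label, snps = "damaged_fragments_only", lib.snps_damaged or 0
    # Assumption: a library with exactly 50,000 SNPs is retained.
    return label if snps >= MIN_SNPS_1240K else "exclude"


examples = [
    Library("21LHKMM16L", 0.014, 948_990),
    Library("21SMM158SE_1", 0.07, 917_479, 324_611),
    Library("hypothetical_low_yield", 0.01, 30_000),  # invented for illustration
]
for lib in examples:
    print(lib.name, classify(lib))
```

Keeping the two SNP counts separate makes explicit that the 50,000-SNP cut-off is applied after any damage-based filtering, as the text states.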
Table 1.
Summary of all sequenced individuals in this study
| Individual ID | aln.endogenous (before Dedup) | endogenous (after Dedup) | duplication rate | library.length | mapped.length | mapped reads | mapped bases | MT reads |
|---|---|---|---|---|---|---|---|---|
| 21SMM69 | 64.16% | 44.21% | 0.56 | 56 | 70 | 11383588 | 805211580 | 25835 |
| 21LHKMM16L † | 71.69% | 70.29% | 0.07 | 42 | 45 | 16051818 | 734996130 | 3974 |
| 21SMM158SE_1 ‡ | 25.18% | 24.83% | 0.02 | 46 | 49 | 12037162 | 596253889 | 6198 |
| 21LHKM106RL ‡ | 51.16% | 49.39% | 0.07 | 44 | 43 | 12629037 | 549721907 | 7370 |
| 21LHKMN112 † | 18.52% | 15.93% | 0.17 | 42 | 48 | 2712060 | 131144390 | 7920 |
| KWSM199 † | 3.34% | 3.14% | 0.06 | 52 | 55 | 1456754 | 80941301 | 1066 |
| KWSM146LR † | 13.90% | 12.55% | 0.11 | 40 | 45 | 2197490 | 100322894 | 8422 |
| 21LHKM15 | 7.39% | 6.37% | 0.15 | 38 | 44 | 666889 | 29665794 | 2238 |
| 21SMM157SE_2 † | 3.73% | 3.53% | 0.06 | 42 | 47 | 654772 | 31048704 | 2066 |
| 21SMM138E_2_2 † | 5.56% | 5.12% | 0.08 | 39 | 40 | 786679 | 32139124 | 1333 |
| KWSM119 | 2.58% | 2.47% | 0.04 | 41 | 42 | 438158 | 18405880 | 258 |
| 21LHKMM35 | 1.62% | 1.36% | 0.16 | 41 | 51 | 177795 | 9120839 | 120 |
| 21SMM158SE_2 | 0.68% | 0.63% | 0.07 | 41 | 46 | 138426 | 6444378 | 390 |
| 21LHKM98 | 1.48% | 1.32% | 0.11 | 41 | 45 | 148184 | 6746371 | 187 |
| 21SMM157SE_1 | 2.55% | 2.42% | 0.05 | 38 | 45 | 135740 | 6234972 | 265 |
| 21SMM157SE_3_2 | 0.97% | 0.90% | 0.08 | 40 | 46 | 116117 | 5395718 | 222 |
| 21LHKMM15R | 3.19% | 2.08% | 0.36 | 40 | 44 | 103782 | 4662526 | 85 |
| 21LHKMM189S | 2.17% | 1.90% | 0.12 | 42 | 44 | 113434 | 5023799 | 35 |
| 21JCXHM1 | 0.57% | 0.50% | 0.12 | 44 | 42 | 86638 | 3685942 | 3224 |
| 21LHKMM5 | 0.42% | 0.37% | 0.12 | 41 | 40 | 67110 | 2743324 | 632 |
| 21JCHYM6 | 0.45% | 0.33% | 0.27 | 47 | 50 | 51573 | 2611209 | 37 |
| 21SMM168SEb | 0.37% | 0.32% | 0.13 | 42 | 45 | 49333 | 2259857 | 110 |
| 21LHKMM75E | 0.14% | 0.13% | 0.09 | 42 | 38 | 70187 | 2718701 | 66 |
| 21SMM138 | 0.73% | 0.56% | 0.23 | 41 | 40 | 24641 | 1007995 | 5 |
| 21LHKMM14 | 1.02% | 0.71% | 0.3 | 44 | 41 | 31606 | 1314083 | 4 |
| 21LHKMM98S | 0.72% | 0.65% | 0.09 | 43 | 43 | 22741 | 986803 | 5 |
| 21SMM198 | 2.58% | 2.13% | 0.18 | 38 | 39 | 37524 | 1476886 | 5 |
| 21SMM138E_2_1 | 1.01% | 0.77% | 0.24 | 40 | 40 | 18481 | 745298 | 20 |
| 21LHKMM17R_2 | 0.88% | 0.68% | 0.23 | 40 | 37 | 22335 | 829563 | 211 |
| 21SMM157SE_3_1 | 0.66% | 0.57% | 0.13 | 44 | 41 | 24021 | 997933 | 12 |
| 21SMM82 | 0.53% | 0.35% | 0.35 | 37 | 37 | 25280 | 957457 | 14 |
| 21SMM120 | 1.64% | 1.39% | 0.16 | 38 | 38 | 19211 | 748676 | 6 |
| 21LHKMM175E | 0.71% | 0.52% | 0.28 | 39 | 38 | 13769 | 536957 | 12 |
| 21LHKMM139 | 7.11% | 5.20% | 0.28 | 41 | 46 | 34505 | 1612577 | 34 |
| 21SMM213 | 0.39% | 0.29% | 0.27 | 41 | 35 | 25863 | 929759 | 7 |
| 21LHKMM25 | 1.15% | 0.88% | 0.24 | 38 | 34 | 18964 | 654910 | 3 |
| 21LHKM110 | 0.56% | 0.50% | 0.1 | 44 | 40 | 10911 | 439276 | 1 |
| 21LHKMM24 | 0.27% | 0.21% | 0.22 | 40 | 33 | 15080 | 512105 | 2 |
| KWSM110R | 0.20% | 0.17% | 0.15 | 43 | 35 | 15084 | 533834 | 7 |
| 21LHKMM10 | 0.09% | 0.07% | 0.19 | 42 | 34 | 9854 | 341231 | 16 |
| 21LHKMM3 | 2.15% | 1.86% | 0.14 | 38 | 33 | 17060 | 572599 | 4 |
| 21SMM214 | 0.21% | 0.10% | 0.53 | 42 | 35 | 6010 | 212001 | 0 |
| 2019JCDFM3 | 0.18% | 0.15% | 0.15 | 41 | 33 | 11148 | 375833 | 0 |
| 21LHKMM26 | 0.34% | 0.31% | 0.11 | 39 | 33 | 9216 | 306476 | 2 |
| 21LHKM189N_2 | 0.32% | 0.30% | 0.06 | 43 | 35 | 5266 | 187094 | 0 |
| Individual ID | 5'C>T | 3'C>T | average quality | coverage | #SNPs HO | #SNPs 1240k | schmutzi |
|---|---|---|---|---|---|---|---|
| 21SMM69 | 0.0038 | 0.0046 | 59.5 | 26.01% | 541431 | 1057860 | NA |
| 21LHKMM16L † | 0.0941 | 0.0332 | 63.3 | 23.74% | 493542 | 948990 | 0.01 (0-0.02) |
| 21SMM158SE_1 ‡ | 0.1982 | 0.0881 | 63.3 | 19.26% | 485294 | 917479 | 0.01 (0-0.02) |
| 21LHKM106RL ‡ | 0.2386 | 0.1393 | 63.6 | 17.76% | 459131 | 859499 | 0.01 (0-0.02) |
| 21LHKMN112 † | 0.1517 | 0.1094 | 63.4 | 4.24% | 242308 | 442878 | 0.01 (0-0.02) |
| KWSM199 † | 0.2148 | 0.1378 | 62.6 | 2.62% | 205731 | 382281 | 0.01 (0-0.02) |
| KWSM146LR † | 0.1731 | 0.1126 | 63.3 | 3.24% | 194523 | 355641 | 0.01 (0-0.02) |
| 21LHKM15 | 0.1283 | 0.0876 | 63.4 | 0.96% | 77367 | 140890 | NA |
| 21SMM157SE_2 † | 0.1307 | 0.091 | 63.3 | 1.00% | 76528 | 138029 | 0.01 (0-0.02) |
| 21SMM138E_2_2 † | 0.186 | 0.1078 | 63.5 | 1.04% | 73524 | 130884 | 0.01 (0-0.02) |
| KWSM119 | 0.2561 | 0.1329 | 63.6 | 0.60% | 51960 | 93248 | 0.99 (0.98-0.99) |
| 21LHKMM35 | 0.011 | 0.0098 | 62.9 | 0.30% | 26044 | 47683 | NA |
| 21SMM158SE_2 | 0.1596 | 0.1023 | 63.2 | 0.21% | 18168 | 33479 | 0.99 (0.98-0.99) |
| 21LHKM98 | 0.1665 | 0.0886 | 63.4 | 0.22% | 17852 | 32788 | 0.99 (0.98-0.99) |
| 21SMM157SE_1 | 0.1581 | 0.0996 | 63.2 | 0.20% | 16630 | 30098 | 0.99 (0.98-0.99) |
| 21SMM157SE_3_2 | 0.0731 | 0.0542 | 63.2 | 0.17% | 14658 | 27008 | 0.99 (0.98-0.99) |
| 21LHKMM15R | 0.1653 | 0.1176 | 63.7 | 0.15% | 14422 | 25414 | NA |
| 21LHKMM189S | 0.2088 | 0.1177 | 63.5 | 0.16% | 13253 | 24037 | NA |
| 21JCXHM1 | 0.1413 | 0.103 | 63.6 | 0.12% | 9090 | 16465 | NA |
| 21LHKMM5 | 0.2231 | 0.1379 | 63.6 | 0.09% | 7730 | 13711 | 0.01 (0-0.02) |
| 21JCHYM6 | 0.0198 | 0.016 | 62.3 | 0.08% | 6606 | 11787 | NA |
| 21SMM168SEb | 0.1248 | 0.0736 | 63.3 | 0.07% | 6145 | 11374 | NA |
| 21LHKMM75E | 0.2306 | 0.1548 | 63.7 | 0.09% | 4379 | 7751 | 0.99 (0.98-0.99) |
| 21SMM138 | 0.1675 | 0.0934 | 63.5 | 0.03% | 2545 | 4503 | NA |
| 21LHKMM14 | 0.0716 | 0.0437 | 63.3 | 0.04% | 2096 | 3719 | NA |
| 21LHKMM98S | 0.0999 | 0.0552 | 63.3 | 0.03% | 1844 | 3259 | NA |
| 21SMM198 | 0.0548 | 0.0338 | 63.2 | 0.05% | 1765 | 3208 | NA |
| 21SMM138E_2_1 | 0.1007 | 0.0562 | 63.5 | 0.02% | 1451 | 2611 | NA |
| 21LHKMM17R_2 | 0.0787 | 0.0502 | 63.4 | 0.03% | 1449 | 2557 | NA |
| 21SMM157SE_3_1 | 0.0517 | 0.0392 | 63.4 | 0.03% | 1360 | 2415 | NA |
| 21SMM82 | 0.0239 | 0.0126 | 63.1 | 0.03% | 1264 | 2308 | NA |
| 21SMM120 | 0.0299 | 0.0224 | 63.1 | 0.02% | 1097 | 1968 | NA |
| 21LHKMM175E | 0.1331 | 0.0596 | 63.5 | 0.02% | 1026 | 1821 | NA |
| 21LHKMM139 | 0.0752 | 0.0466 | 63.4 | 0.05% | 945 | 1687 | NA |
| 21SMM213 | 0.0367 | 0.0284 | 63.2 | 0.03% | 596 | 1104 | NA |
| 21LHKMM25 | 0.0536 | 0.0341 | 63.3 | 0.02% | 421 | 775 | NA |
| 21LHKM110 | 0.0309 | 0.0235 | 63.2 | 0.01% | 313 | 567 | NA |
| 21LHKMM24 | 0.0421 | 0.0309 | 63.4 | 0.02% | 297 | 520 | NA |
| KWSM110R | 0.037 | 0.0259 | 63.2 | 0.02% | 284 | 512 | NA |
| 21LHKMM10 | 0.0498 | 0.0274 | 63.4 | 0.01% | 248 | 457 | NA |
| 21LHKMM3 | 0.0565 | 0.0291 | 63.2 | 0.02% | 141 | 286 | NA |
| 21SMM214 | 0.0358 | 0.0176 | 63.2 | 0.01% | 146 | 279 | NA |
| 2019JCDFM3 | 0.0163 | 0.017 | 63.2 | 0.01% | 137 | 237 | NA |
| 21LHKMM26 | 0.0322 | 0.0233 | 63.3 | 0.01% | 105 | 198 | NA |
| 21LHKM189N_2 | 0.0497 | 0.025 | 63.3 | 0.01% | 91 | 169 | NA |
Individuals used in downstream analyses are marked with † (all SNPs used) or ‡ (only SNPs from sequences with a PMD score of at least 3 used)
Table 2.
Summary of the samples that passed quality control and were used in downstream analysis
| Individual ID | Date (AD) | Lat. | Long. | endogenous | duplication rate | mapped.length | MT reads | MT haplotype | YHaplo | YLeaf | 5'C>T | 3'C>T | average quality | coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21LHKMM16L | Qing Dynasty (1644-1912) | 34.56794167 | 119.1629195 | 70.29% | 0.07 | 45 | 3974 | M7c1b2b | O1a1a2 | O-CTS52*(xO-CTS701,O-FGC66104,O-Y78688,O-Y174212) | 0.0941 | 0.0332 | 63.3 | 23.74% |
| 21SMM158SE_1 | Ming Dynasty (1368–1644) | 33.690316 | 118.634949 | 24.83% | 0.02 | 49 | 6198 | D4b2 | A | NA | 0.1982 | 0.0881 | 63.3 | 19.26% |
| 21LHKM106RL | Qing Dynasty (1644-1912) | 34.569017 | 119.178539 | 49.39% | 0.07 | 43 | 7370 | G2b2b | A | NA | 0.2386 | 0.1393 | 63.6 | 17.76% |
| 21LHKMN112 | Song Dynasty (960–1279) | 34.56794167 | 119.1629195 | 15.93% | 0.17 | 48 | 7920 | B4c1b2c2 | O1a1a2 | O-K587*(xO-CTS701,O-FGC66104,O-Y78688,O-Y137054) | 0.1517 | 0.1094 | 63.4 | 4.24% |
| KWSM199 | Qing Dynasty (1644-1912) | 34.569017 | 119.178539 | 3.14% | 0.06 | 55 | 1066 | M10a1a1b1 | A | NA | 0.2148 | 0.1378 | 62.6 | 2.62% |
| KWSM146LR | Ming Dynasty (1368–1644) | 34.569017 | 119.178539 | 12.55% | 0.11 | 45 | 8422 | D5b | O2a1c1a1 | O-F16340*(xO-MF7420) | 0.1731 | 0.1126 | 63.3 | 3.24% |
| 21SMM157SE_2 | Ming Dynasty (1368–1644) | 33.690316 | 118.634949 | 3.53% | 0.06 | 47 | 2066 | D5a2 | O2b1a | O-Y29783 | 0.1307 | 0.091 | 63.3 | 1.00% |
| 21SMM138E_2_2 | Qing Dynasty (1644-1912) | 33.690316 | 118.634949 | 5.12% | 0.08 | 40 | 1333 | B4b1c | A | NA | 0.186 | 0.1078 | 63.5 | 1.04% |
| Individual ID | #SNPs HO | pmd3filter_HO | #SNPs 1240k | pmd3filter_1240k | xCov (1240k) | yCov (1240k) | autoCov(1240k) | xCov/autoCov | gender(Cov) | schmutzi | angsd (Method1,old, MoM) | nSNPs | ContamLD_ExtCorr_Estimate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21LHKMM16L | 493542 | NA | 948990 | NA | 1.87974 | 2.75252 | 4.72286 | 0.398008834 | M | 0.01 (0-0.02) | 0.007645 | 7522 | 0.014 |
| 21SMM158SE_1 | 485294 | 180080 | 917479 | 324611 | 2.72654 | 0.0113946 | 3.43195 | 0.794457961 | F | 0.01 (0-0.02) | NA | <200 | 0.07 |
| 21LHKM106RL | 459131 | 195912 | 859499 | 350615 | 3.28572 | 0.012121 | 4.05952 | 0.809386331 | F | 0.01 (0-0.02) | NA | <200 | 0.061 |
| 21LHKMN112 | 242308 | NA | 442878 | NA | 0.354736 | 0.48763 | 0.857697 | 0.41359128 | M | 0.01 (0-0.02) | 0.01578 | 1257 | 0.013 |
| KWSM199 | 205731 | NA | 382281 | NA | 0.473056 | 0.00287712 | 0.527735 | 0.896389286 | F | 0.01 (0-0.02) | NA | <200 | 0.017 |
| KWSM146LR | 194523 | NA | 355641 | NA | 0.258473 | 0.381077 | 0.654585 | 0.394865449 | M | 0.01 (0-0.02) | 0.028234 | 752 | 0.016 |
| 21SMM157SE_2 | 76528 | NA | 138029 | NA | 0.0761866 | 0.111567 | 0.188506 | 0.40416008 | M | 0.01 (0-0.02) | 0.013656 | 91 | −0.028 |
| 21SMM138E_2_2 | 73524 | NA | 130884 | NA | 0.144132 | 0.00256377 | 0.203977 | 0.706609078 | F | 0.01 (0-0.02) | NA | <200 | −0.013 |
We note that the date of each sample was estimated from archaeological materials
# endogenous : endogenous rate from Samtools flagstats (after Dedup)
# duplication : duplication rate from Dedup log
# mapped.length : average length of reads mapped to human only from Samtools stats
# mapped reads : mapped reads from Samtools stats
# MT haplotype : MT haplotype obtained from Haplogrep2
# YHaplo : Y haplotype obtained from Yhaplo
# YLeaf : Y haplotype obtained from Yleaf
# likelihood : likelihood of MT haplotype
# 5'C>T : C->T substitution rate in first position in 5'end
# 3'C>T : C->T substitution rate in first position in 3'end
# average quality : average base quality of mapped bases
# coverage : mapped bases/3095693981*100%
# #SNPs HO : #SNPs coverage on HO
# pmd3filter_HO : SNPs coverage on HO restrict to sequences with a PMD score of at least 3
# #SNPs 1240k : #SNPs coverage on 1240k
# pmd3filter_1240k : SNPs coverage on 1240k restrict to sequences with a PMD score of at least 3
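As a quick check of the coverage definition in the legend (mapped bases divided by 3095693981, expressed as a percentage), the following sketch recomputes the coverage column for two rows of Table 1. The function and constant names are illustrative only.

```python
GENOME_LENGTH_BP = 3_095_693_981  # denominator given in the table legend


def coverage_percent(mapped_bases: int) -> float:
    """Coverage as defined in the legend: mapped bases / genome length * 100."""
    return mapped_bases / GENOME_LENGTH_BP * 100


# Mapped-base counts taken from Table 1.
print(f"21SMM69:    {coverage_percent(805_211_580):.2f}%")  # table reports 26.01%
print(f"21LHKMM16L: {coverage_percent(734_996_130):.2f}%")  # table reports 23.74%
```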
All individuals who passed our filter came from three sites in northern Jiangsu Province (Kongwangshan, Lianyungang gym, and Nainaimiaodong), a relatively mountainous area that may benefit DNA preservation (Fig. 1A). These eight individuals in Jiangsu, ranging from the Song to the Qing Dynasties, provided genomic data for studying historical population structure in the lower Yangtze River basin.
Fig. 1.
Geographic location and principal components analysis of newly sampled individuals in this study. Each symbol stands for a population or individual in A and B, respectively. A Geographic location of newly sampled individuals and published ancient populations from East Asia used in this study. Made with Natural Earth. Free vector and raster map data @naturalearthdata.com. B Principal components analysis (PCA) of East Asian individuals. Ancient individuals (colored symbols) were projected onto the PCs calculated by modern East Asians (grey symbols)
Genetic stability and diversity in the lower Yangtze River basin
Given the complex demographic history of the lower Yangtze River basin, we aimed to explore how population dynamics have shifted in this region over time. We first divided our individuals into three groups based on time (Jiangsu_Song, Jiangsu_Ming and Jiangsu_Qing) and performed principal component analysis (PCA) based on the “Human Origins” dataset (see also Methods and materials). Compared with ancient individuals from the Yellow River region, the samples from the Yangtze River basin were slightly shifted along PC1 towards the ancient southern populations (Fig. 1B). The clustering pattern of ancient Jiangsu individuals reveals their close genetic relationship with ancient Yellow River-related populations, suggests a minor contribution from southern ancestry, and indicates enduring genetic stability throughout this region’s complex demographic history.
We observed genetic homogeneity and similar genetic profiles among our individuals, except for KWSM199 from the Qing Dynasty (Fig. 2A, B). Outgroup-f3 statistics also support a close genetic affinity among Jiangsu individuals from different dynasties (Fig. 2C, Additional file 4: Table S2A). Furthermore, their affinity was confirmed by non-significant results in f4(Mbuti, ancient East Asians; Jiangsu_ancient, Jiangsu_ancient), except for Nagqu1.1 k, who may have experienced continuous genetic exchanges with ancient populations from the Central Plain since the Yuan Dynasty (Additional file 5: Table S3A).
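For readers unfamiliar with the statistic, outgroup-f3 measures the genetic drift that two populations share relative to an outgroup; larger values indicate more shared drift. The sketch below is a naive estimator computed directly on population allele frequencies, with invented frequencies and hypothetical population names. It omits the finite-sample corrections and standard errors that dedicated software applies, so it illustrates the idea rather than the analysis reported here.

```python
import numpy as np


def outgroup_f3(outgroup, pop_a, pop_b) -> float:
    """Naive f3(outgroup; A, B): mean over SNPs of (o - a) * (o - b)."""
    o, a, b = (np.asarray(x, dtype=float) for x in (outgroup, pop_a, pop_b))
    return float(np.mean((o - a) * (o - b)))


# Invented allele frequencies at four SNPs, for illustration only.
outgroup = [0.0, 0.0, 0.0, 0.0]
target = [0.5, 0.5, 0.5, 0.5]
close_pop = [0.5, 0.5, 0.5, 0.5]  # drifted away from the outgroup together with target
far_pop = [0.5, 0.0, 0.5, 0.0]    # shares only part of that drift

print(outgroup_f3(outgroup, target, close_pop))
print(outgroup_f3(outgroup, target, far_pop))
```

Ranking candidate populations by this value is what underlies a "top 10" style plot: the population with the highest value shares the most drift with the target.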
Fig. 2.
Genetic structure of ancient individuals in the lower Yangtze River basin. A Genetic homogeneity analysis using pairwise qpWave; the number in each cell is the p-value of the Rank = 0 qpWave model based on the 1240k dataset. B Unsupervised ADMIXTURE at K = 4. See also Additional file 6: Figure S3. C Outgroup-f3 analysis to evaluate which population shared the most alleles with our newly sampled individuals. We plotted the top 10 populations from our study to provide visual context for our findings. See also Additional file 4: Table S2A
KWSM199 showed heterogeneity compared to most Jiangsu individuals (p-value < 0.05 in pairwise-qpWave analysis). Therefore, we labelled this sample as Jiangsu_Qing_o. We also found that this outlier harbored a Western Eurasian-related component in ADMIXTURE analysis (Fig. 2B and Additional file 6: Figure S3). Next, we applied f4 analysis to quantify the affinity between Jiangsu_Qing_o and Western Eurasian populations. Weak signals in f4(Mbuti, all ancient populations; Jiangsu_Qing_o, YR_MN/YR_LBIA) suggested that Jiangsu_Qing_o shares more alleles with some Eastern Mediterranean populations (Additional file 5: Table S3B). A suggestive result from the subsequent f4(Mbuti, related populations; Jiangsu_Qing_o, Jiangsu_Qing) indicates a potential genetic connection between Jiangsu_Qing_o and Eastern Mediterranean-related populations (Additional file 5: Table S3C). In our admixture modelling, we used Egypt_Ptolemaic, which shares more alleles with Jiangsu_Qing_o, as a potential source. Our results indicate that Jiangsu_Qing_o can be modelled as a mixture of Yellow River-related populations and southern East Asians, with a minor contribution from an Eastern Mediterranean-related lineage (~7.6%) (Fig. 3).
Fig. 3.
Ancestry components modelled by qpAdm. Each color represents an ancient population used in the analysis. See also Additional file 7: Table S4A
Southern China ancestry retained in historical Jiangsu individuals
Our Jiangsu individuals are shifted towards southern East Asians in the PCA, indicating increased southern-related components compared to Yellow River-related populations (Fig. 1B). To investigate which population contributed to the formation of the historical Jiangsu population, we used f4-statistics in the form of f4(Mbuti, aEA; YR, Jiangsu_HE). As expected, significant signals confirmed the connection between Jiangsu_HE and southern East Asian populations (Z-score > 3) (Additional file 5: Table S3D). Interestingly, Tanshishan and Xitoucun, two ancient rice-farming populations in Fujian from about 4500 years ago, exhibited closer genetic affinity to our ancient Jiangsu individuals (Z-score > 3). Due to the lack of ancient DNA from the lower Yangtze River basin and considering that rice was first domesticated there, we used Tanshishan and Xitoucun as proxies for the unknown ancient populations in the lower Yangtze River basin.
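To make the f4 test and its Z-score concrete, here is a minimal sketch assuming the common convention f4(W, X; Y, Z) = mean of (w − x)(y − z) over SNPs, under which a positive value means X shares more alleles with Z than with Y. The standard error comes from a simple delete-one block jackknife over contiguous, roughly equal-sized SNP blocks. Dedicated software uses a weighted jackknife over blocks defined by genetic distance, so the function below, the number of blocks and the simulated frequencies are all simplifying assumptions.

```python
import numpy as np


def f4_block_jackknife(w, x, y, z, n_blocks: int = 10):
    """Return (f4 estimate, jackknife standard error, Z-score)."""
    w, x, y, z = (np.asarray(v, dtype=float) for v in (w, x, y, z))
    per_snp = (w - x) * (y - z)
    estimate = per_snp.mean()

    # Leave one contiguous block of SNPs out at a time and re-estimate.
    blocks = np.array_split(per_snp, n_blocks)
    total, n = per_snp.sum(), per_snp.size
    leave_one_out = np.array([(total - b.sum()) / (n - b.size) for b in blocks])

    # Delete-one jackknife variance, assuming roughly equal block sizes.
    variance = (n_blocks - 1) / n_blocks * np.sum(
        (leave_one_out - leave_one_out.mean()) ** 2
    )
    se = float(np.sqrt(variance))
    return float(estimate), se, float(estimate / se)


# Simulated allele frequencies (illustration only): pop_z is built so that its
# difference from pop_y is correlated with the difference between pop_w and pop_x.
rng = np.random.default_rng(0)
n_snps = 1000
pop_w = rng.uniform(0.0, 1.0, n_snps)
pop_x = rng.uniform(0.0, 1.0, n_snps)
pop_z = rng.uniform(0.3, 0.7, n_snps)
pop_y = pop_z + 0.2 * (pop_w - pop_x)

est, se, z_score = f4_block_jackknife(pop_w, pop_x, pop_y, pop_z)
print(f"f4 = {est:.4f}, SE = {se:.4f}, Z = {z_score:.1f}")
```

A threshold such as |Z| > 3 is then applied to the returned Z-score to call a signal significant.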
We found that these two populations (i.e., Tanshishan and Xitoucun) are genetically closer to other southern ancient populations in Guangxi (e.g., GaoHuaHua and BaBanQinCen) (Additional file 5: Table S3E), suggesting that the affinity between historical Jiangsu individuals and Tanshishan/Xitoucun may reflect a broad southern-related ancestry. However, we successfully modelled all historical Jiangsu individuals with GaoHuaHua and BaBanQinCen included in the outgroup set: Jiangsu_Song can be 1-way modelled as YR_LBIA, while Jiangsu_Ming and Jiangsu_Qing can be modelled as a mixture of YR_LBIA (69.3–80.2%) and Tanshishan (19.8–30.7%), indicating that historical Jiangsu individuals retained the local rice-farming ancestry despite the expansion of Yellow River-related populations (Fig. 3 and Additional file 7: Table S4A).
To investigate the admixture time between Yellow River-related populations and local ancestries in Jiangsu, we performed DATES analysis using YR_LBIA and Tanshishan as proxies for the northern and local population sources, respectively. Our results suggest that the genetic contribution from Yellow River-related ancestries might date to approximately 4400 years ago (Additional file 8: Table S5). Previous studies have proposed that the rice farming population from southern China may have migrated northward to the Yellow River basin between the Middle and Late Neolithic periods [15]. This is consistent with our findings, which suggest that the northern ancestries admixed with ancient Jiangsu populations around the same time. Therefore, these results support a hypothesis of a bidirectional expansion: ancient populations from the lower Yangtze River basin moved northward into the Yellow River basin, while Yellow River-related populations also expanded southward into the Yangtze River basin during the Middle to Late Neolithic periods.
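Admixture dating of this kind rests on the expectation that the covariance of ancestry between pairs of markers decays exponentially with the genetic distance separating them, with a decay rate (per Morgan) equal to the number of generations since admixture. The sketch below illustrates only that principle: it fits a log-linear regression to a noise-free simulated decay curve, whereas dedicated tools fit an exponential with an additional constant term and estimate uncertainty by jackknife. The function names, the simulated curve and the generation time are assumptions, and no values from this study are used.

```python
import numpy as np


def fit_generations(distance_morgans, covariance) -> float:
    """Estimate generations since admixture from an exponential decay curve.

    Assumes covariance = A * exp(-generations * distance) with positive values,
    so that log(covariance) is linear in distance.
    """
    d = np.asarray(distance_morgans, dtype=float)
    y = np.asarray(covariance, dtype=float)
    slope, _intercept = np.polyfit(d, np.log(y), 1)
    return float(-slope)


def generations_to_years(generations: float, years_per_generation: float) -> float:
    """Convert generations to years; the generation time is an assumed input."""
    return generations * years_per_generation


# Simulated decay curve for an admixture event 100 generations before the sample lived.
distance = np.linspace(0.005, 0.2, 40)  # genetic distance in Morgans
curve = 0.01 * np.exp(-100 * distance)

generations = fit_generations(distance, curve)
years = generations_to_years(generations, years_per_generation=28)  # assumed value
print(f"{generations:.1f} generations, about {years:.0f} years before the sample")
```

The estimate is relative to when the sampled individuals lived, so expressing it as years before present also requires adding the age of the samples.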
Genetic contribution to modern Han Chinese
The Taiping Rebellion drastically reduced the population in the lower Yangtze River basin, especially in Jiangsu, leaving the question of what role historical Jiangsu populations played in modern Chinese populations. All Jiangsu individuals clustered with modern Han Chinese in PCA (Fig. 4B). Unsurprisingly, results from the outgroup-f3 analysis show that ancient Jiangsu individuals share the highest genetic affinity with modern Han Chinese (Fig. 4C and Additional file 4: Table S2B). This genetic connection is further supported by significant negative results from the subsequent f4 analysis in the form of f4(Mbuti, Jiangsu_HE; modern_Han, modern_EA) (Additional file 5: Table S3F).
Fig. 4.
Relationships between modern East Asian and ancient individuals in the lower Yangtze River basin. A Geographic location of modern populations in this study. Each symbol stands for a population or individual in A and B, respectively. B PCA of East Asian individuals. Ancient individuals were projected onto the PCs, which were calculated by modern East Asians. C Outgroup-f3 analysis to evaluate which modern East Asian population shared the most alleles with our newly sampled individuals. To provide visual context for our findings, we plotted the top 10 populations from our analysis. See also Additional file 4: Table S2B
To quantify the genetic contribution of ancient Jiangsu individuals to modern Han Chinese from the lower Yangtze River basin and surrounding regions, we used qpAdm to model the admixture proportions. We found that modern Han Chinese in the lower Yangtze River basin (i.e., Jiangsu, Zhejiang, and Shanghai) can be modelled as direct descendants of Jiangsu_Qing. In contrast, Han_Shandong is genetically homogeneous with Jiangsu_Song, which has more northern ancestry than Jiangsu_Qing (Additional file 7: Table S4B).
Discussion
Despite multiple waves of emigration and significant population decline, the population structure in the lower Yangtze River basin barely changed, maintaining genetic stability and high genetic affinity to Yellow River-related populations. However, we identified an outlier labelled as Jiangsu_Qing_o, who appears to share alleles with Eastern Mediterranean-related populations. Our qpAdm analysis indicated that Jiangsu_Qing_o can be modelled as a mixture of Yellow River-related populations and Tanshishan with a minor contribution (7.6%) from Egypt_Ptolemaic, used here as a proxy for Eastern Mediterranean-related populations. Notably, f4(Mbuti, related populations; Jiangsu_Qing_o, Jiangsu_Qing) provides only suggestive evidence for genetic affinity between Jiangsu_Qing_o and Egypt_Ptolemaic. Additional ancient human samples from this region are required to further evaluate the possibility of international intermarriage in ancient Jiangsu. Moreover, because current methods for estimating admixture time are designed for two-source mixtures, we cannot directly determine the admixture date for Jiangsu_Qing_o. According to historical records, Dapu Port, an international port in Jiangsu, had been open since 1905, providing an environment that enabled intermarriage. However, government-regulated trade prior to the port’s opening may also have contributed to genetic admixture. Therefore, we hypothesized that this East–West admixture event could have occurred before the Qing Dynasty. This finding suggests that the prosperous economy in historical Jiangsu provided an environment conducive to international intermarriage, thereby increasing genetic diversity against a stable genetic background in this area.
On the other hand, the expansion of Yellow River-related populations significantly changed the population structure in southern China [12,13]. Consistent with other ancient populations in southern China, historical Jiangsu individuals are highly associated with Yellow River-related populations. However, we observed a connection between historical Jiangsu individuals and some southern ancient ancestries (Tanshishan was used in our analysis as the proxy for the unknown ancient ancestry in the lower Yangtze River basin). We therefore inferred that historical Jiangsu populations still retained their local rice-farming-related ancestry. Previous studies demonstrated that ancestries in the Yellow River basin received additional gene flow from the northward expansion of rice-farming communities during the Neolithic period. Given our successful modelling of admixture events between ancient populations in the Yellow River and Yangtze River basins around 4500 years ago, we proposed that migrations between the two regions during the Neolithic period were bidirectional [1,15]. However, this proposal rests on two assumptions: that millet and rice farming ancestries in the Neolithic were distinctly different populations, and that ancient rice farming ancestry in Fujian (i.e., Tanshishan and Xitoucun) can be representative of the unknown rice farming ancestry in the Yangtze River basin.
According to historical records, the Taiping Rebellion drastically decreased the population in the middle and lower Yangtze River basin, especially in Jiangsu, which lost around 70% of its population during that period [10]. While people from northern Jiangsu contributed the majority of the population of southern Jiangsu, some regions of southern Jiangsu still received around 50% of their immigrants from other areas such as Anhui and Henan [11], which may have brought more northern ancestry from the Central Plain to modern populations in Jiangsu. Nonetheless, modern Han Chinese in Jiangsu and adjacent regions, including Shanghai, Zhejiang, and Shandong, can be modelled as deriving all their ancestry from historical Jiangsu populations. All of these Han Chinese groups are genetically homogeneous with Jiangsu_Qing except Han in Shandong, who exhibit more northern ancestry (i.e., can be modelled one-way as Jiangsu_Song). Despite significant documented migration in the lower Yangtze River basin, we speculate that this genetic affinity over time has two explanations: (1) frequent migrations during historical periods homogenized genetic profiles across the lower Yangtze River basin; and (2) the HO dataset has limited power to detect subtle gene flow among populations with similar genetic components.
We note that our study is based on eight individuals from historical Jiangsu, who cannot fully represent the genetic structure of the lower Yangtze River basin. We also acknowledge that our analysis lacks ancient genomes from rice-farming populations that lived in the Neolithic Yangtze River basin. Thus, additional ancient genomic data from the Neolithic Yangtze River basin are needed to understand genetic change in this area.
Conclusions
This study reports the first ancient genomic dataset from the lower Yangtze River basin. We investigated the demographic history of Jiangsu from the Song to the Qing Dynasty by analyzing eight ancient genomes from Jiangsu. We showed that individuals from historical Jiangsu were genetically homogeneous and had a high affinity to Yellow River-related populations, indicating high genetic stability in this region. Additionally, we identified an outlier that may harbor Eastern Mediterranean-related ancestry. Despite the expansion of ancient northern populations, ancient southern ancestry persisted in the lower Yangtze River basin and was retained by all historical Jiangsu individuals. Present-day Han Chinese residing in the lower Yangtze River basin can be modelled as descendants of historical Jiangsu individuals. Our results provide the first ancient genetic insights into the populations of the lower Yangtze River basin. With genetic data from this region, we can investigate shifts in population structure and relate them to social and environmental changes.
Methods
Archaeological information
The individuals sampled in this study are from archaeological sites in Jiangsu province; all individuals are dated based on archaeological evidence.
Dafencun
This site lies just north of Dafen village in Changzhou, Jiangsu province. Four tombs were found here, and the archaeological materials were identified as belonging to the Song Dynasty [16]. We sampled one individual (2019JCDF M3), identified as male based on anatomy, for library preparation, but it was excluded due to poor endogenous DNA.
Hongmeixicun
This site is near Hongmei West Village in Changzhou, Jiangsu province. The archaeological materials date the individuals to the Song Dynasty. One individual (21JCXH M1), identified as female based on anatomy, was sampled at this site for library preparation but excluded due to poor endogenous DNA.
Beishezhuang
This site lies just north of Beishezhuang village in Changzhou, Jiangsu province. Archaeological materials identified at this site span from the Three Kingdoms period to the Qing Dynasty. We sampled one male (2020JCB M41), assigned to the Ming Dynasty based on burial materials and identified as male based on anatomy, but it was excluded due to poor endogenous DNA.
Huayuandi
This site is a family tomb at the Huayuandi village in Changzhou, Jiangsu province. Based on its casting structure, which was commonly seen in the Ming Dynasty [13], this archaeological site was identified as from the Ming Dynasty. We sampled two females who seemed to be the concubines of a general. Our population genetic analysis did not include these two samples due to poor endogenous DNA.
Kongwangshan
This archaeological site lies just south of Kongwangshan village in Lianyungang, Jiangsu province. It spans the Zhou Dynasty to the Qing Dynasty [17]. In this study, we collected one male from the Ming Dynasty, and ten males and seven females from the Qing Dynasty. We obtained high-quality genomic data from three individuals, 21LHKM106RL, KWSM199, and KWSM146LR, assigned to the Ming Dynasty based on archaeological evidence.
Lianyungang gym
This site lies just south of the gymnasium of the Haizhou region in Lianyungang, Jiangsu province. It spans the periods of the Wei, Jin, Northern, and Southern Dynasties to the Qing Dynasty [18]. We prepared libraries for and sequenced 25 individuals from this site, dated from the late Tang Dynasty to the Qing Dynasty. Our downstream analysis used one male, 21LHKMM16L, from the Qing Dynasty and one female, 21LHKMN112, from the Song Dynasty, dated based on archaeological materials and a radiocarbon-dated coffin.
Nainaimiaodong
This site lies just east of a temple in Suqian, Jiangsu province. It spans the Han Dynasty to the Qing Dynasty. We obtained samples from 36 individuals at this site. Our downstream analysis used two individuals, 21SMM158SE_1 and 21SMM157SE_2, from the Ming Dynasty and one individual, 21SMM138E_2_2, from the Qing Dynasty, all dated based on archaeological materials.
Ancient DNA extraction and library preparation
We screened 67 individuals from seven archaeological sites in Jiangsu province. All samples were processed in the dedicated ancient DNA clean room at the Institute of Anthropology, Xiamen University. Human remains were first cleaned with 75% ethanol and 10% sodium hypochlorite solution to eliminate dust and external DNA contaminants, followed by 30 min of exposure to ultraviolet light. We used dental drills to obtain powder from teeth and the petrous parts of the temporal bones. DNA was extracted following a modified version of Rohland’s protocol [19]: the powder was digested in 1 mL lysis buffer containing 0.5 mM EDTA and 0.25 mg/mL Proteinase K in a shaker at 37 ℃ and 300 rpm. The DNA solution was purified with a MinElute kit (Qiagen, Germany) following the manufacturer’s manual. We applied a single-strand library preparation procedure to prepare libraries for all samples [20]. We used an in-solution DNA hybridization capture technique to enrich mitochondrial and nuclear DNA [21]. Enriched libraries were sequenced on the Element platform with a customized sequencing primer: ACACTCTTTCCCTACACGACGCTCTTCC.
Sequence data processing
We used AdapterRemoval v2.3.15 to trim adaptors and merge paired reads into a single sequence [22]. Merged reads at least 30 bp in length were mapped onto the human reference genome hs37d5 using the BWA-aln/samse algorithm implemented in BWA v0.7.176, with the parameters -l 1024 and -n 0.01 [23]. PCR duplicates were removed using dedupe v0.12.3 [24]. Each end of the reads was trimmed using trimBam implemented in BamUtil v1.0.14 [25]. We further filtered the alignment quality using SAMtools with the parameters -q30 and -Q30 [26].
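The mapping and quality-filtering steps above can be sketched as command lines. This is a minimal sketch and not the authors' script: the helper name build_mapping_commands and all file names are hypothetical, only the parameters stated in the text (-l 1024, -n 0.01, mapping quality 30) are taken from the paper, and adaptor trimming, duplicate removal, and end trimming are left out because their exact settings are not given.

```python
from typing import List


def build_mapping_commands(reference: str, merged_fastq: str, prefix: str) -> List[List[str]]:
    """Return bwa/samtools argument lists for one library; nothing is executed."""
    sai = f"{prefix}.sai"        # hypothetical intermediate file names
    sam = f"{prefix}.sam"
    bam = f"{prefix}.q30.bam"
    return [
        # -l 1024 effectively disables seeding; -n 0.01 relaxes the mismatch limit
        ["bwa", "aln", "-l", "1024", "-n", "0.01", "-f", sai, reference, merged_fastq],
        # single-end alignment of the merged (collapsed) reads
        ["bwa", "samse", "-f", sam, reference, sai, merged_fastq],
        # keep alignments with mapping quality of at least 30
        ["samtools", "view", "-b", "-q", "30", "-o", bam, sam],
    ]
```

Each list could be passed to subprocess.run(cmd, check=True) in order; keeping the commands as data makes it easy to log exactly what was run for each library.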
Authentication of ancient DNA
We used pmdtools to analyze the deamination pattern of aDNA in all libraries [27]. We estimated contamination rates using Schmutzi [29] for libraries from all individuals and ANGSD [28] for libraries from males. Specifically, Schmutzi was used with the --uselength option for genotyping and contamination estimation, and the contDeam.pl script was applied to remove deaminated bases. For ANGSD, we used the -doCounts 1 option to count reads and -minMapQ 30 and -minQ 30 to filter low-quality reads. Additionally, we estimated contamination using the contamination tool with the -a option for inputting the counts file and the HapMap reference for comparison. Two libraries, 21SMM158SE_1 and 21LHKM106RL, showed contamination rates above 3%. For these libraries, we used pmdtools [27] to retain only reads with a characteristic aDNA damage signature for downstream analysis.
Sex determination
Sex determination was based on coverage of X and Y chromosomes and autosomes [30].
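A coverage-based sex assignment of this kind can be sketched as follows. This is an illustrative sketch only: the function name, the input dictionaries, and the two Y-ratio thresholds are assumptions chosen for the example and are not values reported in this study or in [30].

```python
from typing import Dict, Tuple


def sex_from_coverage(
    reads: Dict[str, int],
    lengths: Dict[str, int],
    y_male_min: float = 0.25,     # assumed illustrative threshold
    y_female_max: float = 0.05,   # assumed illustrative threshold
) -> Tuple[str, float, float]:
    """Classify sex from X and Y coverage normalised by autosomal coverage."""
    autosomes = [c for c in reads if c not in ("X", "Y")]
    auto_cov = sum(reads[c] for c in autosomes) / sum(lengths[c] for c in autosomes)
    # Expected ratios: about 1.0 (X) and 0.0 (Y) for females, about 0.5 each for males
    x_ratio = (reads["X"] / lengths["X"]) / auto_cov
    y_ratio = (reads["Y"] / lengths["Y"]) / auto_cov
    if y_ratio >= y_male_min:
        label = "male"
    elif y_ratio <= y_female_max:
        label = "female"
    else:
        label = "undetermined"
    return label, x_ratio, y_ratio
```

In real data the Y ratio of males is usually lower than the idealised 0.5 because much of the Y chromosome is poorly mappable, so thresholds need to be calibrated on the capture panel actually used.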
Kinship detection
We used READ to detect degrees of kinship among individuals in this study [31], with normalization based on the median of all pairwise genetic distances.
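The median normalization step can be illustrated with a small sketch. It is not the READ implementation: the function name and input format are hypothetical, and the cut-offs (0.625, 0.8125, 0.90625) are the classification thresholds as we understand them from the READ publication [31]; they should be checked against the tool's documentation before use.

```python
from statistics import median
from typing import Dict, Tuple

Pair = Tuple[str, str]


def classify_pairs(mean_p0: Dict[Pair, float]) -> Dict[Pair, str]:
    """Normalise each pair's mean P0 by the median over all pairs, then classify."""
    norm = median(mean_p0.values())   # assumed to represent an unrelated pair
    labels: Dict[Pair, str] = {}
    for pair, p0 in mean_p0.items():
        ratio = p0 / norm
        if ratio < 0.625:
            labels[pair] = "identical/twins"
        elif ratio < 0.8125:
            labels[pair] = "first degree"
        elif ratio < 0.90625:
            labels[pair] = "second degree"
        else:
            labels[pair] = "unrelated"
    return labels
```

Here P0 is the proportion of non-matching alleles between two pseudo-haploid individuals, averaged over genomic windows. The median-based normalization assumes that most pairs in the dataset are unrelated.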
Data merging
We merged our genomic data with previously published datasets using mergeit in EIGENSOFT. Two datasets were used in this study [32,33]: the HumanOrigins dataset was used for principal component analysis, and the 1240k dataset was used in admixture analysis, outgroup-f3, f4, qpWave, and qpAdm analyses, and DATES [34].
Principal components analysis (PCA)
We performed principal components analysis based on the HumanOrigins dataset using smartpca v16000 in EIGENSOFT with default settings, except that lsqproject was set to YES [35]. All historical Jiangsu individuals were projected onto principal components calculated from modern populations.
ADMIXTURE analysis
We pruned our dataset for linkage disequilibrium using plink v1.90 with the parameters --indep-pairwise 200 25 0.4 [36–38]. Then, we performed unsupervised admixture analysis using ADMIXTURE v.1.3.0 [39].
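Choosing K by cross-validation error, as reported for Figure S3, can be sketched as below. The sketch assumes ADMIXTURE was run with its cross-validation option so that each log contains a line of the form "CV error (K=4): 0.52"; the function names are hypothetical and the numbers in the usage note are invented for illustration, not results from this study.

```python
import re
from typing import Dict

# Matches lines such as "CV error (K=4): 0.52000" in ADMIXTURE log output
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text: str) -> Dict[int, float]:
    """Collect the cross-validation error reported for each K."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors: Dict[int, float]) -> int:
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)
```

For example, concatenated logs containing "CV error (K=3): 0.530", "CV error (K=4): 0.520", and "CV error (K=5): 0.525" would give best_k(...) equal to 4.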
f-statistic analysis
We calculated outgroup-f3 using qp3Pop v651 from ADMIXTOOLS with the parameter inbreed: YES. We calculated f4 statistics using qpDstat v980 from ADMIXTOOLS with the parameter f4mode: YES [34,37].
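The quantity behind an f4 test can be illustrated with a short sketch. It computes f4(A, B; C, D) as the mean of (a - b)(c - d) over SNPs from population allele frequencies, with a simple equal-weight delete-one-block jackknife for the standard error. This is a simplification for illustration, not the ADMIXTOOLS implementation, which uses a weighted block jackknife over genetic-distance blocks; the function names and the block size are assumptions.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def f4_terms(a: Sequence[float], b: Sequence[float],
             c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Per-SNP products (a - b) * (c - d) of allele-frequency differences."""
    return [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]


def f4_block_jackknife(a: Sequence[float], b: Sequence[float],
                       c: Sequence[float], d: Sequence[float],
                       block_size: int) -> Tuple[float, float, float]:
    """Return (f4 estimate, jackknife standard error, Z-score)."""
    terms = f4_terms(a, b, c, d)
    n, total = len(terms), sum(terms)
    blocks = [terms[i:i + block_size] for i in range(0, n, block_size)]
    if len(blocks) < 2:
        raise ValueError("need at least two blocks for a jackknife")
    estimate = total / n
    # Re-estimate f4 with each block of consecutive SNPs left out in turn
    loo = [(total - sum(blk)) / (n - len(blk)) for blk in blocks]
    g = len(loo)
    mean_loo = sum(loo) / g
    se = sqrt((g - 1) / g * sum((x - mean_loo) ** 2 for x in loo))
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z
```

Under this sign convention, with A as the outgroup (e.g., Mbuti), a significantly negative value indicates that B shares more alleles with C than with D, and a significantly positive value the reverse.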
Admixture modelling
We modelled our studied populations as an admixture of the source populations using qpAdm v810 from ADMIXTOOLS. We used the following settings: allsnps: YES, inbreed: YES [37].
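A parameter file with these settings can be generated as in the sketch below. The helper names and file paths are hypothetical; note that ADMIXTOOLS spells the option allsnps. The population labels in the usage note are only an illustration of a three-way model for the outlier, since the exact Yellow River-related source label and the outgroup (right) list are not specified in this section.

```python
from typing import Sequence


def left_list_text(target: str, sources: Sequence[str]) -> str:
    """qpAdm reads the target first, followed by the candidate sources."""
    return "\n".join([target, *sources]) + "\n"


def qpadm_parfile(prefix: str, left_list: str, right_list: str) -> str:
    """Render a qpAdm parameter file using the settings stated in the text."""
    lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"popleft: {left_list}",     # file with target and sources, one per line
        f"popright: {right_list}",   # file with outgroup populations
        "allsnps: YES",
        "inbreed: YES",
    ]
    return "\n".join(lines) + "\n"
```

For example, left_list_text("Jiangsu_Qing_o", ["YR_LBIA", "Tanshishan", "Egypt_Ptolemaic"]) writes the target followed by three sources; the resulting parameter file would then be passed to qpAdm with its -p option.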
Calculating admixture dates
We applied DATES v753 to estimate admixture dates with the following settings: binsize, 0.001; maxdis, 0.5; runmode, 1; seed, 77; mincount, 1; jackknife, YES; qbin, 10; lovalfit, 0.45. We assumed 29 years per generation to calculate the dates [40].
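Converting an estimate in generations into a calendar date is simple arithmetic, sketched below. The function names are hypothetical, and the sample date, number of generations, and standard error used in the usage note are invented for illustration; only the generation times of 29 and 25 years come from the text and Table S5.

```python
from typing import Tuple


def admixture_year_ce(sample_year_ce: float, generations: float,
                      years_per_generation: float = 29.0) -> float:
    """Calendar year (CE) of admixture, counting back from the sample's date."""
    return sample_year_ce - generations * years_per_generation


def admixture_interval_ce(sample_year_ce: float, generations: float,
                          se_generations: float,
                          years_per_generation: float = 29.0) -> Tuple[float, float]:
    """Older and younger bounds from the estimate plus/minus one standard error."""
    older = admixture_year_ce(sample_year_ce, generations + se_generations,
                              years_per_generation)
    younger = admixture_year_ce(sample_year_ce, generations - se_generations,
                                years_per_generation)
    return older, younger
```

For a hypothetical individual dated to 1000 CE with an estimate of 10 ± 2 generations, this gives 710 CE (652 to 768 CE) at 29 years per generation and 750 CE at 25 years per generation.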
Supplementary Information
Additional file 1: Figure S1. Ancient DNA damage patterns calculated using pmdtools. Only individuals that passed our filters and were used in this study are shown.
Additional file 2: Table S1. Summary of all individuals sequenced in this study. Individuals used in the analysis are marked with † (all SNPs used) or ‡ (SNPs restricted to sequences with a PMD score of at least 3).
Additional file 3: Figure S2. Kinship analysis using READ. (A) The two dotted lines represent the thresholds for first- and second-degree relationships. Pairs of individuals falling below the lower or upper dotted line are inferred to share a first- or second-degree relationship, respectively. (B) Frequency distribution of average pairwise P0 calculated using READ.
Additional file 4: Table S2. Outgroup-f3 analysis. Table S2A. Outgroup-f3 analysis in the form f3(ancient East Asian, Ancient_Jiangsu; Mbuti). Table S2B. Outgroup-f3 analysis in the form f3(modern East Asian, Historical_Jiangsu; Mbuti).
Additional file 5: Table S3. f4 analysis. Table S3A. Pairwise f4 analysis in the form f4(Mbuti, ancient East Asian; Jiangsu_ancient, Jiangsu_ancient). Jiangsu_ancient stands for four groups: Jiangsu_Song, Jiangsu_Ming, Jiangsu_Qing and Jiangsu_Qing_o. Table S3B. f4 analysis in the form f4(Mbuti, all ancient populations; Jiangsu_Qing_o, YR_MN/YR_LBIA). We included all individuals in the AADR 1240k dataset in the 'all ancient populations' group to detect potential gene flow. Table S3C. f4 analysis in the form f4(Mbuti, related ancestry; Jiangsu_Qing_o, Jiangsu_Qing). Related European ancestry includes all ancient populations that share more alleles with Jiangsu_Qing_o than with Yellow River-related populations. Table S3D. f4 analysis in the form f4(Mbuti, aEA; YR, Jiangsu_HE). aEA stands for ancient East Asian; YR stands for Yellow River-related populations; Jiangsu_HE stands for historical Jiangsu populations, including Jiangsu_Song, Jiangsu_Ming, Jiangsu_Qing and Jiangsu_Qing_o. Table S3E. f4 analysis in the form f4(Mbuti, Tanshishan/Xitoucun; Jiangsu_HE, ancient south/southeast Asian). Table S3F. f4 analysis in the form f4(Mbuti, Jiangsu_HE; modern_Han, modern_EA). modern_Han includes all Han Chinese in the HO dataset; modern_EA stands for modern East Asian and includes all modern East Asian populations in the HO dataset.
Additional file 6: Figure S3. Unsupervised ADMIXTURE analysis of the ancestral components of East Asians. (A) CV error of the analysis. The lowest CV error is at K = 4. (B) ADMIXTURE results at different K values (K = 3 to K = 9).
Additional file 7: Table S4. qpAdm modelling. qpAdm evaluation of 1-way, 2-way and 3-way models to estimate (A) the ancestry composition of Jiangsu_ancient and (B) the contribution of Jiangsu_ancient ancestry to modern populations. Models with tail > 0.05, indicating an adequate model fit, are shown at the top of the table.
Additional file 8: Table S5. Admixture date modelling. Year(29) and year(25) represent 29 and 25 years per generation, respectively.
Acknowledgements
We thank the editors and reviewers for their contributions and suggestions. We thank all participants in these studies. SF and ZX from the Information and Network Center of Xiamen University are acknowledged for their help with high-performance computing.
Abbreviations
- BP
Before present
- CE
Common era
- aDNA
Ancient deoxyribonucleic acid
- ssDNA
Single-strand deoxyribonucleic acid
- SNPs
Single nucleotide polymorphisms
- PCA
Principal component analysis
- HO
HumanOrigins
- HE
Historical era
- EA
East Asia
Authors' contributions
Xin Jia and Chuan-Chao Wang conceived and supervised the project. Xin Jia, Xinyuan Kong, and Liangsai Zhu provided the materials and resources. Xinyuan Kong and Liangsai Zhu performed the archaeological data analysis. Haifeng He, Le Tao, and Mengting Xu performed the wet laboratory work. Haifeng He and Le Tao performed the genetic data analysis and prepared the figures. Kongyang Zhu, Yu Xu, Haodong Chen, Rui Wang, Xiaomin Yang, Tianyou Bai, and Hao Ma provided scripts for data analysis. Haifeng He and Le Tao wrote and edited the manuscript. Yang Yang, Jianxin Guo, Xin Jia, and Chuan-Chao Wang revised the manuscript. All authors contributed to the article and approved the final version for submission.
Funding
The work was funded by the National Natural Science Foundation of China (T2425014 and 32270667), the Natural Science Foundation of Fujian Province of China (2023J06013), the Major Project of the National Social Science Foundation of China (21&ZD285), Open Research Fund of State Key Laboratory of Genetic Engineering at Fudan University (SKLGE-2310), Open Research Fund of Forensic Genetics Key Laboratory of the Ministry of Public Security (2023FGKFKT07), and National Key Research and Development Program of China (2024YFC3306701, 2023YFC3303701-02).
Data availability
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive in the National Genomics Data Center, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA-Human: HRA008768), which is publicly accessible at https://ngdc.cncb.ac.cn/gsa-human.
Declarations
Ethics approval and consent to participate
All procedures performed in studies involving human participants were approved by the local archaeological institutions, and Xiamen University reviewed and approved the study (XDYX202412K88).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Haifeng He, Xinyuan Kong, Le Tao and Liangsai Zhu contributed equally to this work.
Contributor Information
Xin Jia, Email: jiaxin@njnu.edu.cn.
Chuan-Chao Wang, Email: chuanchaowang@fudan.edu.cn.
References
- 1. Zhang J, Jiang L, Yu L, Huan X, Zhou L, Wang C, et al. Rice’s trajectory from wild to domesticated in East Asia. Science. 2024;384(6698):901–6.
- 2. Jia X, Zhao D, Storozum MJ, Shi H, Bai G, Liu Z. The “2.8 ka BP cold event” indirectly influenced the agricultural exploitation during the late Zhou Dynasty in the coastal areas of the Jianghuai Region. Front Plant Sci. 2022;13:902534.
- 3. He K, Lu H, Zhang J, Wang C, Huan X. Prehistoric evolution of the dualistic structure mixed rice and millet farming in China. The Holocene. 2017;27(12):1885–98.
- 4. Tang Y, Marston JM, Fang X. Early millet cultivation, subsistence diversity, and wild plant use at Neolithic Anle, Lower Yangtze, China. The Holocene. 2022;32(10):1003–14.
- 5. Hayashi M. The iconography of gods and beasts: ancient Chinese deities [in Chinese]. Beijing: Sanlian Press; 2009. p. 259.
- 6. Li X. Emerging from the age of doubt in ancient history [in Chinese]. Liaoning University Press; 1994.
- 7. Zhang GZ, Xu PF. The formation of Chinese civilization [in Chinese]. Beijing: New World Press; 2004.
- 8. Ge J, Cao S, Wu S. A concise history of Chinese migration [in Chinese]. Fujian People's Publishing House; 1993. p. 246–7.
- 9. Hu A. An essay on ethnic migrations after the disasters of the Yongjia Period in the Jin Dynasty [in Chinese]. J Anhui Univ (Philos Soc Sci Ed). 2010;34(05):100–11.
- 10. Mo X, Gao S. Tongzhi Era administrative offices of the two counties in Upper Jiang [in Chinese]. Jiangsu Ancient Books Publishing House; 1874.
- 11. Ge J, Cao S, Wu S. A concise history of Chinese migration [in Chinese]. Fujian People's Publishing House; 1993. p. 459–64.
- 12. Tao L, Yuan H, Zhu K, Liu X, Guo J, Min R. Ancient genomes reveal millet farming-related demic diffusion from the Yellow River into southwest China. Curr Biol. 2023;33(22):4995–5002.e7.
- 13. Wang T, Wang W, Xie G, Li Z, Fan X, Yang Q. Human population history at the crossroads of East and Southeast Asia since 11,000 years ago. Cell. 2021;184(14):3829–41.e21.
- 14. Yang MA, Fan X, Sun B, Chen C, Lang J, Ko YC, et al. Ancient DNA indicates human population shifts and admixture in northern and southern China. Science. 2020;369(6501):282–8.
- 15. Ning C, Li T, Wang K, Zhang F, Li T, Wu X, et al. Ancient genomes from northern China suggest links between subsistence changes and human migration. Nat Commun. 2020;11(1):2700.
- 16. Huang D, Sima J, Zeng D, Wang L. Brief report on the clearing of Song Dynasty tombs in Dafun Village [in Chinese]. Jintan Cult Relics East. 2021;04:25–32.
- 17. Li H, Liang W, Du P, Chen Y, Jia X. Agricultural economy investigation in Huaibei area of Jiangsu Province during the Sui and Tang Dynasties: an agricultural archaeological research from the Kongwangshan Cemetery in Lianyungang [in Chinese]. Agric Hist China. 2022;41(06):50–62.
- 18. Zhang D, Jia X, et al. Species identification and application potential of dendrochronology to Tang and Song dynasties coffins in northern Jiangsu Province [in Chinese]. Quat Sci. 2024;44(4):1008–20.
- 19. Rohland N, Glocke I, Aximu-Petri A, Meyer M. Extraction of highly degraded DNA from ancient bones, teeth and sediments for high-throughput sequencing. Nat Protoc. 2018;13(11):2447–61.
- 20. Gansauge MT, Aximu-Petri A, Nagel S, Meyer M. Manual and automated preparation of single-stranded DNA libraries for the sequencing of DNA from ancient biological remains and other sources of highly degraded DNA. Nat Protoc. 2020;15(8):2279–300.
- 21. Rohland N, Mallick S, Mah M, Maier R, Patterson N, Reich D. Three assays for in-solution enrichment of ancient human DNA at more than a million SNPs. Genome Res. 2022;32(11–12):2068–78.
- 22. Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016;9:88.
- 23. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
- 24. Peltzer A, Jager G, Herbig A, Seitz A, Kniep C, Krause J, et al. EAGER: efficient ancient genome reconstruction. Genome Biol. 2016;17:60.
- 25. Jun G, Wing MK, Abecasis GR, Kang HM. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Res. 2015;25(6):918–25.
- 26. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2):giab008.
- 27. Skoglund P, Northoff BH, Shunkov MV, Derevianko AP, Paabo S, Krause J, et al. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc Natl Acad Sci U S A. 2014;111(6):2229–34.
- 28. Korneliussen TS, Albrechtsen A, Nielsen R. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics. 2014;15(1):356.
- 29. Renaud G, Slon V, Duggan AT, Kelso J. Schmutzi: estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA. Genome Biol. 2015;16:224.
- 30. Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, et al. The genetic history of Ice Age Europe. Nature. 2016;534(7606):200–5.
- 31. Monroy Kuhn JM, Jakobsson M, Gunther T. Estimating genetic kin relationships in prehistoric populations. PLoS One. 2018;13(4):e0195491.
- 32. Mallick S, Micco A, Mah M, Ringbauer H, Lazaridis I, Olalde I, et al. The Allen Ancient DNA resource (AADR) a curated compendium of ancient human genomes. Sci Data. 2024;11(1):182.
- 33. Mallick S, Reich D. A curated compendium of ancient human genomes. 2023. Harvard Dataverse. 10.7910/DVN/FFIDCW.
- 34. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient admixture in human history. Genetics. 2012;192(3):1065–93.
- 35. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2(12):e190.
- 36. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
- 37. Peter BM. Admixture, population structure, and F-statistics. Genetics. 2016;202(4):1485–501.
- 38. Lawson DJ, van Dorp L, Falush D. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nat Commun. 2018;9(1):3258.
- 39. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64.
- 40. Chintalapati M, Patterson N, Moorjani P. The spatiotemporal patterns of major human admixture events during the European Holocene. eLife. 2022;11.