WO2025125413A1


Info

Publication number: WO2025125413A1
Application number: PCT/EP2024/085898
Authority: WO
Inventors: Andreas Marx; Melanie HENKEL
Original assignee: Universitaet Konstanz
Current assignee: Universitaet Konstanz
Priority date: 2023-12-13
Filing date: 2024-12-12
Publication date: 2025-06-19
Anticipated expiration: 2026-06-13
Also published as: WO2025124705A1

Abstract

The present invention relates to a mutated DNA polymerase derived from wild-type Thermus aquaticus (Taq) DNA polymerase, a method for the detection of 5-methylcytosine nucleotides (5mC) in a DNA molecule of interest using said DNA polymerase, and a kit comprising said DNA polymerase. Further, the present invention concerns related DNA polymerases of Sequence Family A carrying the corresponding mutations.

Description

GTTCTGGAAGCCCTGCGTGAAGCACATCCGATTGTGGAAAAAATTCTGCAGTATCGCGAA CTGACCAACCTGAAAAGCACCTATATCGATCCGCTGCCGGATCTGATTCATCCGCGTACC GGTCGTCTGCATACCCGTTTTAATCAGACCGCAACCAAAACCGGTCGCCTGAGCAGCAGC GATCCGAATCTGCAGAATATTCCGGGTCGTACACCGCTGGGTCAGCGTATTCGTCGTGCA TTTATTGCAGAAGAAGGTTGGCTGCTGGTTGCACTGGATTATAGCCAGATGGAACTGCGT GTTCTGGCCCATCTGAGCGGTGATGAAAATCTGATTCGCGTGTTTCAGGAAGGTCGCGAT ATTCATACCGAAACCGCAAGCTGGATGTTTGGTGTTCCGCGTGAAGCAGTTGATCCGCTG ATGCGTCGTGCAGCAAAAACCATTAATTTTGGGGTGCTGTATGGTATGAGCGCACATCGT CTGAGCCAGGAACTGGCAATTCCGTACGAAGAAGCCCAGGCATTTATCGAACGTTATTCT CAGAGCTTTCCGAAAGTTCGTGCCTGGATTGAAAAAACCCTGGAAGAAGGTCGTCGTCGC GGTTATGTTGAAACCCTGTTTGGTCGTCGTCGTTATGTTCCGGATCTGGAAGCACGTGTT AAAAGCGTTCGTGAAGCAGCAGAACGTATGGCCTTTAATATGCCGGTTCAGGGCACCGCA GCAGATCTGATGAAACTGGCCATGGTTAAACTGTTTCCGCGTCTGGAAGAAATGGGTGCA CGTATGCTGCTGCAGGTTCATGATGAACTGGTGCTGGAAGCACCGAAAGAACGTGCAGAA GCAGTTGCCCGTCTGGCAAAAGAAGTTATGGAAGGCGTTTATCCGCTGGCAGTTCCGCTG GAAGTTGAAGTTGGTATTGGTGAAGATTGGCTGTCTGCAAAAGAA bold: mutated codon Dark grey: KlenTaq wild-type base Light grey: mutated base SEQ ID NO: 7 (nucleotide sequence coding for KlenTaq RIII H20) ATGAGAGGATCTCACCATCACCATCACCATACGGATCCGCATGCAGCACTGGAAGAAGCA CCTTGGCCTCCGCCTGAAGGTGCATTTGTTGGTTTTGTTCTGAGCCGTAAAGAACCGATG TGGGCAGATCTGCTGGCACTGGCAGCAGCACGTGGTGGTCGTGTTCATCGTGCACCGGAA CCGTATAAAGCTCTGCGCGATCTGAAAGAAGCACGCGGTCTGCTGGCAAAAGATCTGAGC GTTCTGGCACTGCGTGAAGGTCTGGGACTGCCTCCGGGTGATGATCCGATGCTGCTGGCA TATCTGCTGGATCCGAGCAATACCACACCGGAAGGTGTTGCACGTCGTTATGGTGGTGAA TGGACCGAAGAAGCAGGCGAACGCGCAGCACTGAGCGAACGTCTGTTTGCAAATCTGTGG GGTCGTCTGGAAGGTGAAGAACGTCTGCTGTGGCTGTATCGTGAAGTTGAACGTCCGCTG TCTGCAGTTCTGGCACACATGGAAGCAACCGGTGTTCGTCTGGATGTTGCATATCTGCGT GCACTGAGCCTGGAAGTTGCAGAAGAAATTGCACGTCTGGAAGCAGAAGTTTTTCGTCTG GCCGGCCATCCGTTTAAACTGAATAGCCGTGATCAGCTGGAACGTGTTCTGTTTGATGAA CTGGGTCTGCCAGCAATTGGTAAAACCCGTAAAACCGGTAAACGTAGCACCAAAGCAGCA GTTCTGGAAGCCCTGCGTGAAGCACATCCGATTGTGGAAAAAATTCTGCAGTATCGCGAA CTGACCAACCTGAAAAGCACCTATATCGATCCGCTGCCGGATCTGATTCATCCGCGTACC 
GGTCGTCTGCATACCCGTTTTAATCAGACCGCAACCAAAACCGGTCGCCTGAGCAGCAGC GATCCGAATCTGCAGAATATTCCGGGTCGTACACCGCTGGGTCAGCGTATTCGTCGTGCA TTTATTGCAGAAGAAGGTTGGCTGCTGGTTGCACTGGATTATAGCCAGAAAGAACTGCGT GTTCTGGCCCATCTGAGCGGTGATGAAAATCTGATTCGCGTGTTTCAGGAAGGTCGCGAT ATTCATACCGAAACCGCAAGCTGGATGTTTGGTGTTCCGCGTGAAGCAGTTGATCCGCTG ATGCGTCGTGCAGCAAAAACCATTAATTTTGGGGTGCTGTATGGTATGAGCGCACATCGT CTGAGCCAGGAACTGGCAATTCCGTACGAAGAAGCCCAGGCATTTATCGAACGTTATTTT CAGAGCTTTCCGAAAGTTCGTGCCTGGATTGAAAAAACCCTGGAAGAAGGTCGTCGTCGC GGTTATGTTGAAACCCTGTTTGGTCGTCGTCGTTATGTTCCGGATCTGGAAGCACGTGTT AAAAGCGTTCGTGAAGCAGCAGAACGTATGGCCTTTAATATGCCGGTTCAGGGCACCGCA GCAGATCTGATGAAACTGGCCATGGTTAAACTGTTTCCGCGTCTGGAAGAAATGGGTGCA CGTATGCTGCTGCAGGTTCATGATGAACTGGTGCTGGAAGCACCGAAAGAACGTGCAGAA GCAGTTGCCCGTCTGGCAAAAGAAGTTATGGAAGGCGTTTATCCGCTGGCAGTTCCGCTG GAAGTTGAAGTTGGTATTGGTGAAGATTGGCTGTCTGCAAAAGAA bold: mutated codon Dark grey: KlenTaq wild-type base Light grey: mutated base SEQ ID NO: 8 (nucleotide sequence coding for KlenTaq RIV D15) ATGAGAGGATCTCACCATCACCATCACCATACGGATCCGCATGCAGCACTGGAAGAAGCA CCTTGGCCTCCGCCTGAAGGTGCATTTGTTGGTTTTGTTCTGAGCCGTAAAGAACCGATG TGGGCAGATCTGCTGGCACTGGCAGCAGCACGTGGTGGTCGTGTTCATCGTGCACCGGAA CCGTATAAAGCTCTGCGCGATCTGAAAGAAGCACGCGGTCTGCTGGCAAAAGATCTGAGC GTTCTGGCACTGCGTGAAGGTCTGGGACTGCCTCCGGGTGATGATCCGATGCTGCTGGCA TATCTGCTGGATCCGAGCAATACCACACCGGAAGGTGTTGCACGTCGTTATGGTGGTGAA TGGACCGAAGAAGCAGGCGAACGCGCAGCACTGAGCGAACGTCTGTTTGCAAATCTGTGG GGTCGTCTGGAAGGTGAAGAACGTCTGCTGTGGCTGTATCGTGAAGTTGAACGTCCGCTG TCTGCAGTTCTGGCACACATGGAAGCAACCGGTGTTCGTCTGGATGTTGCATATCTGCGT GCACTGAGCCTGGAAGTTGCAGAAGAAATTGCACGTCTGGAAGCAGAAGTTTTTCGTCTG GCCGGCCATCCGTTTAAACTGAATAGCCGTGATCAGCTGGAACGTGTTCTGTTTGATGAA CTGGGTCTGCCAGCAATTGGTAAAACCAAAAAAACCGGTAAACGTAGCACCAACGCAGCA GTTCTGGAAGCCCTGCGTGAAGCACATCCGATTGTGGAAAAAATTCTGCAGTATCGCGAA CTGACCAACCTGAAAAGCACCTATATCGATCCGCTGCCGGATCTGATTCATCCGCGTACC GGTCGTCTGCATACCCGTTTTAATCAGACCGCAACCAAAACCGGTCGCCTGAGCAGCAGC GATCCGAATCTGCAGAATATTCCGGGTCGTACACCGCTGGGTCAGCGTATTCGTCGTGCA 
TTTATTGCAGAAGAAGGTTGGCTGCTGGTTGCACTGGATTATAGCCAGAAAGAACTGCGT GTTCTGGCCCATCTGAGCGGTGATGAAAATCTGATTCGCGTGTTTCAGGAAGGTCGCGAT ATTCATACCGAAACCGCAAGCTGGATGTTTGGTGTTCCGCGTGAAGCAGTTGATCCGCTG ATGCGTCGTGCAGCAAAAACCATTAATTTTGGGGTGCTGTATGGTATGAGCGCACATCGT CTGAGCCAGGAACTGGCAATTCCGTACGAAGAAGCCCAGGCATTTATCGAACGTTATTTT CAGAGCTTTCCGAAAGTTCGTGCCTGGATTGAAAAAACCCTGGAAGAAGGTCGTCGTCGC GGTTATGTTGAAACCCTGTTTGGTCGTCGTCGTTATGTTCCGGATCTGGAAGCACGTGTT AAAAGCGTTCGTGAAGCAGCAGAACGTATGGCCTTTAATATGCCGGTTCAGGGCACCGCA GCAGATCTGATGAAACTGGCCATGGTTAAACTGTTTCCGCGTCTGGAAGAAATGGGTGCA CGTATGCTGCTGCAGGTTCATGATGAACTGGTGCTGGAAGCACCGAAAGAACGTGCAGAA GCAGTTGCCCGTCTGGCAAAAGAAGTTATGGAAGGCGTTTATCCGCTGGCAGTTCCGCTG GAAGTTGAAGTTGGTATTGGTGAAGATTGGCTGTCTGCAAAAGAA bold: mutated codon Dark grey: KlenTaq wild-type base Light grey: mutated base SEQ ID NO: 9 (fragment of SEQ ID NO: 38 (Table 2), as shown in Fig.2) CTGCTGCTTGAAAATGGATTGTGCGTAAAGACGGAGGGTAATTATAGATATACCACCTAG TCTTCTTGATCCGAGGCCTACAGCTTTTGATCCCT SEQ ID NOs: 10 to 36 are shown in Table 2 SEQ ID NOs: 37 to 41 are shown in Table 3 SEQ ID NO: 42 (amino acid sequence of wild-type Thermus thermophilus DNA polymerase) MEAMLPLFEPKGRVLLVDGHHLAYRTFFALKGLTTSRGEPVQAVYGFAKSLLKALKEDGY KAVFVVFDAKAPSFRHEAYEAYKAGRAPTPEDFPRQLALIKELVDLLGFTRLEVPGYEAD DVLATLAKKAEKEGYEVRILTADRDLYQLVSDRVAVLHPEGHLITPEWLWEKYGLRPEQW VDFRALVGDPSDNLPGVKGIGEKTALKLLKEWGSLENLLKNLDRVKPENVREKIKAHLED LRLSLELSRVRTDLPLEVDLAQGREPDREGLRAFLERLEFGSLLHEFGLLEAPAPLEEAP WPPPEGAFVGFVLSRPEPMWAELKALAACRDGRVHRAADPLAGLKDLKEVRGLLAKDLAV LASREGLDLVPGDDPMLLAYLLDPSNTTPEGVARRYGGEWTEDAAHRALLSERLHRNLLK RLEGEEKLLWLYHEVEKPLSRVLAHMEATGVRLDVAYLQALSLELAEEIRRLEEEVFRLA GHPFNLNSRDQLERVLFDELRLPALGKTQKTGKRSTSAAVLEALREAHPIVEKILQHREL TKLKNTYVDPLPSLVHPRTGRLHTRFNQTATATGRLSSSDPNLQNIPVRTPLGQRIRRAF VAEAGWALVALDYSQIELRVLAHLSGDENLIRVFQEGKDIHTQTASWMFGVPPEAVDPLM RRAAKTVNFGVLYGMSAHRLSQELAIPYEEAVAFIERYFQSFPKVRAWIEKTLEEGRKRG YVETLFGRRRYVPDLNARVKSVREAAERMAFNMPVQGTAADLMKLAMVKLFPRLREMGAR MLLQVHDELLLEAPQARAEEVAALAKEAMEKAYPLAVPLEVEVGMGEDWLSAKG SEQ ID NO: 43 
(amino acid sequence of wild-type Escherichia coli DNA polymerase I) MVQIPQNPLILVDGSSYLYRAYHAFPPLTNSAGEPTGAMYGVLNMLRSLIMQYKPTHAAV VFDAKGKTFRDELFEHYKSHRPPMPDDLRAQIEPLHAMVKAMGLPLLAVSGVEADDVIGT LAREAEKAGRPVLISTGDKDMAQLVTPNITLINTMTNTILGPEEVVNKYGVPPELIIDFL ALMGDSSDNIPGVPGVGEKTAQALLQGLGGLDTLYAEPEKIAGLSFRGAKTMAAKLEQNK EVAYLSYQLATIKTDVELELTCEQLEVQQPAAEELLGLFKKYEFKRWTADVEAGKWLQAK GAKPAAKPQETSVADEAPEVTATVISYDNYVTILDEETLKAWIAKLEKAPVFAFDTETDS LDNISANLVGLSFAIEPGVAAYIPVAHDYLDAPDQISRERALELLKPLLEDEKALKVGQN LKYDRGILANYGIELRGIAFDTMLESYILNSVAGRHDMDSLAERWLKHKTITFEEIAGKG KNQLTFNQIALEEAGRYAAEDADVTLQLHLKMWPDLQKHKGPLNVFENIEMPLVPVLSRI ERNGVKIDPKVLHNHSEELTLRLAELEKKAHEIAGEEFNLSSTKQLQTILFEKQGIKPLK KTPGGAPSTSEEVLEELALDYPLPKVILEYRGLAKLKSTYTDKLPLMINPKTGRVHTSYH QAVTATGRLSSTDPNLQNIPVRNEEGRRIRQAFIAPEDYVIVSADYSQIELRIMAHLSRD KGLLTAFAEGKDIHRATAAEVFGLPLETVTSEQRRSAKAINFGLIYGMSAFGLARQLNIP RKEAQKYMDLYFERYPGVLEYMERTRAQAKEQGYVETLDGRRLYLPDIKSSNGARRAAAE RAAINAPMQGTAADIIKRAMIAVDAWLQAEQPRVRMIMQVHDELVFEVHKDDVDAVAKQI HQLMENCTRLDVPLLVEVGSGENWDQAH SEQ ID NO: 44 (amino acid sequence of wild-type E. 
coli phage T7 DNA polymerase) MIVSDIEANALLESVTKFHCGVIYDYSTAEYVSYRPSDFGAYLDALEAEVARGGLIVFHN GHKYDVPALTKLAKLQLNREFHLPRENCIDTLVLSRLIHSNLKDTDMGLLRSGKLPGKRF GSHALEAWGYRLGEMKGEYKDDFKRMLEEQGEEYVDGMEWWNFNEEMMDYNVQDVVVTKA LLEKLLSDKHYFPPEIDFTDVGYTTFWSESLEAVDIEHRAAWLLAKQERNGFPFDTKAIE ELYVELAARRSELLRKLTETFGSWYQPKGGTEMFCHPRTGKPLPKYPRIKTPKVGGIFKK PKNKAQREGREPCELDTREYVAGAPYTPVEHVVFNPSSRDHIQKKLQEAGWVPTKYTDKG APVVDDEVLEGVRVDDPEKQAAIDLIKEYLMIQKRIGQSAEGDKAWLRYVAEDGKIHGSV NPNGAVTGRATHAFPNLAQIPGVRSPYGEQCRAAFGAEHHLDGITGKPWVQAGIDASGLE LRCLAHFMARFDNGEYAHEILNGDIHTKNQIAAELPTRDNAKTFIYGFLYGAGDEKIGQI VGAGKERGKELKKKFLENTPAIAALRESIQQTLVESSQWVAGEQQVKWKRRWIKGLDGRK VHVRSPHAALNTLLQSAGALICKLWIIKTEEMLVEKGLKHGWDGDFAYMAWVHDEIQVGC RTEEIAQVVIETAQEAMRWVGDHWNFRCLLDTEGKMGPNWAICH SEQ ID NO: 45 (amino acid sequence of wild-type Bacillus stearothermophilus DNA polymerase) MKNKLVLIDGNSVAYRAFFALPLLHNDKGIHTNAVYGFTMMLNKILAEEQPTHILVAFDA GKTTFRHETFQDYKGGRQQTPPELSEQFPLLRELLKAYRIPAYELDHYEADDIIGTMAAR AEREGFAVKVISGDRDLTQLASPQVTVEITKKGITDIESYTPETVVEKYGLTPEQIVDLK GLMGDKSDNIPGVPGIGEKTAVKLLKQFGTVENVLASIDEIKGEKLKENLRQYRDLALLS KQLAAICRDAPVELTLDDIVYKGEDREKVVALFQELGFQSFLDKMAVQTDEGEKPLAGMD FAIADSVTDEMLADKAALVVEVVGDNYHHAPIVGIALANERGRFFLRPETALADPKFLAW LGDETKKKTMFDSKRAAVALKWKGIELRGVVFDLLLAAYLLDPAQAAGDVAAVAKMHQYE AVRSDEAVYGKGAKRTVPDEPTLAEHLVRKAAAIWALEEPLMDELRRNEQDRLLTELEQP LAGILANMEFTGVKVDTKRLEQMGAELTEQLQAVERRIYELAGQEFNINSPKQLGTVLFD KLQLPVLKKTKTGYSTSADVLEKLAPHHEIVEHILHYRQLGKLQSTYIEGLLKVVHPVTG KVHTMFNQALTQTGRLSSVEPNLQNIPIRLEEGRKIRQAFVPSEPDWLIFAADYSQIELR VLAHIAEDDNLIEAFRRGLDIHTKTAMDIFHVSEEDVTANMRRQAKAVNFGIVYGISDYG LAQNLNITRKEAAEFIERYFASFPGVKQYMDNIVQEAKQKGYVTTLLHRRRYLPDITSRN FNVRSFAERTAMNTPIQGSAADIIKKAMIDLSVRLREERLQARLLLQVHDELILEAPKEE IERLCRLVPEVMEQAVTLRVPLKVDYHYGPTWYDAK SEQ ID NO: 46 (amino acid sequence of wild-type Bacillus subtilis DNA polymerase) MTERKKLVLVDGNSLAYRAFFALPLLSNDKGVHTNAVYGFAMILMKMLEDEKPTHMLVAF DAGKTTFRHGTFKEYKGGRQKTPPELSEQMPFIRELLDAYQISRYELEQYEADDIIGTLA KSAEKDGFEVKVFSGDKDLTQLATDKTTVAITRKGITDVEFYTPEHVKEKYGLTPEQIID 
MKGLMGDSSDNIPGVPGVGEKTAIKLLKQFDSVEKLLESIDEVSGKKLKEKLEEFKDQAL MSKELATIMTDAPIEVSVSGLEYQGFNREQVIAIFKDLGFNTLLERLGEDSAEAEQDQSL EDINVKTVTDVTSDILVSPSAFVVEQIGDNYHEEPILGFSIVNETGAYFIPKDIAVESEV FKEWVENDEQKKWVFDSKRAVVALRWQGIELKGAEFDTLLAAYIINPGNSYDDVASVAKD YGLHIVSSDESVYGKGAKRAVPSEDVLSEHLGRKALAIQSLREKLVQELENNDQLELFEE LEMPLALILGEMESTGVKVDVDRLKRMGEELGAKLKEYEEKIHEIAGEPFNINSPKQLGV ILFEKIGLPVVKKTKTGYSTSADVLEKLADKHDIVDYILQYRQIGKLQSTYIEGLLKVTR PDSHKVHTRFNQALTQTGRLSSTDPNLQNIPIRLEEGRKIRQAFVPSEKDWLIFAADYSQ IELRVLAHISKDENLIEAFTNDMDIHTKTAMDVFHVAKDEVTSAMRRQAKAVNFGIVYGI SDYGLSQNLGITRKEAGAFIDRYLESFQGVKAYMEDSVQEAKQKGYVTTLMHRRRYIPEL TSRNFNIRSFAERTAMNTPIQGSAADIIKKAMIDMAAKLKEKQLKARLLLQVHDELIFEA PKEEIEILEKLVPEVMEHALALDVPLKVDFASGPSWYDAK SEQ ID NO: 47 (amino acid sequence of wild-type Bacillus phage SP01 DNA polymerase) MGSALDTLKEFNPKPMKGQGSKKARIIIVQENPFDYEYRKKKYMTGKAGKLLKFGLAEVG IDPDEDVYYTSIVKYPTPENRLPTPDEIKESMDYMWAEIEVIDPDIIIPTGNLSLKFLTK MTAITKVRGKLYEIEGRKFFPMIHPNTVLKQPKYQDFFIKDLEILASLLEGKTPKNVLAF TKERRYCDTFEDAIDEIKRYLELPAGSRVVIDLETVKTNPFIEKVTMKKTTLEAYPMSQQ PKIVGIGLSDRSGYGCAIPLYHRENLMKGNQIGTIVKFLRKLLEREDLEFIAHNGKFDIR WLRASLDIYLDISIWDTMLIHIIDYRGERYSWSKRLAWLETDMGGYDDALDGEKPKGEDE GNYDLIPWDILKVYLADDCDVTFRLSEKYIPLVEENEEKKWLWENIMVPGYYTLLDIEMD GIHVDREWLEVLRVSYEKEISRLEDKMREFPEGVAMEREMRDKWKERVMIGNIKSANRTP EQQDKFKKYKKYDPSKGGDKINFGSTKQLGELLFERMGLETVIFTDKGAPSTNDDSLKFM GSQSDFVKVLMEFRKANHLYNNFVSKLSLMIDPDNIVHPSYNIHGTVTGRLSSNEPNAQQ FPRKVNTPTLFQYNFEIKKMFNSRFGDGGVIVQFDYSQLELRILVCYYSRPYTIDLYRSG ADLHKAVASDAFGVAIEEVSKDQRTASKKIQFGIVYQESARGLSEDLRAEGITMSEDECE IFIKKYFKRFPKVSKWIRDTKKHVKDISTVKTLTGATRNLPDIDSIDQSKANEAERQAVN TPIQGTGSDCTLMSLILINQWLRESGLRSRICITVHDSIVLDCPKDEVLEVAKKVKHIME NLGEYNEFYKFLGDVPILSEMEIGRNYGDAFEATIEDIEEHGVDGFIEMKEKEKLEKDMK EFTKIIEDGGSIPDYARIYWENIS SEQ ID NOs: 48 to 83 are shown in Table 1 EXAMPLES The present invention will be further illustrated by the following examples without being limited thereto. 
Materials and methods

Oligonucleotides and HeLa genomic DNA
DNA oligonucleotides were purchased in HPLC grade from biomers.net GmbH and used directly for primer extension reactions with capillary electrophoresis (CE) analysis, PCR and NGS library preparation. Oligonucleotides for primer extension reactions analysed by denaturing polyacrylamide gel electrophoresis (PAGE) and phosphor imaging were purified by preparative denaturing PAGE. They were then radioactively 5'-end labelled with [γ-³²P]-ATP and T4 polynucleotide kinase (New England Biolabs) according to the manufacturer's instructions. The sequences of the oligonucleotides used are listed in Table 2. HeLa genomic DNA (gDNA) and CpG-methylated HeLa gDNA (mCpG gDNA) were purchased from New England Biolabs and used directly in experiments. The sequences of the templates and of the gDNA amplicon target region are listed in Table 3.

Table 2: DNA oligonucleotides used herein (F = FAM or HEX; P = phosphate).

Primer extension experiments
Radioactive primer extension primer: 5'-ACTACAAGCCCCAAAAGCAG-3' (SEQ ID NO: 10)
Oligonucleotide C template: 5'-ATCTGCTCGAGGCCTGCTTTTGGGGCTTGTAGT-(P)-3' (SEQ ID NO: 11)
Oligonucleotide 5mC template: 5'-ATCTGCTCGAGG5mCCTGCTTTTGGGGCTTGTAGT-(P)-3' (SEQ ID NO: 12)
CE primer 20 nt: 5'-(F)-ACTACAAGCCCCAAAAGCAG-3' (SEQ ID NO: 13)
CE primer 25 nt: 5'-(F)-CGATCACTACAAGCCCCAAAAGCAG-3' (SEQ ID NO: 14)
CE primer 30 nt: 5'-(F)-TCGATCGATCACTACAAGCCCCAAAAGCAG-3' (SEQ ID NO: 15)
CE primer 35 nt: 5'-(F)-ATCGATCGATCGATCACTACAAGCCCCAAAAGCAG-3' (SEQ ID NO: 16)
CE primer 40 nt: 5'-(F)-GATCGATCGATCGATCGATCACTACAAGCCCCAAAAGCAG-3' (SEQ ID NO: 17)
CE primer 45 nt: 5'-(F)-CGATCGATCGATCGATCGATCGATCACTACAAGCCCCAAAAGCAG-3' (SEQ ID NO: 18)

PCR experiments
109 bp forward primer: 5'-GAATGGGATAGAGAAGGGATCAAAAG-3' (SEQ ID NO: 19)
109 bp reverse primer: 5'-CTGCTGCTTGAAAATGGATTGTGC-3' (SEQ ID NO: 20)
803 bp forward primer: 5'-TCTGTCTTTTCATCATTGGTTCT-3' (SEQ ID NO: 21)
803 bp reverse primer: 5'-TCCTAGACACAACTGAATCCCAA-3' (SEQ ID NO: 22)
Bisulfite conversion forward primer: 5'-CATAATACTACTTAAAAAAATCACTCTAACA-3' (SEQ ID NO: 23)
Bisulfite conversion reverse primer: 5'-GATTTTTTGGAATTTTAAATATAATTTTGAAGT-3' (SEQ ID NO: 24)

NGS library preparation
UMI forward primer 1: 5'-CTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNGAATGGGATAGAGAAGGGATCAA-3' (SEQ ID NO: 25)
UMI forward primer 2: 5'-CTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNCGAATGGGATAGAGAAGGGATCAA-3' (SEQ ID NO: 26)
UMI forward primer 3: 5'-CTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNTCGAATGGGATAGAGAAGGGATCAA-3' (SEQ ID NO: 27)
UMI reverse primer 1: 5'-GGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNCTGCTGCTTGAAAATGGATTGTG-3' (SEQ ID NO: 28)
UMI reverse primer 2: 5'-GGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNACTGCTGCTTGAAAATGGATTGTG-3' (SEQ ID NO: 29)
UMI reverse primer 3: 5'-GGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNTACTGCTGCTTGAAAATGGATTGTG-3' (SEQ ID NO: 30)
183 bp forward primer: 5'-CTTTCCCTACACGACGCTCTTCCGAT-3' (SEQ ID NO: 31)
183 bp reverse primer: 5'-GGAGTTCAGACGTGTGCTCTTCCGAT-3' (SEQ ID NO: 32)
Amplicon forward primer: 5'-CAAGCAGAAGACGGCATACGAGAT[i7]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC-3' (SEQ ID NO: 33; [i7] = Illumina TruSeq CD i7 indexes 1)
Amplicon reverse primer: 5'-AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' (SEQ ID NO: 34; [i5] = Illumina TruSeq CD i5 indexes 2)
Bisulfite conversion UMI forward primer: 5'-CTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNAAAAATCAAAAACTATAAACCTC-3' (SEQ ID NO: 35)
Bisulfite conversion UMI reverse primer: 5'-GGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNTTGTTGTTTGAAAATGGATTGTG-3' (SEQ ID NO: 36)

Table 3: Templates and PCR products used herein.
803 bp template DNA generated by PCR (SEQ ID NO: 37)
5'-TCTGTCTTTTCATCATTGGTTCTTTTATTATTTTTTAAACTTACATTTGTTTTTCTGAAACCGAGCTAA AAACTGTAGACATTGCTTCATTTAATGTTTAGCATTTCTGAGAAATCTTAGATCAGTTTGATTATAATTCT TTTATAAGAATGGTGTTTTTTCCTTCATAGATTCTCTGGAATTTTAAACATAACCTTGAAGTTCAAATTAT TCACCAAGACCTGACTAATATTTAGCCTCTTTTAAATAAGTTGTCTGCTGCTTGAAAATGGATTGTGCGTA AAGACGGAGGGTAATTATAGATATACCACCTAGTCTTCTTGATCCGAGGCCTACAGCTTTTGATCCCTTCT CTATCCCATTCTATCAACAATGTCAGAGTGATCCTTCTAAGTAGCATTATGACAATGTCACTCTGCAGCTT CAAATATTCAGGTGAATCTCCTCATCTATAAAATAAAGTCCAAAATTCTCAGCATGTAATATAAGTCTATT AATGTAATATATCCAAAACACTGACCATATCTTTGTTTATCTTTTACTTTGTGTCAGTTCTGGTTTTTACT GTTCCTAATAAAGTTTTTAATTTTTATTATTAATTTTTTTTAACCAGAGTTTGCTAACCACATTCATTTCT TTTTTTATTTATCCCAGCCTCTCAATTTCATTTTCCAGCTTAATATCACCATTTCTCCACTAGACTTCAAT ATTCTAGATTACGAATACAATAAGAAGAGCCTCTAGATAACCAACTGCAAATTAGAAACCAGTAATACAAA ATTGGGATTCAGTTGTGTCTAGGA-3'

109 bp PCR product and NGS amplicon target, OR10A2 olfactory receptor family 10 subfamily A member 2 [Homo sapiens (human)] (SEQ ID NO: 38)
5'-CTGCTGCTTGAAAATGGATTGTGCGTAAAGACGGAGGGTAATTATAGATATACCACCTAGTCTTCTTGA TCCGAGGCCTACAGCTTTTGATCCCTTCTCTATCCCATTC-3'

235 bp bisulfite conversion product (SEQ ID NO: 39)
5'-GATTTTTTGGAATTTTAAATATAATTTTGAAGTTTAAATTATTTATTAAGATTTGATTAATATTTAGTT TTTTTTAAATAAGTTGTTTGTTGTTTGAAAATGGATTGTGTGTAAAGATGGAGGGTAATTATAGATATATTATT TAGTTTTTTTGATTTGAGGTTTATAGTTTTTGATTTTTTTTTTATTTTATTTTATTAATAATGTTAGAGTGATT TTTTTAAGTAGTATTATG-3'

95 bp NGS bisulfite conversion target (SEQ ID NO: 40)
5'-TTGTTGTTTGAAAATGGATTGTGTGTAAAGATGGAGGGTAATTATAGATATATTATTTAGTTTTTTTGA TTTGAGGTTTATAGTTTTTGATTTTT-3'

Sequence region for 5mC detection (SEQ ID NO: 41)
5'-CGTAAAGACGGAGGGTAATTATAGATATACCACCTAGTCTTCTTGATCCGAGGCCTACAG-3'

(a) Primer binding sites highlighted in grey.
(b) C at CpG sites highlighted in bold.
(c) C positions highlighted by underlining.

Primer extension with radioactively labelled primer
The reaction mixtures contained 150 nM [γ-³²P]-labelled primer, 200 nM oligonucleotide template (C or 5mC) and 50 nM KTq
wild-type in 1× KTq reaction buffer (50 mM Tris-HCl (pH 9.2), 16 mM (NH₄)₂SO₄, 2.5 mM MgCl₂, 0.1% (v/v) Tween 20). Reaction mixtures were heated to 95°C for 2 min and then cooled stepwise to 4°C for annealing. The reaction mixtures were then incubated at 55°C. Primer extension was started by adding 50 µM of the respective dNTP in a final volume of 35 µL. After the indicated reaction times, 5 µL of the reaction mixture was stopped by mixing with 5 µL stop solution (80% formamide, 20 mM EDTA, 0.25% (w/v) bromophenol blue, 0.25% (w/v) xylene cyanol). After denaturation for 3 min at 95°C, reactions were analysed by 12% denaturing PAGE and visualised by phosphor imaging (Typhoon™ FLA 9500, GE Healthcare Life Sciences).

Library expression and lysate preparation in 96-well plates
The KTq variant libraries used in screening included all 19 possible single amino acid substitutions at each of the following positions: N485, E507, S515, K540, Y545, T569, A570, T571, R573, D578, N583, I584, V586, R587, I614, E615, L616, I638, H639, R659, R660, A661, K663, T664, I665, F667, G668, V669, L670, Y671, G672, M673, R677, E681, R728, A743, R746, M747, F749, N750, Q754, V783 and H784. In addition, 153 double mutants were generated and screened, designed by rational combination of functional single amino acid mutations. KTq libraries were prepared by site-directed mutagenesis of the respective codons as known in the art and stored as glycerol stocks in 96-deep-well plates. 2167 PCR-active variants, derived from a combinatorial library generated by random chimeragenesis on a transient template (RACHITT), were used directly for gene expression in 96-deep-well plates. For library expression, 990 µL LB medium supplemented with 100 µg/mL carbenicillin disodium salt was inoculated with 10 µL of overnight cultures of Escherichia coli (E. coli) BL21 (DE3) cells (Novagen) harbouring library plasmids.
Cells were grown at 37°C on a plate shaker to an OD₆₀₀ of 0.4-0.6, and gene expression was induced by the addition of IPTG (final concentration 0.4 mM). After incubation at 37°C for 3 h, cells were harvested by centrifugation at 4°C for 30 min. Pellets were lysed as known in the art. Lysates were stored at 4°C for up to four weeks and used without further purification for primer extension reactions.

Primer extension for determination of lysate dilution
The screening reaction conditions were chosen to lie within the optimal reaction window of KTq wild-type (Fig. 1). Under these conditions, primer elongation is complete after 10 min with 70 nM dGTP substrate, and primer elongation only begins after 15 min with 35 µM dATP substrate. To test all KTq variants under the same reaction conditions and lysate concentration, lysate dilutions were determined in a preliminary primer extension reaction. For this, KTq wild-type lysate was diluted in 1× KTq reaction buffer (50 mM Tris-HCl (pH 9.2), 16 mM (NH₄)₂SO₄, 2.5 mM MgCl₂, 0.1% (v/v) Tween 20) at ratios from 1:10 to 1:100. Lysate dilutions (final 20% (v/v)) were mixed with 100 nM oligonucleotide C template in 1× KTq reaction buffer and added to 10 nM fluorescently labelled primers. The primers varied in length, and each primer length was assigned to a different dilution ratio. Reaction mixtures were prepared twice. One set used 5'-6-carboxyfluorescein (FAM)-labelled primers for primer extension in the presence of the matching base guanine (G). The other used 5'-hexachlorofluorescein (HEX)-labelled primers for primer extension in the presence of the mismatching base adenine (A). Reaction mixtures were heated to 95°C for 2 min and then cooled to 4°C for annealing. Primer extension was started at 55°C by adding 70 nM dGTP or 35 µM dATP. After 5, 10 and 15 min, 5 µL of the reaction mixtures was stopped by mixing with 5 µL CE stop solution (80% (v/v) formamide, 20 mM EDTA).
After denaturation for 3 min at 95°C, single-nucleotide incorporation was analysed by CE. The working concentrations of dGTP (70 nM) and dATP (35 µM) had been determined in a preceding experiment by a dNTP dilution series using KTq wild-type lysate.

Primer extension in screening experiment
Screening experiments were performed with either dGTP or dATP as substrate for single-nucleotide incorporation. KTq variant lysates were diluted in 1× KTq buffer (50 mM Tris-HCl (pH 9.2), 16 mM (NH₄)₂SO₄, 2.5 mM MgCl₂, 0.1% (v/v) Tween 20) in 96-deep-well plates according to the predefined dilution ratios. For each column of diluted lysates, 4 µL was transferred twice, into two adjacent columns of a 96-well reaction plate (on ice). Six primers of different length (20, 25, 30, 35, 40 and 45 nt) and two different fluorescent dyes (5'-FAM and 5'-HEX) were used in the primer extension experiment. As a result, 48 lysates could be multiplexed and analysed in one 8-capillary CE run. Prior to the reaction, 10 nM fluorescently labelled primers were mixed in a 12-tube PCR strip with 100 nM oligonucleotide template in sufficient amount. The primers were sorted by size in ascending order, with the same primer length in consecutive tubes. 5'-FAM-labelled primers were combined with the C template and 5'-HEX-labelled primers with the 5mC template. The mixtures were heated for 2 min at 95°C and cooled to 4°C. The annealed primer/template pairs were mixed with 1× KTq reaction buffer, and 12 µL was distributed to each lysate row. Primer extension was started at 55°C by adding 70 nM dGTP or 35 µM dATP in a final volume of 20 µL. After 10 min, reactions were stopped by adding 20 µL CE stop solution (80% (v/v) formamide, 20 mM EDTA). After denaturation for 3 min at 95°C, single-nucleotide incorporation was analysed by CE.

Capillary electrophoresis (CE)
CE was used for separation and analysis of the extended 5'-fluorescently labelled primers.
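The multiplexing arithmetic of the screening experiment works as follows. There are six primer lengths and two dyes per capillary, eight capillaries per run, and 12 reactions of one plate row are pooled into one capillary. The mapping from reaction-plate well to lysate can be sketched as below. This is a minimal illustration under stated assumptions. The text only says that the same primer length occupies consecutive tubes, so the rule that odd columns carry the FAM/C reaction and even columns the HEX/5mC reaction is an assumption. The lysate numbering and all function names (`assign_well`, `capillary_is_resolvable`) are hypothetical.

```python
from dataclasses import dataclass

ROWS = tuple("ABCDEFGH")                       # 8 plate rows -> 8 capillaries
PRIMER_LENGTHS_NT = (20, 25, 30, 35, 40, 45)   # CE primers, SEQ ID NOs: 13-18


@dataclass(frozen=True)
class WellAssignment:
    lysate: int      # hypothetical 0-based lysate index (0..47)
    capillary: int   # 1..8, one capillary per plate row
    primer_nt: int   # primer length that separates lysates within a capillary
    dye: str         # "FAM" (C template) or "HEX" (5mC template)


def assign_well(row: str, column: int) -> WellAssignment:
    """Map a well of the 96-well reaction plate to lysate, capillary, primer and dye.

    Assumption: within each pair of adjacent columns (1-2, 3-4, ...) the odd
    column holds the FAM/C reaction and the even column the HEX/5mC reaction
    of the same lysate.
    """
    if row not in ROWS or not 1 <= column <= 12:
        raise ValueError(f"invalid well {row}{column}")
    r = ROWS.index(row)
    pair = (column - 1) // 2   # columns (1,2) -> 0, (3,4) -> 1, ...
    return WellAssignment(
        lysate=r * len(PRIMER_LENGTHS_NT) + pair,
        capillary=r + 1,
        primer_nt=PRIMER_LENGTHS_NT[pair],
        dye="FAM" if column % 2 == 1 else "HEX",
    )


def capillary_is_resolvable(row: str) -> bool:
    """True if every pooled reaction of a row carries a unique (length, dye) label."""
    labels = [(w.primer_nt, w.dye) for w in (assign_well(row, c) for c in range(1, 13))]
    return len(set(labels)) == len(labels)


if __name__ == "__main__":
    plate = [assign_well(r, c) for r in ROWS for c in range(1, 13)]
    print(len({w.lysate for w in plate}), "lysates per 8-capillary run")  # 48
    print(all(capillary_is_resolvable(r) for r in ROWS))                  # True
```

The check in `capillary_is_resolvable` expresses the design constraint behind the pooling step. Each of the 12 reactions combined into one CE well must be distinguishable by primer length (fragment size) and dye colour alone.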
For one CE run, 38 µL Hi-Di™ formamide (Thermo Fisher Scientific) mixed with 0.15% (v/v) GeneScan™ 120 LIZ Size Standard (Thermo Fisher Scientific) was added to each well of one column of a MicroAmp™ 96-well plate (Thermo Fisher Scientific). Then 1 µL of each of the 12 reactions from one row of the 96-well reaction plate was combined in a single well of the MicroAmp™ 96-well plate, giving a final volume of 50 µL per well. The MicroAmp™ 96-well plate was briefly centrifuged and placed into the Applied Biosystems 3500 Genetic Analyzer (Thermo Fisher Scientific) with an 8-capillary array (35 cm) filled with POP-6™ polymer (Thermo Fisher Scientific). The following parameters were applied for the CE run: G5 dye set, 60°C oven temperature, 1900 s run time, 13.0 kV run voltage, 180 s pre-run time, 13.0 kV pre-run voltage, 50 s injection time, 1.6 kV injection voltage and 200 s data delay. Qualitative CE data analysis was performed using GeneMapper™ Software 5.

KTq variant expression and protein purification
For KTq variant expression, 25 mL LB medium supplemented with 100 µg/mL carbenicillin disodium salt was inoculated with 250 µL of overnight cultures of E. coli BL21 (DE3) cells (Novagen) harbouring gene plasmids. 5 mL of the remaining overnight cultures was used for plasmid extraction and purification using the QIAprep® Spin Miniprep Kit (Qiagen). KTq variant plasmids were analysed by Sanger sequencing (Azenta Life Sciences); the mutation sites are listed in Table 4. The inoculated media were incubated at 37°C on a shaker to an OD₆₀₀ of 0.4-0.6, and gene expression was induced by the addition of IPTG (final concentration 1 mM). After incubation at 37°C with shaking for 4 h, cells were harvested by centrifugation at 4°C for 30 min. Pellets were lysed at 37°C for 20 min in 15 mL 1× KTq basis buffer (10 mM Tris-HCl (pH 9.2), 300 mM NaCl, 2.5 mM MgCl₂, 0.1% (v/v) Triton X-100) containing 1 mg/mL lysozyme. E. coli host proteins were then heat-denatured at 75°C for 40 min, and bacterial cell debris was pelleted by centrifugation at 20000 rpm for 45 min at 4°C. For purification of the 6×His-tagged protein, the supernatant was supplemented with 5 mM imidazole. Metal-ion affinity purification was performed using 0.5 mL of equilibrated cOmplete™ His-Tag Purification Resin (Roche). Before use, the Ni²⁺ chelate resin was washed and equilibrated four times with 9 mL 1× KTq basis buffer containing 5 mM imidazole, each time by mixing and subsequent centrifugation at 900 rpm for 2 min at 4°C. The lysate/nickel bead suspension was incubated overnight at 4°C in an overhead shaker. After centrifugation at 900 rpm for 2 min at 4°C, the supernatant was removed. The Ni²⁺ chelate resin was then washed twice with 15 mL 1× KTq basis buffer containing 20 mM imidazole. For protein elution, the Ni²⁺ chelate resin was incubated for 30 min at 4°C in an overhead shaker with 5 mL 2× elution buffer (100 mM Tris-HCl (pH 9.2), 5 mM MgCl₂) containing 100 mM imidazole. The elution fraction was collected after centrifugation, and the elution step was repeated with an additional 5 mL 2× elution buffer. Imidazole was removed from the combined elution fractions using Amicon® Ultra centrifugal filters (30,000 MWCO; Merck), washing four times with 10 mL 2× elution buffer at 4°C. Finally, the eluate was concentrated to a final volume of 0.1 to 0.3 mL. For storage at -20°C, 1× KTq storage buffer (5 mM Tris-HCl (pH 9.2), 16 mM (NH₄)₂SO₄, 0.25 mM MgCl₂, 0.1% (v/v) Tween 20) was added to a final glycerol content of 50% (v/v). Protein concentrations were determined by Bradford assay. The adjusted protein concentrations and the purity of the enzymes were verified by SDS-PAGE.

[Table 4: Mutation sites of the KTq variants]

Primer extension with fluorescently labelled primer and purified KTq variant
Primer extension reactions with purified KTq variants were carried out as in the screening experiment, except that one row of the reaction plate (one capillary) was assigned to one DNA polymerase variant. In short, reaction mixtures contained 2.5 nM KTq variant, 100 nM C or 5mC oligonucleotide template and 10 nM fluorescently labelled primer in 1× KTq buffer (50 mM Tris-HCl (pH 9.2), 16 mM (NH₄)₂SO₄, 2.5 mM MgCl₂, 0.1% (v/v) Tween 20) in a final volume of 20 µL.
Reactions were started at 55°C by adding dNTPs. For single-nucleotide incorporation experiments, 35 nM, 70 nM or 100 nM dGTP was added as matched substrate, and 35 µM, 50 µM or 70 µM dATP as mismatched substrate. For multiple-nucleotide incorporation experiments, one of two dNTP mixtures was added: 100 nM dGTP and 100 nM dCTP, or 70 µM dATP and 10 µM dCTP. Reactions were stopped after 10 min by adding 20 µL CE stop solution (80% (v/v) formamide, 20 mM EDTA). After denaturation for 3 min at 95°C, extended primers were analysed by CE.

PCR activity of purified KTq variants
Reaction mixtures for quantitative real-time PCR (qPCR) contained the following components in 1× KTq reaction buffer (50 mM Tris-HCl (pH 9.2), 16 mM (NH₄)₂SO₄, 2.5 mM MgCl₂, 0.1% (v/v) Tween 20):
20 µM dGTP and 200 µM each of dATP, dTTP and dCTP
400 nM each of the 109 bp forward and reverse primers
50 pM 803 bp C template generated by PCR
1× SYBR Green I (Sigma-Aldrich)
250 nM purified KTq variant
qPCR was performed in 10 µL using the LightCycler® 96 instrument (Roche Diagnostics). After an initial denaturation at 95°C for 1 min, 30 amplification cycles were run with denaturation at 95°C for 10 s, annealing at 62°C for 30 s and elongation at 72°C for 4 min. High-resolution melting curves were recorded immediately after PCR amplification. qPCR data were analysed with the LightCycler® 96 Application Software (version 1.1.0.1320), and quantification cycle (Cq) values were determined using the predefined fluorescence threshold of 0.2. Formation of the correct 109 bp amplicon was confirmed by comparing the melting curves of the PCR products obtained with KTq variants to the melting curve profile of the PCR product obtained with KTq wild-type.
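The qPCR readout rests on two simple calculations. The first is the number of template molecules supplied (concentration × volume × Avogadro constant). The second is the cycle at which the fluorescence curve crosses a fixed threshold. The sketch below illustrates both under stated assumptions. The threshold function implements a generic first-crossing rule with linear interpolation between cycles. It is not necessarily the algorithm used by the LightCycler® 96 software. The fluorescence values in the usage lines are hypothetical. The function names `template_copies` and `threshold_cq` are illustrative only.

```python
from typing import Optional, Sequence

AVOGADRO = 6.02214076e23  # 1/mol (exact SI value)


def template_copies(conc_molar: float, volume_l: float) -> float:
    """Number of template molecules in a reaction: n = c * V * N_A."""
    return conc_molar * volume_l * AVOGADRO


def threshold_cq(fluorescence: Sequence[float], threshold: float) -> Optional[float]:
    """Fractional cycle at which an amplification curve first reaches `threshold`.

    fluorescence[i] is the (baseline-corrected) signal read after cycle i + 1.
    The crossing is located by linear interpolation between the last cycle
    below the threshold and the first cycle at or above it. Returns None if
    the threshold is never reached (no detectable amplification).
    """
    for i, value in enumerate(fluorescence):
        if value >= threshold:
            if i == 0:
                return 1.0  # already above threshold after the first cycle
            prev = fluorescence[i - 1]  # signal after cycle i, still below threshold
            return i + (threshold - prev) / (value - prev)
    return None


if __name__ == "__main__":
    # 50 pM of the 803 bp template in a 10 uL qPCR, as in the text
    print(f"{template_copies(50e-12, 10e-6):.3g} template molecules")  # 3.01e+08
    # Hypothetical amplification curve, threshold 0.2 as in the text
    print(threshold_cq([0.01, 0.02, 0.05, 0.1, 0.3, 0.6], 0.2))
```

Under these assumptions, each 10 µL reaction starts from about 3 × 10⁸ template molecules. A variant with lower PCR activity would cross the fixed threshold later, yielding a larger Cq.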
Amplification of human genomic DNA for generation of the 803 bp template
Reaction mixtures contained 200 µM of each dNTP, 500 nM each of the 803 bp forward and reverse primers, 15 ng/µL HeLa gDNA (New England Biolabs) and 0.02 U/µL Q5® High-Fidelity DNA Polymerase in 1× Q5® Reaction Buffer (New England Biolabs). PCR was performed in six separate 50 µL reaction mixtures. After an initial denaturation at 98°C for 3 min, 30 amplification cycles were run with denaturation at 98°C for 10 s, annealing at 62°C for 30 s and elongation at 72°C for 30 s. Final elongation was performed for 2 min at 72°C. The 803 bp PCR product was purified by preparative agarose gel electrophoresis using the NucleoSpin® Gel and PCR Clean-up kit (Macherey-Nagel) according to the manufacturer's instructions. The extracted DNA was combined and treated with PreCR® Repair Mix (New England Biolabs) for DNA damage repair according to the manufacturer's instructions (in short: 100 µM of each dNTP, 1× NAD⁺, 1 µL PreCR® Repair Mix in 1× ThermoPol® Reaction Buffer for 20 min at 37°C). The repaired DNA product was purified using the QIAEX II system (Qiagen) according to the manufacturer's instructions and eluted in 20 µL Milli-Q water. The DNA sequence was verified by Sanger sequencing (Azenta Life Sciences). Part of the purified and repaired PCR product was used as the unmodified 803 bp C template, and the remainder was used directly to generate the methylated template DNA.

Methylation of template DNA generated by PCR
Reaction mixtures for CpG methylation of the PCR product contained 600 µM S-adenosylmethionine and 12 U CpG methyltransferase (M.SssI) in 1× NEBuffer™ 2 (New England Biolabs). The methylation reaction was performed with purified and repaired 803 bp template DNA in a final volume of 30 µL by incubation at 37°C for 1 h. The reaction mixture was purified using the QIAEX II system (Qiagen) according to the manufacturer's instructions, and the DNA was eluted in 20 µL Milli-Q water.
The methylation reaction and purification step were repeated seven times to achieve full CpG methylation. CpG methylation of the modified 803 bp 5mC template DNA was verified by bisulfite conversion and NGS (Fig. 2).

Bisulfite conversion

10 µL of template DNA generated by PCR or of HeLa gDNA (New England Biolabs) was bisulfite converted using the EpiMark® Bisulfite Conversion Kit (New England Biolabs), then immediately desulphonated and purified according to the manufacturer's instructions. Next, the 5'-3' strand of the bisulfite-treated DNA was amplified in two consecutive PCRs. Reaction mixtures for the first PCR contained 200 µM dNTPs (each), 100 nM bisulfite conversion forward and reverse primer, 10 µL bisulfite-converted DNA and 0.025 U/µL EpiMark® Hot Start Taq DNA Polymerase in 1× EpiMark® Hot Start Taq Reaction Buffer (New England Biolabs). PCR was performed in 50 µL with an initial denaturation at 95°C for 30 s, followed by 30 amplification cycles of denaturation at 95°C for 20 s, annealing at 49.2°C for 30 s and elongation at 68°C for 40 s. Next, 10 µL of this PCR reaction was used as template for a second PCR under the same reaction conditions in 50 µL. Five of these reactions were performed and combined. Formation of the 235 bp product was verified by agarose gel electrophoresis, and DNA was purified using the NucleoSpin® Gel and PCR Clean-up kit (Macherey-Nagel) according to the manufacturer's instructions. 285 pM of bisulfite-converted and amplified 5'-3' strand dsDNA was used for NGS library preparation following the protocol for gDNA-based samples. Methylation status at CpG sites was analysed by NGS (Fig. 2).
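Bisulfite treatment converts unmethylated C to U, which is read as T after PCR, whereas 5mC is protected and remains C. The methylation level at a CpG site of the converted strand can therefore be estimated as the fraction C/(C + T) of base calls at that position. The sketch below illustrates this under simplifying assumptions (reads already aligned gap-free to the converted reference, 0-based positions); the function name `cpg_methylation_levels` is hypothetical, and this is not the NGS analysis pipeline used here.

```python
from collections import Counter
from typing import Dict, Iterable, List


def cpg_methylation_levels(reads: Iterable[str], cpg_positions: List[int]) -> Dict[int, float]:
    """Estimate methylation at each CpG C position as C / (C + T) base calls.

    Hypothetical helper: reads are assumed to be aligned without gaps to the
    bisulfite-converted 5'-3' strand, and positions are 0-based.
    """
    counts = {pos: Counter() for pos in cpg_positions}
    for read in reads:
        for pos in cpg_positions:
            if pos < len(read):
                counts[pos][read[pos]] += 1
    levels = {}
    for pos, base_counts in counts.items():
        informative = base_counts["C"] + base_counts["T"]  # other bases are ignored
        levels[pos] = base_counts["C"] / informative if informative else float("nan")
    return levels


reads = ["ACGT", "ATGT", "ACGT", "ACGA"]
print(cpg_methylation_levels(reads, [1]))  # three of four reads retain C at position 1
```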
Linear amplification of the 5'-3' strand of template DNA generated by PCR

Reaction mixtures for linear PCR contained 2 µM dGTP and 200 µM d(A/T/C)TP (each), 100 nM 109 bp forward primer, 50 pM 803 bp C or 5mC template and 250 nM KTq variant in 1× KTq reaction buffer (50 mM Tris-HCl (pH 9.2), 16 mM (NH4)2SO4, 2.5 mM MgCl2, 0.1% (v/v) Tween 20). PCR was performed in 25 µL with an initial denaturation at 95°C for 3 min, followed by 20 amplification cycles of denaturation at 95°C for 10 s, annealing at 62°C for 30 s and elongation at 72°C for 4 min. Final elongation was performed for 10 min at 72°C. Single-stranded DNA (ssDNA) PCR product of 109 to 364 nt was purified by preparative agarose gel electrophoresis using the NucleoSpin® Gel and PCR Clean-up kit (Macherey-Nagel) in combination with the NTC binding buffer according to the manufacturer's instructions, with elution in 22 µL Milli-Q water. To verify that no original template DNA had been co-extracted, 1 µL of purified PCR product was subsequently used in a purity-check PCR. Reaction mixtures contained 200 µM dNTPs (each), 500 nM 803 bp forward and reverse primer, 10% (v/v) PCR product as template and 0.02 U/µL Q5® Hot Start High-Fidelity DNA Polymerase in 1× Q5® Reaction Buffer (New England Biolabs). PCR was performed in 10 µL reaction mixtures with an initial denaturation at 98°C for 1 min, followed by 25 amplification cycles of denaturation at 98°C for 10 s, annealing at 62°C for 30 s and elongation at 72°C for 30 s. Final elongation was performed for 2 min at 72°C. The absence of the 803 bp PCR product was verified by agarose gel electrophoresis.

NGS library preparation using template DNA generated by PCR

17.5 µL of purified linear PCR product ssDNA was treated with the PreCR® Repair Mix (New England Biolabs) in 20 µL according to the manufacturer's instructions. Without further purification, the reaction mixture was used in a 2-cycle UMI PCR.
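One likely reason for the single-primer (linear) amplification described above (an interpretation, not stated in the text) can be made concrete with idealised yield arithmetic: with only the forward primer, every new strand is copied directly from an original template strand, which is the only strand carrying 5mC because PCR copies are unmethylated, and each template strand yields at most one copy per cycle. The sketch below compares this with two-primer (exponential) PCR, assuming a constant per-cycle efficiency; the function names are hypothetical.

```python
def linear_new_strands(template_strands: float, cycles: int, efficiency: float = 1.0) -> float:
    """New strands from single-primer (linear) amplification.

    Idealised model (assumption): every cycle copies each original template
    strand with the same efficiency; products are never used as templates.
    """
    return template_strands * efficiency * cycles


def exponential_new_strands(template_strands: float, cycles: int, efficiency: float = 1.0) -> float:
    """New strands from two-primer (exponential) PCR under the same idealised model."""
    return template_strands * ((1.0 + efficiency) ** cycles - 1.0)


# 20 cycles, as in the linear PCR above, starting from a single template strand
print(linear_new_strands(1, 20))       # at most 20 direct copies of the (methylated) template
print(exponential_new_strands(1, 20))  # copies of copies dominate the product pool
```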
Reaction mixtures for UMI introduction contained 200 µM dNTPs (each), 200 nM forward and reverse UMI primer, 0.02 U/µL Q5® Hot Start High-Fidelity DNA Polymerase and 0.2× Q5® Reaction Buffer (New England Biolabs) in a final volume of 25 µL. PCR was performed with an initial denaturation at 98°C for 1 min, followed by 2 primer elongation cycles of denaturation at 98°C for 10 s, annealing at 48°C for 30 s and elongation at 72°C for 30 s. Final elongation was performed for 2 min at 72°C. The 183 bp UMI PCR product was purified by preparative agarose gel electrophoresis using the NucleoSpin® Gel and PCR Clean-up kit (Macherey-Nagel) according to the manufacturer's instructions. To improve purification, NTI binding buffer was diluted 1:1.5 with Milli-Q water for solubilising the agarose gel slices, and the dissolved mixture was diluted to a total of 1:2.5 prior to DNA binding on the purification column. Purified UMI PCR product was eluted in 15 µL Milli-Q water. DNA concentration was determined by qPCR, with absolute quantification based on standard amplification curves from a tenfold dilution series of previously prepared 183 bp reference DNA. Reaction mixtures contained 0.96× NEBNext® Ultra™ II Q5® Master Mix (New England Biolabs), 400 nM 183 bp forward and reverse primer, 1× SYBR Green I (Sigma-Aldrich) and 1.85 µL reference template or UMI PCR product eluate in a final volume of 5 µL. qPCR was performed in a LightCycler® 96 instrument (Roche Diagnostics) with an initial denaturation at 98°C for 1 min, followed by 30 amplification cycles of denaturation at 98°C for 10 s, annealing at 60°C for 30 s and elongation at 72°C for 30 s. High-resolution melting curves were measured immediately after PCR amplification. For absolute quantification, Cq values of reactions with reference template were plotted against the logarithm of their DNA concentrations, and the linear regression function was used to calculate the concentration of the UMI PCR product.
Calculated concentrations were multiplied by two, because the reference template DNA was double-stranded whereas only one strand of the UMI PCR product carried both primer binding sites. Reactions for reference and UMI DNA were performed in duplicate, and a no-template control reaction was performed to determine side-product signals. The accepted reaction efficiency range for the standard curve was 0.9 to 1.1, and the cut-off for the correlation coefficient of the linear regression was 0.98. 67.63 fM of purified UMI PCR product was treated with the PreCR® Repair Mix (New England Biolabs) in 11.25 µL according to the manufacturer's instructions. Without further purification, the reaction mixture was used in the Amplicon PCR. Reaction mixtures for Amplicon library preparation contained 1× NEBNext® Ultra™ II Q5® Master Mix (New England Biolabs), 400 nM forward and reverse Amplicon primer (containing Illumina TruSeq adapter and index sequences) and 26.57 fM UMI PCR product (for final analysis of approximately 100,000 UMI families) in a final volume of 25 µL. PCR was performed with an initial denaturation at 98°C for 1 min, followed by 35 amplification cycles of denaturation at 98°C for 10 s, annealing at 70°C for 30 s and elongation at 72°C for 30 s. Final elongation was performed for 2 min at 72°C. The 263 bp Amplicon PCR product was purified by preparative agarose gel electrophoresis using the NucleoSpin® Gel and PCR Clean-up kit (Macherey-Nagel) according to the manufacturer's instructions. To improve purification, NTI binding buffer was diluted 1:1.5 with Milli-Q water for solubilising the agarose gel slices, and the dissolved mixture was diluted to a total of 1:6.5 prior to DNA binding on the purification column. The PCR product was eluted in 20 µL Milli-Q water and treated with the PreCR® Repair Mix (New England Biolabs) in 20 µL according to the manufacturer's instructions.
The repaired product DNA was purified using the QIAEX II system (Qiagen) according to the manufacturer's instructions and eluted in 18 µL Milli-Q water. DNA library concentrations were determined using the Quantus™ Fluorometer (Promega). After quality control by electrophoresis on the Bioanalyzer 2100 system (Agilent), DNA libraries were pooled equimolarly, and sequencing was performed in paired-end mode on an Illumina MiSeq™ or NextSeq 2000 system with 2 × 75 bp read length. Each NGS library was prepared and sequenced once.

Linear amplification of the 5'-3' strand of human genomic DNA

Reaction mixtures for linear PCR contained 10 µM dGTP and 200 µM d(A/T/C)TP (each), 100 nM 109 bp forward primer, template DNA (3 fM 803 bp C or 5mC template generated by PCR, or 62.5 ng native or CpG-methylated HeLa gDNA (New England Biolabs)) and 300 nM RIV A8 KTq variant in 1× KTq reaction buffer (50 mM Tris-HCl (pH 9.2), 16 mM (NH4)2SO4, 2.5 mM MgCl₂, 0.1% (v/v) Tween 20) in a final volume of 25 µL. PCR was performed with an initial denaturation at 95°C for 3 min, followed by 20 amplification cycles of denaturation at 95°C for 10 s, annealing at 60°C for 30 s and elongation at 72°C for 10 min. Reaction mixtures were purified using the NucleoSpin® Gel and PCR Clean-up XS kit (Macherey-Nagel) in combination with the NTC binding buffer according to the manufacturer's instructions for ssDNA PCR product purification. DNA was eluted with 18 µL Elution Buffer NE (5 mM Tris-HCl (pH 8.5)). To use the linear PCR product as template for NGS library preparation, the ssDNA was exponentially amplified by a high-fidelity DNA polymerase to increase the 109 bp product concentration. Reaction mixtures contained 200 µM dNTPs (each), 100 nM 109 bp forward and reverse primer, 67% (v/v) of the PCR eluate and 0.02 U/µL Q5® Hot Start High-Fidelity DNA Polymerase in 1× Q5® Reaction Buffer (New England Biolabs).
PCR was performed in 25 µL reaction mixtures with an initial denaturation at 98°C for 30 s, followed by 10 amplification cycles of denaturation at 98°C for 5 s, annealing at 62°C for 10 s and elongation at 72°C for 5 s. Reaction mixtures were purified using the NucleoSpin® Gel and PCR Clean-up XS kit (Macherey-Nagel) with a 1:2 dilution of the NTI buffer, following the manufacturer's instructions, to optimise PCR product purification and yield. DNA was eluted with 18 µL Elution Buffer NE. Note: all reactions were performed in Thermo Scientific Low Profile Tubes (Thermo Scientific), and DNA was eluted into Eppendorf DNA LoBind® Tubes (Eppendorf).

NGS library preparation using human genomic DNA

16.25 µL of purified linear PCR product dsDNA was used as template for a 2-cycle UMI PCR. Reaction mixtures for UMI introduction contained 200 µM dNTPs (each), 200 nM forward and reverse UMI primer, 0.02 U/µL Q5® Hot Start High-Fidelity DNA Polymerase and 1× Q5® Reaction Buffer (New England Biolabs) in a final volume of 25 µL. PCR was performed with an initial denaturation at 98°C for 30 s, followed by 2 primer elongation cycles of denaturation at 98°C for 10 s, annealing at 48°C for 30 s and elongation at 72°C for 30 s. Final elongation was performed for 2 min at 72°C. The 183 bp UMI PCR product was purified by preparative agarose gel electrophoresis using the NucleoSpin® Gel and PCR Clean-up XS kit (Macherey-Nagel) according to the manufacturer's instructions. Purified UMI PCR product was eluted with 18 µL Elution Buffer NE. DNA concentration was determined by qPCR as described for NGS library preparation using template DNA generated by PCR. DNA repair using the PreCR® Repair Mix (New England Biolabs) and the Amplicon PCR were performed as described for NGS library preparation using template DNA generated by PCR.
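The template inputs of the linear PCR on human genomic DNA described above (3 fM of the 803 bp PCR template or 62.5 ng of HeLa gDNA, each in 25 µL) can be converted into molecule numbers to compare them. The sketch below does this arithmetic; the figure of about 3.3 pg per haploid human genome is a commonly used approximation and an assumption here (it yields haploid genome equivalents, not cell numbers), and the helper names are hypothetical.

```python
AVOGADRO = 6.02214076e23   # mol^-1, exact SI value
HAPLOID_GENOME_PG = 3.3    # approximate mass of one haploid human genome in pg (assumption)


def molecules(concentration_molar: float, volume_litre: float) -> float:
    """Number of molecules in a given volume of a solution of known molarity."""
    return concentration_molar * volume_litre * AVOGADRO


def haploid_genome_equivalents(mass_ng: float, genome_pg: float = HAPLOID_GENOME_PG) -> float:
    """Approximate number of haploid genome copies (one copy of a single-copy locus each)."""
    return mass_ng * 1000.0 / genome_pg


pcr_template = molecules(3e-15, 25e-6)          # 3 fM 803 bp template in 25 uL
gdna_copies = haploid_genome_equivalents(62.5)  # 62.5 ng HeLa gDNA
print(f"{pcr_template:.0f} template molecules vs about {gdna_copies:.0f} genome equivalents")
```

Under these assumptions, the gDNA reaction holds roughly 2.4-fold fewer copies of the target locus than the 3 fM PCR-template reaction.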
The Amplicon PCR was performed with an initial denaturation at 98°C for 30 s, followed by 30 amplification cycles of denaturation at 98°C for 10 s, annealing at 70°C for 30 s and elongation at 72°C for 30 s. Final elongation was performed for 2 min at 72°C. The 263 bp Amplicon PCR product was purified by preparative agarose gel electrophoresis using the NucleoSpin® Gel and PCR Clean-up XS kit (Macherey-Nagel) according to the manufacturer's instructions. To improve purification, undiluted NTI binding buffer was first used for solubilising the agarose gel slices, and the dissolved mixture was subsequently diluted to a total of 1:6.5 with Milli-Q water prior to DNA binding on the purification column. The PCR product was eluted with Elution Buffer NE. Treatment of the extracted DNA with PreCR® Repair Mix (New England Biolabs) and purification with the QIAEX II system (Qiagen) were performed as described for NGS library preparation using template DNA generated by PCR. The repaired product DNA was eluted with 22 µL Milli-Q water. DNA library concentration determination, quality control and sequencing were performed as described for NGS library preparation using template DNA generated by PCR. Note: all reactions were performed in Thermo Scientific Low Profile Tubes (Thermo Scientific), and DNA was eluted into Eppendorf DNA LoBind® Tubes (Eppendorf).

NGS data processing for the detection of 5mC

Sequencing data quality control, processing and error calculation were performed using the KNIME Analytics Platform software. First, raw sequences and quality values were extracted from the FASTQ files, and Phred quality scores (Q scores) were transformed into base-calling error probabilities (P) using:

$$P = 10^{-Q/10}$$

The data were pre-processed by defining the UMI sequence context and converting the Read 1 sequence and its P values into the reverse-complement orientation.
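The quality-score conversion and the reorientation of Read 1 can be sketched in a few lines. The workflow itself was implemented in KNIME; the Python below is only an illustrative equivalent, assuming the standard Phred+33 encoding of FASTQ quality strings used by current Illumina instruments. Function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def error_probabilities(quality: str, offset: int = 33) -> List[float]:
    """Convert a FASTQ quality string into per-base error probabilities P = 10^(-Q/10)."""
    return [10 ** (-(ord(symbol) - offset) / 10) for symbol in quality]


def reverse_complement_read(sequence: str, probs: List[float]) -> Tuple[str, List[float]]:
    """Reverse-complement a read and reverse its error probabilities so they stay base-aligned."""
    return sequence.translate(_COMPLEMENT)[::-1], probs[::-1]


# '+', '5', '?' and 'I' encode Q10, Q20, Q30 and Q40 under Phred+33
probs = error_probabilities("+5?I")
seq_rc, probs_rc = reverse_complement_read("AACG", probs)
print(seq_rc, probs_rc)
```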
Read 1 and Read 2 were aligned to reconstruct the expected template length and merged into one sequence; within the overlapping segment of Read 1 and Read 2, the base call with the lower error probability was retained. High-quality data were obtained by removing reads containing any base within the UMI contexts with a P value above the threshold, and by removing merged reads whose mean error probability over all bases in the sequence context exceeded the threshold value. For each filtered read, deletions and insertions were corrected to adjust frameshifts using the Levenshtein distance between the read and the reference template. Additionally, N base calls were replaced by reference bases to prevent false-positive error detection. Next, reads were aligned to the reference template sequence, and reads with a misalignment higher than 6% or 12% were removed from the data set. For error calculation, reads were sorted into UMI families of identical UMIs, and analysis proceeded with UMI families containing at least three reads. First, the error of each UMI family was calculated by averaging the error at each sequence position over all reads in that family. Applying an error cut-off of 0.9, family errors at or above the cut-off were then set to 1 (true KTq-derived errors) and those below the cut-off were set to 0. The mean error over all UMI families was calculated at each sequence position, yielding the KTq-based error rate. 5mC detection was facilitated by comparing the error rates of the unmodified C template data set with the error rates of the 5mC template sequencing data. Calculating the error difference with

$$\Delta\,\text{error rate} = \text{error rate}_{\text{5mC template}} - \text{error rate}_{\text{C template}}$$

results in an increased Δ error rate at 5mC positions. Base calls were analysed using sequencing data that had been grouped into UMI families but not yet processed with the error cut-off.
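The family-level error calculation and the Δ error rate can be sketched as follows. This is a simplified illustration, not the KNIME workflow used: it assumes reads have already been merged, quality-filtered, indel-corrected and aligned gap-free to the reference, treats a family error of at least 0.9 as polymerase-derived, and uses hypothetical function names.

```python
from typing import Dict, List


def family_error_profile(reads: List[str], reference: str) -> List[float]:
    """Fraction of reads in one UMI family that differ from the reference at each position."""
    return [sum(read[i] != base for read in reads) / len(reads) for i, base in enumerate(reference)]


def ktq_error_rates(families: Dict[str, List[str]], reference: str,
                    min_reads: int = 3, cutoff: float = 0.9) -> List[float]:
    """Mean of binarised family errors at each reference position."""
    binarised = []
    for reads in families.values():
        if len(reads) < min_reads:
            continue  # families with fewer than three reads are not analysed
        profile = family_error_profile(reads, reference)
        # Usual UMI rationale (assumption): an error shared by (almost) all reads of a
        # family arose in the original copy (1); sporadic errors from later PCR or
        # sequencing are set to 0.
        binarised.append([1.0 if e >= cutoff else 0.0 for e in profile])
    if not binarised:
        raise ValueError("no UMI family passed the minimum read filter")
    return [sum(column) / len(binarised) for column in zip(*binarised)]


def delta_error_rates(rates_5mc: List[float], rates_c: List[float]) -> List[float]:
    """Delta error rate = error rate (5mC template) - error rate (C template), per position."""
    return [m - c for m, c in zip(rates_5mc, rates_c)]


reference = "ACGT"
families = {
    "UMI1": ["ATGT", "ATGT", "ATGT"],  # error at position 1 in every read
    "UMI2": ["ACGT", "ACGT", "ATGT"],  # sporadic error in one read only
    "UMI3": ["ATGT", "ATGT"],          # too few reads, ignored
}
print(ktq_error_rates(families, reference))
```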
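The coverage and UMI family number used for error calculation of each NGS library are listed in Table 5. P-values for statistical analysis in Figure 17 were determined using the Wilcoxon-Mann-Whitney (WMW) test (GraphPad Prism Version 6.00).

Table 5: The coverage and UMI family number used for error calculation of NGS libraries.

Bisulfite conversion of template DNA

| Figure | Illumina system | Template | Coverage | UMI families |
|---|---|---|---|---|
| Fig. 2 | MiSeq | 803 bp C template | 1173293 | 189832 |
| Fig. 2 | MiSeq | 803 bp 5mC template | 1091292 | 180331 |
| Fig. 2 | NextSeq 2000 | gDNA native | 2983694 | 155223 |
| Fig. 2 | NextSeq 2000 | gDNA mCpG | 4191504 | 89429 |

Template DNA generated by PCR

| Figure | Illumina system | KTq variant | dNTP concentration | Template | Coverage | UMI families |
|---|---|---|---|---|---|---|
| Figs. 3, 4, 15 | NextSeq 2000 | KTq wild-type | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp C template | 1159368 | 137314 |
| Figs. 3, 4, 15 | NextSeq 2000 | KTq wild-type | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp 5mC template | 1113872 | 158298 |
| Figs. 3, 4, 15 | NextSeq 2000 | RIII H20 | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp C template | 1233697 | 139850 |
| Figs. 3, 4, 15 | NextSeq 2000 | RIII H20 | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp 5mC template | 662646 | 21464 |
| Figs. 3, 4, 15 | NextSeq 2000 | RIV A8 | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp C template | 762172 | 103562 |
| Figs. 3, 4, 15 | NextSeq 2000 | RIV A8 | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp 5mC template | 858997 | 138209 |
| Figs. 3, 4, 15 | NextSeq 2000 | RIV D15 | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp C template | 911981 | 129173 |
| Figs. 3, 4, 15 | NextSeq 2000 | RIV D15 | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp 5mC template | 1437209 | 143204 |
| Fig. 16 | MiSeq | RIV A8 | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp C template | 1993496 | 72857 |
| Fig. 16 | MiSeq | RIV A8 | 2 µM dGTP & 200 µM d(A/T/C)TP | 803 bp 5mC template | 2008642 | 75481 |

Human genomic DNA

| Figure | Illumina system | KTq variant | dNTP concentration | Template | Coverage | UMI families |
|---|---|---|---|---|---|---|
| Fig. 19 | NextSeq 2000 | RIV A8 | 10 µM dGTP & 200 µM d(A/T/C)TP | 803 bp C template | 1896898 | 150058 |
| Fig. 19 | NextSeq 2000 | RIV A8 | 10 µM dGTP & 200 µM d(A/T/C)TP | 803 bp 5mC template | 1805204 | 206996 |
| Fig. 19 | NextSeq 2000 | RIV A8 | 10 µM dGTP & 200 µM d(A/T/C)TP | gDNA native | 1375518 | 78540 |
| Fig. 19 | NextSeq 2000 | RIV A8 | 10 µM dGTP & 200 µM d(A/T/C)TP | gDNA mCpG | 1668979 | 155454 |

Error rate processing for the detection of 5mC in human genomic DNA

Error rates calculated from sequencing data of human genomic DNA-based NGS libraries were processed further for 5mC detection by standardisation with an adapted z-score. For this, the mean and the standard deviation were calculated only from error rates at C positions, excluding values from CpG sites (where increased misincorporation would arise). Error rates were standardised using:

$$z = \frac{\text{error rate} - \mu_{\mathrm{C}}}{\sigma_{\mathrm{C}}}$$

where μ_C and σ_C are the mean and standard deviation of the error rates at C positions outside CpG sites.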
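The adapted z-score standardisation, together with the template-to-template difference Δz, can be sketched as follows. Assumptions: CpG positions are taken as every C directly followed by G in the reference, the sample standard deviation is used, and all names are hypothetical; the original analysis may differ in these details.

```python
import statistics
from typing import List, Set


def cpg_c_positions(reference: str) -> Set[int]:
    """0-based positions of C bases that are directly followed by G (CpG sites)."""
    return {i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"}


def adapted_z_scores(error_rates: List[float], reference: str) -> List[float]:
    """Standardise all error rates against the background at non-CpG C positions."""
    cpg = cpg_c_positions(reference)
    background = [rate for i, (rate, base) in enumerate(zip(error_rates, reference))
                  if base == "C" and i not in cpg]
    mu = statistics.mean(background)      # needs at least one background C
    sigma = statistics.stdev(background)  # sample SD (assumption); needs two or more background Cs
    return [(rate - mu) / sigma for rate in error_rates]


def delta_z(z_5mc: List[float], z_c: List[float]) -> List[float]:
    """Delta z = z(5mC template) - z(C template) at each position."""
    return [a - b for a, b in zip(z_5mc, z_c)]


reference = "CACACG"  # the C at index 4 is a CpG site
rates = [0.01, 0.0, 0.03, 0.0, 0.10, 0.0]
print(adapted_z_scores(rates, reference))
```

Because the same background statistics are recomputed for each template, Δz compares positions on a common relative scale even when absolute error rates differ between libraries.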

With this standardisation, the z-score gave the number of standard deviations by which each error rate lay above or below the mean error rate opposite unmodified C bases. 5mC detection thereby became independent of absolute error rates and was performed by comparing the z-scores of data from the unmodified C template with the z-scores of data from the modified template. Calculating the z-score difference with

$$\Delta z = z_{\text{5mC template}} - z_{\text{C template}}$$

results in an increased Δ z-score at 5mC positions. P-values for statistical analysis in Figure 20 were determined using the Wilcoxon-Mann-Whitney (WMW) test (GraphPad Prism Version 6.00).

EXAMPLE 1: Screening for DNA polymerase variants with altered fidelity opposite 5mC

First, a screening-based engineering approach was developed to discover a DNA polymerase variant with increased misincorporation opposite methylated bases. Recent studies showed that KTq already discriminates against 5mC when processing 3'-modified or 3'-mismatched primer strands. These results, together with the fact that KTq lacks proofreading activity owing to the structural origin of the enzyme, made this DNA polymerase a promising starting point for the evolution of altered fidelity opposite 5mC. This was nevertheless considered highly challenging, since the methyl group points away from the Watson-Crick face and is instead positioned in the major groove, where DNA polymerases accept very bulky modifications, up to several orders of magnitude larger than the natural substrate. The DNA polymerase libraries used for screening included single amino acid substitutions at a broad spectrum of target sites. The mutated residues were chosen based on their close proximity to the nascent base pair in a ternary complex crystal structure, a previously published influence on fidelity, and evolutionary conservation in family A DNA polymerases. In addition, functionally promising mutation sites were rationally combined to generate double mutants, resulting in a total of 970 focused KTq variants. Furthermore, more than 2,100 KTq variants containing multiple mutations, generated by combinatorial shuffling of active mutants using the RACHITT method followed by preselection for PCR activity, were tested. The libraries were expressed in E. coli, and the DNA polymerase variants were evaluated directly from cell lysates in single-nucleotide incorporation experiments.
The screening reactions were performed in parallel with oligonucleotide templates of identical sequence carrying either C or 5mC at the site of first incorporation. Labelling the primers at the 5' end with two different fluorescent dyes, FAM and HEX, together with varying 5' overhangs, enabled pooling of primer extension reactions for multiplexed CE analysis of several KTq variants (Fig. 5A). As substrates for single-nucleotide incorporation, either the complementary dGTP (match) or the non-complementary dATP (mismatch) was applied. dATP was selected to identify KTq variants with increased misincorporation activity because the KTq wild-type enzyme misincorporated dAMP more efficiently than the other mismatching nucleotides opposite C and 5mC (Fig. 6). Thus, a total of four primer extension reactions per KTq variant (C or 5mC template; dGMP or dAMP incorporation) were performed to evaluate the corresponding incorporation characteristics. To identify a DNA polymerase with an increased error rate opposite 5mC, KTq variants were screened for 5mC discrimination, i.e. reduced dGMP (match) incorporation opposite the modified template base. Guided by previous studies on DNA polymerase fidelity, it was reasoned that a decreased efficiency of the KTq variants to incorporate the matching nucleotide opposite 5mC would instead increase the proportion of mismatch incorporation. To promote this misincorporation, promising KTq variants were additionally screened for low to moderately increased dAMP misincorporation, without discrimination between C and 5mC (Fig. 5B). This should yield a catalytically active DNA polymerase variant that generates mutation signatures opposite 5mC but processes canonical nucleotides without increased error rates. Discrimination and misincorporation efficiencies of the KTq variants were evaluated relative to the incorporation characteristics of the KTq wild-type enzyme.
In the first three screening rounds, all KTq variants showing 5mC discrimination and/or increased misincorporation were selected. High 5mC discrimination was assigned if dGMP was incorporated at least 1.5-fold more efficiently opposite C than opposite 5mC. Increased misincorporation efficiency was scored as soon as 20% of the primer was extended by dAMP incorporation. The screening revealed that most KTq variants featured at most one of the desired characteristics and that none of the screened KTq variants preferentially misincorporated dAMP opposite 5mC. Around 15% of the screened KTq variants combined 5mC discrimination by reduced dGMP incorporation with increased dAMP misincorporation opposite both C and 5mC. However, several KTq variants additionally discriminated against 5mC by reduced dAMP incorporation or showed a high misincorporation rate combined with reduced dGMP incorporation opposite C. Both characteristics could impair incorporation fidelity opposite unmodified C, possibly rendering these KTq variants catalytically inefficient and generally unsuitable for 5mC detection. Therefore, a fourth screening round was performed. By selecting only KTq variants that misincorporated dAMP with comparable efficiency opposite C and 5mC and that elongated no more than 50% of the primer by dAMP incorporation, about 3% of the initially applied KTq variants were chosen for a final screening. Since the KTq wild-type was observed to discriminate against 5mC to some extent already, only KTq variants with the same or higher 5mC discrimination efficiency were selected in the fifth screening round. In addition, KTq variants were considered promising if they incorporated dAMP and dGMP with comparable efficiency opposite 5mC under the applied reaction conditions (70 nM dGTP and 35 µM dATP).
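The selection logic of the screening rounds can be summarised as a set of filters. The sketch below assumes that each reaction is scored as the fraction of primer extended, that the 20% misincorporation criterion applies opposite either template, and that "comparable" dAMP misincorporation opposite C and 5mC means an absolute difference of at most 0.05; this tolerance, the data class, the example values and the function names are hypothetical and not taken from the screening protocol.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    """Fractions of primer extended in the four reactions per KTq variant (hypothetical record)."""
    name: str
    dgmp_c: float    # dGMP (match) opposite C
    dgmp_5mc: float  # dGMP (match) opposite 5mC
    damp_c: float    # dAMP (mismatch) opposite C
    damp_5mc: float  # dAMP (mismatch) opposite 5mC

    @property
    def discrimination(self) -> float:
        """Fold preference for dGMP incorporation opposite C over 5mC."""
        return self.dgmp_c / self.dgmp_5mc


def passes_rounds_one_to_three(r: ScreeningResult) -> bool:
    # 5mC discrimination (>= 1.5-fold) and/or increased misincorporation (>= 20% extension)
    return r.discrimination >= 1.5 or max(r.damp_c, r.damp_5mc) >= 0.2


def passes_round_four(r: ScreeningResult, tolerance: float = 0.05) -> bool:
    # comparable dAMP misincorporation opposite C and 5mC (tolerance is an assumption), <= 50%
    return abs(r.damp_c - r.damp_5mc) <= tolerance and max(r.damp_c, r.damp_5mc) <= 0.5


def passes_round_five(r: ScreeningResult, wild_type: ScreeningResult) -> bool:
    # at least the 5mC discrimination of the KTq wild-type
    return r.discrimination >= wild_type.discrimination


# Hypothetical example values, for illustration only
wild_type = ScreeningResult("KTq wild-type", 0.65, 0.50, 0.05, 0.05)
candidate = ScreeningResult("example variant", 0.90, 0.45, 0.30, 0.32)
print(all([passes_rounds_one_to_three(candidate), passes_round_four(candidate),
           passes_round_five(candidate, wild_type)]))
```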
Considering the selected discrimination and misincorporation characteristics in combination with the previously screened features, 12 KTq variants were identified as the most promising hits (Fig. 7 and Fig. 8). All of these KTq variants derived from the combinatorial RACHITT mutant library and were named according to their location in the library (Table 4). Sequence analysis revealed that all variants are mutated at residue I614, with 75% of the variants carrying the I614K mutation and 25% carrying the I614M mutation. Notably, the KTq variant RII G7 is a single mutant carrying only the I614K mutation. The other 11 KTq variants carry multiple mutations, with four to nine randomly combined mutation sites and a median of seven mutations. Here, residues N483, E507, S515, K540, A570 and V586 are mutated in the majority of the multiple mutation variants. Mutations at sites D578, N485 and R587 were detected less frequently. The mutations D655G, F697S, M747L, V783G and I823M were each identified in only one KTq variant. Promising KTq enzymes were purified, and primer extension experiments were performed for characterisation. Here, reaction conditions similar to those of the screening experiments were applied, but with differing dNTP concentrations. Analysis of single-nucleotide incorporation reactions verified that each screening hit featured improved 5mC discrimination, incorporating dGMP with reduced efficiency opposite 5mC (Fig. 9A and Fig. 10). Some mutants also showed reduced incorporation of dGMP opposite C, indicating decreased catalytic efficiency. Therefore, only the variants RII L1, RII G7, RIII H20, RIII J18, RIV A8 and RIV D15, which incorporate dGMP opposite C with an efficiency comparable to the KTq wild-type, were considered for detailed analysis of their discrimination characteristics.
By estimating to what extent dGMP was incorporated more efficiently opposite C than opposite 5mC (at 35 nM substrate input), the respective variants could be compared and evaluated. The KTq variants RIII H20 and RIV D15 showed the strongest discrimination against 5mC, with 3.5-fold higher incorporation of dGMP opposite C than opposite 5mC. They were followed by RII L1 (2.5-fold) and RIV A8 (2.1-fold). The variants RIII J18 (1.8-fold) and RII G7 (1.6-fold) still showed increased 5mC discrimination compared with the KTq wild-type (1.3-fold) but were considered less promising with regard to discrimination characteristics. Likewise, the analysis of dAMP incorporation verified the increased misincorporation efficiency of the KTq variants. In particular, the two variants RII G7 and RIV A8 featured increased dAMP misincorporation, elongating more than 80% of the primer at 70 µM dATP. The variants RIII H20 and RIII O16 elongated more than 60% of the primer under these reaction conditions, and the remaining variants elongated 20% to 40% of the primer by dAMP misincorporation. The KTq wild-type enzyme elongated less than 10% of the primer when processing 70 µM dATP substrate. Taking the verified incorporation characteristics into account, the KTq variants RIII H20 (high discrimination, moderate misincorporation), RIV A8 (moderate discrimination, high misincorporation) and RIV D15 (high discrimination, low misincorporation) represented the most interesting screening hits.

EXAMPLE 2: Testing the DNA polymerase variants for mismatch extension and DNA synthesis activity

Further primer extension and amplification studies with the purified screening hits were performed to investigate how efficiently the KTq variants extend incorporated mismatches and synthesise DNA when less dGTP (match nucleotide) is available for the amplification reaction (Table 3).
Multiple-nucleotide incorporation experiments were conducted to gain insight into the elongation capability of the DNA polymerases, using reaction conditions similar to those of the screening and adding dCTP as the second nucleotide for incorporation. First, the efficiency of the DNA polymerases in extending correctly incorporated nucleotides was evaluated by studying primer elongation after dGMP incorporation. Here, the KTq variants RII B22, RII G7, RII L1, RII O16, RIII J18 and RIV D15 showed an extension efficiency comparable to the KTq wild-type, with around 90% full-length product formation after incorporation opposite C and 30% to 90% full-length product formation after incorporation opposite 5mC (Fig. 9B and Fig. 11A). The mutants RI A9, RI A12, RIII H20, RIII N21, RIV A8 and RVI G16 extended up to 30% of the primer to the full-length product after dGMP incorporation opposite C, and at most 20% of the primer after dGMP incorporation opposite 5mC (the RVI G16 variant efficiently incorporated only one dCMP nucleotide). Both findings highlight that all KTq variants, including the KTq wild-type enzyme, discriminated against 5mC by reduced elongation after dGMP incorporation opposite the methylated template base. However, processing and elongating an incorrectly incorporated nucleotide is a challenge for DNA polymerases and contributes to overall replication fidelity. Therefore, the focus was on selecting DNA polymerase mutants that feature efficient mismatch elongation. Monitoring primer elongation after dAMP misincorporation revealed that only the KTq variants RII G7, RIII H20, RIV A8 and RIV D15 were able to efficiently extend a mismatch (Fig. 9C and Fig. 11B). When processing the C template, a fluorescence intensity peak after the first incorporation indicated that the KTq variants tended to pause after forming the dAMP-dC mismatch. Here, approximately 5% full-length product was obtained in the presence of the C template.
In comparison, the KTq variants achieved a mismatch extension efficiency of 15% to 30% full-length primer elongation after dAMP was incorporated opposite 5mC. This translates into a 5mC discrimination that favours mismatch elongation opposite methylated bases. In detail, the KTq variant RIV A8 showed 5.8-fold more efficient elongation after dAMP incorporation opposite 5mC than opposite C. It was followed by the mutant RII G7 with 4.6-fold more efficient mismatch elongation, RIII H20 with 4.2-fold and RIV D15 with 3.5-fold. The KTq wild-type, however, discriminated against 5mC, processing a mismatch opposite C with 2-fold higher efficiency. Testing the PCR efficiency and robustness of the DNA polymerases confirmed that all KTq variants were PCR-active and amplified the correct PCR product (Fig. 12A, 12B). Among the mutants, the KTq variants RIII H20, RIV A8 and RIV D15 showed the highest PCR efficiency (Fig. 12C). Accordingly, only the KTq variants RIII H20, RIV A8 and RIV D15 combined all of the improved incorporation characteristics, namely 5mC discrimination, increased misincorporation, mismatch extension capability and sufficient DNA synthesis activity.

EXAMPLE 3: DNA polymerase variants with increased misincorporation opposite 5mC

Next, it was evaluated whether the KTq variants RIII H20, RIV A8 and RIV D15 were able to generate 5mC-dependent mutation signatures in PCR. To enhance the 5mC-dependent signatures, linear amplification of an unmodified C template and a modified 5mC template was performed in the presence of a reduced concentration of the match nucleotide (Fig. 9 and Fig. 13A). It was assumed that supplying a decreased dGTP concentration of 2 µM, compared with 200 µM of each d(A/T/C)TP, would increase mismatch formation opposite cytosines because less of the matching nucleotide is available.
In this case, it was reasoned that error formation at methylated sites might be favoured owing to the reduced dGMP incorporation efficiency of the KTq variants opposite 5mC. Thus, an unbalanced dNTP pool promotes both dAMP misincorporation and 5mC discrimination, thereby facilitating specific 5mC sensing. Subsequently, the respective PCR products served as templates for NGS library preparation and were sequenced. Data analysis and UMI-based error calculation were conducted using a custom KNIME workflow (Fig. 13B). The unmodified template DNA was generated by PCR, amplifying an 803 bp region comprising a 109 bp target and its flanking sequence. Additionally, a portion of the generated template was methylated using the M.SssI methyltransferase, and full methylation of CpG sites C24, C32 and C72 was confirmed by bisulfite sequencing (Fig. 2). For evaluating the error rates, only positions 24 to 84 of the 109 bp target region were considered, excluding the primer binding sites. Indeed, the KTq RIV A8 variant showed an up to twofold increased error rate at the methylated CpG sites C24, C32 and C72 in the 5mC template compared with processing of the C template (Fig. 14A, black arrows). Detailed analysis of the error rates for amplification from both templates revealed that the DNA polymerase preferentially incorporated mismatches opposite templating C bases, with an average error of 3.9% opposite C compared with 0.25% opposite all non-C bases. As expected, the KTq wild-type enzyme showed a much lower error profile, with an average error of 0.12% opposite C bases and 0.02% opposite all non-C bases, making the RIV A8 variant more error-prone in general (Fig. 15A). Interestingly, error rates varied between sequence positions, even at non-C bases.
DNA polymerase fidelity is known to depend heavily on the DNA sequence context and secondary structures; consequently, the engineered mismatch formation would be affected in the same way. It is therefore particularly important that RIV A8 processed both templates with comparable fidelity at the same positions. This makes it all the more striking that a significant error difference was detected exclusively when comparing methylated and unmethylated CpG sites (Fig. 14B, left, black arrows). At the methylated CpG sites C24, C32 and C72, the RIV A8 variant showed an average 16.5-fold error increase compared with the KTq wild-type enzyme, which showed a marginally increased error rate opposite 5mC only at positions C24 and C32 (mean error difference of 5.93% for RIV A8 and 0.36% for the KTq wild-type) (Fig. 15B). Analysis of the mutation signature verified that increased dAMP misincorporation (detected as T base calls) caused the enhanced mismatch formation opposite 5mC by RIV A8 (Fig. 14B, right). Furthermore, RIV A8 reproduced this distinctive 5mC-dependent error signature with a similar outcome in a repeat experiment, making RIV A8 applicable for the detection of 5mC by increased misincorporation (Fig. 16). The mutants RIII H20 and RIV D15 also showed increased dAMP misincorporation opposite 5mC, albeit with lower efficiency, resulting in a 10.8-fold increase for the RIII H20 variant (mean error difference of 3.89%) and a 5.4-fold increase for the RIV D15 variant (mean error difference of 1.93%) compared with the KTq wild-type (Fig. 3 and Fig. 17).
EXAMPLE 4: KTq variant RIV A8 detects 5mC in human genomic DNA
Finally, the RIV A8 mutant was tested for 5mC detection in human genomic DNA. In addition to the C and 5mC templates generated by PCR, genomic DNA isolated from HeLa cells was used to investigate 5mC sensing in a native methylation pattern (gDNA native).
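The fold increases reported in Example 3 for RIV A8, RIII H20 and RIV D15 are consistent with dividing each variant's mean error difference at C24, C32 and C72 by the KTq wild-type's mean error difference of 0.36%. The text does not spell out the formula, so the sketch below is a reading of the reported numbers rather than the authors' stated calculation; the helper names are hypothetical.

```python
def mean_error_difference(err_5mc, err_c, sites):
    """Mean of (error on the 5mC template - error on the C template), in %,
    over the given template positions."""
    return sum(err_5mc[s] - err_c[s] for s in sites) / len(sites)


def fold_over_wild_type(variant_diff, wild_type_diff):
    """Fold increase of a variant's mean error difference over the wild type's."""
    return variant_diff / wild_type_diff


# Mean error differences (%) at C24, C32 and C72, as reported in the text.
REPORTED = {"RIV A8": 5.93, "RIII H20": 3.89, "RIV D15": 1.93}
KTQ_WILD_TYPE = 0.36

for name, diff in REPORTED.items():
    print(name, round(fold_over_wild_type(diff, KTQ_WILD_TYPE), 1))
```

Running the loop prints 16.5 for RIV A8, 10.8 for RIII H20 and 5.4 for RIV D15, matching the reported values. `mean_error_difference` shows how the per-variant input would be obtained from per-site error rates of the two templates.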
To obtain fully methylated genomic DNA, HeLa genomic DNA was methylated at the CpG sites C24, C32 and C72 using the M.SssI methyltransferase (mCpG gDNA). Methylation levels were determined by bisulfite sequencing (Fig. 2). The RIV A8 variant was also able to sense methylated CpG sites in human genomic DNA by generating site-specific 5mC-dependent signatures (Fig. 18, black arrows). Because of the inherently low copy number of genomic DNA, only small amounts of linear PCR product were obtained after the reaction with the KTq variant. Therefore, an exponential amplification with a high-fidelity DNA polymerase followed to generate the concentration required for NGS library preparation. In this step, the template DNA was co-amplified, so the material used for library preparation was a mixture of linear PCR product and starting material. As a consequence, absolute error rates did not represent the actual errors derived from RIV A8 (Fig. 19). Based on the previous findings that RIV A8 processed identical positions in each template with similar fidelity and that a significant error difference derived only from methylation (Fig. 17C), the absolute error rates were standardised into customised z-scores (Fig. 4). Calculating the z-score differences between the modified 5mC and unmodified C templates confirmed the previously observed capability of RIV A8 to sense 5mC, with an 11.37-fold average increase in z-score difference opposite the CpG sites C24, C32 and C72 (Fig. 18A). Remarkably, despite the challenging and complex nature of amplification from genomic DNA, RIV A8 detected methylation levels greater than 50% at the CpG sites C24 and C32 in the native gDNA, indicated by a 6.06-fold increase in average z-score difference opposite the CpG sites compared with C bases (Fig. 18B).
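Why standardisation helps can be seen from a property of the ordinary z-score: it does not change when all values are multiplied by the same positive factor and shifted by the same offset. If co-amplified starting material dilutes the variant-derived errors roughly uniformly across positions (an assumption), z-scores stay comparable between templates even though absolute error rates do not. The "customised" z-scores of Fig. 4 are not defined in this section, so the sketch below uses plain z-scores; the function names and the dictionary layout are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(rates):
    """Standardise per-position error rates within one template.

    Assumption: plain z-score (x - mean) / population standard deviation,
    not the customised variant referred to in the text.
    """
    values = list(rates.values())
    mu, sigma = mean(values), pstdev(values)
    return {pos: (r - mu) / sigma for pos, r in rates.items()}


def average_z_difference(rates_5mc, rates_c, cpg_sites, other_c_sites):
    """Average (z_5mC - z_C) at CpG sites and at the remaining C positions."""
    z_m, z_c = z_scores(rates_5mc), z_scores(rates_c)
    diff = {pos: z_m[pos] - z_c[pos] for pos in rates_5mc}
    return (mean(diff[p] for p in cpg_sites),
            mean(diff[p] for p in other_c_sites))
```

The two returned averages correspond to the comparison made in the text (z-score difference at CpG sites versus at other C bases); their ratio would give a fold value, but it becomes unstable when the second average is close to zero.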
This was confirmed by the increased misincorporation read at the CpG sites C24, C32 and C72 in the mCpG gDNA template (10.74-fold increase in average z-score difference opposite CpG sites compared with C bases) (Fig. 18C), making RIV A8 suitable for 5mC detection in highly methylated genomic DNA (Fig. 20).
Discussion
Here, the engineering of a thermostable DNA polymerase variant with altered fidelity opposite 5mC is described. The modified base is discriminated from its unmodified form by increased dAMP misincorporation during PCR, and the resulting mutation signatures are detected directly by NGS. This enables straightforward 5mC detection without sample conversion before use or extensive data analysis after sequencing. The DNA polymerase variant was identified by screening a KTq library for altered incorporation characteristics using primer extension reactions. Promising DNA polymerase variants were selected for enhanced 5mC discrimination during matched nucleotide incorporation together with increased misincorporation activity, but without discrimination during mismatched nucleotide incorporation. Further monitoring of mismatch extension and DNA synthesis efficiencies yielded the KTq variants RIII H20, RIV A8 and RIV D15, which were validated for effective 5mC sensing. These DNA polymerase variants showed significantly increased misincorporation opposite 5mC. Unmodified G, A and T bases produced no or only slightly elevated error rates, whereas increased error rates also occurred opposite templating unmodified C bases. Using RIV A8, this strategy was applied to reliably detect highly methylated CpG sites in human genomic DNA. Analysis of the sequencing data revealed 5mC detection at multiple CpG sites within the naturally occurring sequence context.
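The detection principle summarised above, and formalised in claims 17 and 20, compares a test sequence with a reference sequence at positions where the reference shows C and looks for elevated T calls. A minimal sketch, assuming per-position T-call fractions have already been computed for both runs. The function name and the fold and difference thresholds are hypothetical, and since the approach supports qualitative rather than quantitative detection, the output is a list of candidate sites rather than methylation levels.

```python
def call_5mc_candidates(reference, t_rate_test, t_rate_ref,
                        min_fold=2.0, min_delta=0.0):
    """Return 1-based positions where the reference shows C and the test
    run's T-call rate is elevated relative to the reference run.

    t_rate_test / t_rate_ref: dicts mapping position -> fraction of T calls.
    min_fold and min_delta are hypothetical thresholds, not values from the text.
    """
    candidates = []
    for pos, base in enumerate(reference, start=1):
        if base != "C":
            continue  # only C positions can carry the C-to-T signature
        test = t_rate_test.get(pos, 0.0)
        ref = t_rate_ref.get(pos, 0.0)
        if test - ref <= min_delta:
            continue  # no absolute excess of T calls
        if ref == 0.0 or test / ref >= min_fold:
            candidates.append(pos)  # relative error rate above threshold
    return candidates
```

Requiring both an absolute excess and a fold change guards against calling sites where the reference run already shows a high background of T calls.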
Of note, only qualitative 5mC detection could be performed, owing to the nature of DNA polymerase mismatch processing, in which enhanced mismatch formation yields less full-length product, and to the additional co-amplification of the starting template. Interestingly, the identified KTq variants all derive from the same mutant library and carry a relatively high mutational load of seven to eight mutations at similar residues (Fig. 21). The library was generated by combinatorial shuffling of functional mutations that are located in the proximity of the active site or are evolutionarily conserved in family A DNA polymerases. Indeed, the mutations N483K and A570K are found in all variants and are located in evolutionarily conserved motifs: N483K in Motif 1 and A570K in Motif 2, both of which contact the template. Residue I614 is evolutionarily conserved in Motif A and located directly at the active site, forming part of the hydrophobic binding pocket for the incoming nucleotide. The hydrophilic substitution I614K is known to decrease DNA polymerase fidelity and to enhance mismatch extension capability, whereas the I614M mutation contributes to increased ribonucleotide incorporation activity. The mutations K540N and V586G are also present in all three KTq variants, and both residues make direct contact with the primer. Here, the hydrophilic mutation K540R, combined with other mutations, is involved in increased DNA-binding affinity and contributes to enhanced resistance to inhibitors such as heparin. Mutation of the negatively charged residue E507 to a positively charged lysine (E507K) improved resistance to several inhibitors and, in combination with other mutations, conferred even greater heparin resistance. Furthermore, the single-mutation variant E507K exhibits increased activity and stability during PCR through a strong interaction with the primed template DNA.
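The overlap between the variants can be checked directly from the mutation sets in claims 7 to 9. The claims do not say which set corresponds to RIII H20, RIV A8 or RIV D15, so this sketch labels the sets by claim number only; the parsing helper is hypothetical.

```python
import re

# Mutation sets as listed in claims 7, 8 and 9 (Taq numbering). The claims do
# not name the variants, so the sets are keyed by claim number.
VARIANTS = {
    "claim 7": "N483K E507A S515K K540N A570K V586G I614M F697S",
    "claim 8": "N483K E507R S515K K540N A570K V586G I614K",
    "claim 9": "N483K E507K S515N K540N A570K V586G I614K",
}


def parse_mutations(text):
    """Parse space-separated point mutations such as 'N483K' into (wt, pos, new)."""
    parsed = set()
    for token in text.split():
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"not a point mutation: {token}")
        parsed.add((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


variant_sets = {name: parse_mutations(text) for name, text in VARIANTS.items()}

# Identical substitutions present in every variant.
identical = set.intersection(*variant_sets.values())
# Positions mutated in every variant, regardless of the substituting residue.
shared_positions = set.intersection(
    *({pos for _, pos, _ in muts} for muts in variant_sets.values())
)

print(sorted(f"{wt}{pos}{new}" for wt, pos, new in identical))
print(sorted(shared_positions - {pos for _, pos, _ in identical}))
```

This prints `['A570K', 'K540N', 'N483K', 'V586G']`, the four substitutions the text describes as common to all variants, and `[507, 515, 614]`, the positions that are mutated in all three sets but to different residues.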
Residue S515 contacts the primer strand, and the mutation S515R was found in a KTq variant active in reverse transcription (RT-KTq, L459M S515R I638F M747K). The mutation F697S shows no direct interaction with the nascent base pair and probably acts through interactions with the dNTP-binding residues. Notably, F697 was not targeted for mutagenesis; the F697S mutation was introduced during amplification of the mutant gene. The mechanism by which the KTq variants discriminate 5mC and exhibit increased misincorporation is currently unclear. However, because no single-mutation or rationally designed variant showed the required characteristics, it can be speculated that the individual effects of the distal mutations combine into a synergistic alteration of the DNA polymerase's fidelity. Crucial features that contribute to replication fidelity, such as reliable 5mC processing, high substrate specificity and mismatch discrimination, had to be overcome to engineer a DNA polymerase that efficiently detects 5mC while retaining catalytic activity. It is conceivable that the mutations act on different mechanistic levels. Substitutions at the active site could promote misincorporation by maintaining proper alignment of the residues upon encountering mismatches, thus facilitating incorporation of incorrect nucleotides. Furthermore, mutated residues contacting the primer and/or template strand could stabilise transition states and thereby enhance DNA binding, promote mismatch elongation and improve DNA synthesis in general. Moreover, mutation sites in close proximity to the nascent base pair could discriminate incorporation and elongation opposite methylated template bases, resulting in reduced catalytic efficiency for dGMP incorporation opposite 5mC and thus enhanced tolerance for mismatch formation at this site. Therefore, 5mC sensing relies more on a generally increased misincorporation rate combined with specificity against 5mC, and less on targeted misincorporation opposite methylated template sites.
Intriguingly, previously identified KTq variants from the same library show mutational patterns very similar to those of the variants described herein, namely Mut_ADL (N483K E507K S515N K540G A570E D578G V586G I614M) and Mut_RT (N483K E507K K540Y V586G I614K). Although these enzymes derive from a single library, they have highly diverse functional scopes. Mut_ADL was evolved to extend exclusively from matched primer strands, which would have been rather disadvantageous in this study. Mut_RT efficiently reverse transcribes from RNA templates. Furthermore, a similar evolution approach identified the mutant RT-KTq I614Y, which is capable of discriminating modified from unmodified RNA bases during reverse transcription. This suggests that changes at these key residues affect fundamental properties of the polymerase and that the interplay of distant mutations leads to a variety of incorporation characteristics. These mutation sites are consequently targets for further optimisation of the evolved 5mC sensing capability. Of note, the DNA polymerases of Sequence Family A, to which Taq DNA polymerase belongs, share highly conserved motifs. This also holds true for many of the mutation sites discussed herein with respect to Taq DNA polymerase and KlenTaq DNA polymerase. It is therefore expected that the properties of these mutated DNA polymerases can be transferred to other members of this sequence family by implementing the corresponding mutations (Table 1). In this study, the ability of the discovered DNA polymerase to replicate DNA templates modulated by 5mC, combined with the sequencing workflow established here, allows reliable detection of even slight 5mC-induced error differences.
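Claim 12 lists the residues in other family A polymerases that correspond to the mutated Taq positions. The following is a small lookup sketch covering only Tth and E. coli DNA polymerase I, for which claim 12 maps all seven positions one-to-one; T7, Bst, Bsu and SP01 include alternative or optional positions and are left out. The table layout and the `transfer` helper are illustrative and are not taken from Table 1 itself.

```python
import re

# Wild-type Taq residues at the mutated positions, as given in claim 1.
TAQ_WT = {483: "N", 507: "E", 515: "S", 540: "K", 570: "A", 586: "V", 614: "I"}

# Corresponding (residue, position) in two homologs, from claim 12 (i) and (ii).
CORRESPONDING = {
    "Tth": {483: ("N", 485), 507: ("Q", 509), 515: ("S", 517), 540: ("K", 542),
            570: ("A", 572), 586: ("V", 588), 614: ("I", 616)},
    "E. coli Pol I": {483: ("N", 579), 507: ("P", 603), 515: ("S", 610),
                      540: ("K", 635), 570: ("A", 665), 586: ("V", 681),
                      614: ("I", 709)},
}


def transfer(taq_mutation, homolog):
    """Translate a Taq point mutation (e.g. 'E507K') to the homolog's numbering."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", taq_mutation)
    if match is None:
        raise ValueError(f"not a point mutation: {taq_mutation}")
    wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if TAQ_WT.get(pos) != wt:
        raise ValueError(f"{taq_mutation} is not one of the mapped Taq positions")
    residue, hom_pos = CORRESPONDING[homolog][pos]
    return f"{residue}{hom_pos}{new}"
```

For example, `transfer("E507K", "Tth")` returns `"Q509K"` and `transfer("I614M", "E. coli Pol I")` returns `"I709M"`, in line with claim 12. The wild-type residue is taken from the homolog, since it can differ from Taq at the corresponding position (E507 versus Q509 or P603).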
Considering that the methyl group is not involved in Watson-Crick base pairing, even the subtle 5mC-dependent mutation signatures shown here highlight DNA polymerase engineering as a powerful tool to overcome inherent fidelity characteristics and obtain enzymes with desired properties for future applications.

Claims

1. A DNA polymerase derived from wild-type Thermus aquaticus (Taq) DNA polymerase, comprising the mutations N483K, E507K/A/R, S515K/N, K540N, A570K, V586G, and I614M/K with regard to the amino acid sequence of wild-type Taq DNA polymerase (SEQ ID NO: 1).
2. The DNA polymerase of claim 1 comprising an amino acid sequence at least 90% identical to SEQ ID NO: 1 including said mutations.
3. The DNA polymerase of claim 1 or claim 2 comprising an amino acid sequence corresponding to and being at least 90% identical to (i) amino acids 293 to 832 of SEQ ID NO: 1 including said mutations, (ii) amino acids 4 to 832 of SEQ ID NO: 1 including said mutations, (iii) amino acids 279 to 832 of SEQ ID NO: 1 including said mutations, or (iv) amino acids 290 to 832 of SEQ ID NO: 1 including said mutations.
4. The DNA polymerase of claim 3 comprising the amino acid sequence corresponding to amino acids 293 to 832 of SEQ ID NO: 1 including said mutations.
5. The DNA polymerase of claim 4 comprising the amino acid sequence as shown in SEQ ID NO: 2 including said mutations.
6. The DNA polymerase of any one of claims 1 to 5 further comprising the mutation F697S with regard to SEQ ID NO: 1.
7. The DNA polymerase of any one of claims 1 to 6 comprising the mutations N483K, E507A, S515K, K540N, A570K, V586G, I614M, and F697S with regard to SEQ ID NO: 1.
8. The DNA polymerase of any one of claims 1 to 5 comprising the mutations N483K, E507R, S515K, K540N, A570K, V586G, and I614K with regard to SEQ ID NO: 1.
9. The DNA polymerase of any one of claims 1 to 5 comprising the mutations N483K, E507K, S515N, K540N, A570K, V586G, and I614K with regard to SEQ ID NO: 1.
10. The DNA polymerase of any one of claims 1 to 5 comprising the amino acid sequence as shown in one of SEQ ID NOs: 3 to 5.
11. The DNA polymerase of any one of claims 1 to 10, wherein the DNA polymerase is thermostable.
12. A DNA polymerase selected from the following DNA polymerases (i) to (vi): (i) a DNA polymerase derived from wild-type Thermus thermophilus (Tth) DNA polymerase, comprising the mutations N485K, Q509K/A/R, S517K/N, K542N, A572K, V588G, and I616M/K with regard to the amino acid sequence of wild-type Tth DNA polymerase (SEQ ID NO: 42); (ii) a DNA polymerase derived from wild-type E. coli DNA polymerase I, comprising the mutations N579K, P603K/A/R, S610K/N, K635N, A665K, V681G, and I709M/K with regard to the amino acid sequence of wild-type E. coli DNA polymerase I (SEQ ID NO: 43); (iii) a DNA polymerase derived from wild-type E. coli phage T7 DNA polymerase, comprising the mutations N335K, one of the mutations T357K/A/R and V368K/A/R, one of the mutations D365K/N and D376K/N, one of the mutations K394N and K404N, V426K, V443G, and L479M/K with regard to the amino acid sequence of wild-type E. coli phage T7 DNA polymerase I (SEQ ID NO: 44); (iv) a DNA polymerase derived from wild-type Bacillus stearothermophilus (Bst) DNA polymerase, comprising the mutations N527K, S557K/N, K582N, Q612K, I628G, and I657M/K, and optionally the mutation K551A/R, with regard to the amino acid sequence of wild-type Bacillus stearothermophilus DNA polymerase (SEQ ID NO: 45); (v) a DNA polymerase derived from wild-type Bacillus subtilis (Bsu) DNA polymerase, comprising the mutations N531K, S561K/N, K586N, Q616K, I632G, and I661M/K, and optionally the mutation K555A/R, with regard to the amino acid sequence of wild-type Bacillus subtilis DNA polymerase (SEQ ID NO: 46); (vi) a DNA polymerase derived from wild-type Bacillus phage SP01 DNA polymerase, comprising the mutations N502K, D526K/A/R, H558N, V587K, V605G, and L639M/K, and optionally the mutation N533K, with regard to the amino acid sequence of wild-type Bacillus phage SP01 DNA polymerase (SEQ ID NO: 46).
13. The DNA polymerase of claim 11 comprising an amino acid sequence at least 90% identical to (i) Thermus thermophilus (Tth) DNA polymerase of SEQ ID NO: 42 and further comprising the mutations N485K, Q509K/A/R, S517K/N, K542N, A572K, V588G, and I616M/K with regard to the amino acid sequence of wild-type Tth DNA polymerase (SEQ ID NO: 42); (ii) a DNA polymerase derived from wild-type E. coli DNA polymerase I of SEQ ID NO: 43 and further comprising the mutations N579K, P603K/A/R, S610K/N, K635N, A665K, V681G, and I709M/K with regard to the amino acid sequence of wild-type E. coli DNA polymerase I (SEQ ID NO: 43); (iii) a DNA polymerase derived from wild-type E. coli phage T7 DNA polymerase of SEQ ID NO: 44 and further comprising the mutations N335K, one of the mutations T357K/A/R and V368K/A/R, one of the mutations D365K/N and D376K/N, one of the mutations K394N and K404N, V426K, V443G, and L479M/K with regard to the amino acid sequence of wild-type E. coli phage T7 DNA polymerase I (SEQ ID NO: 44); (iv) a DNA polymerase derived from wild-type Bacillus stearothermophilus (Bst) DNA polymerase of SEQ ID NO: 45 and further comprising the mutations N527K, S557K/N, K582N, Q612K, I628G, and I657M/K, and optionally the mutation K551A/R, with regard to the amino acid sequence of wild-type Bacillus stearothermophilus DNA polymerase (SEQ ID NO: 45); (v) a DNA polymerase derived from wild-type Bacillus subtilis (Bsu) DNA polymerase of SEQ ID NO: 46 and further comprising the mutations N531K, S561K/N, K586N, Q616K, I632G, and I661M/K, and optionally the mutation K555A/R, with regard to the amino acid sequence of wild-type Bacillus subtilis DNA polymerase (SEQ ID NO: 46); (vi) a DNA polymerase derived from wild-type Bacillus phage SP01 DNA polymerase of SEQ ID NO: 46 and further comprising the mutations N502K, D526K/A/R, H558N, V587K, V605G, and L639M/K, and optionally the mutation N533K, with regard to the amino acid sequence of wild-type Bacillus phage SP01 DNA polymerase (SEQ ID NO: 46).
14. A nucleic acid comprising a nucleotide sequence coding for a DNA polymerase according to any one of claims 1 to 13.
15. A vector comprising the nucleic acid of claim 14.
16. A host cell comprising the vector of claim 15 or the nucleic acid of claim 14.
17. A method for the detection of 5-methylcytosine nucleotides (5mC) in a DNA molecule of interest, comprising the steps of: (a) amplifying a first aliquot of the DNA molecule of interest in a polymerase chain reaction (PCR), said PCR using a thermostable DNA polymerase having altered fidelity opposite 5mC leading to increased nucleotide misincorporation opposite the 5mC nucleotide during PCR, (b) sequencing the amplified PCR product obtained in step (a) to generate a test sequence, (c) comparing the test sequence obtained in step (b) to a reference sequence, wherein said reference sequence is obtained by way of (i) amplifying a second aliquot of the DNA molecule of interest in a PCR, said PCR using a High-Fidelity DNA polymerase, thereby generating an unmodified reference template, (ii) amplifying the reference template obtained in step (c)(i) in a PCR, said PCR using a DNA polymerase of the present invention, and (iii) sequencing the amplified PCR product obtained in step (c)(ii) to generate the reference sequence, and (d) identifying mismatches in the test sequence as compared to the reference sequence at positions in which the reference sequence shows a C and the test sequence shows a T at the same positions, wherein a mismatch identified in step (d) indicates the presence of a 5-methylcytosine at the corresponding positions in the DNA molecule of interest.
18. The method of claim 17, wherein the thermostable DNA polymerase is a DNA polymerase of any one of claims 1 to 13.
19. The method of any one of claims 17 to 18, wherein sequencing in step (b) and/or step (c) comprises Next Generation Sequencing (NGS).
20. The method of any one of claims 17 to 19, wherein the identification of mismatches in step (d) comprises determining a relative error rate in the test sequence as compared to the reference sequence at positions in which the reference sequence shows a C.
21. The method of any one of claims 17 to 20, wherein the DNA molecule of interest does not require any chemical and/or enzymatic pre-treatment prior to step (a).
22. A kit comprising at least one container providing the DNA polymerase of any one of claims 1 to 13.
23. The kit of claim 22, further comprising one or more additional containers selected from the group consisting of: (a) a container providing a primer hybridizable, under primer extension conditions, to a predetermined polynucleotide template; (b) a container providing dNTPs; and (c) a container providing a buffer suitable for primer extension.