WO2025081111A1 - Methods and compositions for characterizing RNA-binding protein binding sites by in-situ reverse transcription-based sequencing


Info

Publication number
WO2025081111A1
Authority
WO
WIPO (PCT)
Prior art keywords
aspects
rbp
antibody
rna
protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/051137
Other languages
French (fr)
Inventor
Chuan He
Yu XIAO
Zhuoning ZOU
Weixin Tang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Chicago
Original Assignee
University of Chicago
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Chicago
Publication of WO2025081111A1

Abstract

Aspects of the present disclosure are directed to at least methods and compositions for profiling of RNA-binding protein binding sites by in-situ reverse transcription-based sequencing. Provided profiling methods can capture both stable and transient interactions between RBPs and their RNA substrates, especially when the interaction is dynamic or materials are limited. Also disclosed herein are compositions, methods, and kits suitable for profiling of RNA-binding protein binding sites by in-situ reverse transcription-based sequencing.

Description

METHODS AND COMPOSITIONS FOR CHARACTERIZING RNA-BINDING PROTEIN BINDING SITES BY IN-SITU REVERSE TRANSCRIPTION-BASED SEQUENCING
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 63/589,874 filed October 12, 2023, which is incorporated by reference herein in its entirety.
STATEMENT OF GOVERNMENT SUPPORT
[0002] This invention was made with government support under HG008935 awarded by the National Institutes of Health. The government has certain rights in the invention.
SEQUENCE LISTING
[0003] The instant application contains a Sequence Listing which has been submitted in ST26 format and is hereby incorporated by reference in its entirety. Said ST26 copy, created on October 11, 2024, is named ARCD_P0808WO_Sequence_Listing.xml and is 95,382 bytes in size.
BACKGROUND
I. Field of the Invention
[0004] Aspects of this invention relate to at least the field of molecular biology. More particularly, aspects concern at least methods and compositions for characterizing RNA-binding protein binding and RNA modification sites by in-situ reverse transcription-based sequencing.
II. Background
[0005] RNA-binding proteins (RBPs) dynamically interact with their RNA targets to regulate RNA fate in all aspects, including transcription, splicing, modification, localization, translation and degradation. The dysfunction of RBPs or their binding to RNA substrates can lead to various defects or even diseases. Effective methods to capture RBP-RNA interactions, in particular dynamic or even transient interactions, are critical for obtaining better understandings of RBPs and their functional effects on target RNAs.
[0006] The widely used approaches to identify RNA substrates of RBPs are based on immunoprecipitation (IP) of the specific RBP along with its bound RNAs, either through direct RNA immunoprecipitation (RIP) or through cross-linking immunoprecipitation (CLIP) assisted by covalent capture. Substrate RNAs bound by a specific RBP can be enriched through either RIP or CLIP using the antibody against the RBP, followed by high-throughput sequencing (seq) to profile RBP targets at the whole transcriptome level. CLIP-seq captures RBP binding sites on substrate RNAs via covalent crosslinking. RNase treatment digests the RBP-free regions of RNAs, which can provide a higher resolution of the binding sites. Modulations of CLIP-seq such as PAR-CLIP or eCLIP approaches further improve the efficiency of crosslinking or specificity and resolution of the binding site assignment. While these methods have been very effective and widely used, they also have limitations. All these methods are IP-based and often require large amounts of starting materials due to the low efficiency of IP; the UV crosslinking in CLIP-based methods is also a low-efficiency chemical reaction. Recently reported targeted RNA immunoprecipitation sequencing (tRIP-seq) and linear amplification of complementary DNA ends sequencing (LACE-seq) can be applied in low-input samples but at the cost of significantly reducing the complexity of libraries.
[0007] Targets of RNA-binding Proteins Identified by Editing (TRIBE) and Surveying Targets by APOBEC-Mediated Profiling (STAMP) type approaches fuse RBPs with an RNA base editor to introduce mutations near RBP binding sites, which bypasses IP to identify RBP binding sites. These methods could be readily applied to study RNA binding by RBPs in live cells and with limited materials down to single-cell level. Their deployments into research have offered unprecedented opportunities; however, just like other methods, these editing-based methods also have limitations. They can require the manipulation of genomes through the insertion of base editing proteins in germlines or cell lines, hindering their application in primary cells and tissue samples. Induction of the editing protein expression typically takes ~24 hours or longer, so these methods cannot be applied to monitor dynamic RNA binding by RBPs. These base editors have their own sequence preferences, and their fusion to RBPs could change the native binding profile of the target RBP. The entire procedure is also more complicated when compared with CLIP or RIP type approaches. While the inventors were working on the immediate technologies, RT&Tag, a new method derived from the CUT&Tag strategy, was published. This method profiles RBP-RNA interaction by oligo(dT) primer-initiated reverse transcription and Tn5 tagmentation of the resulting full-length RNA-cDNA heteroduplex. The method can identify RBP binding in polyadenylated RNAs but is ineffective for non-polyadenylated RNAs and for RBP binding in the cytoplasm because RT&Tag needs to be performed in isolated nuclei. Due to the low efficiency of the Tn5 enzyme on heteroduplex, it still can require ~100k cells to obtain sufficient binding signals.
[0008] As described above, there exists at least a need for methods and compositions for sensitively profiling RNA targets of RBPs in situ with good sequencing quality, minimal sample sizes, and/or a short time frame.
SUMMARY OF THE INVENTION
[0009] The present disclosure addresses the above need for in situ profiling of RNA targets of RNA binding proteins (RBP), with good sequencing quality, minimal sample sizes, and short time frames. Provided herein are at least methods, compositions, and kits for an Assay of Reverse Transcription-based RBP binding sites Sequencing (ARTR-seq) to capture RBP-RNA interactions through in-situ reverse transcription (RT). Additional aspects encompass modifications of ARTR-seq for simultaneous determination of RNA binding sites for multiple RBP (multiplex ARTR-seq) and use of ARTR-seq method for determining RNA modification sites (as demonstrated in advanced spatial ARTR-seq).
[0010] Aspects of the present disclosure include polypeptide constructs comprising a targeting moiety and a reverse transcriptase (RTase) enzyme, or a functional variant thereof. In some aspects, the targeting moiety is a Fc binding protein or variant thereof, an antibody or variant thereof, an oligonucleotide or variant thereof, a receptor or variant thereof, a ligand, a small molecule, an aptamer, a nucleoside, or any combination thereof. In some aspects, the targeting moiety is a Fc binding protein, wherein the Fc binding protein comprises, consists essentially of, or consists of protein A, protein G, protein A/G (pAG), protein L, anti-rabbit IgG, anti-mouse IgG, or a variant thereof, or any combination thereof. In some aspects, the Fc binding protein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 8, 10, and 12, or an amino acid sequence at least about 60%, at least about 70%, at least about 80%, at least about 90%, or at least about 95% identical thereto.
[0011] In some aspects, the RTase comprises, consists essentially of, or consists of Moloney murine leukemia virus (MMLV) RTase, human immunodeficiency virus (HIV) RTase, or Avian Myeloblastosis Virus (AMV) RTase, or any functional variant thereof. In some aspects, the reverse transcriptase protein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 2, 4, and 6, or an amino acid sequence at least about 60%, at least about 70%, at least about 80%, at least about 90%, or at least about 95% identical thereto, or a functional variant thereof.
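The "at least about N% identical" thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (gaps written as "-") and that identity is counted as matching residues over all aligned columns; the disclosure does not specify the calculation, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps. Gap columns never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so the denominator should be stated whenever a threshold is applied.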
[0012] In some aspects, the current disclosure also encompasses transcriptase compositions comprising the polypeptide construct and a transcriptase mix comprising one or more adapter-RT primers, wherein the one or more adapter-RT primers each comprise an adapter primer sequence and an RT primer sequence. In some aspects, the RT primer is a random RT primer. In some exemplary aspects, the random RT primer is at least, exactly, or more than 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 nucleotides long. In some aspects, the adapter primer sequence may comprise a barcode. In some aspects, the adapter primer sequence may comprise an azide functional group. In some aspects, the adapter-RT primer comprises a sequence as set forth in SEQ ID NO: 25, or a sequence at least 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identical thereto.
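The adapter-RT primer layout described above can be sketched as a simple string assembly. This is an illustration only: the adapter and barcode strings are placeholders (SEQ ID NO: 25 is not reproduced here), the 5'-adapter/barcode/random-3' ordering is an assumption, and the random RT portion is written with the IUPAC degenerate base N rather than as sampled bases.

```python
def make_adapter_rt_primer(adapter: str, n_random: int = 10, barcode: str = "") -> str:
    """Assemble a hypothetical adapter-RT primer, written 5' to 3'.

    Assumed layout: adapter, optional barcode, then a degenerate random
    RT segment of `n_random` positions written as 'N'.
    """
    if n_random < 1:
        raise ValueError("the random RT segment needs at least one position")
    for name, seq in (("adapter", adapter), ("barcode", barcode)):
        if set(seq.upper()) - set("ACGT"):
            raise ValueError(f"{name} must contain only A, C, G, or T")
    return adapter.upper() + barcode.upper() + "N" * n_random


# Placeholder sequences, not taken from the sequence listing.
primer = make_adapter_rt_primer("ACGTACGT", n_random=10, barcode="TTAG")
```

Keeping the random segment at the 3' end reflects the general requirement that a primer anneal to the template at its 3' end for extension.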
[0013] In some aspects, the current disclosure encompasses a method of determining one or more RNA interaction sites of a RNA-binding Protein (RBP) in a biological sample, comprising: a) incubating a RBP-targeting agent with the RBP, wherein the RBP-targeting agent specifically binds the RBP to form a primary complex; b) incubating the first complex with one or more secondary binding agents that specifically bind the RBP-targeting agent, to form a secondary complex; c) incubating the first or the secondary complex with the transcriptase composition as disclosed herein; d) sequencing the cDNA to determine the one or more RNA interaction sites of the RBP.
[0014] Aspects of the current disclosure include a method of determining one or more RNA interaction sites of a RNA-binding Protein (RBP) in a biological sample, comprising: a) incubating a RBP-targeting agent with the RBP, wherein the RBP-targeting agent specifically binds the RBP to form a primary complex; b) incubating the first complex with one or more secondary binding agents that specifically bind the RBP-targeting agent, to form a secondary complex; c) incubating the first or the secondary complex with the transcriptase composition as disclosed herein, to obtain cDNA; and d) sequencing the cDNA to determine the one or more RNA interaction sites of the RBP.
[0015] Aspects of the current disclosure encompass a kit comprising a polypeptide construct as disclosed herein. In some aspects, the kit comprises the transcriptase composition as disclosed herein. In some aspects, the kit comprises, in one or more suitable container(s), an RBP-targeting agent that specifically binds to an RBP, optionally one or more secondary binding agents, and the transcriptase composition.
[0016] In some aspects, the current disclosure also encompasses a method of identifying one or more RNA interaction sites of a RNA-binding Protein (RBP) in a biological sample, comprising: (a) fixing the biological sample; (b) incubating the biological sample with an agent that permeabilizes cell membranes; (c) providing an RBP-targeting agent to the sample, wherein the RBP-targeting agent interacts with the RBP of interest; (d) providing a transcriptase composition comprising a polypeptide construct comprising a targeting moiety and a reverse transcriptase enzyme, wherein the targeting moiety interacts with the RBP-targeting agent; (e) incubating the sample with the transcriptase composition to produce cDNA; and (f) sequencing the cDNA.
[0017] Aspects of the current disclosure include a method of determining one or more RNA interaction sites of a first RNA-binding Protein (RBP) in a biological sample, comprising: a) incubating a first RBP-targeting agent comprising a functionalized first DNA barcode, to the first RBP, wherein the first RBP-targeting agent specifically binds the first RBP to form a first primary complex; b) incubating the first primary complex with one or more secondary binding agents that specifically bind the first RBP-targeting agent, to form a secondary complex; c) incubating the first primary or the secondary complex with the transcriptase composition disclosed herein, to obtain a first barcoded cDNA library; d) amplifying and sequencing the first barcoded cDNA library; and e) obtaining one or more interaction sites of the first RBP by deconvoluting the sequenced cDNA library based on the first DNA barcode. In some aspects, the method further comprises determining the one or more RNA-interaction sites of a second RNA-binding Protein (RBP) in a biological sample, comprising: a) incubating a second RBP-targeting agent comprising an alkyne functionalized second DNA barcode, to the second RBP, wherein the RBP-targeting agent specifically binds the second RBP to form a second primary complex; b) incubating the second primary complex with one or more secondary binding agents that specifically bind the first RBP-targeting agent, to form a second secondary complex; c) incubating the second primary or the second secondary complex with the transcriptase composition disclosed herein, to obtain a second barcoded cDNA library; d) amplifying and sequencing the second barcoded cDNA library; and e) obtaining one or more interaction sites of the second RBP by deconvoluting the sequenced cDNA library based on the second DNA barcode. In some aspects, the method can be adapted to determine the one or more RNA interaction sites for greater than, equal to, at least, or at most 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, or 1500 RBPs.
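The deconvolution step in the multiplex method above can be sketched as a barcode lookup. This is a minimal illustration, assuming each read begins with an exact, fixed-length DNA barcode that maps to one RBP; the disclosure does not state where the barcode sits in the read or how mismatches are handled, and the barcodes, read strings, and function name are hypothetical.

```python
from collections import defaultdict


def deconvolute_by_barcode(reads, barcode_to_rbp):
    """Group reads by RBP using an exact-match barcode at the start of each read.

    Assumptions: all barcodes share one length, the barcode is the read
    prefix, and reads with an unknown prefix are kept under 'unassigned'.
    The barcode is trimmed from assigned reads.
    """
    lengths = {len(barcode) for barcode in barcode_to_rbp}
    if len(lengths) != 1:
        raise ValueError("expected at least one barcode, all of equal length")
    (k,) = lengths
    groups = defaultdict(list)
    for read in reads:
        rbp = barcode_to_rbp.get(read[:k])
        if rbp is None:
            groups["unassigned"].append(read)
        else:
            groups[rbp].append(read[k:])
    return dict(groups)


# Toy example with made-up barcodes and reads.
barcodes = {"AAGG": "RBP1", "CCTT": "RBP2"}
reads = ["AAGGTTCA", "CCTTAGGA", "GGTTTTTT"]
by_rbp = deconvolute_by_barcode(reads, barcodes)
```

Each per-RBP read group would then be mapped and analyzed separately to call that RBP's interaction sites.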
[0018] Aspects of the current disclosure also comprise a method of determining spatial distribution of a RNA modification site on a biological sample bound to a solid surface, comprising: a) incubating a modification-targeting agent that specifically binds the modification site on the RNA to form a primary complex; b) incubating the primary complex with a secondary binding agent that specifically bind the primary complex to form a secondary complex; c) incubating the primary complex or the secondary complex with the transcriptase composition to obtain cDNA; d) optionally incorporating labelled barcodes into the cDNA; e) sequencing and imaging the biological sample using a single cell genomic imaging technique to determine the one or more modification sites.
[0019] Aspects of the present description include at least methods for identifying RNA-binding Protein (RBP)-RNA interaction sites, for in-situ imaging of RNA-binding Protein (RBP)-RNA interaction sites, for identifying RBP-RNA binding sites, for sequencing RBP-RNA binding sites, for evaluating at least one RBP-RNA interaction, for detecting transient and/or dynamic RNA-RBP interactions, for detecting dynamic RNA-RBP interactions during stress granule assembly, for detecting transient and/or dynamic RNA-RBP interactions that occur on a timescale within 10 minutes, for detecting RNA-RBP interaction sites in the nucleus and/or cytoplasm, for detecting RBP binding to polyadenylated RNAs, for detecting RBP binding to non-polyadenylated RNAs, for providing a quantitative and/or qualitative measurement of the binding strength of the RBP of interest to different RNA substrates, for capturing a distinctive binding pattern for the RBP of interest, for measuring relative binding strength of multiple RBPs of interest, and/or for generating a complex sequencing library, and compositions for performing the aforementioned. Some aspects of the present description also include kits suitable for identifying RNA-binding Protein (RBP)-RNA interaction sites in a biological sample. In some aspects, interaction sites can comprise binding sites where the RBP of interest binds to the RNA.
[0020] Any method described herein can include, at least 1, 2, 3, 4, 5, 6, 7, or more of the following steps: (a) fixing the sample or incubating the sample with a fixative; (b) incubating the sample with an agent that permeabilizes cell membranes and/or incubating the sample under conditions that permeabilize the cell membranes; (c) providing an RBP-targeting agent to the sample, wherein the RBP-targeting agent interacts with an RBP of interest; (d) providing a targeting moiety to the sample, wherein the targeting moiety interacts with the RBP-targeting agent; (e) capturing or isolating the RBP of interest that is interacting with RNA; (f) incubating the sample under conditions to produce cDNA; and (g) sequencing the cDNA. Any one or more of the preceding steps can be excluded from certain aspects of the disclosure. In some aspects, step (e) of the above is expressly excluded from the method.
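The optional-step structure of the preceding paragraph can be sketched as a small lookup. This is only an illustration of how steps (a) through (g) might be listed with selected steps excluded; the step labels paraphrase the text, and the function name and default exclusion of step (e) are assumptions chosen to mirror the aspect in which capture or isolation is expressly excluded.

```python
# Steps (a)-(g) as summarized in the paragraph above; wording is paraphrased.
STEPS = {
    "a": "fix the sample",
    "b": "permeabilize cell membranes",
    "c": "provide the RBP-targeting agent",
    "d": "provide the targeting moiety",
    "e": "capture or isolate the RNA-bound RBP",
    "f": "incubate under conditions that produce cDNA",
    "g": "sequence the cDNA",
}


def build_workflow(exclude=("e",)):
    """Return the ordered (label, description) steps, skipping excluded labels."""
    unknown = set(exclude) - set(STEPS)
    if unknown:
        raise ValueError(f"unknown step labels: {sorted(unknown)}")
    return [(label, text) for label, text in STEPS.items() if label not in exclude]


workflow = build_workflow()  # in-situ variant without the capture step
```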
[0021] In some aspects, each step occurs at a temperature of at, at least, at most, or about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, or 45 °C, including any range or value derivable therein. In certain aspects, it is specifically contemplated that a step is not performed at 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, or 45 °C. In some aspects, a step is performed at or between 20 and 30 °C (inclusive). In some aspects, a step is performed at or at about 25 °C. In some aspects, a step is performed at or at about 0 °C.
[0022] In some aspects, the sample comprises cells and/or one or more tissue section. In some aspects, the sample comprises, comprises less than, or comprises about 5,000, 4,000, 3,000, 2,000, 1,000, 500, 400, 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, or 20 cells, or fewer than 20 cells, including any range or value derivable therein. In some aspects, the method detects RBP-RNA interactions in a sample comprising, comprising less than, or comprising about 5,000, 4,000, 3,000, 2,000, 1,000, 500, 400, 300, 200, 100, 50, or 20 or fewer cells, including any range or value derivable therein.
[0023] In some aspects, the sample comprises, comprises less than, or comprises about 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 tissue sections, including any range or value derivable therein. In some aspects, the method detects RBP-RNA interactions in a sample comprising, comprising less than, or comprising about 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 tissue sections, including any range or value derivable therein. In some aspects, the one or more tissue section is initially frozen, then brought to room temperature. In some aspects, the initially frozen one or more tissue section is brought to room temperature for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes, including any range or value derivable therein. In some aspects, the initially frozen one or more tissue section is brought to room temperature for 10 minutes. In some aspects, a thin, film-like, hydrophobic barrier encircles the initially frozen one or more tissue section. In some aspects, a PAP pen further provides the thin, film-like, hydrophobic barrier.
[0024] In some aspects, the method detects transient and/or dynamic RNA-RBP interactions. In some aspects, the method detects transient and/or dynamic RNA-RBP interactions that occur on a timescale of, or of less than 30, 25, 20, 15, 10, or 5 minutes, including any range or value derivable therein. In some aspects, the method is completed in less than 24 hours. In some aspects, the method detects RNA-RBP interaction sites in the nucleus and/or cytoplasm. In some aspects, the method detects RBP binding to polyadenylated RNAs. In some aspects, the method detects RBP binding to non-polyadenylated RNAs. In some aspects, the method detects dynamic RNA-RBP interactions during stress granule assembly.
[0025] In some aspects, the method provides a quantitative and/or qualitative measurement of the binding strength of the RBP of interest to different RNA substrates. In some aspects, the RBP of interest includes, or expressly does not include, but is not limited to HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, and/or YTHDC1. In some aspects, the method captures a distinctive binding pattern for the RBP of interest. In some aspects, the method captures a distinctive binding pattern of multiple RBPs of interest. In some aspects, the distinctive binding pattern of the RBP of interest indicates a difference in splicing. In some aspects, the method measures relative binding strength of multiple RBPs of interest. In some aspects, the method generates a complex sequencing library.
[0026] Also disclosed herein, in some aspects, are methods for identifying RNA-binding Protein (RBP)-RNA interaction sites in a biological sample, in which the method is free of limitations of established sequencing shortcomings. In some aspects, the method can or does not comprise ultraviolet cross-linking. In some aspects, the method can or does not comprise immunoprecipitation. In some aspects, the method can or does not comprise insertion and/or utilization of base editing proteins. In some aspects, the method can or does not comprise dissociating one or more tissue section into single cells. In some aspects, the method can or does not comprise oligo(dT) primer initiated reverse transcription. In some aspects, the method can or does not comprise Tn5 tagmentation.
[0027] In some aspects, the fixing step comprises, consists, or consists essentially of rapidly freezing the sample. In some aspects, the fixing step comprises, consists, or consists essentially of treating the sample with formaldehyde. In some aspects, the fixing step comprises, consists, or consists essentially of treating the sample with paraformaldehyde (PFA). In some aspects, the fixing step comprises, consists, or consists essentially of treating the sample with at, at least, at most, or about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2.0%, 2.1%, 2.2%, 2.3%, 2.4%, or 2.5% PFA. In some aspects, the fixing step occurs for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes, including any range or value derivable therein. In some aspects, the fixing step occurs at room temperature.
[0028] In some aspects, the fixing step is quenched. In some aspects, the fixing step is quenched with glycine. In some aspects, the quenching glycine is at greater than or equal to a concentration of, of at least, of at most, or of about 25, 50, 75, 100, 125, 150, 200, 225, or 250 mM, including any range or value derivable therein. In some aspects, the quenching step occurs for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 minutes, including any range or value derivable therein. In some aspects, the quenching step occurs at room temperature.
[0029] In some aspects, an agent that permeabilizes cell membranes comprises, consists, or consists essentially of a detergent. In some aspects, an agent that permeabilizes cell membranes comprises, consists, or consists essentially of greater than or equal to 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, or 1.5%, including any range or value derivable therein, Triton X-100. In some aspects, the incubating step occurs for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes, including any range or value derivable therein. In some aspects, the incubating with a permeabilizing agent step occurs on ice.
[0030] In some aspects, at least one RNase is optionally provided to the sample following the incubating step. In some aspects, the providing of the at least one RNase improves resolution during the sequencing step. In some aspects, the at least one RNase comprises, consists, or consists essentially of ribonuclease I (RNase I, via Thermo Fisher Scientific), RNase A, and/or RNase T1. In some aspects, the RNase is provided to the sample for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes, including any range or value derivable therein. In some aspects, the at least one RNase is provided to the sample at 37 °C.
[0031] In some aspects, the RBP targeting agent comprises, consists, or consists essentially of a first antibody that is directed to the RBP of interest. In some aspects, the RBP targeting step occurs for, or for less than 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75, 90, 105, or 120 minutes, including any range or value derivable therein. In some aspects, the RBP targeting step occurs at room temperature.
[0032] In some aspects, a second antibody is optionally provided to the sample following the RBP targeting step. In some aspects, the second antibody is directed to the first antibody. In some aspects, the second antibody is directed to a fragment crystallizable (Fc) region of the first antibody. In some aspects, the second antibody is provided to the sample for, or for less than 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 minutes, including any range or value derivable therein. In some aspects, the second antibody is provided to the sample at room temperature. In some aspects, providing the second antibody increases a local antibody concentration around the RBP of interest. In some aspects, the first and/or second antibody are optionally tagged. In some aspects, the tag comprises a fluorophore.
[0033] In some aspects, the sample is blocked before the RBP targeting agent is provided to the sample. In some aspects, the sample is blocked with bovine serum albumin (BSA). In some aspects, the BSA is at a concentration of greater than or equal to 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 mg/mL, including any range or value derivable therein. In some aspects, the sample is blocked for, for less than, or for about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 minutes, including any range or value derivable therein. In some aspects, the sample is blocked at room temperature.
[0034] In some aspects, the sample is washed 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 times, including any range or value derivable therein, after the at least one RNase is provided to the sample, the RBP targeting step, after the second antibody is provided to the sample, and/or after the blocking step. In some aspects, the washing step comprises, consists, or consists essentially of washing the sample with DPBS. In some aspects, the washing step comprises, consists, or consists essentially of shaking the sample with DPBS. In some aspects, the washing step occurs for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes, including any range or value derivable therein. In some aspects, the washing step occurs at room temperature.
[0035] In some aspects, the targeting moiety is fused to a reverse transcriptase (RTase). In some aspects, the targeting moiety fused to the RTase allows site-specific delivery of the RTase to the RBP of interest. In some aspects, the RTase comprises, consists, or consists essentially of Moloney murine leukemia virus (MMLV) RTase, human immunodeficiency virus (HIV) RTase, AMV RTase, or any functional variant thereof. In some aspects, the RTase comprises, consists, or consists essentially of truncated MMLV RTase. In some aspects, the MMLV RTase is a truncated MMLV RTase that does not include an H domain and does not include the first 24 N-terminal residues of MMLV RTase. In some aspects, the RTase comprises, consists, or consists essentially of a sequence at least 80% identical to SEQ ID NOs: 1-6. In some aspects, the targeting moiety comprises, consists, or consists essentially of an scFv domain. In some aspects, the targeting moiety comprises, consists, or consists essentially of a third antibody. In some aspects, the targeting moiety comprises, consists, or consists essentially of an Fc binding protein. In some aspects, the Fc binding protein is protein A, protein G, protein A/G (pAG), protein L, anti-rabbit IgG, and/or anti-mouse IgG. In some aspects, the pAG binds the Fc domain of both the first and second antibody. In some aspects, the targeting moiety fused with the RTase comprises, consists, or consists essentially of pAG-RTase. In some aspects, the targeting moiety is fused to the RTase via a short linker, a medium linker, or a long linker. In some aspects, the short linker comprises, consists, or consists essentially of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids, including any range or value derivable therein. In some aspects, the medium linker comprises, consists, or consists essentially of 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids, including any range or value derivable therein. In some aspects, the long linker comprises, consists, or consists essentially of 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 amino acids, including any range or value derivable therein. In some aspects, the long linker is greater than 40 amino acids. In some aspects, the shorter linkers (e.g., the short and/or medium linkers) can reduce inaccuracy during the sequencing step. In some instances, the shorter linkers can slow the kinetics of the RTase, decrease the yield of biotinylated cDNA, increase read accumulation, reduce RT efficiency, concentrate ARTR-seq signals, and/or reduce off-target capture of RBP-RNA interaction sites.
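The linker length classes described above (short: 1 to 10 amino acids; medium: 11 to 20; long: 21 or more, including linkers greater than 40) can be expressed as a small helper. This is a sketch for illustration; the function name and the example linker sequence are hypothetical and not drawn from the sequence listing.

```python
def classify_linker(n_residues: int) -> str:
    """Map a linker length in amino acids to the class names used in the text."""
    if n_residues < 1:
        raise ValueError("a linker must contain at least one amino acid")
    if n_residues <= 10:
        return "short"
    if n_residues <= 20:
        return "medium"
    return "long"  # covers 21-40 residues and linkers longer than 40


# Hypothetical flexible linker used only to demonstrate the helper.
example_linker = "GGGGS" * 3
example_class = classify_linker(len(example_linker))
```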
[0036] In some aspects, the targeting moiety is optionally tagged. In some aspects, the tag comprises a fluorophore. In some aspects, the targeting moiety is provided to the sample for, or for less than 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75, 90, 105, or 120 minutes, including any range or value derivable therein. In some aspects, the targeting moiety is provided to the sample at room temperature or at 4 °C.
[0037] In some aspects, the conditions to produce cDNA comprises, consists, or consists essentially of (i) providing the sample with a reverse transcription (RT) reaction mixture, and (ii) halting RT. In some aspects, the RT reaction mixture comprises, consists, or consists essentially of at least one primer, dNTPs, and other components. In some aspects, the at least one primer comprises, consists, or consists essentially of an adapter-RT primer fused to random RT primers. In some aspects, the random RT primers are not hexamers. In some aspects, the random RT primers comprise, consist, or consist essentially of at least septamers, octamers, nonamers, decamers, undecamers, dodecamers, tridecamers, tetradecamers, pentadecamers, hexadecamers, heptadecamers, octadecamers, nonadecamers, or eicosamers. In some aspects, the adapter-RT primer is at a concentration of, of at least, or of about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 pM, including any range or value derivable therein. In some aspects, the adapter-RT primer comprises, consists, or consists essentially of a sequence at least 80% identical to SEQ. ID. NO. 25.
[0038] In some aspects, the dNTPs comprise, consist, or consist essentially of at least one labeled dNTP. In some aspects, the labeled dNTP is labeled with biotin. In some aspects, the labeled dNTP is labeled with biotin-16. In some aspects, the labeled dNTP is mixed with a corresponding non-labeled dNTP. In some aspects, the labeled dNTP is mixed with a non-labeled dNTP at a ratio of at least 1:1. In some aspects, the dNTPs comprise, consist, or consist essentially of a combination of biotin-16-dUTP, biotin-16-dCTP, dTTP, dCTP, dATP, and/or dGTP. In some aspects, the biotin-16-dUTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the biotin-16-dCTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the dTTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the dCTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the dATP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the dGTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein.
[0039] In some aspects, the other components comprise, consist, or consist essentially of a non-competitive inhibitor of pancreatic-type ribonucleases, a buffer, and/or MgCl2. In some aspects, the non-competitive inhibitor of pancreatic-type ribonucleases comprises, consists, or consists essentially of, of at least, or of about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, or 5.0 U/µl RNaseOUT, including any range or value derivable therein. In some aspects, the buffer comprises, consists, or consists essentially of, of at least, or of about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 µl of DPBS, including any range or value derivable therein. In some aspects, the MgCl2 is at a concentration of, of at least, or of about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 mM, including any range or value derivable therein.
[0040] In some aspects, the transcriptase composition is provided to the sample for, or for at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 minutes, including any range or value derivable therein. In some aspects, the transcriptase composition is provided to the sample at, at less than, at equal to, at about, or at more than 34 °C, 35 °C, 36 °C, 37 °C, 38 °C, 39 °C, 40 °C, 41 °C, 42 °C, 43 °C, 44 °C, 45 °C, 46 °C, 47 °C, 48 °C, 49 °C, 50 °C, 51 °C, 52 °C, 53 °C, 54 °C, 55 °C, 56 °C, 57 °C, or 58 °C. In some aspects, the transcriptase composition is provided to the sample at less than, equal to, about, or more than 34 °C, 35 °C, 36 °C, 37 °C, 38 °C, 39 °C, 40 °C, 41 °C, or 42 °C. In some aspects, the transcriptase mix is provided to the sample at 37 °C to 42 °C. In some aspects, the reverse transcriptase, or functional variant thereof, is enzymatically active at, or at about, 34 °C, 35 °C, 36 °C, 37 °C, 38 °C, 39 °C, 40 °C, 41 °C, or 42 °C, or any range derivable therein. In some aspects, the reverse transcriptase, or functional variant thereof, is enzymatically active at, or at about, 35 °C, 36 °C, 37 °C, 38 °C, or 39 °C, or any range derivable therein.
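As a rough illustration of how final concentrations recited for the RT reaction mixture translate into pipetting volumes, the following Python sketch applies the standard dilution relation C1·V1 = C2·V2. The stock concentrations, the 50 µl total volume, and the chosen final concentrations are hypothetical values picked for illustration from within the recited ranges; they are not taken from the disclosure as a preferred recipe.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, total_volume_ul: float) -> float:
    """Volume of stock to add so that the component reaches final_conc in total_volume_ul.

    Solves C1 * V1 = C2 * V2 for V1. Both concentrations must use the same unit.
    """
    if stock_conc <= 0 or final_conc < 0 or total_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * total_volume_ul / stock_conc


# Hypothetical (final, stock) pairs; the stock values are assumptions for this sketch.
components = {
    "MgCl2 (mM)": (3.0, 100.0),
    "RNaseOUT (U/ul)": (1.0, 40.0),
}
total_ul = 50.0  # assumed total reaction volume

volumes = {
    name: stock_volume_ul(final, stock, total_ul)
    for name, (final, stock) in components.items()
}
# Whatever volume is left is made up with buffer (e.g., DPBS).
buffer_ul = total_ul - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: add {vol:.2f} ul of stock")
print(f"buffer: add {buffer_ul:.2f} ul")
```

The same helper can be applied to the primer and each dNTP once stock and target concentrations are fixed in a common unit.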
[0041] In some aspects, the cDNA comprises, consists, or consists essentially of the dNTPs. In some aspects, the cDNA is biotinylated. In some aspects, halting RT comprises, consists, or consists essentially of providing at least one chelating agent to the sample. In some aspects, the at least one chelating agent comprises, consists, or consists essentially of EDTA and/or EGTA. In some aspects, the EDTA is at a concentration of, of at least, or of about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 mM, including any range or value derivable therein. In some aspects, the EGTA is at a concentration of, of at least, or of about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 mM, including any range or value derivable therein. In some aspects, the at least one chelating agent is provided to the sample for, or for at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 minutes, including any range or value derivable therein. In some aspects, the at least one chelating agent is provided to the sample at room temperature.
[0042] In some aspects, the method further comprises, consists, or consists essentially of an optional in-situ imaging step after the incubating step. In some aspects, the in-situ imaging step comprises, consists, or consists essentially of: (i) first providing the sample with an imaging antibody, (ii) second providing the sample with a cell-permeant nuclear counterstain, and (iii) determining fluorescence intensity of the sample. In some aspects, the in-situ imaging is by fluorescence imaging. In some aspects, the optional in-situ imaging step provides direct spatial information of the RBP-RNA interaction sites. In some aspects, the optional in-situ imaging step provides distinct binding patterns of the RBP-RNA interaction sites. In some aspects, the optional in-situ imaging step reveals subcellular localization of the RBP of interest. In some aspects, the optional in-situ imaging step does not impede amplification of the cDNA when the optional second antibody and targeting moiety are tagged with a fluorophore. In some aspects, the optional in-situ imaging step demonstrates regulatory differences among reader proteins. In some aspects, the reader proteins comprise, consist, or consist essentially of RNA N6-methyladenosine (m6A) reader proteins. In some aspects, the m6A reader proteins comprise, consist, or consist essentially of YTH family proteins or IGF2BP proteins. In some aspects, the imaging antibody targets the biotinylated cDNA. In some aspects, the imaging antibody comprises, consists, or consists essentially of a biotin monoclonal antibody. In some aspects, the biotin monoclonal antibody can comprise, consist, or consist essentially of Alexa Fluor 488. In some aspects, the imaging antibody is provided to the sample for, or for at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75, 90, 105, or 120 minutes, including any range or value derivable therein. In some aspects, the imaging antibody is provided to the sample at room temperature.
[0043] In some aspects, the cell-permeant nuclear counterstain emits blue fluorescence when bound to dsDNA. In some aspects, the cell-permeant nuclear counterstain can comprise, consist, or consist essentially of Hoechst 33342 dye. In some aspects, the cell-permeant nuclear counterstain is at a concentration of, of at least, or of about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, or 2.0 µg/mL, including any range or value derivable therein. In some aspects, the cell-permeant nuclear counterstain is provided to the sample for, or for at least 5, 10, 15, 20, 25, or 30 minutes, including any range or value derivable therein. In some aspects, the cell-permeant nuclear counterstain is provided to the sample at room temperature.
[0044] In some aspects, the method further comprises, consists, or consists essentially of an optional cell digestion step after the incubating step or optional in-situ imaging step. In some aspects, the optional cell digestion step comprises, consists, or consists essentially of treating the sample with a protease. In some aspects, the protease comprises, consists, or consists essentially of an endolytic protease. In some aspects, the endolytic protease comprises, consists, or consists essentially of proteinase K. In some aspects, the optional cell digestion step occurs for, or for at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75, 90, 105, 120, 135, 150, 165, or 180 minutes, including any range or value derivable therein. In some aspects, the optional cell digestion step occurs at 37 °C.
[0045] In some aspects, the cDNA sequencing step produces a binding profile for the RBP of interest. In some aspects, the cDNA sequencing step comprises, consists, or consists essentially of amplifying the cDNA, purifying the amplified cDNA, and high-throughput sequencing the purified cDNA. In some aspects, amplifying the cDNA comprises, consists, or consists essentially of PCR amplifying the cDNA with next generation sequencing (NGS) primers. In some aspects, purifying the amplified cDNA comprises, consists, or consists essentially of gel electrophoresis and extraction of the amplified cDNA. In some aspects, the cDNA sequencing step further comprises, consists, or consists essentially of trimming 3'-ends from the binding profile to remove imperfectly paired sequences.
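The 3'-end trimming mentioned above can be sketched in Python as follows. This is a minimal illustration only: the disclosure does not state how many bases are trimmed or which tool is used, so the function name `trim_3prime` and the parameter `n_trim` are hypothetical, and the read is assumed to be a FASTQ-style pair of sequence and per-base quality strings.

```python
def trim_3prime(seq: str, qual: str, n_trim: int) -> tuple[str, str]:
    """Remove n_trim bases, and their quality scores, from the 3' end of a read.

    n_trim is an assumed, user-chosen value; reads shorter than n_trim
    come back empty rather than raising.
    """
    if n_trim < 0:
        raise ValueError("n_trim must be non-negative")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    keep = max(len(seq) - n_trim, 0)
    return seq[:keep], qual[:keep]


# Example: drop the last three bases of a 10-nt read.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGTAC", "IIIIIIIIII", 3)
print(trimmed_seq, trimmed_qual)
```

Trimming a fixed number of bases is the simplest possible choice; the number of bases to remove would depend on the primer length and library design.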
[0046] Also disclosed herein, in some aspects, are kits suitable for identifying RNA-binding Protein (RBP)-RNA interaction sites in a biological sample. In some aspects, disclosed is a kit comprising, consisting, or consisting essentially of, in suitable container(s), an RBP targeting agent, a targeting moiety, at least one primer, dNTPs, and one or more nucleic acid buffers. In some aspects, the RBP targeting agent comprises, consists, or consists essentially of a first antibody that is directed to the RBP of interest. In some aspects, the targeting moiety interacts with the RBP-targeting agent. In some aspects, the targeting moiety is fused to a reverse transcriptase (RTase). In some aspects, the RTase comprises, consists, or consists essentially of Moloney murine leukemia virus (MMLV) RTase, human immunodeficiency virus (HIV) RTase, AMV RTase, or any functional variant thereof. In some aspects, the RTase comprises, consists, or consists essentially of truncated MMLV RTase. In some aspects, the RTase comprises a truncated MMLV RTase that does not include an H domain and does not include the first 24 N-terminal residues of MMLV RTase. In some aspects, the RTase comprises a sequence at least 80% identical to SEQ ID NO 1-6. In some aspects, the targeting moiety comprises, consists, or consists essentially of an scFv domain. In some aspects, the targeting moiety comprises, consists, or consists essentially of a third antibody. In some aspects, the targeting moiety comprises, consists, or consists essentially of an Fc binding protein. In some aspects, the Fc binding protein is protein A, protein G, protein A/G (pAG), protein L, anti-rabbit IgG, and/or anti-mouse IgG. In some aspects, the pAG binds the Fc domain of both the first and second antibody. In some aspects, the targeting moiety fused with the RTase comprises pAG-RTase.
[0047] In some aspects, the at least one primer comprises an adapter-RT primer fused to random RT primers.
[0048] In some aspects, the dNTPs comprise, consist, or consist essentially of at least one labeled dNTP. In some aspects, the labeled dNTP is labeled with biotin. In some aspects, the labeled dNTP is labeled with biotin-16. In some aspects, the labeled dNTP is mixed with a corresponding non-labeled dNTP. In some aspects, the labeled dNTP is mixed with a non-labeled dNTP at a ratio of at least 1:1. In some aspects, the dNTPs comprise, consist, or consist essentially of a combination of biotin-16-dUTP, biotin-16-dCTP, dTTP, dCTP, dATP, and/or dGTP. In some aspects, the biotin-16-dUTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the biotin-16-dCTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the dTTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the dCTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the dATP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein. In some aspects, the dGTP is at a concentration of, of at least, or of about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 µM, including any range or value derivable therein.
[0049] In some aspects, the kit further comprises a cell-fixing agent, a quenching agent, and/or an agent that permeabilizes cell membranes. In some aspects, the cell-fixing agent comprises, consists, or consists essentially of paraformaldehyde (PFA). In some aspects, the PFA is at a concentration of, of at least, of at most, or of about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2.0%, 2.1%, 2.2%, 2.3%, 2.4%, or 2.5%. In some aspects, the quenching agent comprises, consists, or consists essentially of glycine. In some aspects, the quenching glycine is at a concentration of, of at least, of at most, or of about 25, 50, 75, 100, 125, 150, 200, 225, or 250 mM, including any range or value derivable therein. In some aspects, the agent that permeabilizes cell membranes comprises, consists, or consists essentially of, of at least, or of about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, or 1.5% Triton X-100, including any range or value derivable therein. In some aspects, the kit further comprises at least one RNase. In some aspects, the at least one RNase comprises, consists, or consists essentially of ribonuclease I (RNase I, via Thermo Fisher Scientific), RNase A, and/or RNase T1.
[0050] In some aspects, the kit further comprises a second antibody. In some aspects, the second antibody is directed to the first antibody. In some aspects, the first antibody, the second antibody, and/or the targeting moiety are optionally tagged. In some aspects, the tag comprises a fluorophore.
[0051] In some aspects, the other components comprise, consist, or consist essentially of a non-competitive inhibitor of pancreatic-type ribonucleases, a buffer, and/or MgCl2. In some aspects, the non-competitive inhibitor of pancreatic-type ribonucleases comprises, consists, or consists essentially of, of at least, or of about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, or 5.0 U/µl RNaseOUT, including any range or value derivable therein. In some aspects, the buffer comprises, consists, or consists essentially of, of at least, or of about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 µl of DPBS, including any range or value derivable therein. In some aspects, the MgCl2 is at a concentration of, of at least, or of about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 mM, including any range or value derivable therein.
[0052] In some aspects, the kit further comprises, consists, or consists essentially of at least one chelating agent. In some aspects, the at least one chelating agent comprises, consists, or consists essentially of EDTA and/or EGTA. In some aspects, the EDTA is at a concentration of, of at least, or of about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 mM, including any range or value derivable therein. In some aspects, the EGTA is at a concentration of, of at least, or of about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 mM, including any range or value derivable therein. In some aspects, the kit further comprises, consists, or consists essentially of a protease. In some aspects, the protease comprises, consists, or consists essentially of an endolytic protease. In some aspects, the endolytic protease comprises, consists, or consists essentially of proteinase K.
[0053] Certain aspects of the present disclosure are characterized through the following enumerated aspects.
[0054] Aspect 1 is a method of identifying RNA-binding Protein (RBP)-RNA interaction sites in a biological sample, comprising: (a) fixing the sample; (b) contacting the sample with an agent that permeabilizes cell membranes; (c) providing an RBP-targeting agent to the sample, wherein the RBP-targeting agent interacts with an RBP of interest; (d) providing a targeting moiety to the sample, wherein the targeting moiety interacts with the RBP-targeting agent; (e) incubating the sample under conditions to produce cDNA; and (f) sequencing the cDNA.
[0055] Aspect 2 is the method of aspect 1, wherein the sample comprises cells and/or one or more tissue sections.
[0056] Aspect 3 is the method of any one of aspects 1 and 2, wherein the sample comprises 40,000 or fewer cells.
[0057] Aspect 4 is the method of any one of aspects 1 to 3, wherein the sample comprises 5,000 or fewer cells.
[0058] Aspect 5 is the method of any one of aspects 1 to 4, wherein the sample comprises 1,000 or fewer cells.
[0059] Aspect 6 is the method of any one of aspects 1 to 5, wherein the sample comprises 500 or fewer cells.
[0060] Aspect 7 is the method of any one of aspects 1 to 6, wherein the sample comprises 100 or fewer cells.
[0061] Aspect 8 is the method of any one of aspects 1 to 7, wherein the sample comprises 50 or fewer cells.
[0062] Aspect 9 is the method of any one of aspects 1 to 8, wherein the sample comprises 20 or fewer cells.
[0063] Aspect 10 is the method of any one of aspects 1 to 9, wherein the sample comprises 10 or fewer tissue sections.
[0064] Aspect 11 is the method of any one of aspects 1 to 10, wherein the sample comprises 5 or fewer tissue sections.
[0065] Aspect 12 is the method of any one of aspects 1 to 11, wherein the sample comprises 3 or fewer tissue sections.
[0066] Aspect 13 is the method of any one of aspects 1 to 12, wherein the sample comprises 2 or fewer tissue sections.
[0067] Aspect 14 is the method of any one of aspects 1 to 13, wherein the sample comprises 1 tissue section.
[0068] Aspect 15 is the method of any one of aspects 1 to 14, wherein the one or more tissue sections are initially frozen, then brought to room temperature.
[0069] Aspect 16 is the method of any one of aspects 1 to 15, wherein the RBP of interest comprises HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, and/or YTHDC1.
[0070] Aspect 17 is the method of any one of aspects 1 to 15, wherein the RBP of interest comprises G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, and/or YTHDC1.
[0071] Aspect 18 is the method of any one of aspects 1 to 16, wherein the method does not comprise ultraviolet cross-linking.
[0072] Aspect 19 is the method of any one of aspects 1 to 18, wherein the method does not comprise immunoprecipitation.
[0073] Aspect 20 is the method of any one of aspects 1 to 19, wherein the method does not comprise insertion of base editing proteins.
[0074] Aspect 21 is the method of any one of aspects 1 to 20, wherein the method does not comprise dissociating the one or more tissue sections into single cells.
[0075] Aspect 22 is the method of any one of aspects 1 to 21, wherein the method detects RBP-RNA interaction sites in a sample comprising 500 or fewer cells.
[0076] Aspect 23 is the method of any one of aspects 1 to 22, wherein the method detects the RBP-RNA interaction sites in a sample comprising 400 or fewer cells.
[0077] Aspect 24 is the method of any one of aspects 1 to 23, wherein the method detects the RBP-RNA interaction sites in a sample comprising 300 or fewer cells.
[0078] Aspect 25 is the method of any one of aspects 1 to 24, wherein the method detects the RBP-RNA interaction sites in a sample comprising 200 or fewer cells.
[0079] Aspect 26 is the method of any one of aspects 1 to 25, wherein the method detects the RBP-RNA interaction sites in a sample comprising 100 or fewer cells.
[0080] Aspect 27 is the method of any one of aspects 1 to 26, wherein the method detects the RBP-RNA interaction sites in a sample comprising 50 or fewer cells.
[0081] Aspect 28 is the method of any one of aspects 1 to 27, wherein the method detects the RBP-RNA interaction sites in a sample comprising 20 or fewer cells.
[0082] Aspect 29 is the method of any one of aspects 1 to 28, wherein the method detects the RBP-RNA interaction sites in a sample comprising 10 or fewer tissue sections.
[0083] Aspect 30 is the method of any one of aspects 1 to 29, wherein the method detects the RBP-RNA interaction sites in a sample comprising 5 or fewer tissue sections.
[0084] Aspect 31 is the method of any one of aspects 1 to 30, wherein the method detects the RBP-RNA interaction sites in a sample comprising 1 tissue section.
[0085] Aspect 32 is the method of any one of aspects 1 to 31, wherein the method detects transient and/or dynamic RNA-RBP interactions.
[0086] Aspect 33 is the method of aspect 32, wherein the method detects the dynamic RNA-RBP interactions during stress granule assembly.
[0087] Aspect 34 is the method of any one of aspects 32 and 33, wherein the method detects the transient and/or dynamic RNA-RBP interactions that occur on a timescale within 10 minutes.
[0088] Aspect 35 is the method of any one of aspects 1 to 34, wherein the method is completed in less than 24 hours.
[0089] Aspect 36 is the method of any one of aspects 1 to 35, wherein the method does not comprise oligo(dT) primer initiated reverse transcription.
[0090] Aspect 37 is the method of any one of aspects 1 to 36, wherein the method does not comprise Tn5 tagmentation.
[0091] Aspect 38 is the method of any one of aspects 1 to 37, wherein the method detects RNA-RBP interaction sites in the nucleus and/or cytoplasm.
[0092] Aspect 39 is the method of any one of aspects 1 to 38, wherein the method detects RBP binding to polyadenylated RNAs.
[0093] Aspect 40 is the method of any one of aspects 1 to 39, wherein the method detects RBP binding to non-polyadenylated RNAs.
[0094] Aspect 41 is the method of any one of aspects 1 to 40, wherein the method provides a quantitative and/or qualitative measurement of the binding strength of the RBP of interest to different RNA substrates.
[0095] Aspect 42 is the method of any one of aspects 1 to 41, wherein the method captures a distinctive binding pattern for the RBP of interest.
[0096] Aspect 43 is the method of aspect 42, wherein the distinctive binding pattern of the RBP of interest indicates a difference in splicing.
[0097] Aspect 44 is the method of any one of aspects 1 to 43, wherein the method measures relative binding strength of multiple RBPs of interest.
[0098] Aspect 45 is the method of any one of aspects 1 to 44, wherein the method generates a complex sequencing library.
[0099] Aspect 46 is the method of any one of aspects 1 to 45, wherein the fixing step (a) comprises rapidly freezing the sample.
[0100] Aspect 47 is the method of any one of aspects 1 to 46, wherein the fixing step (a) comprises treating the sample with formaldehyde.
[0101] Aspect 48 is the method of any one of aspects 1 to 47, wherein the fixing step (a) comprises treating the sample with paraformaldehyde (PFA).
[0102] Aspect 49 is the method of aspect 48, wherein the fixing step (a) comprises treating the sample with 1.5% paraformaldehyde.
[0103] Aspect 50 is the method of any one of aspects 1 to 49, wherein the fixing step (a) occurs for at least 10 minutes.
[0104] Aspect 51 is the method of any one of aspects 1 to 50, wherein the fixing step (a) occurs at room temperature.
[0105] Aspect 52 is the method of any one of aspects 1 to 51, wherein the fixing step (a) is quenched with glycine.
[0106] Aspect 53 is the method of aspect 52, wherein the quenching glycine is at a concentration of 125 mM.
[0107] Aspect 54 is the method of any one of aspects 52 and 53, wherein the quenching step occurs for at least 5 minutes.
[0108] Aspect 55 is the method of any one of aspects 52 to 54, wherein the quenching step occurs at room temperature.
[0109] Aspect 56 is the method of any one of aspects 1 to 55, wherein the agent that permeabilizes cell membranes comprises 0.5% Triton X-100.
[0110] Aspect 57 is the method of any one of aspects 1 to 56, wherein the contacting step (b) occurs for at least 10 minutes.
[0111] Aspect 58 is the method of any one of aspects 1 to 57, wherein the contacting step (b) occurs on ice.
[0112] Aspect 59 is the method of any one of aspects 1 to 58, wherein at least one RNase is optionally provided to the sample following the contacting step (b).
[0113] Aspect 60 is the method of aspect 59, wherein the providing of the at least one RNase improves resolution during the sequencing step (f).
[0114] Aspect 61 is the method of any one of aspects 59 and 60, wherein the at least one RNase comprises RNase I, RNase A, and/or RNase T1.
[0115] Aspect 62 is the method of any one of aspects 59 to 61, wherein the at least one RNase is provided to the sample for at least 5 minutes.
[0116] Aspect 63 is the method of any one of aspects 59 to 62, wherein the at least one RNase is provided to the sample at 37 °C.
[0117] Aspect 64 is the method of any one of aspects 1 to 63, wherein the RBP targeting agent comprises a first antibody that is directed to the RBP of interest.
[0118] Aspect 65 is the method of any one of aspects 1 to 64, wherein the RBP targeting step (c) occurs for at least 60 minutes.
[0119] Aspect 66 is the method of any one of aspects 1 to 65, wherein the RBP targeting step (c) occurs at room temperature.
[0120] Aspect 67 is the method of any one of aspects 1 to 66, wherein a second antibody is optionally provided to the sample following the RBP targeting step (c).
[0121] Aspect 68 is the method of aspect 67, wherein the second antibody is directed to the first antibody.
[0122] Aspect 69 is the method of any one of aspects 67 and 68, wherein the second antibody is directed to a fragment crystallizable (Fc) region of the first antibody.
[0123] Aspect 70 is the method of any one of aspects 67 to 69, wherein the second antibody is provided to the sample for at least 30 minutes.
[0124] Aspect 71 is the method of any one of aspects 67 to 70, wherein the second antibody is provided to the sample at room temperature.
[0125] Aspect 72 is the method of any one of aspects 67 to 71, wherein providing the second antibody increases a local antibody concentration around the RBP of interest.
[0126] Aspect 73 is the method of any one of aspects 67 to 72, wherein not providing the second antibody reduces inaccuracy during the sequencing step (f).
[0127] Aspect 74 is the method of any one of aspects 67 to 73, wherein the first and/or second antibody are optionally tagged.
[0128] Aspect 75 is the method of aspect 74, wherein the tag comprises a fluorophore.
[0129] Aspect 76 is the method of any one of aspects 1 to 75, wherein the sample is blocked before the RBP targeting agent is provided to the sample.
[0130] Aspect 77 is the method of aspect 76, wherein the sample is blocked with bovine serum albumin (BSA).
[0131] Aspect 78 is the method of aspect 77, wherein the BSA is at a concentration of at least or greater than about 1 mg/mL.
[0132] Aspect 79 is the method of any one of aspects 76 to 78, wherein the sample is blocked for at least 30 minutes.
[0133] Aspect 80 is the method of any one of aspects 76 to 79, wherein the sample is blocked at room temperature.
[0134] Aspect 81 is the method of any one of aspects 1 to 80, wherein the targeting moiety is fused to a reverse transcriptase (RTase).
[0135] Aspect 82 is the method of aspect 81, wherein the targeting moiety fused to the RTase allows site-specific delivery of the RTase to the RBP of interest.
[0136] Aspect 83 is the method of any one of aspects 81 and 82, wherein the RTase comprises Moloney murine leukemia virus (MMLV) RTase, AMV RTase, human immunodeficiency virus (HIV) RTase, or any functional variant thereof.
[0137] Aspect 84 is the method of any one of aspects 81 to 83, wherein the RTase comprises a functional truncated MMLV RTase.
[0138] Aspect 85 is the method of aspect 84, wherein the functional truncated MMLV RTase does not include an H domain and does not include the first 24 N-terminal residues of MMLV RTase.
[0139] Aspect 86 is the method of any one of aspects 81 to 85, wherein the RTase comprises a sequence at least 80% identical to SEQ ID NO 1-6.
[0140] Aspect 87 is the method of any one of aspects 1 to 86, wherein the targeting moiety comprises an scFv domain.
[0141] Aspect 88 is the method of any one of aspects 1 to 87, wherein the targeting moiety comprises a third antibody.
[0142] Aspect 89 is the method of any one of aspects 1 to 88, wherein the targeting moiety comprises an Fc binding protein.
[0143] Aspect 90 is the method of aspect 89, wherein the Fc binding protein is protein A, protein G, protein A/G (pAG), protein L, anti-rabbit IgG, and/or anti-mouse IgG.
[0144] Aspect 91 is the method of any one of aspects 89 and 90, wherein the Fc binding protein is protein A/G (pAG).
[0145] Aspect 92 is the method of aspect 90, wherein the pAG binds the Fc domain of both the first and second antibody.
[0146] Aspect 93 is the method of any one of aspects 90 to 92, wherein the targeting moiety fused with the RTase comprises pAG-RTase.
[0147] Aspect 94 is the method of any one of aspects 1 to 93, wherein the targeting moiety is optionally tagged.
[0148] Aspect 95 is the method of aspect 94, wherein the tag comprises a fluorophore.
[0149] Aspect 96 is the method of any one of aspects 81 to 95, wherein the targeting moiety is fused to the RTase via a short linker, a medium linker, or a long linker.
[0150] Aspect 97 is the method of aspect 96, wherein the short linker comprises three amino acids.
[0151] Aspect 98 is the method of aspect 96, wherein the medium linker comprises thirteen amino acids.
[0152] Aspect 99 is the method of aspect 96, wherein the long linker comprises thirty amino acids.
[0153] Aspect 100 is the method of any one of aspects 96 to 99, wherein the short and/or medium linkers reduce inaccuracy during the sequencing step (f).
[0154] Aspect 101 is the method of any one of aspects 1 to 100, wherein the targeting moiety is provided to the sample for at least 30 minutes.
[0155] Aspect 102 is the method of any one of aspects 1 to 101, wherein the targeting moiety is provided to the sample at room temperature.
[0156] Aspect 103 is the method of any one of aspects 1 to 102, wherein the conditions to produce cDNA comprise: (i) providing the sample with a reverse transcription (RT) reaction mixture, and (ii) halting RT.
[0157] Aspect 104 is the method of aspect 103, wherein the RT reaction mixture comprises at least one primer, dNTPs, and other components.
[0158] Aspect 105 is the method of aspect 104, wherein the at least one primer comprises an adapter-RT primer fused to random RT primers.
[0159] Aspect 106 is the method of aspect 105, wherein the random RT primers are not hexamers.
[0160] Aspect 107 is the method of any one of aspects 105 and 106, wherein the random RT primers are at least octamers, and optionally decamers.
[0161] Aspect 108 is the method of any one of aspects 105 to 107, wherein the adapter-RT primer is at a concentration of 2 µM.
[0162] Aspect 109 is the method of any one of aspects 105 to 108, wherein the adapter-RT primer comprises a sequence at least 80% identical to SEQ. ID. NO. 25.
[0163] Aspect 110 is the method of any one of aspects 104 to 109, wherein the dNTPs comprise at least one labelled dNTP.
[0164] Aspect 111 is the method of aspect 110, wherein the labelled dNTP is labeled with biotin.
[0165] Aspect 112 is the method of any one of aspects 110 and 111, wherein the labelled dNTP is labeled with biotin-16.
[0166] Aspect 113 is the method of any one of aspects 110 to 112, wherein the labeled dNTP is mixed with a corresponding non-labeled dNTP.
[0167] Aspect 114 is the method of aspect 112, wherein the labeled dNTP is mixed with a non-labeled dNTP at a ratio of 1:1.
[0168] Aspect 115 is the method of any one of aspects 104 to 114, wherein the dNTPs comprise a combination of biotin-16-dUTP, biotin-16-dCTP, dTTP, dCTP, dATP, and/or dGTP.
[0169] Aspect 116 is the method of aspect 115, wherein the biotin-16-dUTP is at a concentration of at least 0.05 mM.
[0170] Aspect 117 is the method of any one of aspects 115 and 116, wherein the biotin-16-dCTP is at a concentration of at least 0.05 mM.
[0171] Aspect 118 is the method of any one of aspects 115 to 117, wherein the dTTP is at a concentration of at least 0.05 mM.
[0172] Aspect 119 is the method of any one of aspects 115 to 118, wherein the dCTP is at a concentration of at least 0.05 mM.
[0173] Aspect 120 is the method of any one of aspects 115 to 119, wherein the dATP is at a concentration of at least 0.1 mM.
[0174] Aspect 121 is the method of any one of aspects 115 to 120, wherein the dGTP is at a concentration of at least 0.1 mM.
[0175] Aspect 122 is the method of any one of aspects 104 to 121, wherein the other components comprise a non-competitive inhibitor of pancreatic-type ribonucleases, a buffer, and/or MgCl2.
[0176] Aspect 123 is the method of aspect 122, wherein the non-competitive inhibitor of pancreatic-type ribonucleases comprises at least 1.0 U/µl RNaseOUT.
[0177] Aspect 124 is the method of any one of aspects 122 and 123, wherein the buffer comprises at least 50 µl of DPBS.
[0178] Aspect 125 is the method of any one of aspects 122 to 124, wherein the MgCl2 is at a concentration of at least 3 mM.
[0179] Aspect 126 is the method of any one of aspects 104 to 125, wherein the RT mixture is provided to the sample for at least 30 minutes.
[0180] Aspect 127 is the method of any one of aspects 104 to 126, wherein the RT mixture is provided to the sample at least at 37 °C-42 °C.
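By way of non-limiting illustration, the arithmetic for assembling an RT reaction mixture at the minimum final concentrations recited in aspects 116 to 121 and 125 can be sketched as follows. The stock concentrations, the total reaction volume, and the function names are assumptions chosen only for illustration and are not taken from the disclosure; the calculation is the standard dilution relation (final concentration / stock concentration × total volume).

```python
# Minimum final concentrations (mM) as recited in aspects 116-121 and 125.
FINAL_MM = {
    "biotin-16-dUTP": 0.05,
    "biotin-16-dCTP": 0.05,
    "dTTP": 0.05,
    "dCTP": 0.05,
    "dATP": 0.1,
    "dGTP": 0.1,
    "MgCl2": 3.0,
}

# ASSUMPTION: hypothetical stock concentrations (mM), not from the disclosure.
STOCK_MM = {
    "biotin-16-dUTP": 1.0,
    "biotin-16-dCTP": 1.0,
    "dTTP": 10.0,
    "dCTP": 10.0,
    "dATP": 10.0,
    "dGTP": 10.0,
    "MgCl2": 100.0,
}

# ASSUMPTION: hypothetical total reaction volume in microliters.
TOTAL_UL = 100.0


def stock_volume_ul(final_mm, stock_mm, total_ul):
    """Volume of stock (µl) needed to reach final_mm in total_ul."""
    if stock_mm <= 0 or final_mm > stock_mm:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_mm / stock_mm * total_ul


def plan_mix(final_mm, stock_mm, total_ul):
    """Return per-component stock volumes plus the volume left for everything else."""
    volumes = {
        name: stock_volume_ul(conc, stock_mm[name], total_ul)
        for name, conc in final_mm.items()
    }
    remainder = total_ul - sum(volumes.values())
    if remainder < 0:
        raise ValueError("component volumes exceed the total reaction volume")
    # The remainder is available for buffer, primer, RNase inhibitor, and so on.
    volumes["remaining volume"] = remainder
    return volumes


if __name__ == "__main__":
    for name, vol in plan_mix(FINAL_MM, STOCK_MM, TOTAL_UL).items():
        print(f"{name}: {vol:.2f} µl")
```

Because the aspects recite minimum concentrations ("at least"), the values in `FINAL_MM` may be raised without changing the calculation.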
[0181] Aspect 128 is the method of any one of aspects 104 to 127, wherein the cDNA comprises the dNTPs.
[0182] Aspect 129 is the method of any one of aspects 1 to 128, wherein the cDNA is biotinylated.
[0183] Aspect 130 is the method of any one of aspects 104 to 129, wherein halting RT comprises providing at least one chelating agent to the sample.
[0184] Aspect 131 is the method of aspect 130, wherein the at least one chelating agent comprises, consists of, or consists essentially of EDTA and/or EGTA.
[0185] Aspect 132 is the method of aspect 131, wherein the EDTA is at a concentration of at least 20 mM.
[0186] Aspect 133 is the method of any one of aspects 131 and 132, wherein the EGTA is at a concentration of at least 10 mM.
[0187] Aspect 134 is the method of any one of aspects 130 to 133, wherein the at least one chelating agent is provided to the sample for at least 3 minutes.
[0188] Aspect 135 is the method of any one of aspects 130 to 134, wherein the at least one chelating agent is provided to the sample at room temperature.
[0189] Aspect 136 is the method of any one of aspects 1 to 135, wherein the method further comprises an optional in-situ imaging step after the incubating step (e).
[0190] Aspect 137 is the method of aspect 136, wherein the in-situ imaging step comprises: (i) first providing the sample with an imaging antibody, (ii) second providing the sample with a cell-permeant nuclear counterstain, and (iii) determining fluorescence intensity of the sample.
[0191] Aspect 138 is the method of any one of aspects 136 and 137, wherein the in-situ imaging is by fluorescence imaging.
[0192] Aspect 139 is the method of any one of aspects 136 to 138, wherein the optional in-situ imaging step provides direct spatial information of the RBP-RNA interaction sites.
[0193] Aspect 140 is the method of any one of aspects 136 to 139, wherein the optional in-situ imaging step provides distinct binding patterns of the RBP-RNA interaction sites.
[0194] Aspect 141 is the method of any one of aspects 136 to 140, wherein the optional in-situ imaging step reveals subcellular localization of the RBP of interest.
[0195] Aspect 142 is the method of any one of aspects 136 to 141, wherein the optional in-situ imaging step does not impede amplification of the cDNA when the optional second antibody and targeting moiety are tagged with a fluorophore.
[0196] Aspect 143 is the method of any one of aspects 136 to 142, wherein the optional in-situ imaging step demonstrates regulatory differences among reader proteins.
[0197] Aspect 144 is the method of aspect 143, wherein the reader proteins comprise RNA N6-methyladenosine (m6A) reader proteins.
[0198] Aspect 145 is the method of aspect 144, wherein the m6A reader proteins comprise YTH family proteins or IGF2BP proteins.
[0199] Aspect 146 is the method of any one of aspects 137 to 145, wherein the imaging antibody targets the biotinylated cDNA.
[0200] Aspect 147 is the method of any one of aspects 137 to 146, wherein the imaging antibody comprises a biotin monoclonal antibody.
[0201] Aspect 148 is the method of any one of aspects 137 to 147, wherein the imaging antibody is provided to the sample for at least 60 minutes.
[0202] Aspect 149 is the method of any one of aspects 137 to 148, wherein the imaging antibody is provided to the sample at room temperature.
[0203] Aspect 150 is the method of any one of aspects 137 to 149, wherein the cell-permeant nuclear counterstain emits blue fluorescence when bound to dsDNA.
[0204] Aspect 151 is the method of any one of aspects 137 to 150, wherein the cell-permeant nuclear counterstain is at a concentration of 1.0 µg/mL.
[0205] Aspect 152 is the method of any one of aspects 137 to 151, wherein the cell-permeant nuclear counterstain is provided to the sample for at least 15 minutes.
[0206] Aspect 153 is the method of any one of aspects 137 to 152, wherein the cell-permeant nuclear counterstain is provided to the sample at room temperature.
[0207] Aspect 154 is the method of any one of aspects 1 to 153, wherein the method further comprises an optional cell digestion step after the incubating step (e) or optional in-situ imaging step.
[0208] Aspect 155 is the method of aspect 154, wherein the optional cell digestion step comprises treating the sample with a protease.
[0209] Aspect 156 is the method of aspect 155, wherein the protease comprises an endolytic protease.
[0210] Aspect 157 is the method of aspect 156, wherein the endolytic protease comprises proteinase K.
[0211] Aspect 158 is the method of any one of aspects 154 to 157, wherein the optional cell digestion step occurs for at most 120 minutes.
[0212] Aspect 159 is the method of any one of aspects 154 to 158, wherein the optional cell digestion step occurs at most at 37 °C.
[0213] Aspect 160 is the method of any one of aspects 1 to 159, wherein the cDNA sequencing step (f) produces a binding profile for the RBP of interest.
[0214] Aspect 161 is the method of any one of aspects 1 to 160, wherein the cDNA sequencing step (f) comprises amplifying the cDNA, purifying the amplified cDNA, and high-throughput sequencing the purified cDNA.
[0215] Aspect 162 is the method of aspect 161, wherein amplifying the cDNA comprises PCR amplifying the cDNA with next generation sequencing (NGS) primers.
[0216] Aspect 163 is the method of any one of aspects 161 and 162, wherein purifying the amplified cDNA comprises gel electrophoresis and extraction of the amplified cDNA.
[0217] Aspect 164 is the method of any one of aspects 160 to 163, wherein the cDNA sequencing step (f) further comprises trimming 3'-ends from the binding profile to remove imperfectly paired sequences.
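By way of non-limiting illustration, one simple way to trim 3'-ends from sequencing reads is to remove a fixed number of bases from the 3' end of each read together with the matching quality scores. The function name and the number of bases removed are assumptions for illustration only; the disclosure does not specify a trimming length or a particular tool.

```python
def trim_three_prime(sequence, quality, n_bases):
    """Remove n_bases from the 3' end of a read and of its quality string.

    ASSUMPTION: a fixed-length trim; n_bases is a hypothetical parameter
    that would be chosen by the practitioner.
    """
    if n_bases < 0:
        raise ValueError("n_bases must be non-negative")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    keep = max(len(sequence) - n_bases, 0)  # never slice with a negative length
    return sequence[:keep], quality[:keep]


if __name__ == "__main__":
    seq, qual = trim_three_prime("ACGTACGT", "IIIIIIII", 3)
    print(seq, qual)
```

In practice an established read-trimming tool would typically be applied to whole FASTQ files; the sketch only shows the per-read operation.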
[0218] Aspect 165 is a method of identifying RNA-binding Protein (RBP)-RNA interaction sites in a biological sample, comprising: (i) fixing the sample; (ii) contacting the sample with an agent that permeabilizes cell membranes; (iii) optionally providing an RNase to the sample; (iv) optionally washing the sample; (v) optionally blocking the sample; (vi) optionally washing the sample; (vii) providing an RBP-targeting agent to the sample, wherein the RBP-targeting agent interacts with an RBP of interest; (viii) optionally washing the sample; (ix) providing a targeting moiety to the sample, wherein the targeting moiety interacts with the RBP-targeting agent; (x) optionally providing a second antibody to the sample, wherein the second antibody is directed to the targeting moiety; (xi) optionally washing the sample; (xii) incubating the sample under conditions to produce cDNA; (xiii) optionally imaging the sample in-situ; (xiv) optionally digesting cells contained in the sample; and (xv) sequencing the cDNA.
[0219] Aspect 166 is a kit comprising, in suitable container(s), an RBP-targeting agent, a targeting moiety, at least one primer, dNTPs, and one or more nucleic acid buffers.
[0220] Aspect 167 is a kit of aspect 166, wherein the RBP targeting agent comprises a first antibody that is directed to the RBP of interest.
[0221] Aspect 168 is a kit of any one of aspects 166 and 167, wherein the kit further comprises a second antibody.
[0222] Aspect 169 is a kit of aspect 168, wherein the second antibody is directed to the first antibody.
[0223] Aspect 170 is a kit of any one of aspects 168 and 169, wherein the first antibody, the second antibody, and/or the targeting moiety are optionally tagged.
[0224] Aspect 171 is a kit of aspect 170, wherein the tag comprises a fluorophore.
[0225] Aspect 172 is a kit of any one of aspects 166 and 167, wherein the targeting moiety interacts with the RBP-targeting agent.
[0226] Aspect 173 is a kit of any one of aspects 166 to 172, wherein the targeting moiety is fused to a reverse transcriptase (RTase).
[0227] Aspect 174 is a kit of aspect 173, wherein the RTase comprises Moloney murine leukemia virus (MMLV) RTase, human immunodeficiency virus (HIV) RTase, AMV RTase, or any functional variant thereof.
[0228] Aspect 175 is a kit of aspect 174, wherein the RTase comprises a truncated MMLV RTase.
[0229] Aspect 176 is a kit of aspect 175, wherein the truncated MMLV RTase does not include an H domain and does not include the first 24 N-terminal residues of MMLV RTase.
[0230] Aspect 177 is a kit of any one of aspects 173 to 176, wherein the RTase comprises a sequence at least 80% identical to SEQ ID 1-6.
[0231] Aspect 178 is a kit of any one of aspects 166 to 177, wherein the targeting moiety comprises an scFv domain.
[0232] Aspect 179 is a kit of any one of aspects 166 to 178, wherein the targeting moiety comprises a third antibody.
[0233] Aspect 180 is a kit of any one of aspects 166 to 179, wherein the targeting moiety comprises an Fc binding protein.
[0234] Aspect 181 is a kit of aspect 180, wherein the Fc binding protein is protein A, protein G, protein A/G (pAG), protein L, anti-rabbit IgG, and/or anti-mouse IgG.
[0235] Aspect 182 is a kit of aspect 181, wherein the Fc binding protein is protein A/G (pAG).
[0236] Aspect 183 is a kit of aspect 181, wherein the pAG binds the Fc domain of both the first and second antibody.
[0237] Aspect 184 is a kit of any one of aspects 173 to 183, wherein the targeting moiety fused with the RTase comprises pAG-RTase.
[0238] Aspect 185 is a kit of any one of aspects 166 to 184, wherein the at least one primer comprises an adapter-RT primer fused to random RT primers.
[0239] Aspect 186 is a kit of aspect 185, wherein the random RT primers are not hexamers.
[0240] Aspect 187 is a kit of any one of aspects 185 and 186, wherein the random RT primers are at least octamers, and optionally decamers.
[0241] Aspect 188 is a kit of any one of aspects 185 to 187, wherein the adapter-RT primer is at a concentration of at least 2 µM.
[0242] Aspect 189 is a kit of any one of aspects 185 to 188, wherein the adapter-RT primer comprises a sequence at least 80% identical to SEQ. ID. NO. 25.
[0243] Aspect 190 is a kit of any one of aspects 166 to 189, wherein the dNTPs comprise at least one labelled dNTP.
[0244] Aspect 191 is a kit of aspect 190, wherein the labelled dNTP is labeled with biotin.
[0245] Aspect 192 is a kit of any one of aspects 190 and 191, wherein the labelled dNTP is labeled with biotin-16.
[0246] Aspect 193 is a kit of any one of aspects 190 to 192, wherein the labeled dNTP is mixed with a corresponding non-labeled dNTP.
[0247] Aspect 194 is a kit of aspect 193, wherein the labeled dNTP is mixed with a non-labeled dNTP at a ratio of at least 1:1.
[0248] Aspect 195 is a kit of any one of aspects 166 to 194, wherein the dNTPs comprise a combination of biotin-16-dUTP, biotin-16-dCTP, dTTP, dCTP, dATP, and/or dGTP.
[0249] Aspect 196 is a kit of aspect 195, wherein the biotin-16-dUTP is at a concentration of at least 0.05 mM.
[0250] Aspect 197 is a kit of any one of aspects 195 and 196, wherein the biotin-16-dCTP is at a concentration of at least 0.05 mM.
[0251] Aspect 198 is a kit of any one of aspects 195 to 197, wherein the dTTP is at a concentration of at least 0.05 mM.
[0252] Aspect 199 is a kit of any one of aspects 195 to 198, wherein the dCTP is at a concentration of at least 0.05 mM.
[0253] Aspect 200 is a kit of any one of aspects 195 to 199, wherein the dATP is at a concentration of at least 0.1 mM.
[0254] Aspect 201 is a kit of any one of aspects 195 to 200, wherein the dGTP is at a concentration of at least 0.1 mM.
[0255] Aspect 202 is a kit of any one of aspects 166 to 201, wherein the kit further comprises a cell-fixing agent and a quenching agent.
[0256] Aspect 203 is a kit of aspect 202, wherein the cell-fixing agent comprises paraformaldehyde.
[0257] Aspect 204 is a kit of any one of aspects 202 and 203, wherein the quenching agent comprises glycine.
[0258] Aspect 205 is a kit of any one of aspects 166 to 204, wherein the kit further comprises at least one RNase.
[0259] Aspect 206 is a kit of aspect 205, wherein the at least one RNase comprises RNase I, RNase A, and/or RNase T1.
[0260] Aspect 207 is a kit of any one of aspects 166 to 206, wherein the kit further comprises a non-competitive inhibitor of pancreatic-type ribonucleases, a buffer, and/or MgCl2.
[0261] Aspect 208 is a kit of aspect 207, wherein the non-competitive inhibitor of pancreatic-type ribonucleases comprises at least 1.0 U/µl RNaseOUT.
[0262] Aspect 209 is a kit of any one of aspects 207 and 208, wherein the buffer comprises at least 50 µl of DPBS.
[0263] Aspect 210 is a kit of any one of aspects 207 to 209, wherein the MgCl2 is at a concentration of at least 3 mM.
[0264] Aspect 211 is a kit of any one of aspects 166 to 210, wherein the kit further comprises at least one chelating agent.
[0265] Aspect 212 is a kit of aspect 211, wherein the at least one chelating agent comprises, consists of, or consists essentially of EDTA and/or EGTA.
[0266] Aspect 213 is a kit of aspect 212, wherein the EDTA is at a concentration of at least 20 mM.
[0267] Aspect 214 is a kit of any one of aspects 212 and 213, wherein the EGTA is at a concentration of at least 10 mM.
[0268] Aspect 215 is a kit comprising, in suitable container(s), an RBP-targeting agent, a targeting moiety, at least one primer, dNTPs, one or more nucleic acid buffers, a second antibody, a cell-fixing agent, a quenching agent, at least one RNase, a non-competitive inhibitor of pancreatic-type ribonucleases, a buffer, MgCl2, and/or at least one chelating agent.
[0269] Aspect 216 is a kit comprising, in suitable container(s): a first antibody directed to an RBP of interest, a pAG-RTase, at least 2 µM of an adapter-RT primer fused to random octamer RT primers, at least 0.05 mM of biotin-16-dUTP, at least 0.05 mM of biotin-16-dCTP, at least 0.05 mM of dTTP, at least 0.05 mM of dCTP, at least 0.1 mM of dATP, at least 0.1 mM of dGTP, one or more nucleic acid buffers, a second antibody directed to the first antibody, paraformaldehyde, glycine, RNase I, at least 1.0 U/µl RNaseOUT, at least 50 µl of DPBS, at least 3 mM MgCl2, at least 20 mM EDTA, and/or at least 10 mM EGTA.
[0270] Aspect 1B is a polypeptide construct comprising: a) a targeting moiety; and b) a reverse transcriptase enzyme, or a functional variant thereof.
[0271] Aspect 2B is the polypeptide construct of aspect 1B, wherein the targeting moiety is an Fc binding protein or variant thereof, an antibody or variant thereof, an oligonucleotide or variant thereof, a receptor or variant thereof, a ligand, a small molecule, an aptamer, a nucleoside, or any combination thereof.
[0272] Aspect 3B is the polypeptide construct of aspect 2B, wherein the targeting moiety comprises, consists essentially of, or consists of an Fc binding protein or a variant thereof.
[0273] Aspect 4B is the polypeptide construct of aspect 2B, wherein the targeting moiety comprises, consists essentially of, or consists of an antibody or variant thereof.
[0274] Aspect 5B is the polypeptide construct of aspect 2B, wherein the targeting moiety comprises, consists essentially of, or consists of an oligonucleotide or a variant thereof.
[0275] Aspect 6B is the polypeptide construct of aspect 5B, wherein the oligonucleotide comprises a barcode, indices, affinity tag, label, a modified nucleotide, or any combination thereof.
[0276] Aspect 7B is the polypeptide construct of aspect 6B, wherein the affinity tag comprises a streptavidin, or an avidin tag.
[0277] Aspect 8B is the polypeptide construct of aspect 2B, wherein the targeting moiety comprises a small molecule.
[0278] Aspect 9B is the polypeptide construct of aspect 3B, wherein the Fc binding protein comprises, consists essentially of, or consists of protein A, protein G, protein A/G (pAG), protein L, anti-rabbit IgG, anti-mouse IgG, or a variant thereof, or any combination thereof.
[0279] Aspect 10B is the polypeptide construct of aspect 9B, wherein the Fc binding protein comprises, consists essentially of, or consists of pAG.
[0280] Aspect 11B is the polypeptide construct of aspect 9B, wherein the Fc binding protein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 8, 10, and 12, or an amino acid sequence at least about 60%, at least about 70%, at least about 80%, at least about 90%, or at least about 95% identical thereto.
[0281] Aspect 12B is the polypeptide construct of any one of aspects 1B-11B, wherein the reverse transcriptase comprises, consists essentially of, or consists of Moloney murine leukemia virus (MMLV) RTase, human immunodeficiency virus (HIV) RTase, AMV RTase or a functional variant thereof.
[0282] Aspect 13B is the polypeptide construct of any one of aspects 1B-12B, wherein the reverse transcriptase protein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 2, 4, and 6, or an amino acid sequence at least about 60%, at least about 70%, at least about 80%, at least about 90%, or at least about 95% identical thereto, or a functional variant thereof.
[0283] Aspect 14B is the polypeptide construct of any one of aspects 1B-13B, further comprising one or more linker sequences directly or indirectly bound to the targeting moiety and the reverse transcriptase.
[0284] Aspect 15B is the polypeptide construct of any one of aspects 1B-14B, wherein the one or more linker sequences are at least, equal to, or at most, 2-100 amino acids in length.
[0285] Aspect 16B is the polypeptide construct of aspect 15B, wherein the one or more linker sequences are 2-100 amino acids in length, 2-10 amino acids in length, 11-20 amino acids in length, 21-30 amino acids in length, 31-40 amino acids in length, 41-50 amino acids in length, 51-60 amino acids in length, 61-70 amino acids in length, 71-80 amino acids in length, 81-90 amino acids in length, or 91-100 amino acids in length.
[0286] Aspect 17B is the polypeptide construct of aspect 16B, wherein the linker comprises an amino acid sequence as set forth in SEQ ID NO: 28, or a sequence at least 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identical thereto.
[0287] Aspect 18B is the polypeptide construct of any one of aspects 1B-17B, further comprising a fluorophore.
[0288] Aspect 19B is the polypeptide construct of aspect 18B, wherein the fluorophore comprises, consists essentially of, or consists of Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, texas red, Alexa Fluor dyes (e.g. Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g. Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof.
[0289] Aspect 20B is the polypeptide construct of any one of aspects 1B-19B, further comprising a purification and/or a solubilization tag.
[0290] Aspect 21B is the polypeptide construct of aspect 20B, wherein the purification and/or a solubilization tag comprises, consists essentially of, or consists of a maltose binding protein (MBP) tag, a GST-tag, a FLAG tag, an HA tag, a His-tag, a SUMO-tag, a Trx-tag, a Halo-tag, or any combination thereof.
[0291] Aspect 22B is the polypeptide construct of aspect 21B, wherein the purification and/or a solubilization tag comprises an amino acid sequence as set forth in SEQ ID NO: 30, or a sequence at least 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identical thereto.
[0292] Aspect 23B is the polypeptide construct of any one of aspects 1B-23B, further comprising a peptide leader sequence.
[0293] Aspect 24B is a transcriptase composition comprising the polypeptide construct of any one of aspects 1B-23B, and a transcriptase mix comprising one or more adapter-RT primers, wherein the one or more adapter-RT primers each comprise an adapter primer sequence and an RT primer sequence.
[0294] Aspect 25B is the transcriptase composition of aspect 24B, wherein at least one of the one or more RT primer sequences is a random RT primer.
[0295] Aspect 26B is the transcriptase composition of aspect 25B, wherein the random RT primer comprises at least 7 nucleotides.
[0296] Aspect 27B is the transcriptase composition of aspect 25B, wherein the random RT primer is at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more, nucleotides in length.
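By way of non-limiting illustration, the relationship between random RT primer length and the number of distinct primer sequences (4 raised to the primer length) can be sketched as follows, together with the assembly of an adapter sequence and a random portion into one adapter-RT primer. The adapter string shown is a made-up placeholder and is not SEQ ID NO: 25; the function names are likewise assumptions for illustration.

```python
import random

BASES = "ACGT"

# ASSUMPTION: made-up placeholder adapter, NOT the sequence of SEQ ID NO: 25.
PLACEHOLDER_ADAPTER = "GATTACA"


def primer_complexity(length):
    """Number of distinct sequences a fully random primer of this length can take."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


def degenerate_primer(adapter, length):
    """Adapter followed by a degenerate random portion written with IUPAC 'N'."""
    return adapter + "N" * length


def sample_primer(adapter, length, rng):
    """One concrete primer drawn uniformly from the degenerate pool."""
    return adapter + "".join(rng.choice(BASES) for _ in range(length))


if __name__ == "__main__":
    for n in (6, 8, 10):  # hexamer, octamer, decamer
        print(n, primer_complexity(n), degenerate_primer(PLACEHOLDER_ADAPTER, n))
    print(sample_primer(PLACEHOLDER_ADAPTER, 8, random.Random(0)))
```

The count grows by a factor of 16 for every two added nucleotides, which is the only quantitative point the sketch is meant to convey; it makes no claim about priming efficiency or specificity.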
[0297] Aspect 28B is the transcriptase composition of any one of aspects 24B-27B, wherein the adapter primer sequence comprises a sequencing barcode.
[0298] Aspect 29B is the transcriptase composition of any one of aspects 24B-28B, wherein the transcriptase mix further comprises non-labeled dNTPs, labeled dNTPs, or any combination thereof.
[0299] Aspect 30B is the transcriptase composition of aspect 29B, wherein the labeled dNTPs are biotinylated dNTPs.
[0300] Aspect 31B is the transcriptase composition of aspect 30B, wherein the biotinylated dNTPs comprise, consist essentially of, or consist of biotin-16-dUTP, or biotin-16-dCTP, or both.
[0301] Aspect 32B is the transcriptase composition of any one of aspects 29B-31B, wherein the labeled dNTP and the non-labeled dNTP are at a ratio of at least 0.5:1, 1:1, or 2:1.
[0302] Aspect 33B is the transcriptase composition of any one of aspects 24B-32B, wherein the RT primer sequence further comprises an azide functional group.
[0303] Aspect 34B is the transcriptase composition of any of aspects 24B-33B, wherein the adapter-RT primer comprises a nucleotide sequence as set forth in SEQ ID NO: 25, or a sequence at least 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identical thereto.
[0304] Aspect 35B is a method of determining one or more RNA interaction sites of an RNA-binding Protein (RBP) in a biological sample, comprising: a) contacting (e.g., incubating together) an RBP-targeting agent to the RBP, wherein the RBP-targeting agent specifically binds the RBP to form a primary complex; b) contacting (e.g., incubating) the primary complex with one or more secondary binding agents that specifically bind the RBP-targeting agent, to form a secondary complex; c) incubating the primary or the secondary complex with the transcriptase composition of any one of aspects 24B-34B, to obtain cDNA; and d) sequencing the cDNA to determine the one or more RNA interaction sites of the RBP.
[0305] Aspect 36B is the method of aspect 35B, wherein the biological sample is an RNA-protein complex, a cell, or a tissue section.
[0306] Aspect 37B is the method of aspect 35B, further comprising fixing the biological sample with a fixing agent.
[0307] Aspect 38B is the method of aspect 37B, wherein the fixing agent comprises, consists essentially of, or consists of formaldehyde, paraformaldehyde, and/or glutaraldehyde.
[0308] Aspect 39B is the method of aspect 38B, wherein the fixing agent is paraformaldehyde at a concentration of about 0.1% to about 5% by volume.
[0309] Aspect 40B is the method of aspect 39B, wherein the paraformaldehyde is at a concentration of at least, equal to, about, or more than 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2.0%, 2.1%, 2.2%, 2.3%, 2.4%, or 2.5% by volume.
[0310] Aspect 41B is the method of any one of aspects 37B-40B, wherein the fixing comprises incubating the biological sample and the fixing agent together for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes.
[0311] Aspect 42B is the method of any one of aspects 37B-41B, further comprising quenching of the fixing agent with a quenching agent.
[0312] Aspect 43B is the method of aspect 42B, wherein the quenching agent comprises, consists essentially of, or consists of glycine.
[0313] Aspect 44B is the method of aspect 42B or aspect 43B, wherein the quenching agent is at a concentration of greater than, equal to, at least, at most, or about 25, 50, 75, 100, 125, 150, 200, 225, or 250 mM.
[0314] Aspect 45B is the method of any one of aspects 35B-44B, wherein the biological sample comprises a cell and/or a tissue section, the method further comprising permeabilizing the cell and/or the tissue section with a permeabilizing agent.
[0315] Aspect 46B is the method of aspect 45B, wherein the permeabilizing agent comprises, consists essentially of, or consists of a detergent.
[0316] Aspect 47B is the method of aspect 46B, wherein the detergent comprises, consists essentially of, or consists of a non-ionizing detergent or an ionizing detergent.
[0317] Aspect 48B is the method of aspect 47B, wherein the detergent comprises, consists essentially of, or consists of Triton X-100.
[0318] Aspect 49B is the method of aspect 48B, wherein the Triton X-100 is at a concentration of greater than, equal to, at least, at most, or about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, or 1.5%.
[0319] Aspect 50B is the method of any one of aspects 35B-49B, further comprising incubating the primary and/or the secondary complex with an RNase enzyme.
[0320] Aspect 51B is the method of any one of aspects 35B-50B, wherein the RBP is a transcription factor, a splicing factor, RNA helicase, ribonuclease, RNA polymerase, translation initiation factor, or ribosomal protein.
[0321] Aspect 52B is the method of any one of aspects 35B-51B, wherein the RBP comprises YTHDF1, YTHDF2, YTHDC1, HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, and/or HNRNPC.
[0322] Aspect 53B is the method of any one of aspects 35B-52B, wherein the RBP comprises, consists essentially of, or consists of G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, and/or YTHDC1.
[0323] Aspect 54B is the method of any one of aspects 35B-53B, wherein the RBP-targeting agent specifically binds the RBP.
[0324] Aspect 55B is the method of aspect 35B or 54B, wherein the RBP-targeting agent comprises, consists essentially of, or consists of an antibody or functional variant thereof.
[0325] Aspect 56B is the method of aspect 55B, wherein the antibody or functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multispecific antibody, a DARPin, or a variant of each thereof.
[0326] Aspect 57B is the method of any one of aspects 35B-56B, wherein the secondary binding agent comprises, consists essentially of, or consists of an antibody or functional variant thereof. [0327] Aspect 58B is the method of aspect 57B, wherein the antibody or functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multispecific antibody, a DARPin, or a variant of each thereof.
[0328] Aspect 59B is the method of any one of aspects 35B-58B, wherein the RBP-targeting agent is labeled.
[0329] Aspect 60B is the method of aspect 59B, wherein the label comprises, consists essentially of, or consists of a radioisotope, a hapten, a fluorescent label, a fluorescent polypeptide, a phosphorescent molecule, a chemiluminescent molecule, a chromophore, a luminescent molecule, a photoaffinity molecule, a colored particle, and/or a ligand.
[0330] Aspect 61B is the method of any one of aspects 35B-60B, wherein the RBP-targeting agent is linked to a functionalized DNA barcode via an amino spacer.
[0331] Aspect 62B is the method of aspect 61B, wherein the functionalized DNA barcode comprises an alkyne (3'-O-propargyl N 2'-5' linked) functionalized DNA barcode.
[0332] Aspect 63B is the method of aspect 61B or aspect 62B, wherein the alkyne functionalized barcodes comprise a nucleic acid sequence as set forth in any one of SEQ ID NO: 31-78, or a nucleic acid sequence at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identical thereto.
[0333] Aspect 64B is the method of any one of aspects 35B-63B, wherein the secondary binding agent is labeled.
[0334] Aspect 65B is the method of aspect 64B, wherein the label comprises, consists essentially of, or consists of a radioisotope, a hapten, a fluorescent label, a fluorescent polypeptide, a phosphorescent molecule, a chemiluminescent molecule, a chromophore, a luminescent molecule, a photoaffinity molecule, a colored particle, and/or a ligand.
[0335] Aspect 66B is the method of aspect 64B or 65B, wherein the fluorescent label comprises, consists essentially of, or consists of Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, texas red, Alexa Fluor dyes (e.g. Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g. Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof.
[0336] Aspect 67B is the method of any one of aspects 35B-66B, wherein the steps (a) - (c) are conducted in-situ.
[0337] Aspect 68B is the method of any one of aspects 35B-67B, wherein the method further comprises imaging the biological sample.
[0338] Aspect 69B is the method of any one of aspects 35B-68B, wherein the biological sample comprises less than or equal to 1000, 750, 500, 100, 50, or 20 cells, or any range derivable therein, or wherein the biological sample comprises a single cell.
[0339] Aspect 70B is the method of any one of aspects 35B-69B, wherein the biological sample comprises less than 5 tissue sections.
[0340] Aspect 71B is the method of aspect 70B, wherein the biological sample comprises a single tissue section.
[0341] Aspect 72B is the method of any one of aspects 35B-71B, wherein the method does not comprise ultraviolet cross-linking.
[0342] Aspect 73B is the method of any one of aspects 35B-72B, wherein the method does not comprise immunoprecipitation.
[0343] Aspect 74B is the method of any one of aspects 35B-73B, wherein the method does not comprise use of base editing proteins.
[0344] Aspect 75B is the method of any one of aspects 35B-74B, wherein the method does not comprise dissociating the one or more tissue sections into single cells.
[0345] Aspect 76B is the method of any one of aspects 35B-75B, wherein the method detects transient and/or dynamic RNA-RBP interactions.
[0346] Aspect 77B is the method of any one of aspects 35B-76B, wherein the method detects transient and/or dynamic RNA-RBP interactions that occur on a timescale within 10 minutes.
[0347] Aspect 78B is the method of any one of aspects 35B-77B, wherein the method does not comprise oligo(dT) primer initiated reverse transcription.
[0348] Aspect 79B is the method of any one of aspects 35B-78B, wherein the method does not comprise Tn5 tagmentation.
[0349] Aspect 80B is the method of aspect 51B, wherein the RBP is a splicing factor.
[0350] Aspect 81B is the method of aspect 80B, wherein the method is used to determine splice variants between one or more biological samples.
[0351] Aspect 82B is the method of aspect 51B, wherein the RBP is a YTH family reader protein.
[0352] Aspect 83B is the method of aspect 51B, wherein the RBP is G3BP1.
[0353] Aspect 84B is the method of any one of aspects 35B-83B, wherein the method can be used to determine one or more interaction sites of the RBP with RNA in the cytoplasm, or nucleus, or both.
[0354] Aspect 85B is the method of any one of aspects 35B-84B, wherein the method is used to measure relative binding strength of the RBP to the RNA in comparison to one or more other RBPs to the RNA.
[0355] Aspect 86B is the method of any one of aspects 35B-85B, wherein the cDNA is labeled.
[0356] Aspect 87B is the method of aspect 86B, wherein the cDNA comprises one or more labeled nucleotides.
[0357] Aspect 88B is the method of aspect 87B, wherein the nucleotides are labeled with a fluorescent label.
[0358] Aspect 89B is the method of aspect 87B or 88B, wherein the labeled nucleotides comprise, consist essentially of, or consist of a biotinylated nucleotide.
[0359] Aspect 90B is the method of aspect 89B, wherein the method further comprises purifying the cDNA with a streptavidin-comprising agent.
[0360] Aspect 91B is the method of aspect 90B, wherein the streptavidin-comprising agent comprises, consists essentially of, or consists of a bead, a plate, a magnetic bead, an agarose bead, a microtiter plate, a nanoparticle, and/or a membrane.
[0361] Aspect 92B is the method of any one of aspects 35B-91B, wherein two or more unique RBP targeting agents that interact with one or more RBPs are used in step (a).
[0362] Aspect 93B is the method of aspect 92B, wherein each of the two or more unique RBP targeting agents comprise a unique functionalized DNA barcode linked via an amino spacer.
[0363] Aspect 94B is the method of aspect 93B, wherein the functionalized DNA barcode comprises, consists essentially of, or consists of an alkyne (3'-O-propargyl N 2'-5' linked) functionalized DNA barcode.
[0364] Aspect 95B is the method of aspect 93B or aspect 94B, wherein the alkyne functionalized barcodes comprise a nucleic acid sequence as set forth in any one of SEQ ID NO: 31-78, or a sequence at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identical thereto.
[0365] Aspect 96B is a method of in-situ imaging of one or more RNA interaction sites of an RNA-binding Protein (RBP) in a biological sample bound to a solid surface, comprising: a) contacting (e.g., incubating together) an RBP-targeting agent to the RBP, wherein the RBP-targeting agent specifically binds the RBP to form a primary complex; b) contacting (e.g., incubating together) the first complex with one or more secondary binding agents that specifically binds the RBP-targeting agent, to form a secondary complex; c) incubating the primary or the secondary complex with a transcriptase composition of any one of aspects 24-34, to obtain cDNA; and d) imaging the solid surface.
[0366] Aspect 97B is the method of aspect 96B, wherein the RBP-targeting agent or the one or more secondary binding agents or any combination thereof are labeled.
[0367] Aspect 98B is the method of aspect 97B, wherein the label comprises, consists essentially of, or consists of radioisotopes, a hapten, a fluorescent label, a fluorescent polypeptide, a phosphorescent molecule, a chemiluminescent molecule, a chromophore, a luminescent molecule, a photoaffinity molecule, a colored particle and/or a ligand.
[0368] Aspect 99B is the method of aspect 98B, wherein the fluorescent label comprises, consists essentially of, or consists of Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, texas red, Alexa Fluor dyes (e.g. Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g. Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof.
[0369] Aspect 100B is the method of any one of aspects 96B-99B, wherein the cDNA is labeled.
[0370] Aspect 101B is the method of aspect 100B, wherein the cDNA comprises one or more labeled nucleotides.
[0371] Aspect 102B is the method of aspect 101B, wherein the nucleotides are labeled with a fluorescent label.
[0372] Aspect 103B is the method of aspect 101B, wherein the labeled nucleotides are biotinylated.
[0373] Aspect 104B is the method of any one of aspects 96B-103B, wherein the imaging is done using fluorescence microscopy.
[0374] Aspect 105B is the method of any one of aspects 96B-104B, wherein the biological sample is an RNA-protein complex, a cell, or a tissue section.
[0375] Aspect 106B is the method of any one of aspects 96B-105B, further comprising fixing the biological sample with a fixing agent.
[0376] Aspect 107B is the method of aspect 106B, wherein the fixing agent comprises, consists essentially of, or consists of formaldehyde, paraformaldehyde, and/or glutaraldehyde.
[0377] Aspect 108B is the method of aspect 107B, wherein the fixing agent is paraformaldehyde at a concentration of about 0.5% to about 5% by volume.
[0378] Aspect 109B is the method of aspect 108B, wherein the paraformaldehyde is at a concentration of about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2.0%, 2.1%, 2.2%, 2.3%, 2.4%, or 2.5% by volume.
[0379] Aspect 110B is the method of any one of aspects 106B-109B, wherein the fixing comprises incubating the biological sample and the fixing agent for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes.
[0380] Aspect 111B is the method of any one of aspects 106B-110B, further comprising quenching of the fixing agent with a quenching agent.
[0381] Aspect 112B is the method of aspect 111B, wherein the quenching agent comprises, consists essentially of, or consists of glycine.
[0382] Aspect 113B is the method of aspect 112B, wherein the quenching agent is at a concentration of greater than, equal to, at least, at most, or about 25, 50, 75, 100, 125, 150, 200, 225, or 250 mM.
[0383] Aspect 114B is the method of any one of aspects 96B-113B, further comprising permeabilizing the cell and/or the tissue section with a permeabilizing agent.
[0384] Aspect 115B is the method of aspect 114B, wherein the permeabilizing agent comprises, consists essentially of, or consists of a detergent.
[0385] Aspect 116B is the method of aspect 115B, wherein the detergent is a non-ionizing detergent or an ionizing detergent.
[0386] Aspect 117B is the method of aspect 116B, wherein the detergent comprises, consists essentially of, or consists of Triton X-100.
[0387] Aspect 118B is the method of aspect 117B, wherein the Triton X-100 is at a concentration of greater than, equal to, at least, at most, or about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, or 1.5%.
[0388] Aspect 119B is the method of any one of aspects 96B-118B, wherein the transcriptase mix further comprises an RNase.
[0389] Aspect 120B is the method of any one of aspects 96B-119B, wherein the RNA binding protein is a transcription factor, a splicing factor, RNA helicase, ribonuclease, RNA polymerase, translation initiation factor, or ribosomal protein.
[0390] Aspect 121B is the method of aspect 96B or aspect 120B, wherein the RBP comprises YTHDF1, YTHDF2, YTHDC1, HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNP A2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, and/or HNRNPC.
[0391] Aspect 122B is the method of aspect 121B, wherein the RBP comprises, consists essentially of, or consists of G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, and/or YTHDC1.
[0392] Aspect 123B is the method of any one of aspects 96B-122B, wherein the RBP-targeting agent specifically binds the RBP.
[0393] Aspect 124B is the method of aspect 123B, wherein the RBP-targeting agent is an antibody or a functional variant thereof.
[0394] Aspect 125B is the method of aspect 124B, wherein the antibody or the functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multi-specific antibody, a DARPin, or a variant of each thereof.
[0395] Aspect 126B is the method of any one of aspects 96B-125B, wherein the one or more secondary binding agent is an antibody, or a functional variant thereof.
[0396] Aspect 127B is the method of aspect 126B, wherein the antibody, or the functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multi-specific antibody, a DARPin, or a variant of each thereof.
[0397] Aspect 128B is the method of any one of aspects 96B-127B, wherein the solid surface comprises, consists essentially of, or consists of a slide, a multi-well plate, a capillary, or the like.
[0398] Aspect 129B is the method of any one of aspects 96B-128B, further comprising sequencing the cDNA.
[0399] Aspect 130B is the method of aspect 129B, wherein the sequencing is performed using Next Generation Sequencing (NGS) techniques.
[0400] Aspect 131B is the method of aspect 130B, wherein the sequencing is done using a single cell genomic imaging technique.
[0401] Aspect 132B is the method of aspect 131B, wherein the single cell genomic imaging technique comprises, consists essentially of, or consists of spatial transcriptomics, MERFISH, SeqFISH, STARmap, Slide-Seq, Visium Spatial Gene Expression, or deterministic barcoding in tissue for spatial omics sequencing (DBiT-seq).
[0402] Aspect 133B is the method of aspect 132B, wherein the single cell genomic imaging technique is a microfluidic based technique comprising: ligating a first set and a second set of spatial barcodes to the cDNA of step (c), prior to step (d), wherein the first set of spatial barcodes are contacted to the cDNA horizontally using a first multi-channel microfluidic chip, and wherein the second set of spatial barcodes are contacted to the solid surface vertically using a second multi-channel microfluidic chip.
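As a non-limiting illustration of the crossed-channel barcoding recited in Aspect 133B, the following Python sketch shows how a pair of spatial barcodes, one from each set, could index a position on the solid surface. The function names, the barcode labels, and the assumption that the first set indexes rows and the second set indexes columns are hypothetical and are not taken from the disclosure.

```python
from itertools import product


def build_spatial_grid(first_set, second_set):
    """Map each (first barcode, second barcode) pair to a (row, column) position.

    Assumption: the first set is delivered through horizontal channels (rows)
    and the second set through vertical channels (columns), so every channel
    intersection receives exactly one barcode from each set.
    """
    return {
        (a, b): (row, col)
        for (row, a), (col, b) in product(enumerate(first_set), enumerate(second_set))
    }


def locate(barcode_pair, grid):
    """Return the (row, column) position for a barcode pair, or None if unknown."""
    return grid.get(barcode_pair)


# Hypothetical barcode labels, for illustration only.
grid = build_spatial_grid(["A1", "A2"], ["B1", "B2", "B3"])
position = locate(("A2", "B3"), grid)
```

In this sketch, two channels in the first chip and three in the second yield six addressable positions; a cDNA carrying an unrecognized pair is simply not assigned a position.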
[0403] Aspect 134B is the method of any one of aspects 131B or 133B, wherein the first set of spatial barcodes and second set of spatial barcodes form a 2D spatial barcode array.
[0404] Aspect 135B is a kit comprising in one or more suitable container(s), an RBP-targeting agent that specifically binds to an RBP, optionally one or more secondary binding agents, and the transcriptase composition of any one of aspects 24B-33B.
[0405] Aspect 136B is the kit of aspect 134B, wherein the RBP is a transcription factor, a splicing factor, RNA helicase, ribonuclease, RNA polymerase, translation initiation factor, or ribosomal protein.
[0406] Aspect 137B is the kit of any one of aspects 135B-136B, wherein the RBP comprises HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNP A2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, and/or YTHDC1.
[0407] Aspect 138B is the kit of aspect 136B, wherein the RBP comprises, consists essentially of, or consists of G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, and/or YTHDC1.
[0408] Aspect 139B is the kit of any one of aspects 135B-138B, wherein the RBP-targeting agent specifically binds the transcription factor, the splicing factor, the RNA helicase, the ribonuclease, the RNA polymerase, the translation initiation factor, or the ribosomal protein.
[0409] Aspect 140B is the kit of aspect 139B, wherein the RBP-targeting agent comprises, consists essentially of, or consists of an antibody or functional variant thereof.
[0410] Aspect 141B is the kit of aspect 140B, wherein the antibody or the functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multispecific antibody, a DARPin, or a variant of each thereof.
[0411] Aspect 142B is the kit of any one of aspects 135B-141B, wherein the secondary binding agent comprises, consists essentially of, or consists of an antibody or a functional variant thereof.
[0412] Aspect 143B is the kit of aspect 142B, wherein the antibody or the functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multispecific antibody, a DARPin, or a variant of each thereof.
[0413] Aspect 144B is the kit of any one of aspects 135B-143B, wherein the RNA binding protein is a transcription factor, a splicing factor, RNA helicase, ribonuclease, RNA polymerase, translation initiation factor, or ribosomal protein.
[0414] Aspect 145B is the kit of aspect 144B, wherein the RBP comprises HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNP A2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, and/or YTHDC1.
[0415] Aspect 146B is the kit of aspect 145B, wherein the RBP comprises, consists essentially of, or consists of G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, and/or YTHDC1.
[0416] Aspect 147B is the kit of any one of aspects 133B-146B, wherein the RBP-targeting agent specifically binds the RBP.
[0417] Aspect 148B is the kit of aspect 147B, wherein the RBP-targeting agent comprises, consists essentially of, or consists of an antibody or a functional variant thereof.
[0418] Aspect 149B is the kit of aspect 148B, wherein the antibody or the functional variant thereof comprises monoclonal antibodies, polyclonal antibodies, recombinant antibody, IgG, Fv, single chain antibody, single domain antibodies, nanobodies, diabodies, bispecific and/or multispecific antibodies, scFv, Fab, F(ab')2, Fab, or variants thereof.
[0419] Aspect 150B is the kit of aspect 133B, wherein the secondary binding agent is an antibody or the functional variant thereof.
[0420] Aspect 151B is the kit of aspect 150B, wherein the antibody or the functional variant thereof comprises monoclonal antibodies, polyclonal antibodies, recombinant antibody, IgG, Fv, single chain antibody, single domain antibodies, nanobodies, diabodies, bispecific and/or multispecific antibodies, scFv, Fab, F(ab')2, Fab, or variants thereof.
[0421] Aspect 152B is the kit of aspect 135B, or any one of aspects 147B-151B, wherein the RBP-targeting agent is labeled.
[0422] Aspect 153B is the kit of aspect 152B, wherein the label comprises, consists essentially of, or consists of a radioisotope, a hapten, a fluorescent label, a fluorescent polypeptide, a phosphorescent molecule, a chemiluminescent molecule, a chromophore, a luminescent molecule, a photoaffinity molecule, a colored particle and/or a ligand.
[0423] Aspect 154B is the kit of aspect 135B-151B, wherein the secondary binding agent is labeled.
[0424] Aspect 155B is the kit of aspect 154B, wherein the label comprises, consists essentially of, or consists of radioisotopes, a hapten, a fluorescent label, a fluorescent polypeptide, a phosphorescent molecule, a chemiluminescent molecule, a chromophore, a luminescent molecule, a photoaffinity molecule, a colored particle and/or a ligand.
[0425] Aspect 156B is the kit of aspect 153B or 155B, wherein the fluorescent label is Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, texas red, Alexa Fluor dyes (e.g. Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g. Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof.
[0426] Aspect 157B is a method of identifying one or more RNA interaction sites of an RNA-binding Protein (RBP) in a biological sample, comprising: (a) fixing the biological sample; (b) contacting (e.g., incubating together) the biological sample with an agent that permeabilizes cell membranes; (c) providing an RBP-targeting agent to the sample, wherein the RBP-targeting agent interacts with the RBP of interest; (d) providing a transcriptase composition comprising a polypeptide construct comprising a targeting moiety and a reverse transcriptase enzyme; wherein the targeting moiety interacts with the RBP-targeting agent; (e) incubating the sample with the transcriptase composition to produce cDNA; and (f) sequencing the cDNA.
[0427] Aspect 158B is the method of aspect 157B, wherein the targeting moiety comprises, consists essentially of, or consists of a Fc binding protein or a variant thereof, an antibody or variant thereof, an oligonucleotide or variant thereof, a receptor, a ligand, a small molecule, or any combination thereof.
[0428] Aspect 159B is the method of aspect 158B, wherein the targeting moiety comprises, consists essentially of, or consists of a Fc binding protein or a variant thereof.
[0429] Aspect 160B is the method of aspect 158B, wherein the targeting moiety comprises, consists essentially of, or consists of an antibody or variant thereof.
[0430] Aspect 161B is the method of aspect 158B, wherein the targeting moiety comprises, consists essentially of, or consists of an oligonucleotide or a variant thereof.
[0431] Aspect 162B is a method of determining one or more RNA interaction sites of a first RNA-binding Protein (RBP) in a biological sample, comprising: a) contacting (e.g., incubating together) a first RBP-targeting agent comprising a functionalized first DNA barcode, to the first RBP, wherein the first RBP-targeting agent specifically binds the first RBP to form a first primary complex; b) contacting (e.g., incubating together) the first primary complex with one or more secondary binding agents that specifically binds the first RBP-targeting agent, to form a secondary complex; c) incubating the first primary or the secondary complex with the transcriptase composition of any one of aspects 24-34, to obtain a first barcoded cDNA library; d) amplifying and sequencing the first barcoded cDNA library; e) obtaining one or more interaction site of the first RBP by deconvoluting the sequenced cDNA library based on the first DNA barcode.
[0432] Aspect 163B is the method of aspect 162B, wherein the transcriptase composition comprises an RT primer sequence comprising a functional group and biotinylated dNTPs.
[0433] Aspect 164B is the method of aspect 163B, wherein the functional group is an azide functional group.
[0434] Aspect 165B is the method of aspect 163B or aspect 164B, wherein the biotinylated dNTPs, and the RT primer sequence comprising the azide functional group, are incorporated into the cDNA to form proximal azide labeled biotinylated cDNAs during reverse transcription in step c.
[0435] Aspect 166B is the method of any one of aspects 162B-165B, wherein the functionalized DNA barcode comprises, consists essentially of, or consists of an alkyne (3'-O-propargyl N 2'-5' linked) functionalized DNA barcode.
[0436] Aspect 167B is the method of any one of aspects 162B-166B, wherein the alkyne functionalized barcodes comprise a nucleic acid sequence as set forth in any one of SEQ ID NO: 31-78, or a sequence at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identical thereto.
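By way of a hedged illustration only, a percent-identity threshold such as the "at least about 80%" recited for the barcode sequences could be checked as sketched below in Python. The sketch assumes a simple ungapped, position-by-position comparison of equal-length sequences; the disclosure does not specify this or any particular alignment method, and the function names and example sequences are hypothetical and are not the sequences of SEQ ID NO: 31-78.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity between two equal-length nucleic acid sequences.

    Assumption: identity is counted position by position without alignment gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(candidate: str, reference: str, threshold: float = 80.0) -> bool:
    """Return True if the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical 10-nt sequences differing at the last two positions.
reference = "ACGTACGTAC"
candidate = "ACGTACGTTT"
identity = percent_identity(candidate, reference)  # 8 of 10 positions match
```

A gapped alignment would generally be needed for sequences of unequal length; that case is deliberately left out of this sketch.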
[0437] Aspect 168B is the method of any one of aspects 162B-167B, further comprising incorporating the alkyne functionalized first DNA barcode into the cDNA by reacting the alkyne functionalized first DNA barcode with the proximal azide labeled biotinylated cDNA of aspect 165, using in-situ copper catalyzed azide-alkyne cycloaddition (CuAAC), to obtain a first barcoded biotinylated cDNA library.
[0438] Aspect 169B is the method of aspect 168B, wherein the method further comprises purifying the barcoded biotinylated cDNA library over a streptavidin column prior to step (d).
[0439] Aspect 170B is the method of aspect 169B, further comprising processing the CuAAC using a Klenow Fragment DNA polymerase for second strand synthesis.
[0440] Aspect 171B is the method of aspect 170B, wherein the one or more interaction sites of the first RBP are obtained by deconvoluting the sequenced data based on the first DNA barcode incorporated into the cDNA.
[0441] Aspect 172B is the method of any one of aspects 162B-171B, further comprising determining the one or more RNA-interaction sites of a second RNA-binding Protein (RBP) in a biological sample, comprising: a) contacting (e.g., incubating together) a second RBP-targeting agent comprising an alkyne functionalized second DNA barcode, to the second RBP, wherein the RBP-targeting agent specifically binds the second RBP to form a second primary complex; b) contacting (e.g., incubating together) the second primary complex with one or more secondary binding agents that specifically binds the first RBP-targeting agent, to form a second secondary complex; c) incubating the second primary or the second secondary complex with the transcriptase composition of any one of aspects 24B-34B, to obtain a second barcoded cDNA library; d) amplifying and sequencing the second barcoded cDNA library; and e) obtaining one or more interaction site of the second RBP by deconvoluting the sequenced cDNA library based on the second DNA barcode.
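The barcode-based deconvolution of Aspects 162B and 172B can be illustrated, without limitation, by the following Python sketch, which assigns sequenced reads to RBPs according to the DNA barcode each read carries. The sketch assumes that the barcode occupies a fixed-length prefix of each read and that only exact matches are accepted; the position of the barcode in an actual read, the barcode sequences, and the names used here are hypothetical.

```python
from collections import defaultdict


def deconvolute(reads, barcode_to_rbp):
    """Group reads by RBP using an exact match on a fixed-length barcode prefix.

    Assumption: every barcode has the same length and sits at the start of the
    read. Reads whose prefix matches no known barcode are discarded.
    """
    lengths = {len(barcode) for barcode in barcode_to_rbp}
    if len(lengths) != 1:
        raise ValueError("this sketch requires barcodes of one common length")
    k = lengths.pop()

    reads_by_rbp = defaultdict(list)
    for read in reads:
        rbp = barcode_to_rbp.get(read[:k])
        if rbp is not None:
            # Keep only the cDNA-derived portion after the barcode.
            reads_by_rbp[rbp].append(read[k:])
    return dict(reads_by_rbp)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "first_RBP", "TTAG": "second_RBP"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCC", "GGGGAAAA"]
assigned = deconvolute(reads, barcodes)
```

Each resulting group of reads could then be mapped separately to call interaction sites for the corresponding RBP; tolerance for sequencing errors in the barcode is omitted here for brevity.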
[0442] Aspect 173B is the method of any one of aspects 162B-172B, comprising determining the one or more RNA interaction sites for greater than, equal to, at least, at most 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500 RBPs.
[0443] Aspect 174B is a method of determining spatial distribution of an RNA modification site on a biological sample bound to a solid surface, comprising: a) contacting (e.g., incubating together) a modification-targeting agent that specifically binds the modification site on the RNA to form a primary complex; b) contacting (e.g., incubating together) the primary complex with a secondary binding agent that specifically binds the primary complex to form a secondary complex; c) incubating the primary complex or the secondary complex with the transcriptase composition of any one of aspects 24B-34B to obtain cDNA; d) optionally incorporating labelled barcodes into the cDNA; and e) sequencing and imaging the biological sample using a single cell genomic imaging technique to determine the one or more modification sites.
[0444] Aspect 175B is the method of aspect 174B, wherein the modification-targeting agent is an oligonucleotide, or a variant thereof, or a small molecule.
[0445] Aspect 176B is the method of aspect 175B, wherein the oligonucleotide comprises, consists essentially of, or consists of fluorescent NTPs, or a fluorescent probe.
[0446] Aspect 177B is the method of aspect 174B, wherein the modification-targeting agent is an antibody or a functional variant thereof.
[0447] Aspect 178B is the method of aspect 177B, wherein the antibody or the functional variant thereof comprises monoclonal antibodies, polyclonal antibodies, recombinant antibody, IgG, Fv, single chain antibody, single domain antibodies, nanobodies, diabodies, multispecific antibodies (e.g., bispecific antibodies), scFv, Fab, F(ab')2, Fab, or variants thereof.
[0448] Aspect 179B is the method of any one of aspects 176B-178B, wherein the modification targeting agent specifically binds to a modification comprising, consisting essentially of, or consisting of m6C, m5C, m1A, m7G, or a pseudouridine modification.
[0449] Aspect 180B is the method of aspect 174B, wherein the sequencing and imaging is done using a single cell genomic imaging technique.
[0450] Aspect 181B is the method of aspect 180B, wherein the single cell genomic imaging technique comprises, consists essentially of, or consists of spatial transcriptomics, MERFISH, SeqFISH, STARmap, Slide-Seq, Visium Spatial Gene Expression, or deterministic barcoding in tissue for spatial omics sequencing (DBiT-seq).
[0451] Aspect 182B is the method of aspect 181B, wherein the single cell genomic imaging technique comprises, consists essentially of, or consists of deterministic barcoding in tissue for spatial omics sequencing (DBiT-seq) comprising: ligating a first set and a second set of spatial barcodes to the cDNA of step (c), prior to step (d), wherein the first set of spatial barcodes are contacted to the cDNA horizontally using a first multi-channel microfluidic chip, and the second set of spatial barcodes are contacted to the solid surface vertically using a second multichannel microfluidic chip.
[0452] Aspect 183B is the method of any one of aspects 181B or 182B, wherein the first set of spatial barcodes and the second set of spatial barcodes form a 2D spatial barcode array.
[0453] Other objects, features and advantages of the present inventions will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating specific aspects of the inventions described herein, are given by way of illustration only, since various changes and modifications within the spirit and scope of the inventions will become apparent to those skilled in the art from this detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0454] The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present inventions. The inventions can be better understood by reference to one or more of these drawings in combination with the detailed description of specific aspects presented herein.
[0455] FIGs. 1A-1D describe the ARTR-seq strategy and validation. FIG. 1A is a scheme of ARTR-seq. Cellular structure is preserved by formaldehyde fixation (I); the reverse transcriptase (RTase) is then attached to the RBP of interest by specific antibodies and a protein A/G fusion (II) for in-situ reverse transcription (RT) at RBP binding sites (III), with imaging as an optional built-in step. The resulting biotinylated cDNA is enriched for sequencing (IV). FIG. 1B is a graph of qRT-PCR analysis showing the reverse transcription activity of tested purified pAG-RTase fusion proteins. Two commercial RTases, SuperScript II and SuperScript III, were loaded as positive controls; n = 3 replicates. FIG. 1C is a biotin dot blot assay showing biotinylated cDNA products produced from ARTR-seq, with methylene blue staining as the loading control. FIG. 1D depicts immunofluorescence (IF) imaging of the secondary antibody (2nd Ab, yellow; left panels), pAG-RTase (red; second panels), newly synthesized cDNA (green; third panels), and nucleus (blue; fourth panels) for PTBP1 ARTR-seq, with 10 μm scale bars. The line graph analysis exhibits relative fluorescence intensity along the line.
[0456] FIGs. 2A-2I describe how ARTR-seq captured binding sites of RBPs using as few as 20 cells. FIG. 2A depicts ARTR-seq replicate correlations for usable reads per gene, normalized to coverage (RPM), for PTBP1 in HepG2 (top) and HeLa (bottom) cells, respectively. The color scale shows the point distribution density. The coefficient R and P-values were given by Pearson’s correlation. FIG. 2B depicts peak distribution in 3' UTR, CDS, 5' UTR, non-coding exon, intergenic region and intron, and the corresponding motifs of PTBP1 binding peaks identified by ARTR-seq in the HepG2 (top) and HeLa (bottom) cells, respectively. FIG. 2C depicts snapshots from Integrative Genomics Viewer (IGV) showing the signal overlaps between ARTR-seq and eCLIP (top) or iCLIP (bottom). The ARTR-seq input is pooled from 3 replicates. FIG. 2D depicts ARTR-seq read density at PTBP1 binding peaks of control (siCtrl) and PTBP1 knockdown (siPTBP1) HepG2 cells revealed by ARTR-seq. FIG. 2E depicts uniquely mapped reads subsampled from PTBP1 ARTR-seq with different numbers of cells. The percentage of usable reads was calculated after PCR deduplication. The plot shows replicate 1 for simplicity. FIG. 2F depicts signal profiles and heatmaps of read density in ARTR-seq libraries constructed from 20 to 40,000 (40 k) HepG2 cells at ARTR-seq-identified PTBP1 peaks. FIG. 2G depicts a snapshot from IGV showing the stable ARTR-seq signal in sequencing libraries constructed from different numbers of HepG2 cells. FIG. 2H is a box plot comparing the CT percentages of usable reads from libraries constructed by using ARTR-seq, CLIP, iCLIP, eCLIP, irCLIP and LACE-seq, respectively. The green dashed line represents the median percentage in the ARTR-seq input library. Boxes represent the 25th-75th percentile with lines at the median and whiskers at 1.5 × IQR. FIG. 2I depicts signal profiles of ARTR-seq read density at CU-enriched regions. CU-enriched regions are defined as 80-nt-wide regions with a percentage of CT content greater than 70% located in the gene regions.
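By way of illustration only, the CU-enriched region definition used for FIG. 2I (an 80-nt-wide region with CT content greater than 70%) can be sketched as follows. This is a minimal sketch and not the analysis code used to generate the figure; the function names, the 1-nt sliding step, and the treatment of U as equivalent to T are assumptions introduced for the example.

```python
def ct_fraction(seq: str) -> float:
    """Fraction of bases in seq that are C or T (U is counted as T)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "CTU" for base in seq) / len(seq)


def cu_enriched_windows(seq: str, width: int = 80,
                        threshold: float = 0.70, step: int = 1) -> list:
    """Return 0-based start positions of windows whose CT content exceeds
    the threshold. The width and threshold follow the FIG. 2I definition;
    the sliding step is an assumption, since the source does not state
    how windows are tiled."""
    hits = []
    for start in range(0, len(seq) - width + 1, step):
        if ct_fraction(seq[start:start + width]) > threshold:
            hits.append(start)
    return hits
```

For example, a call such as cu_enriched_windows(gene_sequence) returns the start coordinates of candidate regions, which could then be restricted to annotated gene regions.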
[0457] FIGs. 3A-3E show ARTR-seq maps of RBP binding sites in tissues. FIG. 3A is an ARTR-seq scheme for tissue samples. A section of tissue is fixed on the slide for ARTR-seq. The RTase is attached to the RBP of interest by specific antibodies and a protein A/G fusion, followed by in-situ RT, with an optional built-in imaging step. The cDNA product is then collected for library preparation. FIG. 3B depicts IF imaging showing the localization of pAG-RTase (red; first panel), 2nd Ab (yellow; second panel) and nucleus (blue; third panel) in the mouse embryo section (E11), individually or when merged (right panel), with 20 μm scale bars. FIG. 3C depicts peak distribution (top) in the 3' UTR, CDS, 5' UTR, non-coding exon, intergenic region and intron, and motifs (bottom) of RBFOX2 binding peaks identified by ARTR-seq in the mouse embryonic tissue. FIG. 3D is a bar plot showing the percentage of usable reads containing the RBFOX2 canonical ‘UGCAUG’ motif for mouse embryos and HepG2 cells. FIG. 3E depicts snapshots from IGV showing overlap of RBFOX2 ARTR-seq signal in mouse embryos with ‘UGCAUG’-containing sequences. The positions of the ‘UGCAUG’ motifs are indicated with arrows.
[0458] FIGs. 4A-4E describe RNA binding by splicing factors identified using ARTR-seq. FIGs. 4A-4B show peak distribution (right) in 3' UTR, CDS, 5' UTR, non-coding exon, intergenic region and intron, and the corresponding motifs (left) of RBFOX2 (4A) and HNRNPC (4B) peaks, respectively, detected by ARTR-seq in HepG2 cells. FIG. 4C depicts box plots showing the splicing differences of five alternative splicing (AS) modes upon the knockdown of PTBP1 (left), RBFOX2 (middle) and HNRNPC (right). The splicing modes include skipped exon (SE), mutually exclusive exon (MXE), alternative 5' splice site (A5SS), alternative 3' splice site (A3SS), and retained intron (RI). The size of circles on the top or bottom of each bar indicates event numbers. FIG. 4D depicts normalized RBP maps for skipped exons that were excluded (red) or included (blue) upon corresponding splicing factor depletion. Lines depict average ARTR-seq peak density. FIG. 4E shows cumulative curves and boxplots (inside) showing the absolute value of splicing differences upon PTBP1 knockdown. PTBP1-regulated genes were divided into three groups according to their enrichment in ARTR-seq, including no enrichment (No, 0 < enrichment < 1), low enrichment (Low, 1 < enrichment < 2) and high enrichment (High, 2 < enrichment). Statistical significance was determined by the Student’s t-test of indicated group versus ‘no enrichment’ group; *P < 0.05, **P < 10⁻⁵. Boxes in (4C) and (4E) represent the 25th-75th percentile with lines at the median and whiskers at 1.5 × IQR.
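As a non-limiting illustration, the three enrichment groups described for FIG. 4E can be assigned with a simple threshold function. This is a hedged sketch: the function name is hypothetical, and the handling of values falling exactly on a boundary (1 or 2) is an assumption, because the text as reproduced here gives only strict inequalities.

```python
def enrichment_group(enrichment: float) -> str:
    """Assign an ARTR-seq enrichment value to the No/Low/High groups.

    Thresholds (1 and 2) come from the FIG. 4E description. Assumption:
    boundary values are placed in the higher group (half-open intervals).
    """
    if enrichment < 0:
        raise ValueError("enrichment is expected to be non-negative")
    if enrichment < 1:
        return "No"
    if enrichment < 2:
        return "Low"
    return "High"
```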
[0459] FIGs. 5A-5E show ARTR-seq mapped binding features of the selected m6A binding proteins. FIG. 5A depicts peak distribution in the 3' UTR, CDS, 5' UTR, non-coding exon, intergenic region and intron of YTHDF1, YTHDF2 and YTHDC1 identified by ARTR-seq for HeLa cells. FIG. 5B is a pie chart showing the detailed genomic feature distribution of YTHDC1 intronic and intergenic binding peaks. FIG. 5C depicts aggregation profiles showing the meta distributions of binding peaks for YTHDF1, YTHDF2, and YTHDC1 along mRNA transcripts. FIG. 5D is a bar plot showing the percentage of exonic peaks overlapping with m6A sites in polyadenylated RNA detected by m6A-SAC-seq for the m6A reader proteins. The random peaks are random exonic regions with the same lengths as peaks of the reader proteins. Three replicates of published YTHDF2 PAR-CLIP data were used as the positive control. FIG. 5E depicts cumulative curves and boxplots (inside) exhibiting the peak enrichment (log2 value) of ARTR-seq targets for YTHDF1 (left) and YTHDF2 (right). Peaks of m6A reader proteins were divided into four groups according to the m6A fraction (sum value) quantified by m6A-SAC-seq. The peaks without m6A are categorized in one group (No), and other peaks were divided into three groups with an equal number of peaks, including low m6A fraction (Low), medium m6A fraction (Medium) and high m6A fraction (High). Statistical significance was determined by the Student’s t-test of indicated group versus ‘no m6A’ group; *P < 0.05, **P < 10⁻⁵. Boxes represent the 25th-75th percentile with lines at the median and whiskers at 1.5 × IQR.
[0460] FIGs. 6A-6I depict dynamic RNA binding of G3BP1 during the assembly of stress granules. FIG. 6A depicts IF imaging showing the localization of G3BP1 in HeLa cells without treatment (T0) and with the treatment of 0.5 mM NaAsO2 for 10 min (T10), 20 min (T20), and 60 min (T60), respectively, with 5 μm scale bars. Stress granules (SG) could be observed at T20 and T60. FIG. 6B depicts IF imaging (top) showing that G3BP1 (yellow) was colocalized with biotinylated cDNA (green) generated from ARTR-seq, with 5 μm scale bars. The line graph analysis (bottom) shows the relative fluorescence intensity along the line. FIG. 6C is a Venn diagram showing the overlap between the G3BP1 RNA targets at T0 and T60. FIG. 6D is a box plot exhibiting SG enrichment of RNA targets from three groups defined in FIG. 6C, including the T0 only (T0_only) fraction, the T0 and T60 overlapped (OL) fraction, and the T60 only (T60_only) fraction. SG enrichment values were reported in SG RNA sequencing. P-values were determined by two-tailed Wilcoxon test. FIG. 6E is a KEGG enrichment analysis showing that RNA targets from the three groups are enriched in distinct pathways. FIGs. 6F-6G depict box plots of G3BP1 binding strength for SG-enriched RNAs (6F) and SG-depleted RNAs (6G). G3BP1 binding strength is defined as ARTR-seq log2FC(G3BP1/input). SG-enriched RNAs and SG-depleted RNAs were obtained from a previous SG RNA sequencing report. FIG. 6H depicts a heatmap (left) showing changing patterns of G3BP1 binding strength for RNA clusters across time. RNAs were ranked from large to small according to the standard deviation (SD) of G3BP1 binding intensity over different time intervals, and the top 50% of RNAs were selected and clustered by fuzzy c-means. Line plots (right) exhibit the corresponding change of G3BP1 binding strength in each cluster. Each line represents one gene, with the black line being the centroid of the cluster. FIG. 6I depicts snapshots from IGV showing two G3BP1 RNA targets with decreased (left) and increased (right) binding strength, and each panel was normalized by CPM. Heatmaps (bottom) show G3BP1 binding strength with the size of circles representing its absolute value. Boxes in (6D), (6F), and (6G) represent the 25th-75th percentile with lines at the median, dots at the mean and whiskers at 1.5 × IQR.
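For illustration only, the binding strength definition (log2FC of G3BP1 signal over input) and the SD-based selection of variable RNAs described for FIGs. 6F-6H can be sketched as below. This is a minimal sketch under stated assumptions: the pseudocount, the use of the population standard deviation, and the function names are not taken from the source, and the subsequent fuzzy c-means clustering step is not shown.

```python
import math
from statistics import pstdev


def binding_strength(ip_signal: float, input_signal: float,
                     pseudocount: float = 1.0) -> float:
    """log2 fold change of IP (e.g. G3BP1) signal over input signal.
    The pseudocount is an assumption added to avoid division by zero."""
    return math.log2((ip_signal + pseudocount) / (input_signal + pseudocount))


def most_variable(strengths_by_rna: dict, fraction: float = 0.5) -> list:
    """Rank RNAs from large to small SD of binding strength across time
    points and keep the top fraction (0.5 corresponds to the top 50%)."""
    ranked = sorted(strengths_by_rna,
                    key=lambda rna: pstdev(strengths_by_rna[rna]),
                    reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]
```

Here each dictionary value would hold the binding strengths of one RNA at T0, T10, T20 and T60; the selected RNAs are those whose binding changes most over the time course.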
[0461] FIGs. 7A-7G depict ARTR-seq setup and condition optimization. FIG. 7A shows functional domains of the MMLV RTase. The MMLV RTase (full length) is composed of three domains: polymerase, connection, and RNase H. FIG. 7B depicts Coomassie brilliant blue staining of three purified pAG-RTase fusion constructs. FIG. 7C depicts qRT-PCR analysis for ACTB, METTL14 and RBM15 showing the RT activity of three tested purified pAG-RTase fusion proteins. Two commercial RTases, SuperScript II and SuperScript III, were loaded as positive controls, with n = 3 replicates. FIG. 7D depicts qRT-PCR showing the RT efficiency of random primers. pAG-MMLV RTase fusion protein (25-497) was used in this analysis, with n = 2 replicates. FIG. 7E depicts qRT-PCR analysis showing the effects of different biotinylated dNTPs on RT efficiency using pAG-MMLV RTase (25-497). Biotin-16-dUTP and biotin-16-dCTP exhibited the least hindrance on RT efficiency. Both were used in the ARTR-seq procedure by mixing with regular dTTP and dCTP at a 1:1 ratio. FIG. 7F depicts immunofluorescence (IF) imaging of the secondary antibody (2nd Ab, yellow), pAG-RTase (red), newly synthesized cDNA (green), and nucleus (blue) for RBFOX2 ARTR-seq, with 10 μm scale bars. FIG. 7G depicts qPCR analysis demonstrating relative cDNA yields of ARTR-seq samples.
[0462] FIGs. 8A-8E show that ARTR-seq performed favorably relative to other methods. FIG. 8A depicts numbers (left) and percentages (right) of reads remaining after processing steps for libraries constructed by using ARTR-seq, CLIP, eCLIP, iCLIP, irCLIP, LACE-seq, sCLIP, tRIP-seq, and RT&Tag, respectively. FIG. 8B depicts uniquely mapped reads that were subsampled from PTBP1 libraries constructed by using ARTR-seq, CLIP, eCLIP, iCLIP, irCLIP, LACE-seq, sCLIP, and tRIP-seq, respectively. The percentage of usable reads was calculated after PCR deduplication. FIG. 8C depicts snapshots from Integrative Genomics Viewer (IGV) showing the read coverage of ARTR-seq libraries. The read coverage of each library was normalized by its respective sequencing depth and all tracks were set to the same scale. FIG. 8D depicts a bar plot showing the usable read distribution in the intronic (left), intergenic (middle) and exonic (right) regions for libraries constructed by using ARTR-seq, CLIP, eCLIP, iCLIP, irCLIP, LACE-seq, sCLIP, and tRIP-seq, respectively. About 30% of usable reads for the ARTR-seq input samples were located in introns. FIG. 8E depicts meta distributions of PTBP1 ARTR-seq peaks along mRNA transcripts and flanking 1 kb regions.
[0463] FIGs. 9A-9E show that ARTR-seq compared favorably relative to other methods. FIG. 9A depicts the signal profile and heatmap of read density from ARTR-seq library reads at the eCLIP-identified PTBP1 peaks in HepG2 cells. FIG. 9B depicts signal profiles and heatmaps of read density from ARTR-seq and LACE-seq libraries at the eCLIP-identified PTBP1 peaks in K562 cells. FIG. 9C depicts heatmaps exhibiting the transcriptome-wide pairwise overlap of PTBP1-targeted genes (left) or peaks (right) among libraries from ARTR-seq, eCLIP, LACE-seq, and iCLIP using the same cell line. Notably, the iCLIP data from the HeLa S3 cell line was compared with ARTR-seq using the HeLa cell line and LACE-seq using the HeLa cell line. The overlap proportion was determined as the number of genes (or peaks) overlapped between sample A and sample B divided by the total number of genes (or peaks) in sample A. The maximum gap between overlapping peaks was set at 200 nt. The overlap proportion of genes (or peaks) and the cell line of sample A were labeled in the corresponding position. FIG. 9D depicts IGV snapshots showing the read coverage of ARTR-seq libraries corresponding to FIG. 8C. The read coverage of each library was normalized by its respective sequencing depth. According to the ARTR-seq library types (input and PTBP1), the tracks were adjusted to distinct scales. FIG. 9E is a Western blot (left) and a quantification (right) displaying PTBP1 protein levels in control (siCtrl) and PTBP1 knockdown (siPTBP1) HepG2 cells. GAPDH was used as an internal control for normalization.
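The overlap proportion described for FIG. 9C (peaks of sample A that overlap a peak of sample B, divided by the total number of peaks in sample A, with a maximum gap of 200 nt) can be illustrated with the following sketch. It is provided by way of example only; the tuple representation of peaks, the half-open coordinate convention, the function names, and the omission of strand are assumptions rather than details taken from the source.

```python
def peaks_overlap(a: tuple, b: tuple, max_gap: int = 200) -> bool:
    """Two peaks (chrom, start, end) are treated as overlapping when they
    lie on the same chromosome and are separated by at most max_gap nt.
    A negative gap means the intervals physically overlap."""
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return False
    gap = max(start_a, start_b) - min(end_a, end_b)
    return gap <= max_gap


def overlap_proportion(peaks_a: list, peaks_b: list,
                       max_gap: int = 200) -> float:
    """Number of sample A peaks overlapping any sample B peak, divided by
    the total number of sample A peaks."""
    if not peaks_a:
        return 0.0
    hits = sum(any(peaks_overlap(a, b, max_gap) for b in peaks_b)
               for a in peaks_a)
    return hits / len(peaks_a)
```

Because the denominator is the peak count of sample A, the measure is not symmetric: overlap_proportion(A, B) and overlap_proportion(B, A) generally differ, which is why FIG. 9C is presented as a pairwise heatmap.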
[0464] FIGs. 10A-10E depict direct versus indirect binding in ARTR-seq. FIG. 10A depicts a schematic diagram illustrating the simplified direct and indirect targets of the RNA binding protein (RBP). The symbol “X” represents the interacting protein or complex of the RBP. FIG. 10B depicts cumulative curves displaying the frequency of RBFOX2 peaks located within a certain absolute distance on the genome from the closest RBFOX2 canonical motif ‘UGCAUG’ for both ARTR-seq and eCLIP. FIG. 10C depicts boxplots showing ARTR-seq peaks exhibiting reduced signal values (top) and q-values (bottom) as the absolute distance to the nearest ‘UGCAUG’ site increases. The boxes represent the 25th-75th percentile with lines at the median and whiskers at 1.5 × IQR. FIG. 10D depicts a bar plot illustrating the impact of signal value cutoffs and q-value cutoffs on the percentage of RBFOX2 peaks within an absolute distance of 500 nt from the closest RBFOX2 canonical motif ‘UGCAUG’. The number of remaining peaks was labeled at the top of the bar after applying the cutoffs. FIG. 10E depicts cumulative curves exhibiting the frequency of YTHDF2 peaks located within a certain absolute distance on the transcriptome from the closest m6A sites identified by m6A-SAC-seq for both ARTR-seq and PAR-CLIP.
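As an illustrative sketch of the distance analysis summarized for FIGs. 10B-10D, the code below locates ‘UGCAUG’ occurrences in a sequence, computes the absolute distance from a peak position to the closest motif, and reports the fraction of peaks within a cutoff. It is a simplified example and not the disclosed analysis pipeline; measuring distance from a single peak coordinate (for example the midpoint) to the motif start, and all function names, are assumptions.

```python
from bisect import bisect_left


def motif_positions(seq: str, motif: str = "UGCAUG") -> list:
    """Sorted 0-based start positions of the motif (DNA T is read as U)."""
    seq = seq.upper().replace("T", "U")
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def nearest_distance(pos: int, sorted_sites: list):
    """Absolute distance from pos to the closest site, or None if no site."""
    if not sorted_sites:
        return None
    i = bisect_left(sorted_sites, pos)
    candidates = sorted_sites[max(0, i - 1):i + 1]
    return min(abs(pos - site) for site in candidates)


def fraction_within(distances: list, cutoff: int) -> float:
    """Fraction of peaks whose distance to the nearest site is <= cutoff;
    one point on a cumulative curve such as FIG. 10B."""
    valid = [d for d in distances if d is not None]
    if not valid:
        return 0.0
    return sum(d <= cutoff for d in valid) / len(valid)
```

Evaluating fraction_within over a range of cutoffs yields a cumulative curve; a cutoff of 500 corresponds to the 500-nt threshold described for FIG. 10D.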
[0465] FIGs. 11A-11H depict optimizations for reducing potential indirect binding in ARTR-seq. FIG. 11A depicts a schematic diagram demonstrating the binding of protein A/G-reverse transcriptase (pAG-RTase), secondary antibody (2nd Ab), and the primary antibody (1st Ab). FIG. 11B depicts a schematic diagram showing the constructs of pAG-RTases with different amino acid (aa) linker lengths: 3 aa for pRT3, 13 aa for pRT13, and 30 aa for pRT30. FIG. 11C depicts Coomassie brilliant blue staining of the purified pRT3, pRT13 and pRT30. FIG. 11D depicts qRT-PCR analysis for GAPDH, ACTB, METTL14 and RBM15 showing the in-vitro reverse transcription (RT) efficiency of pAG-RTase with different linker lengths (pRT3, left; pRT13, middle; and pRT30, right). FIG. 11E depicts qPCR analysis to quantify relative cDNA yields of ARTR-seq samples. FIG. 11F depicts signal profiles of ARTR-seq read density at RBFOX2 ARTR-seq peaks and flanking 0.3 kb. FIG. 11G depicts a snapshot from IGV showing signals of ARTR-seq libraries. FIG. 11H depicts cumulative curves displaying the frequency of the top 3000 RBFOX2 peaks (with the highest signal values) located within a certain absolute distance from the closest ‘UGCAUG’ for ARTR-seq libraries constructed under different conditions.
[0466] FIGs. 12A-12B depict exemplary resolution of ARTR-seq. FIG. 12A depicts density plots showing the distribution of peak midpoints within a 400-nt window flanking the RBFOX2 canonical binding motif ‘UGCAUG’ for RBFOX2 and PTBP1 (negative control) ARTR-seq libraries. FIG. 12B depicts density plots showing the distribution of peak midpoints within a 400-nt window flanking m6A sites in HeLa cells identified by m6A-SAC-seq for the YTHDF2 ARTR-seq libraries. The distributions in FIGs. 12A-12B are split into three groups based on the peak signal values.
[0467] FIGs. 13A-13D depict the impact of RNase treatment in ARTR-seq. FIG. 13A depicts density plots exhibiting the distribution of fragment length for ARTR-seq libraries with or without RNase treatment. FIG. 13B depicts density plots showing the distribution of RBFOX2 peak midpoints within a 400-nt window flanking the RBFOX2 canonical binding motif ‘UGCAUG’ for ARTR-seq libraries without RNase treatment, with weak RNase I treatment and with strong RNase I treatment. The distribution is split into three groups based on the peak signal values. FIG. 13C depicts qPCR analysis to quantify the relative cDNA yield of ARTR-seq. FIG. 13D depicts cumulative curves displaying the frequency of the top 3000 RBFOX2 peaks (with the highest signal values) located within a certain absolute distance from the closest ‘UGCAUG’ for ARTR-seq libraries constructed with or without RNase treatment.
[0468] FIGs. 14A-14E depict the efficacy of ARTR-seq applications using low input samples. FIG. 14A depicts ARTR-seq replicate correlations for usable reads per gene normalized to coverage (RPM) for PTBP1 with different numbers of HepG2 cells. The color scale shows the point distribution density. The coefficient R and P-values were given by Pearson’s correlation. FIG. 14B depicts numbers (left) and percentages (right) of reads remaining after processing steps for libraries constructed from different cell numbers by ARTR-seq, LACE-seq, and RT&Tag. The libraries generated by the same method are linked with a line and indicated in the same color. FIG. 14C depicts uniquely mapped reads subsampled from PTBP1 libraries constructed from different numbers of cells by ARTR-seq, LACE-seq, and RT&Tag, respectively. The percentage of usable reads was calculated after PCR deduplication. FIG. 14D is a bar plot showing the usable read distribution in the intronic (left), intergenic (middle) and exonic (right) regions of libraries constructed from different cell numbers by ARTR-seq and LACE-seq, respectively. The libraries generated by the same method are linked with a line and indicated with the same color. FIG. 14E depicts the signal profile and heatmap of read density in LACE-seq with different numbers of cells at LACE-seq-identified PTBP1 peaks.
[0469] FIGs. 15A-15C show that ARTR-seq generated quality sequencing libraries from mouse embryo tissue samples. FIG. 15A depicts numbers (left) and percentages (right) of reads remaining after processing steps for ARTR-seq libraries. FIG. 15B depicts ARTR-seq replicate correlations for usable reads per gene normalized to coverage (RPM) for RBFOX2 in mouse embryos. The color scale shows the point distribution density. The coefficient R and P-values were given by Pearson’s correlation. FIG. 15C is a bar plot showing the usable read distribution in the intronic (left), intergenic (middle) and exonic (right) regions of ARTR-seq libraries constructed from mouse embryos.
[0470] FIGs. 16A-16D depict how ARTR-seq detected binding signals of splicing factors. FIG. 16A depicts percentages of reads remaining after processing steps for ARTR-seq libraries of PTBP1, RBFOX2 and HNRNPC. FIG. 16B is a bar plot showing the usable read distribution in the intronic (left), intergenic (middle) and exonic (right) regions for ARTR-seq libraries of PTBP1, RBFOX2 and HNRNPC. FIG. 16C depicts cumulative curves and boxplots (inside) showing the absolute value of splicing differences upon RBFOX2 (left panels) or HNRNPC (right panels) knockdown. FIG. 16D depicts cumulative curves and boxplots (inside) showing the absolute value of splicing differences of included RI upon PTBP1 knockdown. In FIGs. 16C-16D, RBP-regulated genes were divided into three groups according to their enrichment in ARTR-seq, including no enrichment (No, 0 < enrichment < 1; left), low enrichment (Low, 1 < enrichment < 2; middle) and high enrichment (High, 2 < enrichment; right). Statistical significance in FIGs. 16C-16D was determined using the Student’s t-test of indicated group versus ‘no enrichment’ group; *P < 0.05, **P < 10⁻⁵. Boxes in FIGs. 16C-16D represent the 25th-75th percentile with lines at the median and whiskers at 1.5 × IQR.
[0471] FIGs. 17A-17F depict binding features of m6A reader proteins detected by ARTR-seq. FIG. 17A depicts IF imaging showing the subcellular localization of YTHDF1, YTHDF2, and YTHDC1 in HeLa cells, with 5 μm scale bars. FIG. 17B depicts ARTR-seq replicate correlations for usable reads per gene normalized to coverage (RPM) for YTHDF1, YTHDF2, and YTHDC1. The color scale shows the point distribution density. The coefficient R and P-values were given by Pearson’s correlation. FIG. 17C depicts distribution of usable reads in the intronic (left), intergenic (middle) and exonic (right) regions for ARTR-seq libraries of the individual m6A binding proteins. FIG. 17D depicts a Venn diagram illustrating overlap of peaks identified by ARTR-seq for YTHDF1, YTHDF2 and YTHDC1. FIG. 17E depicts aggregation profiles showing the meta distributions of binding peaks along mRNA transcripts detected in two biological ARTR-seq replicates for YTHDF1 (top), YTHDF2 (middle), and YTHDC1 (bottom). FIG. 17F depicts cumulative curves and boxplots (inside) showing the peak enrichment (log2 value) of ARTR-seq targets for YTHDC1. YTHDC1 peaks were divided into four groups according to the m6A fraction (sum value) quantified by m6A-SAC-seq. The peaks without m6A are categorized in one group (No), and other peaks were divided into three groups with an equal number of peaks, including low m6A fraction (Low), medium m6A fraction (Medium) and high m6A fraction (High). Statistical significance was determined by the Student’s t-test of indicated group versus ‘no m6A’ group; *P < 0.05, **P < 10⁻⁵.
[0472] FIGs. 18A-18D depict G3BP1 binding at different time intervals during SG assembly captured by ARTR-seq. FIG. 18A depicts the Pearson correlation heatmap among time intervals of ARTR-seq results based on G3BP1 binding strength. G3BP1 binding strength is defined as ARTR-seq log2FC(G3BP1/input). Pairwise correlation coefficients were indicated as circle size and noted in each circle. FIG. 18B is a heatmap exhibiting stable G3BP1 binding strength of selected RNAs at different time intervals, organized by hierarchical clustering. RNAs were ranked from small to large according to the standard deviation (SD) of G3BP1 binding intensity over different time intervals, and the top 5% of RNAs were selected for clustering (n = 677). The dendrogram was constructed using complete linkage based on Euclidean distance. FIG. 18C depicts IGV snapshots (top) of two RNAs with stable G3BP1 binding strength in ARTR-seq, with each panel normalized by CPM. FIG. 18D depicts IGV snapshots showing RNAs with gradually decreased (left) and increased (right) G3BP1 binding strength. Each panel was normalized by CPM. Heatmaps (bottom) in FIGs. 18C-18D show G3BP1 binding strength in ARTR-seq with the size of the circle representing its absolute value.
[0473] FIG. 19 provides a schematic representation of multiplexed ARTR-seq.
[0474] FIG. 20 provides a schematic representation of spatial m6A-ARTR-seq.
[0475] FIGs. 21A-21F depict validation of m6A-ARTR-seq in HeLa cells. FIG. 21A provides immunofluorescence imaging data of the secondary antibody (2nd Ab; yellow, left panel), pAG-RTase (red; second panel) and nucleus (blue; third panel) and merge (fourth panel) for m6A-ARTR-seq. FIG. 21B depicts replicate correlations for usable reads per gene normalized to coverage (reads per million reads mapped, RPM) for m6A-ARTR-seq. The color scale shows the point density. The coefficient R and P values were given by the two-tailed Pearson’s correlation. FIG. 21C is a Venn plot showing overlap of m6A peaks between two biological replicates. FIG. 21D provides aggregation profiles showing the meta distributions of m6A peaks identified in m6A-ARTR-seq. FIG. 21E provides m6A peak distribution data (top) in exon, intergenic region and intron, and the corresponding motifs (bottom). FIG. 21F depicts snapshots from the IGV showing the signal overlaps between m6A-ARTR-seq, m6A-SAC-seq and GLORI.
[0476] FIGs. 22A-22G depict spatial m6A profiling of mouse embryo and comparison with mESCs. FIG. 22A shows H&E staining of an adjacent E11 mouse embryo section with the grey square indicating the region of interest (ROI). FIG. 22B shows correlations for usable reads per gene, normalized to coverage (RPM), for bulk and spatial m6A-ARTR-seq in mESC and E11 tissues. FIG. 22C is a Venn plot showing the overlap of m6A peaks between mESC and E11 tissues. FIG. 22D depicts aggregation profiles displaying the meta distributions of m6A peaks identified by m6A-ARTR-seq. FIG. 22E depicts IGV snapshots showing the specific and shared m6A signals. FIG. 22F is a spatial UMI count map (top) and gene count map (bottom) for spatial m6A-ARTR-seq. FIG. 22G shows unsupervised clustering of m6A for two adjacent E11 mouse embryo sections.
[0477] FIGs. 23A-23H depict spatial m6A-ARTR-seq revealing an m6A distribution map of the mouse brain. FIG. 23A shows H&E staining of an adjacent mouse brain section (left) and tissue scan of the ROI covered by a 50 μm microfluidic device (right). FIG. 23B depicts IF staining of the secondary antibody (orange) and nuclei (blue) on the same tissue section used for sequencing. FIG. 23C depicts the spatial UMI count map (right) and gene count map (left) for spatial m6A-ARTR-seq on the mouse brain section. FIG. 23D depicts a spatial m6A clustering map of 20 clusters identified in the mouse brain, compared with morphological annotations of a similar coronal section from the Allen Mouse Brain Atlas (P56 mouse, Coronal section 77, atlas.brain-map.org). FIG. 23E depicts aggregation profiles showing the read distributions along mRNA identified by m6A-ARTR-seq. FIG. 23F is a violin plot (right) showing UMI counts of 9 regions of mouse brain (left). FIGs. 23G-23H depict m6A signal distributions of Cbln1 (23G) and Zbtb20 (23H) (left), and the corresponding m6A signal coverage in specific brain regions (right).
DETAILED DESCRIPTION
[0478] Disclosed herein are methods, compositions, and kits for an Assay of Reverse Transcription-based RBP binding sites Sequencing (ARTR-seq) to capture RBP-RNA interactions through in-situ reverse transcription (RT). In some aspects, ARTR-seq captures RBP binding sites using in-situ RT guided by antibody-located RTase. In certain aspects, ARTR-seq identifies RBP binding sites with high sensitivity and specificity, even when using as few as 20 cells or limited tissues. In other aspects, the procedure is compatible with immunofluorescence imaging, providing a direct readout of the spatial information about targeted proteins without affecting downstream sequencing.
[0479] In some aspects, also disclosed herein are methods for simultaneously determining the RNA interaction sites of more than one RNA binding protein, broadly referred to as multiplex ARTR-seq. In some aspects, also provided herein are modifications of the disclosed methods (ARTR-seq and multiplex ARTR-seq) for determining RNA binding sites with spatial resolution. These methods are broadly referred to herein as spatial ARTR-seq. In some aspects, any of the methods disclosed herein (for example, ARTR-seq, multiplex ARTR-seq, spatial ARTR-seq) may be modified to study RNA modification sites as demonstrated here.
[0480] One main advantage of ARTR-seq and the related methods disclosed herein is the use of in-situ reverse transcription, which bypasses antibody-based immunoprecipitation (IP) step(s) and thereby reduces material loss. Another advantage is the ease of deployment of ARTR-seq. The method can be readily applied to cell lines, tissues, and even clinical samples to obtain both imaging and sequencing results on specific RBPs. ARTR-seq also displays distinct advantages compared to the recently reported CUT&Tag⁴⁹ and RT&Tag²². First, ARTR-seq employs random primers to capture local signals without bias, while RT&Tag uses oligo dT for RT, resulting in the loss of signal from non-polyadenylated RNAs, such as pre-mRNAs and circular RNAs. Additionally, since the length of mature mRNA is known to peak at 2065 bp⁵⁰, RT&Tag can lose local resolution as reverse transcription uniformly starts from the poly-A tail. The full-length RT efficiency is also limited in situ, resulting in bias toward RNA 3' coverage. Second, Tn5 tagmentation on RNA-cDNA heteroduplexes is an inefficient process, which hinders the application of RT&Tag to low input samples. Third, ARTR-seq can be applied to RBPs located in all cellular compartments, whereas RT&Tag is limited to isolated nuclei and still loses binding information for RNAs without poly-A tails in the nucleus. ARTR-seq therefore offers the potential to identify, without bias, all RNA targets bound by RBPs in all cellular compartments using limited starting materials.
[0481] Investigations of dynamic RBP binding to RNA targets have been hindered by the low UV crosslinking efficiency, long incubation times, and large material requirements of existing methods. Taking advantage of significantly more efficient formaldehyde crosslinking and a low input sample requirement, ARTR-seq shows a distinct capability for capturing transient RBP binding at different time intervals. ARTR-seq can be applied to capturing dynamic RNA targeting of G3BP1 during SG assembly on a timescale as short as 10 min. The high temporal resolution of ARTR-seq can enable the investigation of dynamic or even transient RBP-RNA interactions in many other events.
[0482] In specific examples, the unique binding characteristics of PTBP1, RBFOX2 and HNRNPC related to their splicing regulatory roles can be observed with ARTR-seq. Additionally, ARTR-seq has been shown to detect the preferences of m6A reader proteins, YTHDF1, YTHDF2 and YTHDC1. In additional examples, the dynamic binding of G3BP1 to target RNAs can be demonstrated during the process of stress granule assembly.
I. Definitions
[0483] Throughout this application, the term “about” is used to indicate that a value includes the inherent variation of error for the measurement or quantitation method.
[0484] The use of the word “a” or “an” when used in conjunction with the term “comprising” can mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.”
[0485] The phrase “and/or” means “and” or “or”. To illustrate, A, B, and/or C includes: A alone, B alone, C alone, a combination of A and B, a combination of A and C, a combination of B and C, or a combination of A, B, and C. In other words, “and/or” operates as an inclusive or. It is specifically contemplated that A, B, or C can be specifically excluded from an aspect.
[0486] The words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.
[0487] The compositions and methods for their use can “comprise,” “consist essentially of,” or “consist of” any of the ingredients or steps disclosed throughout the specification. Compositions and methods “consisting essentially of” any of the ingredients or steps disclosed limits the scope of the claim to the specified materials or steps which do not materially affect the basic and novel characteristic of the claimed invention.
[0488] As used herein in the context of molecules, e.g., nucleic acids, proteins, or small molecules, the term “variant” refers to a molecule that shows significant structural identity with a reference molecule but differs structurally from the reference molecule, for example but not limited to, in the presence or absence or in the level of one or more chemical moieties as compared to the reference entity. In some aspects, a variant also differs functionally from its reference molecule. In general, whether a particular molecule is properly considered to be a “variant” of a reference molecule is based on its degree of structural identity with the reference molecule. As will be appreciated by those skilled in the art, any biological or chemical reference molecule has certain characteristic structural elements. A variant, by definition, is a distinct molecule that shares one or more such characteristic structural elements but differs in at least one aspect from the reference molecule. In some aspects, a variant polypeptide or nucleic acid can differ from a reference polypeptide or nucleic acid as a result of one or more differences in amino acid or nucleotide sequence and/or one or more differences in chemical moieties (e.g., carbohydrates, lipids, phosphate groups, fluorophores, small molecules) that are covalent components of the polypeptide or nucleic acid (e.g., that are attached to the polypeptide or nucleic acid backbone).
[0489] Changes can be introduced by mutation into a nucleic acid, thereby leading to changes in the amino acid sequence of a polypeptide (e.g., an antibody or antibody derivative) that it encodes. Mutations can be introduced using any technique known in the art. In some aspects, one or more particular amino acid residues are changed using, for example, a site-directed mutagenesis protocol. In another aspect, one or more randomly selected residues are changed using, for example, a random mutagenesis protocol. However it is made, a mutant polypeptide can be expressed and screened for a desired property.
[0490] Mutations can be introduced into a nucleic acid without significantly altering the biological activity of a polypeptide that it encodes. For example, one can make nucleotide substitutions leading to amino acid substitutions at non-essential amino acid residues. Alternatively, one or more mutations can be introduced into a nucleic acid that selectively changes the biological activity of a polypeptide that it encodes. See, e.g., Romain Studer et al., Biochem. J. 449:581-594 (2013), incorporated herein by reference. For example, the mutation can quantitatively or qualitatively change the biological activity. Examples of quantitative changes include increasing, reducing or eliminating the activity. Examples of qualitative changes include altering the antigen specificity of an antibody.
[0491] Variant polypeptides encoded by nucleic acids of the disclosure can contain amino acid changes that confer any of a number of desirable properties. Variant polypeptides can be made using routine mutagenesis techniques and assayed as appropriate to determine whether they possess the desired property. The stability of protein(s) encoded by a variant nucleic acid can be measured by assaying thermal stability or stability upon urea denaturation or can be measured using in silico prediction. Methods for such experiments and in silico determinations are known in the art.
[0492] In some aspects, a variant polypeptide or nucleic acid shows an overall sequence identity with a reference polypeptide or nucleic acid that is, is at least, is at most, or is between (inclusive or exclusive) any two of 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 99%. In some aspects, a variant polypeptide or nucleic acid does not share at least one characteristic sequence element with a reference polypeptide or nucleic acid. In some aspects, a reference polypeptide or nucleic acid has one or more biological activities. In some aspects, a variant polypeptide or nucleic acid shares one or more of the biological activities of the reference polypeptide or nucleic acid. In some aspects, a variant polypeptide or nucleic acid lacks one or more of the biological activities of the reference polypeptide or nucleic acid. In some aspects, a variant polypeptide or nucleic acid shows a reduced level of one or more biological activities as compared to the reference polypeptide or nucleic acid.
[0493] In some aspects, a polypeptide or nucleic acid of interest is considered to be a “variant” of a reference polypeptide or nucleic acid if it has an amino acid or nucleotide sequence that is identical to that of the reference but for a small number of sequence alterations at particular positions. Certain amino acids in a protein or polypeptide sequence can be substituted for other amino acids, inserted, or deleted, as compared to the reference, with or without appreciable loss of interactive binding capacity with structures such as, for example, antigen-binding regions of antibodies or binding sites on substrate molecules. Since it is the interactive capacity and nature of a protein that defines its functional activity, certain amino acid substitutions can be made in a protein sequence and in its corresponding DNA coding sequence, and nevertheless produce a protein with similar or desirable properties.
[0494] Amino acid sequence variants of the disclosure can be substitutional, insertional, or deletion variants. The variant polypeptide or nucleic acid sequence has at least one modification compared to the reference polypeptide or nucleic acid sequence, e.g., from 1 to about 50 modifications. A variation in a polypeptide of the disclosure can affect 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more non-contiguous or contiguous amino acids of the protein or polypeptide, as compared to wild-type. In some aspects, the variant polypeptide or nucleic acid sequence has from 1 to about 50 modifications compared to the reference polypeptide or nucleic acid sequence. In some aspects, the variant polypeptide or nucleic acid sequence has from 1 to about 40 modifications compared to the reference polypeptide or nucleic acid sequence. In some aspects, the variant polypeptide or nucleic acid sequence has from 1 to about 30 modifications compared to the reference polypeptide or nucleic acid sequence. In some aspects, the variant polypeptide or nucleic acid sequence has from 1 to about 20 modifications compared to the reference polypeptide or nucleic acid sequence. In some aspects, the variant polypeptide or nucleic acid sequence has from 1 to about 10 modifications compared to the reference polypeptide or nucleic acid sequence. In some aspects, the variant polypeptide or nucleic acid sequence has from 1 to about 5 modifications compared to the reference polypeptide or nucleic acid sequence. Typically, fewer than about 20%, about 15%, about 10%, about 9%, about 8%, about 7%, about 6%, about 5%, about 4%, about 3%, or about 2% of the residues in a variant are substituted, inserted, or deleted, as compared to the reference. 
A variant can comprise an amino acid sequence that is at least 50%, 60%, 70%, 80%, or 90%, including all values and ranges there between, identical to any sequence provided or referenced herein.
[0495] It also will be understood that amino acid and nucleic acid sequences can include additional residues, such as additional N- or C-terminal amino acids, or 5' or 3' nucleic acid sequences, respectively, and yet still be essentially identical as set forth in one of the sequences disclosed herein, so long as the sequence meets the criteria set forth above, including the maintenance of biological protein activity where protein expression is concerned. The addition of terminal sequences particularly applies to nucleic acid sequences that can, for example, include various non-coding sequences flanking either of the 5' or 3' portions of the coding region.
[0496] The term “functionally equivalent codon” is used herein to refer to codons that encode the same amino acid, such as the six different codons for arginine. Also considered are “neutral substitutions” or “neutral mutations” which refers to a change in the codon or codons that encode biologically equivalent amino acids.
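The notion of functionally equivalent codons can be illustrated with a short sketch. The table below is a small hand-written excerpt of the standard genetic code (RNA codons), not the full code, and the names `CODON_TABLE`, `synonymous_codons`, and `is_functionally_equivalent` are hypothetical names chosen for illustration only; they are not part of the disclosure.

```python
# Partial excerpt of the standard genetic code (RNA codons), for illustration only.
CODON_TABLE = {
    "CGU": "Arg", "CGC": "Arg", "CGA": "Arg", "CGG": "Arg", "AGA": "Arg", "AGG": "Arg",
    "AAA": "Lys", "AAG": "Lys",
    "GAU": "Asp", "GAC": "Asp",
    "AUG": "Met",
}


def synonymous_codons(amino_acid):
    """Return the sorted codons in CODON_TABLE that encode the given amino acid."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


def is_functionally_equivalent(codon_a, codon_b):
    """Two codons are functionally equivalent if they encode the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
```

Under these assumptions, `synonymous_codons("Arg")` lists the six arginine codons mentioned above, and `is_functionally_equivalent("CGU", "AGA")` is true because both encode arginine.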
[0497] Deletion variants typically lack one or more residues of the native or wild type protein. Individual residues can be deleted or a number of contiguous amino acids can be deleted. A stop codon can be introduced (by substitution or insertion) into an encoding nucleic acid sequence to generate a truncated protein.
[0498] Insertional mutants typically involve the addition of amino acid residues at a non-terminal point in the polypeptide. This can include the insertion of one or more amino acid residues. Terminal additions can also be generated and can include fusion proteins which are multimers or concatemers of one or more peptides or polypeptides described or referenced herein.
[0499] Substitutional variants typically contain the exchange of one amino acid for another at one or more sites within the protein or polypeptide and can be designed to modulate one or more properties of the polypeptide, with or without the loss of other functions or properties. Substitutions can be conservative, that is, one amino acid is replaced with one of similar chemical properties.
[0500] “Conservative amino acid substitutions” can involve exchange of a member of one amino acid class with another member of the same class. Conservative replacements (also “conservative substitutions” or “conservative amino acid substitutions”) are those that take place within a family of amino acids that possess similar biochemical properties, including charge, hydrophobicity, and size. Genetically encoded amino acids are generally divided into families based on the chemical nature of the side chain, e.g., acidic (aspartate, glutamate), basic (lysine, arginine, histidine), nonpolar (alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), and uncharged polar (glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine). Thus, a conservative replacement can comprise replacement of an amino acid in one family for an amino acid in the same family (e.g., replacement of a lysine with an arginine, replacement of an aspartate for a glutamate, etc.). Alternatively, or in addition, amino acid similarity can be determined using a Blocks Substitution Matrix (BLOSUM), such as BLOSUM62 (Henikoff S and Henikoff JG, Proc. Natl. Acad. Sci. U.S.A 89(22): 10915-9 (1992)). In this case, a conservative replacement can be a substitution of amino acids having a non-negative value on a BLOSUM62 matrix. Whether an amino acid change results in a functional peptide can readily be determined by assaying the specific activity of the polypeptide derivative. Standard ELISA, Surface Plasmon Resonance (SPR), or other antibody binding assays can be performed by one skilled in the art to make a quantitative comparison of antigen binding affinity between the unmodified antibody and any polypeptide derivatives with conservative substitutions generated through any of several methods available to one skilled in the art. 
Conservative amino acid substitutions can encompass non-naturally occurring amino acid residues, which are typically incorporated by chemical peptide synthesis rather than by synthesis in biological systems. These include peptidomimetics or other reversed or inverted forms of amino acid moieties.
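The family-based definition of a conservative substitution given above can be sketched as follows. This is a minimal illustration that encodes only the four side-chain families listed in the text, using one-letter amino acid codes; the names `FAMILIES`, `family_of`, and `is_conservative` are hypothetical, and the BLOSUM62-based alternative is not implemented here.

```python
# Side-chain families as listed in the text, in one-letter amino acid codes.
FAMILIES = {
    "acidic": set("DE"),                # aspartate, glutamate
    "basic": set("KRH"),                # lysine, arginine, histidine
    "nonpolar": set("AVLIPFMW"),        # Ala, Val, Leu, Ile, Pro, Phe, Met, Trp
    "uncharged_polar": set("GNQCSTY"),  # Gly, Asn, Gln, Cys, Ser, Thr, Tyr
}


def family_of(residue):
    """Return the name of the family that contains the given residue."""
    for name, members in FAMILIES.items():
        if residue in members:
            return name
    raise ValueError(f"not a genetically encoded amino acid code: {residue!r}")


def is_conservative(reference, replacement):
    """True if the replacement differs from the reference but stays in its family."""
    return reference != replacement and family_of(reference) == family_of(replacement)
```

For example, `is_conservative("K", "R")` (lysine to arginine) is true, whereas `is_conservative("K", "D")` (lysine to aspartate) is false and would fall under the non-conservative substitutions discussed next.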
[0501] Alternatively, substitutions can be “non-conservative” (also “nonconservative”). In some aspects, a non-conservative substitution affects a function or activity of the polypeptide. In some aspects, a non-conservative substitution does not affect a function or activity of the polypeptide. Non-conservative changes typically involve substituting an amino acid residue with one that is chemically dissimilar, such as a polar or charged amino acid for a nonpolar or uncharged amino acid, and vice versa. Non-conservative substitutions can involve the exchange of a member of one of the amino acid classes for a member from another class.
[0502] In some aspects, a reference polypeptide or nucleic acid is a “wild type” or “WT” or “native” sequence found in nature, including allelic variations. A wild type polypeptide or nucleic acid sequence has a sequence that has not been intentionally modified. For the purposes of the present disclosure, “variants” of an amino acid sequence (peptide, protein, or polypeptide) comprise amino acid insertion variants, amino acid addition variants, amino acid deletion variants and/or amino acid substitution variants. “Variants” of a nucleotide sequence comprise nucleotide insertion variants, nucleotide addition variants, nucleotide deletion variants and/or nucleotide substitution variants. The term “variant” includes all mutants, splice variants, post-translationally modified variants, conformations, isoforms, allelic variants, species variants, and species homologs, in particular, those which are naturally occurring.
[0503] A variant of an amino acid sequence (peptide or protein) can be a “functional variant.” The term “functional variant” of an amino acid sequence relates to any variant exhibiting one or more functional properties identical or similar to those of the amino acid sequence from which it is derived, e.g., it is functionally equivalent. With respect to antigens or antigenic sequences, one particular function is one or more immunogenic activities displayed by the amino acid sequence from which the fragment or variant is derived. The term “functional variant,” as used herein, in particular refers to a variant molecule or sequence that comprises an amino acid sequence that is altered by one or more amino acids compared to the amino acid sequence of the parent molecule or sequence and that is still capable of fulfilling one or more of the functions of the parent molecule or sequence. In some aspects, the modifications in the amino acid sequence of the parent molecule or sequence do not significantly affect or alter the characteristics of the molecule or sequence.
[0504] In some aspects, a “functional variant” of a protein can retain at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% functionality with respect to the full-length wild-type protein. Assays to determine the functionality of a given protein are well known in the art, for example, binding assays, cleavage assays, or enzyme assays. In some aspects, the protein can be an antibody, and a functional variant can be a variant that retains at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% of its binding affinity and specificity for its antigen, as determined by any of the assays known in the art, for example ELISA, SPR, bio-layer interferometry, flow cytometry, radioimmunoassay, isothermal titration calorimetry, affinity chromatography, or Western blotting. In some aspects, the protein can be a reverse transcriptase, and a functional variant can be a variant that retains at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% of its enzyme activity, as measured by any suitable technique known in the art, for example, a reverse transcription-quantitative PCR (RT-qPCR) assay to measure cDNA synthesis, an ELISA-based enzyme activity assay for detecting reverse transcriptase activity, or a radioactive nucleotide incorporation assay to quantify DNA polymerization by tracking radiolabeled nucleotides.
[0505] An amino acid sequence (peptide, protein, or polypeptide) “derived from” a designated amino acid sequence (peptide, protein, or polypeptide) refers to the origin of the first amino acid sequence. Preferably, the amino acid sequence which is derived from a particular amino acid sequence has an amino acid sequence that is identical, essentially identical, or homologous to that particular sequence or a fragment thereof. Amino acid sequences derived from a particular amino acid sequence can be variants of that particular sequence or a fragment thereof. For example, it will be understood by one of ordinary skill in the art that the antigens suitable for use herein can be altered such that they vary in sequence from the naturally occurring or native sequences from which they were derived, while retaining the desirable activity of the native sequences.
[0506] Changes can be introduced by mutation into a nucleic acid, thereby leading to changes in the amino acid sequence of a polypeptide (e.g., an antigen or antibody or antibody derivative) that it encodes. Mutations can be introduced using any technique known in the art. In some aspects, one or more particular amino acid residues are changed using, for example, a site-directed mutagenesis protocol. In another aspect, one or more randomly selected residues are changed using, for example, a random mutagenesis protocol. In some aspects, however it is made, a mutant polypeptide can be expressed and screened for a desired property.
[0507] Mutations can be introduced into a nucleic acid without significantly altering the biological activity of a polypeptide that it encodes. For example, one can make nucleotide substitutions leading to amino acid substitutions at non-essential amino acid residues. Alternatively, one or more mutations can be introduced into a nucleic acid that selectively changes the biological activity of a polypeptide that it encodes. For example, the mutation can quantitatively or qualitatively change the biological activity. Examples of quantitative changes include increasing, reducing or eliminating the activity. Examples of qualitative changes include altering the antigen specificity of an antibody.
[0508] “Sequence similarity” indicates the percentage of amino acids that either are identical or that represent conservative amino acid substitutions. “Sequence identity” between two amino acid sequences indicates the percentage of amino acids that are identical between the sequences. The terms “% identical,” “% identity,” or similar terms are intended to refer, in particular, to the percentage of nucleotides or amino acids which are identical in an optimal alignment between the sequences to be compared. Said percentage is purely statistical, and the differences between the two sequences can be but are not necessarily randomly distributed over the entire length of the sequences to be compared. Comparisons of two sequences are usually carried out by comparing the sequences, after optimal alignment, with respect to a segment or “window of comparison,” in order to identify local regions of corresponding sequences. The optimal alignment for a comparison can be carried out manually or with the aid of the local homology algorithm by Smith and Waterman, 1981, Adv. Appl. Math. 2, 482, with the aid of the global alignment algorithm by Needleman and Wunsch, 1970, J. Mol. Biol. 48, 443, with the aid of the similarity search algorithm by Pearson and Lipman, 1988, Proc. Natl Acad. Sci. USA 85, 2444, or with the aid of computer programs using said algorithms (FOGSAA, GAP, BESTFIT, FASTA, BLASTP, BLASTN, and TFASTA in Wisconsin Genetics Software Package, Genetics Computer Group). In some aspects, percent identity of two sequences is determined using the BLASTN or BLASTP algorithm, as available on the United States National Center for Biotechnology Information (NCBI) website.
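To make the idea of an optimal alignment concrete, the following is a minimal sketch of a Needleman-Wunsch style global alignment by dynamic programming. The scoring values (match +1, mismatch -1, linear gap -1) and the function name `needleman_wunsch` are assumptions chosen for illustration; the programs named above use substitution matrices and more elaborate gap penalties, and this sketch is not the implementation used by any of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The two returned strings have equal length, with `-` marking gap positions, so they can be compared position by position when computing a percentage identity.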
[0509] Percentage identity is obtained by determining the number of identical positions at which the sequences to be compared correspond, dividing this number by the number of positions compared (e.g., the number of positions in the reference sequence), and multiplying this result by 100.
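The calculation described above can be written out as a short sketch. It assumes the two sequences have already been aligned to equal length with `-` as the gap character, and it uses the number of positions in the reference sequence as the denominator, which is the example given in the text; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_ref, aligned_query, gap="-"):
    """Percent identity of two pre-aligned sequences, relative to the reference length."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    # Positions compared: every non-gap position of the reference sequence.
    ref_positions = sum(1 for r in aligned_ref if r != gap)
    if ref_positions == 0:
        raise ValueError("reference sequence has no residues")
    # Identical positions: same residue in both sequences, excluding gaps.
    identical = sum(
        1 for r, q in zip(aligned_ref, aligned_query) if r == q and r != gap
    )
    return 100.0 * identical / ref_positions
```

For instance, a 10-nucleotide reference aligned without gaps to a query that differs at a single position gives 9 identical positions out of 10, or 90% identity.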
[0510] In some aspects, the degree of similarity or identity is given for a region that is, is at least, is at most, or is between (inclusive or exclusive) any two of about 50%, about 60%, about 70%, about 80%, about 90%, or about 100% of the entire length of the reference sequence. For example, if the reference nucleic acid sequence consists of 200 nucleotides, the degree of identity is given for, for at least, for at most, or for between any two of 100, 120, 140, 160, 180, or 200 nucleotides, or any range derivable therein, in some aspects, continuous nucleotides. In some aspects, the degree of similarity or identity is given for the entire length of the reference sequence.
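The arithmetic in the 200-nucleotide example above can be checked with a short sketch; the helper name `region_lengths` is hypothetical and the percentages are those listed in the paragraph.

```python
def region_lengths(reference_length, percents=(50, 60, 70, 80, 90, 100)):
    """Length of the comparison region for each percentage of the reference length."""
    return [reference_length * p // 100 for p in percents]


# For a 200-nucleotide reference this gives 100, 120, 140, 160, 180, and 200.
lengths_for_200 = region_lengths(200)
```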
[0511] Homologous amino acid sequences can exhibit at least, at most, or between (inclusive or exclusive) any two of 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% identity of the amino acid residues. In some aspects, homologous amino acid sequences exhibit at least 95% identity of the amino acid residues. In some aspects, homologous amino acid sequences exhibit at least 98% identity of the amino acid residues. In some aspects, homologous amino acid sequences exhibit at least 99% identity of the amino acid residues.
[0512] The term “antibody” refers to an intact immunoglobulin of any class or isotype, or a fragment thereof, or a variant that can compete with the intact antibody for specific binding to the target antigen. An isotype refers to the genetic variations or differences in the constant regions of the heavy and light chains of an antibody. In humans, there are five heavy chain isotypes: IgA, IgD, IgG, IgE, and IgM and two light chain isotypes: kappa and lambda. The IgG class is divided into four isotypes: IgG1, IgG2, IgG3 and IgG4 in humans, and IgG1, IgG2a, IgG2b and IgG3 in mice. They share more than 95% homology in the amino acid sequences of the Fc regions but show major differences in the amino acid composition and structure of the hinge region. The term “antibody” includes a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multi-specific antibody, a DARPin, or a variant of each thereof. Also contemplated are antibodies having specificity for more than one antigen or target, including bispecific antibodies, trispecific antibodies, tetraspecific antibodies, and other multispecific antibodies. As used herein, an “antibody” includes whole antibodies and any antigen binding fragment or a single chain thereof. Thus the term “antibody” includes any protein or peptide containing molecule that comprises at least a portion of an immunoglobulin molecule. 
As used herein, the terms “antibody” or “immunoglobulin” are used interchangeably and refer to any of several classes of structurally related proteins that function as part of the immune response of an animal, including IgM, IgD, IgG, IgA, IgE, and related proteins, as well as polypeptides comprising antibody CDR domains that retain antigen-binding activity. Examples of such include but are not limited to a complementarity determining region (CDR) of a heavy or light chain or a ligand binding portion thereof, a heavy chain or light chain variable region, a heavy chain or light chain constant region, a framework (FR) region or any portion thereof or at least one portion of a binding protein.
[0513] The term “polypeptide construct” as used herein, refers to a polypeptide engineered by combining an amino acid sequence with two or more different moieties (for example targeting moieties selected from proteins, peptides, oligonucleotides, aptamers, ligands, small molecules, or any combination thereof). The term “fusion polypeptide” as used herein, refers to a polypeptide engineered by combining sequences from two or more different proteins or peptides into a single polypeptide chain. These combined sequences typically retain one or more of their functional domains, allowing the fusion protein to exhibit multiple properties or activities from its constituent parts. The term polypeptide construct encompasses fusion polypeptides as well as polypeptides (polypeptides or fusion polypeptides) comprising non-protein/peptide moieties. In some aspects, the terms “polypeptide construct” or “fusion polypeptide” can be used interchangeably to refer to an engineered polypeptide comprising two or more protein sequences. In some aspects, the terms “polypeptide construct” or “fusion polypeptide” cannot be interchangeable, wherein the polypeptide construct can comprise one or more proteins and a non-protein/peptide component. In some aspects, the polypeptide construct or fusion polypeptide can further comprise one or more additional sequences, for example a leader sequence, one or more purification tags, one or more solubility tags, one or more linker sequences, one or more protease cleavage tags, fluorophores, fluorescent proteins or peptides, or any combination thereof. The one or more protein sequences that are incorporated into the polypeptide construct can be wild-type protein sequences, or variants thereof, or engineered protein sequences, or variants thereof.
[0514] As used herein, the term “promoter” refers to a nucleic acid fragment that functions to control the transcription of one or more genes (or coding sequence), located upstream with respect to the direction of transcription of the transcription initiation site of the gene, and is structurally identified by the presence of a binding site for DNA-dependent RNA polymerase, transcription initiation sites and any other DNA sequences, including, but not limited to transcription factor binding sites, repressor and activator protein binding sites, and any other sequences of nucleotides known to one of skill in the art to act directly or indirectly to regulate the amount of transcription from the promoter. A “constitutive” promoter is a promoter that is active under most physiological and developmental conditions. An “inducible” promoter is a promoter that is regulated depending on physiological or developmental conditions. A “tissue specific” promoter is preferentially active in specific types of differentiated cells/tissues.
[0515] As used herein the term “affinity interactions” refers to the specific, non-covalent binding between two molecules, for example a ligand and its receptor, based on complementary shapes, charge distribution, and molecular interactions such as hydrogen bonding, van der Waals forces, and hydrophobic effects. These interactions are fundamental to numerous biological processes, including enzyme-substrate binding, antigen-antibody recognition, hormone-receptor binding, and DNA-protein interactions. The strength of these interactions, often expressed as the equilibrium dissociation constant (Kd), where a lower Kd indicates tighter binding, determines the stability and specificity of the binding event. High-affinity interactions, like the biotin-streptavidin pair, which is among the strongest known in nature, are utilized in molecular biology for techniques like affinity purification, immunoprecipitation, and diagnostics. It is specifically contemplated that any limitation discussed with respect to one aspect of the disclosure can apply to any other aspect of the disclosure. Furthermore, any composition of the disclosure can be used in any method of the disclosure, and any method of the disclosure can be used to produce or to utilize any composition of the disclosure. Aspects set forth in the Examples are also aspects that can be implemented in the context of aspects discussed elsewhere in a different Example or elsewhere in the application, such as in the Summary, Detailed Description, Claims, and Brief Description of the Drawings.
II. Polypeptides and Compositions
[0516] In some aspects, the current disclosure encompasses polypeptide constructs comprising a targeting moiety and a reverse transcriptase (RT) enzyme, or a functional variant thereof. In some aspects, the polypeptide comprising the RT enzyme is directed to a biological target by the targeting moiety. In some aspects, the RT enzyme is covalently linked to the targeting moiety, for example via a linker. In some aspects, the RT enzyme is non-covalently linked to the targeting moiety via an affinity interaction.
A. Targeting moiety
[0517] A targeting moiety, as used herein, can be any specific molecule or structure that directs an RTase to a particular biological target. In some aspects, the biological target can be an RNA binding protein (RBP), an RNA modification site, a nucleic acid or an antibody. In some aspects, the biological target can be in proximity to, or part of, an RNA molecule, which can act as a template for a reverse transcription reaction. In some aspects, wherein the biological target is an RBP, the targeting moiety may be referred to as an RBP-targeting moiety.
[0518] In some aspects, the targeting moiety can comprise an Fc binding protein, or a variant thereof; an antibody, or a variant thereof; an oligonucleotide, or a variant thereof; a peptide, a receptor, an aptamer, a ligand, a small molecule, a nucleoside, or any combination thereof. In some aspects, the targeting moiety is an Fc binding protein or peptide. An Fc binding protein or peptide binds specifically to an Fc (Fragment crystallizable) region of an immunoglobulin (an antibody, or an antibody-like polypeptide, for example, a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multi-specific antibody, a DARPin, or any variant of each thereof).
[0519] Non-limiting examples of targeting moieties that bind the Fc region of an antibody include protein A, protein G, protein A/G (pAG), protein L, anti-rabbit IgG, and/or anti-mouse IgG. In some embodiments, a targeting moiety that binds the Fc region of an antibody expressly does not include protein A, protein G, protein A/G (pAG), protein L, anti-rabbit IgG, and/or anti-mouse IgG. In some aspects, the Fc binding protein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 8, 10, 12 or 14, or an amino acid sequence at least 60% identical thereto. In some aspects, the Fc binding protein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 8, 10, 12 or 14, or an amino acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identical thereto. In some aspects, the current disclosure also encompasses polynucleotide sequences encoding the disclosed Fc binding protein. In some aspects, the polynucleotide sequence encoding the Fc binding protein can comprise a nucleic acid sequence as set forth in any one of SEQ ID NOs: 7, 9, 11, or 13, or a nucleic acid sequence with at least 60% identity thereto. In some aspects, the polynucleotide sequence encoding the Fc binding protein can comprise a nucleic acid sequence as set forth in any one of SEQ ID NOs: 7, 9, 11, or 13, or a nucleic acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identical thereto. 
The polynucleotide can be an isolated DNA, a plasmid, a transposon, a viral vector, a genome integrated polynucleotide, or a chromosome.
[0520] The polynucleotide can further comprise a nucleic acid sequence encoding the reverse transcriptase enzyme as disclosed herein. In some aspects, the Fc binding targeting moiety can bind the Fc region of an antibody, wherein the antibody specifically binds an RNA binding protein (RBP). In some aspects, the RBP can be a transcription factor, a splicing factor, an RNA helicase, a ribonuclease, an RNA polymerase, a translation initiation factor, or a ribosomal protein. Non-limiting examples of RBPs include YTHDF1, YTHDF2, YTHDC1, HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, HNRNPC, or any variants thereof, or any combinations thereof. In some aspects, the RBP is G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, YTHDC1, or any variants thereof, or any combinations thereof. In some embodiments, RBPs expressly do not include YTHDF1, YTHDF2, YTHDC1, HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, HNRNPC, or any variants thereof, or any combinations thereof.
In some aspects, the RBP expressly does not include G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, YTHDC1, or any variants thereof, or any combinations thereof. In some aspects, the Fc binding targeting moiety can bind an antibody that specifically targets an RNA modification site. Non-limiting examples of RNA modifications that can be targeted include m6C, m5C, mxA, m7G, or a pseudouridine modification. In some aspects, the Fc binding protein can specifically bind a secondary binding agent, for example a secondary antibody, that binds a primary antibody targeting an RBP or an RNA modification site.
[0521] In some aspects, the targeting moiety can be an antibody. As indicated, the term antibody is applied broadly here to comprise a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, a single chain antibody, a single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multi-specific antibody, a DARPin, a polypeptide comprising the CDRs of an antibody, or a variant of each thereof. In some aspects, the antibody can specifically bind an RBP. In some aspects, any RBP can be used as a bait for the targeting moiety. The RBP can be a transcription factor, a splicing factor, an RNA helicase, a ribonuclease, an RNA polymerase, a translation initiation factor, or a ribosomal protein. Non-limiting examples of RBPs include YTHDF1, YTHDF2, YTHDC1, HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, HNRNPC, or any variants thereof, or any combinations thereof. In some aspects, the RBP is G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, YTHDC1, or any variants thereof, or any combinations thereof. In some aspects, the antibody can specifically bind an RNA modification site. Non-limiting examples of RNA modifications that can be targeted include m6C, m5C, nfA, m7G, or a pseudouridine modification. In some aspects, the antibody (e.g., a secondary antibody) can specifically bind the Fc region of another antibody that specifically binds an RBP or an RNA modification.
In some aspects, the secondary antibody can be an anti-rabbit IgG, and/or anti-mouse IgG. Both anti-rabbit IgG and anti-mouse IgG are categorized under the IgG class of antibodies. Each IgG antibody is composed of two identical heavy chains, approximately 50 kDa each, and two identical light chains, approximately 25 kDa each. In some aspects, the secondary antibody can comprise an amino acid sequence as set forth in any one of SEQ ID NOs: 16 or 18, or an amino acid sequence at least 60% identical thereto. In some aspects, the secondary antibody can comprise an amino acid sequence as set forth in any one of SEQ ID NOs: 16, or 18, or an amino acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identical thereto. In some aspects, the current disclosure also encompasses polynucleotide sequences encoding the disclosed secondary antibody. In some aspects, the polynucleotide sequence encoding the secondary antibody comprises a nucleic acid sequence as set forth in any one of SEQ ID NOs: 15 or 17, or a nucleic acid sequence with at least 60% identity thereto. In some aspects, the polynucleotide sequence encoding the secondary antibody comprises a nucleic acid sequence as set forth in any one of SEQ ID NOs: 15 or 17, or a nucleic acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identical thereto.
[0522] In some aspects, the targeting moiety can be a small molecule or a ligand. In some aspects, the small molecule can specifically bind an RBP. The RBP can be a transcription factor, a splicing factor, an RNA helicase, a ribonuclease, an RNA polymerase, a translation initiation factor, or a ribosomal protein. Non-limiting examples of RBPs include YTHDF1, YTHDF2, YTHDC1, HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, HNRNPC, or any variant thereof, or any combinations thereof. In some aspects, the RBP is G3BP1, PTBP1, RBFOX2, HNRNPC, YTHDF1, YTHDF2, YTHDC1, or any variant thereof, or any combinations thereof. Non-limiting examples of ligands that can bind RBPs include certain quinoline derivatives, small molecule inhibitors like isoxazole, indole-based compounds, benzothiazole derivatives, and pyrimidine analogs.
[0523] In some aspects, the targeting moiety is an oligonucleotide or a variant thereof, wherein the oligonucleotide specifically binds an RNA or a DNA sequence of interest. In some aspects, the oligonucleotide binds a nucleic acid sequence at or in proximity of the RNA that can act as a template for the RT enzyme. In some aspects, the oligonucleotide comprises, consists essentially of, or consists of an RNA molecule, a DNA molecule, an LNA (locked nucleic acid), a PNA (peptide nucleic acid), a morpholino oligonucleotide, a phosphorothioate oligonucleotide, a gapmer, a 2'-fluoro-modified RNA, or an aptamer sequence. In some aspects, the oligonucleotide or variant thereof further comprises a label, a barcode, a modified nucleotide, an index, or an affinity tag sequence. Non-limiting examples of labels and tags include streptavidin and Avitag™, biotin, or a fluorophore. Non-limiting examples of modifications that can be incorporated into the oligonucleotide include modified bases, modified sugar moieties, and modified phosphate backbones.
Examples of modified base moieties which can be incorporated at any position on its structure include, but are not limited to: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, 2,6-diaminopurine and biotinylated analogs, amongst others. Examples of modified sugar moieties which may be used to modify nucleotides at any position on its structure include, but are not limited to, arabinose, 2-fluoroarabinose, xylose, and hexose, or a modified component of the phosphate backbone, such as a phosphorothioate, a phosphorodithioate, a phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a methylphosphonate, an alkyl phosphotriester, or a formacetal or analog thereof.
[0524] In some aspects, the RT enzyme may bind the tag sequence, for example a streptavidin-conjugated RT enzyme may bind the biotinylated oligonucleotide. In some aspects, the RT enzyme may be covalently linked to the oligonucleotide. In some aspects, the oligonucleotide specifically binds an RNA modification site or an RBP as disclosed herein.
[0525] In some aspects, the targeting moiety can be an aptamer. Aptamers are oligomers of artificial ssDNA, RNA, XNA (xeno nucleic acids), or peptides that bind a specific target molecule, or family of target molecules. They exhibit a range of affinities, with variable levels of off-target binding, and are sometimes classified as chemical antibodies. Peptide aptamers can include a peptide loop (which is specific for a target protein) attached at both ends to a protein scaffold. This double structural constraint greatly increases the binding affinity of the peptide aptamer to levels comparable to an antibody. The variable loop length is typically 8 to 20 amino acids (e.g., 8 to 12 amino acids), and the scaffold can be any protein which is stable, soluble, small, and non-toxic (e.g., thioredoxin-A, stefin A triple mutant, green fluorescent protein, eglin C, and cellular transcription factor Sp1). Peptide aptamer selection can be made using different systems, such as the yeast two-hybrid system (e.g., Gal4 yeast-two-hybrid system) or the LexA interaction trap system. In some aspects, the peptide aptamer specifically binds the RBP as provided herein or an RNA modification site as provided herein.
[0526] Nucleic acid aptamers are single-stranded nucleic acid (DNA or RNA) ligands that function by folding into a specific globular structure that dictates binding to target proteins, nucleic acids, or other molecules with high affinity and specificity, as described by Osborne et al., Curr. Opin. Chem. Biol. 1:5-9, 1997; and Cerchia et al., FEBS Letters 528:12-16, 2002. In particular aspects, aptamers are small (15 KD; or between 15-80 nucleotides or between 20-50 nucleotides). Aptamers are typically isolated from libraries consisting of 10^14-10^15 random oligonucleotide sequences by a procedure termed SELEX (systematic evolution of ligands by exponential enrichment). Further methods of generating aptamers are described in, for example, U.S. Pat. Nos. 6,344,318; 6,331,398; 6,110,900; 5,817,785; 5,756,291; 5,696,249; 5,670,637; 5,637,461; 5,595,877; 5,527,894; 5,496,938; 5,475,096; and 5,270,16. Spiegelmers are similar to nucleic acid aptamers except that at least one β-ribose unit is replaced by β-D-deoxyribose or a modified sugar unit selected from, for example, β-D-ribose, α-D-ribose, and β-L-ribose. In some aspects, an aptamer for use in the current disclosure specifically binds an RBP or an RNA modification site. Non-limiting examples of aptamers that bind RBPs, for example transcription factors, include aptamers to NF-κB, Sp1, AP-1, c-Myc, STAT3, or TATA-binding protein.
[0527] Some aspects of the present disclosure are directed to and/or comprise polypeptide constructs comprising the targeting moiety fused to a RTase.
B. Polypeptide constructs and polynucleotides encoding the same
[0528] In some aspects, the current disclosure also encompasses polypeptide constructs that comprise a targeting moiety and an RTase as disclosed herein. In some aspects, a targeting moiety is covalently linked to the RTase, which allows site-specific delivery of the RTase to the biological target of interest. In some aspects, a polypeptide construct is a fusion protein, comprising sequences from two or more polypeptides. In some aspects, a polypeptide construct is not a fusion protein, and comprises at least one non-polypeptide moiety. In some aspects, the polypeptide construct can further comprise additional peptide sequences, for example a leader sequence, one or more purification tags, one or more solubility tags, one or more linker sequences, one or more protease cleavage tags, or any combination thereof.
[0529] In some aspects, a polypeptide construct can comprise an RTase and a targeting moiety, for example, an Fc binding protein, or a variant thereof; an antibody, or a variant thereof; an oligonucleotide; a peptide; a receptor; an aptamer; a ligand; a small molecule; or any combination thereof. In some aspects, a polypeptide construct comprises an RTase, or a functional variant thereof; and an Fc binding protein, or a functional variant thereof. In some aspects, the RTase may be any RTase known in the art, or a functional variant thereof. These include RTases found in viruses, bacteria, plants and animals. Non-limiting examples include Human T-Cell Leukemia Virus RTase, Hepadnavirus RTase, Moloney murine leukemia virus (MMLV) RTase, avian myeloblastosis virus (AMV) RTase, and human immunodeficiency virus (HIV) RTase. In some aspects, the RTase is not HIV RTase. In some aspects, the RTase is not AMV RTase. In some aspects, the RTase is not Human T-Cell Leukemia Virus RTase. In some aspects, the RTase is not Hepadnavirus RTase. In some aspects, a polypeptide construct comprises an RTase, for example, Moloney murine leukemia virus (MMLV) RTase, avian myeloblastosis virus (AMV) RTase, human immunodeficiency virus (HIV) RTase, or a functional variant thereof; and an Fc binding protein, or a variant thereof. In some aspects, a polypeptide construct comprises an RTase that is enzymatically active at a temperature below 46 °C, 45 °C, 44 °C, 43 °C, 42 °C, 41 °C, or 40 °C. In some aspects, a polypeptide construct comprises an RTase that is enzymatically active at 37 °C.
[0530] In some aspects, the polypeptide construct comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 2, 4 or 6, or a sequence that is, is at least, or is about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, identical thereto; and an amino acid sequence as set forth in any one of SEQ ID NOs: 8, 10, or 12, or a sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity thereto.
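By way of non-limiting illustration only, the percent identity thresholds recited above can be evaluated computationally. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length (with "-" marking gaps) and assuming that identity is counted over all aligned columns; other definitions of the denominator exist and may give different values. The function names and the toy sequences are hypothetical and are not any of the SEQ ID NOs of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap ('-') are ignored; a column
    counts as a match only if both residues are identical and not a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 60.0) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "MKTAYIAKQR"  # hypothetical toy sequence
    variant = "MKTAYLAKQK"    # two substitutions relative to the reference
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant, threshold=60.0))
```

In practice the alignment itself would be produced by a dedicated alignment tool before a calculation of this kind is applied.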
[0531] In some aspects, the polynucleotide construct may further comprise a linker sequence. Suitable linker sequences are well known in the art, and can comprise a glycine-serine linker. The term “linker” according to the disclosure relates to a peptide between two protein domains to connect said domains. There is no particular limitation regarding the linker sequence. However, it is preferred that the linker sequence reduces steric hindrance between the two peptide domains, and is well translated. The linker can comprise 3 or more, 6 or more, 9 or more, 10 or more, 15 or more, 20 or more and in some aspects, up to 100, up to 90, up to 80, up to 70 or up to 60, up to 50, up to 45, up to 40, up to 35, or up to 30 amino acids. The linker may be enriched in glycine and/or serine amino acids. In some aspects, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95%, or more of the amino acids of the linker are glycine and/or serine. In some aspects, a linker is substantially composed of the amino acids glycine and serine. In some aspects, the linker is a Glycine/Serine linker and comprises the amino acid sequence (Gly-Gly-Gly-Ser)n or (Gly-Gly-Gly-Gly-Ser)n, where n is a positive integer equal to or greater than 1. For example, n=1, n=2, n=3, n=4, n=5, n=6, n=7, n=8, n=9 or n=10. In some aspects, the linkers include, but are not limited to, (Gly4Ser)4 or (Gly4Ser)3. In another aspect, the linkers include multiple repeats of (GlyxSer)n, where x=1, 2, 3, 4 or 5 and n is 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10. In some aspects, the linker comprises an amino acid sequence as set forth in SEQ ID NO: 28, or an amino acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, sequence identity thereto. In some aspects, the linker sequence can be encoded by a polynucleotide sequence comprising a nucleic acid sequence as set forth in SEQ ID NO: 27, or a sequence at least 60% identical thereto. In some aspects, the linker sequence can be encoded by a polynucleotide sequence comprising a nucleic acid sequence as set forth in SEQ ID NO: 27, or a nucleic acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, sequence identity thereto.
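By way of non-limiting illustration only, the (GlyxSer)n linker family and the glycine/serine enrichment described above can be sketched as follows. The function names are hypothetical, one-letter amino acid codes are assumed, and the sequences generated here are generic examples rather than SEQ ID NO: 28.

```python
def gly_ser_linker(x: int, n: int) -> str:
    """Build a (Gly_x Ser)_n linker in one-letter code, e.g. x=4, n=3."""
    if x < 1 or n < 1:
        raise ValueError("x and n must be positive integers")
    return ("G" * x + "S") * n


def gly_ser_fraction(linker: str) -> float:
    """Fraction of residues in the linker that are glycine or serine."""
    if not linker:
        raise ValueError("linker must not be empty")
    return sum(1 for residue in linker.upper() if residue in "GS") / len(linker)


if __name__ == "__main__":
    linker = gly_ser_linker(x=4, n=3)  # (Gly4Ser)3
    print(linker, len(linker), gly_ser_fraction(linker))
    # A hypothetical mixed linker, to show the enrichment calculation
    print(gly_ser_fraction("GGSAEGGS"))
```

A helper of this kind can be used to check a candidate linker against the length limits and glycine/serine percentages recited above.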
[0532] In some aspects, the polypeptide construct can comprise one or more peptide tag sequences. Peptide tag sequences are well known in the art and can comprise purification tags, solubilization tags, and cleavage tags. Non-limiting examples of peptide tags include a maltose binding protein (MBP) tag, a GST-tag, a FLAG tag, an HA tag, a His-tag, a SUMO-tag, a Trx-tag, or a Halo-tag. In some aspects, the purification tag comprises an amino acid sequence as set forth in SEQ ID NO: 30, or an amino acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, sequence identity thereto. In some aspects, the peptide tag sequence can be encoded by a polynucleotide sequence comprising a nucleic acid sequence as set forth in SEQ ID NO: 29, or a sequence at least 60% identical thereto. In some aspects, the peptide tag sequence can be encoded by a polynucleotide sequence comprising a nucleic acid sequence as set forth in SEQ ID NO: 29, or a nucleic acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, sequence identity thereto.
[0533] In some aspects, the polypeptide construct comprises a polypeptide sequence comprising an amino acid sequence as set forth in any one of SEQ ID NOs: 20, 22, or 24, or a sequence at least 60% identical thereto. In some aspects, the polypeptide construct comprises a polypeptide sequence comprising an amino acid sequence as set forth in any one of SEQ ID NOs: 20, 22, or 24, or an amino acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, sequence identity thereto. In some aspects, the polypeptide sequence can be encoded by a polynucleotide sequence comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 19, 21, or 23, or a sequence at least 60% identical thereto. In some aspects, the current disclosure also encompasses a polynucleotide sequence encoding the polypeptide sequence disclosed herein. In some aspects, the polypeptide sequence can be encoded by a polynucleotide sequence comprising a nucleic acid sequence as set forth in any one of SEQ ID NOs: 19, 21, or 23, or a nucleic acid sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, sequence identity thereto.
[0534] In some aspects, the current disclosure also encompasses a polynucleotide encoding the polypeptides disclosed herein. In some aspects, the polynucleotide encoding the disclosed polynucleotide construct can be an isolated DNA, a plasmid, a transposon, a viral vector, a genome integrated polynucleotide, or a chromosome. In some aspects, the polynucleotide sequence can further comprise one or more regulatory sequences. A regulatory sequence refers to any genetic element that is known to drive or otherwise regulate expression of nucleic acids. Non-limiting examples include promoters, transcription terminators, enhancers, repressors, silencers, Kozak sequences, polyA sequences, ribosome skipping sequences (for example, sequences encoding P2A and T2A peptides) and the like. In some aspects, a regulatory sequence can, for example, be inducible, non-inducible, constitutive, cell-cycle regulated, metabolically regulated, and the like. A regulatory sequence may comprise a promoter. In some aspects, the nucleic acid sequence encoding the fusion polypeptide can be operably linked to the promoter. In some aspects, the promoter can be an inducible promoter, a constitutive promoter, a tissue specific promoter, a weak promoter, a strong promoter, or combinations thereof. In some aspects, the promoter may also comprise an enhancer sequence.
[0535] In some aspects, the polynucleotide construct disclosed herein may further comprise a fluorophore. In some aspects, the fluorophore comprises, consists essentially of, or consists of Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, Texas Red, Alexa Fluor dyes (e.g., Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g., Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof. In some aspects, polynucleotide constructs described herein do not comprise a fluorophore. In some aspects, a polynucleotide construct does not comprise Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, Texas Red, Alexa Fluor dyes (e.g., Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g., Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof.
C. Transcriptase composition
[0536] In some aspects, the current disclosure also encompasses transcriptase compositions comprising one or more of the polypeptide constructs disclosed herein. In some aspects, the transcriptase composition further comprises additional molecules (together referred to as the transcriptase mix). Non-limiting examples of molecules that can be incorporated into the transcriptase mix include primers (for example, an adapter-RT primer comprising an adapter primer sequence and an RT primer sequence), dNTPs, buffer solutions, magnesium ions (Mg2+), and RNase inhibitor. In some aspects, the transcriptase composition comprises one or more polypeptide constructs disclosed herein, and a transcriptase mix comprising one or more adapter-RT primers, wherein the one or more adapter-RT primers each comprise an adapter primer sequence and an RT primer sequence. An adapter primer in sequencing is a short, synthetic oligonucleotide used to facilitate the attachment of DNA fragments to a sequencing platform. It typically contains two parts: a complementary sequence that binds to the target DNA or RNA fragment and an adapter sequence that is recognized by the sequencing machinery. Without being bound by theory, adapter primers are in some aspects crucial in next-generation sequencing (NGS) workflows, where they enable DNA fragments to be amplified and sequenced. They also allow for the incorporation of additional sequences like barcodes or indexes, which help identify different samples within a sequencing run. In some aspects, the adapter primer comprises a barcode.
[0537] In some aspects, the adapter RT primer of the current disclosure comprises an adapter sequence, which is a short, synthetic oligonucleotide used to facilitate the attachment of DNA fragments to a sequencing platform and which can further comprise barcodes, indexes, etc., and an RT primer. RT primer sequences are short, single-stranded sequences of nucleotides used to initiate the synthesis of complementary DNA (cDNA) from an RNA template during reverse transcription. There are three common types of primers used for reverse transcription: oligo(dT) primers, random RT primers (for example, random 6-mers, 7-mers, 8-mers, 9-mers, 10-mers, 11-mers, 12-mers), and gene-specific primers. Oligo(dT) primers are used to bind the poly-A tail of eukaryotic mRNA, ensuring that only mRNA is reverse transcribed. Random RT primers are short, random sequences that bind to multiple locations on the RNA, allowing for the reverse transcription of all RNA species, including non-polyadenylated RNA. Gene-specific primers are designed to bind to a specific region of a target RNA, facilitating the reverse transcription of particular genes or transcripts.
[0538] In some aspects, any type of suitable RT primer may be used, or may be expressly excluded from the transcriptase composition. In some aspects, the RT primers do not comprise oligo(dT) primers. In some aspects, the RT primer comprises a gene-specific primer. In some aspects, the RT primer does not comprise a gene-specific primer. In some aspects, the RT primer is a random RT primer. In some aspects, the random RT primer is a hexamer. In some aspects, the random RT primer is not a hexamer, but an oligonucleotide greater than six nucleotides in length. In some aspects, the random RT primer is at least 8 nucleotides in length, at least 9 nucleotides in length, at least 10 nucleotides in length, at least 11 nucleotides in length, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, or more nucleotides in length. In some aspects, the random RT primer is between 8 and 20 nucleotides in length. In some aspects, the random RT primer is less than 20 nucleotides in length. In some aspects, the random RT primers comprise, consist of, or consist essentially of at least septamers, octamers, nonamers, decamers, undecamers, dodecamers, tridecamers, tetradecamers, pentadecamers, hexadecamers, heptadecamers, octadecamers, nonadecamers, or eicosamers.
[0539] In some aspects, the random RT primer may further comprise a reactive moiety such that it can react with a barcoded oligonucleotide comprising an antibody or targeting moiety, wherein the barcode oligonucleotide comprises a corresponding reactive moiety for click chemistry. A reactive moiety of a random RT primer may be selected from the non-limiting group consisting of azides, alkynes, nitrones (e.g., 1,3-nitrones), strained alkenes (e.g., trans-cycloalkenes such as cyclooctenes or oxanorbornadiene), tetrazines, tetrazoles, iodides, thioates (e.g., phosphorothioate), acids, amines, and phosphates. For example, the first reactive moiety of the RT primer may comprise an azide moiety, and a second reactive moiety of the barcode oligonucleotide may comprise an alkyne moiety. The first and second reactive moieties may react to form a linking moiety. A reaction between the first and second reactive moieties may be, for example, a cycloaddition reaction such as a strain-promoted azide-alkyne cycloaddition, a copper-catalyzed azide-alkyne cycloaddition, a strain-promoted alkyne-nitrone cycloaddition, a Diels-Alder reaction, a [3+2] cycloaddition, a [4+2] cycloaddition, or a [4+1] cycloaddition; a thiol-ene reaction; a nucleophilic substitution reaction; or another reaction. In some cases, reaction between the first and second reactive moieties may yield a triazole moiety or an isoxazoline moiety. A reaction between the first and second reactive moieties may involve subjecting the reactive moieties to suitable conditions such as a suitable temperature, pH, or pressure and providing one or more reagents or catalysts for the reaction. For example, a reaction between the first and second reactive moieties may be catalyzed by a copper catalyst, a ruthenium catalyst, or a strained species such as a difluorooctyne, dibenzylcyclooctyne, or biarylazacyclooctynone. In some aspects, the random RT primer disclosed herein may further comprise an azide functional group (NNNN-N3).
[0540] In some aspects, the transcriptase composition comprises, consists of, or consists essentially of at least one polypeptide construct, and a transcriptase mix comprising the adapter-RT primers, dNTPs, and other components for reverse transcription. In some aspects, the adapter-RT primers comprise, consist of, or consist essentially of an adapter primer fused to a random RT primer. In some aspects, an adapter-RT primer is encoded by a polynucleotide sequence with, with at least, or with about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identity to SEQ ID NO: 25.
SEQ ID NO: 25 - Adapter-RT primer polynucleotide coding sequence: AGACGTGTGCTCTTCCGATCTNNNNNNNNNN (SEQ ID NO: 25)
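The percent-identity language above can be illustrated with a minimal sketch. It assumes an ungapped, position-by-position comparison of equal-length sequences and treats each N in SEQ ID NO: 25 as matching any of A, C, G, or T. The disclosure does not specify an alignment method, so the function name `percent_identity` and the comparison rule are illustrative assumptions only.

```python
# Illustrative sketch: ungapped percent identity against SEQ ID NO: 25.
# The comparison rule (no gaps, N matches any base) is an assumption,
# not a method stated in the disclosure.

SEQ_ID_NO_25 = "AGACGTGTGCTCTTCCGATCTNNNNNNNNNN"  # adapter plus 10 random bases


def percent_identity(candidate: str, reference: str) -> float:
    """Return the ungapped percent identity of two equal-length sequences.

    An 'N' in the reference is counted as a match to A, C, G, or T.
    """
    candidate = candidate.upper()
    reference = reference.upper()
    if not reference:
        raise ValueError("reference sequence must not be empty")
    if len(candidate) != len(reference):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    matches = sum(
        1
        for c, r in zip(candidate, reference)
        if (r == "N" and c in "ACGT") or c == r
    )
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    # Hypothetical primer: the adapter portion followed by one possible 10-mer.
    primer = "AGACGTGTGCTCTTCCGATCTACGTACGTAC"
    print(percent_identity(primer, SEQ_ID_NO_25))
```

A gapped alignment tool could give a different value for sequences of unequal length; this sketch deliberately rejects that case rather than guessing.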
[0541] In some aspects, the adapter-RT primer comprises a nucleic acid sequence reverse complementary to a cDNA adaptor. In some aspects, the cDNA adaptor is encoded by a polynucleotide sequence with greater than, equal to, at least, at most, or about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identity to SEQ ID NO: 26.
SEQ ID NO: 26 - cDNA Adaptor
[0542] /5Phos/NNNNNNNNAGATCGGAAGAGCGTCGTGT/3SpC3/ (SEQ ID NO: 26).
[0543] In some aspects, the transcriptase mix comprises adapter-RT primers at a concentration of greater than, equal to, at least, at most, or about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 pM/mM, including any range or value derivable therein. In some aspects, the adapter-RT primer comprises, consists, or consists essentially of a sequence at least 80% identical to SEQ ID NO: 25.
[0544] In some aspects, the transcriptase mix further comprises nucleotides, for example dNTPs. In some aspects, the dNTPs comprise, consist, or consist essentially of dCTPs, dTTPs, dATPs, and dGTPs. In some aspects, the dNTPs comprise, consist, or consist essentially of at least one labeled dNTP. In some aspects, the labeled dNTP is labeled with biotin. In some aspects, the labeled dNTP is labeled with biotin-16. In some aspects, the labeled dNTP comprises, consists, or consists essentially of biotin-16-dUTP or biotin-16-dCTP. In some aspects, the labeled dNTP is mixed with a corresponding non-labeled dNTP. In some aspects, the labeled dNTP is mixed with a non-labeled dNTP at a ratio of greater than, equal to, at least, at most, or about 0.5:1, 0.6:1, 0.7:1, 0.8:1, 0.9:1, 1:1, 1.1:1, 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 1.9:1, or 2:1. In some aspects, the dNTPs comprise, consist, or consist essentially of a combination of biotin-16-dUTP, biotin-16-dCTP, dTTP, dCTP, dATP, or dGTP, or any combination thereof. In some aspects, the biotin-16-dUTP is at a concentration of greater than, equal to, at least, at most, or about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 pM/mM, including any range or value derivable therein. In some aspects, the biotin-16-dCTP is at a concentration of greater than, equal to, at least, at most, or about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 pM/mM, including any range or value derivable therein.
In some aspects, the dTTP is at a concentration of greater than, equal to, at least, at most, or about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 pM/mM, including any range or value derivable therein. In some aspects, the dCTP is at a concentration of greater than, equal to, at least, at most, or about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 pM/mM, including any range or value derivable therein. In some aspects, the dATP is at a concentration of greater than, equal to, at least, at most, or about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 pM/mM, including any range or value derivable therein. In some aspects, the dGTP is at a concentration of greater than, equal to, at least, at most, or about 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 pM/mM, including any range or value derivable therein.
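The labeled-to-unlabeled ratios recited above (for example 0.5:1, 1:1, or 2:1) can be turned into working concentrations with simple arithmetic. The sketch below assumes that a chosen total concentration of one nucleotide is divided between its labeled form (for example biotin-16-dCTP) and its non-labeled form (dCTP). The function name `split_by_ratio` and the example totals are hypothetical, and the result is in whatever unit the total is given in.

```python
from typing import Tuple


def split_by_ratio(total_conc: float, labeled_to_unlabeled: float) -> Tuple[float, float]:
    """Split a total concentration into (labeled, unlabeled) parts.

    labeled_to_unlabeled is the ratio written as X:1 in the text,
    e.g. 0.5 for a 0.5:1 mixture. Units follow the input.
    """
    if total_conc < 0 or labeled_to_unlabeled < 0:
        raise ValueError("concentration and ratio must be non-negative")
    # labeled / unlabeled = r and labeled + unlabeled = total
    # therefore labeled = total * r / (1 + r)
    labeled = total_conc * labeled_to_unlabeled / (1.0 + labeled_to_unlabeled)
    return labeled, total_conc - labeled


if __name__ == "__main__":
    # Hypothetical example: a total of 1.0 (arbitrary unit) at a 1:1 ratio.
    print(split_by_ratio(1.0, 1.0))
```

At a 2:1 ratio and a total of 3.0, the same function returns 2.0 labeled and 1.0 unlabeled.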
[0545] In some aspects, one or more nucleotides comprise modified bases, modified sugar moieties, and modified phosphate backbones. Examples of modified base moieties which can be incorporated at any position on its structure include, but are not limited to: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, 2,6-diaminopurine and biotinylated analogs, amongst others. Examples of modified sugar moieties which may be used to modify nucleotides at any position on its structure include, but are not limited to arabinose, 2-fluoroarabinose, xylose, and hexose, or a modified component of the phosphate backbone, such as phosphorothioate, a phosphorodithioate, a phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a methylphosphonate, an alkyl phosphotriester, or a formacetal or analog thereof.
[0546] In some aspects, the transcriptase mix may further comprise an RNase inhibitor (for example, a non-competitive inhibitor of pancreatic-type ribonucleases). In some aspects, the non-competitive inhibitor of pancreatic-type ribonucleases comprises, consists, or consists essentially of greater than, equal to, at least, at most, or about 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, or 5.0 U/µl RNaseOUT, including any range or value derivable therein.
[0547] In some aspects, the transcriptase mix may further comprise a buffer. In some aspects, any suitable buffer may be used. Non-limiting examples include Tris-HCl, MOPS, phosphate buffered saline (PBS), or Dulbecco’s phosphate buffered saline. In some aspects, the pH of the reaction mixture ranges from 5 to 9, such as from 7 to 9, including from 8 to 9, e.g., 8 to 8.5. In some instances, the reaction mixture includes or expressly does not include a pH adjusting agent. pH adjusting agents of interest include, but are not limited to, sodium hydroxide, hydrochloric acid, phosphoric acid buffer solution, citric acid buffer solution, Tris-HCl, MOPS, phosphate buffered saline (PBS), or Dulbecco’s phosphate buffered saline (DPBS), and the like. In some aspects, the buffer comprises, consists, or consists essentially of greater than, equal to, at least, at most, or about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 µl of DPBS, including any range or value derivable therein. In some aspects, MgCl2 is at a concentration of greater than, equal to, at least, at most, or about 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, or 10.0 mM, including any range or value derivable therein.
[0548] In some aspects, the transcriptase composition is provided to the sample for greater than, equal to, at least, or at most 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 minutes, including any range or value derivable therein, to obtain a cDNA. In some aspects, the transcriptase mix is provided to the sample at less than, equal to, about, or more than 34 °C, 35 °C, 36 °C, 37 °C, 38 °C, 39 °C, 40 °C, 41 °C, 42 °C, 43 °C, 44 °C, 45 °C, 46 °C, 47 °C, 48 °C, 49 °C, 50 °C, 51 °C, 52 °C, 53 °C, 54 °C, 55 °C, 56 °C, 57 °C, or 58 °C, or any range derivable therein. In some aspects, the transcriptase mix is provided to the sample at less than, equal to, about, or more than 34 °C, 35 °C, 36 °C, 37 °C, 38 °C, 39 °C, 40 °C, 41 °C, or 42 °C. In some aspects, the transcriptase mix is provided to the sample at 37 °C - 42 °C.
[0549] In some aspects, the transcriptase is enzymatically active at a temperature below 42 °C. In some aspects, the transcriptase comprises one or more mutations that render the transcriptase enzymatically active at a temperature below 42 °C.
[0550] In some aspects, the cDNA comprises, consists, or consists essentially of the dNTPs. In some aspects, the cDNA comprises, consists, or consists essentially of unlabeled dNTPs. In some aspects, the cDNA comprises, consists, or consists essentially of labeled dNTPs. In some aspects, the cDNA comprises both labeled and unlabeled dNTPs. In some aspects, the cDNA is biotinylated. In some aspects, the cDNA comprises biotin-16.
D. RNA binding proteins (RBP) and RBP targeting agents
[0551] In some aspects, the current disclosure encompasses methods for determining one or more interaction sites of an RNA-binding protein (RBP) in a biological sample. In some aspects, the current disclosure also encompasses methods for determining one or more interaction sites of more than one RBP in a biological sample. In some aspects, an RBP of the current disclosure is any protein that interacts with RNA molecules through RNA-binding domains or motifs. RBPs are essential regulators of gene expression, playing critical roles in almost every aspect of RNA metabolism, including transcription, splicing, transport, localization, translation, and degradation. By binding to specific RNA sequences or structures, RBPs control the fate and function of various types of RNAs, such as mRNA, rRNA, and noncoding RNAs. Their activity is vital for cellular processes like differentiation, development, and response to stress. Dysregulation of RBPs is associated with various diseases, including neurodegenerative disorders and cancers. Therefore, in some aspects, the current disclosure also encompasses using the compositions and methods disclosed herein for the study of RBPs and application of such studies for clinical and non-clinical developments.
[0552] Any RBP of interest, or variant thereof, is suitable for study using the current disclosure. In an aspect, the RBP may be a prokaryotic RBP or a eukaryotic RBP. In an aspect, the RBP may be an RBP commonly found in the eukaryotic kingdoms Animalia, Plantae, Fungi, or Protista. In an aspect, the RBP is a mammalian RBP. In an aspect, the RBP is from a laboratory animal, for example a mouse, a rat, a gerbil, a nematode, or a fruit fly. In an aspect, the RBP is a human RBP. The RBP can be a wild-type RBP, or a natural variant thereof, or an engineered RBP. In some aspects, a disclosed RBP comprises, consists essentially of, or consists of a transcription factor, a splicing factor, RNA helicase, ribonuclease, RNA polymerase, translation initiation factor, or ribosomal protein. Non-limiting examples of RBPs include YTHDF1, YTHDF2, YTHDC1, HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, and/or HNRNPC.
[0553] In some aspects, the current disclosure also encompasses targeting agents that specifically bind to an RBP. In some aspects, the RBP-targeting agent may comprise any molecule that specifically binds the RBP. Non-limiting examples include antibodies, and functional variants thereof, oligonucleotides or variants thereof, peptides, ligands, small molecules, or aptamers. In an aspect, the RBP-targeting agent is an antibody. The term antibody is used broadly here and comprises monoclonal antibodies, polyclonal antibodies, recombinant antibodies, IgG, Fv, single chain antibodies, single domain antibodies, nanobodies, diabodies, bispecific and/or multispecific antibodies, scFv, Fab, F(ab')2, Fab, or variants thereof.
[0554] In some aspects, the RBP-targeting agent can comprise an oligonucleotide comprising a DNA-barcode. In some aspects, the oligonucleotide is linked to the RBP-targeting agent via an amino spacer. In some aspects, the amino spacer is a 7 C6 amino spacer, wherein a non-nucleoside modification adds a primary amino group to an oligo's internal position. The amino group is separated from the 5' end nucleotide base by a 6-carbon spacer arm to reduce steric interaction.
[0555] The DNA-barcode can be unique for each RBP being studied. In some aspects, use of multiple barcoded antibodies, wherein each barcode is specific to an RBP, allows for studying more than one RBP using the methods disclosed herein. In some aspects, the oligonucleotide may further comprise a reactive moiety that is operable in attaching the barcode to a cDNA of the disclosed method. A reactive moiety of a barcoded antibody may be selected from the nonlimiting group consisting of azides, alkynes, nitrones (e.g., 1,3-nitrones), strained alkenes (e.g., trans-cycloalkenes such as cyclooctenes or oxanorbornadiene), tetrazines, tetrazoles, iodides, thioates (e.g., phosphorothioate), acids, amines, and phosphates. For example, the first reactive moiety of the RT primer may comprise an azide moiety, and a second reactive moiety of the barcode oligonucleotide may comprise an alkyne moiety. The first and second reactive moieties may react to form a linking moiety. A reaction between the first and second reactive moieties may be, for example, a cycloaddition reaction such as a strain-promoted azide-alkyne cycloaddition, a copper-catalyzed azide-alkyne cycloaddition, a strain-promoted alkyne-nitrone cycloaddition, a Diels-Alder reaction, a [3+2] cycloaddition, a [4+2] cycloaddition, or a [4+1] cycloaddition; a thiol-ene reaction; a nucleophilic substitution reaction; or another reaction. In some cases, reaction between the first and second reactive moieties may yield a triazole moiety or an isoxazoline moiety. A reaction between the first and second reactive moieties may involve subjecting the reactive moieties to suitable conditions such as a suitable temperature, pH, or pressure and providing one or more reagents or catalysts for the reaction. For example, a reaction between the first and second reactive moieties may be catalyzed by a copper catalyst, a ruthenium catalyst, or a strained species such as a difluorooctyne, dibenzylcyclooctyne, or biarylazacyclooctynone.
[0556] Table A provides a list of some exemplary oligonucleotides that may be linked to the RBP binding agent via an amino spacer, and that comprise an alkyne group reactive moiety.
Table A: List of oligonucleotides with barcodes that can be linked to an RBP-targeting agent via an amino spacer 7 C6 ([AmsC6]: 5' Amino C6 linker) and that comprise a reactive alkyne group ([PPG-3-O-N]: 3'-O-propargyl N, 2'-5' linked)
[Table A is reproduced as images in the published document: Figure imgf000089_0001, Figure imgf000090_0001, Figure imgf000091_0001]
[0557] In some aspects, the oligonucleotide comprises a nucleic acid sequence as set forth in any one of SEQ ID NOs: 31-78, or a nucleic acid sequence greater than, equal to, at least, at most, or about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identical thereto.
[0558] In some aspects, the RBP-targeting agent is labeled. In some aspects, the label comprises, consists essentially of, or consists of a radioisotope, a hapten, a fluorescent label, a fluorescent polypeptide, a phosphorescent molecule, a chemiluminescent molecule, a chromophore, a luminescent molecule, a photoaffinity molecule, a colored particle, and/or a ligand. In some aspects, the RBP-targeting agent comprises a fluorescent label. In some aspects, the fluorescent label comprises, consists essentially of, or consists of Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, Texas Red, Alexa Fluor dyes (e.g. Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g. Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof.
III. Methods
[0559] In some aspects, the current disclosure encompasses methods of determining one or more RNA interaction sites of an RNA-binding protein (RBP) in a biological sample. In some aspects, the method comprises the steps of a) contacting (e.g., incubating together) an RBP-targeting agent to the RBP, wherein the RBP-targeting agent specifically binds the RBP to form a first complex; b) contacting (e.g., incubating together) the first complex with one or more secondary binding agents that specifically bind the RBP-targeting agent, to form a second complex; c) incubating the first or the second complex with the transcriptase composition disclosed herein, to obtain cDNA; and d) sequencing the cDNA to determine the one or more RNA interaction sites of the RBP. In some aspects, the method may further comprise fixing the biological sample prior to steps (a) - (d). In some aspects, the method may further comprise permeabilizing the biological sample. Thus, in some aspects, a method comprises identifying one or more RNA interaction sites of an RNA-binding protein (RBP) in a biological sample, comprising: a) fixing the biological sample; b) contacting (e.g., incubating together) the biological sample with an agent that permeabilizes cell membranes; c) providing an RBP-targeting agent to the sample, wherein the RBP-targeting agent interacts with the RBP of interest; d) providing a transcriptase composition comprising a polypeptide construct comprising a targeting moiety and a reverse transcriptase enzyme, wherein the targeting moiety interacts with the RBP-targeting agent; e) incubating the sample with the transcriptase composition to produce cDNA; and f) sequencing the cDNA. These methods, and variations thereof, are broadly referred to herein as ARTR-seq.
[0560] In some aspects, also provided herein are methods for determining the RNA interaction sites of more than one RNA binding protein. In some aspects, the method may be used to map the RNA binding sites for greater than, equal to, at least, or at most 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, or 1500 RBPs. In some aspects, the methods disclosed herein may be used to map the RNA binding sites of all the RBPs in a cell. These methods are broadly referred to as multiplex ARTR-seq.
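Because each DNA barcode is specific to one RBP, sequenced cDNA can in principle be sorted by barcode before binding sites are mapped. The following is a minimal sketch of that sorting step. It assumes, purely for illustration, that the barcode is a fixed-length prefix of each read and that only exact matches are accepted. The barcode sequences, the barcode length, and the function name `demultiplex` are hypothetical and are not taken from Table A.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical barcode-to-RBP table for illustration only; these sequences
# are not the barcodes listed in Table A.
BARCODE_TO_RBP: Dict[str, str] = {
    "ACGTAC": "YTHDF1",
    "TGCATG": "YTHDF2",
}
BARCODE_LEN = 6  # assumed fixed barcode length


def demultiplex(reads: Iterable[str]) -> Dict[str, List[str]]:
    """Group reads by RBP using an exact match on the leading barcode.

    The barcode is trimmed from each read; reads whose prefix is not in
    the table are collected under "unassigned".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        rbp = BARCODE_TO_RBP.get(barcode, "unassigned")
        groups[rbp].append(insert)
    return dict(groups)


if __name__ == "__main__":
    example_reads = ["ACGTACGGGG", "TGCATGCCCC", "AAAAAATTTT"]
    for rbp, inserts in demultiplex(example_reads).items():
        print(rbp, inserts)
```

A practical pipeline would likely tolerate sequencing errors in the barcode and would need to reflect where the barcode actually sits in the read; both are left out here to keep the sketch short.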
[0561] In some aspects, also provided herein are modifications of the methods (ARTR-seq and multiplex ARTR-seq) for determining RNA binding sites with spatial resolution. These methods are broadly referred to herein as spatial ARTR-seq. In some aspects, any of the methods disclosed herein (for example, ARTR-seq, multiplex ARTR-seq, spatial ARTR-seq) may be modified to study RNA modification sites. The aspects provided herein are in no way limiting, and additional aspects with obvious modifications of the disclosed methods may be envisaged by a person of ordinary skill in the art. Some of these aspects are described in detail herein. Any one or more of the preceding steps of each of the methods disclosed can be excluded from certain aspects of the disclosure. A person of skill in the art is well aware of common techniques to accomplish each of the preceding steps.
A. ARTR-seq
[0562] In some aspects, the method is an ARTR-seq method. Some aspects of the method are disclosed herein.
1. Biological sample
[0563] The term biological sample, as used herein, encompasses any sample obtained from an organism or prepared in vitro to mimic a sample of biological origin. Non-limiting examples of biological samples include isolated or assembled RNA-protein complexes, biological fluid, cells, tissue samples, or biological materials derived from cells or tissue samples. In some aspects, the biological sample may be obtained from a prokaryotic or a eukaryotic organism. In some aspects, the eukaryotic organism may be from the kingdoms Animalia, Plantae, Fungi, or Protista. In an aspect, the eukaryotic organism is a mammal. In an aspect, the eukaryotic organism is a laboratory animal, for example a primate, a rodent (such as a mouse, a rat, or a gerbil), a nematode, or a fruit fly. In some aspects, the laboratory animal is a genetically engineered animal. In some aspects, the mammal is a human. In some aspects, the mammal has, or is at a risk of having, a disease.
[0564] In certain aspects, the disclosed methods comprise obtaining a sample (also a “biological sample”) from a subject wherein the subject has, or is at a risk of having a disease or disorder. In some aspects, the methods of obtaining a biological sample can include methods of biopsy such as fine needle aspiration, core needle biopsy, vacuum assisted biopsy, incisional biopsy, excisional biopsy, punch biopsy, shave biopsy or skin biopsy. In other aspects the sample can be obtained from any of the tissues provided herein that include but are not limited to non-cancerous or cancerous tissue and non-cancerous or cancerous tissue from the serum, gall bladder, mucosal, skin, heart, lung, breast, pancreas, blood, liver, muscle, kidney, smooth muscle, bladder, colon, intestine, brain, prostate, esophagus, or thyroid tissue. Alternatively, the sample can be obtained from any other source including but not limited to blood, sweat, hair follicle, buccal tissue, tears, menses, feces, or saliva. In certain aspects of the current methods, any medical professional such as a doctor, nurse or medical technician can obtain a biological sample for testing. Yet further, the biological sample can be obtained without the assistance of a medical professional.
[0565] A sample can include, but is not limited to, tissue, cells, or biological material from cells or derived from cells of a subject. The biological sample can be a heterogeneous or homogeneous population of cells or tissues. The biological sample can be obtained using any method known to the art that can provide a sample suitable for the analytical methods described herein. The sample can be obtained by non-invasive methods including but not limited to: scraping of the skin or cervix, swabbing of the cheek, saliva collection, urine collection, feces collection, collection of menses, tears, or semen.
[0566] The sample can be obtained by methods known in the art. In certain aspects the samples are obtained by biopsy. In other aspects the sample is obtained by swabbing, endoscopy, scraping, phlebotomy, or any other methods known in the art. In some cases, the sample can be obtained, stored, or transported using components of a kit of the present methods. In some cases, multiple samples, such as multiple esophageal samples, can be obtained for diagnosis by the methods described herein. In other cases, multiple samples, such as one or more samples from one tissue type (for example esophagus) and one or more samples from another specimen (for example serum), can be obtained for diagnosis by the methods. In some cases, multiple samples, such as one or more samples from one tissue type (e.g. esophagus) and one or more samples from another specimen (e.g. serum), can be obtained at the same or different times. Samples obtained at different times can be stored and/or analyzed by different methods. For example, a sample can be obtained and analyzed by routine staining methods or any other cytological analysis methods.
[0567] In some aspects the biological sample can be obtained by a physician, nurse, or other medical professional such as a medical technician, endocrinologist, cytologist, phlebotomist, radiologist, or a pulmonologist. The medical professional can indicate the appropriate test or assay to perform on the sample. In certain aspects a molecular profiling business can consult on which assays or tests are most appropriately indicated. In further aspects of the current methods, the patient or subject can obtain a biological sample for testing without the assistance of a medical professional, such as obtaining a whole blood sample, a urine sample, a fecal sample, a buccal sample, or a saliva sample.
[0568] In other cases, the sample is obtained by an invasive procedure including but not limited to: biopsy, needle aspiration, endoscopy, or phlebotomy. The method of needle aspiration can further include fine needle aspiration, core needle biopsy, vacuum assisted biopsy, or large core biopsy. In some aspects, multiple samples can be obtained by the methods herein to ensure a sufficient amount of biological material.
[0569] General methods for obtaining biological samples are also known in the art. Publications such as Ramzy, Ibrahim, Clinical Cytopathology and Aspiration Biopsy, 2001, which is herein incorporated by reference in its entirety, describe general methods for biopsy and cytological methods. In some aspects, the sample is a fine needle aspirate of an esophageal or a suspected esophageal tumor or neoplasm. In some cases, the fine needle aspirate sampling procedure can be guided by the use of an ultrasound, X-ray, or other imaging device.
[0570] In some aspects of the present methods, a molecular profiling business can obtain the biological sample from a subject directly, from a medical professional, from a third party, or from a kit provided by a molecular profiling business or a third party. In some cases, the biological sample can be obtained by the molecular profiling business after the subject, a medical professional, or a third party acquires and sends the biological sample to the molecular profiling business. In some cases, the molecular profiling business can provide suitable containers, and excipients for storage and transport of the biological sample to the molecular profiling business.
[0571] In some aspects of the methods described herein, a medical professional need not be involved in the initial diagnosis or sample acquisition. An individual can alternatively obtain a sample through the use of an over the counter (OTC) kit. An OTC kit can contain a means for obtaining said sample as described herein, a means for storing said sample for inspection, and instructions for proper use of the kit. In some cases, molecular profiling services are included in the price for purchase of the kit. In other cases, the molecular profiling services are billed separately. A sample suitable for use by the molecular profiling business can be any material containing tissues, cells, nucleic acids, genes, gene fragments, expression products, gene expression products, or gene expression product fragments of an individual to be tested. Methods for determining sample suitability and/or adequacy are provided.
[0572] In some aspects, the subject can be referred to a specialist such as an oncologist, surgeon, or endocrinologist. The specialist can likewise obtain a biological sample for testing or refer the individual to a testing center or laboratory for submission of the biological sample. In some cases the medical professional can refer the subject to a testing center or laboratory for submission of the biological sample. In other cases, the subject can provide the sample. In some cases, a molecular profiling business can obtain the sample.
2. Sample preparation
[0573] In some aspects, the current disclosure also encompasses methods of preparing the biological sample, as disclosed herein, for further processing. Methods for preparing the samples are well known in the art and can comprise use of common laboratory equipment, for example centrifuges, perfusion equipment, dissection equipment, cryostats, mounting equipment, mounting media, a solid surface (for example slides, multi-well plates, or capillaries), microscopes, and staining equipment. In an exemplary set up, once a tissue sample is obtained, the tissue may be placed in O.C.T., frozen in liquid nitrogen, and sliced using a cryostat (for example, Leica CM1900). The tissue sections may then be mounted on a suitable solid surface, and further fixed and permeabilized. In another exemplary set up, a tissue obtained may be further dissected into cells, diluted, and mounted on a solid surface. In yet another aspect, one or more cells from a cell line may be obtained and processed. In yet another exemplary aspect, a ribosome, a polysome, or other RNA-protein complexes may be isolated and used in the disclosed methods.
[0574] In some aspects, the processed biological sample may be fixed. A person of skill in the art is familiar with common techniques to accomplish fixation of a sample. In some aspects, the fixing step can comprise, consist, or consist essentially of rapidly freezing the sample, or can comprise, consist, or consist essentially of treating the sample with formaldehyde and/or paraformaldehyde (PFA).
[0575] A cellular sample can be fixed by treatment with a fixing agent. A fixing agent can comprise, consist, or consist essentially of a crosslinking agent, including aldehydes like formalin, glutaraldehyde, formaldehyde, PFA, or a precipitating agent, including organic solvents like methanol, acetone, or picric acid, or any combination thereof. In some aspects, the fixing step is quenched, for example with glycine. A person of skill in the art is familiar with common techniques to accomplish quenching of a fixing reaction, including addition of sodium borohydride, or addition of exogenous amine-containing reagents like ammonium chloride and/or glycine. In some aspects, the fixing step comprises, consists, or consists essentially of treating the sample with formaldehyde. In some aspects, the fixing step comprises, consists, or consists essentially of treating the sample with paraformaldehyde (PFA). In some aspects, the fixing step comprises, consists, or consists essentially of treating the sample with greater than, equal to, at least, or at most 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2.0%, 2.1%, 2.2%, 2.3%, 2.4%, or 2.5% PFA. In some aspects, the fixing step occurs for greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes, including any range or value derivable therein. In some aspects, the fixing step occurs at room temperature.
[0576] In some aspects, the fixing step is quenched. In some aspects, the fixing step is quenched with glycine. In some aspects, the quenching glycine is greater than, equal to, at least, at most 25, 50, 75, 100, 125, 150, 200, 225, or 250 mM, including any range or value derivable therein. In some aspects, the quenching step occurs for greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 minutes, including any range or value derivable therein. In some aspects, the quenching step occurs at room temperature.
[0577] In some aspects, the sample is permeabilized. In some aspects, a cell permeabilizing agent may comprise a detergent, an enzyme, a solvent, a small molecule, a buffer or any combination thereof. In some aspects, the cell permeabilizing agent comprises a detergent. In some aspects, the agent that permeabilizes cell membranes comprises, consists, or consists essentially of greater than or equal to 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, or 1.5%, including any range or value derivable therein, Triton X-100. In some aspects, the sample is contacted with the permeabilizing agent for greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes, including any range or value derivable therein. In some aspects, the contacting (e.g., incubating together) step occurs on ice.
[0578] In some aspects, the at least one RNase is optionally provided to the sample following the permeabilizing step or further downstream. In some aspects, the providing of the at least one RNase improves resolution during the sequencing step. In some aspects, the at least one RNase comprises, consists, or consists essentially of ribonuclease I (RNase I, via Thermo Fisher Scientific), RNase A, and/or RNase T1. In some aspects, the RNase is provided to the sample for greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes, including any range or value derivable therein. In some aspects, the at least one RNase is provided to the sample at 37 °C.
3. Primary and secondary complex formation
[0579] In some aspects, the current disclosure provides an ARTR-seq method for determining one or more RNA interaction sites of an RNA-binding protein (RBP) in a biological sample, comprising: contacting an RBP-targeting agent to the RBP, wherein the RBP-targeting agent specifically binds the RBP to form a first complex; contacting the first complex with one or more secondary binding agents that specifically bind the RBP-targeting agent, to form a second complex; incubating the first or the second complex with the transcriptase composition disclosed herein, to obtain cDNA; and sequencing the cDNA to determine the one or more RNA interaction sites of the RBP. A schematic of an exemplary ARTR-seq procedure is provided in FIG. 1A.
[0580] In some aspects, the method may further comprise one or more of a sample preparation step, a fixing step, a quenching step, a permeabilizing step, an RNase treatment, a blocking step, or any combination thereof, as disclosed herein and/or known in the art. In some aspects, the sample is blocked before the RBP-targeting agent is provided to the sample to form a first complex. A person of skill in the art is familiar with common techniques to accomplish sample blocking, which reduces background or non-specific staining of the sample. As is known to a person of skill in the art, agents like hydrogen peroxide, levamisole, avidin/biotin blocking reagents, and/or protein blocking solutions like BSA, gelatin, and/or non-fat dry milk can be used as blocking agents. In an aspect, the blocking agent may comprise BSA. In some aspects, the sample may be blocked using greater than, equal to, at least, or at most 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.4, 3.6, 3.8, 4, 4.2, 4.4, 4.6, 4.8, or 5.0 mg/mL blocking agent in any suitable buffer. In some aspects, the blocking may be done for greater than, equal to, at least, or at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, or more minutes at RT. In some aspects, the samples can be blocked for greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 hrs at 4 °C. In some aspects, practicing one or more of these steps, including the sample preparation step, the fixing step, the quenching step, the permeabilizing step, and the blocking step, provides a processed sample for use in downstream steps of the method.
[0581] In an aspect, the method comprises contacting one or more RBP-targeting agents disclosed herein with RBPs in the processed sample. In some aspects, the RBP-targeting agent may comprise any molecule as disclosed herein that specifically binds the RBP. Non-limiting examples include antibodies and functional variants thereof, oligonucleotides or variants thereof, peptides, ligands, small molecules, or aptamers. In some aspects, the RBP-targeting agent is an antibody. In an aspect, the contacting step may be carried out in any suitable buffer composition, for example Tris-HCl, MOPS, phosphate buffered saline (PBS), or Dulbecco’s phosphate buffered saline. In some aspects, the buffer composition may further comprise a blocking agent as disclosed herein. In some aspects, the RBP-targeting agent is incubated with the processed sample for greater than, equal to, at least, or at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, or more minutes at RT. In some aspects, the samples can be incubated with the RBP-targeting agent for greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 hrs at 4 °C. In some aspects, contacting of the RBP binding agent with the RBP forms a primary complex.
[0582] In some aspects, the primary complex may optionally be incubated with a secondary binding agent, which specifically binds the RBP-targeting agent. In some aspects, the secondary binding agent may comprise any molecule as disclosed herein that specifically binds the RBP binding agent. Non-limiting examples include antibodies and functional variants thereof, oligonucleotides or variants thereof, peptides, ligands, small molecules, or aptamers. In some aspects, the secondary binding agent is an antibody, for example an antibody that specifically binds the primary complex. In some aspects, the secondary binding agent is incubated with the sample for greater than, equal to, at least, or at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, or more minutes at RT. In some aspects, the samples can be incubated with the secondary binding agent for greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 hrs at 4 °C. In some aspects, contacting of the secondary binding agent with the primary complex forms a secondary complex. In some aspects, the RBP-binding agent or the secondary binding agent may be labeled as provided herein above.
[0583] In some aspects, the sample may be washed between any or after any of the steps disclosed herein. In some aspects, the sample is washed greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 times after the at least one RNase is provided to the sample, after the RBP targeting step, after the primary or secondary complex formation, and/or after the blocking step. In some aspects, the washing step comprises, consists, or consists essentially of washing the sample with a suitable buffer, for example Tris-HCl, MOPS, phosphate buffered saline (PBS), or Dulbecco’s phosphate buffered saline. In some aspects, the washing buffer may further comprise a blocking agent as disclosed herein, an RNase inhibitor, and additional ingredients, as well known in the art. In some aspects, the washing step comprises, consists, or consists essentially of shaking the sample with DPBS. In some aspects, the washing step occurs for greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes, including any range or value derivable therein. In some aspects, the washing step occurs at room temperature.
4. Reverse transcription and cDNA synthesis
[0584] In some aspects, the disclosed method further comprises incubating the primary or the secondary complex, or both, with a transcriptase composition as disclosed herein. As provided herein, the transcriptase composition comprises at least one polypeptide construct and a transcriptase mix. In some aspects, the polypeptide construct comprises a targeting moiety as disclosed herein and a reverse transcriptase enzyme as disclosed herein. In an aspect, the transcriptase mix comprises one or more ingredients for initiation and synthesis of cDNA. In an aspect, the transcriptase mix comprises one or more adapter RT primers, wherein the one or more adapter RT primers each comprise an adapter primer sequence and an RT primer sequence. In some aspects, the RT primer comprises random RT primers as disclosed herein. In some aspects, the adapter primer comprises one or more of a barcode sequence, indexes, etc. In some aspects, the transcriptase mix may further comprise components known in the art, for example labeled and/or unlabeled dNTPs as disclosed herein, RNase inhibitor, salts, reducing agents, buffers, solvents, osmotic agents, etc.
[0585] A person of skill in the art is familiar with conditions capable of producing cDNA. As noted above, in some aspects, the conditions to produce cDNA can comprise, consist, or consist essentially of providing the sample with at least one primer (random, oligo(dT) or gene specific), dNTPs, and other components in order to conduct reverse transcription (RT) before halting the reaction. In an aspect, the primer is an adapter RT primer as disclosed herein. The other components can comprise, consist, or consist essentially of a non-competitive inhibitor of pancreatic-type ribonucleases, a buffer or buffers, MgCl2, a reducing reagent, and/or water. In some aspects, the transcriptase composition is provided to the sample for greater than, equal to, at least, or at most 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 minutes, including any range or value derivable therein, to obtain a cDNA. In some aspects, the transcriptase mix is provided to the sample at less than, equal to, about, or more than 34 °C, 35 °C, 36 °C, 37 °C, 38 °C, 39 °C, 40 °C, 41 °C, 42 °C, 43 °C, 44 °C, 45 °C, 46 °C, 47 °C, 48 °C, 49 °C, 50 °C, 51 °C, 52 °C, 53 °C, 54 °C, 55 °C, 56 °C, 57 °C, or 58 °C. In some aspects, the transcriptase mix is provided to the sample at less than, equal to, about, or more than 34 °C, 35 °C, 36 °C, 37 °C, 38 °C, 39 °C, 40 °C, 41 °C, or 42 °C. In some aspects, the transcriptase mix is provided to the sample at 37 °C - 42 °C.
[0586] A person of skill in the art is aware of standard conditions and protocols with which to conduct reverse transcription. For example, primers with which to conduct reverse transcription can comprise, consist, or consist essentially of oligo(dT) primers, random primers, and/or gene-specific primers. A person of skill in the art can select random primers to improve cDNA synthesis for detection. These random primers can comprise, consist, or consist essentially of at least septamers, octamers, nonamers, decamers, undecamers, dodecamers, tridecamers, tetradecamers, pentadecamers, hexadecamers, heptadecamers, octadecamers, nonadecamers, or eicosamers. As a further example, the dNTPs with which to conduct reverse transcription can be labelled or not labelled; as known to a person in the art, a dNTP label can comprise, consist, or consist essentially of biotin, biotin-16, α-32P, fluorescein, a fluorescent dye, and/or another label that facilitates detection and/or purification. The labeled and label-free dNTPs can be mixed at different ratios, for example 2:1, 1:1, 1:2, or any range or value derivable therein. In some aspects, the dNTPs can comprise, consist, or consist essentially of a combination of labelled dUTP, labelled dCTP, labelled dGTP, labelled dATP, dTTP, dCTP, dATP, and/or dGTP.
[0587] A non-competitive inhibitor of pancreatic-type ribonucleases suitable for conducting reverse transcription can comprise, consist, or consist essentially of RNase inhibitor, RNaseOUT, and/or another agent which prevents RNA degradation by RNase. Buffers suitable for conducting reverse transcription can comprise, consist, or consist essentially of a phosphate buffer solution like PBS and/or DPBS, and/or another buffer providing a favorable pH and ionic strength for the reaction. A reducing reagent suitable for conducting reverse transcription can comprise, consist, or consist essentially of dithiothreitol (DTT), and/or another agent suitable for reducing disulfide bonds in RNases. Water suitable for conducting reverse transcription can comprise, consist, or consist essentially of nuclease-free water, water treated with diethylpyrocarbonate, and/or water treated with another agent that eliminates any RNases.
[0588] In some aspects, the disclosed method does not comprise oligo(dT) primer initiated reverse transcription. In some aspects, the method does not comprise Tn5 tagmentation.
[0589] A person of skill in the art is familiar with methods for halting RT. For example, a chelating agent can be added to the sample to halt RT. As known to a person of skill in the art, chelating agents can comprise, consist, or consist essentially of EDTA and/or EGTA. In some aspects, halting RT comprises, consists, or consists essentially of providing at least one chelating agent to the sample. In some aspects, the at least one chelating agent comprises, consists, or consists essentially of EDTA and/or EGTA. In some aspects, the EDTA is at a concentration of greater than, equal to, at least, at most, or about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 mM, including any range or value derivable therein. In some aspects, the EGTA is at a concentration of greater than, equal to, at least, at most, or about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 mM, including any range or value derivable therein. In some aspects, the at least one chelating agent is provided to the sample for greater than, equal to, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 minutes, including any range or value derivable therein. In some aspects, the at least one chelating agent is provided to the sample at room temperature.
[0590] As noted above, the present methods can further comprise, consist, or consist essentially of steps which permit recovery of DNA and/or cDNA from a sample. For example, an optional cell digestion step can be included after the incubating step or optional in-situ imaging step. A person of skill in the art can also use alternative DNA extraction protocols, such as treatment with chemical extractants, physical disruption, treatment with proteases, and/or treatment with other cellular lysis agents. As is known in the art, chemical extractants can comprise, consist, or consist essentially of sodium dodecyl sulfate (SDS), chloroform, phenol, Chelex 100, and/or guanidinium isothiocyanate. As is known in the art, physical disruption methods can comprise, consist, or consist essentially of bead mill homogenization and/or freeze-thaw lysis. As is known in the art, proteases or other cellular lysis agents can comprise, consist, or consist essentially of a lysozyme, a proteinase K, achromopeptidase, and/or pronase E.
5. DNA Sequencing
[0591] As noted above, in some aspects, the cDNA sequencing step produces a binding profile for the RBP of interest. As is commonly known in the art, DNA sequencing can comprise, consist, or consist essentially of amplifying the cDNA, purifying the amplified cDNA, and sequencing the purified cDNA. A person of skill in the art is familiar with common sequencing methods, which can include high-throughput sequencing.
[0592] In some aspects, the methods of the disclosure include a sequencing method. In certain aspects, methods involve sequencing the cDNA produced by the incubation step. The cDNA can be prepared for sequencing by any method known in the art, such as library preparation, hybrid capture, sample quality control, product-utilized ligation-based library preparation, or a combination thereof. The cDNA can be prepared for any sequencing technique. In some aspects, a unique genetic readout for each sample can be generated by genotyping one or more highly polymorphic SNPs. In some aspects, sequencing, such as 76 base pair, paired-end sequencing, can be performed to cover approximately 70%, 75%, 80%, 85%, 90%, 95%, 99%, or greater percentage of targets at more than 20x, 25x, 30x, 35x, 40x, 45x, 50x, or greater than 50x coverage. In certain aspects, mutations, SNPs, indels, copy number alterations (somatic and/or germline), or other genetic differences can be identified from the sequencing using at least one bioinformatics tool, including VarScan2, any R package (including CopywriteR) and/or Annovar. Exemplary sequencing methods include those described below.
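By way of illustration only, the coverage criterion above (a given percentage of targets sequenced at more than a given depth) can be checked with a short calculation. The following is a minimal sketch in Python; the function name, target names, and depth values are hypothetical and are not taken from the disclosure.

```python
from typing import Dict


def fraction_of_targets_covered(mean_depth_by_target: Dict[str, float],
                                min_depth: float) -> float:
    """Return the fraction of targets whose mean depth exceeds min_depth."""
    if not mean_depth_by_target:
        raise ValueError("no targets supplied")
    covered = sum(1 for depth in mean_depth_by_target.values() if depth > min_depth)
    return covered / len(mean_depth_by_target)


# Hypothetical per-target mean depths (illustrative values only).
depths = {"target_1": 35.0, "target_2": 18.0, "target_3": 52.0, "target_4": 20.5}

# Three of the four hypothetical targets exceed 20x, i.e. 75% of targets.
fraction = fraction_of_targets_covered(depths, min_depth=20)
print(f"{fraction:.0%} of targets exceed 20x")
```

In practice the per-target depths would come from an alignment-based depth report; how those depths are computed is outside the scope of this sketch.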
[0593] Massively parallel signature sequencing (MPSS), the first of the next-generation sequencing technologies, was developed in the 1990s at Lynx Therapeutics. MPSS was a bead-based method that used a complex approach of adapter ligation followed by adapter decoding, reading the sequence in increments of four nucleotides. This method made it susceptible to sequence-specific bias or loss of specific sequences. Because the technology was so complex, MPSS was only performed 'in-house' by Lynx Therapeutics and no DNA sequencing machines were sold to independent laboratories. Lynx Therapeutics merged with Solexa (later acquired by Illumina) in 2004, leading to the development of sequencing-by-synthesis, a simpler approach acquired from Manteia Predictive Medicine, which rendered MPSS obsolete. However, the essential properties of the MPSS output were typical of later "next-generation" data types, including hundreds of thousands of short DNA sequences. In the case of MPSS, these were typically used for sequencing cDNA for measurements of gene expression levels. Indeed, the powerful Illumina HiSeq2000, HiSeq2500 and MiSeq systems are based on MPSS.
[0594] Polony sequencing, developed in the laboratory of George M. Church at Harvard, was among the first next-generation sequencing systems and was used to sequence a full genome in 2005. It combined an in vitro paired-tag library with emulsion PCR, an automated microscope, and ligation-based sequencing chemistry to sequence an E. coli genome at an accuracy of >99.9999% and a cost approximately 1/9 that of Sanger sequencing. The technology was licensed to Agencourt Biosciences, subsequently spun out into Agencourt Personal Genomics, and eventually incorporated into the Applied Biosystems SOLiD platform, which is now owned by Life Technologies.
[0595] 454 pyrosequencing is a parallelized version of pyrosequencing developed by 454 Life Sciences, which has since been acquired by Roche Diagnostics. The method amplifies DNA inside water droplets in an oil solution (emulsion PCR), with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. The sequencing machine contains many picoliter-volume wells each containing a single bead and sequencing enzymes. Pyrosequencing uses luciferase to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs. This technology provides intermediate read length and price per base compared to Sanger sequencing on one end and Solexa and SOLiD on the other.
[0596] Illumina (Solexa) sequencing. Solexa, now part of Illumina, developed a sequencing method based on reversible dye-terminator technology and engineered polymerases, both of which it developed internally. The terminated chemistry was developed internally at Solexa and the concept of the Solexa system was invented by Balasubramanian and Klenerman from Cambridge University's chemistry department. In 2004, Solexa acquired the company Manteia Predictive Medicine in order to gain a massively parallel sequencing technology based on "DNA Clusters", which involves the clonal amplification of DNA on a surface. The cluster technology was co-acquired with Lynx Therapeutics of California. Solexa Ltd. later merged with Lynx to form Solexa Inc.
[0597] In this method, DNA molecules and primers are first attached on a slide and amplified with polymerase so that local clonal DNA colonies, later coined "DNA clusters", are formed. To determine the sequence, four types of reversible terminator bases (RT-bases) are added and non-incorporated nucleotides are washed away. A camera takes images of the fluorescently labeled nucleotides, then the dye, along with the terminal 3' blocker, is chemically removed from the DNA, allowing for the next cycle to begin. Unlike pyrosequencing, the DNA chains are extended one nucleotide at a time and image acquisition can be performed at a delayed moment, allowing for very large arrays of DNA colonies to be captured by sequential images taken from a single camera.
[0598] Decoupling the enzymatic reaction and the image capture allows for optimal throughput and theoretically unlimited sequencing capacity. With an optimal configuration, the ultimately reachable instrument throughput is thus dictated solely by the analog-to-digital conversion rate of the camera, multiplied by the number of cameras and divided by the number of pixels per DNA colony required for visualizing them optimally (approximately 10 pixels/colony). In 2012, with cameras operating at more than 10 MHz A/D conversion rates and available optics, fluidics and enzymatics, throughput can be multiples of 1 million nucleotides/second, corresponding roughly to one human genome equivalent at 1x coverage per hour per instrument, and one human genome re-sequenced (at approx. 30x) per day per instrument (equipped with a single camera).
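The throughput relationship stated above (A/D conversion rate, multiplied by the number of cameras and divided by the pixels required per colony) can be worked through numerically. The following Python sketch uses the figures given in the paragraph; the human genome size of approximately 3.2 billion bases is an assumption introduced here for illustration and does not appear in the disclosure.

```python
def nucleotides_per_second(ad_rate_hz: float, n_cameras: int,
                           pixels_per_colony: float) -> float:
    """Throughput = A/D conversion rate x number of cameras / pixels per colony."""
    return ad_rate_hz * n_cameras / pixels_per_colony


# Assumption for illustration: approximate haploid human genome size in bases.
HUMAN_GENOME_BASES = 3.2e9

# Figures from the paragraph: 10 MHz A/D rate, one camera, about 10 pixels/colony.
rate = nucleotides_per_second(ad_rate_hz=10e6, n_cameras=1, pixels_per_colony=10)

genomes_per_hour = rate * 3600 / HUMAN_GENOME_BASES
coverage_per_day = rate * 86400 / HUMAN_GENOME_BASES

print(f"{rate:,.0f} nucleotides/second")
print(f"{genomes_per_hour:.2f} genome equivalents per hour")
print(f"{coverage_per_day:.0f}x coverage of one genome per day")
```

Under these assumptions the calculation gives 1 million nucleotides per second, roughly one genome equivalent per hour, and about 27x coverage per day, consistent with the approximate figures stated above.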
[0599] SOLiD sequencing. Applied Biosystems' (now a Thermo Fisher Scientific brand) SOLiD technology employs sequencing by ligation. Here, a pool of all possible oligonucleotides of a fixed length is labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal informative of the nucleotide at that position. Before sequencing, the DNA is amplified by emulsion PCR. The resulting beads, each containing single copies of the same DNA molecule, are deposited on a glass slide. The result is sequences of quantities and lengths comparable to Illumina sequencing. This sequencing by ligation method has been reported to have some issues sequencing palindromic sequences.
[0600] Ion Torrent semiconductor sequencing. Ion Torrent Systems Inc. (now owned by Thermo Fisher Scientific) developed a system based on using standard sequencing chemistry, but with a novel, semiconductor-based detection system. This method of sequencing is based on the detection of hydrogen ions that are released during the polymerization of DNA, as opposed to the optical methods used in other sequencing systems. A microwell containing a template DNA strand to be sequenced is flooded with a single type of nucleotide. If the introduced nucleotide is complementary to the leading template nucleotide, it is incorporated into the growing complementary strand. This causes the release of a hydrogen ion that triggers a hypersensitive ion sensor, which indicates that a reaction has occurred. If homopolymer repeats are present in the template sequence, multiple nucleotides will be incorporated in a single cycle. This leads to a corresponding number of released hydrogens and a proportionally higher electronic signal.
[0601] DNA nanoball sequencing is a type of high throughput sequencing technology used to determine the entire genomic sequence of an organism. The company Complete Genomics uses this technology to sequence samples submitted by independent researchers. The method uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs. Unchained sequencing by ligation is then used to determine the nucleotide sequence. This method of DNA sequencing allows large numbers of DNA nanoballs to be sequenced per run and at low reagent costs compared to other next generation sequencing platforms. However, only short sequences of DNA are determined from each DNA nanoball, which makes mapping the short reads to a reference genome difficult. This technology has been used for multiple genome sequencing projects.
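As an illustration of the homopolymer behavior described for semiconductor sequencing in paragraph [0600], the following Python sketch simulates an idealized, noise-free flow series in which each flow of a single nucleotide type yields a signal proportional to the number of bases incorporated. The flow order "TACG", the function names, and the example sequence are assumptions chosen for illustration and are not the instrument's actual parameters.

```python
from typing import List, Tuple


def flowgram(synthesized: str, flow_order: str = "TACG",
             max_flows: int = 400) -> List[Tuple[str, int]]:
    """Return (nucleotide, signal) per flow for the strand being synthesized.

    The signal equals the number of consecutive bases incorporated in that
    flow, so a homopolymer run of length n gives a signal of n.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Incorporate every consecutive base that matches the flowed nucleotide.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(signals: List[Tuple[str, int]]) -> str:
    """Reconstruct the sequence by repeating each flowed base by its signal."""
    return "".join(base * n for base, n in signals)


example = "TTAGGGC"  # hypothetical strand with runs of two T and three G
signals = flowgram(example)
print(signals)
print(call_bases(signals))
```

Flows with a signal of zero correspond to nucleotides that were not complementary to the next template position; real signals are noisy, which is why long homopolymer runs are harder to call accurately than this idealized sketch suggests.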
[0602] Heliscope single molecule sequencing is a method of single-molecule sequencing developed by Helicos Biosciences. It uses DNA fragments with added poly-A tail adapters which are attached to the flow cell surface. The next steps involve extension-based sequencing with cyclic washes of the flow cell with fluorescently labeled nucleotides (one nucleotide type at a time, as with the Sanger method). The reads are performed by the Heliscope sequencer. The reads are short, up to 55 bases per run, but recent improvements allow for more accurate reads of stretches of one type of nucleotides. This sequencing method and equipment were used to sequence the genome of the M13 bacteriophage.
[0603] Single molecule real time (SMRT) sequencing is based on the sequencing by synthesis approach. The DNA is synthesized in zero-mode wave-guides (ZMWs) - small well-like containers with the capturing tools located at the bottom of the well. The sequencing is performed with use of unmodified polymerase (attached to the ZMW bottom) and fluorescently labelled nucleotides flowing freely in the solution. The wells are constructed in a way that only the fluorescence occurring by the bottom of the well is detected. The fluorescent label is detached from the nucleotide at its incorporation into the DNA strand, leaving an unmodified DNA strand. According to Pacific Biosciences, the SMRT technology developer, this methodology allows detection of nucleotide modifications (such as cytosine methylation). This happens through the observation of polymerase kinetics. This approach allows reads of 20,000 nucleotides or more, with average read lengths of 5 kilobases.
6. Sample Imaging
[0604] As noted above, in some aspects, present methods can further comprise, consist, or consist essentially of an optional in-situ imaging step after the incubating step. As is known in the art, imaging can be performed by light microscopy, fluorescence microscopy, confocal microscopy, and/or other commonly known microscopy techniques.
[0605] A person of skill in the art can use an imaging moiety, for example a fluorophore such as aminocoumarin, fluorescein, Texas Red, Alexa Fluor dyes (e.g. Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g. Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof to target certain aspects of the sample of interest, for example the biotin-tagged cDNA. Suitable moieties are known to a person of skill, and can consist, comprise, or consist essentially of a biotinylated monoclonal antibody like Alexa Fluor dye. A person of skill in the art can use a nuclear counterstain to indicate live cells with intact, nonpermeable plasma membranes in the sample. A nuclear counterstain can consist, comprise, or consist essentially of a cell-permeant nuclear counterstain which emits fluorescence when bound to dsDNA, like Hoechst stains and/or SYTO stains.
B. Spatial ARTR-seq
[0606] In some aspects, the disclosed method may be modified to obtain spatial information with respect to RNA binding sites. By introducing single-cell and/or spatial barcodes, ARTR-seq can achieve single-cell or spatial resolution. These barcodes can be seamlessly incorporated either through the use of barcoded RT primers during the reverse transcription process or through ligation. They can subsequently be employed to assign single-cell identity or spatial localization during data analysis.
[0607] In spatial barcoding-based ARTR-seq, resolution can be fine-tuned by adjusting the density of barcode primers, allowing for cellular and/or subcellular resolution. Apart from the spatial barcoding strategy, an in-situ sequencing method may be used in spatial ARTR-seq to achieve subcellular resolution.
[0608] Spatial ARTR-seq offers compatibility with imaging techniques, such as FISH or variations on FISH, microfluidics imaging techniques, or any other single-cell profiling techniques. This compatibility provides additional information alongside sequencing data, such as subcellular structure identification and/or cell stage determination. In an aspect, the disclosed methods may be combined with advanced single cell imaging techniques to provide spatially resolved binding sites and expression data. Commonly used techniques are provided herein.
[0609] Spatial Transcriptomics: This technique combines gene expression analysis with spatial information, allowing researchers to map RNA molecules in a tissue sample. It involves capturing gene expression data while preserving the spatial context, often using barcoded slides or arrays.
[0610] MERFISH (Multiplexed Error-Robust Fluorescence In Situ Hybridization): A highly multiplexed method for visualizing the spatial distribution of thousands of RNA molecules within cells. It uses fluorescent probes to detect RNA and generate a spatially resolved map of gene expression at the single-cell level.
[0611] SeqFISH (Sequential Fluorescence In Situ Hybridization): Similar to MERFISH, SeqFISH sequentially labels and images RNA molecules within cells using different fluorescent probes, enabling the spatial resolution of hundreds to thousands of genes within 3D tissue sections.
[0612] STARmap (Spatially Resolved Transcript Amplicon Readout Mapping): A technique that preserves the 3D structure of tissues while performing RNA sequencing. It uses hydrogel-tissue chemistry to encode RNA spatial information, allowing for highly multiplexed in situ transcriptomics.
[0613] Slide-Seq: A method that uses barcoded beads on a slide to capture RNA transcripts from tissue sections. This technique maps gene expression across the tissue with single-cell resolution, while maintaining spatial context.
[0614] Visium Spatial Gene Expression: Developed by 10x Genomics, this technique captures mRNA from tissue sections using spatially barcoded microarrays. It provides a spatial map of gene expression, linking molecular data with histological information.
[0615] Laser Capture Microdissection (LCM): A technique that physically isolates specific regions or cells from a tissue sample using a laser. These cells are then analyzed for gene expression or other molecular features, allowing for spatially resolved insights, though in a more manual and targeted way.
[0616] Imaging Mass Cytometry (IMC): Combines high-resolution imaging with mass cytometry to map the spatial distribution of proteins, DNA, or RNA in tissue sections. It allows multiplexed detection of dozens of markers at a time, preserving spatial and cellular context.
[0617] DBiT-seq (Deterministic Barcoding in Tissue for Spatial Omics Sequencing) is a method for co-mapping of mRNAs and proteins in a formaldehyde-fixed tissue slide via next-generation sequencing (NGS). Parallel microfluidic channels are used to deliver DNA barcodes to the surface of a tissue slide, and crossflow of two sets of barcodes, A1-50 and B1-50, followed by ligation in situ, yields a 2D mosaic of tissue pixels, each containing a unique full barcode AB. Gene expression profiles in 10-μm pixels conformed into the clusters of single-cell transcriptomes, allowing for rapid identification of cell types and spatial distributions.
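The crossflow barcoding scheme described above can be illustrated by mapping each combined barcode AB to a position in the 2D mosaic. The following Python sketch assumes, for illustration only, that barcode set A indexes rows and barcode set B indexes columns; the function name and the example reads are hypothetical.

```python
from collections import Counter
from typing import Tuple

N_CHANNELS = 50  # two sets of 50 barcodes (A1-50 and B1-50), per the paragraph above


def pixel_index(a: int, b: int, n_channels: int = N_CHANNELS) -> Tuple[int, int]:
    """Map a 1-based (A, B) barcode pair to a 0-based (row, column) pixel.

    Treating A as the row and B as the column is an assumption of this sketch.
    """
    if not (1 <= a <= n_channels and 1 <= b <= n_channels):
        raise ValueError(f"barcode pair ({a}, {b}) is outside 1..{n_channels}")
    return (a - 1, b - 1)


# Hypothetical reads, each already decoded to its (A, B) barcode pair.
decoded_reads = [(1, 1), (1, 1), (50, 50), (3, 7)]

# Count reads per tissue pixel.
reads_per_pixel = Counter(pixel_index(a, b) for a, b in decoded_reads)

print(N_CHANNELS ** 2, "addressable pixels")
print(reads_per_pixel)
```

With 50 barcodes in each set, the mosaic contains 2,500 uniquely addressable pixels, and reads sharing a full barcode AB are grouped into the same pixel.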
C. Multiplex ARTR-seq
[0618] In some aspects, also provided herein are methods for determining the RNA interaction sites of more than one RNA binding protein. In an aspect, the methods comprise modification of the ARTR-seq method, such that each RBP-targeting agent is tagged with a separate barcode, and wherein the barcode may be incorporated into the cDNA using click chemistry. In some aspects, the method may be used to map the RNA binding sites for greater than, equal to, at least, or at most 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30, 36, 40, 48, 50, 60, 70, 72, 80, 84, 90, 96, 100, 108, 120, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, or 1500 RBPs, or any range derivable therein. In some aspects, the methods disclosed herein may be used to map the RNA binding sites of all the RBPs in a cell.
[0619] In some aspects, provided herein is method of determining one or more RNA interaction sites of a first RNA-binding Protein (RBP) in a biological sample, comprising: a) contacting a first RBP-targeting agent comprising an alkyne functionalized first DNA barcode, to the first RBP, wherein the first RBP-targeting agent specifically binds the first RBP to form a first complex; b) contacting the first complex with one or more secondary binding agents that specifically binds the first RBP-targeting agent, to form a second complex; c) incubating the first or the second complex with the transcriptase composition disclosed herein, to obtain a first barcoded cDNA library; d) amplifying and sequencing the first barcoded cDNA library; e) obtaining one or more interaction site of the first RBP by deconvoluting the sequenced cDNA library based on the first DNA barcode. In some aspects, the transcriptase composition for use in the method may comprise RT primers comprising a reactive moiety such that it can react with a barcoded oligonucleotide containing antibody or targeting moiety, wherein the barcode oligonucleotide comprises a corresponding reactive moiety for click chemistry. A reactive moiety of a random RT primer may be selected from the non-limiting group consisting of azides, alkynes, nitrones (e.g., 1,3 -nitrones), strained alkenes (e.g., trans-cycloalkenes such as cyclooctenes or oxanorbomadiene), tetrazines, tetrazoles, iodides, thioates (e.g., phorphorothioate), acids, amines, and phosphates. For example, the first reactive moiety of the RT primer may comprise an azide moiety, and a second reactive moiety of the barcode oligonucleotide may comprise an alkyne moiety. The first and second reactive moieties may react to form a linking moiety. 
A reaction between the first and second reactive moieties may be, for example, a cycloaddition reaction such as a strain -promoted azide-alkyne cycloaddition, a copper-catalyzed azide-alkyne cycloaddition, a strain-promoted alkyne-nitrone cycloaddition, a Diels-Alder reaction, a [3+2] cycloaddition, a [4+2] cycloaddition, or a [4+1] cycloaddition; a thiol-ene reaction; a nucleophilic substation reaction; or another reaction. In some cases, reaction between the first and second reactive moieties may yield a triazole moiety or an isoxazoline moiety. A reaction between the first and second reactive moieties may involve subjecting the reactive moieties to suitable conditions such as a suitable temperature, pH, or pressure and providing one or more reagents or catalysts for the reaction. For example, a reaction between the first and second reactive moieties may be catalyzed by a copper catalyst, a ruthenium catalyst, or a strained species such as a difluorooctyne, dibenzylcyclooctyne, or biarylazacyclooctynone. In some aspects, the random RT primer disclosed herein may further comprise a azide functional group (NNNN-N3).
[0620] In some aspects, the method comprises RBP-targeting agents that comprise an oligonucleotide comprising a DNA-barcode. In some aspects, the oligonucleotide is linked to the RBP-targeting agent via an amino spacer. In some aspects, the amino spacer is a 7 C6 amino spacer, wherein a non-nucleoside modification adds a primary amino group to an oligo's internal position. The amino group is separated from the 5' end nucleotide base by a 6-carbon spacer arm to reduce steric interaction.
[0621] In some aspects, the DNA-barcode can be unique for each RBP being studied. In some aspects, use of multiple barcoded antibodies, wherein each barcode is specific to an RBP, allows for studying more than one RBP using the methods disclosed herein. In some aspects, the oligonucleotide may further comprise a reactive moiety that is operable in attaching the barcode to a cDNA of the disclosed method. A reactive moiety of a barcoded antibody may be selected from the non-limiting group consisting of azides, alkynes, nitrones (e.g., 1,3-nitrones), strained alkenes (e.g., trans-cycloalkenes such as cyclooctenes or oxanorbornadiene), tetrazines, tetrazoles, iodides, thioates (e.g., phosphorothioate), acids, amines, and phosphates. For example, the first reactive moiety of the RT primer may comprise an azide moiety, and a second reactive moiety of the barcode oligonucleotide may comprise an alkyne moiety. The first and second reactive moieties may react to form a linking moiety. A reaction between the first and second reactive moieties may be, for example, a cycloaddition reaction such as a strain-promoted azide-alkyne cycloaddition, a copper-catalyzed azide-alkyne cycloaddition, a strain-promoted alkyne-nitrone cycloaddition, a Diels-Alder reaction, a [3+2] cycloaddition, a [4+2] cycloaddition, or a [4+1] cycloaddition; a thiol-ene reaction; a nucleophilic substitution reaction; or another reaction. In some cases, reaction between the first and second reactive moieties may yield a triazole moiety or an isoxazoline moiety. A reaction between the first and second reactive moieties may involve subjecting the reactive moieties to suitable conditions such as a suitable temperature, pH, or pressure and providing one or more reagents or catalysts for the reaction.
For example, a reaction between the first and second reactive moieties may be catalyzed by a copper catalyst, a ruthenium catalyst, or a strained species such as a difluorooctyne, dibenzylcyclooctyne, or biarylazacyclooctynone. Table A provides a list of some exemplary oligonucleotides that may be linked to the RBP binding agent via an amino spacer, and that comprise an alkyne group reactive moiety.
[0622] In some aspects, the method further comprises incorporation of the biotinylated dNTPs, and the RT primer sequence comprising azide functional group, into the cDNA to form proximal azide labeled biotinylated cDNAs during reverse transcription. In some aspects, the method further comprises incorporating the alkyne functionalized first DNA barcode into the cDNA by reacting the alkyne functionalized first DNA barcode with the proximal azide labeled biotinylated cDNA, using in-situ copper catalyzed azide-alkyne cycloaddition (CuAAC), to obtain a first barcoded biotinylated cDNA library. In some aspects, the method further comprises purifying the barcoded biotinylated cDNA library over a streptavidin column prior to step (d). In some aspects, the method further comprises processing the CuAAC using a Klenow Fragment DNA polymerase for second strand synthesis prior to sequencing. In some aspects the one or more interaction sites of the first RBP are obtained by deconvoluting the sequenced data based on the first DNA barcode incorporated into the cDNA. In some aspects, the method further comprises similarly determining the one or more RNA-interaction sites of a second RNA-binding Protein (RBP) in a biological sample, comprising: a) contacting a second RBP-targeting agent comprising an alkyne functionalized second DNA barcode, to the second RBP, wherein the RBP-targeting agent specifically binds the second RBP to form a second primary complex; b) contacting the second primary complex with one or more secondary binding agents that specifically binds the first RBP-targeting agent, to form a second secondary complex; c) incubating the second primary or the second secondary complex with the transcriptase composition, to obtain a second barcoded cDNA library; d) amplifying and sequencing the second barcoded cDNA library; and e) obtaining one or more interaction sites of the second RBP by deconvoluting the sequenced cDNA library based on the second DNA barcode.
[0623] In some aspects, this process can be conducted simultaneously for greater than, equal to, at least, or at most 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30, 36, 40, 48, 50, 60, 70, 72, 80, 84, 90, 96, 100, 108, 120, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500 RBPs, or any range derivable therein.
D. Advanced ARTR-seq and spatial ARTR-seq for RNA modifications
[0624] In some aspects, any of the methods disclosed herein may be modified to determine RNA modification sites, either at sequence level or spatial level. In an aspect, suitable modification sites may comprise, consist of, or consist essentially of m6C, m5C, m1A, m7G, or a pseudouridine modification. In an aspect, the method may comprise using a modification targeting agent instead of an RBP targeting agent in the ARTR-seq method. In some aspects, suitable modification targeting agents comprise, consist of, or consist essentially of an antibody or variant thereof, an oligonucleotide or variant thereof, a receptor, a ligand, a small molecule, an aptamer, or any combination thereof. In some aspects, the modification targeting agent specifically binds to a modification site. The method may be used with ARTR-seq, multiplex ARTR-seq or spatial ARTR-seq as provided herein, with suitable adjustments as will be known to a person of skill in the art with the disclosure herein.
[0625] Thus, in some aspects, the current disclosure encompasses a method of determining spatial distribution of an RNA modification site on a biological sample bound to a solid surface, comprising: a) contacting a modification-targeting agent that specifically binds the modification site on the RNA to form a primary complex; b) contacting the primary complex with a secondary binding agent that specifically binds the primary complex to form a secondary complex; c) incubating the primary complex or the secondary complex with the transcriptase composition to obtain cDNA; optionally incorporating labelled barcodes into the cDNA; and sequencing and imaging the biological sample using a single cell genomic imaging technique to determine the one or more modification sites.
[0626] In some aspects, the modification-targeting agent is an oligonucleotide, or a variant thereof, or a small molecule. In some aspects, the oligonucleotide comprises, consists essentially of, or consists of fluorescent NTPs, or a fluorescent probe. In some aspects, the modification-targeting agent is an antibody or a functional variant thereof. In some aspects, the antibody or the functional variant thereof comprises monoclonal antibodies, polyclonal antibodies, recombinant antibody, IgG, Fv, single chain antibody, single domain antibodies, nanobodies, diabodies, multispecific antibodies (e.g., bispecific antibodies), scFv, Fab, F(ab')2, Fab, or variants thereof. In some aspects, the modification targeting agent specifically binds to a modification comprising, consisting essentially of, or consisting of m6C, m5C, m1A, m7G, or a pseudouridine modification. In some aspects, the sequencing and imaging is done using a single cell genomic imaging technique as disclosed herein. In some aspects, the single cell genomic imaging technique comprises, consists essentially of, or consists of deterministic barcoding in tissue for spatial omics sequencing (DBiT-seq) comprising: ligating a first set and a second set of spatial barcodes to the cDNA of step (c), prior to step (d), wherein the first set of spatial barcodes are contacted to the cDNA horizontally using a first multi-channel microfluidic chip, and the second set of spatial barcodes are contacted to the solid surface vertically using a second multi-channel microfluidic chip.
IV. Kits
[0627] Certain aspects of the present disclosure also concern kits containing compositions of the disclosure and/or compositions to implement methods disclosed herein. In some aspects, the current disclosure encompasses a kit comprising a polypeptide construct as disclosed herein. In some aspects, the kit comprises a transcriptase composition as disclosed herein. In some aspects, the current disclosure encompasses a kit comprising in one or more suitable container(s), an RBP-targeting agent that specifically binds to an RBP, one or more secondary binding agents, a polypeptide construct as disclosed herein, and/or the transcriptase composition as disclosed herein.
[0628] In some aspects, disclosed are kits that can be used to prepare a sample for RBP-RNA binding site and/or RNA modification site identification. In some aspects, disclosed are kits that can be used to identify RBP-RNA binding sites via ARTR-seq, spatial ARTR-seq, multiplexed ARTR-seq and advanced ARTR-seq techniques for determining RNA modification sites.
[0629] The kit can optionally provide additional components that are useful in the procedure. These optional components include buffers, capture reagents, developing reagents, labels, reacting surfaces, means for detection, control samples, instructions, and interpretive information. In certain aspects, a kit contains, contains at least, or contains at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 100, 500, 1,000 or more probes, primers or primer sets, synthetic molecules or inhibitors, or any value or range and combination derivable therein. In some aspects, there are kits for evaluating RBP binding activity in cells.
[0630] Kits can comprise components, which can be individually packaged or placed in a container, such as a tube, bottle, vial, syringe, or other suitable container means.
[0631] Individual components can also be provided in a kit in concentrated amounts; in some aspects, a component is provided individually in the same concentration as it would be in a solution with other components. Concentrations of components can be provided as 1x, 2x, 5x, 10x, or 20x or more. In certain aspects, negative and/or positive control nucleic acids, probes, and inhibitors are included in some kit aspects.
[0632] Kits for using probes, synthetic nucleic acids, nonsynthetic nucleic acids, RBP targeting agents, and/or targeting moieties of the disclosure for prognostic or diagnostic applications are included as part of the disclosure. In certain aspects, negative and/or positive control nucleic acids, probes, and inhibitors are included in some kit aspects. In addition, a kit can include a sample that is a negative or positive control for RBP-RNA interactions.
[0633] Any aspect of the disclosure involving specific RBP, RNA, or other biomarker by name is contemplated also to cover aspects involving biomarkers whose sequences are at least 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% identical to the mature sequence of the specified nucleic acid.
[0634] Detection Kits and Systems: One can recognize that based on the methods described herein, detection reagents, kits, and/or systems can be utilized to detect the biomarkers, including ST2, for diagnosing or prognosing an individual. The reagents can be combined into at least one of the established formats for kits and/or systems as known in the art. The kits could also contain other reagents, chemicals, buffers, enzymes, packages, containers, electronic hardware components, etc. The kits/systems could also contain packaged sets of PCR primers, oligonucleotides, arrays, beads, antibodies, or other detection reagents. Any number of probes could be implemented for a detection array. In some aspects, the detection reagents and/or the kits/systems are paired with chemiluminescent or fluorescent detection reagents. Particular aspects of kits/systems include the use of electronic hardware components, such as DNA chips or arrays, or microfluidic systems, for example. In specific aspects, the kit also comprises one or more therapeutic or prophylactic interventions in the event the individual is determined to be in need thereof.
[0635] It is contemplated that any method or composition described herein can be implemented with respect to any other method or composition described herein and that different aspects can be combined, and that these compositions may be packaged into kits and/or kits may be designed to facilitate these methods.
[0636] The claims originally filed are contemplated to cover claims that are multiply dependent on any filed claim or combination of filed claims.
V. Clinical and non-clinical applications
[0637] In some aspects, the current disclosure also encompasses methods of using the methods and/or compositions disclosed herein for use in clinical, non-clinical, and/or research use. The novel approach of the methods and products of the present disclosure utilizes now well-established array and sequencing technology to yield single cell level information for RNA binding proteins, whilst retaining the positional information. It will be evident to the person of skill in the art that this represents a milestone in the life sciences. The new technology opens new avenues of research, which is likely to have profound consequences for our collective understanding of tissue development and tissue and cellular function in all multicellular organisms. It will be apparent that such techniques will be particularly useful in our understanding of the cause and progress of disease states and in developing effective treatments for such diseases, for example but not limited to, cancer. The methods of the disclosure will also find uses in the diagnosis of numerous medical conditions. These methods can be used to advance research of RBPs, translatome, transcriptome, and epitranscriptomic regulations. For example, over 150 distinct chemical modifications occur on the RNA molecules, impacting various aspects of gene expression, such as RNA decay and translation. These modifications also play critical roles in physiology and diseases. Notably, N6-methyladenosine (m6A) stands out as the most prevalent modification in mammalian mRNA and chromatin-associated RNA (caRNA), with close associations with disorders and cancers. In addition, m6A modification exhibits distinct tissue-specific distributions. Measuring transcriptome-wide m6A at single-cell and spatial resolution allows a deeper understanding of epitranscriptomic regulations in heterogeneous cell types within tissues.
[0638] In some aspects, the disclosed methods may also be used to develop diagnostic methods to detect a disease or a disorder, to study disease progression or to study susceptibility of a subject to a disease or disorder.
Diseases or Disorders
[0639] In certain aspects, methods involve obtaining a sample from a subject with a disease or disorder. In some embodiments, compositions, methods, and/or kits described herein may be used in a method of preventing, treating, reducing the progression of, and/or reducing the risk of a disease or disorder, wherein the disease or disorder is a cancer and/or a neurodegenerative disease.
[0640] In some embodiments, the disease or disorder is a cancer. In some embodiments, the cancer is pancreatic cancer, breast cancer, kidney cancer, bladder cancer, prostate cancer, testicular cancer, urothelial cancer, endometrial cancer, ovarian cancer, cervical cancer, renal cancer, esophageal cancer, gastrointestinal stromal tumor (GIST), multiple myeloma, cancer of secretory cells, thyroid cancer, gastrointestinal carcinoma, chronic myeloid leukemia, hepatocellular carcinoma, colon cancer, melanoma, malignant glioma, glioblastoma, glioblastoma multiforme, astrocytoma, dysplastic gangliocytoma of the cerebellum, Ewing’s sarcoma, rhabdomyosarcoma, ependymoma, medulloblastoma, ductal adenocarcinoma, adenosquamous carcinoma, nephroblastoma, acinar cell carcinoma, neuroblastoma, or lung cancer. In some embodiments, the cancer of secretory cells is non-Hodgkin’s lymphoma, Burkitt’s lymphoma, chronic lymphocytic leukemia, monoclonal gammopathy of undetermined significance (MGUS), plasmacytoma, lymphoplasmacytic lymphoma or acute lymphoblastic leukemia.
[0641] In some embodiments, the disease or disorder is a neurological disorder. Neurological disorders are diseases of the body’s nervous system. Structural, biochemical or electrical abnormalities in the brain, spinal cord or other nerves can result in a range of symptoms. There are more than 600 diseases of the nervous system, such as epilepsy, dementias, Alzheimer’s disease and cerebrovascular diseases including stroke, multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis, migraine, neuroinfections, brain tumors and traumatic disorders of the nervous system such as brain trauma and autism.
EXAMPLES
[0642] The following examples are included to demonstrate certain aspects of the disclosure. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventors to function well in the practice of the disclosure, and thus can be considered to constitute certain modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific aspects which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the inventions described herein.
EXAMPLE 1 - Development of ARTR-seq
[0643] To overcome the limitations of existing methods, the inventors introduce an Assay of Reverse Transcription-based RBP binding sites Sequencing (ARTR-seq) to capture RBP-RNA interactions through in-situ reverse transcription (RT). The inventors demonstrated that ARTR-seq sensitively profiled the RNA targets of RBPs with good sequencing quality, using as few as 20 cells or even a single tissue section. Additionally, an imaging step was readily built into the ARTR-seq procedure, which provides direct spatial information of RBP-RNA interactions. With ARTR-seq, the inventors show distinct binding patterns of splicing factors and regulatory differences among the YTH family reader proteins of RNA N6-methyladenosine (m6A) modification. ARTR-seq unbiasedly detected RNA binding by RBPs in both cytoplasm and nucleus and measured the binding strength of RBP on different RNA substrates. Furthermore, the inventors show that ARTR-seq can be applied to monitor dynamic RNA targeting by G3BP1 during stress granule assembly on a small timescale of 10 minutes.
METHODS
Cell culture and stress treatment
[0644] In some instances, HeLa cells and HepG2 cells can be purchased from ATCC, and cultured in DMEM medium (Gibco) supplemented with 10% fetal bovine serum (FBS, Gibco) and Penicillin-Streptomycin (Gibco). In some instances, K562 cells can be obtained from ATCC and cultured in RPMI 1640 Medium (Gibco) supplemented with 10% (v/v) fetal bovine serum, Penicillin-Streptomycin (Gibco) and 2 mM L-glutamine (Gibco). Cells can be grown at 37 °C with 5% CO2. For NaAsO2 treatment, HeLa cells can be grown to 90% confluence, and the medium can be replaced with pre-warmed DMEM medium containing 0.5 mM NaAsO2; the cells can then be further maintained at 37 °C with 5% CO2 for indicated times.
Expression and purification of recombinant protein A/G-reverse transcriptase
[0645] In some instances, the recombinant plasmids can be constructed by assembly of pET28a vector, protein A/G (pAG), linkers with different lengths, and reverse transcriptase (RTase) or a modified RTase with NEBuilder® HiFi DNA Assembly Master Mix (NEB) following the manufacturer’s protocol. Protein A/G DNA segment can be amplified from pAG/MNase plasmid (Addgene, #123461). In some instances, the engineered MMLV RTase can be modified from pCMV-PE2 plasmid (Addgene, #132775). In other instances, the recombinant proteins were expressed in BL21(DE3) Competent E. coli (NEB) with IPTG induction at 16 °C for 18 h. In some instances, cells can be collected by centrifugation at 6000 rpm for 10 min, and lysed in the buffer of 50 mM Tris-HCl (pH 7.5), 300 mM NaCl and 1 mM PMSF with sonication at a 10 s on/10 s off setting for 10 min at 4 °C. In some instances, the recombinant proteins can be purified from the supernatant using HisTrap HP column (GE Healthcare), followed by ion exchange chromatography column (GE Healthcare) on an AKTA Purifier 10 system (GE Healthcare) according to the manufacturer’s protocol, and concentrated to about 20 mg/ml. The purified enzyme can be supplemented with 40% glycerol and stored at -80 °C for future use.
Quantitative reverse transcription-polymerase chain reaction (qRT-PCR)
[0646] In some instances, RNA can be reverse transcribed with purified protein AG-reverse transcriptases (pAG-RTases) or commercial reverse transcriptases in reaction buffer (50 mM Tris-HCl, 150 mM NaCl, pH 7.5) at 37 °C for 15 min, and denatured at 85 °C for 5 min. Quantitative PCR can be performed with FastStart Essential DNA Green Master (Roche) on LightCycler 96 System (Roche). The efficiency of reverse transcription (RT) can be quantified by the delta quantitation cycle (Cq) method.
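The delta-Cq comparison can be sketched as follows. This is a minimal illustration that assumes approximately two-fold amplification per PCR cycle, which the text does not state; the function name and the choice of a reference RTase are hypothetical.

```python
def relative_rt_efficiency(cq_test, cq_reference, amplification_factor=2.0):
    """Estimate cDNA yield of a test RTase relative to a reference RTase.

    A lower Cq means more template, so a positive delta Cq (test minus
    reference) corresponds to a relative yield below 1.
    Assumption: each PCR cycle multiplies the product by amplification_factor.
    """
    delta_cq = cq_test - cq_reference
    return amplification_factor ** (-delta_cq)
```

Under these assumptions, a test enzyme whose product crosses the threshold one cycle later than the reference yields about half as much cDNA from the same RNA input.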
Protein detection by Coomassie brilliant blue (CBB) stain and western blot
[0647] In some instances, the mammalian cell samples can be lysed with cold RIPA buffer (Thermo Fisher Scientific) containing 1x protease inhibitor cocktail (Roche). The cell lysate can be cleared with centrifugation at 15,000 g for 10 min at 4 °C. The supernatant or purified protein can then be mixed with LDS loading buffer (Bio-Rad) and boiled at 95 °C for 10 min. Denatured protein can be loaded into 4-12% NuPAGE Bis-Tris gel (Thermo Fisher Scientific).
[0648] In some instances, for CBB stain, the gel can be stained with Imperial Protein Stain (Thermo Fisher Scientific) and detected by FluorChem R (ProteinSimple). For the western blot, the protein can be transferred to the PVDF membrane from gel. The membranes can be blocked in 3% BSA (diluted in PBST) for 1 h at room temperature, incubated in a diluted primary antibody solution at 4 °C overnight, washed with PBST 4 times, and incubated in a dilution of secondary antibody conjugated to HRP for 1 h at room temperature. Protein bands can be imaged by SuperSignal West Dura Extended Duration Substrate kit (Thermo Fisher Scientific) on the FluorChem R machine (ProteinSimple). Quantification can be performed using ImageJ software.
Transfection
[0649] In some instances, PTBP1 siRNA can be purchased from Horizon Discovery/Dharmacon. Cells can be seeded at 30% confluency one day before. After 12 h, siRNA can be transfected with RNAiMAX (Thermo Fisher Scientific) following the manufacturer’s manual. The fresh medium can then be changed at 6 h after transfection. Cells can then be cultured for another 48 h, and the knockdown efficiency can be quantified by western blot.
ARTR-seq
[0650] In some instances, cells can be fixed to a chamber with 1.5% paraformaldehyde (PFA) at room temperature for 10 min. To mitigate cell loss, 1.5% PFA crosslinking can be applied instead of the commonly used 1% PFA crosslinking. Samples can then be quenched with 125 mM glycine at room temperature for 5 min, and permeabilized with 0.5% Triton X-100 on ice for 10 min. Samples can then be blocked with 1 mg/ml UltraPure BSA (Thermo Fisher Scientific) at room temperature for 30 min, stained with the primary antibody at room temperature for 1 h, and then stained with fluorophore-labeled secondary antibody at room temperature for 30 min, followed by incubation with pAG-RTase for an additional 30 min. For input samples, the primary antibody can be replaced by the DPBS buffer with 1 mg/ml UltraPure BSA. Cells can be washed with DPBS at least once, twice, thrice, or more after each staining step by shaking at room temperature for 3 min.
[0651] In some instances, a reverse transcription reaction mixture was prepared by mixing 2 µM adapter-RT primer (5'-AGACGTGTGCTCTTCCGATCTNNNNNNNNNN-3'), 0.05 mM biotin-16-dUTP (Jena Bioscience), 0.05 mM biotin-16-dCTP (Jena Bioscience), 0.05 mM dTTP (Thermo Fisher Scientific), 0.05 mM dCTP (Thermo Fisher Scientific), 0.1 mM dATP (Thermo Fisher Scientific), 0.1 mM dGTP (Thermo Fisher Scientific), 1 U/µl RNaseOUT (Thermo Fisher Scientific) in 50 µl buffer of DPBS supplemented with 3 mM MgCl2. In-situ reverse transcription can be performed by immersing cells in the transcriptase mix and incubating at 37 °C for 30 min, then stopping by adding 20 mM EDTA and 10 mM EGTA and incubating at room temperature for 3 min. Next, cells can be stained with biotin monoclonal antibody (BK-1/39) - Alexa Fluor 488 (Thermo Fisher Scientific) by incubation at room temperature for 1 h, followed by staining with 1 µg/mL Hoechst 33342 dye (Thermo Fisher Scientific) at room temperature for 15 min, and then imaged by Leica SP8 laser confocal microscope. The fluorescence intensity distribution on a line can be quantified by ImageJ software. After imaging, cells can be digested with proteinase K (Thermo Fisher Scientific) at 37 °C for 2 h. The nucleic acids can be recovered by phenol-chloroform extraction and concentrated by ethanol precipitation. RNA can be digested with RNase H (NEB) and RNase A/T1 (Thermo Fisher Scientific) at 37 °C for 1 h, followed by enriching biotinylated cDNA using 10 µl pre-blocked Dynabeads MyOne Streptavidin C1 (Thermo Fisher Scientific) at room temperature for 20 min. The beads can be washed, and the on-beads 3' cDNA adapter (5'Phos-NNNNNNNNAGATCGGAAGAGCGTCGTGT-3'SpC3) (nucleic acid sequence as in SEQ ID NO: 26) can be ligated by T4 RNA ligase 1 (NEB) by incubating at 25 °C for 16 h.
The beads can be washed again, and cDNA can be recovered with the elution buffer of 95% (v/v) formamide and 10 mM EDTA (pH 8.0) by boiling at 95 °C for 10 min, followed by ethanol precipitation.
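As a rough aid for assembling the 50 µl reverse transcription mixture described above, the sketch below converts the stated final concentrations into pipetting volumes using C1V1 = C2V2. The final concentrations come from the text; every stock concentration is a hypothetical value chosen for illustration, and the remaining volume is simply filled with buffer, ignoring the small dilution of MgCl2 that this causes.

```python
FINAL_VOLUME_UL = 50.0

# name: (final concentration from the text, ASSUMED stock concentration)
# Units are mM, except RNaseOUT, which is in U/µl.
COMPONENTS = {
    "adapter-RT primer": (0.002, 0.1),  # 2 µM final; 100 µM stock assumed
    "biotin-16-dUTP": (0.05, 1.0),      # 1 mM stock assumed
    "biotin-16-dCTP": (0.05, 1.0),      # 1 mM stock assumed
    "dTTP": (0.05, 10.0),               # 10 mM stock assumed
    "dCTP": (0.05, 10.0),               # 10 mM stock assumed
    "dATP": (0.1, 10.0),                # 10 mM stock assumed
    "dGTP": (0.1, 10.0),                # 10 mM stock assumed
    "RNaseOUT": (1.0, 40.0),            # 40 U/µl stock assumed
}


def stock_volumes(components, final_volume_ul):
    """Return the volume (µl) of each stock needed, plus buffer to reach the final volume."""
    volumes = {
        name: final * final_volume_ul / stock
        for name, (final, stock) in components.items()
    }
    buffer_ul = final_volume_ul - sum(volumes.values())
    if buffer_ul < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    volumes["buffer to volume"] = buffer_ul
    return volumes
```

With these assumed stocks, the primer would contribute 1 µl and the buffer roughly 41 µl of the 50 µl reaction; different stock concentrations change the volumes but not the calculation.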
[0652] The library can be obtained by PCR amplification with NGS sequencing primer and gel purification of size between 180 bp and 400 bp. Next-generation sequencing can be carried out either at the University of Chicago Single Cell Immunophenotyping Core on an Illumina NextSeq 550 machine or at the University of Chicago Genomics Facility on an Illumina NovaSeq 6000 platform.
Spatial ARTR-seq
[0653] In addition to identifying RBP binding sites, ARTR-seq can profile translation and RNA modifications. By targeting the RTase to ribosomes, ARTR-seq can identify ribosome binding sites. Additionally, ARTR-seq can capture RNA modification sites through in-situ reverse transcription.
[0654] By introducing single-cell or spatial barcodes, ARTR-seq can achieve single-cell or spatial resolution. These barcodes can be seamlessly incorporated either through the use of barcoded RT primers during the reverse transcription process or through ligation, as exemplified in SPLiT-seq72. They can subsequently be employed to assign single-cell identity or spatial localization during data analysis.
[0655] In spatial barcoding-based ARTR-seq, resolution can be fine-tuned by adjusting the density of barcode primers, allowing for cellular and/or subcellular resolution. Apart from the spatial barcoding strategy, in-situ sequencing, such as FISSEQ73, can be applied in spatial ARTR-seq to achieve subcellular resolution.
[0656] Spatial ARTR-seq offers compatibility with imaging techniques, such as FISH or variations on FISH, microfluidics imaging techniques, or any other single-cell profiling techniques. This compatibility provides additional information alongside sequencing data, such as subcellular structure identification and/or cell stage determination.
RNase treatment in ARTR-seq
[0657] RNase treatment can be incorporated into the ARTR-seq procedure with the following adjustments. After permeabilization, cells can be incubated with 1 U/µl RNase I (Thermo Fisher Scientific) at 37 °C for at least 5 min, followed by at least one, two, or more washes with a buffer like DPBS. For the samples with strong RNase treatment, an additional RNase I treatment can be conducted as previously described before reverse transcription.
Dot blot
[0658] In some instances, after the proteinase K digestion step in ARTR-seq, the total nucleic acids can be recovered with Oligo Clean & Concentrator Kits (Zymo) to get rid of free biotinylated dNTP. The concentration of nucleic acids can be measured by Nanodrop 8000 Spectrophotometer and adjusted to 50 ng/µL. Next, 1 µL nucleic acids can be loaded onto the Amersham Hybond-N+ membrane (GE Healthcare). Membranes can be air-dried and crosslinked by ultraviolet (UV) Stratalinker 2400 at 150 mJ/cm2 twice. The membranes can be then blocked in 5% fatty-acid-free BSA in PBST (PBS with 0.1% Tween-20) at room temperature for 1 h, followed by incubation in streptavidin-HRP (Thermo Fisher Scientific) in PBST supplemented with 5% fatty-acid-free BSA at room temperature for another 1 h. The membrane can be washed with PBST four times before being imaged by SuperSignal West Dura Extended Duration Substrate kit (Thermo Fisher Scientific) on the FluorChem R machine (ProteinSimple).
ARTR-seq in the mouse embryo
[0659] In some instances, C57 mouse embryo (E11) frozen tissue sections can be purchased from Zyagen. The slide with frozen tissue sections can be brought to room temperature and incubated for 10 minutes. A PAP pen can be used to draw a circle around the mouse tissue on the slide, providing a thin film-like hydrophobic barrier for reagent incubation. Then the tissue can be subjected to typical ARTR-seq procedures.
ARTR-seq with low input
[0660] In some instances, ARTR-seq can be applied to 20 to 5k HepG2 cells with the following changes. 4% PFA can be employed to minimize cell loss for low input samples. 2 µM adapter-barcode-RT primer (5'-AGACGTGTGCTCTTCCGATCT-8-nt barcode-NNNNNNNNNN-3') (together as in SEQ ID NO: 25) can be applied for in-situ reverse transcription. After proteinase K digestion, two biological replicates can be pooled together for biotinylated cDNA enrichment, adapter ligation, library amplification and library sequencing. Sequence data can be separated based on the 8-nt barcode in the RT primers.
Genome reference
[0661] Genome and the corresponding reference of Homo sapiens (GRCh38.p13, GENCODE Release 39), Mus musculus (GRCm39, GENCODE Release M29), and Drosophila melanogaster (BDGP6.32, Ensembl Release 107) can be used for mapping the sequencing reads in this study. rRNA reference sequences can be downloaded from NCBI for H. sapiens (NR_003285.3, NR_003286.4, NR_003287.4, NR_023363.1) and M. musculus (NR_003278.3, NR_003279.1, NR_003280.2, NR_046156.1), and from FlyBase for D. melanogaster (5SrRNA-CR33353, 18SrRNA-CR45841, 5.8SrRNA-CR45842, 28SrRNA-CR4584).
ARTR-seq primary data processing
[0662] In some instances, reads from the small cell number libraries containing cell barcodes can first be demultiplexed with an in-house script using read 2. The adapter sequences can be trimmed with Cutadapt52 (v4.2) using the parameters --nextseq-trim=20 -a AGATCGGAAGAGCACACGTCTGAACTCCAG (SEQ ID NO: 79); the 8-nt UMI sequences can be moved and added to the read name for subsequent deduplication. An extra 4 nts at the 3' end of each read can be removed from the adapter-free sequence to minimize mapping mismatches caused by imperfect base pairing of the random primer.
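The UMI handling and 3' clipping described above can be illustrated with a minimal Python sketch. It is not the in-house script referred to in the text. The function name `move_umi_and_trim` is hypothetical, and the sketch assumes, purely for illustration, that the 8-nt UMI sits at the 5' start of the adapter-trimmed read; the actual UMI position depends on the library design.

```python
def move_umi_and_trim(name, seq, qual, umi_len=8, trim3=4):
    """Move a leading UMI into the read name and clip the 3' end.

    Assumption (not stated in the source): the UMI occupies the first
    `umi_len` bases of the adapter-trimmed read.
    Returns (new_name, new_seq, new_qual), or None if the read is too short.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) <= umi_len + trim3:
        return None  # nothing would remain after removing UMI and 3' bases
    umi = seq[:umi_len]
    end = len(seq) - trim3
    # Appending "_<UMI>" to the name matches the default read-name layout
    # that UMI-tools reads during deduplication.
    return f"{name}_{umi}", seq[umi_len:end], qual[umi_len:end]
```

For example, a 24-nt read keeps the 12 bases between the 8-nt UMI and the 4 clipped 3' bases, and reads of 12 nt or fewer are dropped.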
[0663] In some instances, the reads can first be mapped to the corresponding rRNA sequences using Bowtie253 (v2.4.4) with the parameter --seedlen=15, and the mapped reads can be discarded to avoid rRNA contamination. The remaining unmapped reads can be mapped to the corresponding genome using STAR54 (v2.7.9a) with parameters: --readFilesCommand zcat --alignEndsType EndToEnd --genomeLoad NoSharedMemory --quantMode TranscriptomeSAM --alignMatesGapMax 15000 --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outSAMprimaryFlag AllBestScore --outSAMattributes All --outSAMtype BAM SortedByCoordinate --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outFilterMatchNmin 24. Uniquely mapped reads can be deduplicated to obtain the usable reads using UMI-tools55 (v1.1.2) with the parameter --method unique. The usable reads can be assigned to genomic regions with RNA-SeQC56 (v2.4.2) using default parameters. Deduplicated reads can be assigned to genes with featureCounts57 (v2.0.3) for the calculation of Pearson’s correlation coefficient. For visualization in IGV58 (v2.13.1), bam files of the usable reads can be converted to bigWig with bamCoverage in the deepTools suite59 (v3.5.1), normalized by the respective sequencing depth using the parameters --normalizeUsing BPM --binSize 1. All the sample tracks can be set to the same scale for display, unless otherwise indicated in the legend.
Peak Calling
[0664] In some instances, for peak calling, the usable reads in one library can first be split into two sam files containing reads aligned to the positive and negative strands, respectively. MACS360 can be used to identify peaks with default parameters, except for adding ‘--keep-dup all --nomodel --extsize 30’. The peaks located on the two strands can be called separately using the corresponding strand reads in the input libraries as background. The two peak files for one library can later be combined. To generate the consensus motif for peaks, the peaks can first be extended by 20 nts both upstream and downstream, and the overrepresented sequences can be generated using findMotifsGenome.pl in the HOMER suite61 (v4.11) with parameters: -rna -S 10 -len 5,6,7,8,9. Specifically, for motif generation for peaks in mouse tissue, the peak genomic coordinates can be converted from mm39 to mm10 using liftOver from the UCSC Genome Browser62. Peaks can be assigned to specific genomic regions with in-house scripts, and peaks overlapping two genomic regions can be assigned to the region with the longer overlap. The peaks from the reader YTHDC1 can be further assigned to repeats and other regions with annotatePeaks.pl in the HOMER suite.
Subsampling
[0665] In some instances, to calculate the percentage of usable reads at different sequencing depths, the uniquely mapped reads can be subsampled with samtools view in the Samtools suite63 (v1.16.1). For the comparison between small cell number input libraries for different methods, the sizes of all libraries can be reduced to that of the smallest library. Specifically, instead of directly subsampling the fastq files, the usable reads can be subsampled to match the usable reads percentage of each library.
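One possible reading of this subsampling scheme is sketched below in Python. The function `subsample_plan` and its input layout are hypothetical and only illustrate the arithmetic. The sketch assumes that each library is brought down to the total depth of the smallest library, and that the number of usable reads to keep is that depth multiplied by the library's own usable-read percentage. The resulting fraction could then be supplied to a subsampling tool such as samtools view.

```python
def subsample_plan(libraries):
    """Compute how many usable reads to keep per library.

    libraries: dict mapping library name -> (total_reads, usable_reads).
    Assumption: the common target depth is the smallest total read count.
    """
    target_depth = min(total for total, _ in libraries.values())
    plan = {}
    for name, (total, usable) in libraries.items():
        # Usable reads expected if the library had been sequenced
        # to the target depth with the same usable-read percentage.
        keep = round(target_depth * usable / total)
        plan[name] = {
            "keep_usable_reads": keep,
            "fraction": keep / usable if usable else 0.0,
        }
    return plan
```

With two libraries of 1,000 and 500 total reads (400 and 100 usable), the sketch keeps 200 and 100 usable reads, that is, fractions of 0.5 and 1.0.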
Alternative splicing identification
[0666] The differential alternative splicing events of each gene can be identified using rMATS (v4.1.2). The bam files of the RBP-knockdown RNA-seq libraries and of the corresponding control libraries, with the annotation of ENCODE4 v1.2.1 GRCh38 V29, can be downloaded from ENCODE and analyzed by rMATS for the identification of five alternative splicing modes: SE (skipped exon), MXE (mutually exclusive exons), A3SS (alternative 3' splice site), A5SS (alternative 5' splice site) and RI (retained intron). Events with FDR >= 0.05 can be discarded from the subsequent analysis.
ARTR-seq enrichment level at the gene level
[0667] In some instances, to calculate the ARTR-seq enrichment at the gene level, the reads in one library can be divided into two groups by whether they were in one specific gene to have a pair of in/out read number for each of the IP and Input library. For each gene, two-by-two tables for all the combinations of in/out read number between IP and Input libraries can be generated. The ARTR-seq enrichment for a gene can be defined as the common odds ratio of the tables with significance determined by the Cochran-Mantel-Haenszel Chi-Squared test.
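The gene-level enrichment statistic can be illustrated with a short, dependency-free Python sketch of the Mantel-Haenszel common odds ratio and the Cochran-Mantel-Haenszel chi-squared test (1 degree of freedom). The function name `cmh_common_odds_ratio` and the table layout are assumptions made for illustration; each table is (IP in-gene, IP out-of-gene, Input in-gene, Input out-of-gene) for one IP/Input pairing. The continuity correction is an assumption as well, since the source does not say whether one was applied.

```python
import math

def cmh_common_odds_ratio(tables, correct=True):
    """Mantel-Haenszel common odds ratio and CMH chi-squared test.

    tables: iterable of (a, b, c, d) counts, one 2x2 table per IP/Input
            pairing: a = IP in gene, b = IP outside gene,
                     c = Input in gene, d = Input outside gene.
    Returns (odds_ratio, chi2_statistic, p_value).
    """
    num = den = diff = var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # table carries no information
        num += a * d / n
        den += b * c / n
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n                      # observed - expected
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    odds_ratio = num / den if den > 0 else math.inf
    if var == 0:
        return odds_ratio, 0.0, 1.0
    delta = abs(diff)
    yates = 0.5 if (correct and delta >= 0.5) else 0.0   # continuity correction
    stat = (delta - yates) ** 2 / var
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return odds_ratio, stat, p_value
```

With two IP replicates and two Input replicates, four tables (one per IP/Input combination) would be passed for each gene.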
Data visualization and statistical analysis
[0668] Read heatmaps and profiles were generated with plotHeatmap and plotProfile in the deepTools suite59 (v3.5.1). The splicing regulatory maps of splicing factors were generated by RBP-Maps64 with default parameters, and the coordinates of native cassette exons and constitutive exons were downloaded from the software's GitHub repository. Random regions of the same length as the m6A reader protein binding peaks were generated by bedtools shuffle in the BEDTools suite65 (v2.30.0).
[0669] The meta-distributions of binding peaks were generated by the R package Guitar66. All statistical analyses were performed with R67, and all the plots were generated by the R package ggplot268.
Quantification of ARTR-seq signal at the gene level
[0670] In some instances, to analyze G3BP1 binding strength at the gene level, ARTR-seq reads can be counted for genes in both G3BP1 and paired input samples, and fold changes and significance between G3BP1 and input can be determined by DESeq269. Only genes with a read sum equal to or greater than 10 for G3BP1 and input samples can be considered. RNA targets of G3BP1 can be defined as those with fold change > 2 and p-value < 0.05.
Clustering analysis of G3BP1 ARTR-seq signal
[0671] To track the changing pattern of the G3BP1 binding signal during stress granule assembly, the log2 fold change (G3BP1/input) of genes can be used to represent the G3BP1 binding signal, and fuzzy c-means clustering analysis can be performed on log2FC with the Mfuzz package70 (v2.54.0). Only genes in the top 50% by standard deviation (SD) of log2FC can be considered, and the log2FC values can be scaled by z score before clustering. The cluster number can be determined by the ‘Dmin’ function in the Mfuzz package. Clustering can be calculated by the ‘mfuzz’ function in the Mfuzz package with 10,000 iterations, using Euclidean distance as the distance metric.
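The gene filtering and scaling that precede clustering can be sketched in Python as follows. This illustrates only the preprocessing, not the Mfuzz clustering itself. The function name `select_and_scale` and the dictionary layout (gene mapped to its log2FC values across time points) are assumptions, as is the use of the sample standard deviation.

```python
import statistics

def select_and_scale(log2fc_by_gene, top_fraction=0.5):
    """Keep the most variable genes and z-score each gene's profile.

    log2fc_by_gene: dict mapping gene -> list of log2FC values
                    (one per time point, at least two values).
    Returns a dict gene -> z-scored list for the retained genes.
    """
    sds = {g: statistics.stdev(v) for g, v in log2fc_by_gene.items()}
    ranked = sorted(sds, key=sds.get, reverse=True)   # most variable first
    n_keep = max(1, int(len(ranked) * top_fraction))
    scaled = {}
    for gene in ranked[:n_keep]:
        sd = sds[gene]
        if sd == 0:
            continue  # a flat profile cannot be z-scored
        values = log2fc_by_gene[gene]
        mean = statistics.fmean(values)
        scaled[gene] = [(v - mean) / sd for v in values]
    return scaled
```

After scaling, every retained profile has mean 0 and SD 1, so the clustering compares the shape of the time course rather than its magnitude.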
Functional enrichment analysis
In some instances, KEGG enrichment analysis can be performed to compare G3BP1 RNA targets at different time points using the ‘compareCluster’ function in the clusterProfiler package71 (v4.4.4). The KEGG terms with adjusted p values less than 0.05 can be visualized.
EXAMPLE 2: Strategy and Development of ARTR-seq
[0672] ARTR-seq relies on in-situ RT to capture binding sites of specific RBPs. In designing ARTR-seq, the inventors started with formaldehyde fixation to rapidly freeze and preserve the cellular structure, followed by permeabilization of cell membranes to facilitate subsequent processing (FIG. 1A-I). The inventors then targeted the reverse transcriptase (RTase) to the RBP of interest with the guidance of specific antibodies (FIG. 1A-II). The inventors first delivered the primary antibody to bind the RBP through antigen-antibody interaction (FIG. 1A-II1). Then, the inventors incubated cells with a secondary antibody that can efficiently bind the fragment crystallizable (Fc) region of the primary antibody (FIG. 1A-II2). As multiple secondary antibodies could bind to a single primary antibody, the incorporation of the secondary antibody increased the local antibody concentration around the targeted RBP. The inventors next incubated cells with a fusion protein of protein A/G and reverse transcriptase (pAG-RTase); the specific binding of protein A/G (pAG) to the Fc regions on both primary and secondary antibodies would allow site-specific delivery of the tethered RTase to the target RBP (FIG. 1A-II3). pAG can interact with various types of antibodies and be easily expressed in bacterial systems with high yield, making it an ideal choice for fusion with RTase. Subsequent to each delivery of the primary antibody, the secondary antibody, and the pAG-RTase, the inventors conducted multiple wash steps to remove any unbound antibodies or pAG-RTase.
[0673] After localizing RTase to the RBP, the inventors initiated in-situ RT at RBP binding sites by the addition of primers, dNTPs and other components (FIG. 1A-III). To achieve efficient reverse transcription, the inventors screened three commonly used RTases, including engineered Moloney murine leukemia virus (MMLV) RTase (H8Y, D200N, T306K, W313F, T330P, D524G, L603W)24, 25, human immunodeficiency virus (HIV) RTase, and a truncated version of engineered MMLV RTase (25-497) in the pAG-RTase fusion constructs with a linker length of 30 amino acids (FIGs. 7A-B). The RNase H domain and the first 24 N-terminal residues were omitted in MMLV RTase (25-497) to improve its reactivity. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) was used to evaluate the properties of these pAG-RTase constructs. The inventors found pAG-MMLV RTase (25-497) exhibited the highest RT activity among the three and used this fusion construct for subsequent studies (FIG. 1B and FIG. 7C).
[0674] To identify all RBP binding sites without sequence bias, the inventors next applied random reverse transcription primers with an adapter tagged at their 5' ends for library construction. However, the commonly used random 6-mer primer, when tagged with the adapter, presented a noticeable reduction in reverse transcription efficiency. The inventors therefore increased the primer length to 10 nt (FIG. 7D). Moreover, for effective enrichment of cDNAs produced in ARTR-seq, the inventors tested biotinylated dNTPs that could be incorporated into the final cDNA products. After screening five commercially available biotinylated dNTPs, the inventors found that biotin-16-dUTP and biotin-16-dCTP exhibited the least hindrance on RT efficiency (FIG. 7E). The inventors proceeded with including biotin-16-dUTP and biotin-16-dCTP, in a 1:1 ratio with regular dTTP and dCTP, respectively, in the current ARTR-seq protocol. The inventors could enrich the biotinylated cDNAs with streptavidin beads, and perform 3' end adapter ligation of cDNAs, library amplification and high-throughput sequencing to acquire the binding profile of the RBP of interest (FIG. 1A-IV). Note that after in-situ reverse transcription, IF imaging could be performed to reveal subcellular localization of RBPs without disturbing the subsequent library construction if the secondary antibody and pAG-RTase delivered to the RBP are fluorophore-modified.
EXAMPLE 3: Validation of ARTR-seq
[0675] To evaluate ARTR-seq in capturing binding sites of RBPs, the inventors applied ARTR-seq to a well-known RBP, PTBP1. PTBP1 is a splicing factor with a variety of published CLIP-seq datasets that can be readily utilized for comparison. To confirm the production of biotinylated cDNA from in-situ RT, the inventors monitored the biotin group in the cDNA product by dot blot. cDNA biotinylation was mostly abolished with the omission of biotin-dNTP, pAG-RTase, or primary antibody, confirming the usefulness of each of these components for successful cDNA synthesis (FIG. 1C). Through further IF staining of biotinylated cDNA, whose signal largely disappeared upon exclusion of the primary antibody, the inventors also confirmed the colocalization of pAG-RTase, the secondary antibody and newly synthesized cDNA, supporting the localized RT reaction performed by pAG-RTase tethered to the RBP of interest (FIG. 1D and FIG. 7F). Note that the utilization of the secondary antibody led to an increased overall yield of biotinylated cDNA (FIG. 1D and FIGs. 7F and 7G).
[0676] Altogether, the inventors showed that ARTR-seq can specifically and effectively reverse transcribe RNAs nearby the targeted protein into biotinylated cDNA products.
[0677] The inventors next proceeded to test ARTR-seq on PTBP1 in 40,000 HepG2 and HeLa cells, respectively. The inventors compared ARTR-seq results with the published data from several known methods, namely CLIP, iCLIP, irCLIP, eCLIP, sCLIP, tRIP, LACE-seq and RT&Tag. By counting the usable reads, which were defined as reads uniquely mapped to the genome and remained after PCR deduplication, the inventors observed that ARTR-seq displayed a comparable or higher percentage of usable reads compared to all published methods, suggesting a high complexity of the ARTR-seq libraries (FIGs. 8A-8B). Then, the inventors calculated the correlation between biological replicates based on usable reads per gene normalized to coverage (reads per million reads mapped, RPM), and observed a high correlation (R= 0.98 for both HepG2 and HeLa samples), indicating good reproducibility of ARTR-seq (FIG. 2A).
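The replicate comparison described above (usable reads per gene normalized to RPM, followed by a correlation coefficient) can be sketched in plain Python. The function names `rpm`, `pearson`, and `replicate_correlation` are hypothetical. No log transform is applied because the source does not specify one, and restricting the comparison to genes counted in both replicates is an assumption.

```python
import math

def rpm(counts):
    """Convert per-gene read counts to reads per million mapped reads."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def replicate_correlation(rep1_counts, rep2_counts):
    """Correlate RPM values over genes present in both replicates."""
    r1, r2 = rpm(rep1_counts), rpm(rep2_counts)
    genes = sorted(set(r1) & set(r2))
    return pearson([r1[g] for g in genes], [r2[g] for g in genes])
```

Because RPM divides by library size, two replicates that differ only in sequencing depth give a correlation of 1.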
[0678] Further, the inventors introduced input samples prepared by ARTR-seq with the omission of the primary antibody as controls to help filter out potential background signals caused by the non-specific binding of RTase (FIG. 8C). In the case of PTBP1, the inventors found that over 70% of usable reads and over 80% of ARTR-seq peaks were annotated to introns, with the majority of exon peaks located within the 3' untranslated region (3' UTR), consistent with results obtained using other methods10, 12, 13, 27-30 (FIG. 2B and FIGs. 8D-8E). The consensus motif of PTBP1 ARTR-seq peaks was identified as the previously known canonical CU-rich sequence31 (FIG. 2B). At the genomic scale, the inventors plotted read distribution around the published eCLIP peaks32. ARTR-seq reads for PTBP1 were well aligned at the eCLIP peaks, while the input sample did not show such accumulation (FIGs. 9A-B). Additionally, the inventors observed that over 50% of genes identified by ARTR-seq overlapped with those targeted by other methods (52% for eCLIP, 51% for LACE-seq, and 82% for iCLIP). At the peak level, ARTR-seq successfully identified 41% of eCLIP-targeted peaks (FIG. 9C). Examination of individual PTBP1 binding sites revealed similar read distribution and density between ARTR-seq and eCLIP or iCLIP results (FIG. 2C and FIG. 9D). To further validate PTBP1 binding captured by ARTR-seq, the inventors knocked down PTBP1 in HepG2 cells using two distinct siRNAs and performed ARTR-seq (FIG. 8E). The reads located around the ARTR-seq peaks were reduced accordingly upon PTBP1 knockdown, indicating the high specificity of ARTR-seq (FIG. 2D).
Direct versus indirect binding sites detected by ARTR-seq
[0679] ARTR-seq identifies RBP binding by in-situ RT, which enables the capture of RNAs directly bound by the RBP (direct targets) or potentially those spatially close to the RBP (indirect targets) (FIG. 10A). To evaluate direct versus indirect targets, the inventors employed the splicing factor RBFOX2 as an example; RBFOX2 possesses a well-defined canonical binding motif ‘UGCAUG’. Peaks close to the ‘UGCAUG’ motifs likely represent direct targets, while those farther away have an increasing possibility of being indirect targets. The inventors calculated the distance from each peak center to the nearest ‘UGCAUG’ sequence, and observed that over 70% of ARTR-seq peaks were within 500 nucleotides (nts) of ‘UGCAUG’. This percentage of peaks for ARTR-seq is slightly higher than that of eCLIP9. The two methods are comparable when the distance is set to 200 nts (FIG. 10B). It is important to note that RBFOX2 can have other non-canonical binding sites beyond the ‘UGCAUG’ motif, as suggested by the similar ratio of RBFOX2 eCLIP peaks distant from this motif. Additionally, setting more stringent signal values and q-value cutoffs for peaks increased confidence in identifying the direct targets, albeit at the expense of target numbers (FIGs. 10C-10D). Furthermore, taking advantage of single-nucleotide-resolution m6A sequencing results offered by m6A-SAC-seq42, the inventors also examined YTHDF2, an m6A binding protein. The inventors observed that about 80% of YTHDF2 ARTR-seq peaks were within 300 nts of individual m6A sites, comparable to that from the PAR-CLIP method37 (FIG. 10E). These results all indicate that the indirect interactions captured in ARTR-seq are likely limited. The ratios of direct targets identified by ARTR-seq are comparable to those observed in CLIP-based methods.
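The peak-to-motif distance calculation can be illustrated with a minimal Python sketch. The function names `nearest_motif_distance` and `fraction_within` are hypothetical, and the sketch assumes the transcript-strand RNA sequence is available as a string and the peak center is a 0-based index into that string. Measuring distance to the nearest edge of the motif, with 0 for a center inside the motif, is an illustrative choice and not necessarily the convention used in the analysis.

```python
def nearest_motif_distance(sequence, peak_center, motif="UGCAUG"):
    """Distance (nt) from a peak center to the nearest motif occurrence.

    Returns 0 if the center lies inside a motif, or None if the motif
    does not occur in `sequence`.
    """
    best = None
    start = sequence.find(motif)
    while start != -1:
        end = start + len(motif) - 1          # last base of this occurrence
        if start <= peak_center <= end:
            return 0
        dist = start - peak_center if peak_center < start else peak_center - end
        if best is None or dist < best:
            best = dist
        start = sequence.find(motif, start + 1)   # allow overlapping matches
    return best

def fraction_within(distances, cutoff):
    """Fraction of peaks whose nearest-motif distance is <= cutoff."""
    if not distances:
        return 0.0
    hits = sum(1 for d in distances if d is not None and d <= cutoff)
    return hits / len(distances)
```

Calling `fraction_within(distances, 500)` and `fraction_within(distances, 200)` on the same list of distances gives the kind of percentages compared between methods in the text.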
[0680] To further interrogate potential indirect targets identified in ARTR-seq, the inventors limited the movement range of RTase by shortening the linker in pAG-RTase and omitting the secondary antibody (FIGs. 11A-11C). The inventors found shorter linkers reduced RT activity of pAG-RTase, implying that shorter linkers might slow the kinetics of RTase (FIG. 11D). In RBFOX2 ARTR-seq, shortening the linker or omitting the secondary antibody resulted in decreased biotinylated cDNA yield but slightly increased read accumulation at RBFOX2 ARTR-seq peaks, indicating reduced RT efficiency and concentrated ARTR-seq signals (FIGs. 11E-11G). Moreover, by calculating the distance from each peak center to the nearest ‘UGCAUG’, the inventors observed a slightly higher percentage (by 1.9% - 3.4%) of peaks within 500 nts of ‘UGCAUG’ when a shorter linker or the omission of the secondary antibody was applied (FIG. 11H). These findings indicate that restricting the RTase movement range as tested here only moderately reduced potential indirect RNA capture by ARTR-seq. Higher RT efficiency is another factor that needs to be considered when designing optimal linkers.
Resolution of ARTR-seq
[0681] To assess the resolution of ARTR-seq, the inventors examined the distribution of RBFOX2 peak centers around ‘UGCAUG’ sites, and observed a clear enrichment with the majority of peaks positioned within 200 nts flanking the ‘UGCAUG’ motif (FIG. 12A). Furthermore, the inventors conducted a parallel analysis on YTHDF2. Compared to RBFOX2, the inventors observed a similar but more enriched distribution of YTHDF2 around the corresponding m6A sites, further supporting ARTR-seq as a method that can capture direct targets and binding sites of RBPs (FIG. 12B).
[0682] In an attempt to improve the resolution of binding site identification by ARTR-seq, the inventors evaluated the impact of RNase treatment in RBFOX2 ARTR-seq. As expected, the stronger RNase treatment reduced the library fragment lengths (FIG. 13A). The inventors observed that the stronger RNase treatment led to a sharper enrichment of RBFOX2 ARTR-seq peaks around ‘UGCAUG’ sites, indicating an improved resolution of ARTR-seq upon RNase treatment (FIG. 13B). The inventors quantified RT efficiency through qPCR of biotinylated cDNA, and found that samples with the stronger RNase treatment exhibited lower RT efficiency (FIG. 13C). By calculating the distance from each peak center to the nearest ‘UGCAUG’, the inventors observed that the stronger RNase treatment resulted in a clearly decreased proportion of peaks located within 500 nts of the canonical ‘UGCAUG’ motif. This observation suggested that the application of RNase can reduce reads from direct target transcripts, thereby potentially elevating the ratio of non-specific or indirect-binding signals (FIG. 13D). Overall, the studies revealed that RNase treatment could improve the resolution of ARTR-seq. The strength of RNase treatment in ARTR-seq needs to be optimized to achieve the desired balance between resolution and sensitivity, especially for samples with limited starting materials.
ARTR-seq specifically detected PTBP1 binding sites with as few as 20 cells
[0683] The in-situ RT-based ARTR-seq bypasses the IP step to minimize sample loss, potentially making it feasible for low cell number samples. To test this, the inventors generated libraries for PTBP1 using different numbers of HepG2 cells and compared the results with published data from LACE-seq and RT&Tag of low cell number samples13, 22. The inventors found the correlations at the gene level remained strong for ARTR-seq libraries prepared from as few as 20 cells (FIG. 14A). Additionally, ARTR-seq libraries exhibited a much higher percentage of usable reads compared to other methods when using comparable numbers of cells (FIG. 2E and FIGs. 14B-14C). Furthermore, ARTR-seq presented a consistently high percentage of intronic reads for PTBP1, suggesting its effectiveness in capturing informative reads even with limited starting material (FIG. 14D). The inventors further subsampled the libraries from different numbers of cells to an equal sequencing depth and examined their read distribution at peaks identified in the corresponding bulk samples. Compared to LACE-seq, ARTR-seq exhibited a clearer accumulation at the center of peaks with a higher proportion of effective reads (FIG. 2F and FIG. 14E). Visible ARTR-seq signal remained stable for libraries with different numbers of cells as exemplified in the IGV plot (FIG. 2G).
[0684] Because PTBP1 binds to a canonical CU-rich sequence, the inventors compared the CT percentages in usable reads of PTBP1 libraries constructed by different methods. The inventors found that all the ARTR-seq libraries showed comparable or higher CT percentages compared to those of CLIP, iCLIP, eCLIP, irCLIP or LACE-seq10, 13, 27-29 (FIG. 2H). The inventors further assessed the read distribution around CU-rich regions and observed that the stable read accumulation in ARTR-seq libraries of all cell numbers peaked at the center of the regions (FIG. 2I). Taken together, ARTR-seq can effectively and specifically capture the RBP binding sites even with limited starting materials.
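The CT percentage used in this comparison can be computed as sketched below. The function name `ct_percentage` is hypothetical. The sketch assumes the reads have already been oriented to the transcript strand (with U written as T), and that the percentage is taken over all bases pooled across reads; the source does not state whether it was computed per base or per read.

```python
def ct_percentage(reads):
    """Percentage of C and T bases across a collection of read sequences.

    reads: iterable of DNA strings oriented to the transcript strand
           (an assumption; strand handling is library-specific).
    """
    total = 0
    ct = 0
    for read in reads:
        seq = read.upper()
        total += len(seq)
        ct += seq.count("C") + seq.count("T")
    return 100.0 * ct / total if total else 0.0
```

For instance, `ct_percentage(["CTCT", "AAGG"])` evaluates to 50.0, since four of the eight pooled bases are C or T.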
Application of ARTR-seq in mouse embryo sections
[0685] RBPs can have strong tissue-specific expression, or can be expressed only in certain tissues rather than in cultured cells. The identification of RBP binding sites in tissues is still technically challenging33. IP-based methods can require dissociating tissues into single cells to allow UV-crosslinking, which limits their application to whole tissues, particularly embedded frozen tissues or formalin-fixed tissues. Editing-based methods can require genetic modification and cannot be applied to patient tissues.
[0686] ARTR-seq offers an opportunity for identification of RBP binding sites from tissues. The inventors studied the splicing factor RBFOX2 with a section of OCT-embedded E11 mouse embryo to validate the feasibility of ARTR-seq in tissue samples (FIG. 3A). The inventors first confirmed that RBFOX2 was predominantly localized in the nucleus in mouse embryos with the IF imaging built into the ARTR-seq procedure (FIG. 3B). The ARTR-seq reads for mouse embryo tissue showed a high percentage of usable reads and good reproducibility at the gene level between biological replicates (FIGs. 15A-15B). Compared to the input, a higher percentage of usable reads from ARTR-seq of RBFOX2 were mapped to introns, consistent with the known binding preference of RBFOX232 (FIG. 15C). RBFOX2 binding peaks were mostly located in introns and contained the canonical ‘UGCAUG’ motif9 (FIG. 3C). In addition, the inventors calculated the percentage of usable reads containing the ‘UGCAUG’ sequence and found that mouse tissue samples displayed a similar percentage of the enriched motif to that of HepG2 cell samples, indicating a comparable signal detection efficiency of ARTR-seq for tissues and cultured cells (FIG. 3D). Examination of individual binding sites supported binding of the ‘UGCAUG’ sequences by RBFOX2 (FIG. 3E). Overall, ARTR-seq can identify RBP binding sites in embedded tissue samples with high specificity.
ARTR-seq profiles regulatory features of splicing factors
[0687] The previously mentioned RBPs, PTBP1 and RBFOX2, are well-known splicing factors, with PTBP1 belonging to the heterogeneous ribonucleoprotein (hnRNP) family34. To show broader applicability, the inventors also studied HNRNPC, another splicing factor belonging to the hnRNP family (FIG. 16A). Consistent with the binding preference of the splicing factors, both reads (over 70%) and peaks (over 80%) from the ARTR-seq libraries of all three splicing factors (PTBP1, HNRNPC, and RBFOX2) were mainly located in introns in HepG2 cells (FIGs. 4A-4B and FIG. 16B). The RNA binding motifs of RBFOX2 and HNRNPC are the canonical ‘UGCAUG’ and U-rich sequence, respectively, consistent with the previous report32 (FIGs. 4A-4B).
[0688] To explore how splicing factor binding is associated with splicing regulation, the inventors identified the alternative splicing (AS) events by comparing the ENCODE RNA-seq data from RBP-knockdown cells with that from control cells35. The inventors found most of the AS events were categorized as exon skipping (FIG. 4C). The inventors then generated ‘RNA splicing maps’ for exon skipping events, which plot the peak density on alternatively spliced exons upon RBP knockdown and their proximal introns2 (FIG. 4D). The corresponding ARTR-seq peaks were predominantly enriched at upstream proximal introns of the included exons for PTBP1, at downstream proximal introns of the excluded exons for RBFOX2, and at both upstream and downstream proximal introns of the included exons for HNRNPC, but not around native cassette exons and constitutive exons. The inventors quantified relative RBP binding strength by ARTR-seq enrichment at the gene level, and divided the genes into three groups of no, low or high ARTR-seq enrichment. The inventors observed that genes with higher ARTR-seq enrichment tended to present a higher splicing difference upon knockdown for all three splicing factors (FIG. 4E and FIG. 16C). In addition to SE, the number of included retained introns (RI) upon PTBP1 knockdown (491 events) outnumbered other splicing modes. The inventors further inspected the relationship between ARTR-seq enrichment and splicing difference of RIs (FIG. 16D) and found that higher enrichment corresponded to higher splicing inclusion differences of RIs, similar to the trend observed for SEs. Altogether, ARTR-seq robustly captures distinctive binding patterns for different splicing factors, and the ARTR-seq enrichment could indicate differences in splicing.
Distinct binding features of m6A reader proteins identified by ARTR-seq
[0689] In addition to recognizing specific sequences, RBPs can also recognize RNA targets in a chemical modification-dependent manner. m6A modification is the most prevalent chemical modification in mammalian mRNA, and m6A reader proteins can preferentially bind m6A-modified RNAs to regulate their processing and metabolism in both the nucleus and cytoplasm36-40. In addition to YTHDF2, the inventors performed ARTR-seq in HeLa cells for another cytosolic m6A reader, YTHDF1, and a nuclear reader, YTHDC1.
[0690] The inventors first verified the subcellular localization of the three readers with the built-in imaging step in ARTR-seq procedure (FIG. 17A). The sequencing data of ARTR-seq remained highly reproducible between replicates for all three proteins (FIG. 17B). Over 80% of the peaks of the two cytoplasmic m6A readers (YTHDF1 and YTHDF2) were located in exons, whereas ~ 81% of the peaks of nuclear reader YTHDC1 were located in introns or intergenic regions, consistent with their distinct subcellular localization features (FIG. 5A and FIGs. 17A and 17C). The high unique peak ratios observed for the three reader proteins (84.2% for YTHDC1, 34.3% for YTHDF1, and 47.5% for YTHDF2) can be explained by their unique subcellular localization; YTHDF1 and YTHDF2 display different sequences of the N-terminal low-complexity domain, which most likely affect their binding to different partner proteins and therefore different RNA targets41 (FIG. 17D). The inventors further investigated the much more abundant non-exonic peaks of YTHDC1, and found more than half of them located in repeat elements, with the most prevalent being long interspersed nuclear elements (LINEs) (~ 45%), consistent with a previous report40 (FIG. 5B). The inventors next examined the distribution of exonic peaks along mRNA and found that the profiles for all readers showed enrichment around stop codons, which resembles the meta profile of m6A modifications, especially for YTHDF1 and YTHDF242 (FIG. 5C and FIG. 17E).
[0691] Further, the inventors calculated the percentage of exonic peaks overlapping with m6A sites in polyadenylated RNA identified by m6A-SAC-seq42. The peaks for all three readers captured by ARTR-seq showed higher percentages than random peaks, also comparable to the YTHDF2 peaks captured by PAR-CLIP37, supporting the m6A-dependent binding features of these three readers (FIG. 5D). The inventors then analyzed the association between the m6A fraction and RBP binding strength. By dividing peaks into four groups based on the m6A fraction (sum value), the inventors observed that the group with higher m6A fractions showed higher RBP enrichment signals for YTHDF1 and YTHDF2, suggesting ARTR-seq can measure the relative binding strength of RBPs (FIG. 5E). However, the association was not strong for YTHDC1 (FIG. 17D). Most of the YTHDC1 peaks were located in introns that lack quantitative m6A seq data, resulting in a lower number of exon peaks being used for analysis, which can explain the reduced association. Overall, ARTR-seq captured different features of three m6A binding proteins in cytoplasm and nucleus.
Dynamic RNA binding of G3BP1 during stress granule assembly
[0692] Stress granules (SGs) are membraneless organelles composed of proteins and RNAs that form in response to stress. The RBP G3BP1 is the central node in the network of protein-RNA interaction during SG assembly43, 44. Under sodium arsenite (NaAsO2) treatment, SGs could be observed after 13 min with a progressive increase in size over time, with most of the SG assembly completed by 40 min, providing a rapid stress response45. However, whether RNA targets of G3BP1 vary during SG assembly has yet to be investigated.
[0693] Taking advantage of the potentially high temporal resolution offered by fast formaldehyde fixation and the low material requirements of ARTR-seq, the inventors performed ARTR-seq for G3BP1 in HeLa cells treated with 0.5 mM NaAsO2 and monitored the SG assembly process at time intervals of 0 min, 10 min, 20 min, and 60 min post stress. The inventors first visualized G3BP1 localization using IF imaging, and confirmed the gradual condensation of G3BP1 into granules over time (FIG. 6A). The colocalization of G3BP1 and biotinylated cDNAs produced in ARTR-seq was further verified (FIG. 6B). Subsequently, the same samples examined by imaging were used for ARTR-seq library construction and sequencing. The inventors then analyzed the sequencing data and determined G3BP1 binding strength by calculating the ARTR-seq log2 fold change (log2FC) between G3BP1 and input samples at the gene level. The inventors observed that ~78% of G3BP1 RNA targets (log2FC > 1, P-value < 0.05) were no longer enriched at 60 min (T60) post NaAsO2 treatment (FIG. 6C). SG enrichment of RNA was previously reported by sequencing RNAs separated from NaAsO2-induced SGs to quantify the relative degree of RNA SG localization46. Through integrative analysis, the inventors observed that G3BP1 targets at T60 showed significantly higher SG enrichment compared to the starting point (without stress, T0) (FIG. 6D). These results supported the accuracy of ARTR-seq and the distinct RNA binding of G3BP1 in the presence and absence of stress. The functions of stress-induced G3BP1 targets (T60_only) were enriched in KEGG pathways of protein processing in the endoplasmic reticulum (ER) and human papillomavirus (HPV) infection, consistent with previous results47, 48 (FIG. 6E).
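By way of non-limiting illustration, the gene-level binding strength and target-calling thresholds described above can be sketched as follows. This is a simplified sketch under stated assumptions: the library-size normalization, the pseudocount, and all function names are hypothetical, and the P-values are taken as given rather than computed (a dedicated differential analysis tool would normally supply them).

```python
import math


def log2_fold_change(ip_count, input_count, ip_total, input_total, pseudocount=1.0):
    """log2 ratio of library-size-normalized counts (RBP sample over input).

    The pseudocount avoids division by zero for genes with no input reads.
    """
    ip_cpm = (ip_count + pseudocount) / ip_total * 1e6
    input_cpm = (input_count + pseudocount) / input_total * 1e6
    return math.log2(ip_cpm / input_cpm)


def call_targets(gene_stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return genes passing the thresholds stated in the text.

    gene_stats: dict mapping gene name -> (log2FC, P-value)
    """
    return {
        gene
        for gene, (lfc, p_value) in gene_stats.items()
        if lfc > lfc_cutoff and p_value < p_cutoff
    }


def fraction_lost(targets_start, targets_end):
    """Fraction of the starting targets that are no longer targets at the end."""
    if not targets_start:
        return 0.0
    return len(targets_start - targets_end) / len(targets_start)
```

Comparing the target sets called at T0 and T60 with `fraction_lost` gives the kind of percentage reported for targets that are no longer enriched after stress.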
[0694] To further explore the dynamic RNA targeting of G3BP1 over time, the inventors calculated pairwise correlations of the G3BP1 binding strength among time points. The correlation coefficients were generally low (R = 0.38-0.57), suggesting distinct G3BP1 binding at different time intervals (FIG. 18A). RNAs were previously classified into SG-enriched RNAs and SG-depleted RNAs according to their SG enrichment46. The inventors found that during SG assembly, the G3BP1 binding strength from ARTR-seq gradually increased for SG-enriched RNAs, and decreased for SG-depleted RNAs, suggesting a shift of G3BP1 targets towards SG-enriched RNAs as SGs assemble (FIGs. 6F-6G). A portion of RNAs captured in ARTR-seq displayed stable G3BP1 binding, while others showed dynamic G3BP1 binding across time intervals (FIG. 6H and FIGs. 18B-18C). The inventors then grouped these RNAs based on G3BP1 binding strength using the fuzzy c-means clustering algorithm. The inventors found that G3BP1 binding strength for these RNAs displayed not only unidirectional trajectories of increasing or decreasing, but also transient changes during a 60-minute period of NaAsO2 treatment, suggesting rapid and dynamic responses of cells to stress (FIGs. 6H-6I and FIG. 18D). Taken together, the ARTR-seq performed along the stress progression showcased the highly dynamic nature of G3BP1-RNA interaction during SG assembly. The results also indicate that ARTR-seq allows capturing temporal changes of protein-RNA interactions on a short timescale with limited starting material.
[0695] In summary, the inventors have created methods and compositions that can be utilized in an assay of reverse transcription-based RBP binding sites sequencing (ARTR-seq), which relies on in-situ reverse transcription of RBP-bound RNAs guided by antibodies to identify RBP binding sites. ARTR-seq avoids ultraviolet cross-linking and immunoprecipitation, allowing for efficient and specific identification of RBP binding sites from as few as 20 cells or a tissue section. Taking advantage of rapid formaldehyde fixation, ARTR-seq enables capturing the dynamic binding of RBPs over a short period of time, as demonstrated by the discovery of dynamic RNA binding of G3BP1 during stress granule assembly on a timescale as short as 10 min.
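By way of non-limiting illustration, the pairwise correlation of binding strength among time points and the soft cluster assignment used in fuzzy c-means, both described in paragraph [0694], can be sketched as follows. This is a minimal sketch, not the inventors' pipeline: the function names are hypothetical, the fuzzifier m = 2 is an assumed default, and only the membership step of fuzzy c-means is shown (the iterative update of cluster centers is omitted).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / (sd_x * sd_y)


def pairwise_correlations(profiles):
    """profiles: dict mapping time point -> per-gene binding strengths (same gene order)."""
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }


def fuzzy_memberships(point, centers, m=2.0):
    """Membership of one trajectory in each cluster (fuzzy c-means membership step).

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), where d_i is the Euclidean
    distance from the point to center i. Memberships sum to 1.
    """
    dists = [math.dist(point, center) for center in centers]
    zeros = [d == 0 for d in dists]
    if any(zeros):
        # The point coincides with one or more centers: share membership among them.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
```

Because memberships are graded rather than exclusive, a transcript whose binding trajectory lies between two cluster centers is not forced into either, which suits trajectories with transient changes.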
Data availability
[0696] All the sequencing data generated in this study have been deposited in NCBI's Gene Expression Omnibus (GEO) under the accession number GSE226161. Previously published data from CLIP-seq27, eCLIP29, iCLIP28, irCLIP10, LACE-seq13, sCLIP11, tRIP-seq12 and RT&Tag22 are available under accession numbers of GSE42701, GSE92205, E-MTAB-3108, GSE78832, GSE137925, GSE92995, DRA005743 and GSE195654, respectively. The data were downloaded and processed as described in the articles. The PTBP1, RBFOX2 and HNRNPC knockdown RNA-Seq data were downloaded from ENCODE portal32 under the accession numbers of ENCSR052IYH, ENCSR305XWT, ENCSR634KBO, ENCSR572FFX, ENCSR767LLP, ENCSR104ABF, ENCSR336DFS, ENCSR667PLJ, ENCSR064DXG, ENCSR603TCV, ENCSR527IVX, ENCSR129RWD. The published PAR-CLIP data and the corresponding peaks for YTHDF2 are available under the GEO accession number of GSE49339. The m6A modification sites list identified by m6A-SAC-seq is available under the GEO accession number of GSE198246.
Code availability
[0697] Code for processing ARTR-seq data is available in the following GitHub repository: https://github.com/mingming-cgz/ARTR-seq.
EXAMPLE 4: Multiplex ARTR-seq
[0698] A multiplexed ARTR-seq approach was developed to simultaneously detect the binding of multiple RNA binding proteins (RBPs) within a single sample. In the design of multiplexed ARTR-seq (FIG. 19), unique DNA barcodes (with alkyne) are covalently ligated to RBP-specific antibodies (Adapter AB in FIG. 19), allowing for decoding the RBP targets during next-generation sequencing (NGS). Multiple DNA-barcoded antibodies that recognize corresponding RBPs were added to the assay system, followed by the application of a secondary antibody and protein A/G-reverse transcriptase (pAG-RTase). After removing unbound pAG-RTase, in situ reverse transcription (RT) is initiated at the binding sites of RBPs by adding azide-labeled random RT primers (NNNN-N3), biotinylated dNTP, and other components.
[0699] To link the antibody barcodes to their corresponding cDNA products, an alkyne group was incorporated into the DNA barcodes of RBP antibodies, enabling in situ copper-catalyzed azide-alkyne cycloaddition (CuAAC) to ligate antibody barcode oligos specifically with their proximal cDNA products. After the biotin enrichment of these cDNAs, the inventors performed adapter ligation for library construction. The heterocycle generated during CuAAC can be processed by Klenow Fragment (3'→5' exo-) DNA polymerase for second-strand synthesis. After library amplification and NGS, RBP binding sites are deconvoluted based on their specific barcode sequences.
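By way of non-limiting illustration, the deconvolution of reads by antibody barcode can be sketched as follows. This is a minimal sketch under stated assumptions: the barcode sequences, the placement of the barcode at the start of the read, and the one-mismatch tolerance are hypothetical choices for the example and are not taken from the disclosure.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_rbp(read, barcodes, max_mismatches=1):
    """Assign a read to an RBP by its leading antibody barcode.

    barcodes: dict mapping barcode sequence -> RBP name (hypothetical values).
    Returns the RBP name, or None if no barcode matches within the tolerance
    or if two barcodes match equally well (ambiguous read).
    """
    best_rbp = None
    best_dist = max_mismatches + 1
    ambiguous = False
    for barcode, rbp in barcodes.items():
        if len(read) < len(barcode):
            continue  # read too short to carry this barcode
        dist = hamming(read[: len(barcode)], barcode)
        if dist < best_dist:
            best_rbp, best_dist, ambiguous = rbp, dist, False
        elif dist == best_dist and dist <= max_mismatches:
            ambiguous = True
    return None if ambiguous else best_rbp
```

Discarding ambiguous reads trades a small loss of depth for cleaner per-RBP signal; barcode sets designed with a larger pairwise distance would make such ties rare.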
[0700] The encouraging results from these experiments indicate that related approaches could also be envisioned with the goal of simultaneously mapping the binding sites of multiple RBPs using DNA-barcoded antibodies that recognize the corresponding RBPs.
EXAMPLE 5: Enhanced Spatial ARTR-seq for RNA modifications
[0701] Over 150 distinct chemical modifications occur on RNA molecules, impacting various aspects of gene expression, such as RNA decay and translation. These modifications also play critical roles in physiology and disease. Notably, N6-methyladenosine (m6A) stands out as the most prevalent modification in mammalian mRNA and chromatin-associated RNA (caRNA), with close associations with disorders and cancers. In addition, m6A modification exhibits distinct tissue-specific distribution. Measuring transcriptome-wide m6A at single-cell and spatial resolution allows a deeper understanding of epitranscriptomic regulation in heterogeneous cell types within tissues. However, existing methods enable transcriptome-wide m6A exploration at the bulk or single-cell level, but they lack spatial information. Identifying spatial m6A distribution within tissues at high spatial resolution remains a challenge.
[0702] Next, it was tested whether Spatial ARTR-seq could be used to map RNA modifications spatially, using the m6A modification as an example. Building on the ARTR-seq method, which detects binding sites of RNA binding proteins through in situ reverse transcription, and on deterministic barcoding in tissue (DBiT) technology, which uses microfluidic chips with parallel channels directly placed against a fixed tissue slide for barcoding, spatial m6A-ARTR-seq was attempted for de novo spatial profiling of m6A modifications across the transcriptome.
[0703] FIG. 20 provides a schematic design of spatial m6A-ARTR-seq. OCT-embedded tissue sections are fixed with formaldehyde. After permeabilization, m6A modifications are targeted using an m6A-specific antibody, followed by the application of a secondary antibody and the protein A/G-reverse transcriptase (pAG-RTase) fusion protein, locating RTase at m6A sites. In situ reverse transcription is then initiated by the addition of RT components. Spatial barcoding is achieved via a microfluidic device with two PDMS chips featuring multiple parallel microchannels, delivering horizontal (A1-An) and vertical (B1-Bn) barcodes sequentially to generate a unique 2D barcode array. After imaging, the tissues are digested for downstream cDNA enrichment and library preparation, followed by high-throughput sequencing to decode spatial m6A distribution within tissues.
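By way of non-limiting illustration, the 2D barcode array formed by crossing the two barcode sets can be sketched as a lookup from a barcode pair to a pixel position. This is a minimal sketch: the barcode sequences are placeholders, and the convention that the A barcode gives the row and the B barcode gives the column is an assumption for the example.

```python
def build_pixel_lookup(a_barcodes, b_barcodes):
    """Map every (A barcode, B barcode) pair to a (row, column) pixel.

    A barcodes are delivered through one set of parallel channels and
    B barcodes through the perpendicular set, so each crossing point
    carries a unique pair. Assumed here: A index = row, B index = column.
    """
    return {
        (a, b): (row, col)
        for row, a in enumerate(a_barcodes)
        for col, b in enumerate(b_barcodes)
    }


def locate_read(a_barcode, b_barcode, lookup):
    """Return the pixel for a read's barcode pair, or None if the pair is unknown."""
    return lookup.get((a_barcode, b_barcode))
```

With n channels in each direction, the lookup holds n × n pixels, so the number of resolvable positions grows quadratically with the number of barcodes synthesized.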
[0704] In addition to the microfluidic system, many other spatial methods could be used as well. For instance, once cDNA is generated, the cDNAs could be recognized by nucleic acid probes and imaged with methods such as MERFISH or STARmap, or any other method that can image or sequence cDNAs in a spatial manner. DBiT is presented only as one example of the approach. These approaches could also be used to map RBP binding sites. Such RBPs include ribosomes, so that translation can be spatially mapped.
[0705] This procedure was tested experimentally using m6A-ARTR-seq. The m6A-ARTR-seq was tested in HeLa cells. Immunofluorescence (IF) staining showed that m6A was predominantly localized in the cytoplasm, with strong colocalization of pAG-RTase and the secondary antibody, and their signals largely disappeared when the m6A antibody was omitted, demonstrating the high specificity of pAG-RTase targeting (FIG. 21A). Correlation analysis between biological replicates showed high reproducibility of m6A-ARTR-seq, with a Pearson's correlation coefficient of 0.999 (FIG. 21B). At the peak level, approximately 80% of m6A peaks were shared between replicates, with enrichment around stop codon regions (FIGs. 21C-D). The majority of m6A peaks were annotated to exonic regions and associated with the canonical consensus sequence of 'GGACU', consistent with previous reports (FIG. 21E). Additionally, comparison of individual m6A sites revealed similar distribution between m6A-ARTR-seq, m6A-SAC-seq, and GLORI (FIG. 21F). Collectively, these findings confirm the robust performance of m6A-ARTR-seq in detection.
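By way of non-limiting illustration, a simple check for the 'GGACU' consensus within peak sequences can be sketched as follows. This is a minimal sketch and not a substitute for de novo motif discovery: the function names are hypothetical, and it assumes peak sequences are supplied on the transcript strand, in either DNA (T) or RNA (U) alphabet.

```python
def has_motif(sequence, motif="GGACU"):
    """True if the peak sequence contains the motif (DNA input is read as RNA)."""
    rna = sequence.upper().replace("T", "U")
    return motif in rna


def motif_fraction(sequences, motif="GGACU"):
    """Fraction of peak sequences containing at least one motif occurrence."""
    if not sequences:
        return 0.0
    return sum(has_motif(seq, motif) for seq in sequences) / len(sequences)
```

Comparing this fraction against the same statistic for length-matched background sequences would indicate whether the motif is enriched rather than merely present.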
[0706] To further benchmark this method for spatial m6A profiling, spatial m6A-ARTR-seq was applied to sagittal tissue sections of embryonic day 11 (E11) mouse embryos using a microfluidic device with a pixel resolution of 50 μm (FIG. 22A). Simultaneously, m6A-ARTR-seq was applied to E14 mouse embryonic stem cells (mESCs), and the m6A distribution was compared between E11 tissues and mESCs. Correlation analysis showed high reproducibility between replicates from adjacent sections, with a Pearson correlation coefficient of 0.98 (FIG. 22B). In contrast, the correlation between E11 tissues and mESCs was much lower, as reflected by Pearson correlation coefficients of about 0.38, suggesting a dramatic m6A distinction. At the m6A peak level, only 34% of peaks in E11 tissue overlapped with those in mESCs, with the canonical 'GGACU' motif observed in the overlapped peaks (FIG. 22C). Analysis of exonic peak distribution along mRNA showed less enrichment of m6A at the stop codon and higher enrichment in the 5' UTR region for E11 tissues (FIG. 22D). The genome browser snapshots showcased the specific m6A sites in Fat4 for E11 tissue, Akap12 for mESC and shared m6A modifications in Arhgap5 (FIG. 22E). The spatial m6A-ARTR-seq detected an average of 2359 unique molecular identifiers (UMIs) and 1594 genes per pixel (FIG. 22F). Unsupervised clustering identified 16 m6A clusters, and the spatial Uniform Manifold Approximation and Projection (UMAP) closely aligned with the histology from an adjacent hematoxylin and eosin (H&E) stained section, demonstrating distinct m6A distribution within mouse embryo tissues and the ability of m6A modification to reveal the subtle tissue structures (FIGs. 22A-22G).
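By way of non-limiting illustration, the per-pixel averages of unique molecular identifiers (UMIs) and genes can be sketched as follows. This is a minimal sketch under stated assumptions: the record layout of (pixel, gene, UMI) per aligned read is hypothetical, UMIs are collapsed by exact match within a gene and pixel, and pixels with no reads are not counted in the averages.

```python
from collections import defaultdict


def per_pixel_summary(records):
    """Return (mean UMIs per pixel, mean genes per pixel).

    records: iterable of (pixel, gene, umi) tuples, one per aligned read.
    Reads sharing the same pixel, gene, and UMI are counted once.
    """
    molecules = defaultdict(set)  # pixel -> set of (gene, umi) pairs
    genes = defaultdict(set)      # pixel -> set of genes detected
    for pixel, gene, umi in records:
        molecules[pixel].add((gene, umi))
        genes[pixel].add(gene)

    n_pixels = len(molecules)
    if n_pixels == 0:
        return 0.0, 0.0
    mean_umis = sum(len(m) for m in molecules.values()) / n_pixels
    mean_genes = sum(len(g) for g in genes.values()) / n_pixels
    return mean_umis, mean_genes
```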
[0707] Spatial m6A profiling was further extended to a coronal mouse brain section (FIG. 23A). Fluorescence imaging of the fluorophore-labeled secondary antibody revealed the overall m6A distribution in the same spatial section, which was subsequently used for downstream sequencing (FIG. 23B). On average, 2489 UMIs and 1664 genes were captured in each pixel, with the UMI map closely mirroring the overall m6A distribution detected via imaging (FIGs. 23B-23C). Unsupervised clustering of the gene-by-pixel matrix unveiled 20 spatially organized m6A clusters, whose spatial distribution closely aligned with the anatomical annotations of a similar brain section in the Allen Mouse Brain Atlas (FIG. 23D). Furthermore, read distributions along mouse mRNA showed clear enrichment at the stop codon region in both biological replicates, recapitulating the canonical m6A distribution observed in bulk samples and further confirming the reproducibility and specificity of spatial m6A-ARTR-seq in mouse brain (FIG. 23E). To investigate brain region-specific m6A modifications, the 20 spatial m6A clusters were grouped into 9 brain regions and their m6A levels were compared (FIG. 23F). The m6A signal of Cbln1, which encodes a secreted glycoprotein regulated by the m6A reader protein YTHDF3 to affect synaptic transmission, was dominantly distributed in the thalamus (TH), particularly in the parafascicular nucleus (PF) (FIG. 23G). Another example is Zbtb20, which regulates hippocampus development and whose m6A signal was mainly enriched in the dentate gyrus (DG) (FIG. 4H). Taken together, these findings demonstrate that spatial m6A-ARTR-seq provides a high-resolution map of m6A modifications in the mouse brain, revealing region-specific m6A distributions and highlighting the potential of this method to uncover spatially organized epitranscriptomic regulation in complex tissues.
* * *
[0708] All of the methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of certain aspects, it will be apparent to those of skill in the art that variations can be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related can be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
SEQUENCES
[Sequence listing figures imgf000137_0001 through imgf000149_0001]
REFERENCES
[0709] All references cited herein, including patent applications, patent publications, and Accession numbers, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically herein incorporated by reference in their entirety, as if each individual reference were specifically and individually indicated to be incorporated by reference.
[0710] 1. Gerstberger, S., Hafner, M. & Tuschl, T. A census of human RNA-binding proteins. Nat Rev Genet 15, 829-845 (2014).
[0711] 2. Gebauer, F., Schwarzl, T., Valcarcel, J. & Hentze, M.W. RNA-binding proteins in human genetic disease. Nat Rev Genet 22, 185-198 (2021).
[0712] 3. Lerner, M.R. & Steitz, J. A. Antibodies to small nuclear RNAs complexed with proteins are produced by patients with systemic lupus erythematosus. Proceedings of the National Academy of Sciences 76, 5495-5499 (1979).
[0713] 4. Tenenbaum, S. A., Carson, C.C., Lager, P. J. & Keene, J.D. Identifying mRNA subsets in messenger ribonucleoprotein complexes by using cDNA arrays. Proceedings of the National Academy of Sciences 97, 14085-14090 (2000).
[0714] 5. Ule, J. et al. CLIP Identifies Nova-Regulated RNA Networks in the Brain. Science 302, 1212-1215 (2003).
[0715] 6. Licatalosi, D.D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464-469 (2008).
[0716] 7. Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141 (2010).
[0717] 8. Konig, J. et al. iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol 17, 909-915 (2010).
[0718] 9. Van Nostrand, E.L. et al. Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods 13, 508-514 (2016).
[0719] 10. Zarnegar, B.J. et al. irCLIP platform for efficient characterization of protein-RNA interactions. Nat Methods 13, 489-492 (2016).
[0720] 11. Kargapolova, Y., Levin, M., Lackner, K. & Danckwardt, S. sCLIP-an integrated platform to study RNA-protein interactomes in biomedical research: identification of CSTF2tau in alternative processing of small nuclear RNAs. Nucleic Acids Res 45, 6074-6086 (2017).
[0721] 12. Masuda, A. et al. tRIP-seq reveals repression of premature polyadenylation by co-transcriptional FUS-U1 snRNP assembly. EMBO Rep 21, e49890 (2020).
[0722] 13. Su, R. et al. Global profiling of RNA-binding protein target sites by LACE-seq. Nat Cell Biol 23, 664-675 (2021).
[0723] 14. Blue, S.M. et al. Transcriptome-wide identification of RNA-binding protein binding sites using seCLIP-seq. Nat Protoc 17, 1223-1265 (2022).
[0724] 15. Lorenz, D.A. et al. Multiplexed transcriptome discovery of RNA-binding protein binding sites by antibody-barcode eCLIP. Nat Methods 20, 65-69 (2023).
[0725] 16. McMahon, A.C. et al. TRIBE: Hijacking an RNA-Editing Enzyme to Identify Cell-Specific Targets of RNA-Binding Proteins. Cell 165, 742-753 (2016).
[0726] 17. Brannan, K.W. et al. Robust single-cell discovery of RNA targets of RNA-binding proteins and ribosomes. Nat Methods 18, 507-519 (2021).
[0727] 18. Nguyen, D.T.T. et al. HyperTRIBE uncovers increased MUSASHI-2 RNA binding activity and differential regulation in leukemic stem cells. Nat Commun 11, 2026 (2020).
[0728] 19. Xu, W., Rahman, R. & Rosbash, M. Mechanistic implications of enhanced editing by a HyperTRIBE RNA-binding protein. RNA 24, 173-182 (2018).
[0729] 20. Flamand, M.N., Ke, K., Tamming, R. & Meyer, K.D. Single-molecule identification of the target RNAs of different RNA binding proteins simultaneously in cells. Genes Dev 36, 1002-1015 (2022).
[0730] 21. Meyer, K.D. DART-seq: an antibody-free method for global m6A detection. Nat Methods 16, 1275-1280 (2019).
[0731] 22. Khyzha, N., Henikoff, S. & Ahmad, K. Profiling RNA at chromatin targets in situ by antibody-targeted tagmentation. Nat Methods (2022).
[0732] 23. Kaya-Okur, H.S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun 10, 1930 (2019).
[0733] 24. Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149-157 (2019).
[0734] 25. Potter, R. J., & Rosenthal, K. High fidelity reverse transcriptases and uses thereof. US Patent No. US7056716B2. June 6, 2006.
[0735] 26. Oscorbin, I.P. & Filipenko, M.L. M-MuLV reverse transcriptase: Selected properties and improved mutants. Comput Struct Biotechnol J 19, 6315-6327 (2021).
[0736] 27. Xue, Y. et al. Direct conversion of fibroblasts to neurons by reprogramming PTB-regulated microRNA circuits. Cell 152, 82-96 (2013).
[0737] 28. Coelho, M.B. et al. Nuclear matrix protein Matrin3 regulates alternative splicing and forms overlapping regulatory networks with PTB. EMBO J 34, 653-668 (2015).
[0738] 29. Consortium, E.P. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).
[0739] 30. Fred, R.G., Tillmar, L. & Welsh, N. The role of PTB in insulin mRNA stability control. Curr Diabetes Rev 2, 363-366 (2006).
[0740] 31. Xue, Y. et al. Genome-wide analysis of PTB-RNA interactions reveals a strategy used by the general splicing repressor to modulate exon inclusion or skipping. Mol Cell 36, 996-1006 (2009).
[0741] 32. Van Nostrand, E.L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711-719 (2020).
[0742] 33. Hafner, M. et al. CLIP and complementary methods. Nature Reviews Methods Primers 1 (2021).
[0743] 34. Dvinge, H. Regulation of alternative mRNA splicing: old players and new perspectives. FEBS Lett 592, 2987-3006 (2018).
[0744] 35. Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res 48, D882-D889 (2020).
[0745] 36. Shi, H., Wei, J. & He, C. Where, When, and How: Context-Dependent Functions of RNA Methylation Writers, Readers, and Erasers. Mol Cell 74, 640-650 (2019).
[0746] 37. Wang, X. et al. N6-methyladenosine-dependent regulation of messenger RNA stability. Nature 505, 117-120 (2014).
[0747] 38. Wang, X. et al. N6-methyladenosine Modulates Messenger RNA Translation Efficiency. Cell 161, 1388-1399 (2015).
[0748] 39. Roundtree, I. A. et al. YTHDC1 mediates nuclear export of N6-methyladenosine methylated mRNAs. Elife 6 (2017).
[0749] 40. Liu, J. et al. N6-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science 367, 580-586 (2020).
[0750] 41. Zou, Z., Sepich-Poore, C., Zhou, X., Wei, J. & He, C. The mechanism underlying redundant functions of the YTHDF proteins. Genome Biol 24, 17 (2023).
[0751] 42. Ge, R. et al. m6A-SAC-seq for quantitative whole transcriptome m6A profiling. Nat Protoc (2022).
[0752] 43. Yang, P. et al. G3BP1 Is a Tunable Switch that Triggers Phase Separation to Assemble Stress Granules. Cell 181, 325-345 e328 (2020).
[0753] 44. Protter, D.S.W. & Parker, R. Principles and Properties of Stress Granules. Trends Cell Biol 26, 668-679 (2016).
[0754] 45. Wheeler, J.R., Matheny, T., Jain, S., Abrisch, R. & Parker, R. Distinct stages in stress granule assembly and disassembly. Elife 5 (2016).
[0755] 46. Khong, A. et al. The Stress Granule Transcriptome Reveals Principles of mRNA Accumulation in Stress Granules. Mol Cell 68, 808-820 e805 (2017).
[0756] 47. Chou, R.H. & Huang, H. Sodium arsenite suppresses human papillomavirus-16 E6 gene and enhances apoptosis in E6-transfected human lymphoblastoid cells. J Cell Biochem 84, 615-624 (2002).
[0757] 48. Sun, H. et al. Sodium Arsenite-Induced Learning and Memory Impairment Is Associated with Endoplasmic Reticulum Stress-Mediated Apoptosis in Rat Hippocampus. Front Mol Neurosci 10, 286 (2017).
[0758] 49. Henikoff, S. & Ahmad, K. In situ tools for chromatin structural epigenomics. Protein Sci 31, e4458 (2022).
[0759] 50. Lopes, I., Altab, G., Raina, P. & de Magalhaes, J.P. Gene Size Matters: An Analysis of Gene Length in the Human Genome. Front Genet 12, 559998 (2021).
[0760] 51. Irgen-Gioro, S., Yoshida, S., Walling, V. & Chong, S. Fixation can change the appearance of phase separation in living cells. Elife 11 (2022).
[0761] 52. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10-12 (2011).
[0762] 53. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359 (2012).
[0763] 54. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013).
[0764] 55. Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res 27, 491-499 (2017).
[0765] 56. Graubert, A., Aguet, F., Ravi, A., Ardlie, K.G. & Getz, G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts. Bioinformatics 37, 3048-3050 (2021).
[0766] 57. Liao, Y., Smyth, G.K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923-930 (2014).
[0767] 58. Robinson, J.T. et al. Integrative genomics viewer. Nat Biotechnol 29, 24-26 (2011).
[0768] 59. Ramirez, F., Dundar, F., Diehl, S., Gruning, B.A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res 42, W187-191 (2014).
[0769] 60. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137 (2008).
[0770] 61. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-589 (2010).
[0771] 62. Kent, W.J. et al. The human genome browser at UCSC. Genome Res 12, 996-1006 (2002).
[0772] 63. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10 (2021).
[0773] 64. Yee, B.A., Pratt, G. A., Graveley, B.R., Van Nostrand, E.L. & Yeo, G.W. RBP-Maps enables robust generation of splicing regulatory maps. RNA 25, 193-204 (2019).
[0774] 65. Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842 (2010).
[0775] 66. Cui, X. et al. Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic Features. Biomed Res Int 2016, 8367534 (2016).
[0776] 67. R Core Team. R: A language and environment for statistical computing. (2013).
[0777] 68. Wickham, H. ggplot2: Elegant Graphics for Data Analysis. New York, NY: Springer (2009).
[0778] 69. Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
[0779] 70. Kumar, L. & Futschik, M.E. Mfuzz: a software package for soft clustering of microarray data. Bioinformation 2, 5-7 (2007).
[0780] 71. Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb) 2, 100141 (2021).
[0781] 72. Rosenberg AB, Roco CM, Muscat RA, Kuchina A, Sample P, Yao Z, Graybuck LT, Peeler DJ, Mukherjee S, Chen W, Pun SH, Sellers DL, Tasic B, Seelig G. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science. 2018 Apr 13;360(6385):176-182. doi: 10.1126/science.aam8999.
[0782] 73. Lee JH, Daugharthy ER, Scheiman J, Kalhor R, Yang JL, Ferrante TC, Terry R, Jeanty SS, Li C, Amamoto R, Peters DT, Turczyk BM, Marblestone AH, Inverso SA, Bernard A, Mali P, Rios X, Aach J, Church GM. Highly multiplexed subcellular RNA sequencing in situ. Science. 2014 Mar 21;343(6177):1360-3. doi: 10.1126/science.1250212.

CLAIMS What is claimed is:
1. A polypeptide construct comprising: a) a targeting moiety; and b) a reverse transcriptase enzyme, or a functional variant thereof.
2. The polypeptide construct of claim 1, wherein the targeting moiety is a Fc binding protein or variant thereof, an antibody or variant thereof, an oligonucleotide or variant thereof, a receptor or variant thereof, a ligand, a small molecule, an aptamer, a nucleoside, or any combination thereof.
3. The polypeptide construct of claim 2, wherein the targeting moiety comprises a Fc binding protein or a variant thereof.
4. The polypeptide construct of claim 2, wherein the targeting moiety comprises an antibody or variant thereof.
5. The polypeptide construct of claim 2, wherein the targeting moiety comprises an oligonucleotide or a variant thereof.
6. The polypeptide construct of claim 5, wherein the oligonucleotide comprises a barcode, indices, affinity tag, label, a modified nucleotide, or any combination thereof.
7. The polypeptide construct of claim 6, wherein the affinity tag comprises a streptavidin, or an avidin tag.
8. The polypeptide construct of claim 2, wherein the targeting moiety comprises a small molecule.
9. The polypeptide construct of claim 3, wherein the Fc binding protein comprises protein A, protein G, protein A/G (pAG), protein L, anti-rabbit IgG, anti-mouse IgG, or a variant thereof, or any combination thereof.
10. The polypeptide construct of claim 9, wherein the Fc binding protein comprises pAG.
11. The polypeptide construct of claim 9, wherein the Fc binding protein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 8, 10, and 12, or an amino acid sequence at least 60% identical thereto.
12. The polypeptide construct of claim 1, wherein the reverse transcriptase comprises Moloney murine leukemia virus (MMLV) RTase, human immunodeficiency virus (HIV) RTase, Avian Myeloblastosis Virus (AMV) RTase or a functional variant thereof.
13. The polypeptide construct of claim 1, wherein the reverse transcriptase protein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 2, 4, and 6, or an amino acid sequence at least 60% identical thereto, or a functional variant thereof.
14. The polypeptide construct of claim 1, further comprising one or more linker sequences directly or indirectly bound to the targeting moiety and the reverse transcriptase.
15. The polypeptide construct of claim 1, wherein the one or more linker sequences are at least, equal to, or at most, 2-100 amino acids in length.
16. The polypeptide construct of claim 15, wherein the one or more linker sequences are 2-100 amino acids in length, 2-10 amino acids in length, 11-20 amino acids in length, 21-30 amino acids in length, 31-40 amino acids in length, 41-50 amino acids in length, 51-60 amino acids in length, 61-70 amino acids in length, 71-80 amino acids in length, 81-90 amino acids in length, or 91-100 amino acids in length.
17. The polypeptide construct of claim 16, wherein the linker comprises an amino acid sequence as set forth in SEQ ID NO: 28, or a sequence at least 80% identical thereto.
18. The polypeptide construct of claim 1, further comprising a fluorophore.
19. The polypeptide construct of claim 18, wherein the fluorophore comprises Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, texas red, Alexa Fluor dyes (e.g. Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g. Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof.
20. The polypeptide construct of claim 1, further comprising a purification and/or a solubilization tag.
21. The polypeptide construct of claim 20, wherein the purification and/or a solubilization tag comprises a maltose binding protein (MBP) tag, a GST-tag, a FLAG tag, an HA tag, a His-tag, a SUMO-tag, a Trx-tag, a Halo-tag, or any combination thereof.
22. The polypeptide construct of claim 21, wherein the purification and/or a solubilization tag comprises an amino acid sequence as set forth in SEQ ID NO: 30, or a sequence at least 80% identical thereto.
23. The polypeptide construct of claim 1, further comprising a peptide leader sequence.
24. A transcriptase composition comprising the polypeptide construct of any one of claims 1-23, and a transcriptase mix comprising one or more adapter-RT primer, wherein the one or more adapter-RT primer each comprises an adapter primer sequence and an RT primer sequence.
25. The transcriptase composition of claim 24, wherein at least one of the one or more RT primer sequence is a random RT primer.
26. The transcriptase composition of claim 25, wherein the random RT primer comprises at least 7 nucleotides.
27. The transcriptase composition of claim 25, wherein the random RT primer is at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more, nucleotides in length.
28. The transcriptase composition of claim 24, wherein the adapter primer sequence comprises a sequencing barcode.
29. The transcriptase composition of claim 24, wherein the transcriptase mix further comprises non-labeled dNTPs, labeled dNTPs, or any combination thereof.
30. The transcriptase composition of claim 29, wherein the labeled dNTPs are biotinylated dNTPs, optionally wherein the biotinylated dNTPs comprise biotin-16-dUTP, or biotin-16-dCTP, or both.
31. The transcriptase composition of claim 29, wherein the labeled dNTP and the non-labeled dNTP are at a ratio of at least 0.5:1, 1:1, or 2:1.
32. The transcriptase composition of claim 24, wherein the RT primer sequence further comprises an azide functional group.
33. The transcriptase composition of claim 24, wherein the adapter-RT primer comprises a nucleotide sequence as set forth in SEQ ID NO: 25, or a sequence at least 80% identical thereto.
34. A method of determining one or more RNA interaction sites of an RNA-binding Protein (RBP) in a biological sample, comprising: a) incubating an RBP-targeting agent with the RBP, wherein the RBP-targeting agent specifically binds the RBP to form a primary complex; b) incubating the primary complex with one or more secondary binding agents that specifically bind the RBP-targeting agent, to form a secondary complex; c) incubating the primary or the secondary complex with the transcriptase composition of claim 24, to obtain cDNA; and d) sequencing the cDNA to determine the one or more RNA interaction sites of the RBP.
35. The method of claim 34, wherein the biological sample is an RNA-protein complex, a cell, or a tissue section.
36. The method of claim 34, further comprising fixing the biological sample with a fixing agent.
37. The method of claim 36, wherein the fixing agent comprises formaldehyde, paraformaldehyde, and/or glutaraldehyde.
38. The method of claim 37, wherein the fixing agent is paraformaldehyde at a concentration of about 0.1% to about 5% by volume, or at a concentration of at least, equal to, about, or more than 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2.0%, 2.1%, 2.2%, 2.3%, 2.4%, or 2.5% by volume.
39. The method of claim 36, wherein the fixing comprises incubating the biological sample and the fixing agent together for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes.
40. The method of claim 36, further comprising quenching of the fixing agent with a quenching agent.
41. The method of claim 40, wherein the quenching agent comprises glycine.
42. The method of claim 40 or claim 41, wherein the quenching agent is at a concentration of greater than, equal to, at least, at most, or about 25, 50, 75, 100, 125, 150, 200, 225, or 250 mM.
43. The method of claim 34, wherein the biological sample comprises a cell and/or a tissue section, the method further comprising permeabilizing the cell and/or the tissue section with a permeabilizing agent.
44. The method of claim 43, wherein the permeabilizing agent comprises a detergent.
45. The method of claim 43, wherein the detergent comprises Triton X-100, optionally wherein the Triton X-100 is at a concentration of greater than, equal to, at least, at most, or about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, or 1.5%.
46. The method of claim 34, further comprising incubating the primary and/or the secondary complex with an RNase enzyme.
47. The method of claim 34, wherein the RBP is a transcription factor, a splicing factor, RNA helicase, ribonuclease, RNA polymerase, translation initiation factor, or ribosomal protein.
48. The method of claim 34, wherein the RBP comprises YTHDF1, YTHDF2, YTHDC1, HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, and/or HNRNPC.
49. The method of claim 34, wherein the RBP-targeting agent specifically binds the RBP, optionally wherein the RBP-targeting agent comprises an antibody or functional variant thereof, optionally wherein the antibody or functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multi-specific antibody, a DARPin, or a variant of each thereof.
50. The method of claim 34, wherein the secondary binding agent comprises an antibody or functional variant thereof, optionally wherein the antibody or functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multi-specific antibody, a DARPin, or a variant of each thereof.
51. The method of claim 34, wherein the RBP-targeting agent is labeled, optionally wherein the label comprises a radioisotope, a hapten, a fluorescent label, a fluorescent polypeptide, a phosphorescent molecule, a chemiluminescent molecule, a chromophore, a luminescent molecule, a photoaffinity molecule, a colored particle, and/or a ligand.
52. The method of claim 34, wherein the RBP-targeting agent is linked to a functionalized DNA barcode via an amino spacer, optionally wherein the functionalized DNA barcode comprises an alkyne (3'-O-propargyl N 2'-5' linked) functionalized DNA barcode.
53. The method of claim 52, wherein the alkyne functionalized barcodes comprise a nucleic acid sequence as set forth in any one of SEQ ID NO: 31-78, or a nucleic acid sequence at least 80% identical thereto.
54. The method of claim 34, wherein the secondary binding agent is labeled, optionally wherein the label comprises a radioisotope, a hapten, a fluorescent label, a fluorescent polypeptide, a phosphorescent molecule, a chemiluminescent molecule, a chromophore, a luminescent molecule, a photoaffinity molecule, a colored particle, and/or a ligand.
55. The method of claim 54, wherein the fluorescent label comprises Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, texas red, Alexa Fluor dyes (e.g. Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g. Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof.
56. The method of claim 34, wherein the steps (a) - (c) are conducted in-situ.
57. The method of claim 34, wherein the method further comprises imaging the biological sample.
58. The method of claim 34, wherein the biological sample comprises less than or equal to 1000, 750, 500, 100, 50, or 20 cells, or wherein the biological sample comprises a single cell, wherein the biological sample comprises less than 5 tissue sections, or wherein the biological sample comprises a single tissue section.
59. The method of claim 34, wherein the method does not comprise ultraviolet crosslinking.
60. The method of claim 34, wherein the method does not comprise immunoprecipitation.
61. The method of claim 34, wherein the method does not comprise use of base editing proteins.
62. The method of claim 34, wherein the method does not comprise dissociating the one or more tissue sections into single cells.
63. The method of claim 34, wherein the method detects transient and/or dynamic RNA-RBP interactions, optionally wherein the transient and/or dynamic RNA-RBP interactions occur on a timescale within 10 minutes.
64. The method of claim 34, wherein the method does not comprise oligo(dT) primer initiated reverse transcription.
65. The method of claim 34, wherein the method does not comprise Tn5 tagmentation.
66. The method of claim 47, wherein the RBP is a splicing factor.
67. The method of claim 66, wherein the method is used to determine splice variants between one or more biological samples.
68. The method of claim 47, wherein the RBP is a YTH family reader protein, or wherein the RBP is G3BP1.
69. The method of claim 34, wherein the method can be used to determine one or more interaction sites of the RBP with RNA in the cytoplasm, or nucleus, or both.
70. The method of claim 34, wherein the method is used to measure the relative binding strength of the RBP to the RNA in comparison to that of one or more other RBPs to the RNA.
71. The method of claim 34, wherein the cDNA is labeled.
72. The method of claim 71, wherein the cDNA comprises one or more labeled nucleotides.
73. The method of claim 72, wherein the nucleotides are labeled with a fluorescent label.
74. The method of claim 72, wherein the labeled nucleotides comprise a biotinylated nucleotide.
75. The method of claim 74, wherein the method further comprises purifying the cDNA with a streptavidin-comprising agent, optionally wherein the streptavidin-comprising agent comprises a bead, a plate, a magnetic bead, an agarose bead, a microtiter plate, a nanoparticle, and/or a membrane.
76. The method of claim 34, wherein two or more unique RBP targeting agents that interact with one or more RBPs are used in step (a).
77. The method of claim 76, wherein each of the two or more unique RBP targeting agents comprises a unique functionalized DNA barcode linked via an amino spacer.
78. The method of claim 77, wherein the functionalized DNA barcode comprises an alkyne (3'-O-propargyl N 2'-5' linked) functionalized DNA barcode.
79. The method of claim 78, wherein the alkyne functionalized barcodes comprise a nucleic acid sequence as set forth in any one of SEQ ID NO: 31-98, or a sequence at least 80% identical thereto.
80. A method of in-situ imaging of one or more RNA interaction sites of an RNA-binding Protein (RBP) in a biological sample bound to a solid surface, comprising: a) incubating an RBP-targeting agent with the RBP, wherein the RBP-targeting agent specifically binds the RBP to form a primary complex; b) incubating the primary complex with one or more secondary binding agents that specifically bind the RBP-targeting agent, to form a secondary complex; c) incubating the primary or the secondary complex with a transcriptase composition of claim 24, to obtain cDNA; and d) imaging the solid surface.
81. The method of claim 80, wherein the RBP-targeting agent or the one or more secondary binding agents or any combination thereof are labeled, optionally wherein the label comprises radioisotopes, a hapten, a fluorescent label, a fluorescent polypeptide, a phosphorescent molecule, a chemiluminescent molecule, a chromophore, a luminescent molecule, a photoaffinity molecule, a colored particle and/or a ligand.
82. The method of claim 81, wherein the fluorescent label comprises Green Fluorescent Protein (GFP), eGFP, Red Fluorescent Protein (RFP), Teal Fluorescent Protein (TFP), Blue Fluorescent Protein (BFP), Yellow Fluorescent Protein (YFP), miRFP, cerulean fluorescent protein (CFP), eCyanFP, mCherry, mVenus, mOrange, mTurquoise, tdTomato, aminocoumarin, fluorescein, texas red, Alexa Fluor dyes (e.g. Alexa Fluor 488, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor 350, Alexa Fluor 532, and Alexa Fluor 700), Cy dyes (e.g. Cy3, Cy5), DyLight dyes, FITC, or Rhodamine, or functional variants thereof.
83. The method of claim 80, wherein the cDNA is labeled.
84. The method of claim 83, wherein the cDNA comprises one or more labeled nucleotides optionally wherein the nucleotides are labeled with a fluorescent label and/or are biotinylated.
85. The method of claim 80, wherein the imaging is done using fluorescence microscopy.
86. The method of claim 80, wherein the biological sample is an RNA-protein complex, a cell, or a tissue section.
87. The method of claim 80, further comprising fixing the biological sample with a fixing agent.
88. The method of claim 87, wherein the fixing agent comprises formaldehyde, paraformaldehyde, and/or glutaraldehyde.
89. The method of claim 88, wherein the fixing agent is paraformaldehyde at a concentration of about 0.5% to about 5% by volume or wherein the paraformaldehyde is at a concentration of about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2.0%, 2.1%, 2.2%, 2.3%, 2.4%, or 2.5% by volume.
90. The method of claim 87, wherein the fixing comprises incubating the biological sample and the fixing agent for, or for less than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes.
91. The method of claim 87, further comprising quenching of the fixing agent with a quenching agent, optionally wherein the quenching agent comprises glycine.
92. The method of claim 91, wherein the quenching agent is at a concentration of greater than, equal to, at least, at most, or about 25, 50, 75, 100, 125, 150, 200, 225, or 250 mM.
93. The method of claim 80, further comprising permeabilizing the cell and/or the tissue section with a permeabilizing agent.
94. The method of claim 93, wherein the permeabilizing agent comprises a detergent.
95. The method of claim 94, wherein the detergent comprises Triton X-100, optionally wherein the Triton X-100 is at a concentration of greater than, equal to, at least, at most, or about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.1%, 1.2%, 1.3%, 1.4%, or 1.5%.
96. The method of claim 80, wherein the transcriptase mix further comprises an RNase.
97. The method of claim 80, wherein the RNA binding protein is a transcription factor, a splicing factor, RNA helicase, ribonuclease, RNA polymerase, translation initiation factor, or ribosomal protein.
98. The method of claim 80, wherein the RBP comprises YTHDF1, YTHDF2, YTHDC1, HuR, PTB, Musashi, eIF4E, FMRP, LARP1, IMP, hnRNP family proteins, Lin28, AUF1, IGF2BP, FUBP1, LIN28B, RBM5, FUS, TIA1, TTP, QKI, MBNL, CELF, NONO, DDX5, RBM10, SAFB, TDP-43, Ataxin-2, hnRNP A/B, C9orf72, hnRNP H/F, Matrin 3 (MATR3), Pur-alpha, TAF15, Huntingtin, RBFOX, SMN, ELAVL, Ro (SSA) and La (SSB) Proteins, hnRNP, Roquin, Staufen1, NF90/NF110, ILF3, SF3B1, SRSF2, U2AF1, ZRSR2, PRPF8, PRPF31, SNRNP200, HNRNPA1, HNRNPA2B1, NELFE, CPEB1, SRSF1, NOVA1, NOVA2, G3BP1, PTBP1, RBFOX2, and/or HNRNPC.
99. The method of claim 80, wherein the RBP-targeting agent specifically binds the RBP.
100. The method of claim 99, wherein the RBP-targeting agent is an antibody or a functional variant thereof, optionally wherein the antibody or the functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multi-specific antibody, a DARPin, or a variant of each thereof.
101. The method of claim 80, wherein the one or more secondary binding agent is an antibody, or a functional variant thereof, optionally wherein the antibody, or the functional variant thereof comprises a polyclonal antibody, a monoclonal antibody, a chimeric antibody, a human antibody, a veneered antibody, a diabody, a humanized antibody, an antibody derivative, a recombinant antibody, a recombinant humanized antibody, an engineered antibody, single chain antibody, single domain antibody, nanobodies, diabodies, a bi-specific antibody, a multi-specific antibody, a DARPin, or a variant of each thereof.
102. The method of claim 80, wherein the solid surface comprises a slide, a multi-well plate, a capillary, or the like.
103. The method of claim 80, further comprising sequencing the cDNA.
104. The method of claim 103, wherein the sequencing is performed using Next Generation Sequencing (NGS) techniques.
105. The method of claim 104, wherein the sequencing is done using a single cell genomic imaging technique.
106. The method of claim 105, wherein the single cell genomic imaging technique comprises, consists essentially of, or consists of spatial transcriptomics, MERFISH, SeqFISH, STARmap, Slide-Seq, Visium Spatial Gene Expression, or deterministic barcoding in tissue for spatial omics sequencing (DBiT-seq).
107. The method of claim 106, wherein the single cell genomic imaging technique is a microfluidic based technique comprising: ligating a first set and a second set of spatial barcodes to the cDNA of step (c), prior to step (d), wherein the first set of spatial barcodes are contacted to the cDNA horizontally using a first multi-channel microfluidic chip, and wherein the second set of spatial barcodes are contacted to the solid surface vertically using a second multi-channel microfluidic chip.
108. The method of claim 105, wherein the first set of spatial barcodes and second set of spatial barcodes form a 2D spatial barcode array.
109. A kit comprising a polypeptide construct of any one of claims 1-23, or a transcriptase composition of any one of claims 24-32.
110. A method of identifying one or more RNA interaction sites of an RNA-binding Protein (RBP) in a biological sample, comprising:
(a) fixing the biological sample;
(b) incubating the biological sample with an agent that permeabilizes cell membranes;
(c) providing an RBP-targeting agent to the sample, wherein the RBP-targeting agent interacts with the RBP of interest;
(d) providing a transcriptase composition comprising a polypeptide construct comprising a targeting moiety and a reverse transcriptase enzyme; wherein the targeting moiety interacts with the RBP-targeting agent;
(e) incubating the sample with the transcriptase composition to produce cDNA; and
(f) sequencing the cDNA.
111. The method of claim 110, wherein the targeting moiety comprises a Fc binding protein or a variant thereof, an antibody or variant thereof, an oligonucleotide or variant thereof, a receptor, a ligand, a small molecule, or any combination thereof.
112. The method of claim 111, wherein the targeting moiety comprises a Fc binding protein or a variant thereof, an antibody or variant thereof, and/or an oligonucleotide or a variant thereof.
113. A method of determining one or more RNA interaction sites of a first RNA-binding Protein (RBP) in a biological sample, comprising: a) incubating a first RBP-targeting agent comprising a functionalized first DNA barcode, with the first RBP, wherein the first RBP-targeting agent specifically binds the first RBP to form a first primary complex; b) incubating the first primary complex with one or more secondary binding agents that specifically bind the first RBP-targeting agent, to form a secondary complex; c) incubating the first primary or the secondary complex with the transcriptase composition of claim 24, to obtain a first barcoded cDNA library; d) amplifying and sequencing the first barcoded cDNA library; and e) obtaining one or more interaction sites of the first RBP by deconvoluting the sequenced cDNA library based on the first DNA barcode.
114. The method of claim 113, wherein the transcriptase composition comprises an RT primer sequence comprising a functional group and biotinylated dNTPs.
115. The method of claim 114, wherein the functional group is an azide functional group.
116. The method of claim 114 or claim 115, wherein the biotinylated dNTPs, and the RT primer sequence comprising the azide functional group, are incorporated into the cDNA to form proximal azide labeled biotinylated cDNAs during reverse transcription in step c.
117. The method of claim 113, wherein the functionalized DNA barcode comprises an alkyne (3'-O-propargyl N 2'-5' linked) functionalized DNA barcode.
118. The method of claim 113, wherein the alkyne functionalized barcodes comprise a nucleic acid sequence as set forth in any one of SEQ ID NO: 31-78, or a sequence at least 80% identical thereto.
119. The method of claim 113, further comprising incorporating the alkyne functionalized first DNA barcode into the cDNA by reacting the alkyne functionalized first DNA barcode with the proximal azide labeled biotinylated cDNA of claim 116, using in-situ copper catalyzed azide-alkyne cycloaddition (CuAAC), to obtain a first barcoded biotinylated cDNA library.
120. The method of claim 119, wherein the method further comprises purifying the barcoded biotinylated cDNA library over a streptavidin column prior to step (d).
121. The method of claim 120, further comprising processing the CuAAC using a Klenow Fragment DNA polymerase for second strand synthesis.
122. The method of claim 121, wherein the one or more interaction sites of the first RBP are obtained by deconvoluting the sequenced data based on the first DNA barcode incorporated into the cDNA.
123. The method of claim 113, further comprising determining the one or more RNA interaction sites of a second RNA-binding Protein (RBP) in a biological sample, comprising: a) incubating a second RBP-targeting agent comprising an alkyne functionalized second DNA barcode, with the second RBP, wherein the RBP-targeting agent specifically binds the second RBP to form a second primary complex; b) incubating the second primary complex with one or more secondary binding agents that specifically bind the first RBP-targeting agent, to form a second secondary complex; c) incubating the second primary or the second secondary complex with the transcriptase composition of claim 24, to obtain a second barcoded cDNA library; d) amplifying and sequencing the second barcoded cDNA library; and e) obtaining one or more interaction sites of the second RBP by deconvoluting the sequenced cDNA library based on the second DNA barcode.
124. The method of claim 113, comprising determining the one or more RNA interaction sites for greater than, equal to, at least, at most 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, or 1500 RBPs, wherein each RBP targeting agent has a unique alkyne functionalized DNA barcode.
125. A method of determining spatial distribution of an RNA modification site on a biological sample bound to a solid surface, comprising: a) incubating a modification-targeting agent that specifically binds the modification site on the RNA to form a primary complex; b) incubating the primary complex with a secondary binding agent that specifically binds the primary complex to form a secondary complex; c) incubating the primary complex or the secondary complex with the transcriptase composition of claim 24 to obtain cDNA; d) optionally incorporating labelled barcodes into the cDNA; and e) sequencing and imaging the biological sample using a single cell genomic imaging technique to determine the one or more modification sites.
126. The method of claim 125, wherein the modification-targeting agent is an oligonucleotide, or a variant thereof, or a small molecule.
127. The method of claim 126, wherein the oligonucleotide comprises fluorescent NTPs, or a fluorescent probe.
128. The method of claim 125, wherein the modification-targeting agent is an antibody or a functional variant thereof, optionally wherein the antibody or the functional variant thereof comprises monoclonal antibodies, polyclonal antibodies, recombinant antibody, IgG, Fv, single chain antibody, single domain antibodies, nanobodies, diabodies, multi specific antibodies (e.g., bispecific antibodies), scFv, Fab, F(ab')2, Fab, or variants thereof.
129. The method of claim 127, wherein the modification targeting agent specifically binds to a modification comprising m6C, m5C, mxA, m7G, or a pseudouridine modification.
130. The method of claim 125, wherein the sequencing and imaging is done using a single cell genomic imaging technique.
131. The method of claim 130, wherein the single cell genomic imaging technique comprises spatial transcriptomics, MERFISH, SeqFISH, STARmap, Slide-Seq, Visium Spatial Gene Expression, or deterministic barcoding in tissue for spatial omics sequencing (DBiT-seq).
132. The method of claim 131, wherein the single cell genomic imaging technique comprises deterministic barcoding in tissue for spatial omics sequencing (DBiT-seq) comprising: ligating a first set and a second set of spatial barcodes to the cDNA of step (c), prior to step (d), wherein the first set of spatial barcodes are contacted to the cDNA horizontally using a first multi-channel microfluidic chip, and the second set of spatial barcodes are contacted to the solid surface vertically using a second multi-channel microfluidic chip.
133. The method of claim 131, wherein the first set of spatial barcodes and the second set of spatial barcodes form a 2D spatial barcode array.
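The deconvolution step recited in claims 113(e) and 122 (assigning sequenced cDNA reads to an RBP according to the DNA barcode incorporated into the cDNA) can be illustrated with a minimal Python sketch. This is not the applicants' pipeline; it is an illustration under stated assumptions. The barcode sequences below are placeholders rather than the sequences of SEQ ID NO: 31-98, and the position of the barcode (the 5' start of each read), the equal barcode length, the one-mismatch tolerance, and the names `BARCODES`, `assign_barcode`, and `deconvolute` are all hypothetical choices that the claims do not specify.

```python
from collections import defaultdict

# Placeholder barcodes (hypothetical; NOT the sequences of SEQ ID NO: 31-98).
# Each barcode maps to the RBP whose targeting agent carried it.
BARCODES = {
    "ACGTACGT": "RBP_A",
    "TGCATGCA": "RBP_B",
}


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read, barcodes, max_mismatches=1):
    """Return the RBP label whose barcode best matches the read prefix.

    Assumes the barcode sits at the start of the read. Returns None when
    no barcode is within max_mismatches or when the best match is tied.
    """
    best_label, best_dist, tie = None, max_mismatches + 1, False
    for bc, label in barcodes.items():
        prefix = read[:len(bc)]
        if len(prefix) < len(bc):
            continue  # read too short to contain this barcode
        d = hamming(prefix, bc)
        if d < best_dist:
            best_label, best_dist, tie = label, d, False
        elif d == best_dist and d <= max_mismatches:
            tie = True  # ambiguous between two barcodes
    return None if tie else best_label


def deconvolute(reads, barcodes, max_mismatches=1):
    """Group reads by RBP label and strip the barcode from each read.

    Assumes all barcodes have the same length.
    """
    bc_len = len(next(iter(barcodes)))
    groups = defaultdict(list)
    for read in reads:
        label = assign_barcode(read, barcodes, max_mismatches)
        key = label if label is not None else "unassigned"
        groups[key].append(read[bc_len:])
    return dict(groups)


if __name__ == "__main__":
    example_reads = ["ACGTACGTGGGCCC", "TGCATGCATTTAAA", "GGGGGGGGAAAA"]
    for rbp, inserts in deconvolute(example_reads, BARCODES).items():
        print(rbp, inserts)
```

After grouping, the barcode-stripped inserts for each RBP would be aligned to a reference transcriptome to locate candidate interaction sites; that alignment step is outside the scope of this sketch.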
PCT/US2024/051137 | Priority date: 2023-10-12 | Filing date: 2024-10-11 | Methods and compositions for characterizing rna-binding protein binding sites by in-situ reverse transcription-based sequencing | Pending | WO2025081111A1 (en)

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
US202363589874P | 2023-10-12 | 2023-10-12 |
US63/589,874 | 2023-10-12 | |

Publications (1)

Publication Number | Publication Date
WO2025081111A1 (en) | 2025-04-17

Family

ID=95396587

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
PCT/US2024/051137 | Pending | WO2025081111A1 (en) | 2023-10-12 | 2024-10-11 | Methods and compositions for characterizing rna-binding protein binding sites by in-situ reverse transcription-based sequencing

Country Status (1)

Country | Link
WO (1) | WO2025081111A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120174065A (en)* | 2025-05-23 | 2025-06-20 | Peking University | Construction method and application of spatial transcriptomics library

Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20050176068A1 (en)* | 2002-04-26 | 2005-08-11 | Emmert-Buck Michael R. | Direct cell target analysis
US20150344942A1 (en)* | 2012-10-17 | 2015-12-03 | Spatial Transcriptomics Ab | Methods and product for optimising localised or spatial detection of gene expression in a tissue sample
US20160159877A1 (en)* | 2014-12-01 | 2016-06-09 | Pfenex Inc. | Fusion partners for peptide production
CN110372799A (en)* | 2019-08-01 | 2019-10-25 | Peking University | A kind of fusion protein and its application for the preparation of the unicellular library ChIP-seq
US20210292753A1 (en)* | 2020-03-19 | 2021-09-23 | Evolve Biotech, Inc. | Methods and compositions for directed genome editing
US20220056537A1 (en)* | 2016-09-02 | 2022-02-24 | New England Biolabs, Inc. | Analysis of Chromatin Using a Nicking Enzyme
US20220356469A1 (en)* | 2019-03-19 | 2022-11-10 | The Broad Institute, Inc. | Methods and compositions for editing nucleotide sequences

Legal Events

Date | Code | Title | Description
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24878195; Country of ref document: EP; Kind code of ref document: A1

