Movatterモバイル変換


[0]ホーム

URL:


TandemRepeat Analysis from Capillary Electrophoresis (trace)

R-CMD-check

This package provides a pipeline for short tandem repeat instabilityanalysis from fragment analysis data. The inputs are fsa files or peaktables (e.g. Genemapper 5 software peak table output), and a usersupplied metadata data-frame. The functions identify ladders, callspeaks, and calculate repeat instability metrics (i.e. expansion index oraverage repeat gain).

This code is not for clinical use. There are features for accuraterepeat sizing if you use validated control samples, but integer repeatunits are not returned.

To report bugs or feature requests, please visit the Github issuetrackerhere. Forassistance or any other inquires, contactZach McLean.

If you use this package, please citethis paperfor now.

How to use the package

For an easy way to get started with your own data or to run anexample, usetrace::generate_trace_template() to generate adocument with the pipeline pre-populated.

In this package, each sample is represented by an R6 ‘fragments’object, which are organized in lists. Functions in the package iterateover these lists, so you usually don’t need to interact with the objectsdirectly. If you do, the attributes of the objects can be accessed with$, and note that most functions modify the objects in place, sore-assignment isn’t necessary. The only exception to thatfind_fragments(), which transitions to a new object sincethe class structure changes.

There are several important factors to a successful repeatinstability experiment and things to consider when using thispackage:

Installation

Install the package from CRAN:

install.packages("trace")

Development version

You can install the development version fromGitHub with:

if (!require("pak",quietly =TRUE))install.packages("pak")pak::pak("zachariahmclean/trace")

Import data

Load the package:

library(trace)

First, we read in the raw data. In this case we will used exampledata within this package, but usually this would be fsa files that areread in usingread_fsa(). The example data is also clonedsince the next step modifies the object in place.

fsa_list<-lapply(cell_line_fsa_list,function(x) x$clone())

Find ladders

The known ladder sizes are assigned to peaks in the ladder channeland bp are predicted for each scan.

find_ladders(  fsa_list,show_progress_bar =FALSE)

visually inspect each ladder to make sure that the ladders werecorrectly assigned

plot_ladders(fsa_list[1])

If the ladders are are not assigned correctly, you can adjustparameters or manually using the built-in fix_ladders_interactive()app.

Find fragments

The fragment peaks are identified in the raw continuous trace data.These objects are assigned because find_fragments transitions to a newobject since the class structure changes. This reflects the data movingfrom a continuous trace to a peak table.

fragments_list<-find_fragments(  fsa_list,min_bp_size =300)

Visually inspect the traces and called peaks to make sure they werecorrectly assigned.

plot_traces(fragments_list[1],xlim =c(400,550),ylim =c(0,1200))

Alternatively, this is where you would use data exported fromGenemapper if you would rather use the Genemapper bp sizing and peakidentification algorithms. However, this is not recommended as some ofthe functionality of this package would not be accessible (mainly incall_repeats(), withbatch_correction andrepeat calling algorithms)

fragments_list_genemapper<-peak_table_to_fragments(example_data,data_format ="genemapper5",dye_channel ="B",min_size_bp =300)

Add metadata

Metadata can be incorporated to allow additional functionality incall_repeats() (batch or repeat correction) andassign_index_peaks() (assigning index peak from anothersample). Prepare a file (eg spreadsheet saved as .csv) with thefollowing columns. If you use the specified column names, it will beautomatically parsed byadd_metadata(), otherwise you willneed to match up which column name belongs to which metadata category(as done below inadd_metadata()):

Metadata table columnFunctionality metadata is associated withDescription
unique_idRequired for adding metadata usingadd_metadata()The unique identifier for the fsa file. Usually the sample filename. This must be unique, including across runs.
metrics_group_idassign_index_peaks(), allows settinggroupedThis groups the samples for instability metric calculations. Providea group id value for each sample. For example, in a mouse experiment andusing the expansion index, you need to group the samples since they havethe same metrics baseline control (eg inherited repeat length), soprovide the mouse id.
metrics_baseline_controlassign_index_peaks(), allows settinggroupedThis is related to metrics_group_id. Indicate with ‘TRUE’ to specifywhich sample is the baseline control (eg mouse tail for inherited repeatlength, or day-zero sample in cell line experiments)
batch_run_idcall_repeats(), allows settingcorrection= “batch” or “repeat”This groups the samples by batch. Provide a value for each fragmentanalysis run (eg date).
batch_sample_idcall_repeats(), allows settingcorrection= “batch” or “repeat”This groups the samples across batches. Give a unique sample id toeach different sample.
batch_sample_modal_repeatcall_repeats(), allows settingcorrection= “repeat”The validated modal repeat length for the particularbatch_sample_id sample used to accurately call repeatlength.
add_metadata(fragments_list = fragments_list,metadata_data.frame = metadata,unique_id ="unique_id",metrics_group_id ="metrics_group_id",metrics_baseline_control ="metrics_baseline_control",batch_run_id ="batch_run_id",batch_sample_id ="batch_sample_id",batch_sample_modal_repeat ="batch_sample_modal_repeat")

Identify modal peaks andcall repeats

Next we identify the modal peaks withfind_alleles() andconvert the base pair fragments to repeats withcall_repeats().

find_alleles(fragments_list)call_repeats(fragments_list)

We can view the distribution of repeat sizes and the identified modalpeak with a plotting function.

plot_traces(fragments_list[1],xlim =c(110,150))

Assign index peaks

A key part of several instability metrics is the index peak. This isthe repeat length used as the reference for relative instability metricscalculations, like expansion index or average repeat gain. In themetadata, samples are grouped by ametrics_group_id and asubset of the samples are set asmetrics_baseline_control,meaning they are the samples taken at day 0 in this experiment. Thisallows us to setgrouped = TRUE and set the index peak forthe expansion index and other metrics. For mice, if just a few sampleshave the inherited repeat signal shorter than the expanded population,you could not worry about this and instead use theindex_override_dataframe inassign_index_peaks().

assign_index_peaks(  fragments_list,grouped =TRUE)

We can validate that the index peaks were assigned correctly with adotted vertical line added to the trace. This is perhaps more useful inthe context of mice where you can visually see when the inherited repeatlength should be in the bimodal distribution.

plot_traces(fragments_list[1],xlim =c(110,150))

Calculate instabilitymetrics

All of the information we need to calculate the repeat instabilitymetrics has now been identified. We can finally usecalculate_instability_metrics to generate a dataframe ofper-sample metrics.

metrics_grouped_df<-calculate_instability_metrics(fragments_list = fragments_list,peak_threshold =0.05,window_around_index_peak =c(-40,40))

These metrics can then be used to quantify repeat instability. Forexample, this reproduces a subset of Figure 7e ofourmanuscript.

library(ggplot2)library(dplyr)#>#> Attaching package: 'dplyr'#> The following objects are masked from 'package:stats':#>#>     filter, lag#> The following objects are masked from 'package:base':#>#>     intersect, setdiff, setequal, unionmetrics_grouped_df|>left_join(metadata,by =join_by(unique_id))|>filter(    day>0,    modal_peak_signal>500  )|>ggplot(aes(as.factor(treatment), average_repeat_change))+geom_boxplot()+geom_jitter()+labs(y ="Average repeat gain",x ="Branaplam (nM)")


[8]ページ先頭

©2009-2025 Movatter.jp