LDlinkR

Calculating Linkage Disequilibrium in Human Populations of Interest



Description

LDlink is an interactive and powerful suite of web-based tools for querying germline variants in human population groups of interest to generate interactive tables and plots. All population genotype data originates from Phase 3 (Version 5) of the 1000 Genomes Project, and variant RS (reference SNP) numbers are indexed based on dbSNP build 155.

LDlinkR is an R package for querying LDlink web-based applications and downloading their results from the R console (internet access required). It is intended for researchers who want to run batch queries. LDlinkR accelerates genomic research by providing efficient and user-friendly functions to programmatically interrogate pairwise linkage disequilibrium from large lists of genetic variants.

Please see the online LDlink documentation for more information about understanding linkage disequilibrium (LD) and additional details about how LDlink calculates patterns of LD across a variety of ancestral human populations.

Installation

  • The release version of LDlinkR can be installed from CRAN with:
install.packages("LDlinkR")
  • The development version of the LDlinkR package can be installed from the GitHub repository by using the remotes package:
install.packages("remotes")
remotes::install_github("CBIIT/LDlinkR")

LDlinkR depends on the following packages:

  • utils, version 3.4.2 or later
  • httr, version 1.4.0 or later

Following installation, attach the LDlinkR package with:

library(LDlinkR)

Personal Access Token - Required

To access the LDlink API via LDlinkR, you need a personal access token. This is a common convention followed by many APIs and plays the same role as the more familiar HTTPS username/password or SSH keys.

You will need to:

  • Make a one-time request for your personal access token from a web browser at https://ldlink.nih.gov/?tab=apiaccess.
  • Once registered, your personal access token will be emailed to you. It is a string of 12 random letters and numbers.
  • Provide your token as an argument when using LDlinkR. See example below:
LDhap(snps = c("rs3", "rs4", "rs148890987"),
      pop = "YRI",
      token = "YourTokenHere123",
      genome_build = "grch38")
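
Hard-coding the token in a script makes it easy to leak through version control. One option is to keep it in an environment variable and read it at run time. The sketch below assumes a variable named LDLINK_TOKEN, a name chosen here for illustration and not something LDlinkR defines; get_ldlink_token is likewise a hypothetical helper.

```r
# Hypothetical helper (not part of LDlinkR): read the personal access token
# from an environment variable instead of hard-coding it in a script.
# The variable name "LDLINK_TOKEN" is an assumption made for this sketch.
get_ldlink_token <- function(var = "LDLINK_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable '", var, "' is not set.", call. = FALSE)
  }
  token
}

# Usage (needs internet access and a valid token, so it is left commented out):
# LDhap(snps = c("rs3", "rs4", "rs148890987"),
#       pop = "YRI",
#       token = get_ldlink_token(),
#       genome_build = "grch38")
```

The variable could be set once per session with Sys.setenv() or stored in a user-level .Renviron file.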

Available Functions

| Function | Description |
|---|---|
| LDexpress | Determine if a list of genomic variants is associated with gene expression in tissues of interest. |
| LDhap | Calculates population specific haplotype frequencies of all haplotypes observed for a list of query variants. |
| LDmatrix | Generates a data frame of pairwise linkage disequilibrium statistics. |
| LDpair | Investigates potentially correlated alleles for a pair of variants. |
| LDpop | Investigates allele frequencies and linkage disequilibrium patterns across 1000 Genomes Project populations. |
| LDproxy | Explore proxy and putative functional variants for a single query variant. |
| LDproxy_batch | Query LDproxy using a list of query variants. |
| LDtrait | Search the GWAS Catalog (data updated nightly) to determine if a list of variants (or variants in LD with those variants) have been previously associated with a trait or disease. |
| SNPchip | Find commercial genotyping chip arrays for variants of interest. |
| SNPclip | Prune a list of variants by linkage disequilibrium. |
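
LDproxy_batch covers batch queries for LDproxy. For a function that accepts one query at a time, a loop with error handling can serve a similar purpose. The sketch below is not part of LDlinkR: run_queries is a hypothetical helper, and the query function is passed in as an argument so the pattern can be tried without internet access.

```r
# Hypothetical helper (not part of LDlinkR): apply a query function to each
# element of `inputs`, keeping successful results and recording failures
# so that one bad query does not abort the whole batch.
run_queries <- function(inputs, query_fun, ...) {
  results <- lapply(inputs, function(x) {
    # Return the error object itself when a query fails.
    tryCatch(query_fun(x, ...), error = function(e) e)
  })
  names(results) <- inputs
  failed <- vapply(results, inherits, logical(1), what = "error")
  list(ok = results[!failed], failed = names(results)[failed])
}

# Usage (needs internet access and a valid token, so it is left commented out):
# out <- run_queries(c("rs456", "rs4"), LDproxy,
#                    pop = "YRI", r2d = "r2",
#                    token = "YourTokenHere123",
#                    genome_build = "grch38")
# names(out$ok)   # queries that returned a result
# out$failed      # queries that raised an error
```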

Utilities

| Utility Function | Description |
|---|---|
| list_chips | Provides a data frame listing the names and abbreviation codes for available commercial SNP Chip Arrays from Illumina and Affymetrix. |
| list_pop | Provides a data frame listing the available reference populations from the 1000 Genomes Project. |
| list_gtex_tissues | Provides a data frame listing the GTEx full names, LDexpress full names (without spaces) and acceptable abbreviation codes of the 54 non-diseased tissue sites collected for the GTEx Portal and used as input for the LDexpress function. |

Basic example

In this basic example, the LDproxy function is used to explore proxy and putative functional variants for a single query variant. Usage by other functions is similar.

my_proxies <- LDproxy(snp = "rs456",
                      pop = "YRI",
                      r2d = "r2",
                      token = "YourTokenHere123",
                      genome_build = "grch38"
                     )

This example uses a single reference SNP ID (rsID) for the query variant, a population of interest (YRI = Yoruba in Ibadan, Nigeria), "r2" for the desired output to be based on estimated R², and genome build GRCh38 (hg38). The output is stored in the variable my_proxies.

Note: Replace "YourTokenHere123" with your personal access token. See the section above, "Personal Access Token".


The output can be viewed with the head function from the R utils package, which returns the first rows of the object my_proxies.

head(my_proxies)
##    RS_Number         Coord Alleles    MAF Distance Dprime     R2 Correlated_Alleles
## 1 rs58333091 chr7:24922800   (G/C) 0.1963        0      1 1.0000            G=G,C=C
## 2 rs60614713 chr7:24922807   (T/C) 0.1963        7      1 1.0000            G=T,C=C
## 3 rs59826225 chr7:24925014   (G/T) 0.1963     2214      1 1.0000            G=G,C=T
## 4      rs123 chr7:24926827   (C/A) 0.1963     4027      1 1.0000            G=C,C=A
## 5 rs10341080 chr7:24920084   (C/T) 0.2056    -2716      1 0.9434            G=C,C=T
## 6 rs56794736 chr7:24919358   (C/T) 0.2056    -3442      1 0.9434            G=C,C=T
##   RegulomeDB Function
## 1          4     <NA>
## 2         2b     <NA>
## 3          4     <NA>
## 4         1f     <NA>
## 5         3a     <NA>
## 6          7     <NA>
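
Because the result is an ordinary data frame, it can be filtered with base R. The sketch below keeps proxies above an R2 threshold and orders them by distance from the query variant. To run without internet access it rebuilds a stand-in data frame from three columns of the six rows shown above; the 0.95 cutoff is an arbitrary value chosen for illustration.

```r
# Stand-in for the LDproxy result: three columns of the six rows shown above.
my_proxies <- data.frame(
  RS_Number = c("rs58333091", "rs60614713", "rs59826225",
                "rs123", "rs10341080", "rs56794736"),
  Distance  = c(0, 7, 2214, 4027, -2716, -3442),
  R2        = c(1.0000, 1.0000, 1.0000, 1.0000, 0.9434, 0.9434),
  stringsAsFactors = FALSE
)

# Keep proxies at or above the cutoff (0.95 is an assumption for this sketch).
# as.numeric() guards against the column arriving as character.
r2_cutoff <- 0.95
strong_proxies <- my_proxies[as.numeric(my_proxies$R2) >= r2_cutoff, ]

# Order the remaining variants by absolute distance from the query variant.
strong_proxies <- strong_proxies[order(abs(strong_proxies$Distance)), ]
strong_proxies$RS_Number
```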

Another example

This example demonstrates the use of the LDexpress function to determine whether a genomic variant (or list of variants) is associated with gene expression in tissues of interest. Usage by other functions is similar.

my_output <- LDexpress(snps = "rs4",
                       pop = c("YRI", "CEU"),
                       tissue = c("ADI_SUB", "ADI_VIS_OME"),
                       token = "YourTokenHere123"
                      )

For the function arguments, this example uses a single rsID for a query variant, multiple populations (e.g., YRI = Yoruba in Ibadan, Nigeria and CEU = Utah Residents from North and West Europe) and multiple tissue types using acceptable abbreviations for available tissues (e.g., ADI_SUB = Adipose - Subcutaneous and ADI_VIS_OME = Adipose - Visceral (Omentum)). The output is stored in the variable my_output.

Note: Replace "YourTokenHere123" with your personal access token. See the section above, "Personal Access Token".


To view the output, use the head function from the R utils package, which returns the first rows of the object my_output.

head(my_output)
##   Query      RS_ID       Position                R2                D'
## 1   rs4 rs10637519 chr13:32430479 0.174249321651574 0.965976331360947
## 2   rs4 rs10637519 chr13:32430479 0.174249321651574 0.965976331360947
## 3   rs4   rs473641 chr13:32431244 0.174249321651574 0.965976331360947
## 4   rs4   rs473641 chr13:32431244 0.174249321651574 0.965976331360947
## 5   rs4   rs671746 chr13:32431263 0.174249321651574 0.965976331360947
## 6   rs4   rs671746 chr13:32431263 0.174249321651574 0.965976331360947
##    Gene_Symbol        Gencode_ID                       Tissue
## 1 RP1-257C22.2 ENSG00000279314.1       Adipose - Subcutaneous
## 2 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 3 RP1-257C22.2 ENSG00000279314.1       Adipose - Subcutaneous
## 4 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 5 RP1-257C22.2 ENSG00000279314.1       Adipose - Subcutaneous
## 6 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
##   Non_effect_Allele_Freq Effect_Allele_Freq Effect_Size     P_value
## 1                G=0.565          GTC=0.435    0.225642  2.2578e-07
## 2                G=0.565          GTC=0.435    0.207161  1.0227e-05
## 3                A=0.565            G=0.435    0.225642  2.2578e-07
## 4                A=0.565            G=0.435    0.207161  1.0227e-05
## 5                C=0.565            T=0.435    0.226558 1.93289e-07
## 6                C=0.565            T=0.435    0.207161  1.0227e-05
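
A common next step is to restrict the result to one tissue and rank associations by p-value. The sketch below does this on a stand-in data frame built from three columns of the rows shown above, so it runs without internet access. It assumes the P_value column may arrive as character (the mixed formatting in the printed output suggests this, but it is not confirmed here), so the column is converted before sorting.

```r
# Stand-in for the LDexpress result: three columns of the six rows shown above.
my_output <- data.frame(
  RS_ID   = c("rs10637519", "rs10637519", "rs473641",
              "rs473641", "rs671746", "rs671746"),
  Tissue  = rep(c("Adipose - Subcutaneous", "Adipose - Visceral (Omentum)"),
                times = 3),
  P_value = c("2.2578e-07", "1.0227e-05", "2.2578e-07",
              "1.0227e-05", "1.93289e-07", "1.0227e-05"),
  stringsAsFactors = FALSE
)

# Assumption: P_value may be stored as character, so convert before sorting.
my_output$P_value <- as.numeric(my_output$P_value)

# Keep one tissue and rank its associations from smallest to largest p-value.
subcut <- my_output[my_output$Tissue == "Adipose - Subcutaneous", ]
subcut <- subcut[order(subcut$P_value), ]
subcut
```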

Utility function example

The following example demonstrates the usage of the utility function list_pop, which returns a listing of the available reference populations from the 1000 Genomes Project and their corresponding population code and super population code used by LDlinkR functions. Usage of the other utility functions is similar.

list_pop()
##    pop_code super_pop_code                                  pop_name
## 1       ALL            ALL                           ALL POPULATIONS
## 2       AFR            AFR                                   AFRICAN
## 3       YRI            AFR                  Yoruba in Ibadan, Nigera
## 4       LWK            AFR                    Luhya in Webuye, Kenya
## 5       GWD            AFR                 Gambian in Western Gambia
## 6       MSL            AFR                     Mende in Sierra Leone
## 7       ESN            AFR                            Esan in Nigera
## 8       ASW            AFR   Americans of African Ancestry in SW USA
## 9       ACB            AFR           African Carribbeans in Barbados
## 10      AMR            AMR                         AD MIXED AMERICAN
## 11      MXL            AMR    Mexican Ancestry from Los Angeles, USA
## 12      PUR            AMR            Puerto Ricans from Puerto Rico
## 13      CLM            AMR        Colombians from Medellin, Colombia
## 14      PEL            AMR                 Peruvians from Lima, Peru
## 15      EAS            EAS                                EAST ASIAN
## 16      CHB            EAS              Han Chinese in Bejing, China
## 17      JPT            EAS                  Japanese in Tokyo, Japan
## 18      CHS            EAS                      Southern Han Chinese
## 19      CDX            EAS       Chinese Dai in Xishuangbanna, China
## 20      KHV            EAS         Kinh in Ho Chi Minh City, Vietnam
## 21      EUR            EUR                                  EUROPEAN
## 22      CEU            EUR Utah Residents from North and West Europe
## 23      TSI            EUR                         Toscani in Italia
## 24      FIN            EUR                        Finnish in Finland
## 25      GBR            EUR           British in England and Scotland
## 26      IBS            EUR               Iberian population in Spain
## 27      SAS            SAS                               SOUTH ASIAN
## 28      GIH            SAS  Gujarati Indian from Houston, Texas, USA
## 29      PJL            SAS             Punjabi from Lahore, Pakistan
## 30      BEB            SAS                   Bengali from Bangladesh
## 31      STU            SAS              Sri Lankan Tamil from the UK
## 32      ITU            SAS                 Indian Telugu from the UK
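
The listing can also be used to build a vector of population codes for a query. The sketch below collects the sub-population codes that belong to one super population. It uses a stand-in data frame containing a few rows copied from the output above so that it runs without the package; with LDlinkR attached, pops could instead be assigned from list_pop().

```r
# Stand-in for list_pop(): a few rows copied from the output above.
pops <- data.frame(
  pop_code       = c("EUR", "CEU", "TSI", "FIN", "GBR", "IBS", "SAS", "GIH"),
  super_pop_code = c("EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "SAS", "SAS"),
  stringsAsFactors = FALSE
)

# Sub-populations of EUR, excluding the row for the super population itself.
eur_codes <- pops$pop_code[pops$super_pop_code == "EUR" &
                             pops$pop_code != pops$super_pop_code]
eur_codes
```

A vector like eur_codes could then be supplied as the pop argument of a function that accepts several populations, as the LDexpress example above does with c("YRI", "CEU").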

Additional examples

More detailed examples demonstrating the usage of each function can be found in the package vignette.

browseVignettes("LDlinkR")

Contributors

Timothy A. Myers, Stephen J. Chanock and Mitchell J. Machiela

