bigGenePred Track Format

The bigGenePred format stores positional annotations for collections of exons in a compressed format, similar to how BED files are compressed into bigBeds. The bigGenePred format is a superset of the text-based genePred format and is stored using the bigBed format, so it can be efficiently accessed over a network. The bigGenePred format includes 8 additional fields (beyond the 12 standard BED fields) that contain details about coding frames, annotation status, and other gene-specific information. These fields are commonly used in the Browser to display start codons, stop codons, and amino acid translations.

Before compression, bigGenePred files can be described as bed12+8 files. bigGenePred files can be created using the program bedToBigBed, run with the -as option to pull in a special autoSql (.as) file that defines the extra fields of the bigGenePred.

Much like bigBed, bigGenePred files are in an indexed binary format. The advantage of using a binary format is that only the portions of the file needed to display a particular region are read, allowing for much faster performance when working with large data sets. As with all big* files, bigGenePred files must be hosted on a web-accessible server (http, https, or ftp) to be displayed. For more information on finding a hosting location for your bigGenePred files, please see the hosting section of the Track Hub Help documentation.

bigGenePred format description

The following autoSql definition specifies bigGenePred gene prediction files. This definition, contained in the file bigGenePred.as, is pulled in when the bedToBigBed utility is run with the -as=bigGenePred.as option.

table bigGenePred
"bigGenePred gene models"
   (
   string chrom;       "Reference sequence chromosome or scaffold"
   uint   chromStart;  "Start position in chromosome"
   uint   chromEnd;    "End position in chromosome"
   string name;        "Name or ID of item, ideally both human-readable and unique"
   uint score;         "Score (0-1000)"
   char[1] strand;     "+ or - for strand"
   uint thickStart;    "Start of where display should be thick (start codon)"
   uint thickEnd;      "End of where display should be thick (stop codon)"
   uint reserved;       "RGB value (use R,G,B string in input file)"
   int blockCount;     "Number of blocks"
   int[blockCount] blockSizes; "Comma separated list of block sizes"
   int[blockCount] chromStarts; "Start positions relative to chromStart"
   string name2;       "Alternative/human readable name"
   string cdsStartStat; "Status of CDS start annotation (none, unknown, incomplete, or complete)"
   string cdsEndStat;   "Status of CDS end annotation (none, unknown, incomplete, or complete)"
   int[blockCount] exonFrames; "Reading frame of the start of the CDS region of the exon, in the direction of transcription (0,1,2), or -1 if there is no CDS region."
   string type;        "Transcript type"
   string geneName;    "Primary identifier for gene"
   string geneName2;   "Alternative/human-readable gene name"
   string geneType;    "Gene type"
   )
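To make the column order concrete, here is a minimal Python sketch that splits one tab-separated bed12+8 line into named fields following the autoSql definition above. The function and constant names are hypothetical, and the sketch only checks the column count and the per-block list lengths; it is not a substitute for the validation bedToBigBed performs.

```python
# Column names in the order given by bigGenePred.as (12 BED fields + 8 extra fields).
BIGGENEPRED_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
    "type", "geneName", "geneName2", "geneType",
]
INT_FIELDS = {"chromStart", "chromEnd", "score", "thickStart", "thickEnd", "blockCount"}
LIST_FIELDS = {"blockSizes", "chromStarts", "exonFrames"}


def parse_int_list(text: str) -> list:
    """Parse a comma-separated list such as '100,150,' (a trailing comma is allowed)."""
    return [int(value) for value in text.split(",") if value != ""]


def parse_bigGenePred_line(line: str) -> dict:
    """Split one tab-separated pre-bigGenePred line into a dict keyed by field name."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(BIGGENEPRED_FIELDS):
        raise ValueError(f"expected 20 columns, got {len(columns)}")
    record = {}
    for field, value in zip(BIGGENEPRED_FIELDS, columns):
        if field in INT_FIELDS:
            record[field] = int(value)
        elif field in LIST_FIELDS:
            record[field] = parse_int_list(value)
        else:
            # 'reserved' stays a string: the input file holds an "R,G,B" color.
            record[field] = value
    # Each per-block list must have exactly blockCount entries.
    for field in LIST_FIELDS:
        if len(record[field]) != record["blockCount"]:
            raise ValueError(
                f"{field} has {len(record[field])} values, expected {record['blockCount']}"
            )
    return record
```

For a hypothetical two-exon line, `parse_bigGenePred_line(line)["blockSizes"]` comes back as a list of two integers, which makes the per-block fields easy to compare with one another.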

The field exonFrames is a comma-separated list of numbers with the possible values 0, 1, 2, or -1, one per exon, in order of transcription. This order means that the first value for a transcript on the minus (-) strand is the exon on the right of the screen in the Genome Browser. A value of zero means that the first codon of the exon starts at the first nucleotide of the exon. A value of one means that the first codon starts after the first nucleotide, and a value of two means that it starts after the second nucleotide. UTRs are non-coding, and their exonFrames value is -1.
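One property stated above can be checked mechanically: blocks that lie entirely outside the thick (coding) region are UTR-only and should carry -1. The Python sketch below, with hypothetical names, counts -1 values rather than matching them to blocks one by one, so it does not depend on the order in which the list is written. Treating the thickStart to thickEnd interval as the CDS is an assumption based on the autoSql field descriptions.

```python
def check_exon_frames(chrom_start, thick_start, thick_end,
                      block_sizes, chrom_starts, exon_frames):
    """Return a list of problems found in exonFrames (empty if none)."""
    if not (len(block_sizes) == len(chrom_starts) == len(exon_frames)):
        return ["blockSizes, chromStarts and exonFrames have different lengths"]
    problems = []
    invalid = [frame for frame in exon_frames if frame not in (-1, 0, 1, 2)]
    if invalid:
        problems.append(f"invalid exonFrames values: {invalid}")
    # Count blocks with no overlap with the half-open interval [thickStart, thickEnd).
    non_coding = 0
    for size, relative_start in zip(block_sizes, chrom_starts):
        block_start = chrom_start + relative_start  # chromStarts are relative to chromStart
        block_end = block_start + size
        if max(block_start, thick_start) >= min(block_end, thick_end):
            non_coding += 1
    if exon_frames.count(-1) != non_coding:
        problems.append(
            f"{exon_frames.count(-1)} blocks marked -1, "
            f"but {non_coding} blocks lie outside the CDS"
        )
    return problems
```

A non-coding transcript (thickStart equal to thickEnd) has no overlapping blocks, so every value is expected to be -1.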

The fields cdsStartStat and cdsEndStat have the following values: 'none' = none, 'unk' = unknown, 'incmpl' = incomplete, and 'cmpl' = complete. These values, however, are not used for our display and cannot be used to identify coding or non-coding genes. For most purposes, other tables will be needed to get more information about a transcript. For instance, for hg38, use the tables named wgEncodeGencodeAttrsVxx, where xx is the GENCODE version number. See this coding/non-coding genes FAQ for more information.
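Because these four codes are the only values listed, a file-checking script can decode them with a simple lookup table. The Python sketch below uses hypothetical names; as noted above, the decoded status says nothing about whether a gene is coding, so it is only useful for catching values outside the four listed codes.

```python
# The four cdsStartStat/cdsEndStat codes and their meanings.
CDS_STAT_CODES = {
    "none": "none",
    "unk": "unknown",
    "incmpl": "incomplete",
    "cmpl": "complete",
}


def describe_cds_stat(code):
    """Translate a cdsStartStat/cdsEndStat code into its meaning."""
    if code not in CDS_STAT_CODES:
        raise ValueError(f"unexpected cdsStartStat/cdsEndStat value: {code!r}")
    return CDS_STAT_CODES[code]


def check_cds_stats(cds_start_stat, cds_end_stat):
    """Return the decoded (start, end) pair, failing on any unlisted value."""
    return describe_cds_stat(cds_start_stat), describe_cds_stat(cds_end_stat)
```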

The following bed12+8 is an example of a pre-bigGenePred text file.

Creating a bigGenePred track from a bed12+8 file

Step 1. Format your pre-bigGenePred file. The first 12 fields of pre-bigGenePred files are described by the BED file format. Your file must also contain the 8 extra fields described in the autoSql file definition shown above: name2, cdsStartStat, cdsEndStat, exonFrames, type, geneName, geneName2, geneType. For example, you can use this bed12+8 input file, bigGenePred.txt. Your pre-bigGenePred file must be sorted first on the chrom field, and secondarily on the chromStart field. You can use the UNIX sort command to do this:

sort -k1,1 -k2,2n unsorted.bed > input.bed
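If you are preparing the file from a script rather than the shell, the same ordering can be produced in Python. This is a sketch that assumes a tab-separated file with no header or track lines; the function names are hypothetical. Python compares strings by code point, which matches `sort` run with LC_ALL=C rather than a locale-dependent collation. Rows that tie on chrom and chromStart keep their input order, which does not change the chrom/chromStart ordering that bedToBigBed needs.

```python
def sort_bed_lines(lines):
    """Order BED lines by chrom (as text), then chromStart (as an integer)."""
    rows = [line for line in lines if line.strip()]  # drop blank lines

    def sort_key(line):
        columns = line.split("\t")
        return columns[0], int(columns[1])

    return sorted(rows, key=sort_key)


def sort_bed_file(in_path, out_path):
    """File-based counterpart of: sort -k1,1 -k2,2n unsorted.bed > input.bed"""
    with open(in_path) as source:
        rows = sort_bed_lines(source)
    with open(out_path, "w") as target:
        for row in rows:
            target.write(row if row.endswith("\n") else row + "\n")
```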

Step 2. Download the bedToBigBed program from the binary utilities directory.

Step 3. Download the chrom.sizes file for your assembly from our downloads page (click on "Full data set" for your organism). For example, the hg38.chrom.sizes file for the hg38 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes. Alternatively, you can use the fetchChromSizes script from the utilities directory.
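A chrom.sizes file is plain text with two columns per line: a sequence name and its length in bases. Scripts that check your data before running bedToBigBed may want it as a lookup table; the helper names in this Python sketch are hypothetical.

```python
def parse_chrom_sizes(lines):
    """Build {chrom: size} from chrom.sizes lines ('chrom<TAB>size')."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, size = fields[0], int(fields[1])
        sizes[chrom] = size
    return sizes


def read_chrom_sizes(path):
    """Read a local chrom.sizes file, e.g. hg38.chrom.sizes."""
    with open(path) as handle:
        return parse_chrom_sizes(handle)
```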

Step 4. Create the bigGenePred file from your pre-bigGenePred file using the bedToBigBed utility command:

bedToBigBed -as=bigGenePred.as -type=bed12+8 bigGenePred.txt chrom.sizes myBigGenePred.bb

Step 5. Move the newly created bigGenePred file (myBigGenePred.bb) to a web-accessible http, https, or ftp location. See the hosting section if necessary.

Step 6. Construct a custom track using a single track line. Any of the track attributes available for bigBed tracks can be used. The basic version of the track line will look something like this:

track type=bigGenePred name="My Big GenePred" description="A Gene Set Built from Data from My Lab" bigDataUrl=http://myorg.edu/mylab/myBigGenePred.bb
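When you generate many track lines, a small helper keeps the quoting consistent. This Python sketch (function name hypothetical) quotes the name and description the way the line above does and appends any further track attributes, such as visibility, as key=value pairs; it does not check whether an attribute is valid for bigGenePred tracks.

```python
def make_track_line(big_data_url, name, description, **settings):
    """Build a bigGenePred custom track line; extra settings go in as key=value."""
    for text in (name, description):
        if '"' in text:
            raise ValueError("name and description cannot contain double quotes")
    parts = ["track", "type=bigGenePred", f'name="{name}"', f'description="{description}"']
    for key, value in settings.items():
        value = str(value)
        # Quote values containing spaces, as name and description are quoted.
        parts.append(f'{key}="{value}"' if " " in value else f"{key}={value}")
    parts.append(f"bigDataUrl={big_data_url}")
    return " ".join(parts)
```

Called with the URL, name, and description from the example, it reproduces that track line; adding `visibility="pack"` inserts `visibility=pack` before bigDataUrl.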

Step 7. Paste this custom track line, with your modified URL, into the text box on the custom track page. Click the submit button and your track should load successfully. Then click the go button to be taken to the Browser window with your custom track at the top. Note that there might not be data at all positions.

Examples

Example #1: Create a Custom Track

Create a bigGenePred custom track using the bigGenePred file located on the UCSC Genome Browser http server, bigGenePred.bb. This file contains data for the hg38 assembly.

Construct a track line that references the hosted file:

track type=bigGenePred name="bigGenePred Example One" description="A bigGenePred file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb

Paste the track line into the custom track page for the human assembly, hg38.
Click the submit button.

Custom tracks can also be loaded via one URL line. The link below loads the same bigGenePred track and sets additional parameters in the URL:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hgct_customText=track%20type=bigGenePred%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb
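The one-line URL is the hgTracks address with the track line passed, URL-encoded, in the hgct_customText parameter. The Python sketch below rebuilds links of that shape with `urllib.parse.quote`; leaving ':', '/' and '=' unencoded mirrors the example link and is a choice, not a requirement. The helper name and the optional position argument are hypothetical.

```python
from urllib.parse import quote


def custom_track_url(db, big_data_url, position=None,
                     host="http://genome.ucsc.edu"):
    """Return an hgTracks URL that loads a bigGenePred custom track."""
    track_line = f"track type=bigGenePred bigDataUrl={big_data_url}"
    url = f"{host}/cgi-bin/hgTracks?db={db}"
    if position:
        url += f"&position={position}"
    # Spaces become %20; characters such as '&' or '?' would also be escaped.
    url += "&hgct_customText=" + quote(track_line, safe=":/=")
    return url
```

`custom_track_url("hg38", "http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb")` returns the same link shown above.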

After this example bigGenePred track is loaded in the Genome Browser, click on the track to change display from dense to pack, then click on a gene in the Browser's track display to view the gene details page. Note that the page offers links to translated protein, predicted mRNA, and genomic sequence.

Example #2: Display Amino Acids and Codon Numbers

In this example, you will configure the bigGenePred track loaded in Example #1 to display amino acids and codon numbering:

Access the track configuration page by right-clicking anywhere in the track and clicking "Configure User Track" or alternately, from within a gene's details page, click the "Go to User Track track controls" link.
Making sure the display is in pack or full visibility mode, change the "Color track by codons:" option from "OFF" to "genomic codons". Then submit the change.
Zoom to a region with track data, such as chr9:133,255,650-133,255,700, and note that the track now displays amino acids.
Return to the track configuration page, check the box next to "Show codon numbering", and submit the change. The Browser track display will now show amino acid letters and codon numbering when sufficiently zoomed in.

Alternatively, you can add a parameter to the custom track line, baseColorDefault=genomicCodons, to display amino acids and codon numbering by default:

browser position chr10:67,884,600-67,884,900
track type=bigGenePred baseColorDefault=genomicCodons name="bigGenePred Example Two" description="A bigGenePred file" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb

Paste the above into the hg38 custom track page to view an example of bigGenePred amino acid display at the beginning of the SIRT1 gene on chromosome 10.

An image of a track with codons colored

Example #3: Bed12+8 to BigGenePred

In this example, you will create your own bigGenePred file from an existing pre-bigGenePred input file, a bed12+8 file.

Save the example bed12+8 input file to your computer: bigGenePred.txt.
Download the bedToBigBed utility (Step 2 in the "Creating a bigGenePred track from a bed12+8 file" section above).
Save the hg38.chrom.sizes text file to your computer. This file contains the chrom.sizes for the human hg38 assembly (Step 3, above).
Save the autoSql file bigGenePred.as to your computer.

Run the bedToBigBed utility to create the bigGenePred output file (Step 4, above):

bedToBigBed -type=bed12+8 -tab -as=bigGenePred.as bigGenePred.txt hg38.chrom.sizes bigGenePred.bb

Place the newly created bigGenePred file (bigGenePred.bb) on a web-accessible server (Step 5, above).
Construct a track line that points to the bigGenePred file (Step 6, above).
Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser (Step 7, above).

Example #4: GTF (or GFF) to BigGenePred

In this example, you will convert a GTF file to bigGenePred using command-line utilities. You will need gtfToGenePred, genePredToBigGenePred, and bedToBigBed. If you would like to convert a GFF file to bigGenePred, use gff3ToGenePred in place of gtfToGenePred. You can download these utilities from the utilities directory.

Obtain a GTF file using the wget command. Skip this step if you already have a GTF or GFF file.
```
wget http://genome.ucsc.edu/goldenPath/help/examples/bigGenePredExample4.gtf
```
Convert the GTF file to extended genePred format using the gtfToGenePred command.
```
gtfToGenePred -genePredExt bigGenePredExample4.gtf example4.genePred
```
If you are converting a GFF file, use the gff3ToGenePred command instead.
```
gff3ToGenePred yourFile.gff example4.genePred
```
Convert the genePred extended file to a pre-bigGenePred text file.
```
genePredToBigGenePred example4.genePred bigGenePredEx4.txt
```

Download a helper file that specifies column names.

wget https://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.as

Convert your text bigGenePred file to a binary indexed format. If you are not using hg38, replace the hg38.chrom.sizes file path with your organism's file path from the downloads directory under "Genome Sequence Files".
```
bedToBigBed -type=bed12+8 -tab -as=bigGenePred.as bigGenePredEx4.txt http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes bigGenePredEx4.bb
```
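The same conversion can be scripted so that each step runs only if the previous one succeeded. This Python sketch assumes the UCSC utilities and `sort` are on your PATH and that bigGenePred.as is in the working directory; the function names and intermediate file names are hypothetical. It also adds a sort step, which is an assumption on our part, to satisfy the chrom/chromStart ordering required in Step 1 of the bed12+8 instructions.

```python
import os
import subprocess


def bigGenePred_commands(annotation, prefix, chrom_sizes,
                         as_file="bigGenePred.as", gff3=False):
    """Return the GTF/GFF-to-bigGenePred conversion steps as argument lists."""
    genepred = f"{prefix}.genePred"
    text = f"{prefix}.txt"
    sorted_text = f"{prefix}.sorted.txt"
    if gff3:
        to_genepred = ["gff3ToGenePred", annotation, genepred]
    else:
        to_genepred = ["gtfToGenePred", "-genePredExt", annotation, genepred]
    return [
        to_genepred,
        ["genePredToBigGenePred", genepred, text],
        # Added sort step (assumption): bedToBigBed needs chrom/chromStart order.
        ["sort", "-k1,1", "-k2,2n", "-o", sorted_text, text],
        ["bedToBigBed", "-type=bed12+8", "-tab", f"-as={as_file}",
         sorted_text, chrom_sizes, f"{prefix}.bb"],
    ]


def run_commands(commands):
    """Run each step in order; check=True stops at the first failing step."""
    env = dict(os.environ, LC_ALL="C")  # byte-order collation for `sort`
    for argv in commands:
        subprocess.run(argv, check=True, env=env)
```

For example, `run_commands(bigGenePred_commands("bigGenePredExample4.gtf", "bigGenePredEx4", "hg38.chrom.sizes"))` would write bigGenePredEx4.bb from a local copy of hg38.chrom.sizes.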
Put your binary indexed file, bigGenePredEx4.bb, in a web-accessible location. See the hosting section for more information.
To view this example, you can click this Browser link. To view your own data, paste the link into your web browser and replace the URL after "bigDataUrl=" with a link to your own hosted data.
```
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr19:44905790-44909388&hgct_customText=track%20type=bigGenePred%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePredEx4.bb
```
You can also add your data on the custom track management page. This allows you to set the position and configuration options, and to write a more complete description. If you want to see codons, you can right-click the track and configure the codon view, or set this option using baseColorDefault=genomicCodons as is done below.
```
browser position chr19:44905790-44909388
track type=bigGenePred baseColorDefault=genomicCodons name="bigGenePred Example Four" description="Ex4:BigGenePred Made from GTF" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePredEx4.bb
```
Once you are done, you should have a track on the Genome Browser like the one below.

An image of a bigGenePred track on the Browser

Example #5: Create a BigGenePred Track Hub

In this example, you will set up a Track Hub that displays bigGenePred data and uses one of the bigGenePred-specific settings to display gene codons. You can see a pre-built version of this hub by clicking this link.

Make sure you have access to a web-hosted file location like GitHub, CyVerse, or an institutional website. This is where you will store your bigData files and configuration files. For more information, please visit the hosting section of our Track Hub help guide.
Copy the text from the bigGenePred example hub files into identically named files on your hosted website. You will need a hub.txt and a genomes.txt file, along with a directory for each assembly you would like to visualize (hg38). This assembly directory stores your data files (bigGenePred.bb) and a trackDb.txt file, which defines track settings such as track name, description, and the bigGenePred setting for amino acid display (baseColorDefault genomicCodons). Visit the trackDb help page for more information about trackDb settings.
Verify that you have the four minimum files, two of which are in the hg38 directory:
--hub.txt
--genomes.txt
--hg38
----trackDb.txt
----bigGenePred.bb
You may also include a hub description page and a track description page, where you will find additional instructions.
Copy the URL address of your hub.txt file and paste it into the text input box on the Track Hub page. For an example, you can paste the following link into the input box: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubBigGenePred/hub.txt. Clicking the Add Hub button will connect to your track hub and navigate you to your assembly of interest. For more instructions on setting up Track Hubs, visit the Track Hub set-up page.
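The four-file layout described above can also be generated by a script. This Python sketch writes hub.txt, genomes.txt and hg38/trackDb.txt using common Track Hub settings (hub, shortLabel, longLabel, genomesFile, email, genome, trackDb, track, type, bigDataUrl, visibility) plus baseColorDefault genomicCodons for amino acid display. The hub name, track name, labels, and email are hypothetical placeholders, and you still need to copy bigGenePred.bb into the hg38 directory yourself.

```python
from pathlib import Path


def hub_files(email):
    """Return {relative path: file contents} for a minimal bigGenePred hub."""
    hub_txt = "\n".join([
        "hub myBigGenePredHub",  # hypothetical hub name
        "shortLabel bigGenePred example",
        "longLabel bigGenePred example hub with codon coloring",
        "genomesFile genomes.txt",
        f"email {email}",
    ]) + "\n"
    genomes_txt = "genome hg38\ntrackDb hg38/trackDb.txt\n"
    track_db_txt = "\n".join([
        "track myBigGenePred",  # hypothetical track name
        "type bigGenePred",
        "bigDataUrl bigGenePred.bb",  # relative to the trackDb.txt location
        "shortLabel My genes",
        "longLabel My bigGenePred gene models",
        "visibility pack",
        "baseColorDefault genomicCodons",
    ]) + "\n"
    return {"hub.txt": hub_txt, "genomes.txt": genomes_txt, "hg38/trackDb.txt": track_db_txt}


def write_hub(root, email):
    """Write the hub files under root, creating the hg38 directory if needed."""
    for rel_path, content in hub_files(email).items():
        path = Path(root) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
```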

An image of a bigGenePred track hub on the Browser

Sharing your data with others

If you would like to share your bigGenePred data track with a colleague, learn how to create a URL link to your data by looking at Example #6 on the custom track help page.

Extracting data from bigBed format

Because bigGenePred files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory.

bigBedToBed — converts a bigBed file to ASCII BED format.
bigBedSummary — extracts summary information from a bigBed file.
bigBedInfo — prints out information about a bigBed file.

As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement.
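To pull bigGenePred records into a script, one option is to call bigBedToBed for a region and parse its tab-separated output. This Python sketch assumes bigBedToBed is on your PATH and uses its -chrom, -start and -end options to limit the output; the function names are hypothetical. Column 18 (index 17) is geneName according to the autoSql definition earlier on this page.

```python
import os
import subprocess
import tempfile


def bigbed_region_rows(bb_path, chrom, start, end):
    """Extract one region with bigBedToBed and return its rows as lists of fields."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        out_path = os.path.join(tmp_dir, "region.bed")
        subprocess.run(
            ["bigBedToBed", f"-chrom={chrom}", f"-start={start}", f"-end={end}",
             bb_path, out_path],
            check=True,
        )
        with open(out_path) as handle:
            return [line.rstrip("\n").split("\t") for line in handle if line.strip()]


def gene_names(rows):
    """Collect the distinct geneName values (index 17 in bigGenePred.as order)."""
    return sorted({row[17] for row in rows if len(row) > 17})
```

For instance, `gene_names(bigbed_region_rows("myBigGenePred.bb", "chr10", 67884600, 67884900))` lists the genes in that window of a local file.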

Troubleshooting

If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available here) to remove the problematic row(s) before running the bedToBigBed program.
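If you want to see which rows are affected before clipping, you can list them yourself. The Python sketch below, with hypothetical names, splits lines into rows that fit within the chromosome lengths from chrom.sizes and rows that do not, including rows on sequences missing from chrom.sizes. It is not a reimplementation of bedClip, whose exact rules may differ.

```python
def split_out_of_bounds(lines, chrom_sizes):
    """Return (kept, dropped); dropped rows fall outside 0..chromosome length."""
    kept, dropped = [], []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        columns = line.rstrip("\n").split("\t")
        chrom, start, end = columns[0], int(columns[1]), int(columns[2])
        size = chrom_sizes.get(chrom)  # None if chrom is missing from chrom.sizes
        if size is None or start < 0 or start > end or end > size:
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped
```

The `chrom_sizes` argument is a plain dict of sequence name to length, such as one built from your assembly's chrom.sizes file.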
