bigChain Track Format

The bigChain format describes a pairwise alignment that allows gaps in both sequences simultaneously, just as chain files do; however, bigChain files are compressed and indexed as bigBed files. Chain files are converted to bigChain files with the bedToBigBed program (after preprocessing with hgLoadChain, as described below), run with the -as option to pull in a special autoSql (.as) file that defines the fields of the bigChain.

bigChain files are in an indexed binary format. The main advantage of this format is that only the portions of the file needed to display a particular region are transferred to the Genome Browser server. As a result, bigChain files display considerably faster than regular chain files when working with large data sets. The bigChain file remains on your own web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for the currently displayed chromosomal position is cached on the UCSC server as a "sparse file". If you do not have access to a web-accessible server and need hosting space for your bigChain files, please see the Hosting section of the Track Hub Help documentation.

bigChain format definition

The following autoSql definition is used to specify bigChain pairwise alignment files. This definition, contained in the file bigChain.as, will be pulled in when the bedToBigBed utility is run with the -as=bigChain.as option.

    table bigChain
    "bigChain pairwise alignment"
        (
        string  chrom;       "Reference sequence chromosome or scaffold"
        uint    chromStart;  "Start position in chromosome"
        uint    chromEnd;    "End position in chromosome"
        string  name;        "Name or ID of item, ideally both human readable and unique"
        uint    score;       "Score (0-1000)"
        char[1] strand;      "+ or - for strand"
        uint    tSize;       "size of target sequence"
        string  qName;       "name of query sequence"
        uint    qSize;       "size of query sequence"
        uint    qStart;      "start of alignment on query sequence"
        uint    qEnd;        "end of alignment on query sequence"
        uint    chainScore;  "score from chain"
        )
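
Before running bedToBigBed, it may help to sanity-check the text input against this definition. The sketch below assumes the 12 fields are tab-separated in the order shown above. The function name check_bigchain_lines is hypothetical and is not a UCSC utility. It reports lines with the wrong field count, a strand other than + or -, a score outside 0-1000, or coordinates that are out of order or exceed the stated sequence sizes.

```shell
# check_bigchain_lines (hypothetical helper, not a UCSC tool): reads
# tab-separated bigChain text (the BED6+6 input to bedToBigBed) on stdin and
# prints one message per problem found. Prints nothing if every line passes.
check_bigchain_lines() {
  awk -F'\t' '
    NF != 12 { print "line " NR ": expected 12 fields, found " NF; next }
    $3 + 0 <= $2 + 0 { print "line " NR ": chromEnd must exceed chromStart" }
    $5 + 0 < 0 || $5 + 0 > 1000 { print "line " NR ": score outside 0-1000" }
    $6 != "+" && $6 != "-" { print "line " NR ": strand must be + or -" }
    $3 + 0 > $7 + 0 { print "line " NR ": chromEnd exceeds tSize" }
    $11 + 0 <= $10 + 0 { print "line " NR ": qEnd must exceed qStart" }
    $11 + 0 > $9 + 0 { print "line " NR ": qEnd exceeds qSize" }
  '
}
```

For example, check_bigchain_lines < chr22_KI270731v1_random.hg38.mm10.rbest.bigChain could be run on the text file produced in Step 6 below.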

Note that the bedToBigBed utility uses a substantial amount of memory: approximately 25% more RAM than the uncompressed BED input file.
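
The 25% figure can be turned into a rough estimate before a large conversion. The following is a minimal sketch. estimate_bedToBigBed_ram is a hypothetical helper that multiplies the input file size by 1.25, so treat its result as a ballpark figure only.

```shell
# estimate_bedToBigBed_ram (hypothetical helper): prints a rough RAM estimate,
# in bytes, for converting the given uncompressed BED file, using the
# "about 25% more than the input size" rule of thumb.
estimate_bedToBigBed_ram() {
  local size
  size=$(wc -c < "$1") || return 1
  # Integer arithmetic: size * 1.25, rounded down.
  echo $(( size * 125 / 100 ))
}
```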

Creating a bigChain track

To create a bigChain track, follow these steps:

Step 1. If you already have a chain file you would like to convert to a bigChain, skip to Step 2. Otherwise, download this example chain file for the human GRCh38 (hg38) assembly.

Step 2. Download these autoSql files needed by bedToBigBed: bigChain.as and bigLink.as.

Step 3. Download the bedToBigBed and hgLoadChain programs from the UCSC binary utilities directory.

Step 4. Use the fetchChromSizes script from the same directory to create a chrom.sizes file for the UCSC database with which you are working (e.g., hg38). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.

Here are wget commands to obtain the autoSql files, the hg38.chrom.sizes file and the example chain file:

    wget https://genome.ucsc.edu/goldenPath/help/examples/bigChain.as
    wget https://genome.ucsc.edu/goldenPath/help/examples/bigLink.as
    wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
    wget https://genome.ucsc.edu/goldenPath/help/examples/chr22_KI270731v1_random.hg38.mm10.rbest.chain
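
Before continuing, it may be worth confirming that each download succeeded, since a failed wget can leave a missing or empty file behind. The following is a small sketch. check_inputs is a hypothetical helper name, and it only tests that each named file exists and is non-empty.

```shell
# check_inputs (hypothetical helper): prints a message for each named file
# that is missing or empty, and returns non-zero if any were found.
check_inputs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage with the file names from the wget commands above:
#   check_inputs bigChain.as bigLink.as hg38.chrom.sizes \
#     chr22_KI270731v1_random.hg38.mm10.rbest.chain
```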

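Step 5. Use the hgLoadChain utility to generate the chain.tab and link.tab files needed to create the bigChain and bigLink files. The -test option suppresses loading into a database, so the tab files are simply left in the current directory, and -noBin omits the bin column: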

    hgLoadChain -noBin -test hg38 bigChain chr22_KI270731v1_random.hg38.mm10.rbest.chain
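
To see where the columns used in Step 6 come from, the sketch below approximates chain.tab rows by reordering the fields of each chain header line (chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id).

Several parts of this sketch are assumptions:

- The function name chain_headers_to_tab is hypothetical.
- The output column order (score, tName, tSize, tStart, tEnd, qName, qSize, qStrand, qStart, qEnd, id) is inferred from the awk command in Step 6, not taken from hgLoadChain's source.
- The score is printed with six decimal places, on the assumption that this is the ".000000" suffix that the Step 6 sed command removes.

Use hgLoadChain itself for real data, since it also writes link.tab.

```shell
# chain_headers_to_tab (hypothetical, illustration only): reorders each
# chain header line into the assumed chain.tab column order. Data lines and
# comment lines are skipped because they do not start with "chain".
chain_headers_to_tab() {
  awk '$1 == "chain" {
         # header: chain score tName tSize tStrand tStart tEnd
         #         qName qSize qStrand qStart qEnd id
         printf "%f\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n",
                $2, $3, $4, $6, $7, $8, $9, $10, $11, $12, $13
       }' "$@"
}
```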

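Step 6. Create the bigChain file from the chain.tab file using a combination of sed, awk and the bedToBigBed utility. The sed command strips the ".000000" suffix from the chain scores, and the awk command reorders the columns to match bigChain.as: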

    sed 's/\.000000//' chain.tab | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' > chr22_KI270731v1_random.hg38.mm10.rbest.bigChain
    bedToBigBed -type=bed6+6 -as=bigChain.as -tab chr22_KI270731v1_random.hg38.mm10.rbest.bigChain hg38.chrom.sizes bigChain.bb
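
The Step 6 pipeline can be wrapped in a function and checked on a small hand-made chain.tab. This sketch adds a sort by chromosome and start position as a safeguard, because bedToBigBed requires sorted input; if chain.tab is already sorted, the sort changes nothing. The function name chain_tab_to_bigchain is hypothetical. The comments list the chain.tab columns as the awk command uses them.

```shell
# chain_tab_to_bigchain (hypothetical name): applies the Step 6 sed/awk
# mapping to a chain.tab file and sorts the result for bedToBigBed.
# chain.tab columns: $1 score, $2 tName, $3 tSize, $4 tStart, $5 tEnd,
#   $6 qName, $7 qSize, $8 qStrand, $9 qStart, $10 qEnd, $11 id
# Output order follows bigChain.as: chrom chromStart chromEnd name score
#   strand tSize qName qSize qStart qEnd chainScore
chain_tab_to_bigchain() {
  sed 's/\.000000//' "$1" |
    awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' |
    sort -k1,1 -k2,2n
}
```

Note that every item receives the fixed BED score 1000, while the original chain score is carried in the chainScore field. The chain id becomes the BED name, which is what the link file's name field refers back to.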

Step 7. To display your data in the Genome Browser, you must also create a binary indexed link file to accompany your bigChain file:

    awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' link.tab | sort -k1,1 -k2,2n > bigChain.bigLink
    bedToBigBed -type=bed4+1 -as=bigLink.as -tab bigChain.bigLink hg38.chrom.sizes bigChain.link.bb
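
Each row of link.tab describes one gapless block of a chain. Its chainId becomes the bigLink name field, which ties the block to the chain whose id was used as the bigChain name. The sketch below expands a chain file into such blocks to illustrate this.

Assumptions behind the sketch:

- chain_blocks is a hypothetical helper and is not a substitute for hgLoadChain.
- The output column order (tName, tStart, tEnd, qStart, chainId) is assumed, consistent with the Step 7 awk command.
- The input uses the standard chain layout: a header line, then "size dt dq" lines, ending with a line holding only "size".
- Query positions are reported exactly as they appear in the chain file. For minus-strand chains these are coordinates on the reverse-complemented query.

```shell
# chain_blocks (hypothetical, illustration only): expands every chain into
# its gapless blocks, printed as tName tStart tEnd qStart chainId.
chain_blocks() {
  awk 'BEGIN { OFS = "\t" }
       /^#/ { next }                       # skip comment lines
       $1 == "chain" {
         tName = $3; t = $6; q = $11; id = $13
         next
       }
       NF >= 1 {
         # Data line: "size dt dq", or just "size" for the last block.
         print tName, t, t + $1, q, id
         t += $1 + $2                      # skip block plus target gap
         q += $1 + $3                      # skip block plus query gap
       }' "$@"
}
```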

Step 8. Move the newly created bigChain (bigChain.bb) and bigLink (bigChain.link.bb) files to a web-accessible http, https or ftp location.

Step 9. Construct a custom track using a single track line. Note that any of the track attributes listed here are applicable to tracks of type bigBed. The most basic version of the track line will look something like this:

    track type=bigChain name="My Big Chain" bigDataUrl=http://myorg.edu/mylab/bigChain.bb linkDataUrl=http://myorg.edu/mylab/bigChain.link.bb
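
When several track lines point at the same hosting location, they can be generated rather than typed by hand. The following is a small sketch. make_bigchain_track_line is a hypothetical helper, and it assumes both files sit in one directory under the file names used above. http://myorg.edu/mylab is the placeholder location from the example track line, not a real server.

```shell
# make_bigchain_track_line (hypothetical helper): prints a bigChain track
# line for files named bigChain.bb and bigChain.link.bb under BASE_URL.
make_bigchain_track_line() {
  local name=$1 base_url=$2
  printf 'track type=bigChain name="%s" bigDataUrl=%s/bigChain.bb linkDataUrl=%s/bigChain.link.bb\n' \
    "$name" "$base_url" "$base_url"
}

# Usage: make_bigchain_track_line "My Big Chain" http://myorg.edu/mylab
```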

Step 10. Paste the custom track line into the text box on the custom track management page.

The bedToBigBed program can be run with several additional options. For a full list of the available options, type bedToBigBed (with no arguments) on the command line to display the usage message.

Examples

Example #1

In this example, you will create a bigChain custom track using an existing bigChain file, bigChain.bb, located on the UCSC Genome Browser http server. This file contains data for the hg38 assembly.

To create a custom track using this bigChain file:

Construct a track line that references the file:

    track type=bigChain name="bigChain Example One" description="A bigChain file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb

Paste the track line into the custom track management page for the human assembly hg38 (Dec. 2013).
Click the "submit" button.

Custom tracks can also be loaded via one URL line. This link loads the same bigChain.bb track and sets additional display parameters in the URL:

    http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigChain%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb%20linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb%20visibility=pack
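
A URL like this can be assembled from a track line by replacing its spaces with %20. The following is a minimal sketch. make_hgtracks_url is a hypothetical helper. It encodes only spaces, which is enough for this example but not for track lines containing quotes or other characters that need full URL encoding.

```shell
# make_hgtracks_url (hypothetical helper): builds an hgTracks URL that loads
# TRACK_LINE as a custom track on DB at POSITION. Only spaces are encoded.
make_hgtracks_url() {
  local db=$1 position=$2 track_line=$3 encoded
  encoded=$(printf '%s' "$track_line" | sed 's/ /%20/g')
  printf 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=%s&position=%s&hgct_customText=%s\n' \
    "$db" "$position" "$encoded"
}
```

Calling it with hg38, chr22_KI270731v1_random and the track line "track type=bigChain name=Example bigDataUrl=... linkDataUrl=... visibility=pack" reproduces the URL shown above.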

After this example bigChain is loaded in the Genome Browser, click on a chain in the browser's track display. The details page displays information about the individual chain, similar to the information available for a standard chain track.

Example #2

In this example, you will create your own bigChain file from an existing chain input file.

Save this chain file to your computer (Step 1 in Creating a bigChain track, above).
Save the autoSql files bigChain.as and bigLink.as to your computer (Step 2, above).
Download the bedToBigBed and hgLoadChain utilities (Step 3, above).
Save the hg38.chrom.sizes text file to your computer. This file contains the chrom.sizes for the human hg38 assembly (Step 4, above).
Run the utilities in Steps 5-7, above, to create the bigChain and bigLink output files.
Place the newly created bigChain (bigChain.bb) and bigLink (bigChain.link.bb) files on a web-accessible server (Step 8).
Construct a track line that points to the bigChain file (Step 9, above).
Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser (Step 10, above).

Sharing your data with others

If you would like to share your bigChain data track with a colleague, learn how to create a URL by looking at Example 6 on this page.

Extracting data from the bigChain format

Because the bigChain files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory.

bigBedToBed — converts a bigBed file to ASCII BED format.
bigBedSummary — extracts summary information from a bigBed file.
bigBedInfo — prints out information about a bigBed file.

As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement.
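
Once a bigChain file has been converted back to text with bigBedToBed, ordinary text tools can summarize it. The sketch below assumes the text has the 12 tab-separated bigChain.as fields per line. summarize_bigchain_bed is a hypothetical helper. It counts chains per query sequence and sums chromEnd - chromStart, so target bases covered by overlapping chains are counted more than once.

```shell
# summarize_bigchain_bed (hypothetical helper): for each query sequence
# (qName, field 8), prints qName, number of chains, and the summed target
# span of those chains, sorted by qName.
summarize_bigchain_bed() {
  awk -F'\t' '
    { n[$8]++; span[$8] += $3 - $2 }
    END { for (q in n) printf "%s\t%d\t%d\n", q, n[q], span[q] }
  ' "$@" | sort -k1,1
}

# Usage, assuming bigBedToBed is installed:
#   bigBedToBed bigChain.bb bigChain.txt
#   summarize_bigchain_bed bigChain.txt
```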

Troubleshooting

If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available from the same binary utilities directory) to remove the problematic row(s) from your input file before running the bedToBigBed program.
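
To see which rows are affected before clipping, the following sketch compares each row's chromEnd with the size recorded in chrom.sizes. find_out_of_bounds is a hypothetical helper. It assumes both files are tab-separated, with chrom.sizes holding a name and a size per line. It only reports rows; bedClip is the tool that removes them.

```shell
# find_out_of_bounds (hypothetical helper): prints BED rows whose chromEnd
# exceeds the chromosome size, or whose chromosome is not in chrom.sizes.
find_out_of_bounds() {
  local sizes=$1 bed=$2
  awk -F'\t' '
    NR == FNR { size[$1] = $2; next }        # first file: chrom.sizes
    !($1 in size) { print "unknown chrom: " $0; next }
    $3 + 0 > size[$1] + 0 { print "past end: " $0 }
  ' "$sizes" "$bed"
}

# Usage: find_out_of_bounds hg38.chrom.sizes chr22_KI270731v1_random.hg38.mm10.rbest.bigChain
```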