20.109(S20):Analyze RNA-seq data and select gene targets for quantitative PCR (qPCR) experiment (Day 6)

From Course Wiki
Revision as of 21:58, 19 March 2020

20.109(S20): Laboratory Fundamentals of Biological Engineering



Introduction

The transcriptome is the full suite of transcripts within a cell or organism and provides the key link between the genetic code and phenotype. Research focused on the transcriptome has provided important insights into how gene expression differs between cell and tissue types, changes across developmental phases and in disease states, and varies between species. In this module, you will evaluate gene expression in the DLD-1 cell line to assess the effects of a drug treatment on the transcriptome.

The gene expression data was generated using RNA-seq. In this method, deep sequencing is performed using reagents and equipment from Illumina. With this technology, transcripts (after conversion to DNA) are sequenced directly, and the resulting reads are mapped to a reference genome. The reads that map to a particular portion of the genome (i.e., a particular gene) are then counted, and these counts provide a measure of that gene's expression level.
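The counting step can be sketched as follows. This is a simplified illustration, not the pipeline used by the BioMicro Center: it assumes each read has already been mapped to a single chromosome position, and the gene names and coordinates are hypothetical. Real pipelines use full alignments and annotation files, but the idea is the same: every read that lands inside a gene adds one to that gene's count.

```python
from collections import Counter

# Hypothetical gene annotation: name -> (chromosome, start, end), inclusive coordinates
GENES = {
    "GENE_A": ("chr1", 100, 500),
    "GENE_B": ("chr1", 800, 1200),
    "GENE_C": ("chr2", 50, 400),
}


def count_reads_per_gene(mapped_reads, genes):
    """Count how many mapped read positions fall inside each gene's interval."""
    counts = Counter({name: 0 for name in genes})  # start every gene at zero
    for chrom, pos in mapped_reads:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


# Hypothetical mapped reads as (chromosome, position); the read at chr1:650
# falls between genes and is therefore not counted for any gene.
reads = [("chr1", 150), ("chr1", 420), ("chr1", 900), ("chr2", 60), ("chr1", 650)]
raw = count_reads_per_gene(reads, GENES)
print(dict(raw))
```

Here GENE_A collects two reads and GENE_B and GENE_C one each. A higher count suggests higher expression, although counts from samples sequenced to different depths must be normalized before they are compared.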

Image from Goodwin et al. (2016) Nature Rev. Genet. 17:333-351.
In RNA-seq, RNA is purified from cells and reverse transcribed into complementary DNA (cDNA). Adapters are then ligated to both ends of the DNA molecules. Oligonucleotides complementary to the adapters are attached to the surface of the flow cell channels; they capture the adapter-modified DNA molecules and also serve as primers for DNA polymerase. Following the initial binding to the flow cell channel, each DNA molecule bends over to pair with a neighboring surface oligonucleotide, forming a bridge that enables bridge amplification and cluster generation (see figure to the right). Through this process, millions of dense clusters are generated, each containing many copies of a single double-stranded DNA molecule.
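The first steps of library preparation can be modeled as simple string operations. This toy sketch assumes idealized chemistry: it writes out only the first cDNA strand (real libraries are double-stranded before adapter ligation), the adapter sequences are short hypothetical placeholders rather than Illumina's, and `ideal_cluster_size` is an upper bound that assumes every bridge-amplification cycle doubles the copy number.

```python
# Base pairing used when reverse transcriptase copies RNA into DNA
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

ADAPTER_5P = "ACGT"  # hypothetical placeholder, not a real Illumina adapter
ADAPTER_3P = "TTGC"  # hypothetical placeholder, not a real Illumina adapter


def reverse_transcribe(rna):
    """Return first-strand cDNA (written 5'->3'), the reverse complement of the RNA."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


def ligate_adapters(dna, left=ADAPTER_5P, right=ADAPTER_3P):
    """Attach adapter sequences to both ends of a DNA fragment."""
    return left + dna + right


def ideal_cluster_size(cycles):
    """Copies of one molecule after `cycles` rounds if every round doubles it."""
    return 2 ** cycles


cdna = reverse_transcribe("AUGGCU")         # RNA written 5'->3'
library_molecule = ligate_adapters(cdna)    # adapter + insert + adapter
print(cdna, library_molecule, ideal_cluster_size(10))
```

The adapters are what the flow cell oligonucleotides recognize, which is why every fragment, regardless of which gene it came from, can bind and be amplified the same way.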

To read the sequence of each cluster directly, a sequencing-by-synthesis approach is used. Sequencing proceeds in rounds, each of which adds a single base using modified deoxynucleoside triphosphates (dNTPs). These modified dNTPs are reversible terminators: the 3'-OH group of the deoxyribose is blocked, which prevents the polymerase from adding more than one base per round. Each terminating base (dATP, dTTP, dCTP, and dGTP) carries a different fluorescent label (dATP = red, dTTP = green, dCTP = blue, and dGTP = yellow). For each round of sequencing, a mixture containing all four labeled dNTPs is added, and a single base is incorporated into each DNA molecule bound to the flow cell channels. The flow cell is then imaged to record which base was added at each cluster location. Next, the fluorescent label and the 3'-OH blocking group are removed from the incorporated dNTP, and another round of sequencing is performed. Repeating these rounds reads out the sequence of every cluster bound to the flow cell channels, one base at a time. For example, the sequence of the cluster denoted by the asterisk in the schematic provided below is GCTGA.

Sp18 20.109 M2D4 illumina sequencing.png
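Base calling from the images can be sketched as a lookup from color to base, applied cycle by cycle to every cluster at once. This is a simplified sketch: it uses the four-color scheme described above and assumes each image yields one unambiguous color per cluster (real instruments also assign a quality score to each call). The color series for the asterisk cluster is reconstructed from its stated sequence, and the second cluster, `c2`, is invented for illustration.

```python
# Color-to-base mapping from the labeling described above (dATP = red, etc.)
COLOR_TO_BASE = {"red": "A", "green": "T", "blue": "C", "yellow": "G"}


def call_bases(cycle_images):
    """Build one read per cluster from the color imaged there in each cycle.

    cycle_images[i] maps a cluster ID to the color seen at that cluster in cycle i.
    """
    reads = {cluster: "" for cluster in cycle_images[0]}
    for image in cycle_images:  # one flow cell image per sequencing cycle
        for cluster, color in image.items():
            reads[cluster] += COLOR_TO_BASE[color]
    return reads


# Hypothetical images for two clusters over five cycles; "*" stands for the
# asterisk cluster in the schematic, "c2" is an invented second cluster.
cycle_images = [
    {"*": "yellow", "c2": "red"},
    {"*": "blue", "c2": "green"},
    {"*": "green", "c2": "green"},
    {"*": "yellow", "c2": "blue"},
    {"*": "red", "c2": "yellow"},
]
reads = call_bases(cycle_images)
print(reads["*"])  # prints GCTGA
```

Because every cluster is read from the same image in each cycle, adding more clusters does not add more cycles; this is what makes the method massively parallel.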

As with all technologies, RNA-seq has advantages and disadvantages. On the plus side, because transcripts are sequenced directly, researchers can assess gene expression in organisms for which a full genome sequence is not available or not fully annotated. Furthermore, this method allows for the quantification of individual isoforms that result from alternative splicing. On the minus side, the cost of RNA-seq can limit the sequencing depth achieved, so genes expressed at low levels may not be captured in a data set.
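The depth limitation can be made concrete with a back-of-the-envelope calculation. This sketch assumes reads are sampled independently, so the number of reads a gene receives is approximately Poisson distributed; it ignores biological variability and library-preparation bias. The expression level used (1 in 10 million sequenced molecules) is a hypothetical value chosen for illustration.

```python
import math


def expected_reads(depth, fraction):
    """Mean number of reads for a gene that makes up `fraction` of the sequenced molecules."""
    return depth * fraction


def prob_missed(depth, fraction):
    """Poisson approximation of the chance that the gene receives zero reads."""
    return math.exp(-expected_reads(depth, fraction))


# Hypothetical low-expression gene: 1 in 10 million sequenced molecules
fraction = 1e-7
for depth in (1e6, 1e7, 5e7):
    print(f"{depth:.0e} reads: expect {expected_reads(depth, fraction):.1f}, "
          f"P(missed) = {prob_missed(depth, fraction):.3f}")
```

Under these assumptions, such a gene is missed about 90% of the time with 1 million reads, about 37% of the time with 10 million reads, and less than 1% of the time with 50 million reads.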

Protocols

Welcome back!! To make sure we are all ready to go, let's review RNA-seq. In addition to reading the text provided in the Introduction, please watch the Illumina sequencing by synthesis (SBS) video, which shows the process used to acquire the RNA-seq data you are analyzing in this module (linked here). Remember, following RNA purification, the samples were submitted to the BioMicro Center for Illumina sequencing. Illumina's sequencing technology, SBS, performs massively parallel sequencing using a proprietary method that detects single bases as they are incorporated into growing DNA strands.

Part 1: Analyze RNA-seq data

Today you will expand on your analysis of the RNA-seq data gathered from the DLD-1 cell line.

Complete Exercise #3 developed by Amanda Kedaigle, Anne Shen & Prof. Ernest Fraenkel. In this exercise, you will first work through a refresher focused on the clustering methods used in the previous analysis. Then you will use the skills you developed to examine a published dataset (A549 cells treated with etoposide) and compare the results to those collected for DLD-1.
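One simple way to compare two datasets is to ask which genes change in the same direction in both. The sketch below is not necessarily the approach used in Exercise #3: it assumes you already have normalized counts for untreated and etoposide-treated samples, and the gene names, counts, pseudocount of 1, and two-fold (|log2 fold change| >= 1) cutoff are all hypothetical choices for illustration.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 of treated / control; the pseudocount avoids dividing by (or taking the log of) zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def changed_genes(counts, threshold=1.0):
    """Split genes into up- and down-regulated sets.

    counts maps gene -> (control, treated) normalized counts. A gene is called
    changed when |log2 fold change| >= threshold (1.0 = a two-fold change).
    """
    up, down = set(), set()
    for gene, (control, treated) in counts.items():
        lfc = log2_fold_change(treated, control)
        if lfc >= threshold:
            up.add(gene)
        elif lfc <= -threshold:
            down.add(gene)
    return up, down


# Hypothetical normalized counts: gene -> (untreated, etoposide-treated)
dld1 = {"GENE_1": (10, 41), "GENE_2": (50, 12), "GENE_3": (30, 31), "GENE_4": (7, 31)}
a549 = {"GENE_1": (20, 90), "GENE_2": (40, 45), "GENE_3": (25, 5), "GENE_4": (15, 63)}

dld1_up, dld1_down = changed_genes(dld1)
a549_up, a549_down = changed_genes(a549)
shared_up = dld1_up & a549_up  # genes induced in both cell lines
print(sorted(shared_up))  # prints ['GENE_1', 'GENE_4']
```

In this made-up example, GENE_2 and GENE_3 each change in only one cell line, while GENE_1 and GENE_4 are induced in both. Genes that respond consistently across datasets are natural candidates for follow-up measurements.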

Part 2: Select genes for qPCR experiment

Navigation links

Next day: Review qPCR experiment and complete statistical analysis

Previous day: Purify RNA from etoposide-treated cells and generate cDNA