Difference between revisions of "20.109(S22):M2D7"

From Course Wiki
(Navigation links)
(Part 1: Examine targeted gene sequences for regulatory binding sites)
Line 18: Line 18:
 
===Part 1: Examine targeted gene sequences for regulatory binding sites===
 
  
Today you will examine the DNA sequences of the targeted genes to identify features that may impact sgRNA_target binding.  To do this you will first identify the -10 / -35 sites and second identify any transcription factor binding sites (TFBS) within the promoters of ''ldhA'' and ''pta-ack''.  The goal for this exercise is to identify if the CRISPRi system, guided by the sgRNA_target sequences, is inhibiting the binding of the RNAP due to blocking access of σ factor to the -10 / -35 or by blocking access of TF to TFBS.  If so, this could provide information regarding the mechanism of gene regulation observed in the ethanol yield data.  Furthermore, it could provide insight into how the ethanol yield can be improved.  Because you will be collecting a lot of information today for ''ldhA'' and ''pta-ack'', it is important that you keep clear, organized notes in your laboratory notebook. As in the previous exercises, feel free to divide the workload between laboratory partners.
+
Today you will examine the DNA sequences of the genes you targeted to identify features that may impact sgRNA_target binding.  To do this you will first identify the -10 / -35 sites to define the promoter and second, identify any transcription factor binding sites (TFBS) within the promoter or gene sequence of the gene you targeted.  The goal for this exercise is to identify if the CRISPRi system, guided by the sgRNA_target sequences, is interfering with the binding of regulatory proteins to the target site of the gene you selected.  If so, this could provide information regarding the mechanism of gene regulation observed in your ethanol / acetate yield data.  Furthermore, it could provide insight into how the ethanol yield can be improved.   
  
#Use the [http://www.genome.jp/kegg-bin/show_organism?org=eco KEGG Database] to obtain the DNA sequences of the targeted genes (''ldhA'' and ''pta-ack'') in the ''E. coli'' K-12 MG1655 strain.
+
#Use the [http://www.genome.jp/kegg-bin/show_organism?org=eco KEGG Database] to obtain the DNA sequences of the targeted genes in the ''E. coli'' K-12 MG1655 strain.
#*Enter the name of targeted gene in the Search genes box and click Go.
+
#*Enter the name of the gene you targeted in the Search genes box and click Go.
#*Because sgRNA_target molecules were generated that target the promoter, enter 50 in the +upstream box to get the 50 basepair sequence immediately preceding the start codon.
+
#*Because some of the sgRNA_target molecules were generated to target the promoter, enter 50 in the +upstream box to get the 50 basepair sequence immediately preceding the start codon.
 
#*You can also use the GeneSnap file you created previously.
 
 
#First, examine the portion of the sequence that is the promoter and identify the -10 / -35 sites.
 
 
#*[[File:Fa20 M3D4 E. coli promoter.png|right|600px|thumb]]You can do this visually using the image to the right as a guide or you can use [http://www.softberry.com/berry.phtml?topic=bprom&group=programs&subgroup=gfindb BPROM], an online promoter prediction tool.
 
#<font color =  #4a9152 >'''In your laboratory notebook,'''</font color> record if any of the sgRNA_target sequences bind to the -10 / -35 sites in the promoter.
+
#<font color =  #4a9152 >'''In your laboratory notebook,'''</font color> record if your sgRNA_target sequence binds to the predicted -10 / -35 sites in the promoter.
 
#Second, examine the entire sequence and identify TFBS.
 
 
#*Copy the sequence for the targeted gene and paste it into the [http://www.prodoric.de/vfp/vfp_promoter.php Virtual Footprint] interface. Remember to include the upstream sequence!
Line 35: Line 35:
 
#*The top of the screen shows the sequence analyzed with red highlights to indicate possible TFBS.
 
 
#*The table at the bottom of the screen indicates the following from left to right: 1. the TF predicted to bind at the identified binding site (more description on PWM is below), 2. the basepair position of the start of the predicted TFBS, 3. the basepair position of the end of the predicted TFBS, 4. on which strand the sequence is located (+ = nontemplate, - = template), 5. the score associated with the predicted TFBS (more description is below), and 6. the sequence of the predicted TFBS (5' &rarr; 3').
#<font color =  #4a9152 >'''In your laboratory notebook,'''</font color> record if any of the sgRNA_target sequences bind TFBS.
+
#<font color =  #4a9152 >'''In your laboratory notebook,'''</font color> record if your sgRNA_target sequence binds to a predicted TFBS.
#*Be clear as to which sgRNA_target binds to which TFBS.
+
#*Be clear as to which strand the sgRNA_target binds and on which strand the TFBS is located.
#*Also, be mindful of which strand the sgRNA_target binds and which strand the TFBS is located.
+
 
#To validate the predictions from the Virtual Footprint tool, you will next look more closely at what the results mean for the TFBS identified in the previous step.  For demonstration purposes, let's look at the results for a random gene sequence:[[File:Fa20 M3D4 virtual footprint example.png|center|750px|thumb]]
 
#The above screenshot shows a portion of the data for a random gene.  The TFBS that we will investigate further is the site predicted for Lrp binding.  Lrp is predicted to bind the template strand of the sequence entered from basepair 4 to 18.
 

Revision as of 17:51, 11 April 2022

20.109(S22): Laboratory Fundamentals of Biological Engineering



Introduction

Schematic of RNAP, σ factor, promoter complex. A. The core RNAP enzyme is composed of five subunits: β, β', two α units, and ω. When the core RNAP enzyme is associated with a σ factor, the holoenzyme is formed. B. The holoenzyme binds to DNA promoters at the -10 / -35 sites that are specific to the σ factor. C. When bound to DNA, the holoenzyme is oriented such that the β and β' subunits are 'facing' the gene.
In the initial experimental approach used to increase ethanol yield using the CRISPRi system, the previous 109 students considered only a simplistic model for gene expression when designing sgRNA_target sequences: transcription initiation occurs when RNAP binds to the promoter region and elongation proceeds as RNAP reads the DNA sequence to generate a transcript. To inhibit expression, sgRNA_target sequences were designed to block one of these processes using the dCas9 protein as a roadblock. In actuality, regulation of gene expression in bacteria is far more complex.

The core RNAP enzyme is composed of five subunits: β, β', αI, αII, and ω. The β and β' subunits catalyze the formation of phosphodiester bonds in elongating transcripts and form a 'claw' structure that is joined together by the α-dimer. The N-terminal domains of the α-dimer are in contact with the β and β' subunits and the C-terminal domains contain a flexible linker that is able to associate with DNA and / or regulatory proteins. The ω subunit is involved in assembly of RNAP. When this core enzyme is complexed with a σ factor, the holoenzyme is formed. The σ factor is critical in RNAP binding to DNA as it is responsible for promoter recognition and binds to the -10 / -35 sites, thereby recruiting RNAP to the promoter. In bacterial cells, σ factors are classified into two distinct families, with several species also encoding alternative σ factors that regulate specific subsets of genes under specific conditions. When the holoenzyme is bound to a promoter, the association of RNAP + σ factor + DNA is termed the closed complex. Another role of the σ factor is in facilitating the formation of the open complex (also referred to as the transcription bubble) by stabilizing unwound DNA.

In addition to σ factors, transcription factors (TF) are important regulators of gene expression. TF are regulatory proteins that bind to specific sites, called transcription factor binding sites (TFBS), near genes to control transcription. Unlike σ factors, TF are able to promote or repress gene expression depending on the location of the TFBS in the promoter, as represented in the figure below. Each TF regulates a network of genes in response to environmental signals or intracellular cues. This provides a mechanism by which bacterial cells are able to control a suite of genes all involved in the same cellular process or response. To illustrate the importance of TF in bacterial gene regulation, E. coli is predicted to encode genes for ~270 TF. This accounts for 6% of the protein-coding genes in the genome! Though much of the current literature states that transcription factors only bind to the promoters of genes to regulate gene expression, recent research shows that these regulators also bind within the coding regions of gene sequences.

Schematic of transcription factor binding within the promoter.

Protocols

Part 1: Examine targeted gene sequences for regulatory binding sites

Today you will examine the DNA sequences of the genes you targeted to identify features that may impact sgRNA_target binding. To do this you will first identify the -10 / -35 sites to define the promoter and second, identify any transcription factor binding sites (TFBS) within the promoter or gene sequence of the gene you targeted. The goal for this exercise is to identify if the CRISPRi system, guided by the sgRNA_target sequences, is interfering with the binding of regulatory proteins to the target site of the gene you selected. If so, this could provide information regarding the mechanism of gene regulation observed in your ethanol / acetate yield data. Furthermore, it could provide insight into how the ethanol yield can be improved.
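
One way to keep the comparison organized is to treat each feature (the -10 / -35 sites, each predicted TFBS, and the sgRNA_target site) as a start / end position on the sequence you retrieved, and then ask whether two features share any basepairs. The Python sketch below is only an illustration of that bookkeeping. The sequence, the target, and the function names (`reverse_complement`, `find_site`, `overlaps`) are all made up for this example and are not part of KEGG, BPROM, or Virtual Footprint. The strand label it reports only says whether the text matched as entered ("+") or as its reverse complement ("-").

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_site(sequence, site):
    """Locate `site` in `sequence`, checking the entered strand first.

    Returns (start, end, strand) using 1-based inclusive positions on the
    entered sequence, or None if the site is not found on either strand.
    """
    sequence, site = sequence.upper(), site.upper()
    index = sequence.find(site)
    if index != -1:
        return index + 1, index + len(site), "+"
    index = sequence.find(reverse_complement(site))
    if index != -1:
        return index + 1, index + len(site), "-"
    return None


def overlaps(start_a, end_a, start_b, end_b):
    """True if two 1-based inclusive intervals share at least one basepair."""
    return start_a <= end_b and start_b <= end_a


# Hypothetical sequence: 50 bp upstream region followed by the first codons.
example_sequence = (
    "TTGACA"                 # made-up -35 site
    "TATAGCGGCTAGCTAGG"      # made-up 17 bp spacer
    "TATAAT"                 # made-up -10 site
    "GCGCATCGATCGGAGGTACTC"  # remainder of the made-up upstream region
    "ATGAAACTCGCC"           # start codon and first codons
)
# Hypothetical 20 nt sgRNA_target site (not a real 20.109 sequence).
example_target = "GGTATAATGCGCATCGATCG"

target = find_site(example_sequence, example_target)
minus_10 = find_site(example_sequence, "TATAAT")
minus_35 = find_site(example_sequence, "TTGACA")

print("target site:", target)
print("overlaps -10:", overlaps(target[0], target[1], minus_10[0], minus_10[1]))
print("overlaps -35:", overlaps(target[0], target[1], minus_35[0], minus_35[1]))
```

The same `overlaps` check can be reused with the start and end positions reported in the Virtual Footprint results table, as long as both sets of positions refer to the same pasted sequence.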

  1. Use the KEGG Database to obtain the DNA sequences of the targeted genes in the E. coli K-12 MG1655 strain.
    • Enter the name of the gene you targeted in the Search genes box and click Go.
    • Because some of the sgRNA_target molecules were generated to target the promoter, enter 50 in the +upstream box to get the 50 basepair sequence immediately preceding the start codon.
    • You can also use the GeneSnap file you created previously.
  2. First, examine the portion of the sequence that is the promoter and identify the -10 / -35 sites.
    • Fa20 M3D4 E. coli promoter.png
      You can do this visually using the image to the right as a guide or you can use BPROM, an online promoter prediction tool.
  3. In your laboratory notebook, record if your sgRNA_target sequence binds to the predicted -10 / -35 sites in the promoter.
  4. Second, examine the entire sequence and identify TFBS.
    • Copy the sequence for the targeted gene and paste it into the Virtual Footprint interface. Remember to include the upstream sequence!
    • In the section labeled 'Step 1: Input DNA Sequence...', insert the sequence from KEGG into the 'Paste Sequence' box.
  5. In the section labeled 'Step 2: Select Pattern and Start Searching...' for the Position Weight Matrix click the 'Select All' button.
  6. Click the 'Start' button.
  7. A new browser tab will open with the results.
    • The top of the screen shows the sequence analyzed with red highlights to indicate possible TFBS.
    • The table at the bottom of the screen indicates the following from left to right: 1. the TF predicted to bind at the identified binding site (more description on PWM is below), 2. the basepair position of the start of the predicted TFBS, 3. the basepair position of the end of the predicted TFBS, 4. on which strand the sequence is located (+ = nontemplate, - = template), 5. the score associated with the predicted TFBS (more description is below), and 6. the sequence of the predicted TFBS (5' → 3').
  8. In your laboratory notebook, record if your sgRNA_target sequence binds to a predicted TFBS.
    • Be clear as to which strand the sgRNA_target binds and on which strand the TFBS is located.
  9. To validate the predictions from the Virtual Footprint tool, you will next look more closely at what the results mean for the TFBS identified in the previous step. For demonstration purposes, let's look at the results for a random gene sequence:
    Fa20 M3D4 virtual footprint example.png
  10. The above screenshot shows a portion of the data for a random gene. The TFBS that we will investigate further is the site predicted for Lrp binding. Lrp is predicted to bind the template strand of the sequence entered from basepair 4 to 18.
  11. To begin, PWM refers to the position weight matrix. The PWM is used to represent the similarity to a DNA pattern and is constructed using the alignments of known binding sites.
    • The PWM for the TFBS can be found by returning to the Virtual Footprint interface.
      Fa20 M3D4 PWM example.png
    • The species category of the PWM indicates the species in which the consensus sequence was identified. This is important to note; however, binding sites are likely to be conserved among species, so consider any appropriate transcription factor regardless of the species listed.
    • In the section labeled 'Step 2: Select Pattern and Start Searching...' select the TF in the 'Position Weight Matrices' box, then click the 'Show Weight Matrix' button. A separate window titled 'Position Weight Matrix Information' will open.
    • The PWM for Lrp is shown to the right (this is located at the bottom of the window that opens when you complete the above step). The size of each letter indicates the level of conservation at that base across the aligned sequences used to generate the PWM.
    • The genus / species information indicates from which organism the sequences were obtained for the alignment. It is not critical that the PWM was generated with sequences from the organism you analyzed.
  12. The PWM is used to generate a score for each predicted TFBS in the sequence submitted for analysis.
    Fa20 PWM table example.png
    • A 'Position Weight Matrix' table for the Lrp PWM is shown to the right (this is located at the top of the window with the PWM).
    • This table contains values that indicate the level of conservation at each base across the aligned sequences used to generate the PWM.
    • For the Lrp TFBS predicted in the random sequence used for this example, the first base is T. According to the PWM table, the numerical value for a T at position 1 of the TFBS = 0.01. If we looked up the value for every base in the 15 basepair TFBS sequence and added the values together, the score would equal 5.18. This corresponds to the score provided in the results from the initial Virtual Footprint analysis (see image at Step #9).
  13. In your laboratory notebook, complete the following for each TFBS identified in Step #8:
    • Research the functional role of the TF that binds at that site. This can simply be a Google search that identifies the environmental conditions or intracellular cues under which it regulates gene expression.
    • Include a screenshot of the PWM. Comparing the predicted TFBS sequence identified in your gene to the PWM for that TFBS, are you confident that this is a true binding site for the TF?
    • Include a screenshot of the PWM table. Given the maximum score possible for this TFBS (calculate by adding the highest values for each base across the sequence), are you confident that this is a true binding site for the TF?
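
The scoring arithmetic described in Steps #12 and #13 can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the steps above describe) that a site's score is the sum of the table values for its bases and that the maximum score is the sum of the highest value at each position. The 4-position matrix below is made up for the example and is not the Lrp table from Virtual Footprint. The names `example_pwm`, `site_score`, and `max_score` are likewise hypothetical.

```python
# Hypothetical 4-position weight matrix: one dictionary per position in the
# binding site. The values are invented for illustration only.
example_pwm = [
    {"A": 0.10, "C": 0.05, "G": 0.02, "T": 0.60},
    {"A": 0.50, "C": 0.10, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.40, "T": 0.20},
    {"A": 0.30, "C": 0.30, "G": 0.01, "T": 0.10},
]


def site_score(pwm, site):
    """Sum the table value for the base found at each position of `site`.

    Raises ValueError if the site length does not match the matrix, and
    KeyError if the site contains a character other than A, C, G, or T.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of PWM positions")
    return sum(column[base] for column, base in zip(pwm, site.upper()))


def max_score(pwm):
    """Sum the highest value at each position (the best possible site)."""
    return sum(max(column.values()) for column in pwm)


predicted_site = "TATC"  # made-up predicted TFBS
score = site_score(example_pwm, predicted_site)
best = max_score(example_pwm)
print(f"score = {score:.2f}, maximum = {best:.2f}, fraction = {score / best:.2f}")
```

Comparing a predicted site's score to the maximum possible score gives a rough sense of how closely the site matches the most conserved bases, which is the judgment Step #13 asks you to make from the PWM table.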

Navigation links

Next day: Organize figures and outline text for Research article

Previous day: Complete CRISPRi experiment and measure fermentation products