Schematic of RNAP, σ factor, promoter complex.
A. The core RNAP enzyme is composed of five subunits: β, β', two α units, and ω. When the core RNAP enzyme is associated with a σ factor, the holoenzyme is formed. B. The holoenzyme binds to DNA promoters at the -10 / -35 sites that are specific to the σ factor. C. When bound to DNA, the holoenzyme is oriented such that the β and β' subunits are 'facing' the gene.
In the initial experimental approach used to increase ethanol yield with the CRISPRi system, the previous 109 students considered only a simplistic model of gene expression when designing sgRNA_target sequences: transcription initiation occurs when RNAP binds to the promoter region, and elongation proceeds as RNAP reads the DNA sequence to generate a transcript. To inhibit expression, sgRNA_target sequences were designed to block one of these processes using the dCas9 protein as a roadblock. In actuality, regulation of gene expression in bacteria is far more complex.
The core RNAP enzyme is composed of five subunits: β, β', αI, αII, and ω. The β and β' subunits catalyze the formation of phosphodiester bonds in elongating transcripts and form a 'claw' structure that is joined together by the α-dimer. The N-terminal domains of the α-dimer are in contact with the β and β' subunits and the C-terminal domains contain a flexible linker that is able to associate with DNA and / or regulatory proteins. The ω subunit is involved in assembly of RNAP. When this core enzyme is complexed with a σ factor, the holoenzyme is formed. The σ factor is critical in RNAP binding to DNA as it is responsible for promoter recognition and binds to the -10 / -35 sites, thereby recruiting RNAP to the promoter. In bacterial cells, σ factors are classified into two distinct families, with several species also encoding alternate σ factors that regulate specific subsets of genes under specific conditions. When the holoenzyme is bound to a promoter, the association of RNAP + σ factor + DNA is termed the closed complex. Another role of the σ factor is in facilitating the formation of the open complex (also referred to as the transcription bubble) by stabilizing unwound DNA.
In addition to σ factors, transcription factors (TF) are important regulators of gene expression. TF are regulatory proteins that bind to specific sites, called transcription factor binding sites (TFBS), near genes to control transcription. Unlike σ factors, TF are able to promote or repress gene expression depending on the location of the TFBS in the promoter, as represented in the figure below. Each TF regulates a network of genes in response to environmental signals or intracellular cues. This provides a mechanism by which bacterial cells are able to control a suite of genes all involved in the same cellular process or response. To illustrate the importance of TF in bacterial gene regulation, E. coli is predicted to encode genes for ~270 TF. This accounts for 6% of the protein-coding genes in the genome! Though much of the current literature states that transcription factors only bind to the promoters of genes to regulate gene expression, recent research shows that these regulators also bind within the coding regions of gene sequences.
Schematic of transcription factor binding within the promoter.
Part 1: Examine targeted gene sequences for regulatory binding sites
Today you will examine the DNA sequences of the targeted genes to identify features that may impact sgRNA_target binding. To do this you will first identify the -10 / -35 sites and then identify any transcription factor binding sites (TFBS) within the promoters of ldhA and pta-ack. The goal of this exercise is to determine whether the CRISPRi system, guided by the sgRNA_target sequences, inhibits RNAP binding by blocking access of the σ factor to the -10 / -35 sites or by blocking access of TF to TFBS. If so, this could provide information regarding the mechanism of gene regulation observed in the ethanol yield data. Furthermore, it could provide insight into how the ethanol yield can be improved. Because you will be collecting a lot of information today for ldhA and pta-ack, it is important that you keep clear, organized notes in your laboratory notebook. As in the previous exercises, feel free to divide the workload between laboratory partners.
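One way to organize the overlap check described above is to record each sgRNA_target and each regulatory site as a coordinate range on the analyzed sequence and compare the ranges. The following is a minimal Python sketch and is not part of the course tools. The class name, function names, site labels, and every coordinate are hypothetical placeholders; replace them with the positions you record for ldhA and pta-ack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A stretch of the analyzed sequence (hypothetical helper type)."""
    name: str
    start: int   # 1-based position of the first basepair, inclusive
    end: int     # 1-based position of the last basepair, inclusive
    strand: str  # "+" = nontemplate strand, "-" = template strand


def overlap_bp(a: Region, b: Region) -> int:
    """Return how many basepair positions the two regions share."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_overlaps(guides, sites):
    """List (guide, site, shared bp, same strand?) for every overlapping pair."""
    found = []
    for guide in guides:
        for site in sites:
            shared = overlap_bp(guide, site)
            if shared > 0:
                found.append((guide.name, site.name, shared,
                              guide.strand == site.strand))
    return found


# Hypothetical coordinates for illustration only.
guides = [
    Region("sgRNA_example_1", 10, 29, "-"),
    Region("sgRNA_example_2", 60, 79, "+"),
]
sites = [
    Region("-35 site", 5, 10, "+"),
    Region("-10 site", 28, 33, "+"),
    Region("TFBS_example", 40, 54, "-"),
]

hits = find_overlaps(guides, sites)
for guide_name, site_name, shared, same_strand in hits:
    print(f"{guide_name} overlaps {site_name} by {shared} bp "
          f"(same strand: {same_strand})")
```

Recording the strand alongside each range keeps the strand comparison explicit, which matters when you note whether a guide and a binding site sit on the same strand.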
- Use the KEGG Database to obtain the DNA sequences of the targeted genes (ldhA and pta-ack) in the E. coli K-12 MG1655 strain.
- Enter the name of the targeted gene in the Search genes box and click Go.
- Because sgRNA_target molecules were generated that target the promoter, enter 50 in the +upstream box to get the 50 basepair sequence immediately preceding the start codon.
- You can also use the GeneSnap file you created previously.
- First, examine the portion of the sequence that is the promoter and identify the -10 / -35 sites.
- You can do this visually using the image to the right as a guide or you can use BPROM, an online promoter prediction tool.
- In your laboratory notebook, record if any of the sgRNA_target sequences bind to the -10 / -35 sites in the promoter.
- Second, examine the entire sequence and identify TFBS.
- Copy the sequence for the targeted gene and paste it into the Virtual Footprint interface. Remember to include the upstream sequence!
- In the section labeled 'Step 1: Input DNA Sequence...', insert the sequence from KEGG into the 'Paste Sequence' box.
- In the section labeled 'Step 2: Select Pattern and Start Searching...' for the Position Weight Matrix click the 'Select All' button.
- Click the 'Start' button.
- A new browser tab will open with the results.
- The top of the screen shows the sequence analyzed with red highlights to indicate possible TFBS.
- The table at the bottom of the screen indicates the following from left to right: 1. the TF predicted to bind at the identified binding site (more description on PWM is below), 2. the basepair position of the start of the predicted TFBS, 3. the basepair position of the end of the predicted TFBS, 4. on which strand the sequence is located (+ = nontemplate, - = template), 5. the score associated with the predicted TFBS (more description is below), and 6. the sequence of the predicted TFBS (5' → 3').
- In your laboratory notebook, record if any of the sgRNA_target sequences bind TFBS.
- Be clear as to which sgRNA_target binds to which TFBS.
- Also, be mindful of which strand the sgRNA_target binds and on which strand the TFBS is located.
- To validate the predictions from the Virtual Footprint tool, you will next look more closely at what the results mean for the TFBS identified in the previous step. For demonstration purposes, let's look at the results for a random gene sequence:
- The above screenshot shows a portion of the data for a random gene. The TFBS that we will investigate further is the site predicted for Lrp binding. Lrp is predicted to bind the template strand of the sequence entered from basepair 4 to 18.
- To begin, PWM refers to the position weight matrix. The PWM is used to represent the similarity to a DNA pattern and is constructed using the alignments of known binding sites.
- The PWM for the TFBS can be found by returning to the Virtual Footprint interface.
- The species category of the PWM indicates the species in which the consensus sequence was identified. This is important to note; however, there is likely to be conservation among species, so consider any appropriate transcription factor regardless of the species listed.
- In the section labeled 'Step 2: Select Pattern and Start Searching...' select the TF in the 'Position Weight Matrices' box, then click the 'Show Weight Matrix' button. A separate window titled 'Position Weight Matrix Information' will open.
- The PWM for Lrp is shown to the right (this is located at the bottom of the window that opens when you complete the above step). The size of each letter indicates the level of conservation at that base across the aligned sequences used to generate the PWM.
- The genus / species information indicates from which organism the sequences were obtained for the alignment. It is not critical that the PWM was generated with sequences from the organism you analyzed.
- The PWM is used to generate a score for each predicted TFBS in the sequence submitted for analysis.
- A 'Position Weight Matrix' table for the Lrp PWM is shown to the right (this is located at the top of the window with the PWM).
- This table contains values that indicate the level of conservation at each base across the aligned sequences used to generate the PWM.
- For the Lrp TFBS predicted in the random sequence used for this example, the first base is T. According to the PWM table, the numerical value for a T at position 1 of the TFBS = 0.01. If we looked up the value for every base in the 15-basepair TFBS sequence and added the values together, the score would equal 5.18. This corresponds to the score provided in the results from the initial Virtual Footprint analysis (see image at Step #9).
- In your laboratory notebook, complete the following for each TFBS identified in Step #8:
- Research the functional role of the TF that binds at that site. This can simply be a Google search that identifies the environmental conditions or intracellular cues under which it regulates gene expression.
- Include a screenshot of the PWM. Comparing the predicted TFBS sequence identified in your gene to the PWM for that TFBS, are you confident that this is a true binding site for the TF?
- Include a screenshot of the PWM table. Given the maximum score possible for this TFBS (calculate by adding the highest values for each base across the sequence), are you confident that this is a true binding site for the TF?
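The score comparison in the steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the Virtual Footprint implementation: it follows the description in this exercise, where the score is the sum of the table values for the base found at each position and the maximum score is the sum of the highest value at each position. The 4-position table, the example site, and the function names are hypothetical; they are not the Lrp matrix.

```python
# Hypothetical 4-position weight table (one dict per TFBS position).
# These values are made up for illustration and are NOT the Lrp PWM.
TOY_PWM = [
    {"A": 0.10, "C": 0.05, "G": 0.05, "T": 0.80},
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.05, "C": 0.05, "G": 0.60, "T": 0.30},
]


def site_score(site, pwm):
    """Add up the table value of the observed base at each position."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of PWM positions")
    return sum(column[base] for base, column in zip(site.upper(), pwm))


def max_score(pwm):
    """Add up the highest table value at each position."""
    return sum(max(column.values()) for column in pwm)


predicted_site = "TACT"  # hypothetical predicted TFBS sequence
observed = site_score(predicted_site, TOY_PWM)
best = max_score(TOY_PWM)

print(f"score of {predicted_site}: {observed:.2f}")
print(f"maximum possible score: {best:.2f}")
print(f"fraction of maximum: {observed / best:.2f}")
```

Expressing the observed score as a fraction of the maximum gives one number to record in your notebook, although how close to the maximum a site must be to count as convincing remains a judgment call.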
Part 2: Discuss your Research proposal with peers
To help you focus your ideas and develop the details of your project, you will discuss the project description you submitted today with a classmate from another group. As you listen to your classmate's idea, consider the following criteria proposed for small research project grants by the NIH:
Small Research Grant Program: ...small grant supports discrete, well-defined projects that realistically can be completed in two years and that require limited levels of funding. Because the research project usually is limited, the grant application may not contain extensive detail or discussion. Accordingly, reviewers should evaluate the conceptual framework and general approach to the problem. Appropriate justification for the proposed work can be provided through literature citations, data from other sources, or from investigator-generated data. Preliminary data are not required, particularly in applications proposing pilot or feasibility studies.
Use the following exercise to guide your discussion as you consider both your and your classmate's project. Because you are still in the early stages of developing your research topic, it is okay if you do not have all of the answers to the following questions. This is meant to help you critically think about your proposal...not to point out the additional research you need to complete! Furthermore, this is an informal conversation and you should feel free to look up information during this exercise or just make notes so you know what to research later with your co-investigator.
Outline of the peer review exercise:
- Find the partner you were assigned by the teaching faculty and begin by deciding which partner will present first.
- As the presenter, focus on why you believe your topic is important and provide the context needed to convince your listener that it is indeed worth pursuing.
- As the listener, verbally summarize the topic back to the presenter to ensure you understood the proposal. Are you convinced that the topic is important? Why or why not? Discuss this with the presenter.
- Now that you have the needed background information, discuss the 'Hows' of the project.
- As the presenter, consider the questions below as you give some details about your proposal.
- As the listener, feel free to ask questions and maybe provide some helpful feedback as the presenter discusses the details of their project.
Questions to guide your discussion:
- What is the novel aspect of your proposal?
- Why do you believe your project is feasible?
- Is there evidence that supports your proposal?
- How will your research advance the field?
- How will you complete your research (what methods might you use)?
- What is the expected result?
- How might you 'double-check' or confirm an expected result?
- What if you do not get the expected result?
- What can be learned if you get the expected result? If you get an unexpected result?
- What are some alternative approaches (methods) for your proposed research?
Once you have completed discussing the presenter's project, switch roles and complete this exercise with the listener's project.
In your laboratory notebook, complete the following:
- Record the notes from your discussions with your peers.
- What insights regarding your own project were provided by your peers?
- What did you find interesting about your peers' projects?
Part 3: Consider societal implications of your Research proposal
Thus far we have focused on the experiments that you are designing as part of the Research proposal presentation; however, another important aspect of this assignment is defining the societal impacts of your proposed work. In recent years, funding agencies have placed increased emphasis on societal impact when reviewing grant applications and the public has been more critical of the use of government funds toward research that does not benefit the population. This sentiment is expressed in a recently published editorial:
"...research funding agencies will no longer be satisfied with claims that our research has impact merely because we use it in training of our students, because it is well-cited by other academics, or because it is published in reputable journals...it seems reasonable that at least some discernible societal value should emerge from research." (Davison and Bjorn-Anderson. Info Systems J. 2019;29:989-993)
Read and discuss the following perspective with your co-investigator:
Frodeman and Holbrook. "Science's social effects." Issues in Science and Technology. Vol. XXIII, No. 3, Spring 2007.
In your laboratory notebook, complete the following with your partner:
- What are the societal implications of your proposed research?
- Which populations benefit from your research? Is it possible any populations will be negatively impacted by your research?
- Will cost limitations impede / bias which populations are benefited?
- Is your research applicable to all populations?
- What are the ethical implications of your proposed research?
- Will you use harmful substances in your experiments?
- Will you generate potentially harmful materials in your research?
- Will you collect personal information as part of your research?
- If sampling / testing the population, how will you select the subject pool?
- Is equal representation across gender / race / sex / ethnicity / age important? Why or why not?
- If using animals, how will you determine if experiments are necessary?
- How will you ensure humane treatment of animals during experiments?