20.109(S17):Complete data analysis (Day6)

20.109(S17): Laboratory Fundamentals of Biological Engineering


Introduction

Today is the culminating day of Module 1! Hopefully you will identify 'hits' from your SMM, that is, ligands that are able to bind FKBP12. Though you may be able to see by eye which spots appear to emit more fluorescence, it is important to complete a quantitative analysis that supports your observations.

During our previous laboratory session, you used a microarray scanner to read the fluorescence signals on the surface of the SMM at two excitation wavelengths. As noted previously, the 532 nm wavelength was used to excite fluorescein, which was printed in an 'X' pattern to assist with alignment. The 635 nm wavelength was used to excite the fluorophore-conjugated anti-His antibody, which should only bind FKBP12. A hit is a spot on the slide that emits a fluorescence signal significantly higher than the background fluorescence level. In terms of protein binding, a hit indicates that the FKBP12 protein is bound to a compound and that the antibody is therefore localized to that position on the slide. You will analyze the fluorescence emission data collected by the microarray scanner using two quantitative approaches: a robust z score and a p-value. Both calculations were explained in detail by Shelby Doyle during the M1D5 lecture.

Sp17 20.109 M1D6 spots.png

When the Koehler Lab prepared each printed glass slide, the microarrayer also produced a GAL file, or GenePix Array List, which can be viewed using Excel. The GAL file contains information about where each spot was printed and what compound was printed there. However, the positions listed in the GAL file only roughly match where the print head actually contacted the slide. We will therefore use the fluorescein guide spots to align the array in the GAL file to the true print location for each pin. For this alignment, we'll use a tool provided by the Chemical Genetics Section at the NCI. This tool searches the scanned image for the guide spots and attempts to rotate, translate, and scale the array to best match the observed fluorescence. Following the alignment, we will compare the fluorescence at 635 nm within the deposition region of each spot (foreground) to the fluorescence immediately outside of this region, where nothing was printed (background). We'll use these values to calculate a robust z score. From the robust z score, we can determine the associated probability that the observed fluorescence occurred by chance (p-value). If this probability is sufficiently low, we call the compound a hit.

(Written with assistance from Rob Wilson.)
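
The last step described above, going from a robust z score to a p-value, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not necessarily the calculation presented in lecture: it assumes that robust z scores of non-binding compounds follow a standard normal distribution, and that only unusually high fluorescence counts as evidence of binding (a one-sided test). The function name upper_tail_p is hypothetical.

```python
import math


def upper_tail_p(z):
    """One-sided p-value: probability that a standard normal value is >= z.

    Assumption: robust z scores of non-binders are approximately standard
    normal, and only high fluorescence is of interest.
    """
    return 0.5 * math.erfc(z / math.sqrt(2))


# A few illustrative z scores (made-up values, not real data).
for z in (0.0, 1.96, 3.0):
    print("z = {:.2f} -> p = {:.5f}".format(z, upper_tail_p(z)))
```

A larger z score gives a smaller p-value, so choosing a p-value cutoff is equivalent to choosing a z score threshold.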

Protocols

Part 1: Align the array and quantify spot fluorescence

The SMM alignment tool is provided on the 20.109 laboratory computers. If you feel comfortable working with a Python development environment, we can also make the source code available to download. Running it yourself requires Python 3.5 and several third-party libraries, all of which are included in Anaconda 4.2.0.

  1. Download the .gal files that correspond to the barcodes on your slides from the Discussion page associated with the Module 1 homepage.
  2. Download this software package, developed by Rob Wilson and provided courtesy of the NCI Chemical Genetics Section, to the desktop of your 20.109 computer.
  3. Open a Terminal window from Finder/Applications. At the prompt, type python ~/Desktop/SMMAnalysisTool.zip and press Enter to execute the code.
    • Note: a recurring bug may prevent the menu bar from responding. If this occurs, click out of the window then click back into the window.
  4. Open the .tiff file for one of your slides by selecting File → Load TIFF.
    • You can change the wavelength channel that is displayed using the drop-down menu labeled 'Display'.
    • You can adjust the background fluorescence signal by moving the top slider rightward.
    • You can saturate the foreground fluorescence signal by moving the bottom slider leftward.
  5. Visually inspect the slide image and note all observations concerning flaws (damaged spots), obvious hits, residual fluorescence, etc.
  6. Open the .gal file corresponding to the barcode on the slide by selecting File → Load GAL.
    • Hover your cursor over the interesting spots observed to see which compounds were printed at these locations.
  7. In the box labeled 'Guide Name' at the left of the window, type "Sentinel" in the field for 532.
    • Leave the field for 635 blank as we do not use guide spots in this channel.
    • You should observe spots in the array turn green in an X pattern (the pattern in which the fluorescein spots were printed).
  8. Click the 'Align All' button at the lower left of the window to align the entire array to the nearest observed guide spots.
    • Confirm that the alignment is reasonable throughout the entire slide.
    • Flaws in the slide may disrupt the alignment and negatively affect the quantification.
  9. Click the 'Align Each' button at the lower left of the window to align each subarray to the nearest guide spots.
    • Confirm that the alignment is reasonable for each block.
    • If the alignment is not reasonable, or can be improved upon, you should drag and drop the spot outlines manually.
    • To return to the original array, click the 'Reset' button at the lower left of the window.
  10. Select File → Save Image to save a picture of your slide.
    • This saves the current channel exactly as it is displayed in PNG format.
  11. Calculate and save the fluorescence of the foregrounds and backgrounds of each spot.
    • Select File → Save Spot Intensities.
    • This will output a CSV file.
  12. Repeat Steps #4-11 for each of your slides.
    • Note: the GAL file for each slide is specific to that slide and you will therefore need a different GAL file for each slide.
Sp17 20.109 M1D6 array window.png
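
Step 11 reduces each spot to foreground and background statistics. As a rough illustration of the signal-to-noise ratio that Part 2 relies on (mean foreground minus mean background, divided by the background standard deviation), here is a minimal Python sketch. The function name spot_snr and the pixel values are hypothetical, and the alignment tool's own implementation may differ, for example in how pixels are assigned to each region or in whether it uses a sample or population standard deviation.

```python
from statistics import mean, stdev


def spot_snr(foreground, background):
    """Signal-to-noise ratio of one spot from lists of pixel intensities.

    Assumption: the sample standard deviation is used for the background;
    the actual tool may compute this differently.
    """
    sigma = stdev(background)
    if sigma == 0:
        raise ValueError("background has zero variance; SNR is undefined")
    return (mean(foreground) - mean(background)) / sigma


# Made-up pixel intensities for a single bright spot.
fg_pixels = [520, 540, 510, 530]
bg_pixels = [100, 110, 90, 100]
snr = spot_snr(fg_pixels, bg_pixels)
print("SNR = {:.2f}".format(snr))
```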

Part 2: Calculate robust z scores and call hits

The output CSV file you created in Part 1 stores the information for each spot on your SMM. Each spot contains one compound, and each compound was printed at multiple spot locations. Ultimately, we are interested in the summary information for each compound that was printed on your slides. To analyze the summary information, we will use a Pivot Table. A Pivot Table is a data summarization tool that can sort, count, total, or average the data stored in a table or spreadsheet. This tool allows researchers to quickly highlight and manipulate the information they need within more complex spreadsheets. You will use a Pivot Table to calculate the z scores and p-values for your data.

The directions below are written for use with the 20.109 laboratory computers. If you use your own computer, the directions may be slightly different, especially if you use a PC. Please ask if you need assistance!

  1. Open the output data CSV file for one of your slides in Excel.
    • There are several columns, but don't worry; we are only interested in the one labeled 'SNR 635'.
    • This value represents the signal-to-noise ratio calculated in the 635 nm channel and is defined by SNR = (μ_foreground - μ_background) / σ_background, where μ is the mean and σ is the standard deviation.
      Sp17 20.109 M1D6 PivotTable window v2.png
  2. Select Data → PivotTable to summarize the data and create a PivotTable in a new worksheet.
    • Be sure that a cell within your spreadsheet is highlighted (any cell will work).
    • A new Sheet will be created in your spreadsheet and the 'PivotTable Builder' window should appear.
  3. To populate your PivotTable complete the following:
    • From the 'Field Name' box, select 'Name' and drag it into the 'Row Labels' box.
    • From the 'Field Name' box, select 'SNR 635' and drag it into the 'Values' box.
      • Excel will default to 'Sum' of SNR635. Change to 'Average...' by clicking the i then selecting Average from the 'Summarize by:' options.
    • Again, from the 'Field Name' box, select 'SNR 635' and drag it into the 'Values' box.
      • Change to 'StdDev...' by clicking the i then selecting StdDev from the 'Summarize by:' options.
    • Be sure your 'PivotTable Builder' window looks like the image to the right.
  4. The PivotTable should be populated as you complete the items in Step #3 and appear similar to the image below. When you are done, close the PivotTable Builder window.
    Sp17 20.109 M1D6 PivotTable example.png
  5. Calculate the median absolute deviation, defined as MAD = median( |x_i - median(x)| ).
    • Enter "=MEDIAN(ABS(data-MEDIAN(data)))" where data is the data range in the 'Average of SNR 635' column.
    • After you enter the equation, click within the fx field and use the key stroke 'COMMAND + SHIFT + ENTER' to input the array formula.
  6. Calculate the robust z score for each compound, defined as Z = (x_i - median(x)) / (1.4826 × MAD).
    • Enter "=(cell-MEDIAN($data))/(1.4826*$mad)", where cell is the cell containing the 'Average of SNR 635' for the compound, $data is the data range in the 'Average of SNR 635' column, and $mad is the cell that contains the calculated MAD. Write $data and $mad as absolute references (with $ signs) so that they stay fixed when the formula is copied to other rows.
    • Time-saving tip: After you calculate the z score of the first compound, double-click on the small black box at the bottom right of its cell; it will automatically expand the formula to all available rows!
  7. Sort your compounds by robust z score.
    • What threshold will you choose to label a small molecule "a hit", i.e. an FKBP12 binder?
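
For those who prefer scripting to spreadsheets, Steps 2-7 above can be sketched in Python using only the standard library. This is a minimal illustration, not the course's required workflow. It assumes the CSV has the 'Name' and 'SNR 635' columns described above; the function name, compound names, and values are made up. The scale factor is exposed as a parameter so it can be set to whatever value your protocol specifies; the default here, 1.4826, is the usual constant that makes the MAD comparable to a standard deviation for normally distributed data. You may also want to drop guide-spot or empty rows before running it on a real file, which this sketch does not do.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median

# Usual constant that makes the MAD comparable to a standard deviation for
# normally distributed data; override it if your protocol specifies another.
MAD_SCALE = 1.4826


def robust_z_by_compound(rows, scale=MAD_SCALE):
    """Average 'SNR 635' per 'Name', then return {name: robust z score}."""
    snr_by_name = defaultdict(list)
    for row in rows:
        snr_by_name[row["Name"]].append(float(row["SNR 635"]))

    # Equivalent of the PivotTable 'Average of SNR 635' column.
    averages = {name: mean(values) for name, values in snr_by_name.items()}

    center = median(averages.values())
    mad = median(abs(value - center) for value in averages.values())
    if mad == 0:
        raise ValueError("MAD is zero; robust z scores are undefined")

    return {name: (value - center) / (scale * mad)
            for name, value in averages.items()}


# Made-up data standing in for a saved spot-intensity CSV. For a real file,
# open it (the file name is yours to choose) and pass csv.DictReader(f).
example_csv = """Name,SNR 635
cmpd_A,1.0
cmpd_A,3.0
cmpd_B,2.5
cmpd_B,3.5
cmpd_C,4.0
cmpd_C,4.0
cmpd_D,20.0
cmpd_D,22.0
cmpd_E,1.0
cmpd_E,1.0
"""

z_scores = robust_z_by_compound(csv.DictReader(io.StringIO(example_csv)))

# Sort compounds from highest to lowest robust z score (Step 7).
ranked = sorted(z_scores.items(), key=lambda item: item[1], reverse=True)
for name, z in ranked:
    print("{}\t{:.2f}".format(name, z))
```

In this toy data set, one compound has a much higher average SNR than the rest, so it ends up at the top of the ranking; the median and MAD are barely affected by that single outlier, which is why the robust z score is preferred over a mean-and-standard-deviation z score here.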

Part 3: Visualize your hits and post them on the class wiki

  1. Open ChemDraw on the 20.109 laptop computer.
  2. Copy the SMILES of your first hit (from column A of the Excel pivot table).
  3. In ChemDraw, go to Edit/ Paste Special... and choose SMILES.
    • You can now visualize the chemical structure of your hit with high resolution, and save this image for your M1 Data Summary.
  4. Repeat for all hits.
  5. Finally, share the SMILES of all of your hits with the class on the M1 main Discussion page.
    • At the top right, click on the Edit tab, and paste your SMILES underneath your team name.
  6. Email your Excel worksheet to Rob (rwilson at mit).

Thank you!

Navigation links

Next day: Identify chemical structures common to FKBP12 binders

Previous day: Scan slides to identify FKBP12 binders