<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Astrolabe Diagnostics, Inc.</title>
    <description>The latest articles on DEV Community by Astrolabe Diagnostics, Inc. (@astrolabediag).</description>
    <link>https://dev.to/astrolabediag</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F313820%2F837ecd67-46b3-4f72-9c11-6ecc187c8439.jpg</url>
      <title>DEV Community: Astrolabe Diagnostics, Inc.</title>
      <link>https://dev.to/astrolabediag</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/astrolabediag"/>
    <language>en</language>
    <item>
      <title>How to reverse engineer a heat map into its underlying values</title>
      <dc:creator>Astrolabe Diagnostics, Inc.</dc:creator>
      <pubDate>Fri, 10 Jan 2020 15:04:55 +0000</pubDate>
      <link>https://dev.to/astrolabediag/how-to-reverse-engineer-a-heat-map-into-its-underlying-values-1gm0</link>
      <guid>https://dev.to/astrolabediag/how-to-reverse-engineer-a-heat-map-into-its-underlying-values-1gm0</guid>
      <description>&lt;p&gt;Astrolabe Diagnostics is a fully bootstrapped five-person biotech startup. We offer the &lt;a href="http://www.antibodystainingdataset.com/"&gt;Antibody Staining Data Set&lt;/a&gt; (ASDS), a free service that helps immunologists find out the expression of different molecules (markers) across subsets in the immune system. Essentially, the ASDS is a big table of numbers, where every row is a subset and every column a marker. Recently, the Sean Bendall lab at Stanford released &lt;a href="https://www.biorxiv.org/content/10.1101/801530v1"&gt;the preprint of a similar study&lt;/a&gt;, where they measured markers for four of the subsets that the ASDS covered. Since the two studies used different techniques for their measurements, I was curious to examine the correlation between their results. However, the preprint did not include any of the actual data. The closest thing was Figure 1D, a heat map of 98 of the markers measured in the study:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rNrGBTFG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/astrolabediagnostics/reverse-engineer-heat-map/blob/master/figure_1d.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rNrGBTFG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/astrolabediagnostics/reverse-engineer-heat-map/blob/master/figure_1d.png%3Fraw%3Dtrue" alt="Figure 1D from Glass et al."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I decided to take the heat map image and "reverse engineer" it into the underlying values. Specifically, I needed the "Median scaled expression" values referred to in the figure's color legend. Since I could not find any existing packages or write-ups for doing this easily, I decided to hack a solution (check out the code and the PNG and CSV files at the &lt;a href="https://github.com/astrolabediagnostics/reverse-engineer-heat-map"&gt;GitHub repository&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;First, I manually entered the marker names from the X-axis into a spreadsheet. Then, I cropped the above image, removing the legends, axes, and the top heat map row which includes an aggregate statistic not relevant to this exercise.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7atXDni0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/astrolabediagnostics/reverse-engineer-heat-map/blob/master/heat_map.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7atXDni0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/astrolabediagnostics/reverse-engineer-heat-map/blob/master/heat_map.png%3Fraw%3Dtrue" alt="Crop of just the heat map image itself from the figure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I loaded the image into R using the &lt;a href="https://www.rdocumentation.org/packages/png/versions/0.1-7/topics/readPNG"&gt;readPNG&lt;/a&gt; function from the &lt;a href="https://cran.r-project.org/web/packages/png/index.html"&gt;png&lt;/a&gt; package. This results in a three-dimensional array where the first dimension is the pixel row (Y), the second is the pixel column (X), and the third holds the RGB channels. The X axis maps to the markers and the Y axis maps to the four subsets ("Transitional", "Naive", "Non-switched", and "Switched"), and I wanted a single pixel value for each (Subset, Marker) combination. Deciding on the row for each subset was easy enough: I opened the image in GIMP and picked rows 50, 160, 270, and 380. To find the column for each marker, I initially planned to step across the image by a fixed tile width. Unfortunately, tile widths are not consistent, and the vertical white lines complicate matters further. I ended up choosing the columns manually in GIMP as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Marker,Pixel
CD1d,14
CD31,40
HLA-DQ,70
CD352,100
CD21,128
CD196,156
CD79b,185
CD1c,219
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
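
&lt;p&gt;As a minimal sketch, the manual picks can be held in two named vectors so that later lookups go by name. The names &lt;code&gt;cell_subset_rows&lt;/code&gt; and &lt;code&gt;marker_cols&lt;/code&gt; match the ones used in the code further down, but the way they are built here is an assumption, and only the eight markers listed above are included:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pixel rows picked in GIMP, one per subset (values from the text above).
cell_subset_rows &amp;lt;- c(
  "Transitional" = 50, "Naive" = 160, "Non-switched" = 270, "Switched" = 380
)

# First eight rows of the marker table, inlined here for illustration only;
# the full table would be read from the CSV file instead.
marker_csv &amp;lt;- "Marker,Pixel
CD1d,14
CD31,40
HLA-DQ,70
CD352,100
CD21,128
CD196,156
CD79b,185
CD1c,219"
marker_df &amp;lt;- read.csv(text = marker_csv, stringsAsFactors = FALSE)

# Named vector: marker name -&amp;gt; pixel column.
marker_cols &amp;lt;- setNames(marker_df$Pixel, marker_df$Marker)

marker_cols[["CD31"]]                # 40
cell_subset_rows[["Non-switched"]]   # 270
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;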



&lt;p&gt;I could now get the RGB value for a (Subset, Marker) pair from the PNG. For example, for the CD31 value of the "Non-switched" subset, I would look up &lt;code&gt;heat_map_png[270, 40, ]&lt;/code&gt;, which returns the vector &lt;code&gt;[0.6823529, 0.0000000, 0.3882353]&lt;/code&gt;. To map these RGB values to "Median scaled expression" values, I used the figure's color legend. First, I cropped it into its own PNG file:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--m6eVk6dl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/astrolabediagnostics/reverse-engineer-heat-map/blob/master/legend.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--m6eVk6dl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/astrolabediagnostics/reverse-engineer-heat-map/blob/master/legend.png%3Fraw%3Dtrue" alt="Crop of just the legend from the figure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I imported it into R using &lt;code&gt;readPNG&lt;/code&gt;, arbitrarily took the pixels from row 10, and mapped them into values using &lt;code&gt;seq&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Import legend PNG, keep only one row, and convert to values. The values "0"
# and "0.86" are taken from the image.
legend_png &amp;lt;- png::readPNG("legend.png")
legend_mtx &amp;lt;- legend_png[10, , ]
legend_vals &amp;lt;- seq(0, 0.86, length.out = nrow(legend_mtx))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;At this point, I planned to reshape the heat map PNG array into a data frame and join its RGB values to the legend values. However, this led to two issues.&lt;/p&gt;

&lt;p&gt;One, reshaping a three-dimensional array into two dimensions is a headache, since I would have to make sure I end up with the row and column order I need. Sticking to the spirit of the hack, I iterated over all (Subset, Marker) pairs instead. This is inelegant (explicit iteration in R is frowned upon) but a reasonable compromise given the small image size.&lt;/p&gt;

&lt;p&gt;Two, I could not actually join on the legend RGB values. The heat map uses a gradient, so some of its colors might be missing from the legend strip itself (a reader infers them visually). Instead, I calculated the Euclidean distance in RGB space between each heat map pixel and every legend pixel, and took the value of the nearest legend pixel as the "Median scaled expression".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;heat_map_df &amp;lt;- lapply(names(marker_cols), function(marker) {
  lapply(names(cell_subset_rows), function(cell_subset) {
    v &amp;lt;- t(heat_map_png[cell_subset_rows[cell_subset], marker_cols[marker], ])
    dists &amp;lt;- apply(legend_mtx, 1, function(x) sqrt(sum((x - v) ^ 2)))
    data.frame(
      Marker = marker,
      CellSubset = cell_subset,
      Median = legend_vals[which.min(dists)],
      stringsAsFactors = FALSE
    )
  }) %&amp;gt;% dplyr::bind_rows()
}) %&amp;gt;% dplyr::bind_rows()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
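
&lt;p&gt;To see the nearest-color lookup in isolation, here is a toy sketch with a synthetic five-pixel legend. The helper name &lt;code&gt;nearest_legend_val&lt;/code&gt; is hypothetical, and it swaps the &lt;code&gt;apply&lt;/code&gt; call for &lt;code&gt;sweep&lt;/code&gt; and &lt;code&gt;rowSums&lt;/code&gt;; it compares squared distances, which should pick the same legend pixel:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Synthetic legend: five pixels fading from black to red, mapped onto 0..0.86.
legend_mtx &amp;lt;- cbind(R = seq(0, 1, length.out = 5), G = 0, B = 0)
legend_vals &amp;lt;- seq(0, 0.86, length.out = nrow(legend_mtx))

# Return the legend value whose color is nearest to one RGB vector.
nearest_legend_val &amp;lt;- function(rgb, legend_mtx, legend_vals) {
  # Subtract the pixel's channel values from every legend row.
  diffs &amp;lt;- sweep(legend_mtx, 2, rgb)
  # The smallest squared distance is also the smallest distance.
  legend_vals[which.min(rowSums(diffs ^ 2))]
}

# A slightly off-gradient pixel snaps to the middle legend entry.
nearest_legend_val(c(0.55, 0.02, 0.01), legend_mtx, legend_vals)  # 0.43
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;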



&lt;p&gt;I now have the values I need in &lt;code&gt;heat_map_df&lt;/code&gt; to compare to the ASDS! As a sanity check, I reproduced the original heat map using ggplot2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;heat_map_df$Marker &amp;lt;- 
  factor(heat_map_df$Marker, levels = names(marker_cols))
heat_map_df$CellSubset &amp;lt;-
  factor(heat_map_df$CellSubset, levels = rev(names(cell_subset_rows)))

ggplot(heat_map_df, aes(x = Marker, y = CellSubset)) +
  geom_tile(aes(fill = Median), color = "white") +
  scale_fill_gradient2(
    name = "Median Scaled Expression",
    low = "black", mid = "red", high = "yellow",
    midpoint = 0.4) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust = 0.4),
        axis.title = element_blank(),
        legend.position = "bottom",
        panel.background = element_blank())
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FBwEz6L4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/astrolabediagnostics/reverse-engineer-heat-map/blob/master/heat_map_reproduction.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FBwEz6L4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/astrolabediagnostics/reverse-engineer-heat-map/blob/master/heat_map_reproduction.png%3Fraw%3Dtrue" alt="ggplot2 reproduction of heat map"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The resulting code gets the job done and can easily be repurposed for other heat maps. Some manual work is involved, namely setting &lt;code&gt;cell_subset_rows&lt;/code&gt; to the rows in the new heat map, updating &lt;code&gt;marker_cols.csv&lt;/code&gt; accordingly, and setting the boundary values in the &lt;code&gt;seq&lt;/code&gt; call when calculating &lt;code&gt;legend_vals&lt;/code&gt;. Furthermore, we should be able to adapt the above into a more automated solution by calculating the boundaries between tiles using &lt;code&gt;diff&lt;/code&gt;, running it separately on the rows and the columns (getting the row and column labels will not be trivial and will require OCR). For a one-time exercise, though, the above hack works remarkably well. Sometimes that is all the data science you need to get the job done. Check out this &lt;a href="https://www.youtube.com/watch?v=QwjH0WlUj74"&gt;YouTube video&lt;/a&gt; for the actual comparison between the data sets!&lt;/p&gt;
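
&lt;p&gt;As a rough sketch of the &lt;code&gt;diff&lt;/code&gt; idea, the following finds tile centers along one synthetic row of gray values. All variable names are hypothetical, and it assumes flat tile colors and pure white separators; a real image with anti-aliasing would likely need a tolerance instead of exact comparisons:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Synthetic single-channel row: three tiles (widths 4, 6, and 5) separated
# by one-pixel white lines (value 1).
row_px &amp;lt;- c(rep(0.2, 4), 1, rep(0.7, 6), 1, rep(0.4, 5))

# diff() is non-zero wherever two adjacent pixels differ, i.e. at an edge.
edges &amp;lt;- which(diff(row_px) != 0)

# Turn the edges into (start, end) segments covering the whole row.
starts &amp;lt;- c(1, edges + 1)
ends &amp;lt;- c(edges, length(row_px))

# Drop the white separator segments and keep the center of each tile.
is_tile &amp;lt;- row_px[starts] != 1
centers &amp;lt;- floor((starts + ends) / 2)[is_tile]
centers  # 2 8 15
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running the same steps down one image column would give the row for each subset.&lt;/p&gt;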

</description>
    </item>
  </channel>
</rss>
