<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gordon Shotwell</title>
    <description>The latest articles on DEV Community by Gordon Shotwell (@gshotwell).</description>
    <link>https://dev.to/gshotwell</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F27604%2Fd090c7d9-3011-42de-923b-f0704dea0bc8.jpeg</url>
      <title>DEV Community: Gordon Shotwell</title>
      <link>https://dev.to/gshotwell</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gshotwell"/>
    <language>en</language>
    <item>
      <title>Why I use R</title>
      <dc:creator>Gordon Shotwell</dc:creator>
      <pubDate>Mon, 30 Dec 2019 00:00:00 +0000</pubDate>
      <link>https://dev.to/gshotwell/why-i-use-r-4hlj</link>
      <guid>https://dev.to/gshotwell/why-i-use-r-4hlj</guid>
      <description>&lt;h1&gt;
  
  
  They said the war was over...
&lt;/h1&gt;

&lt;p&gt;Over the last couple of years prominent members of both the R and Python communities have tried to move past the language wars and support both R and Python workflows. &lt;br&gt;
This makes sense intellectually; after all, R and Python are not all that different in the scheme of things, and so we should let people use whichever language they find more productive. &lt;br&gt;
This conversation manifests very differently in the workplace, however.&lt;br&gt;&lt;br&gt;
Most of the time when a Python data scientist hears that the language wars are over, they think "Well, great --- if R and Python are equally effective, then we can all just standardize on Python."&lt;/p&gt;

&lt;p&gt;This comes up for me personally when coworkers tell me some version of "Hey, your work is great, you're an excellent developer, but have you thought of switching over to Python/Scala/JavaScript so that you can really make a contribution?" &lt;br&gt;
Early in my career I took these suggestions seriously.&lt;br&gt;
I came to R from an Excel background, and for a long time I had internalized the feeling that serious engineers used Python, while analysts or researchers could use languages like R.&lt;br&gt;
Over time I've realized that the people making that statement often aren't really informed. &lt;br&gt;
They rarely know anything about R, and often don't really write production-quality code themselves. &lt;br&gt;
In contrast, most of the very senior engineers I've met understand that all programming languages are basically just bundles of trade-offs, and so no single language is going to be globally superior to another.&lt;br&gt;
There really are no production languages -- only production engineers. &lt;/p&gt;

&lt;p&gt;The thing is, I don't use R out of some blind brand loyalty but because I don't like working hard. &lt;br&gt;
Every time I'm faced with a problem, I try to figure out how I can solve that problem in a stable way with the least amount of effort; for most of the problems I face, R is the right tool. &lt;br&gt;
This is partially an accident of training -- I know R very well at this point, so it's usually the most efficient way for me to solve a problem -- but it's also because of core language features that don't really exist in Python.&lt;/p&gt;

&lt;p&gt;Overall I think there are four main features of the core R language that are essential to my work. &lt;br&gt;
These are things that are present in R, that I haven't found to be available or accessible in any other single language, and that make R the best choice for my work:&lt;/p&gt;

&lt;p&gt;1) Native data science structures &lt;br&gt;
2) Non-standard evaluation&lt;br&gt;
3) Packaging consensus (The glory of CRAN)&lt;br&gt;
4) Functional programming&lt;/p&gt;
&lt;h1&gt;
  
  
  1. Native data science structures
&lt;/h1&gt;

&lt;p&gt;It's relatively easy to do data science in R without any external libraries. &lt;br&gt;
You can read data from a CSV into a data frame, plot and clean that data, and analyse it using built-in statistical models. &lt;br&gt;
This is possible because R was built to do statistics, and so it includes features like vectors and data frames, and lets you invert a matrix or fit a linear model. &lt;br&gt;
Over time Python has added all of these capabilities with NumPy, pandas, and scikit-learn, but you usually need those external dependencies to do data science work. &lt;/p&gt;
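
&lt;p&gt;As a minimal sketch of what that looks like, the following uses only base R. The temporary CSV file and the variable names are assumptions made up for illustration; &lt;code&gt;mtcars&lt;/code&gt; is a dataset that ships with R.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight r"&gt;&lt;code&gt;# Base R only: no library() calls are needed for any of this.
# Write a built-in dataset to a temporary CSV so the sketch is self-contained
# (hypothetical file; in practice this would be your own data).
csv_path &amp;lt;- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read the CSV into a data frame.
cars &amp;lt;- read.csv(csv_path)

# Fit a linear model with the built-in lm().
fit &amp;lt;- lm(mpg ~ wt, data = cars)
coef(fit)

# Invert a matrix with the built-in solve().
m &amp;lt;- matrix(c(2, 0, 0, 4), nrow = 2)
solve(m)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;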

&lt;p&gt;I'm generally in favour of using external libraries, but it's nice to have the option of avoiding them in certain circumstances. &lt;br&gt;
Data structures in the base language tend to be more stable than those provided by external dependencies. For instance, I can run base R code from 2010 and be reasonably sure that the code will behave the same way today as it did a decade ago, because maintainers of the core language are very conservative in introducing breaking changes. &lt;/p&gt;

&lt;p&gt;This is probably not true of R or Python code that relies on libraries like dplyr or pandas because both of those libraries prioritize feature improvement over stability. &lt;br&gt;
This isn't a criticism of either library; the point is just that it is beneficial to use a language that allows you to do meaningful data science work without importing external libraries. &lt;/p&gt;

&lt;p&gt;One of the weird things that you probably won't hear from the "R isn't a real language" Pythonistas is that there's this whole group of production Python engineers who don't think that the Python scientific computing stack is appropriate for production. &lt;br&gt;
These people will make the same basic argument about pandas that pandas people make about R: it's appropriate for research, but production code requires vanilla Python. Switching from R to Python often doesn't significantly reduce deployment friction, because you still need some kind of microservice boundary to isolate the data science dependencies from the rest of the Python code base. &lt;/p&gt;
&lt;h1&gt;
  
  
  2. Non-Standard Evaluation
&lt;/h1&gt;

&lt;p&gt;R includes a strange and wonderful type of &lt;a href="https://stackoverflow.com/questions/2565572/metaprogramming-self-explanatory-code-tutorials-articles-books/2566561#2566561"&gt;metaprogramming&lt;/a&gt; called &lt;a href="http://adv-r.had.co.nz/Computing-on-the-language.html"&gt;Non-standard evaluation&lt;/a&gt;,&lt;br&gt;
which allows you to access and manipulate the calling environment of a function. &lt;br&gt;
This lets you do things like use a variable name in a plot title, or evaluate a user-supplied expression in a different environment. &lt;/p&gt;

&lt;p&gt;NSE reminds me of a niche and dangerous power tool:&lt;br&gt;&lt;br&gt;
it shouldn't be the first thing you reach for, and it's very dangerous if you don't know what you're doing, &lt;br&gt;
but it allows you to solve problems that would otherwise be unsolvable. &lt;/p&gt;

&lt;p&gt;There are three main ways that I use NSE: &lt;/p&gt;
&lt;h3&gt;
  
  
  Separate user representations from programmatic representations
&lt;/h3&gt;

&lt;p&gt;There's often a tension between the most natural way for a user to represent a problem and the best way to organize that problem internally. &lt;br&gt;
Internally, it's good if the inputs to a system are unambiguously specified so that it's crystal clear what the system should do, and how the system should be organized. &lt;br&gt;
In contrast, the user of a function often doesn't need or want to know about the implementation details and instead wants to provide the inputs in the way that requires them to learn the fewest new things.&lt;br&gt;
Without NSE, it's very hard to solve this problem because what goes into a function is what the function has to use, but NSE lets you capture and modify the expressions the user sends into the function so you can translate them into another form. &lt;br&gt;
For example, R lets you specify models with a formula interface like this: &lt;code&gt;lm(mpg ~ cyl, data = mtcars)&lt;/code&gt;.&lt;br&gt;
This is a natural way for statisticians to specify statistical models because they're usually familiar with the syntax, but without NSE there's no way to make that function work as written&lt;br&gt;
because &lt;code&gt;mpg&lt;/code&gt; and &lt;code&gt;cyl&lt;/code&gt; are not objects in the calling environment. &lt;br&gt;
NSE allows &lt;code&gt;lm&lt;/code&gt; to capture the formula &lt;code&gt;mpg ~ cyl&lt;/code&gt; and evaluate its variables using the data frame supplied as &lt;code&gt;data&lt;/code&gt;. &lt;br&gt;
To accomplish the same thing in a standard evaluation model you'd need to do some sort of string manipulation, which is what you find in the &lt;a href="https://patsy.readthedocs.io/en/latest/R-comparison.html"&gt;Python version&lt;/a&gt;.&lt;/p&gt;
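
&lt;p&gt;To make the mechanism concrete, here is a small sketch in base R. &lt;code&gt;eval_in_data&lt;/code&gt; is a hypothetical helper written for illustration, not something &lt;code&gt;lm&lt;/code&gt; exposes; it captures an expression with &lt;code&gt;substitute&lt;/code&gt; and evaluates it with the data frame's columns in scope, which is roughly the kind of trick that data-aware functions rely on.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight r"&gt;&lt;code&gt;# Hypothetical helper (not part of base R): evaluate an expression using the
# columns of a data frame, falling back to the caller's environment.
eval_in_data &amp;lt;- function(data, expr) {
  captured &amp;lt;- substitute(expr)          # capture the unevaluated expression
  eval(captured, data, parent.frame())  # look names up in `data` first
}

# mpg and cyl are not objects here, only columns of mtcars.
eval_in_data(mtcars, mean(mpg / cyl))

# lm() resolves the variables named in the formula against `data` in a similar way.
fit &amp;lt;- lm(mpg ~ cyl, data = mtcars)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;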

&lt;p&gt;Note that you give up a lot by using NSE, and in almost every context it's the wrong tool.&lt;br&gt;
You might look at the R and Python formula interface and think that giving up referential transparency isn't worth avoiding a few quotation marks, but there are lots of cases where it saves the day. &lt;/p&gt;

&lt;p&gt;A more complex example is dplyr, which puts a consistent user-facing API in front of data frame and database backends. &lt;/p&gt;

&lt;p&gt;The programmatic representation varies greatly across the different backends. &lt;br&gt;
Using NSE, the package captures the expressions supplied by the user and translates them into programmatic representations that are understood by the various backends.&lt;/p&gt;
&lt;h3&gt;
  
  
  Make code more concise
&lt;/h3&gt;

&lt;p&gt;Recently I had a whole bunch of functions that all included a structure like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nrow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nf"&gt;return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;data.frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;join_col&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;NA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="c1"&gt;### Lots of time consuming code ....&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;What I wanted to do was create a new function, &lt;code&gt;returnIfEmpty&lt;/code&gt;, which would cause its calling function to return the default dataframe if it was passed a dataframe without any rows. &lt;br&gt;
I can do this with NSE like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;returnIfEmpty&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nrow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data.frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;join_col&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;NA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"return_data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;envir&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;parent.frame&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;# Modify calling environment&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rlang&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;# Capture expression&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;rlang&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;eval_bare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;parent.frame&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;# Evaluate expression in calling environment&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;returnIfEmpty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="c1"&gt;### Lots of time consuming code ....&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data.frame&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;##   join_col
## 1       NA
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;In this example, &lt;code&gt;returnIfEmpty&lt;/code&gt; is first creating an object in its parent environment, then building a call to return that object, and finally evaluating that call in the parent environment. This was a good way for me to avoid a lot of code repetition, which I don't think I would've been able to do otherwise. &lt;/p&gt;

&lt;h3&gt;
  
  
  Learn the user's language
&lt;/h3&gt;

&lt;p&gt;One of the great things about accessing the user's environment is that there's a wealth of information in that environment that is particularly meaningful to that user. &lt;br&gt;
In particular, if you can use the names that a user assigned to a variable in function output or error messages it makes your function much easier for them to understand. &lt;/p&gt;

&lt;p&gt;Consider this function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;regularError&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;inherits&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"data.frame"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"df must be a dataframe"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;my_var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;regularError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_var&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Warning in regularError(my_var): df must be a dataframe
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The user doesn't know anything about the internal arguments of your function, so they probably don't know what the &lt;code&gt;df&lt;/code&gt; in the warning message is referring to.&lt;br&gt;
In order to understand the warning, they need to read the documentation, or worse, the source code. &lt;br&gt;
This leads to confusion, because you're forcing the user to learn your language rather than telling them what's wrong in their own terms. &lt;br&gt;
NSE allows us to make the error much friendlier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;fancyError&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;class&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;var_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;as.character&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;substitute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;inherits&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"data.frame"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;glue&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;glue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"'{var_name}' is of class '{class}' when it needs to be a dataframe"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;fancyError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_var&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Warning in fancyError(my_var): 'my_var' is of class 'integer' when it needs
## to be a dataframe
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This communicates what's going wrong in terms that are much more meaningful to the user, because they assigned the name "my_var" to the object in the first place. &lt;/p&gt;

&lt;p&gt;You might think that it's too much effort to write a friendly error message, but in my work I find that details like this help build delightful products that are easy to use. &lt;br&gt;
It's worth taking the time to communicate problems in the user's language, and NSE is the best way I know to learn that language. &lt;/p&gt;

&lt;h1&gt;
  
  
  3. The glory of CRAN
&lt;/h1&gt;

&lt;p&gt;I started programming on a Saturday morning during law school. &lt;br&gt;
In hindsight this was a very important morning for me because in many ways it shaped the course of my career for the next ten years. &lt;br&gt;
I probably had about 20 minutes to become interested and excited about the project, since I had lots of homework to do and programming wasn't something that was on any of my to-do lists at the time. &lt;br&gt;
The resource I started with was "R Twotorials," which taught you how to use R in two minute lessons.&lt;/p&gt;

&lt;p&gt;R let me get up and running, installing packages, filtering data, and printing plots in under 20 minutes, which meant that I stayed interested in the language and eventually started using it professionally. &lt;br&gt;
I had actually started to learn Python at around the same time but just found it too difficult. &lt;br&gt;
I didn't know how to open a terminal window, I didn't want to spend any time on configuration, and I didn't have any time to devote to setup. &lt;br&gt;
Python required me to spend more than 20 minutes on setup, and R didn't, so I picked R. &lt;/p&gt;

&lt;p&gt;This all worked because of CRAN. &lt;br&gt;
CRAN has (maybe forcibly) created a strong consensus on how to package and distribute R code, which means that nine times out of ten an R package will install and run with no user configuration. &lt;br&gt;
Today I am fairly comfortable at the command line and futzing around getting computer programs to work on my machine, but I'm still completely unwilling to use an R package that requires me to do much more than &lt;code&gt;install.packages("package_name")&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;This feature is important to me because I want my code to be useable and installable by people who, like my law school self, do not think of themselves as programmers and do not have any tolerance for &lt;a href="http://pgbovine.net/command-line-bullshittery.htm"&gt;command line bullshittery&lt;/a&gt;. &lt;br&gt;
My goal for all the products I develop is that they are installable and runnable with a single user action, and that action cannot take place in the terminal. &lt;br&gt;
This means that when I'm importing dependencies, I need to be confident that all of those dependencies themselves can be installed and set up without user intervention.&lt;br&gt;
CRAN does this by moving a lot of the setup and configuration pain from the end user to the package maintainer, and while this probably slows down development and release of packages, it vastly improves the user experience of the average user. &lt;/p&gt;

&lt;p&gt;Python is, well, &lt;a href="https://medium.com/knerd/the-nine-circles-of-python-dependency-hell-481d53e3e025"&gt;not like that&lt;/a&gt;. &lt;br&gt;
I'm not a great Python developer, but I am a professional computer programmer and I still feel like it's even odds that installing some Python library is going to cost me an afternoon of torture and four broken keyboards. &lt;br&gt;
If there are any Python evangelists still reading this, they might have a response that begins with, "Well you just," but remember the user that I care the most about only has 20 minutes of attention and no real programming skill, so the only thing they can "just" do is copy and paste one line of code into a console.&lt;br&gt;
If that doesn't work, I've lost them, and they'll spend another lonely year renewing their SPSS licenses. &lt;/p&gt;

&lt;h1&gt;
  
  
  4. Functional programming
&lt;/h1&gt;

&lt;p&gt;R is a &lt;a href="http://adv-r.had.co.nz/Functional-programming.html"&gt;functional programming&lt;/a&gt; language, which means that the natural way to accomplish something in the language is to use functions. &lt;br&gt;
I really like this pattern of programming because breaking complicated jobs down into small functional bricks gives me confidence that the overall solution is correct. &lt;br&gt;
I can work on the small functions, verify that they're correct through tests, and then know that combining those building blocks together won't change their behaviour. &lt;/p&gt;
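
&lt;p&gt;A minimal sketch of that workflow in base R might look like the following. The function names &lt;code&gt;standardize&lt;/code&gt;, &lt;code&gt;clip&lt;/code&gt;, and &lt;code&gt;clean&lt;/code&gt; are hypothetical, chosen only for illustration.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight r"&gt;&lt;code&gt;# Small, pure building blocks: each depends only on its inputs.
standardize &amp;lt;- function(x) (x - mean(x)) / sd(x)
clip &amp;lt;- function(x, limit = 2) pmax(pmin(x, limit), -limit)

# Each block can be checked on its own ...
stopifnot(isTRUE(all.equal(mean(standardize(c(1, 2, 3))), 0)))
stopifnot(max(clip(c(-5, 0, 5))) == 2)

# ... and then combined; composing them does not change what each one does.
clean &amp;lt;- function(x) clip(standardize(x))

# Apply the combined function to several columns with base R's lapply().
cleaned &amp;lt;- lapply(mtcars[c("mpg", "wt")], clean)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;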

&lt;p&gt;Functional programming is not the best paradigm for all problems. &lt;br&gt;
For example, React is a functional paradigm for building user interfaces where you stack together functional components to build complicated web apps. &lt;br&gt;
The initial issue with React was that since the components were designed to be independent from one another, it made it very difficult to track application state. &lt;br&gt;
If a user logged in to some part of the application, you would have to pass that information back up to the top level of the app, and then back down to all the components that needed to know about it.&lt;br&gt;
It was easy to miss a connection, which would result in the user being logged into one part of the app, but logged out of another. &lt;/p&gt;

&lt;p&gt;This creates a lot of difficult bugs where the application state is inconsistent across the product. &lt;br&gt;
The React ecosystem solved this problem by adding global state stores like Redux, which let people still use pure functional components for most things, but break that pattern when they need to set or access application state. &lt;/p&gt;

&lt;p&gt;Functional programming is a great tool for data science problems because they are mostly stateless. &lt;br&gt;
When I'm building a statistical model, what I'm really doing is creating a mapping between some set of inputs and an output; in other words, a function. &lt;br&gt;
It's usually more important that the mapping is clearly defined than that it takes into account user state. &lt;br&gt;
It is possible to do &lt;a href="https://stackabuse.com/functional-programming-in-python/"&gt;functional programming&lt;/a&gt; in Python, but it's a bit like ordering soup at a pizza parlour. &lt;/p&gt;

&lt;p&gt;While a lot of the FP tools are there, the majority of the community doesn't use functional patterns as their main development paradigm, and you'd probably get a few "That's not Pythonic" comments on your pull request.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;None of this is to suggest that anyone else should use R. &lt;br&gt;
R, like all other languages, is a bundle of trade-offs, and those are bad trade-offs in many contexts. &lt;br&gt;
In my context, however, the flexibility of R is extremely useful, and I can't give up those language features without my work suffering. &lt;/p&gt;

</description>
    </item>
    <item>
      <title>Advice for non-traditional data scientists</title>
      <dc:creator>Gordon Shotwell</dc:creator>
      <pubDate>Tue, 29 Aug 2017 00:00:00 +0000</pubDate>
      <link>https://dev.to/gshotwell/advice-for-non-traditional-data-scientists</link>
      <guid>https://dev.to/gshotwell/advice-for-non-traditional-data-scientists</guid>
      <description>&lt;p&gt;I have a pretty strange background for a data scientist. In my career I’ve sold electric razors, worked on credit derivatives during the 2008 financial crash, written market reports on orthopaedic biomaterials, and practiced law. I started programming in R during law school, partly as a way to learn more about data visualization and partly to help analyze youth criminal justice data. Over time I came to enjoy programming more than law and decided to make the switch to data work about three years ago. Since then I’ve freelanced a bit, worked as a Data Scientist at Upworthy, and now am a Senior R Developer at a survey company called Crunch.io.&lt;/p&gt;

&lt;p&gt;When I started, I honestly didn’t have any particular skills or capacity which would have made data science a good career choice. I studied philosophy in undergrad, and while I had done a bit of statistics, it wasn’t something I would have said I was comfortable with. All I really had was an interest and the capacity to learn new things. If you’re in a similar boat, here is some advice about the process:&lt;/p&gt;

&lt;h3&gt;
  
  
  Emulate one or two people who know what they're doing.
&lt;/h3&gt;

&lt;p&gt;There is a huge diversity of tools and techniques for approaching data work, and if you half-learn a lot of different techniques you won’t be able to fully understand or accomplish any one technique. My recommendation is to pick one or two people who work in the field and who speak in a language you understand and try to emulate them. In my case I really focused on learning the R programming language, and picked a few R programmers to follow. I listened to all of their online talks, read their blogs and followed their activity on GitHub. This meant that I ended up with a deep understanding of a few small areas of the language and missed out on a lot of other areas. For instance, I learned &lt;code&gt;dplyr&lt;/code&gt; really well, but didn’t learn much about object-oriented programming. It’s good to try to develop depth of knowledge, rather than breadth, because when you know one thing very well you can usually apply that knowledge to other areas. A shallow understanding of many areas won’t help you tackle advanced problems in a specialized area.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Learning to code" and "learning statistics" are terrible goals because they have no end point.
&lt;/h3&gt;

&lt;p&gt;When you are learning a new skill, it’s important to have specific criteria for success. This helps keep you on track and also helps mitigate imposter syndrome. You don’t want to move the goalposts as you develop your understanding. From this perspective, “learning to code” and “learning statistics” are terrible goals, because there’s always more to learn about these fields. It’s better to have smaller goals, like “Learn to write a function in R” or “Be able to fit a linear model,” because those things can be accomplished. Goals like these are good because you can actually finish them, rather than being constantly reminded how far you have to go.&lt;/p&gt;

&lt;h3&gt;
  
  
  Focus on trajectory
&lt;/h3&gt;

&lt;p&gt;We naturally compare ourselves to others and tend to judge our own skills in terms of other people’s skills. The problem with this is that as our understanding improves, we tend to change our measures of comparison to more and more accomplished people. This is especially a problem when we compare our own general understanding of an area to that of specialists. For instance, you might have a good general understanding of neural networks, but if you compare yourself to someone who studies them full time, your understanding will obviously be pretty paltry. This kind of comparative thinking leads to always feeling insufficient, because no matter who you are or how much you know, there is always somebody who will know more.&lt;/p&gt;

&lt;p&gt;A better approach is to focus on trajectory. Ask whether you are making progress rather than whether you are relatively successful. Think about what you knew yesterday and feel good if you learned a bit more today. Over time that approach will lead to much better understanding with much less suffering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Follow Kind Experts
&lt;/h3&gt;

&lt;p&gt;Every field has experts, and many of those experts are assholes. Indeed, in our society we frequently use lack of empathy or kindness as a sign of intelligence or accomplishment. We call these people “geniuses” or “rockstars,” and try to forgive their personal faults by pointing to their intellectual accomplishments. I think you should ignore these people and instead try to find experts who have a genuine concern for other people. There are two reasons for this.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Kind experts will help you learn. This is common sense. If somebody has a genuine concern with helping other people understand an area, the resources they produce will be better at teaching other people how to do that thing. Moreover, the community of learners that surrounds these people is going to be supportive, rather than combative, and so engaging with that community will be a good experience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kind people know more than unkind people. This is less intuitive, but just as important. Experts who are genuinely concerned with other people tend to create environments where other people help them. A good example of this is Hadley Wickham’s Twitter feed. What you notice following him is that he is for the most part very kind in how he communicates with other people. The result of this is that he has developed a powerful network of people who answer his questions, test and provide feedback on his software projects, and grow into kind experts themselves. Most of the really productive developers I have interacted with have these same kinds of networks, and the reason they have them is that they spend a lot of time and energy supporting people, rather than belittling them. No matter how brilliant a brilliant jerk is, they will always lose out to a group of people working together on a problem.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Try to Ignore Boundary Setting Behaviour
&lt;/h3&gt;

&lt;p&gt;Boundary setting behaviour is when people who are part of a group attempt to draw the lines around that group to include themselves and exclude you. For instance, programmers sometimes say things like, “Real programmers use the command line,” or “You really need to learn Scala if you want to be a programmer.” The motivation for this is not to accurately express the boundaries of the discipline, but instead to make themselves feel better about their own skills. Often, out of insecurity, people will express the importance of their own skills and try to minimize the importance of skills they lack. About half of the stuff you read is written to address that insecurity rather than to help you learn. If possible, you should try to avoid this kind of advice.&lt;/p&gt;

&lt;p&gt;But you will definitely encounter it. You will get rejected from jobs, or made to feel like an idiot, because of this kind of behaviour, and there’s nothing you can really do about that. Applying for data science jobs from a non-traditional background, I frequently encountered people who believe all of the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Data manipulation, visualization, and communication are the most common data science tasks;&lt;/li&gt;
&lt;li&gt; We should focus on building teams with distinct capacities rather than trying to find one person who knows how to do everything;&lt;/li&gt;
&lt;li&gt; Most of the time you want a simple linear model, not a complex machine learning algorithm;&lt;/li&gt;
&lt;li&gt; There is a labour shortage for data science roles and we should expand the applicant pool; and&lt;/li&gt;
&lt;li&gt; It is absolutely essential that you have an advanced degree in statistics to apply for a job at my company.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, there’s a way that all of these things can be true at the same time, but more likely than not #1-4 describe a lot of the actual job requirements, and #5 is boundary setting. But when you apply for and get rejected from the job you can’t really ignore that behaviour; you have to deal with it emotionally.&lt;/p&gt;

&lt;p&gt;One of the key ways you can recognize boundary setting behaviour is when people start to equate being a member of a profession with being a particularly good member of that profession. For instance, someone might tell you that to be a real data scientist you need to have a PhD in statistics, and have mastered R, Python, and big-data query languages, and be an exceptional written and verbal communicator. Having these skills probably makes you an extremely skilled data scientist, but are they really hard boundaries around the profession? I'm not so sure. In most cases we talk about jobs based on the job title, rather than the job requirements. If you’re a baseball player who stands near first base we call you a first baseman; if you write for a living we call you a writer. These things are true even if you write trashy science fiction novels or are a terrible fielder. The boundaries of the profession are set by the market, not by your skills, and so you can be a good or bad example of the profession without having that change your membership in that profession. I think the same thing should be true of programming. Can you get a job writing computer programs? Then you're a programmer. Do you work with data for a living? You can probably call yourself a data scientist. &lt;/p&gt;

&lt;h3&gt;
  
  
  Learn to Bounce Back
&lt;/h3&gt;

&lt;p&gt;The main currency in learning a new skill is enthusiasm. Each time you have some success, you bank a bit of enthusiasm, and each time you experience challenges you lose a little bit of it. It is important to try to understand what helps you gain energy and enthusiasm, and what causes you to lose it. Whenever you notice that you’re going into the red in the enthusiasm department, either take a break, or do something that helps you feel a bit better. Here are some of the things I do when I feel bleak about my abilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Take a walk&lt;/li&gt;
&lt;li&gt;  Take a nap&lt;/li&gt;
&lt;li&gt;  Exercise for five minutes&lt;/li&gt;
&lt;li&gt;  Answer easy Stack Overflow questions&lt;/li&gt;
&lt;li&gt;  If I’m frustrated by a computer problem I copy and paste the examples and run them&lt;/li&gt;
&lt;li&gt;  Meditate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I think the key to this is not so much what particular things you do to bounce back, but that you recognize when you are losing energy for a task, and take a break from learning. Stress basically obliterates our ability to learn new things, and so simply stopping for a bit is a powerful way to maintain our capacity for new skills.&lt;/p&gt;

</description>
      <category>r</category>
      <category>datascience</category>
      <category>education</category>
    </item>
    <item>
      <title>Why you should work remotely</title>
      <dc:creator>Gordon Shotwell</dc:creator>
      <pubDate>Wed, 26 Jul 2017 14:46:48 +0000</pubDate>
      <link>https://dev.to/gshotwell/why-you-should-work-remotely</link>
      <guid>https://dev.to/gshotwell/why-you-should-work-remotely</guid>
      <description>&lt;p&gt;My last job was as a data scientist at Upworthy, which is a 100% remote company. Prior to starting the position I was worried about whether I could be happy and productive on a remote team. I wondered how project planning would work, whether it would be terribly lonely, and how communication would function when things got hectic. What I discovered is that the company was one of the more efficient and friendly places that I've worked, and I think the changes that they have made to accommodate remote work deserve much of the credit. &lt;/p&gt;

&lt;p&gt;Most companies that hire remote employees do so to take advantage of cheaper labour markets. If you are a Silicon Valley tech company and are able to hire remotely, you will be able to hire from areas which are not experiencing a labour shortage, and as a result you will be able to hire better people for less money. The thing which prevents companies from taking advantage of this opportunity is that they are afraid that remote work will disrupt the company's workflow. How will they organize projects across multiple time zones? How will they come up with ideas if they have no watercoolers? I think that most of the changes required to accommodate remote employees are actually independent goods which every company should adopt. And since I learned all this at Upworthy, I'm going to present these benefits as a listicle with cute animal pictures. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0xlzm8xxfftf5yresabg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0xlzm8xxfftf5yresabg.jpg"&gt;&lt;/a&gt;&lt;br&gt;
&lt;br&gt;&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Four Ways that Remote Work Helps Your Company
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Results orientation
&lt;/h3&gt;

&lt;p&gt;Results-oriented work environments are those which only care about employee &lt;em&gt;output&lt;/em&gt; and do not care about their effort. In these environments it doesn't matter if an employee leaves early or checks Facebook at work so long as their results are good. This makes a lot of sense because ultimately the company as a whole is judged on its results, and so an employee's work should be judged relative to how it contributes to those results. The problem is that defining results for an employee is a hard job, and it's much easier to judge people based on their effort. Most companies which say they are results-focused still use this heuristic.  &lt;/p&gt;

&lt;p&gt;Remote work makes results-orientation mandatory. The simple fact that you can't tell if or when your remote employees are at their desks means that managers have no choice but to evaluate employees based on their work product. As a result, employees have incentives to make their work more efficient, for instance by taking short naps after lunch. &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwkq62f945lfoty9avf38.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwkq62f945lfoty9avf38.jpg"&gt;&lt;/a&gt;&lt;br&gt;
&lt;br&gt;&lt;br&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Intentional communication
&lt;/h3&gt;

&lt;p&gt;When you work with a remote team you have to be intentional about your communication. If you are having a bad day or have run into a block of some kind, you have to proactively tell other people about it. Nobody is going to walk by your desk and notice why you are frustrated, and so you have to be intentional about asking for help, and connecting with your colleagues. My team at Upworthy did this, in part, by scheduling weekly one-on-one video calls between different engineers. These were a half-hour long, and the only requirement was that you not talk about work. We also did things like daily Slack check-ins and making sure to start meetings by spending a bit of time asking how everyone was doing. There's an assumption that this kind of informal communication happens automatically at in-person offices, but I don't think that's actually true. It might be true that some employees are making some connections, but most employees don't actually end up connecting with a wide slice of their colleagues. The result of this kind of intentional communication is connection and emotional safety. People develop better relationships with one another, and so they are more likely to ask for help or raise a dissenting view. Because remote work brings the problem of disconnection into sharper focus, it provides an incentive to build structures which help people connect. &lt;/p&gt;

&lt;p&gt;Here is an example of my dog Cadence engaging in intentional communication through a digital platform. &lt;/p&gt;



&lt;h3&gt;
  
  
  3) Continuous documentation
&lt;/h3&gt;

&lt;p&gt;Remote work environments promote asynchronous communication. You spend more time communicating through email, Slack, or other written media than through conversation. Even video conferences involve a substantial amount of written communication. For instance, at Upworthy we would frequently keep collaborative meeting notes in a Dropbox Paper document, or create Jira tickets as tasks were being assigned during the meeting. This is in contrast to in-person work environments where a lot of the work assignment takes place through verbal communication, either at a meeting or through your boss dropping by your desk to ask you to do something. These synchronous bits of communication then need to be documented if anyone's going to keep track of them. &lt;/p&gt;

&lt;p&gt;The great thing about remote communication is that teams are constantly producing written artifacts which are easy to turn into documentation. For instance, you never forget what exactly your boss asked you to do because work assignments almost always include a written description. It's also easy to move from a brainstorming document used at a meeting, to a working document discussing a system, to a polished bit of documentation about that system. Because everyone works in text all the time, there's a lower cost to creating documentation. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F7thkavjqm6uugyozc36p.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F7thkavjqm6uugyozc36p.jpg"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h3&gt;
  
  
  4) Support for diversity
&lt;/h3&gt;

&lt;p&gt;Almost every tech company has a major diversity problem. This is bad both from a basic ethical perspective and because it causes teams to ignore perspectives other than their own. Structuring your company to promote remote work is, in my opinion, the single biggest action that you can take to promote diversity. There are two main ways that it does this: &lt;/p&gt;

&lt;h4&gt;
  
  
  Remote work accommodates diverse workplace requirements
&lt;/h4&gt;

&lt;p&gt;Workplaces are path-dependent places. You start your company with a few employees, and then make decisions about how that workplace develops physically and socially based on those employees. These decisions in turn attract employees who like those work environments and the cycle continues. For instance, it's very likely that your open-concept, start-up office with a climbing wall and beer in the fridge is tailored to support the work of able-bodied young men. Probably you won't make the investments to support the work of, say, a blind engineer who needs to code by voice, and so you will never hire that engineer. &lt;/p&gt;

&lt;p&gt;The same thing is true for geography. A small or midsized company tends to pick office locations based on where its current employees want to work, and so tends to hire people from particular neighborhoods, because that office location is convenient to those neighborhoods. This embeds a fair amount of socioeconomic and ethnic bias into your workplace because neighborhoods tend to be ethnically and socioeconomically segregated.&lt;/p&gt;

&lt;p&gt;This is a kind of Catch-22 for workplace accommodation: you don't know what to change if none of your employees need accommodation, but you won't be able to hire employees who need accommodation if you haven't made those changes. &lt;/p&gt;

&lt;p&gt;A simple solution is to just let an employee determine their own work environment. If someone has a particular workplace need, it's very likely that their home already meets that need. If they need a quiet environment to code by voice, they will be able to find or create that space when you let them work remotely. If they need to be close to their children or ailing parents, they will be able to do that. Similarly, your company can hire from communities which are geographically removed from your current work neighborhoods, which increases diversity.  &lt;/p&gt;

&lt;h4&gt;
  
  
  Reducing bias
&lt;/h4&gt;

&lt;p&gt;Before getting into this, I should say that I have almost no personal experience with bias in the workplace. That said, I'm a lawyer with some experience in human rights litigation and can speak from that perspective. The thing you notice again and again with human rights disputes is that bias flourishes in areas with poor evidence. It's hard to prove workplace discrimination, because this discrimination often happens informally and usually without witnesses. Claimants have a hard time reporting discriminatory conduct, because they are justifiably worried that they won't be believed without some kind of smoking gun bit of evidence. Since the assholes of the world are well aware of this fact, they tend to do their harassing in circumstances where it will be hard to prove the conduct. Because remote work funnels more communication through channels which leave a paper trail, it reduces the amount of communication which is amenable to this kind of discriminatory conduct. If someone harasses their female coworker through Slack or email, it's very easy for that coworker to forward that communication on to human resources or a labour lawyer. It's also easier to correct people for inadvertently saying or doing something discriminatory, because you have a record to look back on to identify what exactly was wrong with their behavior. &lt;/p&gt;

&lt;p&gt;You can probably think of other ways that remote work helps with bias. For instance, results-orientation might lead to more evidence-based promotion decisions, women might be better able to integrate their professional life with gendered, non-professional work like child care, or text-based communication might ameliorate implicit bias. I have a suspicion that these factors are valid, but have neither data nor experience to support that suspicion, so I will leave it at that. &lt;/p&gt;

&lt;p&gt;Accommodating diversity extends beyond people who you typically think of as requiring accommodation. Almost everybody's work-life could be improved if that work fit their life circumstances a bit better.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
