DRY (Don’t Repeat Yourself) is a principle of software development. The focus of DRY is to avoid repetition of information.
When you write code that performs the same tasks over and over again, any modification of one task requires the same change to be made to every single instance of that task! Editing every instance of a task is a lot of work, and it is easy to miss one.
Instead, you can create functions that perform those tasks, using sets of arguments or inputs to specify how the task is performed.
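As a minimal sketch of this idea, suppose you need to convert several temperature values from Fahrenheit to Celsius. The function name `convert_f_to_c` and the sample values are hypothetical, chosen only for illustration.

```r
# Repetitive version: the same formula is typed out three times,
# so a fix to the formula must be made in three places.
temp_1_c <- (32 - 32) * 5 / 9
temp_2_c <- (50 - 32) * 5 / 9
temp_3_c <- (212 - 32) * 5 / 9

# DRY version: the formula lives in one place.
# convert_f_to_c is a hypothetical helper, not a built-in R function.
convert_f_to_c <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# The argument specifies which values the task is performed on.
temps_c <- convert_f_to_c(c(32, 50, 212))
```

If the formula ever needs to change, only the body of `convert_f_to_c` has to be edited.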
Good functions only do one thing, but they do it well and often in a variety of contexts. Often the operations contained in a good function are generally useful for many tasks. Take for instance the R function mean(), which computes the arithmetic mean of a set of values. This function only does one thing (computes a mean). However, you may use the mean() function in different places in your code. You may use it to calculate a new column value in a data.frame. Or you could use it to calculate the mean of a matrix.
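The two uses of mean() described above might look like the following sketch. The data.frame `precip` and the matrix `m` are made-up examples, not data from this lesson.

```r
# A small, made-up data.frame of monthly precipitation values.
precip <- data.frame(
  jan = c(1, 2, 3),
  feb = c(2, 4, 6)
)

# Use mean() to calculate a new column: each January value
# minus the mean of all January values.
precip$jan_anomaly <- precip$jan - mean(precip$jan)

# Use the same function on a matrix: mean() averages all of its elements.
m <- matrix(1:6, nrow = 2)
matrix_mean <- mean(m)
```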
Global variables are objects in R that exist within the global environment. You learned about the global environment in the first few weeks of class. You can think of it as a bucket filled with all of the objects (and package functions) in your R session. When you code line by line, you create numerous intermediate variables that you don’t need to use again, and they stay in the global environment unless you remove them.
Like pipes, functions help you avoid creating intermediate variables in the global environment. Each function call runs in its own environment, which is created when the function is called and, by default, discarded once the function returns a result. Objects defined inside of a function are thus created inside of that function’s environment. Once the function is done running, those objects are gone! This means the memory they used can be freed.
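A short sketch can make this concrete. The function `square_plus_one` is hypothetical, and the example assumes a session in which no object named `squared` has been defined already.

```r
# square_plus_one is a hypothetical function used only to illustrate scope.
square_plus_one <- function(x) {
  # `squared` is an intermediate variable created inside
  # the function's own environment.
  squared <- x^2
  squared + 1
}

result <- square_plus_one(3)

# `result` was assigned in the calling environment, so it exists there.
result_exists <- exists("result", inherits = FALSE)

# `squared` lived only inside the function call, so it is not found here.
squared_exists <- exists("squared", inherits = FALSE)
```

Only the returned value is kept; the intermediate variable never reaches the global environment.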
Ideally, your code is easy to understand. However, what might seem clear to you now might be clear as mud 6 months from now or even 3 weeks from now (remember we discussed your future self in week 1 of this class).
Well-written functions use names that help you better understand the task that the function performs.
Name functions using verbs that indicate what the function does. This makes your code more expressive, or self-describing, and in turn makes it easier to read for you, your future self and your colleagues.
If all your code is written line by line, with repeated code in multiple parts of your document, it can be challenging to maintain.
Imagine having to fix one element of a line of code that is repeated many times. You will have to find and replace that code to implement the fix in EVERY INSTANCE where it occurs in your code. This makes your code difficult to maintain.
Do you also duplicate your comments where you duplicate parts of your scripts? How do you keep the duplicated comments in sync? A comment that is misleading because the code changed is worse than no comment at all.
Re-organizing your code using functions (or organizing your code using functions from the beginning) allows you to explicitly document the tasks that your code performs.
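As one possible sketch, a function with a verb-based name and a short comment block keeps the documentation in a single place, next to the code it describes. The function `convert_in_to_mm` and its argument name are hypothetical.

```r
# Convert precipitation from inches to millimeters.
#
# Arguments:
#   precip_in: numeric vector of precipitation values, in inches
# Returns:
#   numeric vector of the same length, in millimeters
#
# convert_in_to_mm is a hypothetical helper, not a built-in R function.
convert_in_to_mm <- function(precip_in) {
  precip_in * 25.4
}

precip_mm <- convert_in_to_mm(c(0, 1, 2.5))
```

Because the task is documented once, at the function definition, there are no duplicated comments to keep in sync.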
While you won’t learn this in class this week, functions are also useful for testing. As your code gets longer and more complex, it is more prone to mistakes. For example, if your analysis relies on data that gets updated often, you may want to make sure that all the columns in your spreadsheet are present, and that the new data are not formatted in a different way, before performing an analysis.
Changes in data structure and format could cause your code to not run. Or, in the worst-case scenario, your code may run but return the wrong values!
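A check like the one described above could be wrapped in a small function. This is only a sketch: `check_columns` is a hypothetical helper, and the column names and data are made up.

```r
# check_columns is a hypothetical helper, not part of base R.
# It stops with an informative error if any expected column is missing.
check_columns <- function(df, expected_cols) {
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up example data with two columns.
site_data <- data.frame(site = c("a", "b"), precip = c(1.2, 3.4))

# All expected columns are present, so this call passes silently.
check_columns(site_data, c("site", "precip"))

# A missing column raises an error; tryCatch() captures the message
# here so that the rest of the script can continue.
error_message <- tryCatch(
  check_columns(site_data, c("site", "precip", "date")),
  error = function(e) conditionMessage(e)
)
```

Running a check like this before the analysis turns a silent wrong answer into an explicit error.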