# How to apply any summary function to data with dplyr, `...`, and non-standard evaluation in R

A few days ago I asked this question on Stack Overflow. The answer was awesome, but left me with some confusion as to what exactly was happening. And thus the idea for this post was born.

Summarizing data across a dimension happens **a lot**. And while it mostly happens with simple functions like mean, median, standard deviation, min, max, etc., occasionally you want to summarize using a more complex operation. Enter dplyr in all its glory.

Simple summarizing of data with `dplyr`
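A minimal sketch of a grouped summary, assuming a hypothetical `temps` data frame with a `day` column and a `temp` column (both names, and the readings themselves, are made up for illustration):

```r
library(dplyr)

# Hypothetical temperature readings, four per day
temps <- data.frame(
  day  = rep(c(1, 2), each = 4),
  temp = c(8, 15, 25, 32, 10, 12, 18, 20)
)

# One row per day, holding the mean and standard deviation of temp
daily <- temps %>%
  group_by(day) %>%
  summarize(mean_temp = mean(temp), sd_temp = sd(temp))

daily
```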

Here `group_by()` defines the groups and `summarize()` applies `mean()` and `sd()` to each group. `%>%` is a pipe operator that passes the output of the previous function to the next function as its first argument. You can have lots of fun reading about it and dplyr in this nice post.

So, that’s great and all. But you can extend this to apply to any function, not just `mean()` and `sd()`:
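As a sketch, assuming a made-up summary function `linearMean()` whose arguments `m` and `b` (a slope and an intercept) have defaults, and the same hypothetical `temps` data as before:

```r
library(dplyr)

# Hypothetical temperature readings, four per day
temps <- data.frame(
  day  = rep(c(1, 2), each = 4),
  temp = c(8, 15, 25, 32, 10, 12, 18, 20)
)

# Hypothetical custom summary: rescale the group mean with slope m and intercept b
linearMean <- function(x, m = 2, b = 1) {
  m * mean(x) + b
}

# Any function that returns a single value per group can be used in summarize()
scaled <- temps %>%
  group_by(day) %>%
  summarize(scaled_mean = linearMean(temp))

scaled
```

The only requirement on the function is that it collapses each group's vector to one value.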
But what if you want to vary `m` and `b` in the example above based on some other data? In effect, what if you want to wrap summarize in another function to allow for different argument values? This is not a theoretical question. I needed to do this for a Shiny app I’m working on. In it we want to display “Growing Degree Days”, which are a measure of how much of the heat in a day is available to crops. It’s calculated like this:
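A rough sketch of such a function, using the common averaging method for degree days. The argument names `low` and `high` and their default values are assumptions for illustration, not necessarily the values used in the app:

```r
# Sketch of a growing-degree-day calculation for one day's temperatures.
# low (base temperature) and high (upper cutoff) are assumed names and defaults.
calcGDD <- function(temp_vec, low = 10, high = 30) {
  t_max <- min(max(temp_vec), high)  # cap the daily maximum at the upper cutoff
  t_min <- max(min(temp_vec), low)   # raise the daily minimum to the base temperature
  max((t_max + t_min) / 2 - low, 0)  # degrees above the base, never negative
}

calcGDD(c(8, 15, 25, 32))  # a warm day accumulates degree days
calcGDD(c(2, 5, 8))        # a day that never reaches the base accumulates none
```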
GDD can be calculated using the defaults in the function above, but it can also be tuned to a specific crop, for which different values may apply. Apples, for example, would be:
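For illustration only, here is what overriding the defaults might look like. The `calcGDD()` definition is an assumed sketch repeated so the snippet runs on its own, and the thresholds passed in are placeholders rather than real agronomic values for apples:

```r
# Assumed sketch of calcGDD(); low and high are hypothetical argument names
calcGDD <- function(temp_vec, low = 10, high = 30) {
  t_max <- min(max(temp_vec), high)
  t_min <- max(min(temp_vec), low)
  max((t_max + t_min) / 2 - low, 0)
}

day_temps <- c(8, 15, 25, 32)

gdd_default <- calcGDD(day_temps)                     # uses the defaults
gdd_tuned   <- calcGDD(day_temps, low = 5, high = 30) # placeholder crop-specific thresholds
```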

We want to wrap `calcGDD()`. For `mean`, that’s relatively easy, and looks like this:
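A sketch of such a wrapper, assuming a dplyr version that still provides the underscore verbs (they were deprecated in dplyr 0.7 and emit warnings in later releases) and that the lazyeval package is installed. The argument names `group_var` and `summary_var` and the `temps` data are hypothetical:

```r
library(dplyr)

# Hypothetical temperature readings, four per day
temps <- data.frame(
  day  = rep(c(1, 2), each = 4),
  temp = c(8, 15, 25, 32, 10, 12, 18, 20)
)

# Column names arrive as strings, so they can be chosen at run time
dynamicSummarize <- function(df, group_var, summary_var) {
  # Build the expression mean(<summary_var>) from the string in summary_var
  mean_expr <- lazyeval::interp(~ mean(v), v = as.name(summary_var))
  df %>%
    group_by_(group_var) %>%
    summarize_(mean = mean_expr)
}

daily_mean <- dynamicSummarize(temps, "day", "temp")
```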
Notice that `group_by` is now `group_by_` and `summarize` is now `summarize_`. The plain verbs use what is called **non-standard evaluation** (NSE): they capture the expressions you type rather than evaluating them directly. The `_` added to many (all?) dplyr functions gives their standard-evaluation counterparts, which accept a string, a formula, or a quoted call, so evaluation is held until the value for a variable can be substituted. So here `group_var` is first evaluated to its value `"day"`, which is then passed to the grouping function. Wickham has a great vignette you can access in R with `vignette("nse")`, or you can read about it in Advanced R.

With the dynamic function, if you had multiple data sets for which you wanted to calculate the mean over a group, instead of grouping and summarizing each one, you could just make a series of calls to `dynamicSummarize()`.

### The Hard Part
Suppose we sometimes want to replace some of the default values of `calcGDD()` through a `dynamicCalcGDD()` function.
Unfortunately, (as so often happens) the first thing I did didn’t work:

Of course…, because `...`

Welcome to the wonderful world of… something. Anyway, as you can imagine, I spent quite a while trying to figure out how to get this working, which eventually led to my swearing off programming, deciding to be a hobo, and finally posting the question I linked to at the beginning of this post to Stack Overflow.

The solution was less convoluted than I had imagined, and ends up using `call` and `do.call`:
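The accepted answer is not reproduced here, so what follows is only a sketch of the idea under several assumptions: `calcGDD()` is the assumed definition with hypothetical `low` and `high` arguments, the underscore verbs and lazyeval are available, and `quote = TRUE` is added so that `do.call` does not try to evaluate the column symbol before `call` receives it.

```r
library(dplyr)

# Assumed sketch of calcGDD(); low and high are hypothetical argument names
calcGDD <- function(temp_vec, low = 10, high = 30) {
  t_max <- min(max(temp_vec), high)
  t_min <- max(min(temp_vec), low)
  max((t_max + t_min) / 2 - low, 0)
}

dynamicCalcGDD <- function(df, group_var, temp_var, ...) {
  # Build the unevaluated call calcGDD(<temp_var>, <whatever was passed in ...>)
  gdd_call <- do.call(
    call,
    c(list("calcGDD", as.name(temp_var)), list(...)),
    quote = TRUE
  )
  df %>%
    group_by_(group_var) %>%
    # lazy_() pairs the call with an environment in which calcGDD can be found
    summarize_(gdd = lazyeval::lazy_(gdd_call, env = environment()))
}

# Hypothetical temperature readings, four per day
temps <- data.frame(
  day  = rep(c(1, 2), each = 4),
  temp = c(8, 15, 25, 32, 10, 12, 18, 20)
)

gdd_default <- dynamicCalcGDD(temps, "day", "temp")
gdd_tuned   <- dynamicCalcGDD(temps, "day", "temp", low = 5)
```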
But *why* does it work? Well, `call` builds an unevaluated function call from a function name and its arguments.

A simple test function reveals more:
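One way to write such a test, in base R only. The name `testCall()` is made up, and `quote = TRUE` is my addition to stop `do.call` from evaluating the column symbol; nothing here runs `calcGDD()`, it only builds the call:

```r
# Hypothetical helper: build (but do not evaluate) a call to calcGDD
testCall <- function(temp_var, ...) {
  do.call(
    call,
    c(list("calcGDD", as.name(temp_var)), list(...)),
    quote = TRUE
  )
}

built <- testCall("temp", low = 5, high = 30)

print(built)   # shows the call as R code
class(built)   # the object is a call, not a number
```

The string in `temp_var` has become a bare symbol inside the call, and the optional arguments are carried along by name.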

So by evaluating `temp_var`, `dynamicCalcGDD` returns: `calcGDD(temp, <list of optional variables>)`. This then gets evaluated by `lazy_` in the summarize call. To annotate each line:
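A step-by-step sketch in base R, with each intermediate value annotated. It assumes the hypothetical `calcGDD()` definition shown in the code, and uses `eval()` against a single day's data as a stand-in for what happens inside the summarize call:

```r
# Assumed sketch of calcGDD(); low and high are hypothetical argument names
calcGDD <- function(temp_vec, low = 10, high = 30) {
  t_max <- min(max(temp_vec), high)
  t_min <- max(min(temp_vec), low)
  max((t_max + t_min) / 2 - low, 0)
}

day1     <- data.frame(temp = c(8, 15, 25, 32))  # one group's worth of data
temp_var <- "temp"                               # column name held in a string
extra    <- list(low = 5)                        # stands in for list(...)

# 1. Turn the string into a symbol, so it refers to a column rather than text
col_sym <- as.name(temp_var)

# 2. Collect the function name, the column symbol, and the optional arguments
args <- c(list("calcGDD", col_sym), extra)

# 3. Hand that list to call(); quote = TRUE keeps the symbol unevaluated
gdd_call <- do.call(call, args, quote = TRUE)

# 4. Evaluate the call with the data supplying the column
gdd <- eval(gdd_call, day1)
```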
If you’re here looking for help, I hope you found it. I know it definitely helped me to write this.