tidyr has been extremely useful to me as an analyst since I discovered it. Specifically, I’m talking about two functions: gather and spread. Often, “secondhand data” (just heard this term today) need to be transformed into long format, or the results of my analyses need to be in wide format so they’re more easily digestible by business people. I just did a deep dive of dplyr, so I thought I’d look into tidyr next!
Here are the functions that I discovered and think will be useful in future analyses:
extract_numeric
- A convenient wrapper for the corresponding regex, which I have not yet committed to memory…
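For reference, here is a minimal base R sketch of roughly what that wrapper amounts to. The regex is my assumption of the pattern involved (drop everything except digits, dots, and minus signs), and `prices` and `to_numeric` are made-up names for illustration. As far as I know, newer tidyr releases deprecate extract_numeric in favor of readr::parse_number, so the sketch avoids calling it directly.

```r
# Hypothetical messy values, e.g. prices typed in by hand
prices <- c("$1,200", "45.5 USD", "-3 units")

# Strip every character that is not a digit, a dot, or a minus sign,
# then convert what is left to numeric (illustrative helper, not tidyr's API)
to_numeric <- function(x) {
  as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}

to_numeric(prices)
```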
fill
- Another good one to deal with a potential quirk of data you get from others. It’s like Excel’s copy down, but for all values in one simple call!
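A small sketch of the "copy down" idea, assuming a made-up `sales` table where the region is only written on the first row of each group:

```r
library(tidyr)

# Hypothetical export: region appears once per group, the rest are NA
sales <- data.frame(
  region = c("East", NA, NA, "West", NA),
  amount = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Fill the NAs in `region` downward with the last non-missing value
filled <- fill(sales, region)
filled
```

The default direction is down, which matches the Excel habit; the `.direction` argument can fill upward instead.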
gather
- The parameters can be a little confusing when you first discover this function. I personally feel it makes the most sense to think of the first parameter (after the data) as what you want to call the column that will hold the names of the columns you’re gathering, the second parameter as what you want to call the column that will hold their values, and lastly you tell the function which columns are not part of the transformation, if any.
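Here is how I read those parameters in practice. This is just a sketch with a made-up `wide` table of quarterly revenue per store:

```r
library(tidyr)

# Hypothetical wide data: one row per store, one column per quarter
wide <- data.frame(
  store = c("A", "B"),
  q1 = c(100, 150),
  q2 = c(110, 160),
  stringsAsFactors = FALSE
)

# key    = name of the new column holding the old column names
# value  = name of the new column holding the cell values
# -store = keep `store` out of the gathering
long <- gather(wide, key = "quarter", value = "revenue", -store)
long
```

Each store now has one row per quarter. Newer tidyr versions point people to pivot_longer for this job, but gather still works.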
replace_na
- The easiest way to replace all NAs with a given value is to select all the indices where is.na is true and assign the value to them, but this function lets you set a different replacement value for each column, which could come in handy!
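To compare the two approaches, here is a short sketch using a made-up `survey` table with one numeric and one text column:

```r
library(tidyr)

# Hypothetical survey results with gaps in both columns
survey <- data.frame(
  score = c(4, NA, 5),
  comment = c("good", NA, NA),
  stringsAsFactors = FALSE
)

# Base R: index the NAs in one column and assign a single value
score_filled <- survey$score
score_filled[is.na(score_filled)] <- 0

# replace_na: a named list gives each column its own replacement value
cleaned <- replace_na(survey, list(score = 0, comment = "none"))
cleaned
```

The replacement has to match the column's type, so a number goes with `score` and a string with `comment`.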
separate_rows
- This function separates bunched-up, delimiter-separated values within a column into their own rows. It’s cool, so I’m mentioning it here, but I’m not sure how often this is needed!
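A quick sketch of what that looks like, assuming a made-up `orders` table where several items are crammed into one comma-separated cell:

```r
library(tidyr)

# Hypothetical orders: multiple items packed into a single cell
orders <- data.frame(
  customer = c("ann", "bob"),
  items = c("pen,ink", "paper"),
  stringsAsFactors = FALSE
)

# Split `items` on commas so that each item gets its own row;
# the other columns are repeated as needed
one_per_row <- separate_rows(orders, items, sep = ",")
one_per_row
```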
spread
- The reverse of gather. The first parameter (after the data) is the name of the column whose values will become the new column names, while the second parameter is the name of the column containing the values that fill those new columns.
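A minimal sketch of going from long back to wide, using a made-up `long` table with one row per store and quarter:

```r
library(tidyr)

# Hypothetical long data: one row per store and quarter
long <- data.frame(
  store = c("A", "A", "B", "B"),
  quarter = c("q1", "q2", "q1", "q2"),
  revenue = c(100, 110, 150, 160),
  stringsAsFactors = FALSE
)

# key   = column whose values become the new column names
# value = column whose values fill those new columns
wide <- spread(long, key = quarter, value = revenue)
wide
```

This gives one row per store with a `q1` and a `q2` column, which is usually the shape business people want to see. Newer tidyr versions recommend pivot_wider for the same task.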
I just used the tidyr reference manual to create this write-up!