How to Rename Factor Levels in R with dplyr
In this tutorial, you will learn how to rename factor levels in R. First, we will use the base functions that are available in R, and then we will use dplyr.
To rename factor levels using levels(), we can assign a character vector with the new names. If we want to recode factor levels with dplyr, we can use the recode_factor() function.
Outline
This R tutorial has the following outline. First, we start by answering some simple questions. Second, we will have a look at what is required to follow this tutorial. Third, we will read an example data set so that we have something to practice on. Fourth, we will go into how to rename factor levels using 1) the levels() function, and 2) the recode_factor() function from the dplyr package.
How do I Rename a Level in R?
One simple method to rename a factor level in R is levels(your_df$Category1)[levels(your_df$Category1) == "A"] <- "B", where your_df is your data frame and Category1 is the column containing your categorical data. This renames the factor level "A" to "B".
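Here is a minimal, self-contained sketch of this one-liner. The data frame your_df, its column Category1, and the values are hypothetical toy data, not taken from the tutorial's data set:

```r
# Hypothetical toy data frame with a factor column Category1
your_df <- data.frame(Category1 = factor(c("A", "C", "A", "D")))

# Rename only the level "A" to "B"; the other levels are left untouched
levels(your_df$Category1)[levels(your_df$Category1) == "A"] <- "B"

levels(your_df$Category1)        # "B" "C" "D"
as.character(your_df$Category1)  # "B" "C" "B" "D"
```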
How do I Rename Factor Levels in R?
The simplest way to rename multiple factor levels is to use the levels() function. For example, to recode the factor levels "A", "B", and "C", you can use the following code: levels(your_df$Category1) <- c("Factor 1", "Factor 2", "Factor 3"). The new names are matched to the existing levels by position, so "A" becomes "Factor 1", "B" becomes "Factor 2", and "C" becomes "Factor 3".
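A minimal sketch of the positional matching, using a hypothetical toy factor. Note that factor() sorts the levels alphabetically, so the order of the levels is not necessarily the order in which the values appear:

```r
# Hypothetical factor; its levels are sorted alphabetically: "A", "B", "C"
category <- factor(c("B", "A", "C", "A"))
levels(category)  # "A" "B" "C"

# New names are matched to the existing levels by position:
# "A" -> "Factor 1", "B" -> "Factor 2", "C" -> "Factor 3"
levels(category) <- c("Factor 1", "Factor 2", "Factor 3")

as.character(category)  # "Factor 2" "Factor 1" "Factor 3" "Factor 1"
```

Because the mapping is positional, it is a good idea to check the current order with levels() before assigning new names.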
In the next section, we will have a look at what is needed to follow this post.
Prerequisites
To follow the examples in this post, you need to download this data set. Furthermore, if you plan on using dplyr and the recode_factor() function, you will need to install this package. Here's how to install an R package:
install.packages("dplyr")
Note that dplyr is a very useful package. You can, for instance, use it to remove columns in R and to calculate descriptive statistics. A quick tip before going on to the tutorial part of the post: you can install dplyr, along with plenty of other very good R packages, by installing the tidyverse package. For example, you will also get ggplot2 for data visualization (e.g., to create a scatter plot in R) and lubridate for handling datetime data (e.g., to extract the year from a datetime). In the next section, we are going to read the example data from the .csv file.
Example Data
Here is how to read a CSV file in R using the read.csv function:
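# Import data (stringsAsFactors = TRUE is needed in R >= 4.0.0,
# otherwise text columns such as TrialType are read as character vectors)
data <- read.csv("flanks.csv", stringsAsFactors = TRUE)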
Note that you need to download the CSV file and store it in the same directory as your R script. Data can, of course, also be imported from other data sources. See the following tutorials for more information:
- How to Read & Write SPSS Files in R Statistical Environment
- R Excel Tutorial: How to Read and Write xlsx files in R
- How to Read and Write Stata (.dta) Files in R with Haven
- Reading SAS Files in R with Haven & sas7bdat
Now, we have the data frame called data. If we want to get information about the variables in the data frame, we can use the str() function:
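If you do not have flanks.csv at hand, the following sketch builds a small stand-in from inline CSV text and calls str() on it. The column names other than TrialType, the level names "congruent" and "incongruent", and all values are assumptions made for illustration. The sketch also shows why stringsAsFactors matters:

```r
# Hypothetical stand-in for flanks.csv (columns and values are made up)
csv_text <- "ID,TrialType,ACC,RT
1,congruent,1,512
2,incongruent,1,634
3,congruent,0,488
4,incongruent,1,701"

# With stringsAsFactors = FALSE (the default since R 4.0.0),
# TrialType is read as a character vector
data_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(data_chr)

# With stringsAsFactors = TRUE, TrialType becomes a factor with two levels
data <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(data)
```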
The output of str() shows that we have a data frame containing 5 columns (i.e., variables). Notice that the first column is probably the index column, but we will leave it as it is. Of particular interest for this post, we have one column with a categorical variable called "TrialType", and this variable has two factor levels.
In the next two examples, we are going to use levels() to change the names of the levels of a categorical variable. First, we simply assign a character vector with the new names. Second, we use a list to rename the factor levels by name.
Example 1: Rename Factor Levels in R with levels()
Here's how to change the names of factor levels using levels():
# Renaming factor levels
levels(data$TrialType) <- c("Con", "InCon")
In the example above, we used the levels() function on the categorical variable we wanted to change and assigned a character vector with the new names. Notice that the new names must be given in the same order as the current levels. If we call levels() again, without assigning anything, we can see that the factor levels have actually been renamed:
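A minimal sketch of this check, using a hypothetical stand-in for data$TrialType whose levels are assumed to be "congruent" and "incongruent" (sorted alphabetically, so "congruent" is renamed to "Con"):

```r
# Hypothetical stand-in for data$TrialType
data <- data.frame(TrialType = factor(c("congruent", "incongruent", "congruent")))

levels(data$TrialType)  # "congruent" "incongruent"

# Example 1: new names are assigned in the order of the current levels
levels(data$TrialType) <- c("Con", "InCon")

# Calling levels() again, without assigning, shows the new names
levels(data$TrialType)  # "Con" "InCon"

# The counts per level are unchanged; only the labels differ
table(data$TrialType)
```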
Note that if we try to assign a character vector containing too few elements (i.e., names), it will not work: R throws an error (i.e., 'Error in `levels<-.factor`(`*tmp*`, value = "Con") : number of levels differs'). A vector with too many names does not raise an error, but the extra names are added as unused levels, which is probably not what you want either. Now that you have renamed the levels of a factor, you might want to remove duplicate rows or columns from the data frame. Furthermore, you can use the t() function to transpose a matrix or data frame in R.
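A small sketch of both cases, using a hypothetical two-level factor. tryCatch() is used here only so the error message can be captured instead of stopping the script:

```r
f <- factor(c("a", "b", "a"))  # hypothetical factor with two levels

# Too few names: R stops with "number of levels differs"
msg <- tryCatch({
  levels(f) <- "x"
  "no error"
}, error = function(e) conditionMessage(e))
msg  # "number of levels differs"

# Too many names: no error, but the extra name becomes an unused level
g <- f
levels(g) <- c("x", "y", "z")
levels(g)        # "x" "y" "z"
as.character(g)  # "x" "y" "x"
```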
In the next example we will rename factor levels by name also using the levels() function.
Example 2: Rename Factor Levels By Name with levels()
Here's how to rename the factor levels by name:
# Recode factor levels by name (each list element is new_name = "old_name")
levels(data$TrialType) <- list(Congruent = "Con", InCongruent = "InCon")
Here's the output from str(), in which we can see that we renamed the levels of the TrialType factor, again:
Note, however, that when we rename factor levels by name, as in the example above, ALL levels need to be present in the list; any level that is not in the list is replaced with NA. That is, you would end up with only a single factor level and NA scores. Not that good.
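The NA behavior can be checked with a minimal sketch. The factor below is hypothetical and simply reuses the level names from Example 1:

```r
# Hypothetical factor with the level names from Example 1
trial_type <- factor(c("Con", "InCon", "Con"))

# All levels listed: both are renamed
all_listed <- trial_type
levels(all_listed) <- list(Congruent = "Con", InCongruent = "InCon")
levels(all_listed)  # "Congruent" "InCongruent"

# "InCon" left out of the list: its values become NA
one_listed <- trial_type
levels(one_listed) <- list(Congruent = "Con")
levels(one_listed)        # "Congruent"
as.character(one_listed)  # "Congruent" NA "Congruent"
```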
Note, if you are planning on carrying out regression analysis and still want to use your categorical variables, you can at this point create dummy variables in R.
Example 3: Rename Factor Levels in R with dplyr's recode_factor()
One of the simplest ways to rename factor levels is by using the recode_factor() function:
# Renaming factor levels with dplyr
library(dplyr)
data$TrialType <- recode_factor(data$TrialType, congruent = "Con", incongruent = "InCon")
In the code example above, we first loaded dplyr so that the recode_factor() function is available in our namespace. On the next line, we assigned the recoded factor back to the column containing our categorical variable. The first argument to recode_factor() is the factor (or character vector) to recode. It is followed by name-value pairs, in which the name is the current level (e.g., congruent) and the value is the new name (e.g., "Con"). Each following pair renames another level.
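A minimal sketch of the old_level = "new name" pattern, assuming dplyr is installed. The factor is a hypothetical stand-in for data$TrialType. Unlike levels(), recode_factor() matches levels by name, and the dplyr documentation states that the levels of the result follow the order of the replacements:

```r
library(dplyr)

# Hypothetical stand-in for data$TrialType with the original level names
trial_type <- factor(c("congruent", "incongruent", "congruent"))

# Each argument after the first is old_level = "new name"
recoded <- recode_factor(trial_type, congruent = "Con", incongruent = "InCon")
levels(recoded)        # "Con" "InCon"
as.character(recoded)  # "Con" "InCon" "Con"

# Listing the replacements in a different order changes the level order
reordered <- recode_factor(trial_type, incongruent = "InCon", congruent = "Con")
levels(reordered)      # "InCon" "Con"
```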
As previously mentioned, dplyr is a very useful package. It can also be used to add a column to an R data frame based on other columns, or simply to add a column to a data frame in R. This can, of course, also be done with other packages that are part of the tidyverse. Note that there are other ways to recode the levels of a factor in R. For instance, forcats, another package that is part of the tidyverse, has the fct_recode() function for this purpose.
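As a minimal sketch of the forcats alternative, assuming forcats is installed (it is installed with the tidyverse): fct_recode() takes new_name = "old_name" pairs, which is the opposite direction of recode_factor(). The factor below is hypothetical:

```r
library(forcats)

# Hypothetical stand-in for data$TrialType
trial_type <- factor(c("congruent", "incongruent", "congruent"))

# fct_recode() uses new_name = "old_name"
recoded <- fct_recode(trial_type, Con = "congruent", InCon = "incongruent")

levels(recoded)        # "Con" "InCon"
as.character(recoded)  # "Con" "InCon" "Con"
```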
Conclusion
In this tutorial, you have learned how to rename factor levels in R. First, we had a look at how to use the levels() function to recode the levels of factors. Second, we had a look at the recode_factor() function from the dplyr package to do the same. I hope you learned something valuable. Please share the tutorial on your social media accounts if you did.
Other R Resources
Here are some other resources that you may find useful when working in R statistical environment:
- How to use %in% in R: 7 Example Uses of the Operator
- Learn How to Generate a Sequence of Numbers in R with :, seq() and rep()
- How to use the Repeat and Replicate functions in R
- More on working with datetime objects in R: How to Extract Day from Datetime in R with Examples and How to Extract Time from Datetime in R – with Examples
- R Resources for Psychologists - for a collection of useful resources
- How to Take Absolute Value in R – vector, matrix, & data frame
Source: https://www.marsja.se/how-to-rename-factor-levels-in-r-dplyr/