The encode and decode commands in Stata allow you to convert string variables to numeric variables (encode) and numeric variables to string variables (decode).
The encode command assigns a number to each distinct string, numbering the strings in alphabetical order starting at 1 (2, 3, 4, etc.), and attaches a value label to the new variable so that each number still displays as its original string.
The decode command will convert a numeric variable to a string variable when that variable has a value label attached to it.
A value label can easily be created in Stata and attached to the variable you want to decode. The value label is necessary because it assigns a string description to each number, while keeping the number intact. Without a value label, Stata does not know what string you want to convert each different number into, and so is unable to decode the variable.
If you simply want the numbers themselves stored as text (so that 1 becomes "1"), use the tostring command instead.
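The difference between decode and tostring can be seen in a minimal sketch. The three-observation dataset and the names gender_str and gender_txt are hypothetical, made up only for illustration:

```stata
* Hypothetical toy data: an unlabeled 0/1 variable named gender
clear
input byte gender
0
1
1
end

* decode needs a value label, so this attempt is expected to fail.
* capture noisily shows the error message without stopping the run.
capture noisily decode gender, generate(gender_str)
display _rc

* tostring needs no value label: it stores the digits 0 and 1 as text
tostring gender, generate(gender_txt)
describe gender gender_txt
list
```

A non-zero return code from display _rc indicates that decode refused the unlabeled variable, while describe should report gender_txt as a string type.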
How to Use:
To encode your string variable to make it a numeric variable:
encode variable_name, generate(new_variable_name)
To attach value labels and then decode your numeric variable to a string variable:
label define label_name 0 "string1" 1 "string2"
label values variable_name label_name
decode variable_name, generate(new_variable_name)
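To see how encode numbers the strings, the following sketch runs it on a small made-up dataset (the four values are hypothetical and not taken from the dataset discussed in this article):

```stata
* Hypothetical toy data, entered deliberately out of alphabetical order
clear
input str10 cancer
"lung"
"breast"
"colon"
"breast"
end

* encode creates cancer2 plus a value label that is also named cancer2
encode cancer, generate(cancer2)

* Show which number was assigned to which string
label list cancer2

* Show the original strings next to the underlying numbers
list cancer cancer2, nolabel
```

Because encode numbers the strings alphabetically rather than in the order they appear, label list should report breast, colon and lung as 1, 2 and 3, and both breast observations receive the same number.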
I have a dataset with the string variable “cancer” which lists different types of cancer, and the numeric binary variable “gender” which records gender as either 0 or 1 for each observation. A snapshot of this dataset is shown below:
I want to convert my cancer string variable into a numeric variable, so that each cancer is represented by a number, using the encode command. I also want to convert my numeric gender variable, in which 0 represents “female” and 1 represents “male”, into a female/male string variable using the decode command.
In the command pane I type the following:
encode cancer, generate(cancer2)
label values cancer2 .
label define sex 0 "female" 1 "male"
label values gender sex
decode gender, generate(gender2)
The dataset now looks like this:
As you can see, the new gender2 variable holds gender as a string rather than as 0/1, and the new cancer2 variable represents each cancer by a number. The original cancer and gender variables are unchanged, because both commands wrote their results to new variables through generate(). In the above code, the line label values cancer2 . detached the value label from cancer2, so when you browse the data you see only the numeric representation of each cancer.
To attach the value label to cancer2 again so you can see the names, use label values cancer2 cancer2. This works because encode, when it generated the new cancer2 variable, also created a value label and gave it the same name as the new variable. If you would rather choose the name of the value label yourself, encode has a label(name) option. To learn more, open the help file with the command help encode.
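As a brief sketch of naming the value label separately and of viewing the numbers without detaching anything, consider the following. The three-row dataset and the label name cancer_types are hypothetical choices for illustration:

```stata
* Hypothetical toy data
clear
input str10 cancer
"lung"
"breast"
"colon"
end

* label() sets the name of the value label; cancer_types is a made-up name
encode cancer, generate(cancer2) label(cancer_types)

* View the underlying numbers while leaving the label attached
list cancer2, nolabel

* Detach the label, then re-attach it using the name chosen above
label values cancer2 .
label values cancer2 cancer_types
list cancer2
```

The nolabel option on list is an alternative to detaching the label when all you want is a quick look at the numeric codes.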
There are many uses for encoding and decoding variables. One use common in medical research is to assign numbers to things like cancers when doing analysis. This reduces the risk of bias, as the researcher does not know which number represents which cancer. Stata makes this easy to do, and provides a simple mechanism for re-attaching the labels to their numeric identifiers when it is time to present the analysis.