I have a file with a variable state that has 55 state abbreviations. I would like to replace them with the full state names. Is there an easy way to do this in R?
2 Answers
You can use a named list built from the state.name and state.abb vectors in the datasets package:
states <- setNames(as.list(datasets::state.name), datasets::state.abb)
states[["NY"]]
[1] "New York"
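To convert a whole column rather than one abbreviation at a time, a vectorized lookup with match() may be simpler. This is a minimal sketch, assuming a hypothetical data frame dat with a column State (neither name comes from the question):

```r
# Hypothetical example data; `dat` and `State` are assumed names
dat <- data.frame(State = c("NY", "CA", "TX", "DC"), stringsAsFactors = FALSE)

# match() returns the position of each abbreviation in state.abb,
# or NA when the abbreviation is not one of the 50 built-in states
dat$StateName <- state.name[match(dat$State, state.abb)]

dat
```

Abbreviations that are missing from state.abb (DC in this sketch) come back as NA instead of raising an error, which makes them easy to spot.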
It works on its own, but when I apply it to my file it gives an error: Error in `$<-.data.frame`(`*tmp*`, State, value = list(AL = "Alabama", : replacement has 50 rows, data has 53. How should I proceed? Thank you! Jun 7, 2020 at 18:00
This is probably because the states list I used (included in base R) only has 50 states. You can get the structure of the list with dput(states) and manually add the three missing entries to create a full list. – Waldi, Jun 7, 2020 at 18:58
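An alternative to editing dput() output by hand is to append the extra entries to a named character vector. The sketch below assumes the missing codes are DC, PR, and GU; the question does not say which three codes are involved, so substitute the ones in your data. The names lookup, extras, dat, and State are hypothetical.

```r
# Named character vector: names are abbreviations, values are full names
lookup <- setNames(state.name, state.abb)

# Assumed extras; replace with the codes actually present in your data
extras <- c(DC = "District of Columbia", PR = "Puerto Rico", GU = "Guam")
lookup <- c(lookup, extras)

# Hypothetical data frame standing in for the file
dat <- data.frame(State = c("AL", "DC", "PR", "WY"), stringsAsFactors = FALSE)

# Indexing by name is vectorized, so the result has one value per row
dat$StateName <- unname(lookup[dat$State])

dat
```

Indexing the lookup with the column returns exactly one value per row, whereas assigning the entire 50-element lookup to the column would be consistent with the "replacement has 50 rows, data has 53" error quoted above.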
If you are doing this for mapping purposes, this longer answer that draws from Waldi's answer above may also be helpful. Given that you have a data frame with state abbreviations and some number to plot, like:
df <- structure(list(abbrev = c("AK", "AL", "AR", "AZ", "CA", "CO",
"CT", "DC", "DE", "FL", "GA", "GU", "HI", "IA", "ID", "IL", "IN",
"KS", "KY", "LA", "MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT",
"NC", "ND", "NE", "NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR",
"PA", "PR", "RI", "SC", "SD", "TN", "TX", "UT", "VA", "VI", "VT",
"WA", "WI", "WV", "WY"), n = c(4L, 20L, 11L, 51L, 98L, 11L, 68L,
37L, 102L, 116L, 32L, 9L, 29L, 72L, 9L, 191L, 60L, 23L, 12L,
32L, 43L, 24L, 8L, 65L, 38L, 44L, 15L, 12L, 59L, 11L, 38L, 48L,
64L, 14L, 21L, 159L, 144L, 34L, 17L, 157L, 18L, 20L, 48L, 13L,
26L, 175L, 10L, 19L, 4L, 96L, 7L, 104L, 15L, 2L)), row.names = c(NA,
-54L), class = c("tbl_df", "tbl", "data.frame"))
You can pull from built-in data to get state name, abbreviation, and even lat/long:
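library(dplyr)  # provides inner_join()
# Combine the built-in state names, center coordinates, and abbreviations
state.info <- inner_join(
  data.frame(state=state.name, long=state.center$x, lat=state.center$y, stringsAsFactors=FALSE),
  data.frame(state=state.name, abbrev=state.abb, stringsAsFactors=FALSE),
  by="state")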
Then you can join the two datasets so that each state abbreviation is supplemented by the full name as well as the latitude and longitude:
> inner_join(df, state.info, by="abbrev")
# A tibble: 50 x 5
  abbrev     n state        long   lat
  <chr>  <int> <chr>       <dbl> <dbl>
1 AK         4 Alaska      -127.  49.2
2 AL        20 Alabama     -86.8  32.6
3 AR        11 Arkansas    -92.3  34.7
4 AZ        51 Arizona     -112.  34.2
5 CA        98 California  -120.  36.5
6 CO        11 Colorado    -106.  38.7
7 CT        68 Connecticut -72.4  41.6
8 DE       102 Delaware    -75.0  38.7
9 FL       116 Florida     -81.7  27.9
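Note that inner_join keeps only matching rows, which is why the result has 50 rows although df has 54. A quick base-R check, sketched here on a small hypothetical subset of the data (small, info, unmatched, and kept are assumed names), lists the abbreviations with no match and shows a merge that keeps them:

```r
# Hypothetical four-row subset of the data frame above
small <- data.frame(abbrev = c("AK", "DC", "GU", "NY"),
                    n = c(4L, 37L, 9L, 159L),
                    stringsAsFactors = FALSE)

# Lookup table built from the built-in datasets
info <- data.frame(abbrev = state.abb, state = state.name,
                   stringsAsFactors = FALSE)

# Abbreviations that are not among the 50 built-in states
unmatched <- setdiff(small$abbrev, state.abb)

# all.x = TRUE keeps every row of `small`; unmatched rows get NA for state
kept <- merge(small, info, by = "abbrev", all.x = TRUE)

unmatched
kept
```

With dplyr, anti_join() and left_join() serve the same two purposes.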
VLOOKUP in Excel on your data. It takes less than 2 minutes.