A common knitr issue on Windows
Running R scripts on a Windows machine is a dive into encoding hell.
In effect, your non-English data most likely contains characters like Ä, ü, è or š, or even 语言.
In all cases, the only serious way of dealing with these, in fact with any data in an international context, is adopting UTF-8 encoding.
This is why recent R packages like knitr or quanteda work with UTF-8 internally. The problem is: Windows doesn’t. UTF-8 has been around since the early 1990s, yet your Windows 10 operating system, unlike Linux and macOS, most likely runs a Latin-1 or other Windows code page locale behind the scenes. I gave up programming R on Windows for that very reason and have happily written my scripts on Ubuntu ever since, whenever I can. Chances are, though, that your corporate environment runs on a fleet of Dell machines with Windows installed and you cannot change your OS.
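To see which situation you are in, you can ask R which locale the session is using. The lines below are a minimal sketch using only base R; the result depends on your machine and R version, so no output is shown.

```r
# Locale used for character handling in this session
Sys.getlocale("LC_CTYPE")

# l10n_info() reports whether the session runs in a UTF-8 or a Latin-1 locale;
# on Windows the returned list also includes the active code page
info <- l10n_info()
info[["UTF-8"]]    # TRUE if the native encoding is UTF-8
info[["Latin-1"]]  # TRUE if the native encoding is Latin-1
```

If the first value is FALSE, any non-ASCII string without an explicit encoding mark is interpreted in the local code page, which is where the trouble described here starts.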
Bottom line: if you try to knit an Rmd script to html or shiny, you are very likely to see the following error:
Error in eval(expr, envir, enclos) : unknown column ‘x’
By now, you have already tried everything, like setting the encoding of your data with iconv() or Encoding(), or a stringr wrapper around such functions. You examine your character vector and see that some strings have been converted to UTF-8 while others remain “unknown”.
This is normal behavior, even though it looks like a bug. In effect, as mentioned here, Encoding() “cannot distinguish ASCII from UTF-8 and the bit will not stick even if you set it”:
> txt <- "€ euro"
> Encoding(txt)
[1] "UTF-8"
> txt2 <- "euro"
> Encoding(txt2)
[1] "unknown"
> Encoding(txt2) <- "UTF-8"
> Encoding(txt2)
[1] "unknown"
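The “unknown” mark on a pure ASCII string is harmless in itself, since ASCII is a subset of UTF-8. A rough way to tell the harmless cases from the problematic ones is sketched below with base R only. The vector `x` is a made-up example, and `is_ascii` is just an illustrative variable name.

```r
# Made-up example: one ASCII string, one UTF-8 string, one Latin-1 string
latin1_str <- "caf\xe9"               # raw byte 0xE9 is "é" in Latin-1
Encoding(latin1_str) <- "latin1"      # declare how that byte should be read
x <- c("euro", "\u20ac euro", latin1_str)

# Convert everything to UTF-8; ASCII strings keep the "unknown" mark
x_utf8 <- enc2utf8(x)
Encoding(x_utf8)                      # "unknown" "UTF-8" "UTF-8"

# All elements are now valid UTF-8, including the "unknown" one
validUTF8(x_utf8)                     # TRUE TRUE TRUE

# Strings made only of bytes below 128 are ASCII, hence safe as "unknown"
is_ascii <- vapply(
  x_utf8,
  function(s) all(as.integer(charToRaw(s)) < 128L),
  logical(1),
  USE.NAMES = FALSE
)
is_ascii                              # TRUE FALSE FALSE
```

This only helps when the non-ASCII strings carry a correct encoding mark to begin with; strings whose bytes are Latin-1 but are marked “unknown” or UTF-8 are the ones that still cause problems.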
The workaround
This is really terrible, but there is a workaround. A very ugly one, but one that does work: export your data.frame to a temporary CSV file and reimport it with data.table::fread(), specifying Latin-1 as the source encoding.
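library(data.table)
# df holds character columns with mixed UTF-8, Latin-1 and "unknown" encodings
df <- your_data_frame_with_mixed_utf8_or_latin1_and_unknown_str_fields
# write to a temporary CSV, then read it back declaring Latin-1 as the source encoding
fwrite(df, "temp.csv")
your_clean_data_table <- fread("temp.csv", encoding = "Latin-1")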
Tested with satisfaction on a Windows 10 machine.
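If you need the workaround in more than one place, it can be wrapped in a small function that cleans up after itself. This is only a sketch: `reencode_via_csv` is a hypothetical helper name, not part of data.table, the sample data is made up, and it assumes that fwrite() writes the string bytes unchanged with its default settings.

```r
library(data.table)

# Hypothetical helper (not a data.table function): round-trip a data frame
# through a temporary CSV and re-read it with a declared source encoding
reencode_via_csv <- function(df, encoding = "Latin-1") {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)    # delete the temporary file on exit
  fwrite(df, tmp)
  fread(tmp, encoding = encoding)
}

# Made-up sample data containing Latin-1 bytes
city <- c("Z\xfcrich", "Gen\xe8ve")
Encoding(city) <- "latin1"
df <- data.frame(id = 1:2, city = city, stringsAsFactors = FALSE)

clean <- reencode_via_csv(df)
Encoding(clean$city)                  # inspect the marks after the round trip
```

Using tempfile() instead of a fixed "temp.csv" avoids overwriting a file of the same name in the working directory.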