Repair a pandoc-generated LaTeX table with R

Pandoc is a great piece of software, but it is not always kind to HTML tables when converting to LaTeX. In particular, tables containing <td> or <th> elements with rowspan or colspan attributes end up as a sequence of plain lines of text, not embedded in a table environment like longtable and devoid of both line endings (\\) and column separators (&).

Here is a short piece of R code to help you repair this.

First, add LaTeX table line endings (\\) wherever necessary, i.e. at the end of every line that closes a row in the original HTML table.

Then select the whole block of text, copy it to the clipboard (Ctrl-C or Command-C), and run the following chunk of R code:

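library(clipr)    # read_clip() and write_clip()
library(stringr)  # str_replace_all() and the %>% pipe
library(stringi)  # stri_replace_all_fixed()

data <- read_clip()
data <- data[data != ""]  # drop empty lines
# append " & " to every line that does not end in a backslash, then
# put a newline after each row ending (\\) and join everything into one string
data <- str_replace_all(data, "([^\\\\])$", "\\1 & ") %>%
  stri_replace_all_fixed("\\\\", "\\\\\n") %>%
  paste(., collapse = " ")
# wrap the rows in a longtable; adjust the column spec (ten l columns here) to your table
data <- paste("\\begin{samepage}\n\\begin{longtable}{llllllllll}\n\\toprule\n\\midrule\n\\endhead\n", data, "\\bottomrule\n\\end{longtable}\n\\end{samepage}")
write_clip(data)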
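
To see what the middle of that chunk does without touching the clipboard, here is a minimal sketch that applies the same two replacements to a character vector. The helper name cells_to_rows and the sample input are made up for illustration; they are not part of the chunk above.

```r
library(stringr)  # str_replace_all() and the %>% pipe
library(stringi)  # stri_replace_all_fixed()

# Hypothetical helper: the same steps as above, applied to a character vector
cells_to_rows <- function(lines) {
  lines <- lines[lines != ""]  # drop empty lines
  str_replace_all(lines, "([^\\\\])$", "\\1 & ") %>%  # cell separator after non-final cells
    stri_replace_all_fixed("\\\\", "\\\\\n") %>%      # newline after each row ending
    paste(collapse = " ")
}

# Made-up input: a 2 x 2 table, one cell per line, with \\ already added by hand
sample_lines <- c("Name", "Value \\\\", "", "alpha", "1 \\\\")
cat(cells_to_rows(sample_lines))
```

Each line that does not end in a backslash gets a column separator, and each row ending is followed by a line break, so the printed result should contain one table row per line.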

Then go back to your LaTeX file and paste the content of the clipboard (i.e., press Ctrl-V or Command-V) wherever you want, most probably as a replacement for the text you have just copied.

Manual editing might still be necessary, but the most tedious part is done.
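
One likely manual edit is the column specification, since the code above always writes ten l columns. As a rough sketch, the number of columns could be guessed from the row with the most & separators. The helper column_spec below is hypothetical and uses base R only; it assumes the cells contain no escaped ampersands (\&), which it would miscount.

```r
# Hypothetical helper: guess a longtable column spec from the converted rows
column_spec <- function(body) {
  # split the body at each row ending (\\)
  rows <- strsplit(body, "\\\\", fixed = TRUE)[[1]]
  # count the & separators in each row
  n_sep <- lengths(regmatches(rows, gregexpr("&", rows, fixed = TRUE)))
  # a row with n separators has n + 1 columns
  strrep("l", max(n_sep) + 1)
}

# Made-up body with two rows of two cells each
body <- "Name & Value \\\\\n alpha & 1 \\\\\n"
column_spec(body)
```

The result can then be pasted between the braces after \begin{longtable} in place of the fixed run of l characters.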
