Triggered by the RStudio blog article about feather, I did the one-line install and compared the results on a data frame of 19 million rows. The first results indeed look promising:
    # build the package
    > devtools::install_github("wesm/feather/R")
    # load the package
    > library(feather)

    # load an existing data frame (19 million rows with batch job execution results)
    > load("batch-12-2015.rda")

    # write it in feather format...
    > write_feather(dt, "batch-12-2015.feather")

    # ... which is not compressed, hence larger on disk
    > system("ls -lh batch-12-2015.*")
    -rw-r--r-- 1 dirkd staff 813M  7 Apr 11:35 batch-12-2015.feather
    -rw-r--r-- 1 dirkd staff 248M 27 Jan 22:42 batch-12-2015.rda

    # a few repeat reads on an older macbook with ssd
    > system.time(load("batch-12-2015.rda"))
       user  system elapsed
      8.984   0.332   9.331
    > system.time(dt1 <- read_feather("batch-12-2015.feather"))
       user  system elapsed
      1.103   1.094   7.978
    > system.time(load("batch-12-2015.rda"))
       user  system elapsed
      9.045   0.352   9.418
    > system.time(dt1 <- read_feather("batch-12-2015.feather"))
       user  system elapsed
      1.110   0.658   3.997
    > system.time(load("batch-12-2015.rda"))
       user  system elapsed
      9.009   0.356   9.393
    > system.time(dt1 <- read_feather("batch-12-2015.feather"))
       user  system elapsed
      1.099   0.711   4.548

So, after the first read, feather takes around half the elapsed time (about 4.0 to 4.5 s versus 9.4 s) and roughly one eighth of the user CPU time (1.1 s versus 9.0 s), since there is nothing to decompress. Of course these measurements come from the file system cache rather than the laptop SSD, but the reduction in wall time is nice for larger-volume loads.
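To make the repeat reads easier to reproduce, the comparison can be wrapped in a small helper. This is only a sketch under a few assumptions: `time_reads` is a hypothetical helper that is not part of feather, the small synthetic data frame merely stands in for the real batch results, and the feather part runs only if the package happens to be installed.

```r
# Hypothetical helper (not part of feather): call a read function several
# times and collect user, system and elapsed seconds from system.time().
time_reads <- function(read_fun, times = 3) {
  timings <- vapply(
    seq_len(times),
    function(i) {
      unclass(system.time(read_fun()))[c("user.self", "sys.self", "elapsed")]
    },
    FUN.VALUE = c(user.self = 0, sys.self = 0, elapsed = 0)
  )
  # One row per repetition, one column per timing component.
  as.data.frame(t(timings))
}

# Small synthetic data frame standing in for the 19-million-row table.
set.seed(1)
n <- 1e5
dt <- data.frame(
  job_id  = seq_len(n),
  status  = sample(c("ok", "failed", "timeout"), n, replace = TRUE),
  runtime = runif(n),
  stringsAsFactors = FALSE
)

# save() compresses by default, like the .rda file in the post.
rda_file <- tempfile(fileext = ".rda")
save(dt, file = rda_file)

# Load into a throwaway environment so the global `dt` is not overwritten.
rda_times <- time_reads(function() load(rda_file, envir = new.env()))
print(rda_times)

# Only attempt the feather comparison if the package is available.
if (requireNamespace("feather", quietly = TRUE)) {
  feather_file <- tempfile(fileext = ".feather")
  feather::write_feather(dt, feather_file)
  feather_times <- time_reads(function() feather::read_feather(feather_file))
  print(feather_times)
  # Compare on-disk sizes in bytes (rda first, feather second).
  print(file.size(c(rda_file, feather_file)))
}
```

On a data frame this small the timings are dominated by noise, so the numbers will not resemble the ones above; the point is only the shape of the measurement, with the first repetition kept separate from the later, cache-warm ones.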
...