http://www.barik.net/posts/fuse-a-reproducible-extendable-internet-scale-corpus-of-spreadsheets/