Write an additional implementation of A0 using Hadoop and MapReduce, then re-evaluate all implementations in the same way as in A1. Use a pseudo-distributed cluster to evaluate the MapReduce implementation.
Collect and analyze execution-time information of all implementations. Use the following big corpus (SHA512).
Prepare a report using R Markdown highlighting the differences in the execution profile of the variants. In particular, compare the difference in execution profiles of the Hadoop/MR vs other variants and comment on the reasons for performance increases or decreases.
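One possible way to collect the execution times is a small harness that runs a variant several times and prints the timings as CSV, which report.Rmd can then read. This is only a sketch: the class name TimingHarness, the CSV column names, and the placeholder workload are assumptions made for illustration and are not prescribed by the assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the assignment: times a task and emits CSV.
class TimingHarness {

    /** Runs the task `runs` times and returns each wall-clock duration in milliseconds. */
    static List<Double> time(Runnable task, int runs) {
        List<Double> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            millis.add(elapsed / 1e6); // nanoseconds to milliseconds
        }
        return millis;
    }

    /** Formats the timings as CSV rows with the (assumed) columns variant,run,millis. */
    static String toCsv(String variant, List<Double> millis) {
        StringBuilder sb = new StringBuilder("variant,run,millis\n");
        for (int i = 0; i < millis.size(); i++) {
            sb.append(String.format(Locale.ROOT, "%s,%d,%.3f\n", variant, i + 1, millis.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with a call into one of your variants.
        Runnable placeholder = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        };
        System.out.print(toCsv("sequential", time(placeholder, 5)));
    }
}
```

Redirecting the output to a file (for example from the run rule) gives the report one row per run and variant. For the Hadoop variant, timing the whole job submission from the outside is probably more representative than timing a single method, since job startup overhead is part of the execution profile.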
Deliver the code of the implementation and the performance report via a private repository on CCS GitHub or GitHub and share the repository with the instructors and TAs:
github.ccs.neu.edu: share it with aviralgoel, samarthshetty1990, and ksiek.
github.com: share it with aviralg, kondziu, janvitek, and shettysamarth.
Repository name: pdpmr-f17-your-name-a2 (replace your-name with your name in lowercase letters with dashes between words).
Required files:
src/ (directory containing the sources of the implementation)
README.md (Markdown file containing the description of the implementations)
Makefile (configuration file for the make command with the following rules: build, which builds all the implementations; run, which runs the variants and generates the report; and all, which builds and runs. Be sure that build works on somebody else's machine)
report.Rmd (the report as described in the previous section)
report.html (an HTML rendering of report.Rmd)
input/books/ (a directory containing input files used in the report for A0/A1; please gzip all text files to save space)
input/big-corpus/ (an empty directory. Keep your copy of the big corpus there, but do not push it to git. A code reviewer will put their own copy of the corpus there too.)
Submissions not adhering to the prescribed structure will be penalized.
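Because the text files under input/books/ are gzipped, the non-Hadoop variants need to decompress them while reading. A minimal sketch using the standard java.util.zip classes follows. The class name GzipLines and the line-counting task are assumptions for illustration only, and UTF-8 encoding of the books is also assumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the assignment: reads a gzipped text file line by line.
class GzipLines {

    /** Counts the lines of a gzip-compressed text file (UTF-8 assumed). */
    static long countLines(Path gzFile) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // Self-contained demo: write a small gzipped file, then read it back.
        Path tmp = Files.createTempFile("sample", ".txt.gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
            out.write("first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(countLines(tmp));
        Files.delete(tmp);
    }
}
```

Hadoop's text input format generally decompresses .gz inputs on its own based on the file extension, so the MapReduce variant may not need this step. Note that gzip files are not splittable, so each compressed file is processed by a single mapper, which is worth keeping in mind when interpreting the timings.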