Write an additional implementation of A0 using Hadoop MapReduce, then re-evaluate all implementations in the same way as in A1. Use a pseudo-distributed cluster to evaluate the MapReduce implementation.

Collect and analyze execution-time information of all implementations. Use the following big corpus (SHA512).

Prepare a report using R Markdown that highlights the differences in the execution profiles of the variants. In particular, compare the execution profile of the Hadoop/MR variant with those of the other variants, and comment on the reasons for any performance increases or decreases.
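As a rough illustration of how the execution-time information could be collected, the sketch below times an arbitrary command several times and writes the results to a CSV file that an R Markdown report could load with read.csv. It is only one possible approach: the helper names (time_command, write_timings), the file name timings.csv, and the placeholder commands are assumptions made for this example, not part of the assignment, and the real launch commands of each variant must be substituted.

```python
import csv
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are not timed silently.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def write_timings(path, results):
    """Write one CSV row per (variant, run) so the report can read it as a data frame."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["variant", "run", "seconds"])
        for variant, timings in results.items():
            for run, seconds in enumerate(timings, start=1):
                writer.writerow([variant, run, f"{seconds:.6f}"])


if __name__ == "__main__":
    # Placeholder commands (assumptions): replace with the real command of each variant.
    variants = {
        "placeholder-a": [sys.executable, "-c", "pass"],
        "placeholder-b": [sys.executable, "-c", "sum(range(10**5))"],
    }
    results = {name: time_command(cmd) for name, cmd in variants.items()}
    write_timings("timings.csv", results)
    for name, timings in results.items():
        print(name, "median seconds:", round(statistics.median(timings), 4))
```

Wall-clock time measured this way includes process and JVM start-up, which is likely to matter when comparing a Hadoop job with the other variants, so it may be worth reporting several runs per variant rather than a single number.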

Deliverables

Deliver the code of the implementation and the performance report via a private repository on CCS GitHub or GitHub, and share the repository with the instructors and TAs:

Repository name: pdpmr-f17-your-name-a2 (replace your-name with your name in lowercase letters with dashes between words).

Required files:

Submissions not adhering to the prescribed structure will be penalized.
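The repository naming rule given under Deliverables can be checked with a small helper. This is only an illustrative sketch: the function repo_name and the sample name "Jane Doe" are made up for the example, and it assumes the words of the name are separated by whitespace.

```python
def repo_name(full_name, prefix="pdpmr-f17", suffix="a2"):
    """Build the repository name: lowercase words of the name joined by dashes."""
    words = full_name.lower().split()
    return "-".join([prefix, *words, suffix])


if __name__ == "__main__":
    # "Jane Doe" is a made-up example name.
    print(repo_name("Jane Doe"))  # prints pdpmr-f17-jane-doe-a2
```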