This is a detailed article on how to avoid failures in execution of big data analytics projects.
Build AWS parallel processing infrastructure for an embarrassingly parallel problem. Goal is to invoke the same R script in many simultaneous runs with high computational speed per run. Small datasets are inputted from and outputted to an existing S3 bucket. Open to suggestions (EC2 Spot Instances, AWS Parallel Cluster, HCP).