We are the core test team at a top software company (*racle), responsible for testing all of its products.
So if you need data retrieved from a website, we are a great choice.
Depending on how complex the website is, we offer:
1. For simple websites: a plain Python solution. You can have your data within 2 or 3 hours.
2. For websites that use techniques to block crawlers: we use Selenium, which automatically drives a real browser (Firefox/IE/Chrome) and stores the data.
3. For more complex websites that require login or generate pages dynamically with AJAX, or for very large sites like Amazon: we have a Java-based distributed crawler system (built on Storm or Hadoop) that can run more than 1,000 crawler instances.
4. For websites that use special anti-crawler technology, such as the *racle EBS system, where common crawler techniques fail: we use an OATS (OpenScript) solution. It is more powerful than Selenium and also supports distributed crawling, but it is a paid product and very expensive. If you only need the data, we will buy it ourselves and deliver the data to you.
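To give a feel for the simple Python solution in option 1, here is a minimal sketch of the parsing step, using only the standard library. The sample HTML and the `extract_links` helper are illustrative, not from any real site; in a real job the page would be fetched first with `urllib.request` or the `requests` library.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html: str) -> list:
    """Return all hyperlink targets found in an HTML document."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# In a real job the HTML would come from urllib.request.urlopen(url).read();
# a hard-coded sample keeps this sketch self-contained.
sample = '<html><body><a href="/page1">one</a><a href="/page2">two</a></body></html>'
print(extract_links(sample))  # ['/page1', '/page2']
```

For real sites, a third-party parser such as BeautifulSoup makes the extraction logic much shorter, but the idea is the same.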
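The distributed Java system in option 3 is far too large for a short example, but the core idea of crawling many pages in parallel can be sketched in Python with a stubbed-out fetcher. `fetch_page` here is a placeholder, not a real network call; a Storm or Hadoop deployment spreads the same work across machines instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_page(url: str) -> str:
    """Placeholder fetcher: a real crawler would issue an HTTP request here."""
    return f"<html>content of {url}</html>"

def crawl(urls, workers=8):
    """Fetch many URLs concurrently. A distributed crawler applies the same
    pattern, but the worker pool spans >1000 instances on a cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch_page, urls)))

pages = crawl([f"https://example.com/item/{i}" for i in range(5)])
print(len(pages))  # 5
```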
We can deliver the data in txt, Excel, MySQL, or MongoDB format, and if you use Amazon AWS or another cloud system, we can also help upload the data there.
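As a sketch of the plain-text delivery format, scraped rows can be serialized with Python's standard csv module. The rows below are made up for illustration; a real delivery would write to a file, or load the same rows into MySQL or MongoDB instead.

```python
import csv
import io

# Hypothetical scraped records, one dict per row.
rows = [
    {"title": "Widget A", "price": "9.99"},
    {"title": "Widget B", "price": "19.99"},
]

def to_csv(rows) -> str:
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(rows))
```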