Hi,
You want to replace the current scraping method with "on-demand" scraping. For example, if only 10 users view a product over 3 days, you scrape 10 times instead of 250,000. OK, that makes sense.
- "...scrape and update DB, there would be a considerable delay"
What counts as "considerable"? The scrape might take only 0.5 seconds, and the DB update even less.
- "...after DB update, info not updated instantly due to cache and indexes"
You need to reindex the product and clear its cached info, but this can be done for a single product (quicker) rather than for all products.
- "Amazon blocking or amazon throttling"
This might happen occasionally. I'd set a timeout: if there is no response from Amazon within 3 seconds, just show the old price (with the message "current price is being refreshed, please check again soon"). A log file will track the timeouts.
- "Proxies with IP rotation to avoid being detected"
You need to subscribe to a reliable proxy service.
- "Use python with wget, scrapy, urllib and other tools..."
I'll use one of these tools if you want, but I prefer regular expressions because they're faster.
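To make the timeout fallback and the regular-expression approach above concrete, here is a minimal sketch in Python. The function names, the price pattern and the logger name are my own illustrative assumptions, not your existing code, and a real Amazon page would need a pattern tuned to its actual markup.

```python
import logging
import re
import urllib.request

log = logging.getLogger("price_refresh")  # hypothetical logger name

# Assumed pattern: matches a dollar price such as "$1,299.99" in the HTML.
PRICE_RE = re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d{2})?)")

STALE_MESSAGE = "Current price is being refreshed, please check again soon."


def fetch_html(url, timeout=3.0):
    """Download a page; the timeout applies to each blocking socket operation."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_price(html):
    """Return the first price found in the HTML as a float, or None."""
    match = PRICE_RE.search(html)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def refresh_price(url, old_price, fetch=fetch_html, timeout=3.0):
    """Return (price, message); fall back to old_price on timeout or error."""
    try:
        html = fetch(url, timeout=timeout)
    except OSError as exc:  # covers timeouts and urllib's URLError
        log.warning("price fetch failed for %s: %s", url, exc)
        return old_price, STALE_MESSAGE
    price = extract_price(html)
    if price is None:
        log.warning("no price found at %s", url)
        return old_price, STALE_MESSAGE
    return price, None
```

Passing the fetch function as a parameter is only a convenience so a proxy-aware downloader can be swapped in later without touching the fallback logic.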
As for experience, I've previously worked on the aspects you require: Magento caching, indexing and custom modules, as well as Amazon AWS, scraping and proxies.
I'm available to discuss any of the above further, of course. Thanks for considering my services.
Stan
All my freelancer.com feedback: http://www.freelancer.com/u/kilobytes.html