A web scraping program written in C# to crawl multiple competitors' websites and record time-stamped product pricing to MSSQL tables with minimal impact on the competitors' website performance.
9/17/10 @ 10:00am US Central: Updated project with change from PHP and MySQL to C# and Microsoft SQL Server 2005 (with 2008 compatibility).
## Deliverables
These are the websites that will need to be parsed for product ID and pricing (in order of importance): [login to view URL], [login to view URL], [login to view URL], and hamcity.com. UPDATE: [login to view URL] and [login to view URL] will be excluded from this initial project but may be added as a separate project later.
Since potentially thousands of individual product IDs are involved and we want to run this on a daily basis, we'd prefer not to hit each site with an individual search for each product ID, but rather to formulate and optimize generic search terms that each return many results.
It may not be necessary to obtain their entire list of products, because we are mostly interested in the prices of products that both we and they sell. If necessary, we could compile a list of the manufacturer part numbers whose prices we want to compare with our own.
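To illustrate the idea of a few broad searches filtered against a list of part numbers, here is a minimal sketch. It is written in Python for brevity only; the deliverable itself would be C#. Everything named here is an assumption for illustration: the watch list keyed by manufacturer, the `search` callable that stands in for a site-specific search-and-parse routine, and the five-second default delay.

```python
import time
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical shape: one search result is a (part_number, price) pair.
Result = Tuple[str, float]


def scrape_by_generic_terms(
    watch_list: Dict[str, Set[str]],
    search: Callable[[str], Iterable[Result]],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Result]:
    """Run one broad search per manufacturer and keep only watched parts.

    watch_list maps a generic search term (here, a manufacturer name) to
    the part numbers we also sell. `search` is a placeholder for whatever
    fetches and parses one results page on a competitor's site.
    """
    matched: List[Result] = []
    for i, (term, wanted_parts) in enumerate(sorted(watch_list.items())):
        if i > 0:
            # Pause between requests so the crawl stays light on the server.
            sleep(delay_seconds)
        for part_number, price in search(term):
            if part_number in wanted_parts:
                matched.append((part_number, price))
    return matched
```

With two manufacturers on the watch list, this issues two requests regardless of how many part numbers are being tracked, and discards any returned products that are not on the list.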
We want to keep historical records of pricing, so every time we obtain pricing from a competitor's site, additional rows will be added to the table(s), rather than overwriting existing rows.
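The append-only requirement could look roughly like the sketch below. It uses Python with an in-memory SQLite database purely as a stand-in so the example is self-contained; the actual project targets C# and SQL Server. The table name, column names, and the choice to store prices as integer cents are all assumptions, not part of the specification.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per product per scrape, never updated.
SCHEMA = """
CREATE TABLE IF NOT EXISTS price_history (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    site        TEXT    NOT NULL,
    product_id  TEXT    NOT NULL,
    price_cents INTEGER NOT NULL,
    scraped_at  TEXT    NOT NULL
)
"""


def record_prices(conn, site, prices, scraped_at=None):
    """Append one time-stamped row per (product_id, price_cents) pair."""
    stamp = (scraped_at or datetime.now(timezone.utc)).isoformat()
    conn.executemany(
        "INSERT INTO price_history (site, product_id, price_cents, scraped_at) "
        "VALUES (?, ?, ?, ?)",
        [(site, product_id, cents, stamp) for product_id, cents in prices],
    )
    conn.commit()


def price_history(conn, site, product_id):
    """Return every recorded (price_cents, scraped_at) pair, oldest first."""
    cursor = conn.execute(
        "SELECT price_cents, scraped_at FROM price_history "
        "WHERE site = ? AND product_id = ? ORDER BY scraped_at, id",
        (site, product_id),
    )
    return cursor.fetchall()
```

Because `record_prices` only ever issues `INSERT` statements, scraping the same product on two different days leaves two rows, which is what makes the price history recoverable later.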
Update: To further clarify, the scope of this project does not include an interface for retrieving, viewing, or analyzing the database. However, error logging and email notification upon error will likely be necessary. <s>Each scraping job (including separate jobs for sites that require multiple scrapes to minimize server load) should be launchable from a URL like: http://{server}/{path}/[login to view URL]</s>
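One possible shape for the error logging and email notification mentioned above is sketched here, again in Python for illustration only rather than the project's C#. The job name, the addresses, and the injected `send` callable (which in practice might wrap an SMTP client) are hypothetical.

```python
import logging
from email.message import EmailMessage
from typing import Callable

log = logging.getLogger("scraper")


def run_job(
    name: str,
    job: Callable[[], None],
    send: Callable[[EmailMessage], None],
    to_addr: str = "alerts@example.com",  # placeholder address
) -> bool:
    """Run one scraping job; on failure, log it and send a notification."""
    try:
        job()
        return True
    except Exception as exc:
        # Record the full traceback in the log.
        log.exception("scrape job %s failed", name)
        # Build a short notification email describing the failure.
        msg = EmailMessage()
        msg["Subject"] = f"Scrape job failed: {name}"
        msg["From"] = "scraper@example.com"  # placeholder address
        msg["To"] = to_addr
        msg.set_content(f"{type(exc).__name__}: {exc}")
        send(msg)
        return False
```

Wrapping each site's job this way means a failure on one site is logged and reported without stopping the jobs for the other sites.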