This is part of the description.
The program starts from the output of an extraction program,
like this:
"Company","Address","Telephone","Mobile","Website","Email"
"Apotheek Centrum Schelle","Provinciale Steenweg 95 2627 Schelle","03 887 54 72","","",""
"De Lindeboom","Nationalestraat 119 2000 Antwerpen","","","",""
"Morel E","Brusselsesteenweg 298 2800 Mechelen","015 41 55 65","","",""
"Van De Mierop-Mestdagh BVBA","Lindenlaan 66 2340 Beerse","014 61 13 64","","",""
"Hooijmaaijer J","Clemenceaustraat 43 2860 Sint-Katelijne-Waver","015 21 22 93","","",""
"Horsten L NV","Vrijheid 98 2320 Hoogstraten","03 314 57 24","","",""
"Vandeweyer R","Oranjestraat 94 2060 Antwerpen","03 233 82 75","","",""
"Danckaert J","Ter Heydelaan 173-175 2100 Deurne (Antwerpen)","03 324 95 30","","",""
"Vermylen K BVBA","Leo Kempenaersstraat 7 2223 Schriek (Heist-Op-Den-Berg)","015 23 33 70","","",""
"Onze Apotheek cv","Antwerpsesteenweg 146 Bus 1 2500 Lier","","","",""
"Peleman","Schipstraat 1 2870 Puurs","03 889 23 63","","",""
"Ter Borcht BVBA","Bernard van Orleyplein 5 2650 Edegem","03 440 64 91","","",""
"De Lindeboom-Apotheek","Nationalestraat 119 2000 Antwerpen","","","",""
The program must work in different steps.
The first step is checking whether the business has a site (if the input file already has a URL for the listing, it can go immediately
to step 2).
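Reading the input and deciding where a listing starts could look roughly like the sketch below. It assumes Python and the standard csv module. The Listing class, its field names, and the first_step helper are illustrative names, not part of the description. Rows with fewer than six fields are padded, because some of the examples in this description have only five.

```python
import csv
import io
from dataclasses import dataclass

# Column order taken from the header line of the sample input.
FIELDS = ["company", "address", "telephone", "mobile", "website", "email"]


@dataclass
class Listing:
    company: str = ""
    address: str = ""
    telephone: str = ""
    mobile: str = ""
    website: str = ""
    email: str = ""


def read_listings(text, has_header=True):
    """Parse the quoted CSV text into Listing objects."""
    rows = list(csv.reader(io.StringIO(text)))
    if has_header and rows:
        rows = rows[1:]
    listings = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        # Pad short rows with empty strings and ignore extra columns.
        padded = (row + [""] * len(FIELDS))[:len(FIELDS)]
        listings.append(Listing(*[value.strip() for value in padded]))
    return listings


def first_step(listing):
    """Return 2 when the listing already has a URL, otherwise 1."""
    return 2 if listing.website else 1
```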
Example: "Pica Pica","Hofkwartier 20 2200 Herentals","014 22 02 55","",""
Try the following URLs: [login to view URL] and [login to view URL]. Both have a site, so each must be
crawled to look for the address. In this example [login to view URL] is the right one. The program can
use Pica Pica as the business name in the output and must add the URL to the output.
If the business name contains bvba, nv, a single letter, or 't, that part must not be used in the URL.
Example: for Pica Pica NV only try [login to view URL] or [login to view URL], not [login to view URL].
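A possible way to build the URLs to try is sketched below in Python. The URL patterns themselves (words joined together, words joined with hyphens, http://www. in front) are an assumption, since the exact URLs in this description are not visible. The function names are illustrative. The sketch drops bvba, nv, 't and single-letter words, and it also drops anything between ( ), which a later example asks for. It tries .be before .com.

```python
import re
import unicodedata

# Words that must not end up in the URL. Single letters are handled separately.
SKIP_WORDS = {"bvba", "nv", "'t"}


def name_parts(company):
    """Return the lowercase words of the business name that may be used in a URL."""
    # Anything between parentheses is ignored.
    name = re.sub(r"\([^)]*\)", " ", company)
    # Strip accents so the host name is plain ASCII.
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    parts = []
    for word in name.lower().split():
        word = word.strip(".,;")
        if len(word) <= 1 or word in SKIP_WORDS:
            continue
        # A hyphenated word such as "zuid-west" contributes each of its pieces.
        parts.extend(re.findall(r"[a-z0-9]+", word))
    return parts


def candidate_urls(company, tlds=("be", "com")):
    """Return the URLs to try, in order (assumed patterns, .be first)."""
    parts = name_parts(company)
    if not parts:
        return []
    hosts = ["".join(parts)]
    if len(parts) > 1:
        hosts.append("-".join(parts))
    return [f"http://www.{host}.{tld}" for tld in tlds for host in hosts]
```

With these assumptions "Pica Pica NV" and "Pica Pica" give the same list of URLs, and the NV never appears in any of them.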
Another example: "Sleepwise","Turnhoutsebaan 328B 2970 Schilde","03 385 31 21","",""
The URL [login to view URL] is a site, so crawl it for the address.
This is the address on the site:
Turnhoutsebaan 225 - B-2970 Schilde
Only the number 225 is different. Make the program accept this address as a match because only one
thing is different, and use the address from the site in the output.
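One way to decide that "only one thing is different" is to split both addresses into words and count how many words of the listing address are missing on the site. The Python sketch below does that. The function names and the limit of one difference as a parameter are illustrative choices, and treating "B-2970" the same as "2970" is an assumption based on this example.

```python
import re
from collections import Counter


def address_tokens(address):
    """Split an address into lowercase words and numbers."""
    text = address.lower()
    # Text between parentheses is not important for the comparison.
    text = re.sub(r"\([^)]*\)", " ", text)
    # "B-2970" and "2970" are the same postcode.
    text = re.sub(r"\bb-(?=\d{4}\b)", "", text)
    return re.findall(r"[a-z0-9]+", text)


def missing_tokens(listing_address, site_address):
    """Return the parts of the listing address that are not in the site address."""
    missing = Counter(address_tokens(listing_address)) - Counter(address_tokens(site_address))
    return sorted(missing.elements())


def addresses_match(listing_address, site_address, max_differences=1):
    """Accept the site address when at most max_differences parts differ."""
    return len(missing_tokens(listing_address, site_address)) <= max_differences
```

When addresses_match returns True, the address found on the site is the one to write to the output, not the one from the input file.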
Also, if there is no .be site, try .com, as in this example:
"Poppels Meubelhuis","Zandkuilstraat 23 2382 Poppel (Ravels)","","",""
There is no .be site, but [login to view URL] is a site, and the right address is on that site:
Poppels meubelhuis, Tilburgseweg 64 (Slaapwinkel),
Zandkuilstraat 23 (Woonwinkel), B-2382 Poppel, België
tel.: +32 (0)14 65 78 54, fax: +32 (0)14 65 94 69
e-mail:
Here the e-mail address and fax number can also be added to the output. The things between ( ) are not important.
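Picking the e-mail address, telephone and fax number out of the page text could be done with regular expressions, as in this Python sketch. The patterns are assumptions based on the layout of this one example ("tel.:", "fax:") and will not cover every site. The function name and the returned keys are illustrative.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A number: optional "+", then digits mixed with spaces, brackets, dots, slashes or hyphens.
NUMBER = r"(\+?\d[\d ()./-]{6,}\d)"
TEL_RE = re.compile(r"\btel\.?\s*:?\s*" + NUMBER, re.IGNORECASE)
FAX_RE = re.compile(r"\bfax\.?\s*:?\s*" + NUMBER, re.IGNORECASE)


def extract_contacts(page_text):
    """Return the first e-mail address, telephone and fax number found, or "" for each."""

    def first(pattern):
        match = pattern.search(page_text)
        if not match:
            return ""
        # Use the captured number when the pattern has a group, else the whole match.
        return match.group(match.lastindex or 0).strip()

    return {
        "email": first(EMAIL_RE),
        "telephone": first(TEL_RE),
        "fax": first(FAX_RE),
    }
```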
Example: "C-Meubel","Antwerpsesteenweg 19 2840 Rumst","015 31 77 16","",""
There is no [login to view URL], so also try [login to view URL], and that one does exist. The address is also
on the site, so it is a good one. It also has an e-mail address that can be added to the output.
Another example: "VI-Spring","Dorp 78 2230 Herselt","014 54 55 11","",""
It has a site, [login to view URL], but the address is not on it, so this site cannot be used and this
business name must be checked in the next step.
Another example: business name Hof van Aragon NV (anything between ( ) must not be used).
[login to view URL] is a site but immediately redirects you to [login to view URL]. The program must then look
for the address on [login to view URL].
Another example: business name Zuid-West.
[login to view URL] is a site and has the right address.
Another example: "Odrada Interieur NV","Molsesteenweg 46 2490 Balen","014 34 66 00","",""
There is no site for [login to view URL], [login to view URL], [login to view URL] or [login to view URL],
but [login to view URL] is a site, and that site has the right address.
Another example: the business name is "de lindeboom". The URLs that need to be tried are [login to view URL]
[login to view URL] [login to view URL] [login to view URL] [login to view URL] [login to view URL]
If there are sites, each site must be crawled for the address (if the URL is redirected, then the site it redirects to must be crawled).
In this case [login to view URL] is the right site.
If there is an e-mail address, also add it to the output.
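Step 1 as a whole could be sketched like this in Python: try each URL in order, follow redirects, and accept the first site that shows the address. The function names are illustrative. fetch_page uses urllib from the standard library and returns the final URL after redirects. find_site takes the fetch function and the address check as parameters (both hypothetical callables here), so any address comparison can be plugged in. When find_site returns None the listing goes to the next step.

```python
import http.client
import urllib.request


def fetch_page(url, timeout=10):
    """Return (final_url, page_text), or None when there is no site at this URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            body = response.read().decode(charset, errors="replace")
            # geturl() gives the URL that was actually loaded, after any redirect.
            return response.geturl(), body
    except (OSError, ValueError, LookupError, http.client.HTTPException):
        return None


def find_site(candidates, address, fetch, address_on_page):
    """Return (final_url, page_text) for the first site that shows the address.

    fetch(url) must return (final_url, page_text) or None.
    address_on_page(address, page_text) must return True or False.
    """
    for url in candidates:
        result = fetch(url)
        if result is None:
            continue  # no site here, try the next URL
        final_url, page_text = result
        if address_on_page(address, page_text):
            return final_url, page_text
        # A site without the address cannot be used, so keep looking.
    return None
```

This sketch only looks at the first page of each site. Crawling further pages for the address would be an extension of fetch.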
----------------------------------------------------------------------------------------------------
Step 2 (if the listing already has a site)
If the listing has a site, the program must check whether the site is still online.
Then the program must check whether the business name is in the title.
Example: "Luigi Lloyd Loom","Puursesteenweg 392B 2880 Bornem","03 899 26 35","","http://www.luigi.be"
The site is still online ([login to view URL]). The title is
<title>:: Luigi - Original Lloyd loom - Exclusive Rattan furniture - Outdoor furniture - Bedrooms ::</title>
Luigi Lloyd Loom is in the title, so the business name can stay the same.
(If there was only Luigi in the title, the business name must be changed to Luigi.)
If there is nothing in the title, crawl the site looking for the address and the business name. If it does not
find an address and a part of the business name, it must go to step 3. If it finds the address and part of the
business name (for example only Luigi), then it must use that part of the business name (Luigi and not Luigi Lloyd Loom)
and the address in the output.
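The title check could be sketched as follows in Python, using html.parser from the standard library. The class and function names are illustrative. name_from_title keeps only the words of the business name that appear in the title. An empty result means nothing was found in the title, so the site itself has to be crawled for the address and the business name.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html):
    """Return the page title, or "" when there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def name_from_title(company, title):
    """Return the part of the business name that appears in the title."""
    title_words = set(re.findall(r"\w+", title.lower()))
    kept = []
    for word in company.split():
        pieces = re.findall(r"\w+", word.lower())
        # Keep a word only when every piece of it is in the title.
        if pieces and all(piece in title_words for piece in pieces):
            kept.append(word)
    return " ".join(kept)
```

For the Luigi example the full name comes back unchanged. With a title that only contains "Luigi", the result is "Luigi" and that becomes the business name in the output.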