There are millions of scientific publications available on the internet either as raw text or as PDF files. The aim of this project is to create software that is able to do the following:
1) Open a website link and look for a publication text
2) Follow all links and look for a publication text
3) Save title of publication in database
4) Save author list of publication in database
5) Save journal name and page in database
6) Save text in database, with abstract, introduction, etc. as separate chapters
7) If available, save PDF of publication in database
8) If only PDF is available, extract this data from the PDF and save
9) Before saving a publication, check whether it already exists in our database
10) Create a panel where I can browse and view the publications and where I can see the number of
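The duplicate check in step 9 could be sketched roughly as follows. This is only an illustration under stated assumptions: the brief does not name a database, so SQLite is used here, and the table layout, the `fingerprint` key (normalized title plus first author) and the function names are all hypothetical.

```python
import hashlib
import re
import sqlite3


def fingerprint(title, first_author):
    """Hypothetical duplicate key: normalized title plus first author."""
    combined = f"{title} {first_author}".lower()
    # Collapse punctuation and whitespace so small formatting differences do not matter.
    normalized = re.sub(r"[^a-z0-9]+", " ", combined).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def create_schema(conn):
    # Assumed layout; the UNIQUE constraint is what enforces "save only once".
    conn.execute(
        """CREATE TABLE IF NOT EXISTS publications (
               id INTEGER PRIMARY KEY,
               fingerprint TEXT NOT NULL UNIQUE,
               title TEXT NOT NULL,
               authors TEXT NOT NULL,
               journal TEXT,
               pages TEXT
           )"""
    )


def save_publication(conn, title, authors, journal=None, pages=None):
    """Insert the publication; return False if it was already stored."""
    key = fingerprint(title, authors[0] if authors else "")
    cursor = conn.execute(
        "INSERT OR IGNORE INTO publications "
        "(fingerprint, title, authors, journal, pages) VALUES (?, ?, ?, ?, ?)",
        (key, title, "; ".join(authors), journal, pages),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the insert was ignored as a duplicate.
    return cursor.rowcount == 1
```

A real system would probably prefer a DOI as the key when one is available; the title-based fingerprint is only a fallback idea.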
In other words, I need a program that I can direct to certain links and then use to extract all publication text as described above. I will then feed this program text (copied from websites) interspersed with website links that it should follow and search for publications.
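Pulling the links out of pasted text could look something like the sketch below. The pattern and the `extract_links` name are assumptions for illustration, and only plain `http://` and `https://` links are handled.

```python
import re

# Assumed pattern: a link runs until whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(text):
    """Return the unique links found in the text, in order of first appearance."""
    links = []
    seen = set()
    for match in URL_PATTERN.finditer(text):
        # Drop punctuation that belongs to the surrounding sentence, not the link.
        url = match.group(0).rstrip(".,;:!?)")
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

The returned list would then be handed to the crawler, which opens each link and follows the links it finds there.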
All of this should be web-based. I will have a server for you to use.
The interface should allow me to upload links, browse and search for publications, and see how many are in the processing queue and how many we have saved so far.
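The queue and saved counters shown in the interface could be backed by something like this sketch. It assumes SQLite and a hypothetical `link_queue` table with three made-up status values (`pending`, `saved`, `failed`); none of these names come from the brief.

```python
import sqlite3


def create_queue(conn):
    # Assumed layout: one row per uploaded link, with its processing status.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS link_queue (
               url TEXT PRIMARY KEY,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )


def enqueue(conn, urls):
    """Add uploaded links to the queue; links already present are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO link_queue (url) VALUES (?)",
        [(url,) for url in urls],
    )
    conn.commit()


def mark(conn, url, status):
    """Record the outcome for one link, e.g. 'saved' or 'failed'."""
    conn.execute("UPDATE link_queue SET status = ? WHERE url = ?", (status, url))
    conn.commit()


def queue_summary(conn):
    """Return the counts the panel would display."""
    counts = {"pending": 0, "saved": 0, "failed": 0}
    query = "SELECT status, COUNT(*) FROM link_queue GROUP BY status"
    for status, total in conn.execute(query):
        counts[status] = total
    return counts
```

The web panel would call `queue_summary` on each page load to show how many links are still waiting and how many publications have been saved.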
I will put 100% of the milestones online, but payment is after completion of the project. Please do not bid if this is not ok.
25 freelancers have bid on this job; the average bid is €630
Hi sir, I am a scraping expert and I have done many similar projects; please check my feedback and you will see. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi
Hi, I am an expert in web scraping and interested in this project. Let me do this work accurately and according to your requirements. I have done many similar projects, so it is an easy task for me. Thanks