Conversion of plist and pdf files using python from S3
$30-250 USD
Завершено
Опублікований over 11 years ago
$30-250 USD
Оплачується при отриманні
We have an S3 server where our customers can upload various files, including plists and pdfs.
We need to perform on a regular basis the following operations, using python on an EC2 instance:
1) Step 1
- Read pending S3 log files, and in each log file:
* List all plist files uploaded by customers
* List all pdf files uploaded by customers
- Archive the read S3 files (so that they are not processed a second time later)
2) Step 2
- For each listed pdf file, do the following
* create an image for each page, in 3 different sizes (sizes and naming conventions to be specified later) and store the images on the S3
* list all internal and external links in the pdf, with the position and size of the "rect" of the link; put this list in a json file on the S3
3) Step 3
- For each plist file, convert it to json, and store the json file on the S3.
IMPORTANT GUIDELINES
The deliverable is the python code. We will ask you to use the git repository on Github, and to publish your work regularly for us to monitor the progress of the work.
Please avoid generic bids such as " "Hello, Let me do it for you." (we will not read them).
In order to show that you have read and understood our requirements, please start your bid with "Python Project"; if your bid concerns only 1 or 2 of the steps specified above, please let us know clearly.
Python Project. Sir, I have over 6 years of Python experience. I know exactly which Python modules to utilize to accomplish what you need. I can get this done quickly, efficiently and to your satisfaction. More details in PMB. Thanks.
Python Project.
How are you thinking to archive the read S3 files ? Just remove the files from the server ? Or store the file name in a plain txt or SQLite Database (better than a full-fleded database for this simple task) ?
The "image for each page" you say is a screenshot-like image ? I mean, the image would be a clone of the pdf page ? PIL (Python imaging Library) suits fine for this task.
My bid concerns all 3 steps.