Using data from a single MySQL / MariaDB table, create a statistical model that outputs averages based on column combinations (such as make->model->price). The table has 13 columns, but this statistical model will use only 5 of them. The model will calculate averages over the whole data set and must also be able to calculate the same statistics on just a subset of the data. Code will be in Python. The project should be coded so that more statistical models, on the same database table, are easy to incorporate at a later time.
Will provide database schema and a sample .sql file with ~500,000 rows.
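To make the grouped-average idea concrete, here is a minimal sketch of averaging one column per make/model group, either over the whole table or over a subset. It is only an illustration under stated assumptions: the table name `listings`, the `year` column, the helper names `average_by` and `demo_connection`, and the sample rows are all hypothetical, and the standard-library `sqlite3` module stands in for MySQL / MariaDB so the sketch runs without a server. The delivered project would be expected to go through SQLAlchemy.

```python
import sqlite3

# Hypothetical column whitelist; the real table has 13 columns, 5 of them used.
ALLOWED_COLUMNS = {"make", "model", "year", "price"}


def average_by(conn, group_cols, value_col, filters=None):
    """Return (group values..., average, row count) tuples, one per group.

    filters is an optional dict of column -> required value, used to
    restrict the calculation to a subset of the data.
    """
    filters = filters or {}
    # Column names cannot be bound as parameters, so validate them first.
    for col in list(group_cols) + [value_col] + list(filters):
        if col not in ALLOWED_COLUMNS:
            raise ValueError("unknown column: {}".format(col))

    group_sql = ", ".join(group_cols)
    sql = "SELECT {g}, AVG({v}), COUNT(*) FROM listings".format(
        g=group_sql, v=value_col)
    params = []
    if filters:
        names = sorted(filters)
        sql += " WHERE " + " AND ".join("{} = ?".format(n) for n in names)
        params = [filters[n] for n in names]
    # Let the database aggregate, so only one row per group reaches Python.
    sql += " GROUP BY {g} ORDER BY {g}".format(g=group_sql)
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Build a tiny in-memory table with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE listings "
        "(make TEXT, model TEXT, year INTEGER, price REAL)")
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", [
        ("Ford", "Focus", 2010, 5000.0),
        ("Ford", "Focus", 2012, 7000.0),
        ("Ford", "Fiesta", 2012, 6000.0),
        ("Audi", "A4", 2012, 12000.0),
    ])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(average_by(conn, ["make", "model"], "price"))
    print(average_by(conn, ["make", "model"], "price", {"year": 2012}))
```

Pushing the aggregation into the database keeps memory use in Python proportional to the number of groups rather than the number of rows, which matters for a table with tens of millions of rows.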
Do you have experience in Python 3 or just Python 2? Do you have experience in optimizing statistical calculations in Python 3? Please provide a sample.
* Python >= 3.4.3
* SQLAlchemy >= 1.0.7, unless you have a really good reason to use something else
* comments & code in English only
* indentation: 4 spaces
* good, thorough code comments
* optimized for speed, low CPU usage and low memory usage
* run without issue against a database table with 50,000,000 rows
* run without issue with 350 such calculations (queries) per minute
* final output in JSON, which should be easy to incorporate into an API at a later time (the API itself is not in the scope of this project)
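As one possible reading of the JSON-output and "easy to add more models" requirements, the sketch below registers each statistical model under a name and serializes its results with the standard-library `json` module. The names `MODELS`, `register`, `run_model`, the JSON layout, and the sample values are all hypothetical choices for illustration, not a prescribed design.

```python
import json

# Hypothetical registry: model name -> function taking an iterable of numbers.
MODELS = {}


def register(name):
    """Decorator that adds a statistical model to the registry."""
    def decorator(func):
        MODELS[name] = func
        return func
    return decorator


@register("average")
def average(values):
    """Arithmetic mean, or None for an empty group."""
    values = list(values)
    return sum(values) / len(values) if values else None


def run_model(name, groups):
    """Apply a registered model to each group and return a JSON string.

    groups maps a tuple of group values, e.g. (make, model), to the
    numeric values belonging to that group.
    """
    func = MODELS[name]
    results = [
        {"group": list(key), "value": func(vals)}
        for key, vals in sorted(groups.items())
    ]
    return json.dumps({"model": name, "results": results}, sort_keys=True)


if __name__ == "__main__":
    sample = {
        ("Ford", "Focus"): [5000.0, 7000.0],
        ("Audi", "A4"): [12000.0],
    }
    print(run_model("average", sample))
```

Under this layout, a later model (a median, for example) would only need a new function decorated with `@register("median")`; the JSON envelope and the calling code would stay unchanged.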
Do you foresee that this will run without issue (see Requirements) on a modern, not-oversold VPS with 1 CPU core and 1 GB RAM?
Use of any Python modules or frameworks outside the standard library must be approved, unless they are mentioned in the Requirements above.
Will of course give more specific details to qualified bidders, and upon request. Will also work with the winning bidder to help him/her become familiar more quickly with the specifics of this type of data and the issues that have been faced before.
Please note that once this project is successfully delivered, there will be multiple follow-up projects to expand it further.