Data Mining with Apache Spark, Pandas and IPython (Proof of Concept)

This post is written as an IPython Notebook page; you can continue reading below or open it in nbviewer.

Auditing and Analysing Image (Photo) Sizes

IPython Notebook File

Environment: Python 3.4
Main Libraries Used: reportlab, pandas, vincent

Convert IPython Notebook to PDF

Environment: Ubuntu 13.04, Python 3.3

This should work with IPython 1.0 and later (for earlier versions, nbconvert can be installed separately):

The file my_notebook.pdf will be created.
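
The exact command is not reproduced on this page. As a rough sketch only, assuming the IPython 1.x command line (`ipython nbconvert --to latex --post PDF`) and a working LaTeX installation that provides `pdflatex`, the conversion could be driven from Python like this. The helper names `nbconvert_pdf_command` and `convert_to_pdf` are hypothetical and exist only for illustration:

```python
import subprocess


def nbconvert_pdf_command(notebook_path):
    """Build the argument list for a notebook -> LaTeX -> PDF conversion.

    Assumes the IPython 1.x CLI, where the PDF step is a post-processor
    (--post PDF) applied to the LaTeX export.
    """
    return ["ipython", "nbconvert", "--to", "latex", "--post", "PDF", notebook_path]


def convert_to_pdf(notebook_path):
    """Run the conversion; raises CalledProcessError if nbconvert fails."""
    subprocess.check_call(nbconvert_pdf_command(notebook_path))


# Example (not executed here, since it needs IPython and pdflatex installed):
# convert_to_pdf("my_notebook.ipynb")
```

Building the argument list separately from running it makes the command easy to inspect or log before anything is executed.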


Installation of some additional packages and libraries may be required:


More about nbconvert formats and options at: http://ipython.org/ipython-doc/rel-1.0.0/interactive/nbconvert.html


Interactive Data Analysis Setup on AWS


The current setup includes: Ubuntu Linux Server on EC2 (Elastic Compute Cloud) with Postgres and Python data analysis tools (IPython, NumPy, pandas, etc.) + Elastic IP + Load Balancer + EBS (Elastic Block Store)