Init script for Google Dataproc cluster with Apache Spark, Python 3 (miniconda) and some pre-installed libraries for data processing

Sample commands:

copy this shell script to the Dataproc init bucket:
gsutil cp jupyter-spark.sh gs://dataproc-inits/

start cluster:
gcloud dataproc clusters create jupyter-1 \
  --zone asia-east1-b \
  --master-machine-type n1-standard-2 \
  --master-boot-disk-size 100 \
  --num-workers 3 \
  --worker-machine-type n1-standard-4 \
  --worker-boot-disk-size 50 \
  --project spark-recommendation-engine \
  --initialization-actions gs://dataproc-inits/jupyter-spark.sh \
  --scopes 'https://www.googleapis.com/auth/cloud-platform' \
  --properties spark:spark.executorEnv.PYTHONHASHSEED=0

change number of workers:
gcloud dataproc clusters update jupyter-1 --num-workers 3

initiate ssh channel:
gcloud compute ssh --zone=asia-east1-b --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" jupyter-1-m

start jupyter session:
chromium-browser --proxy-server="socks5://localhost:1080" --host-resolver-rules="MAP * 0.0.0.0, EXCLUDE localhost" --user-data-dir=/tmp/
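
The last two commands work as a pair: `-D 1080` makes ssh open a SOCKS proxy on local port 1080, `-N` skips running a remote command, `-n` detaches stdin, and the browser is then told to send all traffic (including DNS lookups) through that proxy. A minimal sketch that keeps the two commands in sync is shown below. The variable names `CLUSTER`, `ZONE` and `SOCKS_PORT` and the helper functions are hypothetical, not part of the init script; the sketch only prints the commands, so nothing is executed until you copy them into a terminal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tunnel and browser commands from one set of
# variables. Defaults mirror the sample values used in the commands above.
CLUSTER="${CLUSTER:-jupyter-1}"
ZONE="${ZONE:-asia-east1-b}"
SOCKS_PORT="${SOCKS_PORT:-1080}"

# SSH tunnel to the master node, which Dataproc names <cluster>-m.
tunnel_cmd() {
  printf 'gcloud compute ssh --zone=%s --ssh-flag="-D %s" --ssh-flag="-N" --ssh-flag="-n" %s-m\n' \
    "$ZONE" "$SOCKS_PORT" "$CLUSTER"
}

# Browser launched with a throwaway profile that routes traffic via the proxy.
browser_cmd() {
  printf 'chromium-browser --proxy-server="socks5://localhost:%s" --host-resolver-rules="MAP * 0.0.0.0, EXCLUDE localhost" --user-data-dir=/tmp/\n' \
    "$SOCKS_PORT"
}

tunnel_cmd
browser_cmd
```

Run the first printed command in one terminal and leave it open, then run the second in another; the port in both must match or the browser will not reach the cluster.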

Interactive c3.js/d3.js charts inside Jupyter Notebook

This post is written as an IPython Notebook page; you can continue reading below or open it in nbviewer.

Accessing AWS Redshift with Python Pandas via the psycopg2 driver

Data Mining with Apache Spark, Pandas and IPython (Proof of Concept)

This post is written as an IPython Notebook page; you can continue reading below or open it in nbviewer.

Crime by Local Government Area and Offence Type – Dashboard

URL: http://dataviz.yznotes.com/crime-vic-2014
This app visualises the number of offences by geographic area (VIC, Australia) and offence type for the year ending December 2014. The original data used in this application can be downloaded from the data.vic.gov.au website.

The application renders the data as a multidimensional visualisation. All charts, including the Crime Map, are interconnected and interactive, so you can narrow the data down to a subset of interest in just a few mouse clicks. You can switch between the Actual Offence Numbers and Crime Rate views in the navigation bar.