Author Archive

table_diff – Python micro library for Auditing Data Changes (drafts)

Bitbucket Repository: bitbucket.org/yurz/table_diff

This script compares two CSV files with the same structure and produces a list of differences: entries that have been removed or added, and per-column changes for each entry.

One possible scenario: you have two copies of a data extract taken at different points in time and want an audit log of all the changes. Some people may ask, “why not just use MS Excel vlookups?” I agree that MS Excel is a great tool and that vlookups are convenient for comparing one or two columns. But imagine that your data set contains a lot of columns, or that you want to set up automated regular monitoring and keep a log of changes. This is what the script was designed for.

With some minor amendments the same approach can be used for monitoring changes in database tables or any other tabular data structures.

Usage:

Sample data: [image]

Optional Parameters:
sqlite_path – path to the SQLite file to be created (otherwise SQLite runs in memory).
fields_to_check – list of columns to check and report on (by default, every column is checked).
fields_to_ignore – similar to the above, for cases where it is easier to list the columns to be ignored.
keep_tables – working tables in SQLite are preserved if set to “yes”.
diff_csv – name/path of the report file (“diff.csv” by default).
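
To illustrate the kind of comparison described above, here is a minimal sketch in plain Python. It is not the table_diff API: the function names (diff_rows, read_csv_text), the key argument and the sample rows are all made up for this example, and it keeps everything in dictionaries instead of using SQLite. Only the fields_to_check and fields_to_ignore arguments are modelled on the options listed above, and their exact behaviour in the library may differ.

```python
import csv
import io


def diff_rows(old_rows, new_rows, key, fields_to_check=None, fields_to_ignore=()):
    """Compare two lists of dict rows that share the same columns.

    Hypothetical helper, not part of table_diff. Returns the keys that were
    added, the keys that were removed, and per-column changes as
    (key, column, old_value, new_value) tuples.
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}

    removed = sorted(k for k in old if k not in new)
    added = sorted(k for k in new if k not in old)

    changes = []
    for k in sorted(old.keys() & new.keys()):
        # By default every column except the key is checked.
        columns = fields_to_check or [c for c in old[k] if c != key]
        for col in columns:
            if col in fields_to_ignore:
                continue
            if old[k][col] != new[k][col]:
                changes.append((k, col, old[k][col], new[k][col]))
    return added, removed, changes


def read_csv_text(text):
    """Parse CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample data: two extracts of the same table at different times.
old_csv = "id,name,city\n1,Ann,Melbourne\n2,Bob,Sydney\n3,Cat,Perth\n"
new_csv = "id,name,city\n1,Ann,Geelong\n3,Cat,Perth\n4,Dan,Hobart\n"

added, removed, changes = diff_rows(
    read_csv_text(old_csv), read_csv_text(new_csv), key="id"
)
print("added:", added)
print("removed:", removed)
print("changes:", changes)
```

With real files, the rows would come from csv.DictReader over the opened files, and the three lists could be written out as the report.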

Melbourne Tweet Cloud



URL: words.yznotes.com

If a picture is worth a thousand words, then the picture on the left should be equivalent to 1,100 words, at least :)

This app collects tweets posted within 10 km of the Melbourne CBD, does some natural language processing and renders the top 100 words into a word cloud.
It is refreshed every 10 minutes, 24/7, and runs on Amazon Web Services.
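
The word-counting step can be sketched roughly as below. This is only an illustration, not the app's actual code: the stop-word list, the top_words function and the sample tweets are made up, and the "natural language processing" here is nothing more than lower-casing, stripping URLs and counting word frequencies.

```python
import re
from collections import Counter

# Tiny, made-up stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "in", "is", "to", "and", "of", "on", "at"}


def top_words(tweets, n=100):
    """Return the n most frequent words across tweets as (word, count) pairs."""
    counts = Counter()
    for text in tweets:
        # Lower-case the tweet and drop links before splitting into words.
        text = re.sub(r"https?://\S+", " ", text.lower())
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


# Made-up sample tweets.
sample = [
    "Coffee in the CBD is great",
    "Great coffee, great tram ride http://t.co/x",
    "Tram delays at Flinders",
]

for word, count in top_words(sample, n=3):
    print(word, count)
```

The resulting (word, count) pairs are what a word cloud renderer would take as input, with the count driving the font size.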

This is a work in progress; new features will be added over time.

Census 2011 Heat Map


URL: census2011.yznotes.com

This application aims to make Australian Census data more accessible by rendering it on Google Maps.
The current release includes suburb-level data sets from 46 categories of the ABS Census 2011 Basic Community Profile files, which adds up to 7,942 data sets altogether.

Use Google Chrome or Firefox to get the full functionality of the app. Unfortunately, Microsoft Internet Explorer and Apple Safari do not seem to support some of the web standards used in this app.

If you view this app on a device with a small screen, some features are disabled for a better user experience.

Auditing and Analysing Image (Photo) Sizes

IPython Notebook File

Environment: Python 3.4
Main Libraries Used: reportlab, pandas, vincent


VTAC Tertiary Offers 2014 (Round 1) Heat Map

Follow this link for a live demo: http://dev01.yznotes.com/viz_map_vtac2014/#



Most of the JavaScript code is copied from the great work done by Peter Neish for his Australian Election 2013 heat map.
Data preparation and aggregation were done using pandas, the Python data analysis library.
Use the Select Uni drop-down menu to switch between Victorian universities.