Tag: drafts

table_diff – Python micro library for Auditing Data Changes (drafts)

Bitbucket Repository: bitbucket.org/yurz/table_diff

This script compares two CSV files with the same structure and provides a list of differences: entries that have been removed or added, and per-column changes for each entry.

One possible scenario: you have two copies of a data extract taken at different points in time and want an audit log of all the changes. Some people may ask, “why not just use MS Excel vlookups?” I do agree that MS Excel is a great tool, and vlookups might be convenient for comparing one or two columns. But imagine that your data set contains a lot of columns, or that you want to set up automated regular monitoring and keep a log of changes. This is what this script was designed for.

With some minor amendments the same approach can be used for monitoring changes in database tables or any other tabular data structures.
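
The comparison itself can be sketched roughly as below. This is only an illustration of the idea, not the code from the repository: the names `load_rows` and `diff_tables` and the `key` parameter are hypothetical, and plain dictionaries stand in for the SQLite working tables the script uses.

```python
import csv
import io


def load_rows(csv_file, key):
    """Read a CSV file object into a dict keyed by the `key` column."""
    return {row[key]: row for row in csv.DictReader(csv_file)}


def diff_tables(old_file, new_file, key):
    """Return removed keys, added keys and per-column changes.

    Assumes both files have the same columns and that `key` is unique.
    """
    old_rows = load_rows(old_file, key)
    new_rows = load_rows(new_file, key)

    removed = sorted(set(old_rows) - set(new_rows))
    added = sorted(set(new_rows) - set(old_rows))

    changed = []
    for k in sorted(set(old_rows) & set(new_rows)):
        for column, old_value in old_rows[k].items():
            new_value = new_rows[k][column]
            if old_value != new_value:
                changed.append((k, column, old_value, new_value))
    return removed, added, changed


# Made-up in-memory data standing in for two extracts of the same table.
old_csv = io.StringIO("id,name,city\n1,Ann,Perth\n2,Bob,Sydney\n3,Cy,Hobart\n")
new_csv = io.StringIO("id,name,city\n1,Ann,Darwin\n3,Cy,Hobart\n4,Di,Cairns\n")

removed, added, changed = diff_tables(old_csv, new_csv, key="id")
print("removed:", removed)
print("added:", added)
print("changed:", changed)
```

Each tuple in `changed` holds the key, the column name, the old value and the new value, which maps naturally onto one row of an audit log.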

Usage:

Sample data: [image]

Optional Parameters:
sqlite_path – path to the SQLite file to be created (otherwise SQLite runs in memory).
fields_to_check – list of columns to check and report on (by default, every column is checked).
fields_to_ignore – similar to the above, for cases where it is easier to list the columns to be ignored.
keep_tables – working tables in SQLite will be preserved if “yes”.
diff_csv – name/path for the report file (“diff.csv” by default).
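
How fields_to_check and fields_to_ignore might narrow down the compared columns can be sketched like this. The helper `select_columns` is hypothetical, and letting fields_to_check take precedence when both are given is my assumption, not documented behaviour of the script.

```python
def select_columns(all_columns, fields_to_check=None, fields_to_ignore=None):
    """Pick the columns to compare, preserving the original column order."""
    if fields_to_check:
        # Assumption: an explicit check list wins over the ignore list.
        wanted = set(fields_to_check)
        return [c for c in all_columns if c in wanted]
    ignored = set(fields_to_ignore or [])
    return [c for c in all_columns if c not in ignored]


# Made-up column names for illustration.
columns = ["id", "name", "city", "updated_at"]

all_checked = select_columns(columns)
only_some = select_columns(columns, fields_to_check=["city", "name"])
without_ts = select_columns(columns, fields_to_ignore=["updated_at"])

print(all_checked)
print(only_some)
print(without_ts)
```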

Postgres Upgrade from 9.1 to 9.3 and post-upgrade configs (draft notes)

Environment: Postgres 9.1 running on Ubuntu 13.04 on an AWS EC2 instance, with a single cluster on AWS EBS

Some pre-steps can be found here:
https://wiki.postgresql.org/wiki/Using_pg_upgrade_on_Ubuntu/Debian

Get details of the present clusters:

Stop all Postgres services:

Drop the new 9.3 cluster created by default during installation of Postgres 9.3:

Create a new 9.3 cluster from the existing 9.1 one:

Check if it worked:

Make sure you can access your data. If everything is OK, drop the 9.1 cluster:
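
The upgrade sequence above can be written down as a small dry-run script. This is a sketch under assumptions: the commands are the standard Debian/Ubuntu postgresql-common wrappers described on the wiki page linked above, the cluster name `main` is the Debian default rather than something confirmed for this server, and `run_steps` is a hypothetical helper. By default it only prints the commands.

```python
import shlex
import subprocess

OLD, NEW, CLUSTER = "9.1", "9.3", "main"  # "main" is an assumed cluster name

STEPS = [
    ("Get details of the present clusters", ["pg_lsclusters"]),
    ("Stop all Postgres services", ["sudo", "service", "postgresql", "stop"]),
    ("Drop the default 9.3 cluster", ["sudo", "pg_dropcluster", "--stop", NEW, CLUSTER]),
    ("Create a 9.3 cluster from the 9.1 one", ["sudo", "pg_upgradecluster", OLD, CLUSTER]),
    ("Check if it worked", ["pg_lsclusters"]),
    # Run this last step only after confirming the data is accessible.
    ("Drop the 9.1 cluster", ["sudo", "pg_dropcluster", OLD, CLUSTER]),
]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for description, command in steps:
        line = " ".join(shlex.quote(part) for part in command)
        lines.append(line)
        print(f"# {description}\n{line}")
        if not dry_run:
            subprocess.run(command, check=True)
    return lines


planned = run_steps(STEPS)
```

Keeping `dry_run=True` and pasting the printed commands one by one is the safer route, since the data check before the final step cannot be automated here.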


All steps below are optional: Move data storage to an alternative location
In my case it’s an Amazon EBS (Elastic Block Store) volume mounted at /data/main

Stop all Postgres services:

Move the actual data directory:

Amend 9.3 configs:

Comment out the old data_directory path and add one pointing to the new location:
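
The config change can be illustrated with a small text transformation. This is a sketch, not the exact edit I made: `repoint_data_directory` is a hypothetical helper, the sample text is a made-up fragment rather than a full postgresql.conf (on Ubuntu the real file normally lives at /etc/postgresql/9.3/main/postgresql.conf), and using the mount point itself as the new path is an assumption.

```python
import re


def repoint_data_directory(conf_text, new_path):
    """Comment out active data_directory lines and add one for new_path."""
    pattern = re.compile(r"^\s*data_directory\s*=")
    out = []
    replaced = False
    for line in conf_text.splitlines():
        if pattern.match(line):
            out.append("#" + line)  # keep the old setting as a comment
            if not replaced:
                out.append(f"data_directory = '{new_path}'")
                replaced = True
        else:
            out.append(line)
    if not replaced:
        # No active setting found: append the new one at the end.
        out.append(f"data_directory = '{new_path}'")
    return "\n".join(out) + "\n"


sample = (
    "# sample fragment, not a full postgresql.conf\n"
    "data_directory = '/var/lib/postgresql/9.3/main'\n"
    "port = 5432\n"
)

updated = repoint_data_directory(sample, "/data/main")
print(updated)
```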

Check if it worked:

Start postgres:

Optional – remove /var/lib/postgresql/9.3/main created by default during 9.3 installation:

INSTR() / SUBSTRING_INDEX() – SQLite Way (with help of Python)

Since SQLite can’t do INSTR (Oracle/MySQL), SUBSTRING_INDEX (MySQL) or CHARINDEX (SQL Server), this is an attempt to do it with the help of Python.
It’s one of my very first experiments with Python (although I’m falling in love already :) ), so I will need to revisit the actual code; marking it as a draft meanwhile.
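
One way to get this behaviour is to register Python functions with the standard `sqlite3` module via `create_function`. The sketch below is not the original code from this post. The name `PY_INSTR` is my own choice (newer SQLite releases ship a built-in instr(), so a distinct name avoids shadowing it), and returning an empty string for an empty delimiter is an assumption about the desired behaviour.

```python
import sqlite3


def py_instr(text, sub):
    """1-based position of `sub` in `text`, 0 if absent, NULL on NULL input."""
    if text is None or sub is None:
        return None
    return text.find(sub) + 1


def substring_index(text, delim, count):
    """Mimic MySQL SUBSTRING_INDEX(str, delim, count)."""
    if text is None or delim is None or count is None:
        return None
    if count == 0 or delim == "":
        return ""  # assumption: an empty delimiter yields an empty string
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # everything left of the Nth delimiter
    return delim.join(parts[count:])  # everything right of the Nth from the end


conn = sqlite3.connect(":memory:")
conn.create_function("PY_INSTR", 2, py_instr)
conn.create_function("SUBSTRING_INDEX", 3, substring_index)

row = conn.execute(
    "SELECT PY_INSTR('table_diff', '_'), "
    "SUBSTRING_INDEX('www.mysql.com', '.', 2), "
    "SUBSTRING_INDEX('www.mysql.com', '.', -2)"
).fetchone()
print(row)  # (6, 'www.mysql', 'mysql.com')
```

The functions only exist on the connection they were registered on, so they need to be registered again for every new connection.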

 

Dynamic Pivoting in Oracle

Preparing dummy data:

 

Pivoting: