JSON Pretty Printer in Python

JavaScript Object Notation (JSON) is a very useful text-based data format that is relatively compact, human readable and avoids the angle-bracket tax of XML.  However, to shrink JSON as much as possible for transmission without using compression, it is not uncommon to strip out the whitespace from a JSON blob.  This is quite a reasonable approach, but it does make JSON harder for humans to read. Fortunately, reversing the stripping is trivial using Python to ‘pretty print’ the JSON file.

"""Pretty print a JSON file into a human readable format."""
import argparse
import json

# Use argparse to make the script a little bit more user friendly.
PARSER = argparse.ArgumentParser(description='JSON pretty printer script.')
PARSER.add_argument('infile', metavar='In', help='Input JSON file name.')
PARSER.add_argument('outfile', metavar='Out', help='Output JSON file name.')
ARGS = PARSER.parse_args()

# Read in the input JSON file, closing it once it has been parsed.
with open(ARGS.infile, "r") as infile:
    RAW = json.load(infile)

# Pretty print the JSON and write it to the output file.
# json.dump writes straight to the file and returns None, so there is
# nothing useful to assign.
with open(ARGS.outfile, "w") as outfile:
    json.dump(RAW, outfile, sort_keys=True, indent=4, separators=(',', ': '))

The astute reader will realise that the interesting part of this script is the json.dump call and the rest of the script is simply there to make it easy to call from the command line!
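
To see the effect of those arguments without touching any files, here is a small sketch that uses json.loads and json.dumps on an in-memory string. The helper name pretty and the sample document are made up for illustration.

```python
"""Sketch: pretty print a compact JSON string in memory."""
import json


def pretty(compact):
    """Re-serialise a JSON string with sorted keys and 4-space indentation."""
    return json.dumps(json.loads(compact), sort_keys=True, indent=4,
                      separators=(',', ': '))


if __name__ == "__main__":
    # Hypothetical whitespace-stripped input.
    print(pretty('{"name":"example","tags":["a","b"],"count":2}'))
```

Because the data is parsed and then re-serialised, only the whitespace and key order change; the content stays the same.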

Counting words and images in RSS posts with Python

I have often wondered how many words I should be aiming to have per blog post or how many images I should include. This led me to reach for Python and whip up the following script, which will grab the posts from a site’s RSS feed, count the image tags, strip the HTML and then count the words left.

This script lets me see the minimum, maximum and average number of images and words for some of my favourite blogs, which have an average word count of about 330 words and a dozen images. This is reassuring, as I was never sure how many words justified a post, and it’s a clear indication that I should consider using images much more.

### Script to fetch an RSS feed and work out the min, max, average word and 
### image counts of posts in the feed.

import xml.etree.ElementTree 
import urllib2
from HTMLParser import HTMLParser

class MLStripper(HTMLParser):
    ### From http://stackoverflow.com/questions/753052/
    def __init__(self):
        # reset() initialises the underlying HTMLParser state.
        self.reset()
        self.fed = []
    def handle_data(self, d):
        # Collect the text found between tags.
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    ### From http://stackoverflow.com/questions/753052/
    s = MLStripper()
    s.feed(html)
    return s.get_data()

def getPostStats( url ):
    ### Fetch specified feed and count words and images per post.

    # Download the feed.
    raw = urllib2.urlopen(url)

    # Parse the feed xml.
    parsed = xml.etree.ElementTree.parse(raw)
    root = parsed.getroot()
    channel = root.find('channel')

    titles = []
    images = []
    words = []

    # Find the articles (items)
    for item in channel.findall('item'):
        text = item.findtext('description','')

        # RSS 2.0 uses content instead of description for the post body!
        namespaces = {'content': 'http://purl.org/rss/1.0/modules/content/'} 
        content = item.findtext('content:encoded','', namespaces=namespaces)

        if len(text) < len(content):
            text = content

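        # Count the number of images (assumed here: occurrences of '<img').
        images.append(text.count('<img'))

        # Count the number of words (assumed here: whitespace separated).
        text = strip_tags(text)
        words.append(len(text.split()))

        # Get the post title
        titles.append(item.findtext('title', ''))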

    return titles,words,images

def getMinMaxAvg( counts ):
    ### return the min, max & average counts.

    minCount = min(counts)
    maxCount = max(counts)
    avgCount = sum(counts,0) / len(counts)

    return minCount,maxCount,avgCount 

if __name__ == "__main__":
    # List of interesting blogs.
    TESTURLS = [ {"name":"Lady Slider", "url":"http://www.ladyslider.com/blog?format=RSS"},
                 {"name":"Shoot Tokyo", "url":"http://shoottokyo.com/feed/"}, 
                 {"name":"DeadPxl", "url":"http://dedpxl.com/feed/"}, 
                 {"name":"circa 1983", "url":"http://blog.circa1983.ca/rss"}, 
                 {"name":"David Duchemin", "url":"http://davidduchemin.com/feed/"} ]

    # Go find the image and word counts for each blog!
    for TEST in TESTURLS:
        TITLES,WORDS,IMAGES = getPostStats( TEST["url"] )

        print "--- %s ---" % TEST["name"]

        for n in range(0,len(TITLES)):
            print "  '%s' - %d words, %d images." % (TITLES[n],WORDS[n],IMAGES[n])

        print "Posts - %d." % len(TITLES)
        MIN,MAX,AVG = getMinMaxAvg(IMAGES)
        print "Images  - Min: %d Max: %d Avg: %d." % (MIN,MAX,AVG)
        MIN,MAX,AVG = getMinMaxAvg(WORDS)
        print "Words   - Min: %d Max: %d Avg: %d." % (MIN,MAX,AVG)

You should get output like the following for each RSS feed:

--- David Duchemin ---
'PHOTOGRAPH, Issue 10' - 203 words, 16 images.
'Make It Now.' - 524 words, 1 images.
'A World of Stories' - 378 words, 3 images.
'Light , Gesture, & Color' - 449 words, 4 images.
'Cape Churchill Polar Bears' - 714 words, 7 images.
'Hudson Bay Polar Bears' - 330 words, 1 images.
'Study the Masters: Margaret Bourke-White' - 655 words, 4 images.
'About Critique' - 693 words, 1 images.
'Inspired by the Tangible' - 518 words, 1 images.
'The Created Image, Vol.02' - 390 words, 3 images.
Posts - 10.
Images - Min: 1 Max: 16 Avg: 4.
Words - Min: 203 Max: 714 Avg: 485.
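
The per-post counting can also be tried on its own, without fetching a feed. The following is a minimal Python 3 sketch (the script above targets Python 2). The class name PostCounter, the function count_post and the sample HTML are assumptions for illustration. It counts img start tags with the parser rather than by searching the raw text, so it may not give exactly the same numbers as the script.

```python
"""Sketch: count the images and words in one post body (Python 3)."""
from html.parser import HTMLParser


class PostCounter(HTMLParser):
    """Collect text and count <img> tags while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.text = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag names, so 'img' also matches <IMG>.
        if tag == 'img':
            self.images += 1

    def handle_data(self, data):
        self.text.append(data)


def count_post(html):
    """Return (word_count, image_count) for an HTML fragment."""
    counter = PostCounter()
    counter.feed(html)
    counter.close()  # Flush any text still buffered by the parser.
    # Join with spaces so words in adjacent paragraphs do not run together.
    words = len(' '.join(counter.text).split())
    return words, counter.images


if __name__ == "__main__":
    sample = '<p>Two polar bears</p><img src="bears.jpg"/><p>on the ice.</p>'
    print("%d words, %d images." % count_post(sample))
```

Joining the text chunks with a space is a design choice: it keeps words in neighbouring block elements apart, at the cost of splitting a word that has inline markup in the middle of it.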

Listen, learn … then lead!

An interesting TED talk from four-star General Stanley McChrystal about how the events following 9/11 led to a new style of war, and to a requirement for a very different form of leadership for the widely distributed military response.

I think this is a worthwhile talk for any leader to watch, to hear how the General adapted in the face of change.

Generating passwords with Python

Occasionally I find myself lacking inspiration for a password that I will not use frequently, that I want to be secure, and that I don’t mind storing in a secure password manager. When this happens I use the very handy uuid module in the Python standard library to generate a semi-decent password.

"""Generate a string suitable for password usage using the UUID module."""

from uuid import uuid4

# The parenthesised form works as written on both Python 2 and Python 3.
print(str(uuid4()))

This will print a randomly generated 36-character string: 32 hexadecimal digits in five hyphen-separated groups (8-4-4-4-12).

The main drawback with this approach is that the generated passwords are not easily memorable for the average human being, so you need to store them somewhere safe and secure. If you lose or forget the password, you’re stuffed!
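
For comparison, the standard library’s secrets module (Python 3.6 and later) is designed for generating security-sensitive values such as passwords. The following is a minimal sketch of that alternative, not the UUID approach used above. The function name generate_password, the default length of 20 and the chosen alphabet are assumptions made for illustration.

```python
"""Sketch: generate a random password with the secrets module (Python 3.6+)."""
import secrets
import string

# Assumed alphabet for illustration: ASCII letters and digits.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length=20):
    """Return a random password of the given length (hypothetical helper)."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

The same storage caveat applies: the result is no easier to remember than a UUID, so it still belongs in a password manager.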

Converting Lightroom GPS coordinates for Google Maps

I have wanted to add a map of the locations of the photographs on my photo blog SeeStockholm.se for a while now.  I have the coordinates in Lightroom for the images in the degrees, minutes, seconds (DMS) format, e.g. 59°16’31” N 18°19’8” E. However, Google Maps uses the decimal degrees (DD) format, e.g. 59.2753 N 18.3189 E.

I needed a way to convert the coordinates from one representation to the other. After a bit of googling and some experimentation I wrote the following JavaScript functions to convert from DMS format to DD format and create a Google Maps google.maps.LatLng object.

    function ConvertDMSToDD(degrees, minutes, seconds, direction) {
        // Minutes are 1/60 of a degree and seconds are 1/3600 of a degree.
        var dd = parseFloat(degrees) + parseFloat(minutes) / 60 + parseFloat(seconds) / (60 * 60);
        if (direction == "S" || direction == "W") {
            dd = dd * -1;
        } // Don't do anything for N or E
        return dd;
    }

    function ParseDMS(input) {
        // Split on anything that is not a digit or letter, leaving the
        // numbers and the N/S/E/W letters.
        var parts = input.split(/[^\d\w]+/);
        var lat = ConvertDMSToDD(parts[0], parts[1], parts[2], parts[3]);
        var lng = ConvertDMSToDD(parts[4], parts[5], parts[6], parts[7]);
        return new google.maps.LatLng( lat, lng );
    }

This makes the conversion process simply a case of calling ParseDMS with a DMS format coordinate in string form and it will return a LatLng object ready for use in Google Maps. These conversion functions allowed me to easily implement the map feature for my photo blog.
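
As a quick cross-check of the arithmetic, the same conversion can be sketched in Python 3. This is not part of the map code. The function names dms_to_dd and parse_dms are made up for illustration, and the parser assumes the input has exactly the degrees, minutes, seconds, direction layout for both latitude and longitude.

```python
"""Sketch: convert a DMS coordinate string to decimal degrees (Python 3)."""
import re


def dms_to_dd(degrees, minutes, seconds, direction):
    """Convert one degrees/minutes/seconds value to signed decimal degrees."""
    dd = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South and west are negative; north and east stay positive.
    if direction in ('S', 'W'):
        dd = -dd
    return dd


def parse_dms(text):
    """Return (lat, lng) from a string such as 59°16'31" N 18°19'8" E."""
    # Pull out the numbers and the N/S/E/W letters, ignoring the symbols.
    parts = re.findall(r'\d+(?:\.\d+)?|[NSEW]', text)
    if len(parts) != 8:
        raise ValueError('Expected two DMS values with directions: %r' % text)
    return dms_to_dd(*parts[0:4]), dms_to_dd(*parts[4:8])


if __name__ == "__main__":
    lat, lng = parse_dms('59°16\'31" N 18°19\'8" E')
    print('%.4f, %.4f' % (lat, lng))
```

For the example coordinate, 59 + 16/60 + 31/3600 comes to roughly 59.2753 and 18 + 19/60 + 8/3600 to roughly 18.3189, which matches the decimal degrees quoted earlier.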

Exceeding the forty hour work week

To follow on from the ‘How to make work-life balance work’ video, Alison Morris from Online MBA has a pretty interesting infographic regarding the effect of the current trend in America to work more than forty hours a week: it is pretty sobering stuff!

While Europe tends to be better at work-life balance than North America, there is still room for improvement on both sides of the Atlantic.  I believe it is in an employer’s best interests not to overwork their staff if they want to get the best quality of work.