<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Endlessly Curious &#187; Python</title>
	<atom:link href="http://www.endlesslycurious.com/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.endlesslycurious.com</link>
	<description>Programming, Productivity &#38; Software Development.</description>
	<lastBuildDate>Mon, 09 Jan 2012 09:00:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Top posts of 2011</title>
		<link>http://www.endlesslycurious.com/2012/01/09/top-posts-of-2011/</link>
		<comments>http://www.endlesslycurious.com/2012/01/09/top-posts-of-2011/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 09:00:50 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[links]]></category>
		<category><![CDATA[Lists]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2982</guid>
		<description><![CDATA[The top ten posts for 2011 according to Google Analytics were: Installing Python, MatPlotLib &#38; iPython on Snow Leopard. Finding duplicate files using Python. Getting started with Python. Praise for Python. Basic Graphing with MatPlotLib. Graphing real data with MatPlotLib. Extracting image EXIF data with Python. Python 2.7.1 Goodness. Running WordPress on Mac OS X [...]]]></description>
			<content:encoded><![CDATA[<p>The top ten posts for 2011 according to Google Analytics were:</p>
<ol>
<li><a title="Installing Python, MatPlotLib &amp; iPython on Snow Leopard" href="http://www.endlesslycurious.com/2011/04/06/installing-python-matplotlib-ipython-on-snow-leopard/">Installing Python, MatPlotLib &amp; iPython on Snow Leopard</a>.</li>
<li><a title="Finding duplicate files using Python" href="http://www.endlesslycurious.com/2011/06/01/finding-duplicate-files-using-python/">Finding duplicate files using Python</a>.</li>
<li><a title="Getting started with Python" href="http://www.endlesslycurious.com/2011/06/14/getting-started-with-python/">Getting started with Python</a>.</li>
<li><a title="Praise for Python" href="http://www.endlesslycurious.com/2011/05/02/praise-for-python/">Praise for Python</a>.</li>
<li><a title="Basic Graphing with MatPlotLib" href="http://www.endlesslycurious.com/2011/05/04/basic-graphing-with-matplotlib/">Basic Graphing with MatPlotLib</a>.</li>
<li><a title="Graphing real data with MatPlotLib" href="http://www.endlesslycurious.com/2011/05/06/graphing-real-data-with-matplotlib/">Graphing real data with MatPlotLib</a>.</li>
<li><a title="Extracting image EXIF data with Python" href="http://www.endlesslycurious.com/2011/05/11/extracting-image-exif-data-with-python/">Extracting image EXIF data with Python</a>.</li>
<li><a title="Python 2.7.1 Goodness" href="http://www.endlesslycurious.com/2011/06/10/python-2-7-1-goodness/">Python 2.7.1 Goodness</a>.</li>
<li><a title="Running WordPress on Mac OS X with XAMPP" href="http://www.endlesslycurious.com/2011/03/14/running-wordpress-on-mac-os-x-with-xampp/">Running WordPress on Mac OS X with XAMPP</a>.</li>
<li><a title="John Cleese on Creativity" href="http://www.endlesslycurious.com/2011/03/10/john-cleese-on-creativity/">John Cleese on creativity</a>.</li>
</ol>
<p>Eight of the top ten are Python related, but the rest of the top twenty is more diverse:</p>
<ol start="11">
<li><a title="Querying Reddit with Python" href="http://www.endlesslycurious.com/2011/11/30/querying-reddit-with-python/">Querying Reddit with Python</a>.</li>
<li><a title="Barbara" href="http://www.endlesslycurious.com/2011/08/22/barbara/">Barbara</a>.</li>
<li><a title="Processing Perforce command output with Python" href="http://www.endlesslycurious.com/2011/03/28/processing-perforce-command-output-with-python/">Processing Perforce command output with Python</a>.</li>
<li><a title="Downloading Wallpaper Images from Reddit with Python" href="http://www.endlesslycurious.com/2011/12/31/downloading-wallpaper-images-from-reddit-with-python/">Downloading wallpaper images from Reddit using Python</a>.</li>
<li><a title="Why Scrum fails..." href="http://www.endlesslycurious.com/2011/04/25/why-scrum-fails/">Why scrum fails</a>&#8230;</li>
<li><a title="Hacking Work Manifesto" href="http://www.endlesslycurious.com/2011/05/12/hacking-work-manifesto/">Hacking work manifesto</a>.</li>
<li><a title="The ascendancy of JSON" href="http://www.endlesslycurious.com/2011/05/20/the-ascendancy-of-json/">The ascendancy of JSON</a>.</li>
<li><a title="Using Perforce Counters to control syncing" href="http://www.endlesslycurious.com/2011/03/24/using-perforce-counters-to-control-syncing/">Using Perforce counters to control syncing</a>.</li>
<li><a title="Why work doesn't happen at work." href="http://www.endlesslycurious.com/2011/05/03/why-work-doesnt-happen-at-work/">Why work doesn&#8217;t happen at work</a>.</li>
<li><a title="Small steps to big goals" href="http://www.endlesslycurious.com/2011/05/05/small-steps-to-big-goals/">Small steps to big goals</a>.</li>
</ol>
<p>On a personal note, I hope 2012 will bring more posts and less personal tragedy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2012/01/09/top-posts-of-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Organising photographs with Python</title>
		<link>http://www.endlesslycurious.com/2012/01/02/organising-photographs-with-python/</link>
		<comments>http://www.endlesslycurious.com/2012/01/02/organising-photographs-with-python/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 09:00:48 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2959</guid>
		<description><![CDATA[Previously I posted about extracting EXIF information from images using the Python Image Library (PIL).  The reason I was investigating how to do this was I wanted to programmatically reorganise my personal photograph collection from its current ad-hoc mess to something more structured. My goal was to use Python to extract the EXIF information from each [...]]]></description>
			<content:encoded><![CDATA[<p>Previously I <a title="Extracting Image EXIF data with Python" href="http://www.endlesslycurious.com/2011/05/11/extracting-image-exif-data-with-python/">posted</a> about extracting <a title="Wikipedia: EXIF" href="http://en.wikipedia.org/wiki/Exchangeable_image_file_format">EXIF</a> information from images using the <a title="Python Image Library (PIL)" href="http://www.pythonware.com/products/pil/">Python Image Library</a> (PIL).  The reason I was investigating how to do this was I wanted to programmatically reorganise my personal photograph collection from its current ad-hoc mess to something more structured.</p>
<p>My goal was to use Python to extract the EXIF information from each image file and use the creation time of each image as the key to organise the images into the directory structure Year/Month/Day.  If an image file is missing EXIF data then the file&#8217;s own timestamp (its modification time in the script) can be used instead via an option.</p>
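<p>As a rough illustration of that mapping, here is a minimal sketch written for Python 3 (the script in this post targets Python 2). The names <code>dated_path</code>, <code>EXIF_FORMAT</code> and <code>fallback_timestamp</code> are hypothetical and not part of PhotoShuffle, and the EXIF data is assumed to already be a dictionary of tag names to strings.</p>

```python
from datetime import datetime
from os.path import join

# Layout of the date strings stored in the EXIF date tags.
EXIF_FORMAT = '%Y:%m:%d %H:%M:%S'


def dated_path(dest_root, exif, fallback_timestamp=None):
    """Return dest_root/YYYY/MM/DD for an image, or None if no date is known.

    Hypothetical helper: 'exif' is assumed to be a dict of EXIF tag names to
    string values, and 'fallback_timestamp' a file time in seconds.
    """
    # DateTimeOriginal takes precedence over DateTime.
    raw = exif.get('DateTimeOriginal') or exif.get('DateTime')
    if raw:
        taken = datetime.strptime(raw, EXIF_FORMAT)
    elif fallback_timestamp is not None:
        taken = datetime.fromtimestamp(fallback_timestamp)
    else:
        return None
    return join(dest_root, taken.strftime('%Y'),
                taken.strftime('%m'), taken.strftime('%d'))


print(dated_path('Organised', {'DateTimeOriginal': '2011:12:25 10:30:00'}))
```

<p>Returning <code>None</code> when no date is available lets the caller decide whether to skip the file, which is roughly what the script does when it filters out entries without a time.</p>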
<p>An example of running this script to reorganise the photos folder and leave the original files in place would be:</p>
<pre class="brush: bash; title: ; notranslate">
python PhotoShuffle.py -copy /Daniel/Pictures /Daniel/OrganisedPictures
</pre>
<p>You can also find the latest version on GitHub at <a title="PhotoShuffle on GitHub" href="http://github.com/dpbrown/PhotoShuffle">github.com/dpbrown/PhotoShuffle</a>; the following is the current script:</p>
<pre class="brush: python; title: ; notranslate">
&quot;&quot;&quot;Scans a folder and builds a date sorted tree based on image creation time.&quot;&quot;&quot;

if __name__ == '__main__':
    from os import makedirs, listdir, rmdir
    from os.path import join as joinpath, exists, getmtime
    from datetime import datetime
    from shutil import move, copy2 as copy
    from ExifScan import scan_exif_data
    from argparse import ArgumentParser

    PARSER = ArgumentParser(description='Builds a date sorted tree of images.')
    PARSER.add_argument( 'orig', metavar='O', help='Source root directory.')
    PARSER.add_argument( 'dest', metavar='D',
                         help='Destination root directory' )
    PARSER.add_argument( '-filetime', action='store_true',
                         help='Use file time if missing EXIF' )
    PARSER.add_argument( '-copy', action='store_true',
                         help='Copy files instead of moving.' )
    ARGS = PARSER.parse_args()

    print 'Gathering &amp; processing EXIF data.'

    # Get creation time from EXIF data.
    DATA = scan_exif_data( ARGS.orig )

    # Process EXIF data.
    for r in DATA:
        info = r['exif']
        # Precedence is DateTimeOriginal &gt; DateTime.
        if 'DateTimeOriginal' in info.keys():
            r['ftime'] = info['DateTimeOriginal']
        elif 'DateTime' in info.keys():
            r['ftime'] = info['DateTime']
        if 'ftime' in r.keys():
            r['ftime'] = datetime.strptime(r['ftime'],'%Y:%m:%d %H:%M:%S')
        elif ARGS.filetime == True:
            ctime = getmtime( joinpath( r['path'], r['name'] + r['ext'] ))
            r['ftime'] = datetime.fromtimestamp( ctime )

    # Remove any files without datetime info.
    DATA = [ f for f in DATA if 'ftime' in f.keys() ]

    # Generate new path YYYY/MM/DD/ using EXIF date.
    for r in DATA:
        r['newpath'] = joinpath( ARGS.dest, r['ftime'].strftime('%Y/%m/%d') )

    # Generate filenames per directory: 1 to n (zero padded) followed by DDMMMYYYY.
    print 'Generating filenames.'
    for newdir in set( [ i['newpath'] for i in DATA ] ):
        files = [ r for r in DATA if r['newpath'] == newdir ]
        pad = len( str( len(files) ) )
        usednames = []
        for i in range( len(files) ):
            datestr = files[i]['ftime'].strftime('%d%b%Y')
            newname = '%0*d_%s' % (pad, i+1, datestr)
            j = i+1
            # if filename exists keep looking until it doesn't. Ugly!
            while ( exists( joinpath( newdir, newname + files[i]['ext'] ) ) or
                newname in usednames ):
                j += 1
                jpad = max( pad, len( str( j ) ) )
                newname = '%0*d_%s' % (jpad, j, datestr)
            usednames.append( newname )
            files[i]['newname'] = newname

    # Copy or move the files to their new locations, creating directories as required.
    print 'Copying files.'
    for r in DATA:
        origfile = joinpath( r['path'], r['name'] + r['ext'] )
        newfile = joinpath( r['newpath'], r['newname'] + r['ext'] )
        if not exists( r['newpath'] ):
            makedirs( r['newpath'] )
        if not exists( newfile ):
            if ARGS.copy:
                print 'Copying '+ origfile +' to '+ newfile
                copy( origfile, newfile )
            else:
                print 'Moving '+ origfile +' to '+ newfile
                move( origfile, newfile )
        else:
            print newfile +' already exists!'

    # Only a move can leave the source directories empty.
    if not ARGS.copy:
        print 'Removing empty directories'
        DIRS = set( [ d['path'] for d in DATA ] )
        for d in DIRS:
            # if the directory is empty then delete it.
            if len( listdir( d ) ) == 0:
                print 'Deleting dir ' + d
                rmdir( d )
</pre>
<p>UPDATE: I tend to run my duplicate file script over image collections before I organise them to remove any duplicates.  You can find that script on GitHub at <a href="http://github.com/dpbrown/Duplicate-Files" title="Duplicate Files Script">github.com/dpbrown/Duplicate-Files</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2012/01/02/organising-photographs-with-python/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Downloading Wallpaper Images from Reddit with Python</title>
		<link>http://www.endlesslycurious.com/2011/12/31/downloading-wallpaper-images-from-reddit-with-python/</link>
		<comments>http://www.endlesslycurious.com/2011/12/31/downloading-wallpaper-images-from-reddit-with-python/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 09:00:15 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2881</guid>
		<description><![CDATA[In my previous post I demonstrated how to query Reddit using Python and JSON. My goal was a script to download the latest and greatest wallpapers off of image sub-reddits like wallpaper to keep my desktop wallpaper fresh and interesting. The main function of the script is to download any JPEG formatted image that listed [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://www.endlesslycurious.com/2011/11/30/querying-reddit-with-python/" title="Querying Reddit with Python">previous post</a> I demonstrated how to query Reddit using Python and JSON.  My goal was a script that downloads the latest and greatest wallpapers from image sub-reddits like <a href="http://www.reddit.com/r/wallpaper" title="Wallpaper sub-reddit">wallpaper</a> to keep my desktop wallpaper fresh and interesting.  The main function of the script is to download any JPEG image listed in the specified sub-reddit to a folder.</p>
<p>A lot of the script turned out to be managing URLs, handling exceptions and checking image types so that links to the most commonly encountered image repository, <a href="http://imgur.com/" title="Imgur">imgur</a>, worked.  I opted to use the reddit hash id for each post as the filename for the downloaded JPEGs, as this seems to be a unique value, which means there are no collisions and it&#8217;s easy to programmatically check whether that item&#8217;s image has already been downloaded.  The downside is that a hash value, unlike the item&#8217;s text title, doesn&#8217;t make for the most memorable filenames.</p>
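<p>To make the URL handling concrete, here is a minimal Python 3 sketch of the imgur clean-up step (the script in this post targets Python 2). The function names <code>imgur_jpeg_url</code> and <code>filename_for</code> are hypothetical, and the sketch simply follows the script&#8217;s own assumption that an imgur link can be turned into a direct JPEG link by swapping or appending the extension.</p>

```python
from os.path import join


def imgur_jpeg_url(url):
    """Return a best-guess direct JPEG URL for an imgur link.

    Hypothetical helper; URLs on other hosts are returned unchanged.
    """
    if 'imgur.com' not in url:
        return url
    if url.endswith('.png'):
        # Assumption taken from the script: swap .png for .jpg.
        return url[:-len('.png')] + '.jpg'
    if url.endswith(('.jpg', '.jpeg')):
        # Already a direct JPEG link.
        return url
    # A page link without an extension: append .jpg.
    return url + '.jpg'


def filename_for(folder, item_id):
    """Build the local filename from the post's reddit id."""
    return join(folder, '%s.jpg' % item_id)


print(imgur_jpeg_url('http://imgur.com/abc12'))
```

<p>Because the filename depends only on the post id, checking whether an image has already been fetched is a single path-exists test on the result of <code>filename_for</code>.</p>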
<p>The single most frustrating thing I encountered when writing this script is that I have yet to discover a programmatic way to work out the URL for an image on Flickr given a Flickr page URL.  This is a real shame as Flickr is a really popular image hosting site with a lot of great images.</p>
<p>An example of running the script to download images with a score greater than 50 from the wallpaper sub-reddit into a folder called wallpaper would be as follows:</p>
<pre class="brush: bash; title: ; notranslate">
python redditdownload.py wallpaper wallpaper -s 50
</pre>
<p>And to run the same query but only get any new images you don&#8217;t already have, run the following:</p>
<pre class="brush: bash; title: ; notranslate">
python redditdownload.py wallpaper wallpaper -s 50 -update
</pre>
<p>You can find the source code for this post (and the previous) on GitHub at <a href="http://github.com/dpbrown/RedditImageGrab" title="GitHub">github.com/dpbrown/RedditImageGrab</a> and the current source for the script is as follows:</p>
<pre class="brush: python; title: ; notranslate">
&quot;&quot;&quot;Download images from a reddit.com subreddit.&quot;&quot;&quot;

from urllib2 import urlopen, HTTPError, URLError
from httplib import InvalidURL
from argparse import ArgumentParser
from os.path import exists as pathexists, join as pathjoin
from os import mkdir
from reddit import getitems

if __name__ == &quot;__main__&quot;:
    PARSER = ArgumentParser( description='Downloads files with specified extension from the specified subreddit.')
    PARSER.add_argument( 'reddit', metavar='r', help='Subreddit name.')
    PARSER.add_argument( 'dir', metavar='d', help='Dir to put downloaded files in.')
    PARSER.add_argument( '-last', metavar='l', default='', required=False, help='ID of the last downloaded file.')
    PARSER.add_argument( '-score', metavar='s', default='0', type=int, required=False, help='Minimum score of images to download.')
    PARSER.add_argument( '-num', metavar='n', default='0', type=int, required=False, help='Number of images to process.')
    PARSER.add_argument( '-update', default=False, action='store_true', required=False, help='Run until you encounter a file already downloaded.')
    ARGS = PARSER.parse_args()

    print 'Downloading images from &quot;%s&quot; subreddit' % (ARGS.reddit)

    ITEMS = getitems( ARGS.reddit, ARGS.last )
    N = D = E = S = F = 0
    FINISHED = False

    # Create the specified directory if it doesn't already exist.
    if not pathexists( ARGS.dir ):
        mkdir( ARGS.dir )

    while len(ITEMS) &gt; 0 and FINISHED == False:
        LAST = ''
        for ITEM in ITEMS:
            if ITEM['score'] &lt; ARGS.score:
                print '\tSCORE: %s has score of %s which is lower than required score of %s.' % (ITEM['id'],ITEM['score'],ARGS.score)
                S += 1
            else:
                FILENAME = pathjoin( ARGS.dir, '%s.jpg' % (ITEM['id'] ) )
                # Don't download files multiple times!
                if not pathexists( FILENAME ):
                    try:
                        if 'imgur.com' in ITEM['url']:
                            # Change .png to .jpg for imgur urls.
                            if ITEM['url'].endswith('.png'):
                                ITEM['url'] = ITEM['url'].replace('.png','.jpg')
                            # Add .jpg to imgur urls that are missing it.
                            elif '.jpg' not in ITEM['url'] and '.jpeg' not in ITEM['url']:
                                ITEM['url'] = '%s.jpg' % ITEM['url']

                        RESPONSE = urlopen( ITEM['url'] )
                        INFO = RESPONSE.info()

                        # Work out file type either from the response or the url.
                        if 'content-type' in INFO.keys():
                            FILETYPE = INFO['content-type']
                        elif ITEM['url'].endswith( 'jpg' ):
                            FILETYPE = 'image/jpeg'
                        elif ITEM['url'].endswith( 'jpeg' ):
                            FILETYPE = 'image/jpeg'
                        else:
                            FILETYPE = 'unknown'

                        # Only try to download jpeg images.
                        if FILETYPE == 'image/jpeg':
                            FILEDATA = RESPONSE.read()
                            FILE = open( FILENAME, 'wb')
                            FILE.write(FILEDATA)
                            FILE.close()
                            print '\tDownloaded %s to %s.' % (ITEM['url'],FILENAME)
                            D += 1
                        else:
                            print '\tWRONG FILE TYPE: %s has type: %s!' % (ITEM['url'],FILETYPE)
                            S += 1
                    except HTTPError as ERROR:
                        print '\tHTTP ERROR: Code %s for %s.' % (ERROR.code,ITEM['url'])
                        F += 1
                    except URLError as ERROR:
                        print '\tURL ERROR: %s!' % ITEM['url']
                        F += 1
                    except InvalidURL as ERROR:
                        print '\tInvalid URL: %s!' % ITEM['url']
                        F += 1
                else:
                    print '\tALREADY EXISTS: %s for %s already exists.' % (FILENAME,ITEM['url'])
                    E += 1
                    if ARGS.update == True:
                        print '\tUpdate complete, exiting.'
                        FINISHED = True
                        break
            LAST = ITEM['id']
            N += 1
            if ARGS.num &gt; 0 and N &gt;= ARGS.num:
                print '\t%d images attempted , exiting.' % N
                FINISHED = True
                break
        ITEMS = getitems( ARGS.reddit, LAST )

    print 'Downloaded %d of %d (Skipped %d, Exists %d, Failed %d)' % (D, N, S, E, F)
</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2011/12/31/downloading-wallpaper-images-from-reddit-with-python/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Querying Reddit with Python</title>
		<link>http://www.endlesslycurious.com/2011/11/30/querying-reddit-with-python/</link>
		<comments>http://www.endlesslycurious.com/2011/11/30/querying-reddit-with-python/#comments</comments>
		<pubDate>Wed, 30 Nov 2011 09:00:12 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2856</guid>
		<description><![CDATA[I&#8217;ve long been a fan of reddit, a social news site where users can submit news and comment and vote on the submissions of other users.  Reddit provides a form of content filtering through subreddits, which are specialized by topic, e.g. the Python programming language. I thought it would be fun to figure out [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve long been a fan of <a title="Reddit" href="http://www.reddit.com/">reddit</a>, a social news site where users can submit news and comment and vote on the submissions of other users.  Reddit provides a form of content filtering through subreddits, which are specialized by topic, e.g. the Python programming language.</p>
<p>I thought it would be fun to figure out how to get the most recent items for a particular subreddit and the previous items for an item in a subreddit. Both these things turned out to be really simple using existing Python packages to query reddit and process the <a href="http://www.json.org/" title="JSON">JSON</a> formatted response.</p>
<pre class="brush: python; title: ; notranslate">
&quot;&quot;&quot;Return list of items from a sub-reddit of reddit.com.&quot;&quot;&quot;

from urllib2 import urlopen, HTTPError
from json import JSONDecoder

def getitems( subreddit, previd=''):
    &quot;&quot;&quot;Return list of items from a subreddit.&quot;&quot;&quot;
    url = 'http://www.reddit.com/r/%s.json' % subreddit
    # Get items after item with 'id' of previd.
    if previd != '':
        url = '%s?after=t3_%s' % (url, previd)
    try:
        json = urlopen( url ).read()
        data = JSONDecoder().decode( json )
        items = [ x['data'] for x in data['data']['children'] ]
    except HTTPError as ERROR:
        print '\tHTTP ERROR: Code %s for %s.' % (ERROR.code, url)
        items = []
    return items

if __name__ == &quot;__main__&quot;:

    print 'Recent items for Python.'
    ITEMS = getitems( 'python' )
    for ITEM in ITEMS:
        print '\t%s - %s' % (ITEM['title'], ITEM['url'])

    print 'Previous items for Python.'
    OLDITEMS = getitems( 'python', ITEMS[-1]['id'] )
    for ITEM in OLDITEMS:
        print '\t%s - %s' % (ITEM['title'], ITEM['url'])
</pre>
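<p>For readers who want to see the shape of the data without hitting the network, the following is a small Python 3 sketch (the script above is Python 2). It assumes the same listing layout the script relies on, with items under <code>data</code> then <code>children</code>. The sample JSON, its ids and the helper names <code>listing_url</code> and <code>extract_items</code> are made up for illustration.</p>

```python
import json


def listing_url(subreddit, previd=''):
    """Build the listing URL, optionally starting after a given item id."""
    url = 'http://www.reddit.com/r/%s.json' % subreddit
    if previd:
        url = '%s?after=t3_%s' % (url, previd)
    return url


def extract_items(listing):
    """Pull the per-item dictionaries out of a decoded listing."""
    return [child['data'] for child in listing['data']['children']]


# Invented sample response, trimmed to the fields used in this post.
SAMPLE = '''
{"data": {"children": [
    {"data": {"id": "abc12", "title": "First post", "url": "http://example.com/1"}},
    {"data": {"id": "def34", "title": "Second post", "url": "http://example.com/2"}}
]}}
'''

items = extract_items(json.loads(SAMPLE))
for item in items:
    print('%s - %s' % (item['title'], item['url']))

# The id of the last item on a page is what fetches the next page.
print(listing_url('python', items[-1]['id']))
# prints http://www.reddit.com/r/python.json?after=t3_def34
```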
<p>In my <a href="http://www.endlesslycurious.com/2011/12/31/downloading-wallpaper-images-from-reddit-with-python/" title="Downloading Wallpaper Images from Reddit using Python">next post</a> I&#8217;ll detail what I used this script for.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2011/11/30/querying-reddit-with-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Getting started with Python</title>
		<link>http://www.endlesslycurious.com/2011/06/14/getting-started-with-python/</link>
		<comments>http://www.endlesslycurious.com/2011/06/14/getting-started-with-python/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 09:00:38 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2727</guid>
		<description><![CDATA[The following is how I&#8217;d recommend getting started programming in Python: The Python Tutorial. First off, work your way through the official Python tutorial; it&#8217;s very comprehensive, covers all the language features and also has a quick tour of the modules available in the standard library. Code Like a Pythonista: Idiomatic Python Next I&#8217;d highly [...]]]></description>
			<content:encoded><![CDATA[<p>The following is how I&#8217;d recommend getting started programming in Python:</p>
<ol>
<li><strong>The Python Tutorial.</strong><br />
First off, work your way through the official Python <a title="Docs.Python.org" href="http://docs.python.org/tutorial/index.html">tutorial</a>; it&#8217;s very comprehensive, covers all the language features and also has a quick tour of the modules available in the standard library.</li>
<li><strong>Code Like a Pythonista: Idiomatic Python<br />
</strong>Next I&#8217;d highly recommend reading the &#8216;<a href="http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html">Code Like a Pythonista</a>&#8217; article in its entirety; it&#8217;s very useful for learning about the Pythonic way of thinking.</li>
<li><strong>The Python Style Guide.</strong><br />
Next read the <a href="http://www.python.org/dev/peps/pep-0008/">Python Style Guide</a> (known as PEP 8), which will teach you the general Python coding style; depending on the languages you&#8217;ve used before, it could be quite a different style.</li>
<li><strong>The Python Challenge.<br />
</strong>Now try the Python Challenge, which will push your new Python skills and riddle solving abilities.  If you get stuck the official <a title="Python Challenge forums" href="http://www.pythonchallenge.com/forums/">forums</a> are helpful; I found I got stuck on the riddles more than the programming.  Once you&#8217;ve solved a challenge I&#8217;d <em>strongly</em> recommend checking out the submitted <a title="Python Challenge Solutions" href="http://wiki.pythonchallenge.com/index.php?title=Main_Page">solutions</a> to that challenge.  I found this an incredibly helpful learning experience, as by looking at the solutions I learned the Pythonic way to solve the problems.  Note: you can&#8217;t access the solutions to a challenge until you&#8217;ve solved it yourself.</li>
<li><strong>&#8216;Learn Python the Hard Way&#8217; or &#8216;Dive into Python&#8217;.</strong><br />
For gaining further knowledge there are several ebooks available online for free: the first is <a title="Learn Python the Hard Way" href="http://learnpythonthehardway.org/">Learn Python the Hard Way</a> and there is also the dated <a title="Dive into Python" href="http://diveintopython.org/">Dive into Python</a>.  I&#8217;ve not read Learn Python the Hard Way but I&#8217;ve heard good reviews from several people.</li>
</ol>
<p>For getting help with Python programming I&#8217;d recommend:</p>
<ul>
<li><strong>Stack Overflow.</strong><br />
<a href="http://stackoverflow.com/">Stack Overflow</a> is a collaborative question and answer site for programmers and has a very active Python community.  It is highly recommended that you search to see whether your question has already been asked before posting it.</li>
<li><strong>#python on irc.freenode.net.</strong><br />
Visiting the #python <a href="http://en.wikipedia.org/wiki/Internet_Relay_Chat">IRC</a> channel on irc.freenode.net is also a very good way to get help with Python questions.  You can find out more about the various Python IRC channels <a href="http://www.python.org/community/irc/">here</a>.  Note: You&#8217;ll need an IRC client like <a href="http://xchat.org/">X-Chat</a> (Linux &amp; Windows) or <a href="http://colloquy.info/index.html">Colloquy</a> (Mac).</li>
</ul>
<p>Here are some tools I&#8217;d recommend picking up:</p>
<ul>
<li><strong>Package installer &#8211; PIP or easy_install.<br />
</strong><a title="www.pip-installer.org" href="http://www.pip-installer.org/en/latest/installing.html">PIP</a> is the current Python package installer of choice and lets you easily download and install Python packages from various sources such as the official Python package repository &#8211; <a href="http://pypi.python.org/pypi">PyPI</a> and <a href="http://sourceforge.net/">SourceForge</a>.  I found that PIP makes installing new Python packages trivial 99% of the time; the other 1% of the time you&#8217;ll need to build the packages locally, which is more involved.  Note: Windows users may be better off sticking to the older <a title="easy_install" href="http://pypi.python.org/pypi/setuptools">easy_install</a> tool instead of PIP.</li>
<li><strong>Enhanced command line &#8211; iPython or bPython.<br />
</strong><a href="http://ipython.scipy.org/moin/">iPython</a> is an enhanced command line environment for Python that I&#8217;d highly recommend over the basic command line interpreter.  You can find several video tutorials for iPython listed <a href="http://ipython.org/ipython-doc/">here</a>. I am told that <a href="http://bpython-interpreter.org/">bPython</a> is another enhanced command line that is worth checking out too.</li>
<li><strong>Code analyser &#8211; PyLint or pyflakes.<br />
</strong><a href="http://www.logilab.org/project/pylint">PyLint</a> is a python version of the Lint C/C++ static code analysis tool which will analyse your Python code and give you useful feedback on your code as well as a score out of 10.  PyLint will also check your code adheres to the official Python Style Guide which I found very useful for learning the Python coding style.  Alternatively <a title="pyflakes" href="https://launchpad.net/pyflakes">pyflakes</a> has also been recommended for static analysis of python code.</li>
</ul>
<p>I&#8217;d be interested in hearing of any other resources you found useful to help you get started with python.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2011/06/14/getting-started-with-python/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Python 2.7.1 Goodness</title>
		<link>http://www.endlesslycurious.com/2011/06/10/python-2-7-1-goodness/</link>
		<comments>http://www.endlesslycurious.com/2011/06/10/python-2-7-1-goodness/#comments</comments>
		<pubDate>Fri, 10 Jun 2011 09:00:48 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2706</guid>
		<description><![CDATA[So far my favorite additions and changes in Python 2.7.1 since upgrading from the default Python 2.6.1 installation in Mac OS X Snow Leopard are the following: Dictionary and Set Comprehensions. List comprehensions are one of my favorite language features in Python, they are incredibly useful for processing and building lists.  So I am very excited [...]]]></description>
			<content:encoded><![CDATA[<p>So far my favorite additions and changes in Python 2.7.1 since upgrading from the default Python 2.6.1 installation in Mac OS X Snow Leopard are the following:</p>
<ol>
<li><strong>Dictionary and Set Comprehensions.<br />
</strong><a title="docs.python.org" href="http://docs.python.org/tutorial/datastructures.html#list-comprehensions">List comprehensions</a> are one of my favorite language features in Python; they are incredibly useful for processing and building lists.  So I am very excited to see <a title="docs.python.org" href="http://docs.python.org/dev/whatsnew/2.7.html#python-3-1-features">dictionary and set comprehensions</a> backported from Python 3 to Python 2.7.1.</li>
<li><strong>The ArgParse Module.</strong><br />
As a C/C++ programmer I originally did command line argument processing in Python manually using <a title="docs.python.org" href="http://docs.python.org/dev/library/sys.html#sys.argv">sys.argv</a>, then I discovered the C-style <a title="docs.python.org" href="http://docs.python.org/library/getopt">getopt</a> module.  I always found myself wondering if there was a more concise, Pythonic way to handle command line parameters.  The <a title="docs.python.org" href="http://docs.python.org/library/argparse">argparse</a> module is the solution; it supersedes the <a title="docs.python.org" href="http://docs.python.org/library/optparse">optparse</a> module.  I particularly like how argparse (and optparse) will generate the command line help for you!</li>
<li><strong>csv.DictWriter.writeheader method.</strong><br />
While this is a very minor change (added in Python 2.7, to be precise), I am a big fan of the <a title="docs.python.org" href="http://docs.python.org/library/csv">csv</a> module&#8217;s <a title="docs.python.org" href="http://docs.python.org/library/csv#csv.DictWriter">DictWriter</a> class as a way to easily dump lists of dictionaries to a file for analysis and debugging with Excel.  The addition of a new <a title="docs.python.org" href="http://docs.python.org/library/csv#csv.DictWriter.writeheader">writeheader</a> method to the DictWriter class makes it even easier to use.</li>
</ol>
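<p>As a rough sketch of the first and third items above, the snippet below builds a dictionary and a set with comprehensions and then dumps a list of dictionaries through DictWriter with a header row.  The sample words, the dump_rows helper and its parameter names are illustrative assumptions and do not come from the release notes.</p>
<pre class="brush: python; title: ; notranslate">
from csv import DictWriter

# Illustrative sample data (an assumption for this sketch).
WORDS = [ 'spam', 'eggs', 'ham', 'spam' ]

# Dictionary comprehension: map each distinct word to its length.
LENGTHS = { word: len( word ) for word in WORDS }

# Set comprehension: collect the distinct first letters.
INITIALS = { word[0] for word in WORDS }

def dump_rows( rows, fieldnames, outfile ):
    # Hypothetical helper: write a header row, then one row per dict.
    writer = DictWriter( outfile, fieldnames=fieldnames )
    writer.writeheader()
    writer.writerows( rows )
</pre>
<p>The dump_rows helper expects an already opened file object; on Python 2.7 that would typically be a file opened in 'wb' mode.</p>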
<p>You can find the full release notes for Python 2.7.1 <a title="docs.python.org" href="http://docs.python.org/dev/whatsnew/2.7.html">here</a>; there are many more changes than I&#8217;ve covered, so it&#8217;s well worth checking them out.  What are your favorite changes in Python 2.7.1?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2011/06/10/python-2-7-1-goodness/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding duplicate files using Python</title>
		<link>http://www.endlesslycurious.com/2011/06/01/finding-duplicate-files-using-python/</link>
		<comments>http://www.endlesslycurious.com/2011/06/01/finding-duplicate-files-using-python/#comments</comments>
		<pubDate>Wed, 01 Jun 2011 09:00:32 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2679</guid>
		<description><![CDATA[I wrote this script to find and optionally delete duplicate files in a directory tree.  The script uses MD5 hashes of each file&#8217;s content to detect duplicate files. This script is based on zalew&#8217;s answer on stackoverflow. So far I have found this script sufficient for accurately finding and removing duplicate files in my photograph collection. [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote this script to find and optionally delete duplicate files in a directory tree.  The script uses <a href="http://en.wikipedia.org/wiki/MD5">MD5</a> hashes of each file&#8217;s content to detect duplicate files. This script is based on zalew&#8217;s <a href="http://stackoverflow.com/questions/748675/finding-duplicate-files-and-removing-them/748879#748879">answer</a> on stackoverflow. So far I have found this script sufficient for accurately finding and removing duplicate files in my photograph collection.</p>
<pre class="brush: python; title: ; notranslate">
&quot;&quot;&quot;Find duplicate files inside a directory tree.&quot;&quot;&quot;

from os import walk, remove, stat
from os.path import join as joinpath
from hashlib import md5

def find_duplicates( rootdir ):
    &quot;&quot;&quot;Find duplicate files in directory tree.&quot;&quot;&quot;
    filesizes = {}
    # Build up dict with key as filesize and value is list of filenames.
    for path, dirs, files in walk( rootdir ):
        for filename in files:
            filepath = joinpath( path, filename )
            filesize = stat( filepath ).st_size
            filesizes.setdefault( filesize, [] ).append( filepath )
    unique = set()
    duplicates = []
    # We are only interested in lists with more than one entry.
    for files in [ flist for flist in filesizes.values() if len(flist)&gt;1 ]:
        for filepath in files:
            with open( filepath, 'rb' ) as openfile:
                filehash = md5( openfile.read() ).hexdigest()
            if filehash not in unique:
                unique.add( filehash )
            else:
                duplicates.append( filepath )
    return duplicates

if __name__ == '__main__':
    from argparse import ArgumentParser

    PARSER = ArgumentParser( description='Finds duplicate files.' )
    PARSER.add_argument( 'root', metavar='R', help='Dir to search.' )
    PARSER.add_argument( '-remove', action='store_true',
                         help='Delete duplicate files.' )
    ARGS = PARSER.parse_args()

    DUPS = find_duplicates( ARGS.root )

    print '%d Duplicate files found.' % len(DUPS)
    for f in sorted(DUPS):
        if ARGS.remove:
            remove( f )
            print '\tDeleted '+ f
        else:
            print '\t'+ f
</pre>
<p>I discovered the <a href="http://docs.python.org/library/argparse">argparse</a> module (added in Python 2.7) in the standard library this week and it makes command line parameter handling nice and concise.</p>
<p>UPDATE: Changed the uniques array into a set and added a first pass using file sizes as a performance improvement; it is a lot faster now.</p>
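<p>For very large files, the hashing step could also read each file in fixed-size chunks rather than all at once.  The following is only a minimal sketch assuming hashlib from the standard library; the hash_file name and the default chunk size are hypothetical choices and are not part of the script above.</p>
<pre class="brush: python; title: ; notranslate">
from hashlib import md5

def hash_file( filepath, chunksize=65536 ):
    # Hypothetical helper: MD5 a file without loading it all into memory.
    filehash = md5()
    # Open in binary mode so the bytes are hashed exactly as stored.
    with open( filepath, 'rb' ) as openfile:
        while True:
            chunk = openfile.read( chunksize )
            if not chunk:
                break
            filehash.update( chunk )
    return filehash.hexdigest()
</pre>
<p>Feeding the chunks to update one at a time yields the same digest as hashing the whole content in a single call, so this helper could stand in for the md5( openfile.read() ) line.</p>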
<p>UPDATE: You can now find this script on github at <a href="https://github.com/dpbrown/Duplicate-Files" title="Duplicate Files on github">github.com/dpbrown/Duplicate-Files</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2011/06/01/finding-duplicate-files-using-python/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Extracting image EXIF data with Python</title>
		<link>http://www.endlesslycurious.com/2011/05/11/extracting-image-exif-data-with-python/</link>
		<comments>http://www.endlesslycurious.com/2011/05/11/extracting-image-exif-data-with-python/#comments</comments>
		<pubDate>Wed, 11 May 2011 09:00:48 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2552</guid>
		<description><![CDATA[Most digital cameras and smartphones embed EXIF (Exchangeable Image File Format) data into the photographs they capture.  This can include: camera make &#038; model, date and time, camera settings like orientation, aperture, ISO, shutter speed, focal length and even GPS location. After a bit of experimentation I have found the following method of using the undocumented ExifTags [...]]]></description>
			<content:encoded><![CDATA[<p>Most digital cameras and smartphones embed <a title="Wikipedia" href="http://en.wikipedia.org/wiki/Exchangeable_image_file_format">EXIF</a> (Exchangeable Image File Format) data into the photographs they capture.  This can include: camera make &#038; model, date and time, camera settings like orientation, aperture, ISO, shutter speed, focal length and even GPS location.</p>
<p>After a bit of experimentation I have found the following method of using the undocumented ExifTags module in the <a title="Python Ware" href="http://www.pythonware.com/products/pil/">Python Imaging Library</a> (PIL) to be the simplest way to extract EXIF tags from images using Python.  There are other EXIF modules available for Python; however, PIL is currently the simplest to install on Mac OS X.</p>
<pre class="brush: python; title: ; notranslate">
from PIL import Image
from PIL.ExifTags import TAGS

def get_exif_data(fname):
    &quot;&quot;&quot;Get embedded EXIF data from image file.&quot;&quot;&quot;
    ret = {}
    try:
        img = Image.open(fname)
        if hasattr( img, '_getexif' ):
            exifinfo = img._getexif()
            if exifinfo is not None:
                for tag, value in exifinfo.items():
                    decoded = TAGS.get(tag, tag)
                    ret[decoded] = value
    except IOError:
        print 'IOERROR ' + fname
    return ret
</pre>
<p>The above code was based on the code snippet in Paolo&#8217;s answer to <a title="StackOverflow" href="http://stackoverflow.com/questions/765396/exif-manipulation-library-for-python">this</a> StackOverflow question.  I have added basic exception handling and a check for the existence of the _getexif attribute prior to accessing it.</p>
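<p>Once the tags are in a dictionary, individual values can be post-processed with plain Python.  As a small sketch, the hypothetical capture_time helper below assumes a dictionary shaped like the one returned by get_exif_data and parses the DateTimeOriginal tag, which EXIF stores as text in the form YYYY:MM:DD HH:MM:SS; some cameras may write other values, which this sketch does not handle.</p>
<pre class="brush: python; title: ; notranslate">
from datetime import datetime

def capture_time( exif ):
    # exif is assumed to be a dict as returned by get_exif_data.
    value = exif.get( 'DateTimeOriginal' )
    if value is None:
        # The image carried no capture time.
        return None
    # EXIF stores the capture time as text like '2011:05:11 09:00:48'.
    return datetime.strptime( value, '%Y:%m:%d %H:%M:%S' )
</pre>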
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2011/05/11/extracting-image-exif-data-with-python/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Graphing real data with MatPlotLib</title>
		<link>http://www.endlesslycurious.com/2011/05/06/graphing-real-data-with-matplotlib/</link>
		<comments>http://www.endlesslycurious.com/2011/05/06/graphing-real-data-with-matplotlib/#comments</comments>
		<pubDate>Fri, 06 May 2011 09:00:37 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2307</guid>
		<description><![CDATA[In a previous post I covered the basics of graphing in Python with the MatPlotLib module.  In this post I am going to demonstrate how to use MatPlotLib with some real world data retrieved from a web service and then processed into a format usable by MatPlotLib. The example script performs the following steps: Takes a specified stock&#8217;s ticker [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous <a href="http://www.endlesslycurious.com/2011/05/04/basic-graphing-with-matplotlib/">post</a> I covered the basics of graphing in Python with the <a href="http://matplotlib.sourceforge.net/">MatPlotLib</a> module.  In this post I am going to demonstrate how to use MatPlotLib with some real world data retrieved from a web service and then processed into a format usable by MatPlotLib.</p>
<p>The example script performs the following steps:</p>
<ol>
<li>Takes a specified stock&#8217;s ticker symbol and column to plot over time (from Open, High, Low, Close, Volume, Adj Close) as input.</li>
<li>Fetches the corresponding stock data from <a title="Yahoo!" href="http://finance.yahoo.com/">Yahoo! Finance</a> and saves it into a CSV file using the <a title="Python Docs" href="http://docs.python.org/library/urllib.html">urllib</a> module.</li>
<li>Processes the data in the CSV file into a suitable format for matplotlib using the <a title="Python Docs" href="http://docs.python.org/library/csv.html">csv</a>, <a title="Python Docs" href="http://docs.python.org/library/datetime.html">datetime</a> and <a title="MatPlotLib Docs" href="http://matplotlib.sourceforge.net/api/dates_api.html#module-matplotlib.dates">matplotlib.dates</a> modules.</li>
<li>Plots a graph of the data over time using MatPlotLib and saves a copy as a PNG image.</li>
</ol>
<p>Note: To keep the example concise I am not performing <em>any</em> error handling.</p>
<pre class="brush: python; title: ; notranslate">
&quot;&quot;&quot;Fetches specified stock data from Yahoo and graph it with MatPlotLib.&quot;&quot;&quot;

from urllib import urlretrieve
from csv import DictReader
from matplotlib import pyplot
from matplotlib.dates import date2num
from datetime import datetime

def fetchstockdata( stockticker, filename ):
    &quot;&quot;&quot;Fetch specified stock data and store it in named file.&quot;&quot;&quot;
    url = 'http://ichart.finance.yahoo.com/table.csv?s=%s' % stockticker
    urlretrieve( url, filename )

def importstockdata( filename ):
    &quot;&quot;&quot;Import CSV data into dict of lists, converting dates into timestamps.&quot;&quot;&quot;
    results = {}
    for row in DictReader( open( filename,'rb' ) ):
        for col in row.keys():
            if col == 'Date':
                coldata = date2num( datetime.strptime( row[col], '%Y-%m-%d') )
            else:
                # All other columns are numeric, so convert them for plotting.
                coldata = float( row[col] )
            results.setdefault( col, [] ).append( coldata )
    return results

def plotstockdata( stockdata, stockticker, dates, col ):
    &quot;&quot;&quot;Use MatPlotLib to graph specified stock data.&quot;&quot;&quot;
    pyplot.plot_date( stockdata[dates], stockdata[col], '-', xdate=True )
    pyplot.title( '%s - %s / %s' % (stockticker, col, dates) )
    pyplot.xlabel( dates )
    pyplot.ylabel( col )
    pyplot.savefig( '%s.png' % stockticker )
    pyplot.show()

if __name__ == '__main__':
    from sys import argv
    # Use first command line argument as ticker and second as column.
    TICKER = argv[1].upper()
    COL = argv[2]
    # Grab the stock data from Yahoo!
    FILENAME = '%s.csv' % TICKER
    fetchstockdata( TICKER, FILENAME )
    # Import the data.
    DATA = importstockdata( FILENAME )
    # Plot the graph with Date as X-Axis and User selected column as Y-Axis.
    plotstockdata( DATA, TICKER, 'Date', COL )
</pre>
<p>Running this script using the command line &#8220;python StockChart.py goog &#8216;Adj Close&#8217;&#8221; will produce a chart like the following.<br />
<img class="alignnone size-full wp-image-2500" title="GOOG - Adj Close / Date" src="http://www.endlesslycurious.com/wp-content/uploads/2011/05/goog-e1304568930661.png" alt="" width="500" height="375" /><br />
This is a good example of why I like Python&#8217;s batteries included <a title="About Python" href="http://www.python.org/about/">philosophy</a> so much: it means I spend more of my time writing interesting bits of code as the utility functionality I need has already been implemented or is only an <a title="SetupTools on PyPi" href="http://pypi.python.org/pypi/setuptools">easy_install</a> away.</p>
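<p>The CSV processing step from the list above can be tried without a network connection.  The sketch below feeds a few made-up rows (illustrative values, not real quotes) through DictReader and builds the same dict-of-lists layout; the columns_from_csv name is hypothetical, and dates are left as datetime objects so that the sketch runs without MatPlotLib, whereas the script above additionally passes them through date2num.</p>
<pre class="brush: python; title: ; notranslate">
from csv import DictReader
from datetime import datetime

# Made-up rows in the same layout as the downloaded file (not real quotes).
SAMPLE = [ 'Date,Open,Close',
           '2011-01-03,10.0,10.5',
           '2011-01-04,10.5,11.0' ]

def columns_from_csv( lines ):
    # Hypothetical helper: turn CSV rows into a dict of column lists.
    results = {}
    for row in DictReader( lines ):
        for col, value in row.items():
            if col == 'Date':
                coldata = datetime.strptime( value, '%Y-%m-%d' )
            else:
                coldata = float( value )
            results.setdefault( col, [] ).append( coldata )
    return results

DATA = columns_from_csv( SAMPLE )
</pre>
<p>DictReader accepts any iterable of lines, which is why a plain list of strings can stand in for the downloaded file here.</p>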
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2011/05/06/graphing-real-data-with-matplotlib/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Basic graphing with MatPlotLib</title>
		<link>http://www.endlesslycurious.com/2011/05/04/basic-graphing-with-matplotlib/</link>
		<comments>http://www.endlesslycurious.com/2011/05/04/basic-graphing-with-matplotlib/#comments</comments>
		<pubDate>Wed, 04 May 2011 09:00:51 +0000</pubDate>
		<dc:creator>Daniel</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.endlesslycurious.com/?p=2418</guid>
		<description><![CDATA[One of the Python modules that has most interested me recently is MatPlotLib, a sophisticated graphing module that can be used to create journal-grade graphs of almost anything.  The official gallery for MatPlotLib is worth checking out to get an idea of the sheer range of graph types it can be used to create. [...]]]></description>
			<content:encoded><![CDATA[<p>One of the Python modules that has most interested me recently is <a title="MatPlotLib" href="http://matplotlib.sourceforge.net/">MatPlotLib</a>, a sophisticated graphing module that can be used to create journal-grade graphs of almost anything.  The official <a title="MatPlotLib Gallery" href="http://matplotlib.sourceforge.net/gallery.html">gallery</a> for MatPlotLib is worth checking out to get an idea of the sheer range of graph types it can be used to create.</p>
<p>It is simple to get started with MatPlotLib.  For example, creating a line graph of x*x and saving it as a <a href="http://en.wikipedia.org/wiki/Portable_Network_Graphics">PNG</a> file requires only the following:</p>
<pre class="brush: python; title: ; notranslate">
&quot;&quot;&quot;Simple demonstration of MatPlotLib plotting.&quot;&quot;&quot;

from matplotlib import pyplot

X = range(0,100)
Y = [ i*i for i in X ]

pyplot.plot( X, Y, '-' )
pyplot.title( 'Plotting x*x' )
pyplot.xlabel( 'X Axis' )
pyplot.ylabel( 'Y Axis' )
pyplot.savefig( 'Simple.png' )
pyplot.show()
</pre>
<p>The above script will produce the following graph:<br />
<img class="alignnone size-full wp-image-2419" title="Simple" src="http://www.endlesslycurious.com/wp-content/uploads/2011/05/Simple-e1304491395710.png" alt="" width="500" height="375" /></p>
<p>To plot data over a time period, the simplest solution is to convert the date/time values to timestamps using MatPlotLib&#8217;s date2num function and then plot them using the plot_date function as follows:</p>
<pre class="brush: python; title: ; notranslate">
&quot;&quot;&quot;Simple demonstration of MatPlotLib Date plotting.&quot;&quot;&quot;

from matplotlib import pyplot
from matplotlib.dates import date2num
from datetime import datetime, timedelta

# Generate 100 timestamps spaced 365 days apart, starting from today.
X = [date2num(datetime.today()+timedelta(days=365*x)) for x in range(0,100)]
Y = [i*i for i in range(0,100)]

pyplot.plot_date( X, Y, '-', xdate=True )
pyplot.title( 'Plotting x*x' )
pyplot.xlabel( 'X Axis' )
pyplot.ylabel( 'Y Axis' )
pyplot.savefig( 'SimpleDates.png' )
pyplot.show()
</pre>
<p>Which will generate a chart like the following:<br />
<img class="alignnone size-full wp-image-2453" title="Simple Dates" src="http://www.endlesslycurious.com/wp-content/uploads/2011/05/Simple1-e1304492748433.png" alt="" width="500" height="375" /></p>
<p>As you can see, it is fairly simple to graph data using MatPlotLib.  This makes Python and MatPlotLib a compelling solution for data analysis when combined with the many available modules for dealing with common data storage formats like text (using RegEx), CSV, XML and JSON files and SQL databases.</p>
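<p>As one possible illustration of pairing MatPlotLib with a data format module, the sketch below pulls two series out of JSON text using the standard json module and hands them to pyplot.  The record layout, the key names and both helper names are assumptions made for this example only.</p>
<pre class="brush: python; title: ; notranslate">
import json

# Made-up records standing in for the contents of a JSON file.
SAMPLE = json.dumps( [ { 'x': i, 'y': i*i } for i in range( 0, 5 ) ] )

def series_from_json( text, xkey, ykey ):
    # Hypothetical helper: turn a JSON list of records into two lists.
    records = json.loads( text )
    xs = [ rec[xkey] for rec in records ]
    ys = [ rec[ykey] for rec in records ]
    return xs, ys

def plot_series( xs, ys, filename ):
    # MatPlotLib is imported here so the parsing helper works without it.
    from matplotlib import pyplot
    pyplot.plot( xs, ys, '-' )
    pyplot.savefig( filename )

X, Y = series_from_json( SAMPLE, 'x', 'y' )
</pre>
<p>Calling plot_series( X, Y, 'FromJson.png' ) would then save the graph, following the same plot and savefig calls used in the scripts above.</p>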
]]></content:encoded>
			<wfw:commentRss>http://www.endlesslycurious.com/2011/05/04/basic-graphing-with-matplotlib/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

