I’ve long been a fan of reddit, a social news site where users can submit news and comment and vote on other users’ submissions. Reddit filters content through subreddits, each specialized by topic, e.g. the Python programming language.

I thought it would be fun to figure out how to get the most recent items for a particular subreddit, and the older items that come after a given item in that subreddit’s listing. Both turned out to be really simple using existing Python packages to query reddit and process the JSON-formatted response.
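
For reference, the script relies on two details of the response: each submission sits under `data`, then `children`, then `data`, and an item is referred to in the `after` parameter by its `id` prefixed with `t3_`. Here is a minimal sketch of that shape, written for Python 3 and using a hand-written sample rather than a real reddit response (the ids, titles, and URLs are made up, and `extract_items` is a name chosen just for this illustration):

```python
import json

# Hand-written sample in the shape the script expects; not real reddit data.
SAMPLE = '''
{"data": {"children": [
    {"kind": "t3", "data": {"id": "abc12", "title": "First post", "url": "http://example.com/1"}},
    {"kind": "t3", "data": {"id": "abc13", "title": "Second post", "url": "http://example.com/2"}}
]}}
'''


def extract_items(text):
    """Return the inner 'data' dict of every child in a listing."""
    listing = json.loads(text)
    return [child['data'] for child in listing['data']['children']]


items = extract_items(SAMPLE)
for item in items:
    print('\t%s - %s' % (item['title'], item['url']))

# The id of the last item is what gets passed back as 'after=t3_<id>'.
print('after=t3_%s' % items[-1]['id'])
```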

"""Return list of items from a subreddit of reddit.com."""

from urllib2 import urlopen, HTTPError 
from json import JSONDecoder

def getitems( subreddit, previd=''):
    """Return list of items from a subreddit."""
    url = 'http://www.reddit.com/r/%s.json' % subreddit
    # Get items after item with 'id' of previd.
    if previd != '':
        url = '%s?after=t3_%s' % (url, previd)
    try:
        json = urlopen( url ).read()
        data = JSONDecoder().decode( json )
        items = [ x['data'] for x in data['data']['children'] ]
    except HTTPError as ERROR:
        print '\tHTTP ERROR: Code %s for %s.' % (ERROR.code, url)
        items = []
    return items

if __name__ == "__main__":

    print 'Recent items for Python.'
    ITEMS = getitems( 'python' )
    for ITEM in ITEMS:
        print '\t%s - %s' % (ITEM['title'], ITEM['url'])

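    # Only page back if the first request returned something;
    # ITEMS[-1] would raise IndexError on an empty list.
    if ITEMS:
        print 'Previous items for Python.'
        OLDITEMS = getitems( 'python', ITEMS[-1]['id'] )
        for ITEM in OLDITEMS:
            print '\t%s - %s' % (ITEM['title'], ITEM['url'])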
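
The script above targets Python 2 (`urllib2`, `print` statements). A rough Python 3 equivalent might look like the sketch below. It is an adaptation under assumptions, not the original script: `build_url` is a helper name introduced here for illustration, the `User-Agent` value is a placeholder, and reddit's current access rules or rate limits may require more than this.

```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_url(subreddit, previd=''):
    """Build the listing URL, optionally starting after item 'previd'."""
    url = 'http://www.reddit.com/r/%s.json' % subreddit
    # Get items after item with 'id' of previd.
    if previd:
        url = '%s?after=t3_%s' % (url, previd)
    return url


def getitems(subreddit, previd=''):
    """Return list of items from a subreddit, or [] on an HTTP error."""
    url = build_url(subreddit, previd)
    # Assumption: send a descriptive User-Agent; this value is a placeholder.
    request = Request(url, headers={'User-Agent': 'subreddit-items-example'})
    try:
        with urlopen(request) as response:
            data = json.loads(response.read().decode('utf-8'))
    except HTTPError as error:
        print('\tHTTP ERROR: Code %s for %s.' % (error.code, url))
        return []
    return [child['data'] for child in data['data']['children']]
```

Usage mirrors the original: `getitems('python')` for the most recent items, then `getitems('python', items[-1]['id'])` for the page that follows, provided the first call returned a non-empty list.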

In my next post I’ll detail what I used this script for.