P3: OpenStreetMap Data Case Study. Dubai and Abu Dhabi. Postcodes.

After the review of the project https://review.udacity.com/#!/reviews/293667, I've created a separate notebook for preprocessing the field "addr:postcode" in the .osm file. First, we should find the range of its values.

In [3]:
# Import python libraries
import re
import numpy as np
import scipy
import json
import codecs
import matplotlib.pyplot as plt
import xml.etree.cElementTree as ET
%matplotlib inline
In [4]:
# Function for counting postcodes and collecting their unique values
def zip_codes(filename):
    count = 0
    data = set()

    for event, elem in ET.iterparse(filename, events=("start",)):
        if elem.tag == 'node' or elem.tag == 'way':
            for tag in elem.iter('tag'):
                if tag.attrib['k'] == "addr:postcode":
                    count += 1
                    data.add( tag.attrib['v'] )
                                     
    return count, data
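As a quick sanity check of this counting pattern, here is a self-contained sketch on a toy .osm fragment (the XML below is invented for illustration; iterating on "end" events guarantees an element's children are fully parsed before we inspect them):

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # the notebook uses the old cElementTree alias

def count_postcodes(filename):
    # Same idea as zip_codes above, but on "end" events so children are complete
    count, data = 0, set()
    for _, elem in ET.iterparse(filename, events=("end",)):
        if elem.tag in ("node", "way"):
            for tag in elem.iter("tag"):
                if tag.attrib["k"] == "addr:postcode":
                    count += 1
                    data.add(tag.attrib["v"])
    return count, data

# A toy OSM fragment (invented for illustration)
sample = b"""<osm>
  <node id="1"><tag k="addr:postcode" v="12345"/></node>
  <node id="2"><tag k="addr:postcode" v="12345"/></node>
  <way id="3"><tag k="addr:postcode" v="P.O. Box 9770"/></way>
</osm>"""

with tempfile.NamedTemporaryFile(suffix=".osm", delete=False) as f:
    f.write(sample)
    path = f.name

count, values = count_postcodes(path)
os.remove(path)
print(count, sorted(values))  # duplicates add to the count but not to the set
```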

The next step is to set up the working directory and files for preprocessing:

In [5]:
FILEDIR = "/Users/olgabelitskaya/large-repo/"
In [6]:
SAMPLE_FILE = FILEDIR + "sample_dubai_abu-dhabi.osm"
In [7]:
FILE = FILEDIR + "dubai_abu-dhabi.osm"
In [8]:
FILE0 = FILEDIR + "dubai_abu-dhabi0.osm"
In [9]:
JSON_FILE = FILEDIR + "dubai_abu-dhabi.osm.json"
In [10]:
JSON_FILE0 = FILEDIR + "dubai_abu-dhabi0.osm.json"

Applying the function 'zip_codes':

In [8]:
z = zip_codes(FILE)

The number of unique values:

In [33]:
len(z[1])
Out[33]:
96

The number of elements (nodes and ways) with postcodes:

In [9]:
z[0]
Out[9]:
116

Discovering problems in the data:

In [62]:
znp = np.array(sorted(z[1]))
print "All postcodes:", znp
expected = np.append(znp[3:65], znp[66:84])
print "Expected:", expected
unexpected0 = np.append(znp[:3], znp[84:])
unexpected = np.insert(unexpected0, 3, znp[65])
print "Unexpected:", unexpected
All postcodes: ['0' '0000' '000000' '000001' '00962' '00971' '103711' '108100' '1111'
 '111695' '113431' '114692' '115443' '119417' '11999' '121641' '1234'
 '12345' '1243' '125939' '128358' '16095' '20268' '20661' '20767' '2157'
 '22436' '23117' '231992' '232144' '232574' '24857' '24976' '2504' '25494'
 '2574' '26268' '263076' '2666' '277' '28676' '28818' '32923' '34121'
 '34238' '3541' '38126' '38495' '38575' '392189' '41318' '41974' '42324'
 '42524' '44263' '444786' '44548' '4599' '46477' '473828' '47602' '47612'
 '500368' '502227' '52799' '5280 dubai' '53577' '549' '57566' '60884'
 '64649' '6834' '71444' '7770' '77947' '7819' '79506' '811' '81730' '8845'
 '8988' '9292' '97717' '9978' 'Muhaisnah 4' 'P O BOX 3766'
 'P. O. Box 123234' 'P. O. Box 31166' 'P.O. Box 4605'
 'P.O. Box 5618, Abu Dhabi, U.A.E' 'P.O. Box 6446' 'P.O. Box 9770'
 'PO Box 114822' 'PO Box 118737' 'PO Box 43377' 'PO Box 6770']
Expected: ['000001' '00962' '00971' '103711' '108100' '1111' '111695' '113431'
 '114692' '115443' '119417' '11999' '121641' '1234' '12345' '1243' '125939'
 '128358' '16095' '20268' '20661' '20767' '2157' '22436' '23117' '231992'
 '232144' '232574' '24857' '24976' '2504' '25494' '2574' '26268' '263076'
 '2666' '277' '28676' '28818' '32923' '34121' '34238' '3541' '38126'
 '38495' '38575' '392189' '41318' '41974' '42324' '42524' '44263' '444786'
 '44548' '4599' '46477' '473828' '47602' '47612' '500368' '502227' '52799'
 '53577' '549' '57566' '60884' '64649' '6834' '71444' '7770' '77947' '7819'
 '79506' '811' '81730' '8845' '8988' '9292' '97717' '9978']
Unexpected: ['0' '0000' '000000' '5280 dubai' 'Muhaisnah 4' 'P O BOX 3766'
 'P. O. Box 123234' 'P. O. Box 31166' 'P.O. Box 4605'
 'P.O. Box 5618, Abu Dhabi, U.A.E' 'P.O. Box 6446' 'P.O. Box 9770'
 'PO Box 114822' 'PO Box 118737' 'PO Box 43377' 'PO Box 6770']
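The index slicing above (znp[3:65], znp[66:84]) is fragile: it has to be re-derived whenever the data changes. A rule-based split (a sketch, assuming a valid postcode here is all digits and not all zeros) reproduces the same partition:

```python
def is_expected(code):
    # Valid postcodes in this dataset are purely numeric and not a run of zeros
    return code.isdigit() and set(code) != {"0"}

# Abbreviated sample of the sorted values above
postcodes = ["0", "0000", "000000", "000001", "00962", "12345",
             "5280 dubai", "Muhaisnah 4", "P.O. Box 9770", "PO Box 43377"]

expected = [c for c in postcodes if is_expected(c)]
unexpected = [c for c in postcodes if not is_expected(c)]
print(unexpected)
```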

Mapping the incorrect values to their corrected forms:

In [11]:
correction = {'0': 'NA', '0000': 'NA', '000000': 'NA', '5280 dubai': '5280', 
           'Muhaisnah 4': 'NA', 'P O BOX 3766': '3766', 'P. O. Box 123234': '123234', 
           'P. O. Box 31166': '31166', 'P.O. Box 4605': '4605', 
           'P.O. Box 5618, Abu Dhabi, U.A.E': '5618', 'P.O. Box 6446': '6446', 
           'P.O. Box 9770': '9770', 'PO Box 114822': '114822', 'PO Box 118737': '118737', 
           'PO Box 43377': '43377', 'PO Box 6770': '6770'}
correction
Out[11]:
{'0': 'NA',
 '0000': 'NA',
 '000000': 'NA',
 '5280 dubai': '5280',
 'Muhaisnah 4': 'NA',
 'P O BOX 3766': '3766',
 'P. O. Box 123234': '123234',
 'P. O. Box 31166': '31166',
 'P.O. Box 4605': '4605',
 'P.O. Box 5618, Abu Dhabi, U.A.E': '5618',
 'P.O. Box 6446': '6446',
 'P.O. Box 9770': '9770',
 'PO Box 114822': '114822',
 'PO Box 118737': '118737',
 'PO Box 43377': '43377',
 'PO Box 6770': '6770'}
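Enumerating every bad value by hand works for 16 cases, but the same mapping can largely be generated with a regular expression that strips any "P.O. Box" prefix and rejects all-zero or non-numeric values. This is a sketch, not part of the original notebook:

```python
import re

# Optional "P O BOX" / "P.O. Box" / "PO Box" prefix, then the digits we keep
BOX_RE = re.compile(r"(?:P\.?\s*O\.?\s*BOX\s+)?(\d+)", re.IGNORECASE)

def normalize_postcode(value):
    m = BOX_RE.match(value)
    if m and set(m.group(1)) != {"0"}:
        return m.group(1)
    return "NA"  # all-zero codes and free text like 'Muhaisnah 4'

# The manual mapping from above, used to check the rule
correction = {'0': 'NA', '0000': 'NA', '000000': 'NA', '5280 dubai': '5280',
              'Muhaisnah 4': 'NA', 'P O BOX 3766': '3766',
              'P. O. Box 123234': '123234', 'P. O. Box 31166': '31166',
              'P.O. Box 4605': '4605',
              'P.O. Box 5618, Abu Dhabi, U.A.E': '5618',
              'P.O. Box 6446': '6446', 'P.O. Box 9770': '9770',
              'PO Box 114822': '114822', 'PO Box 118737': '118737',
              'PO Box 43377': '43377', 'PO Box 6770': '6770'}
generated = {k: normalize_postcode(k) for k in correction}
print(generated == correction)
```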

Creating the function for updating values:

In [14]:
# Function for updating values
def update_name(name, correction):
    if name not in correction:
        raise Exception(name)
    updated_name = correction[name]
    if not updated_name:
        raise Exception(name)
    return updated_name
In [44]:
# Check the function
update_name('PO Box 43377', correction)
Out[44]:
'43377'

Let's create a list of dictionaries from the .json file:

In [109]:
DICT = []
for line in open(JSON_FILE, 'r'):
    DICT.append(json.loads(line))
In [114]:
len(DICT)
Out[114]:
2124505
In [146]:
DICT[1200]
Out[146]:
{u'created': {u'changeset': u'20943519',
  u'timestamp': u'2014-03-06T05:32:37Z',
  u'uid': u'1770239',
  u'user': u'Jennings Anderson',
  u'version': u'5'},
 u'id': u'31475480',
 u'pos': [25.1527723, 55.1958039],
 u'source': u'Bing',
 u'type': u'node'}

Now we can apply the function for updating and check the results.

In [188]:
# Apply the function 'update_name'
for i in range(len(DICT)):
    if DICT[i].get('address') is not None:
        if DICT[i]['address'].get('postcode') is not None:
            value = DICT[i]['address']['postcode']
            if value in unexpected:
                DICT[i]['address']['postcode'] = update_name(value, correction)
In [190]:
# Check the correction of the postcodes
postcodes = []
for element in DICT:
    address = element.get('address')
    if address is not None:
        postcode = address.get('postcode')
        if postcode is not None:
            postcodes.append(postcode)
print postcodes
[u'115443', u'34121', u'811', u'811', u'42524', u'473828', u'473828', u'473828', u'473828', u'3766', u'549', u'6656', u'24976', u'6834', u'500368', u'2666', u'47602', u'232574', u'32923', u'9292', u'444786', u'125939', u'128358', u'119417', u'113431', u'77947', u'41318', u'38495', u'1243', u'28676', u'5618', u'121641', u'42324', u'20268', u'2157', u'5280', u'111695', u'53577', u'53577', u'22436', u'232144', u'81730', u'23117', u'44548', u'47612', u'24857', u'97717', u'60884', u'2574', u'57566', u'NA', u'114692', u'7770', u'20661', u'NA', u'392189', u'46477', u'38575', u'NA', u'4599', u'38126', u'231992', u'103711', u'103711', u'00971', u'7819', u'2504', u'26268', u'64649', u'00962', u'NA', u'NA', u'1234', u'4758', u'44263', u'263076', u'9978', u'71444', u'12345', u'79506', u'108100', u'8988', u'123234', u'125939', u'118737', u'28818', u'31166', u'6770', u'3541', u'114822', u'9770', u'1111', u'811', u'25494', u'41974', u'811', u'811', u'43377', u'16095', u'6834', u'000001', u'8845', u'502227', u'6446', u'52799', u'277', u'20268', u'4605', u'24857', u'20767', u'34238', u'22436', u'22436', u'23117', u'71444', u'24857', u'11999', u'11999']

Finally, let's create a new .json file, insert it into a MongoDB collection, and compare the results.

In [192]:
# Create new file
with open(FILEDIR + "dubai_abu-dhabi_postcode.osm.json", 'w') as f:
    for line in DICT:
        json.dump(line, f)
        f.write('\n')
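One JSON document per line is exactly the format that jsonlite's stream_in (used below) and mongoimport expect. A minimal round-trip sketch, using toy records invented for illustration, confirms such a file can be read back line by line:

```python
import json
import os
import tempfile

# Toy records shaped like the shaped OSM documents (invented for illustration)
docs = [{"id": "1", "address": {"postcode": "5280"}},
        {"id": "2", "type": "node"}]

# Write one JSON document per line
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    for doc in docs:
        json.dump(doc, f)
        f.write("\n")
    path = f.name

# Read the file back, one document per line
with open(path) as f:
    restored = [json.loads(line) for line in f]
os.remove(path)
print(restored == docs)
```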
In [218]:
%load_ext rpy2.ipython
In [219]:
%R m <- mongo("openstreetmap_correct", verbose = FALSE)
In [220]:
%R stream_in(file("/Users/olgabelitskaya/large-repo/dubai_abu-dhabi_postcode.osm.json"), 
             handler = function(df){m$insert(df)})
In [225]:
# Open the collections before and after correction
from pymongo import MongoClient
client = MongoClient('localhost:27017')
database = client['test']
dubai_abu_dhabi = database['openstreetmap']
dubai_abu_dhabi_correct = database['openstreetmap_correct']
In [226]:
# Create a list of zipcodes before correction
zipcodes = dubai_abu_dhabi.aggregate( [ 
    { "$match" : { "address.postcode" : { "$exists" : 1} } }, 
    { "$group" : { "_id" : "$address.postcode", "count" : { "$sum" : 1} } },  
    { "$sort" : { "count" : -1}}
] )
list(zipcodes)
Out[226]:
[{u'_id': u'811', u'count': 5},
 {u'_id': u'473828', u'count': 4},
 {u'_id': u'22436', u'count': 3},
 {u'_id': u'24857', u'count': 3},
 {u'_id': u'11999', u'count': 2},
 {u'_id': u'125939', u'count': 2},
 {u'_id': u'000000', u'count': 2},
 {u'_id': u'71444', u'count': 2},
 {u'_id': u'23117', u'count': 2},
 {u'_id': u'20268', u'count': 2},
 {u'_id': u'53577', u'count': 2},
 {u'_id': u'103711', u'count': 2},
 {u'_id': u'6834', u'count': 2},
 {u'_id': u'20767', u'count': 1},
 {u'_id': u'277', u'count': 1},
 {u'_id': u'00971', u'count': 1},
 {u'_id': u'502227', u'count': 1},
 {u'_id': u'00962', u'count': 1},
 {u'_id': u'16095', u'count': 1},
 {u'_id': u'25494', u'count': 1},
 {u'_id': u'P O BOX 3766', u'count': 1},
 {u'_id': u'P.O. Box 9770', u'count': 1},
 {u'_id': u'PO Box 114822', u'count': 1},
 {u'_id': u'34238', u'count': 1},
 {u'_id': u'PO Box 6770', u'count': 1},
 {u'_id': u'119417', u'count': 1},
 {u'_id': u'231992', u'count': 1},
 {u'_id': u'108100', u'count': 1},
 {u'_id': u'8845', u'count': 1},
 {u'_id': u'4758', u'count': 1},
 {u'_id': u'115443', u'count': 1},
 {u'_id': u'64649', u'count': 1},
 {u'_id': u'44263', u'count': 1},
 {u'_id': u'46477', u'count': 1},
 {u'_id': u'26268', u'count': 1},
 {u'_id': u'28676', u'count': 1},
 {u'_id': u'2504', u'count': 1},
 {u'_id': u'7819', u'count': 1},
 {u'_id': u'38126', u'count': 1},
 {u'_id': u'44548', u'count': 1},
 {u'_id': u'9978', u'count': 1},
 {u'_id': u'4599', u'count': 1},
 {u'_id': u'1234', u'count': 1},
 {u'_id': u'20661', u'count': 1},
 {u'_id': u'2574', u'count': 1},
 {u'_id': u'3541', u'count': 1},
 {u'_id': u'47602', u'count': 1},
 {u'_id': u'81730', u'count': 1},
 {u'_id': u'57566', u'count': 1},
 {u'_id': u'121641', u'count': 1},
 {u'_id': u'111695', u'count': 1},
 {u'_id': u'0000', u'count': 1},
 {u'_id': u'5280 dubai', u'count': 1},
 {u'_id': u'47612', u'count': 1},
 {u'_id': u'32923', u'count': 1},
 {u'_id': u'2157', u'count': 1},
 {u'_id': u'12345', u'count': 1},
 {u'_id': u'60884', u'count': 1},
 {u'_id': u'0', u'count': 1},
 {u'_id': u'113431', u'count': 1},
 {u'_id': u'42324', u'count': 1},
 {u'_id': u'8988', u'count': 1},
 {u'_id': u'1111', u'count': 1},
 {u'_id': u'38495', u'count': 1},
 {u'_id': u'P.O. Box 4605', u'count': 1},
 {u'_id': u'41318', u'count': 1},
 {u'_id': u'263076', u'count': 1},
 {u'_id': u'PO Box 118737', u'count': 1},
 {u'_id': u'128358', u'count': 1},
 {u'_id': u'41974', u'count': 1},
 {u'_id': u'77947', u'count': 1},
 {u'_id': u'2666', u'count': 1},
 {u'_id': u'232574', u'count': 1},
 {u'_id': u'9292', u'count': 1},
 {u'_id': u'28818', u'count': 1},
 {u'_id': u'Muhaisnah 4', u'count': 1},
 {u'_id': u'24976', u'count': 1},
 {u'_id': u'P.O. Box 5618, Abu Dhabi, U.A.E', u'count': 1},
 {u'_id': u'38575', u'count': 1},
 {u'_id': u'232144', u'count': 1},
 {u'_id': u'000001', u'count': 1},
 {u'_id': u'97717', u'count': 1},
 {u'_id': u'PO Box 43377', u'count': 1},
 {u'_id': u'549', u'count': 1},
 {u'_id': u'P.O. Box 6446', u'count': 1},
 {u'_id': u'42524', u'count': 1},
 {u'_id': u'P. O. Box 123234', u'count': 1},
 {u'_id': u'500368', u'count': 1},
 {u'_id': u'1243', u'count': 1},
 {u'_id': u'P. O. Box 31166', u'count': 1},
 {u'_id': u'7770', u'count': 1},
 {u'_id': u'6656', u'count': 1},
 {u'_id': u'444786', u'count': 1},
 {u'_id': u'79506', u'count': 1},
 {u'_id': u'114692', u'count': 1},
 {u'_id': u'392189', u'count': 1},
 {u'_id': u'52799', u'count': 1},
 {u'_id': u'34121', u'count': 1}]
In [227]:
# Create a list of zipcodes after correction
correct_zipcodes = dubai_abu_dhabi_correct.aggregate( [ 
    { "$match" : { "address.postcode" : { "$exists" : 1} } }, 
    { "$group" : { "_id" : "$address.postcode", "count" : { "$sum" : 1} } },  
    { "$sort" : { "count" : -1}}
] )
list(correct_zipcodes)
Out[227]:
[{u'_id': u'811', u'count': 5},
 {u'_id': u'473828', u'count': 4},
 {u'_id': u'NA', u'count': 3},
 {u'_id': u'22436', u'count': 3},
 {u'_id': u'24857', u'count': 3},
 {u'_id': u'71444', u'count': 2},
 {u'_id': u'11999', u'count': 2},
 {u'_id': u'23117', u'count': 2},
 {u'_id': u'20268', u'count': 2},
 {u'_id': u'53577', u'count': 2},
 {u'_id': u'103711', u'count': 2},
 {u'_id': u'125939', u'count': 2},
 {u'_id': u'6834', u'count': 2},
 {u'_id': u'34238', u'count': 1},
 {u'_id': u'20767', u'count': 1},
 {u'_id': u'277', u'count': 1},
 {u'_id': u'502227', u'count': 1},
 {u'_id': u'16095', u'count': 1},
 {u'_id': u'25494', u'count': 1},
 {u'_id': u'9770', u'count': 1},
 {u'_id': u'114822', u'count': 1},
 {u'_id': u'31166', u'count': 1},
 {u'_id': u'108100', u'count': 1},
 {u'_id': u'28818', u'count': 1},
 {u'_id': u'12345', u'count': 1},
 {u'_id': u'8845', u'count': 1},
 {u'_id': u'4758', u'count': 1},
 {u'_id': u'00962', u'count': 1},
 {u'_id': u'64649', u'count': 1},
 {u'_id': u'44263', u'count': 1},
 {u'_id': u'26268', u'count': 1},
 {u'_id': u'2504', u'count': 1},
 {u'_id': u'7819', u'count': 1},
 {u'_id': u'38126', u'count': 1},
 {u'_id': u'9978', u'count': 1},
 {u'_id': u'4599', u'count': 1},
 {u'_id': u'46477', u'count': 1},
 {u'_id': u'392189', u'count': 1},
 {u'_id': u'20661', u'count': 1},
 {u'_id': u'231992', u'count': 1},
 {u'_id': u'2574', u'count': 1},
 {u'_id': u'3541', u'count': 1},
 {u'_id': u'81730', u'count': 1},
 {u'_id': u'57566', u'count': 1},
 {u'_id': u'6446', u'count': 1},
 {u'_id': u'43377', u'count': 1},
 {u'_id': u'111695', u'count': 1},
 {u'_id': u'5280', u'count': 1},
 {u'_id': u'47612', u'count': 1},
 {u'_id': u'32923', u'count': 1},
 {u'_id': u'44548', u'count': 1},
 {u'_id': u'2157', u'count': 1},
 {u'_id': u'60884', u'count': 1},
 {u'_id': u'113431', u'count': 1},
 {u'_id': u'42324', u'count': 1},
 {u'_id': u'118737', u'count': 1},
 {u'_id': u'3766', u'count': 1},
 {u'_id': u'121641', u'count': 1},
 {u'_id': u'8988', u'count': 1},
 {u'_id': u'38495', u'count': 1},
 {u'_id': u'41318', u'count': 1},
 {u'_id': u'263076', u'count': 1},
 {u'_id': u'128358', u'count': 1},
 {u'_id': u'4605', u'count': 1},
 {u'_id': u'41974', u'count': 1},
 {u'_id': u'77947', u'count': 1},
 {u'_id': u'2666', u'count': 1},
 {u'_id': u'232574', u'count': 1},
 {u'_id': u'9292', u'count': 1},
 {u'_id': u'24976', u'count': 1},
 {u'_id': u'1111', u'count': 1},
 {u'_id': u'28676', u'count': 1},
 {u'_id': u'38575', u'count': 1},
 {u'_id': u'119417', u'count': 1},
 {u'_id': u'232144', u'count': 1},
 {u'_id': u'00971', u'count': 1},
 {u'_id': u'5618', u'count': 1},
 {u'_id': u'79506', u'count': 1},
 {u'_id': u'114692', u'count': 1},
 {u'_id': u'000001', u'count': 1},
 {u'_id': u'97717', u'count': 1},
 {u'_id': u'123234', u'count': 1},
 {u'_id': u'549', u'count': 1},
 {u'_id': u'42524', u'count': 1},
 {u'_id': u'47602', u'count': 1},
 {u'_id': u'1234', u'count': 1},
 {u'_id': u'500368', u'count': 1},
 {u'_id': u'1243', u'count': 1},
 {u'_id': u'7770', u'count': 1},
 {u'_id': u'6656', u'count': 1},
 {u'_id': u'444786', u'count': 1},
 {u'_id': u'115443', u'count': 1},
 {u'_id': u'52799', u'count': 1},
 {u'_id': u'6770', u'count': 1},
 {u'_id': u'34121', u'count': 1}]

The correction is successful.
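The group-and-count that the aggregation pipeline performs ($match, $group, $sort) can also be reproduced in pure Python with collections.Counter; a sketch on toy documents (invented for illustration):

```python
from collections import Counter

docs = [
    {"address": {"postcode": "811"}},
    {"address": {"postcode": "811"}},
    {"address": {"postcode": "NA"}},
    {"type": "node"},  # no address: excluded, like the $match stage
]

# $match on address.postcode plus $group with {"$sum": 1}
counts = Counter(d["address"]["postcode"] for d in docs
                 if "address" in d and "postcode" in d["address"])
# $sort by count, descending
top = counts.most_common()
print(top)
```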

An alternative way to update the zip codes is to modify the script that converts the .osm format into JSON:

In [10]:
z0 = zip_codes(FILE0)
In [11]:
z0[0]
Out[11]:
119
In [19]:
znp0 = np.array(sorted(z0[1]))
print "All postcodes:", znp0
expected0 = np.append(znp0[3:67], znp0[68:87])
print "Expected:", expected0
unexpected00 = np.append(znp0[:3], znp0[87:])
unexpected0 = np.insert(unexpected00, 3, znp0[67])
print "Unexpected:", unexpected0
All postcodes: ['0' '0000' '000000' '000001' '00962' '00971' '103711' '108100' '1111'
 '111695' '113431' '114692' '115443' '119417' '11999' '121641' '1234'
 '12345' '1243' '125939' '128358' '16095' '20268' '20661' '20767' '2157'
 '22436' '23117' '231992' '232144' '24857' '24976' '2504' '25494' '2574'
 '26268' '263076' '2666' '277' '28676' '28818' '32923' '33500' '34121'
 '34238' '35004' '3541' '38126' '38495' '38575' '392189' '41318' '41974'
 '42324' '42524' '44263' '444786' '44548' '4599' '46477' '473828' '4758'
 '47602' '47612' '500368' '502227' '52799' '5280 dubai' '53577' '549'
 '57566' '60884' '64649' '6656' '6834' '71444' '7770' '77947' '7819'
 '79506' '811' '81730' '8845' '8988' '9292' '97717' '9978' 'Muhaisnah 4'
 'P O BOX 3766' 'P. O. Box 123234' 'P. O. Box 31166' 'P.O. Box 4605'
 'P.O. Box 5618, Abu Dhabi, U.A.E' 'P.O. Box 6446' 'P.O. Box 9770'
 'PO Box 114822' 'PO Box 118737' 'PO Box 43377' 'PO Box 6770']
Expected: ['000001' '00962' '00971' '103711' '108100' '1111' '111695' '113431'
 '114692' '115443' '119417' '11999' '121641' '1234' '12345' '1243' '125939'
 '128358' '16095' '20268' '20661' '20767' '2157' '22436' '23117' '231992'
 '232144' '24857' '24976' '2504' '25494' '2574' '26268' '263076' '2666'
 '277' '28676' '28818' '32923' '33500' '34121' '34238' '35004' '3541'
 '38126' '38495' '38575' '392189' '41318' '41974' '42324' '42524' '44263'
 '444786' '44548' '4599' '46477' '473828' '4758' '47602' '47612' '500368'
 '502227' '52799' '53577' '549' '57566' '60884' '64649' '6656' '6834'
 '71444' '7770' '77947' '7819' '79506' '811' '81730' '8845' '8988' '9292'
 '97717' '9978']
Unexpected: ['0' '0000' '000000' '5280 dubai' 'Muhaisnah 4' 'P O BOX 3766'
 'P. O. Box 123234' 'P. O. Box 31166' 'P.O. Box 4605'
 'P.O. Box 5618, Abu Dhabi, U.A.E' 'P.O. Box 6446' 'P.O. Box 9770'
 'PO Box 114822' 'PO Box 118737' 'PO Box 43377' 'PO Box 6770']
In [30]:
'P.O. Box 9770' in correction.keys()
Out[30]:
True
In [31]:
correction['P.O. Box 9770']
Out[31]:
'9770'
In [15]:
# osm_json_correct.py

# Strings with chars that will cause problems as keys
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')

# Function for creating nodes
def shape_element(element):
# Create the empty dictionary for the data in the osm string
    node = {}
    if element.tag == "node" or element.tag == "way":
        # Create the empty dictionary for the 'address' attributes and the list for the 'nd' attribute
        address = {}
        nd = []
        # Add the type and the id of the element
        node["type"] = element.tag
        node["id"] = element.attrib["id"]
        # Add the tag 'visible'
        if "visible" in element.attrib.keys():
            node["visible"] = element.attrib["visible"]
        # Add the geoposition
        if "lat" in element.attrib.keys():
            node["pos"] = [float(element.attrib['lat']), float(element.attrib['lon'])]
        # Add the set of the attributes
        node["created"] = {"version": element.attrib['version'],
                            "changeset": element.attrib['changeset'],
                            "timestamp": element.attrib['timestamp'],
                            "uid": element.attrib['uid'],
                            "user": element.attrib['user']}
        # Analyze the problem characters and add the address attributes
        for tag in element.iter("tag"):
            p = problemchars.search(tag.attrib['k'])
            if p:
                print "problemchars: ", p.group()
                continue
            elif tag.attrib['k'][:5] == "addr:":
                if ":" in tag.attrib['k'][5:]:
                    continue
                else:
                    # Correct the postcodes
                    if tag.attrib['k'] == "addr:postcode":
                        if tag.attrib['v'] in correction.keys():
                            address[tag.attrib['k'][5:]] = update_name(tag.attrib['v'], 
                                                                       correction)
                        else:
                            address[tag.attrib['k'][5:]] = tag.attrib['v']
                    else:
                        address[tag.attrib['k'][5:]] = tag.attrib['v']
            else:
                node[tag.attrib['k']] = tag.attrib['v']
        if address != {}:
            node['address'] = address
        # Add the 'node_ref' attribute
        for tag2 in element.iter("nd"):
            nd.append(tag2.attrib['ref'])
        if nd != []:
            node['node_refs'] = nd
        return node
    # Skip elements other than 'node' and 'way'
    else:
        return None
    
# Function for creating the .json file
def process_map(file_in, pretty = False):
    # Setup the format for output files
    file_out = "{0}.json".format(file_in)
    # Create the empty data
    data = []
    # Open the osm file and read strings
    with codecs.open(file_out, "w") as fo:
        for _, element in ET.iterparse(file_in):
            # Apply the created function 'shape_element'
            el = shape_element(element)
            if el:
                data.append(el)
                # Write the element into the json file
                if pretty:
                    fo.write(json.dumps(el, indent=2)+"\n")
                else:
                    fo.write(json.dumps(el) + "\n")
    return data
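For a full metro extract, iterparse keeps every parsed element in memory unless it is explicitly cleared. A common pattern (a sketch, not the original script) is to clear each top-level element once it has been shaped:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def iter_osm_elements(file_in):
    # Yield finished 'node'/'way' elements, then free them
    for _, elem in ET.iterparse(file_in, events=("end",)):
        if elem.tag in ("node", "way"):
            yield elem
            elem.clear()  # keeps memory flat on multi-GB files

# Tiny demo file (invented for illustration)
sample = b'<osm><node id="1"/><way id="2"/><relation id="3"/></osm>'
with tempfile.NamedTemporaryFile(suffix=".osm", delete=False) as f:
    f.write(sample)
    path = f.name

ids = [e.attrib["id"] for e in iter_osm_elements(path)]
os.remove(path)
print(ids)
```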
In [16]:
# Create a json file
DATA0 = process_map(FILE0)
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:   
problemchars:  .
problemchars:  .
problemchars:  .
problemchars:  .
In [17]:
# Check the correction of the postcodes
postcodes0 = []
for element in DATA0:
    address = element.get('address')
    if address is not None:
        postcode = address.get('postcode')
        if postcode is not None:
            postcodes0.append(postcode)
print postcodes0
['115443', '34121', '811', '811', '42524', '473828', '473828', '473828', '473828', '3766', '549', '6656', '24976', '6834', '500368', '2666', '47602', '232574', '32923', '9292', '444786', '125939', '128358', '119417', '113431', '77947', '41318', '38495', '1243', '28676', '5618', '121641', '42324', '20268', '2157', '5280', '111695', '53577', '53577', '22436', '232144', '81730', '23117', '44548', '47612', '24857', '97717', '60884', '2574', '57566', 'NA', '114692', '7770', '20661', 'NA', '392189', '46477', '38575', 'NA', '4599', '38126', '231992', '103711', '103711', '00971', '7819', '2504', '26268', '64649', '00962', 'NA', 'NA', '1234', '4758', '44263', '263076', '9978', '71444', '12345', '79506', '108100', '8988', '35004', '33500', '123234', '125939', '118737', '28818', '31166', '6770', '3541', '114822', '9770', '1111', '811', '25494', '41974', '811', '811', '43377', '16095', '6834', '000001', '8845', '502227', '6446', '52799', '277', '20268', '4605', '24857', '20767', '34238', '22436', '22436', '23117', '71444', '24857', '11999', '11999']

The correction is successful as well.