P3: OpenStreetMap Data Case Study. Dubai and Abu-Dhabi.

0. Code Resources

0.1. Code Library
In [ ]:
from IPython.core.display import display, HTML
In [2]:
import matplotlib
In [3]:
import matplotlib.pyplot as plt
In [4]:
%matplotlib inline
In [5]:
import folium
In [6]:
import geopandas as gpd
In [7]:
from mpl_toolkits.basemap import Basemap  # needed by the shapefile map in § 1.7
In [8]:
import xml.etree.cElementTree as ET
In [9]:
from collections import defaultdict as dfdict
In [10]:
import numpy as np
In [11]:
import pandas as pd
In [12]:
import pprint
In [13]:
import urllib
In [14]:
import re
In [15]:
import os
In [16]:
import csv
In [17]:
import cerberus
In [18]:
import json
In [19]:
import codecs
In [20]:
from schema import Schema
In [1]:
import sqlite3
In [22]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
In [23]:
from odo import odo, discover, resource
In [24]:
hide_code = ''
HTML('''<script>
code_show = true; 
function code_display() {
  if (code_show) {
    $('div.input').each(function(id) {
      if (id == 0 || $(this).html().indexOf('hide_code') > -1) {
        $(this).hide();
      }
    });
    $('div.output_prompt').css('opacity', 0);
  } else {
    $('div.input').each(function(id) {
      $(this).show();
    });
    $('div.output_prompt').css('opacity', 1);
  }
  code_show = !code_show
} 
$( document ).ready(code_display);
</script>
<form action="javascript: code_display()"><input style="opacity: 100" type="submit" 
value="Click to show or to hide code cells"></form>''')
Out[24]:
To show or hide the code cells, use the "Click to show or to hide code cells" button at the top of the notebook.
0.3. Code for Researching the Imported Files and Creating the Data.

Code snippets from the Udacity courses "Intro to Relational Databases" and "SQL for Data Analysis" (udacity.com) have been used here.

In [25]:
hide_code
# Function for counting tags
def count_tags(filename):
    count = dfdict(int)
    for item in ET.iterparse(filename):
        count[item[1].tag] += 1
    return count
In [26]:
hide_code
# Functions for counting users
def get_user(element):
    # Return the 'user' attribute, or None if the element does not have one
    return element.attrib.get('user')

def process_map_users(filename):
    users = set()
    for _, element in ET.iterparse(filename):
        if element.tag in ('node', 'way', 'relation'):
            user = get_user(element)
            if user is not None:
                users.add(user)

    return users
In [27]:
hide_code
# Strings containing lower case chars
lower = re.compile(r'^([a-z]|_)*$') 
# Strings with lower case chars and a ':'
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
# Strings with chars that will cause problems as keys. 
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')  

# Function for sorting by key type
def key_type(element, keys):
    if element.tag == "tag":
        key = element.attrib['k']
        if lower.search(key) is not None:
            keys['lower'] += 1
        elif lower_colon.search(key) is not None:
            keys['lower_colon'] += 1
        elif problemchars.search(key) is not None:
            keys['problemchars'] += 1
        else:
            keys['other'] += 1

    return keys

# Function for counting keys by type
def process_map_keys(filename):
    keys = {"lower": 0, "lower_colon": 0, "problemchars": 0, "other": 0}
    for _, element in ET.iterparse(filename):
        keys = key_type(element, keys)

    return keys
In [28]:
hide_code
# Function for counting street addresses
def street_number(file_name):
    count = 0

    for event, elem in ET.iterparse(file_name, events=("start",)):
        if elem.tag == 'node' or elem.tag == 'way':
            for tag in elem.iter('tag'):
                if tag.attrib['k'] == "addr:street":
                    count += 1
    return count
In [29]:
hide_code
# Function for counting zip codes
def zip_codes(filename):
    count = 0
    data = set()

    for event, elem in ET.iterparse(filename, events=("start",)):
        if elem.tag == 'node' or elem.tag == 'way':
            for tag in elem.iter('tag'):
                if tag.attrib['k'] == "addr:postcode":
                    count += 1
                    data.add( tag.attrib['v'] )
                                     
    return count, data
In [30]:
hide_code
# Functions for auditing zip codes.
expected=[]

def audit_postcode_range(postcode,tag):
    if tag.attrib["v"] not in expected:
        if tag.attrib["v"] not in postcode:
            postcode[tag.attrib["v"]]=1
        else:
            postcode[tag.attrib["v"]]+=1    
            
def is_postcode(elem):
    return (elem.attrib['k'] == "addr:postcode")

def process_map_postcodes(filename):
    postcode={}
    osm_file = open(filename, "r")
    for event, elem in ET.iterparse(osm_file, events=("start",)):
        if elem.tag == "node" or elem.tag == "way":
            for tag in elem.iter("tag"):
                if is_postcode(tag):
                    audit_postcode_range(postcode,tag)
    return postcode   
In [31]:
hide_code
# Function for counting and collecting English names
def process_map_names(filename):
    count = 0
    data = set()

    for event, elem in ET.iterparse(filename, events=("start",)):
        if elem.tag == 'node' or elem.tag == 'way':
            for tag in elem.iter('tag'):
                if tag.attrib['k'] == "name:en":
                    count += 1
                    data.add( tag.attrib['v'] )
                                     
    return count, data 
In [32]:
hide_code
# Function for sorting by place
def place_type(element, places):
    if element.tag == "node":
        for tag in element.iter('tag'):
            if tag.attrib['k'] == 'place':
                value = tag.attrib['v']
                if value in ('city', 'town', 'village', 'hamlet', 'island'):
                    places[value] += 1
                else:
                    places['other'] += 1
    return places

# Function for counting places by type.
def process_map_places(filename):
    places = {"city": 0, "town": 0, "island" : 0, "village": 0, "hamlet" : 0, "other": 0}
    for _, element in ET.iterparse(filename):
        places = place_type(element, places)

    return places
In [33]:
hide_code
# Functions for auditing street names.
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)

expected = ["Avenue", "Boulevard", "Commons", "Court", 
            "Drive", "Lane", "Parkway", "Place", 
            "Road", "Square", "Street", "Trail"]

mapping = {'Ave'  : 'Avenue',
           'Blvd' : 'Boulevard',
           'Dr'   : 'Drive',
           'Ln'   : 'Lane',
           'Pkwy' : 'Parkway',
           'ROAD' : 'Road',
           'Rd'   : 'Road',
           'Rd.'  : 'Road',
           'road' : 'Road',
           'rd'   : 'Road',
           'STREET' : 'Street',
           'St.'  : 'Street',
           'st.'  : 'Street',
           'St'   : 'Street',
           'st'   : 'Street',
           'street' :"Street",
           'Ct'   : "Court",
           'Cir'  : "Circle",
           'Cr'   : "Court",
           'ave'  : 'Avenue',
           'Hwg'  : 'Highway',
           'Hwy'  : 'Highway',
           'Sq'   : "Square"}

def audit_street_type(street_types, street_name):
    m = street_type_re.search(street_name)
    if m:
        street_type = m.group()
        if street_type not in expected:
            street_types[street_type].add(street_name)

def is_street_name(elem):
    return (elem.attrib['k'] == "addr:street")

def audit_street_names(filename):
    osm_file = open(filename, "r")
    street_types = dfdict(set)
    for event, elem in ET.iterparse(osm_file, events=("start",)):

        if elem.tag == "node" or elem.tag == "way":
            for tag in elem.iter("tag"):
                if is_street_name(tag):
                    audit_street_type(street_types, tag.attrib['v'])

    return street_types
In [34]:
hide_code
# Functions for updating street names
def update_name(name, mapping, regex):
    m = regex.search(name)
    if m:
        street_type = m.group()
        if street_type in mapping:
            name = re.sub(regex, mapping[street_type], name)

    return name
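A minimal sketch of how update_name behaves on a few street names. The regular expression and a subset of the mapping are copied from the cells above; the sample street names are hypothetical and are not taken from the dataset. The sketch avoids print statements so that it runs under both Python 2 and Python 3.

```python
import re

# Same pattern as street_type_re above: the last whitespace-free token of a name
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)

# Subset of the mapping defined above
mapping = {'Rd': 'Road', 'Rd.': 'Road', 'St': 'Street', 'St.': 'Street', 'Ave': 'Avenue'}

def update_name(name, mapping, regex):
    """Replace an abbreviated street type at the end of the name."""
    m = regex.search(name)
    if m:
        street_type = m.group()
        if street_type in mapping:
            name = regex.sub(mapping[street_type], name)
    return name

# Hypothetical sample names (assumed for illustration only)
samples = ['Al Wasl Rd', 'Jumeirah Beach Rd.', 'Hamdan St', 'Airport Road']
cleaned = [update_name(s, mapping, street_type_re) for s in samples]
```

Names whose last token is not a key of the mapping, such as 'Airport Road', are returned unchanged.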
In [35]:
hide_code
# Functions for creating the sample file
def get_element(osm_file, tags=('node', 'way', 'relation')):
    """Yield element if it is the right type of tag

    Reference:
    http://stackoverflow.com/questions/3095434/inserting-newlines-in-xml-file-generated-via-xml-etree-elementtree-in-python
    """
    context = iter(ET.iterparse(osm_file, events=('start', 'end')))
    _, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag in tags:
            yield elem
            root.clear()
In [36]:
hide_code
# Strings containing lower case chars
lower = re.compile(r'^([a-z]|_)*$')
# Strings with lower case chars and a ':'
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
# Strings with chars that will cause problems as keys
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')

CREATED = [ "version", "changeset", "timestamp", "user", "uid"]

# Function for creating nodes
def shape_element1(element):
    node = {}
    if element.tag == "node" or element.tag == "way":
        address = {}
        nd = []
        node["type"] = element.tag
        node["id"] = element.attrib["id"]
        if "visible" in element.attrib.keys():
            node["visible"] = element.attrib["visible"]
        if "lat" in element.attrib.keys():
            node["pos"] = [float(element.attrib['lat']), float(element.attrib['lon'])]
        node["created"] = {"version": element.attrib['version'],
                            "changeset": element.attrib['changeset'],
                            "timestamp": element.attrib['timestamp'],
                            "uid": element.attrib['uid'],
                            "user": element.attrib['user']}
        for tag in element.iter("tag"):
            problems = problemchars.search(tag.attrib['k'])
            if problems:
                print "problemchars: ", problems.group()
                continue
            elif tag.attrib['k'][:5] == "addr:":
                if ":" in tag.attrib['k'][5:]:
                    continue
                else:
                    address[tag.attrib['k'][5:]] = tag.attrib['v']
            else:
                node[tag.attrib['k']] = tag.attrib['v']
        if address != {}:
            node['address'] = address
        for tag2 in element.iter("nd"):
            nd.append(tag2.attrib['ref'])
        if nd != []:
            node['node_refs'] = nd
        return node
    else:
        return None
    
# Function for creating the .json file
def process_map1(file_in, pretty = False):
    file_out = "{0}.json".format(file_in)
    data = []
    with codecs.open(file_out, "w") as fo:
        for _, element in ET.iterparse(file_in):
            el = shape_element1(element)
            if el:
                data.append(el)
                if pretty:
                    fo.write(json.dumps(el, indent=2)+"\n")
                else:
                    fo.write(json.dumps(el) + "\n")
    return data
In [37]:
hide_code
import schema
SCHEMA = schema.Schema
In [38]:
hide_code
NODES_PATH = "nodes.csv"
NODE_TAGS_PATH = "nodes_tags.csv"
WAYS_PATH = "ways.csv"
WAY_NODES_PATH = "ways_nodes.csv"
WAY_TAGS_PATH = "ways_tags.csv"

LOWER_COLON = re.compile(r'^([a-z]|_)+:([a-z]|_)+')
PROBLEMCHARS = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')

# Make sure the fields order in the csvs matches the column order in the sql table schema
NODE_FIELDS = ['id', 'lat', 'lon', 'user', 'uid', 'version', 'changeset', 'timestamp']
NODE_TAGS_FIELDS = ['id', 'key', 'value', 'type']
WAY_FIELDS = ['id', 'user', 'uid', 'version', 'changeset', 'timestamp']
WAY_TAGS_FIELDS = ['id', 'key', 'value', 'type']
WAY_NODES_FIELDS = ['id', 'node_id', 'position']
In [39]:
hide_code
def shape_element(element, node_attr_fields = NODE_FIELDS, way_attr_fields = WAY_FIELDS,
                  problem_chars = PROBLEMCHARS, default_tag_type = 'regular'):
    """Clean and shape node or way XML element to Python dict"""

    node_attribs = {}
    way_attribs = {}
    way_nodes = []
    tags = []  # Handle secondary tags the same way for both node and way elements

    if element.tag == 'node':
        for node in NODE_FIELDS:
            node_attribs[node] = element.attrib[node]
        for child in element:
            tag = {}
            if PROBLEMCHARS.search(child.attrib["k"]):
                continue
        
            elif LOWER_COLON.search(child.attrib["k"]):
                tag_type = child.attrib["k"].split(':',1)[0]
                tag_key = child.attrib["k"].split(':',1)[1]
                tag["key"] = tag_key
                if tag_type:
                    tag["type"] = tag_type
                else:
                    tag["type"] = 'regular'
            
                tag["id"] = element.attrib["id"]
                tag["value"] = child.attrib["v"]
            else:
                tag["value"] = child.attrib["v"]
                tag["key"] = child.attrib["k"]
                tag["type"] = "regular"
                tag["id"] = element.attrib["id"]
            if tag:
                tags.append(tag)
        return {'node': node_attribs, 'node_tags': tags}
    elif element.tag == 'way':
        for way in WAY_FIELDS:
            way_attribs[way] = element.attrib[way]
        for child in element:
            nd = {}
            tag = {}
            if child.tag == 'tag':
                if PROBLEMCHARS.search(child.attrib["k"]):
                    continue
                elif LOWER_COLON.search(child.attrib["k"]):
                    tag_type = child.attrib["k"].split(':',1)[0]
                    tag_key = child.attrib["k"].split(':',1)[1]
                    tag["key"] = tag_key
                    if tag_type:
                        tag["type"] = tag_type
                    else:
                        tag["type"] = 'regular'
                    tag["id"] = element.attrib["id"]
                    tag["value"] = child.attrib["v"]
    
                else:
                    tag["value"] = child.attrib["v"]
                    tag["key"] = child.attrib["k"]
                    tag["type"] = "regular"
                    tag["id"] = element.attrib["id"]
                if tag:
                    tags.append(tag)
                    
            elif child.tag == 'nd':
                nd['id'] = element.attrib["id"]
                nd['node_id'] = child.attrib["ref"]
                nd['position'] = len(way_nodes)
            
                if nd:
                    way_nodes.append(nd)
            else:
                continue
        return {'way': way_attribs, 'way_nodes': way_nodes, 'way_tags': tags}
In [40]:
hide_code
# ================================================== #
#               Helper Functions                     #
# ================================================== #
def get_element(osm_file, tags=('node', 'way', 'relation')):
    """Yield element if it is the right type of tag"""

    context = ET.iterparse(osm_file, events=('start', 'end'))
    _, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag in tags:
            yield elem
            root.clear()


def validate_element(element, validator, schema=SCHEMA):
    """Raise ValidationError if element does not match schema"""
    if validator.validate(element, schema) is not True:
        field, errors = next(validator.errors.iteritems())
        message_string = "\nElement of type '{0}' has the following errors:\n{1}"
        error_strings = (
            "{0}: {1}".format(k, v if isinstance(v, str) else ", ".join(v))
            for k, v in errors.iteritems()
        )
        raise cerberus.ValidationError(
            message_string.format(field, "\n".join(error_strings))
        )


class UnicodeDictWriter(csv.DictWriter, object):
    """Extend csv.DictWriter to handle Unicode input"""

    def writerow(self, row):
        super(UnicodeDictWriter, self).writerow({
            k: (v.encode('utf-8') if isinstance(v, unicode) else v) for k, v in row.iteritems()
        })

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
In [41]:
hide_code
def process_map(file_in, validate):
    """Iteratively process each XML element and write to csv(s)"""

    with codecs.open(NODES_PATH, 'w') as nodes_file, \
    codecs.open(NODE_TAGS_PATH, 'w') as nodes_tags_file, \
    codecs.open(WAYS_PATH, 'w') as ways_file, \
    codecs.open(WAY_NODES_PATH, 'w') as way_nodes_file, \
    codecs.open(WAY_TAGS_PATH, 'w') as way_tags_file:

        nodes_writer = UnicodeDictWriter(nodes_file, NODE_FIELDS)
        node_tags_writer = UnicodeDictWriter(nodes_tags_file, NODE_TAGS_FIELDS)
        ways_writer = UnicodeDictWriter(ways_file, WAY_FIELDS)
        way_nodes_writer = UnicodeDictWriter(way_nodes_file, WAY_NODES_FIELDS)
        way_tags_writer = UnicodeDictWriter(way_tags_file, WAY_TAGS_FIELDS)

        nodes_writer.writeheader()
        node_tags_writer.writeheader()
        ways_writer.writeheader()
        way_nodes_writer.writeheader()
        way_tags_writer.writeheader()

        validator = cerberus.Validator()

        for element in get_element(file_in, tags=('node', 'way')):
            el = shape_element(element)
            if el:
                if validate is True:
                    validate_element(el, validator)

                if element.tag == 'node':
                    nodes_writer.writerow(el['node'])
                    node_tags_writer.writerows(el['node_tags'])
                elif element.tag == 'way':
                    ways_writer.writerow(el['way'])
                    way_nodes_writer.writerows(el['way_nodes'])
                    way_tags_writer.writerows(el['way_tags'])

1. Map Area

1.1. The map

I chose a map sector covering a dynamically developing area of the UAE.

To display the area, I used the folium package and the bounding coordinates of this area from dubai_abu-dhabi.osm.

In [42]:
hide_code
# Display the coordinates of bounds from .osm file
HTML('<h4>bounds minlat="23.7350" minlon="53.5800" maxlat="26.5390" maxlon="56.8870"</h4>')
Out[42]:

bounds minlat="23.7350" minlon="53.5800" maxlat="26.5390" maxlon="56.8870"
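The bounds shown above can be turned into the closed rectangle that folium draws as a border line. The following is only a sketch: it assumes the bounds line is available as a string, and bbox_to_points is a hypothetical helper name that is not part of the project code.

```python
import re

bounds = 'bounds minlat="23.7350" minlon="53.5800" maxlat="26.5390" maxlon="56.8870"'

# Pull the four attributes out of the bounds line as floats
attrs = {k: float(v) for k, v in re.findall(r'(\w+)="([-\d.]+)"', bounds)}

def bbox_to_points(minlat, minlon, maxlat, maxlon):
    """Return the closed rectangle as [lat, lon] pairs (the order folium uses)."""
    return [[minlat, minlon], [minlat, maxlon],
            [maxlat, maxlon], [maxlat, minlon],
            [minlat, minlon]]

points = bbox_to_points(**attrs)
```

The first corner is repeated at the end so that the polyline closes.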

In [43]:
hide_code
# Setup the coordinates of the map center and the zoom option.
map_osm = folium.Map(location=[25.2048, 55.2708], zoom_start=8)
# Add labels with coordinates.
folium.LatLngPopup().add_to(map_osm)
# Setup the coordinates of the map area.
points=[[23.7350, 53.5800], [23.7350, 56.8870], [26.5390, 56.8870], [26.5390, 53.5800], [23.7350, 53.5800]]
# Setup the border line with options.
folium.PolyLine(points, color="red", weight=5, opacity=0.3).add_to(map_osm)
# Display the map.
map_osm
Out[43]:
1.2 Extract with Python

There are several ways to extract geodata. One of them is to run the Python code cell below, which downloads a file in the .osm format using the coordinates of the rectangle corners.

In [41]:
# Extract from overpass-api.de
file00 = urllib.URLopener()
file00.retrieve("http://overpass-api.de/api/map?bbox=53.5800,23.7350,56.8870,26.5390", "dubai_abu-dhabi0.osm")
Out[41]:
('dubai_abu-dhabi0.osm', <httplib.HTTPMessage instance at 0x10ca09710>)
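Note that the bbox parameter in the cell above lists longitude before latitude (west, south, east, north), the reverse of the [lat, lon] order used for the folium map. A small sketch of building this URL from the four bounds; overpass_map_url is a hypothetical helper name, and the sketch only formats the string without making any request.

```python
def overpass_map_url(minlon, minlat, maxlon, maxlat,
                     base="http://overpass-api.de/api/map"):
    """Build the map export URL; bbox order is west, south, east, north."""
    return "{0}?bbox={1:.4f},{2:.4f},{3:.4f},{4:.4f}".format(
        base, minlon, minlat, maxlon, maxlat)

# Bounds of the area used in this project
url = overpass_map_url(53.5800, 23.7350, 56.8870, 26.5390)
```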
1.3 Extract from OpenStreetMap.org

Another option is to download data files, available in many different formats, from the website:

https://mapzen.com/data/metro-extracts/metro/dubai_abu-dhabi/

The files dubai_abu-dhabi.osm, dubai_abu-dhabi_buildings.geojson, etc. were downloaded.

1.4. Size of downloaded files.

In [45]:
hide_code
# Setup file directories and names of file variables
filedir1 = '/Users/olgabelitskaya/large-repo/'
filedir2 = '/Users/olgabelitskaya/large-repo/dubai_abu-dhabi.imposm-geojson/'
filedir3 = '/Users/olgabelitskaya/large-repo/dubai_abu-dhabi.imposm-shapefiles/'
file0 = filedir1 + 'dubai_abu-dhabi0.osm'
file1 = filedir1 + 'dubai_abu-dhabi.osm'
file2 = filedir2 + 'dubai_abu-dhabi_admin.geojson'
file3 = filedir2 + 'dubai_abu-dhabi_roads.geojson'
file4 = filedir2 + 'dubai_abu-dhabi_waterareas.geojson'
file5 = filedir2 + 'dubai_abu-dhabi_buildings.geojson'
file6 = filedir2 + 'dubai_abu-dhabi_amenities.geojson'
In [46]:
hide_code
# Get size of the .osm files
print "Size of files"
print "dubai_abu-dhabi0.osm: ", os.path.getsize(file0)
print "dubai_abu-dhabi.osm: ", os.path.getsize(file1)
# Get size of the .geojson files
print "dubai_abu-dhabi_admin.geojson: ", os.path.getsize(file2)
print "dubai_abu-dhabi_roads.geojson: ", os.path.getsize(file3)
print "dubai_abu-dhabi_waterareas.geojson: ", os.path.getsize(file4)
print "dubai_abu-dhabi_buildings.geojson: ", os.path.getsize(file5)
print "dubai_abu-dhabi_amenities.geojson: ", os.path.getsize(file6)
Size of files
dubai_abu-dhabi0.osm:  404994999
dubai_abu-dhabi.osm:  394382598
dubai_abu-dhabi_admin.geojson:  1345560
dubai_abu-dhabi_roads.geojson:  86725595
dubai_abu-dhabi_waterareas.geojson:  2415039
dubai_abu-dhabi_buildings.geojson:  30994125
dubai_abu-dhabi_amenities.geojson:  144750
1.5 Osm files

The dubai_abu-dhabi.osm file is not a very large piece of data to process (394.4 MB), and for me it is a very interesting subject for research for many reasons.

For example, it is a constantly and rapidly changing territory with ambitious ideas about area development.

Using the get_element function (§ 0.3), I created the sample_dubai_abu-dhabi.osm file from the dubai_abu-dhabi.osm file.

In [38]:
hide_code
# Setup the file for sample extraction
OSM_FILE = file1 
# Setup the name for the file with a sample
SAMPLE_FILE = "sample_dubai_abu-dhabi.osm"
In [48]:
hide_code
# Create a sample file
k = 100 # Parameter: take every k-th top level element

with open(SAMPLE_FILE, 'wb') as output:
    output.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    output.write('<osm>\n  ')

    # Write every kth top level element
    for i, element in enumerate(get_element(OSM_FILE)):
        if i % k == 0:
            output.write(ET.tostring(element, encoding='utf-8'))

    output.write('</osm>')
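The sampling rule "take every k-th top-level element" can be checked on a tiny in-memory document. This is a sketch only: the XML below is a hypothetical five-element example, and get_element is repeated from § 0.3 so that the block is self-contained.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hypothetical OSM document (assumed for illustration)
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm>
  <node id="1" lat="25.0" lon="55.0"/>
  <node id="2" lat="25.1" lon="55.1"/>
  <way id="3"><nd ref="1"/><nd ref="2"/></way>
  <node id="4" lat="25.2" lon="55.2"/>
  <relation id="5"/>
</osm>"""

def get_element(osm_file, tags=('node', 'way', 'relation')):
    """Yield each top-level element of the requested types."""
    context = iter(ET.iterparse(osm_file, events=('start', 'end')))
    _, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag in tags:
            yield elem
            root.clear()  # free memory used by already processed elements

k = 2  # keep every k-th top-level element
sample_ids = [elem.attrib['id']
              for i, elem in enumerate(get_element(io.BytesIO(xml_bytes)))
              if i % k == 0]
```

With k = 100, as in the cell above, the sample keeps about 1% of the nodes, ways and relations, which agrees with the sample file being roughly a hundredth of the size of the full file.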
In [47]:
hide_code
# Setup the file directory and the name of a file variable
file8 = filedir1 + 'sample_dubai_abu-dhabi.osm'
In [48]:
hide_code
# Get size of the created .osm file
print "Size of sample_dubai_abu-dhabi.osm: ", os.path.getsize(file8)
Size of sample_dubai_abu-dhabi.osm:  3947501

1.6 Geojson files

Several types of files can be downloaded from OpenStreetMap: .osm, .geojson, etc.

The geopandas package is useful for displaying the data in .geojson files. As examples, see the maps of administrative borders, roads, water areas, buildings and amenities below.

In [49]:
hide_code
# Read the .geojson files
df_admin = gpd.read_file(file2)
In [50]:
hide_code
df_roads = gpd.read_file(file3)
In [51]:
hide_code
df_waterareas = gpd.read_file(file4)
In [52]:
hide_code
df_buildings = gpd.read_file(file5)
In [53]:
hide_code
df_amenities = gpd.read_file(file6)
In [54]:
hide_code
print "The dimensionality of the data"
print "dataframe for admin borders:", df_admin.shape
print "dataframe for roads:", df_roads.shape
print "dataframe for water areas:", df_waterareas.shape
print "dataframe for buildings:", df_buildings.shape
print "dataframe for amenities:", df_amenities.shape
The dimensionality of the data
dataframe for admin borders: (231, 6)
dataframe for roads: (130060, 13)
dataframe for water areas: (1510, 6)
dataframe for buildings: (57802, 5)
dataframe for amenities: (648, 5)
In [54]:
hide_code
print "Displaying the examples of these data frames"
df_admin.head(3)
Displaying the examples of these data frames
Out[54]:
admin_leve geometry id name osm_id type
0 2.0 POLYGON ((56.20800613403377 25.25621456273814,... 1.0 عمان -305138.0 administrative
1 2.0 (POLYGON ((53.97770508117634 25.22422729239028... 2.0 الإمارات العربيّة المتّحدة -307763.0 administrative
2 4.0 (POLYGON ((54.71539805585797 25.06933869038014... 3.0 دبي‎ -3766483.0 administrative
In [55]:
hide_code
# Setup the size of the image
matplotlib.rcParams['figure.figsize'] = (14, 14)
# plt.figure(figsize=(14,14))
# Print map
df_admin.plot()
plt.show()
In [47]:
hide_code
df_roads.head(3)
Out[47]:
   access  bridge    class                                           geometry   id  name  oneway     osm_id   ref  service  tunnel         type  z_order
0    None       0  highway  LINESTRING (55.32015262128414 25.2757784276593...  1.0  None       0  4342763.0  None     None       0  residential      3.0
1    None       0  highway  LINESTRING (55.31645762690815 25.2760928328473...  2.0  None       0  4342765.0  None     None       0  residential      3.0
2    None       0  highway  LINESTRING (55.36891333716767 25.2681990917146...  3.0  None       0  4387626.0  None     None       0  residential      3.0
In [65]:
hide_code
matplotlib.rcParams['figure.figsize'] = (14, 14)
df_roads.plot()
plt.show()
In [48]:
hide_code
df_waterareas.head(3)
Out[48]:
       area                                           geometry   id  name      osm_id   type
0  0.000007  POLYGON ((55.16472162631962 25.06663418550218,...  1.0  None     -2809.0  water
1  0.000001  POLYGON ((55.33883487762949 25.24116309539045,...  2.0  None  -1271995.0  water
2  0.000001  POLYGON ((55.15874105458627 25.07026178937664,...  3.0  None     -2812.0  water
In [69]:
hide_code
matplotlib.rcParams['figure.figsize'] = (14, 14)
df_waterareas.plot()
plt.show()
In [66]:
hide_code
df_buildings.head(3)
Out[66]:
                                            geometry   id                 name      osm_id  type
0  POLYGON ((55.34428713800582 25.2762302960594, ...  1.0  Al-Yasmeen Building   -215463.0   yes
1  POLYGON ((54.47010844882664 24.40112892697888,...  2.0                 None   -968963.0   yes
2  POLYGON ((54.33904339493045 24.46584627193386,...  3.0                 None  -1308776.0   yes
In [68]:
hide_code
matplotlib.rcParams['figure.figsize'] = (14, 14)
df_buildings.plot()
plt.show()
In [69]:
hide_code
df_amenities.head(3)
Out[69]:
                                      geometry   id   name       osm_id  type
0   POINT (55.00592345321226 24.9266173428357)  1.0   None  255379927.0  fuel
1  POINT (55.23636518932394 25.16734937383546)  2.0   None  260060737.0  fuel
2   POINT (55.2574323485793 25.18745579934903)  3.0  Eppco  260061366.0  fuel
In [70]:
hide_code
matplotlib.rcParams['figure.figsize'] = (14, 14)
df_amenities.plot()
plt.show()
1.7 Shapefiles

The basemap package can be used to display the data in shapefiles. As an example, the map below shows roads and aeroways.

In [49]:
hide_code
# Setup the size of the image
matplotlib.rcParams['figure.figsize'] = (14, 14)
# Setup the colors for surfaces
water = 'lightskyblue'
earth = 'cornsilk'
# Create a map
mm = Basemap(llcrnrlon=53.58, llcrnrlat=23.73, urcrnrlon=56.89, urcrnrlat=26.53, 
             ellps='WGS84', resolution='i', projection='cass', lat_0 = 25.0756, lon_0 = 55.3821)
# Variables for drawing map components
coast = mm.drawcoastlines()
rivers = mm.drawrivers(color=water, linewidth=1)
continents = mm.fillcontinents(color=earth,lake_color=water)
bound= mm.drawmapboundary(fill_color=water)
countries = mm.drawcountries()
merid = mm.drawmeridians(np.arange(-180, 180, 0.5), labels=[False, False, False, True])
parall = mm.drawparallels(np.arange(0, 80, 0.5), labels=[True, True, False, False])
# Read shapefiles
mm.readshapefile('/Users/olgabelitskaya/large-repo/dubai_abu-dhabi.imposm-shapefiles/dubai_abu-dhabi_osm_roads', 
                 name='roads', drawbounds=True, color='grey')
mm.readshapefile('/Users/olgabelitskaya/large-repo/dubai_abu-dhabi.imposm-shapefiles/dubai_abu-dhabi_osm_aeroways', 
                 name='aeroways', drawbounds=True, color='blue')
# Display the map
plt.show()
1.8 Json file

Using the process_map1 function (§ 0.3), I created the dubai_abu-dhabi.osm.json file from the dubai_abu-dhabi.osm file.
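Each line of the .json file is one document built from a node or a way. The sketch below shows the shape of such a document for a single node. It is a simplified version of shape_element1 (it skips the problemchars check and the way-specific node_refs), shape_node is a hypothetical name, and the node itself is invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical node (assumed for illustration, not taken from the dataset)
xml_node = '''<node id="1" lat="25.2048" lon="55.2708" version="1" changeset="10"
      timestamp="2017-01-01T00:00:00Z" uid="7" user="example_user">
  <tag k="name" v="Example Cafe"/>
  <tag k="addr:street" v="Example Street"/>
  <tag k="addr:street:name" v="ignored"/>
</node>'''

CREATED = ["version", "changeset", "timestamp", "user", "uid"]

def shape_node(element):
    """Simplified version of shape_element1 for a single node."""
    doc = {"type": element.tag,
           "id": element.attrib["id"],
           "pos": [float(element.attrib["lat"]), float(element.attrib["lon"])],
           "created": {key: element.attrib[key] for key in CREATED}}
    address = {}
    for tag in element.iter("tag"):
        key, value = tag.attrib["k"], tag.attrib["v"]
        if key.startswith("addr:"):
            if ":" not in key[5:]:  # keys with a second colon are skipped
                address[key[5:]] = value
        else:
            doc[key] = value
    if address:
        doc["address"] = address
    return doc

doc = shape_node(ET.fromstring(xml_node))
line = json.dumps(doc, sort_keys=True)  # one line of the output file
```

Address tags are grouped into a nested "address" dictionary, while all other tags become top-level fields of the document.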

In [56]:
hide_code
# Extract data from the dataset in the .osm format as json files
# data1 = process_map1(file1)
Out[56]:
''
In [55]:
hide_code
# Setup the variable for the .json file
file7 = filedir1 + 'dubai_abu-dhabi.osm.json'
In [56]:
hide_code
# Get size of the .json file
print "size of dubai_abu-dhabi.osm.json: ", os.path.getsize(file7)
size of dubai_abu-dhabi.osm.json:  458155339
1.9 CSV files
In [52]:
hide_code
# Creating csv files
if __name__ == '__main__':
    # Note: Validation is ~ 10X slower. For the project consider using a small
    # sample of the map when validating.
    process_map(file1, validate=False)
In [57]:
hide_code
print "size of nodes.csv: ", os.path.getsize('nodes.csv')
print "size of nodes_tags.csv: ", os.path.getsize('nodes_tags.csv')
print "size of ways.csv: ", os.path.getsize('ways.csv')
print "size of ways_tags.csv: ", os.path.getsize('ways_tags.csv')
print "size of ways_nodes.csv: ", os.path.getsize('ways_nodes.csv')
size of nodes.csv:  154228820
size of nodes_tags.csv:  3912302
size of ways.csv:  13797779
size of ways_tags.csv:  13383027
size of ways_nodes.csv:  55135540
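The nodes_tags.csv and ways_tags.csv files store each tag as a row with the fields id, key, value and type. The sketch below illustrates how shape_element (§ 0.3) splits a key at the first colon; split_tag is a hypothetical helper that condenses that logic, and the sample values are invented.

```python
import re

LOWER_COLON = re.compile(r'^([a-z]|_)+:([a-z]|_)+')
PROBLEMCHARS = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')

def split_tag(element_id, key, value, default_type='regular'):
    """Return one row of a tags table, or None if the key has problem characters."""
    if PROBLEMCHARS.search(key):
        return None
    if LOWER_COLON.search(key):
        # The part before the first colon becomes the type
        tag_type, tag_key = key.split(':', 1)
    else:
        tag_type, tag_key = default_type, key
    return {'id': element_id, 'key': tag_key, 'value': value, 'type': tag_type}

rows = [split_tag('1', 'addr:street', 'Example Street'),
        split_tag('1', 'amenity', 'fuel'),
        split_tag('1', 'addr:street:name', 'Example'),
        split_tag('1', 'bad key', 'x')]
```

Only the first colon is used for the split, so 'addr:street:name' is stored with the type 'addr' and the key 'street:name'.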
In [56]:
hide_code
df_admin.to_csv('admin.csv', sep='\t', encoding='utf-8')
In [57]:
hide_code
df_roads.to_csv('roads.csv', sep='\t', encoding='utf-8')
In [58]:
hide_code
df_waterareas.to_csv('waterareas.csv', sep='\t', encoding='utf-8')
In [59]:
hide_code
df_buildings.to_csv('buildings.csv', sep='\t', encoding='utf-8')
In [60]:
hide_code
df_amenities.to_csv('amenities.csv', sep='\t', encoding='utf-8')
In [58]:
hide_code
print "size of admin.csv: ", os.path.getsize('admin.csv')
print "size of roads.csv: ", os.path.getsize('roads.csv')
print "size of waterareas.csv: ", os.path.getsize('waterareas.csv')
print "size of buildings.csv: ", os.path.getsize('buildings.csv')
print "size of amenities.csv: ", os.path.getsize('amenities.csv')
size of admin.csv:  1102948
size of roads.csv:  48529142
size of waterareas.csv:  1851199
size of buildings.csv:  19930367
size of amenities.csv:  56463

2. Data (OSM)

Let's explore the data in the .osm file in detail. It contains a lot of information about geographical objects.

2.1 Tags

OpenStreetMap represents physical features on the ground (e.g., roads or buildings) using tags attached to its basic data structures (its nodes, ways, and relations). Tags help describe an item and allow it to be found again by browsing or searching. They are chosen by the item's creator depending on the data point.

In [51]:
hide_code
# Count tags
print count_tags(file1)
defaultdict(<type 'int'>, {'node': 1890178, 'nd': 2271372, 'bounds': 1, 'member': 9779, 'tag': 503027, 'relation': 2820, 'way': 234327, 'osm': 1})
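Note that count_tags counts XML element names, so the 'tag' entry above is the total number of key/value tags attached to nodes, ways and relations, and 'nd' is the number of node references inside ways. A minimal sketch on a hypothetical in-memory document makes this visible:

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny hypothetical OSM document (assumed for illustration)
xml_bytes = b"""<osm>
  <node id="1"><tag k="amenity" v="fuel"/></node>
  <node id="2"/>
  <way id="3"><nd ref="1"/><nd ref="2"/><tag k="highway" v="residential"/></way>
</osm>"""

def count_tags(source):
    """Count how many times each XML element name occurs."""
    count = defaultdict(int)
    for _, elem in ET.iterparse(source):
        count[elem.tag] += 1
    return count

counts = dict(count_tags(io.BytesIO(xml_bytes)))
```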
2.2 Users

Map data is collected from scratch by volunteers (users). We can get their number and a list of them for this extract.

In [57]:
hide_code
# Count users of the map editing
users1 = process_map_users(file1)
In [58]:
hide_code
# Display number of users
print "Number of users -", len(users1)
# Display example of the user list
user_list = list(users1)
print sorted(user_list)[:50]
Number of users - 1895
['-KINGSMAN-', '0508799996', '08xavstj', '12DonW', '12Katniss', '13 digits', '25837', '3abdalla', '4b696d', '4rch', '66444098', '7dhd', '@kevin_bullock', 'AAANNNDDD', 'AC FootCap', 'AE35', 'AHMED ABASHAR', 'AKazariani', 'ASHRAF CHOOLAKKADAVIL', 'A_Sadath', 'AakankshaR', 'Aal Ibra240380heem', 'Abbadi', 'Abdalmajeed Najmi', 'Abdelhadi Azaizeh', 'Abdul Noor Bank', 'Abdul Rahim Khan', 'Abdul wahab rashid', 'Abdulaziz AlSweda', 'Abdulla Shuqair', 'Abdullah Al Hany', 'Abdullah Alshareef', 'Abdullah Rana', 'Abdullah777', 'Abdurehman', 'AbeMazid', 'Abhin', 'Abiodun Babalola', 'Abllo', 'Aboad Jasim', 'Abood Ad', 'Abrarbhai', 'Absamc', 'AbuFazal', 'Abud', 'Adel alsaad', 'Adib Yz', 'Adil Alsuleimani', 'Adley', 'Adm Vtc']
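One possible next step is to count how many elements each user contributed. The following sketch assumes a hypothetical helper, count_edits_by_user, and an invented in-memory document; it also skips elements that have no user attribute instead of raising a KeyError.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hypothetical OSM document (assumed for illustration)
xml_bytes = b"""<osm>
  <node id="1" user="alice"/>
  <node id="2" user="bob"/>
  <way id="3" user="alice"/>
  <relation id="4"/>
</osm>"""

def count_edits_by_user(source):
    """Count nodes, ways and relations per contributing user."""
    edits = Counter()
    for _, elem in ET.iterparse(source):
        if elem.tag in ('node', 'way', 'relation'):
            user = elem.attrib.get('user')
            if user is not None:
                edits[user] += 1
    return edits

edits = count_edits_by_user(io.BytesIO(xml_bytes))
top_user = edits.most_common(1)
```

Here len(edits) gives the number of distinct users, and most_common lists the most active contributors first.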

Exploring the data in this file further yields a large number of other statistics.

2.3 Keys
In [66]:
hide_code
# Count keys by types
process_map_keys(file1)
Out[66]:
{'lower': 479404, 'lower_colon': 20602, 'other': 3001, 'problemchars': 20}
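The four categories above come from the three regular expressions in § 0.3, tested in a fixed order. A small sketch of the classification on individual keys; classify_key is a hypothetical helper, and the sample keys are illustrative rather than taken from the file.

```python
import re

lower = re.compile(r'^([a-z]|_)*$')
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')

def classify_key(key):
    """Return the category that key_type would increment for this key."""
    if lower.search(key):
        return 'lower'
    if lower_colon.search(key):
        return 'lower_colon'
    if problemchars.search(key):
        return 'problemchars'
    return 'other'

# Illustrative keys (assumed, not taken from the dataset)
examples = ['highway', 'addr:street', 'name 1', 'FIXME', 'addr:street:name']
labels = {key: classify_key(key) for key in examples}
```

Keys with upper case letters, digits or more than one colon fall into 'other', which is why that category is much larger than 'problemchars'.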
2.4 Number of street addresses
In [67]:
hide_code
# Count street addresses
street_number(file1)
Out[67]:
1789
2.5 Places
In [84]:
hide_code
# Count places by types
print process_map_places(file1)
 {'town': 31, 'city': 13, 'island': 67, 'hamlet': 97, 'other': 749, 'village': 608}
2.6 Names in English
In [85]:
hide_code
# Count names in English with values
english_names1 = process_map_names(file1)
In [86]:
hide_code
print "The number of names in English: ", english_names1[0]
The number of names in English:  3413
In [87]:
hide_code
print list(english_names1[1])[:50]
['Tawi Madsus', 'Bani Umar', 'Jabal Nazifi', 'Umm Al Quwain', 'Le Mart', 'DoT Scrap Store', 'Buraimi Police station', 'CEREEN Textiles', 'Faseela Grocery', 'Al Muwaylah', 'Najman Grocery', 'Ras Huwayni', 'Al Musalall Grocery', 'Huwayniyah', 'Dariush Boulevard', 'OM YAMAN', 'Azerbayejan', 'Ayn al Mahab', 'Saad Pharmacy', 'Nabil house', 'Al Jowar', 'Zarub', 'Dubai Homeopathy Health Centre', 'Wadi Massayid', 'New academey school', 'Mohd Ibrahn Grocery', 'Ramlat Qasf', 'Fujayj', 'Harat ash Shaykh', 'Subtan', 'VMS-19', 'Emirates Towers', 'Qurun Hamad', 'Ruqat Bakhit', 'Tawi Salim', 'Carrefour City', 'mini mart', 'Al Husaifin', 'Darwish Lshkari Grocery', 'Wadi Hiyar', 'Khawr Naid', 'Wadi Sidr', 'Hadirah', 'Al Darak Grocery', 'Holiday Mini Mart', 'Sharyat Ayqal', 'Ras Humsi', 'Dansaf', 'Tawi Saghabah', 'Wadi Shukayyah']

This map contains a large number of duplicate names in English.
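
One way to check for duplicate names is to count them after normalizing case and whitespace. The sketch below assumes the names are available as a plain list with repeats. The helper duplicate_names and the sample list are hypothetical and not part of the notebook.

```python
from collections import Counter


def duplicate_names(names):
    """Return {normalized name: count} for names occurring more than once.

    Names are compared case-insensitively and without surrounding spaces.
    """
    counts = Counter(name.strip().lower() for name in names)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical sample input, not taken from the data set
sample = ['Mini Mart', 'mini mart', 'Carrefour City', 'Emirates Towers']
print(duplicate_names(sample))
```

Because the comparison ignores case, 'Mini Mart' and 'mini mart' are counted as the same name.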

2.7 Postal Codes

In the UAE, mail is usually delivered to a P.O. Box. As we can see, practically all postcodes are individual. Let's display the list of P.O. Boxes and the number of users for each of them (1-2 in most cases).

In [93]:
hide_code
print "The number of postcodes:", zip_codes(file1)[0]
The number of postcodes: 116
In [89]:
hide_code
# Audit P.O Box
postcode1 = process_map_postcodes(file1)
In [90]:
hide_code
# Display  P.O Box
print postcode1
{'00962': 1, '34121': 1, '7819': 1, '108100': 1, 'P.O. Box 5618, Abu Dhabi, U.A.E': 1, '8988': 1, '0': 1, '23117': 2, 'P O BOX 3766': 1, '103711': 2, '549': 1, '38495': 1, 'P.O. Box 4605': 1, 'Muhaisnah 4': 1, '20767': 1, '81730': 1, '2504': 1, 'PO Box 6770': 1, '8845': 1, 'PO Box 43377': 1, '97717': 1, '24857': 3, '232574': 1, 'P.O. Box 9770': 1, '60884': 1, '44263': 1, '277': 1, '16095': 1, 'P. O. Box 31166': 1, '502227': 1, '2666': 1, '41318': 1, 'P. O. Box 123234': 1, '00971': 1, '128358': 1, '79506': 1, '115443': 1, '500368': 1, '473828': 4, '28676': 1, '114692': 1, '232144': 1, '2574': 1, '121641': 1, '1243': 1, '125939': 2, 'PO Box 118737': 1, '57566': 1, '6834': 2, '28818': 1, 'PO Box 114822': 1, '42524': 1, '52799': 1, '2157': 1, '392189': 1, '9978': 1, '22436': 3, '231992': 1, '46477': 1, '5280 dubai': 1, '811': 5, '42324': 1, '12345': 1, '38126': 1, '113431': 1, '64649': 1, '47612': 1, '24976': 1, 'P.O. Box 6446': 1, '111695': 1, '41974': 1, '44548': 1, '0000': 1, '119417': 1, '1111': 1, '7770': 1, '77947': 1, '4599': 1, '25494': 1, '47602': 1, '1234': 1, '11999': 2, '34238': 1, '20661': 1, '53577': 2, '20268': 2, '9292': 1, '3541': 1, '000000': 2, '000001': 1, '38575': 1, '444786': 1, '263076': 1, '71444': 2, '32923': 1, '26268': 1}
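
The values above mix bare numbers with strings such as 'P.O. Box 5618, Abu Dhabi, U.A.E'. A possible cleaning step, sketched here under assumptions, keeps only the box number and rejects values without one. The helper clean_postcode is hypothetical, and treating all-zero values as invalid is an assumption rather than an official rule.

```python
import re

# Optional "P.O. Box" prefix in its common spellings, then the box number.
PO_BOX_RE = re.compile(r'^\s*(?:P\.?\s*O\.?\s*Box:?\s*)?(\d+)\b', re.IGNORECASE)


def clean_postcode(value):
    """Return the P.O. Box number as a string, or None if none is found.

    Hypothetical helper; all-zero values are treated as placeholders
    (an assumption).
    """
    match = PO_BOX_RE.match(value)
    if match is None:
        return None
    digits = match.group(1)
    if int(digits) == 0:
        return None
    return digits


for raw in ['P.O. Box 5618, Abu Dhabi, U.A.E', 'P O BOX 3766',
            '5280 dubai', 'Muhaisnah 4', '000000']:
    print(repr(raw) + ' -> ' + repr(clean_postcode(raw)))
```

Leading zeros are kept as they are, since the sketch cannot tell whether a value such as '00962' is a real box number.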
2.8 Street names
In [59]:
hide_code
# Audit street names
street_names1 = audit_street_names(file1)
In [60]:
hide_code
# Display street names
pprint.pprint(dict(street_names1))
{'07': set(['07']),
 '1': set(['20B Street, Safa 1',
           'City Walk, Jumeirah 1',
           'E 1',
           'Hattan Street 1',
           'aljurf ind 1']),
 '10': set(['Street 10', 'ind area 10']),
 '11': set(['shabiya -11']),
 '111': set(['P.O.Box 111']),
 '12': set(['District 12', 'Street 12']),
 '12A': set(['12A']),
 '12K': set(['District 12K']),
 '13': set(['Street 13', 'industrial 13', 'street 13\n']),
 '14': set(['11th street, Musaffah M 14',
            'Musaffah Industrial Area Street 14']),
 '147': set(['147']),
 '15': set(['sweet 15']),
 '153': set(['Community 153']),
 '166': set(['166']),
 '17': set(['17']),
 '18': set(['Street 18', 'street 18']),
 '19': set(['19']),
 '19th)': set(["Sa'ada Street (19th)"]),
 '1D': set(['1D']),
 '2': set(['Al Barsha south 2',
           'Al Jaddaf 2',
           'Al Nahda 2',
           'Dragon Mart 2',
           'Dubai Investment Park 2',
           'Hattan Street 2',
           'Icad 2',
           'Street 2',
           'dubai investment park 2']),
 '2-A': set(['2-A']),
 '282825': set(['P.O. Box 282825']),
 '3': set(['13C street , Al Quoz Industrial 3', 'Street 3']),
 '3005': set(['P.O. Box: 3005']),
 '333': set(['Exit 333']),
 '34429': set(['P.O. Box 34429']),
 '39999': set(['Corniche West Street, P.O.Box 39999']),
 '4': set(['industrial 4']),
 '4010': set(['Corniche West Street, P.O. Box: 4010']),
 '43': set(['Street 43']),
 '434': set(['P.O.Box 434']),
 '47': set(['47']),
 '5': set(['Jumeirah Village Triangle,  District 2, Street 5',
           'Street 5',
           'industrial area 5']),
 '50': set(['P.O. Box 50']),
 '54': set(['Plot No. M-26, Area 54']),
 '56': set(['56']),
 '6': set(['6']),
 '606': set(['Gate Precinct 5 Suite 606']),
 '7': set(['Street 7']),
 '724': set(['Street 724']),
 '74147': set(['P.O.Box 74147']),
 '8': set(['Street 8']),
 '92': set(['Street 92']),
 'AE': set(['Bahar 7 Jumeirah Beach Res, Marsa, Dubai, AE']),
 'Almidfaa': set(['Ibrahim Almidfaa']),
 'Arab': set(['Mina Al Arab']),
 'Arif': set(['Abdul Salam Arif']),
 'Badaa': set(['Al Hudaiba Road, Al Badaa\n']),
 'Bahar': set(['Souk Al Bahar']),
 'Baharia': set(['Baharia']),
 'Bahia': set(['Al Shahama Rd., Al Bahia']),
 'Barsha': set(['Al Barsha']),
 'Bateen': set(['Al Bateen']),
 'Bay': set(['Business Bay']),
 'Br': set(['Al Qaram Br']),
 'Bridge': set(['Rawdha Bridge']),
 'Building': set(['Al Ghuwair Building']),
 'C': set(['TECOM SECTION C']),
 'C12': set(['District 12 C12']),
 'C15': set(['China C15']),
 'CBD06': set(['CBD06']),
 'Cafe': set(['No Fear Cafe']),
 'Cafeteria': set(['Warrior Net Cafeteria']),
 'Center': set(['High Bay Business Center']),
 'Centre': set(['Ibn Sina Medical Centre']),
 'Circuit': set(['Yas Marina Circuit']),
 'City': set(['Academic City',
              'Dubai Healthcare City',
              'Mohammed Bin Zayed City',
              'Motor City',
              'Up Town Motor City',
              'Zayed Sports City']),
 'Community': set(['Emirates Living Community']),
 'Cornich': set(['Airport road across Cornich']),
 'Corniche': set(['Al Qawasim Corniche',
                  'P.O.Box 33704,Villa No. 2, Al Qawasim Corniche']),
 'Crownn': set(['14th Street, Al Khalidiya, Near Saba Crownn']),
 'DIFC': set(['DIFC']),
 'Dhabi': set(['Sheraton Abu Dhabi', 'Southern Sun Abu Dhabi']),
 'Dip2': set(['Dip2']),
 'Dome': set(['The Dome']),
 'Downtown': set(['Al Reef Downtown']),
 'Dubai': set(['Silverene Tower, Dubai',
               'SouthRidge Branch, Downtown Dubai',
               'Street 13, Dubai']),
 'E': set(['Road E']),
 'E-11)': set(['Sheikh Zayed Road (E-11)']),
 'East': set(['Corniche Road East', 'Kahraba South East']),
 'Emirates': set(['Rose 2 - 17a St - Dubai 17a St Dubai United Arab Emirates']),
 'Floor': set(['Boutik Mall, 1st Floor', 'Tameem House Building Floor']),
 'Galleria': set(['The Galleria']),
 'Garhoud': set(['-Al Garhoud', 'Al Garhoud']),
 'Gate': set(['Dubai Ibn Battuta Gate']),
 'Ghuwair': set(['Al Ghuwair']),
 'Hail': set(['Abu Hail']),
 'Highway': set(['Sheikh Khalifa Bin Zayed Highway']),
 'Hill': set(['Newbridge Hill']),
 'Intersection': set(['Al Murraqabat Intersection']),
 'Island': set(['Paragon Mall, Reem Island', 'Park Island', 'Yas Island']),
 'JLT': set(['Cluster T, JLT']),
 'Jumeirah': set(['Crescent Rd, The Palm Jumeirah',
                  'Crescent Road, Atlantis Palm Jumeirah']),
 'Karama': set(['Karama']),
 'Khaimah': set(['Ras Al Khaimah']),
 'Khalidiya': set(['Khalidiya']),
 'Khorfakkan': set(['Al Rifa Street, Khorfakkan', 'Khorfakkan']),
 'L': set(['Greece L']),
 'M-24': set(['M-24']),
 'Mai': set(['6th Street Tanker Mai']),
 'Majaz': set(['Corniche Street, Al Majaz']),
 'Maktoum': set(['Al Maktoum']),
 'Mall': set(['Abu Dhabi Mall',
              'Al Wahda Mall',
              'Deerfields Mall',
              'Paragon Mall',
              'Yas Mall']),
 'Mankhool': set(['Al Mankhool']),
 'Marina': set(['Dubai Marina', 'The Walk, Dubai Marina']),
 'Market': set(['Near Ramla Hyper Market', 'Souk, Central Market']),
 'Marrakesh': set(['Marrakesh']),
 'Masyaf': set(['Dar Al Masyaf']),
 'Metha': set(['Oud Metha']),
 'Muneera': set(['Al Muneera']),
 'Muroor': set(['Muroor']),
 'Musaffah': set(['Musaffah']),
 'Muzera': set(['Hatta Oman road Muzera']),
 'N': set(['Persia N']),
 'Naser': set(['Jamal Abdul Naser']),
 'Nayhan': set(['Al Nayhan']),
 'OASIS': set(['AXIS 8 SILICON OASIS']),
 'Oasis': set(['Dubai Silicon Oasis']),
 'Plaza': set(['Etihad Plaza', 'Marsa Plaza', 'Yas Plaza']),
 'Qasba': set(['Maraya Art Centre, Al Qasba']),
 'Qasr': set(['Al Qasr']),
 'ROAD': set(['MUROOR ROAD']),
 'Ramth': set(['Al Ramth']),
 'Ranches': set(['Arabian Ranches']),
 'Rashid': set(['Al Mina Road, Port Rashid']),
 'Rd': set(['Al Safouh Rd',
            'Al Sufouh Rd',
            'JBR Rd',
            'Jumeirah Beach Rd',
            'Oud Metha Rd',
            'Sheikh Rashed Bin Said Rd']),
 'Residence': set(['Ewan Residence', 'The Walk, Jumeirah Beach Residence']),
 'Resort)': set(['(Inside of the Hotel The Cove Rotana Resort)']),
 'Road)': set(['2nd Street (Old Airport Road)',
               'Al Khaleej Al Arabi Road (Coast Road)']),
 'Rumooolooo': set(['Marrakesh Street, Garhuod, Rumooolooo']),
 'STREET': set(['11TH STREET']),
 'Saada': set(['Al Saada']),
 'Sadiyat': set(['Mina Sadiyat']),
 'Safa': set(['1 Street 17,  Al Safa']),
 'Sanaiya': set(['Sanaiya']),
 'Sands': set(['Seven Sands']),
 'Shamkha': set(['Al Shamkha']),
 'Sharjah': set(['Al Nud Sharjah',
                 'Corniche Street Al Jubail ,Sharjah',
                 'Jawazath Road, Behind Mashreq Bank, Sharjah']),
 'Sharq': set(['Al Sharq']),
 'Slipway': set(['Memzar Slipway']),
 'South': set(['Bijada Boulevard South']),
 'St': set(['12 D St',
            '16 D St',
            '1b St',
            '3 A St',
            '34d St',
            '6b St',
            '81 A St',
            'Al Hubob St',
            'Al Zahra St',
            'Electra St',
            'Meena St',
            'Rigga Al Buteen St']),
 'St.': set(['153 St.',
             "Al Bada' St.",
             'Al Burooj St.',
             'Al Etisalat St.',
             'Al Falak St.',
             'Hamdan Bin Zayed Al Awwal St.',
             'King Abdul Aziz St.',
             'King Faisal St.',
             'Mohammed Bin Khalifa St.',
             'Nahyan Al Awwal St.',
             'Noor Islamic Bank St.',
             'Third Industrial St.',
             'Twam St.',
             'Wahda St.']),
 'Street)': set(['Fatima Bint Mubarak Street (former Najda Street)',
                 'Mohammed Bin Khalifa Street (15th Street)',
                 'Sheikh Zayed The First Street (Khalidiyah Street)']),
 'Thammam': set(['Al Thammam']),
 'Tower': set(['Sheikh Zayed Road, Latifa Tower']),
 'Towers': set(['Cluster I, AU Tower (Gold Tower),Jumierah Lake Towers',
                'Jumeirah Lake Towers',
                'Jumeirah Lakes Towers']),
 'Townsquare': set(['Deerfields Townsquare']),
 'Village': set(['Al Badia Hillside Village', 'Cedre Village']),
 'WB3': set(['Road: WB3']),
 'Walk': set(['Jumeirah Walk']),
 'Way': set(['Buttercup Way', 'Daffodil Way', 'Daisy Way', 'Tulip Way']),
 'West': set(['Corniche Road West']),
 'Y': set(['Swiss Tower, Cluster Y']),
 'a': set(['29 a']),
 'albuteen': set(['albuteen']),
 'b': set(['11 b']),
 'buil.': set(['17th St. Old Airport Rd. Sea Shell buil.']),
 'building': set(['Abraj al Mamzar Emirates building']),
 'cctv': set(['Max com Technologies cctv']),
 'center': set(['near health care center']),
 'club': set(['Muhaila , Al Teqa children club']),
 'cluster': set(['England cluster']),
 'exit': set(['Dubai Marina Mall exit']),
 'gate': set(['al khail gate']),
 'morocco': set(['morocco']),
 'rawada': set(['al rawada']),
 'rd': set(['sheik rashid rd']),
 'residential': set(['residential']),
 'road': set(['Abu Baker al Siddique road',
              'Airport road',
              'Ajman Corniche road',
              'Ajman corniche road',
              'Al ghubaiba road',
              'Beirut road',
              'Doha road',
              'al nahda road',
              'al rams road']),
 'shikla': set(['Khatam al shikla']),
 'skycourts': set(['skycourts']),
 'south': set(['al barsha south']),
 'st': set(['13c st', '27b st', 'Al Safouh st']),
 'street': set(['10th street',
                '15th street',
                '8th street',
                'Al Nasr street',
                'Amman street',
                'Baghdad street',
                'Damascus street',
                'Damscus street',
                'Electra street',
                'Marrakech street',
                'Wahda street',
                'sheikh khalid bin khalid street']),
 'tower': set(['lakeshore tower']),
 'track': set(['Nahda park jogging track'])}
In [61]:
hide_code
# Update street names
for street_type, ways in street_names1.iteritems():
    for name in ways:
        better_name = update_name(name, mapping, street_type_re)
        print name, "=>", better_name
Al Nayhan => Al Nayhan
Al Sufouh Rd => Al Sufouh Road
JBR Rd => JBR Road
Sheikh Rashed Bin Said Rd => Sheikh Rashed Bin Said Road
Oud Metha Rd => Oud Metha Road
Al Safouh Rd => Al Safouh Road
Jumeirah Beach Rd => Jumeirah Beach Road
Al Hudaiba Road, Al Badaa
=> Al Hudaiba Road, Al Badaa

14th Street, Al Khalidiya, Near Saba Crownn => 14th Street, Al Khalidiya, Near Saba Crownn
Newbridge Hill => Newbridge Hill
Dubai Ibn Battuta Gate => Dubai Ibn Battuta Gate
Oud Metha => Oud Metha
12A => 12A
Maraya Art Centre, Al Qasba => Maraya Art Centre, Al Qasba
P.O.Box 74147 => P.O.Box 74147
residential => residential
District 12K => District 12K
The Dome => The Dome
industrial 4 => industrial 4
Al Qaram Br => Al Qaram Br
The Walk, Dubai Marina => The Walk, Dubai Marina
Dubai Marina => Dubai Marina
Street 8 => Street 8
MUROOR ROAD => MUROOR Road
Al Ghuwair Building => Al Ghuwair Building
SouthRidge Branch, Downtown Dubai => SouthRidge Branch, Downtown Dubai
Street 13, Dubai => Street 13, Dubai
Silverene Tower, Dubai => Silverene Tower, Dubai
Al Ghuwair => Al Ghuwair
Paragon Mall, Reem Island => Paragon Mall, Reem Island
Park Island => Park Island
Yas Island => Yas Island
Max com Technologies cctv => Max com Technologies cctv
Greece L => Greece L
Business Bay => Business Bay
Al Thammam => Al Thammam
Deerfields Townsquare => Deerfields Townsquare
Souk Al Bahar => Souk Al Bahar
al khail gate => al khail gate
Sheikh Zayed Road, Latifa Tower => Sheikh Zayed Road, Latifa Tower
Al Shahama Rd., Al Bahia => Al Shahama Rd., Al Bahia
Al Murraqabat Intersection => Al Murraqabat Intersection
Corniche Street, Al Majaz => Corniche Street, Al Majaz
Plot No. M-26, Area 54 => Plot No. M-26, Area 54
56 => 56
Baharia => Baharia
Airport road across Cornich => Airport road across Cornich
11TH STREET => 11TH Street
Seven Sands => Seven Sands
Al Sharq => Al Sharq
Marrakesh Street, Garhuod, Rumooolooo => Marrakesh Street, Garhuod, Rumooolooo
Corniche Road West => Corniche Road West
P.O.Box 33704,Villa No. 2, Al Qawasim Corniche => P.O.Box 33704,Villa No. 2, Al Qawasim Corniche
Al Qawasim Corniche => Al Qawasim Corniche
Damscus street => Damscus Street
Amman street => Amman Street
15th street => 15th Street
Marrakech street => Marrakech Street
Baghdad street => Baghdad Street
10th street => 10th Street
Damascus street => Damascus Street
Wahda street => Wahda Street
8th street => 8th Street
Al Nasr street => Al Nasr Street
sheikh khalid bin khalid street => sheikh khalid bin khalid Street
Electra street => Electra Street
P.O.Box 111 => P.O.Box 111
Road E => Road E
Street 5 => Street 5
industrial area 5 => industrial area 5
Jumeirah Village Triangle,  District 2, Street 5 => Jumeirah Village Triangle,  District 2, Street 5
DIFC => DIFC
District 12 => District 12
Street 12 => Street 12
2nd Street (Old Airport Road) => 2nd Street (Old Airport Road)
Al Khaleej Al Arabi Road (Coast Road) => Al Khaleej Al Arabi Road (Coast Road)
07 => 07
Jumeirah Lakes Towers => Jumeirah Lakes Towers
Jumeirah Lake Towers => Jumeirah Lake Towers
Cluster I, AU Tower (Gold Tower),Jumierah Lake Towers => Cluster I, AU Tower (Gold Tower),Jumierah Lake Towers
(Inside of the Hotel The Cove Rotana Resort) => (Inside of the Hotel The Cove Rotana Resort)
1 Street 17,  Al Safa => 1 Street 17,  Al Safa
Al Reef Downtown => Al Reef Downtown
Street 3 => Street 3
13C street , Al Quoz Industrial 3 => 13C street , Al Quoz Industrial 3
Dubai Marina Mall exit => Dubai Marina Mall exit
Street 7 => Street 7
Memzar Slipway => Memzar Slipway
Al Shamkha => Al Shamkha
sheik rashid rd => sheik rashid Road
Sanaiya => Sanaiya
TECOM SECTION C => TECOM SECTION C
Al Maktoum => Al Maktoum
Sheikh Khalifa Bin Zayed Highway => Sheikh Khalifa Bin Zayed Highway
Etihad Plaza => Etihad Plaza
Marsa Plaza => Marsa Plaza
Yas Plaza => Yas Plaza
Jamal Abdul Naser => Jamal Abdul Naser
Paragon Mall => Paragon Mall
Yas Mall => Yas Mall
Deerfields Mall => Deerfields Mall
Abu Dhabi Mall => Abu Dhabi Mall
Al Wahda Mall => Al Wahda Mall
Yas Marina Circuit => Yas Marina Circuit
Al Muneera => Al Muneera
lakeshore tower => lakeshore tower
Cluster T, JLT => Cluster T, JLT
albuteen => albuteen
Muroor => Muroor
Dip2 => Dip2
Dar Al Masyaf => Dar Al Masyaf
Abu Hail => Abu Hail
Ras Al Khaimah => Ras Al Khaimah
Ibrahim Almidfaa => Ibrahim Almidfaa
P.O.Box 434 => P.O.Box 434
No Fear Cafe => No Fear Cafe
al rams road => al rams Road
Doha road => Doha Road
Al ghubaiba road => Al ghubaiba Road
Beirut road => Beirut Road
Ajman corniche road => Ajman corniche Road
Abu Baker al Siddique road => Abu Baker al Siddique Road
Ajman Corniche road => Ajman Corniche Road
Airport road => Airport Road
al nahda road => al nahda Road
AXIS 8 SILICON OASIS => AXIS 8 SILICON OASIS
Wahda St. => Wahda Street
Al Falak St. => Al Falak Street
153 St. => 153 Street
Hamdan Bin Zayed Al Awwal St. => Hamdan Bin Zayed Al Awwal Street
Twam St. => Twam Street
Al Etisalat St. => Al Etisalat Street
King Faisal St. => King Faisal Street
Mohammed Bin Khalifa St. => Mohammed Bin Khalifa Street
Al Bada' St. => Al Bada' Street
Al Burooj St. => Al Burooj Street
King Abdul Aziz St. => King Abdul Aziz Street
Third Industrial St. => Third Industrial Street
Nahyan Al Awwal St. => Nahyan Al Awwal Street
Noor Islamic Bank St. => Noor Islamic Bank Street
Exit 333 => Exit 333
P.O. Box: 3005 => P.O. Box: 3005
1D => 1D
Tulip Way => Tulip Way
Daisy Way => Daisy Way
Daffodil Way => Daffodil Way
Buttercup Way => Buttercup Way
The Galleria => The Galleria
Arabian Ranches => Arabian Ranches
Marrakesh => Marrakesh
Mina Sadiyat => Mina Sadiyat
Ibn Sina Medical Centre => Ibn Sina Medical Centre
Al Garhoud => Al Garhoud
-Al Garhoud => -Al Garhoud
Corniche West Street, P.O.Box 39999 => Corniche West Street, P.O.Box 39999
Al Bateen => Al Bateen
Fatima Bint Mubarak Street (former Najda Street) => Fatima Bint Mubarak Street (former Najda Street)
Sheikh Zayed The First Street (Khalidiyah Street) => Sheikh Zayed The First Street (Khalidiyah Street)
Mohammed Bin Khalifa Street (15th Street) => Mohammed Bin Khalifa Street (15th Street)
Hattan Street 2 => Hattan Street 2
Al Nahda 2 => Al Nahda 2
Street 2 => Street 2
Dubai Investment Park 2 => Dubai Investment Park 2
Al Barsha south 2 => Al Barsha south 2
dubai investment park 2 => dubai investment park 2
Dragon Mart 2 => Dragon Mart 2
Al Jaddaf 2 => Al Jaddaf 2
Icad 2 => Icad 2
P.O. Box 282825 => P.O. Box 282825
Al Ramth => Al Ramth
P.O. Box 34429 => P.O. Box 34429
Karama => Karama
Abraj al Mamzar Emirates building => Abraj al Mamzar Emirates building
Mina Al Arab => Mina Al Arab
al rawada => al rawada
High Bay Business Center => High Bay Business Center
Al Mina Road, Port Rashid => Al Mina Road, Port Rashid
Muhaila , Al Teqa children club => Muhaila , Al Teqa children club
6 => 6
12 D St => 12 D Street
Al Zahra St => Al Zahra Street
Al Hubob St => Al Hubob Street
6b St => 6b Street
3 A St => 3 A Street
34d St => 34d Street
Electra St => Electra Street
1b St => 1b Street
Rigga Al Buteen St => Rigga Al Buteen Street
16 D St => 16 D Street
Meena St => Meena Street
81 A St => 81 A Street
Corniche Street Al Jubail ,Sharjah => Corniche Street Al Jubail ,Sharjah
Jawazath Road, Behind Mashreq Bank, Sharjah => Jawazath Road, Behind Mashreq Bank, Sharjah
Al Nud Sharjah => Al Nud Sharjah
166 => 166
Street 92 => Street 92
Road: WB3 => Road: WB3
shabiya -11 => shabiya -11
ind area 10 => ind area 10
Street 10 => Street 10
industrial 13 => industrial 13
street 13
=> street 13

Street 13 => Street 13
11 b => 11 b
sweet 15 => sweet 15
11th street, Musaffah M 14 => 11th street, Musaffah M 14
Musaffah Industrial Area Street 14 => Musaffah Industrial Area Street 14
17 => 17
19 => 19
street 18 => street 18
Street 18 => Street 18
CBD06 => CBD06
Jumeirah Walk => Jumeirah Walk
morocco => morocco
Street 724 => Street 724
Hatta Oman road Muzera => Hatta Oman road Muzera
Al Saada => Al Saada
Near Ramla Hyper Market => Near Ramla Hyper Market
Souk, Central Market => Souk, Central Market
Motor City => Motor City
Dubai Healthcare City => Dubai Healthcare City
Mohammed Bin Zayed City => Mohammed Bin Zayed City
Zayed Sports City => Zayed Sports City
Up Town Motor City => Up Town Motor City
Academic City => Academic City
Community 153 => Community 153
Gate Precinct 5 Suite 606 => Gate Precinct 5 Suite 606
Boutik Mall, 1st Floor => Boutik Mall, 1st Floor
Tameem House Building Floor => Tameem House Building Floor
Abdul Salam Arif => Abdul Salam Arif
Sa'ada Street (19th) => Sa'ada Street (19th)
England cluster => England cluster
Khorfakkan => Khorfakkan
Al Rifa Street, Khorfakkan => Al Rifa Street, Khorfakkan
District 12 C12 => District 12 C12
China C15 => China C15
Khalidiya => Khalidiya
Al Mankhool => Al Mankhool
Southern Sun Abu Dhabi => Southern Sun Abu Dhabi
Sheraton Abu Dhabi => Sheraton Abu Dhabi
47 => 47
Street 43 => Street 43
City Walk, Jumeirah 1 => City Walk, Jumeirah 1
Hattan Street 1 => Hattan Street 1
20B Street, Safa 1 => 20B Street, Safa 1
aljurf ind 1 => aljurf ind 1
E 1 => E 1
Dubai Silicon Oasis => Dubai Silicon Oasis
M-24 => M-24
Musaffah => Musaffah
6th Street Tanker Mai => 6th Street Tanker Mai
Al Qasr => Al Qasr
skycourts => skycourts
147 => 147
Bahar 7 Jumeirah Beach Res, Marsa, Dubai, AE => Bahar 7 Jumeirah Beach Res, Marsa, Dubai, AE
Nahda park jogging track => Nahda park jogging track
The Walk, Jumeirah Beach Residence => The Walk, Jumeirah Beach Residence
Ewan Residence => Ewan Residence
Khatam al shikla => Khatam al shikla
Crescent Rd, The Palm Jumeirah => Crescent Rd, The Palm Jumeirah
Crescent Road, Atlantis Palm Jumeirah => Crescent Road, Atlantis Palm Jumeirah
Rose 2 - 17a St - Dubai 17a St Dubai United Arab Emirates => Rose 2 - 17a St - Dubai 17a St Dubai United Arab Emirates
P.O. Box 50 => P.O. Box 50
Kahraba South East => Kahraba South East
Corniche Road East => Corniche Road East
Swiss Tower, Cluster Y => Swiss Tower, Cluster Y
2-A => 2-A
Persia N => Persia N
Bijada Boulevard South => Bijada Boulevard South
29 a => 29 a
Rawdha Bridge => Rawdha Bridge
near health care center => near health care center
Al Barsha => Al Barsha
Corniche West Street, P.O. Box: 4010 => Corniche West Street, P.O. Box: 4010
Emirates Living Community => Emirates Living Community
Al Safouh st => Al Safouh Street
13c st => 13c Street
27b st => 27b Street
Sheikh Zayed Road (E-11) => Sheikh Zayed Road (E-11)
al barsha south => al barsha south
Al Badia Hillside Village => Al Badia Hillside Village
Cedre Village => Cedre Village
Warrior Net Cafeteria => Warrior Net Cafeteria
17th St. Old Airport Rd. Sea Shell buil. => 17th St. Old Airport Rd. Sea Shell buil.

A more accurate correction is possible by comparing the names with data from other map sites and by studying the actual situation on the ground.
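
The output above also shows that only the last word of each name is corrected, so values such as 'Al Shahama Rd., Al Bahia' or 'street 13' with a trailing newline stay unchanged. A minimal sketch of a broader cleanup follows. The ABBREVIATIONS table and the function expand_street_name are hypothetical and do not reproduce the notebook's own mapping or update_name.

```python
import re

# Hypothetical table; the notebook's own `mapping` is defined elsewhere.
ABBREVIATIONS = {
    'st': 'Street', 'st.': 'Street', 'street': 'Street',
    'rd': 'Road', 'rd.': 'Road', 'road': 'Road',
}

# A run of letters with an optional trailing dot, not glued to other
# letters or digits (so the "st" in "1st" is left alone).
TOKEN_RE = re.compile(r'(?<![A-Za-z0-9])[A-Za-z]+\.?(?![A-Za-z0-9])')


def expand_street_name(name):
    """Collapse whitespace and expand abbreviations anywhere in the name."""
    name = ' '.join(name.split())  # also drops trailing newlines

    def replace(match):
        token = match.group(0)
        return ABBREVIATIONS.get(token.lower(), token)

    return TOKEN_RE.sub(replace, name)


for raw in ['Al Shahama Rd., Al Bahia', 'street 13\n',
            'Crescent Rd, The Palm Jumeirah', 'Boutik Mall, 1st Floor']:
    print(repr(raw) + ' => ' + repr(expand_street_name(raw)))
```

Spelling variants such as 'Damscus street' versus 'Damascus street' are outside the scope of this sketch and would still need a comparison with an external source.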

3. CSV & SQL

3.1 Database

From the CSV files we can create separate data frames. Let's have a look at them.

In [71]:
# Set up variables for the data frames created from the CSV files
nodes = odo('nodes.csv', pd.DataFrame)
ways = odo('ways.csv', pd.DataFrame)
nodes_tags = odo('nodes_tags.csv', pd.DataFrame)
ways_tags = odo('ways_tags.csv', pd.DataFrame)
ways_nodes = odo('ways_nodes.csv', pd.DataFrame)
In [72]:
hide_code
print "Nodes: ", len(nodes)
matplotlib.rcParams['figure.figsize'] = (6, 4)
plt.style.use('seaborn-pastel')
nodes.timestamp.hist()
print "Histogram for node timestamps"
Nodes:  1890178
Histogram for node timestamps

In [66]:
hide_code
nodes.head(1)
Out[66]:
id lat lon user uid version changeset timestamp
0 21133776 25.175283 55.396638 Tommy 18885 3 7291467 2011-02-15 02:24:49
In [67]:
hide_code 
nodes_tags.head(1)
Out[67]:
id key value type
0 21136186 crossing island regular
In [117]:
print "List of keys for node tags"
print sorted(set(nodes_tags['key']))
List of keys for node tags
[u'Cell', u'FIXME', u'Note', u'Service_1', u'Town', u'access', u'ace', u'admin_level', u'aerialway', u'aeroway', u'af', u'ak', u'alcohol', u'als', u'alt', u'alt_name', u'alt_name_1', u'am', u'amenity', u'an', u'ang', u'ar', u'ar1', u'arc', u'area', u'artwork_type', u'arz', u'ast', u'atm', u'attraction', u'av', u'az', u'backrest', u'bar', u'barrier', u'bat-smg', u'bcl', u'be', u'be-tarask', u'beacon', u'beacon_isolated_danger:colour', u'beacon_isolated_danger:colour_pattern', u'beacon_lateral:category', u'beacon_lateral:colour', u'beacon_lateral:system', u'beacon_special_purpose:colour', u'bench', u'bg', u'bicycle', u'bicycle_parking', u'biodiesel', u'biogas', u'bitcoin', u'bjn', u'bm', u'bn', u'bo', u'board_type', u'boundary', u'bpy', u'br', u'brand', u'bs', u'bug', u'building', u'buoy_cardinal:category', u'buoy_cardinal:colour', u'buoy_cardinal:colour_pattern', u'buoy_cardinal:shape', u'buoy_installation:category', u'buoy_isolated_danger:colour', u'buoy_isolated_danger:colour_pattern', u'buoy_isolated_danger:shape', u'buoy_lateral:category', u'buoy_lateral:colour', u'buoy_lateral:shape', u'buoy_lateral:system', u'buoy_safe_water:colour', u'buoy_safe_water:colour_pattern', u'buoy_safe_water:shape', u'buoy_special_purpose:category', u'buoy_special_purpose:colour', u'buoy_special_purpose:shape', u'bus', u'bus_routes', u'button_operated', u'bxr', u'ca', u'camp_type', u'capacity', u'capital', u'capital_city', u'caravans', u'cargo', u'cash', u'cbk-zam', u'ce', u'ceb', u'chr', u'city', u'ckb', u'climbing', u'clothes', u'cng', u'coins', u'construction', u'continent', u'country', u'country_code', u'country_code_fips', u'country_code_iso3166_1_alpha_2', u'covered', u'craft', u'created_by', u'credit_cards', u'crh', u'crossing', u'crossing_ref', u'cryptocurrencies', u'cs', u'cuisine', u'cuisine_1', u'currency:AED', u'cy', u'da', u'de', u'delivery', u'denomination', u'description', u'designation', u'destination', u'diesel', u'diplomatic', u'diq', u'direction', 
u'direction_east', u'direction_west', u'dispensing', u'disposal', u'district', u'disused', u'dog', u'door', u'dress_code', u'drinking_water', u'drive_in', u'drive_through', u'dsb', u'dv', u'dz', u'e10', u'e85', u'ee', u'el', u'ele', u'electronic_purses', u'email', u'embankment', u'emergency', u'en', u'enforcement', u'entrance', u'eo', u'es', u'et', u'ethanol', u'eu', u'exit', u'exit_to', u'ext', u'fa', u'facebook', u'fax', u'fee', u'female', u'ff', u'fi', u'fiu-vro', u'fixme', u'flag', u'flats', u'floor', u'fo', u'fog_signal:category', u'fog_signal:group', u'fog_signal:period', u'food', u'foot', u'ford', u'fr', u'frp', u'fuel', u'fuel:1_25', u'fuel:HGV_diesel', u'full', u'fur', u'fy', u'ga', u'gag', u'gd', u'geometry', u'girocard', u'gl', u'glidepath', u'glideslope_angle', u'gn', u'golf', u'google_plus', u'gu', u'gv', u'ha', u'hak', u'hamlet', u'harbour', u'he', u'healthcare', u'height', u'hgv', u'hi', u'hif', u'highway', u'hiking', u'historic', u'hoops', u'horse', u'housename', u'housenumber', u'hr', u'hsb', u'ht', u'hu', u'hy', u'hypermarket', u'ia', u'iata', u'icao', u'id', u'ie', u'ilo', u'image', u'impromptu', u'indoor', u'indoor_seating', u'information', u'instagram', u'int_name', u'internet_access', u'interpolation', u'io', u'is', u'is_capital', u'is_in', u'it', u'ja', u'junction', u'jv', u'ka', u'kab', u'karak', u'kbd', u'ki', u'kk', u'kl', u'km', u'kn', u'ko', u'ks', u'ksh', u'ku', u'kv', u'kw', u'ky', u'la', u'lamp_type', u'landmark:category', u'landmark:height', u'landuse', u'lanes', u'layer', u'lb', u'lbe', u'leaf_type', u'leisure', u'level', u'levels', u'lez', u'lg', u'li', u'light:1:category', u'light:1:character', u'light:1:colour', u'light:1:height', u'light:1:period', u'light:1:radius', u'light:1:range', u'light:1:sector_end', u'light:1:sector_start', u'light:1:sequence', u'light:2:category', u'light:2:character', u'light:2:colour', u'light:2:height', u'light:2:period', u'light:2:radius', u'light:2:range', u'light:2:sector_end', 
u'light:2:sector_start', u'light:3:character', u'light:3:colour', u'light:3:height', u'light:3:period', u'light:3:radius', u'light:3:range', u'light:3:sector_end', u'light:3:sector_start', u'light:category', u'light:character', u'light:colour', u'light:group', u'light:height', u'light:period', u'light:range', u'light:reference', u'light:sequence', u'lij', u'lines', u'lit', u'lmo', u'ln', u'lo', u'local_ref', u'localiser', u'locality', u'location', u'locked', u'lpg', u'lt', u'lv', u'male', u'man_made', u'map_type', u'mastercard', u'material', u'maxspeed', u'maxstay', u'mg', u'mi', u'military', u'mk', u'ml', u'mn', u'mobile', u'monorail', u'motor_vehicle', u'motorcar', u'motorcycle', u'mountain_pass', u'mr', u'ms', u'mt', u'my', u'na', u'nah', u'name', u'name_1', u'natural', u'navigation_aid', u'nds', u'ne', u'network', u'nl', u'nn', u'no', u'noexit', u'note', u'notes', u'nov', u'nv', u'oc', u'octane_100', u'octane_91', u'octane_95', u'octane_98', u'office', u'old_name', u'oneway', u'openfire', u'opening_hours', u'opening_hours_1', u'operator', u'or', u'organic', u'os', u'outdoor_seating', u'outdoor_serving', u'output:electricity', u'pa', u'pag', u'pam', u'park_ride', u'parking', u'phone', u'pl', u'place', u'platform:category', u'platforms', u'playground', u'pms', u'pnb', u'po_box', u'population', u'position', u'postcode', u'power', u'ps', u'pt', u'public_transport', u'qu', u'radar_reflector', u'radar_transponder:category', u'radar_transponder:group', u'radar_transponder:period', u'radar_transponder:range', u'railway', u'rank', u'ref', u'religion', u'repair', u'rm', u'rn', u'ro', u'roa-tara', u'road_side', u'role', u'rooms', u'ru', u'rue', u'runway_ref', u'rw', u'sah', u'scn', u'sco', u'sd', u'se', u'seats', u'second_hand', u'service', u'services', u'serving', u'sg', u'sh', u'shelter', u'shelter_type', u'shop', u'shop_1', u'short_name', u'shower', u'si', u'simple', u'site', u'sk', u'sl', u'smoking', u'sn', u'so', u'source', u'species', u'sport', u'sq', u'sr', u'ss', 
u'stars', u'start_date', u'state', u'station', u'status', u'stq', u'street', u'street_cabinet', u'su', u'sub_station', u'substation', u'suburb', u'subway', u'supervised', u'surface', u'sv', u'sw', u'szl', u'ta', u'takeaway', u'te', u'telephone', u'tents', u'tg', u'th', u'ti', u'tk', u'tl', u'to', u'toilets', u'topmark:colour', u'topmark:shape', u'tourism', u'tr', u'traffic_calming', u'traffic_sign', u'traffic_signals', u'transport', u'trucks', u'tt', u'twitter', u'type', u'udm', u'ufi', u'ug', u'uk', u'unisex', u'unit', u'ur', u'url', u'user_defined_other', u'uz', u'vegetarian', u'vehicle', u'vending', u'vep', u'vi', u'visa', u'vo', u'voltage-high', u'voltage-low', u'vor', u'war', u'waterway', u'website', u'wheelchair', u'whitewater', u'width', u'wifi', u'wikidata', u'wikipedia', u'wine', u'wo', u'wuu', u'xal', u'yi', u'yo', u'zh', u'zh-min-nan', u'zh-yue', u'zh_pinyin', u'zu']
In [106]:
hide_code
print "List of values for node tags 'wifi'"
print set(nodes_tags[nodes_tags['key'] == 'wifi']['value'])
List of values for node tags 'wifi'
set([u'yes', u'free', u'no'])
In [116]:
hide_code
print "List of values for node tags 'place'"
print sorted(set(nodes_tags[nodes_tags['key'] == 'place']['value']))
List of values for node tags 'place'
[u'Abu Dhabi Mall', u'Al Maryah Island', u'Al Reef Downtown', u'Al Reem Island', u'Dalma Mall', u'Deerfields Townsquare Shopping Centre', u'Dubai Design District', u'Khalidiyah Village', u'Khalifa City Abu Dhabi Al Bandar Area', u'Saadiyat Island Golf Course', u'Shangri-La Qaryat Al Beri', u'Villa T3', u'Yas Island', u'Yas Mall', u'city', u'country', u'farm', u'hamlet', u'island', u'isolated_dwelling', u'locality', u'neighbourhood', u'region', u'suburb', u'town', u'village', u'yes']
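The output above mixes category-like values (city, suburb, village) with what look like proper names entered in the wrong field. Below is a minimal sketch for separating the two groups. It assumes that a well-formed category consists only of lowercase words joined by underscores; this rule is a heuristic of the sketch, not an official OpenStreetMap validation rule. The function name `split_place_values` is hypothetical, and the sample values are copied from the output above.

```python
import re

# Heuristic (an assumption of this sketch): category-like values are
# lowercase words joined by underscores, e.g. 'isolated_dwelling'.
CATEGORY_RE = re.compile(r'^[a-z]+(_[a-z]+)*$')

def split_place_values(values):
    """Return (category_like, suspicious) lists, each sorted."""
    category_like = sorted(v for v in values if CATEGORY_RE.match(v))
    suspicious = sorted(v for v in values if not CATEGORY_RE.match(v))
    return category_like, suspicious

# A few values copied from the output above
sample = ['Abu Dhabi Mall', 'Yas Island', 'Villa T3', 'city',
          'isolated_dwelling', 'suburb', 'yes']

category_like, suspicious = split_place_values(sample)
print(category_like)
print(suspicious)
```

Note that a value such as 'yes' passes the pattern although it carries no information about the place type, so the heuristic only narrows down the candidates for manual review.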
In [115]:
hide_code
print "List of values for node and way tags 'street'"
streets_nodes = set(nodes_tags[nodes_tags['key'] == 'street']['value'])
streets_ways = set(ways_tags[ways_tags['key'] == 'street']['value'])
print "Number of streets in table 'nodes': ", len(streets_nodes), ';', \
"number of streets in table 'ways': ", len(streets_ways)
print
print sorted(streets_nodes.union(streets_ways))
List of values for node and way tags 'street'
Number of streets in table 'nodes':  406 ; number of streets in table 'ways':  251

[u'(Inside of the Hotel The Cove Rotana Resort)', u'-Al Garhoud', u'07', u'1 A Street', u'1 Street', u'1 Street 17,  Al Safa', u'10 B Street', u'10th Street', u'10th street', u'11 b', u'11TH STREET', u'11th street, Musaffah M 14', u'12 D St', u'129-13b Street', u'12A', u'12th Street', u'13B Street', u'13C street , Al Quoz Industrial 3', u'13c st', u'147', u'14th Street, Al Khalidiya, Near Saba Crownn', u'153 St.', u'15A Street', u'15b Street', u'15th Street', u'15th street', u'16 D St', u'16 Street', u'166', u'16th Street', u'17', u'17th St. Old Airport Rd. Sea Shell buil.', u'17th Street', u'19', u'19C Street', u'19th Street', u'1D', u'1b St', u'1st Street', u'2 A Street', u'2-A', u'20B Street, Safa 1', u'20th Street', u'21 A Street', u'21A Street', u'22A Street', u'23rd Street', u'24 Street', u'24th Street', u'25B Street', u'25C Street', u'25b Street', u'25c Street', u'27b st', u'27th Street', u'28 Street', u'29 a', u'29A Street', u'2nd December Road', u'2nd Street', u'2nd Street (Old Airport Road)', u'2nd Zabeel Road', u'3 A St', u'318 Street', u'31st Street', u'34d St', u'37D Street', u'3A Street', u'3C Street', u'4 Street', u'42A Street', u'47', u'4A Street', u'4B Street', u'5 Street', u'5 b Street', u'51st Street', u'52B Street', u'56', u'59 Street', u'6', u'6 Street', u'6b St', u'6th Street', u'6th Street Musaffah ', u'6th Street Tanker Mai', u'79 a Street', u'7A Street', u'7th Street', u'81 A St', u'8th Street', u'8th street', u'9th Street', u'AL Rolla Road', u'AXIS 8 SILICON OASIS', u'Abdul Salam Arif', u'Abraj al Mamzar Emirates building', u'Abu Baker al Siddique road', u'Abu Dhabi Mall', u'Abu Dhar Al Ghafari Street', u'Abu Hail', u'Abu Mousa Island Street', u'Academic City', u'Academic City Road', u'Acadmeic City Road', u'Airport Road', u'Airport road', u'Airport road across Cornich', u'Ajman Corniche road', u'Ajman Ring Road', u'Ajman corniche road', u'Al Abraj Street', u'Al Arouba Street', u"Al Bada' St.", u'Al Badia Hillside Village', u'Al Barsha', 
u'Al Barsha Road', u'Al Barsha South Road', u'Al Barsha south 2', u'Al Barsha south 2 ', u'Al Bateen', u'Al Burj Street', u'Al Burooj St.', u'Al Dhaid - East Coast Road', u'Al Esbij (30th) Street', u'Al Etihad Road', u'Al Etisalat St.', u'Al Etisalat Street', u'Al Ettihad Road', u'Al Fahidi (19th) Street', u'Al Falah Street', u'Al Falak St.', u'Al Garhoud', u'Al Garhoud Road', u'Al Ghubaiba Road', u'Al Ghuwair', u'Al Ghuwair Building', u'Al Himrar Street', u'Al Hisn (44th) Street', u'Al Hubob St', u'Al Hubob Street', u'Al Hudaiba Road', u'Al Hudaiba Road, Al Badaa\n', u'Al Hudaiba cor. 2 a Street', u'Al Ittihad Road', u'Al Ittihad Street', u'Al Jaddaf 2', u'Al Kabeer Street', u'Al Khail Gate ', u'Al Khaleej Al Arabi Road (Coast Road)', u'Al Khaleej Al Arabi Street', u'Al Khaleej al Arabi Road', u'Al Khaleej al Arabi Street', u'Al Khan Street', u'Al Kharamah Street', u'Al Khawaneej Road', u'Al Khor Street', u'Al Ladeem Street', u'Al Maareedh Street', u'Al Maktoum', u'Al Maktoum Road', u'Al Mamsha Street', u'Al Mankhool', u'Al Marsa Street', u'Al Meena Street', u'Al Mina Road', u'Al Mina Road, Port Rashid', u'Al Montaser Road', u'Al Muneera', u'Al Murraqabat Intersection', u'Al Murraqabat Road', u'Al Mussalla Road', u'Al Muteena Street', u'Al Nahda (26th) Street', u'Al Nahda 2', u'Al Nahda Road', u'Al Nahda Street', u'Al Nahdha (26th) Street', u'Al Nakhal Road', u'Al Nasr street', u'Al Nayhan', u'Al Nud Sharjah', u'Al Qaram Br', u'Al Qasr', u'Al Qawasim Corniche', u'Al Raffa Street', u'Al Ramth', u'Al Rebat Street', u'Al Reef Downtown', u'Al Rifa Street, Khorfakkan', u'Al Rigga Road', u"Al Sa'ada Street", u'Al Saada', u'Al Safa Street', u'Al Safouh Rd', u'Al Safouh st', u'Al Salam Street', u'Al Seba Street', u'Al Shahama Rd., Al Bahia', u'Al Shamkha', u'Al Sharq', u'Al Sharq Street', u'Al Sharqi Road', u'Al Sufouh Rd', u'Al Sufouh Road', u'Al Sufouh Street', u'Al Suq (5A) Street', u'Al Taawun Street', u'Al Thammam', u'Al Thanya Road', u'Al Wahda Mall', u'Al Warood 1 
Street', u'Al Wasl Road', u'Al Zaher Street', u'Al Zahra St', u'Al ain poultry farm ', u'Al ghubaiba road', u'Al salam Street', u'Ali Bin Abi Taleb (11C) Street', u'Amman street', u'Ammar Bin Yasir Street', u'Arabian Ranches', u'Badr Street', u'Baghdad street', u'Bahar 7 Jumeirah Beach Res, Marsa, Dubai, AE', u'Baharia', u'Bainuna Street', u'Baniyas Road', u'Baniyas Square', u'Baniyas Street', u'Bank Street', u'Barton Street', u'Beirut Road', u'Beirut road', u'Beniyas Road', u'Bijada Boulevard South', u'Bluebell Place', u'Boutik Mall, 1st Floor', u'Business Bay', u'Buttercup Way', u'CBD06', u'Carnation Place', u'Casablanca Road', u'Cascade Street', u'Cedre Village', u'Channel Street', u'China C15', u'City Walk, Jumeirah 1', u'Claverton Street', u'Cluster I, AU Tower (Gold Tower),Jumierah Lake Towers', u'Cluster T, JLT', u'Commercial Road', u'Community 153', u'Corner 7a & 4a Street', u'Corniche Road', u'Corniche Road East', u'Corniche Road West', u'Corniche Street', u'Corniche Street Al Jubail ,Sharjah', u'Corniche Street, Al Majaz', u'Corniche West Street, P.O. 
Box: 4010', u'Corniche West Street, P.O.Box 39999', u'Crescent Rd, The Palm Jumeirah', u'Crescent Road, Atlantis Palm Jumeirah', u'Cultural Square', u'DIFC', u'Daffodil Way', u'Daisy Way', u'Damascus Street', u'Damascus street', u'Damscus street', u'Dar Al Masyaf', u'Deerfields Mall', u'Deerfields Townsquare', u'Deira Street', u'Detroit Road', u'Dibba Road', u'Dibba-Khor Fakkan Road', u'Dickens Street', u'Dip2', u'District 12', u'District 12 C12', u'District 12K', u'Doha road', u'Dragon Mart 2', u'Dubai - Al Ain Road', u'Dubai Abudhabi Road', u'Dubai Healthcare City', u'Dubai Ibn Battuta Gate', u'Dubai Investment Park 2', u'Dubai Marina', u'Dubai Marina Mall exit', u'Dubai Road', u'Dubai Silicon Oasis', u'E 1', u'Easton Street', u'Electra St', u'Electra Street', u'Electra street', u'Emaar Boulevard', u'Emaar Square', u'Emirates Living Community', u'Emirates Road', u'England cluster', u'Etihad Plaza', u'Ewan Residence', u'Exit 333', u'Fatima Bint Mubarak Street (former Najda Street)', u'Fire Station Road', u'First Al Khail Street', u'First Industrial Street', u'Gardenia Road', u'Gate Precinct 5 Suite 606', u'Greece L', u'Halab Street', u'Hamdan Bin Mohammad Street', u'Hamdan Bin Mohammed - 5th Street', u'Hamdan Bin Zayed Al Awwal St.', u'Hamdan Street', u'Hamdan bin Mohammed Street', u'Hamed Bin Abdulla Road', u'Hatta Oman road Muzera', u'Hattan Avenue', u'Hattan Street 1', u'Hattan Street 2', u'Hazaa Bin Zayed The First Street', u'Hazza Bin Zayed Street', u'Hessa Street', u'High Bay Business Center', u'Hisham bin Abdul Rahman Street', u'Honsho Road', u'Hor Al Anz Street', u'Ibn Sina Medical Centre', u'Ibrahim Almidfaa', u'Icad 2', u'JBR Rd', u'Jamal Abdul Naser', u'Jawazath Road, Behind Mashreq Bank, Sharjah', u'Jumeirah Beach Rd', u'Jumeirah Beach Road', u'Jumeirah Lake Towers', u'Jumeirah Lakes Towers', u'Jumeirah Road', u'Jumeirah Village Triangle,  District 2, Street 5', u'Jumeirah Walk', u'Jumeria Center ', u'Kahraba South East', u'Karama', u'Khalid Bin Waleed 
Road', u'Khalid bin Waleed Road', u'Khalidiya', u'Khalifa Bin Shakbout Street', u'Khalifa Bin Zayed Street', u'Khalifa Street', u'Khalifa bin Shakhbout Street', u'Khalifa bin Zayed the First Street', u'Khata Al Shikle Street', u'Khatam al shikla', u'Khorfakkan', u'Khuzam Road', u'King Abdul Aziz St.', u'King Faisal St.', u'King Faisal Street', u'Kuwait Road', u'Kuwaiti Street', u'Latifa Bint Hamdan Street', u'Lavender Place', u'Lilly Lane', u'Link Delma Street > Sheikh Rashid Bin Saeed Al Maktoum Street', u'Liwa Street', u'M-24', u'MUROOR ROAD', u'Magnolia Place', u'Maliha Road', u'Malqtah Street', u'Mankhool Road', u'Maraya Art Centre, Al Qasba', u'Marlowe Street', u'Marrakech street', u'Marrakesh', u'Marrakesh Street, Garhuod, Rumooolooo', u'Marsa Plaza', u'Max com Technologies cctv', u'Meena Road', u'Meena St', u'Memzar Slipway', u'Meydan Road', u'Mina Al Arab', u'Mina Sadiyat', u'Mohammed Bin Khalifa St.', u'Mohammed Bin Khalifa Street (15th Street)', u'Mohammed Bin Zayed City', u'Motor City', u'Mubarak bin Mohammed Street', u'Muhaila , Al Teqa children club', u'Muroor', u'Muroor Road', u'Muroor Street', u'Musaffah', u'Musaffah Industrial Area Street 14', u'Nad Al Hamar Road', u'Nadd Al Sheba Road', u'Nahda park jogging track', u'Nahyan Al Awwal St.', u'Najda Street', u'Near Ramla Hyper Market', u'Newbridge Hill', u'No Fear Cafe', u'Noor Islamic Bank St.', u'Norton Street', u'Old Baladiya Street', u'Orchid Lane', u'Oud Metha', u'Oud Metha Rd', u'Oud Metha Road', u'P. O. Box: 9, Corniche Road', u'P.O. Box 282825', u'P.O. Box 34429', u'P.O. Box 50', u'P.O. Box: 3005', u'P.O.Box 111', u'P.O.Box 33704,Villa No. 2, Al Qawasim Corniche', u'P.O.Box 434', u'P.O.Box 74147', u'Paragon Mall', u'Paragon Mall, Reem Island', u'Park Island', u'Persia N', u'Plot No. 
M-26, Area 54', u'Poppy Lane', u'Ras Al Khaimah', u'Ras Al Khor Road', u'Ras Al Khor Road;Al Awir Road', u'Rawdha Bridge', u'Regent Street', u'Rigga Al Buteen St', u'Road E', u'Road: WB3', u'Rose 2 - 17a St - Dubai 17a St Dubai United Arab Emirates', u"Sa'ada Street (19th)", u'Saada Street', u'Sahhab Street', u'Salam Street', u'Sanaiya', u'Satwa Road', u'Second Industrial Street', u'Seven Sands', u'Shaikh Mohammed bin Rashid Boulevard', u'Shakespeare Street', u'Shakhboot Bin Sultan Street', u'Shakhbout Bin Sultan 19 Street', u'Sheikh Humaid Bin Rashid Al Nuaimi Street', u'Sheikh Khalifa Bin Zayed Highway', u'Sheikh Khalifa Bin Zayed Street', u'Sheikh Khalifa bin Zayed Street', u'Sheikh Maktoum bin Rashid Road', u'Sheikh Mohammed bin Zayed Road', u'Sheikh Rashed Bin Said Rd', u'Sheikh Rashid Bin Saeed Al Maktoum Street', u'Sheikh Rashid Road', u'Sheikh Rashid bin Saeed Street', u'Sheikh Saqr Bin Mohammad Al Qasimi Road', u'Sheikh Zayed Road', u'Sheikh Zayed Road (E-11)', u'Sheikh Zayed Road, Latifa Tower', u'Sheikh Zayed The First Street', u'Sheikh Zayed The First Street (Khalidiyah Street)', u'Sheikh Zayed bin Sultan Street', u'Sheraton Abu Dhabi', u'Sherlock Street', u'Shk Maktoom Road', u'Silverene Tower, Dubai', u'Souk Al Bahar', u'Souk, Central Market', u'SouthRidge Branch, Downtown Dubai', u'Southern Sun Abu Dhabi', u'Street 10', u'Street 12', u'Street 13', u'Street 13, Dubai', u'Street 18', u'Street 2', u'Street 3', u'Street 43', u'Street 5', u'Street 7', u'Street 724', u'Street 7A', u'Street 8', u'Street 92', u'Sultan Bin Zayed The First Street', u'Sunflower Road', u'Swiss Tower, Cluster Y', u'T C A ', u'TECOM SECTION C', u'Tameem House Building Floor', u'Tawam Street', u'Terranova Avenue', u'The Dome', u'The Galleria', u'The Walk, Dubai Marina', u'The Walk, Jumeirah Beach Residence', u'Third Industrial St.', u'Third Industrial Street', u'Truck Road', u'Tulip Way', u'Tunisian Food Court', u'Turin Boulevard Road', u'Twam St.', u'Umm Suqueim Street', 
u'University City Rd. 120 Street', u'University City Road 120 Street', u'University Street', u'Unnamed Road', u'Up Town Motor City', u'Vienna Street', u'Violet Place', u'WB-21 Street', u'Wahda St.', u'Wahda Street', u'Wahda street', u'Warrior Net Cafeteria', u'Weston Street', u'Widcombe Street', u'Yas Island', u'Yas Leisure Drive', u'Yas Mall', u'Yas Marina Circuit', u'Yas Plaza', u"Za'abeel Road", u'Zayed Bin Sultan Street', u'Zayed Sports City', u'Zen Street', u'academic City Road', u'al barsha south', u'al khail gate', u'al nahda road', u'al nud al qasimiya ', u'al rams road', u'al rawada', u'albuteen', u'aljurf ind 1', u'behind Al Teqa handicap club ', u'corniche ', u'dubai investment park 2', u'ind area 10', u'industrial 13', u'industrial 4', u'industrial area 5', u'lakeshore tower', u'morocco', u'near health care center', u'off 18th Street', u'residential', u'shabiya -11', u'sheik rashid rd', u'sheikh Zayed Road', u'sheikh khalid bin khalid street', u'skycourts', u'street 13\n', u'street 18', u'sweet 15', u'\u0628\u0644\u0648\u0627\u0631 \u062e\u06cc\u0627\u0645', u'\u0634\u0627\u0631\u0639 \u0627\u0644\u062c\u0627\u0645\u0639\u0629', u'\u0634\u0627\u0631\u0639 \u0627\u0644\u062e\u062f\u0645\u0627\u062a \u0639\u0648\u062a\u0628', u'\u0634\u0627\u0631\u0639 \u0627\u0644\u062e\u0644\u064a\u062c', u'\u0634\u0627\u0631\u0639 \u0627\u0644\u0633\u0644\u0627\u0645', u'\u0634\u0627\u0631\u0639 \u0627\u0644\u0634\u064a\u062e \u062e\u0644\u064a\u0641\u0629 \u0628\u0646 \u0632\u0627\u064a\u062f', u'\u0634\u0627\u0631\u0639 \u0627\u0644\u0634\u064a\u062e \u0645\u062d\u0645\u062f \u0628\u0646 \u0633\u0627\u0644\u0645  ', u'\u0634\u0627\u0631\u0639 \u0627\u0644\u0643\u062a\u0631\u0627', u'\u0634\u0627\u0631\u0639 \u0627\u0644\u0647\u0645\u0628\u0627\u0631', u'\u0634\u0627\u0631\u0639 \u0646\u0627\u064a\u0641', u'\u0645\u0646\u0637\u0642\u0629 \u0645\u0632\u064a\u0631\u0639 \u0639\u062c\u0645\u0627\u0646', u'\u0645\u0646\u0637\u0642\u0629 \u0646\u062f 
\u0627\u0644\u062d\u0645\u0631 \u062e\u0644\u0641 \u0645\u0646 \u0627\u0633\u0648\u0627\u0642']
In [94]:
hide_code
print "Ways: ", len(ways)
ways.head(3)
Ways:  234327
Out[94]:
id user uid version changeset timestamp
0 4009554 rehan727 2952340 25 42505170 2016-09-28 21:02:31
1 4334711 4b696d 1420318 21 28096059 2015-01-12 19:49:12
2 4340534 wk2 1808544 18 18943947 2013-11-16 22:28:53
In [64]:
hide_code
print "Example of way tags"
ways_tags.head(3)
Example of way tags
Out[64]:
id key value type
0 4009554 bridge yes regular
1 4009554 lanes no|no|no|no|yes|yes hgv
2 4009554 highway motorway regular
In [65]:
hide_code
print "Example of ways-nodes connections"
ways_nodes.head(3)
Example of ways-nodes connections
Out[65]:
id node_id position
0 4009554 90031463 0
1 4009554 90028252 1
2 4009554 21133804 2
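The `position` column is what keeps the nodes of a way in order. As a rough illustration, the sketch below rebuilds the node sequence of one way by joining the two tables in an in-memory SQLite database. The `lat` and `lon` column names and all coordinates are hypothetical; only the ids and positions are taken from the example above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()

# Toy tables: 'lat'/'lon' names and the coordinates are hypothetical,
# only the way id, node ids and positions come from the example above.
c.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")
c.execute("CREATE TABLE ways_nodes (id INTEGER, node_id INTEGER, position INTEGER)")
c.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
              [(90031463, 25.10, 55.20),
               (90028252, 25.11, 55.21),
               (21133804, 25.12, 55.22)])
# Rows are inserted out of order on purpose
c.executemany("INSERT INTO ways_nodes VALUES (?, ?, ?)",
              [(4009554, 21133804, 2),
               (4009554, 90031463, 0),
               (4009554, 90028252, 1)])

# Join the connection table to the nodes and restore the order of the way
c.execute("""SELECT wn.position, n.id, n.lat, n.lon
             FROM ways_nodes wn JOIN nodes n ON n.id = wn.node_id
             WHERE wn.id = ?
             ORDER BY wn.position""", (4009554,))
path = c.fetchall()
conn.close()
print(path)
```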

We can manage the data with special tools as well.

The commands in the "odo" package make it possible to load the information from the CSV files into the SQL database.

In [63]:
hide_code
# Import the table into the sql database
dshape_nodes = discover(resource('nodes.csv'))
nodes_sql = odo('nodes.csv', 'sqlite:///openstreetmap_dubai.db::nodes', dshape=dshape_nodes)
In [64]:
hide_code
dshape_ways = discover(resource('ways.csv'))
ways_sql = odo('ways.csv', 'sqlite:///openstreetmap_dubai.db::ways', dshape=dshape_ways)
In [65]:
hide_code
dshape_nodes_tags = discover(resource('nodes_tags.csv'))
nodes_tags_sql = odo('nodes_tags.csv', 'sqlite:///openstreetmap_dubai.db::nodes_tags', dshape=dshape_nodes_tags)
In [66]:
hide_code
dshape_ways_tags = discover(resource('ways_tags.csv'))
ways_tags_sql = odo('ways_tags.csv', 'sqlite:///openstreetmap_dubai.db::ways_tags', dshape=dshape_ways_tags)
In [67]:
hide_code
dshape_ways_nodes = discover(resource('ways_nodes.csv'))
ways_nodes_sql = odo('ways_nodes.csv', 'sqlite:///openstreetmap_dubai.db::ways_nodes', dshape=dshape_ways_nodes)
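If "odo" is not available, a similar import can be sketched with the standard library alone. The helper `load_csv` below is hypothetical and is not part of "odo". It creates a table from the CSV header and inserts every row as text, so unlike `discover` it does not infer column types. An in-memory text buffer stands in for a real file such as 'ways_tags.csv'.

```python
import csv
import io
import sqlite3

def load_csv(conn, table, fileobj):
    """Create `table` from the CSV header and insert all rows as text."""
    reader = csv.reader(fileobj)
    header = next(reader)
    columns = ', '.join('"{}"'.format(name) for name in header)
    placeholders = ', '.join('?' for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    # The reader yields the remaining rows one by one
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader)
    conn.commit()

# Stand-in for open('ways_tags.csv'); rows copied from the ways_tags example
demo_csv = io.StringIO(u"id,key,value,type\n"
                       u"4009554,bridge,yes,regular\n"
                       u"4009554,highway,motorway,regular\n")

conn = sqlite3.connect(':memory:')
load_csv(conn, 'ways_tags', demo_csv)
rows = conn.execute("SELECT key, value FROM ways_tags ORDER BY key").fetchall()
conn.close()
print(rows)
```

Because the table and column names are formatted into the SQL text, this helper should only be used with trusted file headers.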
3.2 Indicators of the dataset
In [60]:
hide_code
conn = sqlite3.connect('openstreetmap_dubai.db')
c = conn.cursor()

The number of nodes:

In [61]:
c.execute("SELECT COUNT(*) FROM nodes;")
print c.fetchall()
[(1890178,)]

The number of ways:

In [62]:
c.execute("SELECT COUNT(*) FROM ways;")
print c.fetchall()
[(234327,)]
In [63]:
hide_code
conn.close()
3.3 Users
In [25]:
hide_code
conn = sqlite3.connect('openstreetmap_dubai.db')
c = conn.cursor()

With simple queries against the database, we can select information of interest, for example the number of unique users who edited nodes or ways:

In [73]:
c.execute("SELECT COUNT(DISTINCT(e.uid)) FROM \
          (SELECT uid FROM nodes UNION ALL SELECT uid FROM ways) e;")
print c.fetchall()
[(1885,)]

The database makes it possible to evaluate the contribution of each individual user to map editing.

Let us list the 3 most active editors of this map section:

In [74]:
c.execute("SELECT e.user, COUNT(*) as num \
           FROM (SELECT user FROM nodes UNION ALL SELECT user FROM ways) e \
           GROUP BY e.user \
           ORDER BY num DESC \
           LIMIT 3;")
print c.fetchall()
[(u'eXmajor', 492808), (u'chachafish', 156874), (u'Seandebasti', 125767)]

Examples of 10 users with only one record:

In [75]:
c.execute("SELECT e.user, COUNT(*) as num \
           FROM (SELECT user FROM nodes UNION ALL SELECT user FROM ways) e \
           GROUP BY e.user \
           ORDER BY num \
           LIMIT 10;")
print c.fetchall()
[(u'08xavstj', 1), (u'12DonW', 1), (u'66444098', 1), (u'7dhd', 1), (u'AC FootCap', 1), (u'AHMED ABASHAR', 1), (u'ASHRAF CHOOLAKKADAVIL', 1), (u'A_Sadath', 1), (u'Aal Ibra240380heem', 1), (u'Abdul Rahim Khan', 1)]
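The query above lists examples but does not say how many such users exist. A possible way to count them is to filter the grouped result with `HAVING`, sketched here on a toy in-memory database. The user names are taken from the outputs above, while the rows themselves are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()

# Toy data: user names come from the outputs above,
# the rows themselves are invented for illustration.
c.execute("CREATE TABLE nodes (id INTEGER, user TEXT)")
c.execute("CREATE TABLE ways (id INTEGER, user TEXT)")
c.executemany("INSERT INTO nodes VALUES (?, ?)",
              [(1, 'eXmajor'), (2, 'eXmajor'), (3, '7dhd'), (4, 'chachafish')])
c.executemany("INSERT INTO ways VALUES (?, ?)",
              [(10, 'chachafish'), (11, '12DonW')])

# Keep only the users whose total number of records is exactly one,
# then count how many users remain.
c.execute("""SELECT COUNT(*) FROM
             (SELECT e.user
              FROM (SELECT user FROM nodes UNION ALL SELECT user FROM ways) e
              GROUP BY e.user
              HAVING COUNT(*) = 1) u""")
single_record_users = c.fetchone()[0]
conn.close()
print(single_record_users)
```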
In [105]:
hide_code
conn.close()
3.4 Places
In [29]:
hide_code
conn = sqlite3.connect('openstreetmap_dubai.db')
c = conn.cursor()

The list of the 3 most common types of places:

In [96]:
c.execute("SELECT value, COUNT(*) as num \
            FROM nodes_tags \
           WHERE key='place' \
           GROUP BY value \
           ORDER BY num DESC \
           LIMIT 3;")
print c.fetchall()
[(u'village', 608), (u'locality', 507), (u'suburb', 144)]

The list of building types:

In [35]:
c.execute("SELECT value, COUNT(*) as num \
            FROM nodes_tags \
           WHERE key='building' \
           GROUP BY value \
           ORDER BY num DESC;")
building_types = c.fetchall()
print building_types
[(u'yes', 78), (u'mosque', 50), (u'hut', 41), (u'residential', 37), (u'apartments', 14), (u'commercial', 7), (u'entrance', 7), (u'university', 7), (u'public', 6), (u'industrial', 5), (u'school', 5), (u'office', 4), (u'hangar', 3), (u'hospital', 3), (u'hotel', 3), (u'roof', 3), (u'service', 3), (u'house', 2), (u'retail', 2), (u'warehouse', 2), (u'MAJ Building', 1), (u'cabin', 1), (u'college', 1), (u'construction', 1), (u'dormitory', 1), (u'garage', 1), (u'manufacture', 1), (u'terrace', 1)]
In [65]:
print(plt.style.available)
[u'seaborn-darkgrid', u'seaborn-notebook', u'classic', u'seaborn-ticks', u'grayscale', u'bmh', u'seaborn-talk', u'dark_background', u'ggplot', u'fivethirtyeight', u'seaborn-colorblind', u'seaborn-deep', u'seaborn-whitegrid', u'seaborn-bright', u'seaborn-poster', u'seaborn-muted', u'seaborn-paper', u'seaborn-white', u'seaborn-pastel', u'seaborn-dark', u'seaborn-dark-palette']
In [36]:
building_df = pd.DataFrame(building_types)
In [69]:
matplotlib.rcParams['figure.figsize'] = (14, 5)
x = building_df.index.values
y = building_df[1]
xticks = building_df[0]
plt.xticks(x, xticks, rotation=60)
plt.style.use('seaborn-pastel')
plt.scatter(x, y, marker = 'H', s = 100, alpha = 0.5)
Out[69]:
<matplotlib.collections.PathCollection at 0x11a2b1ad0>

The list of the 10 most common facilities:

In [98]:
c.execute("SELECT value, COUNT(*) as num \
            FROM nodes_tags \
           WHERE key='amenity' \
           GROUP BY value \
           ORDER BY num DESC \
           LIMIT 10;")
print c.fetchall()
[(u'restaurant', 1310), (u'parking', 596), (u'fast_food', 427), (u'cafe', 392), (u'place_of_worship', 362), (u'pharmacy', 310), (u'bank', 290), (u'fuel', 274), (u'atm', 216), (u'bench', 215)]

The list of the 3 most common zipcodes:

In [26]:
c.execute("SELECT value, COUNT(*) as num \
           FROM nodes_tags \
           WHERE key='postcode' \
           GROUP BY value \
           ORDER BY num DESC \
           LIMIT 3;")
print c.fetchall()
[(u'473828', 4), (u'000000', 2), (u'103711', 2)]

Zipcodes that occur in only one record (the query lists the 70 least frequent values):

In [104]:
c.execute("SELECT value, COUNT(*) as num \
           FROM nodes_tags \
           WHERE key='postcode' \
           GROUP BY value \
           ORDER BY num \
           LIMIT 70;") 
print c.fetchall()
[(u'0', 1), (u'0000', 1), (u'00962', 1), (u'00971', 1), (u'108100', 1), (u'111695', 1), (u'113431', 1), (u'114692', 1), (u'115443', 1), (u'119417', 1), (u'121641', 1), (u'1234', 1), (u'12345', 1), (u'1243', 1), (u'125939', 1), (u'128358', 1), (u'20268', 1), (u'20661', 1), (u'2157', 1), (u'22436', 1), (u'23117', 1), (u'231992', 1), (u'232144', 1), (u'232574', 1), (u'24857', 1), (u'24976', 1), (u'2504', 1), (u'2574', 1), (u'26268', 1), (u'263076', 1), (u'2666', 1), (u'28676', 1), (u'32923', 1), (u'34121', 1), (u'38126', 1), (u'38495', 1), (u'38575', 1), (u'392189', 1), (u'41318', 1), (u'42324', 1), (u'42524', 1), (u'44263', 1), (u'444786', 1), (u'44548', 1), (u'4599', 1), (u'46477', 1), (u'4758', 1), (u'47602', 1), (u'47612', 1), (u'500368', 1), (u'5280 dubai', 1), (u'549', 1), (u'57566', 1), (u'60884', 1), (u'64649', 1), (u'6656', 1), (u'6834', 1), (u'71444', 1), (u'7770', 1), (u'77947', 1), (u'7819', 1), (u'79506', 1), (u'81730', 1), (u'8988', 1), (u'9292', 1), (u'97717', 1), (u'9978', 1), (u'Muhaisnah 4', 1), (u'P O BOX 3766', 1), (u'P.O. Box 5618, Abu Dhabi, U.A.E', 1)]

The list of the 20 most common streets:

In [27]:
c.execute("SELECT value, COUNT(*) as num \
            FROM nodes_tags \
           WHERE key='street' \
           GROUP BY value \
           ORDER BY num DESC \
           LIMIT 20;")
print c.fetchall()
[(u'Al Taawun Street', 48), (u'Sheikh Zayed Road', 23), (u'Sheikh Mohammed bin Zayed Road', 21), (u'Al Ramth', 20), (u'Al Ettihad Road', 14), (u'King Faisal Street', 13), (u'Yas Leisure Drive', 11), (u'Paragon Mall, Reem Island', 10), (u'Al Fahidi (19th) Street', 9), (u'Sheikh Zayed The First Street', 9), (u'Corniche Road West', 8), (u'Hamdan Street', 8), (u'Yas Mall', 8), (u'King Abdul Aziz St.', 7), (u'10 B Street', 6), (u'Al Meena Street', 6), (u'Al Raffa Street', 6), (u'Al Thammam', 6), (u'Hazaa Bin Zayed The First Street', 6), (u'Jumeirah Beach Road', 6)]
In [28]:
hide_code
conn.close()

4. Problems and errors

4.1

One of the main problems of public maps is that place names are not consistently duplicated in other languages. If the translation process could be automated by building up a shared database of map names in many languages, it would spare users many difficulties and mistakes.
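The extent of this problem could be estimated with a query like the one sketched below. It assumes that language-specific names such as `name:ar` were split on import into `key='ar'` and `type='name'`; this is consistent with the language codes seen among the tag keys, but it is an assumption of the sketch. The rows are invented and the values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()

# Toy nodes_tags table; all rows are invented for illustration.
# Assumption: 'name:ar' is stored as key='ar', type='name'.
c.execute("CREATE TABLE nodes_tags (id INTEGER, key TEXT, value TEXT, type TEXT)")
c.executemany("INSERT INTO nodes_tags VALUES (?, ?, ?, ?)", [
    (1, 'name', 'Dubai', 'regular'),
    (1, 'ar', '<Arabic name>', 'name'),
    (1, 'ru', '<Russian name>', 'name'),
    (2, 'name', 'Yas Island', 'regular'),
    (3, 'name', 'Al Barsha', 'regular'),
    (3, 'ar', '<Arabic name>', 'name'),
])

# Number of nodes that have a default name
c.execute("SELECT COUNT(DISTINCT id) FROM nodes_tags WHERE key = 'name'")
named = c.fetchone()[0]

# Number of nodes that have a name in each additional language
c.execute("""SELECT key, COUNT(DISTINCT id) AS num
             FROM nodes_tags
             WHERE type = 'name'
             GROUP BY key
             ORDER BY num DESC, key""")
coverage = c.fetchall()
conn.close()
print(named)
print(coverage)
```

Comparing each language count with the number of named nodes gives a rough share of names that have a translation.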

4.2

The next problem is the large number of separate databases (including mapping ones) that describe the same map objects. Procedures for integrating the data that is already available would spare many people unnecessary work and save time and effort.

4.3

Obviously, the information about the number of buildings and their purpose is incomplete. The completeness of public maps can be increased by involving new users in the mapping process. To this end, entering information should be as simple as possible: for example, choosing from the available options, with many fields filled in automatically for linked options (such as linking a street name to the administrative area in which it is located).

4.4

As in any public data, there are a number of mistakes and typos. Well-known methods can be proposed to correct them: automatic comparison with existing data and verification of new data by other users.
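As a small illustration of automatic correction, the sketch below normalizes the trailing street type in street names. The `MAPPING` table and the function `clean_street` are hypothetical and cover only a few of the variants visible in the street list of section 3. A real cleanup would need a mapping built from a full audit.

```python
import re

# Abbreviation -> full form; this mapping is an assumption of the sketch
# and would need to be extended after auditing the full street list.
MAPPING = {'st': 'Street', 'st.': 'Street', 'street': 'Street',
           'rd': 'Road', 'rd.': 'Road', 'road': 'Road'}

def clean_street(name):
    """Collapse whitespace and expand or capitalize a trailing street type."""
    name = re.sub(r'\s+', ' ', name).strip()
    if not name:
        return name
    head, _, last = name.rpartition(' ')
    fixed = MAPPING.get(last.lower())
    if fixed is None:
        # The last word is not a known street type: keep the name as it is
        return name
    return (head + ' ' + fixed) if head else fixed

# Raw values copied from the street list
for raw in ['Al Hubob St', 'King Faisal St.', 'Electra street',
            'Al Safouh Rd', 'Al Hudaiba Road, Al Badaa\n', 'Yas Mall']:
    print(repr(raw), '->', repr(clean_street(raw)))
```

The sketch deliberately leaves values such as 'Yas Mall' untouched, because they are not street names at all and need a separate decision.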

4.5

The lack of a uniform postal code system in this particular dataset complicates the identification and verification of postcodes.
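A first step towards verification could be to sort the raw values into rough groups. The function `classify_postcode` below is a hypothetical sketch with three categories chosen for illustration; the sample values are copied from the query outputs in section 3.4.

```python
import re

def classify_postcode(value):
    """Return 'numeric', 'po_box' or 'other' for a raw postcode value."""
    value = value.strip()
    if re.match(r'^\d+$', value):
        return 'numeric'
    # Matches spellings such as 'P O BOX', 'P.O. Box' and 'P.O.Box'
    if re.match(r'^P\.?\s*O\.?\s*Box\b', value, re.IGNORECASE):
        return 'po_box'
    return 'other'

# Values copied from the postcode outputs in section 3.4
samples = ['473828', '000000', '5280 dubai', 'Muhaisnah 4',
           'P O BOX 3766', 'P.O. Box 5618, Abu Dhabi, U.A.E']

groups = {}
for s in samples:
    groups.setdefault(classify_postcode(s), []).append(s)
print(groups)
```

A purely numeric value is not necessarily valid: '000000' looks like a placeholder, so the 'numeric' group still needs manual review.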

4.6

While working on the project, I spent a lot of time converting one type of data file to another. Each format has its own advantages and disadvantages. It may be possible to design a universal file type that can store data of any kind, combines the advantages of the existing types, and can be used from most existing programming languages.

4.7

It seems to me appropriate to correct errors in the data after the files have been uploaded to the database. Sometimes a record that looks like a mistake for a particular field simply contains additional information about geo-objects.

5. Data Overview

5.1 Description of the data structure:

1) nodes - points in space with basic characteristics (lat, long, id, tags);

2) ways - linear features and area boundaries, each defined by an ordered list of nodes;

3) relations - tags plus an ordered list of nodes, ways and/or relations as members, used to define logical or geographic relationships between other elements.
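The three element types can be seen in a miniature example. The snippet below is a hand-written OSM-like document (all ids, coordinates and tags are made up) parsed with the standard `xml.etree.ElementTree` module. It is only a sketch of the structure, not an extract from the Dubai and Abu-Dhabi file.

```python
import xml.etree.ElementTree as ET

# A hand-written miniature OSM document; ids, coordinates and tags are made up.
OSM_SNIPPET = """<osm>
  <node id="1" lat="25.20" lon="55.27"><tag k="name" v="A"/></node>
  <node id="2" lat="25.21" lon="55.28"/>
  <way id="10">
    <nd ref="1"/><nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
  <relation id="100">
    <member type="way" ref="10" role="outer"/>
    <tag k="type" v="multipolygon"/>
  </relation>
</osm>"""

root = ET.fromstring(OSM_SNIPPET)

# Count the top-level elements by type
counts = {}
for element in root:
    counts[element.tag] = counts.get(element.tag, 0) + 1

# A way is an ordered list of node references
way = root.find('way')
way_node_refs = [nd.get('ref') for nd in way.findall('nd')]

# A relation lists its members together with their types
member_types = [m.get('type') for m in root.find('relation').findall('member')]

print(counts)
print(way_node_refs)
print(member_types)
```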

5.2 Indicators.

1) Size of the .osm file: 394.4 MB.

2) Size of the .osm sample file: 3.9 MB.

3) Nodes: 1890178.

4) Ways: 234327.

5) Relations: 2820.

6) Tags: 503027.

7) Users: 1895.

5.3 SQL

With a small set of SQL queries we can compute descriptive statistics for the tables in the database.
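For example, the row counts of several tables can be collected in one pass. The helper `table_counts` below is a hypothetical sketch, and a toy in-memory database stands in for 'openstreetmap_dubai.db'.

```python
import sqlite3

def table_counts(conn, tables):
    """Return {table_name: row_count} for the given tables."""
    counts = {}
    for table in tables:
        # Table names cannot be bound as parameters, so pass only trusted names
        counts[table] = conn.execute(
            'SELECT COUNT(*) FROM "{}"'.format(table)).fetchone()[0]
    return counts

# Toy database standing in for openstreetmap_dubai.db
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE nodes (id INTEGER)")
conn.execute("CREATE TABLE ways (id INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO ways VALUES (?)", [(10,)])

summary = table_counts(conn, ['nodes', 'ways'])
conn.close()
print(summary)
```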

6. Conclusion

I found this project educational. I believe that one of the main tasks in this case was to study methods for extracting and exploring openly accessible map data. For example, I used a systematic sample of elements from the original .osm file to try out the processing functions before applying them to the whole dataset. As a result, I have gained some useful new skills in parsing, processing, storing, aggregating and applying data.

During the research I read through quite a lot of other students' projects on this topic. After my own research and a review of other authors' results, I have formed a definite opinion about the ideas behind OpenStreetMap.

This website can be viewed as a testing ground for the interaction of a large number of people (including non-professionals) in creating a unified information space. The prospects of such cooperation cannot be overemphasized. The success of the project will make it possible to implement ambitious plans in the field of accessible information technologies, the creation of virtual reality, and many other areas.

An increase in the number of users leads to many positive effects in this kind of project:

1) a rapid improvement in the accuracy, completeness and timeliness of information;

2) bringing the information space closer to reality and making the evaluation of the data more objective;

3) a reduction in the effort needed to clean erroneous details from the data.

The ideas for improving the OpenStreetMap project are simple and natural.

The number of users can be increased by additional options such as rating marks (e.g., the best restaurant or the most convenient parking).

The project could also become more popular through temporary pop-up messages from users (displayed for no more than 1-3 hours) with up-to-date information about a geographic location (e.g., the presence of traffic jams).