This guide describes how to install, set up and configure all the software needed for a database of OpenStreetMap data that you can use to render maps or develop stylesheets. The step-by-step instructions are written for Ubuntu Linux 14.04 LTS (Trusty Tahr), with notes on the adaptations needed for Ubuntu 12.04 LTS (Precise Pangolin), and should transfer without much difficulty to other distributions. Basic Linux command line and PostgreSQL knowledge is required. The database can run either on a desktop machine or on a machine you have remote access to.

Software Installation

This guide covers installation of osm2pgsql and loading a PostgreSQL/PostGIS database with OpenStreetMap data. Before starting, make sure your Ubuntu system is fully up-to-date:

sudo apt-get update
sudo apt-get -y upgrade

Core software

Before continuing, we need some essential software. This will allow us to download other software.

sudo apt-get --no-install-recommends -y install git unzip curl \
    build-essential software-properties-common

On Ubuntu 12.04, use the python-software-properties package instead of software-properties-common.

PostgreSQL + PostGIS

PostgreSQL is a relational database, and PostGIS “spatially enables” it, which allows you to store map data in it. PostgreSQL + PostGIS are used for a wide variety of purposes, such as rendering maps, geocoding, and analysis, serving a similar function to ESRI’s SDE or Oracle’s Spatial extension. This guide requires at least PostgreSQL 9.1 and PostGIS 2.0. It is possible to use PostgreSQL 8.4 and PostGIS 1.5, but this is strongly discouraged: they are substantially slower and harder to set up, and would require significant changes to the guide. On Ubuntu 12.04 you will want to use PostgreSQL from the PGDG APT repository.

sudo apt-get --no-install-recommends install -y postgresql-9.3-postgis-2.1 \
    postgresql-contrib-9.3 proj-bin libgeos-dev

Assorted software

We’re going to need some assorted software for the upcoming steps, and for monitoring performance. None of this is essential for loading and updating OpenStreetMap data, but it is very hard to monitor your server and debug any problems without diagnostic software. We will use Munin to monitor and record system information. This allows comparison of the current state of the server with its past, allowing you to observe changes. You can see an example of the information available for orm.openstreetmap.org, one of the rendering servers serving the Standard layer for OpenStreetMap.org.

sudo apt-get --no-install-recommends install -y apache2 \
    munin munin-node munin-plugins-extra libdbd-pg-perl \
    sysstat iotop ptop

By default the Munin graphs are only accessible from the machine they run on, so if you are using a remote server you will have to make them accessible to you. They are also useful if you ask someone else for help with your server, as they show how it is running.

sudo sed -i "s|Allow from.*|Allow from all|" /etc/munin/apache.conf
sudo service apache2 reload

OpenStreetMap-specific software

Lastly, we need the OpenStreetMap-specific software. Osm2pgsql loads the OSM data into the PostgreSQL database, while osmctools contains osmconvert, osmupdate, and osmfilter, a trio of useful programs for working with downloaded OSM data. If you are not using Ubuntu, you can compile osm2pgsql from source; instructions for osmconvert, osmupdate, and osmfilter can be found on the Wiki.

sudo add-apt-repository -y ppa:kakrueger/openstreetmap
sudo apt-get update
sudo apt-get --no-install-recommends install -y osm2pgsql osmctools

Checking versions

Before continuing, check that you have the expected versions of all the software by running the commands prefixed with $. Check that each response (the lines without $) indicates you have at least the versions shown here. If you do not, something has gone wrong, and it needs to be fixed before continuing.

$ proj
Rel. 4.8.0, 6 March 2012
usage: proj [ -beEfiIlormsStTvVwW [args] ] [ +opts[=arg] ] [ files ]
$ psql --version
psql (PostgreSQL) 9.3.4
$ grep 'default_version' '/usr/share/postgresql/9.3/extension/postgis.control'
default_version = '2.1.2'
$ geos-config --version
3.4.2
$ osm2pgsql --version
osm2pgsql SVN version 0.85.0 (64bit id space)
$ osmconvert -h | head -n2
osmconvert 0.7T  Parameter Overview
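
If you prefer to check the minimum versions with a script rather than by eye, the following is a minimal bash sketch. The helper names version_ge and check_min_version are hypothetical, and the sketch assumes GNU sort with its -V (version sort) option, which Ubuntu provides. It only defines functions; the commented lines show how it might be called on the machine being set up.

```shell
# version_ge A B: succeed if version A is at least version B.
# sort -V orders version strings field by field, so the smaller of
# the two comes first; if that is B, then A >= B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_min_version NAME FOUND MINIMUM: report one component and
# return non-zero if it is older than the minimum.
check_min_version() {
    if version_ge "$2" "$3"; then
        echo "OK: $1 $2 (need >= $3)"
    else
        echo "TOO OLD: $1 $2 (need >= $3)"
        return 1
    fi
}

# Example calls (minimums from this guide; run on the target machine):
#   check_min_version PostgreSQL "$(psql --version | awk '{print $3}')" 9.1
#   check_min_version PostGIS "$(grep default_version \
#       /usr/share/postgresql/9.3/extension/postgis.control | cut -d"'" -f2)" 2.0
#   check_min_version GEOS "$(geos-config --version)" 3.4.2
```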

Getting ready to load

Like many scientific and large-data applications, osm2pgsql relies on memory overcommit, so a kernel setting needs to be adjusted for the data import to work successfully with multiple processes. Changing overcommit is also recommended for renderd and PostgreSQL.

sudo tee /etc/sysctl.d/60-overcommit.conf <<EOF
# Overcommit settings to allow faster osm2pgsql imports
vm.overcommit_memory=1
EOF
sudo sysctl -p /etc/sysctl.d/60-overcommit.conf

To change overcommit only temporarily, until the next reboot, run sudo sysctl -w vm.overcommit_memory=1 instead.
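
To confirm the setting is active before starting an import, you can read the live value with sysctl vm.overcommit_memory, or use a small bash check like the sketch below. The function name overcommit_enabled is hypothetical; it reads /proc/sys/vm/overcommit_memory by default and accepts another file only so that it can be tested.

```shell
# overcommit_enabled [FILE]: succeed if vm.overcommit_memory is 1.
# FILE defaults to the live kernel value; another path is only for testing.
overcommit_enabled() {
    local file="${1:-/proc/sys/vm/overcommit_memory}"
    local mode
    mode=$(cat "$file") || return 2
    if [ "$mode" = "1" ]; then
        echo "vm.overcommit_memory=1: overcommit is enabled"
        return 0
    fi
    echo "vm.overcommit_memory=$mode: apply the sysctl setting before importing" >&2
    return 1
}

# Example, at the top of an import script:
#   overcommit_enabled || exit 1
```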

Creating a database

We need to create a database to store the OpenStreetMap data in. For traditional reasons this database is named “gis”. We also need to enable the PostGIS and hstore extensions on the database.

sudo -u postgres createuser -s $USER
createdb gis
psql -d gis -c 'CREATE EXTENSION hstore; CREATE EXTENSION postgis;'

Make sure to create the user from the postgres user, not from root.
We also want Munin to monitor the new database for size and other useful statistics:

sudo munin-node-configure --sh | sudo sh
sudo service munin-node restart

Tuning

The default PostgreSQL settings aren’t great for very large databases like OSM databases, and proper tuning can roughly double performance. The most important PostgreSQL settings to change are maintenance_work_mem and work_mem, both of which should be increased: maintenance_work_mem for faster data loading and work_mem for faster queries while rendering. Conservative settings for a 2GB VM are work_mem=16MB and maintenance_work_mem=128MB. On a machine with enough memory you could set them as high as work_mem=128MB and maintenance_work_mem=1GB. An overview of tuning Postgres can be found on the PostgreSQL Wiki, but adjusting maintenance_work_mem and work_mem is probably enough on a development or testing machine.
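
As a sketch of applying these values, the bash function below prints the two settings for a given amount of RAM. The function name pg_memory_settings and the 8GB cut-off between the conservative and the higher values are assumptions for illustration; this guide only gives the 2GB figures and the upper values. The configuration path in the comments is the Ubuntu default for PostgreSQL 9.3.

```shell
# pg_memory_settings RAM_MB: print work_mem and maintenance_work_mem
# lines for postgresql.conf. The 8GB (8192 MB) boundary between the
# conservative and the higher values is an assumption for illustration.
pg_memory_settings() {
    local ram_mb="$1"
    if [ "$ram_mb" -lt 8192 ]; then
        # Conservative values this guide gives for a 2GB VM
        printf 'work_mem = 16MB\nmaintenance_work_mem = 128MB\n'
    else
        # Upper values this guide gives for a machine with enough memory
        printf 'work_mem = 128MB\nmaintenance_work_mem = 1GB\n'
    fi
}

# Example on Ubuntu 14.04 with PostgreSQL 9.3. When a setting appears more
# than once in postgresql.conf the last entry wins, so appending overrides
# the defaults; both settings take effect on a reload.
#   ram_mb=$(free -m | awk '/^Mem:/ {print $2}')
#   pg_memory_settings "$ram_mb" | sudo tee -a /etc/postgresql/9.3/main/postgresql.conf
#   sudo service postgresql reload
```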

Entire books have been written about Postgres tuning, but the two main osm2pgsql-specific references are presentations by Frederik Ramm of Geofabrik and by Paul Norman.

Stylesheet

The stylesheet you’re using should come with a .style file which indicates what tags osm2pgsql needs to import into the database. Here we’re using openstreetmap-carto, the default style on openstreetmap.org. Because we’re going to download a few different pieces of OSM software or data, we’re going to make an osm directory in our home directory to stay organized.

mkdir -p ~/osm
cd ~/osm
git clone https://github.com/gravitystorm/openstreetmap-carto.git

Loading the data: two ways

The complete set of OSM data is very large. If you are testing or only want to serve tiles for a small part of the world, you should use an extract, which contains only part of the OSM data. Extracts are less demanding on hardware, and loading them into the database is much faster. If you want the entire planet or a large area like Europe, you need to load the data differently. The most popular source of extracts is Geofabrik, which makes extracts available by country or state. Other sources include Metro Extracts from Mapzen and extracts provided by local chapters. A more complete list of alternative sources can be found on the Planet.osm wiki page. It is also possible to create your own extracts with tools like osmconvert or Osmosis.

Loading an extract

This assumes that you are

  • Importing a PBF file of 300 MB or less, for example a small country like Switzerland, a US state like Texas or Florida (but not California), or a larger but less densely mapped country like Brazil
  • On a virtual machine with at least 2GB RAM and 8GB free disk space
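
The bash sketch below checks the file size and free disk space from the command line. The function names is_small_extract and free_disk_gb are hypothetical; the sketch assumes GNU stat and df (standard on Ubuntu) and treats 300 MB as 300 × 1024 × 1024 bytes.

```shell
# is_small_extract FILE: succeed if FILE is at most 300 MB,
# the extract size this section assumes.
is_small_extract() {
    local limit=$((300 * 1024 * 1024))
    local size
    size=$(stat -c %s "$1") || return 2
    [ "$size" -le "$limit" ]
}

# free_disk_gb DIR: whole GB available on the filesystem holding DIR.
# -P keeps each filesystem on one line; column 4 is available 1K blocks.
free_disk_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Example, after downloading an extract into ~/osm:
#   is_small_extract ~/osm/liechtenstein-latest.osm.pbf && echo "extract size OK"
#   [ "$(free_disk_gb ~/osm)" -ge 8 ] && echo "at least 8GB free"
#   free -m    # the "total" column of the Mem: row should be 2GB or more
```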

Getting the data

There are a number of providers of extracts; Geofabrik provides extracts of many countries and states. We first want to download the data and run md5sum to check that the download was not corrupted. For this example we’re going to use the data for Liechtenstein, a small European country. Because it is so small, the Liechtenstein extract is often used for testing.

cd ~/osm
wget http://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf.md5
wget http://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf
md5sum -c liechtenstein-latest.osm.pbf.md5
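
If you script the download, a small wrapper like the bash sketch below makes the check reusable. The function name verify_pbf is hypothetical. Like the commands above, it assumes the .md5 file sits next to the data and refers to it by its bare file name, which is why md5sum -c is run from the file’s own directory.

```shell
# verify_pbf FILE: check FILE against FILE.md5 downloaded alongside it.
# md5sum -c resolves the name listed in the .md5 file relative to the
# current directory, so the check runs in a subshell inside FILE's directory.
verify_pbf() {
    local dir base
    dir=$(dirname "$1")
    base=$(basename "$1")
    if (cd "$dir" && md5sum -c --quiet "$base.md5"); then
        echo "$base: checksum OK"
    else
        echo "$base: checksum mismatch; delete it and download again" >&2
        return 1
    fi
}

# Example:
#   verify_pbf ~/osm/liechtenstein-latest.osm.pbf
```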

Loading with osm2pgsql

An osm2pgsql command line can be very complicated. There are lots of options, and how they interact isn’t obvious. To help explain it, we’re going to break it down into parts. The command line we’re going to use is:

osm2pgsql --create --slim \
    --cache 1000 --number-processes 2 --hstore \
    --style ~/osm/openstreetmap-carto/openstreetmap-carto.style --multi-geometry \
    ~/osm/liechtenstein-latest.osm.pbf

There are a few parts to this:

  • --create tells osm2pgsql to create new tables rather than appending to existing tables. Creating new tables is the default.
  • --slim tells osm2pgsql to create “slim” tables to store data while importing rather than trying to store everything in memory. --slim is also necessary if we want to update the data later.
  • --cache 1000 allocates 1000 MB of memory as a cache for node positions. Having the node positions cached means fewer reads from the database, so constructing way geometries is much faster.
  • --number-processes 2 causes 2 CPU cores to be used. Adjust this to the number of threads the CPU supports, but there are minimal gains past 8 CPU threads.
  • --style ~/osm/openstreetmap-carto/openstreetmap-carto.style gives the path to the .style file, which tells osm2pgsql what columns to create.
  • --hstore causes tags not in the .style file to be stored in a special “hstore” column. Hstore is a key-value store that supports arbitrary keys and values. Having the other tags in hstore allows changes later on, like rendering names in a specific language, and overall makes the database more flexible, giving you greater freedom to render interesting data on the maps you create. There is a very minor speed penalty and a 10% database size penalty for using hstore.
  • --multi-geometry tells osm2pgsql not to break MULTIPOLYGONs into separate polygons. This increases flexibility and eliminates some rendering artifacts, but is slightly slower.
  • ~/osm/liechtenstein-latest.osm.pbf is the path to the OSM data to load.
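
Since there is little gain beyond 8 CPU threads, the --number-processes value can be derived from the machine. The bash sketch below is a hypothetical helper, osm2pgsql_processes, that caps the thread count reported by nproc (from GNU coreutils) at 8; the comment shows how it might slot into the command above.

```shell
# osm2pgsql_processes [THREADS]: value for --number-processes.
# THREADS defaults to nproc; the result is capped at 8 because this guide
# notes minimal gains past 8 CPU threads.
osm2pgsql_processes() {
    local threads="${1:-$(nproc)}"
    if [ "$threads" -gt 8 ]; then
        threads=8
    fi
    echo "$threads"
}

# Example: the extract import above with the process count filled in.
#   osm2pgsql --create --slim \
#       --cache 1000 --number-processes "$(osm2pgsql_processes)" --hstore \
#       --style ~/osm/openstreetmap-carto/openstreetmap-carto.style --multi-geometry \
#       ~/osm/liechtenstein-latest.osm.pbf
```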

How long osm2pgsql takes to load the data will depend on the extract size and hardware, primarily random access disk speed. It can take anywhere from seconds for Liechtenstein to about an hour for a larger extract on slow hardware.

Loading the full planet

This assumes that you are

  • Importing a PBF file 6GB or over, for example the full planet or a large region with lots of data like North America or Europe.
  • On a machine with at least 16GB RAM and 400GB free disk space, ideally with SSDs and 24GB of RAM, and not a lower-end virtual machine.
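
Before committing to a planet import, you can check the RAM requirement with a script. In the bash sketch below, mem_total_mb and planet_ram_ok are hypothetical helpers that read the MemTotal line of /proc/meminfo. Because the kernel reserves some memory, MemTotal on a 16GB machine is slightly under 16 × 1024 MB, so the sketch accepts anything from 15 × 1024 MB upward; that margin is an assumption, not part of this guide.

```shell
# mem_total_mb [FILE]: total RAM in MB, from the MemTotal line of
# /proc/meminfo (reported there in kB). FILE is only for testing.
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# planet_ram_ok [FILE]: succeed if there is roughly 16GB of RAM or more.
# Assumed margin: accept 15 x 1024 MB to allow for kernel-reserved memory.
planet_ram_ok() {
    local mb
    mb=$(mem_total_mb "${1:-}")
    [ -n "$mb" ] && [ "$mb" -ge $((15 * 1024)) ]
}

# Example:
#   planet_ram_ok && echo "enough RAM for a planet import"
```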


Before loading the full planet, you should first check that your setup works correctly by importing an extract, as described above. The osm2pgsql command line given below uses --create, which removes the existing data, so run it only when you are ready to import the planet.

Getting the data

Planet.openstreetmap.org provides weekly planet dumps of the entire database. We first want to download the data and run md5sum to check that the download was not corrupted.

mkdir -p ~/osm
cd ~/osm
wget http://planet.openstreetmap.org/pbf/planet-latest.osm.pbf.md5
wget http://planet.openstreetmap.org/pbf/planet-latest.osm.pbf
md5sum -c planet-latest.osm.pbf.md5

Because the planet dump could be up to a week old, we’re going to update it before importing. This is only necessary with the planet dump, not large extracts, and generally takes 30-60 minutes.

osmupdate planet-latest.osm.pbf new_planet-latest.osm.pbf

Loading with osm2pgsql

An osm2pgsql command line can be very complicated. There are lots of options, and how they interact isn’t obvious. To help explain it, we’re going to break it down into parts. The command line we’re going to use is:

osm2pgsql --create --slim \
    --flat-nodes ~/osm/flat_nodes.bin \
    --cache 14000 --number-processes 4 --hstore \
    --style ~/osm/openstreetmap-carto/openstreetmap-carto.style --multi-geometry \
    ~/osm/new_planet-latest.osm.pbf

There are a few parts to this:

  • --create tells osm2pgsql to create new tables rather than appending to existing tables. Creating new tables is the default.
  • --slim tells osm2pgsql to create “slim” tables to store data while importing rather than trying to store everything in memory. --slim is also necessary if we want to update the data later.
  • --flat-nodes ~/osm/flat_nodes.bin tells osm2pgsql to use the flat nodes mode and where to store the data. In this mode, instead of storing node locations in the database, they are stored in a flat binary file. This takes much less space and is faster, particularly on HDDs.
  • --cache 14000 allocates 14000 MB of memory as a cache for node positions. Having the node positions cached means fewer reads from the database, so constructing way geometries is much faster. If the machine you are using has enough memory, allocate up to 20000MB; there is no point in allocating more, as it will go unused.
  • --number-processes 4 causes 4 CPU cores to be used. Adjust this to the number of threads the CPU supports, but there are minimal gains past 8 CPU threads.
  • --style ~/osm/openstreetmap-carto/openstreetmap-carto.style gives the path to the .style file, which tells osm2pgsql what columns to create.
  • --hstore causes tags not in the .style file to be stored in a special “hstore” column. Hstore is a key-value store that supports arbitrary keys and values. Having the other tags in hstore allows changes later on, like rendering names in a specific language, and overall makes the database more flexible, giving you greater freedom to render interesting data on the maps you create. There is a very minor speed penalty and a 10% database size penalty for using hstore.
  • --multi-geometry tells osm2pgsql not to break MULTIPOLYGONs into separate polygons. This increases flexibility and eliminates some rendering artifacts, but is slightly slower.
  • ~/osm/new_planet-latest.osm.pbf is the path to the OSM data to load.
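
The --cache guidance above can be turned into a calculation. In the bash sketch below, the function name planet_cache_mb is hypothetical, and so is the rule of leaving about 2GB (2048 MB) of RAM for the operating system and PostgreSQL; only the 20000 MB cap comes from this guide. On a 16GB machine it gives a value close to the 14000 used above.

```shell
# planet_cache_mb RAM_MB: suggest a --cache value for a planet import.
# Assumptions: keep about 2048 MB for the OS and PostgreSQL, and refuse
# machines below 15 x 1024 MB (MemTotal on a 16GB machine is a bit under 16GB).
planet_cache_mb() {
    local ram_mb="$1"
    if [ "$ram_mb" -lt $((15 * 1024)) ]; then
        echo "a planet import needs at least 16GB of RAM" >&2
        return 1
    fi
    local cache=$((ram_mb - 2048))
    # This guide: there is no point allocating more than 20000 MB.
    if [ "$cache" -gt 20000 ]; then
        cache=20000
    fi
    echo "$cache"
}

# Example:
#   cache=$(planet_cache_mb "$(free -m | awk '/^Mem:/ {print $2}')") && echo "--cache $cache"
```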

When loading the full planet, building the indexes takes some time, so near the end it may appear that nothing is happening. You can examine what is happening in the database with pg_top -d gis.

Using your database

When osm2pgsql has finished, it will print how long it took. Once it’s done, there are a few things you can do with the database:

  • Design beautiful maps with TileMill
  • Render tiles with a tile server
  • Keep the data up to date with published updates
  • Connect using GIS tools like QGIS, ESRI ArcGIS or others