modRana data repository document
This document describes a simple program that generates a data repository for offline Monav routing data. It should be flexible enough to be extended to handle other kinds of data in the future.
Workflow
- download source data
- do data processing
- package results
- move packages to repository
- update packages in repository & update manifest
Sequential
The initial repository implementation will run all the steps in order.
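The sequential variant could be sketched as below. This is only an illustration, assuming every workflow step is a plain function that takes the result of the previous step. All function names and the placeholder step bodies are hypothetical and not part of the actual implementation.

```python
def run_sequential(source, steps):
    """Run every step in order, feeding each step the previous result."""
    result = source
    for step in steps:
        result = step(result)
    return result


# Hypothetical placeholder steps that only tag a string; real steps would
# download, preprocess, archive and publish actual data.
def download(name):
    return name + ".osm.pbf"


def process(source_file):
    return source_file + ":processed"


def package(processed):
    return processed + ":packaged"


def move_to_repository(pack):
    return pack + ":moved"


def update_manifest(pack):
    return pack + ":listed"


STEPS = [download, process, package, move_to_repository, update_manifest]
result = run_sequential("czech_republic", STEPS)
```

Keeping the steps in a list makes it easy to add or reorder steps later without touching the runner.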
Parallel
As most modern systems have at least two independent CPU cores, the repository should parallelize as many operations as possible.
Parallel workflow:
- download source data - single downloading thread (due to Geofabrik download slot limitations)
- do data processing - multiple threads possible
- package results - multiple packagers
- move & publish - single publishing thread (as this is most probably IO bound)
  - move packages to repository
  - update packages in repository & update manifest
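The parallel workflow above could be sketched with worker threads connected by queues. This is a minimal sketch, not the actual implementation: the names stage and run_parallel, the STOP sentinel and the default worker count are all assumptions made for illustration. It uses one downloader, several processors and packagers, and one publisher, as listed above.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of a queue (hypothetical helper)


def stage(worker, inbox, outbox, count):
    """Start `count` threads that apply `worker` to items from `inbox`."""
    def loop():
        while True:
            item = inbox.get()
            if item is STOP:
                inbox.put(STOP)  # let sibling threads see the sentinel too
                break
            outbox.put(worker(item))

    threads = [threading.Thread(target=loop) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


def run_parallel(sources, download, process, package, publish, workers=2):
    q_src, q_raw, q_done, q_pack, q_out = (queue.Queue() for _ in range(5))
    stages = [
        stage(download, q_src, q_raw, 1),          # single downloader
        stage(process, q_raw, q_done, workers),    # several processors
        stage(package, q_done, q_pack, workers),   # several packagers
        stage(publish, q_pack, q_out, 1),          # single publisher
    ]
    for source in sources:
        q_src.put(source)
    # Shut the stages down one after another, front to back.
    for threads, inbox in zip(stages, [q_src, q_raw, q_done, q_pack]):
        inbox.put(STOP)
        for thread in threads:
            thread.join()
    results = []
    while not q_out.empty():
        results.append(q_out.get())
    return results
```

All stages run concurrently, so processing of the first download can start while the next one is still being fetched. Results may arrive in a different order than the sources were submitted.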
Requirements
- easy repository setup and regeneration
CLI options
Repositories
Repository definition file
This is a JSON file called repository.json that sits at the root of the repository.
Structure
Description of the different sections
= header =
* format_version - 1 for now
= repository =
* name - natural language name of the repository
* last_update - epoch of the last update
= data =
Contains sections for the different data sub-repositories.
== monav ==
* name = "Monav offline routing data repository"
=== Example_Package ===
* pack_type = "monav"
* url - pack URL
* last_updated - epoch of the last update
* bytes_size (optional)
==== zsync_file_list (optional) ====
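The structure described above might be generated like this. This is a sketch under assumptions: the nesting of the JSON objects is inferred from the section levels in the description, the function name build_manifest is hypothetical, and the repository name, URL, timestamp and size values are illustrative only.

```python
import json
import time


def build_manifest(repo_name, packages, now=None):
    """Return the repository.json content as a dictionary.

    The nesting is an assumption inferred from the section levels above.
    """
    now = int(time.time()) if now is None else int(now)
    monav = {"name": "Monav offline routing data repository"}
    monav.update(packages)
    return {
        "header": {"format_version": 1},
        "repository": {"name": repo_name, "last_update": now},
        "data": {"monav": monav},
    }


# Illustrative values only; none of them describe a real package.
example = build_manifest(
    "Example repository",
    {
        "Example_Package": {
            "pack_type": "monav",
            "url": "http://data.modrana.org/monav/europe/czech_republic.tar.gz",
            "last_updated": 1349308800,
            "bytes_size": 123456,  # optional field
        }
    },
    now=1349308800,
)
manifest_text = json.dumps(example, indent=2, sort_keys=True)
```

Writing manifest_text to repository.json at the repository root would then complete the manifest update step.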
Monav data repository
Located in the monav directory in the main repository folder.
The individual packages are stored in a simple folder structure:
continent/[country_name/][city_name/]package_name.tar.gz
NOTE: Square brackets indicate optional path components.
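As an illustration of the folder structure, a helper along these lines could build the relative package path. The function name and its parameters are hypothetical; only the path pattern comes from the description above.

```python
import posixpath


def package_path(continent, package_name, country=None, city=None):
    """Build continent/[country_name/][city_name/]package_name.tar.gz."""
    parts = [continent]
    if country:
        parts.append(country)  # optional path component
    if city:
        parts.append(city)     # optional path component
    parts.append(package_name + ".tar.gz")
    return posixpath.join(*parts)


country_pack = package_path("europe", "czech_republic")
```

posixpath is used instead of os.path so that the result always uses forward slashes, which also makes it usable as the path part of a package URL.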
Internal package structure
The package archives contain a named folder, and inside this folder is the folder with the Monav routing data.
Example - car routing data for Czech Republic:
Czech_Republic/routing_car/
Each package contains data for a single transportation mode, so users can download only the routing data they actually need.
package.JSON
Inside the routing_* subdirectory is a JSON file called package.JSON that makes it possible to map the package folders back to the packages in the repository.
This is mainly needed to facilitate package updates.
= header =
* format_version - 1 for now
= package =
* origin - base repository URL (ex.: http://data.modrana.org)
* path - path to the package in the repository (ex.: monav/europe/czech_republic.tar.gz)
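A sketch of how this file could be written and later used to find the package in the repository again. The function names are hypothetical, the JSON nesting is inferred from the description above, and joining origin and path into a URL is an assumption about how the two fields are meant to be combined.

```python
import json
import os
import tempfile

PACKAGE_FILE = "package.JSON"  # file name as written in this document


def write_package_info(routing_dir, origin, path):
    """Store the package origin and repository path next to the routing data."""
    info = {
        "header": {"format_version": 1},
        "package": {"origin": origin, "path": path},
    }
    with open(os.path.join(routing_dir, PACKAGE_FILE), "w") as f:
        json.dump(info, f, indent=2)
    return info


def package_url(routing_dir):
    """Map a local package folder back to its (assumed) repository URL."""
    with open(os.path.join(routing_dir, PACKAGE_FILE)) as f:
        pack = json.load(f)["package"]
    return pack["origin"].rstrip("/") + "/" + pack["path"].lstrip("/")


# Usage with a temporary folder standing in for a routing_* directory.
with tempfile.TemporaryDirectory() as routing_dir:
    write_package_info(routing_dir, "http://data.modrana.org",
                       "monav/europe/czech_republic.tar.gz")
    url = package_url(routing_dir)
```

An updater could compare the last_updated value of the matching repository entry with the local data to decide whether the package needs to be downloaded again.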
Monav data processing
- the Monav preprocessor can run in multiple threads
- the repository generator should detect the number of cores and pass the corresponding number to the preprocessor command line using the -t option
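Core detection and passing the thread count could look roughly like this. Only the -t option comes from the text above. The executable name used in the example and the exact way -t takes its value (here as a separate argument) are assumptions and should be checked against the preprocessor's own help output.

```python
import multiprocessing


def preprocessor_command(executable, extra_args=(), threads=None):
    """Build the preprocessor argument list with a thread count.

    Passing the count as a separate argument after -t is an assumption.
    """
    if threads is None:
        try:
            threads = multiprocessing.cpu_count()
        except NotImplementedError:
            threads = 1  # fall back to a single thread
    return [executable, "-t", str(threads)] + list(extra_args)


# "monav-preprocessor" is an assumed executable name.
command = preprocessor_command("monav-preprocessor", threads=4)
```

The resulting list can be handed to subprocess.call() as-is, which avoids shell quoting problems.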
Benchmarks
Preliminary
2012.10.04 france.osm.pbf, bike speed profile, PQ only @ Asteria:
1 thread = 295.96 s
2 threads = 172.69 s
48 threads = 25.83 s
96 threads = 28.132 s
2012.10.04 czech_republic.osm.pbf
Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz:
1 thread: real 0m23.392s user 0m22.525s sys 0m0.756s
2 threads: real 0m18.536s user 0m26.162s sys 0m0.700s
4 threads: real 0m20.948s user 0m42.271s sys 0m0.880s
8 threads: real 0m15.879s user 0m27.570s sys 0m0.924s
Asteria:
1 thread: real 0m27.449s user 0m26.714s sys 0m0.504s
2 threads: real 0m20.221s user 0m30.210s sys 0m0.520s
4 threads: real 0m15.889s user 0m33.766s sys 0m0.544s
8 threads: real 0m12.606s user 0m37.990s sys 0m0.536s
48 threads: real 0m11.244s user 1m53.039s sys 0m4.012s
96 threads: real 0m13.937s user 1m15.157s sys 0m21.989s
192 threads: real 0m15.164s user 1m22.069s sys 0m55.079s
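To make the scaling easier to read, the speedup relative to the single thread run can be computed from the france.osm.pbf timings above. The helper name is hypothetical; the numbers are taken directly from the benchmark.

```python
def speedup(baseline_s, measured_s):
    """How many times faster a run is compared to the baseline run."""
    return baseline_s / measured_s


# Thread count -> wall time in seconds (france.osm.pbf, bike profile).
FRANCE_BIKE = {1: 295.96, 2: 172.69, 48: 25.83, 96: 28.132}

speedups = {n: speedup(FRANCE_BIKE[1], t) for n, t in FRANCE_BIKE.items()}
fastest = min(FRANCE_BIKE, key=FRANCE_BIKE.get)  # thread count with lowest time
```

With these numbers two threads give a speedup of roughly 1.7x and 48 threads roughly 11.5x, while 96 threads are slightly slower than 48.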
First full run on Asteria 04
## Monav repository updated in 5 hours (19918 s)
package count: 265
output data size: 91 GB
NOTE: processed the whole World OSM data about 6 times
* +1 - whole world country extracts
* +1 - continent extracts
* x3 - car, bike & pedestrian profiles
(+? country region extracts for Germany, France, etc.)
Run on Asteria 04 with 3 preprocessors running in parallel per package
## Monav repository updated in 4 hours (16597 s)
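The two full run times can be compared with a few lines of arithmetic. The helper name is hypothetical; the second counts are the ones reported by the two runs above.

```python
def format_duration(seconds):
    """Format a number of seconds as hours, minutes and seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return "%d h %d min %d s" % (hours, minutes, secs)


FIRST_RUN_S = 19918     # first full run
PARALLEL_RUN_S = 16597  # 3 preprocessors in parallel per package

saved_s = FIRST_RUN_S - PARALLEL_RUN_S
saved_fraction = saved_s / float(FIRST_RUN_S)
```

Running three preprocessors per package in parallel saved 3321 s, which is about 17 % of the first run's time.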