Version 22 (modified by Martin Kolman, 11 years ago)

modRana data repository document

This document describes a simple program that generates a data repository for offline Monav routing data. It should be flexible enough to be extended to handle other kinds of data in the future.

Workflow

  1. download source data
  2. do data processing
  3. package results
  4. move packages to repository
  5. update packages in repository & update manifest

Sequential

The initial repository implementation will run all the steps in order.

Parallel

As most modern systems have at least two independent CPU cores, the repository generator should parallelize as many operations as possible.

Parallel workflow:

  1. download source data - single downloading thread (due to Geofabrik download slot limitations)
  2. do data processing - multiple threads possible
  3. package results - multiple packagers
  4. move & publish - single publishing thread (as this is most probably IO bound)
    1. move packages to repository
    2. update packages in repository & update manifest
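
The parallel workflow above can be sketched as a chain of queues, with one downloader thread, several processing and packaging threads, and one publisher thread. This is only a minimal sketch under stated assumptions: run_stage, run_pipeline and the four step callables are hypothetical names used for illustration, not part of any existing modRana code.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stage's input


def run_stage(func, inbox, outbox, n_workers):
    """Start n_workers threads that apply func to every item from inbox.

    Results are put on outbox. Once all workers have finished,
    a single DONE sentinel is forwarded to the next stage.
    """
    def worker():
        while True:
            item = inbox.get()
            if item is DONE:
                inbox.put(DONE)  # let sibling workers see the sentinel too
                break
            outbox.put(func(item))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()

    def closer():
        for thread in threads:
            thread.join()
        outbox.put(DONE)

    closer_thread = threading.Thread(target=closer)
    closer_thread.start()
    return closer_thread


def run_pipeline(sources, download, process, package, publish,
                 n_processors=2, n_packagers=2):
    """Run the four steps as a pipeline (all callables are hypothetical)."""
    q_src, q_dl, q_proc, q_pack, q_pub = (queue.Queue() for _ in range(5))
    closers = [
        run_stage(download, q_src, q_dl, 1),            # single download thread
        run_stage(process, q_dl, q_proc, n_processors),
        run_stage(package, q_proc, q_pack, n_packagers),
        run_stage(publish, q_pack, q_pub, 1),           # single publishing thread
    ]
    for source in sources:
        q_src.put(source)
    q_src.put(DONE)
    for closer_thread in closers:
        closer_thread.join()

    published = []
    while True:
        item = q_pub.get()
        if item is DONE:
            break
        published.append(item)
    return published
```

Because the processing and packaging stages run several workers, packages may be published in a different order than their sources were downloaded.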

Requirements

  • easy repository setup and regeneration

CLI options

Repositories

Repository definition file

This is a JSON file called repository.json that sits at the root of the repository.

Structure

Description of the different sections

= header =
format_version - 1 for now

= repository =
* name - natural language name of the repository
* last_update - Unix epoch timestamp of the last update


= data =
Contains sections for the different data sub-repositories.


== monav ==
* name = "Monav offline routing data repository"

=== Example_Package ===
* pack_type = "monav"
* url - pack URL
* last_updated - Unix epoch timestamp of the last update
* bytes_size (optional)

==== zsync_file_list - (optional) ====
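
To make the section layout above more concrete, the following sketch builds a repository.json document as a Python dictionary. The exact nesting (packages stored as keys inside the monav section) is an assumption derived from the heading levels above, and build_manifest and its parameters are hypothetical names. The optional zsync_file_list section is left out because its content is not described here.

```python
import json
import time


def build_manifest(repo_name, packages, now=None):
    """Return a dict following the repository.json layout described above.

    packages maps a package name to a dict with "url", "last_updated"
    and optionally "bytes_size" (assumed input format, for illustration only).
    """
    now = int(time.time()) if now is None else int(now)
    monav = {"name": "Monav offline routing data repository"}
    for pack_name, pack in packages.items():
        entry = {
            "pack_type": "monav",
            "url": pack["url"],
            "last_updated": pack["last_updated"],
        }
        if "bytes_size" in pack:  # optional field
            entry["bytes_size"] = pack["bytes_size"]
        monav[pack_name] = entry
    return {
        "header": {"format_version": 1},
        "repository": {"name": repo_name, "last_update": now},
        "data": {"monav": monav},
    }


if __name__ == "__main__":
    manifest = build_manifest(
        "Example repository",  # placeholder name
        {
            "Example_Package": {
                "url": "http://data.modrana.org/monav/europe/czech_republic.tar.gz",
                "last_updated": 1349308800,  # placeholder timestamp
            }
        },
    )
    print(json.dumps(manifest, indent=2))
```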

Monav data repository

Located in the monav directory in the main repository folder.

The individual packages are stored in a simple folder structure:

continent/[country_name/][city_name/]package_name.tar.gz

NOTE: Square brackets indicate optional path components.
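
A small helper can assemble such a path from its components. This is only a sketch; package_path and its arguments are hypothetical names, and the optional components are simply skipped when they are not given.

```python
import posixpath


def package_path(continent, package_name, country_name=None, city_name=None):
    """Build continent/[country_name/][city_name/]package_name.tar.gz."""
    parts = [continent]
    if country_name:
        parts.append(country_name)
    if city_name:
        parts.append(city_name)
    parts.append(package_name + ".tar.gz")
    # posixpath keeps forward slashes on every platform, which suits URLs
    return posixpath.join(*parts)
```

For example, package_path("europe", "czech_republic") gives europe/czech_republic.tar.gz, relative to the monav directory.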

Internal package structure

Each package archive contains a named folder, and inside this folder is the folder with the Monav routing data.

Example - car routing data for Czech Republic:

Czech_Republic/routing_car/

Each package contains data for a single transportation mode. This way, users can download only the routing data they actually need.

package.JSON

Inside the routing_* subdirectory is a JSON file called package.JSON that makes it possible to map the package folders back to the packages existing in the repository.

This is mainly needed to facilitate package updates.

= header =
format_version - 1 for now

= package =
origin - base repository URL (ex.: http://data.modrana.org)
path - path to the package in the repository (ex.: monav/europe/czech_republic.tar.gz)
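
As an illustration of how package.JSON could be used for updates, the sketch below writes the file into a routing_* folder and later reconstructs the package URL from it. The function names are hypothetical, and joining origin and path with a single slash is an assumption about how the two fields are meant to combine.

```python
import json
import os

PACKAGE_FILE = "package.JSON"


def write_package_json(routing_dir, origin, path):
    """Store the repository origin and package path inside routing_dir."""
    content = {
        "header": {"format_version": 1},
        "package": {"origin": origin, "path": path},
    }
    with open(os.path.join(routing_dir, PACKAGE_FILE), "w", encoding="utf-8") as f:
        json.dump(content, f, indent=2)


def package_url(routing_dir):
    """Map a local routing folder back to its package URL in the repository."""
    with open(os.path.join(routing_dir, PACKAGE_FILE), encoding="utf-8") as f:
        package = json.load(f)["package"]
    # assumed rule: origin and path are joined with exactly one slash
    return package["origin"].rstrip("/") + "/" + package["path"].lstrip("/")
```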

Monav data processing

  • the Monav preprocessor can run in multiple threads
    • the repository generator should detect the number of CPU cores and pass that number to the preprocessor on the command line using the -t option
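
Core detection could look like the sketch below. Only the -t option comes from this document; the executable name is passed in by the caller, the remaining preprocessor arguments are left as an opaque extra_args list, and passing the thread count as a separate argument after -t is an assumption.

```python
import os


def preprocessor_command(executable, extra_args=(), threads=None):
    """Build the preprocessor argument list with a -t thread count.

    When threads is not given, the number of detected CPU cores is used,
    falling back to 1 if detection fails.
    """
    if threads is None:
        threads = os.cpu_count() or 1
    return [executable, "-t", str(threads), *extra_args]
```

The resulting list can be handed directly to subprocess.run() without going through a shell.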

Benchmarks

Preliminary
2012.10.04
france.osm.pbf
bike speed profile PQ only@Asteria:
1 thread = 295.96 s
2 threads = 172.69 s
48 threads = 25.83 s
96 threads = 28.132 s
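
The scaling in these preliminary numbers is easier to judge as a speedup relative to the single-threaded run. The short sketch below does that arithmetic using only the timings listed above.

```python
# wall-clock times in seconds, keyed by thread count (france.osm.pbf on Asteria)
timings = {1: 295.96, 2: 172.69, 48: 25.83, 96: 28.132}


def speedups(timings):
    """Return the speedup of every run relative to the 1-thread run."""
    base = timings[1]
    return {threads: base / seconds for threads, seconds in timings.items()}


if __name__ == "__main__":
    for threads, factor in speedups(timings).items():
        print(f"{threads} threads: {factor:.2f}x")
```

This gives roughly 1.7x for 2 threads and about 11.5x for 48 threads, while 96 threads is slightly slower than 48.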

2012.10.04
czech_republic.osm.pbf
Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz:
1 thread:
real	0m23.392s
user	0m22.525s
sys	0m0.756s

2 threads:
real	0m18.536s
user	0m26.162s
sys	0m0.700s

4 threads:
real	0m20.948s
user	0m42.271s
sys	0m0.880s

8 threads:
real	0m15.879s
user	0m27.570s
sys	0m0.924s

Asteria:
1 thread:
real	0m27.449s
user	0m26.714s
sys	0m0.504s

2 threads:
real	0m20.221s
user	0m30.210s
sys	0m0.520s

4 threads:
real	0m15.889s
user	0m33.766s
sys	0m0.544s

8 threads:
real	0m12.606s
user	0m37.990s
sys	0m0.536s

48 threads:
real	0m11.244s
user	1m53.039s
sys	0m4.012s

96 threads:
real	0m13.937s
user	1m15.157s
sys	0m21.989s

192 threads:
real	0m15.164s
user	1m22.069s
sys	0m55.079s

First full run on Asteria 04
## Monav repository updated in 5 hours (19918 s)
package count: 265
output data size: 91 GB
NOTE: the whole-world OSM data was processed about 6 times
* +1 - whole world country extracts
* +1 - continent extracts
* x3 - car, bike & pedestrian profiles
(+? country region extracts for Germany, France, etc.)

Run on Asteria 04 with 3 preprocessors running in parallel per package
## Monav repository updated in 4 hours (16597 s)