[[PageOutline]]
= modRana data repository document =
This document describes a simple program that generates a data repository for offline Monav routing data. It should be flexible enough to be extended to handle other data in the future.

== Workflow ==
 1. download source data
 2. do data processing
 3. package results
 4. move packages to repository
 5. update packages in repository & update manifest

=== Sequential ===
The initial repository implementation will run all the steps in order.

=== Parallel ===
As most modern systems have at least two independent CPU cores, the repository should parallelize as many operations as possible.

Parallel workflow:
 1. download source data - single downloading thread (due to Geofabrik download slot limitations)
 2. do data processing - multiple threads possible
 3. package results - multiple packagers
 4. move & publish - single publishing thread (as this is most probably IO bound)
   1. move packages to repository
   2. update packages in repository & update manifest

== Requirements ==
 * easy repository setup and regeneration

== CLI options ==

== Repositories ==
=== Repository definition file ===
This is a JSON file called ''repository.json'' that sits at the root of the repository.

==== Structure ====
Description of the different sections:
{{{
= header =
format_version - 1 for now
= repository =
* name - natural language name of the repository
* last_update - epoch of the last update
= data =
Contains sections for the different data sub-repositories.
== monav ==
* name = "Monav offline routing data repository"
=== Example_Package ===
* pack_type = "monav"
* url - pack URL
* last_updated - epoch of the last update
* bytes_size (optional)
==== zsync_file_list - (optional) ===
}}}

== Monav data repository ==
Located in the ''monav'' directory in the main repository folder.
The individual packages are stored in a simple folder structure:
{{{
continent/[country_name/][city_name/]package_name.tar.gz
}}}
NOTE: Square brackets indicate optional path components.

=== Internal package structure ===
Each package archive contains a named folder, and inside this folder is the folder with the Monav routing data.

Example - car routing data for Czech Republic:
{{{
Czech_Republic/routing_car/
}}}
Each package contains data for a single transportation mode. This way, users can download only the routing data they actually need.

==== package.JSON ====
Inside the routing_* subdirectory is a JSON file called package.JSON that makes it possible to map the package folders back to packages existing in the repository. This is mainly needed to facilitate package updates.
{{{
= header =
format_version - 1 for now
= package =
origin - base repository URL (ex.: http://data.modrana.org)
path - path to the package in the repository (ex.: monav/europe/czech_republic.tar.gz)
}}}

=== Monav data processing ===
 * the Monav preprocessor can run in multiple threads
 * the repository generator should detect the number of cores and supply the corresponding number to the preprocessor command line arguments using the ''-t'' option

==== Benchmarks ====
===== Preliminary =====
{{{
2012.10.04
france.osm.pbf
bike speed profile
PQ only@Asteria:
1 thread   = 295.96 s
2 threads  = 172.69 s
48 threads = 25.83 s
96 threads = 28.132 s

2012.10.04
czech_republic.osm.pbf

Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz:
1 thread:    real 0m23.392s  user 0m22.525s  sys 0m0.756s
2 threads:   real 0m18.536s  user 0m26.162s  sys 0m0.700s
4 threads:   real 0m20.948s  user 0m42.271s  sys 0m0.880s
8 threads:   real 0m15.879s  user 0m27.570s  sys 0m0.924s

Asteria:
1 thread:    real 0m27.449s  user 0m26.714s  sys 0m0.504s
2 threads:   real 0m20.221s  user 0m30.210s  sys 0m0.520s
4 threads:   real 0m15.889s  user 0m33.766s  sys 0m0.544s
8 threads:   real 0m12.606s  user 0m37.990s  sys 0m0.536s
48 threads:  real 0m11.244s  user 1m53.039s  sys 0m4.012s
96 threads:  real 0m13.937s  user 1m15.157s  sys 0m21.989s
192 threads: real 0m15.164s  user 1m22.069s  sys 0m55.079s
}}}

===== First full run on Asteria 04 =====
{{{
## Monav repository updated in 5 hours (19918 s)
package count: 265
output data size: 91 GB

NOTE: processed the whole World OSM data about 6 times
* +1 - whole world country extracts
* +1 - continent extracts
* x3 - car, bike & pedestrian profiles
(+? country region extracts for Germany, France, etc.)
}}}

===== Run on Asteria 04 with 3 preprocessors running in parallel per package =====
{{{
## Monav repository updated in 4 hours (16597 s)
}}}
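
As a rough illustration of how the two full-run timings above compare, the following Python sketch converts the logged second counts to hours, minutes and seconds and computes the speedup. The function and constant names are hypothetical and chosen for this example only; the two second counts are copied from the logs above.

```python
# Wall-clock times in seconds, copied from the two full-run logs above.
FIRST_RUN_S = 19918      # first full run on Asteria 04
PARALLEL_RUN_S = 16597   # run with 3 preprocessors in parallel per package


def format_duration(seconds):
    """Format a duration given in seconds as 'Hh MMm SSs'."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return "%dh %02dm %02ds" % (hours, minutes, secs)


def speedup(baseline_s, improved_s):
    """Return how many times faster the improved run is than the baseline."""
    return float(baseline_s) / improved_s


# Time saved by running three preprocessors in parallel per package.
saved_s = FIRST_RUN_S - PARALLEL_RUN_S

print("first full run:   " + format_duration(FIRST_RUN_S))
print("parallel run:     " + format_duration(PARALLEL_RUN_S))
print("time saved:       " + format_duration(saved_s))
print("speedup factor:   %.2f" % speedup(FIRST_RUN_S, PARALLEL_RUN_S))
```

With these numbers the second run is about 1.2 times faster than the first and saves roughly 55 minutes.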