File processor
Example 1 - Delimited
Copy the delimited config file to /opt/geofilter_file_inbox.
Config file contents:
{
"files": ["delimited.csv"], // input data filename. Use "files": "*" to apply a default config to all files
"format": "delimited",
"fields": {
"timestamp": [1, 99], // [<column, index starts at 0>, <width: ignored>]
"longitude": [4, 99],
"latitude": [3, 99],
"device_id": [2, 99]
},
"delimiter": ",",
"quotechar": "\"",
"timeformat": "iso"
}
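To illustrate how the [index, width] pairs above map onto a row, here is a minimal Python sketch. It assumes the processor splits each line with the standard csv module using the configured delimiter and quotechar; extract_delimited and the sample row are hypothetical and not part of the file processor.

```python
import csv
import io

# Mapping from the delimited config above: name -> [field index (0-based), width].
# For delimited input only the index matters; the width (99) is ignored.
FIELDS = {
    "timestamp": [1, 99],
    "longitude": [4, 99],
    "latitude": [3, 99],
    "device_id": [2, 99],
}


def extract_delimited(line, fields, delimiter=",", quotechar='"'):
    """Pick the configured fields out of one delimited line (hypothetical helper)."""
    # csv.reader keeps a quoted value such as "dev,01" together as one field.
    reader = csv.reader(io.StringIO(line), delimiter=delimiter, quotechar=quotechar)
    row = next(reader)
    return {name: row[index] for name, (index, _width) in fields.items()}


# Hypothetical sample row: id, timestamp, device_id, latitude, longitude.
sample = '7,2020-09-17T14:34:24Z,"dev,01",51.5074,-0.1278'
record = extract_delimited(sample, FIELDS)
print(record)
```

The quoted device id shows why quotechar matters: without it, the comma inside "dev,01" would shift every later column by one.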
Copy the sample delimited file to /opt/geofilter_file_inbox.
The file will be parsed, cleaned and processed within 15 to 30 minutes, depending on the debounce setting. When processing is complete (the log shows "Processing complete."), you can pick up the output files from /opt/geofilter_file_outbox.
Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; you can ignore these.
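If you script the transfer, the ordering rule can be enforced by always copying the config before the data file. This is a minimal sketch; deliver is a hypothetical helper, and the config filename delimited.conf is an assumption (the text only says config files use a .conf extension).

```python
import shutil
import tempfile
from pathlib import Path

INBOX = Path("/opt/geofilter_file_inbox")


def deliver(config_path, data_path, inbox=INBOX):
    """Copy the config file first, then the data file, into the inbox.

    Returns the destination paths in the order they were copied.
    """
    copied = []
    for src in (config_path, data_path):  # config first, data second
        dest = Path(inbox) / Path(src).name
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


# Demo against temporary directories rather than the real inbox.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as inbox_dir:
    conf = Path(src_dir, "delimited.conf")  # hypothetical config filename
    data = Path(src_dir, "delimited.csv")
    conf.write_text('{"files": ["delimited.csv"], "format": "delimited"}\n')
    data.write_text("7,2020-09-17T14:34:24Z,dev01,51.5074,-0.1278\n")
    copied_names = [p.name for p in deliver(conf, data, inbox_dir)]
    inbox_names = sorted(p.name for p in Path(inbox_dir).iterdir())

print(copied_names)
```

Against the real inbox you would call deliver("delimited.conf", "delimited.csv") with the default inbox path.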
You can check the status of file processing by monitoring the contents of /opt/geofilter_file_outbox/filemon.log:
17-09-2020 15:00:20 delimited.csv: status => queued
17-09-2020 15:02:24 delimited.csv: status => queued
17-09-2020 15:04:47 delimited.csv: status => queued
17-09-2020 15:06:48 delimited.csv: status => queued
17-09-2020 15:08:38 delimited.csv: status => queued
17-09-2020 15:11:12 delimited.csv: status => queued
17-09-2020 15:13:47 delimited.csv: status => queued
17-09-2020 15:15:28 delimited.csv: status => queued
17-09-2020 15:17:36 delimited.csv: status => queued
17-09-2020 15:19:39 delimited.csv: status => queued
17-09-2020 15:22:29 delimited.csv: status => queued
17-09-2020 15:25:10 delimited.csv: status => queued
17-09-2020 15:27:21 delimited.csv: status => queued
17-09-2020 15:29:54 delimited.csv: status => queued
17-09-2020 15:31:29 delimited.csv: status => queued
17-09-2020 15:33:59 delimited.csv: status => queued
17-09-2020 15:35:33 delimited.csv: status => queued
17-09-2020 15:37:52 delimited.csv: status => queued
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 processed => 3041 total => 3041
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Processing complete.
- The file processor produces three output files for each input batch: the Stream JSON, Stream CSV and Trips JSON files listed in the log above.
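Rather than reading filemon.log by eye, you can extract the latest status and output paths for one input file. This is a minimal sketch; the line layout it parses is inferred from the sample log above, and summarise is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the sample log:
# "<dd-mm-yyyy HH:MM:SS> <input file>: <message>"
LINE_RE = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}) (\S+?): (.*)$")
STATUS_RE = re.compile(r"status => (\w+)")
OUTPUT_RE = re.compile(r"(Stream JSON|Stream CSV|Trips JSON) => (\S+)")


def summarise(lines, filename):
    """Collect the latest status, output paths and completion flag for one input file."""
    summary = {"status": None, "outputs": {}, "complete": False, "last_seen": None}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group(2) != filename:
            continue
        stamp, _name, message = match.groups()
        summary["last_seen"] = datetime.strptime(stamp, "%d-%m-%Y %H:%M:%S")
        status = STATUS_RE.search(message)
        if status:
            summary["status"] = status.group(1)
        output = OUTPUT_RE.search(message)
        if output:
            summary["outputs"][output.group(1)] = output.group(2)
        if message.endswith("Processing complete."):
            summary["complete"] = True
    return summary


# A few lines copied from the sample log; in practice you could pass
# open("/opt/geofilter_file_outbox/filemon.log") instead.
BATCH = "delimited.csv_2020-09-17T14-34-24.060428"
OUTBOX = "/opt/geofilter_file_outbox/"
SAMPLE = [
    "17-09-2020 15:37:52 delimited.csv: status => queued",
    f"17-09-2020 15:39:33 delimited.csv: {BATCH} status => parsed",
    f"17-09-2020 15:39:33 delimited.csv: {BATCH} Stream JSON => {OUTBOX}{BATCH}_stream.json",
    f"17-09-2020 15:39:34 delimited.csv: {BATCH} Stream CSV => {OUTBOX}{BATCH}_stream.csv",
    f"17-09-2020 15:39:34 delimited.csv: {BATCH} Trips JSON => {OUTBOX}{BATCH}_trips.json",
    f"17-09-2020 15:39:34 delimited.csv: {BATCH} Processing complete.",
]
result = summarise(SAMPLE, "delimited.csv")
print(result["complete"], result["outputs"])
```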
Example 2 - Fixed width
Copy the fixed-width config file to /opt/geofilter_file_inbox.
Config file contents:
{
"files": ["fixed.txt"], // input data filename. Use "files": "*" to apply a default config to all files
"format": "fixed",
"fields": {
"timestamp": [20, 120], // [<column 20, index starts at 0>, <width of 120 chars>]
"longitude": [410, 30],
"latitude": [300, 30],
"device_id": [200, 20]
},
"timeformat": "iso"
}
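For fixed-width input each field is a [start, width] slice of the line, so overlapping spans usually indicate a config mistake. This is a minimal sketch that checks the config above for overlaps and slices a test line; stripping space padding from each value is an assumption about how the processor cleans fields, and all function names are hypothetical.

```python
from itertools import combinations

# Mapping from the fixed-width config above: name -> [start column (0-based), width].
FIELDS = {
    "timestamp": [20, 120],
    "longitude": [410, 30],
    "latitude": [300, 30],
    "device_id": [200, 20],
}


def find_overlaps(fields):
    """Return pairs of field names whose [start, start + width) spans overlap."""
    overlaps = []
    for (name1, (s1, w1)), (name2, (s2, w2)) in combinations(fields.items(), 2):
        if s1 < s2 + w2 and s2 < s1 + w1:
            overlaps.append((name1, name2))
    return overlaps


def extract_fixed(line, fields):
    """Slice each field out of a fixed-width line, stripping space padding."""
    return {name: line[start:start + width].strip() for name, (start, width) in fields.items()}


def build_line(values, fields):
    """Build a padded test line with each value written at its configured start column."""
    length = max(start + width for start, width in fields.values())
    chars = [" "] * length
    for name, value in values.items():
        start, _width = fields[name]
        chars[start:start + len(value)] = value
    return "".join(chars)


line = build_line({"timestamp": "2020-09-17T14:34:24Z", "device_id": "dev01",
                   "latitude": "51.5074", "longitude": "-0.1278"}, FIELDS)
print(find_overlaps(FIELDS), extract_fixed(line, FIELDS))
```

Checking every pair (rather than only neighbours after sorting) also catches a long field that overlaps a span further along the line.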
Copy the sample fixed-width file (fixed.txt) to /opt/geofilter_file_inbox.
The file will be parsed, cleaned and processed within 15 to 30 minutes, depending on the debounce setting. When processing is complete (the log shows "Processing complete."), you can pick up the output files from /opt/geofilter_file_outbox.
Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; you can ignore these.
You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:
17-09-2020 15:35:33 fixed.txt: status => queued
17-09-2020 15:37:52 fixed.txt: status => queued
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 processed => 3041 total => 3041
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/fixed.txt_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/fixed.txt_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/fixed.txt_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Processing complete.
- The file processor produces three output files for each input batch: the Stream JSON, Stream CSV and Trips JSON files listed in the log above.
Example 3 - JSON Basic
Copy the JSON basic config file to /opt/geofilter_file_inbox.
Config file contents:
{
"files": ["basic.json"], // input data filename. Use "files": "*" to apply a default config to all files
"format": "json_basic",
"fields": {},
"timeformat": "iso"
}
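The // annotations in these examples are explanatory; standard JSON has no comment syntax, and whether the file processor tolerates comments is not stated here. If you generate config files programmatically, a minimal sketch that writes comment-free JSON looks like this (write_config is a hypothetical helper, and the filenames basic.conf and delimited.conf are assumptions based on the .conf extension).

```python
import json
import tempfile
from pathlib import Path


def write_config(path, files, fmt, fields=None, timeformat="iso", **extra):
    """Write a comment-free JSON config and return it as read back from disk."""
    config = {"files": files, "format": fmt, "fields": fields or {}, "timeformat": timeformat}
    config.update(extra)  # e.g. delimiter and quotechar for delimited input
    Path(path).write_text(json.dumps(config, indent=2) + "\n")
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as out_dir:
    basic = write_config(Path(out_dir, "basic.conf"), ["basic.json"], "json_basic")
    write_config(
        Path(out_dir, "delimited.conf"), ["delimited.csv"], "delimited",
        fields={"timestamp": [1, 99], "longitude": [4, 99], "latitude": [3, 99], "device_id": [2, 99]},
        delimiter=",", quotechar='"',
    )
    delimited_text = Path(out_dir, "delimited.conf").read_text()

print(delimited_text)
```

Letting json.dumps handle escaping means a double-quote quotechar is written as "\"", which any JSON parser accepts.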
Copy the sample JSON basic file (basic.json) to /opt/geofilter_file_inbox.
The file will be parsed, cleaned and processed within 15 to 30 minutes, depending on the debounce setting. When processing is complete (the log shows "Processing complete."), you can pick up the output files from /opt/geofilter_file_outbox.
Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; you can ignore these.
You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:
17-09-2020 15:37:52 basic.json: status => queued
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 processed => 3041 total => 3041
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/basic.json_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 basic.json: basic.json_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/basic.json_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 basic.json: basic.json_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/basic.json_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 basic.json: basic.json_2020-09-17T14-34-24.060428 Processing complete.
- The file processor produces three output files for each input batch: the Stream JSON, Stream CSV and Trips JSON files listed in the log above.
Example 4 - JSON Rnbg
Copy the JSON rnbg config file to /opt/geofilter_file_inbox.
Config file contents:
{
"files": ["rnbg.json"], // input data filename. Use "files": "*" to apply a default config to all files
"format": "json_rnbg",
"fields": {},
"timeformat": "iso"
}
Copy the sample rnbg file (rnbg.json) to /opt/geofilter_file_inbox.
The file will be parsed, cleaned and processed within 15 to 30 minutes, depending on the debounce setting. When processing is complete (the log shows "Processing complete."), you can pick up the output files from /opt/geofilter_file_outbox.
Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; you can ignore these.
You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:
17-09-2020 15:37:52 rnbg.json: status => queued
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 processed => 3041 total => 3041
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/rnbg.json_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/rnbg.json_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/rnbg.json_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Processing complete.
- The file processor produces three output files for each input batch: the Stream JSON, Stream CSV and Trips JSON files listed in the log above.
Config files
Config files specify input file formats. They are JSON files with a .conf extension. A simple config file for fixed-width input is shown below.
{
"files": "*",
"format": "fixed",
"fields": {
"timestamp": [0, 10],
"longitude": [10, 6],
"latitude": [16, 6]
}
}
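The "files" key decides which input files a config applies to: a list of filenames, or "*" for a default. This is a minimal sketch of choosing a config for a data file; pick_config is a hypothetical helper, and the precedence rule (a config naming the file wins over a "*" default) is an assumption, not documented behaviour of the file processor.

```python
import json


def pick_config(data_filename, configs):
    """Return the config that applies to data_filename, or None (hypothetical helper).

    Assumption: a config that lists the file by name takes precedence over a
    "files": "*" default, regardless of the order in which configs were loaded.
    """
    default = None
    for config in configs:
        files = config.get("files")
        if files == "*":
            if default is None:
                default = config
        elif isinstance(files, list) and data_filename in files:
            return config
    return default


configs = [
    json.loads('{"files": "*", "format": "fixed", "fields": {"timestamp": [0, 10]}}'),
    json.loads('{"files": ["delimited.csv"], "format": "delimited", "fields": {}}'),
]
print(pick_config("delimited.csv", configs)["format"])
print(pick_config("other.txt", configs)["format"])
```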
- files: List of input filenames to which the config file applies. Specify "*" for a default global config.
- format: Input file format. Allowed values include delimited, fixed, json_basic and json_rnbg (see the examples above).
- fields: Field names and where to find each field.
At a minimum you need to specify timestamp, latitude and longitude.
For fixed-width format, specify the starting column (0-based) and the width of each field, e.g.:
{
"timestamp": [0, 10], // timestamp starts at position 0 with width 10
"longitude": [10, 6], // longitude starts at position 10 with width 6
"latitude": [16, 6],
"device_id": [22, 5]
}
For delimited format, specify the field index (0-based). The width is ignored; the examples use 99 as a placeholder:
{
"timestamp": [0, 99], // timestamp is the first field (fields are 0-indexed); width is ignored
"longitude": [2, 99], // longitude is field 3 (index 2)
"latitude": [3, 99], // latitude is field 4 (index 3)
"device_id": [4, 99] // device id is field 5 (index 4)
}
For JSON formats, you don't need to specify fields in the config file (the examples above use "fields": {}). The file processor expects the input file to contain an array of JSON objects with timestamp, latitude and longitude keys. The keys read by the file processor are:
  - timestamp (required)
  - latitude (required)
  - longitude (required)
  - device_id
  - altitude
  - accuracy
  - speed
  - activity_type
  - activity_confidence
- delimiter: Field delimiter character for delimited files.
- quotechar: Quote character for delimited files. Defaults to a double quote (").
- timeformat: Timestamp format. Allowed values are:
  - iso: ISO 8601 format. A timezone offset is respected if present; otherwise UTC is assumed.
  - unix: Unix epoch seconds (UTC).
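For JSON input, the key list and the timeformat rules above can be combined into a small loader. This is a minimal sketch under stated assumptions: the input is a JSON array of objects, unknown keys are dropped, and timestamps are normalised to aware UTC datetimes; load_records and parse_time are hypothetical helpers, not the processor's code.

```python
import json
from datetime import datetime, timezone

REQUIRED = ("timestamp", "latitude", "longitude")
OPTIONAL = ("device_id", "altitude", "accuracy", "speed", "activity_type", "activity_confidence")


def parse_time(value, timeformat):
    """Convert a timestamp to an aware UTC datetime for timeformat "iso" or "unix"."""
    if timeformat == "unix":
        return datetime.fromtimestamp(float(value), tz=timezone.utc)
    if timeformat == "iso":
        text = str(value)
        if text.endswith("Z"):  # fromisoformat() accepts "Z" only from Python 3.11
            text = text[:-1] + "+00:00"
        parsed = datetime.fromisoformat(text)
        if parsed.tzinfo is None:  # no offset given: assume UTC
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"unsupported timeformat: {timeformat!r}")


def load_records(text, timeformat="iso"):
    """Parse a JSON array of objects, keeping known keys and normalising timestamps."""
    records = []
    for i, obj in enumerate(json.loads(text)):
        missing = [key for key in REQUIRED if key not in obj]
        if missing:
            raise ValueError(f"record {i} is missing required keys {missing}")
        record = {key: obj[key] for key in REQUIRED + OPTIONAL if key in obj}
        record["timestamp"] = parse_time(obj["timestamp"], timeformat)
        records.append(record)
    return records


sample = '[{"timestamp": "2020-09-17T15:34:24+01:00", "latitude": 51.5074, "longitude": -0.1278, "speed": 1.2, "extra": 1}]'
records = load_records(sample)
print(records)
```

The +01:00 offset in the sample is converted to 14:34:24 UTC, matching the "respects timezone offset" rule for iso.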
Output files
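The file processor produces three output files for each input batch. Each filename starts with the batch id, e.g. delimited.csv_2020-09-17T14-34-24.060428 in the sample log above.
1. <batch_id>_stream.json
- JSON data with the following fields: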
  - id: global record id
  - row_id: record id within a batch
  - timestamp: ISO 8601 timestamp (UTC)
  - tz: original timezone
  - batch_id
  - trip_id
  - device_id
  - latitude: decimal degrees, EPSG:4326
  - longitude: decimal degrees, EPSG:4326
  - altitude
  - heading
  - speed
  - activity
  - activity_confidence
  - accuracy
  - filter_time: true if the timestamp is in the future
  - filter_accuracy: true if accuracy < 250 m
  - filter_static: true if flagged as a static error
  - filter_jumps: true if flagged as a GPS jump
  - address: OpenStreetMap GeoJSON
2. <batch_id>_stream.csv
- The content of the <batch_id>_stream.json file, flattened to CSV.
3. <batch_id>_trips.json
- An array of JSON objects with the following fields:
  - id: trip id
  - device_id
  - start_time: ISO 8601 timestamp (UTC)
  - end_time: ISO 8601 timestamp (UTC)
  - timezone_offset: original timezone
  - metres: trip distance in metres
  - route_geojson: route as a GeoJSON LineString
  - route_polyline: route as a Google encoded polyline
  - address: OSM GeoJSON start/end address
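The route_polyline field is described only as a Google polyline. Assuming it uses Google's Encoded Polyline Algorithm Format with the default precision of 5 decimal places (the precision is an assumption, not stated here), a minimal decoder looks like this; decode_polyline is a hypothetical helper.

```python
def decode_polyline(encoded, precision=5):
    """Decode a Google encoded polyline into a list of (latitude, longitude) tuples."""
    factor = 10 ** precision
    points, index, lat, lng = [], 0, 0, 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # each point stores a latitude delta, then a longitude delta
            result, shift = 0, 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift
                shift += 5
                if chunk < 0x20:  # continuation bit clear: last chunk of this value
                    break
            # Undo the zig-zag sign encoding: odd values are negative.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        points.append((lat / factor, lng / factor))
    return points


# Example string from Google's polyline algorithm documentation.
route = decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@")
print(route)
```

Coordinates are accumulated as integers and divided only when each point is emitted, so rounding errors do not build up along long routes. Note that GeoJSON positions, as in the LineString field, are ordered [longitude, latitude], the reverse of the tuples returned here.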