File processor

File processor flow

Example 1 - Delimited

Copy the delimited config file to /opt/geofilter_file_inbox.

Config file contents:

{
   "files": ["delimited.csv"],  // input data filename. Use "files": "*" to apply a default config to all files
   "format": "delimited",
   "fields": {
      "timestamp": [1, 99],     // [<column, index starts at 0>, <width: ignored>]
      "longitude": [4, 99],
      "latitude": [3, 99],
      "device_id": [2, 99]
   },
   "delimiter": ",",
   "quotechar": '"',
   "timeformat": "iso"
}
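For reference, each fields entry above is [column index, width], and the width is ignored for delimited input. Below is a short Python sketch of how those indices map onto a row; the sample row and its values are invented for illustration and this is not the processor's own code.

   import csv
   import io

   # Invented sample row: timestamp in column 1, device_id in 2, latitude in 3,
   # longitude in 4 (columns are 0-indexed), matching the config above.
   sample = "rec-001,2020-09-17T14:30:00+00:00,device-42,51.5074,-0.1278\n"

   columns = {"timestamp": 1, "longitude": 4, "latitude": 3, "device_id": 2}

   row = next(csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'))
   record = {name: row[index] for name, index in columns.items()}
   print(record)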

Copy the sample delimited file to /opt/geofilter_file_inbox.

The file will be parsed, cleaned and processed within 15-30 minutes, depending on the debounce interval. When the status changes to processed, you can pick up the output files from /opt/geofilter_file_outbox.

Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; these can be ignored.
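To avoid that race when scripting the transfer, copy the .conf file first and only then the data file. A minimal Python sketch; the local filenames are placeholders, only the inbox path comes from this guide.

   import shutil
   import time

   INBOX = "/opt/geofilter_file_inbox"

   # Hypothetical local filenames; the config must land in the inbox first.
   shutil.copy("delimited.conf", INBOX)
   time.sleep(1)  # small gap so the config is registered before the data file arrives
   shutil.copy("delimited.csv", INBOX)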

You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:

17-09-2020 15:00:20 delimited.csv:  status => queued
17-09-2020 15:02:24 delimited.csv:  status => queued
17-09-2020 15:04:47 delimited.csv:  status => queued
17-09-2020 15:06:48 delimited.csv:  status => queued
17-09-2020 15:08:38 delimited.csv:  status => queued
17-09-2020 15:11:12 delimited.csv:  status => queued
17-09-2020 15:13:47 delimited.csv:  status => queued
17-09-2020 15:15:28 delimited.csv:  status => queued
17-09-2020 15:17:36 delimited.csv:  status => queued
17-09-2020 15:19:39 delimited.csv:  status => queued
17-09-2020 15:22:29 delimited.csv:  status => queued
17-09-2020 15:25:10 delimited.csv:  status => queued
17-09-2020 15:27:21 delimited.csv:  status => queued
17-09-2020 15:29:54 delimited.csv:  status => queued
17-09-2020 15:31:29 delimited.csv:  status => queued
17-09-2020 15:33:59 delimited.csv:  status => queued
17-09-2020 15:35:33 delimited.csv:  status => queued
17-09-2020 15:37:52 delimited.csv:  status => queued
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 processed => 3041    total => 3041
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Processing complete.
The file processor produces three output files for each input batch, as shown in the Stream JSON, Stream CSV and Trips JSON lines above.
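Rather than watching the log by hand, a small script can poll it for the completion message. A sketch, assuming the "Processing complete." line shown above marks the end of a batch:

   import time

   LOG = "/opt/geofilter_file_outbox/filemon.log"

   def wait_for_completion(input_name, timeout=3600, poll=30):
       """Poll filemon.log until a 'Processing complete.' line appears for input_name."""
       deadline = time.time() + timeout
       while time.time() < deadline:
           with open(LOG) as fh:
               if any(input_name in line and "Processing complete." in line for line in fh):
                   return True
           time.sleep(poll)
       return False

   if wait_for_completion("delimited.csv"):
       print("outputs are ready in /opt/geofilter_file_outbox")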

Example 2 - Fixed width

Copy the fixed-width config file to /opt/geofilter_file_inbox.

Config file contents:

{
   "files": ["fixed.txt"],  // input data filename. Use "files": "*" to apply a default config to all files
   "format": "fixed",
   "fields": {
      "timestamp": [20, 120],     // [<column 20, index starts at 0>, <width of 120 chars>]
      "longitude": [410, 30],
      "latitude": [300, 30],
      "device_id": [200, 20]
   },
   "timeformat": "iso"
}
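Here each fields entry is [start column, width in characters]. Below is a Python sketch of how such a specification slices a fixed-width line; the sample values are invented and this is not the processor's own parser.

   # [start column (0-indexed), width in characters], as in the config above
   layout = {
       "timestamp": (20, 120),
       "longitude": (410, 30),
       "latitude": (300, 30),
       "device_id": (200, 20),
   }

   def parse_fixed(line, layout):
       """Slice each field as line[start:start + width] and strip the padding."""
       return {name: line[start:start + width].strip()
               for name, (start, width) in layout.items()}

   # Build one invented, padded 440-character record for illustration.
   buf = [" "] * 440
   for value, (start, _width) in [("2020-09-17T14:30:00Z", layout["timestamp"]),
                                  ("-0.127800", layout["longitude"]),
                                  ("51.507400", layout["latitude"]),
                                  ("device-42", layout["device_id"])]:
       buf[start:start + len(value)] = value
   print(parse_fixed("".join(buf), layout))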

Copy the sample fixed-width file to /opt/geofilter_file_inbox.

The file will be parsed, cleaned and processed within 15-30 minutes, depending on the debounce interval. When the status changes to processed, you can pick up the output files from /opt/geofilter_file_outbox.

Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; these can be ignored.

You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:

17-09-2020 15:35:33 fixed.txt:  status => queued
17-09-2020 15:37:52 fixed.txt:  status => queued
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 processed => 3041    total => 3041
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/fixed.txt_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/fixed.txt_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/fixed.txt_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Processing complete.
The file processor produces three output files for each input batch, as shown in the Stream JSON, Stream CSV and Trips JSON lines above.

Example 3 - JSON Basic

Copy the JSON basic config file to /opt/geofilter_file_inbox.

Config file contents:

{
   "files": ["basic.json"],  // input data filename. Use "files": "*" to apply a default config to all files
   "format": "json_basic",
   "fields: {},
   "timeformat": "iso"
}
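As noted in the Config files section below, json_basic input is expected to be an array of JSON objects keyed by timestamp, latitude and longitude, with optional keys such as device_id. A minimal Python sketch that writes such a file; the values are made up.

   import json

   # Invented sample records; timestamp, latitude and longitude are the required keys.
   records = [
       {"timestamp": "2020-09-17T14:30:00+00:00", "latitude": 51.5074,
        "longitude": -0.1278, "device_id": "device-42"},
       {"timestamp": "2020-09-17T14:31:00+00:00", "latitude": 51.5080,
        "longitude": -0.1290, "device_id": "device-42"},
   ]

   with open("basic.json", "w") as fh:
       json.dump(records, fh, indent=2)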

Copy the sample basic JSON file to /opt/geofilter_file_inbox.

The file will be parsed, cleaned and processed within 15-30 minutes, depending on the debounce interval. When the status changes to processed, you can pick up the output files from /opt/geofilter_file_outbox.

Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; these can be ignored.

You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:

17-09-2020 15:37:52 basic.json:  status => queued
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 processed => 3041    total => 3041
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/basic.json_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 basic.json: basic.json_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/basic.json_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 basic.json: basic.json_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/basic.json_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 basic.json: basic.json_2020-09-17T14-34-24.060428 Processing complete.
The file processor produces three output files for each input batch, as shown in the Stream JSON, Stream CSV and Trips JSON lines above.

Example 4 - JSON Rnbg

Copy the JSON rnbg config file to /opt/geofilter_file_inbox.

Config file contents:

{
   "files": ["rnbg.json"],  // input data filename. Use "files": "*" to apply a default config to all files
   "format": "json_rnbg",
   "fields: {},
   "timeformat": "iso"
}

Copy the sample rnbg JSON file to /opt/geofilter_file_inbox.

The file will be parsed, cleaned and processed within 15-30 minutes, depending on the debounce interval. When the status changes to processed, you can pick up the output files from /opt/geofilter_file_outbox.

Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; these can be ignored.

You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:

17-09-2020 15:37:52 rnbg.json:  status => queued
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 processed => 3041    total => 3041
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/rnbg.json_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/rnbg.json_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/rnbg.json_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Processing complete.
The file processor produces three output files for each input batch, as shown in the Stream JSON, Stream CSV and Trips JSON lines above.
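Once a batch is complete, its three outputs share the batch id shown in the log. A Python sketch that groups the outbox files by batch id, assuming the naming pattern visible in the log lines above:

   import glob
   import os

   OUTBOX = "/opt/geofilter_file_outbox"

   def batch_outputs(input_name):
       """Group the stream JSON, stream CSV and trips JSON files by batch id."""
       batches = {}
       for path in glob.glob(os.path.join(OUTBOX, input_name + "_*")):
           name = os.path.basename(path)
           # e.g. rnbg.json_2020-09-17T14-34-24.060428_stream.json
           batch_id, _, kind = name.rpartition("_")
           batches.setdefault(batch_id, []).append(kind)
       return batches

   print(batch_outputs("rnbg.json"))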

Input file formats

  • Fixed

  • Delimited

  • JSON Basic

  • JSON Rnbg

Config files

Config files specify input file formats. They are JSON files with a .conf extension. A simple .conf file for fixed-width input is shown below.

{
   "files": "*",
   "format": "fixed",
   "fields": {
      "timestamp": [0, 10],
      "longitude": [1, 6],
      "latitude": [2, 6]
   }
}
  • files: list of input filenames to which the config file applies. Specify “*” for default global config.

  • format: Input file format. Allowed values are fixed, delimited, json_basic and json_rnbg.

  • fields: Field names and where to find each field.
    • At a minimum you need to specify timestamp, latitude and longitude.

    • For the fixed-width format, specify the starting column number and width for each field, e.g.:

      {
         "timestamp": [0, 10], //timestamp starts at position 0 with width 10
         "longitude": [10, 6], //longitude starts at position 10 with width 6
         "latitude": [16, 6],
         "device_id": [22, 5]
      }
      
    • For the delimited format, specify the field number; the width is ignored, e.g.:

      {
         "timestamp": [0, 99], //timestamp is the first field (columns are 0-indexed). Width is ignored.
         "longitude": [2, 99], //longitude is field 3 (0-indexed, so index 2)
         "latitude": [3, 99],  //latitude is field 4 (index 3)
         "device_id": [4, 99]  //device id is field 5 (index 4)
      }
      
    • For JSON formats, you don’t need to specify fields in the config file. The file processor expects the input file to contain an array of JSON objects with timestamp, longitude and latitude as keys. The keys read by the file processor are:

      Key                    Required
      timestamp              Y
      latitude               Y
      longitude              Y
      device_id
      altitude
      accuracy
      speed
      activity_type
      activity_confidence

  • delimiter: Character to be used as the delimiter for delimited files.

  • quotechar: Quote character for delimited files. Defaults to a double quote.

  • timeformat: Timestamp string format (see the sketch after this list). Allowed values are:
    • iso: ISO 8601 format. Respects an explicit timezone offset, otherwise defaults to UTC.

    • unix: Unix epoch seconds, UTC.
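The two timeformat values differ only in how the raw timestamp string is interpreted. A Python sketch of the equivalent interpretation (illustrative only, not the processor's own code):

   from datetime import datetime, timezone

   def parse_timestamp(raw, timeformat):
       """Interpret a raw timestamp string the way the two timeformat settings describe."""
       if timeformat == "iso":
           dt = datetime.fromisoformat(raw)          # respects an explicit offset
           if dt.tzinfo is None:
               dt = dt.replace(tzinfo=timezone.utc)  # no offset given, default to UTC
           return dt
       if timeformat == "unix":
           return datetime.fromtimestamp(float(raw), tz=timezone.utc)  # epoch seconds, UTC
       raise ValueError("timeformat must be 'iso' or 'unix'")

   print(parse_timestamp("2020-09-17T14:30:00+01:00", "iso"))
   print(parse_timestamp("1600349400", "unix"))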

Output files

The file processor produces three output files for each input batch.

1. <batch_id>_stream.json

JSON data with the fields below:

Field                  Description
id                     global record id
row_id                 record id within a batch
timestamp              iso-8601 timestamp (UTC)
tz                     original timezone
batch_id
trip_id
device_id
latitude               decimal, EPSG 4326
longitude              decimal, EPSG 4326
altitude
heading
speed
activity
activity_confidence
accuracy
filter_time            true if time is in the future
filter_accuracy        true if accuracy < 250 m
filter_static          true if static error
filter_jumps           true if GPS jumps
address                OpenStreetMap GeoJSON

See an example

2. <batch_id>_stream.csv

Contents of the corresponding stream JSON file as a flattened CSV.

See an example
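Because the stream CSV is a flat rendering of the same records, it can be read with any CSV library. A sketch that counts how many records tripped each filter flag, assuming the CSV columns carry the field names from the table above and the boolean flags are written as text:

   import csv

   # Example batch from the logs above; substitute your own output filename.
   path = "/opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_stream.csv"
   flags = ["filter_time", "filter_accuracy", "filter_static", "filter_jumps"]

   counts = {flag: 0 for flag in flags}
   with open(path, newline="") as fh:
       for row in csv.DictReader(fh):
           for flag in flags:
               # Assumes the boolean flags are serialised as the text "true"/"True".
               if row.get(flag, "").strip().lower() == "true":
                   counts[flag] += 1
   print(counts)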

3. <batch_id>_trips.json

An array of JSON objects with the fields below:

Field                  Description
id                     trip id
device_id
start_time             iso-8601 timestamp (UTC)
end_time               iso-8601 timestamp (UTC)
timezone_offset        original timezone
metres                 trip distance in metres
route_geojson          route as a GeoJSON LineString
route_polyline         route as a Google encoded polyline
address                OSM GeoJSON start/end address

See an example
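Since the trips file is an array of JSON objects, per-device totals are easy to compute. A sketch that sums metres per device_id, using the field names from the table above; the path is an example batch from the logs earlier in this guide.

   import json
   from collections import defaultdict

   # Example batch from the logs above; substitute your own output filename.
   path = "/opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_trips.json"

   with open(path) as fh:
       trips = json.load(fh)  # an array of trip objects

   metres_by_device = defaultdict(float)
   for trip in trips:
       metres_by_device[trip.get("device_id")] += float(trip.get("metres", 0))
   print(dict(metres_by_device))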