File processor

File processor flow

Example 1 - Delimited

Copy the delimited config file to /opt/geofilter_file_inbox.

Config file contents:

{
   "files": ["delimited.csv"],  // input data filename. Use "files": "*" to apply a default config to all files
   "format": "delimited",
   "fields": {
      "timestamp": [1, 99],     // [<column, index starts at 0>, <width: ignored>]
      "longitude": [4, 99],
      "latitude": [3, 99],
      "device_id": [2, 99]
   },
   "delimiter": ",",
   "quotechar": '"',
   "timeformat": "iso"
}
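For reference, each fields entry above is [column index, width], and the width is ignored for delimited input. Below is a short Python sketch of how those indices map onto a row; the sample row and its values are invented for illustration and this is not the processor's own code.

   import csv
   import io

   # Invented sample row: timestamp in column 1, device_id in 2, latitude in 3,
   # longitude in 4 (columns are 0-indexed), matching the config above.
   sample = "rec-001,2020-09-17T14:30:00+00:00,device-42,51.5074,-0.1278\n"

   columns = {"timestamp": 1, "longitude": 4, "latitude": 3, "device_id": 2}

   row = next(csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'))
   record = {name: row[index] for name, index in columns.items()}
   print(record)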

Copy the sample delimited file to /opt/geofilter_file_inbox.

The file will be parsed, cleaned and processed within 15-30 minutes, depending on the debounce interval. When the status changes to processed, you can pick up the output files from /opt/geofilter_file_outbox.

Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; these can be ignored.
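To avoid that race when scripting the transfer, copy the .conf file first and only then the data file. A minimal Python sketch; the local filenames are placeholders, only the inbox path comes from this guide.

   import shutil
   import time

   INBOX = "/opt/geofilter_file_inbox"

   # Hypothetical local filenames; the config must land in the inbox first.
   shutil.copy("delimited.conf", INBOX)
   time.sleep(1)  # small gap so the config is registered before the data file arrives
   shutil.copy("delimited.csv", INBOX)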

You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:

17-09-2020 15:00:20 delimited.csv:  status => queued
17-09-2020 15:02:24 delimited.csv:  status => queued
17-09-2020 15:04:47 delimited.csv:  status => queued
17-09-2020 15:06:48 delimited.csv:  status => queued
17-09-2020 15:08:38 delimited.csv:  status => queued
17-09-2020 15:11:12 delimited.csv:  status => queued
17-09-2020 15:13:47 delimited.csv:  status => queued
17-09-2020 15:15:28 delimited.csv:  status => queued
17-09-2020 15:17:36 delimited.csv:  status => queued
17-09-2020 15:19:39 delimited.csv:  status => queued
17-09-2020 15:22:29 delimited.csv:  status => queued
17-09-2020 15:25:10 delimited.csv:  status => queued
17-09-2020 15:27:21 delimited.csv:  status => queued
17-09-2020 15:29:54 delimited.csv:  status => queued
17-09-2020 15:31:29 delimited.csv:  status => queued
17-09-2020 15:33:59 delimited.csv:  status => queued
17-09-2020 15:35:33 delimited.csv:  status => queued
17-09-2020 15:37:52 delimited.csv:  status => queued
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 processed => 3041    total => 3041
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 delimited.csv: delimited.csv_2020-09-17T14-34-24.060428 Processing complete.
The file processor produces three output files for each input batch, as shown in the Stream JSON, Stream CSV and Trips JSON lines above.
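Rather than watching the log by hand, a small script can poll it for the completion message. A sketch, assuming the "Processing complete." line shown above marks the end of a batch:

   import time

   LOG = "/opt/geofilter_file_outbox/filemon.log"

   def wait_for_completion(input_name, timeout=3600, poll=30):
       """Poll filemon.log until a 'Processing complete.' line appears for input_name."""
       deadline = time.time() + timeout
       while time.time() < deadline:
           with open(LOG) as fh:
               if any(input_name in line and "Processing complete." in line for line in fh):
                   return True
           time.sleep(poll)
       return False

   if wait_for_completion("delimited.csv"):
       print("outputs are ready in /opt/geofilter_file_outbox")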

Example 2 - Fixed width

Copy the fixed-width config file to /opt/geofilter_file_inbox.

Config file contents:

{
   "files": ["fixed.txt"],  // input data filename. Use "files": "*" to apply a default config to all files
   "format": "fixed",
   "fields": {
      "timestamp": [20, 120],     // [<column 20, index starts at 0>, <width of 120 chars>]
      "longitude": [410, 30],
      "latitude": [300, 30],
      "device_id": [200, 20]
   },
   "timeformat": "iso"
}
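Here each fields entry is [start column, width in characters]. Below is a Python sketch of how such a specification slices a fixed-width line; the sample values are invented and this is not the processor's own parser.

   # [start column (0-indexed), width in characters], as in the config above
   layout = {
       "timestamp": (20, 120),
       "longitude": (410, 30),
       "latitude": (300, 30),
       "device_id": (200, 20),
   }

   def parse_fixed(line, layout):
       """Slice each field as line[start:start + width] and strip the padding."""
       return {name: line[start:start + width].strip()
               for name, (start, width) in layout.items()}

   # Build one invented, padded 440-character record for illustration.
   buf = [" "] * 440
   for value, (start, _width) in [("2020-09-17T14:30:00Z", layout["timestamp"]),
                                  ("-0.127800", layout["longitude"]),
                                  ("51.507400", layout["latitude"]),
                                  ("device-42", layout["device_id"])]:
       buf[start:start + len(value)] = value
   print(parse_fixed("".join(buf), layout))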

Copy the sample fixed-width file to /opt/geofilter_file_inbox.

The file will be parsed, cleaned and processed within 15-30 minutes, depending on the debounce interval. When the status changes to processed, you can pick up the output files from /opt/geofilter_file_outbox.

Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; these can be ignored.

You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:

17-09-2020 15:35:33 fixed.txt:  status => queued
17-09-2020 15:37:52 fixed.txt:  status => queued
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 processed => 3041    total => 3041
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/fixed.txt_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/fixed.txt_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/fixed.txt_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 fixed.txt: fixed.txt_2020-09-17T14-34-24.060428 Processing complete.
The file processor produces three output files for each input batch, as shown in the Stream JSON, Stream CSV and Trips JSON lines above.

Example 3 - JSON Basic

Copy the JSON basic config file to /opt/geofilter_file_inbox.

Config file contents:

{
   "files": ["basic.json"],  // input data filename. Use "files": "*" to apply a default config to all files
   "format": "json_basic",
   "fields: {},
   "timeformat": "iso"
}
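As noted in the Config files section below, json_basic input is expected to be an array of JSON objects keyed by timestamp, latitude and longitude, with optional keys such as device_id. A minimal Python sketch that writes such a file; the values are made up.

   import json

   # Invented sample records; timestamp, latitude and longitude are the required keys.
   records = [
       {"timestamp": "2020-09-17T14:30:00+00:00", "latitude": 51.5074,
        "longitude": -0.1278, "device_id": "device-42"},
       {"timestamp": "2020-09-17T14:31:00+00:00", "latitude": 51.5080,
        "longitude": -0.1290, "device_id": "device-42"},
   ]

   with open("basic.json", "w") as fh:
       json.dump(records, fh, indent=2)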

Copy the sample basic JSON file to /opt/geofilter_file_inbox.

The file will be parsed, cleaned and processed within 15-30 minutes, depending on the debounce interval. When the status changes to processed, you can pick up the output files from /opt/geofilter_file_outbox.

Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; these can be ignored.

You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:

17-09-2020 15:37:52 basic.json:  status => queued
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 processed => 3041    total => 3041
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 basic.json: basic.json_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/basic.json_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 basic.json: basic.json_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/basic.json_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 basic.json: basic.json_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/basic.json_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 basic.json: basic.json_2020-09-17T14-34-24.060428 Processing complete.
The file processor produces three output files for each input batch, as shown in the Stream JSON, Stream CSV and Trips JSON lines above.

Example 4 - JSON Rnbg

Copy the JSON rnbg config file to /opt/geofilter_file_inbox.

Config file contents:

{
   "files": ["rnbg.json"],  // input data filename. Use "files": "*" to apply a default config to all files
   "format": "json_rnbg",
   "fields: {},
   "timeformat": "iso"
}

Copy the sample rnbg JSON file to /opt/geofilter_file_inbox.

The file will be parsed, cleaned and processed within 15-30 minutes, depending on the debounce interval. When the status changes to processed, you can pick up the output files from /opt/geofilter_file_outbox.

Always copy the config file to the inbox before transferring the data file. If the data file is picked up before the config file, you will see failed_parsing messages in the log; these can be ignored.

You can check the status of file processing by monitoring the contents of the file /opt/geofilter_file_outbox/filemon.log:

17-09-2020 15:37:52 rnbg.json:  status => queued
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 status => parsed
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 processed => 3041    total => 3041
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Writing output files
17-09-2020 15:39:33 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Stream JSON => /opt/geofilter_file_outbox/rnbg.json_2020-09-17T14-34-24.060428_stream.json
17-09-2020 15:39:34 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Stream CSV => /opt/geofilter_file_outbox/rnbg.json_2020-09-17T14-34-24.060428_stream.csv
17-09-2020 15:39:34 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Trips JSON => /opt/geofilter_file_outbox/rnbg.json_2020-09-17T14-34-24.060428_trips.json
17-09-2020 15:39:34 rnbg.json: rnbg.json_2020-09-17T14-34-24.060428 Processing complete.
The file processor produces three output files for each input batch, as shown in the Stream JSON, Stream CSV and Trips JSON lines above.
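Once a batch is complete, its three outputs share the batch id shown in the log. A Python sketch that groups the outbox files by batch id, assuming the naming pattern visible in the log lines above:

   import glob
   import os

   OUTBOX = "/opt/geofilter_file_outbox"

   def batch_outputs(input_name):
       """Group the stream JSON, stream CSV and trips JSON files by batch id."""
       batches = {}
       for path in glob.glob(os.path.join(OUTBOX, input_name + "_*")):
           name = os.path.basename(path)
           # e.g. rnbg.json_2020-09-17T14-34-24.060428_stream.json
           batch_id, _, kind = name.rpartition("_")
           batches.setdefault(batch_id, []).append(kind)
       return batches

   print(batch_outputs("rnbg.json"))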

Input file formats

  • Fixed

  • Delimited

  • JSON Basic

  • JSON Rnbg

Config files

Config files specify input file formats. They are JSON files with a .conf extension. A simple .conf file for fixed-width input is shown below.

{
   "files": "*",
   "format": "fixed",
   "fields": {
      "timestamp": [0, 10],
      "longitude": [1, 6],
      "latitude": [2, 6]
   }
}
  • files: list of input filenames to which the config file applies. Specify “*” for default global config.

  • format: Input file format. Allowed values are fixed, delimited, json_basic and json_rnbg.

  • fields: Field names and where to find each field.
    • At a minimum you need to specify timestamp, latitude and longitude.

    • For the fixed-width format, specify the starting column number and width for each field, e.g.:

      {
         "timestamp": [0, 10], //timestamp starts at position 0 with width 10
         "longitude": [10, 6], //longitude starts at position 10 with width 6
         "latitude": [16, 6],
         "device_id": [22, 5]
      }
      
    • For the delimited format, specify the field number; the width is ignored, e.g.:

      {
         "timestamp": [0, 99], //timestamp is the first field (columns are 0-indexed). Width is ignored.
         "longitude": [2, 99], //longitude is field 3 (0-indexed, so index 2)
         "latitude": [3, 99],  //latitude is field 4 (index 3)
         "device_id": [4, 99]  //device id is field 5 (index 4)
      }
      
    • For JSON formats, you don’t need to specify fields in the config file. The file processor expects the input file to contain an array of JSON objects with timestamp, longitude and latitude as keys. The keys read by the file processor are:

      Key                    Required
      timestamp              Y
      latitude               Y
      longitude              Y
      device_id
      altitude
      accuracy
      speed
      activity_type
      activity_confidence

  • delimiter: Character to be used as the delimiter for delimited files.

  • quotechar: Quote character for delimited files. Defaults to a double quote.

  • timeformat: Timestamp string format (see the sketch after this list). Allowed values are:
    • iso: ISO 8601 format. Respects an explicit timezone offset, otherwise defaults to UTC.

    • unix: Unix epoch seconds, UTC.
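The two timeformat values differ only in how the raw timestamp string is interpreted. A Python sketch of the equivalent interpretation (illustrative only, not the processor's own code):

   from datetime import datetime, timezone

   def parse_timestamp(raw, timeformat):
       """Interpret a raw timestamp string the way the two timeformat settings describe."""
       if timeformat == "iso":
           dt = datetime.fromisoformat(raw)          # respects an explicit offset
           if dt.tzinfo is None:
               dt = dt.replace(tzinfo=timezone.utc)  # no offset given, default to UTC
           return dt
       if timeformat == "unix":
           return datetime.fromtimestamp(float(raw), tz=timezone.utc)  # epoch seconds, UTC
       raise ValueError("timeformat must be 'iso' or 'unix'")

   print(parse_timestamp("2020-09-17T14:30:00+01:00", "iso"))
   print(parse_timestamp("1600349400", "unix"))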

Output files

The file processor produces three output files for each input batch.

1. <batch_id>_stream.json

JSON data with the fields below:

Field                  Description
id                     global record id
row_id                 record id within a batch
timestamp              iso-8601 timestamp (UTC)
tz                     original timezone
batch_id
trip_id
device_id
latitude               decimal, EPSG 4326
longitude              decimal, EPSG 4326
altitude
heading
speed
activity
activity_confidence
accuracy
filter_time            true if time is in the future
filter_accuracy        true if accuracy < 250 m
filter_static          true if static error
filter_jumps           true if GPS jumps
address                OpenStreetMap GeoJSON

See an example

2. <batch_id>_stream.csv

Contents of the corresponding stream JSON file as a flattened CSV.

See an example
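Because the stream CSV is a flat rendering of the same records, it can be read with any CSV library. A sketch that counts how many records tripped each filter flag, assuming the CSV columns carry the field names from the table above and the boolean flags are written as text:

   import csv

   # Example batch from the logs above; substitute your own output filename.
   path = "/opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_stream.csv"
   flags = ["filter_time", "filter_accuracy", "filter_static", "filter_jumps"]

   counts = {flag: 0 for flag in flags}
   with open(path, newline="") as fh:
       for row in csv.DictReader(fh):
           for flag in flags:
               # Assumes the boolean flags are serialised as the text "true"/"True".
               if row.get(flag, "").strip().lower() == "true":
                   counts[flag] += 1
   print(counts)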

3. <batch_id>_trips.json

An array of JSON objects with the fields below:

Field                  Description
id                     trip id
device_id
start_time             iso-8601 timestamp (UTC)
end_time               iso-8601 timestamp (UTC)
timezone_offset        original timezone
metres                 trip distance in metres
route_geojson          route as a GeoJSON LineString
route_polyline         route as a Google encoded polyline
address                OSM GeoJSON start/end address

See an example
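Since the trips file is an array of JSON objects, per-device totals are easy to compute. A sketch that sums metres per device_id, using the field names from the table above; the path is an example batch from the logs earlier in this guide.

   import json
   from collections import defaultdict

   # Example batch from the logs above; substitute your own output filename.
   path = "/opt/geofilter_file_outbox/delimited.csv_2020-09-17T14-34-24.060428_trips.json"

   with open(path) as fh:
       trips = json.load(fh)  # an array of trip objects

   metres_by_device = defaultdict(float)
   for trip in trips:
       metres_by_device[trip.get("device_id")] += float(trip.get("metres", 0))
   print(dict(metres_by_device))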