Small steps to big data: How to retrieve telematics data at Navixy for further analysis

This article explores how telematics data is currently retrieved and analyzed using Navixy’s Raw Data API, which enables access to valuable historical insights. Soon, with the launch of IoT Logic, data management will enter a new phase—featuring real-time processing, dynamic enrichment, and advanced custom attributes. 

When you have dozens of objects to monitor, maintaining operational efficiency is relatively straightforward, as you can manage units and their attributes directly. However, if your fleet contains more than 1,000 units, this task becomes significantly more challenging. It requires additional tools and resources to combine and aggregate data for further analysis. Advanced companies are seeking additional ways to manage and transform joint data into actionable insights.

Collecting and consolidating telematics data manually for each object can be extremely time-consuming. For example, extracting and combining data for a single object through the Navixy GUI can require approximately 30 seconds. For a fleet of 2,000 objects, this operation could take over 16 hours.

Such operations can be further complicated by technological limitations, such as the inability of many spreadsheet editors to efficiently handle files containing millions of rows.

This article presents best practices for automating the extraction of such data using a Python script. It will equip you for further telematics data analysis and enable you to make decisions based on the insights gained.

What benefits does telematics big data offer fleet managers?

Let’s explore some innovative use cases of telematics big data beyond GPS information. Imagine leveraging additional sensor data to gain deeper insights and drive better decisions.

Fleet redistribution for kicksharing. Bike and scooter-sharing providers aim to minimize downtime by strategically redistributing units to high-demand areas. However, cities may want to provide people with more mobility options. For instance, Baltimore's regulations require redistribution whenever more than 35% of the fleet accumulates in one zone (NACTO Shared Micromobility Guidelines).

Optimizing passenger flow. Passenger-counter sensor data reveals peak usage areas and times. According to the American Public Transportation Association, route optimization based on such data can reduce passenger wait times by up to 30%, improving service quality and reducing congestion.

Analyzing road conditions. Accelerometer data indicating harsh braking often correlates with poor road conditions. The Federal Highway Administration reports that improving roads can decrease accidents by up to 25%. This versatile approach also supports more accurate driver assessment, improving both safety and coaching efficiency.

Boosting agricultural productivity. Farmers using GIS and GPS tracking data along with data from different sensors have seen yield increases of up to 30% compared to those relying on traditional methods (FAO & ITU, 2022). Additionally, AI-powered disease detection tools enable early intervention, reducing crop losses by 30–40%.

With the launch of the forthcoming Navixy IoT Logic tool, you will be able to transform your data into actionable insights, driving your business efficiency to the next level.

What is big data as a concept?

“There is still a big question: what is big data? While some skeptics dismiss it as hype and just another trend in IT, others are leveraging this data for business purposes to drive growth and improvement. Regardless of the debate, the volume of big data has been increasing in recent years. A report from IDC forecasts that the global IoT datasphere will expand to 175 zettabytes by 2025. These figures highlight the potential power of big data for decision-making processes in managing fleets, ensuring cost-effectiveness, and achieving other global goals.”

Andrey Melnik, Product Owner, Data Ops team at Navixy

The concept of big data is often characterized by the 3Vs: Volume, Velocity, and Variety.

  • Volume refers to the immense amount of data generated and collected.
  • Velocity pertains to the speed at which data is produced and processed.
  • Variety encompasses the diverse types of data sources and formats.

At Navixy, our responsibility extends beyond these traditional 3Vs. We also ensure the Veracity of the data, confirming that it is accurate, and extract genuine Value from it. Navixy's ultimate goal is to transform this telematics data into meaningful insights that drive real-world improvements.

How does raw data transform into big data?

Discussions of big data typically refer to a vast amount of raw data. Navixy has already implemented a feature that allows you to retrieve raw data for a particular vehicle. This includes geospatial data, sensor readings, status information, and more — essentially, everything the tracker transmits. This data is usually needed for troubleshooting purposes, helping to identify anomalies in the operation of a specific GPS tracker, verify settings against reported values, and ensure that the tracker works effectively.

For fleet management companies, big data encompasses the comprehensive dataset for the entire fleet, which can be leveraged for strategic planning and objectives. The Navixy raw data functionality can be applied to one object at a time. In this article, we will describe how to automate this routine retrieval process and combine the results into a dataset suitable for further analysis.

How to retrieve telematics raw data for your entire fleet?

This section outlines the process of using the Navixy API and Python to gather and store your entire fleet’s raw data for further analysis.

Before you get started, we would like to provide some essential information so that you can plan your time properly. Big data retrieval operations can take a while, depending on:

  • the time window for which you want to retrieve the data;
  • the number of parameters and values you wish to retrieve.

Example
It can take about 25 minutes to retrieve speed, latitude, and longitude data for a fleet of about 1,500 units over a 60-day period. In this case, the resulting CSV file can reach 6 GB and contain over 100 million rows.

Ensure you have already obtained an API key for data retrieval.
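Rather than pasting the key into the script, you can read it from an environment variable, as the TIP in the script itself suggests. A minimal sketch (the variable name NAVIXY_API_KEY is just an example, not a Navixy convention):

```python
import os

# Read the API key hash from the environment instead of hardcoding it
# (NAVIXY_API_KEY is an example variable name)
api_key_hash = os.environ.get("NAVIXY_API_KEY", "")
if not api_key_hash:
    print("NAVIXY_API_KEY is not set; export it before running the script")
```

Export the variable once in your shell (e.g. `export NAVIXY_API_KEY=...`) and the script will pick it up on every run.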

A data-retrieval algorithm

This high-level algorithm scheme will provide an understanding of the general logic:

[Diagram: high-level scheme of the data-retrieval algorithm]

Let’s explore these steps in more detail, with the corresponding code.

Step 1: Retrieve the list of mobility units

  1. Ensure you have the required Python libraries (‘requests’ and ‘pandas’) installed. You will need the requests library for making API calls.
  2. Write a Python script to retrieve the list of units from the Navixy API.
  3. Set up the API parameters:
    - Define your API key and the URL for the Navixy API endpoint.
    - Set the headers and payload for the API request.
  4. Make the API request by using the ‘requests’ library to send a POST request to the Navixy API.
  5. Extract the list of tracker IDs from the API response.

Step 2: Retrieve the raw data for all mobility units

  1. Extend the script to retrieve raw data for each mobility unit using the raw data API.
  2. Identify required parameters:
    - Determine the parameters you need to retrieve, such as speed, latitude, and longitude.
    - Refer to additional documentation for a complete list of retrievable parameters.
  3. Define the API request:
    - Set the URL, headers, and payload for the Raw Data API request.
    - Define the time range and columns for the Raw Data request.
  4. Loop through the tracker IDs. For each tracker ID, send a POST request to the raw data API.
  5. Store the raw data response in a list.

Step 3: Create a single file with all data for further analysis

  1. Process the raw data:
    - Use the ‘pandas’ library to read the raw data into a DataFrame.
    - Add the tracker ID as the first column in the DataFrame.
  2. Ensure all required columns are present, filling the missing columns with NULL.
  3. Combine all data frames into a single DataFrame file.
  4. Save the concatenated DataFrame to a CSV or Parquet file for further analysis.

 

CSV
  Advantages:
  • Simplicity: Easy to read and write, making it accessible for quick telematics data analysis.
  • Compatibility: Widely supported by various tools and platforms.
  • Human-readable: Can be opened and edited in text editors or spreadsheet software.
  Applications:
  • Quick telematics data analysis: Suitable for small to medium-sized datasets where quick, ad-hoc analysis is needed.
  • Data sharing: Easily shareable with non-technical users.

Parquet
  Advantages:
  • Efficiency: Columnar storage format highly efficient for both storage and query performance.
  • Scalability: Ideal for large datasets and big data processing frameworks like Apache Spark.
  • Compression: Supports efficient compression, reducing file size.
  Applications:
  • Big data analysis: Suitable for large-scale data processing and analytics.
  • High-performance queries: Ideal for environments where query performance is critical.
  • Data lakes: Commonly used in data lakes and big data ecosystems.
import requests
import pandas as pd
from io import StringIO

# Step 1: Retrieve the list of tracker IDs
api_key_hash = "032eaea32ad34a1acc345c2"  # Replace with your API key. TIP: this is just an example; avoid hardcoding API keys, use environment variables
url = 'https://api.eu.navixy.com/v2/tracker/list'  # Replace depending on your region
headers = {'Content-Type': 'application/json'}
data = {'hash': api_key_hash}

response = requests.post(url, headers=headers, json=data)
tracker_list = response.json()['list']
tracker_ids = [tracker['id'] for tracker in tracker_list]

# Step 2: Retrieve raw data for each tracker ID
raw_data_url = 'https://api.eu.navixy.com/dwh/v1/tracker/raw_data/read'
raw_data_headers = {
    'accept': 'text/csv',
    'Content-Type': 'application/json'
}

# TIPS: Refactor into functions for readability and reuse 

# Define the time range and columns for the raw data request
from_time = "2025-01-01T00:00:00Z"
to_time = "2025-01-08T23:59:59Z"
columns = [
    "lat",
    "lng",
    "speed"
]

# List to store DataFrames
dataframes = []

# TIPS: Consider multithreading to speed up data collection if required

for tracker_id in tracker_ids:
    raw_data_payload = {
        "hash": api_key_hash,
        "tracker_id": tracker_id,
        "from": from_time,
        "to": to_time,
        "columns": columns
    }

    raw_data_response = requests.post(raw_data_url, headers=raw_data_headers, json=raw_data_payload)
    csv_data = raw_data_response.text

    # Read CSV data into a DataFrame
    df = pd.read_csv(StringIO(csv_data))

    # Add the 'id' column as the first column
    df.insert(0, 'id', tracker_id)

    # Ensure all columns are present, filling missing columns with NULL
    for col in columns:
        if col not in df.columns:
            df[col] = 'NULL'

    dataframes.append(df)

# Step 3: Concatenate all DataFrames
final_df = pd.concat(dataframes, ignore_index=True)

#TIPS: Add timestamp or context to filenames for versioning if you do extraction on regular basis

# Save the final DataFrame to a CSV file
final_df.to_csv('all_raw_data.csv', index=False)

print("Data concatenation complete. Saved to 'all_raw_data.csv'.")
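The multithreading TIP in the script can be sketched with the standard concurrent.futures module. Here, fetch_raw_csv is a hypothetical stand-in for the per-tracker POST request from the loop above; in the real script it would call requests.post(...) and return response.text:

```python
import pandas as pd
from io import StringIO
from concurrent.futures import ThreadPoolExecutor

def fetch_raw_csv(tracker_id):
    # Hypothetical stand-in for the raw data API request;
    # replace with the requests.post(...) call from the main script
    return "lat,lng,speed\n4.22809,9.5264283,28\n"

def fetch_dataframe(tracker_id):
    df = pd.read_csv(StringIO(fetch_raw_csv(tracker_id)))
    df.insert(0, "id", tracker_id)
    return df

tracker_ids = [101, 102, 103]

# Network-bound requests overlap well in threads; keep max_workers modest
# so you stay polite to the API
with ThreadPoolExecutor(max_workers=4) as pool:
    dataframes = list(pool.map(fetch_dataframe, tracker_ids))

final_df = pd.concat(dataframes, ignore_index=True)
```

Because each request spends most of its time waiting on the network, even a handful of worker threads can cut the total retrieval time severalfold for large fleets.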

Expected output

The final CSV/Parquet file will include columns for the tracker ID, message time, latitude, longitude, and speed. Each row will represent a data packet with its own timestamp, allowing for calculations and aggregations. See an example of the CSV file format below.

"id","msg_time","lat","lng","speed"
22334455,"2024-01-30T13:13:14+0600",4.22809,9.5264283,28
22334455,"2024-01-30T13:13:25+0600",4.228095,9.5278333,32
22334455,"2024-01-30T13:13:36+0600",4.227765,9.5293916,39
...

Note
Since data retrieval may take tens of minutes, depending on conditions such as internet speed and your PC's capacity, we recommend testing this script on short timeframes (a few days) to check how it performs in your environment.
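Once the combined file exists, even a multi-gigabyte CSV can be analyzed without loading it all into memory. A sketch of a chunked aggregation with pandas, computing average speed per tracker (column names follow the expected output above):

```python
import pandas as pd

def average_speed_per_tracker(path, chunksize=1_000_000):
    """Stream the combined CSV in chunks and compute mean speed per tracker ID."""
    sums, counts = {}, {}
    for chunk in pd.read_csv(path, chunksize=chunksize, usecols=["id", "speed"]):
        grouped = chunk.groupby("id")["speed"]
        # Accumulate per-chunk sums and counts so memory usage stays flat
        for tracker_id, s in grouped.sum().items():
            sums[tracker_id] = sums.get(tracker_id, 0.0) + float(s)
        for tracker_id, c in grouped.count().items():
            counts[tracker_id] = counts.get(tracker_id, 0) + int(c)
    return {tid: sums[tid] / counts[tid] for tid in sums}
```

For example, `average_speed_per_tracker("all_raw_data.csv")` returns a dictionary mapping each tracker ID to its mean speed over the whole retrieval window.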

Conclusion: Navixy IoT Logic will transform your data into business insights

In this article, we showcased a solution based on our existing Raw Data API, enhanced by a Python script. This combination allows you to collect and store raw data for an entire fleet, preparing it for comprehensive telematics big data analysis. In our next article, we’ll explore data preprocessing methods to ensure high-quality data outcomes.

At Navixy, we recognize the growing demand for telematics big data and its significant potential. To address this need, we have embarked on a substantial big data initiative: the forthcoming Navixy IoT Logic launch will provide partners and customers with data and tools designed to simplify extracting, transforming, loading, and visualizing telematics data.

Ready to transform your telematics data management?

Contact Navixy today to schedule a consultation and discover how the Navixy IoT Logic can reshape your business.
