Privacy Notice

Data Privacy & Engineering Standards

Guidelines for handling Personally Identifiable Information (PII) within the freight data lakehouse.

1. The Context of Freight Privacy

In logistics, the line between “Industrial Asset” (the truck) and “Individual” (the driver) is often blurred. An ELD (Electronic Logging Device) records engine telemetry, but it also records Hours of Service (HOS), which are legal records of a human’s work and rest periods. Furthermore, owner-operators often use their home address as their business registration address.

Therefore, BigData-ETL treats all telemetry data as Restricted by default until proven otherwise. We adhere to GDPR (EU), CCPA (California), and general global data protection standards.

2. Data Classification Levels

| Level | Examples | Storage Requirement |
|---|---|---|
| L1: Public | Facility Addresses, Carrier SCAC Codes, Public Weather Data. | Cleartext. No encryption required. |
| L2: Internal | Shipment Values, Load Contents, Contract Rates, Lane Volumes. | Encrypted at rest (AES-256). Role-Based Access Control (RBAC). |
| L3: PII / Restricted | Driver Name, Driver Phone, CDL Number, Home/Personal Geofences. | Column-Level Encryption or Tokenization. Masked in BI tools. |

3. Geo-Privacy and “Home Hiding”

A specific privacy attack vector in freight is “inferring driver home location” by analyzing GPS stops during off-duty hours. To mitigate this risk, our ingestion pipeline applies a Geo-Masking algorithm.

  • Trigger: the driver's status changes to OFF_DUTY or SLEEPER_BERTH, AND the location is NOT a known Facility or Truck Stop.
  • Action: the system truncates the GPS coordinates to 2 decimal places (roughly 1.1 km of latitude) before writing to the Silver layer.
  • Result: regional analytics (e.g., “how many drivers are in Texas”) remain possible without pinpointing a driver’s driveway.
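
The masking rule above can be sketched in a few lines of Python. This is a minimal illustration, not the production pipeline: the `Ping` record, the `at_known_site` flag (standing in for the Facility/Truck Stop lookup), and the function names are all hypothetical, and the sketch checks the current status rather than detecting a status change.

```python
import math
from dataclasses import dataclass, replace

# Duty statuses that trigger masking, as named in the rule above.
MASKED_STATUSES = {"OFF_DUTY", "SLEEPER_BERTH"}


@dataclass(frozen=True)
class Ping:
    """Hypothetical telemetry record used only for this sketch."""
    driver_uuid: str
    status: str
    lat: float
    lon: float
    at_known_site: bool  # assumed result of a Facility/Truck Stop lookup


def truncate(value: float, places: int = 2) -> float:
    """Drop digits beyond `places` decimals (truncate toward zero, no rounding)."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def geo_mask(ping: Ping) -> Ping:
    """Return a copy with coarse coordinates when the masking rule applies."""
    if ping.status in MASKED_STATUSES and not ping.at_known_site:
        return replace(ping, lat=truncate(ping.lat), lon=truncate(ping.lon))
    return ping


raw = Ping("uuid-123", "OFF_DUTY", 32.77912, -96.79701, at_known_site=False)
masked = geo_mask(raw)  # coordinates become (32.77, -96.79)
```

Truncation is used here because the text says "truncates"; rounding to 2 decimals would give the same grid size with a different offset.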

4. The “Right to be Forgotten” (RTBF)

Under GDPR, a driver may request the deletion of their personal data. In a Lakehouse architecture based on immutable Parquet files, we cannot simply “delete a row” without rewriting potentially terabytes of data.

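Technical Implementation: We use a “Crypto-Shredding” approach where applicable (destroying the encryption key for a driver’s data so that the remaining ciphertext is unreadable), or Delta Lake `DELETE` operations followed by vacuuming so that the deleted rows are also purged from the underlying data files.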

  1. Identity Mapping: All driver PII is stored in a separate restricted `Driver_PII` table, linked to the telemetry by a surrogate `driver_uuid`.
  2. Deletion Request: When a request is verified, we delete the record from the `Driver_PII` table.
  3. Anonymization: The telemetry data (speed, location, timestamps) remains in the `Telemetry` table but is now orphaned. It is associated with a `driver_uuid` that no longer resolves to a name or phone number. The data is effectively anonymized and retained for aggregate statistical modeling (e.g., traffic patterns).
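
The three steps above can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database purely as a stand-in for the lakehouse tables; the column names beyond `driver_uuid` and the helper function names are assumptions made for the example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create stand-in Driver_PII and Telemetry tables with one driver."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Driver_PII (driver_uuid TEXT PRIMARY KEY, name TEXT, phone TEXT);
        CREATE TABLE Telemetry (driver_uuid TEXT, speed_mph REAL, lat REAL, lon REAL, ts TEXT);
        """
    )
    conn.execute(
        "INSERT INTO Driver_PII VALUES (?, ?, ?)",
        ("uuid-123", "Example Driver", "555-0100"),
    )
    conn.execute(
        "INSERT INTO Telemetry VALUES (?, ?, ?, ?, ?)",
        ("uuid-123", 61.5, 32.77, -96.79, "2024-01-01T08:00:00Z"),
    )
    conn.commit()
    return conn


def forget_driver(conn: sqlite3.Connection, driver_uuid: str) -> int:
    """Step 2: delete the identity mapping; returns the number of rows removed."""
    cur = conn.execute("DELETE FROM Driver_PII WHERE driver_uuid = ?", (driver_uuid,))
    conn.commit()
    return cur.rowcount


def resolve_name(conn: sqlite3.Connection, driver_uuid: str):
    """Look up a driver's name; None once the mapping has been deleted."""
    row = conn.execute(
        "SELECT name FROM Driver_PII WHERE driver_uuid = ?", (driver_uuid,)
    ).fetchone()
    return row[0] if row else None


conn = build_demo_db()
forget_driver(conn, "uuid-123")
# Step 3: telemetry rows remain, but the uuid no longer resolves to a person.
orphaned = conn.execute("SELECT COUNT(*) FROM Telemetry").fetchone()[0]
```

In Delta Lake the equivalent `DELETE` only marks rows as removed in the transaction log; the old Parquet files still contain them until they are vacuumed, which is why the implementation note pairs deletion with vacuuming.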

Engineering Warning

Do not write driver names (or any other L3 field) into log messages or error outputs. If a pipeline fails on a specific driver’s record, log the `driver_uuid` only. Logs are often ingested into external systems (like Splunk/Datadog) whose retention policies differ from those of our primary data lake, so PII that leaks into a log line falls outside the deletion process described above.
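
A minimal sketch of PII-safe failure logging with Python's standard `logging` module follows. The logger name, the record layout, and the `log_record_failure` helper are hypothetical. The sketch logs only the exception type, on the assumption that exception messages may themselves echo field values from the failing record.

```python
import logging

logger = logging.getLogger("etl.telemetry")  # hypothetical logger name


def log_record_failure(record: dict, error: Exception) -> None:
    """Log a failed record by surrogate key only, never by name or phone."""
    logger.error(
        "Failed to process telemetry record: driver_uuid=%s error_type=%s",
        record.get("driver_uuid"),
        type(error).__name__,  # the message text may contain PII, so omit it
    )


record = {"driver_uuid": "uuid-123", "driver_name": "Example Driver"}
try:
    raise ValueError(f"bad row for {record['driver_name']}")
except ValueError as exc:
    log_record_failure(record, exc)
```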

5. Data Retention Policies

Storage is cheap, but liability is expensive. We enforce Time-To-Live (TTL) on data layers:

  • Bronze (Raw JSON): 30 days (used for replay/debugging only).
  • Silver (Conformed): 7 years (legal requirement for financial audits).
  • Gold (Aggregated): Indefinite (high-level stats, no PII).
  • Driver PII: Deleted 1 year after contract termination, unless a legal hold is active.
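
The retention rules above could be encoded roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, and "7 years" and "1 year" are approximated as multiples of 365 days, which a real implementation would need to align with the applicable legal definition.

```python
from datetime import date, timedelta
from typing import Optional

# Retention per layer in days; None means the data is kept indefinitely.
RETENTION_DAYS = {
    "bronze": 30,
    "silver": 7 * 365,  # assumption: 7 years approximated as 2555 days
    "gold": None,
}


def layer_expiry(layer: str, written_on: date) -> Optional[date]:
    """Date on which data written to `layer` becomes eligible for deletion."""
    days = RETENTION_DAYS[layer]
    return None if days is None else written_on + timedelta(days=days)


def pii_deletion_due(terminated_on: date, today: date, legal_hold: bool = False) -> bool:
    """Driver PII is due for deletion 1 year after termination unless on legal hold."""
    if legal_hold:
        return False
    return today >= terminated_on + timedelta(days=365)


bronze_expiry = layer_expiry("bronze", date(2024, 1, 1))  # date(2024, 1, 31)
```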

For Data Subject Access Requests (DSAR), please contact the Data Protection Officer at [email protected].