Near-real-time signals from a fleet operating on the ground are useful to businesses in a number of ways. For example, businesses can use these signals to:
- Monitor the performance of their fleet and identify potential problems early on
- Improve customer service by providing accurate ETAs and tracking information
- Reduce costs by identifying and addressing inefficiencies
- Improve safety by monitoring driver behavior and identifying potential hazards
- Optimize driver routes and schedules to improve efficiency
- Comply with regulations by tracking vehicle location and hours of service
This document illustrates how developers can turn signals from Google Maps Platform's "Mobility services" ("Last Mile Fleet Solution" (LMFS) or "On-demand Rides and Deliveries Solution" (ODRD)) into actionable custom events. It also covers key concepts and design decisions of the Fleet Events Reference Solution, available on GitHub.
This document is relevant to:
- Architects familiar with Google Maps Platform's "Mobility services" and one of its core components, "Fleet Engine". For those new to "Mobility services", we recommend familiarizing yourself with the Last Mile Fleet Solution and/or On-demand Rides and Deliveries Solution, depending on your needs.
- Architects familiar with Google Cloud. For those new to Google Cloud, Building streaming data pipelines on Google Cloud is a recommended pre-read.
- If you are targeting other environments or software stacks, focus on understanding Fleet Engine's integration points and key considerations, which should still apply.
- Those with general interest in exploring how events from fleets can be generated and utilized.
By the end of this document, you should have a foundational understanding of the key elements and considerations of a streaming system, along with building blocks from Google Maps Platform and Google Cloud that make up the Fleet Events Reference Solution.
Fleet Events Reference Solution Overview
The Fleet Events Reference Solution is an open source solution that enables Mobility customers and partners to generate key events on top of Fleet Engine and Google Cloud components. Today, the reference solution supports customers using the Last Mile Fleet Solution, with support for the On-demand Rides and Deliveries Solution to follow.
The solution automatically generates events based on changes to specific data associated with tasks or trips. You can use these events to send notifications such as the following to stakeholders or trigger other actions for your fleet.
- ETA change for task arrival
- Relative ETA change for task arrival
- Time remaining to task arrival
- Distance remaining to task arrival
- TaskOutcome status change
Each component of the reference solution can be customized to suit your business needs.
Logical Building blocks
Diagram: The following diagram shows the high-level building blocks that make up the Fleet Events Reference Solution.
The reference solution contains the following components:
- Event Source: Where the original event stream comes from. Both the "Last Mile Fleet Solution" and the "On-demand Rides and Deliveries Solution" integrate with Cloud Logging, which helps turn Fleet Engine RPC call logs into event streams available to developers. This is the core source to consume.
- Processing: This block computes over the stream of log events, converting raw RPC call logs into state change events. To detect a change, it requires a state store so that each new incoming event can be compared with previous ones. Events might not always include all the information of interest; in such cases, this block can make lookup calls to backends as needed.
- State store: Some processing frameworks provide intermediate data persistence on their own. If yours does not, you need to store state yourself; because state entries should be unique to a vehicle and event type, a key-value (K-V) data persistence service is a good fit.
- Sink (Custom Events): Detected state changes should be made available to any application or service that can benefit from them. Therefore, publishing these custom events to an event delivery system for downstream consumption is a natural choice.
- Downstream service: Code that consumes the generated events and takes actions unique to your use case.
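The "Processing" and "State store" blocks above can be sketched with a minimal example: compare each incoming ETA against the last stored value and emit a custom event when the change exceeds a threshold. The class, method, and field names are illustrative, and the in-memory map is a stand-in; the reference solution persists state in Cloud Firestore.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Illustrative sketch of the "Processing" block: compare an incoming
 * task ETA against the previously stored ETA and report a change when
 * it exceeds a threshold. Names are hypothetical; the reference
 * solution stores state in Cloud Firestore, not an in-memory map.
 */
public class EtaChangeDetector {

  /** Last known ETA per vehicle (the state store stand-in). */
  private final Map<String, Instant> lastKnownEta = new HashMap<>();
  private final Duration threshold;

  public EtaChangeDetector(Duration threshold) {
    this.threshold = threshold;
  }

  /**
   * Returns a description of the change when the new ETA differs from
   * the stored one by more than the threshold; empty otherwise. Always
   * updates the stored state so the next comparison uses the latest value.
   */
  public Optional<String> onEtaUpdate(String vehicleId, Instant newEta) {
    Instant previous = lastKnownEta.put(vehicleId, newEta);
    if (previous == null) {
      return Optional.empty(); // first observation; nothing to compare against
    }
    Duration delta = Duration.between(previous, newEta).abs();
    if (delta.compareTo(threshold) > 0) {
      return Optional.of(
          "ETA for " + vehicleId + " changed by " + delta.toMinutes() + " min");
    }
    return Optional.empty();
  }
}
```

A detected change would then be handed to the sink for publishing, and the downstream service would decide what action to take.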
Service selection
When implementing the reference solution for "Last Mile Fleet Solution" or "On-demand Rides and Deliveries Solution" (coming late Q3 2023), the technology selection for "Source" and "Sink" is straightforward. "Processing", on the other hand, has a wide range of options. The reference solution uses the following Google services.
Diagram: The following diagram shows the Google Cloud services used to implement the reference solution.
Cloud Project layout
We recommend that you default to a multi-project deployment, so that Google Maps Platform and Google Cloud consumption can be cleanly separated and tied to your billing arrangement of choice.
Event Source
"Last Mile Fleet Solution" and "On-demand Rides and Deliveries Solution" write API request and response payloads to Cloud Logging, which can deliver logs to one or more destinations of your choice. Routing to Cloud Pub/Sub is ideal here, because it converts logs into an event stream without any coding.
- Logging | Fleet Performance (for LMFS users)
- Logging | Trip and Order Progress (for ODRD users)
- View logs routed to Pub/Sub : Logging → Pub/Sub integration overview
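As a hedged illustration of what this routing looks like in Terraform (the tool the reference solution uses for deployment), the fragment below creates a Pub/Sub topic and a Cloud Logging sink that routes Fleet Engine logs to it. The project IDs and the log filter are placeholders; the reference solution's logging module sets the real values up for you.

```
# Illustrative only: route Fleet Engine logs to a Pub/Sub topic.
resource "google_pubsub_topic" "fleetengine_logs" {
  project = "my-events-project" # placeholder
  name    = "fleetengine-logs"
}

resource "google_logging_project_sink" "fleetengine_to_pubsub" {
  project     = "my-maps-project" # placeholder
  name        = "fleetengine-to-pubsub"
  destination = "pubsub.googleapis.com/${google_pubsub_topic.fleetengine_logs.id}"

  # Match Fleet Engine platform logs; adjust the filter to the log names
  # you actually want to consume.
  filter = "logName:\"logs/fleetengine.googleapis.com\""
}
```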
Sink
In Google Cloud, Cloud Pub/Sub is the near-real time message delivery system of choice. Just like how the events from the source were delivered to Pub/Sub, custom events are also published to Pub/Sub for downstream consumption.
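As a sketch of what publishing such a custom event might involve, the following builds a message payload for a detected state change. The JSON field names are illustrative, not the reference solution's actual schema, and the Pub/Sub client call is shown only as a comment.

```java
/**
 * Sketch of the "Sink" step: serialize a detected state change into a
 * message body for publishing to a Pub/Sub topic. The field names are
 * hypothetical, not the reference solution's schema.
 */
public class CustomEventPublisherSketch {

  /** Builds a JSON message payload for a detected state change. */
  public static String buildPayload(String eventType, String vehicleId, long epochMillis) {
    return String.format(
        "{\"eventType\":\"%s\",\"vehicleId\":\"%s\",\"eventTimeMillis\":%d}",
        eventType, vehicleId, epochMillis);
  }

  // In the actual function, the payload would be published with the
  // Pub/Sub client library, roughly:
  //   publisher.publish(PubsubMessage.newBuilder()
  //       .setData(ByteString.copyFromUtf8(payload)).build());
}
```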
Processing
The following components play a role in event processing. Like the other building blocks, the processing components are completely serverless and scale well both up and down.
- Cloud Functions as the compute platform for the initial release (*)
- Serverless, scales up and down with scaling controls to manage costs
- Java as the programming language, given the availability of client libraries for Fleet Engine related APIs, which eases implementation
- Cloud Firestore as a state store
- Serverless Key-Value store
- Cloud Pub/Sub as the integration point with upstream and downstream components
- Loosely coupled near-real time integration
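When Cloud Logging routes a log entry to Pub/Sub, the function receives the LogEntry as base64-encoded message data, so the first processing step is to decode it back into JSON before extracting fields. A minimal sketch (the class name is hypothetical, and the reference solution does proper log-entry parsing beyond this):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Sketch of the first step inside the processing function: the Pub/Sub
 * message carries the Cloud Logging entry as base64-encoded data, which
 * must be decoded into LogEntry JSON before parsing.
 */
public class LogEventDecoder {

  /** Decodes the base64 "data" field of a Pub/Sub message into JSON text. */
  public static String decodeData(String base64Data) {
    return new String(Base64.getDecoder().decode(base64Data), StandardCharsets.UTF_8);
  }
}
```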
The functions can be used as-is with their default settings, but they can also be reconfigured. Configuration parameters are set through deployment scripts and are documented in detail in the corresponding Terraform module READMEs.
*Note: This reference solution plans to release alternative implementations that can help meet different requirements.
Deployment
To make the reference solution deployment process repeatable, customizable, manageable in source control, and secure, Terraform was chosen as the automation tool. Terraform is a widely adopted IaC (Infrastructure as Code) tool with rich support for Google Cloud.
- Google Cloud Platform Provider: Documentation of the resources supported by the "Google Cloud Platform Provider"
- Best practices for using Terraform: Rich guidance on how best to adopt Terraform in Google Cloud
- Terraform Registry: additional modules supported by Google and the community
Terraform modules
Instead of making one large monolithic reference solution deployment module, reusable blocks of automation are implemented as Terraform modules that can be used independently. Modules provide a wide range of configurable variables, most of which have default values so that you can get started quickly, while retaining the flexibility to customize based on your needs and preferences.
Modules included in the reference solution:
- Fleet Engine logging configuration: Automates the Cloud Logging configuration for use with Fleet Engine. In the reference solution, it routes Fleet Engine related logs to a specified Pub/Sub topic.
- Fleet Events cloud function deployment: Contains the sample function code and its deployment, and also automates the permission settings required for secure cross-project integration.
- Whole reference solution deployment: Calls the previous two modules and wraps the entire solution.
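Composing these modules might look like the hedged fragment below. The module path and variable names are placeholders; see the module READMEs for the actual inputs and their defaults.

```
# Illustrative composition of the reference solution's Terraform modules.
module "fleet_events" {
  source = "./modules/fleet-events" # placeholder path

  # Typical inputs (names are placeholders): the Maps project running
  # Fleet Engine and the Cloud project hosting the processing components.
  # fleetengine_project = "my-maps-project"
  # gcp_project         = "my-events-project"
}
```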
Security
IAM is used to apply least-privilege principles, along with Google Cloud security best practices such as service account impersonation. See the following articles to better understand what Google Cloud offers to give you more control over security.
Next actions
You are now ready to access and further explore the Fleet Events Reference Solution. Head to GitHub to get started.
Appendix
Gather your requirements
We recommend that you gather your requirements early in the process.
First, capture the details of why you are interested in or need to use near-real-time events. Here are some questions to help you crystallize your needs.
- What information is required for an event stream to be useful?
- Can the outcome be derived purely from data captured or produced in the Google services? Or, is data enrichment with integrated external systems required? If so, what are those systems and what integration interfaces do they offer?
- What are the metrics you would like to measure as a business? How are they defined?
- If you need to compute metrics across events, what kind of aggregation would that require? Try to lay out the logical steps. (For example: compare ETA/ATA against SLOs across a subset of the fleet during peak hours to compute performance under resource constraints.)
- Why are you interested in an event-based model rather than batch processing? Is it for lower latency (time-to-action) or for loosely coupled integration (agility)?
- If for low latency, define "low". Minutes? Seconds? Sub-second? And latency of what, measured between which points?
- Have you already invested in a technology stack and related skills as a
team? If so, what is it and what integration points does it provide?
- Are there any requirements that your current systems cannot meet or may struggle with when processing events coming from your fleet?
Design principles
It is always useful to have a thought process to follow. It helps you make consistent design decisions, especially when you have a variety of options to choose from.
- Default to simpler options.
- Default to shorter time-to-value. Less code, lower learning curve.
- For latency and performance, aim to meet the bar you have set rather than maximizing. Avoid extreme optimization, as it often adds complexity.
- The same goes for cost: keep it reasonable. You might not yet be at the stage where you can commit to high-value but relatively more expensive services.
- At an experimental phase, scaling down can be as important as scaling up. Consider a platform that gives you the flexibility to scale up with a cap and also to scale down (ideally to zero) so that you don't spend money while doing nothing. High-performance, always-on infrastructure can be considered later in the journey, once you are confident it is needed.
- Observe and measure so that you can later identify where to focus further work.
- Keep services loosely coupled. This makes it easier to swap out individual pieces later on.
- Last but not least, security cannot be loose. As a service running on a public cloud environment, there cannot be any unsecured doors to the system.
Streaming concepts
If you are relatively new to event-based or streaming systems, there are key concepts worth being aware of, some of which differ significantly from batch processing.
- Scale: In contrast to batch processing, where you usually have a good idea of the amount of data to process, in streaming you do not. A traffic jam in a city can suddenly generate many events from a particular area, and you still need to be able to process them.
- Windowing: Instead of processing events one by one, you often want to group events along a timeline into smaller "windows" as the unit of computation. There are various windowing strategies, such as fixed windows (for example, every calendar day), sliding windows (the last 5 minutes), and session windows (the duration of a trip). The longer the window, the longer the delay in producing results. Choose the model and configuration that meet your requirements.
- Triggering: In some cases you have no choice but to use relatively long windows. Still, you may not want to wait until the very end of the window to produce events, but instead emit intermediate results along the way. This is useful for use cases where there is value in returning quick results first and correcting them later. Imagine emitting intermediate status at 25%, 50%, and 75% completion of a delivery.
- Ordering: Events don't necessarily reach the system in the order they were generated, especially for use cases involving mobile networks, which add delay and complex routing paths. You need to be aware of the difference between "event time" (when the event actually happened) and "processing time" (when the event reached the system) and handle events accordingly. In general, you want to process events based on event time.
- Message delivery, at-least-once versus exactly-once: Different event platforms offer different delivery guarantees. Depending on your use case, you need to consider retry or deduplication strategies.
- Completeness: Just as ordering can change, messages can be lost. Causes include an application or device shutting down when its battery runs out, accidental damage to the phone, lost connectivity in a tunnel, or a message that arrived but only outside an acceptable window. How would incompleteness affect your outcomes?
This is not a complete list but an introduction. Here are some highly recommended reads that can help you further deepen your understanding of each.
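To make the windowing concept concrete, the sketch below assigns an event, by its event time, to the fixed window that contains it. This is purely illustrative; streaming frameworks such as Apache Beam provide windowing (and triggering) out of the box.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Sketch of fixed windowing: map an event's event time to the start of
 * the fixed-size window containing it. Illustrative only.
 */
public class FixedWindows {

  /** Returns the start of the fixed window that contains eventTime. */
  public static Instant windowStart(Instant eventTime, Duration windowSize) {
    long sizeMillis = windowSize.toMillis();
    // Floor the event timestamp to the nearest window boundary.
    long start = Math.floorDiv(eventTime.toEpochMilli(), sizeMillis) * sizeMillis;
    return Instant.ofEpochMilli(start);
  }
}
```

For example, with 5-minute windows, an event at 10:07:30 lands in the window starting at 10:05:00.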
Contributors
Google maintains this document. The following contributors originally wrote it.
Principal authors:
- Mary Pishny | Product Manager, Google Maps Platform
- Ethel Bao | Software Engineer, Google Maps Platform
- Mohanad Almiski | Software Engineer, Google Maps Platform
- Naoya Moritani | Solutions Engineer, Google Maps Platform