Push Notification Analytics Roadmap
Status: Draft
Target Audience: Developers, Data Scientists
Goal: Implement comprehensive analytics for the Push Notification subsystem to ensure reliability in real-world deployments (Restaurant Industry).
1. Executive Summary
As HOMEPOT transitions from simulation to real-world deployment in the restaurant industry, relying solely on "Job Success" metrics is insufficient. We need granular visibility into the delivery pipeline of push notifications.
This roadmap outlines the implementation of a Push Analytics Module that tracks the lifecycle of every message, measures latency, identifies network bottlenecks, and provides actionable insights to technicians.
Key Metrics to Track:
1. Delivery Rate: Percentage of messages successfully acknowledged by devices.
2. End-to-End Latency: Time difference between sent_at (Server) and received_at (Device).
3. Provider Reliability: Success rates per provider (FCM vs. APNs vs. MQTT).
4. Device Reachability: Identifying "Zombie Devices" that are online but not receiving pushes.
2. Architecture Strategy
2.1 The "Ack" Loop
Currently, push notifications are "Fire and Forget". We will implement a "Fire and Acknowledge" pattern.
- Server sends Push with a unique message_id and sent_at timestamp.
- Device receives Push (background wake-up).
- Device immediately calls POST /api/v1/push/ack with message_id and received_at.
- Server calculates latency and updates the PushNotificationLog.
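The server-side half of this loop can be sketched as follows. This is a minimal illustration, not the HOMEPOT implementation: `PushLogEntry` and `apply_ack` are hypothetical names, and an in-memory dataclass stands in for a database row. It also assumes the device clock is roughly in sync with the server, so negative differences caused by clock skew are clamped to zero.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PushLogEntry:
    """In-memory stand-in for one PushNotificationLog row (hypothetical)."""
    message_id: str
    sent_at: datetime
    received_at: Optional[datetime] = None
    latency_ms: Optional[int] = None
    status: str = "sent"


def apply_ack(entry: PushLogEntry, received_at: datetime) -> PushLogEntry:
    """Record a device acknowledgment and derive the end-to-end latency."""
    delta = received_at - entry.sent_at
    # Integer division by one millisecond avoids floating-point rounding.
    latency = delta // timedelta(milliseconds=1)
    # Clamp at zero: device clock skew can make received_at precede sent_at.
    entry.latency_ms = max(0, latency)
    entry.received_at = received_at
    entry.status = "delivered"
    return entry


sent = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
entry = PushLogEntry(message_id="msg-001", sent_at=sent)
apply_ack(entry, datetime(2024, 1, 1, 12, 0, 0, 350000, tzinfo=timezone.utc))
print(entry.status, entry.latency_ms)  # delivered 350
```

Because latency is derived from two different clocks, a real deployment would probably need to account for clock drift on the devices before trusting small latency values.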
2.2 Database Schema
We need a dedicated model to track individual messages, separate from the high-level "Jobs".
Proposed Model:
from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, Text

# Base (the declarative base) and utc_now (a timezone-aware "now" helper)
# are assumed to be provided by the existing HOMEPOT models module.


class PushNotificationLog(Base):
    __tablename__ = "push_notification_logs"

    id = Column(Integer, primary_key=True)
    message_id = Column(String(100), unique=True, index=True)

    # Context
    device_id = Column(String(100), ForeignKey("devices.device_id"))
    job_id = Column(String(100), ForeignKey("jobs.job_id"), nullable=True)
    provider = Column(String(20))  # fcm, apns, mqtt

    # Timestamps
    sent_at = Column(DateTime(timezone=True), default=utc_now)
    received_at = Column(DateTime(timezone=True), nullable=True)

    # Metrics
    latency_ms = Column(Integer, nullable=True)  # Calculated on ack
    status = Column(String(20))  # sent, delivered, failed, expired

    # Error Tracking
    error_code = Column(String(50), nullable=True)
    error_message = Column(Text, nullable=True)
3. Implementation Roadmap
Phase 1: Backend Infrastructure
- Task 1.1: Create PushNotificationLog model and migration.
- Task 1.2: Update PushNotificationProvider base class to return message_id and log the "Send" event.
- Task 1.3: Create POST /api/v1/push/ack endpoint to handle device acknowledgments.
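One possible shape for Task 1.2 is sketched below. Only the class name `PushNotificationProvider` comes from this roadmap; the `send` and `_deliver` methods, the `FakeProvider` subclass, and the list used in place of the PushNotificationLog table are assumptions made for illustration.

```python
import uuid
from abc import ABC, abstractmethod
from datetime import datetime, timezone
from typing import Dict, List


class PushNotificationProvider(ABC):
    """Sketch of a base class that returns a message_id and logs each send."""

    name = "base"

    def __init__(self, send_log: List[Dict]):
        # A plain list stands in for the PushNotificationLog table.
        self.send_log = send_log

    @abstractmethod
    def _deliver(self, device_id: str, payload: Dict) -> None:
        """Hand the payload to the underlying service (FCM, APNs, MQTT)."""

    def send(self, device_id: str, data: Dict) -> str:
        message_id = str(uuid.uuid4())
        sent_at = datetime.now(timezone.utc)
        # The device needs both fields to build its acknowledgment.
        payload = {**data, "message_id": message_id, "sent_at": sent_at.isoformat()}
        record = {
            "message_id": message_id,
            "device_id": device_id,
            "provider": self.name,
            "sent_at": sent_at,
            "status": "sent",
            "error_message": None,
        }
        try:
            self._deliver(device_id, payload)
        except Exception as exc:  # broad on purpose: any provider error is logged
            record["status"] = "failed"
            record["error_message"] = str(exc)
        self.send_log.append(record)
        return message_id


class FakeProvider(PushNotificationProvider):
    """Test double that keeps payloads in memory instead of sending them."""

    name = "fake"

    def __init__(self, send_log: List[Dict]):
        super().__init__(send_log)
        self.outbox = []

    def _deliver(self, device_id: str, payload: Dict) -> None:
        self.outbox.append((device_id, payload))


log: List[Dict] = []
provider = FakeProvider(log)
message_id = provider.send("pos-terminal-1", {"command": "restart"})
```

Keeping the logging in the base class means each concrete provider only implements `_deliver`, so every channel produces the same "Send" record.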
Phase 2: Client Integration (SDK)
- Task 2.1: Update the HOMEPOT Client SDK (Android/iOS/Windows) to:
  - Extract message_id and sent_at from payload.
  - Call the Ack endpoint immediately upon receipt.
- Task 2.2: Update simulation.py to simulate realistic network latency (e.g., random delays for "bad Wi-Fi").
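For Task 2.2, simulated latency could be drawn from per-network profiles, as in the sketch below. The profile names, delay ranges, and drop probabilities are invented for illustration and are not taken from simulation.py.

```python
import random
from typing import Optional

# Hypothetical network profiles: (min_ms, max_ms, drop_probability).
NETWORK_PROFILES = {
    "good_wifi": (20, 150, 0.0),
    "bad_wifi": (500, 8000, 0.15),
}


def simulate_delivery(profile: str, rng: random.Random) -> Optional[int]:
    """Return a simulated latency in ms, or None if the push is dropped."""
    low, high, drop_probability = NETWORK_PROFILES[profile]
    if rng.random() < drop_probability:
        return None  # the device never acknowledges this message
    return rng.randint(low, high)


rng = random.Random(42)  # seeded so simulation runs are repeatable
samples = [simulate_delivery("bad_wifi", rng) for _ in range(1000)]
delivered = [s for s in samples if s is not None]
print(f"delivered {len(delivered)} of {len(samples)} simulated pushes")
```

Passing in a seeded `random.Random` rather than using the module-level functions keeps simulation runs reproducible, which helps when comparing analytics output between runs.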
Phase 3: Analytics Engine
- Task 3.1: Implement "Stale Message Detector" background task.
  - Mark messages as EXPIRED if not acknowledged within TTL (e.g., 5 mins).
- Task 3.2: Create Aggregation Queries:
  - Average Latency per Site.
  - Delivery Rate per Provider.
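The Phase 3 tasks can be sketched over plain dictionaries. This is an illustration under several assumptions: rows are dicts rather than ORM objects, `site_id` is assumed to come from a join with the devices table (it is not a column of the proposed model), and the delivery rate ignores messages that are still in flight.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

TTL = timedelta(minutes=5)  # example TTL from Task 3.1


def expire_stale(logs, now, ttl=TTL):
    """Mark unacknowledged 'sent' messages older than ttl as 'expired'."""
    expired = 0
    for row in logs:
        if row["status"] == "sent" and now - row["sent_at"] > ttl:
            row["status"] = "expired"
            expired += 1
    return expired


def delivery_rate_per_provider(logs):
    """Delivered share of resolved messages; in-flight 'sent' rows are skipped."""
    totals, delivered = defaultdict(int), defaultdict(int)
    for row in logs:
        if row["status"] == "sent":
            continue
        totals[row["provider"]] += 1
        if row["status"] == "delivered":
            delivered[row["provider"]] += 1
    return {p: delivered[p] / totals[p] for p in totals}


def average_latency_per_site(logs):
    """Mean latency_ms per site over acknowledged messages only."""
    sums, counts = defaultdict(int), defaultdict(int)
    for row in logs:
        if row["latency_ms"] is not None:
            sums[row["site_id"]] += row["latency_ms"]
            counts[row["site_id"]] += 1
    return {s: sums[s] / counts[s] for s in sums}


now = datetime(2024, 1, 1, 12, 10, tzinfo=timezone.utc)


def row(site, provider, status, latency, age_min):
    return {"site_id": site, "provider": provider, "status": status,
            "latency_ms": latency, "sent_at": now - timedelta(minutes=age_min)}


logs = [
    row("site-a", "fcm", "delivered", 200, 1),
    row("site-a", "fcm", "delivered", 400, 2),
    row("site-b", "mqtt", "delivered", 100, 1),
    row("site-b", "fcm", "sent", None, 6),   # older than the TTL
    row("site-b", "mqtt", "sent", None, 1),  # still in flight
]

expired_count = expire_stale(logs, now)
rates = delivery_rate_per_provider(logs)
latencies = average_latency_per_site(logs)
```

In production these aggregations would more likely be SQL GROUP BY queries against push_notification_logs; the Python version only shows the intended semantics.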
Phase 4: Visualization (Dashboard)
- Task 4.1: Add "Push Health" widget to Dashboard.
  - Heatmap of delivery latency.
  - Alerts for sites with high failure rates.
4. Success Criteria
- Visibility: Technicians can see exactly when a device received a command.
- Troubleshooting: We can distinguish "Device Offline" from "Push Provider Failure".
- Optimization: Data allows us to choose the fastest provider for each site (e.g., "Site A works better with MQTT").
5. Handling Unreachable Devices (The Fallback Strategy)
A critical concern in diverse deployments (especially industrial POS) is devices that cannot use standard push services (FCM/APNs). This includes:
- AOSP Devices: Android terminals without Google Play Services (no FCM).
- Legacy Systems: Windows Embedded/IoT devices without WNS support.
- Strict Firewalls: Networks blocking googleapis.com or apple.com.
5.1 The MQTT Fallback Layer
To maximize reachability, HOMEPOT will implement an MQTT Fallback Layer for devices that standard push services cannot reach.
- Primary Channel: Standard Push (FCM/APNs) - Preferred for battery efficiency.
- Secondary Channel: MQTT (Direct Connection) - Used when Primary fails or is unavailable.
Implementation Plan:
1. Client Logic: The HOMEPOT Agent will attempt to register with FCM/APNs. If it fails (or detects no GMS), it automatically establishes a persistent MQTT connection to the HOMEPOT Broker.
2. Server Logic: The PushNotificationProvider will check the device's capabilities. If fcm_token is missing but mqtt_client_id is present, it routes the message via MQTT.
3. Analytics: The Analytics Engine will track "Fallback Rate" to identify sites with systemic connectivity issues.
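The server-side routing rule and the fallback metric might look like the sketch below. `fcm_token` and `mqtt_client_id` come from the plan above; `DeviceCapabilities`, `apns_token`, `select_provider`, and `fallback_rate` are hypothetical names. The roadmap does not define "Fallback Rate", so it is assumed here to be the share of reachable devices that are routed via MQTT. Runtime fallback after a failed primary send is not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeviceCapabilities:
    """Push-related fields of a device record (apns_token is an assumption)."""
    device_id: str
    fcm_token: Optional[str] = None
    apns_token: Optional[str] = None
    mqtt_client_id: Optional[str] = None


def select_provider(device: DeviceCapabilities) -> Optional[str]:
    """Prefer standard push; fall back to MQTT when no push token exists."""
    if device.fcm_token:
        return "fcm"
    if device.apns_token:
        return "apns"
    if device.mqtt_client_id:
        return "mqtt"
    return None  # no known channel reaches this device


def fallback_rate(devices: List[DeviceCapabilities]) -> float:
    """Share of reachable devices that can only be reached via MQTT."""
    routed = [select_provider(d) for d in devices]
    reachable = [p for p in routed if p is not None]
    if not reachable:
        return 0.0
    return reachable.count("mqtt") / len(reachable)


devices = [
    DeviceCapabilities("android-1", fcm_token="tok-a"),
    DeviceCapabilities("ios-1", apns_token="tok-b"),
    DeviceCapabilities("aosp-1", mqtt_client_id="client-c"),
    DeviceCapabilities("android-2", fcm_token="tok-d", mqtt_client_id="client-d"),
    DeviceCapabilities("legacy-1"),
]

print([select_provider(d) for d in devices])
# ['fcm', 'apns', 'mqtt', 'fcm', None]
print(fallback_rate(devices))  # 0.25
```

Computing the rate per site rather than globally would be the natural next step for spotting sites with systemic connectivity issues.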