Users are experiencing issues accessing the MileIQ App and Dashboard
Incident Report for MileIQ
Postmortem

We understand that the recent service disruption on September 16, 2023, had a significant impact on your business operations, and we want to provide you with a detailed account of what transpired during this incident.

Chronology of Events:

  • 07:33 UTC: An issue surfaced that affected a large portion of customers using Azure’s mission-critical database service.
  • Immediate Response: Our internal monitoring systems alerted us to this issue, triggering an immediate investigation to assess the situation.
  • Root Cause Identified: After a thorough investigation, we found that the core issue was an unexpected power disruption in our cloud provider’s underlying network infrastructure. This disruption made certain compute nodes temporarily unavailable, which in turn caused failures and timeouts for SQL Database operations.
  • Mitigation Steps: To mitigate the initial impact and restore functionality as swiftly as possible, we took the following actions:

    • Alternative Connection Path: We attempted the vendor-suggested workaround of connecting via a new tunnel, but the operation timed out due to the incident.
    • Disaster Declaration: We began executing a previously prepared plan to restore DB replicas in alternate regions. Our backups are geo-replicated to account for this exact use case. Due to the size of the dataset, the restore would have taken 15-20 hours from that point.
    • Complete Service Recovery: The Azure team confirmed restoration of the service. We abandoned the effort to bring the service up in the alternate location and focused on restoring services in the primary location.
  • 21:38 UTC: We achieved complete service recovery, resolving the issue and restoring normal operation for all new sessions.

Impact to our Service:

  • Most of the application clients were able to automatically re-connect upon service restoration.
  • A small subset of clients required users to log in again.
  • Most drives captured during the downtime were cached on the device and posted to the platform upon service restoration.
  • A small subset of drives captured during the incident could not be recovered. Customers are advised to add these drives using manual entry.

Our Commitment to You:

  • Our choice of infrastructure is our concern, not yours, and we take full responsibility for this disruption.
  • Through our support team, we’re working with individual users who still see lingering effects of the issue or have questions.
  • We’re pursuing an investigation with our provider to understand why cross-AZ failover didn’t occur as planned.
  • We’re fixing every cause of drives not being persisted while the service was unavailable and ensuring high durability on-device.
  • We’re pursuing code improvements that would allow us to buffer incoming data upstream of the database even when authentication is unavailable.
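
The durability goal in the last two items can be illustrated with a minimal sketch. This is not the actual MileIQ implementation. It assumes a local SQLite store as the on-device buffer, and the `DriveBuffer` class, the `pending_drives` table, and the `upload` callback are hypothetical names chosen for this example. The idea is to persist each drive locally before any network call, and to delete it only after the upload is confirmed.

```python
import json
import sqlite3
from typing import Callable


class DriveBuffer:
    """Hypothetical durable on-device queue for captured drives (illustrative only)."""

    def __init__(self, path: str = ":memory:") -> None:
        # A real client would pass a file path so data survives app restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending_drives ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def record(self, drive: dict) -> None:
        # Persist first, so an outage or failed login cannot lose the drive.
        self.db.execute(
            "INSERT INTO pending_drives (payload) VALUES (?)",
            (json.dumps(drive),),
        )
        self.db.commit()

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM pending_drives").fetchone()[0]

    def flush(self, upload: Callable[[dict], bool]) -> int:
        """Send drives oldest-first; stop at the first failure and keep the rest."""
        sent = 0
        rows = self.db.execute(
            "SELECT id, payload FROM pending_drives ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            try:
                ok = upload(json.loads(payload))
            except OSError:  # covers ConnectionError and TimeoutError
                ok = False
            if not ok:
                break
            # Delete only after the platform has confirmed receipt.
            self.db.execute("DELETE FROM pending_drives WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent
```

In this sketch a client would call `record` whenever a drive ends and call `flush` on a timer or when connectivity returns. Because rows are removed only after a confirmed upload, a failed flush during an outage leaves every drive in place for the next attempt.
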
Posted Sep 18, 2023 - 14:48 PDT

Resolved
We observe full service restoration.
Posted Sep 16, 2023 - 15:14 PDT
Update
A fix has been successfully implemented by Microsoft Azure, and our system is currently in the recovery phase. Users will regain access shortly, and we anticipate that drives will be posted within the next few hours.
Posted Sep 16, 2023 - 12:44 PDT
Monitoring
A fix has been successfully implemented by Microsoft Azure, and our system is currently in the recovery phase. Users will regain access shortly, and we anticipate that drives will be posted within the next few hours.
Posted Sep 16, 2023 - 12:14 PDT
Update
Azure continues their mitigation efforts. We aim to provide the next update by 07:40PM UTC. Please check back for more information.
Posted Sep 16, 2023 - 11:41 PDT
Update
Azure continues their mitigation efforts. We aim to provide the next update by 06:00PM UTC. Please check back for more information.
Posted Sep 16, 2023 - 10:07 PDT
Update
Azure continues their mitigation efforts. We aim to provide the next update by 05:00PM UTC. Please check back for more information.
Posted Sep 16, 2023 - 09:07 PDT
Update
Azure continues their mitigation efforts. We aim to provide the next update by 04:00PM UTC. Please check back for more information.
Posted Sep 16, 2023 - 08:00 PDT
Update
Azure continues their mitigation efforts. We aim to provide the next update by 03:00PM UTC. Please check back for more information.
Posted Sep 16, 2023 - 07:04 PDT
Update
Azure continues their mitigation efforts. We aim to provide the next update by 02:00PM UTC. Please check back for more information.
Posted Sep 16, 2023 - 06:00 PDT
Update
Azure continues their mitigation efforts. We aim to provide the next update by 01:00PM UTC. Please check back for more information.
Posted Sep 16, 2023 - 05:00 PDT
Update
Azure continues their mitigation efforts.
We aim to provide the next update by 12:00PM UTC. Please check back for more information.
Posted Sep 16, 2023 - 04:00 PDT
Update
Azure continues their mitigation efforts.
We aim to provide the next update by 11:00AM UTC. Please check back for more information.
Posted Sep 16, 2023 - 03:34 PDT
Update
Azure continues their mitigation efforts.
We aim to provide the next update by 10:25AM UTC. Please check back for more information.
Posted Sep 16, 2023 - 03:05 PDT
Update
Azure continues their mitigation efforts.
We aim to provide the next update by 10:05AM UTC. Please check back for more information.
Posted Sep 16, 2023 - 02:47 PDT
Identified
Azure continues their mitigation efforts.
We aim to provide the next update by 09:45AM UTC. Please check back for more information.
Posted Sep 16, 2023 - 02:25 PDT
Update
Update:
We're currently experiencing connectivity issues with our application. This incident has been traced back to an ongoing situation with our cloud provider, Azure. They have informed us of an issue impacting their services in our region, which has directly affected our application's ability to connect to its database.

Impact:
Users might encounter difficulties or be unable to access our application and related services until the problem is resolved. Rest assured, drives captured will be posted to your account once MileIQ is back online.
We aim to provide the next update by 09:25 AM UTC. Please check back for more information.
Posted Sep 16, 2023 - 02:05 PDT
Update
We are continuing to investigate this issue.
Posted Sep 16, 2023 - 01:52 PDT
Investigating
Our team is investigating the issue.
Posted Sep 16, 2023 - 01:51 PDT
This incident affected: iOS App, Android App, and Web App.