Update on murena.io service outage

Since Oct 6 19:50 CET, most murena.io services have been unreachable.

Affected services

  • All murena.io services, including drive, calendar, notes and email (@murena.io / @e.email)

  • Over-the-air (OTA) /e/OS updates

  • App Lounge

All other websites including e.foundation, gitlab.e.foundation, community.e.foundation and murena.com remain unaffected.

Context

The outage stems from our storage infrastructure, which is getting old, while the number of active murena.io users has grown considerably over the past two years.

An infrastructure evolution and consolidation was planned, but last week our current infrastructure lost some storage nodes and switched to degraded mode.

Unfortunately, yesterday we faced another issue that forced us to put the infrastructure in maintenance mode until it gets fully restored.

Actions we have taken:

  1. We are prioritizing recovery of the email service, and work is in progress. We will update this post today with a first ETA.

  2. Meanwhile, we have already fixed the App Lounge service, which is back online but still in degraded mode: only commercial apps (from the Play Store) can currently be installed. An ETA for full restoration of the service will be communicated later.

  3. We’re taking the current outage as an opportunity to accelerate the planned evolution and consolidation of the murena.io storage infrastructure. New storage nodes have been added and synchronization is already in progress. However, given the high volume of data and the test procedures we need to perform before putting it back online, we want to be clear that the ETA for murena.io services (except email, which will be back sooner) will probably be counted in days, not hours.

  4. Repair of OTA updates has been given lower priority, since its impact on users is the lowest.

We would also like to make it clear that no data leak was involved in this situation.

We sincerely apologize for the inconvenience and will keep you updated on progress and ETAs over the coming hours.

The Murena Team

Update Oct 7th

  • We have an ETA for the mail service @murena.io/@e.email: Thursday 10/10, morning CET.
  • We do not yet have an ETA for the other murena.io services, as we are still evaluating a few options to bring them back up safely in the most reasonable amount of time.

Update Oct 9th

  • ETA for mail service @murena.io/@e.email: Thursday Oct 10th, afternoon CEST
  • ETA for drive at murena.io: (hopefully) Monday Oct 14th
  • ETA for calendar, contacts, mail, passwords at https://murena.io : Friday Oct 11th
  • ETA for /e/OS images & OTA updates: rollout starts on Thursday Oct 10th (not all devices will be available at the beginning)

Update Oct 10th

  • ETA for mail service @murena.io/@e.email: Thursday Oct 10th, evening CEST.
  • ETA for drive at murena.io: end of next week.
  • ETA for calendar, contacts, mail, passwords at https://murena.io : Friday Oct 11th.
  • ETA for /e/OS images & OTA updates: OTA service reopened, /e/OS downloads have started to roll out for most devices - some IPv6 access issues are currently being fixed.

Update Oct 11th

  • Update about email services @murena.io/@e.email: our email servers were reopened yesterday evening as expected to handle the incoming and outgoing email queues that had been pending for several days. During the night, the service was also briefly reopened to users, but as the load on the servers was extremely high, we suspended it until the incoming and outgoing queues are emptied. The ETA for reopening the service to users is this morning if the load is acceptable. If the load remains too high, we will have to modify an email route, which might take a few more hours.
  • Update about /e/OS images & OTA updates: IPv6 access is now working.
  • update about /e/OS images & OTA updates: IPv6 access is now working.

Update Oct 11th (second update)

  • ETA for mail service @murena.io/@e.email: we have started reopening email access, beginning with slightly more than 50% of Premium users. Note: only fetching emails is possible for now. Sending will be opened once the load is acceptable.
  • ETA for calendar, contacts, mail, passwords at https://murena.io : Monday October 14th 2024 (testing/QA is in progress)
  • ETA for /e/OS images & OTA updates: OTA service reopened, as well as /e/OS downloads for most devices. Download speed should be much better than in the past.
  • ETA for drive at murena.io (files/images/videos) is still uncertain, as we have remaining issues to fix with the storage infrastructure. We will post an update early next week.

Update Oct 12th

  • Email service is now fully operational (receiving and sending) for Premium accounts. Please note that some emails that would have been received between Sunday 13 5:00 CEST and Sunday 13 19:50 won’t show up for now. Email service for free users will start opening progressively on Monday Oct 14th.

Update Oct 14th

  • Mail service @murena.io/@e.email: fully operational for Premium members. ETA for free users: not before Tuesday 15 evening CEST.
  • Calendar/Contacts/webmail/passwords at https://murena.io: ETA end of this week.
  • /e/OS images & OTA updates: operational.
  • drive at murena.io: ETA is still uncertain as we have remaining issues to fix with the storage infrastructure.
  • FOSS apps in App Lounge (F-Droid): no ETA yet.

Update Oct 16th

  • Mail service: email @murena.io/@e.email is now fully back for all members.

Update Oct 18th

  • FOSS apps in App Lounge (F-Droid): should now work normally (installation and updates)
  • Calendar/Contacts/webmail/passwords at https://murena.io: currently being tested internally. ETA for public opening is Monday Oct 21st.

Update Oct 21st

  • Minimal murena.io is now publicly available with Calendar/Contacts/Webmail/passwords apps.

Update Oct 25th

Dear everyone,

Thank you for your enduring patience with this outage. We are working tirelessly to bring Murena Workspace fully back online. Since the beginning of the outage we have been able to restore:

  • /e/OS image download and OTA download
  • email service @e.email/@murena.io
  • murena.io partial setup: calendar, contacts, webmail and passwords

What is still missing is access to files/photos/videos that were stored at murena.io.

We’d like to explain in more detail why it is taking so long. It all started with several defective hard drives in our storage cluster. The cluster has many disks and a lot of redundancy, but this time it entered an unstable state, which led us to stop it until it is completely fixed. Unfortunately, several additional issues arose on top of the pre-existing ones. The resulting complexity led us to seek expert advice from a specialist company to avoid further complications. Given the size of the cluster, each procedure, such as checking or reorganizing data, takes a long time (sometimes several days). After a comprehensive analysis of the situation, the expert company advised us to reinforce the cluster with additional new servers and disks before rerunning the stabilization, to avoid falling sooner or later into a cascade of new disruptions.

So this is where we stand right now: new hardware has been ordered for the data center and should be available and set up next week.

At that point, we can resume stabilizing the storage cluster and run through all appropriate validation procedures. Unfortunately, this process can still take several weeks, maybe less if we are lucky, so one week ago we decided to restore our cold backups. In the best-case scenario, if we can get our storage cluster up and running again soon, this cold storage recovery will have been unnecessary. Otherwise, the backup will allow us to restore users’ files on a new infrastructure. We are in the middle of the restoration process, and it will still take several days to complete, maybe up to one week.

When this backup has been restored, we will decide about the best route to take. It’s possible that we will provide access for everyone to the restored files in read-only mode first until we fix the storage infrastructure.

We apologize again for the inconvenience and will keep you informed about the status every time we have significant news.

Gaël & the Murena Team.

Update Nov 7th

There is no concrete news available this week, but we wanted to make a partial update:

  1. Regarding the consolidation of the storage cluster: the new hardware is unfortunately still not available, as we depend on suppliers, but we are hopeful it will be completely installed and enabled early next week. From that point we can resume work on stabilizing and testing the storage cluster, and hopefully reopen the service.
  2. The backup restoration is not complete yet: it takes a lot of time when you are dealing with close to 100TB of data, not even considering data transfers.

So, depending on which set of data becomes available first, we will reopen a dedicated access, starting with Premium user accounts.
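
To give a rough sense of scale for the "close to 100TB" figure above, here is a minimal back-of-the-envelope sketch. The throughput values are purely illustrative assumptions, not measurements from our infrastructure, and the function name `restore_days` is hypothetical.

```python
def restore_days(total_tb: float, throughput_mb_s: float) -> float:
    """Days needed to move total_tb terabytes at a sustained rate in MB/s."""
    total_mb = total_tb * 1_000_000      # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / throughput_mb_s
    return seconds / 86_400              # 86,400 seconds per day


# Assumed sustained throughputs (illustrative only, not our real figures).
for rate in (100, 500, 1000):
    print(f"{rate} MB/s -> {restore_days(100, rate):.1f} days")
```

Even under these optimistic assumptions of uninterrupted, sustained throughput, moving this much data is a matter of days, before counting verification, pauses or retries.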

Update Nov 15th

This week’s update:

  1. The new hardware has finally been received, installed and tested. The storage cluster is currently rebuilding with this new hardware included. Once it is fully consolidated, the plan for next week is to resume work on testing and stabilization of the filesystem.
  2. The backup restoration is not complete yet, as it had to be paused due to some further hardware issues we had to fix. The restoration process partially resumed today.

Next week, we hope to be able to give an ETA for service restoration, which will depend on the storage cluster stabilization and the backup restoration status.

Update Nov 22nd

This week’s update:

  1. The file storage cluster has been consolidated (all data copied from old servers to new servers). A very long process of full filesystem scanning/analysing is currently running. Once completed, work can resume on stabilization of the filesystem (hopefully next week).
  2. Backup restoration is still ongoing, as we have encountered some issues with a part of the backup.

ETA for full service restoration cannot be given yet.

Update Nov 29th

This week’s update:

  1. The file storage cluster scanning/analysis was completed in the middle of this week, but we had to wait two additional days for the external expert to be available before resuming stabilization. The first attempt today was not successful; we will make a new attempt on Monday.
  2. Backup restoration is still ongoing. We’re putting a plan in place to give users access to the backups that are already available.

ETA for full service restoration cannot be given yet.
