Recovering a boot-looped phone without a factory reset

Hello everyone!

tl;dr
My phone (nairo) got stuck in a boot loop after upgrading from e.os 2.5 (based on lineage-20, AOSP 13) to e.os 2.6.3 (based on lineage-21, AOSP 14). I tapped into the boot-looped system using adb (with root privileges). I analyzed logcat and identified the problem (or at least the first problem): the file /data/misc_de/0/apexdata/com.android.permission/roles.xml is effectively empty (more precisely, it consists of 3991 ‘\0’ bytes) and cannot be parsed. The same goes for its copy /data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml and for the respective *.reservecopy files. I tried to replace them with a copy from the repo. I am able to write to disk, but the changes survive neither a reboot (“adb shell reboot”) nor a continuation of the boot process (“adb shell start”).

Here is in essence what I tried:

adb wait-for-device && adb shell stop
adb shell whoami
adb shell ls -alZ /data/misc_de/0/apexdata/com.android.permission/
adb shell ls -alZ /data_mirror/misc_de/null/0/apexdata/com.android.permission/
adb push roles.xml/roles.xml.eos26u /tmp/roles.xml
adb push roles.xml.sha256 /tmp/roles.xml.sha256
adb shell setenforce 0
adb shell dd conv=fsync if=/tmp/roles.xml of=/data/misc_de/0/apexdata/com.android.permission/roles.xml.reservecopy
adb shell dd conv=fsync if=/tmp/roles.xml of=/data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml.reservecopy
adb shell dd conv=fsync if=/tmp/roles.xml of=/data/misc_de/0/apexdata/com.android.permission/roles.xml
adb shell dd conv=fsync if=/tmp/roles.xml of=/data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml
adb shell chown system:system /data/misc_de/0/apexdata/com.android.permission/roles.xml*
adb shell chown system:system /data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml*
adb shell chmod 600 /data/misc_de/0/apexdata/com.android.permission/roles.xml*
adb shell chmod 600 /data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml*
adb shell chcon -v "u:object_r:apex_system_server_data_file:s0" /data/misc_de/0/apexdata/com.android.permission/roles.xml*
adb shell chcon -v "u:object_r:apex_system_server_data_file:s0" /data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml*
adb shell sync
adb shell setenforce 1
adb shell sha256sum -c /tmp/roles.xml.sha256
adb shell ls -alZ /data/misc_de/0/apexdata/com.android.permission/
adb shell ls -alZ /data_mirror/misc_de/null/0/apexdata/com.android.permission/

# adb shell reboot
# OR
# adb shell start


What am I missing? How can I write the files to disk permanently? Or is there an alternative to replace the files somewhere else?


And now to the long version …

Disclaimer: You and you alone are responsible if you brick your device! Nothing suggested here should brick your device when properly and carefully executed (meaning, you still should be able to flash rom images). But who knows? You have been warned!

I upgraded my phone (nairo - Motorola moto g 5G plus) from e.os 2.5 (based on lineage-20, AOSP 13) to e.os 2.6.3 (based on lineage-21, AOSP 14). On the first boot something went wrong: the phone froze and rebooted. My guess is a hardware or power glitch. Now it is stuck in a boot loop. I can still access fastboot and recovery.

I only noticed later that I don’t have a fully working backup. My latest full backup is a few weeks old. With a recent working backup, I would’ve done a factory reset, restored everything from backup and been done with it. But since it seems to be a glitch in the hardware and not in the new software, I want to avoid data loss. So, let’s investigate …

I tried the obvious first, flashing the rom again, without luck. Downgrading to the old version wasn’t possible either (probably because the upgrade process had already started upgrading databases and such).

The old trick of wiping the caches does not work with modern Android anymore. Dalvik hasn’t been used in a long, long time, and the ART cache is located in the encrypted /data partition. Thus, neither is available for wiping with e.g. TWRP.

Other methods of clearing the caches, e.g. adb shell pm trim-caches 256G, are not possible through adb recovery. (Neither are they available at the stage of the boot loop. But I’m getting ahead of myself, since I didn’t have privileged adb access yet.) This method only seems viable after unlocking the System Device Encryption (DE) and/or the User DE of the file-based encryption.
https://source.android.com/docs/security/features/encryption/file-based

Now, to see what’s going on, I need some logs. But logcat doesn’t work with recovery either. Therefore, I need access to the boot-looped system. Indeed, it is possible to connect to the device during the boot process, using adb wait-for-device. However, as user system, I don’t have enough privileges to access the logs or to halt the boot process with adb shell stop (“Must be root”).

During the boot, there isn’t enough time to look for a privilege escalation. But since I have access to adb recovery and fastboot, I should be able to modify the system directly or flash a modified image. Modifying an image or building it from source seems like a lot of work. Modifying /system directly seems like the way to go.

According to
https://johannes.truschnigg.info/writing/2022-05_android_bootloop_debugging/
there are a few options for build.prop to gain privileged adb access:

persist.service.adb.enable=1
persist.service.debuggable=1
persist.sys.usb.config=mtp,adb
ro.secure=0
ro.adb.secure=0

Start adb recovery and mount /system. Then, remount /system as read-write:
adb shell mount -o remount,rw /mnt/system

Neither editing build.prop in place with sed nor copying a modified version over it works. adb push fails without any error message, whereas dd, for example, reports: “write error: No space left on device”.

Example code that does not work:

ls -al /mnt/system/system/build.prop
sed -i -e 's/^persist.sys.usb.config=.*/persist.sys.usb.config=mtp,adb/' -e 's/^ro.adb.secure=.*/ro.adb.secure=0/' -e 's/^ro.secure=.*/ro.secure=0/' /mnt/system/system/build.prop
grep -E '^persist.sys.usb.config=|^ro.adb.secure=|^ro.secure=' /mnt/system/system/build.prop

cp /mnt/system/system/build.prop /tmp/build.prop
cp /tmp/build.prop /mnt/system/system/build.prop
dd if=/tmp/build.prop of=/mnt/system/system/build.prop

By the way, I made sure the modified copy is smaller than the original by deleting all comments. I also removed the previous values of the settings I wanted to change. Thus, there are no misleading duplicate options when I append the five settings to the end of the file. But the write error persists. I am not sure where it is coming from, because df and df -i show that space and inodes are available on the filesystem.
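The clean-up described above (drop the comments, drop the old values, append the five settings) could be sketched as a small script that runs on the host. This is only a sketch under assumptions: the helper name make_mod_build_prop and the file names are hypothetical, removing blank lines is my own addition, and the script only prepares the file. It does nothing about the write error itself.

```shell
#!/bin/sh
# Sketch: build a reduced build.prop with the five adb-related settings appended.
# make_mod_build_prop is a hypothetical helper name, not part of any Android tool.
make_mod_build_prop() {
    src=$1
    dst=$2
    # Drop comment lines, blank lines, and any previous value of the five keys.
    # (grep exits non-zero when no line is left; "|| true" tolerates that.)
    grep -v -E '^[[:space:]]*(#|$)' "$src" \
        | grep -v -E '^(persist\.service\.adb\.enable|persist\.service\.debuggable|persist\.sys\.usb\.config|ro\.secure|ro\.adb\.secure)=' \
        > "$dst" || true
    # Append each setting exactly once at the end of the file.
    cat >> "$dst" <<'EOF'
persist.service.adb.enable=1
persist.service.debuggable=1
persist.sys.usb.config=mtp,adb
ro.secure=0
ro.adb.secure=0
EOF
}
```

Usage would be something like: make_mod_build_prop build.prop mod_build.prop, after pulling build.prop from the device.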

Looking into modifying the system.img inspired a solution for the problem. Again, start adb recovery and mount /system. Look up the system device using mount. For me, that is /dev/block/dm-4. Now unmount /system.

e2fsck -yf /dev/block/dm-4
resize2fs /dev/block/dm-4 2G
# 2G too large. Max size of 500988 suggested. Adjust accordingly.
resize2fs /dev/block/dm-4 500988
# mount via recovery
mount -o remount,rw /mnt/system
# adb push mod_build.prop /tmp/build.prop
dd if=/tmp/build.prop of=/mnt/system/system/build.prop
# umount
e2fsck -yf /dev/block/dm-4
resize2fs -M /dev/block/dm-4
e2fsck -yf /dev/block/dm-4

Now, reboot and set up adb with logcat:
adb wait-for-device && adb logcat | tee "logcat_$(date --rfc-3339=seconds)"

Finally, we have privileged adb access and a log file for analysis!

I’ll spare you the details of analyzing the log. But here are a few hints that helped me:

  1. Filter by latest boot process: tail -n 31000 logcat_*
  2. Filter by severity: awk '{if ($5 == "E" || $5 == "F"){print}}' logcat_*
  3. Find and filter by system server process ID (pid):
grep "Exit zygote because system server (pid" logcat_* | tail -n 1
grep -a "5977  5977" logcat_*
  4. Combine all of the above.
  5. View extracts next to each other with a diff editor, e.g. meld.
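The hints above can be combined into two small helpers. This is a minimal sketch, assuming the log was saved in logcat's default "threadtime" layout (date, time, pid, tid, level, tag) and that the zygote message carries the pid right after "(pid ". The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: narrow a saved logcat file down to the errors of the last system server.
# Assumes the "threadtime" layout: date time pid tid level tag: message.

# Print the pid from the last "Exit zygote because system server (pid N" line.
last_system_server_pid() {
    grep -a 'Exit zygote because system server (pid' "$1" \
        | tail -n 1 \
        | sed -n 's/.*system server (pid \([0-9][0-9]*\).*/\1/p'
}

# Print only the E(rror) and F(atal) lines that were logged by the given pid.
errors_for_pid() {
    awk -v pid="$2" '$3 == pid && ($5 == "E" || $5 == "F")' "$1"
}
```

For example: errors_for_pid logcat_file "$(last_system_server_pid logcat_file)", where logcat_file stands for one of the saved logs. Unlike the grep for "5977  5977", this matches every thread of the process, not only the main thread.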

In my case the (first) fatal problem seems to be that the roles.xml file cannot be parsed. It turns out that it consists of 3991 ‘\0’ bytes. The same goes for its backup, roles.xml.reservecopy. Additionally, I found the same files in a second location (under /data_mirror). The roles.xml path used by the earlier Android 10 (/data/system/users/0/roles.xml) does not exist.
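A quick way to confirm that a file contains nothing but ‘\0’ bytes is to delete all NUL bytes and count what is left. The following is a minimal sketch. The helper name is_all_nul is hypothetical, and I only assume the usual wc and tr tools, run on the host against a pulled copy of the file.

```shell
#!/bin/sh
# Sketch: succeed if the file is non-empty and contains nothing but NUL bytes.
# is_all_nul is a hypothetical helper name.
is_all_nul() {
    # $(( ... )) also strips the padding that some wc implementations print.
    total=$(($(wc -c < "$1")))
    # Count the bytes that remain after deleting every NUL byte.
    rest=$(($(tr -d '\000' < "$1" | wc -c)))
    [ "$total" -gt 0 ] && [ "$rest" -eq 0 ]
}
```

For example: is_all_nul roles.xml && echo "only NUL bytes".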

I found a copy of the roles.xml file in the repos of e.os and lineage. Therefore, I just need to replace the empty ones with the one from the repo.

I am root. The filesystem seems to be mounted rw. I have write access to the folders. And I even disabled SELinux, just to be sure. Additionally, I make sure that the SHA-256 checksums of the files check out. adb push suppresses some errors, therefore I use dd for the actual copying.
(I developed this process successively, of course, to eliminate all potential sources of error, because I failed writing the files again and again.)
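For the checksum step it matters which paths the checksum file names, because sha256sum -c only verifies the files listed in it. A minimal sketch of how such a list could be generated for the on-device target paths follows. The helper name make_checklist is hypothetical, and this is not necessarily how my roles.xml.sha256 was produced.

```shell
#!/bin/sh
# Sketch: print a checksum list that applies the hash of one reference file
# to several target paths, in the "HASH  PATH" format read by sha256sum -c.
# make_checklist is a hypothetical helper name.
make_checklist() {
    ref=$1
    shift
    # Keep only the hash column of the sha256sum output.
    hash=$(sha256sum "$ref" | cut -d ' ' -f 1)
    for target in "$@"; do
        printf '%s  %s\n' "$hash" "$target"
    done
}
```

For example: make_checklist roles.xml/roles.xml.eos26u /data/misc_de/0/apexdata/com.android.permission/roles.xml /data/misc_de/0/apexdata/com.android.permission/roles.xml.reservecopy > roles.xml.sha256. Checking that list on the device then compares the files that were actually written, not just the copy in /tmp.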
Putting it all together, it looks like this:

adb wait-for-device && adb shell stop
adb shell whoami
adb shell ls -alZ /data/misc_de/0/apexdata/com.android.permission/
adb shell ls -alZ /data_mirror/misc_de/null/0/apexdata/com.android.permission/
adb push roles.xml/roles.xml.eos26u /tmp/roles.xml
adb push roles.xml.sha256 /tmp/roles.xml.sha256
adb shell setenforce 0
adb shell dd conv=fsync if=/tmp/roles.xml of=/data/misc_de/0/apexdata/com.android.permission/roles.xml.reservecopy
adb shell dd conv=fsync if=/tmp/roles.xml of=/data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml.reservecopy
adb shell dd conv=fsync if=/tmp/roles.xml of=/data/misc_de/0/apexdata/com.android.permission/roles.xml
adb shell dd conv=fsync if=/tmp/roles.xml of=/data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml
adb shell chown system:system /data/misc_de/0/apexdata/com.android.permission/roles.xml*
adb shell chown system:system /data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml*
adb shell chmod 600 /data/misc_de/0/apexdata/com.android.permission/roles.xml*
adb shell chmod 600 /data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml*
adb shell chcon -v "u:object_r:apex_system_server_data_file:s0" /data/misc_de/0/apexdata/com.android.permission/roles.xml*
adb shell chcon -v "u:object_r:apex_system_server_data_file:s0" /data_mirror/misc_de/null/0/apexdata/com.android.permission/roles.xml*
adb shell sync
adb shell setenforce 1
adb shell sha256sum -c /tmp/roles.xml.sha256
adb shell ls -alZ /data/misc_de/0/apexdata/com.android.permission/
adb shell ls -alZ /data_mirror/misc_de/null/0/apexdata/com.android.permission/

# adb shell reboot
# OR
# adb shell start

Everything seems to work! Except … As soon as I reboot or trigger the boot process in the current session again, the files are reverted to the empty, broken ones from earlier.

Note: All changes are made solely to slot_b. Because of the boot loop, it is immediately visible during a reboot if the wrong slot is active. Thus, this can be ruled out as an issue.

Even brute-forcing the rewrite of the files while triggering the boot process from a parallel adb session does not work.

# init
adb wait-for-device && adb shell stop

adb push roles.xml/roles.xml.eos26u /tmp/roles.xml
adb push roles.xml.sha256 /tmp/roles.xml.sha256
adb shell chown system:system /tmp/roles.xml
adb shell chmod 600 /tmp/roles.xml
adb shell chcon -v "u:object_r:apex_system_server_data_file:s0" /tmp/roles.xml
adb shell setenforce 0

# loop file copy
while true; do dd conv=fsync if=/tmp/roles.xml of=/data/misc_de/0/apexdata/com.android.permission/roles.xml; chown system:system /data/misc_de/0/apexdata/com.android.permission/roles.xml; chmod 600 /data/misc_de/0/apexdata/com.android.permission/roles.xml; done

# alternative loop file copy
while true; do cp -a /tmp/roles.xml /data/misc_de/0/apexdata/com.android.permission/roles.xml; done

I don’t know why the files are not written permanently. Maybe someone here has an idea how to fix it and solve my problem? I appreciate your thoughts on the matter!

I am going to post this to a couple of communities and link the posts here:

I’m not familiar with apex much, but is it possible this is a loopmount or extracted from an apex file on-boot? - (a reason why your writes don’t make it back)

Thanks for your reply and your suggestion! You are right about apex and the read-only mounts. However, I already looked into it. I found and analyzed the following two files:

/system/apex/com.android.permission.capex
/data/apex/decompressed/com.android.permission@350090000.decompressed.apex

They basically are .apk files. The .capex includes the .apex file. The .apex file holds the app files. (My missing roles.xml file is not included.)
The apex manager somehow mounts the .apex file via loop-devices:

/dev/block/dm-15 on /apex/com.android.permission@350090000 type ext4 (ro,dirsync,seclabel,nodev,noatime)
/dev/block/dm-15 on /apex/com.android.permission type ext4 (ro,dirsync,seclabel,nodev,noatime)

Sadly, I couldn’t find any connection between the ro apex mounts and the apexdata path in question. It looks a bit like apexdata stores the userdata for the apex packages. Maybe they have a special folder because apex runs at a relatively early stage of the boot sequence.
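One way to double-check this is to list every mount whose mount point is the file's path or one of its parent directories. The following is a minimal sketch, assuming the "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" layout shown in the mount lines above. The helper name mounts_covering is hypothetical.

```shell
#!/bin/sh
# Sketch: read `mount` output on stdin and print the entries whose mount
# point is the given path or one of its parent directories.
# mounts_covering is a hypothetical helper name.
mounts_covering() {
    awk -v path="$1" '
        # Expected layout: DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)
        $2 == "on" {
            mp = $3
            if (mp == "/" || path == mp || index(path, mp "/") == 1) print
        }'
}
```

For example: adb shell mount | mounts_covering /data/misc_de/0/apexdata/com.android.permission/roles.xml. If no /apex/... entry shows up in the result, the file does not live on one of the read-only apex mounts.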