Experimenting around btrfs on Android
As covered before, btrfs might be an exciting thing to try on Android.
root@localhost:/# btrfs fi df /android/data
Data, single: total=34.00GiB, used=33.66GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=1.00GiB, used=425.84MiB
GlobalReserve, single: total=66.58MiB, used=0.00B
root@localhost:/# du -sh --apparent-size /android/data
38G /android/data
Although f2fs is clearly the king of performance on Android, my inner voice kept telling me to try btrfs out, so I did.
Kernelspace
The first step is to enable the btrfs kernel module... or is it?
The latest phone I had when I started this project was the OnePlus 3, which uses a 3.18 kernel.
So before experimenting with btrfs, I decided to port a more recent, stable version of btrfs to the OnePlus 3's kernel.
I picked the 4.9 kernel's btrfs, as 4.9 is the latest LTS kernel at the time of writing this post.
For those who are unfamiliar with Linux kernel development, backporting a kernel module is a hideous process. Not only do you have to fix a ton of compilation errors, you also have to dig through the entire Git commit history to properly fix API usages. You also have to backport other commits if the kernel module you're trying to port relies heavily on them. Oh, and did I mention you have to do this recursively for all of the dependencies? Yeah, it's hideous.
After spending about a week digging through Git, solving compilation errors and debugging kernel panics, I managed to backport 4.9's btrfs to 3.18 successfully. You're welcome. (Although it might not be fully stable, FYI.)
While I was messing around with the btrfs sources, I thought, heck, let's add LZ4 support to the mix. I honestly have no idea why btrfs developers still haven't merged LZ4 support, when it clearly outperforms basically everything out there.
I picked up the LZ4 patches made by Philip Worrall from patchwork.kernel.org. (I had to fix them up to support the latest LZ4 in the Linux kernel.)
Userspace
You need several changes in userspace:
- fstab to mount btrfs partitions
- mkfs.btrfs to format a partition into btrfs
- btrfs to manage btrfs filesystem
If you are bringing up a new filesystem, you probably also need to patch the SELinux policy, but Google seems to have added support for btrfs a long time ago.
Patching fstab is quite easy. Just replace the /data declaration with the following line:
/dev/block/bootdevice/by-name/userdata /data btrfs nosuid,nodev,noatime,discard,autodefrag,compress=lz4,ssd_spread wait
nosuid,nodev,noatime,discard is pretty much the default on Android.
autodefrag is used because I couldn't come up with an efficient way of defragmenting manually on Android.
ssd_spread tells btrfs that our partition is flash-based and, to quote: "is more strict about finding a large unused region of the disk for new allocations, which tends to fragment the free space more over time. It is often faster on the less expensive SSD devices."
Since the FTL in mobile devices is pretty simple (unlike in SSDs), preferring ssd_spread over ssd makes sense.
mkfs.btrfs and the btrfs tool are available from btrfs-progs. While the existence of an "Android.mk" file there excited me at first, it's outdated and unusable. You need to patch it yourself, or you can just statically link it against glibc. I went the former route, and also made it open-source. You're welcome.
(Sidenote : use 'mm mkfs.btrfs btrfs -j4' to build)
Recovery
It's basically impossible to wipe the /data partition on Android without going into recovery, since system daemons keep files under /data open.
Recovery is a separate boot partition on Android that only starts the set of programs necessary for "recovery" purposes. OTA updates are also handled here (unless the phone supports seamless updates).
You can *technically* format the /data partition as btrfs in recovery with the mkfs.btrfs tool without the kernel actually supporting btrfs, but this makes managing subvolumes extremely hard. So I recommend repacking the recovery with a kernel that supports btrfs.
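To confirm that the repacked recovery kernel really knows about btrfs before formatting anything, one option is to look at /proc/filesystems. The helper below is only a sketch: fs_supported is a name I made up, and it assumes an awk is available in your recovery environment (busybox usually provides one).

```shell
# Hypothetical helper: succeed if filesystem NAME is listed in a
# /proc/filesystems-style file (FILE defaults to /proc/filesystems).
# Usage: fs_supported NAME [FILE]
fs_supported() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}"
}

if fs_supported btrfs 2>/dev/null; then
    echo "kernel has btrfs support"
else
    echo "kernel lacks btrfs support (or the module is not loaded)"
fi
```

If this reports no support inside recovery, mkfs.btrfs may still work, but mounting /data and creating subvolumes will not.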
After entering the recovery, unmount the /data partition and call mkfs.btrfs :
mkfs.btrfs -L userdata /dev/block/bootdevice/by-name/userdata
After formatting /data as btrfs, mount it again and set up a root subvolume first so that we can make use of snapshots later:
btrfs subvolume create /data/root
After creating the root subvolume, we need to make it the default subvolume.
Obtain the root subvolume's ID with the following command:
btrfs subvolume list /data
It should be something like this :
ID 257 gen 215 top level 5 path root
Note the ID number and run:
btrfs subvolume set-default 257 /data
(Obviously replace 257 with your own output)
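If you'd rather not copy the ID by hand, it can be pulled out of the listing with awk. This is a minimal sketch: subvol_id is a hypothetical helper name, and it assumes the output format shown above, where the ID is the second field and the path is the last one.

```shell
# Hypothetical helper: print the ID of the subvolume whose path is NAME,
# reading `btrfs subvolume list` output on stdin.
subvol_id() {
    awk -v name="$1" '$1 == "ID" && $NF == name { print $2; exit }'
}

# Sample line taken from the listing above; prints 257.
echo "ID 257 gen 215 top level 5 path root" | subvol_id root

# On the device, the two steps could then be combined like this:
#   btrfs subvolume set-default "$(btrfs subvolume list /data | subvol_id root)" /data
```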
Trying it out
You're now ready to boot.
Reboot and cross your fingers :)
After the boot is done, run 'mount' to check if btrfs is properly applied.
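The output of 'mount' is long on Android, so it can help to filter it down to /data. The sketch below reads /proc/mounts directly; mount_info is a name I made up for illustration, and the options you see will depend on your kernel.

```shell
# Hypothetical helper: print "FSTYPE OPTIONS" for mount point MP from a
# /proc/mounts-style file (FILE defaults to /proc/mounts).
# If MP is mounted more than once, the last entry wins.
mount_info() {
    awk -v mp="$1" '
        $2 == mp { fstype = $3; opts = $4 }
        END { if (fstype != "") print fstype, opts }
    ' "${2:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    mount_info /data
fi
```

The first word should be btrfs, followed by the options from the fstab line.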
The first thing to do is run a benchmark, right?
Well, don't get too excited.
I ran AndroBench on two identical phones.
The left one is on btrfs and the right one is on f2fs.
While the sequential performance is impressive(thanks to compression),
the random performance got absolutely slaughtered by f2fs.
btrfs inherently has random performance issues because of its CoW design.
Many random writes cause fragmentation over time, which results in poor performance.
Most of the random I/O on Android comes from SQLite databases.
We can disable CoW specifically for SQLite to gain back some of the lost performance.
Disabling CoW will improve random performance, but it also disables several btrfs features for the affected files (e.g. compression and data checksums).
- Make /data/data a subvolume and mount it with the nodatacow option.
This will also make files other than databases non-CoW, though.
- Modify libsqlite.so to set FS_NOCOW_FL on database files when it creates them (the flag is set with the FS_IOC_SETFLAGS ioctl and only takes effect on empty files).
Or..
- Optimize SQLite
There seems to be an attempt at this: https://github.com/wurikiji/SQLite-on-BtrFS
- Disable fsync()
Many brainless custom kernel developers disable fsync() outright, because it obviously improves performance by a very large margin.
While this makes almost no sense on traditional filesystems, it might be worth looking into with btrfs, as its checksums give it stronger fault tolerance.
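For quick experiments with the NOCOW options above, the same flag can be set from a shell: chattr +C is the userspace way of setting FS_NOCOW_FL, and files created inside a directory marked this way inherit it. As far as I know, btrfs applies mount options such as nodatacow to the whole filesystem rather than per subvolume, so the attribute route may be the more practical one. Treat the following as a sketch: mark_nocow is a hypothetical helper, the database path is made up, and whether the chattr shipped on your device understands +C is an assumption you need to check.

```shell
# Hypothetical helper: mark directories NOCOW so that files created in them
# afterwards inherit the attribute. Files that already contain data are
# not converted. Set DRY_RUN=1 to print the commands instead of running them.
mark_nocow() {
    for dir in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "chattr +C $dir"
        else
            chattr +C "$dir" && lsattr -d "$dir"
        fi
    done
}

# The path below is an example only; app database directories vary.
DRY_RUN=1
mark_nocow /data/data/com.example.app/databases
```

Existing databases would have to be copied into the marked directory as new files to actually become NOCOW.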
Checksum
btrfs checksums everything (both metadata and actual data) to improve fault tolerance. Unfortunately, this is not all that useful on Android unless your device has a faulty battery and reboots whenever it feels like it. Most Android devices are pretty safe from random reboots, but it's good to know that if it does happen, your data is most likely safe.
I could also have customized the checksum in btrfs by patching it to use xxHash.
It should be noted, though, that x86 and aarch64 both offer hardware-accelerated CRC32 instructions, so I decided to stay with btrfs's default crc32c instead. If you're using another architecture (such as arm32) that doesn't implement CRC32 in hardware, you might want to try out xxHash.
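Whether a given CPU actually advertises a CRC32 instruction can be checked from /proc/cpuinfo: arm and aarch64 kernels list "crc32" in the Features line, and x86 kernels list "sse4_2" (which includes the CRC32C instruction) in the flags line. A small sketch, with has_hw_crc32 being a name I made up:

```shell
# Hypothetical helper: succeed if a /proc/cpuinfo-style file advertises a
# hardware CRC32 instruction ("crc32" on arm/aarch64, "sse4_2" on x86).
# Usage: has_hw_crc32 [FILE]   (FILE defaults to /proc/cpuinfo)
has_hw_crc32() {
    awk -F: '
        /^(Features|flags)/ {
            n = split($2, f, " ")
            for (i = 1; i <= n; i++)
                if (f[i] == "crc32" || f[i] == "sse4_2") found = 1
        }
        END { exit !found }
    ' "${1:-/proc/cpuinfo}"
}

if has_hw_crc32 2>/dev/null; then
    echo "hardware CRC32 available"
else
    echo "no hardware CRC32 reported"
fi
```

This only tells you what the CPU reports; whether the kernel was built with the accelerated crc32c driver is a separate question.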
Compression
btrfs can do fully transparent file compression which, to my surprise, is a feature that's quite hard to find in other filesystems. My guess is that it's because most compression methods don't allow random access and modification. squashfs, for example, is read-only because of this limitation.
Apparently, compression in btrfs depends on the CoW design, as disabling CoW (nodatacow or FS_NOCOW_FL) also disables compression.
Full filesystem-level compression can be useful as it allows:
- Better sequential performance
- More usable space
Wait, what? Compressing files makes performance better?
When the CPU can (de)compress the data faster than the underlying storage reads/writes data, yes. Very much yes.
Files like documents compress very well, while already-compressed files like .jpg, .mp4 and .mkv don't.
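The difference is easy to see without btrfs at all. The sketch below uses gzip as a stand-in (not lz4, and not btrfs's own code path), with repetitive text playing the role of a document and random bytes playing the role of already-compressed media. compressed_size and sample_text are hypothetical helper names.

```shell
# Hypothetical helper: print the gzip-compressed size, in bytes, of stdin.
compressed_size() {
    gzip -c | wc -c | tr -d ' '
}

# Repetitive text standing in for a document.
sample_text() {
    awk 'BEGIN { for (i = 0; i < 2500; i++)
                     print "the quick brown fox jumps over the lazy dog" }'
}

orig=$(sample_text | wc -c | tr -d ' ')
text_size=$(sample_text | compressed_size)
# Random bytes standing in for .jpg/.mp4 content, same length as the text.
rand_size=$(head -c "$orig" /dev/urandom | compressed_size)

echo "text:   $orig -> $text_size bytes"
echo "random: $orig -> $rand_size bytes"
```

The text shrinks to a tiny fraction of its size, while the random data does not shrink at all.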
However, compression can hurt random I/O, as compression algorithms are sequential. This means you might have to decompress an entire block in order to access data at a random offset. (I need to look up how compression is implemented in btrfs for more details.)
People might find the "more usable space" part more interesting.
It's obvious that compression on a filesystem will get you more usable space.
But the question is how much?
I've transferred the data in my f2fs-formatted daily driver to lz4-enabled btrfs to see how much space it actually takes.
(Keep in mind that calculating free space in btrfs is a bit tricky.)
root@localhost:/# btrfs fi df /android/data
Data, single: total=34.00GiB, used=33.66GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=1.00GiB, used=425.84MiB
GlobalReserve, single: total=66.58MiB, used=0.00B
root@localhost:/# du -sh --apparent-size /android/data
38G /android/data
So it turns out that btrfs managed to save about 4GB out of 38GB of data with lz4.
My phone's DCIM directory takes up about 6.5GB, and the files under that directory most likely aren't compressed at all. If we factor that out, btrfs saved 4GB out of 31.5GB.
About 13% savings doesn't sound too bad.
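The arithmetic behind that figure, as a quick sketch (savings_pct is a hypothetical helper, and the inputs are the rounded numbers used above):

```shell
# Hypothetical helper: percentage saved once EXCLUDED (data assumed to be
# incompressible) is left out of the total.
# Usage: savings_pct APPARENT SAVED EXCLUDED   (all in the same unit)
savings_pct() {
    awk -v a="$1" -v s="$2" -v x="$3" \
        'BEGIN { printf "%.1f\n", 100 * s / (a - x) }'
}

savings_pct 38 4 0      # whole partition: prints 10.5
savings_pct 38 4 6.5    # DCIM left out:   prints 12.7
```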
It might also be interesting to try out zstd with btrfs. Although it's a bit slower than lz4, it has a much higher compression ratio. Facebook has already cooked up patches that add a zstd module to the Linux kernel and hook it up to btrfs.
(You might be saying: "Wouldn't the extra CPU usage cause a hit on battery life?"
Maybe yes, but these days OEMs are doing even crazier stuff like compressing RAM (with zswap or zram). I/O to RAM is obviously much more frequent and expensive than I/O to disk.
Since this "even crazier idea" is being shipped on many production devices, maybe it makes sense to try out compression on disk.)
Snapshots
Ah, snapshots are very interesting :)
But this deserves to be in another detailed post.
We'll look into using snapshots in btrfs in the next post.
If I don't use snapshots, it doesn't write by COW. right?
Incorrect. btrfs uses CoW by default unless explicitly disabled.