Utilizing btrfs snapshots to protect Android from malware
Let's continue our ride with btrfs.
The snapshot feature is probably the most eye-catching one in btrfs.
As the name implies, it lets you revert the filesystem to an older state.
Since btrfs uses copy-on-write (CoW), which always writes modified data to a new location instead of updating it in place, this feature makes a lot of sense.
If you're a sane person, the first thing that pops into your head should be: BACKUPS!
But keep in mind that snapshots are not the same as backups.
- You don't get any additional redundancy.
If hardware fails, you lose all snapshots.
- Malware with root permissions can still mess around with snapshots.
However, snapshots could be great as a backup solution on Android.
- You really don't need redundancy.
How many NAND failures have you seen on a smartphone (excluding LG, cough cough)?
- Most data destruction is caused by the owner.
- Most malware is also installed by the owner.
- Most malware doesn't have root permission.
Thanks to SELinux, malware that exploits root permission has been reduced drastically, which means most malware can't touch system-implemented btrfs snapshots.
Android is an open platform. Unlike on iOS, users are allowed to install 3rd-party apps from outside Google Play. While antivirus solutions do a pretty good job of detecting and uninstalling malware, none of them can protect users from storage-destroying malware such as ransomware.
The importance and necessity of a good backup solution for Android will only grow as time goes on.
There are several reasons for this :
- No antivirus can safely protect you from ransomware.
Even existing antivirus solutions on PCs create some sample files and wait until those get tampered with. If the files get tampered with, they force-stop all user programs to prevent further damage. This method is very ineffective: ransomware can improve its target selection to work around it completely.
- Antivirus on Android runs at the same permission level as the malware.
Unless it is implemented by the system, this limits what the antivirus can do.
To properly protect against these kinds of attacks, a snapshot solution is the way to go.
I decided to come up with a (somewhat immature) complete solution to this.
Disclaimer - the projects mentioned in this post have a lot of room for improvement and are currently not ready to be included in a production device. Please see these as a "very good proof-of-concept".
So the first step was..
Detecting storage destruction - corruptd
GitHub - https://github.com/arter97/corruptd
Users will notice storage destruction quite easily. However, magically detecting it would be better. So I decided to write a system daemon in C that detects storage corruption.
corruptd stands for "corruption detector/daemon".
The main goal is to detect storage corruption efficiently. I'm proud of the fact that it only takes ~5 seconds to scrub through ~3000 photos and detect corruption with almost no CPU usage and super low RAM usage. Let me break it down for you.
1. Using inotify
Although it has the "i" prefix, it's not from Apple. inotify is a set of APIs provided by the Linux kernel (not POSIX, so it's not portable) that notifies a userspace program when a watched file or directory changes.
Since it's event-driven rather than polling-based, it's very efficient.
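A minimal sketch of this kind of inotify setup, not corruptd's actual code. The function names watch_directory() and drain_events() are hypothetical, and watching a whole directory for IN_DELETE and IN_CLOSE_WRITE is an assumption based on the events discussed in this post. The buffer alignment attribute is GCC/Clang-specific.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Watch one directory for deleted files and completed writes.
 * Returns the inotify file descriptor, or -1 on failure. */
int watch_directory(const char *path)
{
    assert(path != NULL);

    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, path, IN_DELETE | IN_CLOSE_WRITE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Sleep in read() until the kernel reports events, then print them.
 * Returns the number of events handled, or -1 on a read error. */
int drain_events(int fd)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(fd, buf, sizeof(buf));
    int count = 0;

    if (len <= 0)
        return -1;
    for (char *p = buf; p < buf + len;) {
        const struct inotify_event *ev = (const struct inotify_event *)p;

        printf("%s: %s\n",
               (ev->mask & IN_DELETE) ? "deleted" : "written",
               ev->len ? ev->name : "(watched directory)");
        count++;
        p += sizeof(*ev) + ev->len; /* events are variable-length */
    }
    return count;
}
```

In a daemon, drain_events() would sit in a loop. The process sleeps inside read() and uses no CPU until something in the watched directory changes.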
2. Using crc32 hash on a selective part of a file
As I mentioned in my previous post, crc32 is hardware-accelerated on aarch64, and the ROM I'm using makes use of that.
If ransomware encrypts a file, it's pretty much guaranteed that even a small portion of the file will change, which means we don't have to scan the entire file to see if it has changed.
I ran a benchmark to see how big the sample size has to be in order to be confident enough.
/*
 * CRC_BUF's performance & confidence level table
 * Benchmarked on an aarch64 Android device with ~3000 photos
 *
 *  CRC_BUF :  Seconds  Duplicates
 *      128 :   1.541s          18
 *      256 :   1.553s          17
 *      512 :   1.583s          17
 *     1024 :   1.604s          16
 *     2048 :   1.632s          16
 *     4096 :   1.656s          17
 *     8192 :   1.633s          16
 *    16384 :   1.705s          15
 *    32768 :   1.955s          12
 *    65536 :   2.397s          10
 *   131072 :   2.979s           4
 *   262144 :   4.670s           1
 *   524288 :   6.715s           1
 *  1048576 :   8.932s           1
 *  2097152 :  12.708s           1
 */
I wrote a shell script that iterates through the DCIM directory and calculates crc32 with the requested size 20 times in a loop, then calculates the average time and the number of duplicate hashes.
Looks like 256K (262144) is a pretty good balance of performance (time) and reliability (duplicates).
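Hashing only one chunk of a file could look roughly like the sketch below. It is an illustration, not corruptd's code: crc32_update() is a plain bitwise software CRC-32 swapped in for portability (the hardware-accelerated path is not shown), and the names crc32_file_chunk() and CRC_BUF as a macro are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CRC_BUF 262144 /* sample size picked from the benchmark table */

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Slow but dependency-free; a real daemon would use an accelerated one. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hash at most `chunk` bytes of `path`, starting at `offset`.
 * Stores the hash in *out and returns 0, or returns -1 on error. */
int crc32_file_chunk(const char *path, long offset, size_t chunk,
                     uint32_t *out)
{
    assert(out != NULL);
    if (chunk == 0)
        return -1;

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    unsigned char *buf = malloc(chunk);
    int ret = -1;

    if (buf && fseek(f, offset, SEEK_SET) == 0) {
        size_t got = fread(buf, 1, chunk, f); /* may be short near EOF */
        if (!ferror(f)) {
            *out = crc32_update(0, buf, got);
            ret = 0;
        }
    }
    free(buf);
    fclose(f);
    return ret;
}
```

A caller would pass CRC_BUF as the chunk size and remember the returned hash together with the offset, so the same region can be re-hashed later.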
However, there's a problem. If we calculate the hash at a hardcoded offset, the ransomware can choose not to encrypt the specified chunks of a file.
To overcome this potential security hole, corruptd generates a random offset for each file from /dev/urandom.
One thing to note here, for performance, is that the offset should preferably be a multiple of PAGE_SIZE. If not, the underlying storage might have to read the previous block as well in order to read from the offset (more overhead).
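One way to pick such an offset is sketched below, under a few assumptions: the function name pick_offset() is hypothetical, the page size is taken from sysconf() rather than a compile-time PAGE_SIZE, and files no larger than one chunk simply get offset 0. The small modulo bias is ignored here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Return a random page-aligned offset with offset + chunk <= file_size.
 * Falls back to 0 for small files or if /dev/urandom cannot be read. */
uint64_t pick_offset(uint64_t file_size, uint64_t chunk)
{
    long ps = sysconf(_SC_PAGESIZE);
    uint64_t page = ps > 0 ? (uint64_t)ps : 4096;
    uint64_t rnd = 0;

    assert(chunk > 0);
    if (file_size <= chunk)
        return 0;

    /* Number of page-aligned positions where a whole chunk still fits. */
    uint64_t slots = (file_size - chunk) / page + 1;

    FILE *f = fopen("/dev/urandom", "rb");
    if (f) {
        if (fread(&rnd, sizeof(rnd), 1, f) != 1)
            rnd = 0;
        fclose(f);
    }
    return (rnd % slots) * page;
}
```

Because the result is always a whole number of pages, a read starting there never straddles a page boundary at its start.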
3. Contiguous and efficient memory allocation
Each file must be watched by corruptd. I came up with "struct watch_file", which consists of 3 64-bit members. So each struct is (theoretically) 24 bytes, and corruptd can monitor 5000 files with just about 117 KiB of memory (5000 x 24 = 120000 bytes).
(Obviously, it'll take more than that at runtime due to linking against bionic, inotify, etc.)
Using a linked list and calling malloc() for every file is very inefficient as it causes memory fragmentation. So I decided to bulk-malloc up front and call realloc() whenever corruptd is about to grow to the next bulk.
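The bulk allocation idea can be sketched as follows. Only the "three 64-bit members" part comes from the post; the member names, the BULK value, struct watch_list and watch_list_push() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BULK 1024 /* entries added per realloc(); hypothetical value */

/* Three 64-bit members, so 24 bytes with no padding.
 * The member names are guesses, not taken from corruptd. */
struct watch_file {
    uint64_t id;     /* which file this entry tracks */
    uint64_t offset; /* where the sampled chunk starts */
    uint64_t crc;    /* hash of the sampled chunk */
};

/* One contiguous array instead of a linked list. */
struct watch_list {
    struct watch_file *items;
    size_t count;
    size_t capacity;
};

/* Append an entry, growing the array by one bulk when it is full.
 * Returns 0 on success, -1 if realloc() fails (the list stays valid). */
int watch_list_push(struct watch_list *l, struct watch_file wf)
{
    assert(l != NULL);
    if (l->count == l->capacity) {
        size_t new_cap = l->capacity + BULK;
        struct watch_file *p = realloc(l->items, new_cap * sizeof(*p));

        if (!p)
            return -1;
        l->items = p;
        l->capacity = new_cap;
    }
    l->items[l->count++] = wf;
    return 0;
}
```

Starting from a zero-initialized struct watch_list, 5000 pushes trigger only five realloc() calls with this BULK value, compared with 5000 separate malloc() calls for a linked list.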
4. LOOSE mode
If we are just detecting destruction, it's not necessary to watch every single file. Enabling LOOSE mode will store only up to "n" files. Currently that threshold "n" is set to 5000. corruptd will scan and skip certain files to ensure good distribution across the watched directory.
After a file is removed, the "missing hole" will be filled up with a different file.
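The post does not say how files are skipped, so the following is only one possible approach: a fixed stride that keeps every k-th file, which spreads the watched files evenly over the scan order. The names loose_stride() and loose_should_watch() are hypothetical, and refilling holes after a deletion is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest stride that keeps at most `limit` out of `total` files. */
size_t loose_stride(size_t total, size_t limit)
{
    assert(limit > 0);
    if (total <= limit)
        return 1; /* everything fits, watch every file */
    return (total + limit - 1) / limit; /* ceil(total / limit) */
}

/* Decide whether the file at position `index` in the scan is watched. */
int loose_should_watch(size_t index, size_t stride)
{
    return index % stride == 0;
}
```

With 12000 files and a limit of 5000, the stride is 3 and 4000 files end up being watched, which stays under the limit while still covering the whole directory.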
5. Detecting a destruction
inotify can tell corruptd if a certain file was deleted or modified.
Dealing with deletion is easy. corruptd just increments "int tempered" until it reaches a threshold.
In the case of modification (ransomware can encrypt the file in place), corruptd waits for the writing process to call close(), which inotify reports as an IN_CLOSE_WRITE event. Then, it recalculates the hash at the same offset as before. If the hash differs, "int tempered" is incremented.
After "int tempered" reaches a certain threshold (by default, 20% of the monitored files), corruptd will attempt to alert an Android app about it by executing an "am" (ActivityManager) command.
Since corruptd is intended to be a system daemon, this call works without permission denial.
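The counting logic above can be sketched like this. The 20% default and the "tempered" counter name come from the post; struct tamper_state, the function names and the use of size_t instead of int are assumptions, and the actual "am" invocation is left out because its arguments are not given here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAMPER_PERCENT 20 /* default threshold from the post */

struct tamper_state {
    size_t watched;  /* number of monitored files */
    size_t tempered; /* deleted or modified files so far */
};

/* Count one destroyed file (used directly for deletions).
 * Returns 1 once the threshold is reached, 0 otherwise. */
int tamper_record(struct tamper_state *s)
{
    assert(s != NULL);
    s->tempered++;
    return s->tempered * 100 >= s->watched * TAMPER_PERCENT;
}

/* Called after IN_CLOSE_WRITE with the old and the recalculated hash.
 * An unchanged hash is not counted. Returns 1 once the threshold is hit. */
int tamper_on_close_write(struct tamper_state *s, uint32_t old_crc,
                          uint32_t new_crc)
{
    if (old_crc == new_crc)
        return 0;
    return tamper_record(s);
}
```

With 3000 watched files, the 600th deleted or changed file is the one that makes these functions return 1, which is the point where the daemon would fire its alert.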
6. CPU & RAM usage
Because corruptd is written in C and lightweight, it should be interesting to see how many resources it actually takes.
I picked up some sample 4K footage from the web and extracted the frames to fill up the gallery with the following command :
ffmpeg -i SES.Astra.UHD.Test.2.2160p.UHDTV.HEVC.x265-LiebeIst.mkv -q:v 1 image%5d.jpg
The images were 2.8GB in total.
During the file transfer over MTP, you can clearly see that corruptd barely takes any CPU time while the MediaScanner is eating the CPU like crazy.
After about 3000 photos were added to the list, I ran ps in order to see the memory usage.
Cool, it only takes about 5 MB of memory.
corruptd has a lot of room to improve by using some compiler optimizations and debugging through valgrind, but nonetheless, it's nice to see that the very first, functional version is already this light.
Managing snapshots - Snapshot manager
Since interacting with corruptd and btrfs snapshots through adb and shell commands is clearly not user-friendly, I decided to also come up with a simple Android app that manages snapshots.
This app is also intended to be a system app, but for ease of debugging, it currently just calls "su -c" whenever it executes btrfs commands.
Making a snapshot :
btrfs subvolume snapshot /data /data/btrfs-snapshot/xyz
Deleting a snapshot :
btrfs subvolume delete -c /data/btrfs-snapshot/xyz
Restoring to a snapshot (here xyz stands for the snapshot's subvolume ID, as listed by "btrfs subvolume list /data") :
btrfs subvolume set-default xyz /data
Planned features :
- Proper clean-up of old snapshots
- Automatic snapshots
- R/O snapshots for better security
corruptd calls Snapshot manager whenever it detects a storage destruction, and it currently looks like this :
When the user taps on that notification, Snapshot manager asks the user to select a snapshot.
Restoring to a snapshot just sets the default subvolume to the specified snapshot, so the restoration is also done instantly. The user just has to reboot after selecting a snapshot.
Demo
Both corruptd and Snapshot manager are demoed in this video :
A storage destruction is simulated by a simple "rm -rf /sdcard/DCIM/*" command.
After corruptd detects it (almost instantly), Snapshot manager wakes up and prompts the user.
When a reboot is done, you can see the pictures are fully restored.
Conclusion
While btrfs is still in the "testing" phase for Android, it was quite interesting to get my hands on it early and experiment with it.
As you just saw, snapshots are very powerful and could be very useful for everyday users.
Ransomware is all the rage these days and it's only a matter of time before it hits Android (IMO).
People are still
- doing stuff like Googling pirated apps
- getting phished by a random dude sending a URL saying it's "your friend's wedding invitation"
and installing apks from untrusted sources without any caution whatsoever.
In the event of ransomware becoming a major threat, the solution I came up with could be useful.
I hope that I can improve the random I/O performance so that I can finally use and test btrfs as a daily driver as well.