Reducing power consumption on laptops running Linux

With Windows and MacBooks dominating the laptop market, it's extremely rare for a manufacturer to actually care about Linux.

Thankfully, the driver problem has mostly been taken care of. You should expect an almost 1:1 feature set on Linux compared to Windows. The only major exceptions are:
- Proprietary drivers from NVIDIA for dedicated GPUs
- Windows Hello devices for biometric authentication

The former is a whole other problem, and you should refrain from buying any laptop with a dedicated GPU if you want to use Linux as the primary operating system.
Even if you manage to get semi-workable power management applied to a dedicated GPU, I have yet to see one that's comparable to laptops with only an iGPU.

Always remember:


Even if you own a laptop with a dGPU, this article will still help you to some degree.
However, I'd still recommend getting PCIe runtime PM working on the dGPU by spending several more hours Googling other tips and articles.


Be sure to run the terminal commands from this article as root (by using 'sudo -s' or 'su').
Also, this article was written on Ubuntu 18.04. Other versions of Ubuntu can safely follow the same instructions, but on other distros, you may have to use different paths for things to work properly.

It's recommended to reboot after each step, so that any issues that arise are easier to troubleshoot.

Let's begin.


1. Install the latest kernel


This whole article doesn't matter if the drivers handling our requests are faulty.
To make sure the drivers are up-to-date, install the latest kernel.

Think of installing the latest kernel as the equivalent of installing the latest drivers on Windows.

"Meh, drivers don't matter much."
Think again.

If your CPU isn't properly supported, intel_pstate will fail to put your CPU to idle.
If your GPU isn't properly supported, i915 will fail to put your GPU into the RC6 (idle) state.
If your put_any_device_from_your_laptop_here isn't properly supported, the driver will fail to put your device into an idle state.

One frustrating thing about package power states is that if even one of the devices in your laptop fails to idle properly, the whole package will fail to enter deeper idle states.

The method for installing the latest kernel differs from distro to distro.
For Ubuntu, using Ukuu (Ubuntu Kernel Update Utility) is probably the easiest method.

OMG! Ubuntu! has a nice guide on using Ukuu.

It is probably a good idea to avoid -rc releases, as those can seriously break system stability. You might even suffer filesystem corruption from using -rc kernels.

Also, if you didn't know, upgrading the kernel of a Linux distro is not like upgrading Ubuntu 16.04 to 18.04. You are just upgrading the core part of the operating system, so you should not expect any functional differences from upgrading the kernel.
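To double-check which kernel you actually booted, 'uname -r' prints the running release. The snippet below is only a sketch: is_rc_kernel is a made-up helper name, and it assumes that release candidates carry "rc" followed by a digit somewhere in the release string (for example 4.18.0-041800rc5-generic on Ubuntu mainline builds).

```shell
# Hypothetical helper: succeeds if the given kernel release string looks like
# a release candidate (assumption: "rc" followed by a digit appears in it).
is_rc_kernel() {
  case "$1" in
    *rc[0-9]*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Usage on a real system:
#   if is_rc_kernel "$(uname -r)"; then echo "Running an -rc kernel"; fi
```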


2. Install the latest firmware


Some manufacturers are still uncomfortable with open-sourcing their entire device driver. They tend to ship additional firmware alongside the kernel, which the driver picks up later during initialization. So just as running the latest kernel is important, using the latest firmware is important too.

Most distros have their own method of shipping device firmware (in the case of Ubuntu, there is the "linux-firmware" package). However, just as distros don't always ship the latest kernel, those firmware packages also become outdated very quickly.

We're going to use a brute-force method of installing the Linux firmware: straight from the official Git repository.
Following these instructions may even get some devices (such as Bluetooth or Wi-Fi) working properly.

cd /lib/firmware
git init
git remote add origin git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
git fetch origin # This may take some time
git reset --hard origin/master
update-initramfs -u -k all

Repeat the last 3 commands from time to time, ideally together with a kernel upgrade.

If your internet connection to git.kernel.org is slow, use Google's mirror:
git remote rm origin
git remote add origin https://kernel.googlesource.com/pub/scm/linux/kernel/git/firmware/linux-firmware


3. Enable GuC and HuC loading for i915


Installing the latest firmware with the instructions above also installs the GuC and HuC firmware used by the i915 graphics module.
Most of the power saving and GPU scheduling has been offloaded to this firmware, so it's critical to make sure that i915 is actually using it.

Fortunately, the steps for this are easy.

Add a new text file at /etc/modprobe.d/i915.conf with:
options i915 enable_guc=3

Then, run 'update-initramfs -u -k all' to regenerate initramfs with the new i915 option.

After a reboot, you can check whether the new firmware is in action with 'dmesg | grep i915':
[    0.595061] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[    0.603581] [drm] HuC: Loaded firmware i915/kbl_huc_ver02_00_1810.bin (version 2.0)
[    0.615365] [drm] GuC: Loaded firmware i915/kbl_guc_ver9_39.bin (version 9.39)
[    0.625914] i915 0000:00:02.0: GuC firmware version 9.39
[    0.625916] i915 0000:00:02.0: GuC submission enabled
[    0.625918] i915 0000:00:02.0: HuC enabled
[    0.628481] [drm] Initialized i915 1.6.0 20180308 for 0000:00:02.0 on minor 0

If you're more adventurous, feel free to try out other power saving options such as enable_psr (Panel Self Refresh).

For me personally, I'm using "options i915 enable_guc=3 enable_psr=2".
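If you don't want to eyeball the log after every kernel or firmware update, the check can be scripted. This is only a sketch: check_guc_huc is a made-up helper name, and it assumes the log keeps the "GuC: Loaded firmware" and "HuC: Loaded firmware" wording shown above, which may differ between kernel versions.

```shell
# Hypothetical helper: reads kernel log text on stdin and reports whether the
# GuC and HuC "Loaded firmware" lines are present.
# Returns non-zero if either one is missing.
check_guc_huc() {
  log=$(cat)
  status=0
  for fw in GuC HuC; do
    if printf '%s\n' "$log" | grep -q "$fw: Loaded firmware"; then
      echo "$fw firmware loaded"
    else
      echo "$fw firmware NOT loaded"
      status=1
    fi
  done
  return "$status"
}

# Usage (as root):
#   dmesg | check_guc_huc
```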


4. Install TLP


By default, the Linux kernel uses very conservative power management configurations to avoid issues out of the box. TLP is a system utility that tries to apply more sensible power management configurations, and it is constantly maintained for new quirks and issues.

Add the TLP PPA to install the latest version of TLP:
add-apt-repository ppa:linrunner/tlp
apt update
apt install tlp

The configuration file for TLP will be installed to /etc/default/tlp for you to customize, but most users will be fine with the default.


5. Tricking the BIOS into thinking Linux is Windows


Most BIOS implementations behave differently depending on the operating system.
BIOS makers tried to detect Linux and disable some devices and features that were known to work improperly with it, but that was later considered a roadblock that kept Linux from actually fixing those issues.

Nowadays, it's better to just trick the BIOS into thinking that it's running Windows, unless you're using a Linux-optimized laptop.

We need to determine which Windows version is known to the BIOS.
Run the following commands to dump the ACPI tables.

apt install acpica-tools iasl
acpidump > acpi.log
acpixtract acpi.log
iasl -d dsdt.dat

iasl will decompile the DSDT table into a human-readable format.
Open dsdt.dsl with a text editor and search for "Windows".
It'll look something like this:
                If (_OSI ("Windows 2012"))
                {
                    OSYS = 0x07DC
                }

                If (_OSI ("Windows 2013"))
                {
                    OSYS = 0x07DD
                }

                If (_OSI ("Windows 2015"))
                {
                    OSYS = 0x07DF
                }
Using the latest Windows version recognized by the BIOS is good practice.
See here if you're wondering what all those numbers mean.
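Instead of scrolling through dsdt.dsl by hand, the _OSI strings can be pulled out with grep. A minimal sketch, assuming the strings follow the year-based "Windows 20xx" naming shown above so that a plain text sort puts the newest one last (latest_osi_windows is a made-up helper name):

```shell
# Hypothetical helper: prints the newest "Windows <year>" string that the
# decompiled DSDT passes to _OSI. $1 is the path to dsdt.dsl.
# Assumption: the names sort correctly as plain text (year-based naming).
latest_osi_windows() {
  grep -o '_OSI ("Windows [^"]*")' "$1" \
    | sed -e 's/^_OSI ("//' -e 's/")$//' \
    | LC_ALL=C sort -u \
    | tail -n 1
}

# Usage:
#   latest_osi_windows dsdt.dsl
```

Dropping the final 'tail -n 1' lists every Windows version the BIOS knows about.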

In the case of my Kaby Lake-R laptop, the BIOS behaves the same for all Windows 10 versions (RS1, RS2, RS3, RS4...).

Let's trick the BIOS into thinking that it's running Windows 2015.
This is done by passing some parameters on the Linux command line (cmdline), which changes how the kernel behaves. You can check your current cmdline with 'cat /proc/cmdline'.

Open /etc/default/grub and add
acpi_osi=\"Windows 2015\" acpi_osi=!
to GRUB_CMDLINE_LINUX_DEFAULT.


6. Enable ASPM


While we're modifying the cmdline, let's also force Linux to use ASPM.

Phoronix article regarding Linux ASPM
Since version 2.6.38, the Linux kernel defaults to disabling ASPM if it finds any mis-implementation in the BIOS regarding ASPM. While that might make sense as a default, there are too many BIOSes with said issue.

I've been forcing Linux to use ASPM for years, on my own and my friends' laptops, and I have yet to hit an issue except for one case, which was the fault of Thunderbolt power management. Disabling Thunderbolt in the BIOS fixed that particular case.

Not using ASPM will cause the package C state to stay in PC2 or PC3, which may cause a ~40% increase in power consumption.

Add
pcie_aspm=force pcie_aspm.policy=powersupersave
to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub.

You can also try "powersave" instead of "powersupersave" if you find that powersupersave causes issues.
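After a reboot, the kernel reports the available ASPM policies in /sys/module/pcie_aspm/parameters/policy, with the active one in square brackets. A small sketch for pulling out the active policy (active_aspm_policy is a made-up helper name):

```shell
# Hypothetical helper: extracts the active policy (the one in brackets) from
# a line such as "default performance powersave [powersupersave]".
active_aspm_policy() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Usage:
#   active_aspm_policy "$(cat /sys/module/pcie_aspm/parameters/policy)"
```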


7. Adjusting DRM vblank off delay


It is well known that i915 devices can safely use the lowest vblank off delay, which reduces wakeup events.

Let's add "drm.vblankoffdelay=1" to GRUB_CMDLINE_LINUX_DEFAULT as well. After editing /etc/default/grub, run 'update-grub' so that the new cmdline is used on the next boot.
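To confirm after a reboot that the parameters from steps 6 and 7 actually reached the kernel, a check along these lines may help. It's only a sketch, and missing_cmdline_params is a made-up helper name. It compares whole space-separated words, so the quoted acpi_osi="Windows 2015" entry is easier to check by eye in 'cat /proc/cmdline'.

```shell
# Hypothetical helper: prints every expected parameter that is absent from the
# given kernel command line.
# $1 is the cmdline string, the remaining arguments are the parameters.
missing_cmdline_params() {
  cmdline=" $1 "
  shift
  for param in "$@"; do
    case "$cmdline" in
      *" $param "*) ;;           # present as a whole word, nothing to report
      *) echo "$param" ;;
    esac
  done
}

# Usage:
#   missing_cmdline_params "$(cat /proc/cmdline)" \
#     pcie_aspm=force pcie_aspm.policy=powersupersave drm.vblankoffdelay=1
```

No output means all of the listed parameters are in effect.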

This is how /etc/default/grub looks on my laptop:




8. Remove unused USB devices


USB is not just for external peripherals. There are internal USB devices too, such as the webcam, the Bluetooth adapter and more.

While USB devices technically have autosuspend functionality, it's still common to see devices with a mis-implemented autosuspend. And if there is a device that you know you won't be using, it's better to just disable it outright rather than rely on autosuspend to do its job.

Run 'lsusb' and 'lsusb -v' to see which USB devices you have. It's probably a good idea to disconnect any external peripherals first, so that only the internal devices are listed.

In my case, "ID 0bda:562e" is a webcam and "ID 8087:0a2b" is a Bluetooth adapter, which I know I won't be using.

To disable/remove USB devices, we need to write a script that'll run upon boot and resume, as USB devices are re-added upon resuming from suspend.

Let's write one to /etc/suspend.sh.

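#!/bin/bash

# Send all output to the kernel log so that it shows up in dmesg
exec > /dev/kmsg 2>&1

# Wait briefly before scanning sysfs
sleep 3
find /sys -name idProduct | while read -r file; do
  if grep -q '562e\|0a2b' "$file"; then
    echo "Removing $(dirname "$file")"
    echo 1 > "$(dirname "$file")/remove"
  fi
done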

The part you need to change is at "grep -q '562e\|0a2b'".
The syntax is simple, just append new devices followed by "\|":
grep -q '0000\|1111\|2222\|3333\|4444'

The 'exec' line makes the script print to the kernel log buffer, which is later accessible with a simple 'dmesg' command.
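Before letting the script remove anything, it may be worth doing a dry run that only lists what would match. The following is a sketch rather than part of the original script: list_usb_matches is a made-up helper name, and it matches the whole idProduct value instead of a substring.

```shell
# Hypothetical dry-run helper: prints the sysfs directory of every device whose
# idProduct equals one of the given IDs, without removing anything.
# $1 is the directory to search, the remaining arguments are product IDs.
list_usb_matches() {
  root=$1
  shift
  pattern=$(IFS='|'; echo "$*")   # join the IDs, e.g. "562e|0a2b"
  find "$root" -name idProduct 2>/dev/null | while read -r file; do
    if grep -qE "^($pattern)\$" "$file"; then
      dirname "$file"
    fi
  done
}

# Usage:
#   list_usb_matches /sys/devices 562e 0a2b
```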

Give the new script permission for execution:
chmod 755 /etc/suspend.sh

Now, for it to execute on boot, let's use crontab:

Add "@reboot /etc/suspend.sh" to the crontab file, which is accessible with the 'crontab -e' command (make sure to run this as root!).

If you need to access those blacklisted USB devices, just remove the script and suspend/resume your laptop.

Now we need to execute the script upon resume. Let's write a systemd service for that:
Create a new file at /lib/systemd/system/laptop_suspend.service and add:

[Unit]
Description=Laptop suspend
Before=sleep.target
StopWhenUnneeded=yes

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStop=/etc/suspend.sh

[Install]
WantedBy=sleep.target

After that, enable the service with:
systemctl enable laptop_suspend

You can validate whether it works by checking kernel log with dmesg:
[10838.133311] Removing /sys/devices/pci0000:00/0000:00:14.0/usb1/1-5
[10838.296408] usb 1-5: USB disconnect, device number 2
[10838.310044] Removing /sys/devices/pci0000:00/0000:00:14.0/usb1/1-6
[10838.454298] usb 1-6: USB disconnect, device number 3

9. Disable unused PCIe peripherals


The reasoning is the same as above.
Only this time, it is much more critical, as PCIe devices can directly impact the package C state.

Run 'lspci | grep -v 00:' to see which PCIe peripherals you have in your laptop.
The reason for the 'grep -v 00:' is that PCI devices that start with 00: are usually PCI bridges or controllers.

In my case, 01:00.0 and 02:00.0 are NVMe SSDs, 03:00.0 is a PCIe microSD reader, and 04:00.0 is a Wi-Fi card.

I can disable 03:00.0 as I won't be using it anyway, and I later confirmed it to be a package C state blocker: disabling the device let my laptop enter PC8, whereas it was limited to PC6 before.
If you don't have any unused PCIe devices (which is very common), skip this step.

Disabling PCIe peripherals is a more straightforward process. We can just prevent the device driver responsible for the device from loading.

'lspci -v' will print which kernel modules are responsible for each device. In my case, rtsx_pci was responsible. 'lsmod | grep rtsx_pci' showed some other modules that cooperate with rtsx_pci.

Blacklisting all of those modules will prevent them from loading.

Create a new .conf file in /etc/modprobe.d (the file name doesn't matter), and add "blacklist " in front of every module name.
In my case, I had to write
blacklist rtsx_pci_ms
blacklist rtsx_pci_sdmmc
blacklist rtsx_pci
to /etc/modprobe.d/blacklist-rtsx.conf

After that, regenerate the initramfs, as the initramfs (which is loaded before the root filesystem is actually mounted) can contain some of those modules:
update-initramfs -u -k all
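If there are several modules to blacklist, the file contents can be generated rather than typed. A tiny sketch (make_blacklist is a made-up helper name; the module names are the ones from my laptop):

```shell
# Hypothetical helper: prints one "blacklist <module>" line per argument.
make_blacklist() {
  printf 'blacklist %s\n' "$@"
}

# Usage (as root):
#   make_blacklist rtsx_pci_ms rtsx_pci_sdmmc rtsx_pci \
#     > /etc/modprobe.d/blacklist-rtsx.conf
```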


10. Undervolt


A very straightforward and easy guide is available here.

I'd recommend that everyone spend a few days finding the optimal voltage for their CPU/GPU, as it directly impacts power consumption.

The average offset I found across about a dozen laptops is -80 mV (-0.080 V).
If you're lucky like me, you may even get away with -180 mV (-0.180 V), which is YUGE!

While the steps for this are easy, it's unfortunately the most time-consuming one, as you also have to run stability tests and fine-tune the voltage, which is very specific to your laptop.

After finding the best voltage for your laptop, write the wrmsr commands to the /etc/suspend.sh script we made earlier as MSR registers also get reset after suspend.
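For reference, here is a sketch of how a millivolt offset could be turned into the value for wrmsr. It assumes the encoding commonly documented for MSR 0x150 on recent Intel CPUs (the offset in 1/1024 V steps, stored as an 11-bit two's complement number in bits 21 to 31), and undervolt_msr_value is a made-up helper name. Check the result against the guide you're following before writing anything to the MSR.

```shell
# Hypothetical helper: prints the 64-bit value to write to MSR 0x150.
# $1 is the voltage plane (assumption: 0 = CPU core, 1 = GPU, 2 = CPU cache),
# $2 is the offset in millivolts as a non-positive integer (e.g. -80).
undervolt_msr_value() {
  plane=$1
  mv=$2
  # Convert millivolts to 1/1024 V steps, rounded to the nearest step.
  steps=$(( (mv * 1024 - 500) / 1000 ))
  # Keep the low 11 bits (two's complement) and move them to bits 21-31.
  field=$(( (steps & 0x7FF) << 21 ))
  printf '0x80000%X11%08X\n' "$plane" "$field"
}

# Usage (as root, with msr-tools installed and the msr module loaded):
#   wrmsr 0x150 "$(undervolt_msr_value 0 -80)"
```

With this encoding, an offset of -100 mV on plane 0 comes out as 0x80000011F3400000.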


11. Check the results with PowerTOP


PowerTOP is a utility provided by Intel to diagnose power consumption on Linux systems.
It also provides statistics of C states and overall power consumption.

Install PowerTOP and run it:
apt install powertop
powertop

PowerTOP also gives additional statistics that help find out which userspace processes are consuming the most power.

The tabs you should be looking at the most are "Idle stats" and "Device stats".
The "Idle stats" tab will give you package C state reports and core C state reports.

If your laptop fails to enter PC8 (package C state 8), then darn it. Looks like you have to dig through the internet a bit more and figure out what's wrong with your system.

The "Device stats" tab will give you power consumption figures in watts.
Leave your laptop idle for a few seconds to read the lowest power consumption number.

I recommend leaving the "Tunables" tab as-is, as it's managed by TLP.


12. Share your results!


I find these steps to be sufficient for most laptops without dGPU, but I'd love to hear your stories :)



In my case, before these steps, the package C states were stuck at PC3. Now, it comfortably enters PC8.
Accordingly, the idle power consumption went from 5.9W to 3.9W.


That's a 34% improvement. And thanks to the nature of Linux, it's free from the mysterious CPU-eating processes that are easily found on Windows, resulting in better power consumption than Windows.
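If you want to work out the same figure for your own before/after numbers, it's a one-liner with awk. A quick sketch (percent_saved is a made-up helper name):

```shell
# Hypothetical helper: prints the percentage reduction from $1 (before)
# to $2 (after), rounded to one decimal place.
percent_saved() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Usage:
#   percent_saved 5.9 3.9
```

For my numbers, this prints 33.9, which rounds to the 34% above.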

Post your PowerTOP stats in the comments below!

Comments

  1. You should also try i915.enable_fbc=1 i915.enable_dc=2. For my laptop it makes a difference (0.5 -1 W)
