Thierry Escande
November 06, 2017
Reading time:
Kmemleak (Kernel Memory Leak Detector) allows you to track possible memory leaks inside the Linux kernel. Basically, it tracks dynamically allocated memory blocks in the kernel and reports those without any reference left and that are therefore impossible to free. You can check the kmemleak page for more details.
This post exposes real life use cases that I encountered while working on the NFC Digital Protocol stack.
kmemleak can be enabled in the kernel configuration under Kernel hacking > Memory Debugging.
[*] Kernel memory leak detector (4000) Maximum kmemleak early log entries < > Simple test for the kernel memory leak detector [*] Default kmemleak to off
I used to turn it off by default and enable it on demand by passing kmemleak=on to the kernel command line. If some leaks occur before kmemleak is initialized you may need to increase the "early log entries" value. I used to set it to 4000.
The sysfs interface of kmemleak is a single file located in /sys/kernel/debug/kmemleak. You can control kmemleak with the following operations:
Trigger a memory scan:
$ echo scan > /sys/kernel/debug/kmemleak
Clear the leaks list:
$ echo clean > /sys/kernel/debug/kmemleak
Check the possible memory leaks by reading the control file:
$ cat /sys/kernel/debug/kmemleak
I will not go deep regarding the various NFC technologies and the following examples will be based on NFC-DEP, the protocol used to connect 2 NFC devices and make them communicate through standard POSIX sockets. DEP stands for Data Exchange Protocol.
For the purpose of this post I'm using nfctool, a standalone command line tool used to control and monitor NFC devices. nfctool is part of neard, the Linux NFC daemon.
So let's start with an easy case.
When putting a NFC device in target polling mode, it listens for different modulation modes from a peer device in initiator mode. When I first used kmemleak I was surprised to see possible leaks reported by kmemleak while not even a single byte has been exchanged, simply by turning target poll mode on the nfc0 device.
$ nfctool -d nfc0 -p Target
A few seconds later, after a kmemleak scan using:
$ echo scan > /sys/kernel/debug/kmemleak
The following message appear in the syslog:
[11764.643878] kmemleak: 8 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
OK! Check the kmemleak sysfs file then:
$ cat /sys/kernel/debug/kmemleak unreferenced object 0xffff9be0f8f43a08 (size 8): comm "kworker/0:1", pid 41, jiffies 4297830116 (age 16.044s) hex dump (first 8 bytes): 01 fe d3 80 ca 41 f1 a0 .....A.. backtrace: [] kmemleak_alloc+0x4a/0xa0 [] kmem_cache_alloc_trace+0xf5/0x1d0 [] digital_tg_listen_nfcf+0x3b/0x90 [nfc_digital] [] digital_wq_poll+0x5d/0x90 [nfc_digital] [] process_one_work+0x156/0x3f0 [] worker_thread+0x4b/0x410 [] kthread+0x109/0x140 [] ret_from_fork+0x25/0x30 [] 0xffffffffffffffff
This gives the call stack where the allocation has been actually done. So let's have a look at digital_tg_listen_nfcf()...
int digital_tg_listen_nfcf(struct nfc_digital_dev *ddev, u8 rf_tech) { int rc; u8 *nfcid2; rc = digital_tg_config_nfcf(ddev, rf_tech); if (rc) return rc; nfcid2 = kzalloc(NFC_NFCID2_MAXSIZE, GFP_KERNEL); if (!nfcid2) return -ENOMEM; nfcid2[0] = DIGITAL_SENSF_NFCID2_NFC_DEP_B1; nfcid2[1] = DIGITAL_SENSF_NFCID2_NFC_DEP_B2; get_random_bytes(nfcid2 + 2, NFC_NFCID2_MAXSIZE - 2); return digital_tg_listen(ddev, 300, digital_tg_recv_sensf_req, nfcid2); }
The only allocation here is the nfcid2 array, passed to digital_tg_listen() as 4th parameter, a user argument supposed to be returned as a function argument to the callback digital_tg_recv_sensf_req() upon reception of a valid frame from the peer device or if a timeout error occurs (nobody on the other side is talking to us). After a quick check in digital_tg_recv_sensf_req() it appears that the user argument is not used at all and of course not released.
As I said, that one was easy. There was no need for the nfcid2 array to be allocated in the first place so the fix was pretty straightforward.
Now digital_tg_listen_nfcf() looks good:
int digital_tg_listen_nfcf(struct nfc_digital_dev *ddev, u8 rf_tech) { int rc; rc = digital_tg_config_nfcf(ddev, rf_tech); if (rc) return rc; return digital_tg_listen(ddev, 300, digital_tg_recv_sensf_req, NULL); }
The commit for this fix can be found here.
Another use case for leaks hunting was about un-freed socket buffers.
In the kernel, the fundamental data structure used to exchange data between layers in networking protocols is the socket buffer or SKB. This is not the purpose of this post but David S. Miller wrote this about how SKBs work and it's a very good starting point if you're are interested to know how protocol stacks deal with data encapsulation through their various layers.
In this case, the leak was occurring when sending large data buffers between 2 NFC devices, connected together using NFC-DEP. The protocol handles small size PDUs and when the sent data is too big, it gets split in several PDUs. The sender is responsible for chaining PDUs and the receiver reply to each chained PDUs with an acknowledge PDU.
Another development tool used in this example is the virtual NFC device. It is a driver of my own called nfcsim that declares 2 NFC devices acting as loopback devices. Once connected using NFC-DEP, everything sent from one device is received by the other one.
While working on chained PDUs here is what kmemleak reported:
unreferenced object 0xffff9be0f8bd4a00 (size 512): comm "kworker/0:0", pid 3074, jiffies 4299448025 (age 9.164s) hex dump (first 32 bytes): 00 56 bd 04 d4 06 41 ff 00 00 00 00 00 00 00 00 .V....A......... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [] kmemleak_alloc+0x4a/0xa0 [] __kmalloc_node_track_caller+0x196/0x2b0 [] __kmalloc_reserve.isra.36+0x31/0x90 [] __alloc_skb+0x78/0x1a0 [] skb_copy+0x40/0xb0 [] nfcsim_send+0xc0/0x180 [nfcsim] [] nfcsim_in_send_cmd+0x11/0x13 [nfcsim] [] digital_wq_cmd+0x1b4/0x1f0 [nfc_digital] [] process_one_work+0x156/0x3f0 [] worker_thread+0x4b/0x410 [] kthread+0x109/0x140 [] ret_from_fork+0x25/0x30 [] 0xffffffffffffffff
Here, the initiator device allocates a SKB and the nfcsim driver sends it to the target device through a kernel workqueue. The SKB is copied in nfcsim_send() with skb_copy(), then this copy is dispatched to the target device, and the original one is freed. The issue here is that we don't know where the original SKB was allocated in the first place and where it is supposed to be freed by the target.
Hopefully, the nfcsim driver can be tweaked a bit to not send back a copy of the initial SKB but rather the same SKB. It means that when the leak will be detected by kmemleak, we will see in which function it was allocated at first.
So in nfcsim_send(), instead of:
dev_kfree_skb(link->skb); link->skb = skb_copy(skb, GFP_KERNEL); dev_kfree_skb(skb); link->rf_tech = rf_tech; link->mode = mode;
We will simply have:
dev_kfree_skb(link->skb); link->skb = skb; link->rf_tech = rf_tech; link->mode = mode;
With this modification in the nfcsim drvier kmemleak now reports:
unreferenced object 0xffff9be0f9903100 (size 256): comm "kworker/1:4", pid 1614, jiffies 4300771597 (age 8.472s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [] kmemleak_alloc+0x4a/0xa0 [] kmem_cache_alloc_node+0x13a/0x1e0 [] __alloc_skb+0x4c/0x1a0 [] digital_skb_alloc+0x26/0x40 [nfc_digital] [] digital_in_send_ack+0x22/0x100 [nfc_digital] [] digital_recv_dep_data_gather+0x10b/0x52f [nfc_digital] [] digital_in_recv_dep_res+0x1c6/0x670 [nfc_digital] [] digital_wq_cmd_complete+0x8e/0x110 [nfc_digital] [] process_one_work+0x156/0x3f0 [] worker_thread+0x4b/0x410 [] kthread+0x109/0x140 [] ret_from_fork+0x25/0x30 [] 0xffffffffffffffff
Here, the initiator receives a DEP response in digital_in_recv_dep_res() and sends an acknowledge PDU through digital_in_send_ack() which allocates the unfreed SKB. This SKB is actually supposed to be freed in digital_tg_recv_dep_req().
Here is the code that handles ACK PDU in digital_tg_recv_dep_req(). Like other functions in the NFC stack, it is responsible for freeing the resp SKB passed to it if an error occurs. In fact, most of the code paths for ACK PDUs do not release it.
case DIGITAL_NFC_DEP_PFB_ACK_NACK_PDU:
if (!DIGITAL_NFC_DEP_NACK_BIT_SET(pfb)) { /* ACK */
if ((ddev->atn_count &&
(DIGITAL_NFC_DEP_PFB_PNI(pfb - 1) !=
ddev->curr_nfc_dep_pni)) ||
(DIGITAL_NFC_DEP_PFB_PNI(pfb) !=
ddev->curr_nfc_dep_pni) ||
!ddev->chaining_skb || !ddev->saved_skb) {
rc = -EIO;
goto exit;
}
if (ddev->atn_count) {
ddev->atn_count = 0;
rc = digital_tg_send_saved_skb(ddev);
if (rc)
goto exit;
/** This return path does not free resp SKB */
return;
}
kfree_skb(ddev->saved_skb);
ddev->saved_skb = NULL;
rc = digital_tg_send_dep_res(ddev, ddev->chaining_skb);
if (rc)
goto exit;
} else { /* NACK */
<snip>
}
/** This return path does not free resp SKB */
return;
case DIGITAL_NFC_DEP_PFB_SUPERVISOR_PDU:
The fix for these leaks was part of a complete rework of the ACK/NACK PDU handling and the patch can be seen here.
In conclusion, kmemleak is a powerful tool but all it can do is to show you where the memory was allocated and what is inside that chunk when the leak is detected.
In these examples, with SKBs going from a protocol layer to another, most of the time in an asynchronous manner by using kernel workqueues, I had to modify the code to get where this was exactly allocated and where it was supposed to be freed.
Hopefully, the NFC Digital Protocol stack does not scatter memory chunks everywhere anymore. But as new code and drivers are added, I keep running kmemleak from time to time, just to make sure it doesn't spread out memory.
19/12/2024
In the world of deep learning optimization, two powerful tools stand out: torch.compile, PyTorch’s just-in-time (JIT) compiler, and NVIDIA’s…
08/10/2024
Having multiple developers work on pre-merge testing distributes the process and ensures that every contribution is rigorously tested before…
15/08/2024
After rigorous debugging, a new unit testing framework was added to the backend compiler for NVK. This is a walkthrough of the steps taken…
01/08/2024
We're reflecting on the steps taken as we continually seek to improve Linux kernel integration. This will include more detail about the…
27/06/2024
With each board running a mainline-first Linux software stack and tested in a CI loop with the LAVA test framework, the Farm showcased Collabora's…
26/06/2024
WirePlumber 0.5 arrived recently with many new and essential features including the Smart Filter Policy, enabling audio filters to automatically…
Comments (0)
Add a Comment