This commit is contained in:
xinqiu
2015-07-24 21:01:36 +08:00
parent b2134001eb
commit 1d3423f7dd
41 changed files with 0 additions and 13858 deletions

View File

@@ -1,9 +0,0 @@
Data Structures in the Linux Kernel
========================================================================
Linux kernel provides implementations of a different data structures like linked list, B+ tree, prinority heap and many many more.
This part considers these data structures and algorithms.
* [Doubly linked list](https://github.com/0xAX/linux-insides/blob/master/DataStructures/dlist.md)
* [Radix tree](https://github.com/0xAX/linux-insides/blob/master/DataStructures/radix-tree.md)

View File

@@ -1,247 +0,0 @@
Data Structures in the Linux Kernel
================================================================================
Doubly linked list
--------------------------------------------------------------------------------
Linux kernel provides its own doubly linked list implementation which you can find in the [include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h). We will start `Data Structures in the Linux kernel` from the doubly linked list data structure. Why? Because it is very popular in the kernel, just try to [search](http://lxr.free-electrons.com/ident?i=list_head)
First of all let's look on the main structure:
```C
struct list_head {
struct list_head *next, *prev;
};
```
You can note that it is different from many lists implementations which you could see. For example this doubly linked list structure from the [glib](http://www.gnu.org/software/libc/):
```C
struct GList {
gpointer data;
GList *next;
GList *prev;
};
```
Usually a linked list structure contains a pointer to the item. Linux kernel implementation of the list does not. So the main question is - `where does the list store the data?`. The actual implementation of lists in the kernel is - `Intrusive list`. An intrusive linked list does not contain data in its nodes - A node just contains pointers to the next and previous node and list nodes part of the data that are added to the list. This makes the data structure generic, so it does not care about entry data type anymore.
For example:
```C
struct nmi_desc {
spinlock_t lock;
struct list_head head;
};
```
Let's look at some examples, how `list_head` is used in the kernel. As I already wrote about, there are many, really many different places where lists are used in the kernel. Let's look for example in miscellaneous character drivers. Misc character drivers API from the [drivers/char/misc.c](https://github.com/torvalds/linux/blob/master/drivers/char/misc.c) for writing small drivers for handling simple hardware or virtual devices. This drivers share major number:
```C
#define MISC_MAJOR 10
```
but has own minor number. For example you can see it with:
```
ls -l /dev | grep 10
crw------- 1 root root 10, 235 Mar 21 12:01 autofs
drwxr-xr-x 10 root root 200 Mar 21 12:01 cpu
crw------- 1 root root 10, 62 Mar 21 12:01 cpu_dma_latency
crw------- 1 root root 10, 203 Mar 21 12:01 cuse
drwxr-xr-x 2 root root 100 Mar 21 12:01 dri
crw-rw-rw- 1 root root 10, 229 Mar 21 12:01 fuse
crw------- 1 root root 10, 228 Mar 21 12:01 hpet
crw------- 1 root root 10, 183 Mar 21 12:01 hwrng
crw-rw----+ 1 root kvm 10, 232 Mar 21 12:01 kvm
crw-rw---- 1 root disk 10, 237 Mar 21 12:01 loop-control
crw------- 1 root root 10, 227 Mar 21 12:01 mcelog
crw------- 1 root root 10, 59 Mar 21 12:01 memory_bandwidth
crw------- 1 root root 10, 61 Mar 21 12:01 network_latency
crw------- 1 root root 10, 60 Mar 21 12:01 network_throughput
crw-r----- 1 root kmem 10, 144 Mar 21 12:01 nvram
brw-rw---- 1 root disk 1, 10 Mar 21 12:01 ram10
crw--w---- 1 root tty 4, 10 Mar 21 12:01 tty10
crw-rw---- 1 root dialout 4, 74 Mar 21 12:01 ttyS10
crw------- 1 root root 10, 63 Mar 21 12:01 vga_arbiter
crw------- 1 root root 10, 137 Mar 21 12:01 vhci
```
Now let's look how lists are used in the misc device drivers. First of all let's look on `miscdevice` structure:
```C
struct miscdevice
{
int minor;
const char *name;
const struct file_operations *fops;
struct list_head list;
struct device *parent;
struct device *this_device;
const char *nodename;
mode_t mode;
};
```
We can see the fourth field in the `miscdevice` structure - `list` which is list of registered devices. In the beginning of the source code file we can see definition of the:
```C
static LIST_HEAD(misc_list);
```
which expands to definition of the variables with `list_head` type:
```C
#define LIST_HEAD(name) \
struct list_head name = LIST_HEAD_INIT(name)
```
and initializes it with the `LIST_HEAD_INIT` macro which set previous and next entries:
```C
#define LIST_HEAD_INIT(name) { &(name), &(name) }
```
Now let's look on the `misc_register` function which registers a miscellaneous device. At the start it initializes `miscdevice->list` with the `INIT_LIST_HEAD` function:
```C
INIT_LIST_HEAD(&misc->list);
```
which does the same that `LIST_HEAD_INIT` macro:
```C
static inline void INIT_LIST_HEAD(struct list_head *list)
{
list->next = list;
list->prev = list;
}
```
In the next step after device created with the `device_create` function we add it to the miscellaneous devices list with:
```
list_add(&misc->list, &misc_list);
```
Kernel `list.h` provides this API for the addition of new entry to the list. Let's look on it's implementation:
```C
static inline void list_add(struct list_head *new, struct list_head *head)
{
__list_add(new, head, head->next);
}
```
It just calls internal function `__list_add` with the 3 given parameters:
* new - new entry;
* head - list head after which will be inserted new item;
* head->next - next item after list head.
Implementation of the `__list_add` is pretty simple:
```C
static inline void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
next->prev = new;
new->next = next;
new->prev = prev;
prev->next = new;
}
```
Here we set new item between `prev` and `next`. So `misc` list which we defined at the start with the `LIST_HEAD_INIT` macro will contain previous and next pointers to the `miscdevice->list`.
There is still only one question how to get list's entry. There is special special macro for this point:
```C
#define list_entry(ptr, type, member) \
container_of(ptr, type, member)
```
which gets three parameters:
* ptr - the structure list_head pointer;
* type - structure type;
* member - the name of the list_head within the struct;
For example:
```C
const struct miscdevice *p = list_entry(v, struct miscdevice, list)
```
After this we can access to the any `miscdevice` field with `p->minor` or `p->name` and etc... Let's look on the `list_entry` implementation:
```C
#define list_entry(ptr, type, member) \
container_of(ptr, type, member)
```
As we can see it just calls `container_of` macro with the same arguments. For the first look `container_of` looks strange:
```C
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
```
First of all you can note that it consists from two expressions in curly brackets. Compiler will evaluate the whole block in the curly braces and use the value of the last expression.
For example:
```
#include <stdio.h>
int main() {
int i = 0;
printf("i = %d\n", ({++i; ++i;}));
return 0;
}
```
will print `2`.
The next point is `typeof`, it's simple. As you can understand from its name, it just returns the type of the given variable. When I first saw the implementation of the `container_of` macro, the strangest thing for me was the zero in the `((type *)0)` expression. Actually this pointer magic calculates the offset of the given field from the address of the structure, but as we have `0` here, it will be just a zero offset alongwith the field width. Let's look at a simple example:
```C
#include <stdio.h>
struct s {
int field1;
char field2;
char field3;
};
int main() {
printf("%p\n", &((struct s*)0)->field3);
return 0;
}
```
will print `0x5`.
The next offsetof macro calculates offset from the beginning of the structure to the given structure's field. Its implementation is very similar to the previous code:
```C
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
```
Let's summarize all about `container_of` macro. `container_of` macro returns address of the structure by the given address of the structure's field with `list_head` type, the name of the structure field with `list_head` type and type of the container structure. At the first line this macro declares the `__mptr` pointer which points to the field of the structure that `ptr` points to and assigns it to the `ptr`. Now `ptr` and `__mptr` point to the same address. Technically we don't need this line but its useful for type checking. First line ensures that that given structure (`type` parameter) has a member called `member`. In the second line it calculates offset of the field from the structure with the `offsetof` macro and subtracts it from the structure address. That's all.
Of course `list_add` and `list_entry` is not only functions which provides `<linux/list.h>`. Implementation of the doubly linked list provides the following API:
* list_add
* list_add_tail
* list_del
* list_replace
* list_move
* list_is_last
* list_empty
* list_cut_position
* list_splice
and many more.

View File

@@ -1,184 +0,0 @@
Data Structures in the Linux Kernel
================================================================================
Radix tree
--------------------------------------------------------------------------------
As you already know linux kernel provides many different libraries and functions which implement different data structures and algorithm. In this part we will consider one of these data structures - [Radix tree](http://en.wikipedia.org/wiki/Radix_tree). There are two files which are related to `radix tree` implementation and API in the linux kernel:
* [include/linux/radix-tree.h](https://github.com/torvalds/linux/blob/master/include/linux/radix-tree.h)
* [lib/radix-tree.c](https://github.com/torvalds/linux/blob/master/lib/radix-tree.c)
Lets talk about what is `radix tree`. Radix tree is a `compressed trie` where [trie](http://en.wikipedia.org/wiki/Trie) is a data structure which implements interface of an associative array and allows to store values as `key-value`. The keys are usually strings, but any other data type can be used as well. Trie is different from any `n-tree` in its nodes. Nodes of a trie do not store keys, instead, a node of a trie stores single character labels. The key which is related to a given node is derived by traversing from the root of the tree to this node. For example:
```
+-----------+
||
|" "|
| |
+------+-----------+------+
||
||
+----v------++-----v-----+
||||
|g||c|
| | | |
+-----------++-----------+
||
||
+----v------++-----v-----+
||||
|o||a|
| | | |
+-----------++-----------+
|
|
+-----v-----+
||
|t|
| |
+-----------+
```
So in this example, we can see the `trie` with keys, `go` and `cat`. The compressed trie or `radix tree` differs from `trie`, such that all intermediates nodes which have only one child are removed.
Radix tree in linux kernel is the datastructure which maps values to the integer key. It is represented by the following structures from the file [include/linux/radix-tree.h](https://github.com/torvalds/linux/blob/master/include/linux/radix-tree.h):
```C
struct radix_tree_root {
unsigned int height;
gfp_t gfp_mask;
struct radix_tree_node __rcu *rnode;
};
```
This structure presents the root of a radix tree and contains three fields:
* `height` - height of the tree;
* `gfp_mask` - tells how memory allocations are to be performed;
* `rnode` - pointer to the child node.
The first structure we will discuss is `gfp_mask`:
Low-level kernel memory allocation functions take a set of flags as - `gfp_mask`, which describes how that allocation is to be performed. These `GFP_` flags which control the allocation process can have following values, (`GF_NOIO` flag) be sleep and wait for memory, (`__GFP_HIGHMEM` flag) is high memory can be used, (`GFP_ATOMIC` flag) is allocation process high-priority and can't sleep etc.
The next structure is `rnode`:
```C
struct radix_tree_node {
unsigned int path;
unsigned int count;
union {
struct {
struct radix_tree_node *parent;
void *private_data;
};
struct rcu_head rcu_head;
};
/* For tree user */
struct list_head private_list;
void __rcu *slots[RADIX_TREE_MAP_SIZE];
unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
};
```
This structure contains information about the offset in a parent and height from the bottom, count of the child nodes and fields for accessing and freeing a node. The fields are described below:
* `path` - offset in parent & height from the bottom;
* `count` - count of the child nodes;
* `parent` - pointer to the parent node;
* `private_data` - used by the user of a tree;
* `rcu_head` - used for freeing a node;
* `private_list` - used by the user of a tree;
The two last fields of the `radix_tree_node` - `tags` and `slots` are important and interesting. Every node can contains a set of slots which are store pointers to the data. Empty slots in the linux kernel radix tree implementation store `NULL`. Radix tree in the linux kernel also supports tags which are associated with the `tags` fields in the `radix_tree_node` structure. Tags allow to set individual bits on records which are stored in the radix tree.
Now we know about radix tree structure, time to look on its API.
Linux kernel radix tree API
---------------------------------------------------------------------------------
We start from the datastructure intialization. There are two ways to initialize new radix tree. The first is to use `RADIX_TREE` macro:
```C
RADIX_TREE(name, gfp_mask);
````
As you can see we pass the `name` parameter, so with the `RADIX_TREE` macro we can define and initialize radix tree with the given name. Implementation of the `RADIX_TREE` is easy:
```C
#define RADIX_TREE(name, mask) \
struct radix_tree_root name = RADIX_TREE_INIT(mask)
#define RADIX_TREE_INIT(mask) { \
.height = 0, \
.gfp_mask = (mask), \
.rnode = NULL, \
}
```
At the beginning of the `RADIX_TREE` macro we define instance of the `radix_tree_root` structure with the given name and call `RADIX_TREE_INIT` macro with the given mask. The `RADIX_TREE_INIT` macro just initializes `radix_tree_root` structure with the default values and the given mask.
The second way is to define `radix_tree_root` structure by hand and pass it with mask to the `INIT_RADIX_TREE` macro:
```C
struct radix_tree_root my_radix_tree;
INIT_RADIX_TREE(my_tree, gfp_mask_for_my_radix_tree);
```
where:
```C
#define INIT_RADIX_TREE(root, mask) \
do { \
(root)->height = 0; \
(root)->gfp_mask = (mask); \
(root)->rnode = NULL; \
} while (0)
```
makes the same initialziation with default values as it does `RADIX_TREE_INIT` macro.
The next are two functions for the inserting and deleting records to/from a radix tree:
* `radix_tree_insert`;
* `radix_tree_delete`.
The first `radix_tree_insert` function takes three parameters:
* root of a radix tree;
* index key;
* data to insert;
The `radix_tree_delete` function takes the same set of parameters as the `radix_tree_insert`, but without data.
The search in a radix tree implemented in two ways:
* `radix_tree_lookup`;
* `radix_tree_gang_lookup`;
* `radix_tree_lookup_slot`.
The first `radix_tree_lookup` function takes two parameters:
* root of a radix tree;
* index key;
This function tries to find the given key in the tree and returns associated record with this key. The second `radix_tree_gang_lookup` function have the following signature
```C
unsigned int radix_tree_gang_lookup(struct radix_tree_root *root,
void **results,
unsigned long first_index,
unsigned int max_items);
```
and returns number of records, sorted by the keys, starting from the first index. Number of the returned records will be not greater than `max_items` value.
And the last `radix_tree_lookup_slot` function will return the slot which will contain the data.
Links
---------------------------------------------------------------------------------
* [Radix tree](http://en.wikipedia.org/wiki/Radix_tree)
* [Trie](http://en.wikipedia.org/wiki/Trie)