Linux + Administrator = Linuxator

Linux Adminstrator’s blog


5 things you didn’t know about linux kernel code metrics

Posted by Maciej Sołtysiak on July 22, 2008

Recently Greg Kroah-Hartman showed some very interesting Linux kernel development stats. I decided to do some too, and the result is 5 cool things you probably didn’t know about the kernel code 😉

I haven’t seen these numbers published anywhere else so far.

Greg’s stats

First, let’s quickly summarize Greg’s findings related to kernel size (he also did a lot of work on who’s contributing, which I’ll skip here). The daily average, based on data from the 2007-2008 period, is:

  • 4300 lines added, 1800 lines removed, 1500 lines modified
  • 3.69 changes per hour.

Greg also says that the kernel is growing 10% each year and currently stands at 9.2 million lines, of which the drivers are the biggest part (55%). The core kernel is about 5%.
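As a quick sanity check, these figures can be compared with each other. The sketch below only uses the numbers quoted above; ignoring the "modified" lines when computing net growth and using a 365-day year are my simplifications, not something Greg stated.

```python
# Figures quoted from Greg's summary above.
lines_added_per_day = 4300
lines_removed_per_day = 1800
changes_per_hour = 3.69
total_lines = 9_200_000
annual_growth = 0.10

# Net growth per day (modified lines do not change the total, so they are ignored).
net_lines_per_day = lines_added_per_day - lines_removed_per_day

# Changes per day implied by the hourly rate.
changes_per_day = changes_per_hour * 24

# Daily growth implied by "10% a year on 9.2 million lines" (365-day year assumed).
growth_lines_per_day = total_lines * annual_growth / 365

print(f"net lines per day:            {net_lines_per_day}")
print(f"changes per day:              {changes_per_day:.1f}")
print(f"10% yearly growth, per day:   {growth_lines_per_day:.0f}")
```

The net 2500 lines per day and the roughly 2520 lines per day implied by 10% yearly growth agree closely, so Greg's numbers are at least consistent with each other.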

Those numbers seemed a little bit odd to me, especially the 9 million lines, so I wanted to check for myself. What I found out was that Greg wasn’t counting pure source lines of code (SLOC) but all lines: he didn’t exclude comments and blank lines. That is why my metrics differ from his. It’s funny that the Wikipedia article on SLOC gives 5.2 million for the 2.6.0 kernel, which also seems incorrect.
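To see how much the counting method matters, here is a small calculation using the CLOC totals for 2.6.26 from section 5 of this post. It is only an illustration: CLOC counts the files it recognizes as source, so its all-lines total need not match Greg's 9.2 million exactly.

```python
# SUM row of the CLOC report for 2.6.26 (see section 5 of this post).
blank = 1_168_493
comment = 1_350_020
code = 6_062_943

# "All lines" in the files CLOC analyzed: blank + comment + code.
physical_lines = blank + comment + code

print(f"all lines: {physical_lines:,d}")
for label, count in (("blank", blank), ("comment", comment), ("code", code)):
    print(f"{label:8s} {count:>10,d}  {count / physical_lines:6.1%}")
```

Counting every line gives about 8.6 million, of which only about 71% is actual code. That gap is roughly the difference between a "9 million" and a "6 million" headline.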

My stats

I started by writing a small script that:

  1. downloads the 2.6.0 kernel and analyzes it using SLOCCount, written by David Wheeler
  2. patches it to the next kernel release and analyzes it using the same tool
  3. repeats step 2 until the patches run out at 2.6.26
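The steps above can be sketched roughly as follows. This is not the original script, just a minimal Python reconstruction under stated assumptions: the kernel.org URL layout in `BASE_URL`, the helper names, and the exact wording of SLOCCount's summary line are all assumptions, and `tar`, `patch` and `sloccount` are expected to be on the PATH.

```python
import bz2
import re
import subprocess
import urllib.request
from pathlib import Path

# Assumption: kernel.org mirror layout for the 2.6 series (not taken from the post).
BASE_URL = "https://cdn.kernel.org/pub/linux/kernel/v2.6"
FIRST, LAST = 0, 26  # 2.6.0 through 2.6.26


def tarball_url(minor):
    return f"{BASE_URL}/linux-2.6.{minor}.tar.bz2"


def patch_url(minor):
    """Incremental patch that turns 2.6.(minor-1) into 2.6.minor."""
    return f"{BASE_URL}/patch-2.6.{minor}.bz2"


def parse_total_sloc(report):
    """Pull the total out of SLOCCount's summary line (line format is assumed)."""
    match = re.search(
        r"Total Physical Source Lines of Code \(SLOC\)\s*=\s*([\d,]+)", report
    )
    if match is None:
        raise ValueError("no SLOC total found in report")
    return int(match.group(1).replace(",", ""))


def fetch(url, dest):
    """Download url to dest unless it is already there."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def count_sloc(tree):
    result = subprocess.run(
        ["sloccount", str(tree)], capture_output=True, text=True, check=True
    )
    return parse_total_sloc(result.stdout)


def main(workdir=Path("kernels")):
    workdir.mkdir(exist_ok=True)
    # Step 1: download and unpack 2.6.0, then analyze it.
    tarball = fetch(tarball_url(FIRST), workdir / f"linux-2.6.{FIRST}.tar.bz2")
    subprocess.run(["tar", "-xjf", tarball.name], cwd=workdir, check=True)
    tree = workdir / f"linux-2.6.{FIRST}"
    results = {f"2.6.{FIRST}": count_sloc(tree)}
    # Steps 2 and 3: patch up one release at a time and analyze again.
    for minor in range(FIRST + 1, LAST + 1):
        patch = fetch(patch_url(minor), workdir / f"patch-2.6.{minor}.bz2")
        subprocess.run(
            ["patch", "-p1", "-s"],
            cwd=tree,
            input=bz2.decompress(patch.read_bytes()),
            check=True,
        )
        results[f"2.6.{minor}"] = count_sloc(tree)
    return results
```

Calling `main()` needs network access and a fair amount of disk space. Patching in place, instead of downloading 27 full tarballs, keeps the download volume down, which is presumably why the script works this way.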

Just in case, I also used a different tool called CLOC to analyze the same code. A note on the tools used is at the end of this post.

This script ate 477 MB of disk space with tarballs and bzipped patches.

1. The kernel has just reached 6 million lines with 2.6.26!

Linux kernel lines of code

Yes, indeed, with 2.6.26 we’ve reached over 6 million lines of code. You can see that in the chart.

Both SLOCCount and CLOC show similar results. What is interesting here is that:

  • there are over a million blank lines,
  • and over a million lines of comments (which are of course important too),
  • the code shows faster-than-linear growth,
  • if the current speed is maintained, I predict we might reach 7 million with 2.6.30 and 8 million with 2.6.33; just look at the forecast.

Linux kernel lines of code forecast
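The forecast can be turned into an implied growth rate. This is a rough calculation, assuming the 2.6.26 count is rounded down to 6 million and that releases keep arriving about every 83 days (the figure from the next section); the milestone list just restates the prediction above.

```python
DAYS_PER_RELEASE = 83  # average release cycle quoted in this post

# (2.6.x release, SLOC): rounded 2.6.26 count plus the two forecast points.
milestones = [(26, 6_000_000), (30, 7_000_000), (33, 8_000_000)]

implied = []
for (rel_a, sloc_a), (rel_b, sloc_b) in zip(milestones, milestones[1:]):
    per_release = (sloc_b - sloc_a) / (rel_b - rel_a)
    per_day = per_release / DAYS_PER_RELEASE
    implied.append((rel_a, rel_b, per_release, per_day))
    print(
        f"2.6.{rel_a} -> 2.6.{rel_b}: "
        f"{per_release:,.0f} SLOC per release, about {per_day:,.0f} per day"
    )
```

The second million is expected in fewer releases than the first, which is exactly what faster-than-linear growth means: the per-release increment itself keeps rising.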

2. It takes about 83 days (2¾ months) for a new kernel release!

As Greg Kroah-Hartman says, the current release scheme is solid and we’re getting an average of around 80-83 days between releases. That stability set in around 2006; the first 2.6 releases were more frequent and buggier. Here’s a graph and a table showing the numbers for the stable release cycle.

Days between releases of Linux kernel

3. Number of files in the project continues to grow faster than linear

This means that not only is the existing code growing, but a lot of new things are being added as well. And this is true. Think of virtualization infrastructure, wireless, and new architectures (e.g. OLPC support was merged recently).

Files in Linux kernel source

First, look at the sheer number of files and how much they weigh in MB. In the chart, the blue line represents all files in the directory, and the green line shows the number of files that were analyzed by SLOCCount and CLOC. Not all files are analyzed because not all of them contain code. Anyway, this gives an idea of how many files people put into the source tree.

Size of Linux kernel source code

Size is growing very rapidly too. Recent kernels grow by 6.3 MB on average. The record-winning kernel is 2.5.25, which gained a whopping 13 MB. If you take the 83-day release cycle, the average means the source was gaining around 80 kB per day! (It’s not just code; documentation adds to the numbers too.)
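The per-day figure is simply growth per release divided by the length of the release cycle. A quick check, assuming 1 MB = 1000 kB (with 1024 the results are about 2% higher):

```python
average_growth_mb = 6.3  # average growth per recent release
record_growth_mb = 13    # growth of the record release
days_per_release = 83    # average release cycle

# Convert MB per release to kB per day (1 MB = 1000 kB assumed).
avg_kb_per_day = average_growth_mb * 1000 / days_per_release
record_kb_per_day = record_growth_mb * 1000 / days_per_release

print(f"average release: about {avg_kb_per_day:.0f} kB per day")
print(f"record release:  about {record_kb_per_day:.0f} kB per day")
```

So the "around 80 kB per day" applies to an average release; the record release grew about twice as fast.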

Top 8 directories in the kernel source

It is quite educational to look at exactly what’s growing per directory in terms of SLOC. If you take a look at the top 8 directories within the kernel, you can notice that:

  • drivers (/drivers) are a huge part and grow very quickly
  • arch (/arch) started growing around 2.6.5
  • network (/net) started growing around 2.6.13
  • filesystems (/fs) do not grow that much, but they have their bursts, like with 2.6.16 and 2.6.19, when large chunks of code were merged
  • network (/net), which started growing at 2.6.16, has now outgrown sound (/sound)

I also made a graph of the remaining (bottom) directories by LOC. Personally I don’t find it particularly interesting, but here goes:

The rest of the directories

4. Daily SLOC added are over 1000 and this metric is also growing

LOC/day growth between versions

The daily growth of SLOC for given releases varies, of course. There are quite big differences between versions; however, what can be stated with certainty is that the trend is upward. Both the lower and upper bounds are at higher values with each new kernel.

Not incredibly interesting, but still, a metric is a metric and you can compare it with other projects and your own programs 😉

5. Language breakdown of 2.6.26 using CLOC

2 different reports.
== CLOC ==
-----------------------------------------------------
Language          files     blank   comment      code
-----------------------------------------------------
C                 10195    921822    976772   4709722
C/C++ Header       9400    203125    321830   1096551
Assembler          1005     36250     42921    225549
make               1005      4569      5350     15238
Perl                 19      1157      1256      6092
yacc                  5       437       318      2919
Bourne Shell         48       404      1205      2623
C++                   1       205        58      1496
lex                   5       225       248      1395
HTML                  2        58         0       367
NAnt scripts          1        83         0       290
Lisp                  1        63         0       218
Python                2        41        37       208
ASP                   1        33         0       136
awk                   2        14         7        98
Bourne Again Shell    2         7        17        34
XSLT                  1         0         1         7
-----------------------------------------------------
SUM:              21695   1168493   1350020   6062943
-----------------------------------------------------

== SLOCCount ==
ansic:		5780304	(96.08%)
asm:		218132	(3.63%)
perl:		6075	(0.10%)
cpp:		3242	(0.05%)
yacc:		2919	(0.05%)
sh:		2609	(0.04%)
lex:		1825	(0.03%)
python:		331	(0.01%)
lisp:		218	(0.00%)
pascal:		116	(0.00%)
awk:		96	(0.00%)

A note on the tools used

SLOCCount is very fast, while CLOC is very slow (it crunched for over 10 hours). The SLOC results are similar: the difference between them is around 1%, so it’s negligible. The output was collected into a CSV file and plotted with JpGraph. Why JpGraph? Because I wanted to try it out, just that 🙂
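The "around 1%" can be checked directly from the two 2.6.26 reports above. The SLOCCount report as shown has no total line, so the sketch below sums its per-language rows; that summing is my own step, not part of either tool's output.

```python
# "code" column of the SUM row in the CLOC report.
cloc_code = 6_062_943

# Per-language rows of the SLOCCount report.
sloccount_by_language = {
    "ansic": 5_780_304,
    "asm": 218_132,
    "perl": 6_075,
    "cpp": 3_242,
    "yacc": 2_919,
    "sh": 2_609,
    "lex": 1_825,
    "python": 331,
    "lisp": 218,
    "pascal": 116,
    "awk": 96,
}
sloccount_total = sum(sloccount_by_language.values())

difference = cloc_code - sloccount_total
relative = difference / sloccount_total

print(f"SLOCCount total: {sloccount_total:,d}")
print(f"CLOC total:      {cloc_code:,d}")
print(f"difference:      {difference:,d} ({relative:.2%})")
```

For 2.6.26 the two tools differ by about 47 thousand lines, or a bit under 0.8%. Part of the gap likely comes from the tools recognizing different sets of languages (CLOC lists make, HTML and XSLT, for example, while SLOCCount does not).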

Cheers!

Links

  1. Linux Kernel Development Stats from Greg Kroah-Hartman
  2. Greg Kroah-Hartman on the Linux Kernel
  3. David Wheeler’s SLOCCount
  4. CLOC – Count Lines of Code
  5. JpGraph – Graph creating library for PHP
