Analyzing Drupal 7.0 development history

I have tried before to analyze Drupal's development history, and I ended up creating a small project to track the scripts and other things I use to do that.

Now that Drupal 7.0 is out, I have improved those scripts (named drupal contribution analyzer) a little more, and I am now uploading the results.

The results contain a group of tag clouds (now based on Processing + wordookie instead of Wordle) and a codeswarm video (plus some basic statistics in plain text).

So, no more waiting, here are the results:


Activity by changed files


Activity by changed files (excluding CVS committers)


Activity by commits


Activity by commits (excluding CVS committers)


Codeswarm video

It is based on commit authors and commit attribution (extracted from commit messages), and it starts at the first D7 commit.
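A minimal sketch of how attribution could be pulled out of commit messages. It assumes messages shaped like "#<issue> by name1, name2 and name3: description"; that pattern, the function names and the sample names in the usage note are assumptions for illustration, not necessarily what drupal contribution analyzer does.

```python
import re
from collections import Counter

# Assumed message shape: "... #<issue> by name1, name2 and name3: description".
ATTRIBUTION = re.compile(r"#\d+\s+by\s+([^:]+):", re.IGNORECASE)


def credited_names(message):
    """Return the names credited in one commit message (empty list if none)."""
    match = ATTRIBUTION.search(message)
    if match is None:
        return []
    # Names are assumed to be separated by commas, "and", or "|".
    raw = re.split(r",|\band\b|\|", match.group(1))
    return [name.strip() for name in raw if name.strip()]


def count_mentions(commits):
    """Count, per person, the commits they authored or were credited in.

    `commits` is an iterable of (author, message) pairs.
    """
    counts = Counter()
    for author, message in commits:
        # A set keeps a person from being counted twice for the same commit.
        counts.update({author, *credited_names(message)})
    return counts
```

For example, `count_mentions([("dries", "- Patch #123 by alice, bob and carol: fixed a bug.")])` credits the commit once each to dries, alice, bob and carol (all invented sample names).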


I've recently stumbled over

Would be interesting to implement the algorithm explained there for this too :)

It looks good. I will try to take a closer look at that project soon (probably after git phase 2 is finished ;-)). It seems I would need to do something like the hack I used to manually generate input for codeswarm, but it would definitely be interesting.

Hopefully I will also be able to do a little research on how to present that information; there should be some techniques out there.

Any other suggestions would be great!

It seems that none of these methods takes into account the number of lines or characters changed in the respective commits. Is that correct?

Individual developers often focus on either many small patches or a few large patches. A method that assigns weight according to the number of characters changed in the patch is probably going to attribute effort more appropriately than the methods used here and elsewhere.

Of course "by-character" is also not correct, because some characters require more effort to change than others. E.g. changing 10 words to update help text is probably going to take less effort than fixing a bug in a complex regex, or working out that a comparison operator should be >= instead of > in an off-by-one bug. But it is better than simply tracking how many commits a name was mentioned in, or how many files the commit touched.

The current methods do not know anything about line changes or changed characters; for now it is all about commits and files.

It would be great to add more axes to the review; actually that is my plan: to add as many relevant axes as I can. I do not really believe that any one of them on its own is the right axis, which is why I started with at least two (with a minor variation on each, for easy reading in tag clouds).

So, I see the line changes as doable in a short time, as I am using git and it can extract that information easily, but I think it is not going to be straightforward to do it by characters. It would be great to have those two new axes, so I am adding them to my to-do list for drupal-contribution-analyzer.
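A rough sketch of the "by lines" axis, built on the per-file added/deleted counts that `git log --numstat` prints. The `AUTHOR:` marker, the function names and the choice to sum added plus deleted lines are assumptions made for this example; attribution taken from commit messages is ignored here.

```python
import subprocess
from collections import Counter

# Hypothetical marker put in front of each commit's author name so that
# header lines are easy to tell apart from the numstat lines.
MARKER = "AUTHOR:"


def lines_changed_by_author(log_text):
    """Sum added + deleted lines per author from `git log --numstat` output."""
    totals = Counter()
    author = None
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            author = line[len(MARKER):].strip()
            continue
        # numstat lines look like "<added>\t<deleted>\t<path>".
        parts = line.split("\t")
        if author is None or len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-\t-\t<path>" and are skipped.
        if added.isdigit() and deleted.isdigit():
            totals[author] += int(added) + int(deleted)
    return totals


def read_git_log(repo_path):
    """Run git in `repo_path` and return the log text in the format above."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat",
         f"--pretty=format:{MARKER}%an"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling `lines_changed_by_author(read_git_log("."))` inside a clone gives one total per author, which could feed a tag cloud in the same way as the commit and file counts.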

I love these. As the 7.0 release drew closer, I was really excited about seeing the contributor name clouds.

My favourite one is "Activity by changed files (excluding CVS committers)", because it happens to place the "Sun" in the centre of the Drupal solar system :-) Also, my username is written in green.

These would look great hanging on the office wall. Is there any chance of rendering some hi-res versions suitable for printing?

I spent about half an hour staring at the video, over and over, trying to catch all four(!) of my mentions. I got them all, but now I keep seeing names everywhere...

Nice to know that you like them!

The way the clouds are generated makes each image a raster image, I mean drawn pixel by pixel, so there is no scalable version of the output.

But the thing is that you can simply generate a new version of the tag cloud with different parameters for the image size and the font sizes, so you can always make a bigger image. Take a look at the TagCloud.pde file for details ;-)