Wednesday, May 25, 2011

Live Blog from ICSE: Interesting empirical studies

This is my fourth blog post from ICSE, following on from this morning's post.

The first paper in this session was "An Empirical Investigation into the Role of API-Level Refactorings during Software Evolution", by Miryung Kim, Dongxiang Cai, and Sunghun Kim.

Conventional wisdom suggests that refactoring improves quality and reduces technical debt. However, in this paper the authors found evidence to suggest that the picture isn't so rosy. The speaker pointed to existing research showing that refactoring tools tend to be buggy and are not used properly.

She focused on API-level refactorings (rename, remove and signature changes at or above the header level). She contrasted these with bug-fix revisions and non-API-level revisions.

She showed that in the systems she studied there was an increase in bug fixes after API-level refactoring. She said this was a surprise, but it was not a surprise to me, since it confirms my coding experience. Half of the problems were incorrect refactorings done at the same time as fixing bugs. She found that the fix time for a bug is shorter if refactoring is done first, which confirms best practice. She also found that, despite the concept of a 'code freeze', there seemed to be more refactorings and more bug fixes just before major releases.

The second paper was entitled "Factors Leading to Integration Failures in Global Feature-Oriented Development: An Empirical Analysis", by Marcelo Cataldo and James D. Herbsleb.

The presenter started out by defining what a feature is: a unit of functionality, both in end-user terms and as managed by a project.

The research setting was automotive entertainment systems; they studied 179 engineers in 13 teams, over 32 months. These people were developing 1.5 million lines of code in 6789 source files.

They studied various technical attributes of the features (changed lines, and various aspects of dependencies) as well as organizational aspects.

They measured integration failures and analyzed the factors that may have led to them. The technical factors that mattered most were the number and concentration of dependencies. The organizational factors were geographical distribution of team members, and lack of ownership of the feature by those who also owned coupled features.

A lot of this seems to correspond to expectations; however, it is nice to have empirical results that confirm those expectations.

However, one non-intuitive observation was that when the number of dependencies reached a particularly high level, a collocated team actually experienced worse results than a geographically distributed team.

The final paper in the session was entitled "Assessing Programming Language Impact on Development and Maintenance: A Study on C and C++", by Pamela Bhattacharya and Iulian Neamtiu.

The presenter indicated the purpose of their research was to determine which programming language (C or C++) led to fewer bugs, reduced complexity, greater ease of extension and better maintainability. They set out to measure the effect of the chosen language while controlling for such factors as development process and developer expertise.

They cited the site on language popularity, as well as the Tiobe index, which show the use of languages as chosen by developers, without a formal understanding of which would actually give the best results.

They studied four long-lived open-source projects: Firefox, VLC, Blender (a 3D suite) and MySQL. C++ is gradually taking over from C in three of these, but not in Blender. The Blender developers told the researchers that they were more comfortable in C.

The researchers tested and confirmed the hypothesis that C++ code was of higher internal quality than C code. They also tested and confirmed the hypothesis that C++ code is less prone to errors. Their final hypothesis, that C++ code requires less maintenance than C code, was also confirmed.

I found these results interesting, since in my mind C++ is vastly more complex than C, but at the same time has better abstraction. It seems abstractions win.

There was an interesting discussion from the audience. One participant pointed out that the type of code suitable for C development may have been inherently different from the type of code suitable for C++ development, thus invalidating the results. Another pointed out that C is less dense than C++, requiring more lines of code to do the same thing, thus invalidating some of the metrics used. I am inclined to agree with these criticisms; as is the case in many scientific studies, we have to be skeptical unless rigorous controls are in place, and even then we have to remain skeptical until there are many replications.
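
To make the code-density objection concrete, here is a minimal sketch of the arithmetic. The numbers are entirely made up for illustration and do not come from the paper or from any of the studied projects, and the function name `defects_per_kloc` is my own. It assumes a metric normalized by lines of code, which may or may not match the metrics the authors actually used.

```python
# Hypothetical illustration only: these figures are invented and are
# not taken from the paper or from any of the projects it studied.

def defects_per_kloc(defects: int, lines_of_code: int) -> float:
    """Return defect density as defects per thousand lines of code."""
    return defects / (lines_of_code / 1000)

# Assume the same functionality takes 15,000 lines in C but only
# 10,000 lines in C++, and that both versions contain 30 defects.
c_density = defects_per_kloc(30, 15_000)
cpp_density = defects_per_kloc(30, 10_000)

# Same functionality, same number of defects, different densities.
print(f"C:   {c_density:.1f} defects per KLOC")
print(f"C++: {cpp_density:.1f} defects per KLOC")
```

With these invented figures the two versions are equally buggy per unit of functionality, yet the per-line metric gives them different scores. Any measure normalized by line count can be skewed in this way when the languages differ in how much each line expresses.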
