Topic "CVS Archeology"

With the rise of the open software development paradigm, a technique for harvesting information has gained popularity: the automated extraction of data from revision control systems (RCS). Especially in the early days of scientific research on Open Source, a large number of studies were conducted with little more than statistical analysis of CVS logs. This seminar topic discusses the methodological issues involved and provides examples of how much information can be gathered, and with what reliability.

If you are interested in this topic, you might also want to have a look at the thesis on Isolation of past defects in Open Source projects (in German) that Sebastian Jekutsch is currently offering in the same research area.

Key Questions

  • How is CVS archeology performed (technique, performance hit, time constraints)?
  • What information can be gathered?
  • How far can we trust the information gathered this way? What are the methodological issues?
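
As a starting point for the first question, the following is a minimal sketch of what log-based extraction might look like in Python. It assumes the classic `cvs log` text layout (a "Working file:" header, revision entries separated by lines of dashes, and a `date: ...; author: ...;` metadata line); other CVS versions may format dates or fields differently. The sample log, the helper name `parse_cvs_log`, and the record fields are all invented for illustration. Note that CVS records revisions per file rather than per change set, so the counts below are file revisions, not logical commits.

```python
import re
from collections import Counter

# Hypothetical excerpt in the style of `cvs log` output; the file name,
# authors, and messages are invented for illustration only.
SAMPLE_LOG = """\
RCS file: /cvsroot/demo/src/main.c,v
Working file: src/main.c
head: 1.3
total revisions: 3;\tselected revisions: 3
description:
----------------------------
revision 1.3
date: 2004/05/01 12:30:45;  author: alice;  state: Exp;  lines: +10 -2
Fix crash on empty input
----------------------------
revision 1.2
date: 2004/04/20 09:15:00;  author: bob;  state: Exp;  lines: +3 -1
Add usage message
----------------------------
revision 1.1
date: 2004/04/01 08:00:00;  author: alice;  state: Exp;
Initial revision
=============================================================================
"""

FILE_RE = re.compile(r"^Working file: (.+)$")
REV_RE = re.compile(r"^revision (\d+(?:\.\d+)+)")
META_RE = re.compile(r"^date: ([^;]+);\s+author: ([^;]+);")
LINES_RE = re.compile(r"lines: \+(\d+) -(\d+)")


def parse_cvs_log(text):
    """Return one record per file revision found in `cvs log`-style text."""
    records = []
    current_file = None
    pending_rev = None
    prev_line = ""
    for line in text.splitlines():
        file_match = FILE_RE.match(line)
        rev_match = REV_RE.match(line)
        meta_match = META_RE.match(line)
        if file_match:
            current_file = file_match.group(1)
        elif rev_match and prev_line.startswith("-----"):
            # Only accept a revision header directly after a dash separator,
            # so a commit message starting with "revision" is not misread.
            pending_rev = rev_match.group(1)
        elif meta_match and pending_rev is not None:
            lines_match = LINES_RE.search(line)
            # The initial revision of a file carries no "lines:" field.
            added = int(lines_match.group(1)) if lines_match else 0
            removed = int(lines_match.group(2)) if lines_match else 0
            records.append({
                "file": current_file,
                "revision": pending_rev,
                "date": meta_match.group(1).strip(),
                "author": meta_match.group(2).strip(),
                "added": added,
                "removed": removed,
            })
            pending_rev = None
        prev_line = line
    return records


records = parse_cvs_log(SAMPLE_LOG)
revisions_per_author = Counter(r["author"] for r in records)
for author, count in revisions_per_author.most_common():
    print(author, count)
```

Even this small sketch touches on the reliability question: the author field only names the account that committed, not necessarily the person who wrote the change, and the line counts say nothing about what kind of change was made.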

References