
Re:C Source Code Counter

Posted: Wed Jan 18, 2006 1:56 am
by Colonel Kernel
Solar wrote: As long as you keep in mind that an F-22 isn't the same thing as a DC-3. And that's where the problem is: line count is not comparable between projects unless it's the same team working in the same general business area on a problem of comparable complexity under comparable conditions.
You're absolutely right, and that's exactly what I said :) The best estimates for a given project are always based on historical data from the same organization that's estimating the project. Typically the organization has some kind of continuity in terms of the types of projects, languages used, etc. I work for a software development services company (i.e. -- we develop software for other software companies), so you'd think we'd be working on completely different projects all the time. While there is a certain amount of variety, there is also a lot of continuity -- the Mac experts typically do Mac porting work, the database experts (like me) do database-related work, etc.
Solar wrote: I know a fair amount about software estimation, and everything I've read so far makes me uneasy. I keep getting the feeling that all these "metrics" are basically for people who don't understand the job (i.e., PHB types).
It sounds that way when reading about it. A lot of the books on the subject are very dry and horribly written (e.g. -- the book on COCOMO II... yuck). That's why I recommend Rapid Development -- it's written in a much more accessible style.

FWIW, I think your cynicism about estimation is misplaced, even though you're not alone in feeling that way. I've helped to design a relatively lightweight estimation process for a few of my company's teams, and it's worked remarkably well over the past few years. As I said, historical data is the key to making it work. We did our first few estimates using COCOMO II, and they were OK (I think maybe 15-20% too low), but the actuals we recorded from those projects gave us a baseline for subsequent estimates, which became more and more accurate as we used our historical data to calibrate our estimation tool.

continued...

Re:C Source Code Counter

Posted: Wed Jan 18, 2006 1:57 am
by Colonel Kernel
The process is really simple when you strip it down to the essentials:

Requirements -> Functional decomposition -> Size estimate -> Estimation tool -> Effort, cost, and schedule estimates

First you gather some high-level requirements and try to get a reasonable functional breakdown of the software to be built. You want to break it into pieces that individual developers would feel comfortable assigning a SLOC number to.

Then you get a bunch of developers to come up with these SLOC numbers on their own (no peeking at each other's estimates). This is where a tool like Surveyor (or even your find command, as long as everyone is using the same command) will come in handy, as it allows developers to measure previously written code that they feel is similar to the proposed modules.

Next, everybody gets together and the anonymous results are compared. There is a certain amount of give and take as the team tries to reconcile the outlying data points. In my experience, the outliers are always from those who didn't fully understand the requirements. The majority of the numbers have an almost uncanny tendency to converge (which I think reflects the experience of the estimators).

Once the size estimates have converged enough (leaving room for uncertainty), they are tallied up and fed into an estimation tool (I use Construx Estimate, which is a bit unstable but at least it's free of charge) that is calibrated with past projects' actual effort and schedule data. The estimation tool, via all that PHB math you alluded to, then produces cost and effort estimates, along with staffing recommendations and scheduling options. It's pretty neat when it all works (and I've seen it work several times to great effect).
Solar wrote: The best example is not counting comments. They take time to write and they add to the quality of the software, yet virtually everybody still insists on not counting them. As for blank lines, use a source reformatter to get your sources into a known format (you should do that anyhow), and then you can even compare blank lines.
If you want to count comments, then count comments. The point is, if your estimation process takes past historical data into account, and you count comments when looking at code you've already written, then you're using a standardized (for you) size metric, which keeps everything consistent. Things only break down if you're using industry productivity data instead of your own historical data, but this is generally considered unreliable anyway.
Solar wrote: I don't really care much what someone wrote about line count. I'm speaking from experience: I've worked on F-22-type software and on DC-3 software, in a crack team and in a team of people who never should have touched a compiler. I've worked in a happy little software house and in a big corp where the word "outsourcing" was uttered twice a day. I've worked in Visual Studio and with little more than vim and gcc, on C++ source that was "C with classes" and on C++ source thick with templates and multiple inheritance.
Me too... what's your point? I've been able to work without excessive schedule pressure because I can produce really, really good estimates. Is that bad?
Solar wrote: Line count is one factor. Many, many others come into the equation. As such, I consider anything beyond that find statement above overkill.
Those many other factors are accounted for in later stages of estimation. That's how you arrive at an effort number. However, size numbers are independent of the skill level of your team or any other external factors. Size is intrinsic to the software that's to be built, which is what makes it a great proxy measure to get the estimation process started.

Go ahead and use find. As long as you use the same find command and keep things consistent, it shouldn't matter. I just find Surveyor easier to work with and more powerful.

Re:C Source Code Counter

Posted: Wed Jan 18, 2006 2:33 am
by Solar
The whole thing boils down to a difference in focus: you use line count as the basic, underlying metric. I refuse to use line count in any way (except for telling me whether I really want to look at someone's code), and use relative functionality as a metric instead. (No, I don't do complete function point analysis either. ;-) )

For example, I've written two pieces of software so far that somehow parse XML. The first was a file-based client (read XML, query server, write XML) written under deadline pressure, and I didn't know a thing about XML when I started. I conjured up a rough "custom" parser for the exact XML dialect the tool would be using. About 1500 lines of code.

The other piece of code was for a test driver, quickly setting up hierarchical data structures for testing. It had basically the same functionality as the tool above, but in the meantime I had learned about Xerces, SAX et al. I got away with 400 lines of code.

The point being, the second tool is much more flexible, extensible, and standards-compliant. It also took me three times as long to get stable, because I had to cope with the Xerces API, whereas the first tool used only the standard library.

My estimate of the time required for each tool was correct to within 10%, because I knew the second tool would be much more functional despite the lower line count.
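Put into numbers, the two tools show why raw line count misleads here. Taking the first tool's development time as t, and reading "three times as long to get stable" as the ratio of development times (the post gives only that ratio, not actual durations), the lines produced per unit of time differ by roughly a factor of eleven, even though both time estimates were accurate:

```latex
\frac{1500}{t} \quad\text{vs.}\quad \frac{400}{3t} \approx \frac{133}{t},
\qquad
\frac{1500/t}{400/(3t)} = \frac{1500 \cdot 3}{400} = 11.25
```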

But as I said, I'm not trying to convince you to stop using line count. Just accept that you can make very good estimates without it.

Re:C Source Code Counter

Posted: Wed Jan 18, 2006 3:06 am
by Candy
Solar wrote: For example, I've written two pieces of software so far that somehow parse XML. The first was a file-based client (read XML, query server, write XML) written under deadline pressure, and I didn't know a thing about XML when I started. I conjured up a rough "custom" parser for the exact XML dialect the tool would be using. About 1500 lines of code.

[...]

But as I said, I'm not trying to convince you to stop using line count. Just accept that you can make very good estimates without it.
You mainly prove that a language doesn't have to be like the language.

Re:C Source Code Counter

Posted: Wed Jan 18, 2006 3:24 am
by Solar
Candy wrote: You mainly prove that a language doesn't have to be like the language.
Correct. I simply doubt that any two projects are really "similar" enough to allow relative estimates based on code lines. You can factor in code lines. Or man-hours. Or function points. Or whatever. The real result depends on many different factors, and you have to be aware of them, or the best code-line counter won't help you.

That is to say, spending hours of work, and possibly even licensing budget, on a tool that counts your lines of code is... ahem. 8)

Re:C Source Code Counter

Posted: Wed Jan 18, 2006 9:44 am
by Colonel Kernel
Solar wrote: That is to say spending hours of work and probably even licensing budget on a tool that counts your lines of code is... ahem. 8)
Of course. Surveyor is *free*. I wouldn't waste my budget on that. ;) The estimation tool that cranks out effort and cost numbers along with pretty reports might be worth a penny or two though...