Tuesday, December 21, 2004
How we whooped up on a competitor at TKG
Why am I passionate about timesheet software?
The question seems insane as I look at it there hanging on my screen, so I'd like to tell a story that will help you understand my psychosis.
In 1991 I started working for The Kernel Group in Austin TX (a.k.a. TKG).
Our primary contract was with IBM here in Austin fixing bugs in the AIX operating system, which was a Unix-based operating system running on their RS/6000 computer.
We had a contract initially where we fixed these bugs for $2000 per bug, regardless of difficulty, and the company was originally seeded with 200 of these bugs in a $400,000 contract that was partially prepaid in order to fund the genesis of the company (i.e. so we could afford to quit our other jobs). Such an arrangement with IBM is uncommon, and Jeff Smith, an IBMer, was the main person who stuck his neck out to get it for us. He is creative and entrepreneurial (well, for an IBMer anyway).
Everyone in the company was a computer programmer, expert in the C language, except for one person, Weston Binford, who was the business genius who came up with the idea of applying the fixed-cost concept to bug fixing instead of just charging $50/hour, which was where the programmers' minds were at. Unlike the rest of us, Weston barely knew how to turn a computer on, much less program one, and we always enjoyed torturing him for that.
The argument of the programmers was that some bugs were easy and some were hard, very hard, and we were going to get screwed over.
Weston's argument was that statistically 200 bugs should be pretty consistent and we'd find out soon enough if $2k per bug was a fair price.
It turns out Weston was right. Very very right.
Bugs were relatively well defined due to the bug tracking software used, CMVC. Each one had a number, a description, and a way of seeing what specific source code changes in the C language were made in order to fix it (if any). In other words, the scope of each of these small projects was pretty well understood in most cases.
Over violent opposition, Weston eventually got us all to track our time spent on a per bug basis as well as the classification of the bug, i.e. was it in the networking code, the library code, the kernel, NFS, NIS, etc. I worked on NFS, NIS, lockd, and automount bugs mostly. I worked for John Maddalozzo, and he is a pleasure to work for.
Anyway, we all came to understand because of this dual time and defect tracking methodology that sure enough some bugs were more profitable than others. In some cases this may have been due to programmer competence, coziness of the relationship with the IBMer who was delivering the bug to us, or to IBM's customers' communication abilities, or to the real technical difficulty of the problem. I've always thought that last quality was less important than others thought it was.
But it didn't really matter that much why at first. We just knew that some made more money for the company than others. And it was important that we knew that. And since all of the consultants had a share of the company and were incented with the potential for profitability bonuses, that data became viscerally important, despite everyone's distaste for collecting it.
We became able to quantify precisely which types of bugs were profitability winners for TKG and which were not. We also became able to quantify productivity improvements born of new tools we'd invent to help with our debugging and understand if the development of those tools had been a winner or a loser (because we also tracked how long it took to develop them.)
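To make that concrete, here is a minimal sketch of the kind of rollup this data allows. It is not TKG's actual tooling: the $50/hour cost figure (borrowed from the hourly rate mentioned earlier), the layout of the time entries, and the function name are all assumptions for illustration, and the sample entries are made up.

```python
from collections import defaultdict

PRICE_PER_BUG = 2000   # fixed price per bug, from the contract described above
HOURLY_COST = 50       # assumption: treat $50/hour as the cost of a programmer hour


def profit_by_category(entries):
    """Roll up time entries into per-category bug counts, hours, and profit.

    entries: iterable of (bug_id, category, hours) tuples, one per timesheet
    line. A bug may appear on many lines; its hours are summed first.
    """
    hours_per_bug = defaultdict(float)
    category_of = {}
    for bug_id, category, hours in entries:
        hours_per_bug[bug_id] += hours
        category_of[bug_id] = category

    totals = defaultdict(lambda: {"bugs": 0, "hours": 0.0, "profit": 0.0})
    for bug_id, hours in hours_per_bug.items():
        row = totals[category_of[bug_id]]
        row["bugs"] += 1
        row["hours"] += hours
        # Each bug earns the fixed price and costs the hours spent on it.
        row["profit"] += PRICE_PER_BUG - hours * HOURLY_COST
    return dict(totals)


# Made-up sample data, purely illustrative.
sample_entries = [
    (101, "NFS", 10),
    (101, "NFS", 5),
    (102, "NFS", 20),
    (103, "kernel", 60),
]

for category, row in profit_by_category(sample_entries).items():
    print(category, row)
```

The same rollup works for tools: log the hours spent building a tool as a cost, then compare hours per bug before and after it was in use.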
This analysis helped encourage the company to allow Robbie Robbinette (as one example) to develop tools which enabled him to personally solve over 500 memory-related bugs in the X-Windows server and libraries for IBM's customers (many of which turned out to be in the customers' code).
For those of you keeping score, that means Robbie was directly responsible for $1 million of revenue for TKG (500 bugs at $2000 each), based largely on automated tools he wrote.
And sometimes it really didn't look like he was working all that hard.
This is just one example of the way the fixed-cost model combined with the ability to measure our own productivity helped us learn how to be many times more efficient as programmers than we'd ever been before or would ever have been otherwise. When I say we got many times better as programmers, that is not an exaggeration at all.
It also means that IBM's customers got fixes very fast. Faster than they could get from Sun or HP.
That tool grew into a product which was released in the general market eventually (ZeroFault) and it would have never even been a consideration for development in the first place if we'd been getting paid hourly. We'd be like a bunch of lawyers, losing things and billing more time because after all, what's the rush? IBM's customers would have suffered.
Eventually many of the IBMers noticed that we were fixing a lot of bugs and they did the math and figured out we were making a lot of money. Sure the customers were happier, but happy customers don't get attention, only bitchy ones do. They knew that IBM's customers loved us, but our BMWs were pissing them off big time.
So the system naturally started to evolve. For one thing, we stopped getting a random assortment of bugs. We started getting ones that all the IBMers and non-TKG contractors had worked over for months and couldn't fix. The truly impossible ones.
Also many of the easier ones were getting handled by others before we got a crack at them.
We negotiated for more 'randomness' with limited success, diversified into other customers, and worked towards charging 'out of paradigm' pricing on the truly hard bugs that were provably non-random.
Our revenue out of this part of IBM was approaching $3 million and IBM decided to bid the work out to multiple vendors, and select two winners to compete for work on a daily basis.
We knew exactly how much to bid on each bug type since we knew our costs so well. We also knew which ones were so hard that we basically had a technical monopoly and we bid those really high.
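A rough sketch of how logged hours can turn into a bid, under stated assumptions: the $50/hour cost, the 25% margin, and both function names are hypothetical, not TKG's actual pricing method.

```python
def break_even_bid(hours_logged, hourly_cost=50.0):
    """Average cost of fixing one bug of a given type.

    hours_logged: hours spent on each past bug of that type.
    hourly_cost: assumed cost of one programmer hour.
    """
    average_hours = sum(hours_logged) / len(hours_logged)
    return average_hours * hourly_cost


def bid(hours_logged, margin=0.25, hourly_cost=50.0):
    """Break-even cost plus a markup; raise the margin where nobody else can compete."""
    return break_even_bid(hours_logged, hourly_cost) * (1 + margin)


# Made-up history for one bug type: three past bugs took 10, 20, and 30 hours.
past_hours = [10, 20, 30]
print(break_even_bid(past_hours))
print(bid(past_hours))
```

A competitor without this history has no break-even number to compare against, so it can win a bug type at a price below its own cost without knowing it.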
Pencom ended up being the 'winning' competitor against us (we also won.) We knew that certain bugs were really easy and so we bid a low price on them. Pencom bid lower, and won them, but we knew they'd never make money at that price.
After six months of effort in the defect shop, Pencom ran away crying like a bunch of babies and we got all the work again, albeit at an overall higher price.
So the time data allowed us to pick tactical directions for the company (which bugs to monopolize) and to make very accurate bids (thereby killing a competitor). It also helped people understand their value to the company and their own level of efficiency vis-a-vis their peers and their own past performance. It helped us understand things about IBM's OS's that IBM never understood (such as bug complexity and prevalence.)
The time data was the most valuable data TKG had and in retrospect it was worth millions of dollars to us.
I have another story about time tracking at Tivoli that I'd like to write down too at some point. If anyone reads this far (and has interest) I'll write it.
-curt