I’m the author of the article. I apologize for its massive length but wasn’t able to make it shorter without losing my favourite bits :) I’m happy to answer any questions people have here.
I thought this was very insightful and spot on for the most part, but I had one remark.
I object to the notion that it’s ok for bug ingress rate to be higher than bug egress. For me that’s symptomatic of underlying problems. Either (A) those bugs are on important features, and the team is favoring novelty over functionality by prioritizing new feature dev over feature maintenance, or (B) the broken features are unimportant and the team is failing to weed out irrelevant functionality from their codebase (it is important to remove features while you add them to not get in a zero-progress situation, unless you grow the team along with the codebase), or (C) the team has bad engineering practices causing a high ingress rate, or (D) many bugs are based on misunderstandings, which points to documentation or UX issues. No matter how you slice it, I see it as never acceptable to let a bug pile grow indefinitely.
Do you see this as a sort of compromise, as in it indeed pointing to deeper problems but needing a workaround in the real world, or do you disagree that a growing bug pile is symptomatic of deeper problems?
I disagree strongly with your take on story points:
First, if story points are an indirect measure of time, then the "psychological game" you're playing will be immediately revealed if your engineers are as smart as claimed. There is no reason for me to point something a 2 over a 3 unless you're measuring the time it takes to deliver software based off those measurements. On the opposite end of the spectrum, my confidence in whether a story is an 8 or a 13 becomes significantly weaker as the numbers get bigger.
Using numbers, specifically, is a hint that whoever is handling process just wants to predict when the project will be done. Numbers trick you into thinking they can be added, but margins of error are not additive. My go-to question in these situations is: why don't we just estimate with abstract sizes (Small, Medium, Large, etc.)? Surprisingly, I'm often met with resistance.
Second, if story points are not an indirect measure of time, then why are you pointing stories? What does the pointing gain your team other than fluff? If you say it's for prioritization then you're just invalidating the premise and we're back to an indirect measure of time.
Finally, I have not seen, and you have not presented, evidence that engineers are good at estimating. In fact, all that I have read seems to indicate the exact opposite: that engineers are very bad at estimating (specifically, that they are largely too optimistic). One could argue that this can be trained, or that you'll get better at estimating as you gain more experience. I will concede that you will get faster and better at estimating and implementing the same exact feature, but that is not what we do. We implement new features, things we likely haven't done before.
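To make the "margins of error are not additive" point above concrete, here is a minimal sketch. It assumes the per-story errors are independent, in which case the standard deviations combine as a root-sum-of-squares rather than a plain sum. The story values and their uncertainties are hypothetical, chosen only for illustration.

```python
import math

def combined_uncertainty(sigmas):
    """Standard deviation of a sum of independent estimates (root-sum-of-squares)."""
    return math.sqrt(sum(s * s for s in sigmas))

# Hypothetical stories: (point estimate, assumed standard deviation in the same units).
stories = [(3, 1.0), (5, 2.0), (8, 4.0), (13, 8.0)]

total = sum(points for points, _ in stories)
# Naively adding the margins overstates the spread when errors are independent.
naive_margin = sum(sigma for _, sigma in stories)
rss_margin = combined_uncertainty([sigma for _, sigma in stories])

print(f"total={total} naive_margin={naive_margin} rss_margin={rss_margin:.2f}")
```

If the errors are correlated instead (for example, everyone is optimistic in the same direction), the independence assumption fails and the real margin moves back toward the naive sum.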
> First, if story points are an indirect measure of time, then the "psychological game" you're playing will be immediately revealed if your engineers are as smart as claimed.
Adding a level of indirection removes some implicit biases. Points might be indirectly related to time, but only incidentally. Points sound more directly related to task complexity, which is itself indirectly related to time.
It's kind of like how lines of code seems like a poor measure of program complexity, and yet study after study has shown that, regardless of language, framework, etc., the number of lines of code is the best measure we have for latent bug count. This correlation doesn't always make sense if you look at specific, contrived examples, but the property seems to hold in aggregate.
> Second, if story points are not an indirect measure of time, then why are you pointing stories? What does the pointing gain your team other than fluff?
The points can (and should) be assigned before the work arrives. Agile planning is a little more nuanced. There's business valuation (points from business development) which then become stories that are estimated during/over another sprint. The points don't correlate directly to hours because you haven't assigned points or know what resources will be on the estimated sprint. That is the responsibility of the Scrum master to handle. If people are vacationing, sick, replaced or there are new hires, you put a variable time-value on the points. Most importantly, you put a few stories in the current sprint off the top of the queue and pull as time allows. Telling the engineers that "all these stories must be done by the end of the sprint" negates the whole process.
> Finally, I have not seen, and you have not presented, evidence that engineers are good at estimating.
In general, nobody else is capable of coming close. That being said, I've blown up estimates to maximum by inquiring about specific details (what file, function, library, what repo, do you have creds for that, how long are the code reviews, etc.) in a majority of tasks, because some engineers are good at estimation for specific implementations; otherwise they concede on identified complexity/meta-process. Small, Medium, Large seems like the correct approach.
So what? Even though engineers realize that, as long as you're talking abstract points, and not holding me to a deadline, I have no reason to modify my estimate. And even if I -do- modify my estimate, as long as I continue with that new estimation mechanism for some period of time, velocity will change accordingly and the PO can -still- make accurate determinations of when something will be delivered. If I consistently under- or overestimate things, I am predictable. The more consistent I am, the less time it takes to be uniform, and thus predictable. By not pinning it to time, the PO can determine the average velocity, the current estimations, and have a largely accurate idea of when things will be done. The only time this ever breaks is when I have a reason to change my estimation strategy, which only happens when you start applying incentives for me to change it. Every time my incentives change, my estimates will change; don't change my incentives. Don't give me a date or a deadline.
We could do this with hours except hours actually translate directly to time. From them I can determine when you think I should be done and thus the implied goal; points I can't. Because of that, I have incentive to overestimate when it comes to hours, to give myself extra time 'just in case'. With points I don't; they don't imply an end time. Even if the PO's estimates are wrong, they can't get onto me, because I never had a due date. Because of that, my estimates tend to be consistent (even if inaccurate), and consistency = predictability. That's the goal, making it predictable when things will be finished (within a given tolerance; hours never give us that). Whether points are big, or small, the PO can determine "This team has an average velocity of 20 points, whatever size those points may be. That means I can expect around 20 points per sprint in terms of stories. I have 50 points I need to get done for this next release, that means I can expect it in three more sprints". All done without the engineers ever having a goal they are trying to make, no incentive to change their estimates, and literally the best predictability (not accuracy; you don't need accuracy) you can achieve, which, it turns out, is actually pretty damn good in practice.
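The PO's arithmetic in the paragraph above can be sketched as follows. This is only an illustration: the function name and the sprint history are hypothetical, and it assumes velocity is summarized by a simple mean of recent sprints.

```python
import math

def sprints_needed(remaining_points, recent_velocities):
    """Forecast sprints to finish, using the mean of recent sprint velocities."""
    if not recent_velocities:
        raise ValueError("need at least one completed sprint")
    average_velocity = sum(recent_velocities) / len(recent_velocities)
    if average_velocity <= 0:
        raise ValueError("average velocity must be positive")
    # Round up: a partially used sprint still has to happen.
    return math.ceil(remaining_points / average_velocity)

# Hypothetical history that averages 20 points per sprint, as in the example above.
history = [18, 22, 20]
print(sprints_needed(50, history))
```

Note that the engineers' estimates never appear as dates here; only the measured velocity turns points into a forecast.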
I think you're missing the fact that points only matter relative to each other. There is no absolute meaning. The author hints at that but I don't think is explicit about it, assuming you already know that. As such, a team tends to be consistent with them, even if not objectively accurate. And their mistakes average out into something predictable, even if not accurate. Because there is no incentive, conscious or unconscious, to massage it.
Some people -do- decide to use t-shirt sizes for sizing, rather than points. It still works. This is generally equivalent to 3, 5, and 8 using points (fibonacci). Most people who use points say anything less than a 3 is wrong (make it a three), and anything larger than an 8 should be broken up because otherwise it's difficult to estimate (with -maybe- a 13). So use whichever you like. S, M, L, and maybe XL, if you want to map to four options, as you have with points. Though you need a way to aggregate them to determine a velocity; how many S = 1M, etc. That's why people tend to use points.
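One way to aggregate t-shirt sizes into a velocity is to fix a conversion table up front. The mapping below is an assumption that follows the rough 3/5/8/13 equivalence described above; a team could pick any other consistent mapping.

```python
# Hypothetical mapping from t-shirt sizes to points (S/M/L/XL as 3/5/8/13).
SIZE_TO_POINTS = {"S": 3, "M": 5, "L": 8, "XL": 13}

def sprint_velocity(completed_sizes):
    """Sum the point value of every story completed in a sprint."""
    return sum(SIZE_TO_POINTS[size] for size in completed_sizes)

# A sprint that closed two small, one medium, and one large story.
velocity = sprint_velocity(["S", "S", "M", "L"])
print(velocity)
```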
Basically, the author said, flat out, it's a psychology game. AND IT IS. What a PO needs from a dev is consistency in how they estimate, not accuracy. From that you can measure the actual work completed over time, and get a measure of velocity, which can be used to accurately predict the delivery of future stories, within a pretty good tolerance. Ensure the psychology for that consistency is there. Points are part of it. No goals, milestones, deadlines, etc, are part of it.
My point is that no story points need to exist. If you want to average the amount of work done over a period of time then just count bugs and stories completed. The Central Limit theorem applies equally well to stories over the long term, so just collect data. Automatic.
We add story points, presumably because we don't want to wait to gather enough data on stories, but the neutral position is to not use them because they incur a cost (meetings, training to get consistency right, etc). So why have them? I have not seen a convincing argument.
Sure, we could treat all stories as being of a single size and rely on the central limit theorem, but that takes far, far longer to converge. On the order of years, I suspect, not weeks or months, which is what you get out of pointing. "We should have this done sometime between now and 2022" is not a useful metric for a PO.
We add story points because given consistent incentives, estimates tend to be consistent. Maybe consistently under or consistently over, and obviously there's a bit of wiggle room from estimate to estimate, but they tend to converge quite quickly, and to within a week or two's uncertainty of what we'll have done at any given point within the next six months, and within 6 weeks within the next year. Quite a far cry from not using them.
Meetings? Some, sure; you'd have them anyway just defining what it is you're doing, the additional burden to assign points takes up maybe half an hour per person per week or two. Training to get consistency right? There's no training involved. The hardest thing is to get people accustomed to picking a number, relative to the others. But that's not that hard either; we basically just took the first sprint's stories, organized them into a line (much like his rectangles) going from least to most complex, then discussed where to draw three vertical lines, separating them into four distinct sizes, 3s, 5s, 8s, and 13s. We then made sure the largest didn't feel too large compared with the other 13s (or else it might actually be a 21, just compared to what we had agreed a 13 was, and so we had to split it up), and from there we always had 'reference stories' to decide whether it felt more like a 5 or an 8, say. And while we sometimes differed, we could always hash out why we differed and come to an agreement on exactly how large the story was. Again, per the OP, so long as you are consistent with how you address those discrepancies, your overall estimates will be consistent, and give you predictability.
One half hour per week or two is hardly a huge cost to pay when it gives the business the ability to accurately predict when we'll have something delivered.
I don't care if the argument is convincing. I've seen it work. You're free to do whatever you want; I know what I've seen work, and what I've seen not work. I've yet to find something that works so well.
Your claims are outrageous with no supporting evidence.
If you just want to throw out anecdotes, I'll give you my own. I collected data on all the pointed stories at one previous job for three years and found a negative correlation between story points and the time from a story being started to being completed.
Sure, you might claim that it all averages out in the end, except that our velocities were wildly in flux for that entire span as well. But we were just "doing it wrong" right?
I doubt that you did, if I understand you. Because it sounds like you're saying that overall 3 pointed stories took the most amount of time, and 13 (or whatever your max) took the least.
But let's say it did. I'd look to see why your estimates fluctuated all over the place. Or whether stories were being closed when they were actually finished (i.e., being accurately reported). Did you have deadlines? Did you have delivery pressures? Because that right there is a good reason; as you near a deadline you start padding estimates more. Did you keep having things come up that broke the sprint? Etc. All manner of things can cause estimates to be wrong. But not negatively correlated, -especially- with velocities constantly in flux (and I mean seriously in flux; you take an average because it can and will vary, especially if there's unexpected stuff, like someone getting sick, that you didn't account for when planning); that to me, yes, definitely sounds like you were doing something very, very wrong.
If you look at the first Central Limit Theorem slide in the original article, it compares story precision vs precision of unestimated bugs. The short answer is that if your stories are big, not estimating them causes more error (weeks or months) than most people are willing to accept. Not so with small tasks (bugs) which average out, as you suggest.
However, it’s hard to make long-range estimates using only many tiny stories/bugs, because you don’t want to break the job down with such granularity for months or years into the future - plans will change by then, and all that design work will have been wasted. That’s what makes big stories useful; you can estimate months of work in a few minutes. But because they’re so big, you can’t treat them all as equal sized.
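The averaging argument above can be sketched numerically. Assuming item sizes are independent and drawn from the same distribution, the relative standard deviation of the total of n items is the per-item coefficient of variation divided by the square root of n. The coefficient of variation of 1.0 and the item counts below are hypothetical.

```python
import math

def relative_error_of_total(cv, n):
    """Relative std deviation of the sum of n independent, identically distributed items."""
    if n <= 0:
        raise ValueError("n must be positive")
    return cv / math.sqrt(n)

# The same amount of per-item variability, split into few big stories or many small bugs.
few_big_stories = relative_error_of_total(cv=1.0, n=4)
many_small_bugs = relative_error_of_total(cv=1.0, n=400)

print(f"4 unestimated stories: +/-{few_big_stories:.0%}")
print(f"400 unestimated bugs:  +/-{many_small_bugs:.0%}")
```

Under these assumptions, a handful of unestimated big stories leaves a wide relative error on the total, while hundreds of small bugs average out to a narrow one.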
Except you can't, as I've said, engineers are not good at estimating. They get significantly worse when you start estimating months out instead of days out.
The problem is not that they're engineers; no one is good at estimating work they've never done before. This is a well-known problem in pretty much every single software shop I've ever been in. Teams never deliver what was planned on time; only functional teams cut features for releases. This is not a "win" for estimation.
> Except you can't, as I've said, engineers are not good at estimating. They get significantly worse when you start estimating months out instead of days out.
Because you keep focusing on time estimates instead of point estimates. People have intrinsic biases related to time and their productivity. Like how most people implicitly assume they're above average in intelligence, looks, etc.
Wonderful article. I read it like a blockbuster, with things constantly clicking in my brain. Every time I tried to impose some sort of sprints on myself and the team, it always ended in superdense Friday coding with half a task moving to the next sprint. Thank you for the great argument for why precise estimation is bullshit bingo. I tried to show PMs that only the steepness of the slope is important and that managers can plan work without exact dev-hours, but without much success; your explanation is clear and easy to understand.
Super interesting points here. You mentioned that the real-world Kanban board forces you to empty slots to make room... I've never seen that in any of the software Kanban-style systems; have you?
I've played around (in spreadsheets and basic apps) with trying to create systems that scaled available slots to team size as a way to force correct granularity.
No, I've never seen it enforced by tools. Physical (index cards) kanban boards have an implicit space limit though, and this is one of the too-seldom-acknowledged reasons why they work as well as they do. Unfortunately the software clones of physical kanban boards copied the unnecessary part (visual appearance of index cards) and not the necessary part (limited space at each phase).
Rally lets you set card limits in each column, though in practice that always seemed too easy to change (just add a couple more to the limit; just "temporarily" turn off the limit) to be entirely beneficial.
Are you talking about work-in-progress (WIP) limits? I've seen tools enforce this so you can't pull another story into the current list. Other tools let you pull but highlight the fact that you're going over the WIP limits by making the whole column, e.g., red. Or do you mean something else?
I've used LeanKit's Kanban board to enforce per-lane WIP limits. It works quite well and is, in my opinion, one of the most important aspects of using Kanban boards for software development.
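For readers who haven't seen a hard WIP limit, here is a minimal sketch of the behavior being discussed. It is not the API of LeanKit, Rally, or any other tool; the class and method names are hypothetical.

```python
class WipLimitExceeded(Exception):
    """Raised when a pull would push a column over its WIP limit."""

class Column:
    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.cards = []

    def pull(self, card):
        # Hard limit: refuse the pull instead of just highlighting the column.
        if len(self.cards) >= self.wip_limit:
            raise WipLimitExceeded(f"{self.name} is full ({self.wip_limit} cards)")
        self.cards.append(card)

    def finish(self, card):
        # Finishing a card is the only way to free a slot.
        self.cards.remove(card)

in_progress = Column("In progress", wip_limit=2)
in_progress.pull("story A")
in_progress.pull("story B")

try:
    in_progress.pull("story C")
    blocked = False
except WipLimitExceeded:
    blocked = True

in_progress.finish("story A")
in_progress.pull("story C")  # succeeds now that a slot is free
print(blocked, in_progress.cards)
```

The design choice that matters is that the limit is enforced in `pull` rather than shown as a warning, which mirrors the physical board running out of space.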
First, thank you very much for the write-up, I sure found it interesting as well as entertaining.
When using bug trackers, I find the most frustrating aspect is the "non-linearity" of the workflow. By that I mean, how do I answer the question "What am I supposed to do next?". You can sort by project, or by priority, but what I typically end up with is a list of items that I already looked at a dozen times. And even though I don't want to look at them a second time, I haven't found a way for a bug tracker to do that for me. Ideally, I would want to look at each task at most two times: Once for triaging, and once for working on it. That's it.
So the way I understand it, you're trying to address this, at least partially. A task starts out as untriaged, then you tag it as triaged, and that means you only had to look at it once for triaging. Which is great, because it's a linear workflow.
Some tasks are obviously critical, and will end up in the next open release milestone. Some belong to a feature that is not released yet. But what about the stuff that ends up in the backlog? These smallish, nagging bugs that are not super-critical. That is the big, ugly pile that keeps growing and growing. How do you keep that big pile manageable? Ideally, that pile shouldn't become big in the first place, but how do you prevent that from happening?
The way I like to do it is to keep multiple backlogs, broken down roughly by feature group (i.e., groups of things that might eventually become a milestone or story of their own). Then when I decide to prioritize a given story, I can go to the relevant backlog and re-triage only the bugs in there, which should hopefully be a relatively manageable number.
Another benefit of sub-categorizing this way is that it makes it easier to resolve bugs as duplicates. When a new bug is filed, it's hard to see that it's a duplicate when you're comparing it against 10000 other bugs, but it's easier when you're comparing it against only 100 other bugs in the same category.
I doubt you'll ever be able to get it down to "at most twice." But it needs to be much easier than ever resorting to "looking through the whole list."
>its massive length but wasn’t able to make it shorter
It's always possible to make it shorter to help communicate a main idea. (However, it does take extra effort to extract the essence of a long piece.[1]) Your essay is ~15,000 words and desperately needs a short "elevator pitch" of its main points. I'm a very verbose writer so when I think others' writing is verbose, it means everybody is going to drown in the text.
Here's my summary of what I think you're trying to say:
1: There are psychological problems with deadlines and large bug lists that cause counterproductive results
2: I discuss 2 psychological "tricks" in software development to counteract the unwanted behavior
2.a: Estimate software by abstract units such as "points" instead of concrete units such as "time/hours/weeks". The abstract units bypass the human biases that lead to bad estimates. Use the points to determine "relative" sizes of each "story". (E.g. Developers vote to converge on the "size" of each story point.) The last step is to multiply the points by a unit of time to derive a finish date.
2.b: Do not have a big global list of bugs to burn down. The size would be overwhelming and demoralizing to teams. Instead, triage bugs into smaller "hot lists" so they "see" a smaller manageable queue to work on. Also, measuring bug fix times will eventually let you derive an "average bug size" that's reasonably accurate
The tldr would be something like "Here are 2 counterproductive management techniques with setting deadlines and assigning bug fix work -- and here are 2 ways to counteract it with management ideas that take advantage of human psychology."
Somebody else can wordsmith it better than I can but that's what I think your essay is basically about. The 15000 words are mostly examples or background ideas leading up to your recommendations (SLO vs SLA, Tesla, what I like and don't like about Agile, Kanban, etc).
I recommend that you put your strongest main points at the very top to give your readers the mental scaffolding to hang the rest of your 15000 words on.
I know what you mean, and in general I try to follow that advice. In the particular case of project management though, I'm frustrated by the huge amount of too-short and contradictory advice floating around on the Internet; adding one more unjustified summary to the pile doesn't help. So I think the extra details are important. And when you have that many details, the tangential expository fluff helps keep it interesting. I hope.
This is also why I didn't summarize everything at the top: that would encourage people to just read the top and stop there. They can do that with project management advice anywhere on the Internet. There's a place for that, but there's plenty of it already.
FWIW, I liked this format. It reads like a presentation and digresses and regroups back to the main thread throughout. It may make it less widely read but I think it may also make it more memorable and loved.
The value for the reader is in the act of chewing over a familiar problem along with your guidance. If the main points were made more obvious, then perhaps it also becomes more boring.
If you want to propose a more tangible recommendation/guide/process then yeah, give people hooky, easy to remember bits.
>So I think the extra details are important. [...] This is also why I didn't summarize everything at the top: that would encourage people to just read the top and stop there.
There's an opposite way to look at it: a good summary acts as a "hook" and entices readers to read the rest of 15000 words. I wasn't suggesting you delete the extra details. Instead, the bullet points at the top give the reader a "road map" to the rest of the long article.
>They can do that with project management advice anywhere on the Internet.
Well, you said the other articles out there are contradictory ("doesn't work and makes things worse") so there's your hook: you have a superior method.
If you prefer to write in a style that "unfolds" that's understandable. A writer can hold an opinion on the best way to present his ideas.
That said, I'll offer some counterpoint. A web surfer may have 20 browser tabs open as a "todo list" of unread blogs. The email inbox has a bunch of unread messages. There's also a stack of new candidate resumes he's supposed to read. That random person then clicks on your blog and sees the shaded rectangle in the scroll bar get real tiny which visually indicates it's a very long piece of text. 15000 words is ~1 hour of reading.
Since you're not a household name among famous authors, a lot of people just won't start reading it on faith alone.
They don't trust you enough yet that it will eventually unfold with an amazing insight. Instead, many busy people will just ignore it because there are so many other items competing for their attention. In particular, the project managers and business executives you want to reach, and who you want to internalize your recommendations, are especially prone to skip long articles. One-hour articles make a huge demand on multitasking managers, so they need a nudge to see if it's worth their time.
There's a glut of information overload out there and long articles can act as "RADIOACTIVE - DO NOT ENTER" signs to the people you most want to convince of your ideas.
I decided it was worth my time and attention because the HN comments were mostly positive. Once I skipped the personal intro, the argument was sufficiently engaging to keep me going.
I think the author is partly right that a breakdown might be more harmful because you lose too much information. HN and reddit comments are how I decide whether something is worth my time, so a summary isn't necessarily beneficial.
Amazing stuff. It was a real pleasure to read. Despite its length I couldn't get myself to skip ahead or skim.
We're now in the process of switching to a structure similar to Basecamp's 6-week cycles[0]. And those cycles obviously do have a deadline. However, I would still say this kind of deadline is better than your typical one for a couple of reasons:
1. The team is self-managing. Typically the team was involved in the pitch process for the project, so they have vested interest in getting this done. I think this is key to avoiding the Student problem. It's no longer an assignment dropped from above, but something you are keen to push forward.
2. The cycle is 6 weeks rather than a sprint of 2... So this feels more like a slower-pace mini-marathon. And the team has autonomy to drop features or make adjustments. I admit that's a weaker argument for it than the one above.
Agile is meant to be used for non-software delivery projects as well.
In your view, with the parts you crossed out (including the psychological/motivational structures), would Agile still be widely applicable outside of software-oriented projects?
A couple of hypothetical examples of non-software projects:
-- developing & submitting scientific grant application
-- organizing a non-trivial longitudinal survey
-- looking for college for kids
-- designing a motorcycle with unique frame/engine layout
In my opinion (and you should take it as an opinion :)), the "good parts" of Agile are the same for both software and non-software projects. Estimation, strict prioritization, and (automated) progress tracking are the keys to any successful project management.
Very interesting article, thanks for writing it and sharing it, I forwarded it to my PM :) . I do think it could have been edited without losing your favourite bits though :)