The Perfect Commit

donatj · on Oct 29, 2022

I am a firm believer that a commit should rarely be more than a couple minutes of work. When I go to git-blame, I want the commit message to be the why of this specific change. When I git blame and find a commit that's thousands of lines I wonder why we're even using git at that point and not just saving zips.

The proponents of PR Squashes irk me. You're losing all the value of git. In the future when I'm hunting down a problem I almost never care about what feature or fix this was a part of - e.g. the PR - I care entirely about the context of the developers mind when he committed that specific line of code.

In the rare case you do actually care what PR a commit is from, GitHub's search will tell you.[1]. There's no need to put that information in a commit at all.

I was recently hunting down the reason for a single ! casting a string to a bool. The blame on the file lead to a large commit with details about the project, but nothing about why that specific line of code was changed.

Was it an accident? Did it serve an actual purpose? I might never know.

Commits are cheap. Blames are easy. There's zero need to try to keep your tree "looking nice".

1: https://github.com/donatj/CsvToMarkdownTable/search?q=f4c167...

ilammy · on Oct 30, 2022

> I care entirely about the context of the developers mind when he committed that specific line of code.

In squash workflow, the PR is the context and the unit of change. It’s shoehorned onto git only because of GitHub.

In some way it’s a self-fulfilling prophecy: if you disregard individual patches then individual patches will be disregarded. It could also be attributed to git being hard to use, with all these “commits” and “patches”. So some developers treat Ctrl-S as the “commit” hotkey, putting whatever state of the codebase into commits, keep stringing them. Then in the end just ship it for review as is, since it’s the path of least resistance, never even expecting anyone to review individual patches, because as far as they are concerned they are not sending in patches, they are sending the PR. Then maintainers are faced with these PRs, and then they have a choice: either enforce the quality of “commits”, or review the PR as a whole and disregard the commits. The choice often falls for the latter, because the former does not provide any practical benefits for the process (rather than theoretical “but we’ll have individual commits when 5 years later someone blames them”), while the latter does provide practical benefits of not alienating developers with subpar commit discipline.

u801e · on Oct 30, 2022

> In squash workflow, the PR is the context and the unit of change. It’s shoehorned onto git only because of GitHub.

This could be useful if a PR is essentially changing only one thing like a commit should, but since PRs usually cover a feature or a fix for a ticket, these frequently involve doing multiple things to implement the feature or fix the bug.

> the [option of enforcing the quality of commits in a PR] does not provide any practical benefits for the process (rather than theoretical “but we’ll have individual commits when 5 years later someone blames them”),

Your parenthetical clause is not theoretical. It's basically a thing I have to deal with on a daily basis when I'm fixing bugs or removing/adding features in multiple code bases that were written by people years ago who no longer work there. I can't really make use of git blame because the commit messages are non-informative, the diffs are just work in progress saves, and the linked PRs just have looks good to me comments and no description.

But for code bases where commit quality was enforced, git blame actually becomes useful and allows me to see what was done and why it was done at the time it was written.

ilammy · on Oct 30, 2022

> since PRs usually cover a feature or a fix for a ticket, these frequently involve doing multiple things to implement the feature or fix the bug

Hence the calls to “split changes into multiple smaller PRs” you hear so often, since the scope of the unit of change is actually important.

If your unit of change is a PR, you'd want PRs to have limited scope & nice description, and you would not really mind that a feature requires 10 PRs to implement.

> Your parenthetical clause is not theoretical.

It is absolutely theoretical until it is required in practice.

Your reality is dealing with codebases that span multiple years. Other people might have different circumstances. They might not have come to value the tidy history made of self-contained commits because they never had a need for one. Either perceived, or quite real.

Why would you need git blame if you stay at a company 2 years tops? Why would you need git blame once your feature is effectively rewritten? twice? Why would you need git blame if you’re just an intern and merging & picking is done by senior greybeards? Why would you need git blame if everything is perfectly explained in the ticket linked from a PR? Why would you need git blame if the code is clear and does not require git archaeology to understand it?

There are ways to get things successfully done without git blame or “nice” history. And your output is not history, your output is feature in use by real users.

I called it “commit discipline” because it is discipline: it has benefits but most of them are non-obvious and hard to explain to undisciplined; the discipline requires extra effort to follow; in order to effectively acquire it you need nurturing, training, and reinforcement; after sticking with it for N years it’s just what you do and you do not imagine doing things otherwise. Oh, and you absolutely can get things somehow done without it and do not even realize what you’re missing. Or do things with it and do not realize that you’re wasting time on irrelevant work.

Izkata · on Oct 30, 2022

> It is absolutely theoretical until it is required in practice.

The problem with this assumption is once it is required in practice, you can't go back in time and change the past decade. This is one of those things you have to assume will be required if it's to be of any use at all.

> Why would you need git blame if you stay at a company 2 years tops?

It's not for you tracking your own code, it's for figuring out what was going on 3 developers ago (or 3 developers in the future looking at your code).

> Why would you need git blame once your feature is effectively rewritten? twice?

To know what happened that caused them to write that really weird line in the original code, how important it is, and whether those two rewrites need something like it. I'm going through this right now, rewriting some perl daemons in python because the underlying system it uses has become so unreliable and having such svn history from 10+ years ago has been a great help - even for simple things like finding that yes, the comments are wrong because they were for a version of the code before a refactor.

> Why would you need git blame if you’re just an intern and merging & picking is done by senior greybeards?

There's a good chance you're making the seniors' jobs harder by squashing it.

> Why would you need git blame if everything is perfectly explained in the ticket linked from a PR?

You're assuming the ticket still exists. We're on our 3rd system (at least), and I'm regularly in code that links pack to a ticket system that we haven't had in almost a decade.

> Why would you need git blame if the code is clear and does not require git archaeology to understand it?

What's "clear" can be really subjective. It probably was clear to the one who wrote the original version, but for any number of reasons it isn't anymore. (Idioms from one language you don't know well, quirks of the individual developer, an original clear design that slowly mutated over time and with multiple developers and is no longer clear, etc...)

js2 · on Oct 30, 2022

I also work this way and advocate against squash merging or putting all the context into the PR.

A big reason for me is that I've built up a lot of context in my head when writing code. When I make the commit, that context is fresh in my mind and flows easily into the commit message.

But say instead I make a half-dozen commits with just single sentence descriptions. At the end of my work day I push the commits to a new branch and open a PR, and start writing the PR description. What was I thinking 5 hours ago when I wrote that first commit? I've forgotten. I've lost context.

So I always try to commit and document as I go. At that point I don't spend a lot of time getting the commit messages perfect, but I make sure they at least have all my thoughts. Before I push up the commits, I'll use "git rebase -i" and "reword" to edit the messages (typos, reformatting the text, maybe adding additional thoughts as I look at the diff). Then I open the PR and at that point, maybe I'll add additional context but usually my PRs are just an abstract and I ask reviewers to look at the commits themselves. It drives me crazy how hostile GitHub is to reviewing individual commits.

Many teams end up with squash-merges though because frankly: (a) developers don't know how to use git beyond committing and pushing so; (b) PR feedback is addressed by adding more commits and pushing them up, as opposed to amending the original commits; combined with; (c) GitHub only grudgingly accommodating an amend+force push workflow.

Frankly Gerrit is a better-designed review tool, but it's not widely used.

u801e · on Oct 30, 2022

> So I always try to commit and document as I go. At that point I don't spend a lot of time getting the commit messages perfect, but I make sure they at least have all my thoughts. Before I push up the commits, I'll use "git rebase -i" and "reword" to edit the messages (typos, reformatting the text, maybe adding additional thoughts as I look at the diff).

I actually don't bother with committing anything to version control until I'm done with the feature or fix. I then will create a diff against the base branch and stage each individual part and create a commit from it. I then push up those commits to the remote.

> It drives me crazy how hostile GitHub is to reviewing individual commits.

I actually have an ingrained habit of middle-clicking on the commit sha1 and add my comments on the commit itself. It doesn't really show up with any context in the main PR view though.

Lately, I've started running git log -p --reverse origin/master.., piping the output into vim, prefixing every line with '>' and typing my comments inline like I would with an email message (trimming the parts that I'm not responding to), and then pasting the entire thing in a single comment on Github.

> Frankly Gerrit is a better-designed review tool, but it's not widely used.

I actually wish more pepole would use the email patch review process. At least there, reviewing individual commits is the default and the relation among multiple commits is preserved. I'm not sure how that's accomplished in Gerrit.

Quekid5 · on Oct 30, 2022

> I actually wish more pepole would use the email patch review process. At least there, reviewing individual commits is the default and the relation among multiple commits is preserved. I'm not sure how that's accomplished in Gerrit.

Gerrit adds a changeset-id (changeset = patch) to each commit message via local hooks -- this allows for tracking patches even when rewriting. (Ofc, if you remove/rewrite the changeset-id the connection gets broken, but it's pretty rare that that happens.)

Other than that it basically functions exactly like a more user-friendly patches-as-email.

u801e · on Oct 31, 2022

So, if you implement a feature that involves multiple commits, when you push the branch up, will all the commits in the branch have the same changeset-id? If not, how is the relation between commits on a branch maintained during review and in the history after the branch is merged?

Quekid5 · on Oct 31, 2022

No, each commit has its own changeset ID. The chain itself is tracked by Gerrit itself. I don't know exactly how, put I'm guesssing it's just using the regular git commit IDs for that. (The changeset ID is just so that it can identify patches long-term, even when their commit ID changes due to e.g. rebase, etc.)

> the history after the branch is merged

Regular commit IDs do that -- although it stores repos and full history, Gerrit doesn't really care about changeset-id's once they're merged.

u801e · on Oct 31, 2022

>> the [branch] history after the branch is merged

> Regular commit IDs do that -- although it stores repos and full history, Gerrit doesn't really care about changeset-id's once they're merged.

In git itself, related commits in a branch can be identified using the merge commit, because that commit's parent commits correspond to the head commit and base commit of that branch. So to look at the commits in a branch after it had been merged, you can do something like:

  git log merge-commit^1..merge-commit^2

This allows you to see the per commit changes as well as the overall change introduced by the branch. You mentioned that gerrit tracks that chain. Since it's not via change-set ids, then does it keep track of related commits in a branch?

Quekid5 · on Nov 2, 2022

> Since it's not via change-set ids, then does it keep track of related commits in a branch?

It is via changeset-Ids correlating to individual commits. The changeset-Ids only serve as a permanent identifier to individual commits.

Everything else comes via the regular backward chain of commit IDs. (Until it hits an already-known/merged commit or an unrelated commit which it knows a changeset-ID for. Not sure of the exact details, but it works as one would expect.)

(Even though Gerrit does support branches and submitting commit to specific branches, etc. we use it in a branchless way in almost all of our projects. Nor do we do merge commits.)

simonw · on Oct 30, 2022

This is why my workflow is so issue heavy: I use issue comments to constantly capture my context as I'm working, rather than using small commits.

I sometimes even run "git diff" and copy the result into an issue comment to record a potential implementation route that I'm not fully convinced of just yet.

davnn · on Oct 30, 2022

If most of the workflow and documentation is not based on git, but instead on issues and PRs, is there an easy way to swap the underlying platform, i.e. GitHub to Gitlab?

simonw · on Oct 30, 2022

Not easy but possible: you export a copy of the issue thread content (via the issues API, which I have a bunch of experience with). Then you host those copies as flat HTML somewhere, such that anyone who needs to find issue 56 can hit some-url/56.html

Then you could either rewrite the commit history to add links to this new location to each commit, or I'm investigating using the "git notes" mechanism to add those updated issue links to existing commits without needing to rewrite commit messages (and hence rewrite the hashes).

seba_dos1 · on Oct 30, 2022

> In the rare case you do actually care what PR a commit is from, GitHub's search will tell you.[1]. There's no need to put that information in a commit at all.

Even better - if you work with linear history (like most GitHub projects do I guess), you can always require merges to be fast-forwardable, but create a merge commit anyway. This way you retain all the individual commits and they're being grouped together by pull requests, so you can easily filter the details out with flags like `--first-parent` when you're not interested in them.

Squashing PRs truly is absolutely useless.

GauntletWizard · on Oct 30, 2022

The tooling just isn't there, though, especially if you want to do something like signing. I want to have my whole merge/pull request history in one subtree, while having the 1st subtree maintain a clear linear history with MR/PR Numbers in each merge commit, and also have that one be signed by the committer. It's easy to represent. It's a little harder to get the forges to do it right.

howscrewedami · on Oct 30, 2022

> Squashing PRs truly is absolutely useless

That is quite an absolute statement. Some people just have a personal preference for having git trees arranged in a certain way. If squashing can help with that, then it serves a purpose.

seba_dos1 · on Oct 30, 2022

Squashing is an essential tool for everyday git use. Squashing pull requests wholesale at merge time as the default merging strategy? It only hurts. It's a clear case of using the wrong tool only because you haven't learned a better one yet. Maybe it is "quite an absolute statement" - but I have never seen a valid justification for its use.

shawabawa3 · on Oct 30, 2022

Have you ever worked at a company where every engineer has in depth knowledge of git?

In my experience 90% of software engineers get as far as pull, push, commit, maybe merge. Rebase and amend if you're very lucky

Squash merges make git easier to use for 90% of engineers and therefore unless you have significantly above average git knowledge in your company it makes sense as a default strategy

5e92cb50239222b · on Oct 30, 2022

Coming from a completely different culture where the word 'engineer' still means something, I find this quite funny. If your 'engineers' just smash keys together until it seems to work somehow (which includes never bothering to learn the full potential of DVCS — one of our most important tools), they're not really engineers IMHO. Real ones know their tools (I don't pretend to be one and never call myself that).

seba_dos1 · on Oct 30, 2022

That's not in-depth knowledge, that's basic usage. I don't have in-depth knowledge of git myself, most of its plumbing is an opaque black box for me. Less obvious but useful things like `--first-parent` are just one manpage reading session away. If your colleagues are unable to perform some basic rebases to squash away their WIP commits, you may want to help them learn how to use one of the most essential tools in their work. Everyone in your team will benefit.

majormajor · on Oct 30, 2022

Do you constantly "rewind"? E.g. getting rid of stuff like "try if this works" type commit messages, after you've finished doing whatever experimentation you need to do?

The idea that I can commit every few minutes with commit messages that will make sense to you months later strikes me as incredibly unlikely, short of "write the code once to figure out how to do it, then throw it again and write it again to have a good chain of commits." And it's not worth that much - especially because the bit that's relevant when debugging may not be the bit that was relevant to me when writing the code.

Non-obvious things like that string/bool thing? I use comments for that. Yes, comments can get outdated, blah blah blah, but commit messages can also be unhelpful. What if that cast wasn't what was on their mind when writing the message anyway?

I would argue that if your PR was a thousand lines it's too damn big anyway. Figure out how to do multiple PRs, your reviewers will thank you. But I want commits that are cohesive, compilable, functional pieces. Not just path-dependent detritus of the dev process.

imaltont · on Oct 30, 2022

> Do you constantly "rewind"? E.g. getting rid of stuff like "try if this works" type commit messages, after you've finished doing whatever experimentation you need to do?

Not constantly, but will clean up the local/personal branch history before creating a PR with a rebase. Change some messages, maybe change up where some of the code is committed or even change the order if that makes more sense than the way it actually happened. It's a pretty quick and easy task to do when you get use to it.

seba_dos1 · on Oct 30, 2022

> Do you constantly "rewind"? E.g. getting rid of stuff like "try if this works" type commit messages, after you've finished doing whatever experimentation you need to do?

Yes. Being able to effortlessly do just that is the main value provided by git.

strictfp · on Oct 29, 2022

I've been following your school of thought as well as the one proposed by the author.

I would argue that your model is a lot freer and will lead to better code quality in the long run, since you're not feeling so restricted when working.

Meticulously crafting commits feel like one of those "good in theory" approaches that feels good but hurts you more than you realize. They're good for workflows where you move commits between lots of (release) branches, sure, but if you're doing trunk-based development I'm going to call it a bit formalistic.

lamontcg · on Oct 30, 2022

> I was recently hunting down the reason for a single ! casting a string to a bool. The blame on the file lead to a large commit with details about the project, but nothing about why that specific line of code was changed.

> Was it an accident? Did it serve an actual purpose? I might never know.

You're never going to get that level of detail.

Particularly if it was an accident or done without really thinking about it--the original coder will never document that because they were literally not thinking about it.

New commit rule: always document what you weren't considering when you wrote the commit.

Izkata · on Oct 30, 2022

> > Was it an accident? Did it serve an actual purpose? I might never know.

> You're never going to get that level of detail.

I have! I once found a bug that was caused by a typo in a linting commit, something obviously unintentional. If it was squashed it would have taken a lot longer to figure that out.

I also recently found issues in a common python 2 and 3 library where the code was broken in py2to3 and first-run-of-black commits. More places where it was obvious how and why the code was wrong because the formatting was a separate commit from the feature.

gorgoiler · on Oct 30, 2022

You wouldn’t publish the chapters of your book with a complete edit history and all of the margin notes and email discussions you had with your editor.

At some point you have to draw a line under your creative process and present a finished piece of work that stands up on its own. That means one diff with a meaningful commit message.

It might seem like a reasonable point that you want the unsquashed commit history in order to see into the mind of the developer. In practice, anything of consequence mentioned in those commits should be in the squashed message. I don’t need to see your thought process as you get something wrong three times then spend another three commits fighting with the linter all before you present your final thesis. I just need to see what you’ve finally settled on.

(See also: changes to a code base that land and are then immediately changed again. Developers like this put up code too soon and too persuasively. Try to get them to slow down.)

PS: Most projects I work on are mature code bases with between tens and thousands of developers. Most of the changes are iterative, adding features on the long march to revenue. There’s an exception to this process — which the OP alludes to — where large new features land as blocks of hundreds or thousands of lines of code.

When that happens, if the author spent a long time in modelling hell trying to find the best way to lay out their code, then it’s going to have a horrible commit history. If there’s anything they can do to land it in stages then they should — including rebuilding their history with pretend commits as if they have one clear idea after another. That is hard to simulate, so it’s often easier to break the code up on the orthogonal axis to time: space. With version controlled code, the dimensions of space are modules, and if a large change is broken into clearly defined modules then that provides the same benefit as a series of changes broken into clearly defined ideas / diffs / patches / PRs.

trashburger · on Oct 30, 2022

> I don’t need to see your thought process as you get something wrong three times then spend another three commits fighting with the linter all before you present your final thesis. I just need to see what you’ve finally settled on.

I don't think parent is advocating for a commit to stay the same before a merge. You can simply clean up those WIP commits via rebase + fixups and put them in a coherent order and structure (i.e. make them atomic as the article mentions). This and rebase-merge workflows make git history a delight to work with.

DeathArrow · on Oct 30, 2022

I see the purpose of code versioning in being able to save the code fast, being able to reverse code changes fast and doing code merged with ease.

If some philosophy stays against someone's productivity I don't believe it is a good thing. I would let developers commit in every state they find easy, provided the code builds and the commit comment describes what is being committed.

I think it is the PR which we can push more demands on but not commits.

I, for example, dislike having to track any small change I have made and commit it. I like committing larger parts which contain a functionality.

imaltont · on Oct 30, 2022

> I, for example, dislike having to track any small change I have made and commit it. I like committing larger parts which contain a functionality.

This is pretty much the suggested way by the creator of git too. A commit should contain the necessary changes for one bit of functionality/bug fix. This imo makes the history pretty neat and tidy, while at the same time making it easy to search through with blame/bisect whenever you need to. Giant commits (multiple functionality/whole project/extension squashed into one) makes both of those hard to use and in some cases pretty much useless outside of finding who did it and hope they still work at the company and remember their state of mind when they did the change.

darekkay · on Oct 30, 2022

> Blames are easy. There's zero need to try to keep your tree "looking nice".

I don't care about how the tree looks like. But I do care about keeping the number of commits low _because_ it makes blames easier.

> In the future when I'm hunting down a problem I almost never care about what feature or fix this was a part of - e.g. the PR - I care entirely about the context of the developers mind when he committed that specific line of code.

I've mostly made the opposite experience. Many developers don't have the knowledge or confidence to use "amend" etc. consciously, so it leads to a mass of useless "fix error", "forgot something", "add tests" and "implement PR reviews" commits. This makes blame more difficult. I also agree: sometimes a squashed commit message doesn't help me understand the code of line I'm blaming. But not having squashed the commit mostly wouldn't help me anyway. Instead, I try to avoid this problem beforehand, by making sure that the PR itself contains all the information (via PR description and/or code comments). Recently I've reviewed a "Fix scrollbar" PR. The "what" is clear. The "why" (= "it's broken") _sounds_ sufficient, but I was missing information like "how exactly is it broken" and "why does this change fix it". That's the kind of stuff I might blame in a year from now, and now it's part of the Git history.

b3morales · on Oct 30, 2022

> Many developers don't have the knowledge or confidence to use "amend" etc. consciously,

The secondary problem here is that GitHub (just to throw another complaint on the pile about GitHub) makes a dog's breakfast of tracking PR comments when things have changed. Amending and force pushing only seems to make it worse, so even though it should be the better workflow, it causes friction in a different place.

int0x2e · on Oct 30, 2022

In my experience, this only works if you have a small team of great engineers. Once you have 50+ people of various skill levels making contributions to the codebase, you'll start seeing your git history littered with variants of "fix", "fix 2", "fix broken CI test" and on and on...

For me, the PR is the context I can use when trying to find an issue. Sifting through many hundreds of commits per day is painful. Sifting through tens of PRs per day is not great, but is much more manageable...

seba_dos1 · on Oct 30, 2022

> you'll start seeing your git history littered with variants of "fix", "fix 2", "fix broken CI test" and on and on...

Should such commits even pass through the review in the first place?

m000 · on Oct 30, 2022

If the team hasn't agreed to not allow this kind of messages, they will get through. And if you're the only one asking for better messages, your colleagues will start seeing you as pushy, and the requested changes as personal favors.

The problem is that if good commit practices aren't established very early in the project, they're unlikely to be adopted later when the team grows (assuming that the whole team has a say on the decision).

MaulingMonkey · on Oct 30, 2022

> I am a firm believer that a commit should rarely be more than a couple minutes of work.

While I'm a proponent of small commits, this is overstating things IMO. I work on codebases that frequently take a couple of minutes to perform an incremental link, that put commits through dozens of CI hours of (useful!) integration testing, chase down C++ heisenbugs that take weeks to hunt down the root cause motivating a 1-line change with a 30-line commit message.

On the right codebase, I'm more than happy to throw a single file "fix whitespace" commit at CI without even locally compiling it, confident that CI will catch any problems, because it'll help untangle the diffs of future impending commits and make them easier to review. On the same codebase, I might make a commit touching 1000 files, mechanically switching code from an old deprecated API to a new one. If individual changes are "high risk" and potentially worthy of bisecting, proper review, etc. then I might split the commit up. If the individual changes are "low risk", and basically guaranteed to work if it compiles, then splitting up the commit is just adding noise - better to make it a complete and atomic commit than a micro commit, even if it might involve hours of work (e.g. adding a categorization enumerand to allocations or logging.)

The real risk would be something like changing allocation patterns breaking some uncaught edge case workflow by going out-of-memory, and the complete commit will be easier to bisect, track, and revert than dozens of scattered micro commits that individually made the crash only slightly more likely, but on the whole made the crash certain.

Unless I have a huge slog of entirely mechanical commits, I probably top out at maybe 20 commits on average over a flowing, coding-focused, 8 hour workday, doing work which is straightforward and easy to carve off completed atoms of work into their own commits. Which is what - 24 minutes per commit? Bit more than a couple. And I'm an outlier compared to coworkers over multiple game companies, who trend towards less frequent, girthier commits, even if they appreciate my approach.

> I was recently hunting down the reason for a single ! casting a string to a bool. The blame on the file lead to a large commit with details about the project, but nothing about why that specific line of code was changed. > > Was it an accident? Did it serve an actual purpose? I might never know.

I have asked myself the same questions of plenty of tiny commits, sometimes even my own. Tiny commits don't actually solve this problem. Thorough review, clear code, and proper documentation can help, but if nothing ever slipped through the cracks, we wouldn't have bugs in the first place.

And sometimes you discover it's actually a bug canceling out another bug, and that even the author of the commit didn't actually grok what was going on, even if they tricked themselves into thinking they did at the time.

---

My own rule of thumb: carve off small/atomic/freestanding "complete" changes into their own commits for easier review of both it and your future commit whenever reasonably possible, if only because the change is broken up. But even this has caveats - reviewers traumatized by past coworkers adding "code for the sake of code" might push back on these for lack of concrete use cases, and it may be easier to bend to their whims than to spend the time and political capital to fight 'em and bring them around to your way of thinking. "Reasonably possible" means "stop if it hurts."

bennysonething · on Oct 30, 2022

Git squash into main works well for my team. Lots a commits, often undoing then redoing things really doesnt add anything for me. In fact I find it harder to understand lots of small commits because the overall goal / change change is so fragmented. Also it's it's easier to revert a single squash.

simonw · on Oct 29, 2022

Is this example too large or about the right size for you? https://github.com/simonw/sqlite-utils/commit/ab8d4aad0c42f9...

js2 · on Oct 30, 2022

Here's what I would have written. This took me reading the code and clicking through to #299 to figure out and could have been in the commit message:

    Add optional tables argument to schema command

    The schema of every table in the DB is shown by default. Add optional
    tables argument to restrict which tables are shown.

    Closes #299

thunky · on Oct 30, 2022

The issue serves no purpose for me, so here's a shorter version:

  Add `tables` argument to schema command

  Optionally restricts which tables are shown

js2 · on Oct 30, 2022

I think the issue was originally an outside feature request and adding "Closes #xxx" is there just to link the commit to the issue and automatically close it.

thunky · on Oct 30, 2022

It wasn't, actually. simonw created the issue himself and self-closed it 8 minutes later.

I had a conversation with simonw yesterday about why I personally view this as an anti-pattern: https://news.ycombinator.com/item?id=33388369

seadan83 · on Oct 30, 2022

I really love this specific thread, thank you all for the thoughtful commentary.

An experience that is really burned into me was working on a team that put very little into commit comments or PR descriptions. But, they did have debates and design decision conversations within PR review comments. Hence the context was in the PR comments.

One fine day, I was looking for the background behind some code while tracking down a problem. After identifying about 7 commits (with pretty basic/useless messages, and no PR link!), I then had to find the corresponding PRs based on timestamps, and search the PR history for PRs merged around those timestamps. From there, i had to review the conversations, some very lengthy back and forth conversation threads within the PRs to happily finally find a conversation that had the desired background. Fhe background was something like "why did you do this? X reason. Response: ok" Given that was the main thrust of the update, had that context been in the commit, it would have saved a solid hour.

From this experience and others tracking down N jira items to gain context into N possible commits, my rule of thumb is that it scales poorly if you have to go to an external tool to gain context. The above experience meant tracking down the context of any one change was generally a 15 to 60 minute exercise! No chance to grep logs, just a lot of timestamps searches and scrolling through PR conversations.

My preferred approach is that a first commit is "for the history." Any further commits are for the reviewer and when squashed those commit comment will generally be removed. So things like "address concern for X" and "formatting" makes sense for a reviewer coming back to an already reviewed PR, but is deleted fir someone that will look at it squashed. The idea being the squashed commit is on master, at that point nobody cares about a "formatting" commit that is one line in a series of 10 other similar commits (that person is tracking down a bug or trying to figure out non-obvious code, they do not care about renamed variables being apart from other tweaks and formatting)

Hence, there is different granularity in commits, but eventually the PR and commit granularity become the same.

I thus treat PRs as ephemeral with all of the initial PR text auto-generated from the commits. So, any long work to fill out a PR description instead is captured in the commit message which then autopopulates the PR description leaving nothing extra in the PR. Then any review comments are preferably not addressed directly in the PR, but with a preference to update the code for clarity or with needed documentation. Once that is done, the response to the reviewer is a "updated in git Sha X, WDYT?" Hence, the PR is for the review process, not documentation.

Of course, some updates need several commits to be grok-able in review. In this case PRs are broken up, a leading PR might be 10 refactoring commits which is then squashed together with title "refactoring X", and the lusg of commits then automatically squashed into a list of reactors done in that commit. Thus avoids a very long string of commits in the history. Then the 11th commit that is the actual bug fix or functionality changes is submitted and then exists in the history as it's own commit. So the history would be like "simplify login moodule\n[list of refactors", and commit two: "fix empty username login problem\n[description of why this is a fix]"

thunky · on Oct 30, 2022

> After identifying about 7 commits (with pretty basic/useless messages, and no PR link!), I then had to find the corresponding PRs based on timestamps, and search the PR history for PRs merged around those timestamps.

Not sure if this would save any time, but it is possible to search PRs by commit. For example, say git blame led me to this commit: https://github.com/simonw/sqlite-utils/commit/129141572f249e...

I could have found PR #373 via this search: https://github.com/simonw/sqlite-utils/pulls?q=bb16f52681b6d...

> I thus treat PRs as ephemeral

I think I see what you're saying but as others have pointed out, sometimes you want to add screenshots etc to the context, and you can't capture this kind of info in commit messages. So then you have two choices: issues or PRs.

> Then any review comments are preferably not addressed directly in the PR

I would think that sometimes you really do want to have a back and forth conversation in the PR, rather than just a "make this change" -> "ok done" type of feedback loop.

I view the PR as an decent place for all of this because it's basically a commit of commits, capturing the related changes/conversation/context all in a single place at the point of merge.

seadan83 · on Oct 30, 2022

> I would think that sometimes you really do want to have a back and forth conversation in the PR, rather than just a "make this change

I did not intend to imply that back and forth conversations would not happen. Instead, the preference would be to first address the code. Sometimes it's not clear, there are difficulties, different options, all worth a conversation.

My ultimate thesis is that I think commits linking to external sources does not scale well when doing an investigation into code history.

I suppose a person could do links to images in commit messages, I certainly do put links in commit comments. Though, I would agree regarding having images in PR is good. When reviewing UI changes, or making UI changes, the before and after pictures (in a PR) are super helpful.

Though, again, I view the PR and its content as "for the reviewer" and not for the "bug investigator". Maybe it would be a cool git feature to be able to view images in-situ. Overall, I don't think this one weakness changes the picture of going to an external too for N commits scales very poorly. I feel it is very similar to the movement that combined code and documentation together (very long ago, api docs like javadoc used to be in different places). All that is to say, centrality of information is key IMHO for efficiency when looking at commits at scale

PS: thanks for the pointer on commit search =D I was not aware of it

fragmede · on Oct 29, 2022

The commit message could stand to be the tiniest bit longer, instead of having to click through to find out what #299 is.

donatj · on Oct 29, 2022

That's probably fine, honestly. I'd maybe take a little more context in the message, but I certainly write messages like that myself. It's always later when I'm looking into problems that I want more context. I spend a lot of time hunting down problems. Half the time I caused them.

I know tone/angle of commit messages is an entirely separate bag of cats that people have strong opinions about, but I find the tone strange. I generally fond of commits that state what the commit does. Something along the lines of "Adds optional table support"

js2 · on Oct 30, 2022

> Adds optional table support

Ahem, "Add optional table support"

Because: imperative mood. :-)

I was an early git contributor and I've modeled my commits from the git project ever since.

simonw · on Oct 30, 2022

I can never remember which tense is recommended for git commits - I keep meaning to figure that out and try to make a habit of if.

mklein994 · on Oct 30, 2022

It's called the Imperative Mood[0]. I like to think of it as though I'm completing this statement: "If you apply this patch, this commit will [Do The Thing™]". Here's a few examples:

If I were to apply this patch, this commit will…

- Simplify FooWidget

- Add optional table support

- Remove trailing whitespace

- Update fizzbuzz to 1.2.0

- Merge branch 'feature/ABC-123-foobar'

- Revert "Add foo to bar"

[0]: https://en.wikipedia.org/wiki/Imperative_mood

simonw · on Oct 30, 2022

Thanks, that's a really clear explanation. I'm going to try that out and see if it sticks for me.

bmitc · on Oct 30, 2022

If you keep your commits so small, what do you put as the commit message?

Also, this showcases why Git is a terrible system. It contains no semantic information in it and is just a dumb line and text change keeper.

raydiatian · on Oct 30, 2022

This makes sense for bugs, which every feature secretly is

wellpast · on Oct 29, 2022

Going to express a counterpoint.

Maintainable/malleable software is mostly to due with the architecture of the code at any given point in time, not the sequence of commits that got it there. (I'll take all N,000 commits collapsed into one with a message "Stuff" if the code is well-articulated than a series of perfect commits that culminate into a highly coupled volatile system.)

Furthermore, I'll posit that these "perfect commits" have a tendency to trend toward overall bad code architecture. Primarily because the attention flashlight is being shined on the wrong then (the delta as opposed to the resultant codebase). In the course of working on a system I'll often earmark code decoupling/cleanup as part of a commit. Or move to get cleanup in quickly so that my general feature/value delivery is achieved. But processes that focus attention on commits (like this "perfect commit" and also formal code review processes) discourage this kind of continuous cleanup/refactoring that is the only way to get to good code.

ecshafer · on Oct 29, 2022

Commit basically doesn't matter to me, I think in the terms of PR. I can make 100 commits or 1, it doesn't matter because when I merge I squash them into a single commit that encompasses everything. There are some exceptions, as splitting a large merge into several smaller merges can help a lot, for example adding a database column in a pr, then adding in a separate merge usage of that column.

In terms of commits, the commit message should be a good summary of what is there. In terms of a PR there should be several things. One is obviously tests present. The next is a what, why and a how, which should be explanations of why you are doing it, what the purpose is, how you are doing. Finally I think there should be a descsription of how this should be tested manually.

rektide · on Oct 29, 2022

Its tragic how either/or git is.

Both are great in different ways. If Im trying to understand a complex tricky bit of code, a long string of small commits is ideal. If Im browsing a repo, large squashed commits or PRs are better.

It's be great to have better heirarchy, to get both.

jimmaswell · on Oct 30, 2022

Resorting to sleuthing around the commit history means the codebase has a problem with adequate commenting or unclear code.

rektide · on Oct 31, 2022

Thats a remarkable amount of idealism.

Why bother writing PRs and commit messages at all, if this is true? Im not opposed to heading there, but I think a huge amount of reason intent & purpose is currently captured there in 99.9% of orgs. And that repository has always seemed like an invaluable repository of context.

I'd like to visit your pleasantville. I dont believe it is real.

jimmaswell · on Nov 1, 2022

Nobody almost ever writes detailed commits at my workplace, but I try to keep the code well documented when I can. Commit history can help sometimes but mostly just to establish timelines.

seba_dos1 · on Oct 30, 2022

`git log --first-parent`

`git log --graph --oneline`

It's GitHub you want to complain about. git handles that perfectly well.

drdec · on Oct 29, 2022

You can have both if you keep the PR branches

Twisol · on Oct 29, 2022

This is why I follow a rebase-then-merge approach. The commit graph looks like this:

   |
   *
   |\
   | *
   | |
   | *
   |/
   *
   |\
   | *
   |/
   |

Each PR is neatly delineated, and I can quickly follow things at coarse-grained level (follow only the left parents; there's even a `git log` flag for this) or at a fine-grained level.

This isn't just pretty -- this has been actively helpful to me many, many times. I can `git bisect` into a merged PR to more precisely identify where an issue arose, and `git rebase` onto intermediate commits instead of jumping all the way to the end if I find a whole PR to be problematic to rebase past in one go.

taberiand · on Oct 29, 2022

Semi-linear merge is my favourite too. I'll also often reset soft and hand craft the commits for the PR to lay out the work of the branch so that it builds up logically, and assist reviewers (and future-me) in understanding the structure of the change.

vvillena · on Oct 29, 2022

I also try to adopt this approach whenever possible. The 'git log' flag you mention is '--first-parent'.

Twisol · on Oct 29, 2022

Thanks for mentioning the specific flag! I didn't mention it because I don't use it often enough to remember it (the shape of the log is usually enough), but it does exist if you want it :D

mrslave · on Oct 29, 2022

I also enjoy waiting for unnecessary pipelines to run again, because we must first rebase. ;-)

It's XKCD's "My code's compiling!" for 2022.

Twisol · on Oct 30, 2022

It's a tradeoff. I've been in multiple situations where two branches independently work fine at runtime, and merge without conflict, but the merge is broken at either compile-time or runtime. I'd rather rebase with cognizance over what I'm rebasing over, and make sure my changes are actually semantically consistent afterwards.

Of course, if you have an organization-wide monorepo, this isn't going to work. (At the same time, at that point, you should probably be a bit more selective about what CI you run based on what components have changed... and make decisions about what checks to run nightly rather than on PRs.)

dec0dedab0de · on Oct 29, 2022

I leave all the commits in without squashing, but filter to only show merge commits

Noumenon72 · on Oct 29, 2022

This is really weird to me -- the merge commits are the only ones without any information, because they are all autogenerated. The only time I ever look at a merge commit is when someone screwed up a merge conflict. I have written commands to filter out merge commits before.

jacoblambda · on Oct 30, 2022

It's worth noting that a lot of git's tooling supports --first-parent which only shows the "top level" commits in a branch. This means top level merge commits and commits that aren't part of any merges into the branch you are looking at. Ex:

For git log, this only shows you the commits that are merges and direct commits to the branch. For most repos this should correspond to individual patches/PRs/issues. It also sets the diff settings to use the same setting while in the log.

For git show, this shows you the diff against the previous commit at the same "level" in the merge hierarchy. So on master this should generally show you the diff between pr47-merge and pr46-merge. If you step one level into the merge hierarchy and diff a merge commit from master into that feature branch (even if said feature branch has now been deleted but was merged into master), it'll show you the diff between that merge commit and the previous commit prior to that merge.

For git bisect, it only tests against these top level commits before stopping. If your development requires that commits be rebased to functional commits prior to merge, this doesn't really change anything but for most merge-with-history style repos, this setting allows you to quickly identify where bugs are introduced.

fiddlerwoaroof · on Oct 30, 2022

There’s no reason merge commits can’t have a useful message.

On-disk, the only difference between a merge commit and a squash merge is that the merge commit has two parents.

seba_dos1 · on Oct 30, 2022

> the merge commits are the only ones without any information

Huh? Merge commits usually have a high-level MR description included in their commit message.

Izkata · on Oct 30, 2022

By default from git, not gitlab or github, it's just "Merge branch 'Branchname'".

I think the project in gitlab also has to be configured to include the merge request description, so it can also be easily missed. No idea about github.

stormbrew · on Oct 30, 2022

Git isn't either/or: GitHub is because it has extremely poor support for nonlinear history. It's not alone in that but git itself has the fundamental tools to support it.

bagels · on Oct 30, 2022

Having all the commits, including the ones where you change your mind and try something different is not useful. Spending a ton of time to rewrite history crafting a ton of tiny commits is a waste of time over squashing.

simonw · on Oct 29, 2022

Totally agree - if I'm working on a larger change I'll usually do that in a branch, which ends up as a PR which I then squash-merge into a "perfect commit" as described in the article.

neon_electro · on Oct 29, 2022

This is why I now love GitHub's "squash & merge"; it accomplishes so much of the ideals laid out in this "perfect commit" article.

jacoblambda · on Oct 30, 2022

I have to disagree. Mainly because I care what the contents of the branch were after the merge when I have to go back and figure out why a particular error is occurring.

Mainly my workflow is:

1. Commit often with moderately useful commit titles/texts.

2. Rebase -i prior to pushing to remote. During rebase, squash/fixup together relevant commits and write succinct commit titles with useful commit texts. Then push.

3. Rebase -i again at the end of the PR to guarantee everything is coherent and that each commit that makes it in the merge at least builds and passes existing tests (i.e. that functionality before the branch isn't broken in the branch).

4. Merge into master. Merge commit should include the PR info (like PR text contents) in the merge text if at all possible.

5. Forget. Maybe move to a new git host.

The years later when I identify a bug I can just:

1. Put together a reproducible example.

2. Git bisect --first-parent (only top level commits/merges) to identify the merge where the bug was introduced.

3. Git bisect the merge itself to identify the exact commit that introduced it.

4. From there I can determine the context of hows and whys behind the bug being introduced, identify a possible solution, and evaluate any process changes that could have prevented this bug from being introduced elsewhere in the project.

manbash · on Oct 30, 2022

Step 3 is problematic. When you push change requests you also need to rebase i.e. put the fixing commits right after the affected commits.

Otherwise, if you only rebase once the PR is complete, then you are likely to stumble upon conflicts due to the order of affected files.

It's a waste of the developer's time either way, not to mention that rebasing i.e. changing history on an (upstream) PR hurts the review experience e.g. comments on lost commits.

jacoblambda · on Oct 30, 2022

I generally agree that it's not ideal.

When I say rebasing, mostly I mean going back to clean up commit texts/titles and occassionally to squash/fixup commits that at the end of the dev process don't really make sense anymore and would probably only confuse people looking back at it (like squashing/fixuping down "do a, do b, revert a, do c, revert b, do d" into "do c, do d").

I make the distinction from step 2 because while I forgot to mention it (and in my head was considering it a part of step 3), I'll generally rebase right before I open the PR which cleans up most of the noise (such as "merge master even though no relevant files were touched" commits which don't add anything meaningful). Then the rebase at the end of the PR is only really squashing down a lot of the back and forth on the code review into meaningful chunks.

On Github this does kinda mess with the code snippet/suggestion stuff but since it's right before merge, all of those should be closed as resolved/implemented/etc anyways.

As for it being a waste of time or not, I'd disagree. I spend maybe 5 minutes at absolute most rebasing and often only about a minute (really just git rebase -i, mark the relevant commits e/s/f, let it do it's thing poking it where necessary, resigning w/ gpg, and pushing). If the rebase is throwing nasty conflicts while working on fairly linear single-dev feature branches, odds are you are rebasing wrong. And I find this saves future me a lot of time since things generally "just work" and I have all the important history still in git.

b3morales · on Oct 30, 2022

> changing history on an (upstream) PR hurts the review experience e.g. comments on lost commits.

It does, but this is specifically a weakness of GitHub's UI, not of the process itself. It's handled better by other tools.

seba_dos1 · on Oct 30, 2022

> rebasing i.e. changing history on an (upstream) PR hurts the review experience e.g. comments on lost commits.

GitLab handles it well.

seba_dos1 · on Oct 30, 2022

I believe that "squash & merge" is the most useless possible git forge feature that never makes sense to use except in cases of blatant laziness or incompetence. Its only purpose is to plaster over the mess that people who don't know how to use git well make with their branches, and relying on it is always net negative for the utility of your repository.

The only case where I use "squash & merge" is when I get some trivial merge request from someone who clearly doesn't grok git well and posts tons of fix up commits. When their change would boil down to a single logical commit anyway, then sure, I can click that button instead of asking them to clean up their branch - saving time for both of us. Otherwise, there's absolutely no benefit to it that I can perceive. Every reason that I have ever heard from people using that workflow could be nullified by learning how to use git a bit better, often bringing lots of other benefits to the table.

(there's one exception - "GitHub doesn't handle PR reviews well" is a somewhat valid reason, but that's a perfect excuse to look for alternatives or at least to complain to GitHub to make them finally fix their product; it's not the only forge out there, GitLab deals with branches getting rewritten during review well)

m000 · on Oct 30, 2022

Finally someone who gets it: history rewriting is the killer feature of git. If you don't curate your local commits before pushing, you're not using git properly.

Squashed commits essentially reduce the development workflow/process to the level of previous generation VCS tools (e.g. subversion).

phist_mcgee · on Oct 30, 2022

Woah, that's a lot of absolute statements you've made.

Pulling out the old 'it's not the tool it's you' argument as well

seba_dos1 · on Oct 30, 2022

> absolute statements

In case you haven't noticed, I started the whole comment with "I believe" ;) I'm clearly not an almighty source of all truth; that's just my impression after 15 years of experience with git and many projects using it with various workflows.

js2 · on Oct 29, 2022

Perfect commits are a bulwark against technical debt:

https://www.infoq.com/articles/business-impact-code-quality/

https://news.ycombinator.com/item?id=33372016

The extra time spent crafting a perfect commit today will save 10x that time in the future. Your future self will thank you, and if not your future self, then I will, when I inherit your code.

psychoslave · on Oct 29, 2022

I very rarely check the repository history, even less to find an answer in how some buggy behavior was introduced.

At most I'll git blame, search the issue repository and see if I can chat with the concerned people.

Understanding the current relevant code and what need to be done is basically always sufficient to do the job. Digging into history, while potentially very interesting, never helped to reduce the time to resolve anything in my experience. If the code is a mess, the history will be exponentially so. If the code base is clean and well documented, you might more likely have a great set of "perfect commits" that you will never need.

js2 · on Oct 29, 2022

> see if I can chat with the concerned people

The concerned people quit and now I'm left supporting their untested, undocumented, unreviewed code that doesn't have commit messages more meaningful that "now it works", "fix it", "make it work", "another try".

So now answering every "why?" question takes me much longer than if the damn code was just documented in the first place.

I often go back to my own code, go to make a change, can't remember why I did it a particular way, run "git blame, git show" and thank myself for having the foresight a year ago to spent an extra minute or two when the code was fresh in my mind writing down why I did something a particular way.

If that doesn't float your boat, fine, but I don't want to work with you.

The git repo itself is the epitome of what I aim for in my commit messages. Here's an example:

https://github.com/git/git/commit/d3775de0745c975e2d13819a63...

Literally a single line change, the addition of:

    BASIC_CFLAGS += -O0

to the top level Makefile, and 9 paragraphs of explanation in the commit message.

grogers · on Oct 29, 2022

Writing good code and writing good commits definitely clusters together. It's part of a general pride in quality of work, organization, and planning.

Going back into the commit history can be useful to figure out if something suspicious was introduced intentionally or if it was just a mistake. Beyond about 3-6 months, the author will generally not have any context to remember small code details, so leaving breadcrumbs for your future self and team can be nice.

quickthrower2 · on Oct 29, 2022

History is useful because blame often get’s it wrong. So it is a manual blame.

keyle · on Oct 30, 2022

Until the requirements change 12 times over 12 months, and that time invested writing prose with all the stuff that goes with it goes down the drain.

Don't get me wrong, I'm not saying commit crap. I'm saying "Yes But". Use code as documentation, use Tests as defence turrets, forget the poetic muppetry. The code should reflect the now and mark the pitfalls. If the code is well written, I should be able to understand the why it's done this way. Don't send me off to some document on confluence written by the dude that left 2 years ago.

simonw · on Oct 30, 2022

That is the exact argument I'm making this article: if your documentation is on confluence it will inevitably go out of date.

The solution to that is to keep the documentation in the same repository of the code, and update it in lock step every time you make an implementation change that affects the documentation.

wruza · on Oct 30, 2022

For a counter example, most of my work projects usually live up to 3-4 years (with 0.5-1 year of iterative development) and I have never investigated a commit from the past nor had to understand what it does, because that state of a project has nothing to do with today’s and checking it out or blaming is usually completely meaningless, unless it’s from the last week or two.

I believe that most of these high-culture commit advices come from a specific (maybe common in bigcorps maintenance phase, idk) development process and/or paradigm, which may or may not be present at your workplace (though you may pretend they are, as we all do sometimes to look better). Personally and company-wise we are using RCSs simply to merge parallel work and to have an undo stack a little deeper than an editor could provide. That’s it. Nobody ever goes to the history, unless there is a technical issue with RCS itself.

I’m not suggesting to write “new task” in a commit message, but am not fond of writing markdown poems there either, or committing every 3 minutes because changes “must” be short. Commit when it’s done, for some reasonable definition of “it” and “done”, and explain it in one line in under a minute, this is my empirical rule. If you have more to tell, tell it in the comments. (That’s what your future self will really be grateful for.)

Pretty sure that many people who nod to all this noble culture actually silently relate more to the above.

Edit: should have read this subthread before being cautious in my statements. Glad that there are people who share these views. Because reading on how to do it The Right Way and then waiting for “the obvious benefit” over decades may be confusing and anxiety inducing af.

js2 · on Oct 30, 2022

You've presented a strawman argument. No one is suggesting committing every 3 minutes, nor writing markdown poems.

Commit your work in logical blocks that make sense and explain what the change is doing and why so that a developer other than yourself can understand why the change was made. That's all. Unless you expect 100% code churn, you'd be surprised what really survives in the history. And even in the course of churning code, I often refer back to commit messages for the existing code.

As I've mentioned elsewhere, I've been doing this a long time (more than two decades) at startups and fortune 500s, on both open and closed source code, on projects of all sizes. I'm surprised frankly that 3-4 years on a single code base has not been enough time for you to have never wished you'd written a better commit message.

I literally lost an hour today working backwards from an error message to what the code was actually complaining about. No comments in the code. No documentation on the API endpoint. No commit message or PR description to explain the change beyond "make it work now with IPA uploads." In my experience, all these things go together and are signs of a conscientious developer.

The code tells the computer what to do. The commit message tells other developers what the code is supposed to do and why it's supposed to do it.

This has nothing to do with nobility. It's about sustainable development that tries to avoid technical debt, about being forward thinking, and about courtesy to future maintainers of the code you've written. Someday that future maintainer may be you.

mianos · on Oct 30, 2022

Is this irony? Given the time.to write the odd comment, tidy a bit of code write a commit message, I know what I ask for. People read the code. I am sure some people jump right to the commit log but I have never met one in my 40 years as an active developer. Commit messages matter. A short statement of intent is a good idea. Dangerfiles are the penultimate expression of bikeshedding.

brian_cunnie · on Oct 30, 2022

> An issue is more valuable than a commit message

Watch out! An issue is more ephemeral than a commit message: The issue might not always last, but the commit message will. If, say, a project is re-hosted from GitHub to an internal GitLab instance. Or maybe the original project dies, and all the active work is done on a fork. In such cases, the original issues are gone, but the commit messages stay.

So put everything that's important into the git commit message. Don't assume the issue will always be there.

I learned this the hard way when the company I worked at used something similar to GitHub issues (Pivotal Tracker stories) to record the important background to a commit, which was great until the story was purged (because, say, the Tracker project was deleted).

Izkata · on Oct 30, 2022

Indeed, we went [something old] -> FogBugz -> Jira and cvs -> svn -> git. All cases from that original system have been lost and some but not all of the FogBugz cases were imported to Jira (and even the ones that were imported are hard to find), while the repos went through an import process so all commits and their messages have been preserved. The git history is a bit wonky because of how branches work differently, but "blame", check commit, "blame commit~1" to iterate backwards in the history to track a change do all still work as expected.

We now have been having to stop devs from using Gitlab as their primary issue tracker instead of Jira because the non-devs use Jira for everything and we need to keep them in the loop.

8n4vidtmkvmk · on Oct 30, 2022

its worth porting the issues to the new platform. i ported bitbucket to GitHub which was annoying because you can't choose the bug # so you have to import then in order and get it on the first try.. but at least everything lines up now

Macha · on Oct 30, 2022

In larger companies this is rarely your decision to make and in three different issue tracker migrations I've seen the team responsible for maintaining the issue tracker instead to keep the old tracker in a read only state until they either have to renew the vendor contract or make security updates, at which point the issues go down the drain. This may all have happened before you joined, so it's not likely you could have done the migration yourself (and the security team may have disabled generating API keys which automated migration tools need, anyway)

8n4vidtmkvmk · on Oct 30, 2022

Well... Whoever was in charge of those decisions made bad choices then I've done it twice now in tiny companies. Only takes a couple days to hack together a script

thom · on Oct 29, 2022

Somehow I’m really wary of all the energy that goes into source control workflows and commit messages. It’s all meta-information and the only time it really matters is when something has gone catastrophically wrong. If you genuinely have to go looking at the history of a codebase to find out why something weird has happened, that feels like a codebase crying out for a better structure. New features shouldn’t be major surgery, they should be instances of established abstractions. I’m very interested in languages that make that modularity easier but if you can’t delete your entire Git history with no negative consequences that just seems like a disaster to me, on both an individual and industry-wide scale.

cratermoon · on Oct 29, 2022

> the only time it really matters is when something has gone catastrophically wrong

Or in other words, it matters most when it's most needed and the recovery process is glad to have them.

Think about the airline industry and how many times the causes of an accident have been found in the months-old maintenance records of an aircraft, or in an examination of pilot training from last year.

Now imagine throwing away all the information that could have led to finding a cause and putting in place mechanisms and practices to prevent a recurrence. This is how the software industry continues to stumble over the same problems that were known and described in the 70s.

thom · on Oct 29, 2022

I don’t think it’s _necessary_ when something goes wrong. I think it’s a sign of a process going wrong (which is just as likely as your commit message process going wrong). If you can’t work out why a piece of code looks the way it does or why it even exists, the fix isn’t better commit messages, it’s better code.

thom · on Oct 29, 2022

I want to be clear I’m not just being a curmudgeon here. There’s been interesting, old work in the OO world about capturing use cases more explicitly in your model. I think some work around micro front-ends is promising. Anything that lets you compose new work in a modular, clear, explicit way is very interesting. And I think that’s the path we should take as an industry, rather than running git blame to understand why a particular case in a switch statement exists. At the very least, everything useful in a commit message could be a comment in the code, but even that seems weak to me.

cratermoon · on Oct 30, 2022

You're correct that ideally the code would reflect the understanding of the programmers and the business. I've definitely worked with codebases where it's nearly impossible to discover any but the most basic things about the business, and I appreciate better code. Still, even if the code is clear, it's clear about how things are now, but what about in the past? Can just looking at the differences between and now clarify why things changed, and why the code was changed the way it was? What's become sort of a mantra for me, "what problem were they trying to solve?" That's not often clear in the diffs.

thom · on Oct 30, 2022

The ‘why’ of all of your code at the macro level should be clear from the acceptance tests (which hopefully document the motivating use cases and aren’t just barren series of steps with enigmatic names). At the micro level it should be done with naming (it’s really good to have the strategy pattern here because you can explicitly capture policies as named things, and occasionally keep old ones around as a contrast) and comments or doc strings.

I don’t know why the industry has settled on the idea that we only talk about ‘why’ at diff time in external tools. Even if the tools and languages today aren’t ideal, I think it’s a worthwhile thought experiment to constantly ask yourself, how can we better capture this information and serve this workflow in one place?

For example, most codebases put everything in a series of functional siloes based on the architectural layering of the app. What would it take for a framework to make it easy to write code built around vertical _features_, all the code for each of which is co-located (including tests and documentation), and which integrate into the app via various hooks. There are obviously a million ways to poke holes in this idea but I think we’ve learned the lesson in the industry a thousand times over that functional siloes are bad from a team structure and work scheduling point of view. Perhaps that ought to also apply all the way down to the program text.

Ultimately I’m more intrigued than religious here, but I will maintain that it’s at least a code smell when someone has to care too much about commit messages.

rtpg · on Oct 30, 2022

The very obvious place where this is useful is when seeing a random `if` statement and wondering why it's there. Git can pull up the commit it was added in, and the right context.

This can also quickly identify who added it (so you can ask someone "hey, if I changed this to that would that make sense in your opinion?")

Like with many things in life it's not a guarantee of exactitude or rightness, but it can be very helpful information!

thom · on Oct 30, 2022

But we can both accept that the meaning not being clear in the code is bad, right? And the ideal solution would be the code being clearer, not the commit messages being clearer? I just think I’d always put my effort into the code and incentivise that over everything else. I mean, if you have a highly involved Git workflow already, aren’t you doing code reviews and catching this stuff anyway?

rtpg · on Oct 31, 2022

That is a legitimate point.

One challenge with this is that in a code review you have the context of the change in front of you, and are looking at changes. Many changes are "obvious" when seen in the context of other changes. But later on people aren't looking at changesets, but final artifacts! That difference in perspective can be hard to decipher.

Now of course there are things that are "obviously this should be commented clearly". But then there are more subtle things that make sense to the person writing the code, and the handful of reviewers, just because they have experience in a domain. Someone else comes in and it just isn't clear to them for... no fault of anyone really. But the clean commits can help this person discover answers more quickly than if there is no VCS information at all.

I think this might be the clarity equivalent of "defense in depth". You want things to be clear, but sometimes things are not that straightforward and you gotta dig in a bit more.

On a more holistic level as a commit writer, I'm generally only making a handful of commits in the end (I like squashing up the fixups and the like). I write out commit messages, and usually those messages become the PR messages (with some edits as needed). That process helps me clean up code and add comments as well. The process of making this stuff easy to understand involves thinking about it multiple times in the process from thought to committed code!

So yes, make the code clear. Sometimes things are hard to make unambiguously clear from any angle, though, so also make the changelog clear!

thom · on Oct 31, 2022

Okay so we’re learning even more here - code reviews have some flaws which perhaps we can address by performing smaller more isolated reviews first, or at least spending more time on the question of whether code makes sense in isolation.

I accept it’s hard to make code unambiguously clear sometimes but not that it’s impossible - commit messages are just text after all. After having to look up the commit messages do we at least make changes _then_? Do we try to learn a lesson to avoid the need in the future? If someone banned you from ever writing commit messages again, are you sure you’d make no changes to the way you write code? And if you would, why not make those changes today?

brian_cunnie · on Oct 30, 2022

Sometimes the code isn't clear because the reason for the "if" has nothing to do with the code and everything to do with externalities.

For example, I wrote a DNS server that had to take into account that on many Linux systemd distributions a daemon `systemd-resolve` binds to 127.0.0.54:53, preventing my DNS server from binding to INADDR_ANY port 53.

So my DNS server has a contorted piece of code that, when binding to INADDR_ANY fails, iterates through all the interfaces and all the IP addresses and binds to each one individually.

Anybody looking at the code would rightfully ask, "What the heck?", but if they read the commit messages, the reason becomes clear.

thom · on Oct 31, 2022

Why not just have a comment? Or structure the code such that the initial failure calling your bind_to_in_addr_any function sets a flag called inaddr_any_already_bound or something and checking that variable calls another function called bind_to_available_interfaces? There is nothing that can be explained in a commit message (which is just text) that can’t be explained in code (which is just text).

Dunedan · on Oct 31, 2022

Proper naming of variables/functions is important, but would only tell a part of the story. In the example you'd loose the context why that port might be already in use.

Using comments instead of commit messages is of course always possible, but if you'd note all relevant information you'd otherwise put into comment messages in comments, you might end up with more comments than actual code, which IMO hinders readability significantly. Take the Linux kernel for example. git commit messages there are very informative and often pretty long. Putting this information into comments would make the code pretty hard to read. Also sometimes there isn't a clear place where you'd put such a comment at, as it only makes sense when looking at multiple changes in different locations (= a git commit) at once.

I believe having the explanation why certain aspects of the code are as they are in the metadata in form of the git commit messages with an occasional comment in the code for important information is a good compromise between readability/verbosity of the code and having all relevant context available.

glacials · on Oct 29, 2022

Author might be interested in Git notes (`git notes --help`). Notes are arbitrary text content added to commits that don’t affect the commit’s identity, so they can be amended without rewriting history.

simonw · on Oct 29, 2022

Yeah I've not used those myself yet but I've been wondering if they might provide a path to "back up" my issue comments to the repository itself.

strictfp · on Oct 29, 2022

Ooooh, thanks for mentioning that!

andirk · on Oct 30, 2022

1) Write commit message. 2) THEN start doing work. If you get up for a spliff break, 1st thing after sitting back down is re-read commit message. Stay on track. 3) Stage files that are kempt, like a mini commit within the commit. 4) Once all files are staged, look at them and look at the message to make sure you have mentioned all that changed, including sometimes _why_. 5) Commit! But don't always push! Only push if you really like what you did. 6) Repeat. Squash similar commits together. 7) Rebase in master or merge in master. 8) Push! 9) Repeat! 10) PR!

Your commits will read like a story of the what and sometimes the why. PRs go smoothly. All is well. Lots of steps but it takes very little extra time.

swyx · on Oct 30, 2022

> I use these cookiecutter templates for almost all of my new projects. They configure a testing framework with a single passing test and GitHub Actions workflows to exercise it all from the very start.

this looks pretty handy and i feel like if anyone from @github is reading this it could maybe be built in to make the templates functionality a bit more useful out of the box

uwemaurer · on Oct 30, 2022

To me it is most important that the commit is focused only on one thing, and not mix any unrelated changes together.

We develop a Rust based tool HighFlux [1] which simplifies/automates working with git. In this workflow you create a feature branch for every feature you work on and it automatically commits all changes to this branch. After code review the branch gets merged as a squashed merge to have it in a single commit in the history.

1: https://www.highflux.io/

m000 · on Oct 30, 2022

How do you keep you commit focused on one thing if you squash?

Often, before you reach to the point where you can make the change you want, you need to make side-changes which are not a strict part of the feature you're building.

These should ideally be reviewed separately.

uwemaurer · on Oct 30, 2022

I agree this happens quite often. Since Highflux automatically commits changes to the current branch it is extremely easy to switch to a different branch with just one click (or CLI command). (so no stashing / dealing with untracked files is needed) Then you can implement the side change separately and then switch back again to your first feature branch.

Those two feature branches are then unrelated, we are planning to add stacked changes in a future version.

kalekold · on Oct 30, 2022

There had to be a rust ad in here somewhere.

sreevisakh · on Oct 30, 2022

The ad is about a git tool (hiflux) that happens to be written in Rust. That is perfectly relevant in this context. Rust is also a good language to write such tools (as is Go). I understand why Rust evangelism can be a bit tiring. But the opposition to it shouldn't try to imitate the same quality.

thunky · on Oct 29, 2022

> Sometimes I’ll even open an issue seconds before writing the commit message, just to give myself something I can link to from the commit itself!

IMO generally an issue should precede an implementation - it should track future work to completion, not work that is already complete.

Creating a new issue and then immediately closing it is an anti-pattern, in the same way that creating a pull request and immediately merging it is.

simonw · on Oct 29, 2022

Most of my work has an issue created at the start of the development process and closed with the final commit.

But occasionally I'll miss that step - maybe I dived straight in to fix a new bug without taking the time to open the issue first.

This are the cases where I'll open an issue at the last possible moment - on the basis that an open-and-shut linked issue is still better than a bug fix with no associated issue at all.

thunky · on Oct 29, 2022

> an open-and-shut linked issue is still better than a bug fix with no associated issue at all

It sounds like your issue tracker is a part time documentation repository.

I think I view this as an anti-pattern because I look at the issue tracker as a shared collaboration tool for planning & tracking, not as a tool to document all code changes.

For example, if I were solo I'd probably never create an issue (or a pull request).

simonw · on Oct 29, 2022

> It sounds like your issue tracker is a part time documentation repository.

Yeah, it's exactly that.

I'm planning to write more about this. Basically I've started thinking of issues as "temporaral documentation" - documentation with an attached timestamp that was known to be accurate only at the point when it was written.

A challenge with regular documentation is that when you first write it you are making a commitment to keep it up-to-date in the future. A project with thousands of pages of outdated documentation isn't much better than a project with no documentation at all (it might even be worse).

But a project with thousands of pages of timestamped issue comments feels different to me: there's no expectation that those will be updated in the future, but they still offer enormous value in terms of capturing the decisions that lead to the present day.

So yes: I do consider my issues as a form of documentation. They're not a replacement for traditional documentation, but they're a valuable extension to it.

thunky · on Oct 29, 2022

That makes sense, although I think I would prefer to put pull requests in the "temporaral documentation" category rather than issues. But it sounds like your process is working well.

quickthrower2 · on Oct 29, 2022

That is the perfect merge commit onto trunk/main. Branch commits can be chaotic if needed. The commit is then the unit of “walkawayability” a checkpoint where if thing get screwy they are happy to revert to.

furyofantares · on Oct 30, 2022

I work on small teams making video games. The hard part is figuring out what to make. The faster you can see your ideas play out in game and process feedback, the better. And any given piece of leaf gameplay code, at the time of being written, has a decent chance of being rewritten or deleted. Usually engine/tooling code doesn't work like this, and there's also systems in between the engine/tooling and the gameplay code that are written to make gameplay code very fast to write, and those in-between systems also don't get iterated on like this.

But most of the time is spent figuring out what to build by building something and iterating on it, and the perfect commit sounds like an excellent way to not get anything done in this context. I'd like to make 5-10 changes before lunch, play or spectate a playtest in the afternoon, and process feedback afterwards, as well as fit in a design discussion that's not necessarily about anything that's currently being iterated on.

simonw · on Oct 30, 2022

This sounds to me like what I think of "prototyping" in my own process. When I'm prototyping I'm often not working with version control at all - or if I need to track changes over time I'll work in a branch (or sometimes even just a GitHub Gist, which is effectively a cheap separate repo).

Once I've prototyped my way to a design that I like, I'll craft a proper commit with tests that captures the new design.

sreevisakh · on Oct 30, 2022

The perfect commit is useful only when you're ready to publish. Development can be chaotic with all the half-done or possibly-wrong code. You don't need to worry about all those standards while you're still developing. The perfect commit is then crafted out of the chaotic development branch by editing the history. I keep the original development branch alive for reference, while the crafted commit is submitted to the maintainer.

g051051 · on Oct 29, 2022

> Our job as software engineers is not to write new software: it’s to apply changes to existing software.

Well, I could stop reading right there.

simonw · on Oct 29, 2022

I probably should have expanded that point: I was trying to be pithy.

What I meant is that we spend 95% of our time making changes to software - adding new features, fixing bugs etc.

Starting a new project from scratch is something we might do a few times a year at most, but changing an existing project is a thing we do every working day.

g051051 · on Oct 29, 2022

Adding new features is writing new software in my book.

cratermoon · on Oct 29, 2022

What do you call writing -2000 lines of code? https://www.folklore.org/StoryView.py?story=Negative_2000_Li...

g051051 · on Oct 29, 2022

I call it "LOC is a terrible metric".

unionpivo · on Oct 29, 2022

Most developers add new features by modifying/adding to an existing code/repo most of the time.

Hell even when you create new project, a lot of development environments provide basic scaffolding, that you then modify (or sometimes remove 95% of).

khpart · on Oct 29, 2022

It's Python, which is a churnfest by design. Other communities have this problem to a lesser degree.

willm · on Oct 29, 2022

Great article. I’d like to quote it in a GitHub issue template.

I struggle with “a single focussed change”. As my coworkers will attest. When working on new features rather than bugs, I find it hard to say when one feature ends and another begins. Something I need to work on!

quickthrower2 · on Oct 29, 2022

Try to say it before coding, and probably the team should help with chunking it down.

Sometimes a thing cannot be chunked down more without silly inconvenience so keep that in mind.

simonw · on Oct 29, 2022

One thing that works for me here is to try and split the work into changes that could be shipped to production independently of each other.

irrational · on Oct 30, 2022

> Our job as software engineers is not to write new software

This can’t mean what the plain reading would suggest it means. I mean, I can’t be the only one writing new software all the time, can I? So, what does it actually mean?

simonw · on Oct 30, 2022

See this comment: https://news.ycombinator.com/item?id=33388466

It was a clumsy way of indicating that we very rarely start a new project from scratch - if we are building a feature it's almost always adding to an existing project.

manunamz · on Oct 30, 2022

This is a nice write-up.

I agree with other comments here too, which mention not necessarily needing to hit all points on this list. If working alone, or quickly prototyping, they may not be strictly necessary. It's about trade-offs: The more people involved or the longer one goes without re-visiting a project, the more important these bullets become.

If there isn't a pre-established standard, it's useful to run a mental "commit hook" that evaluates the usefulness, or future likelihood of usefulness, of these items.

febeling · on Oct 30, 2022

This depends on so many variable that I don't believe there's one perfect commmit. E.g. it depends in the lifecycle stage of the project. Is it just you figuring something out, or are hundreds of thousands of people depending on your work with their livelihoods? The first probably deserves the kinds of restraint that Op wants to see. Whereas you would terribly stifle you creative flow if you were to write docs for your explorative work in the unknown.

rendall · on Oct 30, 2022

One benefit of squash-commits to master/main is in continuous integration. If the commit needs to be reverted because it introduced a bug into production, the reversion is relatively easy to do. With an entire history, especially with later commits, finding the buggy commit can be a nightmare.

jonnycomputer · on Oct 29, 2022

Is there a way to add only certain lines of a file to a commit? Sometimes it'd be nice to say, "only these changes" make it to this commit, while "those changes" will be for another. Easy enough to do when they happen to be in different files, but sometimes that doesn't work out.

sreevisakh · on Oct 30, 2022

Other commenters have mentioned interactive and patch staging. That is the standard way to ensure that only the lines you specify go into a commit. I want to add an alternative method that offers some extra advantages.

Patch stack management tools like stacked-git and topgit allow you to setup a number of patches (using git). You can then specify which patch each change goes into. This is sort of like having multiple staging area/indexes at your disposal. They also allow you to add commit messages to each patch at any stage of the development. You could also add, delete, split, squash or edit each patch at any time. And these patches are converted to regular commits at the end of development of the feature. You can also convert any commits into patches if you ever need it.

The attraction that I see in this workflow is that it allows you to combine the chaotic development workflow and final rebasing step into a single stage.

simonw · on Oct 29, 2022

Yeah, you can do that with "git add -i" - it's not got a great interface but once you've learned how to use the "patch" and "split" commands you can generally craft commits with just specific lines pretty effectively.

iib · on Oct 29, 2022

`git add -p` should do it. git works great by "patches", not necessarily by "files", so what you are saying is supported out of the box really well.

9dev · on Oct 29, 2022

If you use an IntelliJ IDE, they have this— I frequently select just a few lines in a diff viewer and commit selectively what is logically coherent.

Not affiliated, just a big fan.

other_herbert · on Oct 30, 2022

It also will track changes you don’t want… like if you have something you’ve changed for testing that can be kept in a separate change list (I call mine “ignore-for-now”) and the n future changes to that file would be in the regular changes, just minus the other stuff… also not affiliated, another fan

Noumenon72 · on Oct 29, 2022

Yup, this is awesome. Just flit from one change to the next with Opt-Down Arrow and uncheck the ones you don't want to commit.

ElectricalUnion · on Oct 29, 2022

git-gui, the Motif (either retro awesome or ugly as a sin, depending on who you ask) tcl/tk gui, besides being pretty fast, can do that. I get side looks every time I commit something in the middle of a file without commiting the entire file.

Select a file by clicking it's name.

Select the lines, alternative click then "stage this hunk/line to commit".

You can also add/remove entire files from staging by clicking the file icons.

yCombLinks · on Oct 30, 2022

I use git kraken and you can easily choose individual lines in a file to include.

jonnycomputer · on Oct 30, 2022

Thank you everyone for your great replies. I'm excited to try this out!

wara23arish · on Oct 29, 2022

I know this is made easy with magit

beachy · on Oct 30, 2022

> Without tests, there’s a very strong possibility that your change will have broken some other, potentially unrelated feature.

[...]

> This is why I start every single one of my projects with a passing test. It doesn’t matter what this test is—assert 1 + 1 == 2 is fine!

These two statements do not add up.

lostcolony · on Oct 30, 2022

badum-tsh

That said, I think he just means "do the work to get tests in place, and build the habit of running them as part of your development cycle", even if, when starting, what you're running isn't helpful. The idea being that building "write a test" "run the tests" into your iterative loop, from the beginning, is a worthwhile practice.

BlargMcLarg · on Oct 30, 2022

It's half work until those tests actually test something of value. We all know what happens with half-work in the long run, and it certainly isn't 'they build habits towards full work'.

That's before considering even if the tests do test something, most code tests written are easily improved upon garbage.

simonw · on Oct 30, 2022

They add up if you consider the text in between them: "Adding a new test to a project with a lot of existing tests is easy"

My point here is about ensuring you have a test suite from the very first commit to a repo, so you have an obvious place to add tests to in further commits.

rendall · on Oct 30, 2022

If one is working in a team, and the team does code review before committing to master, then tests can be part of the code review. The most stable code bases I have ever worked with come with teams who will reject a pull request with inadequate tests.

WesolyKubeczek · on Oct 29, 2022

My perfect commits have message „WIP on stuff”, include changes to code coupled with formatting changes, have zero tests, nada documentation, and no links to issues.

But they are perfect nevertheless, because I judge them so. And because the code works.

Also, because I get to get out more.

simonw · on Oct 29, 2022

This post was extracted from a talk I gave at DjangoCon a few weeks ago, which was about how this kind of workflow has /increased/ my personal productivity by a material amount. https://github.com/simonw/djangocon-2022-productivity

So I get to get out more because I'm doing this!

WesolyKubeczek · on Oct 29, 2022

You may think so ;)

DangitBobby · on Oct 30, 2022

I honestly feel that any sort of documentation in commits themselves (besides a short, concise summary of what and maybe why) is a waste. It's one of the least likely places people will be looking for that information. A better place is in comments near the code, though obviously that has its own issues.

ziml77 · on Oct 30, 2022

I agree. It's much better to just have all of the key info inline. If I look at the commit history of a file, there's going to be such a mess of irrelevant commit messages to the issue I'm looking at that it's hardly worth spending the time trying to find if the info is there.

js2 · on Oct 29, 2022

I wanted to get out today, but instead I spent two hours debugging an issue because I was dealing with code from someone who thought like you who didn't think documentation, comments, meaningful commit messages, or tests were a good use of their time.

I don't care what you do with your code, but as soon as someone else inherits it, you're a sociopath if you didn't document anything along the way.

And btw, the person who inherits that code may be your future self, and you'll thank your past self for writing down why you did something the way you did.

WesolyKubeczek · on Oct 29, 2022

Do you think I should also document all the bugs I put in my code?

If I knew they were bugs, I wouldn’t be putting them there in the first place. I may be crazy, but a moron I’m not.

js2 · on Oct 30, 2022

You should document what the code should do, why it exists, and why you did it a certain way.

In my current role, I've inherited a decade of cobbled together shell, python, ruby, javascript, java, and groovy that together constitute a large part of our build release system, spread across a dozen or more repos, written by over a dozen developers.

This code is undocumented, untested, brittle, and riddled with heisenbugs.

Among other things was a developer who was terrified of ever breaking anything. So whenever he needed to make a change, he'd duplicate the code. One recent example I dealt with is three copies of a script: copyReports.sh, copyReports-alt.sh and copyReports-multi.sh. In addition are symlinks to each script without the .sh extension.

Before I could even touch this code I first had to instrument it to record when it was run, to figure out which of these was even in use, and under what circumstances. I had to try to tease out why the developer made the extra copies, because he didn't document the reason when he did so. I had to guess why there were both version with the '.sh' extension and why not. The developer hadn't written so much as a single sentence explaining his thinking at the time.

So no, I don't expect you to document your bugs, but maybe in the process of writing a commit message, or a test, or some documentation, you may realize on the second thought that you made an error in the code and save yourself from a bug.

But in any case, at least when you come across your bug a year from now, it'll be clearer what the code was supposed to do because you wrote it down at the time, as opposed to what it actually did.

WesolyKubeczek, can I ask how long you've been a developer and what size projects you've worked on?

I've been writing code professionally for over 20 years, worked at companies and on projects big and small, and have used revision control systems since RCS/SCCS, on through clearcase and subversion, mercurial and git. I've worked with code old and new. In my experience, documenting things along the way is invaluable to building a sustainable code base. You may think that shipping and velocity is all that matters and that writing a commit message is nothing but friction. But that hasn't been my experience, neither at startups nor at fortune 500s. Code that's going to be around more than 6 months is going to be revisited. To add features. To fix bugs. Even just to delete it. And when that time comes, having a commit message saves time, much more time than was spent writing the message in the first place. I personally consider it development malpractice not to document my code along the way.

WesolyKubeczek · on Oct 30, 2022

> WesolyKubeczek, can I ask how long you've been a developer and what size projects you've worked on?

Just over some 20 years. Haven't used RCS, but had some CVS and this version control system you have when you put file1.php, file2.php, file3.php, etc onto the production FTP, update includes, and hope you have caught all includes. Wasn't my idea, so don't ask. Codebases starting at several hundred thousand lines of code and up, mostly old ones.

What I have learned is that given enough time, everything is a lie. Documentation is a lie probably as soon as you're writing it. Comments in the code are lies in a month. Commit messages are lies too, by the time you bisect some sonofabitch of a bug, the offending commit will point at something unrelated entirely that subtly breaks what you care about.

When debugging regressions, my tool of choice is "git log -p". Sometimes commit messages help, but mostly they have been irrelevant, and only the actual diff matters. At best, they only show what the developer was thinking he was doing at the time.

Sure, you can take time to make better commits, better commit messages and stuff, but by vehemently insisting on THE PROCESS and demanding documentation, an opus magnum of tests and a commit message each time, you exhibit traits of a control freak who wants to impose their OCD onto everyone else in the team. This is an area where diminishing returns kick in very quickly.

I try to do better commits when I feel they are justified. When doing bugfixes, I describe what the error was, how I got to it, and why I think the fix is going to work.

OTOH, I have learned to never demand — as long as code changes are sound, and as long as it's not so blatant as to look like a sabotage — form from my colleagues. I've learned that anyone who spends time nitpicking at the form in a PR, bickering over formatting and grammar and whatnot, is usually a shallow-minded prick with nothing to say about the content of the change. It's not worth it in the longer run.

js2 · on Oct 30, 2022

I can only speak from my own experience. In my own experience, commit messages have saved me time. Documentation has saved me time. Not having had these things has sent me down hours long rabbit holes having to figure out why something existed.

Reading the code doesn't tell me anything about why the developer did something a certain way, some important bit of context. Also in my experience, developers who don't explain anything in commit messages also don't explain anything in comments or elsewhere. So my day is spent encountering dozens of Chesterson's Fences. If those fences simply had a sign on them explaining why they were there I could more easily know whether they could be torn down.

As a recent example, our macOS build hosts were running an alternate sshd on port 2222. Why was it there? What was wrong with using the sshd on port 22? If I tear it down and it doesn't seem to break anything, should I worry whether there's some use case I hadn't thought of? This alternate sshd was started by a shell script, so there was a place to add a comment saying why it was there, but there was no comment. The shell script was kept in git, but there was no explanation on the commit message or its PR. Eventually I was able to track down that it was due to a bug in Xcode that hadn't been relevant in over 5 years. However much time that developer saved by not documenting their work (a minute? 5 minutes?), I lost more than an hour having to figure out. I'll bet that developer is still going around not documenting their work and maybe they get promoting for shipping things fast. But then they move on and I get their mess.

airstrike · on Oct 29, 2022

I feel seen

drawfloat · on Oct 30, 2022

It’s a shame you can’t collapse commits when merging a branch - generally see a the single commit for the merge, but can expand to view the commits that were collapsed if you ever need a fine grain review in the future.

Izkata · on Oct 30, 2022

  git log --first-parent

drawfloat · on Nov 2, 2022

Great tip, thanks

penguin_booze · on Oct 30, 2022

My rule of thumb is to size and structure commits so that they're worthy of cherry-picking. This is blame-friendly, too.

I find it frustrating when `git blame` tells you "review comments".

ivix · on Oct 29, 2022

When in a serverless environment it's easy to get into a situation where you need to commit to deploy and test anything. Then git squash is essential, unless you're the only developer.

J-_-_b · on Oct 29, 2022

here's 95% of all of my commit messages: "wip"

ChrisMarshallNY · on Oct 29, 2022

I will often paste a line from the CHANGELOG.md into the commit.

We used to use Perforce, which forced you to write a commit message. One of my employees used to write "asdf". It got a bit old. I didn't care that much, myself, as he was responsible for maintaining his own code, and he did fine, there.

The Japanese hated it, though, and it probably played a big part in his getting laid off. I learned a big lesson, there, and insisted on my employees being more circumspect, after that.

DangitBobby · on Oct 30, 2022

I think there are two things in tension here. VC is primarily useful to help make sure you don't lose work and can merge separate threads of work into one place. In distant second is its ability to communicate those changes being made. That's also probably happening on numerous other places, like your issue tracker, documentation, whatever. Anyway, if I'm in the middle of something that doesn't really form a complete coherent change but also represents work I don't want to lose... You might get useless commit messages.

Useful commit messages are valuable, but so is the ability to actually incrementally change and back up your code. As an aside, that's why I think blocking pushes to remote due to failing tests/linter issues/basically any reason is misguided.

ziml77 · on Oct 30, 2022

> but so is the ability to actually incrementally change and back up your code

Backing up the code is why I will end up with multiple WIP commits when working on a large change. I want to get it pushed to the remote to not risk losing large amounts of work if something happens to my local copy. It's also nice knowing that if an emergency or something keeps me away from work for some time, the work I was doing could be picked up by a coworker without needing to convince InfoSec to provide them access to poke around the drive of my PC.

seba_dos1 · on Oct 30, 2022

That's exactly what makes distributed version control systems so powerful. Your local WIP commits, even when pushed out to a remote backup branch, have generally nothing to do with what you push out for others to review and integrate into working branch. Working this way with systems like Subversion would be much harder, if not impossible.

ChrisMarshallNY · on Oct 30, 2022

Yes, that makes a lot of sense. I like to have many small commits, as opposed to one big one.

However, I’ll often defer a commit, to avoid “breaking the build.”

I’ve come to avoid branching, if at all possible, using tags, instead.

That works for me, as I generally work alone, but it does not work for a team. In that case, we need working branches, and a stable mainline.

In my experience, commit comments are almost worthless, except as “mnemonics.” I tend to use diff views, instead, to figure out commits.

andirk · on Oct 30, 2022

It's simply disrespectful to not use the tools thats main purpose is to inform others of what's going on. I've seen it before and I don't fuss about it, but it's so arrogant, like "_I_ know what's up, and that's all that matters. Hmph".

Write good commit messages. Use present tense like "Add cats to news feed.". For your "WIP", squash them together once there's something cohesive.

Next on my gripe: A single half sentence email responses!

superdug · on Oct 30, 2022

full stop ... if you want my commit to be perfect ... make a unit test I can run on my PR.

Meet my need first, and I will automatically meet yours.

immy · on Oct 30, 2022

Reformatting mixed with real changes, woooof

therealplato · on Oct 30, 2022

``` 0 lines added 1 line removed ```

kingkongjaffa · on Oct 30, 2022

as a PM how do I write the perfect feature/bug issue?

Waterluvian · on Oct 29, 2022

Raise your hand if commits are mainly just a way to save progress. It can’t just be me.

I appreciate that needs differ between projects. In some public project I’d be more disciplined. But at work I just commit like a madman and collapse them all automatically upon merge.

ISL · on Oct 29, 2022

The key world is "mainly". If things go wrong, you haven't just saved progress, but also saved history. That distinction is powerful.

Waterluvian · on Oct 29, 2022

Indeed. A clean history can be helpful. And this is certainly context specific. In my experience it hasn’t been as useful as the cost to do so, so I’ve relaxed a bit on making my commits carefully.

tkiolp4 · on Oct 29, 2022

Here. I have a colleague obsessed with the “perfect” commit, with rebase instead of fast forward merges and what not. Can’t stand it. I do care about writing a good summary commit title that includes the Jira issue number that links to a well description of the feature/issue at hand.

harryvederci · on Oct 29, 2022

For work it's different, but for my side project:

    $ git log --pretty=oneline | wc -l # Total amount of commits.
    1369
    $ git log | grep '^    \.$' | wc -l # Count commits with a message of "."
    902

cratermoon · on Oct 29, 2022

Do you also push to the central repo, on your branch? Or wait until you're at a reasonable "checkpoint" to push?

vsviridov · on Oct 29, 2022

"Link to issue for context"

Just no...

Why would you decouple the change from its context? Just add all the relevant information to the commit message itself, so it's preserved.

I follow my friend's approach which works really well.

https://yrashk.medium.com/solving-problems-one-commit-at-a-t...

Too many a times I had to do code archeology to encounter archived Jira or some old space I don't have permissions for, that was originally migrated from GitHub issues and all the ticket numbers are in complete mismatch.