"Ideal" design and workflow for VCS

eekee · Post by **eekee** » Wed Jun 12, 2019 8:10 am

Solar wrote:Happens for me as well. The trick is to discipline yourself and...

a) Work on one issue at a time only. Don't let the scope creep, and start by thinking about "what will this look like as a change set, and what would be the comment?".

b) As soon as you resolved that issue, commit. Do not continue to add other tidbits to the same change set. You've reached a point of stability; snapshot it, like you would in a game.

As long as you follow those two rules, it doesn't matter if "work in progress" sits in your working directory for a while. (At least, not until someone else touches the same pieces of code and you have to start merging changes.)

Reexamining these rules is proving to be interesting! I'm actually very good at working on one issue at a time, except... except what?

What prevents me from looking at what I previously did before resuming work? The fact that ideas come to mind when I'm away from the computer. Can I mentally save them for later or throw them away? Not really, many are major clarifications of concept or development of intermediate goals. I imagine you have much less of this if you're working to standards or a preexisting design, or if you just select goals from things you already understand. I have very little code at this stage, it's mostly prose and bullet points. I have code for a text editor which is helping show up what's good and bad in the design, but other than that it's all slowly crystalizing ideas. They are crystalizing, there is improvement and greater detail over time. Sometimes I forget something important and write down a lot of infeasible plans which have to be reverted, so history would be useful.

So... is there any reason I can't commit my prose? Coming up with a commit message could be painful: I've just worked very hard to distil my ideas, and now I have to condense them again? A vague commit message may be better than none: I could produce accurate commit messages when I can, vague when I can't.

Now, when do I commit? I do reach points where I'm satisfied; I could commit then. It's harder if I hit a fatigue wall, walk away, and then come back 3 days or 3 weeks later thinking about a whole different topic which I want to write down before I get confused. I can't pick over my old changes. What if I discipline myself to commit when I get up in the morning? Well, some days I'll be too fatigued. On some of those fatigued days I'm going to try to pick over my changes "to produce a good commit" without the ability to think clearly, which will make things far worse than if I wasn't committing. If it's auto-committed or versioned or dumped during the night, that can't happen.

To summarize:

- I've already got the discipline to work on one thing at a time.
- With my health, manually committing carries a significant potential for disaster.

If I want history (and I do), versioning or automatic commits seems to be the only way to go. It will "somewhat muddy the waters around b)," but that can be mitigated by committing manually when I can.
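A minimal sketch of the "dumped during the night" idea, assuming a notes directory and a separate snapshot directory (both names are hypothetical, and the snapshot directory is assumed to live outside the notes directory). It is not a VCS, just a way to keep history without having to write a commit message: each file is copied only when its content has changed since the last run.

```python
import hashlib
import shutil
import time
from pathlib import Path


def snapshot_changed(notes_dir, snap_dir):
    """Copy every file under notes_dir whose content changed since the last run.

    Returns the list of snapshot files written by this call.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    saved = []
    for src in sorted(p for p in Path(notes_dir).rglob("*") if p.is_file()):
        rel = src.relative_to(notes_dir)
        digest = hashlib.sha256(src.read_bytes()).hexdigest()
        # One marker file per source file remembers the last snapshotted hash.
        marker = Path(snap_dir) / rel.parent / (rel.name + ".sha256")
        if marker.exists() and marker.read_text() == digest:
            continue  # unchanged since the last snapshot
        # Timestamp plus a hash prefix keeps snapshot names unique.
        dest = Path(snap_dir) / rel.parent / f"{rel.name}.{stamp}.{digest[:8]}"
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        marker.write_text(digest)
        saved.append(dest)
    return saved
```

Run from cron or any other scheduler, something like this would leave a trail of dated copies to fall back on, while manual commits with proper messages remain possible on the good days.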

Now I've finished this, I remember I worked all this out before.

I suppose it's good to reexamine one's convictions every once in a while. I often come to stronger conclusions after a challenge. On that note, I'm off to reexamine some of my actual religious convictions.

Korona · Post by **Korona** » Wed Jun 12, 2019 10:09 am

eekee wrote:What prevents me from looking at what I previously did before resuming work? The fact that ideas come to mind when I'm away from the computer. Can I mentally save them for later or throw them away? Not really, many are major clarifications of concept or development of intermediate goals. I imagine you have much less of this if you're working to standards or a preexisting design, or if you just select goals from things you already understand. I have very little code at this stage, it's mostly prose and bullet points. I have code for a text editor which is helping show up what's good and bad in the design, but other than that it's all slowly crystalizing ideas. They are crystalizing, there is improvement and greater detail over time. Sometimes I forget something important and write down a lot of infeasible plans which have to be reverted, so history would be useful.

To be honest, that sounds just disorganized. I am sure we all (or at least most of us) have a lot of ideas while we take breaks. The key thing here is organizing them well so that you can pick them up again when it's time to tackle them. I often use plain old paper for that. Or GitHub issues. Or Google Keep lists. Or Kanban boards. Or I draw my ideas on a whiteboard. I could imagine that it's possible to use VCS or versioned files efficiently for the same tasks. But this is really an organizational and not a tooling issue. IMHO, formulating ideas in a clear and concise way is a really important skill, not only for programming but in life as a whole.

eekee wrote:Now, when do I commit? I do reach points where I'm satisfied; I could commit then. It's harder if I hit a fatigue wall, walk away, and then come back 3 days or 3 weeks later thinking about a whole different topic which I want to write down before I get confused. I can't pick over my old changes. What if I discipline myself to commit when I get up in the morning? Well, some days I'll be too fatigued. On some of those fatigued days I'm going to try to pick over my changes "to produce a good commit" without the ability to think clearly, which will make things far worse than if I wasn't committing. If it's auto-committed or versioned or dumped during the night, that can't happen.

I fully agree with what Solar said about planning your commits. But if you come back after 3 weeks to uncommitted changes, it's always good to remember that git gives you a vast amount of tools to deal with the situation. You could `git stash` your changes and focus on something else. You can `git stash -p` them if you want to retain most changes for now. You could break them down to sensible commits using `git add -p` or `git add -i`. You can reorganize existing (non-upstreamed) commits using `git rebase -i`. `git commit --fixup` / `git rebase --autosquash` comes to mind. Or even use `git cherry-pick` or `git merge --squash` to a different branch if your patch set is getting messy. `git checkout -p` sometimes also works nicely for that.¹

Your patch set does not necessarily have to be clean at commit time. But it's worth taking the time to make sure that it is clean at push time.

¹As an example: I recently fixed the compilation of my userspace drivers under Clang. This was done as a giant WIP commit at first (because I didn't know what error Clang would throw next). There were around 500 SLOC of changes over around 2 weeks. In the end, I broke it down into 17 sensible and self-contained commits before pushing it.

Solar · Post by **Solar** » Thu Jun 13, 2019 2:11 am

eekee wrote:What prevents me from looking at what I previously did before resuming work? The fact that ideas come to mind when I'm away from the computer. Can I mentally save them for later or throw them away? Not really, many are major clarifications of concept or development of intermediate goals. I imagine you have much less of this if you're working to standards or a preexisting design, or if you just select goals from things you already understand.

Not really.

Right now I have many different places "under construction" in my PDCLib. I started working on locales (because it's one of the few non-math headers still to be done for C99 support, and the underpinning for wide and multibyte support). I then realized that I will need, basically, full Unicode support to "do it right", so I shelved locales and started working on Unicode. I got tangled up there and shelved the whole project for a while. Then I got a downstream request for thread support, so I started working on that, and finished it to about 98% -- I need some more tests, and some things (like the handling of errno) still need to be adapted to utilize the thread support. I also realized I got a couple of bugs in freopen(). Then I got a downstream request to implement a thread-safe strtok(), so I had a look into the bounds-checking interfaces...

But that does not mean I couldn't have single-purpose "sprints" and commits. I don't mix threading code and bounds-checking code in one commit. I work on one of those things at a time. If I have an idea I want to preserve, I add a comment to a file and commit that, or I add some prose to the project website so I will be reminded later. But my point remains: Each commit should be about one thing.

This is my commit log. The main purpose of commit comments is that, when looking at that log, I can quickly find that half-remembered commit that (for example) "removed that test artifact", because I want to look at it again. (Bad example but I wanted to pick something that's on the first page of the log.)

You will see there's a lot of intermediate, patch-y stuff in there, because right now I am working from my smartphone more often than from my desktop, which makes for lots of "the train will arrive in five minutes, I have to wrap this up for now" type of commits. That's not a bad thing; a commit is not a release. The only reason not to commit changes just yet is if they break compilation. Committed code should always compile (even if it might not pass its tests yet).
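The "committed code should always compile" rule can be checked mechanically. The following is only a sketch under assumptions: `build_succeeds` is a hypothetical helper, and the build command is whatever the project uses (it is not specified in this thread).

```python
import subprocess
import sys


def build_succeeds(command):
    """Run the build command and report whether it exited with status 0."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return False  # build tool not installed: treat as a failed build
    return result.returncode == 0
```

With git, for instance, a script that ends in `sys.exit(0 if build_succeeds(["make"]) else 1)` could be saved as the executable file `.git/hooks/pre-commit`, since git aborts the commit when that hook exits non-zero. Here `make` stands in for the real build command. Other systems would need the check wired in differently, for example by a small wrapper that builds first and commits afterwards.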

eekee wrote:I have very little code at this stage, it's mostly prose and bullet points.

Concept work and similar shouldn't be held in the code repository. Set up a project wiki -- you get much the same functionality as from a VCS (versioning, safekeeping, history, accessibility), and you don't get your working directory cluttered up by loads of `*.txt`. Consider your source repo to be about "what a client will see"; that shouldn't include concepts, temporary work, or experiments.

Korona wrote:...if you come back after 3 weeks to uncommitted changes, it's always good to remember that git gives you a vast amount of tools to deal with the situation. You could `git stash` your changes and focus on something else. You can `git stash -p` them if you want to retain most changes for now. You could break them down to sensible commits using `git add -p` or `git add -i`. You can reorganize existing (non-upstreamed) commits using `git rebase -i`. `git commit --fixup` / `git rebase --autosquash` comes to mind. Or even use `git cherry-pick` or `git merge --squash` to a different branch if your patch set is getting messy. `git checkout -p` sometimes also works nicely for that.

And right here, boys and girls, we can see the major reason I advise against git for small projects, especially one-man projects. I mean, look at all that! It's nice that git offers it all, and it sure can be helpful if working as a repo maintainer for a huge code base with dozens of developers contributing. I will also freely admit that git is the better choice if you're working offline on a regular basis.

But for one man and his hobby project, it just gets in the way too often. The "vast amount of tools" needs to be learned, remembered, and applied with caution because git does make it possible to make a huge hash of everything (no pun intended). It's very easy to lose track of things you "stashed", rebasing has the potential of really screwing with your history, and generally speaking git's interface is a mess.

Especially when you have problems keeping yourself organised and focussed in the first place. Git is a bunch of tools that enable, but don't really help with, your organization. (As exemplified by the various "workflows" that are out there, hotly debated for their pros and cons.) Compared to that, SVN has a very clear and concise "modus operandi", and a well-documented and consistent interface.

eekee · Post by **eekee** » Thu Jun 13, 2019 7:18 am

Korona wrote:To be honest, that sounds just disorganized. I am sure we all (or at least most of us) have a lot of ideas while we take breaks. The key thing here is organizing them well so that you can pick them up again when it's time to tackle them.

I would be more organized if I wasn't trying to do quite what I am trying to do.

"We all" (including myself) pick ideas and goals for our projects from existing systems and from research we understand. Some of us toss in something all-new or combine things in new ways. It's in something all-new that my problem lies, because it lies right at the heart of how my components will fit together and how the user will think about them. I find myself with a chicken-and-egg problem: I can't have clear ideas without clear goals, and I can't make clear goals without first having clear ideas. It's not entirely chicken-and-egg; ideas are slowly being filtered down for practicality, but getting past this stage is a slow process. I don't want to code too much only to have to scrap it, but maybe I should; it's an experiment. I've been doing a lot of thought experiments, but I have to admit I'm not good enough at them to tackle this issue well. (I'm also wondering if I should try a Canon Cat emulator as another source of ideas and goals, but I don't want to get bogged down in something which is not exactly what I want. Also, getting further off topic.)

Solar wrote:Concept work and similar shouldn't be held in the code repository. Set up a project wiki -- you get much the same functionality as from a VCS (versioning, safekeeping, history, accessibility), and you don't get your working directory cluttered up by loads of `*.txt`. Consider your source repo to be about "what a client will see"; that shouldn't include concepts, temporary work, or experiments.

I just came to the same conclusion, yes.

Solar wrote:And right here, boys and girls, we can see the major reason I advise against git for small projects, especially one-man projects. I mean, look at all that! It's nice that git offers it all, and it sure can be helpful if working as a repo maintainer for a huge code base with dozens of developers contributing. I will also freely admit that git is the better choice if you're working offline on a regular basis.

But for one man and his hobby project, it just gets in the way too often. The "vast amount of tools" needs to be learned, remembered, and applied with caution because git does make it possible to make a huge hash of everything (no pun intended). It's very easy to lose track of things you "stashed", rebasing has the potential of really screwing with your history, and generally speaking git's interface is a mess.

Especially when you have problems keeping yourself organised and focussed in the first place. Git is a bunch of tools that enable, but don't really help with, your organization. (As exemplified by the various "workflows" that are out there, hotly debated for their pros and cons.) Compared to that, SVN has a very clear and concise "modus operandi", and a well-documented and consistent interface.

I'm happy to see someone else expressing these opinions.

I made an effort to look at Korona's suggestions but for example found stash to be the opposite of what I need, and the potential for losing stashes was obvious. I didn't get as far as looking at rebase, but if it can screw up the history, I certainly don't want to use it on a regular basis.

Um... I think I had more to say, but I've completely run out of focus. I could certainly find more to say about Git, including "I want to look into Mercurial," but I don't want to put this post into my queue of half-finished things to do. (I've learned to keep this queue very short. The one big project may seem to be an exception, but it's a low-priority queue which is a single entry on the short high-priority queue.)

Has anyone used Fossil?

eekee · Post by **eekee** » Tue Jun 18, 2019 4:07 am

I decided to lower my sights somewhat; I can get close to what I want by picking designs from other systems. I've still got some wild ideas, but I'm no longer making them central to the design, so I can just drop them if they get to be too much.

Getting somewhat back on topic, I'm trying Fossil. It's one binary which provides all the command-line VCS functions and a web server with web-VCS, wiki and ticket systems too -- everything I need in a single download with trivial installation. It's a 'cathedral' type project, which gives me hope the interface will be more consistent than 'bazaar'-type Git.

My first impression of Fossil as a VCS is that it's less confusing than Git. There are fewer commands with fewer options. I'm not really qualified to offer an opinion between different VCSs, but perhaps this Fossil versus Git page may be enlightening. One entry in the little table got my attention: Git "Remembers what you should have done", Fossil "Remembers what you actually did". I have no idea what that means!

I'm making much more use of Fossil's wiki now, as recommended. It's quite simple, with no categories or even 'reason for edit' messages. I like that it retains deleted pages. I'm not quite happy with the markup languages: one doesn't make nested indents easy while the other doesn't make linking as easy as I'd like. I am happy with the ease of creation of new pages which a wiki offers. Perhaps with more pages I'll make less use of nested indents. Overall, I'm happier with it than I was with the Acme text editor which I wrote about a while ago.

(Acme makes cross-linking and page creation easy, but its window system means lots of small pages become a bit fiddly. I held back from page creation, wrote long pages, and lost stuff in the lower parts of pages. Plus, of course, it requires a versioning file system to approach wiki functionality.)

Edit: re-reading that Fossil versus Git page, I think I'm probably going to appreciate "The ease with which check-ins can be located and queried in Fossil [which] has resulted in a huge variety of reports and status screens that show project state in ways that help developers maintain enhanced awareness and comprehension and avoid errors." And I learned what that table entry means; it's to do with the deliberate omissions of rebasing and hidden branches. I'm not sure I entirely agree with those choices, but it won't matter to me because I don't want to run a project with many contributors. (My ego does, but my senses of self-preservation and modesty know better than to try.)

Solar · Post by **Solar** » Wed Jun 19, 2019 5:49 am

eekee wrote:Getting somewhat back on topic, I'm trying Fossil. It's one binary which provides all the command-line VCS functions and a web server with web-VCS, wiki and ticket systems too -- everything I need in a single download with trivial installation. It's a 'cathedral' type project, which gives me hope the interface will be more consistent than 'bazaar'-type Git.

FWIW, Trac is a one-stop solution web-VCS / wiki / ticket system, working with either SVN or git as the VCS backend. If you're ever looking to "if only I could have done the Fossil thing with SVN or git", Trac's your answer. (Although I can only vouch for the SVN backend, I didn't work with git yet when I gave Trac a try -- that was pre-1.0 IIRC, and I was positively impressed even then.)

eekee · Post by **eekee** » Wed Jun 26, 2019 9:17 am

Solar wrote:FWIW, Trac is a one-stop solution web-VCS / wiki / ticket system, working with either SVN or git as the VCS backend. If you're ever looking to "if only I could have done the Fossil thing with SVN or git", Trac's your answer. (Although I can only vouch for the SVN backend, I didn't work with git yet when I gave Trac a try -- that was pre-1.0 IIRC, and I was positively impressed even then.)

That is cool. It would solve the one regret I have: there are barely any free, administered Fossil hosting services I could use for low-effort off-site backup.

There's certainly nothing on the scale of GitHub which I expect to last... although Google Code was just as large when it shut down, wasn't it? In any case, there is one known Fossil-specific host, Chisel, and Fossil can readily be hosted anywhere that allows CGI, including SourceForge.

lmemsm · Post by **lmemsm** » Mon Jul 29, 2019 8:17 am

glauxosdever wrote:Hi,
Since git won't (probably) run on my OS and I don't want to be dependent on another OS for hosting and building my OS eventually, I decided to design and implement a Version Control System that will run on Linux and other UNIX-like OSes for now, but will be easy to reimplement for my own OS. I also know there is no "ideal" design or workflow for anything, but maybe it can be better in some respects than in existing VCSs. So I'd like to get some input from you on this matter.

Git is a nuisance to port. However, there are some implementations of git that might be easier to port to other platforms. There's libgit2 and git9 from Plan 9 ( https://github.com/oridb/git9 ). I looked at the source code for several version control systems. SCCS and RCS are older systems that are fairly easy to port. They did have some drawbacks compared to more modern version control systems though. I seem to remember SCCS having some issue with handling dates and needing patches to continue to work. Of the more modern version control systems, the one I found most portable was Fossil, by the creator of SQLite ( https://www.fossil-scm.org/home/doc/tru ... index.wiki ). If you don't want to reinvent the wheel, you may want to take a look at it. I would definitely recommend you look at the code and features of some of these systems before deciding to write your own. See what works for you and what doesn't. You may be able to fix something that's already out there and use it for your needs rather than starting from scratch.

If you are going to write your own, then two important pieces would be the ability to create differences (similar to diff) and the ability to reintegrate those differences (similar to patch). It's interesting just to consider how to design a program like diff. Many versions of diff (and I believe the BSD diff currently in use) use longest common subsequence and run in O(n log n) time (worst case n^2 log n). The later GNU versions of diff use an algorithm by Myers that runs in O(ND) time, where N is the combined length of the two inputs and D is the size of the shortest edit script between them. There was an open project at the BSD web site to switch the BSD diff implementation to Myers' algorithm. There are trade-offs between speed/efficiency and memory usage. So before you implement a version control system, you may want to look into how you'd implement the pieces used by a version control system (such as diff functionality, patch functionality, etc.). There are some cases where I just use diff and patch tools in place of an entire version control system and that works well enough for those particular scenarios. If you'd like to discuss design/implementations of diff tools further, let me know. I found (and modified) a BSD licensed version of patch that I'm happy with. Haven't yet settled on a diff tool implementation that I really like.
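To make the two pieces concrete, here is a small sketch of a line-based diff and the matching patch step. It uses the plain dynamic-programming longest-common-subsequence table, which takes O(n·m) time and memory. It is neither the faster LCS variant used by traditional diff nor Myers' algorithm; it was picked only because it is short. The function names and the edit-script format (a list of `(op, line)` pairs) are made up for this illustration.

```python
def lcs_table(a, b):
    """table[i][j] is the length of the LCS of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def diff(a, b):
    """Return an edit script turning a into b.

    Each entry is (op, line): ' ' keeps a line, '-' deletes it, '+' inserts it.
    """
    table = lcs_table(a, b)
    script = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    # Whatever is left over on either side is pure deletion or insertion.
    script.extend(("-", line) for line in a[i:])
    script.extend(("+", line) for line in b[j:])
    return script


def patch(a, script):
    """Apply an edit script to a, verifying kept and deleted lines match."""
    out = []
    i = 0
    for op, line in script:
        if op == "+":
            out.append(line)
            continue
        if i >= len(a) or a[i] != line:
            raise ValueError(f"patch does not apply at line {i + 1}")
        if op == " ":
            out.append(line)
        i += 1
    if i != len(a):
        raise ValueError("patch ends before the end of the input")
    return out
```

For example, `patch(old, diff(old, new))` gives back `new` for any two lists of lines. A real tool would also emit only the changed hunks with a little context instead of the whole file, and would need a cheaper algorithm for large inputs, which is where the speed and memory trade-offs come in.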

OSDev.org