So, I'm reaching out for polite suggestions on an ongoing issue at my "work". I'm currently an Undergraduate student - seeking Master's - and I work with many other Graduate students on this relatively large research project.
The problem lies in getting some cooperation from others. I've used GitHub for almost five years, across every platform out there. I know quite a bit about its ins and outs, and I've come to rely on version control and commit history. The thing is, all of my coworkers, and one in particular (the one who happens to be doing a lot of the work right now, though I did lay down the backbone of the code that he's working on), practically refuse to use it.
This one individual has adopted what I call an "unplanned repository". He has several (approximately 8) folders of our repository he downloaded straight from GitHub, unzipped, and began working on. He hasn't used GitHub at all on these folders, and at this point he's keeping them all as separate "versions" with names that only he could distinguish from one another (stuff such as "(repo name)-REC-ONE", "(repo name)-WITH-HTTP", etc.).
I've politely (and recently, not so politely but in a joking manner) told him that he should really consider using the GitHub repository I took so much time in preparing with separate branches for our varying "working stages". His most recent excuse was "well, last time we did that you started working on my branch". That's true, but it was one line of code: I noticed immediately, switched branches, and discarded my changes, so nothing was ever committed to his branch.
So, my question is: how would you suggest I practically force him to use it? It's becoming literally unbearable, and other professionals in the field have already pointed out to us that our "version control" is negligent, which I totally agree with, though it's not my fault. The hardest thing is that everyone has been at this work for longer than I have, but I've also somewhat been put in main control of this project. I don't want to overstep my bounds or burn any bridges, but this just has to stop. Any suggestions? Do we just have to wait for something catastrophic to happen to convince him to use version control?
Top comments (3)
Could he be command-line-phobic? Do you use an editor with git integration or could he use one of the GUI clients?
The best way to enforce git usage is to create scripts that deploy from github to production. Nothing can get to the users without passing through these scripts.
Have you tried pair-programming with him? If you show him version control in action, including jumping between branches, looking back through logs, and perhaps an "oops" followed by rolling back to a good version, there's a good chance he'll see the benefits it would bring him.
If he is worried that people will tread on each other's branches, then maybe it would be advantageous for him to have a separate GitHub fork of the project and collaborate via pull requests, like in the open-source world. That way, he will have full control over what is allowed in his version of a branch.
It's notoriously difficult to change the workflow of researchers, even when the change is widely acknowledged as being an improvement by others.
So, there are three ideas that may be helpful to consider:
You want to make the new workflow as easy as possible to adopt. Putting in extra unautomated steps, even if beneficial in the long run, will make it more difficult to convince people to adopt it. One way to reduce the cost of extra steps is to add things that make people feel positive about them. Automated testing (regression testing, doctesting, and where possible unit testing) is a good example here: if you're not using Travis or something like it already, try to start, and enable it for branches and same-repo PRs as well. Also try to make the documented usage steps include the git steps where possible.
To a certain extent, you can raise the flag of Research Integrity. It's perfectly reasonable in a research environment to make useful tooling that essentially requires you to be working with a version that knows where it came from (e.g. by git ref), and that fails really noisily if it doesn't. (How you do this will depend on the working language, obviously.) Carry that metadata through into all outputs. Point out when any outputs don't include metadata. This may mean pointing at your colleague's output and saying, "Well, how do I know what code this was made with? How could I reproduce this?"
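As a rough illustration of that kind of tooling, here is a minimal Python sketch. It assumes the project is run from a git checkout and that the `git` command is on the PATH; the names `ProvenanceError`, `require_provenance`, and `write_with_provenance` are made up for this example and are not from any library. It refuses to run outside a clean checkout and stamps each output file with the commit that produced it.

```python
import subprocess
from pathlib import Path


class ProvenanceError(RuntimeError):
    """Raised when the code cannot say which commit it came from."""


def _git(args, repo_dir):
    """Run a git command in repo_dir and return its stripped stdout."""
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        # Covers: git not installed, not a repository, no commits yet.
        raise ProvenanceError(
            f"{repo_dir} is not a usable git checkout; refusing to run"
        ) from exc
    return result.stdout.strip()


def require_provenance(repo_dir="."):
    """Return the current commit hash, or fail loudly.

    Fails if repo_dir is not inside a git repository or if the
    working tree has uncommitted changes.
    """
    commit = _git(["rev-parse", "HEAD"], repo_dir)
    dirty = _git(["status", "--porcelain"], repo_dir)
    if dirty:
        raise ProvenanceError(
            "uncommitted changes present; commit them before producing results"
        )
    return commit


def write_with_provenance(path, body, commit):
    """Write an output file whose first line records the producing commit."""
    Path(path).write_text(f"# produced by commit {commit}\n{body}")
```

An analysis script would call `require_provenance()` first and pass the returned hash to every output it writes. An unzipped download has no `.git` directory, so it fails at that first call, which is the point.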
On GitHub, you can set up a repo such that it won't accept pushes or un-reviewed PRs to specific branches (see Protected Branches under the Settings tab). You could set this up on your "production" branches and your main/master branch -- the ones you use to produce data for papers and the like, and the ones you intend to keep long-term. (Versions used for production should probably also be tagged.)
Keeping code manageable and useful long-term in academia is a real problem. It's easy for researchers to work alone and untidily on code, and for it to become a spaghettified mess that is more and more painful for successive grad students to modify. Code review and various automated tools around it can help, and can often fit into frequently-adopted research group meeting practices. Do read up on good code review practices before trying this for the first time, though: it's really easy for code review to become an antagonistic and hostile affair, and some people will take it this way no matter what you do.
One minor language thing: git is not GitHub. Using git doesn't automatically mean using GitHub, and, to a lesser extent, vice-versa.
Also, if you're adding in doctesting, bear in mind it's not a great idea to make it your main form of testing: it's really easy to make it really unwieldy and it'll get in the way. Also, edge cases, such as you'd want to test, often don't make great examples, and doctesting is best for showing examples.
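To make that split concrete, here is a small Python sketch. The function `mean` and the test name are invented for illustration. The docstring carries one readable example, which doctest can check, while the edge case lives in an ordinary test function so it doesn't clutter the documentation.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    >>> mean([1, 2, 3, 4])
    2.5
    """
    if not values:
        raise ValueError("mean() of an empty sequence")
    return sum(values) / len(values)


def test_mean_rejects_empty_input():
    """Edge case: kept out of the docstring, where it would bury the example."""
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Running `python -m doctest -v` on the file checks the docstring example; a test runner such as pytest would pick up the `test_` function.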