docs/project/git_workflow.pod - How to use Git to work on Parrot
To minimize the disruption of feature development on language and tool developers, all major changes to Parrot core take place in a branch. Ideally, branches are short-lived, and contain the smallest set of changes possible. It is also good practice to have "atomic" commits, in the sense that each commit does one thing, so that it makes it easier to accept/revert some things while keeping others. Git provides many powerful tools in the maintenance of branches. This document aims to provide everything a Parrot developer needs to know about Git to successfully create a branch, work on it, keep it in sync with changes on master and finally merge the branch (or send a pull request).
To get the full version history of Parrot,
which is called "cloning a repository",
you will use the
git clone command.
This will show how to clone the repo from http://github.com:
git clone git://github.com/parrot/parrot.git
If you have commit access to parrot.git, then you can use the read/write SSH protocol
git clone email@example.com:parrot/parrot.git
If you are behind a firewall that will only let you do HTTP, you can use the HTTP protocol, but it is much slower and inefficient, so only use it if you must:
git clone http://github.com/parrot/parrot.git
Cloning a git repo automatically makes the URL that you cloned from your default "remote", which is called "origin". What this means is that when you want to pull in new changes or push changes out in the future, "origin" is what will be used by default.
To create a branch and check it out, use the
git checkout -b command. Please note that branches are first created locally, and then pushed to any number of remotes. A branch cannot be seen by anyone else until it is pushed to a remote.
Branches in git are very "light-weight", i.e. they take up virtually no extra space (unlike in Subversion), so please use them liberally.
Current convention is to create branch names of the form username/foo.
git checkout -b username/foo
You can also create a branch without checking it out:
git branch username/foo
To checkout (or "switch to", as it is commonly referred to) the username/foo branch:
git checkout username/foo
If you would like to checkout a branch that already exists, first make sure to get the latest commits with:
Then use this syntax to track the remote branch with a local branch:
git checkout -t origin/username/foo
If you are using a very old version of Git, such as 1.5.x.x or older, you will want to tell it to track the remote branch:
git checkout --track -b username/foo origin/username/foo
Tracking remote branches is the default in Git 1.6.x.x and higher.
If you want to push your local branch user/branch to origin:
git push origin user/branch
Tags are not pushed by default. If you want to push your tags to origin:
git push origin --tags
Since origin is the default remote, this is the same as
git push --tags
To get the latest updates:
This copies the index from your default remote (origin) and saves it to your local index. This does not effect your working copy, so it doesn't matter what branch you are on when you run this command. To sync up your working copy, you can use
git checkout master # Switch to the master branch git rebase origin/master # rebase latest fetched changes onto master
This is a common action, so there is also a simpler way to do it:
git pull --rebase
Whenever you don't specify a remote or branch to a git command, they default to the remote called "origin" and the master branch, so the previous command means exactly the same as:
git pull --rebase origin master
This is important to note when you are working with remotes other than origin, or other branches.
Let's say you modified the file foo.c and added the file bar.c and you want to commit these changes. To add these changes to the staging area (the list of stuff that will be included in your next commit):
git add foo.c bar.c
git add is not only for adding new files, just keep that in mind. It tells git "add the *current* state of these files to my staging area." If these files change after you
git add them, they will need to be added again.
If you are trying to add files that are in .gitignore, you can do that with the --force flag:
git add --force ports/foo
NOTE: Make sure these files should actually be added. Most files in .gitignore should never be added, but some, such as some files in "ports/" will need the --force flag.
Now for actually creating your commit! Since Git is a distributed version control system, committing code is separate from sending your changes to a remote server. Keep this in mind if you are used to Subversion or similar VCS's, where committing and sending changes are tied together.
git commit -m "This is an awesome and detailed commit message"
If you do not give the -m option to
git commit, it will open your $EDITOR and give you metadata which will help you write your commit message, such as which files changed and if conflicts happened. Only lines that do not begin with a # will be in your commit message.
Commit messages consist of a one line "short message" and an optional long message that can be as long as you like. There must be an empty line between the short message and the long message. It is often a good practice to keep the first line of a commit message relatively short, around 50 characters. This allows for tools to succinctly display one-line commit messages with metadata. It is good practice to describe *why* something was done in the long message, in addition to what was done.
Instead of adding files individually, you can also tell
git commit that you want all modified and deleted files to be in your next commit via the
git commit -a -m "awesome and informative commit message"
Be careful with
git commit -a, it could add files that you do not mean to include. Verify that the contents of your most recent commit look sane with:
Reading the man page of
git commit with
git help commit is also a good idea.
To update your local git index from origin:
If you are following multiple remotes, you can fetch updates from all of them with:
git fetch --all
Note that this command requires a recent (1.7.x.x) version of git.
git fetch only modifies the index, not your working copy or staging area. To update your working copy of the branch bobby/tables:
git checkout bobby/tables # make sure we are on the branch git rebase origin/bobby/tables # get the latest sql injections
If you have a topic branch and want to pick up the most recent changes in master since the topic branch diverged, you can merge the master branch into the topic branch. In this case, we assume the topic branch is called parrot/beak:
git checkout parrot/beak # make sure we are on the branch git merge master # merge master into parrot/beak
This same pattern can be followed for merging any branch into another branch.
Post to firstname.lastname@example.org letting people know that you're about to merge a branch. Follow the guidelines in "project/merge_review_guidelines.pod" in docs, especially:
If someone has sent the Parrot Github Organization a pull request, life is a bit easier now. If pull request #123 has been sent, then type:
perl tools/dev/merge_pull_request.pl 123
and you will automatically be on a branch called pull_request_123 with all commits in the pull request applied as individually signed-off commits. Now you can review the code, run tests, etc and vet the code. You can even type
git checkout -b way_cooler_branch_name
if you want a more informative branch name than the autogenerated one.
If you want to merge this code to master, you then type
git checkout master git merge --no-ff pull_request_123
Don't forget to close the pull request manually, since signing off on the commits changes their SHA1s, which means Github can't detect the merge and autoclose the pull request. That's it!
When you're ready to merge your changes back into master, use the
git merge command. First, make sure you are on the master branch with:
git checkout master
Now, to merge the branch user/foo into master:
git merge --no-ff user/foo
If the same part of the same file has been modified in master and the branch user/foo, you will get a conflict. If you get a conflict, then you need to edit each file with a conflict, fix the conflict, delete the conflict markers,
git add each conflicted file and then finally commit the result. You may find it useful to type
git commit with no commit message argument (-m), so your $EDITOR will be opened and a default commit message of "Merged branch user/foo into master" with a list of conflicted files will be present, which can be left alone or additional information can be added.
At this point you should run the test suite on the merged branch, and fix any failures if possible. If these problems cannot be resolved, then the results of the merge should not be pushed back to master on origin. You may want to push a topic branch to allow other people to help:
git checkout -b user/foo_in_master git push origin user/foo_in_master
You can now mention the problems you ran into to parrot-dev or #parrot and mention the branch.
Why use "--no-ff" ? This flag mean "no fast forwards". Usually fast forwards are good, but if there is a branch that has all the commits of master, plus a few more, when you merge without --no-ff, there will be no merge commit. Git is smart enough to "fast forward." But for the purpose of looking at history, Parrot would like to always have a merge commit for a merge, even if it *could* be fast-forwarded.
This flag is only needed when merging branches into master. It is optional for other merge cases, but it won't hurt. If it is simpler to think about, you can always use "--no-ff" when merging.
If you fork Parrot on github, you will have a copy of the entire history of Parrot, which you can sync with the main parrot git repo as often as you want.
Don't Commit To Master In Your Fork
Create a branch (with git checkout -b branch) and do any development in that.
Next, you will add the main parrot.git repo as a remote called "upstream"
git remote add upstream git://github.com/parrot/parrot.git
You may call this remote anything you like, but for the purposes of this document, we will always refer to is as "upstream".
We now want to get a copy of the index from upstream. If you are using a recent git (1.7.x.x or higher) you can update your copy of the index of all remotes with
git fetch --all
If you have an elderly git, you can give the fetch command the name of the remote to fetch:
git fetch upstream
Now you need to update your working copy, and specifically, the master branch, so make sure you are on the master branch of your fork
git checkout master
and then sync up with upstream master
git rebase upstream/master
There is no possibility of conflicts here *if you didn't commit to master*. If you get conflicts at this step, you are doing something wrong. Stop and ask someone for help.
It is in your best interest to sync up your master branch with upstream before creating new branches from master, since that will minimize the possibilities of conflicts when merging the branch back to master later on.
Don't Rebase Branches Against Each Other
For instance, if you are on branch 'foo' and you do
git rebase upstream/master
you will be entering a world of pain. Don't do it. If you are smart enough to know when it is OK to do this, you don't need this document.
This workflow to maintain a fork will make submitting pull requests "Just Work" and if you plan to submit pull requests, this workflow is required. If you don't want to submit pull requests, do whatever ye will.
NOTE: This is not official policy, and could change at any time.
ALSO NOTE: With the introduction of the magical Github Merge Button, doesn't this seem like a ridiculous amount of work to accept code from a fork? This workflow is still useful if you want to run tests or do any other analysis on the pull request.
If you are merging code from non-core committers, Parrot would like to see "Signed-off-by" lines on each commit, so it is clear that one person wrote the code and someone else is taking responsibility for merging the code.
To merge in a pull request on Github, there is the easy way or the hard way. The easy way is to use tools/dev/merge_pull_request.pl like so:
perl $PARROT/tools/dev/merge_pull_request.pl N
where N is the number of the pull request. This will create a branch for the pull request and plop you on it, so that tests or other analyses can be run.
This script can be used to merge a pull request for any repo in the Parrot Github Organization. For instance, to merge Cardinal Pull request #4:
perl $PARROT/tools/dev/merge_pull_request.pl 4 cardinal
If you do not want the default branch name, you can specify a third argument to specify one:
perl $PARROT/tools/dev/merge_pull_request.pl 4 cardinal awesome_feature
This will merge pull request #4 in the cardinal repo into a branch called 'awesome_feature'.
perldoc tools/dev/merge_pull_request.pl for more info.
The hard way: Follow the first two steps on the Github pull request page, i.e:
# 1) Make a new branch to pull stuff into git checkout -b someguy-newbranch master # 2) Pull in changes from the branch of a forked repo # and switch to newbranch git pull https://github.com/someguy/parrot.git newbranch
Then you will follow this procedure:
# 3) let's look at a diff OPTIONAL git diff master # look at a diff # 4) let's make a patch of the difference between this branch and master git format-patch master --stdout > foo.patch # 5) Switch back to the master branch git checkout master # 6) does the patch apply? OPTIONAL git apply --check gci.patch # 7) Actually merge in the changes with Signed-Off-By lines git am --signoff gci.patch
After doing this, you will the new commits in master with your Signed-Off-By lines. Run the tests and if they all pass, push your changes upstream. If they don't, and you want others to take a look at stuff, create a new branch from your local master:
# make a new branch git checkout -b my_new_branch # push it upstream git push origin my_new_branch
At this point, your local master has commits that are now in my_new_branch, so you should reset master back to before you applied the patch. You can do this by looking at the output of
git log and finding the most recent SHA1 before your branch, then doing:
git reset --hard $sha1
Be careful with
git reset! Make sure there are no untracked files that you care about BEFORE YOUR RUN
The use of reset can be avoided if you create a new branch before applying the patch, and then merging that branch to master before pushing.
Also note that you will have to manually close the Github pull request, since adding Signed-Off-By lines changes the SHA1's which Github uses to automatically close pull requests.
Send a message to email@example.com letting people know that your branch has been merged. Include a detailed list of changes made in the branch (you may want to keep this list as you work). Particularly note any added, removed, or changed opcodes, changes to PIR syntax or conventions, and changes in the C interface.
If there was a specific language, tool, or platform that you wanted tested before merging but couldn't get any response from the responsible person, you may want to include some warning in the announcement that you weren't able to test that piece fully.
After merging a branch, you will have a local copy of the merged branch, as well as a copy of the branch on your remote. To remove the local branch:
git branch -d user/foo
To remove the remote branch user/foo on the remote origin:
git push origin :user/foo
This follows the general pattern of
git push origin local_branch_name:remote_branch_name
When local_branch_name is empty, you are pushing "nothing" to the remote branch, which deletes it.
To see a list of all remote branches that are not merged into branch foo
git branch -r --no-merged foo
If you leave out the last argument, it defaults to HEAD (the tip of the current branch). So, to see a list of all remote branches that have not been merged into master:
git checkout master git branch -r --no-merged