--- title: "git4r-demo" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{git4r-demo} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## An introduction to using Git to change-control projects with R Git is ideal for tracking changes between text-like files, such as code, static csv lookup tables, your to-do list, etc. Other 'binary' files (like .docx or .jpeg) can be included too but a hidden copy will be saved every time it is changed (in `.git/`) which can get quite bloated. In this demo, we will create a temporary folder of pseudo-code and see how we can use git to keep parallel versions, view, and restore history. ``` > library(git4r) ``` ``` > myproject = tempfile(pattern = 'test-git-project-') > dir.create(myproject) > setwd(myproject) > # Make some pseudo files for our repository > for(my_code_files in LETTERS[1:5]) write(sample(letters,10), file=my_code_files) ``` ### Selecting files and adding them to a commit We can create a checkpoint (a *commit*) whenever we want which will save the contents of any files we choose to *add*. We can then bring back this version of the file any time in the future, and see the short message describing the change. Normally we want to add all files to our commit at once (backup everything) but you might want to split the changes in two, for example add one file with its own commit message, and an unrelated change in another. You will be asked first whether to convert the normal directory to a git repository -- always make sure you are in the right working directory so you don't start git-tracking your entire home directory! ``` > git_add() test-git-project-40a3cbf1356 directory is not inside a git repository - check this is the top level else ESCAPE and setwd() Should this directory be turned into a git repo? (Y/N) Y Copying default .gitignore to working directory test-git-project-40a3cbf1356 is now a git repo Modified files to be added (+ created, - deleted, * changed since added, ? conflict) 1 + .gitignore 2 + A 3 + B 4 + C 5 + D 6 + E Which file numbers to add? (Hit ENTER to add all non-conflicting, else ESCAPE) 1 2 3 4 Adding 4 file(s) ``` After adding a couple of files, we will re-run `git_add()` and see that we are given the option to remove any we regret. This time we add ALL files by hitting enter. ``` > git_add() Changes staged already 1 + .gitignore 2 + A 3 + B 4 + C Any file numbers to un-add? (Hit ENTER to keep all) Modified files to be added (+ created, - deleted, * changed since added, ? conflict) 1 + D 2 + E Which file numbers to add? (Hit ENTER to add all non-conflicting, else ESCAPE) Adding 2 file(s) ``` We can now commit all of these changes -- our first checkpoint ``` > git_commit() Using global (system default) identity: myname Commit message: My first commit Commit to master? (hit ESCAPE to cancel) Done ``` If we now make a whole bunch of changes, we will be able to see the new and old version of each file, and our helpful commit message which should briefly describe with one sentence what was changed and why. The 'who' and 'when' are recorded automatically. ``` > for(even_more_files in LETTERS[4:7]) write(sample(1:10,5), file=even_more_files, append=TRUE) ``` ``` > git_add() Modified files to be added (+ created, - deleted, * changed since added, ? conflict) 1 D 2 E 3 + F 4 + G Which file numbers to add? (Hit ENTER to add all non-conflicting, else ESCAPE) Adding 4 file(s) ``` ``` > git_commit() Using global (system default) identity: myname Commit message: Add some more files and made some changes Commit to master? (hit ESCAPE to cancel) Done ``` ### Viewing historic commits We can view the change history of a folder (always relative to the top directory) or only changes which change a particular file. This gives us a unique identifier code for this commit which is a unique hash, as well as timestamp and the message. ``` > git_history() 1 [facbb79] 2022-01-20: Add some more files and made some changes 2 [d8759cf] 2022-01-20: My first commit ``` We can look at what has changed between each of these commits using `git_diff()` on either a folder or within specific file. We can specify which commit we want to compare using filters like 'before' or 'message', or use NULL to get the the version right now. See `?git_history` for the full list. Here we compare the second most recent commit `n=2` with the most recent `n=1`. ``` > git_diff(path='.', n=2, n=1) ``` git_diff(path='.', n=2, n=1) The default behaviour is to compare current working directory version with most recent commit -- showing what will be added (or lost if you `git_undo()`) ``` > write('uncommitted change', 'F', append=TRUE) ``` ``` > git_diff(path='F') ``` git_diff(path='F') ### Creating and changing branches An important feature of git is keeping parallel versions of the same working directory called *branches*, for example if you want to develop some new code but have a stable working version you can instantly change back to. We can use `git_branch()` to list the existing branches and change to one of these, or create a new branch which starts with the current working directory. A simplification is made here that forbids changing branch without having committed all of your changes, because uncommitted changes could get irreversibly lost. It is good practice to create new branches for each feature or set of changes you are going to make, and only when it's finished do you merge it back into the main branch. This is what we shall do here with our 'TEST' change. ``` > git_branch() Current active branch: master New or existing branch name to move to (or hit ENTER to list branches) test-branch ``` ``` > for(change_files in LETTERS[2:6]) write('TEST', file=change_files, append=TRUE) ``` ``` > git_add() Modified files to be added (+ created, - deleted, * changed since added, ? conflict) 1 B 2 C 3 D 4 E 5 F Which file numbers to add? (Hit ENTER to add all non-conflicting, else ESCAPE) ``` ``` > git_commit() Using global (system default) identity: myname Commit message: Experimental changes on test-branch Commit to test-branch? (hit ESCAPE to cancel) Done ``` ``` > git_branch() Current active branch: test-branch New or existing branch name to move to (or hit ENTER to list branches) [facbb7] (Local) master [0f39df] (Local) (HEAD) test-branch ``` ### Merging branches We can merge these changes back into our master branch, or make another sub- branch off this one. A common problem that comes up is if you are making changes on lots of branches, you might change the same file in different ways. This causes a *conflict* which it will try to resolve automatically, but if it cannot, it will highlight the code chunk with `<<<<< >>>>>>` and prompt you to fix it by hand. You will not be allowed to `git_commit()` before you have confirmed that you have fixed the conflicts and added the files by number. When a merge is complete, all of the commits from both branches will make up the shared history. Return to main branch and write a change to file `E` remembering that we have already made a conflicting change on test-branch and Git will not know which is correct. ``` > git_branch() Current active branch: test-branch New or existing branch name to move to (or hit ENTER to list branches) master ``` ``` > write('This change will conflict with our test-branch', file='E') ``` ``` > write('This change does not conflict with test-branch', file='H') ``` ``` > git_add() Modified files to be added (+ created, - deleted, * changed since added, ? conflict) 1 E 2 + H Which file numbers to add? (Hit ENTER to add all non-conflicting, else ESCAPE) Adding 2 file(s) ``` ``` > git_commit() Using global (system default) identity: myname Commit message: Added a change to E and H which are not in test-branch Commit to master? (hit ESCAPE to cancel) Done ``` We can see the difference between the current state of the two branches, these will be the files which need to merge. ``` git_diff('',branch='test-branch', branch='master') ``` git_diff('',branch='test-branch', branch='master') ``` > git_merge() Branch to merge into master: test-branch Merge test-branch into master Proceed? (Y/N) Y Delete this branch after successful merge? (Y/N) Y Merging test-branch into master This merge will result with conflicts to resolve manually. Continue anyway? (Y/N) TRUE The following files have conflicts: E Deleting branch test-branch Open the conflicting files and resolve <<>> now? (Y/N) Y ``` This then opens the file in an editor (default RStudio otherwise external) so that it can have the conflicting parts deleted. You should do this straight because the files will appear corrupted with all of the `<<<<` `=====` and `>>>>` in them. To make sure you have fixed each file, they will **not** automatically add when running `git_add()` and each file marked `?` must be added by number. ``` > git_add() Changes staged already 1 B 2 C 3 D 4 F Any file numbers to un-add? (Hit ENTER to keep all) Modified files to be added (+ created, - deleted, * changed since added, ? conflict) 1 ? E Which file numbers to add? (Hit ENTER to add all non-conflicting, else ESCAPE) 1 Adding 1 file(s) ``` ``` > git_commit() Using global (system default) identity: myname Commit message: Completed the merge of our test code Commit to master? (hit ESCAPE to cancel) Done ``` The history of the main branch now contains all commits from both branches, and another commit we did immediately after completing the merge. ``` > git_history() 1 [a92c7d2] 2022-01-20: Completed the merge of our test code 2 [b9ea6f1] 2022-01-20: Added a change to E and H which are not in test-branch 3 [0f39df6] 2022-01-20: Experimental changes on test-branch 4 [facbb79] 2022-01-20: Add some more files and made some changes 5 [d8759cf] 2022-01-20: My first commit ``` ### Reverting back to a previous commit Occasionally things will go wrong with our code and we will want to hard reset or we merge / delete a branch by mistake. This is totally *undo*-able as long as it is done as soon as possible. `git_undo()` will show the recent history of actions that have happened locally (Git's *reflog*) and allow you to restore the working directory to a previous commit checkpoint. It does not remember what branch you were on, so you may have to `git_branch()` to the branch with the right name before restoring the old files. Typical use is to wipe any changes since the last commit by resetting to the zero commit on the list. It will duly warn of the irreversibly loss (which can be checked with `git_diff()`). ``` > file.remove('B', 'D', 'G') [1] TRUE TRUE TRUE ``` ``` > git_undo() Not all changes have been committed! Run git_diff() to see what. >>>>> Continuing will result in IRREVERSIBLE LOSS <<<<< 0 [a92c7d2] HEAD@{0}: commit (merge): Completed the merge of our test code 1 [b9ea6f1] HEAD@{1}: commit: Added a change to E and H which are not in test-branch 2 [facbb79] HEAD@{2}: checkout: moving from test-branch to master 3 [0f39df6] HEAD@{3}: commit: Experimental changes on test-branch 4 [facbb79] HEAD@{4}: checkout: moving from master to test-branch 5 [facbb79] HEAD@{5}: commit: Add some more files and made some changes 6 [d8759cf] HEAD@{6}: commit (initial): My first commit Which commit to reset to? (Hit ESCAPE to cancel) 0 Selected: [a92c7d2] HEAD@{0}: commit (merge): Completed the merge of our test code Confirm that you really want to change the working directory to historic state? (Y/N) Y ``` Happy days. ``` > list.files() [1] "A" "B" "C" "D" "E" "F" "G" "H" ``` ## Setting up a remote copy on a shared drive Git repositories can be synchronised with a shared location to allow multiple people to make versions (branches), commit changes and provide backup. In your team, choose a shared drive which has the access level you need, and make a directory. Use this as the `GIT_DEFAULT_REMOTE` by adding this to your .Renviron file (see `?git_remote` for example). Here we will use a pretend temporary `default_remote` folder ``` > demo_default_remote = tempfile(pattern = 'test-default-remote-') > dir.create(demo_default_remote) > Sys.setenv(GIT_DEFAULT_REMOTE = demo_default_remote) ``` ### Adding a new remote Repositories can be given a 'remote' copy which allows changes to pushed to it and then others can then pull (download) these changes to their own local version. This remote is often an http server such as GitHub however this may not be reachable using a corporate system or not wanted. This package encourages a simple shared drive folder to keep remote copies of the team's code which can then be synchronised internally. Remote copies can have different names (such as 'public-mirror') and you can choose which branches they should contain. However, **only** the remote called 'origin' will be used to pull (download) changes from and this is usually the only remote you will have. Here we will add the default origin which means saving a copy in the `GIT_DEFAULT_REMOTE` folder path. ``` > git_remote() No remote yet! Hit ENTER to add 'origin' remote, else type remote name, or ESCAPE to cancel: Path / URL for 'origin' (hit ENTER to use default /tmp/test-default-remote): Make remote a bare repository (recommended for 'origin' - see ?git_remote) (Y/N): Y Creating bare repo at /tmp/test-default-remote/test-git-project Adding origin at /tmp/test-default-remote/test-git-project Now run git_push() to complete setup of remote branch tracking ``` ``` > git_push() Current branch is not yet pushed to origin -- do this manually Available remotes 1 origin /tmp/test-default-remote/test-git-project Index number of remote to push to: 1 Local branches to push (+ if not in remote, > current branch) 1 + > master Which branch numbers to push? (Hit ENTER to add all except + else ESCAPE) 1 Done ``` ``` > git_pull() Pull result: Already up-to-date ``` ### Pushing changes to remote After making commits to a branch, this branch should be pushed to the remote. A branch can be kept local (private) by not pushing this branch. It is bad practice to commit to the main branch because this will almost certainly give problems where two people change the same file. Best is to create a unique (and helpful) named branch and a nominated 'repository owner' does a merge into the main branch for everyone. (This is often called a merge request or pull-request and can be strictly enforced on http remotes). The cycle of pull-add-commit-push has been wrapped up in a simple `git()` call. ``` > git_branch() Current active branch: master New or existing branch name to move to (or hit ENTER to list branches) dev-branch ``` ``` write('Change by person A', file='F', append=TRUE) ``` ``` > git() Skipping pull: active branch has not been pushed to origin remote Modified files to be added (+ created, - deleted, * changed since added, ? conflict) 1 F Which file numbers to add? (Hit ENTER to add all non-conflicting, else ESCAPE) Adding 1 file(s) Commit message: Made a change on the dev-branch, waiting approval Commit to dev-branch? (hit ESCAPE to cancel) Done Current branch is not yet pushed to origin -- do this manually Available remotes 1 origin /tmp/test-default-remote/test-git-project Index number of remote to push to: 1 Local branches to push (+ if not in remote, > current branch) 1 + > dev-branch 2 master Which branch numbers to push? (Hit ENTER to add all except + else ESCAPE) 1 2 Done ``` ### Cloning another copy from the shared remote Sharing code is made simple by cloning the remote folder to make a local copy. This automatically sets the same 'origin' remote as where you cloned from which means all changes are kept in sync even after cloning (every time you *pull*). Here, the `GIT_DEFAULT_REMOTE` folder is used to list all repos which have been pushed there and can be cloned from, or you can give another path or http address. Changing to a remote branch will create a local copy. If two people do push different commits onto the same branch, when you next pull (download changes) it will give conflicts like a merge and `git_add()` should be used to find and fix the conflicting files. ``` > git_clone() Available repo names: test-git-project ``` We can see the project that someone previously pushed, which we can download another downstream copy of. ``` > another_tempdir = tempfile() > dir.create(another_tempdir) > git_clone('test-git-project', to=another_tempdir) Cloning test-git-project into " /tmp/another_tempdir/test-git-project " -- proceed? (Y/N) Y Cloning to /tmp/another_tempdir/test-git-project Set working directory to here? (Y/N) Y ``` If we look at what branches exist, we can see it has created a local copy of the main branch, and if we change to 'dev-branch' it will do likewise (if we want to create our own new branch, we must choose a totally unique name). ``` > git_branch() Current active branch: master New or existing branch name to move to (or hit ENTER to list branches) [a92c7d] (Local) (HEAD) master [66d1db] (origin @ /tmp/test-default-remote/test-git-project) dev-branch (origin @ /tmp/test-default-remote/test-git-project) HEAD [a92c7d] (origin @ /tmp/test-default-remote/test-git-project) master New or existing branch name to move to (or hit ENTER to cancel) ``` We can inspect and merge remote branches in the same way as if they were local. Here we do a typical repository maintenance job of approving somebody elses changes and making them part of the main branch. ``` git_diff('F', branch='master', branch='origin/dev-branch') ``` git_diff('F', branch='master', branch='origin/dev-branch') ``` > git_merge() Branch to merge into master: origin/dev-branch Merge origin/dev-branch into master Proceed? (Y/N) Y Delete this branch after successful merge? (Y/N) N Merging origin/dev-branch into master ```