Core Concepts and Operations in Distributed Version Control with Git
Version control involves tracking changes made to files within a project, assigning an identifier to each set of changes.
Categories of version control systems include:
- Local Version Control Systems (VCS)
- Centralized Version Control Systems (CVCS)
- Distributed Version Control Systems (DVCS)
Git is a distributed version control system. When you run git init to create a new repository or git clone to copy an existing one, a hidden .git directory is established to manage the repository's history.
The command git status reveals the state of files in your working directory. Files can be in one of these primary states:
- Modified: Changes have been made to the file, but they are not yet recorded in the repository's database.
- Staged: The modified file has been marked to be included in the next snapshot (commit).
- Committed: The file's data is securely stored in the local repository database.
These states correspond to different working areas within a Git project: the working directory, the staging area (or index), and the repository.
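The three states can be observed in a throwaway repository. The following is a minimal sketch; the file name notes.txt, the commit messages, and the user identity are placeholders chosen for illustration.

```shell
set -e
# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"          # notes.txt is now committed

echo "second draft" > notes.txt       # modified: changed but not staged
modified_state=$(git status --short)

git add notes.txt                     # staged: marked for the next commit
staged_state=$(git status --short)

git commit -q -m "Update notes"       # committed: stored in the repository
clean_state=$(git status --short)

printf 'modified: [%s]\n' "$modified_state"
printf 'staged:   [%s]\n' "$staged_state"
printf 'clean:    [%s]\n' "$clean_state"
```

In the short status format, the first column describes the staging area and the second describes the working directory, so the M moves from the second column to the first once the file is staged.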
Understanding Fork, Clone, and Branch
Fork
A fork creates a personal copy of another user's repository on a hosting service like GitHub. This copy includes all branches, tags, and commits from the original (upstream) repository. Forking is a feature of the hosting platform, not a native Git command. After forking, you can clone your copy locally, make changes, and push them back to your forked repository. You can propose integrating your changes into the original project by submitting a pull request.
Clone
The git clone command downloads a complete copy of a remote repository to your local machine, creating a local working copy with full version history.
Branch
Branches enable parallel development streams. git branch <branch-name> creates a new branch. git switch <branch-name> moves your working directory to that branch, isolating your changes from the main line of development.
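As a rough illustration of that isolation, the sketch below commits on a branch and then returns to main to show the file unchanged there. The branch name feature-login and the file app.txt are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git branch feature-login                # create the branch; you stay on main
git switch -q feature-login             # now move onto it
echo "login form" >> app.txt
git commit -q -am "Add login form"

git switch -q main                      # back on main, the change is absent
main_content=$(cat app.txt)
current_branch=$(git branch --show-current)
ahead=$(git rev-list --count main..feature-login)

echo "on $current_branch, app.txt contains: $main_content"
echo "feature-login is $ahead commit(s) ahead of main"
```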
HEAD, Working Tree, and Index
HEAD
HEAD is a special pointer that references the current branch you are working on, or a specific commit if you are in a "detached HEAD" state. It essentially points to your current location in the repository's history.
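One way to see what HEAD refers to is git symbolic-ref, which prints the branch reference when HEAD is attached and fails when it is detached. The sketch below assumes a scratch repository with two placeholder commits.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" > file.txt
git commit -q -am "Second"

# Attached: HEAD is a symbolic reference to a branch.
head_ref=$(git symbolic-ref HEAD)
echo "HEAD -> $head_ref"

# Detached: HEAD points directly at a commit instead of a branch.
git switch -q --detach HEAD~1
if git symbolic-ref -q HEAD >/dev/null; then
  detached=no
else
  detached=yes
fi
echo "detached: $detached, at commit $(git rev-parse --short HEAD)"
```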
Working Tree and Index
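The working tree is your project's directory where you view and edit files. The index (or staging area) is an intermediate area where you prepare changes for a commit. The command git add copies the current content of a changed file from the working tree into the index; edits made to the file afterwards are not part of the next commit until you run git add again.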
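Because the index holds its own copy of each file, the last commit, the index, and the working tree can all contain different content at once. The sketch below demonstrates this with a hypothetical file config.txt; git show with a leading colon reads the version stored in the index.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > config.txt
git add config.txt
git commit -q -m "Add config"   # the commit holds v1

echo "v2" > config.txt
git add config.txt              # the index now holds v2

echo "v3" > config.txt          # the working tree holds v3, not staged

in_head=$(git show HEAD:config.txt)
in_index=$(git show :config.txt)
in_tree=$(cat config.txt)

echo "last commit: $in_head, index: $in_index, working tree: $in_tree"
```

Committing at this point would record v2, since a commit is built from the index rather than from the working tree.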
Pull vs. Fetch
git fetch downloads the latest objects and references from a remote repository without integrating them into your local branches. It allows you to inspect changes before merging.
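git pull performs a fetch and then immediately integrates the fetched content into your current branch, by merging or, if configured to do so, by rebasing. When both sides have changed the same lines, this results in conflicts that you must resolve before the operation can complete.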
Example of using fetch and merge:
git fetch origin main:remote-main
git merge remote-main
This fetches the remote main branch into a local branch named remote-main and then merges that branch into the branch you currently have checked out.
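A common variation is to fetch into the remote-tracking branch origin/main and inspect what arrived before merging. The sketch below simulates the remote with a second local repository; the directory names upstream and local are placeholders.

```shell
set -e
work=$(mktemp -d)

# A stand-in for the remote repository.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > log.txt
git add log.txt
git commit -q -m "First"

# Clone it, then let the "remote" move ahead by one commit.
git clone -q "$work/upstream" "$work/local"
echo "two" >> log.txt
git commit -q -am "Second"

cd "$work/local"
git fetch -q origin                                   # download only
behind_before=$(git rev-list --count HEAD..origin/main)
git log --oneline HEAD..origin/main                   # inspect incoming commits

git merge -q --ff-only origin/main                    # integrate explicitly
behind_after=$(git rev-list --count HEAD..origin/main)

echo "behind before merge: $behind_before, after merge: $behind_after"
```

The --ff-only flag is a cautious choice here: the merge is refused rather than creating a merge commit if the local branch has diverged.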
Stashing Changes
git stash temporarily shelves (or stashes) modifications from your working directory and staging area, saving them on a stack for later use. By default, it saves tracked files that are staged or modified, but not untracked or ignored files.
Useful flags:
- -u or --include-untracked: Also stash new, untracked files.
- -a or --all: Stash all files, including ignored ones.
Typical workflow for saving work-in-progress before updating from remote:
git stash # Save local changes
git pull # Update from remote
git stash pop # Reapply saved changes, potentially resolving conflicts
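The difference between a plain stash and one that includes untracked files can be checked in a scratch repository. This is a small sketch with placeholder file names tracked.txt and untracked.txt.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "line" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "edit" >> tracked.txt        # change to a tracked file
echo "new" > untracked.txt        # brand-new, untracked file

git stash push -q                 # default: only the tracked change is shelved
after_plain=$(git status --short)
git stash pop -q

git stash push -q -u              # -u also shelves the untracked file
after_u=$(git status --short)
git stash pop -q                  # both changes come back

echo "after plain stash: [$after_plain]"
echo "after stash -u:    [$after_u]"
```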
Rebase vs. Merge
Merge
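git merge <branch> integrates changes from the named branch into your current branch. When the two branches have diverged, it creates a new merge commit that has two parent commits, preserving the history of both branches. When the current branch has no commits of its own since the branches split, Git simply fast-forwards the branch pointer and no merge commit is created.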
Rebase
git rebase <branch> moves or replays the commits from your current branch onto the tip of the specified branch. It results in a linear project history, as if the work had been developed sequentially on the target branch.
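To compare the resulting histories, the sketch below builds two diverged branches and then integrates them both ways on separate copies. The branch names merged and rebased exist only for this illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

git switch -q -c feature
echo "f" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

git switch -q main
echo "m" > main.txt
git add main.txt
git commit -q -m "Main work"            # main and feature have now diverged

# Merge: a copy of main gains a merge commit with two parents.
git switch -q -c merged main
git merge -q --no-edit feature
merge_commits=$(git rev-list --count --merges merged)

# Rebase: a copy of feature is replayed on top of main.
git switch -q -c rebased feature
git rebase -q main
rebase_merges=$(git rev-list --count --merges rebased)
rebased_parent=$(git rev-parse rebased~1)
main_tip=$(git rev-parse main)

echo "merge commits after merge:  $merge_commits"
echo "merge commits after rebase: $rebase_merges"
```

After the rebase, the replayed commit sits directly on top of the tip of main, which is what makes the history linear. The replayed commit also has a new ID, since its parent changed.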
Undoing Changes: Reset vs. Revert
Reset
git reset moves the current branch (the one HEAD points to) to a specified commit, optionally altering the staging area and working directory. It rewrites history by removing the subsequent commits from the branch's log, so avoid using it on commits that have already been pushed and shared.
- --soft: Moves HEAD only. Staging area and working directory are unchanged.
- --mixed (default): Moves HEAD and resets the staging area to match the specified commit. Working directory changes are preserved but unstaged.
- --hard: Moves HEAD, resets the staging area, and discards all working directory changes to match the specified commit.
Example:
git reset --hard a1b2c3d # Reset branch HEAD, staging, and working dir to commit a1b2c3d
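The effect of each mode can be seen by undoing the same commit three ways and checking git status after each. This is a sketch in a scratch repository; story.txt and its contents are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "alpha" > story.txt
git add story.txt
git commit -q -m "A"
echo "beta-2" > story.txt
git commit -q -am "B"

git reset -q --soft HEAD~1        # branch moves back; change stays staged
soft_state=$(git status --short)
git commit -q -m "B again"

git reset -q --mixed HEAD~1       # branch moves back; change is unstaged
mixed_state=$(git status --short)
git commit -q -am "B once more"

git reset -q --hard HEAD~1        # branch moves back; change is discarded
hard_state=$(git status --short)
content=$(cat story.txt)

printf 'soft:  [%s]\nmixed: [%s]\nhard:  [%s]\n' "$soft_state" "$mixed_state" "$hard_state"
echo "story.txt after --hard: $content"
```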
Revert
git revert creates a new commit that undoes the changes made in a specified previous commit. It is a safe way to undo public changes because it adds history rather than erasing it.
Example:
git revert HEAD # Create a commit that undoes the most recent commit
git revert a1b2c3d # Create a commit that undoes the changes from commit a1b2c3d
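The sketch below shows that a revert restores the earlier content while the history grows rather than shrinks. The file name service.txt and the commit messages are made up for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "stable" > service.txt
git add service.txt
git commit -q -m "Stable version"

echo "broken" >> service.txt
git commit -q -am "Introduce a bug"

# --no-edit accepts the default "Revert ..." message without opening an editor.
git revert --no-edit HEAD >/dev/null

commit_count=$(git rev-list --count HEAD)
content=$(cat service.txt)

echo "commits in history: $commit_count"
echo "service.txt contains: $content"
```

The faulty commit remains in the log alongside the commit that undoes it, so collaborators who already pulled it can update normally.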