Git Working Areas

Hello all. In this blog we will build an understanding of git internals by following how our content moves within a git system. This introduces us to the git working areas and to the git commands that move data to and from them. It also serves as a base for understanding some of the more troublesome commands in Git, such as rebase and reset. We’ll discuss those commands in the next post, along with some best practices for working with Git. So, let’s get started.

Git

Though this blog is not going to cover the basics of Git, a small introduction is always a good place to start. So, what is git? Git is a distributed version control system (DVCS). Most people working on an enterprise-grade project probably already know what that means. It is an organized system that helps multiple developers work on the same repository or code simultaneously. A push to a shared branch succeeds only after conflicts with the changes already on that branch have been resolved, so the common reference branch stays in a conflict-free state at any point in time. Git is a mature, actively maintained open source project originally developed in 2005 by Linus Torvalds.

Git ideology

Rather than have only one single place for the full version history of the software as is common in once-popular version control systems like CVS or Subversion (also known as SVN), in Git, every developer’s working copy of the code is also a repository that can contain the full history of all changes.

In addition to being distributed, Git has been designed with performance, security and flexibility in mind. One can find more on that here.

I think that is enough in terms of understanding what git is. Let’s now move on to the main focus of this blog: Git working areas.

Git Working Areas

I hope we all know the very first step to start with git. Yes, you are right: we need to initialize an empty git repository, assuming that git is already installed on your system. 🙂 How do we do that? It is quite simple. In a terminal, move to your desired directory and issue:

git init

This initializes the current directory as an empty git repository. You can then add a remote repository as the origin for this directory, so that your pushes and pulls go to and from the repository hosted remotely. Sounds good, right? That much most of us already know. Now the questions one may ask are: Where are the git working areas? What is meant by working areas? How many of them are there? Let’s try to answer those questions.

Ques: Where are the working areas?

Ans: Remember the first step we performed with git, initializing an empty git repository? That init step is what sets up the working areas in the initialized directory. So the answer is pretty simple: the git working areas reside locally in every git-initialized directory. The workspace is the directory itself, and the other areas are stored inside its .git folder.
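To make the answer above concrete, here is a minimal sketch that initializes a throwaway repository and looks at where things end up on disk. The temporary directory and the one-line file content are assumptions made for illustration; only HelloWorld.scala as a file name comes from this post.

```shell
set -e
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# The workspace is the directory itself; git's own data lives under .git.
echo 'object HelloWorld extends App { println("Hello Scala World") }' > HelloWorld.scala
ls -A                      # lists .git and HelloWorld.scala

# Staging creates the index file and stores the content under .git/objects.
git add HelloWorld.scala
ls .git/index .git/objects
git ls-files --stage       # entries currently recorded in the index
staged=$(git ls-files)
```

Nothing here contacts a remote: every step reads or writes files on the local disk.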

Ques: What is meant by the git working areas?

Ans: Git working areas are the places on the local disk that hold your content at different stages as you issue git commands. Because they all live on your machine, staging and unstaging of changes happen locally, without contacting the remote. We will learn what staging and unstaging mean in a while.

Ques: How many working areas are there?

Ans: There are four working areas in git:

  1. Workspace aka untracked area
  2. Index aka staging area
  3. Repository
  4. Stash

Let us quickly go through each of these working areas one-by-one and understand their purpose.

1. Workspace Area

The working area (aka the workspace) is where we actually make our changes. Whenever we initialize an empty directory and add some content, such as a file or a folder, we are working in the workspace. Everything we put in the directory goes into the workspace. Git protects content here the least: changes we make or files we add can be lost for good unless we ask git to keep them by issuing an add command. As a result, this is considered the most volatile area of git.

Let’s see through an example.

For the sake of simplicity, I have created an empty git repository: gitBlog. I then added a HelloWorld.scala file to the directory. Remember, this file has only been added to the workspace area and is volatile. When we issue a git status command, it reports the file as untracked, as the screenshot below shows. We can also confirm that the file is accessible in the workspace and has content in it.

Git status shows the file HelloWorld.scala is untracked.

Untracked files are those present in your workspace that git has not been told to track. Git stores no copy of them, which makes them quite volatile: once such a file is deleted, git has no way to bring it back.

Removing an untracked file deletes it permanently; git keeps no copy of it.

That is why the working area must not hold any untracked file or content that we can’t risk losing.
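The volatility described above can be reproduced with a short sketch. It assumes a throwaway repository in a temporary directory and an illustrative one-line file; `git status --porcelain` is used only because its compact output is easy to read in a script.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# A new file exists in the workspace only; git reports it as untracked ("??").
echo 'object HelloWorld extends App { println("Hello Scala World") }' > HelloWorld.scala
before=$(git status --porcelain)
echo "before delete: $before"

# Deleting it leaves nothing behind for git to report or restore.
rm HelloWorld.scala
after=$(git status --porcelain)
echo "after delete: '$after'"
```

After the delete, git status has nothing to say about the file, because git never stored a copy of it.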

2. Index/Staging Area

Git is quite interactive, as we could notice when we issued the git status command earlier. It prompts with a message:

Untracked files:
(use "git add …" to include in what will be committed)
HelloWorld.scala
nothing added to commit but untracked files present (use "git add" to track)

This is git’s way of guiding us. It is a developer-friendly system that assists at every step. Here git is telling us that we have an untracked file, and that to make it trackable we should use the git add command.

When we issue a git add command, the contents move from the Working Area -> Staging Area. Every staging step stores the file content as a blob object and records it in the index; trees and commits are created later, at commit time. The details are out of scope for this blog post. The staging area is where git starts tracking and saving the changes that occur in a file, and all of it is stored in the .git directory. Let’s see how we stage our changes, using the same example as above.

To add a specific file to the staging area use the command:

git add <your-File-Name>

which in our case turns out to be:

git add HelloWorld.scala

In case you do not want to manually add every file from the working area to the staging area use:

git add .
Here the `.` is not a wildcard but the current directory: git stages every new or changed file under it.
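As a rough illustration of the two forms of git add, the sketch below stages one named file first and then everything under the current directory. Notes.txt is a hypothetical second file added only so the difference is visible, and the temporary directory is likewise an assumption.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'object HelloWorld extends App { println("Hello Scala World") }' > HelloWorld.scala
echo 'some notes' > Notes.txt     # hypothetical second file

# Stage one named file: only that file enters the index.
git add HelloWorld.scala
one=$(git ls-files | tr '\n' ' ')

# Stage everything under the current directory.
git add .
all=$(git ls-files | tr '\n' ' ')

echo "after 'git add HelloWorld.scala': $one"
echo "after 'git add .': $all"
```

`git ls-files` lists what the index currently holds, so it is a convenient way to check what a given add actually staged.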

We will see something like this after staging is successful:

Staging makes a file trackable.

Now how is this less volatile? Consider removing the file as we did when it was only in the working area. This time, when we run git status after removing the file, we will see traces of it, as shown below:

We can now see the traces of the file.

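This clearly shows that git keeps track of a staged file even after it is deleted from the workspace: the staged copy is still in the index, and git status reports the workspace copy as deleted. We can restore the file from the staging area with git checkout -- HelloWorld.scala (or git restore HelloWorld.scala on Git 2.23 and later).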
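Here is a minimal sketch of that behaviour, again in a throwaway repository with an illustrative one-line file. It stages the file, deletes the workspace copy, and then copies the staged version back with `git checkout -- <file>`, which reads from the index.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'object HelloWorld extends App { println("Hello Scala World") }' > HelloWorld.scala
git add HelloWorld.scala
staged=$(git status --porcelain)     # "A  HelloWorld.scala": added to the index

# Remove the workspace copy; the index still holds the staged content.
rm HelloWorld.scala
deleted=$(git status --porcelain)    # "AD HelloWorld.scala": staged, deleted in the workspace

# Copy the staged version back into the workspace.
git checkout -- HelloWorld.scala
cat HelloWorld.scala
```

In the two-letter status code, the first letter describes the staging area and the second the workspace, which is why the deletion shows up only in the second column.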

3. Repository Area

Also known as the local repository, to avoid any confusion with the remote repository. This is probably the safest area for local code and content. Content moves from the Staging Area -> Local Repository when we issue a git commit command. What you will mainly see in your local repository are your checkpoints, or commits. It is the area that saves everything (so don’t delete it). A commit is a checkpoint: a snapshot of everything that was staged, recorded on top of the previous commit so that changes can be compared against it. Once the changes are committed, nothing is left waiting in the staging area and git status reports a clean state.

Commit moves content from staging to repository area.

To list the commits, we can simply use the git log command.
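The commit step can be sketched as follows. The user name and email are placeholders set only inside the throwaway repository so that the commit works on a machine without a global git identity; the commit message is the one that appears later in this post.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, local to this throwaway repository.
git config user.name "Example User"
git config user.email "user@example.com"

echo 'object HelloWorld extends App { println("Hello Scala World") }' > HelloWorld.scala
git add HelloWorld.scala
git commit -q -m "Add HelloWorld to Local Repository area."

# Nothing is left to stage or commit once the snapshot is recorded.
after_commit=$(git status --porcelain)
echo "status after commit: '$after_commit'"

# The new checkpoint shows up in the history.
git log --oneline
subject=$(git log -1 --format=%s)
```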

4. Stash Area

To hold some changes safely for later, git uses a special area: the stash. These changes can be brought back at a later point in time. We use it when we want to record the current state of the working directory and the index, but want to go back to a clean working directory. Confusing? Let’s simplify by taking an example.

When you are in the middle of something, your boss comes in and demands that you fix something immediately. Traditionally, you would make a commit to a temporary branch to store your changes away, and return to your original branch to make the emergency fix, right? But this introduces an unnecessary branching burden. To avoid this we could simply use the stash area concept.

We can simply perform the following:

  1. Stash the current content that we were working on.
  2. Make the emergency fix to the code, commit it and push it to the remote.
  3. Bring the stashed content back from the stash area and continue working on it.
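The three steps above can be sketched end to end. To keep the example free of merge conflicts, the emergency fix here goes into a hypothetical second file, Notes.txt, rather than into HelloWorld.scala; the identity values and file contents are placeholders, and the push is left out because the throwaway repository has no remote.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'object HelloWorld extends App { println("Hello Scala World") }' > HelloWorld.scala
echo 'known issue' > Notes.txt             # hypothetical file that needs the fix
git add .
git commit -q -m "Add HelloWorld to Local Repository area."

# Work in progress: an uncommitted edit to HelloWorld.scala.
echo '// work in progress' >> HelloWorld.scala

# 1. Stash it; the workspace goes back to the last commit.
git stash -q
clean=$(git status --porcelain)

# 2. Make and commit the emergency fix (a git push would follow here).
echo 'fixed in emergency' > Notes.txt
git commit -q -am "Emergency fix"

# 3. Bring the stashed work back and remove it from the stash.
git stash pop -q
wip=$(git status --porcelain)              # " M HelloWorld.scala"
```

`git stash pop` applies the most recent entry and drops it in one go; `git stash apply` does the same but keeps the entry in the stash.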

Only one command works with the stash area: git stash, together with its subcommands such as list, apply and clear.

The below screenshot shows how to play with stash:

git stash requirement

Stash in action

We received a request to fix the previous version first, while we were updating HelloWorld.scala’s content locally. Therefore, we will need to stash the current changes and fix the breaking file.

git stash
Saved working directory and index state WIP on feature/add-Input: 603cfc8 Add HelloWorld to Local Repository area.

When we issue the stash command, git responds with a confirmation message indicating that the stash has occurred. Refer to the screenshot below:

Stash done, emergency commit done.

Now the content of HelloWorld.scala is:

knoldus@knoldus-vostro-15-3568:~/Desktop/gitBlog$ cat HelloWorld.scala
object HelloWorld extends App {
  println("Hello Scala World, fixing in emergency!")
}

Let’s apply the stash now. We can do so as follows:

  1. Search through the stash list using the command: git stash list
  2. Run the command: git stash apply <stashId> with the proper stash id from the list (for example, stash@{0}).
  3. Once the stash is applied, HelloWorld.scala is reported as modified, because the stashed changes are back in the workspace.
  4. Check the contents of the file; we will find the emergency commit and our stashed content together in it.

Stashing result

Shown below is the result of Stashing:

The stash is applied and its content is merged with the emergency commit.

See the documentation part /**…/: that is what we were working on earlier, before we stashed it and went to do the emergency fix.

Once we are done with the stash, we can empty it with the git stash clear command. Note that this removes every stash entry; to remove a single entry, use git stash drop <stashId>.
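The list, apply and clear subcommands can be tried together in a small sketch. It assumes a throwaway repository with placeholder identity values and an illustrative work-in-progress line; stash@{0} is the id git gives to the most recent stash entry.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'object HelloWorld extends App { println("Hello Scala World") }' > HelloWorld.scala
git add HelloWorld.scala
git commit -q -m "Add HelloWorld to Local Repository area."

# Stash an uncommitted edit.
echo '// work in progress' >> HelloWorld.scala
git stash -q

# Each entry is listed with an id of the form stash@{N}.
git stash list

# Re-apply the entry by id; apply keeps it in the stash.
git stash apply -q 'stash@{0}'
kept=$(git stash list | wc -l)

# Remove every stash entry.
git stash clear
left=$(git stash list | wc -l)
echo "entries after apply: $kept, after clear: $left"
```

Because apply leaves the entry in place, clearing (or dropping) it afterwards is a separate, deliberate step.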

Conclusion

I believe this blog post has covered a lot of ground with respect to the git working areas. We now have a sound understanding of how content moves to and from these areas. We also observed when a particular area comes into the picture and how it interacts with the other working areas. The following image summarises our observations.

git working areas conclusion

Hopefully, this proves useful to anyone using git. Comments, queries and suggestions are heartily welcome on the post. Please spread the word if you find this insightful and useful.

Please stay tuned for a bonus blog covering git best practices and some of the widely used git commands. It will treat this blog as a prerequisite, so that together they give an understanding of the complete git workflow.

Update: The blog on git useful commands can be found here.

References:

  1. https://git-scm.com/docs/git-stash
  2. https://app.pluralsight.com/library/courses/mastering-git/table-of-contents
  3. https://medium.com/@lucasmaurer/git-gud-the-working-tree-staging-area-and-local-repo-a1f0f4822018

Written by 

Prashant is a Senior Software Consultant with more than 5 years of experience in both service development and client interaction. He is familiar with Object-Oriented Programming paradigms, has worked with Java and Scala-based technologies, and has experience with the reactive technology stack, following the agile methodology. He is currently involved in creating reactive microservices, some of which are already live and running successfully in production, and he has also worked on scaling these microservices by implementing the best practices of APIGEE-Edge. He is a good team player and currently manages a team of 4 people. He is always eager to learn new and advanced concepts in order to expand his horizons and apply them in project development. His hobbies include teaching, playing sports, cooking, watching sci-fi movies and traveling with friends.