GitPython: How to use Git with Python


GitPython is a Python library used to interact with Git repositories.

It provides abstractions of Git objects for easy access to repository data, and also lets you access the repository more directly through a pure-Python implementation.

Requirements for GitPython

  • Python3
  • Git
  • GitPython module
  • pip and venv, which come packaged with Python 3, to install GitPython and isolate it from other Python projects.

Installing GitPython

First, create a new virtual environment for the project. Mine is named gitpython, but you can choose any name.

knoldus@knoldus-Vostro-3559:~$ python3 -m venv gitpython 

Secondly, activate the newly created virtualenv.

knoldus@knoldus-Vostro-3559:~$ source gitpython/bin/activate

Once the virtual environment is activated, its name is prepended to the command prompt, as shown below:

knoldus@knoldus-Vostro-3559:~$ python3 -m venv gitpython
knoldus@knoldus-Vostro-3559:~$ source gitpython/bin/activate
(gitpython) knoldus@knoldus-Vostro-3559:~$ 

After activating the virtual environment, install GitPython with pip. By default pip installs the latest version, but you can pin a specific one, for example pip install GitPython==2.1.7.

(gitpython) knoldus@knoldus-Vostro-3559:~$ pip install GitPython
Collecting GitPython
  Downloading https://files.pythonhosted.org/packages/09/bc/ae32e07e89cc25b9e5c793d19a1e5454d30a8e37d95040991160f942519e/GitPython-3.1.8-py3-none-any.whl (159kB)
    100% |████████████████████████████████| 163kB 738kB/s 
Collecting gitdb<5,>=4.0.1 (from GitPython)
  Downloading https://files.pythonhosted.org/packages/48/11/d1800bca0a3bae820b84b7d813ad1eff15a48a64caea9c823fc8c1b119e8/gitdb-4.0.5-py3-none-any.whl (63kB)
    100% |████████████████████████████████| 71kB 2.3MB/s 
Collecting smmap<4,>=3.0.1 (from gitdb<5,>=4.0.1->GitPython)
  Downloading https://files.pythonhosted.org/packages/b0/9a/4d409a6234eb940e6a78dfdfc66156e7522262f5f2fecca07dc55915952d/smmap-3.0.4-py2.py3-none-any.whl
Installing collected packages: smmap, gitdb, GitPython
Successfully installed GitPython-3.1.8 gitdb-4.0.5 smmap-3.0.4
(gitpython) knoldus@knoldus-Vostro-3559:~$

With GitPython installed, we can start writing scripts that interact with Git repositories.

Clone Repository

We can use the git module in Python to clone a remote repository.
Clone the repository you want to work with to your local system.

(gitpython) knoldus@knoldus-Vostro-3559:~/gitpython$ vim gitclone.py 
from git import Repo
Repo.clone_from("https://github.com/official-himanshu/JavaPro.git", "/home/knoldus/clone")

The clone_from method takes two arguments: the first is the URL of the repository, and the second is the local directory where you want to clone it. In my case it is /home/knoldus/clone.

GitPython can also work with remotes (for example to fetch, pull or push), but for simplicity we work with the local clone created above.

Note the path of the cloned repository, because we need to tell GitPython which repository to work with. You can find it as follows:

(gitpython) knoldus@knoldus-Vostro-3559:~$ cd /home/knoldus/clone
(gitpython) knoldus@knoldus-Vostro-3559:~/clone$ pwd
/home/knoldus/clone
(gitpython) knoldus@knoldus-Vostro-3559:~/clone$ 

Your path may be different. This is the absolute path to the root of the Git repository we just cloned.

For convenience, we use the export command to store this absolute path in an environment variable. Alternatively, you can hard-code the path in your script.

(gitpython) knoldus@knoldus-Vostro-3559:~/gitpython$ export REPO_PATH='/home/knoldus/clone'

Print Commit Data

To print the commit data of our repository, create a new Python file named read_repository.py.

In this file we use the git module to print the commit details of the repository we just cloned.

import os
from git import Repo

COMMITS_TO_PRINT = 5

The os module is used to read the absolute path of the Git repository from the REPO_PATH environment variable we set earlier.

COMMITS_TO_PRINT is a constant that limits how many commits are printed.

Next, create a function named print_commit_data in the same file to print the data of a single commit:

def print_commit_data(commit):
    print('-----')
    print(str(commit.hexsha))
    print("\"{}\" by {} ({})".format(commit.summary, commit.author.name, commit.author.email))
    print(str(commit.authored_datetime))
    print(str("count: {} and size: {}".format(commit.count(), commit.size)))

This function takes a GitPython Commit object and prints the following information about it:

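  • the 40-character SHA-1 hash of the commit
  • the commit summary (the first line of the commit message)
  • the author name
  • the author email
  • the authored date and time
  • the commit count (the number of commits reachable from this commit) and the size of the commit object in bytes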

Print Repository Information

Create another function named print_repository_info to print details of the Repo object:

def print_repository_info(repo):
    print('Repository description: {}'.format(repo.description))
    print('Repository active branch is {}'.format(repo.active_branch))

    for remote in repo.remotes:
        print('Remote named "{}" with URL "{}"'.format(remote, remote.url))

    print('Last commit for repository is {}.'.format(str(repo.head.commit.hexsha)))

print_repository_info is similar to print_commit_data, but prints the following repository details instead:

  • Repository description
  • Active branch
  • All remote Git URLs configured for this repository
  • SHA-1 hash of the latest commit

Main function

Now add a main block so the script can be run from the terminal with the python3 command.

if __name__ == "__main__":
    repo_path = os.getenv('REPO_PATH')
    # Repo object used to interact with Git repositories
    repo = Repo(repo_path)

    # check that the repository loaded correctly (a bare repo has no working tree)
    if not repo.bare:
        print('Repo at {} successfully loaded.'.format(repo_path))
        print_repository_info(repo)

        # create a list of commits on master, then print the first few to stdout
        commits = list(repo.iter_commits('master'))[:COMMITS_TO_PRINT]
        for commit in commits:
            print_commit_data(commit)
    else:
        print('Could not load repository at {}.'.format(repo_path))

In the main block, os.getenv reads the REPO_PATH environment variable, and that path is passed to Repo to create the Repo object.

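If the repository is not bare (that is, it has a working tree), both functions above are called and the first COMMITS_TO_PRINT commits of master are printed. If the path does not point to a Git repository at all, Repo raises an exception (NoSuchPathError or InvalidGitRepositoryError from git.exc) before the bare check is reached.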

Test GitPython script

Run the read_repository.py script with the following command:

(gitpython) knoldus@knoldus-Vostro-3559:~/gitpython$ python3 read_repository.py 

The command above produces output similar to the following:

Repo at /home/knoldus/clone successfully loaded.
Repository description: Unnamed repository; edit this file 'description' to name the repository.
Repository active branch is master
Remote named "origin" with URL "https://github.com/official-himanshu/JavaPro.git"
Last commit for repository is 1d09e78b575b795bf428636c1c7c3ce036f4d071.
-----
1d09e78b575b795bf428636c1c7c3ce036f4d071
"changes" by Himanshu Chaudhary (himanshu.chaudhary@knoldus.com)
2020-09-11 16:00:02+05:18
count: 5 and size: 258
-----
7ce1107bf982714067f489c077ad2a2648809feb
"new commit" by Himanshu Chaudhary (himanshu.chaudhary@knoldus.com)
2020-09-11 14:40:40+05:18
count: 4 and size: 261
-----
c47d2bb14575368f2ea5914160872ab9407c05dd
"Second java to test jenkins" by Himanshu Chaudhary (himanshu.chaudhary@knoldus.com)
2020-09-11 12:15:15+05:18
count: 3 and size: 278
-----
199271fde329d1feb04aa5fc9220a0a465e1f483
"Jenkins testing" by Himanshu Chaudhary (himanshu.chaudhary@knoldus.com)
2020-09-11 12:10:35+05:18
count: 2 and size: 266
-----
508eeb3211fc940f8315e230ddce501b0518df8b
"Initial commit" by official-himanshu (69458189+official-himanshu@users.noreply.github.com)
2020-09-11 12:08:28+05:18
count: 1 and size: 685
(gitpython) knoldus@knoldus-Vostro-3559:~/gitpython$ 

These are the last five commits I pushed to the GitHub repository.

Conclusion

In this blog, we cloned a Git repository and used the GitPython library to read data about the repository and its commits.

GitPython can do more than read data: it can also perform common Git operations such as git init, git add, git commit, git push and git pull.
