How to operate Git from Python? Introduction to GitPython

created at 06-30-2021

Sometimes you need to perform complex Git operations that involve a lot of intermediate logic. Handling complex logic and flow control in shell scripts is a disaster, so implementing it in Python is a much more pleasant choice. To do that, you need a way to operate on Git repositories from Python.

Introduction to GitPython

GitPython is a Python library for interacting with Git repositories. It covers both low-level commands (plumbing) and high-level commands (porcelain) and can perform most Git read and write operations, which avoids clumsy code that constantly shells out. It is not a pure Python implementation: part of it relies on executing git commands directly, and part of it relies on GitDB.

GitDB is also a Python library. It provides a database model for .git/objects that allows objects to be read and written directly. Because it reads and writes through streams, it runs efficiently and uses little memory.

Install GitPython

pip install GitPython

GitDB is a dependency and is installed automatically, but the git executable itself must be installed separately.
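Since part of GitPython works by running git, a script may want to confirm that the executable can be found before going further. Below is a minimal sketch that uses only the standard library; the helper name `git_executable` is made up for illustration and is not part of GitPython.

```python
import shutil
from typing import Optional


def git_executable() -> Optional[str]:
    """Return the full path of the git executable found on PATH, or None."""
    # hypothetical helper for this article, not a GitPython function
    return shutil.which("git")


if __name__ == "__main__":
    path = git_executable()
    if path is None:
        print("git was not found on PATH; install it before using GitPython")
    else:
        print(f"git found at {path}")
```

If git lives outside PATH, GitPython can also be pointed at it through the `GIT_PYTHON_GIT_EXECUTABLE` environment variable.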

Basic usage

init

import git
repo = git.Repo.init(path='.')

This creates a Git repository in the current directory. Of course, the path can be customized.

Since git.Repo implements __enter__ and __exit__, it can be used in a with statement:

with git.Repo.init(path='.') as repo:
    pass  # do something with repo

However, exiting the block only performs some cleanup, and the repo object can still be read and written afterwards, so this form is not required.

clone

There are two ways to clone. The first is to clone the current repository to another location:

new_repo = repo.clone(path='../new')

The second is to clone from a URL to a local location:

new_repo = git.Repo.clone_from(url='git@github.com:USER/REPO.git', to_path='../new')

commit

with open('test.file', 'w') as fobj:
    fobj.write('1st line\n')
repo.index.add(items=['test.file'])
repo.index.commit('write a line into test.file')

with open('test.file', 'a') as fobj:  # 'a' appends; 'aw' is not a valid mode
    fobj.write('2nd line\n')
repo.index.add(items=['test.file'])
repo.index.commit('write another line into test.file')

status
GitPython does not provide a direct equivalent of git status, but it does expose some of the same information.

>>> repo.is_dirty()
False
>>> with open('test.file', 'a') as fobj:
...     fobj.write('dirty line\n')
...
>>> repo.is_dirty()
True
>>> repo.untracked_files
[]
>>> with open('untracked.file', 'w') as fobj:
...     fobj.write('')
...
>>> repo.untracked_files
['untracked.file']

checkout (clean up all changes)

>>> repo.is_dirty()
True
>>> repo.index.checkout(force=True)
<generator object <genexpr> at 0x7f2bf35e6b40>
>>> repo.is_dirty()
False

branch
Get the current branch:

head = repo.active_branch  # a git.Head object; repo.head is the symbolic HEAD reference instead

New branch:

new_head = repo.create_head('new_head', 'HEAD^')

Switch branches:

new_head.checkout()
head.checkout()

Delete branch:

git.Head.delete(repo, new_head)
# or
git.Head.delete(repo, 'new_head')

merge
The following demonstrates how to merge another branch (master) into the current branch (other).

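master = repo.heads.master
other = repo.create_head('other', 'HEAD^')
other.checkout()
# find the common ancestor so that a three-way merge is performed in the index
merge_base = repo.merge_base(other, master)
repo.index.merge_tree(master, base=merge_base)
# record both parents so that the result is a real merge commit
repo.index.commit('Merge from master to other', parent_commits=(other.commit, master.commit))
# merge_tree only touches the index, so bring the working tree up to date
other.checkout(force=True)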

remote, fetch, pull, push

Create remote:

remote = repo.create_remote(name='gitlab', url='git@gitlab.com:USER/REPO.git')

Interact with a remote:

remote = repo.remote()  # returns the remote named 'origin' by default
remote.fetch()
remote.pull()
remote.push()

Delete remote:

repo.delete_remote(remote)
# or
repo.delete_remote('gitlab')

other
There are other operations, such as tags and submodules, but they are not used very often, so I won't cover them here.

The advantage of GitPython is that internal information is easy to obtain when doing read operations. The disadvantage is that write operations feel awkward. Of course, it also supports executing git commands directly:

git = repo.git
git.status()
git.checkout('HEAD', b="my_new_branch")
git.branch('another-new-one')
git.branch('-D', 'another-new-one')

Other ways to operate Git

subprocess
Run the git command in a child process and parse the result from its standard output.

import subprocess
subprocess.call(['git', 'status'])
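subprocess.call only returns the exit code, so to actually parse the result the output has to be captured. Here is a minimal sketch using subprocess.run and the machine-readable `--porcelain` format; the function names are made up for this example, and renamed or quoted paths are deliberately not handled.

```python
import subprocess


def parse_porcelain(output: str) -> list:
    """Split `git status --porcelain` text into (status code, path) pairs."""
    entries = []
    for line in output.splitlines():
        if len(line) > 3:
            # the first two characters are the status code, followed by one space
            entries.append((line[:2], line[3:]))
    return entries


def git_status(repo_dir: str = ".") -> list:
    """Run git in a child process and parse what it prints on stdout."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git exits with an error
    )
    return parse_porcelain(result.stdout)
```

Keeping the parsing in its own function means it can be tested on a plain string without running git at all.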

dulwich
dulwich is a Git library implemented in pure Python. I will look into it later when I have time.


pygit2

pygit2 is a Python library based on libgit2. The underlying layer is written in C, and the Python layer on top is just an interface. It should be the most efficient option, but I still decided against it. The disadvantage is that libgit2 needs to be pre-installed in the environment. In contrast, GitPython only requires Git to be present in the environment, which is much simpler.

