DEV Community

Seenevasaraj

READ FILES IN GIT REPO USING DIFFERENT MODULES IN PYTHON

Hey Reader,

My name is Seenevasaraj, and I work as a Software Developer at Luxoft India. Across the various projects at Luxoft, I implement automation wherever it is viable. In this article I want to explain how we can automate reading files from a Git repo using a few Python modules.

GIT
Git is a distributed version control system widely used for source code management and for tracking changes in source code. It was created by Linus Torvalds in 2005 to manage the development of the Linux kernel, and it has since become the standard for version control in many other projects and industries.

Repository-A repository, or repo, is a collection of files and their revision history. Git repositories can exist locally on your computer or remotely on a server.

Clone-Cloning a repository means creating a copy of it on your local machine.

Commit-A commit is a snapshot of the changes made to the repository at a specific time. It represents a single atomic unit of change.

Branch-A branch is a parallel version of a repository's code. It allows developers to work on separate features or fixes without interfering with the main codebase. Git uses branches to manage changes.

Merge-Merging is the process of integrating the changes made in one branch into another branch.

Pull Request-A pull request (PR) is a way to propose changes to a repository. It allows developers to review code, discuss changes, and collaborate on projects. In many workflows, pull requests are used for code review before changes are merged into the main codebase.

Push-Pushing means sending committed changes to a remote repository, making them accessible to others.

Fetch-Fetching retrieves changes from a remote repository without merging them into your local branches. It updates the remote-tracking branches in your local repository with the changes from the remote server.

Pull-Pulling is a combination of fetching changes and merging them into your local branch.

Remote-A remote is a version of the repository hosted on the internet or on a network. It serves as a common repository through which all team members exchange their changes.

These are just a few of the fundamental concepts and commands. Git is a powerful tool with many features and capabilities that enable efficient collaboration and version control in software development projects.

Modules to read files in a Git branch

  1. requests

  2. gitpython

requests module

To connect to a Git repository from Python, the requests module typically interacts with the repository through its remote URL, using Git's HTTP(S) protocol.

The approach works as follows:

  • We define the URL of the Git repository, which is used to interact with it over HTTP.

  • We make a GET request to this URL using requests.get().

  • We read the response line by line using the iter_lines() method of the response object and extract the branch names. Each ref line carries an object ID followed by the ref name, so branch names are found in lines containing "refs/heads/".

This assumes that the Git repository is accessible over HTTP(S) and that the server supports Git's HTTP protocol. Some repositories require authentication; in that case, include the appropriate authentication headers in the request.
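The steps above can be sketched roughly as follows. This is only a minimal sketch and not necessarily the code from the original screenshot: the function names parse_branch_names and list_remote_branches are made up for illustration, and I am assuming the server exposes Git's smart HTTP ref advertisement at <repo URL>/info/refs?service=git-upload-pack. The parsing is kept separate from the network call so it can be tried on sample lines without a server.

```python
def parse_branch_names(lines):
    """Extract branch names from the lines of a Git ref advertisement.

    Each ref line looks like "<object-id> refs/heads/<branch>". It may be
    preceded by a pkt-line length prefix and, on the first ref line,
    followed by a NUL byte and a capability list.
    """
    marker = b"refs/heads/"
    branches = []
    for line in lines:
        # Drop the capability list that can follow a NUL byte.
        ref_part = line.split(b"\0", 1)[0]
        # Split "<prefix+object-id> <ref-name>" on the first whitespace.
        parts = ref_part.split(None, 1)
        if len(parts) != 2:
            continue
        ref_name = parts[1].strip()
        if ref_name.startswith(marker):
            branches.append(ref_name[len(marker):].decode("utf-8"))
    return branches


def list_remote_branches(repo_url, auth=None, timeout=10):
    """Fetch the ref advertisement over HTTP(S) and return branch names.

    The endpoint below is an assumption about the server, see the text.
    """
    import requests  # third-party; imported here so the parser works without it

    response = requests.get(
        f"{repo_url}/info/refs",
        params={"service": "git-upload-pack"},
        auth=auth,  # e.g. ("user", "token") when the repo needs authentication
        timeout=timeout,
    )
    response.raise_for_status()
    return parse_branch_names(response.iter_lines())
```

A call such as list_remote_branches("https://example.com/team/project.git") would then return a list of branch names; the URL here is a placeholder, not a real repository.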

gitpython

Install GitPython with the pip install GitPython command. Set the repo_path, branch, and file_paths variables according to your Git repository structure and the files you intend to read.

GitPython provides a high-level interface for interacting with Git repositories, and we can use it to read files from a Git repository in Python.

  • Import the Repo class from the git module of GitPython.

  • Define a function read_files_from_git that takes the repository path (repo_path), the branch name, and a list of file paths as input.

  • Inside the function, open the Git repository using Repo(repo_path).

  • Retrieve the commit at the head of the specified branch using repo.commit(branch).

  • Iterate over the list of file paths and read the contents of each file from the commit's tree object using commit.tree[file_path].data_stream.read().decode('utf-8').

  • Return a Python dictionary with the file paths as keys and the file contents as values.
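Put together, the steps above could look like the sketch below. It is a reconstruction from the description, not the code from the original screenshot. The helper read_files_from_commit is my own addition (it is not part of GitPython); it holds the reading loop so that part can be tried without a real repository. The function assumes the files are UTF-8 text.

```python
def read_files_from_commit(commit, file_paths, encoding="utf-8"):
    """Return a {file_path: text} dictionary for the paths in a commit's tree.

    Hypothetical helper: works with any object that offers
    commit.tree[path].data_stream.read(), as GitPython commits do.
    """
    contents = {}
    for file_path in file_paths:
        blob = commit.tree[file_path]  # look up the file (blob) by its path
        contents[file_path] = blob.data_stream.read().decode(encoding)
    return contents


def read_files_from_git(repo_path, branch, file_paths):
    """Read files from the head commit of `branch` in a local repository."""
    from git import Repo  # provided by GitPython; imported on first use

    repo = Repo(repo_path)        # open the local Git repository
    commit = repo.commit(branch)  # commit at the head of the branch
    return read_files_from_commit(commit, file_paths)
```

For example, read_files_from_git("/path/to/repo", "main", ["README.md"]) would return a dictionary with one entry; the path, branch, and file name here are placeholders to replace with your own.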

Comparison of requests and gitpython

Purpose:

Requests: requests is primarily used to make HTTP requests. It allows you to interact with web services, APIs, and web pages.

GitPython: It provides an interface to interact with Git repositories directly from Python code and allows you to perform various Git operations, e.g. cloning, committing, branching, and merging.

Level of Abstraction

Requests: requests operates at a lower level of abstraction, dealing directly with HTTP requests and responses. It provides functionality for sending HTTP requests, handling responses, managing cookies, and setting headers.

GitPython: It operates at a higher level of abstraction, providing an object-oriented interface to interact with a Git repo. It abstracts away many of the low-level details of Git commands and allows you to perform Git operations programmatically.

Use Cases

Requests: requests is used for common tasks such as consuming RESTful APIs, fetching web pages, downloading files, and interacting with web services.

GitPython: GitPython is used to work with Git repositories programmatically, for example automating Git operations, analyzing repository data, extracting file contents, or integrating Git functionality into larger Python applications.
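To make the REST API use case for requests a bit more concrete, here is a hedged sketch of reading one file from a branch through a hosting service's REST API. The article does not name a hosting service, so this assumes GitHub's "contents" endpoint purely as an illustration; the function names are hypothetical, and other hosts use different URLs and response formats.

```python
import base64


def decode_contents_payload(payload):
    """Decode file text from a contents-style JSON payload.

    Assumes the payload carries the file as base64 text in "content",
    with "encoding" set to "base64".
    """
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # The base64 text may contain line breaks; b64decode skips them.
    return base64.b64decode(payload["content"]).decode("utf-8")


def read_file_via_rest(owner, repo, path, branch, token=None):
    """Read one file from a branch via GitHub's contents endpoint (assumed host)."""
    import requests  # third-party; imported here so the decoder works without it

    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authentication header for private repositories.
        headers["Authorization"] = f"Bearer {token}"
    response = requests.get(
        url, params={"ref": branch}, headers=headers, timeout=10
    )
    response.raise_for_status()
    return decode_contents_payload(response.json())
```

Unlike the GitPython approach, this needs no local clone, but it depends on the hosting service's API being reachable and on its rate limits.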

Conclusion
Depending on the development environment, a developer can choose either requests or gitpython; for this task, both offer more or less the same kind of functionality. The requests module, however, has the added advantage of being able to make REST API requests.
