Git: Transparent Encryption for Private Repositories

The main purpose is simple: if a private repository leaks, the attacker should not be able to read the code. A stolen GitHub token, an accidental visibility change, or an exposed mirror should reveal encrypted text, not source files.

The hard part is keeping development pleasant. Source code should remain plaintext locally so editors, build tools, and AI coding agents can work normally. Encryption should happen at the Git boundary, before content becomes part of commits or reaches the remote.

Target state:

Working tree stays plaintext.
Git blobs are encrypted.
GitHub stores encrypted ASCII text.
New content is encrypted automatically on git add.
Old commits are rewritten so historical versions are encrypted too.

This is different from using SOPS for a few secret files. SOPS is great for secrets, but full source trees need transparent encryption. A Git clean/smudge filter fits that job: the clean filter encrypts when staging, and the smudge filter decrypts when checking out.

The filter must be deterministic. If the same plaintext produced different ciphertext every time, Git would constantly think files changed. Deterministic authenticated encryption, such as AES-256-SIV, keeps git status stable while still detecting tampering.
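To see why determinism matters, it helps to recall that Git identifies a blob by a hash of its bytes. The sketch below computes that ID (assuming the default SHA-1 object format) and compares two stand-in transforms. `fake_deterministic` and `fake_randomized` are hypothetical placeholders that only model the behaviour; they are not encryption.

```python
import hashlib
import os


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def fake_deterministic(plaintext: bytes) -> bytes:
    # Stand-in only, NOT encryption: the same input always gives the same bytes.
    return b"ct:" + plaintext[::-1]


def fake_randomized(plaintext: bytes) -> bytes:
    # Stand-in only, NOT encryption: a fresh random prefix models a random nonce.
    return os.urandom(12) + plaintext[::-1]


source = b"print('hello')\n"

# Deterministic output: cleaning the same file twice yields the same blob ID,
# so Git sees no change.
same = git_blob_id(fake_deterministic(source)) == git_blob_id(fake_deterministic(source))
print(f"deterministic ids match: {same}")

# Randomized output: each run of the clean filter yields new bytes and a new
# blob ID, so an untouched file would look modified.
same = git_blob_id(fake_randomized(source)) == git_blob_id(fake_randomized(source))
print(f"randomized ids match: {same}")
```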

This protects source contents and historical file versions on the remote. It does not hide commit messages, filenames, file sizes, branch names, timing, logs, build artifacts, or anything copied outside Git. It also does not protect against malware or anyone with access to the unlocked development machine or the local encryption key.

Repo Layout

The encrypted Git blob starts with a clear marker:

repo-crypt-v1:aes-256-siv

The encrypted payload follows as base64 text.

GitHub can render the file as text and show commit history, but the visible content is ciphertext.
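The framing can be sketched in a few lines of Python. The header string is the one shown above. The rest is an assumption for illustration: the helper names `wrap` and `unwrap`, a single unwrapped base64 line, and a trailing newline. The real script may lay out the payload differently.

```python
import base64

HEADER = b"repo-crypt-v1:aes-256-siv\n"


def wrap(ciphertext: bytes) -> bytes:
    """Frame raw ciphertext as the marker line followed by base64 text."""
    return HEADER + base64.b64encode(ciphertext) + b"\n"


def is_wrapped(blob: bytes) -> bool:
    """True if the blob starts with the repo-crypt marker line."""
    return blob.startswith(HEADER)


def unwrap(blob: bytes) -> bytes:
    """Return the raw ciphertext from a framed blob, or raise ValueError."""
    if not is_wrapped(blob):
        raise ValueError("missing repo-crypt header")
    # b64decode ignores the trailing newline by default.
    return base64.b64decode(blob[len(HEADER):])
```

Because the result is pure ASCII, hosting platforms treat the blob as an ordinary text file.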

The only plaintext tracked file required by Git is .gitattributes:

* filter=repo-crypt -text
.gitattributes -filter text eol=lf

That tells Git to apply the encryption filter to every tracked file except .gitattributes itself.

Key Setup

Use a dedicated repository encryption key. Do not reuse existing SOPS, infrastructure, SSH, or GPG keys.

Create a local symmetric key:

umask 077
mkdir -p "$HOME/.config/repo-crypt"
chmod 700 "$HOME/.config/repo-crypt"
openssl rand -base64 64 > "$HOME/.config/repo-crypt/repos-key.txt"
chmod 600 "$HOME/.config/repo-crypt/repos-key.txt"

Back up this file securely:

$HOME/.config/repo-crypt/repos-key.txt

If this key is lost, the encrypted repository cannot be decrypted.
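A filter script would typically validate the key file before using it. The following is a minimal sketch, assuming the script uses the 64 decoded bytes directly as the AES-SIV key and runs on a POSIX system. `load_key` and the constants are hypothetical names, not part of the original script.

```python
import base64
import os
import stat
from pathlib import Path

KEY_PATH = Path.home() / ".config" / "repo-crypt" / "repos-key.txt"
KEY_BYTES = 64  # AES-SIV splits its key in two, here into two 256-bit halves


def load_key(path: Path = KEY_PATH) -> bytes:
    """Read the base64 key file, refusing loose permissions or a wrong size."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} is accessible to group/other (mode {mode:o})")
    # openssl wraps its base64 output across lines; b64decode skips the newlines.
    key = base64.b64decode(path.read_bytes())
    if len(key) != KEY_BYTES:
        raise ValueError(f"expected {KEY_BYTES} key bytes, got {len(key)}")
    return key
```

Failing loudly here is deliberate: a truncated or world-readable key is easier to fix before any blob has been encrypted with it.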

Git Filter Setup

Install the Python dependency:

python3 -m pip install cryptography

Place the repo encryption script somewhere local to the repository, for example:

.git/repo-crypt-filter.py

Configure Git to use it:

chmod 700 .git/repo-crypt-filter.py
git config filter.repo-crypt.clean "python3 $(pwd)/.git/repo-crypt-filter.py clean --path %f"
git config filter.repo-crypt.smudge "python3 $(pwd)/.git/repo-crypt-filter.py smudge --path %f"
git config filter.repo-crypt.required true

required=true matters. If the filter is missing or broken, Git should fail instead of silently committing plaintext.
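The filter script itself is not reproduced in this article. As a rough sketch of what its clean and smudge halves might look like, the code below uses `AESSIV` from the cryptography package. Two things are assumptions made for illustration: binding the file path as associated data, and passing unframed blobs through untouched on smudge. All function names are hypothetical.

```python
import base64
import sys

HEADER = b"repo-crypt-v1:aes-256-siv\n"


def make_cipher(key: bytes):
    # Imported lazily so the helpers below can be exercised without the dependency.
    from cryptography.hazmat.primitives.ciphers.aead import AESSIV
    return AESSIV(key)  # accepts 32-, 48- or 64-byte keys


def clean(data: bytes, cipher, path: str) -> bytes:
    """Working tree -> Git: encrypt, unless the content is already framed."""
    if data.startswith(HEADER):
        return data  # already ciphertext; keeps the filter idempotent
    # Assumption: the path is authenticated as associated data.
    ciphertext = cipher.encrypt(data, [path.encode()])
    return HEADER + base64.b64encode(ciphertext) + b"\n"


def smudge(blob: bytes, cipher, path: str) -> bytes:
    """Git -> working tree: decrypt framed blobs, pass anything else through."""
    if not blob.startswith(HEADER):
        return blob
    ciphertext = base64.b64decode(blob[len(HEADER):])
    return cipher.decrypt(ciphertext, [path.encode()])


def run(mode: str, path: str, cipher) -> None:
    """Git pipes the file content to stdin and reads the result from stdout."""
    data = sys.stdin.buffer.read()
    out = clean(data, cipher, path) if mode == "clean" else smudge(data, cipher, path)
    sys.stdout.buffer.write(out)
```

A real script would parse `clean`, `smudge` and `--path` from its arguments and then call `run`. Edge cases such as empty files are not handled in this sketch.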

Encrypting Existing History

Adding the filter only protects future commits. It does not encrypt old commits.

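To protect historical source code, rewrite the entire history so every historical blob becomes encrypted. The script is referenced by absolute path because filter-branch runs its filters from a temporary directory, where a relative .git/ path would not resolve:

FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter "python3 '$(pwd)/.git/repo-crypt-filter.py' index-filter" -- --all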
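The `index-filter` subcommand of the script is not shown here. One plausible shape, offered only as a sketch, is to read the index of the commit being rewritten, swap each blob ID for the ID of its encrypted counterpart, and feed the result back to Git. `rewrite_index_entries`, `apply_to_index` and the `encrypt_blob` callback are hypothetical names.

```python
import subprocess
from typing import Callable

PLAINTEXT_PATHS = {".gitattributes"}
GITLINK_MODE = "160000"  # submodule entries point at commits, not blobs


def rewrite_index_entries(ls_files_z: bytes,
                          encrypt_blob: Callable[[str, str], str]) -> bytes:
    """Map `git ls-files -s -z` output to `git update-index -z --index-info` input.

    encrypt_blob(oid, path) must return the object ID of the encrypted blob.
    """
    out = []
    for entry in ls_files_z.split(b"\0"):
        if not entry:
            continue
        meta, path_bytes = entry.split(b"\t", 1)
        mode, oid, stage = meta.decode().split(" ")
        path = path_bytes.decode("utf-8", "surrogateescape")
        if mode != GITLINK_MODE and path not in PLAINTEXT_PATHS:
            oid = encrypt_blob(oid, path)
        out.append(f"{mode} {oid} {stage}\t".encode() + path_bytes)
    return b"\0".join(out) + (b"\0" if out else b"")


def apply_to_index(encrypt_blob: Callable[[str, str], str]) -> None:
    """Rewrite the index that filter-branch exposes for the current commit."""
    listing = subprocess.check_output(["git", "ls-files", "-s", "-z"])
    subprocess.run(
        ["git", "update-index", "-z", "--index-info"],
        input=rewrite_index_entries(listing, encrypt_blob),
        check=True,
    )
```

In this sketch `encrypt_blob` would read the blob with `git cat-file blob`, run it through the clean step, and store the result with `git hash-object -w --stdin`. Caching results per object ID and path avoids re-encrypting unchanged files in every commit. Symlink entries (mode 120000) are treated like regular files here, which may or may not suit a given repository.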

After rewriting, remove the plaintext backup refs created by filter-branch:

python3 - <<'PY'
import subprocess
refs = subprocess.check_output(['git','for-each-ref','--format=%(refname)','refs/original']).decode().splitlines()
for ref in refs:
    subprocess.check_call(['git','update-ref','-d',ref])
print(f'deleted_original_refs={len(refs)}')
PY

git reflog expire --expire=now --all
git gc --prune=now --aggressive

Important consequences:

Commit hashes change.
Signed commits and tags need re-signing.
Open pull requests against old history may no longer apply cleanly.
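Any old clone or backup may still contain plaintext history.
The remote keeps the old plaintext history until the rewritten branches and tags are force-pushed, and the host may retain unreachable objects for some time afterwards.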

Safety Hooks

Use hooks as a second line of defense.

A pre-commit hook should inspect staged blobs and reject anything that is not encrypted, except .gitattributes.
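A pre-commit check along these lines might look like the sketch below. It relies on the header marker described earlier. The function names are hypothetical, and only a top-level .gitattributes is exempted.

```python
import subprocess
import sys
from typing import List

HEADER = b"repo-crypt-v1:aes-256-siv\n"
PLAINTEXT_PATHS = {".gitattributes"}


def is_acceptable(path: str, blob: bytes) -> bool:
    """A staged blob passes if it is framed ciphertext or an allowed plaintext file."""
    return blob.startswith(HEADER) or path in PLAINTEXT_PATHS


def staged_paths() -> List[str]:
    """Paths added, copied, modified or renamed in the index."""
    out = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR", "-z"]
    )
    return [p.decode("utf-8", "surrogateescape") for p in out.split(b"\0") if p]


def staged_blob(path: str) -> bytes:
    """Raw index content for a path; `:path` names the staged version."""
    return subprocess.check_output(["git", "cat-file", "blob", f":{path}"])


def main() -> int:
    bad = [p for p in staged_paths() if not is_acceptable(p, staged_blob(p))]
    for path in bad:
        print(f"refusing to commit plaintext blob: {path}", file=sys.stderr)
    return 1 if bad else 0
```

Saved as .git/hooks/pre-commit with a final `sys.exit(main())`, a non-zero exit aborts the commit. The check reads blobs with `git cat-file`, which does not apply the smudge filter, so it sees exactly what would be committed.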

A pre-push hook should inspect objects being pushed and reject pushes containing plaintext blobs.
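For the pre-push side, Git passes the remote name as the first argument and one line per updated ref on stdin. The sketch below turns each line into a revision range and scans the blobs in it. It is an illustration under those assumptions, with hypothetical function names, and it spawns one process per object, so it favours clarity over speed.

```python
import subprocess
import sys
from typing import List, Optional

HEADER = b"repo-crypt-v1:aes-256-siv\n"
PLAINTEXT_PATHS = {".gitattributes"}


def is_zero(oid: str) -> bool:
    """Git uses an all-zero object ID for 'no object' (new or deleted ref)."""
    return set(oid) == {"0"}


def rev_list_args(local_oid: str, remote_oid: str, remote: str) -> Optional[List[str]]:
    """Revision arguments selecting the commits a ref update would send."""
    if is_zero(local_oid):
        return None  # ref deletion sends no objects
    if is_zero(remote_oid):
        # New ref: everything not already on that remote's tracking branches.
        return [local_oid, "--not", f"--remotes={remote}"]
    return [f"{remote_oid}..{local_oid}"]


def plaintext_objects(args: List[str]) -> List[str]:
    """Names of blobs in the range that lack the encryption header."""
    bad = []
    listing = subprocess.check_output(["git", "rev-list", "--objects", *args])
    for line in listing.splitlines():
        oid, _, path = line.decode("utf-8", "surrogateescape").partition(" ")
        kind = subprocess.check_output(["git", "cat-file", "-t", oid]).strip()
        if kind != b"blob" or path in PLAINTEXT_PATHS:
            continue
        if not subprocess.check_output(["git", "cat-file", "blob", oid]).startswith(HEADER):
            bad.append(path or oid)
    return bad


def main(remote: str) -> int:
    # Each stdin line: "<local ref> <local oid> <remote ref> <remote oid>".
    bad: List[str] = []
    for line in sys.stdin:
        _local_ref, local_oid, _remote_ref, remote_oid = line.split()
        args = rev_list_args(local_oid, remote_oid, remote)
        if args:
            bad += plaintext_objects(args)
    for name in bad:
        print(f"refusing to push plaintext blob: {name}", file=sys.stderr)
    return 1 if bad else 0
```

Installed as .git/hooks/pre-push, the script would end with `sys.exit(main(sys.argv[1]))`.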

This prevents common failure modes:

Someone disables or misconfigures the clean filter.
The repository is cloned on a new machine without the filter set up.
A script stages files incorrectly.
A branch with plaintext history is accidentally pushed.

Hooks are local Git files, so they must be installed on every development machine.

New Machine Workflow

The safe clone flow is to clone without checkout first:

git clone --no-checkout https://github.com/example/private-repo.git private-repo
cd private-repo

Then restore the key from its secure backup:

mkdir -p "$HOME/.config/repo-crypt"
chmod 700 "$HOME/.config/repo-crypt"
# Copy repos-key.txt from the backup into this directory, then:
chmod 600 "$HOME/.config/repo-crypt/repos-key.txt"

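Then install the filter and hooks before checkout. The filter script and hooks live under .git, which is not part of a clone, so copy them into place first: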

python3 -m pip install cryptography
chmod 700 .git/repo-crypt-filter.py
git config filter.repo-crypt.clean "python3 $(pwd)/.git/repo-crypt-filter.py clean --path %f"
git config filter.repo-crypt.smudge "python3 $(pwd)/.git/repo-crypt-filter.py smudge --path %f"
git config filter.repo-crypt.required true
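Before checking anything out, it can be worth confirming that the three settings actually took effect. A small preflight sketch, with hypothetical function names, that parses the output of `git config --get-regexp`:

```python
import subprocess
from typing import Set

REQUIRED_KEYS = {
    "filter.repo-crypt.clean",
    "filter.repo-crypt.smudge",
    "filter.repo-crypt.required",
}
TRUE_VALUES = {"true", "yes", "on", "1"}


def missing_settings(config_listing: str) -> Set[str]:
    """Return the filter keys that are absent, empty, or not enabled."""
    found = {}
    for line in config_listing.splitlines():
        key, _, value = line.partition(" ")
        found[key] = value.strip()
    missing = {key for key in REQUIRED_KEYS if not found.get(key)}
    if found.get("filter.repo-crypt.required", "").lower() not in TRUE_VALUES:
        missing.add("filter.repo-crypt.required")
    return missing


def read_filter_config() -> str:
    # A non-zero exit status just means no keys matched, so it is not an error here.
    proc = subprocess.run(
        ["git", "config", "--get-regexp", r"^filter\.repo-crypt\."],
        capture_output=True, text=True,
    )
    return proc.stdout
```

Calling `missing_settings(read_filter_config())` inside the fresh clone should return an empty set once setup is complete.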

Only then check out the branch:

git checkout main

Expected result:

Working tree files are plaintext.
Git blobs remain encrypted.
git status is clean.

Verification

Check that the working tree is plaintext while Git stores ciphertext:

python3 - <<'PY'
from pathlib import Path
import subprocess, sys

HEADER=b'repo-crypt-v1:aes-256-siv\n'
path='README.md'

worktree_encrypted = Path(path).read_bytes().startswith(HEADER)
blob_encrypted = subprocess.check_output(['git','show',f'HEAD:{path}']).startswith(HEADER)

print(f'worktree_encrypted={worktree_encrypted}')
print(f'git_blob_encrypted={blob_encrypted}')

if worktree_encrypted or not blob_encrypted:
    sys.exit(1)
PY

Expected:

worktree_encrypted=False
git_blob_encrypted=True

Check all reachable Git blobs:

python3 - <<'PY'
import subprocess, sys

HEADER=b'repo-crypt-v1:aes-256-siv\n'
lines=subprocess.check_output(['git','rev-list','--objects','--all']).splitlines()
bad=[]
checked=0

for line in lines:
    parts=line.split(b' ', 1)
    oid=parts[0].decode()
    path=parts[1].decode('utf-8','surrogateescape') if len(parts) == 2 else ''
    if subprocess.check_output(['git','cat-file','-t',oid]).strip() != b'blob':
        continue
    data=subprocess.check_output(['git','cat-file','-p',oid])
    checked += 1
    if data.startswith(HEADER):
        continue
    if path == '.gitattributes' and b'filter=repo-crypt' in data:
        continue
    bad.append(path or oid)

print(f'checked_blobs={checked} bad={len(bad)}')
if bad:
    print('\n'.join(bad[:50]))
    sys.exit(1)
PY

Expected:

checked_blobs=<number of blobs> bad=0

Daily Workflow

Development stays normal:

git status
git add <files>
git commit -m "message"
git push

The important detail is that encryption happens on git add, not on git push.

Push is just transport. If plaintext entered Git history, push would send plaintext unless a hook blocks it.

Reviewing Old Commits

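GitHub cannot decrypt commit diffs. Local history commands such as git log -p and git diff between commits also show ciphertext by default, because the smudge filter only runs when files are written to the working tree.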

One practical workaround is a local helper command that reads encrypted blobs from two commits, decrypts them locally, and prints a normal unified diff.

Example interface:

git-decr-commit
git-decr-commit --prev 2
git-decr-commit --hash <commit-hash>

This keeps the remote blind while preserving local review capability.
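The core of such a helper is small. The sketch below is not the actual git-decr-commit implementation. It assumes a `decrypt` callable that turns a stored blob back into plaintext and uses Python's difflib to produce the diff; `blob_at` and `decrypted_diff` are hypothetical names.

```python
import difflib
import subprocess
from typing import Callable


def blob_at(commit: str, path: str) -> bytes:
    """Raw (still encrypted) content of a file as stored in a commit."""
    return subprocess.check_output(["git", "cat-file", "blob", f"{commit}:{path}"])


def decrypted_diff(old_blob: bytes, new_blob: bytes,
                   decrypt: Callable[[bytes], bytes], path: str) -> str:
    """Decrypt two versions of a file and return a unified diff as text."""
    old = decrypt(old_blob).decode("utf-8", "replace").splitlines(keepends=True)
    new = decrypt(new_blob).decode("utf-8", "replace").splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old, new, fromfile=f"a/{path}", tofile=f"b/{path}")
    )
```

A wrapper would resolve --prev or --hash to a pair of commits, list the changed paths between them, and print `decrypted_diff(blob_at(old, p), blob_at(new, p), decrypt, p)` for each path. Files added or deleted between the two commits would need separate handling.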

Final Notes

This pattern is defense in depth for private source code. It does not replace good access control, 2FA, scoped tokens, secret scanning, CI isolation, or backups.

It does change the security default, though. A leaked private Git repository becomes much less useful to an attacker because the valuable part, the source files and their historical versions, is encrypted before it ever reaches the remote.