Our Git repositories started out as parts of a single monster SVN repository where the individual projects each had their own tree like so:
project1/branches
        /tags
        /trunk
project2/branches
        /tags
        /trunk
Obviously, it was pretty easy to move files from one to another with svn mv. But in Git, each project is in its own repository, and today I was asked to move a subdirectory from project2 to project1. I did something like this:
$ git clone project2
$ cd project2
$ git filter-branch --subdirectory-filter deeply/buried/java/source/directory/A -- --all
$ git remote rm origin # so I don't accidentally overwrite the repo ;-)
$ mkdir -p deeply/buried/different/java/source/directory/B
$ for f in *.java; do
>   git mv "$f" deeply/buried/different/java/source/directory/B
> done
$ git commit -m "moved files to new subdirectory"
$ cd ..
$
$ git clone project1
$ cd project1
$ git remote add p2 ../project2
$ git fetch p2
$ git branch p2 remotes/p2/master
$ git merge p2 # --allow-unrelated-histories for git 2.9+
$ git remote rm p2
$ git push
But that seems pretty convoluted. Is there a better way to do this sort of thing in general? Or have I adopted the right approach?
Note that this involves merging the history into an existing repository, rather than simply creating a new standalone repository from part of another one (as in an earlier question).
git fetch p2 && git merge p2 instead of git fetch p2 && git branch .. && git merge p2? Edit: alright, it looks like you want to get the changes in a new branch named p2, not the current branch.
git filter-repo is the correct tool for doing this in 2021, rather than filter-branch.
If your history is sane, you can take the commits out as patches and apply them in the new repository:
cd repository
git log --pretty=email --patch-with-stat --reverse --full-index --binary -- path/to/file_or_folder > patch
cd ../another_repository
git am --committer-date-is-author-date < ../repository/patch
Or in one line
git log --pretty=email --patch-with-stat --reverse --full-index --binary -- path/to/file_or_folder | (cd /path/to/new_repository && git am --committer-date-is-author-date)
(Taken from Exherbo’s docs)
Having tried various approaches to move a file or folder from one Git repository to another, the only one which seems to work reliably is outlined below.
It involves cloning the repository you want to move the file or folder from, moving that file or folder to the root, rewriting Git history, cloning the target repository and pulling the file or folder with history directly into this target repository.
Stage One
Make a copy of repository A as the following steps make major changes to this copy which you should not push! git clone --branch
Stage Two
Cleanup step: git reset --hard
Cleanup step: git gc --aggressive
Cleanup step: git prune
You may want to import these files into repository B within a directory not the root:
Make that directory mkdir
Stage Three
Make a copy of repository B if you don’t have one already git clone
Yep, hitting on the --subdirectory-filter of filter-branch was key. The fact that you used it essentially proves there's no easier way - you had no choice but to rewrite history, since you wanted to end up with only a (renamed) subset of the files, and this by definition changes the hashes. Since none of the standard commands (e.g. pull) rewrite history, there's no way you could use them to accomplish this.
You could refine the details, of course - some of your cloning and branching wasn't strictly necessary - but the overall approach is good! It's a shame it's complicated, but of course, the point of git isn't to make it easy to rewrite history.
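To make the "this by definition changes the hashes" point concrete, here is a throwaway sketch that builds a tiny repository and compares the commit ID before and after --subdirectory-filter. The layout (sub/dir/a.txt), the demo identity and the temp directory are all made up for illustration; nothing here touches a real repository.

```shell
#!/bin/sh
set -eu
# Hypothetical demo identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
mkdir -p sub/dir
echo one > sub/dir/a.txt
echo other > unrelated.txt
git add .
git commit -q -m "add files"

before=$(git rev-parse HEAD)
# Keep only sub/dir and make it the new root; every commit gets rewritten.
git filter-branch -f --subdirectory-filter sub/dir -- --all >/dev/null 2>&1
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"
ls   # only a.txt remains, now at the repository root
```

The two IDs differ because the commit now points at a different tree, which is why a plain pull or merge alone cannot produce this result.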
--index-filter in the filter-branch manpage.
This becomes simpler by using git-filter-repo.
In order to move project2/sub/dir to project1/sub/dir:
# Create a new repo containing only the subdirectory:
git clone project2 project2_clone --no-local
cd project2_clone
git filter-repo --path sub/dir
# Merge the new repo:
cd ../project1
git remote add tmp ../project2_clone/
git fetch tmp master
git merge remotes/tmp/master --allow-unrelated-histories
git remote remove tmp
To install the tool simply: pip3 install git-filter-repo
(more details and options in README)
# Before: (root)
.
|-- project1
| `-- 3
`-- project2
|-- 1
`-- sub
`-- dir
`-- 2
# After: (project1)
.
├── 3
└── sub
└── dir
└── 2
Between the git remote add and the git merge you need to run git fetch to make the target repository aware of the changes in the source repository.
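A minimal sketch of why the fetch matters, using two throwaway repositories (the names source, target and tmp are invented for the demo): right after git remote add there are no remote-tracking refs to merge, and they only appear once git fetch has run.

```shell
#!/bin/sh
set -eu
# Hypothetical demo identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
# Two unrelated repositories, each with one commit.
for name in source target; do
    git init -q "$work/$name"
    echo "$name" > "$work/$name/$name.txt"
    git -C "$work/$name" add .
    git -C "$work/$name" commit -q -m "initial commit in $name"
done

cd "$work/target"
git remote add tmp "$work/source"
# Right after `git remote add`, no remote-tracking refs exist yet.
refs_before=$(git for-each-ref refs/remotes/tmp | wc -l | tr -d ' ')
git fetch -q tmp
refs_after=$(git for-each-ref refs/remotes/tmp | wc -l | tr -d ' ')
echo "remote-tracking refs before fetch: $refs_before, after: $refs_after"

# Merge whichever branch the source repository has checked out.
branch=$(git -C "$work/source" symbolic-ref --short HEAD)
git merge -q --allow-unrelated-histories -m "merge source history" "tmp/$branch"
git remote remove tmp
git log --oneline
```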
git filter-repo --path sub/dir --path-rename sub:newsub to get a tree of /newsub/dir. This tool makes the process extremely simple.
git filter-repo --path CurrentPathAfterRename --path OldPathBeforeRename. git filter-repo --analyze produces a file renames.txt that can be helpful in determining these. Alternatively, you may find a script like this helpful.
In the git filter-repo command arguments, just add a --path argument for each individual file or directory you want to move.
I found Ross Hendrickson's blog very useful. It is a very simple approach where you create patches that are applied to the new repo. See the linked page for more details.
It only contains three steps (copied from the blog):
# Setup a directory to hold the patches
mkdir <patch-directory>
# Create the patches
git format-patch -o <patch-directory> --root /path/to/copy
# Apply the patches in the new repo using a 3 way merge in case of conflicts
# (merges from the other repo are not turned into patches).
# The 3way can be omitted.
git am --3way <patch-directory>/*.patch
The only issue I had was that I could not apply all patches at once using
git am --3way <patch-directory>/*.patch
Under Windows I got an InvalidArgument error. So I had to apply all patches one after another.
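If passing the whole glob to git am fails in your shell, applying the patches one at a time can be scripted. Below is a minimal sketch in POSIX shell (I am assuming something like Git Bash is available on Windows); the repositories, file names and patch directory are throwaway stand-ins created just for the demo.

```shell
#!/bin/sh
set -eu
# Hypothetical demo identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
patches="$work/patches"   # stand-in for <patch-directory>

# Source repository with two commits touching lib/.
git init -q "$work/src"
cd "$work/src"
mkdir lib
echo v1 > lib/util.txt
git add . && git commit -q -m "add util"
echo v2 >> lib/util.txt
git commit -q -am "extend util"
git format-patch -q -o "$patches" --root HEAD -- lib

# Destination repository with its own unrelated commit.
git init -q "$work/dst"
cd "$work/dst"
echo readme > README
git add . && git commit -q -m "initial commit"

# Apply the patches one by one; the 0001-, 0002- prefixes keep them in order.
for p in "$patches"/*.patch; do
    git am -q --3way "$p"
done
git log --oneline
```

The loop stops at the first patch that fails to apply (because of set -e), which leaves git am in its usual state for resolving and continuing by hand.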
KEEPING THE DIRECTORY NAME
The subdirectory-filter (or the shorter command git subtree) works well, but it did not work for me since both remove the directory name from the commit info. In my scenario I just want to merge parts of one repository into another and retain the history WITH the full path name.
My solution was to use the tree-filter to simply remove the unwanted files and directories from a temporary clone of the source repository, then pull from that clone into my target repository in a few simple steps.
# 1. clone the source
git clone ssh://<user>@<source-repo url>
cd <source-repo>
# 2. remove the stuff we want to exclude
git filter-branch --tree-filter "rm -rf <files to exclude>" --prune-empty HEAD
# 3. move to target repo and create a merge branch (for safety)
cd <path to target-repo>
git checkout -b <merge branch>
# 4. Add the source-repo as remote
git remote add source-repo <path to source-repo>
# 5. pull it in (fetch and merge)
git pull source-repo master # --allow-unrelated-histories for git 2.9+
# 6. check that you got it right (better safe than sorry, right?)
gitk
The one I always use is here http://blog.neutrino.es/2012/git-copy-a-file-or-directory-from-another-repository-preserving-history/ . Simple and fast.
For compliance with stackoverflow standards, here is the procedure:
mkdir /tmp/mergepatchs
cd ~/repo/org
export reposrc=myfile.c #or mydir
git format-patch -o /tmp/mergepatchs $(git log $reposrc|grep ^commit|tail -1|awk '{print $2}')^..HEAD $reposrc
cd ~/repo/dest
git am /tmp/mergepatchs/*.patch
If git log displays in color for you, the grep ^commit might not work. If so, add --no-color to that git log command (e.g., git log --no-color $reposrc).
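One way to sidestep the color problem altogether is to ask plumbing for the oldest commit instead of parsing git log output. This is only a sketch of that idea, not the blog's procedure: git rev-list prints bare commit IDs, so no grep or awk is needed. The repository and file contents below are made up for the demo.

```shell
#!/bin/sh
set -eu
# Hypothetical demo identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/org"
cd "$work/org"
reposrc=myfile.c   # or a directory
echo a > other.txt && git add . && git commit -q -m "unrelated"
echo 'int x;' > "$reposrc" && git add . && git commit -q -m "add myfile.c"
echo 'int y;' >> "$reposrc" && git commit -q -am "edit myfile.c"

# Oldest commit touching $reposrc; rev-list output is never colored.
first=$(git rev-list --reverse HEAD -- "$reposrc" | head -n 1)
git format-patch -q -o "$work/mergepatchs" "$first^..HEAD" -- "$reposrc"
ls "$work/mergepatchs"
```

Like the original one-liner, this relies on `$first^` existing, so it does not work when the oldest commit touching the path is the root commit of the repository.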
Having had a similar itch to scratch (although only for some files of a given repository), this script proved to be really helpful: git-import
The short version is that it creates patch files of the given file or directory ($object) from the existing repository:
cd old_repo
git format-patch --thread -o "$temp" --root -- "$object"
which then get applied to a new repository:
cd new_repo
git am "$temp"/*.patch
For details please look up:
the documented source
git format-patch
git am
Update (from another author) This useful approach can be used by the following bash function. Here is an example usage:
gitcp <Repo1_basedir> <path_inside_repo1> <Repo2_basedir>
gitcp ()
{
fromdir="$1";
frompath="$2";
to="$3";
echo "Moving git files from $fromdir at $frompath to $to ..";
tmpdir=/tmp/gittmp;
# Stop if a directory is missing, rather than running git in the wrong place
cd "$fromdir" || return 1;
git format-patch --thread -o "$tmpdir" --root -- "$frompath";
cd "$to" || return 1;
git am "$tmpdir"/*.patch
}
This answer provides interesting commands based on git am, presented step by step using examples.
Objective
You want to move some or all files from one repository to another.
You want to keep their history.
But you do not care about keeping tags and branches.
You accept limited history for renamed files (and files in renamed directories).
Procedure
1. Extract history in email format using git log --pretty=email -p --reverse --full-index --binary
2. Reorganize file tree and update filename change in history [optional]
3. Apply new history using git am
1. Extract history in email format
Example: Extract history of file3, file4 and file5
my_repo
├── dirA
│ ├── file1
│ └── file2
├── dirB ^
│ ├── subdir | To be moved
│ │ ├── file3 | with history
│ │ └── file4 |
│ └── file5 v
└── dirC
├── file6
└── file7
Clean the temporary directory destination
export historydir=/tmp/mail/dir # Absolute path
rm -rf "$historydir" # Caution when cleaning
Clean your source repo
git commit ... # Commit your working files
rm .gitignore # Disable gitignore
git clean -n # Simulate removal
git clean -f # Remove untracked file
git checkout .gitignore # Restore gitignore
Extract history of each file in email format
cd my_repo/dirB
find -name .git -prune -o -type d -o -exec bash -c 'mkdir -p "$historydir/${0%/*}" && git log --pretty=email -p --stat --reverse --full-index --binary -- "$0" > "$historydir/$0"' {} ';'
Unfortunately, option --follow or --find-copies-harder cannot be combined with --reverse. This is why history is cut when a file is renamed (or when a parent directory is renamed).
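To see what "history is cut" means in practice, here is a small throwaway demo (file names invented for the example): a path-limited git log without --follow only sees the commits made under the current name, which is exactly what the extraction command above captures.

```shell
#!/bin/sh
set -eu
# Hypothetical demo identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
echo content > old.txt
git add . && git commit -q -m "add old.txt"
echo more >> old.txt
git commit -q -am "edit old.txt"
git mv old.txt new.txt
git commit -q -m "rename old.txt to new.txt"

# Count the commits each variant reports for the renamed file.
plain=$(git log --oneline -- new.txt | wc -l | tr -d ' ')
followed=$(git log --oneline --follow -- new.txt | wc -l | tr -d ' ')
echo "without --follow: $plain commit(s); with --follow: $followed commit(s)"
```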
After: Temporary history in email format
/tmp/mail/dir
├── subdir
│ ├── file3
│ └── file4
└── file5
2. Reorganize file tree and update filename change in history [optional]
Suppose you want to move these three files into this other repo (can be the same repo).
my_other_repo
├── dirF
│ ├── file55
│ └── file56
├── dirB # New tree
│ ├── dirB1 # was subdir
│ │ ├── file33 # was file3
│ │ └── file44 # was file4
│ └── dirB2 # new dir
│ └── file5 # = file5
└── dirH
└── file77
Therefore reorganize your files:
cd /tmp/mail/dir
mkdir dirB
mv subdir dirB/dirB1
mv dirB/dirB1/file3 dirB/dirB1/file33
mv dirB/dirB1/file4 dirB/dirB1/file44
mkdir dirB/dirB2
mv file5 dirB/dirB2
Your temporary history is now:
/tmp/mail/dir
└── dirB
├── dirB1
│ ├── file33
│ └── file44
└── dirB2
└── file5
Change also filenames within the history:
cd "$historydir"
find * -type f -exec bash -c 'sed "/^diff --git a\|^--- a\|^+++ b/s:\( [ab]\)/[^ ]*:\1/$0:g" -i "$0"' {} ';'
Note: This rewrites the history to reflect the change of path and filename. (i.e. the change of the new location/name within the new repo)
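For a feel of what that substitution does, the sketch below runs an equivalent expression over a hand-written patch header. Two things here are my own assumptions: the sample patch text is invented, and I use sed -E with `|` instead of the `\|` form so that it also runs on non-GNU sed. Only the diff header lines are touched; hunk content is left alone.

```shell
#!/bin/sh
set -eu
new_path=dirB/dirB1/file33   # hypothetical new location of the file

# Feed a fake patch through the path-rewriting expression.
result=$(printf '%s\n' \
    'diff --git a/subdir/file3 b/subdir/file3' \
    '--- a/subdir/file3' \
    '+++ b/subdir/file3' \
    '@@ -1 +1,2 @@' \
    ' unchanged a/line' \
    '+added line' |
  sed -E "/^(diff --git a|--- a|\+\+\+ b)/s:( [ab])/[^ ]*:\1/$new_path:g")

printf '%s\n' "$result"
```

Note that paths containing spaces would not be matched correctly by `[^ ]*`, in this variant as in the original command.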
3. Apply new history
Your other repo is:
my_other_repo
├── dirF
│ ├── file55
│ └── file56
└── dirH
└── file77
Apply commits from temporary history files:
cd my_other_repo
find "$historydir" -type f -exec cat {} + | git am
Your other repo is now:
my_other_repo
├── dirF
│ ├── file55
│ └── file56
├── dirB ^
│ ├── dirB1 | New files
│ │ ├── file33 | with
│ │ └── file44 | history
│ └── dirB2 | kept
│ └── file5 v
└── dirH
└── file77
Use git status to see the number of commits ready to be pushed :-)
Note: As the history has been rewritten to reflect the path and filename change: (i.e. compared to the location/name within the previous repo)
No need to git mv to change the location/filename.
No need to git log --follow to access full history.
Extra trick: Detect renamed/moved files within your repo
To list the files having been renamed:
find -name .git -prune -o -exec git log --pretty=tformat:'' --numstat --follow {} ';' | grep '=>'
More customizations: You can complete the command git log using options --find-copies-harder or --reverse. You can also remove the first two columns using cut -f3- and grepping the complete pattern '{.* => .*}'.
find -name .git -prune -o -exec git log --pretty=tformat:'' --numstat --follow --find-copies-harder --reverse {} ';' | cut -f3- | grep '{.* => .*}'
Try this
cd repo1
This will remove all the directories except the ones mentioned, preserving history only for these directories
git filter-branch --index-filter 'git rm --ignore-unmatch --cached -qr -- . && git reset -q $GIT_COMMIT -- dir1/ dir2/ dir3/ ' --prune-empty -- --all
Now you can add your new repo in your git remote and push it to that
git remote remove origin
git remote add origin <new-repo>
git push origin <current-branch>
Add -f to overwrite.
Using inspiration from http://blog.neutrino.es/2012/git-copy-a-file-or-directory-from-another-repository-preserving-history/ , I created this Powershell function for doing the same, which has worked great for me so far:
# Migrates the git history of a file or directory from one Git repo to another.
# Start in the root directory of the source repo.
# Also, before running this, I recommend that $destRepoDir be on a new branch that the history will be migrated to.
# Inspired by: http://blog.neutrino.es/2012/git-copy-a-file-or-directory-from-another-repository-preserving-history/
function Migrate-GitHistory
{
param(
    # The file or directory within the current Git repo to migrate.
    [string] $fileOrDir,
    # Path to the destination repo
    [string] $destRepoDir,
    # A temp directory to use for storing the patch file (optional)
    [string] $tempDir = "\temp\migrateGit"
)
mkdir $tempDir
# git log $fileOrDir -- to list commits that will be migrated
Write-Host "Generating patch files for the history of $fileOrDir ..." -ForegroundColor Cyan
git format-patch -o $tempDir --root -- $fileOrDir
cd $destRepoDir
Write-Host "Applying patch files to restore the history of $fileOrDir ..." -ForegroundColor Cyan
ls $tempDir -Filter *.patch `
| foreach { git am $_.FullName }
}
Usage for this example:
git clone project2
git clone project1
cd project1
# Create a new branch to migrate to
git checkout -b migrate-from-project2
cd ..\project2
Migrate-GitHistory "deeply\buried\java\source\directory\A" "..\project1"
After you've done this, you can re-organize the files on the migrate-from-project2 branch before merging it.
I wanted something robust and reusable (one-command-and-go + undo function) so I wrote the following bash script. Worked for me on several occasions, so I thought I'd share it here.
It is able to move an arbitrary folder /path/to/foo from repo1 into /some/other/folder/bar in repo2 (folder paths can be the same or different, distance from root folder may be different).
Since it only goes over the commits that touch the files in input folder (not over all commits of the source repo), it should be quite fast even on big source repos, if you just extract a deeply nested subfolder that was not touched in every commit.
Since what this does is to create an orphaned branch with all the old repo's history and then merge it to the HEAD, it will even work in case of file name clashes (then you'd have to resolve a merge at the end of course).
If there are no file name clashes, you just need to git commit at the end to finalize the merge.
The downside is that it will likely not follow file renames (outside of the REWRITE_FROM folder) in the source repo - pull requests welcome on GitHub to accommodate that.
GitHub link: git-move-folder-between-repos-keep-history
#!/bin/bash
# Copy a folder from one git repo to another git repo,
# preserving full history of the folder.
SRC_GIT_REPO='/d/git-experimental/your-old-webapp'
DST_GIT_REPO='/d/git-experimental/your-new-webapp'
SRC_BRANCH_NAME='master'
DST_BRANCH_NAME='import-stuff-from-old-webapp'
# Most likely you want the REWRITE_FROM and REWRITE_TO to have a trailing slash!
REWRITE_FROM='app/src/main/static/'
REWRITE_TO='app/src/main/static/'
verifyPreconditions() {
#echo 'Checking if SRC_GIT_REPO is a git repo...' &&
{ test -d "${SRC_GIT_REPO}/.git" || { echo "Fatal: SRC_GIT_REPO is not a git repo"; exit; } } &&
#echo 'Checking if DST_GIT_REPO is a git repo...' &&
{ test -d "${DST_GIT_REPO}/.git" || { echo "Fatal: DST_GIT_REPO is not a git repo"; exit; } } &&
#echo 'Checking if REWRITE_FROM is not empty...' &&
{ test -n "${REWRITE_FROM}" || { echo "Fatal: REWRITE_FROM is empty"; exit; } } &&
#echo 'Checking if REWRITE_TO is not empty...' &&
{ test -n "${REWRITE_TO}" || { echo "Fatal: REWRITE_TO is empty"; exit; } } &&
#echo 'Checking if REWRITE_FROM folder exists in SRC_GIT_REPO' &&
{ test -d "${SRC_GIT_REPO}/${REWRITE_FROM}" || { echo "Fatal: REWRITE_FROM does not exist inside SRC_GIT_REPO"; exit; } } &&
#echo 'Checking if SRC_GIT_REPO has a branch SRC_BRANCH_NAME' &&
{ cd "${SRC_GIT_REPO}"; git rev-parse --verify "${SRC_BRANCH_NAME}" || { echo "Fatal: SRC_BRANCH_NAME does not exist inside SRC_GIT_REPO"; exit; } } &&
#echo 'Checking if DST_GIT_REPO has a branch DST_BRANCH_NAME' &&
{ cd "${DST_GIT_REPO}"; git rev-parse --verify "${DST_BRANCH_NAME}" || { echo "Fatal: DST_BRANCH_NAME does not exist inside DST_GIT_REPO"; exit; } } &&
echo '[OK] All preconditions met'
}
# Import folder from one git repo to another git repo, including full history.
#
# Internally, it rewrites the history of the src repo (by creating
# a temporary orphaned branch; isolating all the files from REWRITE_FROM path
# to the root of the repo, commit by commit; and rewriting them again
# to the original path).
#
# Then it creates another temporary branch in the dest repo,
# fetches the commits from the rewritten src repo, and does a merge.
#
# Before any work is done, all the preconditions are verified: all folders
# and branches must exist (except REWRITE_TO folder in dest repo, which
# can exist, but does not have to).
#
# The code should work reasonably on repos with reasonable git history.
# I did not test pathological cases, like folder being created, deleted,
# created again etc. but probably it will work fine in that case too.
#
# In case you realize something went wrong, you should be able to reverse
# the changes by calling `undoImportFolderFromAnotherGitRepo` function.
# However, to be safe, please back up your repos just in case, before running
# the script. `git filter-branch` is a powerful but dangerous command.
importFolderFromAnotherGitRepo(){
SED_COMMAND='s-\t\"*-\t'${REWRITE_TO}'-'
verifyPreconditions &&
cd "${SRC_GIT_REPO}" &&
echo "Current working directory: ${SRC_GIT_REPO}" &&
git checkout "${SRC_BRANCH_NAME}" &&
echo 'Backing up current branch as FILTER_BRANCH_BACKUP' &&
git branch -f FILTER_BRANCH_BACKUP &&
SRC_BRANCH_NAME_EXPORTED="${SRC_BRANCH_NAME}-exported" &&
echo "Creating temporary branch '${SRC_BRANCH_NAME_EXPORTED}'..." &&
git checkout -b "${SRC_BRANCH_NAME_EXPORTED}" &&
echo 'Rewriting history, step 1/2...' &&
git filter-branch -f --prune-empty --subdirectory-filter ${REWRITE_FROM} &&
echo 'Rewriting history, step 2/2...' &&
git filter-branch -f --index-filter \
"git ls-files -s | sed \"$SED_COMMAND\" |
GIT_INDEX_FILE=\$GIT_INDEX_FILE.new git update-index --index-info &&
mv \$GIT_INDEX_FILE.new \$GIT_INDEX_FILE" HEAD &&
cd - &&
cd "${DST_GIT_REPO}" &&
echo "Current working directory: ${DST_GIT_REPO}" &&
echo "Adding git remote pointing to SRC_GIT_REPO..." &&
git remote add old-repo ${SRC_GIT_REPO} &&
echo "Fetching from SRC_GIT_REPO..." &&
git fetch old-repo "${SRC_BRANCH_NAME_EXPORTED}" &&
echo "Checking out DST_BRANCH_NAME..." &&
git checkout "${DST_BRANCH_NAME}" &&
echo "Merging SRC_GIT_REPO/" &&
git merge "old-repo/${SRC_BRANCH_NAME}-exported" --no-commit &&
cd -
}
# If something didn't work as you'd expect, you can undo, tune the params, and try again
undoImportFolderFromAnotherGitRepo(){
cd "${SRC_GIT_REPO}" &&
SRC_BRANCH_NAME_EXPORTED="${SRC_BRANCH_NAME}-exported" &&
git checkout "${SRC_BRANCH_NAME}" &&
git branch -D "${SRC_BRANCH_NAME_EXPORTED}" &&
cd - &&
cd "${DST_GIT_REPO}" &&
git remote rm old-repo &&
git merge --abort
cd -
}
importFolderFromAnotherGitRepo
#undoImportFolderFromAnotherGitRepo
SED_COMMAND='s@\t\"*@\t'${REWRITE_TO}'@'
2. In modern git, you must provide the --allow-unrelated-histories flag to merge: git merge "old-repo/${SRC_BRANCH_NAME}-exported" --no-commit --allow-unrelated-histories &&
I hope it will help someone, Ori.
git subtree works intuitively and even preserves history.
Example usage: Add the git repo as a subdirectory:
git subtree add --prefix foo https://github.com/git/git.git master
Explanation:
#├── repo_bar
#│ ├── bar.txt
#└── repo_foo
# └── foo.txt
cd repo_bar
git subtree add --prefix foo ../repo_foo master
#├── repo_bar
#│ ├── bar.txt
#│ └── foo
#│ └── foo.txt
#└── repo_foo
# └── foo.txt
In my case, I didn't need to preserve the repo I was migrating from or preserve any previous history. I had a patch of the same branch, from a different remote
#Source directory
git remote rm origin
#Target directory
git remote add branch-name-from-old-repo ../source_directory
In those two steps, I was able to get the other repo's branch to appear in the same repo.
Finally, I set this branch (that I imported from the other repo) to follow the target repo's mainline (so I could diff them accurately)
git branch --set-upstream-to=origin/mainline
Now it behaved as-if it was just another branch I had pushed against that same repo.
If the paths for the files in question are the same in the two repos and you're wanting to bring over just one file or a small set of related files, one easy way to do this is to use git cherry-pick.
The first step is to bring the commits from the other repo into your own local repo using git fetch <remote-url>. This will leave FETCH_HEAD pointing to the head commit from the other repo; if you want to preserve a reference to that commit after you've done other fetches you may want to tag it with git tag other-head FETCH_HEAD.
You will then need to create an initial commit for that file (if it doesn't exist) or a commit to bring the file to a state that can be patched with the first commit from the other repo you want to bring in. You may be able to do this with a git cherry-pick <commit-0> if commit-0 introduced the files you want, or you may need to construct the commit 'by hand'. Add -n to the cherry-pick options if you need to modify the initial commit to, e.g., drop files from that commit you don't want to bring in.
After that, you can continue to git cherry-pick subsequent commits, again using -n where necessary. In the simplest case (all commits are exactly what you want and apply cleanly) you can give the full list of commits on the cherry-pick command line: git cherry-pick <commit-1> <commit-2> <commit-3> ....
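Here is a minimal end-to-end sketch of that simplest case, using two throwaway repositories (the names other and mine, the file lib.txt and the other-head tag are just for the demo). Every commit of the other repository is wanted and applies cleanly, so the whole list goes to a single cherry-pick.

```shell
#!/bin/sh
set -eu
# Hypothetical demo identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
# The "other" repository: two commits on lib.txt.
git init -q "$work/other"
cd "$work/other"
echo v1 > lib.txt && git add . && git commit -q -m "add lib.txt"
echo v2 >> lib.txt && git commit -q -am "extend lib.txt"

# Your own repository, with its own history.
git init -q "$work/mine"
cd "$work/mine"
echo readme > README && git add . && git commit -q -m "initial commit"

# Fetch by path/URL; FETCH_HEAD now points at the other repo's head commit.
git fetch -q "$work/other" HEAD
git tag other-head FETCH_HEAD

# Cherry-pick every fetched commit, oldest first (word splitting is intended).
git cherry-pick $(git rev-list --reverse other-head) >/dev/null
git log --oneline
```

The picked commits get new IDs in the target repository but keep their original author, date and message.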
I used the method below to migrate my Git repository from Stash to GitLab while keeping all branches and preserving history.
Clone the old repository to local.
git clone --bare <STASH-URL>
Create an empty repository in GitLab.
git push --mirror <GitLab-URL>
I performed the steps above when we migrated our code from Stash to GitLab, and it worked very well.
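The same two commands can be tried out locally before touching real servers. In this sketch, local directories stand in for the real URLs (seed plays the old server-side repository, old.git the bare clone, new.git the empty GitLab project); all of these names are made up for the demo.

```shell
#!/bin/sh
set -eu
# Hypothetical demo identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
# Stand-in for the old repository, with two branches.
git init -q "$work/seed"
cd "$work/seed"
echo a > a.txt && git add . && git commit -q -m "first"
git branch feature

git clone -q --bare "$work/seed" "$work/old.git"   # like: git clone --bare <STASH-URL>
git init -q --bare "$work/new.git"                 # like: the empty GitLab repository

# Push every ref (all branches and tags) from the bare clone to the new home.
git -C "$work/old.git" push -q --mirror "$work/new.git"
git -C "$work/new.git" for-each-ref --format='%(refname)' refs/heads
```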
git log --pretty=email --patch-with-stat --full-index --binary --reverse -- client > patch. Works without problems AFAICT.
--committer-date-is-author-date option to preserve the original commit date instead of the date the files were moved.
--follow option to git log (which only works with one file at a time).