How can I download only a specific folder or directory from a remote Git repo hosted on GitHub?
Say the example GitHub repo lives here:
git@github.com:foobar/Test.git
Its directory structure:
Test/
├── foo/
│   ├── a.py
│   └── b.py
└── bar/
    ├── c.py
    └── d.py
I want to download only the foo folder and not clone the whole Test project.
Update Apr. 2021: there are a few tools created by the community that can do this for you:
Download Directory (Credits to fregante). It has also been integrated into the excellent Refined GitHub Chrome extension as a button in the GitHub web UI.
GitZip (Credits to Kino - see his answer here)
DownGit (Credits to Minhas Kamal - see his answer here)
Note: if you're trying to download a large number of files, you may need to provide a token to these tools to avoid rate limiting.
Original (manual) approach: Checking out an individual directory is not supported by git natively, but GitHub can do this via SVN. If you check out your code with Subversion, GitHub will essentially convert the repo from git to Subversion on the backend, then serve up the requested directory.
Here's how you can use this feature to download a specific folder. I'll use the popular JavaScript library lodash as an example.
1. Navigate to the folder you want to download. Let's download /test from the master branch.
2. Modify the URL for Subversion. Replace tree/master with trunk.
   https://github.com/lodash/lodash/tree/master/test ➜ https://github.com/lodash/lodash/trunk/test
3. Download the folder. Go to the command line and grab the folder with SVN.
svn checkout https://github.com/lodash/lodash/trunk/test
You might not see any activity immediately because Github takes up to 30 seconds to convert larger repositories, so be patient.
Full URL format explanation:
- If you're interested in the master branch, use trunk instead. So the full path is trunk/foldername
- If you're interested in the foo branch, use branches/foo instead. The full path looks like branches/foo/foldername
- Protip: You can use svn ls to see available tags and branches before downloading if you wish
That's all! Github supports more subversion features as well, including support for committing and pushing changes.
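The URL rewrite described above can be sketched as a small shell helper. This is only an illustration: github_tree_to_svn_url is a hypothetical name, and the sketch assumes the branch name contains no slash. A later comment on this page reports that GitHub now answers such requests with status 410, so treat it as a demonstration of the rewrite rule rather than something guaranteed to still work.

```shell
# Hypothetical helper (not part of git or svn): rewrite a GitHub "tree" URL
# into the SVN-style URL described in this answer.
# Assumes the branch name contains no "/". Variables are global (POSIX sh).
github_tree_to_svn_url() {
  url=$1
  case $url in
    https://github.com/*/*/tree/*/*) ;;
    *) echo "not a GitHub tree URL: $url" >&2; return 1 ;;
  esac
  repo=${url%%/tree/*}        # https://github.com/<owner>/<repo>
  rest=${url#"$repo"/tree/}   # <branch>/<path>
  branch=${rest%%/*}
  path=${rest#*/}
  if [ "$branch" = master ]; then
    # master maps to trunk
    printf '%s/trunk/%s\n' "$repo" "$path"
  else
    # any other branch maps to branches/<branch>
    printf '%s/branches/%s/%s\n' "$repo" "$branch" "$path"
  fi
}
```

Usage would then look like: svn checkout "$(github_tree_to_svn_url https://github.com/lodash/lodash/tree/master/test)"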
You can download directly or create a download link for any GitHub public directory or file using this tool I made: DownGit
https://i.stack.imgur.com/CDeZE.gif
You can also configure properties of the downloaded file - detailed usage
Disclaimer: I fell into the same problem as the question-asker and could not find any simple solution. So, I developed this tool for my own use first, then opened it for everyone :)
Two options for this feature:
Option 1: GitZip Browser Extension
Chrome Extension, Edge Extension, Firefox Addon
Usage:
1. Browse any GitHub repository page.
2. Two ways to download:
   1. Choose the items:
      1. By default, you can double-click items or tick the checkbox in front of them.
      2. Click the download button at the bottom-right of the page.
   2. In the context menu:
      1. Click "GitZip Download" > "Whole Repository" or "Current Folder".
      2. Move the mouse cursor over the item and click "GitZip Download" > "Selected Folder/File".
      3. Click "GitZip Download" > "Checked Items" after doing 2-1-1.
3. See the progress dashboard and wait for the browser to trigger the download.
4. Get the ZIP file.
Get Token:
1. Click the GitZip Extension icon in your browser.
2. Click the "Normal" or "Private" link beside "Get Token".
3. Authorize the GitZip permission on the GitHub auth page.
4. Go back to the repo page you started from.
5. Continue to use.
Option 2: Github gh-page
http://kinolien.github.io/gitzip by using GitHub API, and JSZip, FileSaver.js libraries.
Step 1: Enter the GitHub URL in the field at the top-right.
Step 2: Press Enter or click Download to download the zip directly, or click Search to view the list of sub-folders and files.
Step 3: Click the "Download Zip File" or "Get File" button to get the files.
In most cases it works fine, except when the folder contains more than 1,000 files, because of a limitation of the GitHub Trees API. (See Github API#Contents.)
It also supports both private and public repos, and raises the rate limit, if you have a GitHub account and use the "get token" link on this site.
If you have svn, you can use svn export to do this:
svn export https://github.com/foobar/Test.git/trunk/foo
Notice the URL format:
The base URL is https://github.com/
/trunk appended at the end
Before you run svn export, it's good to first verify the content of the directory with:
svn ls https://github.com/foobar/Test.git/trunk/foo
.git extension. You can use the full project link, and start by using svn ls followed by the project full path. Example: svn ls https://github.com/RobTillaart/Arduino.git. To export just one folder, you just add /trunk followed by the desired path, like svn export https://github.com/RobTillaart/Arduino.git/trunk/libraries/DHTlib. It is easier to keep the project path intact.
https://github.com/miguelgrinberg/python-socketio/tree/master/examples/wsgi, run svn export https://github.com/miguelgrinberg/python-socketio.git/trunk/examples/wsgi. A directory called wsgi will be created under the current working directory. Only source files, nothing else. No .git, no Subversion-related files.
For a Generic git Repo:
If you want to download files, not clone the repository with history, you can do this with git-archive.
git-archive makes a compressed zip or tar archive of a git repository. Some things that make it special:
- You can choose which files or directories in the git repository to archive.
- It doesn't archive the .git/ folder, or any untracked files in the repository it's run on.
- You can archive a specific branch, tag, or commit. Projects managed with git often use this to generate archives of versions of the project (beta, release, 2.0, etc.) for users to download.
An example of creating an archive of the docs/usage directory from a remote repo you're connected to with ssh:
# in terminal
$ git archive --format tar --remote ssh://server.org/path/to/git HEAD docs/usage > /tmp/usage_docs.tar
More information in this blog post and the git documentation.
Note on GitHub Repos:
GitHub doesn't allow git-archive access. ☹️
git archive --format tar: the format is not tar.gz, but tar.
Nothing wrong with other answers but I just thought I'd share step-by-step instructions for those wandering through this process for the first time.
How to download a single folder from a github repository (Mac OS X):
~ To open Terminal just click spotlight and type terminal then hit enter
1. On a Mac you likely already have SVN (to test, just open Terminal and type "svn" or "which svn" ~ without the quote marks)
2. On GitHub: Locate the GitHub path to your git folder (not the repo) by clicking the specific folder name within a repo
3. Copy the path from the address bar of the browser
4. Open Terminal and type: svn export
5. Next paste in the address (e.g.): https://github.com/mingsai/Sample-Code/tree/master/HeadsUpUI
6. Replace the words: tree/master with the word: trunk
7. Type in the destination folder for the files (in this example, I store the target folder inside of the Downloads folder for the current user). Here space is just the spacebar, not the word (space): ~/Downloads/HeadsUpUI
8. The final terminal command shows the full command to download the folder (compare the address to step 5): svn export https://github.com/mingsai/Sample-Code/trunk/HeadsUpUI ~/Downloads/HeadsUpUI
BTW - If you are on Windows or some other platform you can find a binary download of subversion (svn) at http://subversion.apache.org
~ If you want to checkout the folder rather than simply download it try using the svn help (tldr: replace export with checkout)
Update
Regarding the comment on resuming an interrupted download/checkout: I would try running svn cleanup followed by svn update. Please search SO for additional options.
If you only work on one specific folder, you can clone just that folder by using a sparse checkout. Follow the steps below.
1. Create a directory.
2. Initialize a Git repository. (git init)
3. Enable sparse checkouts. (git config core.sparsecheckout true)
4. Tell Git which directories you want (echo 2015/brand/May >> .git/info/sparse-checkout, where 2015/brand/May is the folder you want to work on)
5. Add the remote (git remote add -f origin https://jafartke.com/mkt-imdev/DVM.git)
6. Fetch the files (git pull origin master)
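The same steps can be strung together as one shell function. This is a minimal sketch, not the answerer's own script: sparse_fetch_dir is a hypothetical name, its four arguments are an assumed calling convention, and the leading slash added to the pattern (to anchor it at the repository root) is my own choice.

```shell
# Hypothetical helper wrapping the sparse-checkout steps listed above.
# Arguments (assumed convention): <repo-url> <branch> <dir-in-repo> <target-dir>
sparse_fetch_dir() {
  mkdir -p "$4" || return 1
  (
    cd "$4" || exit 1
    git init -q . &&
    git config core.sparseCheckout true &&
    mkdir -p .git/info &&
    # One pattern per line; the leading "/" anchors it at the repo root,
    # the trailing "/" restricts the match to a directory.
    printf '/%s/\n' "$3" >> .git/info/sparse-checkout &&
    git remote add -f origin "$1" &&
    git pull -q origin "$2"
  )
}
```

For the repository in the question this would be called as: sparse_fetch_dir git@github.com:foobar/Test.git master foo ./Test-foo. Note that the history of the whole repository is still fetched; only the working tree is limited to the chosen folder.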
After trying all the answers, the best solution for me was:
GitHub's vscode based editor.
Pros:
- Doesn't require any extra tool like svn or API tokens.
- No limit on size of content.
- Saves as a directory or file, and not as an archive.
Instructions
1. Go to any repo. (ex. https://github.com/RespiraWorks/Ventilator/tree/master/software)
2. Press . or replace .com with .dev in the URL to open the repo in GitHub's internal editor.
3. In the Explorer pane (left side, or press Ctrl+Shift+E), right-click the required file/folder and select Download.
4. In the Select Folder dialog box, choose the directory on your disk under which you want the selected file/folder to exist.
Note
I tried other solutions like the one in the accepted answer, but:
- I don't want to install and learn svn only for this.
- Other tools like Download Directory, Refined GitHub, GitZip, DownGit either require API tokens or cannot download large directories.
Other options
VSCode with Remote Repositories extension to open the repo and download the file/folder.
You cannot; unlike Subversion, where each subdirectory can be checked out individually, Git operates on a whole-repository basis.
For projects where finer-grained access is necessary, you can use submodules -- each submodule is a separate Git project, and thus can be cloned individually.
It is conceivable that a Git front-end (e.g. GitHub's web interface, or gitweb) could choose to provide an interface for you to extract a given folder, but to my knowledge none of them do that (though they do let you download individual files, so if the folder does not contain too many files, that is an option)
Edit - GitHub actually offers access via SVN, which would allow you to do just this (as per comment). See https://github.com/blog/1438-improved-svn-here-to-stay-old-svn-going-away for latest instructions on how to do this
2019 Summary
There are a variety of ways to handle this, depending on whether or not you want to do this manually or programmatically.
There are four options summarized below. And for those that prefer a more hands-on explanation, I've put together a YouTube video: Download Individual Files and Folders from GitHub.
Also, I've posted a similar answer on StackOverflow for those that need to download single files from GitHub (as opposed to folders).
1. GitHub User Interface
There's a download button on the repository's homepage. Of course, this downloads the entire repo, after which you would need to unzip the download and then manually drag out the specific folder you need.
2. Third Party Tools
There are a variety of browser extensions and web apps that can handle this, with DownGit being one of them. Simply paste in the GitHub URL to the folder (e.g. https://github.com/babel/babel-eslint/tree/master/lib) and press the "Download" button.
3. Subversion
GitHub does not support git-archive (the git feature that would allow us to download specific folders). GitHub does, however, support a variety of Subversion features, one of which we can use for this purpose. Subversion is a version control system (an alternative to git). You'll need Subversion installed.
Grab the GitHub URL for the folder you want to download. You'll need to modify this URL, though. You want the link to the repository, followed by the word "trunk", and ending with the path to the nested folder. In other words, using the same folder link example that I mentioned above, we would replace "tree/master" with "trunk".
Finally, open up a terminal, navigate to the directory that you want the content to get downloaded to, type in the following command (replacing the URL with the URL you constructed), and press enter:
svn export https://github.com/babel/babel-eslint/trunk/lib
4. GitHub API
This is the solution you'll need if you want to accomplish this task programmatically. And this is actually what DownGit is using under the hood.
Using GitHub's REST API, write a script that does a GET request to the content endpoint. The endpoint can be constructed as follows: https://api.github.com/repos/:owner/:repo/contents/:path. After replacing the placeholders, an example endpoint is: https://api.github.com/repos/babel/babel-eslint/contents/lib.
This gives you JSON data for all of the content that exists in that folder. The data has everything you need, including whether or not the content is a folder or file, a download URL if it's a file, and an API endpoint if it's a folder (so that you can get the data for that folder).
Using this data, the script can recursively go through all content in the target folder, create folders for nested folders, and download all of the files for each folder. Check out DownGit's code for inspiration.
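A rough sketch of such a script in shell follows. It is not DownGit's code: both function names are hypothetical, and it assumes that curl and jq are installed and that the branch name contains no slash. Unauthenticated requests are rate limited, as noted elsewhere on this page, so a large folder may need a token.

```shell
# Hypothetical helpers; curl and jq are assumed to be available.

# https://github.com/<owner>/<repo>/tree/<ref>/<path>
#   -> https://api.github.com/repos/<owner>/<repo>/contents/<path>?ref=<ref>
github_contents_api_url() {
  rest=${1#https://github.com/}   # <owner>/<repo>/tree/<ref>/<path>
  owner_repo=${rest%%/tree/*}     # <owner>/<repo>
  ref_path=${rest#*/tree/}        # <ref>/<path>
  printf 'https://api.github.com/repos/%s/contents/%s?ref=%s\n' \
    "$owner_repo" "${ref_path#*/}" "${ref_path%%/*}"
}

# Recursively download everything listed by a contents endpoint into <dest>.
# The body is a subshell "( ... )" so each recursion level keeps its own variables.
download_github_dir() (
  api_url=$1
  dest=$2
  mkdir -p "$dest" || exit 1
  # "-" stands in for the null download_url of directories, so that no
  # tab-separated field is ever empty.
  curl -fsSL "$api_url" |
    jq -r '.[] | [.type, .name, (.download_url // "-"), .url] | @tsv' |
    while IFS="$(printf '\t')" read -r type name download_url url; do
      case $type in
        file) curl -fsSL -o "$dest/$name" "$download_url" ;;
        dir)  download_github_dir "$url" "$dest/$name" ;;
      esac
    done
)
```

A call would look like: download_github_dir "$(github_contents_api_url https://github.com/babel/babel-eslint/tree/master/lib)" ./lib. Symlinks and submodules are silently skipped in this sketch.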
If you truly just want to "download" the folder and not "clone" it (for development), the easiest way to simply get a copy of the most recent version of the repository (and therefore a folder/file within it), without needing to clone the whole repo or even install git in the first place, is to download a zip archive (for any repo, fork, branch, commit, etc.) by going to the desired repository/fork/branch/commit on GitHub (e.g. http(s)://github.com/<user>/<repo>/commit/<Sha1> for a copy of the files as they were after a specific commit) and selecting the Downloads button near the upper-right.
This archive format contains none of the git-repo magic, just the tracked files themselves (and perhaps a few .gitignore files if they were tracked, but you can ignore those :p) - that means that if the code changes and you want to stay on top, you'll have to manually re-download it, and it also means you won't be able to use it as a git repository...
Not sure if that's what you're looking for in this case (again, "download"/view vs "clone"/develop), but it can be useful nonetheless...
tar.gz: https://github.com/${owner}/${repo}/archive/${hash}.tar.gz
There's a Python3 pip package called githubdl that can do this*:
export GIT_TOKEN=1234567890123456789012345678901234567890123
pip install githubdl
githubdl -u http://github.com/foobar/test -d foo
The project page is here
* Disclaimer: I wrote this package.
If you are comfortable with unix commands, you don't need special dependencies or web apps for this. You can download the repo as a tarball and untar only what you need.
Example (woff2 files from a subdirectory in fontawesome):
curl -L https://api.github.com/repos/FortAwesome/Font-Awesome/tarball | tar xz --wildcards "*/web-fonts-with-css/webfonts/*.woff2" --strip-components=3
More about the link format: https://developer.github.com/v3/repos/contents/#get-archive-link (including how to get a zip file or specific branches/refs)
Keep the initial part of the path (*/) to match any directory. Github creates a wrapper directory with the commit ref in the name, so it can't be known.
You probably want --strip-components to be the same as the amount of slashes (/) in the path (previous argument).
This will download the whole tarball. Use the SVN method mentioned in the other answers if this has to be avoided or if you want to be nice to the GitHub servers.
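The slash-counting rule above can be automated. The sketch below is only an illustration: count_slashes and fetch_from_tarball are hypothetical names, and GNU tar is assumed for --wildcards, as in the command above.

```shell
# Hypothetical helper: number of "/" characters in a path pattern, which this
# answer suggests using as the --strip-components value.
count_slashes() {
  slashes=$(printf '%s' "$1" | tr -cd '/')   # keep only the slashes
  printf '%s\n' "${#slashes}"
}

# Hypothetical wrapper around the curl | tar pipeline shown above.
# Arguments: <owner> <repo> <pattern>   (GNU tar assumed)
fetch_from_tarball() {
  curl -L "https://api.github.com/repos/$1/$2/tarball" |
    tar xz --wildcards "$3" --strip-components="$(count_slashes "$3")"
}
```

With these, the Font Awesome example becomes: fetch_from_tarball FortAwesome Font-Awesome "*/web-fonts-with-css/webfonts/*.woff2"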
git clone --filter from git 2.19 now works on GitHub
Tested 2020-09-18, git 2.25.1.
This option was added together with an update to the remote protocol, and it truly prevents objects from being downloaded from the server.
E.g., to clone only objects required for d1 of this repository: https://github.com/cirosantilli/test-git-partial-clone I can do:
git clone \
--depth 1 \
--filter=blob:none \
--no-checkout \
https://github.com/cirosantilli/test-git-partial-clone \
;
cd test-git-partial-clone
git checkout master -- d1
I have covered this in more detail at: Git: How do I clone a subdirectory only of a Git repository?
Another specific example:
Say I want to download the 'Pro iOS Geo' folder from this URL
https://github.com/alokc83/APRESS-Books-Source-Code-/tree/master/%20Pro%20iOS%20Geo
and I can do so via
svn checkout https://github.com/alokc83/APRESS-Books-Source-Code-/trunk/%20Pro%20iOS%20Geo
Note trunk in the path
Edited: (as per Tommie C's comment)
Yes, using export instead of checkout would give a clean copy without extra git repository files.
svn export https://github.com/alokc83/APRESS-Books-Source-Code-/trunk/%20Pro%20iOS%20Geo
Edited: If tree/master is not in the URL, fork the repo; the forked URL will contain it.
This is how I do it with git v2.25.0, also tested with v2.26.2. This trick doesn't work with v2.30.1
TLDR
git clone --no-checkout --filter=tree:0 https://github.com/opencv/opencv
cd opencv
# requires git 2.25.x to 2.26.2
git sparse-checkout set data/haarcascades
You can use Docker to avoid installing a specific version of git
git clone --no-checkout --filter=tree:0 https://github.com/opencv/opencv
cd opencv
# requires git 2.25.x to 2.26.2
docker run --rm -it -v $PWD/:/code/ --workdir=/code/ alpine/git:v2.26.2 sparse-checkout set data/haarcascades
Full solution
# bare minimum clone of opencv
$ git clone --no-checkout --filter=tree:0 https://github.com/opencv/opencv
...
Resolving deltas: 100% (529/529), done.
# Downloaded only ~7.3MB , takes ~3 seconds
# du = disk usage, -s = summary, -h = human-readable
$ du -sh opencv
7.3M opencv/
# Set target dir
$ cd opencv
$ git sparse-checkout set data/haarcascades
...
Updating files: 100% (17/17), done.
# Takes ~10 seconds, depending on your specs
# View downloaded files
$ du -sh data/haarcascades/
9.4M data/haarcascades/
$ ls data/haarcascades/
haarcascade_eye.xml haarcascade_frontalface_alt2.xml haarcascade_licence_plate_rus_16stages.xml haarcascade_smile.xml
haarcascade_eye_tree_eyeglasses.xml haarcascade_frontalface_alt_tree.xml haarcascade_lowerbody.xml haarcascade_upperbody.xml
haarcascade_frontalcatface.xml haarcascade_frontalface_default.xml haarcascade_profileface.xml
haarcascade_frontalcatface_extended.xml haarcascade_fullbody.xml haarcascade_righteye_2splits.xml
haarcascade_frontalface_alt.xml haarcascade_lefteye_2splits.xml haarcascade_russian_plate_number.xml
References
git-sparse-checkout-blog
git-sparse-checkout-docs
git-filter-props-docs
You can use git-svn in the following way.
First, replace tree/master with trunk.
Then, install git-svn with sudo apt install git-svn.
git svn clone https://github.com/lodash/lodash/trunk/test
This way you don't have to go through the pain of setting svn, specifically for Windows users.
sudo apt install git-svn is required when running through WSL.
You can do a simple download of the directory tree:
git archive --remote git@github.com:foobar/Test.git HEAD:foo | tar xf -
But if you mean to check it out, and be able to do commits and push them back, no you can't do that.
None of the answers helped in my situation. If you are developing for Windows, you likely don't have svn. In many situations one can't count on users to have Git installed either, or don't want to download entire repositories for other reasons. Some of the people that answered this question, such as Willem van Ketwich and aztack, made tools to accomplish this task. However, if the tool isn't written for the language you are using, or you don't want to install a third party library, these don't work.
However, there is a much easier way. GitHub has an API that allows you to download a single file or an entire directory's contents using GET requests. You can access a directory using https://api.github.com/repos/:owner/:repo_name/contents/:path, which returns a JSON object enumerating all the files in the directory. Included in the enumeration is a link to the raw content of the file, the download_url parameter. The file can then be downloaded using that URL.
It's a two step process that requires the ability to make GET requests, but this can be implemented in pretty much any language, on any platform. It can be used to get files or directories.
download_url for folders is null. Please read the question carefully before posting an answer.
git sparse-checkout
Git 2.25.0 includes a new experimental git sparse-checkout command that makes the existing feature easier to use, along with some important performance benefits for large repositories. (The GitHub Blog)
Example with current version:
git clone --filter=blob:none --sparse https://github.com/git/git.git
cd git
git sparse-checkout init --cone
git sparse-checkout add t
Most notably
--sparse checks out only top-level directory files of git repository into working copy
git sparse-checkout add t incrementally adds/checks out t subfolder of git
Other elements
git sparse-checkout init does some preparations to enable partial checkouts
--filter=blob:none optimizes data fetching by downloading only necessary git objects (take a look at partial clone feature for further infos)
--cone also speeds up performance by applying more restricted file inclusion patterns
GitHub status
GitHub is still evaluating this feature internally while it’s enabled on a select few repositories [...]. As the feature stabilizes and matures, we’ll keep you updated with its progress. (docs)
git sparse checkout is not strictly needed, at least not anymore. git checkout alone also only obtains the missing blobs: stackoverflow.com/a/56504849/895245
Just 5 steps to go
Download SVN from here.
Open CMD and go to SVN bin directory like: cd %ProgramFiles%\SlikSvn\bin
Let's suppose I want to download this directory URL https://github.com/ZeBobo5/Vlc.DotNet/tree/develop/src/Samples
Replace tree/develop or tree/master with trunk
Now fire this last command to download folder in same directory.
svn export https://github.com/ZeBobo5/Vlc.DotNet/trunk/src/Samples
You can use ghget with any URL copied from the address bar:
ghget https://github.com/fivethirtyeight/data/tree/master/airline-safety
It's a self-contained portable shell script that doesn't use SVN (which didn't work for me on a big repo). It also doesn't use the API so it doesn't require a token and isn't rate-limited.
Disclaimer: I made it.
Just to amplify the answers above, a real example from a real GitHub repository to a local directory would be:
svn ls https://github.com/rdcarp/playing-cards/trunk/PumpkinSoup.PlayingCards.Interfaces
svn export https://github.com/rdcarp/playing-cards/trunk/PumpkinSoup.PlayingCards.Interfaces /temp/SvnExport/Washburn
Sometimes a concrete example helps clarify the substitutions proposed.
I use Linux, so I put this in ~/.bashrc (also known as $HOME/.bashrc :D)
git-downloadfolder(){
a="$1"
# Rewrite the browser URL (tree/master) into the SVN form (trunk)
svn checkout "${a/tree\/master/trunk}"
}
then refresh the shell with
source ~/.bashrc
then use it with git-downloadfolder blablabla :D
It's one of the few places where SVN is better than Git.
In the end we've gravitated towards three options:
- Use wget to grab the data from GitHub (using the raw file view).
- Have upstream projects publish the required data subset as build artifacts.
- Give up and use the full checkout. It's a big hit on the first build, but unless you get a lot of traffic, it's not too much hassle in the following builds.
For whatever reason, the svn solution does not work for me, and since I have no need of svn for anything else, it did not make sense to spend time trying to make it work, so I looked for a simple solution using tools I already had. This script uses only curl and awk to download all files in a GitHub directory described as "/:user/:repo/contents/:path".
A call to the GitHub REST API "GET /repos/:user/:repo/contents/:path" returns an object that includes a "download_url" link for each file in a directory.
This command-line script calls that REST API using curl and sends the result through AWK, which filters out all but the "download_url" lines, erases quote marks and commas from the links, and then downloads the links using another call to curl.
curl -s https://api.github.com/repos/:user/:repo/contents/:path | awk \
'/download_url/ { gsub("\"|,", "", $2); system("curl -O " $2); }'
awk: cmd. line:1: /download_url/ { gsub("\"|,", "", $2); system("curl -O "$2"); } awk: cmd. line:1: ^ syntax error
FOR /F delims^=^"^ tokens^=4 %%a IN ('curl -s https://api.github.com/repos/:user/:repo/contents/:path 2^>NUL ^| findstr "download_url"') DO curl -O "%%~a"
Our team wrote a bash script to do this because we didn't want to have to install SVN on our bare bones server.
https://github.com/ojbc/docker/blob/master/java8-karaf3/files/git-download.sh
It uses the github API and can be run from the command line like this:
git-download.sh https://api.github.com/repos/ojbc/main/contents/shared/ojb-certs
I work with CentOS 7 servers on which I don't have root access, nor git, svn, etc (nor want to!) so made a python script to download any github folder: https://github.com/andrrrl/github-folder-downloader
Usage is simple, just copy the relevant part from a github project, let's say the project is https://github.com/MaxCDN/php-maxcdn/, and you want a folder where some source files are only, then you need to do something like:
$ python gdownload.py "/MaxCDN/php-maxcdn/tree/master/src" /my/target/dir/
(will create target folder if doesn't exist)
It requires lxml library, can be installed with easy_install lxml
If you don't have root access (like me) you can create a .pydistutils.cfg file in your $HOME dir with these contents:
[install]
user=1
And easy_install lxml will just work (ref: https://stackoverflow.com/a/33464597/591257).
Open the repo in CodeSandbox by replacing github with githubbox in the URL, then in CodeSandbox go to the File menu and export it as a zip.
For following repo: https://github.com/geist-org/react/tree/master/examples/custom-themes
Enter following url: https://githubbox.com/geist-org/react/tree/master/examples/custom-themes
In codesandbox go to file menu and Export it as a Zip.
try it.
https://github.com/twfb/git-directory-download
usage: gitd [-h] [-u URL] [-r] [-p] [--proxy PROXY]
optional arguments:
-h, --help show this help message and exit
-u URL, --url URL github url, split by ",", example: "https://x, http://y"
-r, --raw download from raw url
-p, --parse download by parsing html
--proxy PROXY proxy config, example "socks5://127.0.0.1:7891"
Example:
1. download by raw url: gitd -u "https://github.com/twfb/git-directory-download"
2. download by raw url: gitd -r -u "https://github.com/twfb/git-directory-download"
3. dowmload by parsing: gitd -p -u "https://github.com/twfb/git-directory-download"
4. download by raw url with proxy: gitd -r -u "https://github.com/twfb/git-directory-download" --proxy "socks5://127.0.0.1:7891"
svn export, as I didn't want a Subversion working copy. Then I added the resulting folder in Git. (I somehow lost a large piece of my directory tree, so I exported from the repo I forked.)
Import command. I am sure I'm providing the correct URL in a similar format as shown in the answer. I even tried using the visual inspector and selected the required folder (no URL typed) and the result is the same.
https://github.com/$organization/$repo/branches/$branch/$directory
repo/branches/foo_branch/bar_folder
you will receive a status 410, feature gone.