
How to download an entire directory and subdirectories using wget?

I am trying to download the files for a project using wget, as the SVN server for that project isn't running anymore and I am only able to access the files through a browser. The base URL for all the files is the same, like:

http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/*

How can I use wget (or any other similar tool) to download all the files in this repository, where the "tzivi" folder is the root folder and there are several files and sub-folders (up to 2 or 3 levels) under it?

You can't do that if the server has no web page listing all the links to the files you need.
Do you know the names of the files?
No, I don't know the names of all the files. I tried wget with the recursive option, but it didn't work either. Is that because the server doesn't have an index.html file that lists all the inner links?
Did you try the mirroring option of wget?

Anonymous

You may use this in shell:

wget -r --no-parent http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/

The Parameters are:

-r    // recursive download

and

--no-parent    // don't download anything from the parent directory

If you don't want to download the entire content, you may use:

-l1 just download the directory (tzivi in your case)

-l2 download the directory and all level-1 subfolders ('tzivi/something' but not 'tzivi/something/foo')

And so on. If you don't specify an -l option, wget uses -l 5 automatically.

If you pass -l 0, you'll download the whole Internet, because wget will follow every link it finds.
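For example, to pull the tzivi tree while going no more than two levels deep, a minimal sketch would look like the following (the depth of 2 and the --reject pattern are just illustrative assumptions; adjust them to how deep your tree actually goes):

# recurse two levels below tzivi/, never ascend to the parent, skip generated index pages
wget -r -l2 --no-parent --reject "index.html*" http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/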


Great, so to simplify for the next reader: wget -r -l1 --no-parent http://www.stanford.edu/~boyd/cvxbook/cvxbook_additional_exercises/ was the answer for me. Thanks for your answer.
I tried the above command to get all the files from http://websitename.com/wp-content/uploads/2009/05 but all I got was an index.html file which had nothing in it. I can't figure out what I missed.
I know this is quite old. But what I also found useful was the -e robots=off switch. ;)
Why don't you remove the "I forgot something important" and just fix the answer ???
We can use the -nH option with wget to prevent the hostname directory from being created by default inside the download directory.
andDevW

You can use this in a shell:

wget -r -nH --cut-dirs=7 --reject="index.html*" \
      http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/

The Parameters are:

-r recursive download

-nH (--no-host-directories) cuts out the hostname directory

--cut-dirs=X (cuts out the first X directory components of the remote path)
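In this URL, projects/tzivi/repository/revisions/2/raw/tzivi is exactly seven directory components, which is why --cut-dirs=7 makes the files land directly in the current directory instead of under that whole nested path. A rough sketch of the effect (the --no-parent and --reject flags here are my own additions, to keep wget from wandering upward and from saving index pages):

# without -nH and --cut-dirs, files would be saved under:
#   abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/<file>
# with -nH --cut-dirs=7 they are saved directly under the current directory:
#   ./<file>   (subfolders of tzivi/ still appear as ./subfolder/<file>)
wget -r -nH --cut-dirs=7 --no-parent --reject "index.html*" \
      http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/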

GMB

This gave me the best answer:

$ wget --no-clobber --convert-links --random-wait -r -p --level 1 -E -e robots=off -U mozilla http://base.site/dir/

Worked like a charm.
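For reference, here is my reading of what each flag does (this breakdown is not from the original answer, just the standard wget options):

# --no-clobber      skip files that already exist locally
# --convert-links   rewrite links in the saved pages so they work offline
# --random-wait     pause a random interval between requests
# -r -p --level 1   recurse one level deep and fetch page requisites (images, CSS, ...)
# -E                save HTML/CSS with matching extensions (same as --adjust-extension)
# -e robots=off     ignore robots.txt
# -U mozilla        send a browser-like User-Agent string
wget --no-clobber --convert-links --random-wait -r -p --level 1 -E -e robots=off -U mozilla http://base.site/dir/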


Where to use this code?
Sarkar_lat_2016
wget -r --no-parent URL --user=username --password=password

The last two options are needed only if the download requires a username and password; otherwise there is no need to use them.

You can also see more options at https://www.howtogeek.com/281663/how-to-use-wget-the-ultimate-command-line-downloading-tool/


Draken

Use the command

wget -m www.ilanni.com/nexus/content/
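-m (--mirror) turns on the options suited to mirroring; per the wget manual it is equivalent to -r -N -l inf --no-remove-listing, i.e. recursion with timestamping, infinite depth, and keeping FTP directory listings. Spelled out long-form, the same command would look like:

# equivalent to wget -m, with the mirroring options written explicitly
wget -r -N -l inf --no-remove-listing www.ilanni.com/nexus/content/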

baobab33

You can also use this command:

wget --mirror -pc --convert-links -P ./your-local-dir/ http://www.your-website.com

so that you get an exact mirror of the website you want to download.
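As I read it, -pc here combines -p (--page-requisites, fetch the images/CSS needed to render each page) and -c (--continue, resume partial downloads), while -P (--directory-prefix) sets where everything is saved. Written out long-form with the same placeholder URL:

# same behavior as the command above, just with the long option names
wget --mirror --page-requisites --continue --convert-links --directory-prefix=./your-local-dir/ http://www.your-website.com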


Android Cse

Try this working code (30-08-2021):

wget --no-clobber --convert-links --random-wait -r -p --level 1 -E -e robots=off --adjust-extension -U mozilla "your web directory URL, in quotes"

Hiep Luong

This works:

wget -m -np -c --no-check-certificate -R "index.html*" "https://the-eye.eu/public/AudioBooks/Edgar%20Allan%20Poe%20-%2"
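A quick breakdown of those flags (my annotation, not the original poster's; I've swapped in a placeholder URL because the one above is cut off):

# -m                      mirror: recursion + timestamping + infinite depth
# -np                     --no-parent: never ascend above the starting directory
# -c                      --continue: resume partially downloaded files
# --no-check-certificate  skip TLS certificate validation (use with care)
# -R "index.html*"        reject the auto-generated directory index pages
wget -m -np -c --no-check-certificate -R "index.html*" "https://example.com/some/dir/"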

CoderGuy123

This will help

wget -m -np -c --level 0 --no-check-certificate -R "index.html*" http://www.your-websitepage.com/dir

A little description of your suggested answer will be more helpful. Please read stackoverflow.com/help/how-to-answer