Are Git forks actually Git clones?

I keep hearing people say they're forking code in Git. Git "fork" sounds suspiciously like Git "clone" plus some (meaningless) psychological willingness to forgo future merges. There is no fork command in Git, right?

GitHub makes forks a little more real by stapling correspondence onto them. That is, you press the fork button and later, when you press the pull request button, the system is smart enough to email the owner. Hence, it's a little bit of a dance around repository ownership and permissions.

Yes/No? Any angst over GitHub extending Git in this direction? Or any rumors of Git absorbing the functionality?

Yeah, it is just a type of clone that is tracked by the GitHub database.
Doesn't GitHub do something special to avoid doubling the storage requirements (on GitHub's own servers)?
Not mentioned yet: Deleting a private repo deletes all its forks. Deleting a public repo keeps the forks but promotes one fork to be the new parent repo. If your boss makes your public repo private, it breaks all the existing forks and you won't be able to make pull requests from them to the private repo. help.github.com/articles/…
I believe (without proof, since GitHub does not show this to us) that the actual mechanism here is Git's "alternates". In other words, the fork is a mirror clone made with --reference. Exactly how public repos and deletions are handled is not at all clear (move the alternates to a randomly chosen promoted repo? point all forks to some common alternate that's not part of the original fork?), but the use of alternates explains various observable behaviors.
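Nothing in this thread confirms how GitHub actually stores forks. As a sketch of the alternates idea in the comment above, plain Git can create a mirror clone that borrows its objects from another repository via `--reference`, instead of copying them. The repository names `upstream` and `fork.git` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a mirror clone that borrows objects through Git "alternates".
# Repository names are hypothetical; this is plain Git, not GitHub's internals.
set -euo pipefail
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Stand-in for the repository being forked.
git init -q upstream
git -C upstream commit -q --allow-empty -m "initial commit"

# "Fork" it: a mirror clone that reuses upstream's object store.
# A file:// URL avoids the local-clone shortcut that hardlinks every object.
git clone -q --mirror --reference "$WORK/upstream" "file://$WORK/upstream" fork.git

# The fork records where its borrowed objects live.
cat fork.git/objects/info/alternates
```

With alternates, a new fork costs little more than its refs until it diverges, which would fit the storage concern raised in the comments.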

VonC

Fork, in the GitHub context, doesn't extend Git.
It only allows cloning on the server side.

When you clone a GitHub repository on your local workstation, you cannot contribute back to the upstream repository unless you are explicitly declared as "contributor". That's because your clone is a separate instance of that project. If you want to contribute to the project, you can use forking to do it, in the following way:

clone that GitHub repository into your GitHub account (that is the "fork" part, a clone on the server side)

contribute commits to that GitHub repository (it is in your own GitHub account, so you have every right to push to it)

signal any interesting contribution back to the original GitHub repository (that is the "pull request" part, based on the changes you made in your own GitHub repository)
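To make the three steps concrete without a GitHub account, here is a sketch that plays them out with plain Git, using local bare repositories as stand-ins for the two GitHub repositories (the names `owner`, `server/upstream.git`, and `server/you.git` are hypothetical). The GitHub-specific parts, the Fork button and the pull request notification, have no Git equivalent here; step 3 simply has the owner pull from your fork's URL.

```shell
#!/usr/bin/env bash
# Sketch: fork -> push -> "pull request", simulated with local bare repos
# standing in for GitHub repositories (all names are hypothetical).
set -euo pipefail
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"
mkdir server

# The owner's project, published as server/upstream.git.
git init -q owner
git -C owner symbolic-ref HEAD refs/heads/main
git -C owner commit -q --allow-empty -m "initial commit"
git clone -q --bare owner server/upstream.git

# Step 1, the "fork": a server-side copy in an account you control.
git clone -q --bare server/upstream.git server/you.git

# Step 2: clone your fork, commit, and push; you own it, so the push is allowed.
git clone -q server/you.git work
git -C work commit -q --allow-empty -m "my contribution"
git -C work push -q origin main

# Step 3: the owner decides to pull from your fork, then publishes the result.
git -C owner pull -q --ff-only "$WORK/server/you.git" main
git -C owner push -q "$WORK/server/upstream.git" main
git -C owner log --oneline
```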

See also "Collaborative GitHub Workflow".

If you want to keep a link with the original repository (also called upstream), you need to add a remote referring to that original repository.
See "What is the difference between origin and upstream on GitHub?"

https://i.stack.imgur.com/cEJjT.png
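Adding that second remote is a one-time `git remote add`; after that, syncing is an ordinary fetch and merge. A sketch with local bare repositories standing in for GitHub (all repository names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: syncing a local clone of your fork with the original ("upstream")
# repository. Local bare repos stand in for GitHub; names are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed upstream.git        # the original repository
git clone -q --bare upstream.git fork.git    # your fork of it
git clone -q fork.git work                   # your local clone; origin = fork

# The original project moves on after you forked it.
git -C seed commit -q --allow-empty -m "new upstream commit"
git -C seed push -q "$WORK/upstream.git" main

# Add the original as a second remote and bring its changes in.
git -C work remote add upstream "$WORK/upstream.git"
git -C work fetch -q upstream
git -C work merge -q --ff-only upstream/main

# Optionally publish the updated branch to your fork (origin).
git -C work push -q origin main
git -C work remote -v
```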

Since Git 2.20 (Q4 2018), fetching from a fork is more efficient, thanks to delta islands.
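Delta islands are a server-side packing option, so there is nothing to change on the client. The sketch below shows what a server holding several forks in one repository might configure. The `refs/virtual/<N>/heads/` layout follows the example in Git's pack-objects documentation and is an assumption, not a description of GitHub's actual storage. The idea is that objects are only stored as deltas against bases that every fork containing them also has, so those deltas can be reused when serving any single fork.

```shell
#!/usr/bin/env bash
# Sketch: repacking a shared "fork network" repository with delta islands.
# The refs/virtual/<N>/heads/ layout is a hypothetical example layout.
set -euo pipefail
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

git init -q network
git -C network commit -q --allow-empty -m "shared history"

# Two hypothetical forks, each with its branches under its own prefix.
base="$(git -C network rev-parse HEAD)"
git -C network update-ref refs/virtual/1/heads/main "$base"
git -C network update-ref refs/virtual/2/heads/main "$base"

# One island per fork number, taken from the regex capture group.
git -C network config --add pack.island 'refs/virtual/([0-9]+)/heads/'

# Repack everything (-a), drop redundant packs and loose objects (-d),
# and honor the islands (-i, i.e. --delta-islands, Git 2.20+).
git -C network repack -q -a -d -i
git -C network count-objects -v
```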


"When you are cloning a GitHub repo on your local workstation, you cannot contribute back to the upstream repo unless you are explicitly declared as "contributor"." --- Is this not true with "forking"? Please explain.
@TestSubject528491 no, with a fork, that means you are cloning the upstream repo as your own repo on the GitHub server side. Then you can locally clone that new "fork" repo on your computer and freely push back on it, since you are the creator and owner of that fork.
To me, the key point is that you can't submit a PR from your local copy unless you're declared to be a contributor. I'm so used to submitting PRs from my local repo, but that's because I'm always marked as a contributor. If you think about it, to submit a PR you have to push a branch to the remote repo and then create the PR. I guess it makes sense if you don't want random people creating branches on your repo. And that you'd prefer them to fork it and submit PRs that way instead.
Is the fork a clone --bare or a clone --mirror?
@theonlygusti mirror on the server side (GitHub).
Al Sweigart

I keep hearing people say they're forking code in git. Git "fork" sounds suspiciously like git "clone" plus some (meaningless) psychological willingness to forgo future merges. There is no fork command in git, right?

"Forking" is a concept, not a command specifically supported by any version control system.

The simplest kind of forking is synonymous with branching. Every time you create a branch, regardless of your VCS, you've "forked". These forks are usually pretty easy to merge back together.

The kind of fork you're talking about, where a separate party takes a complete copy of the code and walks away, necessarily happens outside the VCS in a centralized system like Subversion. A distributed VCS like Git has much better support for forking the entire codebase and effectively starting a new project.

Git (not GitHub) natively supports "forking" an entire repo (i.e., cloning it) in a couple of ways:

when you clone, a remote called origin is created for you

by default all the branches in the clone will track their origin equivalents

fetching and merging changes from the original project you forked from is trivially easy
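Each of these three points can be seen with a plain local clone. A sketch, where the repository names `original` and `copy` are hypothetical and a local path stands in for a real URL:

```shell
#!/usr/bin/env bash
# Sketch: what "git clone" sets up for you, and pulling from the original later.
# Repository names are hypothetical; a local path stands in for a URL.
set -euo pipefail
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

git init -q original
git -C original symbolic-ref HEAD refs/heads/main
git -C original commit -q --allow-empty -m "initial commit"

git clone -q original copy
git -C copy remote                                    # prints: origin
git -C copy rev-parse --abbrev-ref 'main@{upstream}'  # prints: origin/main

# The original project gets a new commit after you "forked" it.
git -C original commit -q --allow-empty -m "later upstream commit"

# Fetch and merge it into your copy (git pull does both).
git -C copy pull -q --ff-only
git -C copy log -1 --format=%s                        # prints: later upstream commit
```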

Git makes contributing changes back to the source of the fork as simple as asking someone from the original project to pull from you, or requesting write access to push changes back yourself. This is the part that GitHub makes easier, and standardizes.

Any angst over Github extending git in this direction? Or any rumors of git absorbing the functionality?

There is no angst because your assumption is wrong. GitHub "extends" the forking functionality of Git with a nice GUI and a standardized way of issuing pull requests, but it doesn't add the functionality to Git. The concept of full-repo-forking is baked right into distributed version control at a fundamental level. You could abandon GitHub at any point and still continue to push/pull projects you've "forked".


Thanks for your excellent answer. I just want to clarify: this means that, outside the context of GitHub, I could clone some project X on my machine. If I make changes locally and don't have write access to origin, I email the author of the project to request a pull. He then adds a remote called gideon whose URL points to my local clone, and he can pull from it, right?
If you want to contribute your changes to a project, you can either save them into files, e.g. using git format-patch, and attach them to an email to someone who has write access, or you can obtain your own hosting, push your work to it, and send the URL in an email, e.g. using the git request-pull command. Repos on workstations are not usually directly accessible online.
But yes, if your workstation happens to be accessible over the internet to the author of the project then you can simply send the URL to them and they can add it as a remote and pull from it.
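Both routes from these replies can be sketched with local directories standing in for the maintainer's machine and yours. The remote name `gideon` comes from the question above; the repository names `project` and `mine` are hypothetical. The pull route runs first here so that the patch route applies cleanly on top of it.

```shell
#!/usr/bin/env bash
# Sketch: contributing without push access, via (A) the maintainer pulling
# from your reachable clone and (B) a patch file applied with git am.
# Local directories stand in for two machines; names are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# The maintainer's project.
git init -q project
git -C project symbolic-ref HEAD refs/heads/main
echo "hello" > project/README
git -C project add README
git -C project commit -q -m "Add README"

# Your clone, where you have no write access to origin.
git clone -q project mine
echo "world" >> mine/README
git -C mine commit -q -am "Add a second line"

# Route A: your repository is reachable, so the maintainer adds it
# as a remote named "gideon" and pulls from it.
git -C project remote add gideon "$WORK/mine"
git -C project fetch -q gideon
git -C project merge -q --ff-only gideon/main

# Route B: export your newest commit as a mail-ready patch file...
echo "again" >> mine/README
git -C mine commit -q -am "Add a third line"
git -C mine format-patch -q -1 -o "$WORK/patches"
# ...which the maintainer applies with git am.
git -C project am -q "$WORK"/patches/*.patch
cat project/README
```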
Re: angst, the only one I have is that there's no link or button to create a pull request from my repo's perspective (pulling upstream into my fork), even though GitHub tells you you're 50 commits behind. No biggie now that I know they also use the term "Pull Request" for requests to pull from the upstream into your GitHub fork. Git is hard.
Peter Mortensen

Yes, a fork is a clone. It emerged because you cannot push to others' copies without their permission. They make a copy of it for you (the fork), where you have write permission as well.

In the future, if the actual owner or other users with a fork like your changes, they can pull them back into their own repository. Alternatively, you can send them a "pull request".


Can I simply clone the repository to my local machine, create a branch, then submit a pull request to the original owner? It seems redundant to have multiple copies of repos hosted all over GitHub, just to facilitate code updates.
@Casey You can only send a pull request through GitHub from GitHub itself and you can only send a GitHub pull request from a branch that exists on GitHub. If you are not a collaborator on the Repository in question, there is no way for you to create a branch from which you can initiate a GitHub pull request. Nothing stops you from doing it via email the old fashioned way, but GitHub plays no part in that.
@Casey, one reason is that others normally do not have URL access to your workstation. The GitHub fork means there is a copy of your work on the GitHub server that you can push to and that others do have URL access to, so they can pull. The pull request is just a standard way of getting the URL for your copy (up on GitHub) to them so they can easily pull it into their repository.
This should be the correct/accepted answer, I believe. Imagine the mess when a team of 15-20 developers all create branches and push them to origin, compared with 15-20 developers each having their own copy of the same repository, making as many branches and changes as they like, and pushing to that copy. The author of the original repository can then pull only the changes they want.
Daenyth

"Fork" in this context means "Make a copy of their code so that I can add my own modifications". There's not much else to say. Every clone is essentially a fork, and it's up to the original to decide whether to pull the changes from the fork.


In specific: "Make a copy of their code on the GitHub server so that I can add my own modifications and others can have URL access to my version". Most local workstations do not offer URL access for anyone to be able to pull. But if you push to your fork on the server, then they can have the URL for the pull.
The question is not about forking in general, but about GitHub forking specifically.
Sam Johnson

Cloning involves making a copy of the Git repository on a local machine, while forking is cloning the repository into another repository. Cloning is for personal use only (although future merges may occur), but with forking you are copying the project and opening a possible new project path.


Daniel Shen

I think a fork is a copy of another repository, but tied to your account. For example, if you directly clone someone else's repository locally, the remote called origin still points at the account you cloned from, so you can't push your commits there and contribute your code; it is just a plain copy of the code. If instead you fork a repository, it is cloned into your own GitHub account. Then, when you clone that repository in the context of your account, you can commit and push your code.


aliasav

Forking is done when you decide to contribute to some project. You make a copy of the entire project along with its history. This copy is made entirely in your own account, and once you have made your changes there, you issue a pull request. Now it's up to the owner of the source to accept your pull request and incorporate the changes into the original code.

git clone is an actual command that allows users to get a copy of the source. Running git clone [URL] should create a copy of [URL] as your own local repository.


Peter Mortensen

There is a misunderstanding here with respect to what a "fork" is. A fork is in fact nothing more than a set of per-user branches. When you push to a fork you actually do push to the original repository, because that is the ONLY repository.

You can try this out: push to a fork, note the commit ID, then go to the original repository and look up that commit ID. You'll see that the commit is "in" the original repository.

This makes a lot of sense, but it is far from obvious (I only discovered this accidentally recently).
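The experiment above relies on GitHub's web UI. With plain Git, `git cat-file -e <sha>` reports whether a repository holds a given object. In the sketch below (repository names hypothetical), two bare repositories made with an ordinary `git clone` do not share storage, so a commit pushed to the copy is absent from the original. If GitHub shows such a commit "in" the original, that suggests something GitHub does on its servers rather than anything Git does by itself.

```shell
#!/usr/bin/env bash
# Sketch: checking whether a commit object exists in a repository, using two
# independent bare repos (plain clone, no shared storage). Names are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed original.git
git clone -q --bare original.git copy.git

# Push a new commit to the copy only.
git clone -q copy.git work
git -C work commit -q --allow-empty -m "pushed to the copy"
git -C work push -q origin main
sha="$(git -C work rev-parse HEAD)"

# git cat-file -e exits 0 only if the object exists in that repository.
if git -C copy.git cat-file -e "$sha"; then echo "copy: has $sha"; fi
if ! git -C original.git cat-file -e "$sha"; then echo "original: missing $sha"; fi
```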

When John forks repository SuperProject, what seems to actually happen is that all branches in the source repository are replicated with names like "John.master", "John.new_gui_project", etc.

GitHub "hides" the "John." from us and gives us the illusion we have our own "copy" of the repository on GitHub, but we don't and nor is one even needed.

So my fork's branch "master" is actually named "Korporal.master", but the GitHub UI never reveals this, showing me only "master".

This is pretty much what I think goes on under the hood, based on things I've been doing recently, and when you ponder it, it is very good design.

For this reason I think it would be very easy for Microsoft to implement Git forks in their Visual Studio Team Services offering.


Dear Hugh, half of your response is actually incorrect -- a fork is a clone of a whole repository from one user account to another user account, together with all the branches and history. When you commit to the fork, nothing changes in the original repository from which you forked. But besides these few misunderstandings on your part regarding what a "fork" is, there is some good news: Visual Studio Team Services now includes a "Fork" feature. ;)
@SorinPostelnicu source? I'm inclined to believe Hugh here due to personal experience of forks behaving in ways that are inconsistent with them being a simple clone of the repository. For example, when upstream is deleted, forks are deleted (as was mentioned in a comment on the OP's question), and sometimes upstream has wound up merging things into branches of my forks when accepting a pull request, without me doing anything.
Indeed this appears to be the case. After all it would be incredibly stupid for GitHub to literally git clone a whole new repository (even a "bare" one) every time someone pushes the "fork" button -- that would be an incredible waste of storage, and likely an attack vector as well.
Peter Mortensen

Apart from the fact that cloning is from server to your machine and forking is making a copy on the server itself, an important difference is that when we clone, we actually get all the branches, labels, etc.

But when we fork, we actually only get the current files in the master branch, nothing other than that. This means we don't get the other branches, etc.

Hence if you have to merge something back to the original repository, it is an inter-repository merge and will definitely need higher privileges.

Fork is not a command in Git; it is just a concept which GitHub implements. Remember, Git was designed to work in a peer-to-peer environment without the need to synchronize stuff with any master copy. The server is just another peer, but we look at it as a master copy.


Huh? A fork gets all the branches, though you have to know where to look (hint: git branch -a).
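A plain `git clone` illustrates this comment's point: every branch arrives, but only the default branch gets a local branch; the rest show up as remote-tracking branches under `remotes/origin/`. This sketch uses hypothetical repository names and does not model how GitHub's fork button copies branches.

```shell
#!/usr/bin/env bash
# Sketch: a clone receives every branch, but as remote-tracking branches.
# Repository names are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

git init -q original
git -C original symbolic-ref HEAD refs/heads/main
git -C original commit -q --allow-empty -m "initial commit"
git -C original branch feature

git clone -q original copy
git -C copy branch      # local branches: only main
git -C copy branch -a   # also lists remotes/origin/feature and remotes/origin/main

# Checking out the name creates a local branch that tracks origin/feature.
git -C copy checkout -q feature
git -C copy rev-parse --abbrev-ref 'feature@{upstream}'  # prints: origin/feature
```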
Peter Mortensen

In simplest terms,

When you say you are forking a repository, you are basically creating a copy of the original repository under your GitHub ID in your GitHub account.

and

When you say you are cloning a repository, you are creating a local copy of the original repository in your system (PC/laptop) directly without having a copy in your GitHub account.