What is the difference between a thread and a fiber? I've heard of fibers from ruby and I've read heard they're available in other languages, could somebody explain to me in simple terms what is the difference between a thread and a fiber.
In the most simple terms, threads are generally considered to be preemptive (although this may not always be true, depending on the operating system) while fibers are considered to be light-weight, cooperative threads. Both are separate execution paths for your application.
With threads: the current execution path may be interrupted or preempted at any time (note: this statement is a generalization and may not always hold true depending on OS/threading package/etc.). This means that for threads, data integrity is a big issue because one thread may be stopped in the middle of updating a chunk of data, leaving the integrity of the data in a bad or incomplete state. This also means that the operating system can take advantage of multiple CPUs and CPU cores by running more than one thread at the same time and leaving it up to the developer to guard data access.
With fibers: the current execution path is only interrupted when the fiber yields execution (same note as above). This means that fibers always start and stop in well-defined places, so data integrity is much less of an issue. Also, because fibers are often managed in the user space, expensive context switches and CPU state changes need not be made, making changing from one fiber to the next extremely efficient. On the other hand, since no two fibers can run at exactly the same time, just using fibers alone will not take advantage of multiple CPUs or multiple CPU cores.
Threads use pre-emptive scheduling, whereas fibers use cooperative scheduling.
With a thread, the control flow could get interrupted at any time, and another thread can take over. With multiple processors, you can have multiple threads all running at the same time (simultaneous multithreading, or SMT). As a result, you have to be very careful about concurrent data access, and protect your data with mutexes, semaphores, condition variables, and so on. It is often very tricky to get right.
With a fiber, control only switches when you tell it to, typically with a function call named something like yield()
. This makes concurrent data access easier, since you don't have to worry about atomicity of data structures or mutexes. As long as you don't yield, there's no danger of being preempted and having another fiber trying to read or modify the data you're working with. As a result, though, if your fiber gets into an infinite loop, no other fiber can run, since you're not yielding.
You can also mix threads and fibers, which gives rise to the problems faced by both. Not recommended, but it can sometimes be the right thing to do if done carefully.
In Win32, a fiber is a sort of user-managed thread. A fiber has its own stack and its own instruction pointer etc., but fibers are not scheduled by the OS: you have to call SwitchToFiber explicitly. Threads, by contrast, are pre-emptively scheduled by the operation system. So roughly speaking a fiber is a thread that is managed at the application/runtime level rather than being a true OS thread.
The consequences are that fibers are cheaper and that the application has more control over scheduling. This can be important if the app creates a lot of concurrent tasks, and/or wants to closely optimise when they run. For example, a database server might choose to use fibers rather than threads.
(There may be other usages for the same term; as noted, this is the Win32 definition.)
First I would recommend reading this explanation of the difference between processes and threads as background material.
Once you've read that it's pretty straight forward. Threads cans be implemented either in the kernel, in user space, or the two can be mixed. Fibers are basically threads implemented in user space.
What is typically called a thread is a thread of execution implemented in the kernel: what's known as a kernel thread. The scheduling of a kernel thread is handled exclusively by the kernel, although a kernel thread can voluntarily release the CPU by sleeping if it wants. A kernel thread has the advantage that it can use blocking I/O and let the kernel worry about scheduling. It's main disadvantage is that thread switching is relatively slow since it requires trapping into the kernel.
Fibers are user space threads whose scheduling is handled in user space by one or more kernel threads under a single process. This makes fiber switching very fast. If you group all the fibers accessing a particular set of shared data under the context of a single kernel thread and have their scheduling handled by a single kernel thread, then you can eliminate synchronization issues since the fibers will effectively run in serial and you have complete control over their scheduling. Grouping related fibers under a single kernel thread is important, since the kernel thread they are running in can be pre-empted by the kernel. This point is not made clear in many of the other answers. Also, if you use blocking I/O in a fiber, the entire kernel thread it is a part of blocks including all the fibers that are part of that kernel thread.
In section 11.4 "Processes and Threads in Windows Vista" in Modern Operating Systems, Tanenbaum comments:
Although fibers are cooperatively scheduled, if there are multiple threads scheduling the fibers, a lot of careful synchronization is required to make sure fibers do not interfere with each other. To simplify the interaction between threads and fibers, it is often useful to create only as many threads as there are processors to run them, and affinitize the threads to each run only on a distinct set of available processors, or even just one processor. Each thread can then run a particular subset of the fibers, establishing a one to-many relationship between threads and fibers which simplifies synchronization. Even so there are still many difficulties with fibers. Most Win32 libraries are completely unaware of fibers, and applications that attempt to use fibers as if they were threads will encounter various failures. The kernel has no knowledge of fibers, and when a fiber enters the kernel, the thread it is executing on may block and the kernel will schedule an arbitrary thread on the processor, making it unavailable to run other fibers. For these reasons fibers are rarely used except when porting code from other systems that explicitly need the functionality provided by fibers.
Note that in addition to Threads and Fibers, Windows 7 introduces User-Mode Scheduling:
User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. An application can switch between UMS threads in user mode without involving the system scheduler and regain control of the processor if a UMS thread blocks in the kernel. UMS threads differ from fibers in that each UMS thread has its own thread context instead of sharing the thread context of a single thread. The ability to switch between threads in user mode makes UMS more efficient than thread pools for managing large numbers of short-duration work items that require few system calls.
More information about threads, fibers and UMS is available by watching Dave Probert: Inside Windows 7 - User Mode Scheduler (UMS).
Threads were originally created as lightweight processes. In a similar fashion, fibers are a lightweight thread, relying (simplistically) on the fibers themselves to schedule each other, by yielding control.
I guess the next step will be strands where you have to send them a signal every time you want them to execute an instruction (not unlike my 5yo son :-). In the old days (and even now on some embedded platforms), all threads were fibers, there was no pre-emption and you had to write your threads to behave nicely.
Threads are scheduled by the OS (pre-emptive). A thread may be stopped or resumed at any time by the OS, but fibers more or less manage themselves (co-operative) and yield to each other. That is, the programmer controls when fibers do their processing and when that processing switches to another fiber.
Threads generally rely on the kernel to interrupt the thread so it or another thread can run (which is better known as Pre-emptive multitasking) whereas fibers use co-operative multitasking where it is the fiber itself that give up the its running time so that other fibres can run.
Some useful links explaining it better than I probably did are:
http://en.wikipedia.org/wiki/Fiber_(computer_science)
http://en.wikipedia.org/wiki/Computer_multitasking#Cooperative_multitasking.2Ftime-sharing
http://en.wikipedia.org/wiki/Pre-emptive_multitasking
Win32 fiber definition is in fact "Green Thread" definition established at Sun Microsystems. There is no need to waste the term fiber on the thread of some kind, i.e., a thread executing in user space under user code/thread-library control.
To clarify the argument look at the following comments:
With hyper-threading, a multi-core CPU can accept multiple threads and distribute them one on each core.
Superscalar pipelined CPU accepts one thread for execution and uses Instruction Level Parallelism (ILP) to run the thread faster. We may assume that one thread is broken into parallel fibers running in parallel pipelines.
SMT CPU can accept multiple threads and break them into instruction fibers for parallel execution on multiple pipelines, using pipelines more efficiently.
We should assume that processes are made of threads and that threads should be made of fibers. With that logic in mind, using fibers for other sorts of threads is wrong.
Success story sharing