In what segment (.BSS, .DATA, other) of an executable file are static variables stored so that they don't have name collision? For example:
foo.c: bar.c:
static int foo = 1; static int foo = 10;
void fooTest() { void barTest() {
static int bar = 2; static int bar = 20;
foo++; foo++;
bar++; bar++;
printf("%d,%d", foo, bar); printf("%d, %d", foo, bar);
} }
If I compile both files and link it to a main that calls fooTest() and barTest repeatedly, the printf statements increment independently. Makes sense since the foo and bar variables are local to the translation unit.
But where is the storage allocated?
To be clear, the assumption is that you have a toolchain that would output a file in ELF format. Thus, I believe that there has to be some space reserved in the executable file for those static variables. For discussion purposes, lets assume we use the GCC toolchain.
Where your statics go depends on whether they are zero-initialized. zero-initialized static data goes in .BSS (Block Started by Symbol), non-zero-initialized data goes in .DATA
When a program is loaded into memory, it’s organized into different segments. One of the segment is DATA segment. The Data segment is further sub-divided into two parts:
Initialized data segment: All the global, static and constant data are stored here.
Uninitialized data segment (BSS): All the uninitialized data are stored in this segment.
Here is a diagram to explain this concept:
https://i.stack.imgur.com/JQjKp.png
Here is very good link explaining these concepts: Memory Management in C: The Heap and the Stack
In fact, a variable is tuple (storage, scope, type, address, value):
storage : where is it stored, for example data, stack, heap...
scope : who can see us, for example global, local...
type : what is our type, for example int, int*...
address : where are we located
value : what is our value
Local scope could mean local to either the translational unit (source file), the function or the block depending on where its defined. To make variable visible to more than one function, it definitely has to be in either DATA or the BSS area (depending on whether its initialized explicitly or not, respectively). Its then scoped accordingly to either all function(s) or function(s) within source file.
The storage location of the data will be implementation dependent.
However, the meaning of static is "internal linkage". Thus, the symbol is internal to the compilation unit (foo.c, bar.c) and cannot be referenced outside that compilation unit. So, there can be no name collisions.
in the "global and static" area :)
There are several memory areas in C++:
heap
free store
stack
global & static
const
See here for a detailed answer to your question:
The following summarizes a C++ program's major distinct memory areas. Note that some of the names (e.g., "heap") do not appear as such in the draft [standard].
Memory Area Characteristics and Object Lifetimes
-------------- ------------------------------------------------
Const Data The const data area stores string literals and
other data whose values are known at compile
time. No objects of class type can exist in
this area. All data in this area is available
during the entire lifetime of the program.
Further, all of this data is read-only, and the
results of trying to modify it are undefined.
This is in part because even the underlying
storage format is subject to arbitrary
optimization by the implementation. For
example, a particular compiler may store string
literals in overlapping objects if it wants to.
Stack The stack stores automatic variables. Typically
allocation is much faster than for dynamic
storage (heap or free store) because a memory
allocation involves only pointer increment
rather than more complex management. Objects
are constructed immediately after memory is
allocated and destroyed immediately before
memory is deallocated, so there is no
opportunity for programmers to directly
manipulate allocated but uninitialized stack
space (barring willful tampering using explicit
dtors and placement new).
Free Store The free store is one of the two dynamic memory
areas, allocated/freed by new/delete. Object
lifetime can be less than the time the storage
is allocated; that is, free store objects can
have memory allocated without being immediately
initialized, and can be destroyed without the
memory being immediately deallocated. During
the period when the storage is allocated but
outside the object's lifetime, the storage may
be accessed and manipulated through a void* but
none of the proto-object's nonstatic members or
member functions may be accessed, have their
addresses taken, or be otherwise manipulated.
Heap The heap is the other dynamic memory area,
allocated/freed by malloc/free and their
variants. Note that while the default global
new and delete might be implemented in terms of
malloc and free by a particular compiler, the
heap is not the same as free store and memory
allocated in one area cannot be safely
deallocated in the other. Memory allocated from
the heap can be used for objects of class type
by placement-new construction and explicit
destruction. If so used, the notes about free
store object lifetime apply similarly here.
Global/Static Global or static variables and objects have
their storage allocated at program startup, but
may not be initialized until after the program
has begun executing. For instance, a static
variable in a function is initialized only the
first time program execution passes through its
definition. The order of initialization of
global variables across translation units is not
defined, and special care is needed to manage
dependencies between global objects (including
class statics). As always, uninitialized proto-
objects' storage may be accessed and manipulated
through a void* but no nonstatic members or
member functions may be used or referenced
outside the object's actual lifetime.
How to find it yourself with objdump -Sr
To actually understand what is going on, you must understand linker relocation. If you've never touched that, consider reading this post first.
Let's analyze a Linux x86-64 ELF example to see it ourselves:
#include <stdio.h>
int f() {
static int i = 1;
i++;
return i;
}
int main() {
printf("%d\n", f());
printf("%d\n", f());
return 0;
}
Compile with:
gcc -ggdb -c main.c
Decompile the code with:
objdump -Sr main.o
-S decompiles the code with the original source intermingled
-r shows relocation information
Inside the decompilation of f
we see:
static int i = 1;
i++;
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
6: R_X86_64_PC32 .data-0x4
and the .data-0x4
says that it will go to the first byte of the .data
segment.
The -0x4
is there because we are using RIP relative addressing, thus the %rip
in the instruction and R_X86_64_PC32
.
It is required because RIP points to the following instruction, which starts 4 bytes after 00 00 00 00
which is what will get relocated. I have explained this in more detail at: https://stackoverflow.com/a/30515926/895245
Then, if we modify the source to i = 1
and do the same analysis, we conclude that:
static int i = 0 goes on .bss
static int i = 1 goes on .data
I don't believe there will be a collision. Using static at the file level (outside functions) marks the variable as local to the current compilation unit (file). It's never visible outside the current file so never has to have a name that can be used externally.
Using static inside a function is different - the variable is only visible to the function (whether static or not), it's just its value is preserved across calls to that function.
In effect, static does two different things depending on where it is. In both cases however, the variable visibility is limited in such a way that you can easily prevent namespace clashes when linking.
Having said that, I believe it would be stored in the DATA
section, which tends to have variables that are initialized to values other than zero. This is, of course, an implementation detail, not something mandated by the standard - it only cares about behaviour, not how things are done under the covers.
So, how does a segment of memory (Data Segment) store variables that can be accessed from everywhere (global variables) and also those which have limited scope (file scope or function scope in case of static variables)?
This is how (easy to understand):
https://bayanbox.ir/view/581244719208138556/virtual-memory.jpg
It depends on the platform and compiler that you're using. Some compilers store directly in the code segment. Static variables are always only accessible to the current translation unit and the names are not exported thus the reason name collisions never occur.
Data declared in a compilation unit will go into the .BSS or the .Data of that files output. Initialised data in BSS, uninitalised in DATA.
The difference between static and global data comes in the inclusion of symbol information in the file. Compilers tend to include the symbol information but only mark the global information as such.
The linker respects this information. The symbol information for the static variables is either discarded or mangled so that static variables can still be referenced in some way (with debug or symbol options). In neither case can the compilation units gets affected as the linker resolves local references first.
I tried it with objdump and gdb, here is the result what I get:
(gdb) disas fooTest
Dump of assembler code for function fooTest:
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: mov 0x200b09(%rip),%eax # 0x601040 <foo>
0x0000000000400537 <+10>: add $0x1,%eax
0x000000000040053a <+13>: mov %eax,0x200b00(%rip) # 0x601040 <foo>
0x0000000000400540 <+19>: mov 0x200afe(%rip),%eax # 0x601044 <bar.2180>
0x0000000000400546 <+25>: add $0x1,%eax
0x0000000000400549 <+28>: mov %eax,0x200af5(%rip) # 0x601044 <bar.2180>
0x000000000040054f <+34>: mov 0x200aef(%rip),%edx # 0x601044 <bar.2180>
0x0000000000400555 <+40>: mov 0x200ae5(%rip),%eax # 0x601040 <foo>
0x000000000040055b <+46>: mov %eax,%esi
0x000000000040055d <+48>: mov $0x400654,%edi
0x0000000000400562 <+53>: mov $0x0,%eax
0x0000000000400567 <+58>: callq 0x400410 <printf@plt>
0x000000000040056c <+63>: pop %rbp
0x000000000040056d <+64>: retq
End of assembler dump.
(gdb) disas barTest
Dump of assembler code for function barTest:
0x000000000040056e <+0>: push %rbp
0x000000000040056f <+1>: mov %rsp,%rbp
0x0000000000400572 <+4>: mov 0x200ad0(%rip),%eax # 0x601048 <foo>
0x0000000000400578 <+10>: add $0x1,%eax
0x000000000040057b <+13>: mov %eax,0x200ac7(%rip) # 0x601048 <foo>
0x0000000000400581 <+19>: mov 0x200ac5(%rip),%eax # 0x60104c <bar.2180>
0x0000000000400587 <+25>: add $0x1,%eax
0x000000000040058a <+28>: mov %eax,0x200abc(%rip) # 0x60104c <bar.2180>
0x0000000000400590 <+34>: mov 0x200ab6(%rip),%edx # 0x60104c <bar.2180>
0x0000000000400596 <+40>: mov 0x200aac(%rip),%eax # 0x601048 <foo>
0x000000000040059c <+46>: mov %eax,%esi
0x000000000040059e <+48>: mov $0x40065c,%edi
0x00000000004005a3 <+53>: mov $0x0,%eax
0x00000000004005a8 <+58>: callq 0x400410 <printf@plt>
0x00000000004005ad <+63>: pop %rbp
0x00000000004005ae <+64>: retq
End of assembler dump.
here is the objdump result
Disassembly of section .data:
0000000000601030 <__data_start>:
...
0000000000601038 <__dso_handle>:
...
0000000000601040 <foo>:
601040: 01 00 add %eax,(%rax)
...
0000000000601044 <bar.2180>:
601044: 02 00 add (%rax),%al
...
0000000000601048 <foo>:
601048: 0a 00 or (%rax),%al
...
000000000060104c <bar.2180>:
60104c: 14 00 adc $0x0,%al
So, that's to say, your four variables are located in data section event the the same name, but with different offset.
static variable stored in data segment or code segment as mentioned before.
You can be sure that it will not be allocated on stack or heap.
There is no risk for collision since static
keyword define the scope of the variable to be a file or function, in case of collision there is a compiler/linker to warn you about.
A nice example
The answer might very well depend on the compiler, so you probably want to edit your question (I mean, even the notion of segments is not mandated by ISO C nor ISO C++). For instance, on Windows an executable doesn't carry symbol names. One 'foo' would be offset 0x100, the other perhaps 0x2B0, and code from both translation units is compiled knowing the offsets for "their" foo.
Well this question is bit too old, but since nobody points out any useful information: Check the post by 'mohit12379' explaining the store of static variables with same name in the symbol table: http://www.geekinterview.com/question_details/24745
they're both going to be stored independently, however if you want to make it clear to other developers you might want to wrap them up in namespaces.
you already know either it store in bss(block start by symbol) also referred as uninitialized data segment or in initialized data segment.
lets take an simple example
void main(void)
{
static int i;
}
the above static variable is not initialized , so it goes to uninitialized data segment(bss).
void main(void)
{
static int i=10;
}
and of course it initialized by 10 so it goes to initialized data segment.
Success story sharing
.data
, and static data with no initializer went in.bss
.