From what I understand, a cache is an encrypted file of similar files.
What do we do with the __pycache__
folder? Is it what we give to people instead of our source code? Is it just my input data? This folder keeps getting created, what it is for?
Python 3.8
you can use an environment variable to change the location for the annoying cache directories: stackoverflow.com/a/57414308/1612318
When you run a program in Python, the interpreter compiles it to bytecode first (this is an oversimplification) and stores it in the __pycache__
folder. If you look in there you will find a bunch of files sharing the names of the .py
files in your project's folder, only their extensions will be either .pyc
or .pyo
. These are bytecode-compiled and optimized bytecode-compiled versions of your program's files, respectively.
As a programmer, you can largely just ignore it... All it does is make your program start a little faster. When your scripts change, they will be recompiled, and if you delete the files or the whole folder and run your program again, they will reappear (unless you specifically suppress that behavior).
When you're sending your code to other people, the common practice is to delete that folder, but it doesn't really matter whether you do or don't. When you're using version control (git
), this folder is typically listed in the ignore file (.gitignore
) and thus not included.
If you are using CPython (which is the most common, as it's the reference implementation) and you don't want that folder, then you can suppress it by starting the interpreter with the -B flag, for example
python -B foo.py
Another option, as noted by tcaswell, is to set the environment variable PYTHONDONTWRITEBYTECODE
to any value (according to Python's man page, any "non-empty string").
__pycache__
is a folder containing Python 3 bytecode compiled and ready to be executed.
I don't recommend routinely laboriously deleting these files or suppressing creation during development as it wastes your time. Just have a recursive command ready (see below) to clean up when needed as bytecode can become stale in edge cases (see comments).
Python programmers usually ignore bytecode. Indeed __pycache__
and *.pyc
are common lines to see in .gitignore
files. Bytecode is not meant for distribution and can be disassembled using dis
module.
If you are using OS X you can easily hide all of these folders in your project by running following command from the root folder of your project.
find . -name '__pycache__' -exec chflags hidden {} \;
Replace __pycache__
with *.pyc
for Python 2.
This sets a flag on all those directories (.pyc files) telling Finder/Textmate 2 to exclude them from listings. Importantly the bytecode is there, it's just hidden.
Rerun the command if you create new modules and wish to hide new bytecode or if you delete the hidden bytecode files.
On Windows the equivalent command might be (not tested, batch script welcome):
dir * /s/b | findstr __pycache__ | attrib +h +s +r
Which is same as going through the project hiding folders using right-click > hide...
Running unit tests is one scenario (more in comments) where deleting the *.pyc
files and __pycache__
folders is indeed useful. I use the following lines in my ~/.bash_profile
and just run cl
to clean up when needed.
alias cpy='find . -name "__pycache__" -delete'
alias cpc='find . -name "*.pyc" -delete'
...
alias cl='cpy && cpc && ...'
and more lately
# pip install pyclean
pyclean .
A __pycache__
folder is created when you use the line:
import file_name
or try to get information from another file you have created. This makes it a little faster when running your program the second time to open the other file.
Updated answer from 3.7+ docs:
To speed up loading modules, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the Python version number. For example, in CPython release 3.3 the compiled version of spam.py would be cached as __pycache__/spam.cpython-33.pyc. This naming convention allows compiled modules from different releases and different versions of Python to coexist.
Source: https://docs.python.org/3/tutorial/modules.html#compiled-python-files
That is, this directory is generated by Python and exists to make your programs run faster. It shouldn't be committed to source control, and should coexist in peace with your local source code.
__pycache__
is a directory that contains bytecode cache files that are automatically generated by python, namely compiled python, or .pyc
, files. You might be wondering why Python, an "interpreted" language, has any compiled files at all. This SO question addresses that (and it's definitely worth reading this answer).
The python docs go into more depth about exactly how it works and why it exists:
It was added in python 3.2 because the existing system of maintaining .pyc files in the same directory caused various problems, such as when a program was run with Python interpreters of different versions. For the full feature spec, see PEP 3174.
When you import a module,
import file_name
Python stores the compiled bytecode in __pycache__
directory so that future imports can use it directly, rather than having to parse and compile the source again.
It does not do that for merely running a script, only when a file is imported.
(Previous versions used to store the cached bytecode as .pyc files that littered up the same directory as the .py files, but starting in Python 3 they were moved to a subdirectory to make things tidier.)
PYTHONDONTWRITEBYTECODE ---> If this is set to a non-empty string, Python won’t try to write .pyc files on the import of source modules. This is equivalent to specifying the -B option.
file_name
is a file containing definitions of functions?
from the official python tutorial Modules
To speed up loading modules, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the Python version number. For example, in CPython release 3.6 the compiled version of spam.py would be cached as __pycache__/spam.cpython-36.pyc.
from Python doc Programming FAQs
When a module is imported for the first time (or when the source file has changed since the current compiled file was created) a .pyc file containing the compiled code should be created in a __pycache__ subdirectory of the directory containing the .py file. The .pyc file will have a filename that starts with the same name as the .py file, and ends with .pyc, with a middle component that depends on the particular python binary that created it.
Python Version 2.x will have .pyc when interpreter compiles the code.
Python Version 3.x will have __pycache__ when interpreter compiles the code.
alok@alok:~$ ls
module.py module.pyc __pycache__ test.py
alok@alok:~$
The python interpreter compiles the *.py script file and saves the results of the compilation to the __pycache__
directory.
When the project is executed again, if the interpreter identifies that the *.py script has not been modified, it skips the compile step and runs the previously generated *.pyc file stored in the __pycache__
folder.
When the project is complex, you can make the preparation time before the project is run shorter. If the program is too small, you can ignore that by using python -B abc.py
with the B
option.
Execution of a python script would cause the byte code to be generated in memory and kept until the program is shutdown. In case a module is imported, for faster reusability, Python would create a cache .pyc (PYC is 'Python' 'Compiled') file where the byte code of the module being imported is cached. Idea is to speed up loading of python modules by avoiding re-compilation ( compile once, run multiple times policy ) when they are re-imported.
The name of the file is the same as the module name. The part after the initial dot indicates Python implementation that created the cache (could be CPython) followed by its version number.
In 3.2 and later, Python saves .pyc compiled byte code files in a sub-directory named __pycache__
located in the directory where your source files reside with filenames that identify the Python version that created them (e.g. script.cpython-33.pyc)
does the byte code in __pycache__
get automatically invoked the next time the application is started? I.E: if we run some application main.py
for the first time and all the necessary modules get complied and stored in pycache
then the next time are they automatically used even if I call python main.py
? or we will have to call python _pycache_/main.pyc
?
Success story sharing
PYTHONDONTWRITEBYTECODE=<any_value>
to suppress it permanently.python2
puts the compiled files in the same directory as the originals, if I'm not mistaken.find . -name '*.pyc' -delete
Yes, find has a flag for deleting the found files, so you don't have to use any xargs shananigans