ChatGPT解决这个技术问题 Extra ChatGPT

Is the C99 preprocessor Turing complete?

After discovering the Boost preprocessor's capabilities I found myself wondering: Is the C99 preprocessor Turing complete?

If not, what does it lack to not qualify?

The missing thing in CPP for turing completeness is essentially recursion, since it can't loop without it (and really has a fairly limited conditional since you can't conditionally expand portions of a macro)
If you're truly curious you should check it out for yourself, but just a warning to anyone seeing the arguments that CPP is turing complete: all the attempts to do so seem to depend on defining a LARGE NUMBER of macros, and them using them to facilitate looping / recursion. This is not theoretically turing-complete, although you could argue that it's practically sufficient.

s
stgatilov

Well macros don't directly expand recursively, but there are ways we can work around this.

The easiest way of doing recursion in the preprocessor is to use a deferred expression. A deferred expression is an expression that requires more scans to fully expand:

#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__

#define A() 123
A() // Expands to 123
DEFER(A)() // Expands to A () because it requires one more scan to fully expand
EXPAND(DEFER(A)()) // Expands to 123, because the EXPAND macro forces another scan

Why is this important? Well when a macro is scanned and expanding, it creates a disabling context. This disabling context will cause a token, that refers to the currently expanding macro, to be painted blue. Thus, once its painted blue, the macro will no longer expand. This is why macros don't expand recursively. However, a disabling context only exists during one scan, so by deferring an expansion we can prevent our macros from becoming painted blue. We will just need to apply more scans to the expression. We can do that using this EVAL macro:

#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

Now if we want to implement a REPEAT macro using recursion, first we need some increment and decrement operators to handle state:

#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define INC(x) PRIMITIVE_CAT(INC_, x)
#define INC_0 1
#define INC_1 2
#define INC_2 3
#define INC_3 4
#define INC_4 5
#define INC_5 6
#define INC_6 7
#define INC_7 8
#define INC_8 9
#define INC_9 9

#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3
#define DEC_5 4
#define DEC_6 5
#define DEC_7 6
#define DEC_8 7
#define DEC_9 8

Next we need a few more macros to do logic:

#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)

#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 ~, 1,

#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0

#define BOOL(x) COMPL(NOT(x))

#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

#define IF(c) IIF(BOOL(c))

#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define WHEN(c) IF(c)(EXPAND, EAT)

Now with all these macros we can write a recursive REPEAT macro. We use a REPEAT_INDIRECT macro to refer back to itself recursively. This prevents the macro from being painted blue, since it will expand on a different scan(and using a different disabling context). We use OBSTRUCT here, which will defer the expansion twice. This is necessary because the conditional WHEN applies one scan already.

#define REPEAT(count, macro, ...) \
    WHEN(count) \
    ( \
        OBSTRUCT(REPEAT_INDIRECT) () \
        ( \
            DEC(count), macro, __VA_ARGS__ \
        ) \
        OBSTRUCT(macro) \
        ( \
            DEC(count), __VA_ARGS__ \
        ) \
    )
#define REPEAT_INDIRECT() REPEAT

//An example of using this macro
#define M(i, _) i
EVAL(REPEAT(8, M, ~)) // 0 1 2 3 4 5 6 7

Now this example is limited to 10 repeats, because of limitations of the counter. Just like a repeat counter in a computer would be limited by the finite memory. Multiple repeat counters could be combined together to workaround this limitation, just like in a computer. Furthermore, we could define a FOREVER macro:

#define FOREVER() \
    ? \
    DEFER(FOREVER_INDIRECT) () ()
#define FOREVER_INDIRECT() FOREVER
// Outputs question marks forever
EVAL(FOREVER())

This will try to output ? forever, but will eventually stop because there are no more scans being applied. Now the question is, if we gave it an infinite number of scans would this algorithm complete? This is known as the halting problem, and Turing completeness is necessary to prove the undecidability of the halting problem. So as you can see, the preprocessor can act as a Turing complete language, but instead of being limited to the finite memory of a computer it is instead limited by the finite number of scans applied.


...Wow. Very impressive! Here I thought that the C99 preprocessor was definetely not turing complete.. +1 for thinking out of the box
+1 Quite a creative way to show that preprocessor can scan symbols on a tape ;-) (Thanks to the mod for accepting the flagging to remove the wiki!).
I like how this uses O(log(N)) macros to recurse N times. This is better than Boost Preprocessor which uses O(N) macros.
@qbt937 The EVAL macro applies over 250 scans to the algorithm even it is not needed. Whereas Boost.PP only uses the number of scans that is needed by the algorithm, so Boost.PP is much more efficient.
The limited number of scans is analogous is not analogous to a computer having limited memory. The essence of Turing completeness is that the specification of the computations itself carries no limitation, even when it is actually made to run on a machine that does have limited capacity.
b
bta

Here is an example of abusing the preprocessor to implement a Turing machine. Note that an external build script is needed to feed the preprocessor's output back into its input, so the preprocessor in and of itself isn't Turing complete. Still, it's an interesting project.

From the description of the afore-linked project:

the preprocessor is not Turing complete, at least not if the program is preprocessed only once. This is true even if the program is allowed to include itself. (The reason being that for a given program, the preprocessor has only a finite number of states, plus a stack consisting of the places which the file has been included from. This is only a push-down automaton.)

The answer by Paul Fultz II is quite impressive and certainly closer than I thought the preprocessor could ever get, but it's not a true Turing machine. The C preprocessor has certain limits that prevent it from executing an arbitrary program like a Turing machine could, even if you had infinite memory and time. Section 5.2.4.1 of the C spec gives the following minimum limits for a C compiler:

63 nesting levels of parenthesized expressions within a full expression 63 significant initial characters in an internal identifier or a macro name 4095 macro identifiers simultaneously defined in one preprocessing translation unit 4095 characters in a logical source line

The counter mechanism below requires a macro definition per value, so the macro definition limit will limit how many times you can loop (EVAL(REPEAT(4100, M, ~)) would yield undefined behavior). This essentially puts a cap on the complexity of the program that you can execute. The nesting and complexity of the multi-level expansions may hit one of the other limits as well.

This is fundamentally different than the "infinite memory" limitation. In this case, the spec specifically says that a standards-conforming C compiler is only required to conform to these limits, even if it has infinite time, memory, etc. Any input file exceeding these limits can be processed in an unpredictable or undefined manner (or outright rejected). Some implementations may have higher limits, or no limits at all, but that's considered "implementation-specific" and not part of the standard. It may be possible to use Paul Fultz II's method to implement something like a Turing machine on some specific compiler implementation that has no finite limits, but in a general sense of "can this be done on any arbitrary, standards-conforming C99 pre-processor", the answer is no. Since the limit here is built into the language itself and not simply a side-effect of our inability to construct an infinite computer, I say that breaks Turing completeness.


This answer is incorrect, as the 77 point answer below it shows extensively. Please unaccept it and accept the more useful answer, thank you.
If you mean the 115 point answer by Paul Fultz II below: it confirms this answer.
The limit is in the language itself, but not because of the spec, but because we must write the scans to evaluate the algorithm in the language itself, which there is no mechanism to apply an infinite number of scans.
To add to the lack of recursiveness, that seems to be (under)specified by the standard. §6.10.3.5 paragraph 4 shows an example of two macros calling each other and states: "There are cases where it is not clear whether a replacement is nested or not [...]. Strictly conforming programs are not permitted to depend on such unspecified behavior. "
@reinierpost technically, the 217 point answer (now above smh) doesn't actually confirm this answer, as someone else could still prove that CPP is TC. However, it seems like no-one has, and all who attempt use LARGE NUMBERS of macros. The fact that the algorithm is designed to always terminate is enough for me (see alinsoar's answer).
a
alinsoar

To be Turing complete, one needs to define recursion that may never finish -- one calls them mu-recursive operator.

To define such an operator one needs an infinite space of defined identifiers (in case that each identifier is evaluated a finite number of times), as one cannot know a priori an upper limit of time in which the result is found. With a finite number of operators inside the code one needs to be able to check an unlimited number of possibilities.

So this class of functions cannot be computed by the C preprocessor because in C preprocessor there is a limited number of defined macros and each one is expanded only once.

The C preprocessor uses the Dave Prosser's algorithm (written by Dave Prosser for the WG14 team in 1984). In this algorithm a macro is painted blue in the moment of the first expansion; a recursive call (or mutual recursive call) does not expand it, as it has already been painted blue in the moment when the first expansion starts. So with a finite number of preprocessing lines it is impossible to make infinite calls of functions(macros), which characterizes the mu-recursive operators.

The C preprocessor can compute only sigma-recursive operators .

For details see the course of computation of Marvin L. Minsky (1967) -- Computation: Finite and Infinite Machines, Prentice-Hall, Inc. Englewood Cliffs, N.J. etc.


The Ackerman function, which is mu-recursive only, can be implemented in the PP, so the C preprocessor is not limited to sigma-recursive operators only: gist.github.com/pfultz2/80391e8b18abf3225da2242dcc570cec
As I said in the other comment, it's a difference between what you did, to check a large (but limited) number of inputs and to search an infinte number. The Ackerman function is known to finish, this is why cpp can find its value. It is impossible to compute any mu operator and make cpp turing complete.
@PaulFultzII You need to modify your code in order to check for a larger space of solutions, while a mu-operator is fixed and will check for an infinite space (as large as the hardware resources allow).
The algorithm(ie the A macro) does not need to be modified, only the evaluation needs to be updated to add more scans.
C
Cogwheel

It's Turing complete within limits (as are all computers since they don't have infinite RAM). Check out the kinds of things you can do with Boost Preprocessor.

Edit in response to question edits:

The main limitation on Boost is the maximum macro expansion depth which is compiler-specific. Also, the macros that implement recursion (FOR..., ENUM..., etc.) aren't truly recursive, they just appear that way thanks to a bunch of near-identical macros. In the big picture, this limitation is no different than having a maximum stack size in an actually recursive language.

The only two things that are really necessary for limited Turing-completeness (Turing-compatibility?) are iteration/recursion (equivalent constructs) and conditional branching.


hi. That was actually what prompted my question, I have been using preprocessor for a while.
I believe that macros can't do recursion. Boost just seems to simulate them by having hardcoded macros named like macro0, macro1 .. macro255. I'm not sure whether that counts as "turing complete". The preprocessor has an explicit rule that forbids going from macro255 back to macro0 :( It seems like trying to build a verifyer for fully parenthesized expressions using a finite state automaton. It can work for a limited number of parentheses, but that's not a general verifyer anymore. I've no clue about boost.pp inner workings though, so i could likely be wrong on this.
@Johannes: The chaos preprocessor doesn't have any macros like that. Check it out here: sourceforge.net/projects/chaos-pp
To the reader: be aware that Turing complete within limits means not Turing complete.
@Retr0id No. Turing completeness is a property of computational systems. Such systems describe an abstraction of reality. Programming languages do that. When I write while :; do echo 1; done in my Linux shell, I've specified an unbounded loop. It is actually unbounded, as far as that programming language is concerned. The fact that all such programs must run in an environment that is finite and will at some point stop any such loop, is irrelevant. We're not discussing that environment, we're only discussing the language itself.