I've seen Ruby and Perl programmers do some complicated code challenges entirely with regexes. The lookahead and lookbehind capabilities in Perl regexes make them more powerful than the regex implementations in most other languages. I was wondering how powerful they really are.
Is there an easy way to either prove or disprove that Perl regexes are Turing complete?
Take the string '010' and the pattern s/(?:|(?<=0)(0)(?=0)|(?<=0)0(?=(1))|...|(?<=1)1(?=1)(?=1*(0))|^(?=(0))|(?<=(0))$)/$1/g (it needs some more thinking, I guess), but I think you need to use it in a loop to be of any use. Is that legitimate? Maybe you have a template of the program you're after?
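For what it's worth, a much simpler sketch of that loop idea (mine, not the tape pattern above): repeatedly applying a substitution until the string reaches a fixed point turns s/// into a little rewriting system, which is where any extra power would have to come from.

    # Iterated s/// as a rewriting system: swap '10' -> '01' until no
    # '10' remains, which sorts the tape's zeros before its ones.
    my $tape = '1011010';
    1 while $tape =~ s/10/01/;
    print "$tape\n";    # prints 0001111

Each individual substitution always halts; it's the unbounded loop around it that could, in principle, run forever like a real machine.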
Excluding any kind of embedded code, such as (?{ }), they probably don't cover all of the context-free languages, much less Turing machines. They might, but to my knowledge nobody has actually proven it one way or the other. Given that people have been trying to solve certain context-free problems with Perl regexes for a while and haven't come up with a solution yet, it's likely that they can't.
There is an interesting discussion to be had about which features are merely convenient and which actually add power. For instance, matching 0^n 1 0^n (that's notation for "any number of zeros, followed by a one, followed by the same number of zeros as before") is not something that can be done with pure regexes. You can prove this can't be done using the Pumping Lemma, but the simple, informal proof is that the regex would have to count an arbitrary number of zeros, and regexes can't do counting.
However, backreferences can match that with:
/(0*) 1 \1/x;
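For instance, an anchored variant (my addition, so the whole string must have that shape):

    # 0^n 1 0^n: group 1 grabs the leading zeros, and \1 demands the
    # same number of trailing zeros.
    for my $s ('010', '00100', '00010') {
        printf "%-6s %s\n", $s, $s =~ /\A (0*) 1 \1 \z/x ? 'matches' : 'no match';
    }
    # prints: 010 matches, 00100 matches, 00010 no match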
So that means backreferences give you more power, and are not a mere convenience. What else might give us more power, I wonder?
Also, Perl6 "patterns" (they're not even pretending they're regexes anymore) are designed to look kinda like Perl5 regexes (so you don't need to relearn much), but they have enough features added to be fully context-free. They're actually designed so you can use them to alter the way the language is parsed within a lexical scope.
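Perl 5 hasn't stood entirely still either: since 5.10, patterns can recurse into their own capture groups, which is another concrete feature (beyond backreferences) that buys context-free power. A minimal sketch of mine, using (?1) to re-enter group 1 on the textbook case of balanced parentheses:

    use v5.10;    # pattern recursion via (?1) needs Perl 5.10+

    # Group 1 is "a sequence of balanced groups"; (?1) recurses into it
    # for the contents of each pair of parentheses.
    for my $s ( '(()(()))', '(()' ) {
        say "$s: ", $s =~ /\A ( (?: \( (?1) \) )* ) \z/x
            ? 'balanced' : 'not balanced';
    }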
There are at least two relevant discussions: "Turing completeness and regular expressions" and "Are Perl patterns universal?", with further references.
The consensus (to my untrained eye) seems to be that the answer is "no", but I am not sure if I understand everything correctly.
For regexes in Perl there are two cases:
- With embedded code: they are of course Turing-complete.
- Without embedded code: they always halt, so they are not general Turing machines.
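A toy illustration of the embedded-code case (my example): a (?{ }) block runs arbitrary Perl in the middle of a match, so the matcher inherits the full power of the language.

    # The code block executes once each time the engine consumes a
    # character; any Perl at all could run here.
    my $steps = 0;
    "abc" =~ / (?: . (?{ $steps++ }) )* /x;
    print "$steps\n";    # prints 3

Here the block merely counts characters, but nothing stops it from driving an unbounded data structure, which is exactly where the Turing completeness comes from.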
Every regular language can be accepted by a finite automaton. Its input must be a finite string.
[...] a deterministic finite automaton (DFA)—also known as deterministic finite state machine—is a finite state machine that accepts/rejects finite strings of symbols [...].
The same goes for Turing machines: The formal definition does not even have input. It must be encoded in the finite number of states.
Alternative (equivalent) definitions include input, but it must be finite.
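To make that concrete, here is a hand-rolled finite automaton in Perl (my sketch) for the language (ab)* discussed just below: two states, a transition table, and an immediate reject on any symbol with no transition.

    # DFA for (ab)*: state 0 expects 'a' (and is the accepting state),
    # state 1 expects 'b'.
    my %next = (
        0 => { a => 1 },
        1 => { b => 0 },
    );

    sub accepts_ab_star {
        my ($input) = @_;
        my $state = 0;
        for my $ch (split //, $input) {
            $state = $next{$state}{$ch} // return 0;    # no transition: reject
        }
        return $state == 0;    # accept only if we ended back in state 0
    }

    print accepts_ab_star('abab') ? "accept\n" : "reject\n";    # accept
    print accepts_ab_star('abc')  ? "accept\n" : "reject\n";    # reject, at the 'c'

On any finite string this always halts after the last character, which is the point of the discussion below.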
(ab)* does not halt on an input like ababab..., because there can always be a c further along that would break the match. The textbook answer is that regular languages are clearly not Turing complete; I think infinite input is just not part of the definition. (ab)* does halt on an infinite input like abcabcabcabcabcabc... (it can reject as soon as it sees the first c), but a*b does not halt for the input aaaaa... Anyway, I don't think infinite inputs are allowed or make sense. That would invalidate many important and clearly true results.