I saw a line of C that looked like this:
!ErrorHasOccured() ??!??! HandleError();
It compiled correctly and seems to run OK. It seems to check whether an error has occurred and, if so, handle it. But I'm not really sure what it's actually doing or how it's doing it. It does look like the programmer is trying to express their feelings about errors.
I have never seen the ??!??! before in any programming language, and I can't find documentation for it anywhere. (Google doesn't help with search terms like ??!??!). What does it do and how does the code sample work?
wtf and roflmao, respectively.
??! is a trigraph that translates to |. So it says:
!ErrorHasOccured() || HandleError();
which, due to short circuiting, is equivalent to:
if (ErrorHasOccured())
HandleError();
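To see that the two forms really behave the same, here is a minimal sketch. The bodies of ErrorHasOccured() and HandleError() are hypothetical stand-ins (the question does not show them), the counter and the helper count_handler_calls() are invented for illustration, and the expression is written with || rather than the trigraphs so it builds without any special compiler flag. The third form uses && as an equivalent spelling.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the functions named in the question. */
static bool error_flag;     /* what ErrorHasOccured() will report */
static int  handled_count;  /* how many times HandleError() has run */

bool ErrorHasOccured(void) { return error_flag; }

/* Must return a value so it can be an operand of || or &&. */
bool HandleError(void) { handled_count++; return true; }

/* The one-liner from the question, with || in place of the trigraphs.
   The right operand is evaluated only when the left one is false. */
void check_with_or(void)  { (void)(!ErrorHasOccured() || HandleError()); }

/* The plain if statement it is equivalent to. */
void check_with_if(void)  { if (ErrorHasOccured()) HandleError(); }

/* The && spelling: the right operand runs only when the left is true. */
void check_with_and(void) { (void)(ErrorHasOccured() && HandleError()); }

/* Runs one check with the given error state and returns how many
   times HandleError() was called. */
int count_handler_calls(void (*check)(void), bool error)
{
    error_flag = error;
    handled_count = 0;
    check();
    return handled_count;
}
```

Calling count_handler_calls() with each of the three checks should give 1 when an error is flagged and 0 otherwise, which is the short-circuit behavior described above.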
I picked this up from Guru of the Week (it deals with C++, but is relevant here).
Possible origin of trigraphs; or, as @DwB points out in the comments, it's more likely due to EBCDIC being difficult (again). This discussion on the IBM developerWorks board seems to support that theory.
From ISO/IEC 9899:1999 §5.2.1.1, footnote 12 (h/t @Random832):
The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.
Well, why this exists in general is probably different than why it exists in your example.
It all started half a century ago with repurposing hardcopy communication terminals as computer user interfaces. In the initial Unix and C era that was the ASR-33 Teletype.
This device was slow (10 cps) and noisy and ugly and its view of the ASCII character set ended at 0x5f, so it had (look closely at the pic) none of the keys:
{ | } ~
The trigraphs were defined to fix a specific problem. The idea was that C programs could use the ASCII subset found on the ASR-33 and in other environments missing the high ASCII values.
Your example is actually two of ??!, each meaning |, so the result is ||.
However, people writing C code almost by definition had modern equipment,1 so my guess is: someone showing off or amusing themself, leaving a kind of Easter egg in the code for you to find.
It sure worked: it led to a wildly popular SO question.
https://i.stack.imgur.com/WbaCR.jpg
ASR-33 Teletype
1. For that matter, the trigraphs were invented by the ANSI committee, which first met after C had become a runaway success, so none of the original C code or coders would have used them.
# was replaced with £. In other regions, maybe "ASCII" had no braces etc.
if (x || y) { a[i] = '\0'; } looking like if (x öö y) ä aÄiÅ = 'Ö0'; å in the wrong charset.
It's a C trigraph. ??! is |, so ??!??! is the operator ||.
<iso646.h> header file.
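For context on that header: in C, <iso646.h> defines macros such as or, and and not that expand to ||, && and !, which gives another way to avoid typing the | character. A minimal sketch, where either() and neither() are hypothetical functions made up for illustration:

```c
#include <iso646.h>
#include <stdbool.h>

/* Hypothetical predicate: true if at least one argument is true. */
bool either(bool a, bool b)
{
    return a or b;            /* expands to a || b */
}

/* Hypothetical predicate: true only if both arguments are false. */
bool neither(bool a, bool b)
{
    return not a and not b;   /* expands to !a && !b */
}
```

Unlike trigraphs, these are ordinary macros, so they need no special compiler option.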
As already stated, ??!??! is essentially two trigraphs (??! and ??! again) mushed together that get translated to ||, i.e. the logical OR, by the preprocessor (strictly speaking, in the first translation phase, before the source is split into tokens).
The following table containing every trigraph should help disambiguate alternate trigraph combinations:
Trigraph   Replaces
??(        [
??)        ]
??<        {
??>        }
??/        \
??'        ^
??=        #
??!        |
??-        ~
Source: C: A Reference Manual 5th Edition
So a sequence of trigraphs that looks like ??(??) will eventually map to [], ??(??)??(??) will get replaced by [][], and so on; you get the idea.
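The table can also be read as a simple left-to-right substitution. The sketch below imitates that substitution on an ordinary string at run time; it is an illustration only, not how any particular compiler implements it, and the names trigraph_replacement() and replace_trigraphs() are hypothetical. The source deliberately avoids containing any trigraph itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the third character of a candidate
   trigraph, returns the character it stands for, or 0 if two
   question marks followed by that character is not a trigraph. */
char trigraph_replacement(char third)
{
    /* Parallel strings: keys[i] maps to values[i], as in the table. */
    static const char keys[]   = "()<>/'=!-";
    static const char values[] = "[]{}\\^#|~";
    const char *hit;

    if (third == '\0')
        return 0;  /* strchr would otherwise match the terminator */
    hit = strchr(keys, third);
    return hit ? values[hit - keys] : 0;
}

/* Copies src to dst, replacing every trigraph with its single
   character. dst must have room for strlen(src) + 1 bytes. */
void replace_trigraphs(const char *src, char *dst)
{
    while (*src) {
        char r;
        if (src[0] == '?' && src[1] == '?' &&
            (r = trigraph_replacement(src[2])) != 0) {
            *dst++ = r;   /* three input characters become one */
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

When passing test input as a string literal, writing each question mark as \? keeps the compiler from translating the literal itself if trigraphs happen to be enabled.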
Since trigraphs are substituted during preprocessing, you could use cpp to get a view of the output yourself, using a silly trigr.c program:
int main(void){ const char *s = "??!??!"; return 0; }
and processing it with:
cpp -trigraphs trigr.c
You'll get console output ending in:
int main(void){ const char *s = "||"; return 0; }
As you can see, the option -trigraphs must be specified, or else cpp will issue a warning; this indicates how trigraphs are a thing of the past and of no modern value other than confusing people who might bump into them.
As for the rationale behind the introduction of trigraphs, it is better understood when looking at the history section of ISO/IEC 646:
ISO/IEC 646 and its predecessor ASCII (ANSI X3.4) largely endorsed existing practice regarding character encodings in the telecommunications industry. As ASCII did not provide a number of characters needed for languages other than English, a number of national variants were made that substituted some less-used characters with needed ones.
(emphasis mine)
So, in essence, some needed characters (those for which a trigraph exists) were replaced in certain national variants. This leads to the alternate representation using trigraphs comprised of characters that other variants still had around.
char *date = "??-??-??!" may not produce what you expect (this actually produces char *date = "~~|";)
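One way around this surprise is the \? escape sequence, which standard C provides so that a literal question mark can be written without forming a trigraph. A minimal sketch, where date_placeholder() is a hypothetical function used only to hold the literal:

```c
/* Each \? is a single question mark, so the source text never
   contains two adjacent question marks and nothing is translated,
   whether or not trigraphs are enabled. */
const char *date_placeholder(void)
{
    return "\?\?-\?\?-\?\?!";
}
```

The returned string has nine characters: two question marks, a hyphen, two question marks, a hyphen, two question marks and an exclamation mark.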
if(data??(x??)??(y??)=='??/r' ??!??! data??(x??)??(y??)==0) ??< break; ??>
?: for added readability
ErrorHasOccurred() && HandleError();
That is, if you're used to shell scripting. :)