15 changes: 14 additions & 1 deletion integration_test.py
@@ -502,4 +502,17 @@ def test_define(record_property, tmpdir): # #589
 
     assert exitcode == 0
     assert stderr == "test.cpp:1: syntax error: failed to expand 'TEST_P', Invalid ## usage when expanding 'TEST_P': Unexpected token ')'\n"
-    assert stdout == '\n'
+    assert stdout == '\n'
+
+def test_utf16_bom(tmpdir):
+    test_file = os.path.join(tmpdir, "test.cpp")
+    with open(test_file, 'wb') as f:
+        f.write(b'\xFF\xFE\x3B\x00')
+
+    args = [test_file]
+
+    exitcode, stdout, stderr = simplecpp(args, cwd=tmpdir)
+
+    assert exitcode == 0
+    assert stderr == ''
+    assert stdout == ';\n'
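The new test writes the bytes FF FE 3B 00: a UTF-16 little-endian byte-order mark (U+FEFF stored as FF FE) followed by the code unit 0x003B, which is ';'. The test therefore expects simplecpp to print a single ';' followed by a newline. The C++ sketch below decodes such a buffer by hand to show the byte layout. `decodeUtf16LE` is a hypothetical helper, not part of simplecpp; it assumes little-endian input and replaces non-ASCII code units with 0xFF, similar to the replacement described in the simplecpp.cpp comment.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of simplecpp: decodes a UTF-16 little-endian
// byte buffer into a narrow string. A leading FF FE byte-order mark (U+FEFF
// stored little-endian) is skipped; input without a BOM is also assumed to be
// little-endian. Code units >= 0x80 become 0xFF, and a trailing odd byte is
// ignored.
std::string decodeUtf16LE(const std::vector<unsigned char>& bytes) {
    std::string out;
    std::size_t i = 0;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        i = 2;  // skip the BOM
    for (; i + 1 < bytes.size(); i += 2) {
        // Low byte first, then high byte.
        const unsigned int unit = bytes[i] | (bytes[i + 1] << 8);
        out += (unit < 0x80) ? static_cast<char>(unit) : static_cast<char>(0xFF);
    }
    return out;
}
```

For the test's bytes, `decodeUtf16LE({0xFF, 0xFE, 0x3B, 0x00})` returns `";"`.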
5 changes: 4 additions & 1 deletion simplecpp.cpp
@@ -276,7 +276,10 @@ class simplecpp::TokenList::Stream {
     }
 
     unsigned char peekChar() {
-        auto ch = static_cast<unsigned char>(peek());
+        const int pk = peek();
+        auto ch = static_cast<unsigned char>(pk);
+        if (pk == EOF)
+            return ch;
Collaborator

I appreciate that we handle EOF. However, at first glance it does not seem optimal to return some arbitrary ch value. The value of EOF is implementation-defined, right? If we returned 0xFF, we would at least know what the return value is. Or do we want the return value to be implementation-defined?

Collaborator

This needs to return EOF since we are a custom layer on top of getc(). And since we use the EOF macro, the value is not "arbitrary" but the one defined by the libc implementation.


         // For UTF-16 encoded files the BOM is 0xfeff/0xfffe. If the
         // character is non-ASCII character then replace it with 0xff
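On the review question about the EOF return value: getc() and std::istream::peek() report end of input as EOF, a negative int whose exact value the C library defines (commonly -1); for `char` streams, `std::char_traits<char>::eof()` equals EOF. Because peekChar() is declared to return unsigned char, its `return ch;` branch yields `static_cast<unsigned char>(EOF)`, which is 0xFF where EOF is -1, the same value a genuine 0xFF byte produces. The sketch below shows the general getc()-style pattern of testing the int against EOF before narrowing. `peekByte` and `naivePeekByte` are hypothetical helpers written for illustration, not simplecpp's Stream API, and they read from a std::istream rather than simplecpp's own stream classes.

```cpp
#include <cstdio>   // EOF
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not simplecpp's Stream::peekChar(): follows the getc()
// convention of returning EOF at end of input and otherwise the byte as 0..255.
int peekByte(std::istream& is) {
    const int pk = is.peek();  // std::char_traits<char>::eof() == EOF at end of input
    if (pk == EOF)
        return EOF;            // test before narrowing: EOF is a negative int
    return static_cast<unsigned char>(pk);
}

// Hypothetical naive variant that narrows first. Where EOF is -1 (e.g. glibc),
// end of input and a genuine 0xFF byte both come back as 255.
unsigned char naivePeekByte(std::istream& is) {
    return static_cast<unsigned char>(is.peek());
}
```

With EOF defined as -1, `naivePeekByte` returns 255 both for an empty stream and for a stream holding the byte 0xFF, while `peekByte` returns EOF and 255 respectively, so callers can still distinguish the two cases.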