2.19.1.1 BOM: Byte Order Mark
From section
2.19.1, you may have got the impression that text files are
complicated. This section deals with a related topic, making life often
easier for the user, but providing another worry to the programmer.
BOM or Byte Order Marker is a technique for identifying
Unicode text files as well as the encoding they use. Such files start
with the Unicode character 0xFEFF, a non-breaking, zero-width space
character. This is a pretty unique sequence that is not likely to be the
start of a non-Unicode file and uniquely distinguishes the various
Unicode file formats. As it is a zero-width blank, it even doesn't
produce any output. This solves all problems, or ... Some formats start
off as US-ASCII and may contain some encoding mark to switch to UTF-8,
such as the encoding="UTF-8"
in an XML header. Such formats
often explicitly forbid the use of a UTF-8 BOM. In other cases there is
additional information revealing the encoding, making the use of a BOM
redundant or even illegal.
The BOM is handled by SWI-Prolog open/4
predicate. By default, text files are probed for the BOM when opened for
reading. If a BOM is found, the encoding is set accordingly and the
property bom(true)
is available through stream_property/2.
When opening a file for writing, writing a BOM can be requested using
the option bom(true)
with
open/4.