Read Text File

R←{X} ⎕NGET Y

This function reads the contents of the specified text file. See also Write Text File.

Y is either a character vector/scalar containing the name of the file to be read, or a 2-item vector whose first item is the file name and whose second is an integer scalar specifying flags for the operation.

If flags is 0 (the default if omitted), the content element of the result R is a simple character vector. If flags is 1, content is a vector of character vectors, one for each line in the file.

The optional left argument X is either

  • a character vector that specifies the file encoding, as shown in Table 1 below, or
  • a 256-element numeric vector that maps each possible byte value (0-255) to a Unicode code point (1st element = Unicode code point corresponding to byte value 0, and so on). ¯1 indicates that the corresponding byte value is not mapped to any character. Apart from ¯1, no value may appear in the table more than once.
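To make the rules for a user-defined table concrete, here is a small Python sketch (not Dyalog's implementation) that validates a 256-element table and decodes bytes with it. It uses -1 in place of APL's ¯1. The function names are hypothetical, and treating an unmapped byte as an error is an assumption, because this section does not say what happens in that case.

```python
from typing import Sequence

UNMAPPED = -1  # stands in for APL's ¯1 ("no character for this byte")


def validate_byte_table(table: Sequence[int]) -> None:
    """Check the stated rules: 256 entries, no repeated code point except -1."""
    if len(table) != 256:
        raise ValueError("table must have exactly 256 elements")
    mapped = [cp for cp in table if cp != UNMAPPED]
    if len(mapped) != len(set(mapped)):
        raise ValueError("apart from -1, no code point may appear twice")


def decode_with_table(data: bytes, table: Sequence[int]) -> str:
    """Decode each byte b as the character whose code point is table[b]."""
    validate_byte_table(table)
    chars = []
    for b in data:  # iterating over bytes yields ints 0-255
        cp = table[b]
        if cp == UNMAPPED:
            # Assumption: an unmapped byte is treated as an error here.
            raise ValueError(f"byte {b} is not mapped to any character")
        chars.append(chr(cp))
    return "".join(chars)
```

An identity table (list(range(256))) reproduces ISO-8859-1 decoding; changing single entries gives a custom code page.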

Table 1: File Encodings

  Encoding       Description
  UTF-8          The data is encoded in UTF-8 format.
  UTF-16LE       The data is encoded in UTF-16 little-endian format.
  UTF-16BE       The data is encoded in UTF-16 big-endian format.
  UTF-16         The data is encoded in UTF-16 with the endianness of the host system (currently BE on AIX platforms, LE on all others).
  UTF-32LE       The data is encoded in UTF-32 little-endian format.
  UTF-32BE       The data is encoded in UTF-32 big-endian format.
  UTF-32         The data is encoded in UTF-32 with the endianness of the host system (currently BE on AIX platforms, LE on all others).
  ASCII          The data is encoded in 7-bit ASCII format.
  Windows-1252   The data is encoded in 8-bit Windows-1252 format.
  ANSI           A synonym of Windows-1252.

The above UTF formats may be qualified with -BOM or -NOBOM (for example, UTF-8-BOM). See Write Text File.
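The following Python sketch shows one way to interpret an encoding name from Table 1, including an optional -BOM or -NOBOM qualifier and the host-endianness rule for UTF-16 and UTF-32. The helper name parse_encoding and the mapping to Python codec names are assumptions for illustration; the host byte order is a parameter that defaults to Python's sys.byteorder.

```python
import sys
from typing import Optional, Tuple

# Table 1 names mapped to Python codec names. The codec mapping is this
# sketch's own assumption; ANSI is listed as a synonym of Windows-1252.
CODECS = {
    "UTF-8": "utf-8",
    "UTF-16LE": "utf-16-le",
    "UTF-16BE": "utf-16-be",
    "UTF-32LE": "utf-32-le",
    "UTF-32BE": "utf-32-be",
    "ASCII": "ascii",
    "Windows-1252": "cp1252",
    "ANSI": "cp1252",
}


def parse_encoding(
    name: str, host_order: str = sys.byteorder
) -> Tuple[str, Optional[bool], str]:
    """Return (base name, BOM qualifier, Python codec) for an encoding name.

    The qualifier is True for -BOM, False for -NOBOM and None if absent.
    UTF-16 and UTF-32 without explicit endianness resolve to host_order.
    """
    base, bom = name, None
    for suffix, flag in (("-NOBOM", False), ("-BOM", True)):
        if base.endswith(suffix):
            base, bom = base[: -len(suffix)], flag
            break
    if bom is not None and not base.startswith("UTF-"):
        raise ValueError("only UTF formats may be qualified with -BOM/-NOBOM")
    if base in ("UTF-16", "UTF-32"):
        base += "LE" if host_order == "little" else "BE"
    if base not in CODECS:
        raise ValueError(f"unknown encoding: {name}")
    return base, bom, CODECS[base]
```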

Whether or not X is specified, if the start of the file contains a recognised Byte Order Mark (BOM), the file is decoded according to the BOM. Otherwise, if X is specified, the file is decoded according to the value of X. Otherwise, the contents of the file are examined to infer its encoding, and the file is decoded accordingly.
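The order of precedence described above can be modelled in Python as follows. This is a sketch, not Dyalog's algorithm: decode_file_bytes and REQUESTABLE are hypothetical names, only encodings with explicit endianness can be requested, and the fallback (try UTF-8, then Windows-1252) is an invented stand-in, because the section does not describe how the file is actually examined. Reported names follow the conventions of the encoding element: UTF formats carry a -BOM or -NOBOM suffix.

```python
import codecs
from typing import Optional, Tuple

# Recognised BOMs, longest first: the UTF-32LE BOM starts with the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE", "utf-32-le"),
    (codecs.BOM_UTF32_BE, "UTF-32BE", "utf-32-be"),
    (codecs.BOM_UTF8, "UTF-8", "utf-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE", "utf-16-le"),
    (codecs.BOM_UTF16_BE, "UTF-16BE", "utf-16-be"),
]

# Encodings a caller may request in this sketch (explicit endianness only).
REQUESTABLE = {
    "UTF-8": "utf-8",
    "UTF-16LE": "utf-16-le",
    "UTF-16BE": "utf-16-be",
    "UTF-32LE": "utf-32-le",
    "UTF-32BE": "utf-32-be",
    "ASCII": "ascii",
    "Windows-1252": "cp1252",
}


def decode_file_bytes(data: bytes, requested: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, encoding name used), applying the documented precedence."""
    # 1. A recognised BOM wins whether or not an encoding was requested.
    for bom, name, codec in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec), name + "-BOM"
    # 2. Otherwise the requested encoding is used; UTF names are reported
    #    with -NOBOM because no BOM was present.
    if requested is not None:
        text = data.decode(REQUESTABLE[requested])
        suffix = "-NOBOM" if requested.startswith("UTF-") else ""
        return text, requested + suffix
    # 3. Otherwise the bytes are examined. HYPOTHETICAL heuristic: try UTF-8,
    #    then fall back to Windows-1252.
    try:
        return data.decode("utf-8"), "UTF-8-NOBOM"
    except UnicodeDecodeError:
        return data.decode("cp1252"), "Windows-1252"
```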

The result R is a 3-element vector comprising (content) (encoding) (newline) where:

content A simple character vector, or a vector of character vectors, according to the value of flags.
encoding The encoding that was actually used to read the file. If this is a UTF format, it will always include the appropriate endianness (except for UTF-8 to which endianness doesn't apply) and a -BOM or -NOBOM suffix to indicate whether or not a BOM is actually present in the file. For example, UTF-16LE-BOM. If X specified a user-defined encoding as a 256-element numeric vector, encoding will be that same vector.
newline The entry from the Value column of the line separator table (Table 2) for the first newline character (CR, LF, CRLF or NEL) that occurs in the file, or ⍬ if no such character is found.

If content is simple, all its line separators (listed in the table below) are replaced by (normalised to) ⎕UCS 10. In the Classic Edition this character must be in ⎕AVU; otherwise a TRANSLATION ERROR is signalled.
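A minimal Python sketch of this normalisation, assuming the input is already-decoded text (the function name is hypothetical). The one subtlety is that CRLF must be treated as a single separator, so it is tried before the single-character separators.

```python
import re

# All Table 2 separators. CRLF comes first in the alternation so that it is
# consumed as one separator instead of as CR and LF separately.
LINE_SEPARATORS = re.compile(r"\r\n|[\r\n\x85\x0b\x0c\u2028\u2029]")


def normalise_line_separators(text: str) -> str:
    """Replace every line separator in text with LF (U+000A)."""
    return LINE_SEPARATORS.sub("\n", text)
```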

If content is nested, it is formed by splitting the contents of the file on the occurrence of any of the line separators shown in the table below. These line separators are removed.
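Splitting can be sketched the same way in Python (hypothetical function name). Python's own str.splitlines() is deliberately avoided because it also breaks on characters that are not in the table, such as U+001C to U+001E. The sketch does not attempt to model how a separator at the very end of the file is treated; re.split simply yields a trailing empty string in that case.

```python
import re
from typing import List

# Same separator set as the table: CRLF first, then the single characters.
LINE_SEPARATORS = re.compile(r"\r\n|[\r\n\x85\x0b\x0c\u2028\u2029]")


def split_lines(text: str) -> List[str]:
    """Split text on any line separator, dropping the separators themselves."""
    return LINE_SEPARATORS.split(text)
```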

The third element of the result, newline, is the numeric vector from the Value column of the table below that corresponds to the first occurrence of any of the newline characters in the file. If none of these characters is present, the value is ⍬.
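The following Python sketch (hypothetical function name) shows how the newline element can be derived: only the four newline characters count, CRLF is recognised as a pair, and an empty list stands in for the empty numeric vector returned when none is present.

```python
import re
from typing import List

# Only the newline characters of the table (CR, LF, CRLF, NEL) are considered;
# VT, FF, LS and PS separate lines but never determine this value.
NEWLINE = re.compile(r"\r\n|[\r\n\x85]")


def detect_newline(text: str) -> List[int]:
    """Return the Value-column entry for the first newline found, or []."""
    match = NEWLINE.search(text)
    if match is None:
        return []  # no newline character anywhere in the text
    return [ord(ch) for ch in match.group()]
```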

Table 2: Line Separators

  Value   Code   Description
  Newline characters:
  13      CR     Carriage Return (U+000D)
  10      LF     Line Feed (U+000A)
  13 10   CRLF   Carriage Return followed by Line Feed
  133     NEL    New Line (U+0085)
  Other line separator characters:
  11      VT     Vertical Tab (U+000B)
  12      FF     Form Feed (U+000C)
  8232    LS     Line Separator (U+2028)
  8233    PS     Paragraph Separator (U+2029)