File: //usr/share/doc/gawk-4.0.2/README.multibyte
Fri Jun  3 12:20:17 IDT 2005
============================

As noted in the NEWS file, as of 3.1.5, gawk uses character values instead
of byte values for `index', `length', `substr' and `match'.  This works
in multibyte and Unicode locales.

Wed Jun 18 16:47:31 IDT 2003
============================

Multibyte locales can cause occasional weirdness, in particular with
ranges inside brackets: /[....]/.  Something that works great for ASCII
will choke for, e.g., en_US.UTF-8.  One such program is test/gsubtst5.awk.

By default, the test suite runs with LC_ALL=C and LANG=C. You
can change this by doing (from a Bourne-style shell):

	$ GAWKLOCALE=some_locale make check

Then the test suite will set LC_ALL and LANG to the given locale.

As of this writing, this works for en_US.UTF-8, and all tests
pass except gsubtst5.

For the normal case of RS = "\n", the locale is largely irrelevant.
For other single-byte record separators, using LC_ALL=C will give you
much better performance when reading records.  Otherwise, gawk has to
make several function calls *per input character* to find the record
terminator.  You have been warned.