ColdFusion forgets to use UTF-8 for files after logs are rotated

Submitted by Falken on

For a long time, the recommendation has been to set up Linux's logrotate system to manage the size of cfserver.log. That file can easily grow very large on production systems, and unlike its other logs, ColdFusion seems unable to manage this one itself.

Recently, however, I discovered that a restart of ColdFusion via logrotate was changing the way files were written and read.

By default, everything ColdFusion does on modern operating systems is in UTF-8, so you can read and write files using CFFILE without needing to worry about the charset parameter.

You can easily check that this is the case by testing the default file encodings in the underlying JVM:

<!--- Look up the JVM's default file encoding via java.lang.System --->
<cfobject type="java" action="create" class="java.lang.System" name="s">
<cfoutput>
file.encoding: #s.getProperty("file.encoding")#<br>
</cfoutput>
<!--- Wrap System.in in a reader to see which charset the JVM picks by default --->
<cfobject type="java" action="create" class="java.io.InputStreamReader" name="reader">
<cfset reader.init(s.in)>
<cfoutput>stream reader's encoding:
#reader.getEncoding()#
</cfoutput>

If this outputs "UTF-8" (or UTF8) for both properties then everything is working as it should.

However, I found out that after an automatic log rotation, this would output the very odd "ANSI_X3.4-1968" for the file encoding and ASCII for the stream.

Why is this an issue?
Well, reading and writing files that contain non-ASCII ("foreign") characters will now silently mangle the contents, leading to question marks, weird black boxes, find() calls that no longer match, or other signs of corruption.

I stumbled across a few pointers, but basically if you compare /proc/(ColdFusion process id)/environ before and after an automatic restart, you'll see that logrotate has thrown away the environment settings that Java uses to figure out if your operating system was made in the 21st century or not.
Yes, I know, it should really look for an excuse to downgrade to ASCII, not the other way around...
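
To make that before-and-after comparison less tedious, here is a minimal shell sketch. It assumes a Linux /proc filesystem; the function name locale_vars, the PIDs and the snapshot paths are made up for illustration. It splits the NUL-separated environ file into lines and keeps only the locale-related variables.

```shell
#!/bin/sh
# Print only the locale-related variables from a NUL-separated environ file,
# one per line, sorted so that two snapshots can be diffed.
# (locale_vars is a hypothetical helper name, not part of any tool.)
locale_vars() {
    tr '\0' '\n' < "$1" | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)=' | sort
}

# Example with hypothetical PIDs 12345 (before) and 67890 (after the restart):
#   locale_vars /proc/12345/environ > /tmp/cf-env-before.txt
#   locale_vars /proc/67890/environ > /tmp/cf-env-after.txt
#   diff /tmp/cf-env-before.txt /tmp/cf-env-after.txt
```

If the second snapshot comes back empty while the first shows a LANG line, the restart has lost the locale settings.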

Anyway, the outcome is that Java (and hence ColdFusion) falls back to using ASCII for file operations unless it sees a LANG environment variable mentioning UTF-8.
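
As a rough illustration of that rule, a restart script could check the variable before starting the JVM. This is only a sketch: lang_is_utf8 is a made-up helper, and it looks at LANG alone, whereas in practice LC_ALL and LC_CTYPE take precedence over LANG when they are set.

```shell
#!/bin/sh
# Succeed if the given locale string names a UTF-8 codeset,
# e.g. en_GB.UTF-8 or en_US.utf8; fail for C, POSIX or an empty value.
# (lang_is_utf8 is a hypothetical helper name.)
lang_is_utf8() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn if the current environment would leave the JVM on ASCII.
if ! lang_is_utf8 "${LANG:-}"; then
    echo "warning: LANG='${LANG:-}' does not mention UTF-8" >&2
fi
```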

So the fix is to use something like this in /etc/logrotate.d/coldfusion instead, restarting via runuser and adding an explicit export of the LANG variable. You may need to use 'su' on some UNIX variants, in which case please leave a comment here with the version you used.

/opt/coldfusion9/logs/cfserver.log {
  missingok
  rotate 5
  size=250M
  compress
  postrotate
    /sbin/runuser -s /bin/bash root -c "export LANG=en_GB.UTF-8 ; /etc/init.d/coldfusion_9 restart"
  endscript
}

Maybe one day Adobe will ship with something like the above, or just manage the log files correctly.
