Linux and PHP web application support and development (Bromsgrove, UK)

Zend_Validate_StringLength and iconv_strlen

One customer parses a substantial amount of (somewhat poorly formed) XML on a daily basis. (I say poorly formed as the prolog does not specify a character set, i.e. the file starts with <?xml version="1.0"?> and not <?xml version="1.0" encoding="UTF-8"?>, and different files will contain either UTF-8 or ISO-8859-1 text depending on the data supplier.)

Recently with PHP we’ve been seeing errors like:

PHP Notice:  iconv_strlen(): Detected an illegal character in input string in ..../vendor/zendframework/zendframework1/library/Zend/Validate/StringLength.php on line 246
PHP Stack trace:
....
Zend_Form_Element->isValid()          ...../vendor/zendframework/zendframework1/library/Zend/Form.php:2300
Zend_Validate_StringLength->isValid() ...../vendor/zendframework/zendframework1/library/Zend/Form/Element.php:1443
iconv_strlen()                         ..../vendor/zendframework/zendframework1/library/Zend/Validate/StringLength.php:246

After some investigation, we found that Zend_Validate_StringLength falls back to the ‘iconv.internal_encoding’ php.ini setting if an encoding is not specified when creating the validator (e.g. $options = array('encoding' => 'UTF-8'); $validator = new Zend_Validate_StringLength($options); ...). When the input is not valid in that encoding, iconv_strlen() raises the notice above.
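To illustrate the underlying behaviour without Zend Framework, here is a minimal sketch that calls iconv_strlen() directly. The sample string and variable names are made up for illustration; the only assumption is that the iconv extension is loaded.

```php
<?php
// "été" encoded as ISO-8859-1: each é is the single byte 0xE9.
// (Hypothetical sample data, not taken from the customer's files.)
$latin1 = "\xE9t\xE9";

// Measured as UTF-8, the 0xE9 byte followed by "t" is not a valid sequence,
// so iconv_strlen() emits a notice and returns false.
// The @ only hides the notice so the example output stays clean.
$asUtf8 = @iconv_strlen($latin1, 'UTF-8');

// Measured in the encoding the bytes are really in, the count works.
$asLatin1 = iconv_strlen($latin1, 'ISO-8859-1');

var_dump($asUtf8);
var_dump($asLatin1);
```

Passing the 'encoding' option to the validator has the same effect as the second call: the length check no longer depends on whatever php.ini happens to say.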

So, perhaps the moral learnt is:

  1. Set php.ini to have a default_charset of UTF-8
  2. Set php.ini to have a default iconv.internal_encoding of UTF-8
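
A quick way to see what a given server is currently configured with is to read both settings at runtime. This is only a sketch: the variable names are mine, and the fallback logic assumes PHP 5.6 or later, where an empty iconv.internal_encoding falls back to default_charset.

```php
<?php
// Read the two php.ini settings mentioned above.
$defaultCharset = ini_get('default_charset');
$iconvInternal  = ini_get('iconv.internal_encoding');

// ini_get() returns false if the setting does not exist (e.g. iconv not loaded)
// and an empty string if it is simply unset.
$effective = ($iconvInternal !== false && $iconvInternal !== '')
    ? $iconvInternal
    : (string) $defaultCharset;

echo "default_charset:         ", var_export($defaultCharset, true), "\n";
echo "iconv.internal_encoding: ", var_export($iconvInternal, true), "\n";
echo "effective for iconv:     ", var_export($effective, true), "\n";
```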

Alternatively, I suspect passing the XML file through xmllint before it’s used would also work, at which point an é will be turned into an &#xE9;, which will solve the problem.
