PHP and MySQL: getting ready for multi-language applications with UTF-8

Thursday, 08 December 2011

This explains how to write PHP applications ready for international use.

UTF-8 is a widely used and well-supported character encoding that can represent special characters such as accented letters, letters with umlauts, cedillas and so on.

Here are a few steps to follow to ensure your data gets stored correctly in UTF-8 encoding:

Place this at the very top of your 'entry' PHP script or your config PHP script (to make sure the code gets executed site-wide):

<?php // use UTF-8 for the mbstring functions and for multibyte regular expressions
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");
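
To see what the internal encoding changes in practice, here is a minimal sketch that compares a byte-oriented function with its mbstring counterpart. It assumes the mbstring extension is loaded and PHP 7 or later (for the "\u{...}" escape); the variable names are just for illustration.

```php
<?php
// Same setup as above: make UTF-8 the default for the mbstring functions.
mb_internal_encoding("UTF-8");

// "facade" with a c-cedilla, written as an escape (PHP 7+) so the
// example does not depend on how this file itself is saved.
$word = "fa\u{00E7}ade";

$byteCount = strlen($word);    // counts bytes
$charCount = mb_strlen($word); // counts characters, using the internal encoding

echo $byteCount, "\n";           // 7: the cedilla takes two bytes in UTF-8
echo $charCount, "\n";           // 6
echo mb_strtoupper($word), "\n"; // upper-cases the cedilla as well
```

The plain string functions work on bytes, so for text that users will read (lengths, substrings, case changes) the mb_* versions are usually the ones you want.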

 

When outputting HTML, make sure you set the document encoding by sending an HTTP header with PHP:

<?php header('Content-Type: text/html; charset=UTF-8');

or a meta tag in your HTML <head> 'section':

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
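
Both options can be combined. The sketch below sends the header and also puts the meta tag in the page; render_page() is a hypothetical helper made up for this example, not part of PHP, and the sample texts are placeholders.

```php
<?php
// Hypothetical helper: returns a complete HTML page for a title and a text.
function render_page(string $title, string $text): string
{
    // Escape the text, telling htmlspecialchars() that the input is UTF-8.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $safeText  = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />' . "\n"
        . "<title>{$safeTitle}</title>\n"
        . "</head>\n<body>\n<p>{$safeText}</p>\n</body>\n</html>\n";
}

// Headers have to go out before any output, so send the header first.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo render_page("Caf\u{00E9}", "Gr\u{00FC}\u{00DF}e & sant\u{00E9}");
```

If the header and the meta tag ever disagree, browsers generally trust the HTTP header, so it is worth keeping the two in sync.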

Now for the MySQL part:

MySQL uses character sets to determine how strings are stored, and the so-called 'collations' to determine how the database sorts and compares them.

The catch here is that you may have configured the tables to store text fields in utf8, but on the other hand the default server collation may be set to something else (for example, 'latin1_swedish_ci'). Note the '_ci' suffix: it stands for 'case insensitive'.

What I do is set the schema collation to utf8_unicode_ci, which sorts strings well in most languages.
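
As a rough sketch of that step, the snippet below builds the SQL for changing a schema's default character set and collation, plus two statements for checking what the server currently uses. The helper schema_collation_sql() and the schema name 'my_app' are invented for this example; the SQL itself is standard MySQL.

```php
<?php
// Hypothetical helper: builds the statement that changes a schema's defaults.
function schema_collation_sql(string $schema): string
{
    // Quote the schema name as a MySQL identifier (backticks doubled inside).
    $quoted = '`' . str_replace('`', '``', $schema) . '`';

    return "ALTER DATABASE {$quoted} CHARACTER SET utf8 COLLATE utf8_unicode_ci";
}

// Statements that list the character sets and collations currently in effect.
$inspect = [
    "SHOW VARIABLES LIKE 'character_set_%'",
    "SHOW VARIABLES LIKE 'collation_%'",
];

echo schema_collation_sql('my_app'), ";\n";
foreach ($inspect as $sql) {
    echo $sql, ";\n";
}
```

Keep in mind that changing the schema default only affects tables created afterwards; existing tables and columns keep whatever character set and collation they already have.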

As for PHP, I make sure the very first of my SQL statements (after the database connection is set up) is

SET NAMES utf8;

 

This explicitly sets the character set of the MySQL connection (and, with it, the connection collation). So we have utf8 encoded PHP code, a utf8 regex engine for PHP, we output utf8 HTML, and get inputs from HTML forms (you guessed it) in utf8.

Setting NAMES to utf8 ensures we pass data back and forth to MySQL in the right encoding. If you don't set NAMES, you may connect with a different character set (the server's default), and mangled text will come your way, because MySQL transcodes automatically between the connection character set and the one the data is stored in (e.g. from 'utf8' to 'latin1', or God knows what).
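
Here is one way this could look with the mysqli extension. It is only a sketch: both functions are hypothetical helpers written for this example, and the host, user, password and database values are whatever your own setup uses.

```php
<?php
// Hypothetical helper: builds the first statement to send on a new connection.
function set_names_statement(string $charset = 'utf8'): string
{
    // Accept plain identifiers only, so nothing unexpected ends up in the SQL.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException("Invalid character set name: {$charset}");
    }

    return "SET NAMES {$charset}";
}

// Hypothetical helper: opens a mysqli connection and switches it to utf8
// before any other statement runs.
function open_utf8_connection(string $host, string $user, string $password, string $database): mysqli
{
    $link = new mysqli($host, $user, $password, $database);
    if ($link->connect_errno) {
        throw new RuntimeException('Connection failed: ' . $link->connect_error);
    }

    if (!$link->query(set_names_statement('utf8'))) {
        throw new RuntimeException('SET NAMES failed: ' . $link->error);
    }

    return $link;
}
```

Doing this in a single place means every query in the application goes through a connection that already talks utf8, rather than relying on each script to remember.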

Thanks, and have fun!



Last Updated ( Thursday, 08 December 2011 )
 
