Hi all, relatively new to databases and MT, but I feel like I’m getting the hang of things, with one minor, obnoxious snag.
I’m trying to post entries in my blog with Japanese characters, and as soon as I hit save, the Japanese characters all get converted to question marks. For instance, if the characters are in the entry title, the dashboard will say “publishing Entry ‘???, English characters display ok’ ..” and when it shows the edit entry page again, all the Asian characters are question marks.
From several searches, it seems there must be something wrong with either the blog or database encoding settings. In the browser, the blog index and everything else is set to utf-8, and I can’t find any settings within the blog dashboard itself that seem to affect the situation. In phpMyAdmin, it’s set to: MySQL charset: UTF-8 Unicode (utf8); MySQL connection collation: utf8_unicode_ci
All my settings in phpMyAdmin are actually identical to those of a friend of mine (who does not have this problem); we even copied the contents of his mt-config file to make sure that our configurations were the same… but no luck.
Can anyone shed some light on this?
Reported on Movable Type 4.2
I've seen this issue with some of my customers, and it may be related to your web hosting configuration.
Your database has the correct encoding (UTF-8), and so does your Movable Type backend, but your server may be imposing an ISO charset (such as ISO-8859-1) in the headers of your pages.
Try validating a page that shows those bad Japanese characters with the W3C validator at http://validator.w3.org/ and you will find out whether the page header is imposed by your hosting server.
Then, if that's the case, contact your web host and ask them to fix this.
For some customers, this issue could be fixed also by a setting they could do themselves on their hosting box, but it is always advisable to have your hosting support do it for you...
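For anyone who wants to check the header by hand rather than through the validator, here is a minimal Python sketch. It only parses a Content-Type header value and reports its charset; the function name and the header strings are made-up examples, not values captured from the blog in question.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# Hypothetical header values, for illustration only.
print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

If the charset in the header sent by the server differs from the one declared in the page itself, browsers generally go by the header, which is why a server-imposed ISO charset can override a UTF-8 template.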
Hi Mihai, thanks for your response :)
I put in the permalink to one of the posts containing Japanese characters (of course, these are all just shown as question marks now). The validator properly identified it as utf-8. Is there any other place I should be looking?
It's doubly strange that my friend has no trouble with this, since I'm actually using his server to host my files...
Could you post a link from your friend's blog and yours? Weird thing is, I am noticing that virtually all of the pages that I am loading up in Chrome, Firefox or IE on Windows that have Japanese characters on them look like garbage to me, and Windows XP is supposed to come with a fully functional set of Unicode fonts for stuff like this.
http://www.paper-machete.com/2008/11/-death-by-employment.html
here is a permalink that should have some Japanese characters (more as a test than anything else). The ??? at the beginning of the entry title should read 過労死, and I tried putting a random character at the end of the entry just to see if it'll last. That too became a question mark.
I can't find a link on my friend's blog that has Japanese in it, but when I told him about it he published a test post and it showed up alright.
For my own blog, I have no problem entering Japanese text when writing an entry, but as soon as I hit "save" it refreshes and says "publishing entry" and any Japanese characters are converted to question marks (not garbled accented characters).
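The difference between question marks and garbled accented characters can be illustrated with a short Python sketch. This is only an analogy: it assumes Python's latin-1 codec as a stand-in for a latin1 conversion step somewhere between the entry form and the database, and it does not show where in this particular setup such a conversion happens.

```python
title = "過労死, death by employment"

# Lossy conversion: characters with no latin1 equivalent are each replaced by "?".
lossy = title.encode("latin-1", errors="replace").decode("latin-1")

# Mis-decoding: the UTF-8 bytes survive intact but are read back as latin1,
# which yields garbled accented and control characters instead of "?".
garbled = title.encode("utf-8").decode("latin-1")

print(lossy)
print(repr(garbled))

# The garbled form is reversible; the lossy form is not.
print(garbled.encode("latin-1").decode("utf-8") == title)
```

Question marks therefore suggest that the characters were converted into a character set that cannot represent them, rather than stored correctly and merely displayed with the wrong encoding.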
What happens when you run a select statement in MySQL that pulls back the title for that post?
Hi David, this is quite weird!...
Are you typing the Japanese characters by hand in Movable Type (I suspect not), or are you pasting them from somewhere?
Where are you pasting it from?
Hi Mike, I'm sorry, but I'm quite the novice when it comes to MySQL. Could you elaborate or link me to some more information? I need to brush up on this sort of thing anyway.
Mihai, I'm actually writing it directly into the "create" or "edit entry" text box within MT. No copy/pasting.
Find the entry ID number in the CMS for that entry, log into phpMyAdmin, and run:
select entry_title from mt_entry where entry_id = ENTRY_ID;
Thanks Mike, I'll try that as soon as I get home. What kind of reaction am I looking for? Whether it comes back in the correct characters or with question marks?
Pretty much. Alternatively, you can try this from the command line if you think phpMyAdmin might not be returning unicode:
mysql -u USERNAME -p DATABASE_NAME
---enter password
---run the select statement from the server's command line via SSH.
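A question mark on screen could come from the stored data or from the tool displaying it, so one way to make the check less ambiguous is to look at the stored bytes with MySQL's HEX() function, for example select HEX(entry_title) from mt_entry where entry_id = ENTRY_ID; and then interpret the result. The Python helper below is a hypothetical sketch for reading that output; its name and the sample inputs are illustrative and not taken from a real database.

```python
def describe_stored_bytes(hex_value):
    """Classify the output of HEX(column) for a title that should contain Japanese."""
    raw = bytes.fromhex(hex_value)
    if raw.isascii():
        if b"?" in raw:
            # 0x3F bytes: the characters were already lost when the row was written.
            return "literal question marks"
        return "plain ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "non-ascii, not valid utf-8"
    return "valid utf-8"


# Sample inputs built in Python rather than copied from a database.
lost = "???, death".encode("ascii").hex().upper()
intact = "過労死, death".encode("utf-8").hex().upper()

print(lost, "->", describe_stored_bytes(lost))
print(intact, "->", describe_stored_bytes(intact))
```

If the hex output starts with 3F3F3F, the question marks are what is actually stored, and the problem lies on the way into the database rather than on the way out.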
Hi Mike,
I ran the command in phpMyAdmin's query window and it returned the entry, with question marks:
"???, death by employment"
Checked the "server settings and variables," here's what it says:
Server variables and settings
Variable                   Session value       Global value
auto_increment_increment   1
auto_increment_offset      1
automatic_sp_privileges    ON
back_log                   50
basedir                    /data/mysql/dzurilla/
binlog_cache_size          32,768
bulk_insert_buffer_size    8,388,608
character_set_client       utf8                latin1
character_set_connection   utf8                latin1
character_set_database     latin1
character_set_filesystem   binary
character_set_results      utf8                latin1
character_set_server       latin1
character_set_system       utf8
character_sets_dir         /data/mysql/dzurilla/share/mysql/charsets/
collation_connection       utf8_unicode_ci     latin1_swedish_ci
collation_database         latin1_swedish_ci
collation_server           latin1_swedish_ci
completion_type            0
concurrent_insert          1
connect_timeout            10
datadir                    /dh/mysql/dzurilla/data/
date_format                %Y-%m-%d
datetime_format            %Y-%m-%d %H:%i:%s
default_week_format        0
delay_key_write            ON
delayed_insert_limit       100
delayed_insert_timeout     300
delayed_queue_size         1,000
div_precision_increment    4
....
There's a lot more, but I think this is the relevant part. Does this mean my server is set to a different character encoding? And if so, is there an easy way to just set everything to UTF-8?
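To make the mix of settings easier to see, here is a small Python sketch that lists which of the character set variables posted above are not utf8, separately for the session and global values. The values are copied from the listing; using the single listed value for both columns where no separate global value is shown is an assumption, and the function name is made up for illustration.

```python
# (session value, global value) for each MySQL variable, copied from the post.
# Assumption: where only one value is listed, it applies to both columns.
variables = {
    "character_set_client": ("utf8", "latin1"),
    "character_set_connection": ("utf8", "latin1"),
    "character_set_database": ("latin1", "latin1"),
    "character_set_results": ("utf8", "latin1"),
    "character_set_server": ("latin1", "latin1"),
    "character_set_system": ("utf8", "utf8"),
}


def non_utf8_settings(settings, scope):
    """Return the sorted names of variables whose value in the given scope is not utf8."""
    index = {"session": 0, "global": 1}[scope]
    return sorted(name for name, values in settings.items() if values[index] != "utf8")


print("session:", non_utf8_settings(variables, "session"))
print("global: ", non_utf8_settings(variables, "global"))
```

The session values describe phpMyAdmin's own connection, so a different client connecting to the same server may end up with different values. Whether that explains the question marks here is not established in this thread.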
Is your blog hosted on the same server as your friend's blog?
yep, and I've copied the settings from his mt config file so our settings within MT should be identical
Ok... if it is on the same server, same host, same mt-config file, you may need to contact your host's tech support.
So the variety of character encodings listed in my server settings and variables within phpMyAdmin is inconsequential in this case? Hm, this is a bit frustrating :\
Take a look at this and tell me if it applies to your post: http://bugs.mysql.com/bug.php?id=16526
hmm, it looks like they're mostly talking about the fact that utf8_unicode doesn't make the distinction between certain similar characters in Japanese. But my problem is not just character pairs, but every character in the language getting treated the same and being converted to question marks.
In the main phpMyAdmin window, it says that the MySQL charset is UTF-8 Unicode (utf8) and the MySQL connection collation is utf8_unicode_ci.
However, when I look at the MT database itself, the collation is set to utf8_general_ci.
It seems like whatever is happening, it is converting the Japanese to question marks as soon as the characters are sent to the database (as soon as I hit the 'save' button). It makes sense that this is a database problem, but I don't have much experience with databases, so it's difficult for me to determine where exactly the problem is (there are just so many different "collations" going on that I can't tell what's supposed to be what).
Thanks for your effort though, if you can think of anything else I am certainly all ears. This bug is preventing me from posting quite a bit of material.
It doesn't have anything to do with Japanese.
I am having exactly the same problem with Turkish characters.
And I started having this problem after upgrading to MT 4.3 from MT 3.2. So I figure there's something about 4.3 that needs to be fixed.