
Trouble with foreign language entries


Hi all, relatively new to databases and MT, but I feel like I’m getting the hang of things, with one minor, obnoxious snag.

I’m trying to post entries in my blog with Japanese characters, and as soon as I hit save, the Japanese characters all get converted to question marks. For instance, if the characters are in the entry title, the dashboard will say “publishing Entry ‘???, English characters display ok’ ..” and when it shows the edit entry page again, all the Asian characters are question marks.

Several searches tell me that something must be wrong with either the blog or the database encoding settings. In the browser, the blog index and everything else is set to UTF-8, and I can’t find any settings within the blog dashboard itself that seem to affect the situation. In phpMyAdmin, it’s set to: MySQL charset: UTF-8 Unicode (utf8); MySQL connection collation: utf8_unicode_ci
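The symptom described here, Japanese turning into literal question marks while the English text survives, is what a lossy conversion into a single-byte character set such as latin1 looks like. A minimal Python sketch, assuming (this is not confirmed anywhere in the thread) that the text is converted to latin1 somewhere between the edit form and the table; Python's `errors="replace"` substitutes `?` for each character latin1 cannot represent, similar to what MySQL does when it stores such a character in a latin1 column. `store_as_latin1` is a hypothetical helper:

```python
def store_as_latin1(text: str) -> str:
    """Simulate writing text into a latin1 column and reading it back.

    Assumption: somewhere on the save path the UTF-8 text is converted to
    latin1; characters latin1 cannot represent become '?'.
    """
    stored = text.encode("latin-1", errors="replace")  # unmappable -> b"?"
    return stored.decode("latin-1")


title = "過労死, death by employment"
print(store_as_latin1(title))  # ???, death by employment
```

Note that the ASCII part of the title comes through untouched, which matches the "English characters display ok" behaviour reported above.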

All my settings in phpMyAdmin are actually identical to a friend’s (he does not have this problem); we even copied the contents of his mt-config file to make sure that our configurations were the same… but no luck.

Can anyone shed some light on this?

Reported on Movable Type 4.2

20 Replies

  • I've seen this issue with some of my customers, and it may be related to your web hosting configuration.

    Your database has the correct encoding (UTF-8), and so does your Movable Type backend, but your server may be forcing an ISO charset (such as ISO-8859-1) in the headers of your pages.

    Try validating a page that shows the broken Japanese characters with the W3C validator at http://validator.w3.org/; it will tell you whether the encoding in the page header is being imposed by your hosting server.

    If that's the case, contact your web host and ask them to fix it.

    For some customers, this issue could also be fixed with a setting they could change themselves on their hosting account, but it is always advisable to have your host's support team do it for you.
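One thing the validator reports is the `charset` parameter of the `Content-Type` header the server sends with the page. A minimal Python sketch of that same check, using the standard library's `email.message.Message` to parse a header value; `header_charset` is a hypothetical helper, and for a live page the header string could come from `urllib.request.urlopen(url).headers["Content-Type"]`:

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value.

    get_content_charset() lower-cases the result and returns None when
    the header carries no charset parameter at all.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(header_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(header_charset("text/html; charset=UTF-8"))       # utf-8
print(header_charset("text/html"))                      # None
```

If the server answered with `iso-8859-1` while the pages declare UTF-8, that would be the kind of host-imposed header described above.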

  • Hi Mihai, thanks for your response :)

    I put in the permalink to one of the posts containing Japanese characters (of course, these are all just shown as question marks now). The validator properly identified it as utf-8, is there any other place I should be looking?

    It's doubly strange that my friend has no trouble with this, since I'm actually using his server to host my files...

  • Could you post a link to your friend's blog and to yours? Oddly, virtually all of the pages with Japanese characters that I load in Chrome, Firefox or IE on Windows look like garbage to me, even though Windows XP is supposed to come with a fully functional set of Unicode fonts for this kind of thing.

  • http://www.paper-machete.com/2008/11/-death-by-employment.html
    here is a permalink that should have some Japanese characters (more as a test than anything else). The ??? at the beginning of the entry title should read 過労死, and I tried putting a random character at the end of the entry just to see if it'll last. That too became a question mark.

    I can't find a link on my friend's blog that has Japanese in it, but when I told him about it he published a test post and it showed up alright.

    For my own blog, I have no problem entering Japanese text when writing an entry, but as soon as I hit "save" it refreshes and says "publishing entry" and any Japanese characters are converted to question marks (not garbled accented characters).
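The distinction drawn here, question marks rather than garbled accented characters, matters. Garbled accented characters ("mojibake") usually mean the correct UTF-8 bytes were stored but read back in the wrong encoding, which can be undone; question marks mean the characters were replaced during a conversion and are gone. A minimal Python sketch contrasting the two cases, using latin1 as the assumed wrong encoding:

```python
original = "過労死"

# Case 1: UTF-8 bytes read back as latin1 -> "garbled accented characters".
mojibake = original.encode("utf-8").decode("latin-1")
# The underlying bytes are intact, so the mistake can be reversed:
recovered = mojibake.encode("latin-1").decode("utf-8")

# Case 2: text converted *into* latin1 -> each kanji replaced by "?".
lossy = original.encode("latin-1", errors="replace").decode("latin-1")

print(len(mojibake))          # 9 (three UTF-8 bytes per kanji)
print(recovered == original)  # True
print(lossy)                  # ???
```

Nothing can turn `???` back into 過労死, which is why the entries would have to be re-entered once the cause is fixed.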

  • Hi David, this is quite weird!...

    Are you typing the Japanese characters by hand in Movable Type (I suspect not), or are you pasting them from somewhere?

    Where are you pasting it from?

  • Hi Mike, I'm sorry, but I'm quite the novice when it comes to MySQL. Could you elaborate, or link me to some more information? I need to beef up on this sort of thing anyway.

    Mihai, I'm actually writing it directly into the "create" or "edit entry" text box within MT. No copy/pasting.

  • select entry_title from mt_entry where entry_id = ENTRYID;

  • Thanks Mike, I'll try that as soon as I get home. What kind of result am I looking for? Whether it comes back with the correct characters or with question marks?

    • Pretty much. Alternatively, you can try this from the command line if you think phpMyAdmin might not be returning Unicode:

      mysql -u USERNAME -p DATABASE_NAME

      Enter your password when prompted, then run the SELECT statement at the mysql> prompt. Do this from the server's command line, logged in via SSH.

  • Hi Mike,
    I ran the command in phpMyAdmin's query window and it returned the entry, with question marks:

    "???, death by employment"
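Since phpMyAdmin shows the question marks too, one way to confirm whether they are literally stored, rather than produced when the row is read back, is to ask MySQL for the raw bytes with `SELECT HEX(entry_title) FROM mt_entry WHERE entry_id = ENTRYID;`. A minimal Python sketch of how to read that output; `diagnose_hex` is a hypothetical helper and its verdicts are rough heuristics, not a definitive test:

```python
def diagnose_hex(hex_value: str) -> str:
    """Interpret the output of MySQL's HEX() on a stored title (hypothetical helper)."""
    raw = bytes.fromhex(hex_value)
    if all(b < 0x80 for b in raw):
        if b"?" in raw:
            # '?' stored as the plain byte 0x3F: the original characters are gone.
            return "lost on write: only ASCII stored, '?' is a literal 0x3F byte"
        return "plain ASCII"
    try:
        raw.decode("utf-8")
        return "valid UTF-8 stored: look at the read/display side"
    except UnicodeDecodeError:
        return "non-ASCII bytes that are not UTF-8: stored in another charset"


print(diagnose_hex("3F3F3F2C"))                        # "???," as stored bytes
print(diagnose_hex("過労死".encode("utf-8").hex().upper()))  # what correct storage would look like
```

A result beginning `3F3F3F` would mean the kanji were already gone when the row was written, which points at the save path rather than at display.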

  • Checked the "server settings and variables," here's what it says:

    Server variables and settings
    Variable                    Session value      Global value (if different)
    auto_increment_increment    1
    auto_increment_offset       1
    automatic_sp_privileges     ON
    back_log                    50
    basedir                     /data/mysql/dzurilla/
    binlog_cache_size           32,768
    bulk_insert_buffer_size     8,388,608
    character_set_client        utf8               latin1
    character_set_connection    utf8               latin1
    character_set_database      latin1
    character_set_filesystem    binary
    character_set_results       utf8               latin1
    character_set_server        latin1
    character_set_system        utf8
    character_sets_dir          /data/mysql/dzurilla/share/mysql/charsets/
    collation_connection        utf8_unicode_ci    latin1_swedish_ci
    collation_database          latin1_swedish_ci
    collation_server            latin1_swedish_ci
    completion_type             0
    concurrent_insert           1
    connect_timeout             10
    datadir                     /dh/mysql/dzurilla/data/
    date_format                 %Y-%m-%d
    datetime_format             %Y-%m-%d %H:%i:%s
    default_week_format         0
    delay_key_write             ON
    delayed_insert_limit        100
    delayed_insert_timeout      300
    delayed_queue_size          1,000
    div_precision_increment     4
    ....

    There's a lot more, but I think this is the relevant part. Does this mean my server is set to a different character encoding? If so, is there an easy way to just set everything to UTF-8?
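One plausible reading of this listing: the connection (client, connection, results) is utf8 for this phpMyAdmin session, but the server and database defaults are latin1, and in MySQL a table created without an explicit character set inherits the database default. A Python sketch that simply picks the non-utf8 entries out of the values above (session values only; `non_utf8` is a hypothetical helper):

```python
from typing import Dict, List

# Character-set and collation values copied from the listing above.
# Where a separate global value was shown, the session value is used here.
session_vars = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_database": "latin1",
    "character_set_filesystem": "binary",
    "character_set_results": "utf8",
    "character_set_server": "latin1",
    "character_set_system": "utf8",
    "collation_connection": "utf8_unicode_ci",
    "collation_database": "latin1_swedish_ci",
    "collation_server": "latin1_swedish_ci",
}


def non_utf8(variables: Dict[str, str]) -> List[str]:
    """List charset/collation variables whose value is not a utf8 one.

    'binary' (used for filenames) is skipped because it is not a text charset.
    """
    return sorted(
        name
        for name, value in variables.items()
        if not value.startswith("utf8") and value != "binary"
    )


print(non_utf8(session_vars))
```

If MT's tables turned out to be latin1, running `SHOW CREATE TABLE mt_entry;` would show it, since the table's default character set appears in its output.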

  • Yep, and I've copied the settings from his mt-config file, so our settings within MT should be identical.

  • So the variety of character encodings listed in my server settings and variables in phpMyAdmin is inconsequential in this case? Hm, this is a bit frustrating :\

  • Hmm, it looks like they're mostly talking about the fact that utf8_unicode_ci doesn't distinguish between certain similar characters in Japanese. But my problem is not just with particular pairs of characters: every character in the language is treated the same way and converted to a question mark.

    In the main phpMyAdmin window, it says that the MySQL charset is UTF-8 Unicode (utf8) and the MySQL connection collation is utf8_unicode_ci.

    However, when I look at the MT database itself, the collation is set to utf8_general_ci.

    It seems that, whatever is happening, the Japanese is converted to question marks as soon as it is sent to the database (as soon as I hit the 'save' button). It makes sense that this is a database problem, but I don't have much experience with databases, so it's difficult for me to determine where exactly the problem is (there are so many different "collations" going on that I can't tell what's supposed to be what).

    Thanks for your effort though, if you can think of anything else I am certainly all ears. This bug is preventing me from posting quite a bit of material.
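On the "so many collations" question: in MySQL a collation only decides how text is compared and sorted, while the character set decides which characters can be stored at all, and by MySQL's naming convention a collation's name begins with the name of its character set. A small Python sketch reading that prefix for the collations mentioned in this thread (`charset_of` is a hypothetical helper):

```python
def charset_of(collation: str) -> str:
    """Return the character set a MySQL collation belongs to.

    Relies on MySQL's naming convention: a collation name starts with its
    character set name followed by an underscore (e.g. latin1_swedish_ci).
    """
    return collation.split("_", 1)[0]


for name in ("utf8_general_ci", "utf8_unicode_ci", "latin1_swedish_ci"):
    print(name, "->", charset_of(name))
```

Both utf8 collations can store these kanji; on this reading, the general/unicode difference would not turn them into question marks, whereas a latin1_swedish_ci table or column could.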

  • It doesn't have anything to do with Japanese.

    I am having exactly the same problem with Turkish characters.

    And I started having this problem after upgrading to MT 4.3 from MT 3.2. So I figure there's something about 4.3 that needs to be fixed.
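This fits the point that the problem is not specific to Japanese: any character the latin1 character set lacks would be affected if the text passes through latin1. MySQL's latin1 closely follows Windows-1252, so this sketch checks characters against Python's `cp1252` codec (`lost_in_latin1` is a hypothetical helper). Common Turkish letters ç, ö and ü fit, while ğ, ı and ş do not, which would produce exactly this kind of partial question-mark damage. It does not by itself show whether MT 4.3 is the cause:

```python
from typing import List


def lost_in_latin1(text: str) -> List[str]:
    """Return the characters of text that Windows-1252 (MySQL's latin1) cannot hold."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print(lost_in_latin1("çğıöşü"))  # ['ğ', 'ı', 'ş']
print(lost_in_latin1("過労死"))  # ['過', '労', '死']
```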
