
Charset conversion


Poco

2 posts in this topic


We have successfully converted our database charset to utf8mb4 and applied a fix to posts containing accented or otherwise special characters. If you notice that any information has been lost or that content appears garbled, please let us know and we can attempt to recover it. This conversion also allows more modern emojis to be used in the forum.

 

Known issues:

-Some old posts containing � may have been converted to the incorrect character. However, no information that was previously displayed should have been lost.

-Emojis are not supported through the quick edit/reply boxes. If you edit a post containing such characters, they will vanish on submission.


I may have found the issue with quick edit and emojis. It's related to double encoding in PHP.

The full editor uses a multipart payload to deliver the content, which results in:

<p><strong><span style="color:rgb(51,51,51);font-family:'Segoe UI Emoji';font-size:49px;">?? </span></strong></p>

The quick editor, by contrast, uses a single POST body, so the content gets encoded as:
 

%3Cp%3E%3Cspan%20style%3D%22color%3Argb%2851%2C51%2C51%29%3Bfont-family%3A%27Segoe%20UI%20Emoji%27%3Bfont-size%3A49px%3B%22%3E%20%3C/span%3E%3C/p%3E%0A

If you call rawurldecode() on that string in PHP, you'll get:
 

<p><span style="color:rgb(51,51,51);font-family:'Segoe UI Emoji';font-size:49px;">% uD83D% uDE0C</span></p>

This may be the cause of the issue. I still need to check the PHP code that handles Quick Posts.
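The behaviour described above can be reproduced with a minimal sketch. It is written in Python rather than PHP, because the forum's own code is not shown in this thread. `urllib.parse.unquote` stands in for `rawurldecode`: both decode only `%XX` hex pairs, so a `%uXXXX` escape passes through unchanged. The `payload` string is a shortened, hypothetical version of the quick-edit body, not the real request.

```python
from urllib.parse import unquote

# Shortened, hypothetical quick-edit payload. The emoji arrives as two
# %uXXXX escapes (a UTF-16 surrogate pair) instead of %XX-encoded bytes.
payload = "%3Cp%3E%uD83D%uDE0C%3C/p%3E"

# Standard percent-decoding only understands %XX, so the markup is decoded
# but the %uXXXX escapes are left in place as literal text.
decoded = unquote(payload)
print(decoded)  # <p>%uD83D%uDE0C</p>
```

The `%uXXXX` form looks like the non-standard output of JavaScript's legacy `escape()` function. Whether the quick editor actually uses that function is a guess.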

 

EDIT: IPB strips out the % symbol, which is why I added spaces above. In the last snippet the span is not empty: it contains "% uD83D% uDE0C" (without the spaces). That is different from the ?? in the first snippet, which are the actual UTF-8 characters.
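If the quick editor really does send `%uXXXX` escapes (an assumption based on the output above), one possible server-side workaround is to turn them back into real characters before saving. The sketch below is in Python for illustration only. `decode_percent_u` is a hypothetical helper, not part of IPB or PHP.

```python
import re

def decode_percent_u(text: str) -> str:
    """Replace runs of %uXXXX escapes with the characters they encode."""
    def repl(match: re.Match) -> str:
        # Collect the 16-bit code units of one run of consecutive escapes.
        units = re.findall(r"%u([0-9A-Fa-f]{4})", match.group(0))
        raw = b"".join(int(u, 16).to_bytes(2, "big") for u in units)
        # Decoding as UTF-16 joins a surrogate pair into a single code point.
        return raw.decode("utf-16-be", errors="replace")
    return re.sub(r"(?:%u[0-9A-Fa-f]{4})+", repl, text)

emoji = decode_percent_u("%uD83D%uDE0C")
print(emoji.encode("utf-8"))  # b'\xf0\x9f\x98\x8c'
```

The pair D83D/DE0C decodes to U+1F60C, which takes four bytes in UTF-8. That is the kind of character the utf8mb4 conversion was meant to allow.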
