"Rig presets" missing - unicode issue


Moderator: joepal

Re: "Rig presets" missing

Postby RobBaer » Sat Dec 29, 2018 2:20 am

radionfire wrote:
Hi all,

I'm currently using an English version of Windows 10 with code page 65001. As far as I know, no non-ASCII characters have ever been used anywhere on my computer.

I'm just an artist and never code anything at all, so I guess there is nothing more I can help with. ...


We'll see what we can figure out. Reports like yours help make things work for all of us. Thanks for sharing your experience.
RobBaer
 
Posts: 1208
Joined: Sat Jul 13, 2013 3:30 pm
Location: Kirksville, MO USA

Re: "Rig presets" missing

Postby joepal » Sat Dec 29, 2018 11:07 am

Although the Windows interface language is English, the detected code pages are "gbk" https://en.wikipedia.org/wiki/GBK_(character_encoding) and "cp936" https://en.wikipedia.org/wiki/Code_page_936_(Microsoft_Windows)

The only non-ASCII character I can think of in the skeleton files is the "™" (trademark) character. It's possible this can't be represented in either of these code pages, although I don't really see why it should need to be. I think we're simply assuming JSON files will be opened as UTF-8. Maybe it'd be possible to explicitly tell json to open files with the UTF-8 encoding?
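A small sketch of what probably goes wrong, assuming the skeleton file is saved as UTF-8 and that open() without an explicit encoding picked up "gbk" as the locale default: the three UTF-8 bytes of "™" are not a valid GBK sequence, so decoding fails before the JSON parser ever sees the text. The sample string and the helper name try_decode are made up for illustration.

```python
# Hypothetical sample content; the real .mhskel files are larger JSON documents.
utf8_bytes = '{"name": "MakeHuman\u2122"}'.encode("utf-8")


def try_decode(data: bytes, codec: str) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


# ascii() keeps the output printable even on a console that can't show "™".
print(ascii(try_decode(utf8_bytes, "utf-8")))  # decodes cleanly
print(ascii(try_decode(utf8_bytes, "gbk")))    # fails on the bytes of "™"
```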
Joel Palmius (LinkedIn)
MakeHuman Infrastructure Manager
http://www.palmius.com/joel
joepal
 
Posts: 4465
Joined: Wed Jun 04, 2008 11:20 am

Re: "Rig presets" missing

Postby Aranuvir » Sat Dec 29, 2018 11:54 am

From /usr/lib/python3.6/json/__init__.py, line 338:
The ``encoding`` argument is ignored and deprecated.

I'm not sure if the encoding matters, since the file was saved as 'utf-8'.
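Since json ignores that argument, the encoding has to be decided before the parser is called. One option (documented behaviour since Python 3.6) is to read the file in binary mode and pass the bytes straight to json.loads(), which then detects UTF-8, UTF-16 or UTF-32 by itself and never consults the Windows code page. A minimal sketch; the helper name load_json_bytes and the sample file are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_json_bytes(path):
    """Parse a JSON file without involving the locale's default encoding."""
    with open(path, "rb") as fp:      # binary mode: no text decoding here
        return json.loads(fp.read())  # json.loads accepts bytes (Python 3.6+)


# Usage with a throwaway file standing in for a real .mhskel skeleton.
with tempfile.TemporaryDirectory() as tmp:
    skel = Path(tmp) / "example.mhskel"
    skel.write_bytes('{"name": "Example rig\u2122"}'.encode("utf-8"))
    data = load_json_bytes(skel)

print(ascii(data["name"]))  # 'Example rig\u2122'
```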
Aranuvir
 
Posts: 1314
Joined: Sun Oct 12, 2014 2:12 pm

Re: "Rig presets" missing

Postby jujube » Sun Dec 30, 2018 5:35 pm

I think this happened to me when I tried using a custom skeleton. I couldn't fix it, so I had to switch to the next version of MakeHuman (1.1.0 to 1.1.1). My memory is faulty though, so that may not be accurate...
jujube
 
Posts: 404
Joined: Fri Aug 14, 2015 10:46 pm

Re: "Rig presets" missing

Postby RobBaer » Sun Dec 30, 2018 7:19 pm

Aranuvir wrote:
From /usr/lib/python3.6/json/__init__.py, line 338:
The ``encoding`` argument is ignored and deprecated.

I'm not sure if the encoding matters, since the file was saved as 'utf-8'.


But remember that Unicode is stored as UTF-8 on Unix-alikes but as UTF-16 on Windows. It's been long enough since we last worked on this that I don't remember what bearing this might have, but I think it plays into the code page mix.

joepal wrote:
The only non-ascii character I can think of in the skeleton files is the "tm" character.


:D Yes, some good ideas only dig a deeper ditch...

Re: "Rig presets" missing

Postby Aranuvir » Tue Jan 01, 2019 11:16 am

RobBaer wrote:
But remember that unicode is stored UTF8 on unix-alikes but as UTF16 on windows. It's been long enough since we last worked on this that I don't remember what bearing this might have, but I think it plays into the codepage mix.

I doubt the files get converted to another encoding simply by being stored on another filesystem. Maybe some unzip tools do a conversion, but in that case one would expect more problems. My favorite explanations are either that some files were opened with another tool and then saved back with a different encoding, or, even more likely, that one file is simply corrupt.
We should change the code so that it doesn't stop loading the other rigs when it fails to load one file.
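The "keep going when one rig fails" idea could look roughly like this: each file gets its own try/except, failures are logged and collected, and the remaining rigs still load. This is a hypothetical helper (load_rig_presets is not MakeHuman's actual loader, which is structured differently); it only illustrates the per-file error handling.

```python
import json
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("rigs")


def load_rig_presets(folder):
    """Load every .mhskel file in folder, skipping (and logging) bad ones."""
    presets, failures = {}, {}
    for path in sorted(Path(folder).glob("*.mhskel")):
        try:
            with open(path, encoding="utf-8") as fp:
                presets[path.stem] = json.load(fp)
        except (UnicodeDecodeError, json.JSONDecodeError, OSError) as exc:
            # One broken file no longer hides all the other rigs.
            failures[path.name] = str(exc)
            log.warning("Skipping rig %s: %s", path.name, exc)
    return presets, failures


# Usage: one valid file and one that is not valid UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.mhskel").write_text('{"name": "ok"}', encoding="utf-8")
    Path(tmp, "bad.mhskel").write_bytes(b'{"name": "\xff"}')
    presets, failures = load_rig_presets(tmp)

print(sorted(presets), sorted(failures))
```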

@radionfire: perhaps you can identify the rig in question by removing all mhskel files and adding them back one by one?
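Instead of removing and re-adding files by hand, a short script could list which files fail to decode, both as strict UTF-8 and with the encoding open() falls back to on that machine. A sketch under those assumptions; find_suspect_files is a hypothetical name and the scratch folder stands in for the real rig folder.

```python
import codecs
import locale
import tempfile
from pathlib import Path


def find_suspect_files(folder, pattern="*.mhskel"):
    """Map file name -> list of decode problems; files that decode are omitted."""
    # Strict UTF-8, plus whatever open() uses by default here
    # (e.g. cp936 on a Simplified Chinese system locale).
    default = codecs.lookup(locale.getpreferredencoding(False)).name
    to_try = dict.fromkeys(["utf-8", default])  # de-duplicated, order kept
    report = {}
    for path in sorted(Path(folder).glob(pattern)):
        data = path.read_bytes()
        problems = []
        for codec in to_try:
            try:
                data.decode(codec)
            except UnicodeDecodeError as exc:
                problems.append(f"{codec}: {exc.reason} at byte {exc.start}")
        if problems:
            report[path.name] = problems
    return report


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "fine.mhskel").write_bytes(b'{"name": "plain ascii"}')
    Path(tmp, "broken.mhskel").write_bytes(b'{"name": "\xff"}')
    suspects = find_suspect_files(tmp)

print(suspects)
```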

Re: "Rig presets" missing

Postby RobBaer » Tue Jan 01, 2019 10:18 pm

So it would seem that the trademark symbol is NOT in code page 936 (GBK?):
https://ssl.icu-project.org/icu-bin/convexp?conv=cp936 and https://www.unicode.org/Public/MAPPINGS ... /CP936.TXT
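Whether a character exists in a given code page can also be checked locally by trying to encode it. The sketch below (can_encode is a hypothetical helper) only asserts results that are certain, such as "™" being present in cp1252 and GB18030 and absent from ASCII; the cp936 line is there for you to compare against the mapping tables linked above.

```python
def can_encode(text: str, codec: str) -> bool:
    """True if every character of text is representable in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


TM = "\u2122"  # the ™ (trademark) sign

for codec in ("utf-8", "cp1252", "cp936", "gb18030", "ascii"):
    print(codec, can_encode(TM, codec))
```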

Here is the bit about OS-based Unicode storage differences, although this may not affect JSON text:
[Image: TM encoding (TM_Encoding.png)]
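To make the storage difference concrete: the same "™" character becomes different bytes depending on the encoding chosen by whatever program writes the file, and a reader only gets the character back with the matching codec. A minimal sketch:

```python
tm = "\u2122"  # the ™ (trademark) sign

utf8 = tm.encode("utf-8")       # 3 bytes
utf16 = tm.encode("utf-16-le")  # 2 bytes, little-endian, no BOM

print(utf8.hex(), utf16.hex())  # e284a2 2221

# Each byte sequence round-trips only with the codec that produced it.
assert utf8.decode("utf-8") == tm
assert utf16.decode("utf-16-le") == tm
```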


Section 11 #18 of this manifesto is worth a read:
http://utf8everywhere.org/

Did we miss skeletons when we did the codepage fixes?

Re: "Rig presets" missing

Postby joepal » Wed Jan 02, 2019 9:15 am

I guess the obvious low-tech solution would be to back out of the TM change. But that's mostly a band-aid, as there's nothing stopping people from adding any utf-8 text in names, tags and descriptions in asset files.

But I agree with Aranuvir: it seems very strange that the file is read using the file system code page/encoding. For file NAMES I can understand it, but for file CONTENTS it should be UTF-8 no matter the file system encoding.

I think the best way forward would be to make really sure that all files are read with utf-8 encoding. If the json call doesn't support this directly, we can read the file to a string via a normal file operation first, and then pass that string to the json parser.
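The two-step approach could be sketched like this: open the file as text with an explicit encoding, then hand the resulting string to json.loads(). The helper name load_json_utf8 is hypothetical. Using "utf-8-sig" instead of plain "utf-8" is a deliberate choice here: it decodes ordinary UTF-8 identically, but also strips a leading byte order mark if an editor added one, which json.loads() would otherwise reject.

```python
import json
import tempfile
from pathlib import Path


def load_json_utf8(path):
    """Read a JSON asset as UTF-8 regardless of the Windows code page."""
    # "utf-8-sig" behaves like "utf-8" but also drops an optional BOM.
    with open(path, "r", encoding="utf-8-sig") as fp:
        text = fp.read()
    return json.loads(text)


with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp, "plain.mhskel")
    plain.write_bytes('{"name": "Rig\u2122"}'.encode("utf-8"))
    with_bom = Path(tmp, "bom.mhskel")
    with_bom.write_bytes('{"name": "Rig\u2122"}'.encode("utf-8-sig"))
    results = [load_json_utf8(p)["name"] for p in (plain, with_bom)]

print(ascii(results))  # ['Rig\u2122', 'Rig\u2122']
```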

Re: "Rig presets" missing

Postby RobBaer » Wed Jan 02, 2019 7:54 pm

@radionfire
This may be a long shot, but when I was looking at the locale settings on my own Windows 10 (Build 17134) to try the Simplified Chinese code page, I got the option to "use Unicode UTF-8" for international support.

[Image: Simplified Chinese (Simplified Chinese.png)]


Do you have this option, and if so, have you checked it? Does it help? (What is your Windows 10 build? Type 'System Information' in the search box and read the second line for the version.) This Unicode UTF-8 option may be relatively new, as it is listed as beta in the dialog.

Directions
  • Start Control Panel.
  • In the upper right, choose "Category" view from the drop-down if it is not already selected.
  • Click Clock and Region.
  • Click Region, then the Administrative tab.
  • Click the bottom button (Change system locale).
  • In the Region Settings dialog box that appears, look for the check box marked in red in the figure above and check it.
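To see what Python itself picks up before and after changing that setting, you could print the default encoding that open() uses when no encoding is given. On a Simplified Chinese system locale this is typically "cp936"; with the beta UTF-8 box checked it should report a UTF-8 code page such as "cp65001", though that is an expectation rather than something verified on every build. Python 3.7+ also has a per-process UTF-8 mode (PYTHONUTF8=1 or -X utf8) that changes this default without touching the system setting.

```python
import locale
import sys

# The encoding open() falls back to when encoding= is not passed.
preferred = locale.getpreferredencoding(False)
print("locale default:", preferred)

# True when Python runs in UTF-8 mode (PYTHONUTF8=1 or -X utf8, Python 3.7+).
utf8_mode = bool(getattr(sys.flags, "utf8_mode", 0))
print("UTF-8 mode:", utf8_mode)
```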

Re: "Rig presets" missing - unicode issue

Postby RobBaer » Thu Jan 03, 2019 10:39 pm

OK- More looking into this...

It seems that I have problems displaying the TM symbol even with the locale set to "United States", so this may not be a code-page-specific issue. The figure below shows what I see with and without the check box from my previous post checked. The check box solution works on Windows 10, but I fear it might not exist on Vista and Windows 7; I don't have those easily available to check.

Unlike @radionfire in the original post, I am able to see the skeletons with or without the box checked. This is where the code page may add extra complexity. My guess is that unless the box is checked, Windows assumes that Unicode text was saved as UTF-16 rather than UTF-8. And since Joel (or whoever) likely made the files on an Ubuntu system, I guess the JSON is saved as UTF-8??

[Image: Box Solution (BoxCSollution.png)]
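One possible way the two machines could show different symptoms, assuming the skeleton files are UTF-8 and are read with the locale's default code page: under the US-English code page (cp1252) every one of the three "™" bytes maps to some character, so the file loads but the symbol turns into garbage, while under GBK the same bytes are an illegal sequence and loading fails outright. A minimal sketch of that assumption:

```python
tm_utf8 = "MakeHuman\u2122".encode("utf-8")

# Correct codec: the ™ sign survives.
correct = tm_utf8.decode("utf-8")

# US-English Windows code page: no error, but mojibake instead of ™.
us_locale = tm_utf8.decode("cp1252")
print(us_locale)  # MakeHumanâ„¢

# Simplified Chinese code page: the same bytes do not decode at all.
try:
    tm_utf8.decode("gbk")
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True
print("gbk failed:", gbk_failed)
```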
