Re: "Rig presets" missing

Posted: Sat Dec 29, 2018 2:20 am
by RobBaer
radionfire wrote: Hi all,

I'm currently using an English version of Windows 10 (code page: 65001). As far as I know, no non-ASCII characters have ever been used anywhere on my computer.

I'm just an artist and have never coded anything at all, so I guess there is nothing further I can help with. ...


We'll see what we can figure out. Reports like yours are helpful for making things work for all of us. Thanks for sharing your experience.

Re: "Rig presets" missing

Posted: Sat Dec 29, 2018 11:07 am
by joepal
Although the Windows interface language is English, the detected code pages are "gbk" https://en.wikipedia.org/wiki/GBK_(character_encoding) and cp936 https://en.wikipedia.org/wiki/Code_page_936_(Microsoft_Windows)

The only non-ascii character I can think of in the skeleton files is the "tm" character. It's possible this can't be represented in either of these code pages, although I don't really see a reason why it should have to be. I think we're simply assuming json files will be opened as utf-8. Maybe it'd be possible to explicitly tell json to open files with the utf-8 encoding?
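
A minimal sketch of the mismatch described here, assuming Python 3, where `open()` without an `encoding` argument normally falls back to the locale's preferred encoding. The helper name `decode_or_error` is made up for illustration. It only shows that the UTF-8 bytes of the trademark sign round-trip when decoded as UTF-8 but not when decoded as cp936; whether this is what actually goes wrong in the skeleton loader is not confirmed.

```python
import locale

# UTF-8 encoding of U+2122 TRADE MARK SIGN
TM_UTF8 = "\u2122".encode("utf-8")  # b'\xe2\x84\xa2'


def decode_or_error(data, encoding):
    """Return the decoded text, or the exception if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# The encoding open() would typically use when none is given explicitly.
print("locale default:", locale.getpreferredencoding(False))

# Same bytes, two interpretations. The trailing quote mimics the JSON
# string delimiter that would follow the symbol in a real file.
print("utf-8 :", repr(decode_or_error(TM_UTF8 + b'"', "utf-8")))
print("cp936 :", repr(decode_or_error(TM_UTF8 + b'"', "cp936")))
```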

Re: "Rig presets" missing

Posted: Sat Dec 29, 2018 11:54 am
by Aranuvir
From /usr/lib/python3.6/json/__init__.py, line 338:
The ``encoding`` argument is ignored and deprecated.

I'm not sure if the encoding matters, since the file was saved as 'utf-8'.

Re: "Rig presets" missing

Posted: Sun Dec 30, 2018 5:35 pm
by jujube
I think I had this happen to me when I tried using a custom skeleton. I couldn't fix it, so I had to switch to the next version of MakeHuman (1.1.0 to 1.1.1). My memory is faulty though, so that may not be accurate...

Re: "Rig presets" missing

Posted: Sun Dec 30, 2018 7:19 pm
by RobBaer
Aranuvir wrote: From /usr/lib/python3.6/json/__init__.py, line 338:
The ``encoding`` argument is ignored and deprecated.

I'm not sure if the encoding matters, since the file was saved as 'utf-8'.


But remember that Unicode is stored as UTF-8 on Unix-alikes but as UTF-16 on Windows. It's been long enough since we last worked on this that I don't remember what bearing this might have, but I think it plays into the codepage mix.

joepal wrote: The only non-ascii character I can think of in the skeleton files is the "tm" character.


:D Yes, some good ideas only dig a deeper ditch...

Re: "Rig presets" missing

Posted: Tue Jan 01, 2019 11:16 am
by Aranuvir
RobBaer wrote: But remember that unicode is stored UTF8 on unix-alikes but as UTF16 on windows. It's been long enough since we last worked on this that I don't remember what bearing this might have, but I think it plays into the codepage mix.

I doubt the files get converted to another encoding simply by being stored on another filesystem. Maybe some unzip tools do a conversion, but in that case one would expect more problems. My favorite explanations are that either some files were opened with another tool and then saved back with a different encoding or, even more likely, one file is simply corrupt.
We should change the code so that it doesn't stop loading the other rigs when it fails to load one file.
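
A rough sketch of that idea, assuming Python 3 and that the rigs are JSON files ending in `.mhskel`. The function name `load_all_skeletons` and its return shape are hypothetical and not taken from the MakeHuman code base. Each file is loaded inside its own `try` block, so one bad file is recorded and skipped rather than aborting the whole scan.

```python
import io
import json
import os


def load_all_skeletons(folder, extension=".mhskel"):
    """Try every matching file; return (loaded, failed) instead of aborting."""
    loaded = {}
    failed = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(extension):
            continue
        path = os.path.join(folder, name)
        try:
            # Always read the contents as UTF-8, whatever the system locale is.
            with io.open(path, "r", encoding="utf-8") as handle:
                loaded[name] = json.load(handle)
        except (OSError, ValueError) as exc:
            # UnicodeDecodeError and json.JSONDecodeError both derive
            # from ValueError, so encoding and syntax problems land here.
            failed[name] = exc
    return loaded, failed
```

The `failed` dictionary could then be written to the log so the user sees which file was skipped and why, while the remaining rigs still show up.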

@radionfire: perhaps you can identify the rig in question by removing all mhskel files and adding them back one by one?
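
For anyone who prefers a script to moving files by hand, here is a small sketch, assuming Python 3. The name `report_non_ascii` is invented for this example. It reads each file as raw bytes, checks whether those bytes are valid UTF-8, and lists any non-ASCII characters it finds, which would separate a file saved in another encoding from one that merely contains a symbol such as the trademark sign.

```python
import os


def report_non_ascii(folder, extension=".mhskel"):
    """Return {filename: verdict} based on each file's raw bytes."""
    report = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(extension):
            continue
        with open(os.path.join(folder, name), "rb") as handle:
            raw = handle.read()
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that is not valid UTF-8
            report[name] = "not valid UTF-8 at byte offset %d" % exc.start
            continue
        extras = sorted({ch for ch in text if ord(ch) > 127})
        if extras:
            codes = ", ".join("U+%04X" % ord(ch) for ch in extras)
            report[name] = "non-ASCII: " + codes
        else:
            report[name] = "plain ASCII"
    return report
```

Calling `report_non_ascii` on the folder that holds the skeleton files and printing the returned dictionary gives one verdict per file.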

Re: "Rig presets" missing

Posted: Tue Jan 01, 2019 10:18 pm
by RobBaer
So, it would seem that the trademark symbol is NOT on codepage 936 (GBK?)
https://ssl.icu-project.org/icu-bin/convexp?conv=cp936 and https://www.unicode.org/Public/MAPPINGS ... /CP936.TXT

Here is the bit about OS based unicode storage differences although this may not affect json text:
[Attachment: TM_Encoding.png, "TM encoding"]
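
Independently of the attachment, the byte-level difference between the encodings can be checked directly. This is a sketch assuming Python 3; `guess_from_bom` is a hypothetical helper, not part of MakeHuman. It illustrates that the encoding is a property of the bytes that were written to the file, so a UTF-8 file stays UTF-8 when copied to a Windows machine.

```python
import codecs

TM = "\u2122"  # TRADE MARK SIGN

# The same character produces different bytes in each encoding.
for encoding in ("utf-8", "utf-16-le", "utf-16"):
    print(encoding, TM.encode(encoding))


def guess_from_bom(raw):
    """Guess an encoding from a byte-order mark; None when no BOM is present.

    Simplified on purpose: a UTF-32 little-endian BOM also starts with the
    UTF-16 little-endian BOM and would be reported as "utf-16" here.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None
```

A file without a byte-order mark returns `None`, which is the usual case for UTF-8 JSON written on Linux.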


Section 11 #18 of this manifesto is worth a read:
http://utf8everywhere.org/

Did we miss skeletons when we did the codepage fixes?

Re: "Rig presets" missing

Posted: Wed Jan 02, 2019 9:15 am
by joepal
I guess the obvious low-tech solution would be to back out of the TM change. But that's mostly a band-aid, as there's nothing stopping people from adding any utf-8 text in names, tags and descriptions in asset files.

But I agree with Aranuvir: it seems very strange that the file is read using the file system codepage/encoding. For file NAMES I can understand it, but file CONTENTS should be read as utf-8 no matter the file system encoding.

I think the best way forward would be to make really sure that all files are read with utf-8 encoding. If the json call doesn't support this directly, we can read the file to a string via a normal file operation first, and then pass that string to the json parser.
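
A minimal sketch of that approach, assuming Python 3 and standard library calls only. The function name `load_json_utf8` is made up for illustration. The encoding is given to the file open call rather than to the json module, which matches the note quoted earlier in the thread that the json `encoding` argument is ignored and deprecated.

```python
import io
import json


def load_json_utf8(path):
    """Read the file as UTF-8 text, then hand the string to the JSON parser."""
    with io.open(path, "r", encoding="utf-8") as handle:
        text = handle.read()
    return json.loads(text)
```

Passing the open handle to `json.load` instead of reading into a string first should behave the same way, since the decoding has already been fixed to UTF-8 by `io.open`.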

Re: "Rig presets" missing

Posted: Wed Jan 02, 2019 7:54 pm
by RobBaer
@radionfire
This may be a long shot, but when I was looking at the locale settings on my own Windows 10 machine (Build 17134) to try out the Simplified Chinese codepage, I got the option to "use unicode utf8" for international support.

[Attachment: Simplified Chinese.png, "Simplified Chinese"]


Do you have this option, and if so, have you checked it? Does it help? (What is your Windows 10 build? Type 'System Information' in the search box and read the second line for the version.) This unicode utf8 option may be relatively new, as it is listed as beta in the dialog.

Directions
  • Start Control Panel
  • In the upper right, choose "category mode" from the drop-down if it is not already selected
  • Click Clock and Region
  • Click Region, then the Administrative tab
  • Click the bottom button (system locale)
  • In the region dialog box that appears, look for the check box marked in red in the figure above and check it

Re: "Rig presets" missing - unicode issue

Posted: Thu Jan 03, 2019 10:39 pm
by RobBaer
OK- More looking into this...

It seems that I have problems displaying the TM symbol even with the encoding set to "United States", so this may not be a codepage-specific issue. The figure below shows what I see with and without the check box from my previous post checked. The check box solution works on Windows 10, but I fear it might not exist on Vista and Windows 7; I don't have those easily available to check.

Unlike in @radionfire's original post, I am able to see the skeletons both with and without the box checked. This is where the codepage may add additional complexity. I guess that unless the box is checked, Windows assumes that Unicode was saved as UTF-16 rather than UTF-8. I guess that, since Joel (or whoever) likely did it on an Ubuntu system, the json unicode is saved as utf8??

[Attachment: BoxCSollution.png, "Box Solution"]