Wireless and Wifi Forums > News > Newsgroups > alt.computer.security
#1
05-25-2011, 03:37 PM
Globemaker (Guest)
Google censors Unicode 0x12000

My two Blogspot sites were censored by Google where I posted
Unicode Cuneiform as code points at and above hex 0x12000. Here is
where it happened:
The Cuneiform Club
http://cuneiformclub.blogspot.com/p/...-failures.html
Greek Alphabet
http://greekalphabet.blogspot.com/
The code points from 0x12000 upward were replaced by question marks
at 0xFFFD, which is below 0xFFFF.
That is, decimal 73728 was automatically replaced by 65533 by Google
on my Blogspot sites.
What's up?
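
For reference, the two decimal values in this post can be checked against the Unicode character database. This is a minimal Python sketch using only the standard library `unicodedata` module; the variable names are illustrative and do not come from the blogs.

```python
import unicodedata

cuneiform = chr(0x12000)    # first code point of the Cuneiform block
replacement = chr(0xFFFD)   # the character the posted glyphs turned into

# Hex and decimal forms of the two code points quoted in the post.
assert ord(cuneiform) == 73728
assert ord(replacement) == 65533

# Look up the official character names.
cuneiform_name = unicodedata.name(cuneiform)
replacement_name = unicodedata.name(replacement)
print(hex(ord(cuneiform)), cuneiform_name)
print(hex(ord(replacement)), replacement_name)
```

The second name is the telling one: 65533 is not an ordinary question mark but the character that software substitutes when it cannot decode its input.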


#2
05-25-2011, 04:13 PM
David H. Lipman (Guest)
Re: Google censors Unicode 0x12000

It's a free service, and maybe they don't like the character set because it could be used
in a malicious way.

I suggest your OWN web site, and then you can do what you bloody like.


--
Dave
Multi-AV Scanning Tool - http://www.pctipp.ch/downloads/dl/35905.asp



#3
05-25-2011, 04:27 PM
Globemaker (Guest)
Re: Google censors Unicode 0x12000

More details, and correction:

The problem has been traced to a UTF-16 issue, not a 0x12000 range
issue in UTF-32 to UTF-8 encoding translation. When Cuneiform is
pasted from Wordpad into Blogspot, it looks like each code point near
0x12000 is encoded as a pair of 16-bit UTF-16 surrogates, a high
surrogate followed by a low surrogate, written here in decimal:
& # 55304 ; & # 56320 ;
& # 55304 ; & # 56384 ;
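
The arithmetic behind those pairs can be sketched in Python. This follows the standard UTF-16 surrogate formula; the function name `to_surrogate_pair` is hypothetical and is not something Wordpad or Blogspot exposes.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go in the low surrogate
    return high, low

print(to_surrogate_pair(0x12000))  # the first pair quoted above
print(to_surrogate_pair(0x12040))  # the second pair quoted above
```

Under this formula the first pair corresponds to code point 0x12000 and the second to 0x12040, so both are ordinary UTF-16 encodings of Cuneiform code points.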

When the Blogspot overlords at Google spot that, they change it to the
question mark glyph at 65533 decimal (U+FFFD, the Unicode replacement
character).
So the censorship is not of 0x12000 code points written as single
UTF-32 values; it is of UTF-16 pairs of numbers, a high surrogate
followed by a low surrogate.

Those encodings were all usable by some browsers, but not all.
Internet Explorer 8 was able to display the UTF-16 pairs, but Chrome
and Firefox were not.

Conclusion: the automatic censorship is taking place for the UTF-16
encodings of Cuneiform glyphs in Unicode, but not for UTF-32
encodings.
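
One possible explanation, offered as an assumption rather than anything confirmed about Blogspot: the HTML parsing rules treat a numeric character reference in the surrogate range (55296 to 57343) as an error and substitute U+FFFD. Python's standard `html.unescape` follows those rules, so it can illustrate the effect. The variable names below are illustrative only.

```python
import html

pair_as_refs = "&#55304;&#56320;"   # surrogate halves written as two references
single_ref = "&#73728;"             # the same code point written as one reference

decoded_pair = html.unescape(pair_as_refs)
decoded_single = html.unescape(single_ref)

# Show which code points each form actually decodes to.
print([hex(ord(c)) for c in decoded_pair])
print([hex(ord(c)) for c in decoded_single])
```

If Blogspot normalizes posts through a parser that behaves this way, the replacement would be a side effect of how the references were written, not a reaction to Cuneiform as such.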

#4
05-25-2011, 04:36 PM
Globemaker (Guest)
Re: Google censors Unicode 0x12000

Additional notes:

I tried to fix the problem by posting UTF-32 code points directly into
Blogspot. I verified that the values were near & # 73728. But the
next day, they had all been changed to UTF-16. What next? Will those
UTF-16 pairs of 16-bit integers be replaced by single & # 65533
question marks again? I expect so. As a separate test, I have posted
Cuneiform Unicode to Wordpress to see if it is uncensored there.
http://popcry.wordpress.com/abecedary/
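
A rough sketch of how the pairs could be turned back into single references before posting, in Python. It assumes the references are in compact decimal form (`&#55304;` with no spaces) and sit directly next to each other; `merge_surrogate_refs` is a hypothetical helper, and whether Blogspot would leave the result alone is not known.

```python
import re

# Matches one decimal numeric character reference such as "&#55304;".
_REF = re.compile(r"&#(\d+);")

def merge_surrogate_refs(markup: str) -> str:
    """Rewrite adjacent high/low surrogate references as one code point reference."""
    out = []
    pos = 0
    refs = list(_REF.finditer(markup))
    i = 0
    while i < len(refs):
        cur = refs[i]
        out.append(markup[pos:cur.start()])   # copy text before this reference
        nxt = refs[i + 1] if i + 1 < len(refs) else None
        hi = int(cur.group(1))
        lo = int(nxt.group(1)) if nxt is not None else -1
        if (nxt is not None and nxt.start() == cur.end()
                and 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
            # Valid high + low surrogate: combine into one code point.
            code_point = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
            out.append(f"&#{code_point};")
            pos = nxt.end()
            i += 2
        else:
            # Anything else is copied through unchanged.
            out.append(cur.group(0))
            pos = cur.end()
            i += 1
    out.append(markup[pos:])
    return "".join(out)

print(merge_surrogate_refs("&#55304;&#56320; &#55304;&#56384;"))
```

References that are not part of a valid high and low surrogate pair pass through untouched, so the function can be run over a whole post.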

#5
05-25-2011, 09:19 PM
David H. Lipman (Guest)
Re: Google censors Unicode 0x12000

Besides setting up your OWN web site, here is a simple solution.

Instead of posting the text as UTF-32 code points directly, post an image of the paragraph
of text that those code points represent.

BTW: I don't think the word "censorship" is apropos in this situation.


#6
05-25-2011, 10:20 PM
Globemaker (Guest)
Re: Google censors Unicode 0x12000

Unicode 6.0 is intended to let all character sets from many cultures
be displayed on most computers. I am working to confirm that for
Cuneiform, the original alphabet of Iraq. As an alternative to
Unicode, I could make a jpeg image of clay tablets I prepare with a
stylus.

It may not be appropriate to call this censorship, when an American
superpower corporation destroys the writings of Iraq. A better
description instead of censorship might be patriotism. Or maybe this
is a heroic effort to make the world safe for Anglophiles by deleting
Cuneiform Iraqi files. Imagine how this deletion of my Unicode code
points for Cuneiform may have occurred: Bush is sitting in his bunker
in Crawford Texas glued to his plasma monitor, scouring the websites
of the world with robotic killer code, unleashing programmatic ethnic
cleansing to provide a Uniform character set instead of Unicode
varieties for which there may be a 1% chance of malicious intent.

Yes, you are right, an image of Cuneiform may be more secure from such
deletions.

#7
06-03-2011, 11:47 AM
Junior Member (Join Date: May 2011, Posts: 2)

Thank you