I'm curious how others have handled this, as I'm sure many on this list have
dealt with it.

Any site that has mixed English/non-English content will quickly run into
this issue:

A page has a name like:

Bob

Intuitive URLs are nice, so we'll put it at:

/bob

Another page has a name like:

Itajaí Cêrebro Raízes

OK... should that be:

/itajai-cerebro-raizes
/itajaí-cêrebro-raízes
/itaja-crebro-razes

The first is easy for US-English users to type on a keyboard, and it's
valid in all URI specs.
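One common way to produce that first form is Unicode decomposition: NFKD splits "í" into "i" plus a combining acute accent, and the accent is then dropped. A minimal Python sketch, assuming that folding rule is what you want (the name `ascii_slug` is hypothetical):

```python
import re
import unicodedata


def ascii_slug(title: str) -> str:
    # Decompose accented letters, e.g. "í" -> "i" + U+0301 (combining acute)
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, keeping only the base letters
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then collapse every run of non-[a-z0-9] characters into one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", stripped.lower())
    return slug.strip("-")


print(ascii_slug("Itajaí Cêrebro Raízes"))  # itajai-cerebro-raizes
```

This only helps letters that decompose to an ASCII base. An all-Hebrew title comes back as an empty string, which is the case approach 1 below covers with an ID number.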

The second is the actual name. It's valid in the HTML5 URI spec but invalid
in the original URI spec, and it will break in old browsers, such as
sufficiently old versions of Internet Explorer. And a US-English user
probably doesn't know how to type it.
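What actually travels over the wire for that second form is the name encoded as UTF-8 and percent-escaped. A sketch using Python's standard `urllib.parse`; the helper `unicode_path` is hypothetical, and normalizing to NFC first is an assumption about how you'd store these names:

```python
import unicodedata
from urllib.parse import quote, unquote


def unicode_path(title: str) -> str:
    # NFC first: "í" can also arrive decomposed as "i" + U+0301, which
    # percent-encodes to different bytes than precomposed U+00ED
    nfc = unicodedata.normalize("NFC", title)
    # Keep the real letters; lowercase and join the words with hyphens
    return "/" + "-".join(nfc.lower().split())


path = unicode_path("Itajaí Cêrebro Raízes")
wire = quote(path)  # UTF-8 percent-encoding; "/" is left alone by default
print(path)  # /itajaí-cêrebro-raízes
print(wire)  # /itaja%C3%AD-c%C3%AArebro-ra%C3%ADzes

# The decomposed spelling of the same letter encodes differently
print(quote("\u00ed"), quote("i\u0301"))  # %C3%AD i%CC%81
assert unquote(wire) == path
```

Because `unicode_path` normalizes its input, the precomposed and decomposed spellings of a title map to the same stored path.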

The third is a naive "remove anything that isn't on a US English keyboard"
filter (regex /[^\w\d]/g). It works for everyone and is the simplest to
code, but it's the least intuitive.
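A sketch of that naive filter in Python, assuming spaces are turned into hyphens before stripping (the regex on its own would delete the spaces too). One gotcha when porting it: Python's `\w` matches accented letters by default, so `re.ASCII` is needed to get the JavaScript-style, ASCII-only `\w`. The name `naive_slug` is hypothetical:

```python
import re


def naive_slug(title: str) -> str:
    # Spaces become hyphens first so word boundaries survive
    hyphenated = "-".join(title.lower().split())
    # Delete anything outside ASCII letters, digits, "_" and "-".
    # re.ASCII makes \w mean [A-Za-z0-9_], as in JavaScript; without it,
    # Python's \w would also match "í" and keep it.
    return re.sub(r"[^\w-]", "", hyphenated, flags=re.ASCII)


print(naive_slug("Itajaí Cêrebro Raízes"))  # itaja-crebro-razes
```

Run on the Hebrew title further down, it leaves nothing but the hyphen.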

The challenge isn't over yet. How about:

עםמק שבםודאןב

What should that URL be? /עםמק-שבםודאןב? Note also that Hebrew is a
right-to-left language, so in a sense, having the URL read left to right up
to this name is a bit Amerocentric on its own. There's zero chance a
US-English keyboard user will know how to type that, and they're unlikely
to recall it when sharing it in conversation, either.
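For a sense of what that URL looks like in transit: each Hebrew letter becomes two UTF-8 bytes, each escaped, and the characters are stored in logical (reading) order, with right-to-left display left to the Unicode bidirectional algorithm. A short check using only Python's standard library:

```python
import unicodedata
from urllib.parse import quote

path = "/עםמק-שבםודאןב"
wire = quote(path)  # what an ASCII-only client or log would see

print(wire.isascii())  # True
print(len(path), len(wire))  # 14 74
# Every Hebrew letter has the right-to-left bidi class "R"
print({unicodedata.bidirectional(ch) for ch in path if ch not in "/-"})  # {'R'}
```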

I've seen three opinions, and I dislike all of them:

1) Be Amerocentric like the old days. Just mush everything down to US
keyboard characters, and when none are left, give the page an ID number.

2) Be open. Use the name as-is; if someone can't type it, there's always
Google, and the page might not be in their language anyway, so maybe that
content isn't for them.

3) Do both. Accept both of the URL approaches above:
3a) ...and redirect to Amerocentric URL
3b) ...and redirect to international URL
3c) ...and don't redirect
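Approach 3 boils down to a lookup that accepts either spelling and then decides whether to redirect. A minimal sketch, assuming a hypothetical in-memory `PAGES` table that stores both an ASCII and a Unicode path per page; `resolve` and `Policy` are illustrative names, with one policy per variant 3a, 3b and 3c:

```python
import unicodedata
from enum import Enum
from typing import Optional, Tuple
from urllib.parse import quote, unquote


class Policy(Enum):
    REDIRECT_TO_ASCII = "3a"
    REDIRECT_TO_UNICODE = "3b"
    NO_REDIRECT = "3c"


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# Hypothetical page table: both URL forms for each page ID
PAGES = {
    18: {"ascii": "/itajai-cerebro-raizes", "unicode": nfc("/itajaí-cêrebro-raízes")},
}


def resolve(raw_path: str, policy: Policy) -> Tuple[Optional[int], Optional[str]]:
    """Return (page_id, location); location is None when serving directly."""
    # Undo percent-escapes, then NFC so decomposed accents still match
    path = nfc(unquote(raw_path))
    for page_id, forms in PAGES.items():
        if path not in forms.values():
            continue
        if policy is Policy.NO_REDIRECT:
            return page_id, None  # 3c: same content served at two URLs
        if policy is Policy.REDIRECT_TO_ASCII:
            canonical = forms["ascii"]
        else:
            canonical = forms["unicode"]
        # Re-escape for an HTTP Location header; ASCII paths come back unchanged
        return page_id, (None if path == canonical else quote(canonical))
    return None, None  # unknown path


print(resolve("/itaja%C3%AD-c%C3%AArebro-ra%C3%ADzes", Policy.REDIRECT_TO_ASCII))
# (18, '/itajai-cerebro-raizes')
```

Every branch that returns a location costs the visitor one extra request, and the `NO_REDIRECT` branch is the one that leaves two URLs serving the same page.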

The problem with 1 is obvious.

The problem with 2 is that "that content isn't for them" is a false
assertion. The Hebrew name above comes from an actual site, about an event
in a primarily English-speaking area.

The variations in 3 all have issues:

3c breaks SEO rules. You're supposed to have a single canonical URL for
each page and each piece of content on your site; otherwise you'll
frustrate users looking for your content with multiple results for the same
thing, or, more likely, Google will punish you by pushing all of your
duplicate results down, and users will never find you.

3a and 3b break Web performance rules, especially on mobile. A redirect is
a great way to ruin page load time. Presumably you'd always link to the
canonical URL within your own site, so visitors hitting the non-canonical
URL will typically be coming in from outside. Wasting their time with a
redirect is a good way to turn users away from your slow site.
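Rough arithmetic for why: each redirect costs at least one extra request/response round trip before the browser can even ask for the real page. A back-of-the-envelope sketch, assuming a reused connection and zero server time (a fresh connection would add DNS, TCP and TLS round trips on top); the 300 ms RTT is a hypothetical mobile figure, not a measurement:

```python
def min_time_to_html_ms(rtt_ms: float, redirect_hops: int) -> float:
    """Lower bound on the wait before the page's HTML starts arriving.

    Assumes an already-open, reused connection and zero server think time.
    """
    # One round trip per redirect response, plus one for the real page
    return rtt_ms * (redirect_hops + 1)


RTT_MS = 300  # hypothetical mobile round-trip time, for illustration only
print(min_time_to_html_ms(RTT_MS, 0))  # 300
print(min_time_to_html_ms(RTT_MS, 1))  # 600
```

Under these assumptions a single redirect doubles the minimum wait for the HTML.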

Interested to hear if there's a 4th, best option.