Carl Drud <carl.drud@myrealbox.com> posting:
>> I don't think it should be interpreted as UTF-8 if nothing is
>> specified, but as ASCII (I may be misremembering).
>That seems to be how it works for mail/news messages. Whether the
>same holds for web pages, I don't know.
Well, then I had better see whether I can find a possible cause
and refresh my reading of HTML/HTTP.
A very quick read of the HTTP spec says that, by default, it is to be
treated as ISO-8859-1 (I may have missed something in my haste):
|The "charset" parameter is used with some media types to define the
|character set (section 3.4) of the data. When no explicit charset
|parameter is provided by the sender, media subtypes of the "text"
|type are defined to have a default charset value of "ISO-8859-1"
|when received via HTTP.
--
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1
|Some HTTP/1.0 software has interpreted a Content-Type header without
|charset parameter incorrectly to mean "recipient should guess."
|Senders wishing to defeat this behavior MAY include a charset
|parameter even when the charset is ISO-8859-1 and SHOULD do so when
|it is known that it will not confuse the recipient.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4.1
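To make the RFC 2616 rule concrete, here is a minimal sketch of how a
recipient could apply it. The function name rfc2616_charset is my own
invention for illustration, and the parsing leans on Python's standard
email.message.Message class rather than on any HTTP library; real
clients do considerably more than this.

```python
from email.message import Message


def rfc2616_charset(content_type):
    """Charset a recipient would assume under RFC 2616, section 3.7.1.

    Hypothetical helper: returns the explicit charset parameter if one
    is present, "ISO-8859-1" for text/* types without one, and None
    for other media types (the RFC defines no default for those).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"
    return None
```

With this, rfc2616_charset("text/html") yields "ISO-8859-1", whereas an
explicit "text/html; charset=UTF-8" yields "UTF-8".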
So it is awfully nice that HTML 4 says something different and
overrules this:
|The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as
|a default character encoding when the "charset" parameter is absent
|from the "Content-Type" header field. In practice, this
|recommendation has proved useless because some servers don't allow a
|"charset" parameter to be sent, and others may not be configured to
|send the parameter. Therefore, user agents must not assume any
|default value for the "charset" parameter.
--
http://www.w3.org/TR/html401/charset.html
... and further on ...
|To sum up, conforming user agents must observe the following
|priorities when determining a document's character encoding (from
|highest priority to lowest):
|1. An HTTP "charset" parameter in a "Content-Type" field.
|2. A META declaration with "http-equiv" set to "Content-Type" and a
|value set for "charset".
|3. The charset attribute set on an element that designates an external
|resource.
(the free-stuff page in question specifies none of these)
|In addition to this list of priorities, the user agent may use
|heuristics and user settings. For example, many user agents use a
|heuristic to distinguish the various encodings used for Japanese
|text. Also, user agents typically have a user-definable, local
|default character encoding which they apply in the absence of other
|indicators.
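The HTML 4 priority order quoted above can be sketched roughly as
follows. This is only an illustration under my own assumptions: the
names determine_encoding, MetaCharsetFinder, link_charset and
local_default are hypothetical, the markup is assumed to be already
readable as text (i.e. ASCII-compatible up to the META element), and
the heuristics the spec allows are left out entirely.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_param(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


class MetaCharsetFinder(HTMLParser):
    """Records the charset of the first META http-equiv=Content-Type."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_param(attributes.get("content"))


def determine_encoding(http_content_type, markup, link_charset=None,
                       local_default="ISO-8859-1"):
    """Return (encoding, source), following the quoted priority list."""
    # 1. HTTP "charset" parameter in the Content-Type field.
    charset = charset_param(http_content_type)
    if charset:
        return charset, "http"
    # 2. META declaration with http-equiv set to Content-Type.
    finder = MetaCharsetFinder()
    finder.feed(markup)
    if finder.charset:
        return finder.charset, "meta"
    # 3. charset attribute on the element designating the resource.
    if link_charset:
        return link_charset, "link"
    # Nothing declared: fall back to the user's local default
    # (an assumed setting here; heuristics are not modelled).
    return local_default, "local default"
```

For a page that declares nothing at all, the sketch ends in the last
branch, so the result depends entirely on the local default (or, in a
real browser, on whatever the heuristic guesses).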
So we can presumably conclude that the validator's, and at times FF's,
heuristics miss the mark in the case in question, and that is where
UTF-8 enters the picture. What is worse, FF cannot quite agree with
itself about applying the configured "local default character
encoding which they apply in the absence of other indicators", see
news:2o1f021lkslpnraf8m0lcie2kami4vf5pv@dtext.news.tele.dk - a bug?
--
What is life, except excuse for death,
or death, but an escape from life. -Unknown
http://my.opera.com/community/customize/widgets/?show=new