
The Java EE 5 Tutorial


Character Sets and Encodings

The following sections describe character sets and character encodings.

Character Sets

A character set is a set of textual and graphic symbols, each of which is mapped to a nonnegative integer.
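In Java, every character of a string can be inspected as the nonnegative integer that Unicode assigns to it. The following is a minimal sketch. The class name `CodePointDemo` is hypothetical, and `String.codePoints()` requires Java 8 or later, which is newer than the platform this tutorial targets.

```java
import java.util.Arrays;

// A character set assigns each character a nonnegative integer (its code point).
// Java strings expose the Unicode code point of every character directly.
class CodePointDemo {
    // Returns the Unicode code point of each character in the text.
    static int[] codePoints(String text) {
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        // LATIN CAPITAL LETTER A, LATIN SMALL LETTER N WITH TILDE, EURO SIGN
        String sample = "A\u00f1\u20ac";
        System.out.println(Arrays.toString(codePoints(sample))); // [65, 241, 8364]
    }
}
```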

US-ASCII is one of the earliest and most widely used character sets in computing. It is limited in that it can represent only American English. Its 128 characters comprise uppercase and lowercase Latin letters, numerals, punctuation, a set of control codes, and a few miscellaneous symbols.
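You can see the limits of US-ASCII by asking a `CharsetEncoder` whether it can represent a given string. This is a minimal sketch, assuming Java 7 or later for `StandardCharsets`. The class name `AsciiCheck` is hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Checks whether text can be represented in US-ASCII (code points 0 through 127).
class AsciiCheck {
    static boolean isAscii(String text) {
        // A fresh encoder per call: CharsetEncoder instances are not thread-safe.
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Empezar a Comprar")); // true
        System.out.println(isAscii("n\u00famero"));       // false: u with acute accent
    }
}
```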

Unicode defines a standardized, universal character set that can be extended to accommodate additions. When the encoding of a Java source file cannot represent a character, you can write that character as an escape sequence by using the notation \uXXXX, where XXXX is the character's 16-bit representation in hexadecimal. For example, the Spanish version of the Duke's Bookstore message file uses Unicode escape sequences for non-ASCII characters:

{"TitleCashier", "Cajero"},
{"TitleBookDescription", "Descripci" + "\u00f3" + "n del Libro"},
{"Visitor", "El visitante n" + "\u00fa" + "mero "},
{"What", "Qu" + "\u00e9" + " libros leemos"},
{"Talk", " describe c" + "\u00f3" + "mo los componentes de software de web pueden transformar la manera en que desarrollamos las aplicaciones para la web. Este libro es obligatorio para cualquier programador de respeto!"},
{"Start", "Empezar a Comprar"},
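The fragment above is the body of a message table. The following minimal sketch shows how a few of those entries might sit in a `java.util.ListResourceBundle` subclass. The class name `DemoMessages_es` and the `toUnicodeEscapes` helper are hypothetical; the helper only mimics what the JDK's `native2ascii` tool does. Because the Java compiler translates Unicode escapes before parsing, the strings the bundle returns contain the accented characters even though the source file is pure ASCII.

```java
import java.util.ListResourceBundle;

// Hypothetical Spanish message bundle modeled on the fragment above.
class DemoMessages_es extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            {"TitleCashier", "Cajero"},
            {"TitleBookDescription", "Descripci" + "\u00f3" + "n del Libro"},
            {"Visitor", "El visitante n" + "\u00fa" + "mero "},
            {"Start", "Empezar a Comprar"},
        };
    }

    // Hypothetical helper: rewrites every non-ASCII character as a Java escape sequence.
    static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        DemoMessages_es bundle = new DemoMessages_es();
        System.out.println(bundle.getString("TitleBookDescription"));
        System.out.println(toUnicodeEscapes(bundle.getString("Visitor")));
    }
}
```

The helper works on UTF-16 code units, so a character outside the 16-bit range becomes two escapes (a surrogate pair), which is the form the Java compiler expects.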

Character Encoding

A character encoding maps a character set to units of a specific width and defines byte serialization and ordering rules. Many character sets have more than one encoding. For example, Java programs can represent Japanese character sets using the EUC-JP or Shift_JIS encodings, among others. Each encoding has rules for representing and serializing a character set.
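The difference between a character set and its encodings shows up as bytes. The same two kanji produce different byte sequences under EUC-JP and Shift_JIS, yet each decodes back to the same text with its own charset. This sketch assumes the runtime provides both charsets. Standard JDK builds do, but the Java platform guarantees only US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16. The class name `JapaneseEncodings` is hypothetical.

```java
import java.nio.charset.Charset;

// Encodes the same Japanese text with two encodings of the same character set.
class JapaneseEncodings {
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nihon = "\u65e5\u672c"; // the two kanji for "Japan"
        byte[] euc = encode(nihon, "EUC-JP");
        byte[] sjis = encode(nihon, "Shift_JIS");
        System.out.println("EUC-JP:    " + toHex(euc));
        System.out.println("Shift_JIS: " + toHex(sjis));
        // Different bytes, same characters after decoding with the matching charset.
        System.out.println(new String(euc, Charset.forName("EUC-JP")).equals(nihon));
    }
}
```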

The ISO 8859 series defines 15 character encodings that can represent texts in dozens of languages. Each ISO 8859 character encoding can have up to 256 characters. ISO-8859-1 (Latin-1) comprises the ASCII character set, characters with diacritics (accents, diaereses, cedillas, circumflexes, and so on), and additional symbols.
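In ISO-8859-1, every character occupies exactly one byte. The byte value equals the character's Unicode code point, because the first 256 Unicode code points match Latin-1. Characters beyond that range, such as the euro sign, cannot be encoded. This is a minimal sketch; the class name `Latin1Demo` is hypothetical, and the code requires Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// ISO-8859-1 maps each of its 256 characters to one byte equal to its code point.
class Latin1Demo {
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = "\u00e7\u00e9".getBytes(StandardCharsets.ISO_8859_1); // c-cedilla, e-acute
        System.out.println(bytes.length);                                   // 2: one byte each
        System.out.printf("%02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // E7 E9
        System.out.println(fitsLatin1("Fran\u00e7ais")); // true
        System.out.println(fitsLatin1("\u20ac"));        // false: the euro sign is not in Latin-1
    }
}
```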

UTF-8 (Unicode Transformation Format, 8-bit form) is a variable-width character encoding that encodes each Unicode character (code point) as one to four bytes. A byte in UTF-8 is equivalent to 7-bit ASCII if its high-order bit is zero; otherwise, the byte is part of a multibyte sequence that together encodes a single character.
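You can observe UTF-8's variable width by counting the bytes it produces for single characters of increasing code point value. This is a minimal sketch; the class name `Utf8Lengths` is hypothetical, and the code requires Java 7 or later. The last sample is U+1F600, which lies outside the 16-bit range and is written in Java source as a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

// Number of bytes UTF-8 uses for single characters of increasing code point value.
class Utf8Lengths {
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // 1 (ASCII, high-order bit 0)
        System.out.println(utf8Length("\u00f1"));       // 2
        System.out.println(utf8Length("\u20ac"));       // 3
        System.out.println(utf8Length("\ud83d\ude00")); // 4 (U+1F600)
        // Every byte of a multibyte sequence has its high-order bit set.
        for (byte b : "\u00f1".getBytes(StandardCharsets.UTF_8)) {
            System.out.println((b & 0x80) != 0);        // true, true
        }
    }
}
```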

Because its single-byte range is identical to ASCII, UTF-8 is compatible with the majority of existing web content, and it provides access to the entire Unicode character set. Current versions of browsers and email clients support UTF-8. In addition, many new web standards specify UTF-8 as their character encoding. For example, UTF-8 is one of the two encodings that every XML processor is required to support (the other is UTF-16).

See Appendix A, Java Encoding Schemes, for more information on character encodings in the Java 2 platform.

Web components usually use a PrintWriter to produce responses. The PrintWriter obtained from the response encodes characters using the response's character encoding, which is ISO-8859-1 unless another encoding is set. Servlets can also output binary data using OutputStream classes, which perform no encoding. An application that uses characters the default encoding cannot represent must explicitly set a different encoding.
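You can see what happens to characters a writer's encoding cannot represent without a container, using `java.io.OutputStreamWriter`. It replaces unmappable characters with the encoder's replacement byte, which is `?` for ISO-8859-1. The class `WriterEncodingDemo` is hypothetical. This section does not state that a servlet response's `PrintWriter` behaves the same way; treat that as an assumption and check your container.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Writes text through a character writer with the given encoding and returns the bytes.
class WriterEncodingDemo {
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text); // unmappable characters are replaced, not rejected
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String kanji = "\u65e5"; // one Japanese character
        byte[] latin1 = write(kanji, StandardCharsets.ISO_8859_1);
        byte[] utf8 = write(kanji, StandardCharsets.UTF_8);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ?
        System.out.println(utf8.length);                                     // 3
    }
}
```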

For web components, three encodings must be considered:

  • Request

  • Page (JSP pages)

  • Response

Request Encoding

The request encoding is the character encoding in which parameters in an incoming request are interpreted. Currently, many browsers do not send a request encoding qualifier with the Content-Type header. In such cases, a web container will use the default encoding, ISO-8859-1, to parse request data.

If the client hasn't set the character encoding and the request data is encoded with an encoding other than the default, the data won't be interpreted correctly. To remedy this situation, you can use the ServletRequest.setCharacterEncoding(String enc) method to override the character encoding supplied by the container. To control the request encoding from JSP pages, you can use the JSTL fmt:requestEncoding tag. You must call the method or tag before parsing any request parameters or reading any input from the request. Calling the method or tag once data has been read does not affect the encoding.
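You can reproduce the effect of a wrong request encoding with the JDK alone. In this sketch, a form value containing an n with tilde arrives percent-encoded as the UTF-8 bytes C3 B1. Decoding with ISO-8859-1 yields two wrong characters; decoding with UTF-8 yields the intended one. Calling `setCharacterEncoding("UTF-8")` before reading parameters asks the container to do the latter for data in the request body. Query strings in the URL are often decoded according to separate container configuration. The class `RequestDecodingDemo` is hypothetical, and `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RequestDecodingDemo {
    // Decodes a percent-encoded form value using the given charset.
    static String decode(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    // After-the-fact repair for a value decoded as ISO-8859-1 that was really UTF-8:
    // recover the original bytes, then decode them correctly.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "Espa%C3%B1a"; // n with tilde sent as UTF-8 bytes C3 B1
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);
        String right = decode(raw, StandardCharsets.UTF_8);
        System.out.println(wrong.length() + " vs " + right.length()); // 7 vs 6
        System.out.println(right.equals(repair(wrong)));             // true
    }
}
```

The `repair` method works here because ISO-8859-1 maps every byte value to a character, so no bytes are lost. Setting the encoding before reading remains the cleaner fix.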

Page Encoding

For JSP pages, the page encoding is the character encoding in which the file is encoded.

For JSP pages in standard syntax, the page encoding is determined from the following sources:

  • The page encoding value of a JSP property group (see Setting Properties for Groups of JSP Pages) whose URL pattern matches the page.

  • The pageEncoding attribute of the page directive of the page. It is a translation-time error to name different encodings in the pageEncoding attribute of the page directive of a JSP page and in a JSP property group.

  • The CHARSET value of the contentType attribute of the page directive.

If none of these is provided, ISO-8859-1 is used as the default page encoding.

For JSP pages in XML syntax (JSP documents), the page encoding is determined as described in section 4.3.3 and appendix F.1 of the XML specification.

The pageEncoding and contentType attributes determine the page character encoding of only the file that physically contains the page directive. A web container raises a translation-time error if an unsupported page encoding is specified.
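You can summarize these rules as a lookup in which the first source present wins. This hypothetical sketch assumes the sources are consulted in the order listed. The class `PageEncodingResolver` and its parameters are invented for illustration. A real container applies these rules while translating the page, not at run time.

```java
import java.nio.charset.Charset;

// Hypothetical resolution of a standard-syntax JSP page encoding.
// A null argument means "not specified"; IllegalStateException stands in for a
// translation-time error. Names are compared case-insensitively, a simplification:
// aliases such as "UTF8" and "UTF-8" are not recognized as the same encoding.
class PageEncodingResolver {
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    static String resolve(String propertyGroupEncoding,
                          String pageEncodingAttribute,
                          String contentTypeCharset) {
        // Naming different encodings in the property group and the page directive is an error.
        if (propertyGroupEncoding != null && pageEncodingAttribute != null
                && !propertyGroupEncoding.equalsIgnoreCase(pageEncodingAttribute)) {
            throw new IllegalStateException("conflicting page encodings: "
                    + propertyGroupEncoding + " vs " + pageEncodingAttribute);
        }
        String encoding;
        if (propertyGroupEncoding != null) {
            encoding = propertyGroupEncoding;
        } else if (pageEncodingAttribute != null) {
            encoding = pageEncodingAttribute;
        } else if (contentTypeCharset != null) {
            encoding = contentTypeCharset;
        } else {
            encoding = DEFAULT_ENCODING;
        }
        // An unsupported page encoding is also a translation-time error.
        if (!Charset.isSupported(encoding)) {
            throw new IllegalStateException("unsupported page encoding: " + encoding);
        }
        return encoding;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, null));           // ISO-8859-1
        System.out.println(resolve(null, "UTF-8", "Shift_JIS")); // UTF-8
    }
}
```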

Response Encoding

The response encoding is the character encoding of the textual response generated by a web component. The response encoding must be set appropriately so that the characters are rendered correctly for a given locale. A web container sets an initial response encoding for a JSP page from the following sources:

  • The CHARSET value of the contentType attribute of the page directive

  • The encoding specified by the pageEncoding attribute of the page directive

  • The page encoding value of a JSP property group whose URL pattern matches the page

If none of these is provided, ISO-8859-1 is used as the default response encoding.

The javax.servlet.ServletResponse.setCharacterEncoding, javax.servlet.ServletResponse.setContentType, and javax.servlet.ServletResponse.setLocale methods can be called repeatedly to change the character encoding. Calls made after the servlet response's getWriter method has been called or after the response is committed have no effect on the character encoding. Data is sent to the response stream on buffer flushes (for buffered pages) or on encountering the first content on unbuffered pages.

Calls to setContentType set the character encoding only if the given content type string provides a value for the charset attribute. Calls to setLocale set the character encoding only if neither setCharacterEncoding nor setContentType has set the character encoding before. To control the response encoding from JSP pages, you can use the JSTL fmt:setLocale tag.

To obtain the character encoding for a locale, the setLocale method checks the locale encoding mapping for the web application. For example, to map Japanese to the Japanese-specific encoding Shift_JIS, follow these steps:

  1. Select the WAR.

  2. Click the Advanced Settings button.

  3. In the Locale Character Encoding table, click the Add button.

  4. Enter ja in the Extension column.

  5. Enter Shift_JIS in the Character Encoding column.

If no mapping is set for the web application, setLocale uses an Application Server mapping.
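The interplay of `setCharacterEncoding`, `setContentType`, `setLocale`, and `getWriter` described above can be captured in a small state model. This is a hypothetical sketch, not the servlet API. `ResponseEncodingModel` and its methods only mimic the rules for the character encoding. The two maps stand in for the web application's locale encoding mapping and the Application Server mapping. Consulting the full locale (such as `ja_JP`) before its language (`ja`) is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the response-encoding rules; only the encoding is modeled.
class ResponseEncodingModel {
    private final Map<String, String> appMappings;    // web application locale mappings
    private final Map<String, String> serverMappings; // fallback Application Server mappings
    private String encoding = "ISO-8859-1";           // default response encoding
    private boolean setExplicitly = false;            // by setCharacterEncoding or a charset
    private boolean writerObtained = false;           // getWriter called or response committed

    ResponseEncodingModel(Map<String, String> appMappings, Map<String, String> serverMappings) {
        this.appMappings = appMappings;
        this.serverMappings = serverMappings;
    }

    void setCharacterEncoding(String enc) {
        if (writerObtained) return; // too late: no effect
        encoding = enc;
        setExplicitly = true;
    }

    void setContentType(String contentType) {
        if (writerObtained) return;
        String charset = charsetOf(contentType);
        if (charset != null) { // only a charset parameter changes the encoding
            encoding = charset;
            setExplicitly = true;
        }
    }

    void setLocale(Locale locale) {
        if (writerObtained || setExplicitly) return; // ignored once set explicitly
        String mapped = lookup(appMappings, locale);
        if (mapped == null) mapped = lookup(serverMappings, locale);
        if (mapped != null) encoding = mapped;
    }

    // Stands in for ServletResponse.getWriter: the encoding is fixed from now on.
    void getWriter() {
        writerObtained = true;
    }

    String getCharacterEncoding() {
        return encoding;
    }

    private static String lookup(Map<String, String> mappings, Locale locale) {
        String byLocale = mappings.get(locale.toString());
        return byLocale != null ? byLocale : mappings.get(locale.getLanguage());
    }

    // Extracts the charset parameter from a value such as "text/html; charset=UTF-8".
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                return param.substring(8).replace("\"", "").trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> app = new HashMap<>();
        app.put("ja", "Shift_JIS"); // the mapping configured in the steps above
        ResponseEncodingModel response = new ResponseEncodingModel(app, new HashMap<>());
        response.setLocale(Locale.JAPAN);
        System.out.println(response.getCharacterEncoding()); // Shift_JIS
        response.setContentType("text/html; charset=UTF-8");
        response.setLocale(Locale.JAPAN);                    // ignored: charset already set
        System.out.println(response.getCharacterEncoding()); // UTF-8
        response.getWriter();
        response.setCharacterEncoding("EUC-JP");             // ignored: writer obtained
        System.out.println(response.getCharacterEncoding()); // UTF-8
    }
}
```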

The first application in Chapter 5, JavaServer Pages Technology, allows a user to choose an English string representation of a locale from all the locales available to the Java 2 platform and then outputs a date localized for that locale. To ensure that the characters in the date can be rendered correctly for a wide variety of character sets, the JSP page that generates the date sets the response encoding to UTF-8 by using the following directive:

<%@ page contentType="text/html; charset=UTF-8" %>
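You can check the same concern with the JDK. A long-style date formatted for a Russian locale contains a Cyrillic month name, which ISO-8859-1 cannot encode but UTF-8 can. This is a minimal sketch. The class `LocalizedDateEncoding` is hypothetical, the exact formatted text depends on the JDK's locale data, and `Locale.forLanguageTag` requires Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;

// A date localized for an arbitrary locale can contain characters outside ISO-8859-1.
class LocalizedDateEncoding {
    static String longDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.LONG, locale).format(date);
    }

    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance();
        cal.clear();
        cal.set(2010, Calendar.JANUARY, 15);
        String russian = longDate(cal.getTime(), Locale.forLanguageTag("ru"));
        System.out.println(fits(russian, StandardCharsets.ISO_8859_1)); // false
        System.out.println(fits(russian, StandardCharsets.UTF_8));      // true
    }
}
```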

Copyright © 2010, Oracle and/or its affiliates. All rights reserved.

