MySQL Cookbook Free Open Book

MySQL Cookbook

Previous Section Next Section

16.5 Encoding Special Characters in Web Output

16.5.1 Problem

Certain characters are special in HTML pages and must be encoded if you want to display them literally. Because database content often contains these characters, scripts that include query results in web pages should encode those results to prevent browsers from misinterpreting the information.

16.5.2 Solution

Use the methods that are provided by your API for performing HTML-encoding and URL-encoding.

16.5.3 Discussion

HTML is a markup language—it uses certain characters as markers that have a special meaning. To include literal instances of these characters in a page, you must encode them so that they are not interpreted as having their special meanings. For example, < should be encoded as &lt; to keep a browser from interpreting it as the beginning of a tag. Furthermore, there are actually two kinds of encoding, depending on the context in which you use a character. One encoding is appropriate for general HTML text, another is used for text that is part of a URL in a hyperlink.

The MySQL show-tables scripts shown in Recipe 16.3 and Recipe 16.4 are simple demonstrations of how to produce web pages using programs. But with one exception, the scripts have a common failing: they take no care to properly encode special characters that occur in the information retrieved from the MySQL server. (The exception is the JSP version of the script; the <c:out> tag used there handles encoding automatically, as we'll discuss shortly.)

As it happens, I deliberately chose information to display that is unlikely to contain any special characters, so they should work properly even in the absence of any encoding. However, in the general case, it's unsafe to assume that a query result will contain no special characters and thus you must be prepared to encode it. Neglecting to do this often results in scripts that generate pages containing malformed HTML that displays incorrectly.

This section describes how to handle special characters, beginning with some general principles, and then discusses how each API implements encoding support. The API-specific examples show how to process information drawn from a database table, but they can be adapted to any content you include in a web page, no matter its source.

16.5.4 General Encoding Principles

One form of encoding applies to characters that are used in writing HTML constructs, another applies to text that is included in URLs. It's important to understand this distinction so that you don't encode text inappropriately. Note too that encoding text for inclusion in a web page is an entirely different issue than encoding special characters in data values for inclusion in a SQL statement. The latter issue is discussed in Recipe 2.8.

16.5.4.1 Encoding characters that are special in HTML

HTML markup uses < and > characters to begin and end tags, & to begin special entity names (such as &nbsp; to signify a non-breaking space), and " to quote attribute values in tags (such as <p align="left">). Consequently, to display literal instances of these characters, you must encode them as HTML entities so that browsers or other clients understand your intent. To do this, convert <, >, &, and " to the corresponding HTML entity designators &lt; (less than), &gt; (greater than), &amp; (ampersand), and &quot; (quote).

Suppose you want to display the following string literally in a web page:

Paragraphs begin and end with <p> & </p> tags.

If you send this text to the client browser exactly as shown, the browser will misinterpret it. (The <p> and </p> tags will be taken as paragraph markers and the & may be taken as the beginning of an HTML entity designator.) To display the string the way you intend, the special characters should be encoded as the &lt;, &gt;, and &amp;, entities:

Paragraphs begin and end with &lt;p&gt; &amp; &lt;/p&gt; tags.

The principle of encoding text this way is also useful within tags. For example, HTML tag attribute values usually are enclosed within double quotes, so it's important to perform HTML-encoding on attribute values. Suppose you want to include a text input box in a form, and you want to provide an initial value of Rich "Goose" Gossage to be displayed in the box. You cannot write that value literally in the tag like this:

<input type="text" name="player_name" value="Rich "Goose" Gossage" />

The problem here is that the double-quoted value attribute includes internal double quotes, which makes the <input> tag malformed. The proper way to write it is to encode the double quotes:

<input type="text" name="player_name" value="Rich &quot;Goose&quot; Gossage" />

When a browser receives this text, it will decode the &quot; entities back to " characters and interpret the value attribute value properly.

16.5.4.2 Encoding characters that are special in URLs

URLs for hyperlinks that occur within HTML pages have their own syntax, and their own encoding. This encoding applies to attributes within several tags:

<a href="URL">
<img src="URL">
<form action="URL">
<frame src="URL">

Many characters have special meaning within URLs, such as :, /, ?, =, &, and ;. The following URL contains some of these characters:

http://apache.snake.net/myscript.php?id=428&name=Gandalf

Here the : and / characters segment the URL into components, the ? character indicates that parameters are present, and the & characters separates the parameters, each of which is specified as a name=value pair. (The ; character is not present in the URL just shown, but commonly is used instead of & to separate parameters.) If you want to include any of these characters literally within a URL, you must encode them to prevent the browser from interpreting them with their usual special meaning. Other characters such as spaces require special treatment as well. Spaces are not allowed within a URL, so if you want to reference a page named my home page.html on the site apache.snake.net, the URL in the following hyperlink won't work:

<a href="http://apache.snake.net/my home page.html">My Home Page</a>

URL-encoding for special and reserved characters is performed by converting each such character to % followed by two hexadecimal digits representing the character's ASCII code. For example, the ASCII value of the space character is 32 decimal, or 20 hexadecimal, so you'd write the preceding hyperlink like this:

<a href="http://apache.snake.net/my%20home%20page.html">My Home Page</a>

Sometimes you'll see spaces encoded as + in URLs. This too is legal.

16.5.4.3 Encoding interactions

Be sure to encode information properly for the context in which you're using it. Suppose you want to create a hyperlink to trigger a search for items matching a search term, and you want the term itself to appear as the link label that is displayed in the page. In this case, the term appears as a parameter in the URL, and also as HTML text between the <a> and </a> tags. If the search term is "cats & dogs", the unencoded hyperlink construct looks like this:

<a href="/cgi-bin/myscript?term=cats & dogs">cats & dogs</a>

That is incorrect because & is special in both contexts and the spaces are special in the URL. The link should be written like this instead:

<a href="/cgi-bin/myscript?term=cats%20%26%20dogs">cats &amp; dogs</a>

Here, & is HTML-encoded as &amp; for the link label, and is URL-encoded as %26 for the URL, which also includes spaces encoded as %20.

Granted, it's a pain to encode text before writing it to a web page, and sometimes you know enough about a value that you can skip the encoding (see the sidebar Do You Always Need to Encode Web Page Output?"). But encoding is the safe thing to do most of the time. Fortunately, most APIs provide functions to do the work for you. This means you need not know every character that is special in a given context. You just need to know which kind of encoding to perform, and call the appropriate function to produce the intended result.

Do You Always Need to Encode Web Page Output?

If you know a value is legal in a particular context within a web page, you need not encode it. For example, if you obtain a value from an integer-valued column in a database table that cannot be NULL, it must necessarily be an integer. No HTML- or URL-encoding is needed to include the value in a web page, because digits are not special in HTML text or within URLs. On the other hand, suppose you solicit an integer value using a field in a web form. You might be expecting the user to provide an integer, but the user might be confused and enter an illegal value. You could handle this by displaying an error page that shows the value and explains that it's not an integer. But if the value contains special characters and you don't encode it, the page won't display the value properly, possibly confusing the user further.

16.5.5 Encoding Special Characters Using Web APIs

The following encoding examples show how to pull values out of MySQL and perform both HTML-encoding and URL-encoding on them to generate hyperlinks. Each example reads a table named phrase that contains short phrases, using its contents to construct hyperlinks that point to a (hypothetical) script that searches for instances of the phrases in some other table. The table looks like this:

mysql> SELECT phrase_val FROM phrase ORDER BY phrase_val;
+--------------------------+
| phrase_val               |
+--------------------------+
| are we "there" yet?      |
| cats & dogs              |
| rhinoceros               |
| the whole > sum of parts |
+--------------------------+

The goal here is to generate a list of hyperlinks using each phrase both as the hyperlink label (which requires HTML-encoding) and in the URL as a parameter to the search script (which requires URL-encoding). The resulting links look like this:

<a href="/cgi-bin/mysearch.pl?phrase=are%20we%20%22there%22%20yet%3F">
are we &quot;there&quot; yet?</a>
<a href="/cgi-bin/mysearch.pl?phrase=cats%20%26%20dogs">
cats &amp; dogs</a>
<a href="/cgi-bin/mysearch.pl?phrase=rhinoceros">
rhinoceros</a>
<a href="/cgi-bin/mysearch.pl?phrase=the%20whole%20%3E%20sum%20of%20parts">
the whole &gt; sum of parts</a>

The links produced by some APIs will look slightly different, because they encode spaces as + rather than as %20.

16.5.5.1 Perl

The Perl CGI.pm module provides two methods, escapeHTML( ) and escape( ), that handle HTML-encoding and URL-encoding. There are three ways to use these methods to encode a string $str:

  • Invoke escapeHTML( ) and escape( ) as CGI class methods using a CGI:: prefix:

    use CGI;
    printf "%s\n%s\n", CGI::escape ($str), CGI::escapeHTML ($str);
  • Create a CGI object and invoke escapeHTML( ) and escape( ) as object methods:

    use CGI;
    my $cgi = new CGI;
    printf "%s\n%s\n", $cgi->escape ($str), $cgi->escapeHTML ($str);
  • Import the names explicitly into your script's namespace. In this case, neither a CGI object nor the CGI:: prefix is necessary and you can invoke the methods as standalone functions. The following example imports the two method names in addition to the set of standard names:

    use CGI qw(:standard escape escapeHTML);
    printf "%s\n%s\n", escape ($str), escapeHTML ($str);

I prefer the last alternative because it is consistent with the CGI.pm function call interface that you use for other imported method names. Just remember to include the encoding method names in the use CGI statement for any Perl script that requires them, or you'll get "undefined subroutine" errors when the script executes.

The following code reads the contents of the phrase table and produces hyperlinks from them using escapeHTML( ) and escape( ):

my $query = "SELECT phrase_val FROM phrase ORDER BY phrase_val";
my $sth = $dbh->prepare ($query);
$sth->execute ( );
while (my ($phrase) = $sth->fetchrow_array ( ))
{
    # URL-encode the phrase value for use in the URL
    # HTML-encode the phrase value for use in the link label
    my $url = "/cgi-bin/mysearch.pl?phrase=" . escape ($phrase);
    my $label = escapeHTML ($phrase);
    print a ({-href => $url}, $label) . br ( ) . "\n";
}
16.5.5.2 PHP

In PHP, the htmlspecialchars( ) and urlencode( ) functions perform HTML-encoding and URL-encoding. They're used as follows:

$query = "SELECT phrase_val FROM phrase ORDER BY phrase_val";
$result_id = mysql_query ($query, $conn_id);
if ($result_id)
{
    while (list ($phrase) = mysql_fetch_row ($result_id))
    {
        # URL-encode the phrase value for use in the URL
        # HTML-encode the phrase value for use in the link label
        $url = "/mcb/mysearch.php?phrase=" . urlencode ($phrase);
        $label = htmlspecialchars ($phrase);
        printf ("<a href=\"%s\">%s</a><br />\n", $url, $label);
    }
    mysql_free_result ($result_id);
}
16.5.5.3 Python

In Python, the cgi and urllib modules contain the relevant encoding methods. cgi.escape( ) performs HTML-encoding and urllib.quote( ) does URL-encoding:

import cgi
import urllib

query = "SELECT phrase_val FROM phrase ORDER BY phrase_val"
cursor = conn.cursor ( )
cursor.execute (query)
for (phrase,) in cursor.fetchall ( ):
    # URL-encode the phrase value for use in the URL
    # HTML-encode the phrase value for use in the link label
    url = "/cgi-bin/mysearch.py?phrase=" + urllib.quote (phrase)
    label = cgi.escape (phrase, 1)
    print "<a href=\"%s\">%s</a><br />" % (url, label)
cursor.close ( )

The first argument to cgi.escape( ) is the string to be HTML-encoded. By default, this function converts <, >, and & characters to their corresponding HTML entities. To tell cgi.escape( ) also to convert double quotes to the &quot; entity, pass a second argument of 1, as shown in the example. This is especially important if you're encoding values to be placed into a double-quoted tag attribute.

16.5.5.4 Java

The <c:out> JSTL tag automatically performs HTML-encoding for JSP pages. (Strictly speaking, it performs XML-encoding, but the set of characters affected is <, >, &, ", and ', which includes all those needed for HTML-encoding.) By using <c:out> to display text in a web page, you need not even think about converting special characters to HTML entities. If for some reason you want to suppress encoding, invoke <c:out> like this:

<c:out value="value to display" encodeXML="false" />

To URL-encode parameters for inclusion in a URL, use the <c:url> tag. Specify the URL string in the tag's value attribute, and include any parameter values and names in <c:param> tags in the body of the <c:url> tag. A parameter value can be given either in the value attribute of a <c:param> tag or in its body. Here's an example that shows both ways:

<c:url var="urlStr" value="myscript.jsp">
    <c:param name="id" value ="47" />
    <c:param name="color">sky blue</c:param>
</c:url>

This will URL-encode the values of the id and color parameters and add them to the end of the URL. The result is placed in an object named urlStr, which you can display as follows:

<c:out value="${urlStr}" />

The <c:url> tag does not encode special characters such as spaces in the string supplied in its value attribute. You must encode them yourself, so it's probably best just to avoid creating pages with spaces in their names, to avoid the likelihood that you'll need to refer to them.

The <c:out> and <c:url> tags can be used as follows to display entries from the phrase table:

<sql:query var="rs" dataSource="${conn}">
    SELECT phrase_val FROM phrase ORDER BY phrase_val
</sql:query>

<c:forEach var="row" items="${rs.rows}">
    <%-- URL-encode the phrase value for use in the URL --%>
    <%-- HTML-encode the phrase value for use in the link label --%>
    <c:url var="urlStr" value="/mcb/mysearch.jsp">
        <c:param name="phrase" value ="${row.phrase_val}" />
    </c:url>
    <a href="<c:out value="${urlStr}" />"><c:out value="${row.phrase_val}" />
    </a>
    <br />
</c:forEach>
    Previous Section Next Section
    Index: [SYMBOL][A][B][C][D][E][F][G][H][I][J][K][L][M][N][O][P][Q][R][S][T][U][V][W][X][Y][Z]


         Main Menu
    Main Page
    Table of content
    Copyright
    Preface
    Chapter 1. Using the mysql Client Program
    Chapter 2. Writing MySQL-Based Programs
    Chapter 3. Record Selection Techniques
    Chapter 4. Working with Strings
    Chapter 5. Working with Dates and Times
    Chapter 6. Sorting Query Results
    Chapter 7. Generating Summaries
    Chapter 8. Modifying Tables with ALTER TABLE
    Chapter 9. Obtaining and Using Metadata
    Chapter 10. Importing and Exporting Data
    Chapter 11. Generating and Using Sequences
    Chapter 12. Using Multiple Tables
    Chapter 13. Statistical Techniques
    Chapter 14. Handling Duplicates
    Chapter 15. Performing Transactions
    Chapter 16. Introduction to MySQL on the Web
    16.1 Introduction
    16.2 Basic Web Page Generation
    16.3 Using Apache to Run Web Scripts
    16.4 Using Tomcat to Run Web Scripts
    16.5 Encoding Special Characters in Web Output
    Chapter 17. Incorporating Query Resultsinto Web Pages
    Chapter 18. Processing Web Input with MySQL
    Chapter 19. Using MySQL-Based Web Session Management
    Appendix A. Obtaining MySQL Software
    Appendix B. JSP and Tomcat Primer
    Appendix C. References
    Colophone
    Index


    More Books
    PHP Hacks
    Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax
    The Koran (Holy Qur'an)
    Macromedia Flash 8 Bible
    Search Engine Optimization for Dummies
    YouTube Traffic
    PHP 5 for Dummies
    Harry Potter and The Chamber of Secrets
    Harry Potter and the Sorcerer's Stone
    The Pilgrim's Progress
    Wireless Hacks
    Flash Hacks. 100 Industrial-Strength Tips & Tools
    PayPal Hacks. 100 Industrial-Strength Tips and Tools
    Amazon Hacks
    Pdf Hacks
    The Da Vinci Code
    Google Hacks
    The Holy Bible
    Windows XP For Dummies
    Harry Potter and the Half-Blood Prince
    Seo Book
    Upgrading and Repairing Networks
    Macromedia Dreamweaver 8 UNLEASHED
    Windows XP Annoyances
    Windows XP Hacks
    Microsoft Windows XP Power Toolkit
    Teach Yourself MS Office In 24Hours
    iPod & iTunes Missing Manual
    PC Hacks 100 Industrial-Strength Tips and Tools
    PC Overclocking, Optimization, and Tuning - 2th Edition
    PC Hardware In A Nutshell 3rd Edition
    PC Hardware in a Nutshell, 2nd Edition
    Upgrading and Repairing PCs
    Google for Dummies
    MySQL Cookbook
    Teach Yourself Macromedia Flash 8 In 24 Hours
    PHP CookBook
    Sams Teach Yourself JavaScript in 24 Hours
    PHP5 Manual
    Free Games Paper Airplanes
    500 Juegos Gratis 500 Giochi Gratis 500 Jeux Gratuits 500 Jogos Gratis 500 Kostenlose Spiele