Spidering

Here are a few more robots.txt options for keeping search engines such as Google, and any other crawlers that support these rules, from crawling certain content.

Pattern matching:

To block access to all subdirectories that begin with private, you could use the following entry:

User-agent: Googlebot
Disallow: /private*/

Dynamically generated pages:

To block access to all URLs that include a question mark (?), you could use the following entry:

User-agent: *
Disallow: /*?*

Matching the end of the URL using $:

You can use the $ character to match the end of the URL. For instance, to block any URLs that end with .php, you could use the following entry:

User-agent: Googlebot
Disallow: /*.php$
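
To make the wildcard rules above concrete, here is a minimal Python sketch of how a crawler might test a URL path against such a pattern. It assumes that * matches any run of characters and that a trailing $ anchors the end of the URL. The function names robots_pattern_to_regex and matches are made up for this illustration, and this is not Googlebot's actual code.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and every other character is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin with '.*' where a '*' stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start of the path, as robots.txt rules do.
    return robots_pattern_to_regex(pattern).match(path) is not None


print(matches("/private*/", "/private-files/a.html"))  # True
print(matches("/*?*", "/search?q=robots"))             # True
print(matches("/*.php$", "/index.php"))                # True
print(matches("/*.php$", "/index.php?id=1"))           # False
```

The last two lines show what the $ buys you: without it, /*.php would also match /index.php?id=1, because a rule with no $ only has to match the beginning of the URL.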

You can use this pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain one to ensure Googlebot doesn't crawl duplicate pages. But URLs that end with a ? may be the version of the page that you do want included. For this situation, you can set up your robots.txt file as follows:

User-agent: *
Allow: /*?$
Disallow: /*?

The Disallow: /*? line will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).

The Allow: /*?$ line will allow any URL that ends in a ? (more specifically, it will allow any URL that begins with your domain name, followed by a string, followed by a ?, with no characters after the ?).
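
The way the two lines interact can be sketched in Python. This is only an illustration and rests on one assumption: when several rules match, the rule with the longest pattern wins, and Allow wins a tie. That is how Google documents its handling of conflicting rules, but other crawlers may behave differently. The names _to_regex and is_allowed are made up for this example.

```python
import re


def _to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Decide whether a path may be crawled.

    rules is a list of (directive, pattern) pairs, for example
    ("Disallow", "/*?"). Assumption: the longest matching pattern wins,
    and Allow beats Disallow when the lengths are equal.
    """
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, pattern in rules:
        if _to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # With no matching rule, crawling is allowed by default.
    return True if best is None else best[1]


rules = [("Allow", "/*?$"), ("Disallow", "/*?")]

print(is_allowed(rules, "/page?"))         # True: ends in ?, Allow is longer
print(is_allowed(rules, "/page?sid=123"))  # False: only Disallow matches
print(is_allowed(rules, "/page"))          # True: no rule matches
```

Under this assumption the order of the two lines in robots.txt does not matter; a crawler that simply applies the first matching rule would still behave as intended here, because Allow is listed first.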

Most of the information above is from the webmaster help section at Google.