EDB-ID: 13654
CVE: N/A
Platform: Multiple
Published: 2010-03-29

How to develop WhatWeb 0.4 plugins
----------------------------------
by Andrew Horton aka urbanadventurer. MorningStar Security http://www.morningstarsecurity.com/
Revision 1.1, 29th March 2010.


Contents
=================================================
1. Introduction to WhatWeb
2. Introduction to WhatWeb plugins
	General aims of a plugin
	Methods to identify systems
	Important files and folders
	Anatomy of a plugin
3. Research background information
4. Collect samples
	Website Showcases
	Using Search Engines
	Forums for website development with the cms	
5. Analyze samples
	Read the source of a couple of samples
	Collect HTML and HTTP headers from samples
	Remove incorrectly identified samples
	Examine the samples with WhatWeb
	Remove more incorrectly identified samples with the whatweb report		
	Use find-common-stuff to automatically identify common strings in the samples
	Analyse HTTP headers and cookies
	Read more HTML source
6. Review of unique patterns identified
7. Write the plugin
8. Closing notes
9. Resources


1. Introduction to WhatWeb
=================================================

WhatWeb lets you identify content management systems (CMS), blogging platforms, stats/analytics packages, javascript libraries, servers and more. When you visit a website in your browser the transaction includes many unseen hints about how the webserver is set up and what software is delivering the webpage. Some of these hints are obvious, eg. "Powered by XYZ" and others are more subtle. WhatWeb recognises these hints and reports what it finds.

WhatWeb has many plugins and needs community support to develop more. Plugins can still identify systems whose obvious hints have been removed, because they also look for subtle clues. For example, a WordPress site might remove the tag <meta name="generator" content="WordPress 2.6.5"> but the WordPress plugin also looks for "wp-content", which is harder to disguise. Plugins are flexible and can return any datatype, for example version numbers, email addresses, account IDs and more.

There are both passive and aggressive plugins. Passive plugins use information on the page, in cookies and in the URL to identify the system. A passive request is as lightweight as a simple GET / HTTP/1.1 request, so it is suitable for large scale scanning of websites. Aggressive plugins guess URLs and request more files.


2. Introduction to WhatWeb Plugins
=================================================

Plugins are easy to write. You don't need to know Ruby to make them, but it helps.

General aims of a plugin
------------------------

Most plugins have a primary aim which is to identify a type of system based on signatures. The system could be a:

	* Content Management System
	* Javascript Library
	* HTTP Server
	* Application Framework
	
Some plugins do not have the aim to identify a specific type of system. Instead they try to give information that can be used to identify unanticipated systems or can be used for all types of websites. These plugins are:
	
	* Title
	* MD5 hash
	* Meta generator tag name
	* Uncommon HTTP headers
	

Methods to identify systems
---------------------------
There are 4 main methods to identify a CMS or web application. They are:

	1. Matching patterns in the HTTP headers and HTML of a simple webpage request
	2. Testing for URLs and identifying patterns in the HTML
	3. Testing for URLs and recognising the MD5 hash of the HTML
	4. Testing for URLs and simply noting they exist or return an HTTP status 200 code.

WhatWeb supports all 4 methods, however the 1st method is the most useful in large scale scanning. It is also the most efficient because it needs only a single request per site: it trades prior knowledge of a system's signatures for savings in network bandwidth and time.
Support for the first method is the most developed within WhatWeb and is discussed in detail in this document. Future development of WhatWeb will add more user friendly support for methods 2 through 4, which come under the purview of aggressive plugins.
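
The difference between the methods can be sketched in a few lines of Ruby. This is only an illustration and not WhatWeb code: the Response struct, the canned page and the helper names are all assumptions made up for the example, and no network request is made. Method 2 is the same check as method 1, applied to a guessed URL instead of the homepage.

```ruby
require 'digest'

# Hypothetical canned response standing in for a real HTTP fetch.
Response = Struct.new(:status, :headers, :body)

home = Response.new(200,
                    { "x-powered-by" => "PHP/5.2.6" },
                    '<meta name="generator" content="WordPress 2.6.5">')

# Methods 1 and 2: match patterns in the headers and HTML of a response.
def pattern_match?(response, header_re, body_re)
  header_hit = response.headers.values.any? { |v| v =~ header_re }
  body_hit   = !(response.body =~ body_re).nil?
  header_hit || body_hit
end

# Method 3: compare the MD5 hash of the fetched content with a known hash.
def md5_match?(response, known_md5)
  Digest::MD5.hexdigest(response.body) == known_md5
end

# Method 4: simply note that the requested URL returned status 200.
def exists?(response)
  response.status == 200
end

puts pattern_match?(home, /ASP\.NET/, /WordPress/)
puts md5_match?(home, Digest::MD5.hexdigest(home.body))
puts exists?(home)
```

Only the first check works from a single homepage request. The others need one extra request per guessed URL, which is why they belong to aggressive plugins.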


Important files and folders
---------------------------

The important folders to plugins are:

	* disabled-plugins/
	* plugin-development/
	* plugin-development/tests/	
	* plugins/

All .rb files in the plugins/ folder are loaded by WhatWeb. To disable a plugin, move it into the disabled-plugins/ folder.

The plugin-development folder contains some tools that are useful in developing plugins.
The tools are:

	* find-common-stuff		- This searches for common strings among a set of HTML files
	* wget-list				- This downloads a list of example websites

The plugin-development/tests folder contains example webpages of CMSs to study. The wget-list script creates two files for each example webpage: a .html file and a .meta file.




Anatomy of a plugin
-------------------

This is a typical plugin. It identifies the Drupal framework. It is shown below split into sections, with line numbers added for reference.


->-----------------------------------------------------------------------------------------------------------
1	Plugin.define "Drupal" do
2	author "Andrew Horton"
3	version "0.1"
4	description "Drupal is an opensource CMS written in PHP. Homepage: http://www.drupal.org"
-<-----------------------------------------------------------------------------------------------------------

Line 1. has the name. This name can be referred to on the commandline in a case insensitive way.

For example, the following works:

	$ ./whatweb -pdrupal www.example.com
	
Line 2. has the author. Just fill in your name between the double quotes.
Line 3. contains the version number. It's up to you what number to choose.
Line 4. contains the description. This should contain a description of what the plugin identifies that anyone can understand. It can be many lines but must start and end with double quotes.

Note that the author, version and description follow the format:
	
	field-name field-content
	
On the left is the name of the variable and on the right, separated by a space, is the value. This isn't an ordinary Ruby variable assignment; it is a declaration style specific to the plugins and only works for certain variable names.

The list of variable names that can be declared in a plugin in this manner are:
	
	* author
	* version
	* description
	* examples
	* matches
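
For readers curious how a "field-name field-content" declaration can work at all, here is one way it could be done in Ruby. This is a hypothetical sketch and is not taken from WhatWeb's source: the class SketchPlugin and its internals are invented for illustration. Each permitted field name becomes a method, and the block given to define is evaluated against the new plugin object.

```ruby
# Hypothetical mini-DSL. This is NOT WhatWeb's real Plugin class.
class SketchPlugin
  FIELDS = [:author, :version, :description, :examples, :matches]

  attr_reader :name, :fields

  def initialize(name)
    @name   = name
    @fields = {}
  end

  # One method per permitted field name. Calling `author "X"` inside the
  # block stores "X" under :author.
  FIELDS.each do |field|
    define_method(field) { |value| @fields[field] = value }
  end

  # Evaluate the block with the new plugin as self, so bare field names
  # resolve to the methods defined above.
  def self.define(name, &block)
    plugin = new(name)
    plugin.instance_eval(&block)
    plugin
  end
end

drupal = SketchPlugin.define "Drupal" do
  author "Andrew Horton"
  version "0.1"
  description "Drupal is an opensource CMS written in PHP."
end

puts drupal.name
puts drupal.fields[:author]
```

In this sketch a name outside the permitted list raises NoMethodError, which mirrors the rule that only certain variable names can be declared this way.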


->-----------------------------------------------------------------------------------------------------------
5	# hard to identify
6	#<a href="http://drupal.org"><img src="/dagboek/misc/powered-black-80x15.png" alt="Powered by Drupal, an open source content management system" title="Powered by Drupal, an open source content management system" width="80" height="15" /></a>  </div>
7	#  <script type="text/javascript" src="/misc/drupal.js"></script>
8	#  <script type="text/javascript" src="/main/misc/drupal.js"></script>
9	# @import "/misc/drupal.css";
10	# Set-Cookie: SESS6bdd09d4debccdc3a0f49becc449e8d5=2sq674vjn6vig48e3podh3j8e2; expires=Fri, 11 Dec 2009 15:37:52 GMT; path=/; domain=.moby.com
11	# Set-Cookie: SESS9795bcd4ea70e3f846e84f29f9491636=57eafcca6400d894772a136fb5889b92; expires=Fri, 11-Dec-2009 15:38:25 GMT; path=/; domain=.save-your-future.com
12
13
14	examples %w| amnesty.org/ appel.nasa.gov/ beta.worldbank.org/ entergy.pewclimate.org/ labs.divx.com/ lindenlab.com/ littlestarprints.com moby.com/ myplay.com/ sequelnaturals.com/ teen.secondlife.com/ www.artwaves.de www.asys.com.br/ www.atomicbop.net www.cristal.com.pe/?adulto=si www.dutchbutnotfromholland.eu/ www.elespectador.com/ www.ensembles.com.ph/ www.foxsearchlight.com/index.php www.freshbrain.org/ www.icsalabs.com/ www.johnnycashonline.com/ www.journalismcenter.org/ www.jovenscriativos.com.br/ www.koalafoundation.org.au/ www.la2day.com/ www.moove.be www.mtv.co.uk/channel/flux www.mulinobianco.it/ www.multiways.com/ www.nowpublic.com/ www.pravda.lt/ www.realismssoftware.com/ www.save-your-future.com www.shock.com.co/ www.sosojuicy.com/ www.spreadfirefox.com/ www.tidningenresultat.se www.ubuntu.com/ www.universitytowers.net/ www.warnerbrosrecords.com |
15
-<-----------------------------------------------------------------------------------------------------------

Lines 5 through to 11 are comments. Each commented line must begin with a # character and this is a standard ruby way to comment code.

Line 14 is a list of example websites. The examples prefix of %w| means an array of elements separated by whitespace. The individual examples are URLs. If they are missing the http:// or https:// then http:// is assumed.

If you prefer you can list the examples like this:

	examples %w|
	http://www.example.com
	http://www.example2.com
	http://www.site.com/blah/
	|
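
The %w| ... | literal is plain Ruby, so it can be tried outside WhatWeb. The snippet below shows the array it produces and one possible way of applying the "http:// is assumed" rule. The helper with_scheme is a hypothetical name used for this illustration only; WhatWeb's own handling may differ.

```ruby
# %w splits on any whitespace, including newlines.
examples = %w|
www.example.com
https://www.example2.com
http://www.site.com/blah/
|

# Assumed normalisation step: prepend http:// when no scheme is given.
def with_scheme(url)
  url =~ %r{\Ahttps?://} ? url : "http://#{url}"
end

urls = examples.map { |u| with_scheme(u) }
puts urls
```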

->-----------------------------------------------------------------------------------------------------------
16	matches [
17	{:name=>"/misc/drupal.js",
18	:probability=>100,
19	:regexp=>/<script type="text\/javascript" src="[^\"]*\/misc\/drupal.js[^\"]*"><\/script>/},
20
21	{:name=>"Powered by link",
22	:probability=>100,
23	:regexp=>/<[^>]+alt="Powered by Drupal, an open source content management system"/},
24
25	{:name=>"/misc/drupal.css",
26	:probability=>100,
27	:regexp=>/@import "[^\"]*\/misc\/drupal.css"/},
28
29	{:name=>"jQuery.extend(Drupal.settings,",
30	:probability=>100,
31	:text=>'jQuery.extend(Drupal.settings,'},
32
33	{:name=>"Drupal.extend(",
34	:probability=>100,
36	:text=>'Drupal.extend('}
37	]
-<-----------------------------------------------------------------------------------------------------------

This section is a list of patterns to match against the webpage. Matches is an array and each element of the array is a hash and is surrounded by {} brackets. Notice that each pattern has a comma after it except for the last one. This is the normal ruby method of defining an array except that there is whitespace between matches and the content.

Lines 17 through 19 define the first pattern. 
Line 17 defines the pattern name. The name can be anything that describes what it's matching.
Line 18 defines the probability of the pattern correctly identifying the system. It's not a real probability; instead it expresses how certain you are that a match correctly identifies the system.

The probability values are:

	 25	= Maybe
	 75	= Probably
	100	= Certain

Line 19 contains the pattern to match. It is a regular expression but could be any of the following list:
	
	* regexp	- Regular Expression. Standard ruby regular expression surrounded by slashes.
	* text		- Simple string of text surrounded by " or ' quotes
	* ghdb		- Google Hacking Database. This is a google-like query that supports a few parameters.

The parameters supported by ghdb are:
	
	* inurl:	- the following string is in the URL
	* intitle:	- the following string is between the <title> </title> tags
	* filetype:	- the following string is the file extension, eg. PDF, JPG, RB, etc.
	* -			- the following string is not matched on the page
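
To make the four ghdb parameters concrete, here is a rough sketch of how such a query could be evaluated against a page. It is an assumption-heavy illustration, not WhatWeb's parser: the function name ghdb_match?, the case-insensitive comparison and the treatment of quoted phrases are all choices made for this example.

```ruby
# Hypothetical evaluator for a ghdb-style query. WhatWeb's own parser may
# behave differently.
def ghdb_match?(query, url, body)
  title = body[/<title>(.*?)<\/title>/im, 1].to_s

  # Split the query into terms, keeping "quoted phrases" together.
  terms = query.scan(/-?(?:\w+:)?(?:"[^"]*"|\S+)/)

  terms.all? do |term|
    negated = term.start_with?("-")
    term = term[1..-1] if negated

    m = term.match(/\A(inurl|intitle|filetype):(.*)\z/m)
    op, value = m ? [m[1], m[2]] : [nil, term]
    value = value.delete('"').downcase

    hit = case op
          when "inurl"    then url.downcase.include?(value)
          when "intitle"  then title.downcase.include?(value)
          when "filetype" then url.downcase.sub(/\?.*\z/m, "").end_with?(".#{value}")
          else                 body.downcase.include?(value)
          end

    negated ? !hit : hit
  end
end

page = '<html><title>Vsns Lemon demo</title>Powered by Vsns Lemon</html>'
puts ghdb_match?('"Powered by Vsns Lemon" intitle:"Vsns Lemon"',
                 "http://www.example.com/", page)
```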

The match used on Line 19 is regexp and the pattern is:
	 /<script type="text\/javascript" src="[^\"]*\/misc\/drupal.js[^\"]*"><\/script>/

The slash needs to be escaped with a backslash. That is why "text/javascript" is written as "text\/javascript". This is a standard ruby regular expression which differs slightly from regular expressions in other languages. To learn to write regular expressions visit http://rubular.com/ where you can copy & paste some HTML into the box then test out different regular expressions to see if they match.
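
If you would rather test patterns locally than in a browser, a few lines of Ruby are enough. The sketch below applies a matches-style array to a string of HTML. The helper apply_matches is a hypothetical stand-in written for this example; it is not how WhatWeb itself evaluates plugins.

```ruby
matches = [
  { :name => "/misc/drupal.js",
    :probability => 100,
    :regexp => /<script type="text\/javascript" src="[^\"]*\/misc\/drupal.js[^\"]*"><\/script>/ },

  { :name => "Drupal.extend(",
    :probability => 100,
    :text => 'Drupal.extend(' }
]

# Return the patterns that match the given HTML body.
def apply_matches(matches, body)
  matches.select do |m|
    if m[:regexp]
      body =~ m[:regexp]
    elsif m[:text]
      body.include?(m[:text])
    end
  end
end

# Sample line taken from the comments in the Drupal plugin.
body = '<script type="text/javascript" src="/main/misc/drupal.js"></script>'

apply_matches(matches, body).each do |m|
  puts "#{m[:name]} (probability #{m[:probability]})"
end
```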

->-----------------------------------------------------------------------------------------------------------
38.	def passive
39.		m=[]
40.		#SESS 9795bcd4ea70e3f846e84f29f9491636 =6b74f8aff4bf7d34d181a6a380d1ec7b; expires=Tue, 15-Dec-2009 15:21:24 GMT; path=/; domain=.save-your-future.com
41.		m << {:name=>"SESS Drupal Cookie", :probability=>75 } if @meta["set-cookie"] =~ /^SESS[a-z0-9]{32}=[a-z0-9]{32}/
42.		m
43.	end


44.	end
-<-----------------------------------------------------------------------------------------------------------

Lines 38 through 43 define the passive function. This function is called every time the plugin is matched against a webpage.
Functions are able to access the following variables:

	* @body		- The HTML body
	* @meta		- The HTTP headers, including cookies
	* @status	- The HTTP status code. 200 is successful, 404 is not found.
	* @base_uri	- The URL

The passive function creates an empty array called m on line 39. On line 42 it returns that array. The m array will either be empty or will contain hashes with the same fields as the patterns in the matches array.

Line 40 is a comment which contains a sample session cookie.

Line 41 adds the hash to m if the @meta element 'set-cookie' matches the regular expression /^SESS[a-z0-9]{32}=[a-z0-9]{32}/
This regexp means a line that starts with SESS followed by 32 lowercase letters or numbers, followed by the equals sign, which is followed by 32 lowercase letters or numbers.

Line 44 ends the plugin which was started on line 1.
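
The cookie check can be tried on its own outside WhatWeb. In this sketch the headers are passed in as a plain hash argument named meta instead of the @meta instance variable. That change is an assumption made so the snippet runs standalone.

```ruby
SESS_COOKIE = /^SESS[a-z0-9]{32}=[a-z0-9]{32}/

# Standalone version of the passive function. `meta` stands in for @meta.
def passive(meta)
  m = []
  m << { :name => "SESS Drupal Cookie", :probability => 75 } if meta["set-cookie"] =~ SESS_COOKIE
  m
end

# Cookie taken from the save-your-future.com sample in the plugin comments.
with_cookie = { "set-cookie" => "SESS9795bcd4ea70e3f846e84f29f9491636=57eafcca6400d894772a136fb5889b92; path=/" }
# Cookie taken from the moby.com sample in the plugin comments.
short_value = { "set-cookie" => "SESS6bdd09d4debccdc3a0f49becc449e8d5=2sq674vjn6vig48e3podh3j8e2; path=/" }

p passive(with_cookie)
p passive(short_value)
p passive({})
```

The second sample returns an empty array because its cookie value is only 26 characters long, so it does not satisfy the second {32}. Checking a pattern against every sample you have collected is a quick way to spot gaps like this.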


	
3. Research background information
=================================================

Go to the homepage of the software or CMS you are researching and learn about it. 

Look for:
	* Requirements, eg. the type of web server and languages it requires
	* Demo sites
	* Website showcases and portfolios
	* Download links
	* Documentation.

Some of this information will help in writing the plugin description and some will be useful in collecting samples. 

The information I gathered:
	* The SilverStripe homepage is http://www.silverstripe.com/
	* The opensource CMS software is at http://silverstripe.org/
	* Documentation of requirements is at http://doc.silverstripe.org/doku.php?id=server-requirements	
	* A project showcase at http://www.silverstripe.com/project-showcase/
	* A demo site at http://demo.silverstripe.com/
	
Using the information found I wrote the following plugin description:
"SilverStripe is an opensource CMS written in PHP. It can run on Apache, IIS or lighttpd. Homepage: http://www.silverstripe.com"

Advanced hint: If you intend to make an aggressive plugin then you may wish to download multiple versions of the software.


4. Collect samples
=================================================

Your website samples should be representative of all SilverStripe installations. Take care not to just collect samples that are recently developed. Try to collect samples from a variety of sources and with a range of configurations.


Methods to find samples:
	* Search Engines
	* Website Showcases and Design Portfolios
	* Forums for website development with the cms

Website Showcases
-----------------
A website showcase is a collection of websites that show off the abilities of the web designers and the potential of the CMS. Try to find showcases that have websites designed by more than one web developer. Sites that are made by the same developer are not properly representative of all sites and may include that designer's idiosyncrasies.

While reading the background information I found this showcase on the official homepage: http://www.silverstripe.com/project-showcase/

By Googling for "silverstripe showcase" I found the official community showcase at http://www.silverstripe.org/community-showcase.

Googling for "webdesign portfolio silverstripe" found some web designers with links to SilverStripe websites.

The portfolio at http://smartplugsdesign.com/portfolio/ contains the following SilverStripe sites:
	http://www.lisamarieelliott.com/
	http://www.moonlitekustoms.com/
	http://www.textiprints.com/
	http://www.intandemtheatre.org/
	http://www.stillrunnin.com/
	http://www.enamaine.org/

The best source of samples is the community showcase because it contains a variety of websites made by different web designers, and the websites were added to it over a long period of time. Websites created over a wide period of time are useful as samples because they will run different versions of the SilverStripe software. There are 98 showcase pages so I collected samples from pages 1, 25, 50, 75 and 98.

The samples collected from the community portfolio:
	http://www.holistichealth.com/
	http://www.verus.com.tr/
	http://www.latenightdisco.com/
	http://www.arprostatecancer.org/
	http://www.cavendishimaging.com/
	http://beatone.co.uk/
	http://www.loguitos.com/
	http://www.easycash4life.com/
	http://www.gsbc.edu/
	http://www.bradyinc.com/
	http://www.monjasantner.de/
	http://www.robert80.de/
	http://customcanvas.fritzandandre.com/
	http://www.idee-cruises.de/
	http://www.maklerservice-greiz.de/
	http://www.kitesurfnelson.co.nz/
	http://www.moto-racepaint.com/
	http://www.hutmacherin.com/
	http://www.fuel.ie/silverstripe
	http://www.infinitestillness.ie/ss
	http://www.peterpanvakantieclub.nl/
	http://www.chapmansurfboards.com/
	http://www.fairtradenap.net/
	http://www.benpearce.co.nz/
	http://www.wend.nl/
	http://www.resoba.com/
	http://maungataniwha.co.nz/
	http://www.gyo.co.nz/
	http://www.firstgalaxies.org/
	http://www.clockwork.co.nz/
	http://www.upstreamgroup.com/
	http://www.moerakihavenmotel.co.nz/
	http://www.thelightboxdesigns.com/
	http://www.nadabakery.co.nz/
	http://comtel.com.au/
	http://victoriaoruwari.com/
	http://www.demconvention.com/
	http://www.whileyouwait.co.nz/
	http://omb.cl/
	http://www.executivemediasearch.com/
	http://www.naciondnb.com/
	http://www.thecelebritytruth.com/
	http://www.frussian.com.ar/
	http://unbounded.org/
	http://www.rcaforum.org.nz/
	http://charcoalinteriors.com.au/
	http://www.andrewking.co.nz/
	http://www.elijahlofgren.com/silverstripe/
	http://www.silverstripe.com/

This may seem like a large number of samples to collect but I assume that some of these websites will no longer be running SilverStripe or may no longer exist at all.


Using Search Engines
--------------------

Introduction
------------
Google-dorks are strings that can be used with Google to discover specific systems. There is an extensive database of google-dorks in the Google Hacking Database hosted at http://www.hackersforcharity.org/ghdb/. 

	Example: "Powered by Vsns Lemon" intitle:"Vsns Lemon"

Search engines and google-dorks must not be the sole method used to discover samples, because the websites they find do not represent all sites on the internet running the system you are searching for. Webmasters have an incentive to remove the identifying strings that google-dorks rely on, to make their sites harder for malicious hackers to discover.

For example, some WordPress installations include the text "Powered by WordPress" in the footer. This makes an excellent string to search for to find some installations, but most WordPress sites do not include it.


SilverStripe example
--------------------
First I searched for known google-dorks for SilverStripe by googling for "silverstripe google dork" and "silverstripe google hacking".
	
We won't know how to search for SilverStripe sites until we analyze some of the sample sites. Note that Google doesn't index html fragments, instead it just indexes words, titles, and urls.

I pick one sample to check, www.cavendishimaging.com. By reading the HTML source code I notice that the following line is included: <meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >
This positively identifies the site as made with SilverStripe, but this content won't be indexed by Google. At first glance nothing else on the page looks as though it can identify the site as SilverStripe.


Forums for website development with the cms
-------------------------------------------

Webdesign forums often have links to websites provided by the web designers. Some of these websites will be of lower quality than those found in a portfolio and some will still be in the default setup. Such websites would not be included in an official portfolio.

By Googling for "silverstripe webdesign forum" I found the official SilverStripe Forum: http://silverstripe.org/connect-with-other-silverstripe-members/show/256356

Some website samples collected from the forum are:
	http://hungryhearts.no 
	http://weonline.in 
	http://belitsky.info/work/hartmann 
	http://kunstforum.as/
	http://www.choidoco.com/demo/
	http://www.tobychampion.co.uk/
	http://www.silverstripe.org.pl/



5. Analyze samples
=================================================

I need to analyze the samples I have collected to find similarities that can be used to identify these websites as SilverStripe. First I will search for identifying features in the webpages and HTTP headers.

In step 4 I collected 62 SilverStripe samples. I assume that some of these websites are incorrectly listed as SilverStripe so I will keep that in mind.

	http://beatone.co.uk/
	http://belitsky.info/work/hartmann 
	http://charcoalinteriors.com.au/
	http://comtel.com.au/
	http://customcanvas.fritzandandre.com/
	http://hungryhearts.no 
	http://kunstforum.as/
	http://maungataniwha.co.nz/
	http://omb.cl/
	http://unbounded.org/
	http://victoriaoruwari.com/
	http://weonline.in 
	http://www.andrewking.co.nz/
	http://www.arprostatecancer.org/
	http://www.benpearce.co.nz/
	http://www.bradyinc.com/
	http://www.cavendishimaging.com/
	http://www.chapmansurfboards.com/
	http://www.choidoco.com/demo/
	http://www.clockwork.co.nz/
	http://www.demconvention.com/
	http://www.easycash4life.com/
	http://www.elijahlofgren.com/silverstripe/
	http://www.enamaine.org/
	http://www.executivemediasearch.com/
	http://www.fairtradenap.net/
	http://www.firstgalaxies.org/
	http://www.frussian.com.ar/
	http://www.fuel.ie/silverstripe
	http://www.gsbc.edu/
	http://www.gyo.co.nz/
	http://www.holistichealth.com/
	http://www.hutmacherin.com/
	http://www.idee-cruises.de/
	http://www.infinitestillness.ie/ss
	http://www.intandemtheatre.org/
	http://www.kitesurfnelson.co.nz/
	http://www.latenightdisco.com/
	http://www.lisamarieelliott.com/
	http://www.loguitos.com/
	http://www.maklerservice-greiz.de/
	http://www.moerakihavenmotel.co.nz/
	http://www.monjasantner.de/
	http://www.moonlitekustoms.com/
	http://www.moto-racepaint.com/
	http://www.naciondnb.com/
	http://www.nadabakery.co.nz/
	http://www.peterpanvakantieclub.nl/
	http://www.rcaforum.org.nz/
	http://www.resoba.com/
	http://www.robert80.de/
	http://www.silverstripe.com/
	http://www.silverstripe.org.pl/
	http://www.stillrunnin.com/
	http://www.textiprints.com/
	http://www.thecelebritytruth.com/
	http://www.thelightboxdesigns.com/
	http://www.tobychampion.co.uk/
	http://www.upstreamgroup.com/
	http://www.verus.com.tr/
	http://www.wend.nl/
	http://www.whileyouwait.co.nz/


Read the source of a couple of samples
--------------------------------------
Select at random 2 or 3 websites and read the HTML source carefully. Look for anything that isn't generic or anything that you wouldn't find on any website. Good places to scrutinise are headers, footers, url structures, filenames of javascript libraries and css files, and div naming schemes.

A fast visual inspection only identifies the meta generator tag:
<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >

A 2nd sample shows the following tag which includes a version number.
<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" >

I also notice the URL format of some images is interesting, eg. "/assets/galleries/cakes/_resampled/Banner-Nada-090.jpg"

The div id names appear generic, eg. <div id="BgContainer">, <div id="Footer"> and <div class="footerTop">. At this stage they aren't interesting because I expect these div names to change with themes.
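
The two generator tags seen so far can already be turned into a trial pattern. The regular expression below is only a first attempt based on these two samples, and the helper silverstripe_generator is a hypothetical name used for illustration. It recognises the tag and captures the version number when one is present.

```ruby
# First-attempt pattern built from the two sample tags above.
GENERATOR_RE = /<meta name="generator"[^>]*content="SilverStripe(?: ([\d.]+))? - /

# Returns nil when the tag is absent, otherwise a hash with the version
# (which is nil when the tag carries no version number).
def silverstripe_generator(html)
  m = GENERATOR_RE.match(html)
  return nil unless m
  { :name => "meta generator tag", :version => m[1] }
end

plain     = '<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >'
versioned = '<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" >'

p silverstripe_generator(plain)
p silverstripe_generator(versioned)
```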


Collect HTML and HTTP headers from samples
------------------------------------------
Make a separate folder for the plugin you are analyzing. I will make the folder, plugin-development/tests/silverstripe/

	$ cd whatweb-0.4/plugin-development/tests
	$ mkdir silverstripe
	$ cd silverstripe

Create a file in the silverstripe folder that contains the list of samples. I have called the file 'list'.

	$ ../../wget-list 
	Usage: ../../wget-list <file with list of urls>
	downloads each URL's html and headers into the current directory

In the plugin-development/ folder there is a script called wget-list. Use the script to download the samples into the silverstripe folder.

	$ ../../wget-list ./list 
	--2010-03-04 17:03:09--  http://beatone.co.uk/
	Resolving beatone.co.uk... 84.45.68.168
	Connecting to beatone.co.uk|84.45.68.168|:80... connected.
	HTTP request sent, awaiting response... 200 OK
	Length: unspecified [text/html]
	Saving to: `beatone.co.uk-.html'

		[   <=>                                                                    ] 10,785      19.2K/s   in 0.5s    

This takes a few minutes to complete. The script creates 2 files for each sample, an HTML and a META file which contains the HTTP headers.

	$ ls
	beatone.co.uk-.html                   www.demconvention.com-.meta               www.moerakihavenmotel.co.nz-.meta
	beatone.co.uk-.meta                   www.easycash4life.com-.html               www.monjasantner.de-.html
	belitsky.info-work-hartmann.html      www.easycash4life.com-.meta               www.monjasantner.de-.meta
	belitsky.info-work-hartmann.meta      www.elijahlofgren.com-silverstripe-.html  www.moonlitekustoms.com-.html
	charcoalinteriors.com.au-.html        www.elijahlofgren.com-silverstripe-.meta  www.moonlitekustoms.com-.meta
	charcoalinteriors.com.au-.meta        www.enamaine.org-.html                    www.moto-racepaint.com-.html
	comtel.com.au-.html                   www.enamaine.org-.meta                    www.moto-racepaint.com-.meta
	comtel.com.au-.meta                   www.executivemediasearch.com-.html        www.naciondnb.com-.html
	customcanvas.fritzandandre.com-.html  www.executivemediasearch.com-.meta        www.naciondnb.com-.meta
	customcanvas.fritzandandre.com-.meta  www.fairtradenap.net-.html                www.nadabakery.co.nz-.html
	hungryhearts.no.html                  www.fairtradenap.net-.meta                www.nadabakery.co.nz-.meta
	hungryhearts.no.meta                  www.firstgalaxies.org-.html               www.peterpanvakantieclub.nl-.html
	kunstforum.as-.html                   www.firstgalaxies.org-.meta               www.peterpanvakantieclub.nl-.meta
	kunstforum.as-.meta                   www.frussian.com.ar-.html                 www.rcaforum.org.nz-.html
	list                                  www.frussian.com.ar-.meta                 www.rcaforum.org.nz-.meta
	maungataniwha.co.nz-.html             www.fuel.ie-silverstripe.html             www.resoba.com-.html
	maungataniwha.co.nz-.meta             www.fuel.ie-silverstripe.meta             www.resoba.com-.meta
	omb.cl-.html                          www.gsbc.edu-.html                        www.robert80.de-.html
	omb.cl-.meta                          www.gsbc.edu-.meta                        www.robert80.de-.meta
	unbounded.org-.html                   www.gyo.co.nz-.html                       www.silverstripe.com-.html
	unbounded.org-.meta                   www.gyo.co.nz-.meta                       www.silverstripe.com-.meta
	victoriaoruwari.com-.html             www.holistichealth.com-.html              www.silverstripe.org.pl-.html
	victoriaoruwari.com-.meta             www.holistichealth.com-.meta              www.silverstripe.org.pl-.meta
	weonline.in.html                      www.hutmacherin.com-.html                 www.stillrunnin.com-.html
	weonline.in.meta                      www.hutmacherin.com-.meta                 www.stillrunnin.com-.meta
	www.andrewking.co.nz-.html            www.idee-cruises.de-.html                 www.textiprints.com-.html
	www.andrewking.co.nz-.meta            www.idee-cruises.de-.meta                 www.textiprints.com-.meta
	www.arprostatecancer.org-.html        www.infinitestillness.ie-ss.html          www.thecelebritytruth.com-.html
	www.arprostatecancer.org-.meta        www.infinitestillness.ie-ss.meta          www.thecelebritytruth.com-.meta
	www.benpearce.co.nz-.html             www.intandemtheatre.org-.html             www.thelightboxdesigns.com-.html
	www.benpearce.co.nz-.meta             www.intandemtheatre.org-.meta             www.thelightboxdesigns.com-.meta
	www.bradyinc.com-.html                www.kitesurfnelson.co.nz-.html            www.tobychampion.co.uk-.html
	www.bradyinc.com-.meta                www.kitesurfnelson.co.nz-.meta            www.tobychampion.co.uk-.meta
	www.cavendishimaging.com-.html        www.latenightdisco.com-.html              www.upstreamgroup.com-.html
	www.cavendishimaging.com-.meta        www.latenightdisco.com-.meta              www.upstreamgroup.com-.meta
	www.chapmansurfboards.com-.html       www.lisamarieelliott.com-.html            www.verus.com.tr-.html
	www.chapmansurfboards.com-.meta       www.lisamarieelliott.com-.meta            www.verus.com.tr-.meta
	www.choidoco.com-demo-.html           www.loguitos.com-.html                    www.wend.nl-.html
	www.choidoco.com-demo-.meta           www.loguitos.com-.meta                    www.wend.nl-.meta
	www.clockwork.co.nz-.html             www.maklerservice-greiz.de-.html          www.whileyouwait.co.nz-.html
	www.clockwork.co.nz-.meta             www.maklerservice-greiz.de-.meta          www.whileyouwait.co.nz-.meta
	www.demconvention.com-.html           www.moerakihavenmotel.co.nz-.html

The folder whatweb-0.4/plugin-development/tests/silverstripe now contains many .html and .meta files.
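
Judging from the listing above, the file names appear to be the URL with the http:// prefix removed and every slash replaced by a dash. The helper below reproduces that naming so you can find the files for a given sample. The rule is inferred from the listing and not taken from the wget-list script, and sample_basename is a hypothetical name.

```ruby
# Inferred naming rule: drop the scheme, replace "/" with "-".
def sample_basename(url)
  url.strip.sub(%r{\Ahttps?://}, "").gsub("/", "-")
end

%w| http://beatone.co.uk/ http://hungryhearts.no http://www.fuel.ie/silverstripe |.each do |url|
  base = sample_basename(url)
  puts "#{base}.html  #{base}.meta"
end
```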

	$ head beatone.co.uk-.html 
	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
	<html>
		<head>
		    <base href="http://beatone.co.uk/" ><!--[if IE 6]></base><![endif]-->
			<title>Be At One - London Bar, Bookings Central London, Great Cocktails London </title>
		    <meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >
	<meta http-equiv="Content-type" content="text/html; charset=utf-8" >
	<meta http-equiv="Content-Language" content="en-US">

		    <link rel="shortcut icon" href="/favicon.ico">

This is a standard HTML file. It is the same as what you see when you select 'View Source' in a web browser.

	$ cat beatone.co.uk-.meta 
	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:04:35 GMT
	Server: Apache/2.2.3 (Debian) PHP/5.2.0-8+etch16 mod_ssl/2.2.3 OpenSSL/0.9.8c
	X-Powered-By: PHP/5.2.0-8+etch16
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Vary: Accept,User-Agent,Accept-Encoding
	Content-Type: text/html; charset=utf-8
	Via: 1.1 bc2
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=4d463f54abb74031c117569ca3aa3c61; path=/

These are the HTTP Headers the webserver sends before the HTML. Look for unusual cookie names and non-standard HTTP headers.
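
With dozens of .meta files it helps to let a script point out the headers and cookie names worth a second look. The sketch below parses the text of one .meta file. The list of "ordinary" headers is a hand-picked assumption for this example, so adjust it to taste, and parse_meta is a hypothetical helper.

```ruby
# Assumption: a hand-picked list of headers treated as ordinary.
COMMON_HEADERS = %w[date server expires cache-control pragma vary
                    content-type content-length connection set-cookie]

# Split the text of a .meta file into the status line and a header hash.
def parse_meta(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  status_line = lines.shift
  headers = {}
  lines.each do |line|
    name, value = line.split(":", 2)
    next if value.nil?
    headers[name.downcase] = value.strip  # a repeated header overwrites the earlier one
  end
  [status_line, headers]
end

# Shortened copy of the beatone.co.uk-.meta file shown above.
meta_text = <<~META
  HTTP/1.1 200 OK
  Server: Apache/2.2.3 (Debian) PHP/5.2.0-8+etch16 mod_ssl/2.2.3 OpenSSL/0.9.8c
  X-Powered-By: PHP/5.2.0-8+etch16
  Content-Type: text/html; charset=utf-8
  Via: 1.1 bc2
  Set-Cookie: PHPSESSID=4d463f54abb74031c117569ca3aa3c61; path=/
META

status_line, headers = parse_meta(meta_text)
uncommon    = headers.keys - COMMON_HEADERS
cookie_name = headers["set-cookie"].to_s[/\A[^=]+/]

puts "Status:  #{status_line}"
puts "Look at: #{uncommon.join(', ')}"
puts "Cookie:  #{cookie_name}"
```

To run it over a real file, replace the heredoc with File.read on the .meta file.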


Remove incorrectly identified samples
-------------------------------------
To identify some samples that are not SilverStripe I grep for the generator tag and filter out all occurrences that include the word Silver. This leaves me with the following:

	$ grep generator *html | grep -v Silver
	omb.cl-.html:	<meta name="generator" content="dospuntocero.cl" >
	www.andrewking.co.nz-.html:<meta name="generator" content="WordPress 2.8.4" />
	www.easycash4life.com-.html:<meta name="generator" content="WordPress 2.8.2" />
	www.idee-cruises.de-.html:<meta name="generator" http-equiv="generator" content="cms.Koncepts - http://www.koncepts.de" />
	www.thecelebritytruth.com-.html:<meta name="generator" content="WordPress 2.8.4" />

These websites are obviously not SilverStripe so I delete their files and remove them from the list.

	$ rm omb.cl-.* www.andrewking.co.nz-.* www.easycash4life.com-.* www.idee-cruises.de-.* www.thecelebritytruth.com-.*


How many sites can we be certain are SilverStripe?
--------------------------------------------------
There are 57 website samples left:
	$ ls *html | wc -l
	57

Of these samples, 38 include the term silverstripe in the HTML:
	$ grep -li silverstripe *html | wc -l
	38
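
The same two checks can be written in Ruby if you prefer to stay in one language while developing the plugin. This sketch works on an in-memory hash of file name to HTML so that it runs anywhere. The third entry is a made-up sample added for contrast.

```ruby
# File name => HTML. To load real samples instead, something like
#   samples = Dir.glob("*html").map { |f| [f, File.read(f)] }.to_h
# could be used from inside the silverstripe folder.
samples = {
  "beatone.co.uk-.html" =>
    '<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >',
  "www.andrewking.co.nz-.html" =>
    '<meta name="generator" content="WordPress 2.8.4" />',
  "no-generator.example.html" =>   # hypothetical sample
    '<title>Home</title>'
}

# Equivalent of: grep generator *html | grep -v Silver
other_generators = samples.select do |_, html|
  html.lines.any? { |line| line.include?("generator") && !line.include?("Silver") }
end

# Equivalent of: grep -li silverstripe *html | wc -l
silverstripe_count = samples.count { |_, html| html =~ /silverstripe/i }

puts other_generators.keys
puts silverstripe_count
```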


Examine the samples with WhatWeb
--------------------------------
Running WhatWeb before the plugin is written may show some interesting information. In this case it has identified the meta generator tag. However, notice that some of these websites do not have the meta generator tag. The webmaster may have removed it, or the website may no longer be running SilverStripe.

This is also useful for finding more samples that are not SilverStripe.

$ ./whatweb -i ./plugin-development/tests/silverstripe/list 

http://belitsky.info/work/hartmann [301] md5[c112335e6a56038ca4ba4b906d6aee05], redirect-location[http://belitsky.info/work/hartmann/], server-header[Apache], title[301 Moved Permanently]
http://belitsky.info/work/hartmann/ [200] index-of, md5[21577203b9abc6091d99203295712f0c], server-header[Apache], title[Index of /work/hartmann]
http://charcoalinteriors.com.au/ [200] md5[bff9a28ebdc1cdfdb80743458606df1d], server-header[Apache/2.2.13 (Unix) mod_ssl/2.2.13 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4 mod_perl/2.0.4 Perl/v5.8.8], title[Home -Charcoal Interiors], x-powered-by-header[PHP/5.2.10]
http://customcanvas.fritzandandre.com/ [200] JQuery, Mailto, md5[452d4fd540f07c98e0288094d0bf959f], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 PHP/5.2.6-1+lenny6 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ruby/1.2.6 Ruby/1.8.7(2008-08-11) mod_ssl/2.2.9 OpenSSL/0.9.8g], title[Home], x-powered-by-header[PHP/5.2.6-1+lenny6]
http://comtel.com.au/ [200] Google-Analytics-GA[1388941], probably Joomla[com_search], md5[050d35f63e2d9cd46065cde83f876989], server-header[Apache/2.0.55 (Ubuntu) PHP/5.1.2], title[Comtel - Telephone Radio & Data Systems | Comtel], x-powered-by-header[PHP/5.1.2]
http://beatone.co.uk/ [200] Google-Analytics-GA[11953167], JQuery, md5[b85a0567c28cfc3ba050ccbc95c899c4], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny6 with Suhosin-Patch mod_ssl/2.2.9 OpenSSL/0.9.8g], title[Be At One - London Bar, Bookings Central London, Great Cocktails London], x-powered-by-header[PHP/5.2.6-1+lenny6]
http://kunstforum.as/ ERROR: Socket error getaddrinfo: Name or service not known
http://hungryhearts.no [200] Google-Analytics-GA[2984373], JQuery, Mailto, md5[1156688f57dfb37d853d3d7326daaadc], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/1.3.41 (Unix) PHP/5.2.6 mod_psoft_traffic/0.2 mod_ssl/2.8.31 OpenSSL/0.9.7a mod_macro/1.1.2], title[The Hungry Hearts. Pin-up performance band.], x-powered-by-header[PHP/5.2.6]
http://unbounded.org/ [200] Google-Analytics-urchin[97930], md5[34d6c3cfc3b9ba0c66a8942a490c259d], meta-generator[SilverStripe 2.0 - http://www.silverstripe.com], server-header[Apache/2.2.3 (Debian) DAV/2 PHP/5.2.6-1+lenny3 with Suhosin-Patch mod_ssl/2.2.3 OpenSSL/0.9.8g], title[unbounded], x-powered-by-header[PHP/5.2.6-1+lenny3]
http://www.arprostatecancer.org/ [200] Google-Analytics-GA[2447233], Mailto, md5[dc1a9f02efc52b1ebc72f2c8a0b03ae6], server-header[Apache/1.3.41 (Unix) PHP/5.2.6 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7a], title[Arkansas Prostate Cancer Foundation], x-powered-by-header[PHP/5.2.6]
http://weonline.in [200] Google-Analytics-GA[8297705], Mailto, md5[f9d784e8b18c942fe7ea4ed8273630c9], meta-generator[SilverStripe 2.3.1 - http://www.silverstripe.com], server-header[Apache], title[Home. Weonline web design group. We love to do beautiful stuff for the web.], x-powered-by-header[PHP/5.2.9]
http://maungataniwha.co.nz/ [200] Google-Analytics-GA[3842018], JQuery, md5[58fbcc50e566ceaab813ecd38098e1df], server-header[Apache/2.2], title[Maungataniwha Lodge | New Zealand | Home]
http://victoriaoruwari.com/ [200] md5[2afe04307f5e3503efbe1c2c75b62511], server-header[Apache/1.3.41 (Unix) mod_ssl/2.8.31 OpenSSL/0.9.7a PHP/5.2.8 mod_perl/1.29 FrontPage/5.0.2.2510], title[Victoria Oruwari - Home], x-powered-by-header[PHP/5.2.8]
http://www.benpearce.co.nz/ [200] Google-Analytics-GA[1362535], JQuery, Mailto, Prototype, md5[4b1abe022c1c40c6091366c71c367cca], server-header[Apache/2.0.54 (Debian GNU/Linux) PHP/5.2.3-0.dotdeb.0 with Suhosin-Patch mod_ssl/2.0.54 OpenSSL/0.9.7e], title[Ben Pearce - artist], x-powered-by-header[PHP/5.2.3-0.dotdeb.0]
http://www.chapmansurfboards.com/ [200] Google-Analytics-GA[419314], JQuery, md5[9036620b334c3d2e7f6722e946277d29], meta-generator[SilverStripe 2.0 - http://www.silverstripe.com], server-header[Apache], title[Dale Chapman Surf Designs], x-powered-by-header[PHP/5.2.10]
http://www.cavendishimaging.com/ [200] Google-Analytics-GA[11469477], JQuery, md5[ef8fbfae1dec5750a55478ed295cb34d], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny6 with Suhosin-Patch mod_ssl/2.2.9 OpenSSL/0.9.8g], title[Dentomaxillofacial Imaging & Anatomical Model Specialists - Cavendish Imaging], x-powered-by-header[PHP/5.2.6-1+lenny6]
http://www.demconvention.com/ ERROR: Socket error getaddrinfo: Name or service not known
http://www.clockwork.co.nz/ [200] md5[70e4b614d4ae85161014d05939ebd073], server-header[Apache], title[clockwork.co.nz]
http://www.choidoco.com/demo/ [200] md5[ccfe1940e9651416af58ff3f4a3eff77], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.12 (Unix) mod_ssl/2.2.12 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 PHP/5.2.11 mod_perl/2.0.4 Perl/v5.8.8], title[home], x-powered-by-header[PHP/5.2.11]
http://www.bradyinc.com/ [200] Google-Analytics-GA[13121212], Prototype, md5[34d5c1f763e3ceffaeae07de7de98f3d], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/1.3.41 (Unix) FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7m], title[Staffing Productivity Benchmarks » Brady & Associates], x-powered-by-header[PHP/5.2.11]
http://www.executivemediasearch.com/ [404] md5[588da43361637cd97f3096ab9ce70183], server-header[Apache], title[Error 404 - Not found]
http://www.fairtradenap.net/ [200] Google-Analytics-GA[1362535], Mailto, md5[f4f36e648290b8fb818ea7fb297a4944], server-header[Apache], title[Home], x-powered-by-header[PHP/5.2.9]
http://www.elijahlofgren.com/silverstripe/ [404] Google-Analytics-urchin[2328965], maybe Mambo, md5[89a0d093054c83ef60a842b2aa7ff48f], meta-generator[CMS Made Simple - Copyright (C) 2004-6 Ted Kulp. All rights reserved.], powered by...[CMSMS], server-header[lighttpd/1.4.22], title[404 Error - Elijah Lofgren's Website]
http://www.enamaine.org/ [200] Google-Analytics-GA[3359251], Mailto, md5[57cb1e27c565acff11ce5f8103696696], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache], title[Maine ENA Home | Maine ENA], x-powered-by-header[PHP/5.2.9]
http://www.fuel.ie/silverstripe [301] md5[cbee7d5cfda4e161caffa892cc08558a], redirect-location[http://www.fuel.ie/silverstripe/], server-header[Zeus/4.3], title[Error 301 Moved Permanently]
http://www.firstgalaxies.org/ [200] Google-Analytics-urchin[777185], md5[fde922a270e0b438789eef03f2bbc064], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache], title[A Resource for Research on the Most Distant Galaxies], x-powered-by-header[PHP/5.2.11]
http://www.frussian.com.ar/ [200] md5[54b8d6e69f5be47d29f8225126bf92da], meta-generator[SilverStripe 2.0 - http://www.silverstripe.com], server-header[Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.8i DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635], title[Home], x-powered-by-header[PHP/5.2.9]
http://www.gyo.co.nz/ ERROR: Socket error getaddrinfo: Name or service not known
http://www.gsbc.edu/ [200] Google-Analytics-GA[276990], JQuery, Prototype, md5[dc166f92e1ed43f5435f60743c0d272a], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8l DAV/2 mod_auth_passthrough/2.1 FrontPage/5.0.2.2635], title[Home » Golden State Baptist College], x-powered-by-header[PHP/5.2.11]
http://www.fuel.ie/silverstripe/ [200] Mailto, md5[c63c8ea00aa0624fb4df7989c92b172e], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Zeus/4.3], title[The Fuel/Silverstripe Demo Site » The Fuel/Silverstripe Demo Site]
http://www.infinitestillness.ie/ss [301] md5[cbee7d5cfda4e161caffa892cc08558a], redirect-location[http://www.infinitestillness.ie/ss/], server-header[Zeus/4.3], title[Error 301 Moved Permanently]
http://www.holistichealth.com/ [200] Google-Analytics-GA[6289330], md5[b59e703ec114ea8cd99da42af210f1a1], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/1.3.41 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7a], title[Holistic Health International - Where Science and Caring Meet], x-powered-by-header[PHP/5.2.6]
http://www.hutmacherin.com/ [301] md5[d41d8cd98f00b204e9800998ecf8427e], redirect-location[/start], server-header[Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 PHP/5.2.6-1+lenny6 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g]
http://www.infinitestillness.ie/ss/ [200] Mailto, md5[77b579a3d6752d45bcd8718d906fe5c3], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Zeus/4.3], title[Infinite Stillness | Ki Massage & Reiki Healing Dublin 4]
http://www.hutmacherin.com/start [200] md5[a89fbf86cf798314dba620717b1d99b8], server-header[Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 PHP/5.2.6-1+lenny6 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g], title[Start | Isabell von Maltzahn | Hutmacherin aus Berlin]
http://www.intandemtheatre.org/ [200] Google-Analytics-GA[6603467], Prototype, md5[133a6893f85c4c41034ced6a7aec3e75], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache], title[Welcome | In Tandem Theatre], x-powered-by-header[PHP/5.2.9]
http://www.latenightdisco.com/ [200] Google-Analytics-GA[768894], md5[b81b6b18af21e3d5f7d0749aa483da64], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.4 with Suhosin-Patch], title[Experience Central Arkansas hottest night club Discovery], x-powered-by-header[PHP/5.2.4-2ubuntu5.4]
http://www.lisamarieelliott.com/ [200] Google-Analytics-GA[3359251], md5[d844eb0937b79fc02c347930522fe490], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache], title[Home], x-powered-by-header[PHP/5.2.9]
http://www.loguitos.com/ [200] Joomla[1.0], maybe Mambo, md5[0eed48ee56f6a2b784ea010b1bfa15b8], meta-generator[Joomla! - Copyright (C) 2005 - 2007 Open Source Matters. All rights reserved.], server-header[Apache/2.2.10 (Unix) mod_ssl/2.2.10 OpenSSL/0.9.8i DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635], title[Diseño de logotipos - Loguitos - Diseñadores de logos profesionales - Inicio], x-powered-by-header[PHP/5.2.6]
http://www.moonlitekustoms.com/ [200] md5[cf3d07ae32e645d04a1acad6560ce668], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache], title[In The Shop - Moonlite Kustoms], x-powered-by-header[PHP/5.2.9]
http://www.maklerservice-greiz.de/ [200] Google-Analytics-GA[10587433], Prototype, md5[9ac4c5710cf3e694917e2a3949680fc7], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/1.3 (Unix) mod_ssl/2.8.28 OpenSSL/0.9.8f AuthPG/1.3 FrontPage/5.0.2.2635], title[Ihr Maklerservice in Greiz: Steiniger Versicherungsmakler], x-powered-by-header[PHP/5.2.9]
http://www.moerakihavenmotel.co.nz/ [200] md5[12e0be3129dea7738500b21aeab0ff96], meta-generator[SilverStripe 2.0 - http://www.silverstripe.com], powered by...[:], server-header[Apache], title[Moeraki Haven Motel, Moreaki Motel Accommodation, Otago Motel Accommodation.], x-powered-by-header[PHP/5.2.11]
http://www.monjasantner.de/ [200] Google-Analytics-GA[10599117], JQuery, md5[c30a4c5e6b9aad2a7f435621deb2ef40], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache], title[Monja Santner » Home], x-powered-by-header[PHP/5.2.6-1+lenny6]
http://www.nadabakery.co.nz/ [200] Google-Analytics-GA[4761582], md5[68f3dd581e40f370927205103ca6903a], meta-generator[SilverStripe 2.3.1 - http://www.silverstripe.com], server-header[Apache/2], title[Home | Nada - New Zealand's Greatest Bakery], x-powered-by-header[PHP/5.2.12]
http://www.naciondnb.com/ [200] md5[d443c89dc137c8286aaa173ebc806176], server-header[Apache], title[NacionDNB]
http://www.moto-racepaint.com/ [200] Google-Analytics-GA[1912569], JQuery, md5[7c1db45c5801d09dac4539725c6cf658], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4], title[MotoRacePaint Home], x-powered-by-header[PHP/5.2.11]
http://www.rcaforum.org.nz/ [200] Google-Analytics-GA[4693659], Mailto, Prototype, md5[5cbb333d29fb180f42142a9551e105a2], meta-generator[SilverStripe 2.3.0 - http://www.silverstripe.com], server-header[Apache], title[Homepage - RCA Forum], x-powered-by-header[PHP/5.2.1]
http://www.peterpanvakantieclub.nl/ [200] Google-Analytics-GA[1010482], JQuery, md5[011e5aa0e0e9599d10cd2913270959ea], meta-generator[SilverStripe 2.0 - http://www.silverstripe.com], server-header[Apache/2], title[Peter Pan Vakantieclub | Home], x-powered-by-header[PHP/5.2.4]
http://www.robert80.de/ [200] Lightbox, md5[e12ecd879e7a24eaf6630d0097f6d0c8], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache], title[Robert Müller 80 Freunde » Robert Müller 80 Freunde helfen], x-powered-by-header[PHP/5.2.13]
http://www.silverstripe.com/ [200] Google-Analytics-urchin[84547], JQuery, md5[25a7a15b8e40a9443ed6bd896705b7ba], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.11 (Debian) PHP/5.2.6-1+lenny2 with Suhosin-Patch mod_ssl/2.2.11 OpenSSL/0.9.8g], title[SilverStripe.com - Open Source CMS / Framework], x-powered-by-header[PHP/5.2.6-1+lenny2]
http://www.stillrunnin.com/ [200] Google-Analytics-GA[3359251], md5[bd3f631ef6075169b66c44031e526184], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.8i DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635], title[Still Runnin Magazine - Online Gearhead Ezine], x-powered-by-header[PHP/5.2.9]
http://www.silverstripe.org.pl/ [200] Google-Analytics-GA[8121843], md5[1d8badb4df0d05a72c665de18087ae2b], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 PHP/5.2.8], title[Serwis polskiej społeczności SilverStripe » SilverStripe.org.pl], x-powered-by-header[PHP/5.2.8]
http://www.thelightboxdesigns.com/ [200] Mailto, md5[763842fb73491b1ccc9090219e299ed1], server-header[Apache], title[Graphic and Web Site Design Services in Brownsville, TX, McAllen, Harlingen and the Rio Grande Valley :: The Lightbox Designs]
http://www.textiprints.com/ [200] Google-Analytics-GA[10064088], md5[65c429cfbad7a5acccddb8eda664f13c], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache], title[TextiPrints - Digital Garment Printer - Ormond Beach, Florida], x-powered-by-header[PHP/5.2.9]
http://www.tobychampion.co.uk/ [500] md5[c5ea88dc871f71751f932a8a9bed884b], server-header[Apache/2.0.63 (FreeBSD) DAV/2 SVN/1.5.2 mod_python/3.3.1 Python/2.5.1 PHP/5.2.6 with Suhosin-Patch mod_ssl/2.0.63 OpenSSL/0.9.7e-p1 mod_fastcgi/2.4.6 mod_perl/2.0.3 Perl/v5.8.8], title[GET /], x-powered-by-header[PHP/5.2.12]
http://www.upstreamgroup.com/ [200] Google-Analytics-GA[3522744], md5[085964a490aa87a8d061828ebd63f35b], meta-generator[SilverStripe 2.3.1 - http://www.silverstripe.com], server-header[Apache], title[Upstream Group: Clarity, Perspective, Knowledge], x-powered-by-header[PHP/5.2.11]
http://www.wend.nl/ [200] Google-Analytics-GA[1010482], JQuery, md5[e08dcd1b16e7210f1762b49bb45d58ff], server-header[Apache/2], title[Wend - Home], x-powered-by-header[PHP/5.2.4]
http://www.verus.com.tr/ [200] Google-Analytics-GA[7233761], md5[54bfe62b41b88b472d9739d2ed47b30f], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Apache/2.2.8 (Ubuntu) mod_python/3.3.1 Python/2.5.2 PHP/5.2.4-2ubuntu5.10 with Suhosin-Patch mod_ssl/2.2.8 OpenSSL/0.9.8g mod_perl/2.0.3 Perl/v5.8.8], title[VERUS » ETKİNLİK ÇÖZÜMLERİ » İŞ ÇÖZÜMLERİ » WEB ÇÖZÜMLERİ], x-powered-by-header[PHP/5.2.4-2ubuntu5.10]
http://www.whileyouwait.co.nz/ [200] Mailto, md5[05df26cf45f8410dea2272f6c0ff269b], meta-generator[SilverStripe 2.0 - http://www.silverstripe.com], server-header[Apache], title[While You Wait Studios - Christchurch, New Zealand - PTFOTO], x-powered-by-header[PHP/5.2.11]
http://www.kitesurfnelson.co.nz/ [200] Google-Analytics-GA[1921819], md5[f7618ec4d8da26579e5df948d80ba5b8], server-header[Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4 mod_perl/2.0.4 Perl/v5.8.8], title[Kite Surf Nelson - kitesurfing lessons, equipment sales and advice - in Nelson, New Zealand], x-powered-by-header[PHP/5.2.6]
http://www.resoba.com/ ERROR: EOF error end of file reached


Remove more incorrectly identified samples with the whatweb report
------------------------------------------------------------------

	http://belitsky.info/work/hartmann [301] md5[c112335e6a56038ca4ba4b906d6aee05], redirect-location[http://belitsky.info/work/hartmann/], server-header[Apache], title[301 Moved Permanently]
	http://belitsky.info/work/hartmann/ [200] index-of, md5[21577203b9abc6091d99203295712f0c], server-header[Apache], title[Index of /work/hartmann]

This matches the 'Index of' plugin. Loading this URL in a web browser shows that the page isn't a CMS but a directory listing, so I can delete http://belitsky.info/work/hartmann from the list.

	http://charcoalinteriors.com.au/ [200] md5[bff9a28ebdc1cdfdb80743458606df1d], server-header[Apache/2.2.13 (Unix) mod_ssl/2.2.13 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4 mod_perl/2.0.4 Perl/v5.8.8], title[Home -Charcoal Interiors], x-powered-by-header[PHP/5.2.10]

There's no way to be sure if this is SilverStripe because it has no meta generator tag.

	http://comtel.com.au/ [200] Google-Analytics-GA[1388941], probably Joomla[com_search], md5[050d35f63e2d9cd46065cde83f876989], server-header[Apache/2.0.55 (Ubuntu) PHP/5.1.2], title[Comtel - Telephone Radio & Data Systems | Comtel], x-powered-by-header[PHP/5.1.2]

This appears to be powered by the Joomla CMS. A single URL cannot be served by two CMSs at the same time, so if Joomla is present then SilverStripe cannot be. Note that other plugin matches such as JQuery identify a JavaScript library and will be found with many different CMSs, including SilverStripe. Manually testing http://comtel.com.au/administrator confirms it is Joomla.

	http://kunstforum.as/ ERROR: Socket error getaddrinfo: Name or service not known

The hostname no longer resolves, so this website doesn't exist anymore.

	http://www.arprostatecancer.org/ [200] Google-Analytics-GA[2447233], Mailto, md5[dc1a9f02efc52b1ebc72f2c8a0b03ae6], server-header[Apache/1.3.41 (Unix) PHP/5.2.6 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7a], title[Arkansas Prostate Cancer Foundation], x-powered-by-header[PHP/5.2.6]

I can't be sure if this is SilverStripe yet.

I deleted the files of the remaining websites that are identified as something other than SilverStripe.
	$ wc -l list
	49 list

Now I have just 49 SilverStripe samples. I know that some of these samples might not be SilverStripe.
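
The triage above can be partly automated. Here is a hedged Ruby sketch that reads report lines in the format shown earlier and sorts each URL into a rough bucket. It is my own illustration, not a WhatWeb feature: parse_report_line, verdict and the bucket names are hypothetical, and the sample lines are trimmed copies of lines from the report.

```ruby
# Sketch only: triage WhatWeb report lines by status and meta generator.
# All method names and bucket names are illustrative.

# Parse one report line into a hash, or return nil if it is not recognised.
def parse_report_line(line)
  line = line.strip
  # Failed requests look like: URL ERROR: message
  if (m = line.match(/\A(\S+) ERROR: (.*)\z/))
    return { url: m[1], error: m[2] }
  end
  # Successful requests look like: URL [status] plugin, plugin[value], ...
  m = line.match(/\A(\S+) \[(\d{3})\] (.*)\z/)
  return nil unless m
  generator = m[3][/meta-generator\[([^\]]*)\]/, 1]
  { url: m[1], status: m[2].to_i, generator: generator }
end

# Classify a parsed entry into a rough bucket for manual follow-up.
def verdict(entry)
  return :unreachable if entry[:error]
  return :not_ok unless entry[:status] == 200
  return :unknown if entry[:generator].nil?
  entry[:generator] =~ /silverstripe/i ? :silverstripe : :other_cms
end

# Trimmed copies of lines from the report above.
REPORT = <<~'REPORT'
  http://kunstforum.as/ ERROR: Socket error getaddrinfo: Name or service not known
  http://www.clockwork.co.nz/ [200] md5[70e4b614d4ae85161014d05939ebd073], server-header[Apache], title[clockwork.co.nz]
  http://www.executivemediasearch.com/ [404] md5[588da43361637cd97f3096ab9ce70183], server-header[Apache], title[Error 404 - Not found]
  http://www.infinitestillness.ie/ss/ [200] Mailto, md5[77b579a3d6752d45bcd8718d906fe5c3], meta-generator[SilverStripe - http://www.silverstripe.com], server-header[Zeus/4.3]
  http://www.loguitos.com/ [200] Joomla[1.0], maybe Mambo, md5[0eed48ee56f6a2b784ea010b1bfa15b8], meta-generator[Joomla! - Copyright (C) 2005 - 2007 Open Source Matters. All rights reserved.]
REPORT

REPORT.each_line do |line|
  entry = parse_report_line(line)
  puts "#{verdict(entry)}\t#{entry[:url]}" if entry
end
```

The :unknown bucket still needs a human. A site such as comtel.com.au has no generator tag, and only the Joomla plugin match and a manual check revealed what it was.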



Use find-common-stuff to automatically identify common strings in the samples
-----------------------------------------------------------------------------
I can use a simple tool called find-common-stuff to find certain types of common strings in samples.

find-common-stuff will identify and count the occurrences of:
	* complete HTML tags
	* strings enclosed in double quotes

It has a threshold setting that adjusts how many uncommon things are displayed.
	
	$  ../../find-common-stuff 
	Usage: find-common-stuff FILES
	--threshold, -t	The lowest % of files an item occurs in to display. Eg. 0.25 and 0.50

	$ ../../find-common-stuff *html
	imported 49 files
	counted 3324 tags
	[["<script type=\"text/javascript\">", 35],
	 ["<![endif]-->", 30],
	 ["<meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />",
	  28],
	 ["<style type=\"text/css\">", 21],
	 ["<!--[if IE 7]>", 21],
	 ["<!--[if IE 6]>", 21],
	 ["<meta name=\"generator\" http-equiv=\"generator\" content=\"SilverStripe - http://www.silverstripe.com\" />",
	  20],
	 ["<div id=\"footer\">", 16],
	 ["<link rel=\"shortcut icon\" href=\"/favicon.ico\" />", 16],
	 ["<div class=\"clear\">", 16],
	 ["<meta http-equiv=\"Content-Language\" content=\"en-US\"/>", 14],
	 ["<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">",
	  14],
	 ["<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">",
	  14],
	 ["<div class=\"typography\">", 14],
	 ["<meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" >",
	  13]]
	counted 1874 quoted texts
	[["\"text/javascript\"", 35],
	 ["\"link\"", 24],
	 ["\"text/css\"", 23],
	 ["\"typography\"", 19],
	 ["\"current\"", 19],
	 ["\"clear\"", 18],
	 ["\"footer\"", 17],
	 ["\"en\"", 16],
	 ["\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\"", 14],
	 ["\"http://www.w3.org/TR/html4/strict.dtd\"", 14],
	 ["\"header\"", 14]]

In this case the automated tool, find-common-stuff, failed to identify anything I hadn't already noticed while reading the HTML source.
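
To show the idea behind this kind of tool, here is a minimal Ruby sketch of counting complete HTML tags and double-quoted strings across samples. It is not the source of find-common-stuff. The names count_common and common_items are hypothetical, and I am assuming that each item is counted once per file and that the threshold is a fraction of the number of files. The three tiny documents are made-up stand-ins for real samples.

```ruby
# Sketch only: count strings that recur across sample documents.
# This is NOT the find-common-stuff source; names and behaviour are assumptions.

TAG    = /<[^>]+>/   # a complete HTML tag
QUOTED = /"[^"]*"/   # a string enclosed in double quotes

# Count in how many documents each match of pattern appears (once per document).
def count_common(documents, pattern)
  counts = Hash.new(0)
  documents.each do |doc|
    doc.scan(pattern).uniq.each { |item| counts[item] += 1 }
  end
  counts
end

# Keep items found in at least threshold (0.0 to 1.0) of the documents,
# most common first, ties broken alphabetically.
def common_items(documents, pattern, threshold)
  minimum = documents.size * threshold
  count_common(documents, pattern)
    .select { |_, n| n >= minimum }
    .sort_by { |item, n| [-n, item] }
end

# Made-up stand-ins for real samples.
DOCS = [
  '<meta name="generator" content="SilverStripe - http://www.silverstripe.com" /><div class="typography">',
  '<div class="typography"><div id="footer">',
  '<div id="footer"><div class="typography">'
]

# With real samples: docs = Dir.glob("*.html").map { |f| File.read(f) }
puts "common tags:"
common_items(DOCS, TAG, 0.5).each { |item, n| puts "#{n}\t#{item}" }
puts "common quoted strings:"
common_items(DOCS, QUOTED, 0.5).each { |item, n| puts "#{n}\t#{item}" }
```

Lowering the threshold shows rarer items, which is where CMS-specific strings tend to hide once generic markup such as script and style tags has been filtered out by eye.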


Analyse HTTP headers and cookies
--------------------------------
The .meta files contain the HTTP headers and any cookies set by the websites.
Use the cat command to display all of them.


	$ cat *meta
	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:04:35 GMT
	Server: Apache/2.2.3 (Debian) PHP/5.2.0-8+etch16 mod_ssl/2.2.3 OpenSSL/0.9.8c
	X-Powered-By: PHP/5.2.0-8+etch16
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Vary: Accept,User-Agent,Accept-Encoding
	Content-Type: text/html; charset=utf-8
	Via: 1.1 bc2
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=4d463f54abb74031c117569ca3aa3c61; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:04:41 GMT
	Server: Apache/2.2.13 (Unix) mod_ssl/2.2.13 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4 mod_perl/2.0.4 Perl/v5.8.8
	X-Powered-By: PHP/5.2.10
	Content-Type: text/html
	Via: 1.1 bc6
	Connection: Keep-Alive

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:04:30 GMT
	Server: Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 PHP/5.2.6-1+lenny6 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ruby/1.2.6 Ruby/1.8.7(2008-08-11) mod_ssl/2.2.9 OpenSSL/0.9.8g
	X-Powered-By: PHP/5.2.6-1+lenny6
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Vary: Accept-Encoding
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc1
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=1b627d14a21e4475c49e7089c01e11b8; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:04:49 GMT
	Server: Apache/1.3.41 (Unix) PHP/5.2.6 mod_psoft_traffic/0.2 mod_ssl/2.8.31 OpenSSL/0.9.7a mod_macro/1.1.2
	X-Powered-By: PHP/5.2.6
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc6
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=kibbf9utq9decif304ml5lgu01; path=/

	HTTP/1.1 200 OK
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
	Date: Thu, 04 Mar 2010 04:05:09 GMT
	Server: Apache/2.2
	Content-Type: text/html; charset="utf-8"
	Pragma: no-cache
	Via: 1.1 bc2
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=tmufihalkg02p4t0vi3mdf3k94; path=/
	Set-Cookie: X-Mapping-caklakng=293486F930CB5202C46B0E5EFB41C64E; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:05:29 GMT
	Server: Apache/2.2.3 (Debian) DAV/2 PHP/5.2.6-1+lenny3 with Suhosin-Patch mod_ssl/2.2.3 OpenSSL/0.9.8g
	X-Powered-By: PHP/5.2.6-1+lenny3
	Set-Cookie: PHPSESSID=1b0b76cadaefb10b5973b8bb622d9669; path=/
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:05:29 GMT; path=/
	Vary: Accept,Accept-Encoding
	Content-Type: text/html; charset=utf-8

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:05:35 GMT
	Server: Apache/1.3.41 (Unix) mod_ssl/2.8.31 OpenSSL/0.9.7a PHP/5.2.8 mod_perl/1.29 FrontPage/5.0.2.2510
	X-Powered-By: PHP/5.2.8
	Expires: Mon, 21 Jun 2010 11:11:02 GMT
	Cache-Control: max-age=86400, must-revalidate
	Pragma: 
	Last-Modified: Sat, 14 Nov 2009 21:00:08 GMT
	Vary: Accept
	Content-Type: text/html; charset=utf-8
	Via: 1.1 bc6
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=b255ace17a48eadebd5de6ec6b7f3dc6; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:05:35 GMT; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:05:37 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.9
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Vary: Accept-Encoding
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc5
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=5nfa7b62smslu5qo660bmtmk32; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:05:34 GMT
	Server: Apache/1.3.41 (Unix) PHP/5.2.6 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7a
	X-Powered-By: PHP/5.2.6
	Content-Type: text/html
	Via: 1.1 bc7
	Connection: Keep-Alive

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:05:46 GMT
	Server: Apache/2.0.54 (Debian GNU/Linux) PHP/5.2.3-0.dotdeb.0 with Suhosin-Patch mod_ssl/2.0.54 OpenSSL/0.9.7e
	X-Powered-By: PHP/5.2.3-0.dotdeb.0
	Set-Cookie: PHPSESSID=d627a8e8613eecd6b0c2a41b0ec79dd4; path=/
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Content-Type: text/html; charset="utf-8"

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:05:50 GMT
	Server: Apache/1.3.41 (Unix) FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7m
	Cache-Control: no-cache, max-age=0, must-revalidate
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	X-Powered-By: PHP/5.2.11
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc7
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=9508d9fa7a9d869e745c286ff9799d40; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:05:54 GMT
	Server: Apache/2.2.3 (Debian) PHP/5.2.0-8+etch16 mod_ssl/2.2.3 OpenSSL/0.9.8c
	X-Powered-By: PHP/5.2.0-8+etch16
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Vary: Accept,User-Agent,Accept-Encoding
	Content-Type: text/html; charset=utf-8
	Via: 1.1 bc2
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=8d9b19f46268342f8a0823bd79599cd2; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:06:01 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.10
	Expires: Sat, 13 Mar 2010 04:54:33 GMT
	Cache-Control: max-age=86400, must-revalidate
	Pragma: 
	Last-Modified: Tue, 23 Feb 2010 03:17:29 GMT
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc7
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=4askloa60f26kvftt1cfiao344; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:06:01 GMT; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:06:13 GMT
	Server: Apache/2.2.12 (Unix) mod_ssl/2.2.12 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 PHP/5.2.11 mod_perl/2.0.4 Perl/v5.8.8
	X-Powered-By: PHP/5.2.11
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform
	Pragma: no-cache
	Vary: Accept,Accept-Encoding,User-Agent
	Content-Type: text/html; charset=utf-8
	X-Cache: MISS from sv25.byethost25.org
	Via: 1.0 sv25.byethost25.org:80 (squid/2.7.STABLE7), 1.1 bc7
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=f0c426eaeef84f38c04624a59a98edd4; path=/demo/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:06:17 GMT
	Server: Apache
	Expires: Thu, 29 Oct 1998 17:04:19 GMT
	Last-Modified: Thu, 04 Mar 2010 04:06:17 GMT
	Cache-Control: no-store, no-cache, must-revalidate
	Cache-Control: post-check=0, pre-check=0
	Pragma: no-cache
	Vary: Accept-Encoding
	Content-Type: text/html; charset=UTF-8
	Via: 1.1 bc2
	Connection: Keep-Alive
	Set-Cookie: clockwork_co_nz=4040a20413b27678f6a5b225bd85f612; expires=Tue, 03-Mar-2015 04:06:17 GMT; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:06:46 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.9
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc6
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=6e9a4bb3b41cdd968202a19fbb91b3c0; path=/

	HTTP/1.1 404 Not Found
	Date: Thu, 04 Mar 2010 04:06:48 GMT
	Server: Apache
	Content-Type: text/html
	Via: 1.1 bc1
	Content-Length: 0
	Connection: Keep-Alive

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:06:49 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.9
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc2
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=ec0a2031ab05b371baf412429859bbdf; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:07:01 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.11
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Vary: Accept,Accept-Encoding
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset=utf-8
	Via: 1.1 bc4
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=5e61do18cag74urkrq65eqkcf3; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:07:07 GMT
	Server: Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.8i DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
	X-Powered-By: PHP/5.2.9
	Cache-Control: max-age=86400, must-revalidate
	Pragma: 
	Expires: Wed, 14 Mar 2012 07:10:47 GMT
	Vary: Accept
	Last-Modified: Fri, 22 Feb 2008 01:03:27 GMT
	Content-Type: text/html; charset=utf-8
	Via: 1.1 bc3
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=3fb099aeb12584c9a2a308667e960553; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:07:07 GMT; path=/

	HTTP/1.1 301 Moved Permanently
	Date: Thu, 04 Mar 2010 04:07:14 GMT
	Location: http://www.fuel.ie/silverstripe/
	Server: Zeus/4.3
	Content-Type: text/html
	Via: 1.1 bc3
	Content-Length: 212
	Connection: Keep-Alive
	Set-Cookie: X-Mapping-enlokcai=E05A570E7E395D6AB72BEA6FC2D4D8D4; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:07:19 GMT
	Server: Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8l DAV/2 mod_auth_passthrough/2.1 FrontPage/5.0.2.2635
	X-Powered-By: PHP/5.2.11
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Vary: Accept-Encoding
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc1
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=05b506d2ba68f9057269c4ff70ff9643; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:07:11 GMT
	Server: Apache/1.3.41 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7a
	Cache-Control: no-cache, max-age=0, must-revalidate
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	X-Powered-By: PHP/5.2.6
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc5
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=f7aec63e8640dcd5d42a0476301a065f; path=/

	HTTP/1.1 301 OK
	Date: Thu, 04 Mar 2010 04:07:27 GMT
	Server: Apache/2.2.9 (Debian) DAV/2 SVN/1.5.1 PHP/5.2.6-1+lenny6 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
	Pragma: no-cache
	Location: /start
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc1
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=0e75b5e83a844f8902e361394c39c39f; path=/

	HTTP/1.1 301 Moved Permanently
	Date: Thu, 04 Mar 2010 04:07:36 GMT
	Location: http://www.infinitestillness.ie/ss/
	Server: Zeus/4.3
	Content-Type: text/html
	Via: 1.1 bc3
	Content-Length: 212
	Connection: Keep-Alive
	Set-Cookie: X-Mapping-enlokcai=E05A570E7E395D6AB72BEA6FC2D4D8D4; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:07:48 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.9
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Vary: Accept-Encoding
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc6
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=75cd9c74439efc3bca8a6f6a8fd429f4; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:07:48 GMT
	Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4 mod_perl/2.0.4 Perl/v5.8.8
	X-Powered-By: PHP/5.2.6
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc3
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=e5fa1ae0a7149cb25f36f4cf63e55603; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:07:54 GMT
	Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.4 with Suhosin-Patch
	X-Powered-By: PHP/5.2.4-2ubuntu5.4
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc7
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=4c256b274d52c2746423786faadd30c4; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:08:00 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.9
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc6
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=bb34c4a17da632e7aefab6246e7704ab; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:08:06 GMT
	Server: Apache/1.3 (Unix) mod_ssl/2.8.28 OpenSSL/0.9.8f AuthPG/1.3 FrontPage/5.0.2.2635
	Cache-Control: no-cache, max-age=0, must-revalidate
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	X-Powered-By: PHP/5.2.9
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc2
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=416bd54449181a0b0839ae7dcda95902; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:08:06 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.11
	Cache-Control: max-age=86400, must-revalidate
	Pragma: 
	Expires: Sat, 09 Oct 2010 04:35:26 GMT
	Vary: Accept
	Set-Cookie: PHPSESSID=c977eedf73ae811cabbe26bc8e0313c5; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:08:06 GMT; path=/
	Last-Modified: Tue, 28 Jul 2009 03:40:46 GMT
	Content-Type: text/html; charset=utf-8

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:08:11 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.6-1+lenny6
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc5
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=e0a6ca2c8e29af432c8a1266e992e129; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:08:17 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.9
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc6
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=0423b5407d22301ff1e4b3d2f41b9970; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:08:14 GMT
	Server: Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4
	X-Powered-By: PHP/5.2.11
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc5
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=6ad778ad438fca511d10c7f438d181dd; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:08:20 GMT
	Server: Apache
	Last-Modified: Mon, 17 Aug 2009 04:07:41 GMT
	ETag: "7c8c026-3aa-8b0b3d40"
	Accept-Ranges: bytes
	Vary: Accept-Encoding
	Content-Type: text/html
	Via: 1.1 bc1
	Content-Length: 938
	Connection: Keep-Alive

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:08:20 GMT
	Server: Apache/2
	X-Powered-By: PHP/5.2.12
	Set-Cookie: PHPSESSID=78e42396dc632a12633ef0f065b979be; path=/
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Vary: Accept,Accept-Encoding,User-Agent
	Content-Type: text/html; charset=utf-8

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:09:47 GMT
	Server: Apache/2
	X-Powered-By: PHP/5.2.4
	Expires: Fri, 19 Mar 2010 21:22:18 GMT
	Cache-Control: max-age=86400, must-revalidate
	Pragma: 
	Last-Modified: Tue, 16 Feb 2010 10:57:16 GMT
	Vary: Accept,Accept-Encoding,User-Agent
	Content-Type: text/html; charset=utf-8
	Via: 1.1 bc7
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=2055abb25c60b3a28fc9810dd601fe13; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:09:47 GMT; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:08:24 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.1
	Set-Cookie: PHPSESSID=b6cc6c9f2dc42acb0dca07bd778b031c; path=/
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Content-Type: text/html; charset="utf-8"

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:28:26 GMT
	Server: Apache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	X-Powered-By: PHP/5.2.13
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc3
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=40555488da9765a46c63a96625ad28b6; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:28:27 GMT
	Server: Apache/2.2.11 (Debian) PHP/5.2.6-1+lenny2 with Suhosin-Patch mod_ssl/2.2.11 OpenSSL/0.9.8g
	X-Powered-By: PHP/5.2.6-1+lenny2
	Set-Cookie: PHPSESSID=1f41c7f80dfad6ef9e9594963612472f; path=/
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Vary: Accept,Accept-Encoding
	Content-Type: text/html; charset=utf-8

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:28:13 GMT
	Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 PHP/5.2.8
	X-Powered-By: PHP/5.2.8
	Content-Type: text/html
	Via: 1.1 bc5
	Connection: Keep-Alive

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:28:35 GMT
	Server: Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.8i DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
	X-Powered-By: PHP/5.2.9
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc6
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=f0ca06da67663004b31d8f14f721daab; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:28:40 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.9
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc6
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=98f72bf1ed2792b143f47308b3b5f099; path=/

	HTTP/1.1 403 Forbidden
	Date: Thu, 04 Mar 2010 04:28:42 GMT
	Server: Apache
	Content-Type: text/html; charset=iso-8859-1
	Via: 1.1 bc5
	Connection: Keep-Alive

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:28:45 GMT
	Server: Apache/2.0.63 (FreeBSD) DAV/2 SVN/1.5.2 mod_python/3.3.1 Python/2.5.1 PHP/5.2.6 with Suhosin-Patch mod_ssl/2.0.63 OpenSSL/0.9.7e-p1 mod_fastcgi/2.4.6 mod_perl/2.0.3 Perl/v5.8.8
	X-Powered-By: PHP/5.2.12
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Pragma: no-cache
	Cache-Control: no-cache, max-age=0, must-revalidate
	Vary: Accept-Encoding
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc7
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=860147a02be38690829b458077eabd7c; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:28:47 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.11
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc7
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=t4962e1dunk93crufg8qil4375; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:28:53 GMT
	Server: Apache/2.2.8 (Ubuntu) mod_python/3.3.1 Python/2.5.2 PHP/5.2.4-2ubuntu5.10 with Suhosin-Patch mod_ssl/2.2.8 OpenSSL/0.9.8g mod_perl/2.0.3 Perl/v5.8.8
	X-Powered-By: PHP/5.2.4-2ubuntu5.10
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-cache, max-age=0, must-revalidate
	Pragma: no-cache
	Content-Type: text/html; charset="utf-8"
	Via: 1.1 bc7
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=5a5baaf3c09bd237859f534039e52f03; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:30:20 GMT
	Server: Apache/2
	X-Powered-By: PHP/5.2.4
	Expires: Tue, 08 Jun 2010 22:50:34 GMT
	Cache-Control: max-age=86400, must-revalidate
	Pragma: 
	Last-Modified: Fri, 27 Nov 2009 10:10:06 GMT
	Vary: Accept,Accept-Encoding,User-Agent
	Content-Type: text/html; charset=utf-8
	Via: 1.1 bc7
	Connection: Keep-Alive
	Set-Cookie: PHPSESSID=2089013901ea310463177f049f4a92b4; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:30:20 GMT; path=/

	HTTP/1.1 200 OK
	Date: Thu, 04 Mar 2010 04:28:55 GMT
	Server: Apache
	X-Powered-By: PHP/5.2.11
	Expires: Thu, 19 Nov 1981 08:52:00 GMT
	Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
	Pragma: no-cache
	Vary: Accept
	Set-Cookie: PHPSESSID=b63d92df808dc2f1edafa1c1e8830409; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:28:56 GMT; path=/
	Content-Type: text/html; charset=utf-8


I noticed two unusual HTTP headers: the expiry date in 1981 and the PastVisitor cookie.

		Expires: Thu, 19 Nov 1981 08:52:00 GMT
		Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:28:56 GMT; path=/

Googling for 'cookie "PastVisitor"' turns up results referring directly to SilverStripe and results referring to websites running SilverStripe. This cookie name, while generic sounding, appears to be used only by SilverStripe and makes a good plugin pattern.

Googling for 'Thu, 19 Nov 1981 08:52:00 GMT' turns up many results. PHP sends this date in the Expires header when it starts a session with caching disabled, so it points to PHP in general and is not useful in identifying SilverStripe.
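
Comparing headers by eye can be partly automated. The following is a minimal Ruby sketch, not part of WhatWeb, that counts how many responses in a dump of header blocks set each cookie name. The helper name cookie_name_counts and the shortened sample dump (three responses taken from the listing above) are assumptions made for illustration.

```ruby
# Hypothetical helper, not part of WhatWeb: count how many responses
# in a header dump set each cookie name.
def cookie_name_counts(dump)
  counts = Hash.new(0)
  # Responses in the dump are separated by blank lines
  dump.split(/\n\s*\n/).each do |block|
    # Collect each cookie name once per response
    names = block.scan(/^\s*Set-Cookie:\s*([^=;\s]+)=/i).flatten.uniq
    names.each { |name| counts[name] += 1 }
  end
  counts
end

SAMPLE_HEADERS = <<~EOS
  HTTP/1.1 200 OK
  Server: Apache
  Set-Cookie: PHPSESSID=c977eedf73ae811cabbe26bc8e0313c5; path=/
  Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:08:06 GMT; path=/

  HTTP/1.1 200 OK
  Server: Apache
  Set-Cookie: PHPSESSID=e0a6ca2c8e29af432c8a1266e992e129; path=/

  HTTP/1.1 301 Moved Permanently
  Server: Zeus/4.3
  Set-Cookie: X-Mapping-enlokcai=E05A570E7E395D6AB72BEA6FC2D4D8D4; path=/
EOS

# Print the cookie names, most frequent first
cookie_name_counts(SAMPLE_HEADERS).sort_by { |_, n| -n }.each do |name, n|
  puts "#{n}\t#{name}"
end
```

A name such as PHPSESSID that shows up almost everywhere belongs to the language or the server. A name that shows up in only some samples, like PastVisitor, is worth searching for.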


Read more HTML source
---------------------
Select some more HTML files to read, looking for unusual patterns.

grep all the HTML files simultaneously, looking for patterns:
Examples:
	$ grep css *html
	
	$ grep javascript *html
	

Many of the samples have a css file called typography.css. This by itself isn't uncommon enough to make a plugin match. Even if we search for themes/.*/css/typography.css it's still not uncommon enough.

	<link rel="stylesheet" type="text/css" href="http://www.lisamarieelliott.com/themes/lisamarieelliott/css/typography.css?m=1254246770" />

The -A parameter to the grep command is used to display lines after the matched line. Using this we can see the lines directly after layout.css.

	$ grep -A 2 layout.css *html
	customcanvas.fritzandandre.com-.html:<link rel="stylesheet" type="text/css" href="http://customcanvas.fritzandandre.com/themes/blueplanet/css/layout.css?m=1254524509" />
	customcanvas.fritzandandre.com-.html-<link rel="stylesheet" type="text/css" href="http://customcanvas.fritzandandre.com/themes/blueplanet/css/typography.css?m=1254524509" />
	customcanvas.fritzandandre.com-.html-<link rel="stylesheet" type="text/css" href="http://customcanvas.fritzandandre.com/themes/blueplanet/css/form.css?m=1254524509" />
	--
	hungryhearts.no.html:	<link rel="stylesheet" href="themes/hh/css/layout.css" type="text/css">
	hungryhearts.no.html-	<link rel="stylesheet" href="themes/hh/css/form.css" type="text/css">
	hungryhearts.no.html-	<link rel="stylesheet" href="themes/hh/javascript/fancybox/jquery.fancybox.css" type="text/css" media="screen">
	--
	maungataniwha.co.nz-.html:<link rel="stylesheet" type="text/css" href="http://maungataniwha.co.nz/themes/maungataniwha/css/layout.css?m=1265149666" />
	maungataniwha.co.nz-.html-<link rel="stylesheet" type="text/css" href="http://maungataniwha.co.nz/themes/maungataniwha/css/typography.css?m=1265149666" />
	maungataniwha.co.nz-.html-<link rel="stylesheet" type="text/css" href="http://maungataniwha.co.nz/themes/maungataniwha/css/form.css?m=1265149668" />
	--
	weonline.in.html:	<link rel="stylesheet" href="themes/weonline/css/layout.css" type="text/css" media="screen" />
	weonline.in.html-	
	weonline.in.html-	<!--[if IE 6]>
	--
	www.benpearce.co.nz-.html:			<link href="themes/main/css/layout.css" rel="stylesheet" type="text/css" />
	www.benpearce.co.nz-.html-			<link href="themes/main/css/typography.css" rel="stylesheet" type="text/css" />
	www.benpearce.co.nz-.html-			<link href="themes/main/css/form.css" rel="stylesheet" type="text/css" />
	--
	www.bradyinc.com-.html:	<link rel="stylesheet" type="text/css" href="http://www.bradyinc.com/themes/bradyassociates/css/layout.css?m=1267052060" />
	www.bradyinc.com-.html-<link rel="stylesheet" type="text/css" href="http://www.bradyinc.com/themes/bradyassociates/css/typography.css?m=1266644617" />
	www.bradyinc.com-.html-<link rel="stylesheet" type="text/css" href="http://www.bradyinc.com/themes/bradyassociates/css/form.css?m=1266644611" />
	--

layout.css itself is not uncommon enough to make a plugin match. However, many of the samples have at least 3 css files named layout.css, typography.css and form.css. The use of these names is not exclusive to SilverStripe and is considered best practice for making CSS frameworks, but the order of their appearance combined with the folder structure is unique enough for a 'probable' plugin match.

	<link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/layout.css?m=1266347738" />
	<link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/typography.css?m=1266347623" />
	<link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/form.css?m=1247030621" />


Earlier I identified the format /assets/galleries/xxxx/_resampled/xxxx.jpg as worthy of investigation.

	$ grep -o 'src="/assets[^"]*' *html
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-BAL-Busy2.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-Cheers.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-COV-BarBusy.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-FolkDrinking1.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-FolkDrinkingTWINS.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-HAM-Busy2.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-HAM-BusyGirls.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-HAM-GirlDrinking.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-PUT-HappyHour.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-ShakeShake.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-SOHO-BarBusy2.jpg
	beatone.co.uk-.html:src="/assets/Banner-Images/Home/_resampled/croppedimage626168-SOHO-BarMixing.jpg
	beatone.co.uk-.html:src="/assets/Widgets/_resampled/croppedimage158220-book-a-party.jpg
	beatone.co.uk-.html:src="/assets/Widgets/_resampled/SetWidth182-book-a-party-title.gif
	beatone.co.uk-.html:src="/assets/Widgets/_resampled/croppedimage158220-happy-hour.jpg
	beatone.co.uk-.html:src="/assets/Widgets/_resampled/SetWidth182-happy-hour-title.gif
	customcanvas.fritzandandre.com-.html:src="/assets/Banners/blueplanetvespa.jpg
	customcanvas.fritzandandre.com-.html:src="/assets/Banners/scooter5.jpg
	customcanvas.fritzandandre.com-.html:src="/assets/Banners/scooterad4.jpg
	customcanvas.fritzandandre.com-.html:src="/assets/Banners/traffic.jpg
	customcanvas.fritzandandre.com-.html:src="/assets/Banners/badboy.jpg
	hungryhearts.no.html:src="/assets/Uploads/_resampled/croppedimage177207-hungry-heartsindex5.jpg
	hungryhearts.no.html:src="/assets/Uploads/_resampled/croppedimage8075-kvad-cropst2web2.jpg
	hungryhearts.no.html:src="/assets/Uploads/_resampled/croppedimage8075-IMG6686.jpg
	hungryhearts.no.html:src="/assets/Uploads/_resampled/croppedimage8075-henri4.jpg
	hungryhearts.no.html:src="/assets/Uploads/_resampled/croppedimage8075-bil4.jpg
	...
	
	$ grep -lo 'src="/assets.*_resampled' *html | wc -l
	13

The pattern appears in only 13 of the samples. At first it doesn't appear to be a distinctive match, but a Google query for "/assets/ _resampled/" returned almost entirely SilverStripe websites.



6. Review of unique patterns identified
=======================================

Pattern 1 - Meta generator tag
-------------------------------

Examples:
	$ grep -hi 'name="generator' *html | sed  's/^[ \t]*//g' | sort -u
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" >
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" />
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.0 - http://www.silverstripe.com" />
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" >
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" />
	<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >
	<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" />

Of the patterns found, the meta generator tag is the one most likely to be removed by a web developer. When present, it sometimes includes the version number, which is useful.
I will give this pattern a certainty of 100%.


Pattern 2 - Cookie PastVisitor
-------------------------------
Googling for 'cookie "PastVisitor"' turns up results referring directly to SilverStripe and results referring to websites that turn out to be running SilverStripe. This cookie name, while generic sounding, appears to be used only by SilverStripe and will make a good plugin match.

Examples:
	$ grep -h PastV *meta	
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:05:29 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:05:35 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:06:01 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:07:07 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:08:06 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:09:47 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:30:20 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:28:56 GMT; path=/

I will give this pattern a certainty of 100% because I couldn't find any examples of a non-SilverStripe website using the cookie name.


Pattern 3 - 3 CSS files, layout.css, typography.css and form.css
-----------------------------------------------------------------
Many of the samples have at least 3 css files named layout.css, typography.css and form.css. The use of these names is not exclusive to SilverStripe and is considered best practice for making CSS frameworks, but the order of their appearance combined with the folder structure is unique enough for a 'probable' plugin match.

	<link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/layout.css?m=1266347738" />
	<link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/typography.css?m=1266347623" />
	<link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/form.css?m=1247030621" />

Not all samples have the ?m= after the css filename. Here are examples without the ?m=
	$ grep -h -A 2 layout.css plugin-development/tests/silverstripe/*html |fgrep -v "?m="

	<link rel="stylesheet" type="text/css" href="themes/dcsd/css/layout.css" />
	<link rel="stylesheet" type="text/css" href="themes/dcsd/css/form.css" />

	<link rel="stylesheet" type="text/css" href="tutorial/css/layout.css" >
	<link rel="stylesheet" type="text/css" href="tutorial/css/typography.css" >
	<link rel="stylesheet" type="text/css" href="tutorial/css/form.css" >

	<link rel="stylesheet" href="/themes/firstgalaxies/css/layout.css" type="text/css">
	<link rel="stylesheet" href="/themes/firstgalaxies/css/typography.css" type="text/css">
	<link rel="stylesheet" href="/themes/firstgalaxies/css/form.css" type="text/css">

I will give this pattern a certainty of 75% because these three CSS filenames are considered best practice.
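
To make this pattern concrete, here is a Ruby sketch of one way it could be written as a regular expression. It is an illustration only, not the match used in the finished plugin. The constant names and the expression itself are assumptions. It requires layout.css, typography.css and form.css to be linked in that order from the same css/ folder, and it tolerates the optional ?m= suffix.

```ruby
# Assumed expression for illustration only. The backreference \1 forces
# all three files to come from the same css/ folder, in this order.
CSS_TRIO = %r{
  href="([^"]*/css/)layout\.css[^"]*"
  [^<]*<link[^>]*href="\1typography\.css[^"]*"
  [^<]*<link[^>]*href="\1form\.css[^"]*"
}x

ALL_THREE = <<~HTML
  <link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/layout.css?m=1266347738" />
  <link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/typography.css?m=1266347623" />
  <link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/form.css?m=1247030621" />
HTML

NO_SUFFIX = <<~HTML
  <link rel="stylesheet" href="/themes/firstgalaxies/css/layout.css" type="text/css">
  <link rel="stylesheet" href="/themes/firstgalaxies/css/typography.css" type="text/css">
  <link rel="stylesheet" href="/themes/firstgalaxies/css/form.css" type="text/css">
HTML

ONLY_TWO = <<~HTML
  <link rel="stylesheet" type="text/css" href="themes/dcsd/css/layout.css" />
  <link rel="stylesheet" type="text/css" href="themes/dcsd/css/form.css" />
HTML

# Report whether each sample contains the three files in order
[ALL_THREE, NO_SUFFIX, ONLY_TWO].each { |html| puts CSS_TRIO.match?(html) }
```

The first two samples match. The dcsd sample, which links layout.css and form.css but not typography.css, does not. A stricter or looser expression is a judgment call that depends on how many false positives are acceptable for a 75% match.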


Pattern 4 - image assets url structure
-----------------------------------------------------------------
<img src="/assets/.*/_resampled/.*.jpg"

Examples:
	<img src="/assets/magazine/sr6/toc/_resampled/croppedimage220165-02-Two Vietnam Vets.jpg" alt="" />
	<img class="left noborder" src="/assets/Uploads/services/icons/fundraisers-icon.jpg" alt="Fundraisers Icon" />

At first it doesn't appear to be a distinctive match, but a Google query for "/assets/ _resampled/" returned almost entirely SilverStripe websites.
I will give this pattern a certainty of 75% because I found at least 1 non-SilverStripe example with Google.
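
The shorthand pattern can be tried out in plain Ruby before it goes into a plugin. The expression below is an assumed variant for illustration, not the one used in the finished plugin. It does not insist on a .jpg extension, because the samples also contain resampled .gif files, and it allows a host name before /assets/.

```ruby
# Assumed expression for illustration; the finished plugin may differ.
RESAMPLED = %r{<img[^>]*src="[^"]*/assets/[^"]*/_resampled/[^"]*"}i

IMG_SAMPLES = [
  '<img src="/assets/magazine/sr6/toc/_resampled/croppedimage220165-02-Two Vietnam Vets.jpg" alt="" />',
  '<img class="left noborder" src="/assets/Uploads/services/icons/fundraisers-icon.jpg" alt="Fundraisers Icon" />'
]

# Show which of the sample tags the expression accepts
IMG_SAMPLES.each do |html|
  puts "#{RESAMPLED.match?(html)}\t#{html[0, 60]}"
end
```

Only the first tag matches. The second tag lives under /assets/ but has no _resampled folder, so on its own it says little about the CMS.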


7. Write the plugin
=======================================

Build on the plugin template
---------------------------------------
The plugin template is found in the plugin-development/ folder. Copy this into the plugins/ folder with the name of your choosing. All plugin names have the .rb extension.

The template:

	Plugin.define "Plugin-Template" do
	author "Enter Your Name"
	version "0.1"
	description "Describe what the plugin identifies. Include the homepage of the software package"
	examples %w| include-some.net example-websites.com here.com |

	# a comment block here is a good place to make notes for yourself and others

	# Match types include: regexp, text, ghdb
	# Matches are enclosed in {} brackets and separated by commas
	matches [
	{:name=>"a brief description of the match, eg. powered by in footer",
	:probability=>100, # this isn't a real probability. 100 is certain, 75 is probably and 25 is maybe
	:regexp=>/This page was generated by <a href="http:\/\/www.genericcms.com\/en\/products\/generic-cms\/">Generic CMS<\/a>/ },

	{:name=>"title",
	:probability=>75,
	:text=>"<title>Generic Homepage</title>" }
	]
	end


Fill in the plugin name, author, version, description and examples fields. The examples are a Ruby %w array literal, so the entries are separated by whitespace. The http:// prefix is optional.
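
As a quick illustration of the %w syntax, which is plain Ruby and independent of WhatWeb, the literal below produces an ordinary array of strings. The constant name EXAMPLE_SITES is made up for this sketch and the host names are the placeholders from the template.

```ruby
# %w splits on whitespace; leading and trailing spaces are ignored
EXAMPLE_SITES = %w| include-some.net example-websites.com here.com |

p EXAMPLE_SITES         # => ["include-some.net", "example-websites.com", "here.com"]
p EXAMPLE_SITES.length  # => 3
```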

	Plugin.define "SilverStripe" do
	author "Andrew Horton"
	version "0.1"
	description "SilverStripe is an open source CMS written in PHP. It can run on Apache, IIS or lighttpd. Homepage: http://www.silverstripe.com"

	examples %w|http://beatone.co.uk/ http://charcoalinteriors.com.au/ http://customcanvas.fritzandandre.com/ http://hungryhearts.no http://maungataniwha.co.nz/ http://unbounded.org/ http://victoriaoruwari.com/ http://weonline.in  http://www.arprostatecancer.org/ http://www.benpearce.co.nz/ http://www.bradyinc.com/ http://www.cavendishimaging.com/ http://www.chapmansurfboards.com/ http://www.choidoco.com/demo/ http://www.clockwork.co.nz/ http://www.enamaine.org/ http://www.executivemediasearch.com/ http://www.fairtradenap.net/ http://www.firstgalaxies.org/ http://www.frussian.com.ar/ http://www.fuel.ie/silverstripe http://www.gsbc.edu/ http://www.holistichealth.com/ http://www.hutmacherin.com/ http://www.infinitestillness.ie/ss http://www.intandemtheatre.org/ http://www.kitesurfnelson.co.nz/ http://www.latenightdisco.com/ http://www.lisamarieelliott.com/ http://www.maklerservice-greiz.de/ http://www.moerakihavenmotel.co.nz/ http://www.monjasantner.de/ http://www.moonlitekustoms.com/ http://www.moto-racepaint.com/ http://www.naciondnb.com/ http://www.nadabakery.co.nz/ http://www.peterpanvakantieclub.nl/ http://www.rcaforum.org.nz/ http://www.robert80.de/ http://www.silverstripe.com/ http://www.silverstripe.org.pl/ http://www.stillrunnin.com/ http://www.textiprints.com/ http://www.thelightboxdesigns.com/ http://www.tobychampion.co.uk/ http://www.upstreamgroup.com/ http://www.verus.com.tr/ http://www.wend.nl/ http://www.whileyouwait.co.nz/ |
	
	# a comment block here is a good place to make notes for yourself and others
	
	# Match types include: regexp, text, ghdb
	# Matches are enclosed in {} brackets and separated by commas
	matches [
	{:name=>"a brief description of the match, eg. powered by in footer",
	:probability=>100, # this isn't a real probability. 100 is certain, 75 is probably and 25 is maybe
	:regexp=>/This page was generated by <a href="http:\/\/www.genericcms.com\/en\/products\/generic-cms\/">Generic CMS<\/a>/ },

	{:name=>"title",
	:probability=>75,
	:text=>"<title>Generic Homepage</title>" }
	]
	end


Match Pattern 1 - Meta generator tag
----------------------------------------------------

Review the examples you have collected for match 1 and decide on what type of match is best suited to this pattern.
	
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" >
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" />
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.0 - http://www.silverstripe.com" />
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" >
	<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" />
	<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >
	<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" />

Match types are:
	regexp	- Ruby regular expressions that start and end with / characters
	text	- A text string enclosed in ' or " characters
	ghdb	- Google Hacking Database. This uses some Google query parameters. It currently supports intitle:, filetype:, inurl:

For this plugin to match we will look for a meta tag with the name 'generator' and a content parameter that starts with "SilverStripe". Later we can extract the version number. A regular expression match is best suited for this.

The following regular expression will match the tag.
	/<meta name="generator"[^>]*content="SilverStripe/

Notice how I haven't tried to match the http-equiv="generator" part of the tag or the website URL in the content field. Those parts of the tag are irrelevant and may change in future versions of SilverStripe. 

Note: If you don't understand regular expressions you could make a text match like:
	:text=>'<meta name="generator" http-equiv="generator" content="SilverStripe'
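
Before a regular expression goes into the plugin it can be checked in plain Ruby. The sketch below runs the expression against two of the collected tags and one unrelated tag. As one possible approach to extracting the version number later, it also adds a capture group. The constant names and the version-capturing variant are assumptions for illustration and are not WhatWeb's own API.

```ruby
GENERATOR = /<meta name="generator"[^>]*content="SilverStripe/
# Assumed variant: captures a dotted version number when one follows the name
GENERATOR_VERSION = /<meta name="generator"[^>]*content="SilverStripe ([0-9][0-9.]*)/

TAGS = [
  '<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" />',
  '<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >',
  '<meta name="generator" content="WordPress 2.6.5">'
]

TAGS.each do |tag|
  matched = GENERATOR.match?(tag)
  version = tag[GENERATOR_VERSION, 1] # nil when no version is present
  puts "matched=#{matched} version=#{version.inspect}"
end
```

The first tag matches and yields "2.3.1". The second matches without a version, and the WordPress tag does not match at all.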

The plugin now looks like:

	Plugin.define "SilverStripe" do
	author "Andrew Horton"
	version "0.1"
	description "SilverStripe is an open source CMS written in PHP. It can run on Apache, IIS or lighttpd. Homepage: http://www.silverstripe.com"

	examples %w|http://beatone.co.uk/ http://charcoalinteriors.com.au/ http://customcanvas.fritzandandre.com/ http://hungryhearts.no http://maungataniwha.co.nz/ http://unbounded.org/ http://victoriaoruwari.com/ http://weonline.in  http://www.arprostatecancer.org/ http://www.benpearce.co.nz/ http://www.bradyinc.com/ http://www.cavendishimaging.com/ http://www.chapmansurfboards.com/ http://www.choidoco.com/demo/ http://www.clockwork.co.nz/ http://www.enamaine.org/ http://www.executivemediasearch.com/ http://www.fairtradenap.net/ http://www.firstgalaxies.org/ http://www.frussian.com.ar/ http://www.fuel.ie/silverstripe http://www.gsbc.edu/ http://www.holistichealth.com/ http://www.hutmacherin.com/ http://www.infinitestillness.ie/ss http://www.intandemtheatre.org/ http://www.kitesurfnelson.co.nz/ http://www.latenightdisco.com/ http://www.lisamarieelliott.com/ http://www.maklerservice-greiz.de/ http://www.moerakihavenmotel.co.nz/ http://www.monjasantner.de/ http://www.moonlitekustoms.com/ http://www.moto-racepaint.com/ http://www.naciondnb.com/ http://www.nadabakery.co.nz/ http://www.peterpanvakantieclub.nl/ http://www.rcaforum.org.nz/ http://www.robert80.de/ http://www.silverstripe.com/ http://www.silverstripe.org.pl/ http://www.stillrunnin.com/ http://www.textiprints.com/ http://www.thelightboxdesigns.com/ http://www.tobychampion.co.uk/ http://www.upstreamgroup.com/ http://www.verus.com.tr/ http://www.wend.nl/ http://www.whileyouwait.co.nz/ |

	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" >
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" />
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.0 - http://www.silverstripe.com" />
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" >
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" />
	#<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >
	#<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" />

	matches [
	{:name=>"meta generator tag",
	:probability=>100,
	:regexp=>/<meta name="generator"[^>]*content="SilverStripe/}

	]
	end

I have included the meta generator tag examples within the plugin as comments because this is a good place to refer to them later.


Plugin Testing
---------------------------------------
It's good practice to test your plugin while writing it to make sure it works.

If you want to ensure your plugin is loaded, run the following command:
	$ ./whatweb -l
	
This lists all loaded plugins along with their version numbers.

Test your current plugin on the SilverStripe samples you have collected. The whatweb parameters used are:
	-v	Verbose. This shows us which matches are being found
	-p	Plugins. Only load the SilverStripe plugin
	
	$ ./whatweb -v -psilverstripe ./plugin-development/tests/silverstripe/*html
	
	./plugin-development/tests/silverstripe/charcoalinteriors.com.au-.html [] 
	Identifying: ./plugin-development/tests/silverstripe/charcoalinteriors.com.au-.html
	HTTP-Status: 

	./plugin-development/tests/silverstripe/beatone.co.uk-.html [] SilverStripe
	Identifying: ./plugin-development/tests/silverstripe/beatone.co.uk-.html
	HTTP-Status: 
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100}]]]

	./plugin-development/tests/silverstripe/hungryhearts.no.html [] SilverStripe
	Identifying: ./plugin-development/tests/silverstripe/hungryhearts.no.html
	HTTP-Status: 
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100}]]]

	./plugin-development/tests/silverstripe/maungataniwha.co.nz-.html [] 
	Identifying: ./plugin-development/tests/silverstripe/maungataniwha.co.nz-.html
	HTTP-Status: 

	./plugin-development/tests/silverstripe/customcanvas.fritzandandre.com-.html [] SilverStripe
	Identifying: ./plugin-development/tests/silverstripe/customcanvas.fritzandandre.com-.html
	HTTP-Status: 
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100}]]]

We notice that charcoalinteriors.com.au and maungataniwha.co.nz aren't matching. Viewing their HTML files shows that they do not include the meta generator tag, so our first match is working correctly.


Match Pattern 2 - Cookie PastVisitor
------------------------------------------------------
Review the examples you have collected and decide on what type of match is best suited.

	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:05:29 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:05:35 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:06:01 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:07:07 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:08:06 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:09:47 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:30:20 GMT; path=/
	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:28:56 GMT; path=/

Matching the cookie cannot be done within the matches array. Create a function called passive that will be triggered whenever the plugin is used.

	def passive
		m=[]
	
		m
	end

This code creates an empty array called m and returns it. This array will contain hashes in the same format as the matches section. Hash element names can be name, probability, version, etc.

	def passive
		m=[]
		m << {:name=>"PastVisitor Cookie", :probability=>100 } if @meta["set-cookie"] =~ /PastVisitor=[0-9]+.*/
		m
	end

Now the function checks the @meta array element "set-cookie" to see whether it matches a regular expression that begins with PastVisitor= followed by one or more digits.
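
The cookie test can be tried outside WhatWeb with a few lines of plain Ruby. This is only a minimal sketch, not WhatWeb code: the hash passed in merely stands in for the @meta headers that WhatWeb supplies to a plugin, and the names PAST_VISITOR and past_visitor_matches are made up for illustration.

```ruby
# Stand-alone sketch of the PastVisitor cookie check.
# Assumption: HTTP headers arrive as a hash keyed by lower-case header
# name, which is how @meta is used in the passive function above.
PAST_VISITOR = /PastVisitor=[0-9]+/

# Returns an array of match hashes, like the plugin's passive function.
def past_visitor_matches(meta)
  m = []
  m << { :name => "PastVisitor Cookie", :probability => 100 } if meta["set-cookie"] =~ PAST_VISITOR
  m
end

with_cookie    = { "set-cookie" => "PastVisitor=1; expires=Wed, 02-Jun-2010 04:05:29 GMT; path=/" }
without_cookie = { "server" => "Apache" }

p past_visitor_matches(with_cookie)     # one hash in the array
p past_visitor_matches(without_cookie)  # empty array
```

The trailing .* in the plugin's pattern can match nothing at all, so it does not change whether the cookie is recognised and is left out of the sketch.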

We cannot test this against the saved HTML files in our plugin-development/tests/silverstripe/ folder because a local file has no HTTP headers for WhatWeb to inspect. Instead we will test it using example sites. Note that the whatweb parameter -e uses the examples in the loaded plugins as targets.

	$ ./whatweb -v -psilverstripe -e 
	http://charcoalinteriors.com.au/ [200] 
	Identifying: http://charcoalinteriors.com.au/
	HTTP-Status: 200

	http://beatone.co.uk/ [200] SilverStripe
	Identifying: http://beatone.co.uk/
	HTTP-Status: 200
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100}]]]

	http://maungataniwha.co.nz/ [200] 
	Identifying: http://maungataniwha.co.nz/
	HTTP-Status: 200

	http://hungryhearts.no [200] SilverStripe
	Identifying: http://hungryhearts.no
	HTTP-Status: 200
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100}]]]

	http://customcanvas.fritzandandre.com/ [200] SilverStripe
	Identifying: http://customcanvas.fritzandandre.com/
	HTTP-Status: 200
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100}]]]

	http://unbounded.org/ [200] SilverStripe
	Identifying: http://unbounded.org/
	HTTP-Status: 200
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100},
	   {:name=>"PastVisitor Cookie", :probability=>100}]]]

	http://weonline.in [200] SilverStripe
	Identifying: http://weonline.in
	HTTP-Status: 200
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100}]]]

	http://www.arprostatecancer.org/ [200] 
	Identifying: http://www.arprostatecancer.org/
	HTTP-Status: 200

	http://victoriaoruwari.com/ [200] SilverStripe
	Identifying: http://victoriaoruwari.com/
	HTTP-Status: 200
	[["SilverStripe", [{:name=>"PastVisitor Cookie", :probability=>100}]]]

Our plugin has recognised the PastVisitor cookie for unbounded.org and victoriaoruwari.com so we know that it works.

Using just the two matches so far would be insufficient. Notice how some sites match only the meta generator tag, others match only the cookie and some aren't matched at all.


Match Pattern 3 - 3 CSS files, layout.css, typography.css and form.css
-----------------------------------------------------------------------

Review the examples you have collected and decide on what type of match is best suited.

	<link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/layout.css?m=1266347738" />
	<link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/typography.css?m=1266347623" />
	<link rel="stylesheet" type="text/css" href="http://www.textiprints.com/themes/textiprints/css/form.css?m=1247030621" />
	
	<link rel="stylesheet" href="/themes/firstgalaxies/css/layout.css" type="text/css">
	<link rel="stylesheet" href="/themes/firstgalaxies/css/typography.css" type="text/css">
	<link rel="stylesheet" href="/themes/firstgalaxies/css/form.css" type="text/css">

I decided earlier to give this pattern a certainty of 75% because using this set of filenames for CSS files is considered best practice. I also found an example that did not include typography.css. I chose not to match it, because three filenames in this order are more likely to be a unique match.

A regular expression is the best choice to match these three CSS files in order:
/<link[^>]*stylesheet[^>]*layout.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*typography.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*form.css[^>]*>/
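
Before adding the pattern to the plugin, it can be tried on its own in plain Ruby. The following is a minimal sketch: the first block of markup is copied from the samples above, the second is a hypothetical page that leaves out typography.css, and the constant names are made up for illustration.

```ruby
# Sketch: try the three-stylesheet pattern on markup from the samples.
CSS_FILES = /<link[^>]*stylesheet[^>]*layout.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*typography.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*form.css[^>]*>/

# Copied from the firstgalaxies sample: all three files, in order.
IN_ORDER = <<HTML
<link rel="stylesheet" href="/themes/firstgalaxies/css/layout.css" type="text/css">
<link rel="stylesheet" href="/themes/firstgalaxies/css/typography.css" type="text/css">
<link rel="stylesheet" href="/themes/firstgalaxies/css/form.css" type="text/css">
HTML

# Hypothetical page that omits typography.css, so the sequence is broken.
MISSING_ONE = <<HTML
<link rel="stylesheet" href="/themes/firstgalaxies/css/layout.css" type="text/css">
<link rel="stylesheet" href="/themes/firstgalaxies/css/form.css" type="text/css">
HTML

puts "in order:    #{!!(IN_ORDER =~ CSS_FILES)}"
puts "missing one: #{!!(MISSING_ONE =~ CSS_FILES)}"
```

The [^<]* between the tags also matches newlines, which is why the pattern works across lines. The dots in layout.css and the other filenames are unescaped and therefore match any character; writing layout\.css would be stricter, but the tutorial keeps the pattern as shown.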

Including this regular expression gives us the following matches array in our plugin. Notice how a comma is included after the first match.

	matches [
	{:name=>"meta generator tag",
	:probability=>100,
	:regexp=>/<meta name="generator"[^>]*content="SilverStripe/},
	
	{:name=>"layout, typography, form css files",
	:probability=>75,
	:regexp=>/<link[^>]*stylesheet[^>]*layout.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*typography.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*form.css[^>]*>/}
	]


Test the regular expression to make sure it works.

	$ ./whatweb -v -psilverstripe ./plugin-development/tests/silverstripe/*html
	./plugin-development/tests/silverstripe/charcoalinteriors.com.au-.html [] 
	Identifying: ./plugin-development/tests/silverstripe/charcoalinteriors.com.au-.html
	HTTP-Status: 

	./plugin-development/tests/silverstripe/beatone.co.uk-.html [] SilverStripe
	Identifying: ./plugin-development/tests/silverstripe/beatone.co.uk-.html
	HTTP-Status: 
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100}]]]

	./plugin-development/tests/silverstripe/hungryhearts.no.html [] SilverStripe
	Identifying: ./plugin-development/tests/silverstripe/hungryhearts.no.html
	HTTP-Status: 
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100}]]]

	./plugin-development/tests/silverstripe/maungataniwha.co.nz-.html [] probably SilverStripe
	Identifying: ./plugin-development/tests/silverstripe/maungataniwha.co.nz-.html
	HTTP-Status: 
	[["SilverStripe",
	  [{:regexp=>
		 /<link[^>]*stylesheet[^>]*layout.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*typography.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*form.css[^>]*>/,
		:name=>"layout, typography, form css files",
		:probability=>75}]]]

	./plugin-development/tests/silverstripe/customcanvas.fritzandandre.com-.html [] SilverStripe
	Identifying: ./plugin-development/tests/silverstripe/customcanvas.fritzandandre.com-.html
	HTTP-Status: 
	[["SilverStripe",
	  [{:regexp=>/<meta name="generator"[^>]*content="SilverStripe/,
		:name=>"meta generator tag",
		:probability=>100},
	   {:regexp=>
		 /<link[^>]*stylesheet[^>]*layout.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*typography.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*form.css[^>]*>/,
		:name=>"layout, typography, form css files",
		:probability=>75}]]]

Three of the first five samples did not match the CSS pattern: charcoalinteriors.com.au, beatone.co.uk and hungryhearts.no. I manually inspected these files to see which CSS files they use. None of them includes the three CSS files in order, so our regular expression works.


Match Pattern 4 - image assets url structure
-----------------------------------------------------------------

Review the examples you have collected and decide on what type of match is best suited.

	<img src="/assets/magazine/sr6/toc/_resampled/croppedimage220165-02-Two Vietnam Vets.jpg" alt="" />
	<img class="left noborder" src="/assets/Uploads/services/icons/fundraisers-icon.jpg" alt="Fundraisers Icon" />

This match is best found with a regular expression.

Earlier I thought this did not appear to be a unique match, but a Google query for "/assets/ _resampled/" returned almost entirely SilverStripe websites. I'm giving this pattern a certainty of 75% because I found at least one non-SilverStripe example with Google.

The following regular expression will work: <img src="/assets/[^/]+/_resampled/[^"]+.jpg" . In plain English this reads as
<img src="/assets/anything but a slash/_resampled/anything but double quotes.jpg"
Inside a Ruby /.../ regular expression literal each forward slash must be escaped as \/.
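
As a rough check, the pattern can be written as Ruby source and run against a couple of tags. This is a minimal sketch: the constant names are made up, the first tag is a hypothetical one modelled on the samples, and the second is the first sample quoted above.

```ruby
# Sketch: the image asset pattern as Ruby source.
# %r{...} avoids escaping the forward slashes; /.../ with \/ is equivalent.
RESAMPLED_A = %r{<img src="/assets/[^/]+/_resampled/[^"]+.jpg"}
RESAMPLED_B = /<img src="\/assets\/[^\/]+\/_resampled\/[^"]+.jpg"/

# Hypothetical tag with one folder between /assets/ and /_resampled/.
SHALLOW = '<img src="/assets/Uploads/_resampled/croppedimage220165-photo.jpg" alt="" />'
# Sample from above with three folders between /assets/ and /_resampled/.
DEEP = '<img src="/assets/magazine/sr6/toc/_resampled/croppedimage220165-02-Two Vietnam Vets.jpg" alt="" />'

[SHALLOW, DEEP].each do |tag|
  puts "#{!!(tag =~ RESAMPLED_A)} #{!!(tag =~ RESAMPLED_B)} #{tag}"
end
```

Note that [^/]+ spans a single path segment, so a deeper path such as the first sample above is not matched by the pattern as written. Whether to loosen the pattern is a trade-off between coverage and false positives.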

Our plugin matches array now looks like:

	matches [
	{:name=>"meta generator tag",
	:probability=>100,
	:regexp=>/<meta name="generator"[^>]*content="SilverStripe/},
	
	{:name=>"layout, typography, form css files",
	:probability=>75,
	:regexp=>/<link[^>]*stylesheet[^>]*layout.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*typography.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*form.css[^>]*>/},
	
	{:name=>'<img src="/assets/something/_resampled/something.jpg"',
	:probability=>75,
	:regexp=>/<img src="\/assets\/[^\/]+\/_resampled\/[^"]+.jpg"/}
	]

Next I tested the plugin to confirm that it matches some of the samples.


Test the plugin against all the examples
----------------------------------------

Test the plugin against all the example websites to see how effectively it identifies them. Instead of testing against the saved HTML files I will test against the example URLs in the plugin so that the match that checks the cookies will work.

	$ ./whatweb  -psilverstripe -e
	http://unbounded.org/ [200] SilverStripe
	http://charcoalinteriors.com.au/ [200] 
	http://customcanvas.fritzandandre.com/ [200] SilverStripe
	http://beatone.co.uk/ [200] SilverStripe
	http://maungataniwha.co.nz/ [200] probably SilverStripe
	http://hungryhearts.no [200] SilverStripe
	http://www.benpearce.co.nz/ [200] 
	http://victoriaoruwari.com/ [200] SilverStripe
	http://www.arprostatecancer.org/ [200] 
	http://weonline.in [200] SilverStripe
	http://www.choidoco.com/demo/ [200] SilverStripe
	http://www.cavendishimaging.com/ [200] SilverStripe
	http://www.bradyinc.com/ [200] SilverStripe
	http://www.clockwork.co.nz/ ERROR: Socket error getaddrinfo: Name or service not known
	http://www.chapmansurfboards.com/ [200] SilverStripe
	http://www.fairtradenap.net/ [200] probably SilverStripe
	http://www.executivemediasearch.com/ [404] 
	http://www.fuel.ie/silverstripe [301] 
	http://www.firstgalaxies.org/ [200] SilverStripe
	http://www.fuel.ie/silverstripe/ [200] SilverStripe
	http://www.enamaine.org/ [200] SilverStripe
	http://www.frussian.com.ar/ [200] SilverStripe
	http://www.infinitestillness.ie/ss [301] 
	http://www.holistichealth.com/ [200] SilverStripe
	http://www.gsbc.edu/ [200] SilverStripe
	http://www.hutmacherin.com/ [301] 
	http://www.infinitestillness.ie/ss/ [200] SilverStripe
	http://www.hutmacherin.com/start [200] probably SilverStripe
	http://www.latenightdisco.com/ [200] SilverStripe
	http://www.kitesurfnelson.co.nz/ [200] probably SilverStripe
	http://www.intandemtheatre.org/ [200] SilverStripe
	http://www.moerakihavenmotel.co.nz/ [200] SilverStripe
	http://www.lisamarieelliott.com/ [200] SilverStripe
	http://www.moonlitekustoms.com/ [200] SilverStripe
	http://www.naciondnb.com/ ERROR: Socket error getaddrinfo: Name or service not known
	http://www.maklerservice-greiz.de/ [200] SilverStripe
	http://www.monjasantner.de/ [200] SilverStripe
	http://www.nadabakery.co.nz/ [200] SilverStripe
	http://www.rcaforum.org.nz/ [200] SilverStripe
	http://www.moto-racepaint.com/ [200] SilverStripe
	http://www.peterpanvakantieclub.nl/ [200] SilverStripe
	http://www.robert80.de/ [200] SilverStripe
	http://www.silverstripe.com/ [200] SilverStripe
	http://www.stillrunnin.com/ [200] SilverStripe
	http://www.textiprints.com/ [200] SilverStripe
	http://www.thelightboxdesigns.com/ [200] 
	http://www.silverstripe.org.pl/ [200] SilverStripe
	http://www.whileyouwait.co.nz/ [200] SilverStripe
	http://www.tobychampion.co.uk/ [500] 
	http://www.wend.nl/ [200] SilverStripe
	http://www.upstreamgroup.com/ [200] SilverStripe
	http://www.verus.com.tr/ [200] SilverStripe

Most of the sites with an HTTP 301 status redirect to a page with a status of 200 and are identified as SilverStripe. Some sites are no longer active so I removed them from the examples list.

Of the 45 live websites, 43 are identified as SilverStripe. It is accurate to say that our plugin, using only passive matches, identifies about 95% of SilverStripe websites.
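
The percentage is simple arithmetic: 43 of 45 is roughly 95.6%. For longer listings the counting can be scripted. The following is a minimal sketch that assumes the brief output layout shown above (one "URL [status] result" line per target); the method name tally is made up and the sample lines are a small excerpt from the listing, not the full run.

```ruby
# Sketch: tally WhatWeb's brief output, one line per target.
# Assumption: the line layout is exactly as in the listing above.
SAMPLE_OUTPUT = <<OUT
http://unbounded.org/ [200] SilverStripe
http://charcoalinteriors.com.au/ [200]
http://maungataniwha.co.nz/ [200] probably SilverStripe
http://www.clockwork.co.nz/ ERROR: Socket error getaddrinfo: Name or service not known
http://www.fuel.ie/silverstripe [301]
http://www.fuel.ie/silverstripe/ [200] SilverStripe
OUT

# Returns [identified, live], counting only targets that answered 200.
def tally(output)
  live = output.lines.select { |line| line.include?("[200]") }
  identified = live.select { |line| line =~ /\[200\] (probably )?SilverStripe/ }
  [identified.length, live.length]
end

identified, live = tally(SAMPLE_OUTPUT)
puts "#{identified} of #{live} live sites identified (#{(100.0 * identified / live).round}%)"
```

Redirects and errors are left out of the total here because, as noted above, the redirect targets appear as their own lines with a status of 200.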


Extract version numbers
-----------------------

The meta generator tag sometimes contains version numbers which we want to detect.

	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" >
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" />
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.0 - http://www.silverstripe.com" />
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" >
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" />
	#<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >
	#<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" />

To extract the version number I need to write some custom Ruby code in the passive function.

	if @body =~ /<meta name="generator"[^>]*content="SilverStripe [0-9\.]+/
		# scan returns one array of captures per match; take the first capture of the first match
		v=@body.scan(/<meta name="generator"[^>]*content="SilverStripe ([0-9\.]+)/)[0][0]
		m << {:name=>"meta generator version", :probability=>100, :version=>v }
	end
	
This code checks whether the HTML body contains a meta generator tag with a SilverStripe version number. If it does, the version is copied into a variable called v and included in a hash, which is appended to the array of matches.
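
The extraction step can be tried on its own in plain Ruby. This is a minimal sketch rather than WhatWeb code: the method name silverstripe_version is made up, and it uses Ruby's standard String#[] with a regular expression and capture index as an alternative to scan. The two tags are copied from the samples above.

```ruby
# Sketch: pull the version out of a meta generator tag with plain Ruby.
GENERATOR_VERSION = /<meta name="generator"[^>]*content="SilverStripe ([0-9.]+)/

# Returns the version string, or nil when the tag carries no version.
def silverstripe_version(body)
  body[GENERATOR_VERSION, 1]
end

WITH_VERSION    = '<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" />'
WITHOUT_VERSION = '<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" />'

p silverstripe_version(WITH_VERSION)
p silverstripe_version(WITHOUT_VERSION)
```

Because the method returns nil when there is no version, a caller can test the result directly instead of matching the body twice.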

Testing the code shows that version numbers are being extracted. Note that if SilverStripe were to append a letter to the version number, e.g. 2.3.5b, the letter would not be captured.

	$ ./whatweb  -psilverstripe ./plugin-development/tests/silverstripe/*html
	./plugin-development/tests/silverstripe/charcoalinteriors.com.au-.html [] 
	./plugin-development/tests/silverstripe/beatone.co.uk-.html [] SilverStripe
	./plugin-development/tests/silverstripe/maungataniwha.co.nz-.html [] probably SilverStripe
	./plugin-development/tests/silverstripe/hungryhearts.no.html [] SilverStripe
	./plugin-development/tests/silverstripe/customcanvas.fritzandandre.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/victoriaoruwari.com-.html [] 
	./plugin-development/tests/silverstripe/weonline.in.html [] SilverStripe[2.3.1]
	./plugin-development/tests/silverstripe/unbounded.org-.html [] SilverStripe[2.0]
	./plugin-development/tests/silverstripe/www.benpearce.co.nz-.html [] 
	./plugin-development/tests/silverstripe/www.choidoco.com-demo-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.bradyinc.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.cavendishimaging.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.chapmansurfboards.com-.html [] SilverStripe[2.0]
	./plugin-development/tests/silverstripe/www.arprostatecancer.org-.html [] 
	./plugin-development/tests/silverstripe/www.executivemediasearch.com-.html [] 
	./plugin-development/tests/silverstripe/www.enamaine.org-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.firstgalaxies.org-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.fairtradenap.net-.html [] probably SilverStripe
	./plugin-development/tests/silverstripe/www.fuel.ie-silverstripe.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.holistichealth.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.hutmacherin.com-.html [] probably SilverStripe
	./plugin-development/tests/silverstripe/www.gsbc.edu-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.clockwork.co.nz-.html [] 
	./plugin-development/tests/silverstripe/www.frussian.com.ar-.html [] SilverStripe[2.0]
	./plugin-development/tests/silverstripe/www.infinitestillness.ie-ss.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.lisamarieelliott.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.latenightdisco.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.kitesurfnelson.co.nz-.html [] probably SilverStripe
	./plugin-development/tests/silverstripe/www.intandemtheatre.org-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.moerakihavenmotel.co.nz-.html [] SilverStripe[2.0]
	./plugin-development/tests/silverstripe/www.maklerservice-greiz.de-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.monjasantner.de-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.naciondnb.com-.html [] 
	./plugin-development/tests/silverstripe/www.moto-racepaint.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.nadabakery.co.nz-.html [] SilverStripe[2.3.1]
	./plugin-development/tests/silverstripe/www.moonlitekustoms.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.peterpanvakantieclub.nl-.html [] SilverStripe[2.0]
	./plugin-development/tests/silverstripe/www.robert80.de-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.rcaforum.org.nz-.html [] SilverStripe[2.3.0]
	./plugin-development/tests/silverstripe/www.silverstripe.org.pl-.html [] 
	./plugin-development/tests/silverstripe/www.thelightboxdesigns.com-.html [] 
	./plugin-development/tests/silverstripe/www.silverstripe.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.textiprints.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.stillrunnin.com-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.tobychampion.co.uk-.html [] SilverStripe
	./plugin-development/tests/silverstripe/www.upstreamgroup.com-.html [] SilverStripe[2.3.1]
	./plugin-development/tests/silverstripe/www.whileyouwait.co.nz-.html [] SilverStripe[2.0]
	./plugin-development/tests/silverstripe/www.wend.nl-.html [] probably SilverStripe
	./plugin-development/tests/silverstripe/www.verus.com.tr-.html [] SilverStripe

We can see that only some websites include the version number in the meta generator tag.

The final plugin
----------------

	Plugin.define "SilverStripe" do
	author "Andrew Horton"
	version "0.1"
	description "SilverStripe is an open source CMS written in PHP. It can run on Apache, IIS or lighttpd. Homepage: http://www.silverstripe.com"

	examples %w|http://beatone.co.uk/ http://charcoalinteriors.com.au/ http://customcanvas.fritzandandre.com/ http://hungryhearts.no http://maungataniwha.co.nz/ http://unbounded.org/ http://victoriaoruwari.com/ http://weonline.in  http://www.arprostatecancer.org/ http://www.benpearce.co.nz/ http://www.bradyinc.com/ http://www.cavendishimaging.com/ http://www.chapmansurfboards.com/ http://www.choidoco.com/demo/ http://www.enamaine.org/ http://www.executivemediasearch.com/ http://www.fairtradenap.net/ http://www.firstgalaxies.org/ http://www.frussian.com.ar/ http://www.fuel.ie/silverstripe http://www.gsbc.edu/ http://www.holistichealth.com/ http://www.hutmacherin.com/ http://www.infinitestillness.ie/ss http://www.intandemtheatre.org/ http://www.kitesurfnelson.co.nz/ http://www.latenightdisco.com/ http://www.lisamarieelliott.com/ http://www.maklerservice-greiz.de/ http://www.moerakihavenmotel.co.nz/ http://www.monjasantner.de/ http://www.moonlitekustoms.com/ http://www.moto-racepaint.com/ http://www.nadabakery.co.nz/ http://www.peterpanvakantieclub.nl/ http://www.rcaforum.org.nz/ http://www.robert80.de/ http://www.silverstripe.com/ http://www.silverstripe.org.pl/ http://www.stillrunnin.com/ http://www.textiprints.com/ http://www.thelightboxdesigns.com/ http://www.tobychampion.co.uk/ http://www.upstreamgroup.com/ http://www.verus.com.tr/ http://www.wend.nl/ http://www.whileyouwait.co.nz/ |

	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" >
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.0 - http://www.silverstripe.com" />
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.0 - http://www.silverstripe.com" />
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" >
	#<meta name="generator" http-equiv="generator" content="SilverStripe 2.3.1 - http://www.silverstripe.com" />
	#<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" >
	#<meta name="generator" http-equiv="generator" content="SilverStripe - http://www.silverstripe.com" />

	matches [
	{:name=>"meta generator tag",
	:probability=>100,
	:regexp=>/<meta name="generator"[^>]*content="SilverStripe/}, #" I have included a comment with double quotes for the benefit of syntax highlighting in gedit

	{:name=>"layout, typography, form css files",
	:probability=>75,
	:regexp=>/<link[^>]*stylesheet[^>]*layout.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*typography.css[^>]*>[^<]*<link[^>]*stylesheet[^>]*form.css[^>]*>/},

	{:name=>'<img src="/assets/something/_resampled/something.jpg"',
	:probability=>75,
	:regexp=>/<img src="\/assets\/[^\/]+\/_resampled\/[^"]+.jpg"/} #"
	] 

	#	Set-Cookie: PastVisitor=1; expires=Wed, 02-Jun-2010 04:05:29 GMT; path=/
	def passive
		m=[]
		m << {:name=>"PastVisitor Cookie", :probability=>100 } if @meta["set-cookie"] =~ /PastVisitor=[0-9]+.*/	
	
		if @body =~ /<meta name="generator"[^>]*content="SilverStripe [0-9\.]+/
				v=@body.scan(/<meta name="generator"[^>]*content="SilverStripe ([0-9\.]+)/)[0][0]
				m << {:name=>"meta generator version", :probability=>100, :version=>v }
		end
		m
	end

	end


8. Closing Notes
=======================================

I have shown you the process of writing a simple WhatWeb plugin. The most important part of the process is the thorough research that happens before writing any matches.

Some people will be tempted to write a pattern for the meta generator tag and then stop. Such an approach would identify about 75% of SilverStripe sites. Furthermore, there is a generic meta generator tag plugin, so such an effort would be of little practical use.

Our final plugin identifies about 95% of SilverStripe websites using only passive matches. An aggressive plugin that guesses URLs would increase the effectiveness to 100%. However, aggressive plugins are not stealthy and use more bandwidth, so they are less suitable for large-scale website identification. They are useful during penetration testing to identify frameworks. Writing aggressive plugins is a more advanced topic and will be covered in another tutorial.

Please submit your plugins to andrew [at] morningstarsecurity.com to be included in the next release of WhatWeb.



9. Resources
=======================================

The best way to learn how to develop plugins is by reading the plugins bundled with WhatWeb.

To learn and test regular expressions visit: http://rubular.com/
WhatWeb homepage: http://www.morningstarsecurity.com/research/whatweb

Visit MorningStar Security for the best Information Security news at http://www.morningstarsecurity.com/news