Online policy research, outreach, and action on issues such as access, privacy, defamation, and the digital divide.
Online Policy Group Software Utilities
The Online Policy Group has developed a set of software utilities to assist in ongoing research about policy issues such as online access, privacy, digital defamation, and the digital divide.
Please email bug reports, corrections, and requests for enhancements to email@example.com. Our volunteers will handle them on a time-available basis.
The source code for this software is distributed freely as a matter of full disclosure regarding the research techniques of the Online Policy Group and to encourage other organizations to develop open source software utilities for conducting similar research.
Since these software utilities run in a Java runtime environment, you must have the Java 1.2 or 1.3 runtime environment installed on your machine before you execute the software.
To determine if the Java 1.2 or 1.3 runtime environment is already installed on your computer:
Make sure that the directory containing java.exe is included in your command path (the PATH environment variable) so that you can run the utilities. Each operating system has its own procedure for setting the command path.
On Windows, you can add the directory containing java.exe to your command path by editing the AUTOEXEC.BAT file as follows:
[Need directions for other platforms here.]
Once you have the Java 1.2 or 1.3 runtime environment installed on your computer, you can download the software utilities file called onlinepolicy.jar by following this procedure:
CLASSPATH is a system environment variable required to run the OPG software utilities.
Set your CLASSPATH variable as follows:
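Once CLASSPATH has been set, a small check like the one below can confirm that the Java runtime actually sees the jar file. This is only a sketch and is not part of the OPG utilities: the class name ClasspathCheck and the helper containsJar are hypothetical, and the code assumes a current Java release rather than Java 1.2 or 1.3.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper, not part of onlinepolicy.jar.
class ClasspathCheck {
    // Returns true if any entry of classPath names a file called jarName.
    // The separator is ";" on Windows and ":" on Unix (File.pathSeparator).
    static boolean containsJar(String classPath, String separator, String jarName) {
        if (classPath == null) {
            return false;
        }
        for (String entry : classPath.split(Pattern.quote(separator))) {
            if (new File(entry).getName().equalsIgnoreCase(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Class path:   " + cp);
        System.out.println("onlinepolicy.jar found: "
                + containsJar(cp, File.pathSeparator, "onlinepolicy.jar"));
    }
}
```

If the last line prints false, the CLASSPATH value seen by the runtime does not list onlinepolicy.jar, and the utilities will most likely fail to start.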
You can run any of the software utilities from a command line, which varies according to the computer and operating system you are using.
On Windows machines, if you have not already started MS-DOS, start the MS-DOS Prompt as described in the Configure the Software Utilities section.
Before attempting to run any of the utilities, change directories to the directory where the software utilities are located by typing:
[Instructions for Mac and Unix machines should go here.]
Specific directions for running each of the software utilities are listed below.
The RandomPageFinder utility finds n web sites (n is specified by the user) by repeatedly constructing a random IP address and testing whether a web page exists at that address. The test is a simple connection to the IP address followed by a search for a title (the TITLE element inside the HTML HEAD).
RandomPageFinder continues until the desired number of pages is found, which may take hours or even days for just a handful of pages.
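The approach described above can be sketched roughly as follows. This is not the RandomPageFinder source: the class RandomPageSketch, its method names, the 2-second timeout, and the 64 KB read limit are all assumptions made for illustration, and the code assumes Java 11 or later. It also makes no attempt to skip private or reserved address ranges.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names and limits are assumptions.
class RandomPageSketch {
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Builds a dotted-quad IPv4 address from four random octets (0 to 255).
    static String randomIp(Random rng) {
        return rng.nextInt(256) + "." + rng.nextInt(256) + "."
                + rng.nextInt(256) + "." + rng.nextInt(256);
    }

    // Returns the trimmed text of the first TITLE element, or null if there is none.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Reads at most 64 KB from http://<ip>/ and returns it, or null on any I/O failure.
    static String fetch(String ip, int timeoutMs) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://" + ip + "/").toURL().openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readNBytes(64 * 1024), StandardCharsets.ISO_8859_1);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            return null;
        }
    }

    // Usage: java RandomPageSketch <count>. With no argument, nothing is scanned.
    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        Random rng = new Random();
        int found = 0;
        while (found < count) {
            String ip = randomIp(rng);
            String html = fetch(ip, 2000);
            String title = (html == null) ? null : extractTitle(html);
            if (title != null) {
                System.out.println(ip + "\t" + title);
                found++;
            }
        }
    }
}
```

Because most random addresses do not answer on port 80, nearly every iteration ends in a timeout, which is consistent with the long running times noted above.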
Here is an example that writes two random sites to a report file called random.0103231719.txt and logs to a file called random.0103231719.log.txt:
java RandomPageFinder -count 2 -report random.0103231719.txt > random.0103231719.log.txt
To stop execution of the utility before it has completed, you can type Ctrl-C on most operating systems.
Note: On Unix, you can run the utility in the "background", so you can run multiple commands simultaneously. If you are running the bash shell, you can log out and leave the commands running on the computer until you have a chance to come back and retrieve the results.
java RandomPageFinder -count n [-report fn] [-max n]
SiteBlockMapper reads a list of web addresses (URLs) and determines whether each one is reachable. For each web address that is reachable, SiteBlockMapper follows all the links on the page at that URL to determine whether they are reachable. This process can be repeated to a specified depth.
The objective is to determine if a given web site is available while an Internet filter is active.
Occasionally, SiteBlockMapper finds a page that confuses the scanner (that is, the scanner chokes on some obscure HTML). In those cases, the message 'Scan Error' is reported for that web page.
SiteBlockMapper scans each web page only once. However, the SiteBlockMapper output may repeat a given web address since the same web address may be encountered from multiple links within a site.
SiteBlockMapper scans links to pages on other sites, but the links are never followed onto sites that were not specified in the original site list.
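The scanning rules above (check each page once, stop at a maximum depth, and never follow links from sites outside the original list) can be illustrated with a short breadth-first traversal. This is a sketch under stated assumptions, not the SiteBlockMapper source: the class SiteBlockSketch and its methods are hypothetical, the listed sites are treated as depth 0, and page fetching is passed in as a function that returns the links on a page or null when the page is unreachable. It assumes Java 8 or later.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustrative sketch only; names and depth counting are assumptions.
class SiteBlockSketch {
    private static final class Node {
        final String url;
        final int depth;
        Node(String url, int depth) { this.url = url; this.depth = depth; }
    }

    // Returns the lower-cased host of a URL, or "" if it cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase();
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    // Maps each scanned URL to true (reachable) or false (unreachable).
    // fetchLinks returns the links found on a page, or null if it is unreachable.
    static Map<String, Boolean> map(List<String> sites, int maxDepth,
                                    Function<String, List<String>> fetchLinks) {
        Set<String> listedHosts = new HashSet<>();
        Deque<Node> queue = new ArrayDeque<>();
        for (String site : sites) {
            listedHosts.add(hostOf(site));
            queue.add(new Node(site, 0));
        }
        Map<String, Boolean> reachable = new LinkedHashMap<>();
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            if (reachable.containsKey(n.url)) {
                continue; // each page is scanned only once
            }
            List<String> links = fetchLinks.apply(n.url);
            reachable.put(n.url, links != null);
            // Links are followed only from reachable pages on a listed site,
            // and only while the depth limit has not been reached.
            boolean follow = links != null && n.depth < maxDepth
                    && listedHosts.contains(hostOf(n.url));
            if (follow) {
                for (String link : links) {
                    queue.add(new Node(link, n.depth + 1));
                }
            }
        }
        return reachable;
    }
}
```

Passing the fetcher in as a function keeps the traversal logic separate from network access, so the same code can be exercised against an in-memory map of pages before it is pointed at real sites behind a filter.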
Here is an example that runs SiteBlockMapper on a list of sites in a file called siteList.txt, following links two levels deep:
java SiteBlockMapper -maxdepth 2 siteList.txt
java SiteBlockMapper -maxdepth n [-debug i] siteList.txt
WebMasher is a simple web browser that allows you to select a web page and then obtain a list of the web addresses for all the links on that page. You can use WebMasher to browse keyword searches on a search engine and collect a list of the web pages included in the search engine results.
Once you have collected the list of web addresses and edited it, you can save the list into a file. This same file format can be read by WebGrab.
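Collecting the web addresses of all links on a page can be sketched as below. This is not how WebMasher itself is implemented: the class LinkListSketch and the method extractLinks are hypothetical, the regular expression only handles quoted href attributes on anchor tags, and the code assumes Java 8 or later.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; a real HTML parser handles far more cases.
class LinkListSketch {
    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the absolute addresses of the links in html, in page order,
    // without duplicates. Relative links are resolved against baseUrl.
    static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                String link = base.resolve(m.group(1).trim()).toString();
                if (!links.contains(link)) {
                    links.add(link);
                }
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs.
            }
        }
        return links;
    }
}
```

For example, with a base address of http://a.example/dir/index.html, a link written as page2.html resolves to http://a.example/dir/page2.html.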
WebGrab reads the file produced by WebMasher (a list of web addresses) and saves the HTML code from each web address to the local disk. WebGrab generates a unique filename for each HTML file based on the web address of the file.
java WebGrab webmashfile
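One way to derive a filename from a web address is sketched below. The actual naming scheme used by WebGrab is not described here, so everything in this example is an assumption: the class GrabNameSketch, the 60-character limit, and the use of a short SHA-256 prefix to keep names distinct. The code assumes Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch only; not the WebGrab naming scheme.
class GrabNameSketch {
    // Builds a filesystem-safe name: a readable stem taken from the address,
    // plus a short hash of the full address so that two addresses with the
    // same stem are very unlikely to collide.
    static String fileNameFor(String url) {
        String stem = url.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://", "")
                         .replaceAll("[^A-Za-z0-9.-]+", "_");
        if (stem.length() > 60) {
            stem = stem.substring(0, 60);
        }
        return stem + "-" + shortHash(url) + ".html";
    }

    // Returns the first four bytes of the SHA-256 digest as eight hex digits.
    static String shortHash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                sb.append(String.format("%02x", digest[i] & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Simply replacing unsafe characters is not enough on its own, since addresses such as /a/b.html and /a_b.html would map to the same stem; the hash suffix is what keeps the two names apart.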
The source code for the Online Policy Group software utilities is included in the onlinepolicy.jar file and can be accessed by unzipping that file.
As a condition of use of the source code, please send any code improvements to firstname.lastname@example.org.
Online Policy Group, Inc.
All rights reserved.