

Retrieve ISO protected web pages

Summary: Webisoget automatically and programmatically retrieves pages from web sites that require an ISO login, such as pubcookie, shibboleth, or Google.
Source: github

Retrieving pages

The presence of a url command, either on the command line or interactively, causes webisoget to retrieve the specified web page. In certain conditions it will use content in that page to retrieve a second page:

  1. If the page headers contain an automatic redirection, "Location" or "Refresh", the redirection will be followed.

  2. If the page text contains a meta refresh, and its time delay is zero, the redirection will be followed.

  3. If one or more anchors have been specified, and the page contains a matching link, that link will be followed.

  4. If one or more frames have been specified, and the page contains a matching frame, that frame will be loaded.

  5. If one or more forms have been specified, and the page contains a matching form, that form will be filled in and submitted.

The second page may lead in the same manner to a third, a fourth, and so on.
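The first two rules, together with a hop limit, can be sketched in Python as follows. This is an illustration only, not webisoget's actual code: the names `next_url` and `follow`, the `fetch` callable, and the regular expressions are all assumptions made for the example. The default of 20 hops mirrors the documented -maxhop default.

```python
import re
from urllib.parse import urljoin

# Matches a refresh directive such as "0; url=/next" (delay, then target).
_REFRESH = re.compile(r"\s*(\d+)\s*;\s*url\s*=\s*['\"]?([^'\"\s>]+)", re.IGNORECASE)
# Finds <meta http-equiv="refresh" content="..."> in the page text.
_META = re.compile(
    r"<meta[^>]*http-equiv\s*=\s*['\"]?refresh['\"]?[^>]*content\s*=\s*\"([^\"]*)\"",
    re.IGNORECASE,
)


def next_url(base_url, headers, body):
    """Return the URL that rules 1 and 2 would follow, or None."""
    lowered = {k.lower(): v for k, v in headers.items()}
    # Rule 1: header redirections are followed.
    if "location" in lowered:
        return urljoin(base_url, lowered["location"])
    if "refresh" in lowered:
        m = _REFRESH.match(lowered["refresh"])
        if m:
            return urljoin(base_url, m.group(2))
    # Rule 2: a meta refresh is followed only when its delay is zero.
    meta = _META.search(body)
    if meta:
        m = _REFRESH.match(meta.group(1))
        if m and int(m.group(1)) == 0:
            return urljoin(base_url, m.group(2))
    return None


def follow(url, fetch, max_hops=20):
    """Fetch url, follow at most max_hops redirections, return visited URLs.

    fetch is a hypothetical callable returning (headers_dict, body_text).
    """
    visited = []
    hops = 0
    while True:
        visited.append(url)
        headers, body = fetch(url)
        target = next_url(url, headers, body)
        if target is None or hops >= max_hops:
            return visited
        url = target
        hops += 1
```

With `max_hops=0` the sketch fetches exactly one page, matching the documented meaning of a zero hop count. Anchors, frames, and forms (rules 3 to 5) are left out to keep the example short.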

Filling in forms

Web forms consist of name-value pairs. Webisoget's form file contains descriptions, one line per form, of forms you want to fill out and submit. General syntax is name=value, with entries separated by semicolons.

name=form_name;field_name=field_value;field_name=field_value;...;

  1. You may specify the same field name more than once to submit multiple values, as for example a form with checkboxes might do.
  2. In value strings you may use:
    • "\;" to specify a semicolon
    • "\n" for a newline
    • "\\" for a backslash.
  3. Specify name=form_name; to match a specific form by name.
  4. Specify name=; to match a form with no name.
  5. Omit the name= to match any form.
  6. Specify a domain name, "domain=domain_name;", to match forms only in that domain.
  7. Specify a submit name, "submit_name=name;", to match a form with a submit button with that name.
  8. Specify a submit value, "submit_value=value;", to match a form with a submit button with that value.
  9. Textareas are treated like text inputs.

For example, the form specification

name=query; user=spud; pass=potato;
would submit pubcookie's login form.

Many forms have name-value pairs with default values, hidden fields, for example. Webisoget will automatically fill in any default values.
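To make the form description syntax concrete, here is a minimal Python sketch of how one such line could be parsed and overlaid on a form's default values. It is not webisoget's parser: the function names are hypothetical, and two behaviors are assumptions, namely that whitespace before a field name is ignored (as the spacing in the example above suggests) and that a specified field replaces every default with the same name.

```python
# Keys that select which form to match rather than supply a field value.
MATCH_KEYS = ("name", "domain", "submit_name", "submit_value")

# Escape sequences allowed in value strings.
ESCAPES = {";": ";", "n": "\n", "\\": "\\"}


def split_entries(line):
    """Split a form description on unescaped semicolons, decoding escapes."""
    entries, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in ESCAPES:
            current.append(ESCAPES[line[i + 1]])
            i += 2
        elif ch == ";":
            entries.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if current:
        entries.append("".join(current))
    return entries


def parse_form_description(line):
    """Return (match, fields): matching criteria and (name, value) pairs."""
    match, fields = {}, []
    for entry in split_entries(line):
        if not entry.strip():
            continue
        key, _, value = entry.partition("=")
        key = key.strip()  # assumption: leading spaces are not significant
        if key in MATCH_KEYS:
            match[key] = value
        else:
            fields.append((key, value))  # repeated names are kept in order
    return match, fields


def fill_form(defaults, fields):
    """Keep the page's default pairs unless the description overrides them."""
    specified = {name for name, _ in fields}
    kept = [(n, v) for n, v in defaults if n not in specified]
    return kept + fields
```

For the pubcookie example, `parse_form_description("name=query; user=spud; pass=potato;")` yields the criterion `name == "query"` and the two fields `user` and `pass`; `fill_form` then adds any hidden defaults found in the page.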

Working with cookies

Web ISO login systems use cookies to communicate between their various components and to maintain a session once you are logged in. Webisoget allows you to cache and reuse those session cookies, so you don't have to repeat the entire login sequence each time you test the application.

Using certificates

As with any application using SSL, webisoget allows you to specify a CA file of trusted certificate authorities, as well as certificate and key files containing a certificate to use if a target site requires one. You can also choose to ignore verification of the server's certificate.

Accessing cluster members

A service is often supported by a cluster of several systems, each responding to the same domain name. A weblogin service at "" might consist of the servers "", "", ..., "". Webisoget allows you to map the generic cluster name to a specific system in that cluster.
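The idea of mapping a cluster name to one member can be sketched in Python as below. This is only an illustration of the concept: the host names are hypothetical, the function names are invented for the example, and whether webisoget keeps the cluster name in the Host header when it contacts the member is an assumption, not something this page states.

```python
from urllib.parse import urlsplit, urlunsplit


def parse_map(spec):
    """Parse an 'alias=real' mapping into (alias, real)."""
    alias, sep, real = spec.partition("=")
    if not sep or not alias or not real:
        raise ValueError("expected alias=real, got %r" % spec)
    return alias.lower(), real


def apply_map(url, mapping):
    """Return (url_to_contact, host_header) for a request to url.

    If the URL's host is a mapped alias, the request goes to the real
    member while the original host is kept for the Host header
    (assumed behavior). Any user:password part of the URL is dropped.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    real = mapping.get(host)
    if real is None:
        return url, parts.netloc
    netloc = real if parts.port is None else "%s:%d" % (real, parts.port)
    return urlunsplit(parts._replace(netloc=netloc)), parts.netloc


# Hypothetical cluster name and member name, for illustration only.
alias, real = parse_map("weblogin.example.edu=weblogin03.example.edu")
target, host_header = apply_map("https://weblogin.example.edu/login?x=1", {alias: real})
```

Requests for hosts that are not in the mapping pass through unchanged, so one mapping can be left in place while testing several sites.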

Working with RESTful services

Webisoget supports the most common REST operations,

  PUT     see -putfile
  DELETE  see -delete
  POST    see -postfile

but cannot help you with the XML.

Webisoget processes commands on the command line in the order they appear. It does NOT read the entire set of arguments before proceeding. If no URL is specified webisoget will process the arguments and wait for interactive commands.

Interactive commands are just like arguments except they don't use the leading dash.

Controlling verbosity

-cookies Show any cookies received for the session.
-pagetimes Show elapsed time of operations.
-verbose Give commentary on progress.
-debug More commentary than verbose.
-debug2 More commentary than debug.
-debug3 A really lot of commentary.

Specifying forms and links

-anchor anchor_text Follow the specified link: <a href=..>anchor_text</a>
-frame name Follow the named frame
-form form_description In-line form description. Better to use a form file.
-formfile form_filename A file of form descriptions.

Using certificates

-cafile CA_filename A file of Certificate Authority certificates. These will be used to authenticate any certificates sent by a server.
-cert cert_filename A file containing a certificate to use when a site requires certificate authentication. Must be PEM format.
-certkey key_for_cert A file containing your certificate's key. Must be PEM format.
-noverify Do NOT verify certificates from servers.

Retrieving pages

-cache cache_file Load session cookies from this file. Save session cookies to this file before exit.
-continue If processing was stopped due to a hop count this will continue page processing.
-delete The next request will be a DELETE.
-header header_text Add a header to the request. Do NOT include any CR or LF.
-map alias=real Any request for alias (the cluster) will be sent to real (the member).
-maxhop count Maximum number of redirections to follow. Default is 20. Zero means get only one page.
-maxtext count Maximum characters of text that will be printed to stdout. Default is 1M. Zero means no limit.
-out output_file The last page retrieved will be saved to this file.
-timeout seconds Maximum time to wait for a page.
-text Display the text of the page. If there are redirections or forms, only the last page will be displayed.
-url url Retrieve the URL, following forms and redirections as requested. If specified as an argument, webisoget will exit and not enter interactive mode.

Uploading content

-postfile filename POST the content of the specified file.
-putfile filename PUT the content of the specified file.
-quit Exit the program.
-version Show program's version information and exit.


Pubcookie uses several Location redirections and one form (query).

To retrieve a page protected by pubcookie as the user with id "spud" and password "potato", use this formfile (spud.login):

  name=query; user=spud; pass=potato;
and this command:
  $ webisoget -verbose -out page.txt -formfile spud.login \


Shibboleth uses several Location redirections, possibly a form at a wayf, and, of course, whatever the IdP's login requirements are.

To retrieve a page protected by shibboleth (with pubcookie) as the user with id "spud" and password "potato", use this formfile (spud.login):
  name=query; user=spud; pass=potato;
  name=; submit_value=Continue; domain=your_idp_domain_name
(assuming ivy is registered with InCommon) and the same command:
  $ webisoget -verbose -out page.txt -formfile spud.login \


Google (this is a Google OpenID Connect login) uses a form and some Location redirections. After you're logged in there may be more pages to negotiate. Save yourself trouble and prerelease any attributes you might need.

To accomplish a Google OpenID Connect login as the user with email "" and password "potato", use this formfile (spud.login);; Passwd=potato;

Webisoget is licensed under the terms of the Apache License, Version 2.0. You are free to copy, use, and modify it to your needs according to the terms of that license.

Jim Fox
UW Technology
Identity and Access Management
University of Washington

© 1983-2017, University of Washington