3.14 Running the server
The functionality of the server should be defined in one Prolog file (of course this file is allowed to load other files). Depending on the wanted server setup this 'body' is wrapped into a small Prolog file combining the body with the appropriate server interface. There are three supported server-setups. For most applications we advise the multi-threaded server. Examples of this server architecture are the PlDoc documentation system and the SeRQL Semantic Web server infrastructure.
All the server setups may be wrapped in a reverse proxy to make them available from the public web-server as described in section 3.14.7.
- Using library(thread_httpd) for a multi-threaded server
This server exploits the multi-threaded version of SWI-Prolog, running the user's body code in parallel from a pool of worker threads. As it avoids the state engine and copying required in the event-driven server, it is generally faster and capable of handling multiple requests concurrently. This server is harder to debug due to the threading involved, although the GUI tracer provides reasonable support for multi-threaded applications using the tspy/1 command. It can provide fast communication to multiple clients and can be used for more demanding servers.
- Using library(inetd_httpd) for server-per-client
In this setup the Unix inetd user-daemon is used to initialise a server for each connection. This approach is especially suitable for servers that have a limited startup-time. In this setup a crashing client does not influence other requests. This server is very hard to debug as the server is not connected to the user environment. It provides a robust implementation for servers that can be started quickly.
3.14.1 Common server interface options
All the server interfaces provide http_server(:Goal, +Options) to create the server. The list of options differs, but the servers share common options:
- port(?Port)
- Specify the port to listen to for stand-alone servers. Port is either an integer or unbound. If unbound, it is unified to the selected free port.
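As a minimal sketch of the unbound-port case, assuming the multi-threaded server from library(http/thread_httpd) and http_dispatch/1 as the goal, the predicate below lets the system pick a free port and reports it. The name start_on_free_port/1 is hypothetical and only serves the illustration.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

% start_on_free_port(-Port): hypothetical helper.
% Port is left unbound in the option list, so http_server/2
% selects a free port and unifies Port with it.
start_on_free_port(Port) :-
    http_server(http_dispatch, [port(Port)]),
    format("Server listening on port ~w~n", [Port]).
```

Calling `?- start_on_free_port(Port).` from the toplevel should bind Port to the selected port number.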
3.14.2 Multi-threaded Prolog
The library(http/thread_httpd.pl) provides the infrastructure to manage multiple clients using a pool of worker-threads. This realises a popular server design, also seen in Java Tomcat and Microsoft .NET. As a single persistent server process maintains communication to all clients, startup time is not an important issue and the server can easily maintain state-information for all clients.
In addition to the functionality provided by the inetd server, the threaded server can also be used to realise an HTTPS server exploiting the library(ssl) library. See option ssl(+SSLOptions) below.
- http_server(:Goal, +Options)
- Create the server. Options must provide the port(?Port) option to specify the port the server should listen to. If Port is unbound an arbitrary free port is selected and Port is unified to this port-number. The server consists of a small Prolog thread accepting new connections on Port and dispatching these to a pool of workers. Defined Options are:
- port(?Address)
- Address to bind to. Address is either a port (integer) or a term Host:Port. The port may be a variable, causing the system to select a free port and unify the variable with the selected port. See also tcp_bind/2.
- workers(+N)
- Defines the number of worker threads in the pool. Default is to use five workers. Choosing the optimal value for best performance is a difficult task depending on the number of CPUs in your system and how many resources are required for processing a request. Too high a number makes your system switch too often between threads or even swap if there is not enough memory to keep all threads in memory, while too low a number causes clients to wait unnecessarily for other clients to complete. See also http_workers/2.
- timeout(+SecondsOrInfinite)
- Determines the maximum period of inactivity handling a request. If no data arrives within the specified time since the last data arrived, the connection raises an exception, and the worker discards the client and returns to the pool-queue for a new client. If it is infinite, a worker may wait forever on a client that doesn't complete its request. Default is 60 seconds.
- keep_alive_timeout(+SecondsOrInfinite)
- Maximum time to wait for new activity on Keep-Alive connections. Choosing the correct value for this parameter is hard. Disabling Keep-Alive is bad for performance if the clients request multiple documents for a single page. This may, for example, be caused by HTML frames, HTML pages with images, associated CSS files, etc. Keeping a connection open in the threaded model, however, prevents the thread servicing the client from servicing other clients. The default is 2 seconds.
- local(+KBytes)
- Size of the local-stack for the workers. Default is taken from the commandline option.
- global(+KBytes)
- Size of the global-stack for the workers. Default is taken from the commandline option.
- trail(+KBytes)
- Size of the trail-stack for the workers. Default is taken from the commandline option.
- ssl(+SSLOptions)
- Use SSL (Secure Socket Layer) rather than plain TCP/IP. A server created this way is accessed using the https:// protocol. SSL allows for encrypted communication to prevent others from tapping the wire as well as improved authentication of client and server. The SSLOptions option list is passed to ssl_context/3. The port option of the main option list is forwarded to the SSL layer. See the library(ssl) library for details.
- http_server_property(?Port, ?Property)
- True if Property is a property of the HTTP server running at Port. Defined properties are:
- goal(:Goal)
- Goal used to start the server. This is often http_dispatch/1.
- scheme(-Scheme)
- Scheme is one of http or https.
- start_time(-Time)
- Time-stamp when the server was created. See format_time/3 for creating a human-readable representation.
- http_workers(+Port, ?Workers)
- Query or manipulate the number of workers of the server identified by Port. If Workers is unbound it is unified with the number of running workers. If it is an integer greater than the current size of the worker pool, new workers are created with the same specification as the running workers. If the number is less than the current size of the worker pool, this predicate inserts a number of 'quit' requests in the queue, discarding the excess workers as they finish their jobs (i.e. no worker is abandoned while serving a client).
This can be used to tune the number of workers for performance. Another possible application is to reduce the pool to one worker to facilitate easier debugging.
- http_add_worker(+Port, +Options)
- Add a new worker to the HTTP server for port Port. Options
overrule the default queue options. The following additional options are
processed:
- max_idle_time(+Seconds)
- The created worker will automatically terminate if there is no new work within Seconds.
- http_stop_server(+Port, +Options)
- Stop the HTTP server at Port. Halting a server is done gracefully, which means that requests being processed are not abandoned. The Options list is for future refinements of this predicate such as a forced immediate abort of the server, but is currently ignored.
- http_current_worker(?Port, ?ThreadID)
- True if ThreadID is the identifier of a Prolog thread serving Port. This predicate exists to allow arbitrary interaction with the worker threads for development and statistics.
- http_spawn(:Goal, +Spec)
- Continue handling this request in a new thread running Goal. After http_spawn/2, the worker returns to the pool to process new requests. In its simplest form, Spec is the name of a thread pool as defined by thread_pool_create/3. Alternatively it is an option list, whose options are passed to thread_create_in_pool/4 if Spec contains pool(Pool) or to thread_create/3 if the pool option is not present. If the dispatch module is used (see section 3.2), spawning is normally specified as an option to the http_handler/3 registration.
We recommend the use of thread pools. They allow registration of a set of threads using common characteristics, specify how many can be active and what to do if all threads are active. A typical application may define a small pool of threads with large stacks for computation intensive tasks, and a large pool of threads with small stacks to serve media. The declaration could be the one below, allowing for at most 3 concurrent solvers with a maximum backlog of 5, and at most 30 concurrent tasks creating image thumbnails.
    :- use_module(library(thread_pool)).
    :- use_module(library(http/http_dispatch)).   % provides http_handler/3

    :- thread_pool_create(compute, 3,
                          [ local(20000), global(100000), trail(50000),
                            backlog(5)
                          ]).
    :- thread_pool_create(media, 30,
                          [ local(100), global(100), trail(100),
                            backlog(100)
                          ]).

    :- http_handler('/solve',     solve,     [spawn(compute)]).
    :- http_handler('/thumbnail', thumbnail, [spawn(media)]).
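The management predicates described in this section can be combined into a small lifecycle sketch: start a server with explicit pool and timeout options, inspect it, shrink the worker pool and stop the server again. The predicate name demo_lifecycle/0 and the option values are assumptions chosen for illustration, not recommended settings.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

% demo_lifecycle: hypothetical walk-through of the server
% management predicates of library(http/thread_httpd).
demo_lifecycle :-
    http_server(http_dispatch,
                [ port(Port),              % unbound: select a free port
                  workers(4),              % illustrative pool size
                  timeout(30),             % seconds of allowed inactivity
                  keep_alive_timeout(2)
                ]),
    http_server_property(Port, scheme(Scheme)),
    http_workers(Port, Count),             % query the pool size
    format("~w server on port ~w with ~w workers~n",
           [Scheme, Port, Count]),
    http_workers(Port, 1),                 % shrink, e.g. for debugging
    forall(http_current_worker(Port, Id),  % excess workers quit lazily
           format("worker thread: ~w~n", [Id])),
    http_stop_server(Port, []).            % graceful shutdown
```

Because excess workers leave only after processing a 'quit' request, the enumeration directly after shrinking the pool may still list more than one thread.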
3.14.3 library(http/http_unix_daemon): Run SWI-Prolog HTTP server as a Unix system daemon
- See also
- The file <swi-home>/doc/packages/examples/http/linux-init-script provides a /etc/init.d script for controlling a server as a normal Unix service.
- To be done
- Cleanup issues wrt. loading and initialization of xpce.
This module provides the logic that is needed to integrate a process into the Unix service (daemon) architecture. It deals with the following aspects, all of which may be used/ignored and configured using commandline options:
- Select the port(s) to be used by the server
- Run the startup of the process as root to perform privileged tasks and the server itself as unprivileged user, for example to open ports below 1000.
- Fork and detach from the controlling terminal
- Handle console and debug output using a file and/or the syslog daemon.
- Manage a pid file
The typical use scenario is to write a file that loads the following components:
- The application code, including http handlers (see http_handler/3).
- This library
In the code below, :- [load]. loads the remainder of the webserver code. This is often a sequence of use_module/1 directives.
    :- use_module(library(http/http_unix_daemon)).
    :- [load].
The program entry point is http_daemon/0, declared using initialization/2. This may be overruled using a new declaration after loading this library. The new entry point will typically call http_daemon/1 to start the server in a preconfigured way.
    :- use_module(library(http/http_unix_daemon)).
    :- initialization(run, main).

    run :-
        ...
        http_daemon(Options).
Now, the server may be started using the command below. See http_daemon/0 for supported options.
% [sudo] swipl mainfile.pl [option ...]
Below are some examples. Our first example is completely silent, running on port 80 as user www.
% swipl mainfile.pl --user=www --pidfile=/var/run/http.pid
Our second example logs HTTP interaction with the syslog daemon for debugging purposes. Note that the argument to --debug= is a Prolog term and must often be escaped to avoid misinterpretation by the Unix shell. The debug option can be repeated to log multiple debug topics.
% swipl mainfile.pl --user=www --pidfile=/var/run/http.pid \
      --debug='http(request)' --syslog=http
Broadcasting: The library uses broadcast/1 to allow hooking certain events:
- http(pre_server_start)
- Run after fork, just before starting the HTTP server. Can be used to load additional files or perform additional initialisation, such as starting additional threads. Recall that it is not possible to start threads before forking.
- http(post_server_start)
- Run after starting the HTTP server.
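A minimal sketch of hooking these events, assuming listen/2 from library(broadcast). The predicates before_start/0 and after_start/0 and the debug topic myapp are hypothetical placeholders for application-specific initialisation.

```prolog
:- use_module(library(broadcast)).
:- use_module(library(debug)).

% Register listeners for the two events broadcast by the daemon library.
:- listen(http(pre_server_start),  before_start).
:- listen(http(post_server_start), after_start).

% Hypothetical hooks: replace the bodies with real initialisation,
% such as loading additional files or starting threads.
before_start :-
    debug(myapp, "Initialising before the HTTP server starts", []).

after_start :-
    debug(myapp, "HTTP server is running", []).
```

The messages are only printed if the (assumed) topic is enabled, for example by starting the daemon with --debug=myapp.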
- http_daemon
- Start the HTTP server as a daemon process. This predicate processes the commandline arguments below. Commandline arguments that specify servers are processed in the order they appear using the following schema:
  - Arguments that act as default for all servers.
  - --http=Spec or --https=Spec is followed by arguments for that server until the next --http=Spec or --https=Spec or the end of the options.
  - If no --http=Spec or --https=Spec appears, one HTTP server is created from the specified parameters.
Examples:
    --workers=10 --http --https
    --http=8080 --https=8443
    --http=localhost:8080 --workers=1 --https=8443 --workers=25
- --port=Port
- Start HTTP server at Port. It requires root permission and the option --user=User to open ports below 1000. The default port is 80. If --https is used, the default port is 443.
- --ip=IP
- Only listen to the given IP address. Typically used as --ip=localhost to restrict access to connections from localhost if the server itself is behind an (Apache) proxy server running on the same host.
- --debug=Topic
- Enable debugging Topic. See debug/3.
- --syslog=Ident
- Write debug messages to the syslog daemon using Ident.
- --user=User
- When started as root to open a port below 1000, this option must be provided to switch to the target user for operating the server. The following actions are performed as root, i.e., before switching to User:
  - open the socket(s)
  - write the pidfile
  - set up syslog interaction
  - read the certificate, key and password file (--pwfile=File)
- --group=Group
- May be used in addition to --user. If omitted, the login group of the target user is used.
- --pidfile=File
- Write the PID of the daemon process to File.
- --output=File
- Send output of the process to File. By default, all Prolog console output is discarded.
- --fork[=Bool]
- If given as --no-fork or --fork=false, the process runs in the foreground.
- --http[=(Bool|Port|BindTo:Port)]
- Create a plain HTTP server. If the argument is missing or true, create at the specified or default address. Else use the given port and interface. Thus, --http creates a server at port 80, --http=8080 creates one at port 8080 and --http=localhost:8080 creates one at port 8080 that is only accessible from localhost.
- --https[=(Bool|Port|BindTo:Port)]
- As --http, but creates an HTTPS server. Use --certfile, --keyfile, --pwfile, --password and --cipherlist to configure SSL for this server.
- --certfile=File
- The server certificate for HTTPS.
- --keyfile=File
- The server private key for HTTPS.
- --pwfile=File
- File holding the password for accessing the private key. This is preferred over using --password=PW as it allows using file protection to avoid leaking the password. The file is read before the server drops privileges when started with the --user option.
- --password=PW
- The password for accessing the private key. See also --pwfile.
- --cipherlist=Ciphers
- One or more cipher strings separated by colons. See the OpenSSL documentation for more information. Starting with SWI-Prolog 7.5.11, the default value is always a set of ciphers that was considered secure enough to prevent all critical attacks at the time of the SWI-Prolog release.
- --interactive[=Bool]
- If true (default false) implies --no-fork and presents the Prolog toplevel after starting the server.
- --gtrace=[Bool]
- Use the debugger to trace http_daemon/1.
- --sighup=Action
- Action to perform on kill -HUP <pid>. Default is reload (running make/0). Alternative is quit, stopping the server.
Other options are converted by argv_options/3 and passed to http_server/1. For example, this allows for:
- --workers=Count
- Set the number of workers for the multi-threaded server.
http_daemon/0 is defined as below. The start code for a specific server can use this as a starting point, for example for specifying defaults.
    http_daemon :-
        current_prolog_flag(argv, Argv),
        argv_options(Argv, _RestArgv, Options),
        http_daemon(Options).
- See also
- http_daemon/1
- http_daemon(+Options)
- Start the HTTP server as a daemon process. This predicate processes a Prolog option list. It is normally called from http_daemon/0, which derives the option list from the command line arguments.
Error handling depends on whether or not interactive(true) is in effect. If so, the error is printed before entering the toplevel. In non-interactive mode this predicate calls halt(1).
- [semidet,multifile]http_certificate_hook(+CertFile, +KeyFile, -Password)
- Hook called before starting the server if the --https option is used. This hook may be used to create or refresh the certificate. If the hook binds Password to a string, this string will be used to decrypt the server private key as if the --password=Password option was given.
- [semidet,multifile]http_server_hook(+Options)
- Hook that is called to start the HTTP server. This hook must be compatible with http_server(Handler, Options). The default is provided by start_server/1.
- [multi,multifile]http:sni_options(-HostName, -SSLOptions)
- Hook to provide Server Name Indication (SNI) for TLS servers. When starting an HTTPS server, all solutions of this predicate are collected and a suitable sni_hook/1 is defined for ssl_context/3 to use different contexts depending on the host name of the client request. This hook is executed before privileges are dropped.
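The definition of http_daemon/0 above is suggested as a starting point for an application-specific entry point that supplies defaults. A sketch under stated assumptions follows: the predicate name run/0, the default values, and the use of merge_options/3 from library(option) to let command line options take precedence are illustrative choices, not part of the daemon library.

```prolog
:- use_module(library(http/http_unix_daemon)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(main)).      % argv_options/3
:- use_module(library(option)).    % merge_options/3

% Overrule the default entry point (http_daemon/0) after loading
% the daemon library.
:- initialization(run, main).

% run: hypothetical entry point. Options given on the command line
% win over the assumed defaults in the second list.
run :-
    current_prolog_flag(argv, Argv),
    argv_options(Argv, _RestArgv, Options0),
    merge_options(Options0, [port(8080), workers(4)], Options),
    http_daemon(Options).
```

Note that merge_options/3 returns a sorted option list, so the order of the command line options is not preserved. This sketch therefore suits command lines describing a single server; it is not suitable where the order matters, as with several --http or --https options.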
3.14.4 From (Unix) inetd
All modern Unix systems handle a large number of the services they run through the super-server inetd. This program reads /etc/inetd.conf and opens server-sockets on all ports defined in this file. As a request comes in it accepts it and starts the associated server such that standard I/O refers to the socket. This approach has several advantages:
- Simplification of servers
  Servers don't have to know about sockets and socket operations.
- Centralised authorisation
  Using tcpwrappers simple and effective firewalling of all services is realised.
- Automatic start and monitor
  The inetd automatically starts the server 'just-in-time' and starts additional servers or restarts a crashed server according to the specifications.
The very small generic script for handling inetd based connections is in inetd_httpd, defining http_server/1:
- http_server(:Goal, +Options)
- Initialises and runs http_wrapper/5 in a loop until failure or end-of-file. This server does not support the Port option as the port is specified with the inetd configuration. The only supported option is After.
Here is the example from demo_inetd:

    #!/usr/bin/pl -t main -q -f

    :- use_module(demo_body).
    :- use_module(inetd_httpd).

    main :-
        http_server(reply).
With the above file installed in /home/jan/plhttp/demo_inetd, the following line in /etc/inetd.conf enables the server at port 4001 guarded by tcpwrappers. After modifying the inetd configuration, send the daemon the HUP signal to make it reload its configuration. For more information, please check inetd.conf(5).
4001 stream tcp nowait nobody /usr/sbin/tcpd /home/jan/plhttp/demo_inetd
3.14.5 MS-Windows
There are rumours that inetd has been ported to Windows.
3.14.6 As CGI script
To be done.
3.14.7 Using a reverse proxy
There are several options for public deployment of a web service. The main decision is whether to run it on a standard port (port 80 for HTTP, port 443 for HTTPS) or a non-standard port such as 8000 or 8080. Using a standard port below 1000 requires root access to the machine, and prevents other web services from using the same port. On the other hand, using a non-standard port may cause problems with intermediate proxy and/or firewall policies that may block the port when you try to access the service from some networks. In both cases, you can use either a physical or a virtual machine, running for example under VMware or Xen, to host the service. Using a dedicated (physical or virtual) machine to host a service isolates security threats. Isolation can also be achieved using a Unix chroot environment, which is however not a security feature.
To make several different web services reachable on the same (either standard or non-standard) port, you can use a so-called reverse proxy. A reverse proxy uses rules to relay requests to other web services that use their own dedicated ports. This approach has several advantages:
- We can run the service on a non-standard port, but still access it (via the proxy) on a standard port, just as for a dedicated machine. We do not need a separate machine though: We only need to configure the reverse proxy to relay requests to the intended target servers.
- As the main web server is doing the front-line service, the Prolog server is normally protected from malformed HTTP requests that could result in denial of service or otherwise compromise the server. In addition, the main web server can transparently provide encodings such as compression to the outside world.
Proxy technology can be combined with isolation methods such as dedicated machines, virtual machines and chroot jails. The proxy can also provide load balancing.
Setting up an Apache reverse proxy
The Apache reverse proxy setup is really simple. Ensure the modules proxy and proxy_http are loaded. Then add two simple rules to the server configuration. Below is an example that makes a PlDoc server on port 4000 available from the main Apache server at port 80.
    ProxyPass        /pldoc/ http://localhost:4000/pldoc/
    ProxyPassReverse /pldoc/ http://localhost:4000/pldoc/
Apache rewrites the HTTP headers passing by, but using the above rules it does not examine the content. This implies that URLs embedded in the (HTML) content must use relative addressing. If the locations on the public and Prolog server are the same (as in the example above) it is allowed to use absolute locations. I.e. /pldoc/search is ok, but http://myhost.com:4000/pldoc/search is not. If the locations on the server differ, locations must be relative (i.e. not start with /).
This problem can also be solved using the contributed Apache module proxy_html that can be instructed to rewrite URLs embedded in HTML documents. In our experience, this is not trouble-free as URLs can appear in many places in generated documents. JavaScript can create URLs on the fly, which makes rewriting virtually impossible.
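On the Prolog side, a server meant to sit behind such a proxy might look like the sketch below. It assumes the multi-threaded server, binds to localhost only using the Host:Port form of the port option so that clients must go through the proxy, and emits a link that stays valid because the public and Prolog locations are the same. The handler hello/1, its location and the predicate start_backend/0 are hypothetical.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/html_write)).

% Hypothetical handler below the proxied /pldoc/ location.
:- http_handler('/pldoc/hello', hello, []).

% The link uses a location without scheme, host or port, so it
% resolves correctly both directly and through the proxy.
hello(_Request) :-
    reply_html_page(title('Hello'),
                    [ p(a(href('/pldoc/search'), 'Search')) ]).

% start_backend: bind to localhost only; port 4000 matches the
% ProxyPass rules in the Apache example.
start_backend :-
    http_server(http_dispatch, [port(localhost:4000)]).
```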