Abstract

Brain Aid Prolog Abstract

This file is intended to give you a brief idea about BAP. Please see the tutorial for details.

System Components

The BAP system consists of the following components:

Low Level Message Router

The router provides the connection between processes. Processes are identified by a unique ID, allowing messages to be sent to any processes at any time. The low level router works transparently and allows to see any kind of network as a 'pool of processes'. The actual topology may affect communication performance, but is in general hidden to the user (it is possible to place processes directly on certain processors).

Operating System

Responsible for memory management, loading and execution of applications and the process management.

PROLOG Runtime System

This component includes several libraries, predefined predicates such as i/o functions, object management and an interface to C - routines. Code modules can be recompiled at runtime and replaced even in a running application.

PROLOG Compiler

The compiler is the central part of BAP. The BAP PROLOG dialect is based on the standard as defined by Clocksin and Mellish, 1984. Variables are not typed, but an optional type checking (for debugging purposes) is available. The BAP compiler is written in PROLOG and runs as an application on the Transputer system. It is not necessary to leave the development environment or reset the Transputer network to run the compiled programs.

Host Server System

The I/O server system is available for MS WINDOWS and X Windows. Together with the runtime system, the host server provides a comfortable development environment, including source level debugging of multiple processes. Other features are graphics output, file I/O to the host file system and all the little details provided by modern windows environments.

Concept

The concepts used to implement BAP are based on the idea that programming on parallel systems should be as easy as programming on sequential computers. Although parallelism seems to be inherent in any natural process, the programmers mind is trained to break down a problem into small steps which are executed sequentially. We try to use both ideas by letting the programmer decide at which level the parallelism should start. Systems that provide automatic parallelization will never be able to optimize code as well as a programmer with all his knowledge of the problem. We do not think it wise to ignore the programmer's intelligence. As a result we decided to base our system on the well known sequential PROLOG language and provide efficient extensions for communication and synchronization.

Closer Look

The Compiler produces genuine assembler code for the Transputer. This allows direct access to the Transputer communication hardware.

The process of compiling PROLOG files can be executed concurrently, as single predicates do not contain dependencies needed before linking. The source files are splitted into smaller chunks by the use of keywords, which for example indicate the start of a new predicate description. These chunks may be treated separately until linking is required. Furthermore, scanning, parsing, code generation and linking may be executed in pipelines.

PROLOG programs are built up of MODULES with external references made public by IMPORT and EXPORT statements.

These modules are loaded on demand from the host server or from neighbour nodes that loaded these modules before. The exported predicates can be called by name. This allows interpreting shells and interactive debugging. Single modules may be replaced by newer versions even in a running application. This allows immediate examination of changes made.

The debugger traces processes at source level. Several debugging sessions can run parallel on the host server, enabling comfortable tracing of complex parallel constructs. Debug windows show the executed sources as well as the parameters actually passed to and returned from subgoals. Trace can be switched on and off by the host system and/or the program. Another debugging feature is the message spy facility that can be used to intercept messages to any object.

There are no limitations to the number of PROLOG processes or objects started on each Transputer node, except for the amount of memory available. This allows the development of even highly parallelized applications on small machines. They can later be executing on larger networks.

Performance

Sequential Performance

Some standard benchmarks might give an idea of the current speed of the system. The listing given in Appendix A has been compiled exactly as printed here, so it also demonstrates the use of BAP. The benchmarks are sequential PROLOG programs running on a single Transputer T800 20MHz. The benchmarks were taken from 'IMPLEMENTING PROLOG v2 by D. Warren, D.A.I Research Report No. 40'.

The output after a bench(1000) call was the following:

Tests: QSORT(list50) 
Time: 35.70 ms
Tests: NREVERS(list30)
Time: 32.85 ms
Tests: DIRIV(times10,x,Y)
Time: 3.13 ms 
Tests: DIRIV(divide10,x,Y)
Time: 3.40 ms
Tests: DIRIV(log10,x,Y)
Time: 2.25 ms
Tests: DIRIV(ops8,x,Y)
Time: 2.25 ms

We expect speed improvements with the introduction of typed variables related code optimizations. Some of the optimizations possible with sequential machines cannot be used, for instance an atom table is not possible without a shared memory between processors.

Parallel Performance

The following example shows the well known queens problem solved by a simple farm concept built from message passing primitives. The 'exec' predicate interprets its arguments as a goal to be executed on a concurrent process. 'Send_msg' and 'rec_msg' are used to synchronize the termination of the main process to enable timing. They could be used to collect the results by the client process, but in this case we preferred to direct output directly onto the screen.

A call to pqueens(8) is executed in 77.3 seconds on a single T800 20MHz. With 4 nodes in a pipeline topology the program reached a time of 20.4 seconds.