AN IMPLEMENTATION OF ALGOL 60
FOR THE FP6000

R. D. MOORE

Feranti Electronics, Toronto

ABSTRACT
REASONS FOR A STACK SCHEME
THE ADMINISTRATION OF THE STACK
NON-LOCAL IDENTIFIERS
PARAMETEBS
GO TO STATEMENTS
RESULTANT LIST STRUCTURE IN STACK
ARRAYS AND BLOCKS
OWN ARRAYS
REFERENCES

ABSTRACT

The paper covers some of the techniques used in the implementation of the language known as Algol 6000. The language contains almost all of the features of Algol 60. The most restrictive difference is the requirement that specifications appear for all formal parameters.

The FP6000 is a single address computer with index registers. Execution of the object program produced by the compiler does not depend upon an interpreter or special hardware features. Storage used by the program is divided into five sections. One section of fixed length contains values of own variables. A section of variable length contains information about the dynamic state of the program, that is, values of local variables and value parameters, pointers to name parameter evaluation routines, procedure return and storage reservation information, and pointers to information in the array section. Another section of variable length contains values of local and own arrays, and subscript computation data. A short section indicates what storage is currently in use. The fifth section contains constants and instructions. The detailed functions of each of these sections are covered in the paper.

The compiler is a separable program, consisting of two "passes" which operate concurrently. The translation from infix to postfix notation is effected by Floyd production tables. A table scanning mechanism similar to that of a conditional macro assembler is used in the translation from postfix notation to machine language.

THIS PAPER DESCRIBES the object programs of an Algol compiler. The implementations of the general call-by-name, designational expressions, and arrays are discussed. The methods illustrated here are usable on most machines but favour those with index registers. Most features of the language can be handled without recourse to an object program interpreter. Special housekeeping subroutines are used only for array declarations, undefined switch designators, and procedure headings. Subroutines are used in the latter two cases solely to conserve storage.

Some of the complications in Algol implementation arise from recursion and the hypothetical ubiquity of function designators. The main effect of recursion is that each recursive call of a procedure creates a new copy of the non-own variables which are local to that procedure. When discussing or executing an Algol program containing recursion, it is not sufficient to know which statement is being executed. One must also know which copies of the variables belonging to various procedures are in use.

A few terms will be defined for the purposes of this paper: The current procedure is the procedure in which control resides at the time under discussion.

(The outermost block is considered a procedure for these purposes.) An active procedure, or active copy of a procedure, is a procedure which has been called and from which there has been no exit via the return mechanism at its end or a GO TO statement. Note that in the event of recursion there are several copies of the same procedure. An encloser of a procedure is a procedure which contains that procedure. The immediate encloser of a procedure is that encloser of the procedure which is enclosed by all other enclosers of that procedure. For example, in the program of Fig. 1, P1 is the immediate encloser

begin Boolean FIRST;

procedure P1 (PP);

procedure PP;
begin real A;
procedure P2(B);

real B; comment P1 is immediate encloser of P2;
PRINTOUT(A, B);

if FIRST then

begin comment first call of P1;

FIRST: = false; A: = 2; P1(P2) end

else begin comment second call of P1;

A: = 3; PP(A) end

end P1;

FIRST: = true;
P1 (cos)

end

FIG. 1

of P2. Identifiers which are "Declared for a Block" (3, section 5) are local to that block. The successor of a statement is the statement which is executed after that statement. An inverted tree is a tree with one leaf and several roots.

REASONS FOR A STACK SCHEME

The use of a storage in Algol has a simple pattern: If a procedure or block is not active, it does not need to have storage reserved for it. When a block is activated, storage must be provided for it. From the current block, control can proceed in one of two directions: (1) An exit might be made from the current block; then storage reserved for the current block can be released. (2) Another block might be activated, in which case storage must be assigned for that block. Thus the order in which blocks are left is the exact opposite of the order in which they are entered. This allows for a simple storage allocation algorithm to be used for block and procedures.

There are some specific problems which must be solved by the storage referencing scheme.

The addressing system should be biased towards speed of addressing variables local to the current procedure or block and the rapid evaluation of arithmetic expressions. The reason for this bias is that these are the two most frequent operations in non-hypothetical Algol programs.

The use of procedure and variable names as actual parameters allows reference to copies of variables which are not the most recently created ones. The program of Fig. 1 is an example of this.

Upon exit from a procedure, the successor of the procedure must have access to the correct copy of its variables. This is trivial unless exit from a procedure is via a GO TO statement.

A stacking technique is used for dynamic storage allocation. It is easy for the compiler to reserve one relatively large area for the storage requirements of all blocks and procedures. Storage is allocated at run time by starting at one end of this area and working towards the other end. A register called S is used to mark the boundary between storage which is in use and that which is not. It is this register S which characterizes a stacking scheme. The stack is that portion of the large storage area which has been allocated. Its lower bound is the lower bound of the storage area and its upper bound is the cell designated by S. A relativisor is used to address each assigned area.

A portion of the compiled program which needs to use some of the unassigned storage must adjust the S register to reflect the amount taken and check to see that the storage is available. On most machines this operation takes about the same time as a division. When a routine wishes to release storage, it must restore the S register to the value which it had before the storage was taken. Because on most computers the operations of storage allocation and release are not instantaneous, considerations of running time discourage the use of these operations while evaluating simple arithmetic expressions.

THE ADMINISTRATION OF THE STACK

The maximum storage requirements of a procedure can be determined when the procedure is entered. The number of locations needed for simple variables, special housekeeping cells, and storage of intermediate results in arithmetic expressions can be determined at compile time. The size of the arrays declared in the procedure's outer block is known before the body of that block is executed. Thus each procedure can take what storage it needs upon entrance. The operations required are:

Relativisor : = S ;

STHIS : == S : = S + fixed storage requirements + array storage.

STHIS is stored in the stack. The use of STHIS will be explained below. The release of storage upon normal exit from the procedure is accomplished by:

S : = Relativisor.

The relativisor goes into an index register. Thus references to local variables are just index-modified instructions. The non-own variables of a procedure or block are stored in the stack and it is the relativisor which points to the place in the stack where they are stored. When a procedure is called, it stores some information about its caller in the stack. One cell is reserved for the return address, another called LASTR for the value of the caller's relativisor. These are restored upon return from the procedure. This is the extent of the mechanism required to handle simple recursion.

NON-LOCAL IDENTIFIERS

References to identifiers which are not local to the current procedure require additional mechanisms. A block or procedure can refer directly only to identifiers which are local to itself or to one of its enclosers.

If the relativisor of the block which contains the referenced identifiers is known to the current block, this relativisor can be loaded into an index register. Then variables corresponding to these identifiers can be examined or altered with index-modified instructions. One way to give a procedure access to the relativisors of its enclosers is through a list structure. If every active procedure has a standard cell in the stack called BIND which contains a copy of its immediate encloser's relativisor, the pertinent relativisor of every encloser of the current procedure can be found by progressing down the structure formed by the BIND cells.

As an example, take the program of Fig. 1. Assume that the procedure P2 contained a reference to the identifier FIRST, which is local to the block immediately enclosing PI. This reference would cross two procedure boundaries. Code in P2 would load P2's BIND into an index register. Then P1's BIND could be accessed and loaded into an index register. This is the relativisor which is needed to access FIRST. An index-modified instruction can now reference FIRST. The number of steps in this process is equal to the number of procedure boundaries which are crossed by the reference. This number is known to the compiler although the order in which the procedures will be called is not.

The contents of the BIND cell for each call of a procedure are determined by the following method. When the name of a procedure is used in a procedure statement or function designator, this procedure is local either to the current block or to some block which encloses the current block. If the procedure is local to the current block, the relativisor of the current block is passed as a parameter to the procedure being called. If the procedure is local to some enclosing block, then the relativisor of the block which is the immediate encloser of the called procedure is located by moving down the BIND list. In any case the called procedure receives the pertinent relativisor as a parameter. Note that the mechanism for obtaining the correct relativisor to pass as a parameter is the same as the mechanism for accessing a variable.

The compiler can produce the required instructions because in this case it knows how many procedure boundaries, if any, are crossed by the call. This mechanism is also used for references to switch identifiers and labels which are not local to the current block.

The only way, other than procedure statements and function designators, in which a procedure identifier may be used, is as an actual parameter to some second procedure. When this happens, the address of the first instruction of the first procedure and the relativisor of its immediate encloser are stored in the stack and their location in the stack is used as the actual parameter. When the corresponding formal parameter is used in a procedure statement or function designator, the first procedure is entered with the relativisor which was stored on the stack as a parameter.

Without this mechanism or its equivalent the program of Fig. 1 could not be executed correctly. The procedure P2 is activated by its being used as an actual parameter to its immediate encloser. The copy rule dictates P2's use of the copy of A created by the first call of PI, not the second.

PARAMETEBS

Actual parameters which are called by value can be treated as name parameters by the calling sequence. Only the called procedure needs to know which parameters are called by value. The requirements of the general call by name imply that in the general case the caller must supply a subroutine to evaluate each actual parameter. The called procedure will call this parameter subroutine every time the corresponding formal parameter is referenced.

A parameter subroutine must have access to the same identifiers to which the caller of the procedure had access. This means that either the parameter subroutine must be able to access the caller's relativisor or this relativisor must be supplied as a parameter to the parameter subroutine. Upon return from the parameter subroutine, the relativisor being used by the called procedure must also be restored. The LASTR cell of a procedure is the relativisor required by the parameter subroutine. After obtaining its relativisor by using the called procedure's relativisor and LASTR, the parameter subroutine saves the procedure's relativisor and the subroutine's return address on the stack. This allows one parameter subroutine to call another.

If a parameter subroutine calls a procedure, the compiler must cause the parameter subroutines of the interior procedure call to store their return information in a stack position other than that used by the outer parameter subroutine. This can be done at compile time because the compiler knows the maximum depth of this kind of nesting. Since all of the storage requirements of the parameter subroutines are known at compile time, there is no need for a parameter subroutine to adjust S.

Parameter subroutines for switch identifiers and designational expressions do not need to save a return address and called procedure's relativisor, for the reasons mentioned below. They do use the procedure's LASTR cell to restore the relativisor which was in use at the time the procedure was called.

GO TO STATEMENTS

Section 4.3.5 of the Algol report (1) requires a GO TO statement involving an undefined switch designator to be equivalent to a dummy statement. Since function designators can alter own variables this is impossible, but a reasonable interpretation of the report would require the return of control to the statement which follows the GO TO statement.

A transfer out of a procedure must restore an old relativisor so that the proper copies of variables can be addressed. This is handled by retrieving the pertinent relativisor, using either the LASTR or the BIND cells, depending upon whether or not the designational expression is a formal parameter.

When control is returned to the statement which follows the undefined GO TO statement, the variables which were available to the GO TO statement should still be available. To ensure this, storage in use at the time of the GO TO statement was encountered is not released when the GO TO statement is undefined. This is implemented by delaying the release of storage until the designational expression selects a label and control passes to it.

As mentioned above, when a procedure is left by executing its last statement, the storage which was used by the procedure is released with the statement:

S : == Relativisor. When a GO TO statement in the procedure references a formal parameter of some procedure or a switch identifier local to some other block, the procedure's storage cannot be released until all pertinent switch designators are known to be defined. A switch designator is known to be defined when it selects a label which is not a formal parameter. The compiler knows if a label could be reached by a GO TO statement which occurs in a block other than the one in which the label is defined. The compiler can then generate instructions at the label definition which will cause the unneeded storage to be released.

Every active copy of a procedure has a special cell called STHIS which points to the cell just beyond the uppermost stack cell used by this active copy. The STHIS of the current block normally has the same value as S has. All procedures or blocks which were activated after the current block was activated used storage beginning with the cell designated by the STHIS of the current block. Therefore if one of those later activated blocks did not release its storage, the statement: S : = STHIS will release that storage. This is the statement which the compiler generates at certain label definitions.

The STHIS cells act as a linked list. When an undefined switch designator is encountered, S will have the same value as it had at the GO TO statement, and the corresponding relativisor can be found by searching the STHIS list.

When this relativisor has been found, the housekeeping cells of the block in which the undefined GO TO statement occurred can be addressed.

If a switch designator is undefined, the successor of the GO TO statement in which it was used should be the following statement. This following statement can be located if instructions in a GO TO statement, which might be undefined, store the location of the GO TO statement on the stack. The location must be stored on the stack because a designational expression in a switch designator might contain a function designator. The procedure called by the function designator might contain another such GO TO statement.

Note that this scheme allows designational expressions which appear in switch lists or as actual parameters to be compiled as simple GO TO statements without sacrificing the ability to continue after encountering an undefined switch designator. See Watt's paper (2) for the exact details on how a designational expression can be reduced to simpler GO TO statements.

RESULTANT LIST STRUCTURE IN STACK

The STHIS cells form a simple list which includes all active blocks in their order of activation. The last element of the list points to the same cell that S points to.

The BIND cells form an inverted tree. Locally an inverted tree always has the appearance of a list. Its structure is similar to the tree structure of the static program. Recursion appears as duplication of the tree structure in that a recursive call will result in two links pointing to the same immediate encloser.

The structure formed by the LASTR cells is normally the inverse of the STHIS list. When a parameter subroutine calls a procedure, LASTR is an inverted tree which traces active calls from called to caller.

Note that all three of these structures are formed with copies of relativisors. These list structures constitute the major difference between the tabular stack organization described by Watt (2) and the present system.

ARRAYS AND BLOCKS

There are two alternative methods of handling blocks which are not procedures. These methods affect the manner in which arrays are stored.

Blocks can be treated as procedures and given a relativisor. This allows non-own arrays of variable size to be stored in the stack conveniently. References to variables which are local to the immediate encloser of a simple block are not as rapid as under other schemes because the BIND mechanism must be used.

Blocks are considerably simpler than procedures in that a particular block may be activated from only one place in the program. At the time the block is activated, its immediate encloser will always be the current procedure. If the immediate encloser of a block and all intervening enclosers of that block use a constant amount of stack space, the difference between a relativisor for the block and the corresponding BIND will be constant. The compiler knows this constant and can encode the instructions of the block in such a manner that the relativisor of the block's immediate encloser can be used to address the variables local to the block. The block no longer needs a relativisor of its own. This scheme speeds the addressing of variables local to the block's immediate encloser by bypassing the BIND mechanism for this case. It is then difficult to store arrays of variable size on the stack.

The first of these two schemes is easier to implement but "produces an object program which runs more slowly" for the reason stated above.

OWN ARRAYS

A separate stack can be used for arrays. The same block of storage is used by both attacks. One stack uses storage from the bottom portion of the block, the other uses the top portion. With two stacks storage release becomes a lengthier operation. Dynamic own arrays increase the complexity of storage allocation and release for the following reason.

Space is reserved for non-own arrays as the blocks in which they are declared are entered, and storage is released when the block is left. Recursion presents no problems in that multiple copies of an array can be created. Also, the storage to be released is always the most recently allocated. This pattern of storage use favours a stack scheme.

There is only one copy of an own array. Its storage requirements may be altered at almost any time but the array itself never vanishes completely, as does a non-own array.

Multiple stacks such that there is one stack for each own array with variable dimensions would be a nice feature, but this is not feasible on most computers. A compromise solution is to use a stack for non-own arrays and another contiguous area for own arrays. If the storage requirements of an own array change, the stack must be moved. The efficiency of this solution is quite low unless the Algol programmer avoids dynamic own array declarations at times when the non-own array stack is not empty. This solution is preferable to outlawing own arrays completely. Its main advantage is that addressing of own arrays is the same as for non-own arrays.

Simplified storage release upon exit from a block requires some arithmetic when the stack is movable. The quantity of storage being used for non-own arrays must be recorded, rather than the limit of the array stack. To release storage the limit of own array storage is added to the number saved at entrance to the block to get a new limit of array storage which excludes the non-own array storage used by the block being left.

Since the dimensions and location of an array are not necessarily known until its declaration has been interpreted, this information must be supplied to the object program by the subroutine which processes the array declaration. This information may be used by the object program or a special subroutine such as might be used to implement arrays called by value. The location of this information is supplied to a procedure when the name of the array is used as an actual parameter. To implement arrays called by value, this information should include an indication of the total size of the array. If there is only one copy of this information, the housekeeping involved in moving the array stack is simplified.

Much of Algol can be readily implemented on conventional machines without continual resort to an interpretive system. This is especially true if address modification can be easily specified. Most of the burden recursion is placed upon procedure heading. An Algol program which follows most of the Fortran restrictions on addressing of variables should not run substantially slower than a comparable Fortran program on a machine with index registers.

The scheme described above does not include parametic designational expressions which are called by value. If one believes that a value can be assigned to a label, modifications could be made to the code produced for function designators which are used in designational expressions and certain label definitions. No provision has been made for formal and actual parameters which are not of identical kind and type.

Some features of Algol, such as dynamic own arrays, are not very easy to implement. It is unfortunate for the implementer that features such as these are often of greater use to the programmer than more easily implemented features. The ability to change the direction of test of the STEP-UNTIL form of a FOR list element while inside the controlled statement is rarely, if ever, used. Yet it presents few difficulties to the implementer.

REFERENCES

1. NAUR, PETER, et al. "Revised Report on the Algorithmic Language ALGOL 60," Computer Journal, Jan. 1963, pp. 349-365.

2. WATT, J. M. "The Realization of ALGOL Procedures and Designationial Expressions," Computer Journal, Jan. 1963, pp. 332-337.

AN IMPLEMENTATION OF ALGOL 60FOR THE FP6000