APL hardware is hardware designed to support APL array operations natively. Such hardware challenges the popular view of APL as an interpreted language. Whereas x86 is primarily designed to operate on individual scalars one at a time, native APL architectures would operate on entire arrays at once, thereby speeding up APL processing.
The APL Machine
- Main article: APL Machine
The APL Machine was an actual (as opposed to theoretical) hardware implementation, created explicitly to make an analog array processor easier to program.
Cellular APL Computer
A 1970 paper describes a possible general design for a computer that implements a dialect of APL as its machine language. The design aims to exploit the inherent parallelism of APL by being flexible enough to operate on entire arrays. It is cellular, meaning that each component handles a separate part of the APL logic.
The specified design contains:
- a matrix logic-in-memory unit (MLIM)
- sixteen memory arrays (MA1, MA2..., MA16)
- an instruction memory unit (IMU)
- routing logic (RL)
- thirty-two vector accumulators (VA1, VA2..., VA32)
- input-output controllers (IOC)
- a pre-processor (PP)
The MLIM is a 32×32 array of memory cells. Each memory cell contains four shift registers, named A, B, C, and T, which is equivalent to four 32×32 arrays of memory cells with one shift register each. The arrays formed by registers A, B, and C store and operate on data, while T provides temporary array storage for the result of an operation. Each MLIM operation either reads from A and B and stores the result in C, or reads from C and B and stores the result in A. Before the MLIM performs an operation, the RL places each memory array at the correct locations in A, B, and C so that the operands line up.
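A minimal Python sketch of this arrangement follows. It is a software model under stated assumptions, not the paper's design: the four registers are treated as four 32×32 planes of Python integers (word width is not modelled), the hypothetical function `mlim_step` applies a cellwise operation in one of the two directions described above, and staging each result in T before copying it to the destination is an assumption about how T is used.

```python
from dataclasses import dataclass, field
from typing import Callable, List

SIZE = 32  # the MLIM is a 32x32 array of cells
Plane = List[List[int]]


def zero_plane() -> Plane:
    """A fresh SIZE x SIZE plane of zeros."""
    return [[0] * SIZE for _ in range(SIZE)]


@dataclass
class MLIM:
    """Hypothetical model: the A, B, C, T registers of all cells, viewed as four planes."""
    A: Plane = field(default_factory=zero_plane)
    B: Plane = field(default_factory=zero_plane)
    C: Plane = field(default_factory=zero_plane)
    T: Plane = field(default_factory=zero_plane)  # temporary result storage


def mlim_step(m: MLIM, op: Callable[[int, int], int], direction: str) -> None:
    """Apply op in every cell.

    "AB->C" computes C <- A op B; "CB->A" computes A <- C op B.
    Results are staged in T and then copied to the destination (an assumption).
    """
    if direction == "AB->C":
        src, dst = m.A, m.C
    elif direction == "CB->A":
        src, dst = m.C, m.A
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    for i in range(SIZE):
        for j in range(SIZE):
            m.T[i][j] = op(src[i][j], m.B[i][j])
    for i in range(SIZE):
        dst[i][:] = m.T[i]  # copy row i of T into the destination plane


# Usage: only cell [0][0] holds nonzero data, but every cell is updated.
m = MLIM()
m.A[0][0], m.B[0][0] = 3, 4
mlim_step(m, lambda a, b: a + b, "AB->C")  # C[0][0] is now 7
mlim_step(m, lambda a, b: a * b, "CB->A")  # A[0][0] is now 28
```

The two directions let a chain of operations alternate between C and A without an explicit move between steps.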
MA1 through MA16 are each 32×32 arrays of memory cells, each of which stores one word (16 bits). The total array storage of the computer is therefore 16,384 words.
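The storage figure follows directly from the array dimensions; the bit and byte totals below are plain arithmetic from the stated 16-bit word size rather than numbers quoted from the paper:

```latex
\begin{aligned}
16 \times (32 \times 32) &= 16 \times 1024 = 16{,}384 \text{ words} \\
16{,}384 \text{ words} \times 16 \text{ bits/word} &= 262{,}144 \text{ bits} = 32{,}768 \text{ bytes}
\end{aligned}
```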
The IMU is a temporary store for instructions, intended to "give the programmer a usable memory of 16,384 words." Each of its cells is a 32-bit read-only memory cell.
The RL is a specialized transfer system that can move data row-wise or column-wise to the MLIM or the VAs. It can also index into matrices and vectors during a transfer.
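The RL's row-wise and column-wise transfers, with optional indexing, can be sketched as follows. The function names `rl_row` and `rl_col` are hypothetical, indices are 0-origin Python indices, and a 4×4 list stands in for a 32×32 memory array; the paper's actual transfer mechanism is hardware, not software.

```python
from typing import List, Optional, Sequence

Matrix = List[List[int]]


def rl_row(mem: Matrix, i: int, index: Optional[Sequence[int]] = None) -> List[int]:
    """Transfer row i of a memory array, optionally selecting elements by index."""
    row = mem[i]
    return list(row) if index is None else [row[k] for k in index]


def rl_col(mem: Matrix, j: int, index: Optional[Sequence[int]] = None) -> List[int]:
    """Transfer column j of a memory array, optionally selecting elements by index."""
    col = [r[j] for r in mem]
    return col if index is None else [col[k] for k in index]


# Usage on a 4x4 stand-in for a 32x32 memory array; element [r][c] holds 10*r + c.
mem = [[10 * r + c for c in range(4)] for r in range(4)]
print(rl_row(mem, 2))          # [20, 21, 22, 23]
print(rl_col(mem, 1, [3, 0]))  # [31, 1]
```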
The IOC is a generalized input/output system that can be modified for any purpose. Input from the IOC is fed through the PP before it is fed into the RL. The PP would handle storage allocation, basic operations, and other operations which the MLIM cannot perform.
VA1 through VA32 each consist of two 32-bit registers, A and B. Register A of each accumulator is connected to the Routing and Control Logic board via a decoder. Each decoder is connected to its VA via a 32-bit bus, to the Routing and Control Logic board via a 32-bit output bus, and to the PP via a 5-bit input logic bus. If p is the value on the 5-bit bus, with 0 ≤ p ≤ 31, the decoder shifts the bits placed on the output bus so that it returns bits 32−p, 32−p+1, …, 32. In effect, it shifts the input left by p bits and masks out bit positions greater than or equal to 32, which is why this type of register is called a shift register. The decoders are considered part of the RL cell. Register B is connected directly to register A and has its own vector routing bus to the Routing and Control Logic board.
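The decoder behaviour described above, read as "shift left by p bits and mask out positions ≥ 32", can be sketched with ordinary integer operations. This assumes bit 0 is the least significant bit, which may not match the paper's own bit numbering; `decoder_shift` is a hypothetical name.

```python
WORD_MASK = 0xFFFF_FFFF  # keeps bit positions 0..31 only


def decoder_shift(word: int, p: int) -> int:
    """Shift a 32-bit word left by p bits, discarding bits moved to position 32 or above.

    p is the value on the 5-bit bus, so 0 <= p <= 31.
    """
    if not 0 <= p <= 31:
        raise ValueError("p must fit on the 5-bit bus (0..31)")
    return ((word & WORD_MASK) << p) & WORD_MASK


print(hex(decoder_shift(0x0000_00FF, 4)))  # 0xff0
print(hex(decoder_shift(0xFFFF_FFFF, 8)))  # 0xffffff00
print(hex(decoder_shift(0x8000_0001, 1)))  # 0x2 (the top bit is masked out)
```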
An important capability is that a vector of length ≤ 32 can be loaded right-justified into a VA and then read back at an offset. Because of register B, each accumulator can also perform reductions by repeatedly adding register A.
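The following sketch models right-justified loading, offset reading, and accumulation in a VA. It is an assumption-laden illustration: the register is a 32-slot Python list rather than a 32-bit word, and the loop that adds each element into a running total (standing in for register B) is one plausible reading of how reductions work, not a description taken from the paper. Reading the running total after every step gives a scan.

```python
from typing import List, Sequence, Tuple

WIDTH = 32  # element slots in one modelled VA register


def load_right_justified(v: Sequence[int]) -> List[int]:
    """Place v in the rightmost len(v) slots, zero-filling the slots to its left."""
    if len(v) > WIDTH:
        raise ValueError("vector longer than 32 elements")
    return [0] * (WIDTH - len(v)) + list(v)


def read_offset(reg: Sequence[int], n: int) -> List[int]:
    """Read the n right-justified elements, starting at offset WIDTH - n."""
    return list(reg[WIDTH - n:])


def reduce_and_scan(reg: Sequence[int], n: int) -> Tuple[int, List[int]]:
    """Add each element into a running total standing in for register B.

    The final total is the plus-reduction and the totals after each step form the
    plus-scan. Addition is associative, so this left-to-right order gives the same
    total as APL's right-to-left +/.
    """
    b = 0
    partials: List[int] = []
    for x in read_offset(reg, n):
        b += x
        partials.append(b)
    return b, partials


reg = load_right_justified([1, 2, 3, 4])
print(read_offset(reg, 4))      # [1, 2, 3, 4]
print(reduce_and_scan(reg, 4))  # (10, [1, 3, 6, 10])
```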
Values can be transferred using the
The MLIM natively supports the monadic operations:
and the dyadic operations below, where (C,A)←(A,C)+B means either C←A+B or A←C+B:
- (C,A)←(A,C)+B
- (C,A)←(A,C)-B
- (C,A)←(A,C)×B
- (C,A)←(A,C)÷B ⍝ when B ≠ 0
- (C,A)←(A,C)∨B
- (C,A)←(A,C)∧B
- (C,A)←(A,C)<B
- (C,A)←(A,C)=B
- (C,A)←(A,C)>B
These operations can be combined to implement the following functions:
- Monadic: +M ×M |M ?M ~M !M
- Dyadic: M+N M-N M×N M÷N M⌊N M⌈N M!N M*N M<N M≤N M=N M≥N M>N M≠N M∨N M∧N M⍱N M⍲N
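As an illustration of how such combinations might work, the sketch below builds M⌊N, M⌈N, M≤N, and M≠N using only the native dyadic operations +, -, ×, <, =, >, and ∨. These particular compositions are one possible choice and are not taken from the paper; arrays are flat Python lists, comparisons return 0 or 1, and the helper `ew` is hypothetical.

```python
import operator
from typing import Callable, Dict, List

Vec = List[int]
Op = Callable[[int, int], int]

# The native dyadic operations used below; comparisons yield 0 or 1.
NATIVE: Dict[str, Op] = {
    "+": operator.add,
    "-": operator.sub,
    "×": operator.mul,
    "<": lambda a, b: int(a < b),
    "=": lambda a, b: int(a == b),
    ">": lambda a, b: int(a > b),
    "∨": lambda a, b: int(bool(a) or bool(b)),
}


def ew(glyph: str, m: Vec, n: Vec) -> Vec:
    """Apply a native operation cell by cell, as the MLIM does in one step."""
    if len(m) != len(n):
        raise ValueError("length mismatch")
    f = NATIVE[glyph]
    return [f(a, b) for a, b in zip(m, n)]


def minimum(m: Vec, n: Vec) -> Vec:
    # M⌊N  as  N + (M<N) × (M-N)
    return ew("+", n, ew("×", ew("<", m, n), ew("-", m, n)))


def maximum(m: Vec, n: Vec) -> Vec:
    # M⌈N  as  N + (M>N) × (M-N)
    return ew("+", n, ew("×", ew(">", m, n), ew("-", m, n)))


def less_equal(m: Vec, n: Vec) -> Vec:
    # M≤N  as  (M<N) ∨ (M=N)
    return ew("∨", ew("<", m, n), ew("=", m, n))


def not_equal(m: Vec, n: Vec) -> Vec:
    # M≠N  as  (M<N) ∨ (M>N)
    return ew("∨", ew("<", m, n), ew(">", m, n))


M, N = [3, 7, 5], [4, 2, 5]
print(minimum(M, N))     # [3, 2, 5]
print(maximum(M, N))     # [4, 7, 5]
print(less_equal(M, N))  # [1, 0, 1]
print(not_equal(M, N))   # [1, 1, 0]
```

Each composite costs several MLIM steps, which is the trade-off of keeping the native operation set small.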
The masking functionality of the RL, combined with native support for Scan (a reduction whose output is read between steps) and Reduce (\ and / respectively), allows Index Generator (monadic ⍳) to be defined, while the RL's generalized indexing allows Reverse and Transpose (monadic ⌽ and ⍉) to be defined. Shape and Ravel (monadic ⍴ and ,) can also be defined by using the RL and PP in parallel. Together, these give the following more complex operations:
⍳N ,N ⍴N ⌽N ⍉N M⍴N M,N M⌽N M⍳N M/N M\N M⍉N M↑N M↓N M∊N M∘.b N M[N]
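The derived functions rest on standard APL identities. The sketch below shows three of them in Python, using index origin 1 for the APL-level indices: ⍳N as the plus-scan of N ones (+\N⍴1), Reverse as indexing with the descending index vector (N+1)-⍳N, and Transpose as swapped indexing. It illustrates the identities only; how the paper realizes them with the RL and PP is not shown, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def iota(n: int) -> List[int]:
    # ⍳N in index origin 1, computed as the plus-scan of N ones: +\N⍴1
    return list(accumulate([1] * n))


def reverse(v: Sequence[int]) -> List[int]:
    # ⌽V as V[(N+1)-⍳N]; subtract 1 more to convert to Python's 0-origin indexing
    n = len(v)
    return [v[(n + 1 - i) - 1] for i in iota(n)]


def transpose(m: Sequence[Sequence[int]]) -> List[List[int]]:
    # ⍉M for a matrix: result[j][i] is M[i][j]
    rows = len(m)
    cols = len(m[0]) if rows else 0
    return [[m[i][j] for i in range(rows)] for j in range(cols)]


print(iota(5))                            # [1, 2, 3, 4, 5]
print(reverse([10, 20, 30]))              # [30, 20, 10]
print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```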
The paper does not describe floating-point arithmetic, so many functions that depend on it may be missing.
All Applications Digital Computer
The All Applications Digital Computer (AADC) is a modular computer architecture, described in a 1973 paper by Stanley M. Nissen and Steven J. Wallach, that can process APL natively.
- Thurber, Kenneth J. and Myrna, John W. System Design of a Cellular APL Computer. IEEE Transactions on Computers, volume C-19, issue 4. Institute of Electrical and Electronics Engineers. April 1970.