System Architecture (8/16)

The Indi processor architecture was designed from the outset as a 16 bit system accessing 16 bit words with a 16 bit address. Later in the design stage it was decided to use an eight bit bus interface in the indi16 for lower implementation cost. The indi's 16 bit design is suitable for external ROM and SRAM with bank selection: ROM 00, RAM Data/Program 01, RAM Return Stack 10, RAM Parameter Stack 11. This split of the address spaces was chosen to allow possible later use of extended Harvard architectures.

A bus cycle sequenced RW (rising edge triggers write to memory) was chosen, with memory mapped video control and DRAM refresh too. The instruction sequencing unit becomes tiny in comparison to most other CPU designs. The bus makes good use of read and write interleaving to maximize throughput, with less than 2.7MHz of bus bandwidth lost.

The simple instruction set architecture places a high emphasis on both source and destination operands, as most general microcontroller processing work is data movement. Having a source and a destination in each byte sized instruction results in more compact code. Instructions are fetched in pairs, two per machine word, which also helps code density as fewer stack adjustments have to be made. A 16 bit opcode size was not chosen, as this would lead to lower code density considering the low complexity of the instruction set. Two instructions per machine word therefore lowers the instruction fetch bandwidth required, with the second instruction in a pair behaving like a branch delay slot.

An eight bit data bus interface with a video interface (with 8MHz pixel clock) is now standard for lower system cost, and so a full fetch execute cycle takes 16 clocks (for a 66MHz/64MHz clock), giving 8 MIPS performance. The 66MHz clock on the test system is scaled down to 64MHz by eliminating approximately every 33rd clock. The extra clocks are not used, as the null destination is reserved for refresh of DRAM, leaving direct register modes for other features later. Sound uses the other three bus cycles available when executing a group of four instructions, after the video has used the first three cycles.

The full fetch execute cycle takes 16 clock cycles, at 8 clock cycles per instruction, and includes either three layer video signal fetching, or 2 cell fetches and one cell store for sound. A two stage pipeline could allow address generation to happen in parallel with a data fetch or store. An extra address line, HILO, controls high and low byte access for the 8 bit DRAM interface version.
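
The clock figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers in the text; the variable names are illustrative and not part of the indi16 design.

```python
# Clock budget of the indi16 test system, using only figures quoted in the text.
SOURCE_CLOCK_HZ = 66_000_000   # clock on the test system
DROP_EVERY = 33                # every 33rd clock is eliminated
CLOCKS_PER_PAIR = 16           # one full fetch execute cycle
INSTRUCTIONS_PER_PAIR = 2      # two byte sized instructions per machine word
PIXEL_CLOCK_HZ = 8_000_000     # video pixel clock

# Dropping one clock in every 33 leaves 32/33 of the source clock.
effective_hz = SOURCE_CLOCK_HZ * (DROP_EVERY - 1) // DROP_EVERY

# Each fetch execute cycle retires one instruction pair.
pairs_per_second = effective_hz // CLOCKS_PER_PAIR
mips = pairs_per_second * INSTRUCTIONS_PER_PAIR / 1_000_000

# Ratio of processor clocks to pixel clocks.
clocks_per_pixel = effective_hz // PIXEL_CLOCK_HZ

print(effective_hz, pairs_per_second, mips, clocks_per_pixel)
```

The result agrees with the stated 64MHz effective clock and 8 MIPS, at 8 clocks per instruction.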

Even though the processor is big endian, the low byte is fetched first and written first in all memory accesses, as shown below. This is mainly done for carry chain propagation reasons in optimized designs, and all data bus IO can be 16 bit.

Cycle  Bus
0      LO opcode fetch
1      HI opcode fetch
2      HI op LO src fetch
3      HI op HI src fetch
4      VFETCH LO
5      VFETCH HI
6      HI op LO dest store
7      HI op HI dest store
8      CFETCH LO
9      CFETCH HI
A      LO op LO src fetch
B      LO op HI src fetch
C      PFETCH LO
D      PFETCH HI
E      LO op LO dest store
F      LO op HI dest store
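
The bus cycle table can be written out as data and checked for the two properties the text claims: every 16 bit access is made low byte first, and the VFETCH, CFETCH and PFETCH slots are interleaved with the processor's own accesses. The following is a minimal sketch; the names BUS_SCHEDULE, SHARED_SLOTS and the two helper functions are made up for this example.

```python
# The 16 bus cycles of one fetch execute cycle, transcribed from the table.
BUS_SCHEDULE = [
    ("opcode fetch", "LO"), ("opcode fetch", "HI"),
    ("HI op src fetch", "LO"), ("HI op src fetch", "HI"),
    ("VFETCH", "LO"), ("VFETCH", "HI"),
    ("HI op dest store", "LO"), ("HI op dest store", "HI"),
    ("CFETCH", "LO"), ("CFETCH", "HI"),
    ("LO op src fetch", "LO"), ("LO op src fetch", "HI"),
    ("PFETCH", "LO"), ("PFETCH", "HI"),
    ("LO op dest store", "LO"), ("LO op dest store", "HI"),
]

# Slots that the text gives over to video (or sound) rather than the processor.
SHARED_SLOTS = {"VFETCH", "CFETCH", "PFETCH"}


def slot_owners(schedule):
    """Count byte wide bus cycles used by the CPU and by video/sound."""
    counts = {"cpu": 0, "shared": 0}
    for access, _half in schedule:
        counts["shared" if access in SHARED_SLOTS else "cpu"] += 1
    return counts


def low_byte_first(schedule):
    """Check that every 16 bit access is a LO cycle followed by its HI cycle."""
    pairs = zip(schedule[0::2], schedule[1::2])
    return all(lo[0] == hi[0] and (lo[1], hi[1]) == ("LO", "HI")
               for lo, hi in pairs)


print(slot_owners(BUS_SCHEDULE), low_byte_first(BUS_SCHEDULE))
```

Of the 16 byte cycles, 10 belong to the processor and 6 (three 16 bit cells) to the shared fetch slots.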

The following ordering table makes the best pocket guide for the instruction set.

Numeric               0                1                 2                     3
Register              P                Q                 R                     S
Opcode                ADD              AND               EXOR                  LOAD
Shorthand Op          +                *                 -                     /
Addressing            Indirect         Direct            [x]+ and -[x] indirect modes only
Shorthand Addr        .                =                 All modes execute as specified except .p dest as null

Bit                   7 6      5         4 3   2          1 0
Field                 Opcode   Src Mode  Src   Dest Mode  Dest

CS[1..0]              [P]              [Q]               [R]                   [S]
Forth Register Usage  Program Counter  Working Register  Return Stack Pointer  Stack Pointer

Useful Pairs            First Instruction  Second Instruction
Jump Subroutine         /=p.r              /.p=p
Return from Subroutine  /.r=p              /.p.p
Jump Indirect           /.p=p              /.p=p
Double NOP              /=p=p              /.s.s
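
To make the shorthand concrete, here is a small sketch that converts between the five character shorthand and a byte value. It assumes the fields sit in the byte in the order the table lists them (opcode in bits 7-6, source mode in bit 5, source in bits 4-3, destination mode in bit 2, destination in bits 1-0) and that the numeric column gives each field's value. That bit placement is a reading of the table, not a confirmed encoding, and encode and decode are hypothetical helper names.

```python
# Field values follow the "Numeric" row of the ordering table.
OP_CHARS = "+*-/"     # ADD, AND, EXOR, LOAD
MODE_CHARS = ".="     # indirect, direct
REG_CHARS = "pqrs"    # P, Q, R, S


def encode(shorthand):
    """Turn shorthand such as '/=p.r' into a byte (assumed bit layout)."""
    op, src_mode, src, dest_mode, dest = shorthand.lower()
    return ((OP_CHARS.index(op) << 6)
            | (MODE_CHARS.index(src_mode) << 5)
            | (REG_CHARS.index(src) << 3)
            | (MODE_CHARS.index(dest_mode) << 2)
            | REG_CHARS.index(dest))


def decode(byte):
    """Turn a byte back into the five character shorthand."""
    return (OP_CHARS[(byte >> 6) & 3]
            + MODE_CHARS[(byte >> 5) & 1]
            + REG_CHARS[(byte >> 3) & 3]
            + MODE_CHARS[(byte >> 2) & 1]
            + REG_CHARS[byte & 3])


# The "Jump Subroutine" pair from the table of useful pairs.
print(hex(encode("/=p.r")), hex(encode("/.p=p")))
```

Under this assumed layout every one of the 256 byte values decodes to a valid shorthand, which fits the idea of a source and destination in each byte sized instruction.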

Always remember that the Accumulator A is always modified by an instruction, and the Carry Flag C is only used and modified by ADD. The FORTH primitive CLC clears the carry. Machine code programming should become easier as you practice.

The FORTH system is designed to take away the machine code burden for most common programming tasks such as multiplication and division. A forth routine may be called from machine code by placing the return address on the return stack and jumping to the address $call, with the CFA address of the word to execute placed immediately after the jump address.

/=p.r /.p=p

$call

<name> \ <name>'s CFA

This completes the main indi16 architecture description. The coding syntax uses <name> to represent the CFA's address for thread compilation, but $<name> is the PFA for variables and machine code primitives. $ is also used for hexadecimal constants, and the word LITERAL checks for its presence around the same time as the minus sign.

Optional indi16 IO Modules

As most IO is from or to 1K x 16 bit word cell buffers, certain limits are placed on the devices. They must align on, or use multiples of, 1K cells for data space. They have access to 16 cells of control data, available through the forth word CBLK, which can be used for various pointers within the block buffer. The high 16 cells in the 64K cell address space are reserved for use by the forth compiler and microcontroller IO, and are not open to reuse. They are the control words of the control block, which occupies the highest numbered internal block. Internal blocks are accessed via the forth word IBLK, which converts internal block numbers into base addresses of the 64 internal blocks. INB and OUTB set the current internal blocks for input and output respectively.
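
The block arithmetic described above can be sketched as follows. This is an illustration only: it assumes internal block 0 starts at cell address 0 and that blocks are numbered upward, which the text implies but does not state. The function names mirror the forth word IBLK but are otherwise hypothetical.

```python
CELLS_PER_BLOCK = 1024    # 1K x 16 bit cells per block buffer
INTERNAL_BLOCKS = 64      # 64 blocks fill the 64K cell address space
CONTROL_CELLS = 16        # reserved cells at the top of the address space


def iblk(block_number):
    """Base cell address of an internal block (assumes block 0 at address 0)."""
    if not 0 <= block_number < INTERNAL_BLOCKS:
        raise ValueError("internal block number must be 0..63")
    return block_number * CELLS_PER_BLOCK


def control_cell_range():
    """First and last address of the reserved high 16 cells."""
    top = INTERNAL_BLOCKS * CELLS_PER_BLOCK
    return top - CONTROL_CELLS, top - 1


first, last = control_cell_range()
print(hex(iblk(INTERNAL_BLOCKS - 1)), hex(first), hex(last))
```

With these assumptions the control block starts at cell 0xFC00 and the reserved control words occupy 0xFFF0 to 0xFFFF.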

ATA/IDE Interface

An IDE interface with support for any disk using LVTTL inputs and outputs, or which works off a reduced 5V line. Data reads and writes are non-blocking if not executed too close together, with any read returning the last read PIO register and scheduling the current register to be read. This delays each read by one access, but gives an easy non-blocking PIO mode 4 interface.
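
The delayed read behaviour can be modelled in software to show what a driver has to expect. The sketch below is not the hardware interface: DelayedReadPort and its register dictionary are invented for the example, and the scheduled read completes instantly here, whereas real hardware needs the accesses to be spaced apart.

```python
class DelayedReadPort:
    """Toy model of a read port that returns the previously requested value."""

    def __init__(self, registers):
        self._registers = registers   # hypothetical device register contents
        self._latched = 0             # value captured by the last scheduled read

    def read(self, register):
        """Return the last latched value and schedule `register` to be read."""
        previous = self._latched
        self._latched = self._registers[register]
        return previous


port = DelayedReadPort({0: 0x11, 7: 0x50})
port.read(7)              # schedules register 7; the returned value is stale
status = port.read(0)     # returns register 7, schedules register 0
data = port.read(0)       # returns register 0
print(hex(status), hex(data))
```

A caller therefore issues one extra read to prime the pipeline, then treats each result as belonging to the previous request.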

PAL I Video

A 32*32 (1024) character display using 1 block for the character occupation grid, and any number of blocks for ROM or user defined character bitmaps. The 128 ASCII characters and 128 alternate characters are supplied in 1 block of the system as an 8 by 8 pixel font with a 2 bit colour depth (with 4 of 16 colours selectable). One bit of the colour depth selects the ASCII map and the other bit selects the alternate character map. The characters are arranged with 8 cells per character. The pixel order is most significant dibit to least significant dibit, followed by the other 7 rows of 8 pixels (top to bottom).
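
As an illustration of the font cell layout, the sketch below unpacks one 16 bit cell into its 8 dibits (most significant first) and splits it into the two one bit planes that hold the ASCII and alternate glyphs. Which of the two bits belongs to which map is not stated in the text, so the plane index is left as a parameter. The glyph_base helper assumes glyphs are stored in character code order, which is also an assumption.

```python
def unpack_dibits(cell):
    """Split a 16 bit cell into 8 two bit pixels, most significant dibit first."""
    return [(cell >> shift) & 0b11 for shift in range(14, -2, -2)]


def plane_row(cell, plane):
    """Extract one bit plane (0 or 1) of a row as a list of 8 bits."""
    return [(dibit >> plane) & 1 for dibit in unpack_dibits(cell)]


def glyph_base(code):
    """Cell offset of a glyph in the font block, at 8 cells per character.

    Assumes code order storage, with ASCII and alternate characters sharing
    the same 8 cells in different bit planes.
    """
    return (code & 0x7F) * 8


row = 0b11_10_01_00_00_01_10_11
print(unpack_dibits(row), plane_row(row, 0), plane_row(row, 1))
```

Since 128 glyph slots of 8 cells fill exactly 1024 cells, sharing each slot between two bit planes is what lets the 1K block hold 256 characters.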

The 1K block supplied font contains 256 characters. The lower 32 ASCII characters are emitted by the word GEMIT, as EMIT will convert them into control actions. The bitmaps for them display as follows:

[Image: MicroFont]

The upper 128 alternate characters are displayed by setting the alternate colours to display in a font character, which is handled automatically by EMIT. They are subscript digits 0-9, superscript digits 0-9, superscript + and -, the equilibrium symbol and chemical symbols for elements 1 to 105. These were chosen because an elemental font encourages applications, and understanding, based upon the elements. The modern world should have tools to help with elemental recycling.

A simple video coprocessor handles a 256*256 XY grid. This grid may be extended to 488*512 (assuming 8us flyback) by building a coprocessor list manually. The coprocessor fetches an instruction every 256 pixels, using the current VPC (Video Program Counter). The X register is incremented every pixel, and Y is set and incremented by coprocessor opcodes. The least significant bits are set to 0 for synchronization, as the VPC will increase by one block if they are not set, with the lower bits of VPC set to 0. The sync bit was chosen as indirect P register is the safest destination for filling memory with relatively harmless processor opcodes.

There are 4 opcodes of the video coprocessor. Opcode 00 sets the VPC block and the lower 6 bits of the Y register; all other bits of the VPC and Y are set to 0, and CL and CH remain set as they were. Opcode 01 overrides the value of Y in all address calculations in the current half scanline, but Y remains unchanged. Opcode 10 loads CH and CL and leaves Y unchanged. The final opcode, 11, loads CH and CL and increments Y.

These four opcodes allow the creation of the display without wasting excessive memory on sync patterns. They also allow for a split vertical display, more colours per character cell, and double height characters on any line of characters. A 625 line display would need 1250 cells of coprocessor instructions, 1024 character cells, 1024 colour cells, and 8192 bitmap graphics cells, plus a small number of cells to hold the synchronization lines. If full bitmapped graphics were not required, then the 8192 graphics cells would reduce to 1024 cells for the 128 ASCII characters plus the alternate character in each bitmap (single colour characters assumed).
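
The memory figures above can be tallied as a quick check. This sketch assumes one coprocessor instruction per half scanline, which is what 1250 cells for 625 lines implies, and it leaves out the small number of synchronization cells since the text gives no count for them. All names are illustrative.

```python
LINES = 625
INSTRUCTIONS_PER_LINE = 2          # one per half scanline (every 256 pixels)
GRID_CELLS = 32 * 32               # character grid positions
CELLS_PER_CHARACTER = 8            # 8 rows of 8 pixels, one cell per row

coprocessor_cells = LINES * INSTRUCTIONS_PER_LINE
character_cells = GRID_CELLS
colour_cells = GRID_CELLS
bitmap_cells = GRID_CELLS * CELLS_PER_CHARACTER   # full bitmapped graphics
font_cells = 128 * CELLS_PER_CHARACTER            # shared ASCII/alternate font

bitmap_total = coprocessor_cells + character_cells + colour_cells + bitmap_cells
text_total = coprocessor_cells + character_cells + colour_cells + font_cells

print(coprocessor_cells, bitmap_total, text_total)
```

Both totals sit inside the 12K cell and 5K cell budgets quoted later in this section, with room left for the synchronization lines.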

Bit    F E     D C B A 9 8  7 6 5 4 3 2  1  0
Field  Opcode  VPC/CH Page  Y/CL Page    0  0
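
A coprocessor instruction cell can be split into its fields as sketched below. The field widths are an assumption read from the layout above and from the text (a 2 bit opcode, two 6 bit page fields matching the 64 internal blocks and the lower 6 bits of Y, and two low bits held at 0 for synchronization). The function name and dictionary keys are made up for the example.

```python
OPCODE_NAMES = {
    0b00: "set VPC block and low 6 bits of Y",
    0b01: "override Y for this half scanline",
    0b10: "load CH and CL, leave Y",
    0b11: "load CH and CL, increment Y",
}


def decode_video_instruction(cell):
    """Split a 16 bit coprocessor cell into fields (assumed field widths)."""
    return {
        "opcode": (cell >> 14) & 0b11,
        "vpc_ch_page": (cell >> 8) & 0x3F,
        "y_cl_page": (cell >> 2) & 0x3F,
        "sync_bits": cell & 0b11,
    }


cell = (0b11 << 14) | (5 << 8) | (9 << 2)
fields = decode_video_instruction(cell)
print(fields, OPCODE_NAMES[fields["opcode"]])
```

A cell whose low two bits are not 0 would, per the text, be treated as a synchronization case rather than a normal instruction.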

Depending on the current BLOCK (CH or CL) and the X and Y registers, a character code or colour location is fetched in the CFETCH cycles. The 16 bit cell fetched holds either a character bitmap start address, normalized by a rotate right of 3 bits, or a colour word address, which is not normalized. This gives 8192 possible characters in 64K cells and the ability to use 4 of 16 colours in any character cell. Colour cycling of pixels in user defined characters is also possible on an individual displayed character, but by using indirect colour many characters can be colour cycled by altering just 1 memory location.

Bit    F E D C B A 9 8 7 6 5 4 3 2 1 0
Field  Normalized Character or Colour Address

Depending on the fetched character address or colour address, and on X and Y, an 8 pixel cell or a 4 of 16 colour cell is then fetched in the PFETCH cycle. The pixels are displayed on the next 8 pixel clocks (most significant dibit first).

Bit    F E D C          B A 9 8          7 6 5 4          3 2 1 0
Field  Pixel Colour 11  Pixel Colour 10  Pixel Colour 01  Pixel Colour 00
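
Putting the pixel cell and colour cell together, each 2 bit pixel selects one of the four nibbles in the colour cell. The sketch below follows the layout above, with the colour for pixel value 11 in the top nibble and the colour for 00 in the bottom nibble. The function name is hypothetical.

```python
def pixel_colours(pixel_cell, colour_cell):
    """Map the 8 dibits of a pixel cell to 4 bit colour codes.

    The pixel value (0..3) indexes the nibble of the colour cell, with
    value 00 in bits 3-0 and value 11 in bits F-C.
    """
    palette = [(colour_cell >> (4 * index)) & 0xF for index in range(4)]
    dibits = [(pixel_cell >> shift) & 0b11 for shift in range(14, -2, -2)]
    return [palette[dibit] for dibit in dibits]


# Colour cell 0xDC80: 11 -> D (white), 10 -> C (black), 01 -> 8 (blue), 00 -> 0 (red)
print(pixel_colours(0xE41B, 0xDC80))
```

Because the colour cell is fetched separately from the pixel cell, changing one colour cell recolours every character that points at it.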

This uses the following colours and two sync codes. The two colour names arctic and tropic were chosen after a little thought, as they link in best with energy saving ideas for the planet.

Nibble Value  Pixel Phase     Colour Name
0             0               Red
1             30              Orange
2             60              Yellow
3             90              Lime
4             120             Green
5             150             Tropic
6             180             Cyan
7             210             Arctic
8             240             Blue
9             270             Purple
A             300             Magenta
B             330             Pink
C             Zero Luminance  Black
D             Full Luminance  White
E             PAL Phase Sync  Psync
F             NTSC Sync       Nsync
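
The colour table has a regular structure: nibble values 0 to B step the phase by 30 degrees each, and C to F are special codes. A small lookup, with illustrative names, captures this:

```python
# Hue names for nibble values 0x0..0xB, in table order.
HUES = ["Red", "Orange", "Yellow", "Lime", "Green", "Tropic",
        "Cyan", "Arctic", "Blue", "Purple", "Magenta", "Pink"]

# Nibble values 0xC..0xF carry no phase.
SPECIAL = {0xC: "Black", 0xD: "White", 0xE: "Psync", 0xF: "Nsync"}

PHASE_STEP_DEGREES = 30


def describe_colour(nibble):
    """Return (name, phase in degrees or None) for a 4 bit colour code."""
    if nibble in SPECIAL:
        return SPECIAL[nibble], None
    return HUES[nibble], nibble * PHASE_STEP_DEGREES


print(describe_colour(0x5), describe_colour(0x7), describe_colour(0xE))
```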

This gives 5K cells for a full text display, and 12K cells for a 256*256 XY bitmapped display, or under 16K cells total display memory use including the ASCII character maps. It also makes ZX Spectrum and other 8 bit micro emulation displays easy to set up.

USB/MIDI UART

A UART working at the MIDI data rate of 31250 baud for musical applications, or a USB serial device which can be driven by a MIDI serial port driver. The USB version allows up to 128 indi systems to be connected to a USB system. A USB host network controller/PSU will eventually be designed.

64 Channel FM Stereo Sound

128 cells at a reserved location contain sound index and sound increment double cell pairs for 64 square wave oscillators. This gives a 31250Hz maximum sample frequency, and so a 15kHz maximum sound frequency (a good MIDI UART clock also). The resolution is greater than 1 cent at 440Hz and covers more than 10 octaves. The location of the 128 sound control cells must be fixed, as the sound engine performs writes to memory when the phase index is incremented. The most significant cell contains the frequency increment, except for the most significant bit, which does FM chaining modulation from the previous oscillator when set to 1. The least significant cell contains the phase index. Even numbered channels are set to left, and odd numbered channels are set to right.
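
The 31250Hz figure can be derived from the processor timing. The sketch below assumes one oscillator is updated per group of four instructions (two cell fetches for index and increment, one cell store for the new index), which is consistent with the bus description earlier but is an inference. The names are illustrative.

```python
INSTRUCTIONS_PER_SECOND = 8_000_000   # 8 MIPS
INSTRUCTIONS_PER_GROUP = 4            # sound gets three bus cycles per group
OSCILLATORS = 64                      # 64 channels
CELLS_PER_OSCILLATOR = 2              # phase index and frequency increment

# One oscillator update per group of four instructions (assumption).
updates_per_second = INSTRUCTIONS_PER_SECOND // INSTRUCTIONS_PER_GROUP

# Every oscillator is visited once per sample period.
sample_rate_hz = updates_per_second // OSCILLATORS
max_tone_hz = sample_rate_hz / 2      # Nyquist limit
control_cells = OSCILLATORS * CELLS_PER_OSCILLATOR


def channel_side(channel):
    """Even numbered channels are left, odd numbered channels are right."""
    return "left" if channel % 2 == 0 else "right"


print(sample_rate_hz, max_tone_hz, control_cells, channel_side(3))
```

The sample rate lands exactly on the MIDI baud rate of 31250, which is why the same clock suits the UART.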

The FM modulation index is maintained by either adding half or one and a half the frequency increment to the phase index. This also removes the complexity of a full adder, and gives the more natural constant modulation index with respect to frequency. It does however make the modulation waveform be a square wave.
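
One way to picture the half and one and a half increment scheme is the sketch below. It is an interpretation, not the actual logic: it assumes the control cell holds a 15 bit increment with the chain flag in bit 15, a 16 bit phase index that wraps, a modulator reduced to a single high/low state, and unchained oscillators that add the plain increment. All names are invented for the example.

```python
def phase_step(increment, modulator_high):
    """Step of 0.5x or 1.5x the increment, chosen by the modulator state."""
    half = increment >> 1
    return increment + half if modulator_high else half


def update_phase(phase, control, modulator_high):
    """Advance a 16 bit phase index by one sample (assumed cell layout)."""
    chained = (control >> 15) & 1     # FM chaining flag, bit 15
    increment = control & 0x7FFF      # 15 bit frequency increment
    step = phase_step(increment, modulator_high) if chained else increment
    return (phase + step) & 0xFFFF


# A chained oscillator alternates between slow and fast steps.
print(update_phase(0, 0x8000 | 1000, False), update_phase(0, 0x8000 | 1000, True))
```

The two step sizes average to the plain increment, and the deviation is always half the increment, so the modulation depth scales with frequency as the text describes. Each step needs only a one bit shift and an add.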

Volume is controlled by having multiple oscillators playing the same tone. This method was chosen for maximum flexibility and minimum logic and memory overheads.

A 4 by 8 Keypad Scan Controller (With 7 Segment Support Possible)

12 external pins are utilized to allow a keypad matrix arrangement of 4 rows by 8 columns with 8 shift banks. This makes it possible to type up to 256 possible input characters with software debounce, and 3 sticky shift keys make direct ASCII order (0-31, 32-63, etc) the most efficient input method. The internal block buffer is used to store such things as the keyboard buffer and the display contents, and to provide keybounce elimination.

The key layout, being in ASCII order, is simple; a reassignment of ASCII control codes to useful functions completes the layout.

Control  LOAD  SAVE  EDIT  CALC   EXIT   PERF   FILE   BELL
Keys     @ `   A !   B "   C #    D $    E %    F &    G '

Control  BACK  TAB   NL    VT     HOME   CR     PROD   FACT
Keys     H (   I )   J *   K +    L ,    M -    N .    O /

Control  CODE  IDX   RP    SP     TRAC   TURN   RUN    SOLV
Keys     P 0   Q 1   R 2   S 3    T 4    U 5    V 6    W 7

Control  STOP  HALT  DICT  F      L      R      U      D
Keys     X 8   Y 9   Z :   [ { ;  \ | <  ] } =  ^ ~ >  _ ?

The key layout uses red for symbol shift, green for upper shift, blue for lower shift and black for both upper and lower case letters, or for control codes. The only irregularity is that space is symbol shift on the @ key, but is not shown on the key. FLRUD in the lower right keys is fire, left, right, up and down, with fire being the escape control character. A better rendition of these actions would be present on a production system, and any digital joystick present would generate these keys using a standard 9 pin joystick D connector switching the correct keyboard scan matrix points.
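
Since the keys are in direct ASCII order, the character code follows from the shift bank and the matrix position. In the sketch below the bank numbers are taken from the ASCII ranges (0 control, 1 symbol, 2 upper, 3 lower); how the three sticky shift keys select a bank, and what banks 4 to 7 produce beyond codes 128 to 255, is not specified, so treat the bank numbering as an assumption.

```python
ROWS = 4
COLUMNS = 8
KEYS_PER_BANK = ROWS * COLUMNS   # 32 keys
BANKS = 8                        # 3 sticky shift keys give 2**3 banks


def key_code(bank, row, column):
    """Character code for a key position in a given shift bank."""
    if not (0 <= bank < BANKS and 0 <= row < ROWS and 0 <= column < COLUMNS):
        raise ValueError("bank, row or column out of range")
    return bank * KEYS_PER_BANK + row * COLUMNS + column


# Third row, first key carries CODE, 0 and P in the layout.
print(key_code(0, 2, 0), chr(key_code(1, 2, 0)), chr(key_code(2, 2, 0)))
```

The same position on the keypad gives code 16 in the control bank, the digit 0 in the symbol bank and the letter P in the upper bank, matching the layout table.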

Function Description
LOAD Opens the load dialog to compile screens into dictionary.
SAVE Opens the save dialog to SAVE-BUFFERS to specific location.
EDIT Opens the edit dialog to load code for editing, with no compile.
CALC Opens the calculator dialog to do calculations, and paste results.
EXIT Opens the shutdown dialog.
PERF Opens the performance and profiling dialog.
FILE Executes the file manager.
BELL Makes a beep.
BACK Moves the cursor back one character.
TAB Tabulates the cursor forward 8 characters.
NL Moves cursor down one line, and all the way to the left.
VT Moves cursor down 8 lines.
HOME Moves cursor to top left of screen.
CR Moves cursor all the way to the left of the line.
PROD Expands inline functions.
FACT Factors common expressions.
CODE Opens the disassembler.
IDX Creates index entries for finding screen.
RP Does a return stack trace.
SP Shows the parameter stack trace.
TRAC Opens up the variable tracking dialog.
TURN Calculates functional morphings.
RUN Opens the task list of tasking words to execute.
SOLV Opens the equation solver.
STOP Opens the task control dialog.
HALT Stops all tasks except the outer interpreter.
DICT Opens the compiled dictionary browser.
FIRE Fire button or escape character.
LEFT Left joystick.
RIGHT Right joystick.
UP Up joystick.
DOWN Down joystick.

The control codes which have security implications are still executed, but user confirmation is required. This compatibility with ASCII, with Unix NL, is beneficial for data importing while still allowing program control.