Author |
Message |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
|
Nice diagrams - helpful - thanks!
|
Fri Jan 07, 2022 9:05 pm |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Some success! The eight-bit version of the system successfully lights up the LEDs. This is going through the network to load the instruction cache, execute instructions, then send a write back to the system to light the LEDs. The program to do this was written in assembler. In the meantime the compiler was improved.
After getting the compiler to work with 12-bit bytes I put more work into the 12-bit version of the core. Using a 12-bit core adds about 30% to the overall size of the system. Not surprising, as a 12-bit byte has 50% more bits to process than an eight-bit one. The appeal of using 12 bits is 24-bit addressing. The compiler supports pointer values made up of two bytes. It is actually a fair amount of work to get the compiler to use 24-bit pointers when the byte size is only eight bits. Registers cannot contain a 24-bit address and quite a few hoops must be jumped through to get good compiled output. It is simpler to modify the compiler to use 12-bit bytes; that way pointers can be contained in registers.
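As a rough illustration of the pointer arithmetic (a Python sketch, not code from the project; the helper names are made up), two 12-bit bytes give exactly 24 bits, while eight-bit bytes need three to cover the same address range:

```python
BYTE_BITS = 12                      # width of one "byte" on the 12-bit core
BYTE_MASK = (1 << BYTE_BITS) - 1    # 0xFFF

def pack_pointer(high, low):
    """Combine two 12-bit bytes into one 24-bit pointer (hypothetical helper)."""
    assert 0 <= high <= BYTE_MASK and 0 <= low <= BYTE_MASK
    return (high << BYTE_BITS) | low

def unpack_pointer(ptr):
    """Split a 24-bit pointer back into its high and low 12-bit bytes."""
    return (ptr >> BYTE_BITS) & BYTE_MASK, ptr & BYTE_MASK

def bytes_needed(address_bits, byte_bits):
    """Number of bytes of a given width needed to hold an address."""
    return -(-address_bits // byte_bits)   # ceiling division

print(hex(pack_pointer(0xFFF, 0xFE0)))
print(bytes_needed(24, 12), bytes_needed(24, 8))
```

With 12-bit bytes a pointer is the same size as a two-byte register pair, which is presumably why it fits in a register without extra work.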
_________________Robert Finch http://www.finitron.ca
|
Sat Jan 08, 2022 4:04 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Made the number of nodes in the ring configurable. Reduced the node count to one for debugging. This reduces the system build time considerably. Got the system as far as clearing the screen now, but it does not clear to the correct colors.
Working on software for the core. Thinking of creating a multi-core version of the sieve, code below. Each core would be responsible for applying a portion of the sieve. I do not think this would be any faster than performing it on a single core, as global memory would be updated.
Code:
; First fill screen chars with 'P' indicating prime positions
; Each core is responsible for the Nth position where N is the
; core number minus two.
;
multi_sieve:
    lda #'P'            ; indicate prime
    ldb COREID          ; find out which core we are
    subb #2
    ldx #0              ; start at first char of screen
    abx
multi_sieve3:
    sta TEXTSCR,x       ; store 'P'
    leax 8,x            ; advance to next position
    cmpx #4095
    blo multi_sieve3
    addb #2             ; start sieve at 2 (core id)
    lda #'N'            ; flag position value of 'N' for non-prime
multi_sieve2:
    ldx #0
    abx                 ; skip the first position - might be prime
multi_sieve1:
    abx                 ; increment
    sta TEXTSCR,x
    cmpx #4095
    blo multi_sieve1
    addb #8             ; number of cores working on it
    cmpb #4095
    blo multi_sieve2
multi_sieve4:           ; hang machine
    bra multi_sieve4
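The way the work is split between cores may be easier to see in a small Python model of the routine above. This is only a sketch under assumptions: core ids are taken to run from 2 to 9, the cores are run one after another instead of in parallel, and the model checks bounds before each store whereas the assembly stores first and compares afterwards.

```python
SCREEN_SIZE = 4095   # same limit the assembly compares against
NUM_CORES = 8        # four nodes, two cores each
FIRST_CORE_ID = 2    # assumption: core ids run from 2 to 9

def fill_phase(core_id, screen):
    """Each core writes 'P' to every 8th cell, starting at core_id - 2."""
    for pos in range(core_id - FIRST_CORE_ID, SCREEN_SIZE, NUM_CORES):
        screen[pos] = 'P'

def sieve_phase(core_id, screen):
    """Each core strikes out multiples of core_id, core_id + 8, core_id + 16, ..."""
    for step in range(core_id, SCREEN_SIZE, NUM_CORES):
        # Start at 2 * step so the first position (possibly prime) is skipped.
        for pos in range(2 * step, SCREEN_SIZE, step):
            screen[pos] = 'N'

def multi_sieve():
    screen = [' '] * SCREEN_SIZE
    core_ids = range(FIRST_CORE_ID, FIRST_CORE_ID + NUM_CORES)
    # Run the phases one after the other; real cores would overlap in time.
    for core_id in core_ids:
        fill_phase(core_id, screen)
    for core_id in core_ids:
        sieve_phase(core_id, screen)
    return screen

screen = multi_sieve()
primes = [n for n in range(2, 50) if screen[n] == 'P']
print(primes)
```

Note that on real hardware one core's fill phase could still be running while another core is already striking out multiples, so the cores would need to finish filling before any of them starts sieving. The sequential model sidesteps that.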
_________________Robert Finch http://www.finitron.ca
|
Sun Jan 09, 2022 7:16 am |
|
 |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
|
three nodes might be a good tradeoff between build time and finding bugs in the network...
|
Sun Jan 09, 2022 7:36 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Got a note about the Turbo9 project and watched the YouTube video. The core is a pipelined 6809 core. Running at about 120MHz with 3x the performance of a stock 6809 gives the equivalent of over 350x performance. I assume they picked a fast FPGA part for timing. My own core runs at over 40MHz in a slow part. It would probably run over 100MHz in a fast part. However, my core does not have an overlapping pipeline, so it's probably only about 120x the performance.
Had the stacks for both cores of a node at the same location, not good. Ran into issues with values on the stack being corrupted. There is a switch now for different processing for the odd or even numbered core. For instance, only one core needs to copy the global ROM to RAM since the RAM is shared.
There seems to be an issue accessing local memory. I tried running the sieve program and it worked. Thing is, it only uses global video memory. However, a more sophisticated program is not working because apparently subroutine calls do not work. A subroutine may be called but does not return properly.
Quote:
three nodes might be a good tradeoff between build time and finding bugs in the network...
I have tried several different numbers of nodes and the network seems to be working. The sieve was coded for four nodes (eight cores) so it won't run properly without that number. It still runs, however, just with incorrect results.
I am working on other software between trial runs, like a BIOS / boot ROM and OS. There are also peripheral cores to update. Currently eight-bit versions are used with a 12-bit bus. Several peripherals like the keyboard and RTC don't make sense to update.
I may have fixed the local memory issue. I made the memory controller about as simple as possible, which cost some performance. I think the issue was that the enable signal was being cleared during the same clock cycle as data was captured from the RAM. I delayed clearing the signal by a clock cycle and now it seems to work. More testing needed though.
_________________Robert Finch http://www.finitron.ca
|
Sun Jan 09, 2022 4:56 pm |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
The RAM test routine passed okay except for core #5 which reported an error. RAM test is special in that the return address for it is stuffed in the U register so that the RAM test does not need any memory. Next thing to try is reducing the clock frequency.
Got a bunch of keyboard driver routines written.
Code:
; Local RAM test routine
; Checkerboard testing.
; There is 70kB of local RAM
; Does not use any RAM including no stack
ramtest:
    ldy #0
    lda #1
    sta LEDS
    ldd #$AAA555
ramtest1:
    std ,y++
    cmpy #71680
    blo ramtest1
    ; now readback values and compare
    ldy #0
ramtest3:
    ldd ,y++
    cmpd #$AAA555
    bne ramerr
    cmpy #71680
    blo ramtest3
    lda #2
    sta LEDS
    jmp ,u
ramerr:
    lda #$80
    sta LEDS
    ldx #TEXTSCR
    ldb COREID
    abx
    lda #'F'
    sta ,x
    sync
    jmp ,u
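For readers less used to 6809 assembly, the checkerboard test above can be modelled in a few lines of Python. This is a sketch only: it assumes the 12-bit core keeps the 6809's big-endian order (high byte at the lower address), and the `StuckBitRam` class is a made-up fault model, not something from the project.

```python
RAM_SIZE = 71680          # 70 * 1024 twelve-bit bytes, as in the routine
PATTERN_HI = 0xAAA        # high 12-bit byte of $AAA555
PATTERN_LO = 0x555        # low 12-bit byte of $AAA555

def checkerboard_test(ram):
    """Fill ram with the pattern, read it back, return the first bad address or None."""
    for addr in range(0, RAM_SIZE, 2):
        ram[addr] = PATTERN_HI        # assumed big-endian, like std on a 6809
        ram[addr + 1] = PATTERN_LO
    for addr in range(0, RAM_SIZE, 2):
        if ram[addr] != PATTERN_HI or ram[addr + 1] != PATTERN_LO:
            return addr
    return None

class StuckBitRam(list):
    """Hypothetical faulty RAM: one bit of one cell always reads back as 0."""
    def __init__(self, size, bad_addr, bad_bit):
        super().__init__([0] * size)
        self.bad_addr = bad_addr
        self.bad_mask = ~(1 << bad_bit)

    def __setitem__(self, addr, value):
        if addr == self.bad_addr:
            value &= self.bad_mask    # the stuck bit never gets set
        super().__setitem__(addr, value)

good = checkerboard_test([0] * RAM_SIZE)
bad = checkerboard_test(StuckBitRam(RAM_SIZE, bad_addr=1001, bad_bit=0))
print(good, bad)
```

A single fixed pattern like this catches stuck bits but not address-decoding faults, since every word holds the same value.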
_________________Robert Finch http://www.finitron.ca
|
Mon Jan 10, 2022 5:04 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Latest fix: the nybble codes for transfer and exchange instructions were positioned wrong for the 12-bit version of the core. This led to those instructions not working correctly.
Looking at the transfer and exchange instructions for the 12-bit core there are extra bits that could be used. The core id input to the core is a register that could potentially be accessible with the TFR instruction. ATM the core id is available at a specially dedicated address: $FFFFE0. There are other registers common in other architectures that might be useful to include such as the tick count.
Added: a checkpoint interrupt. The checkpoint register at address $FFFFE1 must be cleared within 1 second or an NMI is generated. The checkpoint register is set automatically by a timer circuit.
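The checkpoint mechanism behaves like a watchdog. A minimal Python model of the idea follows; the class and method names are hypothetical, time is counted in milliseconds, and how often the timer circuit sets the register is an assumption since the post does not say.

```python
class Checkpoint:
    """Toy model of the checkpoint register; time is in milliseconds."""
    TIMEOUT_MS = 1000          # "within 1 second" from the post

    def __init__(self):
        self.flag = False      # the checkpoint register
        self.set_at = None     # time the timer last set the flag
        self.nmi_raised = False

    def timer_sets_flag(self, now_ms):
        """The timer circuit sets the register (its period is an assumption)."""
        if not self.flag:
            self.flag = True
            self.set_at = now_ms

    def software_clears(self):
        """What the running program must do to prove it is still alive."""
        self.flag = False
        self.set_at = None

    def tick(self, now_ms):
        """Raise the NMI if the flag has stayed set for the whole timeout."""
        if self.flag and now_ms - self.set_at >= self.TIMEOUT_MS:
            self.nmi_raised = True

healthy = Checkpoint()
healthy.timer_sets_flag(0)
healthy.software_clears()      # program checks in before the deadline
healthy.tick(1500)

hung = Checkpoint()
hung.timer_sets_flag(0)
hung.tick(1000)                # program never cleared the register
print(healthy.nmi_raised, hung.nmi_raised)
```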
The instruction cache load now uses asynchronous reads across the network for better performance. With asynchronous reads, multiple reads may be issued before any responses are received back. This allows the cache to place a high-speed burst of addresses on the network. Tricky to get working because responses may come back out of order.
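One common way to cope with out-of-order responses is to let each response carry the address it answers, so the cache can slot the data into the right place regardless of arrival order. The Python sketch below shows that idea only; the line size, the function names, and the use of a shuffled list to stand in for the network are all assumptions, not details of the actual design.

```python
import random

LINE_WORDS = 8   # assumed number of words per cache line

def issue_reads(base_addr):
    """Place a burst of read requests on the network without waiting."""
    return [base_addr + i for i in range(LINE_WORDS)]

def network(requests, memory, rng):
    """Answer every request, but deliver the responses in a shuffled order."""
    responses = [(addr, memory[addr]) for addr in requests]
    rng.shuffle(responses)
    return responses

def fill_line(base_addr, memory, rng):
    """Rebuild the cache line by slotting each response in by its address."""
    line = [None] * LINE_WORDS
    pending = set(issue_reads(base_addr))
    for addr, data in network(sorted(pending), memory, rng):
        line[addr - base_addr] = data   # the address tag picks the slot
        pending.discard(addr)
    assert not pending                  # every request was answered
    return line

memory = list(range(100, 164))          # stand-in for global ROM contents
line = fill_line(16, memory, random.Random(1))
print(line)
```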
Code is running from the local RAM now after it is loaded from global ROM. However, after running numerous iterations of the delay loop, the core jumps off to a wild address, then keeps executing unknown instructions from a wild set of addresses. That was the incentive to add checkpointing. All cores are reporting RAM test failures after some slight modifications. More to fix yet.
_________________Robert Finch http://www.finitron.ca
|
Tue Jan 11, 2022 3:59 am |
|
 |
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 768
|
Have you looked at the Hitachi 6309, an enhanced 6809?
|
Wed Jan 12, 2022 12:46 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Quote:
Have you looked at the Hitachi 6309, an enhanced 6809?
Yes. That is a great update to the 6809. I have looked at that instruction set, and I am not fond of the extensions. Extensions to the instruction set for triple-byte addressing conflict with the 6309 extensions. I had a version coded to support the 6309 instead of the 6809 but dropped it. I never used it. There are a lot of 6809 systems using banked addressing. I feel it is more important to extend the addressing range than it is to change other aspects of the ISA. The easy route is to make a wider machine, 12 bits instead of eight bits.
Latest Fixes: the assembler was only outputting the low-order 16 bits of a 24-bit displacement for long branches. This led to issues executing subroutines.
Latest Additions: Added a different configuration for processing nodes. Rather than having two cores sharing RAM in one node, the new configuration has only a single core with dedicated local RAM. There is half as much RAM available. The new configuration arose out of the desire to simplify the software for the node. With two cores sharing RAM there was a lot of testing and setting in the software to control buffer positions for each core. When there is only a single core present the software is simpler.
Thinking about adding a second ring to the bus for responses. The issue is that if the bus becomes flooded with requests then there is no space on it for responses. This hangs the bus because the requests will keep looping around until there is a response. If there was a dedicated ring for responses then it may help.
The keyboard initialization routine is hanging; it loops around forever. I have traced the code using the ILA through a branch that is failing. I have no idea why the branch would fail. <- Figured this out. The decrement instructions were not encoded by the assembler correctly. In one location I decided to code DEY instead of LEAY -1,y.
Character output via the DisplayChar() routine seems to be working. The start-up message displays onscreen. Just trying to get keyboard input working ATM.
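To see why a truncated long-branch displacement breaks subroutine calls, here is a small worked example in Python. It is a sketch under assumptions: the example addresses are invented, and the missing upper bits of the 24-bit field are assumed to be left as zero, which the post does not actually state.

```python
ADDR_BITS = 24
ADDR_MASK = (1 << ADDR_BITS) - 1

def encode_displacement(disp, bits):
    """Keep only the low `bits` bits of a signed displacement."""
    return disp & ((1 << bits) - 1)

def branch_target(pc_after, field):
    """The core adds the full 24-bit field to the PC, wrapping at 24 bits."""
    return (pc_after + field) & ADDR_MASK

pc_after = 0xFFC100            # hypothetical address following the branch
wanted = 0xFFC040              # hypothetical subroutine, 0xC0 bytes back
disp = wanted - pc_after       # -0xC0

good = branch_target(pc_after, encode_displacement(disp, 24))
bad = branch_target(pc_after, encode_displacement(disp, 16))
print(hex(good), hex(bad))
```

Forward branches with small displacements come out the same either way, which fits a bug that only shows up for some calls.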
_________________Robert Finch http://www.finitron.ca
|
Wed Jan 12, 2022 3:23 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Spent a chunk of time today debugging the keyboard. Finally figured out that there was another device placing data on the data bus when a keyboard read occurred. The two values were wire-OR'd together.
Now that the keyboard device is working, trying to use the keyboard with the monitor program still does not work. The scan codes fetched from the keyboard look correct when the values are dumped to the screen, but when converted to ASCII the values are incorrect. Displaying a character from the keyboard simply displays a space instead of the character it is supposed to display. Strangely, the carriage return key appears to work correctly. I suspect something is causing the issue in the scan-code converter routine, but I have not been able to trace why yet. CTRL-ALT-DEL works as expected. Maybe someone with good eyes can spot the issue.
Code:
; KeyState2 variable bit meanings
;1176543210
; ||||||||+ = shift
; |||||||+- = alt
; ||||||+-- = control
; |||||+--- = numlock
; ||||+---- = capslock
; |||+----- = scrolllock
; ||+------ = <empty>
; |+------- = "
; | = "
; | = "
; | = "
; +-------- = extended

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Debug version of keyboard get routine.
;
; Parameters:
;   b: 0 = non blocking, otherwise blocking
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

DBGGetKey:
    pshs x
    stb KeybdBlock      ; save off blocking status
dbgk2:
    ldb KeybdBlock
    pshs b
    bsr KeybdGetStatus
    andb #$80           ; is key available?
    puls b
    bne dbgk1           ; branch if key
    tstb                ; block?
    bne dbgk2           ; If no key and blocking - loop
    ldd #-1             ; return -1 if no block and no key
    puls x,pc
dbgk1:
    bsr KeybdGetScancode
;   lbsr DispByteAsHex
    ; Make sure there is a small delay between scancode reads
    ldx #20
dbgk3:
    dex
    bne dbgk3
    ; switch on scan code
    cmpb #SC_KEYUP
    bne dbgk4
    clr KeyState1       ; make KeyState1 = -1
    neg KeyState1
    bra dbgk2           ; loop back
dbgk4:
    cmpb #SC_EXTEND
    bne dbgk5
    lda KeyState2
    ora #$800
    sta KeyState2
    bra dbgk2
dbgk5:
    cmpb #SC_CTRL
    bne dbgkNotCtrl
    tst KeyState1
    bmi dbgk7
    lda KeyState2
    ora #4
    sta KeyState2
    bra dbgk8
dbgk7:
    lda KeyState2
    anda #~4
    sta KeyState2
dbgk8:
    clr KeyState1
    bra dbgk2
dbgkNotCtrl:
    cmpb #SC_RSHIFT
    bne dbgkNotRshift
    tst KeyState1
    bmi dbgk9
    lda KeyState2
    ora #1
    sta KeyState2
    bra dbgk10
dbgk9:
    lda KeyState2
    anda #~1
    sta KeyState2
dbgk10:
    clr KeyState1
    bra dbgk2
dbgkNotRshift:
    cmpb #SC_NUMLOCK
    bne dbgkNotNumlock
    lda KeyState2
    eora #16
    sta KeyState2
    lda KeyLED
    eora #2
    sta KeyLED
    tfr a,b
    clra
    bsr KeybdSetLED
    bra dbgk2
dbgkNotNumlock:
    cmpb #SC_CAPSLOCK
    bne dbgkNotCapslock
    lda KeyState2
    eora #32
    sta KeyState2
    lda KeyLED
    eora #4
    sta KeyLED
    tfr a,b
    clra
    bsr KeybdSetLED
    bra dbgk2
dbgkNotCapslock:
    cmpb #SC_SCROLLLOCK
    bne dbgkNotScrolllock
    lda KeyState2
    eora #64
    sta KeyState2
    lda KeyLED
    eora #1
    sta KeyLED
    tfr a,b
    clra
    bsr KeybdSetLED
    bra dbgk2
dbgkNotScrolllock:
    cmpb #SC_ALT
    bne dbgkNotAlt
    tst KeyState1
    bmi dbgk11
    lda KeyState2
    ora #2
    sta KeyState2
    bra dbgk12
dbgk11:
    lda KeyState2
    anda #~2
    sta KeyState2
dbgk12:
    clr KeyState1
    bra dbgk2
dbgkNotAlt:
    tst KeyState1
    beq dbgk13
    clr KeyState1
    bra dbgk2
dbgk13:
    lda KeyState2       ; Check for CTRL-ALT-DEL
    anda #6
    cmpa #6
    bne dbgk14
    cmpb #SC_DEL
    bne dbgk14
    jmp [$FFFFFC]       ; jump to NMI vector
dbgk14:
    tst KeyState2       ; extended code?
    bpl dbgk15
    lda KeyState2
    anda #$7FF
    sta KeyState2
    ldx #keybdExtendedCodes
    bra dbgk18
dbgk15:
    lda KeyState2       ; Is CTRL down?
    bita #4
    beq dbgk16
    ldx #keybdControlCodes
    bra dbgk18
dbgk16:
    bita #1             ; Is shift down?
    beq dbgk17
    ldx #shiftedScanCodes
    bra dbgk18
dbgk17:
    ldx #unshiftedScanCodes
dbgk18:
    abx                 ; index into table is scancode in accb
    ldb ,x              ; load accb with ascii from table
    clra
    puls x,pc           ; and return
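The tail of the routine (dbgk13 onward) picks one of four lookup tables from the modifier bits in KeyState2. As a reading aid, here is that decision order in Python. The masks are taken from the immediate values in the listing; the function names are made up, and the sketch makes no claim about where the reported bug was.

```python
SHIFT_MASK = 0x001      # ora #1
ALT_MASK = 0x002        # ora #2
CTRL_MASK = 0x004       # ora #4
EXTENDED_MASK = 0x800   # ora #$800, the top bit of a 12-bit byte

def select_table(key_state2):
    """Return (table name, updated KeyState2), in the order the routine checks."""
    if key_state2 & EXTENDED_MASK:
        # The extended flag is consumed by the lookup (anda #$7FF).
        return "keybdExtendedCodes", key_state2 & ~EXTENDED_MASK
    if key_state2 & CTRL_MASK:
        return "keybdControlCodes", key_state2
    if key_state2 & SHIFT_MASK:
        return "shiftedScanCodes", key_state2
    return "unshiftedScanCodes", key_state2

def is_ctrl_alt(key_state2):
    """True when both CTRL and ALT are down (anda #6 / cmpa #6)."""
    both = CTRL_MASK | ALT_MASK
    return (key_state2 & both) == both

print(select_table(0x801))
print(is_ctrl_alt(0x006))
```

The extended check comes first, so an extended scan code ignores CTRL and shift, and CTRL takes priority over shift.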
_________________Robert Finch http://www.finitron.ca
|
Fri Jan 14, 2022 4:57 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
I found the error in the keyboard routine about five minutes after I posted the code. Keyboard input works now.
Got the uart core up and going for 12-bit operation. It can transmit or receive data in 5 to 12 bits format and remains 6551 compatible.
There are several 12-bit bus peripherals now. Keyboard, Uart, Text controller, sprite controller and bitmap controller. Building a 12-bit eco-system.
_________________Robert Finch http://www.finitron.ca
|
Sat Jan 15, 2022 4:16 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Working on interrupts today. They almost work. The system hangs when the second interrupt occurs. I had to build a special circuit to detect the bad address of the second interrupt. <- The adjustment of the stack pointer during an interrupt call was off by one. The system hangs now because the input/output vectors are getting overwritten. But it only occurs if interrupts are enabled. It is strange because if the system is sitting idle interrupts can occur just fine. It appears to hang only when returning to the monitor after executing a monitor command.
Modified the keyboard routines. Getkey() may now either get a character directly from the keyboard device or less directly from a scancode buffer. The scancode buffer will be loaded by an interrupt driven interface.
Had to rearrange the boot code a little bit, the 16k ROM area is getting full.
For the NoC version, the code in the aging circuit was stripped out. Packets live forever now until they are processed. The issue was that when things were aged, a retry response was sent back to the requester for packets that were too old. This sort of worked, but caused a flood of retry requests and responses, ultimately hanging the network.
_________________Robert Finch http://www.finitron.ca
|
Sun Jan 16, 2022 4:49 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Ran into an interesting issue today. Indirect addressing was not working properly. The first issue found and fixed was the detection of indirect addressing mode. The wrong bit of the indexing post-byte was being checked. With that fixed, things almost work. The first fetch using an indirect address worked, but the second fetch failed to fetch from the correct address. It turns out the high-order byte of the indirect address is being complemented every other time the address is used. I cannot figure out why this is happening, but I can see it in the logic analyzer. It is looking to me like some sort of hardware glitch.
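For reference, on the standard eight-bit 6809 the indirect flag is bit 4 of the indexed post-byte, and it only means "indirect" when bit 7 is set; with bit 7 clear, bit 4 is the sign of a 5-bit offset. The Python sketch below decodes that standard layout only. Where the flag sits in the twelve-bit post-byte is not given in the post, so nothing here should be read as the 12-bit encoding.

```python
def is_indirect(postbyte):
    """Indirect flag of a standard 8-bit 6809 indexed post-byte."""
    if not postbyte & 0x80:
        return False          # 5-bit offset form: bit 4 is the offset sign
    return bool(postbyte & 0x10)

def index_register(postbyte):
    """Register field (bits 6-5) of a standard 6809 indexed post-byte."""
    return "XYUS"[(postbyte >> 5) & 3]

print(is_indirect(0x84), is_indirect(0x94))   # ,X versus [,X]
print(index_register(0xA4))                   # ,Y
```

Checking bit 4 without first looking at bit 7 would misread every negative 5-bit offset as indirect, which is one way a "wrong bit" test can appear to work most of the time.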
Found a further issue with outer indexing. The core was still using the eight-bit method of detecting outer indexes for the twelve-bit version.
Worked on a disassembler for the monitor program. It should handle the twelve-bit nature of the core. It uses a couple of tables with a more or less brute force approach. It could be improved.
_________________Robert Finch http://www.finitron.ca
|
Mon Jan 17, 2022 4:24 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
The core seems to run at 60MHz. It does not run reliably at 80MHz although it can display the start-up message. At 60MHz the tools report timing is missed by a few ns. But it may be the case that the signals are not critical. The FPGA is likely somewhat faster than the minimum for the speed grade.
I should mention I am working on two versions of the system. A single-core version and a network-on-chip version. The NoC version does not really work yet. It has been a lot of fun to debug. It gets as far as lighting up LEDs. The single-core version is obviously further along.
Adjusted the operation of the NIC for write cycles. They must now wait for an ack back before the cycle is complete. The issue is that when writes did not wait for an ack, they could end up writing in the wrong order because the packets might circulate around the network several times before getting the opportunity to write. I suppose asynchronous writes could be provided but that would mean altering the instruction set somewhat. There would have to be a means to indicate the type of write cycle in the instruction set. Maybe an asynchronous prefix byte? Or maybe an IO port switch. If clearing the screen for instance, we do not care what order the writes take place in. Only that they are all done. This is true for many circumstances such as initializing a buffer.
I changed the text screen layout from 56x29 to 64x32. Gives a few more characters and has a better font.
_________________Robert Finch http://www.finitron.ca
|
Tue Jan 18, 2022 5:11 am |
|
 |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
|
Maybe a single extra instruction would be enough: a write barrier. https://en.m.wikipedia.org/wiki/Memory_barrier
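A toy Python model of how a write barrier could sit alongside asynchronous writes is sketched below. Everything in it is hypothetical (the `Nic` class, its methods, and the newest-first delivery used to simulate reordering); it only illustrates the idea that posted writes may land in any order and the barrier waits until all of them are acknowledged.

```python
class Nic:
    """Toy network interface: writes are posted, acks arrive later."""
    def __init__(self):
        self.in_flight = []     # writes sent but not yet acknowledged
        self.memory = {}

    def async_write(self, addr, value):
        """Post a write and return at once, without waiting for the ack."""
        self.in_flight.append((addr, value))

    def deliver_one(self):
        """The network completes one outstanding write (here: the newest)."""
        addr, value = self.in_flight.pop()
        self.memory[addr] = value

    def write_barrier(self):
        """Block until every earlier write has been acknowledged."""
        while self.in_flight:
            self.deliver_one()

nic = Nic()

# Clearing a screen: order does not matter, only that all writes are done.
for addr in range(16):
    nic.async_write(addr, 0x20)
nic.write_barrier()

# Two writes to one address with a barrier between them keep their order.
nic.async_write(100, 1)
nic.write_barrier()
nic.async_write(100, 2)
nic.write_barrier()

# Without the barrier, the reordering network lets the older write land last.
nic.async_write(200, 1)
nic.async_write(200, 2)
nic.write_barrier()
print(nic.memory[100], nic.memory[200])
```

In this model the barrier costs nothing for bulk fills like a screen clear and is only needed at the points where ordering matters.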
|
Tue Jan 18, 2022 10:35 am |
|