


 [ 204 posts ]  Go to page Previous  1 ... 9, 10, 11, 12, 13, 14  Next
 Qupls (Q+) 

Joined: Mon Oct 07, 2019 2:41 am
Posts: 768
You may want a few more segments if you have a window display and a heap.


Sun Mar 30, 2025 4:43 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2307
Location: Canada
It is a good idea, but I am not sure I want to dedicate address bits to referencing them. PowerPC has 16 segment registers; I would follow the same pattern.
At the moment, though, the bounds registers can be selected by means other than the address. For instance, the code bounds register is selected for instruction fetches. The stack bounds register can be selected by detecting the use of the stack pointer register in instructions, and the remaining bounds are for data.
I switched to having base and bound registers (instead of just bounds), so adding more is tempting. It is also more context to save on context switches and interrupts.

I had been working on context switch code earlier today. I was going to micro-code it, but it is too complex and too many things could go wrong.
I ended up adding a limit feature to the BLR (branch-to-link) instruction for tabular jumps. It is somewhat similar to the memory indirect on the My66000. The branch register contains a table offset, and it must be between zero and the limit. Otherwise it branches to the limit address.
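
As a rough illustration of the limit check described above, here is a small Python model. It is only a sketch under assumptions: the function name, the dictionary standing in for memory, and the choice of an exclusive upper bound are all hypothetical, and entry alignment is not modelled.

```python
# Hypothetical model of a limit-checked table branch. The names and the
# exclusive upper bound are assumptions, not the actual Qupls definition.
ENTRY_SIZE = 4  # the sample dispatch table uses .4byte entries


def table_branch(memory, table_base, limit_addr, offset):
    """Return the next program counter for a limit-checked table branch."""
    entry_addr = table_base + offset
    if offset >= 0 and entry_addr + ENTRY_SIZE <= limit_addr:
        return memory[entry_addr]   # in range: jump through the table entry
    return limit_addr               # out of range: branch to the limit address


# Two handler addresses stored in a table at 0x1000; the table ends at 0x1008.
memory = {0x1000: 0x2000, 0x1004: 0x2400}
TABLE_BASE, TABLE_LIMIT = 0x1000, 0x1008
```

With this model, code placed at the limit address acts as the handler for any bad function number, so the dispatcher needs no separate range-check branch.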

I also discovered an issue with jump tables and the way immediate constants are handled. Because large constants are placed at the end of a cache line, a jump table cannot have any large constants in it, or they could end up in the middle of a table.

Sample context switch code:
Code:
#
   loadi br6,XJMP            # select context switch function
   loadi a0,destination TCB
   sys                            # call the system
   # we get back here after the context switched



# system dispatcher
# The system routine must end with RFI
   blr DispatchTable[br6],DispatchTableLimit


DispatchTable:
   .4byte   ContextSwitch
   
# Context switch code   
ContextSwitch:
   csrrw r0,SCRATCH,a0      # save a0 in scratch register
   csrrd a0,TCBA               # get TCB pointer
   store a1,8[a0]            # save a1
   store a2,12[a0]
   store a3,16[a0]
   store a4,20[a0]
   store a5,24[a0]
   store a6,28[a0]
   store a7,32[a0]
   store t0,36[a0]
   store t1,40[a0]
   store t2,44[a0]
   store t3,48[a0]
   ...
   csrrd a1,SCRATCH         # get value of a0
   store a1,4[a0]            # save it
   # check if the FP registers should be saved
   ...
   # save stack pointers and branch registers
   movea a1,USP
   store a1,256[a0]
   movea a1,SSP
   store a1,260[a0]
   movea a1,HSP
   store a1,264[a0]
   movea a1,MSP
   store a1,268[a0]
   move a1,BR1
   store a1,276[a0]
   move a1,BR2
   store a1,280[a0]
   ...
   # store the return program counter
   ...
   move a1,CR0
   store a1,320[a0]
   move a1,CR1
   store a1,324[a0]
   ...
   move a1,LC
   store a1,352[a0]
   move a1,MCLR
   store a1,356[a0]
   move a1,MCPC
   store a1,364[a0]
   csrrd a1,SR
   store a1,384[a0]
   csrrd a1,PBL
   store a1,388[a0]
   #
   # Load the destination context
   csrrd a0,SCRATCH         # get destination TCB
   csrrw r0,TCBA,a0         # update running TCB address
   load a1,388[a0]
   move PBL,a1
   load a1,384[a0]
   move SR,a1
   load a1,364[a0]
   move MCPC,a1
   ...
   # walk forwards loading registers
   load a1,8[a0]
   load a2,12[a0]
   load a3,16[a0]
   load a4,20[a0]
   ...
   rfi


I am trying out some code examples to determine whether anything is amiss.
I decided to add an explicit 'loadi' (load immediate) instruction, as it can load more bits than an ADD or OR instruction can. It can also access more registers than just the GPRs.
I have the move instruction set up the same way. It can access all the programmable registers (96 of them). A normal instruction can access only 32 regs.
Not keen on adding 12 more segment registers. But maybe four more. They are not accessible to user / app mode. Only OS software needs to know about them.

_________________
Robert Finch http://www.finitron.ca


Sun Mar 30, 2025 5:57 pm WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2307
Location: Canada
I could not get the 6809 version of Femtiki to work very well. It turns out there was a bug in the timer decrement routine. I found it when porting the code over for Qupls3.

I got a good chunk of the kernel ported over for Q+3 and changed a couple of instructions as a result. The PUSH and POP instructions can now push or pop a list of registers instead of just four max. Up to 17 regs may be pushed or popped with a single instruction. The register list is specified in groups since there are potentially 96 registers that could be stacked. The new version of PUSH and POP can also handle all 96 registers. Before there were separate instructions to push only integer registers or floating point registers.

Mainly worked on the task switch code. It is over 300 LOC in assembler.

_________________
Robert Finch http://www.finitron.ca


Tue Apr 01, 2025 3:48 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2307
Location: Canada
Changes / Additions:
Added the BMAP instruction to the architecture which can map bytes from a source onto a destination. It is very flexible and can permute bytes, reverse order, broadcast, mix and shuffle. It got added because it is used by software in the library functions which I did not want to re-write.
Changed the FTA bus so that there is only a single address in the command request and added a flag as to whether it is virtual or physical. Previously it had separate physical and virtual address fields. This was extra bus baggage as there is only ever one used at a time. Had to go through all the peripherals and change .padr to .adr. Going into the MMU it is virtual; coming out of the MMU it is physical.
Also removed the stb signal from the bus. Stb was a holdover from the WISHBONE bus to control when data was strobed. It did not apply in the FTA bus. And, removed the asid from the bus. It had no business being there. Asid is just used by the TLB to prevent the need for it to be flushed on a context switch.
In the MMU, the bus retry wait was changed to be a random number of cycles from 1 to 32. Previously it was a fixed wait time of 32 cycles.

Bug fixes:
In the MMU the select lines were being set inactive if a bus retry was needed.
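
To illustrate the kind of byte mapping BMAP performs, here is a minimal Python sketch. The selector format (one source byte index per destination byte) and the 8-byte width are assumptions made for the example; the real instruction encoding is not given in this thread.

```python
def bmap(src, selectors, width=8):
    """Return a value whose byte i is source byte selectors[i].

    Byte 0 is the least significant byte. The selector list is a
    hypothetical stand-in for however BMAP actually encodes its map.
    """
    if len(selectors) != width:
        raise ValueError("need one selector per destination byte")
    src_bytes = src.to_bytes(width, "little")
    out = bytes(src_bytes[s] for s in selectors)
    return int.from_bytes(out, "little")


value = 0x0807060504030201
reversed_bytes = bmap(value, [7, 6, 5, 4, 3, 2, 1, 0])  # byte reversal
broadcast_low = bmap(value, [0] * 8)                    # broadcast byte 0
```

The same routine covers permute, reverse, broadcast and shuffle simply by changing the selector list, which is presumably why a single instruction can replace several library helpers.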

_________________
Robert Finch http://www.finitron.ca


Thu Apr 03, 2025 8:09 am WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1821
Is the random bus retry a means of improving test coverage, or is it a feature somehow improving performance?


Thu Apr 03, 2025 8:28 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2307
Location: Canada
Quote:
Is the random bus retry a means of improving test coverage, or is it a feature somehow improving performance?

It is an attempt to avoid collisions on a shared bus. If the wait times were fixed and there was bus contention it might get stuck repeating. Of course if the same LFSR is used in different devices to generate a random wait, the result might be the same anyway. It should also improve performance as the average wait time is reduced.
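
A small Python model of an LFSR-driven retry wait is sketched below. The 16-bit width, the tap positions (the common x^16 + x^14 + x^13 + x^11 + 1 polynomial), the seeds and the class name are all assumptions for illustration; the MMU's actual generator is not described here.

```python
class RetryTimer:
    """Hypothetical retry-wait generator built on a 16-bit Fibonacci LFSR."""

    def __init__(self, seed=0xACE1):
        if seed & 0xFFFF == 0:
            raise ValueError("an LFSR must not be seeded with zero")
        self.state = seed & 0xFFFF

    def next_wait(self):
        """Advance the LFSR one step and return a wait of 1 to 32 cycles."""
        s = self.state
        feedback = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (feedback << 15)
        return (self.state & 0x1F) + 1   # low five bits give 0..31, plus one


device_a = [RetryTimer(0xACE1).next_wait()]
```

Two devices seeded identically produce identical wait sequences, which is the failure case mentioned above, so in this model each device would need its own seed. A wait drawn uniformly from 1 to 32 averages 16.5 cycles, compared with the fixed 32.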

_________________
Robert Finch http://www.finitron.ca


Thu Apr 03, 2025 2:31 pm WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1821
Ah, I see. I wonder how a Fibonacci backoff might perform, perhaps with a small random perturbation if that's appropriate.


Thu Apr 03, 2025 2:59 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2307
Location: Canada
Quote:
I wonder how a fibonacci backoff might perform
Fibonacci might be a good one to try. I was looking for something really simple that does not use a lot of hardware. I should look at network controllers again. Could also do quadratic backoff.
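
For comparison, a Fibonacci backoff can be sketched in a few lines of Python. The starting values and the cap of 32 cycles (matching the earlier maximum wait) are assumptions, and no random perturbation is included.

```python
from itertools import islice


def fibonacci_backoff(max_wait=32):
    """Yield successive retry waits that grow like Fibonacci numbers, capped."""
    current, following = 1, 2
    while True:
        yield current
        # Two registers and one adder would be enough to do this in hardware.
        current, following = min(following, max_wait), min(current + following, max_wait)


first_waits = list(islice(fibonacci_backoff(), 9))
```

Here `first_waits` is `[1, 2, 3, 5, 8, 13, 21, 32, 32]`: the wait grows more gently than a doubling backoff and then saturates at the cap.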

Changes / Additions
Added a feature to branch-and-link instructions. Normally the destination link register cannot be br7 as that is a read-only reference to the program counter to generate program counter relative addresses. So, storing a return address there is illegal and would generate an illegal instruction trap. However, this has changed to generate a call to an interrupt subroutine instead. So, it is now possible to easily run hardware interrupt routines from software.
This arose from the need to invoke the timer ISR to perform context switching in some cases from OS calls. For instance, sleep() now causes a context switch. It will also be called from StartTask() and ExitTask() when I get around to finishing those routines.

_________________
Robert Finch http://www.finitron.ca


Fri Apr 04, 2025 3:19 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2307
Location: Canada
Added code in the assembler to merge common constants together, using the same constant bucket. This reduces code size. It happens surprisingly often.
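
The merging idea can be sketched as a simple interning table. This is an illustrative Python model only: the class and method names are made up, and it assumes one pool, whereas the real assembler places constants at the end of a cache line and so presumably shares them within a narrower scope.

```python
class ConstantPool:
    """Hypothetical constant pool that hands out one slot per distinct value."""

    def __init__(self):
        self.slots = []   # constants in the order they were first seen
        self.index = {}   # value -> slot number

    def intern(self, value):
        """Return the slot for value, reusing an existing slot when possible."""
        if value not in self.index:
            self.index[value] = len(self.slots)
            self.slots.append(value)
        return self.index[value]


pool = ConstantPool()
first = pool.intern(0x00800000)
second = pool.intern(0xC0000000)
repeat = pool.intern(0x00800000)   # same value, so the same slot is reused
```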

Changes
Changed the PRED instruction to reuse the conditional branch opcodes. Since it is illegal for a conditional branch to store a return address in the PC register, that encoding is repurposed to represent a predicate “branch”. The instruction has the same format as the branch instruction, except that it operates under the opposite condition. The predicate instruction accepts a destination label like a branch, but the destination label represents where the predication stops. If the predicate is false, the CPU will skip over instructions until it hits the destination label. Otherwise, the instructions will be executed. The destination label for a predicate must come after the predicate and be within 13 instructions. The branch displacement field turns into an instruction mask field for a predicate. The reason a mask is used instead of a displacement is that a displacement would require an address comparator for every instruction following the predicate in the re-order buffer. A mask is a lot less expensive. It just loads into a shift register and shifts out as instructions are encountered.
Code:
BEQ mylabel   # branches to my label if condition is true
PEQ mylabel    # executes instructions if the condition is true
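
The mask mechanism can be modelled roughly as follows. This is a sketch under assumptions: one mask bit per following instruction with the low bit first is a guess at the encoding, and the function names are hypothetical.

```python
MAX_WINDOW = 13  # the label must be within 13 instructions of the predicate


def predicate_mask(count):
    """Assembler side: build a mask covering the next `count` instructions."""
    if not 1 <= count <= MAX_WINDOW:
        raise ValueError("predicate window must cover 1 to 13 instructions")
    return (1 << count) - 1


def run_window(mask, predicate_true, instructions):
    """CPU side: shift one mask bit out per instruction and squash as needed."""
    executed = []
    for insn in instructions:
        covered = mask & 1
        mask >>= 1
        if covered and not predicate_true:
            continue              # predicated off: treated as a NOP
        executed.append(insn)
    return executed


mask = predicate_mask(2)
skipped = run_window(mask, False, ["loadi", "b OSExit", "next"])
taken = run_window(mask, True, ["loadi", "b OSExit", "next"])
```

No address comparison appears anywhere in `run_window`; each instruction only needs the single bit shifted out for it, which is the cost saving described above.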

The following code shows the use of a predicate. Note that there are no conditional branches in the code. Compare results are combined using CR operate instructions to generate a predicate. One might think it would be slow without branches bypassing some of the code; however, the code should only take about 3 clocks to execute. If there were a branch miss, it would take about 10 to 13 clocks for each miss. The code with branches could end up taking considerably longer to execute.
Code:
                                  284: FMTK_LockSemaphore:
                                  285:    macAdrCheck %a2
01:00000104 031A0000                1M    cmpai %cr0,%a2,0                  # NULL pointer?
01:00000108 431A00A0                2M    cmpai %cr1,%a2,0x00800000      # too low
01:0000010C 831A04A0                3M    cmpai %cr2,%a2,0xC0000000      # too high
01:00000110 0B002C02                4M    cror %cr0?eq,%cr0?eq,%cr1?lt
01:00000114 0B00500E                5M    crorc %cr0?eq,%cr0?eq,%cr2?le
01:00000118 F93E80E0                6M    peq %cr0,.00017
01:0000011C 44000200                7M    loadi %a0,E_Arg
01:00000120 1B5A0000                8M    b OSExit
                                    9M .00017:


The strange question marks in the CR operate instructions indicate which bit of the condition register to select. I tried to use a '.', but the assembler insisted on interpreting the result as a number and gave a divide-by-zero error.
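
The logic of the pointer check in the listing can be restated in Python as below. This is a reading of the listing rather than a definitive description: it assumes the compares are unsigned and that crorc ORs its first source with the complement of the second, as the PowerPC instruction of the same name does.

```python
def bad_pointer(a2):
    """Return True when the address check would predicate the error exit on."""
    cr0_eq = a2 == 0               # cmpai %cr0: NULL pointer?
    cr1_lt = a2 < 0x00800000       # cmpai %cr1: too low
    cr2_le = a2 <= 0xC0000000      # cmpai %cr2: at or below the upper bound
    cr0_eq = cr0_eq | cr1_lt       # cror:  eq = eq OR lt
    cr0_eq = cr0_eq | (not cr2_le) # crorc: eq = eq OR NOT le (too high)
    return cr0_eq
```

The three compares are independent of one another, while the two combining steps each depend on the previous result, which matches the later remark that the compares can run in parallel but the CR operations execute one at a time.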

Managed to free up an opcode by moving its instructions into unused portions of other opcodes. There are now 12 out of 64 opcodes available.

_________________
Robert Finch http://www.finitron.ca


Mon Apr 07, 2025 5:19 am WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1821
That's quite something - the CPU can execute some 7 or 8 instructions in 3 clocks if predication makes enough of them NOPs.

Thanks for the example - unfortunately I don't quite understand it! I'm having to guess:
3 comparisons set flags
2 predicated instructions do something, or nothing
1 predication marker marks two subsequent instructions as predicated
2 further instructions now do something, or nothing

Is that about right?


Mon Apr 07, 2025 7:08 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 768
Faster is good, providing you don't outpace your cache.


Mon Apr 07, 2025 8:22 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2307
Location: Canada
Quote:
That's quite something - the CPU can execute some 7 or 8 instructions in 3 clocks if predication makes enough of them NOPs.

Thanks for the example - unfortunately I don't quite understand it! I'm having to guess:
3 comparisons set flags
2 predicated instructions do something, or nothing
1 predication marker marks two subsequent instructions as predicated
2 further instructions now do something, or nothing

Close. The "2 predicated instructions" are not predicated instructions. They are "cr" operate instructions, which combine condition register bits logically. They always do something.
The predication marker uses the status of the condition register to predicate the following instructions. The branch label indicates the window of the predicated instructions. The assembler calculates a mask for the instructions instead of a displacement.

I think it may be possible for the CPU to execute all eight instructions in just two clock cycles, although I may be overly optimistic. I would need to measure it, and I do not have a working CPU at the moment. The "cr" instructions execute one at a time, but the compares can all execute in parallel.

The '?' in the CR register names is in lieu of a '.'. I could maybe change it to [] if that makes more sense. CR0?EQ means the EQ bit in the CR0 register. So, "cror" ORs the individual bits together.

I have stared at assembler for so long, I think it is easy to understand.

_________________
Robert Finch http://www.finitron.ca


Mon Apr 07, 2025 1:09 pm WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1821
Thanks for explaining!


Mon Apr 07, 2025 1:15 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2307
Location: Canada
Quote:
Faster is good, providing you don't out pace your cache.

Yes. It'll be fast if everything is loaded in the cache. But otherwise quite slow.
I have not released anything yet. I am beginning to believe things are just vaporware. It has been a lot of learning and experimentation with ideas, trying to find just the right mix.

_________________
Robert Finch http://www.finitron.ca


Mon Apr 07, 2025 1:18 pm WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2307
Location: Canada
Modified the system call instruction to only escalate the call to the next higher operating mode or environment. Renamed “SYS” to “ECALL”. Previously it would switch to machine / secure operating mode. ECALL also now accepts a vector number to call. This is to allow for different environment call dispatchers. There are 4096 vectors allowed. Vectors zero through ten are reserved for the operating system dispatchers. Vector eleven is for built in ROM routines.
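
A rough Python model of the ECALL behaviour described above follows. The four mode names are inferred from the USP, SSP, HSP and MSP stack pointers in the earlier context switch code and are an assumption, as is the choice to stay in the top mode when already there; the function and label names are hypothetical.

```python
MODES = ["user", "supervisor", "hypervisor", "machine"]  # assumed ordering
NUM_VECTORS = 4096


def ecall(current_mode, vector):
    """Return (mode after the call, kind of dispatcher the vector selects)."""
    if not 0 <= vector < NUM_VECTORS:
        raise ValueError("vector must be in the range 0 to 4095")
    level = MODES.index(current_mode)
    next_mode = MODES[min(level + 1, len(MODES) - 1)]  # escalate one level only
    if vector <= 10:
        kind = "os dispatcher"     # vectors 0 to 10 are reserved for the OS
    elif vector == 11:
        kind = "rom routines"      # vector 11 is for built-in ROM routines
    else:
        kind = "unassigned"        # the post does not say what the rest do
    return next_mode, kind


from_user = ecall("user", 0)
rom_call = ecall("supervisor", 11)
```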

_________________
Robert Finch http://www.finitron.ca


Tue Apr 08, 2025 2:13 am WWW