Author |
Message |
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 768
|
The problem with many registers is keeping track of them. When you look at high-level languages they never really say what the limits are, and you still have to spill registers. This favors the C-style platform of code generation. The advantage of a stack-based architecture is that, while it is a little messy to cache, spills are hidden and the order codes are short. This favors the Algol/Pascal platform. I think the VAX had a standard calling format for everybody, and it generated code for multiple languages.
|
Fri Feb 07, 2025 6:08 pm |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Quote: The problem with many registers is to keep track of them.
The same tracking is used as for a machine with 32 regs. For the Q+ ABI the register usage is much the same as on a machine with 32 regs. Reg usage for codes 96 to 255 is completely undefined; these are expected to be used to process vectors. Below that, reg codes 65 to 91 are reserved as the high-order half of the args/temp/saved registers for 128-bit values. There are 8 arg regs, 10 temp regs, and 9 saved regs defined in the ABI. These occupy regs 1 to 27. Five reg codes are reserved for stack pointers, one for each operating mode. There are also 4 link registers and 7 predicate registers. All these regs use up about the first 56 register codes. Only the registers used in the ABI have to be spilled, so it looks largely like a 32-reg machine. In the ABI, A0L (arg zero low) is r1 and A0H (arg zero high) is r65. Other registers work similarly.
Q+ originally had 32 regs and 32 vector regs. But supporting the vector register type added two pipeline stages to the processor for all instructions and would slow it down considerably. The choice was to go with a whole lot more registers and handle vector data on the software side of things.
_________________Robert Finch http://www.finitron.ca
|
Tue Feb 11, 2025 2:19 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Added the ADDW (add widening) instruction which computes the sum of two values and the carry. The results are placed in a pair of registers. This is for extended precision arithmetic. (There is already a MULW multiply widening instruction).
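A minimal sketch of how a widening add can be chained for extended precision. It assumes 64-bit registers and that ADDW returns the low sum plus a one-bit carry; the function names `addw` and `add128` are made up for illustration and are not Q+ mnemonics or an actual encoding.

```python
MASK64 = (1 << 64) - 1  # assumption: registers are 64 bits wide


def addw(a: int, b: int) -> tuple[int, int]:
    """Hypothetical model of ADDW: return (low 64 bits of the sum, carry out)."""
    total = (a & MASK64) + (b & MASK64)
    return total & MASK64, total >> 64


def add128(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int]:
    """Add two 128-bit values held as (low, high) register pairs."""
    lo, carry = addw(a_lo, b_lo)
    # The carry from the low half feeds the high half; overflow past 128 bits is dropped.
    hi = (a_hi + b_hi + carry) & MASK64
    return lo, hi
```

For example, adding 1 to a pair whose low word is all ones rolls the low word over to zero and bumps the high word by one.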
Sketched out how to support more instructions per cycle while using only four write ports on the register file. Currently Q+ has the outputs of the functional units directly driving the write ports of the register file. Instead of doing this, the output of each functional unit could go to an output queue for that unit. Up to four results from the output queues would be sent to the register file per clock. There could then be more functional units in operation. Since not all instructions need to write to the register file, the instructions that do not write to it could execute at the same time. Stores and branches do not need to update the register file, and they make up about ¼ of instructions. It should be possible to get more than four instructions per clock.
_________________Robert Finch http://www.finitron.ca
|
Thu Feb 13, 2025 3:35 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Slowly moving my cores over to cloud storage. Set up a Cores2025 folder.
Modified the register file code to return a -1 if predicate register zero is specified. Predicate register zero indicates to always execute all lanes for the instruction.
Extended the capabilities of the byte map instruction, BMAP, to include operations not requiring a mapping value. Simple operations like reverse byte order, or broadcast byte can be performed without having to load a map value.
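To illustrate the idea of a byte map, here is a rough sketch in Python. The map representation (a list of source byte indexes, one per result byte) and the function names are assumptions for illustration only; the post does not describe the actual BMAP map encoding.

```python
def bmap(value: int, byte_map: list[int], width: int = 8) -> int:
    """Hypothetical byte map: result byte i is taken from source byte byte_map[i]."""
    src = value.to_bytes(width, "little")
    return int.from_bytes(bytes(src[i] for i in byte_map), "little")


def reverse_bytes(value: int, width: int = 8) -> int:
    """Fixed operation needing no map value: reverse the byte order."""
    return bmap(value, list(range(width - 1, -1, -1)), width)


def broadcast_byte(value: int, index: int = 0, width: int = 8) -> int:
    """Fixed operation needing no map value: copy one byte into every position."""
    return bmap(value, [index] * width, width)
```

The two fixed operations are just byte maps with a constant map, which is why they can be selected without loading a map value into a register.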
_________________Robert Finch http://www.finitron.ca
|
Fri Feb 14, 2025 3:57 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Moving into year two of Q+ development.
Modified the PRED modifier. Previously the modifier had three options for executing instructions under its scope: 1) the predicate bit could be ignored, 2) the in-scope instruction could execute only if the predicate bit was set, or 3) the in-scope instruction could execute only if the predicate bit was clear. The predicate bit came from the register specified by Ra.
PRED now operates slightly differently and more powerfully. It has four options:
1) The predicate bit can be ignored (always execute, as before).
2) Execute the in-scope instruction only if the predicate bit in Ra is set (as before).
3) Execute the in-scope instruction only if the predicate bit in Rb is set.
4) Execute the in-scope instruction only if the predicate bit in Rc is set.
Note that now three different predicate registers may be applied to the group of instructions governed by the PRED modifier. Because register specs may be inverted with sign control, the PRED modifier can operate in a backwards-compatible fashion: Ra specified for Ra, ~Ra specified for Rb, and finally R0 specified for Rc.
Code:
PRED r2,~r2,r0,"AIBIIIII" ; execute one if true, ignore one, next execute if false, one after always execute
MUL r3,r4,r5 ; executes if R2 True
ADD r6,r3,r7 ; always executes
ADD r6,r6,#1234 ; executes if R2 FALSE
DIV r3,r4,r5 ; always executes
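The selection rule can be modelled in a few lines of Python. This is only a sketch: the letters I, A and B come from the mask string in the example above, while the letter C for Rc and the treatment of R0 as a false predicate are assumptions.

```python
def pred_schedule(mask: str, ra: bool, rb: bool, rc: bool) -> list[bool]:
    """For each in-scope instruction, report whether it executes.

    'I' ignores the predicate, 'A'/'B'/'C' test the predicate bit
    taken from Ra/Rb/Rc ('C' is an assumed letter, not from the post).
    """
    select = {"I": True, "A": ra, "B": rb, "C": rc}
    return [select[ch] for ch in mask]


# Models: PRED r2,~r2,r0,"AIBIIIII" with the predicate bit of r2 set.
r2 = True
executes = pred_schedule("AIBIIIII", ra=r2, rb=not r2, rc=False)
# First four entries correspond to MUL, ADD, ADD #1234, DIV.
```

Passing the inverted Ra as Rb is what reproduces the old "execute if clear" option.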
_________________Robert Finch http://www.finitron.ca
|
Tue Feb 18, 2025 4:35 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Rearranged the displacement field for load / store instructions. It was getting a bit dog-chewed. It is now simply the upper 24 bits of the instruction. Replaced the predicate register spec with a data type field. It was either add a data type field, or add a whole whack of opcodes to the ISA. The data type is one of 0) integer, 1) float, 2) decimal float or 3) posit. The data type is necessary as it is handy to convert float data to the appropriate internal representation, for example NaN boxing where the incoming data is less precise than a double value.
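As an illustration of NaN boxing on load, the sketch below boxes a 32-bit float into a 64-bit register image by filling the upper bits with ones, in the style used by RISC-V. Whether Q+ uses this exact bit pattern is not stated in the post, so treat the constants and function names as assumptions.

```python
import struct

BOX_PREFIX = 0xFFFF_FFFF  # assumption: upper 32 bits all ones marks a boxed single


def nan_box_f32(x: float) -> int:
    """Return a 64-bit register image holding x as a NaN-boxed 32-bit float."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return (BOX_PREFIX << 32) | bits32


def is_boxed_f32(reg: int) -> bool:
    """Check whether a 64-bit register image carries a boxed single."""
    return (reg >> 32) == BOX_PREFIX


def unbox_f32(reg: int) -> float:
    """Recover the 32-bit float from the low half of the register image."""
    return struct.unpack("<f", struct.pack("<I", reg & 0xFFFF_FFFF))[0]
```

Read as a double, a boxed value has an all-ones exponent and a non-zero fraction, so it is a NaN and cannot be mistaken for a valid double.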
_________________Robert Finch http://www.finitron.ca
|
Sat Feb 22, 2025 8:17 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Updated the assembler with the new formats for load and store instructions.
Wrote a module to route results to the register file. It supports up to four write ports on the register file, fed from up to twelve source input ports. Multiple clock cycles are required if more than four results need to be written in the same clock cycle: one additional clock for up to 8 results, two additional clocks for all 12.
Currently the register file supports six write ports to handle writing from each of six functional units. The number of write ports on the register file can be reduced which will reduce the size of the register file significantly as each write port is a replication of the file.
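The routing described above can be sketched as a simple first-come batching model. It assumes results are drained in order, four per clock; the real module may arbitrate differently, and the names here are illustrative only.

```python
from collections import deque

WRITE_PORTS = 4  # write ports on the register file, per the post


def route_results(pending, ports: int = WRITE_PORTS) -> list[list]:
    """Group pending results into per-clock batches of at most `ports` writes."""
    queue = deque(pending)
    schedule = []
    while queue:
        batch = [queue.popleft() for _ in range(min(ports, len(queue)))]
        schedule.append(batch)
    return schedule


def extra_clocks(n_results: int, ports: int = WRITE_PORTS) -> int:
    """Clocks needed beyond the first to retire n_results."""
    if n_results == 0:
        return 0
    return -(-n_results // ports) - 1  # ceiling division, minus the first clock
```

With twelve sources all producing a result at once, the model needs three clocks in total, which matches the two additional clocks quoted above.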
_________________Robert Finch http://www.finitron.ca
|
Sun Feb 23, 2025 5:08 am |
|
 |
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 768
|
Is there any way to make the logic easy to set in a test mode. The register file needs to be bug free.
|
Sun Feb 23, 2025 8:27 pm |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Quote: Is there any way to make the logic easy to set in a test mode. The register file needs to be bug free.
It could be parameterized for different modes. But just adding parameterization adds a level of complexity, increasing the odds of an error. Currently the register file code checks whether it is being processed for simulation or synthesis and uses specialized code for one or the other. There is also a parameter to set the number of registers, and one for the number of read ports. The number of write ports is not currently a parameter. It is more LOC to make it so.
Changes: Significantly altered the opcode arrangement for loads and stores. Incorporated the data type as the opcode and included a precision field in the instruction. Previously precision was determined by the opcode, and there was a data type field in the instruction. So, the precision and data type decoding have been flipped around. Flipping the fields around like this freed up about four root opcodes. There are about 23 available opcodes at the root level now. Having the opcode connected to the data type seems more natural. Type should take precedence over the size for decoding.
I am sketching out how to incorporate 32-bit opcodes into the design. Not sure I will try implementing it though. If 50% of instructions can be captured as 32-bit instructions it would go a long way towards increasing code density. To incorporate 32-bit opcodes more decoders (twice as many) are required. The outputs of the decoders are selected according to the PC value for the instruction. Fortunately, decoders are relatively inexpensive and they are already modularized. PC increment is currently very simple: just add 32 bytes (to fetch four instructions) to the program counter for each sequential fetch. Having variable-length instructions makes things more complex. The lengths must be decoded and the correct PC selected for the next instruction bundle.
There may be an additional timing delay to support variable-length (32/64-bit) instructions.
_________________Robert Finch http://www.finitron.ca
|
Mon Mar 03, 2025 4:52 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Worked on shift operations today, creating 32-bit instruction forms, and making better use of 64-bit forms.
Modified the operation of the LSRP (logical shift right pair) to include masking the result. Called the instruction LSRPAM temporarily. This is like a ROLAM (rotate left and AND with mask) operation, as in the PowerPC rlwinm instruction. However, it is possible to extract a bit-field that spans a pair of words. Since it encompasses the operation of the EXT and EXTU instructions, they were made redundant. Or rather, the LSRP instruction was renamed EXT[U] and the two bit-field specific instructions were removed. LSR is a subset of the EXTU instruction.
I am also planning to turn the ASLP instruction into a field-insert instruction.
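A small Python model of the shift-right-pair-and-mask idea behind EXT[U]. It assumes 64-bit words and an (offset, width) field description; the operand order and function names are illustrative, not the actual instruction format.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit words


def extu_pair(lo: int, hi: int, offset: int, width: int) -> int:
    """Unsigned extract: shift the hi:lo pair right by offset, then mask to width bits."""
    pair = ((hi & MASK64) << 64) | (lo & MASK64)
    return (pair >> offset) & ((1 << width) - 1)


def ext_pair(lo: int, hi: int, offset: int, width: int) -> int:
    """Signed extract: same field, sign-extended from its top bit."""
    field = extu_pair(lo, hi, offset, width)
    sign = 1 << (width - 1)
    return (field ^ sign) - sign


def lsr(value: int, amount: int) -> int:
    """Logical shift right as a special case of EXTU with a zero high word."""
    return extu_pair(value, 0, amount, 64 - amount)
```

A field starting at bit 60 with a width of 8 takes four bits from each word, which is the case a single-word extract cannot handle.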
_________________Robert Finch http://www.finitron.ca
|
Tue Mar 04, 2025 4:59 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Sketching out the Qupls2 architecture. Qupls2 will use the Qupls hardware but with changes to the front end. Instructions will be more compact.
Going with a two-bit length code as the first two bits of an instruction, like RISC-V. However, the code indicates 24/48/96-bit instruction lengths. Going to try to have the most common operations as 24-bit instructions.
There are 64-registers, but only the first 32 are general purpose in nature. The last 32 include things like multiple stack pointers, link registers, and registers for high-order 64-bit ops. All 64-registers are renamable.
Branches are compare-and-branch 48-bit instructions. The target has three address modes, relative, absolute and register indirect. 21-bit branch target field.
Loads and stores use scaled indexed addressing with an 18 or 66-bit displacement.
_________________Robert Finch http://www.finitron.ca
|
Thu Mar 06, 2025 4:25 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Documenting the Qupls2 architecture. It is time consuming as there are about 500 pages to port from Qupls. All of the instruction formats need to be revised.
One difference Q+2 has from Q+ is that the register file specs float around a bit. In Q+ they were always in the same position. This change was needed to support 24-bit formats and other miscellaneous instructions like branches. Most instructions use a six-bit register spec with an additional bit for sign control. However, some instructions like branches do not support sign control. Floating the register specs probably will not impact this design as the register spec is a registered output of the decoder. However, some designs may use the register spec in an unregistered fashion to trim a clock cycle from the register file access. Having a floating register spec would not be a good choice for such designs. Floating register specs also make the assembler more complex.
ADD, AND, OR, EOR, CMP have 24-bit instruction forms:
iiiiiaaaaatttttooooooo00 <- immediate
bbbbbaaaaatttttooooooo00 <- register
LOAD / STORE word size have 24-bit forms dddddaaaaatttttooooooo00
Conditional Branches (compare and branch) are 48-bit ppRTTTTTTTTTTTTTTTTTTTaaaaaabbbbbbAffffooooooo01
iii…17 more….iii pp aaaaaaa ttttttt ooooooo 01 <- immediate mode (23 bit imm) instructions
iii…65 more….iii pp aaaaaaa ttttttt ooooooo 10 <- immediate mode (71 bit imm) instructions
With load / store / basic arithmetic as 24-bit and 48-bit compare-and-branch a good portion of instructions should occupy the same storage space as a 32-bit ISA. I have not written the assembler yet, so nothing to measure.
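A sketch of how a front end might read the length code and pull the fields out of a 24-bit register-form instruction. The code-to-length mapping follows the trailing bits shown in the formats above (00, 01, 10); reading the leftmost character as the most significant bit, and the field names rt/ra/rb, are assumptions.

```python
# Length code in the two low-order bits, per the formats shown in the post.
LENGTH_BITS = {0b00: 24, 0b01: 48, 0b10: 96}


def instruction_length(first_bits: int) -> int:
    """Return the instruction length in bits from its two low-order bits."""
    code = first_bits & 0b11
    if code not in LENGTH_BITS:
        raise ValueError("length code 0b11 is not assigned in the post")
    return LENGTH_BITS[code]


def decode_reg24(insn: int) -> dict[str, int]:
    """Split a 24-bit register form laid out as bbbbb aaaaa ttttt ooooooo 00."""
    return {
        "opcode": (insn >> 2) & 0x7F,   # seven 'o' bits
        "rt": (insn >> 9) & 0x1F,       # five 't' bits (assumed target register)
        "ra": (insn >> 14) & 0x1F,      # five 'a' bits
        "rb": (insn >> 19) & 0x1F,      # five 'b' bits
    }
```

Because the length sits in the low two bits, the fetch logic can size each instruction before decoding any other field.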
_________________Robert Finch http://www.finitron.ca
|
Fri Mar 07, 2025 4:51 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
More documentation of the Q+2 architecture. Today shift and bitfield instructions. I lumped these two together in the Q+2 docs as they are closely related.
Renamed some of the ZSETxx and SETxx instructions to conditional move CMOVxx instructions, which is semantically closer to what the instructions do. A SETxx instruction implies setting a register to 0 or 1, whereas a CMOVxx instruction implies moving a register value. The conditional move instructions CMOVxx can also be used to set a value, which makes the SETxx instructions redundant.
Got rid of the CMOVZ instruction. CMOVZ was the same as CMOVNZ with source registers swapped, making it redundant.
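The redundancy is easy to see in a tiny model. The operand order (condition, value if true, value otherwise) is an assumption made for illustration; the actual CMOVxx operand layout is not given in the post.

```python
def cmovnz(cond: int, a: int, b: int) -> int:
    """Hypothetical CMOVNZ: result is a when cond is non-zero, otherwise b."""
    return a if cond != 0 else b


def cmovz(cond: int, a: int, b: int) -> int:
    """CMOVZ expressed as CMOVNZ with the two source operands swapped."""
    return cmovnz(cond, b, a)


def setnz(cond: int) -> int:
    """A SETxx-style result (0 or 1) built from a conditional move of constants."""
    return cmovnz(cond, 1, 0)
```

An assembler could keep CMOVZ as a pseudo-op that simply swaps the sources before emitting CMOVNZ.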
_________________Robert Finch http://www.finitron.ca
|
Sat Mar 08, 2025 4:50 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Changes: Added partial support for shortcut pages in the MMU/TLB. Shortcut pages are in the spec but were not previously supported. Page levels 1 and 2 can be shortcut in the spec, resulting in 8MB or 8GB pages. Currently shortcutting is only present for level 1, which allows a page size of 8MB to be used. Support for level 1 shortcutting was made a TLB option as it makes the TLB about 100% larger.
Bug Fixes: When the bus got expanded to 256 bits the MMU did not get updated. It was still trying to use a 128-bit bus. The MMU has now been switched to use the 256-bit bus.
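The 8MB and 8GB figures are what one would get from an 8KB base page with 1024 entries per page table level. Both of those numbers are inferred from the sizes quoted above rather than stated in the post, so the sketch below should be read with that assumption in mind.

```python
BASE_PAGE_BYTES = 8 * 1024   # assumption: 8 KiB base page
ENTRIES_PER_TABLE = 1024     # assumption: 1024 entries per page table level


def page_bytes(shortcut_level: int) -> int:
    """Page size when the table walk is cut short at the given level (0 = normal page)."""
    return BASE_PAGE_BYTES * ENTRIES_PER_TABLE ** shortcut_level


def split_address(vaddr: int, shortcut_level: int) -> tuple[int, int]:
    """Split a virtual address into (page number, offset) for that page size."""
    size = page_bytes(shortcut_level)
    return vaddr // size, vaddr % size
```

A level 1 shortcut page has a 23-bit offset, so the TLB has to compare fewer address bits for those entries than for normal pages.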
Still documenting away.
_________________Robert Finch http://www.finitron.ca
|
Thu Mar 13, 2025 3:07 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2307 Location: Canada
|
Delved into the world of MSI and MSI-X interrupts. Been doing some reading up on the topic. It is quite complex especially when multiple CPU cores and peripherals are involved.
Added a QMSI interrupt controller for Qupls. Note that the controller was called a QMSI controller as it does not exactly follow the MSI or MSI-X standard and was built for the Qupls SoC. Unfortunately the acronym MSI seems to be tied to a particular implementation of message signaled interrupts.
QMSI is message signaling. With QMSI, messages are sent in the form of bus slave responses to the interrupt controller. This differs from MSI, which sends messages in a bus master fashion. The controller places the interrupt message in one of sixty-three priority queues. If there is a message in any queue, then an interrupt is signaled to the CPU. The queues are searched in priority order. The controller can detect some stuck interrupts. It does this by keeping an interrupt message history with a 24-bit timestamp, and if the same message recurs within a short time, then it is assumed the interrupt is stuck active. The CPU also has access to the queue status signals. If any queue is full then a flag is set in the interrupt controller's status register.
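A behavioural sketch of the controller as described, written in Python. The queue depth, the stuck-detection window, and the choice that a higher queue number means higher priority are all assumptions; only the sixty-three queues, the 24-bit timestamp and the full flag come from the description above.

```python
from collections import deque

TS_MASK = 0xFFFFFF  # 24-bit timestamp, per the post


class QmsiModel:
    """Toy model of the QMSI controller; parameters other than n_queues are assumed."""

    def __init__(self, n_queues: int = 63, depth: int = 8, stuck_window: int = 16):
        self.queues = [deque() for _ in range(n_queues)]
        self.depth = depth
        self.stuck_window = stuck_window
        self.last_seen = {}      # message -> timestamp of its previous arrival
        self.stuck = set()       # messages judged to be stuck active
        self.queue_full = False  # status flag: some queue has filled up

    def post(self, priority: int, message, now: int) -> bool:
        """Queue a message; return False if its queue was already full."""
        ts = now & TS_MASK
        prev = self.last_seen.get(message)
        if prev is not None and ((ts - prev) & TS_MASK) < self.stuck_window:
            self.stuck.add(message)
        self.last_seen[message] = ts
        queue = self.queues[priority]
        if len(queue) >= self.depth:
            return False
        queue.append(message)
        if len(queue) == self.depth:
            self.queue_full = True
        return True

    @property
    def irq(self) -> bool:
        """Interrupt is signaled while any queue holds a message."""
        return any(self.queues)

    def next_message(self):
        """Search the queues in priority order (highest index first, an assumption)."""
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None
```

In use, a driver would poll or take the interrupt, call `next_message()` until it returns `None`, and check `stuck` and `queue_full` for diagnostics.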
The MSI-X standard has a lot of capability to it. One of the neat things it can do is directly log interrupt requests to memory. It can do this because messages are sent in a bus master fashion. This is not available with QMSI but it can be emulated using a dummy CPU core that redirects interrupt responses to memory.
_________________Robert Finch http://www.finitron.ca
|
Fri Mar 14, 2025 3:01 am |
|