BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
|
This sounds like you've wrestled it into a healthier state - hurrah!
|
Mon Mar 21, 2022 4:18 pm |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
It is slowly making progress.
Latest fixes: The LEA instruction opcode was freed up and LEA was made an alternate mnemonic for ADD, but the assembler was not updated. This led to the wrong value being loaded into the global pointer register.
Latest mods: Had a case where the BIU locked up waiting for an ack although there was no active memory cycle. Either the ack got missed, the cycle was aborted, or there is a hardware error of some sort. The hardware is complicated, so maybe I missed a corner case where the bus was not supposed to be active. The BIU was modified to continue if either an ack is present or the bus is no longer active.
A memory load / store queue was written. It can accept input from two sources and bypass stores to loads. It was integrated into the BIU, replacing a FIFO. Only one source is used in this design. The queue depth was set quite shallow since the current design has a synchronous memory interface, so there is only ever one outstanding memory request.
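To illustrate what bypassing stores to loads means, here is a minimal software sketch. It is not the actual BIU logic: the names `MemOp` and `LoadStoreQueue`, the depth of two, and the dictionary standing in for memory are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemOp:
    is_store: bool
    addr: int
    data: Optional[int] = None  # store data; unused for loads


class LoadStoreQueue:
    """Toy model: stores wait in a shallow queue, loads check it first."""

    def __init__(self, depth: int = 2):  # depth is an assumed value
        self.depth = depth
        self.pending = deque()

    def push(self, op: MemOp) -> bool:
        """Queue an operation; False means the queue is full (stall)."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append(op)
        return True

    def bypass(self, addr: int) -> Optional[int]:
        """Return data from the youngest queued store to addr, if any."""
        for op in reversed(self.pending):
            if op.is_store and op.addr == addr:
                return op.data
        return None

    def load(self, addr: int, memory: dict) -> int:
        """Serve a load from the queue when possible, else from memory."""
        hit = self.bypass(addr)
        return hit if hit is not None else memory.get(addr, 0)

    def drain_one(self, memory: dict) -> None:
        """Retire the oldest queued operation to memory."""
        if self.pending:
            op = self.pending.popleft()
            if op.is_store:
                memory[op.addr] = op.data
```

A load that matches a queued store gets the store data without waiting for the store to reach memory; when several stores to the same address are queued, the youngest one wins.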
_________________Robert Finch http://www.finitron.ca
|
Wed Mar 23, 2022 5:03 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Squeezed the PTE back down to 128 bits by removing most of the access rights information. This allows twice as many PTEs to be used in mapping. The PTE only maps an address now. There are 32768 PTEs, four times the number of physical pages. The other information usually associated with a PTE comes from a second table, the PMT, which contains only 16384 entries. The PMT contains the protection key, privilege level, access count, and rwx access rights. rwx access rights are also stored in the PTE so they may be set up differently for different users. The PMT is accessed after the physical address is known, so translations require two block RAM accesses, which amounts to two clock cycles as the block RAM is clocked at double the CPU clock rate. The setup allows for up to 1 GB of physical space split between 512 MB of DRAM and secondary storage.
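A rough sketch of the two-step lookup, for anyone following along. Several things here are assumptions rather than the real design: the 64 kB page size (derived from 1 GB divided by 16384 PMT entries), the dictionary used for the PTE table, and in particular the rule that the effective rights are the AND of the PTE and PMT rwx bits.

```python
PMT_ENTRIES = 16384
PAGE_BITS = 16  # assumed: 1 GB / 16384 entries = 64 kB per page


def translate(vaddr, pte_table, pmt, want):
    """Translate vaddr; 'want' is an rwx bit mask (r=4, w=2, x=1)."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)

    # First access: the PTE only supplies the physical page and rwx bits.
    pte = pte_table.get(vpn)
    if pte is None:
        raise LookupError("page fault")
    ppn, pte_rwx = pte

    # Second access: the PMT is indexed by physical page number, so it
    # cannot start until the first lookup has finished.
    entry = pmt[ppn]
    allowed = pte_rwx & entry["rwx"]  # assumed combination rule
    if want & ~allowed:
        raise PermissionError("access rights violation")

    entry["access_count"] += 1
    return (ppn << PAGE_BITS) | offset
```

The dependency between the two accesses is why the PMT lookup cannot be overlapped with the PTE lookup.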
|
Thu Mar 24, 2022 3:46 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
The compiler ran out of temporary registers and output incorrect code. Also ran into a case where constants were not being reduced to a single constant. Started working on a new compiler based on VBCC.
|
Sat Mar 26, 2022 3:12 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Worked on the compiler code generator. Set things up so that prolog and epilog statements completely override the generation of prolog and epilog code. The prolog and epilog statements allow things like interrupt routines to be written. They give greater control over code without needing the ‘interrupt’ keyword applied to a function.
|
Sun Mar 27, 2022 3:15 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Modified the operation of the JBS instruction. The JBS and JBC instructions now share the same opcode. This frees up an opcode. This was possible because JBS / JBC do not need to regard the compare method, so that field was reused for additional opcode bits.
Modified branches, repurposing the Tb field as an extra branch displacement bit. This gives branches a whopping 21-bit displacement for a range of ±2MB.
In the Thor test system, there is a table secondary to the page translation table that stores info that may be used by hardware. The table is accessed once the physical address is known, so it adds a clock cycle to memory access. It could be loaded into the TLB when a page is loaded, but due to hardware limitations (no TLB) that is not done on the test system. For lack of a better name I am calling this the page management table, PMT. It contains:
SC - share count
M - page modified bit
AC - access count, number of times page accessed since last clear of count
ACL - access control list reference
KY - key required to access page
PL - privilege level required to access page
PCI - sub page compression indicators
AL - compression algorithm
EN - encrypted page indicator
N - conforming code page indicator
V - entry valid bit
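As a software view of one PMT entry, something like the following would do. The field names are taken from the list above, but the Python types, the defaults, and the `touch` / `clear_access_count` helpers are assumptions for illustration; the post does not give field widths or the bit layout.

```python
from dataclasses import dataclass


@dataclass
class PMTEntry:
    share_count: int = 0             # SC
    modified: bool = False           # M
    access_count: int = 0            # AC
    acl_ref: int = 0                 # ACL
    key: int = 0                     # KY
    privilege_level: int = 0         # PL
    compression_indicators: int = 0  # PCI
    compression_algorithm: int = 0   # AL
    encrypted: bool = False          # EN
    conforming: bool = False         # N
    valid: bool = False              # V

    def touch(self, is_write: bool) -> None:
        """Record one access to the page (hypothetical helper)."""
        self.access_count += 1
        if is_write:
            self.modified = True

    def clear_access_count(self) -> None:
        """Restart the access count, e.g. for page replacement stats."""
        self.access_count = 0
```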
|
Mon Mar 28, 2022 3:29 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Test output almost works now. Displaying the tetra value 0x87654321 displays 8<space>654321, which is a bit maddening. It could be a glitch in the update of display memory. Previously the display of the constant was something like: 87:;=?89. What I did to improve the output was zero out a whole bunch of pipeline registers on a flow change, instead of simply clearing the valid flag bits. There must have been something in the pipeline causing erroneous operation. Anyway, it seems to be fixed now. Was stuck on that bug for like months.
After a couple of compiler fixes the MapPage() routine seems to work. MapPage() looks for an empty entry in the hash table in which to place a new map entry. Another routine does an absolute placement of an entry, overwriting any existing entry in the same spot.
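The difference between the two routines can be sketched like this. It is only an illustration: the linear probe, the limit of 16 probes, and the names `map_page` and `map_page_absolute` are assumptions, not the code running on the test system.

```python
def map_page(table, start, entry, max_probes=16):
    """Place entry in the first empty slot found from 'start'.

    Returns the slot index used, or -1 if no empty slot was found.
    """
    n = len(table)
    for i in range(max_probes):
        idx = (start + i) % n
        if table[idx] is None:
            table[idx] = entry
            return idx
    return -1


def map_page_absolute(table, idx, entry):
    """Place entry at idx, overwriting whatever was there."""
    table[idx] = entry
    return idx
```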
|
Tue Mar 29, 2022 4:31 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Working on updating the TLB to read the PMT on a load and to write the PMT when a modified TLB entry is written. Changed the way the TLB is accessed. Previously it was updated using a hexi-byte pair store instruction, and the TLB appeared as a series of memory locations. However, the hexi-byte pair did not contain enough information. Now the TLB is updated indirectly. There is an array of eight buckets, of which only five are used, to hold onto the TLB info for update. After all buckets have been set appropriately, the TLB is updated in an atomic fashion, triggered by a write operation to bucket seven. The TLB buckets appear as an MMIO device.
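A small model of the staged update may make the idea clearer. The post only says that five of the eight buckets are used and that a write to bucket seven triggers the update; which buckets carry which fields is not stated, so the layout below (buckets 0 to 3 hold the entry, bucket 7 carries the entry index) is purely an assumption, as is the class name.

```python
class TlbUpdatePort:
    """Toy model of the MMIO buckets in front of the TLB."""

    NUM_BUCKETS = 8
    TRIGGER = 7  # a write here commits the staged entry

    def __init__(self):
        self.buckets = [0] * self.NUM_BUCKETS
        self.tlb = {}  # entry index -> tuple of staged words

    def write(self, bucket: int, value: int) -> None:
        self.buckets[bucket] = value
        if bucket == self.TRIGGER:
            self._commit()

    def _commit(self) -> None:
        # Assumed layout: bucket 7 = entry index, buckets 0..3 = payload.
        index = self.buckets[self.TRIGGER]
        self.tlb[index] = tuple(self.buckets[0:4])
```

Because nothing reaches the TLB until the trigger write, software can fill the buckets in any order and the entry still changes in one step.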
|
Thu Mar 31, 2022 3:28 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Added a shorter branch and branch to subroutine instruction. The compiler was modified to use the shorter instruction format for functions declared as static. Static functions are local to the translation unit being processed and hence very likely can be accommodated with shorter subroutine calls. The short call version supports a 19-bit displacement value or +/- 256kB.
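The reach of a signed displacement field is quick to check. In this sketch the function name and the `scale` parameter (bytes per displacement unit) are assumptions; with byte granularity a 19-bit field gives the 256 kB figure above.

```python
def signed_range(bits: int, scale: int = 1):
    """Return (lowest, highest) byte offset of a signed bit field."""
    half = 1 << (bits - 1)
    return (-half * scale, (half - 1) * scale)


lo, hi = signed_range(19)
print(lo, hi)  # -262144 262143, i.e. about +/- 256 kB
```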
Added shorter forms for shift immediate instructions, but the shift amount can go only up to 63. To shift more bits the amount must be loaded into a register and the register form used.
|
Fri Apr 01, 2022 3:08 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
This is what I am looking at for opcode space. One of the groups of barred-off instruction opcodes will likely be used to implement 16-bit compressed instructions. Attachment: RootOpcodes.png
|
Fri Apr 01, 2022 8:41 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Added short 16-bit form register indirect load and store to stack instructions. These instructions are used at function entry and exit to save and restore registers. Just these two instructions saved about 7 to 8% of code space.
With the shorter branch to subroutine the link register was not being set. As a result the return went to a previous address, and an infinite loop formed.
|
Sat Apr 02, 2022 2:56 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Added the load hexi-byte quad instruction which loads a 512-bit value into a group of four registers. Among other things, this instruction allows a page table group, PTG, to be loaded using a single burst memory access, or if found in the cache one cache access time.
Added a hash table accelerator instruction. The PTENDX instruction locates a PTE in a quad of registers with a matching virtual address. It works like the other indexing instructions and returns the index of the PTE or -1 if not found in the registers.
Code:
# Incoming:
#   a0 = virtual address
#   a1 = hash table base address
#   a2 = 0
  PTGHASH t1,a0         # get hash of virtual address
  SLL t1,t1,6           # turn hash into group index, 8*8
  LDI t6,0              # t6 = miss count
.again:
  MULF t7,t6,t6,a1      # square miss count, fast multiply and add base table address
  LDHQ t2,[t1+t7]       # get the group into t2 to t5
  PTENDX t1,a0,t2       # search for PTE
  BGE t1,r0,.found      # exit loop if found
  ADD t6,t6,1           # increment miss count
  BBC t6,4,.again       # if fewer than 16 tries, repeat
  < page fault code >
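For readers who do not follow the assembly, the same search can be modelled at a higher level. This is a sketch under assumptions: `ptg_hash` is a stand-in since the actual PTGHASH function is not given, PTEs are modelled as `(vpn, ppn)` tuples, and the probe index is wrapped with a modulo, which the assembly does not show.

```python
GROUP_SIZE = 4   # a 512-bit group holds four 128-bit PTEs
MAX_TRIES = 16   # same limit as the BBC test on bit 4 of the miss count


def ptg_hash(vpn: int, num_groups: int) -> int:
    """Stand-in hash; the real PTGHASH function is not specified."""
    return (vpn ^ (vpn >> 10)) % num_groups


def ptendx(group, vpn: int) -> int:
    """Index of the PTE in 'group' matching vpn, or -1 if none match."""
    for i, pte in enumerate(group):
        if pte is not None and pte[0] == vpn:
            return i
    return -1


def find_pte(groups, vpn: int):
    """Quadratic probe over groups; returns (group, index) or None."""
    base = ptg_hash(vpn, len(groups))
    for miss in range(MAX_TRIES):
        g = (base + miss * miss) % len(groups)
        idx = ptendx(groups[g], vpn)
        if idx >= 0:
            return g, idx
    return None  # corresponds to falling into the page fault code
```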
|
Sun Apr 03, 2022 3:25 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Latest Fixes: the 2R form of the subtract instruction was not implemented, but was used by the assembler for the NEG instruction. This led to sprites not bouncing at the edge of the display screen as the dx, dy movement components were not negated.
Removed the PTENDX instruction. For some reason it causes the core to hang during startup. I suspect a bad route in the FPGA as I cannot see any reason why adding this relatively simple module would affect anything.
Spent some time playing around with the DSD design. Had a look at the micro-ops for the superscalar 6502, trying to see if something similar could be done for a 68000 processor. Also coded an XMODEM receiver. The hope is to use it to load software rather than rebuilding the system all the time.
|
Mon Apr 04, 2022 4:01 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Spent most of the day converting the in-order core to an out-of-order core. First renamed the core, suffixing it with an ‘oo’ to indicate the out-of-order version.
|
Tue Apr 05, 2022 4:19 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2308 Location: Canada
|
Finally got things running well enough in simulation to try synthesizing the design. I was sure it would blow the LUT budget given the amount of additional bypassing logic. But it turns out it cost only about 5,000 additional LUTs. Much better than I expected.
It is interesting because the reorder buffer is essentially randomly ordered. Ordering is determined by supplying a sequence number with each instruction. Instruction fetch places an instruction in the buffer wherever it can find an empty entry. Decode then searches the buffer for fetched instructions and decodes them. Decoding can occur in any order. Once instructions are decoded and their arguments are valid they may be executed. Instructions may execute in any order, except for memory instructions, to which strict ordering is applied. Once instructions are executed, the buffer is searched for executed instructions that are next in sequence after the previously retired instruction; their results are copied to the register file and the instructions are marked retired. This is how order is maintained.
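The scheme can be modelled in a few lines of software. This is a simplified sketch, not the core's logic: the state names, the `ReorderBuffer` class, and the `slot_hint` and `pick` parameters (used only to force a particular placement or execution order) are assumptions, and the result value is supplied at fetch time to keep the model short.

```python
from dataclasses import dataclass

EMPTY, FETCHED, DECODED, EXECUTED = range(4)


@dataclass
class Slot:
    state: int = EMPTY
    seq: int = 0
    dest: int = 0
    value: int = 0


class ReorderBuffer:
    def __init__(self, size: int = 8):
        self.slots = [Slot() for _ in range(size)]
        self.next_seq = 0    # sequence number given to the next fetch
        self.retire_seq = 0  # sequence number that must retire next
        self.regfile = {}

    def fetch(self, dest: int, value: int, slot_hint: int = 0) -> bool:
        """Place an instruction in any empty slot, tagged with a seq."""
        n = len(self.slots)
        for i in range(n):
            s = self.slots[(slot_hint + i) % n]
            if s.state == EMPTY:
                s.state, s.seq = FETCHED, self.next_seq
                s.dest, s.value = dest, value
                self.next_seq += 1
                return True
        return False  # buffer full

    def advance(self, from_state: int, to_state: int, pick: int = -1):
        """Move one entry found in from_state on to to_state."""
        ready = [s for s in self.slots if s.state == from_state]
        if not ready:
            return None
        s = ready[pick]
        s.state = to_state
        return s.seq

    def retire(self):
        """Retire the executed entry that is next in sequence, if any."""
        for s in self.slots:
            if s.state == EXECUTED and s.seq == self.retire_seq:
                self.regfile[s.dest] = s.value
                s.state = EMPTY
                self.retire_seq += 1
                return s.seq
        return None
```

Slot position carries no meaning here; only the sequence number decides which executed entry may update the register file next.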
Essentially each stage searches for entries in the buffer in the appropriate state.
One complication is instruction prefixes. Since instructions and prefixes may be placed into the buffer in any position, for a prefix to work the buffer must be searched for prefixes when the instruction is executed. Prefixes can be retired only when the corresponding instruction is retired.
One issue is that simulation becomes dog slow after about 300 µs. That is not far enough into the boot code to see the LEDs activated in sim.
|
Wed Apr 06, 2022 4:02 am |
|