74xx based CPU (yet another)
Author |
Message |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
Wow, that video and presentation is a gem! I must confess that, not being able to catch the subtleties of the (English) language, and not knowing about the work of this guy, I was a bit confused at the beginning, as this speech was apparently given only a few years ago, as you actually pointed out! I didn't figure it out until relatively late in the video. It got me thinking weird things for a while. Thanks for that! The Binary to Assembly Language to Fortran "enormous amount of resistance" tells a lot about human nature, very true. I also found this page by the author about the presentation, http://worrydream.com/dbx/, with several links to documents (including videos) from the original times.
|
Thu Sep 12, 2019 7:06 am |
|
 |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
|
So glad you liked it - and thanks for the link. I will add that to the post over on retrocomputing. It's one of my favourite talks, for the information, the mode of delivery, and the overall message.
Back to prefixes and so on: I think somewhere we've discussed whether or when the short constants in opcodes should be sign-extended. There are various pros and cons, and it might be that different tactics apply to different opcodes. (Another way to extend the usefulness of a short constant is to left-shift it, perhaps for example to point only to even addresses. This could get messy!)
|
Thu Sep 12, 2019 8:05 am |
|
 |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
Hi Ed, about sign or zero extension of short constants embedded in opcodes, I have given it some thought and also looked at what's most commonly used by the compiler. For some instructions, such as PC relative branches, it is obvious that they must always be sign extended. I decided the following for the instruction set:
- CALL (11 bits): Zero extends, as the constant represents an absolute unsigned address.
- JMP, BRCC (9 bits): Sign extend, as the constant represents a positive or negative PC relative offset.
- ADD, SUB (8 bits): Both zero extend, as they are complementary instructions that the compiler can choose between depending on whether a constant must be added or subtracted.
- AND (8 bits): Zero extends, as I think using sign extension for this one would look weird.
- MOV (8 bits): Sign extends, so both small positive and small negative values can be expressed.
- CMP (8 bits): Sign extends, as it is desirable that comparisons with small negative numbers can be carried out.
- "Indirect addressing modes" and "load effective address" instructions such as LD.W [SP, 8], R0, or LD.W [R0, 10], R1, or LEA SP, 44, R2: All of them zero extend the constant, so only positive offsets are available on them (8 bits for the SP, and 5 bits for general purpose registers). I decided against negative offsets because I found that they are virtually never used, especially since I incorporated the "Base Frame Register" feature in the compiler. In the case of the SP or the Frame Pointer/Base Pointer register, all offsets are positive because the SP is always at the top of the stack (the lowest frame address) and the FP is at the lowest address of the interesting access range.
- Access to global variables, for example with LD.W [myVarName], R0, or struct members with LD.W [R0, 10], R1, or even instances of this, LD.W [R0, myVar], R1: All require only positive offsets because the compiler always references base addresses of objects.
- All prefixed instructions, that is, all of the above instructions preceded by a prefix instruction, just combine the 11 bits of the prefix with the lower 5 bits of the embedded constant to get a 16 bit value. So whatever the constant value is, its sign is implicit in the 16 bit, 2's complement arithmetic that will be performed with it, which enables negative offsets for the indirect addressing modes when necessary.
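The extension rules in the list above can be sketched in C roughly as follows. This is only an illustration, not the simulator's code: the function names are made up here, and the assumption that the prefix supplies the upper 11 bits and the embedded constant the lower 5 bits is taken directly from the last bullet.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low `bits` bits of `field` to a 16 bit value. */
static uint16_t zext_field(uint16_t field, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint16_t mask = (bits == 16) ? 0xFFFFu : (uint16_t)((1u << bits) - 1u);
    return (uint16_t)(field & mask);
}

/* Sign-extend the low `bits` bits of `field` to a 16 bit value. */
static uint16_t sext_field(uint16_t field, unsigned bits)
{
    uint16_t value = zext_field(field, bits);
    uint16_t sign = (uint16_t)(1u << (bits - 1));
    /* (value ^ sign) - sign copies the field's top bit into the upper bits. */
    return (uint16_t)((value ^ sign) - sign);
}

/* Prefixed form: 11 prefix bits on top of the low 5 bits of the constant.
 * No extension is needed, the result is already a full 16 bit value. */
static uint16_t prefixed_immediate(uint16_t prefix11, uint16_t embedded)
{
    return (uint16_t)((zext_field(prefix11, 11) << 5) | zext_field(embedded, 5));
}
```

For instance, a 9 bit branch field of 0x1F8 sign extends to 0xFFF8 (that is, -8), and the 16 bit value 613 would be split into a prefix of 19 and an embedded low part of 5.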
About left shifting the constant, this is also something that was considered and discussed here. In my case I think the only uses I would have for it are PC relative/absolute addresses and SP arithmetic.
- Program memory addresses are already expressed in words. All instructions are 1 word long, so the PC and all program memory addresses are already implicitly shifted. The total program memory space is 128K bytes, although I have 64K addresses.
- Stack Pointer arithmetic could benefit from left-shifted constants because the SP is always word aligned. But given the constraints of the architecture and register set, this is not possible (or practical), because the exact same instruction codes are used to perform General Purpose Register arithmetic and SP arithmetic.
- The specific SP indirect addressing instructions can't benefit from that either, because although the SP always points to an even address, the instruction may need to load a single byte at an odd address. For example, the "load sign extended byte" instruction LD.SB [SP, 3], R0 may require an odd numbered offset. This is admittedly very rare, because it will only happen with structs passed by value that contain 'char' fields at odd positions. But it rules out left-shifted constants, unless the compiler were to generate explicit code to access such odd positioned bytes, which I ultimately decided to avoid.
|
Thu Sep 12, 2019 9:17 am |
|
 |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
|
Thanks for the comprehensive response! I confess I mentioned sign-extension without a firm mental picture of what had and hadn't been discussed or decided...
|
Fri Sep 13, 2019 6:01 am |
|
 |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
BigEd wrote: Thanks for the comprehensive response! I confess I mentioned sign-extension without a firm mental picture of what had and hadn't been discussed or decided...
I believe this was in the context of a comment by Rob (I think), who made an interesting suggestion about "displaced" immediate fields (not sure what he called them), meaning immediate constant fields with an asymmetrical range on the negative and positive sides. I had that implemented for a while, but I certainly did not post my later decision on the subject in any detail. The summary is that I finally decided to keep most of them in the positive range only.
|
Sat Sep 14, 2019 1:10 pm |
|
 |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
I did some more work and started testing, with the compiler + assembler + simulator, the operations that the processor does not support directly and that therefore require library calls, such as non-constant shifts, multiplication and division. So far, I have tested all 16 bit functions, and all 32 bit functions except multiplication and division, which I'm leaving for later. This is the C code under test, which in this case is configured to test multiplication:
Code:
int indx;

int add(int a, int b) {return a+b;}
int sub(int a, int b) {return a-b;}
int and(int a, int b) {return a&b;}
int or(int a, int b) {return a|b;}
int xor(int a, int b) {return a^b;}
int lsr(unsigned int a, int b) {return a>>b;}
int lsl(int a, int b) {return a<<b;}
int asr(int a, int b) {return a>>b;}
int mul(int a, int b) {return a*b;}
int div(int a, int b) {return a/b;}
int mod(int a, int b) {return a%b;}

int (*funcList[])() = {add, sub, and, or, xor, lsl, asr, lsr, mul, div, mod, };

__attribute__((noinline))
unsigned int funcListTest( unsigned int a, unsigned int b, int i)
{
    return funcList[i]( a, b );
}

int main()
{
    return funcListTest(10, 15, 8);
}
I will not perform exhaustive testing or implement unit tests, because I think this would be overkill for this project. I just test manually a number of cases that I think are representative, including the edge cases that could cause trouble. If these pass, I will just assume that the results will be fine for the untested range of values too. I may well find later that something I considered tested does not work in a particular scenario, and that I need to go back to the testing code, but I guess I will have to live with that.
The assembler generates machine code taking into account the Harvard architecture of the processor. This means that it's not possible or efficient to store data next to program code, as is usually the case for von Neumann processors. Therefore, the assembler must create specific initialisation code that will move all required initial values, including constant data and compiler initialised user variables, from program memory to data memory. The Log file output of the Assembler for the C code above looks like this:
Code:
Constant Data:
00000 : 0 bytes
Initialised Variables:
00000 : 0x0e,0x00    add    Program:14
00002 : 0x10,0x00    sub    Program:16
00004 : 0x12,0x00    and    Program:18
00006 : 0x14,0x00    or     Program:20
00008 : 0x16,0x00    xor    Program:22
00010 : 0x1a,0x00    lsl    Program:26
00012 : 0x1c,0x00    asr    Program:28
00014 : 0x18,0x00    lsr    Program:24
00016 : 0x1e,0x00    mul    Program:30
00018 : 0x20,0x00    div    Program:32
00020 : 0x22,0x00    mod    Program:34
Unitialised Variables:
00022 : 2 bytes
----- file:/Users/joan/Documents-Local/Relay/RelayNou/main.c74
Source: setup
00000 : 1111100000010011    _pfix :00613
00001 : 0101000000101001    mov setupAddr, r1 :00613
00002 : 0101000000000010    mov dataAddr, r2 :00000
00003 : 0101000001011000    mov wordLength, r0 :00011
00004 : 0101100000000000    cmp r0, 0
00005 : 0100000000000110    brcc 0, .LL1 Program:+6
00006 : 0000111000001011    ld.w {r1}, r3
00007 : 1110000000010011    st.w r3, [r2, 0]
00008 : 0110000000001001    add r1, 1, r1
00009 : 0110000000010010    add r2, 2, r2
00010 : 0110100000001000    sub r0, 1, r0
00011 : 0011111111111000    jmp .LL0 Program:-8
00012 : 1111000000101001    call main Program:00041
00013 : 0000100011000000    halt
Source: main.c
00014 : 0010000000001000    add r1, r0, r0
00015 : 0000000011000000    ret
00016 : 0010010001000000    sub r0, r1, r0
00017 : 0000000011000000    ret
00018 : 0010101000001000    and r1, r0, r0
00019 : 0000000011000000    ret
00020 : 0010100000001000    or r1, r0, r0
00021 : 0000000011000000    ret
00022 : 0010110000001000    xor r1, r0, r0
00023 : 0000000011000000    ret
00024 : 1111000001011101    call __lshrhi3 Program:00093
00025 : 0000000011000000    ret
00026 : 1111000001101001    call __ashlhi3 Program:00105
00027 : 0000000011000000    ret
00028 : 1111000001010001    call __ashrhi3 Program:00081
00029 : 0000000011000000    ret
00030 : 1111000011010011    call __mulhi3 Program:00211
00031 : 0000000011000000    ret
00032 : 1111000100010011    call __divhi3 Program:00275
00033 : 0000000011000000    ret
00034 : 1111000100100101    call __modhi3 Program:00293
00035 : 0000000011000000    ret
00036 : 1010000000010010    ld.w [SP, 2], r2
00037 : 0000001001010010    lsl r2, r2
00038 : 1100100000010010    ld.w [r2, funcList], r2 Data:00000
00039 : 0000001010000010    call r2
00040 : 0000000011000000    ret
00041 : 0110100000010111    sub SP, 2, SP
00042 : 0101000001000000    mov 8, r0
00043 : 1011000000000000    st.w r0, [SP, 0]
00044 : 0101000001010000    mov 10, r0
00045 : 0101000001111001    mov 15, r1
00046 : 1111000000100100    call funcListTest Program:00036
00047 : 0110000000010111    add SP, 2, SP
00048 : 0000000011000000    ret
Source: setupData
00613 : 0000000000001110    _imm 14
00614 : 0000000000010000    _imm 16
00615 : 0000000000010010    _imm 18
00616 : 0000000000010100    _imm 20
00617 : 0000000000010110    _imm 22
00618 : 0000000000011010    _imm 26
00619 : 0000000000011100    _imm 28
00620 : 0000000000011000    _imm 24
00621 : 0000000000011110    _imm 30
00622 : 0000000000100000    _imm 32
00623 : 0000000000100010    _imm 34
Assembly completed
For brevity, I removed from the output above all the system library functions, which in this example go from address 00049 to 00612.
The starting sections, "Constant Data", "Initialised Variables" and "Uninitialised Variables", represent addresses in Data memory space. They are output to the Log file just for information purposes, and the assembler does not generate anything in Data memory. All the remaining sections are in Program memory, and they are the actual binary output of the assembler. The very last section, named "Source: setupData", is copied verbatim to Data memory by the processor initialisation code. The initialisation code goes in the "Source: setup" section, as the processor starts execution at address 00000 (at least for now). Right at the end of the initialisation code, the "main" user function is called, and user program execution begins.
For the testing code above, I implemented a call table in C with the functions I want to test. From the point of view of the C program, it is just an array of function pointers, which the compiler initialises with the test function addresses, so this array goes in the "Initialised Variables" section as shown above. Besides testing the functions themselves, the code above allowed me to verify that at least simple cases of memory/stack access, subroutine calls, and other basic instructions are encoded correctly and executed correctly by the simulator.
So far, 16 bit non-constant shifts, multiplication, division and remainder are already working fine in the simulator. 32 bit shifts, additions, subtractions and logical ops have also been tested. (Edit: typo)
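What the "Source: setup" loop does can be modelled in a few lines of C. This is a sketch under assumptions, not the actual simulator code: the function name and parameters are hypothetical, program memory is taken to be word addressed, data memory byte addressed, and words are stored low byte first, as suggested by the "0x0e,0x00" entries in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `word_count` 16 bit words from program memory (word addressed)
 * into data memory (byte addressed, low byte first). */
static void copy_setup_data(const uint16_t *program, size_t setup_addr,
                            uint8_t *data, size_t data_addr,
                            size_t word_count)
{
    assert(program != NULL && data != NULL);
    while (word_count != 0) {                              /* cmp r0, 0 / brcc  */
        uint16_t word = program[setup_addr];               /* ld.w {r1}, r3     */
        data[data_addr]     = (uint8_t)(word & 0xFFu);     /* st.w r3, [r2, 0]  */
        data[data_addr + 1] = (uint8_t)(word >> 8);
        setup_addr += 1;                                   /* add r1, 1, r1     */
        data_addr  += 2;                                   /* add r2, 2, r2     */
        word_count -= 1;                                   /* sub r0, 1, r0     */
    }                                                      /* jmp back to test  */
}
```

With the values from the log, the call would be `copy_setup_data(program, 613, data, 0, 11)`, which places the 11 function addresses of `funcList` at data addresses 0 to 21.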
|
Sat Sep 14, 2019 10:15 pm |
|
 |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
While playing with the arithmetic routines in the simulator I figured out a couple of small changes to the instruction set that would be beneficial, and that could even simplify the actual processor ALU implementation.
For a long time, I have kept a double set of Status Registers. The first set was intended for boolean comparisons and logical operations, and the second set was meant for arithmetic ops. The following link to a previous post shows why this was beneficial: http://anycpu.org/forum/viewtopic.php?f=23&t=583&start=90#p4501
In particular, in the very last example, the compiler performs a signed addition of a 32 bit value with a 16 bit value. To add the upper word, the 16 bit value needs to be sign extended. Since there was (at the time) no specific instruction for that, a trick with compare (CMP) and set (SETLT) instructions was used. This got inserted between the ADD and the ADDC instructions, thus saving one register. This was correct and possible without any status flag interference because the arithmetic flags were not affected by the in-between comparison.
However, things evolved and at some point I incorporated a sign extend word instruction (SEXTW). The same C code now results in the following assembly code, which is 3 instructions shorter than previously:
CPU74 Code:
arith32:
    add r2, r0, r0
    sextw r2, r2
    addc r2, r1, r1
    ret
The SEXTW instruction already leaves the status flags unaffected. As a consequence, the 'dual' set of status flags has lost some utility. I have still seen the compiler taking advantage of the dual set, for example by inserting a logical operation between an ADD and an ADDC in order to save a register, but in reality such occurrences are very rare, and possibly not worth the complication of dual flag registers. Many instructions that are candidates to be inserted between add and addc are register moves and memory accesses, but these do not affect the status flags anyway, so there's no need for dual flags there either.
So, I decided to simplify things at the CPU level and completely removed the dual status register. There's now a single set of SR flags. So far: I, V, S, C, Z.
The next thing I did was to incorporate shift-through-carry instructions (some processors call them rotate through carry), as well as carry flag setting for the normal shifts. This doesn't actually imply any new physical instruction encoding because, once carry generation is incorporated into the shift instructions, the ones performing left shifts (and left rotation through carry) can be replaced by the existing ADD and ADDC. So now I only have explicit right-shift instructions, namely:
Code:
lsr Rs, Rd     1 bit logical shift right. Bit 0 is shifted to the C flag. Bit 15 is set to zero. Result is stored in Rd
lsrc Rs, Rd    1 bit shift right through carry. Bit 0 is shifted to the C flag. The old C flag is shifted to bit 15. Result is stored in Rd
asr Rs, Rd     1 bit arithmetic shift right. Bit 0 is shifted to the C flag. Bit 15 is preserved. Result is stored in Rd
which should also simplify the ALU design somewhat thanks to the lack of explicit left shifts. I chose the name "LSRC", instead of RCR or ROR or something to that effect, because it virtually always occurs after an ASR or LSR, which looks more elegant to me as it makes it more obvious that it is performing the second half of a shift right.
So, the code that is now generated for 1 bit shifts of 32 bit long values is this:
CPU74 Code:
# ---------------------------------------------
# asr32_1
# ---------------------------------------------
asr32_1:
    asr r1, r1
    lsrc r0, r0
    ret
# ---------------------------------------------
# lsr32_1
# ---------------------------------------------
lsr32_1:
    lsr r1, r1
    lsrc r0, r0
    ret
# ---------------------------------------------
# lsl32_1
# ---------------------------------------------
lsl32_1:
    add r0, r0, r0
    addc r1, r1, r1
    ret
Note the use of ADD and ADDC for the left shift. Non-constant shifts are still generated through library calls. For constant shifts above 1, the compiler still generates optimised left/right shifts combined with OR. For example, the following cases of 16, 8 and 4 bit shifts.
Code:
# ---------------------------------------------
# lsl32_16
# ---------------------------------------------
lsl32_16:            # 16 bit constant shift left of long word
    mov r0, r1
    mov 0, r0
    ret
# ---------------------------------------------
# lsl32_8
# ---------------------------------------------
lsl32_8:             # 8 bit constant shift left of long word
    bswap r0, r2
    zext r2, r2
    zext r1, r1
    bswap r1, r1
    or r1, r2, r1
    zext r0, r0
    bswap r0, r0
    ret
# ---------------------------------------------
# lsl32_4 (4 bit constant shift left of long word)
# ---------------------------------------------
lsl32_4:
    add r1, r1, r1
    add r1, r1, r1
    add r1, r1, r1
    add r1, r1, r1
    bswap r0, r2
    zext r2, r2
    lsr r2, r2
    lsr r2, r2
    lsr r2, r2
    lsr r2, r2
    or r1, r2, r1
    add r0, r0, r0
    add r0, r0, r0
    add r0, r0, r0
    add r0, r0, r0
    ret
It may seem a bit convoluted, especially the 4 bit shift case, but it's still better than the compiler output I looked at for other processors without multi-shift hardware support. This could be improved further in some cases by using the 1 bit long word shifts described above. For example, the 4 bit shift would then take a total of only 8 instructions. I'm still undecided about whether I should spend some more time on this. I guess I will leave it for now and go back to the simulator tests.
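The 1 bit long shifts in this post can be checked with a small C model. It is a sketch only: the type and function names are invented for illustration, the carry behaviour follows the instruction descriptions given in the post, and r0 is assumed to hold the low word and r1 the high word. A 32 bit left shift must double the low word first so that its carry can feed the high word.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t value;  /* result written to Rd */
    unsigned carry;  /* C flag after the instruction (0 or 1) */
} AluResult;

static AluResult lsr16(uint16_t rs)   /* lsr: bit 15 becomes 0 */
{
    AluResult r = { (uint16_t)(rs >> 1), rs & 1u };
    return r;
}

static AluResult asr16(uint16_t rs)   /* asr: bit 15 is preserved */
{
    AluResult r = { (uint16_t)((rs >> 1) | (rs & 0x8000u)), rs & 1u };
    return r;
}

static AluResult lsrc16(uint16_t rs, unsigned carry_in)  /* old C goes to bit 15 */
{
    assert(carry_in <= 1u);
    AluResult r = { (uint16_t)((rs >> 1) | (carry_in << 15)), rs & 1u };
    return r;
}

static AluResult addc16(uint16_t a, uint16_t b, unsigned carry_in)
{
    assert(carry_in <= 1u);
    uint32_t sum = (uint32_t)a + b + carry_in;
    AluResult r = { (uint16_t)sum, (unsigned)(sum >> 16) };
    return r;
}

static uint32_t join(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

static uint32_t asr32_1(uint32_t x)   /* asr high word, then lsrc low word */
{
    AluResult hi = asr16((uint16_t)(x >> 16));
    AluResult lo = lsrc16((uint16_t)x, hi.carry);
    return join(hi.value, lo.value);
}

static uint32_t lsr32_1(uint32_t x)   /* lsr high word, then lsrc low word */
{
    AluResult hi = lsr16((uint16_t)(x >> 16));
    AluResult lo = lsrc16((uint16_t)x, hi.carry);
    return join(hi.value, lo.value);
}

static uint32_t lsl32_1(uint32_t x)   /* add low word, then addc high word */
{
    AluResult lo = addc16((uint16_t)x, (uint16_t)x, 0);
    AluResult hi = addc16((uint16_t)(x >> 16), (uint16_t)(x >> 16), lo.carry);
    return join(hi.value, lo.value);
}

static uint32_t lsl32_4(uint32_t x)   /* four add/addc pairs: 8 instructions */
{
    for (int i = 0; i < 4; i++)
        x = lsl32_1(x);
    return x;
}
```

The last function corresponds to the 8 instruction variant of the 4 bit shift mentioned above, as opposed to the 15 instruction bswap/zext/lsr sequence.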
|
Tue Sep 17, 2019 8:13 am |
|
 |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
|
Another interesting evolution - thanks for the update!
|
Tue Sep 17, 2019 8:18 am |
|
 |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
As I already had add and subtract with carry instructions, and had recently added shifts with carry, I decided to complete the lot and added "cmpc" (compare with carry) as well. So I now have the whole set, which enables easy extension of 16 bit integer operations to 32 or even 64 bits with relatively little effort, following an identical approach to the AVR processors. There are 4 instructions that take the carry flag as an input (I call them the "carry-in instructions"):
Code:
addc Rs, Rn, Rd
subc Rs, Rn, Rd
cmpc Rs, Rn
lsrc Rs, Rd
I also decided to implement the In-Zero flag approach that was recently mentioned in the OPC thread, not only for the 'cmpc' instruction but for all of them. The idea is that the Status Register flags after carry-in instructions will correctly reflect the result of the combined (wide) operation by taking into account the Zero flag. This makes it possible to use conditional branches or set/select instructions after wide operations, as if they were normal 16 bit operations.
Other than that, I made the following changes to the ISA:
- Went back to 8 general purpose registers, R0 through R7, with a separate SP and PC, instead of 7 registers + SP. That is, the SP is no longer part of the general register set.
The SP requires a number of instructions for it alone, as it can no longer be used as a general purpose register in regular instructions. This has not been a major issue because, after the incorporation of prefixed immediates, I had some free slots left that I have used now. Having separate instructions for the SP not only adds 1 precious general purpose register, but also makes the ISA encodings denser in instructions that can actually be used. I mean that when the SP was a general purpose register it could theoretically be used as an operand or result in instructions that make little sense for an SP, such as logical operations, shifts, and others, thus wasting encoding combinations that have now been reclaimed for the new general purpose register. The SP has been left with only the instructions that are meaningful for it.
The set of SP instructions includes just basic stack adjustment, and stack loads/stores with only one addressing mode. Nothing fancier is required for the SP operations. They are the following:
Code:
add SP, K, SP        // Adjust SP up or down, K is a sign extended constant
lea SP, K, Rd        // Get the address pointed to by SP+K. This can also be used to move SP to Rd by just setting K to zero
mov Rs, SP           // Copy register Rs to SP. Along with the previous instruction, enables any kind of arithmetic on the SP by first transferring to a general purpose register
ld.w [SP, K], Rd     // Load word at address SP+K
ld.sb [SP, K], Rd    // Load sign extended byte at address SP+K
st.w Rd, [SP, K]     // Store word to address SP+K
st.b Rd, [SP, K]     // Store byte to address SP+K
push Rd              // Push Rd
pop Rd               // Pop Rd
[I have not listed subroutine calls and returns, which of course also involve the SP]
The first two instructions may appear identical, but they are not, because the SP is not encoded as such but is implicit in the instructions. So these instructions do not even belong to the same encoding pattern. Zero extended loads from the stack have been removed due to lack of encoding space, but this is not a major problem because explicit zero extends can be generated just after loads, and anyway the cases where this is required are relatively rare, as stack passed arguments always take a word even if they are byte sized.
I am now tempted to completely remove the push and pop instructions. They are really redundant, because pushes and pops are only generated to save/restore registers upon function entry and exit. Register save/restore can be performed with store/load instructions followed or preceded by a stack adjust. It's one more instruction in most cases at function entry and another one at function exit, but two fewer instructions to implement at the microcode level. So far, I am not totally decided on that one though.
I updated the compiler files and instruction set docs in the GitHub repo: https://github.com/John-Lluch/CPU74
Joan
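As an illustration of the In-Zero idea, here is a minimal C sketch of a 32 bit compare built from a 16 bit "cmp" followed by "cmpc". It rests on assumptions that the post does not spell out: C is modelled as a borrow flag, and Z after a carry-in instruction is the incoming Z ANDed with "this result is zero", which is how the AVR CPC and SBC instructions behave. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned c;  /* carry, modelled here as borrow (assumption) */
    unsigned z;  /* zero flag */
} Flags;

/* cmp Rs, Rn: plain 16 bit compare, flags from a - b. */
static Flags cmp16(uint16_t a, uint16_t b)
{
    Flags f;
    f.c = (a < b);
    f.z = ((uint16_t)(a - b) == 0);
    return f;
}

/* cmpc Rs, Rn: compare with carry-in. Z can only stay set if it was
 * already set by the previous (lower word) operation. */
static Flags cmpc16(uint16_t a, uint16_t b, Flags in)
{
    assert(in.c <= 1u && in.z <= 1u);
    uint32_t rhs = (uint32_t)b + in.c;
    uint16_t result = (uint16_t)(a - rhs);
    Flags f;
    f.c = (a < rhs);
    f.z = (in.z && result == 0);
    return f;
}

/* 32 bit compare: low words with cmp, high words with cmpc. */
static Flags cmp32(uint32_t a, uint32_t b)
{
    Flags low = cmp16((uint16_t)a, (uint16_t)b);
    return cmpc16((uint16_t)(a >> 16), (uint16_t)(b >> 16), low);
}
```

Comparing 0x00020000 with 0x0001FFFF shows why the incoming Z matters: the high word subtraction with borrow yields zero, so a Z flag computed from the high word alone would wrongly report equality, whereas the combined flag stays clear.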
|
Wed Oct 02, 2019 9:28 pm |
|
 |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
I removed the "push" and "pop" instructions from the set. Function prologue and epilogue code now looks like this:
Code:
    .globl arith
arith:
    add SP, -4, SP      # Allocate stack space
    st.w r4, [SP, 2]    # Save register R4
    st.w r5, [SP, 0]    # Save register R5
    ...
    ...                 # Function body
    ...
    ld.w [SP, 0], r5    # Restore register R5
    ld.w [SP, 2], r4    # Restore register R4
    add SP, 4, SP       # Deallocate stack space
    ret
Compared with push/pop code, this requires one more instruction at function entry and exit in simple cases, but the cost is identical in the more general case, because stack allocation code after the push sequence is required anyway for local variables or stack arguments to called functions. With the new approach, all the stack allocation required by the function is folded into a single instruction at the beginning of the function, which also accounts for the callee saved registers. The new approach also normalises Frame Pointer usage when required, because the designated register is just saved normally like any other, and it is set to the initial SP value just before the function body begins.
From the point of view of hardware, this means two fewer instructions (push/pop) and no need to implement predecrement/postincrement stores/loads just for these two. There are still the 'call' and 'return' instructions requiring SP pre-decrements and post-increments, but these instructions only push/pop the PC, so I suppose this should make things easier.
This is something that I had wanted to do for some time, because the LLVM backend appeared to have the required hooks to implement it and it looked like a sensible thing to do, but I was refraining from doing it because I thought it was tricky. However, recently seeing RISC-V do exactly the same has motivated me to go ahead with this change.
|
Sat Oct 05, 2019 5:40 pm |
|
 |
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 768
|
Do you have real hardware?
|
Mon Oct 07, 2019 6:39 am |
|
 |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
oldben wrote: Do you have real hardware?
No, I don't. I currently only have a hand-made diagram of the overall CPU architecture, for the purpose of understanding which buses and control lines I might need to execute my whole instruction set. I also have a software simulator that is coded in a way that exposes the decoding logic required to convert the instruction encodings and their embedded fields into executable microcodes and their operands, as well as cycle accurate components that execute such microcodes according to the CPU constraints.
So far, my work has consisted of implementing software tools (most of the time has gone into the compiler) as well as tweaking both the instruction set and the CPU diagram to get the best balance (at least to my understanding) among instruction set completeness, raw performance, code density, and easy instruction decoding. One thing that I learned is that achieving such a balance is much harder for constant-width 16 bit instruction sets than for 32 bit or 8 bit sets.
That said, I will need a lot of help with the actual hardware.
|
Mon Oct 07, 2019 7:54 am |
|
 |
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 768
|
It might be wise to keep a lookout for the kind of connectors you wish to use, for the best prices, as well as front panel switches (if used) and other harder to find components. The AM29705 looks like just the RAM you need. The block diagram looks good, other than needing a write path to your program memory for program loading. Am29705s are $7.99 each here: https://unicornelectronics.com/IC/MISCELLANEOUS.html
|
Wed Oct 09, 2019 5:20 am |
|
 |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
Hi Ben, thank you for having looked at my project.
About components, I will purchase switches, connectors, cabling and so on from local stores, and will possibly use Mouser for ICs. I'm located in Europe, so purchasing relatively inexpensive items from the US is not generally viable due to delays and added costs. For PCBs I will possibly use JLCPCB. I have had a great experience with them on a few small PCBs I made in the past, and they offer great quality as far as I can tell.
I'm aware that I need a write path to program memory, but to be honest, I don't currently know what's the best way to do it, or what should be implemented. I think some hobbyists use an Arduino to bridge their CPU to a standard computer through a serial port. In a perfect world, I would want to have a boot loader implemented in my processor, with an Ethernet or serial interface to connect to a computer. But so far this is way over my head. Any suggestions would be greatly appreciated.
Joan
Edit: Forgot to add that I intend to use SOIC surface mount components, particularly for ICs, rather than through-hole DIL, for compactness and short PCB traces, as my processor will have a relatively large number of them, and I aim for speeds of about 16 MHz or so.
|
Wed Oct 09, 2019 7:29 am |
|
 |
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
I started figuring out an actual hardware schematic for decoding the instruction set. I found that some 'exceptions' to the encoding orthogonality required more gating than I felt comfortable with... The decoder in the software simulator is already implemented in terms that can be directly translated into hardware decoders, multiplexers and logic gates, but you know, just one line of seemingly inoffensive code, or any tiny logical operator in the middle of an expression, adds extra complexity when it must be translated into actual hardware. So I thought that I needed, once again, to somehow free encoding slots that would allow me to place instruction bits where they are easier to decode.
I decided to completely remove the condition code fields from the conditional instructions ("select", "set" and "conditional branch"). So I did this:
- I placed the actual condition code right in the compare instruction, instead of in the conditional instructions. The compare instruction no longer sets the classic S, V, C, Z flags, but just one single flag (I named it T) indicating whether the comparison was true or false.
- For flag-setting arithmetic or logical operations, the convention is to set the "T" flag when the result is zero. So after an ALU op (except a compare), the T flag is synonymous with Z. The compiler already takes advantage of this to optimise things, as shown in one of the functions below.
Code:
int compareselect2(int a)
{
    return a > 3 ? a : 0;
}

int sftest_test( int x, int y, int a, int b )
{
    if ( a < 8 )
    {
        x = x + a;
        y = - b;
    }
    return x+y;
}

int loopTest0( int a )
{
    for ( int i = 0 ; i<10 ; i++ )
        a = a<<1;

    return a;
}
CPU74 Code:
# ---------------------------------------------
# compareselect2
# ---------------------------------------------
    .globl compareselect2
compareselect2:
    cmp.gt r0, 3         # compare r0 with 3 for greater than, and update T flag
    mov 0, r1            # move 0 to r1
    selcc r0, r1, r0     # select r0 if T flag is true, or r1 otherwise, put the result in r0
    ret                  # return
# ---------------------------------------------
# sftest_test
# ---------------------------------------------
    .globl sftest_test
sftest_test:
    cmp.ge r2, 8         # compare r2 with 8 for greater than or equal, and update T flag
    brcc .LBB1_2         # conditional branch if T is true
    add r2, r0, r0
    neg r3, r1
.LBB1_2:
    add r0, r1, r0
    ret
# ---------------------------------------------
# loopTest0
# ---------------------------------------------
    .globl loopTest0
loopTest0:
    mov 10, r1           # initial induction variable value, set to 10
.LBB0_1:
    add r0, r0, r0       # this is the actual shift left
    sub r1, 1, r1        # decrement the iv
    brncc .LBB0_1        # loop back if not zero
    ret
So now it's not the conditional instruction but the compare instruction that carries the condition code information. Compiled code quality is not degraded, because checking for multiple conditions after a single compare is extremely rare in compiler generated code. The advantage is that this frees 3 bits from all the previous conditional instruction encodings, and it doesn't add anything to the compare instruction, because I am able to use the space of the (previously unused) destination register to encode the condition code (the compare does not need a destination register).
The new instruction encodings are pushed here as usual: https://github.com/John-Lluch/CPU74/tree/master/Docs
The relevant document is "CPU74InstrSetV9.pdf" (labelled as version 9). The same document also explains how the status flags work, and the instruction decoding patterns, which should be pretty straightforward now. The idea is to convert the 9 bit wide instruction field into a 7 bit microcode that can be input to one or more ATF16V8 to produce the control signals.
[The assembler and simulator are not yet updated]
Joan
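The T flag convention can be modelled in C as below. This is a sketch under assumptions: the set of condition codes is a hypothetical subset, comparisons are assumed to be signed as in the C source, and "brncc" is taken to mean "branch if T is clear", as the comment in the listing suggests. All names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the condition codes carried by the compare. */
typedef enum { CC_EQ, CC_NE, CC_LT, CC_GE, CC_GT, CC_LE } Cond;

/* Interpret a 16 bit register value as a signed number. */
static int as_signed(uint16_t v)
{
    return v >= 0x8000u ? (int)v - 0x10000 : (int)v;
}

/* cmp.cc Rs, K: evaluate a single condition and return the T flag. */
static int cmp_cc(Cond cc, uint16_t rs, int k)
{
    int a = as_signed(rs);
    switch (cc) {
    case CC_EQ: return a == k;
    case CC_NE: return a != k;
    case CC_LT: return a <  k;
    case CC_GE: return a >= k;
    case CC_GT: return a >  k;
    case CC_LE: return a <= k;
    }
    assert(0 && "unknown condition code");
    return 0;
}

/* selcc Ra, Rb, Rd: pick Ra when T is set, Rb otherwise. */
static uint16_t selcc(int t, uint16_t ra, uint16_t rb)
{
    return t ? ra : rb;
}

/* Model of the compareselect2 listing. */
static uint16_t compareselect2(uint16_t r0)
{
    int t = cmp_cc(CC_GT, r0, 3);    /* cmp.gt r0, 3      */
    uint16_t r1 = 0;                 /* mov 0, r1         */
    return selcc(t, r0, r1);         /* selcc r0, r1, r0  */
}

/* Model of the loopTest0 listing: after an ALU op, T mirrors Z. */
static uint16_t loop_test0(uint16_t r0)
{
    uint16_t r1 = 10;                /* mov 10, r1        */
    int t;
    do {
        r0 = (uint16_t)(r0 + r0);    /* add r0, r0, r0    */
        r1 = (uint16_t)(r1 - 1);     /* sub r1, 1, r1     */
        t = (r1 == 0);               /* T set on zero     */
    } while (!t);                    /* brncc .LBB0_1     */
    return r0;
}
```

Note that the conditional instructions only ever look at the single T bit, which is what removes the condition code field from their encodings.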
|
Sat Oct 12, 2019 5:57 pm |
|