To all our customers

Regarding the change of names mentioned in the document, such as Hitachi Electric and Hitachi XX, to Renesas Technology Corp.

The semiconductor operations of Mitsubishi Electric and Hitachi were transferred to Renesas Technology Corporation on April 1st 2003. These operations include microcomputer, logic, analog and discrete devices, and memory chips other than DRAMs (flash memory, SRAMs etc.). Accordingly, although Hitachi, Hitachi, Ltd., Hitachi Semiconductors, and other Hitachi brand names are mentioned in the document, these names have in fact all been changed to Renesas Technology Corp. Thank you for your understanding. Except for our corporate trademark, logo and corporate statement, no changes whatsoever have been made to the contents of the document, and these changes do not constitute any alteration to the contents of the document itself.

Renesas Technology Home Page: http://www.renesas.com

Renesas Technology Corp.
Customer Support Dept.
April 1, 2003
Cautions

Keep safety first in your circuit designs!

1. Renesas Technology Corporation puts the maximum effort into making semiconductor products better and more reliable, but there is always the possibility that trouble may occur with them. Trouble with semiconductors may lead to personal injury, fire or property damage. Remember to give due consideration to safety when making your circuit designs, with appropriate measures such as (i) placement of substitutive, auxiliary circuits, (ii) use of nonflammable material or (iii) prevention against any malfunction or mishap.

Notes regarding these materials

1. These materials are intended as a reference to assist our customers in the selection of the Renesas Technology Corporation product best suited to the customer's application; they do not convey any license under any intellectual property rights, or any other rights, belonging to Renesas Technology Corporation or a third party.
2. Renesas Technology Corporation assumes no responsibility for any damage, or infringement of any third party's rights, originating in the use of any product data, diagrams, charts, programs, algorithms, or circuit application examples contained in these materials.
3. All information contained in these materials, including product data, diagrams, charts, programs and algorithms, represents information on products at the time of publication of these materials, and is subject to change by Renesas Technology Corporation without notice due to product improvements or other reasons. It is therefore recommended that customers contact Renesas Technology Corporation or an authorized Renesas Technology Corporation product distributor for the latest product information before purchasing a product listed herein. The information described here may contain technical inaccuracies or typographical errors. Renesas Technology Corporation assumes no responsibility for any damage, liability, or other loss arising from these inaccuracies or errors.
Please also pay attention to information published by Renesas Technology Corporation by various means, including the Renesas Technology Corporation Semiconductor home page (http://www.renesas.com).
4. When using any or all of the information contained in these materials, including product data, diagrams, charts, programs, and algorithms, please be sure to evaluate all information as a total system before making a final decision on the applicability of the information and products. Renesas Technology Corporation assumes no responsibility for any damage, liability or other loss resulting from the information contained herein.
5. Renesas Technology Corporation semiconductors are not designed or manufactured for use in a device or system that is used under circumstances in which human life is potentially at stake. Please contact Renesas Technology Corporation or an authorized Renesas Technology Corporation product distributor when considering the use of a product contained herein for any specific purposes, such as apparatus or systems for transportation, vehicular, medical, aerospace, nuclear, or undersea repeater use.
6. The prior written approval of Renesas Technology Corporation is necessary to reprint or reproduce in whole or in part these materials.
7. If these products or technologies are subject to the Japanese export control restrictions, they must be exported under a license from the Japanese government and cannot be imported into a country other than the approved destination. Any diversion or reexport contrary to the export control laws and regulations of Japan and/or the country of destination is prohibited.
8. Please contact Renesas Technology Corporation for further details on these materials or the products contained therein.

Hitachi SuperH™ RISC engine
SH-3/SH-3E/SH3-DSP
Programming Manual

ADE-602-096B
Rev. 3.0
3/6/03
Hitachi, Ltd.

Cautions

1. Hitachi neither warrants nor grants licenses of any rights of Hitachi's or any third party's patent, copyright, trademark, or other intellectual property rights for information contained in this document. Hitachi bears no responsibility for problems that may arise with third party's rights, including intellectual property rights, in connection with use of the information contained in this document.
2. Products and product specifications may be subject to change without notice. Confirm that you have received the latest product standards or specifications before final design, purchase or use.
3. Hitachi makes every attempt to ensure that its products are of high quality and reliability. However, contact Hitachi's sales office before using the product in an application that demands especially high quality and reliability or where its failure or malfunction may directly threaten human life or cause risk of bodily injury, such as aerospace, aeronautics, nuclear power, combustion control, transportation, traffic, safety equipment or medical equipment for life support.
4. Design your application so that the product is used within the ranges guaranteed by Hitachi, particularly for maximum rating, operating supply voltage range, heat radiation characteristics, installation conditions and other characteristics. Hitachi bears no responsibility for failure or damage when used beyond the guaranteed ranges. Even within the guaranteed ranges, consider normally foreseeable failure rates or failure modes in semiconductor devices and employ systemic measures such as fail-safes, so that the equipment incorporating the Hitachi product does not cause bodily injury, fire or other consequential damage due to operation of the Hitachi product.
5. This product is not designed to be radiation resistant.
6. No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without written approval from Hitachi.
7.
Contact Hitachi's sales office for any questions regarding this document or Hitachi semiconductor products.

Introduction

The SH-3/SH-3E/SH3-DSP is a new generation of RISC microcomputers that integrate a RISC-type CPU and the peripheral functions required for system configuration onto a single chip to achieve high-performance operation. It can operate in a power-down state, which is an essential feature for portable equipment.

These CPUs have a RISC-type instruction set. Basic instructions can be executed in one clock cycle, improving instruction execution speed. The CPU also has a 32-bit internal architecture for enhanced data-processing ability.

In addition, the SH-3E supports single-precision floating point calculations as well as entirely PCAPI compatible emulation of double-precision floating point calculations. The SH-3E instructions are a subset of the floating point calculations conforming to the IEEE754 standard.

This programming manual describes in detail the instructions for the SH-3/SH-3E/SH3-DSP and is intended as a reference on instruction operation and architecture. It also covers the pipeline operation, which is a feature of the SH-3/SH-3E/SH3-DSP. For information on the hardware, please refer to the hardware manual for the product in question.

Please contact a Hitachi sales office for information on development environment systems.

Organization of This Manual

Table 1 describes how this manual is organized. Table 2 shows the relationships between the items listed and lists the sections within this manual that cover those items.

Table 1  Manual Organization

Category                              Section Title                         Contents
Introduction                          1. Features                           CPU features
Architecture (1)                      2. Programming Model                  Types and structure of general registers, control registers
                                                                            and system registers
                                      3. Data Formats                       Data formats for registers and memory
                                      4. Floating Point Processor Unit      FPU register configuration, FPU exceptions
                                      5. DSP Operations and Data Transfer   Fixed-point operations, integer operations, logic operations,
                                                                            multiplication, shift operations, overview of DSP operations
                                                                            such as saturation operations, repeat control
Introduction to instructions          6. Instruction Features               Instruction features, addressing modes, and instruction formats
                                      7. Instruction Sets                   Summary of instructions by category and list in alphabetic order
Detailed information on instructions  8. Description of Each Instruction    Operation of each instruction in alphabetical order
Architecture (2)                      9. Processing States                  Power-down and other processing states
                                      10. Pipeline Operation                Pipeline operation

Table 2  Subjects and Corresponding Sections

Category                              Topic                                           Section Title
Introduction and features             CPU features                                    1. Features
                                      Instruction features                            6.1 RISC-Type Instruction Set
                                      Pipelines                                       10.1 Basic Configuration of Pipelines
                                                                                      10.2 Slot and Pipeline Flow
Architecture                          Organization of registers                       2. Programming Model
                                      Data formats                                    3. Data Formats
                                      Floating point processor unit                   4. Floating Point Processor Unit
                                      DSP                                             5. DSP Operations and Data Transfer
                                      Processing states, reset state, exception       9. Processing States
                                      processing state, bus release state, program
                                      execution state, power-down state, sleep mode
                                      and standby mode
                                      Pipeline operation                              10. Pipeline Operation
Introduction to instructions          Instruction features                            6. Instruction Features
                                      Addressing modes                                6.2 Addressing Modes
                                      Instruction formats                             6.3 Instruction Formats
List of instructions                  Instruction sets                                7.1 Instruction Set by Classification
                                                                                      7.2 Instruction Set in Alphabetical Order
Detailed information on instructions  Detailed information of instruction operation   8. Instruction Description
                                                                                      10.7 Instruction Pipelines
                                      Number of instruction execution states          10.3 Number of Instruction Execution Cycles

Contents

Section 1 Features
1.1 SH-3 CPU Features
1.2 SH3-DSP Features

Section 2 Programming Model
2.1 Organization of Registers
  2.1.1 Privileged Mode and Banks
2.2 General-Purpose Registers
2.3 Control Registers
2.4 System Registers
2.5 Initial Register Value

Section 3 Data Formats
3.1 Data Format in Registers
3.2 Data Format in Memory
3.3 Data Format for Immediate Data
3.4 DSP Type Data Formats (SH3-DSP Only)

Section 4 Floating Point Unit (SH-3E Only)
4.1 Introduction
4.2 Floating Point Registers and System Registers for FPU
  4.2.1 Floating Point Register File
  4.2.2 Floating Point Communication Register (FPUL)
  4.2.3 Floating Point Status/Control Register (FPSCR)
4.3 Floating Point Format
  4.3.1 Floating Point Format
  4.3.2 Not a Number (NaN)
  4.3.3 Denormalized Values
  4.3.4 Other Special Values
4.4 Floating Point Exception Model
  4.4.1 Enabled Exception
  4.4.2 Disabled Exception
  4.4.3 Exception Event and Code for FPU
  4.4.4 Alignment of Floating Point Data in Memory
  4.4.5 Arithmetic with Special Operands
4.5 Synchronization Issues

Section 5 DSP Operation Functions and Data Transfers (SH3-DSP Only)
5.1 ALU Fixed Decimal Point Operations
  5.1.1 Function
  5.1.2 Instructions and Operands
  5.1.3 DC Bit
  5.1.4 Condition Bits
  5.1.5 Overflow Prevention Function (Saturation Operation)
5.2 ALU Integer Operations
5.3 ALU Logical Operations
  5.3.1 Function
  5.3.2 Instructions and Operands
  5.3.3 DC Bit
  5.3.4 Condition Bits
5.4 Fixed Decimal Point Multiplication
5.5 Shift Operations
  5.5.1 Arithmetic Shift Operations
  5.5.2 Logical Shift Operations
5.6 The MSB Detection Instruction
  5.6.1 Function
  5.6.2 Instructions and Operands
  5.6.3 DC Bit
  5.6.4 Condition Bits
5.7 Rounding
  5.7.1 Operation Function
  5.7.2 Instructions and Operands
  5.7.3 DC Bit
  5.7.4 Condition Bits
  5.7.5 Overflow Prevention Function (Saturation Operation)
5.8 Condition Select Bits (CS) and the DSP Condition Bit (DC)
5.9 Overflow Prevention Function (Saturation Operation)
5.10 Data Transfers
  5.10.1 X and Y Memory Data Transfer
  5.10.2 Single Data Transfers
5.11 Operand Contention
5.12 DSP Repeat (Loop) Control
  5.12.1 Usage Notes
5.13 Conditional Instructions and Data Transfers

Section 6 Instruction Features
6.1 RISC-Type Instruction Set
  6.1.1 16-Bit Fixed Length
  6.1.2 One Instruction/Cycle
  6.1.3 Data Length
  6.1.4 Load-Store Architecture
  6.1.5 Delayed Branch Instructions
  6.1.6 Multiplication/Accumulation Operation
  6.1.7 T Bit
  6.1.8 Immediate Data
  6.1.9 Absolute Address
  6.1.10 16-Bit/32-Bit Displacement
  6.1.11 Privileged Instructions
6.2 CPU Instruction Addressing Modes
6.3 DSP Data Addressing (SH3-DSP Only)
  6.3.1 X and Y Data Addressing
  6.3.2 Single Data Addressing
  6.3.3 Modulo Addressing
  6.3.4 DSP Addressing Operation
6.4 Instruction Format of CPU Instructions
6.5 Instruction Formats for DSP Instructions (SH3-DSP Only)
  6.5.1 Double and Single Data Transfer Instructions
  6.5.2 Parallel Processing Instructions

Section 7 Instruction Set
7.1 Instruction Set by Classification
  7.1.1 Data Transfer Instructions
  7.1.2 Arithmetic Instructions
  7.1.3 Logic Operation Instructions
  7.1.4 Shift Instructions
  7.1.5 Branch Instruction
  7.1.6 System Control Instructions
  7.1.7 Floating Point Instructions (SH-3E Only)
  7.1.8 FPU System Register Related CPU Instructions (SH-3E Only)
  7.1.9 CPU Instructions That Support DSP Functions (SH3-DSP Only)
7.2 Instruction Set in Alphabetical Order
7.3 DSP Data Transfer Instruction Set (SH3-DSP Only)
  7.3.1 Double Data Transfer Instructions (X Memory Data)
  7.3.2 Double Data Transfer Instructions (Y Memory Data)
  7.3.3 Single Data Transfer Instructions
7.4 DSP Operation Instruction Set (SH3-DSP Only)
  7.4.1 ALU Arithmetic Operation Instructions
  7.4.2 ALU Logical Operation Instructions
  7.4.3 Fixed Decimal Point Multiplication Instructions
  7.4.4 Shift Operation Instructions
  7.4.5 System Control Instructions
  7.4.6 NOPX and NOPY Instruction Code

Section 8 Instruction Descriptions
8.1 Sample Description (Name): Classification
8.2 Instruction Description (Listing and Description of Instructions Common to the SH-3, SH-3E and SH3-DSP)
  8.2.1 ADD (Add Binary): Arithmetic Instruction
  8.2.2 ADDC (Add with Carry): Arithmetic Instruction
  8.2.3 ADDV (Add with V Flag Overflow Check): Arithmetic Instruction
  8.2.4 AND (AND Logical): Logic Operation Instruction
  8.2.5 BF (Branch if False): Branch Instruction
  8.2.6 BF/S (Branch if False with Delay Slot): Branch Instruction
  8.2.7 BRA (Branch): Branch Instruction
  8.2.8 BRAF (Branch Far): Branch Instruction
  8.2.9 BSR (Branch to Subroutine): Branch Instruction
  8.2.10 BSRF (Branch to Subroutine Far): Branch Instruction
  8.2.11 BT (Branch if True): Branch Instruction
  8.2.12 BT/S (Branch if True with Delay Slot): Branch Instruction
  8.2.13 CLRMAC (Clear MAC Register): System Control Instruction
  8.2.14 CLRS (Clear S Bit): System Control Instruction
  8.2.15 CLRT (Clear T Bit): System Control Instruction
  8.2.16 CMP/cond (Compare Conditionally): Arithmetic Instruction
  8.2.17 DIV0S (Divide Step 0 as Signed): Arithmetic Instruction
  8.2.18 DIV0U (Divide Step 0 as Unsigned): Arithmetic Instruction
  8.2.19 DIV1 (Divide Step 1): Arithmetic Instruction
  8.2.20 DMULS.L (Double-Length Multiply as Signed): Arithmetic Instruction
  8.2.21 DMULU.L (Double-Length Multiply as Unsigned): Arithmetic Instruction
  8.2.22 DT (Decrement and Test): Arithmetic Instruction
  8.2.23 EXTS (Extend as Signed): Arithmetic Instruction
  8.2.24 EXTU (Extend as Unsigned): Arithmetic Instruction
  8.2.25 JMP (Jump): Branch Instruction
  8.2.26 JSR (Jump to Subroutine): Branch Instruction
  8.2.27 LDC (Load to Control Register): System Control Instruction (Privileged Only)
  8.2.28 LDRE (Load Effective Address to RE Register): System Control Instruction (SH3-DSP Only)
  8.2.29 LDRS (Load Effective Address to RS Register): System Control Instruction (SH3-DSP Only)
  8.2.30 LDS (Load to System Register): System Control Instruction
  8.2.31 LDTLB (Load PTEH/PTEL to TLB): System Control Instruction (Privileged Only)
  8.2.32 MAC.L (Multiply and Accumulate Long): Arithmetic Instruction
  8.2.33 MAC (Multiply and Accumulate): Arithmetic Instruction
  8.2.34 MOV (Move Data): Data Transfer Instruction
  8.2.35 MOV (Move Immediate Data): Data Transfer Instruction
  8.2.36 MOV (Move Peripheral Data): Data Transfer Instruction
  8.2.37 MOV (Move Structure Data): Data Transfer Instruction
  8.2.38 MOVA (Move Effective Address): Data Transfer Instruction
  8.2.39 MOVT (Move T Bit): Data Transfer Instruction
  8.2.40 MUL.L (Multiply Long): Arithmetic Instruction
  8.2.41 MULS.W (Multiply as Signed Word): Arithmetic Instruction
  8.2.42 MULU.W (Multiply as Unsigned Word): Arithmetic Instruction
  8.2.43 NEG (Negate): Arithmetic Instruction
  8.2.44 NEGC (Negate with Carry): Arithmetic Instruction
  8.2.45 NOP (No Operation): System Control Instruction
  8.2.46 NOT (NOT-Logical Complement): Logic Operation Instruction
  8.2.47 OR (OR Logical): Logic Operation Instruction
  8.2.48 PREF (Prefetch Data to the Cache)
  8.2.49 ROTCL (Rotate with Carry Left): Shift Instruction
  8.2.50 ROTCR (Rotate with Carry Right): Shift Instruction
  8.2.51 ROTL (Rotate Left): Shift Instruction
  8.2.52 ROTR (Rotate Right): Shift Instruction
  8.2.53 RTE (Return from Exception): System Control Instruction (Privileged Only)
  8.2.54 RTS (Return from Subroutine): Branch Instruction
  8.2.55 SETRC (Set Repeat Count to RC): System Control Instruction (SH3-DSP Only)
  8.2.56 SETS (Set S Bit): System Control Instruction
  8.2.57 SETT (Set T Bit): System Control Instruction
  8.2.58 SHAD (Shift Arithmetic Dynamically): Shift Instruction
  8.2.59 SHAL (Shift Arithmetic Left): Shift Instruction
  8.2.60 SHAR (Shift Arithmetic Right): Shift Instruction
  8.2.61 SHLD (Shift Logical Dynamically): Shift Instruction
  8.2.62 SHLL (Shift Logical Left): Shift Instruction
  8.2.63 SHLLn (Shift Logical Left n Bits): Shift Instruction
8.2.64 SHLR (Shift Logical Right): Shift Instruction
8.2.65 SHLRn (Shift Logical Right n Bits): Shift Instruction
8.2.66 SLEEP (Sleep): System Control Instruction (Privileged Only)
8.2.67 STC (Store Control Register): System Control Instruction (Privileged Only)
8.2.68 STS (Store System Register): System Control Instruction
8.2.69 SUB (Subtract Binary): Arithmetic Instruction
8.2.70 SUBC (Subtract with Carry): Arithmetic Instruction
8.2.71 SUBV (Subtract with V Flag Underflow Check): Arithmetic Instruction
8.2.72 SWAP (Swap Register Halves): Data Transfer Instruction
8.2.73 TAS (Test and Set): Logic Operation Instruction
8.2.74 TRAPA (Trap Always): System Control Instruction
8.2.75 TST (Test Logical): Logic Operation Instruction
8.2.76 XOR (Exclusive OR Logical): Logic Operation Instruction
8.2.77 XTRCT (Extract): Data Transfer Instruction
8.3 Floating Point Instructions and FPU Related CPU Instructions (SH-3E Only)
8.3.1 FABS (Floating Point Absolute Value): Floating Point Instruction
8.3.2 FADD (Floating Point Add): Floating Point Instruction
8.3.3 FCMP (Floating Point Compare): Floating Point Instruction
8.3.4 FDIV (Floating Point Divide): Floating Point Instruction
8.3.5 FLDI0 (Floating Point Load Immediate 0): Floating Point Instruction
8.3.6 FLDI1 (Floating Point Load Immediate 1): Floating Point Instruction
8.3.7 FLDS (Floating Point Load to System Register): Floating Point Instruction
8.3.8 FLOAT (Floating Point Convert from Integer): Floating Point Instruction
8.3.9 FMAC (Floating Point Multiply Accumulate): Floating Point Instruction
8.3.10 FMOV (Floating Point Move): Floating Point Instruction
8.3.11 FMUL (Floating Point Multiply): Floating Point Instruction
8.3.12 FNEG (Floating Point Negate): Floating Point Instruction
8.3.13 FSQRT (Floating Point Square Root): Floating Point Instruction
8.3.14 FSTS (Floating Point Store from System Register): Floating Point Instruction
8.3.15 FSUB (Floating Point Subtract): Floating Point Instruction
8.3.16 FTRC (Floating Point Truncate and Convert to Integer): Floating Point Instruction
8.3.17 LDS (Load to System Register): FPU Related CPU Instruction
8.3.18 STS (Store from FPU System Register): FPU Related CPU Instruction
8.4 DSP Data Transfer Instructions (SH3-DSP Only)
8.4.1 MOVS (Move Single Data between Memory and DSP Register): DSP Data Transfer Instruction
8.4.2 MOVX (Move between X Memory and DSP Register): DSP Data Transfer Instruction
8.4.3 MOVY (Move between Y Memory and DSP Register): DSP Data Transfer Instruction
8.4.4 NOPX (No Access Operation for X Memory): DSP Data Transfer Instruction
8.4.5 NOPY (No Access Operation for Y Memory): DSP Data Transfer Instruction
8.5 DSP Operation Instructions
8.5.1 PABS (Absolute): DSP Arithmetic Operation Instruction
8.5.2 [if cc] PADD (Addition with Condition): DSP Arithmetic Operation Instruction
8.5.3 PADD PMULS (Addition & Multiply Signed by Signed): DSP Arithmetic Operation Instruction
8.5.4 PADDC (Addition with Carry): DSP Arithmetic Operation Instruction
8.5.5 [if cc] PAND (Logical AND): DSP Logical Operation Instruction
8.5.6 [if cc] PCLR (Clear): DSP Arithmetic Operation Instruction
8.5.7 PCMP (Compare Two Data): DSP Arithmetic Operation Instruction
8.5.8 [if cc] PCOPY (Copy with Condition): DSP Arithmetic Operation Instruction
8.5.9 [if cc] PDEC (Decrement by 1): DSP Arithmetic Operation Instruction
8.5.10 [if cc] PDMSB (Detect MSB with Condition): DSP Arithmetic Operation Instruction
8.5.11 [if cc] PINC (Increment by 1 with Condition): DSP Arithmetic Operation Instruction
8.5.12 [if cc] PLDS (Load System Register): DSP System Control Instruction
8.5.13 PMULS (Multiply Signed by Signed): DSP Arithmetic Operation Instruction
8.5.14 [if cc] PNEG (Negate): DSP Arithmetic Operation Instruction
8.5.15 [if cc] POR (Logical OR): DSP Logical Operation Instruction
8.5.16 PRND (Rounding): DSP Arithmetic Operation Instruction
8.5.17 [if cc] PSHA (Shift Arithmetically with Condition): DSP Arithmetic Shift Instruction
8.5.18 [if cc] PSHL (Shift Logically with Condition): DSP Logical Shift Instruction
8.5.19 [if cc] PSTS (Store System Register): DSP System Control Instruction
8.5.20 [if cc] PSUB (Subtract with Condition): DSP Arithmetic Operation Instruction
8.5.21 PSUB PMULS (Subtraction & Multiply Signed by Signed): DSP Arithmetic Operation Instruction
8.5.22 PSUBC (Subtraction with Carry): DSP Arithmetic Operation Instruction
8.5.23 [if cc] PXOR (Logical Exclusive OR): DSP Logical Operation Instruction
Section 9 Processing States
9.1 State Transitions
9.1.1 Reset State
9.1.2 Exception Processing State
9.1.3 Program Execution State
9.1.4 Power-Down State
9.1.5 Bus Release State
9.2 Power-Down State
9.2.1 Sleep Mode
9.2.2 Standby Mode
9.2.3 Hardware Standby Mode
9.2.4 Module Standby Function
Section 10 Pipeline Operation
10.1 Basic Configuration of Pipelines
10.1.1 Five-Stage Pipeline
10.1.2 Slot and Pipeline Flow
10.1.3 Number of Cycles Required for Execution of One Slot
10.1.4 Number of Instruction Execution Cycles
10.2 Contention
10.2.1 Contention between Instruction Fetch (IF) and Memory Access (MA)
10.2.2 Effects of Memory Load Instructions on Pipelines
10.2.3 Contention due to SR Update Instructions
10.2.4 Multiplier Access Contention
10.2.5 FPU Contention (SH-3E Only)
10.2.6 Contention between DSP Data Operation Instructions and Store Instructions (SH3-DSP Only)
10.2.7 Relationship between Load and Store Instructions (SH3-DSP Only)
10.3 Programming Guidelines
10.3.1 Correspondence between Contention and Instructions
10.3.2 Increasing Instruction Execution Speed
10.3.3 Number of Cycles
10.4 Operation of Instruction Pipelines
10.4.1 Data Transfer Instructions
10.4.2 Arithmetic Instructions
10.4.3 Logic Operation Instructions
10.4.4 Shift Instructions
10.4.5 Branch Instructions
10.4.6 System Control Instructions
10.4.7 Exception Processing
10.4.8 Pipeline for FPU Instructions (SH-3E Only)
10.4.9 DSP Data Transfer Instructions (SH3-DSP Only)
10.4.10 DSP Operation Instructions (SH3-DSP Only)
Appendix A Instruction Code
A.1 Instruction Set by Addressing Mode
A.1.1 No Operand
A.1.2 Direct Register Addressing
A.1.3 Indirect Register Addressing
A.1.4 Post-Increment Indirect Register Addressing
A.1.5 Pre-Decrement Indirect Register Addressing
A.1.6 Indirect Register Addressing with Displacement
A.1.7 Indirect Indexed Register Addressing
A.1.8 Indirect GBR Addressing with Displacement
A.1.9 Indirect Indexed GBR Addressing
A.1.10 PC Relative Addressing with Displacement
A.1.11 PC Relative Addressing
A.1.12 Immediate
A.2 Instruction Sets by Instruction Format
A.2.1 0 Format
A.2.2 n Format
A.2.3 m Format
A.2.4 nm Format
A.2.5 md Format
A.2.6 nd4 Format
A.2.7 nmd Format
A.2.8 d Format
A.2.9 d12 Format
A.2.10 nd8 Format
A.2.11 i Format
A.2.12 ni Format
A.3 Operation Code Map
Appendix B Pipeline Operation and Contention
Section 1 Features

1.1 SH-3 CPU Features

The SH-3/SH-3E/SH3-DSP has RISC-type instruction sets. Basic instructions are executed in one clock cycle, which dramatically improves instruction execution speed. The CPU also has an internal 32-bit architecture for enhanced data processing ability. Table 1-1 lists the SH-3/SH-3E/SH3-DSP CPU features.

Table 1-1 SH-3/SH-3E/SH3-DSP CPU Features

Feature                      Description
Architecture                 • Hitachi original architecture
                             • 32-bit internal data bus
General-register machine     • Sixteen 32-bit general registers (eight banked registers)
                             • Five 32-bit control registers
                             • Four 32-bit system registers (SH-3)
                             • Six 32-bit system registers (SH-3E)
Instruction set              • Instruction length: 16-bit fixed length for improved code efficiency
                             • Load-store architecture (basic arithmetic and logic operations are executed between registers)
                             • Delayed branch system used for reduced pipeline disruption
                             • Instruction set optimized for C language
Instruction execution time   • One instruction/cycle for basic instructions
Address space                • Architecture makes 4 Gbytes available
On-chip multiplier           • Multiplication operations (32 bits × 32 bits → 64 bits) executed in 2 to 5 cycles, and multiplication/accumulation operations (32 bits × 32 bits + 64 bits → 64 bits) executed in 2 to 5 cycles
Pipeline                     • Five-stage pipeline
Processing states            • Reset state
                             • Exception processing state
                             • Program execution state
                             • Power-down state
                             • Bus release state
Power-down states            • Sleep mode
                             • Standby mode
                             • Hardware standby mode
FPU (SH-3E only)             • Single-precision floating point format
                             • Subset of IEEE754 standard data types
                             • Invalid calculation exception and divide-by-zero exception (in compliance with IEEE754 standard)
                             • Rounding to zero (in compliance with IEEE754 standard)
                             • General purpose register file, 16 32-bit floating point registers
                             • Execution pitch for basic instructions: 1 cycle/latency or 2 cycles (FADD, FSUB, FMUL)
                             • FMAC (floating point multiply accumulate) execution pitch: 1 cycle/latency or 2 cycles
                             • Support for FDIV and FSQRT
                             • Support for FLDI0 and FLDI1 (load constant 0/1)

1.2 SH3-DSP Features

The SH3 CPU has only 16-bit instructions. The SH3-DSP has basically the same 16-bit instructions, but it also has additional 32-bit DSP instructions that it uses for parallel processing of DSP type instructions. The SH3 CPU uses a standard von Neumann architecture, whereas the SH3-DSP has the DSP data paths of an expanded Harvard architecture. Table 1-2 lists the added features of the SH3-DSP.

Table 1-2 Features of SH3-DSP Series Microprocessor CPUs

Feature                      Description
DSP unit                     • Multiplier
                             • Arithmetic logic unit (ALU)
                             • Barrel shifter
                             • DSP registers
                             • MSB detection
Multiplier                   • 16 bits × 16 bits → 32 bits (fixed decimal point)
                             • 1 cycle multiplier
DSP registers                • Two 40-bit data registers
                             • Six 32-bit data registers
                             • Modulo register (MOD, 32 bits) added to control registers
                             • Repeat counter (RC) added to status register (SR)
                             • Repeat start register (RS, 32-bit) and repeat end register (RE, 32-bit) added to control registers
DSP data bus                 • Expanded Harvard architecture
                             • Simultaneous access of two data buses and one instruction bus
On-chip memory               • 16-kbyte RAM
Parallel processing          • Maximum of four parallel processes (ALU operation, multiplication, and two loads or stores)
Address operator             • Two address operators
                             • Address operations for accessing two memories
DSP data addressing modes    • Increment, decrement, and index
                             • Increment, decrement, and index can have modulo addressing or not
Repeat control               • Zero-overhead repeat control (loop)
Instruction set              • 16 or 32 bits
                             • 16 bits (for load or store only)
                             • 32 bits (including for ALU operations and multiplication)
                             • SuperH microprocessor instructions added for accessing DSP registers
Pipeline                     • Five-stage pipeline
                             • Fifth stage is the DSP stage

Section 2 Programming Model

2.1 Organization of Registers

2.1.1 Privileged Mode and Banks

Processing modes: The SH-3/SH-3E/SH3-DSP has two processing modes: user mode and privileged mode. It operates in user mode under normal conditions and enters privileged mode in response to an exception or interrupt. There are three types of registers: general, system, and control. All of these registers are 32 bits wide. Which registers can be accessed through software depends on the processing mode.

General-purpose registers: There are 16 general-purpose registers, numbered R0 through R15. General-purpose registers R0 to R7 are banked registers that are switched by the processor mode. In privileged mode, the register bank (RB) bit in the status register (SR) defines which banked registers can be accessed as general-purpose registers and which cannot. Registers in the bank that is not selected can still be accessed through the load control register (LDC) and store control register (STC) instructions.

When the RB bit is 1 (BANK1 is selected), BANK1 general-purpose registers R0_BANK1 through R7_BANK1 and non-banked general-purpose registers R8 through R15 (a total of 16 registers) can be accessed as general-purpose registers R0 through R15, and BANK0 general-purpose registers R0_BANK0 through R7_BANK0 (eight registers) are accessed by the LDC and STC instructions.

When the RB bit is 0 (BANK0 is selected), BANK0 general-purpose registers R0_BANK0 through R7_BANK0 and non-banked general-purpose registers R8 through R15 (a total of 16 registers) can be accessed as general-purpose registers R0 through R15, and BANK1 general-purpose registers R0_BANK1 through R7_BANK1 (eight registers) are accessed by the LDC and STC instructions.
In user mode, BANK0 general-purpose registers R0_BANK0 through R7_BANK0 and non-banked general-purpose registers R8 through R15 can be accessed as general-purpose registers R0 through R15 (a total of 16 registers), and BANK1 general-purpose registers R0_BANK1 through R7_BANK1 (eight registers) cannot be accessed.

When the DSP extended features of the SH3-DSP are enabled, DSP instructions use eight of the 16 general-purpose registers for addressing X and Y data memory and L bus data memory (single data). To access X memory, R4 and R5 are used as the X address register [Ax] and R8 is used as the X index register [Ix]. To access Y memory, R6 and R7 are used as the Y address register [Ay] and R9 is used as the Y index register [Iy]. To access single data using the L bus, R2, R3, R4, and R5 are used as the single data address register [As] and R8 is used as the single data index register [Is].

DSP type instructions can access X and Y memory simultaneously. There are two groups of address pointers for specifying the X and Y data memory addresses.

Control registers: The control registers include registers that can be accessed in either mode (the global base register (GBR) and the status register (SR)) and registers that can only be accessed in privileged mode (the saved status register (SSR), saved program counter (SPC), and vector base register (VBR)). Some bits in the status register (for example, the RB bit) can only be accessed in privileged mode.

System registers: There are four system registers that can be accessed in either processing mode:
• Multiply and accumulate registers:
  multiply and accumulate high (MACH)
  multiply and accumulate low (MACL)
• Procedure register (PR)
• Program counter (PC)

The register configurations are shown in figure 2-1 by processing mode. Switching between user and privileged modes is controlled by the processor operation mode bit (MD) in the status register.
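The bank selection rules described above can be summarized in a small model. This is only an illustrative sketch in Python, not part of the manual: the function names `bank_access` and `resolve` are hypothetical, and the model covers only the visibility rules stated in the text (user mode always sees BANK0; in privileged mode the RB bit selects the bank, and the other bank is reached through LDC and STC).

```python
def bank_access(md: int, rb: int) -> dict:
    """Model which R0-R7 bank is visible for given SR.MD and SR.RB values.

    md: processor operation mode bit (1 = privileged mode, 0 = user mode)
    rb: register bank bit (only takes effect in privileged mode)
    """
    if md not in (0, 1) or rb not in (0, 1):
        raise ValueError("md and rb must each be 0 or 1")
    if md == 0:
        # User mode: BANK0 is used, BANK1 cannot be accessed at all.
        return {"general": "BANK0", "ldc_stc": None}
    if rb == 1:
        # Privileged mode, RB = 1: BANK1 visible, BANK0 via LDC/STC.
        return {"general": "BANK1", "ldc_stc": "BANK0"}
    # Privileged mode, RB = 0: BANK0 visible, BANK1 via LDC/STC.
    return {"general": "BANK0", "ldc_stc": "BANK1"}


def resolve(reg: int, md: int, rb: int) -> str:
    """Return the physical register that the name Rn refers to."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be 0 to 15")
    if reg >= 8:
        # R8 to R15 are not banked.
        return f"R{reg}"
    return f"R{reg}_{bank_access(md, rb)['general']}"


if __name__ == "__main__":
    for md, rb in [(0, 0), (1, 0), (1, 1)]:
        print(md, rb, resolve(3, md, rb), bank_access(md, rb)["ldc_stc"])
```

For example, `resolve(3, 1, 1)` names R3_BANK1, while the same register number in user mode always names R3_BANK0.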
Floating point registers and system registers used by the FPU (SH-3E only): There are 16 floating point registers, FR0 to FR15. These are used as source and destination registers for single-precision floating point operations. The system registers used by the FPU are the floating point communication register (FPUL) and the floating point status/control register (FPSCR). These are used for communication between the FPU and CPU and for exception handling settings.

The register configurations for the different processing modes are illustrated in figure 2-1 and figure 2-2. Refer to section 4, Floating Point Unit.

[Figure 2-1: register diagram, bits 31 to 0]
General-purpose registers: R0_BANK0 *1, *2; R1_BANK0 to R7_BANK0 *2; R8 to R15
Control and system registers: SR, GBR, MACH, MACL, PR, PC
Floating point registers (SH-3E): FR0 to FR15 *3, FPSCR *3, FPUL *3

Notes:
1. Register R0 is used as an index register in the indexed register-indirect addressing mode and indexed GBR-indirect addressing mode. There are some instructions for which only R0 can be used as the source or destination register.
2. R0 to R7 are banked registers, and BANK0 is used in user mode.
3. These registers only exist on the SH-3E. They are used for floating point operations. Refer to section 4, Floating Point Unit, for details on FR0 to FR15, FPSCR, and FPUL.

Figure 2-1 User Mode Programming Model

[Figure 2-2: register diagrams, bits 31 to 0]
(b) Privileged mode programming model (RB = 1)
General-purpose registers: R0_BANK1 *1, *2; R1_BANK1 to R7_BANK1 *2; R8 to R15
Accessed by LDC/STC: R0_BANK0 *1, *3; R1_BANK0 to R7_BANK0 *3
Control and system registers: GBR, MACH, MACL, VBR, PR, SR, SSR, PC, SPC
Floating point registers (SH-3E): FR0 to FR15 *4, FPSCR *4, FPUL *4

(c) Privileged mode programming model (RB = 0)
General-purpose registers: R0_BANK0 *1, *2; R1_BANK0 to R7_BANK0 *2; R8 to R15
Accessed by LDC/STC: R0_BANK1 *1, *3; R1_BANK1 to R7_BANK1 *3
Control and system registers: GBR, MACH, MACL, VBR, PR, SR, SSR, PC, SPC
Floating point registers (SH-3E): FR0 to FR15 *4, FPSCR *4, FPUL *4

Notes:
1. Register R0 is used as an index register in the indexed register-indirect addressing mode and indexed GBR-indirect addressing mode.
2. R0 to R7 are banked registers. In privileged mode, the RB bit of register SR determines which bank is accessed as general-purpose registers: BANK0 if the RB bit is set to 0, BANK1 if the RB bit is set to 1.
3. These banks are accessed by the LDC and STC instructions only. The RB bit of register SR determines which bank this is: BANK0 if the RB bit is set to 1, BANK1 if the RB bit is set to 0.
4. These registers only exist on the SH-3E. They are used for floating point operations. Refer to section 4, Floating Point Unit, for details on FR0 to FR15, FPSCR, and FPUL.

Figure 2-2 Structure of Registers in Privileged Mode

DSP registers and registers used by the DSP (SH3-DSP only): The DSP unit has nine DSP registers, divided into eight data registers and one control register. The DSP data registers include two 40-bit registers (A0 and A1) and six 32-bit registers (M0, M1, X0, X1, Y0, and Y1). The A0 and A1 registers each have eight guard bits, A0G and A1G.
The DSP data registers are used for transferring and processing DSP data as operands of DSP instructions. There are three types of instructions that access the DSP data registers: DSP data processing, X data processing, and Y data processing.

The 32-bit DSP status register (DSR) is the control register, which indicates the results of operations. The DSR register has bits that show the result of an operation: a signed greater than bit (GT), a zero value bit (Z), a negative value bit (N), an overflow bit (V), and a DSP condition bit (DC). It also has condition select bits (CS), which control how the DC bit is set.

The DC bit is one of the status flags; it is very similar to the T bit of the SuperH microcomputer CPU core. For conditional DSP type instructions, the execution of DSP data processing is controlled in accordance with the DC bit. This control applies to DSP unit execution only, and only the DSP registers are updated. It does not apply to operations of the SuperH microprocessor CPU core, such as address calculation and load/store instructions. The condition select bits CS (CS2 to CS0, DSR bits 3 to 1) specify the condition under which the DC bit is set.

DSP instructions include both unconditional and conditional DSP instructions. Data processing by unconditional DSP instructions updates the condition bits and the DC bit, except for the PMULS, PWAD, PWSB, MOVX, MOVY, and MOVS instructions. Conditional DSP type instructions are executed in accordance with the status of the DC bit. The DSR register is not updated by conditional instructions, whether or not they are executed.

Figure 2-3 shows the DSP registers. Table 2-1 lists the DSR register bit functions.

[Figure 2-3: register diagram]
DSP data registers: A0 and A1 (bits 31 to 0) with guard bits A0G and A1G (bits 39 to 32); M0, M1, X0, X1, Y0, Y1 (bits 31 to 0)
DSP status register (DSR): bit 7 GT, bit 6 Z, bit 5 N, bit 4 V, bits 3 to 1 CS[2:0], bit 0 DC

Figure 2-3 Organization of the DSP Registers

Table 2-1 DSR Register Bits

Bits      Name                               Function
31 to 8   Reserved                           0: Always reads 0. Always write 0.
7         Signed greater than bit (GT)       Indicates whether the operation result is positive (and nonzero) or whether operand 1 is larger than operand 2. 1: Operation result is positive or operand 1 is larger.
6         Zero value bit (Z)                 Indicates whether the operation result is zero or whether operands 1 and 2 are the same. 1: Operation result is zero or operands 1 and 2 are the same.
5         Negative value bit (N)             Indicates whether the operation result is negative or whether operand 1 is smaller than operand 2. 1: Operation result is negative or operand 1 is smaller.
4         Overflow bit (V)                   Indicates that the operation result overflowed. 1: Operation result overflowed.
3 to 1    Condition select bits (CS)         Specifies the mode for selecting the status of the operation result set in the DC bit. Do not specify 110 or 111.
                                             000: Carry/borrow mode
                                             001: Negative value mode
                                             010: Zero value mode
                                             011: Overflow mode
                                             100: Signed greater than mode
                                             101: Signed equal or greater than mode
0         DSP condition bit (DC)             Sets the operation result status in the mode specified by the CS bits. 0: Specified mode status not achieved. 1: Specified mode status achieved.

CPU core instructions use the DSR register as a system register. Data is transferred to and from the DSR register with the following load and store instructions:
STS DSR,Rm
STS.L DSR,@-Rn
LDS Rn,DSR
LDS.L @Rn+,DSR

CPU core instructions also use the A0, A1, X0, X1, Y0, and Y1 registers as system registers.

There are three DSP control registers: the repeat start (RS) register, the repeat end (RE) register, and the modulo (MOD) register. The RS and RE registers are used to control program repetition (loops). The number of iterations is specified in the repeat counter (RC) of the SR register, the repeat start address is specified in the RS register, and the repeat end address is specified in the RE register. The address values stored in the RS and RE registers are not always the same as the physical starting address and ending address of the repeat.
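The DSR bit assignments in table 2-1 can be turned into a small decoder. The following Python sketch is illustrative only: the function name `decode_dsr` and the dictionary layout are hypothetical, and rejecting the CS values 110 and 111 with an exception is a choice made for this sketch (the manual only says not to specify them).

```python
# Condition select modes, taken from the CS rows of table 2-1.
CS_MODES = {
    0b000: "carry/borrow",
    0b001: "negative value",
    0b010: "zero value",
    0b011: "overflow",
    0b100: "signed greater than",
    0b101: "signed equal or greater than",
}


def decode_dsr(value: int) -> dict:
    """Split a 32-bit DSR value into the fields listed in table 2-1."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DSR is a 32-bit register")
    cs = (value >> 1) & 0b111  # bits 3 to 1
    if cs not in CS_MODES:
        # 110 and 111 must not be specified according to table 2-1.
        raise ValueError("CS values 110 and 111 are not allowed")
    return {
        "GT": (value >> 7) & 1,  # signed greater than
        "Z": (value >> 6) & 1,   # zero value
        "N": (value >> 5) & 1,   # negative value
        "V": (value >> 4) & 1,   # overflow
        "CS": cs,
        "CS_mode": CS_MODES[cs],
        "DC": value & 1,         # DSP condition bit
    }


if __name__ == "__main__":
    print(decode_dsr(0x85))
```

With the value 0x85, the decoder reports GT = 1, zero value mode selected by CS, and DC = 1; bits 31 to 8 are reserved and are ignored here.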
The MOD register is used in modulo addressing to buffer the repeat data. Modulo addressing is specified by DMX or DMY in the SR register, the modulo end address (ME) is specified in the top 16 bits of the MOD register, and the modulo start address (MS) is specified in the bottom 16 bits. The DMX and DMY bits cannot simultaneously specify modulo addressing. Modulo addressing can be used for X and Y data transfers (MOVX and MOVY). It cannot be used in single data transfers (MOVS). Figure 2-5 shows the control registers.

2.2 General-Purpose Registers

Figure 2-4 shows the structure of the general-purpose registers.

[Figure 2-4: register diagram, bits 31 to 0]
General-purpose registers (undefined after reset):
R0 *1, *2
R1 *2
R2 *2 [As] *4
R3 *2 [As] *4
R4 *2 [As, Ax] *4
R5 *2 [As, Ax] *4
R6 *2 [Ay] *4
R7 *2 [Ay] *4
R8 [Ix, Is] *4
R9 [Iy] *4
R10 to R15
Floating point data registers: FR0 to FR15 *3. The FMAC instruction uses FR0 to set the multiplication value.

Notes:
1. R0 functions as an index register in the indexed register-indirect addressing mode and indexed GBR-indirect addressing mode. In some instructions, only R0 can be used as the source or destination register.
2. R0 to R7 are banked registers. In privileged mode, the RB bit of register SR determines which banks (R0_BANK0 to R7_BANK0 or R0_BANK1 to R7_BANK1) are accessed as general-purpose registers.
3. These registers only exist on the SH-3E. They are used for floating point operations. Refer to section 4, Floating Point Unit, for details on FR0 to FR15.
4. When the DSP instruction extended features of the SH3-DSP are enabled, DSP instructions use these registers as memory address registers and index registers.

Figure 2-4 Structure of the General-Purpose Registers

The symbols R2 to R9 are used by the assembler.
To change a name to something that indicates the role of the register for DSP instructions, use an alias. In the assembler, write as follows:

    Ix: .REG (R8)

The name Ix becomes an alias for R8. Aliases are also assigned as follows:

    Ax0: .REG (R4)
    Ax1: .REG (R5)
    Ix:  .REG (R8)
    Ay0: .REG (R6)
    Ay1: .REG (R7)
    Iy:  .REG (R9)
    As0: .REG (R4); defined when an alias is needed for a single data transfer.
    As1: .REG (R5); defined when an alias is needed for a single data transfer.
    As2: .REG (R2); defined when an alias is needed for a single data transfer.
    As3: .REG (R3); defined when an alias is needed for a single data transfer.
    Is:  .REG (R8); defined when an alias is needed for a single data transfer.

2.3 Control Registers

Figure 2-5 shows the organization of the control registers.

Saved status register (SSR) (bits 31 to 0): Stores the current SR value at the time of an exception to indicate the processor status on return to the instruction stream from the exception handler. Undefined after reset.

Saved program counter (SPC) (bits 31 to 0): Stores the current PC value at the time of an exception to indicate the return address at completion of exception processing. Undefined after reset.

Global base register (GBR) (bits 31 to 0): Stores the base address of the GBR-indirect addressing mode. The GBR-indirect addressing mode is used to transfer data to the register areas of the resident peripheral modules, and for logic operations. The GBR can be accessed in user mode. Undefined after reset.

Vector base register (VBR) (bits 31 to 0): Stores the base address of the exception processing vector area. Initialized to H'00000000 after reset.

Status register (SR) fields:

MD: Processor operation mode bit. Indicates the processor operation mode as follows: 1 = privileged mode; 0 = user mode. Becomes 1 when an exception or interrupt occurs. Initialized to 1 at reset.

RB: Register bank bit. Defines the general-purpose register bank used in privileged mode. A logic 1 designates that R0_BANK1-R7_BANK1 and R8-R15 are accessed as general-purpose registers, and that R0_BANK0-R7_BANK0 are accessed only by LDC and STC instructions; a logic 0 designates that R0_BANK0-R7_BANK0 and R8-R15 are accessed as general-purpose registers, and that R0_BANK1-R7_BANK1 are accessed only by LDC and STC instructions. Becomes 1 when an exception or interrupt occurs. Initialized to 1 at reset.

BL: Block bit. Masks exceptions and interrupts when 1. When 0, accepts exceptions and interrupts. For details, see section 5, exception processing. Becomes 1 when an exception or interrupt occurs. Initialized to 1 at reset.

DSP bit: DSP operation mode. DSP instructions are enabled when set to 1.

M and Q bits: Used by the DIV0S/DIV0U and DIV1 instructions.

RC: Repeat counter. Specifies the number of repeats for repeat (loop) control (2 to 4,095).

DMY: Modulo addressing specification for pointer Y. 1: Modulo addressing mode enabled for the Y memory address pointer Ay (R6 and R7).

DMX: Modulo addressing specification for pointer X. 1: Modulo addressing mode enabled for the X memory address pointer Ax (R4 and R5).

I3-I0: Interrupt mask bits. A 4-bit field indicating the interrupt request mask level. The level of interrupt acceptance does not change when an interrupt occurs. Initialized to B'1111 at reset.

S bit: Used by the MAC instruction.

RF1, RF0: Repeat flags. Used for zero-overhead repeat (loop) control.
    00: 1-step repeat
    01: 2-step repeat
    11: 3-step repeat
    10: 4-step (or more) repeat

T bit: The MOVT, CMP/cond, TAS, TST, BT, BF, SETT, CLRT, and DT instructions use the T bit to indicate true (logic one) or false (logic zero). The ADDV/ADDC, SUBV/SUBC, DIV0U/DIV0S, DIV1, NEGC, SHAR/SHAL, SHLR/SHLL, ROTR/ROTL, and ROTCR/ROTCL instructions also use the T bit to indicate a carry, borrow, overflow, or underflow.

0 bits: Always read as 0, and should always be written as 0.

Notes: Only the M, Q, S, and T bits can be set or cleared by special instructions from user mode. Undefined after reset. All other bits are read or written from privileged mode.
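The SR field descriptions above can be exercised with a small decoder. The Python sketch below is illustrative only. The bit positions are taken from Figure 2-5 (MD 30, RB 29, BL 28, RC 27-16, DSP 12, DMY 11, DMX 10, M 9, Q 8, I3-I0 7-4, RF1 3, RF0 2, S 1, T 0) and should be checked against the hardware manual for the device in use. The names `SR_FIELDS`, `decode_sr`, and `REPEAT_STEPS` are assumptions introduced for the example.

```python
# Hypothetical field table: name -> (lowest bit position, width in bits).
SR_FIELDS = {
    "MD": (30, 1), "RB": (29, 1), "BL": (28, 1),
    "RC": (16, 12),                 # SH3-DSP only
    "DSP": (12, 1), "DMY": (11, 1), "DMX": (10, 1),  # SH3-DSP only
    "M": (9, 1), "Q": (8, 1),
    "IMASK": (4, 4),                # I3-I0
    "RF": (2, 2),                   # RF1 (bit 3), RF0 (bit 2), SH3-DSP only
    "S": (1, 1), "T": (0, 1),
}

# RF1,RF0 encoding as listed in the field description.
REPEAT_STEPS = {
    0b00: "1-step repeat",
    0b01: "2-step repeat",
    0b11: "3-step repeat",
    0b10: "4-step (or more) repeat",
}


def decode_sr(sr):
    """Split a 32-bit SR value into its named fields."""
    return {name: (sr >> low) & ((1 << width) - 1)
            for name, (low, width) in SR_FIELDS.items()}


# MD, RB, BL = 1 and I3-I0 = B'1111, with the undefined bits taken as 0.
reset_like = decode_sr(0x700000F0)
```

Here `reset_like["IMASK"]` is 15 and `reset_like["MD"]` is 1, matching the documented reset state of those fields. The remaining bits are undefined after reset and are shown as 0 only for the sake of the example.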
Status register (SR) bit layout:

    Bit 31:      0
    Bit 30:      MD
    Bit 29:      RB
    Bit 28:      BL
    Bits 27-16:  RC*
    Bits 15-13:  0
    Bit 12:      DSP*
    Bit 11:      DMY*
    Bit 10:      DMX*
    Bit 9:       M
    Bit 8:       Q
    Bits 7-4:    I3, I2, I1, I0
    Bit 3:       RF1*
    Bit 2:       RF0*
    Bit 1:       S
    Bit 0:       T

Note: * 0 for versions other than the SH3-DSP.

Repeat start register (RS): bits 31 to 0.
Repeat end register (RE): bits 31 to 0.
Modulo register (MOD): bits 31 to 16 hold ME (modulo end address); bits 15 to 0 hold MS (modulo start address).

Figure 2-5 Control Registers Configuration

2.4 System Registers

The system registers are accessed by the LDS and STS instructions. Figure 2-6 shows the system register configuration.

System registers (each bits 31 to 0): MACH, MACL, PR, PC, FPUL*, FPSCR*

Multiply and accumulate high and low registers (MACH/L): Store the results of multiply and multiply-and-accumulate operations. Undefined after reset.

Floating point communication register (FPUL): Used as the communication buffer between the CPU and the FPU.

Program counter (PC): Indicates the address four bytes (two instructions) beyond the start of the current instruction. Initialized to H'A0000000 after reset.

Procedure register (PR): Stores the return address used when exiting a subroutine. Undefined after reset.

Floating point status/control register (FPSCR): Stores status and control information for floating point operations.

Note: * See section 4, Floating Point Unit, for more information on the FPUL and FPSCR.

Figure 2-6 System Register Configuration

2.5 Initial Register Value

Table 2-1 shows the register values after a reset.

Table 2-1 Initial Register Values

    Register type    Register                                     Initial value*1
    General purpose  R0-R15                                       Undefined
                     FR0-FR15*2                                   Undefined
    Control          SR                                           MD bit is 1, RB bit is 1, BL bit is 1, bits I3-I0 are 1111 (H'F), bits RC, DMY, and DMX are 0 (SH3-DSP only), reserved bits are 0, and all others are undefined
                     GBR, SSR, SPC                                Undefined
                     VBR                                          H'00000000
                     RS*2, RE*2                                   Undefined
                     MOD*2                                        Undefined
    System           MACH, MACL, PR, FPSCR*1, FPUL*1              Undefined
                     PC                                           H'A0000000
    DSP              A0, A0G, A1, A1G, M0, M1, X0, X1, Y0, Y1
                     DSR                                          H'00000000

Notes: 1. These registers only exist on the SH-3E.
They are used for floating point operations. Refer to section 4, Floating Point Unit, for details on FR0 to FR15, FPSCR, and FPUL. 2. These registers only exist on the SH-3E.

Section 3 Data Formats

3.1 Data Format in Registers

Register operands are always longwords (32 bits) (figure 3-1). When the memory operand is only a byte (8 bits) or a word (16 bits), it is sign-extended into a longword when loaded into a register.

Figure 3-1 Longword Operand (bits 31 to 0)

3.2 Data Format in Memory

Memory data formats are classified into bytes, words, and longwords. Memory can be accessed in bytes (8 bits), words (16 bits), or longwords (32 bits). Memory operands that do not fill out 32 bits are sign-extended and stored in a register.

Access word operands from word boundaries (even addresses two bytes apart: 2n addresses) and longword operands from longword boundaries (even addresses four bytes apart: 4n addresses). Other accesses cause address errors. Byte operands can be accessed from any address.

Data formats can use either big endian or little endian byte order. Use the external pin (MD5) to set the byte order at power-on reset. When MD5 is low, the processor operates in big endian; when MD5 is high, the processor operates in little endian. The byte order cannot be changed dynamically. Numbers are always assigned to bit positions from most significant to least significant and from left to right. For example, in a longword (32 bits), the leftmost bit (31) is the most significant and the rightmost bit (0) is the least significant. Figure 3-2 shows the data format in memory.

When little endian is used, data written in bytes (8 bits) should be read in bytes. Data written in words (16 bits) should be read in words.
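The access rules above (alignment, byte order, and sign extension on load) can be modeled in a few lines. The following Python sketch is an illustration under stated assumptions: memory is a plain `bytes` object, an address error is represented by a `ValueError`, and the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def load(memory, address, size, little_endian):
    """Model a byte (1), word (2) or longword (4) load into a 32-bit register."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    if address % size != 0:
        # Words need 2n addresses, longwords need 4n addresses.
        raise ValueError("address error: misaligned access")
    order = "little" if little_endian else "big"
    raw = int.from_bytes(memory[address:address + size], order)
    # Operands shorter than 32 bits are sign-extended into the register.
    return sign_extend(raw, size * 8) & 0xFFFFFFFF


memory = bytes([0x12, 0x34, 0x56, 0x78, 0xFF, 0x80, 0x00, 0x00])
big = load(memory, 0, 4, little_endian=False)     # 0x12345678
little = load(memory, 0, 4, little_endian=True)   # 0x78563412
word = load(memory, 4, 2, little_endian=False)    # 0xFFFFFF80
```

The same four bytes yield different longword values in the two byte orders, which is why data written with one access size should be read back with the same size when little endian is selected.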
Figure 3-2 Data Formats in Memory (placement of byte 0 to byte 3 and of word 0 and word 1 within longwords at addresses A, A + 4, and A + 8, for big endian and little endian)

3.3 Data Format for Immediate Data

Immediate data bytes are arranged inside instruction codes. For the MOV, ADD, and CMP/EQ instructions, immediate data is sign-extended and then processed as longword data, in the same way as register data. In contrast, for the TST, AND, OR, and XOR instructions, immediate data is zero-extended and then processed as longword data. Consequently, if immediate data is used with the AND instruction, the upper 24 bits of the destination register will always be cleared.

Word and longword immediate data is not arranged inside instruction codes. Instead, it is stored in a memory table. Memory tables can be accessed using the immediate data transfer instruction (MOV) in the PC relative addressing mode with displacement. For specific examples, see 6.1.8, Immediate Data, in section 6, Instruction Features.

3.4 DSP Type Data Formats (SH3-DSP Only)

The SH3-DSP uses three different data formats for instructions: the fixed decimal point data format, the integer data format, and the logical data format.

The DSP type fixed decimal point data format places a binary decimal point between bits 31 and 30. This data format can have guard bits, have no guard bits, or be a multiplication input. The valid bit lengths and the values that can be represented vary for each.

The DSP type integer data format places a binary decimal point between bits 16 and 15. This data format can have guard bits, have no guard bits, or be a shift amount. The valid bit lengths and the values that can be represented vary for each. The shift amount for an arithmetic shift (PSHA) is a seven-bit field covering -64 to +63, although only values between -32 and +32 are valid.
The shift amount for logical shifts is a six-bit field, although, in the same fashion, only values between -16 and +16 are valid.

The DSP type logical data format has no decimal point. The data format and valid data length vary with the instruction and DSP register.

Figure 3-3 shows the three DSP data formats and the position of the two binary decimal points, as well as the SuperH data format (as reference).

Figure 3-3 DSP Data Formats (DSP fixed decimal point data: with guard bits, no guard bits, multiplication input; DSP integer data: with guard bits, no guard bits, arithmetic shift (PSHA), logical shift (PSHL); DSP logical data; SuperH integer (word) as reference. Legend: S is the sign bit; the binary decimal point and the bits unrelated to processing (ignored) are marked.)

Section 4 Floating Point Unit (SH-3E Only)

4.1 Introduction

The SH-3E has a built-in floating point operations unit (FPU). Figure 4-1 shows the FPU registers.

Floating point registers (each bits 31 to 0): FR0 to FR15. FR0 functions as the index register for FMAC instructions.

System registers (each bits 31 to 0): FPUL*, FPSCR*

Floating point communication register (FPUL): Used as the communication buffer between the CPU and the FPU.

Floating point status/control register (FPSCR): Stores status or control information for floating point operations.

Note: * See section 4.2, Floating Point Registers and System Registers for FPU, for more information.

Figure 4-1 Register Set Overview: Floating Point Registers and System Registers Used by the FPU

4.2 Floating Point Registers and System Registers for FPU

4.2.1 Floating Point Register File

The SH-3E provides sixteen 32-bit single-precision floating point registers.
Register designators are always 4 bits. In assembly language, the floating point registers are designated as FR0, FR1, FR2, etc. FR0 functions as the index for FMAC instructions.

4.2.2 Floating Point Communication Register (FPUL)

Information is transferred between the FPU and the CPU through a communication register, FPUL, which is analogous to the MACL and MACH registers of the integer unit. The SH-3E provides this communication register because of the differences between integer format and FPU format. FPUL is a 32-bit system register, accessed on the CPU side by LDS and STS instructions.

4.2.3 Floating Point Status/Control Register (FPSCR)

The SH-3E implements a floating point status and control register, FPSCR, as a system register accessed through the LDS and STS instructions (figure 4-2). FPSCR is available for modification by user programs. The FPSCR is part of the process context. It must be saved across context switches and may need to be saved across procedure calls.

The FPSCR is a 32-bit register that controls FPU rounding and the handling of denormalized values, and captures details about floating point exceptions. In the SH-3E, only the following modes are supported for these functions.

- Rounding mode: rounding toward 0.
- Handling of denormalized values: when denormalized values are in the source or destination operand, they are always treated as 0.
- FPU exceptions: divide by zero (Z) and invalid (V).
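To make the denormalized-value rule in the list above concrete, the following Python sketch detects a denormalized single-precision bit pattern (biased exponent 0 with a nonzero mantissa, as defined in section 4.3.3) and replaces it with zero. It is a software model, not FPU code. Keeping the sign bit of the flushed value is an assumption of this sketch, since the text only says that such values are treated as 0.

```python
import struct


def float_to_bits(x):
    """Return the IEEE754 single-precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def is_denormalized(bits):
    """Biased exponent of 0 with a nonzero mantissa."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0 and mantissa != 0


def flush_denormal(bits):
    """Treat a denormalized operand as 0 (sign bit kept: an assumption)."""
    if is_denormalized(bits):
        return bits & 0x80000000
    return bits


smallest = 0x00000001               # smallest positive denormalized pattern
flushed = flush_denormal(smallest)  # 0x00000000
one = flush_denormal(float_to_bits(1.0))  # unchanged: 0x3F800000
```

Normalized values, zeros, infinities, and NaNs pass through `flush_denormal` unchanged.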
FPSCR bit layout:

    Bits 31-19:  0
    Bit 18:      1
    Bit 17:      0
    Bit 16:      CV  (cause field)
    Bit 15:      CZ  (cause field)
    Bits 14-12:  0
    Bit 11:      EV  (enable field)
    Bit 10:      EZ  (enable field)
    Bits 9-7:    0
    Bit 6:       FV  (flag field)
    Bit 5:       FZ  (flag field)
    Bits 4-2:    0
    Bits 1-0:    0 1

CV: Invalid-operation cause bit
    1: Invalid-operation exception occurred during execution of the current instruction
    0: Invalid-operation exception did not occur
CZ: Divide-by-zero cause bit
    1: Divide-by-zero exception occurred during the execution of the current instruction
    0: Divide-by-zero exception did not occur
EV: Invalid-operation exception enable bit
    1: Enable invalid-operation exception
    0: Disable invalid-operation exception and return qNaN as a result
EZ: Divide-by-zero exception enable bit
    1: Enable divide-by-zero exception
    0: Disable divide-by-zero exception and return correctly signed infinity
FV: Invalid-operation exception flag bit
    1: Invalid-operation exception occurred during execution of the current instruction
    0: Invalid-operation exception did not occur
FZ: Divide-by-zero exception flag bit
    1: Divide-by-zero exception occurred during the execution of the current instruction
    0: Divide-by-zero exception did not occur

Note: With the exception of the above bits, all bits are reserved as shown in the figure and cannot be modified even by the LDS instruction.

Figure 4-2 Floating Point Status/Control Register

The bits in the cause field indicate the cause of an exception during the execution of the current instruction. The cause bits are modified by execution of a floating point instruction. These bits are set to 1 or 0 depending on whether or not an exception condition occurred during the execution of a single instruction.

The bits in the enable field indicate the specific types of exceptions that are enabled to cause an exception, that is, a change of flow to an exception handling procedure. An exception occurs if the enable bit and the corresponding cause bit are both set by the execution of the current instruction.
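The enable and cause rule just stated can be expressed as a bit test. The Python sketch below is illustrative. The bit positions (CV 16, CZ 15, EV 11, EZ 10) are read from Figure 4-2 and should be treated as assumptions to verify against the hardware manual. The names are hypothetical.

```python
# Bit positions as read from Figure 4-2 (assumed, verify before use).
CV_BIT, CZ_BIT = 16, 15   # cause field
EV_BIT, EZ_BIT = 11, 10   # enable field


def fpu_exception_raised(fpscr):
    """True if an enabled exception has its cause bit set."""
    invalid = (fpscr >> EV_BIT) & (fpscr >> CV_BIT) & 1
    divide_by_zero = (fpscr >> EZ_BIT) & (fpscr >> CZ_BIT) & 1
    return bool(invalid | divide_by_zero)


# Divide-by-zero detected (CZ = 1) with the exception enabled (EZ = 1).
raised = fpu_exception_raised((1 << CZ_BIT) | (1 << EZ_BIT))   # True
# Same cause with EZ = 0: no exception, a signed infinity is returned instead.
masked = fpu_exception_raised(1 << CZ_BIT)                     # False
```

A cause bit paired with the enable bit of the other exception type does not raise an exception, because each enable bit gates only its own cause bit.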
The bits in the flag field are used to capture the cumulative effect of all exceptions during the execution of a sequence of instructions. These bits, once set by an instruction, cannot be reset by following instructions. The bits in this field can only be reset by an explicit store operation on FPSCR. See section 4.4, Floating Point Exception Model, for more information on the handling of floating point exceptions.

4.3 Floating Point Format

4.3.1 Floating Point Format

The SH-3E supports single-precision floating point operations. It also conforms fully to the IEEE754 standard. Floating point numbers are composed of three fields:

    Sign field:     s
    Exponent field: e
    Mantissa field: f

The exponent is biased. In other words:

    $e = E + \mathrm{bias}$

The range of unbiased exponents $E$ is $E_{min} - 1$ to $E_{max} + 1$. The two values $E_{min} - 1$ and $E_{max} + 1$ are distinguished as follows: $E_{min} - 1$ represents zero (both positive and negative) and denormalized numbers, while $E_{max} + 1$ represents positive and negative infinity and not-a-number (NaN). In single-precision operations, the bias value is 127, $E_{min}$ is -126, and $E_{max}$ is 127.

Figure 4-3 Floating Point Format (bit 31: s; bits 30-23: e; bits 22-0: f)

The value $v$ of the floating point number is determined as follows:

    If $E = E_{max} + 1$ and $f \neq 0$, then $v$ is not a number (NaN) regardless of the sign $s$.
    If $E = E_{max} + 1$ and $f = 0$, then $v = (-1)^{s} \cdot \infty$ [positive or negative infinity].
    If $E_{min} \leq E \leq E_{max}$, then $v = (-1)^{s} \cdot 2^{E} \cdot (1.f)$ [normalized number].
    If $E = E_{min} - 1$ and $f \neq 0$, then $v = (-1)^{s} \cdot 2^{E_{min}} \cdot (0.f)$ [denormalized number].
    If $E = E_{min} - 1$ and $f = 0$, then $v = (-1)^{s} \cdot 0$ [positive or negative zero].

4.3.2 Not a Number (NaN)

In not-a-number (NaN) expressions in single-precision operations, at least one of bits 22-0 is set. Bit 22 is set for a signaling NaN (sNaN). When bit 22 is reset, the value is a quiet NaN (qNaN).

The following figure shows the bit pattern of the not-a-number (NaN). Bit N in the figure is set for sNaN and reset for qNaN.
An x indicates a don't-care bit. At least one of bits 22-0 must be set. In a not-a-number (NaN), the sign bit is a don't-care bit.

    Bit 31:      x
    Bits 30-23:  11111111
    Bit 22:      N
    Bits 21-0:   xxxxxxxxxxxxxxxxxxxxxx

    N = 1: sNaN
    N = 0: qNaN

Figure 4-4 NaN Bit Pattern

When a signaling NaN (sNaN) is input to an operation that generates a floating point value:

- When the EV bit in the FPSCR is reset, the operation result (output) is qNaN.
- When the EV bit in the FPSCR is set, an invalid operation exception occurs. In this case, the contents of the destination register of the operation do not change.

When qNaN is input to an operation that generates a floating point value and sNaN is not input to the operation, the output will always be qNaN regardless of how the EV bit is set in the FPSCR. No exception will occur.

4.3.3 Denormalized Values

Denormalized floating point values are expressed by a biased exponent of 0, a nonzero mantissa, and a hidden bit of 0. In the SH-3E's floating point unit, denormalized values (operand source or operation result) are uniformly flushed to 0 in floating point operations (other than copy) that generate values.

4.3.4 Other Special Values

Other special values are as stipulated by the IEEE754 standard. Table 4-1 shows the seven different types of special values in floating point value expressions.

Table 4-1 Special Value Expressions in Single-Precision Stipulated in IEEE754

    Value                  Expression
    +0.0                   0x00000000
    -0.0                   0x80000000
    Denormalized number    See section 4.3.3, Denormalized Values
    +INF                   0x7F800000
    -INF                   0xFF800000
    qNaN (quiet NaN)       See section 4.3.2, Not a Number (NaN)
    sNaN (signaling NaN)   See section 4.3.2, Not a Number (NaN)

4.4 Floating Point Exception Model

4.4.1 Enabled Exception

Invalid-operation and divide-by-zero exceptions are enabled by setting the enable bit for the relevant exception (the EV or EZ bit) in FPSCR. All exceptions caused by the FPU are mapped as FPU exception events.
The meaning of an individual exception is determined by software by reading the FPSCR system register and analyzing the information held there.

4.4.2 Disabled Exception

If enable bit EV is not set in FPSCR, the result of an invalid operation will be qNaN (with the exception of FCMP and FTRC). If enable bit EZ is not set, division by zero will return infinity with the sign of the current expression (+ or -).

The other floating-point exceptions specified in the IEEE754 standard (inexact, overflow, and underflow) are not supported by the SH-3E. In these cases, the SH-3E operates as described below.

- An overflow will produce the number whose absolute value is the largest representable finite number in the format, with the correct sign bit.
- An underflow will produce a correctly signed zero.
- If the result of an operation is inexact, the destination register will hold the inexact result.

4.4.3 Exception Event and Code for FPU

All FPU exceptions are mapped onto the single general exception event at address H'120. Loads and stores of system registers FPUL and FPSCR cause the normal memory management general exceptions.

4.4.4 Alignment of Floating Point Data in Memory

Single-precision floating point data is aligned on modulus-4 boundaries, that is, in the same fashion as SH-3E long integers.

4.4.5 Arithmetic with Special Operands

All arithmetic with special operands (qNaN, sNaN, +INF, -INF, +0, -0) follows IEEE754 rules.

4.5 Synchronization Issues

Synchronization with CPU: Floating point and CPU instructions are issued serially in program order, but may complete out of order due to execution cycle differences. A floating point operation that accesses only FPU resources does not require synchronization with the CPU, and subsequent CPU operations can complete before the completion of the floating point operation. An optimized program can therefore hide the execution cycles of a long floating point operation such as divide.
A floating point operation such as compare that accesses CPU resources, however, requires synchronization to ensure program order.

Floating point instructions requiring synchronization: Loads, stores, compares/tests, and instructions accessing FPUL access CPU resources and therefore require synchronization. Loads and stores refer to general registers. Post-increment loads and pre-decrement stores modify general registers. Compares/tests modify the T bit. Instructions accessing FPUL refer to or modify FPUL. These references and modifications must be synchronized with the CPU.

Maintaining program order on exceptions: Floating point instructions are never completed until subsequent CPU instructions are completed. If an FPU exception is detected before subsequent CPU instructions finish, the subsequent CPU instructions are canceled. During execution of a floating point instruction, if a subsequent instruction causes an exception, the floating point instruction is left executing and FPU resources cannot be accessed by other instructions. The other instructions must wait for the floating point operation to complete before they can access those resources. This ensures program order.

Section 5 DSP Operation Functions and Data Transfers (SH3-DSP Only)

DSP operations and data transfers are listed below:

- ALU fixed decimal point operations: These are fixed decimal point operations with either 40-bit (with guard bits) or 32-bit (with no guard bits) fixed decimal point data. These include addition, subtraction, and comparison instructions.
- ALU integer operations: These are integer arithmetic operations with either 24-bit (with guard bits) or 16-bit (with no guard bits) integer data. They include increment and decrement instructions.
- ALU logical operations: These are logical operations with 16-bit logical data. They include AND, OR, and exclusive OR.
- Fixed decimal point multiplication: This is fixed decimal point multiplication (arithmetic operation) of the top 16 bits of fixed decimal point data. Condition bits such as the DC bit are not updated.
- Shift operations: These are arithmetic and logical shift operations. Arithmetic shift operations are arithmetic shifts of 40 bits (with guard bits) or 32 bits (with no guard bits) of fixed decimal point data. Logical shift operations are logical shifts of 16 bits of logical data. The amount of an arithmetic shift is -32 to +32 (negative for right shifts, positive for left shifts); for logical shifts, the amount is -16 to +16.
- MSB detection instruction: This operation finds the amount of shift needed to normalize the data. It finds the position of the MSB in either 40-bit (with guard bits) or 32-bit (with no guard bits) fixed decimal point data and gives it as either 24-bit (with guard bits) or 16-bit (with no guard bits) integer data.
- Rounding operation: Rounds 40-bit fixed decimal point data (with guard bits) to 24 bits, or 32-bit fixed decimal point data (with no guard bits) to 16 bits.
- Data transfers: Data transfers consist of X and Y data transfers, which load or store 16-bit data to and from X and Y memory, and single data transfers, which load and store 16- or 32-bit data from all memories. Two X and Y data transfers can be processed in parallel. Condition bits such as the DC bit are not updated.

The operation instructions include both unconditional operation instructions and instructions that are conditionally executed depending on the DC bit. Condition bits such as the DC bit are not updated by conditional instructions. Their settings vary for arithmetic operations, logical operations, arithmetic shifts, and logical shifts. For MSB detection instructions and rounding instructions, the condition bits are set in the same way as for arithmetic operations.

Arithmetic operations include overflow-preventing instructions (saturation operations).
When saturation operation is specified with the S bit in the SR register, the maximum (positive) or minimum (negative) value is stored when the result of the operation overflows.

5.1 ALU Fixed Decimal Point Operations

5.1.1 Function

ALU fixed decimal point operations basically work with a 32-bit unit to which 8 guard bits are added, for a total of 40 bits. When the source operand is a register without guard bits, the register's sign bit is extended and copied to the guard bits. When the destination operand is a register without guard bits, the lower 32 bits of the operation result are stored in the destination register.

ALU fixed decimal point operations are performed between registers. The source and destination operands are selected independently from the DSP registers. When there are guard bits in the selected register, the operation is also executed on the guard bits. These operations are executed in the DSP stage (the last stage) of the pipeline.

Whenever an ALU arithmetic operation is executed, the DSR register's DC, N, Z, V, and GT bits are updated by the operation result. For conditional instructions, however, the condition bits are not updated even when the specified condition is achieved. For unconditional instructions, the bits are updated according to the operation result.

The condition reflected in the DC bit is selected with the CS[2:0] bits. The DC bit of the PADDC and PSUBC instructions, however, is updated regardless of the CS bit settings. In the PADDC instruction, it is updated as a carry flag; in the PSUBC instruction, it is updated as a borrow flag.

Figure 5-1 shows the ALU fixed decimal point operation flowchart.
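The guard-bit handling and saturation described above can be sketched with Python integers. This is an illustrative model only, not the hardware algorithm. It assumes that saturation clamps the result to the 32-bit signed range (the range of a destination without guard bits), and the function names are hypothetical.

```python
DATA_BITS = 32
TOTAL_BITS = 40  # 32 data bits plus 8 guard bits


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def load_source(reg32):
    """Source without guard bits: its sign bit is copied into the guard bits."""
    return sign_extend(reg32, DATA_BITS)


def alu_add(src1, src2, saturate=False):
    """40-bit addition; with saturate, clamp to the 32-bit range (assumption)."""
    result = src1 + src2
    if saturate:
        result = max(-(1 << 31), min(result, (1 << 31) - 1))
    return sign_extend(result, TOTAL_BITS)


def store_dest32(result40):
    """Destination without guard bits: keep the lower 32 bits."""
    return result40 & 0xFFFFFFFF


a = load_source(0x7FFFFFFF)
b = load_source(0x00000001)
wrapped = store_dest32(alu_add(a, b))                    # 0x80000000
saturated = store_dest32(alu_add(a, b, saturate=True))   # 0x7FFFFFFF
```

Without saturation the stored 32-bit value changes sign, while the full 40-bit result remains positive because the carry moves into the guard bits.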
Figure 5-1 ALU Fixed Decimal Point Operation Flowchart (source 1 and source 2, each with guard bits, feed the ALU; the result goes to the destination with guard bits, and the GT, Z, N, V, and DC bits of the DSR are updated)

When the memory read destination operand is the same as the ALU operation source operand and the data transfer instruction is written on the same line as the ALU operation, data loaded from memory in the memory access stage (MA) cannot be used as the source operand of the ALU operation instruction. When this occurs, the result of the instruction executed first is used as the source operand of the ALU operation, and the register is then updated as the destination operand of the data load instruction. Figure 5-2 is a flowchart of the operation.

Figure 5-2 Sample Processing Flowchart (pipeline slots 1 to 6 for a MOVX instruction followed by a combined MOVX and ADD instruction, showing stages IF, ID, EX (addressing), MA (MOVX), and DSP (NOP or ADD). Instructions shown: MOVX.W @R4+, X0; PADD X0, Y0, A0; MOVX.W @(R4, R8), X0. Annotation: the result of the previous step is used.)

5.1.2 Instructions and Operands

Table 5-1 shows the types of ALU fixed decimal point arithmetic operations. Table 5-2 shows the correspondence between the operands and registers.

Table 5-1 Types of ALU Fixed Decimal Point Arithmetic Operations

    Mnemonic  Function                  Source 1  Source 2  Destination
    PADD      Addition                  Sx        Sy        Dz (Du)
    PSUB      Subtraction               Sx        Sy        Dz (Du)
    PADDC     Addition with carry       Sx        Sy        Dz
    PSUBC     Subtraction with borrow   Sx        Sy        Dz
    PCMP      Compare                   Sx        Sy
    PCOPY     Copy data                 Sx                  Dz
                                                  Sy        Dz
    PABS      Absolute value            Sx                  Dz
                                                  Sy        Dz
    PNEG      Invert sign               Sx                  Dz
                                                  Sy        Dz
    PCLR      Zero clear                                    Dz

Table 5-2 Correspondence between Operands and Registers for ALU Fixed Decimal Point Arithmetic Operations

    Operand  X0     X1   Y0   Y1   M0   M1   A0   A1
    Sx       Yes*1  Yes                      Yes  Yes
    Sy                   Yes  Yes  Yes  Yes
    Dz       Yes    Yes  Yes  Yes  Yes  Yes  Yes  Yes
    Du*2     Yes         Yes                 Yes  Yes

Notes: 1. Yes: Register can be used with operand. 2. Du: Operand when used in combination with multiplication.

5.1.3 DC Bit

The DC bit is set as follows depending on the specification of the CS0-CS2 bits (condition select bits) of the DSR register.
Carry/borrow mode: CS2-CS0 = 000: The DC bit indicates whether a carry or borrow has occurred from the MSB of the operation result. The guard bits have no effect on this. This mode is the default. Figure 5-3 shows examples in which carries and borrows occur.

Figure 5-3 Examples of Carries and Borrows (example 1: carry; example 2: carry; example 3: borrow; example 4: borrow; each example shows the operands with their guard bits and marks the position where the carry or borrow is detected)

Negative mode: CS2-CS0 = 001: In this mode, the DC bit is the same as the MSB of the operation result. When the result is negative, the DC bit is 1. When the result is positive, the DC bit is 0. ALU arithmetic operations are always done in 40 bits. The sign bit indicating positive or negative is thus the MSB of the operation result including the guard bits, rather than the MSB of the destination operand. Figure 5-4 shows an example of distinguishing negative from positive. In this mode, the DC bit has the same value as the condition bit N.

Figure 5-4 Distinguishing Negative and Positive (example 1: negative; example 2: positive; the sign bit is the MSB of the guard bits)

Zero mode: CS2-CS0 = 010: The DC bit indicates whether the operation result is zero. When it is, the DC bit is 1. When the operation result is nonzero, the DC bit is 0.
In this mode, the DC bit has the same value as the condition bit Z.

Overflow mode: CS2-CS0 = 011: The DC bit indicates whether the operation result has caused an overflow. When the operation result without the guard bits has exceeded the bounds of the destination register, the DC bit is set to 1. The DC bit is evaluated as if there were no guard bits, so a result is treated as an overflow even when guard bits are present. This means that the DC bit is always set to 1 when a large value extends into the guard bits. In this mode, the DC bit has the same value as the condition bit V. Figure 5-5 shows an example of distinguishing overflows.

Figure 5-5 Distinguishing Overflows (example 1: overflow; example 2: no overflow; the overflow detection range is marked on each operand)

Signed greater than mode: CS2-CS0 = 100: The DC bit indicates whether the source 1 data (signed) is greater than the source 2 data (signed) in the result of a comparison instruction PCMP. For that reason, the PCMP instruction is executed before the DC bit is checked in this mode. When the source 1 data is larger than the source 2 data, the result of the comparison is positive, so this mode is similar to the negative mode. When the source 1 data is larger than the source 2 data and the bounds of the destination operand are exceeded, however, the sign of the result of the comparison becomes negative. The DC bit is updated. In this mode, the DC bit has the same value as the condition bit GT. The equation shown below defines the DC bit in this mode, where VR becomes a positive value when the result, including the guard bit area, exceeds the display range of the destination operand.
dc bit = ~{(n bit ^ vr) | z bit}

when the pcmp instruction is executed in this mode, the dc bit has the same value as the t bit that indicates the result of the sh core's cmp/gt instruction. in this mode, the dc bit is updated according to the above definition for instructions other than the pcmp instruction as well.

signed greater than or equal to mode: cs2-cs0 = 101: the dc bit indicates whether the source 1 data (signed) is greater than or equal to the source 2 data (signed) as the result of the comparison instruction pcmp. for that reason, execute the pcmp instruction before checking the dc bit in this mode. this mode is similar to the signed greater than mode except that it also accepts equal operands. the equation shown below defines the dc bit in this mode. however, vr becomes a positive value when the result, including the guard bit area, exceeds the display range of the destination operand.

dc bit = ~(n bit ^ vr)

when the pcmp instruction is executed in this mode, the dc bit has the same value as the t bit that indicates the result of the superh core's cmp/ge instruction. in this mode, the dc bit is updated according to the above definition for instructions other than the pcmp instruction as well.

5.1.4 condition bits

the condition bits are set as follows:

the n (negative) bit has the same value as the dc bit when the cs bits specify negative mode. when the operation result is negative, the n bit is 1. when the operation result is positive, the n bit is 0.

the z (zero) bit has the same value as the dc bit when the cs bits specify zero mode. when the operation result is zero, the z bit is 1. when the operation result is nonzero, the z bit is 0.

the v (overflow) bit has the same value as the dc bit when the cs bits specify overflow mode. when the operation result, not including the guard bits, exceeds the bounds of the destination register, the v bit is 1. otherwise, the v bit is 0.
the gt (greater than) bit has the same value as the dc bit when the cs bits specify signed greater than mode. when the comparison result indicates that the source 1 data is greater than the source 2 data, the gt bit is 1. otherwise, the gt bit is 0.

5.1.5 overflow prevention function (saturation operation)

when the s bit of the sr register is set to 1, the overflow prevention function is engaged for the alu fixed decimal point arithmetic operation executed by the dsp unit. when the operation result overflows, the maximum (positive) or minimum (negative) value is stored.

5.2 alu integer operations

alu integer operations are basically 24-bit operations on the top word (the top 16 bits, or bits 16 through 31) and 8 guard bits. in alu integer operations, the bottom word of the source operand (the bottom 16 bits, or bits 0-15) is ignored and the bottom word of the destination operand is cleared with zeros. when the source operand has no guard bits, the sign bit is extended to fill the guard bits. when the destination operand has no guard bits, the top word of the operation result (not including the guard bits) is stored in the top word of the destination register.

integer operations are basically the same as alu fixed decimal point arithmetic operations. there are only two types of integer operation instructions, increment and decrement, which change the second operand by +1 or -1. 16 bits of integer data (word data) are loaded to the dsp register and stored in the top word. the operation is performed using the top word in the dsp register. when there are guard bits, they are valid as well. these operations are executed in the dsp stage (the last stage) of the pipeline.

whenever an alu integer arithmetic operation is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result. this is the same as for alu fixed decimal point operations.
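As a rough illustration of the integer data path described in section 5.2, the sketch below models an increment or decrement on the top word plus guard bits. It is only a sketch under stated assumptions: registers are modeled as plain Python integers, the source is assumed to be a 32-bit register without guard bits, and the names `sign_extend` and `integer_step` are hypothetical helpers, not part of any instruction set or library. Flag updates are not modeled.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def integer_step(src32: int, step: int, s_bit: bool = False) -> int:
    """Hypothetical model of an increment (step=+1) or decrement (step=-1).

    src32 is a 32-bit source register without guard bits. The return value
    is a 40-bit result: 8 guard bits, the top word, and a cleared bottom word.
    """
    # The bottom word of the source is ignored; the sign bit fills the guard bits.
    top = sign_extend(src32 >> 16, 16)
    result = top + step
    if s_bit:
        # Overflow prevention: clamp to the 16-bit two's complement range.
        result = max(-0x8000, min(0x7FFF, result))
    # Keep 24 bits (guard bits + top word) and clear the bottom word.
    return (result & 0xFFFFFF) << 16
```

For example, under these assumptions `integer_step(0x7FFF0000, +1)` carries into the guard bits, while the same call with `s_bit=True` stays at the maximum value.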
for conditional instructions, condition bits and flags are not updated even when the specified condition is met and the instruction is executed. for unconditional instructions, the bits are always updated according to the operation result. figure 5-6 shows the alu integer operation flowchart.

figure 5-6 alu integer operation flowchart (source 1 and source 2, each with guard bits, feed the alu; the result goes to the destination and updates the gt, v, n, z, and dc bits of the dsr; legend: cleared to 0, ignored)

table 5-3 lists the types of alu integer operations. table 5-4 shows the correspondence between the operands and registers.

table 5-3 types of alu integer operations

mnemonic  function        source 1  source 2  destination
pinc      increment by 1  sx        (+1)      dz
                          (+1)      sy        dz
pdec      decrement by 1  sx        (-1)      dz
                          (-1)      sy        dz

table 5-4 correspondence between operands and registers for alu integer operations

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes                      yes  yes
sy                 yes  yes  yes  yes
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand.

when the s bit of the sr register is set to 1, the overflow prevention function (saturation operation) is engaged. the overflow prevention function can be specified for alu integer arithmetic operations executed by the dsp unit. when the operation result overflows, the maximum (positive) or minimum (negative) value is stored.

5.3 alu logical operations

5.3.1 function

alu logical operations are performed between registers. the source and destination operands are selected independently from the dsp registers. these operations use only the top word of the respective operands. the bottom word and guard bits of the source operand are ignored, and the bottom word and guard bits of the destination operand are cleared with zeros. these operations are executed in the dsp stage (the last stage) of the pipeline.

whenever an alu logical operation is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result.
for conditional instructions, condition bits and flags are not updated even when the specified condition is met and the instruction is executed. for unconditional instructions, the bits are always updated according to the operation result. the dc bit is updated as specified in the cs bits. figure 5-7 shows the alu logical operation flowchart.

figure 5-7 alu logical operation flowchart (source 1 and source 2 feed the alu; the result goes to the destination and updates the gt, v, n, z, and dc bits of the dsr; legend: cleared to 0, ignored)

5.3.2 instructions and operands

table 5-5 lists the types of alu logical arithmetic operations. table 5-6 shows the correspondence between the operands and registers, which is the same as for alu fixed decimal point operations.

table 5-5 types of alu logical arithmetic operations

mnemonic  function      source 1  source 2  destination
pand      and           sx        sy        dz
por       or            sx        sy        dz
pxor      exclusive or  sx        sy        dz

table 5-6 correspondence between operands and registers for alu logical arithmetic operations

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes                      yes  yes
sy                 yes  yes  yes  yes
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand.

5.3.3 dc bit

the dc bit is set in logical operations as follows:

carry/borrow mode: cs2-cs0 = 000: the dc bit is always 0.

negative mode: cs2-cs0 = 001: in this mode, the dc bit is the same as bit 31 of the operation result. in this mode, the dc bit has the same value as bit n.

zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is zero; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.

overflow mode: cs2-cs0 = 011: the dc bit is always 0. in this mode, the dc bit has the same value as bit v.

signed greater than mode: cs2-cs0 = 100: the dc bit is always 0. in this mode, the dc bit has the same value as bit gt.

signed greater than or equal to mode: cs2-cs0 = 101: the dc bit is always 0.

5.3.4 condition bits

the condition bits are set as follows.
the n bit is the value of bit 31 of the operation result.

the z bit is 1 when the operation result is zero; otherwise, the z bit is 0.

the v bit is always 0.

the gt bit is always 0.

5.4 fixed decimal point multiplication

multiplication in the dsp unit is between signed single-length operands. it is processed in one cycle. when double-length multiplication is needed, use the superh risc engine's double-length multiplication.

basically, the operation result for multiplication is 32 bits. when a register that has guard bits is specified as the destination operand, the result is sign-extended.

in the dsp unit, multiplication is a fixed decimal point arithmetic operation, not an integer operation. this means the top words of the multiplier and multiplicand are entered into the mac operator. in superh risc engine multiplication, the bottom words of the two operands are entered into the mac operator. the operation result is thus different from that of the superh risc engine. the superh risc engine operation result is aligned to the lsb of the destination, while the fixed decimal point multiplication operation result is aligned to the msb. the lsb of the operation result in fixed decimal point multiplication is thus always 0. figure 5-8 shows a flowchart of fixed decimal point multiplication.

figure 5-8 fixed decimal point multiplication flowchart (the top words of the two source operands feed the mac; the bottom words are ignored; the result is written to the destination with the sign s extended into the guard bits and the lsb set to 0)

table 5-7 shows the fixed decimal point multiplication instruction. table 5-8 shows the correspondence between the operands and registers.

table 5-7 fixed decimal point multiplication

mnemonic  function               source 1  source 2  destination
pmuls     signed multiplication  se        sf        dg

table 5-8 correspondence between operands and registers for fixed decimal point multiplication

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes                      yes  yes
sy                 yes  yes  yes  yes
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand.
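The alignment of the fixed decimal point product to the msb can be made concrete with a small model. The sketch below is illustrative only: registers are plain Python integers, the function names `s16` and `pmuls_model` are hypothetical, and the handling of the single overflow case (h'8000 times h'8000) follows the description in this section rather than any verified hardware behavior.

```python
def s16(word: int) -> int:
    """Interpret a 16-bit word as a signed two's complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def pmuls_model(se: int, sf: int, s_bit: bool = False) -> int:
    """Hypothetical model of signed fixed decimal point multiplication.

    se and sf are 32-bit source registers; only their top words are used.
    Returns a 40-bit value with the 32-bit product sign-extended into
    the guard bits.
    """
    a = s16(se >> 16)
    b = s16(sf >> 16)
    # Shifting left by one aligns the product to the msb,
    # which is why the lsb of the result is always 0.
    product = (a * b) << 1
    if product > 0x7FFFFFFF:
        # Only reachable for (-1.0) x (-1.0).
        product = 0x7FFFFFFF if s_bit else product - (1 << 32)
    return product & 0xFFFFFFFFFF
```

With these assumptions, 0.5 times 0.5 (`0x4000` in both top words) yields `0x0020000000`, that is, 0.25 with the guard bits clear.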
dsp unit fixed decimal point multiplication completes a single-length 16-bit x 16-bit operation in one cycle. other multiplication is the same as in the superh risc engines.

multiplication instructions do not update the dc, n, z, v, gt, or any other condition bit of the dsr register.

the overflow prevention function is valid for dsp unit multiplication. specify it by setting the s bit of the sr register to 1. when an overflow or underflow occurs, the operation result is the maximum or minimum value, respectively. in dsp unit fixed decimal point multiplication, overflows only occur for h'8000 x h'8000 ((-1.0) x (-1.0)). when the s bit is 0, the operation result is h'80000000, which means -1.0 rather than the correct answer of +1.0. when the s bit is 1, the overflow prevention function is engaged and the result is h'007fffffff.

5.5 shift operations

the amount of shift in shift operations is specified either through a register or with an immediate value. the other source operands and the destination operands are registers. there are two types of shift operations: arithmetic and logical. table 5-9 shows the operation types. the correspondence between operands and registers is the same as for alu fixed decimal point operations, except for immediate operands. the correspondence is shown in table 5-10.

table 5-9 types of shift operations

mnemonic         function                              source 1  source 2  destination
psha sx, sy, dz  arithmetic shift                      sx        sy        dz
pshl sx, sy, dz  logical shift                         sx        sy        dz
psha #imm, dz    arithmetic shift with immediate data  dz        imm1      dz
pshl #imm, dz    logical shift with immediate data     dz        imm2      dz

note: -32 <= imm1 <= +32, -16 <= imm2 <= +16

table 5-10 correspondence between operands and registers for shift operations

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes                      yes  yes
sy                 yes  yes  yes  yes
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand.
5.5.1 arithmetic shift operations

function: alu arithmetic shift operations basically work with a 32-bit unit to which 8 guard bits are added for a total of 40 bits. alu fixed decimal point operations are basically performed between registers. when the source operand has no guard bits, the register's sign bit is copied to the guard bits. when the destination operand has no guard bits, the lower 32 bits of the operation result are stored in the destination register.

in arithmetic shifts, all bits of the source 1 operand and destination operand are valid. the source 2 operand, which specifies the shift amount, is integer data. the source 2 operand is specified as a register or immediate operand. the valid amount of shift is -32 to +32. negative values are shifts to the right; positive values are shifts to the left. values between -64 and +63 can be specified for the source 2 operand, but only -32 to +32 is valid. when an invalid number is specified, the results cannot be guaranteed. when an immediate value is specified for the shift amount, the source 1 operand must be the same as the destination operand. the action of the operation is the same as for fixed decimal point operations and is executed in the dsp stage (the last stage) of the pipeline.

whenever an arithmetic shift operation is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result. this is the same as for alu fixed decimal point operations. for conditional instructions, condition bits are not updated even when the specified condition is met and the instruction is executed. for unconditional instructions, the bits are always updated according to the operation result. figure 5-9 shows the arithmetic shift operation flowchart.
figure 5-9 arithmetic shift operation flowchart (shift amount data from source 2 or imm1, +32 to -32; left shift when the amount is 0 or more, right shift with copy of the msb when it is less than 0; the bit shifted out and the result in dz update the gt, dc, z, n, and v bits of the dsr; legend: ignored)

dc bit: the dc bit is set as follows depending on the mode specified by the cs bits:

carry/borrow mode: cs2-cs0 = 000: the dc bit is the value of the bit pushed out by the last shift.

negative mode: cs2-cs0 = 001: set to 1 for a negative operation result and 0 for a positive operation result. in this mode, the dc bit has the same value as bit n.

zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is zero; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.

overflow mode: cs2-cs0 = 011: the dc bit is set to 1 by an overflow. in this mode, the dc bit has the same value as bit v.

signed greater than mode: cs2-cs0 = 100: the dc bit is always 0. in this mode, the dc bit has the same value as bit gt.

signed greater than or equal to mode: cs2-cs0 = 101: the dc bit is always 0.

condition bits: the condition bits are set as follows:

the n bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 for a negative operation result and 0 for a positive operation result.

the z bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 when the operation result is zero; otherwise, the z bit is 0.

the v bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 for an overflow.

the gt bit is always 0.

overflow prevention function (saturation operation): when the s bit of the sr register is set to 1, the overflow prevention function is engaged for the alu fixed decimal point arithmetic operation executed by the dsp unit. when the operation result overflows, the maximum (positive) or minimum (negative) value is stored.
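The arithmetic shift rules of section 5.5.1 can be sketched as follows. This is a minimal model under stated assumptions: the source is a 32-bit register without guard bits, the shift amount is passed as an ordinary signed integer rather than being read from a register field, and the names `sign_extend` and `psha_model` are hypothetical. The value returned for the shifted-out bit when the amount is 0 is an assumption, since the text does not cover that case.

```python
MASK40 = (1 << 40) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def psha_model(src32: int, amount: int) -> tuple[int, int]:
    """Hypothetical model of an arithmetic shift on a 40-bit value.

    Returns (40-bit result, last bit shifted out). The second item is what
    carry/borrow mode would copy to the dc bit.
    """
    if not -32 <= amount <= 32:
        raise ValueError("only shift amounts from -32 to +32 are valid")
    # No guard bits in the source: the sign bit is copied into them.
    value = sign_extend(src32, 32)
    if amount > 0:
        shifted = value << amount               # positive amount: left shift
        last_out = (shifted >> 40) & 1          # bit pushed out of the 40-bit field
        return shifted & MASK40, last_out
    if amount < 0:
        count = -amount                         # negative amount: right shift
        last_out = (value >> (count - 1)) & 1
        return (value >> count) & MASK40, last_out   # msb is copied in
    return value & MASK40, 0                    # assumption: nothing shifted out
```

For instance, `psha_model(0x80000000, -4)` returns `(0xFFF8000000, 0)`: the sign is preserved through the guard bits and the bits shifted out on the right are all 0.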
5.5.2 logical shift operations

function: logical shift operations use the top words of the source 1 operand and the destination operand. as in alu logical operations, the guard bits and bottom word of the operands are ignored. the source 2 operand, which specifies the shift amount, is integer data. the source 2 operand is specified as a register or immediate operand. the valid amount of shift is -16 to +16. negative values are shifts to the right; positive values are shifts to the left. values between -32 and +31 can be specified for the source 2 operand, but only -16 to +16 is valid. when an invalid number is specified, the results cannot be guaranteed. when an immediate value is specified for the shift amount, the source 1 operand must be the same as the destination operand. the action of the operation is the same as for fixed decimal point operations and is executed in the dsp stage (the last stage) of the pipeline.

whenever a logical shift operation is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result. this is the same as for alu logical operations. for conditional instructions, condition bits are not updated even when the specified condition is met and the instruction is executed. for unconditional instructions, the bits are always updated according to the operation result. figure 5-10 shows the logical shift operation flowchart.

figure 5-10 logical shift operation flowchart (shift amount data from source 2 or imm2, +16 to -16; left shift when the amount is 0 or more, right shift with 0 shifted in when it is less than 0; the bit shifted out and the result in dz update the gt, dc, z, n, and v bits of the dsr; legend: ignored, cleared to 0)

dc bit: the dc bit is set as follows depending on the mode specified by the cs bits.

carry/borrow mode: cs2-cs0 = 000: the dc bit is the value of the bit pushed out by the last shift.

negative mode: cs2-cs0 = 001: in this mode, the dc bit is the same as bit 31 of the operation result.
in this mode, the dc bit has the same value as bit n.

zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is all zeros; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.

overflow mode: cs2-cs0 = 011: the dc bit is always 0. in this mode, the dc bit has the same value as bit v.

signed greater than mode: cs2-cs0 = 100: the dc bit is always 0. in this mode, the dc bit has the same value as bit gt.

signed greater than or equal to mode: cs2-cs0 = 101: the dc bit is always 0.

condition bits: the condition bits are set as follows.

the n bit is the same as the result of the alu logical operation. it is set to the value of bit 31 of the operation result.

the z bit is the same as the result of the alu logical operation. it is set to 1 when the operation result is all zeros; otherwise, the z bit is 0.

the v bit is always 0.

the gt bit is always 0.

5.6 the msb detection instruction

5.6.1 function

the msb detection instruction (pdmsb: most significant bit detection) finds the amount of shift needed to normalize the data. the operation result is handled in the same way as for alu integer operations. basically, the top 16 bits and 8 guard bits are valid, for a total of 24 bits. when the destination operand is a register that has no guard bits, the result is stored in the top 16 bits of the destination register.

the msb detection instruction works on all bits of the source operand, but produces its operation result as integer data. this is because the shift amount for normalization must be integer data for the arithmetic shift operation. the action of the operation is the same as for fixed decimal point operations and is executed in the dsp stage (the last stage) of the pipeline.

whenever a pdmsb instruction is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result. for conditional instructions, condition bits are not updated even when the specified condition is met and the instruction is executed.
for unconditional instructions, the bits are always updated according to the operation result. figure 5-11 shows the msb detection instruction flowchart. table 5-11 shows the relationship between source data and destination data.

figure 5-11 msb detection flowchart (source 1 or 2, including guard bits, feeds a priority encoder; the result goes to the destination and updates the gt, v, n, z, and dc bits of the dsr; legend: cleared to 0)

table 5-11 relationship between source data and destination data

source data                       destination (result)
guard bits  top and bottom word   guard bits  top word     top word     decimal
7g-0g       bits 31-0             7g-0g       bits 31-22   bits 21-16
00000000    0000 ... 0000         all 0       all 0        011111       +31
00000000    0000 ... 0001         all 0       all 0        011110       +30
00000000    0000 ... 001*         all 0       all 0        011101       +29
00000000    0000 ... 01**         all 0       all 0        011100       +28
:           :                     :           :            :            :
00000000    0001 ... ****         all 0       all 0        000010       +2
00000000    001* ... ****         all 0       all 0        000001       +1
00000000    01** ... ****         all 0       all 0        000000       0
00000000    1*** ... ****         all 1       all 1        111111       -1
00000001    **** ... ****         all 1       all 1        111110       -2
:           :                     :           :            :            :
01******    **** ... ****         all 1       all 1        111000       -8
10******    **** ... ****         all 1       all 1        111000       -8
:           :                     :           :            :            :
11111110    **** ... ****         all 1       all 1        111110       -2
11111111    0*** ... ****         all 1       all 1        111111       -1
11111111    10** ... ****         all 0       all 0        000000       0
11111111    110* ... ****         all 0       all 0        000001       +1
11111111    1110 ... ****         all 0       all 0        000010       +2
:           :                     :           :            :            :
11111111    1111 ... 10**         all 0       all 0        011100       +28
11111111    1111 ... 110*         all 0       all 0        011101       +29
11111111    1111 ... 1110         all 0       all 0        011110       +30
11111111    1111 ... 1111         all 0       all 0        011111       +31

note: don't care bits (*) have no effect.

5.6.2 instructions and operands

table 5-12 shows the msb detection instruction. the correspondence between the operands and registers is the same as for alu fixed decimal point operations. it is shown in table 5-13.
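The relationship in table 5-11 can be summarized as counting the run of identical bits at the top of the 40-bit source. The sketch below is an interpretation, not a description of the hardware priority encoder: the formula "run length minus 9" is inferred from the table rows (for example, +31 for all zeros or all ones, 0 for an already normalized value, -8 when bits 7g and 6g differ), and `pdmsb_model` is a hypothetical name.

```python
def pdmsb_model(src40: int) -> tuple[int, int]:
    """Hypothetical model of msb detection on a 40-bit source value.

    Returns (shift amount as a signed integer, 40-bit destination image).
    The destination holds the amount in the guard bits and top word as
    integer data, with the bottom word cleared to 0.
    """
    sign = (src40 >> 39) & 1
    run = 0
    # Count how many bits, starting at bit 7g, are equal to the sign bit.
    for bit in range(39, -1, -1):
        if (src40 >> bit) & 1 != sign:
            break
        run += 1
    # Inferred from table 5-11: 9 equal leading bits means "already normalized".
    amount = run - 9
    return amount, (amount & 0xFFFFFF) << 16
```

A positive amount is the left shift that would normalize the value, and a negative amount is the right shift needed when significant bits extend into the guard bits.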
table 5-12 msb detection instruction

mnemonic  function       source 1  source 2  destination
pdmsb     msb detection  sx                  dz
                                   sy        dz

table 5-13 correspondence between operands and registers for msb detection instructions

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes                      yes  yes
sy                 yes  yes  yes  yes
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand.

5.6.3 dc bit

the dc bit is set as follows depending on the mode specified by the cs bits:

carry/borrow mode: cs2-cs0 = 000: the dc bit is always 0.

negative mode: cs2-cs0 = 001: set to 1 for a negative operation result and 0 for a positive operation result. in this mode, the dc bit has the same value as bit n.

zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is zero; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.

overflow mode: cs2-cs0 = 011: the dc bit is always 0. in this mode, the dc bit has the same value as bit v.

signed greater than mode: cs2-cs0 = 100: set to 1 for a positive operation result and 0 for a negative operation result. in this mode, the dc bit has the same value as bit gt.

signed greater than or equal to mode: cs2-cs0 = 101: set to 1 for a positive or zero operation result and 0 for a negative operation result.

5.6.4 condition bits

the condition bits are set as follows.

the n bit is the same as the result of the alu integer operation. it is set to 1 for a negative operation result and 0 for a positive operation result.

the z bit is the same as the result of the alu integer operation. it is set to 1 when the operation result is zero; otherwise, the z bit is 0.

the v bit is always 0.

the gt bit is the same as the result of the alu integer operation. it is set to 1 for a positive operation result and otherwise to 0.

5.7 rounding

5.7.1 operation function

the dsp unit has a function for rounding 32-bit values to 16-bit values. when the value has guard bits, 40 bits are rounded to 24 bits.
when the rounding instruction is executed, h'00008000 is added to the source operand and the bottom word is then cleared to zeros. rounding uses all bits of the source and destination operands. the action of the operation is the same as for fixed decimal point operations and is executed in the dsp stage (the last stage) of the pipeline. the rounding instruction is unconditional. the dsr register's dc, n, z, v, and gt bits are thus always updated according to the operation result. figure 5-12 shows the rounding flowchart. figure 5-13 shows the rounding process definitions.

figure 5-12 rounding flowchart (h'00008000 is added to source 1 or 2, including guard bits, in the alu; the result goes to the destination and updates the gt, v, n, z, and dc bits of the dsr; legend: cleared to 0)

figure 5-13 rounding process definitions (rounding result plotted against the actual (analog) value; marked results h'000001 and h'000002; marked actual values h'0000018000, h'0000020000, and h'0000028000)

5.7.2 instructions and operands

table 5-14 shows the instruction. the correspondence between the operands and registers is the same as for alu fixed decimal point operations. it is shown in table 5-15.

table 5-14 rounding instruction

mnemonic  function  source 1  source 2  destination
prnd      rounding  sx                  dz
                              sy        dz

table 5-15 correspondence between operands and registers for rounding instruction

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes                      yes  yes
sy                 yes  yes  yes  yes
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand.

5.7.3 dc bit

the dc bit is updated as follows depending on the mode specified by the cs bits. condition bits are updated as for alu fixed decimal point arithmetic operations.

carry/borrow mode: cs2-cs0 = 000: the dc bit is set to 1 when a carry or borrow from the msb of the operation result occurs; otherwise, it is set to 0.

negative mode: cs2-cs0 = 001: set to 1 for a negative operation result and 0 for a positive operation result. in this mode, the dc bit has the same value as bit n.
zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is zero; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.

overflow mode: cs2-cs0 = 011: the dc bit is set to 1 by an overflow; otherwise, it is set to 0. in this mode, the dc bit has the same value as bit v.

signed greater than mode: cs2-cs0 = 100: set to 1 for a positive operation result; otherwise, it is set to 0. in this mode, the dc bit has the same value as bit gt.

signed greater than or equal to mode: cs2-cs0 = 101: set to 1 for a positive or zero operation result; otherwise, it is set to 0.

5.7.4 condition bits

the condition bits are set as follows. they are updated as for alu fixed decimal point arithmetic operations.

the n bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 for a negative operation result and 0 for a positive operation result.

the z bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 when the operation result is zero; otherwise, the z bit is 0.

the v bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 for an overflow; otherwise, the v bit is 0.

the gt bit is the same as the result of the alu fixed decimal point arithmetic operation and the alu integer operation. it is set to 1 for a positive operation result; otherwise, the gt bit is 0.

5.7.5 overflow prevention function (saturation operation)

when the s bit of the sr register is set to 1, the overflow prevention function can be specified for all rounding processing executed by the dsp unit. when the operation result overflows, the maximum (positive) or minimum (negative) value is stored.

5.8 condition select bits (cs) and the dsp condition bit (dc)

dsp instructions may be either conditional or unconditional.
unconditional instructions are executed without regard to the dsp condition bit (dc bit), but conditional instructions may reference the dc bit before they are executed. with unconditional instructions, the dsr register's dc bit and condition bits (n, z, v, and gt) are updated according to the results of the alu operation or shift operation. with conditional instructions, the dc bit and condition bits (n, z, v, and gt) are not updated, regardless of whether the instruction is executed.

the dc bit is updated according to the specifications of the condition select (cs) bits. updates differ for arithmetic operations, logical operations, arithmetic shifts and logical shifts. table 5-16 shows the relationship between the cs bits and the dc bit.

table 5-16 condition select bits (cs) and dsp condition bit (dc)

cs bits (2 1 0) = 0 0 0, condition mode: carry/borrow
the dc bit is set to 1 when a carry or borrow occurs in the result of an alu arithmetic operation. otherwise, it is cleared to 0. in logical operations, the dc bit is always cleared to 0. for shift operations (the psha and pshl instructions), the bit shifted out last is copied to the dc bit.

cs bits (2 1 0) = 0 0 1, condition mode: negative
in alu arithmetic operations or arithmetic shifts (psha), the msb of the result (including the guard bits) is copied to the dc bit. in alu logical operations and logical shifts (pshl), the msb of the result (not including the guard bits) is copied to the dc bit.

cs bits (2 1 0) = 0 1 0, condition mode: zero
when the result of an alu or shift operation is all zeros (0), the dc bit is set to 1. otherwise, it is cleared to 0.

cs bits (2 1 0) = 0 1 1, condition mode: overflow
in alu arithmetic operations or arithmetic shifts (psha), when the operation result (not including the guard bits) exceeds the destination register's value range, the dc bit is set to 1. otherwise, it is cleared to 0. in alu logical operations and logical shifts (pshl), the dc bit is always cleared to 0.

cs bits (2 1 0) = 1 0 0, condition mode: signed greater than
this mode is like the greater than or equal to mode, but the dc bit is cleared to 0 when the operation result is zero (0). when the operation result (including the guard bits) exceeds the expressible limits, the true condition is vr.
dc bit = ~{(n bit ^ vr) | z bit}; for arithmetic operations
dc bit = 0; for logical operations

cs bits (2 1 0) = 1 0 1, condition mode: greater than or equal to
in alu arithmetic operations or arithmetic shifts (psha), when the result does not overflow, the value is the inversion of the negative mode's dc bit. when the operation result (including the guard bits) exceeds the expressible limits, the value is the same as the negative mode's dc bit. in alu logical operations and logical shifts (pshl), the dc bit is always cleared to 0.
dc bit = ~(n bit ^ vr); for arithmetic operations
dc bit = 0; for logical operations

cs bits (2 1 0) = 1 1 0 and 1 1 1: reserved

5.9 overflow prevention function (saturation operation)

the overflow prevention function (saturation operation) is specified by the s bit of the sr register. this function is valid for arithmetic operations executed by the dsp unit and multiply and accumulate operations executed by the cpu core. an overflow occurs when the operation result exceeds the bounds that can be expressed as a two's complement value (not including the guard bits).

table 5-17 shows the overflow definitions for fixed decimal point arithmetic operations. table 5-18 shows the overflow definitions for integer arithmetic operations. multiply/accumulate calculation instructions (mac) supported by previous superh risc engines are performed on 64-bit registers (mach and macl), so the overflow value differs from the maximum and minimum values. they are defined exactly the same as before.
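The effect of the s bit on a 32-bit fixed decimal point addition can be sketched as below. This is an illustrative model only: registers are plain Python integers, the sources are assumed to have no guard bits, and `sign_extend` and `saturating_add` are hypothetical names. Reporting the overflow flag as 0 when saturation is active follows the statement in this section that overflows do not occur when the function is specified.

```python
MASK40 = (1 << 40) - 1
MAX32 = 0x7FFFFFFF      # largest positive value, h'007fffffff as 40 bits
MIN32 = -0x80000000     # most negative value, h'ff80000000 as 40 bits


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def saturating_add(a32: int, b32: int, s_bit: bool) -> tuple[int, int]:
    """Hypothetical model of a 40-bit add with optional overflow prevention.

    Returns (40-bit result, overflow flag).
    """
    total = sign_extend(a32, 32) + sign_extend(b32, 32)
    # Overflow is judged on the 32-bit range, not including the guard bits.
    overflow = not (MIN32 <= total <= MAX32)
    if s_bit and overflow:
        total = MAX32 if total > 0 else MIN32
        overflow = False    # with saturation engaged, no overflow is reported
    return total & MASK40, int(overflow)
```

Without saturation, `saturating_add(0x7FFFFFFF, 1, False)` spills into the guard bits and reports an overflow; with `s_bit=True` the result is held at the maximum value.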
table 5-17 overflow definitions for fixed decimal point arithmetic operations

sign      overflow condition     maximum/minimum  hexadecimal display
positive  result > 1 - 2^-31     1 - 2^-31        007fffffff
negative  result < -1            -1               ff80000000

table 5-18 overflow definitions for integer arithmetic operations

sign      overflow condition     maximum/minimum  hexadecimal display
positive  result > 2^15 - 1      2^15 - 1         007fff****
negative  result < -2^15         -2^15            ff8000****

note: don't care bits (*) have no effect.

when the overflow prevention function is specified, overflows do not occur. naturally, the overflow bit (v bit) is not set. when the cs bits specify overflow mode, the dc bit is not set either.

5.10 data transfers

the sh3-dsp can perform up to two data transfers in parallel between the dsp registers and on-chip memory with the dsp unit. the sh-dsp has the following types of data transfers:

1. x and y memory data transfers: data transfer to x and y memory using the xdb and ydb buses
   double data transfer: data transfer only, where transfer in one direction only is permitted
   parallel data transfers: data transfer that proceeds in parallel to alu operation processing
2. single data transfers: data transfer to on-chip memory using the ldb bus

note: data transfer instructions do not update the dsr register's condition bits.

table 5-19 shows the various functions.

table 5-19 data transfer functions

category                      bus               length           parallel processing   parallel processing          instruction
                                                                 with alu operation    with data transfer           length
x and y memory data transfer  xdb bus, ydb bus  16 bits          none (double)         none (xdb or ydb bus)        16 bits
                                                                 none (double)         available (xdb and ydb bus)  16 bits
                                                                 available (parallel)  none (xdb or ydb bus)        32 bits
                                                                 available (parallel)  available (xdb and ydb bus)  32 bits
single data transfer          ldb bus           32 bits, 16 bits none                  none                         16 bits

5.10.1 x and y memory data transfer

x and y memory data transfers allow two data transfers to be executed in parallel and allow data transfers to be executed in parallel with dsp data operations.
32-bit instruction code is required for executing dsp data operations and transfers in parallel. this is called a parallel data transfer. when executing an x and y memory data transfer by itself, 16-bit instruction code is used. this is called a double data transfer.

data transfers consist of x memory data transfers and y memory data transfers. x memory data is loaded to either the x0 or x1 register; y memory data is loaded to the y0 or y1 register. the x0, x1, y0, and y1 registers become the destination registers. data can be stored in the x and y memory if the a0 or a1 register is the source register. all these data transfers involve word data (16 bits). data is transferred from the top word of the source register. data is transferred to the top word of the destination register and the bottom word is automatically cleared with zeros.

specifying a conditional instruction as the operation instruction executed in parallel has no effect on the data transfer instructions.

x and y memory data transfers access only the x and y memory; they cannot access other memory areas.

[figure 5-14 flowchart of x and y memory data transfers: the x pointer (r4, r5; update 0, +2, +r8) drives xab[15:1] and the y pointer (r6, r7; update 0, +2, +r9) drives yab[15:1]; data moves over xdb[15:0] and ydb[15:0] between x memory (ram, rom), y memory (ram, rom), and the registers x0, x1, y0, y1, a0, a1. legend: some registers (among m0, m1, a0g, a1g, dsr) cannot be set; others are not affected for storing and cleared for loading.]

5.10.2 single data transfers

single data transfers execute only one data transfer. they use 16-bit instruction code. single data transfers cannot be processed in parallel with alu operations. the x pointer, which accesses x memory, and two added pointers are valid; the y pointer is not valid. as with the superh risc engine, single data transfers can access all memory areas, including external memory. except for the dsr register, the dsp registers can be specified as source and destination operands. (the dsr register is defined as the system register, so it can transfer data with lds and sts instructions.)
the guard bit registers a0g and a1g can be specified for operands as independent registers. single data transfers use the lab and ldb buses in place of the xab, xdb, yab, and ydb buses, so contention occurs on the ldb bus between data transfers and instruction fetches.

single data transfers handle word and longword data. word data transfers involve only the top word of the register. when data is loaded to a register, it goes to the top word and the bottom word is automatically filled with zeros. if there are guard bits, the sign bit is extended to fill them. when storing from a register, the top word is stored.

when a longword is transferred, 32 bits are valid. when loading a register that has guard bits, the sign bit is extended to fill the guard bits.

when a guard bit register is stored, the top 24 bits become undefined, and the read out is to the ldb bus. when the guard bit registers a0g and a1g load word data as the destination registers of the movs.w instruction, the bottom byte is written to the register.

[figure 5-15 single data transfer flowchart (word): the pointer (r2, r3, r4, r5; update -2, 0, +2, +r8) drives lab[31:0]; data moves over ldb[15:0] between all memory areas and the registers x0, x1, y0, y1, a0, a1, m0, m1, a0g, a1g. legend: dsr cannot be set; the bottom word is not affected for storing and cleared for loading. see the text for information about a0g and a1g.]

[figure 5-16 single data transfer flowchart (longword): the pointer (r2, r3, r4, r5; update -4, 0, +4, +r8) drives lab[31:0]; data moves over ldb[31:0] between all memory areas and the registers x0, x1, y0, y1, a0, a1, m0, m1, a0g, a1g. legend: dsr cannot be set.]

data transfers are executed in the ma stage of the pipeline while dsp operations are executed in the dsp stage. since the next data store instruction starts before the data operation instruction has finished, a stall cycle is inserted when the store instruction comes immediately after the data operation instruction. this overhead cycle can be avoided by adding one instruction between the data operation instruction and the data transfer instruction. figure 5-17 shows an example.
[figure 5-17 example of the execution of operation and data store instructions: pipeline diagram over slots 1 to 7 (stages if, id, ex (addressing), movx, dsp) for padd x0, y0, a0 followed by movx.w a0, @r4+, and for the same pair with movx.w @r5, x1 placed between them. annotation: insert an unrelated step between data operation instruction and store instruction.]

5.11 operand contention

data contention occurs when the same register is specified as the destination operand for two or more parallel processing instructions. it occurs in three cases.

1. when the same destination operand is specified for an alu operation and multiplication (du, dg)
2. when the same destination operand is specified for an x memory load and an alu operation (dx, du, dz)
3. when the same destination operand is specified for a y memory load and an alu operation (dy, du, dz)

results cannot be guaranteed when contention occurs. table 5-20 shows the operand and register combinations that cause contention. some assemblers can detect these types of contention, so pay attention to assembler functions when selecting one.

table 5-20 operand and register combinations that create contention

                                     dsp register
operation                  operand   x0  x1  y0  y1  m0  m1  a0  a1
x memory load              ax, ix
                           dx        *2  *2
y memory load              ay, iy
                           dy                *3  *3
6-operand alu operation    sx        *1  *1                  *1  *1
                           sy                *1  *1  *1  *1
                           du        *2      *3              *4  *4
3-operand multiplication   se        *1  *1  *1                  *1
                           sf        *1      *1  *1              *1
                           dg                        *1  *1  *4  *4
3-operand alu operation    sx        *1  *1                  *1  *1
                           sy                *1  *1  *1  *1
                           dz        *2  *2  *3  *3  *1  *1  *1  *1

notes: 1. register is settable for the operand
       2. dx, du, and dz contend
       3. dy, du, and dz contend
       4. du and dg contend

5.12 dsp repeat (loop) control

the sh3-dsp repeat (loop) control function is a special utility for controlling repetition efficiently. the setrc instruction is executed to hold a repeat count in the repeat counter (rc, 12 bits) and set an execution mode in which the repeat (loop) program is repeated until the rc is 1.
upon completion of the repeat operation, the content of the rc becomes 0.

the repeat start register (rs) holds the start address of the repeated section. the repeat end register (re) holds the ending address of the repeated section. (there are some exceptions. refer to note 1, actual programming, in this section [below figure 5-18].) the repeat counter (rc) holds the repeat count. the procedure for executing repeat control is shown below:

1. set the repeat start address in the rs register.
2. set the repeat end address in the re register.
3. set the repeat count in the rc counter.
4. execute the repeated program (loop).

the following instructions are used for executing 1 and 2:

    ldrs @(disp,pc);
    ldre @(disp,pc);

the setrc instruction is used to execute 3 and 4. immediate data or a general register may be used to specify the repeat count as the operand of the setrc instruction:

    setrc #imm;    #imm -> rc, enable repeat control
    setrc rm;      rm -> rc, enable repeat control

#imm is 8 bits and the rc counter is 12 bits, so to set the rc counter to a value of 256 or greater, use the rm register.

a sample program is shown below.

          ldrs rptstart;
          ldre rptend;
          setrc #imm;          rc=#imm
          instr0;
                               ; instr1~5 executes repeatedly
rptstart: instr1;
          instr2;
          instr3;
          instr4;
rptend:   instr5;
          instr6;

there are several restrictions on repeat control:

1. at least one instruction must come between the setrc instruction and the first instruction of the repeat program (loop).
2. execute the setrc instruction after executing the ldrs and ldre instructions.
3. when there are more than four instructions in the repeat program (loop) and the repeat start address (address instr1 in the above example) is not on a longword boundary, a one-cycle stall (cycle awaiting execution) is required for each repeat.
4.
when there are three or fewer instructions in the loop, branch instructions (bra, bsr, bt, bf, bt/s, bf/s, bsrf, rts, braf, rte, jsr, jmp), repeat control instructions (setrc, ldrs, ldre), sr, rs, and re load instructions, and trapa cannot be used. if such an instruction is used, illegal instruction exception handling starts and the address values shown in table 5-21 are stored in spc.

table 5-21 pc values: address stored in spc (1)

conditions   position   address stored in spc
rc>=2        any        rptstart
rc=1         any        program address of illegal instruction

5. if there are four or more instructions in the loop, branch instructions (bra, bsr, bt, bf, bt/s, bf/s, bsrf, rts, braf, rte, jsr, jmp), repeat control instructions (setrc, ldrs, ldre), sr, rs, and re load instructions, and trapa cannot be used for the last three instructions in the repeat program (loop). if such an instruction is used, illegal instruction exception handling starts and the address values shown in table 5-22 are stored in spc. repeat control instructions (setrc, ldrs, ldre) and sr, rs, and re load instructions cannot be described in positions other than the repeat module. if they are, proper operation cannot be guaranteed.

table 5-22 pc values: address stored in spc (2)

conditions   position   address stored in spc
rc>=2        instr3     program address of illegal instruction
             instr4     rptstart-4
             instr5     rptstart-2
rc=1         any        program address of illegal instruction

6. when there are three or fewer instructions in the loop, pc relative instructions (mova @(disp,pc), r0, or the like) can only be used at the first instruction (instr1).
7. if there are four or more instructions in the loop, pc relative instructions (mova @(disp,pc), r0, or the like) cannot be used in the final two instructions.
8. the sh3-dsp does not have a repeat valid flag; repeats become invalid when the rc counter becomes 0. when the rc counter is not 0 and the pc counter matches the re register contents, repeating begins.
when the rc counter is set to 0, the repeat program (loop) is invalid; as when rc is 1, the loop is executed only once and does not return to its starting instruction. when the rc counter is set to 1, the repeat module is executed only once. though it does not return to the repeat program (loop) start instruction, the rc counter becomes zero when the repeat module is executed.
9. if there are four or more instructions in the loop, branch instructions, including subroutine call and return instructions, cannot specify instr3 through instr5 as the branch destination address. if they are executed, the repeat control does not work correctly. if a repeating portion of a program (a loop) contains three or more instructions and the branching destination is rptstart or an address ahead of it, repeat control does not work properly and the content of rc in the sr register is not updated.
10. while the repeat is being executed, interruption is restricted. figure 5-18 shows the flow for each stage of ex. the initial ex stage of interruption is usually started immediately after the ex stage of the instruction is completed (indicated by a). "b" in the figure below indicates locations where no interruption is accepted.

[figure 5-18 restriction on acceptance of interruption by repeat module. a: interruption is accepted. b: no interruption is accepted. when rc>=1, the marks in order of appearance are:
  1-step repeat (instr0 to instr2, start and end on the same instruction): a, b, b, a
  2-step repeat (instr0 to instr3): a, b, b, b, a
  3-step repeat (instr0 to instr4): a, b, b, b, b, a
  more than 4 steps repeat (instr0, instr1, ..., instr n-3, instr n-2, instr n-1, instr n, instr n+1): a, a or b (when returning from instr n), a, a, b, b, b, b, a
when rc=0: interruption is accepted.]

5.12.1 usage notes

note 1.
actual programming

the repeat start register (rs) and repeat end register (re) store the repeat start address and repeat end address respectively. the addresses stored in these registers change depending on the number of instructions in the repeat program (loop). this rule is shown below.

repeat_start: address of the repeat start instruction
repeat_start0: address of the instruction one before (one line above) the repeat start instruction
repeat_end3: address of the instruction three before (three lines above) the repeat end instruction

table 5-23 rs and re setup rule

           number of instructions in repeat program (loop)
register   1                 2                 3                 >=4
rs         repeat_start0+8   repeat_start0+6   repeat_start0+4   repeat_start
re         repeat_start0+4   repeat_start0+4   repeat_start0+4   repeat_end3+4

an example of an actual repeat program (loop) assuming various cases based on the above table is given below:

case 1: one repeat instruction

           ldrs rptstart0+8;
           ldre rptstart0+4;
           setrc rptcount;
           ----
rptstart0: instr0;
rptstart:  instr1;      repeat instruction
           instr2;

case 2: two repeat instructions

           ldrs rptstart0+6;
           ldre rptstart0+4;
           setrc rptcount;
           ----
rptstart0: instr0;
rptstart:  instr1;      repeat instruction 1
rptend:    instr2;      repeat instruction 2
           instr3;

case 3: three repeat instructions

           ldrs rptstart0+4;
           ldre rptstart0+4;
           setrc rptcount;
           ----
rptstart0: instr0;
rptstart:  instr1;      repeat instruction 1
           instr2;      repeat instruction 2
rptend:    instr3;      repeat instruction 3
           instr4;

case 4: four or more instructions

           ldrs rptstart;
           ldre rptend3+4;
           setrc rptcount;
           ----
rptstart0: instr0;
rptstart:  instr1;      repeat instruction 1
           instr2;      repeat instruction 2
           instr3;      repeat instruction 3
           -----------------------------------------
rptend3:   instrn-3;    repeat instruction n-3
           instrn-2;    repeat instruction n-2
           instrn-1;    repeat instruction n-1
rptend:    instrn;      repeat instruction n
           instrn+1

the above example can be used as a template when programming this repeat program (loop) sequence.
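the rule in table 5-23 can also be written out as a small c sketch. this is only an illustration of the table: the type repeat_regs and the function repeat_setup are hypothetical names, addresses are treated as plain 32-bit numbers, and the caller is assumed to pass a loop length of at least 1.

```c
#include <stdint.h>

/* Values to load into RS and RE (hypothetical container type). */
typedef struct {
    uint32_t rs;
    uint32_t re;
} repeat_regs;

/* n      : number of instructions in the repeat program (assumed >= 1)
 * start0 : address of the instruction just before the repeat start
 * start  : address of the repeat start instruction
 * end3   : address of the instruction three before the repeat end
 *          (only used when n >= 4) */
repeat_regs repeat_setup(unsigned n, uint32_t start0,
                         uint32_t start, uint32_t end3)
{
    repeat_regs r;
    switch (n) {
    case 1:  r.rs = start0 + 8; r.re = start0 + 4; break;
    case 2:  r.rs = start0 + 6; r.re = start0 + 4; break;
    case 3:  r.rs = start0 + 4; r.re = start0 + 4; break;
    default: r.rs = start;      r.re = end3 + 4;   break;  /* n >= 4 */
    }
    return r;
}
```

for loops of one to three instructions only start0 matters; start and end3 come into play from four instructions upward.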
the extension instruction repeat can simplify such complicated labeling and offset handling. details are described in note 2 below.

note 2. extension instruction repeat

the extension instruction repeat can simplify the handling of the labeling and offset described in table 5-23. labels used are shown below.

rptstart: address of first instruction of repeat program (loop)
rptend: address of last instruction of repeat program (loop)
rptcount: repeat count (immediate value)

use this instruction as described below. the repeat count can be designated as immediate value #imm or register indirect value rn.

case 1: one repeat instruction

          repeat rptstart, rptend, rptcount
          ----
          instr0;
rptstart: instr1;      repeat instruction 1
          instr2;

case 2: two repeat instructions

          repeat rptstart, rptend, rptcount
          ----
          instr0;
rptstart: instr1;      repeat instruction 1
rptend:   instr2;      repeat instruction 2

case 3: three repeat instructions

          repeat rptstart, rptend, rptcount
          ----
          instr0;
rptstart: instr1;      repeat instruction 1
          instr2;      repeat instruction 2
rptend:   instr3;      repeat instruction 3

case 4: four or more instructions

          repeat rptstart, rptend, rptcount
          ----
          instr0;
rptstart: instr1;      repeat instruction 1
          instr2;      repeat instruction 2
          instr3;      repeat instruction 3
          -----------------------------------------
          instrn-3;    repeat instruction n-3
          instrn-2;    repeat instruction n-2
          instrn-1;    repeat instruction n-1
rptend:   instrn;      repeat instruction n
          instrn+1

the result of expanding each case corresponds to the same-numbered case in note 1.

5.13 conditional instructions and data transfers

data operation instructions include both unconditional and conditional instructions. a data transfer instruction can be specified in parallel with either type, but the data transfer always executes regardless of whether the condition is met; the condition does not affect the data transfer instruction.
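a minimal c sketch of this behavior follows, modeling a conditional padd x0, y0, a0 executed in parallel with movx.w @r4+, x0. the structure dsp_state and the function step are hypothetical, the accumulator is assumed to be held sign-extended in an int64_t, and only the x memory load is modeled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical register snapshot: a0 holds the 40-bit accumulator
 * sign-extended into 64 bits. */
typedef struct {
    int64_t  a0;
    int32_t  x0;
    int32_t  y0;
    uint32_t r4;
} dsp_state;

/* One parallel step: conditional add plus unconditional x memory load.
 * dc is the condition; xmem_word is the word read from @r4. */
void step(dsp_state *s, bool dc, uint16_t xmem_word)
{
    /* Source operands are read before the load overwrites x0. */
    int64_t sum = (int64_t)s->x0 + (int64_t)s->y0;

    if (dc)
        s->a0 = sum;            /* operation executes only if condition holds */

    /* The data transfer executes regardless of the condition: the word
     * goes to the top half of x0, the bottom half is cleared, and the
     * pointer is post-incremented by 2. */
    s->x0 = (int32_t)((uint32_t)xmem_word << 16);
    s->r4 += 2;
}
```

in both the true and the false case x0 and r4 change in the same way; only a0 depends on the condition.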
the following is an example of a conditional instruction and a data transfer:

    dct padd x0, y0, a0  movx.w @r4+, x0  movy.w a0, @r6+r9;

when condition is true:
  before execution: x0=h'33333333, y0=h'55555555, a0=h'123456789a, r4=h'00008000, r6=h'00008232, r9=h'00000004, (r4)=h'1111, (r6)=h'2222
  after execution:  x0=h'11110000, y0=h'55555555, a0=h'0088888888, r4=h'00008002, r6=h'00008236, r9=h'00000004, (r4)=h'1111, (r6)=h'1234

when condition is false:
  before execution: x0=h'33333333, y0=h'55555555, a0=h'123456789a, r4=h'00008000, r6=h'00008232, r9=h'00000004, (r4)=h'1111, (r6)=h'2222
  after execution:  x0=h'11110000, y0=h'55555555, a0=h'123456789a, r4=h'00008002, r6=h'00008236, r9=h'00000004, (r4)=h'1111, (r6)=h'1234

section 6 instruction features

6.1 risc-type instruction set

all instructions are risc type. their features are detailed in this section.

6.1.1 16-bit fixed length

in the sh-3 cpu all instructions have a fixed length of 16 bits. this contributes to increased code efficiency. like the sh-3, the sh3-dsp has 16-bit instructions, but additional 32-bit dsp instructions are provided to allow parallel processing of dsp instructions. for details on the dsp, see section 5, dsp operations and data transfer.

6.1.2 one instruction/cycle

basic instructions can be executed in one cycle using the pipeline system.

6.1.3 data length

longword is the standard data length for all operations. memory can be accessed in bytes, words, or longwords. byte or word data accessed from memory is sign-extended and handled as longword data (table 6-1). immediate data is sign-extended for arithmetic operations or zero-extended for logic operations. it also is handled as longword data.

table 6-1 sign extension of word data

sh-3/sh-3e/sh3-dsp cpu:
    mov.w @(disp,pc),r1
    add r1,r0
    .........
    .data.w h'1234
description: data is sign-extended to 32 bits, and r1 becomes h'00001234. it is next operated upon by an add instruction.
example for conventional cpu:
    add.w #h'1234,r0
note: the address of the immediate data is accessed by @(disp, pc).

6.1.4 load-store architecture

basic operations are executed between registers. for operations that involve memory access, data is loaded to the registers and executed (load-store architecture). instructions that manipulate bits, such as the and instruction, however, are executed directly in memory.

6.1.5 delayed branch instructions

unconditional branch instructions are delayed. pipeline disruption during branching is reduced by first executing the instruction that follows the branch instruction, and then branching (table 6-2).

table 6-2 delayed branch instructions

sh-3/sh-3e/sh3-dsp cpu:
    bra trget
    add r1,r0
description: executes an add before branching to trget.
example for conventional cpu:
    add.w r1,r0
    bra trget

6.1.6 multiplication/accumulation operation

multiplication of two 16-bit values to produce a 32-bit result is executed in one to three cycles (one to two cycles for the sh3-dsp), and multiplication of two 32-bit values to produce a 64-bit result is executed in two to five cycles (two to three cycles for the sh3-dsp). multiplication/accumulation, in which two 32-bit values are multiplied and one 32-bit value is added, is executed in two to five cycles (two to four cycles for the sh3-dsp) when the mac instruction is used and in one system when the fmac instruction* is used.

note: * the fmac instruction is only available on the sh-3e (floating point calculation instruction).

6.1.7 t bit

the t bit in the status register changes according to the result of the comparison, and in turn is the condition (true/false) that determines if the program will branch (table 6-3). the number of instructions that change the t bit in the status register is kept to a minimum to improve the processing speed.

table 6-3 t bit

sh-3/sh-3e/sh3-dsp cpu:
    cmp/ge r1,r0
    bt trget0
    bf trget1
description: t bit is set when r0 >= r1. the program branches to trget0 when r0 >= r1 and to trget1 when r0 < r1.
example for conventional cpu:
    cmp.w r1,r0
    bge trget0
    blt trget1

sh-3/sh-3e/sh3-dsp cpu:
    add #-1,r0
    cmp/eq #0,r0
    bt trget
description: t bit is not changed by add. t bit is set when r0 = 0. the program branches if r0 = 0.
example for conventional cpu:
    sub.w #1,r0
    beq trget

6.1.8 immediate data

byte immediate data is located in instruction code. word or longword immediate data is not input via instruction codes but is stored in a memory table. the memory table is accessed by an immediate data transfer instruction (mov) using the pc relative addressing mode with displacement (table 6-4).

table 6-4 immediate data accessing

classification     sh-3/sh-3e/sh3-dsp cpu   example for conventional cpu
8-bit immediate    mov #h'12,r0             mov.b #h'12,r0
16-bit immediate   mov.w @(disp,pc),r0      mov.w #h'1234,r0
                   .................
                   .data.w h'1234
32-bit immediate   mov.l @(disp,pc),r0      mov.l #h'12345678,r0
                   .................
                   .data.l h'12345678

note: the address of the immediate data is accessed by @(disp, pc).

6.1.9 absolute address

when data is accessed by absolute address, the absolute address value is placed in the memory table beforehand. loading the immediate data when the instruction is executed transfers that value to the register and the data is accessed in the indirect register addressing mode.

table 6-5 absolute address

classification     sh-3/sh-3e/sh3-dsp cpu   example for conventional cpu
absolute address   mov.l @(disp,pc),r1      mov.b @h'12345678,r0
                   mov.b @r1,r0
                   ..................
                   .data.l h'12345678

6.1.10 16-bit/32-bit displacement

when data is accessed by 16-bit or 32-bit displacement, the displacement value is placed in the memory table beforehand. loading the immediate data when the instruction is executed transfers that value to the register and the data is accessed in the indirect indexed register addressing mode.

table 6-6 16-bit/32-bit displacement

classification: 16-bit displacement
sh-3/sh-3e/sh3-dsp cpu:
    mov.w @(disp,pc),r0
    mov.w @(r0,r1),r2
    ..................
    .data.w h'1234
example for conventional cpu:
    mov.w @(h'1234,r1),r2

6.1.11 privileged instructions

the processor has two operation modes (user/privileged). if privileged instructions are used in user mode, an illegal instruction exception is detected. privileged instructions are:

- ldc
- stc
- rte
- ldtlb
- sleep

6.2 cpu instruction addressing modes

addressing modes and effective address calculation are described in table 6-7.

table 6-7 addressing modes and effective addresses

direct register addressing
  instruction format: rn
  effective address calculation: the effective address is register rn. (the operand is the contents of register rn.)

indirect register addressing
  instruction format: @rn
  effective address calculation: the effective address is the content of register rn.
  equation: rn

post-increment indirect register addressing
  instruction format: @rn+
  effective address calculation: the effective address is the content of register rn. a constant is added to the content of rn after the instruction is executed. 1 is added for a byte operation, 2 for a word operation, and 4 for a longword operation.
  equation: rn (after the instruction is executed) byte: rn + 1 -> rn; word: rn + 2 -> rn; longword: rn + 4 -> rn

pre-decrement indirect register addressing
  instruction format: @-rn
  effective address calculation: the effective address is the value obtained by subtracting a constant from rn. 1 is subtracted for a byte operation, 2 for a word operation, and 4 for a longword operation.
  equation: byte: rn - 1 -> rn; word: rn - 2 -> rn; longword: rn - 4 -> rn (instruction executed with rn after calculation)

indirect register addressing with displacement
  instruction format: @(disp:4, rn)
  effective address calculation: the effective address is rn plus a 4-bit displacement (disp). the value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, and is quadrupled for a longword operation.
  equation: byte: rn + disp; word: rn + disp × 2; longword: rn + disp × 4

indirect indexed register addressing
  instruction format: @(r0, rn)
  effective address calculation: the effective address is the rn value plus r0.
  equation: rn + r0

indirect gbr addressing with displacement
  instruction format: @(disp:8, gbr)
  effective address calculation: the effective address is the gbr value plus an 8-bit displacement (disp). the value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, and is quadrupled for a longword operation.
  equation: byte: gbr + disp; word: gbr + disp × 2; longword: gbr + disp × 4

indirect indexed gbr addressing
  instruction format: @(r0, gbr)
  effective address calculation: the effective address is the gbr value plus r0.
  equation: gbr + r0

indirect pc addressing with displacement
  instruction format: @(disp:8, pc)
  effective address calculation: the effective address is the pc value plus an 8-bit displacement (disp). the value of disp is zero-extended, and is doubled for a word operation and quadrupled for a longword operation. for a longword operation, the lowest two bits of the pc are masked.
  equation: word: pc + disp × 2; longword: pc & h'fffffffc + disp × 4

pc relative addressing
  instruction format: disp:8
  effective address calculation: the effective address is the pc value plus an 8-bit displacement (disp) that has been sign-extended and doubled.
  equation: pc + disp × 2

  instruction format: disp:12
  effective address calculation: the effective address is the pc value plus a 12-bit displacement (disp) that has been sign-extended and doubled.
  equation: pc + disp × 2

  instruction format: rn
  effective address calculation: the effective address is the register pc plus rn.
  equation: pc + rn

immediate addressing
  instruction format: #imm:8
  effective address calculation: the 8-bit immediate data (imm) for the tst, and, or, and xor instructions is zero-extended.

  instruction format: #imm:8
  effective address calculation: the 8-bit immediate data (imm) for the mov, add, and cmp/eq instructions is sign-extended.

  instruction format: #imm:8
  effective address calculation: immediate data (imm) for the trapa instruction is zero-extended and is quadrupled.

6.3 dsp data addressing (sh3-dsp only)

dsp instructions perform two different types of memory accesses. one uses the x and y data transfer instructions (movx.w and movy.w) while the other uses the single data transfer instructions (movs.w and movs.l). data addressing for these two types of instructions also differs. table 6-8 summarizes the data transfer instructions.

table 6-8 summary of data transfer instructions

item: address registers
  x and y data transfer processing (movx.w and movy.w): ax: r4, r5; ay: r6, r7
  single data transfer processing (movs.w and movs.l): as: r2, r3, r4, r5
item: index registers
  x and y: ix: r8; iy: r9
  single: is: r8
item: addressing
  x and y: nop/inc(+2)/index addition: post updating
  single: nop/inc(+2, +4)/index addition: post updating; dec(-2, -4): pre updating
item: modulo addressing
  x and y: available
  single: not available
item: data buses
  x and y: xdb, ydb
  single: ldb
item: data length
  x and y: 16 bits (word)
  single: 16 or 32 bits (word or longword)
item: bus contention
  x and y: none
  single: occurs
item: memory
  x and y: x and y data memories
  single: all memory spaces
item: source registers
  x and y: dx, dy: a0, a1
  single: ds: a0/a1, m0/m1, x0/x1, y0/y1, a0g, a1g
item: destination registers
  x and y: dx: x0/x1; dy: y0/y1
  single: ds: a0/a1, m0/m1, x0/x1, y0/y1, a0g, a1g

6.3.1 x and y data addressing

dsp instructions allow x and y data memories to be accessed simultaneously using the movx.w and movy.w instructions. dsp instructions have two pointers so they can access the x and y data memories simultaneously. dsp instructions have only pointer addressing; immediate addressing is not available. address registers are divided in two. the r4 and r5 registers become the x memory address register (ax) while the r6 and r7 registers become the y memory address register (ay).
the following three types of addressing may be used with x and y data transfer instructions.

- address registers with no update: the ax and ay registers are address pointers. they are not updated.
- addition index register addressing: the ax and ay registers are address pointers. the values of the ix and iy registers are added to the ax and ay registers respectively after data transfer (post updating).
- increment address register addressing: the ax and ay registers are address pointers. +2 is added to them after data transfer (post updating).

each of the address pointers has an index register. register r8 becomes the index register (ix) for the x memory address register (ax); register r9 becomes the index register (iy) for the y memory address register (ay).

x and y data transfer instructions are processed in words. x and y data memory is accessed in 16-bit units. increment processing for that purpose adds two to the address register. to decrement them, set -2 in the index register and specify addition index register addressing. figure 6-1 shows the x and y data transfer addressing.

[figure 6-1 x and y data transfer addressing: the alu updates r4[ax] and r5[ax] with r8[ix], +2 (inc), or +0 (no update); the au (note 1) updates r6[ay] and r7[ay] with r9[iy], +2 (inc), or +0 (no update). notes: 1. adder added for dsp processing. 2. all three addressing methods (increment, index register addition (ix, iy), and no update) are post-updating methods. to decrement the address pointer, set a negative value in the index register.]

6.3.2 single data addressing

dsp instructions include single data transfer instructions (movs.w and movs.l) that load data to dsp registers and store data from dsp registers. with these instructions, the r2-r5 registers are used as address registers (as) for single data transfers.

there are four types of data addressing for single data transfer instructions.

- address registers with no update: the as register is the address pointer. it is not updated.
- addition index register addressing: the as register is the address pointer.
  the value of the is register is added to the as register after data transfer (post updating).
- increment address register addressing: the as register is the address pointer. +2 or +4 is added to it after data transfer (post updating).
- decrement address register addressing: the as register is the address pointer. -2 or -4 is added to it (that is, 2 or 4 is subtracted from it) before data transfer (pre updating).

the address pointer uses the r8 register as its index register (is). figure 6-2 shows the single data transfer addressing.

[figure 6-2 single data transfer addressing: the alu updates r2[as], r3[as], r4[as], and r5[as] with r8[is], +2/+4 (inc), +0 (no update), or -2/-4 (dec). note: there are four addressing methods (no update, index register addition (is), increment, and decrement). index register addition and increment are post-updating methods. decrement is a pre-updating method.]

6.3.3 modulo addressing

like other dsps, the sh3-dsp has a modulo addressing mode. address registers are updated in the same way in this mode. when the address pointer reaches the modulo end address that was set beforehand, the address pointer becomes the modulo start address.

modulo addressing is only effective for x and y data transfer instructions (movx.w and movy.w). when the dmx bit of the sr register is set, the x address register enters modulo addressing mode; when the dmy bit is set, the y address register enters modulo addressing mode. modulo addressing cannot be used on both x and y address registers at once. accordingly, do not set dmx and dmy at the same time. should they both be set at once, only dmy will be valid.

the mod register is provided for specifying the start and end addresses for the modulo address area. the mod register stores the ms (modulo start) and me (modulo end). the following shows how to use the modulo register (ms and me).
          mov.l modaddr,rn;    rn=modend, modstart
          ldc rn,mod;          me=modend, ms=modstart
modaddr:  .data.w mend;        lower 16 bits of modend
          .data.w mstart;      lower 16 bits of modstart
modstart: .data
            :
modend:   .data

set the start and end addresses in ms and me and then set the dmx or dmy bit to 1. the address register contents are compared to me. if they match me, the start address ms is stored in the address register. the bottom 16 bits of the address register are compared to me. the maximum modulo size is 64 kbytes. this is ample for accessing the x and y data memory. figure 6-3 shows a block diagram of modulo addressing.

[figure 6-3 modulo addressing: block diagram showing the instruction (movx/movy), the dmx and dmy bits, the cont block, the ms and me fields with a comparator (cmp), the alu and au, the address registers r4[ax], r5[ax], r6[ay], r7[ay], the index registers r8[ix], r9[iy], the +2 and +0 update values, and the abx and aby outputs to xab and yab.]

the following is an example of modulo addressing.

    ms=h'08; me=h'0c; r4=h'c008;
    dmx=1; dmy=0;    (sets modulo addressing for address register ax (r4, r5))

the above setting changes the r4 register as shown below.

    r4: h'c008
    inc. r4: h'c00a
    inc. r4: h'c00c
    inc. r4: h'c008    (becomes the modulo start address when the modulo end address is reached)

place data so the top 16 bits of the modulo start and end address are the same, since the modulo start address only swaps the bottom 16 bits of the address register.

note: when using addition index as the dsp data addressing, the address pointer may exceed this value without matching me. should this occur, the address pointer will not return to the modulo start address.

6.3.4 dsp addressing operation

the following shows how dsp addressing works in the execution stage (ex) of a pipeline (including modulo addressing).

if ( operation is movx.w or movy.w ) {
    abx=ax; aby=ay;
    /* memory access cycle uses abx and aby.
       the addresses to be used have not been updated */
    /* ax is one of r4,5 */
    if ( dmx==0 || (dmx==1 && dmy==1) )
        ax=ax+(+2 or r8[ix] or +0);        /* inc,index,not-update */
    else if (! not-update)
        ax=modulo( ax, (+2 or r8[ix]) );
    /* ay is one of r6,7 */
    if ( dmy==0 )
        ay=ay+(+2 or r9[iy] or +0);        /* inc,index,not-update */
    else if (! not-update)
        ay=modulo( ay, (+2 or r9[iy]) );
}
else if ( operation is movs.w or movs.l ) {
    if ( addressing is nop, inc, add-index-reg ) {
        mab=as;    /* memory access cycle uses mab. the address to be used has not been updated */
        /* as is one of r2 to r5 */
        as=as+(+2 or +4 or r8[is] or +0);  /* inc,index,not-update */
    }
    else {         /* decrement, pre-update */
        /* as is one of r2 to r5 */
        as=as+(-2 or -4);
        mab=as;    /* memory access cycle uses mab. the address to be used has been updated */
    }
}
/* the value to be added to the address register depends on addressing operations.
   for example, (+2 or r8[ix] or +0) means that
       +2:     if operation is increment
       r8[ix]: if operation is add-index-reg
       +0:     if operation is not-update */
function modulo ( addrreg, index ) {
    if ( addrreg[15:0]==me ) addrreg[15:0]=ms;
    else addrreg=addrreg+index;
    return addrreg;
}

6.4 instruction format of cpu instructions

the instruction format table, table 6-9, refers to the source operand and the destination operand. the meaning of the operand depends on the instruction code. the symbols are used as follows:

• xxxx: instruction code
• mmmm: source register
• nnnn: destination register
• iiii: immediate data
• dddd: displacement

table 6-9 instruction formats

each format is shown with its bit layout (bit 15 to bit 0), followed by rows of: source operand    destination operand    example

0 format (xxxx xxxx xxxx xxxx)
    -    -    nop
n format (xxxx nnnn xxxx xxxx)
    -    nnnn: direct register    movt rn
    control register or system register    nnnn: direct register    sts mach,rn
    control register or system register    nnnn: indirect pre-decrement register    stc.l sr,@-rn
m format (xxxx mmmm xxxx xxxx)
    mmmm: direct register    control register or system register    ldc rm,sr
    mmmm: indirect post-increment register    control register or system register    ldc.l @rm+,sr
    mmmm: indirect register    -    jmp @rm
    mmmm: pc relative using rm    -    braf rm
nm format (xxxx nnnn mmmm xxxx)
    mmmm: direct register    nnnn: direct register    add rm,rn
    mmmm: direct register    nnnn: indirect register    mov.l rm,@rn
    mmmm: indirect post-increment register (multiply/accumulate), nnnn: indirect post-increment register (multiply/accumulate) *    mach, macl    mac.w @rm+,@rn+
    mmmm: indirect post-increment register    nnnn: direct register    mov.l @rm+,rn
    mmmm: direct register    nnnn: indirect pre-decrement register    mov.l rm,@-rn
    mmmm: direct register    nnnn: indirect indexed register    mov.l rm,@(r0,rn)
md format (xxxx xxxx mmmm dddd)
    mmmmdddd: indirect register with displacement    r0 (direct register)    mov.b @(disp,rm),r0
nd4 format (xxxx xxxx nnnn dddd)
    r0 (direct register)    nnnndddd: indirect register with displacement    mov.b r0,@(disp,rn)

note: * in multiply/accumulate instructions, nnnn is the source register.
nmd format (xxxx nnnn mmmm dddd)
    mmmm: direct register    nnnndddd: indirect register with displacement    mov.l rm,@(disp,rn)
    mmmmdddd: indirect register with displacement    nnnn: direct register    mov.l @(disp,rm),rn
d format (xxxx xxxx dddd dddd)
    dddddddd: indirect gbr with displacement    r0 (direct register)    mov.l @(disp,gbr),r0
    r0 (direct register)    dddddddd: indirect gbr with displacement    mov.l r0,@(disp,gbr)
    dddddddd: pc relative with displacement    r0 (direct register)    mova @(disp,pc),r0
    dddddddd: pc relative    -    bf label
d12 format (xxxx dddd dddd dddd)
    dddddddddddd: pc relative    -    bra label (label = disp + pc)
nd8 format (xxxx nnnn dddd dddd)
    dddddddd: pc relative with displacement    nnnn: direct register    mov.l @(disp,pc),rn
i format (xxxx xxxx iiii iiii)
    iiiiiiii: immediate    indirect indexed gbr    and.b #imm,@(r0,gbr)
    iiiiiiii: immediate    r0 (direct register)    and #imm,r0
    iiiiiiii: immediate    -    trapa #imm
ni format (xxxx nnnn iiii iiii)
    iiiiiiii: immediate    nnnn: direct register    add #imm,rn

6.5 instruction formats for dsp instructions (sh3-dsp only)

new instructions have been added to the sh3-dsp for use in digital signal processing. the new instructions are divided into two groups:

• double and single data transfer instructions for memory and dsp registers (16 bits)
• parallel processing instructions processed by the dsp unit (32 bits)

figure 6-4 shows their instruction formats.

[figure 6-4 instruction formats of dsp instructions: cpu core instructions, bits 15 to 12 = 0000 to 1110; double data transfer instructions, bits 15 to 10 = 111100, a field in bits 9 to 0; single data transfer instructions, bits 15 to 10 = 111101, a field in bits 9 to 0; parallel processing instructions, bits 31 to 26 = 111110, a field in bits 25 to 16, b field in bits 15 to 0]

6.5.1 double and single data transfer instructions

table 6-10 shows the instruction formats for double data transfer instructions.
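the top bits shown in figure 6-4 are enough to tell the instruction groups apart. the python sketch below is only an illustration of that idea, not part of the manual: the function name `classify_first_word` and the returned strings are made up here, and it assumes that for a 32-bit parallel processing instruction the word passed in is the half that holds bits 31 to 16. it applies to the sh3-dsp only; on the sh-3e, codes starting with 1111 are floating point instructions instead.

```python
def classify_first_word(word: int) -> str:
    """Classify a 16-bit SH3-DSP instruction word by its top bits (figure 6-4).

    Hypothetical helper for illustration; the names are not from the manual.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    # CPU core instructions have bits 15-12 in the range 0000 to 1110.
    if (word >> 12) != 0b1111:
        return "cpu core"
    top6 = word >> 10  # bits 15-10 select the DSP instruction group
    if top6 == 0b111100:
        return "double data transfer"   # 16 bits, field a in bits 9-0
    if top6 == 0b111101:
        return "single data transfer"   # 16 bits, field a in bits 9-0
    if top6 == 0b111110:
        return "parallel processing"    # upper half of a 32-bit instruction
    return "undefined"                  # 111111 is not listed in figure 6-4


if __name__ == "__main__":
    for w in (0x312C, 0xF000, 0xF400, 0xF800):
        print(f"{w:#06x}: {classify_first_word(w)}")
```

here h'312c is the code 0011nnnnmmmm1100 (add rm,rn) with n=1 and m=2, so it falls in the cpu core group.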
table 6-11 shows the instruction formats for single data transfer instructions.

table 6-10 instruction formats for double data transfers

x memory data transfers (bits 15 to 10 = 111100)
mnemonic    bit 9    bit 7    bit 5    bits 3-2
nopx    0    0    0    00
movx.w @ax,dx    ax    dx    0    01
movx.w @ax+,dx    ax    dx    0    10
movx.w @ax+ix,dx    ax    dx    0    11
movx.w da,@ax    ax    da    1    01
movx.w da,@ax+    ax    da    1    10
movx.w da,@ax+ix    ax    da    1    11

y memory data transfers (bits 15 to 10 = 111100)
mnemonic    bit 8    bit 6    bit 4    bits 1-0
nopy    0    0    0    00
movy.w @ay,dy    ay    dy    0    01
movy.w @ay+,dy    ay    dy    0    10
movy.w @ay+iy,dy    ay    dy    0    11
movy.w da,@ay    ay    da    1    01
movy.w da,@ay+    ay    da    1    10
movy.w da,@ay+iy    ay    da    1    11

ax: 0=r4, 1=r5
ay: 0=r6, 1=r7
dx: 0=x0, 1=x1
dy: 0=y0, 1=y1
da: 0=a0, 1=a1

table 6-11 instruction formats for single data transfers

single data transfer (bits 15 to 10 = 111101; bits 9-8 = as; bits 7-4 = ds)
mnemonic    bits 3-2    bits 1-0
movs.w @-as,ds    00    00
movs.w @as,ds    01    00
movs.w @as+,ds    10    00
movs.w @as+is,ds    11    00
movs.w ds,@-as    00    01
movs.w ds,@as    01    01
movs.w ds,@as+    10    01
movs.w ds,@as+is    11    01
movs.l @-as,ds    00    10
movs.l @as,ds    01    10
movs.l @as+,ds    10    10
movs.l @as+is,ds    11    10
movs.l ds,@-as    00    11
movs.l ds,@as    01    11
movs.l ds,@as+    10    11
movs.l ds,@as+is    11    11

as: 0: r4, 1: r5, 2: r2, 3: r3
ds: 0: (*), 1: (*), 2: (*), 3: (*), 4: (*), 5: a1, 6: (*), 7: a0, 8: x0, 9: x1, a: y0, b: y1, c: m0, d: a1g, e: m1, f: a0g

note: * system reserved code

6.5.2 parallel processing instructions

parallel processing instructions are used by the
sh3-dsp to increase the execution efficiency of digital signal processing using the dsp unit. they are 32 bits long and up to four operations can be processed in parallel (one alu operation, one multiplication, and two data transfers).

parallel processing instructions are divided into two fields, a and b. the data transfer instructions are defined in field a and the alu operation instruction and multiplication instruction are defined in field b. these instructions can be defined independently, are processed independently, and can be executed simultaneously in parallel. table 6-12 lists the field a parallel data transfer instructions, and table 6-13 shows the field b alu operation instructions and multiplication instructions. the field a instructions are identical to the double data transfer instructions shown in table 6-10.

table 6-12 field a parallel data transfer instructions

x memory data transfers (bits 31 to 26 = 111110; bits 15 to 0 = field b)
mnemonic    bit 25    bit 23    bit 21    bits 19-18
nopx    0    0    0    00
movx.w @ax,dx    ax    dx    0    01
movx.w @ax+,dx    ax    dx    0    10
movx.w @ax+ix,dx    ax    dx    0    11
movx.w da,@ax    ax    da    1    01
movx.w da,@ax+    ax    da    1    10
movx.w da,@ax+ix    ax    da    1    11

y memory data transfers (bits 31 to 26 = 111110; bits 15 to 0 = field b)
mnemonic    bit 24    bit 22    bit 20    bits 17-16
nopy    0    0    0    00
movy.w @ay,dy    ay    dy    0    01
movy.w @ay+,dy    ay    dy    0    10
movy.w @ay+iy,dy    ay    dy    0    11
movy.w da,@ay    ay    da    1    01
movy.w da,@ay+    ay    da    1    10
movy.w da,@ay+iy    ay    da    1    11

ax: 0=r4, 1=r5
ay: 0=r6, 1=r7
dx: 0=x0, 1=x1
dy: 0=y0, 1=y1
da: 0=a0, 1=a1

table 6-13 field b alu operation instructions and multiplication instructions
category mnemonic 14 13 12 10 9 8 7 6 5 4 3 2 1 0 15 dz 00 0 00 0 ?6 imm +16 ?32 imm ?
+32 0 1 0se sf sx sy dgdu 00 0 00 1 01 0 0 01 1 1 01 1 0 10 0 0 00 dz 0: (* 1 ) 1: (* 1 ) 2: (* 1 ) 3: (* 1 ) 4: (* 1 ) 5: a1 6: (* 1 ) 7: a0 8: x0 9: x1 a: y0 b: y1 c: m0 d: (* 1 ) e: m1 f: (* 1 ) 0 0 01 0 0 00 1 1 0 1 0 1 0 0 1 0 1 0 1 1 1 1 1 1 1 0 01 1 1 0 0 1 1 1 1 0:x0 1:x1 2:y0 3:a1 0:x0 1:x1 2:a0 3:a1 0:x0 0:y0 1:y1 2:x0 3:a1 0:y0 0:m0 1:y0 1:y1 1:m1 2:a0 2:m0 2:a0 3:a1 3:m1 3:a1 01 0 pshl #imm, dz psha #imm, dz pmuls se, sf, dg reserved reserved reserved reserved pwsb sx, sy, dz pwad sx, sy, dz pabs sx, dz prnd sx, dz prnd sy, dz pabs sy, dz reserved psubc sx, sy, dz paddc sx, sy, dz pcmp sx, sy psub sx, sy, du pmuls se, sf, dg padd sx, sy, du pmuls se, sf, dg imm. shift six operand parallel instruction three operand instructions 31?7 25?6 26 10 field a 11 0 0 1 92 table 6-13 field b alu operation instructions and multiplication instructions (cont) category mnemonic 14 13 12 10 9 8 7 6 5 4 3 2 1 0 15 11 0 0 0 10 0 1 0 0 1 1 1 0 01 0 1 0 0 1 1 1 0 00 1 1 0 0 1 1 1 0 01 1 1 0 0 1 1 1 0 00 1 1 0 0 1 1 1 0 01 if cc 1 0* 3 00 1 0 0 1 1 1 reserved reserved reserved reserved reserved (if cc) *1 pshl sx, sy, dz (if cc) psha sx, sy, dz (if cc) psub sx, sy, dz (if cc) padd sx, sy, dz (if cc) pand sx, sy, dz (if cc) pxor sx, sy, dz (if cc) por sx, sy, dz (if cc) pdec sx, dz (if cc) pdec sy, dz (if cc) pinc sx, dz (if cc) pinc sy, dz (if cc) pclr dz (if cc) pdmsb sx, dz (if cc) pdmsb sy, dz (if cc) pneg sx, dz (if cc) pneg sy, dz (if cc) pcopy sx, dz (if cc) pcopy sy, dz (if cc) psts mach, dz (if cc) psts macl, dz (if cc) plds dz, macl (if cc) plds dz, mach conditional three operand instructions 00 if cc 1 0 field a sx 0:x0 1:x1 2:y0 3:y1 sy 0:y0 1:y1 2:m0 3:m1 dz 0:(* 1 ) 1:(* 1 ) 2:(* 1 ) 3:(* 1 ) 4:(* 1 ) 5:a1 6:(* 1 ) 7:a0 8:x0 9:x1 a:y0 b:y1 c:m0 d:(* 1 ) e:m1 f:(* 1 ) 10:dct 11:dcf 01: * 2 31?7 25?6 26 11 11 notes: 1. 2. 3. 
[if cc]: dct (dc bit true), dcf (dc bit false), or none (unconditional instruction)
unconditional
system reserved code

section 7 instruction set

7.1 instruction set by classification

the sh-3 instruction set includes 68 basic instruction types, and the sh-3e instruction set includes 84 basic instruction types, divided into seven functional classifications, as shown in table 7-1. tables 7-3 to 7-9 summarize instruction notation, machine mode, execution time, and function.

table 7-1 classification of instructions

each classification is shown with its number of types and number of instructions, followed by rows of: operation code    function

data transfer (5 types, 39 instructions)
    mov    data transfer, immediate data transfer, peripheral module data transfer, structure data transfer
    mova    effective address transfer
    movt    t bit transfer
    swap    swap of upper and lower bytes
    xtrct    extraction of the middle of registers connected
    pref    prefetching data to cache
arithmetic operations (21 types, 33 instructions)
    add    binary addition
    addc    binary addition with carry
    addv    binary addition with overflow check
    cmp/cond    comparison
    div1    division
    div0s    initialization of signed division
    div0u    initialization of unsigned division
    dmuls    signed double-length multiplication
    dmulu    unsigned double-length multiplication
    dt    decrement and test
    exts    sign extension
    extu    zero extension
    mac    multiply/accumulate, double-length multiply/accumulate operation
    mul    double-length multiplication (32 × 32 bits)
    muls    signed multiplication (16 × 16 bits)
    mulu    unsigned multiplication (16 × 16 bits)
    neg    negation
    negc    negation with borrow
    sub    binary subtraction
    subc    binary subtraction with carry
    subv    binary subtraction with underflow check
logic operations (6 types, 14 instructions)
    and    logical and
    not    bit inversion
    or    logical or
    tas    memory test and bit set
    tst    logical and and t bit set
    xor    exclusive or
shift (12 types, 16 instructions)
    rotl    one-bit left rotation
    rotr    one-bit right rotation
    rotcl    one-bit left rotation with t bit
    rotcr    one-bit right rotation with t bit
    shal    one-bit arithmetic left shift
    shar    one-bit arithmetic right shift
    shll    one-bit logical left shift
    shlln    n-bit logical left shift
    shlr    one-bit logical right shift
    shlrn    n-bit logical right shift
    shad    dynamic arithmetic shift
    shld    dynamic logical shift
branch (9 types, 11 instructions)
    bf    conditional branch, conditional branch with delay (t = 0)
    bt    conditional branch, conditional branch with delay (t = 1)
    bra    unconditional branch
    braf    unconditional branch
    bsr    branch to subroutine procedure
    bsrf    branch to subroutine procedure
    jmp    unconditional branch
    jsr    branch to subroutine procedure
    rts    return from subroutine procedure
system control (15 types, 83 (75) * instructions)
    clrt    t bit clear
    clrmac    mac register clear
    clrs    s bit clear
    ldc    load to control register
    lds    load to system register
    ldtlb    load pte to tlb
    nop    no operation
    rte    return from exception processing
    sets    s bit set
    sett    t bit set
    sleep    shift into power-down mode
    stc    storing control register data
    sts    storing system register data
    trapa    trap exception handling
floating point instructions (sh-3e only) (16 types, 23 instructions)
    fabs    floating point absolute value
    fadd    floating point add
    fcmp    floating point compare
    fdiv    floating point divide
    fldi0    floating point load immediate 0
    fldi1    floating point load immediate 1
    flds    floating point load to system register fpul
    float    floating point convert from integer
    fmac    floating point multiply accumulate
    fmov    floating point move
    fmul    floating point multiply
    fneg    floating point negate
    fsqrt    floating point square root
    fsts    floating point store from system register fpul
    fsub    floating point subtract
    ftrc    floating point truncate and convert to integer
total: 84 types, 219 (188) * instructions

note: * the lds and sts instructions include instructions to load/store to the fpu system register. these instructions can only be used with the sh-3e. the figure in parentheses ( ) is the total excluding the sh-3e instructions.

instruction codes, operation, and execution states are listed as shown in table 7-2 in order by classification. tables 7-3 to 7-8 list the minimum number of clock cycles required for execution. in practice, the number of execution cycles increases when the instruction fetch is in contention with data access or when the destination register of a load instruction (memory → register) is the same as the register used by the next instruction.

table 7-2 instruction code format

item    format    explanation
instruction    op.sz src,dest    op: operation code; sz: size; src: source; dest: destination; rm: source register; rn: destination register; imm: immediate data; disp: displacement
operation    →, ←
(xx)    m/q/t    &    |    ^    ~    <

7.1.1 data transfer instructions

table 7-3 data transfer instructions

instruction    operation    code    privilege    cycles    t bit
mov #imm,rn    imm → sign extension → rn    1110nnnniiiiiiii    -    1    -
mov.w @(disp,pc),rn    (disp × 2 + pc) → sign extension → rn    1001nnnndddddddd    -    1    -
mov.l @(disp,pc),rn    (disp × 4 + pc) → rn    1101nnnndddddddd    -    1    -
mov rm,rn    rm → rn    0110nnnnmmmm0011    -    1    -
mov.b rm,@rn    rm → (rn)    0010nnnnmmmm0000    -    1    -
mov.w rm,@rn    rm → (rn)    0010nnnnmmmm0001    -    1    -
mov.l rm,@rn    rm → (rn)    0010nnnnmmmm0010    -    1    -
mov.b @rm,rn    (rm) → sign extension → rn    0110nnnnmmmm0000    -    1    -
mov.w @rm,rn    (rm) → sign extension → rn    0110nnnnmmmm0001    -    1    -
mov.l @rm,rn    (rm) → rn    0110nnnnmmmm0010    -    1    -
mov.b rm,@-rn    rn-1 → rn, rm → (rn)    0010nnnnmmmm0100    -    1    -
mov.w rm,@-rn    rn-2 → rn, rm → (rn)    0010nnnnmmmm0101    -    1    -
mov.l rm,@-rn    rn-4 → rn, rm → (rn)    0010nnnnmmmm0110    -    1    -
mov.b @rm+,rn    (rm) → sign extension → rn, rm + 1 → rm    0110nnnnmmmm0100    -    1    -
mov.w @rm+,rn    (rm) → sign extension → rn, rm + 2 → rm    0110nnnnmmmm0101    -    1    -
mov.l @rm+,rn    (rm) → rn, rm + 4 → rm    0110nnnnmmmm0110    -    1    -
mov.b r0,@(disp,rn)    r0 → (disp + rn)    10000000nnnndddd    -    1    -
mov.w r0,@(disp,rn)    r0 → (disp × 2 + rn)    10000001nnnndddd    -    1    -
mov.l rm,@(disp,rn)    rm → (disp × 4 + rn)    0001nnnnmmmmdddd    -    1    -
mov.b @(disp,rm),r0    (disp + rm) → sign extension → r0    10000100mmmmdddd    -    1    -
mov.w @(disp,rm),r0    (disp × 2 + rm) → sign extension → r0    10000101mmmmdddd    -    1    -
mov.l @(disp,rm),rn    (disp × 4 + rm) → rn    0101nnnnmmmmdddd    -    1    -
mov.b rm,@(r0,rn)    rm → (r0 + rn)    0000nnnnmmmm0100    -    1    -
mov.w rm,@(r0,rn)    rm → (r0 + rn)    0000nnnnmmmm0101    -    1    -
mov.l rm,@(r0,rn)    rm → (r0 + rn)    0000nnnnmmmm0110    -    1    -
mov.b @(r0,rm),rn    (r0 + rm) → sign extension → rn    0000nnnnmmmm1100    -    1    -
mov.w @(r0,rm),rn    (r0 + rm) → sign extension → rn    0000nnnnmmmm1101    -    1    -
mov.l @(r0,rm),rn    (r0 + rm) → rn    0000nnnnmmmm1110    -    1    -
mov.b r0,@(disp,gbr)    r0 → (disp + gbr)    11000000dddddddd    -    1    -
mov.w r0,@(disp,gbr)    r0 → (disp × 2 + gbr)    11000001dddddddd    -    1    -
mov.l r0,@(disp,gbr)    r0 →
(disp × 4 + gbr)    11000010dddddddd    -    1    -
mov.b @(disp,gbr),r0    (disp + gbr) → sign extension → r0    11000100dddddddd    -    1    -
mov.w @(disp,gbr),r0    (disp × 2 + gbr) → sign extension → r0    11000101dddddddd    -    1    -
mov.l @(disp,gbr),r0    (disp × 4 + gbr) → r0    11000110dddddddd    -    1    -
mova @(disp,pc),r0    disp × 4 + pc → r0    11000111dddddddd    -    1    -
movt rn    t → rn    0000nnnn00101001    -    1    -
pref @rn    (rn) → cache    0000nnnn10000011    -    1/2 *    -
swap.b rm,rn    rm → swap the bottom two bytes → rn    0110nnnnmmmm1000    -    1    -
swap.w rm,rn    rm → swap two consecutive words → rn    0110nnnnmmmm1001    -    1    -
xtrct rm,rn    rm: middle 32 bits of rn → rn    0010nnnnmmmm1101    -    1    -

note: * two cycles on the sh3-dsp.

7.1.2 arithmetic instructions

table 7-4 arithmetic instructions

instruction    operation    code    privilege    cycles    t bit
add rm,rn    rn + rm → rn    0011nnnnmmmm1100    -    1    -
add #imm,rn    rn + imm → rn    0111nnnniiiiiiii    -    1    -
addc rm,rn    rn + rm + t → rn, carry → t    0011nnnnmmmm1110    -    1    carry
addv rm,rn    rn + rm → rn, overflow → t    0011nnnnmmmm1111    -    1    overflow
cmp/eq #imm,r0    if r0 = imm, 1 → t    10001000iiiiiiii    -    1    comparison result
cmp/eq rm,rn    if rn = rm, 1 → t    0011nnnnmmmm0000    -    1    comparison result
cmp/hs rm,rn    if rn ≥ rm with unsigned data, 1 → t    0011nnnnmmmm0010    -    1    comparison result
cmp/ge rm,rn    if rn ≥ rm with signed data, 1 → t    0011nnnnmmmm0011    -    1    comparison result
cmp/hi rm,rn    if rn > rm with unsigned data, 1 → t    0011nnnnmmmm0110    -    1    comparison result
cmp/gt rm,rn    if rn > rm with signed data, 1 → t    0011nnnnmmmm0111    -    1    comparison result
cmp/pz rn    if rn ≥ 0, 1 → t    0100nnnn00010001    -    1    comparison result
cmp/pl rn    if rn > 0, 1 → t    0100nnnn00010101    -    1    comparison result
cmp/str rm,rn    if rn and rm have an equivalent byte, 1 → t    0010nnnnmmmm1100    -    1    comparison result
div1 rm,rn    single-step division (rn/rm)    0011nnnnmmmm0100    -    1    calculation result
div0s rm,rn    msb of rn → q, msb of rm → m, m ^ q → t    0010nnnnmmmm0111    -    1    calculation result
div0u    0 → m/q/t    0000000000011001    -    1    0
dmuls.l rm,rn    signed operation of rn × rm →
mach, macl; 32 × 32 → 64 bits    0011nnnnmmmm1101    -    2 (to 5/4) *1    -
dmulu.l rm,rn    unsigned operation of rn × rm → mach, macl; 32 × 32 → 64 bits    0011nnnnmmmm0101    -    2 (to 5/4) *1    -
dt rn    rn - 1 → rn; if rn = 0, 1 → t, else 0 → t    0100nnnn00010000    -    1    comparison result
exts.b rm,rn    a byte in rm is sign-extended → rn    0110nnnnmmmm1110    -    1    -
exts.w rm,rn    a word in rm is sign-extended → rn    0110nnnnmmmm1111    -    1    -
extu.b rm,rn    a byte in rm is zero-extended → rn    0110nnnnmmmm1100    -    1    -
extu.w rm,rn    a word in rm is zero-extended → rn    0110nnnnmmmm1101    -    1    -
mac.l @rm+,@rn+    signed operation of (rn) × (rm) + mac → mac    0000nnnnmmmm1111    -    2 (to 5/4) *1    -
mac.w @rm+,@rn+    signed operation of (rn) × (rm) + mac → mac; 16 × 16 + 64 → 64 bits    0100nnnnmmmm1111    -    2 (to 5) *1    -
mul.l rm,rn    rn × rm → macl; 32 × 32 → 32 bits    0000nnnnmmmm0111    -    2 (to 5/4) *1    -
muls.w rm,rn    signed operation of rn × rm → mac; 16 × 16 → 32 bits    0010nnnnmmmm1111    -    1 (to 3) *2    -
mulu.w rm,rn    unsigned operation of rn × rm → mac; 16 × 16 → 32 bits    0010nnnnmmmm1110    -    1 (to 3) *2    -
neg rm,rn    0 - rm → rn    0110nnnnmmmm1011    -    1    -
negc rm,rn    0 - rm - t → rn, borrow → t    0110nnnnmmmm1010    -    1    borrow
sub rm,rn    rn - rm → rn    0011nnnnmmmm1000    -    1    -
subc rm,rn    rn - rm - t → rn, borrow → t    0011nnnnmmmm1010    -    1    borrow
subv rm,rn    rn - rm → rn, underflow → t    0011nnnnmmmm1011    -    1    underflow

notes:
1. the normal minimum number of execution cycles is 2, but 5 cycles (4 cycles on the sh3-dsp) are required when the results of an operation are read from the mac register immediately after the instruction.
2. the normal minimum number of execution cycles is 1, but 3 cycles are required when the results of an operation are read from the mac register immediately after a mul instruction.

7.1.3 logic operation instructions

table 7-5 logic operation instructions

instruction    operation    code    privilege    cycles    t bit
and rm,rn    rn & rm → rn    0010nnnnmmmm1001    -    1    -
and #imm,r0    r0 & imm → r0    11001001iiiiiiii    -    1    -
and.b #imm,@(r0,gbr)    (r0 + gbr) & imm → (r0 + gbr)    11001101iiiiiiii    -    3    -
not rm,rn    ~rm → rn    0110nnnnmmmm0111    -    1    -
or rm,rn    rn | rm →
rn    0010nnnnmmmm1011    -    1    -
or #imm,r0    r0 | imm → r0    11001011iiiiiiii    -    1    -
or.b #imm,@(r0,gbr)    (r0 + gbr) | imm → (r0 + gbr)    11001111iiiiiiii    -    3    -
tas.b @rn    if (rn) is 0, 1 → t; 1 → msb of (rn)    0100nnnn00011011    -    3/4 *    test result
tst rm,rn    rn & rm; if the result is 0, 1 → t    0010nnnnmmmm1000    -    1    test result
tst #imm,r0    r0 & imm; if the result is 0, 1 → t    11001000iiiiiiii    -    1    test result
tst.b #imm,@(r0,gbr)    (r0 + gbr) & imm; if the result is 0, 1 → t    11001100iiiiiiii    -    3    test result
xor rm,rn    rn ^ rm → rn    0010nnnnmmmm1010    -    1    -
xor #imm,r0    r0 ^ imm → r0    11001010iiiiiiii    -    1    -
xor.b #imm,@(r0,gbr)    (r0 + gbr) ^ imm → (r0 + gbr)    11001110iiiiiiii    -    3    -

note: * four cycles on the sh3-dsp.

7.1.4 shift instructions

table 7-6 shift instructions

instruction    operation    code    privilege    cycles    t bit
rotl rn    t ← rn ← msb    0100nnnn00000100    -    1    msb
rotr rn    lsb → rn → t    0100nnnn00000101    -    1    lsb
rotcl rn    t ← rn ← t    0100nnnn00100100    -    1    msb
rotcr rn    t → rn → t    0100nnnn00100101    -    1    lsb
shad rm,rn    rm ≥ 0: rn << rm → rn; rm < 0: rn >> rm → [msb → rn]    0100nnnnmmmm1100    -    1    -
shal rn    t ← rn ← 0    0100nnnn00100000    -    1    msb
shar rn    msb → rn → t    0100nnnn00100001    -    1    lsb
shld rm,rn    rm ≥ 0: rn << rm → rn; rm < 0: rn >> rm → [0 → rn]    0100nnnnmmmm1101    -    1    -
shll rn    t ← rn ← 0    0100nnnn00000000    -    1    msb
shlr rn    0 → rn → t    0100nnnn00000001    -    1    lsb
shll2 rn    rn << 2 → rn    0100nnnn00001000    -    1    -
shlr2 rn    rn >> 2 → rn    0100nnnn00001001    -    1    -
shll8 rn    rn << 8 → rn    0100nnnn00011000    -    1    -
shlr8 rn    rn >> 8 → rn    0100nnnn00011001    -    1    -
shll16 rn    rn << 16 → rn    0100nnnn00101000    -    1    -
shlr16 rn    rn >> 16 → rn    0100nnnn00101001    -    1    -

7.1.5 branch instructions

table 7-7 branch instructions

instruction    operation    code    privilege    cycles    t bit
bf label    if t = 0, disp × 2 + pc → pc; if t = 1, nop    10001011dddddddd    -    3/1 *    -
bf/s label    delayed branch, if t = 0, disp × 2 + pc → pc; if t = 1, nop    10001111dddddddd    -    2/1 *    -
bt label    if t = 1, disp × 2 + pc → pc; if t = 0, nop    10001001dddddddd    -    3/1 *    -
bt/s label    delayed branch, if t = 1, disp × 2 + pc →
pc; if t = 0, nop    10001101dddddddd    -    2/1 *    -
bra label    delayed branch, disp × 2 + pc → pc    1010dddddddddddd    -    2    -
braf rn    delayed branch, rn + pc → pc    0000nnnn00100011    -    2    -
bsr label    delayed branch, pc → pr, disp × 2 + pc → pc    1011dddddddddddd    -    2    -
bsrf rn    delayed branch, pc → pr, rn + pc → pc    0000nnnn00000011    -    2    -
jmp @rn    delayed branch, rn → pc    0100nnnn00101011    -    2    -
jsr @rn    delayed branch, pc → pr, rn → pc    0100nnnn00001011    -    2    -
rts    delayed branch, pr → pc    0000000000001011    -    2    -

note: * one state when it does not branch.

7.1.6 system control instructions

table 7-8 system control instructions

instruction    operation    code    privilege    cycles    t bit
clrmac    0 → mach, macl    0000000000101000    -    1    -
clrs    0 → s    0000000001001000    -    1    -
clrt    0 → t    0000000000001000    -    1    0
ldc rm,sr    rm → sr    0100mmmm00001110    √    5    lsb
ldc rm,gbr    rm → gbr    0100mmmm00011110    -    1/3 *1    -
ldc rm,vbr    rm → vbr    0100mmmm00101110    √    1/3 *1    -
ldc rm,ssr    rm → ssr    0100mmmm00111110    √    1/3 *1    -
ldc rm,spc    rm → spc    0100mmmm01001110    √    1/3 *1    -
ldc rm,r0_bank    rm → r0_bank    0100mmmm10001110    √    1/3 *1    -
ldc rm,r1_bank    rm → r1_bank    0100mmmm10011110    √    1/3 *1    -
ldc rm,r2_bank    rm → r2_bank    0100mmmm10101110    √    1/3 *1    -
ldc rm,r3_bank    rm → r3_bank    0100mmmm10111110    √    1/3 *1    -
ldc rm,r4_bank    rm → r4_bank    0100mmmm11001110    √    1/3 *1    -
ldc rm,r5_bank    rm → r5_bank    0100mmmm11011110    √    1/3 *1    -
ldc rm,r6_bank    rm → r6_bank    0100mmmm11101110    √    1/3 *1    -
ldc rm,r7_bank    rm → r7_bank    0100mmmm11111110    √    1/3 *1    -
ldc.l @rm+,sr    (rm) → sr, rm + 4 → rm    0100mmmm00000111    √    7    lsb
ldc.l @rm+,gbr    (rm) → gbr, rm + 4 → rm    0100mmmm00010111    -    1/5 *2    -
ldc.l @rm+,vbr    (rm) → vbr, rm + 4 → rm    0100mmmm00100111    √    1/5 *2    -
ldc.l @rm+,ssr    (rm) → ssr, rm + 4 → rm    0100mmmm00110111    √    1/5 *2    -
ldc.l @rm+,spc    (rm) → spc, rm + 4 → rm    0100mmmm01000111    √    1/5 *2    -
ldc.l @rm+,r0_bank    (rm) → r0_bank, rm + 4 → rm    0100mmmm10000111    √    1/5 *2    -
ldc.l @rm+,r1_bank    (rm) → r1_bank, rm + 4 → rm    0100mmmm10010111    √    1/5 *2    -
ldc.l @rm+,r2_bank    (rm) → r2_bank, rm + 4 → rm    0100mmmm10100111    √    1/5 *2    -
ldc.l @rm+,r3_bank    (rm) → r3_bank, rm + 4 → rm    0100mmmm10110111    √
1/5 *2    -
ldc.l @rm+,r4_bank    (rm) → r4_bank, rm + 4 → rm    0100mmmm11000111    √    1/5 *2    -
ldc.l @rm+,r5_bank    (rm) → r5_bank, rm + 4 → rm    0100mmmm11010111    √    1/5 *2    -
ldc.l @rm+,r6_bank    (rm) → r6_bank, rm + 4 → rm    0100mmmm11100111    √    1/5 *2    -
ldc.l @rm+,r7_bank    (rm) → r7_bank, rm + 4 → rm    0100mmmm11110111    √    1/5 *2    -
lds rm,mach    rm → mach    0100mmmm00001010    -    1    -
lds rm,macl    rm → macl    0100mmmm00011010    -    1    -
lds rm,pr    rm → pr    0100mmmm00101010    -    1    -
lds.l @rm+,mach    (rm) → mach, rm + 4 → rm    0100mmmm00000110    -    1    -
lds.l @rm+,macl    (rm) → macl, rm + 4 → rm    0100mmmm00010110    -    1    -
lds.l @rm+,pr    (rm) → pr, rm + 4 → rm    0100mmmm00100110    -    1    -
ldtlb    pteh/ptel → tlb    0000000000111000    √    1    -
nop    no operation    0000000000001001    -    1    -
pref @rn    (rn) → cache    0000nnnn10000011    -    1    -
rte    delayed branch, ssr/spc → sr/pc    0000000000101011    √    4    -
sets    1 → s    0000000001011000    -    1    -
sett    1 → t    0000000000011000    -    1    1
sleep    sleep    0000000000011011    √    4 *3    -
stc sr,rn    sr → rn    0000nnnn00000010    √    1    -
stc gbr,rn    gbr → rn    0000nnnn00010010    -    1    -
stc vbr,rn    vbr → rn    0000nnnn00100010    √    1    -
stc ssr,rn    ssr → rn    0000nnnn00110010    √    1    -
stc spc,rn    spc → rn    0000nnnn01000010    √    1    -
stc r0_bank,rn    r0_bank → rn    0000nnnn10000010    √    1    -
stc r1_bank,rn    r1_bank → rn    0000nnnn10010010    √    1    -
stc r2_bank,rn    r2_bank → rn    0000nnnn10100010    √    1    -
stc r3_bank,rn    r3_bank → rn    0000nnnn10110010    √    1    -
stc r4_bank,rn    r4_bank → rn    0000nnnn11000010    √    1    -
stc r5_bank,rn    r5_bank → rn    0000nnnn11010010    √    1    -
stc r6_bank,rn    r6_bank → rn    0000nnnn11100010    √    1    -
stc r7_bank,rn    r7_bank → rn    0000nnnn11110010    √    1    -
stc.l sr,@-rn    rn-4 → rn, sr → (rn)    0100nnnn00000011    √    1/2 *4    -
stc.l gbr,@-rn    rn-4 → rn, gbr → (rn)    0100nnnn00010011    -    1/2 *4    -
stc.l vbr,@-rn    rn-4 → rn, vbr → (rn)    0100nnnn00100011    √    1/2 *4    -
stc.l ssr,@-rn    rn-4 → rn, ssr → (rn)    0100nnnn00110011    √    1/2 *4    -
stc.l spc,@-rn    rn-4 → rn, spc → (rn)    0100nnnn01000011    √
1/2 *4    -
stc.l r0_bank,@-rn    rn-4 → rn, r0_bank → (rn)    0100nnnn10000011    √    2    -
stc.l r1_bank,@-rn    rn-4 → rn, r1_bank → (rn)    0100nnnn10010011    √    2    -
stc.l r2_bank,@-rn    rn-4 → rn, r2_bank → (rn)    0100nnnn10100011    √    2    -
stc.l r3_bank,@-rn    rn-4 → rn, r3_bank → (rn)    0100nnnn10110011    √    2    -
stc.l r4_bank,@-rn    rn-4 → rn, r4_bank → (rn)    0100nnnn11000011    √    2    -
stc.l r5_bank,@-rn    rn-4 → rn, r5_bank → (rn)    0100nnnn11010011    √    2    -
stc.l r6_bank,@-rn    rn-4 → rn, r6_bank → (rn)    0100nnnn11100011    √    2    -
stc.l r7_bank,@-rn    rn-4 → rn, r7_bank → (rn)    0100nnnn11110011    √    2    -
sts mach,rn    mach → rn    0000nnnn00001010    -    1    -
sts macl,rn    macl → rn    0000nnnn00011010    -    1    -
sts pr,rn    pr → rn    0000nnnn00101010    -    1    -
sts.l mach,@-rn    rn-4 → rn, mach → (rn)    0100nnnn00000010    -    1    -
sts.l macl,@-rn    rn-4 → rn, macl → (rn)    0100nnnn00010010    -    1    -
sts.l pr,@-rn    rn-4 → rn, pr → (rn)    0100nnnn00100010    -    1    -
trapa #imm    pc/sr → spc/ssr, #imm<<2 → tra, 0x160 → expevt, vbr + h'0100 → pc    11000011iiiiiiii    -    6/8 *5    -

notes: this table lists the minimum execution cycles. in practice, the number of execution cycles increases when the instruction fetch is in contention with data access or when the destination register of a load instruction (memory → register) is the same as the register used by the next instruction.
1. three cycles on the sh3-dsp.
2. five cycles on the sh3-dsp.
3. number of cycles before transition to sleep state.
4. two cycles on the sh3-dsp.
5. eight cycles on the sh3-dsp.

7.1.7 floating point instructions (sh-3e only)

table 7-9 floating point instructions

instruction    operation    code    privilege    cycles    t bit
fabs frn    |frn| → frn    1111nnnn01011101    -    1    -
fadd frm,frn    frn + frm → frn    1111nnnnmmmm0000    -    1    -
fcmp/eq frm,frn    (frn == frm)? 1:0 → t    1111nnnnmmmm0100    -    1    comparison result
fcmp/gt frm,frn    (frn > frm)? 1:0 → t    1111nnnnmmmm0101    -    1    comparison result
fdiv frm,frn    frn / frm →
frn    1111nnnnmmmm0011    -    13    -
fldi0 frn    h'00000000 → frn    1111nnnn10001101    -    1    -
fldi1 frn    h'3f800000 → frn    1111nnnn10011101    -    1    -
flds frm,fpul    frm → fpul    1111nnnn00011101    -    1    -
float fpul,frn    (float)fpul → frn    1111nnnn00101101    -    1    -
fmac fr0,frm,frn    fr0 × frm + frn → frn    1111nnnnmmmm1110    -    1    -
fmov frm,frn    frm → frn    1111nnnnmmmm1100    -    1    -
fmov.s @(r0,rm),frn    (r0 + rm) → frn    1111nnnnmmmm0110    -    1    -
fmov.s @rm+,frn    (rm) → frn, rm+4 → rm    1111nnnnmmmm1001    -    1    -
fmov.s @rm,frn    (rm) → frn    1111nnnnmmmm1000    -    1    -
fmov.s frm,@(r0,rn)    frm → (r0 + rn)    1111nnnnmmmm0111    -    1    -
fmov.s frm,@-rn    rn-4 → rn, frm → (rn)    1111nnnnmmmm1011    -    1    -
fmov.s frm,@rn    frm → (rn)    1111nnnnmmmm1010    -    1    -
fmul frm,frn    frn × frm → frn    1111nnnnmmmm0010    -    1    -
fneg frn    -frn → frn    1111nnnn01001101    -    1    -
fsqrt frn    √frn → frn    1111nnnn01101101    -    13    -
fsts fpul,frn    fpul → frn    1111nnnn00001101    -    1    -
fsub frm,frn    frn - frm → frn    1111nnnnmmmm0001    -    1    -
ftrc frm,fpul    (long)frm → fpul    1111nnnn00111101    -    1    -

7.1.8 fpu system register related cpu instructions (sh-3e only)

table 7-10 fpu related cpu instructions

instruction    operation    code    privilege    cycles    t bit
lds rm,fpscr    rm → fpscr    0100nnnn01101010    -    1    -
lds rm,fpul    rm → fpul    0100nnnn01011010    -    1    -
lds.l @rm+,fpscr    @rm → fpscr, rm+4 → rm    0100nnnn01100110    -    1    -
lds.l @rm+,fpul    @rm → fpul, rm+4 → rm    0100nnnn01010110    -    1    -
sts fpscr,rn    fpscr → rn    0000nnnn01101010    -    1    -
sts fpul,rn    fpul → rn    0000nnnn01011010    -    1    -
sts.l fpscr,@-rn    rn-4 → rn, fpscr → @rn    0100nnnn01100010    -    1    -
sts.l fpul,@-rn    rn-4 → rn, fpul → @rn    0100nnnn01010010    -    1    -

7.1.9 cpu instructions that support dsp functions (sh3-dsp only)

several system control instructions have been added to the cpu core instructions to support dsp functions. the rs, re, and mod registers (which support modulo addressing) have been added, and an rc counter has been added to the sr register. ldc and stc instructions have been added to access these. lds and sts instructions have also been added for accessing the dsp registers dsr, a0, x0, x1, y0, and y1.
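the mod register mentioned above holds ms and me for the modulo addressing of section 6.3.3. as a rough illustration of what loading it sets up, the modulo function from section 6.3.4 can be sketched in python. this is a sketch under assumptions, not the hardware definition: the name `modulo_update` is made up here, address registers are treated as 32-bit values, and ms and me are passed as full 16-bit values (h'c008 and h'c00c), whereas the example in section 6.3.3 writes them as h'08 and h'0c.

```python
def modulo_update(addr: int, index: int, ms: int, me: int) -> int:
    """One address-register update in modulo mode.

    Hypothetical helper mirroring 'function modulo' in section 6.3.4:
    if the bottom 16 bits of the address equal me, they are replaced by ms;
    otherwise the index (+2 or the index register value) is added.
    """
    if (addr & 0xFFFF) == (me & 0xFFFF):
        # Only the bottom 16 bits are swapped; the top 16 bits are kept.
        return (addr & 0xFFFF0000) | (ms & 0xFFFF)
    return (addr + index) & 0xFFFFFFFF


if __name__ == "__main__":
    r4 = 0x0000C008
    for _ in range(3):
        r4 = modulo_update(r4, 2, ms=0xC008, me=0xC00C)
        print(f"inc. r4: h'{r4:04x}")
```

with an increment of +2 the pointer steps through h'c00a and h'c00c and then returns to h'c008, as in the example of section 6.3.3. with an index such as +6 it goes from h'c008 to h'c00e without ever matching me, which is the situation the note in section 6.3.3 warns about.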
a setrc instruction has been added for setting the value of the repeat counter (rc) in the sr register (bits 16 to 27). when the operand of the setrc instruction is immediate, 8 bits of immediate data are set in bits 16 to 23 of the sr register and bits 24 to 27 are cleared. when the operand is a register, bits 0 to 11 of the register (12 bits) are set in bits 16 to 27 of the sr register.

in addition to the new ldc instructions, the ldrs and ldre instructions have been added for setting the repeat start address and repeat end address in the rs and re registers. table 7-11 shows the added instructions.

table 7-11 added cpu instructions

instruction    operation    code    cycles    t bit
ldc rm,mod    rm → mod    0100mmmm01011110    3    -
ldc rm,re    rm → re    0100mmmm01111110    3    -
ldc rm,rs    rm → rs    0100mmmm01101110    3    -
ldc.l @rm+,mod    (rm) → mod, rm + 4 → rm    0100mmmm01010111    5    -
ldc.l @rm+,re    (rm) → re, rm + 4 → rm    0100mmmm01110111    5    -
ldc.l @rm+,rs    (rm) → rs, rm + 4 → rm    0100mmmm01100111    5    -
stc mod,rn    mod → rn    0000nnnn01010010    1    -
stc re,rn    re → rn    0000nnnn01110010    1    -
stc rs,rn    rs → rn    0000nnnn01100010    1    -
stc.l mod,@-rn    rn-4 → rn, mod → (rn)    0100nnnn01010011    2    -
stc.l re,@-rn    rn-4 → rn, re → (rn)    0100nnnn01110011    2    -
stc.l rs,@-rn    rn-4 → rn, rs → (rn)    0100nnnn01100011    2    -
lds rm,dsr    rm → dsr    0100mmmm01101010    1    -
lds.l @rm+,dsr    (rm) → dsr, rm + 4 → rm    0100mmmm01100110    1    -
lds rm,a0    rm → a0    0100mmmm01110110    1    -
lds.l @rm+,a0    (rm) → a0, rm + 4 → rm    0100mmmm01100110    1    -
lds rm,x0    rm → x0    0100mmmm01110110    1    -
lds.l @rm+,x0    (rm) → x0, rm + 4 → rm    0100mmmm01100110    1    -
lds rm,x1    rm → x1    0100mmmm01110110    1    -
lds.l @rm+,x1    (rm) → x1, rm + 4 → rm    0100mmmm01100110    1    -
lds rm,y0    rm → y0    0100mmmm01110110    1    -
lds.l @rm+,y0    (rm) → y0, rm + 4 → rm    0100mmmm01100110    1    -
lds rm,y1    rm → y1    0100mmmm01110110    1    -
lds.l @rm+,y1    (rm) → y1, rm + 4 → rm    0100mmmm01100110    1    -
sts dsr,rn    dsr → rn    0000nnnn01101010    1    -
sts.l dsr,@-rn    rn-4 → rn, dsr → (rn)    0100nnnn01100010    1    -
sts a0,rn    a0 → rn    0000nnnn01111010    1    -
sts.l a0,@-rn    rn-4 → rn, a0 →
(rn) 0100nnnn01110010 1 sts x0,rn x0 ? rn 0000nnnn01111010 1 sts.l x0,@-rn rnC4 ? rn , x0 ? (rn) 0100nnnn01110010 1 sts x1,rn x1 ? rn 0000nnnn01111010 1 sts.l x1,@-rn rnC4 ? rn , x1 ? (rn) 0100nnnn01110010 1 112 table 7-11 added cpu instructions (cont) instruction operation code cycles t bit sts y0,rn y0 ? rn 0000nnnn10101010 1 sts.l y0,@-rn rnC4 ? rn , y0 ? (rn) 0100nnnn10100010 1 sts y1,rn y1 ? rn 0000nnnn10111010 1 sts.l y1,@-rn rnC4 ? rn , y1 ? (rn) 0100nnnn10110010 1 setrc rm rm[11:0] ? rc (sr[27:16]) 0100mmmm00010100 3 setrc #imm imm ? rc(sr[23:16]), zeros ? sr[27:24] 10000010iiiiiiii 3 ldrs @(disp,pc) disp 2+pc ? rs 10001100dddddddd 3 ldre @(disp,pc) disp 2+pc ? re 10001110dddddddd 3 7.2 instruction set in alphabetical order table 7-12 alphabetically lists the instruction codes and number of execution cycles for each instruction. table 7-12 instruction set listed alphabetically instruction operation code privilege cycles t bit add #imm,rn rn + imm ? rn 0111nnnniiiiiiii 1 add rm,rn rn + rm ? rn 0011nnnnmmmm1100 1 addc rm,rn rn + rm + t ? rn, carry ? t 0011nnnnmmmm1110 1 carry addv rm,rn rn + rm ? rn, overflow ? t 0011nnnnmmmm1111 1 overflow and #imm,r0 r0 & imm ? r0 11001001iiiiiiii 1 and rm,rn rn & rm ? rn 0010nnnnmmmm1001 1 and.b #imm,@(r0,gbr) (r0 + gbr) & imm ? (r0 + gbr) 11001101iiiiiiii 3 bf label if t = 0, disp + pc ? pc; if t = 1, nop 10001011dddddddd 3/1 * 2 bf/s label if t = 0, disp + pc ? pc; if t = 1, nop 10001111dddddddd 2/1 * 2 bra label delayed branch, disp + pc ? pc 1010dddddddddddd 2 braf rn delayed branch, rn + pc ? pc 0000nnnn00100011 2 113 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit bsr label delayed branch, pc ? pr, disp + pc ? pc 1011dddddddddddd 2 bsrf rn delayed branch, pc ? pr, rn + pc ? pc 0000nnnn00000011 2 bt label if t = 1, disp + pc ? pc; if t = 0, nop 10001001dddddddd 3/1 * 2 bt/s label if t = 1, disp + pc ? pc; if t = 0, nop 10001101dddddddd 2/1 * 2 clrmac 0 ? 
mach, macl 0000000000101000 1 clrs 0 ? s 0000000001001000 1 clrt 0 ? t 0000000000001000 10 cmp/eq #imm,r0 if r0 = imm, 1 ? t 10001000iiiiiiii 1 comparison result cmp/eq rm,rn if rn = rm, 1 ? t 0011nnnnmmmm0000 1 comparison result cmp/ge rm,rn if rn 3 rm with signed data, 1 ? t 0011nnnnmmmm0011 1 comparison result cmp/gt rm,rn if rn > rm with signed data, 1 ? t 0011nnnnmmmm0111 1 comparison result cmp/hi rm,rn if rn > rm with unsigned data, 0011nnnnmmmm0110 1 comparison result cmp/hs rm,rn if rn 3 rm with unsigned data, 1 ? t 0011nnnnmmmm0010 1 comparison result cmp/pl rn if rn>0, 1 ? t 0100nnnn00010101 1 comparison result cmp/pz rn if rn 3 0, 1 ? t 0100nnnn00010001 1 comparison result cmp/str rm,rn if rn and rm have an equivalent byte, 1 ? t 0010nnnnmmmm1100 1 comparison result div0s rm,rn msb of rn ? q, msb of rm ? m, m ^ q ? t 0010nnnnmmmm0111 1 calculation result div0u 0 ? m/q/t 0000000000011001 10 div1 rm,rn single-step division (rn/rm) 0011nnnnmmmm0100 1 calculation result dmuls.l rm,rn signed operation of rn rm ? mach, macl 0011nnnnmmmm1101 2 (to 5) * 1 114 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit dmulu.l rm,rn unsigned operation of rn rm ? mach, macl 0011nnnnmmmm0101 2 (to 5) * 1 dt rn rn - 1 ? rn, when rn is 0, 1 ? t. when rn is nonzero, 0 ? t 0100nnnn00010000 1 comparison result exts.b rm,rn a byte in rm is sign- extended ? rn 0110nnnnmmmm1110 1 exts.w rm,rn a word in rm is sign-extended ? rn 0110nnnnmmmm1111 1 extu.b rm,rn a byte in rm is zero-extended ? rn 0110nnnnmmmm1100 1 extu.w rm,rn a word in rm is zero-extended ? rn 0110nnnnmmmm1101 1 fabs frn * 3 | frn | ? frn 1111nnnn01011101 1 fadd frm ,frn * 3 frn + frm ? frn 1111nnnnmmmm0000 1 fcmp/eq frm ,frn * 3 (frn == frm)? 1:0 ? t 1111nnnnmmmm0100 1 comparison result fcmp/gt frm ,frn * 3 (frn > frm) ? 1:0 ? t 1111nnnnmmmm0101 1 comparison result fdiv frm ,frn * 3 frn /frm ? frn 1111nnnnmmmm0011 13 fldi0 frn * 3 h'00000000 ? 
frn 1111nnnn10001101 1 fldi1 frn * 3 h'3f800000 ? frn 1111nnnn10011101 1 flds frm ,fpul * 3 frm ? fpul 1111nnnn00011101 1 float fpul, frn * 3 (float)fpul ? frn 1111nnnn00101101 1 fmac fr0,frm,frn * 3 fr0 frm + frn ? frn 1111nnnnmmmm1110 1 fmov frm ,frn * 3 frm ? frn 1111nnnnmmmm1100 1 fmov.s @(r0,rm),frn * 3 (r0 + rm) ? frn 1111nnnnmmmm0110 1 fmov.s @rm+,frn * 3 (rm) ? frn,rm + 4 = rm 1111nnnnmmmm1001 1 fmov.s @rm,frn * 3 (rm) ? frn 1111nnnnmmmm1000 1 fmov.s frm,@(r0,rn) * 3 (frm) ? (r0 + rn) 1111nnnnmmmm0111 1 fmov.s frm,@-rn* * 3 rn-4 ? rn, frm ? (rn) 1111nnnnmmmm1011 1 fmov.s frm,@rn * 3 frm ? (rn) 1111nnnnmmmm1010 1 fmul frm,frn * 3 frn frm ? frn 1111nnnnmmmm0010 1 115 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit fneg frn * 3 Cfrn ? frn 1111nnnn01001101 1 fsqrt frn * 3 ? frn ? frn 1111nnnn01101101 13 fsts fpul,frn * 3 fpul ? frn 1111nnnn00001101 1 fsub frm,frn * 3 frn C frm ? frn 1111nnnnmmmm0001 1 ftrc frm,fpul * 3 (long)frm ? fpul 1111nnnn00111101 1 jmp @rn delayed branch, rn ? pc 0100nnnn00101011 2 jsr @rn delayed branch, pc ? pr, rn ? pc 0100nnnn00001011 2 ldc rm,gbr rm ? gbr 0100mmmm00011110 1/3 * 4 ldc rm,sr rm ? sr 0100mmmm00001110 ? 5 lsb ldc rm,vbr rm ? vbr 0100mmmm00101110 ? 1/3 * 4 ldc rm,ssr rm ? ssr 0100mmmm00111110 ? 1/3 * 4 ldc rm,spc rm ? spc 0100mmmm01001110 ? 1/3 * 4 ldc rm,mod * 9 rm ? mod 0100mmmm01011110 ? 3 ldc rm,re * 9 rm ? re 0100mmmm01101110 ? 3 ldc rm,rs * 9 rm ? rs 0100mmmm01101110 ? 3 ldc rm,r0_bank rm ? r0_bank 0100mmmm10001110 ? 1/3 * 4 ldc rm,r1_bank rm ? r1_bank 0100mmmm10011110 ? 1/3 * 4 ldc rm,r2_bank rm ? r2_bank 0100mmmm10101110 ? 1/3 * 4 ldc rm,r3_bank rm ? r3_bank 0100mmmm10111110 ? 1/3 * 4 ldc rm,r4_bank rm ? r4_bank 0100mmmm11001110 ? 1/3 * 4 ldc rm,r5_bank rm ? r5_bank 0100mmmm11011110 ? 1/3 * 4 ldc rm,r6_bank rm ? r6_bank 0100mmmm11101110 ? 1/3 * 4 ldc rm,r7_bank rm ? r7_bank 0100mmmm11111110 ? 1/3 * 4 ldc.l @rm+,gbr (rm) ? gbr, rm + 4 ? 
rm 0100mmmm00010111 1/5 * 5 ldc.l @rm+,sr (rm) ? sr, rm + 4 ? rm 0100mmmm00000111 ? 7 lsb ldc.l @rm+,vbr (rm) ? vbr, rm + 4 ? rm 0100mmmm00100111 ? 1/5 * 5 ldc.l @rm+,ssr (rm) ? ssr, rm + 4 ? rm 0100mmmm00110111 ? 1/5 * 5 ldc.l @rm+,spc (rm) ? spc, rm + 4 ? rm 0100mmmm01000111 ? 1/5 * 5 ldc.l @rm+,mod * 9 (rm) ? mod,rm + 4 ? rm 0100mmmm01010111 ? 5 ldc.l @rm+,re * 9 (rm) ? re,rm + 4 ? rm 0100mmmm01110111 ? 5 ldc.l @rm+,rs * 9 (rm) ? rs,rm + 4 ? rm 0100mmmm01100111 ? 5 116 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit ldc.l @rm+,r0_bank (rm) ? r0_bank, rm + 4 ? rm 0100mmmm10000111 ? 1/5 * 5 ldc.l @rm+,r1_bank (rm) ? r1_bank, rm + 4 ? rm 0100mmmm10010111 ? 1/5 * 5 ldc.l @rm+,r2_bank (rm) ? r2_bank, rm + 4 ? rm 0100mmmm10100111 ? 1/5 * 5 ldc.l @rm+,r3_bank (rm) ? r3_bank, rm + 4 ? rm 0100mmmm10110111 ? 1/5 * 5 ldc.l @rm+,r4_bank (rm) ? r4_bank, rm + 4 ? rm 0100mmmm11000111 ? 1/5 * 5 ldc.l @rm+,r5_bank (rm) ? r5_bank, rm + 4 ? rm 0100mmmm11010111 ? 1/5 * 5 ldc.l @rm+,r6_bank (rm) ? r6_bank, rm + 4 ? rm 0100mmmm11100111 ? 1/5 * 5 ldc.l @rm+,r7_bank (rm) ? r7_bank, rm + 4 ? rm 0100mmmm11110111 ? 1/5 * 5 ldre @(disp,pc) * 9 disp 2 + pc ? re 10001110dddddddd 3 ldrs @(disp,pc) * 9 disp 2 + pc ? rs 10001100dddddddd 3 lds rm,fpscr * 3 rm ? fpscr 0100nnnn01101010 1 lds rm,fpul * 3 rm ? fpul 0100nnnn01011010 1 lds rm,mach rm ? mach 0100mmmm00001010 1 lds rm,macl rm ? macl 0100mmmm00011010 1 lds rm,pr rm ? pr 0100mmmm00101010 1 lds rm,a0 * 9 rm ? dsr 0100mmmm01101010 1 lds rm,dsr * 9 rm ? a0 0100mmmm01111010 1 lds rm,x0 * 9 rm ? x0 0100mmmm10001010 1 lds rm,x1 * 9 rm ? x1 0100mmmm10011010 1 lds rm,y0 * 9 rm ? y0 0100mmmm10101010 1 lds rm,y1 * 9 rm ? y1 0100mmmm10111010 1 lds.l @rm+ ,fpscr * 3 @rm ? fpscr , rm+4 ? rn 0100nnnn01100110 1 lds.l @rm+ ,fpul * 3 @rm ? fpul , rm+4 ? rn 0100nnnn01010110 1 lds.l @rm+,mach (rm) ? mach, rm + 4 ? rm 0100mmmm00000110 1 lds.l @rm+,macl (rm) ? macl, rm + 4 ? 
rm 0100mmmm00010110 1 117 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit lds.l @rm+,pr (rm) ? pr, rm + 4 ? rm 0100mmmm00100110 1 lds.l @rm+,dsr * 9 (rm) ? dsr, rm+4 ? rm 0100mmmm01100110 1 lds.l @rm+,a0 * 9 (rm) ? a0, rm+4 ? rm 0100mmmm01110110 1 lds.l @rm+,x0 * 9 (rm) ? x0, rm+4 ? rm 0100mmmm10000110 1 lds.l @rm+,x1 * 9 (rm) ? x1, rm+4 ? rm 0100mmmm10010110 1 lds.l @rm+,y0 * 9 (rm) ? y0, rm+4 ? rm 0100mmmm10100110 1 lds.l @rm+,y1 * 9 (rm) ? y1, rm+4 ? rm 0100mmmm10110110 1 ldtlb pteh/ptel ? tlb 0000000000111000 ? 1 mac.l @rm+,@rn+ signed operation of (rn) (rm) + mac ? mac 0000nnnnmmmm1111 2 (to 5) * 1 mac.w @rm+,@rn+ signed operation of (rn) (rm) + mac ? mac 0100nnnnmmmm1111 2 (to 5) * 1 mov #imm,rn #imm ? sign extension ? rn 1110nnnniiiiiiii 1 mov rm,rn rm ? rn 0110nnnnmmmm0011 1 mov.b @(disp,gbr),r0 (disp + gbr) ? sign extension ? r0 11000100dddddddd 1 mov.b @(disp,rm),r0 (disp + rm) ? sign extension ? r0 10000100mmmmdddd 1 mov.b @(r0,rm),rn (r0 + rm) ? sign extension ? rn 0000nnnnmmmm1100 1 mov.b @rm+,rn (rm) ? sign extension ? rn, rm + 1 ? rm 0110nnnnmmmm0100 1 mov.b @rm,rn (rm) ? sign extension ? rn 0110nnnnmmmm0000 1 mov.b r0,@(disp,gbr) r0 ? (disp + gbr) 11000000dddddddd 1 mov.b r0,@(disp,rn) r0 ? (disp + rn) 10000000nnnndddd 1 mov.b rm,@(r0,rn) rm ? (r0 + rn) 0000nnnnmmmm0100 1 mov.b rm,@Crn rnC1 ? rn, rm ? (rn) 0010nnnnmmmm0100 1 mov.b rm,@rn rm ? (rn) 0010nnnnmmmm0000 1 118 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit mov.l @(disp,gbr),r0 (disp + gbr) ? r0 11000110dddddddd 1 mov.l @(disp,pc),rn (disp + pc) ? rn 1101nnnndddddddd 1 mov.l @(disp,rm),rn (disp + rm) ? rn 0101nnnnmmmmdddd 1 mov.l @(r0,rm),rn (r0 + rm) ? rn 0000nnnnmmmm1110 1 mov.l @rm+,rn (rm) ? rn, rm + 4 ? rm 0110nnnnmmmm0110 1 mov.l @rm,rn (rm) ? rn 0110nnnnmmmm0010 1 mov.l r0,@(disp,gbr) r0 ? (disp + gbr) 11000010dddddddd 1 mov.l rm,@(disp,rn) rm ? 
(disp + rn) 0001nnnnmmmmdddd 1 mov.l rm,@(r0,rn) rm ? (r0 + rn) 0000nnnnmmmm0110 1 mov.l rm,@Crn rnC4 ? rn, rm ? (rn) 0010nnnnmmmm0110 1 mov.l rm,@rn rm ? (rn) 0010nnnnmmmm0010 1 mov.w @(disp,gbr),r0 (disp + gbr) ? sign extension ? r0 11000101dddddddd 1 mov.w @(disp,pc),rn (disp + pc) ? sign extension ? rn 1001nnnndddddddd 1 mov.w @(disp,rm),r0 (disp + rm) ? sign extension ? r0 10000101mmmmdddd 1 mov.w @(r0,rm),rn (r0 + rm) ? sign extension ? rn 0000nnnnmmmm1101 1 mov.w @rm+,rn (rm) ? sign extension ? rn, rm + 2 ? rm 0110nnnnmmmm0101 1 mov.w @rm,rn (rm) ? sign extension ? rn 0110nnnnmmmm0001 1 mov.w r0,@(disp,gbr) r0 ? (disp + gbr) 11000001dddddddd 1 mov.w r0,@(disp,rn) r0 ? (disp + rn) 10000001nnnndddd 1 mov.w rm,@(r0,rn) rm ? (r0 + rn) 0000nnnnmmmm0101 1 mov.w rm,@Crn rnC2 ? rn, rm ? (rn) 0010nnnnmmmm0101 1 mov.w rm,@rn rm ? (rn) 0010nnnnmmmm0001 1 mova @(disp,pc),r0 disp + pc ? r0 11000111dddddddd 1 movt rn t ? rn 0000nnnn00101001 1 mul.l rm,rn rn rm ? mac 0000nnnnmmmm0111 2 (to 5) * 1 muls.w rm,rn signed operation of rn rm ? mac 0010nnnnmmmm1111 1 (to 3) * 1 119 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit mulu.w rm,rn unsigned operation of rn rm ? mac 0010nnnnmmmm1110 1 (to 3) * 1 neg rm,rn 0Crm ? rn 0110nnnnmmmm1011 1 negc rm,rn 0CrmCt ? rn, borrow ? t 0110nnnnmmmm1010 1 borrow nop no operation 0000000000001001 1 not rm,rn ~rm ? rn 0110nnnnmmmm0111 1 or #imm,r0 r0 | imm ? r0 11001011iiiiiiii 1 or rm,rn rn | rm ? rn 0010nnnnmmmm1011 1 or.b #imm, @(r0,gbr) (r0 + gbr) | imm ? (r0 + gbr) 11001111iiiiiiii 3 pref @rn (rn) ? cache 0000nnnn10000011 1/2 * 6 rotcl rn t ? rn ? t 0100nnnn00100100 1 msb rotcr rn t ? rn ? t 0100nnnn00100101 1 lsb rotl rn t ? rn ? msb 0100nnnn00000100 1 msb rotr rn lsb ? rn ? t 0100nnnn00000101 1 lsb rte delayed branch, ssr/spc ? sr/pc 0000000000101011 ? 4 rts delayed branch, pr ? pc 0000000000001011 2 setrc rm * 9 12 lower bits of rm ? rc (sr bits 27 to 16), repeat control flag ? 
rf1, rf0 0100mmmm00010100 3 setrc #imm * 9 imm ? rc (sr bits 23 to 16), repeat control flag ? rf1, rf0 10000010iiiiiiii 3 sets 1 ? s 0000000001011000 1 sett 1 ? t 0000000000011000 11 shad rm,rn rn 3 0; rn << rm ? rn rn < 0; rn >> rm ? (msb ? )rn 0100nnnnmmmm1100 1 shal rn t ? rn ? 0 0100nnnn00100000 1 msb shar rn msb ? rn ? t 0100nnnn00100001 1 lsb shld rm,rn rn 3 0; rn << rm ? rn rn < 0; rn >> rm ? (0 ? )rn 0100nnnnmmmm1101 1 shll rn t ? rn ? 0 0100nnnn00000000 1 msb 120 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit shll2 rn rn << 2 ? rn 0100nnnn00001000 1 shll8 rn rn << 8 ? rn 0100nnnn00011000 1 shll16 rn rn << 16 ? rn 0100nnnn00101000 1 shlr rn 0 ? rn ? t 0100nnnn00000001 1 lsb shlr2 rn rn>>2 ? rn 0100nnnn00001001 1 shlr8 rn rn>>8 ? rn 0100nnnn00011001 1 shlr16 rn rn>>16 ? rn 0100nnnn00101001 1 sleep sleep 0000000000011011 ? 4 stc gbr,rn gbr ? rn 0000nnnn00010010 1 stc sr,rn sr ? rn 0000nnnn00000010 ? 1 stc vbr,rn vbr ? rn 0000nnnn00100010 ? 1 stc ssr,rn ssr ? rn 0000nnnn00110010 ? 1 stc spc,rn spc ? rn 0000nnnn01000010 ? 1 stc mod,rn * 9 mod ? rn 0000nnnn01010010 1 stc re,rn * 9 re ? rn 0000nnnn01110010 1 stc rs,rn * 9 rs ? rn 0000nnnn01100010 1 stc r0_bank,rn r0_bank ? rn 0000nnnn10000010 ? 1 stc r1_bank,rn r1_bank ? rn 0000nnnn10010010 ? 1 stc r2_bank,rn r2_bank ? rn 0000nnnn10100010 ? 1 stc r3_bank,rn r3_bank ? rn 0000nnnn10110010 ? 1 stc r4_bank,rn r4_bank ? rn 0000nnnn11000010 ? 1 stc r5_bank,rn r5_bank ? rn 0000nnnn11010010 ? 1 stc r6_bank,rn r6_bank ? rn 0000nnnn11100010 ? 1 stc r7_bank,rn r7_bank ? rn 0000nnnn11110010 ? 1 stc.l gbr,@Crn rnC4 ? rn, gbr ? (rn) 0100nnnn00010011 1/2 * 6 stc.l sr,@Crn rnC4 ? rn, sr ? (rn) 0100nnnn00000011 ? 1/2 * 6 stc.l vbr,@Crn rnC4 ? rn, vbr ? (rn) 0100nnnn00100011 ? 1/2 * 6 stc.l ssr,@Crn rnC4 ? rn, ssr ? (rn) 0100nnnn00110011 ? 1/2 * 6 stc.l spc,@Crn rnC4 ? rn, spc ? (rn) 0100nnnn01000011 ? 1/2 * 6 stc.l mod,@-rn * 9 rnC4 ? rn, mod ? (rn) 0100nnnn01010011 ? 
2 stc.l re,@-rn * 9 rnC4 ? rn, re ? (rn) 0100nnnn01110011 ? 2 121 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit stc.l rs,@-rn * 9 rnC4 ? rn, rs ? (rn) 0100nnnn01100011 ? 2 stc.l r0_bank,@Crn rnC4 ? rn, r0_bank ? (rn) 0100nnnn10000011 ? 2 stc.l r1_bank,@Crn rnC4 ? rn, r1_bank ? (rn) 0100nnnn10010011 ? 2 stc.l r2_bank,@Crn rnC4 ? rn, r2_bank ? (rn) 0100nnnn10100011 ? 2 stc.l r3_bank,@Crn rnC4 ? rn, r3_bank ? (rn) 0100nnnn10110011 ? 2 stc.l r4_bank,@Crn rnC4 ? rn, r4_bank ? (rn) 0100nnnn11000011 ? 2 stc.l r5_bank,@Crn rnC4 ? rn, r5_bank ? (rn) 0100nnnn11010011 ? 2 stc.l r6_bank,@Crn rnC4 ? rn, r6_bank ? (rn) 0100nnnn11100011 ? 2 stc.l r7_bank,@Crn rnC4 ? rn, r7_bank ? (rn) 0100nnnn11110011 ? 2 sts fpscr, rn * 3 fpscr ? rn 0000nnnn01101010 1 sts fpul, rn * 3 fpul ? rn 0000nnnn01011010 1 sts mach,rn mach ? rn 0000nnnn00001010 1 sts macl,rn macl ? rn 0000nnnn00011010 1 sts pr,rn pr ? rn 0000nnnn00101010 1 sts dsr,rn * 9 dsr ? rn 0000nnnn01101010 1 sts a0,rn * 9 a0 ? rn 0000nnnn01111010 1 sts x0,rn * 9 x0 ? rn 0000nnnn10001010 1 sts x1,rn * 9 x1 ? rn 0000nnnn10011010 1 sts y0,rn * 9 y0 ? rn 0000nnnn10101010 1 sts y1,rn * 9 y1 ? rn 0000nnnn10111010 1 sts.l fpscr,@-rn * 3 rn-4 ? rn, fpscr ? @rn 0100nnnn01100010 1 sts.l fpul,@-rn * 3 rn-4 ? rn, fpul ? @rn 0100nnnn01010010 1 sts.l mach,@Crn rnC4 ? rn, mach ? (rn) 0100nnnn00000010 1 sts.l macl,@Crn rnC4 ? rn, macl ? (rn) 0100nnnn00010010 1 sts.l pr,@Crn rnC4 ? rn, pr ? (rn) 0100nnnn00100010 1 sts.l dsr,@-rn * 9 rnC4 ? rn, dsr ? (rn) 0100nnnn01100010 1 122 table 7-12 instruction set listed alphabetically (cont) instruction operation code privilege cycles t bit sts.l a0,@-rn * 9 rnC4 ? rn, a0 ? (rn) 0100nnnn01110010 1 sts.l x0,@-rn * 9 rnC4 ? rn, x0 ? (rn) 0100nnnn10000010 1 sts.l x1,@-rn * 9 rnC4 ? rn, x1 ? (rn) 0100nnnn10010010 1 sts.l y0,@-rn * 9 rnC4 ? rn, y0 ? (rn) 0100nnnn10100010 1 sts.l y1,@-rn * 9 rnC4 ? rn, y1 ? (rn) 0100nnnn10110010 1 sub rm,rn rnCrm ? 
rn 0011nnnnmmmm1000 1 subc rm,rn rnCrmCt ? rn, borrow ? t 0011nnnnmmmm1010 1 borrow subv rm,rn rnCrm ? rn, underflow ? t 0011nnnnmmmm1011 1 under- flow swap.b rm,rn rm ? swap the two lowest-order bytes ? rn 0110nnnnmmmm1000 1 swap.w rm,rn rm ? swap two consecutive words ? rn 0110nnnnmmmm1001 1 tas.b @rn if (rn) is 0, 1 ? t; 1 ? msb of (rn) 0100nnnn00011011 3/4 * 7 test result trapa #imm pc/sr ? spc/ssr, (#imm) <<2 ? tra vbr + h'0100 ? pc 11000011iiiiiiii 6/8 * 8 tst #imm,r0 r0 & imm; if the result is 0, 1 ? t 11001000iiiiiiii 1 test result tst rm,rn rn & rm; if the result is 0, 1 ? t 0010nnnnmmmm1000 1 test result tst.b #imm, @(r0,gbr) (r0 + gbr) & imm; if the result is 0, 1 ? t 11001100iiiiiiii 3 test result xor #imm,r0 r0 ^ imm ? r0 11001010iiiiiiii 1 xor rm,rn rn ^ rm ? rn 0010nnnnmmmm1010 1 xor.b #imm, @(r0,gbr) (r0 + gbr) ^ imm ? (r0 + gbr) 11001110iiiiiiii 3 xtrct rm,rn rm: middle 32 bits of rn ? rn 0010nnnnmmmm1101 1 notes: 1. the normal minimum number of execution cycles. the number in parentheses is the number of cycles when there is contention with following instructions. 2. one state when it does not branch. 3. indicates floating point instructions and fpu related cpu instructions. these instructions can only be used with the sh-3e. 4. three cycles on the sh3-dsp. 5. five cycles on the sh3-dsp. 6. two cycles on the sh3-dsp. 7. four cycles on the sh3-dsp. 8. eight cycles on the sh3-dsp. 9. cpu instructions to provide support for dsp functions. these instructions can only be used with the sh3-dsp. 123 7.3 dsp data transfer instruction set (sh3-dsp only) table 7-13 shows the dsp data transfer instructions by category. table 7-13 dsp data transfer instruction categories category instruction types operation code function no. 
of instructions double data transfer instructions 4 nopx x memory no operation 14 movx x memory data transfer nopy y memory no operation movy y memory data transfer single data transfer instructions 1 movs single data transfer 16 total 5 total 30 the data transfer instructions are divided into two groups, double data transfers and single data transfers. double data transfers are combined with dsp operation instructions to create dsp parallel processing instructions. parallel processing instructions are 32 bits long and include a double data transfer instruction in field a. double data transfers that are not parallel processing instructions and single data transfer instructions are 16 bits long. in double data transfers, x memory and y memory can be accessed simultaneously in parallel. one instruction is specified each for the respective x and y memory data accesses. the ax pointer is used for accessing x memory; the ay pointer is used for accessing y memory. double data transfers can only access x and y memory. single data transfers can be accessed from any area. in single data transfers, the ax pointer and two other pointers are used as the as pointer. 124 7.3.1 double data transfer instructions (x memory data) table 7-14 double data transfer instructions (x memory data) instruction operation code cycles t bit nopx no operation 1111000*0*0*00** 1 movx.w @ax,dx (ax) ? msw of dx,0 ? lsw of dx 111100a*d*0*01** 1 movx.w @ax+,dx (ax) ? msw of dx,0 ? lsw of dx,ax+2 ? ax 111100a*d*0*10** 1 movx.w @ax+ix,dx (ax) ? msw of dx,0 ? lsw of dx,ax+ix ? ax 111100a*d*0*11** 1 movx.w da,@ax msw of da ? (ax) 111100a*d*1*01** 1 movx.w da,@ax+ msw of da ? (ax),ax+2 ? ax 111100a*d*1*10** 1 movx.w da,@ax+ix msw of da ? (ax),ax+ix ? ax 111100a*d*1*11** 1 7.3.2 double data transfer instructions (y memory data) table 7-15 double data transfer instructions (y memory data) instruction operation code cycles t bit nopy no operation 111100*0*0*0**00 1 movy.w @ay,dy (ay) ? msw of dy,0 ? 
lsw of dy 111100*a*d*0**01 1 movy.w @ay+,dy (ay) ? msw of dy,0 ? lsw of dy, ay+2 ? ay 111100*a*d*0**10 1 movy.w @ay+iy,dy (ay) ? msw of dy,0 ? lsw of dy, ay+iy ? ay 111100*a*d*0**11 1 movy.w da,@ay msw of da ? (ay) 111100*a*d*1**01 1 movy.w da,@ay+ msw of da ? (ay),ay+2 ? ay 111100*a*d*1**10 1 movy.w da,@ay+iy msw of da ? (ay),ay+iy ? ay 111100*a*d*1**11 1 125 7.3.3 single data transfer instructions table 7-16 single data transfer instructions instruction operation code cycles t bit movs.w @-as,ds asC2 ? as,(as) ? msw of ds,0 ? lsw of ds 111101aadddd0000 1 movs.w @as,ds (as) ? msw of ds,0 ? lsw of ds 111101aadddd0100 1 movs.w @as+,ds (as) ? msw of ds,0 ? lsw of ds, as+2 ? as 111101aadddd1000 1 movs.w @as+ix,ds (as) ? msw of ds,0 ? lsw of ds, as+ix ? as 111101aadddd1100 1 movs.w ds,@-as asC2 ? as,msw of ds ? (as)* 111101aadddd0001 1 movs.w ds,@as msw of ds ? (as)* 111101aadddd0101 1 movs.w ds,@as+ msw of ds ? (as),as+2 ? as* 111101aadddd1001 1 movs.w ds,@as+is msw of ds ? (as),as+is ? as* 111101aadddd1101 1 movs.l @-as,ds asC4 ? as,(as) ? ds 111101aadddd0010 1 movs.l @as,ds (as) ? ds 111101aadddd0110 1 movs.l @as+,ds (as) ? ds,as+4 ? as 111101aadddd1010 1 movs.l @as+is,ds (as) ? ds,as+is ? as 111101aadddd1110 1 movs.l ds, @-as asC4 ? as,ds ? (as) 111101aadddd0011 1 movs.l ds,@as ds ? (as) 111101aadddd0111 1 movs.l ds,@as+ ds ? (as),as+4 ? as 111101aadddd1011 1 movs.l ds,@as+is ds ? (as),as+is ? as 111101aadddd1111 1 note: * when guard bit registers a0g and a1g are specified for the source operand ds, data is output to the ldb[7:0] bus and the sign bit is output to the top bits [31:8]. 126 table 7-17 lists the correspondence between dsp data transfer operands and registers. cpu core registers are used as pointer addresses to indicate memory addresses. 
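As a rough illustration of a CPU core register serving as the As pointer, the following C sketch models the MOVS.W @As+,Ds operation from table 7-16: the word at (As) goes to the MSW of Ds, the LSW of Ds is cleared, and As is advanced by 2. The model_t structure, the 64-byte memory, and the big-endian byte order are assumptions made for this example only, and the restriction of As to R2 through R5 follows the (As0) to (As3) labels in table 7-17.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model state for illustration: 16 general registers and a
 * small byte-addressed memory. */
typedef struct {
    uint32_t r[16];
    uint8_t  mem[64];
} model_t;

/* Read a 16-bit word; big-endian byte order is an assumption of this sketch. */
static uint16_t load_word(const model_t *s, uint32_t addr)
{
    assert(addr % 2 == 0 && addr + 1 < sizeof s->mem);
    return (uint16_t)((s->mem[addr] << 8) | s->mem[addr + 1]);
}

/* MOVS.W @As+,Ds : (As) -> MSW of Ds, 0 -> LSW of Ds, As + 2 -> As.
 * as_reg is the number of the CPU register used as the As pointer.
 * Returns the new 32-bit value of Ds. */
uint32_t movs_w_postinc(model_t *s, unsigned as_reg)
{
    uint32_t ds;

    assert(as_reg >= 2 && as_reg <= 5);
    ds = (uint32_t)load_word(s, s->r[as_reg]) << 16; /* MSW loaded, LSW zero */
    s->r[as_reg] += 2;                               /* post-increment */
    return ds;
}
```

With R4 pointing at address 8 and the bytes H'12, H'34 stored there, the call returns H'12340000 and leaves R4 at 10.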
Table 7-17 Correspondence between DSP Data Transfer Operands and Registers

Operand  SuperH (CPU core) registers  DSP registers
Ax  R4 (Ax0), R5 (Ax1)  -
Ix (Is)  R8 (Ix)  -
Dx  -  X0, X1
Ay  R6 (Ay0), R7 (Ay1)  -
Iy  R9 (Iy)  -
Dy  -  Y0, Y1
Da  -  A0, A1
As  R2 (As2), R3 (As3), R4 (As0), R5 (As1)  -
Ds  -  X0, X1, Y0, Y1, M0, M1, A0, A1, A0G, A1G

Note: The registers listed are those that can be set for each operand. R0 and R1 cannot be set for any operand.

7.4 DSP Operation Instruction Set (SH3-DSP Only)

DSP operation instructions are digital signal processing instructions that are processed by the DSP unit. Their instruction code is 32 bits long. Multiple instructions can be processed in parallel. The instruction code is divided into two fields, A and B. Field A specifies a parallel data transfer instruction and field B specifies a single or double data operation instruction. Instructions can be specified independently, and their execution is independent and in parallel. Parallel data transfer instructions specified in field A are exactly the same as double data transfer instructions.

The data operation instructions of field B are of three types: double data operation instructions, conditional single data operation instructions, and unconditional single data operation instructions. Table 7-18 shows the format of DSP operation instructions. The operands are selected independently from the DSP registers. Table 7-19 shows the correspondence of DSP operation instruction operands and registers.

Table 7-18 Instruction Formats for DSP Operation Instructions

Double data operation instructions (6 operands)
  Forms: ALUop. Sx, Sy, Du  MLTop. Se, Sf, Dg
  Instructions: PADD PMULS, PSUB PMULS

Conditional single data operation instructions
  3 operands
    Forms: ALUop. Sx, Sy, Dz; DCT ALUop. Sx, Sy, Dz; DCF ALUop. Sx, Sy, Dz
    Instructions: PADD, PAND, POR, PSHA, PSHL, PSUB, PXOR
  2 operands
    Forms: ALUop. Sx, Dz; DCT ALUop. Sx, Dz; DCF ALUop. Sx, Dz; ALUop. Sy, Dz; DCT ALUop. Sy, Dz; DCF ALUop. Sy, Dz
    Instructions: PCOPY, PDEC, PDMSB, PINC, PLDS, PSTS, PNEG
  1 operand
    Forms: ALUop. Dz; DCT ALUop. Dz; DCF ALUop. Dz
    Instructions: PCLR, PSHA #imm, PSHL #imm

Unconditional single data operation instructions
  3 operands
    Forms: ALUop. Sx, Sy, Du; MLTop. Se, Sf, Dg
    Instructions: PADDC, PSUBC, PWADD, PWSB, PMULS
  2 operands
    Forms: ALUop. Sx, Dz; ALUop. Sy, Dz
    Instructions: PCMP, PABS, PRND

Table 7-19 Correspondence between DSP Operation Instruction Operands and Registers

          ALU and BPU instructions   Multiplication instructions
Register  Sx   Sy   Dz   Du          Se   Sf   Dg
A0        yes  -    yes  yes         -    -    yes
A1        yes  -    yes  yes         yes  yes  yes
M0        -    yes  yes  -           -    -    yes
M1        -    yes  yes  -           -    -    yes
X0        yes  -    yes  yes         yes  yes  -
X1        yes  -    yes  -           yes  -    -
Y0        -    yes  yes  yes         yes  yes  -
Y1        -    yes  yes  -           -    yes  -

When writing parallel instructions, first write the field B instruction, then the field A instruction. The following is an example of a parallel processing program.

    PADD A0,M0,A0  PMULS X0,Y0,M0  MOVX.W @R4+,X0  MOVY.W @R6+,Y0 [;]
    DCF PINC X1,A1  MOVX.W A0,@R5+R8  MOVY.W @R7+,Y0 [;]
    PCMP X1,M0  MOVX.W @R4 [NOPY] [;]

Text in brackets ([]) can be omitted. The no operation instructions NOPX and NOPY can be omitted. Semicolons (;) are used to demarcate instruction lines, but can be omitted. If semicolons are used, the space after the semicolon can be used for comments.

The individual status codes (DC, N, Z, V, GT) of the DSR register are always updated by unconditional ALU operation instructions and shift operation instructions. Conditional instructions do not update the status codes, even if the conditions have been met. Multiplication instructions also do not update the status codes. DC bit definitions are determined by the specifications of the CS bits in the DSR register.

Table 7-20 shows the DSP operation instructions by category.

Table 7-20 DSP Operation Instruction Categories

ALU arithmetic operation instructions
  ALU fixed decimal point operation instructions (11 instruction types, 28 instructions)
    PABS  absolute value operation
    PADD  addition
    PADD PMULS  addition and signed multiplication
    PADDC  addition with carry
    PCLR  clear
    PCMP  compare
    PCOPY  copy
    PNEG  invert sign
    PSUB  subtraction
    PSUB PMULS  subtraction and signed multiplication
    PSUBC  subtraction with borrow
  ALU integer operation instructions (2 instruction types, 12 instructions)
    PDEC  decrement
    PINC  increment
  MSB detection instruction (1 instruction type, 6 instructions)
    PDMSB  MSB detection
  Rounding operation instruction (1 instruction type, 2 instructions)
    PRND  rounding
ALU logical operation instructions (3 instruction types, 9 instructions)
    PAND  logical AND
    POR  logical OR
    PXOR  logical exclusive OR
Fixed decimal point multiplication instruction (1 instruction type, 1 instruction)
    PMULS  signed multiplication
Shift
  Arithmetic shift operation instruction (1 instruction type, 4 instructions)
    PSHA  arithmetic shift
  Logical shift operation instruction (1 instruction type, 4 instructions)
    PSHL  logical shift
System control instructions (2 instruction types, 12 instructions)
    PLDS  system register load
    PSTS  store from system register
Total: 23 instruction types, 78 instructions

7.4.1 ALU Arithmetic Operation Instructions

ALU Fixed Decimal Point Operation Instructions

Table 7-21 ALU Fixed Decimal Point Operation Instructions

Instruction  Operation  Code  Cycles  DC bit
PABS Sx,Dz  if Sx ≥ 0, Sx → Dz; if Sx < 0, 0 - Sx → Dz  111110********** 10001000xx00zzzz  1  update
PABS Sy,Dz  if Sy ≥ 0, Sy → Dz; if Sy < 0, 0 - Sy → Dz  111110********** 1010100000yyzzzz  1  update
PADD Sx,Sy,Dz  Sx + Sy → Dz  111110********** 10110001xxyyzzzz  1  update
DCT PADD Sx,Sy,Dz  if DC = 1, Sx + Sy → Dz; if 0, nop  111110********** 10110010xxyyzzzz  1  -
DCF PADD Sx,Sy,Dz  if DC = 0, Sx + Sy → Dz; if 1, nop  111110********** 10110011xxyyzzzz  1  -
PADD Sx,Sy,Du PMULS Se,Sf,Dg  Sx + Sy → Du; MSW of Se × MSW of Sf → Dg  111110********** 0111eeffxxyygguu  1  update
PADDC Sx,Sy,Dz  Sx + Sy + DC → Dz  111110********** 10110000xxyyzzzz  1  update
PCLR Dz  H'00000000 → Dz  111110********** 100011010000zzzz  1  update
DCT PCLR Dz  if DC = 1, H'00000000 → Dz; if 0, nop  111110********** 100011100000zzzz  1  -
DCF PCLR Dz  if DC = 0, H'00000000 → Dz; if 1, nop  111110********** 100011110000zzzz  1  -
PCMP Sx,Sy  Sx - Sy  111110********** 10000100xxyy0000  1  update
PCOPY Sx,Dz  Sx → Dz  111110********** 11011001xx00zzzz  1  update
PCOPY Sy,Dz  Sy → Dz  111110********** 1111100100yyzzzz  1  update
DCT PCOPY Sx,Dz  if DC = 1, Sx → Dz; if 0, nop  111110********** 11011010xx00zzzz  1  -
DCT PCOPY Sy,Dz  if DC = 1, Sy → Dz; if 0, nop  111110********** 1111101000yyzzzz  1  -
DCF PCOPY Sx,Dz  if DC = 0, Sx → Dz; if 1, nop  111110********** 11011011xx00zzzz  1  -
DCF PCOPY Sy,Dz  if DC = 0, Sy → Dz; if 1, nop  111110********** 1111101100yyzzzz  1  -
PNEG Sx,Dz  0 - Sx → Dz  111110********** 11001001xx00zzzz  1  update
PNEG Sy,Dz  0 - Sy → Dz  111110********** 1110100100yyzzzz  1  update
DCT PNEG Sx,Dz  if DC = 1, 0 - Sx → Dz; if 0, nop  111110********** 11001010xx00zzzz  1  -
DCT PNEG Sy,Dz  if DC = 1, 0 - Sy → Dz; if 0, nop  111110********** 1110101000yyzzzz  1  -
DCF PNEG Sx,Dz  if DC = 0, 0 - Sx → Dz; if 1, nop  111110********** 11001011xx00zzzz  1  -
DCF PNEG Sy,Dz  if DC = 0, 0 - Sy → Dz; if 1, nop  111110********** 1110101100yyzzzz  1  -
PSUB Sx,Sy,Dz  Sx - Sy → Dz  111110********** 10100001xxyyzzzz  1  update
DCT PSUB Sx,Sy,Dz  if DC = 1, Sx - Sy → Dz; if 0, nop  111110********** 10100010xxyyzzzz  1  -
DCF PSUB Sx,Sy,Dz  if DC = 0, Sx - Sy → Dz; if 1, nop  111110********** 10100011xxyyzzzz  1  -
PSUB Sx,Sy,Du PMULS Se,Sf,Dg  Sx - Sy → Du; MSW of Se × MSW of Sf → Dg  111110********** 0110eeffxxyygguu  1  update
PSUBC Sx,Sy,Dz  Sx - Sy - DC → Dz  111110********** 10100000xxyyzzzz  1  update

ALU Integer Operation Instructions

Table 7-22 ALU Integer Operation Instructions

Instruction  Operation  Code  Cycles  DC bit
PDEC Sx,Dz  MSW of Sx - 1 → MSW of Dz, clear LSW of Dz  111110********** 10001001xx00zzzz  1  update
PDEC Sy,Dz  MSW of Sy - 1 → MSW of Dz, clear LSW of Dz  111110********** 1010100100yyzzzz  1  update
DCT PDEC Sx,Dz  if DC = 1, MSW of Sx - 1 → MSW of Dz, clear LSW of Dz; if 0, nop  111110********** 10001010xx00zzzz  1  -
DCT PDEC Sy,Dz  if DC = 1, MSW of Sy - 1 → MSW of Dz, clear LSW of Dz; if 0, nop  111110********** 1010101000yyzzzz  1  -
DCF PDEC Sx,Dz  if DC = 0, MSW of Sx - 1 → MSW of Dz, clear LSW of Dz; if 1, nop  111110********** 10001011xx00zzzz  1  -
DCF PDEC Sy,Dz  if DC = 0, MSW of Sy - 1 → MSW of Dz, clear LSW of Dz; if 1, nop  111110********** 1010101100yyzzzz  1  -
PINC Sx,Dz  MSW of Sx + 1 → MSW of Dz, clear LSW of Dz  111110********** 10011001xx00zzzz  1  update
PINC Sy,Dz  MSW of Sy + 1 → MSW of Dz, clear LSW of Dz  111110********** 1011100100yyzzzz  1  update
DCT PINC Sx,Dz  if DC = 1, MSW of Sx + 1 → MSW of Dz, clear LSW of Dz; if 0, nop  111110********** 10011010xx00zzzz  1  -
DCT PINC Sy,Dz  if DC = 1, MSW of Sy + 1 → MSW of Dz, clear LSW of Dz; if 0, nop  111110********** 1011101000yyzzzz  1  -
DCF PINC Sx,Dz  if DC = 0, MSW of Sx + 1 → MSW of Dz, clear LSW of Dz; if 1, nop  111110********** 10011011xx00zzzz  1  -
DCF PINC Sy,Dz  if DC = 0, MSW of Sy + 1 → MSW of Dz, clear LSW of Dz; if 1, nop  111110********** 1011101100yyzzzz  1  -

MSB Detection Instructions

Table 7-23 MSB Detection Instructions

Instruction  Operation  Code  Cycles  DC bit
PDMSB Sx,Dz  Sx data MSB position → MSW of Dz, clear LSW of Dz  111110********** 10011101xx00zzzz  1  update
PDMSB Sy,Dz  Sy data MSB position → MSW of Dz, clear LSW of Dz  111110********** 1011110100yyzzzz  1  update
DCT PDMSB Sx,Dz  if DC = 1, Sx data MSB position → MSW of Dz, clear LSW of Dz; if 0, nop  111110********** 10011110xx00zzzz  1  -
DCT PDMSB Sy,Dz  if DC = 1, Sy data MSB position → MSW of Dz, clear LSW of Dz; if 0, nop  111110********** 1011111000yyzzzz  1  -
DCF PDMSB Sx,Dz  if DC = 0, Sx data MSB position → MSW of Dz, clear LSW of Dz; if 1, nop  111110********** 10011111xx00zzzz  1  -
DCF PDMSB Sy,Dz  if DC = 0, Sy data MSB position → MSW of Dz, clear LSW of Dz; if 1, nop  111110********** 1011111100yyzzzz  1  -

Rounding Operation Instructions

Table 7-24 Rounding Operation Instructions

Instruction  Operation  Code  Cycles  DC bit
PRND Sx,Dz  Sx + H'00008000 → Dz, clear LSW of Dz  111110********** 10011000xx00zzzz  1  update
PRND Sy,Dz  Sy + H'00008000 → Dz, clear LSW of Dz  111110********** 1011100000yyzzzz  1  update

7.4.2 ALU Logical Operation Instructions

Table 7-25 ALU Logical Operation Instructions

Instruction  Operation  Code  Cycles  DC bit
PAND Sx,Sy,Dz  Sx & Sy → Dz, clear LSW of Dz  111110********** 10010101xxyyzzzz  1  update
DCT PAND Sx,Sy,Dz  if DC = 1, Sx & Sy → Dz, clear LSW of Dz; if 0, nop  111110********** 10010110xxyyzzzz  1  -
DCF PAND Sx,Sy,Dz  if DC = 0, Sx & Sy → Dz, clear LSW of Dz; if 1, nop  111110********** 10010111xxyyzzzz  1  -
POR Sx,Sy,Dz  Sx | Sy → Dz, clear LSW of Dz  111110********** 10110101xxyyzzzz  1  update
DCT POR Sx,Sy,Dz  if DC = 1, Sx | Sy → Dz, clear LSW of Dz; if 0, nop  111110********** 10110110xxyyzzzz  1  -
DCF POR Sx,Sy,Dz  if DC = 0, Sx | Sy → Dz, clear LSW of Dz; if 1, nop  111110********** 10110111xxyyzzzz  1  -
PXOR Sx,Sy,Dz  Sx ^ Sy → Dz, clear LSW of Dz  111110********** 10100101xxyyzzzz  1  update
DCT PXOR Sx,Sy,Dz  if DC = 1, Sx ^ Sy → Dz, clear LSW of Dz; if 0, nop  111110********** 10100110xxyyzzzz  1  -
DCF PXOR Sx,Sy,Dz  if DC = 0, Sx ^ Sy → Dz, clear LSW of Dz; if 1, nop  111110********** 10100111xxyyzzzz  1  -

7.4.3 Fixed Decimal Point Multiplication Instructions

Table 7-26 Fixed Decimal Point Multiplication Instructions

Instruction  Operation  Code  Cycles  DC bit
PMULS Se,Sf,Dg  MSW of Se × MSW of Sf → Dg  111110********** 0100eeff0000gg00  1  -

7.4.4 Shift Operation Instructions

Arithmetic Shift Instructions

Table 7-27 Arithmetic Shift Instructions

Instruction  Operation  Code  Cycles  DC bit
PSHA Sx,Sy,Dz  if Sy ≥ 0, Sx<

Logical Shift Operation Instructions

Table 7-28 Logical Shift Operation Instructions

Instruction  Operation  Code  Cycles  DC bit
PSHL Sx,Sy,Dz  if Sy ≥ 0, Sx<

7.4.5 System Control Instructions

Table 7-29 System Control Instructions

Instruction  Operation  Code  Cycles  DC bit
PLDS Dz,MACH  Dz → MACH  111110********** 111011010000zzzz  1  -
PLDS Dz,MACL  Dz → MACL  111110********** 111111010000zzzz  1  -
DCT PLDS Dz,MACH  if DC = 1, Dz → MACH; if 0, nop  111110********** 111011100000zzzz  1  -
DCT PLDS Dz,MACL  if DC = 1, Dz → MACL; if 0, nop  111110********** 111111100000zzzz  1  -
DCF PLDS Dz,MACH  if DC = 0, Dz → MACH; if 1, nop  111110********** 111011110000zzzz  1  -
DCF PLDS Dz,MACL  if DC = 0, Dz → MACL; if 1, nop  111110********** 111111110000zzzz  1  -
PSTS MACH,Dz  MACH → Dz  111110********** 110011010000zzzz  1  -
PSTS MACL,Dz  MACL → Dz  111110********** 110111010000zzzz  1  -
DCT PSTS MACH,Dz  if DC = 1, MACH → Dz; if 0, nop  111110********** 110011100000zzzz  1  -
DCT PSTS MACL,Dz  if DC = 1, MACL → Dz; if 0, nop  111110********** 110111100000zzzz  1  -
DCF PSTS MACH,Dz  if DC = 0, MACH → Dz; if 1, nop  111110********** 110011110000zzzz  1  -
DCF PSTS MACL,Dz  if DC = 0, MACL → Dz; if 1, nop  111110********** 110111110000zzzz  1  -

7.4.6 NOPX and NOPY Instruction Code

When there is no data transfer instruction to be processed in parallel with the DSP operation instruction, a NOPX or NOPY instruction can be written as the data transfer instruction, or the instruction can be omitted. The operation code is the same in either case. Table 7-30 shows the NOPX and NOPY instruction code.

Table 7-30 Sample NOPX and NOPY Instruction Code

Instruction  Code
PADD X0,Y0,A0  MOVX.W @R4+,X0  MOVY.W @R6+R9,Y0  1111100010110000 1000000010100000
PADD X0,Y0,A0  NOPX  MOVY.W @R6+R9,Y0  1111100000110000 1000000010100000
PADD X0,Y0,A0  NOPX  NOPY  1111100000000000 1000000010100000
PADD X0,Y0,A0  NOPX  (same code as the row above)
PADD X0,Y0,A0  (same code as the row above)
MOVX.W @R4+,X0  MOVY.W @R6+R9,Y0  1111000010110000
MOVX.W @R4+,X0  NOPY  1111000010000000
MOVS.W @R4+,X0  1111011010000000
NOPX  MOVY.W @R6+R9,Y0  1111000000110000
MOVY.W @R6+R9,Y0  (same code as the row above)
NOPX  NOPY  1111000000000000
NOP  0000000000001001

Section 8 Instruction Descriptions

This section describes instructions in alphabetical order using the format shown below in section 8.1. The actual descriptions begin at section 8.2.
8.1 sample description (name): classification class: indicates if the instruction is a delayed branch instruction or interrupt disabled instruction format abstract code cycle t bit assembler input format; imm and disp are numbers, expressions, or symbols a brief description of operation displayed in order msb ? lsb number of cycles when there is no wait state the value of t bit after the instruction is executed note: section 8.2 contains an description of cpu instructions common to the sh-3, sh-3e, and sh3-dsp, section 8.3 covers floating point instructions that can only be used with the sh- 3e, and section 8.4 covers dsp data transfer instructions that can only be used with the sh3-dsp. the number of execution cycles required for floating point instructions is determined by the latency and pitch values. "latency" refers to the number of cycles required to generate the result value for the operation, and "pitch" indicates the number of wait cycles required before execution of the next instruction can begin. the latency and pitch values are the same for most cpu instructions, indicating that they each require one execution cycle. description: description of operation notes: notes on using the instruction operation: operation written in c language. this part is just a reference to help understanding of an operation. the following resources should be used. ? reads data of each length from address addr. an address error will occur if word data is read from an address other than 2n or if longword data is read from an address other than 4n: unsigned char read_byte(unsigned long addr); unsigned short read_word(unsigned long addr); unsigned long read_long(unsigned long addr); ? writes data of each length to address addr. 
an address error will occur if word data is written to an address other than 2n or if longword data is written to an address other than 4n: unsigned char write_byte(unsigned long addr, unsigned long data); unsigned short write_word(unsigned long addr, unsigned long data); unsigned long write_long(unsigned long addr, unsigned long data); 140 ? starts execution from the slot instruction located at an address (addr C 4). for delay_slot (4), execution starts from an instruction at address 0 rather than address 4. the following instructions are detected before execution as having illegal slots (they become illegal slot instructions when used as delay slot instructions): bf, bt, bra, bsr, jmp, jsr, rts, rte, trapa, bf/s, bt/s, braf, bsrf delay_slot(unsigned long addr); ? list registers: unsigned long r[16]; unsigned long sr,gbr,vbr; unsigned long mach,macl,pr; unsigned long pc; ? definition of sr structures: struct sr0 { unsigned long dummy0:4; unsigned long rc0:12; unsigned long dummy1:4; unsigned long dmy0:1; unsigned long dmx0:1; unsigned long m0:1; unsigned long q0:1; unsigned long i0:4; unsigned long rf10:1; unsigned long rf00:1; unsigned long s0:1; unsigned long t0:1; }; ? definition of bits in sr: #define m ((*(struct sr0 *)(&sr)).m0) #define q ((*(struct sr0 *)(&sr)).q0) #define s ((*(struct sr0 *)(&sr)).s0) #define t ((*(struct sr0 *)(&sr)).t0) #define rf1 ((*(struct sr0 *)(&sr)).rf10) #define rf0 ((*(struct sr0 *)(&sr)).rf00) 141 ? error display function: error( char *er ); the pc should point to the location four bytes (the second instruction) after the current instruction. therefore, pc = 4; means the instruction starts execution from address 0, not address 4. examples: examples are written in assembler mnemonics and describe status before and after executing the instruction. characters in italics such as .align are assembler control instructions (listed below). for more information, see the cross assembler user manual. 
.org          Location counter set
.data.w       Securing integer word data
.data.l       Securing integer longword data
.sdata        Securing string data
.align 2      2-byte boundary alignment
.align 4      4-byte boundary alignment
.arepeat 16   16-repeat expansion
.arepeat 32   32-repeat expansion
.aendr        End of repeat expansion of specified number

Note: The sh series cross assembler version 1.0 does not support the conditional assembler functions.

Notes:
1. For the following addressing modes involving displacement (disp), the assembler descriptors in this manual indicate values before scaling (×1, ×2, ×4) to match the operand size. This is done to clarify the operation of the lsi device. Refer to the applicable assembler notation rules for the actual assembler descriptors.
     @(disp:4, rn)       ; register indirect with displacement
     @(disp:8, gbr)      ; gbr indirect with displacement
     @(disp:8, pc)       ; pc relative with displacement
     disp:8, disp:12     ; pc relative
2. Of the 16-bit instruction codes, those not assigned to any instruction, and privileged instructions executed in user mode (excluding instructions that access gbr), are treated as general invalid instructions, and invalid instruction exception processing is performed.
     Example: h'ffff [general invalid instruction]
3. If the instruction following a delayed branch instruction such as bra or bt/s is a general invalid instruction or a pc overwrite instruction (branch instruction, etc.), slot invalid instruction exception processing is performed. Such instructions are referred to as "slot invalid instructions".
4. In the sh3-dsp, if a general invalid instruction, a pc overwrite instruction (branch instruction, etc.), or an instruction that overwrites the sr, rs, or re register (setrc, ldrs, ldre, ldc) is contained within a repeating program (loop) consisting of three or fewer instructions, or within the final three instructions of a repeating program (loop) consisting of four or more instructions, invalid instruction exception processing is performed.
for details, refer to 5.12 dsp repeat (loop) control. 143 8.2 instruction description (listing and description of instructions common to the sh-3, sh-3e and sh3-dsp) 8.2.1 add (add binary): arithmetic instruction format abstract code cycle t bit add rm,rn rm + rn ? rn 0011nnnnmmmm1100 1 add #imm,rn rn + imm ? rn 0111nnnniiiiiiii 1 description: adds general register rn data to rm data, and stores the result in rn. 8-bit immediate data can be added instead of rm data. since the 8-bit immediate data is sign-extended to 32 bits, this instruction can add and subtract immediate data. operation: add(long m,long n) /* add rm,rn */ { r[n]+=r[m]; pc+=2; } addi(long i,long n) /* add #imm,rn */ { if ((i&0x80)==0) r[n]+=(0x000000ff & (long)i); else r[n]+=(0xffffff00 | (long)i); pc+=2; } examples: add r0,r1 ; before execution r0 = h'7fffffff, r1 = h'00000001 ; after execution r1 = h'80000000 add #h'01,r2 ; before execution r2 = h'00000000 ; after execution r2 = h'00000001 add #h'fe,r3 ; before execution r3 = h'00000001 ; after execution r3 = h'ffffffff 144 8.2.2 addc (add with carry): arithmetic instruction format abstract code cycle t bit addc rm,rn rn + rm + t ? rn, carry ? t 0011nnnnmmmm1110 1 carry description: adds general register rm data and the t bit to rn data, and stores the result in rn. the t bit changes according to the result. this instruction can add data that has more than 32 bits. operation: addc (long m,long n) /* addc rm,rn */ { unsigned long tmp0,tmp1; tmp1=r[n]+r[m]; tmp0=r[n]; r[n]=tmp1+t; if (tmp0>tmp1) t=1; else t=0; if (tmp1>r[n]) t=1; pc+=2; } examples: clrt ; r0:r1 (64 bits) + r2:r3 (64 bits) = r0:r1 (64 bits) addc r3,r1 ; before execution t = 0, r1 = h'00000001, r3 = h'ffffffff ; after execution t = 1, r1 = h'0000000 addc r2,r0 ; before execution t = 1, r0 = h'00000000, r2 = h'00000000 ; after execution t = 0, r0 = h'00000001 145 8.2.3 addv (add with v flag overflow check): arithmetic instruction format abstract code cycle t bit addv rm,rn rn + rm ? 
rn, overflow ? t 0011nnnnmmmm1111 1 overflow description: adds general register rn data to rm data, and stores the result in rn. if an overflow occurs, the t bit is set to 1. operation: addv(long m,long n) /*addv rm,rn */ { long dest,src,ans; if ((long)r[n]>=0) dest=0; else dest=1; if ((long)r[m]>=0) src=0; else src=1; src+=dest; r[n]+=r[m]; if ((long)r[n]>=0) ans=0; else ans=1; ans+=dest; if (src==0 || src==2) { if (ans==1) t=1; else t=0; } else t=0; pc+=2; } examples: addv r0,r1 ; before execution r0 = h'00000001, r1 = h'7ffffffe, t = 0 ; after execution r1 = h'7fffffff, t = 0 addv r0,r1 ; before execution r0 = h'00000002, r1 = h'7ffffffe, t = 0 ; after execution r1 = h'80000000, t = 1 146 8.2.4 and (and logical): logic operation instruction format abstract code cycle t bit and rm,rn rn & rm ? rn 0010nnnnmmmm1001 1 and #imm,r0 r0 & imm ? r0 11001001iiiiiiii 1 and.b #imm,@(r0,gbr) (r0 + gbr) & imm ? (r0 + gbr) 11001101iiiiiiii 3 description: logically ands the contents of general registers rn and rm, and stores the result in rn. the contents of general register r0 can be anded with zero-extended 8-bit immediate data. 8-bit memory data pointed to by gbr relative addressing can be anded with 8-bit immediate data. note: after and #imm, r0 is executed and the upper 24 bits of r0 are always cleared to 0. 
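The zero-extension behavior described in the note above can be illustrated with a small C model. This is only a sketch: the helper names `and_imm_r0` and `and_imm_byte` are hypothetical and do not appear in the manual, and the functions model the arithmetic only, not pc updates or memory access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): models and #imm,r0.
   The 8-bit immediate is zero-extended to 32 bits before the AND,
   so bits 31..8 of the result are always 0. */
static uint32_t and_imm_r0(uint32_t r0, uint8_t imm)
{
    uint32_t result = r0 & (uint32_t)imm;   /* zero extension of imm */
    assert(result <= 0xFFu);                /* upper 24 bits are cleared */
    return result;
}

/* Hypothetical helper: models the byte-wide AND performed by
   and.b #imm,@(r0,gbr) on the memory byte that was read. */
static uint8_t and_imm_byte(uint8_t mem, uint8_t imm)
{
    return (uint8_t)(mem & imm);
}
```

With r0 = h'ffffffff and an immediate of h'0f, `and_imm_r0` returns h'0000000f, which is why and #imm,r0 cannot be used to mask only the low byte while preserving the upper bits of r0.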
operation: and(long m,long n) /* and rm,rn */ { r[n]&=r[m] pc+=2; } andi(long i) /* and #imm,r0 */ { r[0]&=(0x000000ff & (long)i); pc+=2; } andm(long i) /* and.b #imm,@(r0,gbr) */ { long temp; temp=(long)read_byte(gbr+r[0]); temp&=(0x000000ff & (long)i); write_byte(gbr+r[0],temp); pc+=2; } 147 examples: and r0,r1 ; before execution r0 = h'aaaaaaaa, r1 = h'55555555 ; after execution r1 = h'00000000 and #h'0f,r0 ; before execution r0 = h'ffffffff ; after execution r0 = h'0000000f and.b #h'80,@(r0,gbr) ; before execution @(r0,gbr) = h'a5 ; after execution @(r0,gbr) = h'80 148 8.2.5 bf (branch if false): branch instruction format abstract code cycle t bit bf label when t = 0, disp 2 + pc ? pc; when t = 1, nop 10001011dddddddd 3/1 description: reads the t bit, and conditionally branches. if t = 1, bf executes the next instruction. if t = 0, it branches. the branch destination is an address specified by pc + displacement. the pc points to the starting address of the second instruction after the branch instruction. the 8-bit displacement is sign-extended and doubled. consequently, the relative interval from the branch destination is C256 to +254 bytes. if the displacement is too short to reach the branch destination, use bf with the bra instruction or the like. note: when branching, three cycles; when not branching, one cycle. if this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction. operation: bf(long d) /* bf disp */ { long disp; if ((d&0x80)==0) disp=(0x000000ff & (long)d); else disp=(0xffffff00 | (long)d); if (t==0) pc=pc+(disp<<1)+4; else pc+=2; } example: clrt ; t is always cleared to 0 bt trget_t ; does not branch, because t = 0 bf trget_f ; branches to trget_f, because t = 0 nop nop ; ? the pc location is used to calculate the branch destination ; address of the bf instruction trget_f: ; ? 
branch destination of the bf instruction 149 8.2.6 bf/s (branch if false with delay slot): branch instruction class: delayed branch instruction format abstract code cycle t bit bf label when t = 0, disp 2 + pc ? pc; when t = 1, nop 10001111dddddddd 2/1 description: reads the t bit, and if t = 1, bf executes the next instruction. if t = 0, it branches after executing the next instruction. the branch destination is an address specified by pc + displacement. the pc points to the starting address of the second instruction after the branch instruction. the 8-bit displacement is sign-extended and doubled. consequently, the relative interval from the branch destination is C256 to +254 bytes. if the displacement is too short to reach the branch destination, use bf with the bra instruction or the like. note: the bf/s instruction is a conditional delayed branch instruction: taken case: the instruction immediately following is executed before the branch. between the time this instruction and the instruction immediately following are executed, no interrupts are accepted. when the instruction immediately following is a branch instruction, it is recognized as an illegal slot instruction. not taken case: this instruction operates as a nop instruction. between the time this instruction and the instruction immediately following are executed, interrupts are accepted. when the instruction immediately following is a branch instruction, it is not recognized as an illegal slot instruction. 150 operation: bfs(long d) /* bfs disp */ { long disp; unsigned long temp; temp=pc; if ((d&0x80)==0) disp=(0x000000ff & (long)d); else disp=(0xffffff00 | (long)d); if (t==0) { pc=pc+(disp<<1)+4; delay_slot(temp+2); } else pc+=2; } examples: sett ; t is always 1 bf/s target_f ; does not branch, because t = 1 nop bt/s target_t ; branches to target, because t = 1 add r0,r1 ; executed before branch . nop ; ? 
the pc location is used to calculate the branch destination ; address of the bt/s instruction trget_t: ; ? branch destination of the bt/s instruction note: in delayed branching, the branching operation itself takes place after the slot instruction has been executed. however, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. for example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address. 151 8.2.7 bra (branch): branch instruction class: delayed branch instruction format abstract code cycle t bit bra label disp 2 + pc ? pc 1010dddddddddddd 2 description: branches unconditionally after executing the instruction following this bra instruction. the branch destination is an address specified by pc + displacement. the pc points to the starting address of the second instruction after this bra instruction. the 12-bit displacement is sign-extended and doubled. consequently, the relative interval from the branch destination is C 4096 to +4094 bytes. if the displacement is too short to reach the branch destination, this instruction must be changed to the jmp instruction. here, a mov instruction must be used to transfer the destination address to a register. note: since this is a delayed branch instruction, the instruction after bra is executed before branching. no interrupts are accepted between this instruction and the next instruction. if the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction. if this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction. 
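The displacement rule described above (12-bit field, sign-extended, doubled, added to the address of the second instruction after bra) can be sketched in C as follows. The function names `bra_disp_bytes` and `bra_target` are assumptions made for illustration, and `bra_addr` here is the address of the bra instruction itself, not the pc value used by the manual's operation model, which runs four bytes ahead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extracts the 12-bit displacement field of a
   bra opcode (1010dddddddddddd), sign-extends it, and doubles it.
   The result is a byte offset in the range -4096 to +4094. */
static int32_t bra_disp_bytes(uint16_t opcode)
{
    int32_t d = opcode & 0x0FFF;
    if (d & 0x800)
        d -= 0x1000;            /* sign-extend from bit 11 */
    return d * 2;               /* displacement counts 16-bit instructions */
}

/* Hypothetical helper: branch destination for a bra located at bra_addr.
   bra_addr + 4 is the start of the second instruction after the bra. */
static uint32_t bra_target(uint32_t bra_addr, uint16_t opcode)
{
    assert((opcode & 0xF000u) == 0xA000u);   /* must be a bra opcode */
    return bra_addr + 4u + (uint32_t)bra_disp_bytes(opcode);
}
```

For example, a bra at address h'1000 with a displacement field of h'002 branches to h'1008, and the largest negative field value h'800 gives an offset of -4096 bytes.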
Operation:

    bra(long d)    /* bra disp */
    {
        unsigned long temp;
        long disp;

        if ((d&0x800)==0) disp=(0x00000fff & d);
        else disp=(0xfffff000 | d);
        temp=pc;
        pc=pc+(disp<<1)+4;
        delay_slot(temp+2);
    }

Examples:

    bra   trget    ; branches to trget
    add   r0,r1    ; executes add before branching
    nop            ; ← the pc location is used to calculate the branch destination
                   ;   address of the bra instruction
    trget:         ; ← branch destination of the bra instruction

Note: In delayed branching, the branching operation itself takes place after the slot instruction has been executed. However, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. For example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address.

8.2.8 braf (branch far): branch instruction

Class: Delayed branch instruction

Format     Abstract        Code               Cycle   t bit
braf rm    rm + pc → pc    0000nnnn00100011   2

Description: Branches unconditionally. The branch destination is pc + the 32-bit contents of the general register rm. pc is the start address of the second instruction after this instruction.

Note: Since this is a delayed branch instruction, the instruction after braf is executed before branching. No interrupts or address errors are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction. If this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction.

Operation:

    braf(long m)    /* braf rm */
    {
        unsigned long temp;

        temp=pc;
        pc+=r[m]+4;    /* +4 keeps the model's pc convention, as in bra */
        delay_slot(temp+2);
    }

Examples:

    mov.l  #(target-braf_pc),r0    ; sets displacement
    braf   r0                      ; branches to target
    add    r0,r1                   ; executes add before branching
    braf_pc:                       ; ← the pc location is used to calculate the
                                   ;   branch destination address of the braf
                                   ;   instruction
    nop
    target:                        ; ← branch destination of the braf instruction

Note: In delayed branching, the branching operation itself takes place after the slot instruction has been executed. However, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. For example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address.

8.2.9 bsr (branch to subroutine): branch instruction

Class: Delayed branch instruction

Format       Abstract                     Code               Cycle   t bit
bsr label    pc → pr, disp × 2 + pc → pc  1011dddddddddddd   2

Description: Branches to the subroutine procedure at a specified address after executing the instruction following this bsr instruction. The pc value is stored in the pr, and the program branches to an address specified by pc + displacement. The pc points to the starting address of the second instruction after this bsr instruction. The 12-bit displacement is sign-extended and doubled. Consequently, the branch destination must lie within -4096 to +4094 bytes of that pc value. If the branch destination is outside this range, the jsr instruction must be used instead. With jsr, the destination address must be transferred to a register by using the mov instruction. This bsr instruction and the rts instruction are used for a subroutine procedure call.

Note: Since this is a delayed branch instruction, the instruction after bsr is executed before branching. No interrupts are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.
if this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction. the pr used by the instruction immediately following this instruction is updated by this instruction. also, if the instruction immediately following this instruction generates a re-execution exception other than instruction fetch, the pr is updated by this instruction. re-execute this instruction to recover. 155 operation: bsr(long d) /* bsr disp */ { long disp; if ((d&0x800)==0) disp=(0x00000fff & d); else disp=(0xfffff000 | d); pr=pc; pc=pc+(disp<<1)+4; delay_slot(pr+2); } examples: bsr trget ; branches to trget mov r3,r4 ; executes the mov instruction before branching add r0,r1 ; ? the pc location is used to calculate the branch destination ; address of the bsr instruction (return address for when the ; subroutine procedure is completed (pr data)) ....... ....... trget: ; ? procedure entrance mov r2,r3 rts ; returns to the above add instruction mov #1,r0 ; executes mov before branching note: in delayed branching, the branching operation itself takes place after the slot instruction has been executed. however, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. for example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address. 156 8.2.10 bsrf (branch to subroutine far): branch instruction class: delayed branch instruction format abstract code cycle t bit bsrf rm pc ? pr, rm + pc ? pc 0000nnnn00000011 2 description: branches to the subroutine procedure at a specified address after executing the instruction following this bsrf instruction. the pc value is stored in the pr. the branch destination is pc + the 32-bit contents of the general register rn. 
pc is the start address of the second instruction after this instruction. bsrf is used as a subroutine call in combination with rts.

Note: Since this is a delayed branch instruction, the instruction after bsrf is executed before branching. No interrupts are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction. If this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction. The pr used by the instruction immediately following this instruction is updated by this instruction. Also, if the instruction immediately following this instruction generates a re-execution exception other than instruction fetch, the pr is updated by this instruction. Re-execute this instruction to recover.

Operation:

    bsrf(long m)    /* bsrf rm */
    {
        pr=pc;
        pc+=r[m]+4;    /* +4 keeps the model's pc convention, as in bsr */
        delay_slot(pr+2);
    }

Examples:

    mov.l  #(target-bsrf_pc),r0    ; sets displacement
    bsrf   r0                      ; branches to target
    mov    r3,r4                   ; executes the mov instruction before
                                   ; branching
    bsrf_pc:                       ; ← the pc location is used to calculate the
                                   ;   branch destination with bsrf
    add    r0,r1
    .....
    .....
    target:                        ; ← procedure entrance
    mov    r2,r3
    rts                            ; returns to the above add instruction
    mov    #1,r0                   ; executes mov before branching

Note: In delayed branching, the branching operation itself takes place after the slot instruction has been executed. However, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. For example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address.

8.2.11 bt (branch if true): branch instruction

Format      Abstract                                        Code               Cycle   t bit
bt label    when t = 1, disp × 2 + pc → pc; when t = 0, nop 10001001dddddddd   3/1

Description: Reads the t bit, and conditionally branches. If t = 1, bt branches. If t = 0, bt executes the next instruction. The branch destination is an address specified by pc + displacement. The pc points to the starting address of the second instruction after the branch instruction. The 8-bit displacement is sign-extended and doubled. Consequently, the branch destination must lie within -256 to +254 bytes of that pc value. If the branch destination is outside this range, use bt with the bra instruction or the like.

Note: When branching, bt requires three cycles; when not branching, one cycle. If this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction.

Operation:

    bt(long d)    /* bt disp */
    {
        long disp;

        if ((d&0x80)==0) disp=(0x000000ff & (long)d);
        else disp=(0xffffff00 | (long)d);
        if (t==1) pc=pc+(disp<<1)+4;
        else pc+=2;
    }

Examples:

    sett             ; t is always set to 1
    bf   trget_f     ; does not branch, because t = 1
    bt   trget_t     ; branches to trget_t, because t = 1
    nop
    nop              ; ← the pc location is used to calculate the branch destination
                     ;   address of the bt instruction
    trget_t:         ; ← branch destination of the bt instruction

8.2.12 bt/s (branch if true with delay slot): branch instruction

Format        Abstract                                        Code               Cycle   t bit
bt/s label    when t = 1, disp × 2 + pc → pc; when t = 0, nop 10001101dddddddd   2/1

Description: Reads the t bit, and if t = 1, bt/s branches after the following instruction executes. If t = 0, bt/s executes the next instruction. The branch destination is an address specified by pc + displacement. The pc points to the starting address of the second instruction after the branch instruction. The 8-bit displacement is sign-extended and doubled. Consequently, the branch destination must lie within -256 to +254 bytes of that pc value.
If the branch destination is outside this range, use bt/s with the bra instruction or the like.

Note: The bt/s instruction is a conditional delayed branch instruction:
  Taken case: The instruction immediately following is executed before the branch. Between the time this instruction and the instruction immediately following are executed, no interrupts are accepted. When the instruction immediately following is a branch instruction, it is recognized as an illegal slot instruction.
  Not taken case: This instruction operates as a nop instruction. Between the time this instruction and the instruction immediately following are executed, interrupts are accepted. When the instruction immediately following is a branch instruction, it is not recognized as an illegal slot instruction.

Operation:

    bts(long d)    /* bts disp */
    {
        long disp;
        unsigned long temp;

        temp=pc;
        if ((d&0x80)==0) disp=(0x000000ff & (long)d);
        else disp=(0xffffff00 | (long)d);
        if (t==1) {
            pc=pc+(disp<<1)+4;
            delay_slot(temp+2);
        }
        else pc+=2;
    }

Examples:

    sett               ; t is always set to 1
    bf/s  target_f     ; does not branch, because t = 1
    nop
    bt/s  target_t     ; branches to target_t, because t = 1
    add   r0,r1        ; executes before branching
    nop                ; ← the pc location is used to calculate the branch destination
                       ;   address of the bt/s instruction
    target_t:          ; ← branch destination of the bt/s instruction

Note: In delayed branching, the branching operation itself takes place after the slot instruction has been executed. However, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. For example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address.

8.2.13 clrmac (clear mac register): system control instruction

Format    Abstract         Code               Cycle   t bit
clrmac    0 → mach, macl   0000000000101000   1

Description: Clears the mach and macl registers.

Operation:

    clrmac()    /* clrmac */
    {
        mach=0;
        macl=0;
        pc+=2;
    }

Examples:

    clrmac               ; initializes the mac register
    mac.w  @r0+,@r1+     ; multiply and accumulate operation
    mac.w  @r0+,@r1+

8.2.14 clrs (clear s bit): system control instruction

Format    Abstract   Code               Cycle   t bit
clrs      0 → s      0000000001001000   1

Description: Clears the s bit.

Operation:

    clrs()    /* clrs */
    {
        s=0;
        pc+=2;
    }

Examples:

    clrs    ; before execution s = 1
            ; after execution  s = 0

8.2.15 clrt (clear t bit): system control instruction

Format    Abstract   Code               Cycle   t bit
clrt      0 → t      0000000000001000   1       0

Description: Clears the t bit.

Operation:

    clrt()    /* clrt */
    {
        t=0;
        pc+=2;
    }

Examples:

    clrt    ; before execution t = 1
            ; after execution  t = 0

8.2.16 cmp/cond (compare conditionally): arithmetic instruction

Format             Abstract                                         Code               Cycle   t bit
cmp/eq rm,rn       when rn = rm, 1 → t                              0011nnnnmmmm0000   1       comparison result
cmp/ge rm,rn       when signed and rn ≥ rm, 1 → t                   0011nnnnmmmm0011   1       comparison result
cmp/gt rm,rn       when signed and rn > rm, 1 → t                   0011nnnnmmmm0111   1       comparison result
cmp/hi rm,rn       when unsigned and rn > rm, 1 → t                 0011nnnnmmmm0110   1       comparison result
cmp/hs rm,rn       when unsigned and rn ≥ rm, 1 → t                 0011nnnnmmmm0010   1       comparison result
cmp/pl rn          when rn > 0, 1 → t                               0100nnnn00010101   1       comparison result
cmp/pz rn          when rn ≥ 0, 1 → t                               0100nnnn00010001   1       comparison result
cmp/str rm,rn      when a byte in rn equals a byte in rm, 1 → t     0010nnnnmmmm1100   1       comparison result
cmp/eq #imm,r0     when r0 = imm, 1 → t                             10001000iiiiiiii   1       comparison result

Description: Compares general register rn data with rm data, and sets the t bit to 1 if a specified condition (cond) is satisfied. The t bit is cleared to 0 if the condition is not satisfied. The rn data does not change. The nine conditions in table 8-1 can be specified. Conditions pz and pl are the results of comparisons between rn and 0.
Sign-extended 8-bit immediate data can also be compared with r0 by using condition eq. Here, r0 data does not change. Table 8-1 shows the mnemonics for the conditions.

Table 8-1 cmp mnemonics

Mnemonics          Condition
cmp/eq rm,rn       If rn = rm, t = 1
cmp/ge rm,rn       If rn ≥ rm with signed data, t = 1
cmp/gt rm,rn       If rn > rm with signed data, t = 1
cmp/hi rm,rn       If rn > rm with unsigned data, t = 1
cmp/hs rm,rn       If rn ≥ rm with unsigned data, t = 1
cmp/pl rn          If rn > 0, t = 1
cmp/pz rn          If rn ≥ 0, t = 1
cmp/str rm,rn      If a byte in rn equals a byte in rm, t = 1
cmp/eq #imm,r0     If r0 = imm, t = 1

Operation:

    cmpeq(long m,long n)    /* cmp_eq rm,rn */
    {
        if (r[n]==r[m]) t=1;
        else t=0;
        pc+=2;
    }

    cmpge(long m,long n)    /* cmp_ge rm,rn */
    {
        if ((long)r[n]>=(long)r[m]) t=1;
        else t=0;
        pc+=2;
    }

    cmpgt(long m,long n)    /* cmp_gt rm,rn */
    {
        if ((long)r[n]>(long)r[m]) t=1;
        else t=0;
        pc+=2;
    }

    cmphi(long m,long n)    /* cmp_hi rm,rn */
    {
        if ((unsigned long)r[n]>(unsigned long)r[m]) t=1;
        else t=0;
        pc+=2;
    }

    cmphs(long m,long n)    /* cmp_hs rm,rn */
    {
        if ((unsigned long)r[n]>=(unsigned long)r[m]) t=1;
        else t=0;
        pc+=2;
    }

    cmppl(long n)    /* cmp_pl rn */
    {
        if ((long)r[n]>0) t=1;
        else t=0;
        pc+=2;
    }

    cmppz(long n)    /* cmp_pz rn */
    {
        if ((long)r[n]>=0) t=1;
        else t=0;
        pc+=2;
    }

    cmpstr(long m,long n)    /* cmp_str rm,rn */
    {
        unsigned long temp;
        long hh,hl,lh,ll;

        temp=r[n]^r[m];                /* a byte is 0 where rn and rm match */
        hh=(temp&0xff000000)>>24;      /* extract each byte of the difference */
        hl=(temp&0x00ff0000)>>16;
        lh=(temp&0x0000ff00)>>8;
        ll=temp&0x000000ff;
        hh=hh&&hl&&lh&&ll;             /* 0 if any byte matched */
        if (hh==0) t=1;
        else t=0;
        pc+=2;
    }

    cmpim(long i)    /* cmp_eq #imm,r0 */
    {
        long imm;

        if ((i&0x80)==0) imm=(0x000000ff & (long)i);
        else imm=(0xffffff00 | (long)i);
        if (r[0]==imm) t=1;
        else t=0;
        pc+=2;
    }

Examples:

    cmp/ge   r0,r1      ; r0 = h'7fffffff, r1 = h'80000000
    bt       trget_t    ; does not branch because t = 0
    cmp/hs   r0,r1      ; r0 = h'7fffffff, r1 = h'80000000
    bt       trget_t    ; branches because t = 1
    cmp/str  r2,r3      ; r2 = "abcd", r3 = "xycz"
    bt       trget_t    ; branches because t = 1

8.2.17 div0s (divide step 0 as signed): arithmetic instruction

Format         Abstract                                 Code               Cycle   t bit
div0s rm,rn    msb of rn → q, msb of rm → m, m^q → t    0010nnnnmmmm0111   1       calculation result

Description: div0s is an initialization instruction for signed division. The quotient is then found by repeating div1, alone or in combination with other instructions, once for each bit after this instruction. See the description given with div1 for more information.

Operation:

    div0s(long m,long n)    /* div0s rm,rn */
    {
        if ((r[n]&0x80000000)==0) q=0;
        else q=1;
        if ((r[m]&0x80000000)==0) m=0;
        else m=1;
        t=!(m==q);
        pc+=2;
    }

Examples: See div1.

8.2.18 div0u (divide step 0 as unsigned): arithmetic instruction

Format    Abstract     Code               Cycle   t bit
div0u     0 → m/q/t    0000000000011001   1       0

Description: div0u is an initialization instruction for unsigned division. The quotient is then found by repeating div1, alone or in combination with other instructions, once for each bit after this instruction. See the description given with div1 for more information.

Operation:

    div0u()    /* div0u */
    {
        m=q=t=0;
        pc+=2;
    }

Example: See div1.

8.2.19 div1 (divide step 1): arithmetic instruction

Format        Abstract                    Code               Cycle   t bit
div1 rm,rn    1 step division (rn ÷ rm)   0011nnnnmmmm0100   1       calculation result

Description: Uses single-step division to divide one bit of the 32-bit data in general register rn (dividend) by rm data (divisor). It finds a quotient through repetition, either independently or in combination with other instructions. During this repetition, do not rewrite the specified register or the m, q, and t bits.

In one-step division, the dividend is shifted one bit left, the divisor is subtracted, and the quotient bit is reflected in the q bit according to the status (positive or negative). To find the remainder in a division, first find the quotient using a div1 instruction, then find the remainder as follows:

    (remainder) = (dividend) - (divisor) × (quotient)

Zero division, overflow detection, and remainder operation are not supported.
check for zero division and overflow division before dividing. find the remainder by first finding the product of the divisor and the quotient obtained and then subtracting it from the dividend. that is, first initialize with div0s or div0u. repeat div1 for each bit of the divisor to obtain the quotient. when the quotient requires 17 or more bits, place rotcl before div1. for the division sequence, see the following examples.

operation:
div1(long m,long n) /* div1 rm,rn */
{   /* M, Q, T are the sr status bits; m, n are register numbers */
    unsigned long tmp0;
    unsigned char old_q,tmp1;
    old_q=Q;
    Q=(unsigned char)((0x80000000 & r[n])!=0);
    r[n]<<=1;
    r[n]|=(unsigned long)T;
    switch(old_q){
    case 0:switch(M){
        case 0:tmp0=r[n];
            r[n]-=r[m];
            tmp1=(r[n]>tmp0);
            switch(Q){
            case 0:Q=tmp1; break;
            case 1:Q=(unsigned char)(tmp1==0); break;
            }
            break;
        case 1:tmp0=r[n];
            r[n]+=r[m];
            tmp1=(r[n] ...

example 1:
; r1 (32 bits) / r0 (16 bits) = r1 (16 bits): unsigned
shll16 r0        ; upper 16 bits = divisor, lower 16 bits = 0
tst r0,r0        ; zero division check
bt zero_div
cmp/hs r0,r1     ; overflow check
bt over_div
div0u            ; flag initialization
.arepeat 16
div1 r0,r1       ; repeat 16 times
.aendr
rotcl r1
extu.w r1,r2     ; r1 = quotient

example 2:
; r1:r2 (64 bits) / r0 (32 bits) = r2 (32 bits): unsigned
tst r0,r0        ; zero division check
bt zero_div
cmp/hs r0,r1     ; overflow check
bt over_div
div0u            ; flag initialization
.arepeat 32
rotcl r2         ; repeat 32 times
div1 r0,r1
.aendr
rotcl r2         ; r2 = quotient

example 3:
; r1 (16 bits) / r0 (16 bits) = r1 (16 bits): signed
shll16 r0        ; upper 16 bits = divisor, lower 16 bits = 0
exts.w r1,r1     ; sign-extends the dividend to 32 bits
xor r2,r2        ; r2 = 0
mov r1,r3
rotcl r3
subc r2,r1       ; decrements if the dividend is negative
div0s r0,r1      ; flag initialization
.arepeat 16
div1 r0,r1       ; repeat 16 times
.aendr
exts.w r1,r1
rotcl r1         ; r1 = quotient (ones complement)
addc r2,r1       ; increments and takes the twos complement if the msb of the
                 ; quotient is 1
exts.w r1,r1     ; r1 = quotient (twos complement)

example 4:
; r2 (32 bits) / r0 (32 bits) = r2 (32 bits):
signed mov r2,r3 rotcl r3 subc r1,r1 ; sign-extends the dividend to 64 bits (r1:r2) xor r3,r3 ; r3 = 0 subc r3,r2 ; decrements and takes the ones complement if the dividend is ; negative div0s r0,r1 ; flag initialization .arepeat 32 rotcl r2 ; repeat 32 times div1 r0,r1 .aendr rotcl r2 ; r2 = quotient (ones complement) addc r3,r2 ; increments and takes the twos complement if the msb of the ; quotient is 1. r2 = quotient (twos complement) 175 8.2.20 dmuls.l (double-length multiply as signed): arithmetic instruction format abstract code cycle t bit dmuls.l rm,rn with sign, rn rm ? mach, macl 0011nnnnmmmm1101 2 (to 5) description: performs 32-bit multiplication of the contents of general registers rn and rm, and stores the 64-bit results in the macl and mach register. the operation is a signed arithmetic operation. operation: dmuls(long m,long n) /* dmuls.l rm,rn */ { unsigned long rnl,rnh,rml,rmh,res0,res1,res2; unsigned long temp0,temp1,temp2,temp3; long tempm,tempn,fnlml; tempn=(long)r[n]; tempm=(long)r[m]; if (tempn<0) tempn=0-tempn; if (tempm<0) tempm=0-tempm; if ((long)(r[n]^r[m])<0) fnlml=-1; else fnlml=0; temp1=(unsigned long)tempn; temp2=(unsigned long)tempm; rnl=temp1&0x0000ffff; rnh=(temp1>>16)&0x0000ffff; rml=temp2&0x0000ffff; rmh=(temp2>>16)&0x0000ffff; temp0=rml*rnl; temp1=rmh*rnl; temp2=rml*rnh; temp3=rmh*rnh; 176 res2=0 res1=temp1+temp2; if (res1 177 8.2.21 dmulu.l (double-length multiply as unsigned): arithmetic instruction format abstract code cycle t bit dmulu.l rm,rn without sign, rn rm ? mach, macl 0011nnnnmmmm0101 2 (to 5) description: performs 32-bit multiplication of the contents of general registers rn and rm, and stores the 64-bit results in the macl and mach register. the operation is an unsigned arithmetic operation. 
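the two double-length multiplies differ only in how the 32-bit operands are interpreted. the following is a minimal sketch, assuming a c99 compiler with 64-bit integer types; the names mac_t, dmulu_l and dmuls_l are hypothetical and this is not the manual's operation listing, which builds the product from 16-bit halves.

```c
#include <stdint.h>

/* hypothetical pair standing in for the mach and macl registers */
typedef struct { uint32_t mach, macl; } mac_t;

/* dmulu.l: both operands are treated as unsigned 32-bit values */
mac_t dmulu_l(uint32_t rm, uint32_t rn)
{
    uint64_t p = (uint64_t)rn * (uint64_t)rm;
    mac_t r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* dmuls.l: both operands are treated as signed 32-bit values.
   the uint32_t -> int32_t conversion assumes the usual two's complement wrap. */
mac_t dmuls_l(uint32_t rm, uint32_t rn)
{
    int64_t p = (int64_t)(int32_t)rn * (int64_t)(int32_t)rm;
    uint64_t u = (uint64_t)p;
    mac_t r = { (uint32_t)(u >> 32), (uint32_t)u };
    return r;
}
```

for the operands h'fffffffe and h'00005555, the unsigned product is h'00005554ffff5556, so dmulu_l returns mach = h'00005554 and macl = h'ffff5556. an upper word of h'ffffffff for these operands corresponds to the signed product, where h'fffffffe is read as -2.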
operation:
dmulu(long m,long n) /* dmulu.l rm,rn */
{
    unsigned long rnl,rnh,rml,rmh,res0,res1,res2;
    unsigned long temp0,temp1,temp2,temp3;
    rnl=r[n]&0x0000ffff;
    rnh=(r[n]>>16)&0x0000ffff;
    rml=r[m]&0x0000ffff;
    rmh=(r[m]>>16)&0x0000ffff;
    temp0=rml*rnl;
    temp1=rmh*rnl;
    temp2=rml*rnh;
    temp3=rmh*rnh;
    res2=0;
    res1=temp1+temp2;
    if (res1 < temp1) res2+=0x00010000;     /* carry out of the middle sum */
    temp1=(res1<<16)&0xffff0000;
    res0=temp0+temp1;
    if (res0 < temp0) res2++;               /* carry out of the lower word */
    res2=res2+((res1>>16)&0x0000ffff)+temp3;
    mach=res2;
    macl=res0;
    pc+=2;
}

examples:
dmulu r0,r1      ; before execution r0 = h'fffffffe, r1 = h'00005555
                 ; after execution mach = h'00005554, macl = h'ffff5556
sts mach,r0      ; operation result (top)
sts macl,r0      ; operation result (bottom)

8.2.22 dt (decrement and test): arithmetic instruction
format abstract code cycle t bit
dt rn    rn - 1 -> rn; when rn is 0, 1 -> t, when rn is nonzero, 0 -> t    0100nnnn00010000    1    comparison result

description: decrements the contents of general register rn by 1 and compares the result to 0 (zero). when the result is 0, the t bit is set to 1. when the result is not zero, the t bit is set to 0.

operation:
dt(long n) /* dt rn */
{ r[n]--; if (r[n]==0) t=1; else t=0; pc+=2; }

example:
mov #4,r5        ; sets the number of loops.
loop:
add r0,r1
dt r5            ; decrements the r5 value and checks whether it has become 0.
bf loop          ; branches to loop if t = 0. (in this example, loops 4 times.)

8.2.23 exts (extend as signed): arithmetic instruction
format abstract code cycle t bit
exts.b rm,rn    sign-extend rm from byte -> rn    0110nnnnmmmm1110    1    -
exts.w rm,rn    sign-extend rm from word -> rn    0110nnnnmmmm1111    1    -

description: sign-extends general register rm data, and stores the result in rn. if byte length is specified, the bit 7 value of rm is copied into bits 8 to 31 of rn. if word length is specified, the bit 15 value of rm is copied into bits 16 to 31 of rn.
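the bit copying described above can be written as two small c functions. this is a minimal sketch assuming fixed-width types from stdint.h; the names exts_b and exts_w are hypothetical, and the functions return the value that would be written to rn rather than updating a register file.

```c
#include <stdint.h>

/* exts.b: copy bit 7 of rm into bits 8 to 31 of the result */
uint32_t exts_b(uint32_t rm)
{
    uint32_t rn = rm & 0x000000ffu;          /* keep the low byte */
    if (rm & 0x00000080u)
        rn |= 0xffffff00u;                   /* bit 7 set: fill the upper bits with 1s */
    return rn;
}

/* exts.w: copy bit 15 of rm into bits 16 to 31 of the result */
uint32_t exts_w(uint32_t rm)
{
    uint32_t rn = rm & 0x0000ffffu;          /* keep the low word */
    if (rm & 0x00008000u)
        rn |= 0xffff0000u;                   /* bit 15 set: fill the upper bits with 1s */
    return rn;
}
```

note that the upper bits of rm never reach the result: exts_b(h'1234567f) gives h'0000007f because bit 7 is 0.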
operation:
extsb(long m,long n) /* exts.b rm,rn */
{
    r[n]=r[m];
    if ((r[m]&0x00000080)==0) r[n]&=0x000000ff;
    else r[n]|=0xffffff00;
    pc+=2;
}

extsw(long m,long n) /* exts.w rm,rn */
{
    r[n]=r[m];
    if ((r[m]&0x00008000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    pc+=2;
}

examples:
exts.b r0,r1     ; before execution r0 = h'00000080
                 ; after execution r1 = h'ffffff80
exts.w r0,r1     ; before execution r0 = h'00008000
                 ; after execution r1 = h'ffff8000

8.2.24 extu (extend as unsigned): arithmetic instruction
format abstract code cycle t bit
extu.b rm,rn    zero-extend rm from byte -> rn    0110nnnnmmmm1100    1    -
extu.w rm,rn    zero-extend rm from word -> rn    0110nnnnmmmm1101    1    -

description: zero-extends general register rm data, and stores the result in rn. if byte length is specified, 0s are written in bits 8 to 31 of rn. if word length is specified, 0s are written in bits 16 to 31 of rn.

operation:
extub(long m,long n) /* extu.b rm,rn */
{ r[n]=r[m]; r[n]&=0x000000ff; pc+=2; }

extuw(long m,long n) /* extu.w rm,rn */
{ r[n]=r[m]; r[n]&=0x0000ffff; pc+=2; }

examples:
extu.b r0,r1     ; before execution r0 = h'ffffff80
                 ; after execution r1 = h'00000080
extu.w r0,r1     ; before execution r0 = h'ffff8000
                 ; after execution r1 = h'00008000

8.2.25 jmp (jump): branch instruction
class: delayed branch instruction
format abstract code cycle t bit
jmp @rm    rm -> pc    0100mmmm00101011    2    -

description: branches unconditionally after executing the instruction following this jmp instruction. the branch destination is an address specified by the 32-bit data in general register rm.

note: since this is a delayed branch instruction, the instruction after jmp is executed before branching. no interrupts are accepted between this instruction and the next instruction. if the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction. if this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction.
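the ordering of a delayed branch can be illustrated with a toy model in c. this is a minimal sketch, not the manual's delay_slot() mechanism: the cpu_t structure, the function-pointer delay slot and all names are hypothetical. it assumes, as this manual's note on delayed branching states, that the branch destination is taken from the register contents before the slot instruction updates them.

```c
#include <stdint.h>

/* hypothetical toy cpu state for this sketch */
typedef struct { uint32_t r[16]; uint32_t pc; } cpu_t;

/* a delay-slot instruction is modelled as a function acting on the cpu state */
typedef void (*slot_fn)(cpu_t *);

/* jmp @rm: latch the destination, run the slot instruction, then branch */
void jmp_delayed(cpu_t *cpu, int m, slot_fn slot)
{
    uint32_t target = cpu->r[m];   /* read before the slot instruction executes */
    slot(cpu);                     /* the instruction after jmp runs first */
    cpu->pc = target;              /* the branch itself takes place afterwards */
}

/* stands in for a slot instruction that overwrites r0, such as "mov #0,r0" */
void slot_clear_r0(cpu_t *cpu) { cpu->r[0] = 0; }

/* pc after "jmp @r0" with r0 = h'00001000 and a slot that clears r0 */
uint32_t demo_jmp_pc(void)
{
    cpu_t cpu = { {0}, 0 };
    cpu.r[0] = 0x00001000u;
    jmp_delayed(&cpu, 0, slot_clear_r0);
    return cpu.pc;
}

/* r0 after the same sequence: the slot instruction did take effect */
uint32_t demo_jmp_r0(void)
{
    cpu_t cpu = { {0}, 0 };
    cpu.r[0] = 0x00001000u;
    jmp_delayed(&cpu, 0, slot_clear_r0);
    return cpu.r[0];
}
```

in this model the branch still lands on h'00001000 even though the slot instruction has already cleared r0 by the time pc is updated.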
operation:
jmp(long m) /* jmp @rm */
{
    unsigned long temp;
    temp=pc;
    pc=r[m]+4;
    delay_slot(temp+2);
}

examples:
mov.l jmp_table,r0   ; address of r0 = trget
jmp @r0              ; branches to trget
mov r0,r1            ; executes mov before branching
.align 4
jmp_table: .data.l trget   ; jump table
.................
trget: add #1,r1     ; <- branch destination

note: in delayed branching, the branching operation itself takes place after the slot instruction has been executed. however, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. for example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address.

8.2.26 jsr (jump to subroutine): branch instruction
class: delayed branch instruction
format abstract code cycle t bit
jsr @rm    pc -> pr, rm -> pc    0100mmmm00001011    2    -

description: branches to the subroutine procedure at a specified address after executing the instruction following this jsr instruction. the pc value is stored in the pr. the jump destination is an address specified by the 32-bit data in general register rm. the pc points to the starting address of the second instruction after jsr. the jsr instruction and rts instruction are used for subroutine procedure calls.

note: since this is a delayed branch instruction, the instruction after jsr is executed before branching. no interrupts are accepted between this instruction and the next instruction. if the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction. if this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction. the pr used by the instruction immediately following this instruction is updated by this instruction.
also, if the instruction immediately following this instruction generates a re-execution exception other than instruction fetch, the pr is updated by this instruction. re-execute this instruction to recover. operation: jsr(long m) /* jsr @rm */ { pr=pc; pc=r[m]+4; delay_slot(pr+2); } 185 examples: mov.l jsr_table,r0 ; address of r0 = trget jsr @r0 ; branches to trget xor r1,r1 ; executes xor before branching add r0,r1 ; ? return address for when the subroutine ; procedure is completed (pr data) ........... .align 4 jsr_table: .data.l trget ; jump table trget: nop ; ? procedure entrance mov r2,r3 rts ; returns to the above add instruction mov #70,r1 ; executes mov before rts note: in delayed branching, the branching operation itself takes place after the slot instruction has been executed. however, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. for example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address. 186 8.2.27 ldc (load to control register): system control instruction (privileged only) format abstract code cycle t bit ldc rm,sr rm ? sr 0100mmmm00001110 5 lsb ldc rm,gbr rm ? gbr 0100mmmm00011110 1 ldc rm,vbr rm ? vbr 0100mmmm00101110 1 ldc rm,ssr rm ? ssr 0100mmmm00111110 1 ldc rm,spc rm ? spc 0100mmmm01001110 1 ldc rm,mod * rm ? mod 0100mmmm01011110 3 ldc rm,re * rm ? re 0100mmmm01111110 3 ldc rm,rs * rm ? rs 0100mmmm01101110 3 ldc rm,r0_bank rm ? r0_bank 0100mmmm10001110 1 ldc rm,r1_bank rm ? r1_bank 0100mmmm10011110 1 ldc rm,r2_bank rm ? r2_bank 0100mmmm10101110 1 ldc rm,r3_bank rm ? r3_bank 0100mmmm10111110 1 ldc rm,r4_bank rm ? r4_bank 0100mmmm11001110 1 ldc rm,r5_bank rm ? r5_bank 0100mmmm11011110 1 ldc rm,r6_bank rm ? r6_bank 0100mmmm11101110 1 ldc rm,r7_bank rm ? 
r7_bank 0100mmmm11111110 1 ldc.l @rm+,sr (rm) ? sr, rm + 4 ? rm 0100mmmm00000111 7 lsb ldc.l @rm+,gbr (rm) ? gbr, rm + 4 ? rm 0100mmmm00010111 1 ldc.l @rm+,vbr (rm) ? vbr, rm + 4 ? rm 0100mmmm00100111 1 ldc.l @rm+,ssr (rm) ? ssr, rm + 4 ? rm 0100mmmm00110111 1 ldc.l @rm+,spc (rm) ? spc, rm + 4 ? rm 0100mmmm01000111 1 ldc.l @rm+,mod * (rm) ? mod, rm + 4 ? rm 0100mmmm01010111 5 ldc.l @rm+,re * (rm) ? re, rm + 4 ? rm 0100mmmm01110111 5 ldc.l @rm+,rs * (rm) ? rs, rm + 4 ? rm 0100mmmm01100111 5 ldc.l @rm+,r0_bank (rm) ? r0_bank, rm + 4 ? rm 0100mmmm10000111 1 ldc.l @rm+,r1_bank (rm) ? r1_bank, rm + 4 ? rm 0100mmmm10010111 1 ldc.l @rm+,r2_bank (rm) ? r2_bank, rm + 4 ? rm 0100mmmm10100111 1 ldc.l @rm+,r3_bank (rm) ? r3_bank, rm + 4 ? rm 0100mmmm10110111 1 note: * sh3-dsp only. 187 format abstract code cycle t bit ldc.l @rm+,r4_bank (rm) ? r4_bank, rm + 4 ? rm 0100mmmm11000111 1 ldc.l @rm+,r5_bank (rm) ? r5_bank, rm + 4 ? rm 0100mmmm11010111 1 ldc.l @rm+,r6_bank (rm) ? r6_bank, rm + 4 ? rm 0100mmmm11100111 1 ldc.l @rm+,r7_bank (rm) ? r7_bank, rm + 4 ? rm 0100mmmm11110111 1 notes: 1. three cycles on the sh3-dsp. 2. five cycles on the sh3-dsp. description: stores source operand in control registers sr, gbr, vbr, ssr, spc, mod, re, and rs, or r0_bank to r7_bank. ldc and ldc.l, except for ldc rm, gbr and ldc.l @rm+, gbr, are privileged instructions and can be used in privileged mode only. if used in user mode, they can cause illegal instruction exceptions. note that ldc rm, gbr and ldc.l @rm+, gbr can be used in user mode. the rm_bank operand is designated by the rb bit of the sr register. when the value of the rb bit is 1, the r0_bank1 to r7_bank1 registers and the r8 to r15 registers are used as the rn operand, and the r0_bank0 to r7_bank0 registers are used as the rm_bank operand. 
when the value of the rb bit is 0, the r0_bank0 to r7_bank0 registers and the r8 to r15 registers are used as the rn operand, and the r0_bank1 to r7_bank1 registers are used as the rm_bank operand. if the ldc rm, sr instruction or ldc.l @rm+, sr instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction. operation: ldcsr(long m) /* ldc rm,sr */ { sr=r[m]&0x0fff0fff; pc+=2; } ldcgbr(long m) /* ldc rm,gbr */ { gbr=r[m]; pc+=2; } 188 ldcvbr(long m) /* ldc rm,vbr */ { vbr=r[m]; pc+=2; } ldcssr(long m) /* ldc rm,ssr */ { ssr=r[m]&0x700003f3; pc+=2; } ldcspc(long m) /* ldc rm,spc */ { spc=r[m]; pc+=2; } ldcrn_bank(long m) /* ldc rm,rn_bank */ { /* n=0C7, */ rn_bank=r[m]; pc+=2; } ldcmsr(long m) /* ldc.l @rm+,sr */ { sr=read_long(r[m])&0x0fff0fff; r[m]+=4; pc+=2; } ldcmgbr(long m) /* ldc.l @rm+,gbr */ { gbr=read_long(r[m]); r[m]+=4; pc+=2; } 189 ldcmvbr(long m) /* ldc.l @rm+,vbr */ { vbr=read_long(r[m]); r[m]+=4; pc+=2; } ldcmssr(long m) /* ldc.l @rm+,ssr */ { ssr=read_long(r[m])&0x700003f3; r[m]+=4; pc+=2; } ldcmspc(long m) /* ldc.l @rm+,spc */ { spc=read_long(r[m]); r[m]+=4; pc+=2; } ldcmrn_bank(long m) /* ldc.l @rm+,rn_bank */ /* n=0C7 */ { rn_bank=read_long(r[m]); r[m]+=4; pc+=2; } ldcmod(long m) /* ldc rm,mod */ { mod=r[m]; pc+=2; } ldcre(long m) /* ldc rm,re */ { re=r[m]; pc+=2; } 190 ldcrs(long m) /* ldc rm,rs */ { rs=r[m]; pc+=2; } ldcmmod(long m) /* ldc.l @rm+,mod */ { mod=read_long(r[m]); r[m]+=4; pc+=2; } ldcmre(long m) /* ldc.l @rm+,re */ { re=read_long(r[m]); r[m]+=4; pc+=2; } ldcmrs(long m) /* ldc.l @rm+,rs */ { rs=read_long(r[m]); r[m]+=4; pc+=2; } examples: ldc r0,sr ; before execution r0 = h'ffffffff, sr = h'00000000 ; after execution sr = h'700003f3 ldc.l @r15+,gbr ; before execution r15 = h'10000000, @r15 + h'12345678, gbr = h'edcba987 ; after execution r15 = h'10000004, gbr = @h'10000000 191 8.2.28 ldre (load effective address to re register): system control instruction 
(sh3-dsp only)
format abstract code cycle t bit
ldre @(disp,pc)    disp * 2 + pc -> re    10001110dddddddd    3    -

description: stores the effective address of the source operand in the repeat end register re. the effective address is an address specified by pc + displacement. the pc is the address four bytes after this instruction. the 8-bit displacement is sign-extended and doubled. consequently, the relative interval from the branch destination is -256 to +254 bytes.

note: the effective address value designated for the re register is different from the actual repeat end address. refer to table 8-23, "rs and re setting rule", for more information. when this instruction is arranged immediately after the delayed branch instruction, pc becomes the "first address +2" of the branch destination.

operation:
ldre(long d) /* ldre @(disp, pc) */
{
    long disp;
    if ((d&0x80)==0) disp=(0x000000ff & (long)d);
    else disp=(0xffffff00 | (long)d);
    re=pc+(disp<<1);
    pc+=2;
}

example:
ldrs sta         ; set repeat start address to rs.
ldre end         ; set repeat end address to re.
setrc #32        ; repeat 32 times from inst.a to inst.c.
inst.0           ;
sta: inst.a      ;
inst.b           ;
............
end: inst.c      ;
inst.e           ;
............

8.2.29 ldrs (load effective address to rs register): system control instruction (sh3-dsp only)
format abstract code cycle t bit
ldrs @(disp,pc)    disp * 2 + pc -> rs    10001100dddddddd    3    -

description: stores the effective address of the source operand in the repeat start register rs. the effective address is an address specified by pc + displacement. the pc is the address four bytes after this instruction. the 8-bit displacement is sign-extended and doubled. consequently, the relative interval from the branch destination is -256 to +254 bytes.

note: when the repeat (loop) program contains fewer than 3 instructions, the effective address value designated for the rs register is different from the actual repeat start address. refer to table 8-23, "rs and re setting rule", for more information.
if this instruction is arranged immediately after the delayed branch instruction, the pc becomes "the first address +2" of the branch destination. operation: ldrs(long d) /* ldrs @(disp, pc) */ { long disp; if ((d&0x80)==0) disp=(0x000000ff & (long)d); else disp=(0xffffff00 | (long)d); rs=pc+(disp<<1); pc+=2; } example: ldrs sta ; set repeat start address to rs. ldre end ; set repeat end address to re. setrc #32 ; repeat 32 times from inst.a to inst.c. inst.0 ; sta: inst.a ; inst.b ; ............ end: inst.c ; inst.d ; ............ 193 8.2.30 lds (load to system register): system control instruction format abstract code cycle t bit lds rm,mach rm ? mach 0100mmmm00001010 1 lds rm,macl rm ? macl 0100mmmm00011010 1 lds rm,pr rm ? pr 0100mmmm00101010 1 lds rm,dsr * rm ? dsr 0100mmmm01101010 1 lds rm,a0 * rm ? a0 0100mmmm01111010 1 lds rm,x0 * rm ? x0 0100mmmm10001010 1 lds rm,x1 * rm ? x1 0100mmmm10011010 1 lds rm,y0 * rm ? y0 0100mmmm10101010 1 lds rm,y1 * rm ? y1 0100mmmm10111010 1 lds.l @rm+,mach (rm) ? mach, rm + 4 ? rm 0100mmmm00000110 1 lds.l @rm+,macl (rm) ? macl, rm + 4 ? rm 0100mmmm00010110 1 lds.l @rm+,pr (rm) ? pr, rm + 4 ? rm 0100mmmm00100110 1 lds.l @rm+,dsr * (rm) ? dsr, rm + 4 ? rm 0100mmmm01100110 1 lds.l @rm+,a0 * (rm) ? a0, rm + 4 ? rm 0100mmmm01110110 1 lds.l @rm+,x0 * (rm) ? x0,rm+4 ? rm 0100nnnn10000110 1 lds.l @rm+,x1 * (rm) ? x1,rm+4 ? rm 0100nnnn10010110 1 lds.l @rm+,y0 * (rm) ? y0,rm+4 ? rm 0100nnnn10100110 1 lds.l @rm+,y1 * (rm) ? y1,rm+4 ? rm 0100nnnn10110110 1 note: * sh3-dsp only. description: stores the source operand into the system registers mach, macl, pr, dsr, a0, x0, x1, y0, or y1. 
operation: ldsmach(long m) /* lds rm,mach */ { mach=r[m]; if ((mach&0x00000200)==0) mach&=0x000003ff; else mach|=0xfffffc00; pc+=2; } 194 ldsmacl(long m) /* lds rm,macl */ { macl=r[m]; pc+=2; } ldspr(long m) /* lds rm,pr */ { pr=r[m]; pc+=2; } ldsmmach(long m) /* lds.l @rm+,mach */ { mach=read_long(r[m]); if ((mach&0x00000200)==0) mach&=0x000003ff; else mach|=0xfffffc00; r[m]+=4; pc+=2; } ldsmmacl(long m) /* lds.l @rm+,macl */ { macl=read_long(r[m]); r[m]+=4; pc+=2; } ldsmpr(long m) /* lds.l @rm+,pr */ { pr=read_long(r[m]); r[m]+=4; pc+=2; } ldsdsr(long m) /* lds rm,dsr */ { dsr=r[m]&0x0000000f; pc+=2; } 195 ldsa0(long m) /* lds rm,a0 */ { a0=r[m]; if((a0&0x80000000)==0) a0g=0x00; else a0g=0xff; pc+=2; } ldsx0(long m) /* lds rm, x0 */ { x0=r[m]; pc+=2; } ldsx1(long m) /* lds rm, x1 */ { x1=r[m]; pc+=2; } ldsy0(long m) /* lds rm, y0 */ { y0=r[m]; pc+=2; } ldsy1(long m) /* lds rm, y1 */ { y1=r[m]; pc+=2; } ldsmdsr(long m) /* lds.l @rm+,dsr */ { dsr=read_long(r[m])&0x0000000f; r[m]+=4; pc+=2; } ldsma0(long m) /* lds.l @rm+,a0 */ { a0=read_long(r[m]); if((a0&0x80000000)==0) a0g=0x00; else a0g=0xff; 196 r[m]+=4; pc+=2; } ldsmx0(long m) /* lds.l @rm+,x0 */ { x0=read_long(r[m]); r[m]+=4; pc+=2; } ldsmx1(long m) /* lds.l @rm+,x1 */ { x1=read_long(r[m]); r[m]+=4; pc+=2; } ldsmy0(long m) /* lds.l @rm+,y0 */ { y0=read_long(r[m]); r[m]+=4; pc+=2; } ldsmy1(long m) /* lds.l @rm+,y1 */ { y1=read_long(r[m]); r[m]+=4; pc+=2; } examples: lds r0,pr ; before execution r0 = h'12345678, pr = h'00000000 ; after execution pr = h'12345678 lds.l @r15+,macl ; before execution r15 = h'10000000 ; after execution r15 = h'10000004, macl = @h'10000000 197 8.2.31 ldtlb (load pteh/ptel to tlb): system control instruction (privileged only) format abstract code cycle t bit ldtlb pteh/ptel ? tlb 0000000000111000 1 description: loads pteh/ptel registers to the translation lookaside buffer (tlb). the tlb is indexed by the virtual address held in the pteh register. 
the loaded set is designated by the mmucr.rc (mmucr is an mmu control register and rc is a two bit field for a counter). ldtlb is a privileged instruction and can be used in privileged mode only. if used in user mode, it causes an illegal instruction exception. note: as ldtlb is for loading pteh and ptel to the tlb, the instruction should be issued when mmu is off (mmucr.at = 0) or should be placed in the p1 or p2 space with mmu enabled (see the mmu section of the applicable hardware manual for details). if the instruction is issued in an exception handler, it should be at least two instructions prior to an rte instruction that terminates the handler. operation: ldtlb() /*ldtlb*/ { tlb_tag=pteh; tlb_data=ptel; pc+=2; } examples: mov l @r0, r1 ; load upper bits of page table entry to r1 mov l r1, @r2 ; load r1 to pteh, r2 is pteh address (h'fffffff0) mov l @r3, r4 ; load lower bits of page table entry to r4 mov l r4, @r5 ; load r4 to ptel, r5 is ptel address (h'fffffff4) ldtlb ; load pteh and ptel registers to tlb 198 8.2.32 mac.l (multiply and accumulate long): arithmetic instruction format abstract code cycle t bit mac.l @rm+,@rn+ signed operation, (rn) (rm) + mac ? mac rn + 4 ? rn, rm + 4 ? rm 0000nnnnmmmm1111 2 (to 5) description: does signed multiplication of 32-bit operands obtained using the contents of general registers rm and rn as addresses. the 64-bit result is added to contents of the mac register, and the final result is stored in the mac register. every time an operand is read, rm and rn are incremented by four. when the s bit is cleared to 0, the 64-bit result is stored in the coupled mach and macl registers. when bit s is set to 1, addition to the mac register is a saturation operation of 48 bits starting from the lsb. for the saturation operation, only the lower 48 bits of the macl register are enabled and the result is limited to between h'ffff800000000000 (minimum) and h'00007fffffffffff (maximum). 
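the effect of the s bit on the accumulation can be sketched in c. this is a minimal sketch under stated assumptions: the mac register pair is modelled as one signed 64-bit value, memory reads and the post-increment of rm and rn are left out, saturation is modelled as a simple clamp to the 48-bit limits quoted above, and the names mac_l_step, MAC48_MAX and MAC48_MIN are hypothetical. it is not the manual's operation listing.

```c
#include <stdint.h>

#define MAC48_MAX  INT64_C(0x00007fffffffffff)      /* h'00007fffffffffff */
#define MAC48_MIN  (-INT64_C(0x0000800000000000))   /* h'ffff800000000000 */

/* one mac.l step on already-loaded operands.
   mac: current mach:macl as a signed 64-bit value; s: the s bit (0 or 1). */
int64_t mac_l_step(int64_t mac, int32_t rn_val, int32_t rm_val, int s)
{
    int64_t prod = (int64_t)rn_val * (int64_t)rm_val;   /* always fits in 64 bits */

    if (!s) {
        /* s = 0: plain 64-bit accumulate, wrapping modulo 2^64
           (the cast back to int64_t assumes two's complement) */
        return (int64_t)((uint64_t)mac + (uint64_t)prod);
    }

    /* s = 1: compare before adding so the c addition itself cannot overflow */
    if (prod > 0 && mac > MAC48_MAX - prod) return MAC48_MAX;
    if (prod < 0 && mac < MAC48_MIN - prod) return MAC48_MIN;

    int64_t sum = mac + prod;
    if (sum > MAC48_MAX) return MAC48_MAX;
    if (sum < MAC48_MIN) return MAC48_MIN;
    return sum;
}
```

with both operands at h'7fffffff and an empty accumulator, the sketch returns h'3fffffff00000001 when s = 0 and the upper limit h'00007fffffffffff when s = 1.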
operation: macl(long m,long n) /* mac.l @rm+,@rn+*/ { unsigned long rnl,rnh,rml,rmh,res0,res1,res2; unsigned long temp0,templ,temp2,temp3; long tempm,tempn,fnlml; tempn=(long)read_long(r[n]); r[n]+=4; tempm=(long)read_long(r[m]); r[m]+=4; if ((long)(tempn^tempm)<0) fnlml=-1; else fnlml=0; if (tempn<0) tempn=0-tempn; if (tempm<0) tempm=0-tempm; temp1=(unsigned long)tempn; temp2=(unsigned long)tempm; 199 rnl=temp1&0x0000ffff; rnh=(temp1>>16)&0x0000ffff; rml=temp2&0x0000ffff; rmh=(temp2>>16)&0x0000ffff; temp0=rml*rnl; temp1=rmh*rnl; temp2=rml*rnh; temp3=rmh*rnh; res2=0 res1=temp1+temp2; if (res1 200 if(((long)res2>0)&&(res2>0x00007fff)){ res2=0x00007fff; res0=0xffffffff; }; mach={res2; macl=res0; } else { res0=macl+res0; if (macl>res0) res2++; res2+=mach mach=res2; macl=res0; } pc+=2; } examples: mova tblm,r0 ; table address mov r0,r1 mova tbln,r0 ; table address clrmac ; mac register initialization mac.l @r0+,@r1+ mac.l @r0+,@r1+ sts macl,r0 ; store result into r0 ............... .align 2 tblm .data.l h'1234abcd .data.l h'5678ef01 tbln .data.l h'0123abcd .data.l h'4567def0 201 8.2.33 mac (multiply and accumulate): arithmetic instruction format abstract code cycle t bit mac.w @rm+,@rn+ mac @rm+,@rn+ with sign, (rn) (rm) + mac ? mac rn + 2 ? rn, rm + 2 ? rm 0100nnnnmmmm1111 2 (to 5) description: multiplies with sign 16-bit operands obtained using the contents of general registers rm and rn as addresses. the 32-bit result is added to the contents of the mac register, and the final result is stored in the mac register. each time an operand is read, rm and rn are each incremented by 2. when the s bit is cleared to 0, the 64-bit result of the 16-bit ( 16-bit + 64-bit = 64-bit multiply and accumulate calculation is stored in the coupled mach and macl registers. when the s bit is set to 1, the 16-bit ( 16-bit + 32-bit = 32-bit multiply and accumulate calculation involves addition to the mac register using a saturation operation. 
for the saturation operation, only the macl register is enabled, and the result is limited to between h'80000000 (minimum) and h'7fffffff (maximum). if an overflow occurs, the lsb of the mach register is set to 1. if the overflow is in the negative direction, h'80000000 (the minimum value) is stored in the macl register, and if the overflow is in the positive direction, h'7fffffff (the maximum value) is stored in the macl register. note: the normal number of cycles for execution is 3; however, succeeding instructions can be executed in two cycles. operation: macw(long m,long n) /* mac.w @rm+,@rn+*/ { long tempm,tempn,dest,src,ans; unsigned long templ; tempn=(long)read_word(r[n]); r[n]+=2; tempm=(long)read_word(r[m]); r[m]+=2; templ=macl; tempm=((long)(short)tempn*(long)(short)tempm); if ((long)macl>=0) dest=0; else dest=1; if ((long)tempm>=0 { 202 src=0; tempn=0; } else { src=1; tempn=0xffffffff; } src+=dest; macl+=tempm; if ((long)macl>=0) ans=0; else ans=1; ans+=dest; if (s==1) { if (ans==1) { if (src==0 || src==2) mach|=0x00000001; if (src==0) macl=0x7fffffff; if (src==2) macl=0x80000000; } } else { mach+=tempn; if (templ>macl) mach+=1; if ((mach&0x00000200)==0) mach&=0x000003ff; else mach|=0xfffffc00; } pc+=2; } 203 examples: mova tblm,r0 ; table address mov r0,r1 mova tbln,r0 ; table address clrmac ; mac register initialization mac.w @r0+,@r1+ mac.w @r0+,@r1+ sts macl,r0 ; store result into r0 ............... .align 2 tblm .data.w h'1234 .data.w h'5678 tbln .data.w h'0123 .data.w h'4567 204 8.2.34 mov (move data): data transfer instruction format abstract code cycle t bit mov rm,rn rm ? rn 0110nnnnmmmm0011 1 mov.b rm,@rn rm ? (rn) 0010nnnnmmmm0000 1 mov.w rm,@rn rm ? (rn) 0010nnnnmmmm0001 1 mov.l rm,@rn rm ? (rn) 0010nnnnmmmm0010 1 mov.b @rm,rn (rm) ? sign extension ? rn 0110nnnnmmmm0000 1 mov.w @rm,rn (rm) ? sign extension ? rn 0110nnnnmmmm0001 1 mov.l @rm,rn (rm) ? rn 0110nnnnmmmm0010 1 mov.b rm,@Crn rn C 1 ? rn, rm ? 
(rn)    0010nnnnmmmm0100    1
mov.w rm,@-rn        rn - 2 -> rn, rm -> (rn)                     0010nnnnmmmm0101    1
mov.l rm,@-rn        rn - 4 -> rn, rm -> (rn)                     0010nnnnmmmm0110    1
mov.b @rm+,rn        (rm) -> sign extension -> rn, rm + 1 -> rm    0110nnnnmmmm0100    1
mov.w @rm+,rn        (rm) -> sign extension -> rn, rm + 2 -> rm    0110nnnnmmmm0101    1
mov.l @rm+,rn        (rm) -> rn, rm + 4 -> rm                     0110nnnnmmmm0110    1
mov.b rm,@(r0,rn)    rm -> (r0 + rn)                              0000nnnnmmmm0100    1
mov.w rm,@(r0,rn)    rm -> (r0 + rn)                              0000nnnnmmmm0101    1
mov.l rm,@(r0,rn)    rm -> (r0 + rn)                              0000nnnnmmmm0110    1
mov.b @(r0,rm),rn    (r0 + rm) -> sign extension -> rn             0000nnnnmmmm1100    1
mov.w @(r0,rm),rn    (r0 + rm) -> sign extension -> rn             0000nnnnmmmm1101    1
mov.l @(r0,rm),rn    (r0 + rm) -> rn                              0000nnnnmmmm1110    1

description: transfers the source operand to the destination. when the operand is stored in memory, the transferred data can be a byte, word, or longword. loaded data from memory is stored in a register after it is sign-extended to a longword.

operation:
mov(long m,long n) /* mov rm,rn */
{ r[n]=r[m]; pc+=2; }

movbs(long m,long n) /* mov.b rm,@rn */
{ write_byte(r[n],r[m]); pc+=2; }

movws(long m,long n) /* mov.w rm,@rn */
{ write_word(r[n],r[m]); pc+=2; }

movls(long m,long n) /* mov.l rm,@rn */
{ write_long(r[n],r[m]); pc+=2; }

movbl(long m,long n) /* mov.b @rm,rn */
{
    r[n]=(long)read_byte(r[m]);
    if ((r[n]&0x80)==0) r[n]&=0x000000ff;
    else r[n]|=0xffffff00;
    pc+=2;
}

movwl(long m,long n) /* mov.w @rm,rn */
{
    r[n]=(long)read_word(r[m]);
    if ((r[n]&0x8000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    pc+=2;
}

movll(long m,long n) /* mov.l @rm,rn */
{ r[n]=read_long(r[m]); pc+=2; }

movbm(long m,long n) /* mov.b rm,@-rn */
{ write_byte(r[n]-1,r[m]); r[n]-=1; pc+=2; }

movwm(long m,long n) /* mov.w rm,@-rn */
{ write_word(r[n]-2,r[m]); r[n]-=2; pc+=2; }

movlm(long m,long n) /* mov.l rm,@-rn */
{ write_long(r[n]-4,r[m]); r[n]-=4; pc+=2; }

movbp(long m,long n) /* mov.b @rm+,rn */
{
    r[n]=(long)read_byte(r[m]);
    if ((r[n]&0x80)==0) r[n]&=0x000000ff;
    else r[n]|=0xffffff00;
    if (n!=m) r[m]+=1;
    pc+=2;
}
movwp(long m,long n) /* mov.w @rm+,rn */
{
    r[n]=(long)read_word(r[m]);
    if ((r[n]&0x8000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    if (n!=m) r[m]+=2;
    pc+=2;
}

movlp(long m,long n) /* mov.l @rm+,rn */
{ r[n]=read_long(r[m]); if (n!=m) r[m]+=4; pc+=2; }

movbs0(long m,long n) /* mov.b rm,@(r0,rn) */
{ write_byte(r[n]+r[0],r[m]); pc+=2; }

movws0(long m,long n) /* mov.w rm,@(r0,rn) */
{ write_word(r[n]+r[0],r[m]); pc+=2; }

movls0(long m,long n) /* mov.l rm,@(r0,rn) */
{ write_long(r[n]+r[0],r[m]); pc+=2; }

movbl0(long m,long n) /* mov.b @(r0,rm),rn */
{
    r[n]=(long)read_byte(r[m]+r[0]);
    if ((r[n]&0x80)==0) r[n]&=0x000000ff;
    else r[n]|=0xffffff00;
    pc+=2;
}

movwl0(long m,long n) /* mov.w @(r0,rm),rn */
{
    r[n]=(long)read_word(r[m]+r[0]);
    if ((r[n]&0x8000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    pc+=2;
}

movll0(long m,long n) /* mov.l @(r0,rm),rn */
{ r[n]=read_long(r[m]+r[0]); pc+=2; }

examples:
mov r0,r1            ; before execution r0 = h'ffffffff, r1 = h'00000000
                     ; after execution r1 = h'ffffffff
mov.w r0,@r1         ; before execution r0 = h'ffff7f80
                     ; after execution @r1 = h'7f80
mov.b @r0,r1         ; before execution @r0 = h'80, r1 = h'00000000
                     ; after execution r1 = h'ffffff80
mov.w r0,@-r1        ; before execution r0 = h'aaaaaaaa, r1 = h'ffff7f80
                     ; after execution r1 = h'ffff7f7e, @r1 = h'aaaa
mov.l @r0+,r1        ; before execution r0 = h'12345670
                     ; after execution r0 = h'12345674, r1 = @h'12345670
mov.b r1,@(r0,r2)    ; before execution r2 = h'00000004, r0 = h'10000000
                     ; after execution @h'10000004 = r1
mov.w @(r0,r2),r1    ; before execution r2 = h'00000004, r0 = h'10000000
                     ; after execution r1 = @h'10000004

8.2.35 mov (move immediate data): data transfer instruction
format abstract code cycle t bit
mov #imm,rn            imm -> sign extension -> rn               1110nnnniiiiiiii    1    -
mov.w @(disp,pc),rn    (disp * 2 + pc) -> sign extension -> rn    1001nnnndddddddd    1    -
mov.l @(disp,pc),rn    (disp * 4 + pc) -> rn                     1101nnnndddddddd    1    -

description: stores immediate data, which has been sign-extended to a longword, into general register rn.
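for the two pc-relative forms in the format table above, the effective address is formed from the zero-extended 8-bit displacement field, scaled by the operand size. the following is a minimal sketch; the function names are hypothetical, and the pc argument is assumed to be the value this section defines for the instruction (the address of the second instruction after the mov, or the branch destination + 2 when the mov sits in a delay slot).

```c
#include <stdint.h>

/* mov.w @(disp,pc),rn: displacement field is zero-extended and doubled */
uint32_t movw_pc_ea(uint32_t pc, uint8_t disp)
{
    return pc + ((uint32_t)disp << 1);
}

/* mov.l @(disp,pc),rn: the lowest two bits of pc are cleared,
   then the zero-extended displacement field is quadrupled */
uint32_t movl_pc_ea(uint32_t pc, uint8_t disp)
{
    return (pc & 0xfffffffcu) + ((uint32_t)disp << 2);
}
```

as a usage example, a mov.w at address h'1002 sees pc = h'1006, and a displacement field of 4 (8 bytes) reaches h'100e. a mov.l in the delay slot of a branch to h'1012 sees pc = h'1014, and a displacement field of 1 (4 bytes) reaches h'1018.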
if the data is a word or longword, table data stored in the address specified by pc + displacement is accessed. if the data is a word, the 8-bit displacement is zero-extended and doubled. consequently, the relative interval from the table is up to pc + 510 bytes. the pc points to the starting address of the second instruction after this mov instruction. if the data is a longword, the 8-bit displacement is zero-extended and quadrupled. consequently, the relative interval from the table is up to pc + 1020 bytes. the pc points to the starting address of the second instruction after this mov instruction, but the lowest two bits of the pc are corrected to b'00.

note: the end address of the program area (module) or the second address after an unconditional branch instruction is suitable for the start address of the table. if suitable table assignment is impossible (for example, if there are no unconditional branch instructions within the area specified by pc + 510 bytes or pc + 1020 bytes), the bra instruction must be used to jump past the table. when this mov instruction is placed immediately after a delayed branch instruction, the pc points to an address specified by (the starting address of the branch destination) + 2.

operation:
movi(long i,long n) /* mov #imm,rn */
{
    if ((i&0x80)==0) r[n]=(0x000000ff & (long)i);
    else r[n]=(0xffffff00 | (long)i);
    pc+=2;
}

movwi(long d,long n) /* mov.w @(disp,pc),rn */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[n]=(long)read_word(pc+(disp<<1));
    if ((r[n]&0x8000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    pc+=2;
}

movli(long d,long n) /* mov.l @(disp,pc),rn */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[n]=read_long((pc&0xfffffffc)+(disp<<2));
    pc+=2;
}

examples:
address
1000 mov #h'80,r1         ; r1 = h'ffffff80
1002 mov.w imm,r2         ; r2 = h'ffff9abc, imm means @(h'08,pc)
1004 add #-1,r0
1006 tst r0,r0            ; <-
pc location used for address calculation for
                               ; the mov.w instruction
1008        movt r13
100a        bra next           ; delayed branch instruction
100c        mov.l @(4,pc),r3   ; r3 = h'12345678
100e imm    .data.w h'9abc
1010        .data.w h'1234
1012 next   jmp @r3            ; branch destination of the bra instruction
1014        cmp/eq #0,r0       ; ← pc location used for address calculation for
                               ; the mov.l instruction
            .align 4
1018        .data.l h'12345678

8.2.36 mov (move peripheral data): data transfer instruction

format                  abstract                                   code              cycle  t bit
mov.b @(disp,gbr),r0    (disp + gbr) → sign extension → r0         11000100dddddddd  1
mov.w @(disp,gbr),r0    (disp × 2 + gbr) → sign extension → r0     11000101dddddddd  1
mov.l @(disp,gbr),r0    (disp × 4 + gbr) → r0                      11000110dddddddd  1
mov.b r0,@(disp,gbr)    r0 → (disp + gbr)                          11000000dddddddd  1
mov.w r0,@(disp,gbr)    r0 → (disp × 2 + gbr)                      11000001dddddddd  1
mov.l r0,@(disp,gbr)    r0 → (disp × 4 + gbr)                      11000010dddddddd  1

description: transfers the source operand to the destination. this instruction is suitable for accessing data in the peripheral module area. the data can be a byte, word, or longword, but only the r0 register can be used. a peripheral module base address is set to the gbr. when the peripheral module data is a byte, the only change made is to zero-extend the 8-bit displacement. consequently, an address within +255 bytes can be specified. when the peripheral module data is a word, the 8-bit displacement is zero-extended and doubled. consequently, an address within +510 bytes can be specified. when the peripheral module data is a longword, the 8-bit displacement is zero-extended and quadrupled. consequently, an address within +1020 bytes can be specified. if the displacement is too short to reach the memory operand, the above @(r0,rn) mode must be used after the gbr data is transferred to a general register. when the source operand is in memory, the loaded data is stored in the register after it is sign-extended to a longword.

note: the destination register of a data load is always r0.
r0 cannot be accessed by the next instruction until the load instruction is finished. the instruction order shown on the right in figure 8-1 will give better results.

    mov.b @(12,gbr),r0        mov.b @(12,gbr),r0
    and   #80,r0              add   #20,r1
    add   #20,r1              and   #80,r0

figure 8-1 using r0 after mov

operation:
movblg(long d) /* mov.b @(disp,gbr),r0 */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[0]=(long)read_byte(gbr+disp);
    if ((r[0]&0x80)==0) r[0]&=0x000000ff;
    else r[0]|=0xffffff00;
    pc+=2;
}
movwlg(long d) /* mov.w @(disp,gbr),r0 */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[0]=(long)read_word(gbr+(disp<<1));
    if ((r[0]&0x8000)==0) r[0]&=0x0000ffff;
    else r[0]|=0xffff0000;
    pc+=2;
}
movllg(long d) /* mov.l @(disp,gbr),r0 */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[0]=read_long(gbr+(disp<<2));
    pc+=2;
}
movbsg(long d) /* mov.b r0,@(disp,gbr) */
{
    long disp;
    disp=(0x000000ff & (long)d);
    write_byte(gbr+disp,r[0]);
    pc+=2;
}
movwsg(long d) /* mov.w r0,@(disp,gbr) */
{
    long disp;
    disp=(0x000000ff & (long)d);
    write_word(gbr+(disp<<1),r[0]);
    pc+=2;
}
movlsg(long d) /* mov.l r0,@(disp,gbr) */
{
    long disp;
    disp=(0x000000ff & (long)d);
    write_long(gbr+(disp<<2),r[0]);
    pc+=2;
}

examples:
    mov.l @(2,gbr),r0  ; before execution @(gbr + 8) = h'12345670
                       ; after execution r0 = h'12345670
    mov.b r0,@(1,gbr)  ; before execution r0 = h'ffff7f80
                       ; after execution @(gbr + 1) = h'80

8.2.37 mov (move structure data): data transfer instruction

format                 abstract                                  code              cycle  t bit
mov.b r0,@(disp,rn)    r0 → (disp + rn)                          10000000nnnndddd  1
mov.w r0,@(disp,rn)    r0 → (disp × 2 + rn)                      10000001nnnndddd  1
mov.l rm,@(disp,rn)    rm → (disp × 4 + rn)                      0001nnnnmmmmdddd  1
mov.b @(disp,rm),r0    (disp + rm) → sign extension → r0         10000100mmmmdddd  1
mov.w @(disp,rm),r0    (disp × 2 + rm) → sign extension → r0     10000101mmmmdddd  1
mov.l @(disp,rm),rn    (disp × 4 + rm) → rn                      0101nnnnmmmmdddd  1

description: transfers the source operand to the destination. this instruction is suitable for accessing data in a structure or a stack.
the data can be a byte, word, or longword, but when a byte or word is selected, only the r0 register can be used. when the data is a byte, the only change made is to zero-extend the 4-bit displacement. consequently, an address within +15 bytes can be specified. when the data is a word, the 4-bit displacement is zero-extended and doubled. consequently, an address within +30 bytes can be specified. when the data is a longword, the 4-bit displacement is zero-extended and quadrupled. consequently, an address within +60 bytes can be specified. if the displacement is too short to reach the memory operand, the aforementioned @(r0,rn) mode must be used. when the source operand is in memory, the loaded data is stored in the register after it is sign-extended to a longword. note: when byte or word data is loaded, the destination register is always r0. r0 cannot be accessed by the next instruction until the load instruction is finished. the instruction order in figure 8-2 will give better results. 
    mov.b @(2,r1),r0          mov.b @(2,r1),r0
    and   #80,r0              add   #20,r1
    add   #20,r1              and   #80,r0

figure 8-2 using r0 after mov

operation:
movbs4(long d,long n) /* mov.b r0,@(disp,rn) */
{
    long disp;
    disp=(0x0000000f & (long)d);
    write_byte(r[n]+disp,r[0]);
    pc+=2;
}
movws4(long d,long n) /* mov.w r0,@(disp,rn) */
{
    long disp;
    disp=(0x0000000f & (long)d);
    write_word(r[n]+(disp<<1),r[0]);
    pc+=2;
}
movls4(long m,long d,long n) /* mov.l rm,@(disp,rn) */
{
    long disp;
    disp=(0x0000000f & (long)d);
    write_long(r[n]+(disp<<2),r[m]);
    pc+=2;
}
movbl4(long m,long d) /* mov.b @(disp,rm),r0 */
{
    long disp;
    disp=(0x0000000f & (long)d);
    r[0]=read_byte(r[m]+disp);
    if ((r[0]&0x80)==0) r[0]&=0x000000ff;
    else r[0]|=0xffffff00;
    pc+=2;
}
movwl4(long m,long d) /* mov.w @(disp,rm),r0 */
{
    long disp;
    disp=(0x0000000f & (long)d);
    r[0]=read_word(r[m]+(disp<<1));
    if ((r[0]&0x8000)==0) r[0]&=0x0000ffff;
    else r[0]|=0xffff0000;
    pc+=2;
}
movll4(long m,long d,long n) /* mov.l @(disp,rm),rn */
{
    long disp;
    disp=(0x0000000f & (long)d);
    r[n]=read_long(r[m]+(disp<<2));
    pc+=2;
}

examples:
    mov.l @(2,r0),r1     ; before execution @(r0 + 8) = h'12345670
                         ; after execution r1 = h'12345670
    mov.l r0,@(h'3c,r1)  ; before execution r0 = h'ffff7f80
                         ; after execution @(r1 + 60) = h'ffff7f80

8.2.38 mova (move effective address): data transfer instruction

format               abstract              code              cycle  t bit
mova @(disp,pc),r0   disp × 4 + pc → r0    11000111dddddddd  1

description: stores the effective address of the source operand into general register r0. the 8-bit displacement is zero-extended and quadrupled. consequently, the relative interval from the operand is up to pc + 1020 bytes. the pc points to the starting address of the second instruction after this mova instruction, but the lowest two bits of the pc are corrected to b'00.

note: if this instruction is placed immediately after a delayed branch instruction, the pc must point to an address specified by (the starting address of the branch destination) + 2.
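The address arithmetic in the mova description can be checked with a short C sketch. This is an illustration only, not part of the manual: the helper name mova_target and its insn_addr parameter are hypothetical, and it assumes the non-delay-slot case, where the pc used by mova is the address of the instruction itself plus 4 (the second instruction after it).

```c
#include <stdint.h>

/* Hypothetical helper, not from the manual: effective address that
 * mova @(disp,pc),r0 would place in r0, given the address of the mova
 * instruction and the 8-bit displacement field from the opcode.
 * Assumes mova is NOT in the delay slot of a delayed branch. */
static uint32_t mova_target(uint32_t insn_addr, uint8_t disp)
{
    uint32_t pc = insn_addr + 4;            /* second instruction after mova */
    uint32_t base = pc & 0xfffffffcu;       /* lowest two bits corrected to b'00 */
    return base + ((uint32_t)disp << 2);    /* zero-extended, quadrupled */
}
```

With the mova at address h'1006, the pc is h'100a, the corrected base is h'1008, and a displacement field of 1 reaches h'100c. The largest displacement field (255) reaches base + 1020 bytes.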
operation:
mova(long d) /* mova @(disp,pc),r0 */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[0]=(pc&0xfffffffc)+(disp<<2);
    pc+=2;
}

examples:
address     .org h'1006
1006        mova str,r0        ; address of str → r0
1008        mov.b @r0,r1       ; r1 = "x" ← pc location after correcting the lowest
                               ; two bits
100a        add r4,r5          ; ← original pc location for address calculation for
                               ; the mova instruction
            .align 4
100c str:   .sdata "xyzp12"
            ...............
2002        bra trget          ; delayed branch instruction
2004        mova @(0,pc),r0    ; address of trget + 2 → r0
2006        nop

8.2.39 movt (move t bit): data transfer instruction

format     abstract   code              cycle  t bit
movt rn    t → rn     0000nnnn00101001  1

description: stores the t bit value into general register rn. when t = 1, 1 is stored in rn, and when t = 0, 0 is stored in rn.

operation:
movt(long n) /* movt rn */
{
    r[n]=(0x00000001 & sr);
    pc+=2;
}

examples:
    xor r2,r2    ; r2 = 0
    cmp/pz r2    ; t = 1
    movt r0      ; r0 = 1
    clrt         ; t = 0
    movt r1      ; r1 = 0

8.2.40 mul.l (multiply long): arithmetic instruction

format         abstract          code              cycle     t bit
mul.l rm,rn    rn × rm → macl    0000nnnnmmmm0111  2 (to 5)

description: performs 32-bit multiplication of the contents of general registers rn and rm, and stores the bottom 32 bits of the result in the macl register. the mach register data does not change.

operation:
mull(long m,long n) /* mul.l rm,rn */
{
    macl=r[n]*r[m];
    pc+=2;
}

examples:
    mul.l r0,r1    ; before execution r0 = h'fffffffe, r1 = h'00005555
                   ; after execution macl = h'ffff5556
    sts macl,r0    ; operation result

8.2.41 muls.w (multiply as signed word): arithmetic instruction

format          abstract                            code              cycle     t bit
muls.w rm,rn    signed operation, rn × rm → macl    0010nnnnmmmm1111  1 (to 3)
muls rm,rn

description: performs 16-bit multiplication of the contents of general registers rn and rm, and stores the 32-bit result in the macl register. the operation is signed and the mach register data does not change.
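The signed word multiply described above uses only the lower 16 bits of each register, so the same register contents can give a different macl value than the unsigned counterpart (mulu.w). The following C sketch illustrates the difference. The function names muls_w, mulu_w, and sext16 are hypothetical and are not the manual's operation listings; fixed-width types are used so the result does not depend on the host's size of long.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 16 bits of v to 32 bits
 * without relying on implementation-defined narrowing conversions. */
static int32_t sext16(uint32_t v)
{
    v &= 0xffffu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Sketch of muls.w: signed 16 x 16 -> 32-bit product (value for macl). */
static uint32_t muls_w(uint32_t rn, uint32_t rm)
{
    return (uint32_t)(sext16(rn) * sext16(rm));
}

/* Sketch of mulu.w: unsigned 16 x 16 -> 32-bit product (value for macl). */
static uint32_t mulu_w(uint32_t rn, uint32_t rm)
{
    return (rn & 0xffffu) * (rm & 0xffffu);
}
```

For r0 = h'fffffffe and r1 = h'00005555, the signed form treats the low word h'fffe as -2 and yields h'ffff5556, while the unsigned form treats it as 65534 and yields h'55545556.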
operation:
muls(long m,long n) /* muls rm,rn */
{
    macl=((long)(short)r[n]*(long)(short)r[m]);
    pc+=2;
}

examples:
    muls r0,r1     ; before execution r0 = h'fffffffe, r1 = h'00005555
                   ; after execution macl = h'ffff5556
    sts macl,r0    ; operation result

8.2.42 mulu.w (multiply as unsigned word): arithmetic instruction

format          abstract                    code              cycle     t bit
mulu.w rm,rn    unsigned, rn × rm → macl    0010nnnnmmmm1110  1 (to 3)
mulu rm,rn

description: performs 16-bit multiplication of the contents of general registers rn and rm, and stores the 32-bit result in the macl register. the operation is unsigned and the mach register data does not change.

operation:
mulu(long m,long n) /* mulu rm,rn */
{
    macl=((unsigned long)(unsigned short)r[n]
         *(unsigned long)(unsigned short)r[m]);
    pc+=2;
}

examples:
    mulu r0,r1     ; before execution r0 = h'00000002, r1 = h'ffffaaaa
                   ; after execution macl = h'00015554
    sts macl,r0    ; operation result

8.2.43 neg (negate): arithmetic instruction

format       abstract       code              cycle  t bit
neg rm,rn    0 – rm → rn    0110nnnnmmmm1011  1

description: takes the two's complement of data in general register rm, and stores the result in rn. this effectively subtracts rm data from 0, and stores the result in rn.

operation:
neg(long m,long n) /* neg rm,rn */
{
    r[n]=0-r[m];
    pc+=2;
}

examples:
    neg r0,r1    ; before execution r0 = h'00000001
                 ; after execution r1 = h'ffffffff

8.2.44 negc (negate with carry): arithmetic instruction

format        abstract                       code              cycle  t bit
negc rm,rn    0 – rm – t → rn, borrow → t    0110nnnnmmmm1010  1      borrow

description: subtracts general register rm data and the t bit from 0, and stores the result in rn. if a borrow is generated, the t bit changes accordingly. this instruction is used for inverting the sign of a value that has more than 32 bits.

operation:
negc(long m,long n) /* negc rm,rn */
{
    unsigned long temp;
    temp=0-r[m];
    r[n]=temp-t;
    if (0

8.2.46 not (not-logical complement): logic operation instruction

format       abstract   code   cycle  t bit
not rm,rn    ~rm →
rn 0110nnnnmmmm0111 1

description: takes the one's complement of general register rm data, and stores the result in rn. this effectively inverts each bit of rm data and stores the result in rn.

operation:
not(long m,long n) /* not rm,rn */
{
    r[n]=~r[m];
    pc+=2;
}

examples:
    not r0,r1    ; before execution r0 = h'aaaaaaaa
                 ; after execution r1 = h'55555555

8.2.47 or (or logical): logic operation instruction

format                 abstract                            code              cycle  t bit
or rm,rn               rn | rm → rn                        0010nnnnmmmm1011  1
or #imm,r0             r0 | imm → r0                       11001011iiiiiiii  1
or.b #imm,@(r0,gbr)    (r0 + gbr) | imm → (r0 + gbr)       11001111iiiiiiii  3

description: logically ors the contents of general registers rn and rm, and stores the result in rn. the contents of general register r0 can also be ored with zero-extended 8-bit immediate data, or 8-bit memory data accessed by using indirect indexed gbr addressing can be ored with 8-bit immediate data.

operation:
or(long m,long n) /* or rm,rn */
{
    r[n]|=r[m];
    pc+=2;
}
ori(long i) /* or #imm,r0 */
{
    r[0]|=(0x000000ff & (long)i);
    pc+=2;
}
orm(long i) /* or.b #imm,@(r0,gbr) */
{
    long temp;
    temp=(long)read_byte(gbr+r[0]);
    temp|=(0x000000ff & (long)i);
    write_byte(gbr+r[0],temp);
    pc+=2;
}

examples:
    or r0,r1                ; before execution r0 = h'aaaa5555, r1 = h'55550000
                            ; after execution r1 = h'ffff5555
    or #h'f0,r0             ; before execution r0 = h'00000008
                            ; after execution r0 = h'000000f8
    or.b #h'50,@(r0,gbr)    ; before execution @(r0,gbr) = h'a5
                            ; after execution @(r0,gbr) = h'f5

8.2.48 pref (prefetch data to the cache)

format      abstract                             code              cycle  t bit
pref @rn    (rn & 0xfffffff0) → cache            0000nnnn10000011  1
            (rn & 0xfffffff0 + 4) → cache
            (rn & 0xfffffff0 + 8) → cache
            (rn & 0xfffffff0 + c) → cache

description: loads data into the cache by software prefetching. the 16 bytes of data (one cache line) containing the data pointed to by rn are loaded into the cache. address rn should be on a longword boundary. no address-related error is detected by this instruction. in case of an error, the instruction operates as nop.
the destination is the on-chip cache; therefore, this instruction functions as a nop instruction in effect, that is, it never changes registers or processor status.

operation:
pref(long n) /* pref */
{
    pc+=2;
}

examples:
            mov.l soft_pf,r1    ; address of r1 is soft_pf
            pref @r1            ; load data from soft_pf to on-chip cache
            .align 4
soft_pf:    .data.l h'12345678
            .data.l h'9abcdef0
            .data.l h'aaaa5555
            .data.l h'5555aaaa

8.2.49 rotcl (rotate with carry left): shift instruction

format      abstract      code              cycle  t bit
rotcl rn    t ← rn ← t    0100nnnn00100100  1      msb

description: rotates the contents of general register rn and the t bit to the left by one bit, and stores the result in rn. the bit that is shifted out of the operand is transferred to the t bit (figure 8-3).

[figure 8-3 rotate with carry left]

operation:
rotcl(long n) /* rotcl rn */
{
    long temp;
    if ((r[n]&0x80000000)==0) temp=0;
    else temp=1;
    r[n]<<=1;
    if (t==1) r[n]|=0x00000001;
    else r[n]&=0xfffffffe;
    if (temp==1) t=1;
    else t=0;
    pc+=2;
}

examples:
    rotcl r0    ; before execution r0 = h'80000000, t = 0
                ; after execution r0 = h'00000000, t = 1

8.2.50 rotcr (rotate with carry right): shift instruction

format      abstract      code              cycle  t bit
rotcr rn    t → rn → t    0100nnnn00100101  1      lsb

description: rotates the contents of general register rn and the t bit to the right by one bit, and stores the result in rn. the bit that is shifted out of the operand is transferred to the t bit (figure 8-4).

[figure 8-4 rotate with carry right]

operation:
rotcr(long n) /* rotcr rn */
{
    long temp;
    if ((r[n]&0x00000001)==0) temp=0;
    else temp=1;
    r[n]>>=1;
    if (t==1) r[n]|=0x80000000;
    else r[n]&=0x7fffffff;
    if (temp==1) t=1;
    else t=0;
    pc+=2;
}

examples:
    rotcr r0    ; before execution r0 = h'00000001, t = 1
                ; after execution r0 = h'80000000, t = 1

8.2.51 rotl (rotate left): shift instruction

format     abstract   code   cycle  t bit
rotl rn    t ← rn ←
msb 0100nnnn00000100 1 msb

description: rotates the contents of general register rn to the left by one bit, and stores the result in rn (figure 8-5). the bit that is shifted out of the operand is transferred to the t bit.

[figure 8-5 rotate left]

operation:
rotl(long n) /* rotl rn */
{
    if ((r[n]&0x80000000)==0) t=0;
    else t=1;
    r[n]<<=1;
    if (t==1) r[n]|=0x00000001;
    else r[n]&=0xfffffffe;
    pc+=2;
}

examples:
    rotl r0    ; before execution r0 = h'80000000, t = 0
               ; after execution r0 = h'00000001, t = 1

8.2.52 rotr (rotate right): shift instruction

format     abstract        code              cycle  t bit
rotr rn    lsb → rn → t    0100nnnn00000101  1      lsb

description: rotates the contents of general register rn to the right by one bit, and stores the result in rn (figure 8-6). the bit that is shifted out of the operand is transferred to the t bit.

[figure 8-6 rotate right]

operation:
rotr(long n) /* rotr rn */
{
    if ((r[n]&0x00000001)==0) t=0;
    else t=1;
    r[n]>>=1;
    if (t==1) r[n]|=0x80000000;
    else r[n]&=0x7fffffff;
    pc+=2;
}

examples:
    rotr r0    ; before execution r0 = h'00000001, t = 0
               ; after execution r0 = h'80000000, t = 1

8.2.53 rte (return from exception): system control instruction (privileged only)
class: delayed branch instruction

format   abstract              code              cycle  t bit
rte      ssr → sr, spc → pc    0000000000101011  4

description: returns from an exception routine. the pc and sr values are loaded from spc and ssr. the program continues from the address specified by the loaded pc value. rte is a privileged instruction and can be used in privileged mode only. if used in user mode, it causes an illegal instruction exception.

note: since this is a delayed branch instruction, the instruction after rte is executed before branching. no interrupts are accepted between this instruction and the one immediately following it. if the instruction immediately following is a branch instruction, it is acknowledged as an illegal slot instruction.
if this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction. an instruction executed in a delayed slot immediately following this instruction uses the sr restored by this instruction. make sure that an instruction executed in a delayed slot immediately following this instruction does not cause an exception. also, an instruction that manipulates the md and bl bits of the sr register, as well as the instruction following it, should be used with the multiplier disabled or with fixed physical address space (p1 and p2).

operation:
rte() /* rte */
{
    unsigned long temp;
    temp=pc;
    pc=spc;
    sr=ssr;
    delay_slot(temp+2);
}

examples:
    rte            ; returns to the original routine
    add #8,r15     ; executes add before branching

note: in delayed branching, the branching operation itself takes place after the slot instruction has been executed. however, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. for example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address.

8.2.54 rts (return from subroutine): branch instruction
class: delayed branch instruction

format   abstract   code              cycle  t bit
rts      pr → pc    0000000000001011  2

description: returns from a subroutine procedure. the pc values are restored from the pr, and the program continues from the address specified by the restored pc value. this instruction is used to return to the program from a subroutine program called by a bsr or jsr instruction.

note: since this is a delayed branch instruction, the instruction after this rts is executed before branching. no interrupts are accepted between this instruction and the next instruction.
if the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction. if this instruction is located in a delayed slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction. an instruction that restores the pr should come before the rts instruction, and it should not be placed in the delay slot of the rts.

operation:
rts() /* rts */
{
    unsigned long temp;
    temp=pc;
    pc=pr+4;
    delay_slot(temp+2);
}

examples:
            mov.l table,r3    ; r3 = address of trget
            jsr @r3           ; branches to trget
            nop               ; executes nop before branching
            add r0,r1         ; ← return address for when the subroutine
                              ; procedure is completed (pr data)
            .............
table:      .data.l trget     ; jump table
            .............
trget:      mov r1,r0         ; ← procedure entrance
            rts               ; pr data → pc
            mov #12,r0        ; executes mov before branching

note: in delayed branching, the branching operation itself takes place after the slot instruction has been executed. however, execution of instructions (register updating, etc.) should always be done in the sequence of delayed branch instruction followed by delayed slot instruction. for example, even if a delayed slot updates a register in which the branching destination address is stored, the contents of the register before updating will be used as the branching destination address.

8.2.55 setrc (set repeat count to rc): system control instruction (sh3-dsp only)

format        abstract                                                          code              cycle  t bit
setrc rm      lsw of rm → rc (msw of sr), repeat control flag → rf1, rf0        0100mmmm00010100  3
setrc #imm    imm → rc (msw of sr), repeat control flag → rf1, rf0              10000010iiiiiiii  3

description: sets the repeat count in the rc counter of the sr register. when the operand is a register, the bottom 12 bits are used as the repeat count. when the operand is an immediate data value, 8 bits are used as the repeat count. the repeat control flags are set in the rf1 and rf0 bits of the sr register. use of the setrc instruction is subject to some limitations.
refer to section 5.12, dsp repeat (loop) control, for more information.

operation:
setrc(long m) /* setrc rm */
{
    long temp;
    temp=(r[m] & 0x00000fff)<<16;
    sr&=0xf000fff3;
    sr|=temp;
    rf1=repeat_control_flag1;
    rf0=repeat_control_flag0;
    pc+=2;
}
setrci(long i) /* setrc #imm */
{
    long temp;
    temp=((long)i & 0x000000ff)<<16;
    sr&=0xf000ffff;
    sr|=temp;
    rf1=repeat_control_flag1;
    rf0=repeat_control_flag0;
    pc+=2;
}

[figure 8-7 setrc instruction: for setrc #imm, the 8-bit imm (1 to 255) is written to bits 23 to 16 of sr; for setrc rn, the 12 bits rm[11:0] (1 to 4095) are written to bits 27 to 16 of sr; the repeat control flags are set in both cases]

example:
        ldrs sta      ; set repeat start address to rs.
        ldre end      ; set repeat end address to re.
        setrc #32     ; repeat 32 times from inst.a to inst.c.
        inst.0        ;
sta:    inst.a        ;
        inst.b        ;
        ............
end:    inst.c        ;
        inst.d        ;

8.2.56 sets (set s bit): system control instruction

format   abstract   code              cycle  t bit
sets     1 → s      0000000001011000  1

description: sets the s bit to 1.

operation:
sets() /* sets */
{
    s=1;
    pc+=2;
}

examples:
    sets    ; before execution s = 0
            ; after execution s = 1

8.2.57 sett (set t bit): system control instruction

format   abstract   code              cycle  t bit
sett     1 → t      0000000000011000  1      1

description: sets the t bit to 1.

operation:
sett() /* sett */
{
    t=1;
    pc+=2;
}

examples:
    sett    ; before execution t = 0
            ; after execution t = 1

8.2.58 shad (shift arithmetic dynamically): shift instruction

format        abstract                               code              cycle  t bit
shad rm,rn    rn << rm → rn (rm ≥ 0)                 0100nnnnmmmm1100  2
              rn >> rm → [msb → rn] (rm < 0)

description: arithmetically shifts the contents of general register rn. general register rm indicates the shift direction and the number of bits to be shifted.
- if the value of the rm register is positive, the shift is to the left; if it is negative, the shift is to the right.
- the number of bits to be shifted is indicated by the five lower bits (bits 4 to 0) of the rm register.
if the value is negative (msb = 1), the rm register value is expressed in two's complement. the magnitude of left shift may be 0 to 31, and the magnitude of right shift may be 1 to 32.

[figure 8-8 shift arithmetic dynamically: for rm ≥ 0, rn is shifted left and 0 enters at the lsb; for rm < 0, rn is shifted right and the msb is replicated]

operation:
shad(long m,long n) /* shad rm,rn */
{
    long cnt, sgn;
    sgn = r[m] & 0x80000000;
    cnt = r[m] & 0x0000001f;
    if (sgn==0) r[n]<<=cnt;
    else r[n]=(signed long)r[n]>>((~cnt+1) & 0x1f); /* shift arithmetic right */
    pc+=2;
}

examples:
    shad r1,r2    ; before execution r1 = h'ffffffec, r2 = h'80180000
                  ; after execution r1 = h'ffffffec, r2 = h'fffff801
    shad r3,r4    ; before execution r3 = h'00000014, r4 = h'fffff801
                  ; after execution r3 = h'00000014, r4 = h'80100000

8.2.59 shal (shift arithmetic left): shift instruction

format     abstract      code              cycle  t bit
shal rn    t ← rn ← 0    0100nnnn00100000  1      msb

description: arithmetically shifts the contents of general register rn to the left by one bit, and stores the result in rn. the bit that is shifted out of the operand is transferred to the t bit (figure 8-9).

[figure 8-9 shift arithmetic left]

operation:
shal(long n) /* shal rn (same as shll) */
{
    if ((r[n]&0x80000000)==0) t=0;
    else t=1;
    r[n]<<=1;
    pc+=2;
}

examples:
    shal r0    ; before execution r0 = h'80000001, t = 0
               ; after execution r0 = h'00000002, t = 1

8.2.60 shar (shift arithmetic right): shift instruction

format     abstract        code              cycle  t bit
shar rn    msb → rn → t    0100nnnn00100001  1      lsb

description: arithmetically shifts the contents of general register rn to the right by one bit, and stores the result in rn. the bit that is shifted out of the operand is transferred to the t bit (figure 8-10).
[figure 8-10 shift arithmetic right]

operation:
shar(long n) /* shar rn */
{
    long temp;
    if ((r[n]&0x00000001)==0) t=0;
    else t=1;
    if ((r[n]&0x80000000)==0) temp=0;
    else temp=1;
    r[n]>>=1;
    if (temp==1) r[n]|=0x80000000;
    else r[n]&=0x7fffffff;
    pc+=2;
}

examples:
    shar r0    ; before execution r0 = h'80000001, t = 0
               ; after execution r0 = h'c0000000, t = 1

8.2.61 shld (shift logical dynamically): shift instruction

format        abstract                             code              cycle  t bit
shld rm,rn    rn << rm → rn (rm ≥ 0)               0100nnnnmmmm1101  1
              rn >> rm → [0 → rn] (rm < 0)

description: logically shifts the contents of general register rn. general register rm indicates the shift direction and the number of bits to be shifted. the t bit is the last shifted bit of rn. if the value of the rm register is positive, the shift is to the left; if it is negative, the shift is to the right. if the shift is to the right, 0 is shifted in at the top bit. the number of bits to be shifted is indicated by the five lower bits (bits 4 to 0) of the rm register. if the value is negative (msb = 1), the rm register value is expressed in two's complement. the magnitude of left shift may be 0 to 31, and the magnitude of right shift may be 1 to 32.

[figure 8-11 shift logical dynamically: for rm ≥ 0, rn is shifted left and 0 enters at the lsb; for rm < 0, rn is shifted right and 0 enters at the msb]

operation:
shld(long m,long n) /* shld rm,rn */
{
    long cnt, sgn;
    sgn = r[m]&0x80000000;
    cnt = r[m]&0x0000001f;
    if (sgn==0) r[n]<<=cnt;
    else r[n]=(unsigned long)r[n]>>((~cnt+1)&0x1f); /* shift logical right */
    pc+=2;
}

examples:
    shld r1,r2    ; before execution r1 = h'ffffffec, r2 = h'80180000
                  ; after execution r1 = h'ffffffec, r2 = h'00000801
    shld r3,r4    ; before execution r3 = h'00000014, r4 = h'fffff801
                  ; after execution r3 = h'00000014, r4 = h'80100000

8.2.62 shll (shift logical left): shift instruction

format     abstract      code              cycle  t bit
shll rn    t ← rn ← 0    0100nnnn00000000  1      msb

description: logically shifts the contents of general register rn to the left by one bit, and stores the result in rn.
the bit that is shifted out of the operand is transferred to the t bit (figure 8-12).

[figure 8-12 shift logical left]

operation:
shll(long n) /* shll rn (same as shal) */
{
    if ((r[n]&0x80000000)==0) t=0;
    else t=1;
    r[n]<<=1;
    pc+=2;
}

examples:
    shll r0    ; before execution r0 = h'80000001, t = 0
               ; after execution r0 = h'00000002, t = 1

8.2.63 shlln (shift logical left n bits): shift instruction

format       abstract        code              cycle  t bit
shll2 rn     rn << 2 → rn    0100nnnn00001000  1
shll8 rn     rn << 8 → rn    0100nnnn00011000  1
shll16 rn    rn << 16 → rn   0100nnnn00101000  1

description: logically shifts the contents of general register rn to the left by 2, 8, or 16 bits, and stores the result in rn. bits that are shifted out of the operand are not stored (figure 8-13).

[figure 8-13 shift logical left n bits: shll2, shll8, shll16]

operation:
shll2(long n) /* shll2 rn */
{
    r[n]<<=2;
    pc+=2;
}
shll8(long n) /* shll8 rn */
{
    r[n]<<=8;
    pc+=2;
}
shll16(long n) /* shll16 rn */
{
    r[n]<<=16;
    pc+=2;
}

examples:
    shll2 r0     ; before execution r0 = h'12345678
                 ; after execution r0 = h'48d159e0
    shll8 r0     ; before execution r0 = h'12345678
                 ; after execution r0 = h'34567800
    shll16 r0    ; before execution r0 = h'12345678
                 ; after execution r0 = h'56780000

8.2.64 shlr (shift logical right): shift instruction

format     abstract      code              cycle  t bit
shlr rn    0 → rn → t    0100nnnn00000001  1      lsb

description: logically shifts the contents of general register rn to the right by one bit, and stores the result in rn. the bit that is shifted out of the operand is transferred to the t bit (figure 8-14).

[figure 8-14 shift logical right]

operation:
shlr(long n) /* shlr rn */
{
    if ((r[n]&0x00000001)==0) t=0;
    else t=1;
    r[n]>>=1;
    r[n]&=0x7fffffff;
    pc+=2;
}

examples:
    shlr r0    ; before execution r0 = h'80000001, t = 0
               ; after execution r0 = h'40000000, t = 1

8.2.65 shlrn (shift logical right n bits): shift instruction

format      abstract   code   cycle  t bit
shlr2 rn    rn>>2 →
rn 0100nnnn00001001 1
shlr8 rn     rn>>8 → rn     0100nnnn00011001  1
shlr16 rn    rn>>16 → rn    0100nnnn00101001  1

description: logically shifts the contents of general register rn to the right by 2, 8, or 16 bits, and stores the result in rn. bits that are shifted out of the operand are not stored (figure 8-15).

[figure 8-15 shift logical right n bits: shlr2, shlr8, shlr16]

operation:
shlr2(long n) /* shlr2 rn */
{
    r[n]>>=2;
    r[n]&=0x3fffffff;
    pc+=2;
}
shlr8(long n) /* shlr8 rn */
{
    r[n]>>=8;
    r[n]&=0x00ffffff;
    pc+=2;
}
shlr16(long n) /* shlr16 rn */
{
    r[n]>>=16;
    r[n]&=0x0000ffff;
    pc+=2;
}

examples:
    shlr2 r0     ; before execution r0 = h'12345678
                 ; after execution r0 = h'048d159e
    shlr8 r0     ; before execution r0 = h'12345678
                 ; after execution r0 = h'00123456
    shlr16 r0    ; before execution r0 = h'12345678
                 ; after execution r0 = h'00001234

8.2.66 sleep (sleep): system control instruction (privileged only)

format   abstract   code              cycle  t bit
sleep    sleep      0000000000011011  4

description: sets the cpu into power-down mode. in power-down mode, instruction execution stops, but the cpu module status is maintained, and the cpu waits for an interrupt request. if an interrupt is requested, the cpu exits the power-down mode and begins exception processing. sleep is a privileged instruction and can be used in privileged mode only. if used in user mode, it causes an illegal instruction exception.

note: the number of cycles given is for the transition to sleep mode.

operation:
sleep() /* sleep */
{
    pc-=2;
    error("sleep mode.");
}

examples:
    sleep    ; enters power-down mode

8.2.67 stc (store control register): system control instruction (privileged only)

format            abstract     code              cycle  t bit
stc sr,rn         sr → rn      0000nnnn00000010  1
stc gbr,rn        gbr → rn     0000nnnn00010010  1
stc vbr,rn        vbr → rn     0000nnnn00100010  1
stc ssr,rn        ssr → rn     0000nnnn00110010  1
stc spc,rn        spc → rn     0000nnnn01000010  1
stc mod,rn *1     mod → rn     0000nnnn01010010  1
stc re,rn *1      re → rn      0000nnnn01110010  1
stc rs,rn *1      rs →
rn 0000nnnn01100010 1
stc r0_bank,rn          r0_bank → rn                   0000nnnn10000010  1
stc r1_bank,rn          r1_bank → rn                   0000nnnn10010010  1
stc r2_bank,rn          r2_bank → rn                   0000nnnn10100010  1
stc r3_bank,rn          r3_bank → rn                   0000nnnn10110010  1
stc r4_bank,rn          r4_bank → rn                   0000nnnn11000010  1
stc r5_bank,rn          r5_bank → rn                   0000nnnn11010010  1
stc r6_bank,rn          r6_bank → rn                   0000nnnn11100010  1
stc r7_bank,rn          r7_bank → rn                   0000nnnn11110010  1
stc.l sr,@-rn           rn – 4 → rn, sr → (rn)         0100nnnn00000011  1/2 *2
stc.l gbr,@-rn          rn – 4 → rn, gbr → (rn)        0100nnnn00010011  1/2 *2
stc.l vbr,@-rn          rn – 4 → rn, vbr → (rn)        0100nnnn00100011  1/2 *2
stc.l ssr,@-rn          rn – 4 → rn, ssr → (rn)        0100nnnn00110011  1/2 *2
stc.l spc,@-rn          rn – 4 → rn, spc → (rn)        0100nnnn01000011  1/2 *2
stc.l mod,@-rn *1       rn – 4 → rn, mod → (rn)        0100nnnn01010011  2
stc.l re,@-rn *1        rn – 4 → rn, re → (rn)         0100nnnn01110011  2
stc.l rs,@-rn *1        rn – 4 → rn, rs → (rn)         0100nnnn01100011  2
stc.l r0_bank,@-rn      rn – 4 → rn, r0_bank → (rn)    0100nnnn10000011  2
stc.l r1_bank,@-rn      rn – 4 → rn, r1_bank → (rn)    0100nnnn10010011  2
stc.l r2_bank,@-rn      rn – 4 → rn, r2_bank → (rn)    0100nnnn10100011  2
stc.l r3_bank,@-rn      rn – 4 → rn, r3_bank → (rn)    0100nnnn10110011  2
stc.l r4_bank,@-rn      rn – 4 → rn, r4_bank → (rn)    0100nnnn11000011  2
stc.l r5_bank,@-rn      rn – 4 → rn, r5_bank → (rn)    0100nnnn11010011  2
stc.l r6_bank,@-rn      rn – 4 → rn, r6_bank → (rn)    0100nnnn11100011  2
stc.l r7_bank,@-rn      rn – 4 → rn, r7_bank → (rn)    0100nnnn11110011  2
notes: 1. sh3-dsp only.
       2. two cycles on the sh3-dsp.

description: stores data from control registers sr, gbr, vbr, ssr, spc, mod, re and rs, or r0_bank to r7_bank to a specified location. stc and stc.l, except for stc gbr,rn and stc.l gbr,@-rn, are privileged instructions and can be used in privileged mode only. if used in user mode, they cause illegal instruction exceptions. note that stc gbr,rn and stc.l gbr,@-rn can be used in user mode. the rm_bank operand is designated by the rb bit of the sr register.
when the value of the rb bit is 1, the r0_bank1 to r7_bank1 registers and the r8 to r15 registers are used as the rn operand, and the r0_bank0 to r7_bank0 registers are used as the rm_bank operand. when the value of the rb bit is 0, the r0_bank0 to r7_bank0 registers and the r8 to r15 registers are used as the rn operand, and the r0_bank1 to r7_bank1 registers are used as the rm_bank operand.

operation:
stcsr(long n) /* stc sr,rn */
{
    r[n]=sr;
    pc+=2;
}
stcgbr(long n) /* stc gbr,rn */
{
    r[n]=gbr;
    pc+=2;
}
stcvbr(long n) /* stc vbr,rn */
{
    r[n]=vbr;
    pc+=2;
}
stcssr(long n) /* stc ssr,rn */
{
    r[n]=ssr;
    pc+=2;
}
stcspc(long n) /* stc spc,rn */
{
    r[n]=spc;
    pc+=2;
}
stcrm_bank(long n) /* stc rm_bank,rn */
{   /* m=0-7 */
    r[n]=rm_bank;
    pc+=2;
}
stcmsr(long n) /* stc.l sr,@-rn */
{
    r[n]-=4;
    write_long(r[n],sr);
    pc+=2;
}
stcmgbr(long n) /* stc.l gbr,@-rn */
{
    r[n]-=4;
    write_long(r[n],gbr);
    pc+=2;
}
stcmvbr(long n) /* stc.l vbr,@-rn */
{
    r[n]-=4;
    write_long(r[n],vbr);
    pc+=2;
}
stcmssr(long n) /* stc.l ssr,@-rn */
{
    r[n]-=4;
    write_long(r[n],ssr);
    pc+=2;
}
stcmspc(long n) /* stc.l spc,@-rn */
{
    r[n]-=4;
    write_long(r[n],spc);
    pc+=2;
}
stcmrm(long n) /* stc.l rm_bank,@-rn */
{   /* m=0-7 */
    r[n]-=4;
    write_long(r[n],rm_bank);
    pc+=2;
}
stcmod(long n) /* stc mod,rn */
{
    r[n]=mod;
    pc+=2;
}
stcre(long n) /* stc re,rn */
{
    r[n]=re;
    pc+=2;
}
stcrs(long n) /* stc rs,rn */
{
    r[n]=rs;
    pc+=2;
}
stcmmod(long n) /* stc.l mod,@-rn */
{
    r[n]-=4;
    write_long(r[n],mod);
    pc+=2;
}
stcmre(long n) /* stc.l re,@-rn */
{
    r[n]-=4;
    write_long(r[n],re);
    pc+=2;
}
stcmrs(long n) /* stc.l rs,@-rn */
{
    r[n]-=4;
    write_long(r[n],rs);
    pc+=2;
}

examples:
stc sr,r0          ; before execution r0 = h'ffffffff, sr = h'00000000
                   ; after execution r0 = h'00000000
stc.l gbr,@-r15    ; before execution r15 = h'10000004
                   ; after execution r15 = h'10000000, @r15 = gbr

8.2.68 sts (store system register): system control instruction

format abstract
code cycle t bit
sts mach,rn         mach → rn                   0000nnnn00001010   1
sts macl,rn         macl → rn                   0000nnnn00011010   1
sts pr,rn           pr → rn                     0000nnnn00101010   1
sts dsr,rn *        dsr → rn                    0000nnnn01101010   1
sts a0,rn *         a0 → rn                     0000nnnn01111010   1
sts x0,rn *         x0 → rn                     0000nnnn10001010   1
sts x1,rn *         x1 → rn                     0000nnnn10011010   1
sts y0,rn *         y0 → rn                     0000nnnn10101010   1
sts y1,rn *         y1 → rn                     0000nnnn10111010   1
sts.l mach,@-rn     rn - 4 → rn, mach → (rn)    0100nnnn00000010   1
sts.l macl,@-rn     rn - 4 → rn, macl → (rn)    0100nnnn00010010   1
sts.l pr,@-rn       rn - 4 → rn, pr → (rn)      0100nnnn00100010   1
sts.l dsr,@-rn *    rn - 4 → rn, dsr → (rn)     0100nnnn01100010   1
sts.l a0,@-rn *     rn - 4 → rn, a0 → (rn)      0100nnnn01110010   1
sts.l x0,@-rn *     rn - 4 → rn, x0 → (rn)      0100nnnn10000010   1
sts.l x1,@-rn *     rn - 4 → rn, x1 → (rn)      0100nnnn10010010   1
sts.l y0,@-rn *     rn - 4 → rn, y0 → (rn)      0100nnnn10100010   1
sts.l y1,@-rn *     rn - 4 → rn, y1 → (rn)      0100nnnn10110010   1
note: * sh3-dsp only.

description: stores the contents of system register mach, macl, pr, dsr, a0, x0, x1, y0, or y1 in the specified destination.
note: in the case of system register mach, the 32-bit contents are stored unchanged.
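The pre-decrement forms (sts.l ...,@-rn) behave like a push: rn is decremented by 4 first, and the register is then written at the new address. The following is a minimal C sketch of that ordering only. The small byte-array memory, the big-endian byte order, and the names `mem`, `write_long`, `read_long`, and `sts_l_push` are assumptions made for illustration and are not taken from the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flat memory, for illustration only. */
#define MEM_SIZE 64u
static uint8_t mem[MEM_SIZE];

/* Store a 32-bit value at addr; big-endian order is an assumption here. */
static void write_long(uint32_t addr, uint32_t value)
{
    assert(addr <= MEM_SIZE - 4u);
    mem[addr]      = (uint8_t)(value >> 24);
    mem[addr + 1u] = (uint8_t)(value >> 16);
    mem[addr + 2u] = (uint8_t)(value >> 8);
    mem[addr + 3u] = (uint8_t)value;
}

/* Read back a 32-bit value written by write_long. */
static uint32_t read_long(uint32_t addr)
{
    assert(addr <= MEM_SIZE - 4u);
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1u] << 16) |
           ((uint32_t)mem[addr + 2u] << 8) | (uint32_t)mem[addr + 3u];
}

/* Model of sts.l <sysreg>,@-rn: decrement rn by 4, then store at the new rn. */
static void sts_l_push(uint32_t *rn, uint32_t sysreg)
{
    *rn -= 4u;
    write_long(*rn, sysreg);
}
```

With rn starting at 0x14, one call leaves rn at 0x10 and the stored value readable at 0x10, mirroring the r15 = h'10000004 to h'10000000 pattern in the manual's example.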
operation:
stsmach(long n) /* sts mach,rn */
{
    r[n]=mach;   /* all 32 bits are stored unchanged */
    pc+=2;
}
stsmacl(long n) /* sts macl,rn */
{
    r[n]=macl;
    pc+=2;
}
stspr(long n) /* sts pr,rn */
{
    r[n]=pr;
    pc+=2;
}
stsmmach(long n) /* sts.l mach,@-rn */
{
    r[n]-=4;
    write_long(r[n],mach);   /* all 32 bits are stored unchanged */
    pc+=2;
}
stsmmacl(long n) /* sts.l macl,@-rn */
{
    r[n]-=4;
    write_long(r[n],macl);
    pc+=2;
}
stsmpr(long n) /* sts.l pr,@-rn */
{
    r[n]-=4;
    write_long(r[n],pr);
    pc+=2;
}
stsdsr(long n) /* sts dsr,rn */
{
    r[n]=dsr;
    pc+=2;
}
stsa0(long n) /* sts a0,rn */
{
    r[n]=a0;
    pc+=2;
}
stsx0(long n) /* sts x0,rn */
{
    r[n]=x0;
    pc+=2;
}
stsx1(long n) /* sts x1,rn */
{
    r[n]=x1;
    pc+=2;
}
stsy0(long n) /* sts y0,rn */
{
    r[n]=y0;
    pc+=2;
}
stsy1(long n) /* sts y1,rn */
{
    r[n]=y1;
    pc+=2;
}
stsmdsr(long n) /* sts.l dsr,@-rn */
{
    r[n]-=4;
    write_long(r[n],dsr);
    pc+=2;
}
stsma0(long n) /* sts.l a0,@-rn */
{
    r[n]-=4;
    write_long(r[n],a0);
    pc+=2;
}
stsmx0(long n) /* sts.l x0,@-rn */
{
    r[n]-=4;
    write_long(r[n],x0);
    pc+=2;
}
stsmx1(long n) /* sts.l x1,@-rn */
{
    r[n]-=4;
    write_long(r[n],x1);
    pc+=2;
}
stsmy0(long n) /* sts.l y0,@-rn */
{
    r[n]-=4;
    write_long(r[n],y0);
    pc+=2;
}
stsmy1(long n) /* sts.l y1,@-rn */
{
    r[n]-=4;
    write_long(r[n],y1);
    pc+=2;
}

examples:
sts mach,r0       ; before execution r0 = h'ffffffff, mach = h'00000000
                  ; after execution r0 = h'00000000
sts.l pr,@-r15    ; before execution r15 = h'10000004
                  ; after execution r15 = h'10000000, @r15 = pr

8.2.69 sub (subtract binary): arithmetic instruction

format       abstract        code               cycle   t bit
sub rm,rn    rn - rm → rn    0011nnnnmmmm1000   1

description: subtracts general register rm data from rn data, and stores the result in rn. to subtract immediate data, use add #imm,rn.
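Because there is no sub #imm form, subtracting a constant is done by adding its negative. The sketch below illustrates this in C under two assumptions that are not stated in this section: that add #imm,rn takes an 8-bit immediate, and that the immediate is sign-extended to 32 bits before the addition. The function names `sub` and `add_imm` are hypothetical.

```c
#include <stdint.h>

/* Model of sub rm,rn: 32-bit wraparound subtraction. */
static uint32_t sub(uint32_t rn, uint32_t rm)
{
    return rn - rm;
}

/* Model of add #imm,rn, assuming an 8-bit immediate that is
   sign-extended to 32 bits (int8_t stands in for the encoded field). */
static uint32_t add_imm(uint32_t rn, int8_t imm)
{
    return rn + (uint32_t)(int32_t)imm;
}
```

Under these assumptions, `add_imm(rn, -k)` gives the same result as `sub(rn, k)` for k from 1 to 128; larger constants would need to be loaded into a register first.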
operation:
sub(long m,long n) /* sub rm,rn */
{
    r[n]-=r[m];
    pc+=2;
}

examples:
sub r0,r1    ; before execution r0 = h'00000001, r1 = h'80000000
             ; after execution r1 = h'7fffffff

8.2.70 subc (subtract with carry): arithmetic instruction

format        abstract                        code               cycle   t bit
subc rm,rn    rn - rm - t → rn, borrow → t    0011nnnnmmmm1010   1       borrow

description: subtracts rm data and the t bit value from general register rn data, and stores the result in rn. the t bit changes according to the result. this instruction is used for subtraction of data that has more than 32 bits.

operation:
subc(long m,long n) /* subc rm,rn */
{
    unsigned long tmp0,tmp1;
    tmp1=r[n]-r[m];
    tmp0=r[n];
    r[n]=tmp1-t;
    if (tmp0<tmp1) t=1;
    else t=0;
    if (tmp1<r[n]) t=1;
    pc+=2;
}

8.2.72 swap (swap register halves): data transfer instruction

format          abstract                                                    code               cycle   t bit
swap.b rm,rn    rm → swap upper and lower halves of lower 2 bytes → rn      0110nnnnmmmm1000   1
swap.w rm,rn    rm → swap upper and lower word → rn                         0110nnnnmmmm1001   1

description: swaps the upper and lower halves of the general register rm data, and stores the result in rn. if a byte is specified, bits 0 to 7 of rm are swapped for bits 8 to 15. the upper 16 bits of rm are transferred to the upper 16 bits of rn. if a word is specified, bits 0 to 15 of rm are swapped for bits 16 to 31.

operation:
swapb(long m,long n) /* swap.b rm,rn */
{
    unsigned long temp0,temp1;
    temp0=r[m]&0xffff0000;
    temp1=(r[m]&0x000000ff)<<8;
    r[n]=(r[m]&0x0000ff00)>>8;
    r[n]=r[n]|temp1|temp0;
    pc+=2;
}
swapw(long m,long n) /* swap.w rm,rn */
{
    unsigned long temp;
    temp=(r[m]>>16)&0x0000ffff;
    r[n]=r[m]<<16;
    r[n]|=temp;
    pc+=2;
}

examples:
swap.b r0,r1    ; before execution r0 = h'12345678
                ; after execution r1 = h'12347856
swap.w r0,r1    ; before execution r0 = h'12345678
                ; after execution r1 = h'56781234

8.2.73 tas (test and set): logic operation instruction

format       abstract                                    code               cycle    t bit
tas.b @rn    when (rn) is 0, 1 → t, 1 → msb of (rn)      0100nnnn00011011   3/4 *    test results
note: * four cycles on the sh3-dsp.
description: reads byte data from the address specified by general register rn, and sets the t bit to 1 if the data is 0, or clears the t bit to 0 if the data is not 0. then, data bit 7 is set to 1, and the data is written to the address specified by rn. during this operation, the bus is not released.
note: the destination of the tas instruction should be placed in a non-cacheable space when the cache is enabled.

operation:
tas(long n) /* tas.b @rn */
{
    long temp;
    temp=(long)read_byte(r[n]);   /* bus lock enable */
    if (temp==0) t=1;
    else t=0;
    temp|=0x00000080;
    write_byte(r[n],temp);        /* bus lock disable */
    pc+=2;
}

example:
_loop  tas.b @r7    ; r7 = 1000
       bf _loop     ; loops until data in address 1000 is 0

8.2.74 trapa (trap always): system control instruction

format        abstract                                                     code               cycle    t bit
trapa #imm    imm → tra, pc → spc, sr → ssr, 1 → sr.md/bl/rb,              11000011iiiiiiii   6/8 *
              0x160 → expevt, vbr + h'00000100 → pc
note: * eight cycles on the sh3-dsp.

description: starts the trap exception processing. the pc and sr values are saved in spc and ssr. eight-bit immediate data is stored in the tra register (tra9 to tra2). the processor goes into privileged mode (sr.md = 1) with sr.bl = 1 and sr.rb = 1, that is, blocking exceptions and masking interrupts, and selecting bank1 registers (r0_bank1 to r7_bank1). exception code 0x160 is stored in the expevt register (expevt11 to expevt0). the program branches to the address vbr + h'00000100. trapa and rte are used together for system calls.
note: if this instruction is located in a delay slot immediately following a delayed branch instruction, it is acknowledged as an illegal slot instruction.

operation:
trapa(long i) /* trapa #imm */
{
    long imm;
    imm=(0x000000ff & i);
    tra=imm<<2;
    ssr=sr;
    spc=pc;
    sr.md=1;
    sr.bl=1;
    sr.rb=1;
    expevt=0x00000160;
    pc=vbr+0x00000100;
}

8.2.75 tst (test logical): logic operation instruction

format abstract code cycle t bit
tst rm,rn rn & rm, when result is 0, 1 →
t 0010nnnnmmmm1000 1 test results
tst #imm,r0              r0 & imm, when result is 0, 1 → t            11001000iiiiiiii   1   test results
tst.b #imm,@(r0,gbr)     (r0 + gbr) & imm, when result is 0, 1 → t    11001100iiiiiiii   3   test results

description: logically ands the contents of general registers rn and rm, and sets the t bit to 1 if the result is 0 or clears the t bit to 0 if the result is not 0. the rn data does not change. the contents of general register r0 can also be anded with zero-extended 8-bit immediate data, or the contents of 8-bit memory accessed by indirect indexed gbr addressing can be anded with 8-bit immediate data. the r0 and memory data do not change.

operation:
tst(long m,long n) /* tst rm,rn */
{
    if ((r[n]&r[m])==0) t=1;
    else t=0;
    pc+=2;
}
tsti(long i) /* tst #imm,r0 */
{
    long temp;
    temp=r[0]&(0x000000ff & (long)i);
    if (temp==0) t=1;
    else t=0;
    pc+=2;
}
tstm(long i) /* tst.b #imm,@(r0,gbr) */
{
    long temp;
    temp=(long)read_byte(gbr+r[0]);
    temp&=(0x000000ff & (long)i);
    if (temp==0) t=1;
    else t=0;
    pc+=2;
}

examples:
tst r0,r0                 ; before execution r0 = h'00000000
                          ; after execution t = 1
tst #h'80,r0              ; before execution r0 = h'ffffff7f
                          ; after execution t = 1
tst.b #h'a5,@(r0,gbr)     ; before execution @(r0,gbr) = h'a5
                          ; after execution t = 0

8.2.76 xor (exclusive or logical): logic operation instruction

format                   abstract                         code               cycle   t bit
xor rm,rn                rn ^ rm → rn                     0010nnnnmmmm1010   1
xor #imm,r0              r0 ^ imm → r0                    11001010iiiiiiii   1
xor.b #imm,@(r0,gbr)     (r0 + gbr) ^ imm → (r0 + gbr)    11001110iiiiiiii   3

description: exclusive ors the contents of general registers rn and rm, and stores the result in rn. the contents of general register r0 can also be exclusive ored with zero-extended 8-bit immediate data, or 8-bit memory accessed by indirect indexed gbr addressing can be exclusive ored with 8-bit immediate data.
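Because the 8-bit immediate is zero-extended, the immediate forms of tst and xor can only test or toggle bits 0 to 7 of r0; bits 8 to 31 are never affected by the immediate. A minimal C sketch of that behavior follows. The function names `tst_imm` and `xor_imm` are hypothetical, and `uint8_t` stands in for the 8-bit immediate field.

```c
#include <stdint.h>

/* Model of tst #imm,r0: returns the new t bit (1 when r0 & imm is 0). */
static int tst_imm(uint32_t r0, uint8_t imm)
{
    return (r0 & (uint32_t)imm) == 0u;   /* imm is zero-extended */
}

/* Model of xor #imm,r0: returns the new r0 value. */
static uint32_t xor_imm(uint32_t r0, uint8_t imm)
{
    return r0 ^ (uint32_t)imm;           /* only bits 0 to 7 can change */
}
```

These two functions reproduce the manual's own examples: tst #h'80 against r0 = h'ffffff7f yields t = 1, and xor #h'f0 against r0 = h'ffffffff yields h'ffffff0f.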
operation:
xor(long m,long n) /* xor rm,rn */
{
    r[n]^=r[m];
    pc+=2;
}
xori(long i) /* xor #imm,r0 */
{
    r[0]^=(0x000000ff & (long)i);
    pc+=2;
}
xorm(long i) /* xor.b #imm,@(r0,gbr) */
{
    long temp;
    temp=(long)read_byte(gbr+r[0]);
    temp^=(0x000000ff & (long)i);
    write_byte(gbr+r[0],temp);
    pc+=2;
}

examples:
xor r0,r1                 ; before execution r0 = h'aaaaaaaa, r1 = h'55555555
                          ; after execution r1 = h'ffffffff
xor #h'f0,r0              ; before execution r0 = h'ffffffff
                          ; after execution r0 = h'ffffff0f
xor.b #h'a5,@(r0,gbr)     ; before execution @(r0,gbr) = h'a5
                          ; after execution @(r0,gbr) = h'00

8.2.77 xtrct (extract): data transfer instruction

format         abstract                         code               cycle   t bit
xtrct rm,rn    center 32 bits of rm:rn → rn     0010nnnnmmmm1101   1

description: extracts the middle 32 bits from the 64 bits formed by general registers rm (upper) and rn (lower), and stores the 32 bits in rn (figure 8-16).

figure 8-16 extract

operation:
xtrct(long m,long n) /* xtrct rm,rn */
{
    unsigned long temp;
    temp=(r[m]<<16)&0xffff0000;
    r[n]=(r[n]>>16)&0x0000ffff;
    r[n]|=temp;
    pc+=2;
}

example:
xtrct r0,r1    ; before execution r0 = h'01234567, r1 = h'89abcdef
               ; after execution r1 = h'456789ab

8.3 floating point instructions and fpu related cpu instructions (sh-3e only)

the functions used in the descriptions of the operation of fpu calculations are as follows.
long fpscr;
int t;
int load_long(long *address, long *data)
{
    /* this function is defined in cpu part */
}
int store_long(long *address, long *data)
{
    /* this function is defined in cpu part */
}
int sign_of(long *src)
{
    return((*src >> 31) & 1);
}
int data_type_of(long *src)
{
    long abs;
    abs = *src & 0x7fffffff;
    if(abs < 0x00800000) {
        if(sign_of(src) == 0) return(pzero);
        else return(nzero);
    }
    else if((0x00800000 <= abs) && (abs < 0x7f800000)) return(norm);
    else if(0x7f800000 == abs) {
        if(sign_of(src) == 0) return(pinf);
        else return(ninf);
    }
    else if(0x00400000 & abs) return(snan);
    else return(qnan);
}
clear_cause_vz()
{
    fpscr &= (~cause_v & ~cause_z);
}
set_v()
{
    fpscr |= (cause_v | flag_v);
}
set_z()
{
    fpscr |= (cause_z | flag_z);
}
invalid(float *dest)
{
    set_v();
    if((fpscr & enable_v) == 0) qnan(dest);
}
dz(float *dest, int sign)
{
    set_z();
    if((fpscr & enable_z) == 0) inf(dest,sign);
}
zero(float *dest, int sign)
{
    if(sign == 0) *dest = 0x00000000;
    else *dest = 0x80000000;
}
inf(float *dest, int sign)
{
    if(sign == 0) *dest = 0x7f800000;
    else *dest = 0xff800000;
}
qnan(float *dest)
{
    *dest = 0x7fbfffff;
}

8.3.1 fabs (floating point absolute value): floating point instruction

format      abstract       code               latency   cycles   t bit
fabs frn    |frn| → frn    1111nnnn01011101   2         1

description: obtains the arithmetic absolute value (as a floating point number) of the contents of floating point register frn. the calculation result is stored in frn.

operation:
fabs(float *frn) /* fabs frn */
{
    clear_cause_vz();
    case(data_type_of(frn)) {
    norm : if(sign_of(frn) == 0) *frn = *frn;
           else *frn = -*frn;
           break;
    pzero :
    nzero : zero(frn,0); break;
    pinf :
    ninf : inf(frn,0); break;
    qnan : qnan(frn); break;
    snan : invalid(frn); break;
    }
    pc += 2;
}

fabs special cases
frn          norm   +0   -0   +inf   -inf   qnan   snan
fabs(frn)    abs    +0   +0   +inf   +inf   qnan   invalid
note: non-normalized values are treated as zero.
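The classification in data_type_of and the special-case row for fabs can be checked directly on 32-bit patterns. The sketch below is a self-contained C restatement under stated assumptions: it works on raw `uint32_t` bit patterns instead of register pointers, it treats the invalid-operation exception as disabled (so the snan case simply returns the qnan pattern h'7fbfffff), and the names `fp_type` and `fabs_bits` are hypothetical. The qnan/snan test on bit 22 follows the listing above.

```c
#include <stdint.h>

typedef enum { PZERO, NZERO, NORM, PINF, NINF, QNAN, SNAN } fp_type;

/* Classify a single-precision bit pattern the way the listing does. */
static fp_type data_type_of(uint32_t bits)
{
    uint32_t abs = bits & 0x7fffffffu;
    int negative = (int)(bits >> 31);

    if (abs < 0x00800000u)           /* zero or non-normalized: treated as zero */
        return negative ? NZERO : PZERO;
    if (abs < 0x7f800000u)
        return NORM;
    if (abs == 0x7f800000u)
        return negative ? NINF : PINF;
    return (abs & 0x00400000u) ? SNAN : QNAN;   /* bit 22 set means snan here */
}

/* fabs on a bit pattern; exception signalling is not modeled. */
static uint32_t fabs_bits(uint32_t bits)
{
    switch (data_type_of(bits)) {
    case NORM:  return bits & 0x7fffffffu;   /* clear the sign bit */
    case PZERO:
    case NZERO: return 0x00000000u;          /* +0, also for non-normalized input */
    case PINF:
    case NINF:  return 0x7f800000u;          /* +inf */
    default:    return 0x7fbfffffu;          /* qnan result (invalid disabled) */
    }
}
```

One consequence worth noting is that a non-normalized input such as h'80000001 produces +0 and not its magnitude, because it is classified as zero before the sign is cleared.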
exceptions: invalid operation

examples:
fabs fr2    ; floating point absolute value
            ; before execution fr2=h'c0800000 /* -4 in base 10 */
            ; after execution fr2=h'40800000 /* 4 in base 10 */

8.3.2 fadd (floating point add): floating point instruction

format          abstract         code               latency   cycles   t bit
fadd frm,frn    frn+frm → frn    1111nnnnmmmm0000   2         1

description: arithmetically adds (as floating point numbers) the contents of floating point registers frm and frn. the calculation result is stored in frn.

operation:
fadd(float *frm,float *frn) /* fadd frm,frn */
{
    clear_cause_vz();
    if((data_type_of(frm) == snan) || (data_type_of(frn) == snan)) invalid(frn);
    else if((data_type_of(frm) == qnan) || (data_type_of(frn) == qnan)) qnan(frn);
    else case(data_type_of(frm)) {
    norm : case(data_type_of(frn)) {
        pinf : inf(frn,0); break;
        ninf : inf(frn,1); break;
        default : *frn = *frn + *frm; break;
        }
        break;
    pzero : case(data_type_of(frn)) {
        norm : *frn = *frn + *frm; break;
        pzero :
        nzero : zero(frn,0); break;
        pinf : inf(frn,0); break;
        ninf : inf(frn,1); break;
        }
        break;
    nzero : case(data_type_of(frn)) {
        norm : *frn = *frn + *frm; break;
        pzero : zero(frn,0); break;
        nzero : zero(frn,1); break;
        pinf : inf(frn,0); break;
        ninf : inf(frn,1); break;
        }
        break;
    pinf : case(data_type_of(frn)) {
        ninf : invalid(frn); break;
        default : inf(frn,0); break;
        }
        break;
    ninf : case(data_type_of(frn)) {
        pinf : invalid(frn); break;
        default : inf(frn,1); break;
        }
        break;
    }
    pc += 2;
}

fadd special cases (rows: frm, columns: frn)
frm \ frn   norm      +0        -0        +inf      -inf      qnan      snan
norm        add       add       add       +inf      -inf      qnan      invalid
+0          add       +0        +0        +inf      -inf      qnan      invalid
-0          add       +0        -0        +inf      -inf      qnan      invalid
+inf        +inf      +inf      +inf      +inf      invalid   qnan      invalid
-inf        -inf      -inf      -inf      invalid   -inf      qnan      invalid
qnan        qnan      qnan      qnan      qnan      qnan      qnan      invalid
snan        invalid   invalid   invalid   invalid   invalid   invalid   invalid
note: non-normalized values are treated as zero.
exceptions: invalid operation

examples:
fadd fr2,fr3    ; floating point add
                ; before execution: fr2=h'40400000 /* 3 in base 10 */
                ;                   fr3=h'3f800000 /* 1 in base 10 */
                ; after execution:  fr2=h'40400000
                ;                   fr3=h'40800000 /* 4 in base 10 */
fadd fr5,fr4    ;
                ; before execution: fr5=h'40400000 /* 3 in base 10 */
                ;                   fr4=h'c0000000 /* -2 in base 10 */
                ; after execution:  fr5=h'40400000
                ;                   fr4=h'3f800000 /* 1 in base 10 */

8.3.3 fcmp (floating point compare): floating point instruction

format             abstract                 code               latency   cycles   t bit
fcmp/eq frm,frn    (frn==frm)? 1:0 → t      1111nnnnmmmm0100   2         1        comparison result
fcmp/gt frm,frn    (frn>frm)? 1:0 → t       1111nnnnmmmm0101   2         1        comparison result

description: arithmetically compares (as floating point numbers) the contents of floating point registers frm and frn. the comparison result (true/false) is written to the t bit.

operation:
fcmp_eq(float *frm,float *frn) /* fcmp/eq frm,frn */
{
    clear_cause_vz();
    if(fcmp_chk(frm,frn) == invalid) { fcmp_invalid(0); }
    else if(fcmp_chk(frm,frn) == eq) t = 1;
    else t = 0;
    pc += 2;
}
fcmp_gt(float *frm,float *frn) /* fcmp/gt frm,frn */
{
    clear_cause_vz();
    if((fcmp_chk(frm,frn) == invalid) || (fcmp_chk(frm,frn) == uo)) { fcmp_invalid(0); }
    else if(fcmp_chk(frm,frn) == gt) t = 1;
    else t = 0;
    pc += 2;
}
fcmp_chk(float *frm,float *frn)
{
    if((data_type_of(frm) == snan) || (data_type_of(frn) == snan)) return(invalid);
    else if((data_type_of(frm) == qnan) || (data_type_of(frn) == qnan)) return(uo);
    else case(data_type_of(frm)) {
    norm : case(data_type_of(frn)) {
        pinf : return(gt); break;
        ninf : return(notgt); break;
        default : break;
        }
        break;
    pzero :
    nzero : case(data_type_of(frn)) {
        pzero :
        nzero : return(eq); break;
        pinf : return(gt); break;
        ninf : return(notgt); break;
        default : break;
        }
        break;
    pinf : case(data_type_of(frn)) {
        pinf : return(eq); break;
        default : return(notgt); break;
        }
        break;
    ninf : case(data_type_of(frn)) {
        ninf : return(eq); break;
        default : return(gt); break;
        }
        break;
    }
    if(*frn == *frm) return(eq);
    else if(*frn > *frm) return(gt);
else return(notgt); }
fcmp_invalid(int cmp_flag)
{
    set_v();
    if((fpscr & enable_v) == 0) t = cmp_flag;
}

fcmp special cases (rows: frm, columns: frn)
frm \ frn   norm      +0        -0        +inf      -inf      qnan      snan
norm        cmp       cmp       cmp       gt        !gt       uo        invalid
+0          cmp       eq        eq        gt        !gt       uo        invalid
-0          cmp       eq        eq        gt        !gt       uo        invalid
+inf        !gt       !gt       !gt       eq        !gt       uo        invalid
-inf        gt        gt        gt        gt        eq        uo        invalid
qnan        uo        uo        uo        uo        uo        uo        invalid
snan        invalid   invalid   invalid   invalid   invalid   invalid   invalid
notes: 1. for uo entries: uo if the instruction is fcmp/eq, invalid if the instruction is fcmp/gt.
       2. non-normalized values are treated as zero.

exceptions: invalid operation

note: four comparison operations that are independent of each other are defined in the ieee standard, but the sh-3e supports fcmp/eq and fcmp/gt only. however, all comparison conditions can be supported by using these two fcmp instructions in combination with the bt and bf instructions.
(frm == frn)       fcmp/eq frm,frn ; bt
(frm != frn)       fcmp/eq frm,frn ; bf
(frm < frn)        fcmp/gt frm,frn ; bt
(frm <= frn)       fcmp/gt frn,frm ; bf
(frm > frn)        fcmp/gt frn,frm ; bt
(frm >= frn)       fcmp/gt frm,frn ; bf
unorder frm,frn    fcmp/eq frm,frm ; bf

examples:
fcmp/eq:
         fldi1 fr6          ; fr6=h'3f800000 /* 1 in base 10 */
         fldi1 fr7          ; fr7=h'3f800000
         clrt               ; t bit = 0
         fcmp/eq fr6,fr7    ; floating point compare, equal
         bf trget_f         ; don't branch (t=1)
         nop
         bt/s trget_t       ; branch
         fadd fr6,fr7       ; delay slot, fr7=h'40000000 /* 2 in base 10 */
         nop
trget_f  fcmp/eq fr6,fr7
         bt/s trget_t       ; don't branch (t=0)
         fldi1 fr7          ; delay slot
trget_t  fcmp/eq fr6,fr7    ; t bit = 0
         bf trget_f         ; branch first time only
         nop                ; fr6=fr7=h'3f800000 /* 1 in base 10 */
         .end
fcmp/gt:
         fldi1 fr2          ; fr2=h'3f800000 /* 1 in base 10 */
         fldi1 fr7
         fadd fr2,fr7       ; fr7=h'40000000 /* 2 in base 10 */
         clrt               ; t bit = 0
         fcmp/gt fr2,fr7    ; floating point compare, greater than
         bt/s trget_t       ; branch (t=1)
         fldi1 fr7
trget_t  fcmp/gt fr2,fr7    ; t bit = 0
         bt trget_t         ; don't branch (t=0)
         .end

8.3.4 fdiv (floating point divide): floating point instruction

format abstract code latency cycles t bit
fdiv frm, frn frn/frm →
frn 1111nnnnmmmm0011 14 13

description: arithmetically divides (as floating point numbers) the contents of floating point register frn by the contents of floating point register frm. the calculation result is stored in frn.

operation:
fdiv(float *frm,float *frn) /* fdiv frm,frn */
{
    clear_cause_vz();
    if((data_type_of(frm) == snan) || (data_type_of(frn) == snan)) invalid(frn);
    else if((data_type_of(frm) == qnan) || (data_type_of(frn) == qnan)) qnan(frn);
    else case(data_type_of(frm)) {
    norm : case(data_type_of(frn)) {
        pinf :
        ninf : inf(frn,sign_of(frm)^sign_of(frn)); break;
        default : *frn = *frn / *frm; break;
        }
        break;
    pzero :
    nzero : case(data_type_of(frn)) {
        pzero :
        nzero : invalid(frn); break;
        pinf :
        ninf : inf(frn,sign_of(frm)^sign_of(frn)); break;
        default : dz(frn,sign_of(frm)^sign_of(frn)); break;
        }
        break;
    pinf :
    ninf : case(data_type_of(frn)) {
        pinf :
        ninf : invalid(frn); break;
        default : zero(frn,sign_of(frm)^sign_of(frn)); break;
        }
        break;
    }
    pc += 2;
}

fdiv special cases (rows: frm, columns: frn)
frm \ frn   norm      +0        -0        +inf      -inf      qnan      snan
norm        div       0         0         inf       inf       qnan      invalid
+0          dz        invalid   invalid   +inf      -inf      qnan      invalid
-0          dz        invalid   invalid   -inf      +inf      qnan      invalid
+inf        0         +0        -0        invalid   invalid   qnan      invalid
-inf        0         -0        +0        invalid   invalid   qnan      invalid
qnan        qnan      qnan      qnan      qnan      qnan      qnan      invalid
snan        invalid   invalid   invalid   invalid   invalid   invalid   invalid
the sign of each 0, inf, and dz result is the exclusive or of the signs of frm and frn.
note: non-normalized values are treated as zero.

exceptions: invalid operation, divide by zero

examples:
fdiv fr6,fr5    ; floating point divide
                ; before execution: fr5=h'40800000 /* 4 in base 10 */
                ;                   fr6=h'40400000 /* 3 in base 10 */
                ; after execution:  fr5=h'3faaaaaa /* 1.33... in base 10 */
                ;                   fr6=h'40400000

8.3.5 fldi0 (floating point load immediate 0): floating point instruction

format       abstract            code               latency   cycles   t bit
fldi0 frn    h'00000000 → frn    1111nnnn10001101   2         1

description: loads the floating point number 0 (0x00000000) in floating point register frn.
operation:
fldi0(float *frn) /* fldi0 frn */
{
    *frn = 0x00000000;
    pc += 2;
}

exceptions: none

examples:
fldi0 fr1    ; load immediate 0
             ; before execution: fr1=x (don't care)
             ; after execution: fr1=h'00000000

8.3.6 fldi1 (floating point load immediate 1): floating point instruction

format       abstract            code               latency   cycles   t bit
fldi1 frn    h'3f800000 → frn    1111nnnn10011101   2         1

description: loads the floating point number 1 (0x3f800000) in floating point register frn.

operation:
fldi1(float *frn) /* fldi1 frn */
{
    *frn = 0x3f800000;
    pc += 2;
}

exceptions: none

examples:
fldi1 fr2    ; load immediate 1
             ; before execution: fr2=x (don't care)
             ; after execution: fr2=h'3f800000 /* 1 in base 10 */

8.3.7 flds (floating point load to system register): floating point instruction

format           abstract      code               latency   cycles   t bit
flds frm,fpul    frm → fpul    1111nnnn00011101   2         1

description: loads the contents of floating point register frm to system register fpul.

operation:
flds(float *frm,float *fpul) /* flds frm,fpul */
{
    *fpul = *frm;
    pc += 2;
}

exceptions: none

examples:
                  ; before execution of flds and fsts:
fldi1 fr6         ; fr6=h'3f800000 /* 1 in base 10 */
fldi0 fr2         ; fr2=0
                  ; after execution of flds and fsts:
flds fr6,fpul     ; fpul=h'3f800000
fsts fpul,fr2     ; fr2=h'3f800000

8.3.8 float (floating point convert from integer): floating point instruction

format            abstract             code               latency   cycles   t bit
float fpul,frn    (float)fpul → frn    1111nnnn00101101   2         1

description: interprets the contents of fpul as an integer value and converts it into a floating point number. the result is stored in floating point register frn.
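The integer-to-float conversion can be reproduced on a host machine to check expected register images. The sketch below assumes the host `float` is IEEE 754 single precision and that the input magnitude is below 2^24, where the conversion is exact and the rounding mode does not matter; it makes no claim about how the sh-3e rounds larger values. The name `float_from_fpul` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert a signed 32-bit integer (the fpul contents) to the bit image of
   the corresponding single-precision value, as float fpul,frn would leave
   it in frn. Assumes the host float is IEEE 754 single precision. */
static uint32_t float_from_fpul(int32_t fpul)
{
    float f = (float)fpul;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret without aliasing issues */
    return bits;
}
```

For the manual's example, an fpul value of 3 gives h'40400000.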
operation:
float(int *fpul,float *frn) /* float fpul,frn */
{
    clear_cause_vz();
    *frn = (float)*fpul;
    pc += 2;
}

exceptions: none

examples:
                          ; floating point convert from integer
                          ; before execution of float instruction:
mov.l #h'00000003,r1      ; r1=h'00000003
fldi0 fr2                 ; fr2=0
                          ; after execution of float instruction:
lds r1,fpul               ; fpul=h'00000003
float fpul,fr2            ; fr2=h'40400000 /* 3 in base 10 */

8.3.9 fmac (floating point multiply accumulate): floating point instruction

format              abstract               code               latency   cycles   t bit
fmac fr0,frm,frn    fr0*frm+frn → frn      1111nnnnmmmm1110   2         1

description: arithmetically multiplies (as floating point numbers) the contents of floating point registers fr0 and frm. the contents of floating point register frn are added to this product, and the result is stored in frn.

operation:
fmac(float *fr0,float *frm,float *frn) /* fmac fr0,frm,frn */
{
    long tmp_fpscr;
    float tmp_fmul = *frm;
    fmul(fr0,&tmp_fmul);
    pc -= 2;              /* correct pc */
    tmp_fpscr = fpscr;    /* save cause field for fr0*frm */
    fadd(&tmp_fmul,frn);
    fpscr |= tmp_fpscr;   /* reflect cause field for fr0*frm */
}

fmac special cases: as in the operation above, the result for each combination of fr0, frm, and frn is obtained by applying the fmul special cases to fr0 and frm, and then applying the fadd special cases to that product and frn.
note: non-normalized values are treated as zero.
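The operation listing builds fmac from a multiply followed by a separate add, so the product is rounded before frn is added. A minimal C sketch of that two-step form is shown below, assuming the host `float` is IEEE 754 single precision. The names `fmac_model` and `bits_of` are hypothetical, and special cases and fpscr updates are not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Two-step multiply-accumulate: round the product to single precision,
   then add. The volatile store keeps the compiler from fusing the two
   operations or carrying extra precision into the add. */
static float fmac_model(float fr0, float frm, float frn)
{
    volatile float product = fr0 * frm;
    return product + frn;
}

/* Bit image of a float, for comparison with register values such as h'41100000. */
static uint32_t bits_of(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

With small integer operands the results are exact, so 2*4+1 gives h'41100000 (9), 2*2+1 gives h'40a00000 (5), and 2*5+2 gives h'41400000 (12).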
exceptions: invalid operation

examples:
fmac fr0,fr3,fr5    ; floating point multiply accumulate fr0*fr3+fr5->fr5
                    ; before execution: fr0=h'40000000 /* 2 in base 10 */
                    ;                   fr3=h'40800000 /* 4 in base 10 */
                    ;                   fr5=h'3f800000 /* 1 in base 10 */
                    ; after execution:  fr0=h'40000000 /* 2 in base 10 */
                    ;                   fr3=h'40800000 /* 4 in base 10 */
                    ;                   fr5=h'41100000 /* 9 in base 10 */
fmac fr0,fr0,fr5    ; fr0*fr0+fr5->fr5
                    ; before execution: fr0=h'40000000 /* 2 in base 10 */
                    ;                   fr5=h'3f800000 /* 1 in base 10 */
                    ; after execution:  fr0=h'40000000 /* 2 in base 10 */
                    ;                   fr5=h'40a00000 /* 5 in base 10 */
fmac fr0,fr5,fr0    ; fr0*fr5+fr0->fr0
                    ; before execution: fr0=h'40000000 /* 2 in base 10 */
                    ;                   fr5=h'40a00000 /* 5 in base 10 */
                    ; after execution:  fr0=h'41400000 /* 12 in base 10 */
                    ;                   fr5=h'40a00000 /* 5 in base 10 */

8.3.10 fmov (floating point move): floating point instruction

format                      abstract                code               latency (wait time)   cycles   t bit
1. fmov frm,frn             frm → frn               1111nnnnmmmm1100   2                     1
2. fmov.s @rm,frn           (rm) → frn              1111nnnnmmmm1000   2                     1
3. fmov.s frm,@rn           frm → (rn)              1111nnnnmmmm1010   2                     1
4. fmov.s @rm+,frn          (rm) → frn, rm+=4       1111nnnnmmmm1001   2                     1
5. fmov.s frm,@-rn          rn-=4, frm → (rn)       1111nnnnmmmm1011   2                     1
6. fmov.s @(r0,rm),frn      (r0+rm) → frn           1111nnnnmmmm0110   2                     1
7. fmov.s frm,@(r0,rn)      frm → (r0+rn)           1111nnnnmmmm0111   2                     1

description:
1. moves the contents of floating point register frm to floating point register frn.
2. loads the contents of the memory address specified by general-use register rm to floating point register frn.
3. stores the contents of floating point register frm in the memory address position specified by general-use register rn.
4. loads the contents of the memory address specified by general-use register rm to floating point register frn. after the load completes successfully, increments the value of rm by 4.
5. stores the contents of floating point register frm in the memory address position specified by general-use register rn-4.
after the store completes successfully, the decremented value (rn-4) becomes the value of rn.
6. loads the contents of the memory address specified by general-use registers rm and r0 to floating point register frn.
7. stores the contents of floating point register frm in the memory address position specified by general-use registers rn and r0.

operation:
fmov(float *frm,float *frn) /* fmov frm,frn */
{
    *frn = *frm;
    pc += 2;
}
fmov_load(long *rm,float *frn) /* fmov.s @rm,frn */
{
    if(load_long(rm,frn) != address_error) load_long(rm,frn);
    pc += 2;
}
fmov_store(float *frm,long *rn) /* fmov.s frm,@rn */
{
    if(store_long(frm,rn) != address_error) store_long(frm,rn);
    pc += 2;
}
fmov_restore(long *rm,float *frn) /* fmov.s @rm+,frn */
{
    if(load_long(rm,frn) != address_error) *rm += 4;
    pc += 2;
}
fmov_save(float *frm,long *rn) /* fmov.s frm,@-rn */
{
    long tmp_address = *rn - 4;
    if(store_long(frm,&tmp_address) != address_error) *rn = tmp_address;
    pc += 2;
}
fmov_load_index(long *rm,long *r0,float *frn) /* fmov.s @(r0,rm),frn */
{
    if(load_long(&(*rm+*r0),frn) != address_error);
    pc += 2;
}
fmov_store_index(float *frm,long *r0,long *rn) /* fmov.s frm,@(r0,rn) */
{ if (store_long(frm,&(*rn+*r0)) !
= address_error); pc += 2; } exceptions: address error examples: fmov.s @r1, fr2 ; load ; before execution: @r1=h'00abcdef ; fr2=0 ; after execution: @r1=h'00abcdef ; fr2=h'00abcdef fmov.s fr2, @r3 ; store ; before execution: @r3=0 ; fr2=h'40800000 ; after execution: @r3=h'40800000 ; fr2=h'40800000 fmov.s @r3+,fr3 ; restore ; before execution: r3=h'0c700028 ; @r3=h'40800000 ; fr3=0 ; after execution: r3=h'0c70002c ; fr3=h'40800000 fmov.s fr4, @-r3 ; save ; before execution: r3=h'0c700044 ; @r3=0 ; fr4=h'01234567 ; after execution: r3=h'0c700040 ; @r3=h'01234567 ; fr4=h'01234567 fmov.s @(r0, r3), fr4 ; load with index ; before execution: r0=h'00000004 ; r3=h'0c700040 297 ; @h'0c700044=h'00abcdef ; fr=4 ; after execution: r0=h'00000004 ; r3=h'0c700040 ; fr4=h'00abcdef fmov.s fr5, @(r0,r3) ; store with index ; before execution: r0=h'00000028 ; r3=h'0c700040 ; @h'0c700068=0 ; fr5=h'76543210 ; after execution: r0=h'00000028 ; r3=h'0c700040 ; @h'0c700068=h'76543210 fmov.s fr5, fr6 ; register file contents ; before execution: fr5=h'76543210 ; fr6=x(don't care) ; after execution: fr5=h'76543210 ; fr6=h'76543210 298 8.3.11 fmul (floating point multiply): floating point instruction format abstract code latency cycles t bit fmul frm,frn frn frm ? frn 1111nnnnmmmm0010 21 description: arithmetically multiplies (as floating point numbers) the contents of floating point registers frm and frn. the calculation result is stored in frn. 
operation:
fmul(float *frm,float *frn) /* fmul frm,frn */
{
    clear_cause_vz();
    if((data_type_of(frm) == snan) || (data_type_of(frn) == snan)) invalid(frn);
    else if((data_type_of(frm) == qnan) || (data_type_of(frn) == qnan)) qnan(frn);
    else case(data_type_of(frm)) {
    norm : case(data_type_of(frn)) {
        pinf :
        ninf : inf(frn,sign_of(frm)^sign_of(frn)); break;
        default : *frn = (*frn)*(*frm); break;
        }
        break;
    pzero :
    nzero : case(data_type_of(frn)) {
        pinf :
        ninf : invalid(frn); break;
        default : zero(frn,sign_of(frm)^sign_of(frn)); break;
        }
        break;
    pinf :
    ninf : case(data_type_of(frn)) {
        pzero :
        nzero : invalid(frn); break;
        default : inf(frn,sign_of(frm)^sign_of(frn)); break;
        }
        break;
    }
    pc += 2;
}

fmul special cases (rows: frm, columns: frn)
frm \ frn   norm      +0        -0        +inf      -inf      qnan      snan
norm        mul       0         0         inf       inf       qnan      invalid
+0          0         +0        -0        invalid   invalid   qnan      invalid
-0          0         -0        +0        invalid   invalid   qnan      invalid
+inf        inf       invalid   invalid   +inf      -inf      qnan      invalid
-inf        inf       invalid   invalid   -inf      +inf      qnan      invalid
qnan        qnan      qnan      qnan      qnan      qnan      qnan      invalid
snan        invalid   invalid   invalid   invalid   invalid   invalid   invalid
the sign of each 0 and inf result is the exclusive or of the signs of frm and frn.
note: non-normalized values are treated as zero.

exceptions: invalid operation

examples:
fmul fr2,fr3    ; floating point multiply
                ; before execution: fr2=h'40000000 /* 2 in base 10 */
                ;                   fr3=h'40800000 /* 4 in base 10 */
                ; after execution:  fr2=h'40000000
                ;                   fr3=h'41000000 /* 8 in base 10 */

8.3.12 fneg (floating point negate): floating point instruction

format      abstract      code               latency   cycles   t bit
fneg frn    -frn → frn    1111nnnn01001101   2         1

description: arithmetically negates (as a floating point number) the contents of floating point register frn. the calculation result is stored in frn.

operation:
fneg(float *frn) /* fneg frn */
{
    clear_cause_vz();
    case(data_type_of(frn)) {
    qnan : qnan(frn); break;
    snan : invalid(frn); break;
    default : *frn = -(*frn); break;
    }
    pc += 2;
}

fneg special cases
frn          norm   +0   -0   +inf   -inf   qnan   snan
fneg(frn)    neg    -0   +0   -inf   +inf   qnan   invalid
note: non-normalized values are treated as zero.
exceptions: invalid operation

examples:
fneg fr2    ; floating point negate
            ; before execution: fr2=h'40800000 /* 4 in base 10 */
            ; after execution: fr2=h'c0800000 /* -4 in base 10 */

8.3.13 fsqrt (floating point square root): floating point instruction

format       abstract           code               latency   cycles   t bit
fsqrt frn    sqrt(frn) → frn    1111nnnn01101101   14        13

description: arithmetically obtains (as a floating point number) the square root of the contents of floating point register frn. the calculation result is stored in frn.

operation:
fsqrt(float *frn) /* fsqrt frn */
{
    clear_cause_vz();
    case(data_type_of(frn)) {
    norm : if(sign_of(frn) == 0) *frn = sqrt(*frn);
           else invalid(frn);
           break;
    pzero :
    nzero :
    pinf : *frn = *frn; break;
    ninf : invalid(frn); break;
    qnan : qnan(frn); break;
    snan : invalid(frn); break;
    }
    pc += 2;
}

fsqrt special cases
frn           +norm   -norm     +0   -0   +inf   -inf      qnan   snan
fsqrt(frn)    sqrt    invalid   +0   -0   +inf   invalid   qnan   invalid
note: non-normalized values are treated as zero.

exceptions: invalid operation

examples:
fsqrt fr4    ; floating point square root
             ; before execution: fr4=h'40400000 /* 3 in base 10 */
             ; after execution: fr4=h'3fddb3d7 /* 1.7320 in base 10 */

8.3.14 fsts (floating point store from system register): floating point instruction

format           abstract      code               latency   cycles   t bit
fsts fpul,frn    fpul → frn    1111nnnn00001101   2         1

description: copies the contents of system register fpul to floating point register frn.

operation:
fsts(float *frn,float *fpul) /* fsts fpul,frn */
{
    *frn = *fpul;
    pc += 2;
}

exceptions: none

examples:
                          ; before execution of fsts instruction:
mov.l #h'00000002,r2      ; r2=h'00000002
fldi0 fr5                 ; fr5=0
                          ; after execution of fsts instruction:
lds r2,fpul               ; r2=h'00000002
fsts fpul,fr5             ; fr5=h'00000002

8.3.15 fsub (floating point subtract): floating point instruction

format abstract code latency cycles t bit
fsub frm, frn frn-frm →
frn   1111nnnnmmmm0001   2   1   -

description: arithmetically subtracts (as floating point numbers) the contents of floating point register frm from the contents of floating point register frn. the calculation result is stored in frn.

operation:

fsub(float *frm,*frn) /* fsub frm,frn */
{
  clear_cause_vz();
  if((data_type_of(frm) == snan) || (data_type_of(frn) == snan))
    invalid(frn);
  else if((data_type_of(frm) == qnan) || (data_type_of(frn) == qnan))
    qnan(frn);
  else case(data_type_of(frm)) {
    norm :
      case(data_type_of(frn)) {
        pinf : inf(frn,0); break;
        ninf : inf(frn,1); break;
        default : *frn = *frn - *frm; break;
      }
      break;
    pzero :
      case(data_type_of(frn)) {
        norm : *frn = *frn - *frm; break;
        pzero : zero(frn,0); break;
        nzero : zero(frn,1); break;
        pinf : inf(frn,0); break;
        ninf : inf(frn,1); break;
      }
      break;
    nzero :
      case(data_type_of(frn)) {
        norm : *frn = *frn - *frm; break;
        pzero :
        nzero : zero(frn,0); break;
        pinf : inf(frn,0); break;
        ninf : inf(frn,1); break;
      }
      break;
    pinf :
      case(data_type_of(frn)) {
        pinf : invalid(frn); break; /* (+inf) - (+inf) */
        default : inf(frn,1); break;
      }
      break;
    ninf :
      case(data_type_of(frn)) {
        ninf : invalid(frn); break; /* (-inf) - (-inf) */
        default : inf(frn,0); break;
      }
      break;
  }
  pc += 2;
}

fsub special cases (rows: frm, columns: frn; result is frn - frm)

frm \ frn   norm      +0        -0        +inf      -inf      qnan      snan
norm        sub       sub       sub       +inf      -inf      qnan      invalid
+0          sub       +0        -0        +inf      -inf      qnan      invalid
-0          sub       +0        +0        +inf      -inf      qnan      invalid
+inf        -inf      -inf      -inf      invalid   -inf      qnan      invalid
-inf        +inf      +inf      +inf      +inf      invalid   qnan      invalid
qnan        qnan      qnan      qnan      qnan      qnan      qnan      invalid
snan        invalid   invalid   invalid   invalid   invalid   invalid   invalid

note: non-normalized values are treated as zero.

exceptions: invalid operation

examples:

fsub fr0, fr3     ; floating point subtract
                  ; before execution: fr0=h'3f800000 /* 1 in base 10 */
                  ;                   fr3=h'40e00000 /* 7 in base 10 */
                  ; after execution:  fr0=h'3f800000 /* 1 in base 10 */
                  ;                   fr3=h'40c00000 /* 6 in base 10 */
fsub fr3, fr2     ; before execution: fr2=h'40800000 /* 4 in base 10 */
                  ;                   fr3=h'40c00000 /* 6 in base 10 */
                  ; after execution:  fr2=h'c0000000 /* -2 in base 10 */
                  ;                   fr3=h'40c00000 /* 6 in base 10 */

8.3.16 ftrc (floating point truncate and convert to integer): floating point instruction

format abstract code latency cycles t bit
ftrc frm,fpul   (long)frm →
fpul   1111nnnn00111101   2   1   -

description: interprets the contents of floating point register frm as a floating point number and converts it to an integer by truncating everything after the decimal point. the calculation result is stored in fpul.

operation:

#define n_int_range 0xcf000000 /* -1.000000 * 2^31 */
#define p_int_range 0x4effffff /* 1.fffffe * 2^30 */

ftrc(float *frm,int *fpul) /* ftrc frm,fpul */
{
  clear_cause_vz();
  case(ftrc_type_of(frm)) {
    norm : *fpul = (long)(*frm); break;
    pinf : ftrc_invalid(fpul,0); break;
    ninf : ftrc_invalid(fpul,1); break;
  }
  pc += 2;
}

int ftrc_type_of(long *src)
{
  long abs;
  abs = *src & 0x7fffffff;
  if(sign_of(src) == 0) {
    if(abs > 0x7f800000) return(ninf); /* nan */
    else if(abs > p_int_range) return(pinf); /* out of range, +inf */
    else return(norm); /* +0, +norm */
  }
  else {
    if(*src > n_int_range) return(ninf); /* out of range, -inf, nan */
    else return(norm); /* -0, -norm */
  }
}

ftrc_invalid(long *dest,int sign)
{
  set_v();
  if((fpscr & enable_v) == 0) {
    if(sign == 0) *dest = 0x7fffffff;
    else *dest = 0x80000000;
  }
}

ftrc special cases

frn                     ftrc(frn)
norm                    trc
+0                      0
-0                      0
positive out of range   invalid, +max (h'7fffffff)
negative out of range   invalid, -max (h'80000000)
+inf                    invalid, +max
-inf                    invalid, -max
qnan                    invalid, -max
snan                    invalid, -max

note: non-normalized values are treated as zero.

exceptions: invalid operation

examples:

mov.l #h'402ed9eb, r2
lds   r2, fpul
fsts  fpul, fr6       ; fr6=h'402ed9eb /* 2.7320 in base 10 */
ftrc  fr6, fpul
sts   fpul, r2        ; r2=h'00000002 /* 2 in base 10 */
; before execution of ftrc and sts: r2=h'402ed9eb, fr6=h'402ed9eb
; after execution of ftrc and sts:  r2=h'00000002, fr6=h'402ed9eb

8.3.17 lds (load to system register): fpu related cpu instruction

format                  abstract                code               latency   cycles   t bit
1. lds rm,fpul          rm → fpul               0100nnnn01011010   2         1        -
2. lds.l @rm+,fpul      (rm) → fpul, rm+=4      0100nnnn01010110   2         1        -
3. lds rm,fpscr         rm → fpscr              0100nnnn01101010   3         1        -
4. lds.l @rm+,fpscr     (rm) → fpscr, rm+=4     0100nnnn01100110   3         1        -

description:
1.
moves the contents of general-use register rm to system register fpul.
2. loads the contents of the memory address specified by general-use register rm into system register fpul. after the load completes successfully, increments the value of rm by 4.
3. moves the contents of general-use register rm to system register fpscr. previously defined bits in fpscr are not changed.
4. loads the contents of the memory address specified by general-use register rm into system register fpscr. after the load completes successfully, increments the value of rm by 4. previously defined bits in fpscr are not changed.

operation:

#define fpscr_mask 0x00018c60

lds(long *rm,*fpul) /* lds rm,fpul */
{
  *fpul = *rm;
  pc += 2;
}

lds_restore(long *rm,*fpul) /* lds.l @rm+,fpul */
{
  if(load_long(rm,fpul) != address_error) *rm += 4;
  pc += 2;
}

lds(long *rm,*fpscr) /* lds rm,fpscr */
{
  *fpscr = *rm & fpscr_mask;
  pc += 2;
}

lds_restore(long *rm,*fpscr) /* lds.l @rm+,fpscr */
{
  long tmp_fpscr;
  if(load_long(rm,&tmp_fpscr) != address_error) {
    *fpscr = tmp_fpscr & fpscr_mask;
    *rm += 4;
  }
  pc += 2;
}

exceptions: address error

examples:

• lds

example 1
mov.l #h'12345678, r2
fldi0 fr3
lds   r2, fpul
fsts  fpul, fr3
; before execution of lds and fsts instructions: r2=h'12345678, fr3=0
; after execution of lds and fsts instructions:  r2=h'12345678, fr3=h'12345678

example 2
mov.l #h'00040801, r4
lds   r4, fpscr
; after execution of lds instruction: fpscr=00040801
• lds.l

example 1
fldi0 fr0
mov.l #h'87654321, r4
mov.l #h'0c700128, r8
mov.l r4, @r8
lds.l @r8+, fpul
fsts  fpul, fr0
; before execution of lds.l and fsts instructions: fr0=0, r8=0c700128
; after execution of lds.l and fsts instructions:  fr0=87654321, r8=0c70012c

example 2
mov.l #h'00040c01, r4
mov.l #h'0c700134, r8
mov.l r4, @r8
lds.l @r8+, fpscr
; before execution of lds.l instruction: r8=0c700134
; after execution of lds.l instruction:  r8=0c700138, fpscr=00040c01

8.3.18 sts (store from fpu system register): fpu related cpu instruction

format                  abstract                    code               latency (wait time)   cycles   t bit
1. sts fpul,rn          fpul → rn                   0000nnnn01011010   2                     1        -
2. sts.l fpul,@-rn      rn -= 4, fpul → @(rn)       0100nnnn01010010   2                     1        -
3. sts fpscr,rn         fpscr → rn                  0000nnnn01101010   3                     1        -
4. sts.l fpscr,@-rn     rn -= 4, fpscr → @(rn)      0100nnnn01100010   3                     1        -

description:
1. moves the contents of system register fpul to general-use register rn.
2. stores the contents of system register fpul at the memory address given by general-use register rn minus 4. after the store completes successfully, the decremented value becomes the value of rn.
3. moves the contents of system register fpscr to general-use register rn.
4. stores the contents of system register fpscr at the memory address given by general-use register rn minus 4. after the store completes successfully, the decremented value becomes the value of rn.

operation:

sts(long *fpul,*rn) /* sts fpul,rn */
{
  *rn = *fpul;
  pc += 2;
}

sts_save(long *fpul,*rn) /* sts.l fpul,@-rn */
{
  long *tmp_address = *rn - 4;
  if(store_long(fpul,tmp_address) != address_error) *rn = tmp_address;
  pc += 2;
}

sts(long *fpscr,*rn) /* sts fpscr,rn */
{
  *rn = *fpscr;
  pc += 2;
}

sts_restore(long *fpscr,*rn) /* sts.l fpscr,@-rn */
{
  long *tmp_address = *rn - 4;
  if(store_long(fpscr,tmp_address) != address_error) *rn = tmp_address;
  pc += 2;
}

exceptions: address error

examples:
• sts

example 1
mov.l #h'12abcdef, r12
lds   r12, fpul
sts   fpul, r13
; after execution of sts instruction: r13 = 12abcdef

example 2
sts   fpscr, r2
; after execution of sts instruction: contents of fpscr at that point stored in r2 register

• sts.l

example 1
mov.l #h'0c700148, r7
sts.l fpul, @-r7
; before execution of sts.l instruction: r7 = h'0c700148
; after execution of sts.l instruction:  r7 = h'0c700144, contents of fpul saved at address h'0c700144

example 2
mov.l #h'0c700154, r8
sts.l fpscr, @-r8
; after execution of sts.l instruction: contents of fpscr saved at address h'0c700150

8.4 dsp data transfer instructions (sh3-dsp only)

table 8-1 lists the dsp data transfer instructions in alphabetical order.

table 8-1 dsp data transfer instructions in alphabetical order

instruction          operation                                            code               cycles   dc bit
movs.l @-as,ds       as-4 → as, (as) → ds                                 111101aadddd0010   1        -
movs.l @as,ds        (as) → ds                                            111101aadddd0110   1        -
movs.l @as+,ds       (as) → ds, as+4 → as                                 111101aadddd1010   1        -
movs.l @as+ix,ds     (as) → ds, as+ix → as                                111101aadddd1110   1        -
movs.l ds,@-as       as-4 → as, ds → (as)                                 111101aadddd0011   1        -
movs.l ds,@as        ds → (as)                                            111101aadddd0111   1        -
movs.l ds,@as+       ds → (as), as+4 → as                                 111101aadddd1011   1        -
movs.l ds,@as+ix     ds → (as), as+ix → as                                111101aadddd1111   1        -
movs.w @-as,ds       as-2 → as, (as) → msw of ds, 0 → lsw of ds           111101aadddd0000   1        -
movs.w @as,ds        (as) → msw of ds, 0 → lsw of ds                      111101aadddd0100   1        -
movs.w @as+,ds       (as) → msw of ds, 0 → lsw of ds, as+2 → as           111101aadddd1000   1        -
movs.w @as+ix,ds     (as) → msw of ds, 0 → lsw of ds, as+ix → as          111101aadddd1100   1        -
movs.w ds,@-as       as-2 → as, msw of ds → (as)                          111101aadddd0001   1        -
movs.w ds,@as        msw of ds → (as)                                     111101aadddd0101   1        -
movs.w ds,@as+       msw of ds → (as), as+2 → as                          111101aadddd1001   1        -
movs.w ds,@as+ix     msw of ds → (as), as+ix → as                         111101aadddd1101   1        -
movx.w @ax,dx        (ax) → msw of dx, 0 → lsw of dx                      111100a*d*0*01**   1        -
movx.w @ax+,dx       (ax) → msw of dx, 0 → lsw of dx, ax+2 →
ax   111100a*d*0*10**   1   -
movx.w @ax+ix,dx     (ax) → msw of dx, 0 → lsw of dx, ax+ix → ax          111100a*d*0*11**   1        -
movx.w da,@ax        msw of da → (ax)                                     111100a*d*1*01**   1        -
movx.w da,@ax+       msw of da → (ax), ax+2 → ax                          111100a*d*1*10**   1        -
movx.w da,@ax+ix     msw of da → (ax), ax+ix → ax                         111100a*d*1*11**   1        -
movy.w @ay,dy        (ay) → msw of dy, 0 → lsw of dy                      111100*a*d*0**01   1        -
movy.w @ay+,dy       (ay) → msw of dy, 0 → lsw of dy, ay+2 → ay           111100*a*d*0**10   1        -
movy.w @ay+iy,dy     (ay) → msw of dy, 0 → lsw of dy, ay+iy → ay          111100*a*d*0**11   1        -
movy.w da,@ay        msw of da → (ay)                                     111100*a*d*1**01   1        -
movy.w da,@ay+       msw of da → (ay), ay+2 → ay                          111100*a*d*1**10   1        -
movy.w da,@ay+iy     msw of da → (ay), ay+iy → ay                         111100*a*d*1**11   1        -
nopx                 no operation                                         1111000*0*0*00**   1        -
nopy                 no operation                                         111100*0*0*0**00   1        -

note: msw = high-order word of operand, lsw = low-order word of operand

x and y data transfers (movx.w and movy.w)

these instructions use the xdb and ydb buses to access x and y memory. areas other than x and y memory cannot be accessed. memory is accessed in word units. because these transfers use their own buses, they do not create access contention with instruction fetches (which use the ldb bus). x and y data transfer instructions are always executed, even when the data operation instruction executed in parallel is conditional. figure 8-17 shows the load and store operations in x and y data transfers.
[figure 8-17 load and store operations in x and y data transfers. labels recoverable from the figure: instruction code for x data transfer operation; instruction code for y data transfer operation; r4 [ax], r5 [ax], r6 [ay], r7 [ay]; control for x memory, control for y memory; abx, aby; xab 15 bits, yab 15 bits; xdb 16 bits, ydb 16 bits; x data memory 4 kbytes, y data memory 4 kbytes; x r/w, y r/w; x_mem, y_mem: select signals for x and y data memory; dsp data register x0/x1, a0/a1 input/output control; dsp data register y0/y1, a0/a1 input/output control.]

x memory data transfer operation is shown below. y memory data transfers are the same.

if ( !nop ) {
  x_mem=1; xab=abx; x r/w=1;
  if ( load operation ) {
    dx[31:16]=xdb; dx[15:0]=0x0000; /* dx is x0 or x1 */
  }
  else { xdb=dx[31:16]; x r/w=0; } /* dx is a0 or a1 */
}
else { x_mem=0; xab=unknown; }

single data transfers (movs.w and movs.l)

single data transfers are instructions that load to and store from the dsp register. they are like system register load and store instructions. data transfers between the dsp register and memory use the lab and ldb buses. like cpu core instructions, data accesses can create access contention with instruction memory accesses. single data transfers can use either word or longword data. figure 8-18 shows the load and store operations in single data transfers.

[figure 8-18 load and store operations in single data transfers. labels recoverable from the figure: wl, ls, mab; memory control; superh core control; lab 32 bits, ldb 32 bits; r2 [as], r3 [as], r4 [as], r5 [as]; instruction code for single data transfer operation; dsp data register input/output control.]

load and store operations in single data transfers are shown below.
lab = mab;
if ( ms!=nls && w/l is word access ) { /* movs.w */
  if (ls==load) {
    if (ds!=a0g && ds!=a1g) {
      ds[31:16] = ldb[15:0];
      ds[15:0] = 0x0000;
      if (ds==a0) a0g[7:0] = ldb[15];
      if (ds==a1) a1g[7:0] = ldb[15];
    }
    else ds[7:0] = ldb[7:0]; /* ds is a0g or a1g */
  }
  else { /* store */
    if (ds!=a0g && ds!=a1g) ldb[15:0] = ds[31:16];
    /* ds is a0g or a1g */
    else ldb[15:0] = ds[7:0] with 8-bit sign extension;
  }
}
else if ( ma!=nls && w/l is longword access ) { /* movs.l */
  if (ls==load) {
    if (ds!=a0g && ds!=a1g) {
      ds[31:0] = ldb[31:0];
      if (ds==a0) a0g[7:0] = ldb[31];
      if (ds==a1) a1g[7:0] = ldb[31];
    }
    else ds[7:0] = ldb[7:0]; /* ds is a0g or a1g */
  }
  else { /* store */
    if (ds!=a0g && ds!=a1g) ldb[31:0] = ds[31:0];
    /* ds is a0g or a1g */
    else ldb[31:0] = ds[7:0] with 24-bit sign extension;
  }
}

this section explains the breakdown of instructions, descriptions, etc. given in the rest of this section.

table 8-2 sample description (name): classification

format     assembler input format.
abstract   a brief description of operation
code       displayed in order msb → lsb
cycle      all dsp instructions execute in 1 cycle
dc bit     the status of the dc bit after the instruction is executed

format: [if cc] op.sz src1,src2,dest
  [if cc]: condition (unconditional, dct, or dcf)
  op: operation code
  sz: size
  src1: source 1 operand
  src2: source 2 operand
  dest: destination

table 8-3 operation summary

operation   description
→, ←
direction of transfer
(xx)        memory operand
dc          flag bits in the dsr
&           logical and of each bit
|           logical or of each bit
^           exclusive or of each bit
~           logical not of each bit
<

x data transfer instructions:
  a (ax): 0=r4, 1=r5
  d (destination, dx): 0=x0, 1=x1
  d (source, da): 0=a0, 1=a1
y data transfer instructions:
  a (ay): 0=r6, 1=r7
  d (destination, dy): 0=y0, 1=y1
  d (source, da): 0=a0, 1=a1
single data transfer instructions:
  aa (as): 0=r4, 1=r5, 2=r2, 3=r3
  dddd (ds): 5=a1, 7=a0, 8=x0, 9=x1, a=y0, b=y1, c=m0, d=a1g, e=m1, f=a0g
dsp operation instructions:
  iiiiiii (imm): -32 to +32
  ee (se): 0=x0, 1=x1, 2=y0, 3=a1
  ff (sf): 0=y0, 1=y1, 2=x0, 3=a1
  xx (sx): 0=x0, 1=x1, 2=a0, 3=a1
  yy (sy): 0=y0, 1=y1, 2=m0, 3=m1
  gg (dg): 0=m0, 1=m1, 2=a0, 3=a1
  uu (du): 0=x0, 1=y0, 2=a0, 3=a1
  zzzz (dz): 5=a1, 7=a0, 8=x0, 9=x1, a=y0, b=y1, c=m0, e=m1

dc bit: update: updated according to the operation result and the specifications of the cs (condition select) bits. -: not updated.
description: description of operation
notes: notes on using the instruction
operation: operation written in c language.
examples: examples are written in assembler mnemonics and describe status before and after executing the instruction.

8.4.1 movs (move single data between memory and dsp register): dsp data transfer instruction

format               abstract                                             code               cycle   dc bit
movs.w @-as,ds       as-2 → as, (as) → msw of ds, 0 → lsw of ds           111101aadddd0000   1       -
movs.w @as,ds        (as) → msw of ds, 0 → lsw of ds                      111101aadddd0100   1       -
movs.w @as+,ds       (as) → msw of ds, 0 → lsw of ds, as+2 → as           111101aadddd1000   1       -
movs.w @as+ix,ds     (as) → msw of ds, 0 → lsw of ds, as+ix → as          111101aadddd1100   1       -
movs.w ds,@-as       as-2 → as, msw of ds → (as)                          111101aadddd0001   1       -
movs.w ds,@as        msw of ds → (as)                                     111101aadddd0101   1       -
movs.w ds,@as+       msw of ds → (as), as+2 → as                          111101aadddd1001   1       -
movs.w ds,@as+ix     msw of ds → (as), as+ix → as                         111101aadddd1101   1       -
movs.l @-as,ds       as-4 → as, (as) → ds                                 111101aadddd0010   1       -
movs.l @as,ds        (as) → ds                                            111101aadddd0110   1       -
movs.l @as+,ds       (as) → ds, as+4 →
as   111101aadddd1010   1   -
movs.l @as+ix,ds     (as) → ds, as+ix → as                                111101aadddd1110   1       -
movs.l ds,@-as       as-4 → as, ds → (as)                                 111101aadddd0011   1       -
movs.l ds,@as        ds → (as)                                            111101aadddd0111   1       -
movs.l ds,@as+       ds → (as), as+4 → as                                 111101aadddd1011   1       -
movs.l ds,@as+ix     ds → (as), as+ix → as                                111101aadddd1111   1       -

description: transfers the source operand data to the destination. transfer can be from memory to register or register to memory. the transferred data can be a word or longword. in a word transfer from memory to a register, the word data is loaded into the top word of the register and the bottom word is cleared to zero. in a word transfer from a register to memory, the top word of the register is stored as the word data. in a longword transfer, the whole longword is transferred. when the destination operand is a register with guard bits, the sign is extended and stored in the guard bits.

note: when one of the guard bit registers a0g and a1g is the source operand for store processing, the data is output to the bottom 8 bits (bits 0-7) and the top 24 bits (bits 31-8) become undefined.

operation: see figure 8-19.
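the word load with guard-bit sign extension and the pre-decrement longword store described above can be sketched in c as follows. this is a rough model, not the manual's own operation code: the acc40_t type and both function names are hypothetical, memory is not modelled (the loaded word is passed in and the stored longword is returned), and only the a0/a1 accumulator case is covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 40-bit accumulator (a0 or a1):
   8 guard bits (a0g/a1g) plus 32 data bits. */
typedef struct {
    uint8_t  guard; /* bits 39..32 */
    uint32_t data;  /* bits 31..0  */
} acc40_t;

/* Word load into a0/a1: the word goes to bits 31..16, bits 15..0 are
   cleared, and the guard bits receive the sign of the loaded word. */
static acc40_t movs_w_load_acc(uint16_t word)
{
    acc40_t r;
    r.data  = (uint32_t)word << 16;
    r.guard = (word & 0x8000u) ? 0xFFu : 0x00u;
    return r;
}

/* Longword store with pre-decrement ("movs.l ds,@-as"): as is decremented
   by 4 and the 32 data bits are what gets written at the new address. */
static uint32_t movs_l_store_predec(acc40_t src, uint32_t *as)
{
    assert(as != NULL);
    *as -= 4;
    return src.data;
}
```

with the values used in the movs examples, loading the word h'8765 gives guard bits h'ff and data h'87650000, and storing a1=h'123456789a with as=h'00000800 writes h'3456789a and leaves as=h'000007fc.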
[figure 8-19 the movs instruction. word data transfer, memory to register: ldb[15:0] is loaded into bits 31-16 of ds with sign extension into the guard bits, and bits 15-0 are cleared to all 0. word data transfer, register to memory: bits 31-16 of ds are stored and bits 15-0 are ignored. as points to any memory area and is updated by -2, 0, +2, or +ix. longword data transfer: ldb[31:0] is transferred to or from bits 31-0 of ds, with sign extension into the guard bits on a load; as is updated by -4, 0, +4, or +ix.]

examples:

movs.w @r4+,a0    ; before execution: r4=h'00000400, @r4=h'8765, a0=h'123456789a
                  ; after execution:  r4=h'00000402, a0=h'ff87650000
movs.l a1,@-r3    ; before execution: r3=h'00000800, a1=h'123456789a
                  ; after execution:  r3=h'000007fc, @(h'000007fc)=h'3456789a

8.4.2 movx (move between x memory and dsp register): dsp data transfer instruction

format               abstract                                             code               cycle   dc bit
movx.w @ax,dx        (ax) → msw of dx, 0 → lsw of dx                      111100a*d*0*01**   1       -
movx.w @ax+,dx       (ax) → msw of dx, 0 → lsw of dx, ax+2 → ax           111100a*d*0*10**   1       -
movx.w @ax+ix,dx     (ax) → msw of dx, 0 → lsw of dx, ax+ix → ax          111100a*d*0*11**   1       -
movx.w da,@ax        msw of da → (ax)                                     111100a*d*1*01**   1       -
movx.w da,@ax+       msw of da → (ax), ax+2 → ax                          111100a*d*1*10**   1       -
movx.w da,@ax+ix     msw of da → (ax), ax+ix → ax                         111100a*d*1*11**   1       -

note: "*" of the instruction code is movy instruction designation area.

description: transfers the source operand data to the destination operand. transfer can be from memory to register or register to memory. the transferred data can only be word length for x memory. when the source operand is in memory and the destination operand is a register, the word data is loaded into the top word of the register and the bottom word is cleared to zero. when the source operand is a register and the destination operand is memory, the top word of the register is stored as the word data.

operation: see figure 8-20.
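the two movx.w directions can also be written out as a small c sketch. it is illustrative only: the function names are hypothetical, x memory is reduced to a single word passed by pointer, and only the post-increment-by-2 addressing form is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "movx.w @ax+,dx": the x memory word is placed in
   the top word of dx, the bottom word is cleared, and ax is advanced by 2. */
static uint32_t movx_w_load_postinc(const uint16_t *xmem_word, uint32_t *ax)
{
    assert(xmem_word != NULL && ax != NULL);
    uint32_t dx = (uint32_t)(*xmem_word) << 16;
    *ax += 2;
    return dx;
}

/* Hypothetical model of the store direction ("movx.w da,@ax"): only the
   top word of the 32 data bits of da is written; the bottom word is ignored. */
static uint16_t movx_w_store_word(uint32_t da_data)
{
    return (uint16_t)(da_data >> 16);
}
```

for instance, with r4=h'08010000 and the word h'5555 at that address, the load returns h'55550000 and leaves r4=h'08010002; storing a register whose data bits are h'3456789a writes the word h'3456.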
[figure 8-20 the movx instruction. memory to register: xdb[15:0] is loaded into bits 31-16 of dx and bits 15-0 are cleared to all 0. register to memory: bits 31-16 of da are stored and bits 15-0 are ignored. ax points to x memory and is post-updated by 0, +2, or +ix.]

examples:

movx.w @r4+,x0    ; before execution: r4=h'08010000, @r4=h'5555, x0=h'12345678
                  ; after execution:  r4=h'08010002, x0=h'55550000

8.4.3 movy (move between y memory and dsp register): dsp data transfer instruction

format               abstract                                             code               cycle   dc bit
movy.w @ay,dy        (ay) → msw of dy, 0 → lsw of dy                      111100*a*d*0**01   1       -
movy.w @ay+,dy       (ay) → msw of dy, 0 → lsw of dy, ay+2 → ay           111100*a*d*0**10   1       -
movy.w @ay+iy,dy     (ay) → msw of dy, 0 → lsw of dy, ay+iy → ay          111100*a*d*0**11   1       -
movy.w da,@ay        msw of da → (ay)                                     111100*a*d*1**01   1       -
movy.w da,@ay+       msw of da → (ay), ay+2 → ay                          111100*a*d*1**10   1       -
movy.w da,@ay+iy     msw of da → (ay), ay+iy → ay                         111100*a*d*1**11   1       -

note: "*" of the instruction code is movx instruction designation area.

description: transfers the source operand data to the destination operand. transfer can be from memory to register or register to memory. the transferred data can only be word length for y memory. when the source operand is in memory and the destination operand is a register, the word data is loaded into the top word of the register and the bottom word is cleared to zero. when the source operand is a register and the destination operand is memory, the top word of the register is stored as the word data.

operation: see figure 8-21.
[figure 8-21 the movy instruction. memory to register: ydb[15:0] is loaded into bits 31-16 of dy and bits 15-0 are cleared to all 0. register to memory: bits 31-16 of da are stored and bits 15-0 are ignored. ay points to y memory and is post-updated by 0, +2, or +iy.]

examples:

movy.w a0,@r6+r9  ; before execution: r6=h'08020000, r9=h'00000006, a0=h'123456789a
                  ; after execution:  r6=h'08020006, @(h'08020000)=h'3456

8.4.4 nopx (no access operation for x memory): dsp data transfer instruction

format   abstract       code               cycle   dc bit
nopx     no operation   1111000*0*0*00**   1       -

description: no access operation for x memory.

8.4.5 nopy (no access operation for y memory): dsp data transfer instruction

format   abstract       code               cycle   dc bit
nopy     no operation   111100*0*0*0**00   1       -

description: no access operation for y memory.

8.5 dsp operation instructions

the dsp operation instructions are listed below in alphabetical order. see section 8.4, dsp data transfer instructions: classification, for an explanation of the format and symbols used in this description.

table 8-4 alphabetical listing of dsp operation instructions

instruction   operation   code   cycles   dc bit
pabs sx,dz   if sx ≥ 0, sx → dz; if sx < 0, 0-sx → dz   111110********** 10001000xx00zzzz   1   update
pabs sy,dz   if sy ≥ 0, sy → dz; if sy < 0, 0-sy → dz   111110********** 1010100000yyzzzz   1   update
padd sx,sy,dz   sx + sy → dz   111110********** 10110001xxyyzzzz   1   update
dct padd sx,sy,dz   if dc = 1, sx + sy → dz; if 0, nop   111110********** 10110010xxyyzzzz   1   -
dcf padd sx,sy,dz   if dc = 0, sx + sy → dz; if 1, nop   111110********** 10110011xxyyzzzz   1   -
padd sx,sy,du pmuls se,sf,dg   sx + sy → du; msw of se × msw of sf → dg   111110********** 0111eeffxxyygguu   1   update*
paddc sx,sy,dz   sx + sy + dc → dz   111110********** 10110000xxyyzzzz   1   update
pand sx,sy,dz   sx & sy → dz; clear lsw of dz   111110********** 10010101xxyyzzzz   1   update
dct pand sx,sy,dz   if dc = 1, sx & sy → dz, clear lsw of dz; if 0, nop   111110********** 10010110xxyyzzzz   1   -
dcf pand sx,sy,dz   if dc = 0, sx & sy →
dz, clear lsw of dz; if 1, nop 111110********** 10010111xxyyzzzz 1 pclr dz h'00000000 ? dz 111110********** 100011010000zzzz 1 update dct pclr dz if dc = 1, h'00000000 ? dz; if 0, nop 111110********** 100011100000zzzz 1 dcf pclr dz if dc = 0, h'00000000 ? dz; if 1, nop 111110********** 100011110000zzzz 1 328 table 8-4 alphabetical listing of dsp operation instructions (cont) instruction operation code cycles dc bit pcmp sx,sy sx C sy 111110********** 10000100xxyy0000 1 update pcopy sx,dz sx ? dz 111110********** 11011001xx00zzzz 1 update pcopy sy,dz sy ? dz 111110********** 1111100100yyzzzz 1 update dct pcopy sx,dz if dc = 1, sx ? dz; if 0, nop 111110********** 11011010xx00zzzz 1 dct pcopy sy,dz if dc = 1, sy ? dz; if 0, nop 111110********** 1111101000yyzzzz 1 dcf pcopy sx,dz if dc = 0, sx ? dz; if 1, nop 111110********** 11011011xx00zzzz 1 dcf pcopy sy,dz if dc = 0, sy ? dz; if 1, nop 111110********** 1111101100yyzzzz 1 pdec sx,dz msw of sxC1 ? msw of dz, clear lsw of dz 111110********** 10001001xx00zzzz 1 update pdec sy,dz msw of syC1 ? msw of dz, clear lsw of dz 111110********** 10101001xx00zzzz 1 update dct pdec sx,dz if dc = 1, msw of sxC1 ? msw of dz, clear lsw of dz; if 0, nop 111110********** 10001010xx00zzzz 1 dct pdec sy,dz if dc = 1, msw of syC1 ? msw of dz, clear lsw of dz; if 0, nop 111110********** 10101010xx00zzzz 1 dcf pdec sx,dz if dc = 0, msw of sxC1 ? msw of dz, clear lsw of dz; if 1, nop 111110********** 10001011xx00zzzz 1 dcf pdec sy,dz if dc = 0, msw of syC1 ? msw of dz, clear lsw of dz; if 1, nop 111110********** 10101011xx00zzzz 1 pdmsb sx,dz sx data msb position ? msw of dz, clear lsw of dz 111110********** 10011101xx00zzzz 1 update pdmsb sy,dz sy data msb position ? msw of dz, clear lsw of dz 111110********** 1011110100yyzzzz 1 update 329 table 8-4 alphabetical listing of dsp operation instructions (cont) instruction operation code cycles dc bit dct pdmsb sx,dz if dc = 1, sx data msb position ? 
msw of dz, clear lsw of dz; if 0, nop 111110********** 10011110xx00zzzz 1 dct pdmsb sy,dz if dc = 1, sy data msb position ? msw of dz, clear lsw of dz; if 0, nop 111110********** 1011111000yyzzzz 1 dcf pdmsb sx,dz if dc = 0, sx data msb position ? msw of dz, clear lsw of dz; if 1, nop 111110********** 10011111xx00zzzz 1 dcf pdmsb sy,dz if dc = 0, sy data msb position ? msw of dz, clear lsw of dz; if 1, nop 111110********** 1011111100yyzzzz 1 pinc sx,dz msw of sx + 1 ? msw of dz, clear lsw of dz 111110********** 10011001xx00zzzz 1 update pinc sy,dz msw of sy + 1 ? msw of dz, clear lsw of dz 111110********** 1011100100yyzzzz 1 update dct pinc sx,dz if dc = 1, msw of sx + 1 ? msw of dz, clear lsw of dz; if 0, nop 111110********** 10011010xx00zzzz 1 dct pinc sy,dz if dc = 1, msw of sy + 1 ? msw of dz, clear lsw of dz; if 0, nop 111110********** 1011101000yyzzzz 1 dcf pinc sx,dz if dc = 0, msw of sx + 1 ? msw of dz, clear lsw of dz; if 1, nop 111110********** 10011011xx00zzzz 1 dcf pinc sy,dz if dc = 0, msw of sy + 1 ? msw of dz, clear lsw of dz; if 1, nop 111110********** 1011101100yyzzzz 1 plds dz,mach dz ? mach 111110********** 111011010000zzzz 1 plds dz,macl dz ? macl 111110********** 111111010000zzzz 1 dct plds dz,mach if dc = 1, dz ? mach; if 0, nop 111110********** 111011100000zzzz 1 dct plds dz,macl if dc = 1, dz ? macl; if 0, nop 111110********** 111111100000zzzz 1 dcf plds dz,mach if dc = 0, dz ? mach; if 1, nop 111110********** 111011110000zzzz 1 330 table 8-4 alphabetical listing of dsp operation instructions (cont) instruction operation code cycles dc bit dcf plds dz,macl if dc = 0, dz ? macl; if 1, nop 111110********** 111111110000zzzz 1 pmuls se,sf,dg msw of se msw of sf ? dg 111110********** 0100eeff0000gg00 1 pneg sx,dz 0 C sx ? dz 111110********** 11001001xx00zzzz 1 update pneg sy,dz 0 C sy ? dz; 111110********** 1110100100yyzzzz 1 update dct pneg sx,dz if dc = 1, 0 C sx ? 
dz; if 0, nop 111110********** 11001010xx00zzzz 1 dct pneg sy,dz if dc = 1, 0 C sy ? dz; if 0, nop 111110********** 1110101000yyzzzz 1 dcf pneg sx,dz if dc = 0, 0 C sx ? dz; if 1, nop 111110********** 11001011xx00zzzz 1 dcf pneg sy,dz if dc = 0, 0 C sy ? dz; if 1, nop 111110********** 1110101100yyzzzz 1 por sx,sy,dz sx | sy ? dz, clear lsw of dz 111110********** 10110101xxyyzzzz 1 update dct por sx,sy,dz if dc = 1, sx|sy ? dz, clear lsw of dz; if 0, nop 111110********** 10110110xxyyzzzz 1 dcf por sx,sy,dz if dc = 0, sx|sy ? dz, clear lsw of dz; if 1, nop 111110********** 10110111xxyyzzzz 1 prnd sx,dz sx + h'00008000 ? dz, clear lsw of dz 111110********** 10011000xx00zzzz 1 update prnd sy,dz sy + h'00008000 ? dz, clear lsw of dz 111110********** 1011100000yyzzzz 1 update psha sx,sy,dz if sy 3 0, sx << sy ? dz; if sy < 0, sx >> sy ? dz 111110********** 10010001xxyyzzzz 1 update dct psha sx,sy,dz if dc = 1 & sy 3 0, sx << sy ? dz; if dc = 1 & sy < 0, sx >> sy ? dz; if dc = 0, nop 111110********** 10010010xxyyzzzz 1 331 table 8-4 alphabetical listing of dsp operation instructions (cont) instruction operation code cycles dc bit dcf psha sx,sy,dz if dc = 0 & sy 3 0, sx << sy ? dz; if dc = 0 & sy < 0, sx >> sy ? dz; if dc = 1, nop 111110********** 10010011xxyyzzzz 1 psha #imm,dz if imm 3 0, dz << imm ? dz; if imm < 0, dz >> imm ? dz 111110********** 00001iiiiiiizzzz 1 update pshl sx,sy,dz if sy 3 0, sx< 332 table 8-4 alphabetical listing of dsp operation instructions (cont) instruction operation code cycles dc bit psub sx,sy,dz sxCsy ? dz 111110********** 10100001xxyyzzzz 1 update dct psub sx,sy,dz if dc = 1, sx C sy ? dz; if 0, nop 111110********** 10100010xxyyzzzz 1 dcf psub sx,sy,dz if dc = 0, sx C sy ? dz; if 1, nop 111110********** 10100011xxyyzzzz 1 psub sx,sy,du pmuls se,sf,dg sx C sy ? du; msw of se msw of sf ? dg 111110********** 0110eeffxxyygguu 1 update psubc sx,sy,dz sxCsyCdc ? dz 111110********** 10100000xxyyzzzz 1 update pxor sx,sy,dz sx ^ sy ? 
dz, clear lsw of dz   111110********** 10100101xxyyzzzz   1   update
dct pxor sx,sy,dz   if dc = 1, sx ^ sy → dz, clear lsw of dz; if 0, nop   111110********** 10100110xxyyzzzz   1   -
dcf pxor sx,sy,dz   if dc = 0, sx ^ sy → dz, clear lsw of dz; if 1, nop   111110********** 10100111xxyyzzzz   1   -

note: * updated based on the padd operation results

the dc bit in the dsr register is updated in accordance with the result of a dsp instruction and the specification of the status selection bit (cs). in addition to the dc bit, the dsr register also contains four status indication flags (v, n, z, and gt). the operation of each bit is described below. in the later descriptions of instruction operation for each dsp operation, the following operation contents are used as subroutine modules.

operation contents (1) fixed-point borrow dc bit

/* sh-dsp: dsp engine: fixed_pt_dc_always_borrow.c
   set dsr's dc bit to borrow bit regardless the status of cs[2:0] bits */
{
  /* dc update policy: don't care the status of dspcsbits */
  dspdcbit = borrow_bit;
  dspgtbit = ~((negative_bit ^ overflow_bit) | zero_bit);
  dspzbit = zero_bit;
  dspnbit = negative_bit;
  dspvbit = overflow_bit;
}

operation contents (2) fixed-point carry dc bit

/* sh-dsp: dsp engine: fixed_pt_dc_always_carry.c
   set dsr's dc bit to carry bit regardless the status of cs[2:0] bits */
{
  /* dc update policy: don't care the status of dspcsbits */
  dspdcbit = carry_bit;
  dspgtbit = ~((negative_bit ^ overflow_bit) | zero_bit);
  dspzbit = zero_bit;
  dspnbit = negative_bit;
  dspvbit = overflow_bit;
}

operation contents (3) fixed-point negative value dc bit

/* sh-dsp: dsp engine: fixed_pt_minus_dc_bit.c
   fixed point minus(-) operation: set dc bit in dsr */
{
  switch (dspcsbits) {
  case 0x0: /* borrow mode */
    dspdcbit = borrow_bit; break;
  case 0x1: /* negative value mode */
    dspdcbit = negative_bit; break;
  case 0x2: /* zero value mode */
    dspdcbit = zero_bit; break;
  case 0x3: /* overflow mode */
    dspdcbit = overflow_bit; break;
  case 0x4: /* signed greater than mode */
    dspdcbit = ~((negative_bit ^ overflow_bit) | zero_bit); break;
  case 0x5: /* signed greater than or equal mode */
    dspdcbit = ~(negative_bit ^ overflow_bit); break;
  case 0x6: /* reserved */
  case 0x7: /* reserved */
    break;
  }
  dspgtbit = ~((negative_bit ^ overflow_bit) | zero_bit);
  dspzbit = zero_bit;
  dspnbit = negative_bit;
  dspvbit = overflow_bit;
}

operation contents (4) fixed-point overflow prevention function (saturated operation)

/* sh-dsp: dsp engine: set to maximum non-overflow value if overflow
   fixed_pt_overflow_protection.c */
{
  if(sbit && overflow_bit) { /* overflow protection enable & overflow */
    if(dsp_alu_dstg_bit7==0) { /* positive value */
      if((dsp_alu_dstg_lsb8!=0x0) || (dsp_alu_dst_msb!=0)) {
        dsp_alu_dstg = 0x0;
        dsp_alu_dst = 0x7fffffff;
      }
    }
    else { /* negative value */
      if((dsp_alu_dstg_lsb8!=0xff) || (dsp_alu_dst_msb!=1)) {
        dsp_alu_dstg = 0xff;
        dsp_alu_dst = 0x80000000;
      }
    }
    overflow_bit = 0; /* no more overflow when protected */
  }
}

operation contents (5) fixed-point positive value dc bit

/* sh-dsp: dsp engine: fixed_pt_plus_dc_bit.c
   fixed point plus(+) operation: set dc bit in dsr */
{
  switch (dspcsbits) {
  case 0x0: /* carry mode */
    dspdcbit = carry_bit; break;
  case 0x1: /* negative value mode */
    dspdcbit = negative_bit; break;
  case 0x2: /* zero value mode */
    dspdcbit = zero_bit; break;
  case 0x3: /* overflow mode */
    dspdcbit = overflow_bit; break;
  case 0x4: /* signed greater than mode */
    dspdcbit = ~((negative_bit ^ overflow_bit) | zero_bit); break;
  case 0x5: /* signed greater than or equal mode */
    dspdcbit = ~(negative_bit ^ overflow_bit); break;
  case 0x6: /* reserved */
  case 0x7: /* reserved */
    break;
  }
  dspgtbit = ~((negative_bit ^ overflow_bit) | zero_bit);
  dspzbit = zero_bit;
  dspnbit = negative_bit;
  dspvbit = overflow_bit;
}

operation contents (6) fixed-point operation unconditional dc bit update

/* sh-dsp: dsp engine: fixed point unconditional update
   fixed_pt_unconditional_update.c
   1. write back to the destination register
   2.
update negative_bit and zero_bit. */ /* negative_bit = msb of alu's 40-bit result. zero_bit = if(alu's 40-bit result==0) sign-extend to a0/1g[31:8] */ { 336 dsp_reg[ex2_dz_no] = dsp_alu_dst; if (ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if (ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0); } operation contents (7) integer negative value dc bit /* sh-dsp: dsp engine: integer_minus_dc_bit.c integer minus(-) operation: set dc bit in dsr */ #include "fixed_pt_minus_dc_bit.c" operation contents (8) integer overflow prevention function (saturated operation) /* sh-dsp: dsp engine: set to maximum non-overflow value if overlow integer_overflow_protection.c */ #include "fixed_pt_overflow_protection.c" operation contents (9) integer positive value dc bit /* sh-dsp: dsp engine: integer_plus_dc_bit.c integer plus(+) operation: set dc bit in dsr */ #include "fixed_pt_plus_dc_bit.c" operation contents (10) integer unconditional dc bit update /* sh-dsp: dsp engine: integer operation unconditional update integer_unconditional_update.c 1. write back to the destination register 2. update negative_bit and zero_bit. negative_bit = msb of alu's 24-bit(g-bit and hw) result. zero_bit = if(alu's g-bit & hw==0) 337 spec 1.1: clear alu integer operation's lsw. 
*/ { dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if (ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst_hw==0) & (dsp_alu_dstg_lsb8==0); } operation contents (11) logical operation dc bit /* sh-dsp: dsp engine: logical_dc_bit.c logical operation: set dc bit in dsr */ { switch (dspcsbits) { case 0x0: /* carry mode */ dspdcbit = 0; break; case 0x1: /* negative value mode */ dspdcbit = negative_bit; break; case 0x2: /* zero value mode */ dspdcbit = zero_bit; break; case 0x3: /* overflow mode */ dspdcbit = 0; break; case 0x4: /* signed greater than mode */ dspdcbit = 0; break; 338 case 0x5: /* signed greater than or equal mode */ dspdcbit = 0; break; case 0x6: /* reserved */ case 0x7: /* reserved */ break; } dspgtbit = 0; dspzbit = zero_bit; dspnbit = negative_bit; dspvbit = 0; } operation contents (12) shift operation dc bit /* sh-dsp: dsp engine: shift_dc_bit.c shift operation: set dc bit in dsr */ { switch (dspcsbits) { case 0x0: /* carry mode */ dspdcbit = carry_bit; break; case 0x1: /* negative value mode */ dspdcbit = negative_bit; break; case 0x2: /* zero value mode */ dspdcbit = zero_bit; break; case 0x3: /* overflow mode */ dspdcbit = overflow_bit; break; case 0x4: /* signed greater than mode */ dspdcbit = 0; break; case 0x5: /* signed greater than or equal mode */ dspdcbit = 0; break; case 0x6: /* reserved */ 339 case 0x7: /* reserved */ break; } dspgtbit = 0; dspzbit = zero_bit; dspnbit = negative_bit; dspvbit = overflow_bit; } 340 8.5.1 pabs (absolute): dsp arithmetic operation instruction format abstract code cycle dc bit pabs sx,dz if sx 3 0,sx ? dz if sx<0,0Csx ? dz 111110********** 10001000xx00zzzz 1 update pabs sy,dz if sy 3 0,sy ? dz if sy<0,0Csy ? 
dz 111110********** 1010100000yyzzzz 1 update description: finds absolute values. when the sx and sy operands are positive, the contents of the operands are transferred to the dz operand. if the value is negative, the amounts of the sx and sy operand contents are subtracted from 0 and stored in the dz operand. the dc bit of the dsr register are updated according to the specifications of the cs bits. the n, z, v, and gt bits of the dsr register are updated. operation: { dsp_alu_src1 = 0; dsp_alu_src1g= 0; if (ex2_dsp_bit13==0) { /* 0 +/- sx -> dz */ switch (ex2_sx) { case 0x0: dsp_alu_src2 = x0; if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; break; case 0x1: dsp_alu_src2 = x1; if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; break; case 0x2: dsp_alu_src2 = a0; dsp_alu_src2g = a0g; break; case 0x3: dsp_alu_src2 = a1; dsp_alu_src2g = a1g; break; } } else { /* 0 +/- sy -> dz */ 341 switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; } if(dsp_alu_src2g_bit7==0) { /* positive value */ dsp_alu_dst = 0x0 + dsp_alu_src2; carry_bit = 0; dsp_alu_dstg_lsb8= 0x0 + dsp_alu_src2g_lsb8 + carry_bit; } else { /* negative value */ dsp_alu_dst = 0x0 - dsp_alu_src2; borrow_bit = 1; dsp_alu_dstg_lsb8= 0x0 - dsp_alu_src2g_lsb8 - borrow_bit; } overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "fixed_pt_overflow_protection.c" #include "fixed_pt_unconditional_update.c" if(dsp_alu_src2g_bit7==0) { #include "fixed_pt_plus_dc_bit.c" } else { overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "fixed_pt_minus_dc_bit.c" } } 342 break; examples: pabs x0, m0 nopx nopy ; before execution: x0 = h'33333333, m0 = h'12345678 ; after execution: x0 = h'33333333, m0 = h'33333333 pabs x1, x1 nopx nopy ; before execution: x1 = h'dddddddd ; after 
execution: x1 = h'22222223 dc bit is updated depending on the state of cs [2:0]. 343 8.5.2 [if cc]padd (addition with condition): dsp arithmetic operation instruction format abstract code cycle dc bit padd sx,sy,dz sx+sy ? dz 111110********** 10110001xxyyzzzz 1 update dct padd sx,sy,dz if dc=1,sx+sy ? dz if 0,nop 111110********** 10110010xxyyzzzz 1 dcf padd sx,sy,dz if dc=0,sx+sy ? dz if 1,nop 111110********** 10110011xxyyzzzz 1 description: adds the contents of the sx and sy operands and stores the result in the dz operand. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true. operation: { switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } switch (ex2_sy) { 344 case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; dsp_alu_dst = dsp_alu_src1 + dsp_alu_src2; carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu _dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb); dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit; overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "fixed_pt_overflow_protection.c" if(dsp_unconditional_update) { /* unconditional operation */ #include 
"fixed_pt_unconditional_update.c" #include "fixed_pt_plus_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg[ex2_dz_no] = dsp_alu_dst; if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } } } 345 break; examples: padd x0,y0,a0 nopx nopy ; before execution: x0 = h'22222222, y0 = h'33333333, a0 = h'123456789a ; after execution: x0 = h'22222222, y0 = h'33333333, a0 = h'0055555555 in case of unconditional execution, the dc bit is updated depending on the state of the cs [2:0] bit immediately before the operation. 346 8.5.3 padd pmuls (addition & multiply signed by signed): dsp arithmetic operation instruction format abstract code cycle dc bit padd sx,sy,du sx + sy ? du 111110********** 1 update pmuls se,sf,dg msw of se msw of sf ? dg 0111eeffxxyygguu description: adds the contents of the sx and sy operands and stores the result in the du operand. the contents of the top word of the se and sf operands are multiplied as signed and the result stored in the dg operand. these two processes are executed simultaneously in parallel. the dc bit of the dsr register is updated according to the results of the alu operation and the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated according to the results of the alu operation. note: since the pmuls is fixed decimal point multiplication, the operation result is different from that of muls even though the source data is the same. 
operation:
{
  dsp_alu_dst = dsp_alu_src1 + dsp_alu_src2;
  carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu_dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb);
  dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit;
  overflow_bit = plus_op_g_ov || !(pos_not_ov || neg_not_ov);
  #include "../d_3operand.d/fixed_pt_overflow_protection.c"
  switch (ex2_du) {
    case 0x0:
      x0 = dsp_alu_dst;
      negative_bit = dsp_alu_dstg_bit7;
      zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
      break;
    case 0x1:
      y0 = dsp_alu_dst;
      negative_bit = dsp_alu_dstg_bit7;
      zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
      break;
    case 0x2:
      a0 = dsp_alu_dst;
      a0g = dsp_alu_dstg & mask000000ff;
      if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00;
      negative_bit = dsp_alu_dstg_bit7;
      zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
      break;
    case 0x3:
      a1 = dsp_alu_dst;
      a1g = dsp_alu_dstg & mask000000ff;
      if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00;
      negative_bit = dsp_alu_dstg_bit7;
      zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
      break;
  }
  #include "../d_3operand.d/fixed_pt_plus_dc_bit.c"
}
break;

examples:
padd a0,m0,a0 pmuls x0,y0,m0 nopx nopy
; before execution: x0 = h'00020000, y0 = h'00030000, m0 = h'22222222, a0 = h'0055555555
; after execution: x0 = h'00020000, y0 = h'00030000, m0 = h'0000000c, a0 = h'0077777777
the dc bit is updated based on the result of the padd operation, depending on the state of cs [2:0].

8.5.4 paddc (addition with carry): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit
paddc sx, sy, dz | sx + sy + dc -> dz | 111110********** 10110000xxyyzzzz | 1 | carry

description: adds the contents of the sx and sy operands and the dc bit, and stores the result in the dz operand. the dc bit of the dsr register is updated as the carry flag. the n, z, v, and gt bits of the dsr register are also updated.

note: the dc bit is updated as the carry flag after execution of the paddc instruction regardless of the cs bits.
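The add-with-carry behaviour described above can be sketched in C. This is a minimal model, assuming 32-bit operands only: it ignores the guard bits, the saturation function, and the n, z, v, and gt flags. The names `paddc32` and `paddc_result` are hypothetical and do not come from the manual.

```c
#include <stdint.h>

/* Hypothetical model of paddc on 32-bit operands:
   dz = sx + sy + dc, and dc becomes the carry out of bit 31.
   Guard bits, saturation and the other dsr flags are not modelled. */
typedef struct {
    uint32_t dz;  /* 32-bit result */
    unsigned dc;  /* new dc bit: carry out of bit 31 */
} paddc_result;

static paddc_result paddc32(uint32_t sx, uint32_t sy, unsigned dc_in)
{
    /* Do the sum in 64 bits so the carry lands in bit 32. */
    uint64_t sum = (uint64_t)sx + (uint64_t)sy + (uint64_t)(dc_in & 1u);
    paddc_result r;
    r.dz = (uint32_t)sum;
    r.dc = (unsigned)(sum >> 32);
    return r;
}
```

Under these assumptions, a 64-bit addition can be built from two calls: add the low words with dc = 0, then pass the returned dc into the addition of the high words.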
operation:
{
  switch (ex2_sx) {
    case 0x0:
      dsp_alu_src1 = x0;
      if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff;
      else dsp_alu_src1g = 0x0;
      break;
    case 0x1:
      dsp_alu_src1 = x1;
      if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff;
      else dsp_alu_src1g = 0x0;
      break;
    case 0x2:
      dsp_alu_src1 = a0;
      dsp_alu_src1g = a0g;
      break;
    case 0x3:
      dsp_alu_src1 = a1;
      dsp_alu_src1g = a1g;
      break;
  }
  switch (ex2_sy) {
    case 0x0: dsp_alu_src2 = y0; break;
    case 0x1: dsp_alu_src2 = y1; break;
    case 0x2: dsp_alu_src2 = m0; break;
    case 0x3: dsp_alu_src2 = m1; break;
  }
  if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff;
  else dsp_alu_src2g = 0x0;
  dsp_alu_dst = dsp_alu_src1 + dsp_alu_src2 + dspdcbit;
  carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu_dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb);
  dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit;
  overflow_bit = plus_op_g_ov || !(pos_not_ov || neg_not_ov);
  #include "fixed_pt_overflow_protection.c"
  #include "fixed_pt_unconditional_update.c"
  #include "fixed_pt_dc_always_carry.c"
}
break;

example:
cs[2:0] = ***: the dc bit always operates in carry mode, regardless of the setting of the cs bits.
paddc x0,y0,m0 nopx nopy
; before execution: x0 = h'b3333333, y0 = h'55555555, m0 = h'12345678, dc = 0
; after execution: x0 = h'b3333333, y0 = h'55555555, m0 = h'08888888, dc = 1
paddc x0,y0,m0 nopx nopy
; before execution: x0 = h'33333333, y0 = h'55555555, m0 = h'12345678, dc = 1
; after execution: x0 = h'33333333, y0 = h'55555555, m0 = h'88888889, dc = 0
the dc bit is updated as the carry flag, regardless of the state of cs [2:0].

8.5.5 [if cc] pand (logical and): dsp logical operation instruction

format | abstract | code | cycle | dc bit
pand sx,sy,dz | sx & sy -> dz; clear lsw of dz | 111110********** 10010101xxyyzzzz | 1 |
dct pand sx,sy,dz | if dc = 1, sx & sy -> dz, clear lsw of dz; if 0, nop | 111110********** 10010110xxyyzzzz | 1 |
dcf pand sx,sy,dz | if dc = 0, sx & sy ->
dz, clear lsw of dz; if 1, nop 111110********** 10010111xxyyzzzz 1 description: does an and of the upper word of the sx operand and the upper word of the sy operand, stores the result in the upper word of the dz operand, and clears the bottom word of the dz operand with zeros. when dz is a register that has guard bits, the guard bits are also zeroed. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true. note: the bottom word of the destination register and the guard bits are ignored when the dc bit is updated. operation: { switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; break; case 0x1: dsp_alu_src1 = x1; break; case 0x2: dsp_alu_src1 = a0; break; case 0x3: dsp_alu_src1 = a1; break; } switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0; break; 351 case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } dsp_alu_dst_hw = dsp_alu_src1_hw & dsp_alu_src2_hw; if(dsp_unconditional_update) { /* unconditional operation */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) a0g = 0x0; /* clear guard bits */ else if (ex2_dz_no==1) a1g = 0x0; carry_bit = 0x0; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst_hw==0); overflow_bit = 0x0; #include "logical_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) a0g = 0x0; /* clear guard bits */ else if (ex2_dz_no==1) a1g = 0x0; } } break; 352 example: pand x0,y0,a0 nopx nopy ; before 
execution: x0 = h'33333333, y0 = h'55555555 a0 = h'123456789a ; after execution: x0 = h'33333333, y0 = h'55555555 a0 = h'0011110000 in case of unconditional execution, the dc bit is updated depending on the state of the cs [2:0] bit immediately before the operation. 353 8.5.6 [if cc] pclr (clear): dsp arithmetic operation instruction format abstract code cycle dc bit pclr dz h'00000000 ? dz 111110********** 100011010000zzzz 1 update dct pclr dz if dc = 1, h'00000000 ? dz if 0, nop 111110********** 100011100000zzzz 1 dcf pclr dz if dc = 0, h'00000000 ? dz if 1, nop 111110********** 100011110000zzzz 1 description: clears the dz operand. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the z bit of the dsr register is set to 1. the n, v, and gt bits are cleared to 0. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true. operation: { /* 0 + 0 -> dz */ if(dsp_unconditional_update) { /* unconditional operation */ dsp_reg[ex2_dz_no] = 0x0; if (ex2_dz_no==0) a0g = 0x0; else if (ex2_dz_no==1) a1g = 0x0; carry_bit = 0; negative_bit = 0; zero_bit = 1; overflow_bit = 0; #include "fixed_pt_plus_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg[ex2_dz_no] = 0x0; } } 354 break; example: pclr a0 nopx nopy ; before execution: a0 = h'ff87654321 ; after execution: a0 = h'0000000000 in case of unconditional execution, the dc bit is updated depending on the state of the cs [2:0]. 355 8.5.7 pcmp (compare two data): dsp arithmetic operation instruction format abstract code cycle dc bit pcmp sx, sy sxCsy 111110********** 10000100xxyy0000 1 update description: subtracts the contents of the sy operand from the sx operand. 
the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. operation: { switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } 356 if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2; carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu_dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb); borrow_bit = !carry_bit; dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit; negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0); overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "fixed_pt_overflow_protection.c" #include "fixed_pt_minus_dc_bit.c" } break; examples: pcmp x0, y0 nopx nopy ; before execution: x0 = h'22222222, y0 = h'33333333 ; after execution: x0 = h'22222222, y0 = h'33333333 n = 1, z = 0, v = 0, gt = 0 dc bit is updated depending on the state of cs [2:0]. 357 8.5.8 [if cc] pcopy (copy with condition): dsp arithmetic operation instruction format abstract code cycle dc bit pcopy sx,dz sx ? dz 111110********** 11011001xx00zzzz 1 update pcopy sy,dz sy ? dz 111110********** 1111100100yyzzzz 1 update dct pcopy sx,dz if dc = 1, sx ? dz if 0, nop 111110********** 11011010xx00zzzz 1 dct pcopy sy,dz if dc = 1, sy ? dz if 0, nop 111110********** 1111101000yyzzzz 1 dcf pcopy sx,dz if dc = 0, sx ? dz if 1, nop 111110********** 11011011xx00zzzz 1 dcf pcopy sy,dz if dc = 0, sy ? 
dz if 1, nop 111110********** 1111101100yyzzzz 1 description: stores the sx and sy operands in the dz operand. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits are also updated. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true. operation: { /* sx + 0 -> dz */ if (ex2_dsp_bit13==0) { /* sx + 0 -> dz */ switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; 358 break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } dsp_alu_src2 = 0; dsp_alu_src2g= 0; } else { /* 0 + sy -> dz */ dsp_alu_src1 = 0; dsp_alu_src1g= 0; switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; } dsp_alu_dst = dsp_alu_src1 + dsp_alu_src2; carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu _dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb); dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit; overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "fixed_pt_overflow_protection.c" if(dsp_unconditional_update) { /* unconditional operation */ #include "fixed_pt_unconditional_update.c" #include "fixed_pt_plus_dc_bit.c" 359 } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg[ex2_dz_no] = dsp_alu_dst; if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else 
if(ex2_dz_no==1) {
      a1g = dsp_alu_dstg & mask000000ff;
      if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00;
    }
  }
}
break;

examples:
pcopy x0, a0 nopx nopy
; before execution: x0 = h'55555555, a0 = h'ffffffff
; after execution: x0 = h'55555555, a0 = h'0055555555
in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0].

8.5.9 [if cc] pdec (decrement by 1): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit
pdec sx,dz | msw of sx - 1 -> msw of dz, clear lsw of dz | 111110********** 10001001xx00zzzz | 1 | update
pdec sy,dz | msw of sy - 1 -> msw of dz, clear lsw of dz | 111110********** 1010100100yyzzzz | 1 | update
dct pdec sx,dz | if dc = 1, msw of sx - 1 -> msw of dz, clear lsw of dz; if 0, nop | 111110********** 10001010xx00zzzz | 1 |
dct pdec sy,dz | if dc = 1, msw of sy - 1 -> msw of dz, clear lsw of dz; if 0, nop | 111110********** 1010101000yyzzzz | 1 |
dcf pdec sx,dz | if dc = 0, msw of sx - 1 -> msw of dz, clear lsw of dz; if 1, nop | 111110********** 10001011xx00zzzz | 1 |
dcf pdec sy,dz | if dc = 0, msw of sy - 1 -> msw of dz, clear lsw of dz; if 1, nop | 111110********** 1010101100yyzzzz | 1 |

description: subtracts 1 from the top word of the sx or sy operand, stores the result in the upper word of the dz operand, and clears the bottom word of the dz operand to zero. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true.

note: the bottom word of the destination register is ignored when the dc bit is updated.
operation: { dsp_alu_src2 = 0x1; dsp_alu_src2g= 0x0; if (ex2_dsp_bit13==0) { /* msw of sx -1 -> dz */ switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; 361 break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } } else { /* msw of sy -1 -> dz */ switch (ex2_sy) { case 0x0: dsp_alu_src1 = y0; break; case 0x1: dsp_alu_src1 = y1; break; case 0x2: dsp_alu_src1 = m0; break; case 0x3: dsp_alu_src1 = m1; break; } if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; } dsp_alu_dst_hw = dsp_alu_src1_hw - 1; carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu _dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb); borrow_bit = !carry_bit; dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit; overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov); 362 #include "integer_overflow_protection.c" if(dsp_unconditional_update) { /* unconditional operation */ #include "integer_unconditional_update.c" #include "integer_minus_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } } } break; example: pdec x0,m0 nopx nopy ; before execution: x0 = h'0052330f, m0 = h'12345678 ; after execution: x0 = h'0052330f, m0 = h'00510000 pdec x1,x1 nopx nopy ; before execution: x1 = h'fc342855 ; after execution: x1 = h'fc330000 in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 
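The word-level effect of pdec seen in the examples above can be sketched in C. This is a minimal model under stated assumptions: it handles only the 32-bit register value, wraps modulo 2^16, and does not model guard bits, the saturation function, or flag updates. The name `pdec_msw` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: decrement the upper 16-bit word (msw) of a 32-bit
   register value and clear the lower word (lsw). The lsw of the source
   does not influence the result. */
static uint32_t pdec_msw(uint32_t src)
{
    uint32_t msw = (src >> 16) & 0xFFFFu;  /* take the top word */
    msw = (msw - 1u) & 0xFFFFu;            /* subtract 1, wrap to 16 bits */
    return msw << 16;                      /* lsw is cleared */
}
```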
8.5.10 [if cc] pdmsb (detect msb with condition): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit
pdmsb sx,dz | sx data msb position -> msw of dz, clear lsw of dz | 111110********** 10011101xx00zzzz | 1 | update
pdmsb sy,dz | sy data msb position -> msw of dz, clear lsw of dz | 111110********** 1011110100yyzzzz | 1 | update
dct pdmsb sx,dz | if dc = 1, sx data msb position -> msw of dz, clear lsw of dz; if 0, nop | 111110********** 10011110xx00zzzz | 1 |
dct pdmsb sy,dz | if dc = 1, sy data msb position -> msw of dz, clear lsw of dz; if 0, nop | 111110********** 1011111000yyzzzz | 1 |
dcf pdmsb sx,dz | if dc = 0, sx data msb position -> msw of dz, clear lsw of dz; if 1, nop | 111110********** 10011111xx00zzzz | 1 |
dcf pdmsb sy,dz | if dc = 0, sy data msb position -> msw of dz, clear lsw of dz; if 1, nop | 111110********** 1011111100yyzzzz | 1 |

description: scans the bits of the sx or sy operand from the msb downward for the first bit that differs from the sign bit, and stores the resulting bit position in the upper word of the dz operand. the bottom word of the dz operand is cleared. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true.
operation: { dsp_alu_src2 = 0x0; dsp_alu_src2g= 0x0; if (ex2_dsp_bit13==0) { /* msb(sx) -> dz */ switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; 364 case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } } else { /* msb(sy) -> dz */ switch (ex2_sy) { case 0x0: dsp_alu_src1 = y0; break; case 0x1: dsp_alu_src1 = y1; break; case 0x2: dsp_alu_src1 = m0; break; case 0x3: dsp_alu_src1 = m1; break; } if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; } { short int i; unsigned char msb, src1g; unsigned long src1=dsp_alu_src1; msb= dsp_alu_src1g_bit7; src1g=(dsp_alu_src1g_lsb8 << 1); for(i=38;((msb==(src1g>>7))&&(i>=32));i--) { src1g <<= 1; } if(i==31) { for(i;((msb==(src1>>31))&&(i>=0));i--) { src1 <<= 1; } } dsp_alu_dst = 0x0; 365 dsp_alu_dst_hw = (short int) (30-i); if (dsp_alu_dst_msb) dsp_alu_dstg_lsb8 = 0xff; else dsp_alu_dstg_lsb8 = 0x0; } carry_bit = 0; if(dsp_unconditional_update) { /* unconditional operation */ overflow_bit= 0; #include "integer_unconditional_update.c" #include "integer_plus_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } } } break; example: pdmsb x0,m0 nopx nopy ; before execution: x0 = h'0052330f, m0 = h'12345678 ; after execution: x0 = h'0052330f, m0 = h'00080000 pdmsb x1,x1 nopx nopy ; before execution: x1 = h'fc342855 ; after execution: x1 = h'00050000 in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 
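The bit scan performed by pdmsb can be sketched in C for the 32-bit case. This is a simplified model, assuming a source register without guard bits (or with guard bits that merely repeat the sign), so the negative results that the 40-bit operation listing can produce do not arise here. The name `pdmsb32` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical 32-bit model of pdmsb: count how many bits directly below
   the sign bit merely repeat it. That count is the left shift that would
   normalise the value. The count is returned in the upper word with the
   lower word cleared, as in the manual's examples. */
static uint32_t pdmsb32(uint32_t src)
{
    uint32_t sign = src >> 31;
    int count = 0;
    for (int bit = 30; bit >= 0; bit--) {
        if (((src >> bit) & 1u) != sign)
            break;          /* first bit that differs from the sign */
        count++;
    }
    return (uint32_t)count << 16;
}
```

With this model an all-zero or all-one source yields 31, which matches what the loop in the operation listing gives when no differing bit is found.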
8.5.11 [if cc] pinc (increment by 1 with condition): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit
pinc sx,dz | msw of sx + 1 -> msw of dz, clear lsw of dz | 111110********** 10011001xx00zzzz | 1 | update
pinc sy,dz | msw of sy + 1 -> msw of dz, clear lsw of dz | 111110********** 1011100100yyzzzz | 1 | update
dct pinc sx,dz | if dc = 1, msw of sx + 1 -> msw of dz, clear lsw of dz; if 0, nop | 111110********** 10011010xx00zzzz | 1 |
dct pinc sy,dz | if dc = 1, msw of sy + 1 -> msw of dz, clear lsw of dz; if 0, nop | 111110********** 1011101000yyzzzz | 1 |
dcf pinc sx,dz | if dc = 0, msw of sx + 1 -> msw of dz, clear lsw of dz; if 1, nop | 111110********** 10011011xx00zzzz | 1 |
dcf pinc sy,dz | if dc = 0, msw of sy + 1 -> msw of dz, clear lsw of dz; if 1, nop | 111110********** 1011101100yyzzzz | 1 |

description: adds 1 to the top word of the sx or sy operand, stores the result in the upper word of the dz operand, and clears the bottom word of the dz operand to zero. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true.

note: the bottom word of the destination register is ignored when the dc bit is updated.
operation: { dsp_alu_src2 = 0x1; dsp_alu_src2g= 0x0; if (ex2_dsp_bit13==0) { /* msw of sx +1 -> dz */ switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; 367 else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } } else { /* msw of sy +1 -> dz */ switch (ex2_sy) { case 0x0: dsp_alu_src1 = y0; break; case 0x1: dsp_alu_src1 = y1; break; case 0x2: dsp_alu_src1 = m0; break; case 0x3: dsp_alu_src1 = m1; break; } if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; } dsp_alu_dst_hw = dsp_alu_src1_hw + 1; carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu _dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb); dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit; overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov); 368 #include "integer_overflow_protection.c" if(dsp_unconditional_update) { /* unconditional operation */ #include "integer_unconditional_update.c" #include "integer_plus_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } } } break; example: pinc x0,m0 nopx nopy ; before execution: x0 = h'0052330f, m0 = h'12345678 ; after execution: x0 = h'0052330f, m0 = h'00530000 pinc x1,x1 nopx nopy ; before execution: x1 = h'fc342855 ; after execution: x1 = h'fc350000 in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 
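The conditional forms (dct and dcf) can be illustrated with a small C model of pinc. This is a sketch under stated assumptions: only the 32-bit register value is modelled, the increment wraps modulo 2^16, and guard bits, saturation, and flag updates are left out. The names `dsp_cond`, `COND_NONE`, `COND_DCT`, `COND_DCF`, and `pinc_cond` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical condition selector: none, dct (run if dc = 1), dcf (run if dc = 0). */
typedef enum { COND_NONE, COND_DCT, COND_DCF } dsp_cond;

/* Hypothetical model of [dct|dcf] pinc on a 32-bit register value.
   Returns the new destination value. When the condition is false the
   old destination value is returned unchanged (the instruction is a nop). */
static uint32_t pinc_cond(dsp_cond cond, unsigned dc, uint32_t src, uint32_t dst_old)
{
    if ((cond == COND_DCT && dc == 0) || (cond == COND_DCF && dc != 0))
        return dst_old;                           /* condition false: nop */
    return (((src >> 16) + 1u) & 0xFFFFu) << 16;  /* msw + 1, lsw cleared */
}
```

The same gating applies to the other instructions in this section that accept dct and dcf; only the arithmetic in the final line changes.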
8.5.12 [if cc] plds (load system register): dsp system control instruction

format | abstract | code | cycle | dc bit
plds dz,mach | dz -> mach | 111110********** 111011010000zzzz | 1 |
plds dz,macl | dz -> macl | 111110********** 111111010000zzzz | 1 |
dct plds dz,mach | if dc = 1, dz -> mach; if 0, nop | 111110********** 111011100000zzzz | 1 |
dct plds dz,macl | if dc = 1, dz -> macl; if 0, nop | 111110********** 111111100000zzzz | 1 |
dcf plds dz,mach | if dc = 0, dz -> mach; if 1, nop | 111110********** 111011110000zzzz | 1 |
dcf plds dz,macl | if dc = 0, dz -> macl; if 1, nop | 111110********** 111111110000zzzz | 1 |

description: stores the dz operand in the mach or macl register. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. the dc, n, z, v, and gt bits of the dsr register are not updated.

note: though psts, movx, and movy can be designated in parallel, their execution may take two cycles.

operation:
{ /* dz -> mach */
  if(dsp_unconditional_update) { /* unconditional operation */
    mach = dsp_reg[ex2_dz_no];
  }
  else if(dsp_condition_match) { /* conditional operation and match */
    mach = dsp_reg[ex2_dz_no];
  }
}
break;

/* sh-dsp: dsp engine: local data move operation: load system register
   plds_macl.c rev 1.0 24 may 1995, ey */
{ /* dz -> macl */
  if(dsp_unconditional_update) { /* unconditional operation */
    macl = dsp_reg[ex2_dz_no];
  }
  else if(dsp_condition_match) { /* conditional operation and match */
    macl = dsp_reg[ex2_dz_no];
  }
}
break;

example:
plds a0,mach nopx nopy
; before execution: a0 = h'123456789a, mach = h'66666666
; after execution: a0 = h'123456789a, mach = h'3456789a

8.5.13 pmuls (multiply signed by signed): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit
pmuls se,sf,dg | msw of se * msw of sf -> dg | 111110********** 0100eeff0000gg00 | 1 |

description: the contents of the top word of the se and sf operands are multiplied as signed values and the result is stored in the dg operand.
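The signed multiplication of the two upper words can be sketched in C. The worked examples in this section are consistent with the 16 x 16 product being doubled (shifted left by one bit), which is the usual convention for fractional fixed-point multiplication, so the sketch assumes that. The names `msw_signed` and `pmuls_model` are hypothetical, and the hardware's handling of the corner case h'8000 * h'8000 is not modelled.

```c
#include <stdint.h>

/* Sign-extend the upper 16-bit word of a 32-bit register value. */
static int32_t msw_signed(uint32_t reg)
{
    int32_t w = (int32_t)((reg >> 16) & 0xFFFFu);
    return (w & 0x8000) ? w - 0x10000 : w;
}

/* Hypothetical model of pmuls: signed multiply of the two upper words,
   with the product doubled (assumed fixed-point alignment). The result
   is returned as a 64-bit integer so that no input can overflow it. */
static int64_t pmuls_model(uint32_t se, uint32_t sf)
{
    int64_t product = (int64_t)msw_signed(se) * (int64_t)msw_signed(sf);
    return product * 2;
}
```

Masking the returned value with `0xFFFFFFFFFF` gives the 40-bit pattern that a register with guard bits would hold; a plain integer multiply of the same words would give half this value.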
the dc, n, z, v, and gt bits of the dsr register are not updated.

note: since pmuls is a fixed-point multiplication, the operation result is different from that of muls even though the source data is the same.

examples:
pmuls x0,y0,m0 nopx nopy
; before execution: x0 = h'00010000, y0 = h'00020000, m0 = h'33333333
; after execution: x0 = h'00010000, y0 = h'00020000, m0 = h'00000004

pmuls x1,y1,a0 nopx nopy
; before execution: x1 = h'fffe2222, y1 = h'0001aaaa, a0 = h'4444444444
; after execution: x1 = h'fffe2222, y1 = h'0001aaaa, a0 = h'fffffffffc

8.5.14 [if cc] pneg (negate): dsp arithmetic operation

format              abstract                              code                                cycle  dc bit
pneg sx,dz          0 - sx -> dz                          111110********** 11001001xx00zzzz   1      update
pneg sy,dz          0 - sy -> dz                          111110********** 1110100100yyzzzz   1      update
dct pneg sx,dz      if dc = 1, 0 - sx -> dz; if 0, nop    111110********** 11001010xx00zzzz   1
dct pneg sy,dz      if dc = 1, 0 - sy -> dz; if 0, nop    111110********** 1110101000yyzzzz   1
dcf pneg sx,dz      if dc = 0, 0 - sx -> dz; if 1, nop    111110********** 11001011xx00zzzz   1
dcf pneg sy,dz      if dc = 0, 0 - sy -> dz; if 1, nop    111110********** 1110101100yyzzzz   1

description: reverses the sign. subtracts the sx or sy operand from 0 and stores the result in the dz operand. when a condition is specified with dct or dcf, the instruction is executed only when that condition is true; when it is false, the instruction is not executed. when no condition is specified, the dc bit of the dsr register is updated according to the specifications for the cs bits, and the n, z, v, and gt bits of the dsr register are also updated. the dc, n, z, v, and gt bits are not updated when a condition is specified, even if the condition is true.
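As a rough illustration of the negate operation described above, the following C sketch computes 0 - src on a 40-bit value (8 guard bits plus 32 data bits). The name `pneg_sketch` is hypothetical. It assumes the source is a 32-bit register that is sign-extended into the guard bits, and it ignores overflow protection and all dsr flag updates.

```c
#include <stdint.h>

/* Hypothetical helper: 40-bit result of 0 - src for a 32-bit source.
 * Assumption: the source has no guard bits of its own, so it is
 * sign-extended first. Saturation and dsr flags are not modelled. */
uint64_t pneg_sketch(uint32_t src)
{
    int64_t wide = (int64_t)(int32_t)src;          /* sign-extend to 64 bits */
    return (uint64_t)(-wide) & 0xffffffffffull;    /* keep guard + data bits */
}
```

With a source of h'55555555 this yields h'ffaaaaaaab, and with h'99999999 it yields h'0066666667.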
operation: { dsp_alu_src1 = 0; dsp_alu_src1g= 0; if (ex2_dsp_bit13==0) { /* 0 - sx -> dz */ switch (ex2_sx) { case 0x0: dsp_alu_src2 = x0; if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; break; case 0x1: dsp_alu_src2 = x1; if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; 373 else dsp_alu_src2g = 0x0; break; case 0x2: dsp_alu_src2 = a0; dsp_alu_src2g = a0g; break; case 0x3: dsp_alu_src2 = a1; dsp_alu_src2g = a1g; break; } } else { /* 0 - sy -> dz */ switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; } dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2; carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu_dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb); borrow_bit = !carry_bit; dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit; overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "fixed_pt_overflow_protection.c" if(dsp_unconditional_update) { /* unconditional operation */ #include "fixed_pt_unconditional_update.c" #include "fixed_pt_minus_dc_bit.c" 374 } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg[ex2_dz_no] = dsp_alu_dst; if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } } } break; examples: pneg x0,a0 nopx nopy ; before execution: x0 = h'55555555, a0 = h'a987654321 ; after execution: x0 = h'55555555, a0 = h'ffaaaaaaab pneg x1,y1 nopx nopy ; before execution: y1 = h'99999999 ; after execution: y1 = h'66666667 in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 375 8.5.15 [if cc] por (logical or): dsp logical operation instruction format abstract code cycle dc bit por sx,sy,dz sx | sy ? 
dz, clear lsw of dz 111110********** 10110101xxyyzzzz 1 update
dct por sx,sy,dz    if dc = 1, sx | sy -> dz, clear lsw of dz; if 0, nop    111110********** 10110110xxyyzzzz   1
dcf por sx,sy,dz    if dc = 0, sx | sy -> dz, clear lsw of dz; if 1, nop    111110********** 10110111xxyyzzzz   1

description: takes the or of the top word of the sx operand and the top word of the sy operand, stores the result in the top word of the dz operand, and clears the bottom word of dz to zero. when dz is a register that has guard bits, the guard bits are also zeroed. when a condition is specified with dct or dcf, the instruction is executed only when that condition is true; when it is false, the instruction is not executed. when no condition is specified, the dc bit of the dsr register is updated according to the specifications for the cs bits, and the n, z, v, and gt bits of the dsr register are also updated. the dc, n, z, v, and gt bits are not updated when a condition is specified, even if the condition is true.

note: the bottom word of the destination register and the guard bits are ignored when the dc bit is updated.
operation: { switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; break; case 0x1: dsp_alu_src1 = x1; break; case 0x2: dsp_alu_src1 = a0; break; case 0x3: dsp_alu_src1 = a1; break; } switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0; 376 break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } dsp_alu_dst_hw = dsp_alu_src1_hw | dsp_alu_src2_hw; if(dsp_unconditional_update) { /* unconditional operation */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) a0g = 0x0; /* clear guard bits */ else if (ex2_dz_no==1) a1g = 0x0; carry_bit = 0x0; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst_hw==0); overflow_bit = 0x0; #include "logical_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) a0g = 0x0; /* clear guard bits */ else if (ex2_dz_no==1) a1g = 0x0; } } break; 377 example : por x0,y0,a0 nopx nopy ; before execution: x0 = h'33333333, y0 = h'55555555 a0 = h'123456789a ; after execution: x0 = h'33333333, y0 = h'55555555 a0 = h'127777789a in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 378 8.5.16 prnd (rounding): dsp arithmetic operation instruction format abstract code cycle dc bit prnd sx,dz sx + h'00008000 ? dz clear lsw of dz 111110********** 10011000xx00zzzz 1 update prnd sy,dz sy + h'00008000 ? dz clear lsw of dz 111110********** 1011100000yyzzzz 1 update description: does rounding. adds the immediate data h'00008000 to the contents of the sx and sy operands, stores the result in the upper word of the dz operand, and clears the bottom word of dz with zeros. the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. 
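The rounding step described for prnd amounts to adding half of the least significant word and then discarding that word. The C sketch below shows this for the 32-bit part only. The name `prnd_sketch` is hypothetical, and guard bits, overflow protection, and dsr flag updates are not modelled.

```c
#include <stdint.h>

/* Hypothetical helper: rounds a 32-bit value to its upper 16 bits.
 * Adds h'00008000 (half of one msw unit) and clears the low word.
 * Assumption: guard bits, saturation, and dsr flags are ignored. */
uint32_t prnd_sketch(uint32_t src)
{
    return (src + 0x00008000u) & 0xffff0000u;
}
```

For example, h'0052330f rounds down to h'00520000 because its low word is below h'8000, while h'fc34c087 rounds up to h'fc350000.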
operation: { dsp_alu_src2 = 0x00008000; dsp_alu_src2g= 0x0; if (ex2_dsp_bit13==0) { /* sx + h'00008000 -> dz; clr dz lw */ switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } } 379 else { /* sy + h'00008000 -> dz; clr dz lw */ switch (ex2_sy) { case 0x0: dsp_alu_src1 = y0; break; case 0x1: dsp_alu_src1 = y1; break; case 0x2: dsp_alu_src1 = m0; break; case 0x3: dsp_alu_src1 = m1; break; } if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; } dsp_alu_dst = (dsp_alu_src1 + dsp_alu_src2) & maskffff0000; carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu _dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb); dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit; overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "fixed_pt_overflow_protection.c" #include "fixed_pt_unconditional_update.c" #include "fixed_pt_plus_dc_bit.c" } break; 380 example : prnd x0,m0 nopx nopy ; before execution: x0 = h'0052330f, m0 = h'12345678 ; after execution: x0 = h'0052330f, m0 = h'00520000 prnd x1,x1 nopx nopy ; before execution: x1 = h'fc34c087 ; after execution: x1 = h'fc350000 dc bit is updated depending on the state of cs [2:0]. 381 8.5.17 [if cc] psha (shift arithmetically with condition): dsp arithmetic shift instruction format abstract code cycle dc bit psha sx,sy,dz if sy > = 0, sx << sy ? dz if sy < 0, sx >> syC > dz 111110********** 10010001xxyyzzzz 1 update dct psha sx,sy,dz if dc = 1 & sy > = 0, sx << sy ? dz if dc = 1 & sy < 0, sx >> sy ? dz if dc = 0, nop 111110********** 10010010xxyyzzzz 1 update dcf psha sx,sy,dz if dc = 0 & sy > = 0, sx << syC > dz if dc = 0 & sy < 0, sx >> sy ? 
dz if dc = 1, nop 111110********** 10010011xxyyzzzz 1 psha #imm,dz if imm > = 0, dz << imm ? dz if imm < 0, dz >> imm ? dz 111110********** 00010iiiiiiizzzz 1 description: arithmetically shifts the contents of the sx or dz operand and stores the result in the dz operand. the amount of the shift is specified by the sy operand or the immediate value imm operand. when the shift amount is positive, it shifts left. when the shift amount is negative, it shifts right. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true. operation: < when register operand is used > { switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; 382 case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0 & mask007f0000; break; case 0x1: dsp_alu_src2 = y1 & mask007f0000; break; case 0x2: dsp_alu_src2 = m0 & mask007f0000; break; case 0x3: dsp_alu_src2 = m1 & mask007f0000; break; } if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; if((dsp_alu_src2_hw & mask0040)==0) { /* left shift 0<=cnt<=32 */ char cnt = (dsp_alu_src2_hw & mask003f); if(cnt > 32) { printf("npsha sz,sy,dz error! shift %2x exceed range. 
n",cnt); exit(); } dsp_alu_dst = dsp_alu_src1 << cnt; dsp_alu_dstg = ((dsp_alu_src1g << cnt) | (dsp_alu_src1 >> (32-cnt))) & mask000000ff; carry_bit = ((dsp_alu_dstg & mask00000001)==0x1); } 383 else { /* right shift 0< cnt <=32 */ char cnt = ((~dsp_alu_src2_hw & mask003f)+1); if(cnt > 32) { printf("npsha sz,sy,dz error! shift -%2x exceed range.n",cnt); exit(); } if((cnt>8) && dsp_alu_src1g_bit7) { /* msb copy */ dsp_alu_dst=((dsp_alu_src1>>8) | (dsp_alu_src1g<< (32-8))); dsp_alu_dst=(long) dsp_alu_dst >> (cnt-8); } else { dsp_alu_dst=((dsp_alu_src1>>cnt)|(dsp_alu_src1g<< (32-cnt))); } dsp_alu_dstg_lsb8 = (char) dsp_alu_src1g_lsb8 >> cnt-- ; carry_bit = (((dsp_alu_src1 >> cnt) & mask00000001)==0x1); } /* overflow_bit = !(pos_not_ov || neg_not_ov); /* do overflow detection */ /* #include "fixed_pt_overflow_protection.c" /* do overflow protection; v=0 */ if(dsp_unconditional_update) { /* unconditional operation */ #include "fixed_pt_unconditional_update.c" #include "shift_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg[ex2_dz_no] = dsp_alu_dst; if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } } 384 } break; 385 exit(); } if((cnt>8) && dsp_alu_src1g_bit7) { /* msb copy */ dsp_alu_dst=((dsp_alu_src1>>8) | (dsp_alu_src1g<< (32-8))); dsp_alu_dst=(long) dsp_alu_dst >> (cnt-8); } else { dsp_alu_dst=((dsp_alu_src1>>cnt)|(dsp_alu_src1g<< (32-cnt))); } dsp_alu_dstg_lsb8 = (char) dsp_alu_src1g_lsb8 >> cnt- -; carry_bit = (((dsp_alu_src1 >> cnt) & mask00000001)==0x1); } /* overflow_bit = !(pos_not_ov || neg_not_ov); /* do overflow detection */ /* #include "fixed_pt_overflow_protection.c" /* do overflow protection; v=0 */ { /* unconditional operation */ #include "fixed_pt_unconditional_update.c" #include "shift_dc_bit.c" } } break; 386 examples: psha x0,y0,a0 nopx nopy ; before 
execution: x0 = h'88888888, y0 = h'00020000, a0 = h'123456789a ; after execution: x0 = h'88888888, y0 = h'00020000, a0 = h'fe22222222 psha x0,y0,x0 nopx nopy ; before execution: x0 = h'33333333, y0 = h'ffff0000 ; after execution: x0 = h'19999999, y0 = h'fffe0000 psha #-5,a1 nopx nopy ; before execution: a1 = h'aaaaaaaaaa ; after execution: a1 = h'fd55555555 in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 387 8.5.18 [if cc] pshl (shift logically with condition): dsp logical shift instruction format abstract code cycle dc bit pshl sx,sy,dz if sy 3 0, sx< 388 case 0x2: dsp_alu_src1 = a0; break; case 0x3: dsp_alu_src1 = a1; break; } switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0 & mask003f0000; break; case 0x1: dsp_alu_src2 = y1 & mask003f0000; break; case 0x2: dsp_alu_src2 = m0 & mask003f0000; break; case 0x3: dsp_alu_src2 = m1 & mask003f0000; break; } if((dsp_alu_src2_hw & mask0020)==0) { /* left shift 0<=cnt<=16 */ char cnt = (dsp_alu_src2_hw & mask001f); if(cnt > 16) { printf("pshl sx,sy,dz error! shift %2x exceed range n",cnt); exit(); } dsp_alu_dst_hw = dsp_alu_src1_hw << cnt--; carry_bit = (((dsp_alu_src1_hw << cnt) & mask8000)== 0x8000); } else { /* right shift 0 389 if(dsp_unconditional_update) { /* unconditional operation */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) a0g = 0x0; /* clear guard bits */ else if (ex2_dz_no==1) a1g = 0x0; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst_hw==0); overflow_bit = 0x0; #include "shift_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) a0g = 0x0; /* clear guard bits */ else if (ex2_dz_no==1) a1g = 0x0; } } break; 390 char cnt = (tmp_imm & mask001f); if(cnt > 16) { printf("pshl dz,#imm,dz error! 
#imm=%6x exceed range n",tmp_imm); exit(); } dsp_alu_dst_hw = dsp_alu_src1_hw << cnt--; carry_bit = (((dsp_alu_src1_hw << cnt) & mask8000)== 0x8000); } else { /* right shift 0< cnt <=16 */ char cnt = ((~tmp_imm & mask001f)+1); if(cnt > 16) { printf("pshl dz,#imm,dz error! #imm=%6x exceed range n",tmp_imm); exit(); } dsp_alu_dst_hw = dsp_alu_src1_hw >> cnt--; carry_bit = (((dsp_alu_src1_hw >> cnt) & mask0001)==0x1); } { /* unconditional operation */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) a0g = 0x0; /* clear guard bits */ else if (ex2_dz_no==1) a1g = 0x0; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst_hw==0); overflow_bit = 0x0; #include "shift_dc_bit.c" } } break; 391 examples: pshl x0,y0,a0 nopx nopy ; before execution: x0 = h'22222222, y0 = h'00030000, a0 = h'123456789a ; after execution: x0 = h'22222222, y0 = h'00030000 , a0 = h'0011100000 pshl x1,y1,x1 nopx nopy ; before execution: x1 = h'cccccccc, y1 = h'fffe0000 ; after execution: x1 = h'33330000, y1 = h'fffe0000 pshl #7,a1 nopx nopy ; before execution: a1 = h'55555555 ; after execution: a1 = h'aa800000 in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 392 8.5.19 [if cc] psts (store system register): dsp system control instruction format abstract code cycle dc bit psts mach,dz mach ? dz 111110********** 110011010000zzzz 1 psts macl,dz macl ? dz 111110********** 110111010000zzzz 1 dct psts mach,dz if dc = 1, mach ? dz if 0, nop 111110********** 110011100000zzzz 1 dct psts macl,dz if dc = 1, macl ? dz if 0, nop 111110********** 110111100000zzzz 1 dcf psts mach,dz if dc = 0, mach ? dz if 1, nop 111110********** 110011110000zzzz 1 dcf psts macl,dz if dc = 0, macl ? dz if 1, nop 111110********** 110111110000zzzz 1 description: stores the contents of the mach and macl registers in the dz operand. 
when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. the dc, n, z, v, and gt bits of the dsr register are not updated. note: though psts, movx and movy can be designated in parallel, their execution may take 2 cycles. operation: /* mach -> dz */ { if(dsp_unconditional_update) { /* unconditional operation */ dsp_reg[ex2_dz_no] = mach; if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } } 393 else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg[ex2_dz_no] = mach; if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } } } break; /* macl -> dz */ { if(dsp_unconditional_update) { /* unconditional operation */ dsp_reg[ex2_dz_no] = macl; if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; } } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg[ex2_dz_no] = macl; if(ex2_dz_no==0) { a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; } else if(ex2_dz_no==1) { a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; 394 } } } break; examples: psts mach,a0 nopx nopy ; before execution: a0 = h'123456789a, mach = h'88888888 ; after execution: a0 = h'ff88888888, mach = h'88888888 395 8.5.20 [if cc]psub (subtract with condition): dsp arithmetic operation instruction format abstract code cycle dc bit psub sx,sy,dz sx C sy ? 
dz 111110********** 10100001xxyyzzzz 1 update dct psub sx,sy,dz if dc = 1, sx C sy ? dz if 0, nop 111110********** 10100010xxyyzzzz 1 dcf psub sx,sy,dz if dc = 0, sx C sy ? dz if 1, nop 111110********** 10100011xxyyzzzz 1 description: subtracts the contents of the sy operand from the sx operand and stores the result in the dz operand. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are updated. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true. operation: { switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } switch (ex2_sy) { 396 case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2; carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu_dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb); borrow_bit = !carry_bit; dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit; overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "fixed_pt_overflow_protection.c" if(dsp_unconditional_update) { /* unconditional operation */ #include "fixed_pt_unconditional_update.c" #include "fixed_pt_minus_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg[ex2_dz_no] = dsp_alu_dst; 
        if(ex2_dz_no==0) {
            a0g = dsp_alu_dstg & mask000000ff;
            if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00;
        }
        else if(ex2_dz_no==1) {
            a1g = dsp_alu_dstg & mask000000ff;
            if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00;
        }
    }
}
break;

examples:
psub x0,y0,a0 nopx nopy
; before execution: x0 = h'55555555, y0 = h'33333333, a0 = h'123456789a
; after execution: x0 = h'55555555, y0 = h'33333333, a0 = h'0022222222

in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0].

8.5.21 psub pmuls (subtraction & multiply signed by signed): dsp arithmetic operation

format              abstract                              code                                cycle  dc bit
psub sx,sy,du       sx - sy -> du                         111110********** 0110eeffxxyygguu   1      update
pmuls se,sf,dg      msw of se * msw of sf -> dg

description: subtracts the contents of the sy operand from the sx operand and stores the result in the du operand. the contents of the top word of the se and sf operands are multiplied as signed and the result stored in the dg operand. these two processes are executed simultaneously in parallel. the dc bit of the dsr register is updated according to the results of the alu operation and the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated according to the results of the alu operation.
operation: { dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2; carry_bit=((dsp_alu_src1_msb | !dsp_alu_src2_msb)&& !dsp_alu_dst_msb)| (dsp_alu_src1_msb & !dsp_alu_src2_msb); borrow_bit = !carry_bit; dsp_alu_dstg_lsb8=dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit; overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "../d_3operand.d/fixed_pt_overflow_protection.c" switch (ex2_du) { case 0x0: x0 = dsp_alu_dst; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst==0); break; case 0x1: y0 = dsp_alu_dst; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst==0); 399 break; case 0x2: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | maskffffff00; negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_ lsb8==0); break; case 0x3: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & mask000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | maskffffff00; negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg _lsb8==0); break; } #include "../d_3operand.d/fixed_pt_minus_dc_bit.c" } break; examples: psub a0,m0,a0 pmuls x0,y0, m0 nopx nopy ; before execution: x0 = h'00020000, y0 = h'fffe0000, m0 = h'33333333, a0 = h'0022222222 ; after execution: x0 = h'00020000, y0 = h'fffe0000, m0 = h'fffffff8, a0 = h'55555555 400 8.5.22 psubc (subtraction with carry): dsp arithmetic operation instruction format abstract code cycle dc bit psubc sx,sy,dz sx C sy C dc ? dz 111110********** 10100000xxyyzzzz 1 borrow description: subtracts the contents of the sy operand and the dc bit from the sx operand and stores the result in the dz operand. the dc bit of the dsr register is updated as the borrow flag. the n, z, v, and gt bits of the dsr register are also updated. note: after the psubc instruction is executed, the dc bit is updated as the borrow flag without regard to the cs bit. 
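The subtract-with-borrow behaviour described above can be sketched in C as follows. The names `psubc_result` and `psubc_sketch` are hypothetical. The sketch assumes both sources are 32-bit registers that are sign-extended into the 8 guard bits, treats the borrow as the borrow out of the 32-bit subtraction, and leaves out overflow protection and the n, z, v, and gt flags.

```c
#include <stdint.h>

/* Hypothetical result type: 40-bit difference plus the new dc (borrow). */
typedef struct {
    uint64_t value40;   /* guard bits in bits 39..32, data in bits 31..0 */
    unsigned borrow;    /* 1 when the 32-bit subtraction borrowed */
} psubc_result;

/* Hypothetical helper: sx - sy - dc on sign-extended 32-bit operands.
 * Assumption: saturation and the other dsr flags are not modelled. */
psubc_result psubc_sketch(uint32_t sx, uint32_t sy, unsigned dc)
{
    psubc_result r;
    int64_t a = (int64_t)(int32_t)sx;
    int64_t b = (int64_t)(int32_t)sy;
    int64_t diff = a - b - (int64_t)(dc & 1u);

    r.value40 = (uint64_t)diff & 0xffffffffffull;
    /* Unsigned comparison gives the borrow out of bit 31. */
    r.borrow = ((uint64_t)sx < (uint64_t)sy + (dc & 1u)) ? 1u : 0u;
    return r;
}
```

Chaining calls and feeding each `borrow` into the next `dc` is the usual way such an instruction is used for multi-word subtraction.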
operation: { switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; 401 case 0x3: dsp_alu_src2 = m1; break; } if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2 - dspdcbit; carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu _dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb); borrow_bit = !carry_bit; dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit; overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov); #include "fixed_pt_overflow_protection.c" #include "fixed_pt_unconditional_update.c" #include "fixed_pt_dc_always_borrow.c" } break; example : cs[2:0]=***: always carry or borrow mode psubc x0,y0,m0 nopx nopy ; before execution: x0 = h'33333333, y0 = h'55555555 m0 = h'0012345678, dc = 0 ; after execution: x0 = h'33333333, y0 = h'55555555 m0 = h'ffddddddde, dc = 1 psubc x0,y0,m0 nopx nopy ; before execution: x0 = h'33333333, y0 = h'55555555 m0 = h'0012345678, dc = 1 ; after execution: x0 = h'33333333, y0 = h'55555555 m0 = h'ffdddddddd, dc = 1 402 8.5.23 [if cc] pxor (logical exclusive or): dsp logical operation instruction format abstract code cycle dc bit pxor sx,sy,dz sx ^ sy ? dz, clear lsw of dz 111110********** 10100101xxyyzzzz 1 update dct pxor sx,sy,dz if dc = 1, sx^sy ? dz, clear lsw of dz; if 0, nop 111110********** 10100110xxyyzzzz 1 dcf pxor sx,sy,dz if dc = 0, sx^sy ? 
dz clear lsw of dz; if 1, nop 111110********** 10100111xxyyzzzz 1 description: takes the exclusive or of the top word of the sx operand and the top word of the sy operand, stores the result in the top word of the dz operand, and clears the bottom word of dz with zeros. when dz is a register that has guard bits, the guard bits are also zeroed. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. the dc, n, z, v, and gt bits are not updated when conditions are specified, even if the conditions are true. note: the bottom word of the destination register and the guard bits are ignored when the dc bit is updated. operation: { switch (ex2_sx) { case 0x0: dsp_alu_src1 = x0; break; case 0x1: dsp_alu_src1 = x1; break; case 0x2: dsp_alu_src1 = a0; break; case 0x3: dsp_alu_src1 = a1; break; } switch (ex2_sy) { case 0x0: dsp_alu_src2 = y0; break; 403 case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } dsp_alu_dst_hw = dsp_alu_src1_hw ^ dsp_alu_src2_hw; if(dsp_unconditional_update) { /* unconditional operation */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) a0g = 0x0; /* clear guard bits */ else if (ex2_dz_no==1) a1g = 0x0; carry_bit = 0x0; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst_hw==0); overflow_bit = 0x0; #include "logical_dc_bit.c" } else if(dsp_condition_match) { /* conditional operation and match */ dsp_reg_wd[ex2_dz_no*2] = dsp_alu_dst_hw; dsp_reg_wd[ex2_dz_no*2+1] = 0x0; /* clear lsw */ if (ex2_dz_no==0) a0g = 0x0; /* clear guard bits */ else if (ex2_dz_no==1) a1g = 0x0; } } break; 404 example: pxor x0,y0,a0 nopx nopy ; before execution: x0 = 
h'33333333, y0 = h'55555555, a0 = h'123456789a
; after execution: x0 = h'33333333, y0 = h'55555555, a0 = h'0066660000

in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0].

section 9 processing states

9.1 state transitions

the cpu has five processing states: reset, exception processing, bus release, program execution, and power-down. the transitions between the states are shown in figure 9-1.

[figure 9-1 transitions between processing states. states shown: reset state (power-on reset state, manual reset state), exception-handling state, bus-released state, and power-down state (sleep mode, standby mode, hardware standby function, module standby function).]

note: sh-3 (sh7702, sh7707, sh7708), sh-3e: power-on reset: reset = 0, breq = 1; manual reset: reset = 0, breq = 0.
sh-3 (sh7709), sh3-dsp: power-on reset: resetp = 0; manual reset: resetm = 0.

9.1.1 reset state

in the reset state, the cpu is reset. on the sh-3 (sh7702, sh7707, sh7708) and sh-3e, this occurs when the reset pin goes low. when the breq pin is high, the result is a power-on reset; when it is low, a manual reset occurs. on the sh-3 (sh7709) and sh3-dsp, a power-on reset occurs when the resetp pin is low, and a manual reset occurs when the resetm pin is low.

9.1.2 exception processing state

the exception processing state is a transient state that occurs when the cpu's processing flow is altered by exception processing sources such as resets, general exceptions, or interrupts. for a reset, the cpu branches to h'a0000000 and starts executing the user-created exception processing program.
for a general exception or interrupt, the program counter (pc) is saved in the save program counter (spc), and the status register (sr) is saved in the save status register (ssr). the cpu then branches to the start address of the user-created exception service routine, which is the sum of the vector base address and the vector offset, and enters the program execution state.

9.1.3 program execution state

in the program execution state, the cpu sequentially executes the program.

9.1.4 power-down state

in the power-down state, cpu operation halts and power consumption declines. the sleep instruction places the cpu in the power-down state. this state takes four forms: sleep mode, standby mode, hardware standby mode, and the module standby function. see section 9.2 for more details.

9.1.5 bus release state

in the bus release state, the cpu releases access rights to the bus to the device that has requested them.

9.2 power-down state

in addition to the ordinary program execution states, the cpu also has a power-down state in which cpu operation halts and power consumption is lowered (table 9-1). the power-down state takes four forms: sleep mode, standby mode, hardware standby mode, and the module standby function.

9.2.1 sleep mode

when the standby bit (stby) of the standby control register (stbcr) is cleared to 0 and the sleep instruction is executed, the cpu enters sleep mode. in sleep mode, the cpu halts but the contents of the cpu and cache registers are maintained. operation of the on-chip peripheral modules continues. sleep mode is canceled by a reset or an interrupt. the cpu first enters the exception processing state and then makes the transition to the normal program execution state.

9.2.2 standby mode

when the standby bit (stby) of the standby control register (stbcr) is set to 1 and the sleep instruction is executed, the cpu enters standby mode.
in standby mode, the cpu, the on-chip peripheral modules, and the oscillator all halt. however, the contents of the cpu and cache registers are maintained. standby mode is canceled by a reset or an interrupt. if a reset is used, the cpu enters the exception processing state after the oscillator stabilization time has elapsed and then makes the transition to the normal program execution state. if an interrupt is used, the cpu enters the exception processing state after the oscillator stabilization time set in the wdt has elapsed and then makes the transition to the normal program execution state. in this mode, power consumption drops markedly, since the oscillator stops.

9.2.3 hardware standby mode

the cpu enters hardware standby mode when the ca pin is set to low level. as in the standby mode entered with the sleep instruction, in hardware standby mode all modules halt except those that run on the rtc clock.

9.2.4 module standby function

the timer (tmu), real-time clock (rtc), and serial communication interface (sci) each have a module standby function. when a module stop (mstp) bit of the standby control register (stbcr) is set to 1, the supply of the clock to the corresponding module is halted. this function can be used to reduce power consumption both in the normal program execution state and in sleep mode. when the module standby function is being used, the status of the external pins of the on-chip peripheral modules differs depending on the module. the external pins of the tmu maintain their status prior to standby. the external pins of the sci are reset. to cancel the module standby function, either clear the mstp bits to 0 or perform a reset.
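The way the stby bit selects between sleep mode and standby mode can be written down as a small C sketch. All names here (`power_down_mode`, `power_down_effect`, `mode_entered_by_sleep`, `effect_of`) are hypothetical and only cover the two modes entered with the sleep instruction; hardware standby and the module standby function are not modelled. The oscillator is assumed to keep running in sleep mode, which is consistent with the statement that it stops in standby mode.

```c
#include <stdbool.h>

/* Hypothetical names; only the two sleep-instruction modes are covered. */
enum power_down_mode { PD_SLEEP, PD_STANDBY };

struct power_down_effect {
    bool cpu_runs;
    bool peripherals_run;
    bool oscillator_runs;
};

/* Mode entered when the sleep instruction executes, given the stby bit. */
enum power_down_mode mode_entered_by_sleep(bool stby_bit)
{
    return stby_bit ? PD_STANDBY : PD_SLEEP;
}

/* What keeps running in each mode, as described in the text. */
struct power_down_effect effect_of(enum power_down_mode mode)
{
    struct power_down_effect e = { false, false, false }; /* cpu halts in both */
    if (mode == PD_SLEEP) {
        e.peripherals_run = true;   /* on-chip peripheral modules continue */
        e.oscillator_runs = true;   /* assumption: oscillator stays on */
    }
    return e;
}
```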
Table 9-1 Power-Down State

Mode | Entering procedure | Oscillator | CPU | CPU register | On-chip memory | On-chip peripheral modules | Pins | External memory | Canceling procedure
Sleep mode | Execute SLEEP instruction with STBY bit of STBCR cleared to 0 | Run | Halt | Held | Held | Run | Held | Refresh | 1. Interrupt 2. Reset
Standby mode | Execute SLEEP instruction with STBY bit of STBCR set to 1 | Halt | Halt | Held | Held | Halt * | Held | Self-refresh | 1. Interrupt 2. Reset
Hardware standby mode | Set CA pin to low level | Halt | Halt | Held | Held | Halt * | Held | Self-refresh |
Module standby function | Set MSTP bit of STBCR to 1 | Run | Run | Held | Held | Specified module halts | Held | Refresh | 1. Set MSTP bit to 0 2. Reset

Note: * Differs depending on the on-chip peripheral module. Refer to the hardware manual for the SH-3, SH-3E, and SH3-DSP for details.

Section 10 Pipeline Operation
This section describes the operation of the pipelines for each instruction. This information is provided to allow calculation of the required number of CPU instruction execution states (system clock cycles).

10.1 Basic Configuration of Pipelines

10.1.1 Five-Stage Pipeline
Pipelines are composed of the following five stages:
- IF (instruction fetch): fetches an instruction from the memory in which the program is stored.
- ID (instruction decode): decodes the fetched instruction.
- EX (instruction execution): performs data operations and address calculations according to the results of decoding.
- MA (memory access): accesses data in memory for instructions that involve memory access. For instructions that do not involve memory access, the resulting data is held as is, and the stage is written in lowercase letters as "ma".
- WB (write back): returns the result of the memory access (data) to a register for instructions that involve memory access. For instructions that do not involve memory access, the data held in the ma stage is returned to the register.
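As a rough illustration of how the five stages overlap when nothing stalls, the following Python sketch lays out an ideal pipeline with one new instruction entering per slot. The function names are hypothetical, and the model assumes that every instruction uses all five stages and that no contention occurs.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def ideal_pipeline(n_instructions: int) -> list:
    """Return one row per instruction; row[k] is the stage in slot k ("" if idle)."""
    n_slots = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * n_slots
        for offset, stage in enumerate(STAGES):
            row[i + offset] = stage  # instruction i enters the pipeline in slot i
        rows.append(row)
    return rows


def render(rows: list) -> str:
    """Format the rows as a text diagram, one line per instruction."""
    lines = []
    for i, row in enumerate(rows, start=1):
        cells = " ".join(f"{cell:<2}" for cell in row)
        lines.append(f"instruction {i}: {cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(ideal_pipeline(6)))
```

Reading a column of the diagram from top to bottom gives the stages that share one slot; in the steady state each column holds five different stages.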
Instructions are executed using a pipeline consisting of five stages. The stages of successive instructions overlap as execution proceeds, so that at any given moment up to five instructions are being executed simultaneously. The basic flow of the pipeline is shown in figure 10-1. Each period during which a single stage is executed is called a slot (marked by a dedicated symbol in the figures). All instructions have at least three stages: IF, ID, and EX. Some also have MA and WB stages. The way the pipeline flows also varies with the type of instruction: some instructions contain two MA stages, some include access to the multiplier (mm), and so on. Contention can also occur, for example between IF and MA. If contention occurs, the flow of the pipeline changes.

[Figure 10-1 Basic Structure of Pipeline Flow: instructions 1 to 6 each pass through IF, ID, EX, MA, and WB, each starting one slot after the previous one; axes: instruction stream, time (slots)]

10.1.2 Slot and Pipeline Flow
The time period in which a single stage operates is called a slot. Slots must follow the rules described below.

Instruction execution
Each stage (IF, ID, EX, MA, WB) of an instruction must be executed in one slot. Two or more stages of the same instruction cannot be executed within one slot (figure 10-2), with the exception of WB and MA.

[Figure 10-2 Impossible Pipeline Flow 1. Note: ID and EX of instruction 1 are being executed in the same slot.]

Slot sharing
A slot may contain at most one stage from each other instruction, and that stage must be different from the stage of the first instruction. Identical stages from two different instructions can never be executed within the same slot (figure 10-3).
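The two slot rules above can be expressed as a small checker. This is a minimal sketch under stated assumptions: a slot is represented as a list of hypothetical (instruction number, stage) pairs, and the WB/MA exception is read as allowing MA and WB of the same instruction to share a slot.

```python
from collections import Counter


def check_slot(slot: list) -> list:
    """Return a list of rule violations for one slot.

    `slot` is a list of (instruction_number, stage) pairs executed together.
    """
    problems = []

    # Rule 1: an instruction executes one stage per slot (MA with WB excepted).
    stages_by_instruction = {}
    for instruction, stage in slot:
        stages_by_instruction.setdefault(instruction, set()).add(stage)
    for instruction, stages in sorted(stages_by_instruction.items()):
        if len(stages) > 1 and stages != {"MA", "WB"}:
            problems.append(
                f"instruction {instruction} executes {sorted(stages)} in one slot"
            )

    # Rule 2: identical stages of different instructions never share a slot.
    counts = Counter(stage for _, stage in slot)
    for stage, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"stage {stage} appears {n} times in one slot")

    return problems


if __name__ == "__main__":
    print(check_slot([(1, "EX"), (2, "ID"), (3, "IF")]))  # a legal slot
    print(check_slot([(1, "ID"), (1, "EX"), (2, "IF")]))  # the case of figure 10-2
    print(check_slot([(1, "ID"), (2, "ID")]))             # the case of figure 10-3
```

The second and third calls correspond to the two impossible flows named in the text: two stages of one instruction in a slot, and the same stage of two instructions in a slot.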
[Figure 10-3 Impossible Pipeline Flow 2. Note: the same stage of another instruction is being executed in the same slot.]

10.1.3 Number of Cycles Required for Execution of One Slot
The number of states (system clock cycles) S required to execute one slot is determined as follows:
- S = (the number of cycles of the stage with the highest number of cycles among all instruction stages contained in the slot). This means that the instruction with the longest stage stalls the others with shorter stages.
- The number of execution cycles for each stage is:
  IF: the number of memory access cycles for instruction fetch
  ID: always one cycle
  EX: always one cycle
  MA: the number of memory access cycles for data access
  WB: always one cycle

As an example, figure 10-4 shows the flow of a pipeline in which the IF (memory access for instruction fetch) of instructions 1 and 2 takes two cycles, the MA (memory access for data access) of instruction 1 takes three cycles, and all other stages take one cycle. The dashes indicate that the instruction is being stalled. Refer to the hardware manual for information on the number of clock cycles in each case.

[Figure 10-4 Slots Requiring Multiple Cycles; the number of cycles of each slot is shown in parentheses]

10.1.4 Number of Instruction Execution Cycles
The number of instruction execution cycles is counted as the interval between the starts of successive EX stages. The number of cycles between the start of the EX stage of instruction 1 and the start of the EX stage of the following instruction (instruction 2) is the execution time of instruction 1. Figure 10-5 shows an example of the way in which the number of instruction execution cycles is counted.
In this example, the flow of the pipeline is such that the EX stage interval between instructions 1 and 2 is two cycles. Therefore, the execution time of instruction 1 is two cycles. Likewise, the EX stage interval between instructions 2 and 3 is three cycles, so the execution time of instruction 2 is three cycles. If the program ends with instruction 3, the execution time of instruction 3 is calculated as the interval between the EX stage of instruction 3 and the EX stage of a hypothetical instruction 4 that follows it, taken to be MOV Rm,Rn. In this example, the execution time of instruction 3 is two cycles. The execution time of instructions 1 through 3 is therefore seven cycles (2 + 3 + 2 = 7). In this example, the MA of instruction 1 and the IF of instruction 4 are in contention. For information on operation when MA and IF are in contention, refer to section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

[Figure 10-5 How Instruction Execution Cycles Are Counted; the hypothetical instruction 4 is MOV Rm,Rn, and the number of cycles of each slot is shown in parentheses]

10.2 Contention
Contention occurs in the following seven situations. When contention occurs in a particular stage, that stage is stored (held back for a later slot) and the next and subsequent slots are executed.
(1) Contention between instruction fetch (IF) and memory access (MA)
(2) Contention caused by a memory load instruction
(3) Contention caused by an SR update instruction
(4) Contention caused by accessing the multiplier
(5) FPU contention (SH-3E only)
(6) Contention between a DSP data operation instruction and a store instruction (SH3-DSP only)
(7) Contention between a transfer between DSP registers and a memory load or store operation (SH3-DSP only)

10.2.1 Contention between Instruction Fetch (IF) and Memory Access (MA)

Basic operation when IF and MA are in contention
The IF and MA stages both access memory, so they cannot operate simultaneously. If the IF and MA stages both try to access memory within the same slot, the IF stage is stored and the next slot is executed. If contention with another MA stage occurs in that next slot, the IF stage is stored again and the slot after that is executed. Figure 10-6 illustrates operation when IF and MA are in contention.

[Figure 10-6 Operation When IF and MA Are in Contention. (a) When there is no subsequent MA stage: the MA of instruction 1 and the IF of instruction 4 contend at slot D, and the IF is stored at D. (b) When there is a subsequent MA stage: the MA of instruction 1 and the IF of instruction 4 contend at slot D; the ID and IF are stored at D, and the IF is stored again at E.]

The operation when there is contention between IF and MA and no subsequent MA stage is shown in (a) of figure 10-6. IF and MA are in contention in slot D. In this case, the IF stage is stored and the following slot, E, is executed. In slot E, ma and IF are in contention, but the IF stage is not stored because the ma stage does not generate a bus cycle.

The operation when there is contention between IF and MA and there is a subsequent MA stage is shown in (b) of figure 10-6. There are MA stages in slots D and E, and MA is in contention with IF in slot D. In this case, the ID and IF of slot D are stored and then executed in slot E. However, contention between IF and MA occurs again in slot E, so the IF stage is stored again and then executed in the next slot, F.

Relationship between IF and the location of instructions in memory
When instructions are located in memory, the SuperH microcomputer accesses the memory in 32-bit units. SuperH instructions are all fixed at 16 bits, so in principle two instructions can be fetched in a single IF stage access. Whether an IF fetches one or two instructions depends on the memory location (word or longword boundary). If an instruction is located on a longword boundary, its IF fetches two instructions in one access, so the IF of the next instruction does not generate a bus cycle to fetch an instruction from memory. The IF of the instruction after that again fetches two instructions, so the IF that follows it does not generate a bus cycle either. In other words, the IF of every instruction that starts on a longword boundary (the bottom two bits of the instruction address are 00, that is, A1 = 0 and A0 = 0) fetches two instructions, and the IF of the instruction that follows it does not generate a bus cycle.
IFs that do not generate bus cycles are written in lowercase as "if". These ifs always take one cycle. When a branch results in a fetch from an instruction that starts on a word boundary (the bottom two bits of the instruction address are 10, that is, A1 = 1 and A0 = 0), the bus cycle of the IF fetches only the specified instruction rather than two instructions. The IF of the next instruction therefore generates a bus cycle and fetches two instructions. Figure 10-7 illustrates these operations.

[Figure 10-7 Relationship between IF and Location of Instructions in Memory (memory is 32 bits wide; IF: bus cycle generated; if: no bus cycle). Upper panel: fetching from an instruction (instruction 1) located on a longword boundary. Lower panel: fetching from an instruction (instruction 2) located on a word boundary.]

Relationship between position of instructions located in memory and contention between IF and MA
When instructions are located in memory, there are instruction fetch stages (if, written in lowercase) that do not generate bus cycles, as explained above. When an if is in contention with an MA, the slot does not split, as it does when an IF and an MA are in contention, because ifs and MAs can be executed simultaneously. Such slots execute in the number of cycles the MA requires for memory access, as illustrated in figure 10-8.
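The alignment rule can be checked with a short model. The Python sketch below is illustrative only: `fetch_kinds` is a hypothetical helper, and it assumes straight-line execution from the given start address (no further branches), 32-bit memory accesses, and 16-bit instructions, as described in the text.

```python
def fetch_kinds(start_address: int, count: int) -> list:
    """Classify the fetch of `count` consecutive 16-bit instructions.

    "IF" means the fetch generates a bus cycle; "if" means it does not,
    because the instruction arrived with the previous 32-bit access.
    """
    if start_address % 2:
        raise ValueError("instructions are 16 bits wide and start on word boundaries")
    kinds = []
    for i in range(count):
        address = start_address + 2 * i
        on_longword_boundary = address % 4 == 0  # A1 = 0 and A0 = 0
        # The first fetch always needs a bus cycle. After that, only
        # instructions on a longword boundary start a new 32-bit access.
        kinds.append("IF" if i == 0 or on_longword_boundary else "if")
    return kinds


if __name__ == "__main__":
    print(fetch_kinds(0x1000, 4))  # start on a longword boundary
    print(fetch_kinds(0x1002, 4))  # branch target on a word boundary
```

Starting on a longword boundary gives an alternating IF, if pattern, while a branch target on a word boundary costs two bus-cycle fetches in a row before the alternation resumes.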
When programming, avoid contention between MA and IF whenever possible, and pair MAs with ifs to increase the instruction execution speed. Instructions that have 4 (5)-stage pipelines of IF, ID, EX, MA, (WB) prevent stalls when they are located so that they start on longword boundaries in memory (the bottom two bits of the instruction address are 00, that is, A1 = 0 and A0 = 0), because the MA of the instruction then falls in the same slot as the ifs that follow.

[Figure 10-8 Relationship between the Location of Instructions in Memory and Contention between IF and MA (memory is 32 bits wide; legend: store / do not store). Notes: MA in slot A is in contention with if, so no store occurs; MA in slot B is in contention with IF, so a store occurs. In slot C, MA and if are in contention, so no store occurs.]

10.2.2 Effects of Memory Load Instructions on Pipelines
Instructions that load from memory access data in memory at the MA stage of the pipeline. For a load instruction (instruction 1) and the instruction that follows it (instruction 2), the EX stage of instruction 2 starts before the MA stage of instruction 1 ends. When instruction 2 uses the data that instruction 1 is loading, the contents of that register are not yet ready, so any slot containing the MA of instruction 1 and the EX of instruction 2 will split. No split occurs, however, when instruction 2 is MAC @Rm+,@Rn+ and the destination of load instruction 1 is the same register as Rm. The number of cycles in the slot generated by the split is the number of MA cycles plus the number of IF (or if) cycles, as illustrated in figure 10-9.
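A simple way to find this kind of contention in a listing is to look for a load whose destination register is read by the very next instruction. The sketch below is a minimal model under stated assumptions: the `Instr` record and its fields are hypothetical, registers are plain strings, and the MAC @Rm+,@Rn+ exception mentioned above is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    text: str                       # assembly text, for display only
    is_load: bool = False           # True for instructions that load from memory
    dest: Optional[str] = None      # register written, if any
    sources: Tuple[str, ...] = ()   # registers read


def load_use_pairs(program: list) -> list:
    """Return the indices of loads whose result is used by the next instruction."""
    hits = []
    for i in range(len(program) - 1):
        first, second = program[i], program[i + 1]
        if first.is_load and first.dest is not None and first.dest in second.sources:
            hits.append(i)
    return hits


if __name__ == "__main__":
    stalled = [
        Instr("MOV.W @R0,R1", is_load=True, dest="R1", sources=("R0",)),
        Instr("ADD R1,R2", dest="R2", sources=("R1", "R2")),
    ]
    reordered = [
        Instr("MOV.W @R0,R1", is_load=True, dest="R1", sources=("R0",)),
        Instr("ADD R3,R4", dest="R4", sources=("R3", "R4")),
        Instr("ADD R1,R2", dest="R2", sources=("R1", "R2")),
    ]
    print(load_use_pairs(stalled))    # the load at index 0 is flagged
    print(load_use_pairs(reordered))  # nothing is flagged
```

Placing one unrelated instruction between the load and its first use removes the flagged pair, which is the reordering the section recommends.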
This means that execution slows down if the instruction that uses the result of a load is placed immediately after the load instruction. The instruction that uses the result of the load does not slow down the program if it is placed one or more instructions after the load instruction.

[Figure 10-9 Effects of Memory Load Instructions on the Pipeline: load instruction 1 is MOV.W @R0,R1 and instruction 2 is ADD R1,R2]

10.2.3 Contention Due to SR Update Instructions
Instructions that overwrite the M, Q, S, and T bits of the status register (SR), called SR update instructions, use the WB stage of the pipeline. If an instruction that reads SR (instruction 2) comes immediately after such an instruction, the data to be read is not yet ready, and the EX stage of instruction 2 is stalled until the overwriting of the data in SR is complete. However, for instructions that overwrite all the bits of SR, such as LDC Rm,SR; LDC.L @Rm+,SR; or RTE, no stall occurs due to this contention. The instructions that read SR are STC SR,Rn; STC.L SR,@-Rn; and TRAPA. The status of the pipeline when a stall occurs is shown in figure 10-10. As the above makes clear, a program in which an instruction that reads SR comes immediately after an instruction that updates SR executes more slowly. If the instruction that reads SR comes at least three instructions after the instruction that updates SR, no slowdown results.

[Figure 10-10 Effect on Pipeline of SR Update Instructions: SR update instruction 1 is SETT and instruction 2 is STC SR,R1]

10.2.4 Multiplier Access Contention
A multiplier-type instruction (a multiply/accumulate instruction or a multiply instruction), or an instruction that accesses the multiply and accumulate registers (MACH, MACL), can cause contention in multiplier access. In a multiplier instruction, the multiplier operates independently of the slots after the end of the last MA. In double-precision (64-bit) multiply instructions and in multiply/accumulate instructions, the multiplier operates for three states. In single-precision (32-bit) multiply instructions, it operates for two states.

When the MA (the first MA, if there are two) of a multiplier-type instruction contends with the multiplier access (mm) of the preceding multiplier-type instruction, the MA bus cycle is extended until the mm ends. The extended MA then becomes one slot. The MA of an instruction that accesses the multiply and accumulate registers (MACH, MACL) also accesses the multiplier. As with multiplier instructions, the MA bus cycle is extended until the mm of the preceding multiplier-type instruction ends, and the extended MA becomes one slot. In particular, for the instructions (STS, STS.L) that read the multiply and accumulate registers (MACH, MACL), the MA is extended until one slot has elapsed after the end of the mm, and the extended MA becomes one slot. When the instruction has two MAs, on the other hand, the ID of the succeeding instruction is stalled for one slot.

Because multiplier-type instructions and multiply and accumulate register access instructions both have MA cycles, contention with IF may also occur. Examples of multiplier access contention are shown in figures 10-11 and 10-12. In these figures, contention between MA and IF is not taken into consideration.
[Figure 10-11 Contention between Two MAC.L Instructions]

[Figure 10-12 Contention between the MAC.L and STS.L Instructions]

10.2.5 FPU Contention (SH-3E Only)
In addition to the LDS and STS instructions, which move data between the CPU and FPU, instructions that load and store floating-point numbers also use the MA stage of the pipeline. Consequently, such instructions create contention with the IF stage. If the register that receives the result of a floating-point arithmetic instruction, an FMOV instruction, or a floating-point load instruction is read by the next instruction, execution of that next instruction is delayed by one slot (figure 10-13).

[Figure 10-13 FPU Contention 1: floating-point arithmetic instruction FADD FR1,FR2 followed by FMOV FR2,FR2]

If the LDS or LDS.L instruction is used to change the value of FPSCR, execution of the next instruction (if it is a floating-point instruction) is delayed by one slot (figure 10-14).

[Figure 10-14 FPU Contention 2: instruction 1 LDS R2,FPSCR followed by FADD FR4,FR5]

If the preceding instruction is a floating-point arithmetic instruction, execution of an instruction that reads the value of FPSCR (using the STS or STS.L instruction) is delayed by one slot (figure 10-15).

[Figure 10-15 FPU Contention 3: floating-point arithmetic instruction FADD FR6,FR9 followed by instruction 2 STS FPSCR,R3]

The FDIV and FSQRT instructions require 13 cycles in the E1 stage. During this period, no other floating-point instruction may enter the E1 stage.
If another floating-point instruction is encountered before the FDIV or FSQRT instruction has finished using the E1 stage, execution of that instruction is delayed, and it enters the E1 stage only after the FDIV or FSQRT instruction has entered the E2 stage (figure 10-16).

[Figure 10-16 FPU Contention 4: instruction 1 FDIV FR6,FR7 followed by FMOV FR8,FR10]

However, if contention arises because the preceding FDIV or FSQRT instruction and the FPU calculation that follows it use the same register, the following instruction enters the E1 stage only after the SF stage of the FDIV or FSQRT instruction has executed.

[Figure 10-17 FPU Contention 5: instruction 1 FSQRT F1 followed by FADD F1,F3]

10.2.6 Contention between DSP Data Operation Instructions and Store Instructions (SH3-DSP Only)
When a DSP operation is executed by the DSP unit and its result is stored in memory by the next instruction, contention occurs just as with memory load instructions. In such cases, the data store in the MA stage of the following instruction is extended until the data operation in the WB/DSP stage of the previous instruction ends. When the operation is executed in the EX stage by the CPU core, however, no stall cycle is produced. Figure 10-18 shows the relationship between DSP unit data operation instructions and store instructions; figure 10-19 shows the corresponding relationship for the CPU core.
[Figure 10-18 Relationship between DSP Engine Operation Instructions and Store Instructions: instruction 1 PADD X0,Y0,A0 followed by instruction 2 MOVX A0,@Ra]

[Figure 10-19 Relationship between CPU Core Operation Instructions and Store Instructions: instruction 1 ADD Ra,Rb followed by instruction 2 MOV Rb,@Rc]

10.2.7 Relationship between Load and Store Instructions (SH3-DSP Only)
When data is loaded from memory into a destination register and that register is then specified as the source operand of the following store instruction, the load of the preceding instruction is executed in the WB/DSP stage and the store of the following instruction is executed in the MA stage. These stages are executed in exactly the same cycle. Nevertheless, they do not contend. The CPU core and the DSP unit use the same data transfer method. In this case, when the data input to the internal bus is stored in the destination register, the same data is simultaneously output again to the internal bus. In effect, the store instruction's own output operation never actually takes place.

[Figure 10-20 Relationship between Load and Store Instructions in the CPU Core: instruction 1 MOV.L @Ra,Rn followed by instruction 2 MOV.L Rn,@Rb]

[Figure 10-21 Relationship between Load and Store Instructions in the DSP Unit: instruction 1 MOVS.L @R4,Ds followed by instruction 2 MOVS.L Ds,@R5]

10.3 Programming Guidelines

10.3.1 Correspondence between Contention and Instructions
The correspondence between contention and instructions can be summarized as follows.
(1) Instructions that do not cause contention
(2) Instructions where a memory access (MA) causes contention with an instruction fetch (IF)
(3) Instructions where a write back (WB) to SR causes contention with an SR update
(4) Instructions where a memory access (MA) causes contention with an instruction fetch (IF), and in addition a write back (WB) causes contention with a memory load
(5) Instructions where a memory access (MA) causes contention with an instruction fetch (IF), and in addition a write back (WB) to SR causes contention with an SR update
(6) Instructions where a memory access (MA) causes contention with an instruction fetch (IF), and in addition a multiplier access (mm) causes contention with the multiplier
(7) Instructions where a memory access (MA) causes contention with an instruction fetch (IF), a multiplier access (mm) causes contention with the multiplier, and in addition a write back (WB) causes contention with a memory load
(8) Instructions that cause contention with the MOVX.W, MOVS.W, or MOVS.L instruction

Table 10-1 shows the correspondence between types of contention and instructions.

Table 10-1 Types of Contention and Instructions
(An empty contention cell means the same contention as the row above.)

Contention | Cycles | Stages | Instructions
None | 1 | 5 | Inter-register transfer instructions
 | 1 | 5 | Inter-register operations (except multiplier-type instructions)
 | 1 | 5 | Inter-register logic operation instructions
 | 1 | 5 | Shift instructions
 | 3/1 | 3 | Conditional branch instructions
 | 2/1 | 3 | Delayed conditional branch instructions
 | 2 | 3 | Unconditional branch instructions
 | 2 | 5 | Unconditional branch instructions (PR)
 | 1 | 5 | System control instructions
 | 1 | 3 | NOP instruction
 | 5 | 5 | LDC instruction (SR)
 | 7 | 7 | LDC.L instruction
 | 4 | 5 | RTE instruction
 | 6 | 6 | Trap instruction
 | 4 | 6 | SLEEP instruction
 | 1 | 5 | DSP data operation instructions; MOVX.W (load) and MOVY.W (load) instructions
MA contends with IF | 1 | 4 | Memory store instructions
 | 1 | 5 | Memory store instructions (pre-decrement)
 | 1 | 4 | Cache instruction
 | 3 | 6 | Memory logic operation instructions
 | 1 | 4 | LDTLB instruction
 | 1 | 5 | STS.L instruction (PR)
 | 1 | 5 | STC.L instruction (excluding bank registers)
 | 2 | 6 | STC.L instruction (bank registers)
 | 1 | 5 | MOVS.W (load) and MOVS.L (load) instructions
Causes DSP operation contention | 1 | 4 | MOVX.W (store) and MOVS.L (store) instructions
Contention caused by SR update | 1 | 5 | Arithmetic calculation instructions between SR updated registers (excluding instructions involving multiplication)
 | 1 | 5 | Logical calculation instructions between SR updated registers
 | 1 | 5 | SR update shift instructions
 | 1 | 5 | SR update system control instructions
MA contends with IF; causes memory load contention | 1 | 5 | Memory load instructions
 | 1 | 5 | LDS.L instruction (PR)
 | 1 | 5 | LDC.L instruction
MA contends with IF; contention caused by SR update | 3 | 7 | SR update memory logical calculation instructions
 | 3 | 7 | TAS instruction
MA contends with IF; causes multiplier contention | 2 (to 5)* | 8 | Multiply and accumulate calculation instructions
 | 2 (to 5)* | 8 | Double-length multiply and accumulate calculation instructions
 | 1 (to 3)* | 6 | Multiplication instructions (excluding PMULS)
 | 2 (to 5)* | 8 | Double-length multiplication instructions
 | 1 | 4 | Register to MAC transfer instructions
 | 1 | 4 | Memory to MAC transfer instructions
 | 1 | 5 | MAC to memory transfer instructions
MA contends with IF; causes DSP operation contention | 1 | 4 | MOVS.W (store) and MOVS.L (store) instructions
MA contends with IF; causes multiplier contention; causes memory load contention; causes DSP operation contention | 1 | 5 | MAC/DSP to register transfer instructions
Causes contention with the MOVX.W, MOVS.W, or MOVS.L instruction | 1 | 5 | PLDS and PSTS instructions

Note: * Indicates the normal number of cycles. The figures in parentheses are the cycles when contention also occurs with the previous instruction.

10.3.2 Increasing Instruction Execution Speed
To improve instruction execution speed, consider the following when programming:
- To prevent contention between MA and IF, locate instructions that have MA stages so that they start on longword boundaries of on-chip memory (the bottom two bits of the instruction address are 00, that is, A1 = 0 and A0 = 0) wherever possible.
- The instruction that immediately follows an instruction that loads from memory should not use the destination register of the load instruction. This avoids the contention with the memory load that is triggered by the write back (WB).
- Place two instructions that do not read SR immediately after any instruction that overwrites the M, Q, S, and T bits of SR. This prevents contention with SR update instructions.
- Do not place instructions that use the multiplier consecutively (excluding PMULS).
- Immediately after a data operation using the DSP unit, do not use an instruction that transfers data to memory or to the CPU core from the register in which the operation result is stored. Placing some other instruction in between avoids the contention.
- Do not use MOVX.W, MOVS.W, or MOVS.L to perform a memory store immediately after a PLDS or PSTS instruction using the DSP unit. Also, do not specify a PLDS or PSTS instruction in parallel with a memory store instruction using MOVX.W.

10.3.3 Number of Cycles
Instructions are, as a rule, designed to require only one cycle for execution. Of these one-cycle instructions, some never cause contention and some can cause contention. Some instructions may require two or more cycles even if no contention occurs.
Instructions that require two or more cycles include branch instructions, which update the branch destination address, and instructions that access memory twice or more, such as memory logical calculation instructions and certain system control instructions. Further examples are instructions that access both memory and the multiplier, such as multiplication instructions and multiply-and-accumulate instructions. Among instructions that require two or more cycles, some never cause contention and some can cause contention. To create efficient programs, it is essential both to avoid contention and to use instructions that require few cycles to execute.

10.4 Operation of Instruction Pipelines
This section describes the operation of the instruction pipelines. By combining these descriptions with the rules given so far, the way pipelines flow in a program and the number of instruction execution cycles can be calculated. In the following figures, instruction A refers to the instruction being discussed. When IF is written in the instruction fetch stage, it may refer to either IF or if. When there is contention between IF and MA, the slot splits, but the manner of the split is not discussed in the tables, with a few exceptions. When a slot splits, see section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA), and base your calculations on the rules for pipeline operation given there.

Table 10-2 shows the number of instruction stages and the number of execution cycles as follows:
- Type: given by function
- Category: categorized by differences in instruction operation
- Instructions: gives a mnemonic for the instruction concerned
- Cycles: the number of execution cycles when there is no contention
- Stages: the number of stages in the instruction
- Contention: indicates the contention that occurs

Table 10-2 Number of Instruction Stages and Execution Cycles

Type | Category | Instructions | Cycles | Stages | Contention
Data transfer instructions | Register-register transfer instructions | MOV #imm,Rn; MOV Rm,Rn; MOVA @(disp,PC),R0; MOVT Rn; SWAP.B Rm,Rn; SWAP.W Rm,Rn; XTRCT Rm,Rn | 1 | 5 |
Data transfer instructions | Memory load instructions | MOV.W @(disp,PC),Rn; MOV.L @(disp,PC),Rn; MOV.B @Rm,Rn; MOV.W @Rm,Rn; MOV.L @Rm,Rn; MOV.B @Rm+,Rn; MOV.W @Rm+,Rn; MOV.L @Rm+,Rn; MOV.B @(disp,Rm),R0; MOV.W @(disp,Rm),R0; MOV.L @(disp,Rm),Rn; MOV.B @(R0,Rm),Rn; MOV.W @(R0,Rm),Rn; MOV.L @(R0,Rm),Rn; MOV.B @(disp,GBR),R0; MOV.W @(disp,GBR),R0; MOV.L @(disp,GBR),R0 | 1 | 5 | Contention occurs if the instruction placed immediately after this one uses the same destination register; MA contends with IF
Data transfer instructions | Memory store instructions | MOV.B Rm,@Rn; MOV.W Rm,@Rn; MOV.L Rm,@Rn; MOV.B R0,@(disp,Rn); MOV.W R0,@(disp,Rn); MOV.L Rm,@(disp,Rn); MOV.B Rm,@(R0,Rn); MOV.W Rm,@(R0,Rn); MOV.L Rm,@(R0,Rn); MOV.B R0,@(disp,GBR); MOV.W R0,@(disp,GBR); MOV.L R0,@(disp,GBR) | 1 | 4 | MA contends with IF
Data transfer instructions | Memory store instructions (pre-decrement) | MOV.B Rm,@-Rn; MOV.W Rm,@-Rn; MOV.L Rm,@-Rn | 1 | 5 | MA contends with IF
Data transfer instructions | Cache instruction | PREF @Rn | 1/2 *1 | 4 | MA contends with IF
Arithmetic instructions | Arithmetic operation instructions between registers (excluding multiply instructions) | ADD Rm,Rn; ADD #imm,Rn; EXTS.B Rm,Rn; EXTS.W Rm,Rn; EXTU.B Rm,Rn; EXTU.W Rm,Rn; NEG Rm,Rn; SUB Rm,Rn | 1 | 5 |
Arithmetic instructions | SR update arithmetic operation instructions between registers (excluding multiply instructions) | ADDC Rm,Rn; ADDV Rm,Rn; CMP/EQ #imm,R0; CMP/EQ Rm,Rn; CMP/HS Rm,Rn; CMP/GE Rm,Rn; CMP/HI Rm,Rn; CMP/GT Rm,Rn; CMP/PL Rn; CMP/PZ Rn; CMP/STR Rm,Rn | 1 | 5 | Contention occurs if the instruction following this instruction, or the instruction after that, reads from SR
431 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention arithmetic instructions (cont) sr update arithmetic operation instruction between registers (excluding multiply instructions) div1 rm,rn div0s rm,rn div0u dt rn negc rm,rn subc rm,rn subv rm,rn 1 5 ? contention occurs if the instruction following this instruction, or the instruction after that, reads from sr. multiply/ accumulate instruction mac.w @rm+,@rn+ 2 (to 5) * 2 8 ? causes multiplier contention ? ma contends with if double length/ multiply accumulate instruction mac.l @rm+,@rn+ 2 (to 5) * 2 8 ? causes multiplier contention ? ma contends with if multiplic- ation instruction muls.w rm,rn mulu.w rm,rn 1 (to 3) * 2 6 ? causes multiplier contention ? ma contends with if double length multipli- cation instructions dmuls.l rm,rn dmulu.l rm,rn 2 (to 5) * 2 8 ? causes multiplier contention ? ma contends with if 432 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention logic operation instructions register to register logic operation instructions and rm,rn and #imm,r0 not rm,rn or rm,rn or #imm,r0 xor rm,rn xor #imm,r0 15 logical calculation instructions between sr updated registers tst rm,rn tst #imm,r0 1 5 ? contention occurs if the instruction following this instruction, or the instruction after that, reads from sr memory logic operations instructions and.b #imm,@(r0,gbr) or.b #imm,@(r0,gbr) xor.b #imm,@(r0,gbr) 3 6 ? ma contends with if sr update memory logical calculation instructions tst.b #imm,@(r0,gbr) 3 7 ? contention occurs if the instruction following this instruction, or the instruction after that, reads from sr ? ma contends with if tas instruction tas.b @rn 3/4 * 3 7 ? 
ma contends with if 433 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention shift instructions shift instructions shll2 rn shlr2 rn shll8 rn shlr8 rn shll16 rn shlr16 rn shad rm,rn shld rm,rn 15 sr update shift instructions rotl rn rotr rn rotcl rn rotcr rn shal rn shar rn shll rn shlr rn 1 5 ? contention occurs if the instruction following this instruction, or the instruction after that, reads from sr branch instructions conditional branch instructions bf label bt label 3/1 * 4 3 delayed conditional branch instructions bf/s label bt/s label 2/1 * 4 3 uncondi- tional branch instructions bra label braf rm jmp @rm rts 23 uncondi- tional branch instructions (pr) bsr label bsrf rm jsr @rm 25 434 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention system control instructions system control alu instructions ldc rm,gbr ldc rm,vbr ldc rm,ssr ldc rm,spc ldc rm,mod ldc rm,re ldc rm,rs ldc rm,r0_bank ldc rm,r1_bank ldc rm,r2_bank ldc rm,r3_bank ldc rm,r4_bank ldc rm,r5_bank ldc rm,r6_bank ldc rm,r7_bank 1/3 * 5 5 setrc rm setrc #imm ldre @(disp,pc) ldrs @(disp,pc) 35 lds rm,pr stc sr,rn stc gbr,rn stc vbr,rn stc ssr,rn stc spc,rn stc mod,rn stc re,rn stc rs,rn stc r0_bank,rn stc r1_bank,rn stc r2_bank,rn stc r3_bank,rn stc r4_bank,rn 15 435 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention system control instructions (cont) system control alu instructions stc r5_bank,rn stc r6_bank,rn stc r7_bank,rn sts pr,rn 15 sr update system control instructions clrs clrt sets sett 1 5 ? contention occurs if the instruction following this instruction, or the instruction after that, reads from sr ldtlb instruction ldtlb 1 4 ? ma contends with if nop instruction nop 13 ldc instructions (sr) ldc rm,sr 55 ldc.l instructions (sr) ldc.l @rm+,sr 77 lds.l instructions (pr) lds.l @rm+,pr 1 5 ? 
contention occurs when an instruction that uses the same destination register is placed immediately after this instruction ? ma contends with if sts.l instruction (pr) sts.l pr,@Crn 1 5 ? ma contends with if 436 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention system control instructions (cont) ldc.l instructions ldc.l @rm+,gbr ldc.l @rm+,vbr ldc.l @rm+,ssr ldc.l @rm+,spc ldc.l @rm+,mod ldc.l @rm+,re ldc.l @rm+,rs ldc.l @rm+,r0_bank ldc.l @rm+,r1_bank ldc.l @rm+,r2_bank ldc.l @rm+,r3_bank ldc.l @rm+,r4_bank ldc.l @rm+,r5_bank ldc.l @rm+,r6_bank ldc.l @rm+,r7_bank 1/5 * 6 5 ? contention occurs when an instruction that uses the same destination register is placed immediately after this instruction ? ma contends with if stc.l instructions stc.l sr,@Crn stc.l gbr,@Crn stc.l vbr,@Crn stc.l ssr,@Crn stc.l spc,@Crn stc.l mod,@-rn stc.l re,@-rn stc.l rs,@-rn 1/2 * 1 5 ? ma contends with if stc.l r0_bank,@Crn stc.l r1_bank,@Crn stc.l r2_bank,@Crn stc.l r3_bank,@Crn stc.l r4_bank,@Crn stc.l r5_bank,@Crn stc.l r6_bank,@Crn stc.l r7_bank,@Crn 2 6 ? ma contends with if 437 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention system control instructions (cont) register ? mac/dsp transfer instruction clrmac lds rm,mach lds rm,macl lds rm,dsr lds rm,a0 lds rm,x0 lds rm,x1 lds rm,y0 lds rm,y1 1 4 ? contention occurs with multiplier ? ma contends with if memory ? mac/dsp transfer instructions lds.l @rm+,mach lds.l @rm+,macl lds.l @rm+,dsr lds.l @rm+,a0 lds.l @rm+,x0 lds.l @rm+,x1 lds.l @rm+,y0 lds.l @rm+,y1 1 4 ? contention occurs with multiplier ? ma contends with if mac/dsp ? register transfer instruction sts mach,rn sts macl,rn sts dsr,rn sts a0,rn sts x0,rn sts x1,rn sts y0,rn sts y1,rn 1 5 ? contention occurs with multiplier ? 
contention occurs when an instruction that uses the same destination register is placed immediately after this instruction ? ma contends with if mac/dsp ? memory transfer instruction sts.l mach,@Crn sts.l macl,@Crn sts.l dsr,@Crn sts.l a0,@Crn sts.l x0,@Crn sts.l x1,@Crn sts.l y0,@Crn sts.l y1,@Crn 1 5 ? contention occurs with multiplier ? ma contends with if 438 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention system control rte instruction rte 45 instructions (cont) trap instruction trapa #imm 6/8 * 7 6/8 * 7 sleep instruction sleep 46 register ? mac/dsp transfer instruction clrmac lds rm,mach lds rm,macl lds rm,dsr lds rm,a0 lds rm,x0 lds rm,x1 lds rm,y0 lds rm,y1 4 1 ? causes multiplier contention ? ma contends with if memory ? mac/dsp transfer instructions lds.l @rm+,mach lds.l @rm+,macl lds.l @rm+,dsr lds.l @rm+,a0 lds.l @rm+,x0 lds.l @rm+,x1 lds.l @rm+,y0 lds.l @rm+,y1 4 1 ? causes multiplier contention ? ma contends with if mac/dsp ? register transfer instruction sts mach,rn sts macl,rn sts dsr,rn sts a0,rn sts x0,rn sts x1,rn sts y0,rn sts y1,rn 5 1 ? causes multiplier contention ? contention occurs when an instruction that uses the same destination register is placed immediately after this instruction ? ma contends with if 439 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention system control instructions (cont) mac/dsp ? memory transfer instruction sts.l mach,@Crn sts.l macl,@Crn sts.l dsr,@Crn sts.l a0,@Crn sts.l x0,@Crn sts.l x1,@Crn sts.l y0,@Crn sts.l y1,@Crn 4 1 ? causes multiplier contention ? ma contends with if rte instruction rte 54 trap instruction trapa #imm 98 sleep instruction sleep 33 register ? dsp transfer instructions clrmac lds rm,mach lds rm,macl 1 4 ? causes multiplier contention ? ma contends with if lds rm,dsr lds rm,a0 lds rm,x0 lds rm,x1 lds rm,y0 lds rm,y1 14 memory ? 
dsp transfer instructions lds.l @rm+,mach lds.l @rm+,macl 1 4 ? causes multiplier contention ? ma contends with if ds.l @rm+,dsr ds.l @rm+,a0 ds.l @rm+,x0 ds.l @rm+,x1 ds.l @rm+,y0 ds.l @rm+,y1 14 440 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention system control instructions (cont) dsp ? register transfer instructions sts mach,rn sts macl,rn sts dsr,rn sts a0,rn sts x0,rn sts x1,rn sts y0,rn sts y1,rn 1 5 ? causes multiplier contention ? contention occurs when an instruction that uses the same destination register is placed immediately after this instruction ? ma contends with if ? causes contention with dsp operation. dsp ? memory transfer instructions sts.l mach,@-rn sts.l macl,@-rn 1 4 ? causes multiplier contention ? ma contends with if sts.l dsr,@-rn sts.l a0,@-rn sts.l x0,@-rn sts.l x1,@-rn sts.l y0,@-rn sts.l y1,@-rn 14 rte instruction rte 45 trap instruction trapa #imm 89 sleep instruction sleep 33 dsp data transfer instructions x memory load instructions nopx movx.w @ax,dx movx.w @ax+,dx movx.w @ax+ix,dx 15 x memory store instructions movx.w da,@ax movx.w da,@ax+ movx.w da,@ax+ix 1 4 ? causes contention with dsp operation. 441 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention dsp data transfer instructions (cont) y memory load instructions nopy movy.w @ay,dy movy.w @ay+,dy movy.w @ay+ix,dy 15 y memory store instructions movy.w da,@ay movy.w da,@ay+ movy.w da,@ay+iy 1 4 ? causes contention with dsp operation. single load instructions movs.w @-as,ds movs.w @as,ds movs.w @as+,ds movs.w @as+is,ds movs.l @-as,ds movs.l @as,ds movs.l @as+,ds movs.l @as+is,ds 1 5 ? ma contends with if single store instructions movs.w ds,@-as movs.w ds,@as movs.w ds,@as+ movs.w ds,@as+is movs.l ds,@-as movs.l ds,@as movs.l ds,@as+ movs.l ds,@as+is 1 5 ? ma contends with if ? causes contention with dsp operation. 
dsp operation instructions padd sx,sy,dz(du) dct padd sx,sy,dz dcf padd sx,sy,dz psub sx,sy,dz(du) dct psub sx,sy,dz dcf psub sx,sy,dz pcopy sx,dz dct pcopy sx,dz dcf pcopy sx,dz 15 442 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention dsp operation instructions (cont) pcopy sy,dz dct pcopy sy,dz dcf pcopy sy,dz pdmsb sx,dz dtc pdmsb sx,dz dcf pdmsb sx,dz pdmsb sy,dz dct pdmsb sy,dz dcf pdmsb sy,dz pinc sx,dz dct pinc sx,dz dcf pinc sx,dz pinc sy,dz dct pinc sy,dz dcf pinc sy,dz pneg sx,dz dct pneg sx,dz dcf pneg sx,dz pneg sy,dz dct pneg sy,dz dcf pneg sy,dz pdec sx,dz dtc pdec sx,dz dcf pdec sx,dz pdec sy,dz dtc pdec sy,dz dcf pdec sy,dz pclr dz dct pclr dz dcf pclr dz 15 443 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention dsp operation instructions (cont) paddc sx,sy,dz psubc sx,sy,dz pcmp sx,sy pabs sx,dz pabs sy,dz prndsx,dz prndsy,dz 15 por sx,sy,dz dct por sx,sy,dz dcf por sx,sy,dz pand sx,sy,dz dct pand sx,sy,dz dcf pand sx,sy,dz pxor sx,sy,dz dct pxor sx,sy,dz dcf pxor sx,sy,dz 15 shift instructions psha sx,sy,dz dct psha sx,sy,dz dcf psha sx,sy,dz psha #imm,dz pshl sx,sy,dz dct pshl sx,sy,dz dcf pshl sx,sy,dz pshl #imm,dz 15 pmuls se,sf,dg 15 444 table 10-2 number of instruction stages and execution cycles (cont) type category instruction cycles stages contention dsp operation instructions (cont) psts mach,dz dtc psts mach,dz dcf psts mach,dz psts macl,dz dct psts macl,dz dcf psts macl,dz plds dz,mach dct plds dz,mach dcf plds dz,mach plds dz,macl dct plds dz,macl dcf plds dz,macl 1 5 ? contends with movx.w, movs.w, and movs.l notes: 1. two cycles on the sh3-dsp. 2. indicates the normal minimum number of execution states (the number in parentheses is the number of cycles when there is contention with following instructions). 3. four cycles on the sh3-dsp. 4. one state when there is no branch. 5. 
Three cycles on the SH3-DSP.
6. Five cycles on the SH3-DSP.
7. Eight cycles and eight stages on the SH3-DSP.

10.4.1 Data transfer instructions

(1) Register to register transfer instructions

Instruction types: mov #imm,rn; mov rm,rn; mova @(disp,pc),r0; movt rn; swap.b rm,rn; swap.w rm,rn; xtrct rm,rn

Pipeline: [Figure 10-22: register to register transfer instruction pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. Nothing happens in the MA stage and the data is retained. The data is written to the register in the WB stage.

(2) Memory load instructions

Instruction types: mov.w @(disp,pc),rn; mov.l @(disp,pc),rn; mov.b @rm,rn; mov.w @rm,rn; mov.l @rm,rn; mov.b @rm+,rn; mov.w @rm+,rn; mov.l @rm+,rn; mov.b @(disp,rm),r0; mov.w @(disp,rm),r0; mov.l @(disp,rm),rn; mov.b @(r0,rm),rn; mov.w @(r0,rm),rn; mov.l @(r0,rm),rn; mov.b @(disp,gbr),r0; mov.w @(disp,gbr),r0; mov.l @(disp,gbr),r0

Pipeline: [Figure 10-23: memory load instruction pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline has five stages: IF, ID, EX, MA, and WB (figure 10-23). If an instruction that uses the same destination register as this instruction is placed immediately after it, contention will occur (see section 10.2.2, Effects of Memory Load Instructions on Pipelines). For contention between the MA and IF stages of these instructions, see section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).
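The load rule above (contention when the instruction placed immediately after a memory load uses the load's destination register) can be checked mechanically over a listing. The following is a minimal sketch, not part of the manual: the `Insn` record, its fields, and the function name are hypothetical, and "uses" is assumed to mean either reading or writing the register. It only flags the adjacent pair; it does not model how many slots are lost.

```python
from typing import List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    """Hypothetical record for one instruction in a listing."""
    text: str                  # mnemonic as written in the listing
    dest: Optional[str]        # register written by the instruction, if any
    srcs: Tuple[str, ...]      # registers read by the instruction
    is_load: bool = False      # True for the memory load instructions above


def load_use_contention(program: List[Insn]) -> List[int]:
    """Return indices of loads whose destination is used by the next instruction."""
    hits = []
    for i in range(len(program) - 1):
        cur, nxt = program[i], program[i + 1]
        if not cur.is_load or cur.dest is None:
            continue
        # Assumption: "uses" covers both reading and overwriting the register.
        if cur.dest in nxt.srcs or cur.dest == nxt.dest:
            hits.append(i)
    return hits


program = [
    Insn("mov.l @r1,r2", "r2", ("r1",), is_load=True),
    Insn("add r2,r3", "r3", ("r2", "r3")),      # uses r2 right after the load
    Insn("mov.l @r4,r5", "r5", ("r4",), is_load=True),
    Insn("add r0,r1", "r1", ("r0", "r1")),      # does not touch r5
]
print(load_use_contention(program))
```

Moving an unrelated instruction between a flagged load and its consumer is the usual way to remove such a pair.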
(3) Memory store instructions

Instruction types: mov.b rm,@rn; mov.w rm,@rn; mov.l rm,@rn; mov.b r0,@(disp,rn); mov.w r0,@(disp,rn); mov.l rm,@(disp,rn); mov.b rm,@(r0,rn); mov.w rm,@(r0,rn); mov.l rm,@(r0,rn); mov.b r0,@(disp,gbr); mov.w r0,@(disp,gbr); mov.l r0,@(disp,gbr)

Pipeline: [Figure 10-24: memory store instruction pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline has four stages: IF, ID, EX, and MA (figure 10-24). Data is not returned to the register, so there is no WB stage. For contention between the MA and IF stages of these instructions, see section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

(4) Memory store instructions (pre-decrement)

Instruction types: mov.b rm,@-rn; mov.w rm,@-rn; mov.l rm,@-rn

Pipeline: [Figure 10-25: memory store instruction (pre-decrement) pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. In the WB stage the decremented value is written to the register. For contention between the MA and IF stages of these instructions, see section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

(5) Cache instruction

Instruction type: pref @rn

Pipeline: [Figure 10-26: cache instruction pipeline. Rows: pref, next instruction, third instruction in series.]

Operation description: The pipeline ends after four stages: IF, ID, EX, and MA. There is no WB stage because no data is returned to the register. For contention between the MA and IF stages of this instruction, see section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA). On the SH3-DSP, the ID of the next instruction is stalled for one slot.
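The stage lists given for the data transfer instructions (five stages for loads and pre-decrement stores, four for plain stores and pref) can be laid out as a slot-by-slot chart like the pipeline figures. The sketch below is illustrative only: the function name and output format are assumptions, and it deliberately ignores every form of contention, so each instruction simply starts one slot after the previous one.

```python
from typing import List, Sequence, Tuple


def pipeline_chart(instructions: Sequence[Tuple[str, Sequence[str]]]) -> List[str]:
    """Render one text row per instruction, shifted right by one slot each."""
    lines = []
    for slot, (name, stages) in enumerate(instructions):
        # ".." marks slots before this instruction has been fetched.
        cells = [".."] * slot + [stage.upper() for stage in stages]
        lines.append(f"{name:<16}" + " ".join(cells))
    return lines


chart = pipeline_chart([
    ("mov.l @r1,r2", ["if", "id", "ex", "ma", "wb"]),   # memory load: five stages
    ("mov.l r3,@r4", ["if", "id", "ex", "ma"]),         # memory store: four stages
    ("mov.l r5,@-r6", ["if", "id", "ex", "ma", "wb"]),  # pre-decrement store: five stages
])
for line in chart:
    print(line)
```

Reading a column top to bottom shows which stages share a slot, which is where the IF and MA contention described in section 10.2.1 would have to be considered.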
10.4.2 Arithmetic instructions

(1) Arithmetic instructions between registers (except multiplication instructions)

Instruction types: add rm,rn; add #imm,rn; exts.b rm,rn; exts.w rm,rn; extu.b rm,rn; extu.w rm,rn; neg rm,rn; sub rm,rn

Pipeline: [Figure 10-27: arithmetic instructions between registers (except multiplication instructions) pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. Nothing happens in the MA stage and the calculation result is retained. The result is written to the register in the WB stage.

(2) SR update arithmetic instructions between registers (excluding multiplication instructions)

Instruction types: addc rm,rn; addv rm,rn; cmp/eq #imm,r0; cmp/eq rm,rn; cmp/hs rm,rn; cmp/ge rm,rn; cmp/hi rm,rn; cmp/gt rm,rn; cmp/pl rn; cmp/pz rn; cmp/str rm,rn; div1 rm,rn; div0s rm,rn; div0u; dt rn; negc rm,rn; subc rm,rn; subv rm,rn

Pipeline: [Figure 10-28: pipeline for SR update arithmetic instructions between registers (excluding multiplication instructions). Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. Nothing happens in the MA stage and the calculation result is retained. The result is written to the register in the WB stage. Contention occurs if the instruction immediately following this instruction, or the instruction after that, reads from SR (see section 10.2.3, Contention due to SR Update Instructions).

(3) Multiply/accumulate instruction

Instruction type: mac.w @rm+,@rn+

Pipeline: [Figure 10-29: multiply/accumulate instruction pipeline. Rows: mac.w, next instruction, third instruction in series.]

Operation description: The pipeline has eight stages*: IF, ID, EX, MA, MA, MM, MM, and MM (figure 10-29).
The second MA reads the memory and accesses the multiplier. MM indicates that the multiplier is operating. The MM operates for three cycles after the final MA ends, regardless of slot. The ID of the instruction after the mac.w instruction is stalled for one slot. When the two MAs of the mac.w instruction contend with IF, they split the slots as described in section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier follows the mac.w instruction, the mac.w instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, and MA. In such cases, the ID of the next instruction simply stalls one slot and thereafter the pipeline operates normally. When an instruction that uses the multiplier comes after the mac.w instruction, contention occurs with the multiplier, so operation is not as normal (see section 10.2.4, Multiplier Access Contention).

Note: * On the SH3-DSP there are seven stages: IF, ID, EX, MA, MA, MM, and MM.

(4) Double-length multiply/accumulate instruction

Instruction type: mac.l @rm+,@rn+

Pipeline: [Figure 10-30: multiply/accumulate instruction pipeline. Rows: mac.l, next instruction, third instruction.]

Operation description: The pipeline has eight stages*: IF, ID, EX, MA, MA, MM, MM, and MM (figure 10-30). The second MA reads the memory and accesses the multiplier. MM indicates that the multiplier is operating. The MM operates for three cycles after the final MA ends, regardless of slot. The ID of the instruction after the mac.l instruction is stalled for one slot. When the two MAs of the mac.l instruction contend with IF, they split the slots as described in section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).
When an instruction that does not use the multiplier follows the mac.l instruction, the mac.l instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, and MA. In such cases, the ID of the next instruction simply stalls one slot and thereafter the pipeline operates normally. When an instruction that uses the multiplier comes after the mac.l instruction, contention occurs with the multiplier, so operation is not as normal (see section 10.2.4, Multiplier Access Contention).

Note: * On the SH3-DSP there are nine stages: IF, ID, EX, MA, MA, MM, MM, MM, and MM.

(5) Multiplication instructions

Instruction types: muls.w rm,rn; mulu.w rm,rn

Pipeline: [Figure 10-31: multiplication instruction pipeline. Rows: instruction A, next instruction, third instruction.]

Operation description: The pipeline has six stages: IF, ID, EX, MA, MM, and MM (figure 10-31). The MA accesses the multiplier. MM indicates that the multiplier is operating. The MM operates for two cycles after the MA ends, regardless of the slot. If the MA of the muls.w instruction contends with IF, it operates as described in section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier comes after the muls.w instruction, the muls.w instruction may be considered to be a four-stage pipeline instruction of IF, ID, EX, and MA. In such cases, it operates like a normal pipeline. When an instruction that uses the multiplier comes after the muls.w instruction, however, contention occurs with the multiplier, so operation is not as normal (see section 10.2.4, Multiplier Access Contention).

(6) Double-length multiplication instructions

Instruction types: dmuls.l rm,rn; dmulu.l rm,rn; mul.l rm,rn

Pipeline:
[Figure 10-32: multiplication instruction pipeline. Rows: instruction A, next instruction, third instruction.]

Operation description: The pipeline has eight stages*: IF, ID, EX, MA, MA, MM, MM, and MM (figure 10-32). The MA accesses the multiplier. MM indicates that the multiplier is operating. The MM operates for three cycles after the MA ends, regardless of slot. The ID of the instruction following the dmuls.l instruction is stalled for one slot (see the description of the multiply/accumulate instruction). When the two MA stages of the dmuls.l instruction contend with IF, they split the slot as described in section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier comes after the dmuls.l instruction, the dmuls.l instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, and MA. In such cases, it operates like a normal pipeline. When an instruction that uses the multiplier comes after the dmuls.l instruction, however, contention occurs with the multiplier, so operation is not as normal (see section 10.2.4, Multiplier Access Contention).

Note: * On the SH3-DSP there are nine stages: IF, ID, EX, MA, MA, MM, MM, MM, and MM.

10.4.3 Logic operation instructions

(1) Register to register logic operation instructions

Instruction types: and rm,rn; and #imm,r0; not rm,rn; or rm,rn; or #imm,r0; xor rm,rn; xor #imm,r0

Pipeline: [Figure 10-33: register to register logic operation instruction pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. Nothing happens in the MA stage and the calculation result is retained. The result is written to the register in the WB stage.

(2) SR update logic operation instructions between registers

Instruction types: tst rm,rn; tst #imm,r0

Pipeline:
[Figure 10-34: pipeline for SR update logic operation instructions between registers. Rows: tst, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. Nothing happens in the MA stage and the calculation result is retained. The result is written to the register in the WB stage. Contention occurs if the instruction immediately following this instruction, or the instruction after that, reads from SR (see section 10.2.3, Contention due to SR Update Instructions).

(3) Memory logic operation instructions

Instruction types: and.b #imm,@(r0,gbr); or.b #imm,@(r0,gbr); xor.b #imm,@(r0,gbr)

Pipeline: [Figure 10-35: memory logic operation instruction pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline has six stages: IF, ID, EX, MA, EX, and MA (figure 10-35). The ID of the next instruction stalls for two slots. The MAs of these instructions contend with IF (see section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA)).

(4) SR update memory logic operation instructions

Instruction type: tst.b #imm,@(r0,gbr)

Pipeline: [Figure 10-36: SR update memory logic operation instruction pipeline. Rows: tst.b, next instruction, third instruction in series.]

Operation description: The pipeline ends after seven stages: IF, ID, EX, MA, EX, MA, and WB. The result is written to the T bit of SR in the WB stage. The MAs of the tst.b instruction contend with IF (see section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA)). Also, contention occurs if the instruction immediately following this instruction, or the instruction after that, reads from SR (see section 10.2.3, Contention due to SR Update Instructions).

(5) TAS instruction

Instruction type: tas.b @rn

Pipeline:
[Figure 10-37: TAS instruction pipeline. Rows: tas.b, next instruction, third instruction in series.]

Operation description: The pipeline ends after seven stages: IF, ID, EX, MA, MA, MA, and WB. The result is written to the T bit of SR in the WB stage. The MAs of the tas.b instruction contend with IF (see section 10.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA)). Also, contention occurs if the instruction immediately following this instruction, or the instruction after that, reads from SR (see section 10.2.3, Contention due to SR Update Instructions). On the SH3-DSP, the ID of the next instruction is stalled for three slots.

10.4.4 Shift instructions

(1) Shift instructions

Instruction types: shll2 rn; shlr2 rn; shll8 rn; shlr8 rn; shll16 rn; shlr16 rn; shad rm,rn; shld rm,rn

Pipeline: [Figure 10-38: shift instruction pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. Nothing happens in the MA stage and the shift result is retained. The result is written to the register in the WB stage.

(2) SR update shift instructions

Instruction types: rotl rn; rotr rn; rotcl rn; rotcr rn; shal rn; shar rn; shll rn; shlr rn

Pipeline: [Figure 10-39: SR update shift instruction pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. Nothing happens in the MA stage and the result is retained. The result is written to the register in the WB stage. Contention occurs if the instruction immediately following this instruction, or the instruction after that, reads from SR (see section 10.2.3, Contention due to SR Update Instructions).

10.4.5 Branch instructions

(1) Conditional branch instructions

Instruction types: bf label; bt label

Pipeline/operation description: The pipeline has three stages: IF, ID, and EX.
Condition verification is performed in the ID stage. Conditional branch instructions are not delayed branches.

1. When condition is satisfied
The branch destination address is calculated in the EX stage. The two instructions after the conditional branch instruction (instruction A) are fetched but discarded. The branch destination instruction begins its fetch from the slot following the slot that has the EX stage of instruction A (figure 10-40).

[Figure 10-40: branch instruction when condition is satisfied. Rows: instruction A; next instruction (fetched but discarded); third instruction in series (fetched but discarded); branch destination.]

2. When condition is not satisfied
If it is determined at the ID stage that the condition is not satisfied, the EX stage proceeds without doing anything. The next instruction also executes a fetch (figure 10-41).

[Figure 10-41: branch instruction when condition is not satisfied. Rows: instruction A, next instruction, third instruction in series.]

(2) Delayed conditional branch instructions

Instruction types: bf/s label; bt/s label

Pipeline/operation description: The pipeline has three stages: IF, ID, and EX. Condition verification is performed in the ID stage.

1. When condition is satisfied
The branch destination address is calculated in the EX stage. The instruction after the conditional branch instruction (instruction A) is fetched and executed, but the instruction after that is fetched and discarded. The branch destination instruction begins its fetch from the slot following the slot that has the EX stage of instruction A (figure 10-42).

[Figure 10-42: branch instruction when condition is satisfied. Rows: instruction A; next instruction; third instruction in series (fetched but discarded); branch destination.]

2.
When condition is not satisfied
If it is determined at the ID stage that the condition is not satisfied, the EX stage proceeds without doing anything. The next instruction also executes a fetch (figure 10-43).

[Figure 10-43: branch instruction when condition is not satisfied. Rows: instruction A, next instruction, third instruction in series.]

(3) Unconditional branch instructions

Instruction types: bra label; braf rm; jmp @rm; rts

Pipeline: [Figure 10-44: unconditional branch instruction pipeline. Rows: instruction A; next instruction; third instruction in series (fetched, but the data is then discarded); branch destination.]

Operation description: The pipeline has three stages: IF, ID, and EX (figure 10-44). Unconditional branch instructions are delayed branches. The branch destination address is calculated in the EX stage. Unlike the instructions following a conditional branch instruction, the instruction following the unconditional branch instruction (instruction A), that is, the delay slot instruction, is not fetched and discarded but is executed. Note that the ID of the delay slot instruction does stall for one cycle. The branch destination instruction starts its fetch from the slot after the slot that has the EX stage of instruction A.

(4) Unconditional branch instructions (PR)

Instruction types: bsr label; bsrf rm; jsr @rm

Pipeline: [Figure 10-45: unconditional branch instruction (PR) pipeline. Rows: instruction A; next instruction; third instruction in series (fetched, but the data is then discarded).]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. Unconditional branch instructions are delayed branches. The instruction following the unconditional branch instruction (instruction A), that is, the delay slot instruction, is fetched and executed. However, the instruction after that is fetched and discarded.
The branch destination instruction starts its fetch from the slot after the slot that has the EX stage of instruction A.

10.4.6 System control instructions

(1) System control ALU instructions

Instruction types: ldc rm,gbr; ldc rm,vbr; ldc rm,ssr; ldc rm,spc; ldc rm,mod; ldc rm,re; ldc rm,rs; ldc rm,r0_bank; ldc rm,r1_bank; ldc rm,r2_bank; ldc rm,r3_bank; ldc rm,r4_bank; ldc rm,r5_bank; ldc rm,r6_bank; ldc rm,r7_bank; lds rm,pr; stc sr,rn; stc gbr,rn; stc vbr,rn; stc ssr,rn; stc spc,rn; stc mod,rn; stc re,rn; stc rs,rn; stc r0_bank,rn; stc r1_bank,rn; stc r2_bank,rn; stc r3_bank,rn; stc r4_bank,rn; stc r5_bank,rn; stc r6_bank,rn; stc r7_bank,rn; sts pr,rn; ldre @(disp,pc) (SH3-DSP only); ldrs @(disp,pc) (SH3-DSP only); setrc rm (SH3-DSP only); setrc #imm (SH3-DSP only)

Pipeline: [Figure 10-46: system control ALU instruction pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. In the EX stage, the data calculation is completed via the ALU. Nothing happens in the MA stage and the result is retained. The result is written to the register in the WB stage. On the SH3-DSP, the ID of the instruction following an ldc, ldre, ldrs, or setrc instruction is stalled for two slots.

(2) SR update system control instructions

Instruction types: clrs; clrt; sets; sett

Pipeline: [Figure 10-47: SR update system control instruction pipeline. Rows: instruction A, next instruction, third instruction in series.]

Operation description: The pipeline ends after five stages: IF, ID, EX, MA, and WB. Nothing happens in the MA stage and the data to be transferred is retained. The data is written to the register in the WB stage. Contention occurs if the instruction immediately following this instruction, or the instruction after that, reads from SR (see section 10.2.3, Contention due to SR Update Instructions).
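The SR update rule repeated throughout this section (contention if either of the next two instructions reads from SR) has a two-instruction window, unlike the one-instruction window of the load rule. A minimal sketch of a checker follows. The `SrInsn` record and function name are hypothetical, and which instructions count as SR readers is left to the caller, since the detailed rules are in section 10.2.3 rather than here; `stc sr,rn` is used as the reader in the example because it copies SR to a register.

```python
from typing import List, NamedTuple, Tuple


class SrInsn(NamedTuple):
    """Hypothetical record; the two flags are supplied by the caller."""
    text: str
    updates_sr: bool = False   # e.g. sett, clrt, cmp/eq, tst, rotl
    reads_sr: bool = False     # e.g. stc sr,rn


def sr_contention(program: List[SrInsn]) -> List[Tuple[int, int]]:
    """Return (updater index, reader index) pairs within the two-instruction window."""
    pairs = []
    for i, cur in enumerate(program):
        if not cur.updates_sr:
            continue
        for j in (i + 1, i + 2):
            if j < len(program) and program[j].reads_sr:
                pairs.append((i, j))
    return pairs


program = [
    SrInsn("sett", updates_sr=True),
    SrInsn("nop"),
    SrInsn("stc sr,r0", reads_sr=True),   # second instruction after sett: flagged
    SrInsn("nop"),
    SrInsn("clrt", updates_sr=True),
    SrInsn("nop"),
    SrInsn("nop"),
    SrInsn("stc sr,r1", reads_sr=True),   # third instruction after clrt: not flagged
]
print(sr_contention(program))
```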
(3) ldtlb instruction

instruction type:
ldtlb

pipeline:
[figure 10-48 ldtlb instruction pipeline. rows: ldtlb, next instruction, third instruction in series]

operation description: the pipeline ends after four stages: if, id, ex, and ma. there is no wb stage because no data is returned to the register. for contention between the ma and if stages of this instruction, see section 10.2.1, contention between instruction fetch (if) and memory access (ma).

(4) nop instruction

instruction type:
nop

pipeline:
[figure 10-49 nop instruction pipeline. rows: nop, next instruction, third instruction in series]

operation description: the pipeline ends after three stages: if, id, and ex.

(5) ldc instruction (sr)

instruction type:
ldc rm,sr

pipeline:
[figure 10-50 ldc instruction (sr) pipeline. rows: ldc, next instruction, third instruction in series]

operation description: the pipeline ends after five stages: if, id, ex, ex, and ex. the data is written to sr in the last ex stage. the if of the next instruction starts from the slot after the slot that has the ex stage of the ldc instruction.

(6) ldc.l instructions (sr)

instruction type:
ldc.l @rm+, sr

pipeline:
[figure 10-51 ldc.l instruction (sr) pipeline. rows: ldc.l, next instruction, third instruction in series]

operation description: the pipeline ends after seven stages: if, id, ex, ma, ex, ex, and ex. the data is written to sr in the last ex stage. the if of the next instruction starts from the slot after the slot that has the final ex stage of the ldc.l instruction.

(7) lds.l instruction (pr)

instruction type:
lds.l @rm+, pr

pipeline:
[figure 10-52 lds.l instructions (pr) pipeline. rows: lds.l, next instruction, third instruction in series]

operation description: the pipeline ends after five stages: if, id, ex, ma and wb.
contention occurs if this instruction is followed by an instruction that uses the same destination register (see section 10.2.2, effects of memory load instructions on pipelines). also, the ma of this instruction contends with if (see section 10.2.1, contention between instruction fetch (if) and memory access (ma)).

(8) sts.l instruction (pr)

instruction type:
sts.l pr, @-rn

pipeline:
[figure 10-53 sts.l instruction (pr) pipeline. rows: sts.l, next instruction, third instruction in series]

operation description: the pipeline ends after five stages: if, id, ex, ma and wb. the wb stage writes the decremented value to the register. the ma of this instruction contends with if (see section 10.2.1, contention between instruction fetch (if) and memory access (ma)).

(9) ldc.l instructions

instruction types:
ldc.l @rm+, gbr
ldc.l @rm+, vbr
ldc.l @rm+, ssr
ldc.l @rm+, spc
ldc.l @rm+, mod (sh3-dsp only)
ldc.l @rm+, re (sh3-dsp only)
ldc.l @rm+, rs (sh3-dsp only)
ldc.l @rm+, r0_bank
ldc.l @rm+, r1_bank
ldc.l @rm+, r2_bank
ldc.l @rm+, r3_bank
ldc.l @rm+, r4_bank
ldc.l @rm+, r5_bank
ldc.l @rm+, r6_bank
ldc.l @rm+, r7_bank

pipeline:
[figure 10-54 ldc.l instruction pipeline. rows: ldc.l, next instruction, third instruction in series]

operation description: the pipeline ends after five stages: if, id, ex, ma and wb. contention occurs if this instruction is followed by an instruction that uses the same destination register (see section 10.2.2, effects of memory load instructions on pipelines). also, the ma of this instruction contends with if (see section 10.2.1, contention between instruction fetch (if) and memory access (ma)). on the sh3-dsp, the id stage of the instruction following the ldc instruction is delayed by four slots.
(10) stc.l instructions (excluding bank registers)

instruction types:
stc.l sr, @-rn
stc.l gbr, @-rn
stc.l vbr, @-rn
stc.l ssr, @-rn
stc.l spc, @-rn
stc.l mod,@-rn (sh3-dsp only)
stc.l re,@-rn (sh3-dsp only)
stc.l rs,@-rn (sh3-dsp only)

pipeline:
[figure 10-55 stc.l instruction (excluding bank register) pipeline. rows: stc.l, next instruction, third instruction in series]

operation description: the pipeline ends after five stages: if, id, ex, ma and wb. the wb stage writes the decremented value to the register. the ma of this instruction contends with if (see section 10.2.1, contention between instruction fetch (if) and memory access (ma)). on the sh3-dsp, the id stage of the instruction following the ldc instruction is delayed by one slot.

(11) stc.l instructions (bank registers)

instruction types:
stc.l r0_bank,@-rn
stc.l r1_bank,@-rn
stc.l r2_bank,@-rn
stc.l r3_bank,@-rn
stc.l r4_bank,@-rn
stc.l r5_bank,@-rn
stc.l r6_bank,@-rn
stc.l r7_bank,@-rn

pipeline:
[figure 10-56 stc.l instruction (bank register) pipeline. rows: stc.l, next instruction, third instruction in series]

operation description: the pipeline ends after six stages: if, id, ex, ex, ma and wb. the id of the next instruction is stalled one cycle. these instructions cause contention with if (see section 10.2.1, contention between instruction fetch (if) and memory access (ma)).

(12) register → mac/dsp transfer instructions

instruction types:
clrmac
lds rm, mach
lds rm, macl
lds rm, dsr (sh3-dsp only)
lds rm, a0 (sh3-dsp only)
lds rm, x0 (sh3-dsp only)
lds rm, x1 (sh3-dsp only)
lds rm, y0 (sh3-dsp only)
lds rm, y1 (sh3-dsp only)

pipeline:
[figure 10-57 register → mac transfer instruction pipeline. rows: instruction a, next instruction, third instruction in series]

operation description: the pipeline ends after four stages: if, id, ex, and ma.
the ma stage is used to access the multiplier. this ma contends with if (see section 10.2.1, contention between instruction fetch (if) and memory access (ma)). also, if one of these instructions is followed by an instruction that uses the multiplier, multiplier contention will result (see section 10.2.4, multiplier access contention).

(13) memory → mac transfer instructions

instruction types:
lds.l @rm+, mach
lds.l @rm+, macl
lds.l @rm+, dsr (sh3-dsp only)
lds.l @rm+, a0 (sh3-dsp only)
lds.l @rm+, x0 (sh3-dsp only)
lds.l @rm+, x1 (sh3-dsp only)
lds.l @rm+, y0 (sh3-dsp only)
lds.l @rm+, y1 (sh3-dsp only)

pipeline:
[figure 10-58 memory → mac transfer instruction pipeline. rows: lds.l, next instruction, third instruction in series]

operation description: the pipeline ends after four stages: if, id, ex, and ma. the ma stage is used to access memory and the multiplier. this ma contends with if (see section 10.2.1, contention between instruction fetch (if) and memory access (ma)). also, if one of these instructions is followed by an instruction that uses the multiplier, multiplier contention will result (see section 10.2.4, multiplier access contention).

(14) mac/dsp → register transfer instructions

instruction types:
sts mach, rn
sts macl, rn
sts dsr, rn (sh3-dsp only)
sts a0, rn (sh3-dsp only)
sts x0, rn (sh3-dsp only)
sts x1, rn (sh3-dsp only)
sts y0, rn (sh3-dsp only)
sts y1, rn (sh3-dsp only)

pipeline:
[figure 10-59 mac → register transfer instruction pipeline. rows: sts, next instruction, third instruction in series]

operation description: the pipeline ends after five stages: if, id, ex, ma and wb. the ma stage is used to access the multiplier. this ma contends with if (see section 10.2.1, contention between instruction fetch (if) and memory access (ma)).
also, if one of these instructions is followed by an instruction that uses the same destination register or by an instruction that uses the multiplier, contention will result (see section 10.2.2, effects of memory load instructions on pipelines, and section 10.2.4, multiplier access contention).

(15) mac → memory transfer instructions

instruction types:
sts.l mach, @-rn
sts.l macl, @-rn
sts.l dsr, @-rn (sh3-dsp only)
sts.l a0, @-rn (sh3-dsp only)
sts.l x0, @-rn (sh3-dsp only)
sts.l x1, @-rn (sh3-dsp only)
sts.l y0, @-rn (sh3-dsp only)
sts.l y1, @-rn (sh3-dsp only)

pipeline:
[figure 10-60 mac → memory transfer instruction pipeline. rows: sts.l, next instruction, third instruction in series]

operation description: the pipeline ends after five stages: if, id, ex, ma and wb. the ma stage is used to access the multiplier. this ma contends with if (see section 10.2.1, contention between instruction fetch (if) and memory access (ma)). also, if one of these instructions is followed by an instruction that uses the multiplier, multiplier contention will result (see section 10.2.4, multiplier access contention).

(16) rte instruction

instruction type:
rte

pipeline:
[figure 10-61 rte instruction pipeline. rows: rte, delay slot, next instruction (fetched, but the data is then discarded), branch destination]

operation description: the pipeline ends after five stages: if, id, ex, ex, and ex. rte is a delayed branch instruction. the instruction following the rte instruction, that is, the delay slot instruction, is fetched and executed. however, the instruction after that is fetched and discarded. the if of the branch destination instruction starts from the slot after the slot that has the final ex stage of rte.

(17) trap instruction

instruction type:
trapa #imm
pipeline:
[figure 10-62 trap instruction pipeline. rows: trapa, next instruction (fetched, but the data is then discarded), third instruction in series (fetched, but the data is then discarded), branch destination]

operation description: the pipeline has six stages*: if, id, ex, ex, ex, and ex (figure 10-62). trap is not a delayed branch instruction. the two instructions after the trap instruction are fetched, but they are discarded without being executed. the if of the branch destination instruction starts from the slot following the last ex of the trap instruction.

note: * on the sh3-dsp there are eight stages: if, id, ex, ex, ex, ex, ex, and ex.

(18) sleep instruction

instruction type:
sleep

pipeline:
[figure 10-63 sleep instruction pipeline. rows: sleep, next instruction]

operation description: the pipeline has three stages: if, id, and ex (figure 10-63). the next instruction is issued only up to its if stage. after the sleep instruction is executed, the cpu enters sleep mode or standby mode.

10.4.7 exception processing

(1) interrupt exception processing

instruction type:
interrupt exception processing

pipeline:
[figure 10-64 interrupt exception processing pipeline. rows: interrupt, next instruction, branch destination]

operation description: the interrupt is received during the id stage of the instruction and everything after the id stage is replaced by the interrupt exception processing sequence. the pipeline has six stages: if, id, ex, ex, ex, and ex (figure 10-64). interrupt exception processing is not a delayed branch. in interrupt exception processing, an overrun fetch (if) occurs. the if of the branch destination instruction starts from the slot following the final ex in the interrupt exception processing. interrupt sources are nmi, irl, and on-chip peripheral module interrupts. refer to the hardware manual for details.
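as a worked example of the rule that the branch destination fetch begins in the slot after the final ex stage, the python sketch below numbers the slots of a trap or interrupt sequence. the function name is hypothetical, and the model assumes one stage per slot with no stalls or contention.

```python
def branch_destination_if_slot(stages, first_slot=1):
    """Slot in which the branch destination's if begins.

    Assumes each stage occupies exactly one slot, starting at first_slot,
    with no stalls; the destination if follows the final ex stage.
    """
    if not stages or stages[-1] != "ex":
        raise ValueError("sequence is expected to end with an ex stage")
    last_ex_slot = first_slot + len(stages) - 1
    return last_ex_slot + 1


# Stage sequences as listed in the text.
TRAPA_SH3 = ["if", "id", "ex", "ex", "ex", "ex"]   # six stages
TRAPA_SH3_DSP = ["if", "id"] + ["ex"] * 6          # eight stages on the sh3-dsp

print(branch_destination_if_slot(TRAPA_SH3))       # 7
print(branch_destination_if_slot(TRAPA_SH3_DSP))   # 9
```

with the trap sequence starting in slot 1, the branch destination is fetched in slot 7 on the six-stage pipeline and in slot 9 on the sh3-dsp. the six-stage interrupt exception sequence gives the same result as the six-stage trap.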
(2) address error exception processing

instruction type:
address error exception processing

pipeline:
[figure 10-65 address error exception processing pipeline. rows: address error, next instruction, branch destination]

operation description: the address error is received during the id stage of the instruction and everything after the id stage is replaced by the address error exception processing sequence. the pipeline has six stages: if, id, ex, ex, ex, and ex (figure 10-65). address error exception processing is not a delayed branch. in address error exception processing, an overrun fetch (if) occurs. the if of the branch destination instruction starts from the slot following the final ex in the address error exception processing.

address errors are caused by instruction fetches and by data reads or writes. fetching an instruction from an odd address or fetching an instruction from an on-chip peripheral register causes an instruction fetch address error. accessing word data from other than a word boundary, accessing longword data from other than a longword boundary, and accessing an on-chip peripheral register 8-bit space by longword cause a read or write address error. refer to the hardware manual for details.

(3) tlb related exception processing

instruction type:
tlb related exception processing

pipeline:
[figure 10-66 tlb related exception processing pipeline. rows: tlb related exception, next instruction, branch destination]

operation description: if a tlb related exception is received in the instruction's id stage, the portion following the id stage is replaced by the tlb related exception processing sequence. the pipeline ends after six stages: if, id, ex, ex, ex, and ex. tlb related exception processing is not a delayed branch. in tlb related exception processing, an overrun fetch (if) occurs.
the if of the branch destination instruction starts from the slot after the slot that has the final ex stage of the tlb related exception processing. tlb related exceptions include tlb error, tlb invalid, tlb initial write, and tlb protection exceptions. refer to the hardware manual for details.

(4) illegal instruction exception processing

instruction type:
illegal instruction exception processing

pipeline:
[figure 10-67 illegal instruction exception processing pipeline. rows: illegal instruction, next instruction, branch destination]

operation description: the illegal instruction is received during the id stage of the instruction and everything after the id stage is replaced by the illegal instruction exception processing sequence. the pipeline has six stages: if, id, ex, ex, ex, and ex (figure 10-67). illegal instruction exception processing is not a delayed branch. in illegal instruction exception processing, an overrun fetch (if) occurs. whether there is an if only in the next instruction or in the one after that as well depends on the instruction that was to be executed. the if of the branch destination instruction starts from the slot following the final ex in the illegal instruction exception processing.

illegal instruction exception processing is caused by ordinary illegal instructions and by illegal slot instructions. when undefined code placed somewhere other than the slot directly after a delayed branch instruction (called the delay slot) is decoded, ordinary illegal instruction exception processing occurs. when undefined code placed in the delay slot is decoded, or when an instruction placed in the delay slot that rewrites the program counter is decoded, illegal slot instruction exception processing occurs. refer to the hardware manual for details.

10.4.8 pipeline for fpu instructions (sh-3e only)
[figure 10-68 fpu pipeline during data transfer between floating point register and register. rows: instruction a, next instruction, subsequent instruction]

[figure 10-69 fpu pipeline during floating point load. rows: instruction a, next instruction, subsequent instruction]

[figure 10-70 fpu pipeline during floating point store. rows: instruction a, next instruction, subsequent instruction]

[figure 10-71 fpu pipeline during floating point compare. rows: instruction a, next instruction, subsequent instruction]

[figure 10-72 fpu pipeline during floating point arithmetic calculation instruction (excluding fdiv and fsqrt). rows: instruction a, next instruction, subsequent instruction]

[figure 10-73 fpu pipeline during fdiv and fsqrt instructions. case 1: next instruction is fpu instruction. case 2: next instruction is cpu instruction and subsequent instruction is fpu instruction. rows in each case: instruction a, next instruction, subsequent instruction]

notes:
1. fdiv and fsqrt require 13 cycles in the e1 stage.
2. the next instruction enters the cpu pipeline; it is deleted from the fpu pipeline after the df stage.
3. even if there are two to twelve cpu instructions between fdiv (or fsqrt) and the next fpu instruction, the situation is still interpreted in the same way as case 2.

10.4.9 dsp data transfer instructions (sh3-dsp only)

(1) x memory and y memory load instructions

instruction types:
nopx
movx.w @ax,dx
movx.w @ax+,dx
movx.w @ax+ix,dx

pipeline:
[figure 10-74 x memory and y memory load instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has five stages: if, id, ex, ma, and wb/dsp. data is transferred via the x bus, so there is no contention with the if of other instructions.

(2) y memory load instructions

instruction types:
nopy
movy.w @ay,dy
movy.w @ay+,dy
movy.w @ay+iy,dy

pipeline:
[figure 10-75 y memory load instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has five stages: if, id, ex, ma, and wb/dsp. data is transferred via the y bus, so there is no contention with the if of other instructions.

(3) x memory store instructions

instruction types:
movx.w da,@ax
movx.w da,@ax+
movx.w da,@ax+ix

pipeline:
[figure 10-76 x memory store instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has four stages: if, id, ex, and ma. if this instruction attempts to access the dsp operation result immediately after a dsp operation instruction, contention occurs (see section 10.2.6, contention between dsp data operation instructions and store instructions (sh3-dsp only)).

(4) y memory store instructions

instruction types:
movy.w da,@ay
movy.w da,@ay+
movy.w da,@ay+iy

pipeline:
[figure 10-77 y memory store instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has four stages: if, id, ex, and ma. if this instruction attempts to access the dsp operation result immediately after a dsp operation instruction, contention occurs (see section 10.2.6, contention between dsp data operation instructions and store instructions (sh3-dsp only)).
(5) single load instructions

instruction types:
movs.w @-as,ds
movs.w @as,ds
movs.w @as+,ds
movs.w @as+is,ds
movs.l @-as,ds
movs.l @as,ds
movs.l @as+,ds
movs.l @as+is,ds

pipeline:
[figure 10-78 single load instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has five stages: if, id, ex, ma, and wb/dsp. no contention occurs even if another instruction uses the destination register of this instruction.

(6) single store instructions

instruction types:
movs.w ds,@-as
movs.w ds,@as
movs.w ds,@as+
movs.w ds,@as+is
movs.l ds,@-as
movs.l ds,@as
movs.l ds,@as+
movs.l ds,@as+is

pipeline:
[figure 10-79 single store instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has four stages: if, id, ex, and ma. if this instruction attempts to store the dsp operation result immediately after a dsp operation instruction, contention occurs (see section 10.2.6, contention between dsp data operation instructions and store instructions (sh3-dsp only)).
10.4.10 dsp operation instructions (sh3-dsp only)

(1) alu arithmetic operation instructions

instruction types:
padd sx,sy,dz(du)        pneg sx,dz
dct padd sx,sy,dz        dct pneg sx,dz
dcf padd sx,sy,dz        dcf pneg sx,dz
psub sx,sy,dz(du)        pneg sy,dz
dct psub sx,sy,dz        dct pneg sy,dz
dcf psub sx,sy,dz        dcf pneg sy,dz
pcopy sx,dz              pdec sx,dz
dct pcopy sx,dz          dct pdec sx,dz
dcf pcopy sx,dz          dcf pdec sx,dz
pcopy sy,dz              pdec sy,dz
dct pcopy sy,dz          dct pdec sy,dz
dcf pcopy sy,dz          dcf pdec sy,dz
pdmsb sx,dz              pclr dz
dct pdmsb sx,dz          dct pclr dz
dcf pdmsb sx,dz          dcf pclr dz
pdmsb sy,dz              paddc sx,sy,dz
dct pdmsb sy,dz          psubc sx,sy,dz
dcf pdmsb sy,dz          pcmp sx,sy
pinc sx,dz               pabs sx,dz
dct pinc sx,dz           pabs sy,dz
dcf pinc sx,dz           prnd sx,dz
pinc sy,dz               prnd sy,dz
dct pinc sy,dz
dcf pinc sy,dz

pipeline:
[figure 10-80 alu arithmetic operation instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has five stages: if, id, ex, ma, and wb/dsp. if the condition of a conditional operation instruction is not satisfied, the wb/dsp stage is not executed (no operation), but the pipeline does not change.

(2) alu logical operation instructions

instruction types:
por sx,sy,dz
dct por sx,sy,dz
dcf por sx,sy,dz
pand sx,sy,dz
dct pand sx,sy,dz
dcf pand sx,sy,dz
pxor sx,sy,dz
dct pxor sx,sy,dz
dcf pxor sx,sy,dz

pipeline:
[figure 10-81 alu logical operation instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has five stages: if, id, ex, ma, and wb/dsp. if the condition of a conditional operation instruction is not satisfied, the wb/dsp stage is not executed (no operation), but the pipeline does not change.
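the slot layout behind these pipeline figures, in which each successive instruction enters the pipeline one slot later, can be sketched in python as follows. the function name and text layout are illustrative assumptions, and the chart shows only the ideal case with no stalls or contention.

```python
def slot_chart(names, stages):
    """Render a staggered pipeline chart: instruction k starts k slots late.

    Every instruction is assumed to use the same stage sequence and to
    advance one stage per slot with no stalls.
    """
    width = max(len(s) for s in stages) + 1        # column width per slot
    label = max(len(n) for n in names) + 2         # width of the name column
    total = len(stages) + len(names) - 1           # slots needed overall
    rows = []
    for offset, name in enumerate(names):
        cells = [""] * offset + list(stages)
        cells += [""] * (total - len(cells))
        line = name.ljust(label) + "".join(c.ljust(width) for c in cells)
        rows.append(line.rstrip())
    return "\n".join(rows)


DSP_STAGES = ["if", "id", "ex", "ma", "wb/dsp"]    # five stages, as in the text
NAMES = ["instruction a", "next instruction", "subsequent instruction"]

print(slot_chart(NAMES, DSP_STAGES))
```

each printed row is shifted right by one slot relative to the row above it, so the wb/dsp stage of instruction a lines up with the ma stage of the next instruction and the ex stage of the subsequent instruction.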
(3) alu logical operation instructions

instruction types:
psha sx,sy,dz
dct psha sx,sy,dz
dcf psha sx,sy,dz
psha #imm,dz
pshl sx,sy,dz
dct pshl sx,sy,dz
dcf pshl sx,sy,dz
pshl #imm,dz

pipeline:
[figure 10-82 alu logical operation instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has five stages: if, id, ex, ma, and wb/dsp. if the condition of a conditional operation instruction is not satisfied, the wb/dsp stage is not executed (no operation), but the pipeline does not change.

(4) signed multiplication instruction

instruction types:
pmuls se,sf,dg

pipeline:
[figure 10-83 signed multiplication instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has five stages: if, id, ex, ma, and wb/dsp.

(5) register transfer instructions

instruction types:
psts mach,dz
dct psts mach,dz
dcf psts mach,dz
psts macl,dz
dct psts macl,dz
dcf psts macl,dz
plds dz,mach
dct plds dz,mach
dcf plds dz,mach
plds dz,macl
dct plds dz,macl
dcf plds dz,macl

pipeline:
[figure 10-84 register transfer instruction pipeline. rows: instruction a, next instruction, subsequent instruction]

operation description: the pipeline has five stages: if, id, ex, ma, and wb/dsp. if the condition of a conditional operation instruction is not satisfied, the wb/dsp stage is not executed (no operation), but the pipeline does not change. if a memory load is performed in parallel with this instruction using movx.w, movs.w, or movx.l, contention occurs.
contention also occurs if a memory store is performed immediately after this instruction using movx.w, movs.w, or movx.l (see section 10.2.7, contention between dsp register transfer and memory load/store operations (sh3-dsp only)).

appendix a instruction code

a.1 instruction set by addressing mode

table a-1 instruction set by addressing mode

                                                                                 types
addressing mode / category                              sample instruction       sh-3   sh-3e   sh3-dsp
no operand                                              nop                      11     11      11
direct register addressing
  destination operand only                              movt rn                  18     23      18
  source and destination operand                        add rm,rn                36     44      36
  transfer to control register or system register       ldc rm,sr                16     19      26
  transfer from control register or system register     sts mach,rn              16     19      25
indirect register addressing
  source operand only                                   jmp @rn                  3      3       3
  destination operand only                              tas.b @rn                1      1       1
  data transfer direct from register                    mov.l rm,@rn             6      8       6
post-increment indirect register addressing
  multiply/accumulate operation                         mac.w @rm+,@rn+          2      2       2
  data transfer direct from register                    mov.l @rm+,rn            3      4       3
  load to control register or system register           ldc.l @rm+,sr            16     18      25
pre-decrement indirect register addressing
  data transfer direct from register                    mov.l rm,@-rn            3      4       3
  store from control register or system register        stc.l sr,@-rn            16     18      25
indirect register addressing with displacement
  data transfer direct to register                      mov.l rm,@(disp,rn)      6      6       6
indirect indexed register addressing
  data transfer direct to register                      mov.l rm,@(r0,rn)        6      8       6
indirect gbr addressing with displacement
  data transfer direct to register                      mov.l r0,@(disp,gbr)     6      6       6
indirect indexed gbr addressing
  immediate data transfer                               and.b #imm,@(r0,gbr)     4      4       4
pc relative addressing with displacement
  data transfer direct to register                      mov.l @(disp,pc),rn      3      3       5
pc relative addressing with rn
  branch instruction                                    braf rn                  2      2       2
pc relative addressing
  branch instruction                                    bra disp                 6      6       6
immediate addressing
  load to register                                      fldi0 frn                0      2       0
  arithmetic logical operations direct with register    add #imm,rn              7      7       7
  specify exception processing vector                   trapa #imm               1      1       1
  load to control register                              setrc #imm               0      0       1
total:                                                                           189    220     227

a.1.1 no operand

table a-2 no operand

instruction   operation                             code               cycles   t bit
clrs          0 → s                                 0000000001001000   1
clrt          0 → t                                 0000000000001000   1        0
clrmac        0 → mach, macl                        0000000000101000   1
div0u         0 → m/q/t                             0000000000011001   1        0
ldtlb         pteh/ptel → tlb                       0000000000111000   1
nop           no operation                          0000000000001001   1
rte           delayed branching, ssr/spc → sr/pc    0000000000101011   4
rts           delayed branching, pr → pc            0000000000001011   2
sets          1 → s                                 0000000001011000   1
sett          1 → t                                 0000000000011000   1        1
sleep         sleep                                 0000000000011011   4

a.1.2 direct register addressing

table a-3 destination operand only

instruction         operation                        code               cycles   t bit
cmp/pl rn           rn > 0, 1 → t                    0100nnnn00010101   1        comparison result
cmp/pz rn           rn ≥ 0, 1 → t                    0100nnnn00010001   1        comparison result
dt rn               rn - 1 → rn, when rn is 0, 1 → t. when rn is nonzero, 0 → t
                                                     0100nnnn00010000   1        comparison result
fabs frn *          |frn| → frn                      1111nnnn01011101   1
float fpul,frn *    (float)fpul → frn                1111nnnn00101101   1
fneg frn *          -1.0 × frn → frn                 1111nnnn01001101   1
fsqrt frn *         sqrt(frn) → frn                  1111nnnn01101101   13
ftrc frm,fpul *     (long)frm → fpul                 1111mmmm00111101   1
movt rn             t → rn                           0000nnnn00101001   1
rotl rn             t ← rn ← msb                     0100nnnn00000100   1        msb
rotr rn             lsb → rn → t                     0100nnnn00000101   1        lsb
rotcl rn            t ← rn ← t                       0100nnnn00100100   1        msb
rotcr rn            t → rn → t                       0100nnnn00100101   1        lsb
shal rn             t ← rn ← 0                       0100nnnn00100000   1        msb
shar rn             msb → rn → t                     0100nnnn00100001   1        lsb
shll rn             t ← rn ← 0                       0100nnnn00000000   1        msb
shlr rn             0 → rn → t                       0100nnnn00000001   1        lsb
shll2 rn            rn << 2 → rn                     0100nnnn00001000   1
shlr2 rn            rn >> 2 → rn                     0100nnnn00001001   1
shll8 rn            rn << 8 → rn                     0100nnnn00011000   1
shlr8 rn            rn >> 8 → rn                     0100nnnn00011001   1
shll16 rn           rn << 16 → rn                    0100nnnn00101000   1
shlr16 rn           rn >> 16 → rn                    0100nnnn00101001   1

note: * floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.
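the code column can be read as a 16-bit pattern in which 0 and 1 are fixed bits and each letter marks a register field. the python sketch below matches an instruction word against a few patterns copied from tables a-2 and a-3; the function names and the small lookup table are illustrative assumptions, not a complete decoder.

```python
def match_pattern(pattern, word):
    """Match a 16-bit word against a code string such as '0100nnnn00010101'.

    Returns a dict of field values (keyed by the field letter), or None if
    any fixed bit differs.
    """
    if len(pattern) != 16:
        raise ValueError("pattern must describe exactly 16 bits")
    fields = {}
    for i, ch in enumerate(pattern):
        bit = (word >> (15 - i)) & 1           # patterns are written msb first
        if ch in "01":
            if bit != int(ch):
                return None
        else:
            fields[ch] = (fields.get(ch, 0) << 1) | bit
    return fields


# A few entries copied from tables a-2 and a-3 (not the full instruction set).
PATTERNS = {
    "nop": "0000000000001001",
    "cmp/pl rn": "0100nnnn00010101",
    "movt rn": "0000nnnn00101001",
    "shll16 rn": "0100nnnn00101000",
}


def decode(word):
    """Return (instruction, fields) for the first matching pattern, else None."""
    for name, pattern in PATTERNS.items():
        fields = match_pattern(pattern, word)
        if fields is not None:
            return name, fields
    return None


print(decode(0x4315))   # ('cmp/pl rn', {'n': 3})
print(decode(0x0009))   # ('nop', {})
```

a word that matches none of the listed patterns returns None; a real decoder would need every table in this appendix.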
504 table a-4 source and destination operand instruction operation code cycles t bit add rm,rn rn + rm ? rn 0011nnnnmmmm1100 1 addc rm,rn rn + rm + t ? rn, carry ? t 0011nnnnmmmm1110 1 carry addv rm,rn rn + rm ? rn, overflow ? t 0011nnnnmmmm1111 1 overflow and rm,rn rn & rm ? rn 0010nnnnmmmm1001 1 cmp/eq rm,rn when rn = rm, 1 ? t 0011nnnnmmmm0000 1 comparison result cmp/hs rm,rn when unsigned and rn 3 rm, 1 ? t 0011nnnnmmmm0010 1 comparison result cmp/ge rm,rn when signed and rn 3 rm, 1 ? t 0011nnnnmmmm0011 1 comparison result cmp/hi rm,rn when unsigned and rn > rm, 1 ? t 0011nnnnmmmm0110 1 comparison result cmp/gt rm,rn when signed and rn > rm, 1 ? t 0011nnnnmmmm0111 1 comparison result cmp/str rm,rn when a byte in rn equals a bytes in rm, 1 ? t 0010nnnnmmmm1100 1 comparison result div1 rm,rn 1 step division (rn rm) 0011nnnnmmmm0100 1 calculation result div0s rm,rn msb of rn ? q, msb of rm ? m, m ^ q ? t 0010nnnnmmmm0111 1 calculation result dmuls.l rm,rn signed operation of rn x rm ? mach, macl 0011nnnnmmmm1101 2 (to 5) * 2 dmulu.l rm,rn unsigned operation of rn rm ? mach, macl 0011nnnnmmmm0101 2 (to 5) * 2 exts.b rm,rn sign C extend rm from byte ? rn 0110nnnnmmmm1110 1 exts.w rm,rn sign C extend rm from word ? rn 0110nnnnmmmm1111 1 extu.b rm,rn zero C extend rm from byte ? rn 0110nnnnmmmm1100 1 extu.w rm,rn zero C extend rm from word ? rn 0110nnnnmmmm1101 1 fadd frm, frn * 1 frm + frn ? frn 1111nnnnmmmm0000 1 505 table a-4 source and destination operand (cont) instruction operation code cycles t bit fcmp/eq frm, frn * 1 frn = frm, 1 ? t 1111nnnnmmmm0100 1 comparison result fcmp/gt frm, frn * 1 frn > frm, 1 ? t 1111nnnnmmmm0101 1 comparison result fdiv frm, frn * 1 frn/frm ? frm 1111nnnnmmmm0011 13 fmac fr0,frm frn * 1 (fr0 frm) + frn ? frn 1111nnnnmmmm1110 1 fmov frm, frn * 1 frm ? frn 1111nnnnmmmm1100 1 fmul frm, frn * 1 frn frm ? frn 1111nnnnmmmm0010 1 fsub frm, frn * 1 frn C frm ? frn 1111nnnnmmmm0001 1 mov rm,rn rm ? rn 0110nnnnmmmm0011 1 mul.l rm,rn rn rm ? 
mac    0000nnnnmmmm0111    2 (to 5) *2
muls.w rm,rn    with sign, rn × rm → mac    0010nnnnmmmm1111    1 (to 3) *2
mulu.w rm,rn    unsigned, rn × rm → mac    0010nnnnmmmm1110    1 (to 3) *2
neg rm,rn    0 - rm → rn    0110nnnnmmmm1011    1
negc rm,rn    0 - rm - t → rn, borrow → t    0110nnnnmmmm1010    1    borrow
not rm,rn    ~rm → rn    0110nnnnmmmm0111    1
or rm,rn    rn | rm → rn    0010nnnnmmmm1011    1
shad rm,rn    rn ≥ 0: rn << rm → rn; rn < 0: rn >> rm → (msb →) rn    0100nnnnmmmm1100    1
shld rm,rn    rn ≥ 0: rn << rm → rn; rn < 0: rn >> rm → (0 →) rn    0100nnnnmmmm1101    1
sub rm,rn    rn - rm → rn    0011nnnnmmmm1000    1
subc rm,rn    rn - rm - t → rn, borrow → t    0011nnnnmmmm1010    1    borrow
subv rm,rn    rn - rm → rn, underflow → t    0011nnnnmmmm1011    1    underflow
swap.b rm,rn    rm → swap upper and lower halves of lower 2 bytes → rn    0110nnnnmmmm1000    1
swap.w rm,rn    rm → swap upper and lower word → rn    0110nnnnmmmm1001    1
tst rm,rn    rn & rm, when result is 0, 1 → t    0010nnnnmmmm1000    1    test results
xor rm,rn    rn ^ rm → rn    0010nnnnmmmm1010    1
xtrct rm,rn    rm: center 32 bits of rn → rn    0010nnnnmmmm1101    1
notes:
1. floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.
2. normal minimum number of execution states (the number in parentheses is the number of states when there is contention with preceding/following instructions).

table a-5 load and store with control register or system register
instruction    operation    code    cycles    t bit
flds frm,fpul *1    frm → fpul    1111mmmm00011101    1
ldc rm,sr    rm → sr    0100mmmm00001110    5    lsb
ldc rm,gbr    rm → gbr    0100mmmm00011110    1/3 *2
ldc rm,vbr    rm → vbr    0100mmmm00101110    1/3 *2
ldc rm,ssr    rm → ssr    0100mmmm00111110    1/3 *2
ldc rm,spc    rm → spc    0100mmmm01001110    1/3 *2
ldc rm,mod *3    rm → mod    0100mmmm01011110    3
ldc rm,re *3    rm → re    0100mmmm01111110    3
ldc rm,rs *3    rm → rs    0100mmmm01101110    3
ldc rm,r0_bank    rm → r0_bank    0100mmmm10001110    1/3 *2
ldc rm,r1_bank    rm → r1_bank    0100mmmm10011110    1/3 *2
ldc rm,r2_bank    rm → r2_bank    0100mmmm10101110    1/3 *2
ldc rm,r3_bank    rm → r3_bank    0100mmmm10111110    1/3 *2
ldc rm,r4_bank    rm → r4_bank    0100mmmm11001110    1/3 *2
ldc rm,r5_bank    rm → r5_bank    0100mmmm11011110    1/3 *2
ldc rm,r6_bank    rm → r6_bank    0100mmmm11101110    1/3 *2
ldc rm,r7_bank    rm → r7_bank    0100mmmm11111110    1/3 *2
lds rm,fpscr *1    rm → fpscr    0100mmmm01101010    1
lds rm,fpul *1    rm → fpul    0100mmmm01011010    1
lds rm,mach    rm → mach    0100mmmm00001010    1
lds rm,macl    rm → macl    0100mmmm00011010    1
lds rm,pr    rm → pr    0100mmmm00101010    1
lds rm,dsr *3    rm → dsr    0100mmmm01101010    1
lds rm,a0 *3    rm → a0    0100mmmm01111010    1
lds rm,x0 *3    rm → x0    0100mmmm10001010    1
lds rm,x1 *3    rm → x1    0100mmmm10011010    1
lds rm,y0 *3    rm → y0    0100mmmm10101010    1
lds rm,y1 *3    rm → y1    0100mmmm10111010    1
setrc rm *3    lsw of rm → rc (msw of sr), repeat control flag → rf1, rf0    0100mmmm00010100    3
notes:
1. floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.
2. three cycles on the sh3-dsp.
3. cpu instructions to provide support for dsp functions. these instructions can only be used with the sh3-dsp.

table a-6 load and store from control register or system register
instruction    operation    code    cycles    t bit
fsts fpul,frn *1    fpul → frn    1111nnnn00001101    1
stc sr,rn    sr → rn    0000nnnn00000010    1
stc gbr,rn    gbr → rn    0000nnnn00010010    1
stc vbr,rn    vbr → rn    0000nnnn00100010    1
stc ssr,rn    ssr → rn    0000nnnn00110010    1
stc spc,rn    spc → rn    0000nnnn01000010    1
stc mod,rn *2    mod → rn    0000nnnn01010010    1
stc re,rn *2    re → rn    0000nnnn01110010    1
stc rs,rn *2    rs → rn    0000nnnn01100010    1
stc r0_bank,rn    r0_bank → rn    0000nnnn10000010    1
stc r1_bank,rn    r1_bank → rn    0000nnnn10010010    1
stc r2_bank,rn    r2_bank → rn    0000nnnn10100010    1
stc r3_bank,rn    r3_bank → rn    0000nnnn10110010    1
stc r4_bank,rn    r4_bank → rn    0000nnnn11000010    1
stc r5_bank,rn    r5_bank → rn    0000nnnn11010010    1
stc r6_bank,rn    r6_bank → rn    0000nnnn11100010    1
stc r7_bank,rn    r7_bank → rn    0000nnnn11110010    1
sts fpscr,rn *1    fpscr → rn    0000nnnn01101010    1
sts fpul,rn *1    fpul → rn    0000nnnn01011010    1
sts mach,rn    mach → rn    0000nnnn00001010    1
sts macl,rn    macl → rn    0000nnnn00011010    1
sts pr,rn    pr → rn    0000nnnn00101010    1
sts dsr,rn *2    dsr → rn    0000nnnn01101010    1
sts a0,rn *2    a0 → rn    0000nnnn01111010    1
sts x0,rn *2    x0 → rn    0000nnnn10001010    1
sts x1,rn *2    x1 → rn    0000nnnn10011010    1
sts y0,rn *2    y0 → rn    0000nnnn10101010    1
sts y1,rn *2    y1 → rn    0000nnnn10111010    1
notes:
1. floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.
2. cpu instructions to provide support for dsp functions. these instructions can only be used with the sh3-dsp.

a.1.3 indirect register addressing

table a-7 source operand only
instruction    operation    code    cycles    t bit
jmp @rn    delayed branching, rn → pc    0100nnnn00101011    2
jsr @rn    delayed branching, pc → pr, rn → pc    0100nnnn00001011    2
pref @rn    (rn) → cache    0000nnnn10000011    1
note: * two cycles on the sh3-dsp.

table a-8 destination operand only
instruction    operation    code    cycles    t bit
tas.b @rn    when (rn) is 0, 1 → t, 1 → msb of (rn)    0100nnnn00011011    3/4 *    test results
note: * four cycles on the sh3-dsp.

table a-9 data transfer direct to register
instruction    operation    code    cycles    t bit
fmov.s frm,@rn *    frm → (rn)    1111nnnnmmmm1010    1
fmov.s @rm,frn *    (rm) → frn    1111nnnnmmmm1000    1
mov.b rm,@rn    rm → (rn)    0010nnnnmmmm0000    1
mov.w rm,@rn    rm → (rn)    0010nnnnmmmm0001    1
mov.l rm,@rn    rm → (rn)    0010nnnnmmmm0010    1
mov.b @rm,rn    (rm) → sign extension → rn    0110nnnnmmmm0000    1
mov.w @rm,rn    (rm) → sign extension → rn    0110nnnnmmmm0001    1
mov.l @rm,rn    (rm) → rn    0110nnnnmmmm0010    1
note: * floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.
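
The code column of these tables gives each 16-bit instruction word as a bit pattern in which the letters n and m mark the 4-bit register-number fields. As an illustration only, the Python sketch below fills such a pattern with concrete register numbers. The helper name `encode` is hypothetical and is not part of any Hitachi or Renesas tool; it simply applies the patterns as they are printed in the tables.

```python
def encode(pattern: str, **fields: int) -> int:
    """Fill the letter fields of a 16-character code pattern.

    pattern: a code string from the tables, e.g. "0010nnnnmmmm0010".
    fields:  one integer per letter used in the pattern, e.g. n=1, m=2.
    Returns the 16-bit instruction word as an int.
    (Hypothetical helper for illustration, not a vendor API.)
    """
    if len(pattern) != 16:
        raise ValueError("pattern must be 16 characters long")

    # Number of bits still to be emitted for each letter field.
    bits_left = {c: pattern.count(c) for c in set(pattern) if c not in "01"}
    for name, width in bits_left.items():
        if not 0 <= fields[name] < (1 << width):
            raise ValueError(f"field {name!r} does not fit in {width} bits")

    word = 0
    for ch in pattern:  # walk from msb to lsb
        if ch in "01":
            bit = int(ch)
        else:
            bits_left[ch] -= 1
            bit = (fields[ch] >> bits_left[ch]) & 1
        word = (word << 1) | bit
    return word


# mov.l rm,@rn with m = 2, n = 1 (pattern from table a-9)
print(hex(encode("0010nnnnmmmm0010", n=1, m=2)))
# jmp @rn with n = 3 (pattern from table a-7)
print(hex(encode("0100nnnn00101011", n=3)))
```

The same helper works for the d and i fields of the displacement and immediate formats, provided the caller passes the already scaled field value rather than a byte offset.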
a.1.4 post-increment indirect register addressing

table a-10 multiply/accumulate operation
instruction    operation    code    cycles    t bit
mac.l @rm+,@rn+    signed operation of (rn) × (rm) + mac → mac    0000nnnnmmmm1111    2 (to 5) *
mac.w @rm+,@rn+    signed operation of (rn) × (rm) + mac → mac    0100nnnnmmmm1111    2 (to 5) *
note: * normal minimum number of execution states (the number in parentheses is the number of states when there is contention with preceding/following instructions).

table a-11 data transfer direct from register
instruction    operation    code    cycles    t bit
fmov.s @rm+,frn *    (rm) → frn, rm + 4 → rm    1111nnnnmmmm1001    1
mov.b @rm+,rn    (rm) → sign extension → rn, rm + 1 → rm    0110nnnnmmmm0100    1
mov.w @rm+,rn    (rm) → sign extension → rn, rm + 2 → rm    0110nnnnmmmm0101    1
mov.l @rm+,rn    (rm) → rn, rm + 4 → rm    0110nnnnmmmm0110    1
note: * floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.

table a-12 load to control register or system register
instruction    operation    code    cycles    t bit
ldc.l @rm+,sr    (rm) → sr, rm + 4 → rm    0100mmmm00000111    7    lsb
ldc.l @rm+,gbr    (rm) → gbr, rm + 4 → rm    0100mmmm00010111    1/5 *2
ldc.l @rm+,vbr    (rm) → vbr, rm + 4 → rm    0100mmmm00100111    1/5 *2
ldc.l @rm+,ssr    (rm) → ssr, rm + 4 → rm    0100mmmm00110111    1/5 *2
ldc.l @rm+,spc    (rm) → spc, rm + 4 → rm    0100mmmm01000111    1/5 *2
ldc.l @rm+,mod *3    (rm) → mod, rm + 4 → rm    0100mmmm01010111    5
ldc.l @rm+,re *3    (rm) → re, rm + 4 → rm    0100mmmm01110111    5
ldc.l @rm+,rs *3    (rm) → rs, rm + 4 → rm    0100mmmm01100111    5
ldc.l @rm+,r0_bank    (rm) → r0_bank, rm + 4 → rm    0100mmmm10000111    1/5 *2
ldc.l @rm+,r1_bank    (rm) → r1_bank, rm + 4 → rm    0100mmmm10010111    1/5 *2
ldc.l @rm+,r2_bank    (rm) → r2_bank, rm + 4 → rm    0100mmmm10100111    1/5 *2
ldc.l @rm+,r3_bank    (rm) → r3_bank, rm + 4 → rm    0100mmmm10110111    1/5 *2
ldc.l @rm+,r4_bank    (rm) → r4_bank, rm + 4 → rm    0100mmmm11000111    1/5 *2
ldc.l @rm+,r5_bank    (rm) → r5_bank, rm + 4 → rm    0100mmmm11010111    1/5 *2
ldc.l @rm+,r6_bank    (rm) → r6_bank, rm + 4 → rm    0100mmmm11100111    1/5 *2
ldc.l @rm+,r7_bank    (rm) → r7_bank, rm + 4 → rm    0100mmmm11110111    1/5 *2
lds.l @rm+,fpscr *1    (rm) → fpscr, rm + 4 → rm    0100mmmm01100110    1
lds.l @rm+,fpul *1    (rm) → fpul, rm + 4 → rm    0100mmmm01010110    1
lds.l @rm+,mach    (rm) → mach, rm + 4 → rm    0100mmmm00000110    1
lds.l @rm+,macl    (rm) → macl, rm + 4 → rm    0100mmmm00010110    1
lds.l @rm+,pr    (rm) → pr, rm + 4 → rm    0100mmmm00100110    1
lds.l @rm+,dsr *3    (rm) → dsr, rm + 4 → rm    0100mmmm01100110    1
lds.l @rm+,a0 *3    (rm) → a0, rm + 4 → rm    0100mmmm01110110    1
lds.l @rm+,x0 *3    (rm) → x0, rm + 4 → rm    0100mmmm10000110    1
lds.l @rm+,x1 *3    (rm) → x1, rm + 4 → rm    0100mmmm10010110    1
lds.l @rm+,y0 *3    (rm) → y0, rm + 4 → rm    0100mmmm10100110    1
lds.l @rm+,y1 *3    (rm) → y1, rm + 4 → rm    0100mmmm10110110    1
notes:
1. floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.
2. five cycles on the sh3-dsp.
3. cpu instructions to provide support for dsp functions. these instructions can only be used with the sh3-dsp.

a.1.5 pre-decrement indirect register addressing

table a-13 data transfer direct from register
instruction    operation    code    cycles    t bit
fmov.s frm,@-rn *    rn - 4 → rn, frm → (rn)    1111nnnnmmmm1011    1
mov.b rm,@-rn    rn - 1 → rn, rm → (rn)    0010nnnnmmmm0100    1
mov.w rm,@-rn    rn - 2 → rn, rm → (rn)    0010nnnnmmmm0101    1
mov.l rm,@-rn    rn - 4 → rn, rm → (rn)    0010nnnnmmmm0110    1
note: * floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.

table a-14 store from control register or system register
instruction    operation    code    cycles    t bit
stc.l sr,@-rn    rn - 4 → rn, sr → (rn)    0100nnnn00000011    1/2 *2
stc.l gbr,@-rn    rn - 4 → rn, gbr → (rn)    0100nnnn00010011    1/2 *2
stc.l vbr,@-rn    rn - 4 → rn, vbr → (rn)    0100nnnn00100011    1/2 *2
stc.l ssr,@-rn    rn - 4 → rn, ssr → (rn)    0100nnnn00110011    1/2 *2
stc.l spc,@-rn    rn - 4 → rn, spc → (rn)    0100nnnn01000011    1/2 *2
stc.l mod,@-rn *3    rn - 4 → rn, mod → (rn)    0100nnnn01010011    2
stc.l re,@-rn *3    rn - 4 → rn, re → (rn)    0100nnnn01110011    2
stc.l rs,@-rn *3    rn - 4 → rn, rs → (rn)    0100nnnn01100011    2
stc.l r0_bank,@-rn    rn - 4 → rn, r0_bank → (rn)    0100nnnn10000011    2
stc.l r1_bank,@-rn    rn - 4 → rn, r1_bank → (rn)    0100nnnn10010011    2
stc.l r2_bank,@-rn    rn - 4 → rn, r2_bank → (rn)    0100nnnn10100011    2
stc.l r3_bank,@-rn    rn - 4 → rn, r3_bank → (rn)    0100nnnn10110011    2
stc.l r4_bank,@-rn    rn - 4 → rn, r4_bank → (rn)    0100nnnn11000011    2
stc.l r5_bank,@-rn    rn - 4 → rn, r5_bank → (rn)    0100nnnn11010011    2
stc.l r6_bank,@-rn    rn - 4 → rn, r6_bank → (rn)    0100nnnn11100011    2
stc.l r7_bank,@-rn    rn - 4 → rn, r7_bank → (rn)    0100nnnn11110011    2
sts.l fpscr,@-rn *1    rn - 4 → rn, fpscr → (rn)    0100nnnn01100010    1
sts.l fpul,@-rn *1    rn - 4 → rn, fpul → (rn)    0100nnnn01010010    1
sts.l mach,@-rn    rn - 4 → rn, mach → (rn)    0100nnnn00000010    1
sts.l macl,@-rn    rn - 4 → rn, macl → (rn)    0100nnnn00010010    1
sts.l pr,@-rn    rn - 4 → rn, pr → (rn)    0100nnnn00100010    1
sts.l dsr,@-rn *3    rn - 4 → rn, dsr → (rn)    0100nnnn01100010    1
sts.l a0,@-rn *3    rn - 4 → rn, a0 → (rn)    0100nnnn01110010    1
sts.l x0,@-rn *3    rn - 4 → rn, x0 → (rn)    0100nnnn10000010    1
sts.l x1,@-rn *3    rn - 4 → rn, x1 → (rn)    0100nnnn10010010    1
sts.l y0,@-rn *3    rn - 4 → rn, y0 → (rn)    0100nnnn10100010    1
sts.l y1,@-rn *3    rn - 4 → rn, y1 → (rn)    0100nnnn10110010    1
notes:
1. floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.
2. two cycles on the sh3-dsp.
3. cpu instructions to provide support for dsp functions. these instructions can only be used with the sh3-dsp.
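
The operation columns of the pre-decrement and post-increment tables describe a push/pop pair: mov.l rm,@-rn subtracts 4 from rn and then stores, while mov.l @rm+,rn loads and then adds 4 to rm. The Python sketch below models just those two operations on a toy register file. The class name `Cpu`, its method names and the dictionary-based memory are assumptions made for illustration; timing, alignment checks and exceptions are not modelled.

```python
MASK32 = 0xFFFFFFFF


class Cpu:
    """Toy model of sixteen 32-bit registers and a sparse longword memory.

    Hypothetical illustration only; it is not a simulator of the real device.
    """

    def __init__(self) -> None:
        self.r = [0] * 16   # general registers r0..r15
        self.mem = {}       # address -> 32-bit longword

    def mov_l_predec(self, m: int, n: int) -> None:
        """mov.l rm,@-rn : rn - 4 -> rn, rm -> (rn)."""
        addr = (self.r[n] - 4) & MASK32
        self.mem[addr] = self.r[m] & MASK32
        self.r[n] = addr

    def mov_l_postinc(self, m: int, n: int) -> None:
        """mov.l @rm+,rn : (rm) -> rn, rm + 4 -> rm."""
        value = self.mem[self.r[m]]
        self.r[m] = (self.r[m] + 4) & MASK32
        self.r[n] = value


cpu = Cpu()
cpu.r[15] = 0x1000            # r15 used as a stack pointer in this example
cpu.r[1] = 0xDEADBEEF
cpu.mov_l_predec(1, 15)       # push r1
cpu.mov_l_postinc(15, 2)      # pop into r2
print(hex(cpu.r[2]), hex(cpu.r[15]))
```

The stc.l and sts.l rows of table a-14 follow the same address arithmetic, with a control or system register as the source instead of rm.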
a.1.6 indirect register addressing with displacement

table a-15 indirect register addressing with displacement
instruction    operation    code    cycles    t bit
mov.b r0,@(disp,rn)    r0 → (disp + rn)    10000000nnnndddd    1
mov.w r0,@(disp,rn)    r0 → (disp × 2 + rn)    10000001nnnndddd    1
mov.l rm,@(disp,rn)    rm → (disp × 4 + rn)    0001nnnnmmmmdddd    1
mov.b @(disp,rm),r0    (disp + rm) → sign extension → r0    10000100mmmmdddd    1
mov.w @(disp,rm),r0    (disp × 2 + rm) → sign extension → r0    10000101mmmmdddd    1
mov.l @(disp,rm),rn    (disp × 4 + rm) → rn    0101nnnnmmmmdddd    1

a.1.7 indirect indexed register addressing

table a-16 indirect indexed register addressing
instruction    operation    code    cycles    t bit
mov.b rm,@(r0,rn)    rm → (r0 + rn)    0000nnnnmmmm0100    1
mov.w rm,@(r0,rn)    rm → (r0 + rn)    0000nnnnmmmm0101    1
mov.l rm,@(r0,rn)    rm → (r0 + rn)    0000nnnnmmmm0110    1
fmov.s frm,@(r0,rn) *    frm → (r0 + rn)    1111nnnnmmmm0111    1
mov.b @(r0,rm),rn    (r0 + rm) → sign extension → rn    0000nnnnmmmm1100    1
mov.w @(r0,rm),rn    (r0 + rm) → sign extension → rn    0000nnnnmmmm1101    1
mov.l @(r0,rm),rn    (r0 + rm) → rn    0000nnnnmmmm1110    1
fmov.s @(r0,rm),frn *    (r0 + rm) → frn    1111nnnnmmmm0110    1
note: * floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.

a.1.8 indirect gbr addressing with displacement

table a-17 indirect gbr addressing with displacement
instruction    operation    code    cycles    t bit
mov.b r0,@(disp,gbr)    r0 → (disp + gbr)    11000000dddddddd    1
mov.w r0,@(disp,gbr)    r0 → (disp × 2 + gbr)    11000001dddddddd    1
mov.l r0,@(disp,gbr)    r0 → (disp × 4 + gbr)    11000010dddddddd    1
mov.b @(disp,gbr),r0    (disp + gbr) → sign extension → r0    11000100dddddddd    1
mov.w @(disp,gbr),r0    (disp × 2 + gbr) → sign extension → r0    11000101dddddddd    1
mov.l @(disp,gbr),r0    (disp × 4 + gbr) → r0    11000110dddddddd    1

a.1.9 indirect indexed gbr addressing

table a-18 indirect indexed gbr addressing
instruction    operation    code    cycles    t bit
and.b #imm,@(r0,gbr)    (r0 + gbr) & imm → (r0 + gbr)    11001101iiiiiiii    3
or.b #imm,@(r0,gbr)    (r0 + gbr) | imm → (r0 + gbr)    11001111iiiiiiii    3
tst.b #imm,@(r0,gbr)    (r0 + gbr) & imm, when result is 0, 1 → t    11001100iiiiiiii    3    test results
xor.b #imm,@(r0,gbr)    (r0 + gbr) ^ imm → (r0 + gbr)    11001110iiiiiiii    3

a.1.10 pc relative addressing with displacement

table a-19 pc relative addressing with displacement
instruction    operation    code    cycles    t bit
mov.w @(disp,pc),rn    (disp × 2 + pc) → sign extension → rn    1001nnnndddddddd    1
mov.l @(disp,pc),rn    (disp × 4 + pc) → rn    1101nnnndddddddd    1
mova @(disp,pc),r0    disp × 4 + pc → r0    11000111dddddddd    1
ldrs @(disp,pc) *    disp × 2 + pc → rs    10001100dddddddd    3
ldre @(disp,pc) *    disp × 2 + pc → re    10001110dddddddd    3
note: * sh3-dsp instructions.

a.1.11 pc relative addressing

table a-20 pc relative addressing with rm
instruction    operation    code    cycles    t bit
braf rm    delayed branch, rm + pc → pc    0000mmmm00100011    2
bsrf rm    delayed branch, pc → pr, rm + pc → pc    0000mmmm00000011    2

table a-21 pc relative addressing
instruction    operation    code    cycles    t bit
bf label    when t = 0, disp × 2 + pc → pc; when t = 1, nop    10001011dddddddd    3/1
bf/s label    if t = 0, disp × 2 + pc → pc; if t = 1, nop    10001111dddddddd    2/1 *
bt label    when t = 1, disp × 2 + pc → pc; when t = 0, nop    10001001dddddddd    3/1
bt/s label    if t = 1, disp × 2 + pc → pc; if t = 0, nop    10001101dddddddd    2/1 *
bra label    delayed branching, disp × 2 + pc → pc    1010dddddddddddd    2
bsr label    delayed branching, pc → pr, disp × 2 + pc → pc    1011dddddddddddd    2
note: * one state when it does not branch.

a.1.12 immediate

table a-22 load to register
instruction    operation    code    cycles    t bit
fldi0 frn *    0.0 → frn    1111nnnn10001101    1
fldi1 frn *    1.0 → frn    1111nnnn10011101    1
note: * floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.

table a-23 arithmetic logical operations direct with register
instruction    operation    code    cycles    t bit
add #imm,rn    rn + #imm → rn    0111nnnniiiiiiii    1
and #imm,r0    r0 & imm → r0    11001001iiiiiiii    1
cmp/eq #imm,r0    when r0 = imm, 1 → t    10001000iiiiiiii    1    comparison result
mov #imm,rn    #imm → sign extension → rn    1110nnnniiiiiiii    1
or #imm,r0    r0 | imm → r0    11001011iiiiiiii    1
tst #imm,r0    r0 & imm, when result is 0, 1 → t    11001000iiiiiiii    1    test results
xor #imm,r0    r0 ^ imm → r0    11001010iiiiiiii    1

table a-24 specify exception processing vector
instruction    operation    code    cycles    t bit
trapa #imm    imm → tra, pc → spc, sr → ssr, 1 → sr.md/bl/rb, 0x160 → expevt, vbr + h'00000100 → pc    11000011iiiiiiii    6/8 *
note: * eight cycles on the sh3-dsp.

table a-25 load to control register
instruction    operation    code    cycles    t bit
setrc #imm *    imm → rc(sr[23:16]), zeros → sr[27:24]    10000010iiiiiiii    3
note: * sh3-dsp instruction.

a.2 instruction sets by instruction format

tables a-26 to a-57 list instruction codes and execution cycles by instruction format.

table a-26 instruction sets by format (number of instruction types per cpu; "-" means not available)
format    category    sample instruction    sh-3    sh-3e    sh3-dsp
0    -    nop    11    11    11
n    direct register addressing    movt rn    18    18    18
n    direct register addressing (store with control or system registers)    sts mach,rn    18    18    25
n    indirect register addressing    tas.b @rn    1    1    1
n    pre-decrement indirect register addressing    stc.l sr,@-rn    16    18    25
n    floating point instruction    fabs frn    -    7    -
m    direct register addressing (load with control or system registers)    ldc rm,sr    16    18    26
m    pc relative addressing with rm    braf rm    2    2    2
m    indirect register addressing    jmp @rm    2    2    2
m    post-increment indirect register addressing    ldc.l @rm+,sr    16    18    25
m    floating point instruction    flds frm,fpul    -    2    -
nm    direct register addressing    add rm,rn    36    36    36
nm    indirect register addressing    mov.l rm,@rn    6    6    6
nm    post-increment indirect register addressing (multiply/accumulate operation)    mac.w @rm+,@rn+    2    2    2
nm    post-increment indirect register addressing    mov.l @rm+,rn    3    3    3
nm    pre-decrement indirect register addressing    mov.l rm,@-rn    3    3    3
nm    indirect indexed register addressing    mov.l rm,@(r0,rn)    6    6    6
nm    floating point instruction    fadd frm,frn    -    14    -
md    indirect register addressing with displacement    mov.b @(disp,rm),r0    2    2    2
nd4    indirect register addressing with displacement    mov.b r0,@(disp,rn)    2    2    2
nmd    indirect register addressing with displacement    mov.l rm,@(disp,rn)    2    2    2
d    indirect gbr addressing with displacement    mov.l r0,@(disp,gbr)    6    6    6
d    indirect pc addressing with displacement    mova @(disp,pc),r0    1    1    3
d    pc relative addressing    bf label    4    4    4
d12    pc relative addressing    bra label    2    2    2
nd8    pc relative addressing with displacement    mov.l @(disp,pc),rn    2    2    2
i    indirect indexed gbr addressing    and.b #imm,@(r0,gbr)    4    4    4
i    immediate addressing (arithmetic and logical operations direct with register)    and #imm,r0    5    5    5
i    immediate addressing (specify exception processing vector)    trapa #imm    1    1    1
i    load to control register (sh3-dsp only)    setrc #imm    -    -    1
ni    immediate addressing (direct register arithmetic operations and data transfers)    add #imm,rn    2    2    2
total:    189    220    227
note: * the figures in parentheses ( ) are the totals excluding the sh-3e instructions.

a.2.1 0 format

table a-27 0 format
instruction    operation    code    cycles    t bit
clrs    0 → s    0000000001001000    1
clrt    0 → t    0000000000001000    1    0
clrmac    0 → mach, macl    0000000000101000    1
div0u    0 → m/q/t    0000000000011001    1    0
ldtlb    pteh/ptel → tlb    0000000000111000    1
nop    no operation    0000000000001001    1
rte    delayed branch, ssr/spc → sr/pc    0000000000101011    4
rts    delayed branching, pr → pc    0000000000001011    2
sets    1 → s    0000000001011000    1
sett    1 → t    0000000000011000    1    1
sleep    sleep    0000000000011011    4 *
note: * this is the number of states until a transition is made to the sleep state.

a.2.2 n format

table a-28 direct register
instruction    operation    code    cycles    t bit
cmp/pl rn    rn > 0, 1 → t    0100nnnn00010101    1    comparison result
cmp/pz rn    rn ≥ 0, 1 → t    0100nnnn00010001    1    comparison result
dt rn    rn - 1 → rn, when rn is 0, 1 → t. when rn is nonzero, 0 → t    0100nnnn00010000    1    comparison result
movt rn    t → rn    0000nnnn00101001    1
rotl rn    t ← rn ← msb    0100nnnn00000100    1    msb
rotr rn    lsb → rn → t    0100nnnn00000101    1    lsb
rotcl rn    t ← rn ← t    0100nnnn00100100    1    msb
rotcr rn    t → rn → t    0100nnnn00100101    1    lsb
shal rn    t ← rn ← 0    0100nnnn00100000    1    msb
shar rn    msb → rn → t    0100nnnn00100001    1    lsb
shll rn    t ← rn ← 0    0100nnnn00000000    1    msb
shlr rn    0 → rn → t    0100nnnn00000001    1    lsb
shll2 rn    rn << 2 → rn    0100nnnn00001000    1
shlr2 rn    rn >> 2 → rn    0100nnnn00001001    1
shll8 rn    rn << 8 → rn    0100nnnn00011000    1
shlr8 rn    rn >> 8 → rn    0100nnnn00011001    1
shll16 rn    rn << 16 → rn    0100nnnn00101000    1
shlr16 rn    rn >> 16 → rn    0100nnnn00101001    1

table a-29 direct register (store with control and system registers)
instruction    operation    code    cycles    t bit
stc sr,rn    sr → rn    0000nnnn00000010    1
stc gbr,rn    gbr → rn    0000nnnn00010010    1
stc vbr,rn    vbr → rn    0000nnnn00100010    1
stc ssr,rn    ssr → rn    0000nnnn00110010    1
stc spc,rn    spc → rn    0000nnnn01000010    1
stc mod,rn *2    mod → rn    0000nnnn01010010    1
stc re,rn *2    re → rn    0000nnnn01110010    1
stc rs,rn *2    rs → rn    0000nnnn01100010    1
stc r0_bank,rn    r0_bank → rn    0000nnnn10000010    1
stc r1_bank,rn    r1_bank → rn    0000nnnn10010010    1
stc r2_bank,rn    r2_bank → rn    0000nnnn10100010    1
stc r3_bank,rn    r3_bank → rn    0000nnnn10110010    1
stc r4_bank,rn    r4_bank → rn    0000nnnn11000010    1
stc r5_bank,rn    r5_bank → rn    0000nnnn11010010    1
stc r6_bank,rn    r6_bank → rn    0000nnnn11100010    1
stc r7_bank,rn    r7_bank → rn    0000nnnn11110010    1
sts fpscr,rn *1    fpscr → rn    0000nnnn01101010    1
sts fpul,rn *1    fpul → rn    0000nnnn01011010    1
sts mach,rn    mach → rn    0000nnnn00001010    1
sts macl,rn    macl → rn    0000nnnn00011010    1
sts pr,rn    pr → rn    0000nnnn00101010    1
sts dsr,rn *2    dsr → rn    0000nnnn01101010    1
sts a0,rn *2    a0 → rn    0000nnnn01111010    1
sts x0,rn *2    x0 → rn    0000nnnn10001010    1
sts x1,rn *2    x1 → rn    0000nnnn10011010    1
sts y0,rn *2    y0 → rn    0000nnnn10101010    1
sts y1,rn *2    y1 → rn    0000nnnn10111010    1
notes:
1. sh-3e instructions.
2. sh3-dsp instructions.
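
The one-bit shift and rotate entries of table a-28 differ only in which bit enters rn and which bit leaves into t. The Python sketch below spells that out for four of them on 32-bit values. The function names mirror the mnemonics but are otherwise hypothetical, and each function returns the pair (new rn, new t) rather than updating any cpu state.

```python
MASK32 = 0xFFFFFFFF
MSB = 0x80000000


def rotl(rn: int) -> tuple[int, int]:
    """rotl: the msb leaves into t and also re-enters at the lsb."""
    t = (rn >> 31) & 1
    return ((rn << 1) & MASK32) | t, t


def rotcl(rn: int, t: int) -> tuple[int, int]:
    """rotcl: the old t enters at the lsb, the msb leaves into t."""
    new_t = (rn >> 31) & 1
    return ((rn << 1) & MASK32) | (t & 1), new_t


def shar(rn: int) -> tuple[int, int]:
    """shar: arithmetic right shift, msb is kept, lsb leaves into t."""
    return (rn >> 1) | (rn & MSB), rn & 1


def shlr(rn: int) -> tuple[int, int]:
    """shlr: logical right shift, 0 enters at the msb, lsb leaves into t."""
    return rn >> 1, rn & 1


print([hex(v) for v in rotl(0x80000001)])
print([hex(v) for v in shar(0x80000003)])
print([hex(v) for v in shlr(0x80000003)])
```

Chaining rotcl across two registers gives a 64-bit left shift, which is the usual reason for routing the outgoing bit through t.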
table a-30 indirect register
instruction    operation    code    cycles    t bit
tas.b @rn    when (rn) is 0, 1 → t, 1 → msb of (rn)    0100nnnn00011011    3/4 *    test results
note: * four cycles on the sh3-dsp.

table a-31 indirect pre-decrement register
instruction    operation    code    cycles    t bit
stc.l sr,@-rn    rn - 4 → rn, sr → (rn)    0100nnnn00000011    1/2 *2
stc.l gbr,@-rn    rn - 4 → rn, gbr → (rn)    0100nnnn00010011    1/2 *2
stc.l vbr,@-rn    rn - 4 → rn, vbr → (rn)    0100nnnn00100011    1/2 *2
stc.l ssr,@-rn    rn - 4 → rn, ssr → (rn)    0100nnnn00110011    1/2 *2
stc.l spc,@-rn    rn - 4 → rn, spc → (rn)    0100nnnn01000011    1/2 *2
stc.l mod,@-rn *3    rn - 4 → rn, mod → (rn)    0100nnnn01010011    2
stc.l re,@-rn *3    rn - 4 → rn, re → (rn)    0100nnnn01110011    2
stc.l rs,@-rn *3    rn - 4 → rn, rs → (rn)    0100nnnn01100011    2
stc.l r0_bank,@-rn    rn - 4 → rn, r0_bank → (rn)    0100nnnn10000011    2
stc.l r1_bank,@-rn    rn - 4 → rn, r1_bank → (rn)    0100nnnn10010011    2
stc.l r2_bank,@-rn    rn - 4 → rn, r2_bank → (rn)    0100nnnn10100011    2
stc.l r3_bank,@-rn    rn - 4 → rn, r3_bank → (rn)    0100nnnn10110011    2
stc.l r4_bank,@-rn    rn - 4 → rn, r4_bank → (rn)    0100nnnn11000011    2
stc.l r5_bank,@-rn    rn - 4 → rn, r5_bank → (rn)    0100nnnn11010011    2
stc.l r6_bank,@-rn    rn - 4 → rn, r6_bank → (rn)    0100nnnn11100011    2
stc.l r7_bank,@-rn    rn - 4 → rn, r7_bank → (rn)    0100nnnn11110011    2
sts.l fpscr,@-rn *1    rn - 4 → rn, fpscr → (rn)    0100nnnn01100010    1
sts.l fpul,@-rn *1    rn - 4 → rn, fpul → (rn)    0100nnnn01010010    1
sts.l mach,@-rn    rn - 4 → rn, mach → (rn)    0100nnnn00000010    1
sts.l macl,@-rn    rn - 4 → rn, macl → (rn)    0100nnnn00010010    1
sts.l pr,@-rn    rn - 4 → rn, pr → (rn)    0100nnnn00100010    1
sts.l dsr,@-rn *3    rn - 4 → rn, dsr → (rn)    0100nnnn01100010    1
sts.l a0,@-rn *3    rn - 4 → rn, a0 → (rn)    0100nnnn01110010    1
sts.l x0,@-rn *3    rn - 4 → rn, x0 → (rn)    0100nnnn10000010    1
sts.l x1,@-rn *3    rn - 4 → rn, x1 → (rn)    0100nnnn10010010    1
sts.l y0,@-rn *3    rn - 4 → rn, y0 → (rn)    0100nnnn10100010    1
sts.l y1,@-rn *3    rn - 4 → rn, y1 → (rn)    0100nnnn10110010    1
notes:
1. sh-3e instructions.
2. two cycles on the sh3-dsp.
3. sh3-dsp instructions.

table a-32 floating point instructions (sh-3e only)
instruction    operation    code    cycles    t bit
fabs frn    |frn| → frn    1111nnnn01011101    1
fldi0 frn    h'00000000 → frn    1111nnnn10001101    1
fldi1 frn    h'3f800000 → frn    1111nnnn10011101    1
float fpul,frn    (float)fpul → frn    1111nnnn00101101    1
fneg frn    -frn → frn    1111nnnn01001101    1
fsqrt frn    √frn → frn    1111nnnn01101101    13
fsts fpul,frn    fpul → frn    1111nnnn00001101    1

a.2.3 m format

table a-33 direct register (load from control and system registers)
instruction    operation    code    cycles    t bit
ldc rm,sr    rm → sr    0100mmmm00001110    5    lsb
ldc rm,gbr    rm → gbr    0100mmmm00011110    1/3 *2
ldc rm,vbr    rm → vbr    0100mmmm00101110    1/3 *2
ldc rm,ssr    rm → ssr    0100mmmm00111110    1/3 *2
ldc rm,spc    rm → spc    0100mmmm01001110    1/3 *2
ldc rm,mod *3    rm → mod    0100mmmm01011110    3
ldc rm,re *3    rm → re    0100mmmm01111110    3
ldc rm,rs *3    rm → rs    0100mmmm01101110    3
ldc rm,r0_bank    rm → r0_bank    0100mmmm10001110    1/3 *2
ldc rm,r1_bank    rm → r1_bank    0100mmmm10011110    1/3 *2
ldc rm,r2_bank    rm → r2_bank    0100mmmm10101110    1/3 *2
ldc rm,r3_bank    rm → r3_bank    0100mmmm10111110    1/3 *2
ldc rm,r4_bank    rm → r4_bank    0100mmmm11001110    1/3 *2
ldc rm,r5_bank    rm → r5_bank    0100mmmm11011110    1/3 *2
ldc rm,r6_bank    rm → r6_bank    0100mmmm11101110    1/3 *2
ldc rm,r7_bank    rm → r7_bank    0100mmmm11111110    1/3 *2
lds rm,fpscr *1    rm → fpscr    0100mmmm01101010    1
lds rm,fpul *1    rm → fpul    0100mmmm01011010    1
lds rm,mach    rm → mach    0100mmmm00001010    1
lds rm,macl    rm → macl    0100mmmm00011010    1
lds rm,pr    rm → pr    0100mmmm00101010    1
lds rm,dsr *3    rm → dsr    0100mmmm01101010    1
lds rm,a0 *3    rm → a0    0100mmmm01111010    1
lds rm,x0 *3    rm → x0    0100mmmm10001010    1
lds rm,x1 *3    rm → x1    0100mmmm10011010    1
lds rm,y0 *3    rm → y0    0100mmmm10101010    1
lds rm,y1 *3    rm → y1    0100mmmm10111010    1
setrc rm *3    lsw of rm → rc (msw of sr), repeat control flag → rf1, rf0    0100mmmm00010100    3
notes:
1. sh-3e instructions.
2. three cycles on the sh3-dsp.
3. sh3-dsp instructions.

table a-34 pc relative addressing with rm
instruction    operation    code    cycles    t bit
braf rm    delayed branch, rm + pc → pc    0000mmmm00100011    2
bsrf rm    delayed branch, pc → pr, rm + pc → pc    0000mmmm00000011    2

table a-35 indirect register
instruction    operation    code    cycles    t bit
jmp @rm    delayed branch, rm → pc    0100mmmm00101011    2
jsr @rm    delayed branch, pc → pr, rm → pc    0100mmmm00001011    2

table a-36 indirect post-increment register
instruction    operation    code    cycles    t bit
ldc.l @rm+,sr    (rm) → sr, rm + 4 → rm    0100mmmm00000111    7    lsb
ldc.l @rm+,gbr    (rm) → gbr, rm + 4 → rm    0100mmmm00010111    1/5 *2
ldc.l @rm+,vbr    (rm) → vbr, rm + 4 → rm    0100mmmm00100111    1/5 *2
ldc.l @rm+,ssr    (rm) → ssr, rm + 4 → rm    0100mmmm00110111    1/5 *2
ldc.l @rm+,spc    (rm) → spc, rm + 4 → rm    0100mmmm01000111    1/5 *2
ldc.l @rm+,mod *3    (rm) → mod, rm + 4 → rm    0100mmmm01010111    5
ldc.l @rm+,re *3    (rm) → re, rm + 4 → rm    0100mmmm01110111    5
ldc.l @rm+,rs *3    (rm) → rs, rm + 4 → rm    0100mmmm01100111    5
ldc.l @rm+,r0_bank    (rm) → r0_bank, rm + 4 → rm    0100mmmm10000111    1/5 *2
ldc.l @rm+,r1_bank    (rm) → r1_bank, rm + 4 → rm    0100mmmm10010111    1/5 *2
ldc.l @rm+,r2_bank    (rm) → r2_bank, rm + 4 → rm    0100mmmm10100111    1/5 *2
ldc.l @rm+,r3_bank    (rm) → r3_bank, rm + 4 → rm    0100mmmm10110111    1/5 *2
ldc.l @rm+,r4_bank    (rm) → r4_bank, rm + 4 → rm    0100mmmm11000111    1/5 *2
ldc.l @rm+,r5_bank    (rm) → r5_bank, rm + 4 → rm    0100mmmm11010111    1/5 *2
ldc.l @rm+,r6_bank    (rm) → r6_bank, rm + 4 → rm    0100mmmm11100111    1/5 *2
ldc.l @rm+,r7_bank    (rm) → r7_bank, rm + 4 → rm    0100mmmm11110111    1/5 *2
lds.l @rm+,fpscr *1    (rm) → fpscr, rm + 4 → rm    0100mmmm01100110    1
lds.l @rm+,fpul *1    (rm) → fpul, rm + 4 → rm    0100mmmm01010110    1
lds.l @rm+,mach    (rm) → mach, rm + 4 → rm    0100mmmm00000110    1
lds.l @rm+,macl    (rm) → macl, rm + 4 → rm    0100mmmm00010110    1
lds.l @rm+,pr    (rm) → pr, rm + 4 → rm    0100mmmm00100110    1
lds.l @rm+,dsr *3    (rm) → dsr, rm + 4 → rm    0100mmmm01100110    1
lds.l @rm+,a0 *3    (rm) → a0, rm + 4 → rm    0100mmmm01110110    1
lds.l @rm+,x0 *3    (rm) → x0, rm + 4 → rm    0100mmmm10000110    1
lds.l @rm+,x1 *3    (rm) → x1, rm + 4 → rm    0100mmmm10010110    1
lds.l @rm+,y0 *3    (rm) → y0, rm + 4 → rm    0100mmmm10100110    1
lds.l @rm+,y1 *3    (rm) → y1, rm + 4 → rm    0100mmmm10110110    1
notes:
1. sh-3e instructions.
2. five cycles on the sh3-dsp.
3. sh3-dsp instructions.

table a-37 floating point instructions (sh-3e only)
instruction    operation    code    cycles    t bit
flds frm,fpul    frm → fpul    1111mmmm00011101    1
ftrc frm,fpul    (long)frm → fpul    1111mmmm00111101    1

a.2.4 nm format

table a-38 direct register
instruction    operation    code    cycles    t bit
add rm,rn    rm + rn → rn    0011nnnnmmmm1100    1
addc rm,rn    rn + rm + t → rn, carry → t    0011nnnnmmmm1110    1    carry
addv rm,rn    rn + rm → rn, overflow → t    0011nnnnmmmm1111    1    overflow
and rm,rn    rn & rm → rn    0010nnnnmmmm1001    1
cmp/eq rm,rn    when rn = rm, 1 → t    0011nnnnmmmm0000    1    comparison result
cmp/hs rm,rn    when unsigned and rn ≥ rm, 1 → t    0011nnnnmmmm0010    1    comparison result
cmp/ge rm,rn    when signed and rn ≥ rm, 1 → t    0011nnnnmmmm0011    1    comparison result
cmp/hi rm,rn    when unsigned and rn > rm, 1 → t    0011nnnnmmmm0110    1    comparison result
cmp/gt rm,rn    when signed and rn > rm, 1 → t    0011nnnnmmmm0111    1    comparison result
cmp/str rm,rn    when a byte in rn equals a byte in rm, 1 → t    0010nnnnmmmm1100    1    comparison result
div1 rm,rn    1 step division (rn ÷ rm)    0011nnnnmmmm0100    1    calculation result
div0s rm,rn    msb of rn → q, msb of rm → m, m ^ q → t    0010nnnnmmmm0111    1    calculation result
dmuls.l rm,rn    signed operation of rn × rm → mach, macl    0011nnnnmmmm1101    2 (to 5) *
dmulu.l rm,rn    unsigned operation of rn × rm → mach, macl    0011nnnnmmmm0101    2 (to 5) *
exts.b rm,rn    sign-extend rm from byte → rn    0110nnnnmmmm1110    1
exts.w rm,rn    sign-extend rm from word → rn    0110nnnnmmmm1111    1
extu.b rm,rn    zero-extend rm from byte → rn    0110nnnnmmmm1100    1
extu.w rm,rn    zero-extend rm from word → rn    0110nnnnmmmm1101    1
mov rm,rn    rm → rn    0110nnnnmmmm0011    1
mul.l rm,rn    rn × rm → mac    0000nnnnmmmm0111    2 (to 5) *
muls.w rm,rn    with sign, rn × rm → mac    0010nnnnmmmm1111    1 (to 3) *
mulu.w rm,rn    unsigned, rn × rm → mac    0010nnnnmmmm1110    1 (to 3) *
neg rm,rn    0 - rm → rn    0110nnnnmmmm1011    1
negc rm,rn    0 - rm - t → rn, borrow → t    0110nnnnmmmm1010    1    borrow
not rm,rn    ~rm → rn    0110nnnnmmmm0111    1
or rm,rn    rn | rm → rn    0010nnnnmmmm1011    1
shad rm,rn    rn ≥ 0: rn << rm → rn; rn < 0: rn >> rm → [msb → rn]    0100nnnnmmmm1100    1
shld rm,rn    rn ≥ 0: rn << rm → rn; rn < 0: rn >> rm → [0 → rn]    0100nnnnmmmm1101    1
sub rm,rn    rn - rm → rn    0011nnnnmmmm1000    1
subc rm,rn    rn - rm - t → rn, borrow → t    0011nnnnmmmm1010    1    borrow
subv rm,rn    rn - rm → rn, underflow → t    0011nnnnmmmm1011    1    underflow
swap.b rm,rn    rm → swap upper and lower halves of lower 2 bytes → rn    0110nnnnmmmm1000    1
swap.w rm,rn    rm → swap upper and lower word → rn    0110nnnnmmmm1001    1
tst rm,rn    rn & rm, when result is 0, 1 → t    0010nnnnmmmm1000    1    test results
xor rm,rn    rn ^ rm → rn    0010nnnnmmmm1010    1
xtrct rm,rn    rm: center 32 bits of rn → rn    0010nnnnmmmm1101    1
note: * normal minimum number of execution states (the number in parentheses is the number of states when there is contention with preceding/following instructions).

table a-39 indirect register
instruction    operation    code    cycles    t bit
mov.b rm,@rn    rm → (rn)    0010nnnnmmmm0000    1
mov.w rm,@rn    rm → (rn)    0010nnnnmmmm0001    1
mov.l rm,@rn    rm → (rn)    0010nnnnmmmm0010    1
mov.b @rm,rn    (rm) → sign extension → rn    0110nnnnmmmm0000    1
mov.w @rm,rn    (rm) → sign extension → rn    0110nnnnmmmm0001    1
mov.l @rm,rn    (rm) → rn    0110nnnnmmmm0010    1

table a-40 indirect post-increment register (multiply/accumulate operation)
instruction    operation    code    cycles    t bit
mac.l @rm+,@rn+    signed operation of (rn) × (rm) + mac → mac, rn + 4 → rn, rm + 4 → rm    0000nnnnmmmm1111    2 (to 5) *
mac.w @rm+,@rn+    signed operation of (rn) × (rm) + mac → mac, rn + 2 → rn, rm + 2 → rm    0100nnnnmmmm1111    2 (to 5) *
note: * normal minimum number of execution states (the number in parentheses is the number of states when there is contention with preceding/following instructions).

table a-41 indirect post-increment register
instruction    operation    code    cycles    t bit
mov.b @rm+,rn    (rm) → sign extension → rn, rm + 1 → rm    0110nnnnmmmm0100    1
mov.w @rm+,rn    (rm) → sign extension → rn, rm + 2 → rm    0110nnnnmmmm0101    1
mov.l @rm+,rn    (rm) → rn, rm + 4 → rm    0110nnnnmmmm0110    1

table a-42 indirect pre-decrement register
instruction    operation    code    cycles    t bit
mov.b rm,@-rn    rn - 1 → rn, rm → (rn)    0010nnnnmmmm0100    1
mov.w rm,@-rn    rn - 2 → rn, rm → (rn)    0010nnnnmmmm0101    1
mov.l rm,@-rn    rn - 4 → rn, rm → (rn)    0010nnnnmmmm0110    1

table a-43 indirect indexed register
instruction    operation    code    cycles    t bit
mov.b rm,@(r0,rn)    rm → (r0 + rn)    0000nnnnmmmm0100    1
mov.w rm,@(r0,rn)    rm → (r0 + rn)    0000nnnnmmmm0101    1
mov.l rm,@(r0,rn)    rm → (r0 + rn)    0000nnnnmmmm0110    1
mov.b @(r0,rm),rn    (r0 + rm) → sign extension → rn    0000nnnnmmmm1100    1
mov.w @(r0,rm),rn    (r0 + rm) → sign extension → rn    0000nnnnmmmm1101    1
mov.l @(r0,rm),rn    (r0 + rm) → rn    0000nnnnmmmm1110    1

table a-44 floating point instructions (sh-3e only)
instruction    operation    code    cycles    t bit
fadd frm,frn    frn + frm → frn    1111nnnnmmmm0000    1
fcmp/eq frm,frn    (frn = frm)? 1:0 → t    1111nnnnmmmm0100    1    comparison result
fcmp/gt frm,frn    (frn > frm)? 1:0 → t    1111nnnnmmmm0101    1    comparison result
fdiv frm,frn    frn / frm → frn    1111nnnnmmmm0011    13
fmac fr0,frm,frn    fr0 × frm + frn → frn    1111nnnnmmmm1110    1
fmov frm,frn    frm → frn    1111nnnnmmmm1100    1
fmov.s @(r0,rm),frn    (r0 + rm) → frn    1111nnnnmmmm0110    1
fmov.s @rm+,frn    (rm) → frn, rm + 4 → rm    1111nnnnmmmm1001    1
fmov.s @rm,frn    (rm) → frn    1111nnnnmmmm1000    1
fmov.s frm,@(r0,rn)    frm → (r0 + rn)    1111nnnnmmmm0111    1
fmov.s frm,@-rn    rn - 4 → rn, frm → (rn)    1111nnnnmmmm1011    1
fmov.s frm,@rn    frm → (rn)    1111nnnnmmmm1010    1
fmul frm,frn    frn × frm → frn    1111nnnnmmmm0010    1
fsub frm,frn    frn - frm → frn    1111nnnnmmmm0001    1

a.2.5 md format

table a-45 md format
instruction    operation    code    cycles    t bit
mov.b @(disp,rm),r0    (disp + rm) → sign extension → r0    10000100mmmmdddd    1
mov.w @(disp,rm),r0    (disp × 2 + rm) → sign extension → r0    10000101mmmmdddd    1

a.2.6 nd4 format

table a-46 nd4 format
instruction    operation    code    cycles    t bit
mov.b r0,@(disp,rn)    r0 → (disp + rn)    10000000nnnndddd    1
mov.w r0,@(disp,rn)    r0 → (disp × 2 + rn)    10000001nnnndddd    1

a.2.7 nmd format

table a-47 nmd format
instruction    operation    code    cycles    t bit
mov.l rm,@(disp,rn)    rm → (disp × 4 + rn)    0001nnnnmmmmdddd    1
mov.l @(disp,rm),rn    (disp × 4 + rm) → rn    0101nnnnmmmmdddd    1

a.2.8 d format

table a-48 indirect gbr with displacement
instruction    operation    code    cycles    t bit
mov.b r0,@(disp,gbr)    r0 → (disp + gbr)    11000000dddddddd    1
mov.w r0,@(disp,gbr)    r0 → (disp × 2 + gbr)    11000001dddddddd    1
mov.l r0,@(disp,gbr)    r0 → (disp × 4 + gbr)    11000010dddddddd    1
mov.b @(disp,gbr),r0    (disp + gbr) → sign extension → r0    11000100dddddddd    1
mov.w @(disp,gbr),r0    (disp × 2 + gbr) → sign extension → r0    11000101dddddddd    1
mov.l @(disp,gbr),r0    (disp × 4 + gbr) → r0    11000110dddddddd    1

table a-49 pc relative with displacement
instruction    operation    code    cycles    t bit
mova @(disp,pc),r0    disp × 4 + pc → r0    11000111dddddddd    1
ldrs @(disp,pc) *    disp × 2 + pc → rs    10001100dddddddd    3
ldre @(disp,pc) *    disp × 2 + pc → re    10001110dddddddd    3
note: * sh3-dsp instructions.

table a-50 pc relative
instruction    operation    code    cycles    t bit
bf label    when t = 0, disp × 2 + pc → pc; when t = 1, nop    10001011dddddddd    3/1
bf/s label    if t = 0, disp × 2 + pc → pc; if t = 1, nop    10001111dddddddd    2/1 *
bt label    when t = 1, disp × 2 + pc → pc; when t = 0, nop    10001001dddddddd    3/1
bt/s label    if t = 1, disp × 2 + pc → pc; if t = 0, nop    10001101dddddddd    2/1 *
note: * one state when it does not branch.
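
In the d format tables the 8-bit disp field is multiplied by the operand size (× 1, × 2 or × 4) before it is added to gbr, so an assembler has to divide a byte offset by the operand size before placing it in the instruction word. The Python sketch below shows that step for the three gbr-relative stores of table a-48. The function name `encode_gbr_store` is hypothetical, and treating disp as an unsigned 8-bit field is an assumption of this sketch rather than something stated in the table.

```python
# Operand size letter -> (scale factor, upper 8 bits of the code from table a-48)
_GBR_STORES = {
    "b": (1, 0b11000000),  # mov.b r0,@(disp,gbr)
    "w": (2, 0b11000001),  # mov.w r0,@(disp,gbr)
    "l": (4, 0b11000010),  # mov.l r0,@(disp,gbr)
}


def encode_gbr_store(size: str, byte_offset: int) -> int:
    """Return the 16-bit word for mov.<size> r0,@(disp,gbr).

    byte_offset is the distance from gbr in bytes. It must be a multiple
    of the operand size and, after scaling, fit in 8 bits (assumed unsigned).
    Hypothetical helper for illustration only.
    """
    scale, opcode = _GBR_STORES[size]
    if byte_offset % scale != 0:
        raise ValueError("offset is not a multiple of the operand size")
    disp = byte_offset // scale
    if not 0 <= disp <= 0xFF:
        raise ValueError("scaled displacement does not fit in 8 bits")
    return (opcode << 8) | disp


print(hex(encode_gbr_store("l", 16)))   # disp field = 16 / 4 = 4
print(hex(encode_gbr_store("w", 6)))    # disp field = 6 / 2 = 3
```

Under that assumption the reachable range is 255 bytes for mov.b, 510 bytes for mov.w and 1020 bytes for mov.l.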
a.2.9 d12 format

table a-51 d12 format
instruction    operation    code    cycles    t bit
bra label    delayed branching, disp × 2 + pc → pc    1010dddddddddddd    2
bsr label    delayed branching, pc → pr, disp × 2 + pc → pc    1011dddddddddddd    2

a.2.10 nd8 format

table a-52 nd8 format
instruction    operation    code    cycles    t bit
mov.w @(disp,pc),rn    (disp × 2 + pc) → sign extension → rn    1001nnnndddddddd    1
mov.l @(disp,pc),rn    (disp × 4 + pc) → rn    1101nnnndddddddd    1

a.2.11 i format

table a-53 indirect indexed gbr
instruction    operation    code    cycles    t bit
and.b #imm,@(r0,gbr)    (r0 + gbr) & imm → (r0 + gbr)    11001101iiiiiiii    3
or.b #imm,@(r0,gbr)    (r0 + gbr) | imm → (r0 + gbr)    11001111iiiiiiii    3
tst.b #imm,@(r0,gbr)    (r0 + gbr) & imm, when result is 0, 1 → t    11001100iiiiiiii    3    test results
xor.b #imm,@(r0,gbr)    (r0 + gbr) ^ imm → (r0 + gbr)    11001110iiiiiiii    3

table a-54 immediate (arithmetic logical operation with direct register)
instruction    operation    code    cycles    t bit
and #imm,r0    r0 & imm → r0    11001001iiiiiiii    1
cmp/eq #imm,r0    when r0 = imm, 1 → t    10001000iiiiiiii    1    comparison result
or #imm,r0    r0 | imm → r0    11001011iiiiiiii    1
tst #imm,r0    r0 & imm, when result is 0, 1 → t    11001000iiiiiiii    1    test results
xor #imm,r0    r0 ^ imm → r0    11001010iiiiiiii    1

table a-55 immediate (specify exception processing vector)
instruction    operation    code    cycles    t bit
trapa #imm    imm → tra, pc → spc, sr → ssr, 1 → sr.md/bl/rb, 0x160 → expevt, vbr + h'00000100 → pc    11000011iiiiiiii    6/8 *
note: * eight cycles on the sh3-dsp.

table a-56 load to control register (sh3-dsp only)
instruction    operation    code    cycles    t bit
setrc #imm    imm → rc(sr[23:16]), zeros → sr[27:24]    10000010iiiiiiii    3

a.2.12 ni format

table a-57 ni format
instruction    operation    code    cycles    t bit
add #imm,rn    rn + #imm → rn    0111nnnniiiiiiii    1
mov #imm,rn    #imm → sign extension → rn    1110nnnniiiiiiii    1

a.3 operation code map

table a-58 operation code map ("-" marks an empty cell)
instruction code (msb ... lsb)    fx: 0000 / md: 00    fx: 0001 / md: 01    fx: 0010 / md: 10    fx: 0011-1111 / md: 11
0000 rn fx 0000    -    -    -    -
0000 rn fx 0001    -    -    -    -
0000 rn 00md 0010    stc sr,rn    stc gbr,rn    stc vbr,rn    stc ssr,rn
0000 rn 01md 0010    stc spc,rn    stc mod,rn *2    stc rs,rn *2    stc re,rn *2
0000 rn 10md 0010    stc r0_bank,rn    stc r1_bank,rn    stc r2_bank,rn    stc r3_bank,rn
0000 rn 11md 0010    stc r4_bank,rn    stc r5_bank,rn    stc r6_bank,rn    stc r7_bank,rn
0000 rm 00md 0011    bsrf rm    -    braf rm    -
0000 rm 10md 0011    pref @rm    -    -    -
0000 rn rm 01md    mov.b rm,@(r0,rn)    mov.w rm,@(r0,rn)    mov.l rm,@(r0,rn)    mul.l rm,rn
0000 0000 00md 1000    clrt    sett    clrmac    ldtlb
0000 0000 01md 1000    clrs    sets    -    -
0000 0000 fx 1001    nop    div0u    -    -
0000 0000 fx 1010    -    -    -    -
0000 0000 fx 1011    rts    sleep    rte    -
0000 rn fx 1000    -    -    -    -
0000 rn fx 1001    -    -    movt rn    -
0000 rn 00md 1010    sts mach,rn    sts macl,rn    sts pr,rn    -
0000 rn 01md 1010    -    sts fpul,rn *1    sts fpscr,rn *1, sts dsr,rn *2    sts a0,rn *2
0000 rn 10md 1010    sts x0,rn *2    sts x1,rn *2    sts y0,rn *2    sts y1,rn *2
0000 rn fx 1011    -    -    -    -
0000 rn rm 11md    mov.b @(r0,rm),rn    mov.w @(r0,rm),rn    mov.l @(r0,rm),rn    mac.l @rm+,@rn+
0001 rn rm disp    mov.l rm,@(disp:4,rn)
0010 rn rm 00md    mov.b rm,@rn    mov.w rm,@rn    mov.l rm,@rn    -
0010 rn rm 01md    mov.b rm,@-rn    mov.w rm,@-rn    mov.l rm,@-rn    div0s rm,rn
0010 rn rm 10md    tst rm,rn    and rm,rn    xor rm,rn    or rm,rn
0010 rn rm 11md    cmp/str rm,rn    xtrct rm,rn    mulu.w rm,rn    muls.w rm,rn
0011 rn rm 00md    cmp/eq rm,rn    -    cmp/hs rm,rn    cmp/ge rm,rn
0011 rn rm 01md    div1 rm,rn    dmulu.l rm,rn    cmp/hi rm,rn    cmp/gt rm,rn
0011 rn rm 10md    sub rm,rn    -    subc rm,rn    subv rm,rn
0011 rn rm 11md    add rm,rn    dmuls.l rm,rn    addc rm,rn    addv rm,rn
0100 rn fx 0000    shll rn    dt rn    shal rn    -
0100 rn fx 0001    shlr rn    cmp/pz rn    shar rn    -
0100 rn 00md 0010    sts.l mach,@-rn    sts.l macl,@-rn    sts.l pr,@-rn    -
0100 rn 01md 0010    -    sts.l fpul,@-rn *1    sts.l fpscr,@-rn *1, sts.l dsr,@-rn *2    sts.l a0,@-rn *2
0100 rn 10md 0010    sts.l x0,@-rn *2    sts.l x1,@-rn *2    sts.l y0,@-rn *2    sts.l y1,@-rn *2
0100 rn 00md 0011    stc.l sr,@-rn    stc.l gbr,@-rn    stc.l vbr,@-rn    stc.l ssr,@-rn
0100 rn 01md 0011    stc.l spc,@-rn    stc.l mod,@-rn *2    stc.l rs,@-rn *2    stc.l re,@-rn *2
0100 rn 10md 0011    stc.l r0_bank,@-rn    stc.l r1_bank,@-rn    stc.l r2_bank,@-rn    stc.l r3_bank,@-rn
0100 rn 11md 0011    stc.l r4_bank,@-rn    stc.l r5_bank,@-rn    stc.l r6_bank,@-rn    stc.l r7_bank,@-rn
0100 rm/rn fx 0100    rotl rn    setrc rm    rotcl rn    -
0100 rn fx 0101    rotr rn    cmp/pl rn    rotcr rn    -
0100 rm 00md 0110    lds.l @rm+,mach    lds.l @rm+,macl    lds.l @rm+,pr    -
0100 rm 01md 0110    -    lds.l @rm+,fpul *1    lds.l @rm+,fpscr *1, lds.l @rm+,dsr *2    lds.l @rm+,a0 *2
0100 rm 10md 0110    lds.l @rm+,x0 *2    lds.l @rm+,x1 *2    lds.l @rm+,y0 *2    lds.l @rm+,y1 *2
0100 rm 00md 0111    ldc.l @rm+,sr    ldc.l @rm+,gbr    ldc.l @rm+,vbr    ldc.l @rm+,ssr
0100 rm 01md 0111    ldc.l @rm+,spc    ldc.l @rm+,mod *2    ldc.l @rm+,rs *2    ldc.l @rm+,re *2
0100 rm 10md 0111    ldc.l @rm+,r0_bank    ldc.l @rm+,r1_bank    ldc.l @rm+,r2_bank    ldc.l @rm+,r3_bank
0100 rm 11md 0111    ldc.l @rm+,r4_bank    ldc.l @rm+,r5_bank    ldc.l @rm+,r6_bank    ldc.l @rm+,r7_bank
0100 rn fx 1000    shll2 rn    shll8 rn    shll16 rn    -
0100 rn fx 1001    shlr2 rn    shlr8 rn    shlr16 rn    -
0100 rm 00md 1010    lds rm,mach    lds rm,macl    lds rm,pr    -
0100 rm 01md 1010    -    lds rm,fpul *1    lds rm,fpscr *1, lds rm,dsr *2    lds rm,a0 *2
0100 rm 10md 1010    lds rm,x0 *2    lds rm,x1 *2    lds rm,y0 *2    lds rm,y1 *2
0100 rm/rn fx 1011    jsr @rm    tas.b @rn    jmp @rm    -
0100 rn rm 1100    shad rm,rn
0100 rn rm 1101    shld rm,rn
0100 rm 00md 1110    ldc rm,sr    ldc rm,gbr    ldc rm,vbr    ldc rm,ssr
0100 rm 01md 1110    ldc rm,spc    ldc rm,mod *2    ldc rm,rs *2    ldc rm,re *2
0100 rm 10md 1110    ldc rm,r0_bank    ldc rm,r1_bank    ldc rm,r2_bank    ldc rm,r3_bank
0100 rm 11md 1110    ldc rm,r4_bank    ldc rm,r5_bank    ldc rm,r6_bank    ldc rm,r7_bank
sts.l fpscr, @Crn * 1 sts.l a0, @Crn * 2 0100 rn 10md 0010 sts.l x0, @Crn * 2 sts.l x1, @Crn * 2 sts.l y0, @Crn * 2 sts.l y1, @Crn * 2 0100 rn 00md 0011 stc.l sr,@Crn stc.l gbr,@Crn stc.l vbr,@Crn stc.l ssr,a-rn 0100 rn 01md 0011 stc.l spc,@-rn sts.l mod, @Crn * 2 sts.l rs, @Crn * 2 sts.l re, @Crn * 2 0100 rn 10md 0011 stc.l r0_bank,@-rn stc.l r1_bank,@-rn stc.l r2_bank,@-rn stc.l r3_bank,@-rn 0100 rn 11md 0011 stc.l r4_bank,@-rn stc.l r5_bank,@-rn stc.l r6_bank,@-rn stc.l r7_bank,@-rn 0100 rm/ rn fx 0100 rotl rn setrc rm rotcl rn 0100 rn fx 0101 rotr rn cmp/pl rn rotcr rn 0100 rm 00md 0110 lds.l @rm+,mach lds.l @rm+,macl lds.l @rm+,pr 0100 rm 01md 0110 lds.l @rm+,fpul * 1 lds.l @rm+,dsr * 2 lds.l @rm+,fpscr * 1 lds.l @rm+,a0 * 2 0100 rm 10md 0110 lds.l @rm+,x0 * 2 lds.l @rm+,x1 * 2 lds.l @rm+,y0 * 2 lds.l @rm+,y1 * 2 0100 rm 00md 0111 ldc.l @rm+,sr ldc.l @rm+,gbr ldc.l @rm+,vbr ldc.l @rm+,ssr 535 table a-58 operation code map (cont) instruction code fx: 0000 fx: 0001 fx: 0010 fx: 0011C1111 msb lsb md: 00 md: 01 md: 10 md: 11 0100 rm 01md 0111 ldc.l @rm+,spc ldc.l @rm+,mod * 2 ldc.l @rm+,rs * 2 ldc.l @rm+,re * 2 0100 rm 10md 0111 ldc.l @rm+,r0_bank ldc.l @rm+,r1_bank ldc.l @rm+,r2_bank ldc.l @rm+,r3_bank 0100 rm 11md 0111 ldc.l @rm+,r4_bank ldc.l @rm+,r5_bank ldc.l @rm+,r6_bank ldc.l @rm+,r7_bank 0100 rn fx 1000 shll2 rn shll8 rn shll16 rn 0100 rn fx 1001 shlr2 rn shlr8 rn shlr16 rn 0100 rm 00md 1010 lds rm,mach lds rm,macl lds rm,pr 0100 rm 01md 1010 lds rm,fpul * 1 lds rm,dsr * 2 lds rm,fpscr * 1 lds rm,a0 * 2 0100 rm 10md 1010 lds rm,x0 * 2 lds rm,x1 * 2 lds rm,y0 * 2 lds rm,y1 * 2 0100 rn fx 1011 jsr @rm tas.b @rn jmp @rm 0100 rm rm 1100 shad rm,rn 0100 rm rm 1101 shld rm,rn 0100 rm 00md 1110 ldc rm,sr ldc rm,gbr ldc rm,vbr ldc rm,ssr 0100 rm 01md 1110 ldc rm,spc ldc rm,mod * 2 ldc rm,rs * 2 ldc rm,re * 2 0100 rm 10md 1110 ldc rm,r0_bank ldc rm,r1_bank ldc rm,r2_bank ldc rm,r3_bank 0100 rm 11md 1110 ldc rm,r4_bank ldc rm,r5_bank ldc rm,r6_bank ldc rm,r7_bank 
0100 rn rm 1111        mac.w @rm+,@rn+
0101 rn rm disp        mov.l @(disp:4,rm),rn
0110 rn rm 00md        mov.b @rm,rn    mov.w @rm,rn    mov.l @rm,rn    mov rm,rn
0110 rn rm 01md        mov.b @rm+,rn    mov.w @rm+,rn    mov.l @rm+,rn    not rm,rn
0110 rn rm 10md        swap.b rm,rn    swap.w rm,rn    negc rm,rn    neg rm,rn
0110 rn rm 11md        extu.b rm,rn    extu.w rm,rn    exts.b rm,rn    exts.w rm,rn
0111 rn imm            add #imm:8,rn
1000 00md rn disp / imm    mov.b r0,@(disp:4,rn)    mov.w r0,@(disp:4,rn)    setrc #imm *2
1000 01md rm disp      mov.b @(disp:4,rm),r0    mov.w @(disp:4,rm),r0
1000 10md imm/disp     cmp/eq #imm:8,r0    bt disp:8    bf label:8
1000 11md imm/disp     ldrs @(disp,pc) *2    bt/s disp:8    ldre @(disp,pc) *2    bf/s label:8
1001 rn disp           mov.w @(disp:8,pc),rn
1010 disp              bra label:12
1011 disp              bsr label:12
1100 00md imm/disp     mov.b r0,@(disp:8,gbr)    mov.w r0,@(disp:8,gbr)    mov.l r0,@(disp:8,gbr)    trapa #imm:8
1100 01md disp         mov.b @(disp:8,gbr),r0    mov.w @(disp:8,gbr),r0    mov.l @(disp:8,gbr),r0    mova @(disp:8,pc),r0
1100 10md imm          tst #imm:8,r0    and #imm:8,r0    xor #imm:8,r0    or #imm:8,r0
1100 11md imm          tst.b #imm:8,@(r0,gbr)    and.b #imm:8,@(r0,gbr)    xor.b #imm:8,@(r0,gbr)    or.b #imm:8,@(r0,gbr)
1101 rn disp           mov.l @(disp:8,pc),rn
1110 rn imm            mov #imm:8,rn
1111 rn rm 00md        fadd frm,frn *1    fsub frm,frn *1    fmul frm,frn *1    fdiv frm,frn *1
1111 rn rm 01md        fcmp/eq frm,frn *1    fcmp/gt frm,frn *1    fmov.s @(r0,rm),frn *1    fmov.s frm,@(r0,rn) *1
1111 rn rm 10md        fmov.s @rm,frn *1    fmov.s @rm+,frn *1    fmov.s frm,@rn *1    fmov.s frm,@-rn *1
1111 rn rm 1100        fmov frm,frn *1
1111 rn 00md 1101      fsts fpul,frn *1    flds frn,fpul *1    float fpul,frn *1    ftrc frn,fpul *1
1111 rn 01md 1101      fneg frn *1    fabs frn *1    fsqrt frn *1
1111 rn 10md 1101      fldi0 frn *1    fldi1 frn *1
1111 rn rm 1110        fmac fr0,frm,frn *1
1111 00 ** ****        (movx.w, movy.w, dsp double data transfer instructions) (sh3-dsp)
1111 01 ** ****        (movs.w, movs.l, dsp single data transfer instructions) (sh3-dsp)
1111 10 ** ****        (dsp parallel processing instructions, field a: movx.w, movy.w, dsp double data transfer instructions, field b: pshl to plds, dsp operation instructions) (sh3-dsp)
1111 11 ** ****

notes: 1. floating point arithmetic calculation instruction or cpu instruction related to the fpu. these instructions are available only on the sh-3e.
2. cpu instructions to provide support for dsp functions. these instructions can only be used with the sh3-dsp.

table a-59 operation code map for dsp operation instructions (b field)

instruction code       fx: 0000    fx: 0001    fx: 0010       fx: 0011-1111
msb         lsb        cc:00       cc:01 *     cc:10 (dct)    cc:11 (dcf)

0000 imm zzzz           pshl #imm, dz
0000 1 *** **** ****
0001 imm zzzz           psha #imm, dz
0001 1 *** **** ****
001 * **** **** ****
0100 eeff xxyy gguu     pmuls se, sf, dg
0101 **** **** ****
0110 eeff xxyy gguu     psub sx, sy, du    pmuls se, sf, dg
0111 eeff xxyy gguu     padd sx, sy, du    pmuls se, sf, dg
1000 00cc xxyy zzzz     [if cc] pshl sx, sy, dz
1000 01cc xxyy zzzz     pcmp sx, sy
1000 10cc xxyy zzzz     pabs sx, dz    [if cc] pdec sx, dz
1000 11cc xxyy zzzz     [if cc] pclr dz
1001 00cc xxyy zzzz     [if cc] psha sx, sy, dz
1001 01cc xxyy zzzz     [if cc] pand sx, sy, dz
1001 10cc xxyy zzzz     prnd sx, dz    [if cc] pinc sx, dz
1001 11cc xxyy zzzz     [if cc] pdmsb sx, dz
1010 00cc xxyy zzzz     psubc sx, sy, dz    [if cc] psub sx, sy, dz
1010 01cc xxyy zzzz     [if cc] pxor sx, sy, dz
1010 10cc xxyy zzzz     pabs sy, dz    [if cc] pdec sy, dz
1010 11cc xxyy zzzz
1011 00cc xxyy zzzz     paddc sx, sy, dz    [if cc] padd sx, sy, dz
1011 01cc xxyy zzzz     [if cc] por sx, sy, dz
1011 10cc xxyy zzzz     prnd sy, dz    [if cc] pinc sy, dz
1011 11cc xxyy zzzz     [if cc] pdmsb sy, dz
1100 0 *** **** ****
1100 10cc xxyy zzzz     [if cc] pneg sx, dz
1100 11cc xxyy zzzz     [if cc] psts mach, dz
1101 0 *** **** ****
1101 10cc xxyy zzzz     [if cc] pcopy sx, dz
1101 11cc xxyy zzzz     [if cc] psts macl, dz
1110 0 *** **** ****
1110 10cc xxyy zzzz     [if cc] pneg sy, dz
1110 11cc xxyy zzzz     [if cc] plds dz, mach
1111 0 *** **** ****
1111 10cc xxyy zzzz     [if cc] pcopy sy, dz
1111 11cc xxyy zzzz     [if cc] plds dz, macl

note: * unconditional

appendix b pipeline operation and contention

the sh-3/sh-3e/dsp series is designed so that basic instructions are executed in one cycle. an instruction takes two or more cycles when, for example, a branch instruction changes the branch destination address, or when contention between the ma (memory access) and if (instruction fetch) stages adds cycles. table b-1 gives the number of execution cycles and stages for each type of contention, together with the instructions affected. instructions without contention, and instructions that require 2 or more cycles even without contention, are also shown.

instructions contend in the following ways:
• operations and transfers between registers are executed in one cycle with no contention.
• no contention occurs, but the instruction still requires 2 or more cycles.
• contention occurs, increasing the number of execution cycles. contention combinations are:
    ma contends with if
    ma contends with if and sometimes with memory loads as well
    ma contends with if and sometimes with the multiplier as well
    ma contends with if and sometimes with memory loads and sometimes with the multiplier

table b-1 instructions and their contention patterns

contention: none
cycles         stages    instructions
1              3         • transfers between registers
                         • operations between registers (except when a multiplier is involved)
                         • logical operations between registers
                         • shift and dynamic shift instructions
                         • system control alu instructions
2              3         unconditional branches
3/1            3         conditional branches
2/1            3         delayed conditional branch instructions
4              3         sleep instruction
4              5         rte instruction
5              5         ldc instruction (sr), register to sr
6/8 *2         9         trap instruction

contention: ma contends with if
cycles         stages    instructions
1              4         • memory store instructions
                         • sts.l instruction (pr)
                         • cache instruction
1/2 *2         4         • stc.l instruction (other than bank register)
2              5         stc.l instruction (bank register)
3              6         • memory logic operations
3/4 *2         6         • tas instruction
7              7         ldc.l instruction (sr), memory to sr

contention: ma contends with if; causes memory load contention
cycles         stages    instructions
1              5         • memory load instructions
                         • lds.l instruction (pr)
1/5 *2         5         • ldc.l instruction

contention: ma contends with if; causes multiplier contention
cycles         stages    instructions
1              4         • register to mac transfer instructions
                         • memory to mac transfer instructions
                         • mac to memory transfer instructions
1 (to 3) *1    6         multiplication instructions (excluding pmuls)
2 (to 5) *1    7         multiply/accumulate instructions
2 (to 5) *1    9         double length multiply/accumulate instructions
2 (to 5) *1    9         double length multiplication instructions

contention: ma contends with if; causes memory load contention; causes multiplier contention; causes dsp operation contention
cycles         stages    instructions
1              5         mac/dsp to register transfer instructions

notes: 1. the normal minimum number of execution states. (the number in parentheses is the number in contention with the preceding/following instructions.)
2. in the case of the sh3-dsp, the figures on the right indicate the number of cycles and stages.

sh-3/sh-3e/sh-dsp programming manual
publication date: 1st edition, september 1995
                  3rd edition, september 2000
published by: electronic devices sales & marketing group, semiconductor & integrated circuits, hitachi, ltd.
edited by: technical documentation group, hitachi kodaira semiconductor co., ltd.
copyright © hitachi, ltd., 1995. all rights reserved. printed in japan.