AMD Extensions to the 3DNow!™ and MMX™ Instruction Sets Manual
Trademarks

AMD, the AMD logo, AMD Athlon, and combinations thereof, and 3DNow! are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc. MMX is a trademark of Intel Corporation. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2000 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.
Publication 22466D/0, March 2000

Contents

Revision History
1  Extensions to the 3DNow!™ and MMX™ Instruction Sets
     Introduction
2  Extensions to the 3DNow!™ Instruction Set
     PF2IW
     PFNACC
     PFPNACC
     PI2FW
     PSWAPD
3  Extensions to the MMX™ Instruction Set
     MASKMOVQ
     MOVNTQ
     PAVGB
     PAVGW
     PEXTRW
     PINSRW
     PMAXSW
     PMAXUB
     PMINSW
     PMINUB
     PMOVMSKB
     PMULHUW
     PREFETCHNTA - PREFETCHT0 - PREFETCHT1 - PREFETCHT2
     PSADBW
     PSHUFW
     SFENCE
List of Tables

Table 1.  3DNow!™ Technology DSP Extensions
Table 2.  MMX™ Instruction Set Extensions
Table 3.  Numerical Range for the PF2IW Instruction
Table 4.  Locality References for the Prefetch Instructions
Revision History

  Date            Rev  Description
  August 1999     B    Initial public release.
  February 2000   C    Clarification of PSWAPD operation.
                       Clarification of PINSRW description and operation.
                       Clarification of PSHUFW description and operation.
                       Clarification of SFENCE encoding.
  March 2000      D    Clarification of PFNACC operation.
                       Clarification of PFPNACC operation.
1  Extensions to the 3DNow!™ and MMX™ Instruction Sets

Introduction

With the advent of the AMD Athlon™ processor, AMD has taken 3DNow!™ technology to the next level of performance and functionality. The AMD Athlon processor adds 24 new instructions to the existing 3DNow! and MMX™ instruction sets. Along with the new instructions, the AMD Athlon processor implements additional microarchitecture enhancements that enable more efficient operation of all these instructions, and programming may be simplified because there are fewer coding restrictions.

3DNow! technology enabled fast frame rates on high-resolution 3D rendered scenes, amazing physical modeling of real-world environments, sharp and detailed 3D imaging, smooth video playback, and theater-quality audio. The new enhanced 3DNow! technology implemented in the AMD Athlon processor adds streaming and digital signal processing (DSP) technologies, which allow faster, more accurate speech recognition, DVD-quality audio and video, and streaming audio and video for a rich internet experience.

The instructions described in this document are extensions to the instruction sets described in the 3DNow!™ Technology Manual, order# 21928, and the Multimedia Technology Manual, order# 20726. The five new 3DNow! technology DSP extensions are summarized in Table 1 and fully described in Chapter 2. The 19 new instructions augmenting existing MMX technology are summarized in Table 2 and fully described in Chapter 3.

Table 1. 3DNow!™ Technology DSP Extensions

  Operation  Function                                                             Opcode / imm8
  PF2IW      Packed floating-point to integer word conversion with sign extend    0Fh 0Fh / 1Ch
  PFNACC     Packed floating-point negative accumulate                            0Fh 0Fh / 8Ah
  PFPNACC    Packed floating-point mixed positive-negative accumulate             0Fh 0Fh / 8Eh
  PI2FW      Packed integer word to floating-point conversion                     0Fh 0Fh / 0Ch
  PSWAPD     Packed swap doubleword                                               0Fh 0Fh / BBh

Table 2. MMX™ Instruction Set Extensions

  Operation    Function                                                   Opcode / imm8
  MASKMOVQ     Streaming (cache bypass) store using byte mask             0Fh F7h
  MOVNTQ       Streaming (cache bypass) store                             0Fh E7h
  PAVGB        Packed average of unsigned byte                            0Fh E0h
  PAVGW        Packed average of unsigned word                            0Fh E3h
  PEXTRW       Extract word into integer register                         0Fh C5h
  PINSRW       Insert word from integer register                          0Fh C4h
  PMAXSW       Packed maximum signed word                                 0Fh EEh
  PMAXUB       Packed maximum unsigned byte                               0Fh DEh
  PMINSW       Packed minimum signed word                                 0Fh EAh
  PMINUB       Packed minimum unsigned byte                               0Fh DAh
  PMOVMSKB     Move byte mask to integer register                         0Fh D7h
  PMULHUW      Packed multiply high unsigned word                         0Fh E4h
  PREFETCHNTA  Move data closer to the processor using the NTA reference  0Fh 18h 0*
  PREFETCHT0   Move data closer to the processor using the T0 reference   0Fh 18h 1*
  PREFETCHT1   Move data closer to the processor using the T1 reference   0Fh 18h 2*
  PREFETCHT2   Move data closer to the processor using the T2 reference   0Fh 18h 3*
  PSADBW       Packed sum of absolute byte differences                    0Fh F6h
  PSHUFW       Packed shuffle word                                        0Fh 70h
  SFENCE       Store fence                                                0Fh AEh / 7h

  Note: * The number after the opcode indicates the different prefetch modes in the ModR/M byte.
2  Extensions to the 3DNow!™ Instruction Set

This chapter describes the five new DSP instructions added to the 3DNow! instruction set first defined in the 3DNow!™ Technology Manual, order# 21928. The five instructions enhance the performance of communications applications, including soft modems, soft ADSL, MP3, and Dolby Digital and surround sound processing.

Programmers should check bit 30 of the extended feature flags in the EDX register after executing extended function 8000_0001h of the CPUID instruction. If bit 30 is set, the AMD processor supports these five instructions. For more information, refer to the AMD Processor Recognition Application Note, order# 20734.

Instruction definitions are in alphabetical order according to the instruction mnemonics.
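The feature test described above reduces to a single bit test on the EDX value returned by CPUID extended function 8000_0001h. The following is a minimal sketch in C. The function name has_3dnow_extensions is hypothetical, and the sketch assumes the caller has already obtained EDX by some compiler-specific means (for example an intrinsic or inline assembly), which is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: ext_edx is the EDX value returned by CPUID
   extended function 8000_0001h. Bit 30 reports the five 3DNow! DSP
   extensions described in this chapter. */
static bool has_3dnow_extensions(uint32_t ext_edx)
{
    return ((ext_edx >> 30) & 1u) != 0;
}
```

Keeping the bit test separate from the CPUID invocation lets the check be exercised with synthetic register values on any host.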
PF2IW

  Mnemonic               Opcode / imm8   Description
  PF2IW mmreg1, mmreg2   0Fh 0Fh / 1Ch   Packed floating-point to integer word conversion with sign extend
  PF2IW mmreg, mem64

  Privilege:           None
  Registers affected:  MMX
  Flags affected:      None

PF2IW converts a register containing single-precision floating-point operands to 16-bit signed integers using truncation. Arguments outside the range representable by signed 16-bit integers are saturated to the largest and smallest 16-bit integer, depending on their sign. All results are sign-extended to 32 bits. Table 3 shows the numerical range of the PF2IW instruction.

Exceptions generated:

  Exception                              Real  Virtual 8086  Protected  Description
  Invalid opcode (6)                      X         X            X      The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)                X         X            X      The task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                    -         -            X      During instruction execution, the stack segment limit was exceeded.
  General protection (13)                 -         -            X      During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                    X         X            -      One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                         -         X            X      A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)   X         X            X      An exception is pending due to the floating-point execution unit.
  Alignment check (17)                    -         X            X      An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)
"PF2IW mmreg1, mmreg2" performs the following operations:

    IF (mmreg2[31:0] >= 2^15)
        THEN mmreg1[31:0] = 0x0000_7FFF
    ELSE IF (mmreg2[31:0] <= -2^15)
        THEN mmreg1[31:0] = 0xFFFF_8000
    ELSE mmreg1[31:0] = int(mmreg2[31:0])

    IF (mmreg2[63:32] >= 2^15)
        THEN mmreg1[63:32] = 0x0000_7FFF
    ELSE IF (mmreg2[63:32] <= -2^15)
        THEN mmreg1[63:32] = 0xFFFF_8000
    ELSE mmreg1[63:32] = int(mmreg2[63:32])

"PF2IW mmreg, mem64" performs the following operations:

    IF (mem64[31:0] >= 2^15)
        THEN mmreg[31:0] = 0x0000_7FFF
    ELSE IF (mem64[31:0] <= -2^15)
        THEN mmreg[31:0] = 0xFFFF_8000
    ELSE mmreg[31:0] = int(mem64[31:0])

    IF (mem64[63:32] >= 2^15)
        THEN mmreg[63:32] = 0x0000_7FFF
    ELSE IF (mem64[63:32] <= -2^15)
        THEN mmreg[63:32] = 0xFFFF_8000
    ELSE mmreg[63:32] = int(mem64[63:32])

Related instructions

See the PF2ID, PI2FW, and PI2FD instructions.

Table 3. Numerical Range for the PF2IW Instruction

  Source 2                            Source 1 and Destination
  0                                   0
  Normal, abs(Source 1) < 1           0
  Normal, -32768 < Source 1 <= -1     Round to zero (Source 1)
  Normal, 1 <= Source 1 < 32768       Round to zero (Source 1)
  Normal, Source 1 >= 32768           0x0000_7FFF
  Normal, Source 1 <= -32768          0xFFFF_8000
  Unsupported                         Undefined
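The saturating conversion above can be modeled in portable C. This is only an illustrative software model, not AMD's implementation: the names pf2iw_lane and pf2iw are hypothetical, each 32-bit lane is represented as a C float, and unsupported inputs (which Table 3 lists as undefined) are assumed never to be passed in.

```c
#include <stdint.h>

/* Model of one 32-bit lane of PF2IW: saturate to the signed 16-bit
   range, otherwise truncate toward zero. The int32_t return value
   carries the sign extension to 32 bits.
   Assumption: x is a supported (non-NaN) value. */
static int32_t pf2iw_lane(float x)
{
    if (x >= 32768.0f)
        return 32767;        /* 0x0000_7FFF */
    if (x <= -32768.0f)
        return -32768;       /* 0xFFFF_8000 */
    return (int32_t)x;       /* C conversion truncates toward zero */
}

/* src[0] models bits 31:0 and src[1] models bits 63:32. */
static void pf2iw(int32_t dst[2], const float src[2])
{
    dst[0] = pf2iw_lane(src[0]);
    dst[1] = pf2iw_lane(src[1]);
}
```

For example, an input pair of 1.9 and -40000.0 would produce 1 in the low lane and -32768 in the high lane.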
PFNACC

  Mnemonic                Opcode / imm8   Description
  PFNACC mmreg1, mmreg2   0Fh 0Fh / 8Ah   Packed floating-point negative accumulate
  PFNACC mmreg, mem64

  Privilege:           None
  Registers affected:  MMX
  Flags affected:      None

PFNACC performs negative accumulation of the two doublewords of the destination operand and the source operand. PFNACC then stores the results in the low and high doublewords of the destination operand, respectively. Both operands are single-precision, floating-point operands with 24-bit significands.

"PFNACC mmreg1, mmreg2" performs the following operations:

    temp = mmreg2
    mmreg1[31:0] = mmreg1[31:0] - mmreg1[63:32]
    mmreg1[63:32] = temp[31:0] - temp[63:32]

"PFNACC mmreg, mem64" performs the following operations:

    mmreg[31:0] = mmreg[31:0] - mmreg[63:32]
    mmreg[63:32] = mem64[31:0] - mem64[63:32]

Related instructions

See the PFACC and PFPNACC instructions.

Exceptions generated:

  Exception                              Real  Virtual 8086  Protected  Description
  Invalid opcode (6)                      X         X            X      The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)                X         X            X      The task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                    -         -            X      During instruction execution, the stack segment limit was exceeded.
  General protection (13)                 -         -            X      During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                    X         X            -      One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                         -         X            X      A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)   X         X            X      An exception is pending due to the floating-point execution unit.
  Alignment check (17)                    -         X            X      An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)
PFPNACC

  Mnemonic                 Opcode / imm8   Description
  PFPNACC mmreg1, mmreg2   0Fh 0Fh / 8Eh   Packed floating-point mixed positive-negative accumulate
  PFPNACC mmreg, mem64

  Privilege:           None
  Registers affected:  MMX
  Flags affected:      None

PFPNACC performs mixed negative and positive accumulation of the two doublewords of the destination operand and the source operand. PFPNACC then stores the results in the low and high doublewords of the destination operand, respectively. Both operands are single-precision, floating-point operands with 24-bit significands.

"PFPNACC mmreg1, mmreg2" performs the following operations:

    temp = mmreg2
    mmreg1[31:0] = mmreg1[31:0] - mmreg1[63:32]
    mmreg1[63:32] = temp[31:0] + temp[63:32]

"PFPNACC mmreg, mem64" performs the following operations:

    mmreg[31:0] = mmreg[31:0] - mmreg[63:32]
    mmreg[63:32] = mem64[31:0] + mem64[63:32]

Related instructions

See the PFACC and PFNACC instructions.

Exceptions generated:

  Exception                              Real  Virtual 8086  Protected  Description
  Invalid opcode (6)                      X         X            X      The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)                X         X            X      The task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                    -         -            X      During instruction execution, the stack segment limit was exceeded.
  General protection (13)                 -         -            X      During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                    X         X            -      One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                         -         X            X      A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)   X         X            X      An exception is pending due to the floating-point execution unit.
  Alignment check (17)                    -         X            X      An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)
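The two accumulate instructions, PFNACC and PFPNACC, differ only in the sign used for the source pair. The sketch below models both in C under stated assumptions: the type mmx_pf and the function names are hypothetical, each lane is represented as a C float, and ordinary C float arithmetic stands in for the processor's single-precision behavior. Passing operands by value plays the role of the temp copy in the register form, so the result is correct even when source and destination are the same register.

```c
/* Hypothetical model of a 64-bit MMX register holding two floats:
   lo models bits 31:0, hi models bits 63:32. */
typedef struct {
    float lo;
    float hi;
} mmx_pf;

/* PFNACC: both result lanes are differences. */
static mmx_pf pfnacc(mmx_pf dst, mmx_pf src)
{
    mmx_pf r;
    r.lo = dst.lo - dst.hi;
    r.hi = src.lo - src.hi;
    return r;
}

/* PFPNACC: low lane is a difference, high lane is a sum. */
static mmx_pf pfpnacc(mmx_pf dst, mmx_pf src)
{
    mmx_pf r;
    r.lo = dst.lo - dst.hi;
    r.hi = src.lo + src.hi;
    return r;
}
```

With a destination of (lo = 5, hi = 3) and a source of (lo = 10, hi = 4), the PFNACC model gives (2, 6) and the PFPNACC model gives (2, 14).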
PI2FW

  Mnemonic               Opcode / imm8   Description
  PI2FW mmreg1, mmreg2   0Fh 0Fh / 0Ch   Packed 16-bit integer to floating-point conversion
  PI2FW mmreg, mem64

  Privilege:           None
  Registers affected:  MMX
  Flags affected:      None

PI2FW converts a register containing signed, 16-bit integers to single-precision, floating-point operands.

"PI2FW mmreg1, mmreg2" performs the following operations:

    mmreg1[31:0] = float(mmreg2[15:0])
    mmreg1[63:32] = float(mmreg2[47:32])

"PI2FW mmreg, mem64" performs the following operations:

    mmreg[31:0] = float(mem64[15:0])
    mmreg[63:32] = float(mem64[47:32])

Related instructions

See the PI2FD, PF2IW, and PF2ID instructions.

Exceptions generated:

  Exception                              Real  Virtual 8086  Protected  Description
  Invalid opcode (6)                      X         X            X      The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)                X         X            X      The task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                    -         -            X      During instruction execution, the stack segment limit was exceeded.
  General protection (13)                 -         -            X      During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                    X         X            -      One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                         -         X            X      A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)   X         X            X      An exception is pending due to the floating-point execution unit.
  Alignment check (17)                    -         X            X      An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)
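Note that PI2FW reads only the low word of each source doubleword (bits 15:0 and 47:32) and ignores the upper words. A minimal C model of one lane, assuming a hypothetical helper name pi2fw_lane and treating the source doubleword as a uint32_t, could look like this:

```c
#include <stdint.h>

/* Model of one PI2FW lane: interpret the low 16 bits of the source
   doubleword as a signed integer and convert it to float. The upper
   16 bits of the doubleword are ignored. */
static float pi2fw_lane(uint32_t dword)
{
    uint32_t w = dword & 0xFFFFu;
    /* Portable sign interpretation of a 16-bit two's-complement value. */
    int32_t v = (w & 0x8000u) ? (int32_t)w - 65536 : (int32_t)w;
    return (float)v;
}
```

Every signed 16-bit value is exactly representable in single precision, so this conversion involves no rounding.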
PSWAPD

  Mnemonic                Opcode / imm8   Description
  PSWAPD mmreg1, mmreg2   0Fh 0Fh / BBh   Packed swap doubleword
  PSWAPD mmreg, mem64

  Privilege:           None
  Registers affected:  MMX
  Flags affected:      None

The PSWAPD instruction swaps or reverses the upper and lower doublewords of the source operand.

"PSWAPD mmreg1, mmreg2" performs the following operations:

    temp = mmreg2
    mmreg1[63:32] = temp[31:0]
    mmreg1[31:0] = temp[63:32]

"PSWAPD mmreg, mem64" performs the following operations:

    mmreg[63:32] = mem64[31:0]
    mmreg[31:0] = mem64[63:32]

Exceptions generated:

  Exception                              Real  Virtual 8086  Protected  Description
  Invalid opcode (6)                      X         X            X      The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)                X         X            X      The task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                    -         -            X      During instruction execution, the stack segment limit was exceeded.
  General protection (13)                 -         -            X      During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                    X         X            -      One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                         -         X            X      A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)   X         X            X      An exception is pending due to the floating-point execution unit.
  Alignment check (17)                    -         X            X      An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)
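Viewed as a 64-bit integer, the PSWAPD operation is a rotation by 32 bits. A short C sketch, with the hypothetical function name pswapd and the register modeled as a uint64_t:

```c
#include <stdint.h>

/* Model of PSWAPD: exchange the upper and lower doublewords of the
   64-bit source value. */
static uint64_t pswapd(uint64_t src)
{
    return (src << 32) | (src >> 32);
}
```

Applying the model twice returns the original value, which is a quick sanity check on any implementation.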
3  Extensions to the MMX™ Instruction Set

This chapter describes 19 new instructions added to the MMX instruction set defined in the AMD-K6® MMX™ Enhanced Processor Multimedia Technology Manual, order# 20726. Twelve of the instructions improve multimedia-enhanced integer math calculations used in such applications as speech recognition and high-quality video processing. Seven instructions are dedicated to efficiently moving multimedia data into and out of the processor.

Programmers should check bit 22 of the extended feature flags in the EDX register after executing extended function 8000_0001h of the CPUID instruction. If bit 22 is set, the AMD processor supports these 19 instructions. See the AMD Processor Recognition Application Note, order# 20734, for more information.

Instruction definitions are in alphabetical order according to the instruction mnemonics.
MASKMOVQ

  Mnemonic                        Opcode    Description
  MASKMOVQ mmreg1, mmreg2 (edi)   0Fh F7h   Streaming (cache bypass) store using byte mask

  Privilege:           None
  Registers affected:  MMX
  Flags affected:      None

The MASKMOVQ instruction conditionally stores individual bytes of an MMX register to a memory location specified by the EDI register, while using the byte mask in a second MMX register. The MASKMOVQ instruction acts as a streaming store to minimize cache pollution. It is used to store data without first reading in old data (no write allocate).

"MASKMOVQ mmreg1, mmreg2 (edi)" performs the following operations:

    memory[edi][63:56] = mmreg2[63] ? mmreg1[63:56] : memory[edi][63:56]
    memory[edi][55:48] = mmreg2[55] ? mmreg1[55:48] : memory[edi][55:48]
    memory[edi][47:40] = mmreg2[47] ? mmreg1[47:40] : memory[edi][47:40]
    memory[edi][39:32] = mmreg2[39] ? mmreg1[39:32] : memory[edi][39:32]
    memory[edi][31:24] = mmreg2[31] ? mmreg1[31:24] : memory[edi][31:24]
    memory[edi][23:16] = mmreg2[23] ? mmreg1[23:16] : memory[edi][23:16]
    memory[edi][15:8]  = mmreg2[15] ? mmreg1[15:8]  : memory[edi][15:8]
    memory[edi][7:0]   = mmreg2[7]  ? mmreg1[7:0]   : memory[edi][7:0]

Exceptions generated:

  Exception                              Real  Virtual 8086  Protected  Description
  Invalid opcode (6)                      X         X            X      The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)                X         X            X      The task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                    -         -            X      During instruction execution, the stack segment limit was exceeded.
  General protection (13)                 -         -            X      During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                    X         X            -      One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                         -         X            X      A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)   X         X            X      An exception is pending due to the floating-point execution unit.
  Alignment check (17)                    -         X            X      An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)
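The byte-select logic of MASKMOVQ can be illustrated with a small C model. This sketch captures only the data movement: the cache-bypass (streaming) behavior has no equivalent in portable C and is not modeled. The function name maskmovq and its parameters are hypothetical; dest stands in for the memory addressed by EDI, and byte 0 of dest corresponds to bits 7:0, following x86 little-endian ordering.

```c
#include <stdint.h>

/* Model of the MASKMOVQ data path: for each of the eight bytes, store
   the data byte only if the most significant bit of the corresponding
   mask byte is set; otherwise leave memory unchanged. */
static void maskmovq(uint8_t *dest, uint64_t data, uint64_t mask)
{
    for (int i = 0; i < 8; i++) {
        if ((mask >> (8 * i + 7)) & 1u) {
            dest[i] = (uint8_t)(data >> (8 * i));
        }
    }
}
```

For instance, a mask of 0x0000000000800080 selects bytes 0 and 2, so only those two bytes of the destination are overwritten.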
MOVNTQ

  Mnemonic              Opcode    Description
  MOVNTQ mem64, mmreg   0Fh E7h   Streaming (cache bypass) store

  Privilege:           None
  Registers affected:  MMX
  Flags affected:      None

The MOVNTQ instruction stores the 64-bit contents of an MMX register to memory. The MOVNTQ instruction acts as a streaming store to minimize cache pollution. It is used to store data without first reading in old data (no write allocate).

"MOVNTQ mem64, mmreg" performs the following operations:

    mem64[63:0] = mmreg

Exceptions generated:

  Exception                              Real  Virtual 8086  Protected  Description
  Invalid opcode (6)                      X         X            X      The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)                X         X            X      The task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                    -         -            X      During instruction execution, the stack segment limit was exceeded.
  General protection (13)                 -         -            X      During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                    X         X            -      One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                         -         X            X      A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)   X         X            X      An exception is pending due to the floating-point execution unit.
  Alignment check (17)                    -         X            X      An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)
PAVGB

  Mnemonic               Opcode    Description
  PAVGB mmreg1, mmreg2   0Fh E0h   Packed average of unsigned byte
  PAVGB mmreg, mem64

  Privilege:           None
  Registers affected:  MMX
  Flags affected:      None

The PAVGB instruction produces the rounded-up averages of the eight unsigned 8-bit integer values in the source operand (an MMX register or a 64-bit memory location) and the eight corresponding unsigned 8-bit integer values in the destination operand (an MMX register). It does so by adding the source and destination byte values to get a 9-bit unsigned intermediate value. The intermediate value is then incremented by one and finally shifted to the right by one bit position. The eight unsigned 8-bit results are stored in the MMX register specified as the destination operand.

The PAVGB instruction is identical to the 3DNow! PAVGUSB instruction and can be used for pixel averaging in MPEG-2 motion compensation and video scaling operations.

Exceptions generated:

  Exception                              Real  Virtual 8086  Protected  Description
  Invalid opcode (6)                      X         X            X      The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)                X         X            X      The task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                    -         -            X      During instruction execution, the stack segment limit was exceeded.
  General protection (13)                 -         -            X      During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                    X         X            -      One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                         -         X            X      A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)   X         X            X      An exception is pending due to the floating-point execution unit.
  Alignment check (17)                    -         X            X      An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)
Functional Illustration of the PAVGB Instruction

                    63                                  0
  mmreg2/mem64      FFh  FFh  01h  0Fh  9Ah  00h  70h  07h
  mmreg1            FFh  00h  FFh  10h  A8h  01h  44h  F7h
                    (per-byte averaging)
  mmreg1 (result)   FFh  80h* 80h  10h* A1h  01h* 5Ah  7Fh

  * Indicates a value that was rounded up.

The following list explains the functional illustration of the PAVGB instruction:

  - The rounded byte average of FFh and FFh is FFh.
  - The rounded byte average of FFh and 00h is 80h.
  - The rounded byte average of 01h and FFh is also 80h.
  - The rounded byte average of 0Fh and 10h is 10h.
  - The rounded byte average of 00h and 01h is 01h.
  - The rounded byte average of 70h and 44h is 5Ah.
  - The rounded byte average of 07h and F7h is 7Fh.
  - The rounded byte average of 9Ah and A8h is A1h.

"PAVGB mmreg1, mmreg2" performs the following operations:

    mmreg1[7:0]   = (mmreg1[7:0]   + mmreg2[7:0]   + 1) >> 1
    mmreg1[15:8]  = (mmreg1[15:8]  + mmreg2[15:8]  + 1) >> 1
    mmreg1[23:16] = (mmreg1[23:16] + mmreg2[23:16] + 1) >> 1
    mmreg1[31:24] = (mmreg1[31:24] + mmreg2[31:24] + 1) >> 1
    mmreg1[39:32] = (mmreg1[39:32] + mmreg2[39:32] + 1) >> 1
    mmreg1[47:40] = (mmreg1[47:40] + mmreg2[47:40] + 1) >> 1
    mmreg1[55:48] = (mmreg1[55:48] + mmreg2[55:48] + 1) >> 1
    mmreg1[63:56] = (mmreg1[63:56] + mmreg2[63:56] + 1) >> 1

"PAVGB mmreg, mem64" performs the following operations:

    mmreg[7:0]   = (mmreg[7:0]   + mem64[7:0]   + 1) >> 1
    mmreg[15:8]  = (mmreg[15:8]  + mem64[15:8]  + 1) >> 1
    mmreg[23:16] = (mmreg[23:16] + mem64[23:16] + 1) >> 1
    mmreg[31:24] = (mmreg[31:24] + mem64[31:24] + 1) >> 1
    mmreg[39:32] = (mmreg[39:32] + mem64[39:32] + 1) >> 1
    mmreg[47:40] = (mmreg[47:40] + mem64[47:40] + 1) >> 1
    mmreg[55:48] = (mmreg[55:48] + mem64[55:48] + 1) >> 1
    mmreg[63:56] = (mmreg[63:56] + mem64[63:56] + 1) >> 1

Related instructions

See the PAVGW instruction.
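The PAVGB operations above can be checked against the functional illustration with a small C model. The function name pavgb is hypothetical and both operands are modeled as uint64_t values with bits 7:0 as the lowest byte. The per-byte sum is held in an unsigned int, which is wide enough for the 9-bit intermediate value.

```c
#include <stdint.h>

/* Model of PAVGB: rounded-up average of eight unsigned byte pairs. */
static uint64_t pavgb(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        unsigned a = (unsigned)((dst >> (8 * i)) & 0xFFu);
        unsigned b = (unsigned)((src >> (8 * i)) & 0xFFu);
        unsigned avg = (a + b + 1u) >> 1;   /* 9-bit sum, then shift */
        result |= (uint64_t)avg << (8 * i);
    }
    return result;
}
```

Feeding the model the two operand rows of the illustration, 0xFF00FF10A80144F7 and 0xFFFF010F9A007007, reproduces the result row 0xFF808010A1015A7F.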

PAVGW

Mnemonic                Opcode    Description
PAVGW mmreg1, mmreg2    0Fh E3h   Packed average of unsigned word
PAVGW mmreg, mem64

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PAVGW instruction is the same as the PAVGB instruction, except that it operates on packed unsigned words. PAVGW produces the rounded-up averages of the four unsigned 16-bit integer values in the source operand (an MMX register or a 64-bit memory location) and the four corresponding unsigned 16-bit integer values in the destination operand (an MMX register). It does so by adding the source and destination word values to get a 17-bit unsigned intermediate value. The intermediate value is then incremented by one and finally shifted to the right by one bit position. The four unsigned 16-bit results are stored in the MMX register specified as the destination operand.

"PAVGW mmreg1, mmreg2" performs the following operations:

    mmreg1[15:0]  = (mmreg1[15:0]  + mmreg2[15:0]  + 1) >> 1
    mmreg1[31:16] = (mmreg1[31:16] + mmreg2[31:16] + 1) >> 1
    mmreg1[47:32] = (mmreg1[47:32] + mmreg2[47:32] + 1) >> 1
    mmreg1[63:48] = (mmreg1[63:48] + mmreg2[63:48] + 1) >> 1

"PAVGW mmreg, mem64" performs the following operations:

    mmreg[15:0]  = (mmreg[15:0]  + mem64[15:0]  + 1) >> 1
    mmreg[31:16] = (mmreg[31:16] + mem64[31:16] + 1) >> 1
    mmreg[47:32] = (mmreg[47:32] + mem64[47:32] + 1) >> 1
    mmreg[63:48] = (mmreg[63:48] + mem64[63:48] + 1) >> 1

Related instructions

See the PAVGB instruction.
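
To make the rounding behaviour of the packed averages concrete, here is a minimal Python sketch that models a 64-bit MMX register as a plain integer. The helper names (`pavg`, `pavgw`, `pavgb`) are hypothetical and exist only for illustration; the arithmetic follows the pseudo-code above.

```python
def pavg(dst: int, src: int, lane_bits: int) -> int:
    """Rounded-up unsigned average of each lane of two 64-bit values."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        a = (dst >> shift) & lane_mask
        b = (src >> shift) & lane_mask
        # The sum a + b + 1 needs lane_bits + 1 bits; Python ints do not overflow.
        result |= ((a + b + 1) >> 1) << shift
    return result


def pavgw(dst: int, src: int) -> int:
    """Model of PAVGW: four unsigned 16-bit lanes."""
    return pavg(dst, src, 16)


def pavgb(dst: int, src: int) -> int:
    """Model of PAVGB: eight unsigned 8-bit lanes."""
    return pavg(dst, src, 8)


if __name__ == "__main__":
    print(hex(pavgw(0x0001_0002_FFFF_0000, 0x0002_0002_FFFF_0001)))
```

Because of the "+ 1" term, the average of 0 and 1 is 1 rather than 0, and the average of two maximum-valued lanes stays at the maximum without wrapping.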

PEXTRW

Mnemonic                    Opcode    Description
PEXTRW reg32, mmreg, imm8   0Fh C5h   Extract word into integer register

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PEXTRW instruction extracts the one of the four words selected by the low two bits of imm8 from an MMX register and stores it in the least significant word of a 32-bit integer register. The upper word of the integer register is cleared.

"PEXTRW reg32, mmreg, imm8" performs the following operations:

    index        = imm8[1:0] * 16
    reg32[31:16] = 0
    reg32[15:0]  = mmreg[index+15:index]

Related instructions

See the PINSRW instruction.
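
The word selection can be sketched in a few lines of Python. This is only an illustrative model under the assumption that the MMX register is held in an integer; the function name `pextrw` is a hypothetical helper, not an existing API.

```python
def pextrw(mmreg: int, imm8: int) -> int:
    """Model of PEXTRW: return the selected 16-bit word, zero-extended to 32 bits."""
    index = (imm8 & 0b11) * 16        # only imm8[1:0] is used
    return (mmreg >> index) & 0xFFFF  # upper 16 bits of the result are 0


if __name__ == "__main__":
    reg = 0x1111_2222_3333_4444
    for selector in range(4):
        print(selector, hex(pextrw(reg, selector)))
```

Since only the low two bits of the immediate take part, an immediate of 6 selects the same word as an immediate of 2.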

PINSRW

Mnemonic                    Opcode    Description
PINSRW mmreg, reg32, imm8   0Fh C4h   Insert word from integer register
PINSRW mmreg, mem16, imm8

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PINSRW instruction inserts the least significant word of the source operand (an integer register or a 16-bit memory location) into one of the four words of the destination operand (an MMX register). The other three words of the destination are unchanged.

"PINSRW mmreg, reg32, imm8" performs the following operations:

    index                 = imm8[1:0] * 16
    temp1                 = 0
    temp1[index+15:index] = reg32[15:0]
    temp2                 = mmreg
    temp2[index+15:index] = 0
    mmreg                 = temp1 | temp2

"PINSRW mmreg, mem16, imm8" performs the following operations:

    index                 = imm8[1:0] * 16
    temp1                 = 0
    temp1[index+15:index] = mem16[15:0]
    temp2                 = mmreg
    temp2[index+15:index] = 0
    mmreg                 = temp1 | temp2

Related instructions

See the PEXTRW instruction.
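
The temp1/temp2 sequence above amounts to a masked merge. The following Python sketch mirrors it step by step, assuming the MMX register is modelled as a 64-bit integer; `pinsrw` and `MASK64` are hypothetical names used only for this illustration.

```python
MASK64 = (1 << 64) - 1


def pinsrw(mmreg: int, src: int, imm8: int) -> int:
    """Model of PINSRW: replace one 16-bit word of mmreg with src[15:0]."""
    index = (imm8 & 0b11) * 16
    temp1 = (src & 0xFFFF) << index             # new word in position
    temp2 = mmreg & ~(0xFFFF << index) & MASK64  # destination with that word cleared
    return temp1 | temp2


if __name__ == "__main__":
    print(hex(pinsrw(0x1111_2222_3333_4444, 0xDEAD_BEEF, 1)))
```

Only the low word of the source takes part, so the upper half of a 32-bit source register (0xDEAD in the example) is ignored.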

PMAXSW

Mnemonic                 Opcode    Description
PMAXSW mmreg1, mmreg2    0Fh EEh   Packed maximum signed word
PMAXSW mmreg, mem64

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PMAXSW instruction operates on signed 16-bit data and selects the maximum signed value between source 1 and source 2 for each of the four word positions.

"PMAXSW mmreg1, mmreg2" performs the following signed operations:

    mmreg1[15:0]  = (mmreg1[15:0]  > mmreg2[15:0])  ? mmreg1[15:0]  : mmreg2[15:0]
    mmreg1[31:16] = (mmreg1[31:16] > mmreg2[31:16]) ? mmreg1[31:16] : mmreg2[31:16]
    mmreg1[47:32] = (mmreg1[47:32] > mmreg2[47:32]) ? mmreg1[47:32] : mmreg2[47:32]
    mmreg1[63:48] = (mmreg1[63:48] > mmreg2[63:48]) ? mmreg1[63:48] : mmreg2[63:48]

"PMAXSW mmreg, mem64" performs the following signed operations:

    mmreg[15:0]  = (mmreg[15:0]  > mem64[15:0])  ? mmreg[15:0]  : mem64[15:0]
    mmreg[31:16] = (mmreg[31:16] > mem64[31:16]) ? mmreg[31:16] : mem64[31:16]
    mmreg[47:32] = (mmreg[47:32] > mem64[47:32]) ? mmreg[47:32] : mem64[47:32]
    mmreg[63:48] = (mmreg[63:48] > mem64[63:48]) ? mmreg[63:48] : mem64[63:48]

Related instructions

See the PMINSW instruction.
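
The word "signed" matters here: 8000h is the most negative word, not the largest. A minimal Python sketch of the comparison, assuming a 64-bit integer stands in for the MMX register (the helpers `to_signed16` and `pmaxsw` are hypothetical names for illustration):

```python
def to_signed16(word: int) -> int:
    """Reinterpret a 16-bit pattern as a two's-complement signed value."""
    return word - 0x1_0000 if word & 0x8000 else word


def pmaxsw(dst: int, src: int) -> int:
    """Model of PMAXSW: per-word signed maximum over four 16-bit lanes."""
    result = 0
    for shift in range(0, 64, 16):
        a = (dst >> shift) & 0xFFFF
        b = (src >> shift) & 0xFFFF
        winner = a if to_signed16(a) > to_signed16(b) else b
        result |= winner << shift
    return result


if __name__ == "__main__":
    print(hex(pmaxsw(0xFFFF_0001_8000_7FFF, 0x0000_0002_7FFF_8000)))
```

In the example, FFFFh (-1) loses to 0000h, and 8000h (-32768) loses to 7FFFh in both operand orders.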

PMAXUB

Mnemonic                 Opcode    Description
PMAXUB mmreg1, mmreg2    0Fh DEh   Packed maximum unsigned byte
PMAXUB mmreg, mem64

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PMAXUB instruction operates on unsigned 8-bit data and selects the maximum unsigned value between source 1 and source 2 for each of the eight byte positions.

"PMAXUB mmreg1, mmreg2" performs the following unsigned operations:

    mmreg1[7:0]   = (mmreg1[7:0]   > mmreg2[7:0])   ? mmreg1[7:0]   : mmreg2[7:0]
    mmreg1[15:8]  = (mmreg1[15:8]  > mmreg2[15:8])  ? mmreg1[15:8]  : mmreg2[15:8]
    mmreg1[23:16] = (mmreg1[23:16] > mmreg2[23:16]) ? mmreg1[23:16] : mmreg2[23:16]
    mmreg1[31:24] = (mmreg1[31:24] > mmreg2[31:24]) ? mmreg1[31:24] : mmreg2[31:24]
    mmreg1[39:32] = (mmreg1[39:32] > mmreg2[39:32]) ? mmreg1[39:32] : mmreg2[39:32]
    mmreg1[47:40] = (mmreg1[47:40] > mmreg2[47:40]) ? mmreg1[47:40] : mmreg2[47:40]
    mmreg1[55:48] = (mmreg1[55:48] > mmreg2[55:48]) ? mmreg1[55:48] : mmreg2[55:48]
    mmreg1[63:56] = (mmreg1[63:56] > mmreg2[63:56]) ? mmreg1[63:56] : mmreg2[63:56]

"PMAXUB mmreg, mem64" performs the following unsigned operations:

    mmreg[7:0]   = (mmreg[7:0]   > mem64[7:0])   ? mmreg[7:0]   : mem64[7:0]
    mmreg[15:8]  = (mmreg[15:8]  > mem64[15:8])  ? mmreg[15:8]  : mem64[15:8]
    mmreg[23:16] = (mmreg[23:16] > mem64[23:16]) ? mmreg[23:16] : mem64[23:16]
    mmreg[31:24] = (mmreg[31:24] > mem64[31:24]) ? mmreg[31:24] : mem64[31:24]
    mmreg[39:32] = (mmreg[39:32] > mem64[39:32]) ? mmreg[39:32] : mem64[39:32]
    mmreg[47:40] = (mmreg[47:40] > mem64[47:40]) ? mmreg[47:40] : mem64[47:40]
    mmreg[55:48] = (mmreg[55:48] > mem64[55:48]) ? mmreg[55:48] : mem64[55:48]
    mmreg[63:56] = (mmreg[63:56] > mem64[63:56]) ? mmreg[63:56] : mem64[63:56]

Related instructions

See the PMINUB instruction.

PMINSW

Mnemonic                 Opcode    Description
PMINSW mmreg1, mmreg2    0Fh EAh   Packed minimum signed word
PMINSW mmreg, mem64

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PMINSW instruction operates on signed 16-bit data and selects the minimum arithmetic value between source 1 and source 2 for each of the four word positions.

"PMINSW mmreg1, mmreg2" performs the following signed operations:

    mmreg1[15:0]  = (mmreg1[15:0]  <= mmreg2[15:0])  ? mmreg1[15:0]  : mmreg2[15:0]
    mmreg1[31:16] = (mmreg1[31:16] <= mmreg2[31:16]) ? mmreg1[31:16] : mmreg2[31:16]
    mmreg1[47:32] = (mmreg1[47:32] <= mmreg2[47:32]) ? mmreg1[47:32] : mmreg2[47:32]
    mmreg1[63:48] = (mmreg1[63:48] <= mmreg2[63:48]) ? mmreg1[63:48] : mmreg2[63:48]

"PMINSW mmreg, mem64" performs the following signed operations:

    mmreg[15:0]  = (mmreg[15:0]  <= mem64[15:0])  ? mmreg[15:0]  : mem64[15:0]
    mmreg[31:16] = (mmreg[31:16] <= mem64[31:16]) ? mmreg[31:16] : mem64[31:16]
    mmreg[47:32] = (mmreg[47:32] <= mem64[47:32]) ? mmreg[47:32] : mem64[47:32]
    mmreg[63:48] = (mmreg[63:48] <= mem64[63:48]) ? mmreg[63:48] : mem64[63:48]

Related instructions

See the PMAXSW instruction.

PMINUB

Mnemonic                 Opcode    Description
PMINUB mmreg1, mmreg2    0Fh DAh   Packed minimum unsigned byte
PMINUB mmreg, mem64

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PMINUB instruction operates on unsigned 8-bit data and selects the minimum unsigned value between source 1 and source 2 for each of the eight byte positions.

"PMINUB mmreg1, mmreg2" performs the following unsigned operations:

    mmreg1[7:0]   = (mmreg1[7:0]   <= mmreg2[7:0])   ? mmreg1[7:0]   : mmreg2[7:0]
    mmreg1[15:8]  = (mmreg1[15:8]  <= mmreg2[15:8])  ? mmreg1[15:8]  : mmreg2[15:8]
    mmreg1[23:16] = (mmreg1[23:16] <= mmreg2[23:16]) ? mmreg1[23:16] : mmreg2[23:16]
    mmreg1[31:24] = (mmreg1[31:24] <= mmreg2[31:24]) ? mmreg1[31:24] : mmreg2[31:24]
    mmreg1[39:32] = (mmreg1[39:32] <= mmreg2[39:32]) ? mmreg1[39:32] : mmreg2[39:32]
    mmreg1[47:40] = (mmreg1[47:40] <= mmreg2[47:40]) ? mmreg1[47:40] : mmreg2[47:40]
    mmreg1[55:48] = (mmreg1[55:48] <= mmreg2[55:48]) ? mmreg1[55:48] : mmreg2[55:48]
    mmreg1[63:56] = (mmreg1[63:56] <= mmreg2[63:56]) ? mmreg1[63:56] : mmreg2[63:56]

"PMINUB mmreg, mem64" performs the following unsigned operations:

    mmreg[7:0]   = (mmreg[7:0]   <= mem64[7:0])   ? mmreg[7:0]   : mem64[7:0]
    mmreg[15:8]  = (mmreg[15:8]  <= mem64[15:8])  ? mmreg[15:8]  : mem64[15:8]
    mmreg[23:16] = (mmreg[23:16] <= mem64[23:16]) ? mmreg[23:16] : mem64[23:16]
    mmreg[31:24] = (mmreg[31:24] <= mem64[31:24]) ? mmreg[31:24] : mem64[31:24]
    mmreg[39:32] = (mmreg[39:32] <= mem64[39:32]) ? mmreg[39:32] : mem64[39:32]
    mmreg[47:40] = (mmreg[47:40] <= mem64[47:40]) ? mmreg[47:40] : mem64[47:40]
    mmreg[55:48] = (mmreg[55:48] <= mem64[55:48]) ? mmreg[55:48] : mem64[55:48]
    mmreg[63:56] = (mmreg[63:56] <= mem64[63:56]) ? mmreg[63:56] : mem64[63:56]

Related instructions

See the PMAXUB instruction.
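
For the unsigned byte variants, the lanes are compared as plain values 0 to 255, so 80h ranks above 7Fh. A small Python sketch of PMINUB under the assumption that the register is modelled as a 64-bit integer (the function name `pminub` is a hypothetical helper):

```python
def pminub(dst: int, src: int) -> int:
    """Model of PMINUB: per-byte unsigned minimum over eight 8-bit lanes."""
    result = 0
    for shift in range(0, 64, 8):
        a = (dst >> shift) & 0xFF
        b = (src >> shift) & 0xFF
        result |= (a if a <= b else b) << shift
    return result


if __name__ == "__main__":
    print(hex(pminub(0xFF00_8001_10F0_7F80, 0x00FF_7F02_200F_807F)))
```

A common use of the unsigned minimum is clamping pixel values to an upper bound: taking the minimum against a register filled with the bound saturates every byte in one step.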

PMOVMSKB

Mnemonic                 Opcode    Description
PMOVMSKB reg32, mmreg    0Fh D7h   Move byte mask to integer register

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PMOVMSKB instruction selects the most significant bit from each byte position of an MMX register and collapses all eight bits into the least significant byte of an integer register. The upper 24 bits of the integer register are cleared.

"PMOVMSKB reg32, mmreg" performs the following operations:

    reg32[31:8] = 0
    reg32[7]    = mmreg[63]
    reg32[6]    = mmreg[55]
    reg32[5]    = mmreg[47]
    reg32[4]    = mmreg[39]
    reg32[3]    = mmreg[31]
    reg32[2]    = mmreg[23]
    reg32[1]    = mmreg[15]
    reg32[0]    = mmreg[7]
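
The bit gathering can be sketched as a short loop. The Python below is an illustrative model only, with the MMX register held in an integer; `pmovmskb` is a hypothetical helper name.

```python
def pmovmskb(mmreg: int) -> int:
    """Model of PMOVMSKB: pack the top bit of each byte into an 8-bit mask."""
    mask = 0
    for byte_index in range(8):
        msb = (mmreg >> (8 * byte_index + 7)) & 1  # bit 7 of this byte
        mask |= msb << byte_index                  # byte i feeds mask bit i
    return mask


if __name__ == "__main__":
    print(bin(pmovmskb(0xFF00_8000_7F80_0180)))
```

Applied to the all-ones or all-zeros bytes that packed compare instructions produce, the mask lets ordinary integer code test or branch on the per-byte outcomes.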

PMULHUW

Mnemonic                  Opcode    Description
PMULHUW mmreg1, mmreg2    0Fh E4h   Packed multiply high unsigned word
PMULHUW mmreg, mem64

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PMULHUW instruction multiplies the four unsigned words in the source operand with the four unsigned words in the destination operand. The upper 16 bits of each 32-bit intermediate result are placed into the corresponding word of the destination operand.

"PMULHUW mmreg1, mmreg2" performs the following operations:

    temp1 = (mmreg1[15:0]  * mmreg2[15:0])
    temp2 = (mmreg1[31:16] * mmreg2[31:16])
    temp3 = (mmreg1[47:32] * mmreg2[47:32])
    temp4 = (mmreg1[63:48] * mmreg2[63:48])
    mmreg1[15:0]  = temp1[31:16]
    mmreg1[31:16] = temp2[31:16]
    mmreg1[47:32] = temp3[31:16]
    mmreg1[63:48] = temp4[31:16]

"PMULHUW mmreg, mem64" performs the following operations:

    temp1 = (mmreg[15:0]  * mem64[15:0])
    temp2 = (mmreg[31:16] * mem64[31:16])
    temp3 = (mmreg[47:32] * mem64[47:32])
    temp4 = (mmreg[63:48] * mem64[63:48])
    mmreg[15:0]  = temp1[31:16]
    mmreg[31:16] = temp2[31:16]
    mmreg[47:32] = temp3[31:16]
    mmreg[63:48] = temp4[31:16]
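
A short Python sketch may help show what "multiply high" keeps and what it discards. It assumes the MMX register is modelled as a 64-bit integer, and `pmulhuw` is a hypothetical helper name used only here.

```python
def pmulhuw(dst: int, src: int) -> int:
    """Model of PMULHUW: upper 16 bits of each unsigned 16x16-bit product."""
    result = 0
    for shift in range(0, 64, 16):
        a = (dst >> shift) & 0xFFFF
        b = (src >> shift) & 0xFFFF
        temp = a * b                        # 32-bit unsigned intermediate
        result |= ((temp >> 16) & 0xFFFF) << shift
    return result


if __name__ == "__main__":
    print(hex(pmulhuw(0xFFFF_8000_0002_0100, 0xFFFF_0002_0003_0100)))
```

The low 16 bits of every product are dropped, so small products such as 2 * 3 yield 0. This makes the instruction behave like a multiplication by an unsigned 0.16 fixed-point fraction, truncated toward zero.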

PREFETCHNTA - PREFETCHT0 - PREFETCHT1 - PREFETCHT2

Mnemonic            Opcode / ModR/M   Description
PREFETCHNTA mem8    0Fh 18h / 0       Move data closer to the processor using the NTA reference.
PREFETCHT0 mem8     0Fh 18h / 1       Move data closer to the processor using the T0 reference.
PREFETCHT1 mem8     0Fh 18h / 2       Move data closer to the processor using the T1 reference.
PREFETCHT2 mem8     0Fh 18h / 3       Move data closer to the processor using the T2 reference.

Privilege:             None
Registers affected:    None
Flags affected:        None
Exceptions generated:  None

The prefetch instruction brings a cache line into the processor cache level(s) specified by a locality reference. The address of the prefetched cache line is specified by the mem8 value. The prefetch instruction loads a cache line even if the mem8 address is not aligned with the start of the line. If the cache line is already contained in a cache level that is lower than the locality reference, or a memory fault is detected, then no bus cycle is initiated and the instruction is treated as a NOP.

The operation of the prefetch instructions is processor implementation dependent. The instructions can be ignored or changed by a processor implementation, though they will not change program behavior. The cache line size is also implementation dependent, with a minimum size of 32 bytes.

Bits 5:3 of the ModR/M byte indicate the cache locality references.

Table 4. Locality References for the Prefetch Instructions

Locality reference   Description
NTA                  Move specified data into processor with minimal L1/L2 cache pollution.
T0                   Move specified data into all cache levels.
T1                   Move specified data into all cache levels except 0th level cache.
T2                   Move specified data into all cache levels except 0th and 1st level caches.

Note: A 0th level cache is implementation dependent.
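
As a small illustration of how bits 5:3 of the ModR/M byte select the locality reference, the following Python sketch maps the field values 0 to 3 from the mnemonic table to their names. The names `LOCALITY` and `prefetch_locality` are hypothetical, and returning `None` for field values 4 to 7 is an assumption of this sketch, since the text above does not describe those encodings.

```python
# Field values taken from the "/ 0" to "/ 3" entries in the mnemonic table.
LOCALITY = {0: "NTA", 1: "T0", 2: "T1", 3: "T2"}


def prefetch_locality(modrm: int):
    """Return the locality reference encoded in ModR/M bits 5:3, if listed."""
    field = (modrm >> 3) & 0b111
    return LOCALITY.get(field)  # None for values the table does not list


if __name__ == "__main__":
    for modrm in (0x00, 0x08, 0x10, 0x18):
        print(hex(modrm), prefetch_locality(modrm))
```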

PSADBW

Mnemonic                 Opcode    Description
PSADBW mmreg1, mmreg2    0Fh F6h   Packed sum of absolute byte differences
PSADBW mmreg, mem64

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PSADBW instruction computes the sum of the absolute values of the differences between each byte position of source 1 and source 2. The sum is stored in the least significant word of the destination operand, and the upper 48 bits are cleared.

"PSADBW mmreg1, mmreg2" performs the following operations:

    mmreg1[63:16] = 0
    mmreg1[15:0]  = abs(mmreg1[7:0]   - mmreg2[7:0])   +
                    abs(mmreg1[15:8]  - mmreg2[15:8])  +
                    abs(mmreg1[23:16] - mmreg2[23:16]) +
                    abs(mmreg1[31:24] - mmreg2[31:24]) +
                    abs(mmreg1[39:32] - mmreg2[39:32]) +
                    abs(mmreg1[47:40] - mmreg2[47:40]) +
                    abs(mmreg1[55:48] - mmreg2[55:48]) +
                    abs(mmreg1[63:56] - mmreg2[63:56])

"PSADBW mmreg, mem64" performs the following operations:

    mmreg[63:16] = 0
    mmreg[15:0]  = abs(mmreg[7:0]   - mem64[7:0])   +
                   abs(mmreg[15:8]  - mem64[15:8])  +
                   abs(mmreg[23:16] - mem64[23:16]) +
                   abs(mmreg[31:24] - mem64[31:24]) +
                   abs(mmreg[39:32] - mem64[39:32]) +
                   abs(mmreg[47:40] - mem64[47:40]) +
                   abs(mmreg[55:48] - mem64[55:48]) +
                   abs(mmreg[63:56] - mem64[63:56])

Related instructions

See the PAVGUSB instruction.
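
The sum of absolute differences can be checked with a few lines of Python. This is a sketch under the assumption that both operands are modelled as 64-bit integers with unsigned byte lanes; `psadbw` is a hypothetical helper name.

```python
def psadbw(dst: int, src: int) -> int:
    """Model of PSADBW: sum of absolute differences of eight unsigned bytes."""
    total = 0
    for shift in range(0, 64, 8):
        a = (dst >> shift) & 0xFF
        b = (src >> shift) & 0xFF
        total += abs(a - b)
    # The largest possible sum is 8 * 255 = 2040, so it always fits in
    # bits 15:0 and bits 63:16 of the result are zero.
    return total


if __name__ == "__main__":
    print(psadbw(0x0102_0304_0506_0708, 0x0807_0605_0403_0201))
```

Identical operands give 0 and the result grows with the byte-wise distance between them, which is why this operation is typically used to score candidate blocks in motion estimation.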

PSHUFW

Mnemonic                       Opcode    Description
PSHUFW mmreg1, mmreg2, imm8    0Fh 70h   Packed shuffle word
PSHUFW mmreg, mem64, imm8

Privilege:             None
Registers affected:    MMX
Flags affected:        None
Exceptions generated:

Exception                              Real  Virtual 8086  Protected  Description
Invalid opcode (6)                     X     X             X          The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7)               X     X             X          The task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12)                   -     -             X          During instruction execution, the stack segment limit was exceeded.
General protection (13)                -     -             X          During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13)                   X     X             -          One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14)                        -     X             X          A page fault resulted from the execution of the instruction.
Floating-point exception pending (16)  X     X             X          An exception is pending due to the floating-point execution unit.
Alignment check (17)                   -     X             X          An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In protected mode, CPL = 3.)

The PSHUFW instruction selects from the four words of the source operand (an MMX register or a 64-bit memory location) in one of 256 possible ways as defined by an immediate byte.

"PSHUFW mmreg1, mmreg2, imm8" performs the following operations:

    index3 = imm8[7:6] * 16
    index2 = imm8[5:4] * 16
    index1 = imm8[3:2] * 16
    index0 = imm8[1:0] * 16
    temp = mmreg2
    mmreg1[63:48] = temp[index3+15:index3]
    mmreg1[47:32] = temp[index2+15:index2]
    mmreg1[31:16] = temp[index1+15:index1]
    mmreg1[15:0]  = temp[index0+15:index0]

"PSHUFW mmreg, mem64, imm8" performs the following operations:

    index3 = imm8[7:6] * 16
    index2 = imm8[5:4] * 16
    index1 = imm8[3:2] * 16
    index0 = imm8[1:0] * 16
    mmreg[63:48] = mem64[index3+15:index3]
    mmreg[47:32] = mem64[index2+15:index2]
    mmreg[31:16] = mem64[index1+15:index1]
    mmreg[15:0]  = mem64[index0+15:index0]
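
Each 2-bit field of the immediate picks the source word for one destination word. The Python sketch below models this with a 64-bit integer standing in for the source operand; `pshufw` is a hypothetical helper name, and the immediates in the example are merely convenient test values.

```python
def pshufw(src: int, imm8: int) -> int:
    """Model of PSHUFW: build each destination word from a selected source word."""
    result = 0
    for lane in range(4):
        selector = (imm8 >> (2 * lane)) & 0b11    # imm8[2*lane+1 : 2*lane]
        word = (src >> (16 * selector)) & 0xFFFF  # source word chosen by the field
        result |= word << (16 * lane)
    return result


if __name__ == "__main__":
    src = 0xDDDD_CCCC_BBBB_AAAA
    print(hex(pshufw(src, 0xE4)))  # fields 3,2,1,0: words stay in place
    print(hex(pshufw(src, 0x1B)))  # fields 0,1,2,3: word order reversed
    print(hex(pshufw(src, 0x00)))  # all fields 0: word 0 broadcast
```

Because the source is read in full before any destination word is written (the "temp = mmreg2" step in the register form), the shuffle also works when source and destination are the same register.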

SFENCE

Mnemonic   Opcode / imm8    Description
SFENCE     0Fh AEh / 7h     Store fence

Privilege:             None
Registers affected:    None
Flags affected:        None
Exceptions generated:  None

In a weakly ordered system, hardware is allowed to reorder reads and writes between the processor and memory. For example, write-back stores can complete ahead of write-combining stores. SFENCE provides a mechanism to force a strong ordering between routines that produce weakly ordered results (such as WC memory types).

The SFENCE instruction makes all previous writes globally visible before any subsequent store. For example, an SFENCE instruction will force a newer write-back store to wait until all older streaming stores or write-combining stores are completed.

Note: Software should encode the SFENCE instruction with a ModR/M byte of 0xF8. All other possible ModR/M encodings are reserved for future use.