CSAPP: Arch Lab Solution Report

Tags: CSAPP

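Preparation

Download the handout from the official site.

Unpack it with tar xvf archlab-handout.tar. The archive contains README, Makefile, sim.tar, archlab.ps, archlab.pdf, and simguide.pdf.

You may then run into the following problems.

If apt reports that it cannot locate a package, the mirror is the problem. Look up a working mirror online (Aliyun's, for example) and replace every URL in /etc/apt/sources.list with it.
After switching, remember to run sudo apt-get update.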
/usr/bin/ld: cannot find -lfl
sudo apt-get install flex

/usr/bin/ld: cannot find -ltk
/usr/bin/ld: cannot find -ltcl

sudo apt-get install tk8.5
sudo apt-get install tcl8.5

Also edit the Makefile in your lab directory. The changes look like this:

# Comment this out if you don't have Tcl/Tk on your system
 
#GUIMODE=-DHAS_GUI
 
# Modify the following line so that gcc can find the libtcl.so and
# libtk.so libraries on your system. You may need to use the -L option
# to tell gcc which directory to look in. Comment this out if you
# don't have Tcl/Tk.
 
# Changed to this:
TKLIBS=-L/usr/lib -ltk8.5 -ltcl8.5
 
# Modify the following line so that gcc can find the tcl.h and tk.h
# header files on your system. Comment this out if you don't have
# Tcl/Tk.
 
TKINC=-isystem /usr/include/tcl8.5
 
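Finally, run make clean; make again and the build should succeed.

If the same problem shows up later, apply the same fix. One of the later parts requires removing the GUI line from its Makefile.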

TESTA

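Write Y86-64 assembly by hand. The functions to implement are in examples.c. I had hoped to be lazy and convert the compiler's disassembly into Y86-64, but the disassembled code turned out to be more trouble, so writing by hand it is.

I modeled the code on the large worked example in Chapter 4 of the book.

Create a new file: vim sum_list.ys

All three programs leave their result in %rax. If the simulator reports no change to %rax, the code has a bug. In all three cases %rax should end up as 0xcba.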
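As a cross-check on the expected value, the three functions can be modeled in a few lines of Python. This is only a sketch: the Python names and the list-of-pairs representation are my own stand-ins for the linked list and arrays in the .ys files, not part of the lab.

```python
# Hypothetical Python model of the three Part A functions.
# Each list node is (value, index_of_next_node or None).
NODES = [(0x00a, 1), (0x0b0, 2), (0xc00, None)]


def sum_list(nodes, head):
    """Iterative sum of the list values, like sum_list.ys."""
    total = 0
    cur = head
    while cur is not None:
        val, cur = nodes[cur]
        total += val
    return total


def rsum_list(nodes, head):
    """Recursive sum, like the push/call/pop version."""
    if head is None:
        return 0
    val, nxt = nodes[head]
    return val + rsum_list(nodes, nxt)


def copy_block(src, dest, n):
    """Copy n words from src to dest and return the XOR of the copied words."""
    result = 0
    for i in range(n):
        dest[i] = src[i]
        result ^= src[i]
    return result


if __name__ == "__main__":
    dest = [0x111, 0x222, 0x333]
    print(hex(sum_list(NODES, 0)))
    print(hex(rsum_list(NODES, 0)))
    print(hex(copy_block([0x00a, 0x0b0, 0xc00], dest, 3)))
```

The sum and the XOR agree here only because the three sample values have no bits in common.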

The commands to assemble and run:
unix > ./yas sum_list.ys
unix > ./yis sum_list.yo

# sum_list.ys: iterative sum_list from examples.c
# Execution begins at address 0
	.pos 0
	irmovq stack, %rsp
	call main
	halt
# Sample linked list
	.align 8
ele1:
	.quad 0x00a
	.quad ele2
ele2:
	.quad 0x0b0
	.quad ele3
ele3:
	.quad 0xc00
	.quad 0
main:
	irmovq ele1,%rdi
	call sum_list
	ret
sum_list:
	xorq %rax,%rax		# rax = 0
	jmp test
loop:
	mrmovq (%rdi),%r10	# val = ls->val
	addq %r10,%rax		# sum += val
	mrmovq 8(%rdi),%rdi	# ls = ls->next
test:
	andq %rdi,%rdi		# ls != NULL?
	jne loop
	ret
# Stack starts here and grows to lower addresses
	.pos 0x100
stack:

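The recursive version is written directly: push the current value onto the stack, recurse on the rest of the list, then pop the value and add it to the result.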

# rsum_list.ys: recursive list sum from examples.c (the label is still named sum_list)
# Execution begins at address 0
	.pos 0
	irmovq stack, %rsp
	call main
	halt
# Sample linked list
	.align 8
        ele1:
        .quad 0x00a
        .quad ele2
        ele2:
        .quad 0x0b0
        .quad ele3
        ele3:
        .quad 0xc00
        .quad 0
main:
	irmovq ele1,%rdi
	call sum_list
	ret
sum_list:
	xorq %rax,%rax #rax=0
	andq %rdi,%rdi	
	je return 
	mrmovq (%rdi),%r10 # long val = ls->val
	pushq %r10
	mrmovq 8(%rdi),%rdi		
	call sum_list
	popq %rbx
	addq %rbx,%rax
	ret 
return:
	ret	
#Stack starts here and grows to lower addresses
	.pos 0x1000
stack:							

 

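The third program, copy_block, copies src to dest and returns the XOR of the copied words:

# copy_block.ys: copy_block from examples.c, written inline in main
# Execution begins at address 0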
	.pos 0
	irmovq stack, %rsp
	call main
	halt
.align 8
#Source block
src:
	.quad 0x00a
	.quad 0x0b0
	.quad 0xc00
# Destination block
dest:
	.quad 0x111
	.quad 0x222
	.quad 0x333

main:
	xorq %rax,%rax #long result=0
	irmovq src,%rdi
	irmovq dest,%rsi	
	irmovq $1,%r9
	irmovq $3,%r8
	irmovq $8,%r11
	andq %r8,%r8
	jmp test
loop:
	mrmovq (%rdi),%rcx
	addq %r11,%rdi 
	
	rmmovq %rcx,(%rsi)
	addq %r11,%rsi
	
	xorq %rcx,%rax
	subq %r9,%r8	
test:	
	jne loop	
	ret
#Stack starts here and grows to lower addresses
	.pos 0x100
stack:							

TESTB

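Following the stage-by-stage description in Chapter 4, derive iaddq by combining the stage tables for OPq and irmovq.

The resulting stage table for iaddq is: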

Stage       iaddq V, rB
Fetch       icode:ifun <-- M1[PC]
            rA:rB <-- M1[PC+1]
            valC <-- M8[PC+2]
            valP <-- PC+10
Decode      valB <-- R[rB]
Execute     valE <-- valB + valC
            Set CC
Memory      (none)
Write back  R[rB] <-- valE
PC update   PC <-- valP
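The stage table can be read as a small function. The sketch below models it in Python, assuming registers are held in a dict of unsigned 64-bit values; the function and variable names are illustrative and do not come from the simulator source.

```python
# Sketch of iaddq V, rB following the stage table.
# Registers are modeled as a dict of unsigned 64-bit values (an assumption).
MASK = (1 << 64) - 1


def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement integer."""
    return x - (1 << 64) if x & (1 << 63) else x


def iaddq(regs, pc, val_c, r_b):
    """Execute iaddq val_c, r_b; update regs in place and return (new_pc, cc)."""
    val_p = pc + 10                      # Fetch: 1 opcode + 1 register + 8 constant bytes
    val_b = regs[r_b]                    # Decode
    c = val_c & MASK
    val_e = (val_b + c) & MASK           # Execute
    sa, sb, se = to_signed(c), to_signed(val_b), to_signed(val_e)
    cc = {
        "ZF": val_e == 0,
        "SF": se < 0,
        "OF": (sa < 0) == (sb < 0) and (se < 0) != (sa < 0),
    }
    regs[r_b] = val_e                    # Write back (the memory stage does nothing)
    return val_p, cc                     # PC update
```

For example, with %rdx holding 5, iaddq $-6,%rdx leaves -1 in %rdx and sets SF, which is what a following jl tests.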

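Add IIADDQ to sim/seq/seq-full.hcl. For each control signal, use the stage table and the book to decide whether iaddq belongs in that signal's instruction set.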

#/* $begin seq-all-hcl */
####################################################################
#  HCL Description of Control for Single Cycle Y86-64 Processor SEQ   #
#  Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2010       #
####################################################################

## Your task is to implement the iaddq instruction
## The file contains a declaration of the icodes
## for iaddq (IIADDQ)
## Your job is to add the rest of the logic to make it work

####################################################################
#    C Include's.  Don't alter these                               #
####################################################################

quote '#include <stdio.h>'
quote '#include "isa.h"'
quote '#include "sim.h"'
quote 'int sim_main(int argc, char *argv[]);'
quote 'word_t gen_pc(){return 0;}'
quote 'int main(int argc, char *argv[])'
quote '  {plusmode=0;return sim_main(argc,argv);}'

####################################################################
#    Declarations.  Do not change/remove/delete any of these       #
####################################################################

##### Symbolic representation of Y86-64 Instruction Codes #############
wordsig INOP 	'I_NOP'
wordsig IHALT	'I_HALT'
wordsig IRRMOVQ	'I_RRMOVQ'
wordsig IIRMOVQ	'I_IRMOVQ'
wordsig IRMMOVQ	'I_RMMOVQ'
wordsig IMRMOVQ	'I_MRMOVQ'
wordsig IOPQ	'I_ALU'
wordsig IJXX	'I_JMP'
wordsig ICALL	'I_CALL'
wordsig IRET	'I_RET'
wordsig IPUSHQ	'I_PUSHQ'
wordsig IPOPQ	'I_POPQ'
# Instruction code for iaddq instruction
wordsig IIADDQ	'I_IADDQ'

##### Symbolic represenations of Y86-64 function codes                  #####
wordsig FNONE    'F_NONE'        # Default function code

##### Symbolic representation of Y86-64 Registers referenced explicitly #####
wordsig RRSP     'REG_RSP'    	# Stack Pointer
wordsig RNONE    'REG_NONE'   	# Special value indicating "no register"

##### ALU Functions referenced explicitly                            #####
wordsig ALUADD	'A_ADD'		# ALU should add its arguments

##### Possible instruction status values                             #####
wordsig SAOK	'STAT_AOK'	# Normal execution
wordsig SADR	'STAT_ADR'	# Invalid memory address
wordsig SINS	'STAT_INS'	# Invalid instruction
wordsig SHLT	'STAT_HLT'	# Halt instruction encountered

##### Signals that can be referenced by control logic ####################

##### Fetch stage inputs		#####
wordsig pc 'pc'				# Program counter
##### Fetch stage computations		#####
wordsig imem_icode 'imem_icode'		# icode field from instruction memory
wordsig imem_ifun  'imem_ifun' 		# ifun field from instruction memory
wordsig icode	  'icode'		# Instruction control code
wordsig ifun	  'ifun'		# Instruction function
wordsig rA	  'ra'			# rA field from instruction
wordsig rB	  'rb'			# rB field from instruction
wordsig valC	  'valc'		# Constant from instruction
wordsig valP	  'valp'		# Address of following instruction
boolsig imem_error 'imem_error'		# Error signal from instruction memory
boolsig instr_valid 'instr_valid'	# Is fetched instruction valid?

##### Decode stage computations		#####
wordsig valA	'vala'			# Value from register A port
wordsig valB	'valb'			# Value from register B port

##### Execute stage computations	#####
wordsig valE	'vale'			# Value computed by ALU
boolsig Cnd	'cond'			# Branch test

##### Memory stage computations		#####
wordsig valM	'valm'			# Value read from memory
boolsig dmem_error 'dmem_error'		# Error signal from data memory


####################################################################
#    Control Signal Definitions.                                   #
####################################################################

################ Fetch Stage     ###################################

# Determine instruction code
word icode = [
	imem_error: INOP;
	1: imem_icode;		# Default: get from instruction memory
];

# Determine instruction function
word ifun = [
	imem_error: FNONE;
	1: imem_ifun;		# Default: get from instruction memory
];

bool instr_valid = icode in 
	{ INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
	       IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ ,IIADDQ };

# Does fetched instruction require a regid byte?
bool need_regids =
	icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ, 
		     IIRMOVQ, IRMMOVQ, IMRMOVQ,IIADDQ };

# Does fetched instruction require a constant word?
bool need_valC =
	icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL,IIADDQ };

################ Decode Stage    ###################################

## What register should be used as the A source?
word srcA = [
	icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ  } : rA;
	icode in { IPOPQ, IRET } : RRSP;
	1 : RNONE; # Don't need register
];

## What register should be used as the B source?
word srcB = [
	icode in { IOPQ, IRMMOVQ, IMRMOVQ,IIADDQ  } : rB;
	icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
	1 : RNONE;  # Don't need register
];

## What register should be used as the E destination?
word dstE = [
	icode in { IRRMOVQ } && Cnd : rB;
	icode in { IIRMOVQ, IOPQ,IIADDQ } : rB;
	icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
	1 : RNONE;  # Don't write any register
];

## What register should be used as the M destination?
word dstM = [
	icode in { IMRMOVQ, IPOPQ } : rA;
	1 : RNONE;  # Don't write any register
];

################ Execute Stage   ###################################

## Select input A to ALU
word aluA = [
	icode in { IRRMOVQ, IOPQ } : valA;
	icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ,IIADDQ } : valC;
	icode in { ICALL, IPUSHQ } : -8;
	icode in { IRET, IPOPQ } : 8;
	# Other instructions don't need ALU
];

## Select input B to ALU
word aluB = [
	icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL, 
		      IPUSHQ, IRET, IPOPQ,IIADDQ } : valB;
	icode in { IRRMOVQ, IIRMOVQ } : 0;
	# Other instructions don't need ALU
];

## Set the ALU function
word alufun = [
	icode == IOPQ : ifun;
	1 : ALUADD;
];

## Should the condition codes be updated?
bool set_cc = icode in { IOPQ,IIADDQ };

################ Memory Stage    ###################################

## Set read control signal
bool mem_read = icode in { IMRMOVQ, IPOPQ, IRET };

## Set write control signal
bool mem_write = icode in { IRMMOVQ, IPUSHQ, ICALL };

## Select memory address
word mem_addr = [
	icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : valE;
	icode in { IPOPQ, IRET } : valA;
	# Other instructions don't need address
];

## Select memory input data
word mem_data = [
	# Value from register
	icode in { IRMMOVQ, IPUSHQ } : valA;
	# Return PC
	icode == ICALL : valP;
	# Default: Don't write anything
];

## Determine instruction status
word Stat = [
	imem_error || dmem_error : SADR;
	!instr_valid: SINS;
	icode == IHALT : SHLT;
	1 : SAOK;
];

################ Program Counter Update ############################

## What address should instruction be fetched at

word new_pc = [
	# Call.  Use instruction constant
	icode == ICALL : valC;
	# Taken branch.  Use instruction constant
	icode == IJXX && Cnd : valC;
	# Completion of RET instruction.  Use value from stack
	icode == IRET : valM;
	# Default: Use incremented PC
	1 : valP;
];
#/* $end seq-all-hcl */

 TESTC

This last part was a bit exasperating. First, carry the iaddq instruction from Part B over to this part's HCL by editing pipe-full.hcl.

#/* $begin pipe-all-hcl */
####################################################################
#    HCL Description of Control for Pipelined Y86-64 Processor     #
#    Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2014     #
####################################################################

## Your task is to implement the iaddq instruction
## The file contains a declaration of the icodes
## for iaddq (IIADDQ)
## Your job is to add the rest of the logic to make it work

####################################################################
#    C Include's.  Don't alter these                               #
####################################################################

quote '#include <stdio.h>'
quote '#include "isa.h"'
quote '#include "pipeline.h"'
quote '#include "stages.h"'
quote '#include "sim.h"'
quote 'int sim_main(int argc, char *argv[]);'
quote 'int main(int argc, char *argv[]){return sim_main(argc,argv);}'

####################################################################
#    Declarations.  Do not change/remove/delete any of these       #
####################################################################

##### Symbolic representation of Y86-64 Instruction Codes #############
wordsig INOP 	'I_NOP'
wordsig IHALT	'I_HALT'
wordsig IRRMOVQ	'I_RRMOVQ'
wordsig IIRMOVQ	'I_IRMOVQ'
wordsig IRMMOVQ	'I_RMMOVQ'
wordsig IMRMOVQ	'I_MRMOVQ'
wordsig IOPQ	'I_ALU'
wordsig IJXX	'I_JMP'
wordsig ICALL	'I_CALL'
wordsig IRET	'I_RET'
wordsig IPUSHQ	'I_PUSHQ'
wordsig IPOPQ	'I_POPQ'
# Instruction code for iaddq instruction
wordsig IIADDQ	'I_IADDQ'

##### Symbolic represenations of Y86-64 function codes            #####
wordsig FNONE    'F_NONE'        # Default function code

##### Symbolic representation of Y86-64 Registers referenced      #####
wordsig RRSP     'REG_RSP'    	     # Stack Pointer
wordsig RNONE    'REG_NONE'   	     # Special value indicating "no register"

##### ALU Functions referenced explicitly ##########################
wordsig ALUADD	'A_ADD'		     # ALU should add its arguments

##### Possible instruction status values                       #####
wordsig SBUB	'STAT_BUB'	# Bubble in stage
wordsig SAOK	'STAT_AOK'	# Normal execution
wordsig SADR	'STAT_ADR'	# Invalid memory address
wordsig SINS	'STAT_INS'	# Invalid instruction
wordsig SHLT	'STAT_HLT'	# Halt instruction encountered

##### Signals that can be referenced by control logic ##############

##### Pipeline Register F ##########################################

wordsig F_predPC 'pc_curr->pc'	     # Predicted value of PC

##### Intermediate Values in Fetch Stage ###########################

wordsig imem_icode  'imem_icode'      # icode field from instruction memory
wordsig imem_ifun   'imem_ifun'       # ifun  field from instruction memory
wordsig f_icode	'if_id_next->icode'  # (Possibly modified) instruction code
wordsig f_ifun	'if_id_next->ifun'   # Fetched instruction function
wordsig f_valC	'if_id_next->valc'   # Constant data of fetched instruction
wordsig f_valP	'if_id_next->valp'   # Address of following instruction
boolsig imem_error 'imem_error'	     # Error signal from instruction memory
boolsig instr_valid 'instr_valid'    # Is fetched instruction valid?

##### Pipeline Register D ##########################################
wordsig D_icode 'if_id_curr->icode'   # Instruction code
wordsig D_rA 'if_id_curr->ra'	     # rA field from instruction
wordsig D_rB 'if_id_curr->rb'	     # rB field from instruction
wordsig D_valP 'if_id_curr->valp'     # Incremented PC

##### Intermediate Values in Decode Stage  #########################

wordsig d_srcA	 'id_ex_next->srca'  # srcA from decoded instruction
wordsig d_srcB	 'id_ex_next->srcb'  # srcB from decoded instruction
wordsig d_rvalA 'd_regvala'	     # valA read from register file
wordsig d_rvalB 'd_regvalb'	     # valB read from register file

##### Pipeline Register E ##########################################
wordsig E_icode 'id_ex_curr->icode'   # Instruction code
wordsig E_ifun  'id_ex_curr->ifun'    # Instruction function
wordsig E_valC  'id_ex_curr->valc'    # Constant data
wordsig E_srcA  'id_ex_curr->srca'    # Source A register ID
wordsig E_valA  'id_ex_curr->vala'    # Source A value
wordsig E_srcB  'id_ex_curr->srcb'    # Source B register ID
wordsig E_valB  'id_ex_curr->valb'    # Source B value
wordsig E_dstE 'id_ex_curr->deste'    # Destination E register ID
wordsig E_dstM 'id_ex_curr->destm'    # Destination M register ID

##### Intermediate Values in Execute Stage #########################
wordsig e_valE 'ex_mem_next->vale'	# valE generated by ALU
boolsig e_Cnd 'ex_mem_next->takebranch' # Does condition hold?
wordsig e_dstE 'ex_mem_next->deste'      # dstE (possibly modified to be RNONE)

##### Pipeline Register M                  #########################
wordsig M_stat 'ex_mem_curr->status'     # Instruction status
wordsig M_icode 'ex_mem_curr->icode'	# Instruction code
wordsig M_ifun  'ex_mem_curr->ifun'	# Instruction function
wordsig M_valA  'ex_mem_curr->vala'      # Source A value
wordsig M_dstE 'ex_mem_curr->deste'	# Destination E register ID
wordsig M_valE  'ex_mem_curr->vale'      # ALU E value
wordsig M_dstM 'ex_mem_curr->destm'	# Destination M register ID
boolsig M_Cnd 'ex_mem_curr->takebranch'	# Condition flag
boolsig dmem_error 'dmem_error'	        # Error signal from instruction memory

##### Intermediate Values in Memory Stage ##########################
wordsig m_valM 'mem_wb_next->valm'	# valM generated by memory
wordsig m_stat 'mem_wb_next->status'	# stat (possibly modified to be SADR)

##### Pipeline Register W ##########################################
wordsig W_stat 'mem_wb_curr->status'     # Instruction status
wordsig W_icode 'mem_wb_curr->icode'	# Instruction code
wordsig W_dstE 'mem_wb_curr->deste'	# Destination E register ID
wordsig W_valE  'mem_wb_curr->vale'      # ALU E value
wordsig W_dstM 'mem_wb_curr->destm'	# Destination M register ID
wordsig W_valM  'mem_wb_curr->valm'	# Memory M value

####################################################################
#    Control Signal Definitions.                                   #
####################################################################

################ Fetch Stage     ###################################

## What address should instruction be fetched at
word f_pc = [
	# Mispredicted branch.  Fetch at incremented PC
	M_icode == IJXX && !M_Cnd : M_valA;
	# Completion of RET instruction
	W_icode == IRET : W_valM;
	# Default: Use predicted value of PC
	1 : F_predPC;
];

## Determine icode of fetched instruction
word f_icode = [
	imem_error : INOP;
	1: imem_icode;
];

# Determine ifun
word f_ifun = [
	imem_error : FNONE;
	1: imem_ifun;
];

# Is instruction valid?
bool instr_valid = f_icode in 
	{ INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
	  IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ,IIADDQ };

# Determine status code for fetched instruction
word f_stat = [
	imem_error: SADR;
	!instr_valid : SINS;
	f_icode == IHALT : SHLT;
	1 : SAOK;
];

# Does fetched instruction require a regid byte?
bool need_regids =
	f_icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ, 
		     IIRMOVQ, IRMMOVQ, IMRMOVQ,IIADDQ };
# Does fetched instruction require a constant word? 
bool need_valC =
	f_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL,IIADDQ };

# Predict next value of PC
word f_predPC = [
	f_icode in { IJXX, ICALL } : f_valC;
	1 : f_valP;
];

################ Decode Stage ######################################


## What register should be used as the A source?
word d_srcA = [
	D_icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ  } : D_rA;
	D_icode in { IPOPQ, IRET } : RRSP;
	1 : RNONE; # Don't need register
];

## What register should be used as the B source?
word d_srcB = [
	D_icode in { IOPQ, IRMMOVQ, IMRMOVQ,IIADDQ  } : D_rB;
	D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
	1 : RNONE;  # Don't need register
];

## What register should be used as the E destination?
word d_dstE = [
	D_icode in { IRRMOVQ, IIRMOVQ, IOPQ,IIADDQ} : D_rB;
	D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
	1 : RNONE;  # Don't write any register
];

## What register should be used as the M destination?
word d_dstM = [
	D_icode in { IMRMOVQ, IPOPQ } : D_rA;
	1 : RNONE;  # Don't write any register
];

## What should be the A value?
## Forward into decode stage for valA
word d_valA = [
	D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
	d_srcA == e_dstE : e_valE;    # Forward valE from execute
	d_srcA == M_dstM : m_valM;    # Forward valM from memory
	d_srcA == M_dstE : M_valE;    # Forward valE from memory
	d_srcA == W_dstM : W_valM;    # Forward valM from write back
	d_srcA == W_dstE : W_valE;    # Forward valE from write back
	1 : d_rvalA;  # Use value read from register file
];

word d_valB = [
	d_srcB == e_dstE : e_valE;    # Forward valE from execute
	d_srcB == M_dstM : m_valM;    # Forward valM from memory
	d_srcB == M_dstE : M_valE;    # Forward valE from memory
	d_srcB == W_dstM : W_valM;    # Forward valM from write back
	d_srcB == W_dstE : W_valE;    # Forward valE from write back
	1 : d_rvalB;  # Use value read from register file
];

################ Execute Stage #####################################

## Select input A to ALU
word aluA = [
	E_icode in { IRRMOVQ, IOPQ } : E_valA;
	E_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ,IIADDQ } : E_valC;
	E_icode in { ICALL, IPUSHQ } : -8;
	E_icode in { IRET, IPOPQ } : 8;
	# Other instructions don't need ALU
];

## Select input B to ALU
word aluB = [
	E_icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL, 
		     IPUSHQ, IRET, IPOPQ,IIADDQ } : E_valB;
	E_icode in { IRRMOVQ, IIRMOVQ } : 0;
	# Other instructions don't need ALU
];

## Set the ALU function
word alufun = [
	E_icode == IOPQ : E_ifun;
	1 : ALUADD;
];

## Should the condition codes be updated?
bool set_cc = E_icode in {IIADDQ,IOPQ} &&
	# State changes only during normal operation
	!m_stat in { SADR, SINS, SHLT } && !W_stat in { SADR, SINS, SHLT };

## Generate valA in execute stage
word e_valA = E_valA;    # Pass valA through stage

## Set dstE to RNONE in event of not-taken conditional move
word e_dstE = [
	E_icode == IRRMOVQ && !e_Cnd : RNONE;
	1 : E_dstE;
];

################ Memory Stage ######################################

## Select memory address
word mem_addr = [
	M_icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : M_valE;
	M_icode in { IPOPQ, IRET } : M_valA;
	# Other instructions don't need address
];

## Set read control signal
bool mem_read = M_icode in { IMRMOVQ, IPOPQ, IRET };

## Set write control signal
bool mem_write = M_icode in { IRMMOVQ, IPUSHQ, ICALL };

#/* $begin pipe-m_stat-hcl */
## Update the status
word m_stat = [
	dmem_error : SADR;
	1 : M_stat;
];
#/* $end pipe-m_stat-hcl */

## Set E port register ID
word w_dstE = W_dstE;

## Set E port value
word w_valE = W_valE;

## Set M port register ID
word w_dstM = W_dstM;

## Set M port value
word w_valM = W_valM;

## Update processor status
word Stat = [
	W_stat == SBUB : SAOK;
	1 : W_stat;
];

################ Pipeline Register Control #########################

# Should I stall or inject a bubble into Pipeline Register F?
# At most one of these can be true.
bool F_bubble = 0;
bool F_stall =
	# Conditions for a load/use hazard
	E_icode in { IMRMOVQ, IPOPQ } &&
	 E_dstM in { d_srcA, d_srcB } ||
	# Stalling at fetch while ret passes through pipeline
	IRET in { D_icode, E_icode, M_icode };

# Should I stall or inject a bubble into Pipeline Register D?
# At most one of these can be true.
bool D_stall = 
	# Conditions for a load/use hazard
	E_icode in { IMRMOVQ, IPOPQ } &&
	 E_dstM in { d_srcA, d_srcB };

bool D_bubble =
	# Mispredicted branch
	(E_icode == IJXX && !e_Cnd) ||
	# Stalling at fetch while ret passes through pipeline
	# but not condition for a load/use hazard
	!(E_icode in { IMRMOVQ, IPOPQ } && E_dstM in { d_srcA, d_srcB }) &&
	  IRET in { D_icode, E_icode, M_icode };

# Should I stall or inject a bubble into Pipeline Register E?
# At most one of these can be true.
bool E_stall = 0;
bool E_bubble =
	# Mispredicted branch
	(E_icode == IJXX && !e_Cnd) ||
	# Conditions for a load/use hazard
	E_icode in { IMRMOVQ, IPOPQ } &&
	 E_dstM in { d_srcA, d_srcB};

# Should I stall or inject a bubble into Pipeline Register M?
# At most one of these can be true.
bool M_stall = 0;
# Start injecting bubbles as soon as exception passes through memory stage
bool M_bubble = m_stat in { SADR, SINS, SHLT } || W_stat in { SADR, SINS, SHLT };

# Should I stall or inject a bubble into Pipeline Register W?
bool W_stall = W_stat in { SADR, SINS, SHLT };
bool W_bubble = 0;
#/* $end pipe-all-hcl */

Build and test:

make VERSION=full
./correctness.pl # checks that the results are correct
./benchmark.pl # reports the score
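benchmark.pl reports an average CPE and converts it to a score. The mapping below is how I remember it from the lab handout: 0 points above a CPE of 10.5, 60 points below 7.5, linear in between. Treat the constants as an assumption and check archlab.pdf; the function names are my own.

```python
def score_from_cpe(cpe):
    """Map average CPE to a Part C score (constants assumed from the handout)."""
    if cpe > 10.5:
        return 0.0
    if cpe < 7.5:
        return 60.0
    return 20.0 * (10.5 - cpe)


def cpe_from_score(score):
    """Invert the linear part of the mapping, valid for 0 < score < 60."""
    return 10.5 - score / 20.0
```

Under that assumption, a score of 40 corresponds to a CPE of about 8.5, and 47.3 to about 8.1.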

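My first attempt was six-way loop unrolling, with the conditional jumps replaced by conditional moves.

After testing, it scored a grand total of 0 points.
The reason is that the conditional-move version needs more instructions.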
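A rough per-element cycle count shows the difference. This sketch assumes the textbook PIPE behavior (conditional jumps are predicted taken, and a misprediction costs 2 bubble cycles) and counts only the instructions that update the count. The function names and the fraction of positive elements are illustrative, not measured.

```python
MISPREDICT_PENALTY = 2  # bubbles after a mispredicted jump in the textbook PIPE


def cmov_cycles():
    """rrmovq, iaddq, andq, cmovle: always four instructions per element."""
    return 4


def jump_cycles(p_positive):
    """andq, jle, and iaddq only when the element is positive.

    jle is predicted taken, so a positive element (jle not taken)
    pays the misprediction penalty.
    """
    not_positive = 2                           # andq + jle (taken, predicted correctly)
    positive = 2 + MISPREDICT_PENALTY + 1      # andq + jle + bubbles + iaddq
    return p_positive * positive + (1 - p_positive) * not_positive
```

With half the elements positive, the jump version averages 3.5 cycles against 4 for the conditional move. The 0-point code also pays for a jmp and an iaddq $8,%rsi per element on top of that.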

The 0-point code:

#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src, %rsi = dst, %rdx = len
ncopy:

##################################################################
# You can modify this portion
	# Loop header
	xorq %rax,%rax		# count = 0;
Loop:
	iaddq $-6,%rdx
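	jl Remain		# fewer than 6 elements left: go to the special case; otherwise run the unrolled body
	iaddq $6,%rdx		# restore the length; it is subtracted again at the end of the body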
	
	mrmovq (%rdi),%r8    		
	mrmovq 8(%rdi),%r9   	

        	
	

	 rrmovq %rax,%r13 
        	 iaddq $1,%rax	 
       	 andq %r8,%r8
       	 cmovle %r13,%rax
       	 rmmovq %r8,(%rsi)

       	 iaddq $8,%rsi	
      	  jmp S2	
S2:
       
       	 rrmovq %rax,%r13
        	iaddq $1,%rax
        	andq %r9,%r9
        	cmovle %r13,%rax
        	rmmovq %r9,(%rsi)
        	iaddq $8,%rsi
        	jmp S3		
S3:
	mrmovq 16(%rdi),%r10
	mrmovq 24(%rdi),%r11
	rrmovq %rax,%r13
        	iaddq $1,%rax
        	andq %r10,%r10
        	cmovle %r13,%rax
        	rmmovq %r10,(%rsi)
	iaddq $8,%rsi
        	jmp S4		
S4:
	rrmovq %rax,%r13
        	iaddq $1,%rax
        	andq %r11,%r11
        	cmovle %r13,%rax
        	rmmovq %r11,(%rsi)
        	iaddq $8,%rsi
        	jmp S5		
S5:
	mrmovq 32(%rdi),%r12
	mrmovq 40(%rdi),%r14

	rrmovq %rax,%r13
        	iaddq $1,%rax
        	andq %r12,%r12
       	cmovle %r13,%rax
        	rmmovq %r12,(%rsi)
        	iaddq $8,%rsi
        	jmp S6		
S6:
	rrmovq %rax,%r13
        	iaddq $1,%rax
        	andq %r14,%r14
        	cmovle %r13,%rax
        	rmmovq %r14,(%rsi)
	
	iaddq $8,%rsi
	iaddq $-6,%rdx
	iaddq $48,%rdi
	jmp Loop
#####################################################################
Solveremain:
	mrmovq (%rdi),%r8	
	mrmovq 8(%rdi),%r9

	rrmovq %rax,%r13	
	iaddq $1,%rax		# conditional move
	andq %r8,%r8
	cmovle %r13,%rax
	rmmovq %r8,(%rsi)	

	iaddq $8,%rsi	
	jmp Solver1	
Solver1:
	iaddq $-1,%rdx
	jl Done
	rrmovq %rax,%r13
	iaddq $1,%rax
	andq %r9,%r9
	cmovle %r13,%rax
	rmmovq %r9,(%rsi)
	
	iaddq $8,%rsi
        	jmp Solver2		
Solver2:
	mrmovq 16(%rdi),%r10
	mrmovq 24(%rdi),%r11
	
	iaddq $-1,%rdx
	jl Done
	rrmovq %rax,%r13
	iaddq $1,%rax
	andq %r10,%r10
	cmovle %r13,%rax
	rmmovq %r10,(%rsi)
	
	iaddq $8,%rsi
	jmp Solver3
Solver3:
	iaddq $-1,%rdx
	jl Done
	rrmovq %rax,%r13
	iaddq $1,%rax
	andq %r11,%r11
	cmovle %r13,%rax
	rmmovq %r11,(%rsi)
	
	iaddq $8,%rsi
	jmp Solver4
Solver4:
	mrmovq 32(%rdi),%r12	
	iaddq $-1,%rdx
	jl Done
	rrmovq %rax,%r13
	iaddq $1,%rax
	andq %r12,%r12
	cmovle %r13,%rax
	rmmovq %r12,(%rsi)
	iaddq $8,%rsi
	jmp Done 
Remain:
	iaddq $5,%rdx		# negative here means len was 0; otherwise %rdx holds the last index, 0..4
	jl Done
	jmp Solveremain		#跳转到处理剩余函数的部分
Done:
	ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */

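After going out for a meal, I came back and read other people's blog posts, which gave me the idea: unroll six ways directly. While at least 6 elements remain, keep running the unrolled loop, and inside it a plain conditional jump per element is enough. For the fewer-than-6 leftover elements, split the cases in half (at most 3, or more than 3) and handle each directly. The leftover part is not handled well enough, so this only scored 40 points.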
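The control flow of this approach can be sketched in Python and checked against a straightforward copy. This models only the structure (a block loop plus a remainder), not the cycle cost. The function names and the `ways` parameter are illustrative.

```python
def ncopy_reference(src):
    """Baseline: copy every word and count the positive ones."""
    dst = list(src)
    return dst, sum(1 for v in src if v > 0)


def ncopy_unrolled(src, ways=6):
    """Copy in blocks of `ways`, then handle the leftover len % ways words."""
    n = len(src)
    dst = [0] * n
    count = 0
    i = 0
    while n - i >= ways:                 # main loop: one length check per block
        for k in range(ways):            # written out by hand in the .ys file
            v = src[i + k]
            dst[i + k] = v
            if v > 0:
                count += 1
        i += ways
    while i < n:                         # remainder: fewer than `ways` words
        v = src[i]
        dst[i] = v
        if v > 0:
            count += 1
        i += 1
    return dst, count
```

The gain comes from the main loop: the length update and loop jump are paid once per six elements instead of once per element.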

##################################################################
# You can modify this portion
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src , %rsi = dst, %rdx = len
ncopy:

##################################################################
# You can modify this portion
	xorq %rax,%rax
	jmp StartLoop1

Loop6:

	mrmovq (%rdi),%r8
	mrmovq 8(%rdi),%r9
	mrmovq 16(%rdi),%r10
	mrmovq 24(%rdi),%r11
	mrmovq 32(%rdi),%r12
	mrmovq 40(%rdi),%r13

	rmmovq %r8,(%rsi)

	andq %r8,%r8
	jle L61
	iaddq $1,%rax
L61:	
	rmmovq %r9,8(%rsi)
	andq %r9,%r9
	jle L62
	iaddq $1,%rax
L62:
	rmmovq %r10,16(%rsi)
	andq %r10,%r10
	jle L63
	iaddq $1,%rax
L63:	
	rmmovq %r11,24(%rsi)
	andq %r11,%r11
	jle L64
	iaddq $1,%rax
L64:
	rmmovq %r12,32(%rsi)
	andq %r12,%r12
	jle L65
	iaddq $1,%rax
L65:	
	rmmovq %r13,40(%rsi)
	andq %r13,%r13
	jle L66
	iaddq $1,%rax
L66:
	iaddq $48,%rdi
	iaddq $48,%rsi
StartLoop1:
	iaddq $-6,%rdx
	jge Loop6
	
	iaddq $6,%rdx
	jmp StartLoop2
Loop2:
	iaddq $3,%rdx
	iaddq $-1,%rdx
	jl  Done
	rmmovq %r8,(%rsi)
	andq %r8,%r8
	jle L21
	iaddq $1,%rax
L21:	
	iaddq $-1,%rdx
	jl Done
	rmmovq %r9,8(%rsi)
	andq %r9,%r9
	jle L22
	iaddq $1,%rax
L22:
	iaddq $-1,%rdx
	jl Done
	rmmovq %r10,16(%rsi)
	andq %r10,%r10
	jle Done
	iaddq $1,%rax
	jmp Done

Loop3:
	iaddq $-1,%rdx
	rmmovq %r8,(%rsi)
	andq %r8,%r8
	jle L31
	iaddq $1,%rax
L31:
	iaddq $-1,%rdx
	rmmovq %r9,8(%rsi)
	andq %r9,%r9
	jle L32
	iaddq $1,%rax
L32:
	iaddq $-1,%rdx
	rmmovq %r10,16(%rsi)
	andq %r10,%r10
	jle L33
	iaddq $1,%rax
L33:
	iaddq $-1,%rdx
	jl Done
	rmmovq %r11,24(%rsi)
	andq %r11,%r11
	jle L34
	iaddq $1,%rax
L34:
	iaddq $-1,%rdx
	jl Done
	rmmovq %r12,32(%rsi)
	andq %r12,%r12
	jle L35
	iaddq $1,%rax
L35:
	iaddq $-1,%rdx
	jl Done
	rmmovq %r13,40(%rsi)
	andq %r13,%r13
	jle Done
	iaddq $1,%rax
	jmp Done
StartLoop2:
	
	mrmovq (%rdi),%r8
	mrmovq 8(%rdi),%r9
	mrmovq 16(%rdi),%r10
	
	iaddq $-3,%rdx
	jle Loop2 
	iaddq $3,%rdx	
	mrmovq 24(%rdi),%r11
	mrmovq 32(%rdi),%r12
	jmp Loop3
##################################################################
# Do not modify the following section of code
# Function epilogue.
Done:
	ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */

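I then went back and studied other people's blog posts again.

Per CSAPP Section 4.5.8, the pipeline costs to optimize around are:

  • Load/use hazard: between an instruction that reads a value from memory and an instruction that uses that value, the pipeline must stall for one cycle.
  • Mispredicted branch: by the time the branch logic discovers that the branch should not have been taken, several instructions at the branch target have already entered the pipeline. They must be cancelled, and fetching restarts at the instruction after the jump. This can be addressed by redesigning the hardware's prediction logic, or by writing code that plays to the existing prediction logic.

Then there is loop unrolling and increased parallelism from CSAPP Chapter 5. (In my view, these two are essentially the only things left to optimize in this task.)

So if rmmovq %r8,(%rsi) is placed directly after mrmovq (%rdi),%r8, there is a load/use hazard. Putting another useful instruction between them avoids the stall: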

    mrmovq (%rdi),%r8
	mrmovq 8(%rdi),%r9
	
	rmmovq %r8,(%rsi)	
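To make the effect of this reordering concrete, here is a small Python sketch that counts load/use stalls in a straight-line instruction sequence. It assumes the textbook rule (one stall when an instruction reads the register written by the load immediately before it). The tuple encoding of instructions is my own.

```python
LOADS = {"mrmovq", "popq"}


def count_load_use_stalls(program):
    """Count one-cycle stalls in a straight-line sequence.

    Each instruction is (opcode, destination_register, source_registers).
    A stall is charged when an instruction reads the register that the
    immediately preceding load writes.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dst, _ = prev
        if op in LOADS and dst in cur[2]:
            stalls += 1
    return stalls


# Load followed directly by the store that uses it.
naive = [
    ("mrmovq", "%r8", ("%rdi",)),
    ("rmmovq", None, ("%r8", "%rsi")),
    ("mrmovq", "%r9", ("%rdi",)),
    ("rmmovq", None, ("%r9", "%rsi")),
]
# The second load is moved between the first load and its store.
scheduled = [
    ("mrmovq", "%r8", ("%rdi",)),
    ("mrmovq", "%r9", ("%rdi",)),
    ("rmmovq", None, ("%r8", "%rsi")),
    ("rmmovq", None, ("%r9", "%rsi")),
]
```

The naive ordering is charged two stalls and the scheduled one none, with the same four instructions.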

For the fewer-than-6 leftover elements, I unrolled two ways. This scored 47.3.

##################################################################
# You can modify this portion
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src , %rsi = dst, %rdx = len
ncopy:

##################################################################
# You can modify this portion
	xorq %rax,%rax
	jmp Start1

Loop6:

	mrmovq (%rdi),%r8
	mrmovq 8(%rdi),%r9
	
	rmmovq %r8,(%rsi)	

	andq %r8,%r8
	jle L61
	iaddq $1,%rax
L61:	
	mrmovq 16(%rdi),%r10
	rmmovq %r9,8(%rsi)
	andq %r9,%r9
	jle L62
	iaddq $1,%rax
L62:
	mrmovq 24(%rdi),%r11
	rmmovq %r10,16(%rsi)
	andq %r10,%r10
	jle L63
	iaddq $1,%rax
L63:	
	mrmovq 32(%rdi),%r12
	rmmovq %r11,24(%rsi)
	andq %r11,%r11
	jle L64
	iaddq $1,%rax
L64:
	mrmovq 40(%rdi),%r13
	rmmovq %r12,32(%rsi)
	andq %r12,%r12
	jle L65
	iaddq $1,%rax
L65:	
	rmmovq %r13,40(%rsi)
	andq %r13,%r13
	jle L66
	iaddq $1,%rax
L66:
	iaddq $48,%rdi
	iaddq $48,%rsi
Start1:
	iaddq $-6,%rdx
	jge Loop6
	
	iaddq $6,%rdx
	jmp Start2
Loop2:
	mrmovq (%rdi),%r8
	mrmovq 8(%rdi),%r9
	rmmovq %r8,(%rsi)
	andq %r8,%r8
	jle L21
	iaddq $1,%rax
L21:	
	rmmovq %r9,8(%rsi)
	andq %r9,%r9
	jle L22
	iaddq $1,%rax
L22:
	iaddq $16,%rdi
	iaddq $16,%rsi
Start2:
	iaddq $-2,%rdx	# two-way unrolled loop
	jge Loop2 
	
	mrmovq (%rdi),%r8
	iaddq $1,%rdx
	jne Done
	rmmovq %r8,(%rsi)
	andq %r8,%r8
	jle Done
	iaddq $1,%rax
	
##################################################################
# Do not modify the following section of code
# Function epilogue.
Done:
	ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */

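Afterword

A Zhihu article claims that unrolling its code six ways gets above 50 points. In my test, that code with four-way unrolling scored 48.

A 2016 article reports 60 points with four-way unrolling, which puzzles me. I suspect the test data was weaker back then. I copied that code and fixed some of its assembly errors, but it still would not assemble.

That is it for now.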


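Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.
Article link: https://blog.csdn.net/zstuyyyyccccbbbb/article/details/119429533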
